Re: Possible multi-threading bug

Patricia Shanahan Tue, 30 Nov 2010 04:00:24 -0800

On 11/30/2010 1:43 AM, Peter Firmstone wrote:

Patricia Shanahan wrote:

Tom Hobbs wrote:

Yes, you're right.


I knew about the non-atomicity of ++, my concern was a call to reset
creeping in between the two parts of that operation.


That is a very good point.

Leaving reset unsynchronized, even with volatile, would lead to
results that would not be possible with full synchronization. Suppose
thread A is going to do a reset, and thread B is going to do an
increment, and everybody agrees the count is currently 10.

....

Ah yes, correct, my mistake, easy to stuff up isn't it? ;)

In essence I agree, unfortunately we don't know when we need the
performance, because we can't test scalability. I've only got 4 threads!


I've only got 8 threads on my largest system.

I am very, very strongly opposed to attempting performance tuningwithout measurement. I've seen it tried many times over several decades,and it is always a disaster. I've compared e.g. estimates of wherebottlenecks will be in an operating system prepared by the OS developersto actual measurements, and I've yet to see the developers get it right.That includes my own efforts at guessing bottlenecks, before I learnedthat it is futile.

Without measurement, you get either get focused effort in entirely thewrong places or diffuse effort spread over the whole system. What isreally needed is focused effort on a few real bottlenecks that will goway beyond any set of rules that could reasonably be applied throughoutthe system.

I believe the solution is to establish a working relationship with usersof River who have larger systems. They have a vested interest in helpingus make it more scalable.

Here are some simple tips I've found useful:

Have you measured the effectiveness of these tips on real worldscalability? What sort of gain do you see?

There may also be opportunities in the area of data structure andalgorithm scalability. For example, TaskManager uses an ArrayList torepresent something that is basically a FIFO with some reordering. Thatis a good idea if, and only if, the queue lengths are always very shorteven on large systems. However, I have no idea whether TaskManager queuelengths tend to increase on large systems or not.

  1. If all mutators are atomic and don't depend on previous state, a
     volatile reference or field may be sufficient, but now we have
     concurrency utilities, why not use an atomic reference or field
     instead? Then if we find we later need a method based on previous
     state, it's easily proven correct.

Do we have the concurrency utilities? The java.util.concurrent packagesare all "Since 1.5", and some of their classes are "Since 1.6". We canonly use them if we are abandoning any chance of River running with a1.4 rt.jar.

To my mind, the advances in java.util and its sub-packages are a reallystrong motivation for getting to 1.5 or, better, 1.6 ASAP.

Currently, a quick grep indicates java.util.concurrent is only used in./qa/src/com/sun/jini/qa/harness/HeartOfTheMachine.java, which is partof the QA infrastructure, not the production system.

...

Have you got any examples of a formal proof of correctness? Just out of
curiosity?

Unfortunately, the proofs of correctness I've written that related toconcurrency were based on confidential data and written on the job whenI was working as a performance architect for Cray Research and SunMicrosystems.

I do have some coursework proofs from the UCSD Design of Algorithmsgraduate course. I'll dig out an example to send to you directly.

Proof of correctness of large systems is a research topic. I'm talkingabout looking very narrowly at a class that has been shown to beperformance-critical and is being subjected to special performancetuning that uses risky techniques such as volatile.


Patricia

Re: Possible multi-threading bug

Reply via email to