Re: Progress, and a problem

Gregg Wonderly Fri, 03 Dec 2010 07:17:33 -0800

On 12/3/2010 1:27 AM, Patricia Shanahan wrote:

I'm currently hunting an intermittent bug found by the test
qa/src/com/sun/jini/test/impl/outrigger/matching/StressTestWithShutdown.td


After a failure on Hudson, I modified the .td file to make it fail more often by
increasing the number of entries (10,000), readers (1000), and writers (1000).

The writers write entries in an OutriggerServerImpl JavaSpace. The readers read,
and then take, entries that the writers wrote. Sometimes, a reader fails to find
an entry a writer claims to have written, causing a timeout.

The outrigger implementation depends on the class FastList which seems to use
the infamous Double Checked Locking idiom
(http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html)

The good news is that any memory model related error in FastList, or the related
class EntryHolder, would be a plausible cause of the observed symptom. The bad
news is that FastList and EntryHolder seem to have been written to be very
aggressively parallel, possibly by someone who was only familiar with
sequentially consistent memory. :-(

The important issue in FastList is that it was written with the JDK 1.4 memory
model. After moving River to Java 1.5, we'd have the JSR166 work and the new,
consistent memory model where volatile has a true meaning. However, this code
in particular is quite complex as you have noted, so even adjusting to the new
memory model could be problematic.
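As a minimal sketch of what "volatile has a true meaning" buys under the Java 1.5
memory model, here is double-checked locking in its corrected form. The class
LazyHolder and its fields are hypothetical names for illustration only; this is
not code from FastList or EntryHolder, and it assumes a Java 1.5 or later JVM.

```java
// Hypothetical illustration; not taken from outrigger's FastList.
class LazyHolder {
    // Under the Java 1.5 memory model, the volatile write below
    // happens-before any later volatile read that sees it, so a reader
    // that observes a non-null reference also observes a fully
    // constructed object. Without "volatile" this idiom is broken.
    private volatile Object instance;
    private final Object lock = new Object();

    Object get() {
        Object result = instance;          // first check, no lock taken
        if (result == null) {
            synchronized (lock) {
                result = instance;         // second check, under the lock
                if (result == null) {
                    result = new Object();
                    instance = result;     // safe publication via volatile
                }
            }
        }
        return result;
    }
}
```

Code that relies on more than one unsynchronized field, as FastList appears to,
would likely need more than this single-field fix.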

Many people are using Dan Creswell's Blitz JavaSpaces implementation or
commercial versions. I'm partially inclined to suggest that we should discuss
EOL of outrigger at some point. Even though JavaSpaces is a large part of what
Jini has been recognized for, it has a focused audience and if we don't have
someone with knowledge and interest to support outrigger, it may be more of a
wart than River can deal with.

Usually, it is easy to fix a problem once it has been located. This may be a bit
more difficult, especially because I assume the parallelism is needed for
acceptable JavaSpace performance.

One of the issues that I've found in network-intensive applications is that the
latency of communications is so huge compared to code paths that all active
threads will fairly quickly end up hovering on top of any use of "synchronized",
so that there is always worst-case contention for such protected resources.

It's important to understand how to deal with this by either minimizing
synchronization time, or avoiding funneling kinds of locking mechanisms.
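
A rough sketch of the "minimize synchronization time" option: do the slow work
before taking the lock, and hold the lock only for the shared update. EntryLog,
addFunnel, addNarrow and expensiveNormalize are hypothetical names invented for
this example, and the "expensive" step is only a stand-in for real work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration; not taken from outrigger.
class EntryLog {
    private final List<String> entries = new ArrayList<String>();

    // Funnel: every thread performs the slow step while holding the lock,
    // so all callers queue up behind one another.
    void addFunnel(String raw) {
        synchronized (entries) {
            entries.add(expensiveNormalize(raw));
        }
    }

    // Narrow: the slow step runs outside the lock; only the list update
    // itself is serialized.
    void addNarrow(String raw) {
        String normalized = expensiveNormalize(raw);
        synchronized (entries) {
            entries.add(normalized);
        }
    }

    int size() {
        synchronized (entries) {
            return entries.size();
        }
    }

    // Stand-in for work that is slow relative to the list update.
    static String expensiveNormalize(String raw) {
        return raw.trim().toLowerCase(Locale.ROOT);
    }
}
```

Both methods leave the list in the same state; the difference is only in how
long each caller keeps the others waiting.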


Gregg Wonderly
