Thanks Patricia,

Good to have someone with experience who can explain why.

Your parstore program is interesting.

Perhaps I should obtain a second-hand 8-core, 32-way T2000; I'll investigate
this option in the new year.

Despite fixing synchronization errors, I'm not seeing contention in thread
dumps during any of the tests; the standard tests probably aren't loading the
system sufficiently.
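
One cheap way to add load on ordinary hardware might be a rough Java analogue
of the parstore idea quoted below: a few threads doing nothing but stores in a
tight loop while the tests run. This is only a sketch. The names (StoreLoad,
stopAndCount) are made up for illustration, and plain Java array stores, which
the JIT is free to optimise, are a much weaker load than a dump of the SPARC
floating point registers.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical background load generator; NOT the original parstore. */
final class StoreLoad {
    private final AtomicBoolean running = new AtomicBoolean(true);
    private final Thread[] workers;
    private final long[] stores;   // per-thread store counts, written once on exit
    private volatile long sink;    // consumes a buffer value so the stores are not trivially dead

    StoreLoad(int threadCount) {
        workers = new Thread[threadCount];
        stores = new long[threadCount];
        for (int i = 0; i < threadCount; i++) {
            final int id = i;
            workers[i] = new Thread(new Runnable() {
                @Override
                public void run() {
                    long[] buffer = new long[1024];   // scratch memory private to this thread
                    long n = 0;
                    do {
                        // One burst of plain stores between checks of the stop flag.
                        for (int j = 0; j < buffer.length; j++) {
                            buffer[j] = n + j;
                        }
                        n += buffer.length;
                    } while (running.get());
                    sink = buffer[0];
                    stores[id] = n;
                }
            }, "store-load-" + i);
            workers[i].setDaemon(true);   // never keeps the JVM alive after the tests
        }
    }

    void start() {
        for (Thread t : workers) {
            t.start();
        }
    }

    /** Stops all workers and returns the total number of stores issued. */
    long stopAndCount() throws InterruptedException {
        running.set(false);
        long total = 0;
        for (int i = 0; i < workers.length; i++) {
            workers[i].join();    // join() makes the worker's final write visible here
            total += stores[i];
        }
        return total;
    }
}
```

The idea would be to call start() before a test run and stopAndCount()
afterwards, purely to shift timings around; the count is only a sanity check
that the threads actually ran.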

Regards,

Peter.


----- Original message -----
> I don't think you need to assume increased JVM optimization to explain
> the phenomenon. It is typical of what happens when increasing thread
> safety in a system that has been subject to testing, but not full thread
> safety design and code review.
>
> Testing, and fixing test failures, tends to eliminate only bugs that
> have high probability in the current environment of sequences of events
> and their relative timings. Fixing a bug often shifts the probabilities.
> A sequence that brings out another bug may have been low probability
> while the first bug was present, but high probability once it is fixed.
>
> It may be worth trying a variation of a program, parstore, I wrote at
> Sun Microsystems, but you can only do this sort of thing on a dedicated
> system. Parstore spawned a specified number of threads, each of which
> did nothing but the fastest form of store supported on Sun SPARC
> systems, a dump of the floating point registers to memory, in a
> continuous loop. By filling various queues, keeping buses busy, and
> generally shifting timings around, it brought out several new test
> failures.
>
> Patricia
>
> On 12/17/2013 10:48 AM, Peter wrote:
> > What I've found during the refactoring process is that fixing one
> > data race bug will expose another.
> >
> > The latest test failure, on multiple architectures, has never
> > previously failed during testing (it's the only test failure on Ubuntu
> > x64 and ARM JDK 7). Here a boolean field is written while synchronized
> > by an event listening thread; the field value is then read
> > unsynchronized by the test thread, twice in short succession. The
> > first read should have caused the test to pass, but this read
> > experiences a data race. The second read of the field sees the
> > updated value and causes a test failure. Without the race condition
> > the test thread wouldn't arrive at this point.
> >
> > I don't know the internal workings of the JVM, but it seems like the
> > more correct the code is, the more aggressively the JVM optimises.
> >
> > So now that I've fixed so many other issues, these race conditions are
> > starting to fail every time, rather than occasionally.
>
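
The unsynchronized read described in the oldest message above can be sketched
in Java roughly as follows. This is one plausible reading of that description,
not the actual test code: the class and method names (EventFlag, markReceived,
isReceivedRacy, awaitReceived) are invented. The write happens under a lock,
so a read only has a happens-before relationship with it if the reader takes
the same lock (or the field is made volatile).

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the listener/test pair; all names are invented. */
final class EventFlag {
    private boolean received;   // guarded by "this"

    /** Called by the event listening thread. */
    synchronized void markReceived() {
        received = true;
        notifyAll();
    }

    /** The racy pattern: no lock, so two reads in a row may disagree. */
    boolean isReceivedRacy() {
        return received;
    }

    /** Read under the same lock as the write. */
    synchronized boolean isReceived() {
        return received;
    }

    /** Waits up to timeoutMillis for the event; returns false on timeout. */
    synchronized boolean awaitReceived(long timeoutMillis) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (!received) {
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) {
                return false;
            }
            TimeUnit.NANOSECONDS.timedWait(this, remaining);   // releases the lock while waiting
        }
        return true;
    }
}
```

With this shape the test thread would call awaitReceived or isReceived rather
than reading the field directly, so both of its reads are ordered after the
listener's write.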
