Ok, that's definitely worth trying. I don't have suitable hardware right now, so I'll see if I can pick up something like a T2000.
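
Something like the following minimal sketch is what I have in mind; the class name and argument handling are illustrative:

    import java.util.concurrent.atomic.AtomicLong;

    public class Contend {
        static final AtomicLong counter = new AtomicLong();
        static volatile boolean stop = false;

        public static void main(String[] args) throws InterruptedException {
            int nThreads = Integer.parseInt(args[0]); // e.g. 32
            long seconds = Long.parseLong(args[1]);   // e.g. 600
            Thread[] threads = new Thread[nThreads];
            for (int i = 0; i < nThreads; i++) {
                threads[i] = new Thread(new Runnable() {
                    public void run() {
                        // Contended CAS: generates CPU load and, across
                        // processor chips, cache-coherency bus traffic.
                        while (!stop) {
                            counter.incrementAndGet();
                        }
                    }
                });
                threads[i].start();
            }
            Thread.sleep(seconds * 1000L);
            stop = true;
            for (Thread t : threads) {
                t.join();
            }
            System.out.println("total increments: " + counter.get());
        }
    }

Run alongside the River tests with varying thread counts (e.g. java Contend 32 600), it might shift timings around much as parstore did.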
Thanks,

Peter.

----- Original message -----
> We might be able to get some benefit more portably by writing a Java
> program that spawns a specified number of threads, all of which
> continuously increment the same AtomicLong. That will cause both CPU
> load and, with multiple processor chips, bus activity. Run it, along
> with River tests, specifying different numbers of processors.
>
> Patricia
>
> On 12/17/2013 8:07 PM, Peter wrote:
> > Thanks Patricia,
> >
> > Good to have someone with experience who can explain why.
> >
> > Your parstore program is interesting.
> >
> > Perhaps I should obtain a second-hand 8-core, 32-way T2000; I'll
> > investigate this option in the new year.
> >
> > In spite of fixing synchronization errors, I'm not seeing contention
> > in thread dumps during any tests; the standard tests probably aren't
> > loading the system sufficiently.
> >
> > Regards,
> >
> > Peter.
> >
> > ----- Original message -----
> > > I don't think you need to assume increased JVM optimization to
> > > explain the phenomenon. It is typical of what happens when
> > > increasing thread safety in a system that has been subject to
> > > testing, but not to full thread-safety design and code review.
> > >
> > > Testing, and fixing test failures, tends to eliminate only bugs
> > > that have a high probability in the current environment of
> > > sequences of events and their relative timings. Fixing a bug often
> > > shifts the probabilities: a sequence that brings out another bug
> > > may have been low probability while the fixed bug was present, but
> > > high probability without it.
> > >
> > > It may be worth trying a variation of a program, parstore, that I
> > > wrote at Sun Microsystems, but you can only do this sort of thing
> > > on a dedicated system. Parstore spawned a specified number of
> > > threads, each of which did nothing but the fastest form of store
> > > supported on Sun SPARC systems, a dump of the floating-point
> > > registers to memory, in a continuous loop. By filling various
> > > queues, keeping buses busy, and generally shifting timings around,
> > > it brought out several new test failures.
> > >
> > > Patricia
> > >
> > > On 12/17/2013 10:48 AM, Peter wrote:
> > > > What I've found during the refactoring process is that fixing
> > > > one data race bug will expose another.
> > > >
> > > > The latest test to fail on multiple architectures has never
> > > > previously failed during testing (it's the only test failure on
> > > > ubuntu x64 and arm jdk7). Here a boolean field is written, while
> > > > synchronized, by an event listening thread; the field value is
> > > > then read unsynchronized by the test thread, twice in short
> > > > succession. The first read should have caused the test to pass,
> > > > but it experiences a data race; the second read sees the updated
> > > > value and causes a test failure. Without the race condition the
> > > > test thread wouldn't arrive at this point.
> > > >
> > > > I don't know the internal workings of the JVM, but it seems the
> > > > more correct the code is, the more aggressively the JVM
> > > > optimises.
> > > >
> > > > So now that I've fixed so many other issues, these race
> > > > conditions are starting to fail every time, rather than
> > > > occasionally.
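
For illustration, the race described in the last quoted message above reduces to something like this minimal sketch; the class, method, and field names are hypothetical:

    public class RacyFlag {
        private final Object lock = new Object();
        private boolean done = false; // not volatile, so unsynchronized reads race

        // Event listening thread: the write itself is synchronized...
        void onEvent() {
            synchronized (lock) {
                done = true;
            }
        }

        // ...but the test thread reads without synchronizing, so there is
        // no happens-before edge and the two reads may disagree.
        boolean sawConsistentValue() {
            boolean first = done;   // can see a stale false
            boolean second = done;  // can then see the updated true
            return first == second; // can return false under the race
        }
    }

Reading the field under the same lock as the write, or declaring it volatile, establishes the happens-before edge that makes the first read reliable.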