Alan,

On 12/11/2011 9:58 PM, Alan Bateman wrote:
On 11/11/2011 16:56, Gary Adams wrote:
CR 6860309 - TEST_BUG: Insufficient sleep time in
java/lang/Runtime/exec/StreamsSurviveDestroy.java

A timing problem is reported on slow Solaris systems for this
test, which starts up a process and systematically tortures the
underlying threads processing data from the running process.

On my fast Solaris machine I cannot reproduce the error,
but it is reasonable to assume that on a slower machine there
could be scheduling issues that could delay the thread startup
past the designated 100 millisecond delay in the main thread.

This webrev suggests gating the process destruction until both
worker threads are alive.
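
For illustration only, a minimal sketch of that kind of gate. The
class and method names (AliveGate, awaitAlive) are hypothetical and
not taken from the webrev. Note that Thread.isAlive() only reports
that a thread has been started and has not yet died; it says nothing
about whether the thread has reached its read.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not the code in the webrev.
class AliveGate {
    /**
     * Polls until every given thread reports isAlive(), or until the
     * timeout elapses. Returns false on timeout.
     */
    static boolean awaitAlive(long timeoutMillis, Thread... threads)
            throws InterruptedException {
        long deadline = System.nanoTime()
                + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        for (Thread t : threads) {
            while (!t.isAlive()) {
                if (System.nanoTime() - deadline >= 0) {
                    return false;   // gave up waiting for this thread
                }
                Thread.sleep(5);    // short poll interval (arbitrary)
            }
        }
        return true;
    }
}
```

The main thread would call awaitAlive(...) on both workers and only
then go on to destroy the process.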

http://cr.openjdk.java.net/~gadams/6860309/


-Xcomp on a slow machine, always fun when testing the untestable.

I agree with David but I don't think there is a perfect solution. I would
suggest using a CountDownLatch or other synchronization so that the main
thread waits until the Copier thread is just about to do the read. Then
do a sleep in the main thread before invoking the destroy method. I
suspect that is the best that you can do, as it can't be guaranteed that
the Copier thread is blocked in the underlying read.
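
Roughly what I have in mind, as a sketch rather than a patch. To keep
it self-contained, piped streams stand in for the process's stdout and
stderr, and closing the writer side stands in for Process.destroy();
the class names, timeouts and settle delay are all made up. The latch
only tells the main thread that each copier is about to call read(),
not that it is already blocked there, hence the extra sleep.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class LatchedCopiers {
    /** Hypothetical stand-in for the test's Copier thread. */
    static class Copier extends Thread {
        private final InputStream in;
        private final CountDownLatch aboutToRead;
        volatile int result = Integer.MIN_VALUE;  // value returned by read()
        volatile IOException failure;             // exception thrown by read()

        Copier(String name, InputStream in, CountDownLatch aboutToRead) {
            super(name);
            this.in = in;
            this.aboutToRead = aboutToRead;
            setDaemon(true);
        }

        @Override
        public void run() {
            aboutToRead.countDown();  // last step before the blocking call
            try {
                result = in.read();
            } catch (IOException e) {
                failure = e;
            }
        }
    }

    /** Returns true if both copiers came out of read() after the "destroy". */
    static boolean runOnce(long settleMillis)
            throws IOException, InterruptedException {
        PipedOutputStream outSrc = new PipedOutputStream();
        PipedOutputStream errSrc = new PipedOutputStream();
        CountDownLatch aboutToRead = new CountDownLatch(2);
        Copier out = new Copier("out", new PipedInputStream(outSrc), aboutToRead);
        Copier err = new Copier("err", new PipedInputStream(errSrc), aboutToRead);
        out.start();
        err.start();

        // Wait until both copiers have signalled, then give them a
        // moment to actually enter read().
        if (!aboutToRead.await(10, TimeUnit.SECONDS)) {
            return false;
        }
        Thread.sleep(settleMillis);

        // Stand-in for Process.destroy(): end both streams.
        outSrc.close();
        errSrc.close();

        out.join(5000);
        err.join(5000);
        return !out.isAlive() && !err.isAlive();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("copiers finished: " + runOnce(100));
    }
}
```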

Will the exec'd process block until the copier threads read from its output streams? If not then the copier threads (well stdin anyway) could read their input and have terminated before the main thread even reaches the original sleep() call.

I don't think this test can be written correctly as-is. Even using a CountDownLatch won't help because you have to sync with two copier threads, so the first could be finished before the second signals the latch.

I would think we would need to exec our own process (a Java one of course) that assists with the synchronization issue, i.e. by not terminating until it receives an input token. At least that way we know the copier threads cannot proceed past the read() calls, even if we can't be 100% certain they are in the read at the time the process is destroyed.
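
A rough sketch of such a helper, to make the idea concrete. Everything
here is hypothetical: the class name, the token value, and the
assumption that the child can be launched from the current java.home
and class path. The child writes nothing and stays alive until the
token arrives on its stdin, so copier threads reading its output have
nothing to consume before the parent acts.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class TokenGatedChild {
    static final int TOKEN = 'x';  // hypothetical release token

    /** Consumes input until TOKEN arrives; returns false if EOF comes first. */
    static boolean awaitToken(InputStream in) throws IOException {
        int b;
        while ((b = in.read()) != -1) {
            if (b == TOKEN) {
                return true;
            }
        }
        return false;
    }

    /** Child entry point: produce no output, exit only once released. */
    public static void main(String[] args) throws IOException {
        System.exit(awaitToken(System.in) ? 0 : 1);
    }

    /** Parent side: start this class in a fresh JVM (assumed launch recipe). */
    static Process launch() throws IOException {
        String java = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        return new ProcessBuilder(java,
                "-cp", System.getProperty("java.class.path"),
                TokenGatedChild.class.getName()).start();
    }

    /** Parent side: send the token so the child can exit normally. */
    static void release(Process p) throws IOException {
        try (OutputStream toChild = p.getOutputStream()) {
            toChild.write(TOKEN);
        }   // close() flushes the buffered stream
    }
}
```

The test would call launch(), start its copier threads on the child's
streams, and then destroy the process; release() is only needed on
paths where the child should be allowed to exit by itself.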

Gary: while fixing timing bugs is a worthwhile goal in terms of test stability etc., it is rarely, if ever, "low hanging fruit", as you have found.

David

-Alan.
