Re: CR 6860309 - solaris timing issue on thread startup

David Holmes Sun, 13 Nov 2011 16:43:28 -0800

Alan,

On 12/11/2011 9:58 PM, Alan Bateman wrote:

On 11/11/2011 16:56, Gary Adams wrote:

CR 6860309 - TEST_BUG: Insufficient sleep time in
java/lang/Runtime/exec/StreamsSurviveDestroy.java


A timing problem has been reported on slow Solaris systems for this
test, which starts a process and systematically tortures the underlying
threads processing data from the running process.

On my fast Solaris machine I cannot reproduce the error,
but it is reasonable to assume that on a slower machine there
could be scheduling issues that could delay the thread startup
past the designated 100 millisecond delay in the main thread.

This webrev suggests gating the process destruction until both
worker threads are alive.

http://cr.openjdk.java.net/~gadams/6860309/

-Xcomp on a slow machine, always fun when testing the untestable.

I agree with David but I don't think there is a perfect solution. I would
suggest using a CountDownLatch or other synchronization so that the main
thread waits until the Copier thread is just about to do the read. Then
do a sleep in the main thread before invoking the destroy method. I
suspect that is the best that you can do, as it can't be guaranteed that
the Copier thread is blocked in the underlying read.
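
A minimal sketch of that gating, not the actual test code: each copier counts down a latch immediately before calling read(), and the main thread waits on the latch, sleeps, and only then tears things down. The class name LatchGatedDestroy, the Copier stand-in, and runOnce() are hypothetical, and a PipedInputStream whose writer is closed stands in for the exec'd process and its destroy() call so that the example runs anywhere. As noted above, the latch only shows that a copier is about to read, not that it is blocked in the read.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class LatchGatedDestroy {

    /** Hypothetical stand-in for the test's Copier thread. */
    static class Copier extends Thread {
        private final InputStream in;
        private final CountDownLatch aboutToRead;
        volatile int result = Integer.MIN_VALUE; // value returned by read(), if it returned
        volatile IOException failure;            // exception thrown by read(), if any

        Copier(String name, InputStream in, CountDownLatch aboutToRead) {
            super(name);
            setDaemon(true); // never keep the JVM alive if something goes wrong
            this.in = in;
            this.aboutToRead = aboutToRead;
        }

        @Override
        public void run() {
            // Signal "about to read". The thread may still be descheduled
            // between this line and the read() call below.
            aboutToRead.countDown();
            try {
                result = in.read();
            } catch (IOException e) {
                failure = e;
            }
        }
    }

    /** Returns true if both copiers came out of read() with end-of-stream. */
    static boolean runOnce(long settleMillis) throws Exception {
        // Assumption: these pipes stand in for the child's stdout and stderr.
        PipedOutputStream outWriter = new PipedOutputStream();
        PipedOutputStream errWriter = new PipedOutputStream();
        CountDownLatch aboutToRead = new CountDownLatch(2); // one count per copier

        Copier out = new Copier("stdout-copier", new PipedInputStream(outWriter), aboutToRead);
        Copier err = new Copier("stderr-copier", new PipedInputStream(errWriter), aboutToRead);
        out.start();
        err.start();

        // Gate: do not proceed until both copiers have reached the read.
        if (!aboutToRead.await(10, TimeUnit.SECONDS)) {
            return false;
        }
        // Extra settling time, since the latch cannot prove the threads are in read().
        Thread.sleep(settleMillis);

        // Stand-in for Process.destroy(): closing the writers ends both streams.
        outWriter.close();
        errWriter.close();

        out.join(10_000);
        err.join(10_000);
        return !out.isAlive() && !err.isAlive()
                && out.failure == null && err.failure == null
                && out.result == -1 && err.result == -1;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runOnce(100) ? "copiers unblocked" : "copiers stuck");
    }
}
```

With a real Process the two closes would be replaced by a single destroy() call, and the copiers would read from getInputStream() and getErrorStream().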

Will the exec'd process block until the copier threads read from its
output streams? If not then the copier threads (well stdin anyway) could
read their input and have terminated before the main thread even reaches
the original sleep() call.

I don't think this test can be written correctly as-is. Even using a
CountDownLatch won't help because you have to sync with two copier
threads, so the first could be finished before the second signals the
latch.

I would think we would need to exec our own process (a Java one of
course) that assists with the synchronization issue, i.e. by not
terminating until it receives an input token. At least that way we know
the copier threads cannot proceed past the read() calls, even if we
can't be 100% certain they are in the read at the time the process is
destroyed.
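
A rough sketch of what such a helper process might look like, under stated assumptions: the class name TokenGatedChild, the "child" argument, and the helpers awaitToken(), childCommand() and copier() are all hypothetical, and the class is assumed to be compiled and reachable through java.class.path so that the parent can re-launch it. The child writes nothing and stays in a read on its stdin until a token arrives, so the parent's copier threads have nothing to consume and cannot get past their own read() calls before destroy() is invoked.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.List;

class TokenGatedChild {

    /** Child side: returns the token byte, or -1 if stdin is closed first. */
    static int awaitToken(InputStream in) throws IOException {
        return in.read();
    }

    /** Command line that re-runs this class in child mode with the current JVM. */
    static List<String> childCommand() {
        String java = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        return Arrays.asList(java, "-cp", System.getProperty("java.class.path"),
                TokenGatedChild.class.getName(), "child");
    }

    /** Parent side: a daemon thread that drains a stream until EOF or error. */
    static Thread copier(InputStream in, String name) {
        Thread t = new Thread(() -> {
            try {
                while (in.read() != -1) {
                    // discard; the child is not expected to write anything
                }
            } catch (IOException e) {
                // destroy() may close the stream under the reader
            }
        }, name);
        t.setDaemon(true);
        return t;
    }

    public static void main(String[] args) throws Exception {
        if (args.length > 0 && args[0].equals("child")) {
            // Child mode: produce no output and stay alive until a token arrives.
            awaitToken(System.in);
            return;
        }

        // Parent mode: start the child and attach one copier per output stream.
        Process p = new ProcessBuilder(childCommand()).start();
        Thread out = copier(p.getInputStream(), "stdout-copier");
        Thread err = copier(p.getErrorStream(), "stderr-copier");
        out.start();
        err.start();

        // No token is ever sent, so the copiers cannot finish early.
        // The sleep only gives them time to reach read().
        Thread.sleep(100);
        p.destroy();

        out.join(10_000);
        err.join(10_000);
        System.out.println(out.isAlive() || err.isAlive()
                ? "a copier is still blocked after destroy()"
                : "both copiers returned after destroy()");
    }
}
```

Run in parent mode with no arguments. This still leaves the uncertainty described above: the copiers cannot finish before destroy(), but nothing proves they were inside read() at that moment.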

Gary: while fixing timing bugs is a worthwhile goal in terms of test
stability etc., it is rarely if ever "low hanging fruit", as you have
found.


David

-Alan.
