Re: CR 6860309 - solaris timing issue on thread startup

2011-11-15 Thread David Holmes

Alan,

On 15/11/2011 11:26 PM, Alan Bateman wrote:

On 15/11/2011 00:56, David Holmes wrote:

:
25 * @bug 4820217 6860309

As per other emails and still waiting from confirmation from Alan. I
don't think the @bug should be updated for any of these test fixes.

The @bug tag is intended to list the bug numbers of bugs that are useful
when you have a test failure and sometimes there can be really useful
information in a bug that turns out to be a test bug. I'm not aware of
any rule that says that the bug numbers shouldn't include bugs that were
test only bugs but it's almost always bug numbers of bugs that caused
the test to be extended to cover additional scenarios. For the tests
being discussed here then I think we should just use our best judgment.
Worse case it's easy to dig into the history with hg.


That was somewhat non-committal :) To me @bug says these are the bugs 
that this test is checking the fix for hence not applicable in any of 
the recent timing/race test fixes.


David


-Alan.


Re: CR 6860309 - solaris timing issue on thread startup

2011-11-15 Thread Gary Adams

Where can we find the definition of the tag contents?
Whichever way this discussion goes, we should
update that documentation with the conclusions.

On 11/15/11 4:29 PM, David Holmes wrote:

Alan,

On 15/11/2011 11:26 PM, Alan Bateman wrote:

On 15/11/2011 00:56, David Holmes wrote:

:
25 * @bug 4820217 6860309

As per other emails and still waiting from confirmation from Alan. I
don't think the @bug should be updated for any of these test fixes.

The @bug tag is intended to list the bug numbers of bugs that are useful
when you have a test failure and sometimes there can be really useful
information in a bug that turns out to be a test bug. I'm not aware of
any rule that says that the bug numbers shouldn't include bugs that were
test only bugs but it's almost always bug numbers of bugs that caused
the test to be extended to cover additional scenarios. For the tests
being discussed here then I think we should just use our best judgment.
Worse case it's easy to dig into the history with hg.


That was somewhat non-committal :) To me @bug says these are the bugs 
that this test is checking the fix for hence not applicable in any of 
the recent timing/race test fixes.


David


-Alan.




Re: CR 6860309 - solaris timing issue on thread startup

2011-11-15 Thread Alan Bateman

On 15/11/2011 21:29, David Holmes wrote:


That was somewhat non-committal :) To me @bug says these are the bugs 
that this test is checking the fix for hence not applicable in any of 
the recent timing/race test fixes.
It's non-committal because I don't think this has come up before, it's 
really something for the developer's guide. In any case, I don't think 
this discussion should stop us pushing Gary's fixes as he can easily 
revert the @bug tags for now.


-Alan.


RE: CR 6860309 - solaris timing issue on thread startup

2011-11-15 Thread Iris Clark
Hi.

The current practice may be different, but...

The original intent was that every bug would either have a unit/regression test 
or a BugTraq keyword explaining why a test was not provided.  See step 6 on 
this page for the list of valid keywords:

http://openjdk.java.net/guide/changePlanning.html#bug

In the case of bugs against regression tests I've seen two approaches: 

1. Addition of the bugid to the @bug tag
2. Addition of the noreg-self keyword to bugtraq

Both technically fulfill the original intent.  At one point there were audits 
to enforce this.

Knowing how the audits worked (I personally preformed them for a while), I 
tended to favor adding the bugid to @bug as that would minimize the number of 
BugTraq queries.  Even when the network and BugTraq were functioning perfectly, 
a BugTraq query always took longer than just looking at the source (which we 
were already looking at).

Again, the above may no longer be the current practice or recommendation.

Thanks,
iris

-Original Message-
From: Alan Bateman 
Sent: Tuesday, November 15, 2011 2:37 PM
To: David Holmes
Cc: Gary Adams; core-libs-dev@openjdk.java.net
Subject: Re: CR 6860309 - solaris timing issue on thread startup

On 15/11/2011 21:29, David Holmes wrote:

 That was somewhat non-committal :) To me @bug says these are the bugs 
 that this test is checking the fix for hence not applicable in any of 
 the recent timing/race test fixes.
It's non-committal because I don't think this has come up before, it's really 
something for the developer's guide. In any case, I don't think this discussion 
should stop us pushing Gary's fixes as he can easily revert the @bug tags for 
now.

-Alan.


Re: CR 6860309 - solaris timing issue on thread startup

2011-11-15 Thread David Holmes

Hi Iris,

Still seems to me, based on the FAQ,

http://openjdk.java.net/jtreg/faq.html

that the intent is for @bug to refer to the bug that the test is testing.

But as it is looking like this has been used in an ad-hoc manner anyway 
I'll shut up now. ;-)


Cheers,
David

On 16/11/2011 10:47 AM, Iris Clark wrote:

Hi.

The current practice may be different, but...

The original intent was that every bug would either have a unit/regression test 
or a BugTraq keyword explaining why a test was not provided.  See step 6 on 
this page for the list of valid keywords:

http://openjdk.java.net/guide/changePlanning.html#bug

In the case of bugs against regression tests I've seen two approaches:

1. Addition of the bugid to the @bug tag
2. Addition of the noreg-self keyword to bugtraq

Both technically fulfill the original intent.  At one point there were audits 
to enforce this.

Knowing how the audits worked (I personally preformed them for a while), I 
tended to favor adding the bugid to @bug as that would minimize the number of 
BugTraq queries.  Even when the network and BugTraq were functioning perfectly, 
a BugTraq query always took longer than just looking at the source (which we 
were already looking at).

Again, the above may no longer be the current practice or recommendation.

Thanks,
iris

-Original Message-
From: Alan Bateman
Sent: Tuesday, November 15, 2011 2:37 PM
To: David Holmes
Cc: Gary Adams; core-libs-dev@openjdk.java.net
Subject: Re: CR 6860309 - solaris timing issue on thread startup

On 15/11/2011 21:29, David Holmes wrote:


That was somewhat non-committal :) To me @bug says these are the bugs
that this test is checking the fix for hence not applicable in any of
the recent timing/race test fixes.

It's non-committal because I don't think this has come up before, it's really 
something for the developer's guide. In any case, I don't think this discussion 
should stop us pushing Gary's fixes as he can easily revert the @bug tags for 
now.

-Alan.


Re: CR 6860309 - solaris timing issue on thread startup

2011-11-14 Thread Alan Bateman

On 14/11/2011 00:42, David Holmes wrote:


Will the exec'd process block until the copier threads read from its 
output streams? If not then the copier threads (well stdin anyway) 
could read their input and have terminated before the main thread even 
reaches the original sleep() call.


I don't think this test can be written correctly as-is. Even using a 
CountDownLatch won't help because you have to sync with two copier 
threads, so the first could be finished before the second signals the 
latch.


I would think we would need to exec our own process (a Java one of 
course) that assists with the synchronization issue - ie by not 
terminating until it receives an input token. At least that way we 
know the copier threads can not proceed passed the read() calls, even 
if we can't be 100% certain they are in the read at the time the 
process is destroyed.
The test runs cat without any arguments so the Copier threads will block 
when they read from the stream. If we can get the main thread to wait 
until the Copier threads are just about to do the read, sleep for a bit, 
then the threads should be blocked in the read when we destroy the 
process. It's not ideal but I don't think there is any other way to test 
this.


-Alan.


Re: CR 6860309 - solaris timing issue on thread startup

2011-11-14 Thread Rémi Forax

On 11/14/2011 05:09 PM, Gary Adams wrote:

 I've updated the webrev for CR#6860309 using a CountDownLatch.
The main thread will wait til both worker threads are ready to
block on the read() before the process is destroyed.

   http://cr.openjdk.java.net/~gadams/6860309/

Tested with -Xcomp, but still may need a older slow solaris/sparc
machine to confirm the fix works for the original bug
submitter.



Hi Gary,
I think it's better if a Copier takes the CountDownLatch as parameter
of the constructor instead of being declared static.

Rémi





Re: CR 6860309 - solaris timing issue on thread startup

2011-11-14 Thread Gary Adams

 Updated to move static latch to Copier
constructor argument.

On 11/14/11 11:09 AM, Gary Adams wrote:

 I've updated the webrev for CR#6860309 using a CountDownLatch.
The main thread will wait til both worker threads are ready to
block on the read() before the process is destroyed.

   http://cr.openjdk.java.net/~gadams/6860309/

Tested with -Xcomp, but still may need a older slow solaris/sparc
machine to confirm the fix works for the original bug
submitter.





Re: CR 6860309 - solaris timing issue on thread startup

2011-11-14 Thread David Holmes

On 14/11/2011 6:05 PM, Alan Bateman wrote:

The test runs cat without any arguments so the Copier threads will block
when they read from the stream.


Thank's Alan that critical point had escaped my notice.

David

 If we can get the main thread to wait

until the Copier threads are just about to do the read, sleep for a bit,
then the threads should be blocked in the read when we destroy the
process. It's not ideal but I don't think there is any other way to test
this.

-Alan.


Re: CR 6860309 - solaris timing issue on thread startup

2011-11-13 Thread David Holmes

Alan,

On 12/11/2011 9:58 PM, Alan Bateman wrote:

On 11/11/2011 16:56, Gary Adams wrote:

CR 6860309 - TEST_BUG: Insufficient sleep time in
java/lang/Runtime/exec/StreamsSurviveDestroy.java

A timing problem is reported for slow solaris systems for this
test to start up a process and systematically torture the underlying
threads processing data from the running process.

On my fast solaris machine I can not reproduce the error,
but it is reasonable to assume that on a slower machine there
could be scheduling issues that could delay the thread startup
past the designated 100 millisecond delay in the main thread.

This webrev suggests gating the process destruction until both
worker threads are alive.

http://cr.openjdk.java.net/~gadams/6860309/



-Xcomp on a slow machine, always fun when testing the untestable.

I agree with David but I don't think there is perfect solution. I would
suggest using a CountDownLatch or other synchronization so that the main
thread waits until the Copier thread is just about to do the read. Then
do a sleep in the main thread before invoking the destroy method. I
suspect that is the best that you can do as can't be guaranteed that the
Copier thread is blocked in the underlying read.


Will the exec'd process block until the copier threads read from its 
output streams? If not then the copier threads (well stdin anyway) could 
read their input and have terminated before the main thread even reaches 
the original sleep() call.


I don't think this test can be written correctly as-is. Even using a 
CountDownLatch won't help because you have to sync with two copier 
threads, so the first could be finished before the second signals the latch.


I would think we would need to exec our own process (a Java one of 
course) that assists with the synchronization issue - ie by not 
terminating until it receives an input token. At least that way we know 
the copier threads can not proceed passed the read() calls, even if we 
can't be 100% certain they are in the read at the time the process is 
destroyed.


Gary: while fixing timing bugs is a worthwhile goal in terms of test 
stability etc it is rarely if ever low hanging fruit as you have found.


David


-Alan.


Re: CR 6860309 - solaris timing issue on thread startup

2011-11-12 Thread Alan Bateman

On 11/11/2011 16:56, Gary Adams wrote:
 CR 6860309 - TEST_BUG: Insufficient sleep time in 
java/lang/Runtime/exec/StreamsSurviveDestroy.java


A timing problem is reported for slow solaris systems for this
test to start up a process and systematically torture the underlying
threads processing data from the running process.

On my fast solaris machine I can not reproduce the error,
but it is reasonable to assume that on a slower machine there
could be scheduling issues that could delay the thread startup
past the designated 100 millisecond delay in the main thread.

This webrev suggests gating the process destruction until both
worker threads are alive.

  http://cr.openjdk.java.net/~gadams/6860309/



-Xcomp on a slow machine, always fun when testing the untestable.

I agree with David but I don't think there is perfect solution. I would 
suggest using a CountDownLatch or other synchronization so that the main 
thread waits until the Copier thread is just about to do the read. Then 
do a sleep in the main thread before invoking the destroy method. I 
suspect that is the best that you can do as can't be guaranteed that the 
Copier thread is blocked in the underlying read.


-Alan.


Re: CR 6860309 - solaris timing issue on thread startup

2011-11-12 Thread Gary Adams

On 11/12/11 6:58 AM, Alan Bateman wrote:

On 11/11/2011 16:56, Gary Adams wrote:
 CR 6860309 - TEST_BUG: Insufficient sleep time in 
java/lang/Runtime/exec/StreamsSurviveDestroy.java


A timing problem is reported for slow solaris systems for this
test to start up a process and systematically torture the underlying
threads processing data from the running process.

On my fast solaris machine I can not reproduce the error,
but it is reasonable to assume that on a slower machine there
could be scheduling issues that could delay the thread startup
past the designated 100 millisecond delay in the main thread.

This webrev suggests gating the process destruction until both
worker threads are alive.

  http://cr.openjdk.java.net/~gadams/6860309/



-Xcomp on a slow machine, always fun when testing the untestable.

I agree with David but I don't think there is perfect solution. I 
would suggest using a CountDownLatch or other synchronization so that 
the main thread waits until the Copier thread is just about to do the 
read. Then do a sleep in the main thread before invoking the destroy 
method. I suspect that is the best that you can do as can't be 
guaranteed that the Copier thread is blocked in the underlying read.


-Alan.


I'll give it a try on Monday. Ideally, I could get a reproducible failure
before trying to fix the problem. Unfortunately, the only slow machines
I have on hand are linux/arm, which might have different issues
from the solaris/sparc originally reported.



Re: CR 6860309 - solaris timing issue on thread startup

2011-11-11 Thread David Holmes

Hi Gary,

On 12/11/2011 2:56 AM, Gary Adams wrote:

CR 6860309 - TEST_BUG: Insufficient sleep time in
java/lang/Runtime/exec/StreamsSurviveDestroy.java

A timing problem is reported for slow solaris systems for this
test to start up a process and systematically torture the underlying
threads processing data from the running process.

On my fast solaris machine I can not reproduce the error,
but it is reasonable to assume that on a slower machine there
could be scheduling issues that could delay the thread startup
past the designated 100 millisecond delay in the main thread.

This webrev suggests gating the process destruction until both
worker threads are alive.

http://cr.openjdk.java.net/~gadams/6860309/


This doesn't achieve anything I'm afraid. isAlive() will return true 
once start() has been invoked on the thread (it doesn't depend on the 
new thread getting scheduled to run), hence there will only be a single 
call to sleep - no different from today. If the issue is that c1 and c2 
have not performed some action necessary before the process interaction 
then that action needs to be synchronized with explicitly.


David
-