Re: RFR: 8081652: java/lang/management/ThreadMXBean/ThreadMXBeanStateTest.java timed out intermittently

2020-06-03 Thread Daniil Titov
Hi Chris,

 

> Do you think 60 seconds is a bit long? Isn't the expectation that the join 
> should happen almost immediately or not at all?


In case if an exception is thrown in the try block after the thread is started 
and before it is moved in the terminated state the join never happens at all. 
And in other cases the join should happen immediately. I  agree that 60 seconds 
look as a bit long but I just wanted to minimize the odds that that we miss the 
log and the root cause  if the issue is reproduced on some slow machine with 
such VM options as, say,  -Xcomp.

 

>I don't think you need a separate bug, but you should document in the bug what 
>currently can and can't be reproduce and what is being fixed.


I will update the bug with this information.

 

Best regards,

Daniil

 

 

From: Chris Plummer 
Date: Wednesday, June 3, 2020 at 12:46 PM
To: Daniil Titov , David Holmes 
, serviceability-dev 

Subject: Re: RFR: 8081652: 
java/lang/management/ThreadMXBean/ThreadMXBeanStateTest.java timed out 
intermittently

 

Hi Daniil,

Ok, I misread getLog() and see that is now always returns the log. Do you think 
60 seconds is a bit long? Isn't the expectation that the join should happen 
almost immediately or not at all?

I don't think you need a separate bug, but you should document in the bug what 
currently can and can't be reproduce and what is being fixed.

thanks,

Chris

On 6/3/20 12:08 PM, Daniil Titov wrote:

Hi Chris,

 

I was not able to reproduce the original issue anymore in Mach5. However, the 
test itself has a potential for a deadlock (that was also reported) and in the 
proposed change we fix it.  The log still should be printed and the expectation 
is that we will be able to see the underlying problem in it if it ever 
reproduced.

 

I could create a separate bug ( not sure if the subtask is a good fit here 
since the change fixes some problem in the test ) and close the current one as 
not reproducible if you think it is a better approach.

 

Regarding Thread.suspend() and Thread.resume() methods the test also checks the 
thread state after these methods are invoked  and since these deprecated 
methods are still  in API I don’t think we should exclude them from being 
tested.

 

Best regards,

Daniil

 

From: Chris Plummer 
Date: Wednesday, June 3, 2020 at 11:40 AM
To: David Holmes , Daniil Titov 
, serviceability-dev 

Subject: Re: RFR: 8081652: 
java/lang/management/ThreadMXBean/ThreadMXBeanStateTest.java timed out 
intermittently

 

On 6/1/20 12:10 AM, David Holmes wrote:

Hi Daniil, 

On 30/05/2020 10:07 am, Daniil Titov wrote: 



Please review a change [1] that  fixes an intermittent test timeout. 

The main logic of the test has this basic structure: 

try { 
   // lots of thread state manipulation of target 
} 
finally { 
   thread.getLog(); 
} 

and as David noticed in his comment  ( the last comment in [2] )  if an 
exception occurs anywhere 
in the try block we can hang waiting for the join() in getLog() because we 
haven't executed the logic that 
tells the thread to terminate. 


So the fix puts a timeout on the join() which means the test will no longer 
timeout but it will still fail when whatever was leading to the timeout now 
happens. So as a diagnostic fix this seems fine. Hopefully the logger will show 
what we need to see and determine the real underlying problem. 

If this change is really just diagnostic in nature, then it should be a 
subtask. However, it seems to me it will actually hide the failure. The test 
won't get a timeout and won't print the log. Am I missing something?

Also, after reading through the bug comments it looks like the getLog()/join() 
timeout issue is different from the main issue that caused the CR to be filed 
in the first place. Comments regarding the initial problem are:

"According to the stack trace the test seems to hang on trying to load the 
'java.lang.Math' class concurrently. "

"Need to see some native stacks to understand why the classloading thread is 
not proceeding even though RUNNABLE."

"I should have looked at the test first - it uses Thread.suspend and 
Thread.resume and so is inherently deadlock prone."

 

Does this issue no longer exist, or have we decided that since the test is 
expected to be deadlock prone to just ignore it.

thanks,

Chris




Thanks, 
David 
- 




Testing:  Running a modified test that explicitly throws a runtime exception 
inside the try block shows the fix solves the problem. 
  Mach5 tier1-tier3 tests passed. Mach5 tier4-tier5 tests are 
in progress. 

[1] http://cr.openjdk.java.net/~dtitov/8081652/webrev.01/ 
[2] https://bugs.openjdk.java.net/browse/JDK-8081652 

Thank you, 
Daniil 













Re: RFR: 8081652: java/lang/management/ThreadMXBean/ThreadMXBeanStateTest.java timed out intermittently

2020-06-03 Thread Chris Plummer

  
  
Hi Daniil,
  
  Ok, I misread getLog() and see that is now always returns the log.
  Do you think 60 seconds is a bit long? Isn't the expectation that
  the join should happen almost immediately or not at all?
  
  I don't think you need a separate bug, but you should document in
  the bug what currently can and can't be reproduce and what is
  being fixed.
  
  thanks,
  
  Chris
  
  On 6/3/20 12:08 PM, Daniil Titov wrote:


  
  
  
  
Hi Chris,
 
I was not able to reproduce the original
  issue anymore in Mach5. However, the test itself has a
  potential for a deadlock (that was also reported) and in the
  proposed change we fix it.  The log still should be printed
  and the expectation is that we will be able to see the
  underlying problem in it if it ever reproduced.
 
I could create a separate bug ( not sure if
  the subtask is a good fit here since the change fixes some
  problem in the test ) and close the current one as not
  reproducible if you think it is a better approach.
 
Regarding Thread.suspend() and
  Thread.resume() methods the test also checks the thread state
  after these methods are invoked  and since these deprecated
  methods are still  in API I don’t think we should exclude them
  from being tested.
 
Best regards,
Daniil
 

  From: Chris Plummer
  
  Date: Wednesday, June 3, 2020 at 11:40 AM
  To: David Holmes ,
  Daniil Titov ,
  serviceability-dev
  
  Subject: Re: RFR: 8081652:
  java/lang/management/ThreadMXBean/ThreadMXBeanStateTest.java
  timed out intermittently


   


  On 6/1/20 12:10 AM, David Holmes wrote:


  Hi Daniil, 

On 30/05/2020 10:07 am, Daniil Titov wrote: 


  
Please review a change [1] that  fixes
  an intermittent test timeout. 
  
  The main logic of the test has this basic structure: 
  
  try { 
     // lots of thread state manipulation of target 
  } 
  finally { 
     thread.getLog(); 
  } 
  
  and as David noticed in his comment  ( the last comment in
  [2] )  if an exception occurs anywhere 
  in the try block we can hang waiting for the join() in
  getLog() because we haven't executed the logic that 
  tells the thread to terminate. 
  
  
So the fix puts a timeout on the join() which means the test
will no longer timeout but it will still fail when whatever
was leading to the timeout now happens. So as a diagnostic
fix this seems fine. Hopefully the logger will show what we
need to see and determine the real underlying problem. 

If this change is really just diagnostic in
  nature, then it should be a subtask. However, it seems to me
  it will actually hide the failure. The test won't get a
  timeout and won't print the log. Am I missing something?
  
  Also, after reading through the bug comments it looks like the
  getLog()/join() timeout issue is different from the main issue
  that caused the CR to be filed in the first place. Comments
  regarding the initial problem are:
  
  "According to the stack trace the test seems to hang on trying
  to load the 'java.lang.Math' class concurrently. "
  
  "Need to see some native stacks to understand why the
  classloading thread is not proceeding even though RUNNABLE."
  
  "I should have looked at the test first - it uses
  Thread.suspend and Thread.resume and so is inherently deadlock
  prone."

   

Does this issue no longer exist, or have we
  decided that since the test is expected to be deadlock prone
  to just ignore it.
  
  thanks,
  
  Chris
  
  

  
Thanks, 
David 
- 



  
Testing: 
  Running a modified test that explicitly throws a runtime
  exception inside the try block shows the fix solves the
  problem. 
    Mach5 tier1-tier3 tests passed. Mach5
  tier4-tier5 tests are in progress. 
  
  [1] http

Re: RFR: 8081652: java/lang/management/ThreadMXBean/ThreadMXBeanStateTest.java timed out intermittently

2020-06-03 Thread Daniil Titov
Hi Chris,

 

I was not able to reproduce the original issue anymore in Mach5. However, the 
test itself has a potential for a deadlock (that was also reported) and in the 
proposed change we fix it.  The log still should be printed and the expectation 
is that we will be able to see the underlying problem in it if it ever 
reproduced.

 

I could create a separate bug ( not sure if the subtask is a good fit here 
since the change fixes some problem in the test ) and close the current one as 
not reproducible if you think it is a better approach.

 

Regarding Thread.suspend() and Thread.resume() methods the test also checks the 
thread state after these methods are invoked  and since these deprecated 
methods are still  in API I don’t think we should exclude them from being 
tested.

 

Best regards,

Daniil

 

From: Chris Plummer 
Date: Wednesday, June 3, 2020 at 11:40 AM
To: David Holmes , Daniil Titov 
, serviceability-dev 

Subject: Re: RFR: 8081652: 
java/lang/management/ThreadMXBean/ThreadMXBeanStateTest.java timed out 
intermittently

 

On 6/1/20 12:10 AM, David Holmes wrote:

Hi Daniil, 

On 30/05/2020 10:07 am, Daniil Titov wrote: 


Please review a change [1] that  fixes an intermittent test timeout. 

The main logic of the test has this basic structure: 

try { 
   // lots of thread state manipulation of target 
} 
finally { 
   thread.getLog(); 
} 

and as David noticed in his comment  ( the last comment in [2] )  if an 
exception occurs anywhere 
in the try block we can hang waiting for the join() in getLog() because we 
haven't executed the logic that 
tells the thread to terminate. 


So the fix puts a timeout on the join() which means the test will no longer 
timeout but it will still fail when whatever was leading to the timeout now 
happens. So as a diagnostic fix this seems fine. Hopefully the logger will show 
what we need to see and determine the real underlying problem. 

If this change is really just diagnostic in nature, then it should be a 
subtask. However, it seems to me it will actually hide the failure. The test 
won't get a timeout and won't print the log. Am I missing something?

Also, after reading through the bug comments it looks like the getLog()/join() 
timeout issue is different from the main issue that caused the CR to be filed 
in the first place. Comments regarding the initial problem are:

"According to the stack trace the test seems to hang on trying to load the 
'java.lang.Math' class concurrently. "

"Need to see some native stacks to understand why the classloading thread is 
not proceeding even though RUNNABLE."

"I should have looked at the test first - it uses Thread.suspend and 
Thread.resume and so is inherently deadlock prone."

 

Does this issue no longer exist, or have we decided that since the test is 
expected to be deadlock prone to just ignore it.

thanks,

Chris



Thanks, 
David 
- 



Testing:  Running a modified test that explicitly throws a runtime exception 
inside the try block shows the fix solves the problem. 
  Mach5 tier1-tier3 tests passed. Mach5 tier4-tier5 tests are 
in progress. 

[1] http://cr.openjdk.java.net/~dtitov/8081652/webrev.01/ 
[2] https://bugs.openjdk.java.net/browse/JDK-8081652 

Thank you, 
Daniil 








Re: RFR: 8081652: java/lang/management/ThreadMXBean/ThreadMXBeanStateTest.java timed out intermittently

2020-06-03 Thread Chris Plummer

  
  
On 6/1/20 12:10 AM, David Holmes wrote:

Hi
  Daniil,
  
  
  On 30/05/2020 10:07 am, Daniil Titov wrote:
  
  Please review a change [1] that  fixes an
intermittent test timeout.


The main logic of the test has this basic structure:


try {

   // lots of thread state manipulation of target

}

finally {

   thread.getLog();

}


and as David noticed in his comment  ( the last comment in [2]
)  if an exception occurs anywhere

in the try block we can hang waiting for the join() in getLog()
because we haven't executed the logic that

tells the thread to terminate.

  
  
  So the fix puts a timeout on the join() which means the test will
  no longer timeout but it will still fail when whatever was leading
  to the timeout now happens. So as a diagnostic fix this seems
  fine. Hopefully the logger will show what we need to see and
  determine the real underlying problem.
  

If this change is really just diagnostic in nature, then it should
be a subtask. However, it seems to me it will actually hide the
failure. The test won't get a timeout and won't print the log. Am I
missing something?

Also, after reading through the bug comments it looks like the
getLog()/join() timeout issue is different from the main issue that
caused the CR to be filed in the first place. Comments regarding the
initial problem are:

"According to the stack trace the test seems to hang on trying to
load the 'java.lang.Math' class concurrently. "

"Need to see some native stacks to understand why the classloading
thread is not proceeding even though RUNNABLE."

"I should have looked at the test first - it uses Thread.suspend and
Thread.resume and so is inherently deadlock prone."


Does this issue no longer exist, or have we decided that since the
test is expected to be deadlock prone to just ignore it.

thanks,

Chris

  
  Thanks,
  
  David
  
  -
  
  
  Testing:  Running a modified test that
explicitly throws a runtime exception inside the try block shows
the fix solves the problem.

  Mach5 tier1-tier3 tests passed. Mach5
tier4-tier5 tests are in progress.


[1] http://cr.openjdk.java.net/~dtitov/8081652/webrev.01/

[2] https://bugs.openjdk.java.net/browse/JDK-8081652


Thank you,

Daniil




  


  




Re: RFR: 8081652: java/lang/management/ThreadMXBean/ThreadMXBeanStateTest.java timed out intermittently

2020-06-01 Thread David Holmes

Hi Daniil,

On 30/05/2020 10:07 am, Daniil Titov wrote:

Please review a change [1] that  fixes an intermittent test timeout.

The main logic of the test has this basic structure:

try {
   // lots of thread state manipulation of target
}
finally {
   thread.getLog();
}

and as David noticed in his comment  ( the last comment in [2] )  if an 
exception occurs anywhere
in the try block we can hang waiting for the join() in getLog() because we 
haven't executed the logic that
tells the thread to terminate.


So the fix puts a timeout on the join() which means the test will no 
longer timeout but it will still fail when whatever was leading to the 
timeout now happens. So as a diagnostic fix this seems fine. Hopefully 
the logger will show what we need to see and determine the real 
underlying problem.


Thanks,
David
-


Testing:  Running a modified test that explicitly throws a runtime exception 
inside the try block shows the fix solves the problem.
  Mach5 tier1-tier3 tests passed. Mach5 tier4-tier5 tests are 
in progress.

[1] http://cr.openjdk.java.net/~dtitov/8081652/webrev.01/
[2] https://bugs.openjdk.java.net/browse/JDK-8081652

Thank you,
Daniil