Re: RFR: 8328665: serviceability/jvmti/vthread/PopFrameTest failed with a timeout [v4]

2024-04-01 Thread Serguei Spitsyn
On Sat, 30 Mar 2024 01:32:27 GMT, Daniel D. Daugherty  
wrote:

>> It wait for some increment of time upto a maximum of 100 seconds.
>
> I'm good with that. Thanks for clarifying.

Thanks, Dan.
I've decided to increase this timeout to 20 secs. Pushed now.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18419#discussion_r1546232711


Re: RFR: 8328665: serviceability/jvmti/vthread/PopFrameTest failed with a timeout [v4]

2024-03-29 Thread Daniel D . Daugherty
On Fri, 29 Mar 2024 23:02:03 GMT, Serguei Spitsyn  wrote:

>> So that would mean that the native side would always wait for 100 seconds?
>> Or will it wait for some increment of time upto a maximum of 100 seconds?
>
> It wait for some increment of time upto a maximum of 100 seconds.

I'm good with that. Thanks for clarifying.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18419#discussion_r1544999877


Re: RFR: 8328665: serviceability/jvmti/vthread/PopFrameTest failed with a timeout [v4]

2024-03-29 Thread Serguei Spitsyn
On Fri, 29 Mar 2024 22:18:24 GMT, Daniel D. Daugherty  
wrote:

>> Thank you for the example and for catching the typo. The timeout factor also 
>> needs to be passed to the native side. I think, this fragment is not worth 
>> this kind of extra complexity. One approach would be to just make it big 
>> enough, eg. make it 100 seconds instead of 10.  Another - to get rid of this 
>> trap at all. What would you prefer?
>
> So that would mean that the native side would always wait for 100 seconds?
> Or will it wait for some increment of time upto a maximum of 100 seconds?

It wait for some increment of time upto a maximum of 100 seconds.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18419#discussion_r1544937616


Re: RFR: 8328665: serviceability/jvmti/vthread/PopFrameTest failed with a timeout [v4]

2024-03-29 Thread Daniel D . Daugherty
On Fri, 29 Mar 2024 20:44:48 GMT, Serguei Spitsyn  wrote:

>> runtime/8176717/TestInheritFD.java has an example of what I'm talking about:
>> 
>> public static float timeoutFactor = 
>> Float.parseFloat(System.getProperty("test.timeout.factor", "1.0"));
>> public static long subProcessTimeout = (long)(15L * timeoutFactor);
>> 
>> so you fetch the test.timeout.factor value and then you scale your delay 
>> value.
>> 
>> Also:
>> 
>> nit typo: s/waitig/waiting/
>
> Thank you for the example and for catching the typo. The timeout factor also 
> needs to be passed to the native side. I think, this fragment is not worth 
> this kind of extra complexity. One approach would be to just make it big 
> enough, eg. make it 100 seconds instead of 10.  Another - to get rid of this 
> trap at all. What would you prefer?

So that would mean that the native side would always wait for 100 seconds?
Or will it wait for some increment of time upto a maximum of 100 seconds?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18419#discussion_r1544915772


Re: RFR: 8328665: serviceability/jvmti/vthread/PopFrameTest failed with a timeout [v4]

2024-03-29 Thread Serguei Spitsyn
On Fri, 29 Mar 2024 13:29:15 GMT, Daniel D. Daugherty  
wrote:

>> Thanks for the comments, Chris and Dan. Updated as Chris suggested. I've 
>> added this with `-Xcomp` consideration as the worst case scenario in mind. 
>> Now, I think it is more save to make it 10 seconds instead of one. Is it 
>> going to be good enough? In fact, I've added this for manual testing to save 
>> time in waiting for test completion when it is deadlocked. Also, this is 
>> better for diagnosability.
>
> runtime/8176717/TestInheritFD.java has an example of what I'm talking about:
> 
> public static float timeoutFactor = 
> Float.parseFloat(System.getProperty("test.timeout.factor", "1.0"));
> public static long subProcessTimeout = (long)(15L * timeoutFactor);
> 
> so you fetch the test.timeout.factor value and then you scale your delay 
> value.
> 
> Also:
> 
> nit typo: s/waitig/waiting/

Thank you for the example and for catching the typo. The timeout factor also 
needs to be passed to the native side. I think, this fragment is not worth this 
kind of extra complexity. One approach would be to just make it big enough, eg. 
make it 100 seconds instead of 10.  Another - to get rid of this trap at all. 
What would you prefer?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18419#discussion_r1544855730


Re: RFR: 8328665: serviceability/jvmti/vthread/PopFrameTest failed with a timeout [v4]

2024-03-29 Thread Daniel D . Daugherty
On Thu, 28 Mar 2024 23:29:53 GMT, Serguei Spitsyn  wrote:

>> Caught this comment in passing. Delays like this should be scaled with
>> defaultTimeoutFactor so that test tasks that invoke tests with options
>> that can slow the test down, e.g., `-Xcomp`, can be accommodated.
>> 
>> I believe the defaultTimeoutFact for `-Xcomp` test tasks gets bumped
>> from 4 to 10.
>
> Thanks for the comments, Chris and Dan. Updated as Chris suggested. I've 
> added this with `-Xcomp` consideration as the worst case scenario in mind. 
> Now, I think it is more save to make it 10 seconds instead of one. Is it 
> going to be good enough? In fact, I've added this for manual testing to save 
> time in waiting for test completion when it is deadlocked. Also, this is 
> better for diagnosability.

runtime/8176717/TestInheritFD.java has an example of what I'm talking about:

public static float timeoutFactor = 
Float.parseFloat(System.getProperty("test.timeout.factor", "1.0"));
public static long subProcessTimeout = (long)(15L * timeoutFactor);

so you fetch the test.timeout.factor value and then you scale your delay value.

Also:

nit typo: s/waitig/waiting/

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18419#discussion_r1544497738


Re: RFR: 8328665: serviceability/jvmti/vthread/PopFrameTest failed with a timeout [v4]

2024-03-28 Thread Serguei Spitsyn
On Fri, 29 Mar 2024 03:50:03 GMT, Chris Plummer  wrote:

>> test/hotspot/jtreg/serviceability/jvmti/vthread/PopFrameTest/libPopFrameTest.cpp
>>  line 161:
>> 
>>> 159:   int attempts = 0;
>>> 160:   while (!bp_sync_reached) {
>>> 161: LOG("Main: ensureAtBreakpoint: waitig 5 millis\n");
>> 
>> Suggestion:
>> 
>> LOG("Main: ensureAtBreakpoint: waiting 10 millis\n");
>> 
>> 
>> I think this should be moved directly above the wait(10) call.
>
> I think you missed this suggestion. There is a typo in the log message, and 
> this LOG should be move to just before the wait() statement below.

Sorry, missed to do this. Fixed now.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18419#discussion_r1544041108


Re: RFR: 8328665: serviceability/jvmti/vthread/PopFrameTest failed with a timeout [v4]

2024-03-28 Thread Chris Plummer
On Wed, 27 Mar 2024 19:59:28 GMT, Chris Plummer  wrote:

>> Serguei Spitsyn has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   review: improve diagnostics and reliability
>
> test/hotspot/jtreg/serviceability/jvmti/vthread/PopFrameTest/libPopFrameTest.cpp
>  line 161:
> 
>> 159:   int attempts = 0;
>> 160:   while (!bp_sync_reached) {
>> 161: LOG("Main: ensureAtBreakpoint: waitig 5 millis\n");
> 
> Suggestion:
> 
> LOG("Main: ensureAtBreakpoint: waiting 10 millis\n");
> 
> 
> I think this should be moved directly above the wait(10) call.

I think you missed this suggestion. There is a typo in the log message, and 
this LOG should be move to just before the wait() statement below.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18419#discussion_r1544028878


Re: RFR: 8328665: serviceability/jvmti/vthread/PopFrameTest failed with a timeout [v4]

2024-03-28 Thread Serguei Spitsyn
On Wed, 27 Mar 2024 20:27:42 GMT, Daniel D. Daugherty  
wrote:

>> test/hotspot/jtreg/serviceability/jvmti/vthread/PopFrameTest/libPopFrameTest.cpp
>>  line 163:
>> 
>>> 161: LOG("Main: ensureAtBreakpoint: waitig 5 millis\n");
>>> 162: if (++attempts > 100) {
>>> 163:   fatal(jni, "Main: ensureAtBreakpoint: waited 1 sec");
>> 
>> 1 second isn't very long when you are talking about something that is 
>> relying on debugger/debuggee communication. Maybe wait 100ms at a time for a 
>> total of 10 seconds.
>
> Caught this comment in passing. Delays like this should be scaled with
> defaultTimeoutFactor so that test tasks that invoke tests with options
> that can slow the test down, e.g., `-Xcomp`, can be accommodated.
> 
> I believe the defaultTimeoutFact for `-Xcomp` test tasks gets bumped
> from 4 to 10.

Thanks for the comments, Chris and Dan. Updated as Chris suggested. I've added 
this with `-Xcomp` consideration as the worst case scenario in mind. Now, I 
think it is more save to make it 10 seconds instead of one. Is it going to be 
good enough? In fact, I've added this for manual testing to save time in 
waiting for test completion when it is deadlocked.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18419#discussion_r1543839536


Re: RFR: 8328665: serviceability/jvmti/vthread/PopFrameTest failed with a timeout [v4]

2024-03-28 Thread Serguei Spitsyn
On Wed, 27 Mar 2024 20:06:54 GMT, Chris Plummer  wrote:

>> Serguei Spitsyn has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   review: improve diagnostics and reliability
>
> test/hotspot/jtreg/serviceability/jvmti/vthread/PopFrameTest/PopFrameTest.java
>  line 148:
> 
>> 146: log("Main #B.2: got expected JVMTI_ERROR_NONE");
>> 147: resumeThread(testTaskThread);
>> 148: 
> 
> You probably don't need this minor edit or the copyright update any more.
> 
> However, it's still unclear to me how the ensureAtBreakpoint() below is 
> suppose to work if we already called notifyAtBreakpoint() between the 1st and 
> 2nd ensureAtBreakpoint(). From what I can tell, notifyAtBreakpoint() clears 
> the flag that ensureAtBreakpoint() checks for, and there is no 2nd breakpoint 
> to set it again. I would expect the ensureAtBreakpoint() below to always wait 
> indefinitely, but that doesn't appear to be the case. So how is this working? 
> I must be missing something.

Thanks, Chris. I noticed that some an additional cleanup is needed in the Java 
part.
The `TestTask.run()` has two calls to the method `B()`:

public void run() {
log("TestTask.run: started");

A();
sleep(1); // to cause yield

prepareAgent(TestTask.class, false); // No doPopFrame
B();
sleep(1); // to cause yield

prepareAgent(TestTask.class, true); // doPopFrame
B();
sleep(1); // to cause yield

C();
finished = true;
}

The second `B()` call does the JVMTI `PopFrame` call on its own thread in the 
`Breakpoint()` event hadler.
Please, note, `true` in the `prepareAgent` parameters:
   `prepareAgent(TestTask.class, true); // doPopFrame`
So, the second `ensureAtBreakpoint()` is a sync point with it.

> test/hotspot/jtreg/serviceability/jvmti/vthread/PopFrameTest/libPopFrameTest.cpp
>  line 58:
> 
>> 56:   LOG("Breakpoint: In method TestTask.B(): after sync section\n");
>> 57: 
>> 58:   if (do_pop_frame != JNI_FALSE) {
> 
> Suggestion:
> 
>   if (do_pop_frame) {

Okay, thanks. I can do this change.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18419#discussion_r1543825742
PR Review Comment: https://git.openjdk.org/jdk/pull/18419#discussion_r1543826541


Re: RFR: 8328665: serviceability/jvmti/vthread/PopFrameTest failed with a timeout [v4]

2024-03-27 Thread Daniel D . Daugherty
On Wed, 27 Mar 2024 20:08:19 GMT, Chris Plummer  wrote:

>> Serguei Spitsyn has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   review: improve diagnostics and reliability
>
> test/hotspot/jtreg/serviceability/jvmti/vthread/PopFrameTest/libPopFrameTest.cpp
>  line 163:
> 
>> 161: LOG("Main: ensureAtBreakpoint: waitig 5 millis\n");
>> 162: if (++attempts > 100) {
>> 163:   fatal(jni, "Main: ensureAtBreakpoint: waited 1 sec");
> 
> 1 second isn't very long when you are talking about something that is relying 
> on debugger/debuggee communication. Maybe wait 100ms at a time for a total of 
> 10 seconds.

Caught this comment in passing. Delays like this should be scaled with
defaultTimeoutFactor so that test tasks that invoke tests with options
that can slow the test down, e.g., `-Xcomp`, can be accommodated.

I believe the defaultTimeoutFact for `-Xcomp` test tasks gets bumped
from 4 to 10.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18419#discussion_r1541986223


Re: RFR: 8328665: serviceability/jvmti/vthread/PopFrameTest failed with a timeout [v4]

2024-03-27 Thread Chris Plummer
On Wed, 27 Mar 2024 06:44:37 GMT, Serguei Spitsyn  wrote:

>> This PR fixes a synchronization issue in the test:
>>   `test/hotspot/jtreg/serviceability/jvmti/vthread/PopFrameTest`
>>   
>> The method `notifyAtBreakpoint()` can notify the `TestTask` thread when it 
>> has not reached an expected breakpoint yet.
>> The fix is to add a call to the method `ensureAtBreakpoint()` one more time 
>> in the `B2` sub-test. It is needed after the top-most frame was popped with 
>> the JVMTI `PopFrame`, and the target thread needs to reach the breakpoint 
>> again after its execution was resumed.
>> 
>> The time is very intermittent. At least, I was not able to reproduce the 
>> timeout failure in thousands of mach5 runs with the `-Xcomp` option.
>> 
>> Testing:
>>  - Run the test 
>> `test/hotspot/jtreg/serviceability/jvmti/vthread/PopFrameTest` thousands 
>> times in mach5
>
> Serguei Spitsyn has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   review: improve diagnostics and reliability

test/hotspot/jtreg/serviceability/jvmti/vthread/PopFrameTest/PopFrameTest.java 
line 148:

> 146: log("Main #B.2: got expected JVMTI_ERROR_NONE");
> 147: resumeThread(testTaskThread);
> 148: 

You probably don't need this minor edit or the copyright update any more.

However, it's still unclear to me how the ensureAtBreakpoint() below is suppose 
to work if we already called notifyAtBreakpoint() between the 1st and 2nd 
ensureAtBreakpoint(). From what I can tell, notifyAtBreakpoint() clears the 
flag that ensureAtBreakpoint() checks for, and there is no 2nd breakpoint to 
set it again. I would expect the ensureAtBreakpoint() below to always wait 
indefinitely, but that doesn't appear to be the case. So how is this working? I 
must be missing something.

test/hotspot/jtreg/serviceability/jvmti/vthread/PopFrameTest/libPopFrameTest.cpp
 line 58:

> 56:   LOG("Breakpoint: In method TestTask.B(): after sync section\n");
> 57: 
> 58:   if (do_pop_frame != JNI_FALSE) {

Suggestion:

  if (do_pop_frame) {

test/hotspot/jtreg/serviceability/jvmti/vthread/PopFrameTest/libPopFrameTest.cpp
 line 161:

> 159:   int attempts = 0;
> 160:   while (!bp_sync_reached) {
> 161: LOG("Main: ensureAtBreakpoint: waitig 5 millis\n");

Suggestion:

LOG("Main: ensureAtBreakpoint: waiting 10 millis\n");


I think this should be moved directly above the wait(10) call.

test/hotspot/jtreg/serviceability/jvmti/vthread/PopFrameTest/libPopFrameTest.cpp
 line 163:

> 161: LOG("Main: ensureAtBreakpoint: waitig 5 millis\n");
> 162: if (++attempts > 100) {
> 163:   fatal(jni, "Main: ensureAtBreakpoint: waited 1 sec");

1 second isn't very long when you are talking about something that is relying 
on debugger/debuggee communication. Maybe wait 100ms at a time for a total of 
10 seconds.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18419#discussion_r1541957419
PR Review Comment: https://git.openjdk.org/jdk/pull/18419#discussion_r1541958086
PR Review Comment: https://git.openjdk.org/jdk/pull/18419#discussion_r1541947950
PR Review Comment: https://git.openjdk.org/jdk/pull/18419#discussion_r1541960905


Re: RFR: 8328665: serviceability/jvmti/vthread/PopFrameTest failed with a timeout [v4]

2024-03-26 Thread Serguei Spitsyn
> This PR fixes a synchronization issue in the test:
>   `test/hotspot/jtreg/serviceability/jvmti/vthread/PopFrameTest`
>   
> The method `notifyAtBreakpoint()` can notify the `TestTask` thread when it 
> has not reached an expected breakpoint yet.
> The fix is to add a call to the method `ensureAtBreakpoint()` one more time 
> in the `B2` sub-test. It is needed after the top-most frame was popped with 
> the JVMTI `PopFrame`, and the target thread needs to reach the breakpoint 
> again after its execution was resumed.
> 
> The time is very intermittent. At least, I was not able to reproduce the 
> timeout failure in thousands of mach5 runs with the `-Xcomp` option.
> 
> Testing:
>  - Run the test 
> `test/hotspot/jtreg/serviceability/jvmti/vthread/PopFrameTest` thousands 
> times in mach5

Serguei Spitsyn has updated the pull request incrementally with one additional 
commit since the last revision:

  review: improve diagnostics and reliability

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/18419/files
  - new: https://git.openjdk.org/jdk/pull/18419/files/1502492f..f4f92e92

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18419&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18419&range=02-03

  Stats: 19 lines in 2 files changed: 9 ins; 4 del; 6 mod
  Patch: https://git.openjdk.org/jdk/pull/18419.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18419/head:pull/18419

PR: https://git.openjdk.org/jdk/pull/18419