Re: RFR: 8328665: serviceability/jvmti/vthread/PopFrameTest failed with a timeout [v4]
On Sat, 30 Mar 2024 01:32:27 GMT, Daniel D. Daugherty wrote: >> It wait for some increment of time upto a maximum of 100 seconds. > > I'm good with that. Thanks for clarifying. Thanks, Dan. I've decided to increase this timeout to 20 secs. Pushed now. - PR Review Comment: https://git.openjdk.org/jdk/pull/18419#discussion_r1546232711
Re: RFR: 8328665: serviceability/jvmti/vthread/PopFrameTest failed with a timeout [v4]
On Fri, 29 Mar 2024 23:02:03 GMT, Serguei Spitsyn wrote: >> So that would mean that the native side would always wait for 100 seconds? >> Or will it wait for some increment of time upto a maximum of 100 seconds? > > It wait for some increment of time upto a maximum of 100 seconds. I'm good with that. Thanks for clarifying. - PR Review Comment: https://git.openjdk.org/jdk/pull/18419#discussion_r1544999877
Re: RFR: 8328665: serviceability/jvmti/vthread/PopFrameTest failed with a timeout [v4]
On Fri, 29 Mar 2024 22:18:24 GMT, Daniel D. Daugherty wrote: >> Thank you for the example and for catching the typo. The timeout factor also >> needs to be passed to the native side. I think, this fragment is not worth >> this kind of extra complexity. One approach would be to just make it big >> enough, eg. make it 100 seconds instead of 10. Another - to get rid of this >> trap at all. What would you prefer? > > So that would mean that the native side would always wait for 100 seconds? > Or will it wait for some increment of time upto a maximum of 100 seconds? It wait for some increment of time upto a maximum of 100 seconds. - PR Review Comment: https://git.openjdk.org/jdk/pull/18419#discussion_r1544937616
Re: RFR: 8328665: serviceability/jvmti/vthread/PopFrameTest failed with a timeout [v4]
On Fri, 29 Mar 2024 20:44:48 GMT, Serguei Spitsyn wrote: >> runtime/8176717/TestInheritFD.java has an example of what I'm talking about: >> >> public static float timeoutFactor = >> Float.parseFloat(System.getProperty("test.timeout.factor", "1.0")); >> public static long subProcessTimeout = (long)(15L * timeoutFactor); >> >> so you fetch the test.timeout.factor value and then you scale your delay >> value. >> >> Also: >> >> nit typo: s/waitig/waiting/ > > Thank you for the example and for catching the typo. The timeout factor also > needs to be passed to the native side. I think, this fragment is not worth > this kind of extra complexity. One approach would be to just make it big > enough, eg. make it 100 seconds instead of 10. Another - to get rid of this > trap at all. What would you prefer? So that would mean that the native side would always wait for 100 seconds? Or will it wait for some increment of time upto a maximum of 100 seconds? - PR Review Comment: https://git.openjdk.org/jdk/pull/18419#discussion_r1544915772
Re: RFR: 8328665: serviceability/jvmti/vthread/PopFrameTest failed with a timeout [v4]
On Fri, 29 Mar 2024 13:29:15 GMT, Daniel D. Daugherty wrote: >> Thanks for the comments, Chris and Dan. Updated as Chris suggested. I've >> added this with `-Xcomp` consideration as the worst case scenario in mind. >> Now, I think it is more save to make it 10 seconds instead of one. Is it >> going to be good enough? In fact, I've added this for manual testing to save >> time in waiting for test completion when it is deadlocked. Also, this is >> better for diagnosability. > > runtime/8176717/TestInheritFD.java has an example of what I'm talking about: > > public static float timeoutFactor = > Float.parseFloat(System.getProperty("test.timeout.factor", "1.0")); > public static long subProcessTimeout = (long)(15L * timeoutFactor); > > so you fetch the test.timeout.factor value and then you scale your delay > value. > > Also: > > nit typo: s/waitig/waiting/ Thank you for the example and for catching the typo. The timeout factor also needs to be passed to the native side. I think, this fragment is not worth this kind of extra complexity. One approach would be to just make it big enough, eg. make it 100 seconds instead of 10. Another - to get rid of this trap at all. What would you prefer? - PR Review Comment: https://git.openjdk.org/jdk/pull/18419#discussion_r1544855730
Re: RFR: 8328665: serviceability/jvmti/vthread/PopFrameTest failed with a timeout [v4]
On Thu, 28 Mar 2024 23:29:53 GMT, Serguei Spitsyn wrote: >> Caught this comment in passing. Delays like this should be scaled with >> defaultTimeoutFactor so that test tasks that invoke tests with options >> that can slow the test down, e.g., `-Xcomp`, can be accommodated. >> >> I believe the defaultTimeoutFact for `-Xcomp` test tasks gets bumped >> from 4 to 10. > > Thanks for the comments, Chris and Dan. Updated as Chris suggested. I've > added this with `-Xcomp` consideration as the worst case scenario in mind. > Now, I think it is more save to make it 10 seconds instead of one. Is it > going to be good enough? In fact, I've added this for manual testing to save > time in waiting for test completion when it is deadlocked. Also, this is > better for diagnosability. runtime/8176717/TestInheritFD.java has an example of what I'm talking about: public static float timeoutFactor = Float.parseFloat(System.getProperty("test.timeout.factor", "1.0")); public static long subProcessTimeout = (long)(15L * timeoutFactor); so you fetch the test.timeout.factor value and then you scale your delay value. Also: nit typo: s/waitig/waiting/ - PR Review Comment: https://git.openjdk.org/jdk/pull/18419#discussion_r1544497738
Re: RFR: 8328665: serviceability/jvmti/vthread/PopFrameTest failed with a timeout [v4]
On Fri, 29 Mar 2024 03:50:03 GMT, Chris Plummer wrote: >> test/hotspot/jtreg/serviceability/jvmti/vthread/PopFrameTest/libPopFrameTest.cpp >> line 161: >> >>> 159: int attempts = 0; >>> 160: while (!bp_sync_reached) { >>> 161: LOG("Main: ensureAtBreakpoint: waitig 5 millis\n"); >> >> Suggestion: >> >> LOG("Main: ensureAtBreakpoint: waiting 10 millis\n"); >> >> >> I think this should be moved directly above the wait(10) call. > > I think you missed this suggestion. There is a typo in the log message, and > this LOG should be move to just before the wait() statement below. Sorry, missed to do this. Fixed now. - PR Review Comment: https://git.openjdk.org/jdk/pull/18419#discussion_r1544041108
Re: RFR: 8328665: serviceability/jvmti/vthread/PopFrameTest failed with a timeout [v4]
On Wed, 27 Mar 2024 19:59:28 GMT, Chris Plummer wrote: >> Serguei Spitsyn has updated the pull request incrementally with one >> additional commit since the last revision: >> >> review: improve diagnostics and reliability > > test/hotspot/jtreg/serviceability/jvmti/vthread/PopFrameTest/libPopFrameTest.cpp > line 161: > >> 159: int attempts = 0; >> 160: while (!bp_sync_reached) { >> 161: LOG("Main: ensureAtBreakpoint: waitig 5 millis\n"); > > Suggestion: > > LOG("Main: ensureAtBreakpoint: waiting 10 millis\n"); > > > I think this should be moved directly above the wait(10) call. I think you missed this suggestion. There is a typo in the log message, and this LOG should be move to just before the wait() statement below. - PR Review Comment: https://git.openjdk.org/jdk/pull/18419#discussion_r1544028878
Re: RFR: 8328665: serviceability/jvmti/vthread/PopFrameTest failed with a timeout [v4]
On Wed, 27 Mar 2024 20:27:42 GMT, Daniel D. Daugherty wrote: >> test/hotspot/jtreg/serviceability/jvmti/vthread/PopFrameTest/libPopFrameTest.cpp >> line 163: >> >>> 161: LOG("Main: ensureAtBreakpoint: waitig 5 millis\n"); >>> 162: if (++attempts > 100) { >>> 163: fatal(jni, "Main: ensureAtBreakpoint: waited 1 sec"); >> >> 1 second isn't very long when you are talking about something that is >> relying on debugger/debuggee communication. Maybe wait 100ms at a time for a >> total of 10 seconds. > > Caught this comment in passing. Delays like this should be scaled with > defaultTimeoutFactor so that test tasks that invoke tests with options > that can slow the test down, e.g., `-Xcomp`, can be accommodated. > > I believe the defaultTimeoutFact for `-Xcomp` test tasks gets bumped > from 4 to 10. Thanks for the comments, Chris and Dan. Updated as Chris suggested. I've added this with `-Xcomp` consideration as the worst case scenario in mind. Now, I think it is more save to make it 10 seconds instead of one. Is it going to be good enough? In fact, I've added this for manual testing to save time in waiting for test completion when it is deadlocked. - PR Review Comment: https://git.openjdk.org/jdk/pull/18419#discussion_r1543839536
Re: RFR: 8328665: serviceability/jvmti/vthread/PopFrameTest failed with a timeout [v4]
On Wed, 27 Mar 2024 20:06:54 GMT, Chris Plummer wrote: >> Serguei Spitsyn has updated the pull request incrementally with one >> additional commit since the last revision: >> >> review: improve diagnostics and reliability > > test/hotspot/jtreg/serviceability/jvmti/vthread/PopFrameTest/PopFrameTest.java > line 148: > >> 146: log("Main #B.2: got expected JVMTI_ERROR_NONE"); >> 147: resumeThread(testTaskThread); >> 148: > > You probably don't need this minor edit or the copyright update any more. > > However, it's still unclear to me how the ensureAtBreakpoint() below is > suppose to work if we already called notifyAtBreakpoint() between the 1st and > 2nd ensureAtBreakpoint(). From what I can tell, notifyAtBreakpoint() clears > the flag that ensureAtBreakpoint() checks for, and there is no 2nd breakpoint > to set it again. I would expect the ensureAtBreakpoint() below to always wait > indefinitely, but that doesn't appear to be the case. So how is this working? > I must be missing something. Thanks, Chris. I noticed that some an additional cleanup is needed in the Java part. The `TestTask.run()` has two calls to the method `B()`: public void run() { log("TestTask.run: started"); A(); sleep(1); // to cause yield prepareAgent(TestTask.class, false); // No doPopFrame B(); sleep(1); // to cause yield prepareAgent(TestTask.class, true); // doPopFrame B(); sleep(1); // to cause yield C(); finished = true; } The second `B()` call does the JVMTI `PopFrame` call on its own thread in the `Breakpoint()` event hadler. Please, note, `true` in the `prepareAgent` parameters: `prepareAgent(TestTask.class, true); // doPopFrame` So, the second `ensureAtBreakpoint()` is a sync point with it. > test/hotspot/jtreg/serviceability/jvmti/vthread/PopFrameTest/libPopFrameTest.cpp > line 58: > >> 56: LOG("Breakpoint: In method TestTask.B(): after sync section\n"); >> 57: >> 58: if (do_pop_frame != JNI_FALSE) { > > Suggestion: > > if (do_pop_frame) { Okay, thanks. I can do this change. - PR Review Comment: https://git.openjdk.org/jdk/pull/18419#discussion_r1543825742 PR Review Comment: https://git.openjdk.org/jdk/pull/18419#discussion_r1543826541
Re: RFR: 8328665: serviceability/jvmti/vthread/PopFrameTest failed with a timeout [v4]
On Wed, 27 Mar 2024 20:08:19 GMT, Chris Plummer wrote: >> Serguei Spitsyn has updated the pull request incrementally with one >> additional commit since the last revision: >> >> review: improve diagnostics and reliability > > test/hotspot/jtreg/serviceability/jvmti/vthread/PopFrameTest/libPopFrameTest.cpp > line 163: > >> 161: LOG("Main: ensureAtBreakpoint: waitig 5 millis\n"); >> 162: if (++attempts > 100) { >> 163: fatal(jni, "Main: ensureAtBreakpoint: waited 1 sec"); > > 1 second isn't very long when you are talking about something that is relying > on debugger/debuggee communication. Maybe wait 100ms at a time for a total of > 10 seconds. Caught this comment in passing. Delays like this should be scaled with defaultTimeoutFactor so that test tasks that invoke tests with options that can slow the test down, e.g., `-Xcomp`, can be accommodated. I believe the defaultTimeoutFact for `-Xcomp` test tasks gets bumped from 4 to 10. - PR Review Comment: https://git.openjdk.org/jdk/pull/18419#discussion_r1541986223
Re: RFR: 8328665: serviceability/jvmti/vthread/PopFrameTest failed with a timeout [v4]
On Wed, 27 Mar 2024 06:44:37 GMT, Serguei Spitsyn wrote: >> This PR fixes a synchronization issue in the test: >> `test/hotspot/jtreg/serviceability/jvmti/vthread/PopFrameTest` >> >> The method `notifyAtBreakpoint()` can notify the `TestTask` thread when it >> has not reached an expected breakpoint yet. >> The fix is to add a call to the method `ensureAtBreakpoint()` one more time >> in the `B2` sub-test. It is needed after the top-most frame was popped with >> the JVMTI `PopFrame`, and the target thread needs to reach the breakpoint >> again after its execution was resumed. >> >> The time is very intermittent. At least, I was not able to reproduce the >> timeout failure in thousands of mach5 runs with the `-Xcomp` option. >> >> Testing: >> - Run the test >> `test/hotspot/jtreg/serviceability/jvmti/vthread/PopFrameTest` thousands >> times in mach5 > > Serguei Spitsyn has updated the pull request incrementally with one > additional commit since the last revision: > > review: improve diagnostics and reliability test/hotspot/jtreg/serviceability/jvmti/vthread/PopFrameTest/PopFrameTest.java line 148: > 146: log("Main #B.2: got expected JVMTI_ERROR_NONE"); > 147: resumeThread(testTaskThread); > 148: You probably don't need this minor edit or the copyright update any more. However, it's still unclear to me how the ensureAtBreakpoint() below is suppose to work if we already called notifyAtBreakpoint() between the 1st and 2nd ensureAtBreakpoint(). From what I can tell, notifyAtBreakpoint() clears the flag that ensureAtBreakpoint() checks for, and there is no 2nd breakpoint to set it again. I would expect the ensureAtBreakpoint() below to always wait indefinitely, but that doesn't appear to be the case. So how is this working? I must be missing something. test/hotspot/jtreg/serviceability/jvmti/vthread/PopFrameTest/libPopFrameTest.cpp line 58: > 56: LOG("Breakpoint: In method TestTask.B(): after sync section\n"); > 57: > 58: if (do_pop_frame != JNI_FALSE) { Suggestion: if (do_pop_frame) { test/hotspot/jtreg/serviceability/jvmti/vthread/PopFrameTest/libPopFrameTest.cpp line 161: > 159: int attempts = 0; > 160: while (!bp_sync_reached) { > 161: LOG("Main: ensureAtBreakpoint: waitig 5 millis\n"); Suggestion: LOG("Main: ensureAtBreakpoint: waiting 10 millis\n"); I think this should be moved directly above the wait(10) call. test/hotspot/jtreg/serviceability/jvmti/vthread/PopFrameTest/libPopFrameTest.cpp line 163: > 161: LOG("Main: ensureAtBreakpoint: waitig 5 millis\n"); > 162: if (++attempts > 100) { > 163: fatal(jni, "Main: ensureAtBreakpoint: waited 1 sec"); 1 second isn't very long when you are talking about something that is relying on debugger/debuggee communication. Maybe wait 100ms at a time for a total of 10 seconds. - PR Review Comment: https://git.openjdk.org/jdk/pull/18419#discussion_r1541957419 PR Review Comment: https://git.openjdk.org/jdk/pull/18419#discussion_r1541958086 PR Review Comment: https://git.openjdk.org/jdk/pull/18419#discussion_r1541947950 PR Review Comment: https://git.openjdk.org/jdk/pull/18419#discussion_r1541960905
Re: RFR: 8328665: serviceability/jvmti/vthread/PopFrameTest failed with a timeout [v4]
> This PR fixes a synchronization issue in the test: > `test/hotspot/jtreg/serviceability/jvmti/vthread/PopFrameTest` > > The method `notifyAtBreakpoint()` can notify the `TestTask` thread when it > has not reached an expected breakpoint yet. > The fix is to add a call to the method `ensureAtBreakpoint()` one more time > in the `B2` sub-test. It is needed after the top-most frame was popped with > the JVMTI `PopFrame`, and the target thread needs to reach the breakpoint > again after its execution was resumed. > > The time is very intermittent. At least, I was not able to reproduce the > timeout failure in thousands of mach5 runs with the `-Xcomp` option. > > Testing: > - Run the test > `test/hotspot/jtreg/serviceability/jvmti/vthread/PopFrameTest` thousands > times in mach5 Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: review: improve diagnostics and reliability - Changes: - all: https://git.openjdk.org/jdk/pull/18419/files - new: https://git.openjdk.org/jdk/pull/18419/files/1502492f..f4f92e92 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18419&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18419&range=02-03 Stats: 19 lines in 2 files changed: 9 ins; 4 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/18419.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18419/head:pull/18419 PR: https://git.openjdk.org/jdk/pull/18419