Re: RFR: 8283849: AsyncGetCallTrace may crash JVM on guarantee [v6]
On Sat, 7 May 2022 04:59:44 GMT, Thomas Stuefe wrote: >> Jaroslav Bachorik has refreshed the contents of this pull request, and >> previous commits have been removed. The incremental views will show >> differences compared to the previous content of the PR. The pull request >> contains one new commit since the last revision: >> >> Move 'in_asgct' flag to JavaThread > > src/hotspot/share/prims/forte.cpp line 594: > >> 592: } >> 593: >> 594: thread->set_in_asgct(true); > > Suggestion: Use a small RAII Mark object instead. That way you prevent future > errors if someone should add a return in the mids of AGCT. Done - PR: https://git.openjdk.java.net/jdk/pull/8549
Re: RFR: 8283849: AsyncGetCallTrace may crash JVM on guarantee [v6]
On Fri, 6 May 2022 09:40:37 GMT, Jaroslav Bachorik wrote: >> A gist of the fix is to allow relaxed special handling of code blob lookup >> when done for ASGCT. >> >> Currently, a guarantee will fail when we happen to hit a zombie method which >> is still on stack. While this would indicate a serious error for the normal >> execution flow, in case of ASGCT being in progress when the executing thread >> can be expected at any possible method this is something which may happen >> and we really should not take the profiled JVM down due to it. >> >> >> Unfortunately, I am not able to create a simple reproducer for the crash >> other that testing in our production where the crash is happening >> sporadically. >> However, thanks to @parttimenerd and his [ASGCT stress >> test](https://github.com/parttimenerd/asgct2-tester.git) the problem can be >> reproduced quite reliably. >> >> >> >> _Note: This is a followup PR for #8061_ > > Jaroslav Bachorik has refreshed the contents of this pull request, and > previous commits have been removed. The incremental views will show > differences compared to the previous content of the PR. The pull request > contains one new commit since the last revision: > > Move 'in_asgct' flag to JavaThread src/hotspot/share/prims/forte.cpp line 594: > 592: } > 593: > 594: thread->set_in_asgct(true); Suggestion: Use a small RAII Mark object instead. That way you prevent future errors if someone should add a return in the mids of AGCT. - PR: https://git.openjdk.java.net/jdk/pull/8549
Re: RFR: 8283849: AsyncGetCallTrace may crash JVM on guarantee [v6]
On Fri, 6 May 2022 14:20:30 GMT, Jaroslav Bachorik wrote: >> Functionally this looks good now - thanks. >> >> The only concern is the overhead added to `find_blob` to account for this >> very special case. Can you do some benchmarking of this? >> >> Thanks. > > Hi @dholmes-ora , thanks for taking time to review this. > > I am running `jmh-jdk-microbenchmarks` now but I am going to be away for a > week now so I will get back to you with the results only then. Hi @jbachorik, thanks a lot for taking my suggestion. This looks way less intrusive now! - PR: https://git.openjdk.java.net/jdk/pull/8549
Re: RFR: 8283849: AsyncGetCallTrace may crash JVM on guarantee [v6]
On Fri, 6 May 2022 09:40:37 GMT, Jaroslav Bachorik wrote: >> A gist of the fix is to allow relaxed special handling of code blob lookup >> when done for ASGCT. >> >> Currently, a guarantee will fail when we happen to hit a zombie method which >> is still on stack. While this would indicate a serious error for the normal >> execution flow, in case of ASGCT being in progress when the executing thread >> can be expected at any possible method this is something which may happen >> and we really should not take the profiled JVM down due to it. >> >> >> Unfortunately, I am not able to create a simple reproducer for the crash >> other that testing in our production where the crash is happening >> sporadically. >> However, thanks to @parttimenerd and his [ASGCT stress >> test](https://github.com/parttimenerd/asgct2-tester.git) the problem can be >> reproduced quite reliably. >> >> >> >> _Note: This is a followup PR for #8061_ > > Jaroslav Bachorik has refreshed the contents of this pull request, and > previous commits have been removed. The incremental views will show > differences compared to the previous content of the PR. The pull request > contains one new commit since the last revision: > > Move 'in_asgct' flag to JavaThread Good catch! - PR: https://git.openjdk.java.net/jdk/pull/8549
Re: RFR: 8283849: AsyncGetCallTrace may crash JVM on guarantee [v6]
On Fri, 6 May 2022 09:40:37 GMT, Jaroslav Bachorik wrote: >> A gist of the fix is to allow relaxed special handling of code blob lookup >> when done for ASGCT. >> >> Currently, a guarantee will fail when we happen to hit a zombie method which >> is still on stack. While this would indicate a serious error for the normal >> execution flow, in case of ASGCT being in progress when the executing thread >> can be expected at any possible method this is something which may happen >> and we really should not take the profiled JVM down due to it. >> >> >> Unfortunately, I am not able to create a simple reproducer for the crash >> other that testing in our production where the crash is happening >> sporadically. >> However, thanks to @parttimenerd and his [ASGCT stress >> test](https://github.com/parttimenerd/asgct2-tester.git) the problem can be >> reproduced quite reliably. >> >> >> >> _Note: This is a followup PR for #8061_ > > Jaroslav Bachorik has refreshed the contents of this pull request, and > previous commits have been removed. The incremental views will show > differences compared to the previous content of the PR. The pull request > contains one new commit since the last revision: > > Move 'in_asgct' flag to JavaThread David, your example has an extra || result->is_zombie(), but otherwise this sounds perfect. - PR: https://git.openjdk.java.net/jdk/pull/8549
Re: RFR: 8283849: AsyncGetCallTrace may crash JVM on guarantee [v6]
On Fri, 6 May 2022 09:40:37 GMT, Jaroslav Bachorik wrote: >> A gist of the fix is to allow relaxed special handling of code blob lookup >> when done for ASGCT. >> >> Currently, a guarantee will fail when we happen to hit a zombie method which >> is still on stack. While this would indicate a serious error for the normal >> execution flow, in case of ASGCT being in progress when the executing thread >> can be expected at any possible method this is something which may happen >> and we really should not take the profiled JVM down due to it. >> >> >> Unfortunately, I am not able to create a simple reproducer for the crash >> other that testing in our production where the crash is happening >> sporadically. >> However, thanks to @parttimenerd and his [ASGCT stress >> test](https://github.com/parttimenerd/asgct2-tester.git) the problem can be >> reproduced quite reliably. >> >> >> >> _Note: This is a followup PR for #8061_ > > Jaroslav Bachorik has refreshed the contents of this pull request, and > previous commits have been removed. The incremental views will show > differences compared to the previous content of the PR. The pull request > contains one new commit since the last revision: > > Move 'in_asgct' flag to JavaThread Looking more closely this even be put in a new final clause in the guarantee: guarantee(result == NULL || !result->is_zombie() || result->is_locked_by_vm() || VMError::is_error_reported() || result->is_zombie() && current_thread_in_asgct() , "unsafe access to zombie method"); where `current_thread_in_asgct()` is a static helper. That addresses all overhead concerns. - PR: https://git.openjdk.java.net/jdk/pull/8549
Re: RFR: 8283849: AsyncGetCallTrace may crash JVM on guarantee [v6]
On Fri, 6 May 2022 20:17:53 GMT, Dean Long wrote: >> Jaroslav Bachorik has refreshed the contents of this pull request, and >> previous commits have been removed. The incremental views will show >> differences compared to the previous content of the PR. The pull request >> contains one new commit since the last revision: >> >> Move 'in_asgct' flag to JavaThread > > It seems like you should be able to mitigate the extra overhead by only doing > that extra work when we actually see a zombie method, which should be rare. Yes good point @dean-long ! Check for the zombie first and only then do the expensive Thread::current_or_null_safe() etc. - PR: https://git.openjdk.java.net/jdk/pull/8549
Re: RFR: 8283849: AsyncGetCallTrace may crash JVM on guarantee [v6]
On Fri, 6 May 2022 09:40:37 GMT, Jaroslav Bachorik wrote: >> A gist of the fix is to allow relaxed special handling of code blob lookup >> when done for ASGCT. >> >> Currently, a guarantee will fail when we happen to hit a zombie method which >> is still on stack. While this would indicate a serious error for the normal >> execution flow, in case of ASGCT being in progress when the executing thread >> can be expected at any possible method this is something which may happen >> and we really should not take the profiled JVM down due to it. >> >> >> Unfortunately, I am not able to create a simple reproducer for the crash >> other that testing in our production where the crash is happening >> sporadically. >> However, thanks to @parttimenerd and his [ASGCT stress >> test](https://github.com/parttimenerd/asgct2-tester.git) the problem can be >> reproduced quite reliably. >> >> >> >> _Note: This is a followup PR for #8061_ > > Jaroslav Bachorik has refreshed the contents of this pull request, and > previous commits have been removed. The incremental views will show > differences compared to the previous content of the PR. The pull request > contains one new commit since the last revision: > > Move 'in_asgct' flag to JavaThread It seems like you should be able to mitigate the extra overhead by only doing that extra work when we actually see a zombie method, which should be rare. - PR: https://git.openjdk.java.net/jdk/pull/8549
Re: RFR: 8283849: AsyncGetCallTrace may crash JVM on guarantee [v6]
On Fri, 6 May 2022 10:08:09 GMT, David Holmes wrote: >> Jaroslav Bachorik has refreshed the contents of this pull request, and >> previous commits have been removed. The incremental views will show >> differences compared to the previous content of the PR. The pull request >> contains one new commit since the last revision: >> >> Move 'in_asgct' flag to JavaThread > > Functionally this looks good now - thanks. > > The only concern is the overhead added to `find_blob` to account for this > very special case. Can you do some benchmarking of this? > > Thanks. Hi @dholmes-ora , thanks for taking time to review this. I am running `jmh-jdk-microbenchmarks` now but I am going to be away for a week now so I will get back to you with the results only then. - PR: https://git.openjdk.java.net/jdk/pull/8549
Re: RFR: 8283849: AsyncGetCallTrace may crash JVM on guarantee [v6]
On Fri, 6 May 2022 09:40:37 GMT, Jaroslav Bachorik wrote: >> A gist of the fix is to allow relaxed special handling of code blob lookup >> when done for ASGCT. >> >> Currently, a guarantee will fail when we happen to hit a zombie method which >> is still on stack. While this would indicate a serious error for the normal >> execution flow, in case of ASGCT being in progress when the executing thread >> can be expected at any possible method this is something which may happen >> and we really should not take the profiled JVM down due to it. >> >> >> Unfortunately, I am not able to create a simple reproducer for the crash >> other that testing in our production where the crash is happening >> sporadically. >> However, thanks to @parttimenerd and his [ASGCT stress >> test](https://github.com/parttimenerd/asgct2-tester.git) the problem can be >> reproduced quite reliably. >> >> >> >> _Note: This is a followup PR for #8061_ > > Jaroslav Bachorik has refreshed the contents of this pull request, and > previous commits have been removed. The incremental views will show > differences compared to the previous content of the PR. The pull request > contains one new commit since the last revision: > > Move 'in_asgct' flag to JavaThread Functionally this looks good now - thanks. The only concern is the overhead added to `find_blob` to account for this very special case. Can you do some benchmarking of this? Thanks. - PR: https://git.openjdk.java.net/jdk/pull/8549
Re: RFR: 8283849: AsyncGetCallTrace may crash JVM on guarantee [v6]
> A gist of the fix is to allow relaxed special handling of code blob lookup > when done for ASGCT. > > Currently, a guarantee will fail when we happen to hit a zombie method which > is still on stack. While this would indicate a serious error for the normal > execution flow, in case of ASGCT being in progress when the executing thread > can be expected at any possible method this is something which may happen and > we really should not take the profiled JVM down due to it. > > > Unfortunately, I am not able to create a simple reproducer for the crash > other that testing in our production where the crash is happening > sporadically. > However, thanks to @parttimenerd and his [ASGCT stress > test](https://github.com/parttimenerd/asgct2-tester.git) the problem can be > reproduced quite reliably. > > > > _Note: This is a followup PR for #8061_ Jaroslav Bachorik has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: Move 'in_asgct' flag to JavaThread - Changes: - all: https://git.openjdk.java.net/jdk/pull/8549/files - new: https://git.openjdk.java.net/jdk/pull/8549/files/33ac63eb..949609c8 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk=8549=05 - incr: https://webrevs.openjdk.java.net/?repo=jdk=8549=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/8549.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/8549/head:pull/8549 PR: https://git.openjdk.java.net/jdk/pull/8549