Re: RFR: 8283849: AsyncGetCallTrace may crash JVM on guarantee [v6]

2022-05-16 Thread Jaroslav Bachorik
On Sat, 7 May 2022 04:59:44 GMT, Thomas Stuefe  wrote:

>> Jaroslav Bachorik has refreshed the contents of this pull request, and 
>> previous commits have been removed. The incremental views will show 
>> differences compared to the previous content of the PR. The pull request 
>> contains one new commit since the last revision:
>> 
>>   Move 'in_asgct' flag to JavaThread
>
> src/hotspot/share/prims/forte.cpp line 594:
> 
>> 592:   }
>> 593: 
>> 594:   thread->set_in_asgct(true);
> 
> Suggestion: Use a small RAII Mark object instead. That way you prevent future 
> errors if someone should add a return in the mids of AGCT.

Done

-

PR: https://git.openjdk.java.net/jdk/pull/8549


Re: RFR: 8283849: AsyncGetCallTrace may crash JVM on guarantee [v6]

2022-05-06 Thread Thomas Stuefe
On Fri, 6 May 2022 09:40:37 GMT, Jaroslav Bachorik  
wrote:

>> A gist of the fix is to allow relaxed special handling of code blob lookup 
>> when done for ASGCT. 
>> 
>> Currently, a guarantee will fail when we happen to hit a zombie method which 
>> is still on stack. While this would indicate a serious error for the normal 
>> execution flow, in case of ASGCT being in progress when the executing thread 
>> can be expected at any possible method this is something which may happen 
>> and we really should not take the profiled JVM down due to it.
>> 
>> 
>> Unfortunately, I am not able to create a simple reproducer for the crash 
>> other that testing in our production where the crash is happening 
>> sporadically.
>> However, thanks to @parttimenerd and his [ASGCT stress 
>> test](https://github.com/parttimenerd/asgct2-tester.git) the problem can be 
>> reproduced quite reliably.
>> 
>> 
>> 
>> _Note: This is a followup PR for #8061_
>
> Jaroslav Bachorik has refreshed the contents of this pull request, and 
> previous commits have been removed. The incremental views will show 
> differences compared to the previous content of the PR. The pull request 
> contains one new commit since the last revision:
> 
>   Move 'in_asgct' flag to JavaThread

src/hotspot/share/prims/forte.cpp line 594:

> 592:   }
> 593: 
> 594:   thread->set_in_asgct(true);

Suggestion: Use a small RAII Mark object instead. That way you prevent future 
errors if someone should add a return in the mids of AGCT.

-

PR: https://git.openjdk.java.net/jdk/pull/8549


Re: RFR: 8283849: AsyncGetCallTrace may crash JVM on guarantee [v6]

2022-05-06 Thread Thomas Stuefe
On Fri, 6 May 2022 14:20:30 GMT, Jaroslav Bachorik  
wrote:

>> Functionally this looks good now - thanks.
>> 
>> The only concern is the overhead added to `find_blob` to account for this 
>> very special case. Can you do some benchmarking of this?
>> 
>> Thanks.
>
> Hi @dholmes-ora , thanks for taking time to review this.
> 
> I am running `jmh-jdk-microbenchmarks` now but I am going to be away for a 
> week now so I will get back to you with the results only then.

Hi @jbachorik, thanks a lot for taking my suggestion. This looks way less 
intrusive now!

-

PR: https://git.openjdk.java.net/jdk/pull/8549


Re: RFR: 8283849: AsyncGetCallTrace may crash JVM on guarantee [v6]

2022-05-06 Thread David Holmes
On Fri, 6 May 2022 09:40:37 GMT, Jaroslav Bachorik  
wrote:

>> A gist of the fix is to allow relaxed special handling of code blob lookup 
>> when done for ASGCT. 
>> 
>> Currently, a guarantee will fail when we happen to hit a zombie method which 
>> is still on stack. While this would indicate a serious error for the normal 
>> execution flow, in case of ASGCT being in progress when the executing thread 
>> can be expected at any possible method this is something which may happen 
>> and we really should not take the profiled JVM down due to it.
>> 
>> 
>> Unfortunately, I am not able to create a simple reproducer for the crash 
>> other that testing in our production where the crash is happening 
>> sporadically.
>> However, thanks to @parttimenerd and his [ASGCT stress 
>> test](https://github.com/parttimenerd/asgct2-tester.git) the problem can be 
>> reproduced quite reliably.
>> 
>> 
>> 
>> _Note: This is a followup PR for #8061_
>
> Jaroslav Bachorik has refreshed the contents of this pull request, and 
> previous commits have been removed. The incremental views will show 
> differences compared to the previous content of the PR. The pull request 
> contains one new commit since the last revision:
> 
>   Move 'in_asgct' flag to JavaThread

Good catch!

-

PR: https://git.openjdk.java.net/jdk/pull/8549


Re: RFR: 8283849: AsyncGetCallTrace may crash JVM on guarantee [v6]

2022-05-06 Thread Dean Long
On Fri, 6 May 2022 09:40:37 GMT, Jaroslav Bachorik  
wrote:

>> A gist of the fix is to allow relaxed special handling of code blob lookup 
>> when done for ASGCT. 
>> 
>> Currently, a guarantee will fail when we happen to hit a zombie method which 
>> is still on stack. While this would indicate a serious error for the normal 
>> execution flow, in case of ASGCT being in progress when the executing thread 
>> can be expected at any possible method this is something which may happen 
>> and we really should not take the profiled JVM down due to it.
>> 
>> 
>> Unfortunately, I am not able to create a simple reproducer for the crash 
>> other that testing in our production where the crash is happening 
>> sporadically.
>> However, thanks to @parttimenerd and his [ASGCT stress 
>> test](https://github.com/parttimenerd/asgct2-tester.git) the problem can be 
>> reproduced quite reliably.
>> 
>> 
>> 
>> _Note: This is a followup PR for #8061_
>
> Jaroslav Bachorik has refreshed the contents of this pull request, and 
> previous commits have been removed. The incremental views will show 
> differences compared to the previous content of the PR. The pull request 
> contains one new commit since the last revision:
> 
>   Move 'in_asgct' flag to JavaThread

David, your example has an extra || result->is_zombie(), but otherwise this 
sounds perfect.

-

PR: https://git.openjdk.java.net/jdk/pull/8549


Re: RFR: 8283849: AsyncGetCallTrace may crash JVM on guarantee [v6]

2022-05-06 Thread David Holmes
On Fri, 6 May 2022 09:40:37 GMT, Jaroslav Bachorik  
wrote:

>> A gist of the fix is to allow relaxed special handling of code blob lookup 
>> when done for ASGCT. 
>> 
>> Currently, a guarantee will fail when we happen to hit a zombie method which 
>> is still on stack. While this would indicate a serious error for the normal 
>> execution flow, in case of ASGCT being in progress when the executing thread 
>> can be expected at any possible method this is something which may happen 
>> and we really should not take the profiled JVM down due to it.
>> 
>> 
>> Unfortunately, I am not able to create a simple reproducer for the crash 
>> other that testing in our production where the crash is happening 
>> sporadically.
>> However, thanks to @parttimenerd and his [ASGCT stress 
>> test](https://github.com/parttimenerd/asgct2-tester.git) the problem can be 
>> reproduced quite reliably.
>> 
>> 
>> 
>> _Note: This is a followup PR for #8061_
>
> Jaroslav Bachorik has refreshed the contents of this pull request, and 
> previous commits have been removed. The incremental views will show 
> differences compared to the previous content of the PR. The pull request 
> contains one new commit since the last revision:
> 
>   Move 'in_asgct' flag to JavaThread

Looking more closely this even be put in a new final clause in the guarantee:


guarantee(result == NULL || 
  !result->is_zombie() || 
  result->is_locked_by_vm() || 
  VMError::is_error_reported() ||
  result->is_zombie() && current_thread_in_asgct()
  , "unsafe access to zombie method");


where `current_thread_in_asgct()` is a static helper. That addresses all 
overhead concerns.

-

PR: https://git.openjdk.java.net/jdk/pull/8549


Re: RFR: 8283849: AsyncGetCallTrace may crash JVM on guarantee [v6]

2022-05-06 Thread David Holmes
On Fri, 6 May 2022 20:17:53 GMT, Dean Long  wrote:

>> Jaroslav Bachorik has refreshed the contents of this pull request, and 
>> previous commits have been removed. The incremental views will show 
>> differences compared to the previous content of the PR. The pull request 
>> contains one new commit since the last revision:
>> 
>>   Move 'in_asgct' flag to JavaThread
>
> It seems like you should be able to mitigate the extra overhead by only doing 
> that extra work when we actually see a zombie method, which should be rare.

Yes good point @dean-long ! Check for the zombie first and only then do the 
expensive Thread::current_or_null_safe() etc.

-

PR: https://git.openjdk.java.net/jdk/pull/8549


Re: RFR: 8283849: AsyncGetCallTrace may crash JVM on guarantee [v6]

2022-05-06 Thread Dean Long
On Fri, 6 May 2022 09:40:37 GMT, Jaroslav Bachorik  
wrote:

>> A gist of the fix is to allow relaxed special handling of code blob lookup 
>> when done for ASGCT. 
>> 
>> Currently, a guarantee will fail when we happen to hit a zombie method which 
>> is still on stack. While this would indicate a serious error for the normal 
>> execution flow, in case of ASGCT being in progress when the executing thread 
>> can be expected at any possible method this is something which may happen 
>> and we really should not take the profiled JVM down due to it.
>> 
>> 
>> Unfortunately, I am not able to create a simple reproducer for the crash 
>> other that testing in our production where the crash is happening 
>> sporadically.
>> However, thanks to @parttimenerd and his [ASGCT stress 
>> test](https://github.com/parttimenerd/asgct2-tester.git) the problem can be 
>> reproduced quite reliably.
>> 
>> 
>> 
>> _Note: This is a followup PR for #8061_
>
> Jaroslav Bachorik has refreshed the contents of this pull request, and 
> previous commits have been removed. The incremental views will show 
> differences compared to the previous content of the PR. The pull request 
> contains one new commit since the last revision:
> 
>   Move 'in_asgct' flag to JavaThread

It seems like you should be able to mitigate the extra overhead by only doing 
that extra work when we actually see a zombie method, which should be rare.

-

PR: https://git.openjdk.java.net/jdk/pull/8549


Re: RFR: 8283849: AsyncGetCallTrace may crash JVM on guarantee [v6]

2022-05-06 Thread Jaroslav Bachorik
On Fri, 6 May 2022 10:08:09 GMT, David Holmes  wrote:

>> Jaroslav Bachorik has refreshed the contents of this pull request, and 
>> previous commits have been removed. The incremental views will show 
>> differences compared to the previous content of the PR. The pull request 
>> contains one new commit since the last revision:
>> 
>>   Move 'in_asgct' flag to JavaThread
>
> Functionally this looks good now - thanks.
> 
> The only concern is the overhead added to `find_blob` to account for this 
> very special case. Can you do some benchmarking of this?
> 
> Thanks.

Hi @dholmes-ora , thanks for taking time to review this.

I am running `jmh-jdk-microbenchmarks` now but I am going to be away for a week 
now so I will get back to you with the results only then.

-

PR: https://git.openjdk.java.net/jdk/pull/8549


Re: RFR: 8283849: AsyncGetCallTrace may crash JVM on guarantee [v6]

2022-05-06 Thread David Holmes
On Fri, 6 May 2022 09:40:37 GMT, Jaroslav Bachorik  
wrote:

>> A gist of the fix is to allow relaxed special handling of code blob lookup 
>> when done for ASGCT. 
>> 
>> Currently, a guarantee will fail when we happen to hit a zombie method which 
>> is still on stack. While this would indicate a serious error for the normal 
>> execution flow, in case of ASGCT being in progress when the executing thread 
>> can be expected at any possible method this is something which may happen 
>> and we really should not take the profiled JVM down due to it.
>> 
>> 
>> Unfortunately, I am not able to create a simple reproducer for the crash 
>> other that testing in our production where the crash is happening 
>> sporadically.
>> However, thanks to @parttimenerd and his [ASGCT stress 
>> test](https://github.com/parttimenerd/asgct2-tester.git) the problem can be 
>> reproduced quite reliably.
>> 
>> 
>> 
>> _Note: This is a followup PR for #8061_
>
> Jaroslav Bachorik has refreshed the contents of this pull request, and 
> previous commits have been removed. The incremental views will show 
> differences compared to the previous content of the PR. The pull request 
> contains one new commit since the last revision:
> 
>   Move 'in_asgct' flag to JavaThread

Functionally this looks good now - thanks.

The only concern is the overhead added to `find_blob` to account for this very 
special case. Can you do some benchmarking of this?

Thanks.

-

PR: https://git.openjdk.java.net/jdk/pull/8549


Re: RFR: 8283849: AsyncGetCallTrace may crash JVM on guarantee [v6]

2022-05-06 Thread Jaroslav Bachorik
> A gist of the fix is to allow relaxed special handling of code blob lookup 
> when done for ASGCT. 
> 
> Currently, a guarantee will fail when we happen to hit a zombie method which 
> is still on stack. While this would indicate a serious error for the normal 
> execution flow, in case of ASGCT being in progress when the executing thread 
> can be expected at any possible method this is something which may happen and 
> we really should not take the profiled JVM down due to it.
> 
> 
> Unfortunately, I am not able to create a simple reproducer for the crash 
> other that testing in our production where the crash is happening 
> sporadically.
> However, thanks to @parttimenerd and his [ASGCT stress 
> test](https://github.com/parttimenerd/asgct2-tester.git) the problem can be 
> reproduced quite reliably.
> 
> 
> 
> _Note: This is a followup PR for #8061_

Jaroslav Bachorik has refreshed the contents of this pull request, and previous 
commits have been removed. The incremental views will show differences compared 
to the previous content of the PR. The pull request contains one new commit 
since the last revision:

  Move 'in_asgct' flag to JavaThread

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/8549/files
  - new: https://git.openjdk.java.net/jdk/pull/8549/files/33ac63eb..949609c8

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=8549=05
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=8549=04-05

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/8549.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/8549/head:pull/8549

PR: https://git.openjdk.java.net/jdk/pull/8549