Re: RFR: 8329597: C2: Intrinsify Reference.clear [v10]
On Tue, 15 Oct 2024 11:07:32 GMT, Aleksey Shipilev wrote: >> [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native >> method for `Reference.clear`. The original patch skipped intrinsification of >> this method, because we thought `Reference.clear` is not on a performance >> sensitive path. However, it shows up prominently on simple benchmarks that >> touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with >> `RRWL` benchmarks. >> >> We need to know the actual oop strongness/weakness before we call into C2 >> Access API, this work models this after existing code for `refersTo0` >> intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one > additional commit since the last revision: > > Export in JVMCI too Thank you for testing! Here goes. - PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2416954569
Re: RFR: 8329597: C2: Intrinsify Reference.clear [v10]
On Tue, 15 Oct 2024 11:07:32 GMT, Aleksey Shipilev wrote: >> [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native >> method for `Reference.clear`. The original patch skipped intrinsification of >> this method, because we thought `Reference.clear` is not on a performance >> sensitive path. However, it shows up prominently on simple benchmarks that >> touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with >> `RRWL` benchmarks. >> >> We need to know the actual oop strongness/weakness before we call into C2 >> Access API, this work models this after existing code for `refersTo0` >> intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one > additional commit since the last revision: > > Export in JVMCI too Marked as reviewed by rcastanedalo (Reviewer). - PR Review: https://git.openjdk.org/jdk/pull/20139#pullrequestreview-2371670897
Re: RFR: 8329597: C2: Intrinsify Reference.clear [v9]
On Tue, 15 Oct 2024 09:24:30 GMT, Roberto Castañeda Lozano wrote: > I will run some internal CI testing and report back in one or two days. The test results look good. I tested the changes (up to commit 9f7ad7ab) on top of jdk-24+19 running tier1-tier5 on all Oracle-supported platforms. - PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2416038769
Re: RFR: 8329597: C2: Intrinsify Reference.clear [v9]
On Tue, 15 Oct 2024 10:02:15 GMT, Yudi Zheng wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to >> a merge or a rebase. The pull request now contains 18 commits: >> >> - Simplify: just do keep alive checks >> - Merge branch 'master' into JDK-8329597-intrinsify-reference-clear >> - More precise bit-unmasks >> - Reconcile with late barrier expansion in G1 >> - Merge branch 'master' into JDK-8329597-intrinsify-reference-clear >> - Review comments >> - Merge branch 'master' into JDK-8329597-intrinsify-reference-clear >> - Also dispatch to slow-path on other arches >> - Fix other arches >> - Tighten up comments in Reference javadoc >> - ... and 8 more: https://git.openjdk.org/jdk/compare/580eb62d...9f7ad7ab > > src/hotspot/share/gc/z/zBarrierSetRuntime.hpp line 43: > >> 41: static void store_barrier_on_oop_field_with_healing(oop* p); >> 42: static void store_barrier_on_oop_field_without_healing(oop* p); >> 43: static void >> no_keepalive_store_barrier_on_oop_field_without_healing(oop* p); > > Could you please export this to JVMCI? I.e., > > diff --git a/src/hotspot/share/jvmci/vmStructs_jvmci.cpp > b/src/hotspot/share/jvmci/vmStructs_jvmci.cpp > index 5452cca96b8..46aeb996c56 100644 > --- a/src/hotspot/share/jvmci/vmStructs_jvmci.cpp > +++ b/src/hotspot/share/jvmci/vmStructs_jvmci.cpp > @@ -847,6 +847,7 @@ >ZGC_ONLY(DECLARE_FUNCTION_FROM_ADDR(declare_function_with_value, > ZBarrierSetRuntime::load_barrier_on_oop_field_preloaded_store_good)) > \ >ZGC_ONLY(DECLARE_FUNCTION_FROM_ADDR(declare_function_with_value, > ZBarrierSetRuntime::no_keepalive_load_barrier_on_weak_oop_field_preloaded)) > \ >ZGC_ONLY(DECLARE_FUNCTION_FROM_ADDR(declare_function_with_value, > ZBarrierSetRuntime::no_keepalive_load_barrier_on_phantom_oop_field_preloaded)) > \ > + ZGC_ONLY(DECLARE_FUNCTION_FROM_ADDR(declare_function_with_value, > ZBarrierSetRuntime::no_keepalive_store_barrier_on_oop_field_without_healing) > \ >ZGC_ONLY(DECLARE_FUNCTION_FROM_ADDR(declare_function_with_value, > ZBarrierSetRuntime::store_barrier_on_native_oop_field_without_healing)) > \ >ZGC_ONLY(DECLARE_FUNCTION_FROM_ADDR(declare_function_with_value, > ZBarrierSetRuntime::store_barrier_on_oop_field_with_healing)) > \ >ZGC_ONLY(DECLARE_FUNCTION_FROM_ADDR(declare_function_with_value, > ZBarrierSetRuntime::store_barrier_on_oop_field_without_healing)) > \ > > Thanks! Done! - PR Review Comment: https://git.openjdk.org/jdk/pull/20139#discussion_r1800937787
Re: RFR: 8329597: C2: Intrinsify Reference.clear [v10]
> [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native > method for `Reference.clear`. The original patch skipped intrinsification of > this method, because we thought `Reference.clear` is not on a performance > sensitive path. However, it shows up prominently on simple benchmarks that > touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with > `RRWL` benchmarks. > > We need to know the actual oop strongness/weakness before we call into C2 > Access API, this work models this after existing code for `refersTo0` > intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Export in JVMCI too - Changes: - all: https://git.openjdk.org/jdk/pull/20139/files - new: https://git.openjdk.org/jdk/pull/20139/files/9f7ad7ab..479781df Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20139&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20139&range=08-09 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20139.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20139/head:pull/20139 PR: https://git.openjdk.org/jdk/pull/20139
Re: RFR: 8329597: C2: Intrinsify Reference.clear [v9]
On Wed, 9 Oct 2024 08:44:34 GMT, Aleksey Shipilev wrote: >> [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native >> method for `Reference.clear`. The original patch skipped intrinsification of >> this method, because we thought `Reference.clear` is not on a performance >> sensitive path. However, it shows up prominently on simple benchmarks that >> touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with >> `RRWL` benchmarks. >> >> We need to know the actual oop strongness/weakness before we call into C2 >> Access API, this work models this after existing code for `refersTo0` >> intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a > merge or a rebase. The pull request now contains 18 commits: > > - Simplify: just do keep alive checks > - Merge branch 'master' into JDK-8329597-intrinsify-reference-clear > - More precise bit-unmasks > - Reconcile with late barrier expansion in G1 > - Merge branch 'master' into JDK-8329597-intrinsify-reference-clear > - Review comments > - Merge branch 'master' into JDK-8329597-intrinsify-reference-clear > - Also dispatch to slow-path on other arches > - Fix other arches > - Tighten up comments in Reference javadoc > - ... and 8 more: https://git.openjdk.org/jdk/compare/580eb62d...9f7ad7ab src/hotspot/share/gc/z/zBarrierSetRuntime.hpp line 43: > 41: static void store_barrier_on_oop_field_with_healing(oop* p); > 42: static void store_barrier_on_oop_field_without_healing(oop* p); > 43: static void > no_keepalive_store_barrier_on_oop_field_without_healing(oop* p); Could you please export this to JVMCI? I.e., diff --git a/src/hotspot/share/jvmci/vmStructs_jvmci.cpp b/src/hotspot/share/jvmci/vmStructs_jvmci.cpp index 5452cca96b8..46aeb996c56 100644 --- a/src/hotspot/share/jvmci/vmStructs_jvmci.cpp +++ b/src/hotspot/share/jvmci/vmStructs_jvmci.cpp @@ -847,6 +847,7 @@ ZGC_ONLY(DECLARE_FUNCTION_FROM_ADDR(declare_function_with_value, ZBarrierSetRuntime::load_barrier_on_oop_field_preloaded_store_good)) \ ZGC_ONLY(DECLARE_FUNCTION_FROM_ADDR(declare_function_with_value, ZBarrierSetRuntime::no_keepalive_load_barrier_on_weak_oop_field_preloaded))\ ZGC_ONLY(DECLARE_FUNCTION_FROM_ADDR(declare_function_with_value, ZBarrierSetRuntime::no_keepalive_load_barrier_on_phantom_oop_field_preloaded)) \ + ZGC_ONLY(DECLARE_FUNCTION_FROM_ADDR(declare_function_with_value, ZBarrierSetRuntime::no_keepalive_store_barrier_on_oop_field_without_healing) \ ZGC_ONLY(DECLARE_FUNCTION_FROM_ADDR(declare_function_with_value, ZBarrierSetRuntime::store_barrier_on_native_oop_field_without_healing))\ ZGC_ONLY(DECLARE_FUNCTION_FROM_ADDR(declare_function_with_value, ZBarrierSetRuntime::store_barrier_on_oop_field_with_healing)) \ ZGC_ONLY(DECLARE_FUNCTION_FROM_ADDR(declare_function_with_value, ZBarrierSetRuntime::store_barrier_on_oop_field_without_healing)) \ Thanks! - PR Review Comment: https://git.openjdk.org/jdk/pull/20139#discussion_r1800848130
Re: RFR: 8329597: C2: Intrinsify Reference.clear [v9]
On Tue, 15 Oct 2024 09:18:05 GMT, Aleksey Shipilev wrote: > Thanks for review, folks. I am re-running testing locally here. Would > appreciate if you can give this patch a spin through your CIs as well. I will run some internal CI testing and report back in one or two days. - PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2413362811
Re: RFR: 8329597: C2: Intrinsify Reference.clear [v9]
On Wed, 9 Oct 2024 08:44:34 GMT, Aleksey Shipilev wrote: >> [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native >> method for `Reference.clear`. The original patch skipped intrinsification of >> this method, because we thought `Reference.clear` is not on a performance >> sensitive path. However, it shows up prominently on simple benchmarks that >> touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with >> `RRWL` benchmarks. >> >> We need to know the actual oop strongness/weakness before we call into C2 >> Access API, this work models this after existing code for `refersTo0` >> intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a > merge or a rebase. The pull request now contains 18 commits: > > - Simplify: just do keep alive checks > - Merge branch 'master' into JDK-8329597-intrinsify-reference-clear > - More precise bit-unmasks > - Reconcile with late barrier expansion in G1 > - Merge branch 'master' into JDK-8329597-intrinsify-reference-clear > - Review comments > - Merge branch 'master' into JDK-8329597-intrinsify-reference-clear > - Also dispatch to slow-path on other arches > - Fix other arches > - Tighten up comments in Reference javadoc > - ... and 8 more: https://git.openjdk.org/jdk/compare/580eb62d...9f7ad7ab Thanks for review, folks. I am re-running testing locally here. Would appreciate if you can give this patch a spin through your CIs as well. - PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2413348085
Re: RFR: 8329597: C2: Intrinsify Reference.clear [v9]
On Wed, 9 Oct 2024 08:44:34 GMT, Aleksey Shipilev wrote: >> [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native >> method for `Reference.clear`. The original patch skipped intrinsification of >> this method, because we thought `Reference.clear` is not on a performance >> sensitive path. However, it shows up prominently on simple benchmarks that >> touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with >> `RRWL` benchmarks. >> >> We need to know the actual oop strongness/weakness before we call into C2 >> Access API, this work models this after existing code for `refersTo0` >> intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a > merge or a rebase. The pull request now contains 18 commits: > > - Simplify: just do keep alive checks > - Merge branch 'master' into JDK-8329597-intrinsify-reference-clear > - More precise bit-unmasks > - Reconcile with late barrier expansion in G1 > - Merge branch 'master' into JDK-8329597-intrinsify-reference-clear > - Review comments > - Merge branch 'master' into JDK-8329597-intrinsify-reference-clear > - Also dispatch to slow-path on other arches > - Fix other arches > - Tighten up comments in Reference javadoc > - ... and 8 more: https://git.openjdk.org/jdk/compare/580eb62d...9f7ad7ab C2 change (intrinsics code) is fine. - Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20139#pullrequestreview-2367483219
Re: RFR: 8329597: C2: Intrinsify Reference.clear [v9]
On Wed, 9 Oct 2024 08:44:34 GMT, Aleksey Shipilev wrote: >> [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native >> method for `Reference.clear`. The original patch skipped intrinsification of >> this method, because we thought `Reference.clear` is not on a performance >> sensitive path. However, it shows up prominently on simple benchmarks that >> touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with >> `RRWL` benchmarks. >> >> We need to know the actual oop strongness/weakness before we call into C2 >> Access API, this work models this after existing code for `refersTo0` >> intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a > merge or a rebase. The pull request now contains 18 commits: > > - Simplify: just do keep alive checks > - Merge branch 'master' into JDK-8329597-intrinsify-reference-clear > - More precise bit-unmasks > - Reconcile with late barrier expansion in G1 > - Merge branch 'master' into JDK-8329597-intrinsify-reference-clear > - Review comments > - Merge branch 'master' into JDK-8329597-intrinsify-reference-clear > - Also dispatch to slow-path on other arches > - Fix other arches > - Tighten up comments in Reference javadoc > - ... and 8 more: https://git.openjdk.org/jdk/compare/580eb62d...9f7ad7ab Thanks! I see Kim reviewed JDK parts, so we need another Reviewer for Hotspot parts. - PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2410579109
Re: RFR: 8329597: C2: Intrinsify Reference.clear [v9]
On Wed, 9 Oct 2024 08:44:34 GMT, Aleksey Shipilev wrote: >> [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native >> method for `Reference.clear`. The original patch skipped intrinsification of >> this method, because we thought `Reference.clear` is not on a performance >> sensitive path. However, it shows up prominently on simple benchmarks that >> touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with >> `RRWL` benchmarks. >> >> We need to know the actual oop strongness/weakness before we call into C2 >> Access API, this work models this after existing code for `refersTo0` >> intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a > merge or a rebase. The pull request now contains 18 commits: > > - Simplify: just do keep alive checks > - Merge branch 'master' into JDK-8329597-intrinsify-reference-clear > - More precise bit-unmasks > - Reconcile with late barrier expansion in G1 > - Merge branch 'master' into JDK-8329597-intrinsify-reference-clear > - Review comments > - Merge branch 'master' into JDK-8329597-intrinsify-reference-clear > - Also dispatch to slow-path on other arches > - Fix other arches > - Tighten up comments in Reference javadoc > - ... and 8 more: https://git.openjdk.org/jdk/compare/580eb62d...9f7ad7ab Looks good. - Marked as reviewed by eosterlund (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20139#pullrequestreview-2364968158
Re: RFR: 8329597: C2: Intrinsify Reference.clear [v8]
On Mon, 7 Oct 2024 08:15:21 GMT, Erik Österlund wrote: >> Aleksey Shipilev has updated the pull request incrementally with one >> additional commit since the last revision: >> >> More precise bit-unmasks > > src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 342: > >> 340: } >> 341: if ((on_weak || on_phantom) && no_keepalive) { >> 342: // Be extra paranoid around this path. Only accept null stores, > > I think there might be some orthogonal stuff that is unnecessarily mixed up > here. When no_keepalive is manually specified, then we shouldn't do the > pre-write barrier, regardless of reference strength. Similarly, when the new > value is null, we don't need to perform the post write barrier, regardless of > reference strength. Roberto added some code in refine_barrier_by_new_val_type > that already *should* take care of the latter part. It allows types to flow > around a bit, and then checks if the type of the new value is provably null, > and then removes the post write barrier. The existing logic for that should > be strictly more powerful than the new check you added, I think. > > Based on the above explanation, I think I'm proposing this block is replaced > with this simpler condition: > > if (no_keepalive) { > access.set_barrier_data(access.barrier_data() & ~G1C2BarrierPre); > } Right. We also do not need this complexity in Shenandoah barriers. This check was dragged here from the load barriers that _want_ to check if we are reading the `Reference.referent` and feed it to SATB _unless_ there is a no-keep-alive. For store barriers it is unnecessary, and we can just do keep-alive checks straight up. Should be done in new commit, testing now. - PR Review Comment: https://git.openjdk.org/jdk/pull/20139#discussion_r1793115683
Re: RFR: 8329597: C2: Intrinsify Reference.clear [v9]
> [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native > method for `Reference.clear`. The original patch skipped intrinsification of > this method, because we thought `Reference.clear` is not on a performance > sensitive path. However, it shows up prominently on simple benchmarks that > touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with > `RRWL` benchmarks. > > We need to know the actual oop strongness/weakness before we call into C2 > Access API, this work models this after existing code for `refersTo0` > intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 18 commits: - Simplify: just do keep alive checks - Merge branch 'master' into JDK-8329597-intrinsify-reference-clear - More precise bit-unmasks - Reconcile with late barrier expansion in G1 - Merge branch 'master' into JDK-8329597-intrinsify-reference-clear - Review comments - Merge branch 'master' into JDK-8329597-intrinsify-reference-clear - Also dispatch to slow-path on other arches - Fix other arches - Tighten up comments in Reference javadoc - ... and 8 more: https://git.openjdk.org/jdk/compare/580eb62d...9f7ad7ab - Changes: https://git.openjdk.org/jdk/pull/20139/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20139&range=08 Stats: 361 lines in 25 files changed: 340 ins; 0 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/20139.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20139/head:pull/20139 PR: https://git.openjdk.org/jdk/pull/20139
Re: RFR: 8329597: C2: Intrinsify Reference.clear [v8]
On Sun, 6 Oct 2024 14:43:09 GMT, Aleksey Shipilev wrote: >> [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native >> method for `Reference.clear`. The original patch skipped intrinsification of >> this method, because we thought `Reference.clear` is not on a performance >> sensitive path. However, it shows up prominently on simple benchmarks that >> touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with >> `RRWL` benchmarks. >> >> We need to know the actual oop strongness/weakness before we call into C2 >> Access API, this work models this after existing code for `refersTo0` >> intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one > additional commit since the last revision: > > More precise bit-unmasks One last thing... src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 342: > 340: } > 341: if ((on_weak || on_phantom) && no_keepalive) { > 342: // Be extra paranoid around this path. Only accept null stores, I think there might be some orthogonal stuff that is unnecessarily mixed up here. When no_keepalive is manually specified, then we shouldn't do the pre-write barrier, regardless of reference strength. Similarly, when the new value is null, we don't need to perform the post write barrier, regardless of reference strength. Roberto added some code in refine_barrier_by_new_val_type that already *should* take care of the latter part. It allows types to flow around a bit, and then checks if the type of the new value is provably null, and then removes the post write barrier. The existing logic for that should be strictly more powerful than the new check you added, I think. Based on the above explanation, I think I'm proposing this block is replaced with this simpler condition: if (no_keepalive) { access.set_barrier_data(access.barrier_data() & ~G1C2BarrierPre); } - Changes requested by eosterlund (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20139#pullrequestreview-2351232319 PR Review Comment: https://git.openjdk.org/jdk/pull/20139#discussion_r1789742277
Re: RFR: 8329597: C2: Intrinsify Reference.clear [v8]
On Sun, 6 Oct 2024 14:43:09 GMT, Aleksey Shipilev wrote: >> [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native >> method for `Reference.clear`. The original patch skipped intrinsification of >> this method, because we thought `Reference.clear` is not on a performance >> sensitive path. However, it shows up prominently on simple benchmarks that >> touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with >> `RRWL` benchmarks. >> >> We need to know the actual oop strongness/weakness before we call into C2 >> Access API, this work models this after existing code for `refersTo0` >> intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one > additional commit since the last revision: > > More precise bit-unmasks Tests pass with the new change. I eyeballed G1 perfasm output on new benchmark, and there are no barriers in sight as well. - PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2396092920
Re: RFR: 8329597: C2: Intrinsify Reference.clear [v6]
On Thu, 3 Oct 2024 16:54:40 GMT, Aleksey Shipilev wrote: >> test/micro/org/openjdk/bench/java/lang/ref/ReferenceClear.java line 2: >> >>> 1: // >>> 2: // * Copyright Amazon.com Inc. or its affiliates. All Rights Reserved. >> >> Drive-by comment: The `// *` format looks weird. > > Actually, this constellation of single-line comments should be replaced with > a multi-line comment block. The fix is now in my (unpublished) patch queue, > thanks. Yes, that's what I meant. Thanks. - PR Review Comment: https://git.openjdk.org/jdk/pull/20139#discussion_r1789539477
Re: RFR: 8329597: C2: Intrinsify Reference.clear [v8]
> [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native > method for `Reference.clear`. The original patch skipped intrinsification of > this method, because we thought `Reference.clear` is not on a performance > sensitive path. However, it shows up prominently on simple benchmarks that > touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with > `RRWL` benchmarks. > > We need to know the actual oop strongness/weakness before we call into C2 > Access API, this work models this after existing code for `refersTo0` > intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. > > Additional testing: > - [ ] Linux x86_64 server fastdebug, `all` > - [ ] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: More precise bit-unmasks - Changes: - all: https://git.openjdk.org/jdk/pull/20139/files - new: https://git.openjdk.org/jdk/pull/20139/files/c3338302..cf808b9a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20139&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20139&range=06-07 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20139.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20139/head:pull/20139 PR: https://git.openjdk.org/jdk/pull/20139
Re: RFR: 8329597: C2: Intrinsify Reference.clear [v7]
> [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native > method for `Reference.clear`. The original patch skipped intrinsification of > this method, because we thought `Reference.clear` is not on a performance > sensitive path. However, it shows up prominently on simple benchmarks that > touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with > `RRWL` benchmarks. > > We need to know the actual oop strongness/weakness before we call into C2 > Access API, this work models this after existing code for `refersTo0` > intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. > > Additional testing: > - [ ] Linux x86_64 server fastdebug, `all` > - [ ] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: - Reconcile with late barrier expansion in G1 - Merge branch 'master' into JDK-8329597-intrinsify-reference-clear - Review comments - Merge branch 'master' into JDK-8329597-intrinsify-reference-clear - Also dispatch to slow-path on other arches - Fix other arches - Tighten up comments in Reference javadoc - Attempt at implementing ZGC AArch64 parts - Merge branch 'master' into JDK-8329597-intrinsify-reference-clear - Amend the test case for guaranteing it works under different compilation regimes - ... and 5 more: https://git.openjdk.org/jdk/compare/ebb4759c...c3338302 - Changes: https://git.openjdk.org/jdk/pull/20139/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20139&range=06 Stats: 372 lines in 25 files changed: 351 ins; 0 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/20139.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20139/head:pull/20139 PR: https://git.openjdk.org/jdk/pull/20139
Re: RFR: 8329597: C2: Intrinsify Reference.clear [v6]
On Wed, 2 Oct 2024 10:47:09 GMT, Tobias Hartmann wrote: >> Aleksey Shipilev has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Also dispatch to slow-path on other arches > > test/micro/org/openjdk/bench/java/lang/ref/ReferenceClear.java line 2: > >> 1: // >> 2: // * Copyright Amazon.com Inc. or its affiliates. All Rights Reserved. > > Drive-by comment: The `// *` format looks weird. Actually, this constellation of single-line comments should be replaced with a multi-line comment block. The fix is now in my (unpublished) patch queue, thanks. - PR Review Comment: https://git.openjdk.org/jdk/pull/20139#discussion_r1786548233
Re: RFR: 8329597: C2: Intrinsify Reference.clear [v6]
On Thu, 3 Oct 2024 12:14:08 GMT, Erik Österlund wrote: >> Aleksey Shipilev has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Also dispatch to slow-path on other arches > > src/hotspot/share/opto/library_call.cpp line 7002: > >> 7000: // Add memory barrier to prevent commoning the accesses in this code, >> 7001: // since GC can change the value of referent at any time. >> 7002: insert_mem_bar(Op_MemBarCPUOrder); > > I think this CPU memory barrier and comment above are confusing and obviously > just taken from the referent loading intrinsics. The commoning it is talking > about is to short circuit a second load with the result of a first load of > the referent field, since the compiler "knows" the first load would give the > "same" answer unless "something happened" (GC clearing it). > In this case the mutator just cleared it, so what the compiler thinks is that > null is stored in that field. And that's completely accurate, and the GC is > not going to transition the field from null to some random other object. > Let's remove this CPU memory barrier and the misleading comments. Right. I removed it in my local patch queue. Now I have to reconcile this PR with late barrier expansion in G1, I'll push the update once that is done. - PR Review Comment: https://git.openjdk.org/jdk/pull/20139#discussion_r1786549120
Re: RFR: 8329597: C2: Intrinsify Reference.clear [v6]
On Mon, 30 Sep 2024 16:59:16 GMT, Aleksey Shipilev wrote: >> [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native >> method for `Reference.clear`. The original patch skipped intrinsification of >> this method, because we thought `Reference.clear` is not on a performance >> sensitive path. However, it shows up prominently on simple benchmarks that >> touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with >> `RRWL` benchmarks. >> >> We need to know the actual oop strongness/weakness before we call into C2 >> Access API, this work models this after existing code for `refersTo0` >> intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. >> >> Additional testing: >> - [ ] Linux x86_64 server fastdebug, `all` >> - [ ] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one > additional commit since the last revision: > > Also dispatch to slow-path on other arches Changes requested by eosterlund (Reviewer). src/hotspot/share/opto/library_call.cpp line 7002: > 7000: // Add memory barrier to prevent commoning the accesses in this code, > 7001: // since GC can change the value of referent at any time. > 7002: insert_mem_bar(Op_MemBarCPUOrder); I think this CPU memory barrier and comment above are confusing and obviously just taken from the referent loading intrinsics. The commoning it is talking about is to short circuit a second load with the result of a first load of the referent field, since the compiler "knows" the first load would give the "same" answer unless "something happened" (GC clearing it). In this case the mutator just cleared it, so what the compiler thinks is that null is stored in that field. And that's completely accurate, and the GC is not going to transition the field from null to some random other object. Let's remove this CPU memory barrier and the misleading comments. - PR Review: https://git.openjdk.org/jdk/pull/20139#pullrequestreview-2345438249 PR Review Comment: https://git.openjdk.org/jdk/pull/20139#discussion_r1786116003
Re: RFR: 8329597: C2: Intrinsify Reference.clear [v3]
On Mon, 30 Sep 2024 16:32:48 GMT, Aleksey Shipilev wrote: > > I think we need a new > > ZBarrierSetRuntime::no_keepalive_store_barrier_on_oop_field_without_healing(oop* > > p) and to make that the selected slow path function when > > ZBarrierNoKeepalive is set on a StorePNode. Its implementation would call > > ZBarrier::no_keep_alive_store_barrier_on_heap_oop_field. This should do the > > trick. > > Thanks! See new commits: is that the shape you were thinking of? Yeah that's perfect. Thanks! - PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2391252177
Re: RFR: 8329597: C2: Intrinsify Reference.clear [v6]
On Mon, 30 Sep 2024 16:59:16 GMT, Aleksey Shipilev wrote: >> [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native >> method for `Reference.clear`. The original patch skipped intrinsification of >> this method, because we thought `Reference.clear` is not on a performance >> sensitive path. However, it shows up prominently on simple benchmarks that >> touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with >> `RRWL` benchmarks. >> >> We need to know the actual oop strongness/weakness before we call into C2 >> Access API, this work models this after existing code for `refersTo0` >> intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. >> >> Additional testing: >> - [ ] Linux x86_64 server fastdebug, `all` >> - [ ] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one > additional commit since the last revision: > > Also dispatch to slow-path on other arches test/micro/org/openjdk/bench/java/lang/ref/ReferenceClear.java line 2: > 1: // > 2: // * Copyright Amazon.com Inc. or its affiliates. All Rights Reserved. Drive-by comment: The `// *` format looks weird. - PR Review Comment: https://git.openjdk.org/jdk/pull/20139#discussion_r1784292673
Re: RFR: 8329597: C2: Intrinsify Reference.clear [v6]
On Mon, 30 Sep 2024 16:59:16 GMT, Aleksey Shipilev wrote: >> [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native >> method for `Reference.clear`. The original patch skipped intrinsification of >> this method, because we thought `Reference.clear` is not on a performance >> sensitive path. However, it shows up prominently on simple benchmarks that >> touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with >> `RRWL` benchmarks. >> >> We need to know the actual oop strongness/weakness before we call into C2 >> Access API, this work models this after existing code for `refersTo0` >> intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. >> >> Additional testing: >> - [ ] Linux x86_64 server fastdebug, `all` >> - [ ] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one > additional commit since the last revision: > > Also dispatch to slow-path on other arches Removing my "Request changes" as request has been satisfied. I've only really looked at the changes in java.base, which look fine. I've skimmed some of the compiler code, but don't feel qualified to properly review it. So don't count or wait for me as a reviewer. - PR Review: https://git.openjdk.org/jdk/pull/20139#pullrequestreview-2338962626
Re: RFR: 8329597: C2: Intrinsify Reference.clear [v3]
On Mon, 30 Sep 2024 16:45:12 GMT, Aleksey Shipilev wrote: >> src/java.base/share/classes/java/lang/ref/Reference.java line 420: >> >>> 418: /* Implementation of clear(), also used by enqueue(). A simple >>> 419: * assignment of the referent field won't do for some garbage >>> 420: * collectors. >> >> Description of clear0 is rendered stale by this change. The first sentence >> is no longer true, since it's now >> clearImpl that has that role. The second sentence probably ought to also be >> moved into the description of >> clearImpl. > > Thanks! I tightened up comments a bit, take another look? Yes, that's better. Thanks. - PR Review Comment: https://git.openjdk.org/jdk/pull/20139#discussion_r1782006243
Re: RFR: 8329597: C2: Intrinsify Reference.clear [v6]
> [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native > method for `Reference.clear`. The original patch skipped intrinsification of > this method, because we thought `Reference.clear` is not on a performance > sensitive path. However, it shows up prominently on simple benchmarks that > touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with > `RRWL` benchmarks. > > We need to know the actual oop strongness/weakness before we call into C2 > Access API, this work models this after existing code for `refersTo0` > intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. > > Additional testing: > - [ ] Linux x86_64 server fastdebug, `all` > - [ ] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Also dispatch to slow-path on other arches - Changes: - all: https://git.openjdk.org/jdk/pull/20139/files - new: https://git.openjdk.org/jdk/pull/20139/files/8ba681a4..4fe4a911 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20139&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20139&range=04-05 Stats: 6 lines in 3 files changed: 6 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20139.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20139/head:pull/20139 PR: https://git.openjdk.org/jdk/pull/20139
Re: RFR: 8329597: C2: Intrinsify Reference.clear [v3]
On Fri, 27 Sep 2024 23:51:13 GMT, Kim Barrett wrote: >> Aleksey Shipilev has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Amend the test case for guaranteing it works under different compilation >> regimes > > src/java.base/share/classes/java/lang/ref/Reference.java line 420: > >> 418: /* Implementation of clear(), also used by enqueue(). A simple >> 419: * assignment of the referent field won't do for some garbage >> 420: * collectors. > > Description of clear0 is rendered stale by this change. The first sentence > is no longer true, since it's now > clearImpl that has that role. The second sentence probably ought to also be > moved into the description of > clearImpl. Thanks! I tightened up comments a bit, take another look? - PR Review Comment: https://git.openjdk.org/jdk/pull/20139#discussion_r1781452602
Re: RFR: 8329597: C2: Intrinsify Reference.clear [v5]
> [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native > method for `Reference.clear`. The original patch skipped intrinsification of > this method, because we thought `Reference.clear` is not on a performance > sensitive path. However, it shows up prominently on simple benchmarks that > touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with > `RRWL` benchmarks. > > We need to know the actual oop strongness/weakness before we call into C2 > Access API, this work models this after existing code for `refersTo0` > intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. > > Additional testing: > - [ ] Linux x86_64 server fastdebug, `all` > - [ ] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request incrementally with two additional commits since the last revision: - Fix other arches - Tighten up comments in Reference javadoc - Changes: - all: https://git.openjdk.org/jdk/pull/20139/files - new: https://git.openjdk.org/jdk/pull/20139/files/cba0a8e9..8ba681a4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20139&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20139&range=03-04 Stats: 16 lines in 4 files changed: 7 ins; 4 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/20139.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20139/head:pull/20139 PR: https://git.openjdk.org/jdk/pull/20139
Re: RFR: 8329597: C2: Intrinsify Reference.clear [v3]
On Mon, 30 Sep 2024 15:08:53 GMT, Erik Österlund wrote: > I think we need a new > ZBarrierSetRuntime::no_keepalive_store_barrier_on_oop_field_without_healing(oop* > p) and to make that the selected slow path function when ZBarrierNoKeepalive > is set on a StorePNode. Its implementation would call > ZBarrier::no_keep_alive_store_barrier_on_heap_oop_field. This should do the > trick. Thanks! See new commits: is that the shape you were thinking of? Once we get AArch64 parts right, I'll copy-paste that to other arches. - PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2383669579
Re: RFR: 8329597: C2: Intrinsify Reference.clear [v4]
> [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native > method for `Reference.clear`. The original patch skipped intrinsification of > this method, because we thought `Reference.clear` is not on a performance > sensitive path. However, it shows up prominently on simple benchmarks that > touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with > `RRWL` benchmarks. > > We need to know the actual oop strongness/weakness before we call into C2 > Access API, this work models this after existing code for `refersTo0` > intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: - Attempt at implementing ZGC AArch64 parts - Merge branch 'master' into JDK-8329597-intrinsify-reference-clear - Amend the test case for guaranteing it works under different compilation regimes - More precise barriers - Tests work - More touchups - Fixing the conditions, fixing the tests - Crude prototype, still failing the tests - Changes: - all: https://git.openjdk.org/jdk/pull/20139/files - new: https://git.openjdk.org/jdk/pull/20139/files/437f2329..cba0a8e9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20139&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20139&range=02-03 Stats: 258786 lines in 3084 files changed: 211178 ins; 30411 del; 17197 mod Patch: https://git.openjdk.org/jdk/pull/20139.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20139/head:pull/20139 PR: https://git.openjdk.org/jdk/pull/20139
Re: RFR: 8329597: C2: Intrinsify Reference.clear [v3]
On Fri, 27 Sep 2024 18:57:51 GMT, Aleksey Shipilev wrote: > > Is ZGC affected by this? I see only G1 and Shenandoah changes. > > Good question. > > ZGC expands the GC barriers late. This is why the IR test configuration that > tests ZGC shows the same result as with other collectors: no additional fluff > in IR. I would not expect we need anything else in late expansion for ZGC for > Reference.clear, but maybe I am tired and cannot see it. Can you confirm this > is fine, @fisk? ZGC needs some changes. Without doing anything, we propagate the AS_NO_KEEPALIVE decorator to the corresponding ZBarrierNoKeepalive bit being set in the barrier data of the StorePNode. However, we don't really do anything special with that information and we will in practice end up keeping the referent alive when clearing it with generational ZGC. The point of introducing the native implementation in the first place, was to make sure our GCs don't keep the referent alive when clearing it, as the user intention is clearly to not keep it alive. I think we need a new ZBarrierSetRuntime::no_keepalive_store_barrier_on_oop_field_without_healing(oop* p) and to make that the selected slow path function when ZBarrierNoKeepalive is set on a StorePNode. Its implementation would call ZBarrier::no_keep_alive_store_barrier_on_heap_oop_field. This should do the trick. Please let me know if you need further assistance. - PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2383479242
Re: RFR: 8329597: C2: Intrinsify Reference.clear [v3]
On Fri, 19 Jul 2024 15:52:14 GMT, Aleksey Shipilev wrote: >> [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native >> method for `Reference.clear`. The original patch skipped intrinsification of >> this method, because we thought `Reference.clear` is not on a performance >> sensitive path. However, it shows up prominently on simple benchmarks that >> touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with >> `RRWL` benchmarks. >> >> We need to know the actual oop strongness/weakness before we call into C2 >> Access API, this work models this after existing code for `refersTo0` >> intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one > additional commit since the last revision: > > Amend the test case for guaranteing it works under different compilation > regimes Changes requested by kbarrett (Reviewer). src/java.base/share/classes/java/lang/ref/Reference.java line 420: > 418: /* Implementation of clear(), also used by enqueue(). A simple > 419: * assignment of the referent field won't do for some garbage > 420: * collectors. Description of clear0 is rendered stale by this change. The first sentence is no longer true, since it's now clearImpl that has that role. The second sentence probably ought to also be moved into the description of clearImpl. - PR Review: https://git.openjdk.org/jdk/pull/20139#pullrequestreview-2334816850 PR Review Comment: https://git.openjdk.org/jdk/pull/20139#discussion_r1779311136
Re: RFR: 8329597: C2: Intrinsify Reference.clear [v3]
On Fri, 19 Jul 2024 15:52:14 GMT, Aleksey Shipilev wrote: >> [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native >> method for `Reference.clear`. The original patch skipped intrinsification of >> this method, because we thought `Reference.clear` is not on a performance >> sensitive path. However, it shows up prominently on simple benchmarks that >> touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with >> `RRWL` benchmarks. >> >> We need to know the actual oop strongness/weakness before we call into C2 >> Access API, this work models this after existing code for `refersTo0` >> intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one > additional commit since the last revision: > > Amend the test case for guaranteing it works under different compilation > regimes There is coming JEP for later G1 barriers expansion similar to ZGC. Will you still need this intrinsic after it? I assume Shenandoah will follow G1 later. - PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2379910959
Re: RFR: 8329597: C2: Intrinsify Reference.clear [v3]
On Fri, 27 Sep 2024 17:44:38 GMT, Vladimir Kozlov wrote: > Is ZGC affected by this? I see only G1 and Shenandoah changes. Good question. ZGC expands the GC barriers late. This is why the IR test configuration that tests ZGC shows the same result as with other collectors: no additional fluff in IR. I would not expect we need anything else in late expansion for ZGC for Reference.clear, but maybe I am tired and cannot see it. Can you confirm this is fine, @fisk? - PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2379881102
Re: RFR: 8329597: C2: Intrinsify Reference.clear [v3]
On Fri, 19 Jul 2024 15:52:14 GMT, Aleksey Shipilev wrote: >> [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native >> method for `Reference.clear`. The original patch skipped intrinsification of >> this method, because we thought `Reference.clear` is not on a performance >> sensitive path. However, it shows up prominently on simple benchmarks that >> touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with >> `RRWL` benchmarks. >> >> We need to know the actual oop strongness/weakness before we call into C2 >> Access API, this work models this after existing code for `refersTo0` >> intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one > additional commit since the last revision: > > Amend the test case for guaranteing it works under different compilation > regimes Is ZGC affected by this? I see only G1 and Shenandoah changes. - PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2379772205
Re: RFR: 8329597: C2: Intrinsify Reference.clear [v3]
On Fri, 19 Jul 2024 15:52:14 GMT, Aleksey Shipilev wrote: >> [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native >> method for `Reference.clear`. The original patch skipped intrinsification of >> this method, because we thought `Reference.clear` is not on a performance >> sensitive path. However, it shows up prominently on simple benchmarks that >> touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with >> `RRWL` benchmarks. >> >> We need to know the actual oop strongness/weakness before we call into C2 >> Access API, this work models this after existing code for `refersTo0` >> intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one > additional commit since the last revision: > > Amend the test case for guaranteing it works under different compilation > regimes We keep seeing `Reference.clear` native call on hot paths in services in JDK 17+. I would like to get this PR moving again. Please take a look :) - PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2379346593
Re: RFR: 8329597: C2: Intrinsify Reference.clear [v3]
On Fri, 19 Jul 2024 15:52:14 GMT, Aleksey Shipilev wrote: >> [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native >> method for `Reference.clear`. The original patch skipped intrinsification of >> this method, because we thought `Reference.clear` is not on a performance >> sensitive path. However, it shows up prominently on simple benchmarks that >> touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with >> `RRWL` benchmarks. >> >> We need to know the actual oop strongness/weakness before we call into C2 >> Access API, this work models this after existing code for `refersTo0` >> intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one > additional commit since the last revision: > > Amend the test case for guaranteing it works under different compilation > regimes Not now, bot. Still waiting for reviews. - PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2295989638
Re: RFR: 8329597: C2: Intrinsify Reference.clear [v3]
> [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native > method for `Reference.clear`. The original patch skipped intrinsification of > this method, because we thought `Reference.clear` is not on a performance > sensitive path. However, it shows up prominently on simple benchmarks that > touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with > `RRWL` benchmarks. > > We need to know the actual oop strongness/weakness before we call into C2 > Access API, this work models this after existing code for `refersTo0` > intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Amend the test case for guaranteing it works under different compilation regimes - Changes: - all: https://git.openjdk.org/jdk/pull/20139/files - new: https://git.openjdk.org/jdk/pull/20139/files/79ece901..437f2329 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20139&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20139&range=01-02 Stats: 36 lines in 1 file changed: 18 ins; 7 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/20139.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20139/head:pull/20139 PR: https://git.openjdk.org/jdk/pull/20139
Re: RFR: 8329597: C2: Intrinsify Reference.clear
On Thu, 11 Jul 2024 15:28:37 GMT, Aleksey Shipilev wrote: > [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native > method for `Reference.clear`. The original patch skipped intrinsification of > this method, because we thought `Reference.clear` is not on a performance > sensitive path. However, it shows up prominently on simple benchmarks that > touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with > `RRWL` benchmarks. > > We need to know the actual oop strongness/weakness before we call into C2 > Access API, this work models this after existing code for `refersTo0` > intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `all` > - [ ] Linux AArch64 server fastdebug, `all` Split out the `refersTo` test to https://github.com/openjdk/jdk/pull/20215. Yeah, so this version seems to work well on tests. I am being extra paranoid about only accepting `null` stores, since `AS_NO_KEEPALIVE` means all other barriers like inter-generational post-barriers in G1 should still work. G1 barrier set delegates the stores to `CardTable/ModRefBarrierSet`, which: a) does not know which barriers can be bypassed by `AS_NO_KEEPALIVE`; b) calls back `G1BarrierSet` for prebarrier generation, but already loses the decorators. So the simplest way to deal with this is to handle this special case specially. I think this is insanely sane, given how sharp-edged `AS_NO_KEEPALIVE` is. - PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2233066721 PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2234005550
Re: RFR: 8329597: C2: Intrinsify Reference.clear
On Fri, 12 Jul 2024 13:19:31 GMT, Aleksey Shipilev wrote: >>> > The reason we did not do this before is that this is not a strong >>> > reference store. Strong reference stores with a SATB collector will keep >>> > the referent alive, which is typically the exact opposite of what a user >>> > wants when they clear a Reference. >>> >>> You mean not doing this store just on the Java side? Yes, I agree, it would >>> be awkward. In intrinsic, we are storing with the same decorators that >>> `JVM_ReferenceClear` is using, which should be good with SATB collectors. >>> Perhaps I am misunderstanding the comment. >> >> The runtime use of the Access API knows how to resolve an unknown oop ref >> strength using AccessBarrierSupport::resolve_unknown_oop_ref_strength. >> However, we do not have support for that in the C2 backend. In fact, it does >> not understand non-strong oop stores at all. Because there hasn't really >> been a use case for it, other than clearing a Reference. That's the precise >> reason why we do not have a clear intrinsic; it would have to add that >> infrastructure. > >> The runtime use of the Access API knows how to resolve an unknown oop ref >> strength using AccessBarrierSupport::resolve_unknown_oop_ref_strength. >> However, we do not have support for that in the C2 backend. In fact, it does >> not understand non-strong oop stores at all. > > Aw, nice usability landmine. I thought C2 barrier set would assert on me if > it cannot deliver. Apparently not, I see it just does pre-barriers when it is > not sure what strongness the store is. Hrmpf. OK, let me see what can be done > here. It might be just easier to further specialize `Reference.clear` in > subclasses and carry down the actual strongness, like we do with `refersTo0` > currently. This would still require C2 backend adjustments to handle > `AS_NO_KEEPALIVE` on stores, but at least we would not have to guess about > the strongness type in C2 intrinsic. > I should have read what I was replying to more carefully, rather than > focusing on what was further up in the thread. Looks like you (@shipilev) > already spotted the refersTo stuff. But the enqueue => clear0 could have > easily been missed, so perhaps not an entirely unneeded suggestion. Yeah, thanks. The `enqueue => clear0` was indeed easy to miss. Pushed the crude prototype that follows `refersTo` example and drills some new `AS_NO_KEEPALIVE` holes in C2 Access API to cover this intrinsic case. Super untested. IR tests are still failing, I'll take more in-depth look there. (Perhaps it would not be possible to clearly match the absence of pre-barrier in IR tests, we'll see.) - PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2231613218
Re: RFR: 8329597: C2: Intrinsify Reference.clear
On Mon, 15 Jul 2024 16:09:39 GMT, Kim Barrett wrote: > > Aw, nice usability landmine. I thought C2 barrier set would assert on me if > > it cannot deliver. Apparently not, [...] > > Reference.refersTo has similar issues. See refersToImpl and refersTo0 in both > Reference and PhantomReference. I think you should be able to model on those > and the intrinsic implementation for refersTo to get what you want. > > One additional complication is that Reference.enqueue intentionally calls > clear0. If implementing clear similarly to refersTo, then enqueue should be > changed to call clearImpl. I should have read what I was replying to more carefully, rather than focusing on what was further up in the thread. Looks like you (@shipilev) already spotted the refersTo stuff. But the enqueue => clear0 could have easily been missed, so perhaps not an entirely unneeded suggestion. - PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2231464762
Re: RFR: 8329597: C2: Intrinsify Reference.clear
On Fri, 12 Jul 2024 13:19:31 GMT, Aleksey Shipilev wrote: > > The runtime use of the Access API knows how to resolve an unknown oop ref > > strength using AccessBarrierSupport::resolve_unknown_oop_ref_strength. > > However, we do not have support for that in the C2 backend. In fact, it > > does not understand non-strong oop stores at all. > > Aw, nice usability landmine. I thought C2 barrier set would assert on me if > it cannot deliver. Apparently not, [...] Reference.refersTo has similar issues. See refersToImpl and refersTo0 in both Reference and PhantomReference. I think you should be able to model on those and the intrinsic implementation for refersTo to get what you want. One additional complication is that Reference.enqueue intentionally calls clear0. If implementing clear similarly to refersTo, then enqueue should be changed to call clearImpl. - PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2228872926
Re: RFR: 8329597: C2: Intrinsify Reference.clear [v2]
> [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native > method for `Reference.clear`. The original patch skipped intrinsification of > this method, because we thought `Reference.clear` is not on a performance > sensitive path. However, it shows up prominently on simple benchmarks that > touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with > `RRWL` benchmarks. > > We need to know the actual oop strongness/weakness before we call into C2 > Access API, this work models this after existing code for `refersTo0` > intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `all` > - [ ] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: - More precise barriers - Tests work - More touchups - Fixing the conditions, fixing the tests - Crude prototype, still failing the tests - Changes: https://git.openjdk.org/jdk/pull/20139/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20139&range=01 Stats: 329 lines in 13 files changed: 328 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20139.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20139/head:pull/20139 PR: https://git.openjdk.org/jdk/pull/20139
Re: RFR: 8329597: C2: Intrinsify Reference.clear
On Fri, 12 Jul 2024 11:57:56 GMT, Erik Österlund wrote: > The runtime use of the Access API knows how to resolve an unknown oop ref > strength using AccessBarrierSupport::resolve_unknown_oop_ref_strength. > However, we do not have support for that in the C2 backend. In fact, it does > not understand non-strong oop stores at all. Aw, nice usability landmine. I thought C2 barrier set would assert on me if it cannot deliver. Apparently not, I see it just does pre-barriers when it is not sure what strongness the store is. Hrmpf. OK, let me see what can be done here. - PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2225577027
Re: RFR: 8329597: C2: Intrinsify Reference.clear
On Fri, 12 Jul 2024 10:22:42 GMT, Aleksey Shipilev wrote: > > The reason we did not do this before is that this is not a strong reference > > store. Strong reference stores with a SATB collector will keep the referent > > alive, which is typically the exact opposite of what a user wants when they > > clear a Reference. > > You mean not doing this store just on the Java side? Yes, I agree, it would > be awkward. In intrinsic, we are storing with the same decorators that > `JVM_ReferenceClear` is using, which should be good with SATB collectors. > Perhaps I am misunderstanding the comment. The runtime use of the Access API knows how to resolve an unknown oop ref strength using AccessBarrierSupport::resolve_unknown_oop_ref_strength. However, we do not have support for that in the C2 backend. In fact, it does not understand non-strong oop stores at all. Because there hasn't really been a use case for it, other than clearing a Reference. That's the precise reason why we do not have a clear intrinsic; it would have to add that infrastructure. - PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2225430174
Re: RFR: 8329597: C2: Intrinsify Reference.clear
On Fri, 12 Jul 2024 10:16:13 GMT, Erik Österlund wrote: > The reason we did not do this before is that this is not a strong reference > store. Strong reference stores with a SATB collector will keep the referent > alive, which is typically the exact opposite of what a user wants when they > clear a Reference. You mean not doing this store just on the Java side? Yes, I agree, it would be awkward. In intrinsic, we are storing with the same decorators that `JVM_ReferenceClear` is using, which should be good with SATB collectors. Perhaps I am misunderstanding the comment. - PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2225277261
Re: RFR: 8329597: C2: Intrinsify Reference.clear
On Thu, 11 Jul 2024 15:28:37 GMT, Aleksey Shipilev wrote: > [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native > method for `Reference.clear`. The original patch skipped intrinsification of > this method, because we thought `Reference.clear` is not on a performance > sensitive path. However, it shows up prominently on simple benchmarks that > touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with > `RRWL` benchmarks. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `all` > - [ ] Linux AArch64 server fastdebug, `all` The reason we did not do this before is that this is not a strong reference store. Strong reference stores with a SATB collector will keep the referent alive, which is typically the exact opposite of what a user wants when they clear a Reference. - PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2225266939
Re: RFR: 8329597: C2: Intrinsify Reference.clear
On Thu, 11 Jul 2024 15:28:37 GMT, Aleksey Shipilev wrote: > [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native > method for `Reference.clear`. The original patch skipped intrinsification of > this method, because we thought `Reference.clear` is not on a performance > sensitive path. However, it shows up prominently on simple benchmarks that > touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with > `RRWL` benchmarks. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `all` > - [ ] Linux AArch64 server fastdebug, `all` On Mac AArch64, which suffers from both native call and WX transition: Benchmark Mode Cnt Score Error Units # Intrinsic OFF ReferenceClear.phantom avgt9 52,297 ? 0,294 ns/op ReferenceClear.phantom_new avgt9 57,075 ? 0,296 ns/op ReferenceClear.soft avgt9 52,567 ? 0,393 ns/op ReferenceClear.soft_new avgt9 57,640 ? 0,264 ns/op ReferenceClear.weak avgt9 53,018 ? 1,285 ns/op ReferenceClear.weak_new avgt9 57,227 ? 0,483 ns/op # Intrinsic ON (default) ReferenceClear.phantom avgt9 0,780 ? 0,017 ns/op ReferenceClear.soft avgt9 0,784 ? 0,022 ns/op ReferenceClear.weak avgt9 0,793 ? 0,033 ns/op ReferenceClear.phantom_new avgt9 3,018 ? 0,015 ns/op ReferenceClear.soft_new avgt9 3,268 ? 0,014 ns/op ReferenceClear.weak_new avgt9 3,004 ? 0,057 ns/op On x86_64 m7a.16xlarge, which only suffers from the native call: Benchmark Mode Cnt Score Error Units # Intrinsic OFF ReferenceClear.phantom avgt9 14.643 ± 0.049 ns/op ReferenceClear.soft avgt9 14.939 ± 0.438 ns/op ReferenceClear.weak avgt9 14.648 ± 0.081 ns/op ReferenceClear.phantom_new avgt9 19.859 ± 2.405 ns/op ReferenceClear.soft_new avgt9 20.208 ± 1.805 ns/op ReferenceClear.weak_new avgt9 20.385 ± 2.570 ns/op # Intrinsic ON (default) ReferenceClear.phantom avgt9 0.821 ± 0.010 ns/op ReferenceClear.soft avgt9 0.817 ± 0.007 ns/op ReferenceClear.weak avgt9 0.819 ± 0.010 ns/op ReferenceClear.phantom_new avgt9 4.195 ± 0.729 ns/op ReferenceClear.soft_new avgt9 4.315 ± 0.599 ns/op ReferenceClear.weak_new avgt9 3.986 ± 0.596 ns/op - PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2223248114
RFR: 8329597: C2: Intrinsify Reference.clear
[JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native method for `Reference.clear`. The original patch skipped intrinsification of this method, because we thought `Reference.clear` is not on a performance sensitive path. However, it shows up prominently on simple benchmarks that touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with `RRWL` benchmarks. Additional testing: - [x] Linux x86_64 server fastdebug, `all` - [ ] Linux AArch64 server fastdebug, `all` - Commit messages: - Move the membar at the end - Revert C1 parts - Work Changes: https://git.openjdk.org/jdk/pull/20139/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20139&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8329597 Stats: 132 lines in 7 files changed: 132 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20139.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20139/head:pull/20139 PR: https://git.openjdk.org/jdk/pull/20139