Re: RFR: 8276096: Simplify Unsafe.{load|store}Fence fallbacks by delegating to fullFence [v2]
On Mon, 1 Nov 2021 07:36:53 GMT, Aleksey Shipilev wrote: >> `Unsafe.{load|store}Fence` falls back to `unsafe.cpp` for >> `OrderAccess::{acquire|release}Fence()`. It seems too heavy-handed >> (useless?) to call to runtime for a single memory barrier. We can simplify >> the native `Unsafe` interface by falling back to `fullFence` when >> `{load|store}Fence` intrinsics are not available. This would be similar to >> what `Unsafe.{loadLoad|storeStore}Fences` do. >> >> This is the behavior of these intrinsics now, on x86_64, using benchmarks >> from JDK-8276054: >> >> >> Benchmark Mode Cnt Score Error Units >> >> # Default >> Single.acquire avgt3 0.407 ± 0.060 ns/op >> Single.fullavgt3 4.693 ± 0.005 ns/op >> Single.loadLoadavgt3 0.415 ± 0.095 ns/op >> Single.plain avgt3 0.406 ± 0.002 ns/op >> Single.release avgt3 0.408 ± 0.047 ns/op >> Single.storeStore avgt3 0.408 ± 0.043 ns/op >> >> # -XX:DisableIntrinsic=_storeFence >> Single.acquire avgt3 0.408 ± 0.016 ns/op >> Single.fullavgt3 4.694 ± 0.002 ns/op >> Single.loadLoadavgt3 0.406 ± 0.002 ns/op >> Single.plain avgt3 0.406 ± 0.001 ns/op >> Single.release avgt3 4.694 ± 0.003 ns/op <--- upgraded to full >> Single.storeStore avgt3 4.690 ± 0.005 ns/op <--- upgraded to full >> >> # -XX:DisableIntrinsic=_loadFence >> Single.acquire avgt3 4.691 ± 0.001 ns/op <--- upgraded to full >> Single.fullavgt3 4.693 ± 0.009 ns/op >> Single.loadLoadavgt3 4.693 ± 0.013 ns/op <--- upgraded to full >> Single.plain avgt3 0.408 ± 0.072 ns/op >> Single.release avgt3 0.415 ± 0.016 ns/op >> Single.storeStore avgt3 0.416 ± 0.041 ns/op >> >> # -XX:DisableIntrinsic=_fullFence >> Single.acquire avgt3 0.406 ± 0.014 ns/op >> Single.fullavgt3 15.836 ± 0.151 ns/op <--- calls runtime >> Single.loadLoadavgt3 0.406 ± 0.001 ns/op >> Single.plain avgt3 0.426 ± 0.361 ns/op >> Single.release avgt3 0.407 ± 0.021 ns/op >> Single.storeStore avgt3 0.410 ± 0.061 ns/op >> >> # -XX:DisableIntrinsic=_fullFence,_loadFence >> Single.acquire avgt3 15.822 ± 0.282 ns/op <--- upgraded, calls >> runtime >> Single.fullavgt3 15.851 ± 0.127 ns/op <--- calls runtime >> Single.loadLoadavgt3 15.829 ± 0.045 ns/op <--- upgraded, calls >> runtime >> Single.plain avgt3 0.406 ± 0.001 ns/op >> Single.release avgt3 0.414 ± 0.156 ns/op >> Single.storeStore avgt3 0.422 ± 0.452 ns/op >> >> # -XX:DisableIntrinsic=_fullFence,_storeFence >> Single.acquire avgt3 0.407 ± 0.016 ns/op >> Single.fullavgt3 15.347 ± 6.783 ns/op <--- calls runtime >> Single.loadLoadavgt3 0.406 ± 0.001 ns/op >> Single.plain avgt3 0.406 ± 0.002 ns/op >> Single.release avgt3 15.828 ± 0.019 ns/op <--- upgraded, calls >> runtime >> Single.storeStore avgt3 15.834 ± 0.045 ns/op <--- upgraded, calls >> runtime >> >> # -XX:DisableIntrinsic=_fullFence,_loadFence,_storeFence >> Single.acquire avgt3 15.838 ± 0.030 ns/op <--- upgraded, calls >> runtime >> Single.fullavgt3 15.854 ± 0.277 ns/op <--- calls runtime >> Single.loadLoadavgt3 15.826 ± 0.160 ns/op <--- upgraded, calls >> runtime >> Single.plain avgt3 0.406 ± 0.003 ns/op >> Single.release avgt3 15.838 ± 0.019 ns/op <--- upgraded, calls >> runtime >> Single.storeStore avgt3 15.844 ± 0.104 ns/op <--- upgraded, calls >> runtime >> >> >> Additional testing: >> - [x] Linux x86_64 fastdebug `tier1` > > Aleksey Shipilev has updated the pull request incrementally with one > additional commit since the last revision: > > Restore RN for fullFence Thanks! - PR: https://git.openjdk.java.net/jdk/pull/6149
Re: RFR: 8276096: Simplify Unsafe.{load|store}Fence fallbacks by delegating to fullFence [v2]
On Mon, 1 Nov 2021 07:36:53 GMT, Aleksey Shipilev wrote: >> `Unsafe.{load|store}Fence` falls back to `unsafe.cpp` for >> `OrderAccess::{acquire|release}Fence()`. It seems too heavy-handed >> (useless?) to call to runtime for a single memory barrier. We can simplify >> the native `Unsafe` interface by falling back to `fullFence` when >> `{load|store}Fence` intrinsics are not available. This would be similar to >> what `Unsafe.{loadLoad|storeStore}Fences` do. >> >> This is the behavior of these intrinsics now, on x86_64, using benchmarks >> from JDK-8276054: >> >> >> Benchmark Mode Cnt Score Error Units >> >> # Default >> Single.acquire avgt3 0.407 ± 0.060 ns/op >> Single.fullavgt3 4.693 ± 0.005 ns/op >> Single.loadLoadavgt3 0.415 ± 0.095 ns/op >> Single.plain avgt3 0.406 ± 0.002 ns/op >> Single.release avgt3 0.408 ± 0.047 ns/op >> Single.storeStore avgt3 0.408 ± 0.043 ns/op >> >> # -XX:DisableIntrinsic=_storeFence >> Single.acquire avgt3 0.408 ± 0.016 ns/op >> Single.fullavgt3 4.694 ± 0.002 ns/op >> Single.loadLoadavgt3 0.406 ± 0.002 ns/op >> Single.plain avgt3 0.406 ± 0.001 ns/op >> Single.release avgt3 4.694 ± 0.003 ns/op <--- upgraded to full >> Single.storeStore avgt3 4.690 ± 0.005 ns/op <--- upgraded to full >> >> # -XX:DisableIntrinsic=_loadFence >> Single.acquire avgt3 4.691 ± 0.001 ns/op <--- upgraded to full >> Single.fullavgt3 4.693 ± 0.009 ns/op >> Single.loadLoadavgt3 4.693 ± 0.013 ns/op <--- upgraded to full >> Single.plain avgt3 0.408 ± 0.072 ns/op >> Single.release avgt3 0.415 ± 0.016 ns/op >> Single.storeStore avgt3 0.416 ± 0.041 ns/op >> >> # -XX:DisableIntrinsic=_fullFence >> Single.acquire avgt3 0.406 ± 0.014 ns/op >> Single.fullavgt3 15.836 ± 0.151 ns/op <--- calls runtime >> Single.loadLoadavgt3 0.406 ± 0.001 ns/op >> Single.plain avgt3 0.426 ± 0.361 ns/op >> Single.release avgt3 0.407 ± 0.021 ns/op >> Single.storeStore avgt3 0.410 ± 0.061 ns/op >> >> # -XX:DisableIntrinsic=_fullFence,_loadFence >> Single.acquire avgt3 15.822 ± 0.282 ns/op <--- upgraded, calls >> runtime >> Single.fullavgt3 15.851 ± 0.127 ns/op <--- calls runtime >> Single.loadLoadavgt3 15.829 ± 0.045 ns/op <--- upgraded, calls >> runtime >> Single.plain avgt3 0.406 ± 0.001 ns/op >> Single.release avgt3 0.414 ± 0.156 ns/op >> Single.storeStore avgt3 0.422 ± 0.452 ns/op >> >> # -XX:DisableIntrinsic=_fullFence,_storeFence >> Single.acquire avgt3 0.407 ± 0.016 ns/op >> Single.fullavgt3 15.347 ± 6.783 ns/op <--- calls runtime >> Single.loadLoadavgt3 0.406 ± 0.001 ns/op >> Single.plain avgt3 0.406 ± 0.002 ns/op >> Single.release avgt3 15.828 ± 0.019 ns/op <--- upgraded, calls >> runtime >> Single.storeStore avgt3 15.834 ± 0.045 ns/op <--- upgraded, calls >> runtime >> >> # -XX:DisableIntrinsic=_fullFence,_loadFence,_storeFence >> Single.acquire avgt3 15.838 ± 0.030 ns/op <--- upgraded, calls >> runtime >> Single.fullavgt3 15.854 ± 0.277 ns/op <--- calls runtime >> Single.loadLoadavgt3 15.826 ± 0.160 ns/op <--- upgraded, calls >> runtime >> Single.plain avgt3 0.406 ± 0.003 ns/op >> Single.release avgt3 15.838 ± 0.019 ns/op <--- upgraded, calls >> runtime >> Single.storeStore avgt3 15.844 ± 0.104 ns/op <--- upgraded, calls >> runtime >> >> >> Additional testing: >> - [x] Linux x86_64 fastdebug `tier1` > > Aleksey Shipilev has updated the pull request incrementally with one > additional commit since the last revision: > > Restore RN for fullFence Marked as reviewed by dholmes (Reviewer). - PR: https://git.openjdk.java.net/jdk/pull/6149
Re: RFR: 8276096: Simplify Unsafe.{load|store}Fence fallbacks by delegating to fullFence [v2]
On Mon, 1 Nov 2021 07:32:18 GMT, Aleksey Shipilev wrote: >> src/hotspot/share/classfile/vmIntrinsics.hpp line 526: >> >>> 524:do_name( storeFence_name, >>> "storeFence")\ >>> 525:do_alias(storeFence_signature, >>> void_method_signature) \ >>> 526: do_intrinsic(_fullFence,jdk_internal_misc_Unsafe, >>> fullFence_name, fullFence_signature, F_R) \ >> >> Why did you drop the N from F_RN? AFAICS the fullFence method is still >> native. > > Good spot! That's indeed incorrect, fixed in new commit. I am surprised > `CheckIntrinsics` did not found this discrepancy. I believe "native" flags > are not checked at all? For example, existing `_hashCode` intrinsic is also > `F_R`, while it covers the native `java.lang.Object::hashCode`. I try to beef > up those checks separately. This `CheckIntrinsics` oddity is handled by #6187. - PR: https://git.openjdk.java.net/jdk/pull/6149
Re: RFR: 8276096: Simplify Unsafe.{load|store}Fence fallbacks by delegating to fullFence [v2]
On Mon, 1 Nov 2021 02:09:19 GMT, David Holmes wrote: >> Aleksey Shipilev has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Restore RN for fullFence > > src/hotspot/share/classfile/vmIntrinsics.hpp line 526: > >> 524:do_name( storeFence_name, >> "storeFence")\ >> 525:do_alias(storeFence_signature, >> void_method_signature) \ >> 526: do_intrinsic(_fullFence,jdk_internal_misc_Unsafe, >> fullFence_name, fullFence_signature, F_R) \ > > Why did you drop the N from F_RN? AFAICS the fullFence method is still native. Good spot! That's indeed incorrect, fixed in new commit. I am surprised `CheckIntrinsics` did not found this discrepancy. I believe "native" flags are not checked at all? For example, existing `_hashCode` intrinsic is also `F_R`, while it covers the native `java.lang.Object::hashCode`. I try to beef up those checks separately. - PR: https://git.openjdk.java.net/jdk/pull/6149
Re: RFR: 8276096: Simplify Unsafe.{load|store}Fence fallbacks by delegating to fullFence [v2]
> `Unsafe.{load|store}Fence` falls back to `unsafe.cpp` for > `OrderAccess::{acquire|release}Fence()`. It seems too heavy-handed (useless?) > to call to runtime for a single memory barrier. We can simplify the native > `Unsafe` interface by falling back to `fullFence` when `{load|store}Fence` > intrinsics are not available. This would be similar to what > `Unsafe.{loadLoad|storeStore}Fences` do. > > This is the behavior of these intrinsics now, on x86_64, using benchmarks > from JDK-8276054: > > > Benchmark Mode Cnt Score Error Units > > # Default > Single.acquire avgt3 0.407 ± 0.060 ns/op > Single.fullavgt3 4.693 ± 0.005 ns/op > Single.loadLoadavgt3 0.415 ± 0.095 ns/op > Single.plain avgt3 0.406 ± 0.002 ns/op > Single.release avgt3 0.408 ± 0.047 ns/op > Single.storeStore avgt3 0.408 ± 0.043 ns/op > > # -XX:DisableIntrinsic=_storeFence > Single.acquire avgt3 0.408 ± 0.016 ns/op > Single.fullavgt3 4.694 ± 0.002 ns/op > Single.loadLoadavgt3 0.406 ± 0.002 ns/op > Single.plain avgt3 0.406 ± 0.001 ns/op > Single.release avgt3 4.694 ± 0.003 ns/op <--- upgraded to full > Single.storeStore avgt3 4.690 ± 0.005 ns/op <--- upgraded to full > > # -XX:DisableIntrinsic=_loadFence > Single.acquire avgt3 4.691 ± 0.001 ns/op <--- upgraded to full > Single.fullavgt3 4.693 ± 0.009 ns/op > Single.loadLoadavgt3 4.693 ± 0.013 ns/op <--- upgraded to full > Single.plain avgt3 0.408 ± 0.072 ns/op > Single.release avgt3 0.415 ± 0.016 ns/op > Single.storeStore avgt3 0.416 ± 0.041 ns/op > > # -XX:DisableIntrinsic=_fullFence > Single.acquire avgt3 0.406 ± 0.014 ns/op > Single.fullavgt3 15.836 ± 0.151 ns/op <--- calls runtime > Single.loadLoadavgt3 0.406 ± 0.001 ns/op > Single.plain avgt3 0.426 ± 0.361 ns/op > Single.release avgt3 0.407 ± 0.021 ns/op > Single.storeStore avgt3 0.410 ± 0.061 ns/op > > # -XX:DisableIntrinsic=_fullFence,_loadFence > Single.acquire avgt3 15.822 ± 0.282 ns/op <--- upgraded, calls > runtime > Single.fullavgt3 15.851 ± 0.127 ns/op <--- calls runtime > Single.loadLoadavgt3 15.829 ± 0.045 ns/op <--- upgraded, calls > runtime > Single.plain avgt3 0.406 ± 0.001 ns/op > Single.release avgt3 0.414 ± 0.156 ns/op > Single.storeStore avgt3 0.422 ± 0.452 ns/op > > # -XX:DisableIntrinsic=_fullFence,_storeFence > Single.acquire avgt3 0.407 ± 0.016 ns/op > Single.fullavgt3 15.347 ± 6.783 ns/op <--- calls runtime > Single.loadLoadavgt3 0.406 ± 0.001 ns/op > Single.plain avgt3 0.406 ± 0.002 ns/op > Single.release avgt3 15.828 ± 0.019 ns/op <--- upgraded, calls > runtime > Single.storeStore avgt3 15.834 ± 0.045 ns/op <--- upgraded, calls > runtime > > # -XX:DisableIntrinsic=_fullFence,_loadFence,_storeFence > Single.acquire avgt3 15.838 ± 0.030 ns/op <--- upgraded, calls > runtime > Single.fullavgt3 15.854 ± 0.277 ns/op <--- calls runtime > Single.loadLoadavgt3 15.826 ± 0.160 ns/op <--- upgraded, calls > runtime > Single.plain avgt3 0.406 ± 0.003 ns/op > Single.release avgt3 15.838 ± 0.019 ns/op <--- upgraded, calls > runtime > Single.storeStore avgt3 15.844 ± 0.104 ns/op <--- upgraded, calls > runtime > > > Additional testing: > - [x] Linux x86_64 fastdebug `tier1` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Restore RN for fullFence - Changes: - all: https://git.openjdk.java.net/jdk/pull/6149/files - new: https://git.openjdk.java.net/jdk/pull/6149/files/e2c623be..a0fd03ee Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk=6149=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk=6149=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6149.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6149/head:pull/6149 PR: https://git.openjdk.java.net/jdk/pull/6149