Re: RFR: 8276096: Simplify Unsafe.{load|store}Fence fallbacks by delegating to fullFence [v2]

2021-11-04 Thread Aleksey Shipilev
On Mon, 1 Nov 2021 07:36:53 GMT, Aleksey Shipilev  wrote:

>> `Unsafe.{load|store}Fence` falls back to `unsafe.cpp` for 
>> `OrderAccess::{acquire|release}Fence()`. It seems too heavy-handed 
>> (useless?) to call to runtime for a single memory barrier. We can simplify 
>> the native `Unsafe` interface by falling back to `fullFence` when 
>> `{load|store}Fence` intrinsics are not available. This would be similar to 
>> what `Unsafe.{loadLoad|storeStore}Fences` do. 
>> 
>> This is the behavior of these intrinsics now, on x86_64, using benchmarks 
>> from JDK-8276054:
>> 
>> 
>> Benchmark  Mode  Cnt  Score   Error  Units
>> 
>> # Default
>> Single.acquire avgt3   0.407 ± 0.060  ns/op
>> Single.fullavgt3   4.693 ± 0.005  ns/op
>> Single.loadLoadavgt3   0.415 ± 0.095  ns/op
>> Single.plain   avgt3   0.406 ± 0.002  ns/op
>> Single.release avgt3   0.408 ± 0.047  ns/op
>> Single.storeStore  avgt3   0.408 ± 0.043  ns/op
>> 
>> # -XX:DisableIntrinsic=_storeFence
>> Single.acquire avgt3   0.408 ± 0.016  ns/op
>> Single.fullavgt3   4.694 ± 0.002  ns/op
>> Single.loadLoadavgt3   0.406 ± 0.002  ns/op
>> Single.plain   avgt3   0.406 ± 0.001  ns/op
>> Single.release avgt3   4.694 ± 0.003  ns/op <--- upgraded to full
>> Single.storeStore  avgt3   4.690 ± 0.005  ns/op <--- upgraded to full
>> 
>> # -XX:DisableIntrinsic=_loadFence
>> Single.acquire avgt3   4.691 ± 0.001  ns/op <--- upgraded to full
>> Single.fullavgt3   4.693 ± 0.009  ns/op
>> Single.loadLoadavgt3   4.693 ± 0.013  ns/op <--- upgraded to full
>> Single.plain   avgt3   0.408 ± 0.072  ns/op
>> Single.release avgt3   0.415 ± 0.016  ns/op
>> Single.storeStore  avgt3   0.416 ± 0.041  ns/op
>> 
>> # -XX:DisableIntrinsic=_fullFence
>> Single.acquire avgt3   0.406 ± 0.014  ns/op
>> Single.fullavgt3  15.836 ± 0.151  ns/op <--- calls runtime
>> Single.loadLoadavgt3   0.406 ± 0.001  ns/op
>> Single.plain   avgt3   0.426 ± 0.361  ns/op
>> Single.release avgt3   0.407 ± 0.021  ns/op
>> Single.storeStore  avgt3   0.410 ± 0.061  ns/op
>> 
>> # -XX:DisableIntrinsic=_fullFence,_loadFence
>> Single.acquire avgt3  15.822 ± 0.282  ns/op <--- upgraded, calls 
>> runtime
>> Single.fullavgt3  15.851 ± 0.127  ns/op <--- calls runtime
>> Single.loadLoadavgt3  15.829 ± 0.045  ns/op <--- upgraded, calls 
>> runtime
>> Single.plain   avgt3   0.406 ± 0.001  ns/op
>> Single.release avgt3   0.414 ± 0.156  ns/op
>> Single.storeStore  avgt3   0.422 ± 0.452  ns/op
>> 
>> # -XX:DisableIntrinsic=_fullFence,_storeFence
>> Single.acquire avgt3   0.407 ± 0.016  ns/op
>> Single.fullavgt3  15.347 ± 6.783  ns/op <--- calls runtime
>> Single.loadLoadavgt3   0.406 ± 0.001  ns/op
>> Single.plain   avgt3   0.406 ± 0.002  ns/op 
>> Single.release avgt3  15.828 ± 0.019  ns/op <--- upgraded, calls 
>> runtime
>> Single.storeStore  avgt3  15.834 ± 0.045  ns/op <--- upgraded, calls 
>> runtime
>> 
>> # -XX:DisableIntrinsic=_fullFence,_loadFence,_storeFence
>> Single.acquire avgt3  15.838 ± 0.030  ns/op <--- upgraded, calls 
>> runtime
>> Single.fullavgt3  15.854 ± 0.277  ns/op <--- calls runtime
>> Single.loadLoadavgt3  15.826 ± 0.160  ns/op <--- upgraded, calls 
>> runtime
>> Single.plain   avgt3   0.406 ± 0.003  ns/op
>> Single.release avgt3  15.838 ± 0.019  ns/op <--- upgraded, calls 
>> runtime
>> Single.storeStore  avgt3  15.844 ± 0.104  ns/op <--- upgraded, calls 
>> runtime
>> 
>> 
>> Additional testing:
>>  - [x] Linux x86_64 fastdebug `tier1`
>
> Aleksey Shipilev has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Restore RN for fullFence

Thanks!

-

PR: https://git.openjdk.java.net/jdk/pull/6149


Re: RFR: 8276096: Simplify Unsafe.{load|store}Fence fallbacks by delegating to fullFence [v2]

2021-11-03 Thread David Holmes
On Mon, 1 Nov 2021 07:36:53 GMT, Aleksey Shipilev  wrote:

>> `Unsafe.{load|store}Fence` falls back to `unsafe.cpp` for 
>> `OrderAccess::{acquire|release}Fence()`. It seems too heavy-handed 
>> (useless?) to call to runtime for a single memory barrier. We can simplify 
>> the native `Unsafe` interface by falling back to `fullFence` when 
>> `{load|store}Fence` intrinsics are not available. This would be similar to 
>> what `Unsafe.{loadLoad|storeStore}Fences` do. 
>> 
>> This is the behavior of these intrinsics now, on x86_64, using benchmarks 
>> from JDK-8276054:
>> 
>> 
>> Benchmark  Mode  Cnt  Score   Error  Units
>> 
>> # Default
>> Single.acquire avgt3   0.407 ± 0.060  ns/op
>> Single.fullavgt3   4.693 ± 0.005  ns/op
>> Single.loadLoadavgt3   0.415 ± 0.095  ns/op
>> Single.plain   avgt3   0.406 ± 0.002  ns/op
>> Single.release avgt3   0.408 ± 0.047  ns/op
>> Single.storeStore  avgt3   0.408 ± 0.043  ns/op
>> 
>> # -XX:DisableIntrinsic=_storeFence
>> Single.acquire avgt3   0.408 ± 0.016  ns/op
>> Single.fullavgt3   4.694 ± 0.002  ns/op
>> Single.loadLoadavgt3   0.406 ± 0.002  ns/op
>> Single.plain   avgt3   0.406 ± 0.001  ns/op
>> Single.release avgt3   4.694 ± 0.003  ns/op <--- upgraded to full
>> Single.storeStore  avgt3   4.690 ± 0.005  ns/op <--- upgraded to full
>> 
>> # -XX:DisableIntrinsic=_loadFence
>> Single.acquire avgt3   4.691 ± 0.001  ns/op <--- upgraded to full
>> Single.fullavgt3   4.693 ± 0.009  ns/op
>> Single.loadLoadavgt3   4.693 ± 0.013  ns/op <--- upgraded to full
>> Single.plain   avgt3   0.408 ± 0.072  ns/op
>> Single.release avgt3   0.415 ± 0.016  ns/op
>> Single.storeStore  avgt3   0.416 ± 0.041  ns/op
>> 
>> # -XX:DisableIntrinsic=_fullFence
>> Single.acquire avgt3   0.406 ± 0.014  ns/op
>> Single.fullavgt3  15.836 ± 0.151  ns/op <--- calls runtime
>> Single.loadLoadavgt3   0.406 ± 0.001  ns/op
>> Single.plain   avgt3   0.426 ± 0.361  ns/op
>> Single.release avgt3   0.407 ± 0.021  ns/op
>> Single.storeStore  avgt3   0.410 ± 0.061  ns/op
>> 
>> # -XX:DisableIntrinsic=_fullFence,_loadFence
>> Single.acquire avgt3  15.822 ± 0.282  ns/op <--- upgraded, calls 
>> runtime
>> Single.fullavgt3  15.851 ± 0.127  ns/op <--- calls runtime
>> Single.loadLoadavgt3  15.829 ± 0.045  ns/op <--- upgraded, calls 
>> runtime
>> Single.plain   avgt3   0.406 ± 0.001  ns/op
>> Single.release avgt3   0.414 ± 0.156  ns/op
>> Single.storeStore  avgt3   0.422 ± 0.452  ns/op
>> 
>> # -XX:DisableIntrinsic=_fullFence,_storeFence
>> Single.acquire avgt3   0.407 ± 0.016  ns/op
>> Single.fullavgt3  15.347 ± 6.783  ns/op <--- calls runtime
>> Single.loadLoadavgt3   0.406 ± 0.001  ns/op
>> Single.plain   avgt3   0.406 ± 0.002  ns/op 
>> Single.release avgt3  15.828 ± 0.019  ns/op <--- upgraded, calls 
>> runtime
>> Single.storeStore  avgt3  15.834 ± 0.045  ns/op <--- upgraded, calls 
>> runtime
>> 
>> # -XX:DisableIntrinsic=_fullFence,_loadFence,_storeFence
>> Single.acquire avgt3  15.838 ± 0.030  ns/op <--- upgraded, calls 
>> runtime
>> Single.fullavgt3  15.854 ± 0.277  ns/op <--- calls runtime
>> Single.loadLoadavgt3  15.826 ± 0.160  ns/op <--- upgraded, calls 
>> runtime
>> Single.plain   avgt3   0.406 ± 0.003  ns/op
>> Single.release avgt3  15.838 ± 0.019  ns/op <--- upgraded, calls 
>> runtime
>> Single.storeStore  avgt3  15.844 ± 0.104  ns/op <--- upgraded, calls 
>> runtime
>> 
>> 
>> Additional testing:
>>  - [x] Linux x86_64 fastdebug `tier1`
>
> Aleksey Shipilev has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Restore RN for fullFence

Marked as reviewed by dholmes (Reviewer).

-

PR: https://git.openjdk.java.net/jdk/pull/6149


Re: RFR: 8276096: Simplify Unsafe.{load|store}Fence fallbacks by delegating to fullFence [v2]

2021-11-01 Thread Aleksey Shipilev
On Mon, 1 Nov 2021 07:32:18 GMT, Aleksey Shipilev  wrote:

>> src/hotspot/share/classfile/vmIntrinsics.hpp line 526:
>> 
>>> 524:do_name( storeFence_name,   
>>>  "storeFence")\
>>> 525:do_alias(storeFence_signature,  
>>>  void_method_signature)   \
>>> 526:   do_intrinsic(_fullFence,jdk_internal_misc_Unsafe,
>>>  fullFence_name, fullFence_signature,   F_R)  \
>> 
>> Why did you drop the N from F_RN? AFAICS the fullFence method is still 
>> native.
>
> Good spot! That's indeed incorrect, fixed in new commit. I am surprised 
> `CheckIntrinsics` did not found this discrepancy.  I believe "native" flags 
> are not checked at all? For example, existing `_hashCode` intrinsic is also 
> `F_R`, while it covers the native `java.lang.Object::hashCode`. I try to beef 
> up those checks separately.

This `CheckIntrinsics` oddity is handled by #6187.

-

PR: https://git.openjdk.java.net/jdk/pull/6149


Re: RFR: 8276096: Simplify Unsafe.{load|store}Fence fallbacks by delegating to fullFence [v2]

2021-11-01 Thread Aleksey Shipilev
On Mon, 1 Nov 2021 02:09:19 GMT, David Holmes  wrote:

>> Aleksey Shipilev has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   Restore RN for fullFence
>
> src/hotspot/share/classfile/vmIntrinsics.hpp line 526:
> 
>> 524:do_name( storeFence_name,
>> "storeFence")\
>> 525:do_alias(storeFence_signature,   
>> void_method_signature)   \
>> 526:   do_intrinsic(_fullFence,jdk_internal_misc_Unsafe, 
>> fullFence_name, fullFence_signature,   F_R)  \
> 
> Why did you drop the N from F_RN? AFAICS the fullFence method is still native.

Good spot! That's indeed incorrect, fixed in new commit. I am surprised 
`CheckIntrinsics` did not found this discrepancy.  I believe "native" flags are 
not checked at all? For example, existing `_hashCode` intrinsic is also `F_R`, 
while it covers the native `java.lang.Object::hashCode`. I try to beef up those 
checks separately.

-

PR: https://git.openjdk.java.net/jdk/pull/6149


Re: RFR: 8276096: Simplify Unsafe.{load|store}Fence fallbacks by delegating to fullFence [v2]

2021-11-01 Thread Aleksey Shipilev
> `Unsafe.{load|store}Fence` falls back to `unsafe.cpp` for 
> `OrderAccess::{acquire|release}Fence()`. It seems too heavy-handed (useless?) 
> to call to runtime for a single memory barrier. We can simplify the native 
> `Unsafe` interface by falling back to `fullFence` when `{load|store}Fence` 
> intrinsics are not available. This would be similar to what 
> `Unsafe.{loadLoad|storeStore}Fences` do. 
> 
> This is the behavior of these intrinsics now, on x86_64, using benchmarks 
> from JDK-8276054:
> 
> 
> Benchmark  Mode  Cnt  Score   Error  Units
> 
> # Default
> Single.acquire avgt3   0.407 ± 0.060  ns/op
> Single.fullavgt3   4.693 ± 0.005  ns/op
> Single.loadLoadavgt3   0.415 ± 0.095  ns/op
> Single.plain   avgt3   0.406 ± 0.002  ns/op
> Single.release avgt3   0.408 ± 0.047  ns/op
> Single.storeStore  avgt3   0.408 ± 0.043  ns/op
> 
> # -XX:DisableIntrinsic=_storeFence
> Single.acquire avgt3   0.408 ± 0.016  ns/op
> Single.fullavgt3   4.694 ± 0.002  ns/op
> Single.loadLoadavgt3   0.406 ± 0.002  ns/op
> Single.plain   avgt3   0.406 ± 0.001  ns/op
> Single.release avgt3   4.694 ± 0.003  ns/op <--- upgraded to full
> Single.storeStore  avgt3   4.690 ± 0.005  ns/op <--- upgraded to full
> 
> # -XX:DisableIntrinsic=_loadFence
> Single.acquire avgt3   4.691 ± 0.001  ns/op <--- upgraded to full
> Single.fullavgt3   4.693 ± 0.009  ns/op
> Single.loadLoadavgt3   4.693 ± 0.013  ns/op <--- upgraded to full
> Single.plain   avgt3   0.408 ± 0.072  ns/op
> Single.release avgt3   0.415 ± 0.016  ns/op
> Single.storeStore  avgt3   0.416 ± 0.041  ns/op
> 
> # -XX:DisableIntrinsic=_fullFence
> Single.acquire avgt3   0.406 ± 0.014  ns/op
> Single.fullavgt3  15.836 ± 0.151  ns/op <--- calls runtime
> Single.loadLoadavgt3   0.406 ± 0.001  ns/op
> Single.plain   avgt3   0.426 ± 0.361  ns/op
> Single.release avgt3   0.407 ± 0.021  ns/op
> Single.storeStore  avgt3   0.410 ± 0.061  ns/op
> 
> # -XX:DisableIntrinsic=_fullFence,_loadFence
> Single.acquire avgt3  15.822 ± 0.282  ns/op <--- upgraded, calls 
> runtime
> Single.fullavgt3  15.851 ± 0.127  ns/op <--- calls runtime
> Single.loadLoadavgt3  15.829 ± 0.045  ns/op <--- upgraded, calls 
> runtime
> Single.plain   avgt3   0.406 ± 0.001  ns/op
> Single.release avgt3   0.414 ± 0.156  ns/op
> Single.storeStore  avgt3   0.422 ± 0.452  ns/op
> 
> # -XX:DisableIntrinsic=_fullFence,_storeFence
> Single.acquire avgt3   0.407 ± 0.016  ns/op
> Single.fullavgt3  15.347 ± 6.783  ns/op <--- calls runtime
> Single.loadLoadavgt3   0.406 ± 0.001  ns/op
> Single.plain   avgt3   0.406 ± 0.002  ns/op 
> Single.release avgt3  15.828 ± 0.019  ns/op <--- upgraded, calls 
> runtime
> Single.storeStore  avgt3  15.834 ± 0.045  ns/op <--- upgraded, calls 
> runtime
> 
> # -XX:DisableIntrinsic=_fullFence,_loadFence,_storeFence
> Single.acquire avgt3  15.838 ± 0.030  ns/op <--- upgraded, calls 
> runtime
> Single.fullavgt3  15.854 ± 0.277  ns/op <--- calls runtime
> Single.loadLoadavgt3  15.826 ± 0.160  ns/op <--- upgraded, calls 
> runtime
> Single.plain   avgt3   0.406 ± 0.003  ns/op
> Single.release avgt3  15.838 ± 0.019  ns/op <--- upgraded, calls 
> runtime
> Single.storeStore  avgt3  15.844 ± 0.104  ns/op <--- upgraded, calls 
> runtime
> 
> 
> Additional testing:
>  - [x] Linux x86_64 fastdebug `tier1`

Aleksey Shipilev has updated the pull request incrementally with one additional 
commit since the last revision:

  Restore RN for fullFence

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6149/files
  - new: https://git.openjdk.java.net/jdk/pull/6149/files/e2c623be..a0fd03ee

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=6149=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=6149=00-01

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6149.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6149/head:pull/6149

PR: https://git.openjdk.java.net/jdk/pull/6149