On Thu, 13 Jun 2024 07:36:34 GMT, Emanuel Peter <epe...@openjdk.org> wrote:

>> Thanks to @eme64 's patience and help, I found a way to use MergeStore 
>> without doing boundary checking.
>> 
>> It would be even better if Unsafe.putChar could be used for MergeStore in 
>> the future.
>> 
>> ## 1. JavaCode
>> 
>> class StringUTF16 {
>>     public static void putCharsAt(byte[] value, int i, char c1, char c2, 
>> char c3, char c4) {
>>         // Don't use the putChar method, Its instrinsic will cause C2 unable 
>> to combining values into larger stores.
>>         putChar1(value, i    , c1);
>>         putChar1(value, i + 1, c2);
>>         putChar1(value, i + 2, c3);
>>         putChar1(value, i + 3, c4);
>>     }
>> 
>>     public static void putCharsAt(byte[] value, int i, char c1, char c2, 
>> char c3, char c4, char c5) {
>>         // Don't use the putChar method, Its instrinsic will cause C2 unable 
>> to combining values into larger stores.
>>         putChar1(value, i    , c1);
>>         putChar1(value, i + 1, c2);
>>         putChar1(value, i + 2, c3);
>>         putChar1(value, i + 3, c4);
>>         putChar1(value, i + 4, c5);
>>     }
>> 
>>     static void putChar1(byte[] value, int i, char c) {
>>         int address = Unsafe.ARRAY_BYTE_BASE_OFFSET + (i << 1);
>>         Unsafe.getUnsafe().putByte(value, address    , (byte)(c >> 
>> HI_BYTE_SHIFT));
>>         Unsafe.getUnsafe().putByte(value, address + 1, (byte)(c >> 
>> LO_BYTE_SHIFT));
>>     }
>> }
>> 
>> 
>> ## 2. Benchmark Numbers
>> The performance numbers under MacBookPro M1 Max are as follows:
>> 
>> -Benchmark                                   Mode  Cnt  Score   Error  Units
>> -StringBuilders.toStringCharWithNull8Latin1  avgt   15  7.073 ? 0.010  ns/op
>> -StringBuilders.toStringCharWithNull8Utf16   avgt   15  9.298 ? 0.015  ns/op
>> -StringBuilders.toStringCharWithBool8Latin1  avgt   15  7.387 ? 0.021  ns/op
>> -StringBuilders.toStringCharWithBool8Utf16   avgt   15  9.615 ? 0.011  ns/op
>> 
>> +Benchmark                                   Mode  Cnt  Score   Error  Units
>> +StringBuilders.toStringCharWithNull8Latin1  avgt   15  5.628 ? 0.017  ns/op 
>> +25.67%
>> +StringBuilders.toStringCharWithNull8Utf16   avgt   15  6.873 ? 0.026  ns/op 
>> +35.28%
>> +StringBuilders.toStringCharWithBool8Latin1  avgt   15  6.480 ? 0.077  ns/op 
>> +13.99%
>> +StringBuilders.toStringCharWithBool8Utf16   avgt   15  7.456 ? 0.059  ns/op 
>> +28.95%
>> 
>> 
>> ## 3.  TraceMergeStores
>> 
>> [TraceMergeStores]: Replace
>>  65  StoreB  === 54 7 64 12  [[ 87 ]]  @byte[int:>=0] 
>> (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=4; unsafe  
>> Memory: @byte[int:>=0] (java/la...
>
> @wenshao I'm glad we figured it out a bit, and I hope you learned a thing or 
> two :)
> 
>> It would be even better if Unsafe.putChar could be used for MergeStore in 
>> the future.
> 
> The `_putCharStringU` intrinsic uses 
> `LibraryCallKit::inline_string_char_access`. It seems to generate IR nodes, I 
> speculate it could be `StoreC`. We'd have to do more digging to see why that 
> pattern is not accepted by `MergeStore`.
> 
> @wenshao I'm not sure if everything is right with the bounds checking, but I 
> leave this to the library folks to review. What you will definately need is 
> benchmarking not just on `M1`, but also `x86-64` and other `aarch64` 
> architectures.

I'm not opposed to accepting this patch as-is, but I think we should do so with 
an eye towards reverting if we figure out a way to improve the `putChar` 
intrinsic so that it doesn't block merge store optimization. What do you think 
@eme64?

As the need for the changes in 
[b5ad8e7](https://github.com/openjdk/jdk/pull/19626/commits/b5ad8e70928c547d134d0e4a532441cad9a7e4a2)
 showed added code complexity (like here) can be detrimental to performance, 
and if the `putChar` can be improved we might see benefits in more places.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19626#issuecomment-2180308686

Reply via email to