On Sun, 23 Jun 2024 08:47:25 GMT, Shaojin Wen <d...@openjdk.org> wrote:
>> After PR https://github.com/openjdk/jdk/pull/16245, C2 optimizes stores into >> primitive arrays by combining values into larger stores. >> >> This PR rewrites the code of appendNull and append(boolean) methods so that >> these two methods can be optimized by C2. > > Shaojin Wen has updated the pull request incrementally with one additional > commit since the last revision: > > private static final field `UNSAFE` I'm running your benchmark [PutCharTest.java](https://github.com/user-attachments/files/15974672/PutCharTest.txt) with: /oracle-work/jdk-fork2/build/linux-x64-slowdebug/jdk/bin/java --add-modules java.base --add-exports java.base/jdk.internal.misc=ALL-UNNAMED --add-exports java.base/jdk.internal.util=ALL-UNNAMED -XX:+TraceMergeStores -Xbatch -XX:CompileCommand=printcompilation,PutCharTest::* -XX:CompileCommand=compileonly,PutCharTest::put* -XX:+PrintIdeal PutCharTest.java Aha, I found the limitation in `MergeStores`: https://github.com/openjdk/jdk/blob/f7862bd6b9994814c6dfd43d471122408601f288/src/hotspot/share/opto/memnode.cpp#L2982-L2986 Basically, I require the `store` to have the same data-size as the element-size of the array. But the `putChar` ends up as a 2-byte `StoreC`, and the array is a `byte[]` with 1-byte elements. I quickly removed this check: diff --git a/src/hotspot/share/opto/memnode.cpp b/src/hotspot/share/opto/memnode.cpp index d0b6c59637f..78efda2db13 100644 --- a/src/hotspot/share/opto/memnode.cpp +++ b/src/hotspot/share/opto/memnode.cpp @@ -2980,10 +2980,10 @@ StoreNode* MergePrimitiveArrayStores::run() { return nullptr; } BasicType bt = aryptr_t->elem()->array_element_basic_type(); - if (!is_java_primitive(bt) || - type2aelembytes(bt) != _store->memory_size()) { - return nullptr; - } + //if (!is_java_primitive(bt) || + // type2aelembytes(bt) != _store->memory_size()) { + // return nullptr; + //} // The _store must be the "last" store in a chain. If we find a use we could merge with // then that use or a store further down is the "last" store. @@ -3033,13 +3033,13 @@ bool MergePrimitiveArrayStores::is_compatible_store(const StoreNode* other_store if (!is_java_primitive(aryptr_bt1) || !is_java_primitive(aryptr_bt2)) { return false; } - int size1 = type2aelembytes(aryptr_bt1); - int size2 = type2aelembytes(aryptr_bt2); - if (size1 != size2 || - size1 != _store->memory_size() || - _store->memory_size() != other_store->memory_size()) { - return false; - } + //int size1 = type2aelembytes(aryptr_bt1); + //int size2 = type2aelembytes(aryptr_bt2); + //if (size1 != size2 || + // size1 != _store->memory_size() || + // _store->memory_size() != other_store->memory_size()) { + // return false; + //} return true; } And now it optimizes: 15523 103 b 4 PutCharTest::putNull (14 bytes) [TraceMergeStores]: Replace 74 StoreC === 62 7 73 22 [[ 99 ]] @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=4; mismatched unsafe Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=4; !jvms: PutCharTest::putChar @ bci:14 (line 14) PutCharTest::putCharsAt @ bci:4 (line 7) PutCharTest::putNull @ bci:10 (line 23) 99 StoreC === 62 74 98 23 [[ 124 ]] @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=4; mismatched unsafe Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=4; !jvms: PutCharTest::putChar @ bci:14 (line 14) PutCharTest::putCharsAt @ bci:13 (line 8) PutCharTest::putNull @ bci:10 (line 23) 124 StoreC === 62 99 123 24 [[ 150 ]] @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=4; mismatched unsafe Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=4; !jvms: PutCharTest::putChar @ bci:14 (line 14) PutCharTest::putCharsAt @ bci:23 (line 9) PutCharTest::putNull @ bci:10 (line 23) 150 StoreC === 62 124 149 24 [[ 17 ]] @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=4; mismatched unsafe Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=4; !jvms: PutCharTest::putChar @ bci:14 (line 14) PutCharTest::putCharsAt @ bci:33 (line 10) PutCharTest::putNull @ bci:10 (line 23) [TraceMergeStores]: with 155 ConL === 0 [[ 156 ]] #long:30399761348886638 156 StoreL === 62 7 73 155 [[ ]] @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=4; mismatched Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=4; I will file an RFE to enable this use-case as well. @wenshao thanks for the standalone test, that was really helpful! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19626#issuecomment-2189509983