On Tue, 11 Jun 2024 11:35:28 GMT, Shaojin Wen <d...@openjdk.org> wrote:
>> After PR https://github.com/openjdk/jdk/pull/16245, C2 optimizes stores into >> primitive arrays by combining values into larger stores. >> >> This PR rewrites the code of appendNull and append(boolean) methods so that >> these two methods can be optimized by C2. > > Shaojin Wen has updated the pull request incrementally with one additional > commit since the last revision: > > revert Going back to: ./java -XX:+TraceMergeStores -Xbatch -XX:CompileCommand=printcompilation,*::* -XX:+PrintInlining AppendNullTest.java > log.log We have occurances of `TraceMergeStores` in these compilations: 9139 1992 b 4 AppendNullTest$StringBuilder::appendNull (75 bytes) -> as discussed above 9156 1996 % b 4 AppendNullTest::main @ 11 (45 bytes) -> once main method is OSR compiled 529 StoreB === 513 408 527 464 [[ 554 540 ]] @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=9; Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=9; !jvms: AppendNullTest$StringUTF16::putChar @ bci:15 (line 115) AppendNullTest$StringUTF16::putCharsAt @ bci:3 (line 87) AppendNullTest$StringBuilder::appendNull @ bci:63 (line 25) AppendNullTest::main @ bci:18 (line 123) 8 such StoreB get turned into StoreL. This also comes from the recursive inlining, ultimately from the same `putChar`. 1154 StoreB === 1139 3516 1152 464 [[ 1178 1164 ]] @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact+any *, idx=9; Memory: @byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=9; !jvms: AppendNullTest$StringUTF16::putChar @ bci:15 (line 115) AppendNullTest$StringUTF16::putCharsAt @ bci:3 (line 87) AppendNullTest$StringBuilder::appendNull @ bci:63 (line 25) AppendNullTest::main @ bci:21 (line 123) Now the pattern repeats, because you have multiple `appendNull` calls in the `main` melthod. So a total of 5 such optimizations, since all the methods are inlined multiple times. That is all the optimizations that I see, where `MergeStores` happens. Now the question is if there are any compilations where we would expect the `MergeStores` to happen but they do not. That is more difficult. I can for example see a previous compilation of the `main` method: 9151 1995 b 3 AppendNullTest::main (45 bytes) @ 5 AppendNullTest$StringBuilder::<init> (26 bytes) inline @ 1 java.lang.Object::<init> (1 bytes) inline @ 18 AppendNullTest$StringBuilder::appendNull (75 bytes) failed to inline: callee is too large @ 21 AppendNullTest$StringBuilder::appendNull (75 bytes) failed to inline: callee is too large @ 24 AppendNullTest$StringBuilder::appendNull (75 bytes) failed to inline: callee is too large @ 27 AppendNullTest$StringBuilder::appendNull (75 bytes) failed to inline: callee is too large @ 30 AppendNullTest$StringBuilder::appendNull (75 bytes) failed to inline: callee is too large @ 35 AppendNullTest$StringBuilder::clear (3 bytes) inline I think it is obvious: if methods are not inlined, then we cannot optimize the code there. Instead, this is going to be method calls, where we call `appendNull` - with the overhead of a call of course. That could be another source of performance loss. Could be - it depends how often we execute this compiled code at runtime. Of course `appendNull` is already comiled and optimized, so we would still expect to have `long` stores instead of `byte` stores. So far goes the "theory". Now we need to test the theory with some performance measurements. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19626#issuecomment-2162828575