On Tue, 11 Jun 2024 11:35:28 GMT, Shaojin Wen <d...@openjdk.org> wrote:

>> After PR https://github.com/openjdk/jdk/pull/16245, C2 optimizes stores into 
>> primitive arrays by combining values ​​into larger stores.
>> 
>> This PR rewrites the code of appendNull and append(boolean) methods so that 
>> these two methods can be optimized by C2.
>
> Shaojin Wen has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   revert

Going back to:

./java -XX:+TraceMergeStores -Xbatch -XX:CompileCommand=printcompilation,*::* 
-XX:+PrintInlining AppendNullTest.java > log.log


We have occurances of `TraceMergeStores` in these compilations:

9139 1992    b  4       AppendNullTest$StringBuilder::appendNull (75 bytes)
 -> as discussed above

9156 1996 %  b  4       AppendNullTest::main @ 11 (45 bytes)
 -> once main method is OSR compiled

  529  StoreB  === 513 408 527 464  [[ 554 540 ]]  @byte[int:>=0] 
(java/lang/Cloneable,java/io/Serializable):exact+any *, idx=9;  Memory: 
@byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, 
idx=9; !jvms: AppendNullTest$StringUTF16::putChar @ bci:15 (line 115) 
AppendNullTest$StringUTF16::putCharsAt @ bci:3 (line 87) 
AppendNullTest$StringBuilder::appendNull @ bci:63 (line 25) 
AppendNullTest::main @ bci:18 (line 123)
8 such StoreB get turned into StoreL. This also comes from the recursive 
inlining, ultimately from the same `putChar`.

 1154  StoreB  === 1139 3516 1152 464  [[ 1178 1164 ]]  @byte[int:>=0] 
(java/lang/Cloneable,java/io/Serializable):exact+any *, idx=9;  Memory: 
@byte[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, 
idx=9; !jvms: AppendNullTest$StringUTF16::putChar @ bci:15 (line 115) 
AppendNullTest$StringUTF16::putCharsAt @ bci:3 (line 87) 
AppendNullTest$StringBuilder::appendNull @ bci:63 (line 25) 
AppendNullTest::main @ bci:21 (line 123)
 Now the pattern repeats, because you have multiple `appendNull` calls in the 
`main` melthod.
 So a total of 5 such optimizations, since all the methods are inlined multiple 
times.

 
 That is all the optimizations that I see, where `MergeStores` happens. Now the 
question is if there are any compilations where we would expect the 
`MergeStores` to happen but they do not. That is more difficult.
 
I can for example see a previous compilation of the `main` method:

9151 1995    b  3       AppendNullTest::main (45 bytes)
                              @ 5   AppendNullTest$StringBuilder::<init> (26 
bytes)   inline
                                @ 1   java.lang.Object::<init> (1 bytes)   
inline
                              @ 18   AppendNullTest$StringBuilder::appendNull 
(75 bytes)   failed to inline: callee is too large
                              @ 21   AppendNullTest$StringBuilder::appendNull 
(75 bytes)   failed to inline: callee is too large
                              @ 24   AppendNullTest$StringBuilder::appendNull 
(75 bytes)   failed to inline: callee is too large
                              @ 27   AppendNullTest$StringBuilder::appendNull 
(75 bytes)   failed to inline: callee is too large
                              @ 30   AppendNullTest$StringBuilder::appendNull 
(75 bytes)   failed to inline: callee is too large
                              @ 35   AppendNullTest$StringBuilder::clear (3 
bytes)   inline

I think it is obvious: if methods are not inlined, then we cannot optimize the 
code there. Instead, this is going to be method calls, where we call 
`appendNull` - with the overhead of a call of course. That could be another 
source of performance loss. Could be - it depends how often we execute this 
compiled code at runtime.
Of course `appendNull` is already comiled and optimized, so we would still 
expect to have `long` stores instead of `byte` stores.

So far goes the "theory". Now we need to test the theory with some performance 
measurements.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19626#issuecomment-2162828575

Reply via email to