On Sat, 13 Apr 2024 17:54:54 GMT, Brett Okken <d...@openjdk.org> wrote:

>> Possibly. I tried a few simple variants that initialized the `StringBuilder` 
>> capacity at various guesses, such as sum of constant sizes + some factor 
>> based on args, but across a diverse set of micros that gives both some wins 
>> and some regressions. Getting the estimation just right is pretty tricky, 
>> especially when size of the arguments are arbitrary (like for any 
>> String/Object arg).
>
> What are the scenarios which had regressions? 
> Given the conservative growth for StringBuilder, it surprises me a bit that 
> any scenario would regress.

I took a second look and it turns out that there were neither regressions nor 
improvements in my test, only a few false positives: For C2 the SB chain is 
recognized by the (fragile) C2 OptimizeStringConcat pass and transformed into a 
shape where the initial size in java bytecode - if any - is ignored.

With C1 having an initial size can have a significant effect. One way to 
provoke a regression there is to have a huge constant suffix while 
underestimating the size of the operands, which can lead to excessive 
allocation:


Name                             Cnt     Base    Error       Test    Error   
Unit  Change
StringConcat.concat23StringConst   5  385,268 ±  5,238    341,251 ±  2,606  
ns/op   1,13x (p = 0,000*)
  :gc.alloc.rate                     6039,095 ± 82,309  12764,169 ± 97,146 
MB/sec   2,11x (p = 0,000*)
  :gc.alloc.rate.norm                2440,003 ±  0,000   4568,002 ±  0,000   
B/op   1,87x (p = 0,000*)


Still a bit better on throughput from less actual copying.

*If* the `StringBuilder` is sized sufficiently (constants + args * N) things 
look much better, of course: 

-XX:TieredStopAtLevel=1
Name                             Cnt     Base    Error      Test    Error   
Unit  Change
StringConcat.concat23StringConst   5  385,268 ±  5,238   259,628 ±  2,515  
ns/op   1,48x (p = 0,000*)
  :gc.alloc.rate                     6039,095 ± 82,309  8902,803 ± 86,563 
MB/sec   1,47x (p = 0,000*)
  :gc.alloc.rate.norm                2440,003 ±  0,000  2424,002 ±  0,000   
B/op   0,99x (p = 0,000*)


For most cases having a size based on the sum of the constants plus some small 
factor per argument seem to be a decent heuristic - for C1. Since this adds 
just one bytecode to the generated method I guess it wouldn't hurt.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18690#discussion_r1564753612

Reply via email to