On Sat, 13 Apr 2024 17:54:54 GMT, Brett Okken <d...@openjdk.org> wrote:
>> Possibly. I tried a few simple variants that initialized the `StringBuilder` >> capacity at various guesses, such as sum of constant sizes + some factor >> based on args, but across a diverse set of micros that gives both some wins >> and some regressions. Getting the estimation just right is pretty tricky, >> especially when size of the arguments are arbitrary (like for any >> String/Object arg). > > What are the scenarios which had regressions? > Given the conservative growth for StringBuilder, it surprises me a bit that > any scenario would regress. I took a second look and it turns out that there were neither regressions nor improvements in my test, only a few false positives: For C2 the SB chain is recognized by the (fragile) C2 OptimizeStringConcat pass and transformed into a shape where the initial size in java bytecode - if any - is ignored. With C1 having an initial size can have a significant effect. One way to provoke a regression there is to have a huge constant suffix while underestimating the size of the operands, which can lead to excessive allocation: Name Cnt Base Error Test Error Unit Change StringConcat.concat23StringConst 5 385,268 ± 5,238 341,251 ± 2,606 ns/op 1,13x (p = 0,000*) :gc.alloc.rate 6039,095 ± 82,309 12764,169 ± 97,146 MB/sec 2,11x (p = 0,000*) :gc.alloc.rate.norm 2440,003 ± 0,000 4568,002 ± 0,000 B/op 1,87x (p = 0,000*) Still a bit better on throughput from less actual copying. *If* the `StringBuilder` is sized sufficiently (constants + args * N) things look much better, of course: -XX:TieredStopAtLevel=1 Name Cnt Base Error Test Error Unit Change StringConcat.concat23StringConst 5 385,268 ± 5,238 259,628 ± 2,515 ns/op 1,48x (p = 0,000*) :gc.alloc.rate 6039,095 ± 82,309 8902,803 ± 86,563 MB/sec 1,47x (p = 0,000*) :gc.alloc.rate.norm 2440,003 ± 0,000 2424,002 ± 0,000 B/op 0,99x (p = 0,000*) For most cases having a size based on the sum of the constants plus some small factor per argument seem to be a decent heuristic - for C1. Since this adds just one bytecode to the generated method I guess it wouldn't hurt. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18690#discussion_r1564753612