Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v7]
On Wed, 24 Apr 2024 10:05:27 GMT, Aleksey Shipilev wrote: >>> I really wish this change was not done with ClassFile API, but with a >>> simple bundled ASM, so it would be easily backportable, if we decide to. It >>> does not look like CFA buys us a lot here readability/complexity wise: >>> [d99b159](https://github.com/openjdk/jdk/commit/d99b1596c5ca57b110c1db88be430c6c54c3d599) >> >> I would be open to splitting out and pushing the ASM version first and do >> this CFA port as a follow-up. > >> > I really wish this change was not done with ClassFile API, but with a >> > simple bundled ASM, so it would be easily backportable, if we decide to. >> > It does not look like CFA buys us a lot here readability/complexity wise: >> > [d99b159](https://github.com/openjdk/jdk/commit/d99b1596c5ca57b110c1db88be430c6c54c3d599) >> >> I would be open to splitting out and pushing the ASM version first and do >> this CFA port as a follow-up. > > That would be good, thanks. @shipilev @mlchung opened new PR #18953 for pushing the ASM-based version. Adapted applicable code changes from this PR so it should be equivalent in behavior. - PR Comment: https://git.openjdk.org/jdk/pull/18690#issuecomment-2077319721
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v7]
On Wed, 24 Apr 2024 10:01:47 GMT, Claes Redestad wrote: > > I really wish this change was not done with ClassFile API, but with a > > simple bundled ASM, so it would be easily backportable, if we decide to. It > > does not look like CFA buys us a lot here readability/complexity wise: > > [d99b159](https://github.com/openjdk/jdk/commit/d99b1596c5ca57b110c1db88be430c6c54c3d599) > > I would be open to splitting out and pushing the ASM version first and do > this CFA port as a follow-up. That would be good, thanks. - PR Comment: https://git.openjdk.org/jdk/pull/18690#issuecomment-2074585421
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v7]
On Wed, 24 Apr 2024 09:57:42 GMT, Claes Redestad wrote: >> This patch suggests a workaround to an issue with huge SCF MH expression >> trees taking excessive JIT compilation resources by reviving (part of) the >> simple bytecode-generating strategy that was originally available as an >> all-or-nothing strategy choice. >> >> Instead of reintroducing a binary strategy choice I propose a threshold >> parameter, controlled by >> `-Djava.lang.invoke.StringConcat.highArityThreshold=`: For expressions >> below or at this threshold there's no change, for expressions with an arity >> above it we use the `StringBuilder`-chain bytecode generator. >> >> There are a few trade-offs at play here which influence the choice of >> threshold. The simple high arity strategy will for example not see any reuse >> of LambdaForms but strictly always generate a class per indy callsite, which >> means we might end up with a higher total number of classes generated and >> loaded in applications if we set this value too low. It may also produce >> worse performance on average. On the other hand there is the observed >> increase in C2 resource usage as expressions grow unwieldy. On the other >> other hand high arity expressions are likely rare to begin with, with less >> opportunities for sharing than the more common low-arity expressions. >> >> I turned the submitted test case into a few JMH benchmarks and did some >> experiments with `-XX:CompileCommand=MemStat,StringConcat::concat,print`: >> >> Baseline strategy: >> 13 args: 6.3M >> 23 args: 18M >> 123 args: 868M >> >> `-Djava.lang.invoke.StringConcat.highArityThreshold=0`: >> 13 args: 2.11M >> 23 args: 3.67M >> 123 args: 4.75M >> >> For 123 args the memory overhead of the baseline strategy is 180x, but for >> 23 args we're down to a 5x memory overhead, and down to a 3x overhead for 13 >> args. Since the absolute overhead for 23 is borderline acceptable (+15Mb) >> I've conservatively chosen a threshold at arity 20. This keeps C2 resource >> pressure at a reasonable level (< 18M) while avoiding perturbing performance >> at the vast majority of call sites. >> >> I was asked to use the new class file API for mainline. There's a version of >> this patch implemented using ASM in 7c52a9f which might be a reasonable >> basis for a backport. > > Claes Redestad has updated the pull request incrementally with one additional > commit since the last revision: > > Update src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java > > Co-authored-by: Mandy Chung > I really wish this change was not done with ClassFile API, but with a simple > bundled ASM, so it would be easily backportable, if we decide to. It does not > look like CFA buys us a lot here readability/complexity wise: > [d99b159](https://github.com/openjdk/jdk/commit/d99b1596c5ca57b110c1db88be430c6c54c3d599) I would be open to splitting out and pushing the ASM version first and do this CFA port as a follow-up. - PR Comment: https://git.openjdk.org/jdk/pull/18690#issuecomment-2074577669
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v7]
On Sun, 14 Apr 2024 14:33:26 GMT, Claes Redestad wrote: >> What are the scenarios which had regressions? >> Given the conservative growth for StringBuilder, it surprises me a bit that >> any scenario would regress. > > I took a second look and it turns out that there were neither regressions nor > improvements in my test, only a few false positives: For C2 the SB chain is > recognized by the (fragile) C2 OptimizeStringConcat pass and transformed into > a shape where the initial size in java bytecode - if any - is ignored. > > With C1 having an initial size can have a significant effect. One way to > provoke a regression there is to have a huge constant suffix while > underestimating the size of the operands, which can lead to excessive > allocation: > > > Name Cnt BaseError TestError > Unit Change > StringConcat.concat23StringConst 5 385,268 ± 5,238341,251 ± 2,606 > ns/op 1,13x (p = 0,000*) > :gc.alloc.rate 6039,095 ± 82,309 12764,169 ± 97,146 > MB/sec 2,11x (p = 0,000*) > :gc.alloc.rate.norm2440,003 ± 0,000 4568,002 ± 0,000 > B/op 1,87x (p = 0,000*) > > > Still a bit better on throughput from less actual copying. > > *If* the `StringBuilder` is sized sufficiently (constants + args * N) things > look much better, of course: > > -XX:TieredStopAtLevel=1 > Name Cnt BaseError TestError > Unit Change > StringConcat.concat23StringConst 5 385,268 ± 5,238 259,628 ± 2,515 > ns/op 1,48x (p = 0,000*) > :gc.alloc.rate 6039,095 ± 82,309 8902,803 ± 86,563 > MB/sec 1,47x (p = 0,000*) > :gc.alloc.rate.norm2440,003 ± 0,000 2424,002 ± 0,000 > B/op 0,99x (p = 0,000*) > > > For most cases having a size based on the sum of the constants plus some > small factor per argument seem to be a decent heuristic - for C1. Since this > adds just one bytecode to the generated method I guess it wouldn't hurt. Presizing this StringBuilder is a thing I looked into once upon a time, discussed here: https://mail.openjdk.org/pipermail/compiler-dev/2015-March/009356.html This work, I understand, the indyfication of string concatenation in the first place. Primitive values can get bounded at appropriate lengths for their types (see e.g. https://stackoverflow.com/a/21146952/869736). It sounds like you're trying to conserve bytecode length, so maybe you don't want to presize the StringBuilder with the exact Object.toString() lengths, though. - PR Review Comment: https://git.openjdk.org/jdk/pull/18690#discussion_r1576819289
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v7]
> This patch suggests a workaround to an issue with huge SCF MH expression > trees taking excessive JIT compilation resources by reviving (part of) the > simple bytecode-generating strategy that was originally available as an > all-or-nothing strategy choice. > > Instead of reintroducing a binary strategy choice I propose a threshold > parameter, controlled by > `-Djava.lang.invoke.StringConcat.highArityThreshold=`: For expressions > below or at this threshold there's no change, for expressions with an arity > above it we use the `StringBuilder`-chain bytecode generator. > > There are a few trade-offs at play here which influence the choice of > threshold. The simple high arity strategy will for example not see any reuse > of LambdaForms but strictly always generate a class per indy callsite, which > means we might end up with a higher total number of classes generated and > loaded in applications if we set this value too low. It may also produce > worse performance on average. On the other hand there is the observed > increase in C2 resource usage as expressions grow unwieldy. On the other > other hand high arity expressions are likely rare to begin with, with less > opportunities for sharing than the more common low-arity expressions. > > I turned the submitted test case into a few JMH benchmarks and did some > experiments with `-XX:CompileCommand=MemStat,StringConcat::concat,print`: > > Baseline strategy: > 13 args: 6.3M > 23 args: 18M > 123 args: 868M > > `-Djava.lang.invoke.StringConcat.highArityThreshold=0`: > 13 args: 2.11M > 23 args: 3.67M > 123 args: 4.75M > > For 123 args the memory overhead of the baseline strategy is 180x, but for 23 > args we're down to a 5x memory overhead, and down to a 3x overhead for 13 > args. Since the absolute overhead for 23 is borderline acceptable (+15Mb) > I've conservatively chosen a threshold at arity 20. This keeps C2 resource > pressure at a reasonable level (< 18M) while avoiding perturbing performance > at the vast majority of call sites. > > I was asked to use the new class file API for mainline. There's a version of > this patch implemented using ASM in 7c52a9f which might be a reasonable basis > for a backport. Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Update src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java Co-authored-by: Mandy Chung - Changes: - all: https://git.openjdk.org/jdk/pull/18690/files - new: https://git.openjdk.org/jdk/pull/18690/files/83f4048f..e7cbaaf5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18690&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18690&range=05-06 Stats: 5 lines in 1 file changed: 1 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18690.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18690/head:pull/18690 PR: https://git.openjdk.org/jdk/pull/18690