Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v3]
On Fri, 26 Apr 2024 08:45:45 GMT, Claes Redestad wrote: >> Splitting out the ASM-based version from #18690 to push that first under the >> JBS (to help backporting). Keeping #18690 open to rebase and follow-up on >> this as a subtask. See discussion in that #18690 for more details, >> discussion and motivation for this. > > Claes Redestad has updated the pull request incrementally with one additional > commit since the last revision: > > Copyright and comment about ProxyClassDumper Thanks, nits resolved. Also some minor improvement (improved for; use iconst(mv, len) as originally intended) - PR Comment: https://git.openjdk.org/jdk/pull/18953#issuecomment-2079016011
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v3]
On Fri, 26 Apr 2024 08:45:45 GMT, Claes Redestad wrote: >> Splitting out the ASM-based version from #18690 to push that first under the >> JBS (to help backporting). Keeping #18690 open to rebase and follow-up on >> this as a subtask. See discussion in that #18690 for more details, >> discussion and motivation for this. > > Claes Redestad has updated the pull request incrementally with one additional > commit since the last revision: > > Copyright and comment about ProxyClassDumper Looks okay, thanks. Only cosmetic nits: src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line 108: > 106: static { > 107: String highArity = > VM.getSavedProperty("java.lang.invoke.StringConcat.highArityThreshold"); > 108: HIGH_ARITY_THRESHOLD = Integer.parseInt(highArity != null ? > highArity : "20"); Maybe write this as `= (highArity != null) ? Integer.parseInt(highArity) : 20` to help interpreter a bit. src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line 1105: > 1103: for (int c = 0; c < constants.length; c++) { > 1104: if (constants[c] != null) > 1105: len += constants[c].length(); Braces? src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line 1118: > 1116: ); > 1117: > 1118: Extra new-line? src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line 1164: > 1162: } > 1163: > 1164: Extra new-line? - Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18953#pullrequestreview-2024514742 PR Review Comment: https://git.openjdk.org/jdk/pull/18953#discussion_r1580723293 PR Review Comment: https://git.openjdk.org/jdk/pull/18953#discussion_r1580723981 PR Review Comment: https://git.openjdk.org/jdk/pull/18953#discussion_r1580724174 PR Review Comment: https://git.openjdk.org/jdk/pull/18953#discussion_r1580724568
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v3]
> Splitting out the ASM-based version from #18690 to push that first under the > JBS (to help backporting). Keeping #18690 open to rebase and follow-up on > this as a subtask. See discussion in that #18690 for more details, discussion > and motivation for this. Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Copyright and comment about ProxyClassDumper - Changes: - all: https://git.openjdk.org/jdk/pull/18953/files - new: https://git.openjdk.org/jdk/pull/18953/files/a4dfbfca..591d2d1c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18953&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18953&range=01-02 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18953.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18953/head:pull/18953 PR: https://git.openjdk.org/jdk/pull/18953
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v3]
On Fri, 12 Apr 2024 15:06:36 GMT, Chen Liang wrote: >> I'd prefer considering such optimizations as follow-ups. We need to think >> about where to define such shared classes in a way that considers the full >> lifecycle, facilitates class unloading (one cache per classloader?) while >> still getting good reuse. > > I think `Invokers` might be a good place. Its lifetime is bound to its MT, > and MT already handles class unloading well. If we are doing erased then a > plain array suffices. I think we should be wary about adding things that are strongly kept alive by the MT. While the MTs themselves are weakly referenced and can be unloaded if all usage goes aways, many oft-used MTs might be kept effectively forever. - PR Review Comment: https://git.openjdk.org/jdk/pull/18690#discussion_r1565912250
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v3]
> This patch suggests a workaround to an issue with huge SCF MH expression > trees taking excessive JIT compilation resources by reviving (part of) the > simple bytecode-generating strategy that was originally available as an > all-or-nothing strategy choice. > > Instead of reintroducing a binary strategy choice I propose a threshold > parameter, controlled by > `-Djava.lang.invoke.StringConcat.highArityThreshold=`: For expressions > below or at this threshold there's no change, for expressions with an arity > above it we use the `StringBuilder`-chain bytecode generator. > > There are a few trade-offs at play here which influence the choice of > threshold. The simple high arity strategy will for example not see any reuse > of LambdaForms but strictly always generate a class per indy callsite, which > means we might end up with a higher total number of classes generated and > loaded in applications if we set this value too low. It may also produce > worse performance on average. On the other hand there is the observed > increase in C2 resource usage as expressions grow unwieldy. On the other > other hand high arity expressions are likely rare to begin with, with less > opportunities for sharing than the more common low-arity expressions. > > I turned the submitted test case into a few JMH benchmarks and did some > experiments with `-XX:CompileCommand=MemStat,StringConcat::concat,print`: > > Baseline strategy: > 13 args: 6.3M > 23 args: 18M > 123 args: 868M > > `-Djava.lang.invoke.StringConcat.highArityThreshold=0`: > 13 args: 2.11M > 23 args: 3.67M > 123 args: 4.75M > > For 123 args the memory overhead of the baseline strategy is 180x, but for 23 > args we're down to a 5x memory overhead, and down to a 3x overhead for 13 > args. Since the absolute overhead for 23 is borderline acceptable (+15Mb) > I've conservatively chosen a threshold at arity 20. This keeps C2 resource > pressure at a reasonable level (< 18M) while avoiding perturbing performance > at the vast majority of call sites. > > I was asked to use the new class file API for mainline. There's a version of > this patch implemented using ASM in 7c52a9f which might be a reasonable basis > for a backport. Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Pre-size stringbuilders based on constant size and a small argument factor - Changes: - all: https://git.openjdk.org/jdk/pull/18690/files - new: https://git.openjdk.org/jdk/pull/18690/files/1c3d354c..796d968b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18690&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18690&range=01-02 Stats: 44 lines in 2 files changed: 42 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/18690.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18690/head:pull/18690 PR: https://git.openjdk.org/jdk/pull/18690
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v3]
On Sat, 13 Apr 2024 17:54:54 GMT, Brett Okken wrote: >> Possibly. I tried a few simple variants that initialized the `StringBuilder` >> capacity at various guesses, such as sum of constant sizes + some factor >> based on args, but across a diverse set of micros that gives both some wins >> and some regressions. Getting the estimation just right is pretty tricky, >> especially when size of the arguments are arbitrary (like for any >> String/Object arg). > > What are the scenarios which had regressions? > Given the conservative growth for StringBuilder, it surprises me a bit that > any scenario would regress. I took a second look and it turns out that there were neither regressions nor improvements in my test, only a few false positives: For C2 the SB chain is recognized by the (fragile) C2 OptimizeStringConcat pass and transformed into a shape where the initial size in java bytecode - if any - is ignored. With C1 having an initial size can have a significant effect. One way to provoke a regression there is to have a huge constant suffix while underestimating the size of the operands, which can lead to excessive allocation: Name Cnt BaseError TestError Unit Change StringConcat.concat23StringConst 5 385,268 ± 5,238341,251 ± 2,606 ns/op 1,13x (p = 0,000*) :gc.alloc.rate 6039,095 ± 82,309 12764,169 ± 97,146 MB/sec 2,11x (p = 0,000*) :gc.alloc.rate.norm2440,003 ± 0,000 4568,002 ± 0,000 B/op 1,87x (p = 0,000*) Still a bit better on throughput from less actual copying. *If* the `StringBuilder` is sized sufficiently (constants + args * N) things look much better, of course: -XX:TieredStopAtLevel=1 Name Cnt BaseError TestError Unit Change StringConcat.concat23StringConst 5 385,268 ± 5,238 259,628 ± 2,515 ns/op 1,48x (p = 0,000*) :gc.alloc.rate 6039,095 ± 82,309 8902,803 ± 86,563 MB/sec 1,47x (p = 0,000*) :gc.alloc.rate.norm2440,003 ± 0,000 2424,002 ± 0,000 B/op 0,99x (p = 0,000*) For most cases having a size based on the sum of the constants plus some small factor per argument seem to be a decent heuristic - for C1. Since this adds just one bytecode to the generated method I guess it wouldn't hurt. - PR Review Comment: https://git.openjdk.org/jdk/pull/18690#discussion_r1564753612