Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v8]
On Wed, 24 Apr 2024 10:08:42 GMT, Claes Redestad wrote: >> This patch suggests a workaround to an issue with huge SCF MH expression >> trees taking excessive JIT compilation resources by reviving (part of) the >> simple bytecode-generating strategy that was originally available as an >> all-or-nothing strategy choice. >> >> Instead of reintroducing a binary strategy choice I propose a threshold >> parameter, controlled by >> `-Djava.lang.invoke.StringConcat.highArityThreshold=`: For expressions >> below or at this threshold there's no change, for expressions with an arity >> above it we use the `StringBuilder`-chain bytecode generator. >> >> There are a few trade-offs at play here which influence the choice of >> threshold. The simple high arity strategy will for example not see any reuse >> of LambdaForms but strictly always generate a class per indy callsite, which >> means we might end up with a higher total number of classes generated and >> loaded in applications if we set this value too low. It may also produce >> worse performance on average. On the other hand there is the observed >> increase in C2 resource usage as expressions grow unwieldy. On the other >> other hand high arity expressions are likely rare to begin with, with less >> opportunities for sharing than the more common low-arity expressions. >> >> I turned the submitted test case into a few JMH benchmarks and did some >> experiments with `-XX:CompileCommand=MemStat,StringConcat::concat,print`: >> >> Baseline strategy: >> 13 args: 6.3M >> 23 args: 18M >> 123 args: 868M >> >> `-Djava.lang.invoke.StringConcat.highArityThreshold=0`: >> 13 args: 2.11M >> 23 args: 3.67M >> 123 args: 4.75M >> >> For 123 args the memory overhead of the baseline strategy is 180x, but for >> 23 args we're down to a 5x memory overhead, and down to a 3x overhead for 13 >> args. Since the absolute overhead for 23 is borderline acceptable (+15Mb) >> I've conservatively chosen a threshold at arity 20. This keeps C2 resource >> pressure at a reasonable level (< 18M) while avoiding perturbing performance >> at the vast majority of call sites. >> >> I was asked to use the new class file API for mainline. There's a version of >> this patch implemented using ASM in 7c52a9f which might be a reasonable >> basis for a backport. > > Claes Redestad has updated the pull request incrementally with one additional > commit since the last revision: > > Make Set.of(STRONG) a constant, fix compilation, minor code improvements Thanks for reviewing! I'll split out and PR an ASM version as a subtask and rebase this PR on top of that. - PR Comment: https://git.openjdk.org/jdk/pull/18690#issuecomment-2075654943
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v8]
On Wed, 24 Apr 2024 10:08:42 GMT, Claes Redestad wrote: >> This patch suggests a workaround to an issue with huge SCF MH expression >> trees taking excessive JIT compilation resources by reviving (part of) the >> simple bytecode-generating strategy that was originally available as an >> all-or-nothing strategy choice. >> >> Instead of reintroducing a binary strategy choice I propose a threshold >> parameter, controlled by >> `-Djava.lang.invoke.StringConcat.highArityThreshold=`: For expressions >> below or at this threshold there's no change, for expressions with an arity >> above it we use the `StringBuilder`-chain bytecode generator. >> >> There are a few trade-offs at play here which influence the choice of >> threshold. The simple high arity strategy will for example not see any reuse >> of LambdaForms but strictly always generate a class per indy callsite, which >> means we might end up with a higher total number of classes generated and >> loaded in applications if we set this value too low. It may also produce >> worse performance on average. On the other hand there is the observed >> increase in C2 resource usage as expressions grow unwieldy. On the other >> other hand high arity expressions are likely rare to begin with, with less >> opportunities for sharing than the more common low-arity expressions. >> >> I turned the submitted test case into a few JMH benchmarks and did some >> experiments with `-XX:CompileCommand=MemStat,StringConcat::concat,print`: >> >> Baseline strategy: >> 13 args: 6.3M >> 23 args: 18M >> 123 args: 868M >> >> `-Djava.lang.invoke.StringConcat.highArityThreshold=0`: >> 13 args: 2.11M >> 23 args: 3.67M >> 123 args: 4.75M >> >> For 123 args the memory overhead of the baseline strategy is 180x, but for >> 23 args we're down to a 5x memory overhead, and down to a 3x overhead for 13 >> args. Since the absolute overhead for 23 is borderline acceptable (+15Mb) >> I've conservatively chosen a threshold at arity 20. This keeps C2 resource >> pressure at a reasonable level (< 18M) while avoiding perturbing performance >> at the vast majority of call sites. >> >> I was asked to use the new class file API for mainline. There's a version of >> this patch implemented using ASM in 7c52a9f which might be a reasonable >> basis for a backport. > > Claes Redestad has updated the pull request incrementally with one additional > commit since the last revision: > > Make Set.of(STRONG) a constant, fix compilation, minor code improvements Looks fine to me. Indeed, splitting this with ASM and then convert it to ClassFile API would help the backport. src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line 1077: > 1075: > 1076: /** > 1077: * Ensure a capacity in the initial StringBuilder to > accommodate all constants plus this factor times the number Nit: wrap long line. src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line 1085: > 1083: > 1084: static { > 1085: DUMPER = > ClassFileDumper.getInstance("java.lang.invoke.StringConcatFactory.dump", > "stringConcatClasses"); Nit: this static block isn't strictly needed. Can assign at the declaration of the static variable. src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line 1112: > 1110: return hiddenLookup.findStatic(innerClass, METHOD_NAME, > args); > : } catch (Exception e) { > 1112: DUMPER.dumpFailedClass(className, classBytes); This line is no longer needed. The bytes will be dumped if it's enabled for both success and failing case. - Marked as reviewed by mchung (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18690#pullrequestreview-2020345792 PR Review Comment: https://git.openjdk.org/jdk/pull/18690#discussion_r1578178759 PR Review Comment: https://git.openjdk.org/jdk/pull/18690#discussion_r1578176295 PR Review Comment: https://git.openjdk.org/jdk/pull/18690#discussion_r1578173742
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v8]
> This patch suggests a workaround to an issue with huge SCF MH expression > trees taking excessive JIT compilation resources by reviving (part of) the > simple bytecode-generating strategy that was originally available as an > all-or-nothing strategy choice. > > Instead of reintroducing a binary strategy choice I propose a threshold > parameter, controlled by > `-Djava.lang.invoke.StringConcat.highArityThreshold=`: For expressions > below or at this threshold there's no change, for expressions with an arity > above it we use the `StringBuilder`-chain bytecode generator. > > There are a few trade-offs at play here which influence the choice of > threshold. The simple high arity strategy will for example not see any reuse > of LambdaForms but strictly always generate a class per indy callsite, which > means we might end up with a higher total number of classes generated and > loaded in applications if we set this value too low. It may also produce > worse performance on average. On the other hand there is the observed > increase in C2 resource usage as expressions grow unwieldy. On the other > other hand high arity expressions are likely rare to begin with, with less > opportunities for sharing than the more common low-arity expressions. > > I turned the submitted test case into a few JMH benchmarks and did some > experiments with `-XX:CompileCommand=MemStat,StringConcat::concat,print`: > > Baseline strategy: > 13 args: 6.3M > 23 args: 18M > 123 args: 868M > > `-Djava.lang.invoke.StringConcat.highArityThreshold=0`: > 13 args: 2.11M > 23 args: 3.67M > 123 args: 4.75M > > For 123 args the memory overhead of the baseline strategy is 180x, but for 23 > args we're down to a 5x memory overhead, and down to a 3x overhead for 13 > args. Since the absolute overhead for 23 is borderline acceptable (+15Mb) > I've conservatively chosen a threshold at arity 20. This keeps C2 resource > pressure at a reasonable level (< 18M) while avoiding perturbing performance > at the vast majority of call sites. > > I was asked to use the new class file API for mainline. There's a version of > this patch implemented using ASM in 7c52a9f which might be a reasonable basis > for a backport. Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Make Set.of(STRONG) a constant, fix compilation, minor code improvements - Changes: - all: https://git.openjdk.org/jdk/pull/18690/files - new: https://git.openjdk.org/jdk/pull/18690/files/e7cbaaf5..9742f074 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18690&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18690&range=06-07 Stats: 9 lines in 1 file changed: 4 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/18690.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18690/head:pull/18690 PR: https://git.openjdk.org/jdk/pull/18690