Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v4]
On Fri, 26 Apr 2024 09:42:06 GMT, Claes Redestad wrote: >> Splitting out the ASM-based version from #18690 to push that first under the >> JBS (to help backporting). Keeping #18690 open to rebase and follow-up on >> this as a subtask. See discussion in that #18690 for more details, >> discussion and motivation for this. > > Claes Redestad has updated the pull request incrementally with one additional > commit since the last revision: > > Fix nits, use iconst as intended Running another round of sanity testing (tier1+2) before integration. - PR Comment: https://git.openjdk.org/jdk/pull/18953#issuecomment-2079038010
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v3]
On Fri, 26 Apr 2024 08:45:45 GMT, Claes Redestad wrote: >> Splitting out the ASM-based version from #18690 to push that first under the >> JBS (to help backporting). Keeping #18690 open to rebase and follow-up on >> this as a subtask. See discussion in that #18690 for more details, >> discussion and motivation for this. > > Claes Redestad has updated the pull request incrementally with one additional > commit since the last revision: > > Copyright and comment about ProxyClassDumper Thanks, nits resolved. Also some minor improvement (improved for; use iconst(mv, len) as originally intended) - PR Comment: https://git.openjdk.org/jdk/pull/18953#issuecomment-2079016011
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v4]
> Splitting out the ASM-based version from #18690 to push that first under the > JBS (to help backporting). Keeping #18690 open to rebase and follow-up on > this as a subtask. See discussion in that #18690 for more details, discussion > and motivation for this. Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Fix nits, use iconst as intended - Changes: - all: https://git.openjdk.org/jdk/pull/18953/files - new: https://git.openjdk.org/jdk/pull/18953/files/591d2d1c..9b083f4b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18953&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18953&range=02-03 Stats: 9 lines in 1 file changed: 1 ins; 3 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/18953.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18953/head:pull/18953 PR: https://git.openjdk.org/jdk/pull/18953
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v3]
On Fri, 26 Apr 2024 08:45:45 GMT, Claes Redestad wrote: >> Splitting out the ASM-based version from #18690 to push that first under the >> JBS (to help backporting). Keeping #18690 open to rebase and follow-up on >> this as a subtask. See discussion in that #18690 for more details, >> discussion and motivation for this. > > Claes Redestad has updated the pull request incrementally with one additional > commit since the last revision: > > Copyright and comment about ProxyClassDumper Looks okay, thanks. Only cosmetic nits: src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line 108: > 106: static { > 107: String highArity = > VM.getSavedProperty("java.lang.invoke.StringConcat.highArityThreshold"); > 108: HIGH_ARITY_THRESHOLD = Integer.parseInt(highArity != null ? > highArity : "20"); Maybe write this as `= (highArity != null) ? Integer.parseInt(highArity) : 20` to help interpreter a bit. src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line 1105: > 1103: for (int c = 0; c < constants.length; c++) { > 1104: if (constants[c] != null) > 1105: len += constants[c].length(); Braces? src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line 1118: > 1116: ); > 1117: > 1118: Extra new-line? src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line 1164: > 1162: } > 1163: > 1164: Extra new-line? - Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18953#pullrequestreview-2024514742 PR Review Comment: https://git.openjdk.org/jdk/pull/18953#discussion_r1580723293 PR Review Comment: https://git.openjdk.org/jdk/pull/18953#discussion_r1580723981 PR Review Comment: https://git.openjdk.org/jdk/pull/18953#discussion_r1580724174 PR Review Comment: https://git.openjdk.org/jdk/pull/18953#discussion_r1580724568
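The suggested rewrite parses only when the property is actually set, instead of round-tripping the default through a string. A minimal standalone sketch (the helper `readThreshold` is a hypothetical stand-in for the static initializer; the default of 20 mirrors the snippet under review):

```java
// Sketch of the suggested initializer shape: only call Integer.parseInt
// when the property was supplied; otherwise use the numeric default
// directly, sparing the interpreter a parse of a constant string.
public class HighArityDefault {
    static int readThreshold(String highArity) {
        return (highArity != null) ? Integer.parseInt(highArity) : 20;
    }

    public static void main(String[] args) {
        System.out.println(readThreshold(null)); // default when property unset
        System.out.println(readThreshold("32")); // explicit override
    }
}
```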
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v2]
On Thu, 25 Apr 2024 19:53:46 GMT, Claes Redestad wrote: >> Splitting out the ASM-based version from #18690 to push that first under the >> JBS (to help backporting). Keeping #18690 open to rebase and follow-up on >> this as a subtask. See discussion in that #18690 for more details, >> discussion and motivation for this. > > Claes Redestad has updated the pull request incrementally with one additional > commit since the last revision: > > Remove accidental use of java.lang.classfile @shipilev are you OK with this? - PR Comment: https://git.openjdk.org/jdk/pull/18953#issuecomment-2078922183
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v3]
> Splitting out the ASM-based version from #18690 to push that first under the > JBS (to help backporting). Keeping #18690 open to rebase and follow-up on > this as a subtask. See discussion in that #18690 for more details, discussion > and motivation for this. Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Copyright and comment about ProxyClassDumper - Changes: - all: https://git.openjdk.org/jdk/pull/18953/files - new: https://git.openjdk.org/jdk/pull/18953/files/a4dfbfca..591d2d1c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18953&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18953&range=01-02 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18953.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18953/head:pull/18953 PR: https://git.openjdk.org/jdk/pull/18953
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v2]
On Fri, 26 Apr 2024 02:36:22 GMT, Chen Liang wrote: > Also a side note for backport: ClassFileDumper exists only since JDK 21, so > for earlier backports you need to use an alternative dumping method or remove > dumping ability. Yes, original code used ProxyClassDumper, which was renamed/reworked in 21 by [JDK-8304846](https://bugs.openjdk.org/browse/JDK-8304846). I think we've gone as far as we can code-wise in ensuring this is cleanly backportable to at least 21. I'll add a comment about ProxyClassDumper. - PR Comment: https://git.openjdk.org/jdk/pull/18953#issuecomment-2078910803
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v2]
On Fri, 26 Apr 2024 03:14:54 GMT, Mandy Chung wrote: >> Claes Redestad has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Remove accidental use of java.lang.classfile > > src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line > 1057: > >> 1055: */ >> 1056: private static final class SimpleStringBuilderStrategy { >> 1057: static final int CLASSFILE_VERSION = 52; // JDK 8 > > Alternatively, this can use `ClassFileFormatVersion.latest().major()` which > exists since JDK 20. As the original code just had 52 verbatim and I don't know how far we'll want to backport this I think it's fine to leave this at 52. - PR Review Comment: https://git.openjdk.org/jdk/pull/18953#discussion_r1580682911
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v2]
On Thu, 25 Apr 2024 19:53:46 GMT, Claes Redestad wrote: >> Splitting out the ASM-based version from #18690 to push that first under the >> JBS (to help backporting). Keeping #18690 open to rebase and follow-up on >> this as a subtask. See discussion in that #18690 for more details, >> discussion and motivation for this. > > Claes Redestad has updated the pull request incrementally with one additional > commit since the last revision: > > Remove accidental use of java.lang.classfile src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line 1057: > 1055: */ > 1056: private static final class SimpleStringBuilderStrategy { > 1057: static final int CLASSFILE_VERSION = 52; // JDK 8 Alternatively, this can use `ClassFileFormatVersion.latest().major()` which exists since JDK 20. - PR Review Comment: https://git.openjdk.org/jdk/pull/18953#discussion_r1580432036
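Side by side, the two options discussed: the fixed constant from the revived code versus the API-based value available since JDK 20. A sketch; class and field names here are illustrative, not from the patch:

```java
import java.lang.reflect.ClassFileFormatVersion;

// Contrast of the two choices: 52 is the class file major version for
// JDK 8, hardcoded in the original code being revived, while
// ClassFileFormatVersion.latest().major() tracks the running JDK.
public class VersionChoice {
    static final int FIXED_VERSION = 52; // JDK 8, as in the revived code

    public static void main(String[] args) {
        // latest() follows the current runtime, so classes generated with
        // it require that runtime or newer - a backport prefers the fixed 52.
        int latest = ClassFileFormatVersion.latest().major();
        System.out.println(FIXED_VERSION + " vs " + latest);
    }
}
```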
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v2]
On Thu, 25 Apr 2024 19:53:46 GMT, Claes Redestad wrote: >> Splitting out the ASM-based version from #18690 to push that first under the >> JBS (to help backporting). Keeping #18690 open to rebase and follow-up on >> this as a subtask. See discussion in that #18690 for more details, >> discussion and motivation for this. > > Claes Redestad has updated the pull request incrementally with one additional > commit since the last revision: > > Remove accidental use of java.lang.classfile Also a side note for backport: ClassFileDumper exists only since JDK 21, so for earlier backports you need to use an alternative dumping method or remove dumping ability. - PR Comment: https://git.openjdk.org/jdk/pull/18953#issuecomment-2078532971
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v9]
On Wed, 24 Apr 2024 19:13:43 GMT, Claes Redestad wrote: >> This patch suggests a workaround to an issue with huge SCF MH expression >> trees taking excessive JIT compilation resources by reviving (part of) the >> simple bytecode-generating strategy that was originally available as an >> all-or-nothing strategy choice. >> >> Instead of reintroducing a binary strategy choice I propose a threshold >> parameter, controlled by >> `-Djava.lang.invoke.StringConcat.highArityThreshold=`: For expressions >> below or at this threshold there's no change, for expressions with an arity >> above it we use the `StringBuilder`-chain bytecode generator. >> >> There are a few trade-offs at play here which influence the choice of >> threshold. The simple high arity strategy will for example not see any reuse >> of LambdaForms but strictly always generate a class per indy callsite, which >> means we might end up with a higher total number of classes generated and >> loaded in applications if we set this value too low. It may also produce >> worse performance on average. On the other hand there is the observed >> increase in C2 resource usage as expressions grow unwieldy. On the other >> other hand high arity expressions are likely rare to begin with, with less >> opportunities for sharing than the more common low-arity expressions. >> >> I turned the submitted test case into a few JMH benchmarks and did some >> experiments with `-XX:CompileCommand=MemStat,StringConcat::concat,print`: >> >> Baseline strategy: >> 13 args: 6.3M >> 23 args: 18M >> 123 args: 868M >> >> `-Djava.lang.invoke.StringConcat.highArityThreshold=0`: >> 13 args: 2.11M >> 23 args: 3.67M >> 123 args: 4.75M >> >> For 123 args the memory overhead of the baseline strategy is 180x, but for >> 23 args we're down to a 5x memory overhead, and down to a 3x overhead for 13 >> args. Since the absolute overhead for 23 is borderline acceptable (+15Mb) >> I've conservatively chosen a threshold at arity 20. 
This keeps C2 resource >> pressure at a reasonable level (< 18M) while avoiding perturbing performance >> at the vast majority of call sites. >> >> I was asked to use the new class file API for mainline. There's a version of >> this patch implemented using ASM in 7c52a9f which might be a reasonable >> basis for a backport. > > Claes Redestad has updated the pull request incrementally with one additional > commit since the last revision: > > Nits, clean up constant, introduce missing constant MethodTypeDesc for > toString It's a little confusing for JDK-8327247 to have 2 PRs but it's okay. We can add a note in JDK-8327247 to clarify. I assume you plan to use PR to convert to use ClassFile API after you pull from JDK-8327247 once integrated? - PR Comment: https://git.openjdk.org/jdk/pull/18690#issuecomment-2078512445
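For context, the bytecode the simple `StringBuilder`-chain strategy emits for a high-arity call site is roughly what one would write by hand. An illustrative three-argument sketch (names and constants hypothetical, not the actual generated code):

```java
// Illustration of the StringBuilder-chain shape: one builder, a linear
// chain of append calls interleaving constants and arguments, then
// toString(). No MH expression tree is built per call site, which is
// what keeps C2's compilation footprint flat as arity grows.
public class ChainShape {
    static String concat(int x, String y, long z) {
        return new StringBuilder()
                .append("a").append(x)
                .append("b").append(y)
                .append("c").append(z)
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(concat(1, "q", 2L));
    }
}
```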
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v2]
On Thu, 25 Apr 2024 19:53:46 GMT, Claes Redestad wrote: >> Splitting out the ASM-based version from #18690 to push that first under the >> JBS (to help backporting). Keeping #18690 open to rebase and follow-up on >> this as a subtask. See discussion in that #18690 for more details, >> discussion and motivation for this. > > Claes Redestad has updated the pull request incrementally with one additional > commit since the last revision: > > Remove accidental use of java.lang.classfile Nit: the copyright header needs update before integration. - Marked as reviewed by mchung (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18953#pullrequestreview-2023964506
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v2]
On Thu, 25 Apr 2024 16:46:25 GMT, Chen Liang wrote: >> Claes Redestad has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Remove accidental use of java.lang.classfile > > src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line > 1059: > >> 1057: */ >> 1058: private static final class SimpleStringBuilderStrategy { >> 1059: static final int CLASSFILE_VERSION = >> ClassFile.latestMajorVersion(); > > Still breaks backward ASM port, we should use a fixed version like 52 for > JAVA_8 and convert to latest only in the CF conversion later. Good catch. In the code I am [reviving](https://github.com/cl4es/jdk/commit/36c4b11bc6cf5a008d5935934aa75f2d2bbe6a23#diff-1339c269a3729d849799d29a7431ccd508a034ced91c1796b952795396843891L771) this field was simply set to `52`. Fixed. - PR Review Comment: https://git.openjdk.org/jdk/pull/18953#discussion_r1580047931
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v2]
> Splitting out the ASM-based version from #18690 to push that first under the > JBS (to help backporting). Keeping #18690 open to rebase and follow-up on > this as a subtask. See discussion in that #18690 for more details, discussion > and motivation for this. Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Remove accidental use of java.lang.classfile - Changes: - all: https://git.openjdk.org/jdk/pull/18953/files - new: https://git.openjdk.org/jdk/pull/18953/files/d8d5e5d3..a4dfbfca Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18953&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18953&range=00-01 Stats: 3 lines in 1 file changed: 0 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18953.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18953/head:pull/18953 PR: https://git.openjdk.org/jdk/pull/18953
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases
On Thu, 25 Apr 2024 14:15:56 GMT, Claes Redestad wrote: > Splitting out the ASM-based version from #18690 to push that first under the > JBS (to help backporting). Keeping #18690 open to rebase and follow-up on > this as a subtask. See discussion in that #18690 for more details, discussion > and motivation for this. Only with ASM can we realize how concise ClassFile API is! src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line 1059: > 1057: */ > 1058: private static final class SimpleStringBuilderStrategy { > 1059: static final int CLASSFILE_VERSION = > ClassFile.latestMajorVersion(); Still breaks backward ASM port, we should use a fixed version like 52 for JAVA_8 and convert to latest only in the CF conversion later. - PR Review: https://git.openjdk.org/jdk/pull/18953#pullrequestreview-2023078249 PR Review Comment: https://git.openjdk.org/jdk/pull/18953#discussion_r1579822765
RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases
Splitting out the ASM-based version from #18690 to push that first under the JBS (to help backporting). Keeping #18690 open to rebase and follow-up on this as a subtask. See discussion in that #18690 for more details, discussion and motivation for this. - Commit messages: - Adapt main PR feedback to ASM version - ASM fallback version Changes: https://git.openjdk.org/jdk/pull/18953/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18953&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8327247 Stats: 300 lines in 2 files changed: 294 ins; 3 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/18953.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18953/head:pull/18953 PR: https://git.openjdk.org/jdk/pull/18953
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v7]
On Wed, 24 Apr 2024 10:05:27 GMT, Aleksey Shipilev wrote: >>> I really wish this change was not done with ClassFile API, but with a >>> simple bundled ASM, so it would be easily backportable, if we decide to. It >>> does not look like CFA buys us a lot here readability/complexity wise: >>> [d99b159](https://github.com/openjdk/jdk/commit/d99b1596c5ca57b110c1db88be430c6c54c3d599) >> >> I would be open to splitting out and pushing the ASM version first and do >> this CFA port as a follow-up. > >> > I really wish this change was not done with ClassFile API, but with a >> > simple bundled ASM, so it would be easily backportable, if we decide to. >> > It does not look like CFA buys us a lot here readability/complexity wise: >> > [d99b159](https://github.com/openjdk/jdk/commit/d99b1596c5ca57b110c1db88be430c6c54c3d599) >> >> I would be open to splitting out and pushing the ASM version first and do >> this CFA port as a follow-up. > > That would be good, thanks. @shipilev @mlchung opened new PR #18953 for pushing the ASM-based version. Adapted applicable code changes from this PR so it should be equivalent in behavior. - PR Comment: https://git.openjdk.org/jdk/pull/18690#issuecomment-2077319721
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v9]
> This patch suggests a workaround to an issue with huge SCF MH expression > trees taking excessive JIT compilation resources by reviving (part of) the > simple bytecode-generating strategy that was originally available as an > all-or-nothing strategy choice. > > Instead of reintroducing a binary strategy choice I propose a threshold > parameter, controlled by > `-Djava.lang.invoke.StringConcat.highArityThreshold=`: For expressions > below or at this threshold there's no change, for expressions with an arity > above it we use the `StringBuilder`-chain bytecode generator. > > There are a few trade-offs at play here which influence the choice of > threshold. The simple high arity strategy will for example not see any reuse > of LambdaForms but strictly always generate a class per indy callsite, which > means we might end up with a higher total number of classes generated and > loaded in applications if we set this value too low. It may also produce > worse performance on average. On the other hand there is the observed > increase in C2 resource usage as expressions grow unwieldy. On the other > other hand high arity expressions are likely rare to begin with, with less > opportunities for sharing than the more common low-arity expressions. > > I turned the submitted test case into a few JMH benchmarks and did some > experiments with `-XX:CompileCommand=MemStat,StringConcat::concat,print`: > > Baseline strategy: > 13 args: 6.3M > 23 args: 18M > 123 args: 868M > > `-Djava.lang.invoke.StringConcat.highArityThreshold=0`: > 13 args: 2.11M > 23 args: 3.67M > 123 args: 4.75M > > For 123 args the memory overhead of the baseline strategy is 180x, but for 23 > args we're down to a 5x memory overhead, and down to a 3x overhead for 13 > args. Since the absolute overhead for 23 is borderline acceptable (+15Mb) > I've conservatively chosen a threshold at arity 20. 
This keeps C2 resource > pressure at a reasonable level (< 18M) while avoiding perturbing performance > at the vast majority of call sites. > > I was asked to use the new class file API for mainline. There's a version of > this patch implemented using ASM in 7c52a9f which might be a reasonable basis > for a backport. Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Nits, clean up constant, introduce missing constant MethodTypeDesc for toString - Changes: - all: https://git.openjdk.org/jdk/pull/18690/files - new: https://git.openjdk.org/jdk/pull/18690/files/9742f074..0ac8f24b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18690&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18690&range=07-08 Stats: 13 lines in 1 file changed: 2 ins; 5 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/18690.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18690/head:pull/18690 PR: https://git.openjdk.org/jdk/pull/18690
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v8]
On Wed, 24 Apr 2024 10:08:42 GMT, Claes Redestad wrote: >> This patch suggests a workaround to an issue with huge SCF MH expression >> trees taking excessive JIT compilation resources by reviving (part of) the >> simple bytecode-generating strategy that was originally available as an >> all-or-nothing strategy choice. >> >> Instead of reintroducing a binary strategy choice I propose a threshold >> parameter, controlled by >> `-Djava.lang.invoke.StringConcat.highArityThreshold=`: For expressions >> below or at this threshold there's no change, for expressions with an arity >> above it we use the `StringBuilder`-chain bytecode generator. >> >> There are a few trade-offs at play here which influence the choice of >> threshold. The simple high arity strategy will for example not see any reuse >> of LambdaForms but strictly always generate a class per indy callsite, which >> means we might end up with a higher total number of classes generated and >> loaded in applications if we set this value too low. It may also produce >> worse performance on average. On the other hand there is the observed >> increase in C2 resource usage as expressions grow unwieldy. On the other >> other hand high arity expressions are likely rare to begin with, with less >> opportunities for sharing than the more common low-arity expressions. >> >> I turned the submitted test case into a few JMH benchmarks and did some >> experiments with `-XX:CompileCommand=MemStat,StringConcat::concat,print`: >> >> Baseline strategy: >> 13 args: 6.3M >> 23 args: 18M >> 123 args: 868M >> >> `-Djava.lang.invoke.StringConcat.highArityThreshold=0`: >> 13 args: 2.11M >> 23 args: 3.67M >> 123 args: 4.75M >> >> For 123 args the memory overhead of the baseline strategy is 180x, but for >> 23 args we're down to a 5x memory overhead, and down to a 3x overhead for 13 >> args. Since the absolute overhead for 23 is borderline acceptable (+15Mb) >> I've conservatively chosen a threshold at arity 20. 
This keeps C2 resource >> pressure at a reasonable level (< 18M) while avoiding perturbing performance >> at the vast majority of call sites. >> >> I was asked to use the new class file API for mainline. There's a version of >> this patch implemented using ASM in 7c52a9f which might be a reasonable >> basis for a backport. > > Claes Redestad has updated the pull request incrementally with one additional > commit since the last revision: > > Make Set.of(STRONG) a constant, fix compilation, minor code improvements Thanks for reviewing! I'll split out and PR an ASM version as a subtask and rebase this PR on top of that. - PR Comment: https://git.openjdk.org/jdk/pull/18690#issuecomment-2075654943
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v8]
On Wed, 24 Apr 2024 10:08:42 GMT, Claes Redestad wrote: >> This patch suggests a workaround to an issue with huge SCF MH expression >> trees taking excessive JIT compilation resources by reviving (part of) the >> simple bytecode-generating strategy that was originally available as an >> all-or-nothing strategy choice. >> >> Instead of reintroducing a binary strategy choice I propose a threshold >> parameter, controlled by >> `-Djava.lang.invoke.StringConcat.highArityThreshold=`: For expressions >> below or at this threshold there's no change, for expressions with an arity >> above it we use the `StringBuilder`-chain bytecode generator. >> >> There are a few trade-offs at play here which influence the choice of >> threshold. The simple high arity strategy will for example not see any reuse >> of LambdaForms but strictly always generate a class per indy callsite, which >> means we might end up with a higher total number of classes generated and >> loaded in applications if we set this value too low. It may also produce >> worse performance on average. On the other hand there is the observed >> increase in C2 resource usage as expressions grow unwieldy. On the other >> other hand high arity expressions are likely rare to begin with, with less >> opportunities for sharing than the more common low-arity expressions. >> >> I turned the submitted test case into a few JMH benchmarks and did some >> experiments with `-XX:CompileCommand=MemStat,StringConcat::concat,print`: >> >> Baseline strategy: >> 13 args: 6.3M >> 23 args: 18M >> 123 args: 868M >> >> `-Djava.lang.invoke.StringConcat.highArityThreshold=0`: >> 13 args: 2.11M >> 23 args: 3.67M >> 123 args: 4.75M >> >> For 123 args the memory overhead of the baseline strategy is 180x, but for >> 23 args we're down to a 5x memory overhead, and down to a 3x overhead for 13 >> args. Since the absolute overhead for 23 is borderline acceptable (+15Mb) >> I've conservatively chosen a threshold at arity 20. 
This keeps C2 resource >> pressure at a reasonable level (< 18M) while avoiding perturbing performance >> at the vast majority of call sites. >> >> I was asked to use the new class file API for mainline. There's a version of >> this patch implemented using ASM in 7c52a9f which might be a reasonable >> basis for a backport. > > Claes Redestad has updated the pull request incrementally with one additional > commit since the last revision: > > Make Set.of(STRONG) a constant, fix compilation, minor code improvements Looks fine to me. Indeed, splitting this with ASM and then convert it to ClassFile API would help the backport. src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line 1077: > 1075: > 1076: /** > 1077: * Ensure a capacity in the initial StringBuilder to > accommodate all constants plus this factor times the number Nit: wrap long line. src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line 1085: > 1083: > 1084: static { > 1085: DUMPER = > ClassFileDumper.getInstance("java.lang.invoke.StringConcatFactory.dump", > "stringConcatClasses"); Nit: this static block isn't strictly needed. Can assign at the declaration of the static variable. src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line 1112: > 1110: return hiddenLookup.findStatic(innerClass, METHOD_NAME, > args); > : } catch (Exception e) { > 1112: DUMPER.dumpFailedClass(className, classBytes); This line is no longer needed. The bytes will be dumped if it's enabled for both success and failing case. - Marked as reviewed by mchung (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18690#pullrequestreview-2020345792 PR Review Comment: https://git.openjdk.org/jdk/pull/18690#discussion_r1578178759 PR Review Comment: https://git.openjdk.org/jdk/pull/18690#discussion_r1578176295 PR Review Comment: https://git.openjdk.org/jdk/pull/18690#discussion_r1578173742
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v8]
> This patch suggests a workaround to an issue with huge SCF MH expression > trees taking excessive JIT compilation resources by reviving (part of) the > simple bytecode-generating strategy that was originally available as an > all-or-nothing strategy choice. > > Instead of reintroducing a binary strategy choice I propose a threshold > parameter, controlled by > `-Djava.lang.invoke.StringConcat.highArityThreshold=`: For expressions > below or at this threshold there's no change, for expressions with an arity > above it we use the `StringBuilder`-chain bytecode generator. > > There are a few trade-offs at play here which influence the choice of > threshold. The simple high arity strategy will for example not see any reuse > of LambdaForms but strictly always generate a class per indy callsite, which > means we might end up with a higher total number of classes generated and > loaded in applications if we set this value too low. It may also produce > worse performance on average. On the other hand there is the observed > increase in C2 resource usage as expressions grow unwieldy. On the other > other hand high arity expressions are likely rare to begin with, with less > opportunities for sharing than the more common low-arity expressions. > > I turned the submitted test case into a few JMH benchmarks and did some > experiments with `-XX:CompileCommand=MemStat,StringConcat::concat,print`: > > Baseline strategy: > 13 args: 6.3M > 23 args: 18M > 123 args: 868M > > `-Djava.lang.invoke.StringConcat.highArityThreshold=0`: > 13 args: 2.11M > 23 args: 3.67M > 123 args: 4.75M > > For 123 args the memory overhead of the baseline strategy is 180x, but for 23 > args we're down to a 5x memory overhead, and down to a 3x overhead for 13 > args. Since the absolute overhead for 23 is borderline acceptable (+15Mb) > I've conservatively chosen a threshold at arity 20. 
This keeps C2 resource > pressure at a reasonable level (< 18M) while avoiding perturbing performance > at the vast majority of call sites. > > I was asked to use the new class file API for mainline. There's a version of > this patch implemented using ASM in 7c52a9f which might be a reasonable basis > for a backport. Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: Make Set.of(STRONG) a constant, fix compilation, minor code improvements - Changes: - all: https://git.openjdk.org/jdk/pull/18690/files - new: https://git.openjdk.org/jdk/pull/18690/files/e7cbaaf5..9742f074 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18690&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18690&range=06-07 Stats: 9 lines in 1 file changed: 4 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/18690.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18690/head:pull/18690 PR: https://git.openjdk.org/jdk/pull/18690
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v7]
On Wed, 24 Apr 2024 10:01:47 GMT, Claes Redestad wrote: > > I really wish this change was not done with ClassFile API, but with a > > simple bundled ASM, so it would be easily backportable, if we decide to. It > > does not look like CFA buys us a lot here readability/complexity wise: > > [d99b159](https://github.com/openjdk/jdk/commit/d99b1596c5ca57b110c1db88be430c6c54c3d599) > > I would be open to splitting out and pushing the ASM version first and do > this CFA port as a follow-up. That would be good, thanks. - PR Comment: https://git.openjdk.org/jdk/pull/18690#issuecomment-2074585421
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v7]
On Wed, 24 Apr 2024 09:57:42 GMT, Claes Redestad wrote: >> This patch suggests a workaround to an issue with huge SCF MH expression >> trees taking excessive JIT compilation resources by reviving (part of) the >> simple bytecode-generating strategy that was originally available as an >> all-or-nothing strategy choice. >> >> Instead of reintroducing a binary strategy choice I propose a threshold >> parameter, controlled by >> `-Djava.lang.invoke.StringConcat.highArityThreshold=`: For expressions >> below or at this threshold there's no change, for expressions with an arity >> above it we use the `StringBuilder`-chain bytecode generator. >> >> There are a few trade-offs at play here which influence the choice of >> threshold. The simple high arity strategy will for example not see any reuse >> of LambdaForms but strictly always generate a class per indy callsite, which >> means we might end up with a higher total number of classes generated and >> loaded in applications if we set this value too low. It may also produce >> worse performance on average. On the other hand there is the observed >> increase in C2 resource usage as expressions grow unwieldy. On the other >> other hand high arity expressions are likely rare to begin with, with less >> opportunities for sharing than the more common low-arity expressions. >> >> I turned the submitted test case into a few JMH benchmarks and did some >> experiments with `-XX:CompileCommand=MemStat,StringConcat::concat,print`: >> >> Baseline strategy: >> 13 args: 6.3M >> 23 args: 18M >> 123 args: 868M >> >> `-Djava.lang.invoke.StringConcat.highArityThreshold=0`: >> 13 args: 2.11M >> 23 args: 3.67M >> 123 args: 4.75M >> >> For 123 args the memory overhead of the baseline strategy is 180x, but for >> 23 args we're down to a 5x memory overhead, and down to a 3x overhead for 13 >> args. Since the absolute overhead for 23 is borderline acceptable (+15Mb) >> I've conservatively chosen a threshold at arity 20. 
This keeps C2 resource >> pressure at a reasonable level (< 18M) while avoiding perturbing performance >> at the vast majority of call sites. >> >> I was asked to use the new class file API for mainline. There's a version of >> this patch implemented using ASM in 7c52a9f which might be a reasonable >> basis for a backport. > > Claes Redestad has updated the pull request incrementally with one additional > commit since the last revision: > > Update src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java > > Co-authored-by: Mandy Chung > I really wish this change was not done with ClassFile API, but with a simple > bundled ASM, so it would be easily backportable, if we decide to. It does not > look like CFA buys us a lot here readability/complexity wise: > [d99b159](https://github.com/openjdk/jdk/commit/d99b1596c5ca57b110c1db88be430c6c54c3d599) I would be open to splitting out and pushing the ASM version first and do this CFA port as a follow-up. - PR Comment: https://git.openjdk.org/jdk/pull/18690#issuecomment-2074577669
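To make the trade-off concrete: the `StringBuilder`-chain generator discussed here emits, per indy call site, a small class whose concat method is bytecode-equivalent to a plain append chain. A minimal sketch in source form (class, method, and argument names are made up for illustration):

```java
// Hypothetical source-level equivalent of what the simple high-arity
// strategy generates in bytecode for a concat like a + b + c:
// one StringBuilder, one append per element, one toString.
public class HighArityConcatSketch {
    static String concat(String a, int b, Object c) {
        return new StringBuilder()
                .append(a)
                .append(b)
                .append(c)   // append(Object) renders null as "null"
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(concat("x", 1, null)); // prints "x1null"
    }
}
```

Unlike the MH-tree strategy, nothing here is shared between call sites, which is why a low threshold can inflate the number of generated classes.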
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v7]
On Sun, 14 Apr 2024 14:33:26 GMT, Claes Redestad wrote: >> What are the scenarios which had regressions? >> Given the conservative growth for StringBuilder, it surprises me a bit that >> any scenario would regress. > > I took a second look and it turns out that there were neither regressions nor > improvements in my test, only a few false positives: For C2 the SB chain is > recognized by the (fragile) C2 OptimizeStringConcat pass and transformed into > a shape where the initial size in java bytecode - if any - is ignored. > > With C1 having an initial size can have a significant effect. One way to > provoke a regression there is to have a huge constant suffix while > underestimating the size of the operands, which can lead to excessive > allocation: >
> Name                              Cnt      Base   Error       Test   Error    Unit  Change
> StringConcat.concat23StringConst    5   385,268 ±  5,238    341,251 ±  2,606  ns/op  1,13x (p = 0,000*)
>   :gc.alloc.rate                      6039,095 ± 82,309  12764,169 ± 97,146  MB/sec  2,11x (p = 0,000*)
>   :gc.alloc.rate.norm                 2440,003 ±  0,000   4568,002 ±  0,000    B/op  1,87x (p = 0,000*)
>
> Still a bit better on throughput from less actual copying. > > *If* the `StringBuilder` is sized sufficiently (constants + args * N) things > look much better, of course: > > -XX:TieredStopAtLevel=1
> Name                              Cnt      Base   Error      Test   Error    Unit  Change
> StringConcat.concat23StringConst    5   385,268 ±  5,238   259,628 ±  2,515  ns/op  1,48x (p = 0,000*)
>   :gc.alloc.rate                      6039,095 ± 82,309   8902,803 ± 86,563  MB/sec  1,47x (p = 0,000*)
>   :gc.alloc.rate.norm                 2440,003 ±  0,000   2424,002 ±  0,000    B/op  0,99x (p = 0,000*)
>
> For most cases having a size based on the sum of the constants plus some > small factor per argument seem to be a decent heuristic - for C1. Since this > adds just one bytecode to the generated method I guess it wouldn't hurt.
Presizing this StringBuilder is a thing I looked into once upon a time, discussed here: https://mail.openjdk.org/pipermail/compiler-dev/2015-March/009356.html This work predates, as I understand it, the indyfication of string concatenation in the first place. Primitive values can be bounded at appropriate lengths for their types (see e.g. https://stackoverflow.com/a/21146952/869736). It sounds like you're trying to conserve bytecode length, so maybe you don't want to presize the StringBuilder with the exact Object.toString() lengths, though. - PR Review Comment: https://git.openjdk.org/jdk/pull/18690#discussion_r1576819289
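The bound-at-appropriate-lengths idea above can be sketched as a worst-case table of decimal string lengths per primitive type. This is an illustration, not the JDK's code; the fallback value for reference types is an arbitrary assumption:

```java
// Hypothetical sketch: worst-case toString() lengths for primitives,
// usable to presize a StringBuilder without evaluating the operands.
public class PrimitiveLengthBounds {
    static int maxLength(Class<?> type) {
        if (type == boolean.class) return 5;  // "false"
        if (type == char.class)    return 1;
        if (type == byte.class)    return 4;  // "-128"
        if (type == short.class)   return 6;  // "-32768"
        if (type == int.class)     return 11; // "-2147483648"
        if (type == long.class)    return 20; // "-9223372036854775808"
        return 16;                            // String/Object: unbounded, a guess
    }

    public static void main(String[] args) {
        System.out.println(maxLength(int.class));  // prints 11
        System.out.println(maxLength(long.class)); // prints 20
    }
}
```

Float and double are deliberately left out here; their worst-case rendered lengths are longer and subtler to bound.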
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v7]
> This patch suggests a workaround to an issue with huge SCF MH expression > trees taking excessive JIT compilation resources by reviving (part of) the > simple bytecode-generating strategy that was originally available as an > all-or-nothing strategy choice. > > Instead of reintroducing a binary strategy choice I propose a threshold > parameter, controlled by > `-Djava.lang.invoke.StringConcat.highArityThreshold=`: For expressions > below or at this threshold there's no change, for expressions with an arity > above it we use the `StringBuilder`-chain bytecode generator. > > There are a few trade-offs at play here which influence the choice of > threshold. The simple high arity strategy will for example not see any reuse > of LambdaForms but strictly always generate a class per indy callsite, which > means we might end up with a higher total number of classes generated and > loaded in applications if we set this value too low. It may also produce > worse performance on average. On the other hand there is the observed > increase in C2 resource usage as expressions grow unwieldy. On the other > other hand high arity expressions are likely rare to begin with, with less > opportunities for sharing than the more common low-arity expressions. > > I turned the submitted test case into a few JMH benchmarks and did some > experiments with `-XX:CompileCommand=MemStat,StringConcat::concat,print`: > > Baseline strategy: > 13 args: 6.3M > 23 args: 18M > 123 args: 868M > > `-Djava.lang.invoke.StringConcat.highArityThreshold=0`: > 13 args: 2.11M > 23 args: 3.67M > 123 args: 4.75M > > For 123 args the memory overhead of the baseline strategy is 180x, but for 23 > args we're down to a 5x memory overhead, and down to a 3x overhead for 13 > args. Since the absolute overhead for 23 is borderline acceptable (+15Mb) > I've conservatively chosen a threshold at arity 20. 
This keeps C2 resource > pressure at a reasonable level (< 18M) while avoiding perturbing performance > at the vast majority of call sites. > > I was asked to use the new class file API for mainline. There's a version of > this patch implemented using ASM in 7c52a9f which might be a reasonable basis > for a backport.

Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:

 Update src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java

 Co-authored-by: Mandy Chung

- Changes:
 - all: https://git.openjdk.org/jdk/pull/18690/files
 - new: https://git.openjdk.org/jdk/pull/18690/files/83f4048f..e7cbaaf5
Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18690&range=06
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18690&range=05-06
Stats: 5 lines in 1 file changed: 1 ins; 2 del; 2 mod
Patch: https://git.openjdk.org/jdk/pull/18690.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/18690/head:pull/18690
PR: https://git.openjdk.org/jdk/pull/18690
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v6]
On Thu, 18 Apr 2024 14:50:30 GMT, Claes Redestad wrote: >> This patch suggests a workaround to an issue with huge SCF MH expression >> trees taking excessive JIT compilation resources by reviving (part of) the >> simple bytecode-generating strategy that was originally available as an >> all-or-nothing strategy choice. >> >> Instead of reintroducing a binary strategy choice I propose a threshold >> parameter, controlled by >> `-Djava.lang.invoke.StringConcat.highArityThreshold=`: For expressions >> below or at this threshold there's no change, for expressions with an arity >> above it we use the `StringBuilder`-chain bytecode generator. >> >> There are a few trade-offs at play here which influence the choice of >> threshold. The simple high arity strategy will for example not see any reuse >> of LambdaForms but strictly always generate a class per indy callsite, which >> means we might end up with a higher total number of classes generated and >> loaded in applications if we set this value too low. It may also produce >> worse performance on average. On the other hand there is the observed >> increase in C2 resource usage as expressions grow unwieldy. On the other >> other hand high arity expressions are likely rare to begin with, with less >> opportunities for sharing than the more common low-arity expressions. >> >> I turned the submitted test case into a few JMH benchmarks and did some >> experiments with `-XX:CompileCommand=MemStat,StringConcat::concat,print`: >> >> Baseline strategy: >> 13 args: 6.3M >> 23 args: 18M >> 123 args: 868M >> >> `-Djava.lang.invoke.StringConcat.highArityThreshold=0`: >> 13 args: 2.11M >> 23 args: 3.67M >> 123 args: 4.75M >> >> For 123 args the memory overhead of the baseline strategy is 180x, but for >> 23 args we're down to a 5x memory overhead, and down to a 3x overhead for 13 >> args. Since the absolute overhead for 23 is borderline acceptable (+15Mb) >> I've conservatively chosen a threshold at arity 20. 
This keeps C2 resource >> pressure at a reasonable level (< 18M) while avoiding perturbing performance >> at the vast majority of call sites. >> >> I was asked to use the new class file API for mainline. There's a version of >> this patch implemented using ASM in 7c52a9f which might be a reasonable >> basis for a backport. > > Claes Redestad has updated the pull request incrementally with one additional > commit since the last revision: > > Minor SimpleStringBuilderStrategy:: overhead reduction

src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line 1108:

> 1106:             DUMPER.dumpClass(innerClass.getName(), innerClass, classBytes);
> 1107:         MethodHandle mh = hiddenLookup.findStatic(innerClass, METHOD_NAME, args);
> 1108:         return mh;

An alternative way to define a hidden class that detects whether the dumper is enabled and dumps the classfile. Suggestion:

    Lookup hiddenLookup = lookup.makeHiddenClassDefiner(className, classBytes, Set.of(STRONG), DUMPER)
            .defineClassAsLookup(true);
    Class<?> innerClass = hiddenLookup.lookupClass();
    return hiddenLookup.findStatic(innerClass, METHOD_NAME, args);

- PR Review Comment: https://git.openjdk.org/jdk/pull/18690#discussion_r1576880600
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v6]
On Thu, 18 Apr 2024 14:50:30 GMT, Claes Redestad wrote: >> This patch suggests a workaround to an issue with huge SCF MH expression >> trees taking excessive JIT compilation resources by reviving (part of) the >> simple bytecode-generating strategy that was originally available as an >> all-or-nothing strategy choice. >> >> Instead of reintroducing a binary strategy choice I propose a threshold >> parameter, controlled by >> `-Djava.lang.invoke.StringConcat.highArityThreshold=`: For expressions >> below or at this threshold there's no change, for expressions with an arity >> above it we use the `StringBuilder`-chain bytecode generator. >> >> There are a few trade-offs at play here which influence the choice of >> threshold. The simple high arity strategy will for example not see any reuse >> of LambdaForms but strictly always generate a class per indy callsite, which >> means we might end up with a higher total number of classes generated and >> loaded in applications if we set this value too low. It may also produce >> worse performance on average. On the other hand there is the observed >> increase in C2 resource usage as expressions grow unwieldy. On the other >> other hand high arity expressions are likely rare to begin with, with less >> opportunities for sharing than the more common low-arity expressions. >> >> I turned the submitted test case into a few JMH benchmarks and did some >> experiments with `-XX:CompileCommand=MemStat,StringConcat::concat,print`: >> >> Baseline strategy: >> 13 args: 6.3M >> 23 args: 18M >> 123 args: 868M >> >> `-Djava.lang.invoke.StringConcat.highArityThreshold=0`: >> 13 args: 2.11M >> 23 args: 3.67M >> 123 args: 4.75M >> >> For 123 args the memory overhead of the baseline strategy is 180x, but for >> 23 args we're down to a 5x memory overhead, and down to a 3x overhead for 13 >> args. Since the absolute overhead for 23 is borderline acceptable (+15Mb) >> I've conservatively chosen a threshold at arity 20. 
This keeps C2 resource >> pressure at a reasonable level (< 18M) while avoiding perturbing performance >> at the vast majority of call sites. >> >> I was asked to use the new class file API for mainline. There's a version of >> this patch implemented using ASM in 7c52a9f which might be a reasonable >> basis for a backport. > > Claes Redestad has updated the pull request incrementally with one additional > commit since the last revision: > > Minor SimpleStringBuilderStrategy:: overhead reduction I really wish this change was not done with ClassFile API, but with a simple bundled ASM, so it would be easily backportable, if we decide to. It does not look like CFA buys us a lot here readability/complexity wise: https://github.com/openjdk/jdk/pull/18690/commits/d99b1596c5ca57b110c1db88be430c6c54c3d599 - PR Comment: https://git.openjdk.org/jdk/pull/18690#issuecomment-2073284920
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v6]
On Mon, 22 Apr 2024 08:59:33 GMT, Claes Redestad wrote: > @mlchung @asotona: Alan asked me to use the ClassFile API here. As it's only > used in what's effectively a slow path it shouldn't be blocked by the startup > investigations we're doing there, right? I agree that this should not be blocked by the startup investigation for ClassFile API. It only affects high arity expressions. - PR Comment: https://git.openjdk.org/jdk/pull/18690#issuecomment-2073085437
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v6]
On Thu, 18 Apr 2024 14:50:30 GMT, Claes Redestad wrote: >> This patch suggests a workaround to an issue with huge SCF MH expression >> trees taking excessive JIT compilation resources by reviving (part of) the >> simple bytecode-generating strategy that was originally available as an >> all-or-nothing strategy choice. >> >> Instead of reintroducing a binary strategy choice I propose a threshold >> parameter, controlled by >> `-Djava.lang.invoke.StringConcat.highArityThreshold=`: For expressions >> below or at this threshold there's no change, for expressions with an arity >> above it we use the `StringBuilder`-chain bytecode generator. >> >> There are a few trade-offs at play here which influence the choice of >> threshold. The simple high arity strategy will for example not see any reuse >> of LambdaForms but strictly always generate a class per indy callsite, which >> means we might end up with a higher total number of classes generated and >> loaded in applications if we set this value too low. It may also produce >> worse performance on average. On the other hand there is the observed >> increase in C2 resource usage as expressions grow unwieldy. On the other >> other hand high arity expressions are likely rare to begin with, with less >> opportunities for sharing than the more common low-arity expressions. >> >> I turned the submitted test case into a few JMH benchmarks and did some >> experiments with `-XX:CompileCommand=MemStat,StringConcat::concat,print`: >> >> Baseline strategy: >> 13 args: 6.3M >> 23 args: 18M >> 123 args: 868M >> >> `-Djava.lang.invoke.StringConcat.highArityThreshold=0`: >> 13 args: 2.11M >> 23 args: 3.67M >> 123 args: 4.75M >> >> For 123 args the memory overhead of the baseline strategy is 180x, but for >> 23 args we're down to a 5x memory overhead, and down to a 3x overhead for 13 >> args. Since the absolute overhead for 23 is borderline acceptable (+15Mb) >> I've conservatively chosen a threshold at arity 20. 
This keeps C2 resource >> pressure at a reasonable level (< 18M) while avoiding perturbing performance >> at the vast majority of call sites. >> >> I was asked to use the new class file API for mainline. There's a version of >> this patch implemented using ASM in 7c52a9f which might be a reasonable >> basis for a backport. > > Claes Redestad has updated the pull request incrementally with one additional > commit since the last revision: > > Minor SimpleStringBuilderStrategy:: overhead reduction This one is still waiting for a review. @mlchung @asotona: Alan asked me to use the ClassFile API here. As it's only used in what's effectively a slow path it shouldn't be blocked by the startup investigations we're doing there, right? - PR Comment: https://git.openjdk.org/jdk/pull/18690#issuecomment-2068862978
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v6]
> This patch suggests a workaround to an issue with huge SCF MH expression > trees taking excessive JIT compilation resources by reviving (part of) the > simple bytecode-generating strategy that was originally available as an > all-or-nothing strategy choice. > > Instead of reintroducing a binary strategy choice I propose a threshold > parameter, controlled by > `-Djava.lang.invoke.StringConcat.highArityThreshold=`: For expressions > below or at this threshold there's no change, for expressions with an arity > above it we use the `StringBuilder`-chain bytecode generator. > > There are a few trade-offs at play here which influence the choice of > threshold. The simple high arity strategy will for example not see any reuse > of LambdaForms but strictly always generate a class per indy callsite, which > means we might end up with a higher total number of classes generated and > loaded in applications if we set this value too low. It may also produce > worse performance on average. On the other hand there is the observed > increase in C2 resource usage as expressions grow unwieldy. On the other > other hand high arity expressions are likely rare to begin with, with less > opportunities for sharing than the more common low-arity expressions. > > I turned the submitted test case into a few JMH benchmarks and did some > experiments with `-XX:CompileCommand=MemStat,StringConcat::concat,print`: > > Baseline strategy: > 13 args: 6.3M > 23 args: 18M > 123 args: 868M > > `-Djava.lang.invoke.StringConcat.highArityThreshold=0`: > 13 args: 2.11M > 23 args: 3.67M > 123 args: 4.75M > > For 123 args the memory overhead of the baseline strategy is 180x, but for 23 > args we're down to a 5x memory overhead, and down to a 3x overhead for 13 > args. Since the absolute overhead for 23 is borderline acceptable (+15Mb) > I've conservatively chosen a threshold at arity 20. 
This keeps C2 resource > pressure at a reasonable level (< 18M) while avoiding perturbing performance > at the vast majority of call sites. > > I was asked to use the new class file API for mainline. There's a version of > this patch implemented using ASM in 7c52a9f which might be a reasonable basis > for a backport.

Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:

 Minor SimpleStringBuilderStrategy:: overhead reduction

- Changes:
 - all: https://git.openjdk.org/jdk/pull/18690/files
 - new: https://git.openjdk.org/jdk/pull/18690/files/fd9c40d2..83f4048f
Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18690&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18690&range=04-05
Stats: 11 lines in 1 file changed: 1 ins; 0 del; 10 mod
Patch: https://git.openjdk.org/jdk/pull/18690.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/18690/head:pull/18690
PR: https://git.openjdk.org/jdk/pull/18690
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v5]
> This patch suggests a workaround to an issue with huge SCF MH expression > trees taking excessive JIT compilation resources by reviving (part of) the > simple bytecode-generating strategy that was originally available as an > all-or-nothing strategy choice. > > Instead of reintroducing a binary strategy choice I propose a threshold > parameter, controlled by > `-Djava.lang.invoke.StringConcat.highArityThreshold=`: For expressions > below or at this threshold there's no change, for expressions with an arity > above it we use the `StringBuilder`-chain bytecode generator. > > There are a few trade-offs at play here which influence the choice of > threshold. The simple high arity strategy will for example not see any reuse > of LambdaForms but strictly always generate a class per indy callsite, which > means we might end up with a higher total number of classes generated and > loaded in applications if we set this value too low. It may also produce > worse performance on average. On the other hand there is the observed > increase in C2 resource usage as expressions grow unwieldy. On the other > other hand high arity expressions are likely rare to begin with, with less > opportunities for sharing than the more common low-arity expressions. > > I turned the submitted test case into a few JMH benchmarks and did some > experiments with `-XX:CompileCommand=MemStat,StringConcat::concat,print`: > > Baseline strategy: > 13 args: 6.3M > 23 args: 18M > 123 args: 868M > > `-Djava.lang.invoke.StringConcat.highArityThreshold=0`: > 13 args: 2.11M > 23 args: 3.67M > 123 args: 4.75M > > For 123 args the memory overhead of the baseline strategy is 180x, but for 23 > args we're down to a 5x memory overhead, and down to a 3x overhead for 13 > args. Since the absolute overhead for 23 is borderline acceptable (+15Mb) > I've conservatively chosen a threshold at arity 20. 
This keeps C2 resource > pressure at a reasonable level (< 18M) while avoiding perturbing performance > at the vast majority of call sites. > > I was asked to use the new class file API for mainline. There's a version of > this patch implemented using ASM in 7c52a9f which might be a reasonable basis > for a backport.

Claes Redestad has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits:

 - Merge master, resolve conflicts due pr/18688
 - Dump using the hidden class name
 - Pre-size stringbuilders based on constant size and a small argument factor
 - @liach feedback
 - Bump threshold after experiments
 - Port ASM to CFA
 - Lower compilation overhead for large concat expressions, initial ASM-based version based on pre-existing implementation

- Changes: https://git.openjdk.org/jdk/pull/18690/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18690&range=04
Stats: 237 lines in 2 files changed: 229 ins; 2 del; 6 mod
Patch: https://git.openjdk.org/jdk/pull/18690.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/18690/head:pull/18690
PR: https://git.openjdk.org/jdk/pull/18690
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v4]
> This patch suggests a workaround to an issue with huge SCF MH expression > trees taking excessive JIT compilation resources by reviving (part of) the > simple bytecode-generating strategy that was originally available as an > all-or-nothing strategy choice. > > Instead of reintroducing a binary strategy choice I propose a threshold > parameter, controlled by > `-Djava.lang.invoke.StringConcat.highArityThreshold=`: For expressions > below or at this threshold there's no change, for expressions with an arity > above it we use the `StringBuilder`-chain bytecode generator. > > There are a few trade-offs at play here which influence the choice of > threshold. The simple high arity strategy will for example not see any reuse > of LambdaForms but strictly always generate a class per indy callsite, which > means we might end up with a higher total number of classes generated and > loaded in applications if we set this value too low. It may also produce > worse performance on average. On the other hand there is the observed > increase in C2 resource usage as expressions grow unwieldy. On the other > other hand high arity expressions are likely rare to begin with, with less > opportunities for sharing than the more common low-arity expressions. > > I turned the submitted test case into a few JMH benchmarks and did some > experiments with `-XX:CompileCommand=MemStat,StringConcat::concat,print`: > > Baseline strategy: > 13 args: 6.3M > 23 args: 18M > 123 args: 868M > > `-Djava.lang.invoke.StringConcat.highArityThreshold=0`: > 13 args: 2.11M > 23 args: 3.67M > 123 args: 4.75M > > For 123 args the memory overhead of the baseline strategy is 180x, but for 23 > args we're down to a 5x memory overhead, and down to a 3x overhead for 13 > args. Since the absolute overhead for 23 is borderline acceptable (+15Mb) > I've conservatively chosen a threshold at arity 20. 
This keeps C2 resource > pressure at a reasonable level (< 18M) while avoiding perturbing performance > at the vast majority of call sites. > > I was asked to use the new class file API for mainline. There's a version of > this patch implemented using ASM in 7c52a9f which might be a reasonable basis > for a backport.

Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:

 Dump using the hidden class name

- Changes:
 - all: https://git.openjdk.org/jdk/pull/18690/files
 - new: https://git.openjdk.org/jdk/pull/18690/files/796d968b..9eed6609
Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18690&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18690&range=02-03
Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
Patch: https://git.openjdk.org/jdk/pull/18690.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/18690/head:pull/18690
PR: https://git.openjdk.org/jdk/pull/18690
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v3]
On Fri, 12 Apr 2024 15:06:36 GMT, Chen Liang wrote: >> I'd prefer considering such optimizations as follow-ups. We need to think >> about where to define such shared classes in a way that considers the full >> lifecycle, facilitates class unloading (one cache per classloader?) while >> still getting good reuse. > > I think `Invokers` might be a good place. Its lifetime is bound to its MT, > and MT already handles class unloading well. If we are doing erased then a > plain array suffices. I think we should be wary about adding things that are strongly kept alive by the MT. While the MTs themselves are weakly referenced and can be unloaded if all usage goes aways, many oft-used MTs might be kept effectively forever. - PR Review Comment: https://git.openjdk.org/jdk/pull/18690#discussion_r1565912250
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v3]
> This patch suggests a workaround to an issue with huge SCF MH expression > trees taking excessive JIT compilation resources by reviving (part of) the > simple bytecode-generating strategy that was originally available as an > all-or-nothing strategy choice. > > Instead of reintroducing a binary strategy choice I propose a threshold > parameter, controlled by > `-Djava.lang.invoke.StringConcat.highArityThreshold=`: For expressions > below or at this threshold there's no change, for expressions with an arity > above it we use the `StringBuilder`-chain bytecode generator. > > There are a few trade-offs at play here which influence the choice of > threshold. The simple high arity strategy will for example not see any reuse > of LambdaForms but strictly always generate a class per indy callsite, which > means we might end up with a higher total number of classes generated and > loaded in applications if we set this value too low. It may also produce > worse performance on average. On the other hand there is the observed > increase in C2 resource usage as expressions grow unwieldy. On the other > other hand high arity expressions are likely rare to begin with, with less > opportunities for sharing than the more common low-arity expressions. > > I turned the submitted test case into a few JMH benchmarks and did some > experiments with `-XX:CompileCommand=MemStat,StringConcat::concat,print`: > > Baseline strategy: > 13 args: 6.3M > 23 args: 18M > 123 args: 868M > > `-Djava.lang.invoke.StringConcat.highArityThreshold=0`: > 13 args: 2.11M > 23 args: 3.67M > 123 args: 4.75M > > For 123 args the memory overhead of the baseline strategy is 180x, but for 23 > args we're down to a 5x memory overhead, and down to a 3x overhead for 13 > args. Since the absolute overhead for 23 is borderline acceptable (+15Mb) > I've conservatively chosen a threshold at arity 20. 
This keeps C2 resource > pressure at a reasonable level (< 18M) while avoiding perturbing performance > at the vast majority of call sites. > > I was asked to use the new class file API for mainline. There's a version of > this patch implemented using ASM in 7c52a9f which might be a reasonable basis > for a backport.

Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:

 Pre-size stringbuilders based on constant size and a small argument factor

- Changes:
 - all: https://git.openjdk.org/jdk/pull/18690/files
 - new: https://git.openjdk.org/jdk/pull/18690/files/1c3d354c..796d968b
Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18690&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18690&range=01-02
Stats: 44 lines in 2 files changed: 42 ins; 0 del; 2 mod
Patch: https://git.openjdk.org/jdk/pull/18690.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/18690/head:pull/18690
PR: https://git.openjdk.org/jdk/pull/18690
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v3]
On Sat, 13 Apr 2024 17:54:54 GMT, Brett Okken wrote: >> Possibly. I tried a few simple variants that initialized the `StringBuilder` >> capacity at various guesses, such as sum of constant sizes + some factor >> based on args, but across a diverse set of micros that gives both some wins >> and some regressions. Getting the estimation just right is pretty tricky, >> especially when size of the arguments are arbitrary (like for any >> String/Object arg). > > What are the scenarios which had regressions? > Given the conservative growth for StringBuilder, it surprises me a bit that > any scenario would regress.

I took a second look and it turns out that there were neither regressions nor improvements in my test, only a few false positives: For C2 the SB chain is recognized by the (fragile) C2 OptimizeStringConcat pass and transformed into a shape where the initial size in java bytecode - if any - is ignored.

With C1 having an initial size can have a significant effect. One way to provoke a regression there is to have a huge constant suffix while underestimating the size of the operands, which can lead to excessive allocation:

Name                              Cnt      Base   Error       Test   Error    Unit  Change
StringConcat.concat23StringConst    5   385,268 ±  5,238    341,251 ±  2,606  ns/op  1,13x (p = 0,000*)
  :gc.alloc.rate                      6039,095 ± 82,309  12764,169 ± 97,146  MB/sec  2,11x (p = 0,000*)
  :gc.alloc.rate.norm                 2440,003 ±  0,000   4568,002 ±  0,000    B/op  1,87x (p = 0,000*)

Still a bit better on throughput from less actual copying.
*If* the `StringBuilder` is sized sufficiently (constants + args * N) things look much better, of course:

-XX:TieredStopAtLevel=1
Name                              Cnt      Base   Error      Test   Error    Unit  Change
StringConcat.concat23StringConst    5   385,268 ±  5,238   259,628 ±  2,515  ns/op  1,48x (p = 0,000*)
  :gc.alloc.rate                      6039,095 ± 82,309   8902,803 ± 86,563  MB/sec  1,47x (p = 0,000*)
  :gc.alloc.rate.norm                 2440,003 ±  0,000   2424,002 ±  0,000    B/op  0,99x (p = 0,000*)

For most cases having a size based on the sum of the constants plus some small factor per argument seems to be a decent heuristic - for C1. Since this adds just one bytecode to the generated method I guess it wouldn't hurt. - PR Review Comment: https://git.openjdk.org/jdk/pull/18690#discussion_r1564753612
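The heuristic described above can be sketched as a small helper: initial capacity = total length of the constants plus a fixed per-argument factor. This is an illustration only; the per-argument factor of 8 is an assumption, not the value the JDK uses:

```java
// Hypothetical sketch of the sizing heuristic discussed in this thread:
// sum the constant fragments and add a small fixed guess per dynamic argument.
public class CapacityHeuristic {
    static int initialCapacity(String[] constants, int argCount, int perArgFactor) {
        int size = 0;
        for (String constant : constants) {
            size += constant.length();   // exact: constants are known at link time
        }
        return size + argCount * perArgFactor; // guessed: arguments are not
    }

    public static void main(String[] args) {
        // 4 chars of constants + 3 args * 8 = 28
        System.out.println(initialCapacity(new String[] {"ab", "cd"}, 3, 8));
    }
}
```

Since the result is loaded with a single constant-push bytecode feeding `StringBuilder(int)`, the bytecode cost is one extra instruction, as noted above.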
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v2]
On Fri, 12 Apr 2024 10:19:27 GMT, Claes Redestad wrote: >> src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line >> 1430: >> >>> 1428: cb.new_(STRING_BUILDER); >>> 1429: cb.dup(); >>> 1430: cb.invokespecial(STRING_BUILDER, "<init>", >>> MethodTypeDesc.ofDescriptor("()V")); >> >> Would there be value in initializing to a larger capacity? Given the number >> of append calls, seems the default cap of 16 is unlikely to be sufficient. > > Possibly. I tried a few simple variants that initialized the `StringBuilder` > capacity at various guesses, such as sum of constant sizes + some factor > based on args, but across a diverse set of micros that gives both some wins > and some regressions. Getting the estimation just right is pretty tricky, > especially when size of the arguments are arbitrary (like for any > String/Object arg). What are the scenarios which had regressions? Given the conservative growth for StringBuilder, it surprises me a bit that any scenario would regress. - PR Review Comment: https://git.openjdk.org/jdk/pull/18690#discussion_r1564165946
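The default-capacity concern above is observable directly: a `StringBuilder` starts at 16 chars and each overflow roughly doubles the backing array and copies the old contents. A small standalone sketch of that behavior (the exact growth policy is a JDK implementation detail):

```java
// Demonstrates why an unsized StringBuilder can regress under many appends:
// the default capacity is 16, and overflow triggers a grow-and-copy.
public class GrowthDemo {
    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();      // default capacity: 16
        sb.append("x".repeat(17));                   // overflows: grows and copies
        System.out.println(sb.capacity());           // larger than 16 now

        StringBuilder sized = new StringBuilder(64); // presized: no growth below 64
        sized.append("x".repeat(60));
        System.out.println(sized.capacity());        // still 64
    }
}
```

This is why presizing helps C1, while C2's OptimizeStringConcat pass (as noted above) ignores the initial size anyway.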
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v2]
> This patch suggests a workaround to an issue with huge SCF MH expression > trees taking excessive JIT compilation resources by reviving (part of) the > simple bytecode-generating strategy that was originally available as an > all-or-nothing strategy choice. > > Instead of reintroducing a binary strategy choice I propose a threshold > parameter, controlled by > `-Djava.lang.invoke.StringConcat.highArityThreshold=`: For expressions > below or at this threshold there's no change, for expressions with an arity > above it we use the `StringBuilder`-chain bytecode generator. > > There are a few trade-offs at play here which influence the choice of > threshold. The simple high arity strategy will for example not see any reuse > of LambdaForms but strictly always generate a class per indy callsite, which > means we might end up with a higher total number of classes generated and > loaded in applications if we set this value too low. It may also produce > worse performance on average. On the other hand there is the observed > increase in C2 resource usage as expressions grow unwieldy. On the other > other hand high arity expressions are likely rare to begin with, with less > opportunities for sharing than the more common low-arity expressions. > > I turned the submitted test case into a few JMH benchmarks and did some > experiments with `-XX:CompileCommand=MemStat,StringConcat::concat,print`: > > Baseline strategy: > 13 args: 6.3M > 23 args: 18M > 123 args: 868M > > `-Djava.lang.invoke.StringConcat.highArityThreshold=0`: > 13 args: 2.11M > 23 args: 3.67M > 123 args: 4.75M > > For 123 args the memory overhead of the baseline strategy is 180x, but for 23 > args we're down to a 5x memory overhead, and down to a 3x overhead for 13 > args. Since the absolute overhead for 23 is borderline acceptable (+15Mb) > I've conservatively chosen a threshold at arity 20. 
This keeps C2 resource > pressure at a reasonable level (< 18M) while avoiding perturbing performance > at the vast majority of call sites. > > I was asked to use the new class file API for mainline. There's a version of > this patch implemented using ASM in 7c52a9f which might be a reasonable basis > for a backport. Claes Redestad has updated the pull request incrementally with one additional commit since the last revision: @liach feedback - Changes: - all: https://git.openjdk.org/jdk/pull/18690/files - new: https://git.openjdk.org/jdk/pull/18690/files/b5928994..1c3d354c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18690&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18690&range=00-01 Stats: 32 lines in 1 file changed: 4 ins; 23 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/18690.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18690/head:pull/18690 PR: https://git.openjdk.org/jdk/pull/18690
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v2]
On Fri, 12 Apr 2024 14:26:04 GMT, Chen Liang wrote: >> Claes Redestad has updated the pull request incrementally with one >> additional commit since the last revision: >> >> @liach feedback > src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line > 1406: > >> 1404: @Override >> 1405: public void accept(ClassBuilder clb) { >> 1406: clb.withFlags(AccessFlag.PUBLIC, >> AccessFlag.FINAL, AccessFlag.SUPER, AccessFlag.SYNTHETIC) > > Why is this hidden class public? Removed `PUBLIC`. > src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line > 1458: > >> 1456: >> 1457: // Load the argument of type cl at slot onto stack, return the >> number of argument stack slots consumed. >> 1458: private static int load(CodeBuilder cb, Class<?> cl, int slot) >> { > > You can replace all calls to `load(cb, cl, slot)` with > `cb.loadInstruction(TypeKind.from(cl), slot)`, and the returned slot count can > be accessed by `TypeKind::slotSize` Thanks, I've tried to simplify. - PR Review Comment: https://git.openjdk.org/jdk/pull/18690#discussion_r1562702306 PR Review Comment: https://git.openjdk.org/jdk/pull/18690#discussion_r1562701048
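The slot-count bookkeeping that `load` returns (and that `TypeKind::slotSize` provides in the class-file API) follows the JVM rule that `long` and `double` occupy two local-variable/stack slots while everything else takes one. A minimal stand-in, with no class-file API dependency:

```java
public class SlotSize {
    // Minimal stand-in for TypeKind::slotSize: long and double occupy
    // two local-variable slots on the JVM, all other types occupy one.
    static int slotSize(Class<?> cl) {
        return (cl == long.class || cl == double.class) ? 2 : 1;
    }

    public static void main(String[] args) {
        System.out.println(slotSize(long.class));   // 2
        System.out.println(slotSize(int.class));    // 1
        System.out.println(slotSize(Object.class)); // 1
    }
}
```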
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases
On Fri, 12 Apr 2024 14:57:45 GMT, Claes Redestad wrote: >> src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line >> 1437: >> >>> 1435: for (int c = 0; c < args.parameterCount(); >>> c++) { >>> 1436: if (constants[c] != null) { >>> 1437: cb.ldc(constants[c]); >> >> I think for Remi's approach, we change: >> 1. Insert an extra String array (maybe need a way to mark it stable?) arg >> representing constants >> 2. Change this ldc into aload + aaload (or List.get if we use immutable List) >> 3. Call `bindTo(constantStrings)` on the resulting MH >> >> This approach can significantly reduce the number of classes spinned instead >> of generating one class per constant array; might need to measure >> performance to see if this is a good tradeoff >> >> Oh, I just noticed that we need to erase everything to the generic method >> type. I think Remi's "value class" means there's no overhead for converting >> primitives into wrappers in this conversion to generic method type. > > I'd prefer considering such optimizations as follow-ups. We need to think > about where to define such shared classes in a way that considers the full > lifecycle, facilitates class unloading (one cache per classloader?) while > still getting good reuse. I think `Invokers` might be a good place. Its lifetime is bound to its MT, and MT already handles class unloading well. If we are doing erased then a plain array suffices. - PR Review Comment: https://git.openjdk.org/jdk/pull/18690#discussion_r1562686793
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases
On Fri, 12 Apr 2024 14:53:58 GMT, Chen Liang wrote: >> This patch suggests a workaround to an issue with huge SCF MH expression >> trees taking excessive JIT compilation resources by reviving (part of) the >> simple bytecode-generating strategy that was originally available as an >> all-or-nothing strategy choice. >> >> Instead of reintroducing a binary strategy choice I propose a threshold >> parameter, controlled by >> `-Djava.lang.invoke.StringConcat.highArityThreshold=`: For expressions >> below or at this threshold there's no change, for expressions with an arity >> above it we use the `StringBuilder`-chain bytecode generator. >> >> There are a few trade-offs at play here which influence the choice of >> threshold. The simple high arity strategy will for example not see any reuse >> of LambdaForms but strictly always generate a class per indy callsite, which >> means we might end up with a higher total number of classes generated and >> loaded in applications if we set this value too low. It may also produce >> worse performance on average. On the other hand there is the observed >> increase in C2 resource usage as expressions grow unwieldy. On the other >> other hand high arity expressions are likely rare to begin with, with less >> opportunities for sharing than the more common low-arity expressions. >> >> I turned the submitted test case into a few JMH benchmarks and did some >> experiments with `-XX:CompileCommand=MemStat,StringConcat::concat,print`: >> >> Baseline strategy: >> 13 args: 6.3M >> 23 args: 18M >> 123 args: 868M >> >> `-Djava.lang.invoke.StringConcat.highArityThreshold=0`: >> 13 args: 2.11M >> 23 args: 3.67M >> 123 args: 4.75M >> >> For 123 args the memory overhead of the baseline strategy is 180x, but for >> 23 args we're down to a 5x memory overhead, and down to a 3x overhead for 13 >> args. Since the absolute overhead for 23 is borderline acceptable (+15Mb) >> I've conservatively chosen a threshold at arity 20. 
This keeps C2 resource >> pressure at a reasonable level (< 18M) while avoiding perturbing performance >> at the vast majority of call sites. >> >> I was asked to use the new class file API for mainline. There's a version of >> this patch implemented using ASM in 7c52a9f which might be a reasonable >> basis for a backport. > > src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line > 1437: > >> 1435: for (int c = 0; c < args.parameterCount(); >> c++) { >> 1436: if (constants[c] != null) { >> 1437: cb.ldc(constants[c]); > > I think for Remi's approach, we change: > 1. Insert an extra String array (maybe need a way to mark it stable?) arg > representing constants > 2. Change this ldc into aload + aaload (or List.get if we use immutable List) > 3. Call `bindTo(constantStrings)` on the resulting MH > > This approach can significantly reduce the number of classes spinned instead > of generating one class per constant array; might need to measure performance > to see if this is a good tradeoff > > Oh, I just noticed that we need to erase everything to the generic method > type. I think Remi's "value class" means there's no overhead for converting > primitives into wrappers in this conversion to generic method type. I'd prefer considering such optimizations as follow-ups. We need to think about where to define such shared classes in a way that considers the full lifecycle, facilitates class unloading (one cache per classloader?) while still getting good reuse. - PR Review Comment: https://git.openjdk.org/jdk/pull/18690#discussion_r1562673638
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases
On Tue, 9 Apr 2024 12:01:49 GMT, Claes Redestad wrote: > This patch suggests a workaround to an issue with huge SCF MH expression > trees taking excessive JIT compilation resources by reviving (part of) the > simple bytecode-generating strategy that was originally available as an > all-or-nothing strategy choice. > > Instead of reintroducing a binary strategy choice I propose a threshold > parameter, controlled by > `-Djava.lang.invoke.StringConcat.highArityThreshold=`: For expressions > below or at this threshold there's no change, for expressions with an arity > above it we use the `StringBuilder`-chain bytecode generator. > > There are a few trade-offs at play here which influence the choice of > threshold. The simple high arity strategy will for example not see any reuse > of LambdaForms but strictly always generate a class per indy callsite, which > means we might end up with a higher total number of classes generated and > loaded in applications if we set this value too low. It may also produce > worse performance on average. On the other hand there is the observed > increase in C2 resource usage as expressions grow unwieldy. On the other > other hand high arity expressions are likely rare to begin with, with less > opportunities for sharing than the more common low-arity expressions. > > I turned the submitted test case into a few JMH benchmarks and did some > experiments with `-XX:CompileCommand=MemStat,StringConcat::concat,print`: > > Baseline strategy: > 13 args: 6.3M > 23 args: 18M > 123 args: 868M > > `-Djava.lang.invoke.StringConcat.highArityThreshold=0`: > 13 args: 2.11M > 23 args: 3.67M > 123 args: 4.75M > > For 123 args the memory overhead of the baseline strategy is 180x, but for 23 > args we're down to a 5x memory overhead, and down to a 3x overhead for 13 > args. Since the absolute overhead for 23 is borderline acceptable (+15Mb) > I've conservatively chosen a threshold at arity 20. 
This keeps C2 resource > pressure at a reasonable level (< 18M) while avoiding perturbing performance > at the vast majority of call sites. > > I was asked to use the new class file API for mainline. There's a version of > this patch implemented using ASM in 7c52a9f which might be a reasonable basis > for a backport. src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line 1437: > 1435: for (int c = 0; c < args.parameterCount(); c++) > { > 1436: if (constants[c] != null) { > 1437: cb.ldc(constants[c]); I think for Remi's approach, we change: 1. Insert an extra String array (maybe need a way to mark it stable?) arg representing constants 2. Change this ldc into aload + aaload (or List.get if we use immutable List) 3. Call `bindTo(constantStrings)` on the resulting MH This approach can significantly reduce the number of classes spun instead of generating one class per constant array; might need to measure performance to see if this is a good tradeoff Oh, I just noticed that we need to erase everything to the generic method type. I think Remi's "value class" means there's no overhead for converting primitives into wrappers in this conversion to generic method type. - PR Review Comment: https://git.openjdk.org/jdk/pull/18690#discussion_r1562667143
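The shape sketched in the steps above - one shared, erased concat body with the constants passed in as an array and then bound via `bindTo` - can be illustrated with plain method handles. This is a hypothetical sketch of the idea, not the proposed implementation (which would generate the body as bytecode):

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class BoundConstants {
    // Shared, erased concat body: constants arrive as an array argument
    // instead of being hardcoded via ldc in a per-call-site class.
    static String concat(String[] constants, Object a, Object b) {
        return new StringBuilder()
                .append(constants[0]).append(a)
                .append(constants[1]).append(b)
                .append(constants[2]).toString();
    }

    public static void main(String[] args) throws Throwable {
        MethodHandle mh = MethodHandles.lookup().findStatic(
                BoundConstants.class, "concat",
                MethodType.methodType(String.class, String[].class,
                        Object.class, Object.class));
        // bindTo fixes the constants array, leaving an erased
        // (Object,Object)String handle that call sites could share
        MethodHandle bound = mh.bindTo(new String[] {"<", "|", ">"});
        System.out.println((String) bound.invokeExact((Object) "a", (Object) 1));
    }
}
```

One such body per arity would then be reusable across call sites, at the cost of the indirection through the array and the erased argument types.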
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases
On Fri, 12 Apr 2024 14:32:05 GMT, Chen Liang wrote: >> This patch suggests a workaround to an issue with huge SCF MH expression >> trees taking excessive JIT compilation resources by reviving (part of) the >> simple bytecode-generating strategy that was originally available as an >> all-or-nothing strategy choice. >> >> Instead of reintroducing a binary strategy choice I propose a threshold >> parameter, controlled by >> `-Djava.lang.invoke.StringConcat.highArityThreshold=`: For expressions >> below or at this threshold there's no change, for expressions with an arity >> above it we use the `StringBuilder`-chain bytecode generator. >> >> There are a few trade-offs at play here which influence the choice of >> threshold. The simple high arity strategy will for example not see any reuse >> of LambdaForms but strictly always generate a class per indy callsite, which >> means we might end up with a higher total number of classes generated and >> loaded in applications if we set this value too low. It may also produce >> worse performance on average. On the other hand there is the observed >> increase in C2 resource usage as expressions grow unwieldy. On the other >> other hand high arity expressions are likely rare to begin with, with less >> opportunities for sharing than the more common low-arity expressions. >> >> I turned the submitted test case into a few JMH benchmarks and did some >> experiments with `-XX:CompileCommand=MemStat,StringConcat::concat,print`: >> >> Baseline strategy: >> 13 args: 6.3M >> 23 args: 18M >> 123 args: 868M >> >> `-Djava.lang.invoke.StringConcat.highArityThreshold=0`: >> 13 args: 2.11M >> 23 args: 3.67M >> 123 args: 4.75M >> >> For 123 args the memory overhead of the baseline strategy is 180x, but for >> 23 args we're down to a 5x memory overhead, and down to a 3x overhead for 13 >> args. Since the absolute overhead for 23 is borderline acceptable (+15Mb) >> I've conservatively chosen a threshold at arity 20. 
This keeps C2 resource >> pressure at a reasonable level (< 18M) while avoiding perturbing performance >> at the vast majority of call sites. >> >> I was asked to use the new class file API for mainline. There's a version of >> this patch implemented using ASM in 7c52a9f which might be a reasonable >> basis for a backport. > src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line > 1425: > >> 1423: >> 1424: private static Consumer<ClassBuilder> generateMethod(String[] >> constants, MethodType args) { >> 1425: return new Consumer<ClassBuilder>() { > Are you using inner classes for startup performance concerns? > java.lang.invoke should be ready when you can reach here so using lambdas is > safe. We've opted to avoid lambda use in `java.lang.invoke` historically to avoid bootstrap surprises. A small price to pay, imho. - PR Review Comment: https://git.openjdk.org/jdk/pull/18690#discussion_r1562661264
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases
On Fri, 12 Apr 2024 13:44:23 GMT, Rémi Forax wrote: > One class per arity + value classes can be a good combo ? Not sure value classes matter here? We would need one static instance per call site holding the constants. Trickier for performance is the potential for profile pollution between such classes. - PR Comment: https://git.openjdk.org/jdk/pull/18690#issuecomment-2051885862
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases
On Tue, 9 Apr 2024 12:01:49 GMT, Claes Redestad wrote: > This patch suggests a workaround to an issue with huge SCF MH expression > trees taking excessive JIT compilation resources by reviving (part of) the > simple bytecode-generating strategy that was originally available as an > all-or-nothing strategy choice. > > Instead of reintroducing a binary strategy choice I propose a threshold > parameter, controlled by > `-Djava.lang.invoke.StringConcat.highArityThreshold=`: For expressions > below or at this threshold there's no change, for expressions with an arity > above it we use the `StringBuilder`-chain bytecode generator. > > There are a few trade-offs at play here which influence the choice of > threshold. The simple high arity strategy will for example not see any reuse > of LambdaForms but strictly always generate a class per indy callsite, which > means we might end up with a higher total number of classes generated and > loaded in applications if we set this value too low. It may also produce > worse performance on average. On the other hand there is the observed > increase in C2 resource usage as expressions grow unwieldy. On the other > other hand high arity expressions are likely rare to begin with, with less > opportunities for sharing than the more common low-arity expressions. > > I turned the submitted test case into a few JMH benchmarks and did some > experiments with `-XX:CompileCommand=MemStat,StringConcat::concat,print`: > > Baseline strategy: > 13 args: 6.3M > 23 args: 18M > 123 args: 868M > > `-Djava.lang.invoke.StringConcat.highArityThreshold=0`: > 13 args: 2.11M > 23 args: 3.67M > 123 args: 4.75M > > For 123 args the memory overhead of the baseline strategy is 180x, but for 23 > args we're down to a 5x memory overhead, and down to a 3x overhead for 13 > args. Since the absolute overhead for 23 is borderline acceptable (+15Mb) > I've conservatively chosen a threshold at arity 20. 
This keeps C2 resource > pressure at a reasonable level (< 18M) while avoiding perturbing performance > at the vast majority of call sites. > > I was asked to use the new class file API for mainline. There's a version of > this patch implemented using ASM in 7c52a9f which might be a reasonable basis > for a backport. src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line 1406: > 1404: @Override > 1405: public void accept(ClassBuilder clb) { > 1406: clb.withFlags(AccessFlag.PUBLIC, > AccessFlag.FINAL, AccessFlag.SUPER, AccessFlag.SYNTHETIC) Why is this hidden class public? src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line 1409: > 1407: .withMethodBody(METHOD_NAME, > 1408: > MethodTypeDesc.ofDescriptor(args.toMethodDescriptorString()), > 1409: ClassFile.ACC_FINAL | > ClassFile.ACC_PUBLIC | ClassFile.ACC_STATIC, Why is this method final? Don't think this method can ever be hidden. src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line 1416: > 1414: Class<?> innerClass = > lookup.defineHiddenClass(classBytes, true, STRONG).lookupClass(); > 1415: DUMPER.dumpClass(className, innerClass, classBytes); > 1416: MethodHandle mh = lookup.findStatic(innerClass, > METHOD_NAME, args); We can probably make the method private, and use the lookup from `defineHiddenClass` instead of the host class lookup. src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line 1425: > 1423: > 1424: private static Consumer<ClassBuilder> generateMethod(String[] > constants, MethodType args) { > 1425: return new Consumer<ClassBuilder>() { Are you using inner classes for startup performance concerns? java.lang.invoke should be ready when you can reach here so using lambdas is safe. src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line 1458: > 1456: > 1457: // Load the argument of type cl at slot onto stack, return the > number of argument stack slots consumed. 
> 1458: private static int load(CodeBuilder cb, Class<?> cl, int slot) { You can replace all calls to `load(cb, cl, slot)` with `cb.loadInstruction(TypeKind.from(cl), slot)`, and the returned slot count can be accessed by `TypeKind::slotSize` - PR Review Comment: https://git.openjdk.org/jdk/pull/18690#discussion_r1562629418 PR Review Comment: https://git.openjdk.org/jdk/pull/18690#discussion_r1562630031 PR Review Comment: https://git.openjdk.org/jdk/pull/18690#discussion_r1562635315 PR Review Comment: https://git.openjdk.org/jdk/pull/18690#discussion_r1562637025 PR Review Comment: https://git.openjdk.org/jdk/pull/18690#discussion_r1562640252
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases
On Tue, 9 Apr 2024 12:01:49 GMT, Claes Redestad wrote: > This patch suggests a workaround to an issue with huge SCF MH expression > trees taking excessive JIT compilation resources by reviving (part of) the > simple bytecode-generating strategy that was originally available as an > all-or-nothing strategy choice. > > Instead of reintroducing a binary strategy choice I propose a threshold > parameter, controlled by > `-Djava.lang.invoke.StringConcat.highArityThreshold=`: For expressions > below or at this threshold there's no change, for expressions with an arity > above it we use the `StringBuilder`-chain bytecode generator. > > There are a few trade-offs at play here which influence the choice of > threshold. The simple high arity strategy will for example not see any reuse > of LambdaForms but strictly always generate a class per indy callsite, which > means we might end up with a higher total number of classes generated and > loaded in applications if we set this value too low. It may also produce > worse performance on average. On the other hand there is the observed > increase in C2 resource usage as expressions grow unwieldy. On the other > other hand high arity expressions are likely rare to begin with, with less > opportunities for sharing than the more common low-arity expressions. > > I turned the submitted test case into a few JMH benchmarks and did some > experiments with `-XX:CompileCommand=MemStat,StringConcat::concat,print`: > > Baseline strategy: > 13 args: 6.3M > 23 args: 18M > 123 args: 868M > > `-Djava.lang.invoke.StringConcat.highArityThreshold=0`: > 13 args: 2.11M > 23 args: 3.67M > 123 args: 4.75M > > For 123 args the memory overhead of the baseline strategy is 180x, but for 23 > args we're down to a 5x memory overhead, and down to a 3x overhead for 13 > args. Since the absolute overhead for 23 is borderline acceptable (+15Mb) > I've conservatively chosen a threshold at arity 20. 
This keeps C2 resource > pressure at a reasonable level (< 18M) while avoiding perturbing performance > at the vast majority of call sites. > > I was asked to use the new class file API for mainline. There's a version of > this patch implemented using ASM in 7c52a9f which might be a reasonable basis > for a backport. One class per arity + value classes can be a good combo ? - PR Comment: https://git.openjdk.org/jdk/pull/18690#issuecomment-2051792260
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases
On Fri, 12 Apr 2024 11:41:17 GMT, Thomas Stuefe wrote: > Yes, we are concerned with that, especially for a possible future where > Lilliput is the sole default. Atm we can address about 4 million classes. > There are thoughts about making this number of classes infinite, but if > possible we would like to avoid that complexity. 4M classes should be enough for everyone :-) Either way this PR shouldn't make the situation worse. I think for expressions of high arity the average number of classes generated by the current strategy is likely larger than 1. Forcing generation of exactly one class per such call site is thus unlikely to be worse. But since it depends on the app, having a parameter to adjust the level at which we fall back to simple bytecode generation seems reasonable. There's a range of possible follow-up enhancements to reduce the number of classes generated. For example we could generate classes where constants are injected rather than hardcoded and cache/reuse those classes at different call sites. Further out we could pre-generate classes (one per arity) containing methods for known signatures when dumping CDS archives or running jlink plugins so that there's little to no bytecode generation at runtime and a bounded number of classes. - PR Comment: https://git.openjdk.org/jdk/pull/18690#issuecomment-2051734530
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases
On Fri, 12 Apr 2024 10:07:48 GMT, Claes Redestad wrote: > I guess Lilliput adds some hard or soft limit on the number of classes loaded? Yes, we are concerned with that, especially for a possible future where Lilliput is the sole default. Atm we can address about 4 million classes. There are thoughts about making this number of classes infinite, but if possible we would like to avoid that complexity. - PR Comment: https://git.openjdk.org/jdk/pull/18690#issuecomment-2051597166
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases
On Fri, 12 Apr 2024 03:16:58 GMT, Brett Okken wrote: >> This patch suggests a workaround to an issue with huge SCF MH expression >> trees taking excessive JIT compilation resources by reviving (part of) the >> simple bytecode-generating strategy that was originally available as an >> all-or-nothing strategy choice. >> >> Instead of reintroducing a binary strategy choice I propose a threshold >> parameter, controlled by >> `-Djava.lang.invoke.StringConcat.highArityThreshold=`: For expressions >> below or at this threshold there's no change, for expressions with an arity >> above it we use the `StringBuilder`-chain bytecode generator. >> >> There are a few trade-offs at play here which influence the choice of >> threshold. The simple high arity strategy will for example not see any reuse >> of LambdaForms but strictly always generate a class per indy callsite, which >> means we might end up with a higher total number of classes generated and >> loaded in applications if we set this value too low. It may also produce >> worse performance on average. On the other hand there is the observed >> increase in C2 resource usage as expressions grow unwieldy. On the other >> other hand high arity expressions are likely rare to begin with, with less >> opportunities for sharing than the more common low-arity expressions. >> >> I turned the submitted test case into a few JMH benchmarks and did some >> experiments with `-XX:CompileCommand=MemStat,StringConcat::concat,print`: >> >> Baseline strategy: >> 13 args: 6.3M >> 23 args: 18M >> 123 args: 868M >> >> `-Djava.lang.invoke.StringConcat.highArityThreshold=0`: >> 13 args: 2.11M >> 23 args: 3.67M >> 123 args: 4.75M >> >> For 123 args the memory overhead of the baseline strategy is 180x, but for >> 23 args we're down to a 5x memory overhead, and down to a 3x overhead for 13 >> args. Since the absolute overhead for 23 is borderline acceptable (+15Mb) >> I've conservatively chosen a threshold at arity 20. 
This keeps C2 resource >> pressure at a reasonable level (< 18M) while avoiding perturbing performance >> at the vast majority of call sites. >> >> I was asked to use the new class file API for mainline. There's a version of >> this patch implemented using ASM in 7c52a9f which might be a reasonable >> basis for a backport. > src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line > 1430: > >> 1428: cb.new_(STRING_BUILDER); >> 1429: cb.dup(); >> 1430: cb.invokespecial(STRING_BUILDER, "<init>", >> MethodTypeDesc.ofDescriptor("()V")); > > Would there be value in initializing to a larger capacity? Given the number > of append calls, seems the default cap of 16 is unlikely to be sufficient. Possibly. I tried a few simple variants that initialized the `StringBuilder` capacity at various guesses, such as sum of constant sizes + some factor based on args, but across a diverse set of micros that gives both some wins and some regressions. Getting the estimation just right is pretty tricky, especially when the sizes of the arguments are arbitrary (like for any String/Object arg). - PR Review Comment: https://git.openjdk.org/jdk/pull/18690#discussion_r1562356874
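The `new_`/`dup`/`invokespecial` sequence quoted above opens an ordinary `StringBuilder` chain. For a hypothetical call site `"v1" + a + "v2" + b`, the method the simple strategy generates is roughly equivalent to this Java source (names and shape illustrative, not the actual generated code):

```java
public class GeneratedShape {
    // Approximate Java-source equivalent of the bytecode the simple
    // strategy emits for a hypothetical shape "v1" + a + "v2" + b.
    public static String concat(String a, int b) {
        return new StringBuilder()   // new_ + dup + invokespecial <init>
                .append("v1")        // ldc + append for each constant
                .append(a)           // load + append for each argument
                .append("v2")
                .append(b)
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(concat("x", 7)); // prints v1xv27
    }
}
```

The capacity question in this thread is about whether that `new StringBuilder()` should instead pass an initial size.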
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases
On Fri, 12 Apr 2024 06:34:32 GMT, Thomas Stuefe wrote: > Stupid question maybe, this would be one LambdaForm per argument set? E.g. > would two invocations with the same arguments re-use the same LambdaForm? > > I would like to get an understanding of the numbers of classes involved for > these solutions, and whether they are bounded. Mostly for Lilliput reason. Any concat expression sharing the same exact shape will ideally share the same chain of LFs, and many of the intermediary LFs will be shared between concat expressions of all manner of shapes and sizes. On the extreme of an application with one singular, huge expression we might see quite a few LF classes per call site, while the incremental number of classes loaded for an expression whose shape has been visited elsewhere will be none. One could guess there is some upper bound based on the number of variant types (7: int, long, float, double, char, boolean, Object) to the power of the highest possible arity (100 - after which javac will start splitting and chaining the indy calls), though since there are some softly referenced caches that the LF editor leans on, in theory it might very well be unbounded. In practice we're bounded by the number of call sites times some factor that depends on how much sharing is going on. Since most expressions are likely to be low arity with a high degree of sharing, followed by a long tail of more complex high-arity expressions that might see less sharing, the assumption has been that the number of classes loaded per concat will be low. I guess Lilliput adds some hard or soft limit on the number of classes loaded? - PR Comment: https://git.openjdk.org/jdk/pull/18690#issuecomment-2051462197
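The naive theoretical bound mentioned above - seven operand kinds to the power of the maximum per-indy arity - is easy to compute, and shows why the practical bound (call sites times a sharing factor) is the meaningful one:

```java
import java.math.BigInteger;

public class ShapeBound {
    public static void main(String[] args) {
        // 7 operand kinds (int, long, float, double, char, boolean, Object)
        // raised to the maximum arity javac emits for one indy call site (100)
        BigInteger shapes = BigInteger.valueOf(7).pow(100);
        System.out.println(shapes.bitLength()); // prints 281: far beyond any primitive
    }
}
```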
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases
On Tue, 9 Apr 2024 12:01:49 GMT, Claes Redestad wrote:

> This patch suggests a workaround to an issue with huge SCF MH expression trees taking excessive JIT compilation resources by reviving (part of) the simple bytecode-generating strategy that was originally available as an all-or-nothing strategy choice.
>
> Instead of reintroducing a binary strategy choice I propose a threshold parameter, controlled by `-Djava.lang.invoke.StringConcat.highArityThreshold=`: For expressions below or at this threshold there's no change, for expressions with an arity above it we use the `StringBuilder`-chain bytecode generator.
>
> There are a few trade-offs at play here which influence the choice of threshold. The simple high arity strategy will for example not see any reuse of LambdaForms but strictly always generate a class per indy callsite, which means we might end up with a higher total number of classes generated and loaded in applications if we set this value too low. It may also produce worse performance on average. On the other hand there is the observed increase in C2 resource usage as expressions grow unwieldy. On the other other hand high arity expressions are likely rare to begin with, with less opportunities for sharing than the more common low-arity expressions.
>
> I turned the submitted test case into a few JMH benchmarks and did some experiments with `-XX:CompileCommand=MemStat,StringConcat::concat,print`:
>
> Baseline strategy:
> 13 args: 6.3M
> 23 args: 18M
> 123 args: 868M
>
> `-Djava.lang.invoke.StringConcat.highArityThreshold=0`:
> 13 args: 2.11M
> 23 args: 3.67M
> 123 args: 4.75M
>
> For 123 args the memory overhead of the baseline strategy is 180x, but for 23 args we're down to a 5x memory overhead, and down to a 3x overhead for 13 args. Since the absolute overhead for 23 is borderline acceptable (+15Mb) I've conservatively chosen a threshold at arity 20. This keeps C2 resource pressure at a reasonable level (< 18M) while avoiding perturbing performance at the vast majority of call sites.
>
> I was asked to use the new class file API for mainline. There's a version of this patch implemented using ASM in 7c52a9f which might be a reasonable basis for a backport.

Fyi: https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2024-April/074787.html - PR Comment: https://git.openjdk.org/jdk/pull/18690#issuecomment-2051184229
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases
On Tue, 9 Apr 2024 12:01:49 GMT, Claes Redestad wrote:

> There are a few trade-offs at play here which influence the choice of threshold. The simple high arity strategy will for example not see any reuse of LambdaForms but strictly always generate a class per indy callsite, which means we might end up with a higher total number of classes generated and loaded in applications if we set this value too low

Stupid question maybe: would this be one LambdaForm per argument set? E.g. would two invocations with the same arguments re-use the same LambdaForm? I would like to get an understanding of the number of classes involved for these solutions, and whether they are bounded. Mostly for Lilliput reasons. - PR Comment: https://git.openjdk.org/jdk/pull/18690#issuecomment-2051082454
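As I read the question, "argument set" here is roughly the erased argument-type shape of the indy call site. A minimal sketch of what "same shape" means (class and method names are made up for illustration): both concatenations below compile to invokedynamic call sites taking `(String, int)`, so the default MH-based strategy can in principle share LambdaForms between them, while the simple fallback spins one generated class per call site regardless.

```java
// Both methods contain string concat expressions whose invokedynamic call
// sites take the same erased argument types (String, int). The MH-based
// strategy can in principle share LambdaForms across such sites; the simple
// high-arity fallback generates one class per call site regardless.
public class ConcatShapes {
    static String greet(String name, int n) {
        return "Hello " + name + " #" + n;   // call-site shape: (String, int)
    }

    static String label(String tag, int id) {
        return "tag=" + tag + ", id=" + id;  // same shape: (String, int)
    }

    public static void main(String[] args) {
        System.out.println(greet("duke", 1));
        System.out.println(label("x", 2));
    }
}
```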
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases
On Tue, 9 Apr 2024 12:01:49 GMT, Claes Redestad wrote:

> This patch suggests a workaround to an issue with huge SCF MH expression trees taking excessive JIT compilation resources by reviving (part of) the simple bytecode-generating strategy that was originally available as an all-or-nothing strategy choice.

src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line 1430:

> 1428: cb.new_(STRING_BUILDER);
> 1429: cb.dup();
> 1430: cb.invokespecial(STRING_BUILDER, "<init>", MethodTypeDesc.ofDescriptor("()V"));

Would there be value in initializing to a larger capacity? Given the number of append calls, it seems the default capacity of 16 is unlikely to be sufficient. - PR Review Comment: https://git.openjdk.org/jdk/pull/18690#discussion_r1561966067
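To make the capacity point concrete, here is a small sketch (sizes and names are illustrative, not taken from the patch): with the default capacity of 16 chars, a long chain of appends forces repeated internal array growth, which an initial capacity sized to the expected result avoids.

```java
import java.util.Arrays;

// Illustrates the reviewer's point: many appends into a default-capacity
// StringBuilder (16 chars) trigger repeated internal array growth; an
// initial capacity sized to the expected result allocates once up front.
public class PresizeDemo {
    // Join parts using a StringBuilder with the given initial capacity.
    static String join(String[] parts, int capacity) {
        StringBuilder sb = new StringBuilder(capacity);
        for (String p : parts) {
            sb.append(p);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] parts = new String[30];
        Arrays.fill(parts, "0123456789"); // 300 chars total
        String viaDefault = join(parts, 16);                // grows repeatedly
        String presized   = join(parts, parts.length * 10); // single allocation
        System.out.println(viaDefault.equals(presized));    // same result
    }
}
```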
Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases
On Tue, 9 Apr 2024 12:01:49 GMT, Claes Redestad wrote:

> This patch suggests a workaround to an issue with huge SCF MH expression trees taking excessive JIT compilation resources by reviving (part of) the simple bytecode-generating strategy that was originally available as an all-or-nothing strategy choice.

Anyone out there? - PR Comment: https://git.openjdk.org/jdk/pull/18690#issuecomment-2050707573
RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases
This patch suggests a workaround to an issue with huge SCF MH expression trees taking excessive JIT compilation resources by reviving (part of) the simple bytecode-generating strategy that was originally available as an all-or-nothing strategy choice.

Instead of reintroducing a binary strategy choice I propose a threshold parameter, controlled by `-Djava.lang.invoke.StringConcat.highArityThreshold=`: For expressions below or at this threshold there's no change; for expressions with an arity above it we generate one class per expression using the straightforward `StringBuilder`-chain strategy.

There are a few trade-offs at play here which influence the choice of threshold. The simple high arity strategy will for example not see any reuse of LambdaForms, which means we might end up with a higher number of classes generated and loaded in applications if we set this value too low. It may also produce worse performance on average. On the other hand there is the observed increase in C2 resource usage as expressions grow unwieldy. On the other other hand, high arity expressions are likely rare to begin with, with fewer opportunities for sharing than the more common low-arity expressions.

I turned the submitted test case into a few JMH benchmarks and did some experiments with `-XX:CompileCommand=MemStat,StringConcat::concat,print`:

Baseline strategy:
13 args: 6.3M
23 args: 18M
123 args: 868M

`-Djava.lang.invoke.StringConcat.highArityThreshold=0`:
13 args: 2.11M
23 args: 3.67M
123 args: 4.75M

For 123 args the memory overhead of the baseline strategy is 180x, but for 23 args we're down to a 5x memory overhead, and down to a 3x overhead for 13 args. Since the absolute overhead for 23 args is borderline acceptable (+15Mb) I've conservatively chosen a threshold at arity 20. This keeps C2 resource pressure at a reasonable level (< 18M) while avoiding perturbing performance at the vast majority of call sites.

I was asked to use the new class file API for mainline.
There's a version of this patch implemented using ASM in 7c52a9f which might be a reasonable basis for a backport.

- Commit messages:
 - Bump threshold after experiments
 - Port ASM to CFA
 - Lower compilation overhead for large concat expressions, initial ASM-based version based on pre-existing implementation

Changes: https://git.openjdk.org/jdk/pull/18690/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=18690&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8327247
Stats: 213 lines in 2 files changed: 210 ins; 0 del; 3 mod
Patch: https://git.openjdk.org/jdk/pull/18690.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/18690/head:pull/18690
PR: https://git.openjdk.org/jdk/pull/18690
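For readers unfamiliar with the `StringBuilder`-chain generator: the bytecode it emits for a high-arity call site is roughly equivalent to the following Java (a sketch under my reading of the patch; the class name and argument list are illustrative, and the real generated class is per call site with a method shape following the concat recipe):

```java
// Rough Java equivalent of the bytecode the high-arity fallback emits for
// one indy call site (names and the argument list are illustrative).
public class HighArityEquivalent {
    static String concat(String a, int b, long c, Object d) {
        return new StringBuilder()
                .append(a)
                .append(b)
                .append(c)
                .append(d)
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(concat("x=", 1, 2L, "!"));
    }
}
```

Choosing between this and the MH-based strategy is then just a comparison of the expression's arity against the `java.lang.invoke.StringConcat.highArityThreshold` property described above.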