Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v8]

2024-04-24 Thread Claes Redestad
On Wed, 24 Apr 2024 10:08:42 GMT, Claes Redestad  wrote:

>> This patch suggests a workaround to an issue with huge SCF MH expression 
>> trees taking excessive JIT compilation resources by reviving (part of) the 
>> simple bytecode-generating strategy that was originally available as an 
>> all-or-nothing strategy choice. 
>> 
>> Instead of reintroducing a binary strategy choice I propose a threshold 
>> parameter, controlled by 
>> `-Djava.lang.invoke.StringConcat.highArityThreshold=`: For expressions 
>> below or at this threshold there's no change, for expressions with an arity 
>> above it we use the `StringBuilder`-chain bytecode generator. 
>> 
>> There are a few trade-offs at play here which influence the choice of 
>> threshold. The simple high arity strategy will for example not see any reuse 
>> of LambdaForms but strictly always generate a class per indy callsite, which 
>> means we might end up with a higher total number of classes generated and 
>> loaded in applications if we set this value too low. It may also produce 
>> worse performance on average. On the other hand there is the observed 
>> increase in C2 resource usage as expressions grow unwieldy. On the other 
>> other hand high arity expressions are likely rare to begin with, with less 
>> opportunities for sharing than the more common low-arity expressions. 
>> 
>> I turned the submitted test case into a few JMH benchmarks and did some 
>> experiments with `-XX:CompileCommand=MemStat,StringConcat::concat,print`:
>> 
>> Baseline strategy:
>> 13 args: 6.3M
>> 23 args: 18M
>> 123 args: 868M
>> 
>> `-Djava.lang.invoke.StringConcat.highArityThreshold=0`:
>> 13 args: 2.11M
>> 23 args: 3.67M
>> 123 args: 4.75M
>> 
>> For 123 args the memory overhead of the baseline strategy is 180x, but for 
>> 23 args we're down to a 5x memory overhead, and down to a 3x overhead for 13 
>> args. Since the absolute overhead for 23 is borderline acceptable (+15Mb) 
>> I've conservatively chosen a threshold at arity 20. This keeps C2 resource 
>> pressure at a reasonable level (< 18M) while avoiding perturbing performance 
>> at the vast majority of call sites.
>> 
>> I was asked to use the new class file API for mainline. There's a version of 
>> this patch implemented using ASM in 7c52a9f which might be a reasonable 
>> basis for a backport.
>
> Claes Redestad has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Make Set.of(STRONG) a constant, fix compilation, minor code improvements

Thanks for reviewing! I'll split out and PR an ASM version as a subtask and 
rebase this PR on top of that.

-

PR Comment: https://git.openjdk.org/jdk/pull/18690#issuecomment-2075654943


Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v8]

2024-04-24 Thread Mandy Chung
On Wed, 24 Apr 2024 10:08:42 GMT, Claes Redestad  wrote:

>> This patch suggests a workaround to an issue with huge SCF MH expression 
>> trees taking excessive JIT compilation resources by reviving (part of) the 
>> simple bytecode-generating strategy that was originally available as an 
>> all-or-nothing strategy choice. 
>> 
>> Instead of reintroducing a binary strategy choice I propose a threshold 
>> parameter, controlled by 
>> `-Djava.lang.invoke.StringConcat.highArityThreshold=`: For expressions 
>> below or at this threshold there's no change, for expressions with an arity 
>> above it we use the `StringBuilder`-chain bytecode generator. 
>> 
>> There are a few trade-offs at play here which influence the choice of 
>> threshold. The simple high arity strategy will for example not see any reuse 
>> of LambdaForms but strictly always generate a class per indy callsite, which 
>> means we might end up with a higher total number of classes generated and 
>> loaded in applications if we set this value too low. It may also produce 
>> worse performance on average. On the other hand there is the observed 
>> increase in C2 resource usage as expressions grow unwieldy. On the other 
>> other hand high arity expressions are likely rare to begin with, with less 
>> opportunities for sharing than the more common low-arity expressions. 
>> 
>> I turned the submitted test case into a few JMH benchmarks and did some 
>> experiments with `-XX:CompileCommand=MemStat,StringConcat::concat,print`:
>> 
>> Baseline strategy:
>> 13 args: 6.3M
>> 23 args: 18M
>> 123 args: 868M
>> 
>> `-Djava.lang.invoke.StringConcat.highArityThreshold=0`:
>> 13 args: 2.11M
>> 23 args: 3.67M
>> 123 args: 4.75M
>> 
>> For 123 args the memory overhead of the baseline strategy is 180x, but for 
>> 23 args we're down to a 5x memory overhead, and down to a 3x overhead for 13 
>> args. Since the absolute overhead for 23 is borderline acceptable (+15Mb) 
>> I've conservatively chosen a threshold at arity 20. This keeps C2 resource 
>> pressure at a reasonable level (< 18M) while avoiding perturbing performance 
>> at the vast majority of call sites.
>> 
>> I was asked to use the new class file API for mainline. There's a version of 
>> this patch implemented using ASM in 7c52a9f which might be a reasonable 
>> basis for a backport.
>
> Claes Redestad has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Make Set.of(STRONG) a constant, fix compilation, minor code improvements

Looks fine to me.   Indeed, splitting this with ASM and then convert it to 
ClassFile API would help the backport.

src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line 1077:

> 1075: 
> 1076: /**
> 1077:  * Ensure a capacity in the initial StringBuilder to 
> accommodate all constants plus this factor times the number

Nit: wrap long line.

src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line 1085:

> 1083: 
> 1084: static {
> 1085: DUMPER = 
> ClassFileDumper.getInstance("java.lang.invoke.StringConcatFactory.dump", 
> "stringConcatClasses");

Nit: this static block isn't strictly needed.  Can assign at the declaration of 
the static variable.

src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line 1112:

> 1110: return hiddenLookup.findStatic(innerClass, METHOD_NAME, 
> args);
> : } catch (Exception e) {
> 1112: DUMPER.dumpFailedClass(className, classBytes);

This line is no longer needed.   The bytes will be dumped if it's enabled for 
both success and failing case.

-

Marked as reviewed by mchung (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/18690#pullrequestreview-2020345792
PR Review Comment: https://git.openjdk.org/jdk/pull/18690#discussion_r1578178759
PR Review Comment: https://git.openjdk.org/jdk/pull/18690#discussion_r1578176295
PR Review Comment: https://git.openjdk.org/jdk/pull/18690#discussion_r1578173742


Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v8]

2024-04-24 Thread Claes Redestad
> This patch suggests a workaround to an issue with huge SCF MH expression 
> trees taking excessive JIT compilation resources by reviving (part of) the 
> simple bytecode-generating strategy that was originally available as an 
> all-or-nothing strategy choice. 
> 
> Instead of reintroducing a binary strategy choice I propose a threshold 
> parameter, controlled by 
> `-Djava.lang.invoke.StringConcat.highArityThreshold=`: For expressions 
> below or at this threshold there's no change, for expressions with an arity 
> above it we use the `StringBuilder`-chain bytecode generator. 
> 
> There are a few trade-offs at play here which influence the choice of 
> threshold. The simple high arity strategy will for example not see any reuse 
> of LambdaForms but strictly always generate a class per indy callsite, which 
> means we might end up with a higher total number of classes generated and 
> loaded in applications if we set this value too low. It may also produce 
> worse performance on average. On the other hand there is the observed 
> increase in C2 resource usage as expressions grow unwieldy. On the other 
> other hand high arity expressions are likely rare to begin with, with less 
> opportunities for sharing than the more common low-arity expressions. 
> 
> I turned the submitted test case into a few JMH benchmarks and did some 
> experiments with `-XX:CompileCommand=MemStat,StringConcat::concat,print`:
> 
> Baseline strategy:
> 13 args: 6.3M
> 23 args: 18M
> 123 args: 868M
> 
> `-Djava.lang.invoke.StringConcat.highArityThreshold=0`:
> 13 args: 2.11M
> 23 args: 3.67M
> 123 args: 4.75M
> 
> For 123 args the memory overhead of the baseline strategy is 180x, but for 23 
> args we're down to a 5x memory overhead, and down to a 3x overhead for 13 
> args. Since the absolute overhead for 23 is borderline acceptable (+15Mb) 
> I've conservatively chosen a threshold at arity 20. This keeps C2 resource 
> pressure at a reasonable level (< 18M) while avoiding perturbing performance 
> at the vast majority of call sites.
> 
> I was asked to use the new class file API for mainline. There's a version of 
> this patch implemented using ASM in 7c52a9f which might be a reasonable basis 
> for a backport.

Claes Redestad has updated the pull request incrementally with one additional 
commit since the last revision:

  Make Set.of(STRONG) a constant, fix compilation, minor code improvements

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/18690/files
  - new: https://git.openjdk.org/jdk/pull/18690/files/e7cbaaf5..9742f074

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18690&range=07
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18690&range=06-07

  Stats: 9 lines in 1 file changed: 4 ins; 0 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/18690.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18690/head:pull/18690

PR: https://git.openjdk.org/jdk/pull/18690