Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v3]

2024-04-26 Thread Claes Redestad
On Fri, 26 Apr 2024 08:45:45 GMT, Claes Redestad  wrote:

>> Splitting out the ASM-based version from #18690 to push that first under the 
>> JBS (to help backporting). Keeping #18690 open to rebase and follow-up on 
>> this as a subtask. See discussion in that #18690 for more details, 
>> discussion and motivation for this.
>
> Claes Redestad has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Copyright and comment about ProxyClassDumper

Thanks, nits resolved. Also some minor improvement (improved for; use 
iconst(mv, len) as originally intended)

-

PR Comment: https://git.openjdk.org/jdk/pull/18953#issuecomment-2079016011


Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v3]

2024-04-26 Thread Aleksey Shipilev
On Fri, 26 Apr 2024 08:45:45 GMT, Claes Redestad  wrote:

>> Splitting out the ASM-based version from #18690 to push that first under the 
>> JBS (to help backporting). Keeping #18690 open to rebase and follow-up on 
>> this as a subtask. See discussion in that #18690 for more details, 
>> discussion and motivation for this.
>
> Claes Redestad has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Copyright and comment about ProxyClassDumper

Looks okay, thanks. Only cosmetic nits:

src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line 108:

> 106: static {
> 107: String highArity = 
> VM.getSavedProperty("java.lang.invoke.StringConcat.highArityThreshold");
> 108: HIGH_ARITY_THRESHOLD = Integer.parseInt(highArity != null ? 
> highArity : "20");

Maybe write this as `= (highArity != null) ? Integer.parseInt(highArity) : 20` 
to help interpreter a bit.

src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line 1105:

> 1103: for (int c = 0; c < constants.length; c++) {
> 1104: if (constants[c] != null)
> 1105: len += constants[c].length();

Braces?

src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line 1118:

> 1116: );
> 1117: 
> 1118: 

Extra new-line?

src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line 1164:

> 1162: }
> 1163: 
> 1164: 

Extra new-line?

-

Marked as reviewed by shade (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/18953#pullrequestreview-2024514742
PR Review Comment: https://git.openjdk.org/jdk/pull/18953#discussion_r1580723293
PR Review Comment: https://git.openjdk.org/jdk/pull/18953#discussion_r1580723981
PR Review Comment: https://git.openjdk.org/jdk/pull/18953#discussion_r1580724174
PR Review Comment: https://git.openjdk.org/jdk/pull/18953#discussion_r1580724568


Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v3]

2024-04-26 Thread Claes Redestad
> Splitting out the ASM-based version from #18690 to push that first under the 
> JBS (to help backporting). Keeping #18690 open to rebase and follow-up on 
> this as a subtask. See discussion in that #18690 for more details, discussion 
> and motivation for this.

Claes Redestad has updated the pull request incrementally with one additional 
commit since the last revision:

  Copyright and comment about ProxyClassDumper

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/18953/files
  - new: https://git.openjdk.org/jdk/pull/18953/files/a4dfbfca..591d2d1c

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18953&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18953&range=01-02

  Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/18953.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18953/head:pull/18953

PR: https://git.openjdk.org/jdk/pull/18953


Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v3]

2024-04-15 Thread Claes Redestad
On Fri, 12 Apr 2024 15:06:36 GMT, Chen Liang  wrote:

>> I'd prefer considering such optimizations as follow-ups. We need to think 
>> about where to define such shared classes in a way that considers the full 
>> lifecycle, facilitates class unloading (one cache per classloader?) while 
>> still getting good reuse.
>
> I think `Invokers` might be a good place. Its lifetime is bound to its MT, 
> and MT already handles class unloading well. If we are doing erased then a 
> plain array suffices.

I think we should be wary about adding things that are strongly kept alive by 
the MT. While the MTs themselves are weakly referenced and can be unloaded if 
all usage goes aways, many oft-used MTs might be kept effectively forever.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18690#discussion_r1565912250


Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v3]

2024-04-14 Thread Claes Redestad
> This patch suggests a workaround to an issue with huge SCF MH expression 
> trees taking excessive JIT compilation resources by reviving (part of) the 
> simple bytecode-generating strategy that was originally available as an 
> all-or-nothing strategy choice. 
> 
> Instead of reintroducing a binary strategy choice I propose a threshold 
> parameter, controlled by 
> `-Djava.lang.invoke.StringConcat.highArityThreshold=`: For expressions 
> below or at this threshold there's no change, for expressions with an arity 
> above it we use the `StringBuilder`-chain bytecode generator. 
> 
> There are a few trade-offs at play here which influence the choice of 
> threshold. The simple high arity strategy will for example not see any reuse 
> of LambdaForms but strictly always generate a class per indy callsite, which 
> means we might end up with a higher total number of classes generated and 
> loaded in applications if we set this value too low. It may also produce 
> worse performance on average. On the other hand there is the observed 
> increase in C2 resource usage as expressions grow unwieldy. On the other 
> other hand high arity expressions are likely rare to begin with, with less 
> opportunities for sharing than the more common low-arity expressions. 
> 
> I turned the submitted test case into a few JMH benchmarks and did some 
> experiments with `-XX:CompileCommand=MemStat,StringConcat::concat,print`:
> 
> Baseline strategy:
> 13 args: 6.3M
> 23 args: 18M
> 123 args: 868M
> 
> `-Djava.lang.invoke.StringConcat.highArityThreshold=0`:
> 13 args: 2.11M
> 23 args: 3.67M
> 123 args: 4.75M
> 
> For 123 args the memory overhead of the baseline strategy is 180x, but for 23 
> args we're down to a 5x memory overhead, and down to a 3x overhead for 13 
> args. Since the absolute overhead for 23 is borderline acceptable (+15Mb) 
> I've conservatively chosen a threshold at arity 20. This keeps C2 resource 
> pressure at a reasonable level (< 18M) while avoiding perturbing performance 
> at the vast majority of call sites.
> 
> I was asked to use the new class file API for mainline. There's a version of 
> this patch implemented using ASM in 7c52a9f which might be a reasonable basis 
> for a backport.

Claes Redestad has updated the pull request incrementally with one additional 
commit since the last revision:

  Pre-size stringbuilders based on constant size and a small argument factor

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/18690/files
  - new: https://git.openjdk.org/jdk/pull/18690/files/1c3d354c..796d968b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18690&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18690&range=01-02

  Stats: 44 lines in 2 files changed: 42 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/18690.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18690/head:pull/18690

PR: https://git.openjdk.org/jdk/pull/18690


Re: RFR: 8327247: C2 uses up to 2GB of RAM to compile complex string concat in extreme cases [v3]

2024-04-14 Thread Claes Redestad
On Sat, 13 Apr 2024 17:54:54 GMT, Brett Okken  wrote:

>> Possibly. I tried a few simple variants that initialized the `StringBuilder` 
>> capacity at various guesses, such as sum of constant sizes + some factor 
>> based on args, but across a diverse set of micros that gives both some wins 
>> and some regressions. Getting the estimation just right is pretty tricky, 
>> especially when size of the arguments are arbitrary (like for any 
>> String/Object arg).
>
> What are the scenarios which had regressions? 
> Given the conservative growth for StringBuilder, it surprises me a bit that 
> any scenario would regress.

I took a second look and it turns out that there were neither regressions nor 
improvements in my test, only a few false positives: For C2 the SB chain is 
recognized by the (fragile) C2 OptimizeStringConcat pass and transformed into a 
shape where the initial size in java bytecode - if any - is ignored.

With C1 having an initial size can have a significant effect. One way to 
provoke a regression there is to have a huge constant suffix while 
underestimating the size of the operands, which can lead to excessive 
allocation:


Name Cnt BaseError   TestError   
Unit  Change
StringConcat.concat23StringConst   5  385,268 ±  5,238341,251 ±  2,606  
ns/op   1,13x (p = 0,000*)
  :gc.alloc.rate 6039,095 ± 82,309  12764,169 ± 97,146 
MB/sec   2,11x (p = 0,000*)
  :gc.alloc.rate.norm2440,003 ±  0,000   4568,002 ±  0,000   
B/op   1,87x (p = 0,000*)


Still a bit better on throughput from less actual copying.

*If* the `StringBuilder` is sized sufficiently (constants + args * N) things 
look much better, of course: 

-XX:TieredStopAtLevel=1
Name Cnt BaseError  TestError   
Unit  Change
StringConcat.concat23StringConst   5  385,268 ±  5,238   259,628 ±  2,515  
ns/op   1,48x (p = 0,000*)
  :gc.alloc.rate 6039,095 ± 82,309  8902,803 ± 86,563 
MB/sec   1,47x (p = 0,000*)
  :gc.alloc.rate.norm2440,003 ±  0,000  2424,002 ±  0,000   
B/op   0,99x (p = 0,000*)


For most cases having a size based on the sum of the constants plus some small 
factor per argument seem to be a decent heuristic - for C1. Since this adds 
just one bytecode to the generated method I guess it wouldn't hurt.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18690#discussion_r1564753612