On Tue, 9 Apr 2024 12:01:49 GMT, Claes Redestad <redes...@openjdk.org> wrote:

> This patch suggests a workaround to an issue with huge SCF MH expression 
> trees taking excessive JIT compilation resources by reviving (part of) the 
> simple bytecode-generating strategy that was originally available as an 
> all-or-nothing strategy choice. 
> 
> Instead of reintroducing a binary strategy choice I propose a threshold 
> parameter, controlled by 
> `-Djava.lang.invoke.StringConcat.highArityThreshold=<val>`: For expressions 
> below or at this threshold there's no change, for expressions with an arity 
> above it we use the `StringBuilder`-chain bytecode generator. 
> 
> There are a few trade-offs at play here which influence the choice of 
> threshold. The simple high arity strategy will for example not see any reuse 
> of LambdaForms but strictly always generate a class per indy callsite, which 
> means we might end up with a higher total number of classes generated and 
> loaded in applications if we set this value too low. It may also produce 
> worse performance on average. On the other hand there is the observed 
> increase in C2 resource usage as expressions grow unwieldy. On the other 
> other hand high arity expressions are likely rare to begin with, with less 
> opportunities for sharing than the more common low-arity expressions. 
> 
> I turned the submitted test case into a few JMH benchmarks and did some 
> experiments with `-XX:CompileCommand=MemStat,StringConcat::concat,print`:
> 
> Baseline strategy:
> 13 args: 6.3M
> 23 args: 18M
> 123 args: 868M
> 
> `-Djava.lang.invoke.StringConcat.highArityThreshold=0`:
> 13 args: 2.11M
> 23 args: 3.67M
> 123 args: 4.75M
> 
> For 123 args the memory overhead of the baseline strategy is 180x, but for 23 
> args we're down to a 5x memory overhead, and down to a 3x overhead for 13 
> args. Since the absolute overhead for 23 is borderline acceptable (+15Mb) 
> I've conservatively chosen a threshold at arity 20. This keeps C2 resource 
> pressure at a reasonable level (< 18M) while avoiding perturbing performance 
> at the vast majority of call sites.
> 
> I was asked to use the new class file API for mainline. There's a version of 
> this patch implemented using ASM in 7c52a9f which might be a reasonable basis 
> for a backport.

src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line 1406:

> 1404:                         @Override
> 1405:                         public void accept(ClassBuilder clb) {
> 1406:                             clb.withFlags(AccessFlag.PUBLIC, 
> AccessFlag.FINAL, AccessFlag.SUPER, AccessFlag.SYNTHETIC)

Why is this hidden class public?

src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line 1409:

> 1407:                                 .withMethodBody(METHOD_NAME,
> 1408:                                         
> MethodTypeDesc.ofDescriptor(args.toMethodDescriptorString()),
> 1409:                                         ClassFile.ACC_FINAL | 
> ClassFile.ACC_PUBLIC | ClassFile.ACC_STATIC,

Why is this method final? Don't think this method can ever be hidden.

src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line 1416:

> 1414:                 Class<?> innerClass = 
> lookup.defineHiddenClass(classBytes, true, STRONG).lookupClass();
> 1415:                 DUMPER.dumpClass(className, innerClass, classBytes);
> 1416:                 MethodHandle mh = lookup.findStatic(innerClass, 
> METHOD_NAME, args);

We can probably make the method private, and use the lookup from 
`defineHiddenClass` instead of the host class lookup.

src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line 1425:

> 1423: 
> 1424:         private static Consumer<CodeBuilder> generateMethod(String[] 
> constants, MethodType args) {
> 1425:             return new Consumer<CodeBuilder>() {

Are you using inner classes for startup performance concerns? java.lang.invoke 
should be ready when you can reach here so using lambdas is safe.

src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line 1458:

> 1456: 
> 1457:         // Load the argument of type cl at slot onto stack, return the 
> number of argument stack slots consumed.
> 1458:         private static int load(CodeBuilder cb, Class<?> cl, int slot) {

You can replace all calls to `load(cb, cl, slot)` with 
`cb.loadInstruction(TypeKind.from(cl), slot)`, and the retrn slot count can be 
accessed by `TypeKind::slotSize`

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18690#discussion_r1562629418
PR Review Comment: https://git.openjdk.org/jdk/pull/18690#discussion_r1562630031
PR Review Comment: https://git.openjdk.org/jdk/pull/18690#discussion_r1562635315
PR Review Comment: https://git.openjdk.org/jdk/pull/18690#discussion_r1562637025
PR Review Comment: https://git.openjdk.org/jdk/pull/18690#discussion_r1562640252

Reply via email to