On Tue, 9 Apr 2024 12:01:49 GMT, Claes Redestad <redes...@openjdk.org> wrote:
> This patch suggests a workaround to an issue with huge SCF MH expression > trees taking excessive JIT compilation resources by reviving (part of) the > simple bytecode-generating strategy that was originally available as an > all-or-nothing strategy choice. > > Instead of reintroducing a binary strategy choice I propose a threshold > parameter, controlled by > `-Djava.lang.invoke.StringConcat.highArityThreshold=<val>`: For expressions > below or at this threshold there's no change, for expressions with an arity > above it we use the `StringBuilder`-chain bytecode generator. > > There are a few trade-offs at play here which influence the choice of > threshold. The simple high arity strategy will for example not see any reuse > of LambdaForms but strictly always generate a class per indy callsite, which > means we might end up with a higher total number of classes generated and > loaded in applications if we set this value too low. It may also produce > worse performance on average. On the other hand there is the observed > increase in C2 resource usage as expressions grow unwieldy. On the other > other hand high arity expressions are likely rare to begin with, with less > opportunities for sharing than the more common low-arity expressions. > > I turned the submitted test case into a few JMH benchmarks and did some > experiments with `-XX:CompileCommand=MemStat,StringConcat::concat,print`: > > Baseline strategy: > 13 args: 6.3M > 23 args: 18M > 123 args: 868M > > `-Djava.lang.invoke.StringConcat.highArityThreshold=0`: > 13 args: 2.11M > 23 args: 3.67M > 123 args: 4.75M > > For 123 args the memory overhead of the baseline strategy is 180x, but for 23 > args we're down to a 5x memory overhead, and down to a 3x overhead for 13 > args. Since the absolute overhead for 23 is borderline acceptable (+15Mb) > I've conservatively chosen a threshold at arity 20. This keeps C2 resource > pressure at a reasonable level (< 18M) while avoiding perturbing performance > at the vast majority of call sites. > > I was asked to use the new class file API for mainline. There's a version of > this patch implemented using ASM in 7c52a9f which might be a reasonable basis > for a backport. src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line 1406: > 1404: @Override > 1405: public void accept(ClassBuilder clb) { > 1406: clb.withFlags(AccessFlag.PUBLIC, > AccessFlag.FINAL, AccessFlag.SUPER, AccessFlag.SYNTHETIC) Why is this hidden class public? src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line 1409: > 1407: .withMethodBody(METHOD_NAME, > 1408: > MethodTypeDesc.ofDescriptor(args.toMethodDescriptorString()), > 1409: ClassFile.ACC_FINAL | > ClassFile.ACC_PUBLIC | ClassFile.ACC_STATIC, Why is this method final? Don't think this method can ever be hidden. src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line 1416: > 1414: Class<?> innerClass = > lookup.defineHiddenClass(classBytes, true, STRONG).lookupClass(); > 1415: DUMPER.dumpClass(className, innerClass, classBytes); > 1416: MethodHandle mh = lookup.findStatic(innerClass, > METHOD_NAME, args); We can probably make the method private, and use the lookup from `defineHiddenClass` instead of the host class lookup. src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line 1425: > 1423: > 1424: private static Consumer<CodeBuilder> generateMethod(String[] > constants, MethodType args) { > 1425: return new Consumer<CodeBuilder>() { Are you using inner classes for startup performance concerns? java.lang.invoke should be ready when you can reach here so using lambdas is safe. src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line 1458: > 1456: > 1457: // Load the argument of type cl at slot onto stack, return the > number of argument stack slots consumed. > 1458: private static int load(CodeBuilder cb, Class<?> cl, int slot) { You can replace all calls to `load(cb, cl, slot)` with `cb.loadInstruction(TypeKind.from(cl), slot)`, and the retrn slot count can be accessed by `TypeKind::slotSize` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18690#discussion_r1562629418 PR Review Comment: https://git.openjdk.org/jdk/pull/18690#discussion_r1562630031 PR Review Comment: https://git.openjdk.org/jdk/pull/18690#discussion_r1562635315 PR Review Comment: https://git.openjdk.org/jdk/pull/18690#discussion_r1562637025 PR Review Comment: https://git.openjdk.org/jdk/pull/18690#discussion_r1562640252