On Fri, 12 Apr 2024 14:53:58 GMT, Chen Liang <li...@openjdk.org> wrote:
>> This patch suggests a workaround to an issue with huge SCF MH expression >> trees taking excessive JIT compilation resources by reviving (part of) the >> simple bytecode-generating strategy that was originally available as an >> all-or-nothing strategy choice. >> >> Instead of reintroducing a binary strategy choice I propose a threshold >> parameter, controlled by >> `-Djava.lang.invoke.StringConcat.highArityThreshold=<val>`: For expressions >> below or at this threshold there's no change, for expressions with an arity >> above it we use the `StringBuilder`-chain bytecode generator. >> >> There are a few trade-offs at play here which influence the choice of >> threshold. The simple high arity strategy will for example not see any reuse >> of LambdaForms but strictly always generate a class per indy callsite, which >> means we might end up with a higher total number of classes generated and >> loaded in applications if we set this value too low. It may also produce >> worse performance on average. On the other hand there is the observed >> increase in C2 resource usage as expressions grow unwieldy. On the other >> other hand high arity expressions are likely rare to begin with, with less >> opportunities for sharing than the more common low-arity expressions. >> >> I turned the submitted test case into a few JMH benchmarks and did some >> experiments with `-XX:CompileCommand=MemStat,StringConcat::concat,print`: >> >> Baseline strategy: >> 13 args: 6.3M >> 23 args: 18M >> 123 args: 868M >> >> `-Djava.lang.invoke.StringConcat.highArityThreshold=0`: >> 13 args: 2.11M >> 23 args: 3.67M >> 123 args: 4.75M >> >> For 123 args the memory overhead of the baseline strategy is 180x, but for >> 23 args we're down to a 5x memory overhead, and down to a 3x overhead for 13 >> args. Since the absolute overhead for 23 is borderline acceptable (+15Mb) >> I've conservatively chosen a threshold at arity 20. This keeps C2 resource >> pressure at a reasonable level (< 18M) while avoiding perturbing performance >> at the vast majority of call sites. >> >> I was asked to use the new class file API for mainline. There's a version of >> this patch implemented using ASM in 7c52a9f which might be a reasonable >> basis for a backport. > > src/java.base/share/classes/java/lang/invoke/StringConcatFactory.java line > 1437: > >> 1435: for (int c = 0; c < args.parameterCount(); >> c++) { >> 1436: if (constants[c] != null) { >> 1437: cb.ldc(constants[c]); > > I think for Remi's approach, we change: > 1. Insert an extra String array (maybe need a way to mark it stable?) arg > representing constants > 2. Change this ldc into aload + aaload (or List.get if we use immutable List) > 3. Call `bindTo(constantStrings)` on the resulting MH > > This approach can significantly reduce the number of classes spinned instead > of generating one class per constant array; might need to measure performance > to see if this is a good tradeoff > > Oh, I just noticed that we need to erase everything to the generic method > type. I think Remi's "value class" means there's no overhead for converting > primitives into wrappers in this conversion to generic method type. I'd prefer considering such optimizations as follow-ups. We need to think about where to define such shared classes in a way that considers the full lifecycle, facilitates class unloading (one cache per classloader?) while still getting good reuse. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18690#discussion_r1562673638