On Thu, 23 Apr 2026 16:24:24 GMT, Vladimir Kozlov <[email protected]> wrote:
>> I was looking at the `reinterpret` variant of the benchmark, which is still
>> slow. The issue there has to do with escape analysis getting "stuck" on such
>> a big IR. Unfortunately, I was able to reproduce even w/o FFM/restricted
>> methods:
>>
>> https://github.com/mcimadamore/jdk/blob/6b51dfae06ad935a5487698fe2ab93babdcc4ade/test/micro/org/openjdk/bench/java/lang/EscapeAnalysisDupAccess.java
>>
>> When the optimization in this PR is enabled, we get something like this:
>>
>>
>> Benchmark                           Mode  Cnt      Score      Error  Units
>> EscapeAnalysisDupAccess.copyDirect  avgt    3    157.055 ±    1.428  ns/op
>> EscapeAnalysisDupAccess.copyDup1    avgt    3    161.035 ±   18.928  ns/op
>> EscapeAnalysisDupAccess.copyDup2    avgt    3  18519.558 ± 7707.574  ns/op
>>
>>
>> If we disable the optimization (e.g. using `-XX:-IncrementalInline`), then
>> we get this:
>>
>>
>> Benchmark                           Mode  Cnt      Score      Error  Units
>> EscapeAnalysisDupAccess.copyDirect  avgt    3    159.383 ±   27.783  ns/op
>> EscapeAnalysisDupAccess.copyDup1    avgt    3    766.249 ±   40.345  ns/op
>> EscapeAnalysisDupAccess.copyDup2    avgt    3   2324.964 ±   93.478  ns/op
>>
>>
>> I think this comparison is interesting because it shows many things at once:
>> * delayed inlining doesn't affect `copyDirect`
>> * delayed inlining _improves_ `copyDup1`
>> * delayed inlining significantly _regresses_ on `copyDup2`
>>
>> That is, as long as escape analysis can keep up, delayed inlining wins. But
>> when escape analysis stops working (and C2 never finishes compiling the
>> benchmark method), then the numbers are much worse than when no inlining
>> occurs at all.
>>
>> When the optimization is disabled, we can clearly see inlining failures:
>>
>> @ 274 org.openjdk.bench.java.lang.EscapeAnalysisDupAccess$Dup2Foo::z (10 bytes) failed to inline: NodeCountInliningCutoff
>>
>>
>> In some way, these failures "protect" escape analysis from combinatorial
>> explosion.
>>
>> So the question is:
>> * is this a problem?
>> * if so, is this a problem unique to escape analysis?
>> * or, as the IR graph grows, might other phases exhibit similar runaway behavior?
>>
>> Depending on what the answer is (and I'm not expert enough on C2 to comment
>> on this), different approaches might be required. E.g. if it's "just" an EA
>> issue, we might only delay inlining if there's no allocation in the bytecode
>> of the method to be inlined. This is probably not hard to do. The question
>> is: is that enough?
>
>> The issue there has to do with escape analysis getting "stuck" on such a big
>> IR.
>
> @mcimadamore what message does EA give in such a case? Is it a time limit or a
> lot of EA iterations? Or something else?
> Or can it simply not eliminate hot allocations?

@vnkozlov EA gets stuck at `ConnectionGraph::split_unique_types` because there are too many scalar replaceable allocations.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/30874#issuecomment-4306245592
