On Thu, 23 Apr 2026 16:24:24 GMT, Vladimir Kozlov <[email protected]> wrote:

>> I was looking at the `reinterpret` variant of the benchmark, which is still 
>> slow. The issue there has to do with escape analysis getting "stuck" on such 
>> a big IR. Unfortunately, I was able to reproduce the problem even without
>> FFM/restricted methods:
>> 
>> https://github.com/mcimadamore/jdk/blob/6b51dfae06ad935a5487698fe2ab93babdcc4ade/test/micro/org/openjdk/bench/java/lang/EscapeAnalysisDupAccess.java
>> 
>> When the optimization in this PR is enabled, we get something like this:
>> 
>> 
>> Benchmark                           Mode  Cnt      Score      Error  Units
>> EscapeAnalysisDupAccess.copyDirect  avgt    3    157.055 ±    1.428  ns/op
>> EscapeAnalysisDupAccess.copyDup1    avgt    3    161.035 ±   18.928  ns/op
>> EscapeAnalysisDupAccess.copyDup2    avgt    3  18519.558 ± 7707.574  ns/op
>> 
>> 
>> If we disable the optimization (e.g. using `-XX:-IncrementalInline`), then 
>> we get this:
>> 
>> 
>> Benchmark                           Mode  Cnt     Score    Error  Units
>> EscapeAnalysisDupAccess.copyDirect  avgt    3   159.383 ± 27.783  ns/op
>> EscapeAnalysisDupAccess.copyDup1    avgt    3   766.249 ± 40.345  ns/op
>> EscapeAnalysisDupAccess.copyDup2    avgt    3  2324.964 ± 93.478  ns/op
>> 
>> 
>> I think this comparison is interesting because it shows many things at once:
>> * delayed inlining doesn't affect `copyDirect`
>> * delayed inlining _improves_ `copyDup1`
>> * delayed inlining significantly _regresses_ on `copyDup2`
>> 
>> That is, as long as escape analysis can keep up, delayed inlining wins. But
>> when escape analysis stops working (and C2 never finishes compiling the
>> benchmark method), the numbers are much worse than if those calls had not
>> been inlined at all.
>> 
>> When the optimization is disabled, we can clearly see inlining failures:
>> 
>> @ 274   org.openjdk.bench.java.lang.EscapeAnalysisDupAccess$Dup2Foo::z (10 bytes)   failed to inline: NodeCountInliningCutoff
>> 
>> 
>> In a way, these failures "protect" escape analysis from combinatorial
>> explosion.
>> 
>> So the questions are:
>> * is this a problem?
>> * if so, is this a problem unique to escape analysis?
>> * or might other phases exhibit similar runaway behavior as the IR graph grows?
>> 
>> Depending on the answer (and I'm not enough of a C2 expert to comment on
>> this), different approaches might be required. E.g. if it's "just" an EA
>> issue, we might only delay inlining if there's no allocation in the bytecode 
>> of the method to be inlined. This is probably not hard to do. The question 
>> is: is that enough?
>
>> The issue there has to do with escape analysis getting "stuck" on such a big 
>> IR.
> 
> @mcimadamore what message does EA give in such a case? Is it the time limit,
> or a lot of EA iterations? Or something else?
> Or can it simply not eliminate hot allocations?

@vnkozlov EA gets stuck at `ConnectionGraph::split_unique_types` because there 
are too many scalar replaceable allocations.
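For reference, the two JMH runs quoted above can be compared directly by dividing their scores. The sketch below does only that arithmetic on the published `avgt` numbers (nothing is re-measured; the class name `DupAccessRatios` is made up for illustration). A ratio above 1 means the run with delayed inlining enabled is slower than the `-XX:-IncrementalInline` run.

```java
// Ratios between the two JMH runs quoted above (avgt, ns/op).
// "on"  = delayed inlining enabled (the optimization in this PR)
// "off" = run with -XX:-IncrementalInline
class DupAccessRatios {
    static final String[] NAMES = {"copyDirect", "copyDup1", "copyDup2"};
    static final double[] ON  = {157.055, 161.035, 18519.558};
    static final double[] OFF = {159.383, 766.249, 2324.964};

    // Ratio > 1: the "on" run is slower; ratio < 1: the "on" run is faster.
    static double slowdown(int i) {
        return ON[i] / OFF[i];
    }

    public static void main(String[] args) {
        for (int i = 0; i < NAMES.length; i++) {
            System.out.printf("%-10s on/off = %.2f%n", NAMES[i], slowdown(i));
        }
    }
}
```

With these numbers, `copyDup2` comes out roughly 8x slower with delayed inlining enabled, `copyDup1` roughly 4.8x faster, and `copyDirect` stays within the error bars.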

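The heuristic floated earlier in the thread ("only delay inlining if there's no allocation in the bytecode of the method to be inlined") could look roughly like the sketch below. This is a hypothetical illustration, not C2 code: it assumes the callee's bytecode has already been decoded into a list of opcodes (operand bytes stripped), and the names `DelayInlinePolicy`, `allocates` and `shouldDelayInlining` are invented here. The opcode values are the allocation instructions from the JVM specification.

```java
import java.util.List;

// Hypothetical policy sketch: delay inlining only for allocation-free callees.
// Input is assumed to be the callee's already-decoded opcode sequence; decoding
// the raw Code attribute (instruction lengths, wide, switches) is out of scope.
class DelayInlinePolicy {
    // JVM allocation opcodes
    static final int NEW            = 0xBB;
    static final int NEWARRAY       = 0xBC;
    static final int ANEWARRAY      = 0xBD;
    static final int MULTIANEWARRAY = 0xC5;

    // True if any instruction in the callee's own body allocates.
    static boolean allocates(List<Integer> opcodes) {
        for (int op : opcodes) {
            if (op == NEW || op == NEWARRAY || op == ANEWARRAY || op == MULTIANEWARRAY) {
                return true;
            }
        }
        return false;
    }

    // Hypothetical decision: only allocation-free callees take the delayed path;
    // everything else would go through the normal (non-delayed) inlining path.
    static boolean shouldDelayInlining(List<Integer> opcodes) {
        return !allocates(opcodes);
    }

    public static void main(String[] args) {
        // aload_0, getfield, ireturn: a simple getter, no allocation
        List<Integer> getter = List.of(0x2A, 0xB4, 0xAC);
        // new, dup, invokespecial, areturn: allocates an object
        List<Integer> factory = List.of(0xBB, 0x59, 0xB7, 0xB0);
        System.out.println(shouldDelayInlining(getter));   // true
        System.out.println(shouldDelayInlining(factory));  // false
    }
}
```

Note that this only inspects the callee's own bytecode: an allocation-free callee can still call methods that allocate, which is one reason the "is that enough?" question remains open.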
-------------

PR Comment: https://git.openjdk.org/jdk/pull/30874#issuecomment-4306245592
