Re: RFR: 8382700: C2: Delay inlining instead of giving up when hit NodeCountInliningCutoff [v2]

Maurizio Cimadamore Fri, 24 Apr 2026 02:56:02 -0700

On Fri, 24 Apr 2026 09:17:23 GMT, Maurizio Cimadamore <[email protected]> 
wrote:


>> Quan Anh Mai has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   add benchmark
>
> Thanks for the feedback.
> 
>> No idea if this is related to escape analysis and/or other algorithms. It is 
>> entirely impractical, usually needing 15+ JMH warmup iterations to reach 
>> peak performance, especially with `MemorySegment::reinterpret`.
> 
> This specific issue is very likely related to the EA problem described here. 
> I found similar pathological behavior when using either `reinterpret` or 
> `asSlice` in the hot path with the incremental inlining changes described 
> in this PR. EA simply can't keep up (for now) with such a large IR and 
> ends up taking a very long time -- comparable to the overall 
> benchmark execution time.
> 
> IMHO, this stresses the fact that some of this magic is not free. Some 
> tricks, like doing an on-the-fly reinterpret, might work well in small 
> synthetic benchmarks, but they almost always blow up in more realistic 
> conditions. In the current state, I think such tricks cause more problems 
> than they solve, and, while this is an interesting stress test (e.g. to see 
> how far we can push C2), I don't think we should specifically tune for it.
> 
>> For FFM specifically, a refactoring away from VarHandles, at least for the 
>> plain memory access methods available on `MemorySegment`, might be worth 
>> exploring. I also hope `MemorySegment::reinterpret` could be simplified. 
>> Ideas:
>> 
>> - Make the default implementations throw UOE (`UnsupportedOperationException`)
>>   and move the actual implementations to `NativeMemorySegmentImpl`. This
>>   removes the need for the `!isNative()` check.
>>
>> - Inline `reinterpretInternal` into `reinterpret`. This eliminates the
>>   `cleanupAction` conditional, which should help escape analysis. It also
>>   eliminates the call completely for the `MemorySegment reinterpret(long
>>   newSize)` overload (the most important for us).
>>
>> - Uhm... `Reflection.ensureNativeAccess`. I get the importance, but I wish
>>   this could be called once per module, instead of carrying its complexity
>>   to every call site.
> 
> I agree with all the points above.
> 
> Btw, one thing I found that adds more cost than anticipated is the fluent 
> style adopted by LWJGL accessors. Fluent setters seem bigger than void 
> setters, and instance accessors generally seem quite a bit bigger than 
> static accessors that take a memory segment parameter (jextract style). 
> All these little factors contribute to what we see in the end.
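To make the accessor shapes discussed above concrete, here is a minimal sketch using the final FFM API (JDK 22 or later). The `Point` struct, the class names, and the accessor bodies are hypothetical illustrations, not code taken from LWJGL or generated by jextract; the sketch only shows the three shapes (static accessors taking a `MemorySegment`, a fluent instance wrapper, and an on-the-fly `reinterpret` from a raw address) and makes no claim about how much IR each one produces.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class AccessorStyles {
    // Hypothetical C struct used for illustration: struct Point { int x; int y; };
    static final MemoryLayout POINT = MemoryLayout.structLayout(
            ValueLayout.JAVA_INT.withName("x"),
            ValueLayout.JAVA_INT.withName("y"));
    static final long X = POINT.byteOffset(MemoryLayout.PathElement.groupElement("x"));
    static final long Y = POINT.byteOffset(MemoryLayout.PathElement.groupElement("y"));

    // jextract-style shape: static accessors that take the struct segment as a parameter.
    static final class PointStatic {
        static int x(MemorySegment struct) { return struct.get(ValueLayout.JAVA_INT, X); }
        static void x(MemorySegment struct, int v) { struct.set(ValueLayout.JAVA_INT, X, v); }
        static int y(MemorySegment struct) { return struct.get(ValueLayout.JAVA_INT, Y); }
        static void y(MemorySegment struct, int v) { struct.set(ValueLayout.JAVA_INT, Y, v); }
    }

    // Fluent instance wrapper (simplified and hypothetical, loosely modeled on the
    // LWJGL struct style): access goes through a wrapper object, and each setter
    // returns `this` so calls can be chained.
    static final class PointFluent {
        private final MemorySegment segment;
        PointFluent(MemorySegment segment) { this.segment = segment; }
        int x() { return segment.get(ValueLayout.JAVA_INT, X); }
        PointFluent x(int v) { segment.set(ValueLayout.JAVA_INT, X, v); return this; }
        int y() { return segment.get(ValueLayout.JAVA_INT, Y); }
        PointFluent y(int v) { segment.set(ValueLayout.JAVA_INT, Y, v); return this; }
    }

    // On-the-fly reinterpret: turn a raw address into a sized segment at the point
    // of use. reinterpret is a restricted method; without
    // --enable-native-access=ALL-UNNAMED, recent JDKs print a warning on first use.
    static int xFromAddress(long address) {
        MemorySegment struct = MemorySegment.ofAddress(address).reinterpret(POINT.byteSize());
        return struct.get(ValueLayout.JAVA_INT, X);
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment p = arena.allocate(POINT);
            PointStatic.x(p, 3);
            PointStatic.y(p, 4);
            System.out.println(PointStatic.x(p) + "," + PointStatic.y(p)); // 3,4
            new PointFluent(p).x(5).y(6);
            System.out.println(new PointFluent(p).x() + "," + new PointFluent(p).y()); // 5,6
            System.out.println(xFromAddress(p.address())); // 5
        }
    }
}
```

A plausible (unverified) contributor to the size difference is that each wrapper instance is an extra allocation that escape analysis has to prove non-escaping before it can be eliminated, whereas the static accessors only pass the segment along.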

> Thank you, @mcimadamore. It is a known issue (from the beginning) that this 
> part of EA is very expensive. Maybe we can rework this part so that it is not 
> quadratic in the number of allocations.

@vnkozlov -- is there a JBS issue for this problem that we can refer to? I 
don't mean an issue about the fact that the timeout check could be better -- I 
mean an issue about the fact that this part of EA doesn't scale with the number 
of IR nodes. Thanks!
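The first two refactoring ideas quoted above (default implementations that throw UOE, and a size-only `reinterpret` that does not route through a shared helper with a `cleanupAction` branch) can be sketched with a toy class hierarchy. All names here (`Segment`, `HeapSegment`, `NativeSegment`) are hypothetical; the real JDK implementation also deals with scopes, restricted-method checks, and cleanup actions, which are deliberately left out.

```java
public class ReinterpretSketch {
    // Hypothetical, heavily simplified stand-ins for a segment hierarchy.
    // These are NOT the JDK's internal classes.
    abstract static class Segment {
        final long address;
        final long size;

        Segment(long address, long size) {
            this.address = address;
            this.size = size;
        }

        // Default implementation rejects the call, so no isNative() test is
        // needed here: only the native subclass overrides it.
        Segment reinterpret(long newSize) {
            throw new UnsupportedOperationException("Not a native segment");
        }
    }

    static final class HeapSegment extends Segment {
        HeapSegment(long size) {
            super(0L, size);
        }
    }

    static final class NativeSegment extends Segment {
        NativeSegment(long address, long size) {
            super(address, size);
        }

        // Size-only overload implemented directly: there is no cleanup-action
        // parameter, so no conditional on it and no call to a shared helper.
        @Override
        NativeSegment reinterpret(long newSize) {
            if (newSize < 0) {
                throw new IllegalArgumentException("Negative size: " + newSize);
            }
            return new NativeSegment(address, newSize);
        }
    }

    public static void main(String[] args) {
        Segment wider = new NativeSegment(0x1000L, 8).reinterpret(64);
        System.out.println(wider.size); // 64
        try {
            new HeapSegment(16).reinterpret(32);
        } catch (UnsupportedOperationException e) {
            System.out.println("heap: " + e.getMessage()); // heap: Not a native segment
        }
    }
}
```

The intent is that the receiver type selects the implementation, so at a call site that C2 has profiled as `NativeSegment` the inlined code contains neither a kind check nor a cleanup branch. Whether that shrinks the IR enough to help escape analysis is something the proposal would still need to measure.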

-------------

PR Comment: https://git.openjdk.org/jdk/pull/30874#issuecomment-4312261268
