On Fri, 24 Apr 2026 09:52:20 GMT, Maurizio Cimadamore <[email protected]> 
wrote:

>> Thanks for the feedback.
>> 
>>> No idea if this is related to escape analysis and/or other algorithms. It 
>>> is entirely impractical, usually needing 15+ JMH warmup iterations to reach 
>>> peak performance, especially with `MemorySegment::reinterpret`
>> 
>> This specific issue is very likely related to the EA problem described here. 
>> I found similar pathological behavior when using either reinterpret or 
>> asSlice in the hot path when using the incremental inlining changes 
>> described in this PR. EA simply can't keep up (for now) with such a large IR 
>> and basically ends up taking a very long time -- comparable to the overall 
>> benchmark execution.
>> 
>> IMHO, this stresses the fact that some of this magic is not free. Some 
>> tricks, like doing an on-the-fly reinterpret might work well in synthetic 
>> small benchmarks, but it almost always blows up in more realistic 
>> conditions. In the current state, I think it causes more problems than it 
>> solves, and, while it is an interesting stress test (e.g. see how far we can 
>> push C2), I don't think we should specifically tune for it.
>> 
>>> For FFM specifically, a refactoring away from VarHandles, at least for the 
>>> plain memory access methods available on `MemorySegment`, might be worth 
>>> exploring. I also hope `MemorySegment::reinterpret` could be simplified. 
>>> Ideas:
>>> 
>>>    Make the default implementations throw UOE, move the actual 
>>> implementations to `NativeMemorySegmentImpl`. This removes the need for the 
>>> `!isNative()` check.
>>> 
>>>    Inline `reinterpretInternal` into `reinterpret`. This eliminates the 
>>> `cleanupAction` conditional, which should help escape analysis. It also 
>>> eliminates the call completely for the `MemorySegment reinterpret(long 
>>> newSize)` overload (the most important for us).
>>> 
>>>    Uhm... `Reflection.ensureNativeAccess`. I get the importance, but I wish 
>>> this could be called once per module, instead of carrying its complexity in 
>>> every call-site.
>> 
>> I agree with all the points above.
>> 
>> Btw, one thing I found adds more cost than anticipated is the fluent style 
>> adopted by the LWJGL accessors. Fluent setters seem bigger than void 
>> setters, and instance accessors seem generally quite a bit bigger than 
>> static accessors which take a memory segment parameter (jextract style). 
>> So, all these little factors contribute to what we see in the end.
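
For readers unfamiliar with the two accessor shapes contrasted in the quoted 
paragraph, here is a rough sketch. The struct layout (two ints), the class 
names `PointFluent` and `PointStatic`, and the offsets are all hypothetical; 
this is not LWJGL or jextract output, only an approximation of the two styles, 
and it says nothing by itself about how C2 treats either one.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Hypothetical native struct { int x; int y; }, used only for illustration.

// Instance-style wrapper with fluent setters: each setter returns `this`,
// so every access goes through a wrapper object holding the segment.
final class PointFluent {
    private final MemorySegment segment;

    PointFluent(MemorySegment segment) { this.segment = segment; }

    PointFluent x(int v) { segment.set(ValueLayout.JAVA_INT, 0, v); return this; }
    PointFluent y(int v) { segment.set(ValueLayout.JAVA_INT, 4, v); return this; }

    int x() { return segment.get(ValueLayout.JAVA_INT, 0); }
    int y() { return segment.get(ValueLayout.JAVA_INT, 4); }
}

// Static accessors with void setters: the segment is passed explicitly
// and no wrapper object is involved.
final class PointStatic {
    private PointStatic() {}

    static void x(MemorySegment s, int v) { s.set(ValueLayout.JAVA_INT, 0, v); }
    static void y(MemorySegment s, int v) { s.set(ValueLayout.JAVA_INT, 4, v); }

    static int x(MemorySegment s) { return s.get(ValueLayout.JAVA_INT, 0); }
    static int y(MemorySegment s) { return s.get(ValueLayout.JAVA_INT, 4); }
}

final class AccessorStyles {
    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            // 8 bytes, 4-byte aligned: room for the two ints.
            MemorySegment seg = arena.allocate(8, 4);

            // Fluent, instance style.
            new PointFluent(seg).x(1).y(2);

            // Static style with void setters.
            PointStatic.x(seg, 3);
            PointStatic.y(seg, 4);

            System.out.println(PointStatic.x(seg) + ", " + PointStatic.y(seg));
        }
    }
}
```

Both styles write to the same memory; the difference is only in the shape of 
the Java code that the JIT has to inline and analyze.
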
>
>> Thank you, @mcimadamore. It is a known issue (from the beginning) that 
>> this part of EA is very expensive. Maybe we can rework this part so that 
>> it is not quadratic in the number of allocations.
> 
> @vnkozlov  -- is there a JBS issue for this kind of issue we can refer to? I 
> mean, not an issue about the fact that the timeout check could be better -- I 
> mean an issue about the fact that this part of EA doesn't scale with the 
> number of IR nodes. Thanks!

@mcimadamore here is an old issue, which was fixed by introducing a timer for EA: 
https://bugs.openjdk.org/browse/JDK-8041984

And here is a new one, filed by @merykitty, to additionally check the timeout 
inside `split_unique_types()` and bail out:
https://bugs.openjdk.org/browse/JDK-8383178

> I mean an issue about the fact that this part of EA doesn't scale with the 
> number of IR nodes.

No, we don't have such an issue filed yet. AFAIK, this is the first time we 
have hit such a big timeout.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/30874#issuecomment-4314586408
PR Comment: https://git.openjdk.org/jdk/pull/30874#issuecomment-4314603692
