On Fri, 24 Apr 2026 09:52:20 GMT, Maurizio Cimadamore wrote:

>> Thanks for the feedback.
>>
>>> No idea if this is related to escape analysis and/or other algorithms. It
>>> is entirely impractical, usually needing 15+ JMH warmup iterations to reach
>>> peak performance, especially with `MemorySegment::reinterpret`
>>
>> This specific issue is very likely related to the EA problem described here.
>> I found similar pathological behavior when using either reinterpret or
>> asSlice in the hot path with the incremental inlining changes described in
>> this PR. EA simply can't keep up (for now) with such a large IR and
>> basically ends up taking a very long time -- comparable to the overall
>> benchmark execution.
>>
>> IMHO, this stresses the fact that some of this magic is not free. Some
>> tricks, like doing an on-the-fly reinterpret, might work well in small
>> synthetic benchmarks, but they almost always blow up in more realistic
>> conditions. In the current state, I think it causes more problems than it
>> solves, and, while it is an interesting stress test (e.g. see how far we can
>> push C2), I don't think we should specifically tune for it.
>>
>>> For FFM specifically, a refactoring away from VarHandles, at least for the
>>> plain memory access methods available on `MemorySegment`, might be worth
>>> exploring. I also hope `MemorySegment::reinterpret` could be simplified.
>>> Ideas:
>>>
>>> Make the default implementations throw UOE and move the actual
>>> implementations to `NativeMemorySegmentImpl`. This removes the need for the
>>> `!isNative()` check.
>>>
>>> Inline `reinterpretInternal` into `reinterpret`. This eliminates the
>>> `cleanupAction` conditional, which should help escape analysis. It also
>>> eliminates the call completely for the
>>> `MemorySegment reinterpret(long newSize)` overload (the most important for
>>> us).
>>>
>>> Uhm... `Reflection.ensureNativeAccess`. I get the importance, but I wish
>>> this could be called once per module, instead of carrying its complexity at
>>> every call site.
>>
>> I agree with all the points above.
>>
>> Btw, one thing I found adds more cost than anticipated is the fluent style
>> adopted by LWJGL accessors. Fluent setters seem bigger than void setters.
>> And instance accessors seem generally considerably bigger than static
>> accessors which take a memory segment parameter (jextract style). So, all
>> these little factors contribute to what we see in the end.
>
>> Thank you, @mcimadamore. It is a known issue (from the beginning) that this
>> part of EA is very expensive. Maybe we can rework this part so that it is
>> not quadratic in the number of allocations.
>
> @vnkozlov -- is there a JBS issue for this problem that we can refer to? I
> mean, not an issue about the fact that the timeout check could be better -- I
> mean an issue about the fact that this part of EA doesn't scale with the
> number of IR nodes. Thanks!

@mcimadamore, here is an old issue which was fixed by introducing a timer for EA:
https://bugs.openjdk.org/browse/JDK-8041984

And here is a new one, filed by @merykitty, to additionally check the timeout
inside `split_unique_types()` and bail out:
https://bugs.openjdk.org/browse/JDK-8383178

> I mean an issue about the fact that this part of EA doesn't scale with the
> number of IR nodes.

No, we don't have such an issue filed yet. AFAIK, this is the first time we
have hit a timeout this big.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/30874#issuecomment-4314586408
PR Comment: https://git.openjdk.org/jdk/pull/30874#issuecomment-4314603692
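The `reinterpret` refactoring ideas quoted in the thread (default implementation throws UOE, native subclass overrides, size-only overload written out without the shared helper and its `cleanupAction` branch) can be illustrated with a toy class hierarchy. This is a minimal sketch under stated assumptions: `SegmentSketch`, `HeapSegmentSketch`, and `NativeSegmentSketch` are hypothetical stand-ins, not the JDK's actual `MemorySegment` implementation classes, and the native-access check that the real (restricted) `reinterpret` performs is deliberately left out.

```java
// Toy model of the proposed refactoring. All class names are hypothetical;
// this is NOT the JDK's actual MemorySegment implementation.
public class ReinterpretSketch {

    abstract static class SegmentSketch {
        final long address;
        final long byteSize;

        SegmentSketch(long address, long byteSize) {
            this.address = address;
            this.byteSize = byteSize;
        }

        // Default implementation simply throws: only native segments support
        // reinterpret, so they override it and no isNative() check is needed.
        SegmentSketch reinterpret(long newSize) {
            throw new UnsupportedOperationException("Not a native segment");
        }
    }

    static final class HeapSegmentSketch extends SegmentSketch {
        HeapSegmentSketch(long byteSize) {
            super(0L, byteSize);
        }
    }

    static final class NativeSegmentSketch extends SegmentSketch {
        NativeSegmentSketch(long address, long byteSize) {
            super(address, byteSize);
        }

        // Size-only overload with its body written out directly, instead of
        // delegating to a shared helper that also branches on a cleanup action.
        // (The real method is restricted and checks native access; omitted.)
        @Override
        NativeSegmentSketch reinterpret(long newSize) {
            if (newSize < 0) {
                throw new IllegalArgumentException("Negative size: " + newSize);
            }
            return new NativeSegmentSketch(address, newSize);
        }
    }

    public static void main(String[] args) {
        SegmentSketch zeroLength = new NativeSegmentSketch(0x1000L, 0L);
        SegmentSketch resized = zeroLength.reinterpret(64);
        System.out.println(resized.byteSize); // prints 64

        try {
            new HeapSegmentSketch(16).reinterpret(64);
        } catch (UnsupportedOperationException e) {
            System.out.println("heap: " + e.getMessage()); // prints "heap: Not a native segment"
        }
    }
}
```

In this shape the "is it native?" decision moves into virtual dispatch, and the size-only path contains a single allocation with no cleanup branch. Whether that actually makes the resulting IR easier for C2's escape analysis is an assumption that only a benchmark can confirm.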

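The accessor-style comparison in the thread (LWJGL-style fluent instance setters versus jextract-style static accessors that take a `MemorySegment` parameter) can be made concrete with a small FFM sketch, assuming Java 22 or later. The C struct `Point`, its layout constants, and every accessor name here are hypothetical. The two styles only loosely resemble what LWJGL and jextract generate and are not copies of either tool's output.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.StructLayout;
import java.lang.foreign.ValueLayout;

public class AccessorStyles {

    // Hypothetical C struct: struct Point { int x; int y; };
    static final StructLayout POINT = MemoryLayout.structLayout(
            ValueLayout.JAVA_INT.withName("x"),
            ValueLayout.JAVA_INT.withName("y"));
    static final long X_OFFSET = 0;
    static final long Y_OFFSET = 4;

    // Instance accessor with fluent setters (roughly LWJGL-like): a wrapper
    // object per struct, and each setter returns `this` for chaining.
    static final class Point {
        private final MemorySegment segment;

        Point(MemorySegment segment) { this.segment = segment; }

        Point x(int value) { segment.set(ValueLayout.JAVA_INT, X_OFFSET, value); return this; }
        Point y(int value) { segment.set(ValueLayout.JAVA_INT, Y_OFFSET, value); return this; }
        int x() { return segment.get(ValueLayout.JAVA_INT, X_OFFSET); }
        int y() { return segment.get(ValueLayout.JAVA_INT, Y_OFFSET); }
    }

    // Static accessors taking the segment as a parameter (roughly jextract-like):
    // no wrapper object, and the setters return void.
    static void x(MemorySegment point, int value) { point.set(ValueLayout.JAVA_INT, X_OFFSET, value); }
    static void y(MemorySegment point, int value) { point.set(ValueLayout.JAVA_INT, Y_OFFSET, value); }
    static int x(MemorySegment point) { return point.get(ValueLayout.JAVA_INT, X_OFFSET); }
    static int y(MemorySegment point) { return point.get(ValueLayout.JAVA_INT, Y_OFFSET); }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            // Fluent instance style.
            MemorySegment a = arena.allocate(POINT);
            new Point(a).x(1).y(2);

            // Static style.
            MemorySegment b = arena.allocate(POINT);
            x(b, 1);
            y(b, 2);

            System.out.println(new Point(a).x() + "," + y(b)); // prints 1,2
        }
    }
}
```

Both styles write the same bytes. One plausible reason for the size difference reported in the thread, offered as an assumption rather than something measured here, is that the instance style adds a wrapper object and a returned `this` that C2 has to inline and scalar-replace. The static style, by contrast, operates on the `MemorySegment` directly.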