On Fri, 24 Apr 2026 09:17:23 GMT, Maurizio Cimadamore <[email protected]> wrote:

>> Quan Anh Mai has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>>   add benchmark
>
> Thanks for the feedback.
>
>> No idea if this is related to escape analysis and/or other algorithms. It is
>> entirely impractical, usually needing 15+ JMH warmup iterations to reach
>> peak performance, especially with `MemorySegment::reinterpret`
>
> This specific issue is very likely related to the EA problem described here.
> I found similar pathological behavior when using either `reinterpret` or
> `asSlice` in the hot path with the incremental inlining changes described
> in this PR. EA simply can't keep up (for now) with such a large IR and
> basically ends up taking a very long time -- comparable to the overall
> benchmark execution.
>
> IMHO, this stresses the fact that some of this magic is not free. Some
> tricks, like doing an on-the-fly reinterpret, might work well in small
> synthetic benchmarks, but they almost always blow up in more realistic
> conditions. In the current state, I think the trick causes more problems
> than it solves, and, while it is an interesting stress test (e.g. to see how
> far we can push C2), I don't think we should specifically tune for it.
>
>> For FFM specifically, a refactoring away from VarHandles, at least for the
>> plain memory access methods available on `MemorySegment`, might be worth
>> exploring. I also hope `MemorySegment::reinterpret` could be simplified.
>> Ideas:
>>
>> Make the default implementations throw UOE, move the actual
>> implementations to `NativeMemorySegmentImpl`. This removes the need for the
>> `!isNative()` check.
>>
>> Inline `reinterpretInternal` into `reinterpret`. This eliminates the
>> `cleanupAction` conditional, which should help escape analysis. It also
>> eliminates the call completely for the `MemorySegment reinterpret(long
>> newSize)` overload (the most important for us).
>>
>> Uhm... `Reflection.ensureNativeAccess`. I get the importance, but I wish
>> this could be called once per module, instead of carrying its complexity in
>> every call-site.
>
> I agree with all the points above.
>
> Btw, one thing I found adds more cost than anticipated is the fluent style
> adopted by the LWJGL accessors. Fluent setters seem bigger than void
> setters. And instance accessors seem generally quite a bit bigger than
> static accessors which take a memory segment parameter (jextract style). So,
> all these little factors contribute to what we see in the end.

> Thank you, @mcimadamore. It is a known issue (from the beginning) that this
> part of EA is very expensive. Maybe we can rework this part to not be
> quadratic in the number of allocations.

@vnkozlov -- is there a JBS issue for this kind of problem that we can refer to? I don't mean an issue about the fact that the timeout check could be better -- I mean an issue about the fact that this part of EA doesn't scale with the number of IR nodes.

Thanks!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/30874#issuecomment-4312261268
