> This patch fixes a performance issue when dereferencing memory segments > backed by different kinds of scopes. When looking at inline traces, I found > that one of our benchmark, namely `LoopOverPollutedSegment` was already > hitting the ceiling of the bimorphic inline cache, specifically when checking > liveness of the segment scope in the memory access hotpath > (`ResourceScopeImpl::checkValidState`). The benchmark only used segments > backed by confined and global scope. I then added (in the initialization > "polluting" loop) segments backed by a shared scope, and then the benchmark > numbers started to look as follows: > > > Benchmark Mode Cnt Score > Error Units > LoopOverPollutedSegments.heap_segment_floats_VH avgt 30 7.004 ? > 0.089 ms/op > LoopOverPollutedSegments.heap_segment_floats_instance avgt 30 7.159 ? > 0.016 ms/op > LoopOverPollutedSegments.heap_segment_ints_VH avgt 30 7.017 ? > 0.110 ms/op > LoopOverPollutedSegments.heap_segment_ints_instance avgt 30 7.175 ? > 0.048 ms/op > LoopOverPollutedSegments.heap_unsafe avgt 30 0.243 ? > 0.004 ms/op > LoopOverPollutedSegments.native_segment_VH avgt 30 7.366 ? > 0.036 ms/op > LoopOverPollutedSegments.native_segment_instance avgt 30 7.305 ? > 0.098 ms/op > LoopOverPollutedSegments.native_unsafe avgt 30 0.238 ? > 0.002 ms/op > > > That is, since now we have *three* different kinds of scopes (confined, > shared and global), the call to the liveness check can no longer be inlined. > One solution could be, as we do for the *base* accessor, to add a scope > accessor to all memory segment implementation classes. But doing so only > works ok for heap segments (for which the scope accessor just returns the > global scope constants). For native segments, we're still megamorphic (as a > native segment can be backed by all kinds of scopes). > > In the end, it turned out to be much simpler to just make the liveness check > monomorphic, since there's so much sharing between the code paths already. > With that change, numbers of the tweaked benchmark go back to normal: > > > Benchmark Mode Cnt Score > Error Units > LoopOverPollutedSegments.heap_segment_floats_VH avgt 30 0.241 ? > 0.003 ms/op > LoopOverPollutedSegments.heap_segment_floats_instance avgt 30 0.244 ? > 0.003 ms/op > LoopOverPollutedSegments.heap_segment_ints_VH avgt 30 0.242 ? > 0.003 ms/op > LoopOverPollutedSegments.heap_segment_ints_instance avgt 30 0.248 ? > 0.001 ms/op > LoopOverPollutedSegments.heap_unsafe avgt 30 0.247 ? > 0.013 ms/op > LoopOverPollutedSegments.native_segment_VH avgt 30 0.245 ? > 0.004 ms/op > LoopOverPollutedSegments.native_segment_instance avgt 30 0.245 ? > 0.001 ms/op > LoopOverPollutedSegments.native_unsafe avgt 30 0.247 ? > 0.005 ms/op > > > Note that this patch tidies up a bit the usage of `checkValidState` vs. > `checkValidStateSlow`. The former should only really be used in the hot path, > while the latter is a more general routine which should be used in > non-performance critical code. Making `checkValidState` monomorphic caused > the `ScopeAccessError` to be generated in more places, so I needed to either > update the usage to use the safer `checkValidStateSlow` (where possible) or, > (in `Buffer` and `ConfinedScope`) just add extra wrapping.
Maurizio Cimadamore has updated the pull request incrementally with one additional commit since the last revision: Use owner field instead of accessor in checkValidStateSlow ------------- Changes: - all: https://git.openjdk.java.net/jdk18/pull/82/files - new: https://git.openjdk.java.net/jdk18/pull/82/files/c6082953..04a1e9f2 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk18&pr=82&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk18&pr=82&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk18/pull/82.diff Fetch: git fetch https://git.openjdk.java.net/jdk18 pull/82/head:pull/82 PR: https://git.openjdk.java.net/jdk18/pull/82