On Thu, 6 Jan 2022 12:36:52 GMT, Maurizio Cimadamore <mcimadam...@openjdk.org> wrote:
>> This patch fixes a performance issue when dereferencing memory segments >> backed by different kinds of scopes. When looking at inline traces, I found >> that one of our benchmark, namely `LoopOverPollutedSegment` was already >> hitting the ceiling of the bimorphic inline cache, specifically when >> checking liveness of the segment scope in the memory access hotpath >> (`ResourceScopeImpl::checkValidState`). The benchmark only used segments >> backed by confined and global scope. I then added (in the initialization >> "polluting" loop) segments backed by a shared scope, and then the benchmark >> numbers started to look as follows: >> >> >> Benchmark Mode Cnt Score >> Error Units >> LoopOverPollutedSegments.heap_segment_floats_VH avgt 30 7.004 ? >> 0.089 ms/op >> LoopOverPollutedSegments.heap_segment_floats_instance avgt 30 7.159 ? >> 0.016 ms/op >> LoopOverPollutedSegments.heap_segment_ints_VH avgt 30 7.017 ? >> 0.110 ms/op >> LoopOverPollutedSegments.heap_segment_ints_instance avgt 30 7.175 ? >> 0.048 ms/op >> LoopOverPollutedSegments.heap_unsafe avgt 30 0.243 ? >> 0.004 ms/op >> LoopOverPollutedSegments.native_segment_VH avgt 30 7.366 ? >> 0.036 ms/op >> LoopOverPollutedSegments.native_segment_instance avgt 30 7.305 ? >> 0.098 ms/op >> LoopOverPollutedSegments.native_unsafe avgt 30 0.238 ? >> 0.002 ms/op >> >> >> That is, since now we have *three* different kinds of scopes (confined, >> shared and global), the call to the liveness check can no longer be inlined. >> One solution could be, as we do for the *base* accessor, to add a scope >> accessor to all memory segment implementation classes. But doing so only >> works ok for heap segments (for which the scope accessor just returns the >> global scope constants). For native segments, we're still megamorphic (as a >> native segment can be backed by all kinds of scopes). >> >> In the end, it turned out to be much simpler to just make the liveness check >> monomorphic, since there's so much sharing between the code paths already. >> With that change, numbers of the tweaked benchmark go back to normal: >> >> >> Benchmark Mode Cnt Score >> Error Units >> LoopOverPollutedSegments.heap_segment_floats_VH avgt 30 0.241 ? >> 0.003 ms/op >> LoopOverPollutedSegments.heap_segment_floats_instance avgt 30 0.244 ? >> 0.003 ms/op >> LoopOverPollutedSegments.heap_segment_ints_VH avgt 30 0.242 ? >> 0.003 ms/op >> LoopOverPollutedSegments.heap_segment_ints_instance avgt 30 0.248 ? >> 0.001 ms/op >> LoopOverPollutedSegments.heap_unsafe avgt 30 0.247 ? >> 0.013 ms/op >> LoopOverPollutedSegments.native_segment_VH avgt 30 0.245 ? >> 0.004 ms/op >> LoopOverPollutedSegments.native_segment_instance avgt 30 0.245 ? >> 0.001 ms/op >> LoopOverPollutedSegments.native_unsafe avgt 30 0.247 ? >> 0.005 ms/op >> >> >> Note that this patch tidies up a bit the usage of `checkValidState` vs. >> `checkValidStateSlow`. The former should only really be used in the hot >> path, while the latter is a more general routine which should be used in >> non-performance critical code. Making `checkValidState` monomorphic caused >> the `ScopeAccessError` to be generated in more places, so I needed to either >> update the usage to use the safer `checkValidStateSlow` (where possible) or, >> (in `Buffer` and `ConfinedScope`) just add extra wrapping. > > Maurizio Cimadamore has updated the pull request incrementally with one > additional commit since the last revision: > > * Drop redundant owner field > * Reuse `state` variable for confined lock count Marked as reviewed by jvernee (Reviewer). ------------- PR: https://git.openjdk.java.net/jdk18/pull/82