On Wed, 5 Jan 2022 16:59:30 GMT, Maurizio Cimadamore <[email protected]>
wrote:
> This patch fixes a performance issue when dereferencing memory segments
> backed by different kinds of scopes. When looking at inline traces, I found
> that one of our benchmark, namely `LoopOverPollutedSegment` was already
> hitting the ceiling of the bimorphic inline cache, specifically when checking
> liveness of the segment scope in the memory access hotpath
> (`ResourceScopeImpl::checkValidState`). The benchmark only used segments
> backed by confined and global scope. I then added (in the initialization
> "polluting" loop) segments backed by a shared scope, and then the benchmark
> numbers started to look as follows:
>
>
> Benchmark Mode Cnt Score
> Error Units
> LoopOverPollutedSegments.heap_segment_floats_VH avgt 30 7.004 ?
> 0.089 ms/op
> LoopOverPollutedSegments.heap_segment_floats_instance avgt 30 7.159 ?
> 0.016 ms/op
> LoopOverPollutedSegments.heap_segment_ints_VH avgt 30 7.017 ?
> 0.110 ms/op
> LoopOverPollutedSegments.heap_segment_ints_instance avgt 30 7.175 ?
> 0.048 ms/op
> LoopOverPollutedSegments.heap_unsafe avgt 30 0.243 ?
> 0.004 ms/op
> LoopOverPollutedSegments.native_segment_VH avgt 30 7.366 ?
> 0.036 ms/op
> LoopOverPollutedSegments.native_segment_instance avgt 30 7.305 ?
> 0.098 ms/op
> LoopOverPollutedSegments.native_unsafe avgt 30 0.238 ?
> 0.002 ms/op
>
>
> That is, since now we have *three* different kinds of scopes (confined,
> shared and global), the call to the liveness check can no longer be inlined.
> One solution could be, as we do for the *base* accessor, to add a scope
> accessor to all memory segment implementation classes. But doing so only
> works ok for heap segments (for which the scope accessor just returns the
> global scope constants). For native segments, we're still megamorphic (as a
> native segment can be backed by all kinds of scopes).
>
> In the end, it turned out to be much simpler to just make the liveness check
> monomorphic, since there's so much sharing between the code paths already.
> With that change, numbers of the tweaked benchmark go back to normal:
>
>
> Benchmark Mode Cnt Score
> Error Units
> LoopOverPollutedSegments.heap_segment_floats_VH avgt 30 0.241 ?
> 0.003 ms/op
> LoopOverPollutedSegments.heap_segment_floats_instance avgt 30 0.244 ?
> 0.003 ms/op
> LoopOverPollutedSegments.heap_segment_ints_VH avgt 30 0.242 ?
> 0.003 ms/op
> LoopOverPollutedSegments.heap_segment_ints_instance avgt 30 0.248 ?
> 0.001 ms/op
> LoopOverPollutedSegments.heap_unsafe avgt 30 0.247 ?
> 0.013 ms/op
> LoopOverPollutedSegments.native_segment_VH avgt 30 0.245 ?
> 0.004 ms/op
> LoopOverPollutedSegments.native_segment_instance avgt 30 0.245 ?
> 0.001 ms/op
> LoopOverPollutedSegments.native_unsafe avgt 30 0.247 ?
> 0.005 ms/op
>
>
> Note that this patch tidies up a bit the usage of `checkValidState` vs.
> `checkValidStateSlow`. The former should only really be used in the hot path,
> while the latter is a more general routine which should be used in
> non-performance critical code. Making `checkValidState` monomorphic caused
> the `ScopeAccessError` to be generated in more places, so I needed to either
> update the usage to use the safer `checkValidStateSlow` (where possible) or,
> (in `Buffer` and `ConfinedScope`) just add extra wrapping.
src/jdk.incubator.foreign/share/classes/jdk/internal/foreign/ResourceScopeImpl.java
line 190:
> 188: @ForceInline
> 189: public final void checkValidState() {
> 190: if (owner != null && owner != Thread.currentThread()) {
For consistency we could change code `checkValidStateSlow` to refer directly to
`owner`.
It would be satisfying, but I don't know if it's possible, to compose
`checkValidStateSlow` from `checkValidState` e.g.
public final checkValidStateSlow() {
checkValidState();
if (!isAlive() { ... }
}
-------------
PR: https://git.openjdk.java.net/jdk18/pull/82