On Wed, 5 Jan 2022 17:23:40 GMT, Paul Sandoz <psan...@openjdk.org> wrote:

>> This patch fixes a performance issue when dereferencing memory segments 
>> backed by different kinds of scopes. When looking at inline traces, I found 
>> that one of our benchmark, namely `LoopOverPollutedSegment` was already 
>> hitting the ceiling of the bimorphic inline cache, specifically when 
>> checking liveness of the segment scope in the memory access hotpath 
>> (`ResourceScopeImpl::checkValidState`). The benchmark only used segments 
>> backed by confined and global scope. I then added (in the initialization 
>> "polluting" loop) segments backed by a shared scope, and then the benchmark 
>> numbers started to look as follows:
>> 
>> 
>> Benchmark                                              Mode  Cnt  Score   
>> Error  Units
>> LoopOverPollutedSegments.heap_segment_floats_VH        avgt   30  7.004 ? 
>> 0.089  ms/op
>> LoopOverPollutedSegments.heap_segment_floats_instance  avgt   30  7.159 ? 
>> 0.016  ms/op
>> LoopOverPollutedSegments.heap_segment_ints_VH          avgt   30  7.017 ? 
>> 0.110  ms/op
>> LoopOverPollutedSegments.heap_segment_ints_instance    avgt   30  7.175 ? 
>> 0.048  ms/op
>> LoopOverPollutedSegments.heap_unsafe                   avgt   30  0.243 ? 
>> 0.004  ms/op
>> LoopOverPollutedSegments.native_segment_VH             avgt   30  7.366 ? 
>> 0.036  ms/op
>> LoopOverPollutedSegments.native_segment_instance       avgt   30  7.305 ? 
>> 0.098  ms/op
>> LoopOverPollutedSegments.native_unsafe                 avgt   30  0.238 ? 
>> 0.002  ms/op
>> 
>> 
>> That is, since now we have *three* different kinds of scopes (confined, 
>> shared and global), the call to the liveness check can no longer be inlined. 
>> One solution could be, as we do for the *base* accessor, to add a scope 
>> accessor to all memory segment implementation classes. But doing so only 
>> works ok for heap segments (for which the scope accessor just returns the 
>> global scope constants). For native segments, we're still megamorphic (as a 
>> native segment can be backed by all kinds of scopes).
>> 
>> In the end, it turned out to be much simpler to just make the liveness check 
>> monomorphic, since there's so much sharing between the code paths already. 
>> With that change, numbers of the tweaked benchmark go back to normal:
>> 
>> 
>> Benchmark                                              Mode  Cnt  Score   
>> Error  Units
>> LoopOverPollutedSegments.heap_segment_floats_VH        avgt   30  0.241 ? 
>> 0.003  ms/op
>> LoopOverPollutedSegments.heap_segment_floats_instance  avgt   30  0.244 ? 
>> 0.003  ms/op
>> LoopOverPollutedSegments.heap_segment_ints_VH          avgt   30  0.242 ? 
>> 0.003  ms/op
>> LoopOverPollutedSegments.heap_segment_ints_instance    avgt   30  0.248 ? 
>> 0.001  ms/op
>> LoopOverPollutedSegments.heap_unsafe                   avgt   30  0.247 ? 
>> 0.013  ms/op
>> LoopOverPollutedSegments.native_segment_VH             avgt   30  0.245 ? 
>> 0.004  ms/op
>> LoopOverPollutedSegments.native_segment_instance       avgt   30  0.245 ? 
>> 0.001  ms/op
>> LoopOverPollutedSegments.native_unsafe                 avgt   30  0.247 ? 
>> 0.005  ms/op
>> 
>> 
>> Note that this patch tidies up a bit the usage of `checkValidState` vs. 
>> `checkValidStateSlow`. The former should only really be used in the hot 
>> path, while the latter is a more general routine which should be used in 
>> non-performance critical code. Making `checkValidState` monomorphic caused 
>> the `ScopeAccessError` to be generated in more places, so I needed to either 
>> update the usage to use the safer `checkValidStateSlow` (where possible) or, 
>> (in `Buffer` and `ConfinedScope`) just add extra wrapping.
>
> src/jdk.incubator.foreign/share/classes/jdk/internal/foreign/ResourceScopeImpl.java
>  line 190:
> 
>> 188:     @ForceInline
>> 189:     public final void checkValidState() {
>> 190:         if (owner != null && owner != Thread.currentThread()) {
> 
> For consistency we could change code `checkValidStateSlow` to refer directly 
> to `owner`.
> 
> It would be satisfying, but I don't know if it's possible, to compose 
> `checkValidStateSlow` from `checkValidState` e.g.
> 
> public final checkValidStateSlow() {
>     checkValidState();
>     if (!isAlive() { ... }
> }

I'll change use of `owner`. It's not really possible to write 
checkValidStateSlow in terms of checkValidState, because the latter does a 
plain read of the state, whereas the former does a volatile read. Reusing one 
from the other would result in two reads (a plain and a volatile).

-------------

PR: https://git.openjdk.java.net/jdk18/pull/82

Reply via email to