Going forward, converting older JDK code to the relatively new FFM API
requires that system calls which can report `errno` and the like explicitly
allocate a `MemorySegment` to capture potential error states. If not designed
carefully, this can hurt performance, and it also introduces unnecessary code
complexity.
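
For reference, the explicit-allocation pattern described above can be sketched with the public FFM API roughly as follows. This is only an illustration, not code from this PR: it assumes JDK 22 or later and a POSIX platform where `close` is visible through the default lookup, and the class and method names (`ErrnoCaptureSketch`, `closeErrno`) are made up for the example.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.StructLayout;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.VarHandle;

// Hypothetical example class; not part of the PR.
final class ErrnoCaptureSketch {
    private static final Linker LINKER = Linker.nativeLinker();

    // Layout of the capture-state segment and an accessor for its "errno" field.
    private static final StructLayout CAPTURE_LAYOUT = Linker.Option.captureStateLayout();
    private static final VarHandle ERRNO =
            CAPTURE_LAYOUT.varHandle(MemoryLayout.PathElement.groupElement("errno"));

    // int close(int fd); assumed to be resolvable via the default lookup (POSIX).
    // captureCallState adds a leading MemorySegment parameter to the handle.
    private static final MethodHandle CLOSE = LINKER.downcallHandle(
            LINKER.defaultLookup().find("close").orElseThrow(),
            FunctionDescriptor.of(ValueLayout.JAVA_INT, ValueLayout.JAVA_INT),
            Linker.Option.captureCallState("errno"));

    // Returns 0 on success, otherwise the errno value captured for the call.
    static int closeErrno(int fd) {
        // A fresh confined arena and segment are created on every invocation.
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment state = arena.allocate(CAPTURE_LAYOUT);
            int result = (int) CLOSE.invokeExact(state, fd);
            return result == -1 ? (int) ERRNO.get(state, 0L) : 0;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

Calling `ErrnoCaptureSketch.closeErrno(-1)` should report a non-zero `errno`, since `-1` is never a valid file descriptor. Every caller pays for an arena and a segment per call, which is the cost this PR is concerned with.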
Hence, this PR proposes to add a JDK-internal method handle adapter that can be
used to handle system calls with `errno`, `GetLastError`, and `WSAGetLastError`.
It relies on an efficient carrier-thread-local cache of memory regions to
elide allocations.
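
To illustrate the general idea of eliding the per-call allocation, here is a rough sketch that reuses one capture-state segment per thread. It deliberately swaps in a plain `ThreadLocal` and the public FFM API for the JDK-internal carrier-thread-local cache and adapter proposed here, so it does not reflect the PR's actual implementation. It assumes JDK 22 or later and a POSIX `close`. The names (`ReusedCaptureSketch`, `close`) and the convention of returning the negated `errno` on failure are choices made for this example only.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.StructLayout;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.VarHandle;

// Hypothetical example class; not the adapter added by the PR.
final class ReusedCaptureSketch {
    private static final Linker LINKER = Linker.nativeLinker();
    private static final StructLayout CAPTURE_LAYOUT = Linker.Option.captureStateLayout();
    private static final VarHandle ERRNO =
            CAPTURE_LAYOUT.varHandle(MemoryLayout.PathElement.groupElement("errno"));

    // int close(int fd); assumed to be resolvable via the default lookup (POSIX).
    private static final MethodHandle CLOSE = LINKER.downcallHandle(
            LINKER.defaultLookup().find("close").orElseThrow(),
            FunctionDescriptor.of(ValueLayout.JAVA_INT, ValueLayout.JAVA_INT),
            Linker.Option.captureCallState("errno"));

    // One capture-state segment per thread, allocated lazily and then reused.
    // Assumption: a plain ThreadLocal stands in for the carrier-thread-local cache.
    private static final ThreadLocal<MemorySegment> STATE =
            ThreadLocal.withInitial(() -> Arena.ofAuto().allocate(CAPTURE_LAYOUT));

    // Returns the result of close(fd) on success, or -errno on failure
    // (a convention chosen for this sketch).
    static int close(int fd) {
        MemorySegment state = STATE.get(); // no allocation after the first call
        try {
            int result = (int) CLOSE.invokeExact(state, fd);
            if (result != -1) {
                return result;
            }
            int errno = (int) ERRNO.get(state, 0L);
            return -errno;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

The caller no longer sees a `MemorySegment` at all. With a plain `ThreadLocal`, every virtual thread would get its own segment, which is one reason a carrier-thread-local cache can be preferable.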
Here are some benchmarks run on virtual threads (`OfVirtual`) and on a
platform thread, respectively:

    Benchmark                                                  Mode  Cnt   Score   Error  Units
    CaptureStateUtilBench.OfVirtual.adaptedSysCallFail         avgt   30  24.193 ± 0.268  ns/op
    CaptureStateUtilBench.OfVirtual.adaptedSysCallSuccess      avgt   30   8.268 ± 0.080  ns/op
    CaptureStateUtilBench.OfVirtual.explicitAllocationFail     avgt   30  42.076 ± 1.003  ns/op
    CaptureStateUtilBench.OfVirtual.explicitAllocationSuccess  avgt   30  21.801 ± 0.138  ns/op
    CaptureStateUtilBench.OfVirtual.tlAllocationFail           avgt   30  23.265 ± 0.087  ns/op
    CaptureStateUtilBench.OfVirtual.tlAllocationSuccess        avgt   30   8.285 ± 0.155  ns/op
    CaptureStateUtilBench.adaptedSysCallFail                   avgt   30  23.033 ± 0.423  ns/op
    CaptureStateUtilBench.adaptedSysCallSuccess                avgt   30   3.676 ± 0.104  ns/op  // <- Happy path using an internal pool
    CaptureStateUtilBench.explicitAllocationFail               avgt   30  42.023 ± 0.736  ns/op
    CaptureStateUtilBench.explicitAllocationSuccess            avgt   30  22.013 ± 0.648  ns/op  // <- Allocating memory upon each invocation
    CaptureStateUtilBench.tlAllocationFail                     avgt   30  22.050 ± 0.233  ns/op
    CaptureStateUtilBench.tlAllocationSuccess                  avgt   30   3.756 ± 0.056  ns/op  // <- Using the pool explicitly from Java code

Adapted system call:

    return (int) ADAPTED_HANDLE.invoke(0, 0); // Uses a MH-internal pool

Explicit allocation:

    try (var arena = Arena.ofConfined()) {
        return (int) HANDLE.invoke(arena.allocate(4), 0, 0);
    }

Thread Local allocation:

    try (var arena = POOLS.take()) {
        return (int) HANDLE.invoke(arena.allocate(4), 0, 0); // Uses a manually specified pool
    }

The adapted system call exhibits a ~6x performance improvement over the
existing "explicit allocation" scheme for the happy path on platform threads.
Because virtual-thread-capable carrier threads require sharing across threads,
the virtual-thread case is a bit slower ("only" ~2.5x faster).
Tested and passed tiers 1-3.
-------------
Commit messages:
- Bump copyright year
- Add benchmarks
- Add method handle adapter for system calls
Changes: https://git.openjdk.org/jdk/pull/23517/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23517&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8347408
Stats: 1381 lines in 11 files changed: 1370 ins; 2 del; 9 mod
Patch: https://git.openjdk.org/jdk/pull/23517.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/23517/head:pull/23517
PR: https://git.openjdk.org/jdk/pull/23517