On Wed, 15 Jan 2025 16:09:36 GMT, Per Minborg <[email protected]> wrote:
>> Going forward, converting older JDK code to use the relatively new FFM API
>> requires callers of system calls that can set `errno` and the like to
>> explicitly allocate a MemorySegment to capture potential error states. This
>> can hurt performance if not designed carefully, and it also introduces
>> unnecessary code complexity.
>>
>> Hence, this PR proposes to add a _JDK internal_ method handle adapter that
>> can be used to handle system calls with `errno`, `GetLastError`, and
>> `WSAGetLastError`.
>>
>> It currently relies on a thread-local cache of MemorySegments to elide
>> allocations. If, in the future, a more efficient thread-associated
>> allocation scheme becomes available, we could easily migrate to that one.
>>
>> Here are some benchmarks:
>>
>>
>> Benchmark                                        Mode  Cnt   Score    Error  Units
>> CaptureStateUtilBench.explicitAllocationFail     avgt   30  41.615 ± 1.203  ns/op
>> CaptureStateUtilBench.explicitAllocationSuccess  avgt   30  23.094 ± 0.580  ns/op
>> CaptureStateUtilBench.threadLocalFail            avgt   30  14.760 ± 0.078  ns/op
>> CaptureStateUtilBench.threadLocalReuseSuccess    avgt   30   7.189 ± 0.151  ns/op
>>
>>
>> Explicit allocation:
>>
>>     try (var arena = Arena.ofConfined()) {
>>         return (int) HANDLE.invoke(arena.allocate(4), 0, 0);
>>     }
>>
>>
>> Thread Local (tl):
>>
>>     return (int) ADAPTED_HANDLE.invoke(0, 0);
>>
>>
>> The graph below shows the difference in latency for a successful call:
>>
>> 
>>
>> This is a ~3x improvement for both the happy and the error path.
>>
>>
>> Tested and passed tiers 1-3.
>
> Per Minborg has updated the pull request incrementally with two additional
> commits since the last revision:
>
> - Use invokeExact semantics in the tests
> - Clean up
src/java.base/share/classes/jdk/internal/foreign/CaptureStateUtil.java line 282:
> 280: * use in the boostrap sequence.
> 281: */
> 282: private static final class SegmentCache {
This abstraction seems very useful, and... it also strikes me as generalizable?
It's effectively a one-element cache, where there's some logic to initialize
the cached element (which could be provided by a lambda). Then it's using a
platform local under the hood and only using the cached element when it makes
sense to do so (e.g. when there has not been a virtual thread switcharoo :-) ).
In "unsafe" cases, we just compute the element using the user-provided lambda
instead of using the cache. Am I dreaming?
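
To make the idea concrete, here is a rough sketch of what such a generalized one-element cache might look like. Everything in it is hypothetical: `OneElementCache`, `acquire` and `release` are illustrative names, not anything in the PR or the JDK. It also uses a plain `ThreadLocal` and treats "the cached element is already handed out" as the only unsafe case, whereas the real `SegmentCache` has to worry about virtual threads and carrier threads, which this sketch does not model.

```java
import java.util.function.Supplier;

// Hypothetical sketch only: not the SegmentCache from the PR and not a JDK API.
final class OneElementCache<T> {

    // Per-thread holder for the single cached element and its "handed out" flag.
    private static final class Slot<T> {
        T element;
        boolean inUse;
    }

    private final Supplier<? extends T> factory;
    private final ThreadLocal<Slot<T>> slot = ThreadLocal.withInitial(Slot::new);

    OneElementCache(Supplier<? extends T> factory) {
        this.factory = factory;
    }

    // Returns the cached element when it is free; otherwise computes a fresh,
    // uncached element with the user-provided lambda.
    T acquire() {
        Slot<T> s = slot.get();
        if (s.inUse) {
            return factory.get();        // "unsafe" case: bypass the cache
        }
        if (s.element == null) {
            s.element = factory.get();   // lazily initialize the cached element
        }
        s.inUse = true;
        return s.element;
    }

    // Makes the cached element available again; fresh elements are simply dropped.
    void release(T element) {
        Slot<T> s = slot.get();
        if (s.element == element) {
            s.inUse = false;
        }
    }
}
```

A caller would pair `acquire()` with `release()` in a try/finally, so that a nested call on the same thread falls back to the lambda instead of sharing the cached element.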
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/22391#discussion_r1916963768