On Fri, 17 Apr 2026 11:31:39 GMT, Paul Hübner <[email protected]> wrote:
>> Hi all,
>>
>> The Java FF&M API includes functionality to both initialize and read from
>> thread-local data prior to and immediately after downcalls, respectively,
>> through the `Linker.Option::captureCallState` API. This is useful, for
>> example, when setting or capturing `errno` while interfacing with C
>> functions. However, using this linker option introduces some invocation
>> overhead at runtime.
>>
>> This RFE introduces a JMH microbenchmark that quantifies this overhead. A
>> simple downcall to `strtol` is measured with and without call state
>> capturing.
>>
>> Testing: GHA for sanity testing.
>>
>> # Benchmarking Results
>>
>> I have executed this benchmark on Oracle-supported platforms; the results
>> are below. For each platform, I ran two trials on different JDKs: the
>> current HEAD ("current", based on
>> 8357de88aa35ee998fefe321ee6dae9eb4993fa6), and the same HEAD with
>> [JDK-8378559](https://bugs.openjdk.org/browse/JDK-8378559) reverted
>> ("legacy", a revert commit on top of
>> 8357de88aa35ee998fefe321ee6dae9eb4993fa6). This one-off comparison is
>> informative because [JDK-8378559](https://bugs.openjdk.org/browse/JDK-8378559)
>> increased the overhead of downcalls that use state capturing by introducing
>> thread-local data initialization, and the performance impact of that change
>> had not been quantified before.
>>
>> ## Linux x64
>>
>> **Current:**
>>
>> Benchmark                                               Mode  Cnt   Score   Error  Units
>> CaptureCallStateOverheadBench.doNotUseCaptureCallState  avgt   30  38.442 ± 0.016  ns/op
>> CaptureCallStateOverheadBench.useCaptureCallState       avgt   30  45.425 ± 1.826  ns/op
>>
>>
>> **Legacy:**
>>
>> Benchmark                                               Mode  Cnt   Score   Error  Units
>> CaptureCallStateOverheadBench.doNotUseCaptureCallState  avgt   30  39.224 ± 0.789  ns/op
>> CaptureCallStateOverheadBench.useCaptureCallState       avgt   30  41.011 ± 0.058  ns/op
>>
>>
>> ## Linux AArch64
>>
>> **Current:**
>>
>> Benchmark                                               Mode  Cnt   Score   Error  Units
>> CaptureCallStateOverheadBench.doNotUseCaptureCallState  avgt   30  45.396 ± 0.185  ns/op
>> CaptureCallStateOverheadBench.useCaptureCallState       avgt   30  56.116 ± 0.463  ns/op
>>
>>
>> **Legacy:**
>>
>> Benchmark                                               Mode  Cnt   Score   Error  Units
>> CaptureCallStateOverheadBench.doNotUseCaptureCallState  avgt   30  44.859 ± 0.183  ns/op
>> CaptureCallStateOverheadBench.useCaptureCallS...
>
> Paul Hübner has updated the pull request with a new target base due to a
> merge or a rebase. The incremental webrev excludes the unrelated changes
> brought in by the merge/rebase. The pull request contains five additional
> commits since the last revision:
>
> - Remove extraneous export.
> - Merge branch 'master' into JDK-8379630
> - Different long sizes on different platforms.
> - Use C standard library for the benchmark instead.
> - Benchmark to measure call state capturing overhead.
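For readers unfamiliar with the option, here is a minimal sketch of the two kinds of `strtol` downcall being compared. It is not the benchmark from this PR: the class `StrtolCapture` and its methods are hypothetical, and it assumes JDK 22 or later and a C library whose `strtol` is reachable through `Linker.defaultLookup()`. The capturing handle takes an extra leading `MemorySegment`, laid out by `Linker.Option.captureStateLayout()`, from which `errno` is read after the call. The return layout for C `long` is taken from `Linker.canonicalLayouts()` because `long` is 32 bits on Windows and 64 bits on Linux and macOS.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.StructLayout;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

// Hypothetical illustration; not the CaptureCallStateOverheadBench code.
public class StrtolCapture {
    private static final Linker LINKER = Linker.nativeLinker();
    // C `long` differs in size across platforms, so ask the linker for it.
    private static final MemoryLayout C_LONG = LINKER.canonicalLayouts().get("long");
    // long strtol(const char *nptr, char **endptr, int base)
    private static final FunctionDescriptor STRTOL_DESC = FunctionDescriptor.of(
            C_LONG, ValueLayout.ADDRESS, ValueLayout.ADDRESS, ValueLayout.JAVA_INT);
    private static final MemorySegment STRTOL_ADDR =
            LINKER.defaultLookup().find("strtol").orElseThrow();

    // Plain downcall: (MemorySegment nptr, MemorySegment endptr, int base)
    private static final MethodHandle PLAIN = LINKER.downcallHandle(STRTOL_ADDR, STRTOL_DESC);

    // Capturing downcall: an extra leading MemorySegment receives the call state.
    private static final MethodHandle CAPTURING = LINKER.downcallHandle(
            STRTOL_ADDR, STRTOL_DESC, Linker.Option.captureCallState("errno"));
    private static final StructLayout STATE_LAYOUT = Linker.Option.captureStateLayout();
    private static final long ERRNO_OFFSET =
            STATE_LAYOUT.byteOffset(MemoryLayout.PathElement.groupElement("errno"));

    /** Parses a base-10 string through the plain handle. */
    public static long parse(String text) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment str = arena.allocateFrom(text); // NUL-terminated copy
            // invoke (not invokeExact) widens an int return to long where C long is 32-bit.
            return (long) PLAIN.invoke(str, MemorySegment.NULL, 10);
        } catch (Throwable t) {
            throw new RuntimeException("strtol downcall failed", t);
        }
    }

    /** Parses through the capturing handle and returns {result, errno}. */
    public static long[] parseWithErrno(String text) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment str = arena.allocateFrom(text);
            MemorySegment state = arena.allocate(STATE_LAYOUT); // zero-filled
            long result = (long) CAPTURING.invoke(state, str, MemorySegment.NULL, 10);
            int errno = state.get(ValueLayout.JAVA_INT, ERRNO_OFFSET);
            return new long[] { result, errno };
        } catch (Throwable t) {
            throw new RuntimeException("strtol downcall failed", t);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("12345"));
        long[] r = parseWithErrno("99999999999999999999999"); // overflows C long
        System.out.println("result=" + r[0] + " errno=" + r[1]);
    }
}
```

Running it with `java --enable-native-access=ALL-UNNAMED StrtolCapture.java` avoids the restricted-method warning. The overflowing input makes `strtol` return `LONG_MAX` and set `errno` to `ERANGE`, which the capturing handle makes visible to Java without any racy follow-up native call.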
Marked as reviewed by liach (Reviewer).
-------------
PR Review: https://git.openjdk.org/jdk/pull/30719#pullrequestreview-4142523464
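The overhead discussed in the PR description can be read off the Linux x64 tables as plain differences between the two benchmark methods. A hypothetical helper, `CaptureOverhead`, doing that arithmetic with the quoted averages:

```java
import java.util.Locale;

// Hypothetical helper: derives capture overhead from the quoted Linux x64 JMH averages.
public class CaptureOverhead {
    /** Average extra cost (ns/op) of the capturing downcall over the plain one. */
    static double overhead(double useCapture, double doNotUseCapture) {
        return useCapture - doNotUseCapture;
    }

    public static void main(String[] args) {
        // Scores copied from the Linux x64 "Current" and "Legacy" results.
        double current = overhead(45.425, 38.442);
        double legacy = overhead(41.011, 39.224);
        System.out.println(String.format(Locale.ROOT, "current overhead: %.3f ns/op", current));
        System.out.println(String.format(Locale.ROOT, "legacy overhead:  %.3f ns/op", legacy));
        System.out.println(String.format(Locale.ROOT, "current - legacy: %.3f ns/op", current - legacy));
    }
}
```

It prints 6.983, 1.787 and 5.196 ns/op respectively. These are point estimates only: the current `useCaptureCallState` score carries a ±1.826 ns/op error, and the `doNotUseCaptureCallState` scores differ by about 0.8 ns/op between the two JDKs, so the roughly 5.2 ns/op gap attributable to JDK-8378559 should be read as approximate.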