On Fri, 17 Apr 2026 11:31:39 GMT, Paul Hübner <[email protected]> wrote:

>> Hi all,
>> 
>> The Java FF&M API includes functionality to both initialize and read from 
>> thread-local data prior to and immediately after downcalls, respectively, 
>> through the `Linker.Option::captureCallState` API. This is useful, as an 
>> example, when setting or capturing `errno` when interfacing with C 
>> functions. However, using this linker option introduces some invocation 
>> overhead at runtime.
>> 
>> This RFE introduces a JMH microbenchmark which quantifies this overhead. A 
>> simple downcall to `strtol` is measured with and without call state 
>> capturing. 
>> 
>> Testing: GHA for sanity testing.
>> 
>> # Benchmarking Results
>> 
>> I have executed this benchmark on Oracle-supported platforms, the results 
>> can be found below. For each platform, I've done two trials corresponding to 
>> different JDKs: one being the current HEAD ("current", based on 
>> 8357de88aa35ee998fefe321ee6dae9eb4993fa6), and one with 
>> [JDK-8378559](https://bugs.openjdk.org/browse/JDK-8378559) reverted 
>> ("legacy", revert commit on top of 
>> 8357de88aa35ee998fefe321ee6dae9eb4993fa6). This one-off experiment is 
>> insightful since [JDK-8378559](https://bugs.openjdk.org/browse/JDK-8378559) 
>> increased the overhead of downcalls using state capturing by introducing 
>> thread-local data initialization. The performance impact of this change was 
>> previously unquantified.
>> 
>> ## Linux x64
>> 
>> **Current:**
>> 
>> Benchmark                                               Mode  Cnt   Score   Error  Units
>> CaptureCallStateOverheadBench.doNotUseCaptureCallState  avgt   30  38.442 ± 0.016  ns/op
>> CaptureCallStateOverheadBench.useCaptureCallState       avgt   30  45.425 ± 1.826  ns/op
>> 
>> 
>> **Legacy:**
>> 
>> Benchmark                                               Mode  Cnt   Score   Error  Units
>> CaptureCallStateOverheadBench.doNotUseCaptureCallState  avgt   30  39.224 ± 0.789  ns/op
>> CaptureCallStateOverheadBench.useCaptureCallState       avgt   30  41.011 ± 0.058  ns/op
>> 
>> 
>> ## Linux AArch64
>> 
>> **Current:**
>> 
>> Benchmark                                               Mode  Cnt   Score   Error  Units
>> CaptureCallStateOverheadBench.doNotUseCaptureCallState  avgt   30  45.396 ± 0.185  ns/op
>> CaptureCallStateOverheadBench.useCaptureCallState       avgt   30  56.116 ± 0.463  ns/op
>> 
>> 
>> **Legacy:**
>> 
>> Benchmark                                               Mode  Cnt   Score   Error  Units
>> CaptureCallStateOverheadBench.doNotUseCaptureCallState  avgt   30  44.859 ± 0.183  ns/op
>> CaptureCallStateOverheadBench.useCaptureCallS...
>
> Paul Hübner has updated the pull request with a new target base due to a 
> merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains five additional 
> commits since the last revision:
> 
>  - Remove extraneous export.
>  - Merge branch 'master' into JDK-8379630
>  - Different long sizes on different platforms.
>  - Use C standard library for the benchmark instead.
>  - Benchmark to measure call state capturing overhead.

Marked as reviewed by liach (Reviewer).
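
For readers less familiar with the option being measured, here is a minimal sketch of the two call shapes the description compares: a plain `strtol` downcall and one linked with `Linker.Option.captureCallState("errno")`. This is not the benchmark from the PR. The class and method names are hypothetical, and it assumes JDK 22 or later (finalized FFM API) and that `strtol` is reachable through the linker's default lookup.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.StructLayout;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.VarHandle;

// Hypothetical illustration class; not part of the PR under review.
public class CaptureCallStateSketch {
    private static final Linker LINKER = Linker.nativeLinker();

    // C `long` differs in size across platforms, so ask the linker for its layout.
    private static final MemoryLayout C_LONG = LINKER.canonicalLayouts().get("long");

    // long strtol(const char *str, char **endptr, int base)
    private static final FunctionDescriptor STRTOL_DESC = FunctionDescriptor.of(
            C_LONG, ValueLayout.ADDRESS, ValueLayout.ADDRESS, ValueLayout.JAVA_INT);

    // Assumption: strtol is visible in the default lookup on this platform.
    private static final MemorySegment STRTOL =
            LINKER.defaultLookup().find("strtol").orElseThrow();

    // Downcall handle without call state capturing.
    private static final MethodHandle PLAIN = LINKER.downcallHandle(STRTOL, STRTOL_DESC);

    // Same function, but the handle takes a leading capture-state segment.
    private static final MethodHandle CAPTURING = LINKER.downcallHandle(
            STRTOL, STRTOL_DESC, Linker.Option.captureCallState("errno"));

    private static final StructLayout CAPTURE_LAYOUT = Linker.Option.captureStateLayout();
    private static final VarHandle ERRNO =
            CAPTURE_LAYOUT.varHandle(MemoryLayout.PathElement.groupElement("errno"));

    /** Parses base-10 text with a plain downcall. */
    public static long parse(String text) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment str = arena.allocateFrom(text);
            // invoke (not invokeExact) so the 32- or 64-bit result is boxed for us.
            Object result = PLAIN.invoke(str, MemorySegment.NULL, 10);
            return ((Number) result).longValue();
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    /** Parses base-10 text and returns { parsed value, captured errno }. */
    public static long[] parseCapturingErrno(String text) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment str = arena.allocateFrom(text);
            // Zero-initialized segment that receives the captured state.
            MemorySegment state = arena.allocate(CAPTURE_LAYOUT);
            Object result = CAPTURING.invoke(state, str, MemorySegment.NULL, 10);
            int errno = (int) ERRNO.get(state, 0L);
            return new long[] { ((Number) result).longValue(), errno };
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println("plain:     " + parse("42"));
        long[] captured = parseCapturingErrno("42");
        System.out.println("capturing: " + captured[0] + ", errno=" + captured[1]);
    }
}
```

The only difference at the call site is the extra leading `MemorySegment` argument. The cost of filling and reading that segment around the native call is presumably what the `useCaptureCallState` numbers above reflect.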

-------------

PR Review: https://git.openjdk.org/jdk/pull/30719#pullrequestreview-4142523464
