On Thu, 23 Apr 2026 16:24:53 GMT, Paul Hübner <[email protected]> wrote:

>> Hi all,
>> 
>> The Java FF&M API includes functionality to both initialize and read from 
>> thread-local data prior to and immediately after downcalls, respectively, 
>> through the `Linker.Option::captureCallState` API. This is useful, as an 
>> example, when setting or capturing `errno` when interfacing with C 
>> functions. However, using this linker option introduces some invocation 
>> overhead at runtime.
>> 
>> This RFE introduces a JMH microbenchmark which quantifies this overhead. A 
>> simple downcall to `strtol` is measured with and without call state 
>> capturing. 
>> 
>> Testing: GHA for sanity testing.
>> 
>> # Benchmarking Results
>> 
>> ⚠️ **These results apply prior to 
>> [28506ca](https://github.com/openjdk/jdk/pull/30719/commits/28506ca7a8f03cddb0903e4f58b4f38742f15606)!**
>> 
>> I have executed this benchmark on Oracle-supported platforms, the results 
>> can be found below. For each platform, I've done two trials corresponding to 
>> different JDKs: one being the current HEAD ("current", based on 
>> 8357de88aa35ee998fefe321ee6dae9eb4993fa6), and one with 
>> [JDK-8378559](https://bugs.openjdk.org/browse/JDK-8378559) reverted 
>> ("legacy", revert commit on top of 
>> 8357de88aa35ee998fefe321ee6dae9eb4993fa6). This one-off experiment is 
>> insightful since [JDK-8378559](https://bugs.openjdk.org/browse/JDK-8378559) 
>> increased the overhead of downcalls using state capturing by introducing 
>> thread-local data initialization. The performance impact of this change was 
>> previously unquantified.
>> 
>> ## Linux x64
>> 
>> **Current:**
>> 
>> Benchmark                                               Mode  Cnt   Score   Error  Units
>> CaptureCallStateOverheadBench.doNotUseCaptureCallState  avgt   30  38.442 ± 0.016  ns/op
>> CaptureCallStateOverheadBench.useCaptureCallState       avgt   30  45.425 ± 1.826  ns/op
>> 
>> 
>> **Legacy:**
>> 
>> Benchmark                                               Mode  Cnt   Score   Error  Units
>> CaptureCallStateOverheadBench.doNotUseCaptureCallState  avgt   30  39.224 ± 0.789  ns/op
>> CaptureCallStateOverheadBench.useCaptureCallState       avgt   30  41.011 ± 0.058  ns/op
>> 
>> 
>> ## Linux AArch64
>> 
>> **Current:**
>> 
>> Benchmark                                               Mode  Cnt   Score   Error  Units
>> CaptureCallStateOverheadBench.doNotUseCaptureCallState  avgt   30  45.396 ± 0.185  ns/op
>> CaptureCallStateOverheadBench.useCaptureCallState       avgt   30  56.116 ± 0.463  ns/op
>> 
>> 
>> **Legacy:**
>> 
>> Benchmark                                               Mode  Cnt   Score   
>> Error  ...
>
> Paul Hübner has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Reviewer feedback.

@JornVernee thanks for your feedback, I've ported the benchmark to `strtoll` 
and made it more descriptive. @liach @minborg does this change lead to any 
follow-up questions/concerns from you?

I've not been able to re-run all the benchmarks (I'm missing `legacy` 
measurements for `strtoll`) due to hardware scarcity. My questions: 
* Is it mission-critical that we get the measurements for `strtoll` or are the 
`strtol` ones sufficient? 
* If the `strtoll` measurements are needed, would it be okay to integrate this 
now and have me follow up by publishing the results to the Panama mailing list?
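
For context, here is a minimal sketch of the kind of downcall being compared, assuming JDK 22 or later (where the FFM API is final). It is not the benchmark code from this PR: the class name `CaptureCallStateSketch` and the method `parse` are hypothetical and only illustrate linking `strtoll` with and without `Linker.Option.captureCallState("errno")`.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.StructLayout;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.VarHandle;

// Hypothetical illustration class; not part of the PR.
public class CaptureCallStateSketch {

    // long long strtoll(const char *str, char **endptr, int base);
    private static final FunctionDescriptor STRTOLL = FunctionDescriptor.of(
            ValueLayout.JAVA_LONG, ValueLayout.ADDRESS, ValueLayout.ADDRESS, ValueLayout.JAVA_INT);

    // Returns { parsed value, errno }. errno is only read when capture == true;
    // otherwise the second element is reported as 0.
    static long[] parse(String text, boolean capture) throws Throwable {
        Linker linker = Linker.nativeLinker();
        MemorySegment strtoll = linker.defaultLookup().find("strtoll").orElseThrow();

        try (Arena arena = Arena.ofConfined()) {
            MemorySegment str = arena.allocateFrom(text); // NUL-terminated C string

            if (!capture) {
                // Plain downcall: no extra leading parameter.
                MethodHandle plain = linker.downcallHandle(strtoll, STRTOLL);
                long value = (long) plain.invokeExact(str, MemorySegment.NULL, 10);
                return new long[] { value, 0L };
            }

            // Capturing downcall: the handle takes a capture-state segment
            // as an additional first argument.
            StructLayout stateLayout = Linker.Option.captureStateLayout();
            VarHandle errnoHandle = stateLayout.varHandle(
                    MemoryLayout.PathElement.groupElement("errno"));
            MethodHandle capturing = linker.downcallHandle(
                    strtoll, STRTOLL, Linker.Option.captureCallState("errno"));

            MemorySegment state = arena.allocate(stateLayout); // zero-initialized
            long value = (long) capturing.invokeExact(state, str, MemorySegment.NULL, 10);
            int errno = (int) errnoHandle.get(state, 0L);
            return new long[] { value, errno };
        }
    }

    public static void main(String[] args) throws Throwable {
        long[] ok = parse("42", false);
        long[] overflow = parse("99999999999999999999999", true);
        System.out.println("plain value    = " + ok[0]);
        System.out.println("overflow value = " + overflow[0] + ", errno = " + overflow[1]);
    }
}
```

The overflowing input is used only because it makes `strtoll` set `errno`, so the captured value is observably non-zero. The benchmark itself measures call overhead rather than the parsed result.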

-------------

PR Comment: https://git.openjdk.org/jdk/pull/30719#issuecomment-4306073215
