On Thu, 23 Apr 2026 16:24:53 GMT, Paul Hübner <[email protected]> wrote:
>> Hi all,
>>
>> The Java FF&M API includes functionality to both initialize and read from
>> thread-local data prior to and immediately after downcalls, respectively,
>> through the `Linker.Option::captureCallState` API. This is useful, as an
>> example, when setting or capturing `errno` when interfacing with C
>> functions. However, using this linker option introduces some invocation
>> overhead at runtime.
>>
>> This RFE introduces a JMH microbenchmark which quantifies this overhead. A
>> simple downcall to `strtol` is measured with and without call state
>> capturing.
>>
>> Testing: GHA for sanity testing.
>>
>> # Benchmarking Results
>>
>> ⚠️ **These results apply prior to
>> [28506ca](https://github.com/openjdk/jdk/pull/30719/commits/28506ca7a8f03cddb0903e4f58b4f38742f15606)!**
>>
>> I have executed this benchmark on Oracle-supported platforms, the results
>> can be found below. For each platform, I've done two trials corresponding to
>> different JDKs: one being the current HEAD ("current", based on
>> 8357de88aa35ee998fefe321ee6dae9eb4993fa6), and one with
>> [JDK-8378559](https://bugs.openjdk.org/browse/JDK-8378559) reverted
>> ("legacy", revert commit on top of
>> 8357de88aa35ee998fefe321ee6dae9eb4993fa6). This one-off experiment is
>> insightful since [JDK-8378559](https://bugs.openjdk.org/browse/JDK-8378559)
>> increased the overhead of downcalls using state capturing by introducing
>> thread-local data initialization. The performance impact of this change was
>> previously unquantified.
>>
>> ## Linux x64
>>
>> **Current:**
>>
>> Benchmark                                               Mode  Cnt   Score   Error  Units
>> CaptureCallStateOverheadBench.doNotUseCaptureCallState  avgt   30  38.442 ± 0.016  ns/op
>> CaptureCallStateOverheadBench.useCaptureCallState       avgt   30  45.425 ± 1.826  ns/op
>>
>>
>> **Legacy:**
>>
>> Benchmark                                               Mode  Cnt   Score   Error  Units
>> CaptureCallStateOverheadBench.doNotUseCaptureCallState  avgt   30  39.224 ± 0.789  ns/op
>> CaptureCallStateOverheadBench.useCaptureCallState       avgt   30  41.011 ± 0.058  ns/op
>>
>>
>> ## Linux AArch64
>>
>> **Current:**
>>
>> Benchmark                                               Mode  Cnt   Score   Error  Units
>> CaptureCallStateOverheadBench.doNotUseCaptureCallState  avgt   30  45.396 ± 0.185  ns/op
>> CaptureCallStateOverheadBench.useCaptureCallState       avgt   30  56.116 ± 0.463  ns/op
>>
>>
>> **Legacy:**
>>
>> Benchmark                                               Mode  Cnt   Score   Error ...
>
> Paul Hübner has updated the pull request incrementally with one additional
> commit since the last revision:
>
>   Reviewer feedback.

@JornVernee thanks for your feedback, I've ported the benchmark to `strtoll` and made it more descriptive.

@liach @minborg does this change lead to any follow-up questions/concerns from you?

I've not been able to re-run all the benchmarks (I'm missing `legacy` measurements for `strtoll`) due to hardware scarcity. My questions:
* Is it mission-critical that we get the measurements for `strtoll` or are the `strtol` ones sufficient?
* If the answer to the above is yes, would it be okay to integrate this and I'll follow up and publish the results to the Panama mailing list instead?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/30719#issuecomment-4306073215
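
For readers less familiar with the option being measured, here is a minimal sketch of a `strtoll` downcall that captures `errno` through `Linker.Option.captureCallState`. It is not the benchmark from the PR: the class name `CaptureErrnoSketch` and the `Result` record are hypothetical, and it assumes JDK 22 or later (finalized FFM API) on a platform whose default lookup exposes `strtoll`.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.StructLayout;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.VarHandle;

// Hypothetical illustration class; not part of the PR under discussion.
public class CaptureErrnoSketch {
    /** Hypothetical holder: the parsed value and the errno captured after the call. */
    public record Result(long value, int errno) {}

    private static final Linker LINKER = Linker.nativeLinker();

    // C signature: long long strtoll(const char *str, char **endptr, int base)
    private static final FunctionDescriptor STRTOLL = FunctionDescriptor.of(
            ValueLayout.JAVA_LONG, ValueLayout.ADDRESS, ValueLayout.ADDRESS, ValueLayout.JAVA_INT);

    // Layout of the capture-state segment, and an accessor for its "errno" field.
    private static final StructLayout CAPTURE_LAYOUT = Linker.Option.captureStateLayout();
    private static final VarHandle ERRNO =
            CAPTURE_LAYOUT.varHandle(MemoryLayout.PathElement.groupElement("errno"));

    // With captureCallState, the handle takes the capture segment as an extra leading argument.
    private static final MethodHandle HANDLE = LINKER.downcallHandle(
            LINKER.defaultLookup().find("strtoll").orElseThrow(),
            STRTOLL,
            Linker.Option.captureCallState("errno"));

    public static Result parse(String text) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment capture = arena.allocate(CAPTURE_LAYOUT); // arena memory is zero-initialized
            MemorySegment str = arena.allocateFrom(text);           // NUL-terminated C string
            long value = (long) HANDLE.invokeExact(capture, str, MemorySegment.NULL, 10);
            int errno = (int) ERRNO.get(capture, 0L);               // read errno saved right after the call
            return new Result(value, errno);
        } catch (Throwable t) {
            throw new IllegalStateException("strtoll downcall failed", t);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("42"));
        // An out-of-range input should saturate the result and set errno (ERANGE).
        System.out.println(parse("99999999999999999999999999"));
    }
}
```

The version without state capturing would simply omit the `captureCallState` option and the leading `capture` argument; the difference between those two call shapes is what the benchmark is meant to quantify.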
