On Thu, 1 Sep 2022 16:47:58 GMT, Robbin Ehn <r...@openjdk.org> wrote:

> Please consider, only implemented on x64/aarch64 linux/windows.
> 
> On my box calling clock_gettime via JNI goes from 35ns to 28ns when enabled.
> 
> Passes t1-7 with option forced on, also passes t1-4 as is in this PR.

Yes, sorry. Since we had this code before, I forgot to add an explanation.
This gives back around 75% of the performance gained by transition-less JNI 
calls (in one benchmark on JDK 17, critical gave 8ns and this gave 6ns).
Note that this is measured against the accidentally optimized JNI critical: 
the removal was done in steps, and before the final step it was faster than 
the original implementation, although it was never intended to be faster.
So it should be even closer to the original pre-JDK 17 numbers.
But note that this applies to all JNI methods, not just some special ones.

For a safepoint poll, the Java thread does:
1: Store an unsafe thread state as indication that we are entering the VM.
2: Check if entrance into the VM can be performed safely.

The VM thread (or a handshaker) does:
1: Store the polling word.
2: Read the thread state.

On each side, step 1 must be ordered before step 2:

Java thread:
  store unsafe thread state
  store_load_barrier
  load poll

VM thread (or handshaker):
  store poll
  store_load_barrier
  load thread state
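
As a rough illustration of the symmetric scheme above, here is a minimal C++ 
sketch using std::atomic. The names ThreadSlot, java_thread_enter and 
vm_thread_arm are made up for this example and are not HotSpot identifiers; 
the seq_cst fence stands in for store_load_barrier.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the per-thread data; not HotSpot's layout.
struct ThreadSlot {
    std::atomic<int> state{0};  // 0 = safe (in native), 1 = unsafe (entering VM)
    std::atomic<int> poll{0};   // 0 = disarmed, 1 = armed
};

// Java thread side: publish the unsafe state, then check the poll.
// Returns true if the armed poll was observed (slow path must be taken).
inline bool java_thread_enter(ThreadSlot& t) {
    t.state.store(1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);  // store_load_barrier
    return t.poll.load(std::memory_order_relaxed) == 1;
}

// VM thread side: arm the poll, then read the thread state.
// Returns true if the thread was observed in the unsafe state.
inline bool vm_thread_arm(ThreadSlot& t) {
    t.poll.store(1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);  // store_load_barrier
    return t.state.load(std::memory_order_relaxed) == 1;
}

// One sequential interleaving: the Java thread enters first, then the VM arms.
inline void sequential_demo() {
    ThreadSlot t;
    bool saw_poll = java_thread_enter(t);  // poll is not armed yet
    bool saw_unsafe = vm_thread_arm(t);    // state is already published
    assert(!saw_poll && saw_unsafe);
}
```

With a full fence on both sides, at least one thread is guaranteed to observe 
the other's store when the two run concurrently, so the case where both miss 
each other cannot happen.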


This patch moves the store_load_barrier to the VM thread's read of the thread 
state by using a system memory barrier, which makes sure we get program order: 
"guarantee that all its running thread siblings have passed through a state 
where all memory accesses to user-space addresses match program order"


Java thread:
  store unsafe thread state
  compiler_barrier
  load poll

VM thread (or handshaker):
  store poll
  system_memory_barrier
  load thread state
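
A minimal Linux-only C++ sketch of this asymmetric variant follows. It assumes 
the system memory barrier is obtained through the membarrier(2) system call 
with MEMBARRIER_CMD_PRIVATE_EXPEDITED; that choice, and all function and type 
names below, are assumptions made for illustration and are not taken from the 
patch.

```cpp
#include <atomic>
#include <cassert>
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

// Hypothetical stand-in for the per-thread data; not HotSpot's layout.
struct ThreadSlot {
    std::atomic<int> state{0};  // 0 = safe (in native), 1 = unsafe (entering VM)
    std::atomic<int> poll{0};   // 0 = disarmed, 1 = armed
};

// Thin wrapper; glibc does not provide a membarrier() function.
inline long membarrier_call(int cmd) {
    return syscall(__NR_membarrier, cmd, 0, 0);
}

// Must succeed once per process before the expedited command may be used.
inline bool system_memory_barrier_init() {
    return membarrier_call(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED) == 0;
}

// Heavy barrier covering all running threads of this process.
inline bool system_memory_barrier() {
    return membarrier_call(MEMBARRIER_CMD_PRIVATE_EXPEDITED) == 0;
}

// Java thread side (fast path): only a compiler barrier, no CPU fence.
inline bool java_thread_enter_fast(ThreadSlot& t) {
    t.state.store(1, std::memory_order_relaxed);
    std::atomic_signal_fence(std::memory_order_seq_cst);  // compiler_barrier
    return t.poll.load(std::memory_order_relaxed) == 1;
}

// VM thread side (slow path): pays for the barrier on behalf of every thread.
// If init or the call fails, a real implementation would have to fall back
// to the fenced scheme on both sides; this sketch ignores the result.
inline bool vm_thread_arm_slow(ThreadSlot& t) {
    t.poll.store(1, std::memory_order_relaxed);
    system_memory_barrier();
    return t.state.load(std::memory_order_relaxed) == 1;
}

// One sequential interleaving: the Java thread enters first, then the VM arms.
inline void sequential_demo() {
    ThreadSlot t;
    system_memory_barrier_init();
    assert(!java_thread_enter_fast(t));
    assert(vm_thread_arm_slow(t));
}
```

The cost moves from every native transition to every place where the VM 
thread or a handshaker reads a thread state, which is the trade-off discussed 
next.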


As you said, this big hammer has a downside, since it must always be issued 
before reading the thread state.
This hurts in cases such as:
* Using the JFR sampler with short periods, or sampling many threads.
* Workloads with many safepoints or handshakes per second.

This means your overall performance may suffer, and only a few special 
workloads should notice a difference at all.

I have not changed all transitions to elide the store_load, since the others 
do not impact performance and this PR was focused on native transitions.
If you think all transitions should honor this flag, I can do a follow-up.

-------------

PR: https://git.openjdk.org/jdk/pull/10123
