The approach based on the LOCK-prefixed CMPXCHG8B is ready.
The patches are attached to HARMONY-2092.
This solution is about 1.6 times slower than JRockit on a dedicated
microbenchmark that makes heavy use of volatile variables of type long.
George Timoshenko wrote:
I had a question in JIRA about this issue: why don't we use the "lock"
prefix for the atomic access?
Well...
Originally we split every 64-bit memory access into two 32-bit accesses.
It makes no sense to put the #LOCK prefix on those: there is a gap
between the two accesses in which another thread can touch the field.
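
To make the gap concrete, here is a minimal single-threaded sketch in C.
It is only an illustration, not code from the patches: the split_long type
and all function names are hypothetical, and the "reader" is simulated by
reading the field between the two 32-bit stores instead of using a real
second thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 64-bit field accessed as two 32-bit halves. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} split_long;

/* Reassemble the 64-bit value from its two halves. */
uint64_t split_read(const split_long *f) {
    return ((uint64_t)f->hi << 32) | f->lo;
}

/* The two separate 32-bit stores that make up one 64-bit write. */
void split_write_lo(split_long *f, uint64_t v) { f->lo = (uint32_t)v; }
void split_write_hi(split_long *f, uint64_t v) { f->hi = (uint32_t)(v >> 32); }

/* Returns what a reader scheduled inside the gap would observe. */
uint64_t torn_read_demo(void) {
    split_long field = { 0xFFFFFFFFu, 0x00000000u }; /* 0x00000000FFFFFFFF */
    uint64_t next = 0x0000000100000000ULL;

    split_write_lo(&field, next);        /* first 32-bit store */
    uint64_t torn = split_read(&field);  /* reader runs in the gap */
    split_write_hi(&field, next);        /* second 32-bit store */

    assert(split_read(&field) == next);  /* the final state is fine */
    return torn;                         /* 0: neither old nor new value */
}
```

The value seen in the gap is 0, which is neither the old value
(0x00000000FFFFFFFF) nor the new one (0x0000000100000000). Locking each
32-bit store separately would not change that.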
#LOCK only helps on an instruction that reads/writes the whole 64 bits.
The bad thing is that, according to the IA-32 spec, the only such
instruction that accepts #LOCK is CMPXCHG8B (MOVQ, MOVSD and the others
cannot be used with #LOCK).
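This monster (CMPXCHG8B) requires 4 registers:
EDX:EAX (the value to compare against; receives the memory value on mismatch)
ECX:EBX (the new value to store)
and it also modifies EFLAGS (ZF).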
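
As a rough sketch of how a 64-bit atomic load and store can be built from
a single compare-and-swap, here is a C version. It is not the JIT code from
the patches: it assumes GCC or Clang and uses their
__sync_val_compare_and_swap builtin, which on 32-bit x86 targets is
expected to compile to LOCK CMPXCHG8B (on x86-64 it becomes a plain
64-bit LOCK CMPXCHG). The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Atomic 64-bit load: compare with 0 and "replace" 0 by 0.
 * Memory is left unchanged either way, and the builtin returns
 * the value that was in memory. */
uint64_t atomic_load64(volatile uint64_t *addr) {
    return __sync_val_compare_and_swap(addr, 0, 0);
}

/* Atomic 64-bit store: retry the CAS until our guess of the
 * current contents matches, at which point the new value is written. */
void atomic_store64(volatile uint64_t *addr, uint64_t value) {
    uint64_t seen = *addr;  /* may be torn on IA-32; the CAS validates it */
    for (;;) {
        uint64_t prev = __sync_val_compare_and_swap(addr, seen, value);
        if (prev == seen)
            return;         /* the swap happened */
        seen = prev;        /* retry with the value actually found */
    }
}

/* Single-threaded round trip, just to show the calls. */
uint64_t cas_roundtrip_demo(void) {
    volatile uint64_t cell = 0;
    atomic_store64(&cell, 0x0123456789ABCDEFULL);
    uint64_t got = atomic_load64(&cell);
    assert(got == 0x0123456789ABCDEFULL);
    return got;
}
```

Note that with this scheme even a load performs a locked write cycle on
the field, which is part of why the cost is a concern.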
I am not sure that using CMPXCHG8B will be faster than (artificially)
making every access to volatile fields synchronized.