On 03/20/2014 08:49 AM, Aleksey Shipilev wrote:
> On 03/20/2014 11:06 AM, Peter Levart wrote:
>> I was thinking about this last night. The answer to the question "Why is
>> this double-checked non-volatile-then-volatile trick not any faster than
>> the pure volatile variant, even on the ARM platform, where a volatile read
>> should have some penalty compared to a normal read?" might lie in the fact
>> that Raspberry Pi is a single-core/single-thread "machine". Would anyone
>> with JVM JIT compiler expertise care to share some insight? I suspect that
>> on such a platform, the compiler optimizes volatile accesses so that they
>> are performed without the otherwise necessary memory fences...
> Yes, at least C2 is known to not emit memory fences on uniprocessor
> machines. You need to have a multicore ARM. If you are still interested,
> contact me privately and I can arrange the access to my personal
> quad-core Cortex-A9.
>
> -Aleksey.

Hi,

Thanks to Aleksey for re-establishing access, here are the results of the microbenchmark run on his quad-core Cortex-A9:

JDK 8 options: -client, org.openjdk.jmh.Main parameters: ".*" -i 10 -r 5 -wi 5 -w 1 -f 1 [-t 1|max]

--- Baseline, 1-thread ---

Benchmark                              Mode   Samples         Mean   Mean error    Units
o.t.Bench6375303.testFirstToString     avgt        10    69292.305      299.516    ns/op
o.t.Bench6375303.testToString          avgt        10     *20.003*        0.433    ns/op

--- Baseline, 4-threads ---

Benchmark                              Mode   Samples         Mean   Mean error    Units
o.t.Bench6375303.testFirstToString     avgt        10   100390.024     2158.132    ns/op
o.t.Bench6375303.testToString          avgt        10     *20.151*        0.677    ns/op

--- double-checked nonvolatile-then-volatile-read+CAS, 1-thread ---

Benchmark                              Mode   Samples         Mean   Mean error    Units
o.t.Bench6375303.testFirstToString     avgt        10    69951.406      221.516    ns/op
o.t.Bench6375303.testToString          avgt        10     *19.681*        0.025    ns/op

--- double-checked nonvolatile-then-volatile-read+CAS, 4-threads ---

Benchmark                              Mode   Samples         Mean   Mean error    Units
o.t.Bench6375303.testFirstToString     avgt        10   104231.335     3842.095    ns/op
o.t.Bench6375303.testToString          avgt        10     *20.030*        0.595    ns/op

--- classic volatile read+CAS, 1-thread ---

Benchmark                              Mode   Samples         Mean   Mean error    Units
o.t.Bench6375303.testFirstToString     avgt        10    69753.542      180.110    ns/op
o.t.Bench6375303.testToString          avgt        10     *23.285*        0.267    ns/op

--- classic volatile read+CAS, 4-threads ---

Benchmark                              Mode   Samples         Mean   Mean error    Units
o.t.Bench6375303.testFirstToString     avgt        10    99664.256     1814.090    ns/op
o.t.Bench6375303.testToString          avgt        10     *23.491*        0.606    ns/op


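...as can be seen, the double-checked read-then-volatile-read+CAS trick is about 15% faster than classic volatile-read+CAS on testToString in this case (roughly 19.7-20.0 ns/op versus 23.3-23.5 ns/op), which puts it on par with the baseline (roughly 20.0-20.2 ns/op). The testFirstToString numbers differ by only a few percent between the variants.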
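
For anyone following along without the patch at hand, the two variants compared above can be sketched roughly as below. This is only an illustration, not the code that was benchmarked: the class, field, and method names are made up, the cached value is a plain Integer.toString result, and the sketch uses VarHandle (Java 9+) for the volatile read and CAS on a non-volatile field, which a JDK 8 patch would have to do some other way (for example with Unsafe). The plain first read is only safe here because the cached object is an immutable String.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.util.concurrent.atomic.AtomicReferenceFieldUpdater;

// Hypothetical sketch; all names here are invented for illustration.
final class LazyCacheSketch {

    /** Classic variant: every call pays for a volatile read; publication is via CAS. */
    static final class ClassicVolatile {
        private static final AtomicReferenceFieldUpdater<ClassicVolatile, String> UPDATER =
            AtomicReferenceFieldUpdater.newUpdater(ClassicVolatile.class, String.class, "cache");

        private final int value;
        private volatile String cache;

        ClassicVolatile(int value) { this.value = value; }

        String cached() {
            String s = cache;                                   // volatile read on every call
            if (s == null) {
                String computed = Integer.toString(value);
                // Only one thread wins the CAS; losers return the winner's value.
                s = UPDATER.compareAndSet(this, null, computed) ? computed : cache;
            }
            return s;
        }
    }

    /** Double-checked variant: plain read first, volatile read + CAS only on a miss. */
    static final class DoubleChecked {
        private static final VarHandle CACHE;
        static {
            try {
                CACHE = MethodHandles.lookup()
                        .findVarHandle(DoubleChecked.class, "cache", String.class);
            } catch (ReflectiveOperationException e) {
                throw new ExceptionInInitializerError(e);
            }
        }

        private final int value;
        private String cache;                                   // deliberately not volatile

        DoubleChecked(int value) { this.value = value; }

        String cached() {
            String s = cache;                                   // plain read: the fast path
            if (s == null) {
                s = (String) CACHE.getVolatile(this);           // second check, volatile read
                if (s == null) {
                    String computed = Integer.toString(value);
                    s = CACHE.compareAndSet(this, null, computed)
                            ? computed
                            : (String) CACHE.getVolatile(this); // lost the race: take the winner's value
                }
            }
            return s;
        }
    }
}
```

Once the cache is populated, the double-checked variant returns after the plain read alone, which is presumably where the testToString difference on the multicore ARM comes from.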


Regards, Peter
