On 03/20/2014 11:06 AM, Peter Levart wrote:
> I was thinking about last night, for question: "Why is this
> double-checked non-volatile-then-volatile trick not any faster than pure
> volatile variant even on ARM platform where volatile read should have
> some penalty compared to normal read?", might be in the fact that
> Raspberry Pi is a single-core/single-thread "machine". Would anyone with
> JVM JIT compiler expertise care to share some insight? I suspect that on
> such platform, the compiler optimizes volatile accesses so that they are
> performed without otherwise necessary memory fences...

Yes, at least C2 is known not to emit memory fences on uniprocessor machines. You need a multicore ARM to measure this. If you are still interested, contact me privately and I can arrange access to my personal quad-core Cortex-A9.

-Aleksey.
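
For readers following the thread, the two variants being compared can be sketched roughly as below. This is only an illustration, not the code under discussion: the names `DoubleCheckedDemo`, `VolatileOnly` and `PlainThenVolatile` are made up, and the sketch assumes the cached value is an immutable object (all fields final, like `String`), since otherwise the racy plain read would not be safe.

```java
import java.util.function.Supplier;

// Hypothetical illustration; none of these names come from the original thread.
class DoubleCheckedDemo {

    // Variant 1: classic double-checked locking, every fast-path read is volatile.
    static final class VolatileOnly<T> {
        private final Supplier<T> supplier;
        private volatile T value;

        VolatileOnly(Supplier<T> supplier) { this.supplier = supplier; }

        T get() {
            T v = value;                      // volatile read on every call
            if (v == null) {
                synchronized (this) {
                    v = value;
                    if (v == null) {
                        v = supplier.get();
                        value = v;            // volatile write publishes the value
                    }
                }
            }
            return v;
        }
    }

    // Variant 2: plain read first, volatile read only if the plain read misses.
    // ASSUMPTION: T is immutable (final fields only), so an instance observed
    // through the data race on 'plain' is still fully initialized.
    static final class PlainThenVolatile<T> {
        private final Supplier<T> supplier;
        private T plain;                      // racy, non-volatile cache
        private volatile T vol;               // properly published copy

        PlainThenVolatile(Supplier<T> supplier) { this.supplier = supplier; }

        T get() {
            T v = plain;                      // single plain read into a local
            if (v == null) {
                v = vol;                      // fall back to the volatile read
                if (v == null) {
                    synchronized (this) {
                        v = vol;
                        if (v == null) {
                            v = supplier.get();
                            vol = v;
                        }
                    }
                }
                plain = v;                    // cache for later plain reads
            }
            return v;
        }
    }

    public static void main(String[] args) {
        VolatileOnly<String> a = new VolatileOnly<>(() -> "computed");
        PlainThenVolatile<String> b = new PlainThenVolatile<>(() -> "computed");
        System.out.println(a.get() + " " + b.get());
    }
}
```

On a uniprocessor, where C2 is known not to emit the fences, the fast paths of the two variants would be expected to cost about the same, so any difference should only be measurable on a multicore ARM.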