Hi, I just saw commit cbc5d70e..., which enables MFENCE and SFENCE on SSE2 systems in place of the earlier LOCK ADD [ESP], $0 for cpu_lfence and cpu_mfence.

When the Sun JVM folks were working on this, they found LOCK ADD was substantially faster on Intel systems; on AMD systems LOCK ADD was the same speed as *FENCE, but "pipelined better", whatever that means:

http://blogs.sun.com/dave/resource/NHM-Pipeline-Blog-V2.txt

Perhaps this commit should be measured closely? I'd love to hear data either confirming or contradicting Dave Dice and the Sun JVM team...

Thanks,
-- vs
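
P.S. For anyone who wants to try measuring this, here is a rough sketch of the kind of micro-benchmark I have in mind. It is not the code from the commit: the names fence_mfence, fence_lock_add, time_mfence and time_lock_add are made up for illustration, the asm is GCC/Clang AT&T syntax, and I am assuming a POSIX clock_gettime. A tight single-threaded loop like this only shows raw instruction cost; it will not capture how either barrier pipelines with surrounding code, so treat any numbers as a starting point.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Full barrier using the SSE2 MFENCE instruction. */
static inline void fence_mfence(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ __volatile__("mfence" ::: "memory");
#else
    __sync_synchronize();   /* non-x86 fallback so the sketch still builds */
#endif
}

/* Full barrier using a locked add of zero to the top of the stack. */
static inline void fence_lock_add(void)
{
#if defined(__x86_64__)
    __asm__ __volatile__("lock; addl $0,(%%rsp)" ::: "memory", "cc");
#elif defined(__i386__)
    __asm__ __volatile__("lock; addl $0,(%%esp)" ::: "memory", "cc");
#else
    __sync_synchronize();   /* non-x86 fallback so the sketch still builds */
#endif
}

/* Monotonic wall-clock time in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Volatile so the compiler cannot drop the stores being ordered. */
static volatile uint32_t sink;

/* Elapsed nanoseconds for `iters` rounds of store + MFENCE. */
uint64_t time_mfence(uint32_t iters)
{
    assert(iters > 0);      /* timing zero iterations is meaningless */
    uint64_t start = now_ns();
    for (uint32_t i = 0; i < iters; i++) {
        sink = i;           /* a store for the barrier to order */
        fence_mfence();
    }
    return now_ns() - start;
}

/* Elapsed nanoseconds for `iters` rounds of store + LOCK ADD. */
uint64_t time_lock_add(uint32_t iters)
{
    assert(iters > 0);
    uint64_t start = now_ns();
    for (uint32_t i = 0; i < iters; i++) {
        sink = i;
        fence_lock_add();
    }
    return now_ns() - start;
}
```

Calling time_mfence(n) and time_lock_add(n) from a small main with a large n (say 100 million), and dividing each result by n, gives a rough per-barrier cost in nanoseconds. Running each a few times and on both Intel and AMD hardware would be the interesting part.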
