> x86 has compiler barrier inside the relaxed() API so that code does not
> get reordered. ARM64 architecturally guarantees device writes to be observed
> in order.
There are places where you don't even need a compiler barrier between
every write.

I had horrid problems getting some ppc code (for a specific embedded SoC)
optimised to have no extra barriers. I ended up just writing through a
'pointer to volatile' and adding an explicit 'eieio' between the block of
writes and the status read.

Doing a byteswapping write to normal memory was no less painful.

	David
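
A minimal sketch of the pattern described above, not the original code. The
register layout (struct fake_regs), the helper name program_and_poll() and
the io_order_barrier() macro are all invented for illustration, and the
inline asm assumes a GCC/Clang-style compiler. On non-powerpc builds the
macro falls back to a plain compiler barrier so the sketch can be built and
run on a host against ordinary memory.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register block, invented for this sketch. */
struct fake_regs {
	uint32_t addr;
	uint32_t len;
	uint32_t ctrl;
	uint32_t status;
};

/*
 * On powerpc emit a real eieio; elsewhere use only a compiler barrier
 * so the example still compiles on a host (an assumption of this sketch).
 */
#if defined(__powerpc__) || defined(__powerpc64__)
#define io_order_barrier() __asm__ __volatile__("eieio" : : : "memory")
#else
#define io_order_barrier() __asm__ __volatile__("" : : : "memory")
#endif

static uint32_t program_and_poll(volatile struct fake_regs *regs,
				 uint32_t addr, uint32_t len)
{
	assert(regs != 0);

	/*
	 * Block of writes through a pointer to volatile: the compiler
	 * keeps volatile accesses in program order relative to each
	 * other, so no barrier is placed between them.
	 */
	regs->addr = addr;
	regs->len = len;
	regs->ctrl = 1;

	/* One explicit barrier between the writes and the status read. */
	io_order_barrier();

	return regs->status;
}
```

Whether a single barrier is sufficient depends on how the device region is
mapped and on the SoC in question, so treat this as an illustration of the
shape of the code rather than something to copy.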