On Tue, Dec 30, 2014 at 5:05 AM, Torvald Riegel <trie...@redhat.com> wrote:
> I agree with Andrew. My understanding of volatile is that the generated
> code must do exactly what the abstract machine would do.
That makes sense. I suppose I don't understand what the difference is, in
terms of the abstract machine, between "load; add; store" and a single
"load-add-store" instruction. At least on x86, from the perspective of the
memory bus, there's no difference I'm aware of.

> One can use volatiles for synchronization if one is also manually adding
> HW barriers and potentially compiler barriers (depending on whether you
> need to mix volatile and non-volatile) -- but volatiles really aim at a
> different use case than atomics.

Again, the processor's reordering and memory barriers are not of huge
concern to me in this instance. I completely agree that volatile is the
wrong tool for this use case.

> For the single-writer shared-counter case, a load and a store operation
> with memory_order_relaxed seem to be the right approach.

I agree: this most closely models my intention, namely a non-atomic
increment that is nonetheless guaranteed to become visible to other
threads within a finite period of time (as per your previous email). The
relaxed-load; add; relaxed-store sequence generates the same code as the
volatile version (that is, three separate instructions), but I prefer it
over the volatile as it is more intention-revealing (rough sketch of the
two variants below my signature).

As to whether it's valid to peephole-optimize the three instructions into
a single memory-destination increment on x86, given relaxed memory
ordering, I can offer no informed opinion (though my instinct is that it
should be!).

Thanks all for your help,
Matt
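
P.S. For concreteness, a rough sketch of the two variants under discussion
(illustrative only; the names are mine, and this assumes the single-writer
case from above):

#include <atomic>

// Volatile counter: each access must happen as the abstract machine would
// perform it, so the compiler emits a separate load, add, and store, but
// the C++ memory model gives concurrent access to it no defined meaning.
volatile long counter_v = 0;

void bump_volatile() {
    counter_v = counter_v + 1;  // volatile load; add; volatile store
}

// Relaxed-atomic counter: the same three instructions on x86, but the
// data race is defined away, and the store is guaranteed to become
// visible to other threads in a finite amount of time.
std::atomic<long> counter_a{0};

void bump_relaxed() {
    long tmp = counter_a.load(std::memory_order_relaxed);  // relaxed load
    counter_a.store(tmp + 1, std::memory_order_relaxed);   // relaxed store
}

// Note this is not an atomic read-modify-write: with more than one
// writer, increments could be lost. That is fine here, since only one
// thread ever writes the counter.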