On Tue, Dec 30, 2014 at 5:05 AM, Torvald Riegel <trie...@redhat.com> wrote:
> I agree with Andrew. My understanding of volatile is that the generated
> code must do exactly what the abstract machine would do.
That makes sense. I suppose I don't understand what the difference is, in
terms of the abstract machine, between "load; add; store" and a single
"load-add-store" instruction. At least on x86, from the perspective of the
memory bus, there's no difference I'm aware of.

> One can use volatiles for synchronization if one is also manually adding
> HW barriers and potentially compiler barriers (depending on whether you
> need to mix volatile and non-volatile) -- but volatiles really aim at a
> different use case than atomics.

Again, the processor's reordering and memory barriers are not of huge
concern to me in this instance. I completely agree that volatile is the
wrong tool for this use case.

> For the single-writer shared-counter case, a load and a store operation
> with memory_order_relaxed seem to be the right approach.

I agree: this most closely models my intention, namely a non-atomic
increment that is nonetheless guaranteed to become visible to other
threads within a finite period of time (as per your previous email). The
relaxed-load; add; relaxed-store sequence generates the same code as the
volatile version (that is, three separate instructions), but I prefer it
over the volatile as it is more intention-revealing (rough sketch of the
two variants below my signature).

As to whether it's valid to peephole-optimize the three instructions into
a single memory-destination increment on x86, given relaxed memory
ordering, I can offer no informed opinion (though my instinct is that it
should be!).

Thanks all for your help,
Matt
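
P.S. For concreteness, a rough sketch of the two variants under discussion
(illustrative only; the names are mine, and this assumes the single-writer
case from above):

#include <atomic>

// Volatile counter: each access must happen as the abstract machine would
// perform it, so the compiler emits a separate load, add, and store, but
// the C++ memory model gives concurrent access to it no defined meaning.
volatile long counter_v = 0;

void bump_volatile() {
    counter_v = counter_v + 1;  // volatile load; add; volatile store
}

// Relaxed-atomic counter: the same three instructions on x86, but the
// data race is defined away, and the store is guaranteed to become
// visible to other threads in a finite amount of time.
std::atomic<long> counter_a{0};

void bump_relaxed() {
    long tmp = counter_a.load(std::memory_order_relaxed);  // relaxed load
    counter_a.store(tmp + 1, std::memory_order_relaxed);   // relaxed store
}

// Note this is not an atomic read-modify-write: with more than one
// writer, increments could be lost. That is fine here, since only one
// thread ever writes the counter.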