https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67458
--- Comment #5 from Peter Cordes <peter at cordes dot ca> ---
> optabs: ensure atomic_load/stores have compiler barriers

Thanks for taking a look at this report. But I don't think a full 2-way barrier is necessary. If there's a lighter-weight way to get the behaviour we want, that would be better.

(In reply to Peter Cordes from comment #0)
> void set(void) {
>   b = 2;
>   a.store(1, memory_order_release);
>   b = 3;
> }

Looking at this again with a couple years more understanding of atomics, gcc could probably optimize this to b = 3; a.store(); because a release-store is only a *1-way barrier* (unlike atomic_signal_fence() or atomic_thread_fence(); see http://preshing.com/20131125/acquire-and-release-fences-dont-work-the-way-youd-expect/).

I think even the original behaviour (a.store(); b=3;) might actually have been technically legal C++11, because you couldn't observe the reordering without invoking UB. A thread that does an a.load(mo_acquire) and then reads b still has C++11 UB: b isn't atomic, and the b = 3 write comes after the release operation that the load synchronizes with, so nothing orders that write before the read and the two form a data race. Thus, no other thread is allowed to look at b at any time after this thread has run set(), until some further synchronizes-with relationship orders the b = 3 write before that read.
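
To make the last point concrete, here is a minimal sketch of a reader that stays race-free. It is an illustration under assumptions, not code from the report: the types of a and b (std::atomic<int> and plain int) are guessed, and run_once() is a hypothetical helper. The acquire load only orders b = 2 before the reader; the reader has to wait for a second synchronizes-with edge (here, thread completion synchronizing with join()) before it may touch b.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int b = 0;               // plain, non-atomic variable (type assumed)
std::atomic<int> a{0};   // atomic flag (type assumed)

// Same shape as set() in comment #0.
void set() {
    b = 2;
    a.store(1, std::memory_order_release);
    b = 3;   // sequenced after the release store, so not ordered by it
}

// Hypothetical helper: runs set() in another thread and reads b safely.
int run_once() {
    std::thread writer(set);

    // Spin until the release store is visible. This acquire load
    // synchronizes-with the store, so b = 2 happens-before this point.
    while (a.load(std::memory_order_acquire) != 1) { }

    // Reading b here would race with b = 3 in set(): that is the UB
    // described above, so this sketch deliberately does not do it.

    // Thread completion synchronizes-with join() returning, which orders
    // b = 3 before the read below.
    writer.join();
    assert(a.load(std::memory_order_relaxed) == 1);
    return b;
}
```

Because the only well-defined read of b happens after join(), a conforming program built this way cannot tell whether the compiler emitted b = 3 before or after the store to a, which is the argument for why either ordering may be allowed.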