http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448
--- Comment #4 from algrant at acm dot org ---
So, using g++, this code demonstrates the same lack of ordering:

#include <atomic>

int f1(std::atomic<int> const *p, std::atomic<int> const *q)
{
    int flag = p->load(std::memory_order_consume);
    return flag ? (q + flag - flag)->load(std::memory_order_relaxed) : 0;
}

You suggest that this might be a problem with the atomic built-ins. If this had been a load-acquire, it would indeed be a problem with the built-in, either because it did not introduce a barrier or because it did not use a load-acquire instruction. But for a load-consume on this architecture, no barrier is needed to order the load-consume before a load that is address-dependent on it. The programmer wrote a dependency (the address of the second load is computed from flag), but the compiler lost track of it, for example by simplifying q + flag - flag to plain q.

It's not necessary to demonstrate a failure: there is an architectural race condition here. Even if it doesn't fail now, there's no guarantee it will never fail on future, more aggressively reordering cores.
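For contrast, here is a minimal C++ sketch (not part of the original report) of the pattern a consume load is meant to support. A producer publishes a pointer with a release store, and the consumer reads through the pointer it obtained from the consume load. The address of the second load therefore comes directly from the value of the first load, with no arithmetic like q + flag - flag for the compiler to fold away. The names Node, g_slot, producer, consumer and run_message_passing are hypothetical. The sketch remains correct even if a compiler implements consume as acquire, because strengthening the ordering is always permitted.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical payload type, used only for illustration.
struct Node {
    int value;
};

// Hypothetical shared slot through which the producer publishes a pointer.
std::atomic<Node*> g_slot{nullptr};

// Producer: initialise the node with a plain store, then publish its
// address with a release store.
void producer(Node* n) {
    n->value = 42;
    g_slot.store(n, std::memory_order_release);
}

// Consumer: spin on a consume load until a pointer is published, then
// read through it. The read of p->value is address-dependent on the
// consume load, so on hardware that respects address dependencies it is
// ordered after that load without a barrier, provided the compiler
// preserves the dependency.
int consumer() {
    Node* p;
    while ((p = g_slot.load(std::memory_order_consume)) == nullptr) {
        // spin until the producer publishes
    }
    return p->value;  // dependent load: its address comes from the consume load
}

// Runs one producer and one consumer and returns the value the consumer saw.
int run_message_passing() {
    Node node{0};
    g_slot.store(nullptr, std::memory_order_relaxed);  // reset before starting threads
    int seen = 0;
    std::thread c([&] { seen = consumer(); });
    std::thread pr([&] { producer(&node); });
    pr.join();
    c.join();
    assert(seen == 42);  // release/consume guarantees the initialised value is seen
    return seen;
}
```

Calling run_message_passing() should always return 42. In f1 above, the only thing linking the relaxed load to the consume load is a dependency the compiler is free to simplify away.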