http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448
--- Comment #6 from algrant at acm dot org ---
Here is a complete C++11 test case. The code generated for the two loads
in thread B doesn't maintain the required ordering:

        ...
        ldrb    w1, [x0]
        uxtb    w1, w1
        adrp    x0, .LANCHOR0
        ldr     w2, [x3]
        ...

According to the architecture specification, to achieve the ordering it's
sufficient to use the result of the first load in the calculation of the
address of the second, even if it's done in such a way as to have no
dependence on the value.


/* Reproducer for 59448: compile with -std=c++11 -lpthread */

#include <atomic>
#include <thread>
#include <stdio.h>
#include <assert.h>

static std::atomic<bool> flag(0);
static int data;

/*
Thread A tries to release a data value 1 to thread B via a
release/consume sequence.

In the demonstration, this is tried repeatedly. The iterations of
the two threads are synchronized via a release/acquire sequence
from B to A. This is not relevant to the ordering problem.
*/

void threadA(void)
{
  for (;;) {
    data = 1;                                    // A.A
    flag.store(1, std::memory_order_release);    // A.B
    /* By 1.9#14, A.A is sequenced before A.B */
    /* Now wait for thread B to accept the data. */
    while (flag.load(std::memory_order_acquire) == 1);
    assert(data == 0);
  }
}

void threadB(void)
{
  for (;;) {
    int f, d;
    do {
      f = flag.load(std::memory_order_consume);  // B.A
      d = *(&data + f - f);                      // B.B
      /* By 1.10#9, B.A carries a dependency to B.B */
      /* By 1.10#10, A.B is dependency-ordered before B.B */
      /* By 1.10#11, A.A inter-thread happens before B.B */
      /* By 1.10#12, A.A happens before B.B */
      /* By 1.10#13, A.A is a visible side-effect with respect to B.B
         and the value determined by B.B shall be the
         value (1) stored by A.A. */
    } while (!f);
    assert(d == 1);
    /* Release thread A to start another iteration. */
    data = 0;
    flag.store(0, std::memory_order_release);
  }
}

int main(void)
{
  std::thread a(&threadA);
  std::thread b(&threadB);
  a.join();
  b.join();
  return 0;
}
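
For comparison, here is a minimal sketch of the same hand-off with the consume load replaced by an acquire load. It is not part of the original comment and does not exercise the consume path that this bug is about. It only illustrates a conservative alternative, on the assumption that the stronger acquire ordering is acceptable in the code at hand. The helper name run_acquire_handoff and the bounded iteration count are hypothetical, introduced so that the loop terminates.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper, not from the original reproducer: performs a bounded
// number of release/acquire hand-offs and returns how many times the reader
// observed a value other than the 1 stored by the writer.
static int run_acquire_handoff(int iterations)
{
  std::atomic<bool> flag(false);
  int data = 0;
  int stale = 0;   // written only by the reader, read after join()

  std::thread writer([&] {
    for (int i = 0; i < iterations; ++i) {
      data = 1;
      flag.store(true, std::memory_order_release);
      // Wait for the reader to accept the data and clear the flag.
      while (flag.load(std::memory_order_acquire))
        std::this_thread::yield();
    }
  });

  std::thread reader([&] {
    for (int i = 0; i < iterations; ++i) {
      // Acquire (rather than consume) orders the following plain load of
      // data after this load, with no address dependency needed.
      while (!flag.load(std::memory_order_acquire))
        std::this_thread::yield();
      if (data != 1)
        ++stale;
      // Release the writer to start another iteration.
      data = 0;
      flag.store(false, std::memory_order_release);
    }
  });

  writer.join();
  reader.join();
  return stale;
}
```

Under the C++11 memory model the release store synchronizes with the acquire load that reads it, so the count returned should be zero. The cost is that acquire may require a stronger barrier or load instruction than a correctly implemented consume would.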