https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448
preshing <filter-gcc at preshing dot com> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |filter-gcc at preshing dot com --- Comment #22 from preshing <filter-gcc at preshing dot com> --- (In reply to algrant from comment #13) > I see what you mean - there is a race if threadB reads the data when the > flag is not set, i.e. in the case when the read value is never used. > Moving the read of data after the test (as in the fragment in comment 4) > would fix the race. algrant, could you post the complete assembly listing for threadB(), using your version of AArch64 g++, after all the corrections to the example have been taken into account, with -O2? For convenience, I've posted the corrected example below. When I compile the example using PowerPC g++ 4.9.1, it seems that memory_order_consume is already promoted to memory_order_acquire, as evidenced by the cmpw;bne-;isync sequence following the load from flag, which acts as an acquire barrier. I've posted the PowerPC assembly listing below as well. Is the bug specific to AArch64? ------------- /* Reproducer for 59448: compile with -std=c++11 -lpthread -O2 */ #include <atomic> #include <thread> #include <stdio.h> #include <assert.h> static std::atomic<unsigned int> flag(0); static int data; /* Thread A tries to release a data value 1 to thread B via a release/consume sequence. In the demonstration, this is tried repeatedly. The iterations of the two threads are synchronized via a release/acquire sequence from B to A. This is not relevant to the ordering problem. */ void threadA(void) { for (;;) { data = 1; // A.A flag.store(1, std::memory_order_release); // A.B /* By 1.9#14, A.A is sequenced before A.B */ /* Now wait for thread B to accept the data. */ while (flag.load(std::memory_order_acquire) == 1); assert(data == 0); } } void threadB(void) { for (;;) { int f, d; do { f = flag.load(std::memory_order_consume); // B.A } while (!f); /* By 1.10#9, B.A carries a dependency to B.B */ /* By 1.10#10, A.B is dependency-ordered before B.B */ /* By 1.10#11, A.A intra-thread happens before B.B */ /* By 1.10#12, A.A happens before B.B */ /* By 1.10#13, A.A is a visible side-effect with respect to B.B and the value determined by B.B shall be the value (1) stored by A.A. */ d = *(&data + (f - f)); // B.B assert(d == 1); /* Release thread A to start another iteration. */ data = 0; flag.store(0, std::memory_order_release); } } int main(void) { std::thread a(&threadA); std::thread b(&threadB); a.join(); b.join(); return 0; } ------------- PowerPC listing of threadB(): _Z7threadBv: .LFB2302: .cfi_startproc lis 10,.LANCHOR0@ha li 8,0 la 10,.LANCHOR0@l(10) .L16: lwz 9,4(10) cmpw 7,9,9 <------- acquire barrier bne- 7,$+4 <------- isync <------- cmpwi 7,9,0 beq+ 7,.L16 lwz 9,0(10) cmpwi 7,9,1 bne- 7,.L22 stw 8,0(10) lwsync stw 8,4(10) b .L16 .L22: mflr 0 lis 6,.LANCHOR1@ha stwu 1,-16(1) .cfi_def_cfa_offset 16 .cfi_register 65, 0 lis 3,.LC2@ha lis 4,.LC1@ha la 6,.LANCHOR1@l(6) la 3,.LC2@l(3) la 4,.LC1@l(4) li 5,47 addi 6,6,16 stw 0,20(1) .cfi_offset 65, 4 bl __assert_fail .cfi_endproc