https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448

preshing <filter-gcc at preshing dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |filter-gcc at preshing dot com

--- Comment #22 from preshing <filter-gcc at preshing dot com> ---
(In reply to algrant from comment #13)
> I see what you mean - there is a race if threadB reads the data when the
> flag is not set, i.e. in the case when the read value is never used.
> Moving the read of data after the test (as in the fragment in comment 4)
> would fix the race.

algrant, could you post the complete assembly listing for threadB(), using your
version of AArch64 g++, after all the corrections to the example have been
taken into account, with -O2? For convenience, I've posted the corrected
example below.

When I compile the example using PowerPC g++ 4.9.1, it seems that
memory_order_consume is already promoted to memory_order_acquire, as evidenced
by the cmpw;bne-;isync sequence following the load from flag, which acts as an
acquire barrier. I've posted the PowerPC assembly listing below as well.

Is the bug specific to AArch64?

-------------

/* Reproducer for 59448: compile with -std=c++11 -lpthread -O2 */

#include <atomic>
#include <thread>
#include <stdio.h>
#include <assert.h>

static std::atomic<unsigned int> flag(0);
static int data;

/*
Thread A tries to release a data value 1 to thread B via a
release/consume sequence.

In the demonstration, this is tried repeatedly.  The iterations of
the two threads are synchronized via a release/acquire sequence
from B to A.  This is not relevant to the ordering problem.
*/

void threadA(void)
{
  for (;;) {
    data = 1;                                           // A.A
    flag.store(1, std::memory_order_release);           // A.B
    /* By 1.9#14, A.A is sequenced before A.B */
    /* Now wait for thread B to accept the data. */
    while (flag.load(std::memory_order_acquire) == 1);
    assert(data == 0);
  }
}

void threadB(void)
{
  for (;;) {
    int f, d;
    do {
      f = flag.load(std::memory_order_consume);             // B.A
    } while (!f);
    /* By 1.10#9, B.A carries a dependency to B.B */
    /* By 1.10#10, A.B is dependency-ordered before B.B */
    /* By 1.10#11, A.A intra-thread happens before B.B */
    /* By 1.10#12, A.A happens before B.B */
    /* By 1.10#13, A.A is a visible side-effect with respect to B.B
                   and the value determined by B.B shall be the
                   value (1) stored by A.A. */
    d = *(&data + (f - f));                                 // B.B
    assert(d == 1);
    /* Release thread A to start another iteration. */
    data = 0;
    flag.store(0, std::memory_order_release);
  }
}

int main(void)
{
  std::thread a(&threadA);
  std::thread b(&threadB);
  a.join();
  b.join();
  return 0;
}

------------- PowerPC listing of threadB():

_Z7threadBv:
.LFB2302:
    .cfi_startproc
    lis 10,.LANCHOR0@ha
    li 8,0
    la 10,.LANCHOR0@l(10)
.L16:
    lwz 9,4(10)
    cmpw 7,9,9      <------- acquire barrier
    bne- 7,$+4      <-------
    isync           <-------
    cmpwi 7,9,0
    beq+ 7,.L16
    lwz 9,0(10)
    cmpwi 7,9,1
    bne- 7,.L22
    stw 8,0(10)
    lwsync
    stw 8,4(10)
    b .L16
.L22:
    mflr 0
    lis 6,.LANCHOR1@ha
    stwu 1,-16(1)
    .cfi_def_cfa_offset 16
    .cfi_register 65, 0
    lis 3,.LC2@ha
    lis 4,.LC1@ha
    la 6,.LANCHOR1@l(6)
    la 3,.LC2@l(3)
    la 4,.LC1@l(4)
    li 5,47
    addi 6,6,16
    stw 0,20(1)
    .cfi_offset 65, 4
    bl __assert_fail
    .cfi_endproc

Reply via email to