https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67458

            Bug ID: 67458
           Summary: x86: atomic store with memory_order_release doesn't
                    order other stores
           Product: gcc
           Version: 5.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: peter at cordes dot ca
  Target Milestone: ---

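As I understand it, std::atomic::store(val, memory_order_release) acts as a
StoreStore barrier for the stores that precede it.  However, when compiling
for x86 (but not ARM or PowerPC), g++ drops the store that precedes the
barrier (b = 2 in the test case below), as if it had been moved past the
barrier and merged with the later store.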

Tested on godbolt: https://goo.gl/62ZCAS with x86 g++ 4.7.3 to 5.2, ARM g++
4.8.2, and PowerPC g++ 4.8.2.


Test case:

#include <atomic>
std::atomic<int> a;
int b(0);

using namespace std;
void set(void) {
  b = 2;
  a.store(1, memory_order_release);
  b = 3;
}


ASM output:
set(): # x86-64 g++ -std=c++11 -O3
        # missing b=2 store before the MO_release store
        movl    $1, a(%rip)
        movl    $3, b(%rip)
        ret


set(): # ARM g++ -std=c++11 -O3
        movw    r3, #:lower16:.LANCHOR0
        movt    r3, #:upper16:.LANCHOR0
        movs    r2, #2
        movs    r1, #1
        str     r2, [r3]   # b = 2
        movs    r2, #3
        dmb     sy       # full memory barrier.  Couldn't this be just a
                         # dmb st (StoreStore barrier)?
        str     r1, [r3, #4]  # a = 1
        str     r2, [r3]      # b = 3
        bx      lr


Changing the atomic store to:
  atomic_thread_fence(memory_order_release);  // stand-alone StoreStore barrier
  a.store(1, memory_order_relaxed);

changes the x86 asm to be what I expected (and what clang produces for both
versions):
        movl    $2, b(%rip)
        movl    $1, a(%rip)
        movl    $3, b(%rip)
        ret

(With no change to ARM or PowerPC asm)
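
For anyone who wants to reproduce the comparison, here are both variants as a
single translation unit that can be pasted into the compiler explorer.  The
function names set_release and set_fence are made up for this sketch (the
original test case only has set()); the bodies are the two versions described
above.

```cpp
#include <atomic>

std::atomic<int> a;
int b = 0;

// Variant 1: release store.  This is the version for which x86 g++
// emitted no store of 2 to b.
void set_release() {
  b = 2;
  a.store(1, std::memory_order_release);
  b = 3;
}

// Variant 2: stand-alone release fence followed by a relaxed store.
// For this version x86 g++ emitted all three stores in program order.
void set_fence() {
  b = 2;
  std::atomic_thread_fence(std::memory_order_release);
  a.store(1, std::memory_order_relaxed);
  b = 3;
}
```

Compile with -std=c++11 -O3 and compare the asm for the two functions.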


http://en.cppreference.com/w/cpp/atomic/atomic_thread_fence points out that,
unlike a MO_release fence, a std::atomic::store(MO_release) is only a one-way
barrier: it can move downward so it appears after a later relaxed store.  In
other words, it's not a normal StoreStore barrier.

So I think this would be valid output for the a.store(1, MO_release) version,
but not the thread_fence(MO_release) version:

        movl    $3, b(%rip)   # b=3 can appear before a=1, so b=2 never needs
                              # to happen
        movl    $1, a(%rip)
        ret
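
To make the ordering I expect from a release store concrete, here is a minimal
message-passing sketch.  It is not part of the test case: the names ready,
payload and run_message_passing are hypothetical, and it deliberately leaves
out the trailing plain store from set(), because a reader of the plain
variable would race with that store.  With only the store before the release,
a reader that sees the flag through an acquire load has to see the payload.

```cpp
#include <atomic>
#include <thread>

// Hypothetical names for illustration; not part of the test case above.
std::atomic<int> ready{0};
int payload = 0;

// Returns the payload value observed by the reader thread.
int run_message_passing() {
  ready.store(0, std::memory_order_relaxed);
  payload = 0;
  int seen = -1;

  std::thread reader([&seen] {
    // Spin until the acquire load observes the writer's release store.
    while (ready.load(std::memory_order_acquire) != 1) {
      std::this_thread::yield();
    }
    // The acquire load synchronizes with the release store, so the
    // earlier plain store to payload is visible here.
    seen = payload;
  });

  payload = 2;                                // plain store before the release
  ready.store(1, std::memory_order_release);  // publishes payload

  reader.join();
  return seen;
}
```

Depending on the toolchain, this may need -pthread in addition to -std=c++11.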

BTW, I noticed this while writing up an answer to
http://stackoverflow.com/questions/32384901/atomic-operations-stdatomic-and-ordering-of-writes.
