https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697

            Bug ID: 65697
           Summary: __atomic memory barriers not strong enough for __sync
                    builtins
           Product: gcc
           Version: 5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: matthew.wahab at arm dot com

The __sync builtins are implemented by expanding them to the equivalent
__atomic builtins, using MEMMODEL_SEQ_CST for those that are full barriers.
This is too weak: __atomic operations with MEMMODEL_SEQ_CST still allow
some data references to move across the operation (see
https://gcc.gnu.org/ml/gcc/2014-02/msg00058.html), while a __sync full barrier
should block all movement
(https://gcc.gnu.org/onlinedocs/gcc/_005f_005fsync-Builtins.html#_005f_005fsync-Builtins).
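
As an illustrative sketch at the source level (the function name is mine,
and the trailing fence is one possible way to restore full-barrier
semantics, not necessarily the fix GCC should adopt):
----
int foo;

int sync_like_fetch_add(void)
{
  /* Roughly what the __sync builtin is currently expanded to: */
  int old = __atomic_fetch_add(&foo, 1, __ATOMIC_SEQ_CST);

  /* On AArch64 the SEQ_CST operation alone still lets later plain
     accesses be speculated past it; full-barrier semantics would need
     an additional fence, e.g.: */
  __atomic_thread_fence(__ATOMIC_SEQ_CST);

  return old;
}
----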

This is a problem for AArch64, where data references after the barrier could be
speculated ahead of it.

For example, compiling with aarch64-none-linux-gnu at -O2,
----
int foo = 0;
int bar = 0;

int T1(void)
{
  int x = __sync_fetch_and_add(&foo, 1);  /* full barrier */
  return bar;  /* must not be read ahead of the barrier */
}
----
produces
----
T1:
    adrp    x0, .LANCHOR0
    add     x0, x0, :lo12:.LANCHOR0
.L2:
    ldaxr   w1, [x0]        ; load-acquire (__sync_fetch_and_add)
    add     w1, w1, 1
    stlxr   w2, w1, [x0]    ; store-release (__sync_fetch_and_add)
    cbnz    w2, .L2
    ldr     w0, [x0, 4]     ; load (return bar)
    ret
----
With this code, the load of bar can be executed ahead of the store-release that
ends the __sync_fetch_and_add. A correct implementation should emit a dmb
instruction after the cbnz.
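
For reference, a sketch of the fixed sequence (assuming the barrier is
dmb ish, the usual full inner-shareable data memory barrier; the exact
form GCC chooses may differ):
----
T1:
    adrp    x0, .LANCHOR0
    add     x0, x0, :lo12:.LANCHOR0
.L2:
    ldaxr   w1, [x0]        ; load-acquire (__sync_fetch_and_add)
    add     w1, w1, 1
    stlxr   w2, w1, [x0]    ; store-release (__sync_fetch_and_add)
    cbnz    w2, .L2
    dmb     ish             ; full barrier: the load below cannot pass it
    ldr     w0, [x0, 4]     ; load (return bar)
    ret
----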

GCC info:
gcc version 5.0.0 20150407 (experimental) 
Target: aarch64-none-linux-gnu
