https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697
Bug ID:    65697
Summary:   __atomic memory barriers not strong enough for __sync builtins
Product:   gcc
Version:   5.0
Status:    UNCONFIRMED
Severity:  normal
Priority:  P3
Component: target
Assignee:  unassigned at gcc dot gnu.org
Reporter:  matthew.wahab at arm dot com

The __sync builtins are implemented by expanding them to the equivalent
__atomic builtins, using MEMMODEL_SEQ_CST for those that are full barriers.
This is too weak: __atomic operations with MEMMODEL_SEQ_CST still allow some
data references to move across the operation (see
https://gcc.gnu.org/ml/gcc/2014-02/msg00058.html), while a __sync full
barrier should block all movement
(https://gcc.gnu.org/onlinedocs/gcc/_005f_005fsync-Builtins.html#_005f_005fsync-Builtins).

This is a problem for AArch64, where data references after the barrier could
be speculated ahead of it. For example, compiling with
aarch64-none-linux-gnu at -O2,

----
int foo = 0;
int bar = 0;

int T1(void)
{
  int x = __sync_fetch_and_add(&foo, 1);
  return bar;
}
----

produces

----
T1:
        adrp    x0, .LANCHOR0
        add     x0, x0, :lo12:.LANCHOR0
.L2:
        ldaxr   w1, [x0]        ; load-acquire (__sync_fetch_and_add)
        add     w1, w1, 1
        stlxr   w2, w1, [x0]    ; store-release (__sync_fetch_and_add)
        cbnz    w2, .L2
        ldr     w0, [x0, 4]     ; load (return bar)
        ret
----

With this code, the load of bar can be executed ahead of the store-release
that ends the __sync_fetch_and_add. A correct implementation should emit a
dmb instruction after the cbnz.

GCC info:
gcc version 5.0.0 20150407 (experimental)
Target: aarch64-none-linux-gnu