Issue 177236
Summary atomic_store_n does not provide sequential consistency for ARMv8-A targets (DMB barrier missing)
Labels
Assignees
Reporter aleh-kazakevich
**Foreword**

I apologize for being naive (and this is my first bug report on GitHub), but it looks like the builtin atomics for ARM64 are seriously broken...

**Problem**

For ARMv8-A targets, the '__atomic_store_n' builtin function with the '__ATOMIC_SEQ_CST' parameter does not provide sequential consistency: the DMB barrier is missing from the generated machine code. C++ atomics have the same problem.

Example code:
```
#include <atomic>

int g_builtinX = 0;
int g_builtinY = 0;

void testBuiltins()
{
    __atomic_store_n(&g_builtinX, 65, __ATOMIC_SEQ_CST);
    __atomic_load_n(&g_builtinY, __ATOMIC_RELAXED);
}

std::atomic<int> g_atomicX;
std::atomic<int> g_atomicY;

void testAtomics()
{
    g_atomicX.store(65, std::memory_order_seq_cst);
    g_atomicY.load(std::memory_order_relaxed);
}

int main()
{
    testBuiltins();
    testAtomics();
    return 0;
}
```

Machine code:
```
_Z12testBuiltinsv:                      // @_Z12testBuiltinsv
	adrp	x8, g_builtinX
	add	x8, x8, :lo12:g_builtinX
	mov	w9, #65
	stlr	w9, [x8]                   // STLR
	adrp	x8, g_builtinY
	ldr	wzr, [x8, :lo12:g_builtinY]    // LDR
	ret

_Z11testAtomicsv:                       // @_Z11testAtomicsv
	adrp	x8, g_atomicX
	add	x8, x8, :lo12:g_atomicX
	mov	w9, #65
	stlr	w9, [x8]                   // STLR
	adrp	x8, g_atomicY
	ldr	wzr, [x8, :lo12:g_atomicY]      // LDR
	ret
```

**Description**

According to the ARMv8-A memory model, the 'STLR/LDR' sequence may be executed out of order.

The STLR instruction has release semantics:
> * All explicit memory accesses before the STLR are observed before the STLR.
> * All explicit memory accesses after the STLR are not affected, and may be reordered with respect to the STLR.

The LDR instruction has no ordering constraints and thus may cross the release barrier above. In terms of C++, we have only the 'release' memory order here, not 'sequentially consistent'.

Load-Acquire and Store-Release instructions
https://developer.arm.com/documentation/102336/0100/Load-Acquire-and-Store-Release-instructions
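The reordering described above is the one probed by the classic store-buffering litmus test. A minimal sketch follows, with hypothetical names (`SbResult`, `runStoreBuffering`) that are not part of the original report. Whether the both-zero outcome is ever observed depends on the hardware, the compiler and timing, so the sketch only counts occurrences and makes no claim about the result.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper type: outcome of a store-buffering run.
struct SbResult {
    int iterations;
    int bothZero;  // iterations in which both threads read 0
};

// Runs the store-buffering pattern `iterations` times. Each thread performs
// a seq_cst store followed by a relaxed load of the other variable, which
// mirrors the store/load pair in the report.
SbResult runStoreBuffering(int iterations)
{
    SbResult result{iterations, 0};
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0};
        std::atomic<int> y{0};
        int r1 = -1;
        int r2 = -1;

        std::thread t1([&] {
            x.store(1, std::memory_order_seq_cst);
            r1 = y.load(std::memory_order_relaxed);
        });
        std::thread t2([&] {
            y.store(1, std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_relaxed);
        });
        t1.join();
        t2.join();

        // Both loads returning 0 means each load was satisfied before the
        // other thread's store became visible.
        if (r1 == 0 && r2 == 0)
            ++result.bothZero;
    }
    return result;
}
```

Spawning fresh threads per iteration keeps the sketch short, but thread start-up skew makes the interesting window hard to hit; a serious litmus harness would keep the threads alive and synchronize the start of each round.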


The same issue can easily be reproduced when compiling with the "RCpc" extension and/or for ARMv8.3-A or higher.

Source code:
```
__atomic_store_n(&g_builtinX, 65, __ATOMIC_SEQ_CST);
__atomic_load_n(&g_builtinY, __ATOMIC_ACQUIRE); // <--- It was relaxed in the sample above.
```

Machine code:
```
_Z12testBuiltinsv:                      // @_Z12testBuiltinsv
	mov	w8, #65
	adrp	x9, g_builtinX
	add	x9, x9, :lo12:g_builtinX
	stlr	w8, [x9]                    // STLR
	adrp	x8, g_builtinY
	add	x8, x8, :lo12:g_builtinY
	ldapr	wzr, [x8]                   // LDAPR
```

As in the examples above, the ARMv8-A memory model permits the LDAPR instruction to 'move up' and cross the STLR barrier (that's what it was designed for):

LDAPR
https://developer.arm.com/documentation/ddi0602/2023-06/Base-Instructions/LDAPR--Load-Acquire-RCpc-Register-?lang=en

> There is no ordering requirement, separate from the requirements of a Load-AcquirePC or a Store-Release, created by having a Store-Release followed by a Load-AcquirePC instruction.

Enabling the LDAPR instructions for C/C++ compilers
https://developer.arm.com/community/arm-community-blogs/b/tools-software-ides-blog/posts/enabling-rcpc-in-gcc-and-llvm


**(Possible) solution**

Insert a DMB instruction immediately after the STLR. This approach is used in MSVC, for example:
```
STLR
DMB ISH; Nobody can cross this line in any direction
LDR, LDAPR
```
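The same barrier placement can be requested at source level. A minimal sketch, assuming hypothetical names (`g_fencedX`, `g_fencedY`, `storeThenLoadFenced`) that are not part of the original example: a `std::atomic_thread_fence(std::memory_order_seq_cst)` is placed between the store and the load. On AArch64 this fence is expected to be lowered to `dmb ish`, but that is an assumption worth confirming in the generated assembly for the compiler and flags in use.

```cpp
#include <atomic>

// Hypothetical globals mirroring g_atomicX / g_atomicY from the example.
std::atomic<int> g_fencedX{0};
std::atomic<int> g_fencedY{0};

// Stores to X, issues a full fence, then loads Y.
int storeThenLoadFenced()
{
    g_fencedX.store(65, std::memory_order_seq_cst);
    // Full two-way fence between the store above and the load below.
    std::atomic_thread_fence(std::memory_order_seq_cst);
    return g_fencedY.load(std::memory_order_relaxed);
}
```

The explicit fence has a cost on every call, so it only makes sense on paths where the store-then-load ordering is actually required.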

**Environment**

Clang version: 21.1.6

(store/seq_cst + load/relaxed, ARMv8.1-A)
clang++ -S -c ./main.cpp -o ./clang-libstdcxx.S -march=armv8.1-a -O2
clang++ -S -c ./main.cpp -o ./clang-libcxx-rtlib.S -march=armv8.1-a -O2 -stdlib=libc++ -rtlib=compiler-rt

(store/seq_cst + load/acquire, ARMv8.3-A)
clang++ -S -c ./main.cpp -o ./clang-libstdcxx.S -march=armv8.3-a -O1
clang++ -S -c ./main.cpp -o ./clang-libcxx-rtlib.S -march=armv8.3-a -O1 -stdlib=libc++ -rtlib=compiler-rt

_Interestingly, the GCC compiler has exactly the same issue...
 (g++ 15.2.1 20251112)_