https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59767
torvald at gcc dot gnu.org changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |torvald at gcc dot gnu.org
--- Comment #8 from torvald at gcc dot gnu.org ---
(In reply to mikulas from comment #7)
> #include <stdatomic.h>
>
> atomic_int a = ATOMIC_VAR_INIT(0);
> atomic_int b = ATOMIC_VAR_INIT(0);
> atomic_int p = ATOMIC_VAR_INIT(0);
>
> int thread_1(void)
> {
>     atomic_store_explicit(&b, 1, memory_order_relaxed);
>     atomic_load_explicit(&p, memory_order_seq_cst);
>     return atomic_load_explicit(&a, memory_order_relaxed);
> }
>
> int thread_2(void)
> {
>     atomic_store_explicit(&a, 1, memory_order_relaxed);
>     atomic_load_explicit(&p, memory_order_seq_cst);
>     return atomic_load_explicit(&b, memory_order_relaxed);
> }
>
> See for example this. Suppose that thread_1 and thread_2 are executed
> concurrently. If memory_order_seq_cst were a proper full memory barrier, it
> would be impossible that both functions return 0.
memory_order_seq_cst is a memory order in the Standard's terminology. Fences
are something else (i.e., atomic_thread_fence()), and are parametrized by a
memory order. A memory_order_seq_cst *memory access* does not have the same
effects as a memory_order_seq_cst fence. See C++14 29.3p4-7; those paragraphs
talk about memory_order_seq_cst fences specifically, not about
memory_order_seq_cst operations in general.
If you want to make this example of Dekker synchronization correct, you need to
replace the accesses to p with seq-cst fences; alternatively, you need to use
seq-cst accesses for all the stores and loads to a and b, in which case HW
fences will be added via the stores (as Andrew already pointed out).
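
For illustration, here is a minimal sketch of both options. The function names
(thread_1_fence, thread_1_sc, and so on) are made up for this example and are
not from the original report. Assuming each pair of functions is run
concurrently, one function per thread, starting from a == b == 0, neither pair
can have both functions return 0.

```c
#include <stdatomic.h>

/* Static atomics are zero-initialized, which is a valid initial state. */
static atomic_int a;
static atomic_int b;

/* Option 1: relaxed accesses separated by a seq-cst fence. */
int thread_1_fence(void)
{
    atomic_store_explicit(&b, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&a, memory_order_relaxed);
}

int thread_2_fence(void)
{
    atomic_store_explicit(&a, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&b, memory_order_relaxed);
}

/* Option 2: no fences, but every access to a and b is seq-cst. */
int thread_1_sc(void)
{
    atomic_store_explicit(&b, 1, memory_order_seq_cst);
    return atomic_load_explicit(&a, memory_order_seq_cst);
}

int thread_2_sc(void)
{
    atomic_store_explicit(&a, 1, memory_order_seq_cst);
    return atomic_load_explicit(&b, memory_order_seq_cst);
}
```

In option 1 the two fences are totally ordered, so the load that follows the
later fence must observe the store that precedes the earlier one. In option 2
the four accesses themselves take part in the single total order of seq-cst
operations, which rules out the outcome where both loads read 0.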