https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59767
--- Comment #7 from mikulas at artax dot karlin.mff.cuni.cz ---

#include <stdatomic.h>

atomic_int a = ATOMIC_VAR_INIT(0);
atomic_int b = ATOMIC_VAR_INIT(0);
atomic_int p = ATOMIC_VAR_INIT(0);

int thread_1(void)
{
        atomic_store_explicit(&b, 1, memory_order_relaxed);
        atomic_load_explicit(&p, memory_order_seq_cst);
        return atomic_load_explicit(&a, memory_order_relaxed);
}

int thread_2(void)
{
        atomic_store_explicit(&a, 1, memory_order_relaxed);
        atomic_load_explicit(&p, memory_order_seq_cst);
        return atomic_load_explicit(&b, memory_order_relaxed);
}

See, for example, the code above. Suppose that thread_1 and thread_2 are
executed concurrently. If memory_order_seq_cst were a proper full memory
barrier, it would be impossible for both functions to return 0.

Because you omit the barrier on the read of variable p, it is possible that
both functions return 0.

thread_1 is compiled into

        movl    $1, b(%rip)
        movl    p(%rip), %eax
        movl    a(%rip), %eax
        ret

thread_2 is compiled into

        movl    $1, a(%rip)
        movl    p(%rip), %eax
        movl    b(%rip), %eax
        ret

... and the processor is free to move the writes past the reads, resulting in
both functions returning zero.

Does the standard allow this behavior? I don't really know. I don't understand
the standard. Please tell me: how do you decide, by interpreting the claims in
section 7.17.3 of the C11 standard, whether the above outcome is allowed or
not?
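
For comparison, here is a minimal sketch of a variant that asks for the full barrier explicitly. It is not the code from the report: it swaps the seq_cst load of p for atomic_thread_fence(memory_order_seq_cst), and the _fenced names are hypothetical, chosen only to keep it apart from the test case above. As far as I understand the fence rules in the C11 standard, placing a seq_cst fence between the relaxed store and the relaxed load in both threads should rule out the outcome where both functions return 0, and on x86-64 a compiler would typically emit a full barrier instruction such as mfence for it.

```c
#include <stdatomic.h>

/* Hypothetical variant of the test case. The _fenced suffix marks these
   names as illustrative; they do not appear in the original report. */
static atomic_int a_fenced = 0;
static atomic_int b_fenced = 0;

int thread_1_fenced(void)
{
        atomic_store_explicit(&b_fenced, 1, memory_order_relaxed);
        /* Explicit seq_cst fence between the store and the load, used here
           in place of the seq_cst load of an unrelated variable. */
        atomic_thread_fence(memory_order_seq_cst);
        return atomic_load_explicit(&a_fenced, memory_order_relaxed);
}

int thread_2_fenced(void)
{
        atomic_store_explicit(&a_fenced, 1, memory_order_relaxed);
        /* Same fence on the other side; both threads need one. */
        atomic_thread_fence(memory_order_seq_cst);
        return atomic_load_explicit(&b_fenced, memory_order_relaxed);
}
```

To reproduce the concurrent scenario, thread_1_fenced and thread_2_fenced would each be run from its own thread, starting from a_fenced == 0 and b_fenced == 0.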