On 05/26/2017 04:58 AM, Alexander Monakov wrote: > On Wed, 17 May 2017, Alexander Monakov wrote: > >> Ping. > > Ping^2? > >> (to be clear, patch 2/2 is my previous followup in this thread, I forgot to >> adjust the subject line; it should have said: >> "[PATCH 2/2] x86: add compiler memory barriers when expanding atomic_load"). >> >> On Wed, 10 May 2017, Alexander Monakov wrote: >> >>> Hi, >>> >>> When expanding __atomic_thread_fence(x) to RTL, the i386 backend doesn't >>> emit >>> any instruction except for x==__ATOMIC_SEQ_CST (which emits 'mfence'). >>> This >>> is incorrect: although no machine barrier is needed, the compiler still must >>> emit a compiler barrier into the IR to prevent propagation and code motion >>> across the fence. The testcase added with the patch shows how it can lead >>> to a miscompilation. >>> >>> The proposed patch fixes it by handling non-seq-cst fences exactly like >>> __atomic_signal_fence is expanded, by emitting asm volatile("":::"memory"). >>> >>> The s390 backend uses the a similar mem_thread_fence expansion, so the patch >>> fixes both backends in the same manner. >>> >>> Bootstrapped and regtested on x86_64; also checked that s390-linux cc1 >>> successfully builds after the change. OK for trunk? >>> >>> (the original source code in the PR was misusing atomic fences by doing >>> something like >>> >>> void f(int *p) >>> { >>> while (*p) >>> __atomic_thread_fence(__ATOMIC_ACQUIRE); >>> } >>> >>> but since *p is not atomic, a concurrent write to *p would cause a data >>> race and >>> thus invoke undefined behavior; also, if *p is false prior to entering the >>> loop, >>> execution does not encounter the fence; new test here has code usable >>> without UB) >>> >>> Alexander >>> >>> * config/i386/sync.md (mem_thread_fence): Emit a compiler barrier for >>> non-seq-cst fences. Adjust comment. >>> * config/s390/s390.md (mem_thread_fence): Likewise. >>> * optabs.c (expand_asm_memory_barrier): Export. >>> * optabs.h (expand_asm_memory_barrier): Declare. >>> testsuite/ >>> * gcc.target/i386/pr80640-1.c: New testcase. So I think this is up to the target maintainers. I have no concerns with enabling use of expand_asm_memory_barrier to be used outside of optabs. So if the s390/x86 maintainers want to go forward, the optabs changes are pre-approved.
The reasoning seems sound to me -- we may not need fencing at the hardware level because it gives a certain set of guarantees, but we may still need them at the compiler level to prevent undesirable code motions across the fence point in the compiler. Jeff