On 05/26/2017 04:58 AM, Alexander Monakov wrote:
> On Wed, 17 May 2017, Alexander Monakov wrote:
> 
>> Ping.
> 
> Ping^2?
> 
>> (to be clear, patch 2/2 is my previous followup in this thread, I forgot to
>> adjust the subject line; it should have said:
>> "[PATCH 2/2] x86: add compiler memory barriers when expanding atomic_load").
>>
>> On Wed, 10 May 2017, Alexander Monakov wrote:
>>
>>> Hi,
>>>
>>> When expanding __atomic_thread_fence(x) to RTL, the i386 backend doesn't 
>>> emit
>>> any instruction except for x==__ATOMIC_SEQ_CST (which emits 'mfence').  
>>> This 
>>> is incorrect: although no machine barrier is needed, the compiler still must
>>> emit a compiler barrier into the IR to prevent propagation and code motion
>>> across the fence.  The testcase added with the patch shows how it can lead
>>> to a miscompilation.
>>>
>>> The proposed patch fixes it by handling non-seq-cst fences exactly like
>>> __atomic_signal_fence is expanded, by emitting asm volatile("":::"memory").
>>>
>>> The s390 backend uses the a similar mem_thread_fence expansion, so the patch
>>> fixes both backends in the same manner.
>>>
>>> Bootstrapped and regtested on x86_64; also checked that s390-linux cc1
>>> successfully builds after the change.  OK for trunk?
>>>
>>> (the original source code in the PR was misusing atomic fences by doing
>>> something like
>>>
>>>   void f(int *p)
>>>   {
>>>     while (*p)
>>>       __atomic_thread_fence(__ATOMIC_ACQUIRE);
>>>   }
>>>
>>> but since *p is not atomic, a concurrent write to *p would cause a data 
>>> race and
>>> thus invoke undefined behavior; also, if *p is false prior to entering the 
>>> loop,
>>> execution does not encounter the fence; new test here has code usable 
>>> without UB)
>>>
>>> Alexander
>>>
>>>     * config/i386/sync.md (mem_thread_fence): Emit a compiler barrier for
>>>     non-seq-cst fences.  Adjust comment.
>>>     * config/s390/s390.md (mem_thread_fence): Likewise.
>>>     * optabs.c (expand_asm_memory_barrier): Export.
>>>     * optabs.h (expand_asm_memory_barrier): Declare.
>>> testsuite/
>>>     * gcc.target/i386/pr80640-1.c: New testcase.
So I think this is up to the target maintainers.  I have no concerns
with enabling use of expand_asm_memory_barrier to be used outside of
optabs.  So if the s390/x86 maintainers want to go forward, the optabs
changes are pre-approved.

The reasoning seems sound to me -- we may not need fencing at the
hardware level because it gives a certain set of guarantees, but we may
still need them at the compiler level to prevent undesirable code
motions across the fence point in the compiler.

Jeff

Reply via email to