On 07/31/2017 11:02 AM, Alexander Monakov wrote:
> On Mon, 31 Jul 2017, Jeff Law wrote:
>>>> In the middle end patch, do we need a barrier before the fence as well?
>>>> The post-fence barrier prevents reordering the fence with anything which
>>>> follows the fence.  But do we have to also prevent reordering the fence
>>>> with prior instructions with any of the memory models?  This isn't my
>>>> area of expertise, so if it's dumb question, don't hesitate to let me
>>>> know :-)
>>>
>>> That depends on how pessimistic we want to be with respect to backend
>>> getting it wrong.  My expectation here is that if a backend emits non-empty
>>> RTL, the produced sequence for the fence itself acts as a compiler memory
>>> barrier.
>> Perhaps. But do we really want to rely on that?  EMitting a scheduling
>> barrier prior to these atomics is virtually free.
> 
> Please consider that expand_mem_thread_fence is used to place fences around
> seq-cst atomic loads&stores when the backend doesn't provide a direct pattern.
> With compiler barriers on both sides of the machine barrier, the generated
> sequence for a seq-cst atomic load will be 7 insns:
> 
>   asm volatile ("":::"memory");
>   machine_seq_cst_fence ();
>   asm volatile ("":::"memory");
>   dst = mem[src];
>   asm volatile ("":::"memory");
>   machine_seq_cst_fence ();
>   asm volatile ("":::"memory");
> 
> I can easily imagine people looking at RTL dumps with this overkill fencing
> being unhappy about this.
But the extra fences aren't actually going to impact anything except
perhaps an unmeasurable compile-time hit.  ie, it may look bad, but I'd
have a hard time believing it matters in practice.

> 
> I'd be more happy with detecting empty expansion via get_last_insn ().
That seems like an unnecessary complication to me.  I'd prefer instead
to just emit the necessary fencing in the generic code and update the MD
docs so that maintainers know they don't have to emit the RTL fences
themselves.

Jeff

Reply via email to