On Thu, Jun 2, 2016 at 9:08 PM, Richard Henderson <r...@twiddle.net> wrote:
> On 06/02/2016 02:37 PM, Sergey Fedorov wrote:
>>
>>
>> It would give us three TCG operations for each memory operation instead
>> of one. But then we might like to combine these barrier operations back
>> with memory operations in each backend. If we propagate memory ordering
>> semantics up to the backend, it can decide itself what instructions are
>> best to generate.
>
>
> A strongly ordered target would generally only set BEFORE bits or AFTER
> bits, but not both (and I suggest we canonicalize on AFTER for all such
> targets). Thus a strongly ordered target would produce only 2 opcodes per
> memory op.
>
> I supplied both to make it easier to handle a weakly ordered target with
> acquire/release bits.
>
> I would *not* combine the barrier operations back with memory operations in
> the backend.  Only armv8 and ia64 can do that, and given the optimization
> level at which we generate code, I doubt it would really make much
> difference above separate barriers.
>

On armv8, using load_acquire/store_release instructions makes a
significant difference in performance when compared to plain
dmb+memory instruction sequence. So I would really like to keep the
option of generating acq/rel instructions(by combining barrier and
memory or some other way) open.

Thanks,
-- 
Pranith

Reply via email to