On 5/17/19 1:44 PM, Alex Elder wrote:
> On 5/17/19 1:33 PM, Arnd Bergmann wrote:
>> On Fri, May 17, 2019 at 8:08 PM Alex Elder <el...@linaro.org>
>> wrote:
>>> 
>>> On 5/15/19 2:34 AM, Arnd Bergmann wrote:
>>>>> +static void gsi_trans_tre_fill(struct gsi_tre *dest_tre,
>>>>> dma_addr_t addr, +                              u32 len, bool
>>>>> last_tre, bool bei, +                              enum
>>>>> ipa_cmd_opcode opcode) +{ +       struct gsi_tre tre; + +
>>>>> tre.addr = cpu_to_le64(addr); +       tre.len_opcode =
>>>>> gsi_tre_len_opcode(opcode, len); +       tre.reserved = 0; +
>>>>> tre.flags = gsi_tre_flags(last_tre, bei, opcode); + +
>>>>> *dest_tre = tre;        /* Write TRE as a single (16-byte)
>>>>> unit */ +}
>>>> Have you checked that the atomic write is actually what happens
>>>> here, but looking at the compiler output? You might need to add
>>>> a 'volatile' qualifier to the dest_tre argument so the
>>>> temporary structure doesn't get optimized away here.
>>> 
>>> Currently, the assignment *does* become a "stp" instruction. But
>>> I don't know that we can *force* the compiler to write it as a
>>> pair of registers, so I'll soften the comment with "Attempt to
>>> write" or something similar.
>>> 
>>> To my knowledge, adding a volatile qualifier only prevents the 
>>> compiler from performing funny optimizations, but that has no 
>>> effect on whether the 128-bit assignment is made as a single 
>>> unit.  Do you know otherwise?
>> 
>> I don't think it you can force the 128-bit assignment to be atomic,
>> but marking 'dest_tre' should serve to prevent a specific
>> optimization that replaces the function with
>> 
>> dest_tre->addr = ... dest_tre->len_opcode = ... dest_tre->reserved
>> = ... dest_tre->flags = ...
>> 
>> which it might find more efficient than the stp and is equivalent 
>> when the pointer is not marked volatile. We also have the
>> WRITE_ONCE() macro that can help prevent this, but it does not work
>> reliably beyond 64 bit assignments.
> 
> OK, I'll mark it volatile to avoid that potential result.

OK I got interesting results so I wanted to report back.

The way it is currently written (no volatile qualifier) is
the *only* way I get a 16-byte store instruction.

Specifically, with dest_tre having type (struct gsi_tre *):
        *dest_tre = tre;

Produces this:
 4ec: a9002013        stp     x19, x8, [x0]

I attempted to make the assigned-to pointer volatile:
        *(volatile struct gsi_tre *)dest_tre = tre;

But that apparently is interpreted as "assign things
in the destination in exactly the way they were assigned
above into the "tre" structure."  Because it produced this:
 748: f9000348        str     x8, [x26]
 74c: 7940c3e8        ldrh    w8, [sp, #96]
 750: 79001348        strh    w8, [x26, #8]
 754: 7940bbe8        ldrh    w8, [sp, #92]
 758: 79001748        strh    w8, [x26, #10]
 75c: b9405be8        ldr     w8, [sp, #88]
 760: b9000f48        str     w8, [x26, #12]

>From there I went back and changed the type of the dest_tre pointer
parameter to (volatile struct gsi_tre *), and changed the the type
of the values passed through that argument in the two callers to
also have the volatile qualifier.  This way there was no need to
use a cast in the left-hand side.  That too produced a string of
separate assignments, not a single 128-bit one.

I also attempted this:
        *dest_tre = *(volatile struct gsi_tre *)&tre;
And even this:
        *(volatile struct gsi_tre *)dest_tre = *(volatile struct gsi_tre *)&tre;
But neither produced a single "stp" instruction; all produced
the same sequence of instructions above.

So it seems that I must *not* apply a volatile qualifier,
because doing so restricts the compiler from making the
single instruction optimization.

If I've missed something and you have another suggestion for
me to try let me know and I'll try it.

                                        -Alex


> Thanks.
> 
> -Alex
> 
>> 
>> Arnd
>> 
> 

Reply via email to