On Fri, Sep 26, 2014 at 12:37 PM, David Wohlferd <d...@limegreensocks.com> 
wrote:
>
> On 9/25/2014 12:36 AM, Yury Gribov wrote:
>>
>> On 09/24/2014 12:31 PM, Richard Biener wrote:
>>>
>>> On Wed, Sep 24, 2014 at 9:43 AM, David Wohlferd <d...@limegreensocks.com>
>>> wrote:
>>>>
>>>> Hans-Peter Nilsson: I should have listened to you back when you raised
>>>> concerns about this.  My apologies for ever doubting you.
>>>>
>>>> In summary:
>>>>
>>>> - The "trick" in the docs for using an arbitrarily sized struct to force
>>>> register flushes for inline asm does not work.
>>>> - Placing the inline asm in a separate routine can sometimes mask the
>>>> problem with the trick not working.
>>>> - The sample that has been in the docs forever performs an unhelpful,
>>>> unexpected, and probably unwanted stack allocation + memcpy.
>>>>
>>>> Details:
>>>>
>>>> Here is the text from the docs:
>>>>
>>>> -----------
>>>> One trick to avoid [using the "memory" clobber] is available if the size
>>>> of
>>>> the memory being accessed is known at compile time. For example, if
>>>> accessing ten bytes of a string, use a memory input like:
>>>>
>>>>      "m"( ({ struct { char x[10]; } *p = (void *)ptr ; *p; }) )
>>>
>>>
>>> Well - this can't work because you essentially are using a _value_
>>> here (looking at the GIMPLE - I'm not sure if a statement expression
>>> evaluates to an lvalue.
>>>
>>> It should work if you simply do this without a stmt expression:
>>>
>>>    "m" (*(struct { char x[10]; } *)ptr)
>>>
>>> because that's clearly an lvalue (and the GIMPLE correctly says so):
>>>
>>>    <bb 2>:
>>>    c.a = 1;
>>>    c.b = 2;
>>>    __asm__ __volatile__("rep; stosb" : "=D" Dest_4, "=c" Count_5 : "0"
>>> &c, "a" 0, "m" MEM[(struct foo *)&c], "1" 8);
>>>    printf ("%u %u\n", 1, 2);
>>>
>>> note that we still constant propagated 1 and 2 for the reason that
>>> the asm didn't get any VDEF.  That's because you do not have any
>>> memory output!  So while it keeps 'c' live it doesn't consider it
>>> modified by the asm.  You'd still need to clobber the memory,
>>> but "m" clobbers are not supported, only "memory".
>>>
>>> Thus fixed asm:
>>>
>>>
>>>        __asm__ __volatile__ ("rep; stosb"
>>>             : "=D" (Dest), "+c" (Count)
>>>             : "0" (&c), "a" (0),
>>>             "m" (*( struct foo { char x[8]; } *)&c)
>>>             : "memory"
>>>        );
>>>
>>> where I'm not 100% sure if the "m" input is now pointless (that is,
>>> if a "memory" clobber also constitutes a use of all memory).
>>
>>
>> Or maybe even
>>   __asm__ __volatile__ ("rep; stosb"
>>        : "=D" (Dest), "+c" (Count), "+m" (*(struct foo { char x[8]; }
>> *)&c)
>>        : "0" (&c), "a" (0)
>>   );
>> to avoid the big-hammer memory clobber?
>>
>> -Y
>>
>
> Thank you both for the responses.  At this point I've started composing some
> replacement text for the docs (below), cuz clearly what's there is both
> inadequate and wrong.  All comments are welcome.
>
> While the code in Richard's response will always produce the correct
> results, the intent here is to use memory constraints to *avoid* the
> performance penalties of the memory clobber.  The existing docs say this
> should work, and I've seen a number of places using it (linux kernel, glibc,
> etc).  If this does work, we should update the docs to say how it's done.
> If this doesn't work, we should say that too.
>
> Looking at Yury's response, that code does work (in c, not c++).  At least
> it does sometimes.  The problem is that sometimes gcc can "lose track" of
> what a pointer points to.  And when it does, gcc can get confused about what
> to flush.  Here's a simple example that shows this (4.9.0, x64, -O2, 'c'):
>
> #include <stdio.h>
>
> inline void *memset( void * Dest, int c, size_t Count) {
>    void *dummy;
>
>    __asm__ (
>       "rep stosb"
>       : "=D" (dummy), "+c" (Count), "=m" (*( struct foo { char x[8]; }
> *)Dest)
>       : "0" (Dest), "a" (c)
>       : "cc"//, "memory"
>    );
>
>    return Dest;
> }
>
> int main() {
>    struct {
>      int a;
>    } c;
>
>    void *v;
>    asm volatile("#" : "=r" (v) : "0" (&c)  );
>
>    c.a = 0x30303030;
>
>    //memset(&c, 0, sizeof(c));
>    memset(v, 0, sizeof(c));
>    printf("%x\n", c.a);
> }
>
> This code will work if you pass &c to memset.  But it will fail if you use
> v.  Oh, the wonders of aliasing.

Yeah, in this case because *(struct foo { char x[8]; } *)Dest uses alias-set
'4' but c.a is accessed with alias-set '3'.

You can't expect a struct with a char array to end up using alias-set
zero (which probably was intended).  You want

"=m" (*( struct foo  { char x[8]; } __attribute__((may_alias)) *)Dest)

which makes it work.  Or use *(int *)Dest as you are later accessing
it with int, so you even do not lose TBAA.

Richard.

> And this is why I'm having a problem doc'ing this.  I love the potential
> benefits of using this.  But if you are writing general-purpose routines,
> how can you hope to know whether the code calling you will pass you a
> pointer that will work with this trick? This kind of thing can introduce
> *horribly* hard to track down problems.  Note that the memory clobber always
> works, but potentially with a performance penalty. <sigh>
>
> I could simply skip describing the whole problem with aliasing, but that
> just hides the problem.  I hate when docs do that.
>
> That said, here's what I've written so far.  Any improvements or corrections
> are very welcome.
>
> ==========================
> Flushing registers to memory has performance implications and may be an
> issue
> for time-sensitive code. One trick to avoid flushing more registers than is
> absolutely required is to use a memory constraint.
>
> For example this i386 code works for both c and c++:
>
> @example
> inline void *memset(void *Dest, int c, size_t Count)
> @{
>    struct _reallybigstruct @{ char x[SSIZE_MAX]; @};
>    void *dummy;
>
>    asm (
>       "rep stosb"
>       : "=D" (dummy), "+c" (Count), "=m" (*(struct _reallybigstruct *)Dest)
>       : "0" (Dest), "a" (c)
>       : "cc"
>    );
>
>    return Dest;
> @}
> @end example
>
> Similar code can be used if the memory is being read (by changing the memory
> constraint to be an input) or updated (by changing the constraint to use
> '+' instead of '=').
>
> A few noteworthy things about clobbering using memory constraints:
>
> @itemize @minus
> @item
> Note that @code{_reallybigstruct} uses a size of @code{SSIZE_MAX}. This does
> not mean that when casting @code{Dest} to @code{(struct _reallybigstruct
> *)},
> @code{Dest} must be @code{SSIZE_MAX} bytes long or that @code{SSIZE_MAX}
> bytes will be used.  Nor does it affect registers associated with items that
> may follow @code{Dest} in memory.  This only affects the symbol @code{Dest}.
>
> @item
> It may be that using memory constraints can remove the need for the asm
> @code{volatile} qualifier.  This can, under certain circumstances, result in
> more efficient code. Consider the memset sample above.  Without the memory
> constraint, the only outputs are @code{dummy} and @code{Count}. However,
> both those symbols go out of scope immediately after the asm statement. As a
> result, gcc optimization could reasonably assume that the asm block isn't
> actually needed for anything.  Even adding the "memory" clobber doesn't
> change this. Normally this is resolved by adding the asm @code{volatile}
> qualifier. However, by adding the memory constraint, gcc also considers
> whether updating the @code{Dest} block will affect the subsequent code.
> Only in the event that the @code{Dest} block is not needed will gcc consider
> removing this asm code.
>
> @end itemize
>
> @strong{Warning}: Older versions of this documentation show an example for
> using
> memory constraints like this:
>
> @example
> "m"( (@{ struct @{ char x[10]; @} *p = (void *)ptr ; *p; @}) )
> @end example
>
> This code not only doesn't compile in c++, it has potentially undesirable
> side effects that affect performance, as well as producing incorrect
> results.
>
> ==========================
>
> dw

Reply via email to