On Wed, Sep 24, 2014 at 9:43 AM, David Wohlferd <d...@limegreensocks.com> wrote:
> Hans-Peter Nilsson: I should have listened to you back when you raised
> concerns about this.  My apologies for ever doubting you.
>
> In summary:
>
> - The "trick" in the docs for using an arbitrarily sized struct to force
> register flushes for inline asm does not work.
> - Placing the inline asm in a separate routine can sometimes mask the
> problem with the trick not working.
> - The sample that has been in the docs forever performs an unhelpful,
> unexpected, and probably unwanted stack allocation + memcpy.
>
> Details:
>
> Here is the text from the docs:
>
> -----------
> One trick to avoid [using the "memory" clobber] is available if the size of
> the memory being accessed is known at compile time. For example, if
> accessing ten bytes of a string, use a memory input like:
>
>     "m"( ({ struct { char x[10]; } *p = (void *)ptr ; *p; }) )

Well, this can't work because you are essentially using a _value_
here (judging by the GIMPLE; I'm not sure whether a statement expression
can evaluate to an lvalue).

It should work if you simply do this without a stmt expression:

  "m" (*(struct { char x[10]; } *)ptr)

because that's clearly an lvalue (and the GIMPLE correctly says so):

  <bb 2>:
  c.a = 1;
  c.b = 2;
  __asm__ __volatile__("rep; stosb" : "=D" Dest_4, "=c" Count_5 : "0" &c,
      "a" 0, "m" MEM[(struct foo *)&c], "1" 8);
  printf ("%u %u\n", 1, 2);

Note that we still constant-propagated 1 and 2, because the asm
didn't get any VDEF.  That's because you do not have any
memory output!  So while the "m" input keeps 'c' live, the compiler
doesn't consider 'c' modified by the asm.  You'd still need to clobber
the memory, but "m" clobbers are not supported, only "memory".

Thus the fixed asm:

      __asm__ __volatile__ ("rep; stosb"
           : "=D" (Dest), "+c" (Count)
           : "0" (&c), "a" (0),
           "m" (*( struct foo { char x[8]; } *)&c)
           : "memory"
      );

where I'm not 100% sure if the "m" input is now pointless (that is,
if a "memory" clobber also constitutes a use of all memory).
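
Since the missing piece above is a memory output, one alternative to the
full "memory" clobber is to name the object itself as an "=m" output
operand.  The following is only a sketch under assumptions, not something
tested in this thread: `struct pair` and `zero_pair` are hypothetical names,
Count is widened to size_t so the whole count register is defined, and the
memset branch is added purely so the sketch also builds on non-x86 targets.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the anonymous struct 'c' in the test program. */
struct pair { int a; int b; };

/* Hypothetical helper: zero *p and tell the compiler that *p is written. */
static void zero_pair(struct pair *p)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    void *dest = p;
    size_t count = sizeof *p;

    __asm__ __volatile__ ("rep stosb"
         : "+D" (dest), "+c" (count),
           "=m" (*p)        /* lvalue memory output: *p is modified here */
         : "a" (0)
    );
#else
    /* Assumption: portable fallback so the sketch compiles elsewhere. */
    memset(p, 0, sizeof *p);
#endif
}
```

Because `*p` is an lvalue of the full object type, the compiler should treat
the whole struct as written by the asm and reload c.a and c.b afterwards,
without having to assume that all other memory changed as well.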

Richard.

> -----------
>
> When I did the re-write of gcc's inline asm docs, I left the description for
> this (essentially) untouched.  I just took it on faith that "magic happens"
> and the right code gets generated.  But reading a recent post raised
> questions for me, so I tried it.  And what I found was that not only does
> this not work, it actually just makes a mess.
>
> I started with some code that I knew required some memory clobbering:
>
>     #include <stdio.h>
>
>     int main(int argc, char* argv[])
>     {
>       struct
>       {
>         int a;
>         int b;
>       } c;
>
>       c.a = 1;
>       c.b = 2;
>
>       int Count = sizeof(c);
>       void *Dest;
>
>       __asm__ __volatile__ ("rep; stosb"
>            : "=D" (Dest), "+c" (Count)
>            : "0" (&c), "a" (0)
>            //: "memory"
>       );
>
>       printf("%u %u\n", c.a, c.b);
>     }
>
> As written, this x64 code (compiled with -O2) will print out "1 2", even
> though someone might (incorrectly) expect the asm to overwrite the struct
> with zeros.  Adding the memory clobber allows this code to work as expected
> (printing "0 0").
>
> Now that I have code I can use to see if registers are getting flushed, I
> removed the memory clobber, and tried just 'clobbering' the struct:
>
>     #include <stdio.h>
>
>     int main(int argc, char* argv[])
>     {
>       struct
>       {
>         int a;
>         int b;
>       } c;
>
>       c.a = 1;
>       c.b = 2;
>
>       int Count = sizeof(c);
>       void *Dest;
>
>       __asm__ __volatile__ ("rep; stosb"
>            : "=D" (Dest), "+c" (Count)
>            : "0" (&c), "a" (0),
>            "m" ( ({ struct foo { char x[8]; } *p = (struct foo *)&c ; *p; }) )
>       );
>
>       printf("%u %u\n", c.a, c.b);
>     }
>
> I'm using a named struct (foo) to avoid some compiler messages, but other
> than that, I believe this is the same as what's in the docs. And it doesn't
> work.  I still get "1 2".
>
> At this point I realized that code I've seen using this trick usually has
> the asm in its own routine.  When I try this, it still fails.  Unless I
> start cranking up the size of x from 8 to ~250.  At ~250, suddenly it starts
> working.  Apparently this is because at this point, gcc decides not to
> inline the routine anymore, and flushes the registers before calling the
> non-inline code.
>
> And why does changing the size of the structure we are pointing to result in
> increases in the size of the routine?  Reading the -S output, the "*p" at
> the end of this constraint generates a call to memcpy the 250 characters
> onto the stack, which it passes to the asm as %4, which is never used.
> Argh!
>
> Conclusion:
>
> What I expected when using that sample code from the docs was that any
> registers that contain values from the struct would get flushed to memory.
> This was intended to be a 'cheaper' alternative to doing a full-on "memory"
> clobber.  What I got instead was an unexpected/unneeded stack allocation and
> memcpy, and STILL didn't get the values flushed.  Yeah, not exactly the
> 'cheaper' I was hoping for.
>
> Is the example in the docs just written incorrectly?  Did this get broken
> somewhere along the line?  Or am I just using it wrong?
>
> I'm using gcc version 4.9.0 (x86_64-win32-seh-rev2, Built by MinGW-W64
> project).  Remember to compile these x64 samples with -O2.
>
> dw
