dw <limegreenso...@yahoo.com> writes:
> On 2/27/2014 11:32 PM, Richard Sandiford wrote:
>> dw <limegreenso...@yahoo.com> writes:
>>> On 2/27/2014 4:11 AM, Richard Sandiford wrote:
>>>> Andrew Haley <a...@redhat.com> writes:
>>>>> Over the years there has been a great deal of traffic on these lists
>>>>> caused by misunderstandings of GCC's inline assembler.  That's partly
>>>>> because it's inherently tricky, but the existing documentation needs
>>>>> to be improved.
>>>>>
>>>>> dw <limegreenso...@yahoo.com> has done a fairly thorough reworking of
>>>>> the documentation.  I've helped a bit.
>>>>>
>>>>> Section 6.41 of the GCC manual has been rewritten.  It has become:
>>>>>
>>>>> 6.41 How to Use Inline Assembly Language in C Code
>>>>> 6.41.1 Basic Asm - Assembler Instructions with No Operands
>>>>> 6.41.2 Extended Asm - Assembler Instructions with C Expression Operands
>>>>>
>>>>> We could simply post the patch to GCC-patches and have at it, but I
>>>>> think it's better to discuss the document here first.  You can read it
>>>>> at
>>>>>
>>>>> http://www.LimeGreenSocks.com/gcc/Basic-Asm.html
>>>>> http://www.LimeGreenSocks.com/gcc/Extended-Asm.html
>>>>> http://www.LimeGreenSocks.com/gcc/extend04.zip (contains .texi, .patch,
>>>>> and affected html pages)
>>>>>
>>>>> All comments are very welcome.
>>>> Thanks for doing this, looks like a big improvement.
>>> Thanks, I did my best.  I appreciate you taking the time to review them.
>>>
>>>> A couple of comments:
>>>>
>>>> The section on basic asms says:
>>>>
>>>>     Do not expect a sequence of asm statements to remain perfectly
>>>>     consecutive after compilation. To ensure that assembler instructions
>>>>     maintain their order, use a single asm statement containing multiple
>>>>     instructions. Note that GCC's optimizer can move asm statements
>>>>     relative to other code, including across jumps.
>>>>
>>>> The "maintain their order" might be a bit misleading, since volatile asms
>>>> (including basic asms) must always be executed in the original order.
>>>> Maybe this was meaning placement/address order instead?
>>> This statement is based on this text from the existing docs:
>>>
>>> "Similarly, you can't expect a sequence of volatile |asm| instructions
>>> to remain perfectly consecutive. If you want consecutive output, use a
>>> single |asm|."
>>>
>>> I do not dispute what you are saying.  I just want to confirm that the
>>> existing docs are incorrect before making a change.  Also, see Andi's
>>> response re -fno-toplevel-reorder.
>>>
>>> It seems to me that recommending "single statement" is both the
>>> clearest, and the safest approach here.  But I'm prepared to change my
>>> mind if there is consensus I should.
>> Right.  I agree with that part.  I just thought that the "maintain their
>> order" could be misunderstood as meaning execution order, whereas I think
>> both sentences of the original docs were talking about being "perfectly
>> consecutive" (which to me means "there are no other instructions in between").
>
> Hmm.  I'm not seeing the differences here that you do.

Well, like you say, things can be moved across branches.  So, although
this is a very artificial example:

     asm ("x");
     asm ("y");

could become:

     goto bar;

foo:
     asm ("y");
     ...

bar:
     asm ("x");
     goto foo;

This has reordered the instructions in the sense that they have a
different order in memory.  But they are still _executed_ in the same
order.  Actually reordering the execution would be a serious bug.

So I just want to avoid anything that gives the impression that "y" can
be executed before "x" in this example.  I still think:

> Since the existing docs say "GCC's optimizer can move asm statements 
> relative to other code", how would you feel about:
>
> "Do not expect a sequence of |asm| statements to remain perfectly 
> consecutive after compilation. If you want to stop the compiler from 
> reordering or inserting anything into a sequence of assembler 
> instructions, use a single |asm| statement containing multiple 
> instructions. Note that GCC's optimizer can move |asm| statements 
> relative to other code, including across jumps."

...this gives the impression that we might try to execute volatiles
in a different order.
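
As a rough sketch of the distinction (not taken from the proposed docs), the
two forms might look like the following.  The function names are made up for
illustration, "nop" is assumed to be a valid mnemonic on the target, and
__asm__ is used only because it is the spelling GCC also accepts under
-std=c99 and similar options:

```c
/* Two separate basic asm statements.  Basic asms are implicitly volatile,
   so GCC must execute them in this order, but it is not required to emit
   them next to each other: other instructions may end up between them.  */
static inline void
two_statements (void)
{
  __asm__ ("nop");
  __asm__ ("nop");
}

/* One basic asm statement holding both instructions.  The string is passed
   to the assembler as a single unit, so nothing is inserted between the
   two instructions.  */
static inline void
one_statement (void)
{
  __asm__ ("nop\n\t"
           "nop");
}

/* Hypothetical driver, present only so that both forms get compiled
   and executed.  */
static int
run_both (void)
{
  two_statements ();
  one_statement ();
  return 0;
}
```

In both functions the instructions execute in source order; only the second
guarantees that they stay adjacent in the output.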

>>>> It might also be
>>>> worth mentioning that the number of instances of an asm in the output
>>>> may be different from the input.  (Can it increase as well as decrease?
>>>> I'm not sure off-hand, but probably yes.)
>>> So, in the volatile section, how about something like this for decrease:
>>>
>>> "GCC does not delete a volatile |asm| if it is reachable, but may delete
>>> it if it can prove that control flow never reaches the location of the
>>> instruction."
>> It's not just that though.  AIUI it would be OK for:
>>
>>    if (foo)
>>      {
>>        ...
>>        asm ("x");
>>      }
>>    else
>>      {
>>        ...
>>        asm ("x");
>>      }
>>
>> to become:
>>
>>    if (foo)
>>      ...
>>    else
>>      ...
>>    asm ("x");
>
> Could be.  However, I'm not clear what benefit there would be from 
> doc'ing this possibility?

I was just thinking that something along the lines of "Optimizations
may introduce or remove duplicates of an asm, provided that this does
not change which asms are executed." would be more general than just
talking about introducing duplicates.

>>>> In the extended section:
>>>>
>>>>     Unless an output operand has the '&' constraint modifier (see
>>>>     Modifiers), GCC may allocate it in the same register as an unrelated
>>>>     input operand, [...]
>>>>
>>>> It could also use it for addresses in other (memory) outputs.
>>> Ok.  But I'm not sure this really adds anything.  Having warned people
>>> that the register may be re-used unless '&' is used seems sufficient.
>> It matters where it can be reused though.  If you talk about input
>> operands only, people might think it is OK to write asms of the form:
>>
>>     foo tmp,[input0]
>>     bar [output0],tmp
>>     frob [output1],tmp
>>
>> where output0 is a register and output1 is a memory operand.  This safely avoids
>> using the input operand after assigning to output0, but the address in
>> output1 is still live and could be changed by bar.
>
> I'm not sure we're talking about the same problem.  I'm borrowing this 
> x86 example from someone else:
>
> static inline char *
> lcopy( char *dst, const char *src, long len )
> {
>     asm(
>        "shr $3,%2; " /* how many qwords to copy */
>        "rep movsq; " /* copy that many */
>        "mov %3,%2; " /* how many bytes to copy */
>        "rep movsb" /* copy that many */
>         : "+D" (dst),  "+S" (src),  "+c" (len)
>         :  "r" (len & 7)
>         :  "memory");
>    return dst;
> }
>
> You might expect that  "len" and "len & 7" are two different things.  
> However if the function is called with a constant less than 8, the 
> compiler knows that they are actually the same and uses rcx for both, 
> giving mov rcx,rcx for mov %3,%2 and of course by then rcx is zero.   
> Using & on len forces the use of two separate registers.
>
> This seems to me to be a different kind of problem than:
>
> asm ("xxx": "=r" (x), "=m" (x));
>
> Or am I missing your point?

Well, that code is just one instance of (and a good example of)
the principle that GCC assumes all inputs are consumed before any
outputs are written.  And the point is that the "inputs" in that
description aren't restricted to input operands: they apply to any
rvalues in the output operands too.

E.g. the same thing could occur for an artificial case like:

    asm ("...." : "+r" (ptr), "=m" (*x));

if GCC realises that x==ptr.  Then the address in operand 1 might
be the same as operand 0.  The same goes for:

    asm ("...." : "=r" (ptr), "=m" (*x) : "0" (ptr));

which is really just another way of writing the same thing.

Thanks,
Richard
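
For reference, here is a sketch of the lcopy example quoted above with the
'&' (earlyclobber) modifier applied to len, as dw describes.  The name
lcopy_fixed is made up for illustration, and the volatile qualifier, the "cc"
clobber and the memcpy fallback for non-x86-64 targets are additions that are
not in the original code.  Treat it as an illustration of the constraint
rather than a recommended copy routine:

```c
#include <stddef.h>
#include <string.h>

/* Copies len bytes from src to dst and returns the updated dst, which
   points just past the last byte written.  */
static inline char *
lcopy_fixed (char *dst, const char *src, long len)
{
#if defined(__GNUC__) && defined(__x86_64__) && defined(__LP64__)
  __asm__ __volatile__ (
       "shr $3,%2\n\t"   /* how many qwords to copy */
       "rep movsq\n\t"   /* copy that many */
       "mov %3,%2\n\t"   /* how many bytes to copy */
       "rep movsb"       /* copy that many */
       /* "+&c": rcx is written before operand 3 is read, so GCC must
          not place operand 3 in rcx, even when it can prove that
          len == (len & 7).  */
       : "+D" (dst), "+S" (src), "+&c" (len)
       : "r" (len & 7)
       : "memory", "cc");
#else
  /* Fallback added for this sketch so it builds on other targets.  */
  memcpy (dst, src, (size_t) len);
  dst += len;
#endif
  return dst;
}
```

Calling lcopy_fixed (buf, "abcde", 5) with a constant length below 8 is the
case that went wrong without the '&': here the byte count has to live in a
register other than rcx, so it survives the shr.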
