Hi,

On Wed, 30 Jan 2013, Andrew Haley wrote:

> >>> It's an optimization to do so to avoid partial register stalls.
> >>
> >> Well, it's hardly an optimization if it's incorrect, and it seems to 
> >> be incorrect.

Hmm?  GCC generates code that doesn't rely on the extension taking place.

> >> As the old saying goes, I can make your code 
> >> infinitely fast if you don't care about the results.
> > 
> > It's incorrect to rely on the extension taking place.  It's not 
> > incorrect to do the extension.
> 
> Sure, I understand that, but I am completely baffled as to how extending 
> at a call site avoids partial register stalls if a callee cannot assume 
> that a value is already extended.

Accessing the whole register in the callee doesn't mean that the callee 
also relies on the content of the upper bits; it can (and must) ignore 
them.  Such a full-register access would induce a partial register stall 
if the caller hadn't written the whole register first, and extending is 
one way of doing that.

E.g. assume this function:

  uint8 andme (uint8 a, uint8 b) { return a & b; }

One correct implementation for this function is:

        movl    %esi, %eax
        andl    %edi, %eax
        ret

So the function accesses the upper, unspecified bits but doesn't rely on 
them (because the upper bits of the return value are unspecified too).
If the caller had set only the low 8 bits of the arguments (as it is 
allowed to do), that write would have incurred a partial register stall, 
or alternatively the above two reads of the full register would have 
incurred such a stall (on some architectures).  So GCC chooses to 
overwrite the full register by some unspecified method (extending simply 
happens to be the most straightforward one).
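
To make the "accesses but doesn't rely on" point concrete, here is a rough 
C model of the above, not what GCC actually emits.  The 32-bit integers 
stand in for %edi/%esi/%eax, and the names andme_full_width, call_andme 
and upper_bits_ignored are made up for illustration.  The caller fills the 
upper 24 bits with arbitrary junk and reads back only the low byte:

```c
#include <assert.h>
#include <stdint.h>

/* Models the three-instruction body: a full 32-bit AND of two
   "registers" whose upper 24 bits are unspecified on entry. */
uint32_t andme_full_width(uint32_t edi, uint32_t esi)
{
    uint32_t eax = esi;   /* movl %esi, %eax */
    eax &= edi;           /* andl %edi, %eax */
    return eax;           /* upper 24 bits of the result are unspecified */
}

/* Hypothetical caller: puts each 8-bit argument in the low byte of a
   full register, fills the upper bits with arbitrary junk, and looks
   only at the low byte of the result. */
uint8_t call_andme(uint8_t a, uint8_t b, uint32_t junk_a, uint32_t junk_b)
{
    uint32_t edi = (junk_a & 0xFFFFFF00u) | a;
    uint32_t esi = (junk_b & 0xFFFFFF00u) | b;
    return (uint8_t)andme_full_width(edi, esi);
}

/* Returns 1 if the visible result equals a & b no matter what the
   caller left in the upper bits. */
int upper_bits_ignored(uint8_t a, uint8_t b, uint32_t junk_a, uint32_t junk_b)
{
    uint8_t expected  = (uint8_t)(a & b);
    uint8_t zero_ext  = call_andme(a, b, 0, 0);           /* extended args */
    uint8_t with_junk = call_andme(a, b, junk_a, junk_b); /* junk in upper bits */
    assert(zero_ext == expected);
    return with_junk == expected;
}
```

Whatever the caller leaves in the upper bits, the low byte comes out the 
same, so zero-extending at the call site is just one convenient way of 
writing the whole register; the callee's correctness doesn't depend on it.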


Ciao,
Michael.
