Hi,

On Wed, 30 Jan 2013, Andrew Haley wrote:

> >>> It's an optimization to do so to avoid partial register stalls.
> >>
> >> Well, it's hardly an optimization if it's incorrect, and it seems to
> >> be incorrect.

Hmm?  GCC generates code that doesn't rely on the extension taking place.

> >> As the old saying goes, I can make your code
> >> infinitely fast if you don't care about the results.
> >
> > It's incorrect to rely on the extension taking place.  It's not
> > incorrect to do the extension.
>
> Sure, I understand that, but I am completely baffled as to how extending
> at a call site avoids partial register stalls if a callee cannot assume
> that a value is already extended.

Accessing the whole register in the callee (which would induce a partial
reg stall if the caller hadn't written to the whole register first;
extending is one way to do that) doesn't mean that the callee also relies
on the content of those bits; it can (and must) ignore them.  E.g. assume
this function:

  uint8 andme (uint8 a, uint8 b) { return a & b; }

One correct implementation for this function is:

  movl %esi, %eax
  andl %edi, %eax
  ret

So the function accesses the upper unspecified bits, but doesn't rely on
them (because the upper bits of the return value are also unspecified).
If the caller had set only the low 8 bits of the arguments (as it is
allowed to do), that write would have incurred a partial reg stall, or
alternatively the above two reads of the full reg would have incurred such
a stall (on some architectures).  So GCC chooses to overwrite the full
register by some unspecified method (extending simply happens to be the
most straightforward method).


Ciao,
Michael.
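
For readers who want to check the "accesses but doesn't rely on" point,
here is a minimal C sketch that models the three-instruction sequence
above on plain 32-bit integers.  It is only an illustration, not GCC
output: the names andme_full_regs, low8 and check_upper_bits_ignored, the
uint8 typedef and the junk constants are all assumptions made up for this
example.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t uint8;   /* assumed definition of the uint8 used above */

/* Hypothetical model of "movl %esi, %eax; andl %edi, %eax; ret":
   a full 32-bit AND over registers whose upper 24 bits may hold
   anything at all. */
static uint32_t andme_full_regs(uint32_t edi, uint32_t esi)
{
    uint32_t eax = esi;   /* movl %esi, %eax */
    eax &= edi;           /* andl %edi, %eax */
    return eax;           /* upper 24 bits of the result are unspecified */
}

/* The only part of a uint8 return value a caller may look at. */
static uint8 low8(uint32_t reg)
{
    return (uint8)(reg & 0xffu);
}

/* Same uint8 arguments, once zero-extended and once with arbitrary junk
   in the upper bits: the low 8 bits of the result must agree. */
static void check_upper_bits_ignored(void)
{
    uint8 a = 0xf0, b = 0x3c;
    uint32_t extended = andme_full_regs(a, b);
    uint32_t junk = andme_full_regs(0xdeadbe00u | a, 0x12345600u | b);

    assert(low8(extended) == (uint8)(a & b));
    assert(low8(junk) == (uint8)(a & b));
}
```

The full-width result differs between the two calls, but the low byte is
0x30 in both, which is all the caller is entitled to read.  The sketch
says nothing about stalls themselves; those are a property of the
hardware, not of the values.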