https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85234
--- Comment #7 from Jonathan Wakely <redi at gcc dot gnu.org> ---
I think this is another example of the same missed optimization:
bool is_7bit(wchar_t c)
{
    return c >= 0 && c < 128;
}

bool is_7bit_or(wchar_t c)
{
    return (c | 0x7F) == 0x7F;
}

bool is_7bit_cmpl(wchar_t c)
{
    return (c & ~(wchar_t)0x7F) == 0;
}

bool is_7bit_shift(wchar_t c)
{
    return (c >> 7) == 0;
}
https://gcc.godbolt.org/z/sx9jzKcrE
Clang compiles all four functions to exactly the same code, at any non-zero
optimization level. For x86_64:
        cmp     edi, 128
        setb    al
        ret
And for aarch64:
        cmp     w0, #128
        cset    w0, lo
        ret
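For reference, the single unsigned compare that Clang emits corresponds roughly to the source form below. This is only a sketch: is_7bit_unsigned is a name made up for illustration and is not one of the four functions in the testcase.

```cpp
#include <limits>
#include <type_traits>

// Hypothetical fifth form (not part of the original testcase): convert to
// the unsigned type of the same width and do one unsigned compare, so that
// negative values wrap to large values and fail the test.
bool is_7bit_unsigned(wchar_t c)
{
    using U = std::make_unsigned_t<wchar_t>;
    return static_cast<U>(c) < 128u;
}
```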
But GCC produces different code for the last one, using a shift. For x86_64:
is_7bit(wchar_t):
        cmpl    $127, %edi
        setbe   %al
        ret
is_7bit_shift(wchar_t):
        shrl    $7, %edi
        sete    %al
        ret
And for aarch64:
is_7bit(wchar_t):
        cmp     w0, 127
        cset    w0, ls
        ret
is_7bit_shift(wchar_t):
        cmp     wzr, w0, lsr 7
        cset    w0, eq
        ret
On some targets the form using a shift uses more instructions, so this is a -Os
bug on those targets (at least avr, mips, hppa, loongarch64, s390x).
However, on powerpc64le, xtensa and tic6x the code for the shift is smaller.
And for RISC-V the code is the same for all four functions:
        sltiu   a0,a0,128
        ret
In any case, if they're equivalent but one form produces smaller code, it seems
better to consistently optimize them to the same form (whichever one is right
for a given target).
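
As a sanity check that the four forms really are equivalent, something like the following can be used. It is only a sketch: all_agree and count_mismatches are helper names made up for this comment, and it samples values around the minimum, zero and the maximum of wchar_t rather than testing every value. It also assumes that right-shifting a negative value is an arithmetic shift, which is what GCC and Clang do (and what C++20 requires).

```cpp
#include <cassert>
#include <limits>

bool is_7bit(wchar_t c)       { return c >= 0 && c < 128; }
bool is_7bit_or(wchar_t c)    { return (c | 0x7F) == 0x7F; }
bool is_7bit_cmpl(wchar_t c)  { return (c & ~(wchar_t)0x7F) == 0; }
bool is_7bit_shift(wchar_t c) { return (c >> 7) == 0; }

// Hypothetical helper: true if all four forms give the same answer for c.
bool all_agree(wchar_t c)
{
    const bool r = is_7bit(c);
    return is_7bit_or(c) == r && is_7bit_cmpl(c) == r && is_7bit_shift(c) == r;
}

// Hypothetical helper: count disagreements over sampled values near the
// minimum, zero and the maximum of wchar_t (signed or unsigned).
int count_mismatches()
{
    using lim = std::numeric_limits<wchar_t>;
    const long long lo = lim::min();
    const long long hi = lim::max();
    int bad = 0;

    // Clamp the requested range to what wchar_t can represent, then test it.
    auto check_range = [&](long long first, long long last) {
        if (first < lo) first = lo;
        if (last > hi) last = hi;
        for (long long v = first; v <= last; ++v)
            if (!all_agree(static_cast<wchar_t>(v)))
                ++bad;
    };

    check_range(lo, lo + 512);
    check_range(-512, 512);
    check_range(hi - 512, hi);
    return bad;
}
```

Under the assumptions above, count_mismatches() should return 0 whether wchar_t is signed (as on x86_64 Linux) or unsigned (as on aarch64 Linux).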