https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85234

--- Comment #7 from Jonathan Wakely <redi at gcc dot gnu.org> ---
I think this is another example of the same missed optimization:

bool is_7bit(wchar_t c)
{
    return c >= 0 && c < 128;
}

bool is_7bit_or(wchar_t c)
{
    return (c | 0x7F) == 0x7F;
}

bool is_7bit_cmpl(wchar_t c)
{
    return (c & ~(wchar_t)0x7F) == 0;
}

bool is_7bit_shift(wchar_t c)
{
    return (c >> 7) == 0;
}

https://gcc.godbolt.org/z/sx9jzKcrE

Clang compiles all four functions to exactly the same code, at any non-zero
optimization level. For x86_64:

        cmp     edi, 128
        setb    al
        ret

And for aarch64:

        cmp     w0, #128
        cset    w0, lo
        ret

But GCC produces different code for the last one, using a shift. For x86_64:

is_7bit(wchar_t):
        cmpl    $127, %edi
        setbe   %al
        ret
is_7bit_shift(wchar_t):
        shrl    $7, %edi
        sete    %al
        ret

And for aarch64:

is_7bit(wchar_t):
        cmp     w0, 127
        cset    w0, ls
        ret
is_7bit_shift(wchar_t):
        cmp     wzr, w0, lsr 7
        cset    w0, eq
        ret


On some targets the form using a shift uses more instructions, so this is a -Os
bug on those targets (at least avr, mips, hppa, loongarch64, s390x).

However, on powerpc64le, xtensa and tic6x the code for the shift is smaller.

And for risc-v the code is the same for all four functions:

        sltiu   a0,a0,128
        ret

In any case, if they're equivalent but one form produces smaller code, it seems
better to consistently optimize them to the same form (whichever one is right
for a given target).
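
The equivalence of the four forms can be spot-checked with a small sketch like
the one below. It is not part of the bug report: the helpers all_forms_agree
and check_range are hypothetical names introduced for illustration, and the
check assumes that right-shifting a negative wchar_t is an arithmetic shift
(as GCC and Clang implement it; this was implementation-defined before C++20).
It only samples values around the 7-bit boundary plus the extremes, so it is a
sanity check rather than a proof.

```cpp
#include <cassert>
#include <limits>

// The four forms from the comment, unchanged.
bool is_7bit(wchar_t c)       { return c >= 0 && c < 128; }
bool is_7bit_or(wchar_t c)    { return (c | 0x7F) == 0x7F; }
bool is_7bit_cmpl(wchar_t c)  { return (c & ~(wchar_t)0x7F) == 0; }
bool is_7bit_shift(wchar_t c) { return (c >> 7) == 0; }

// Hypothetical helper: true if all four forms give the same answer for c.
bool all_forms_agree(wchar_t c)
{
    const bool expected = is_7bit(c);
    return is_7bit_or(c) == expected
        && is_7bit_cmpl(c) == expected
        && is_7bit_shift(c) == expected;
}

// Hypothetical helper: samples values on both sides of the 0..127 range,
// plus the smallest and largest wchar_t values.
bool check_range()
{
    for (long v = -1024; v <= 1024; ++v)
        if (!all_forms_agree(static_cast<wchar_t>(v)))
            return false;
    return all_forms_agree(std::numeric_limits<wchar_t>::min())
        && all_forms_agree(std::numeric_limits<wchar_t>::max());
}
```

A test driver could then call assert(check_range()). On targets where wchar_t
is unsigned, the c >= 0 test is always true and may trigger a warning, but the
four forms should still agree.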
