https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91883
Yichao Yu <yyc1992 at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |yyc1992 at gmail dot com
--- Comment #4 from Yichao Yu <yyc1992 at gmail dot com> ---
This may matter more on ARM (or other RISC architectures), where the size of
immediate constants is more limited. (This is similar to the original example,
but the saving here is in the encoding of the immediate rather than in a shift
instruction.)
Dividing an unsigned 8-bit value by 3 on aarch64 is compiled by GCC to:
```
and w0, w0, 255
mov w1, 43691
movk w1, 0xaaaa, lsl 16
umull x0, w0, w1
ubfx x0, x0, 33, 8
ret
```
which requires two instructions (mov + movk) to load the constant 0xaaaaaaab
into the w1 register.
OTOH, clang is smart enough to use a smaller constant instead:
```
mov w8, #171
and w9, w0, #0xff
mul w8, w9, w8
lsr w0, w8, #9
```
The constant 171 fits into the immediate of a single mov instruction.
The GCC sequence is correct for any unsigned integer up to 32 bits wide, which
is more than is needed here.
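As a rough sanity check of the two listings above, here is a minimal C sketch of
both multiply-and-shift forms. The function names are hypothetical and only
mirror the arithmetic in the assembly; this is not the code either compiler
uses internally. Since 171 * 3 = 2^9 + 1 and 0xaaaaaaab * 3 = 2^33 + 1, the
small form should be exact only for small inputs while the wide form covers the
full 32-bit range.

```c
#include <stdint.h>

/* Clang-style form: multiply by 171, then shift right by 9 (32-bit math). */
uint32_t div3_small(uint32_t x)
{
    return (x * 171u) >> 9;
}

/* GCC-style form: 64-bit product with 0xaaaaaaab, then shift right by 33. */
uint32_t div3_wide(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xAAAAAAABu) >> 33);
}

/* Smallest x below 2^16 where the small form disagrees with x / 3,
 * or UINT32_MAX if no mismatch is found in that range. */
uint32_t first_small_mismatch(void)
{
    for (uint32_t x = 0; x < (1u << 16); ++x) {
        if (div3_small(x) != x / 3)
            return x;
    }
    return UINT32_MAX;
}
```

The small form first goes wrong at x = 512, so it is safe for every 8-bit
input, which is why the cheaper constant is enough in this case.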
The clang optimization also applies when an `assume` attribute or a condition
is used to limit the possible value range.
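A hedged sketch of what such a range-limited input could look like in C. The
function name `div3_bounded` is hypothetical, and `__builtin_unreachable()` is
used here as a stand-in for the `assume` attribute because it is accepted by
both GCC and clang. Whether a given compiler version then picks the smaller
constant has to be checked in its generated assembly.

```c
#include <stdint.h>

/* The argument is 32 bits wide, but the caller promises it fits in 8 bits.
 * Passing a larger value is undefined behavior. */
uint32_t div3_bounded(uint32_t x)
{
    if (x > 255)
        __builtin_unreachable();  /* tells the optimizer x is in 0..255 */
    return x / 3;
}
```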