https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91883

--- Comment #4 from Yichao Yu <yyc1992 at gmail dot com> ---
This may matter more on ARM (or other RISC architectures), where immediates are
more limited in size. This is similar to the original example, but here the
saving comes from the encoding of the immediate rather than from eliminating a
shift instruction.

On aarch64, GCC compiles an unsigned 8-bit value divided by 3 to the following:

```
        and     w0, w0, 255
        mov     w1, 43691
        movk    w1, 0xaaaa, lsl 16
        umull   x0, w0, w1
        ubfx    x0, x0, 33, 8
        ret
```

which needs two instructions (`mov` + `movk`) to materialize the constant
0xAAAAAAAB in w1.
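As a rough illustration of why 0xAAAAAAAB costs two instructions, the simplified C model below counts the nonzero 16-bit chunks of a 32-bit constant, since `mov` (MOVZ) and `movk` each supply one 16-bit chunk. `movz_movk_count` is a hypothetical helper, and the model ignores MOVN and logical (bitmask) immediate encodings, which real compilers also consider.

```c
#include <stdint.h>

// Simplified model: number of MOVZ/MOVK instructions needed to build a
// 32-bit constant, one per nonzero 16-bit chunk. Ignores MOVN and logical
// (bitmask) immediates, which can sometimes do better.
static int movz_movk_count(uint32_t value) {
    int count = 0;
    for (int shift = 0; shift < 32; shift += 16) {
        if ((value >> shift) & 0xFFFFu)
            count++;  // this chunk must be supplied by a MOVZ or MOVK
    }
    return count == 0 ? 1 : count;  // zero still takes one instruction
}
```

For 0xAAAAAAAB both chunks (0xAAAB = 43691 and 0xAAAA) are nonzero, matching the `mov w1, 43691` plus `movk w1, 0xaaaa, lsl 16` pair, while 171 needs a single chunk.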
OTOH, clang is smart enough to use a smaller constant instead:
```
        mov     w8, #171
        and     w9, w0, #0xff
        mul     w8, w9, w8
        lsr     w0, w8, #9
```

Here the constant 171 fits into the immediate of a single `mov` instruction.
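To check the arithmetic behind both sequences, here is a minimal C sketch (function names are hypothetical, not taken from either compiler). It assumes, as read off the assembly above, that GCC multiplies by ceil(2^33 / 3) = 0xAAAAAAAB and keeps bits 33 and up, and that clang multiplies by ceil(2^9 / 3) = 171 and shifts right by 9.

```c
#include <stdint.h>

// GCC-style: multiply by 0xAAAAAAAB = ceil(2^33 / 3) and keep bits 33 and up.
// Correct for every 32-bit unsigned x. (The aarch64 code's ubfx additionally
// keeps only 8 bits of the result, which is enough for an 8-bit input.)
static uint32_t div3_magic32(uint32_t x) {
    return (uint32_t)(((uint64_t)x * 0xAAAAAAABu) >> 33);
}

// clang-style: multiply by 171 = ceil(2^9 / 3) and shift right by 9.
// Correct only for x < 512, which covers every uint8_t input.
static uint32_t div3_mul171(uint32_t x) {
    return (x * 171u) >> 9;
}
```

Both functions agree with `x / 3` for every value from 0 to 255. The 171 variant first goes wrong at x = 512 (512 * 171 >> 9 = 171, while 512 / 3 = 170), so it is only usable when the input range is known to be small.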

The GCC sequence is correct for any 32-bit unsigned input, a range that isn't
needed here since the operand is only 8 bits wide.
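To illustrate that a narrower input range admits a smaller multiplier, here is a brute-force C sketch. It is not how GCC or clang derive their constants; `find_magic` and `magic_t` are hypothetical names. For each shift s it tries mul = ceil(2^s / d) and keeps the first s that is exact for every input up to `max_x`, assuming `max_x` is at most 65535 so the products fit in 64 bits.

```c
#include <stdint.h>

// Hypothetical description of x / d computed as (x * mul) >> shift.
typedef struct {
    uint64_t mul;
    unsigned shift;
} magic_t;

// Brute-force search (not the compilers' method): for each shift s, try
// mul = ceil(2^s / d) and accept the first s for which (x * mul) >> s == x / d
// holds for every x in [0, max_x]. Assumes d >= 1 and max_x <= 65535, which
// keeps x * mul within 64 bits for s < 48.
static magic_t find_magic(uint32_t d, uint32_t max_x) {
    for (unsigned s = 0; s < 48; s++) {
        uint64_t mul = ((1ull << s) + d - 1) / d;  // ceil(2^s / d)
        int ok = 1;
        for (uint64_t x = 0; x <= max_x && ok; x++) {
            if ((x * mul) >> s != x / d)
                ok = 0;  // this (mul, s) pair is wrong for some x in range
        }
        if (ok)
            return (magic_t){mul, s};
    }
    return (magic_t){0, 0};  // nothing found within the searched shifts
}
```

Under these assumptions, `find_magic(3, 255)` yields mul = 171, shift = 9 (the constant clang uses), while raising the bound to 512 pushes it to mul = 683, shift = 11.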
Clang also applies this optimization when the value range is limited by an
`assume` attribute or by a condition.
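For reference, a minimal C sketch of two ways the range can be limited (function names are hypothetical). The first relies on the `uint8_t` parameter type, as in the example above; the second bounds a 32-bit argument with a condition and `__builtin_unreachable()`, a builtin available in both GCC and clang. Which constant each compiler actually picks for these functions is not verified here, and the `assume` attribute is left out because its spelling differs across compilers and language versions.

```c
#include <stdint.h>

// Range implied by the parameter type: x is always in [0, 255].
uint32_t div3_u8(uint8_t x) {
    return x / 3;
}

// Same bound on a 32-bit argument, expressed through a condition.
// Reaching __builtin_unreachable() is undefined behavior, so the compiler
// may assume x <= 255 after the check.
uint32_t div3_bounded(uint32_t x) {
    if (x > 255)
        __builtin_unreachable();
    return x / 3;
}
```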

Reply via email to