https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101200

--- Comment #8 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #3)
> For aarch64 we get:
>         adrp    x1, .LANCHOR0
>         add     x0, x1, :lo12:.LANCHOR0
>         add     x0, x0, 8
>         ldrb    w1, [x1, #:lo12:.LANCHOR0]
>         and     x2, x1, 15
>         ubfx    x1, x1, 4, 4
>         ldr     w2, [x0, x2, lsl 2]
>         str     w2, [x0, x1, lsl 2]
>         ret
> 
> Note the shift and and is combined into one instruction (ubfx) but really
> only a shift instruction is needed.
> Here we have:
> Trying 21 -> 22:
>    21: r112:SI=r92:SI 0>>0x4
>       REG_DEAD r92:SI
>    22: r113:DI=sign_extend(r112:SI)
>       REG_DEAD r112:SI
> Successfully matched this instruction:
> (set (reg:DI 113)
>     (zero_extract:DI (subreg:DI (reg:SI 92 [ d.0_1 ]) 0)
>         (const_int 4 [0x4])
>         (const_int 4 [0x4])))
> 
> The multiple modes issue is part of the problem.  If I was redesigning the
> backends, I would only allow DI mode (and SI mode for i386) and always have
> the zero extends on loads.

Note the aarch64 issue has been solved (maybe by accident).
forwprop props the sign_extend into the load early on.
```
propagating insn 22 into insn 23, replacing:
(set (mem:SI (plus:DI (mult:DI (reg:DI 121 [ _3 ])
                (const_int 4 [0x4]))
            (reg/f:DI 111)) [1 c.b[_3]+0 S4 A32])
    (reg:SI 103 [ _4 ]))
successfully matched this instruction to *movsi_aarch64:
(set (mem:SI (plus:DI (mult:DI (sign_extend:DI (reg:SI 120 [ _3 ]))
                (const_int 4 [0x4]))
            (reg/f:DI 111)) [1 c.b[_3]+0 S4 A32])
    (reg:SI 103 [ _4 ]))
rescanning insn with uid = 23.
updating insn 23 in-place
```

And then combine does not see the sign_extend at all. But the sign_extend here
is still an issue since it is not needed either.

We now get:
```
        add     x0, x0, 16
        ldrb    w1, [x1, #:lo12:.LANCHOR0]
        and     w2, w1, 15
        lsr     w1, w1, 4
        ldr     w2, [x0, w2, sxtw 2]
        str     w2, [x0, w1, sxtw 2]
```

The sxtw is not needed; it should just be lsl for both cases ...
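
To spell out why the extension is redundant: both indices come from a byte load followed by a 32-bit `and` with 15 or a 32-bit `lsr` by 4, so they are always in the range 0..15 and their sign bit is never set. A rough C sketch of that argument follows. It is not the testcase from this bug, and all the function names are made up for illustration. It checks every possible byte value and counts the cases where sign-extending the 32-bit index differs from zero-extending it.

```c
#include <stdint.h>

/* Hypothetical helpers, not taken from the bug's testcase.
   They model the index computation seen in the assembly above:
   a byte load (ldrb) followed by a 32-bit AND or a 32-bit LSR. */
uint32_t low_nibble(uint8_t b)  { return (uint32_t)b & 15u; }
uint32_t high_nibble(uint8_t b) { return (uint32_t)b >> 4; }

/* Models sxtw: sign-extend a 32-bit value to 64 bits. */
uint64_t extend_signed(uint32_t w)
{
    return (uint64_t)(int64_t)(int32_t)w;
}

/* Models using the full X register after a W-register write,
   which leaves the upper 32 bits zero. */
uint64_t extend_unsigned(uint32_t w)
{
    return (uint64_t)w;
}

/* Returns how many (byte, nibble) cases give different results
   for the two extensions. */
int count_extension_mismatches(void)
{
    int mismatches = 0;
    for (unsigned v = 0; v < 256; v++) {
        uint32_t lo = low_nibble((uint8_t)v);
        uint32_t hi = high_nibble((uint8_t)v);
        if (extend_signed(lo) != extend_unsigned(lo))
            mismatches++;
        if (extend_signed(hi) != extend_unsigned(hi))
            mismatches++;
    }
    return mismatches;
}
```

Calling `count_extension_mismatches()` should find no disagreement, which is the value-range fact the compiler would need to prove in order to use `lsl` on the X register instead of `sxtw`.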
