Hi,
on 2024/3/1 10:41, HAO CHEN GUI wrote:
> Hi,
> This patch fixes regression cases in gcc.target/powerpc/rlwimi-2.c. In
> combine pass, SImode (subreg from DImode) lshiftrt is converted to DImode
> lshiftrt with an out AND. It matches a DImode rotate and mask insert on
> rs6000.
>
> Trying 2 -> 7:
> 2: r122:DI=r129:DI
> REG_DEAD r129:DI
> 7: r125:SI=r122:DI#0 0>>0x1f
> REG_DEAD r122:DI
> Failed to match this instruction:
> (set (subreg:DI (reg:SI 125 [ x ]) 0)
> (zero_extract:DI (reg:DI 129)
> (const_int 32 [0x20])
> (const_int 1 [0x1])))
> Successfully matched this instruction:
> (set (subreg:DI (reg:SI 125 [ x ]) 0)
> (and:DI (lshiftrt:DI (reg:DI 129)
> (const_int 31 [0x1f]))
> (const_int 4294967295 [0x])))
>
> This conversion blocks the further combination which combines to a SImode
> rotate and mask insert insn.
>
> Trying 9, 7 -> 10:
> 9: r127:SI=r130:DI#0&0xfffe
> REG_DEAD r130:DI
> 7: r125:SI#0=r129:DI 0>>0x1f&0x
> REG_DEAD r129:DI
>10: r124:SI=r127:SI|r125:SI
> REG_DEAD r125:SI
> REG_DEAD r127:SI
> Failed to match this instruction:
> (set (reg:SI 124)
> (ior:SI (and:SI (subreg:SI (reg:DI 130) 0)
> (const_int -2 [0xfffe]))
> (subreg:SI (zero_extract:DI (reg:DI 129)
> (const_int 32 [0x20])
> (const_int 1 [0x1])) 0)))
> Failed to match this instruction:
> (set (reg:SI 124)
> (ior:SI (and:SI (subreg:SI (reg:DI 130) 0)
> (const_int -2 [0xfffe]))
> (subreg:SI (and:DI (lshiftrt:DI (reg:DI 129)
> (const_int 31 [0x1f]))
> (const_int 4294967295 [0x])) 0)))
>
> The root cause of the issue is if it's necessary to do the widen mode for
> lshiftrt when the target already has the narrow mode lshiftrt and its cost
> is not high. My former patch tried to fix the problem but not accepted yet.
> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624852.html
Hope Segher can chime in this proposal updating combine pass, I can understand
the new proposal by introducing new patterns in target code is able to fix the
issue, but IMHO it's likely there are some other mis-optimization which don't
get noticed and they need some similar pattern extension (duplicate some pattern
& adjust with subreg) to optimize, from this perspective, it would be nice if
it's possible to have a more general fix.
Some minor comments for this patch itself are inlined.
>
> As it's stage 4 now, I drafted this patch to fix the regression by adding
> subreg patterns of SImode rotate and mask insert. It actually does reversed
> things and narrow the mode for lshiftrt so that it can matches the SImode
> rotate and mask insert.
>
> The case "rlwimi-2.c" is fixed and restore the corresponding number of
> insns to original ones. The case "rlwinm-0.c" is also changed and 9 "rlwinm"
> is replaced with 9 "rldicl" as the sequence of combine is changed. It's not
> a regression as the total number of insns isn't changed.
>
> Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions. Is it OK for the trunk?
>
> Thanks
> Gui Haochen
>
>
> ChangeLog
> rs6000: Add subreg patterns for SImode rotate and mask insert
>
> In combine pass, SImode (subreg from DImode) lshiftrt is converted to DImode
> lshiftrt with an AND. The new pattern matches rotate and mask insert on
> rs6000. Thus it blocks the pattern to be further combined to a SImode rotate
> and mask insert pattern. This patch fixes the problem by adding two subreg
> pattern for SImode rotate and mask insert patterns.
>
> gcc/
> PR target/93738
> * config/rs6000/rs6000.md (*rotlsi3_insert_9): New.
> (*rotlsi3_insert_8): New.
>
> gcc/testsuite/
> PR target/93738
> * gcc.target/powerpc/rlwimi-2.c: Adjust the number of 64bit and 32bit
> rotate instructions.
> * gcc.target/powerpc/rlwinm-0.c: Likewise.
>
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index bc8bc6ab060..b0b40f91e3e 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -4253,6 +4253,36 @@ (define_insn "*rotl3_insert"
> ; difference between rlwimi and rldimi. We also might want dot forms,
> ; but not for rlwimi on POWER4 and similar processors.
>
> +; Subreg pattern of insn "*rotlsi3_insert"
> +(define_insn_and_split "*rotlsi3_insert_9"
Nit: "*rotlsi3_insert_subreg" seems a better name, ...
> + [(set (match_operand:SI 0 "gpc_reg_operand" "=r")
> + (ior:SI (and:SI
> + (match_operator:SI 8 "lowpart_subreg_operator"
> + [(and:DI (match_operator:DI 4 "rotate_mask_operator"
> + [(match_operand:DI 1 "gpc_reg_operand" "r")
> + (match_operand:SI 2 "const_int_operand" "n")])
> +(match_operand:DI 3 "const_int_ope