Hi,

on 2024/3/1 10:41, HAO CHEN GUI wrote:
> Hi,
>   This patch fixes regression cases in gcc.target/powerpc/rlwimi-2.c. In
> combine pass, SImode (subreg from DImode) lshiftrt is converted to DImode
> lshiftrt with an out AND. It matches a DImode rotate and mask insert on
> rs6000.
> 
> Trying 2 -> 7:
>     2: r122:DI=r129:DI
>       REG_DEAD r129:DI
>     7: r125:SI=r122:DI#0 0>>0x1f
>       REG_DEAD r122:DI
> Failed to match this instruction:
> (set (subreg:DI (reg:SI 125 [ x ]) 0)
>     (zero_extract:DI (reg:DI 129)
>         (const_int 32 [0x20])
>         (const_int 1 [0x1])))
> Successfully matched this instruction:
> (set (subreg:DI (reg:SI 125 [ x ]) 0)
>     (and:DI (lshiftrt:DI (reg:DI 129)
>             (const_int 31 [0x1f]))
>         (const_int 4294967295 [0xffffffff])))
> 
> This conversion blocks the further combination which combines to a SImode
> rotate and mask insert insn.
> 
> Trying 9, 7 -> 10:
>     9: r127:SI=r130:DI#0&0xfffffffffffffffe
>       REG_DEAD r130:DI
>     7: r125:SI#0=r129:DI 0>>0x1f&0xffffffff
>       REG_DEAD r129:DI
>    10: r124:SI=r127:SI|r125:SI
>       REG_DEAD r125:SI
>       REG_DEAD r127:SI
> Failed to match this instruction:
> (set (reg:SI 124)
>     (ior:SI (and:SI (subreg:SI (reg:DI 130) 0)
>             (const_int -2 [0xfffffffffffffffe]))
>         (subreg:SI (zero_extract:DI (reg:DI 129)
>                 (const_int 32 [0x20])
>                 (const_int 1 [0x1])) 0)))
> Failed to match this instruction:
> (set (reg:SI 124)
>     (ior:SI (and:SI (subreg:SI (reg:DI 130) 0)
>             (const_int -2 [0xfffffffffffffffe]))
>         (subreg:SI (and:DI (lshiftrt:DI (reg:DI 129)
>                     (const_int 31 [0x1f]))
>                 (const_int 4294967295 [0xffffffff])) 0)))
> 
>   The root cause of the issue is if it's necessary to do the widen mode for
> lshiftrt when the target already has the narrow mode lshiftrt and its cost
> is not high. My former patch tried to fix the problem but not accepted yet.
> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624852.html

Hope Segher can chime in this proposal updating combine pass, I can understand
the new proposal by introducing new patterns in target code is able to fix the
issue, but IMHO it's likely there are some other mis-optimization which don't
get noticed and they need some similar pattern extension (duplicate some pattern
& adjust with subreg) to optimize, from this perspective, it would be nice if
it's possible to have a more general fix.

Some minor comments for this patch itself are inlined.

> 
>   As it's stage 4 now, I drafted this patch to fix the regression by adding
> subreg patterns of SImode rotate and mask insert. It actually does reversed
> things and narrow the mode for lshiftrt so that it can matches the SImode
> rotate and mask insert.
> 
>   The case "rlwimi-2.c" is fixed and restore the corresponding number of
> insns to original ones. The case "rlwinm-0.c" is also changed and 9 "rlwinm"
> is replaced with 9 "rldicl" as the sequence of combine is changed. It's not
> a regression as the total number of insns isn't changed.
> 
>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions. Is it OK for the trunk?
> 
> Thanks
> Gui Haochen
> 
> 
> ChangeLog
> rs6000: Add subreg patterns for SImode rotate and mask insert
> 
> In combine pass, SImode (subreg from DImode) lshiftrt is converted to DImode
> lshiftrt with an AND.  The new pattern matches rotate and mask insert on
> rs6000.  Thus it blocks the pattern to be further combined to a SImode rotate
> and mask insert pattern.  This patch fixes the problem by adding two subreg
> pattern for SImode rotate and mask insert patterns.
> 
> gcc/
>       PR target/93738
>       * config/rs6000/rs6000.md (*rotlsi3_insert_9): New.
>       (*rotlsi3_insert_8): New.
> 
> gcc/testsuite/
>       PR target/93738
>       * gcc.target/powerpc/rlwimi-2.c: Adjust the number of 64bit and 32bit
>       rotate instructions.
>       * gcc.target/powerpc/rlwinm-0.c: Likewise.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index bc8bc6ab060..b0b40f91e3e 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -4253,6 +4253,36 @@ (define_insn "*rotl<mode>3_insert"
>  ; difference between rlwimi and rldimi.  We also might want dot forms,
>  ; but not for rlwimi on POWER4 and similar processors.
> 
> +; Subreg pattern of insn "*rotlsi3_insert"
> +(define_insn_and_split "*rotlsi3_insert_9"

Nit: "*rotlsi3_insert_subreg" seems a better name, ...

> +  [(set (match_operand:SI 0 "gpc_reg_operand" "=r")
> +     (ior:SI (and:SI
> +              (match_operator:SI 8 "lowpart_subreg_operator"
> +               [(and:DI (match_operator:DI 4 "rotate_mask_operator"
> +                         [(match_operand:DI 1 "gpc_reg_operand" "r")
> +                          (match_operand:SI 2 "const_int_operand" "n")])
> +                        (match_operand:DI 3 "const_int_operand" "n"))])
> +              (match_operand:SI 5 "const_int_operand" "n"))
> +             (and:SI (match_operand:SI 6 "gpc_reg_operand" "0")
> +                     (match_operand:SI 7 "const_int_operand" "n"))))]
> +  "rs6000_is_valid_insert_mask (operands[5], operands[4], SImode)
> +   && GET_CODE (operands[4]) == LSHIFTRT
> +   && INTVAL (operands[3]) == 0xffffffff
> +   && UINTVAL (operands[5]) + UINTVAL (operands[7]) + 1 == 0"
> +  "#"
> +  "&& 1"
> +  [(set (match_dup 0)
> +     (ior:SI (and:SI (lshiftrt:SI (match_dup 9)
> +                                  (match_dup 2))
> +                     (match_dup 5))
> +             (and:SI (match_dup 6)
> +                     (match_dup 7))))]
> +{
> +  int offset = BYTES_BIG_ENDIAN ? 4 : 0;
> +  operands[9] = gen_rtx_SUBREG (SImode, operands[1], offset);
> +}
> +  [(set_attr "type" "insert")])
> +
>  (define_insn "*rotl<mode>3_insert_2"
>    [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
>       (ior:GPR (and:GPR (match_operand:GPR 5 "gpc_reg_operand" "0")
> @@ -4331,6 +4361,31 @@ (define_insn "*rotlsi3_insert_4"
>    "rlwimi %0,%1,32-%h2,%h2,31"
>    [(set_attr "type" "insert")])
> 
> +; Subreg pattern of insn "*rotlsi3_insert_4"
> +(define_insn_and_split "*rotlsi3_insert_8"

..., and "*rotlsi3_insert_4_subreg" for this.

> +  [(set (match_operand:SI 0 "gpc_reg_operand" "=r")
> +     (ior:SI (and:SI (match_operand:SI 3 "gpc_reg_operand" "0")
> +                     (match_operand:SI 4 "const_int_operand" "n"))
> +             (match_operator:SI 6 "lowpart_subreg_operator"
> +               [(and:DI
> +                 (lshiftrt:DI (match_operand:DI 1 "gpc_reg_operand" "r")
> +                              (match_operand:SI 2 "const_int_operand" "n"))
> +                 (match_operand:DI 5 "const_int_operand" "n"))])))]
> +  "INTVAL (operands[2]) + exact_log2 (-UINTVAL (operands[4])) == 32
> +   && INTVAL (operands[5]) == 0xffffffff"
> +  "#"
> +  "&& 1"
> +  [(set (match_dup 0)
> +     (ior:SI (and:SI (match_dup 3)
> +                     (match_dup 4))
> +             (lshiftrt:SI (match_dup 7)
> +                          (match_dup 2))))]
> +{
> +  int offset = BYTES_BIG_ENDIAN ? 4 : 0;
> +  operands[7] = gen_rtx_SUBREG (SImode, operands[1], offset);
> +}
> +  [(set_attr "type" "insert")])
> +
>  (define_insn "*rotlsi3_insert_5"
>    [(set (match_operand:SI 0 "gpc_reg_operand" "=r,r")
>       (ior:SI (and:SI (match_operand:SI 1 "gpc_reg_operand" "0,r")
> diff --git a/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c 
> b/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c
> index bafa371db73..62344a95aa0 100644
> --- a/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c
> +++ b/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c
> @@ -6,10 +6,9 @@
>  /* { dg-final { scan-assembler-times {(?n)^\s+blr} 6750 } } */
>  /* { dg-final { scan-assembler-times {(?n)^\s+mr} 643 { target ilp32 } } } */
>  /* { dg-final { scan-assembler-times {(?n)^\s+mr} 11 { target lp64 } } } */
> -/* { dg-final { scan-assembler-times {(?n)^\s+rldicl} 7790 { target lp64 } } 
> } */
> +/* { dg-final { scan-assembler-times {(?n)^\s+rldicl} 6728 { target lp64 } } 
> } */
> 
> -/* { dg-final { scan-assembler-times {(?n)^\s+rlwimi} 1692 { target ilp32 } 
> } } */
> -/* { dg-final { scan-assembler-times {(?n)^\s+rlwimi} 1666 { target lp64 } } 
> } */
> +/* { dg-final { scan-assembler-times {(?n)^\s+rlwimi} 1692 } } */
> 
>  /* { dg-final { scan-assembler-times {(?n)^\s+mulli} 5036 } } */
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/rlwinm-0.c 
> b/gcc/testsuite/gcc.target/powerpc/rlwinm-0.c
> index 4f4fca2d8ef..a10d9174306 100644
> --- a/gcc/testsuite/gcc.target/powerpc/rlwinm-0.c
> +++ b/gcc/testsuite/gcc.target/powerpc/rlwinm-0.c
> @@ -4,10 +4,10 @@
>  /* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 6739 { target ilp32 } } 
> } */
>  /* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 9716 { target lp64 } } 
> } */
>  /* { dg-final { scan-assembler-times {(?n)^\s+blr} 3375 } } */
> -/* { dg-final { scan-assembler-times {(?n)^\s+rldicl} 3081 { target lp64 } } 
> } */
> +/* { dg-final { scan-assembler-times {(?n)^\s+rldicl} 3090 { target lp64 } } 
> } */
> 
>  /* { dg-final { scan-assembler-times {(?n)^\s+rlwinm} 3197 { target ilp32 } 
> } } */
> -/* { dg-final { scan-assembler-times {(?n)^\s+rlwinm} 3093 { target lp64 } } 
> } */
> +/* { dg-final { scan-assembler-times {(?n)^\s+rlwinm} 3084 { target lp64 } } 
> } */
>  /* { dg-final { scan-assembler-times {(?n)^\s+rotlwi} 154 } } */
>  /* { dg-final { scan-assembler-times {(?n)^\s+srwi} 13 { target ilp32 } } } 
> */
>  /* { dg-final { scan-assembler-times {(?n)^\s+srdi} 13 { target lp64 } } } */

Peter's commit r14-9085-g81e5f276c59897 has fixed this with same counts, not 
rebased?

BR,
Kewen

Reply via email to