[to-be-committed][RISC-V][PR target/116256] Improve handling of single bit constants

Jeff Law Sat, 25 Jan 2025 06:23:07 -0800

So under the umbrella of pr116256 (P3 regression) I've been exploringremoval of the mvconst_internal pattern. Not surprisingly, that'sgoing to cause all kinds of undesirable fallout. While I can kind ofsee a path forward for that work, it's going to require some combinework that I don't think we want to tackle in the context of gcc-15.

Essentially without mvconst_internal we'll have fully exposed constantsynthesis prior to combine. Remember that combine has limits on whatcombinations it will perform based on how many instructions are in thesource sequence. If we need 2+ instructions to synthesize the constant,those eat into our budget.

In a world without mvconst_internal we'd need to either improve combineto handle 5 insns cases (which do show up in the testsuite) or we needto significantly improve how combine handles REG_EQUAL notes. 5 insncombinations sound like insanity to me. So I'd tend to lean towards thelatter, though that's going to need some refactoring and diving intonote redistribution (ugh!).

In the mean time we can start limiting mvconst_internal. For theremaining case in pr116256 we have this code in combine:

(insn 8 5 10 2 (set (reg:V2048HF 138 [ _5 ])
        (vec_duplicate:V2048HF (reg:HF 142 [ x ]))) "j.c":152:11 3712 
{*vec_duplicatev2048hf}
     (expr_list:REG_DEAD (reg:HF 142 [ x ])
        (nil)))
(insn 10 8 11 2 (set (reg:DI 139)
        (const_int 2048 [0x800])) "j.c":152:11 275 {*mvconst_internal}

(nil))(insn 11 10 0 2 (set (mem:V2048HF (reg/f:DI 141 [ in ]) [1 MEM <vector(2048) _Float16> [(_Float16 *)in_7(D)]+0 S4096 A128])

        (if_then_else:V2048HF (unspec:V2048BI [
                    (const_vector:V2048BI [
                            (const_int 1 [0x1]) repeated x2048
                        ])
                    (reg:DI 139)
                    (const_int 2 [0x2]) repeated x3
                    (reg:SI 66 vl)
                    (reg:SI 67 vtype)
                ] UNSPEC_VPREDICATE)
            (reg:V2048HF 138 [ _5 ])
            (unspec:V2048HF [
                    (reg:DI 0 zero)
                ] UNSPEC_VUNDEF))) "j.c":152:11 3843 {*pred_movv2048hf}
     (expr_list:REG_DEAD (reg/f:DI 141 [ in ])
        (expr_list:REG_DEAD (reg:DI 0 zero)
            (expr_list:REG_DEAD (reg:SI 66 vl)
                (expr_list:REG_DEAD (reg:SI 67 vtype)
                    (expr_list:REG_DEAD (reg:V2048HF 138 [ _5 ])
                        (expr_list:REG_DEAD (reg:DI 139)
                            (nil))))))))

Note a couple things. First insn 8 will be split shortly after combineand will need the constant 2048. But that's obviously exposed late.Second (of course) is the mvconst_internal pattern at insn 10. Aftersplit1 we'll have:

(insn 16 5 17 2 (set (reg:DI 144)(const_int 4096 [0x1000])) "j.c":152:11 -1

     (nil))
(insn 17 16 18 2 (set (reg:DI 143)
        (plus:DI (reg:DI 144)
            (const_int -2048 [0xfffffffffffff800]))) "j.c":152:11 -1
     (expr_list:REG_EQUAL (const_int 2048 [0x800])
        (nil)))
(insn 18 17 19 2 (set (reg:V2048HF 138 [ _5 ])

(if_then_else:V2048HF (unspec:V2048BI [(const_vector:V2048BI [

                            (const_int 1 [0x1]) repeated x2048
                        ])
                    (reg:DI 143)
                    (const_int 2 [0x2]) repeated x3
                    (reg:SI 66 vl)
                    (reg:SI 67 vtype)
                ] UNSPEC_VPREDICATE)
            (vec_duplicate:V2048HF (reg:HF 142 [ x ]))

(unspec:V2048HF [(reg:DI 0 zero)

                ] UNSPEC_VUNDEF))) "j.c":152:11 -1
     (nil))
(insn 19 18 20 2 (set (reg:DI 145)
        (const_int 4096 [0x1000])) "j.c":152:11 -1
     (nil))
(insn 20 19 11 2 (set (reg:DI 139)
        (plus:DI (reg:DI 145)
            (const_int -2048 [0xfffffffffffff800]))) "j.c":152:11 -1
     (expr_list:REG_EQUAL (const_int 2048 [0x800])
        (nil)))
(insn 11 20 0 2 (set (mem:V2048HF (reg/f:DI 141 [ in ]) [1 MEM <vector(2048) 
_Float16> [(_Float16 *)in_7(D)]+0 S4096 A128])
        (if_then_else:V2048HF (unspec:V2048BI [
                    (const_vector:V2048BI [
                            (const_int 1 [0x1]) repeated x2048
                        ])

(reg:DI 139)(const_int 2 [0x2]) repeated x3

                    (reg:SI 66 vl)
                    (reg:SI 67 vtype)
                ] UNSPEC_VPREDICATE)
            (reg:V2048HF 138 [ _5 ])

(unspec:V2048HF [(reg:DI 0 zero)

                ] UNSPEC_VUNDEF))) "j.c":152:11 3843 {*pred_movv2048hf}
     (expr_list:REG_DEAD (reg/f:DI 141 [ in ])

(expr_list:REG_DEAD (reg:DI 0 zero)(expr_list:REG_DEAD (reg:SI 66 vl)

                (expr_list:REG_DEAD (reg:SI 67 vtype)
                    (expr_list:REG_DEAD (reg:V2048HF 138 [ _5 ])
                        (expr_list:REG_DEAD (reg:DI 139)

(nil))))))))

Note the synthesis of 2048 appears twice. I seriously considered addinga local cprop pass at this point. That could be done with a bit ofwork. It didn't look too bad -- the biggest problem is cprop isn'tdesigned to run once we've left cfglayout. But we could probablyfinesse that by not allowing it to change jumps if we've left cfglayoutor converting it to do the more complex jump fixups.

You might ask why the post-reload optimizers don't help since this atleast looks like a case where they could. After LRA the RTL looks like:

(insn 26 5 25 2 (set (reg:DI 15 a5 [144])

(const_int 4096 [0x1000])) "/home/jlaw/test/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/dup-1.c":152:11 277 {*movdi_64bit}(expr_list:REG_EQUIV (const_int 4096 [0x1000])

        (nil)))
(insn 25 26 19 2 (set (reg:DI 15 a5 [143])
        (plus:DI (reg:DI 15 a5 [144])
            (const_int -2048 [0xfffffffffffff800]))) 
"/home/jlaw/test/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/dup-1.c":152:11
 5 {adddi3}
     (expr_list:REG_EQUIV (const_int 2048 [0x800])
        (nil)))
(insn 19 25 20 2 (set (reg:V2048QI 100 v4 [orig:138 _11 ] [138])
        (if_then_else:V2048QI (unspec:V2048BI [
                    (const_vector:V2048BI [
                            (const_int 1 [0x1]) repeated x2048
                        ])
                    (reg:DI 15 a5 [143])
                    (const_int 2 [0x2]) repeated x3
                    (reg:SI 66 vl)
                    (reg:SI 67 vtype)
                ] UNSPEC_VPREDICATE)
            (vec_duplicate:V2048QI (reg:QI 12 a2 [145]))

(unspec:V2048QI [(reg:DI 0 zero)

                ] UNSPEC_VUNDEF))) 
"/home/jlaw/test/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/dup-1.c":152:11
 4172 {*pred_broadcastv2048qi}

(nil))(insn 20 19 21 2 (set (reg:DI 15 a5 [146])(const_int 4096 [0x1000])) "/home/jlaw/test/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/dup-1.c":152:11 277 {*movdi_64bit}(expr_list:REG_EQUIV (const_int 4096 [0x1000])

        (nil)))
(insn 21 20 11 2 (set (reg:DI 15 a5 [139])
        (plus:DI (reg:DI 15 a5 [146])
            (const_int -2048 [0xfffffffffffff800]))) 
"/home/jlaw/test/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/dup-1.c":152:11
 5 {adddi3}
     (expr_list:REG_EQUIV (const_int 2048 [0x800])
        (nil)))

Note the re-use of a5 for the constant synthesis steps. That's going tospoil any chance of reload_cse saving us. That re-use also gets in theway of vsetvl elimination and we ultimately get this code:

foo10:
        li      a5,4096
        addi    a5,a5,-2048
        vsetvli zero,a5,e16,m8,ta,ma
        vfmv.v.f        v8,fa0
        li      a5,4096
        addi    a5,a5,-2048
        vsetvli zero,a5,e16,m8,ta,ma
        vse16.v v8,0(a0)
        ret

The regression is we have the obviously redundant vsetvl. Theadditional copy of the synthesis is undesirable as well.

If we filter out single bit constants from mvconst_internal we triviallyfix that regression. The only fallout is a class of saturation testswhich want to test against 0x80000000. Under the hood this is a minorcodegen issue interacting badly with combine's deliberate rejection ofsimplification of extensions of constants. Rather than constructing theSImode constant, then zero extending the result we can just generate theconstant we actually want directly in DImode.

The net is we fix the regression, don't introduce any obvious newregressions and slightly reduce our dependence on mvconst_internal. Allgood in my book. Obviously I'll wait for pre-commit CI to render a verdict.


jeff

        PR target/116256
gcc/
        * config/riscv/riscv.md (mvconst_internal): Reject single bit
        constants.
        * config/riscv/riscv.cc (riscv_gen_zero_extend_rtx): Improve
        handling constants.

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index e4123c912dc..587bb1c5572 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -2470,7 +2470,8 @@ (define_insn_and_split "*mvconst_internal"
         (match_operand:GPR 1 "splittable_const_int_operand" "i"))]
   "!ira_in_progress
    && !(p2m1_shift_operand (operands[1], <MODE>mode)
-        || high_mask_shift_operand (operands[1], <MODE>mode))"
+        || high_mask_shift_operand (operands[1], <MODE>mode)
+       || exact_log2 (INTVAL (operands[1])) >= 0)"
   "#"
   "&& 1"
   [(const_int 0)]
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 5a3a0504177..4652454b8fe 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -12684,10 +12684,16 @@ riscv_gen_zero_extend_rtx (rtx x, machine_mode mode)
     emit_move_insn (xmode_reg, x);
   else
     {
-      rtx reg_x = gen_reg_rtx (mode);
+      /* Combine deliberately does not simplify extensions of constants
+        (long story).  So try to generate the zero extended constant
+        efficiently.
 
-      emit_move_insn (reg_x, x);
-      riscv_emit_unary (ZERO_EXTEND, xmode_reg, reg_x);
+        First extract the constant and mask off all the bits not in MODE.  */
+      HOST_WIDE_INT val = INTVAL (x);
+      val &= GET_MODE_MASK (mode);
+
+      /* X may need synthesis, so do not blindly copy it.  */
+      xmode_reg = force_reg (Xmode, gen_int_mode (val, Xmode));
     }
 
   return xmode_reg;

[to-be-committed][RISC-V][PR target/116256] Improve handling of single bit constants

Reply via email to