Let's consider this code fragment (from xalan):
bool isValidAncestorType(int type) {
if (type == 0 || type == 6 || type == 4) {
return true;
}
return false;
}
Right now it generates something like this for RISC-V with Zicond enabled:
li a5,6
bgtu a0,a5,.L3
li a5,81
bext a0,a5,a0
ret
.L3:
li a0,0
ret
But there's a clearly better sequence. LLVM for example generates:
addi a1, a0, -6
seqz a1, a1
andi a0, a0, -5
seqz a0, a0
or a0, a1, a0
And with a little work GCC can generate this (which is probably better
than LLVM's codegen ever-so-slightly):
li a4,6 # 8 [c=4 l=4] *movdi_64bit/1
li a5,81 # 33 [c=4 l=4] *movdi_64bit/1
bext a5,a5,a0 # 36 [c=4 l=4] *bextdi
sgtu a0,a0,a4 # 38 [c=4 l=4] *sgtu_didi
czero.nez a0,a5,a0 # 39 [c=4 l=4] *czero.nez.didi
There's multiple ways Daniel B. and I looked at to resolve this. At the
core the failure to if-convert is a cost model problem.
We could try to generate more efficient code out of cmove_arith. What
we're currently getting out of that routine is horribly bad:
```
(insn 10 3 34 2 (set (reg:DI 138)
(const_int 6 [0x6])) 277 {*movdi_64bit}
(nil))
(insn 34 10 35 2 (set (reg:DI 145)
(const_int 0 [0])) 277 {*movdi_64bit}
(nil))
(insn 35 34 36 2 (set (reg:DI 141)
(const_int 81 [0x51])) 277 {*movdi_64bit}
(nil))
(insn 36 35 37 2 (set (reg:DI 140 [ _3 ])
(lshiftrt:DI (reg:DI 141)
(subreg:QI (reg/v:DI 137 [ type ]) 0))) 301 {lshrdi3}
(nil))
(insn 37 36 38 2 (set (reg:DI 143)
(and:DI (reg:DI 140 [ _3 ])
(const_int 1 [0x1]))) 106 {*anddi3}
(nil))
(insn 38 37 40 2 (set (reg:DI 146)
(zero_extend:DI (subreg:QI (reg:DI 143) 0))) 126
{*zero_extendqidi2_internal}
(nil))
(insn 40 38 41 2 (set (reg:DI 147)
(gtu:DI (reg/v:DI 137 [ type ])
(reg:DI 138))) 428 {*sgtu_didi}
(nil))
(insn 41 40 42 2 (set (reg:DI 149)
(if_then_else:DI (ne:DI (reg:DI 147)
(const_int 0 [0]))
(const_int 0 [0])
(reg:DI 146))) 40369 {*czero.nez.didi}
(nil))
(insn 42 41 43 2 (set (reg:DI 148)
(if_then_else:DI (eq:DI (reg:DI 147)
(const_int 0 [0]))
(const_int 0 [0])
(reg:DI 145))) 40368 {*czero.eqz.didi}
(nil))
(insn 43 42 25 2 (set (reg:DI 136 [ <retval> ])
(plus:DI (reg:DI 148)
(reg:DI 149))) 5 {*adddi3}
```
In particular note insn 34, 42 and 43. Those are useless. Insns 36,
37, 38 are just a single bit extraction from a variable location (from
one of the if-converted blocks). I couldn't see a good way to fix the
problem with insn 34/insn 42. The desire to make the then/else blocks
independent is cmove_arith is good, what's unclear is whether or not
that code really cares about the *destination* of the then/else blocks.
But I set that aside.
We then thought that cleaning up the variable bit extraction would be
the way to go. So a pattern was constructed to match that form of
variable bit extract and the cost model was twiddled to return that it
was a single fast instruction. But even with those changes fwprop1
refused to make the substitution. Sigh. At least combine recognizes
the idiom later and cleans it up.
Then we realized we really should just ignore the (set (reg) (const_int
0)) in the if-converted sequence. We're going to be able to propagate
that away in nearly every case since we have a hard-wired zero register.
Sure enough, ignoring that insn was enough to tip the balance on this
case and we get the desired code.
Tested on riscv32-elf and riscv64-elf. Pioneer bootstrap is in flight,
though it won't really exercise this problem. The BPI's build hasn't
started yet, so it'll be at least 27hours before it's done.
Waiting on pre-commit CI before moving forward.
Jeff
gcc/
* config//riscv/riscv.cc (riscv_noce_conversion_profitable_p): Ignore
assignments of (const_int 0) to a register. They will get propagated
away.
gcc/testsuite
* gcc.target/riscv/czero-bext.c: New test.
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index e186c6a99e9d..2bfa953d3dca 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -4722,6 +4722,13 @@ riscv_noce_conversion_profitable_p (rtx_insn *seq,
if (last_dest)
last_dest = dest;
}
+ else if (REG_P (dest) && src == CONST0_RTX( GET_MODE (dest)))
+ {
+ /* A GPR set to zero can always be replaced with x0, so any
+ insn that sets a GPR to zero will eventually be eliminated. */
+ riscv_if_info.original_cost += COSTS_N_INSNS (1);
+ riscv_if_info.max_seq_cost += COSTS_N_INSNS (1);
+ }
else
last_dest = NULL_RTX;
diff --git a/gcc/testsuite/gcc.target/riscv/czero-bext.c
b/gcc/testsuite/gcc.target/riscv/czero-bext.c
new file mode 100644
index 000000000000..0cb32bd610d4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/czero-bext.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=rv64gc_zicond -mabi=lp64d" { target { rv64 } } } */
+/* { dg-options "-O2 -march=rv32gc_zicond -mabi=ilp32" { target { rv32 } } } */
+
+bool isValidAncestorType(int type) {
+ if (type == 0 || type == 6 || type == 4) {
+ return true;
+ }
+ return false;
+}
+
+
+
+/* { dg-final { scan-assembler "czero.nez\t" } } */
+/* { dg-final { scan-assembler "sgtu\t" } } */
+/* { dg-final { scan-assembler-not "bgtu\t" } } */
+