https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125387
Bug ID: 125387
Summary: riscv: smuldi3_highpart cost too high
Product: gcc
Version: 17.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: anton at ozlabs dot org
Target Milestone: ---
I was debugging a case where integer division by a constant wasn't being
converted to the usual multiply-by-reciprocal sequence (multiply by a scaled
reciprocal, then shift the result right). An example:
#include <stdint.h>
int64_t foo(int64_t x)
{
  return x / 30;
}
# gcc -O2 -march=rv64gcv -mtune=tt-ascalon-d8 div.c -S -dp
foo:
li a5,30 # 6 [c=4 l=4] *movdi_64bit/1
div a0,a0,a5 # 12 [c=52 l=4] divdi3
ret # 25 [c=0 l=4] simple_return
If I remove the Ascalon tune, we see the expected behaviour:
# gcc -O2 -march=rv64gcv div.c -S -dp
foo:
li a4,-2004316160 # 32 [c=4 l=4] *movdi_64bit/1
addi a4,a4,-1911 # 33 [c=4 l=4] *adddi3/1
slli a5,a4,32 # 23 [c=4 l=4] ashldi3
add a5,a5,a4 # 24 [c=4 l=4] *adddi3/0
mulh a5,a0,a5 # 8 [c=88 l=4] smuldi3_highpart
srai a4,a0,63 # 11 [c=4 l=4] ashrdi3
add a5,a5,a0 # 9 [c=4 l=4] *adddi3/0
srai a5,a5,4 # 10 [c=4 l=4] ashrdi3
sub a0,a5,a4 # 17 [c=4 l=4] subdi3
ret # 36 [c=0 l=4] simple_return
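For reference, the multiply sequence above can be sketched in C roughly as
follows. This is only an illustration: mulh64 and div30_by_mul are
hypothetical names, the magic constant is the one in the REG_EQUAL note of
the smuldi3_highpart insn, and the code relies on GCC behaviour (__int128,
arithmetic right shift of negative values, wrapping unsigned-to-signed
conversion).

```c
#include <stdint.h>

/* Hypothetical helper mirroring mulh: high 64 bits of the signed
   64x64 -> 128-bit product.  Relies on GCC's __int128 extension.  */
int64_t mulh64(int64_t a, int64_t b)
{
  return (int64_t) (((__int128) a * (__int128) b) >> 64);
}

/* Hypothetical C equivalent of the generated sequence for x / 30.  */
int64_t div30_by_mul(int64_t x)
{
  /* 0x8888888888888889 reinterpreted as signed, i.e. -8608480567731124087
     (implementation-defined conversion; GCC wraps).  */
  const int64_t magic = (int64_t) 0x8888888888888889ULL;

  int64_t hi = mulh64(x, magic);  /* mulh a5,a0,a5 */
  hi += x;                        /* add  a5,a5,a0: undo the 2^64 bias of the signed constant */
  hi >>= 4;                       /* srai a5,a5,4  (arithmetic shift under GCC) */
  return hi - (x >> 63);          /* sub  a0,a5,a4: add 1 when x is negative */
}
```

The only expensive operation here is the single high-part multiply; the rest
are adds and shifts.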
The Ascalon integer divide cost is lower than in the default tune, but it is
still high enough that the multiply sequence above should be judged quicker.
The smuldi3_highpart ends up as a single mulh, but notice that its cost is
very high (88).
After stumbling around the RISC-V rtx cost code, I think we are adding the cost
of a lot of instructions (shifts, multiply, sign extensions etc). I think this
needs to be fixed in riscv_rtx_costs (it should just be the cost of 1 integer
multiply), but I'm fast getting out of my depth.
(insn 8 24 11 (set (reg:DI 15 a5 [137])
        (truncate:DI (lshiftrt:TI (mult:TI (sign_extend:TI (reg:DI 10 a0 [orig:143 shiftby ] [143]))
                    (sign_extend:TI (reg:DI 15 a5 [138])))
                (const_int 64 [0x40])))) "div.c":5:17 28 {smuldi3_highpart}
     (expr_list:REG_EQUAL (truncate:DI (lshiftrt:TI (mult:TI (sign_extend:TI (reg:DI 10 a0 [orig:143 shiftby ] [143]))
                    (const_int -8608480567731124087 [0x8888888888888889]))
                (const_int 64 [0x40])))
        (nil)))
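To illustrate the suspicion about the cost, here is a toy model in C. It is
not GCC code and does not use riscv_rtx_costs or GCC's rtx types: the enum,
the struct, and the per-node costs are all made up for illustration, and the
numbers are not meant to reproduce the 88 seen above. It only shows how
charging every node of a truncate/lshiftrt/mult/sign_extend tree separately
gives a larger total than recognising the whole shape as one high-part
multiply.

```c
#include <stddef.h>

/* Toy expression codes that only mimic the shape of the RTL above.  */
enum code { REG, CONST_INT, SIGN_EXTEND, MULT, LSHIFTRT, TRUNCATE };

struct expr {
  enum code code;
  const struct expr *op0, *op1;  /* NULL when unused */
};

/* Hypothetical costs, using 4 per instruction as in the c=4 entries.  */
enum { ONE_INSN = 4, TOY_MUL_COST = 4 * ONE_INSN };

static int node_cost(enum code c)
{
  switch (c) {
  case REG:
  case CONST_INT:
    return 0;
  case MULT:
    return TOY_MUL_COST;
  default:
    return ONE_INSN;
  }
}

/* Naive costing: every node in the tree is charged separately.  */
static int naive_cost(const struct expr *e)
{
  if (!e)
    return 0;
  return node_cost(e->code) + naive_cost(e->op0) + naive_cost(e->op1);
}

/* Match truncate (lshiftrt (mult (sign_extend a) (sign_extend b)) const).  */
static int is_highpart_mul(const struct expr *e)
{
  const struct expr *sh, *m;
  if (e->code != TRUNCATE || !e->op0 || e->op0->code != LSHIFTRT)
    return 0;
  sh = e->op0;
  if (!sh->op0 || sh->op0->code != MULT || !sh->op1 || sh->op1->code != CONST_INT)
    return 0;
  m = sh->op0;
  return m->op0 && m->op0->code == SIGN_EXTEND
         && m->op1 && m->op1->code == SIGN_EXTEND;
}

/* Pattern-aware costing: the whole high-part shape is one multiply.  */
static int pattern_cost(const struct expr *e)
{
  if (!e)
    return 0;
  if (is_highpart_mul(e)) {
    const struct expr *m = e->op0->op0;
    return TOY_MUL_COST + pattern_cost(m->op0->op0) + pattern_cost(m->op1->op0);
  }
  return node_cost(e->code) + pattern_cost(e->op0) + pattern_cost(e->op1);
}

/* Build the high-part multiply shape and report both totals.  */
void example_costs(int *naive, int *pattern)
{
  static const struct expr a = { REG, NULL, NULL }, b = { REG, NULL, NULL };
  static const struct expr c64 = { CONST_INT, NULL, NULL };
  static const struct expr sa = { SIGN_EXTEND, &a, NULL };
  static const struct expr sb = { SIGN_EXTEND, &b, NULL };
  static const struct expr mul = { MULT, &sa, &sb };
  static const struct expr shr = { LSHIFTRT, &mul, &c64 };
  static const struct expr hi = { TRUNCATE, &shr, NULL };

  *naive = naive_cost(&hi);
  *pattern = pattern_cost(&hi);
}
```

With these made-up numbers the naive walk charges the two sign extensions,
the shift and the truncate on top of the multiply, while the pattern-aware
walk charges the multiply alone.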