https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125191
Bug ID: 125191
Summary: lra introduces redundant vector reg copy with paradoxical subreg
Product: gcc
Version: 17.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: rdapp at gcc dot gnu.org
CC: garthlei at gcc dot gnu.org
Target Milestone: ---
Target: riscv
The following source
typedef char vnx16qi __attribute__ ((vector_size (16)));
void permute1 (vnx16qi values1, vnx16qi values2, vnx16qi *out)
{
  vnx16qi v = __builtin_shufflevector (values1, values2, 0, 2, 4, 6, 8, 10, 12,
                                       14, 16, 18, 20, 22, 24, 26, 28, 30);
  *(vnx16qi *) out = v;
}
(with the riscv backend modified locally to emit narrowing shifts instead of
compress insns for this permute case)
results in
...
vnsrl.wi v1,v1,0
vnsrl.wi v2,v2,0
vsetivli zero,16,e8,m1,ta,ma
vmv1r.v v3,v1 # redundant
vslideup.vi v3,v2,8
vse8.v v3,0(a4)
The copy is unnecessary. The slide-up could write v1 directly (with the store
then reading v1), so it should just be
vslideup.vi v1,v2,8
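
For reference, here is a scalar sketch of what the sequence is meant to compute. It is a simplified model, not the backend's implementation: the helper names (permute_ref, narrow_shift0, slideup, permute_model, permute_model_matches) are made up for illustration, and it assumes the narrowing shift sees each input as eight little-endian 16-bit elements. It shows that two narrowing shifts by 0 followed by an in-place slide-up by 8 give the even-element permute without any extra register copy.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { N = 16 };

/* Reference: result[i] is element 2*i of the concatenation a ++ b,
   i.e. the index list 0, 2, ..., 30 from the testcase.  */
static void permute_ref (const uint8_t *a, const uint8_t *b, uint8_t *out)
{
  for (int i = 0; i < N; i++)
    {
      int idx = 2 * i;
      out[i] = idx < N ? a[idx] : b[idx - N];
    }
}

/* Assumed model of "vnsrl.wi vd,vs,0": view the 16 bytes as eight
   little-endian 16-bit elements and keep the low byte of each.  */
static void narrow_shift0 (const uint8_t *src, uint8_t *dst)
{
  for (int i = 0; i < N / 2; i++)
    {
      uint16_t wide = (uint16_t) (src[2 * i] | (src[2 * i + 1] << 8));
      dst[i] = (uint8_t) (wide >> 0);
    }
}

/* Assumed model of "vslideup.vi vd,vs,offset": elements below OFFSET
   keep the old value of the destination.  */
static void slideup (uint8_t *dst, const uint8_t *src, int offset)
{
  assert (offset >= 0 && offset <= N);
  for (int i = offset; i < N; i++)
    dst[i] = src[i - offset];
}

/* Narrow both inputs, then slide the second into the first in place,
   mirroring "vslideup.vi v1,v2,8" with no intermediate copy.  */
static void permute_model (const uint8_t *a, const uint8_t *b, uint8_t *out)
{
  uint8_t lo[N] = { 0 }, hi[N] = { 0 };
  narrow_shift0 (a, lo);
  narrow_shift0 (b, hi);
  slideup (lo, hi, 8);
  memcpy (out, lo, N);
}

/* Returns 1 if the model agrees with the reference on sample data.  */
int permute_model_matches (void)
{
  uint8_t a[N], b[N], ref[N], got[N];
  for (int i = 0; i < N; i++)
    {
      a[i] = (uint8_t) (i * 7 + 1);
      b[i] = (uint8_t) (i * 13 + 100);
    }
  permute_ref (a, b, ref);
  permute_model (a, b, got);
  return memcmp (ref, got, N) == 0;
}
```

Calling permute_model_matches () from a small main is enough to check the model. The slide-up destination doubles as the first source here, which is the tie that the "0" constraint on operand 2 expresses in the insn.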
The insn is
(insn 24 34 25 2 (set (reg:V16QI 144 [ v_3 ])
(unspec:V16QI [
(unspec:V16BI [
(const_vector:V16BI [
(const_int 1 [0x1]) repeated x16
])
(const_int 16 [0x10])
(const_int 2 [0x2]) repeated x3
(reg:SI 66 vl)
(reg:SI 67 vtype)
] UNSPEC_VPREDICATE)
(subreg:V16QI (reg:V8QI 146) 0)
(subreg:V16QI (reg:V8QI 147) 0)
(const_int 8 [0x8])
] UNSPEC_VSLIDEUP)) "bla.c":6:11 22727 {pred_slideupv16qi}
(note the paradoxical subreg on each source)
ira already decides to use the same register for the destination (r144) and
the first source (r146):
Popping a1(r144,l0) -- assign reg 97
Popping a2(r146,l0) -- assign reg 97
lra seems OK with that at first:
Considering alt=3 of insn 24: (0) &vr (1) Wc1 (2) 0 (3) vr (4) rK (5) rvl (6) i (7) i (8) i
0 Early clobber: reject++
overall=1,losers=0,rld_nregs=0
Choosing alt 3 in insn 24: (0) &vr (1) Wc1 (2) 0 (3) vr (4) rK (5) rvl (6) i (7) i (8) i {pred_slideupv16qi}
but then:
********** Assignment #1: **********
Spill r144 after risky transformations
getting us to:
(insn 44 23 24 2 (set (reg:V16QI 99 v3 [orig:144 v_3 ] [144])
(reg:V16QI 97 v1 [146])) "bla.c":6:11 3328 {*movv16qi}
(nil))
(insn 24 44 25 2 (set (reg:V16QI 99 v3 [orig:144 v_3 ] [144])
(unspec:V16QI [
(unspec:V16BI [
(const_vector:V16BI [
(const_int 1 [0x1]) repeated x16
])
(const_int 16 [0x10])
(const_int 2 [0x2]) repeated x3
(reg:SI 66 vl)
(reg:SI 67 vtype)
] UNSPEC_VPREDICATE)
(reg:V16QI 99 v3 [orig:144 v_3 ] [144])
(reg:V16QI 98 v2 [147])
(const_int 8 [0x8])
] UNSPEC_VSLIDEUP)) "bla.c":6:11 22727 {pred_slideupv16qi}
(expr_list:REG_EQUIV (mem:V16QI (reg/f:DI 14 a4 [orig:156 out ] [156]) [0 *out_5(D)+0 S16 A128])
(nil)))
I suppose the paradoxical subreg is the culprit. I haven't done further
analysis and wanted to document the current state.