https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108999
Bug ID: 108999
Summary: Maybe LRA produce inaccurate hardware register occupancy information for subreg operand
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: lehua.ding at rivai dot ai
Target Milestone: ---

The code that shows the problem is on Compiler Explorer: https://godbolt.org/z/GaGWEahPY

The problem is that the line `mov z1.d, z4.d` in the assembly code [1] is unnecessary. I found that the reason is that the LRA pass [2] thinks the hard registers occupied by `(subreg:VNx2DI (reg/v:VNx4DI 103 [ result ]) [16, 16])` conflict with those of `(reg/v:VNx2DI 98 [ v19 ])` [3]. That is not true: according to the IRA dump [4], r103 occupies hard registers 32 and 33, and r98 occupies hard register 34.

The cause is in the function `process_alt_operands` in lra-constraints.cc [5]. When it checks whether operand 0 of insn 39 conflicts with the other operands of insn 39, it treats operand 0 as occupying hard registers 33 and 34, based on the mode (`biggest_mode[i]`) and the starting hard regno 33 (`clobbered_hard_regno`). The mode it uses is VNx4DI, but I think it should use VNx2DI, which is the proper mode of operand 0 as a whole. In other words, for an ordinary subreg operand, computing the occupied hard registers from the inner reg's mode may give a range that is too wide.

References:

[1] assembly code
```
subreg_coalesce5:
        mov     p1.b, p0.b
        ld2d    {z0.d - z1.d}, p0/z, [x0]
        cmp     w1, 0
        ble     .L2
        sxtw    x1, w1
        mov     x0, 0
.L3:
        ld1d    z3.d, p1/z, [x2, x0, lsl 3]
        ld1d    z2.d, p1/z, [x3, x0, lsl 3]
        add     x0, x0, 1
        movprfx z4.d, p0/z, z1.d
        mla     z4.d, p0/m, z3.d, z2.d
        movprfx z0.d, p0/z, z0.d
        mla     z0.d, p0/m, z3.d, z2.d
        mov     z1.d, z4.d
        cmp     x1, x0
        bne     .L3
.L2:
        st2d    {z0.d - z1.d}, p0, [x4]
        ret
```

[2] partial content of the LRA dump
```
...
            0 Early clobber: reject++
            0 Conflict early clobber reload: reject--
          alt=0,overall=6,losers=1,rld_nregs=0
            0 Early clobber: reject++
          alt=1,overall=1,losers=0,rld_nregs=0
     Choosing alt 1 in insn 36:  (0) &w  (1) Upl  (2) w  (3) w  (4) 0  (5) Dz {*cond_fmavnx2di_any}
            0 Early clobber: reject++
            0 Matched conflict early clobber reloads: reject--
          alt=0,overall=6,losers=1,rld_nregs=0
            0 Early clobber: reject++
            0 Conflict early clobber reload: reject--
          alt=1,overall=6,losers=1,rld_nregs=0
            0 Early clobber: reject++
            2 Matching earlyclobber alt: reject--
          alt=2,overall=6,losers=1,rld_nregs=1
            0 Early clobber: reject++
            3 Matching earlyclobber alt: reject--
          alt=3,overall=6,losers=1,rld_nregs=1
            0 Early clobber: reject++
            5 Matching earlyclobber alt: reject--
            5 Non-pseudo reload: reject+=2
            5 Non input pseudo reload: reject++
          alt=4,overall=9,losers=1 -- refuse
            Staticly defined alt reject+=6
            0 Early clobber: reject++
            5 Non-pseudo reload: reject+=2
            5 Non input pseudo reload: reject++
          alt=5,overall=16,losers=1 -- refuse
     Choosing alt 0 in insn 39:  (0) =&w  (1) Upl  (2) w  (3) w  (4) w  (5) Dz {*cond_fmavnx2di_any}
      Creating newreg=117, assigning class FP_REGS to r117
   39: r117:VNx2DI=unspec[r104:VNx16BI#0,r97:VNx2DI*r98:VNx2DI+r103:VNx4DI#[16,16],const_vector] 284
      REG_DEAD r98:VNx2DI
      REG_DEAD r97:VNx2DI
    Inserting insn reload after:
   76: r103:VNx4DI#[16,16]=r117:VNx2DI
...
```

[3] partial RTL from the IRA pass
```lisp
(insn 36 43 37 4 (set (subreg:VNx2DI (reg/v:VNx4DI 103 [ result ]) 0)
        (unspec:VNx2DI [
                (subreg:VNx2BI (reg/v:VNx16BI 104 [ pg ]) 0)
                (plus:VNx2DI (mult:VNx2DI (reg/v:VNx2DI 97 [ v18 ])
                        (reg/v:VNx2DI 98 [ v19 ]))
                    (subreg:VNx2DI (reg/v:VNx4DI 103 [ result ]) 0))
                (const_vector:VNx2DI repeat [
                        (const_int 0 [0])
                    ])
            ] UNSPEC_SEL)) "/app/example.c":13:25 discrim 1 7465 {*cond_fmavnx2di_any}
     (nil))
(insn 39 37 40 4 (set (subreg:VNx2DI (reg/v:VNx4DI 103 [ result ]) [16, 16])
        (unspec:VNx2DI [
                (subreg:VNx2BI (reg/v:VNx16BI 104 [ pg ]) 0)
                (plus:VNx2DI (mult:VNx2DI (reg/v:VNx2DI 97 [ v18 ])
                        (reg/v:VNx2DI 98 [ v19 ]))
                    (subreg:VNx2DI (reg/v:VNx4DI 103 [ result ]) [16, 16]))
                (const_vector:VNx2DI repeat [
                        (const_int 0 [0])
                    ])
            ] UNSPEC_SEL)) "/app/example.c":14:25 discrim 1 7465 {*cond_fmavnx2di_any}
     (expr_list:REG_DEAD (reg/v:VNx2DI 98 [ v19 ])
        (expr_list:REG_DEAD (reg/v:VNx2DI 97 [ v18 ])
            (nil))))
```

[4] partial content of the IRA dump
```
Disposition:
    6:r96  l0    69   24:r97  l0    35   23:r98  l0    34    4:r99  l0     1
    3:r102 l0     0    2:r103 l0    32    1:r104 l0    68    5:r106 l0     1
    7:r107 l0     2    8:r108 l0     3    0:r109 l0     4   12:r110 l0    68
    9:r111 l0     0   14:r112 l0     1   13:r113 l0     2   11:r114 l0     3
   10:r115 l0     4
```

[5] partial source code of process_alt_operands
```c++
/* lra-constraints.cc */
static bool
process_alt_operands (int only_alternative)
{
  for (nop = 0; nop < n_operands; nop++)
    {
      ...
      biggest_mode[nop] = GET_MODE (op);
      if (GET_CODE (op) == SUBREG)
        {
          /* !!! Here the inner reg's mode is used instead of the
             subreg's own mode.  */
          biggest_mode[nop] = wider_subreg_mode (op);
          operand_reg[nop] = reg = SUBREG_REG (op);
        }
    }
  ...
  for (nalt = 0; nalt < n_alternatives; nalt++)
    {
      ...
      for (nop = 0; nop < early_clobbered_regs_num; nop++)
        {
          ...
          /* !!! Here operand 0 is marked as occupying hard regs 33 and 34,
             where biggest_mode[i] is VNx4DI and
             clobbered_hard_regno is 33.  */
          add_to_hard_reg_set (&temp_set, biggest_mode[i], clobbered_hard_regno);
          ...
        }
    }
  ...
}
```