https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116904
Bug ID: 116904
Summary: RISC-V: address calculation not hoisted from loop
Product: gcc
Version: 14.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: a-horohorin at mail dot ru
Target Milestone: ---

Hi, I have found a performance bug on the RISC-V target. Example:

```c
int main() {
    int arr[5000];
    for (int i = 0; i < 5000; ++i) {
        arr[i] = 1;
    }
    __asm__ volatile ( "" : : "r" (arr[42]) );
    return 0;
}
```

GCC 13 produces the following asm:

```
; riscv64-unknown-elf-gcc -S -O2 minimal.c
main:
        li      t0,-20480
        li      a3,20480
        addi    t0,t0,464
        li      a5,-20480
        add     sp,sp,t0
        addi    a5,a5,480
        addi    a4,a3,-480
        add     a4,a4,a5
        addi    a5,sp,16
        add     a5,a4,a5
        addi    a3,a3,-480
        addi    a4,sp,16
        add     a3,a3,a4
        li      a4,1
; loop contains 3 insns (ok)
.L2:
        sw      a4,0(a5)
        addi    a5,a5,4
        bne     a5,a3,.L2
        li      a4,20480
        li      a5,-20480
        addi    a4,a4,-480
        add     a4,a4,a5
        addi    a5,sp,16
        add     a5,a4,a5
        sw      a5,12(sp)
        lw      a5,572(a5)
        li      t0,20480
        addi    t0,t0,-464
        li      a0,0
        add     sp,sp,t0
        jr      ra
```

And GCC 14 produces the following asm:

```
; riscv64-unknown-elf-gcc -S -O2 minimal.c
main:
        li      t0,-20480
        addi    t0,t0,480
        add     sp,sp,t0
        mv      a5,sp
        li      a4,1
; loop contains 6 insns instead of 3!
.L2:
        li      a3,20480
        addi    a3,a3,-480
        sw      a4,0(a5)
        add     a3,a3,sp
        addi    a5,a5,4
        bne     a5,a3,.L2
        lw      a5,168(sp)
        li      t0,20480
        addi    t0,t0,-480
        li      a0,0
        add     sp,sp,t0
        jr      ra
```

Refer to https://godbolt.org/z/n6bzTzzhG

My guess is that the cause of the regression is the equivalence substitution done during reload: the insns it inserts are not hoisted out of the loop by subsequent passes.

```lra_dump_file from gcc14
...
Changing pseudo 141 in operand 3 of insn 39 on equiv frame:SI+0x9c40
          Considering alt=0 of insn 39:   (2) r  (3) rJ
            3 Non-pseudo reload: reject+=2
            3 Non input pseudo reload: reject++
          overall=9,losers=1,rld_nregs=1
         Choosing alt 0 in insn 39:  (2) r  (3) rJ {*branchsi}
      Creating newreg=161, assigning class GR_REGS to r161
      Set class ALL_REGS for r162
   39: pc={(r138:SI!=r161:SI)?L38:pc}
      REG_BR_PROB 1062895956
    Inserting insn reload before:
   76: r162:SI=0xa000
   77: r161:SI=r162:SI-0x3c0
      REG_EQUAL 0x9c40
   78: r161:SI=r161:SI+frame:SI
...
```
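To make the cost difference concrete, here is a rough source-level model of the two loop shapes from the `-O2` listings for the 5000-element example. It is only a sketch: the function names (`end_offset`, `fill_hoisted`, `fill_rematerialized`) and the counter are hypothetical and not taken from GCC, and a compiler is free to optimize both versions to the same code. It assumes a 4-byte `int`.

```c
#include <assert.h>
#include <stdint.h>

enum { N = 5000 };

/* Byte offset of the loop bound, built the way the asm builds it
   (hypothetical helper, not a GCC function). */
int32_t end_offset(void) {
    int32_t hi = 20480;        /* li   a3,20480   (20480 = 5 << 12) */
    int32_t off = hi - 480;    /* addi a3,a3,-480 */
    /* Assumes 4-byte int: 5000 * 4 = 20000 bytes. */
    assert(off == N * (int32_t)sizeof(int));
    return off;
}

/* GCC 13 shape: the end address is computed once, before the loop.
   Returns how many times the bound was computed. */
unsigned fill_hoisted(int *arr) {
    int *end = (int *)((char *)arr + end_offset());
    unsigned bound_computations = 1;
    for (int *p = arr; p != end; ++p)
        *p = 1;                /* sw / addi / bne: 3 insns per iteration */
    return bound_computations;
}

/* GCC 14 shape: the end address is rebuilt inside the loop body,
   i.e. li / addi / add are repeated on every iteration. */
unsigned fill_rematerialized(int *arr) {
    unsigned bound_computations = 0;
    int *p = arr;
    int *end;
    do {
        end = (int *)((char *)arr + end_offset());
        ++bound_computations;
        *p++ = 1;
    } while (p != end);
    return bound_computations;
}
```

Calling both functions on an `int[5000]` buffer fills it identically, but the second one recomputes the bound once per iteration, which corresponds to the three extra insns in the GCC 14 loop.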