https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116904

            Bug ID: 116904
           Summary: RISC-V: address calculation not hoisted from loop
           Product: gcc
           Version: 14.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: a-horohorin at mail dot ru
  Target Milestone: ---

Hi,
I have found a performance regression on the RISC-V target.

Example:
```c
int main() {
    int arr[5000];
    for (int i = 0; i < 5000; ++i) {
        arr[i] = 1;
    }
    __asm__ volatile ( "" : : "r" (arr[42]) );
    return 0;
}
```

GCC 13 produces the following asm:
```
; riscv64-unknown-elf-gcc -S -O2 minimal.c
main:
        li      t0,-20480
        li      a3,20480
        addi    t0,t0,464
        li      a5,-20480
        add     sp,sp,t0
        addi    a5,a5,480
        addi    a4,a3,-480
        add     a4,a4,a5
        addi    a5,sp,16
        add     a5,a4,a5
        addi    a3,a3,-480
        addi    a4,sp,16
        add     a3,a3,a4
        li      a4,1
; loop contains 3 insns (ok)
.L2:
        sw      a4,0(a5)
        addi    a5,a5,4
        bne     a5,a3,.L2
        li      a4,20480
        li      a5,-20480
        addi    a4,a4,-480
        add     a4,a4,a5
        addi    a5,sp,16
        add     a5,a4,a5
        sw      a5,12(sp)
        lw      a5,572(a5)
        li      t0,20480
        addi    t0,t0,-464
        li      a0,0
        add     sp,sp,t0
        jr      ra
```

GCC 14 produces the following asm:

```
; riscv64-unknown-elf-gcc -S -O2 minimal.c
main:
        li      t0,-20480
        addi    t0,t0,480
        add     sp,sp,t0
        mv      a5,sp
        li      a4,1
; loop contains 6 insns instead of 3!
.L2:
        li      a3,20480
        addi    a3,a3,-480
        sw      a4,0(a5)
        add     a3,a3,sp
        addi    a5,a5,4
        bne     a5,a3,.L2
        lw      a5,168(sp)
        li      t0,20480
        addi    t0,t0,-480
        li      a0,0
        add     sp,sp,t0
        jr      ra
```

Refer to https://godbolt.org/z/n6bzTzzhG

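My guess is that the regression is caused by equivalence substitution during
reload (LRA): the pseudo is replaced by its equivalent frame-offset expression,
and the reload insns that materialize it inside the loop are not hoisted by
subsequent passes.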

lra_dump_file from GCC 14:
```
...
    Changing pseudo 141 in operand 3 of insn 39 on equiv frame:SI+0x9c40
         Considering alt=0 of insn 39:   (2) r  (3) rJ
            3 Non-pseudo reload: reject+=2
            3 Non input pseudo reload: reject++
          overall=9,losers=1,rld_nregs=1
      Choosing alt 0 in insn 39:  (2) r  (3) rJ {*branchsi}
      Creating newreg=161, assigning class GR_REGS to r161
      Set class ALL_REGS for r162
   39: pc={(r138:SI!=r161:SI)?L38:pc}
      REG_BR_PROB 1062895956
    Inserting insn reload before:
   76: r162:SI=0xa000
   77: r161:SI=r162:SI-0x3c0
      REG_EQUAL 0x9c40
   78: r161:SI=r161:SI+frame:SI
...
```
