https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124808

            Bug ID: 124808
           Summary: Loop rotation (?) generating poor code
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: daniel.barboza at oss dot qualcomm.com
  Target Milestone: ---

This code:

long long* SetupPrecalculatedData (long long* a) {
    long long b = 1;
    for (int i = 0; i < 64; i++) {
        if(i>>3 < 7)
            a[i] += (b<<(i+8));
    }
    return a;
}


generates the following RISC-V asm with GCC:


SetupPrecalculatedData(long long*):
        mv      a2,a0
        li      a4,0
        li      a1,7
        li      a6,64
.L5:
        addiw   a5,a4,8
        srai    a3,a4,3
        bset    a5,x0,a5
        beq     a3,a1,.L2
.L7:
        ld      a3,0(a2)
        addiw   a4,a4,1
        addi    a2,a2,8
        add     a5,a3,a5
        sd      a5,-8(a2)
        srai    a3,a4,3
        addiw   a5,a4,8
        bset    a5,x0,a5
        bne     a3,a1,.L7
.L2:
        addiw   a4,a4,1
        addi    a2,a2,8
        bne     a4,a6,.L5
        ret



The LLVM output for the same code looks generally better:

SetupPrecalculatedData(long long*):         
        li      a1, 0
        li      a7, 55
        li      a6, 256
        li      a4, 64
        mv      a5, a0
        j       .LBB0_2
.LBB0_1:
        addi    a1, a1, 1
        addi    a5, a5, 8
        beq     a1, a4, .LBB0_4
.LBB0_2:
        bltu    a7, a1, .LBB0_1
        ld      a3, 0(a5)
        sll     a2, a6, a1
        add     a2, a2, a3
        sd      a2, 0(a5)
        j       .LBB0_1
.LBB0_4:
        ret

This is reproducible on other targets such as aarch64 and x86.
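
For reference, within the loop range 0 <= i < 64 the guard `i>>3 < 7` holds
exactly when `i < 56`, which matches the single unsigned compare against 55
(`bltu a7, a1`) in the LLVM output. The sketch below only illustrates that
equivalence at the source level; it is not a statement about what either
compiler should emit. The names `setup_guarded`, `setup_split` and
`variants_agree` are hypothetical, and the element type is changed to
`unsigned long long` (an assumption made here so that the shift by 63 at
i == 55 is well defined in C).

```c
#include <assert.h>

/* Same loop shape as in the report, but on unsigned elements
   (assumption: avoids shifting a signed 1 into the sign bit). */
static void setup_guarded(unsigned long long *a) {
    unsigned long long b = 1;
    for (int i = 0; i < 64; i++) {
        if (i >> 3 < 7)
            a[i] += b << (i + 8);
    }
}

/* Hypothetical hand-simplified form: for 0 <= i < 64, i >> 3 < 7 is
   equivalent to i < 56, so the guard folds into the loop bound. */
static void setup_split(unsigned long long *a) {
    unsigned long long b = 1;
    for (int i = 0; i < 56; i++)
        a[i] += b << (i + 8);
}

/* Returns 1 if both variants produce identical arrays from the same input. */
static int variants_agree(void) {
    unsigned long long x[64], y[64];
    for (int i = 0; i < 64; i++)
        x[i] = y[i] = (unsigned long long)i * 3u;
    setup_guarded(x);
    setup_split(y);
    for (int i = 0; i < 64; i++) {
        if (x[i] != y[i])
            return 0;
    }
    return 1;
}
```

Calling `variants_agree()` from a small test driver checks that the two forms
leave elements 56..63 untouched and update elements 0..55 identically.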


I plan to take a look at this one, mostly to get acquainted with how loop
rotation, tree-ssa-loop and friends work, but if someone with more
experience wants to give it a go, be my guest.
