https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124808
Bug ID: 124808
Summary: Loop rotation (?) generating poor code
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: daniel.barboza at oss dot qualcomm.com
Target Milestone: ---
This code:
long long* SetupPrecalculatedData (long long* a) {
    long long b = 1;
    for (int i = 0; i < 64; i++) {
        if(i>>3 < 7)
            a[i] += (b<<(i+8));
    }
    return a;
}
With GCC, it generates the following RISC-V assembly:
SetupPrecalculatedData(long long*):
        mv      a2,a0
        li      a4,0
        li      a1,7
        li      a6,64
.L5:
        addiw   a5,a4,8
        srai    a3,a4,3
        bset    a5,x0,a5
        beq     a3,a1,.L2
.L7:
        ld      a3,0(a2)
        addiw   a4,a4,1
        addi    a2,a2,8
        add     a5,a3,a5
        sd      a5,-8(a2)
        srai    a3,a4,3
        addiw   a5,a4,8
        bset    a5,x0,a5
        bne     a3,a1,.L7
.L2:
        addiw   a4,a4,1
        addi    a2,a2,8
        bne     a4,a6,.L5
        ret
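To make the shape of the GCC output easier to compare, here is a rough C transliteration of it, written by hand rather than produced by the compiler. The names reference, gcc_shape and check_gcc_shape are hypothetical, uint64_t stands in for long long so that the shift by 63 reached at i == 55 is well defined in C, and the & 63 mimics bset using only the low 6 bits of its index on RV64. It shows the guarded body rotated into an inner loop (.L7) nested inside the outer loop (.L5), with the shift and the i >> 3 test computed in both the header and the latch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reference: the loop from the report, using uint64_t so that
   1 << 63 (reached at i == 55) is well defined in C. */
static void reference(uint64_t *a)
{
    for (int i = 0; i < 64; i++)
        if ((i >> 3) < 7)
            a[i] += (uint64_t)1 << (i + 8);
}

/* Hypothetical hand transliteration of the GCC output (not compiler output).
   Comments name the registers and labels each statement mirrors. */
uint64_t *gcc_shape(uint64_t *a)
{
    uint64_t *p = a;  /* a2 */
    int i = 0;        /* a4 */
    do {
        /* .L5: the shifted value is computed before the guard is tested.
           bset only uses the low 6 bits of its index on RV64, hence & 63. */
        uint64_t bit = (uint64_t)1 << ((i + 8) & 63);
        if ((i >> 3) != 7) {          /* beq a3,a1,.L2 */
            do {                      /* .L7: the rotated inner loop */
                *p += bit;            /* ld, add, sd */
                i++;
                p++;
                /* the shift is recomputed in the latch */
                bit = (uint64_t)1 << ((i + 8) & 63);
            } while ((i >> 3) != 7);  /* bne a3,a1,.L7 */
        }
        /* .L2: step past an element whose index satisfies i >> 3 == 7 */
        i++;
        p++;
    } while (i != 64);                /* bne a4,a6,.L5 */
    return a;
}

/* Compare the transliteration with the reference on a non-trivial input. */
void check_gcc_shape(void)
{
    uint64_t x[64], y[64];
    for (int i = 0; i < 64; i++)
        x[i] = y[i] = (uint64_t)i * 0x9E3779B97F4A7C15u;
    reference(x);
    gcc_shape(y);
    assert(memcmp(x, y, sizeof x) == 0);
}
```

Calling check_gcc_shape() asserts that this control flow computes the same result as the original loop, so the extra work is a code-quality issue rather than a correctness one.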
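LLVM generates the following, which looks generally better: instead of the two nested loops above, it emits a single loop in which the guard is one unsigned compare against 55 and the shifted value is only computed for elements that are actually updated: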
SetupPrecalculatedData(long long*):
        li      a1, 0
        li      a7, 55
        li      a6, 256
        li      a4, 64
        mv      a5, a0
        j       .LBB0_2
.LBB0_1:
        addi    a1, a1, 1
        addi    a5, a5, 8
        beq     a1, a4, .LBB0_4
.LBB0_2:
        bltu    a7, a1, .LBB0_1
        ld      a3, 0(a5)
        sll     a2, a6, a1
        add     a2, a2, a3
        sd      a2, 0(a5)
        j       .LBB0_1
.LBB0_4:
        ret
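Read back into C, the LLVM output relies on two rewrites: for 0 <= i < 64, (i >> 3) < 7 is the same as i <= 55, and 1 << (i + 8) equals 256 << i. The sketch below is a hypothetical hand transliteration (not compiler output); the names reference, llvm_shape and check_llvm_shape are illustrative, and uint64_t again stands in for long long so the shift into bit 63 is well defined in C.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reference: the loop from the report, using uint64_t. */
static void reference(uint64_t *a)
{
    for (int i = 0; i < 64; i++)
        if ((i >> 3) < 7)
            a[i] += (uint64_t)1 << (i + 8);
}

/* Hypothetical hand transliteration of the LLVM output (not compiler output). */
uint64_t *llvm_shape(uint64_t *a)
{
    uint64_t *p = a;                            /* a5 */
    for (uint64_t i = 0; i != 64; i++, p++) {   /* a1; .LBB0_1 exits at a4 = 64 */
        if (i > 55)                             /* bltu a7,a1 with a7 = 55 */
            continue;
        *p += (uint64_t)256 << i;               /* sll a2,a6,a1 with a6 = 256 */
    }
    return a;
}

/* Check both rewrites and compare against the reference. */
void check_llvm_shape(void)
{
    /* For 0 <= i < 64 the original guard is the same as i <= 55. */
    for (int i = 0; i < 64; i++)
        assert(((i >> 3) < 7) == (i <= 55));

    uint64_t x[64], y[64];
    for (int i = 0; i < 64; i++)
        x[i] = y[i] = (uint64_t)i * 0x9E3779B97F4A7C15u;
    reference(x);
    llvm_shape(y);
    assert(memcmp(x, y, sizeof x) == 0);
}
```

Both rewrites keep the loop body free of the i >> 3 computation, which is the main difference from the duplicated header and latch code in the GCC version.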
The same problem is reproducible on other targets, such as aarch64 and x86.
I plan to take a look at this one, mostly to get acquainted with how loop
rotation, tree-ssa-loop and friends work, but if someone with more
experience wants to give it a go, be my guest.