https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95019
Bug ID: 95019
Summary: Optimizer produces suboptimal code related to -ftree-ivopts
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: zhongyunde at tom dot com
Target Milestone: ---

In the following code, the variable C000005A1 is used only as the index into
the arrays Dest and Src, whose elements are 8 bytes wide. An iv that is
already scaled by 8 (i.e. a byte offset) would therefore be better for
targets whose load/store insns cannot fold the left shift into the address
operand.

typedef unsigned long long UINT64;

void C00000ADA(UINT64 len, long long *__restrict Src,
               long long *__restrict Dest)
{
    UINT64 C00000ADD, index, C00000068, offset, C00000ADF;
    UINT64 C000005A1 = 0;
    for (index = 0; index < len; index++)
    {
        Dest[C000005A1] = Src[C000005A1] * Src[C000005A1];
        C000005A1 += len - index;
    }
}

Tested with MIPS64 gcc 5.4 on https://gcc.godbolt.org. The MIPS64 target has
no load/store form that folds the left shift into the address, such as
'ldr x3, [x1, x4, lsl 3]' on ARM64 targets, so using an ivtmp scaled by 8
would eliminate the dsll insn from the kernel loop:

@@ -2,16 +2,17 @@
 C00000ADA(unsigned long long, long long*, long long*):
         beq     $4,$0,.L10       #, len,,
         move    $7,$0            # C000005A1,
+        dsll    $8,$4,3          # tmp, len << 3
+
 .L4:
-        dsll    $2,$7,3          # D.2019, C000005A1,
-        daddu   $3,$5,$2         # tmp204, Src, D.2019
+        daddu   $3,$5,$7         # tmp204, Src, D.2019
         ld      $3,0($3)         # D.2021, *_10
-        daddu   $2,$6,$2         # tmp205, Dest, D.2019
+        daddu   $2,$6,$7         # tmp205, Dest, D.2019
         dmult   $3,$3            # D.2021, D.2021
         daddu   $7,$7,$4         # C000005A1, C000005A1, ivtmp.6
-        daddiu  $4,$4,-1         # ivtmp.6, ivtmp.6,
+        daddiu  $4,$4,-8         # ivtmp.6, ivtmp.6,
         mflo    $3               # D.2021
-        bne     $4,$0,.L4        #, ivtmp.6,,
+        bne     $8,$0,.L4        #, ivtmp.6,,
         sd      $3,0($2)         # D.2021, *_8
 .L10:
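
For reference, the requested transformation can be written by hand at the source level. This is only a sketch of the idea, not what ivopts emits: the function names square_scaled_index and square_byte_offset are made up for illustration, and it assumes sizeof(long long) == 8 as on the targets discussed. The first function is the test case with readable names; the second keeps both the offset and its step in bytes, so no shift is needed inside the loop, and the scaled step also serves as the loop counter.

```c
typedef unsigned long long UINT64;

/* Original form: the iv counts elements, so every access implies a
   scale by sizeof(long long) (the dsll in the MIPS64 loop). */
void square_scaled_index(UINT64 len, const long long *restrict src,
                         long long *restrict dest)
{
    UINT64 elem = 0;
    for (UINT64 index = 0; index < len; index++) {
        dest[elem] = src[elem] * src[elem];
        elem += len - index;
    }
}

/* Hand-reduced form (illustrative): offset and step are kept in bytes.
   byte_step runs len*8, (len-1)*8, ..., 8 and reaches 0 after len
   iterations, so it doubles as the loop counter. */
void square_byte_offset(UINT64 len, const long long *restrict src,
                        long long *restrict dest)
{
    const char *src_bytes = (const char *)src;
    char *dest_bytes = (char *)dest;
    UINT64 byte_off = 0;

    for (UINT64 byte_step = len * sizeof(long long); byte_step != 0;
         byte_step -= sizeof(long long)) {
        long long v = *(const long long *)(src_bytes + byte_off);
        *(long long *)(dest_bytes + byte_off) = v * v;
        byte_off += byte_step;
    }
}
```

Both functions touch the same elements. For len = 4 those are indices 0, 4, 7 and 9, so the arrays need at least len * (len + 1) / 2 elements.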