https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102062
--- Comment #5 from Nicholas Piggin <npiggin at gmail dot com> ---
(In reply to Bill Schmidt from comment #2)
> As expected, I get similar code when compiling either for P9 or P10.

Oh I should have specified, -O2 is the only option. If I add
-fvariable-expansion-in-unroller it has no effect, just to make sure.

It's gcc from Debian (gcc version 11.2.0 (Debian 11.2.0-3)). Maybe they've
done something to change this.

(In reply to Bill Schmidt from comment #1)
> Regarding the latter question, I'm surprised it's not being done. This
> behavior is controlled by -fvariable-expansion-in-unroller, which was
> enabled by default for PowerPC targets a couple of releases back. You
> reported this against GCC 11.2, but I'm skeptical. What options are you
> using?
>
> Compiling with -O2 and current trunk, I see variable expansion kicking in,
> and I also see the same base register in use in all references in the loop:
>
> test:
> .LFB0:
>         .cfi_startproc
>         .localentry test,1
>         slwi 4,4,1
>         li 10,0
>         li 7,0
>         addi 9,3,-4
>         extsw 4,4
>         andi. 6,4,0x3
>         addi 5,4,-1
>         mr 8,4
>         beq 0,.L9
>         cmpdi 0,6,1
>         beq 0,.L13
>         cmpdi 0,6,2
>         bne 0,.L22
> .L14:
>         lwzu 6,4(9)
>         addi 4,4,-1
>         add 10,10,6
> .L13:
>         lwzu 6,4(9)
>         cmpdi 0,4,1
>         add 10,10,6
>         beq 0,.L19
> .L9:
>         srdi 8,8,2
>         mtctr 8
> .L2:
>         lwz 4,4(9)
>         lwz 5,12(9)
>         lwz 6,8(9)
>         lwzu 8,16(9)
>         add 10,4,10
>         add 10,10,5
>         add 7,6,7
>         add 7,7,8
>         bdnz .L2
> .L19:
>         add 3,10,7
>         extsw 3,3
>         blr
>         .p2align 4,,15
> .L22:
>         lwz 10,0(3)
>         mr 9,3
>         mr 4,5
>         b .L14

That asm does well on the test, better than my version (a little bit on P9, a
lot on P10). It does have 2x more unrolling which probably helps a bit.
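
For anyone reading along who hasn't looked at variable expansion before: in the quoted loop at .L2, r10 and r7 are two separate accumulators that only get added together at .L19, so the adds form two shorter dependency chains instead of one long one. Below is a rough hand-written C sketch of that idea. It is not the test case from this bug; the function names sum_single and sum_expanded are made up for illustration, and the unroll factor is just 2 to keep it short.

```c
#include <stddef.h>

/* Single accumulator: every add depends on the result of the previous one. */
int sum_single(const int *a, size_t n)
{
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/*
 * Hypothetical hand-expanded version: two independent accumulators,
 * combined once after the loop. This mimics by hand what the unroller's
 * variable expansion does; it is not what GCC literally emits.
 */
int sum_expanded(const int *a, size_t n)
{
    int s0 = 0, s1 = 0;
    size_t i = 0;

    /* Main loop: two elements per iteration, one per accumulator. */
    for (; i + 1 < n; i += 2) {
        s0 += a[i];
        s1 += a[i + 1];
    }

    /* Leftover element when n is odd. */
    if (i < n)
        s0 += a[i];

    return s0 + s1;
}
```

Both functions should return the same value for the same input as long as the sum does not overflow; the only intended difference is how many independent add chains the hardware can overlap.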