--- Comment #5 from Nicholas Piggin <npiggin at gmail dot com> ---
(In reply to Bill Schmidt from comment #2)
> As expected, I get similar code when compiling either for P9 or P10.

Oh, I should have specified: -O2 is the only option. Just to make sure, I also
tried adding -fvariable-expansion-in-unroller, and it has no effect.

It's gcc from Debian (gcc version 11.2.0 (Debian 11.2.0-3)). Maybe they've done
something to change this.

(In reply to Bill Schmidt from comment #1)
> Regarding the latter question, I'm surprised it's not being done.  This
> behavior is controlled by -fvariable-expansion-in-unroller, which was
> enabled by default for PowerPC targets a couple of releases back.  You
> reported this against GCC 11.2, but I'm skeptical.  What options are you
> using?
> Compiling with -O2 and current trunk, I see variable expansion kicking in,
> and I also see the same base register in use in all references in the loop:
> test:
> .LFB0:
>         .cfi_startproc
>         .localentry     test,1
>         slwi 4,4,1
>         li 10,0
>         li 7,0
>         addi 9,3,-4
>         extsw 4,4
>         andi. 6,4,0x3
>         addi 5,4,-1
>         mr 8,4
>         beq 0,.L9
>         cmpdi 0,6,1
>         beq 0,.L13
>         cmpdi 0,6,2
>         bne 0,.L22
> .L14:
>         lwzu 6,4(9)
>         addi 4,4,-1
>         add 10,10,6
> .L13:
>         lwzu 6,4(9)
>         cmpdi 0,4,1
>         add 10,10,6
>         beq 0,.L19
> .L9:
>         srdi 8,8,2
>         mtctr 8
> .L2:
>         lwz 4,4(9)
>         lwz 5,12(9)
>         lwz 6,8(9)
>         lwzu 8,16(9)
>         add 10,4,10
>         add 10,10,5
>         add 7,6,7
>         add 7,7,8
>         bdnz .L2
> .L19:
>         add 3,10,7
>         extsw 3,3
>         blr
>         .p2align 4,,15
> .L22:
>         lwz 10,0(3)
>         mr 9,3
>         mr 4,5
>         b .L14

That asm does well on the test, better than my version (a little bit on P9, a
lot on P10). It does have 2x more unrolling which probably helps a bit.
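
For anyone reading along, here is a rough C picture of what the quoted .L2 loop
is doing. This is only a sketch: the function names and signatures are made up
for illustration and are not the test case from this report, and the compiler
does this transformation on its own internal representation, not on source. The
point is just that the single running sum is split across two registers (r10
and r7 in the asm), so the adds form two shorter dependency chains instead of
one long one.

```c
#include <stddef.h>

/* Plain reduction: each add has to wait for the previous one.
 * (Hypothetical stand-in for the loop under discussion.) */
int sum_single(const int *a, size_t n)
{
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Hand-written equivalent of unrolling by 4 with the accumulator
 * expanded into two variables, following the shape of the quoted asm. */
int sum_split(const int *a, size_t n)
{
    int acc0 = 0;   /* plays the role of r10 */
    int acc1 = 0;   /* plays the role of r7 */
    size_t i = 0;

    /* Peel n % 4 elements first so the main loop count is a multiple
     * of 4, like the code ahead of .L9. */
    for (; i < n % 4; i++)
        acc0 += a[i];

    /* Main loop: four loads, two independent add chains. */
    for (; i < n; i += 4) {
        acc0 += a[i];
        acc0 += a[i + 2];
        acc1 += a[i + 1];
        acc1 += a[i + 3];
    }

    /* Combine the partial sums once at the end (add 3,10,7 at .L19). */
    return acc0 + acc1;
}
```

Both functions return the same value as long as the sum does not overflow int;
the split version only changes the order in which the elements are added.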
