https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120562
Bug ID: 120562
Summary: powerpc: Complex and uncomplete loop unroll
Product: gcc
Version: 15.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: christophe.leroy at csgroup dot eu
Target Milestone: ---
The function below leads to unnecessarily complex yet incomplete loop unrolling
(built with -O2 -m32):
void f(unsigned int *p, unsigned int val)
{
	int i;

	for (i = 0; i < 10; i++)
		p[i] = val;
}
GCC 11 through GCC 15 all generate the following:
toto.o: file format elf32-powerpc
Disassembly of section .text:
00000000 <f>:
0: 38 63 ff fc addi r3,r3,-4
4: 39 20 00 0a li r9,10
8: 35 29 ff fb addic. r9,r9,-5
c: 90 83 00 04 stw r4,4(r3)
10: 90 83 00 08 stw r4,8(r3)
14: 90 83 00 0c stw r4,12(r3)
18: 90 83 00 10 stw r4,16(r3)
1c: 94 83 00 14 stwu r4,20(r3)
20: 4d 82 00 20 beqlr
24: 35 29 ff fb addic. r9,r9,-5
28: 90 83 00 04 stw r4,4(r3)
2c: 90 83 00 08 stw r4,8(r3)
30: 90 83 00 0c stw r4,12(r3)
34: 90 83 00 10 stw r4,16(r3)
38: 94 83 00 14 stwu r4,20(r3)
3c: 40 82 ff cc bne 8 <f+0x8>
40: 4e 80 00 20 blr
Since the number of iterations is the constant 10 and the body is unrolled by
5, GCC should know that the beqlr in the middle never exits (r9 is 5 at that
point) and that the bne at the end never loops back (r9 is 0 at that point).
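To make that reasoning checkable, here is a rough C model of the emitted control flow. It is only a sketch: the struct trace type and the name f_as_emitted are made up for illustration, and r9 simply mirrors the counter register from the disassembly. It records how often each of the two conditional branches would actually be taken.

```c
#include <assert.h>

/* Hypothetical trace record, introduced only for this sketch. */
struct trace {
	int stores;      /* number of words written */
	int beqlr_taken; /* times the mid-loop exit at 0x20 fired */
	int bne_taken;   /* times the back edge at 0x3c fired */
};

/* C model of the disassembly; p must point to at least 10 words. */
struct trace f_as_emitted(unsigned int *p, unsigned int val)
{
	struct trace t = {0, 0, 0};
	int r9 = 10;                     /* li r9,10 */
	int k;

	for (;;) {
		r9 -= 5;                 /* addic. r9,r9,-5 at 0x8 */
		assert(t.stores + 5 <= 10); /* model must stay in bounds */
		for (k = 0; k < 5; k++)  /* four stw plus one stwu */
			p[t.stores++] = val;
		if (r9 == 0) {           /* beqlr at 0x20 */
			t.beqlr_taken++;
			return t;
		}
		r9 -= 5;                 /* addic. r9,r9,-5 at 0x24 */
		assert(t.stores + 5 <= 10);
		for (k = 0; k < 5; k++)
			p[t.stores++] = val;
		if (r9 != 0) {           /* bne 8 at 0x3c */
			t.bne_taken++;
			continue;
		}
		return t;                /* blr at 0x40 */
	}
}
```

With the counter starting at the constant 10, the model goes through the body exactly once and falls through both branches, so neither of them contributes anything at run time.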
So in the end the function should simply have been fully unrolled:
stw r4,0(r3)
stw r4,4(r3)
stw r4,8(r3)
stw r4,12(r3)
stw r4,16(r3)
stw r4,20(r3)
stw r4,24(r3)
stw r4,28(r3)
stw r4,32(r3)
stw r4,36(r3)
blr
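For reference, the expected output corresponds to the hand-unrolled C below. This is only a sketch for comparison, not a proposed workaround: f_unrolled and same_result are hypothetical names, and the offsets in the comments assume a 4-byte unsigned int, as with -m32.

```c
#include <assert.h>
#include <string.h>

/* The function from the report, unchanged. */
void f(unsigned int *p, unsigned int val)
{
	int i;

	for (i = 0; i < 10; i++)
		p[i] = val;
}

/* Hand-unrolled equivalent: one statement per expected stw. */
void f_unrolled(unsigned int *p, unsigned int val)
{
	p[0] = val; /* stw r4,0(r3)  */
	p[1] = val; /* stw r4,4(r3)  */
	p[2] = val; /* stw r4,8(r3)  */
	p[3] = val; /* stw r4,12(r3) */
	p[4] = val; /* stw r4,16(r3) */
	p[5] = val; /* stw r4,20(r3) */
	p[6] = val; /* stw r4,24(r3) */
	p[7] = val; /* stw r4,28(r3) */
	p[8] = val; /* stw r4,32(r3) */
	p[9] = val; /* stw r4,36(r3) */
}

/* Hypothetical helper: returns 1 if both versions produce identical
 * buffers, including an 11th guard word that neither should touch. */
int same_result(unsigned int val)
{
	unsigned int a[11], b[11];

	memset(a, 0xAA, sizeof a);
	memset(b, 0xAA, sizeof b);
	f(a, val);
	f_unrolled(b, val);
	assert(a[10] == b[10]); /* guard word left alone by both */
	return memcmp(a, b, sizeof a) == 0;
}
```

Both versions write the same 10 words, so the straight-line sequence is a valid replacement for the loop.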