https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101481

Bug ID: 101481
Summary: -ftree-loop-distribute-patterns can slow down and increases size of code
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: andres at anarazel dot de
Target Milestone: ---

Created attachment 51168
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51168&action=edit
simplified example reproducing problem

Hi,

I found -ftree-loop-distribute-patterns to be far too aggressive in replacing
code, leading to increased code size and substantial slowdowns (12% in the
program where I just hit this).

The code size increase and slowdown are partially caused by the function call
itself, and partially by the spilling necessary to make that function call.
This is worsened by the PLT call to memmove().

A very simplified example (also attached) is this:

    typedef struct node
    {
        unsigned char chunks[4];
        unsigned char count;
    } node;

    void foo(node *a, unsigned char newchunk, unsigned char off)
    {
        if (a->count > 3)
            __builtin_unreachable();

        for (int i = a->count - 1; i >= off; i--)
            a->chunks[i + 1] = a->chunks[i];

        a->chunks[off] = newchunk;
    }

which with `-O2 -fPIC` boils down to:

    foo(node*, unsigned char, unsigned char):
            pushq   %r12
            movl    %edx, %r8d
            movl    %esi, %r12d
            pushq   %rbp
            movq    %rdi, %rbp
            pushq   %rbx
            movzbl  4(%rdi), %ecx
            movzbl  %r8b, %ebx
            leal    -1(%rcx), %edx
            cmpl    %ebx, %edx
            jl      .L2
            movl    %ecx, %eax
            movslq  %edx, %rsi
            subl    %ebx, %ecx
            subl    $1, %ecx
            movq    %rsi, %rdx
            subq    %rcx, %rdx
            leaq    1(%rcx), %r8
            leaq    (%rdi,%rdx), %rsi
            movzbl  %al, %edi
            movq    %r8, %rdx
            movq    %rdi, %rax
            subq    %rcx, %rax
            leaq    0(%rbp,%rax), %rdi
            call    memmove@PLT
    .L2:
            movb    %r12b, 0(%rbp,%rbx)
            popq    %rbx
            popq    %rbp
            popq    %r12
            ret

Compare to `-O2 -fPIC -fno-tree-loop-distribute-patterns`:

    foo(node*, unsigned char, unsigned char):
            movzbl  4(%rdi), %eax
            movzbl  %dl, %edx
            subl    $1, %eax
            cmpl    %edx, %eax
            jl      .L2
            cltq
    .L3:
            movzbl  (%rdi,%rax), %ecx
            movb    %cl, 1(%rdi,%rax)
            subq    $1, %rax
            cmpl    %eax, %edx
            jle     .L3
    .L2:
            movb    %sil, (%rdi,%rdx)
            ret

Which I think makes the problem apparent.

Regards,

Andres Freund