https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124901
Bug ID: 124901
Summary: Add pass_late_combine after pass_reorder_blocks
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ubizjak at gmail dot com
Target Milestone: ---
After pass_reorder_blocks, some propagation opportunities remain
for late_combine. For example, gcc.target/i386/pr90178.c compiles
to the following trivial sequence:
gcc -O2 -mavx -mvzeroupper -m32:
.L5:
	xorl	%ecx, %ecx
	...
	movl	%ecx, %eax
	ret
Putting another instance of pass_late_combine after
pass_reorder_blocks improves the assembly in a non-trivial way:
@@ -28,10 +28,8 @@
 	cmpl	%edx, %ebx
 	je	.L5
 .L4:
-	movl	%eax, %ecx
 	cmpl	%esi, (%eax)
 	jne	.L11
-	movl	%ecx, %eax
 	popl	%ebx
 	.cfi_remember_state
 	.cfi_restore 3
@@ -44,17 +42,16 @@
 	.p2align 3
 .L5:
 	.cfi_restore_state
-	xorl	%ecx, %ecx
+	xorl	%eax, %eax
 	popl	%ebx
 	.cfi_restore 3
 	.cfi_def_cfa_offset 8
 	popl	%esi
 	.cfi_restore 6
 	.cfi_def_cfa_offset 4
-	movl	%ecx, %eax
 	ret
 	.cfi_endproc
 .LFE0:
 	.size	find_ptr, .-find_ptr
This looks like enough of an improvement to justify adding a new
pass instance here.
Comparing the sizes of a default x86_64 Linux build also shows a
noticeable code size improvement:
$ size vmlinux-old.o vmlinux-new.o
    text    data    bss      dec     hex filename
29432351 4932443 754228 35119022 217dfae vmlinux-old.o
29415516 4932443 754228 35102187 2179deb vmlinux-new.o
That is a text size reduction of 16835 bytes (about 0.06%).
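
For anyone who wants to recheck the arithmetic or repeat the comparison on another build, here is a minimal sketch that parses the `size` output quoted above and reports the text delta. The helper name `parse_size_output` is hypothetical and only for illustration, and the script assumes the default Berkeley-style column layout shown in this report.

```python
# Figures copied verbatim from the `size` output in this report.
SIZE_OUTPUT = """\
    text    data    bss      dec     hex filename
29432351 4932443 754228 35119022 217dfae vmlinux-old.o
29415516 4932443 754228 35102187 2179deb vmlinux-new.o
"""


def parse_size_output(text):
    """Parse Berkeley-style `size` output into {filename: {column: int}}.

    Hypothetical helper for illustration; it assumes one header row
    followed by one whitespace-separated row per object file.
    """
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    result = {}
    for row in rows:
        fields = dict(zip(header, row))
        name = fields.pop("filename")
        # The "hex" column repeats "dec" in base 16; the rest are decimal.
        result[name] = {
            key: int(value, 16 if key == "hex" else 10)
            for key, value in fields.items()
        }
    return result


sizes = parse_size_output(SIZE_OUTPUT)
old, new = sizes["vmlinux-old.o"], sizes["vmlinux-new.o"]

text_delta = old["text"] - new["text"]
text_pct = 100.0 * text_delta / old["text"]
print(f"text delta: {text_delta} bytes ({text_pct:.3f}%)")
```

Since data and bss are identical in both builds, the whole difference in the dec column comes from the text section.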