https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124901

            Bug ID: 124901
           Summary: Add pass_late_combine after pass_reorder_blocks
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ubizjak at gmail dot com
  Target Milestone: ---

After pass_reorder_blocks, some propagation opportunities remain for
late_combine.  Looking at gcc.target/i386/pr90178.c, we get the
following trivial sequence:

gcc -O2 -mavx -mvzeroupper -m32:

.L5:
    xorl    %ecx, %ecx
    ...
    movl    %ecx, %eax
    ret

Putting another instance of pass_late_combine after
pass_reorder_blocks improves the assembly in a non-trivial way:

@@ -28,10 +28,8 @@
     cmpl    %edx, %ebx
     je    .L5
 .L4:
-    movl    %eax, %ecx
     cmpl    %esi, (%eax)
     jne    .L11
-    movl    %ecx, %eax
     popl    %ebx
     .cfi_remember_state
     .cfi_restore 3
@@ -44,17 +42,16 @@
     .p2align 3
 .L5:
     .cfi_restore_state
-    xorl    %ecx, %ecx
+    xorl    %eax, %eax
     popl    %ebx
     .cfi_restore 3
     .cfi_def_cfa_offset 8
     popl    %esi
     .cfi_restore 6
     .cfi_def_cfa_offset 4
-    movl    %ecx, %eax
     ret
     .cfi_endproc
 .LFE0:
     .size    find_ptr, .-find_ptr

so it looks worthwhile to add another instance of the pass here.

A comparison of the sizes of a default x86_64 Linux build shows a
noticeable code size improvement:

$ size vmlinux-old.o vmlinux-new.o
    text     data     bss       dec      hex filename
29432351  4932443  754228  35119022  217dfae vmlinux-old.o
29415516  4932443  754228  35102187  2179deb vmlinux-new.o

i.e. the text section shrinks by 16835 bytes (about 0.06%).
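
The saving can be re-derived from the text column of the size output.
A minimal Python sketch follows; the names OLD_TEXT, NEW_TEXT and
text_delta are made up for illustration and are not part of any GCC
or binutils tooling:

```python
# Text-section sizes copied from the `size` output in this report.
OLD_TEXT = 29432351  # vmlinux-old.o
NEW_TEXT = 29415516  # vmlinux-new.o


def text_delta(old: int, new: int) -> tuple[int, float]:
    """Return (bytes saved, saving as a percentage of the old size)."""
    saved = old - new
    return saved, 100.0 * saved / old


saved, pct = text_delta(OLD_TEXT, NEW_TEXT)
print(f"text shrinks by {saved} bytes ({pct:.3f}% of the old text size)")
```

Since data and bss are identical in both builds, the same difference
shows up in the dec column.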