On Thu, 16 Apr 2026, Uros Bizjak wrote:

> On Thu, Apr 16, 2026 at 12:58 PM Richard Biener <[email protected]> wrote:
> >
> > On Thu, 16 Apr 2026, Uros Bizjak wrote:
> >
> > > Hello!
> > >
> > > After pass_reorder_blocks, there remain some propagating opportunities
> > > for late_combine. Looking at gcc.target/i386/pr90178.c, we get a
> > > trivial sequence of:
> > >
> > > gcc -O2 -mavx -mvzeroupper -m32:
> > >
> > > .L5:
> > >         xorl    %ecx, %ecx
> > >         ...
> > >         movl    %ecx, %eax
> > >         ret
> > >
> > > Putting another instance of pass_late_combine after
> > > pass_reorder_blocks improves the assembly in a non-trivial way:
> > >
> > > @@ -28,10 +28,8 @@
> > >          cmpl    %edx, %ebx
> > >          je      .L5
> > >  .L4:
> > > -        movl    %eax, %ecx
> > >          cmpl    %esi, (%eax)
> > >          jne     .L11
> > > -        movl    %ecx, %eax
> > >          popl    %ebx
> > >          .cfi_remember_state
> > >          .cfi_restore 3
> > > @@ -44,17 +42,16 @@
> > >          .p2align 3
> > >  .L5:
> > >          .cfi_restore_state
> > > -        xorl    %ecx, %ecx
> > > +        xorl    %eax, %eax
> > >          popl    %ebx
> > >          .cfi_restore 3
> > >          .cfi_def_cfa_offset 8
> > >          popl    %esi
> > >          .cfi_restore 6
> > >          .cfi_def_cfa_offset 4
> > > -        movl    %ecx, %eax
> > >          ret
> > >          .cfi_endproc
> > >  .LFE0:
> > >          .size   find_ptr, .-find_ptr
> > >
> > > which looks like it is worth putting a new pass here.
> > >
> > > A comparison of sizes of default x86_64 linux build shows noticeable
> > > code size improvement:
> > >
> > > $ size vmlinux-old.o vmlinux-new.o
> > >     text    data     bss      dec     hex filename
> > > 29432351 4932443  754228 35119022 217dfae vmlinux-old.o
> > > 29415516 4932443  754228 35102187 2179deb vmlinux-new.o
> > >
> > > which shows a code size reduction of 16835 bytes.
> > >
> > > Any thoughts?
> >
> > Did you check other places to schedule the pass?
>
> I was interested to exercise opportunities, exposed by bbro pass (as
> mentioned in [1]), so the natural place to put the new pass is after
> bbro pass:
>
> On x86_32, IRA zeroes %ecx, which is later copied to %eax in the
> terminal basic block:
>
>    12: NOTE_INSN_BASIC_BLOCK 3
>     7: cx:SI=0
>       REG_EQUAL 0
>    45: pc=L36
> ...
>    36: L36:
>    39: NOTE_INSN_BASIC_BLOCK 7
>    37: ax:SI=cx:SI
>    38: use ax:SI
>
> This sequence is reordered in bbro pass to:
>
>    28: L28:
>    12: NOTE_INSN_BASIC_BLOCK 7
>    69: {cx:SI=0;clobber flags:CC;}
>       REG_UNUSED flags:CC
>    71: ax:SI=cx:SI
>       REG_DEAD cx:SI
>    72: use ax:SI
>    73: NOTE_INSN_EPILOGUE_BEG
>    74: bx:SI=[sp:SI++]
>       REG_CFA_ADJUST_CFA sp:SI=sp:SI+0x4
>       REG_CFA_RESTORE bx:SI
>    75: si:SI=[sp:SI++]
>       REG_CFA_ADJUST_CFA sp:SI=sp:SI+0x4
>       REG_CFA_RESTORE si:SI
>    76: simple_return

Ah, so maybe we can have a late combine entry point that is invoked
only if BB reorder does any path duplication?  On GIMPLE we
increasingly invoke pass workers directly from other passes in such
cases; VN even has a mode to operate on small portions of the CFG.

Could it be that a simple regcprop also suffices for the issue at hand?

Richard.
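
For illustration only, the transformation on the reordered block (insns 69 and 71 above) can be sketched as a toy forward substitution. This is a minimal model under stated assumptions, not how pass_late_combine or regcprop is implemented in GCC: the `Insn` class and `forward_substitute` function are hypothetical names, only adjacent instructions are considered, and the flags clobber on insn 69 is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insn:
    """Toy stand-in for an RTL set: dest = src."""
    dest: str               # register being written
    src: object             # int constant or str register name
    src_dies: bool = False  # models a REG_DEAD note on a register source


def forward_substitute(block):
    """Fold 'tmp = const; dest = tmp' into 'dest = const' when tmp dies.

    Only adjacent pairs are folded, so nothing can use or clobber tmp
    between the two instructions.
    """
    out = []
    for insn in block:
        prev = out[-1] if out else None
        if (prev is not None
                and isinstance(insn.src, str)
                and insn.src_dies
                and prev.dest == insn.src
                and isinstance(prev.src, int)):
            # The copy is the last use of tmp, so the constant can be
            # written straight into dest and the set of tmp dropped.
            out[-1] = Insn(insn.dest, prev.src)
        else:
            out.append(insn)
    return out


# cx:SI=0 followed by ax:SI=cx:SI with REG_DEAD cx:SI
block = [Insn("cx", 0), Insn("ax", "cx", src_dies=True)]
result = forward_substitute(block)
```

In this model the two-instruction block collapses to a single `ax = 0`, which corresponds to the `xorl %eax, %eax` in the improved assembly. Without the `src_dies` flag the pair is left alone, since `cx` might still be needed later.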
