On Thu, Apr 16, 2026 at 1:57 PM Richard Biener <[email protected]> wrote:
>
> On Thu, 16 Apr 2026, Uros Bizjak wrote:
>
> > On Thu, Apr 16, 2026 at 12:58 PM Richard Biener <[email protected]> wrote:
> > >
> > > On Thu, 16 Apr 2026, Uros Bizjak wrote:
> > >
> > > > Hello!
> > > >
> > > > After pass_reorder_blocks, there remain some propagation opportunities
> > > > for late_combine. Looking at gcc.target/i386/pr90178.c, we get a
> > > > trivial sequence of:
> > > >
> > > > gcc -O2 -mavx -mvzeroupper -m32:
> > > >
> > > > .L5:
> > > > xorl %ecx, %ecx
> > > > ...
> > > > movl %ecx, %eax
> > > > ret
> > > >
> > > > Putting another instance of pass_late_combine after
> > > > pass_reorder_blocks improves the assembly in a non-trivial way:
> > > >
> > > > @@ -28,10 +28,8 @@
> > > > cmpl %edx, %ebx
> > > > je .L5
> > > > .L4:
> > > > - movl %eax, %ecx
> > > > cmpl %esi, (%eax)
> > > > jne .L11
> > > > - movl %ecx, %eax
> > > > popl %ebx
> > > > .cfi_remember_state
> > > > .cfi_restore 3
> > > > @@ -44,17 +42,16 @@
> > > > .p2align 3
> > > > .L5:
> > > > .cfi_restore_state
> > > > - xorl %ecx, %ecx
> > > > + xorl %eax, %eax
> > > > popl %ebx
> > > > .cfi_restore 3
> > > > .cfi_def_cfa_offset 8
> > > > popl %esi
> > > > .cfi_restore 6
> > > > .cfi_def_cfa_offset 4
> > > > - movl %ecx, %eax
> > > > ret
> > > > .cfi_endproc
> > > > .LFE0:
> > > > .size find_ptr, .-find_ptr
> > > >
> > > > which looks like it is worth putting a new pass here.
> > > >
> > > > A size comparison of a default x86_64 Linux kernel build shows a
> > > > noticeable code size improvement:
> > > >
> > > > $ size vmlinux-old.o vmlinux-new.o
> > > > text data bss dec hex filename
> > > > 29432351 4932443 754228 35119022 217dfae vmlinux-old.o
> > > > 29415516 4932443 754228 35102187 2179deb vmlinux-new.o
> > > >
> > > > which shows a code size reduction of 16835 bytes.
> > > >
> > > > Any thoughts?
> > >
> > > Did you check other places to schedule the pass?
> >
> > I was interested in exercising the opportunities exposed by the bbro
> > pass (as mentioned in [1]), so the natural place to put the new pass
> > is right after the bbro pass.
> >
> > On x86_32, IRA zeroes %ecx, which is later copied to %eax in the
> > terminal basic block:
> >
> > 12: NOTE_INSN_BASIC_BLOCK 3
> > 7: cx:SI=0
> > REG_EQUAL 0
> > 45: pc=L36
> > ...
> > 36: L36:
> > 39: NOTE_INSN_BASIC_BLOCK 7
> > 37: ax:SI=cx:SI
> > 38: use ax:SI
> >
> > This sequence is reordered in bbro pass to:
> >
> > 28: L28:
> > 12: NOTE_INSN_BASIC_BLOCK 7
> > 69: {cx:SI=0;clobber flags:CC;}
> > REG_UNUSED flags:CC
> > 71: ax:SI=cx:SI
> > REG_DEAD cx:SI
> > 72: use ax:SI
> > 73: NOTE_INSN_EPILOGUE_BEG
> > 74: bx:SI=[sp:SI++]
> > REG_CFA_ADJUST_CFA sp:SI=sp:SI+0x4
> > REG_CFA_RESTORE bx:SI
> > 75: si:SI=[sp:SI++]
> > REG_CFA_ADJUST_CFA sp:SI=sp:SI+0x4
> > REG_CFA_RESTORE si:SI
> > 76: simple_return
>
> Ah, so maybe we can have a late combine entry point that is invoked
> only if BB reorder actually does any path duplication? On GIMPLE we
> increasingly invoke pass workers directly from other passes
> in such cases; VN even has a mode to operate on small portions
> of the CFG.
>
> Could it be that a simple regcprop also suffices for the issue at hand?

Hm, putting another pass_cprop_hardreg after pass_reorder_blocks fails
self-test with:

cc1: internal compiler error: pass cprop_hardreg does not support cloning
0x226f97d internal_error(char const*, ...)
../../git/gcc/gcc/diagnostic-global-context.cc:787
0xce6a79 opt_pass::clone()
../../git/gcc/gcc/passes.cc:90
0xcf8a86 gcc::pass_manager::pass_manager(gcc::context*)
./pass-instances.def:554
0xe3a7d7 general_init
../../git/gcc/gcc/toplev.cc:1164

Uros.