On Thu, 16 Apr 2026, Uros Bizjak wrote:

> On Thu, Apr 16, 2026 at 12:58 PM Richard Biener <[email protected]> wrote:
> >
> > On Thu, 16 Apr 2026, Uros Bizjak wrote:
> >
> > > Hello!
> > >
> > > After pass_reorder_blocks, some propagation opportunities remain
> > > for late_combine.  Looking at gcc.target/i386/pr90178.c, we get a
> > > trivial sequence of:
> > >
> > > gcc -O2 -mavx -mvzeroupper -m32:
> > >
> > > .L5:
> > >     xorl    %ecx, %ecx
> > >     ...
> > >     movl    %ecx, %eax
> > >     ret
> > >
> > > Putting another instance of pass_late_combine after
> > > pass_reorder_blocks improves the assembly in a non-trivial way:
> > >
> > > @@ -28,10 +28,8 @@
> > >      cmpl    %edx, %ebx
> > >      je    .L5
> > >  .L4:
> > > -    movl    %eax, %ecx
> > >      cmpl    %esi, (%eax)
> > >      jne    .L11
> > > -    movl    %ecx, %eax
> > >      popl    %ebx
> > >      .cfi_remember_state
> > >      .cfi_restore 3
> > > @@ -44,17 +42,16 @@
> > >      .p2align 3
> > >  .L5:
> > >      .cfi_restore_state
> > > -    xorl    %ecx, %ecx
> > > +    xorl    %eax, %eax
> > >      popl    %ebx
> > >      .cfi_restore 3
> > >      .cfi_def_cfa_offset 8
> > >      popl    %esi
> > >      .cfi_restore 6
> > >      .cfi_def_cfa_offset 4
> > > -    movl    %ecx, %eax
> > >      ret
> > >      .cfi_endproc
> > >  .LFE0:
> > >      .size    find_ptr, .-find_ptr
> > >
> > > which suggests it is worth adding another pass instance here.
> > >
> > > A size comparison of a default x86_64 Linux build shows a noticeable
> > > code size improvement:
> > >
> > > $ size vmlinux-old.o vmlinux-new.o
> > >     text    data     bss      dec     hex filename
> > > 29432351 4932443  754228 35119022 217dfae vmlinux-old.o
> > > 29415516 4932443  754228 35102187 2179deb vmlinux-new.o
> > >
> > > which shows a code size reduction of 16835 bytes.
> > >
> > > Any thoughts?
> >
> > Did you check other places to schedule the pass?
> 
> I was interested in exercising the opportunities exposed by the bbro
> pass (as mentioned in [1]), so the natural place for the new pass is
> right after the bbro pass:
> 
> On x86_32, IRA zeroes %ecx, which is later copied to %eax in the
> terminal basic block:
> 
>    12: NOTE_INSN_BASIC_BLOCK 3
>     7: cx:SI=0
>       REG_EQUAL 0
>    45: pc=L36
>    ...
>    36: L36:
>    39: NOTE_INSN_BASIC_BLOCK 7
>    37: ax:SI=cx:SI
>    38: use ax:SI
> 
> The bbro pass reorders this sequence to:
> 
>    28: L28:
>    12: NOTE_INSN_BASIC_BLOCK 7
>    69: {cx:SI=0;clobber flags:CC;}
>       REG_UNUSED flags:CC
>    71: ax:SI=cx:SI
>       REG_DEAD cx:SI
>    72: use ax:SI
>    73: NOTE_INSN_EPILOGUE_BEG
>    74: bx:SI=[sp:SI++]
>       REG_CFA_ADJUST_CFA sp:SI=sp:SI+0x4
>       REG_CFA_RESTORE bx:SI
>    75: si:SI=[sp:SI++]
>       REG_CFA_ADJUST_CFA sp:SI=sp:SI+0x4
>       REG_CFA_RESTORE si:SI
>    76: simple_return

Ah, so maybe we could have a late combine entry point that is invoked
only if BB reorder actually does any path duplication?  On GIMPLE we
increasingly invoke pass workers directly from other passes in such
cases; VN even has a mode to operate on small portions of the CFG.

Perhaps a simple regcprop would also suffice for the issue at hand?
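
For illustration only, here is a toy Python model of the kind of
propagation needed for the quoted sequence (cx = 0; ax = cx; use ax).
The (dest, src) insn encoding, the live_out set and the name
fold_constant_copies are all made up for this sketch; it is not how
late_combine or regcprop is implemented in GCC, just the idea of
substituting a constant definition into its single use and deleting
the definition once it is dead.

```python
# Toy model, not GCC RTL: an insn is a (dest, src) pair where src is
# either an int constant or the name of another register.

def fold_constant_copies(insns, live_out):
    """Substitute `tmp = const` into its only later use and drop the
    definition, provided tmp is dead afterwards."""
    out = list(insns)
    i = 0
    while i < len(out):
        tmp, value = out[i]
        if not isinstance(value, int):
            i += 1
            continue
        uses = []
        redefined = False
        # Scan forward until tmp is overwritten or the block ends.
        for j in range(i + 1, len(out)):
            dest, src = out[j]
            if src == tmp:
                uses.append(j)
            if dest == tmp:
                redefined = True
                break
        dead_after = redefined or tmp not in live_out
        if len(uses) == 1 and dead_after:
            j = uses[0]
            out[j] = (out[j][0], value)  # the copy becomes `dst = const`
            del out[i]                   # the old definition is now dead
            continue                     # re-examine the insn now at index i
        i += 1
    return out


# The block from the dump: cx = 0; ax = cx; with ax used on exit.
block = [("cx", 0), ("ax", "cx")]
print(fold_constant_copies(block, live_out={"ax"}))
```

With ax live on exit and cx dead, the two insns collapse into a single
ax = 0, which corresponds to the xorl %eax, %eax in the improved
assembly.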

Richard.
