On Thu, Apr 16, 2026 at 1:57 PM Richard Biener wrote:
>
> On Thu, 16 Apr 2026, Uros Bizjak wrote:
>
> > On Thu, Apr 16, 2026 at 12:58 PM Richard Biener wrote:
> > >
> > > On Thu, 16 Apr 2026, Uros Bizjak wrote:
> > >
> > > > Hello!
> > > >
> > > > After pass_reorder_blocks, some propagation opportunities remain
> > > > for late_combine.  Looking at gcc.target/i386/pr90178.c, we get
> > > > the following trivial sequence:
> > > >
> > > > gcc -O2 -mavx -mvzeroupper -m32:
> > > >
> > > > .L5:
> > > >     xorl    %ecx, %ecx
> > > >     ...
> > > >     movl    %ecx, %eax
> > > >     ret
> > > >
> > > > Putting another instance of pass_late_combine after
> > > > pass_reorder_blocks improves the assembly in a non-trivial way:
> > > >
> > > >  @@ -28,10 +28,8 @@
> > > >      cmpl    %edx, %ebx
> > > >      je    .L5
> > > >  .L4:
> > > > -    movl    %eax, %ecx
> > > >      cmpl    %esi, (%eax)
> > > >      jne    .L11
> > > > -    movl    %ecx, %eax
> > > >      popl    %ebx
> > > >      .cfi_remember_state
> > > >      .cfi_restore 3
> > > > @@ -44,17 +42,16 @@
> > > >      .p2align 3
> > > >  .L5:
> > > >      .cfi_restore_state
> > > > -    xorl    %ecx, %ecx
> > > > +    xorl    %eax, %eax
> > > >      popl    %ebx
> > > >      .cfi_restore 3
> > > >      .cfi_def_cfa_offset 8
> > > >      popl    %esi
> > > >      .cfi_restore 6
> > > >      .cfi_def_cfa_offset 4
> > > > -    movl    %ecx, %eax
> > > >      ret
> > > >      .cfi_endproc
> > > >  .LFE0:
> > > >      .size    find_ptr, .-find_ptr
> > > >
> > > > which looks like it is worth putting a new pass here.
> > > >
> > > > Comparing the sizes of a default x86_64 Linux kernel build shows a
> > > > noticeable code size improvement:
> > > >
> > > > $ size vmlinux-old.o vmlinux-new.o
> > > >     text     data    bss      dec     hex filename
> > > > 29432351  4932443 754228 35119022 217dfae vmlinux-old.o
> > > > 29415516  4932443 754228 35102187 2179deb vmlinux-new.o
> > > >
> > > > which shows a code size reduction of 16835 bytes (about 0.06% of text).
> > > >
> > > > Any thoughts?
> > >
> > > Did you check other places to schedule the pass?
> >
> > I was interested in exploiting the opportunities exposed by the bbro
> > pass (as mentioned in [1]), so the natural place for the new pass is
> > right after the bbro pass:
> >
> > On x86_32, IRA zeroes %ecx, which is later copied to %eax in the
> > terminal basic block:
> >
> >    12: NOTE_INSN_BASIC_BLOCK 3
> >     7: cx:SI=0
> >       REG_EQUAL 0
> >    45: pc=L36
> >    ...
> >    36: L36:
> >    39: NOTE_INSN_BASIC_BLOCK 7
> >    37: ax:SI=cx:SI
> >    38: use ax:SI
> >
> > This sequence is reordered in bbro pass to:
> >
> >    28: L28:
> >    12: NOTE_INSN_BASIC_BLOCK 7
> >    69: {cx:SI=0;clobber flags:CC;}
> >       REG_UNUSED flags:CC
> >    71: ax:SI=cx:SI
> >       REG_DEAD cx:SI
> >    72: use ax:SI
> >    73: NOTE_INSN_EPILOGUE_BEG
> >    74: bx:SI=[sp:SI++]
> >       REG_CFA_ADJUST_CFA sp:SI=sp:SI+0x4
> >       REG_CFA_RESTORE bx:SI
> >    75: si:SI=[sp:SI++]
> >       REG_CFA_ADJUST_CFA sp:SI=sp:SI+0x4
> >       REG_CFA_RESTORE si:SI
> >    76: simple_return
>
> Ah, so maybe we can have a late combine entry point that is invoked
> only if BB reorder does any path duplication?  On GIMPLE we
> increasingly invoke pass workers directly from other passes
> in such cases; VN even has a mode to operate on small portions
> of the CFG.
>
> Could it be that a simple regcprop also suffices for the issue at hand?

Hm, adding another instance of pass_cprop_hardreg after
pass_reorder_blocks fails the self-test with:

cc1: internal compiler error: pass cprop_hardreg does not support cloning
0x226f97d internal_error(char const*, ...)
       ../../git/gcc/gcc/diagnostic-global-context.cc:787
0xce6a79 opt_pass::clone()
       ../../git/gcc/gcc/passes.cc:90
0xcf8a86 gcc::pass_manager::pass_manager(gcc::context*)
       ./pass-instances.def:554
0xe3a7d7 general_init
       ../../git/gcc/gcc/toplev.cc:1164

Uros.
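
For readers unfamiliar with the failure mode: the backtrace above suggests that the base-class opt_pass::clone() raises the error, so a pass can presumably be scheduled more than once only if it provides its own clone() override. The sketch below is a stand-alone toy model of that pattern, not GCC code. The names toy_pass, single_instance_pass, clonable_pass and schedule are hypothetical, and an exception stands in for the internal compiler error.

```cpp
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a pass base class (hypothetical, not GCC's opt_pass).
struct toy_pass {
  explicit toy_pass(std::string name) : name(std::move(name)) {}
  virtual ~toy_pass() = default;

  // Default behaviour: refuse to be duplicated, mirroring the message
  // in the backtrace above.
  virtual std::unique_ptr<toy_pass> clone() const {
    throw std::logic_error("pass " + name + " does not support cloning");
  }

  std::string name;
};

// A pass that does not override clone(): only one instance is possible.
struct single_instance_pass : toy_pass {
  single_instance_pass() : toy_pass("cprop_hardreg") {}
};

// A pass that opts in to multiple instances by overriding clone().
struct clonable_pass : toy_pass {
  clonable_pass() : toy_pass("clonable_example") {}
  std::unique_ptr<toy_pass> clone() const override {
    return std::make_unique<clonable_pass>();
  }
};

// Build a pipeline with `count` instances of a pass: the first instance
// is used as-is, every further instance is obtained through clone().
std::vector<std::unique_ptr<toy_pass>>
schedule(std::unique_ptr<toy_pass> first, int count) {
  std::vector<std::unique_ptr<toy_pass>> pipeline;
  pipeline.push_back(std::move(first));
  for (int i = 1; i < count; ++i)
    pipeline.push_back(pipeline.front()->clone());
  return pipeline;
}
```

In this model, schedule(std::make_unique<clonable_pass>(), 3) yields three instances, while asking for a second single_instance_pass throws. Whether adding such an override to the real pass is safe depends on state the pass keeps, which this sketch does not address.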
