On Wed, Mar 9, 2022 at 7:10 PM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> Hi Uros,
> Firstly many thanks for already (pre)approving half of this patch.  Jakub 
> Jelinek's
> suggestion for creating a testcase that exposes the SImode issue was extremely
> helpful, but interestingly exposed another missed optimization opportunity 
> that's
> also addressed with this revision of my patch.
>
> char c;
> union Data { char a; short b; };
> void val(void) { __asm__ __volatile__ ("" : : "r" ((union Data) { c } )); }
>
> currently on x86_64 with -O2 on mainline generates following code:
>
>         xorl    %eax, %eax
>         movb    $0, %ah
>         movb    c(%rip), %al
>         ret
>
> which contains the strict_low_part movb following an SI mode xor that we hope
> to optimize, but infuriatingly we also have a completely redundant movb $0, 
> %ah.
> Hence, with this patch we have a new peephole2 that sees that in the first two
> instructions the movb is redundant, which then allows the SImode/SWI48 xor
> followed by movb peephole2 to optimize the rest to movzbl.  With the revised
> patch below, the above testcase is now compiled as:
>
>         movzbl  c(%rip), %eax
>         ret
>
> Tested on x86_64-pc-linux-gnu (on top of the revised middle-end patch) with
> make bootstrap and make -k check with no new failures.  Is this revised 
> version
> (still) Ok for mainline?
>
>
> 2022-03-09  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         PR tree-optimization/98335
>         * config/i386/i386.md (peephole2): Eliminate redundant insv.
>         Combine movl followed by movb.  Transform xorl followed by
>         a suitable movb or movw into the equivalent movz[bw]l.
>
> gcc/testsuite/ChangeLog
>         PR tree-optimization/98335
>         * g++.target/i386/pr98335.C: New test case.
>         * gcc.target/i386/pr98335.c: New test case.

OK.

Thanks,
Uros.

>
>
> Thanks again for your help/suggested revisions.
> Roger
> --
>
> > -----Original Message-----
> > From: Uros Bizjak <ubiz...@gmail.com>
> > Sent: 09 March 2022 07:36
> > To: Roger Sayle <ro...@nextmovesoftware.com>
> > Cc: gcc-patches@gcc.gnu.org
> > Subject: Re: [x86 PATCH] PR tree-optimization/98335: New peephole2
> > xorl;movb -> movzbl
> >
> > On Mon, Mar 7, 2022 at 12:51 PM Roger Sayle
> > <ro...@nextmovesoftware.com> wrote:
> > >
> > >
> > > Hi Uros,
> > >
> > > > Is there a reason that only inserts to DImode registers are implemented?
> > > > IMO, these peepholes should also handle inserts to SImode.
> > >
> > > I wasn't able to construct a test case that produced a byte or word
> > > insert into an SImode register.  The front-ends and middle-end end up
> > > producing different code sequences, and -m32 changes the ABI so that
> > > small structs get passed in memory rather than in registers.
> > >
> > > Here's the expanded testcase that I investigated:
> > >
> > > struct DataCL { char a; int b; };
> > > struct DataWL { short a; int b; };
> > > struct DataIL { int a; int b; };
> > >
> > > struct DataCI { char a; short b; };
> > > struct DataWI { short a; short b; };
> > >
> > > char c;
> > > short w;
> > > int i;
> > >
> > > DataCL foo_cl(int idx) { return { c }; } DataCL bar_cl(int idx) {
> > > return { c, 0 }; } DataWL foo_wl(int idx) { return { w }; } DataWL
> > > bar_wl(int idx) { return { w, 0 }; } DataIL foo_il(int idx) { return {
> > > i }; } DataIL bar_il(int idx) { return { i, 0 }; }
> > >
> > > DataCI foo_ci(int idx) { return { c }; } DataCI bar_ci(int idx) {
> > > return { c, 0 }; } DataWI foo_wi(int idx) { return { w }; } DataWI
> > > bar_wi(int idx) { return { w, 0 }; }
> > >
> > >
> > > I agree that for completeness similar peepholes handling inserts into
> > > SImode would be a good thing, but these wouldn't be restricted by
> > > TARGET_64BIT and would therefore require additional -m32 testing.
> > > The DImode peepholes I can justify for stage4 as PR tree-opt/98335 is
> > > a regression, SImode peepholes would be more of a "leap of faith".
> > > If you’d be willing to accept a patch without a testcase, let me know.
> >
> > We have plenty of these, where we assume that if the pattern works in one
> > mode, it also works in other modes. So, I think that by changing DI mode to
> > SWI48 mode iterator, we are on the safe side. We can also trust bootstrap 
> > and
> > regression tests here.
> >
> > IMO, the patch with SWI48 mode iterator is OK.
> >
> > Thanks,
> > Uros.
> >
> > >
> > > It's also a pity that subreg handling in combine doesn't allow merging
> > > these inserts into zero registers to be combined to zero_extends in a
> > > machine independent way.  My recent patch for PR 95126 (awaiting
> > > review) should also allow front-ends and middle-end passes more
> > > flexibility in optimizing small struct constructors.
> > >
> > > Thanks (as always) for reviewing patches so quickly.
> > >
> > > Roger
> > > --
>
>

Reply via email to