On Wed, Mar 9, 2022 at 7:10 PM Roger Sayle <ro...@nextmovesoftware.com> wrote: > > > Hi Uros, > Firstly many thanks for already (pre)approving half of this patch. Jakub > Jelinek's > suggestion for creating a testcase that exposes the SImode issue was extremely > helpful, but interestingly exposed another missed optimization opportunity > that's > also addressed with this revision of my patch. > > char c; > union Data { char a; short b; }; > void val(void) { __asm__ __volatile__ ("" : : "r" ((union Data) { c } )); } > > currently on x86_64 with -O2 on mainline generates following code: > > xorl %eax, %eax > movb $0, %ah > movb c(%rip), %al > ret > > which contains the strict_low_part movb following an SI mode xor that we hope > to optimize, but infuriatingly we also have a completely redundant movb $0, > %ah. > Hence, with this patch we have a new peephole2 that sees that in the first two > instructions the movb is redundant, which then allows the SImode/SWI48 xor > followed by movb peephole2 to optimize the rest to movzbl. With the revised > patch below, the above testcase is now compiled as: > > movzbl c(%rip), %eax > ret > > Tested on x86_64-pc-linux-gnu (on top of the revised middle-end patch) with > make bootstrap and make -k check with no new failures. Is this revised > version > (still) Ok for mainline? > > > 2022-03-09 Roger Sayle <ro...@nextmovesoftware.com> > > gcc/ChangeLog > PR tree-optimization/98335 > * config/i386/i386.md (peephole2): Eliminate redundant insv. > Combine movl followed by movb. Transform xorl followed by > a suitable movb or movw into the equivalent movz[bw]l. > > gcc/testsuite/ChangeLog > PR tree-optimization/98335 > * g++.target/i386/pr98335.C: New test case. > * gcc.target/i386/pr98335.c: New test case.
OK. Thanks, Uros. > > > Thanks again for your help/suggested revisions. > Roger > -- > > > -----Original Message----- > > From: Uros Bizjak <ubiz...@gmail.com> > > Sent: 09 March 2022 07:36 > > To: Roger Sayle <ro...@nextmovesoftware.com> > > Cc: gcc-patches@gcc.gnu.org > > Subject: Re: [x86 PATCH] PR tree-optimization/98335: New peephole2 > > xorl;movb -> movzbl > > > > On Mon, Mar 7, 2022 at 12:51 PM Roger Sayle > > <ro...@nextmovesoftware.com> wrote: > > > > > > > > > Hi Uros, > > > > > > > Is there a reason that only inserts to DImode registers are implemented? > > > > IMO, these peepholes should also handle inserts to SImode. > > > > > > I wasn't able to construct a test case that produced a byte or word > > > insert into an SImode register. The front-ends and middle-end end up > > > producing different code sequences, and -m32 changes the ABI so that > > > small structs get passed in memory rather than in registers. > > > > > > Here's the expanded testcase that I investigated: > > > > > > struct DataCL { char a; int b; }; > > > struct DataWL { short a; int b; }; > > > struct DataIL { int a; int b; }; > > > > > > struct DataCI { char a; short b; }; > > > struct DataWI { short a; short b; }; > > > > > > char c; > > > short w; > > > int i; > > > > > > DataCL foo_cl(int idx) { return { c }; } DataCL bar_cl(int idx) { > > > return { c, 0 }; } DataWL foo_wl(int idx) { return { w }; } DataWL > > > bar_wl(int idx) { return { w, 0 }; } DataIL foo_il(int idx) { return { > > > i }; } DataIL bar_il(int idx) { return { i, 0 }; } > > > > > > DataCI foo_ci(int idx) { return { c }; } DataCI bar_ci(int idx) { > > > return { c, 0 }; } DataWI foo_wi(int idx) { return { w }; } DataWI > > > bar_wi(int idx) { return { w, 0 }; } > > > > > > > > > I agree that for completeness similar peepholes handling inserts into > > > SImode would be a good thing, but these wouldn't be restricted by > > > TARGET_64BIT and would therefore require additional -m32 testing. > > > The DImode peepholes I can justify for stage4 as PR tree-opt/98335 is > > > a regression, SImode peepholes would be more of a "leap of faith". > > > If you’d be willing to accept a patch without a testcase, let me know. > > > > We have plenty of these, where we assume that if the pattern works in one > > mode, it also works in other modes. So, I think that by changing DI mode to > > SWI48 mode iterator, we are on the safe side. We can also trust bootstrap > > and > > regression tests here. > > > > IMO, the patch with SWI48 mode iterator is OK. > > > > Thanks, > > Uros. > > > > > > > > It's also a pity that subreg handling in combine doesn't allow merging > > > these inserts into zero registers to be combined to zero_extends in a > > > machine independent way. My recent patch for PR 95126 (awaiting > > > review) should also allow front-ends and middle-end passes more > > > flexibility in optimizing small struct constructors. > > > > > > Thanks (as always) for reviewing patches so quickly. > > > > > > Roger > > > -- > >