On Wed, Jun 26, 2024 at 4:02 PM Richard Biener
<richard.guent...@gmail.com> wrote:
>
> On Wed, Jun 26, 2024 at 9:14 AM Hongtao Liu <crazy...@gmail.com> wrote:
> >
> > On Wed, Jun 26, 2024 at 2:52 PM Richard Biener
> > <richard.guent...@gmail.com> wrote:
> > >
> > > On Wed, Jun 26, 2024 at 8:09 AM liuhongt <hongtao....@intel.com> wrote:
> > > >
> > > > 416.gamess regressed 4-6% on x86_64 since my r15-882-g1d6199e5f8c1c0.
> > > > The commit adjust rtx_cost of mem to reduce cost of (add op0 disp).
> > > > But Cost of ADDR could be cheaper than XEXP (addr, 0) when it's a lea.
> > > > It is the case in the PR, the patch uses lower cost to enable more
> > > > simplication and fix the regression.
> > > >
> > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > > > Ok for trunk?
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > >         PR target/115462
> > > >         * config/i386/i386.cc (ix86_rtx_costs): Use cost of addr when
> > > >         it's lower than rtx_cost (XEXP (addr, 0)) + 1.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >         * gcc.target/i386/pr115462.c: New test.
> > > > ---
> > > >  gcc/config/i386/i386.cc                  |  9 +++++++--
> > > >  gcc/testsuite/gcc.target/i386/pr115462.c | 22 ++++++++++++++++++++++
> > > >  2 files changed, 29 insertions(+), 2 deletions(-)
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr115462.c
> > > >
> > > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > > > index d4ccc24be6e..83dab8220dd 100644
> > > > --- a/gcc/config/i386/i386.cc
> > > > +++ b/gcc/config/i386/i386.cc
> > > > @@ -22341,8 +22341,13 @@ ix86_rtx_costs (rtx x, machine_mode mode, int 
> > > > outer_code_i, int opno,
> > > >           if (GET_CODE (addr) == PLUS
> > > >               && x86_64_immediate_operand (XEXP (addr, 1), Pmode))
> > > >             {
> > > > -             *total += 1;
> > > > -             *total += rtx_cost (XEXP (addr, 0), Pmode, PLUS, 0, 
> > > > speed);
> > > > +             /* PR115462: Cost of ADDR could be cheaper than XEXP 
> > > > (addr, 0)
> > > > +                when it's a lea, use lower cost to enable more
> > > > +                simplification.  */
> > > > +             unsigned cost1 = rtx_cost (addr, Pmode, MEM, 0, speed);
> > > > +             unsigned cost2 = rtx_cost (XEXP (addr, 0), Pmode,
> > > > +                                        PLUS, 0, speed) + 1;
> > >
> > > Just as comment - this is a bit ugly, why would we not always use the
> > > address cost?  (and why are you using 'MEM'?)  Should this be better
> > > handled on the insn_cost level when it's clear the PLUS is separate 
> > > address
> > > calculation (LEA) rather than address calculation in a MEM context?
> >  For MEM, rtx_cost doesn't use address_cost but iterates each subrtx,
> > and adds up the costs,
> >  So for MEM (reg) and MEM (reg + 4), the former costs 5, the latter
> > costs 9, it is not accurate for x86.
>
> But rtx_cost invokes targetm.rtx_cost which allows to avoid that
> recursive processing at any level.  You're dealing with MEM [addr]
> here, so why's rtx_cost (addr, Pmode, MEM, 0, speed) not always
Because when addr is just (plus reg cosnt_int), rtx_cost (addr, Pmode,
MEM, 0, speed) return 4.
But i think it should be equal to addr (reg) which is 0 cost.
> the best way to deal with this?  Since this is the MEM [addr] case
> we know it's not LEA, no?
Maybe the only case I need to handle is reg + disp, otherwise, they're
all lea forms.
>
> >  Ideally address_cost should be used, but it reduces cost too
> > much(range from 1-3).
> > (I've tried that, it regressed many testcases, because two many
> > registers are propagated into addr and increase register pressure).
> >  So the current solution is to make constant disp as cheap as possible
> > so more constant can be propagated into the address(but not
> > registers).
> >
> > >
> > > > +             *total += MIN (cost1, cost2);
> > > >               return true;
> > > >             }
> > > >         }
> > > > diff --git a/gcc/testsuite/gcc.target/i386/pr115462.c 
> > > > b/gcc/testsuite/gcc.target/i386/pr115462.c
> > > > new file mode 100644
> > > > index 00000000000..ad50a6382bc
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/i386/pr115462.c
> > > > @@ -0,0 +1,22 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-O2 -mavx2 -fno-tree-vectorize -fno-pic" } */
> > > > +/* { dg-final { scan-assembler-times {(?n)movl[ \t]+.*, 
> > > > p1\.0\+[0-9]*\(,} 3 } } */
> > > > +
> > > > +int
> > > > +foo (long indx, long indx2, long indx3, long indx4, long indx5, long 
> > > > indx6, long n, int* q)
> > > > +{
> > > > +  static int p1[10000];
> > > > +  int* p2 = p1 + 1000;
> > > > +  int* p3 = p1 + 4000;
> > > > +  int* p4 = p1 + 8000;
> > > > +
> > > > +  for (long i = 0; i != n; i++)
> > > > +    {
> > > > +      /* scan for      movl    %edi, p1.0+3996(,%rax,4),
> > > > +        p1.0+3996 should be propagted into the loop.  */
> > > > +      p2[indx++] = q[indx++];
> > > > +      p3[indx2++] = q[indx2++];
> > > > +      p4[indx3++] = q[indx3++];
> > > > +    }
> > > > +  return p1[indx6] + p1[indx5];
> > > > +}
> > > > --
> > > > 2.31.1
> > > >
> >
> >
> >
> > --
> > BR,
> > Hongtao



-- 
BR,
Hongtao

Reply via email to