On Thu, Feb 13, 2020 at 2:46 PM Alexander Monakov <amona...@ispras.ru> wrote:
>
> Ping^3.

OK.

> On Sun, 5 Jan 2020, Alexander Monakov wrote:
>
> > Hi,
> >
> > I noticed there's a costly signed 64-bit division in rtx_cost on x86 as well as
> > any other target where UNITS_PER_WORD is implemented like TARGET_64BIT ? 8 : 4.
> > It's also evident that rtx_cost does redundant work for a SET rtx argument.
> >
> > Obviously the variable named 'factor' rarely exceeds 1, so in the majority of
> > cases it can be computed with a well-predictable branch rather than a division.
> >
> > This patch makes rtx_cost do the division only in case mode is wider than
> > UNITS_PER_WORD, and also moves a test for a SET up front to avoid redundancy.
> > No functional change.
> >
> > Bootstrapped on x86_64, ok for trunk?
> >
> > To illustrate the improvement this buys, for tramp3d -O2 compilation, I got
> >
> >     before:
> >            73887675319      instructions:u
> >
> >            72438432200      cycles:u
> >              924298569      idq.ms_uops:u
> >           102603799255      uops_dispatched.thread:u
> >
> >     after:
> >            73888371724      instructions:u
> >
> >            72386986612      cycles:u
> >              802744775      idq.ms_uops:u
> >           102096987220      uops_dispatched.thread:u
> >
> > (this is on Sandybridge, idq.ms_uops are uops going via the microcode sequencer,
> > so the unneeded division is responsible for a good fraction of them)
> >
> >       * rtlanal.c (rtx_cost): Handle a SET up front. Avoid division if the
> >       mode is not wider than UNITS_PER_WORD.
> >
> > diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
> > index 9a7afccefb8..c7ab86e228b 100644
> > --- a/gcc/rtlanal.c
> > +++ b/gcc/rtlanal.c
> > @@ -4207,18 +4207,23 @@ rtx_cost (rtx x, machine_mode mode, enum rtx_code outer_code,
> >    const char *fmt;
> >    int total;
> >    int factor;
> > +  unsigned mode_size;
> >
> >    if (x == 0)
> >      return 0;
> >
> > -  if (GET_MODE (x) != VOIDmode)
> > +  if (GET_CODE (x) == SET)
> > +    /* A SET doesn't have a mode, so let's look at the SET_DEST to get
> > +       the mode for the factor.  */
> > +    mode = GET_MODE (SET_DEST (x));
> > +  else if (GET_MODE (x) != VOIDmode)
> >      mode = GET_MODE (x);
> >
> > +  mode_size = estimated_poly_value (GET_MODE_SIZE (mode));
> > +
> >    /* A size N times larger than UNITS_PER_WORD likely needs N times as
> >       many insns, taking N times as long.  */
> > -  factor = estimated_poly_value (GET_MODE_SIZE (mode)) / UNITS_PER_WORD;
> > -  if (factor == 0)
> > -    factor = 1;
> > +  factor = mode_size > UNITS_PER_WORD ? mode_size / UNITS_PER_WORD : 1;
> >
> >    /* Compute the default costs of certain things.
> >       Note that targetm.rtx_costs can override the defaults.  */
> > @@ -4243,14 +4248,6 @@ rtx_cost (rtx x, machine_mode mode, enum rtx_code outer_code,
> >        /* Used in combine.c as a marker.  */
> >        total = 0;
> >        break;
> > -    case SET:
> > -      /* A SET doesn't have a mode, so let's look at the SET_DEST to get
> > -      the mode for the factor.  */
> > -      mode = GET_MODE (SET_DEST (x));
> > -      factor = estimated_poly_value (GET_MODE_SIZE (mode)) / UNITS_PER_WORD;
> > -      if (factor == 0)
> > -     factor = 1;
> > -      /* FALLTHRU */
> >      default:
> >        total = factor * COSTS_N_INSNS (1);
> >      }
> >
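
The "No functional change" claim for the 'factor' computation in the patch above can be checked in isolation. What follows is a minimal standalone sketch, not GCC code: factor_old, factor_new and factors_agree are hypothetical names, plain integer parameters stand in for estimated_poly_value (GET_MODE_SIZE (mode)) and UNITS_PER_WORD, and mode sizes are assumed to be non-negative and small enough to fit in an unsigned int.

```c
#include <assert.h>

/* Old form (sketch): always performs a signed 64-bit division,
   then clamps a zero quotient to 1.  */
static int
factor_old (long long mode_size, int units_per_word)
{
  int factor = (int) (mode_size / units_per_word);
  if (factor == 0)
    factor = 1;
  return factor;
}

/* New form (sketch): divides only when the mode is wider than a word;
   the common narrow case is handled by a predictable branch.  */
static int
factor_new (unsigned mode_size, unsigned units_per_word)
{
  return mode_size > units_per_word
	 ? (int) (mode_size / units_per_word)
	 : 1;
}

/* Return 1 if both forms agree for every size in [0, max_size].
   Hypothetical helper, used only to exercise the two functions.  */
static int
factors_agree (unsigned max_size, unsigned units_per_word)
{
  for (unsigned size = 0; size <= max_size; size++)
    if (factor_old ((long long) size, (int) units_per_word)
	!= factor_new (size, units_per_word))
      return 0;
  return 1;
}
```

Under those assumptions the two forms coincide because a quotient of zero occurs only when the size is smaller than the word size, and a size equal to the word size yields 1 either way; calling factors_agree (256, 8) or factors_agree (256, 4) from a test harness exercises both the narrow and the wide cases.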
