Re: [PATCH, i386]: Do not limit the cost of moves to/from XMM register to minimum 8.

Hongtao Liu Tue, 03 Sep 2019 00:58:18 -0700

On Mon, Sep 2, 2019 at 4:41 PM Uros Bizjak <ubiz...@gmail.com> wrote:
>
> On Mon, Sep 2, 2019 at 10:13 AM Hongtao Liu <crazy...@gmail.com> wrote:
> >
> > > which is not the case with core_cost (and similar with skylake_cost):
> > >
> > >   2, 2, 4,                /* cost of moving XMM,YMM,ZMM register */
> > >   {6, 6, 6, 6, 12},            /* cost of loading SSE registers
> > >                        in 32,64,128,256 and 512-bit */
> > >   {6, 6, 6, 6, 12},            /* cost of storing SSE registers
> > >                        in 32,64,128,256 and 512-bit */
> > >   2, 2,                    /* SSE->integer and integer->SSE moves */
> > >
> > > We have the same cost of moving between integer registers (by default
> > > set to 2), between SSE registers and between integer and SSE register
> > > sets. I think that at least the cost of moves between regsets should
> > > be substantially higher, rs6000 uses 3x cost of intra-regset moves;
> > > that would translate to the value of 6. The value should be low enough
> > > to keep the cost below the value that forces move through the memory.
> > > Changing core register allocation cost of SSE <-> integer to:
> > >
> > > --cut here--
> > > Index: config/i386/x86-tune-costs.h
> > > ===================================================================
> > > --- config/i386/x86-tune-costs.h        (revision 275281)
> > > +++ config/i386/x86-tune-costs.h        (working copy)
> > > @@ -2555,7 +2555,7 @@ struct processor_costs core_cost = {
> > >                                            in 32,64,128,256 and 512-bit */
> > >    {6, 6, 6, 6, 12},                    /* cost of storing SSE registers
> > >                                            in 32,64,128,256 and 512-bit */
> > > -  2, 2,                                        /* SSE->integer and
> > > integer->SSE moves */
> > > +  6, 6,                                        /* SSE->integer and
> > > integer->SSE moves */
> > >    /* End of register allocator costs.  */
> > >    },
> > >
> > > --cut here--
> > >
> > > still produces direct move in gcc.target/i386/minmax-6.c
> > >
> > > I think that in addition to attached patch, values between 2 and 6
> > > should be considered in benchmarking. Unfortunately, without access to
> > > regressed SPEC tests, I can't analyse these changes by myself.
> > >
> > > Uros.
> >
> > Apply similar change to skylake_cost, on skylake workstation we got
> > performance like:
> > ---------------------------
> > version                                                            |
> > 548_exchange_r score
> > gcc10_20180822:                                           |   10
> > apply remove_max8                                       |   8.9
> > also apply increase integer_tofrom_sse cost |   9.69
> > -----------------------------
> > Still 3% regression which is related to _gfortran_mminloc0_4_i4 in
> > libgfortran.so.5.0.0.
> >
> > I found suspicious code as bellow, does it affect?
>
> Hard to say without access to the test, but I'm glad that changing the
> knob has noticeable effect. I think that (as said by Alan) a fine-tune
> of register pressure calculation will be needed to push this forward.
>
> Uros.
>
> > ------------------
> > modified   gcc/config/i386/i386-features.c
> > @@ -590,7 +590,7 @@ general_scalar_chain::compute_convert_gain ()
> >    if (dump_file)
> >      fprintf (dump_file, "  Instruction conversion gain: %d\n", gain);
> >
> > -  /* ???  What about integer to SSE?  */
> > +  /* ???  What about integer to SSE?  */???
> >    EXECUTE_IF_SET_IN_BITMAP (defs_conv, 0, insn_uid, bi)
> >      cost += DF_REG_DEF_COUNT (insn_uid) * ix86_cost->sse_to_integer;
> > ------------------
> > --
> > BR,
> > Hongtao


Note:
Removing limit of cost would introduce lots of regressions in SPEC2017 as follow
--------------------------------
531.deepsjeng_r  -7.18%
548.exchange_r  -6.70%
557.xz_r -6.74%
508.namd_r -2.81%
527.cam4_r -6.48%
544.nab_r -3.99%

Tested on skylake server.
-------------------------------------
How about  changing cost from 2 to 8 until we figure out a better number.

-- 
BR,
Hongtao

Re: [PATCH, i386]: Do not limit the cost of moves to/from XMM register to minimum 8.

Reply via email to