On Wed, Sep 4, 2019 at 9:44 AM Hongtao Liu <crazy...@gmail.com> wrote:
>
> On Wed, Sep 4, 2019 at 12:50 AM Uros Bizjak <ubiz...@gmail.com> wrote:
> >
> > On Tue, Sep 3, 2019 at 1:33 PM Richard Biener
> > <richard.guent...@gmail.com> wrote:
> >
> > > > > Note:
> > > > > Removing the limit on the cost would introduce lots of
> > > > > regressions in SPEC2017, as follows:
> > > > > --------------------------------
> > > > > 531.deepsjeng_r  -7.18%
> > > > > 548.exchange_r  -6.70%
> > > > > 557.xz_r -6.74%
> > > > > 508.namd_r -2.81%
> > > > > 527.cam4_r -6.48%
> > > > > 544.nab_r -3.99%
> > > > >
> > > > > Tested on skylake server.
> > > > > -------------------------------------
> > > > > How about changing the cost from 2 to 8 until we figure out a
> > > > > better number?
> > > >
> > > > Certainly works for me.  Note the STV code uses the "other" 
> > > > sse_to_integer
> > > > number and the numbers in question here are those for the RA.  There's
> > > > a multitude of values used in the tables here, including some a lot 
> > > > larger.
> > > > So the overall bumping to 8 certainly was the wrong thing to do and 
> > > > instead
> > > > individual numbers should have been adjusted (didn't look at the history
> > > > of that bumping).
> > >
> > > For reference:
> > >
> > > r125951 | uros | 2007-06-22 19:51:06 +0200 (Fri, 22 Jun 2007) | 6 lines
> > >
> > >     PR target/32413
> > >     * config/i386/i386.c (ix86_register_move_cost): Rise the cost of
> > >     moves between MMX/SSE registers to at least 8 units to prevent
> > >     ICE caused by non-tieable SI/HI/QImodes in SSE registers.
> > >
> > > should probably have been "twice the cost of X" or something like that
> > > instead, where X is some reg-reg move cost.
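> > >
> > > Something along these lines would be an untested sketch only (the
> > > xmm_move field name below is a placeholder for whatever the real
> > > intra-regset move entry is called in the cost tables):
> > >
> > >   /* Sketch: in ix86_register_move_cost, bound the inter-unit move
> > >      cost by twice the xmm<->xmm move cost instead of a flat 8, so
> > >      the clamp scales with the per-CPU cost tables.  */
> > >   if (SSE_CLASS_P (class1) != SSE_CLASS_P (class2))
> > >     return MAX (2 * ix86_cost->xmm_move, ix86_cost->sse_to_integer);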
> >
> > Thanks for the reference. It looks like the patch fixes the issue in
> > the wrong place; this should be solved in
> > inline_secondary_memory_needed:
> >
> >       /* Between SSE and general, we have moves no larger than word size.  */
> >       if (!(INTEGER_CLASS_P (class1) || INTEGER_CLASS_P (class2))
> >            || GET_MODE_SIZE (mode) < GET_MODE_SIZE (SImode)
> >            || GET_MODE_SIZE (mode) > UNITS_PER_WORD)
> >         return true;
> >
> > as an alternative, implementing QImode and HImode moves between SSE
> > and integer registers as SImode moves. We have
> > ix86_secondary_memory_needed_mode, which extends QImode and HImode
> > secondary memory to SImode, so this should solve PR32413.
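> >
> > An untested sketch of that direction (not an actual patch; it assumes
> > the QImode and HImode inter-unit moves are then emitted as SImode
> > moves, as ix86_secondary_memory_needed_mode already arranges for the
> > memory path):
> >
> >       /* Between SSE and general, we have moves no larger than word
> >          size; QImode and HImode are handled by widening to SImode.  */
> >       if (!(INTEGER_CLASS_P (class1) || INTEGER_CLASS_P (class2))
> >            || GET_MODE_SIZE (mode) > UNITS_PER_WORD)
> >         return true;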
> >
> > Other than that, what should we do about the bizarre property of
> > direct moves that benchmark far worse than indirect moves? I was
> > expecting that keeping the cost of direct inter-regset moves just a
> > bit below the cost of int<->mem<->xmm, but (much?) higher than
> > intra-regset moves, would prevent unwanted wandering of values between
> > register sets while still generating the direct move when needed.
> > While this almost
>
> I haven't tested it yet.
> I'll start a test of this patch (changing the cost from 2 to 6)
> together with Richard's change.
> I'll let you know when the test finishes.
>
> > fixes the runtime regression, it is not clear to me from Hongtao Liu's
> > message whether Richard's 2019-08-27 change fixes the remaining
> > regression or not. Liu, can you please clarify?
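> >
> > As a concrete, made-up example of the intended ordering: if an
> > intra-regset move costs 2 and a trip through memory costs around 8
> > (store plus load), then a direct gpr<->xmm cost of, say, 6 sits in the
> > intended window -- high enough that the RA does not bounce values
> > between register sets, but still cheaper than going through memory
> > when a cross-set move is really needed.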
> >
> --------------------------------
> 531.deepsjeng_r  -7.18%
> 548.exchange_r  -6.70%
> 557.xz_r -6.74%
> 508.namd_r -2.81%
> 527.cam4_r -6.48%
> 544.nab_r -3.99%
>
> Tested on skylake server.
> -------------------------------------
> Those regressions compare gcc10_20190830 to gcc10_20190824 and are
> mainly caused by removing the limit of 8.
>
> > > >  For example, Pentium4 has quite high base values for move
> > > > costs, like an xmm <-> xmm move costing 12 and SSE->integer costing
> > > > 20, while the opposite direction costs 12.
> > > >
> > > > So yes, we want to revert the patch by applying its effect to the
> > > > individual cost tables so we can revisit this for the still interesting
> > > > micro-architectures.
> >
> > Uros.
>
>
>
> --
> BR,
> Hongtao

Changing the cost from 2 to 6 gives:
-------------
531.deepsjeng_r   9.64%
548.exchange_r   10.24%
557.xz_r          7.99%
508.namd_r        1.08%
527.cam4_r        6.91%
544.nab_r         3.06%
------------

For 531, 548, 557, and 527, the results are now even better than the
version before the regression.
For 508 and 544, there are still small regressions compared to the
version before the regression.
-- 
BR,
Hongtao
