On Wed, Sep 4, 2019 at 9:44 AM Hongtao Liu <crazy...@gmail.com> wrote: > > On Wed, Sep 4, 2019 at 12:50 AM Uros Bizjak <ubiz...@gmail.com> wrote: > > > > On Tue, Sep 3, 2019 at 1:33 PM Richard Biener > > <richard.guent...@gmail.com> wrote: > > > > > > > Note: > > > > > Removing limit of cost would introduce lots of regressions in > > > > > SPEC2017 as follow > > > > > -------------------------------- > > > > > 531.deepsjeng_r -7.18% > > > > > 548.exchange_r -6.70% > > > > > 557.xz_r -6.74% > > > > > 508.namd_r -2.81% > > > > > 527.cam4_r -6.48% > > > > > 544.nab_r -3.99% > > > > > > > > > > Tested on skylake server. > > > > > ------------------------------------- > > > > > How about changing cost from 2 to 8 until we figure out a better > > > > > number. > > > > > > > > Certainly works for me. Note the STV code uses the "other" > > > > sse_to_integer > > > > number and the numbers in question here are those for the RA. There's > > > > a multitude of values used in the tables here, including some a lot > > > > larger. > > > > So the overall bumping to 8 certainly was the wrong thing to do and > > > > instead > > > > individual numbers should have been adjusted (didn't look at the history > > > > of that bumping). > > > > > > For reference: > > > > > > r125951 | uros | 2007-06-22 19:51:06 +0200 (Fri, 22 Jun 2007) | 6 lines > > > > > > PR target/32413 > > > * config/i386/i386.c (ix86_register_move_cost): Rise the cost of > > > moves between MMX/SSE registers to at least 8 units to prevent > > > ICE caused by non-tieable SI/HI/QImodes in SSE registers. > > > > > > should probably have been "twice the cost of X" or something like that > > > instead where X be some reg-reg move cost. > > > > Thanks for the reference. It looks that the patch fixes the issue in > > the wrong place, this should be solved in > > inline_secondary_memory_needed: > > > > /* Between SSE and general, we have moves no larger than word size. > > */ > > if (!(INTEGER_CLASS_P (class1) || INTEGER_CLASS_P (class2)) > > || GET_MODE_SIZE (mode) < GET_MODE_SIZE (SImode) > > || GET_MODE_SIZE (mode) > UNITS_PER_WORD) > > return true; > > > > as an alternative to implement QI and HImode moves as a SImode move > > between SSE and int<->SSE registers. We have > > ix86_secondary_memory_needed_mode that extends QI and HImode secondary > > memory to SImode, so this should solve PR32413. > > > > Other than that, what to do with the bizzare property of direct moves > > that benchmark far worse than indirect moves? I was expecting that > > keeping the cost of direct inter-regset moves just a bit below the > > cost of int<->mem<->xmm, but (much ?) higher than itra-regset moves > > would prevent unwanted wandering of values between register sets, > > while still generating the direct move when needed. While this almost > > I've not tested it yet. > So i'll start a test about this patch(change cost from 2-->6) with > Richard's change. > I'll keep you informed when finishing test. > > > fixes the runtime regression, it is not clear to me from Hongtao Liu's > > message if Richard's 2019-08-27 fixes the remaining regression or > > not). Liu, can you please clarify? > > > -------------------------------- > 531.deepsjeng_r -7.18% > 548.exchange_r -6.70% > 557.xz_r -6.74% > 508.namd_r -2.81% > 527.cam4_r -6.48% > 544.nab_r -3.99% > > Tested on skylake server. > ------------------------------------- > Those regressions are comparing gcc10_20190830 to gcc10_20190824 which > are mainly caused by removing limit of 8. > > > > > For example Pentium4 has quite high bases for move > > > > costs, like xmm <-> xmm move costing 12 and SSE->integer costing 20 > > > > while the opposite 12. > > > > > > > > So yes, we want to revert the patch by applying its effect to the > > > > individual cost tables so we can revisit this for the still interesting > > > > micro-architectures. > > > > Uros. > > > > -- > BR, > Hongtao
Change cost from 2->6 got ------------- 531.deepsjeng_r 9.64% 548.exchange_r 10.24% 557.xc_r 7.99% 508.namd_r 1.08% 527.cam4_r 6.91% 553.nab_r 3.06% ------------ for 531,548,557,527, even better comparing to version before regression. for 508,533, still little regressions comparing to version before regression. -- BR, Hongtao