On Tue, Apr 7, 2015 at 10:19 AM, Richard Biener <rguent...@suse.de> wrote:
>
> They are suspiciously low (compared to say scalar_stmt_cost) and with
> them and the fix for the vectorizer cost model to properly account
> scalar stmt costs (and thus correctly dealing with odd costs as bdverN
> have) we regress 252.eon because we consider a loop vectorized and
> peeled for alignment loop profitable which clearly isn't.
>
> Bootstrap and regtest running on x86_64-unknown-linux-gnu. I've
> tested with all b[dt]verN -marchs and the slp-pr56812.cc testcase
> (yes, we've run into a similar issue earlier). I've also put the
> patch on our SPEC tester to look for fallout.
>
> It really looks like the costs were derived by some automatic
> searching of the parameter space and thus "optimizing" for bugs
> in the vectorizer cost model that have meanwhile been fixed
> (scalar stmt cost == 6 but scalar load/store cost == 4!?). It is
> not a good idea to put in parameters that you can't make sense of
> from an architectural point of view (yes, taken/not-taken branch
> is somewhat bogus kinds, I'd like to change that to correctly
> predicted / wrongly predicted for GCC 6).
>
> Ok for trunk and 4.9 branch?
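
As a rough illustration of why these two parameters matter for the peeling decision, here is a toy model in C. It is a sketch under stated assumptions, not GCC's vectorizer cost model: the struct toy_costs, the toy_* functions, the vectorization factor of 4, the 3 peeled iterations, the 3 guard branches and the vec_stmt_cost of 6 are all hypothetical. Only the scalar stmt cost of 6 and the old/new branch costs (2/1 and 4/2) are taken from this thread.

```c
#include <assert.h>

/* Toy cost table. scalar_stmt_cost and the two branch costs use values
   quoted in this thread; vec_stmt_cost is an assumed placeholder. */
struct toy_costs {
  int scalar_stmt_cost;
  int vec_stmt_cost;
  int cond_taken_branch_cost;
  int cond_not_taken_branch_cost;
};

static const struct toy_costs toy_old = { 6, 6, 2, 1 };  /* before the patch */
static const struct toy_costs toy_new = { 6, 6, 4, 2 };  /* after the patch */

/* Hypothetical loop shape: vectorization factor, iterations peeled for
   alignment, and number of guard branches around prologue/body/epilogue. */
enum { TOY_VF = 4, TOY_PEEL = 3, TOY_GUARDS = 3 };

/* Cost of running all iterations as scalar code, one stmt per iteration. */
static int toy_scalar_cost (const struct toy_costs *c, int niters)
{
  assert (niters >= 0);
  return niters * c->scalar_stmt_cost;
}

/* Cost of scalar prologue + vector body + scalar epilogue + guards.
   Each guard is charged one taken and one not-taken branch. */
static int toy_vector_cost (const struct toy_costs *c, int niters)
{
  assert (niters >= 0);
  int peel = niters < TOY_PEEL ? niters : TOY_PEEL;
  int rest = niters - peel;
  int guards = TOY_GUARDS * (c->cond_taken_branch_cost
                             + c->cond_not_taken_branch_cost);
  return peel * c->scalar_stmt_cost
         + (rest / TOY_VF) * c->vec_stmt_cost
         + (rest % TOY_VF) * c->scalar_stmt_cost
         + guards;
}

/* Smallest iteration count in [1, limit] for which the vectorized
   version is strictly cheaper, or -1 if there is none. */
static int toy_min_profitable_iters (const struct toy_costs *c, int limit)
{
  for (int n = 1; n <= limit; n++)
    if (toy_vector_cost (c, n) < toy_scalar_cost (c, n))
      return n;
  return -1;
}
```

In this toy setup, toy_min_profitable_iters (&toy_old, 64) gives 7 and toy_min_profitable_iters (&toy_new, 64) gives 11, so doubling the branch costs makes short peeled loops look less attractive. The real thresholds depend on the full cost model and on the loop.
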
I have added a person from AMD to comment on the decision. Otherwise,
the patch looks OK, but please wait a couple of days for possible
comments.

Thanks,
Uros.

> Thanks,
> Richard.
>
> 2015-04-07  Richard Biener  <rguent...@suse.de>
>
>         PR target/65660
>         * config/i386/i386.c (bdver1_cost): Double cond_taken_branch_cost
>         and cond_not_taken_branch_cost to 4 and 2.
>         (bdver2_cost): Likewise.
>         (bdver3_cost): Likewise.
>         (bdver4_cost): Likewise.
>
> Index: gcc/config/i386/i386.c
> ===================================================================
> *** gcc/config/i386/i386.c      (revision 221888)
> --- gcc/config/i386/i386.c      (working copy)
> *************** const struct processor_costs bdver1_cost
> *** 1025,1032 ****
>     4, /* vec_align_load_cost. */
>     4, /* vec_unalign_load_cost. */
>     4, /* vec_store_cost. */
> !   2, /* cond_taken_branch_cost. */
> !   1, /* cond_not_taken_branch_cost. */
>   };
>
>   /* BDVER2 has optimized REP instruction for medium sized blocks, but for
> --- 1025,1032 ----
>     4, /* vec_align_load_cost. */
>     4, /* vec_unalign_load_cost. */
>     4, /* vec_store_cost. */
> !   4, /* cond_taken_branch_cost. */
> !   2, /* cond_not_taken_branch_cost. */
>   };
>
>   /* BDVER2 has optimized REP instruction for medium sized blocks, but for
> *************** const struct processor_costs bdver2_cost
> *** 1121,1128 ****
>     4, /* vec_align_load_cost. */
>     4, /* vec_unalign_load_cost. */
>     4, /* vec_store_cost. */
> !   2, /* cond_taken_branch_cost. */
> !   1, /* cond_not_taken_branch_cost. */
>   };
>
>
> --- 1121,1128 ----
>     4, /* vec_align_load_cost. */
>     4, /* vec_unalign_load_cost. */
>     4, /* vec_store_cost. */
> !   4, /* cond_taken_branch_cost. */
> !   2, /* cond_not_taken_branch_cost. */
>   };
>
>
> *************** struct processor_costs bdver3_cost = {
> *** 1208,1215 ****
>     4, /* vec_align_load_cost. */
>     4, /* vec_unalign_load_cost. */
>     4, /* vec_store_cost. */
> !   2, /* cond_taken_branch_cost. */
> !   1, /* cond_not_taken_branch_cost. */
>   };
>
>   /* BDVER4 has optimized REP instruction for medium sized blocks, but for
> --- 1208,1215 ----
>     4, /* vec_align_load_cost. */
>     4, /* vec_unalign_load_cost. */
>     4, /* vec_store_cost. */
> !   4, /* cond_taken_branch_cost. */
> !   2, /* cond_not_taken_branch_cost. */
>   };
>
>   /* BDVER4 has optimized REP instruction for medium sized blocks, but for
> *************** struct processor_costs bdver4_cost = {
> *** 1294,1301 ****
>     4, /* vec_align_load_cost. */
>     4, /* vec_unalign_load_cost. */
>     4, /* vec_store_cost. */
> !   2, /* cond_taken_branch_cost. */
> !   1, /* cond_not_taken_branch_cost. */
>   };
>
>   /* BTVER1 has optimized REP instruction for medium sized blocks, but for
> --- 1294,1301 ----
>     4, /* vec_align_load_cost. */
>     4, /* vec_unalign_load_cost. */
>     4, /* vec_store_cost. */
> !   4, /* cond_taken_branch_cost. */
> !   2, /* cond_not_taken_branch_cost. */
>   };
>
>   /* BTVER1 has optimized REP instruction for medium sized blocks, but for