On Fri, 2012-02-03 at 14:41 +0100, Richard Guenther wrote: > On Fri, Feb 3, 2012 at 2:24 PM, William J. Schmidt > <wschm...@linux.vnet.ibm.com> wrote: > > This fixes http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50969 by slightly > > raising the cost of vector permutes on powerpc64 VSX targets (and > > ensuring those costs are correctly used). This reverses the performance > > loss for 168.wupwise, and gives a slight boost to 433.milc as well. > > > > In the long run, we will want to model VSX permutes differently, since > > the real issue is that only one floating-point pipe can hold a permute > > at a time. Thus the present patch can be overly conservative when > > permutes are rare compared with other vector instructions. > > > > Bootstrapped and regtested on powerpc64-linux-gnu with no failures. OK > > for trunk? > > Note this makes permutes artificially cheap for AMD K8, K10 and > Bulldozer. Can you change config/i386/i386.c:ix86_builtin_vectorization_cost > to return ix86_cost->vec_stmt_cost instead of one for vec_perm? > The cost is otherwise only queried by SLP vectorization it seems.
Sure, will do. > > Otherwise this looks ok. Please give other maintainers a chance to > chime in (other cost hooks might need similar adjustments). I'll give this until at least late Monday before committing. Thanks for the quick response! Bill > > Thanks, > Richard. > > > Thanks, > > Bill > > > > > > 2012-02-03 Bill Schmidt <wschm...@linux.vnet.ibm.com> > > > > PR tree-optimization/50969 > > * tree-vect-stmts.c (vect_model_store_cost): Correct statement cost > > to > > use vec_perm rather than vector_stmt. > > (vect_model_load_cost): Likewise. > > * config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost): Revise > > cost of vec_perm for TARGET_VSX. > > > > > > Index: gcc/tree-vect-stmts.c > > =================================================================== > > --- gcc/tree-vect-stmts.c (revision 183871) > > +++ gcc/tree-vect-stmts.c (working copy) > > @@ -882,7 +882,7 @@ vect_model_store_cost (stmt_vec_info stmt_info, in > > { > > /* Uses a high and low interleave operation for each needed permute. > > */ > > inside_cost = ncopies * exact_log2(group_size) * group_size > > - * vect_get_stmt_cost (vector_stmt); > > + * vect_get_stmt_cost (vec_perm); > > > > if (vect_print_dump_info (REPORT_COST)) > > fprintf (vect_dump, "vect_model_store_cost: strided group_size = %d > > .", > > @@ -988,7 +988,7 @@ vect_model_load_cost (stmt_vec_info stmt_info, int > > { > > /* Uses an even and odd extract operations for each needed permute. > > */ > > inside_cost = ncopies * exact_log2(group_size) * group_size > > - * vect_get_stmt_cost (vector_stmt); > > + * vect_get_stmt_cost (vec_perm); > > > > if (vect_print_dump_info (REPORT_COST)) > > fprintf (vect_dump, "vect_model_load_cost: strided group_size = %d > > .", > > Index: gcc/config/rs6000/rs6000.c > > =================================================================== > > --- gcc/config/rs6000/rs6000.c (revision 183871) > > +++ gcc/config/rs6000/rs6000.c (working copy) > > @@ -3540,9 +3540,13 @@ rs6000_builtin_vectorization_cost (enum vect_cost_ > > case vec_to_scalar: > > case scalar_to_vec: > > case cond_branch_not_taken: > > - case vec_perm: > > return 1; > > > > + case vec_perm: > > + if (!TARGET_VSX) > > + return 1; > > + return 2; > > + > > case cond_branch_taken: > > return 3; > > > > > > >