On Thu, Jun 28, 2012 at 04:05:58PM +0200, Jakub Jelinek wrote:
>On Thu, Jun 28, 2012 at 09:17:55AM +0200, Jakub Jelinek wrote:
>> I'll look at using MULT_HIGHPART_EXPR in the pattern recognizer and
>> vectorizing it as either of the sequences next.
>
>And here is corresponding pattern recognizer and vectorizer patch.
>
>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
>Unfortunately the addition of the builtin_mul_widen_* hooks on i?86 seems
>to pessimize the generated code for gcc.dg/vect/pr51581-3.c
>testcase (at least with -O3 -mavx) compared to when the hooks aren't
>present, because i?86 has more natural support for widen mult lo/hi
>compared to widen mult even/odd, but I assume that on powerpc it is the
>other way around.  So, how should I find out if both VEC_WIDEN_MULT_*_EXPR
>and builtin_mul_widen_* are possible for the particular vectype which one
>will be cheaper?
>
>2012-06-28  Jakub Jelinek  <ja...@redhat.com>
>
>    PR tree-optimization/51581
>    * tree-vect-stmts.c (permute_vec_elements): Add forward decl.
>    (vectorizable_operation): Handle vectorization of MULT_HIGHPART_EXPR
>    also using VEC_WIDEN_MULT_*_EXPR or builtin_mul_widen_* plus
>    VEC_PERM_EXPR if vector MULT_HIGHPART_EXPR isn't supported.
>    * tree-vect-patterns.c (vect_recog_divmod_pattern): Use
>    MULT_HIGHPART_EXPR instead of VEC_WIDEN_MULT_*_EXPR and shifts.
>
>    * gcc.dg/vect/pr51581-4.c: New test.
>
>--- gcc/tree-vect-stmts.c.jj 2012-06-26 11:38:28.000000000 +0200
>+++ gcc/tree-vect-stmts.c 2012-06-28 13:27:50.475158271 +0200
>@@ -3300,17 +3304,18 @@ static bool
>+  icode = optab ? (int) optab_handler (optab, vec_mode) : CODE_FOR_nothing;
>+
>+  if (icode == CODE_FOR_nothing
>+      && code == MULT_HIGHPART_EXPR
>+      && VECTOR_MODE_P (vec_mode)
>+      && BYTES_BIG_ENDIAN == WORDS_BIG_ENDIAN)
>+    {
>+      /* If MULT_HIGHPART_EXPR isn't supported by the backend, see
>+         if we can emit VEC_WIDEN_MULT_{LO,HI}_EXPR followed by VEC_PERM_EXPR
>+         or builtin_mul_widen_{even,odd} followed by VEC_PERM_EXPR.  */
>+      unsigned int prec = TYPE_PRECISION (TREE_TYPE (scalar_dest));
>+      unsigned int unsignedp = TYPE_UNSIGNED (TREE_TYPE (scalar_dest));
>+      tree wide_type
>+        = build_nonstandard_integer_type (prec * 2, unsignedp);
>+      wide_vectype
>+        = get_same_sized_vectype (wide_type, vectype);
>+
>+      sel = XALLOCAVEC (unsigned char, nunits_in);
>+      if (VECTOR_MODE_P (TYPE_MODE (wide_vectype))
>+          && GET_MODE_SIZE (TYPE_MODE (wide_vectype))
>+             == GET_MODE_SIZE (vec_mode))
>+        {
>+          if (targetm.vectorize.builtin_mul_widen_even
>+              && (decl1 = targetm.vectorize.builtin_mul_widen_even (vectype))
>+              && targetm.vectorize.builtin_mul_widen_odd
>+              && (decl2 = targetm.vectorize.builtin_mul_widen_odd (vectype))
>+              && TYPE_MODE (TREE_TYPE (TREE_TYPE (decl1)))
>+                 == TYPE_MODE (wide_vectype))
>+            {
>+              for (i = 0; i < nunits_in; i++)
>+                sel[i] = !BYTES_BIG_ENDIAN + (i & ~1)
>+                         + ((i & 1) ? nunits_in : 0);
>+              if (0 && can_vec_perm_p (vec_mode, false, sel))
>+                icode = 0;
>+            }
>+          if (icode == CODE_FOR_nothing)
>+            {
>+              decl1 = NULL_TREE;
>+              decl2 = NULL_TREE;
>+              optab = optab_for_tree_code (VEC_WIDEN_MULT_HI_EXPR,
>+                                           vectype, optab_default);
>+              optab2 = optab_for_tree_code (VEC_WIDEN_MULT_HI_EXPR,
>+                                            vectype, optab_default);

Really both HI? If so optab2 could be removed from that fn altogether.

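For illustration, a standalone sketch of why the LO/HI fallback described in the comment seems to need both halves. This is plain C modelling the vector operations on arrays, not GCC internals: widen_mult_half, vec_perm and lohi_mismatches are hypothetical helpers, NUNITS = 8 lanes of 16 bits is an arbitrary choice, a little-endian lane layout is assumed, and LO is assumed to cover lanes 0..NUNITS/2-1 with HI covering the rest.

```c
#include <assert.h>
#include <stdint.h>

#define NUNITS 8  /* Hypothetical: 8 x 16-bit lanes per vector.  */

/* Model of a widening multiply of one half of the inputs: multiplies
   lanes [first, first + NUNITS/2) and stores each 32-bit product as two
   16-bit lanes, low part first (little-endian layout assumed).  */
static void
widen_mult_half (const uint16_t *a, const uint16_t *b, int first,
                 uint16_t *out)
{
  for (int k = 0; k < NUNITS / 2; k++)
    {
      uint32_t prod = (uint32_t) a[first + k] * b[first + k];
      out[2 * k] = (uint16_t) prod;
      out[2 * k + 1] = (uint16_t) (prod >> 16);
    }
}

/* Model of a two-input permute: lane i of the result is lane sel[i]
   of the concatenation of v0 and v1.  */
static void
vec_perm (const uint16_t *v0, const uint16_t *v1,
          const unsigned char *sel, uint16_t *out)
{
  for (int i = 0; i < NUNITS; i++)
    {
      assert (sel[i] < 2 * NUNITS);
      out[i] = sel[i] < NUNITS ? v0[sel[i]] : v1[sel[i] - NUNITS];
    }
}

/* Returns the number of lanes where LO/HI plus permute disagrees with
   the scalar high part of the 16x16->32 product.  */
static int
lohi_mismatches (const uint16_t *a, const uint16_t *b)
{
  uint16_t lo[NUNITS], hi[NUNITS], res[NUNITS];
  unsigned char sel[NUNITS];
  int bad = 0;

  widen_mult_half (a, b, 0, lo);           /* Stands in for the LO product.  */
  widen_mult_half (a, b, NUNITS / 2, hi);  /* Stands in for the HI product.  */
  for (int i = 0; i < NUNITS; i++)
    sel[i] = 1 + 2 * i;  /* !BYTES_BIG_ENDIAN + 2 * i on little-endian.  */
  vec_perm (lo, hi, sel, res);
  for (int i = 0; i < NUNITS; i++)
    bad += res[i] != (uint16_t) (((uint32_t) a[i] * b[i]) >> 16);
  return bad;
}
```

In this model the selector picks the upper 16 bits of every product only when the first permute input really is the LO product. Feeding the HI product twice would give lanes 0..3 the high parts belonging to lanes 4..7, which is why I'd expect the availability check to query one LO and one HI optab.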
>+              if (optab != NULL
>+                  && optab2 != NULL
>+                  && optab_handler (optab, vec_mode) != CODE_FOR_nothing
>+                  && optab_handler (optab2, vec_mode) != CODE_FOR_nothing)
>+                {
>+                  for (i = 0; i < nunits_in; i++)
>+                    sel[i] = !BYTES_BIG_ENDIAN + 2 * i;
>+                  if (can_vec_perm_p (vec_mode, false, sel))
>+                    icode = optab_handler (optab, vec_mode);
>+                }
>+            }
>+        }
>+      if (icode == CODE_FOR_nothing)
>+        {
>+          if (optab_for_tree_code (code, vectype, optab_default) == NULL)
>+            {
>+              if (vect_print_dump_info (REPORT_DETAILS))
>+                fprintf (vect_dump, "no optab.");
>+              return false;
>+            }
>+          wide_vectype = NULL_TREE;
>+          optab2 = NULL;
>+        }
>+    }
>+
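For comparison, the even/odd selector from the first branch can be checked the same way. Again this is only a sketch in plain C under stated assumptions, not the vectorizer's code: widen_mult_parity and evenodd_mismatches are hypothetical names, NUNITS = 8 lanes of 16 bits is arbitrary, and a little-endian lane layout is assumed (so !BYTES_BIG_ENDIAN is 1).

```c
#include <assert.h>
#include <stdint.h>

#define NUNITS 8  /* Hypothetical: 8 x 16-bit lanes per vector.  */

/* Model of a widening multiply of the even (parity 0) or odd (parity 1)
   lanes: each 32-bit product is stored as two 16-bit lanes, low part
   first (little-endian layout assumed).  */
static void
widen_mult_parity (const uint16_t *a, const uint16_t *b, int parity,
                   uint16_t *out)
{
  for (int k = 0; k < NUNITS / 2; k++)
    {
      uint32_t prod = (uint32_t) a[2 * k + parity] * b[2 * k + parity];
      out[2 * k] = (uint16_t) prod;
      out[2 * k + 1] = (uint16_t) (prod >> 16);
    }
}

/* Returns the number of lanes where even/odd plus permute disagrees
   with the scalar high part of the 16x16->32 product.  */
static int
evenodd_mismatches (const uint16_t *a, const uint16_t *b)
{
  uint16_t even[NUNITS], odd[NUNITS];
  int bad = 0;

  widen_mult_parity (a, b, 0, even);
  widen_mult_parity (a, b, 1, odd);
  for (int i = 0; i < NUNITS; i++)
    {
      /* Same formula as in the patch, with !BYTES_BIG_ENDIAN == 1.  */
      int sel = 1 + (i & ~1) + ((i & 1) ? NUNITS : 0);
      assert (sel < 2 * NUNITS);
      /* Index into the concatenation of the even and odd products.  */
      uint16_t lane = sel < NUNITS ? even[sel] : odd[sel - NUNITS];
      bad += lane != (uint16_t) (((uint32_t) a[i] * b[i]) >> 16);
    }
  return bad;
}
```

Even output lanes read the high part from the even product at position i + 1, and odd output lanes read the odd product at the same offset, shifted by NUNITS into the second permute input. So in this model the even/odd path needs an interleaving permute, while the LO/HI path needs a stride-2 extraction.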