On 08 Sep 15:37, Ilya Enkovich wrote:
> 2015-09-04 23:42 GMT+03:00 Jeff Law <l...@redhat.com>:
> >
> > So do we have enough confidence in this representation that we want to go
> > ahead and commit to it?
>
> I think the new representation fits nicely for the most part. There are
> some places where I had to make exceptions for vectors of booleans to
> make it work, mostly to avoid target modifications. I'd like to avoid
> the need to change all targets currently supporting vec_cond. It makes
> me add some special handling of vec<bool> in GIMPLE, e.g. I add special
> code in vect_init_vector to build vec<bool> invariants with proper
> casting to int. Otherwise I'd need to do it on the target side.
>
> I made several fixes, and the current patch (still allowing an integer
> vector result for vector comparison and applying bool patterns) passes
> bootstrap and regression testing on x86_64. Now I'll try to fully
> switch to vec<bool> and see how it goes.
>
> Thanks,
> Ilya
>
Hi,

I made a step forward by forcing vector comparisons to have a mask
(vec<bool>) result and disabling bool patterns when the vector comparison
is supported by the target. Several issues came up:

 - The C/C++ front-ends generate vector comparisons with an integer vector
   result. I had to make some modifications to use vec_cond instead. I
   don't know if there are other front-ends producing vector comparisons.
 - Vector lowering fails to expand vector masks due to a mismatch between
   type and mode sizes. I fixed vector type size computation to match the
   mode size and added special handling for mask expansion.
 - I disabled canonical type creation for vector masks because we can't
   lay one out with VOID mode. I don't know why we may need a canonical
   type here, but the get_mask_mode call may be moved into type layout to
   get it.
 - Expansion of vec<bool> constants/constructors requires special handling.
   The common case should require target hooks/optabs to expand a vector
   into the required mode, but I suppose we want generic code to handle
   the integer vector mode case to avoid modifying multiple targets which
   use the default vec<bool> modes.

Currently 'make check' shows two types of regression:

 - Missed vector expression pattern recognition (MIN, MAX, ABS, VEC_COND).
   This must be due to my front-end changes. I hope it will be easy to fix.
 - Missed vectorization. All of these appear due to bool patterns being
   disabled. I didn't look into all of them, but it seems the main problem
   is mixed type sizes. With bool patterns and integer vector masks we
   just insert an int->(other sized int) conversion for masks, which gives
   us the required mask transformation. With a boolean mask we don't have
   proper scalar statements to do that. I think mask widening/narrowing
   may be directly supported in masked statement vectorization. I'm going
   to look into it.

I attach what I currently have as a prototype. It has grown bigger, so I
split it into several parts.
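To make the two representations concrete: with the current scheme a vector comparison like a < b produces an integer vector of 0/-1 lanes, while the proposed scheme produces a vector of booleans, and a VEC_COND_EXPR materializes 0/-1 only when an integer vector is actually wanted. A scalar C emulation of the two semantics (illustrative only, not GCC internals; function names are made up):

```c
#include <assert.h>

/* Current representation: a vector comparison yields an integer vector
   with all-ones (-1) in true lanes and 0 in false lanes.  */
static void
cmp_as_int_vector (const int *a, const int *b, int *res, int n)
{
  for (int i = 0; i < n; i++)
    res[i] = a[i] < b[i] ? -1 : 0;
}

/* Proposed representation: the comparison produces a vector of booleans
   (a mask); a VEC_COND_EXPR selecting between -1 and 0 recovers the old
   integer-vector form when it is really needed.  */
static void
cmp_as_mask (const int *a, const int *b, _Bool *mask, int n)
{
  for (int i = 0; i < n; i++)
    mask[i] = a[i] < b[i];
}
```

The two forms carry the same information per lane; the mask form just decouples the predicate from any particular integer width.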
Thanks,
Ilya

--

* avx512-vec-bool-01-add-truth-vector.ChangeLog

2015-09-15  Ilya Enkovich  <enkovich....@gmail.com>

	* doc/tm.texi: Regenerated.
	* doc/tm.texi.in (TARGET_VECTORIZE_GET_MASK_MODE): New.
	* stor-layout.c (layout_type): Use mode to get vector mask size.
	(vector_type_mode): Likewise.
	* target.def (get_mask_mode): New.
	* targhooks.c (default_vector_alignment): Use mode alignment
	for vector masks.
	(default_get_mask_mode): New.
	* targhooks.h (default_get_mask_mode): New.
	* tree.c (make_vector_type): Vector mask has no canonical type.
	(build_truth_vector_type): New.
	(build_same_sized_truth_vector_type): New.
	(truth_type_for): Support vector masks.
	* tree.h (VECTOR_MASK_TYPE_P): New.
	(build_truth_vector_type): New.
	(build_same_sized_truth_vector_type): New.

* avx512-vec-bool-02-no-int-vec-cmp.ChangeLog

gcc/

2015-09-15  Ilya Enkovich  <enkovich....@gmail.com>

	* tree-cfg.c (verify_gimple_comparison): Require vector mask
	type for vector comparison.
	(verify_gimple_assign_ternary): Likewise.

gcc/c

2015-09-15  Ilya Enkovich  <enkovich....@gmail.com>

	* c-typeck.c (build_conditional_expr): Use vector mask
	type for vector comparison.
	(build_vec_cmp): New.
	(build_binary_op): Use build_vec_cmp for comparison.

gcc/cp

2015-09-15  Ilya Enkovich  <enkovich....@gmail.com>

	* call.c (build_conditional_expr_1): Use vector mask type
	for vector comparison.
	* typeck.c (build_vec_cmp): New.
	(cp_build_binary_op): Use build_vec_cmp for comparison.

* avx512-vec-bool-03-vec-lower.ChangeLog

2015-09-15  Ilya Enkovich  <enkovich....@gmail.com>

	* tree-vect-generic.c (tree_vec_extract): Use additional
	comparison when extracting boolean value.
	(do_bool_compare): New.
	(expand_vector_comparison): Add casts for vector mask.
	(expand_vector_divmod): Use vector mask type for vector
	comparison.
	(expand_vector_operations_1): Skip scalar mode mask statements.
* avx512-vec-bool-04-vectorize.ChangeLog

gcc/

2015-09-15  Ilya Enkovich  <enkovich....@gmail.com>

	* expr.c (do_store_flag): Use expand_vec_cmp_expr for mask results.
	(const_vector_mask_from_tree): New.
	(const_vector_from_tree): Use const_vector_mask_from_tree
	for vector masks.
	* internal-fn.c (expand_MASK_LOAD): Adjust to optab changes.
	(expand_MASK_STORE): Likewise.
	* optabs.c (vector_compare_rtx): Add OPNO arg.
	(expand_vec_cond_expr): Adjust to vector_compare_rtx change.
	(get_vec_cmp_icode): New.
	(expand_vec_cmp_expr_p): New.
	(expand_vec_cmp_expr): New.
	(can_vec_mask_load_store_p): Add MASK_MODE arg.
	* optabs.def (vec_cmp_optab): New.
	(vec_cmpu_optab): New.
	(maskload_optab): Transform into convert optab.
	(maskstore_optab): Likewise.
	* optabs.h (expand_vec_cmp_expr_p): New.
	(expand_vec_cmp_expr): New.
	(can_vec_mask_load_store_p): Add MASK_MODE arg.
	* tree-if-conv.c (ifcvt_can_use_mask_load_store): Adjust to
	can_vec_mask_load_store_p signature change.
	(predicate_mem_writes): Use boolean mask.
	* tree-vect-data-refs.c (vect_get_new_vect_var): Support
	vect_mask_var.
	(vect_create_destination_var): Likewise.
	* tree-vect-loop.c (vect_determine_vectorization_factor): Ignore
	mask operations for VF.  Add mask type computation.
	* tree-vect-stmts.c (vect_init_vector): Support mask invariants.
	(vect_get_vec_def_for_operand): Support mask constant.
	(vectorizable_mask_load_store): Adjust to
	can_vec_mask_load_store_p signature change.
	(vectorizable_condition): Use vector mask type for vector
	comparison.
	(vectorizable_comparison): New.
	(vect_analyze_stmt): Add vectorizable_comparison.
	(vect_transform_stmt): Likewise.
	(get_mask_type_for_scalar_type): New.
	* tree-vectorizer.h (enum vect_var_kind): Add vect_mask_var.
	(enum stmt_vec_info_type): Add comparison_vec_info_type.
	(get_mask_type_for_scalar_type): New.
* avx512-vec-bool-05-bool-patterns.ChangeLog

2015-09-15  Ilya Enkovich  <enkovich....@gmail.com>

	* tree-vect-patterns.c (check_bool_pattern): Check fails
	if we can vectorize comparison directly.
	(search_type_for_mask): New.
	(vect_recog_bool_pattern): Support cases when bool pattern
	check fails.

* avx512-vec-bool-06-i386.ChangeLog

2015-09-15  Ilya Enkovich  <enkovich....@gmail.com>

	* config/i386/i386-protos.h (ix86_expand_mask_vec_cmp): New.
	(ix86_expand_int_vec_cmp): New.
	(ix86_expand_fp_vec_cmp): New.
	* config/i386/i386.c (ix86_expand_sse_cmp): Allow NULL for
	op_true and op_false.
	(ix86_int_cmp_code_to_pcmp_immediate): New.
	(ix86_fp_cmp_code_to_pcmp_immediate): New.
	(ix86_cmp_code_to_pcmp_immediate): New.
	(ix86_expand_mask_vec_cmp): New.
	(ix86_expand_fp_vec_cmp): New.
	(ix86_expand_int_sse_cmp): New.
	(ix86_expand_int_vcond): Use ix86_expand_int_sse_cmp.
	(ix86_expand_int_vec_cmp): New.
	(ix86_get_mask_mode): New.
	(TARGET_VECTORIZE_GET_MASK_MODE): New.
	* config/i386/sse.md (avx512fmaskmodelower): New.
	(vec_cmp<mode><avx512fmaskmodelower>): New.
	(vec_cmp<mode><sseintvecmodelower>): New.
	(vec_cmpv2div2di): New.
	(vec_cmpu<mode><avx512fmaskmodelower>): New.
	(vec_cmpu<mode><sseintvecmodelower>): New.
	(vec_cmpuv2div2di): New.
	(maskload<mode>): Rename to ...
	(maskload<mode><sseintvecmodelower>): ... this.
	(maskstore<mode>): Rename to ...
	(maskstore<mode><sseintvecmodelower>): ... this.
	(maskload<mode><avx512fmaskmodelower>): New.
	(maskstore<mode><avx512fmaskmodelower>): New.
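For intuition on why a separate mask mode is worth a target hook: under the default scheme a mask for a V4SI vector is itself a 128-bit integer vector of 0/-1 lanes, while AVX-512 can represent the same predicate in a k-register with one bit per lane. A standalone C model of the two sizing schemes (illustrative only; real targets answer this through TARGET_VECTORIZE_GET_MASK_MODE):

```c
#include <assert.h>

/* Default scheme: the mask is an integer vector of the same size as
   the data vector (one 0/-1 integer per lane).  */
static unsigned
mask_bytes_default (unsigned nunits, unsigned vector_size)
{
  (void) nunits;
  return vector_size;
}

/* AVX-512-style scheme: one bit per lane, with k-registers being at
   least 8 bits wide.  */
static unsigned
mask_bytes_avx512 (unsigned nunits, unsigned vector_size)
{
  (void) vector_size;
  unsigned bits = nunits < 8 ? 8 : nunits;
  return bits / 8;
}
```

So a predicate over a 64-byte V16SI vector costs 64 bytes in the default scheme but only 2 bytes as an AVX-512 mask, which is why the generic code cannot hard-wire either layout.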
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index f5a1f84..acdfcd5 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -5688,6 +5688,11 @@ mode returned by @code{TARGET_VECTORIZE_PREFERRED_SIMD_MODE}.
 The default is zero which means to not iterate over other vector sizes.
 @end deftypefn
 
+@deftypefn {Target Hook} machine_mode TARGET_VECTORIZE_GET_MASK_MODE (unsigned @var{nunits}, unsigned @var{length})
+This hook returns the mode to be used for a mask for a vector
+of the specified @var{length} with @var{nunits} elements.
+@end deftypefn
+
 @deftypefn {Target Hook} {void *} TARGET_VECTORIZE_INIT_COST (struct loop *@var{loop_info})
 This hook should initialize target-specific data structures in preparation for modeling the costs of vectorizing a loop or basic block.  The default allocates three unsigned integers for accumulating costs for the prologue, body, and epilogue of the loop or basic block.  If @var{loop_info} is non-NULL, it identifies the loop being vectorized; otherwise a single block is being vectorized.
 @end deftypefn
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 9d5ac0a..52e912a 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4225,6 +4225,8 @@ address;  but often a machine-dependent strategy can generate better code.
 @hook TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES
 
+@hook TARGET_VECTORIZE_GET_MASK_MODE
+
 @hook TARGET_VECTORIZE_INIT_COST
 
 @hook TARGET_VECTORIZE_ADD_STMT_COST
diff --git a/gcc/stor-layout.c b/gcc/stor-layout.c
index 938e54b..f24a0c4 100644
--- a/gcc/stor-layout.c
+++ b/gcc/stor-layout.c
@@ -2184,11 +2184,22 @@ layout_type (tree type)
 	TYPE_SATURATING (type) = TYPE_SATURATING (TREE_TYPE (type));
 	TYPE_UNSIGNED (type) = TYPE_UNSIGNED (TREE_TYPE (type));
 
-	TYPE_SIZE_UNIT (type) = int_const_binop (MULT_EXPR,
-						 TYPE_SIZE_UNIT (innertype),
-						 size_int (nunits));
-	TYPE_SIZE (type) = int_const_binop (MULT_EXPR, TYPE_SIZE (innertype),
-					    bitsize_int (nunits));
+	if (VECTOR_MASK_TYPE_P (type))
+	  {
+	    TYPE_SIZE_UNIT (type)
+	      = size_int (GET_MODE_SIZE (type->type_common.mode));
+	    TYPE_SIZE (type)
+	      = bitsize_int (GET_MODE_BITSIZE (type->type_common.mode));
+	  }
+	else
+	  {
+	    TYPE_SIZE_UNIT (type) = int_const_binop (MULT_EXPR,
+						     TYPE_SIZE_UNIT (innertype),
+						     size_int (nunits));
+	    TYPE_SIZE (type) = int_const_binop (MULT_EXPR,
+						TYPE_SIZE (innertype),
+						bitsize_int (nunits));
+	  }
 
 	/* For vector types, we do not default to the mode's alignment.
 	   Instead, query a target hook, defaulting to natural alignment.
@@ -2455,7 +2466,14 @@ vector_type_mode (const_tree t)
       machine_mode innermode = TREE_TYPE (t)->type_common.mode;
 
       /* For integers, try mapping it to a same-sized scalar mode.
	 */
-      if (GET_MODE_CLASS (innermode) == MODE_INT)
+      if (VECTOR_MASK_TYPE_P (t))
+	{
+	  mode = mode_for_size (GET_MODE_BITSIZE (mode), MODE_INT, 0);
+
+	  if (mode != VOIDmode && have_regs_of_mode[mode])
+	    return mode;
+	}
+      else if (GET_MODE_CLASS (innermode) == MODE_INT)
 	{
 	  mode = mode_for_size (TYPE_VECTOR_SUBPARTS (t)
 				* GET_MODE_BITSIZE (innermode), MODE_INT, 0);
diff --git a/gcc/target.def b/gcc/target.def
index 4edc209..c5b8ed9 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1789,6 +1789,15 @@ The default is zero which means to not iterate over other vector sizes.",
  (void),
  default_autovectorize_vector_sizes)
 
+/* Function to get a target mode for a vector mask.  */
+DEFHOOK
+(get_mask_mode,
+ "This hook returns the mode to be used for a mask for a vector\n\
+of the specified @var{length} with @var{nunits} elements.",
+ machine_mode,
+ (unsigned nunits, unsigned length),
+ default_get_mask_mode)
+
 /* Target builtin that implements vector gather operation.  */
 DEFHOOK
 (builtin_gather,
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 7238c8f..ac01d57 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -1087,6 +1087,20 @@ default_autovectorize_vector_sizes (void)
   return 0;
 }
 
+/* By default a vector of integers is used as a mask.  */
+
+machine_mode
+default_get_mask_mode (unsigned nunits, unsigned vector_size)
+{
+  unsigned elem_size = vector_size / nunits;
+  machine_mode elem_mode
+    = smallest_mode_for_size (elem_size * BITS_PER_UNIT, MODE_INT);
+
+  gcc_assert (elem_size * nunits == vector_size);
+
+  return mode_for_vector (elem_mode, nunits);
+}
+
 /* By default, the cost model accumulates three separate costs (prologue,
    loop body, and epilogue) for a vectorized loop or block.  So allocate an
   array of three unsigned ints, set it to zero, and return its address.
 */
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 5ae991d..cc7263f 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -100,6 +100,7 @@ default_builtin_support_vector_misalignment (machine_mode mode,
 							     int, bool);
 extern machine_mode default_preferred_simd_mode (machine_mode mode);
 extern unsigned int default_autovectorize_vector_sizes (void);
+extern machine_mode default_get_mask_mode (unsigned, unsigned);
 extern void *default_init_cost (struct loop *);
 extern unsigned default_add_stmt_cost (void *, int, enum vect_cost_for_stmt,
 				       struct _stmt_vec_info *, int,
diff --git a/gcc/tree.c b/gcc/tree.c
index af3a6a3..946d2ad 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -9742,8 +9742,9 @@ make_vector_type (tree innertype, int nunits, machine_mode mode)
 
   if (TYPE_STRUCTURAL_EQUALITY_P (innertype))
     SET_TYPE_STRUCTURAL_EQUALITY (t);
-  else if (TYPE_CANONICAL (innertype) != innertype
-	   || mode != VOIDmode)
+  else if ((TYPE_CANONICAL (innertype) != innertype
+	    || mode != VOIDmode)
+	   && !VECTOR_MASK_TYPE_P (t))
     TYPE_CANONICAL (t)
       = make_vector_type (TYPE_CANONICAL (innertype), nunits, VOIDmode);
 
@@ -10568,6 +10569,36 @@ build_vector_type (tree innertype, int nunits)
   return make_vector_type (innertype, nunits, VOIDmode);
 }
 
+/* Build truth vector with specified length and number of units.  */
+
+tree
+build_truth_vector_type (unsigned nunits, unsigned vector_size)
+{
+  machine_mode mask_mode = targetm.vectorize.get_mask_mode (nunits,
+							    vector_size);
+
+  if (mask_mode == VOIDmode)
+    return NULL;
+
+  return make_vector_type (boolean_type_node, nunits, mask_mode);
+}
+
+/* Returns a vector type corresponding to a comparison of VECTYPE.
   */
+
+tree
+build_same_sized_truth_vector_type (tree vectype)
+{
+  if (VECTOR_MASK_TYPE_P (vectype))
+    return vectype;
+
+  unsigned HOST_WIDE_INT size = GET_MODE_SIZE (TYPE_MODE (vectype));
+
+  if (!size)
+    size = tree_to_uhwi (TYPE_SIZE_UNIT (vectype));
+
+  return build_truth_vector_type (TYPE_VECTOR_SUBPARTS (vectype), size);
+}
+
 /* Similarly, but builds a variant type with TYPE_VECTOR_OPAQUE set.  */
 
 tree
@@ -11054,9 +11085,10 @@ truth_type_for (tree type)
 {
   if (TREE_CODE (type) == VECTOR_TYPE)
     {
-      tree elem = lang_hooks.types.type_for_size
-        (GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (type))), 0);
-      return build_opaque_vector_type (elem, TYPE_VECTOR_SUBPARTS (type));
+      if (VECTOR_MASK_TYPE_P (type))
+	return type;
+      return build_truth_vector_type (TYPE_VECTOR_SUBPARTS (type),
+				      GET_MODE_SIZE (TYPE_MODE (type)));
     }
   else
     return boolean_type_node;
diff --git a/gcc/tree.h b/gcc/tree.h
index 2cd6ec4..09fb26d 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -469,6 +469,12 @@ extern void omp_clause_range_check_failed (const_tree, const char *, int,
 
 #define VECTOR_TYPE_P(TYPE) (TREE_CODE (TYPE) == VECTOR_TYPE)
 
+/* Nonzero if TYPE represents a vector of booleans.  */
+
+#define VECTOR_MASK_TYPE_P(TYPE)			\
+  (TREE_CODE (TYPE) == VECTOR_TYPE			\
+   && TREE_CODE (TREE_TYPE (TYPE)) == BOOLEAN_TYPE)
+
 /* Nonzero if TYPE represents an integral type.  Note that we do not
    include COMPLEX types here.  Keep these checks in ascending code
    order.  */
@@ -3820,6 +3826,8 @@ extern tree build_reference_type_for_mode (tree, machine_mode, bool);
 extern tree build_reference_type (tree);
 extern tree build_vector_type_for_mode (tree, machine_mode);
 extern tree build_vector_type (tree innertype, int nunits);
+extern tree build_truth_vector_type (unsigned, unsigned);
+extern tree build_same_sized_truth_vector_type (tree vectype);
 extern tree build_opaque_vector_type (tree innertype, int nunits);
 extern tree build_index_type (tree);
 extern tree build_array_type (tree, tree);
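The relationship between the two new builders can be sketched in plain C, with a small struct standing in for the type node (a model of the tree-level logic above; names and the default mask sizing are illustrative assumptions, not GCC API):

```c
#include <assert.h>

/* Minimal stand-in for a vector type: lane count, total size in
   bytes, and a flag marking a boolean element type (a mask).  */
struct vec_type
{
  unsigned nunits;
  unsigned size;
  int is_bool;
};

/* Model of build_truth_vector_type: same lane count as the data
   vector; the mask size would come from the get_mask_mode hook (here
   the default scheme, where the mask is data-vector sized).  */
static struct vec_type
model_build_truth_vector_type (unsigned nunits, unsigned vector_size)
{
  struct vec_type t = { nunits, vector_size, 1 };
  return t;
}

/* Model of build_same_sized_truth_vector_type: a mask type is
   returned unchanged; anything else gets a truth vector with the same
   lane count and size.  */
static struct vec_type
model_build_same_sized_truth_vector_type (struct vec_type v)
{
  if (v.is_bool)
    return v;
  return model_build_truth_vector_type (v.nunits, v.size);
}
```

The early-return for masks mirrors the real function and makes the builder idempotent, which is what lets truth_type_for simply hand back a mask type unchanged.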
diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index e8c8189..6ea4f19 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -4753,6 +4753,18 @@ build_conditional_expr (location_t colon_loc, tree ifexp, bool ifexp_bcp,
 			&& TREE_CODE (orig_op2) == INTEGER_CST
 			&& !TREE_OVERFLOW (orig_op2)));
     }
+
+  /* Need to convert condition operand into a vector mask.  */
+  if (VECTOR_TYPE_P (TREE_TYPE (ifexp)))
+    {
+      tree vectype = TREE_TYPE (ifexp);
+      tree elem_type = TREE_TYPE (vectype);
+      tree zero = build_int_cst (elem_type, 0);
+      tree zero_vec = build_vector_from_val (vectype, zero);
+      tree cmp_type = build_same_sized_truth_vector_type (vectype);
+      ifexp = build2 (NE_EXPR, cmp_type, ifexp, zero_vec);
+    }
+
   if (int_const || (ifexp_bcp && TREE_CODE (ifexp) == INTEGER_CST))
     ret = fold_build3_loc (colon_loc, COND_EXPR, result_type, ifexp, op1, op2);
   else
@@ -10195,6 +10207,19 @@ push_cleanup (tree decl, tree cleanup, bool eh_only)
   STATEMENT_LIST_STMT_EXPR (list) = stmt_expr;
 }
 
+/* Build a vector comparison using VEC_COND_EXPR.  */
+
+static tree
+build_vec_cmp (tree_code code, tree type,
+	       tree arg0, tree arg1)
+{
+  tree zero_vec = build_zero_cst (type);
+  tree minus_one_vec = build_minus_one_cst (type);
+  tree cmp_type = build_same_sized_truth_vector_type (type);
+  tree cmp = build2 (code, cmp_type, arg0, arg1);
+  return build3 (VEC_COND_EXPR, type, cmp, minus_one_vec, zero_vec);
+}
+
 /* Build a binary-operation expression without default conversions.
    CODE is the kind of expression to build.
    LOCATION is the operator's location.
@@ -10753,7 +10778,8 @@ build_binary_op (location_t location, enum tree_code code,
 	  result_type = build_opaque_vector_type (intt,
 						  TYPE_VECTOR_SUBPARTS (type0));
 	  converted = 1;
-	  break;
+	  ret = build_vec_cmp (resultcode, result_type, op0, op1);
+	  goto return_build_binary_op;
 	}
       if (FLOAT_TYPE_P (type0) || FLOAT_TYPE_P (type1))
 	warning_at (location,
@@ -10895,7 +10921,8 @@ build_binary_op (location_t location, enum tree_code code,
 	  result_type = build_opaque_vector_type (intt,
 						  TYPE_VECTOR_SUBPARTS (type0));
 	  converted = 1;
-	  break;
+	  ret = build_vec_cmp (resultcode, result_type, op0, op1);
+	  goto return_build_binary_op;
 	}
       build_type = integer_type_node;
       if ((code0 == INTEGER_TYPE || code0 == REAL_TYPE
diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 8d4a9e2..7f16e84 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -4727,8 +4727,10 @@ build_conditional_expr_1 (location_t loc, tree arg1, tree arg2, tree arg3,
 	}
 
       if (!COMPARISON_CLASS_P (arg1))
-	arg1 = cp_build_binary_op (loc, NE_EXPR, arg1,
-				   build_zero_cst (arg1_type), complain);
+	{
+	  tree cmp_type = build_same_sized_truth_vector_type (arg1_type);
+	  arg1 = build2 (NE_EXPR, cmp_type, arg1, build_zero_cst (arg1_type));
+	}
       return fold_build3 (VEC_COND_EXPR, arg2_type, arg1, arg2, arg3);
     }
 
diff --git a/gcc/cp/typeck.c b/gcc/cp/typeck.c
index 83fd34c..89bacc2 100644
--- a/gcc/cp/typeck.c
+++ b/gcc/cp/typeck.c
@@ -3898,6 +3898,18 @@ build_binary_op (location_t location, enum tree_code code, tree op0, tree op1,
   return cp_build_binary_op (location, code, op0, op1, tf_warning_or_error);
 }
 
+/* Build a vector comparison using VEC_COND_EXPR.
+   */
+
+static tree
+build_vec_cmp (tree_code code, tree type,
+	       tree arg0, tree arg1)
+{
+  tree zero_vec = build_zero_cst (type);
+  tree minus_one_vec = build_minus_one_cst (type);
+  tree cmp_type = build_same_sized_truth_vector_type (type);
+  tree cmp = build2 (code, cmp_type, arg0, arg1);
+  return build3 (VEC_COND_EXPR, type, cmp, minus_one_vec, zero_vec);
+}
+
 /* Build a binary-operation expression without default conversions.
    CODE is the kind of expression to build.
@@ -4757,7 +4769,7 @@ cp_build_binary_op (location_t location,
 	  result_type = build_opaque_vector_type (intt,
 						  TYPE_VECTOR_SUBPARTS (type0));
 	  converted = 1;
-	  break;
+	  return build_vec_cmp (resultcode, result_type, op0, op1);
 	}
       build_type = boolean_type_node;
       if ((code0 == INTEGER_TYPE || code0 == REAL_TYPE
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 5ac73b3..2ce5a84 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -3464,10 +3464,10 @@ verify_gimple_comparison (tree type, tree op0, tree op1)
           return true;
         }
     }
-  /* Or an integer vector type with the same size and element count
+  /* Or a boolean vector type with the same element count
      as the comparison operand types.  */
   else if (TREE_CODE (type) == VECTOR_TYPE
-	   && TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE)
+	   && TREE_CODE (TREE_TYPE (type)) == BOOLEAN_TYPE)
     {
       if (TREE_CODE (op0_type) != VECTOR_TYPE
 	  || TREE_CODE (op1_type) != VECTOR_TYPE)
@@ -3478,12 +3478,7 @@ verify_gimple_comparison (tree type, tree op0, tree op1)
           return true;
         }
 
-      if (TYPE_VECTOR_SUBPARTS (type) != TYPE_VECTOR_SUBPARTS (op0_type)
-	  || (GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (type)))
-	      != GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (op0_type))))
-	  /* The result of a vector comparison is of signed
-	     integral type.
 */
-	  || TYPE_UNSIGNED (TREE_TYPE (type)))
+      if (TYPE_VECTOR_SUBPARTS (type) != TYPE_VECTOR_SUBPARTS (op0_type))
         {
           error ("invalid vector comparison resulting type");
           debug_generic_expr (type);
@@ -3970,15 +3965,13 @@ verify_gimple_assign_ternary (gassign *stmt)
       break;
 
     case VEC_COND_EXPR:
-      if (!VECTOR_INTEGER_TYPE_P (rhs1_type)
-	  || TYPE_SIGN (rhs1_type) != SIGNED
-	  || TYPE_SIZE (rhs1_type) != TYPE_SIZE (lhs_type)
+      if (!VECTOR_MASK_TYPE_P (rhs1_type)
 	  || TYPE_VECTOR_SUBPARTS (rhs1_type)
 	     != TYPE_VECTOR_SUBPARTS (lhs_type))
 	{
-	  error ("the first argument of a VEC_COND_EXPR must be of a signed "
-		 "integral vector type of the same size and number of "
-		 "elements as the result");
+	  error ("the first argument of a VEC_COND_EXPR must be of a "
+		 "boolean vector type of the same number of elements "
+		 "as the result");
 	  debug_generic_expr (lhs_type);
 	  debug_generic_expr (rhs1_type);
 	  return true;
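The net effect of the verifier change: the old requirement (signed integer vector of the same size and lane count) becomes a lane-count-only requirement on a boolean vector, so a narrow mask may now drive a wide select. A scalar C model of the two checks (helper names are made up for illustration):

```c
#include <assert.h>

/* Minimal type descriptor for the model: lane count, total bit size,
   and element-kind flags.  */
struct vtype
{
  unsigned nunits;
  unsigned bits;
  int is_bool;
  int is_signed_int;
};

/* Old rule: the VEC_COND_EXPR condition must be a signed integer
   vector of the same size and lane count as the result.  */
static int
old_vec_cond_ok (struct vtype cond, struct vtype res)
{
  return cond.is_signed_int
	 && cond.bits == res.bits
	 && cond.nunits == res.nunits;
}

/* New rule: the condition must be a boolean vector with the same lane
   count; its size may differ (e.g. a 16-bit AVX-512 mask selecting
   into a 512-bit vector).  */
static int
new_vec_cond_ok (struct vtype cond, struct vtype res)
{
  return cond.is_bool && cond.nunits == res.nunits;
}
```

Dropping the size constraint is exactly what makes one-bit-per-lane masks representable in GIMPLE without fake widening statements.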
diff --git a/gcc/tree-vect-generic.c b/gcc/tree-vect-generic.c
index be3d27f..a89b08c 100644
--- a/gcc/tree-vect-generic.c
+++ b/gcc/tree-vect-generic.c
@@ -122,7 +122,19 @@ tree_vec_extract (gimple_stmt_iterator *gsi, tree type,
 		  tree t, tree bitsize, tree bitpos)
 {
   if (bitpos)
-    return gimplify_build3 (gsi, BIT_FIELD_REF, type, t, bitsize, bitpos);
+    {
+      if (TREE_CODE (type) == BOOLEAN_TYPE)
+	{
+	  tree itype
+	    = build_nonstandard_integer_type (tree_to_uhwi (bitsize), 0);
+	  tree field = gimplify_build3 (gsi, BIT_FIELD_REF, itype, t,
+					bitsize, bitpos);
+	  return gimplify_build2 (gsi, NE_EXPR, type, field,
+				  build_zero_cst (itype));
+	}
+      else
+	return gimplify_build3 (gsi, BIT_FIELD_REF, type, t, bitsize, bitpos);
+    }
   else
     return gimplify_build1 (gsi, VIEW_CONVERT_EXPR, type, t);
 }
@@ -171,6 +183,21 @@ do_compare (gimple_stmt_iterator *gsi, tree inner_type, tree a, tree b,
 			  build_int_cst (comp_type, 0));
 }
 
+/* Construct expression (A[BITPOS] code B[BITPOS]).
+
+   INNER_TYPE is the type of A and B elements.
+
+   The returned expression is of boolean type.  */
+static tree
+do_bool_compare (gimple_stmt_iterator *gsi, tree inner_type, tree a, tree b,
+		 tree bitpos, tree bitsize, enum tree_code code)
+{
+  a = tree_vec_extract (gsi, inner_type, a, bitsize, bitpos);
+  b = tree_vec_extract (gsi, inner_type, b, bitsize, bitpos);
+
+  return gimplify_build2 (gsi, code, boolean_type_node, a, b);
+}
+
 /* Expand vector addition to scalars.  This does bit twiddling
    in order to increase parallelism:
 
@@ -350,9 +377,31 @@ expand_vector_comparison (gimple_stmt_iterator *gsi, tree type, tree op0,
                           tree op1, enum tree_code code)
 {
   tree t;
-  if (!
expand_vec_cond_expr_p (type, TREE_TYPE (op0)))
-    t = expand_vector_piecewise (gsi, do_compare, type,
-				 TREE_TYPE (TREE_TYPE (op0)), op0, op1, code);
+  if (!expand_vec_cmp_expr_p (TREE_TYPE (op0), type)
+      && !expand_vec_cond_expr_p (type, TREE_TYPE (op0)))
+    {
+      if (VECTOR_MODE_P (TYPE_MODE (type)))
+	{
+	  tree inner_type = TREE_TYPE (TREE_TYPE (op0));
+	  tree elem_type = build_nonstandard_integer_type
+	    (GET_MODE_BITSIZE (TYPE_MODE (inner_type)), 0);
+	  tree int_vec_type = build_vector_type (elem_type,
+						 TYPE_VECTOR_SUBPARTS (type));
+	  tree vec = expand_vector_piecewise (gsi, do_compare, int_vec_type,
+					      TREE_TYPE (TREE_TYPE (op0)),
+					      op0, op1, code);
+
+	  return gimplify_build1 (gsi, VIEW_CONVERT_EXPR, type, vec);
+	}
+      else
+	t = expand_vector_piecewise (gsi, do_bool_compare, type,
+				     TREE_TYPE (TREE_TYPE (op0)),
+				     op0, op1, code);
+    }
   else
     t = NULL_TREE;
 
@@ -625,11 +674,12 @@ expand_vector_divmod (gimple_stmt_iterator *gsi, tree type, tree op0,
   if (addend == NULL_TREE
       && expand_vec_cond_expr_p (type, type))
     {
-      tree zero, cst, cond;
+      tree zero, cst, cond, mask_type;
       gimple stmt;
 
+      mask_type = build_same_sized_truth_vector_type (type);
       zero = build_zero_cst (type);
-      cond = build2 (LT_EXPR, type, op0, zero);
+      cond = build2 (LT_EXPR, mask_type, op0, zero);
       for (i = 0; i < nunits; i++)
 	vec[i] = build_int_cst (TREE_TYPE (type),
 				((unsigned HOST_WIDE_INT) 1
@@ -1506,6 +1556,12 @@ expand_vector_operations_1 (gimple_stmt_iterator *gsi)
   if (TREE_CODE (type) != VECTOR_TYPE)
     return;
 
+  /* A scalar operation pretending to be a vector one.  */
+  if (VECTOR_MASK_TYPE_P (type)
+      && !VECTOR_MODE_P (TYPE_MODE (type))
+      && TYPE_MODE (type) != BLKmode)
+    return;
+
   if (CONVERT_EXPR_CODE_P (code)
       || code == FLOAT_EXPR
       || code == FIX_TRUNC_EXPR
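When a mask lives in an integer vector mode, tree_vec_extract above cannot hand back a lane as a boolean directly: it reads the lane as an integer (the BIT_FIELD_REF) and then compares it against zero (the NE_EXPR). A scalar equivalent of that extraction path (illustrative model, not GCC code):

```c
#include <assert.h>
#include <stdint.h>

/* Model of the boolean extraction path: the mask is stored as an
   array of 0/-1 integers (the default integer-vector mask layout),
   and a lane is turned back into a boolean with an explicit != 0
   test.  */
static _Bool
extract_mask_lane (const int32_t *mask_vec, unsigned lane)
{
  int32_t field = mask_vec[lane];   /* the BIT_FIELD_REF step */
  return field != 0;                /* the NE_EXPR step */
}
```

The extra comparison is what keeps the boolean type abstract: callers never see the underlying lane width.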
diff --git a/gcc/expr.c b/gcc/expr.c
index 1e820b4..6ae0c4d 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -11000,9 +11000,15 @@ do_store_flag (sepops ops, rtx target, machine_mode mode)
   if (TREE_CODE (ops->type) == VECTOR_TYPE)
     {
       tree ifexp = build2 (ops->code, ops->type, arg0, arg1);
-      tree if_true = constant_boolean_node (true, ops->type);
-      tree if_false = constant_boolean_node (false, ops->type);
-      return expand_vec_cond_expr (ops->type, ifexp, if_true, if_false, target);
+      if (VECTOR_MASK_TYPE_P (ops->type))
+	return expand_vec_cmp_expr (ops->type, ifexp, target);
+      else
+	{
+	  tree if_true = constant_boolean_node (true, ops->type);
+	  tree if_false = constant_boolean_node (false, ops->type);
+	  return expand_vec_cond_expr (ops->type, ifexp, if_true,
+				       if_false, target);
+	}
     }
 
   /* Get the rtx comparison code to use.  We know that EXP is a comparison
@@ -11289,6 +11295,39 @@ try_tablejump (tree index_type, tree index_expr, tree minval, tree range,
   return 1;
 }
 
+/* Return a CONST_VECTOR rtx representing a vector mask for
+   a VECTOR_CST of booleans.  */
+static rtx
+const_vector_mask_from_tree (tree exp)
+{
+  rtvec v;
+  unsigned i;
+  int units;
+  tree elt;
+  machine_mode inner, mode;
+
+  mode = TYPE_MODE (TREE_TYPE (exp));
+  units = GET_MODE_NUNITS (mode);
+  inner = GET_MODE_INNER (mode);
+
+  v = rtvec_alloc (units);
+
+  for (i = 0; i < VECTOR_CST_NELTS (exp); ++i)
+    {
+      elt = VECTOR_CST_ELT (exp, i);
+
+      gcc_assert (TREE_CODE (elt) == INTEGER_CST);
+      if (integer_zerop (elt))
+	RTVEC_ELT (v, i) = CONST0_RTX (inner);
+      else if (integer_onep (elt))
+	RTVEC_ELT (v, i) = CONSTM1_RTX (inner);
+      else
+	gcc_unreachable ();
+    }
+
+  return gen_rtx_CONST_VECTOR (mode, v);
+}
+
+/* Return a CONST_VECTOR rtx for a VECTOR_CST tree.
 */
 
 static rtx
 const_vector_from_tree (tree exp)
@@ -11304,6 +11343,9 @@ const_vector_from_tree (tree exp)
   if (initializer_zerop (exp))
     return CONST0_RTX (mode);
 
+  if (VECTOR_MASK_TYPE_P (TREE_TYPE (exp)))
+    return const_vector_mask_from_tree (exp);
+
   units = GET_MODE_NUNITS (mode);
   inner = GET_MODE_INNER (mode);
 
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index e785946..4ca0a40 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -1885,7 +1885,9 @@ expand_MASK_LOAD (gcall *stmt)
   create_output_operand (&ops[0], target, TYPE_MODE (type));
   create_fixed_operand (&ops[1], mem);
   create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)));
-  expand_insn (optab_handler (maskload_optab, TYPE_MODE (type)), 3, ops);
+  expand_insn (convert_optab_handler (maskload_optab, TYPE_MODE (type),
+				      TYPE_MODE (TREE_TYPE (maskt))),
+	       3, ops);
 }
 
 static void
@@ -1908,7 +1910,9 @@ expand_MASK_STORE (gcall *stmt)
   create_fixed_operand (&ops[0], mem);
   create_input_operand (&ops[1], reg, TYPE_MODE (type));
   create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)));
-  expand_insn (optab_handler (maskstore_optab, TYPE_MODE (type)), 3, ops);
+  expand_insn (convert_optab_handler (maskstore_optab, TYPE_MODE (type),
+				      TYPE_MODE (TREE_TYPE (maskt))),
+	       3, ops);
 }
 
 static void
diff --git a/gcc/optabs.c b/gcc/optabs.c
index e533e6e..fd9932f 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -6490,11 +6490,13 @@ get_rtx_code (enum tree_code tcode, bool unsignedp)
 }
 
 /* Return comparison rtx for COND.  Use UNSIGNEDP to select signed or
-   unsigned operators.  Do not generate compare instruction.  */
+   unsigned operators.  OPNO holds the index of the first comparison
+   operand in insn with code ICODE.  Do not generate compare instruction.
 */
 
 static rtx
 vector_compare_rtx (enum tree_code tcode, tree t_op0, tree t_op1,
-		    bool unsignedp, enum insn_code icode)
+		    bool unsignedp, enum insn_code icode,
+		    unsigned int opno)
 {
   struct expand_operand ops[2];
   rtx rtx_op0, rtx_op1;
@@ -6520,7 +6522,7 @@ vector_compare_rtx (enum tree_code tcode, tree t_op0, tree t_op1,
 
   create_input_operand (&ops[0], rtx_op0, m0);
   create_input_operand (&ops[1], rtx_op1, m1);
-  if (!maybe_legitimize_operands (icode, 4, 2, ops))
+  if (!maybe_legitimize_operands (icode, opno, 2, ops))
     gcc_unreachable ();
   return gen_rtx_fmt_ee (rcode, VOIDmode, ops[0].value, ops[1].value);
 }
@@ -6843,16 +6845,25 @@ expand_vec_cond_expr (tree vec_cond_type, tree op0, tree op1, tree op2,
       op0a = TREE_OPERAND (op0, 0);
       op0b = TREE_OPERAND (op0, 1);
       tcode = TREE_CODE (op0);
+      unsignedp = TYPE_UNSIGNED (TREE_TYPE (op0a));
     }
   else
     {
+      gcc_assert (VECTOR_MASK_TYPE_P (TREE_TYPE (op0)));
+      if (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (op0))) != MODE_VECTOR_INT)
+	{
+	  /* This is a vcond with mask.  To be supported soon...  */
+	  gcc_unreachable ();
+	}
       /* Fake op0 < 0.  */
-      gcc_assert (!TYPE_UNSIGNED (TREE_TYPE (op0)));
-      op0a = op0;
-      op0b = build_zero_cst (TREE_TYPE (op0));
-      tcode = LT_EXPR;
+      else
+	{
+	  op0a = op0;
+	  op0b = build_zero_cst (TREE_TYPE (op0));
+	  tcode = LT_EXPR;
+	  unsignedp = false;
+	}
     }
-  unsignedp = TYPE_UNSIGNED (TREE_TYPE (op0a));
 
   cmp_op_mode = TYPE_MODE (TREE_TYPE (op0a));
 
@@ -6863,7 +6874,7 @@ expand_vec_cond_expr (tree vec_cond_type, tree op0, tree op1, tree op2,
   if (icode == CODE_FOR_nothing)
     return 0;
 
-  comparison = vector_compare_rtx (tcode, op0a, op0b, unsignedp, icode);
+  comparison = vector_compare_rtx (tcode, op0a, op0b, unsignedp, icode, 4);
   rtx_op1 = expand_normal (op1);
   rtx_op2 = expand_normal (op2);
 
@@ -6877,6 +6888,63 @@ expand_vec_cond_expr (tree vec_cond_type, tree op0, tree op1, tree op2,
   return ops[0].value;
 }
 
+/* Return insn code for a comparison operator with VMODE
+   resulting in MASK_MODE, unsigned if UNS is true.
 */
+
+static inline enum insn_code
+get_vec_cmp_icode (machine_mode vmode, machine_mode mask_mode, bool uns)
+{
+  optab tab = uns ? vec_cmpu_optab : vec_cmp_optab;
+  return convert_optab_handler (tab, vmode, mask_mode);
+}
+
+/* Return TRUE if an appropriate vector insn is available
+   for a vector comparison expr with vector type VALUE_TYPE
+   and resulting mask with MASK_TYPE.  */
+
+bool
+expand_vec_cmp_expr_p (tree value_type, tree mask_type)
+{
+  enum insn_code icode = get_vec_cmp_icode (TYPE_MODE (value_type),
+					    TYPE_MODE (mask_type),
+					    TYPE_UNSIGNED (value_type));
+  return (icode != CODE_FOR_nothing);
+}
+
+/* Generate insns for a vector comparison into a mask.  */
+
+rtx
+expand_vec_cmp_expr (tree type, tree exp, rtx target)
+{
+  struct expand_operand ops[4];
+  enum insn_code icode;
+  rtx comparison;
+  machine_mode mask_mode = TYPE_MODE (type);
+  machine_mode vmode;
+  bool unsignedp;
+  tree op0a, op0b;
+  enum tree_code tcode;
+
+  op0a = TREE_OPERAND (exp, 0);
+  op0b = TREE_OPERAND (exp, 1);
+  tcode = TREE_CODE (exp);
+
+  unsignedp = TYPE_UNSIGNED (TREE_TYPE (op0a));
+  vmode = TYPE_MODE (TREE_TYPE (op0a));
+
+  icode = get_vec_cmp_icode (vmode, mask_mode, unsignedp);
+  if (icode == CODE_FOR_nothing)
+    return 0;
+
+  comparison = vector_compare_rtx (tcode, op0a, op0b, unsignedp, icode, 2);
+  create_output_operand (&ops[0], target, mask_mode);
+  create_fixed_operand (&ops[1], comparison);
+  create_fixed_operand (&ops[2], XEXP (comparison, 0));
+  create_fixed_operand (&ops[3], XEXP (comparison, 1));
+  expand_insn (icode, 4, ops);
+  return ops[0].value;
+}
+
 /* Return non-zero if a highpart multiply is supported of can be synthisized.
    For the benefit of expand_mult_highpart, the return value is 1 for direct,
    2 for even/odd widening, and 3 for hi/lo widening.  */
@@ -7002,26 +7070,32 @@ expand_mult_highpart (machine_mode mode, rtx op0, rtx op1,
 
 /* Return true if target supports vector masked load/store for mode.
*/ bool -can_vec_mask_load_store_p (machine_mode mode, bool is_load) +can_vec_mask_load_store_p (machine_mode mode, + machine_mode mask_mode, + bool is_load) { optab op = is_load ? maskload_optab : maskstore_optab; - machine_mode vmode; unsigned int vector_sizes; /* If mode is vector mode, check it directly. */ if (VECTOR_MODE_P (mode)) - return optab_handler (op, mode) != CODE_FOR_nothing; + return convert_optab_handler (op, mode, mask_mode) != CODE_FOR_nothing; /* Otherwise, return true if there is some vector mode with the mask load/store supported. */ /* See if there is any chance the mask load or store might be vectorized. If not, punt. */ - vmode = targetm.vectorize.preferred_simd_mode (mode); - if (!VECTOR_MODE_P (vmode)) + mode = targetm.vectorize.preferred_simd_mode (mode); + if (!VECTOR_MODE_P (mode)) + return false; + + mask_mode = targetm.vectorize.get_mask_mode (GET_MODE_NUNITS (mode), + GET_MODE_SIZE (mode)); + if (mask_mode == VOIDmode) return false; - if (optab_handler (op, vmode) != CODE_FOR_nothing) + if (convert_optab_handler (op, mode, mask_mode) != CODE_FOR_nothing) return true; vector_sizes = targetm.vectorize.autovectorize_vector_sizes (); @@ -7031,9 +7105,12 @@ can_vec_mask_load_store_p (machine_mode mode, bool is_load) vector_sizes &= ~cur; if (cur <= GET_MODE_SIZE (mode)) continue; - vmode = mode_for_vector (mode, cur / GET_MODE_SIZE (mode)); - if (VECTOR_MODE_P (vmode) - && optab_handler (op, vmode) != CODE_FOR_nothing) + mode = mode_for_vector (mode, cur / GET_MODE_SIZE (mode)); + mask_mode = targetm.vectorize.get_mask_mode (GET_MODE_NUNITS (mode), + cur); + if (VECTOR_MODE_P (mode) + && mask_mode != VOIDmode + && convert_optab_handler (op, mode, mask_mode) != CODE_FOR_nothing) return true; } return false; diff --git a/gcc/optabs.def b/gcc/optabs.def index 888b21c..9804378 100644 --- a/gcc/optabs.def +++ b/gcc/optabs.def @@ -61,6 +61,10 @@ OPTAB_CD(vec_load_lanes_optab, "vec_load_lanes$a$b") OPTAB_CD(vec_store_lanes_optab, 
"vec_store_lanes$a$b") OPTAB_CD(vcond_optab, "vcond$a$b") OPTAB_CD(vcondu_optab, "vcondu$a$b") +OPTAB_CD(vec_cmp_optab, "vec_cmp$a$b") +OPTAB_CD(vec_cmpu_optab, "vec_cmpu$a$b") +OPTAB_CD(maskload_optab, "maskload$a$b") +OPTAB_CD(maskstore_optab, "maskstore$a$b") OPTAB_NL(add_optab, "add$P$a3", PLUS, "add", '3', gen_int_fp_fixed_libfunc) OPTAB_NX(add_optab, "add$F$a3") @@ -264,8 +268,6 @@ OPTAB_D (udot_prod_optab, "udot_prod$I$a") OPTAB_D (usum_widen_optab, "widen_usum$I$a3") OPTAB_D (usad_optab, "usad$I$a") OPTAB_D (ssad_optab, "ssad$I$a") -OPTAB_D (maskload_optab, "maskload$a") -OPTAB_D (maskstore_optab, "maskstore$a") OPTAB_D (vec_extract_optab, "vec_extract$a") OPTAB_D (vec_init_optab, "vec_init$a") OPTAB_D (vec_pack_sfix_trunc_optab, "vec_pack_sfix_trunc_$a") diff --git a/gcc/optabs.h b/gcc/optabs.h index 95f5cbc..dfe9ebf 100644 --- a/gcc/optabs.h +++ b/gcc/optabs.h @@ -496,6 +496,12 @@ extern bool can_vec_perm_p (machine_mode, bool, const unsigned char *); extern rtx expand_vec_perm (machine_mode, rtx, rtx, rtx, rtx); /* Return tree if target supports vector operations for COND_EXPR. */ +bool expand_vec_cmp_expr_p (tree, tree); + +/* Generate code for VEC_COND_EXPR. */ +extern rtx expand_vec_cmp_expr (tree, tree, rtx); + +/* Return true if target supports vector comparison. */ bool expand_vec_cond_expr_p (tree, tree); /* Generate code for VEC_COND_EXPR. */ @@ -508,7 +514,7 @@ extern int can_mult_highpart_p (machine_mode, bool); extern rtx expand_mult_highpart (machine_mode, rtx, rtx, rtx, bool); /* Return true if target supports vector masked load/store for mode. */ -extern bool can_vec_mask_load_store_p (machine_mode, bool); +extern bool can_vec_mask_load_store_p (machine_mode, machine_mode, bool); /* Return true if there is an inline compare and swap pattern. 
*/ extern bool can_compare_and_swap_p (machine_mode, bool); diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c index 291e602..d66517d 100644 --- a/gcc/tree-if-conv.c +++ b/gcc/tree-if-conv.c @@ -811,7 +811,7 @@ ifcvt_can_use_mask_load_store (gimple stmt) || VECTOR_MODE_P (mode)) return false; - if (can_vec_mask_load_store_p (mode, is_load)) + if (can_vec_mask_load_store_p (mode, VOIDmode, is_load)) return true; return false; @@ -2068,7 +2068,7 @@ predicate_mem_writes (loop_p loop) { tree lhs = gimple_assign_lhs (stmt); tree rhs = gimple_assign_rhs1 (stmt); - tree ref, addr, ptr, masktype, mask_op0, mask_op1, mask; + tree ref, addr, ptr, masktype, mask; gimple new_stmt; int bitsize = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (lhs))); ref = TREE_CODE (lhs) == SSA_NAME ? rhs : lhs; @@ -2082,15 +2082,47 @@ predicate_mem_writes (loop_p loop) mask = vect_masks[index]; else { - masktype = build_nonstandard_integer_type (bitsize, 1); - mask_op0 = build_int_cst (masktype, swap ? 0 : -1); - mask_op1 = build_int_cst (masktype, swap ? 
-1 : 0); - cond = force_gimple_operand_gsi_1 (&gsi, unshare_expr (cond), - is_gimple_condexpr, - NULL_TREE, - true, GSI_SAME_STMT); - mask = fold_build_cond_expr (masktype, unshare_expr (cond), - mask_op0, mask_op1); + masktype = boolean_type_node; + if ((TREE_CODE (cond) == NE_EXPR + || TREE_CODE (cond) == EQ_EXPR) + && (integer_zerop (TREE_OPERAND (cond, 1)) + || integer_onep (TREE_OPERAND (cond, 1))) + && TREE_CODE (TREE_TYPE (TREE_OPERAND (cond, 0))) + == BOOLEAN_TYPE) + { + bool negate = (TREE_CODE (cond) == EQ_EXPR); + if (integer_onep (TREE_OPERAND (cond, 1))) + negate = !negate; + if (swap) + negate = !negate; + mask = TREE_OPERAND (cond, 0); + if (negate) + { + mask = ifc_temp_var (masktype, unshare_expr (cond), + &gsi); + mask = build1 (TRUTH_NOT_EXPR, masktype, mask); + } + } + else if (swap && + TREE_CODE_CLASS (TREE_CODE (cond)) == tcc_comparison) + { + tree op_type = TREE_TYPE (TREE_OPERAND (cond, 0)); + tree_code code + = invert_tree_comparison (TREE_CODE (cond), + HONOR_NANS (op_type)); + if (code != ERROR_MARK) + mask = build2 (code, TREE_TYPE (cond), + TREE_OPERAND (cond, 0), + TREE_OPERAND (cond, 1)); + else + { + mask = ifc_temp_var (masktype, unshare_expr (cond), + &gsi); + mask = build1 (TRUTH_NOT_EXPR, masktype, mask); + } + } + else + mask = unshare_expr (cond); mask = ifc_temp_var (masktype, mask, &gsi); /* Save mask and its size for further use. */ vect_sizes.safe_push (bitsize); diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c index f1eaef4..0a39825 100644 --- a/gcc/tree-vect-data-refs.c +++ b/gcc/tree-vect-data-refs.c @@ -3849,6 +3849,9 @@ vect_get_new_vect_var (tree type, enum vect_var_kind var_kind, const char *name) case vect_scalar_var: prefix = "stmp"; break; + case vect_mask_var: + prefix = "mask"; + break; case vect_pointer_var: prefix = "vectp"; break; @@ -4403,7 +4406,11 @@ vect_create_destination_var (tree scalar_dest, tree vectype) tree type; enum vect_var_kind kind; - kind = vectype ? 
vect_simple_var : vect_scalar_var; + kind = vectype + ? VECTOR_MASK_TYPE_P (vectype) + ? vect_mask_var + : vect_simple_var + : vect_scalar_var; type = vectype ? vectype : TREE_TYPE (scalar_dest); gcc_assert (TREE_CODE (scalar_dest) == SSA_NAME); diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index 59c75af..1810f78 100644 --- a/gcc/tree-vect-loop.c +++ b/gcc/tree-vect-loop.c @@ -193,19 +193,21 @@ vect_determine_vectorization_factor (loop_vec_info loop_vinfo) { struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo); basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo); - int nbbs = loop->num_nodes; + unsigned nbbs = loop->num_nodes; unsigned int vectorization_factor = 0; tree scalar_type; gphi *phi; tree vectype; unsigned int nunits; stmt_vec_info stmt_info; - int i; + unsigned i; HOST_WIDE_INT dummy; gimple stmt, pattern_stmt = NULL; gimple_seq pattern_def_seq = NULL; gimple_stmt_iterator pattern_def_si = gsi_none (); bool analyze_pattern_stmt = false; + bool bool_result; + auto_vec<stmt_vec_info> mask_producers; if (dump_enabled_p ()) dump_printf_loc (MSG_NOTE, vect_location, @@ -424,6 +426,8 @@ vect_determine_vectorization_factor (loop_vec_info loop_vinfo) return false; } + bool_result = false; + if (STMT_VINFO_VECTYPE (stmt_info)) { /* The only case when a vectype had been already set is for stmts @@ -444,6 +448,32 @@ vect_determine_vectorization_factor (loop_vec_info loop_vinfo) scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3)); else scalar_type = TREE_TYPE (gimple_get_lhs (stmt)); + + /* Bool ops don't participate in vectorization factor + computation. For comparison use compared types to + compute a factor. 
*/ + if (TREE_CODE (scalar_type) == BOOLEAN_TYPE) + { + mask_producers.safe_push (stmt_info); + bool_result = true; + + if (gimple_code (stmt) == GIMPLE_ASSIGN + && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) + == tcc_comparison + && TREE_CODE (TREE_TYPE (gimple_assign_rhs1 (stmt))) + != BOOLEAN_TYPE) + scalar_type = TREE_TYPE (gimple_assign_rhs1 (stmt)); + else + { + if (!analyze_pattern_stmt && gsi_end_p (pattern_def_si)) + { + pattern_def_seq = NULL; + gsi_next (&si); + } + continue; + } + } + if (dump_enabled_p ()) { dump_printf_loc (MSG_NOTE, vect_location, @@ -466,7 +496,8 @@ vect_determine_vectorization_factor (loop_vec_info loop_vinfo) return false; } - STMT_VINFO_VECTYPE (stmt_info) = vectype; + if (!bool_result) + STMT_VINFO_VECTYPE (stmt_info) = vectype; if (dump_enabled_p ()) { @@ -479,8 +510,9 @@ vect_determine_vectorization_factor (loop_vec_info loop_vinfo) /* The vectorization factor is according to the smallest scalar type (or the largest vector size, but we only support one vector size per loop). 
*/ - scalar_type = vect_get_smallest_scalar_type (stmt, &dummy, - &dummy); + if (!bool_result) + scalar_type = vect_get_smallest_scalar_type (stmt, &dummy, + &dummy); if (dump_enabled_p ()) { dump_printf_loc (MSG_NOTE, vect_location, @@ -555,6 +587,100 @@ vect_determine_vectorization_factor (loop_vec_info loop_vinfo) } LOOP_VINFO_VECT_FACTOR (loop_vinfo) = vectorization_factor; + for (i = 0; i < mask_producers.length (); i++) + { + tree mask_type = NULL; + bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (mask_producers[i]); + + stmt = STMT_VINFO_STMT (mask_producers[i]); + + if (gimple_code (stmt) == GIMPLE_ASSIGN + && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_comparison + && TREE_CODE (TREE_TYPE (gimple_assign_rhs1 (stmt))) != BOOLEAN_TYPE) + { + scalar_type = TREE_TYPE (gimple_assign_rhs1 (stmt)); + mask_type = get_mask_type_for_scalar_type (scalar_type); + + if (!mask_type) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, + "not vectorized: unsupported mask\n"); + return false; + } + } + else + { + tree rhs, def; + ssa_op_iter iter; + gimple def_stmt; + enum vect_def_type dt; + + FOR_EACH_SSA_TREE_OPERAND (rhs, stmt, iter, SSA_OP_USE) + { + if (!vect_is_simple_use_1 (rhs, stmt, loop_vinfo, bb_vinfo, + &def_stmt, &def, &dt, &vectype)) + { + if (dump_enabled_p ()) + { + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, + "not vectorized: can't compute mask type " + "for statement, "); + dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt, + 0); + dump_printf (MSG_MISSED_OPTIMIZATION, "\n"); + } + return false; + } + + /* No vectype probably means external definition. + Allow it in case there is another operand which + allows to determine mask type. 
*/ + if (!vectype) + continue; + + if (!mask_type) + mask_type = vectype; + else if (TYPE_VECTOR_SUBPARTS (mask_type) + != TYPE_VECTOR_SUBPARTS (vectype)) + { + if (dump_enabled_p ()) + { + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, + "not vectorized: different sized masks " + "types in statement, "); + dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM, + mask_type); + dump_printf (MSG_MISSED_OPTIMIZATION, " and "); + dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM, + vectype); + dump_printf (MSG_MISSED_OPTIMIZATION, "\n"); + } + return false; + } + } + } + + /* No mask_type should mean loop invariant predicate. + This is probably a subject for optimization in + if-conversion. */ + if (!mask_type) + { + if (dump_enabled_p ()) + { + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, + "not vectorized: can't compute mask type " + "for statement, "); + dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt, + 0); + dump_printf (MSG_MISSED_OPTIMIZATION, "\n"); + } + return false; + } + + STMT_VINFO_VECTYPE (mask_producers[i]) = mask_type; + } + return true; } diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c index f87c066..f3887be 100644 --- a/gcc/tree-vect-stmts.c +++ b/gcc/tree-vect-stmts.c @@ -1316,27 +1316,61 @@ vect_init_vector_1 (gimple stmt, gimple new_stmt, gimple_stmt_iterator *gsi) tree vect_init_vector (gimple stmt, tree val, tree type, gimple_stmt_iterator *gsi) { + tree val_type = TREE_TYPE (val); + machine_mode mode = TYPE_MODE (type); + machine_mode val_mode = TYPE_MODE(val_type); tree new_var; gimple init_stmt; tree vec_oprnd; tree new_temp; if (TREE_CODE (type) == VECTOR_TYPE - && TREE_CODE (TREE_TYPE (val)) != VECTOR_TYPE) - { - if (!types_compatible_p (TREE_TYPE (type), TREE_TYPE (val))) + && TREE_CODE (val_type) != VECTOR_TYPE) + { + /* Handle vector of bool represented as a vector of + integers here rather than on expand because it is + a default mask type for targets. 
Vector mask is + built in a following way: + + tmp = (int)val + vec_tmp = {tmp, ..., tmp} + vec_cst = VIEW_CONVERT_EXPR<vector(N) _Bool>(vec_tmp); */ + if (TREE_CODE (val_type) == BOOLEAN_TYPE + && VECTOR_MODE_P (mode) + && SCALAR_INT_MODE_P (GET_MODE_INNER (mode)) + && GET_MODE_INNER (mode) != val_mode) { - if (CONSTANT_CLASS_P (val)) - val = fold_unary (VIEW_CONVERT_EXPR, TREE_TYPE (type), val); - else + unsigned size = GET_MODE_BITSIZE (GET_MODE_INNER (mode)); + tree stype = build_nonstandard_integer_type (size, 1); + tree vectype = get_vectype_for_scalar_type (stype); + + new_temp = make_ssa_name (stype); + init_stmt = gimple_build_assign (new_temp, NOP_EXPR, val); + vect_init_vector_1 (stmt, init_stmt, gsi); + + val = make_ssa_name (vectype); + new_temp = build_vector_from_val (vectype, new_temp); + init_stmt = gimple_build_assign (val, new_temp); + vect_init_vector_1 (stmt, init_stmt, gsi); + + val = build1 (VIEW_CONVERT_EXPR, type, val); + } + else + { + if (!types_compatible_p (TREE_TYPE (type), val_type)) { - new_temp = make_ssa_name (TREE_TYPE (type)); - init_stmt = gimple_build_assign (new_temp, NOP_EXPR, val); - vect_init_vector_1 (stmt, init_stmt, gsi); - val = new_temp; + if (CONSTANT_CLASS_P (val)) + val = fold_unary (VIEW_CONVERT_EXPR, TREE_TYPE (type), val); + else + { + new_temp = make_ssa_name (TREE_TYPE (type)); + init_stmt = gimple_build_assign (new_temp, NOP_EXPR, val); + vect_init_vector_1 (stmt, init_stmt, gsi); + val = new_temp; + } } + val = build_vector_from_val (type, val); } - val = build_vector_from_val (type, val); } new_var = vect_get_new_vect_var (type, vect_simple_var, "cst_"); @@ -1368,6 +1402,7 @@ vect_get_vec_def_for_operand (tree op, gimple stmt, tree *scalar_def) gimple def_stmt; stmt_vec_info def_stmt_info = NULL; stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt); + tree stmt_vectype = STMT_VINFO_VECTYPE (stmt_vinfo); unsigned int nunits; loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_vinfo); tree def; @@ -1411,7 
+1446,12 @@ vect_get_vec_def_for_operand (tree op, gimple stmt, tree *scalar_def) /* Case 1: operand is a constant. */ case vect_constant_def: { - vector_type = get_vectype_for_scalar_type (TREE_TYPE (op)); + if (TREE_CODE (TREE_TYPE (op)) == BOOLEAN_TYPE + && VECTOR_MASK_TYPE_P (stmt_vectype)) + vector_type = stmt_vectype; + else + vector_type = get_vectype_for_scalar_type (TREE_TYPE (op)); + gcc_assert (vector_type); nunits = TYPE_VECTOR_SUBPARTS (vector_type); @@ -1429,7 +1469,11 @@ vect_get_vec_def_for_operand (tree op, gimple stmt, tree *scalar_def) /* Case 2: operand is defined outside the loop - loop invariant. */ case vect_external_def: { - vector_type = get_vectype_for_scalar_type (TREE_TYPE (def)); + if (TREE_CODE (TREE_TYPE (op)) == BOOLEAN_TYPE + && VECTOR_MASK_TYPE_P (stmt_vectype)) + vector_type = stmt_vectype; + else + vector_type = get_vectype_for_scalar_type (TREE_TYPE (def)); gcc_assert (vector_type); if (scalar_def) @@ -1758,6 +1802,7 @@ vectorizable_mask_load_store (gimple stmt, gimple_stmt_iterator *gsi, bool nested_in_vect_loop = nested_in_vect_loop_p (loop, stmt); struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info); tree vectype = STMT_VINFO_VECTYPE (stmt_info); + tree mask_vectype; tree elem_type; gimple new_stmt; tree dummy; @@ -1785,8 +1830,8 @@ vectorizable_mask_load_store (gimple stmt, gimple_stmt_iterator *gsi, is_store = gimple_call_internal_fn (stmt) == IFN_MASK_STORE; mask = gimple_call_arg (stmt, 2); - if (TYPE_PRECISION (TREE_TYPE (mask)) - != GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (vectype)))) + + if (TREE_CODE (TREE_TYPE (mask)) != BOOLEAN_TYPE) return false; /* FORNOW. This restriction should be relaxed. 
*/ @@ -1815,6 +1860,19 @@ vectorizable_mask_load_store (gimple stmt, gimple_stmt_iterator *gsi, if (STMT_VINFO_STRIDED_P (stmt_info)) return false; + if (TREE_CODE (mask) != SSA_NAME) + return false; + + if (!vect_is_simple_use_1 (mask, stmt, loop_vinfo, NULL, + &def_stmt, &def, &dt, &mask_vectype)) + return false; + + if (!mask_vectype) + mask_vectype = get_mask_type_for_scalar_type (TREE_TYPE (vectype)); + + if (!mask_vectype) + return false; + if (STMT_VINFO_GATHER_P (stmt_info)) { gimple def_stmt; @@ -1848,14 +1906,9 @@ vectorizable_mask_load_store (gimple stmt, gimple_stmt_iterator *gsi, : DR_STEP (dr), size_zero_node) <= 0) return false; else if (!VECTOR_MODE_P (TYPE_MODE (vectype)) - || !can_vec_mask_load_store_p (TYPE_MODE (vectype), !is_store)) - return false; - - if (TREE_CODE (mask) != SSA_NAME) - return false; - - if (!vect_is_simple_use (mask, stmt, loop_vinfo, NULL, - &def_stmt, &def, &dt)) + || !can_vec_mask_load_store_p (TYPE_MODE (vectype), + TYPE_MODE (mask_vectype), + !is_store)) return false; if (is_store) @@ -7229,10 +7282,7 @@ vectorizable_condition (gimple stmt, gimple_stmt_iterator *gsi, && TREE_CODE (else_clause) != FIXED_CST) return false; - unsigned int prec = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (vectype))); - /* The result of a vector comparison should be signed type. */ - tree cmp_type = build_nonstandard_integer_type (prec, 0); - vec_cmp_type = get_same_sized_vectype (cmp_type, vectype); + vec_cmp_type = build_same_sized_truth_vector_type (comp_vectype); if (vec_cmp_type == NULL_TREE) return false; @@ -7373,6 +7423,201 @@ vectorizable_condition (gimple stmt, gimple_stmt_iterator *gsi, return true; } +/* vectorizable_comparison. + + Check if STMT is comparison expression that can be vectorized. + If VEC_STMT is also passed, vectorize the STMT: create a vectorized + comparison, put it in VEC_STMT, and insert it at GSI. + + Return FALSE if not a vectorizable STMT, TRUE otherwise. 
*/ + +bool +vectorizable_comparison (gimple stmt, gimple_stmt_iterator *gsi, + gimple *vec_stmt, tree reduc_def, + slp_tree slp_node) +{ + tree lhs, rhs1, rhs2; + stmt_vec_info stmt_info = vinfo_for_stmt (stmt); + tree vectype1, vectype2; + tree vectype = STMT_VINFO_VECTYPE (stmt_info); + tree vec_rhs1 = NULL_TREE, vec_rhs2 = NULL_TREE; + tree vec_compare; + tree new_temp; + loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info); + tree def; + enum vect_def_type dt, dts[4]; + unsigned nunits; + int ncopies; + enum tree_code code; + stmt_vec_info prev_stmt_info = NULL; + int i, j; + bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info); + vec<tree> vec_oprnds0 = vNULL; + vec<tree> vec_oprnds1 = vNULL; + tree mask_type; + tree mask; + + if (!VECTOR_MASK_TYPE_P (vectype)) + return false; + + mask_type = vectype; + nunits = TYPE_VECTOR_SUBPARTS (vectype); + + if (slp_node || PURE_SLP_STMT (stmt_info)) + ncopies = 1; + else + ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits; + + gcc_assert (ncopies >= 1); + if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo) + return false; + + if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def + && !(STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle + && reduc_def)) + return false; + + if (STMT_VINFO_LIVE_P (stmt_info)) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, + "value used after loop.\n"); + return false; + } + + if (!is_gimple_assign (stmt)) + return false; + + code = gimple_assign_rhs_code (stmt); + + if (TREE_CODE_CLASS (code) != tcc_comparison) + return false; + + rhs1 = gimple_assign_rhs1 (stmt); + rhs2 = gimple_assign_rhs2 (stmt); + + if (TREE_CODE (rhs1) == SSA_NAME) + { + gimple rhs1_def_stmt = SSA_NAME_DEF_STMT (rhs1); + if (!vect_is_simple_use_1 (rhs1, stmt, loop_vinfo, bb_vinfo, + &rhs1_def_stmt, &def, &dt, &vectype1)) + return false; + } + else if (TREE_CODE (rhs1) != INTEGER_CST && TREE_CODE (rhs1) != REAL_CST + && TREE_CODE (rhs1) != FIXED_CST) + 
return false; + + if (TREE_CODE (rhs2) == SSA_NAME) + { + gimple rhs2_def_stmt = SSA_NAME_DEF_STMT (rhs2); + if (!vect_is_simple_use_1 (rhs2, stmt, loop_vinfo, bb_vinfo, + &rhs2_def_stmt, &def, &dt, &vectype2)) + return false; + } + else if (TREE_CODE (rhs2) != INTEGER_CST && TREE_CODE (rhs2) != REAL_CST + && TREE_CODE (rhs2) != FIXED_CST) + return false; + + vectype = vectype1 ? vectype1 : vectype2; + + if (!vectype + || nunits != TYPE_VECTOR_SUBPARTS (vectype)) + return false; + + if (!vec_stmt) + { + STMT_VINFO_TYPE (stmt_info) = comparison_vec_info_type; + return expand_vec_cmp_expr_p (vectype, mask_type); + } + + /* Transform. */ + if (!slp_node) + { + vec_oprnds0.create (1); + vec_oprnds1.create (1); + } + + /* Handle def. */ + lhs = gimple_assign_lhs (stmt); + mask = vect_create_destination_var (lhs, mask_type); + + /* Handle cmp expr. */ + for (j = 0; j < ncopies; j++) + { + gassign *new_stmt = NULL; + if (j == 0) + { + if (slp_node) + { + auto_vec<tree, 2> ops; + auto_vec<vec<tree>, 2> vec_defs; + + ops.safe_push (rhs1); + ops.safe_push (rhs2); + vect_get_slp_defs (ops, slp_node, &vec_defs, -1); + vec_oprnds1 = vec_defs.pop (); + vec_oprnds0 = vec_defs.pop (); + + ops.release (); + vec_defs.release (); + } + else + { + gimple gtemp; + vec_rhs1 + = vect_get_vec_def_for_operand (rhs1, stmt, NULL); + vect_is_simple_use (rhs1, stmt, loop_vinfo, NULL, + >emp, &def, &dts[0]); + vec_rhs2 = + vect_get_vec_def_for_operand (rhs2, stmt, NULL); + vect_is_simple_use (rhs2, stmt, loop_vinfo, NULL, + >emp, &def, &dts[1]); + } + } + else + { + vec_rhs1 = vect_get_vec_def_for_stmt_copy (dts[0], + vec_oprnds0.pop ()); + vec_rhs2 = vect_get_vec_def_for_stmt_copy (dts[1], + vec_oprnds1.pop ()); + } + + if (!slp_node) + { + vec_oprnds0.quick_push (vec_rhs1); + vec_oprnds1.quick_push (vec_rhs2); + } + + /* Arguments are ready. Create the new vector stmt. 
*/ + FOR_EACH_VEC_ELT (vec_oprnds0, i, vec_rhs1) + { + vec_rhs2 = vec_oprnds1[i]; + + vec_compare = build2 (code, mask_type, vec_rhs1, vec_rhs2); + new_stmt = gimple_build_assign (mask, vec_compare); + new_temp = make_ssa_name (mask, new_stmt); + gimple_assign_set_lhs (new_stmt, new_temp); + vect_finish_stmt_generation (stmt, new_stmt, gsi); + if (slp_node) + SLP_TREE_VEC_STMTS (slp_node).quick_push (new_stmt); + } + + if (slp_node) + continue; + + if (j == 0) + STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt; + else + STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt; + + prev_stmt_info = vinfo_for_stmt (new_stmt); + } + + vec_oprnds0.release (); + vec_oprnds1.release (); + + return true; +} /* Make sure the statement is vectorizable. */ @@ -7576,7 +7821,8 @@ vect_analyze_stmt (gimple stmt, bool *need_to_vectorize, slp_tree node) || vectorizable_call (stmt, NULL, NULL, node) || vectorizable_store (stmt, NULL, NULL, node) || vectorizable_reduction (stmt, NULL, NULL, node) - || vectorizable_condition (stmt, NULL, NULL, NULL, 0, node)); + || vectorizable_condition (stmt, NULL, NULL, NULL, 0, node) + || vectorizable_comparison (stmt, NULL, NULL, NULL, node)); else { if (bb_vinfo) @@ -7588,7 +7834,8 @@ vect_analyze_stmt (gimple stmt, bool *need_to_vectorize, slp_tree node) || vectorizable_load (stmt, NULL, NULL, node, NULL) || vectorizable_call (stmt, NULL, NULL, node) || vectorizable_store (stmt, NULL, NULL, node) - || vectorizable_condition (stmt, NULL, NULL, NULL, 0, node)); + || vectorizable_condition (stmt, NULL, NULL, NULL, 0, node) + || vectorizable_comparison (stmt, NULL, NULL, NULL, node)); } if (!ok) @@ -7704,6 +7951,11 @@ vect_transform_stmt (gimple stmt, gimple_stmt_iterator *gsi, gcc_assert (done); break; + case comparison_vec_info_type: + done = vectorizable_comparison (stmt, gsi, &vec_stmt, NULL, slp_node); + gcc_assert (done); + break; + case call_vec_info_type: done = vectorizable_call (stmt, gsi, &vec_stmt, slp_node); stmt = gsi_stmt (*gsi); 
@@ -8038,6 +8290,23 @@ get_vectype_for_scalar_type (tree scalar_type) return vectype; } +/* Function get_mask_type_for_scalar_type. + + Returns the mask type corresponding to a result of comparison + of vectors of specified SCALAR_TYPE as supported by target. */ + +tree +get_mask_type_for_scalar_type (tree scalar_type) +{ + tree vectype = get_vectype_for_scalar_type (scalar_type); + + if (!vectype) + return NULL; + + return build_truth_vector_type (TYPE_VECTOR_SUBPARTS (vectype), + current_vector_size); +} + /* Function get_same_sized_vectype Returns a vector type corresponding to SCALAR_TYPE of size diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index 58e8f10..94aea1a 100644 --- a/gcc/tree-vectorizer.h +++ b/gcc/tree-vectorizer.h @@ -28,7 +28,8 @@ along with GCC; see the file COPYING3. If not see enum vect_var_kind { vect_simple_var, vect_pointer_var, - vect_scalar_var + vect_scalar_var, + vect_mask_var }; /* Defines type of operation. */ @@ -482,6 +483,7 @@ enum stmt_vec_info_type { call_simd_clone_vec_info_type, assignment_vec_info_type, condition_vec_info_type, + comparison_vec_info_type, reduc_vec_info_type, induc_vec_info_type, type_promotion_vec_info_type, @@ -995,6 +997,7 @@ extern bool vect_can_advance_ivs_p (loop_vec_info); /* In tree-vect-stmts.c. */ extern unsigned int current_vector_size; extern tree get_vectype_for_scalar_type (tree); +extern tree get_mask_type_for_scalar_type (tree); extern tree get_same_sized_vectype (tree, tree); extern bool vect_is_simple_use (tree, gimple, loop_vec_info, bb_vec_info, gimple *,
diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c index 758ca38..cffacaa 100644 --- a/gcc/tree-vect-patterns.c +++ b/gcc/tree-vect-patterns.c @@ -2957,7 +2957,7 @@ check_bool_pattern (tree var, loop_vec_info loop_vinfo, bb_vec_info bb_vinfo) default: if (TREE_CODE_CLASS (rhs_code) == tcc_comparison) { - tree vecitype, comp_vectype; + tree vecitype, comp_vectype, mask_type; /* If the comparison can throw, then is_gimple_condexpr will be false and we can't make a COND_EXPR/VEC_COND_EXPR out of it. */ @@ -2968,6 +2968,11 @@ check_bool_pattern (tree var, loop_vec_info loop_vinfo, bb_vec_info bb_vinfo) if (comp_vectype == NULL_TREE) return false; + mask_type = get_mask_type_for_scalar_type (TREE_TYPE (rhs1)); + if (mask_type + && expand_vec_cmp_expr_p (comp_vectype, mask_type)) + return false; + if (TREE_CODE (TREE_TYPE (rhs1)) != INTEGER_TYPE) { machine_mode mode = TYPE_MODE (TREE_TYPE (rhs1)); @@ -3192,6 +3197,75 @@ adjust_bool_pattern (tree var, tree out_type, tree trueval, } +/* Try to determine a proper type for converting bool VAR + into an integer value. The type is chosen so that + conversion has the same number of elements as a mask + producer. 
*/ + +static tree +search_type_for_mask (tree var, loop_vec_info loop_vinfo, bb_vec_info bb_vinfo) +{ + gimple def_stmt; + enum vect_def_type dt; + tree def, rhs1; + enum tree_code rhs_code; + tree res = NULL; + + if (TREE_CODE (var) != SSA_NAME) + return NULL; + + if ((TYPE_PRECISION (TREE_TYPE (var)) != 1 + || !TYPE_UNSIGNED (TREE_TYPE (var))) + && TREE_CODE (TREE_TYPE (var)) != BOOLEAN_TYPE) + return NULL; + + if (!vect_is_simple_use (var, NULL, loop_vinfo, bb_vinfo, &def_stmt, &def, + &dt)) + return NULL; + + if (dt != vect_internal_def) + return NULL; + + if (!is_gimple_assign (def_stmt)) + return NULL; + + rhs_code = gimple_assign_rhs_code (def_stmt); + rhs1 = gimple_assign_rhs1 (def_stmt); + + switch (rhs_code) + { + case SSA_NAME: + case BIT_NOT_EXPR: + CASE_CONVERT: + res = search_type_for_mask (rhs1, loop_vinfo, bb_vinfo); + break; + + case BIT_AND_EXPR: + case BIT_IOR_EXPR: + case BIT_XOR_EXPR: + if (!(res = search_type_for_mask (rhs1, loop_vinfo, bb_vinfo))) + res = search_type_for_mask (gimple_assign_rhs2 (def_stmt), + loop_vinfo, bb_vinfo); + break; + + default: + if (TREE_CODE_CLASS (rhs_code) == tcc_comparison) + { + if (TREE_CODE (TREE_TYPE (rhs1)) != INTEGER_TYPE + || !TYPE_UNSIGNED (TREE_TYPE (rhs1))) + { + machine_mode mode = TYPE_MODE (TREE_TYPE (rhs1)); + res = build_nonstandard_integer_type (GET_MODE_BITSIZE (mode), 1); + } + else + res = TREE_TYPE (rhs1); + } + } + + return res; +} + + /* Function vect_recog_bool_pattern Try to find pattern like following: @@ -3249,6 +3323,7 @@ vect_recog_bool_pattern (vec<gimple> *stmts, tree *type_in, enum tree_code rhs_code; tree var, lhs, rhs, vectype; stmt_vec_info stmt_vinfo = vinfo_for_stmt (last_stmt); + stmt_vec_info new_stmt_info; loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_vinfo); bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_vinfo); gimple pattern_stmt; @@ -3274,16 +3349,43 @@ vect_recog_bool_pattern (vec<gimple> *stmts, tree *type_in, if (vectype == NULL_TREE) return NULL; - if 
(!check_bool_pattern (var, loop_vinfo, bb_vinfo))
-	return NULL;
-
-      rhs = adjust_bool_pattern (var, TREE_TYPE (lhs), NULL_TREE, stmts);
-      lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
-      if (useless_type_conversion_p (TREE_TYPE (lhs), TREE_TYPE (rhs)))
-	pattern_stmt = gimple_build_assign (lhs, SSA_NAME, rhs);
+      if (check_bool_pattern (var, loop_vinfo, bb_vinfo))
+	{
+	  rhs = adjust_bool_pattern (var, TREE_TYPE (lhs), NULL_TREE, stmts);
+	  lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
+	  if (useless_type_conversion_p (TREE_TYPE (lhs), TREE_TYPE (rhs)))
+	    pattern_stmt = gimple_build_assign (lhs, SSA_NAME, rhs);
+	  else
+	    pattern_stmt
+	      = gimple_build_assign (lhs, NOP_EXPR, rhs);
+	}
       else
-	pattern_stmt
-	  = gimple_build_assign (lhs, NOP_EXPR, rhs);
+	{
+	  tree type = search_type_for_mask (var, loop_vinfo, bb_vinfo);
+	  tree cst0, cst1;
+
+	  if (!type || TYPE_MODE (type) == TYPE_MODE (TREE_TYPE (lhs)))
+	    type = TREE_TYPE (lhs);
+	  cst0 = build_int_cst (type, 0);
+	  cst1 = build_int_cst (type, 1);
+	  lhs = vect_recog_temp_ssa_var (type, NULL);
+	  pattern_stmt = gimple_build_assign (lhs, COND_EXPR, var, cst0, cst1);
+
+	  if (!useless_type_conversion_p (type, TREE_TYPE (lhs)))
+	    {
+	      tree new_vectype = get_vectype_for_scalar_type (type);
+	      new_stmt_info = new_stmt_vec_info (pattern_stmt, loop_vinfo,
+						 bb_vinfo);
+	      set_vinfo_for_stmt (pattern_stmt, new_stmt_info);
+	      STMT_VINFO_VECTYPE (new_stmt_info) = new_vectype;
+	      new_pattern_def_seq (stmt_vinfo, pattern_stmt);
+
+	      rhs = lhs;
+	      lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
+	      pattern_stmt = gimple_build_assign (lhs, CONVERT_EXPR, rhs);
+	    }
+	}
+
       *type_out = vectype;
       *type_in = vectype;
       stmts->safe_push (last_stmt);
@@ -3312,10 +3414,11 @@ vect_recog_bool_pattern (vec<gimple> *stmts, tree *type_in,
       if (get_vectype_for_scalar_type (type) == NULL_TREE)
 	return NULL;
 
-      if (!check_bool_pattern (var, loop_vinfo, bb_vinfo))
-	return NULL;
+      if (check_bool_pattern (var, loop_vinfo, bb_vinfo))
+	rhs = adjust_bool_pattern (var, type, NULL_TREE, stmts);
+      else
+	rhs = var;
 
-      rhs = adjust_bool_pattern (var, type, NULL_TREE, stmts);
       lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
       pattern_stmt = gimple_build_assign (lhs, COND_EXPR,
@@ -3340,16 +3443,38 @@ vect_recog_bool_pattern (vec<gimple> *stmts, tree *type_in,
       gcc_assert (vectype != NULL_TREE);
       if (!VECTOR_MODE_P (TYPE_MODE (vectype)))
 	return NULL;
-      if (!check_bool_pattern (var, loop_vinfo, bb_vinfo))
-	return NULL;
 
-      rhs = adjust_bool_pattern (var, TREE_TYPE (vectype), NULL_TREE, stmts);
+      if (check_bool_pattern (var, loop_vinfo, bb_vinfo))
+	rhs = adjust_bool_pattern (var, TREE_TYPE (vectype),
+				   NULL_TREE, stmts);
+      else
+	{
+	  tree type = search_type_for_mask (var, loop_vinfo, bb_vinfo);
+	  tree cst0, cst1, new_vectype;
+
+	  if (!type || TYPE_MODE (type) == TYPE_MODE (TREE_TYPE (vectype)))
+	    type = TREE_TYPE (vectype);
+
+	  cst0 = build_int_cst (type, 0);
+	  cst1 = build_int_cst (type, 1);
+	  new_vectype = get_vectype_for_scalar_type (type);
+
+	  rhs = vect_recog_temp_ssa_var (type, NULL);
+	  pattern_stmt = gimple_build_assign (rhs, COND_EXPR, var, cst0, cst1);
+
+	  pattern_stmt_info = new_stmt_vec_info (pattern_stmt, loop_vinfo,
+						 bb_vinfo);
+	  set_vinfo_for_stmt (pattern_stmt, pattern_stmt_info);
+	  STMT_VINFO_VECTYPE (pattern_stmt_info) = new_vectype;
+	  append_pattern_def_seq (stmt_vinfo, pattern_stmt);
+	}
+
       lhs = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (vectype), lhs);
       if (!useless_type_conversion_p (TREE_TYPE (lhs), TREE_TYPE (rhs)))
 	{
 	  tree rhs2 = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
 	  gimple cast_stmt = gimple_build_assign (rhs2, NOP_EXPR, rhs);
-	  new_pattern_def_seq (stmt_vinfo, cast_stmt);
+	  append_pattern_def_seq (stmt_vinfo, cast_stmt);
 	  rhs = rhs2;
 	}
       pattern_stmt = gimple_build_assign (lhs, SSA_NAME, rhs);
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 6a17ef4..e22aa57 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -129,6 +129,9 @@ extern bool ix86_expand_fp_vcond (rtx[]);
 extern bool ix86_expand_int_vcond (rtx[]);
 extern void ix86_expand_vec_perm (rtx[]);
 extern bool ix86_expand_vec_perm_const (rtx[]);
+extern bool ix86_expand_mask_vec_cmp (rtx[]);
+extern bool ix86_expand_int_vec_cmp (rtx[]);
+extern bool ix86_expand_fp_vec_cmp (rtx[]);
 extern void ix86_expand_sse_unpack (rtx, rtx, bool, bool);
 extern bool ix86_expand_int_addcc (rtx[]);
 extern rtx ix86_expand_call (rtx, rtx, rtx, rtx, rtx, bool);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 070605f..d17c350 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -21440,8 +21440,8 @@ ix86_expand_sse_cmp (rtx dest, enum rtx_code code, rtx cmp_op0, rtx cmp_op1,
   cmp_op1 = force_reg (cmp_ops_mode, cmp_op1);
 
   if (optimize
-      || reg_overlap_mentioned_p (dest, op_true)
-      || reg_overlap_mentioned_p (dest, op_false))
+      || (op_true && reg_overlap_mentioned_p (dest, op_true))
+      || (op_false && reg_overlap_mentioned_p (dest, op_false)))
     dest = gen_reg_rtx (maskcmp ? cmp_mode : mode);
 
   /* Compare patterns for int modes are unspec in AVX512F only.  */
@@ -21713,34 +21713,127 @@ ix86_expand_fp_movcc (rtx operands[])
   return true;
 }
 
-/* Expand a floating-point vector conditional move; a vcond operation
-   rather than a movcc operation.  */
+/* Helper for ix86_cmp_code_to_pcmp_immediate for int modes.  */
+
+static int
+ix86_int_cmp_code_to_pcmp_immediate (enum rtx_code code)
+{
+  switch (code)
+    {
+    case EQ:
+      return 0;
+    case LT:
+    case LTU:
+      return 1;
+    case LE:
+    case LEU:
+      return 2;
+    case NE:
+      return 4;
+    case GE:
+    case GEU:
+      return 5;
+    case GT:
+    case GTU:
+      return 6;
+    default:
+      gcc_unreachable ();
+    }
+}
+
+/* Helper for ix86_cmp_code_to_pcmp_immediate for fp modes.  */
+
+static int
+ix86_fp_cmp_code_to_pcmp_immediate (enum rtx_code code)
+{
+  switch (code)
+    {
+    case EQ:
+      return 0x08;
+    case NE:
+      return 0x04;
+    case GT:
+      return 0x16;
+    case LE:
+      return 0x1a;
+    case GE:
+      return 0x15;
+    case LT:
+      return 0x19;
+    default:
+      gcc_unreachable ();
+    }
+}
+
+/* Return immediate value to be used in UNSPEC_PCMP
+   for comparison CODE in MODE.  */
+
+static int
+ix86_cmp_code_to_pcmp_immediate (enum rtx_code code, machine_mode mode)
+{
+  if (FLOAT_MODE_P (mode))
+    return ix86_fp_cmp_code_to_pcmp_immediate (code);
+  return ix86_int_cmp_code_to_pcmp_immediate (code);
+}
+
+/* Expand AVX-512 vector comparison.  */
 
 bool
-ix86_expand_fp_vcond (rtx operands[])
+ix86_expand_mask_vec_cmp (rtx operands[])
 {
-  enum rtx_code code = GET_CODE (operands[3]);
+  machine_mode mask_mode = GET_MODE (operands[0]);
+  machine_mode cmp_mode = GET_MODE (operands[2]);
+  enum rtx_code code = GET_CODE (operands[1]);
+  rtx imm = GEN_INT (ix86_cmp_code_to_pcmp_immediate (code, cmp_mode));
+  int unspec_code;
+  rtx unspec;
+
+  switch (code)
+    {
+    case LEU:
+    case GTU:
+    case GEU:
+    case LTU:
+      unspec_code = UNSPEC_UNSIGNED_PCMP;
+      break;
+    default:
+      unspec_code = UNSPEC_PCMP;
+    }
+
+  unspec = gen_rtx_UNSPEC (mask_mode, gen_rtvec (3, operands[2],
+						 operands[3], imm),
+			   unspec_code);
+  emit_insn (gen_rtx_SET (operands[0], unspec));
+
+  return true;
+}
+
+/* Expand fp vector comparison.  */
+
+bool
+ix86_expand_fp_vec_cmp (rtx operands[])
+{
+  enum rtx_code code = GET_CODE (operands[1]);
   rtx cmp;
 
   code = ix86_prepare_sse_fp_compare_args (operands[0], code,
-					   &operands[4], &operands[5]);
+					   &operands[2], &operands[3]);
   if (code == UNKNOWN)
     {
       rtx temp;
-      switch (GET_CODE (operands[3]))
+      switch (GET_CODE (operands[1]))
 	{
 	case LTGT:
-	  temp = ix86_expand_sse_cmp (operands[0], ORDERED, operands[4],
-				      operands[5], operands[0], operands[0]);
-	  cmp = ix86_expand_sse_cmp (operands[0], NE, operands[4],
-				     operands[5], operands[1], operands[2]);
+	  temp = ix86_expand_sse_cmp (operands[0], ORDERED, operands[2],
+				      operands[3], NULL, NULL);
+	  cmp = ix86_expand_sse_cmp (operands[0], NE, operands[2],
+				     operands[3], NULL, NULL);
 	  code = AND;
 	  break;
 	case UNEQ:
-	  temp = ix86_expand_sse_cmp (operands[0], UNORDERED, operands[4],
-				      operands[5], operands[0], operands[0]);
-	  cmp = ix86_expand_sse_cmp (operands[0], EQ, operands[4],
-				     operands[5], operands[1], operands[2]);
+	  temp = ix86_expand_sse_cmp (operands[0], UNORDERED, operands[2],
+				      operands[3], NULL, NULL);
+	  cmp = ix86_expand_sse_cmp (operands[0], EQ, operands[2],
+				     operands[3], NULL, NULL);
 	  code = IOR;
 	  break;
 	default:
@@ -21748,72 +21841,26 @@ ix86_expand_fp_vcond (rtx operands[])
 	}
       cmp = expand_simple_binop (GET_MODE (cmp), code, temp, cmp, cmp, 1,
 				 OPTAB_DIRECT);
-      ix86_expand_sse_movcc (operands[0], cmp, operands[1], operands[2]);
-      return true;
     }
+  else
+    cmp = ix86_expand_sse_cmp (operands[0], code, operands[2], operands[3],
+			       operands[1], operands[2]);
 
-  if (ix86_expand_sse_fp_minmax (operands[0], code, operands[4],
-				 operands[5], operands[1], operands[2]))
-    return true;
+  if (operands[0] != cmp)
+    emit_move_insn (operands[0], cmp);
 
-  cmp = ix86_expand_sse_cmp (operands[0], code, operands[4], operands[5],
-			     operands[1], operands[2]);
-  ix86_expand_sse_movcc (operands[0], cmp, operands[1], operands[2]);
   return true;
 }
 
-/* Expand a signed/unsigned integral vector conditional move.  */
-
-bool
-ix86_expand_int_vcond (rtx operands[])
+static rtx
+ix86_expand_int_sse_cmp (rtx dest, enum rtx_code code, rtx cop0, rtx cop1,
+			 rtx op_true, rtx op_false, bool *negate)
 {
-  machine_mode data_mode = GET_MODE (operands[0]);
-  machine_mode mode = GET_MODE (operands[4]);
-  enum rtx_code code = GET_CODE (operands[3]);
-  bool negate = false;
-  rtx x, cop0, cop1;
-
-  cop0 = operands[4];
-  cop1 = operands[5];
+  machine_mode data_mode = GET_MODE (dest);
+  machine_mode mode = GET_MODE (cop0);
+  rtx x;
 
-  /* Try to optimize x < 0 ? -1 : 0 into (signed) x >> 31
-     and x < 0 ? 1 : 0 into (unsigned) x >> 31.  */
-  if ((code == LT || code == GE)
-      && data_mode == mode
-      && cop1 == CONST0_RTX (mode)
-      && operands[1 + (code == LT)] == CONST0_RTX (data_mode)
-      && GET_MODE_UNIT_SIZE (data_mode) > 1
-      && GET_MODE_UNIT_SIZE (data_mode) <= 8
-      && (GET_MODE_SIZE (data_mode) == 16
-	  || (TARGET_AVX2 && GET_MODE_SIZE (data_mode) == 32)))
-    {
-      rtx negop = operands[2 - (code == LT)];
-      int shift = GET_MODE_UNIT_BITSIZE (data_mode) - 1;
-      if (negop == CONST1_RTX (data_mode))
-	{
-	  rtx res = expand_simple_binop (mode, LSHIFTRT, cop0, GEN_INT (shift),
-					 operands[0], 1, OPTAB_DIRECT);
-	  if (res != operands[0])
-	    emit_move_insn (operands[0], res);
-	  return true;
-	}
-      else if (GET_MODE_INNER (data_mode) != DImode
-	       && vector_all_ones_operand (negop, data_mode))
-	{
-	  rtx res = expand_simple_binop (mode, ASHIFTRT, cop0, GEN_INT (shift),
-					 operands[0], 0, OPTAB_DIRECT);
-	  if (res != operands[0])
-	    emit_move_insn (operands[0], res);
-	  return true;
-	}
-    }
-
-  if (!nonimmediate_operand (cop1, mode))
-    cop1 = force_reg (mode, cop1);
-  if (!general_operand (operands[1], data_mode))
-    operands[1] = force_reg (data_mode, operands[1]);
-  if (!general_operand (operands[2], data_mode))
-    operands[2] = force_reg (data_mode, operands[2]);
+  *negate = false;
 
   /* XOP supports all of the comparisons on all 128-bit vector int types.  */
   if (TARGET_XOP
@@ -21834,13 +21881,13 @@ ix86_expand_int_vcond (rtx operands[])
 	case LE:
 	case LEU:
 	  code = reverse_condition (code);
-	  negate = true;
+	  *negate = true;
 	  break;
 
 	case GE:
 	case GEU:
 	  code = reverse_condition (code);
-	  negate = true;
+	  *negate = true;
 	  /* FALLTHRU */
 
 	case LT:
@@ -21861,14 +21908,14 @@ ix86_expand_int_vcond (rtx operands[])
 	case EQ:
 	  /* SSE4.1 supports EQ.  */
 	  if (!TARGET_SSE4_1)
-	    return false;
+	    return NULL;
 	  break;
 
 	case GT:
 	case GTU:
 	  /* SSE4.2 supports GT/GTU.  */
 	  if (!TARGET_SSE4_2)
-	    return false;
+	    return NULL;
 	  break;
 
 	default:
@@ -21929,12 +21976,13 @@ ix86_expand_int_vcond (rtx operands[])
 	case V8HImode:
 	  /* Perform a parallel unsigned saturating subtraction.  */
 	  x = gen_reg_rtx (mode);
-	  emit_insn (gen_rtx_SET (x, gen_rtx_US_MINUS (mode, cop0, cop1)));
+	  emit_insn (gen_rtx_SET (x, gen_rtx_US_MINUS (mode, cop0,
+						       cop1)));
 
 	  cop0 = x;
 	  cop1 = CONST0_RTX (mode);
 	  code = EQ;
-	  negate = !negate;
+	  *negate = !*negate;
 	  break;
 
 	default:
@@ -21943,22 +21991,162 @@ ix86_expand_int_vcond (rtx operands[])
 	}
     }
 
+  if (*negate)
+    std::swap (op_true, op_false);
+
   /* Allow the comparison to be done in one mode, but the movcc to
      happen in another mode.  */
   if (data_mode == mode)
     {
-      x = ix86_expand_sse_cmp (operands[0], code, cop0, cop1,
-			       operands[1+negate], operands[2-negate]);
+      x = ix86_expand_sse_cmp (dest, code, cop0, cop1,
+			       op_true, op_false);
     }
   else
     {
       gcc_assert (GET_MODE_SIZE (data_mode) == GET_MODE_SIZE (mode));
       x = ix86_expand_sse_cmp (gen_reg_rtx (mode), code, cop0, cop1,
-			       operands[1+negate], operands[2-negate]);
+			       op_true, op_false);
       if (GET_MODE (x) == mode)
 	x = gen_lowpart (data_mode, x);
     }
 
+  return x;
+}
+
+/* Expand integer vector comparison.  */
+
+bool
+ix86_expand_int_vec_cmp (rtx operands[])
+{
+  rtx_code code = GET_CODE (operands[1]);
+  bool negate = false;
+  rtx cmp = ix86_expand_int_sse_cmp (operands[0], code, operands[2],
+				     operands[3], NULL, NULL, &negate);
+
+  if (!cmp)
+    return false;
+
+  if (negate)
+    cmp = ix86_expand_int_sse_cmp (operands[0], EQ, cmp,
+				   CONST0_RTX (GET_MODE (cmp)),
+				   NULL, NULL, &negate);
+
+  gcc_assert (!negate);
+
+  if (operands[0] != cmp)
+    emit_move_insn (operands[0], cmp);
+
+  return true;
+}
+
+/* Expand a floating-point vector conditional move; a vcond operation
+   rather than a movcc operation.  */
+
+bool
+ix86_expand_fp_vcond (rtx operands[])
+{
+  enum rtx_code code = GET_CODE (operands[3]);
+  rtx cmp;
+
+  code = ix86_prepare_sse_fp_compare_args (operands[0], code,
+					   &operands[4], &operands[5]);
+  if (code == UNKNOWN)
+    {
+      rtx temp;
+      switch (GET_CODE (operands[3]))
+	{
+	case LTGT:
+	  temp = ix86_expand_sse_cmp (operands[0], ORDERED, operands[4],
+				      operands[5], operands[0], operands[0]);
+	  cmp = ix86_expand_sse_cmp (operands[0], NE, operands[4],
+				     operands[5], operands[1], operands[2]);
+	  code = AND;
+	  break;
+	case UNEQ:
+	  temp = ix86_expand_sse_cmp (operands[0], UNORDERED, operands[4],
+				      operands[5], operands[0], operands[0]);
+	  cmp = ix86_expand_sse_cmp (operands[0], EQ, operands[4],
+				     operands[5], operands[1], operands[2]);
+	  code = IOR;
+	  break;
+	default:
+	  gcc_unreachable ();
+	}
+      cmp = expand_simple_binop (GET_MODE (cmp), code, temp, cmp, cmp, 1,
+				 OPTAB_DIRECT);
+      ix86_expand_sse_movcc (operands[0], cmp, operands[1], operands[2]);
+      return true;
+    }
+
+  if (ix86_expand_sse_fp_minmax (operands[0], code, operands[4],
+				 operands[5], operands[1], operands[2]))
+    return true;
+
+  cmp = ix86_expand_sse_cmp (operands[0], code, operands[4], operands[5],
+			     operands[1], operands[2]);
+  ix86_expand_sse_movcc (operands[0], cmp, operands[1], operands[2]);
+  return true;
+}
+
+/* Expand a signed/unsigned integral vector conditional move.  */
+
+bool
+ix86_expand_int_vcond (rtx operands[])
+{
+  machine_mode data_mode = GET_MODE (operands[0]);
+  machine_mode mode = GET_MODE (operands[4]);
+  enum rtx_code code = GET_CODE (operands[3]);
+  bool negate = false;
+  rtx x, cop0, cop1;
+
+  cop0 = operands[4];
+  cop1 = operands[5];
+
+  /* Try to optimize x < 0 ? -1 : 0 into (signed) x >> 31
+     and x < 0 ? 1 : 0 into (unsigned) x >> 31.  */
+  if ((code == LT || code == GE)
+      && data_mode == mode
+      && cop1 == CONST0_RTX (mode)
+      && operands[1 + (code == LT)] == CONST0_RTX (data_mode)
+      && GET_MODE_UNIT_SIZE (data_mode) > 1
+      && GET_MODE_UNIT_SIZE (data_mode) <= 8
+      && (GET_MODE_SIZE (data_mode) == 16
+	  || (TARGET_AVX2 && GET_MODE_SIZE (data_mode) == 32)))
+    {
+      rtx negop = operands[2 - (code == LT)];
+      int shift = GET_MODE_UNIT_BITSIZE (data_mode) - 1;
+      if (negop == CONST1_RTX (data_mode))
+	{
+	  rtx res = expand_simple_binop (mode, LSHIFTRT, cop0, GEN_INT (shift),
+					 operands[0], 1, OPTAB_DIRECT);
+	  if (res != operands[0])
+	    emit_move_insn (operands[0], res);
+	  return true;
+	}
+      else if (GET_MODE_INNER (data_mode) != DImode
+	       && vector_all_ones_operand (negop, data_mode))
+	{
+	  rtx res = expand_simple_binop (mode, ASHIFTRT, cop0, GEN_INT (shift),
+					 operands[0], 0, OPTAB_DIRECT);
+	  if (res != operands[0])
+	    emit_move_insn (operands[0], res);
+	  return true;
+	}
+    }
+
+  if (!nonimmediate_operand (cop1, mode))
+    cop1 = force_reg (mode, cop1);
+  if (!general_operand (operands[1], data_mode))
+    operands[1] = force_reg (data_mode, operands[1]);
+  if (!general_operand (operands[2], data_mode))
+    operands[2] = force_reg (data_mode, operands[2]);
+
+  x = ix86_expand_int_sse_cmp (operands[0], code, cop0, cop1,
+			       operands[1], operands[2], &negate);
+
+  if (!x)
+    return false;
+
   ix86_expand_sse_movcc (operands[0], x, operands[1+negate],
 			 operands[2-negate]);
 
   return true;
@@ -51678,6 +51866,30 @@ ix86_autovectorize_vector_sizes (void)
   (TARGET_AVX && !TARGET_PREFER_AVX128) ? 32 | 16 : 0;
 }
 
+/* Implementation of targetm.vectorize.get_mask_mode.  */
+
+static machine_mode
+ix86_get_mask_mode (unsigned nunits, unsigned vector_size)
+{
+  /* Scalar mask case.  */
+  if (TARGET_AVX512F && vector_size == 64)
+    {
+      unsigned elem_size = vector_size / nunits;
+      if ((vector_size == 64 || TARGET_AVX512VL)
+	  && ((elem_size == 4 || elem_size == 8)
+	      || TARGET_AVX512BW))
+	return smallest_mode_for_size (nunits, MODE_INT);
+    }
+
+  unsigned elem_size = vector_size / nunits;
+  machine_mode elem_mode
+    = smallest_mode_for_size (elem_size * BITS_PER_UNIT, MODE_INT);
+
+  gcc_assert (elem_size * nunits == vector_size);
+
+  return mode_for_vector (elem_mode, nunits);
+}
+
 /* Return class of registers which could be used for pseudo of MODE
@@ -52612,6 +52824,8 @@ ix86_operands_ok_for_move_multiple (rtx *operands, bool load,
 #undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES
 #define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES \
   ix86_autovectorize_vector_sizes
+#undef TARGET_VECTORIZE_GET_MASK_MODE
+#define TARGET_VECTORIZE_GET_MASK_MODE ix86_get_mask_mode
 #undef TARGET_VECTORIZE_INIT_COST
 #define TARGET_VECTORIZE_INIT_COST ix86_init_cost
 #undef TARGET_VECTORIZE_ADD_STMT_COST
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 4535570..a8d55cc 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -605,6 +605,15 @@
    (V16SF "HI") (V8SF  "QI") (V4SF  "QI")
    (V8DF  "QI") (V4DF  "QI") (V2DF  "QI")])
 
+;; Mapping of vector modes to corresponding mask size
+(define_mode_attr avx512fmaskmodelower
+  [(V64QI "di") (V32QI "si") (V16QI "hi")
+   (V32HI "si") (V16HI "hi") (V8HI "qi") (V4HI "qi")
+   (V16SI "hi") (V8SI "qi") (V4SI "qi")
+   (V8DI "qi") (V4DI "qi") (V2DI "qi")
+   (V16SF "hi") (V8SF "qi") (V4SF "qi")
+   (V8DF "qi") (V4DF "qi") (V2DF "qi")])
+
 ;; Mapping of vector float modes to an integer mode of the same size
 (define_mode_attr sseintvecmode
   [(V16SF "V16SI") (V8DF "V8DI")
@@ -2803,6 +2812,150 @@
 	   (const_string "0")))
    (set_attr "mode" "<MODE>")])
 
+(define_expand "vec_cmp<mode><avx512fmaskmodelower>"
+  [(set (match_operand:<avx512fmaskmode> 0 "register_operand")
+	(match_operator:<avx512fmaskmode> 1 ""
+	  [(match_operand:V48_AVX512VL 2 "register_operand")
+	   (match_operand:V48_AVX512VL 3 "nonimmediate_operand")]))]
+  "TARGET_AVX512F"
+{
+  bool ok = ix86_expand_mask_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmp<mode><avx512fmaskmodelower>"
+  [(set (match_operand:<avx512fmaskmode> 0 "register_operand")
+	(match_operator:<avx512fmaskmode> 1 ""
+	  [(match_operand:VI12_AVX512VL 2 "register_operand")
+	   (match_operand:VI12_AVX512VL 3 "nonimmediate_operand")]))]
+  "TARGET_AVX512BW"
+{
+  bool ok = ix86_expand_mask_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmp<mode><sseintvecmodelower>"
+  [(set (match_operand:<sseintvecmode> 0 "register_operand")
+	(match_operator:<sseintvecmode> 1 ""
+	  [(match_operand:VI_256 2 "register_operand")
+	   (match_operand:VI_256 3 "nonimmediate_operand")]))]
+  "TARGET_AVX2"
+{
+  bool ok = ix86_expand_int_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmp<mode><sseintvecmodelower>"
+  [(set (match_operand:<sseintvecmode> 0 "register_operand")
+	(match_operator:<sseintvecmode> 1 ""
+	  [(match_operand:VI124_128 2 "register_operand")
+	   (match_operand:VI124_128 3 "nonimmediate_operand")]))]
+  "TARGET_SSE2"
+{
+  bool ok = ix86_expand_int_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmpv2div2di"
+  [(set (match_operand:V2DI 0 "register_operand")
+	(match_operator:V2DI 1 ""
+	  [(match_operand:V2DI 2 "register_operand")
+	   (match_operand:V2DI 3 "nonimmediate_operand")]))]
+  "TARGET_SSE4_2"
+{
+  bool ok = ix86_expand_int_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmp<mode><sseintvecmodelower>"
+  [(set (match_operand:<sseintvecmode> 0 "register_operand")
+	(match_operator:<sseintvecmode> 1 ""
+	  [(match_operand:VF_256 2 "register_operand")
+	   (match_operand:VF_256 3 "nonimmediate_operand")]))]
+  "TARGET_AVX"
+{
+  bool ok = ix86_expand_fp_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmp<mode><sseintvecmodelower>"
+  [(set (match_operand:<sseintvecmode> 0 "register_operand")
+	(match_operator:<sseintvecmode> 1 ""
+	  [(match_operand:VF_128 2 "register_operand")
+	   (match_operand:VF_128 3 "nonimmediate_operand")]))]
+  "TARGET_SSE"
+{
+  bool ok = ix86_expand_fp_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmpu<mode><avx512fmaskmodelower>"
+  [(set (match_operand:<avx512fmaskmode> 0 "register_operand")
+	(match_operator:<avx512fmaskmode> 1 ""
+	  [(match_operand:VI48_AVX512VL 2 "register_operand")
+	   (match_operand:VI48_AVX512VL 3 "nonimmediate_operand")]))]
+  "TARGET_AVX512F"
+{
+  bool ok = ix86_expand_mask_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmpu<mode><avx512fmaskmodelower>"
+  [(set (match_operand:<avx512fmaskmode> 0 "register_operand")
+	(match_operator:<avx512fmaskmode> 1 ""
+	  [(match_operand:VI12_AVX512VL 2 "register_operand")
+	   (match_operand:VI12_AVX512VL 3 "nonimmediate_operand")]))]
+  "TARGET_AVX512BW"
+{
+  bool ok = ix86_expand_mask_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmpu<mode><sseintvecmodelower>"
+  [(set (match_operand:<sseintvecmode> 0 "register_operand")
+	(match_operator:<sseintvecmode> 1 ""
+	  [(match_operand:VI_256 2 "register_operand")
+	   (match_operand:VI_256 3 "nonimmediate_operand")]))]
+  "TARGET_AVX2"
+{
+  bool ok = ix86_expand_int_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmpu<mode><sseintvecmodelower>"
+  [(set (match_operand:<sseintvecmode> 0 "register_operand")
+	(match_operator:<sseintvecmode> 1 ""
+	  [(match_operand:VI124_128 2 "register_operand")
+	   (match_operand:VI124_128 3 "nonimmediate_operand")]))]
+  "TARGET_SSE2"
+{
+  bool ok = ix86_expand_int_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmpuv2div2di"
+  [(set (match_operand:V2DI 0 "register_operand")
+	(match_operator:V2DI 1 ""
+	  [(match_operand:V2DI 2 "register_operand")
+	   (match_operand:V2DI 3 "nonimmediate_operand")]))]
+  "TARGET_SSE4_2"
+{
+  bool ok = ix86_expand_int_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
 (define_expand "vcond<V_512:mode><VF_512:mode>"
   [(set (match_operand:V_512 0 "register_operand")
 	(if_then_else:V_512
@@ -17895,7 +18048,7 @@
    (set_attr "btver2_decode" "vector")
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_expand "maskload<mode>"
+(define_expand "maskload<mode><sseintvecmodelower>"
   [(set (match_operand:V48_AVX2 0 "register_operand")
 	(unspec:V48_AVX2
 	  [(match_operand:<sseintvecmode> 2 "register_operand")
@@ -17903,7 +18056,23 @@
 	  UNSPEC_MASKMOV))]
   "TARGET_AVX")
 
-(define_expand "maskstore<mode>"
+(define_expand "maskload<mode><avx512fmaskmodelower>"
+  [(set (match_operand:V48_AVX512VL 0 "register_operand")
+	(vec_merge:V48_AVX512VL
+	  (match_operand:V48_AVX512VL 1 "memory_operand")
+	  (match_dup 0)
+	  (match_operand:<avx512fmaskmode> 2 "register_operand")))]
+  "TARGET_AVX512F")
+
+(define_expand "maskload<mode><avx512fmaskmodelower>"
+  [(set (match_operand:VI12_AVX512VL 0 "register_operand")
+	(vec_merge:VI12_AVX512VL
+	  (match_operand:VI12_AVX512VL 1 "memory_operand")
+	  (match_dup 0)
+	  (match_operand:<avx512fmaskmode> 2 "register_operand")))]
+  "TARGET_AVX512BW")
+
+(define_expand "maskstore<mode><sseintvecmodelower>"
   [(set (match_operand:V48_AVX2 0 "memory_operand")
 	(unspec:V48_AVX2
 	  [(match_operand:<sseintvecmode> 2 "register_operand")
@@ -17912,6 +18081,22 @@
 	  UNSPEC_MASKMOV))]
   "TARGET_AVX")
 
+(define_expand "maskstore<mode><avx512fmaskmodelower>"
+  [(set (match_operand:V48_AVX512VL 0 "memory_operand")
+	(vec_merge:V48_AVX512VL
+	  (match_operand:V48_AVX512VL 1 "register_operand")
+	  (match_dup 0)
+	  (match_operand:<avx512fmaskmode> 2 "register_operand")))]
+  "TARGET_AVX512F")
+
+(define_expand "maskstore<mode><avx512fmaskmodelower>"
+  [(set (match_operand:VI12_AVX512VL 0 "memory_operand")
+	(vec_merge:VI12_AVX512VL
+	  (match_operand:VI12_AVX512VL 1 "register_operand")
+	  (match_dup 0)
+	  (match_operand:<avx512fmaskmode> 2 "register_operand")))]
+  "TARGET_AVX512BW")
+
 (define_insn_and_split "avx_<castmode><avxsizesuffix>_<castmode>"
   [(set (match_operand:AVX256MODE2P 0 "nonimmediate_operand" "=x,m")
 	(unspec:AVX256MODE2P