On Thu, 5 Sep 2019 at 14:29, Richard Sandiford <richard.sandif...@arm.com> wrote: > > Sorry for the slow reply. > > Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> writes: > > On Fri, 30 Aug 2019 at 16:15, Richard Biener <richard.guent...@gmail.com> > > wrote: > >> > >> On Wed, Aug 28, 2019 at 11:02 AM Richard Sandiford > >> <richard.sandif...@arm.com> wrote: > >> > > >> > Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> writes: > >> > > On Tue, 27 Aug 2019 at 21:14, Richard Sandiford > >> > > <richard.sandif...@arm.com> wrote: > >> > >> > >> > >> Richard should have the final say, but some comments... > >> > >> > >> > >> Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> writes: > >> > >> > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c > >> > >> > index 1e2dfe5d22d..862206b3256 100644 > >> > >> > --- a/gcc/tree-vect-stmts.c > >> > >> > +++ b/gcc/tree-vect-stmts.c > >> > >> > @@ -1989,17 +1989,31 @@ check_load_store_masking (loop_vec_info > >> > >> > loop_vinfo, tree vectype, > >> > >> > > >> > >> > static tree > >> > >> > prepare_load_store_mask (tree mask_type, tree loop_mask, tree > >> > >> > vec_mask, > >> > >> > - gimple_stmt_iterator *gsi) > >> > >> > + gimple_stmt_iterator *gsi, tree mask, > >> > >> > + cond_vmask_map_type *cond_to_vec_mask) > >> > >> > >> > >> "scalar_mask" might be a better name. But maybe we should key off the > >> > >> vector mask after all, now that we're relying on the code having no > >> > >> redundancies. > >> > >> > >> > >> Passing the vinfo would be better than passing the cond_vmask_map_type > >> > >> directly. > >> > >> > >> > >> > { > >> > >> > gcc_assert (useless_type_conversion_p (mask_type, TREE_TYPE > >> > >> > (vec_mask))); > >> > >> > if (!loop_mask) > >> > >> > return vec_mask; > >> > >> > > >> > >> > gcc_assert (TREE_TYPE (loop_mask) == mask_type); > >> > >> > + > >> > >> > + tree *slot = 0; > >> > >> > + if (cond_to_vec_mask) > >> > >> > >> > >> The pointer should never be null in this context. > >> > > Disabling check for NULL results in segfault with cond_arith_4.c > >> > > because we > >> > > reach prepare_load_store_mask via vect_schedule_slp, called from > >> > > here in vect_transform_loop: > >> > > /* Schedule the SLP instances first, then handle loop vectorization > >> > > below. */ > >> > > if (!loop_vinfo->slp_instances.is_empty ()) > >> > > { > >> > > DUMP_VECT_SCOPE ("scheduling SLP instances"); > >> > > vect_schedule_slp (loop_vinfo); > >> > > } > >> > > > >> > > which is before bb processing loop. > >> > > >> > We want this optimisation to be applied to SLP too though. Especially > >> > since non-SLP will be going away at some point. > >> > > >> > But as Richard says, the problem with SLP is that the statements aren't > >> > traversed in block order, so I guess we can't do the on-the-fly > >> > redundancy elimination there... > >> > >> And the current patch AFAICS can generate wrong SSA for this reason. > >> > >> > Maybe an alternative would be to record during the analysis phase which > >> > scalar conditions need which loop masks. Statements that need a loop > >> > mask currently do: > >> > > >> > vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype); > >> > > >> > If we also pass the scalar condition, we can maintain a hash_set of > >> > <condition, ncopies> pairs, representing the conditions that have > >> > loop masks applied at some point in the vectorised code. The COND_EXPR > >> > code can use that set to decide whether to apply the loop mask or not. > >> > >> Yeah, that sounds better. 
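(For context, here is a minimal compilable example of the kind of loop this
targets -- essentially one instantiation of the cond_convert_1.c shape
discussed further down -- where the same scalar condition guards a masked
load and the final COND_EXPR:

  void
  f (float *restrict r, int *restrict a, float *restrict b,
     int *restrict pred, int n)
  {
    for (int i = 0; i < n; ++i)
      {
        float bi = b[i];
        /* Only the a[i] load is conditional; both it and the select
           are controlled by pred[i].  */
        r[i] = pred[i] ? (float) a[i] : bi;
      }
  }

In a fully-masked loop the a[i] load already uses (pred != 0) AND loop_mask,
so once the COND_EXPR's compare is ANDed with the loop mask as well, the two
masks become identical, fre keeps a single predicate and no separate sel is
needed.)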
> >> > >> Note that I don't like the extra "helpers" in fold-const.c/h, they do not > >> look > >> useful in general so put them into vectorizer private code. The > >> decomposing > >> also doesn't look too nice, instead prepare_load_store_mask could get > >> such decomposed representation - possibly quite natural with the suggestion > >> from Richard above. > > Hi, > > Thanks for the suggestions, I have an attached updated patch, that > > tries to address above suggestions. > > With patch, we manage to use same predicate for both tests in PR, and > > the redundant AND ops are eliminated > > by fre4. > > > > I have a few doubts: > > 1] I moved tree_cond_ops into tree-vectorizer.[ch], I will get rid of > > it in follow up patch. > > I am not sure what to pass as def of scalar condition (scalar_mask) to > > vect_record_loop_mask > > from vectorizable_store, vectorizable_reduction and > > vectorizable_live_operation ? In the patch, > > I just passed NULL. > > For vectorizable_store this is just "mask", like for vectorizable_load. > Passing NULL looks right for the other two. (Nit, GCC style is to use > NULL rather than 0.) > > > 2] Do changes to vectorizable_condition and > > vectorizable_condition_apply_loop_mask look OK ? > > Some comments below. > > > 3] The patch additionally regresses following tests (apart from fmla_2.c): > > FAIL: gcc.target/aarch64/sve/cond_convert_1.c -march=armv8.2-a+sve > > scan-assembler-not \\tsel\\t > > FAIL: gcc.target/aarch64/sve/cond_convert_4.c -march=armv8.2-a+sve > > scan-assembler-not \\tsel\\t > > FAIL: gcc.target/aarch64/sve/cond_unary_2.c -march=armv8.2-a+sve > > scan-assembler-not \\tsel\\t > > FAIL: gcc.target/aarch64/sve/cond_unary_2.c -march=armv8.2-a+sve > > scan-assembler-times \\tmovprfx\\t > > [...] > > For cond_convert_1.c, I think it would be OK to change the test to: > > for (int i = 0; i < n; ++i) \ > { \ > FLOAT_TYPE bi = b[i]; \ > r[i] = pred[i] ? (FLOAT_TYPE) a[i] : bi; \ > } \ > > so that only the a[i] load is conditional. Same for the other two. > > I think originally I had to write it this way precisely because > we didn't have the optimisation you're adding, so this is actually > a good sign :-) > > > @@ -8313,7 +8313,7 @@ vect_double_mask_nunits (tree type) > > > > void > > vect_record_loop_mask (loop_vec_info loop_vinfo, vec_loop_masks *masks, > > - unsigned int nvectors, tree vectype) > > + unsigned int nvectors, tree vectype, tree scalar_mask) > > { > > gcc_assert (nvectors != 0); > > if (masks->length () < nvectors) > > New parameter needs documentation. > > > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c > > index dd9d45a9547..49ea86a0680 100644 > > --- a/gcc/tree-vect-stmts.c > > +++ b/gcc/tree-vect-stmts.c > > @@ -1888,7 +1888,7 @@ static void > > check_load_store_masking (loop_vec_info loop_vinfo, tree vectype, > > vec_load_store_type vls_type, int group_size, > > vect_memory_access_type memory_access_type, > > - gather_scatter_info *gs_info) > > + gather_scatter_info *gs_info, tree scalar_mask) > > { > > /* Invariant loads need no special support. */ > > if (memory_access_type == VMAT_INVARIANT) > > Same here. > > > @@ -9763,6 +9765,29 @@ vect_is_simple_cond (tree cond, vec_info *vinfo, > > return true; > > } > > > > +static void > > +vectorizable_condition_apply_loop_mask (tree &vec_compare, > > + gimple_stmt_iterator *&gsi, > > + stmt_vec_info &stmt_info, > > + tree loop_mask, > > + tree vec_cmp_type) > > Function needs a comment. 
> > I think it'd be better to return the new mask and not make vec_compare > a reference. stmt_info shouldn't need to be a reference either (it's > just a pointer type). > > > +{ > > + if (COMPARISON_CLASS_P (vec_compare)) > > + { > > + tree tmp = make_ssa_name (vec_cmp_type); > > + gassign *g = gimple_build_assign (tmp, TREE_CODE (vec_compare), > > + TREE_OPERAND (vec_compare, 0), > > + TREE_OPERAND (vec_compare, 1)); > > + vect_finish_stmt_generation (stmt_info, g, gsi); > > + vec_compare = tmp; > > + } > > + > > + tree tmp2 = make_ssa_name (vec_cmp_type); > > + gassign *g = gimple_build_assign (tmp2, BIT_AND_EXPR, vec_compare, > > loop_mask); > > + vect_finish_stmt_generation (stmt_info, g, gsi); > > + vec_compare = tmp2; > > +} > > + > > /* vectorizable_condition. > > > > Check if STMT_INFO is conditional modify expression that can be > > vectorized. > > @@ -9975,6 +10000,36 @@ vectorizable_condition (stmt_vec_info stmt_info, > > gimple_stmt_iterator *gsi, > > /* Handle cond expr. */ > > for (j = 0; j < ncopies; j++) > > { > > + tree loop_mask = NULL_TREE; > > + > > + if (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)) > > + { > > + scalar_cond_masked_key cond (cond_expr, ncopies); > > + if (loop_vinfo->scalar_cond_masked_set->contains (cond)) > > Nit: untabified line. > > > + { > > + scalar_cond_masked_key cond (cond_expr, ncopies); > > + if (loop_vinfo->scalar_cond_masked_set->contains (cond)) > > This "if" looks redundant -- isn't the condition the same as above? Oops sorry, probably a copy-paste typo -;) > > > + { > > + vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo); > > + loop_mask = vect_get_loop_mask (gsi, masks, ncopies, > > vectype, j); > > + } > > + } > > + else > > + { > > + cond.cond_ops.code > > + = invert_tree_comparison (cond.cond_ops.code, true); > > Would be better to pass an HONOR_NANS value instead of "true". > > > + if (loop_vinfo->scalar_cond_masked_set->contains (cond)) > > + { > > + vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo); > > + loop_mask = vect_get_loop_mask (gsi, masks, ncopies, > > vectype, j); > > + std::swap (then_clause, else_clause); > > + cond_code = cond.cond_ops.code; > > + cond_expr = build2 (cond_code, TREE_TYPE (cond_expr), > > + then_clause, else_clause); > > Rather than do the swap here and build a new tree, I think it would be > better to set a boolean that indicates that the then and else are swapped. > Then we can conditionally swap them after: > > vec_then_clause = vec_oprnds2[i]; > vec_else_clause = vec_oprnds3[i]; > > > [...] > > diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c > > index dc181524744..794e65f0007 100644 > > --- a/gcc/tree-vectorizer.c > > +++ b/gcc/tree-vectorizer.c > > @@ -464,6 +464,7 @@ vec_info::vec_info (vec_info::vec_kind kind_in, void > > *target_cost_data_in, > > target_cost_data (target_cost_data_in) > > { > > stmt_vec_infos.create (50); > > + scalar_cond_masked_set = new scalar_cond_masked_set_type (); > > } > > > > vec_info::~vec_info () > > @@ -476,6 +477,8 @@ vec_info::~vec_info () > > > > destroy_cost_data (target_cost_data); > > free_stmt_vec_infos (); > > + delete scalar_cond_masked_set; > > + scalar_cond_masked_set = 0; > > } > > > > vec_info_shared::vec_info_shared () > > No need to assign null here, since we're at the end of the destructor. > But maybe scalar_cond_masked_set should be "scalar_cond_masked_set_type" > rather than "scalar_cond_masked_set_type *", if the object is going to > have the same lifetime as the vec_info anyway. > > Looks good otherwise. 
> I skipped over the tree_cond_ops bit given your comment above that this
> was temporary.

Thanks for the suggestions, I tried addressing them in the attached patch.
Does it look OK?
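To summarise the approach in the attached patch (sketch only, the real code
is in the diff below): vect_record_loop_mask now takes the scalar condition
and records it together with the number of vectors:

  scalar_cond_masked_key cond (scalar_mask, nvectors);
  loop_vinfo->scalar_cond_masked_set.add (cond);

and vectorizable_condition looks the condition up in that set (also trying
the inverted comparison, in which case the then/else clauses are swapped).
On a hit it ANDs the vectorised compare with the loop mask, so fre can share
the resulting predicate with the one already used for the masked load/store.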
With the patch, only the following FAIL remains for aarch64-sve.exp:

FAIL: gcc.target/aarch64/sve/cond_unary_2.c -march=armv8.2-a+sve scan-assembler-times \\tmovprfx\\t 6

The test expects 6 movprfx instructions, but the output now contains 14.
Should I adjust the test, assuming the change isn't a regression?

Thanks,
Prathamesh

>
> Thanks,
> Richard
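(If adjusting cond_unary_2.c turns out to be the right call, I would simply
bump the expected count, along the lines of:

  /* { dg-final { scan-assembler-times {\tmovprfx\t} 14 } } */

assuming 14 is the count we actually want rather than a symptom of a
regression.)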
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_1.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_1.c index 69468eb69be..d2ffcc758f3 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_1.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_1.c @@ -11,7 +11,10 @@ INT_TYPE *__restrict pred, int n) \ { \ for (int i = 0; i < n; ++i) \ - r[i] = pred[i] ? (FLOAT_TYPE) a[i] : b[i]; \ + { \ + FLOAT_TYPE bi = b[i]; \ + r[i] = pred[i] ? (FLOAT_TYPE) a[i] : bi; \ + } \ } #define TEST_ALL(T) \ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_4.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_4.c index 55b535fa0cf..d55aef0bb9a 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_4.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_4.c @@ -11,7 +11,10 @@ INT_TYPE *__restrict pred, int n) \ { \ for (int i = 0; i < n; ++i) \ - r[i] = pred[i] ? (INT_TYPE) a[i] : b[i]; \ + { \ + INT_TYPE bi = b[i]; \ + r[i] = pred[i] ? (INT_TYPE) a[i] : bi; \ + } \ } #define TEST_ALL(T) \ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_2.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_2.c index adf828398bb..f17480fb2f2 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_2.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_2.c @@ -13,7 +13,10 @@ TYPE *__restrict pred, int n) \ { \ for (int i = 0; i < n; ++i) \ - r[i] = pred[i] ? OP (a[i]) : b[i]; \ + { \ + TYPE bi = b[i]; \ + r[i] = pred[i] ? OP (a[i]) : bi; \ + } \ } #define TEST_INT_TYPE(T, TYPE) \ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fmla_2.c b/gcc/testsuite/gcc.target/aarch64/sve/fmla_2.c index 5c04bcdb3f5..a1b0667dab5 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/fmla_2.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/fmla_2.c @@ -15,5 +15,9 @@ f (double *restrict a, double *restrict b, double *restrict c, } } -/* { dg-final { scan-assembler-times {\tfmla\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, z[0-9]+\.d\n} 2 } } */ +/* See https://gcc.gnu.org/ml/gcc-patches/2019-08/msg01644.html + for XFAILing the below test. */ + +/* { dg-final { scan-assembler-times {\tfmla\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, z[0-9]+\.d\n} 2 { xfail *-*-* } } } */ +/* { dg-final { scan-assembler-times {\tfmla\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, z[0-9]+\.d\n} 3 } } */ /* { dg-final { scan-assembler-not {\tfmad\t} } } */ diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index b0cbbac0cb5..d869dfabeb0 100644 --- a/gcc/tree-vect-loop.c +++ b/gcc/tree-vect-loop.c @@ -7197,7 +7197,7 @@ vectorizable_reduction (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi, } else vect_record_loop_mask (loop_vinfo, masks, ncopies * vec_num, - vectype_in); + vectype_in, NULL); } if (dump_enabled_p () && reduction_type == FOLD_LEFT_REDUCTION) @@ -8110,7 +8110,7 @@ vectorizable_live_operation (stmt_vec_info stmt_info, gcc_assert (ncopies == 1 && !slp_node); vect_record_loop_mask (loop_vinfo, &LOOP_VINFO_MASKS (loop_vinfo), - 1, vectype); + 1, vectype, NULL); } } return true; @@ -8309,11 +8309,12 @@ vect_double_mask_nunits (tree type) /* Record that a fully-masked version of LOOP_VINFO would need MASKS to contain a sequence of NVECTORS masks that each control a vector of type - VECTYPE. */ + VECTYPE. SCALAR_MASK if non-null, represents the mask used for corresponding + load/store stmt. 
*/ void vect_record_loop_mask (loop_vec_info loop_vinfo, vec_loop_masks *masks, - unsigned int nvectors, tree vectype) + unsigned int nvectors, tree vectype, tree scalar_mask) { gcc_assert (nvectors != 0); if (masks->length () < nvectors) @@ -8329,6 +8330,12 @@ vect_record_loop_mask (loop_vec_info loop_vinfo, vec_loop_masks *masks, rgm->max_nscalars_per_iter = nscalars_per_iter; rgm->mask_type = build_same_sized_truth_vector_type (vectype); } + + if (scalar_mask) + { + scalar_cond_masked_key cond (scalar_mask, nvectors); + loop_vinfo->scalar_cond_masked_set.add (cond); + } } /* Given a complete set of masks MASKS, extract mask number INDEX diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c index dd9d45a9547..14c2fcb53a7 100644 --- a/gcc/tree-vect-stmts.c +++ b/gcc/tree-vect-stmts.c @@ -1879,7 +1879,8 @@ static tree permute_vec_elements (tree, tree, tree, stmt_vec_info, says how the load or store is going to be implemented and GROUP_SIZE is the number of load or store statements in the containing group. If the access is a gather load or scatter store, GS_INFO describes - its arguments. + its arguments. SCALAR_MASK is the scalar mask used for corresponding + load or store stmt. Clear LOOP_VINFO_CAN_FULLY_MASK_P if a fully-masked loop is not supported, otherwise record the required mask types. */ @@ -1888,7 +1889,7 @@ static void check_load_store_masking (loop_vec_info loop_vinfo, tree vectype, vec_load_store_type vls_type, int group_size, vect_memory_access_type memory_access_type, - gather_scatter_info *gs_info) + gather_scatter_info *gs_info, tree scalar_mask) { /* Invariant loads need no special support. */ if (memory_access_type == VMAT_INVARIANT) @@ -1912,7 +1913,7 @@ check_load_store_masking (loop_vec_info loop_vinfo, tree vectype, return; } unsigned int ncopies = vect_get_num_copies (loop_vinfo, vectype); - vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype); + vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype, scalar_mask); return; } @@ -1936,7 +1937,7 @@ check_load_store_masking (loop_vec_info loop_vinfo, tree vectype, return; } unsigned int ncopies = vect_get_num_copies (loop_vinfo, vectype); - vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype); + vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype, scalar_mask); return; } @@ -1974,7 +1975,7 @@ check_load_store_masking (loop_vec_info loop_vinfo, tree vectype, poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo); unsigned int nvectors; if (can_div_away_from_zero_p (group_size * vf, nunits, &nvectors)) - vect_record_loop_mask (loop_vinfo, masks, nvectors, vectype); + vect_record_loop_mask (loop_vinfo, masks, nvectors, vectype, scalar_mask); else gcc_unreachable (); } @@ -3436,7 +3437,9 @@ vectorizable_call (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi, unsigned int nvectors = (slp_node ? 
SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node) : ncopies); - vect_record_loop_mask (loop_vinfo, masks, nvectors, vectype_out); + tree scalar_mask = gimple_call_arg (stmt_info->stmt, mask_opno); + vect_record_loop_mask (loop_vinfo, masks, nvectors, + vectype_out, scalar_mask); } return true; } @@ -7390,7 +7393,7 @@ vectorizable_store (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi, if (loop_vinfo && LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo)) check_load_store_masking (loop_vinfo, vectype, vls_type, group_size, - memory_access_type, &gs_info); + memory_access_type, &gs_info, mask); STMT_VINFO_TYPE (stmt_info) = store_vec_info_type; vect_model_store_cost (stmt_info, ncopies, rhs_dt, memory_access_type, @@ -8637,7 +8640,7 @@ vectorizable_load (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi, if (loop_vinfo && LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo)) check_load_store_masking (loop_vinfo, vectype, VLS_LOAD, group_size, - memory_access_type, &gs_info); + memory_access_type, &gs_info, mask); STMT_VINFO_TYPE (stmt_info) = load_vec_info_type; vect_model_load_cost (stmt_info, ncopies, memory_access_type, @@ -9975,6 +9978,31 @@ vectorizable_condition (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi, /* Handle cond expr. */ for (j = 0; j < ncopies; j++) { + tree loop_mask = NULL_TREE; + bool swap_cond_operands = false; + + if (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)) + { + scalar_cond_masked_key cond (cond_expr, ncopies); + if (loop_vinfo->scalar_cond_masked_set.contains (cond)) + { + vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo); + loop_mask = vect_get_loop_mask (gsi, masks, ncopies, vectype, j); + } + else + { + cond.code = invert_tree_comparison (cond.code, + HONOR_NANS (TREE_TYPE (cond.op0))); + if (loop_vinfo->scalar_cond_masked_set.contains (cond)) + { + vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo); + loop_mask = vect_get_loop_mask (gsi, masks, ncopies, vectype, j); + cond_code = cond.code; + swap_cond_operands = true; + } + } + } + stmt_vec_info new_stmt_info = NULL; if (j == 0) { @@ -10052,6 +10080,9 @@ vectorizable_condition (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi, vec_then_clause = vec_oprnds2[i]; vec_else_clause = vec_oprnds3[i]; + if (swap_cond_operands) + std::swap (vec_then_clause, vec_else_clause); + if (masked) vec_compare = vec_cond_lhs; else @@ -10090,6 +10121,26 @@ vectorizable_condition (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi, } } } + + if (loop_mask) + { + if (COMPARISON_CLASS_P (vec_compare)) + { + tree tmp = make_ssa_name (vec_cmp_type); + gassign *g = gimple_build_assign (tmp, + TREE_CODE (vec_compare), + TREE_OPERAND (vec_compare, 0), + TREE_OPERAND (vec_compare, 1)); + vect_finish_stmt_generation (stmt_info, g, gsi); + vec_compare = tmp; + } + + tree tmp2 = make_ssa_name (vec_cmp_type); + gassign *g = gimple_build_assign (tmp2, BIT_AND_EXPR, vec_compare, loop_mask); + vect_finish_stmt_generation (stmt_info, g, gsi); + vec_compare = tmp2; + } + if (reduction_type == EXTRACT_LAST_REDUCTION) { if (!is_gimple_val (vec_compare)) diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c index dc181524744..c4b2d8e8647 100644 --- a/gcc/tree-vectorizer.c +++ b/gcc/tree-vectorizer.c @@ -1513,3 +1513,39 @@ make_pass_ipa_increase_alignment (gcc::context *ctxt) { return new pass_ipa_increase_alignment (ctxt); } + +/* If code(T) is comparison op or def of comparison stmt, + extract it's operands. + Else return <NE_EXPR, T, 0>. 
*/ + +void +scalar_cond_masked_key::get_cond_ops_from_tree (tree t) +{ + if (TREE_CODE_CLASS (TREE_CODE (t)) == tcc_comparison) + { + this->code = TREE_CODE (t); + this->op0 = TREE_OPERAND (t, 0); + this->op1 = TREE_OPERAND (t, 1); + return; + } + + if (TREE_CODE (t) == SSA_NAME) + { + gassign *stmt = dyn_cast<gassign *> (SSA_NAME_DEF_STMT (t)); + if (stmt) + { + tree_code code = gimple_assign_rhs_code (stmt); + if (TREE_CODE_CLASS (code) == tcc_comparison) + { + this->code = code; + this->op0 = gimple_assign_rhs1 (stmt); + this->op1 = gimple_assign_rhs2 (stmt); + return; + } + } + } + + this->code = NE_EXPR; + this->op0 = t; + this->op1 = build_zero_cst (TREE_TYPE (t)); +} diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h index 1456cde4c2c..e20a61ee33f 100644 --- a/gcc/tree-vectorizer.h +++ b/gcc/tree-vectorizer.h @@ -26,6 +26,7 @@ typedef class _stmt_vec_info *stmt_vec_info; #include "tree-data-ref.h" #include "tree-hash-traits.h" #include "target.h" +#include "hash-set.h" /* Used for naming of new temporaries. */ enum vect_var_kind { @@ -174,7 +175,71 @@ public: #define SLP_TREE_TWO_OPERATORS(S) (S)->two_operators #define SLP_TREE_DEF_TYPE(S) (S)->def_type +struct scalar_cond_masked_key +{ + scalar_cond_masked_key (tree t, unsigned ncopies_) + : ncopies (ncopies_) + { + get_cond_ops_from_tree (t); + } + + void get_cond_ops_from_tree (tree); + + unsigned ncopies; + tree_code code; + tree op0; + tree op1; +}; +template<> +struct default_hash_traits<scalar_cond_masked_key> +{ + typedef scalar_cond_masked_key compare_type; + typedef scalar_cond_masked_key value_type; + + static inline hashval_t + hash (value_type v) + { + inchash::hash h; + h.add_int (v.code); + inchash::add_expr (v.op0, h, 0); + inchash::add_expr (v.op1, h, 0); + h.add_int (v.ncopies); + return h.end (); + } + + static inline bool + equal (value_type existing, value_type candidate) + { + return (existing.ncopies == candidate.ncopies + && existing.code == candidate.code + && operand_equal_p (existing.op0, candidate.op0, 0) + && operand_equal_p (existing.op1, candidate.op1, 0)); + } + + static inline void + mark_empty (value_type &v) + { + v.ncopies = 0; + } + + static inline bool + is_empty (value_type v) + { + return v.ncopies == 0; + } + + static inline void mark_deleted (value_type &) {} + + static inline bool is_deleted (const value_type &) + { + return false; + } + + static inline void remove (value_type &) {} +}; + +typedef hash_set<scalar_cond_masked_key> scalar_cond_masked_set_type; /* Describes two objects whose addresses must be unequal for the vectorized loop to be valid. */ @@ -255,6 +320,9 @@ public: /* Cost data used by the target cost model. */ void *target_cost_data; + /* Set of scalar conditions that have loop mask applied. */ + scalar_cond_masked_set_type scalar_cond_masked_set; + private: stmt_vec_info new_stmt_vec_info (gimple *stmt); void set_vinfo_for_stmt (gimple *, stmt_vec_info); @@ -1617,7 +1685,7 @@ extern void vect_gen_vector_loop_niters (loop_vec_info, tree, tree *, extern tree vect_halve_mask_nunits (tree); extern tree vect_double_mask_nunits (tree); extern void vect_record_loop_mask (loop_vec_info, vec_loop_masks *, - unsigned int, tree); + unsigned int, tree, tree); extern tree vect_get_loop_mask (gimple_stmt_iterator *, vec_loop_masks *, unsigned int, tree, unsigned int);
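One note on the hash traits above: mark_empty/is_empty use ncopies == 0 as
the empty marker, which relies on real keys always having ncopies >= 1;
vect_record_loop_mask already asserts nvectors != 0, so that should hold.
An untested sketch of the intended behaviour, assuming cond_a and cond_b are
operand_equal_p-equal condition trees and ncopies is nonzero:

  scalar_cond_masked_set_type set;
  set.add (scalar_cond_masked_key (cond_a, ncopies));
  gcc_assert (set.contains (scalar_cond_masked_key (cond_b, ncopies)));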