Sorry for the slow reply.

Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> writes:
> On Fri, 30 Aug 2019 at 16:15, Richard Biener <richard.guent...@gmail.com> wrote:
>>
>> On Wed, Aug 28, 2019 at 11:02 AM Richard Sandiford
>> <richard.sandif...@arm.com> wrote:
>> >
>> > Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> writes:
>> > > On Tue, 27 Aug 2019 at 21:14, Richard Sandiford
>> > > <richard.sandif...@arm.com> wrote:
>> > >>
>> > >> Richard should have the final say, but some comments...
>> > >>
>> > >> Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> writes:
>> > >> > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
>> > >> > index 1e2dfe5d22d..862206b3256 100644
>> > >> > --- a/gcc/tree-vect-stmts.c
>> > >> > +++ b/gcc/tree-vect-stmts.c
>> > >> > @@ -1989,17 +1989,31 @@ check_load_store_masking (loop_vec_info loop_vinfo, tree vectype,
>> > >> >
>> > >> >  static tree
>> > >> >  prepare_load_store_mask (tree mask_type, tree loop_mask, tree vec_mask,
>> > >> > -                         gimple_stmt_iterator *gsi)
>> > >> > +                         gimple_stmt_iterator *gsi, tree mask,
>> > >> > +                         cond_vmask_map_type *cond_to_vec_mask)
>> > >>
>> > >> "scalar_mask" might be a better name.  But maybe we should key off the
>> > >> vector mask after all, now that we're relying on the code having no
>> > >> redundancies.
>> > >>
>> > >> Passing the vinfo would be better than passing the cond_vmask_map_type
>> > >> directly.
>> > >>
>> > >> > {
>> > >> >   gcc_assert (useless_type_conversion_p (mask_type, TREE_TYPE (vec_mask)));
>> > >> >   if (!loop_mask)
>> > >> >     return vec_mask;
>> > >> >
>> > >> >   gcc_assert (TREE_TYPE (loop_mask) == mask_type);
>> > >> > +
>> > >> > +  tree *slot = 0;
>> > >> > +  if (cond_to_vec_mask)
>> > >>
>> > >> The pointer should never be null in this context.
>> > > Disabling check for NULL results in segfault with cond_arith_4.c because we
>> > > reach prepare_load_store_mask via vect_schedule_slp, called from
>> > > here in vect_transform_loop:
>> > >   /* Schedule the SLP instances first, then handle loop vectorization
>> > >      below.  */
>> > >   if (!loop_vinfo->slp_instances.is_empty ())
>> > >     {
>> > >       DUMP_VECT_SCOPE ("scheduling SLP instances");
>> > >       vect_schedule_slp (loop_vinfo);
>> > >     }
>> > >
>> > > which is before bb processing loop.
>> >
>> > We want this optimisation to be applied to SLP too though.  Especially
>> > since non-SLP will be going away at some point.
>> >
>> > But as Richard says, the problem with SLP is that the statements aren't
>> > traversed in block order, so I guess we can't do the on-the-fly
>> > redundancy elimination there...
>>
>> And the current patch AFAICS can generate wrong SSA for this reason.
>>
>> > Maybe an alternative would be to record during the analysis phase which
>> > scalar conditions need which loop masks.  Statements that need a loop
>> > mask currently do:
>> >
>> >     vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype);
>> >
>> > If we also pass the scalar condition, we can maintain a hash_set of
>> > <condition, ncopies> pairs, representing the conditions that have
>> > loop masks applied at some point in the vectorised code.  The COND_EXPR
>> > code can use that set to decide whether to apply the loop mask or not.
>>
>> Yeah, that sounds better.
>>
>> Note that I don't like the extra "helpers" in fold-const.c/h, they do not look
>> useful in general so put them into vectorizer private code.  The decomposing
>> also doesn't look too nice, instead prepare_load_store_mask could get
>> such decomposed representation - possibly quite natural with the suggestion
>> from Richard above.
> Hi,
> Thanks for the suggestions, I have an attached updated patch, that
> tries to address above suggestions.
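For reference, the record-during-analysis idea quoted above can be sketched outside GCC with standard containers.  This is only an illustration of the data structure, not GCC code: the `tree` operands are stood in by strings, and all names here (`scalar_cond_masked_key`, `record_masked_cond`, `cond_uses_loop_mask_p`) are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <tuple>

// Stand-in for a decomposed scalar condition plus the ncopies it is
// vectorised with.  In GCC the operands would be trees, not strings.
struct scalar_cond_masked_key
{
  std::string code, op0, op1;   // e.g. "lt_expr", "a[i]", "b[i]"
  unsigned ncopies;
  bool operator< (const scalar_cond_masked_key &other) const
  {
    return std::tie (code, op0, op1, ncopies)
	   < std::tie (other.code, other.op0, other.op1, other.ncopies);
  }
};

typedef std::set<scalar_cond_masked_key> scalar_cond_masked_set_type;

// Analysis phase: a statement that needs a loop mask records the
// scalar condition it is masked under.
void
record_masked_cond (scalar_cond_masked_set_type &set,
		    const scalar_cond_masked_key &key)
{
  set.insert (key);
}

// Transform phase: the COND_EXPR code checks whether some masked
// statement already uses this <condition, ncopies> pair; if so,
// AND-ing the loop mask into the condition lets both statements
// share one predicate.
bool
cond_uses_loop_mask_p (const scalar_cond_masked_set_type &set,
		       const scalar_cond_masked_key &key)
{
  return set.count (key) != 0;
}
```

The point of keying on <condition, ncopies> rather than on vectorised masks is that the set can be filled during analysis, before any vector statements exist, which sidesteps the SLP statement-ordering problem.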
> With patch, we manage to use same predicate for both tests in PR, and
> the redundant AND ops are eliminated by fre4.
>
> I have a few doubts:
> 1] I moved tree_cond_ops into tree-vectorizer.[ch], I will get rid of
> it in follow up patch.
> I am not sure what to pass as def of scalar condition (scalar_mask) to
> vect_record_loop_mask from vectorizable_store, vectorizable_reduction
> and vectorizable_live_operation ?  In the patch, I just passed NULL.
For vectorizable_store this is just "mask", like for vectorizable_load.
Passing NULL looks right for the other two.

(Nit, GCC style is to use NULL rather than 0.)

> 2] Do changes to vectorizable_condition and
> vectorizable_condition_apply_loop_mask look OK ?

Some comments below.

> 3] The patch additionally regresses following tests (apart from fmla_2.c):
> FAIL: gcc.target/aarch64/sve/cond_convert_1.c -march=armv8.2-a+sve
>   scan-assembler-not \\tsel\\t
> FAIL: gcc.target/aarch64/sve/cond_convert_4.c -march=armv8.2-a+sve
>   scan-assembler-not \\tsel\\t
> FAIL: gcc.target/aarch64/sve/cond_unary_2.c -march=armv8.2-a+sve
>   scan-assembler-not \\tsel\\t
> FAIL: gcc.target/aarch64/sve/cond_unary_2.c -march=armv8.2-a+sve
>   scan-assembler-times \\tmovprfx\\t
> [...]

For cond_convert_1.c, I think it would be OK to change the test to:

  for (int i = 0; i < n; ++i)				\
    {							\
      FLOAT_TYPE bi = b[i];				\
      r[i] = pred[i] ? (FLOAT_TYPE) a[i] : bi;		\
    }							\

so that only the a[i] load is conditional.  Same for the other two.

I think originally I had to write it this way precisely because we
didn't have the optimisation you're adding, so this is actually a
good sign :-)

> @@ -8313,7 +8313,7 @@ vect_double_mask_nunits (tree type)
>
>  void
>  vect_record_loop_mask (loop_vec_info loop_vinfo, vec_loop_masks *masks,
> -                       unsigned int nvectors, tree vectype)
> +                       unsigned int nvectors, tree vectype, tree scalar_mask)
>  {
>    gcc_assert (nvectors != 0);
>    if (masks->length () < nvectors)

New parameter needs documentation.

> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> index dd9d45a9547..49ea86a0680 100644
> --- a/gcc/tree-vect-stmts.c
> +++ b/gcc/tree-vect-stmts.c
> @@ -1888,7 +1888,7 @@ static void
>  check_load_store_masking (loop_vec_info loop_vinfo, tree vectype,
>                            vec_load_store_type vls_type, int group_size,
>                            vect_memory_access_type memory_access_type,
> -                          gather_scatter_info *gs_info)
> +                          gather_scatter_info *gs_info, tree scalar_mask)
>  {
>    /* Invariant loads need no special support.
>       */
>    if (memory_access_type == VMAT_INVARIANT)

Same here.

> @@ -9763,6 +9765,29 @@ vect_is_simple_cond (tree cond, vec_info *vinfo,
>    return true;
>  }
>
> +static void
> +vectorizable_condition_apply_loop_mask (tree &vec_compare,
> +                                        gimple_stmt_iterator *&gsi,
> +                                        stmt_vec_info &stmt_info,
> +                                        tree loop_mask,
> +                                        tree vec_cmp_type)

Function needs a comment.

I think it'd be better to return the new mask and not make vec_compare
a reference.  stmt_info shouldn't need to be a reference either
(it's just a pointer type).

> +{
> +  if (COMPARISON_CLASS_P (vec_compare))
> +    {
> +      tree tmp = make_ssa_name (vec_cmp_type);
> +      gassign *g = gimple_build_assign (tmp, TREE_CODE (vec_compare),
> +                                        TREE_OPERAND (vec_compare, 0),
> +                                        TREE_OPERAND (vec_compare, 1));
> +      vect_finish_stmt_generation (stmt_info, g, gsi);
> +      vec_compare = tmp;
> +    }
> +
> +  tree tmp2 = make_ssa_name (vec_cmp_type);
> +  gassign *g = gimple_build_assign (tmp2, BIT_AND_EXPR, vec_compare, loop_mask);
> +  vect_finish_stmt_generation (stmt_info, g, gsi);
> +  vec_compare = tmp2;
> +}
> +
> /* vectorizable_condition.
>
>    Check if STMT_INFO is conditional modify expression that can be
>    vectorized.
> @@ -9975,6 +10000,36 @@ vectorizable_condition (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
>    /* Handle cond expr.  */
>    for (j = 0; j < ncopies; j++)
>      {
> +      tree loop_mask = NULL_TREE;
> +
> +      if (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
> +        {
> +          scalar_cond_masked_key cond (cond_expr, ncopies);
> +          if (loop_vinfo->scalar_cond_masked_set->contains (cond))

Nit: untabified line.

> +            {
> +              scalar_cond_masked_key cond (cond_expr, ncopies);
> +              if (loop_vinfo->scalar_cond_masked_set->contains (cond))

This "if" looks redundant -- isn't the condition the same as above?
> +                {
> +                  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> +                  loop_mask = vect_get_loop_mask (gsi, masks, ncopies, vectype, j);
> +                }
> +            }
> +          else
> +            {
> +              cond.cond_ops.code
> +                = invert_tree_comparison (cond.cond_ops.code, true);

Would be better to pass an HONOR_NANS value instead of "true".

> +              if (loop_vinfo->scalar_cond_masked_set->contains (cond))
> +                {
> +                  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> +                  loop_mask = vect_get_loop_mask (gsi, masks, ncopies, vectype, j);
> +                  std::swap (then_clause, else_clause);
> +                  cond_code = cond.cond_ops.code;
> +                  cond_expr = build2 (cond_code, TREE_TYPE (cond_expr),
> +                                      then_clause, else_clause);

Rather than do the swap here and build a new tree, I think it would be
better to set a boolean that indicates that the then and else are
swapped.  Then we can conditionally swap them after:

      vec_then_clause = vec_oprnds2[i];
      vec_else_clause = vec_oprnds3[i];

> [...]
> diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
> index dc181524744..794e65f0007 100644
> --- a/gcc/tree-vectorizer.c
> +++ b/gcc/tree-vectorizer.c
> @@ -464,6 +464,7 @@ vec_info::vec_info (vec_info::vec_kind kind_in, void *target_cost_data_in,
>    target_cost_data (target_cost_data_in)
>  {
>    stmt_vec_infos.create (50);
> +  scalar_cond_masked_set = new scalar_cond_masked_set_type ();
>  }
>
>  vec_info::~vec_info ()
> @@ -476,6 +477,8 @@ vec_info::~vec_info ()
>
>    destroy_cost_data (target_cost_data);
>    free_stmt_vec_infos ();
> +  delete scalar_cond_masked_set;
> +  scalar_cond_masked_set = 0;
>  }
>
>  vec_info_shared::vec_info_shared ()

No need to assign null here, since we're at the end of the destructor.

But maybe scalar_cond_masked_set should be "scalar_cond_masked_set_type"
rather than "scalar_cond_masked_set_type *", if the object is going to
have the same lifetime as the vec_info anyway.

Looks good otherwise.  I skipped over the tree_cond_ops bit given your
comment above that this was temporary.

Thanks,
Richard
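P.S. The swapped-then/else suggestion can be sketched in isolation.  This is a hypothetical illustration with plain enums standing in for GCC's tree codes (`invert_comparison`, `cond_state` and `get_then_else` are invented names, and NaN handling is ignored here, whereas GCC's invert_tree_comparison must be told whether NaNs can occur):

```cpp
#include <cassert>
#include <utility>

// Toy comparison codes; LT and GE are exact inverses for non-NaN data.
enum cmp_code { LT, GE };

// Invert the comparison so it matches the key already recorded in the
// masked-condition set.
cmp_code
invert_comparison (cmp_code code)
{
  return code == LT ? GE : LT;
}

// Instead of rebuilding the condition tree and swapping the clauses
// up front, remember the inversion in a flag...
struct cond_state
{
  cmp_code code;
  bool then_else_swapped;
};

// ...and swap the vectorised then/else operands only at the point
// where they are read back from vec_oprnds2/vec_oprnds3.
std::pair<int, int>
get_then_else (const cond_state &s, int vec_then, int vec_else)
{
  if (s.then_else_swapped)
    std::swap (vec_then, vec_else);
  return std::make_pair (vec_then, vec_else);
}
```

Deferring the swap this way avoids allocating a new COND_EXPR tree per copy and keeps the scalar condition unchanged for the remaining analysis.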