Sorry for the slow reply.

Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> writes:
> On Fri, 30 Aug 2019 at 16:15, Richard Biener <richard.guent...@gmail.com> wrote:
>>
>> On Wed, Aug 28, 2019 at 11:02 AM Richard Sandiford
>> <richard.sandif...@arm.com> wrote:
>> >
>> > Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> writes:
>> > > On Tue, 27 Aug 2019 at 21:14, Richard Sandiford
>> > > <richard.sandif...@arm.com> wrote:
>> > >>
>> > >> Richard should have the final say, but some comments...
>> > >>
>> > >> Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> writes:
>> > >> > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
>> > >> > index 1e2dfe5d22d..862206b3256 100644
>> > >> > --- a/gcc/tree-vect-stmts.c
>> > >> > +++ b/gcc/tree-vect-stmts.c
>> > >> > @@ -1989,17 +1989,31 @@ check_load_store_masking (loop_vec_info loop_vinfo, tree vectype,
>> > >> >
>> > >> >  static tree
>> > >> >  prepare_load_store_mask (tree mask_type, tree loop_mask, tree vec_mask,
>> > >> > -                      gimple_stmt_iterator *gsi)
>> > >> > +                      gimple_stmt_iterator *gsi, tree mask,
>> > >> > +                      cond_vmask_map_type *cond_to_vec_mask)
>> > >>
>> > >> "scalar_mask" might be a better name.  But maybe we should key off the
>> > >> vector mask after all, now that we're relying on the code having no
>> > >> redundancies.
>> > >>
>> > >> Passing the vinfo would be better than passing the cond_vmask_map_type
>> > >> directly.
>> > >>
>> > >> >  {
>> > >> >    gcc_assert (useless_type_conversion_p (mask_type, TREE_TYPE (vec_mask)));
>> > >> >    if (!loop_mask)
>> > >> >      return vec_mask;
>> > >> >
>> > >> >    gcc_assert (TREE_TYPE (loop_mask) == mask_type);
>> > >> > +
>> > >> > +  tree *slot = 0;
>> > >> > +  if (cond_to_vec_mask)
>> > >>
>> > >> The pointer should never be null in this context.
>> > > Disabling the check for NULL results in a segfault with cond_arith_4.c
>> > > because we reach prepare_load_store_mask via vect_schedule_slp, called
>> > > from here in vect_transform_loop:
>> > >  /* Schedule the SLP instances first, then handle loop vectorization
>> > >      below.  */
>> > >   if (!loop_vinfo->slp_instances.is_empty ())
>> > >     {
>> > >       DUMP_VECT_SCOPE ("scheduling SLP instances");
>> > >       vect_schedule_slp (loop_vinfo);
>> > >     }
>> > >
>> > > which is before the bb processing loop.
>> >
>> > We want this optimisation to be applied to SLP too though.  Especially
>> > since non-SLP will be going away at some point.
>> >
>> > But as Richard says, the problem with SLP is that the statements aren't
>> > traversed in block order, so I guess we can't do the on-the-fly
>> > redundancy elimination there...
>>
>> And the current patch AFAICS can generate wrong SSA for this reason.
>>
>> > Maybe an alternative would be to record during the analysis phase which
>> > scalar conditions need which loop masks.  Statements that need a loop
>> > mask currently do:
>> >
>> >       vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype);
>> >
>> > If we also pass the scalar condition, we can maintain a hash_set of
>> > <condition, ncopies> pairs, representing the conditions that have
>> > loop masks applied at some point in the vectorised code.  The COND_EXPR
>> > code can use that set to decide whether to apply the loop mask or not.
>>
>> Yeah, that sounds better.
>>
>> Note that I don't like the extra "helpers" in fold-const.c/h; they do not
>> look useful in general, so put them into vectorizer-private code.  The
>> decomposing also doesn't look too nice; instead, prepare_load_store_mask
>> could get such a decomposed representation - possibly quite natural with
>> the suggestion from Richard above.
> Hi,
> Thanks for the suggestions; I have attached an updated patch that
> tries to address them.
> With the patch, we manage to use the same predicate for both tests in
> the PR, and the redundant AND ops are eliminated by fre4.
>
> I have a few doubts:
> 1] I moved tree_cond_ops into tree-vectorizer.[ch]; I will get rid of
> it in a follow-up patch.
> I am not sure what to pass as the def of the scalar condition
> (scalar_mask) to vect_record_loop_mask from vectorizable_store,
> vectorizable_reduction and vectorizable_live_operation.  In the patch,
> I just passed NULL.

For vectorizable_store this is just "mask", like for vectorizable_load.
Passing NULL looks right for the other two.  (Nit, GCC style is to use
NULL rather than 0.)
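
I.e. something like this in vectorizable_store (untested sketch; the
argument order is taken from the quoted check_load_store_masking
prototype):

      check_load_store_masking (loop_vinfo, vectype, vls_type, group_size,
                                memory_access_type, &gs_info, mask);

and in vectorizable_reduction and vectorizable_live_operation simply:

      vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype, NULL);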

> 2] Do the changes to vectorizable_condition and
> vectorizable_condition_apply_loop_mask look OK?

Some comments below.

> 3] The patch additionally regresses the following tests (apart from fmla_2.c):
> FAIL: gcc.target/aarch64/sve/cond_convert_1.c -march=armv8.2-a+sve
> scan-assembler-not \\tsel\\t
> FAIL: gcc.target/aarch64/sve/cond_convert_4.c -march=armv8.2-a+sve
> scan-assembler-not \\tsel\\t
> FAIL: gcc.target/aarch64/sve/cond_unary_2.c -march=armv8.2-a+sve
> scan-assembler-not \\tsel\\t
> FAIL: gcc.target/aarch64/sve/cond_unary_2.c -march=armv8.2-a+sve
> scan-assembler-times \\tmovprfx\\t
> [...]

For cond_convert_1.c, I think it would be OK to change the test to:

    for (int i = 0; i < n; ++i)                                 \
      {                                                         \
        FLOAT_TYPE bi = b[i];                                   \
        r[i] = pred[i] ? (FLOAT_TYPE) a[i] : bi;                \
      }                                                         \

so that only the a[i] load is conditional.  Same for the other two.

I think originally I had to write it this way precisely because
we didn't have the optimisation you're adding, so this is actually
a good sign :-)

> @@ -8313,7 +8313,7 @@ vect_double_mask_nunits (tree type)
>  
>  void
>  vect_record_loop_mask (loop_vec_info loop_vinfo, vec_loop_masks *masks,
> -                    unsigned int nvectors, tree vectype)
> +                    unsigned int nvectors, tree vectype, tree scalar_mask)
>  {
>    gcc_assert (nvectors != 0);
>    if (masks->length () < nvectors)

New parameter needs documentation.
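
E.g. something along these lines (exact wording up to you):

   /* Record that a fully-masked version of LOOP_VINFO would need MASKS to
      contain a sequence of NVECTORS masks that each control a vector of type
      VECTYPE.  If SCALAR_MASK is nonnull, the fully-masked loop would AND
      these vector masks with the vector version of SCALAR_MASK.  */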

> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> index dd9d45a9547..49ea86a0680 100644
> --- a/gcc/tree-vect-stmts.c
> +++ b/gcc/tree-vect-stmts.c
> @@ -1888,7 +1888,7 @@ static void
>  check_load_store_masking (loop_vec_info loop_vinfo, tree vectype,
>                         vec_load_store_type vls_type, int group_size,
>                         vect_memory_access_type memory_access_type,
> -                       gather_scatter_info *gs_info)
> +                       gather_scatter_info *gs_info, tree scalar_mask)
>  {
>    /* Invariant loads need no special support.  */
>    if (memory_access_type == VMAT_INVARIANT)

Same here.

> @@ -9763,6 +9765,29 @@ vect_is_simple_cond (tree cond, vec_info *vinfo,
>    return true;
>  }
>  
> +static void
> +vectorizable_condition_apply_loop_mask (tree &vec_compare,
> +                                     gimple_stmt_iterator *&gsi,
> +                                     stmt_vec_info &stmt_info,
> +                                     tree loop_mask,
> +                                     tree vec_cmp_type)

Function needs a comment.

I think it'd be better to return the new mask and not make vec_compare
a reference.  stmt_info shouldn't need to be a reference either (it's
just a pointer type).
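
I.e. roughly (untested, just rearranging the code from the patch):

   /* If VEC_COMPARE is a comparison, force it into an SSA name of type
      VEC_CMP_TYPE and return VEC_COMPARE & LOOP_MASK, emitting any new
      statements at GSI on behalf of STMT_INFO.  */

   static tree
   vectorizable_condition_apply_loop_mask (tree vec_compare,
                                           gimple_stmt_iterator *gsi,
                                           stmt_vec_info stmt_info,
                                           tree loop_mask,
                                           tree vec_cmp_type)
   {
     if (COMPARISON_CLASS_P (vec_compare))
       {
         tree tmp = make_ssa_name (vec_cmp_type);
         gassign *g = gimple_build_assign (tmp, TREE_CODE (vec_compare),
                                           TREE_OPERAND (vec_compare, 0),
                                           TREE_OPERAND (vec_compare, 1));
         vect_finish_stmt_generation (stmt_info, g, gsi);
         vec_compare = tmp;
       }

     tree masked = make_ssa_name (vec_cmp_type);
     gassign *g = gimple_build_assign (masked, BIT_AND_EXPR, vec_compare,
                                       loop_mask);
     vect_finish_stmt_generation (stmt_info, g, gsi);
     return masked;
   }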

> +{
> +  if (COMPARISON_CLASS_P (vec_compare))
> +    {
> +      tree tmp = make_ssa_name (vec_cmp_type);
> +      gassign *g = gimple_build_assign (tmp, TREE_CODE (vec_compare),
> +                                     TREE_OPERAND (vec_compare, 0),
> +                                     TREE_OPERAND (vec_compare, 1));
> +      vect_finish_stmt_generation (stmt_info, g, gsi);
> +      vec_compare = tmp;
> +    }
> +
> +  tree tmp2 = make_ssa_name (vec_cmp_type);
> +  gassign *g = gimple_build_assign (tmp2, BIT_AND_EXPR, vec_compare, loop_mask);
> +  vect_finish_stmt_generation (stmt_info, g, gsi);
> +  vec_compare = tmp2;
> +}
> +
>  /* vectorizable_condition.
>  
>     Check if STMT_INFO is conditional modify expression that can be vectorized.
> @@ -9975,6 +10000,36 @@ vectorizable_condition (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
>    /* Handle cond expr.  */
>    for (j = 0; j < ncopies; j++)
>      {
> +      tree loop_mask = NULL_TREE;
> +
> +      if (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
> +     {
> +       scalar_cond_masked_key cond (cond_expr, ncopies);
> +          if (loop_vinfo->scalar_cond_masked_set->contains (cond))

Nit: untabified line.

> +         {
> +           scalar_cond_masked_key cond (cond_expr, ncopies);
> +           if (loop_vinfo->scalar_cond_masked_set->contains (cond))

This "if" looks redundant -- isn't the condition the same as above?

> +             {
> +               vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> +               loop_mask = vect_get_loop_mask (gsi, masks, ncopies, vectype, j);
> +             }
> +         }
> +       else
> +         {
> +           cond.cond_ops.code
> +             = invert_tree_comparison (cond.cond_ops.code, true);

Would be better to pass an HONOR_NANS value instead of "true".
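
I.e. something like (this assumes tree_cond_ops exposes the first
comparison operand as "op0"; adjust for whatever the field is really
called):

           cond.cond_ops.code
             = invert_tree_comparison (cond.cond_ops.code,
                                       HONOR_NANS (TREE_TYPE (cond.cond_ops.op0)));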

> +           if (loop_vinfo->scalar_cond_masked_set->contains (cond))
> +             {
> +               vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> +               loop_mask = vect_get_loop_mask (gsi, masks, ncopies, vectype, j);
> +               std::swap (then_clause, else_clause);
> +               cond_code = cond.cond_ops.code;
> +               cond_expr = build2 (cond_code, TREE_TYPE (cond_expr),
> +                                   then_clause, else_clause);

Rather than do the swap here and build a new tree, I think it would be
better to set a boolean that indicates that the then and else are swapped.
Then we can conditionally swap them after:

          vec_then_clause = vec_oprnds2[i];
          vec_else_clause = vec_oprnds3[i];
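
I.e. something like this, with a new swap_cond_operands flag (the name
is just a suggestion) set wherever the inverted condition is found in
the set:

          vec_then_clause = vec_oprnds2[i];
          vec_else_clause = vec_oprnds3[i];
          if (swap_cond_operands)
            std::swap (vec_then_clause, vec_else_clause);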

> [...]
> diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
> index dc181524744..794e65f0007 100644
> --- a/gcc/tree-vectorizer.c
> +++ b/gcc/tree-vectorizer.c
> @@ -464,6 +464,7 @@ vec_info::vec_info (vec_info::vec_kind kind_in, void *target_cost_data_in,
>      target_cost_data (target_cost_data_in)
>  {
>    stmt_vec_infos.create (50);
> +  scalar_cond_masked_set = new scalar_cond_masked_set_type ();
>  }
>  
>  vec_info::~vec_info ()
> @@ -476,6 +477,8 @@ vec_info::~vec_info ()
>  
>    destroy_cost_data (target_cost_data);
>    free_stmt_vec_infos ();
> +  delete scalar_cond_masked_set;
> +  scalar_cond_masked_set = 0;
>  }
>  
>  vec_info_shared::vec_info_shared ()

No need to assign null here, since we're at the end of the destructor.
But maybe scalar_cond_masked_set should be "scalar_cond_masked_set_type"
rather than "scalar_cond_masked_set_type *", if the object is going to
have the same lifetime as the vec_info anyway.
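
I.e. something like this in the vec_info class, so that the new/delete
in the constructor and destructor go away:

   /* The set of scalar conditions that have had a loop mask applied.  */
   scalar_cond_masked_set_type scalar_cond_masked_set;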

Looks good otherwise.  I skipped over the tree_cond_ops bit given
your comment above that this was temporary.

Thanks,
Richard
