On Thu, 5 Sep 2019 at 14:29, Richard Sandiford
<richard.sandif...@arm.com> wrote:
>
> Sorry for the slow reply.
>
> Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> writes:
> > On Fri, 30 Aug 2019 at 16:15, Richard Biener <richard.guent...@gmail.com> 
> > wrote:
> >>
> >> On Wed, Aug 28, 2019 at 11:02 AM Richard Sandiford
> >> <richard.sandif...@arm.com> wrote:
> >> >
> >> > Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> writes:
> >> > > On Tue, 27 Aug 2019 at 21:14, Richard Sandiford
> >> > > <richard.sandif...@arm.com> wrote:
> >> > >>
> >> > >> Richard should have the final say, but some comments...
> >> > >>
> >> > >> Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> writes:
> >> > >> > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> >> > >> > index 1e2dfe5d22d..862206b3256 100644
> >> > >> > --- a/gcc/tree-vect-stmts.c
> >> > >> > +++ b/gcc/tree-vect-stmts.c
> >> > >> > @@ -1989,17 +1989,31 @@ check_load_store_masking (loop_vec_info loop_vinfo, tree vectype,
> >> > >> >
> >> > >> >  static tree
> >> > >> >  prepare_load_store_mask (tree mask_type, tree loop_mask, tree vec_mask,
> >> > >> > -                      gimple_stmt_iterator *gsi)
> >> > >> > +                      gimple_stmt_iterator *gsi, tree mask,
> >> > >> > +                      cond_vmask_map_type *cond_to_vec_mask)
> >> > >>
> >> > >> "scalar_mask" might be a better name.  But maybe we should key off the
> >> > >> vector mask after all, now that we're relying on the code having no
> >> > >> redundancies.
> >> > >>
> >> > >> Passing the vinfo would be better than passing the cond_vmask_map_type
> >> > >> directly.
> >> > >>
> >> > >> >  {
> >> > >> >    gcc_assert (useless_type_conversion_p (mask_type, TREE_TYPE (vec_mask)));
> >> > >> >    if (!loop_mask)
> >> > >> >      return vec_mask;
> >> > >> >
> >> > >> >    gcc_assert (TREE_TYPE (loop_mask) == mask_type);
> >> > >> > +
> >> > >> > +  tree *slot = 0;
> >> > >> > +  if (cond_to_vec_mask)
> >> > >>
> >> > >> The pointer should never be null in this context.
> >> > > Disabling the check for NULL results in a segfault with cond_arith_4.c,
> >> > > because we reach prepare_load_store_mask via vect_schedule_slp, called
> >> > > from here in vect_transform_loop:
> >> > >  /* Schedule the SLP instances first, then handle loop vectorization
> >> > >      below.  */
> >> > >   if (!loop_vinfo->slp_instances.is_empty ())
> >> > >     {
> >> > >       DUMP_VECT_SCOPE ("scheduling SLP instances");
> >> > >       vect_schedule_slp (loop_vinfo);
> >> > >     }
> >> > >
> >> > > which is before the bb processing loop.
> >> >
> >> > We want this optimisation to be applied to SLP too though.  Especially
> >> > since non-SLP will be going away at some point.
> >> >
> >> > But as Richard says, the problem with SLP is that the statements aren't
> >> > traversed in block order, so I guess we can't do the on-the-fly
> >> > redundancy elimination there...
> >>
> >> And the current patch AFAICS can generate wrong SSA for this reason.
> >>
> >> > Maybe an alternative would be to record during the analysis phase which
> >> > scalar conditions need which loop masks.  Statements that need a loop
> >> > mask currently do:
> >> >
> >> >       vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype);
> >> >
> >> > If we also pass the scalar condition, we can maintain a hash_set of
> >> > <condition, ncopies> pairs, representing the conditions that have
> >> > loop masks applied at some point in the vectorised code.  The COND_EXPR
> >> > code can use that set to decide whether to apply the loop mask or not.
> >>
> >> Yeah, that sounds better.
> >>
> >> Note that I don't like the extra "helpers" in fold-const.c/h, they do
> >> not look useful in general, so put them into vectorizer private code.
> >> The decomposing also doesn't look too nice; instead,
> >> prepare_load_store_mask could get such a decomposed representation -
> >> possibly quite natural with the suggestion from Richard above.
> > Hi,
> > Thanks for the suggestions, I have attached an updated patch that tries
> > to address the above suggestions.
> > With the patch, we manage to use the same predicate for both tests in
> > the PR, and the redundant AND ops are eliminated by fre4.
> >
> > I have a few doubts:
> > 1] I moved tree_cond_ops into tree-vectorizer.[ch]; I will get rid of
> > it in a follow-up patch.
> > I am not sure what to pass as the def of the scalar condition
> > (scalar_mask) to vect_record_loop_mask from vectorizable_store,
> > vectorizable_reduction and vectorizable_live_operation.  In the patch,
> > I just passed NULL.
>
> For vectorizable_store this is just "mask", like for vectorizable_load.
> Passing NULL looks right for the other two.  (Nit, GCC style is to use
> NULL rather than 0.)
>
> > 2] Do the changes to vectorizable_condition and
> > vectorizable_condition_apply_loop_mask look OK?
>
> Some comments below.
>
> > 3] The patch additionally regresses the following tests (apart from fmla_2.c):
> > FAIL: gcc.target/aarch64/sve/cond_convert_1.c -march=armv8.2-a+sve
> > scan-assembler-not \\tsel\\t
> > FAIL: gcc.target/aarch64/sve/cond_convert_4.c -march=armv8.2-a+sve
> > scan-assembler-not \\tsel\\t
> > FAIL: gcc.target/aarch64/sve/cond_unary_2.c -march=armv8.2-a+sve
> > scan-assembler-not \\tsel\\t
> > FAIL: gcc.target/aarch64/sve/cond_unary_2.c -march=armv8.2-a+sve
> > scan-assembler-times \\tmovprfx\\t
> > [...]
>
> For cond_convert_1.c, I think it would be OK to change the test to:
>
>     for (int i = 0; i < n; ++i)                                 \
>       {                                                         \
>         FLOAT_TYPE bi = b[i];                                   \
>         r[i] = pred[i] ? (FLOAT_TYPE) a[i] : bi;                \
>       }                                                         \
>
> so that only the a[i] load is conditional.  Same for the other two.
>
> I think originally I had to write it this way precisely because
> we didn't have the optimisation you're adding, so this is actually
> a good sign :-)
>
> > @@ -8313,7 +8313,7 @@ vect_double_mask_nunits (tree type)
> >
> >  void
> >  vect_record_loop_mask (loop_vec_info loop_vinfo, vec_loop_masks *masks,
> > -                    unsigned int nvectors, tree vectype)
> > +                    unsigned int nvectors, tree vectype, tree scalar_mask)
> >  {
> >    gcc_assert (nvectors != 0);
> >    if (masks->length () < nvectors)
>
> New parameter needs documentation.
>
> > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> > index dd9d45a9547..49ea86a0680 100644
> > --- a/gcc/tree-vect-stmts.c
> > +++ b/gcc/tree-vect-stmts.c
> > @@ -1888,7 +1888,7 @@ static void
> >  check_load_store_masking (loop_vec_info loop_vinfo, tree vectype,
> >                         vec_load_store_type vls_type, int group_size,
> >                         vect_memory_access_type memory_access_type,
> > -                       gather_scatter_info *gs_info)
> > +                       gather_scatter_info *gs_info, tree scalar_mask)
> >  {
> >    /* Invariant loads need no special support.  */
> >    if (memory_access_type == VMAT_INVARIANT)
>
> Same here.
>
> > @@ -9763,6 +9765,29 @@ vect_is_simple_cond (tree cond, vec_info *vinfo,
> >    return true;
> >  }
> >
> > +static void
> > +vectorizable_condition_apply_loop_mask (tree &vec_compare,
> > +                                     gimple_stmt_iterator *&gsi,
> > +                                     stmt_vec_info &stmt_info,
> > +                                     tree loop_mask,
> > +                                     tree vec_cmp_type)
>
> Function needs a comment.
>
> I think it'd be better to return the new mask and not make vec_compare
> a reference.  stmt_info shouldn't need to be a reference either (it's
> just a pointer type).
>
> > +{
> > +  if (COMPARISON_CLASS_P (vec_compare))
> > +    {
> > +      tree tmp = make_ssa_name (vec_cmp_type);
> > +      gassign *g = gimple_build_assign (tmp, TREE_CODE (vec_compare),
> > +                                     TREE_OPERAND (vec_compare, 0),
> > +                                     TREE_OPERAND (vec_compare, 1));
> > +      vect_finish_stmt_generation (stmt_info, g, gsi);
> > +      vec_compare = tmp;
> > +    }
> > +
> > +  tree tmp2 = make_ssa_name (vec_cmp_type);
> > +  gassign *g = gimple_build_assign (tmp2, BIT_AND_EXPR, vec_compare, loop_mask);
> > +  vect_finish_stmt_generation (stmt_info, g, gsi);
> > +  vec_compare = tmp2;
> > +}
> > +
> >  /* vectorizable_condition.
> >
> >     Check if STMT_INFO is conditional modify expression that can be vectorized.
> > @@ -9975,6 +10000,36 @@ vectorizable_condition (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
> >    /* Handle cond expr.  */
> >    for (j = 0; j < ncopies; j++)
> >      {
> > +      tree loop_mask = NULL_TREE;
> > +
> > +      if (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
> > +     {
> > +       scalar_cond_masked_key cond (cond_expr, ncopies);
> > +          if (loop_vinfo->scalar_cond_masked_set->contains (cond))
>
> Nit: untabified line.
>
> > +         {
> > +           scalar_cond_masked_key cond (cond_expr, ncopies);
> > +           if (loop_vinfo->scalar_cond_masked_set->contains (cond))
>
> This "if" looks redundant -- isn't the condition the same as above?
Oops sorry, probably a copy-paste typo ;-)
>
> > +             {
> > +               vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> > +               loop_mask = vect_get_loop_mask (gsi, masks, ncopies, vectype, j);
> > +             }
> > +         }
> > +       else
> > +         {
> > +           cond.cond_ops.code
> > +             = invert_tree_comparison (cond.cond_ops.code, true);
>
> Would be better to pass an HONOR_NANS value instead of "true".
>
> > +           if (loop_vinfo->scalar_cond_masked_set->contains (cond))
> > +             {
> > +               vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> > +               loop_mask = vect_get_loop_mask (gsi, masks, ncopies, vectype, j);
> > +               std::swap (then_clause, else_clause);
> > +               cond_code = cond.cond_ops.code;
> > +               cond_expr = build2 (cond_code, TREE_TYPE (cond_expr),
> > +                                   then_clause, else_clause);
>
> Rather than do the swap here and build a new tree, I think it would be
> better to set a boolean that indicates that the then and else are swapped.
> Then we can conditionally swap them after:
>
>           vec_then_clause = vec_oprnds2[i];
>           vec_else_clause = vec_oprnds3[i];
>
> > [...]
> > diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
> > index dc181524744..794e65f0007 100644
> > --- a/gcc/tree-vectorizer.c
> > +++ b/gcc/tree-vectorizer.c
> > @@ -464,6 +464,7 @@ vec_info::vec_info (vec_info::vec_kind kind_in, void *target_cost_data_in,
> >      target_cost_data (target_cost_data_in)
> >  {
> >    stmt_vec_infos.create (50);
> > +  scalar_cond_masked_set = new scalar_cond_masked_set_type ();
> >  }
> >
> >  vec_info::~vec_info ()
> > @@ -476,6 +477,8 @@ vec_info::~vec_info ()
> >
> >    destroy_cost_data (target_cost_data);
> >    free_stmt_vec_infos ();
> > +  delete scalar_cond_masked_set;
> > +  scalar_cond_masked_set = 0;
> >  }
> >
> >  vec_info_shared::vec_info_shared ()
>
> No need to assign null here, since we're at the end of the destructor.
> But maybe scalar_cond_masked_set should be "scalar_cond_masked_set_type"
> rather than "scalar_cond_masked_set_type *", if the object is going to
> have the same lifetime as the vec_info anyway.
>
> Looks good otherwise.  I skipped over the tree_cond_ops bit given
> your comment above that this was temporary.
Thanks for the suggestions, I tried addressing them in the attached patch.
Does it look OK?

With the patch, only the following FAIL remains for aarch64-sve.exp:
FAIL: gcc.target/aarch64/sve/cond_unary_2.c -march=armv8.2-a+sve
scan-assembler-times \\tmovprfx\\t 6
because the generated assembly now contains 14 movprfx instructions instead of 6.
Should I adjust the test, assuming the change isn't a regression?
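
For reference, the kind of source loop this optimisation targets is the
adjusted cond_convert_1.c pattern below; the following sketch is one
hypothetical instantiation of its TEST macro, with illustrative function
name and parameter types.  The load of a[i] is guarded by pred[i] and the
?: tests the same condition, so the vectorised loop can apply the loop
mask to that condition once and reuse it, and fre4 removes the leftover
redundant AND:

/* Hypothetical instantiation of the TEST macro from cond_convert_1.c
   (name and parameter types are illustrative).  The masked load of a[i]
   and the select both use pred[i] != 0, so a single governing predicate
   can serve both.  */
void
test_float_int (float *__restrict r, int *__restrict a,
                float *__restrict b, int *__restrict pred, int n)
{
  for (int i = 0; i < n; ++i)
    {
      float bi = b[i];
      r[i] = pred[i] ? (float) a[i] : bi;
    }
}

That is also why the scan-assembler-not \tsel\t checks in those tests can
be kept once only the a[i] load is conditional.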

Thanks,
Prathamesh
>
> Thanks,
> Richard
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_1.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_1.c
index 69468eb69be..d2ffcc758f3 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_1.c
@@ -11,7 +11,10 @@
 		   INT_TYPE *__restrict pred, int n)		\
   {								\
     for (int i = 0; i < n; ++i)					\
-      r[i] = pred[i] ? (FLOAT_TYPE) a[i] : b[i];		\
+      {								\
+	FLOAT_TYPE bi = b[i];					\
+	r[i] = pred[i] ? (FLOAT_TYPE) a[i] : bi;		\
+      }								\
   }
 
 #define TEST_ALL(T) \
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_4.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_4.c
index 55b535fa0cf..d55aef0bb9a 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_4.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_4.c
@@ -11,7 +11,10 @@
 		   INT_TYPE *__restrict pred, int n)		\
   {								\
     for (int i = 0; i < n; ++i)					\
-      r[i] = pred[i] ? (INT_TYPE) a[i] : b[i];			\
+      {								\
+	INT_TYPE bi = b[i];					\
+	r[i] = pred[i] ? (INT_TYPE) a[i] : bi;			\
+      }								\
   }
 
 #define TEST_ALL(T) \
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_2.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_2.c
index adf828398bb..f17480fb2f2 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_2.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_2.c
@@ -13,7 +13,10 @@
 		      TYPE *__restrict pred, int n)		\
   {								\
     for (int i = 0; i < n; ++i)					\
-      r[i] = pred[i] ? OP (a[i]) : b[i];			\
+      {								\
+	TYPE bi = b[i];						\
+	r[i] = pred[i] ? OP (a[i]) : bi;			\
+      }								\
   }
 
 #define TEST_INT_TYPE(T, TYPE) \
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fmla_2.c b/gcc/testsuite/gcc.target/aarch64/sve/fmla_2.c
index 5c04bcdb3f5..a1b0667dab5 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/fmla_2.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/fmla_2.c
@@ -15,5 +15,9 @@ f (double *restrict a, double *restrict b, double *restrict c,
     }
 }
 
-/* { dg-final { scan-assembler-times {\tfmla\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, z[0-9]+\.d\n} 2 } } */
+/* See https://gcc.gnu.org/ml/gcc-patches/2019-08/msg01644.html
+   for why the test below is XFAILed.  */
+
+/* { dg-final { scan-assembler-times {\tfmla\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, z[0-9]+\.d\n} 2 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\tfmla\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, z[0-9]+\.d\n} 3 } } */
 /* { dg-final { scan-assembler-not {\tfmad\t} } } */
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index b0cbbac0cb5..d869dfabeb0 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -7197,7 +7197,7 @@ vectorizable_reduction (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
 	    }
 	  else
 	    vect_record_loop_mask (loop_vinfo, masks, ncopies * vec_num,
-				   vectype_in);
+				   vectype_in, NULL);
 	}
       if (dump_enabled_p ()
 	  && reduction_type == FOLD_LEFT_REDUCTION)
@@ -8110,7 +8110,7 @@ vectorizable_live_operation (stmt_vec_info stmt_info,
 	      gcc_assert (ncopies == 1 && !slp_node);
 	      vect_record_loop_mask (loop_vinfo,
 				     &LOOP_VINFO_MASKS (loop_vinfo),
-				     1, vectype);
+				     1, vectype, NULL);
 	    }
 	}
       return true;
@@ -8309,11 +8309,12 @@ vect_double_mask_nunits (tree type)
 
 /* Record that a fully-masked version of LOOP_VINFO would need MASKS to
    contain a sequence of NVECTORS masks that each control a vector of type
-   VECTYPE.  */
+   VECTYPE.  SCALAR_MASK, if nonnull, is the scalar mask guarding the
+   corresponding load/store stmt.  */
 
 void
 vect_record_loop_mask (loop_vec_info loop_vinfo, vec_loop_masks *masks,
-		       unsigned int nvectors, tree vectype)
+		       unsigned int nvectors, tree vectype, tree scalar_mask)
 {
   gcc_assert (nvectors != 0);
   if (masks->length () < nvectors)
@@ -8329,6 +8330,12 @@ vect_record_loop_mask (loop_vec_info loop_vinfo, vec_loop_masks *masks,
       rgm->max_nscalars_per_iter = nscalars_per_iter;
       rgm->mask_type = build_same_sized_truth_vector_type (vectype);
     }
+
+  if (scalar_mask)
+    {
+      scalar_cond_masked_key cond (scalar_mask, nvectors);
+      loop_vinfo->scalar_cond_masked_set.add (cond);
+    }
 }
 
 /* Given a complete set of masks MASKS, extract mask number INDEX
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index dd9d45a9547..14c2fcb53a7 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -1879,7 +1879,8 @@ static tree permute_vec_elements (tree, tree, tree, stmt_vec_info,
    says how the load or store is going to be implemented and GROUP_SIZE
    is the number of load or store statements in the containing group.
    If the access is a gather load or scatter store, GS_INFO describes
-   its arguments.
+   its arguments.  SCALAR_MASK is the scalar mask used for the corresponding
+   load or store stmt.
 
    Clear LOOP_VINFO_CAN_FULLY_MASK_P if a fully-masked loop is not
    supported, otherwise record the required mask types.  */
@@ -1888,7 +1889,7 @@ static void
 check_load_store_masking (loop_vec_info loop_vinfo, tree vectype,
 			  vec_load_store_type vls_type, int group_size,
 			  vect_memory_access_type memory_access_type,
-			  gather_scatter_info *gs_info)
+			  gather_scatter_info *gs_info, tree scalar_mask)
 {
   /* Invariant loads need no special support.  */
   if (memory_access_type == VMAT_INVARIANT)
@@ -1912,7 +1913,7 @@ check_load_store_masking (loop_vec_info loop_vinfo, tree vectype,
 	  return;
 	}
       unsigned int ncopies = vect_get_num_copies (loop_vinfo, vectype);
-      vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype);
+      vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype, scalar_mask);
       return;
     }
 
@@ -1936,7 +1937,7 @@ check_load_store_masking (loop_vec_info loop_vinfo, tree vectype,
 	  return;
 	}
       unsigned int ncopies = vect_get_num_copies (loop_vinfo, vectype);
-      vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype);
+      vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype, scalar_mask);
       return;
     }
 
@@ -1974,7 +1975,7 @@ check_load_store_masking (loop_vec_info loop_vinfo, tree vectype,
   poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
   unsigned int nvectors;
   if (can_div_away_from_zero_p (group_size * vf, nunits, &nvectors))
-    vect_record_loop_mask (loop_vinfo, masks, nvectors, vectype);
+    vect_record_loop_mask (loop_vinfo, masks, nvectors, vectype, scalar_mask);
   else
     gcc_unreachable ();
 }
@@ -3436,7 +3437,9 @@ vectorizable_call (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
 	  unsigned int nvectors = (slp_node
 				   ? SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node)
 				   : ncopies);
-	  vect_record_loop_mask (loop_vinfo, masks, nvectors, vectype_out);
+	  tree scalar_mask = gimple_call_arg (stmt_info->stmt, mask_opno);
+	  vect_record_loop_mask (loop_vinfo, masks, nvectors,
+				 vectype_out, scalar_mask);
 	}
       return true;
     }
@@ -7390,7 +7393,7 @@ vectorizable_store (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
       if (loop_vinfo
 	  && LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo))
 	check_load_store_masking (loop_vinfo, vectype, vls_type, group_size,
-				  memory_access_type, &gs_info);
+				  memory_access_type, &gs_info, mask);
 
       STMT_VINFO_TYPE (stmt_info) = store_vec_info_type;
       vect_model_store_cost (stmt_info, ncopies, rhs_dt, memory_access_type,
@@ -8637,7 +8640,7 @@ vectorizable_load (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
       if (loop_vinfo
 	  && LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo))
 	check_load_store_masking (loop_vinfo, vectype, VLS_LOAD, group_size,
-				  memory_access_type, &gs_info);
+				  memory_access_type, &gs_info, mask);
 
       STMT_VINFO_TYPE (stmt_info) = load_vec_info_type;
       vect_model_load_cost (stmt_info, ncopies, memory_access_type,
@@ -9975,6 +9978,31 @@ vectorizable_condition (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
   /* Handle cond expr.  */
   for (j = 0; j < ncopies; j++)
     {
+      tree loop_mask = NULL_TREE;
+      bool swap_cond_operands = false;
+
+      if (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
+	{
+	  scalar_cond_masked_key cond (cond_expr, ncopies);
+	  if (loop_vinfo->scalar_cond_masked_set.contains (cond))
+	    {
+	      vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
+	      loop_mask = vect_get_loop_mask (gsi, masks, ncopies, vectype, j);
+	    }
+	  else
+	    {
+	      cond.code = invert_tree_comparison (cond.code,
+						  HONOR_NANS (TREE_TYPE (cond.op0)));
+	      if (loop_vinfo->scalar_cond_masked_set.contains (cond))
+		{
+		  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
+		  loop_mask = vect_get_loop_mask (gsi, masks, ncopies, vectype, j);
+		  cond_code = cond.code; 
+		  swap_cond_operands = true;
+		}
+	    }
+	}
+
       stmt_vec_info new_stmt_info = NULL;
       if (j == 0)
 	{
@@ -10052,6 +10080,9 @@ vectorizable_condition (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
           vec_then_clause = vec_oprnds2[i];
           vec_else_clause = vec_oprnds3[i];
 
+	  if (swap_cond_operands)
+	    std::swap (vec_then_clause, vec_else_clause);
+
 	  if (masked)
 	    vec_compare = vec_cond_lhs;
 	  else
@@ -10090,6 +10121,26 @@ vectorizable_condition (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
 		    }
 		}
 	    }
+
+	  if (loop_mask)
+	    {
+	      if (COMPARISON_CLASS_P (vec_compare))
+		{
+		  tree tmp = make_ssa_name (vec_cmp_type);
+		  gassign *g = gimple_build_assign (tmp,
+						    TREE_CODE (vec_compare),
+						    TREE_OPERAND (vec_compare, 0),
+						    TREE_OPERAND (vec_compare, 1));
+		  vect_finish_stmt_generation (stmt_info, g, gsi);
+		  vec_compare = tmp;
+		}
+
+	      tree tmp2 = make_ssa_name (vec_cmp_type);
+	      gassign *g = gimple_build_assign (tmp2, BIT_AND_EXPR, vec_compare, loop_mask);
+	      vect_finish_stmt_generation (stmt_info, g, gsi);
+	      vec_compare = tmp2;
+	    }
+
 	  if (reduction_type == EXTRACT_LAST_REDUCTION)
 	    {
 	      if (!is_gimple_val (vec_compare))
diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
index dc181524744..c4b2d8e8647 100644
--- a/gcc/tree-vectorizer.c
+++ b/gcc/tree-vectorizer.c
@@ -1513,3 +1513,39 @@ make_pass_ipa_increase_alignment (gcc::context *ctxt)
 {
   return new pass_ipa_increase_alignment (ctxt);
 }
+
+/* If T is a comparison, or an SSA name defined by a comparison stmt,
+   record its code and operands.
+   Otherwise record <NE_EXPR, T, 0>.  */
+
+void
+scalar_cond_masked_key::get_cond_ops_from_tree (tree t) 
+{
+  if (TREE_CODE_CLASS (TREE_CODE (t)) == tcc_comparison)
+    {
+      this->code = TREE_CODE (t);
+      this->op0 = TREE_OPERAND (t, 0);
+      this->op1 = TREE_OPERAND (t, 1);
+      return;
+    }
+
+  if (TREE_CODE (t) == SSA_NAME)
+    {
+      gassign *stmt = dyn_cast<gassign *> (SSA_NAME_DEF_STMT (t));
+      if (stmt)
+        {
+          tree_code code = gimple_assign_rhs_code (stmt);
+          if (TREE_CODE_CLASS (code) == tcc_comparison)
+            {
+              this->code = code;
+              this->op0 = gimple_assign_rhs1 (stmt);
+              this->op1 = gimple_assign_rhs2 (stmt);
+              return;
+            }
+        }
+    }
+
+  this->code = NE_EXPR;
+  this->op0 = t;
+  this->op1 = build_zero_cst (TREE_TYPE (t));
+}
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 1456cde4c2c..e20a61ee33f 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -26,6 +26,7 @@ typedef class _stmt_vec_info *stmt_vec_info;
 #include "tree-data-ref.h"
 #include "tree-hash-traits.h"
 #include "target.h"
+#include "hash-set.h"
 
 /* Used for naming of new temporaries.  */
 enum vect_var_kind {
@@ -174,7 +175,71 @@ public:
 #define SLP_TREE_TWO_OPERATORS(S)		 (S)->two_operators
 #define SLP_TREE_DEF_TYPE(S)			 (S)->def_type
 
+struct scalar_cond_masked_key
+{
+  scalar_cond_masked_key (tree t, unsigned ncopies_)
+    : ncopies (ncopies_)
+  {
+    get_cond_ops_from_tree (t);
+  }
+
+  void get_cond_ops_from_tree (tree);
+
+  unsigned ncopies;
+  tree_code code;
+  tree op0;
+  tree op1;
+};
 
+template<>
+struct default_hash_traits<scalar_cond_masked_key>
+{
+  typedef scalar_cond_masked_key compare_type;
+  typedef scalar_cond_masked_key value_type;
+
+  static inline hashval_t
+  hash (value_type v)
+  {
+    inchash::hash h;
+    h.add_int (v.code);
+    inchash::add_expr (v.op0, h, 0);
+    inchash::add_expr (v.op1, h, 0);
+    h.add_int (v.ncopies);
+    return h.end ();
+  }
+
+  static inline bool
+  equal (value_type existing, value_type candidate)
+  {
+    return (existing.ncopies == candidate.ncopies
+	    && existing.code == candidate.code
+	    && operand_equal_p (existing.op0, candidate.op0, 0)
+	    && operand_equal_p (existing.op1, candidate.op1, 0));
+  }
+
+  static inline void
+  mark_empty (value_type &v)
+  {
+    v.ncopies = 0;
+  }
+
+  static inline bool
+  is_empty (value_type v)
+  {
+    return v.ncopies == 0;
+  }
+
+  static inline void mark_deleted (value_type &) {}
+
+  static inline bool is_deleted (const value_type &)
+  {
+    return false;
+  }
+
+  static inline void remove (value_type &) {}
+};
+
+typedef hash_set<scalar_cond_masked_key> scalar_cond_masked_set_type;
 
 /* Describes two objects whose addresses must be unequal for the vectorized
    loop to be valid.  */
@@ -255,6 +320,9 @@ public:
   /* Cost data used by the target cost model.  */
   void *target_cost_data;
 
+  /* Set of scalar conditions that have loop mask applied.  */
+  scalar_cond_masked_set_type scalar_cond_masked_set;
+
 private:
   stmt_vec_info new_stmt_vec_info (gimple *stmt);
   void set_vinfo_for_stmt (gimple *, stmt_vec_info);
@@ -1617,7 +1685,7 @@ extern void vect_gen_vector_loop_niters (loop_vec_info, tree, tree *,
 extern tree vect_halve_mask_nunits (tree);
 extern tree vect_double_mask_nunits (tree);
 extern void vect_record_loop_mask (loop_vec_info, vec_loop_masks *,
-				   unsigned int, tree);
+				   unsigned int, tree, tree);
 extern tree vect_get_loop_mask (gimple_stmt_iterator *, vec_loop_masks *,
 				unsigned int, tree, unsigned int);
 
