> -----Original Message-----
> From: Richard Biener <rguent...@suse.de>
> Sent: Tuesday, December 12, 2023 10:10 AM
> To: Tamar Christina <tamar.christ...@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; j...@ventanamicro.com;
> Richard Sandiford <richard.sandif...@arm.com>
> Subject: RE: [PATCH 9/21]middle-end: implement vectorizable_early_exit for
> codegen of exit code
> 
> On Mon, 11 Dec 2023, Tamar Christina wrote:
> 
> > > > +         vectype = truth_type_for (comp_type);
> > >
> > > so this leaves the producer of the mask in the GIMPLE_COND and we
> > > vectorize the GIMPLE_COND as
> > >
> > >   mask_1 = ...;
> > >   if (mask_1 != {-1,-1...})
> > >     ..
> > >
> > > ?  In principle only the mask producer needs a vector type and that
> > > adjusted by bool handling, the branch itself doesn't need any
> > > STMT_VINFO_VECTYPE.
> > >
> > > As said I believe if you recognize a GIMPLE_COND pattern for conds
> > > that aren't bool != 0 producing the mask stmt this should be picked
> > > up by bool handling correctly already.
> > >
> > > Also as said piggy-backing on the COND_EXPR handling in this function
> > > which has the condition split out into a separate stmt(!) might not
> > > completely handle things correctly and you are likely missing
> > > the tcc_comparison handling of the embedded compare.
> > >
> >
> > Ok, I've stopped piggy-backing on the COND_EXPR handling and created
> > vect_recog_gcond_pattern.  As you said in the previous email I've also
> > stopped setting the vectype for the gcond and instead use the type of the
> > operand.
> >
> > Note that because the pattern doesn't apply if you were already an NE_EXPR
> > I do need the extra truth_type_for for that case.  Because in the case of 
> > e.g.
> >
> > a = b > 4;
> > if (a != 0)
> >
> > The producer of the mask is already outside of the cond but will not trigger
> > Boolean recognition.
> 
> It should trigger because we have a mask use of 'a', I always forget
> where we do that - it might be where we compute mask precision stuff
> or it might be bool pattern recognition itself ...
> 
> That said, a GIMPLE_COND (be it pattern or not) should be recognized
> as mask use.
> 
> >  That means that while the integral type is correct it
> > won't be a Boolean one and vectorizable_comparison expects a Boolean
> > vector.  Alternatively, we can remove that assert?  But that seems worse.
> >
> > Additionally in the previous email you mention "adjusted Boolean statement".
> >
> > I'm guessing you were referring to generating a COND_EXPR from the gcond,
> > so that vect_recog_bool_pattern detects it?  The problem with that is that
> > this gets folded to x & 1 and doesn't trigger.  It also then blocks
> > vectorization.  So instead I've not forced it.
> 
> Not sure what you are referring to, but no - we shouldn't generate a
> COND_EXPR from the gcond.  Pattern recog generates COND_EXPRs for
> _data_ uses of masks (if we need a 'bool' data type for storing).
> We then get mask != 0 ? true : false;
> 

Thought so, but there happens to be a function called adjust_bool_stmts which
I thought you wanted me to call.  This is where the confusion came from: I
couldn't tell whether "adjusted Boolean statement" meant just the new modified
one or one from adjust_bool_stmts.  The latter didn't make much sense, hence
the question above.

> > > > +  /* Determine if we need to reduce the final value.  */
> > > > +  if (stmts.length () > 1)
> > > > +    {
> > > > +      /* We build the reductions in a way to maintain as much parallelism as
> > > > +        possible.  */
> > > > +      auto_vec<tree> workset (stmts.length ());
> > > > +
> > > > +      /* Mask the statements as we queue them up.  */
> > > > +      if (masked_loop_p)
> > > > +       for (auto stmt : stmts)
> > > > +         workset.quick_push (prepare_vec_mask (loop_vinfo, TREE_TYPE (mask),
> > > > +                                               mask, stmt, &cond_gsi));
> > > > +      else
> > > > +       workset.splice (stmts);
> > > > +
> > > > +      while (workset.length () > 1)
> > > > +       {
> > > > +         new_temp = make_temp_ssa_name (vectype, NULL, "vexit_reduc");
> > > > +         tree arg0 = workset.pop ();
> > > > +         tree arg1 = workset.pop ();
> > > > +         new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
> > > > +         vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
> > > > +                                      &cond_gsi);
> > > > +         workset.quick_insert (0, new_temp);
> > > > +       }
> > > > +    }
> > > > +  else
> > > > +    new_temp = stmts[0];
> > > > +
> > > > +  gcc_assert (new_temp);
> > > > +
> > > > +  tree cond = new_temp;
> > > > +  /* If we have multiple statements after reduction we should check all the
> > > > +     lanes and treat it as a full vector.  */
> > > > +  if (masked_loop_p)
> > > > +    cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
> > > > +                            &cond_gsi);
> > >
> > > You didn't fix any of the code above it seems, it's still wrong.
> > >
> >
> > Apologies, I hadn't realized that the last argument to get_loop_mask was the
> > index.
> >
> > Should be fixed now. Is this closer to what you wanted?
> > The individual ops are now masked with separate masks. (See testcase when
> > N=865).
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> >     * tree-vect-patterns.cc (vect_init_pattern_stmt): Support gconds.
> >     (vect_recog_gcond_pattern): New.
> >     (vect_vect_recog_func_ptrs): Use it.
> >     * tree-vect-stmts.cc (vectorizable_comparison_1): Support stmts without
> >     lhs.
> >     (vectorizable_early_exit): New.
> >     (vect_analyze_stmt, vect_transform_stmt): Use it.
> >     (vect_is_simple_use, vect_get_vector_types_for_stmt): Support gcond.
> >
> > gcc/testsuite/ChangeLog:
> >
> >     * gcc.dg/vect/vect-early-break_88.c: New test.
> >
> > --- inline copy of patch ---
> >
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..b64becd588973f58601196bfcb15afbe4bab60f2
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_88.c
> > @@ -0,0 +1,36 @@
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +
> > +/* { dg-additional-options "-Ofast --param vect-partial-vector-usage=2" } */
> > +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> > +
> > +#ifndef N
> > +#define N 5
> > +#endif
> > +float vect_a[N] = { 5.1f, 4.2f, 8.0f, 4.25f, 6.5f };
> > +unsigned vect_b[N] = { 0 };
> > +
> > +__attribute__ ((noinline, noipa))
> > +unsigned test4(double x)
> > +{
> > + unsigned ret = 0;
> > + for (int i = 0; i < N; i++)
> > + {
> > +   if (vect_a[i] > x)
> > +     break;
> > +   vect_a[i] = x;
> > +
> > + }
> > + return ret;
> > +}
> > +
> > +extern void abort ();
> > +
> > +int main ()
> > +{
> > +  if (test4 (7.0) != 0)
> > +    abort ();
> > +
> > +  if (vect_b[2] != 0 && vect_b[1] == 0)
> > +    abort ();
> > +}
> > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> > index 7debe7f0731673cd1bf25cd39d55e23990a73d0e..359d30b5991a50717c269df577c08adffa44e71b 100644
> > --- a/gcc/tree-vect-patterns.cc
> > +++ b/gcc/tree-vect-patterns.cc
> > @@ -132,6 +132,7 @@ vect_init_pattern_stmt (vec_info *vinfo, gimple *pattern_stmt,
> >    if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
> >      {
> >        gcc_assert (!vectype
> > +             || is_a <gcond *> (pattern_stmt)
> >               || (VECTOR_BOOLEAN_TYPE_P (vectype)
> >                   == vect_use_mask_type_p (orig_stmt_info)));
> >        STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
> > @@ -5553,6 +5554,83 @@ integer_type_for_mask (tree var, vec_info *vinfo)
> >    return build_nonstandard_integer_type (def_stmt_info->mask_precision, 1);
> >  }
> >
> > +/* Function vect_recog_gcond_pattern
> > +
> > +   Try to find pattern like following:
> > +
> > +     if (a op b)
> > +
> > +   where operator 'op' is not != and convert it to an adjusted boolean pattern
> > +
> > +     mask = a op b
> > +     if (mask != 0)
> > +
> > +   and set the mask type on MASK.
> > +
> > +   Input:
> > +
> > +   * STMT_VINFO: The stmt at the end from which the pattern
> > +            search begins, i.e. cast of a bool to
> > +            an integer type.
> > +
> > +   Output:
> > +
> > +   * TYPE_OUT: The type of the output of this pattern.
> > +
> > +   * Return value: A new stmt that will be used to replace the pattern.  */
> > +
> > +static gimple *
> > +vect_recog_gcond_pattern (vec_info *vinfo,
> > +                    stmt_vec_info stmt_vinfo, tree *type_out)
> > +{
> > +  gimple *last_stmt = STMT_VINFO_STMT (stmt_vinfo);
> > +  gcond* cond = NULL;
> > +  if (!(cond = dyn_cast <gcond *> (last_stmt)))
> > +    return NULL;
> > +
> > +  auto lhs = gimple_cond_lhs (cond);
> > +  auto rhs = gimple_cond_rhs (cond);
> > +  auto code = gimple_cond_code (cond);
> > +
> > +  tree scalar_type = TREE_TYPE (lhs);
> > +  if (VECTOR_TYPE_P (scalar_type))
> > +    return NULL;
> > +
> > +  if (code == NE_EXPR && zerop (rhs))
> 
> I think you need && VECT_SCALAR_BOOLEAN_TYPE_P (scalar_type) here,
> an integer != 0 would not be an appropriate mask.  I guess two
> relevant testcases would have an early exit like
> 
>    if (here[i] != 0)
>      break;
> 
> once with a 'bool here[]' and once with a 'int here[]'.
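
A minimal sketch of the two early-exit loops suggested above, written as plain
scalar code so that the expected exit index can be checked.  The helper names
first_nonzero_bool and first_nonzero_int and the array length N are
illustrative assumptions; they are not part of the patch or the testsuite.

```cpp
#include <cstddef>

// Hypothetical array length, chosen only for illustration.
constexpr std::size_t N = 16;

// Early exit on a 'bool' array: the compared operand is already a boolean,
// so 'here[i] != 0' is the case where the operand could act as a mask.
std::size_t first_nonzero_bool (const bool (&here)[N])
{
  std::size_t i = 0;
  for (; i < N; i++)
    if (here[i] != 0)
      break;
  return i;
}

// Early exit on an 'int' array: the operand is a data value, so the
// comparison against zero still has to produce the mask.
std::size_t first_nonzero_int (const int (&here)[N])
{
  std::size_t i = 0;
  for (; i < N; i++)
    if (here[i] != 0)
      break;
  return i;
}
```

Both loops have the same source-level condition; only the element type of
'here' differs, which is what the extra VECT_SCALAR_BOOLEAN_TYPE_P check is
meant to tell apart.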
> 
> > +    return NULL;
> > +
> > +  tree vecitype = get_vectype_for_scalar_type (vinfo, scalar_type);
> > +  if (vecitype == NULL_TREE)
> > +    return NULL;
> > +
> > +  /* Build a scalar type for the boolean result that when vectorized matches the
> > +     vector type of the result in size and number of elements.  */
> > +  unsigned prec
> > +    = vector_element_size (tree_to_poly_uint64 (TYPE_SIZE (vecitype)),
> > +                      TYPE_VECTOR_SUBPARTS (vecitype));
> > +
> > +  scalar_type
> > +    = build_nonstandard_integer_type (prec, TYPE_UNSIGNED (scalar_type));
> > +
> > +  vecitype = get_vectype_for_scalar_type (vinfo, scalar_type);
> > +  if (vecitype == NULL_TREE)
> > +    return NULL;
> > +
> > +  tree vectype = truth_type_for (vecitype);
> 
> That looks awfully complicated.  I guess one complication is that
> we compute mask_precision & friends before this pattern gets
> recognized.  See vect_determine_mask_precision and its handling
> of tcc_comparison, see also integer_type_for_mask.  For comparisons
> properly handled during pattern recog the vector type is determined
> in vect_get_vector_types_for_stmt via
> 
>   else if (vect_use_mask_type_p (stmt_info))
>     {
>       unsigned int precision = stmt_info->mask_precision;
>       scalar_type = build_nonstandard_integer_type (precision, 1);
>       vectype = get_mask_type_for_scalar_type (vinfo, scalar_type,
>                                                group_size);
>       if (!vectype)
>         return opt_result::failure_at (stmt, "not vectorized: unsupported"
>                                        " data-type %T\n", scalar_type);
> 
> Richard, do you have any advice here?  I suppose vect_determine_precisions
> needs to handle the gcond case with bool != 0 somehow and for the
> extra mask producer we add here we have to emulate what it would have
> done, right?
> 

There seem to be an awful lot of places that determine types and precision 😊
It's quite hard to figure out which part is used where, and Boolean handling
seems to be especially complicated.
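
For reference, the arithmetic the pattern above uses to pick the mask element
precision can be sketched as follows.  This only models the division of the
vector size by the number of lanes for fixed-length vectors; the function name
mask_element_precision is hypothetical, and GCC itself works on poly_uint64
values rather than plain integers.

```cpp
// Toy model, not GCC code: element precision in bits of a fixed-length
// vector, assuming the lane count divides the vector size exactly.
constexpr unsigned mask_element_precision (unsigned vector_bits,
					   unsigned lanes)
{
  return vector_bits / lanes;
}

// A 128-bit vector of four float lanes gives a 32-bit mask element.
static_assert (mask_element_precision (128, 4) == 32, "4 x 32-bit lanes");
```

Under this model the pattern's precision follows the compared operand's vector
type, whereas the vect_use_mask_type_p path quoted above takes it from
stmt_info->mask_precision.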

> > +  tree new_lhs = vect_recog_temp_ssa_var (boolean_type_node, NULL);
> > +  gimple *new_stmt = gimple_build_assign (new_lhs, code, lhs, rhs);
> > +  append_pattern_def_seq (vinfo, stmt_vinfo, new_stmt, vectype, scalar_type);
> > +
> > +  gimple *pattern_stmt
> > +    = gimple_build_cond (NE_EXPR, new_lhs,
> > +                    build_int_cst (TREE_TYPE (new_lhs), 0),
> > +                    NULL_TREE, NULL_TREE);
> > +  *type_out = vectype;
> > +  vect_pattern_detected ("vect_recog_gcond_pattern", last_stmt);
> > +  return pattern_stmt;
> > +}
> > +
> >  /* Function vect_recog_bool_pattern
> >
> >     Try to find pattern like following:
> > @@ -6860,6 +6938,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
> >    { vect_recog_divmod_pattern, "divmod" },
> >    { vect_recog_mult_pattern, "mult" },
> >    { vect_recog_mixed_size_cond_pattern, "mixed_size_cond" },
> > +  { vect_recog_gcond_pattern, "gcond" },
> >    { vect_recog_bool_pattern, "bool" },
> >    /* This must come before mask conversion, and includes the parts
> >       of mask conversion that are needed for gather and scatter
> > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > index 582c5e678fad802d6e76300fe3c939b9f2978f17..7c50ee37f2ade24eccf7a7d1ea2e00b4450023f9 100644
> > --- a/gcc/tree-vect-stmts.cc
> > +++ b/gcc/tree-vect-stmts.cc
> > @@ -12489,7 +12489,7 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
> >    vec<tree> vec_oprnds0 = vNULL;
> >    vec<tree> vec_oprnds1 = vNULL;
> >    tree mask_type;
> > -  tree mask;
> > +  tree mask = NULL_TREE;
> >
> >    if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
> >      return false;
> > @@ -12629,8 +12629,9 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
> >    /* Transform.  */
> >
> >    /* Handle def.  */
> > -  lhs = gimple_assign_lhs (STMT_VINFO_STMT (stmt_info));
> > -  mask = vect_create_destination_var (lhs, mask_type);
> > +  lhs = gimple_get_lhs (STMT_VINFO_STMT (stmt_info));
> > +  if (lhs)
> > +    mask = vect_create_destination_var (lhs, mask_type);
> >
> >    vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
> >                  rhs1, &vec_oprnds0, vectype,
> > @@ -12644,7 +12645,10 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
> >        gimple *new_stmt;
> >        vec_rhs2 = vec_oprnds1[i];
> >
> > -      new_temp = make_ssa_name (mask);
> > +      if (lhs)
> > +   new_temp = make_ssa_name (mask);
> > +      else
> > +   new_temp = make_temp_ssa_name (mask_type, NULL, "cmp");
> >        if (bitop1 == NOP_EXPR)
> >     {
> >       new_stmt = gimple_build_assign (new_temp, code,
> > @@ -12723,6 +12727,211 @@ vectorizable_comparison (vec_info *vinfo,
> >    return true;
> >  }
> >
> > +/* Check to see if the current early break given in STMT_INFO is valid for
> > +   vectorization.  */
> > +
> > +static bool
> > +vectorizable_early_exit (vec_info *vinfo, stmt_vec_info stmt_info,
> > +                    gimple_stmt_iterator *gsi, gimple **vec_stmt,
> > +                    slp_tree slp_node, stmt_vector_for_cost *cost_vec)
> > +{
> > +  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
> > +  if (!loop_vinfo
> > +      || !is_a <gcond *> (STMT_VINFO_STMT (stmt_info)))
> > +    return false;
> > +
> > +  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_condition_def)
> > +    return false;
> > +
> > +  if (!STMT_VINFO_RELEVANT_P (stmt_info))
> > +    return false;
> > +
> > +  DUMP_VECT_SCOPE ("vectorizable_early_exit");
> > +
> > +  auto code = gimple_cond_code (STMT_VINFO_STMT (stmt_info));
> > +
> > +  tree vectype_op0 = NULL_TREE;
> > +  slp_tree slp_op0;
> > +  tree op0;
> > +  enum vect_def_type dt0;
> > +  if (!vect_is_simple_use (vinfo, stmt_info, slp_node, 0, &op0, &slp_op0, &dt0,
> > +                      &vectype_op0))
> > +    {
> > +      if (dump_enabled_p ())
> > +     dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > +                      "use not simple.\n");
> > +   return false;
> > +    }
> > +
> > +  stmt_vec_info op0_info = vinfo->lookup_def (op0);
> > +  tree vectype = truth_type_for (STMT_VINFO_VECTYPE (op0_info));
> > +  gcc_assert (vectype);
> > +
> > +  machine_mode mode = TYPE_MODE (vectype);
> > +  int ncopies;
> > +
> > +  if (slp_node)
> > +    ncopies = 1;
> > +  else
> > +    ncopies = vect_get_num_copies (loop_vinfo, vectype);
> > +
> > +  vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> > +  bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo);
> > +
> > +  /* Analyze only.  */
> > +  if (!vec_stmt)
> > +    {
> > +      if (direct_optab_handler (cbranch_optab, mode) == CODE_FOR_nothing)
> > +   {
> > +     if (dump_enabled_p ())
> > +         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > +                          "can't vectorize early exit because the "
> > +                          "target doesn't support flag setting vector "
> > +                          "comparisons.\n");
> > +     return false;
> > +   }
> > +
> > +      if (ncopies > 1
> > +     && direct_optab_handler (ior_optab, mode) == CODE_FOR_nothing)
> > +   {
> > +     if (dump_enabled_p ())
> > +         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > +                          "can't vectorize early exit because the "
> > +                          "target does not support boolean vector OR for "
> > +                          "type %T.\n", vectype);
> > +     return false;
> > +   }
> > +
> > +      if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
> > +                                 vec_stmt, slp_node, cost_vec))
> > +   return false;
> > +
> > +      if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
> > +   {
> > +     if (direct_internal_fn_supported_p (IFN_VCOND_MASK_LEN, vectype,
> > +                                         OPTIMIZE_FOR_SPEED))
> > +       return false;
> > +     else
> > +       vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype, NULL);
> > +   }
> > +
> > +
> > +      return true;
> > +    }
> > +
> > +  /* Transform.  */
> > +
> > +  tree new_temp = NULL_TREE;
> > +  gimple *new_stmt = NULL;
> > +
> > +  if (dump_enabled_p ())
> > +    dump_printf_loc (MSG_NOTE, vect_location, "transform early-exit.\n");
> > +
> > +  if (!vectorizable_comparison_1 (vinfo, vectype, stmt_info, code, gsi,
> > +                             vec_stmt, slp_node, cost_vec))
> > +    gcc_unreachable ();
> > +
> > +  gimple *stmt = STMT_VINFO_STMT (stmt_info);
> > +  basic_block cond_bb = gimple_bb (stmt);
> > +  gimple_stmt_iterator  cond_gsi = gsi_last_bb (cond_bb);
> > +
> > +  auto_vec<tree> stmts;
> > +
> > +  tree mask = NULL_TREE;
> > +  if (masked_loop_p)
> > +    mask = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype, 0);
> > +
> > +  if (slp_node)
> > +    stmts.safe_splice (SLP_TREE_VEC_DEFS (slp_node));
> > +  else
> > +    {
> > +      auto vec_stmts = STMT_VINFO_VEC_STMTS (stmt_info);
> > +      stmts.reserve_exact (vec_stmts.length ());
> > +      for (auto stmt : vec_stmts)
> > +   stmts.quick_push (gimple_assign_lhs (stmt));
> > +    }
> > +
> > +  /* Determine if we need to reduce the final value.  */
> > +  if (stmts.length () > 1)
> > +    {
> > +      /* We build the reductions in a way to maintain as much parallelism as
> > +    possible.  */
> > +      auto_vec<tree> workset (stmts.length ());
> > +
> > +      /* Mask the statements as we queue them up.  Normally we loop over
> > +    vec_num,  but since we inspect the exact results of vectorization
> > +    we don't need to and instead can just use the stmts themselves.  */
> > +      if (masked_loop_p)
> > +   for (unsigned i = 0; i < stmts.length (); i++)
> > +     {
> > +       tree stmt_mask
> > +         = vect_get_loop_mask (loop_vinfo, gsi, masks, ncopies, vectype,
> > +                               i);
> > +       stmt_mask
> > +         = prepare_vec_mask (loop_vinfo, TREE_TYPE (stmt_mask), stmt_mask,
> > +                             stmts[i], &cond_gsi);
> > +       workset.quick_push (stmt_mask);
> > +     }
> > +      else
> > +   workset.splice (stmts);
> > +
> > +      while (workset.length () > 1)
> > +   {
> > +     new_temp = make_temp_ssa_name (vectype, NULL, "vexit_reduc");
> > +     tree arg0 = workset.pop ();
> > +     tree arg1 = workset.pop ();
> > +     new_stmt = gimple_build_assign (new_temp, BIT_IOR_EXPR, arg0, arg1);
> > +     vect_finish_stmt_generation (loop_vinfo, stmt_info, new_stmt,
> > +                                  &cond_gsi);
> > +     workset.quick_insert (0, new_temp);
> > +   }
> > +    }
> > +  else
> > +    new_temp = stmts[0];
> > +
> > +  gcc_assert (new_temp);
> > +
> > +  tree cond = new_temp;
> > +  /* If we have multiple statements after reduction we should check all the
> > +     lanes and treat it as a full vector.  */
> > +  if (masked_loop_p)
> > +    cond = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask, cond,
> > +                        &cond_gsi);
> 
> This is still wrong, you are applying mask[0] on the IOR reduced result.
> As suggested do that in the else { new_temp = stmts[0] } clause instead
> (or simply elide the optimization of a single vector)

PEBKAC.  I had looked at it and thought it didn't seem right, since why would
mask[0] be used for both the elements and the final result, but left it ☹
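
To check my understanding of the objection, here is a toy model of the two
orderings.  It is only a sketch: each vector of compare results and each loop
mask is modelled as a bitmask with one bit per lane, the names lane_mask,
reduce_masked and reduce_then_mask0 are hypothetical and not GCC APIs, and the
pairwise reduction order of the patch is replaced by a simple left-to-right
fold.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One bit per lane; a set bit means "exit condition hit" for a compare
// result and "lane active" for a loop mask.
using lane_mask = std::uint8_t;

// Mask every compare result with the loop mask of its own vector, then
// IOR-reduce.  With a single vector this degenerates to cmp[0] & mask[0].
lane_mask reduce_masked (const std::vector<lane_mask> &cmps,
			 const std::vector<lane_mask> &loop_masks)
{
  lane_mask result = 0;
  for (std::size_t i = 0; i < cmps.size (); i++)
    result |= cmps[i] & loop_masks[i];
  return result;
}

// IOR-reduce first and apply only mask[0] to the reduced value.  Lanes that
// are inactive in a later vector can leak into the result.
lane_mask reduce_then_mask0 (const std::vector<lane_mask> &cmps,
			     const std::vector<lane_mask> &loop_masks)
{
  lane_mask result = 0;
  for (lane_mask c : cmps)
    result |= c;
  return result & loop_masks[0];
}
```

With compare results {0x0, 0x2} and loop masks {0xF, 0x1}, the only hit is in
an inactive lane of the second vector: reduce_masked gives 0 (no exit) while
reduce_then_mask0 gives 0x2 (a spurious exit).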

I'll wait for Richard's thoughts on the precision before respinning.

Thanks,
Tamar
> 
> > +  /* Now build the new conditional.  Pattern gimple_conds get dropped during
> > +     codegen so we must replace the original insn.  */
> > +  stmt = STMT_VINFO_STMT (vect_orig_stmt (stmt_info));
> > +  gcond *cond_stmt = as_a <gcond *>(stmt);
> > +  /* When vectorizing we assume that if the branch edge is taken that we're
> > +     exiting the loop.  This is not however always the case as the compiler will
> > +     rewrite conditions to always be a comparison against 0.  To do this it
> > +     sometimes flips the edges.  This is fine for scalar,  but for vector we
> > +     then have to flip the test, as we're still assuming that if you take the
> > +     branch edge that we found the exit condition.  */
> > +  auto new_code = NE_EXPR;
> > +  tree cst = build_zero_cst (vectype);
> > +  if (flow_bb_inside_loop_p (LOOP_VINFO_LOOP (loop_vinfo),
> > +                        BRANCH_EDGE (gimple_bb (cond_stmt))->dest))
> > +    {
> > +      new_code = EQ_EXPR;
> > +      cst = build_minus_one_cst (vectype);
> > +    }
> > +
> > +  gimple_cond_set_condition (cond_stmt, new_code, cond, cst);
> > +  update_stmt (stmt);
> > +
> > +  if (slp_node)
> > +    SLP_TREE_VEC_DEFS (slp_node).truncate (0);
> > +   else
> > +    STMT_VINFO_VEC_STMTS (stmt_info).truncate (0);
> > +
> > +
> > +  if (!slp_node)
> > +    *vec_stmt = stmt;
> > +
> > +  return true;
> > +}
> > +
> >  /* If SLP_NODE is nonnull, return true if vectorizable_live_operation
> >     can handle all live statements in the node.  Otherwise return true
> >     if STMT_INFO is not live or if vectorizable_live_operation can handle it.
> > @@ -12949,7 +13158,9 @@ vect_analyze_stmt (vec_info *vinfo,
> >       || vectorizable_lc_phi (as_a <loop_vec_info> (vinfo),
> >                               stmt_info, NULL, node)
> >       || vectorizable_recurr (as_a <loop_vec_info> (vinfo),
> > -                              stmt_info, NULL, node, cost_vec));
> > +                              stmt_info, NULL, node, cost_vec)
> > +     || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
> > +                                 cost_vec));
> >    else
> >      {
> >        if (bb_vinfo)
> > @@ -12972,7 +13183,10 @@ vect_analyze_stmt (vec_info *vinfo,
> >                                      NULL, NULL, node, cost_vec)
> >           || vectorizable_comparison (vinfo, stmt_info, NULL, NULL, node,
> >                                       cost_vec)
> > -         || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec));
> > +         || vectorizable_phi (vinfo, stmt_info, NULL, node, cost_vec)
> > +         || vectorizable_early_exit (vinfo, stmt_info, NULL, NULL, node,
> > +                                     cost_vec));
> > +
> >      }
> >
> >    if (node)
> > @@ -13131,6 +13345,12 @@ vect_transform_stmt (vec_info *vinfo,
> >        gcc_assert (done);
> >        break;
> >
> > +    case loop_exit_ctrl_vec_info_type:
> > +      done = vectorizable_early_exit (vinfo, stmt_info, gsi, &vec_stmt,
> > +                                 slp_node, NULL);
> > +      gcc_assert (done);
> > +      break;
> > +
> >      default:
> >        if (!STMT_VINFO_LIVE_P (stmt_info))
> >     {
> > @@ -14321,10 +14541,19 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
> >      }
> >    else
> >      {
> > +      gcond *cond = NULL;
> >        if (data_reference *dr = STMT_VINFO_DATA_REF (stmt_info))
> >     scalar_type = TREE_TYPE (DR_REF (dr));
> >        else if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
> >     scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
> > +      else if ((cond = dyn_cast <gcond *> (stmt)))
> > +   {
> > +     /* We can't convert the scalar type to boolean yet, since booleans have a
> > +        single bit precision and we need the vector boolean to be a
> > +        representation of the integer mask.  So set the correct integer type and
> > +        convert to boolean vector once we have a vectype.  */
> > +     scalar_type = TREE_TYPE (gimple_cond_lhs (cond));
> 
> You should get into the vect_use_mask_type_p (stmt_info) path for
> early exit conditions (see above with regard to mask_precision).
> 
> > +   }
> >        else
> >     scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
> >
> > @@ -14339,12 +14568,18 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
> >                          "get vectype for scalar type: %T\n", scalar_type);
> >     }
> >        vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
> > +
> >        if (!vectype)
> >     return opt_result::failure_at (stmt,
> >                                    "not vectorized:"
> >                                    " unsupported data-type %T\n",
> >                                    scalar_type);
> >
> > +      /* If we were a gcond, convert the resulting type to a vector boolean type now
> > +    that we have the correct integer mask type.  */
> > +      if (cond)
> > +   vectype = truth_type_for (vectype);
> > +
> 
> which makes this moot.
> 
> Richard.
> 
> >        if (dump_enabled_p ())
> >     dump_printf_loc (MSG_NOTE, vect_location, "vectype: %T\n", vectype);
> >      }
> >
> 
> --
> Richard Biener <rguent...@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
