https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101636

--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 52492
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52492&action=edit
GIMPLE testcase

So I think that the IL we produce from SLP vectorizing the if-converted loop
body is not great and we should address this issue there.  In particular,
emitting a VECTOR_BOOLEAN_TYPE_P CTOR for the external bools is not OK, which
is also what the iffy code in vect_create_constant_vectors shows.  A non-loop
GIMPLE testcase for this is attached.

It doesn't ICE but the code generated is just awful.

I've tried to compensate in vect_create_constant_vectors itself by creating
a non-VECTOR_BOOLEAN_TYPE_P CTOR and producing a VECTOR_BOOLEAN_TYPE_P via
a NE comparison.  But with just AVX512F we can handle V16SImode compares but
not V16QImode, which is what would naturally appear.  Vector lowering will
then decompose that again, and we have no means of failing vectorization in
this function.
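
As a rough scalar model of the "integer CTOR plus NE comparison" idea above
(plain C for illustration only, not vectorizer code; the function name, the
lane count and the 16-bit mask representation are assumptions made for this
sketch):

```c
#include <stdbool.h>
#include <stdint.h>

#define NLANES 16

/* Scalar model only: collect the external bools into integer lanes
   (standing in for the non-mask CTOR), then derive the mask with a
   lane-wise "!= 0" comparison.  */
uint16_t
mask_from_external_bools (const bool ext[NLANES])
{
  uint8_t lanes[NLANES];   /* models a vector of 16 byte-sized lanes */
  uint16_t mask = 0;       /* models a 16-lane mask, one bit per lane */

  /* Step 1: build the integer vector from the external bools.  */
  for (int i = 0; i < NLANES; i++)
    lanes[i] = ext[i] ? 1 : 0;

  /* Step 2: the NE comparison against zero produces the mask.  */
  for (int i = 0; i < NLANES; i++)
    if (lanes[i] != 0)
      mask |= (uint16_t) (1u << i);

  return mask;
}
```

The byte-sized lanes in step 2 are where the target support question from
the paragraph above comes in.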

Instead I think this needs to be handled by patterns and, if it is not,
rejected.  In this case it's vectorizable_operation for bitwise ops
that just picks the result vector type here:

  /* If op0 is an external or constant def, infer the vector type
     from the scalar type.  */
  if (!vectype)
    {
      /* For boolean type we cannot determine vectype by
         invariant value (don't know whether it is a vector
         of booleans or vector of integers).  We use output
         vectype because operations on boolean don't change
         type.  */
      if (VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (op0)))
        {
          if (!VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (scalar_dest)))
            {
              if (dump_enabled_p ())
                dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                                 "not supported operation on bool value.\n");
              return false;
            }
          vectype = vectype_out;
        }

but that assumes we can create a vector bool from invariants or externals,
which we generally cannot.  If we disable that here we'll run into the
same issue for the COND_EXPR.

Looking at vect_recog_bool_pattern, it really does two things at the same
time: it optimizes |& sequences _and_ performs correctness transforms based
on mask uses.  In this case we only start from the COND_EXPR as a mask use,
but once we see the internal-def & external-def mask def we decide we do not
want to optimize it.  But we'd still need to make the external def suitable
for the mask use (and we know the precision to use there).
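
For orientation, the shape being described (a COND_EXPR whose mask is an
internal compare ANDed with an external bool) might look like this at the
source level.  This is a hypothetical sketch, not the attached GIMPLE
testcase, and it is not claimed to reproduce the problem; the function and
parameter names are made up:

```c
#include <stdbool.h>

/* Four adjacent stores form an SLP-style group.  Each select is
   controlled by (internal compare) & (external bool), where the
   bools e0..e3 are defined outside the group.  */
void
select4 (int *out, const int *a, const int *b,
         bool e0, bool e1, bool e2, bool e3)
{
  out[0] = ((a[0] < b[0]) & e0) ? a[0] : b[0];
  out[1] = ((a[1] < b[1]) & e1) ? a[1] : b[1];
  out[2] = ((a[2] < b[2]) & e2) ? a[2] : b[2];
  out[3] = ((a[3] < b[3]) & e3) ? a[3] : b[3];
}
```

Here the compares are on int, so the mask use would want int-sized
precision while the external bools are byte-sized.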
