https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101636
--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 52492
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52492&action=edit
GIMPLE testcase

So I think that the IL we produce from SLP vectorizing the if-converted loop
body is not great and we should address this issue there.  In particular
emitting a VECTOR_BOOLEAN_TYPE_P CTOR for the external bools is not OK which
is also what the iffy code in vect_create_constant_vectors shows.

A non-loop GIMPLE testcase for this is attached.  It doesn't ICE but the
code generated is just awful.

I've tried to compensate in vect_create_constant_vectors itself by creating
a non-VECTOR_BOOLEAN_TYPE_P CTOR and producing a VECTOR_BOOLEAN_TYPE_P via
a NE comparison but with just AVX512F we can handle V16SImode compares but
not V16QImode which is what would naturally appear - and vector lowering will
decompose that again and we have no means of failing vectorization in this
function.

Instead I think this needs to be handled by patterns and if it is not,
rejected.  In this case it's vectorizable_operation for bitwise ops that
just picks the result vector type here

  /* If op0 is an external or constant def, infer the vector type
     from the scalar type.  */
  if (!vectype)
    {
      /* For boolean type we cannot determine vectype by
         invariant value (don't know whether it is a vector
         of booleans or vector of integers).  We use output
         vectype because operations on boolean don't change
         type.  */
      if (VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (op0)))
        {
          if (!VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (scalar_dest)))
            {
              if (dump_enabled_p ())
                dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                                 "not supported operation on bool value.\n");
              return false;
            }
          vectype = vectype_out;
        }

but that assumes we can create a vector bool from invariants or externals
which we generally cannot.  If we disable that here we'll run into the
same issue for the COND_EXPR.
Looking at vect_recog_bool_pattern it really does two things at the same
time: it optimizes | and & sequences _and_ performs correctness transforms
based on mask uses.  In this case we only start from the COND_EXPR as a mask
use but once we see the internal-def & external-def mask def we decide we do
not want to optimize it.  But we'd still need to make the external def
suitable for the mask use (and we know the precision to use there).
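
To illustrate the shape of the transform discussed above (widen the external
bools to an integer vector of the precision the mask use wants, then obtain
the mask with a NE comparison against zero), here is a minimal source-level
sketch using the GCC/Clang vector extensions.  It is not GCC-internal code
and not the attached testcase: the type v4si and the helper names
widen_external_bools, mask_from_external_bools and select_lt_and_external
are hypothetical, and the four 32-bit lanes are an arbitrary choice for the
example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Four 32-bit lanes (GCC/Clang vector extension; illustrative only).  */
typedef int32_t v4si __attribute__ ((vector_size (16)));

/* Build an ordinary integer vector from four external scalar bools.
   This plays the role of a non-VECTOR_BOOLEAN_TYPE_P CTOR.  */
static v4si
widen_external_bools (const bool *b)
{
  assert (b != NULL);
  v4si v = { b[0], b[1], b[2], b[3] };
  return v;
}

/* Produce the mask via a NE comparison against zero: lanes that were
   true become all-ones (-1), lanes that were false become 0.  */
static v4si
mask_from_external_bools (const bool *b)
{
  v4si zero = { 0, 0, 0, 0 };
  return widen_external_bools (b) != zero;
}

/* The mask use (standing in for the COND_EXPR): AND an internally
   computed mask (x < y) with the external one, then select per lane.  */
static v4si
select_lt_and_external (v4si x, v4si y, const bool *ext,
                        v4si then_v, v4si else_v)
{
  v4si internal = x < y;
  v4si mask = internal & mask_from_external_bools (ext);
  return (then_v & mask) | (else_v & ~mask);
}
```

The lane width is picked from the mask use (32-bit data here), which matches
the point that the precision to use is known at the COND_EXPR.  Whether the
target can do the compare in that mode is a separate question the sketch
does not address.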