On Thu, Nov 16, 2023 at 11:30 PM Robin Dapp <rdapp....@gmail.com> wrote: > > > For the fortran testcase we don't even run into this but hit an > > internal def and assert on > > > > gcc_assert (STMT_VINFO_VEC_STMTS (def_stmt_info).length () == > > ncopies); > > > > I think this shows missing handling of .COND_* in the bool pattern > > recognition > > as we get the 'bool' condition as boolean data vector rather than a mask. > > The > > same is true for the testcase with the invariant condition. This causes us > > to > > select the wrong vector type here. The "easiest" might be to look at > > how COND_EXPR is handled in vect_recog_bool_pattern and friends and > > handle .COND_* IFNs the same for the mask operand. > > For the first (imagick) testcase adding a bool pattern does not help > because we always pass NULL as vectype to vect_get_vec_defs. > Doing so we will always use get_vectype_for_scalar_type (i.e. > a "full" bool vector) because vectype of the (conditional) stmt > is the lhs type and not the mask's type. > For cond_exprs in vectorizable_condition we directly pass a > comp_vectype instead (truth_type). Wouldn't that, independently > of the pattern recog, make sense?
The issue with the first testcase is that the condition operand of the .COND_ADD is loop invariant. When not using SLP there's no way to store the desired vector type for those as they are not code-generated explicitly but implicitly when we use them via vect_get_vec_defs. But note you can explicitly specify a vector type as well, there's an overload for it, so we can fix the "invariant" case with the following (OK if you can test this on relevant targets) diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index 3f59139cb01..936a3de9534 100644 --- a/gcc/tree-vect-loop.cc +++ b/gcc/tree-vect-loop.cc @@ -8450,12 +8450,14 @@ vect_transform_reduction (loop_vec_info loop_vinfo, value. */ bool cond_fn_p = code.is_internal_fn () && conditional_internal_fn_code (internal_fn (code)) != ERROR_MARK; + int cond_index = -1; if (cond_fn_p) { gcc_assert (code == IFN_COND_ADD || code == IFN_COND_SUB || code == IFN_COND_MUL || code == IFN_COND_AND || code == IFN_COND_IOR || code == IFN_COND_XOR); gcc_assert (op.num_ops == 4 && (op.ops[1] == op.ops[3])); + cond_index = 0; } bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo); @@ -8486,12 +8488,13 @@ vect_transform_reduction (loop_vec_info loop_vinfo, vect_get_vec_defs (loop_vinfo, stmt_info, slp_node, ncopies, single_defuse_cycle && reduc_index == 0 ? NULL_TREE : op.ops[0], &vec_oprnds0, + cond_index == 0 ? truth_type_for (vectype_in) : NULL_TREE, single_defuse_cycle && reduc_index == 1 - ? NULL_TREE : op.ops[1], &vec_oprnds1, + ? NULL_TREE : op.ops[1], &vec_oprnds1, NULL_TREE, op.num_ops == 4 || (op.num_ops == 3 && !(single_defuse_cycle && reduc_index == 2)) - ? op.ops[2] : NULL_TREE, &vec_oprnds2); + ? op.ops[2] : NULL_TREE, &vec_oprnds2, NULL_TREE); /* For single def-use cycles get one copy of the vectorized reduction definition. */ > > Now for the Fortran testcase I'm still a bit lost. Opposed to > before we now vectorize with a variable VF and hit the problem > in the epilogue with ncopies = 2. > > .COND_ADD (_7, __gcov0.__brute_force_MOD_brute_I_lsm.21_67, 1, > __gcov0.__brute_force_MOD_brute_I_lsm.21_67); > where > _7 = *_6 > which is an internal_def. > > I played around with doing it analogously to the COND_EXPR > handling, so creating a COND_ADD (_7 != 0) which will required > several fixups in other places because we're not prepared to > handle that. In the end it seems to only shift the problem > because we will still need the definition of _7. No, you shouldn't place _7 != 0 inside the .COND_ADD but instead have an extra pattern stmt producing that so patt_8 = _7 != 0; patt_9 = .COND_ADD (patt_8, ...); that's probably still not enough, but I always quickly forget how bool patterns work ... basically a comparison like patt_8 = _7 != 0 vectorizes to a mask (aka vector boolean) while any "data" uses of bools are replaced by mask ? 1 : 0; - there's a complication for bool data producing loads which is why we need to insert the "fake" compares to produce a mask. IIRC. > I guess you're implying that the definition should have already > been handled by pattern recognition so that at the point when > we need it, it has a related pattern stmt with the proper mask > type? As said, the compare needs to be a separate pattern stmt part of the .COND_ADD patterns pattern sequence. Richard. > > Regards > Robin >