https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116352
--- Comment #19 from Richard Biener <rguenth at gcc dot gnu.org> ---
First of all, we end up with the following weird SLP nodes:
t.c:4:12: note: op: VEC_PERM_EXPR
t.c:4:12: note: [l] stmt 0 center_x_317 = _316 * _stepX_14(D);
t.c:4:12: note: [l] stmt 1 center_y_320 = _319 * _stepY_17(D);
t.c:4:12: note: [l] stmt 2 center_x_317 = _316 * _stepX_14(D);
t.c:4:12: note: [l] stmt 3 center_y_320 = _319 * _stepY_17(D);
t.c:4:12: note: lane permutation { 0[0] 0[2] 0[0] 0[2] }
t.c:4:12: note: children 0x4d3c8c0 0x4d3c8c0
t.c:4:12: note: node (external) 0x4d3c8c0 (max_nunits=1, refcnt=3) vector(4) float
t.c:4:12: note: { center_x_317, _68, center_y_320, _69 }
t.c:4:12: note: node 0x4d3c9e0 (max_nunits=1, refcnt=2) vector(4) float
t.c:4:12: note: op: VEC_PERM_EXPR
t.c:4:12: note: stmt 0 _68 = _boxWidth_20(D) * 5.0e-1;
t.c:4:12: note: stmt 1 _69 = _boxHeight_21(D) * 5.0e-1;
t.c:4:12: note: stmt 2 _68 = _boxWidth_20(D) * 5.0e-1;
t.c:4:12: note: stmt 3 _69 = _boxHeight_21(D) * 5.0e-1;
t.c:4:12: note: lane permutation { 0[1] 0[3] 0[1] 0[3] }
t.c:4:12: note: children 0x4d3c8c0 0x4d3c8c0
That's "weird" because permuting an external node is not optimal. We do
t.c:4:12: note: Replace two_operators operands:
t.c:4:12: note: Operand 0:
t.c:4:12: note: stmt 0 center_x_417 = _416 * _stepX_14(D);
t.c:4:12: note: stmt 1 center_y_420 = _419 * _stepY_17(D);
t.c:4:12: note: stmt 2 center_x_417 = _416 * _stepX_14(D);
t.c:4:12: note: stmt 3 center_y_420 = _419 * _stepY_17(D);
t.c:4:12: note: Operand 1:
t.c:4:12: note: stmt 0 _68 = _boxWidth_20(D) * 5.0e-1;
t.c:4:12: note: stmt 1 _69 = _boxHeight_21(D) * 5.0e-1;
t.c:4:12: note: stmt 2 _68 = _boxWidth_20(D) * 5.0e-1;
t.c:4:12: note: stmt 3 _69 = _boxHeight_21(D) * 5.0e-1;
t.c:4:12: note: With a single operand:
t.c:4:12: note: stmt 0 center_x_417 = _416 * _stepX_14(D);
t.c:4:12: note: stmt 1 _68 = _boxWidth_20(D) * 5.0e-1;
t.c:4:12: note: stmt 2 center_y_420 = _419 * _stepY_17(D);
t.c:4:12: note: stmt 3 _69 = _boxHeight_21(D) * 5.0e-1;
It looks like we fail discovery here and fall back to externals, but still
end up generating the permutes. It seems the has_two_operators_perm
code does not backtrack after all?
Anyway, we're using get_later_stmt all over the place, and it assumes that
the defs are in the same basic block. That happens both in
vect_create_constant_vectors, for the case where we have former internal
defs, and when scheduling via vect_find_first/last_scalar_stmt_in_slp.
If we allowed different BBs we'd have to ensure that we can order the defs,
or at least the insert locations, which we do not yet verify (whether we can
schedule is something we'd only find out during transform at the moment).
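To illustrate the assumption, here is a minimal toy sketch, not GCC code. ToyStmt and toy_later_stmt are hypothetical names, and the model simply assumes that UIDs give a meaningful order only within one block. It declines to answer for statements in different blocks instead of silently comparing UIDs:

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for a statement: the block it lives in plus a UID.
// Assumption of this toy: UIDs only order statements within one block.
struct ToyStmt {
  int bb;        // basic-block index
  unsigned uid;  // position-like identifier
};

// Return the later of two statements, or nullopt when they live in
// different blocks and this toy model has no way to order them.
// On equal UIDs the second statement is returned, like the original else arm.
inline std::optional<ToyStmt>
toy_later_stmt (const ToyStmt &a, const ToyStmt &b)
{
  if (a.bb != b.bb)
    return std::nullopt;
  return a.uid > b.uid ? a : b;
}
```

A real solution for different BBs would need an actual ordering (for example based on dominance) rather than a refusal; the sketch only shows where the same-BB assumption sits.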
So the fix is to restrict both SLP build (and thereby extern promotion) and
the new "mixing" of two operands to honor a single-BB def.
I'll not pursue a more complex solution unless the former causes regressions
(we're eventually fine for strictly orderable defs).
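As a rough sketch of what "honor a single-BB def" could look like as a predicate (ToyDef and toy_all_defs_in_one_bb are hypothetical names, and this is not the actual SLP build code), a group of defs is accepted only when all of them live in the same block:

```cpp
#include <vector>

// Hypothetical stand-in for a def; only its basic-block index matters here.
struct ToyDef {
  int bb;
};

// True when every def in the group lives in the same block.
// An empty group is trivially accepted (the loop body never runs).
inline bool
toy_all_defs_in_one_bb (const std::vector<ToyDef> &defs)
{
  for (const ToyDef &d : defs)
    if (d.bb != defs.front ().bb)
      return false;
  return true;
}
```

A group that fails such a check would simply not be built as an internal node in this toy model.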
That said ...
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 7f69a3f57b4..bca3e31d6eb 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -1854,8 +1854,10 @@ vect_orig_stmt (stmt_vec_info stmt_info)
 inline stmt_vec_info
 get_later_stmt (stmt_vec_info stmt1_info, stmt_vec_info stmt2_info)
 {
-  if (gimple_uid (vect_orig_stmt (stmt1_info)->stmt)
-      > gimple_uid (vect_orig_stmt (stmt2_info)->stmt))
+  gimple *stmt1 = vect_orig_stmt (stmt1_info)->stmt;
+  gimple *stmt2 = vect_orig_stmt (stmt2_info)->stmt;
+  gcc_assert (gimple_bb (stmt1) == gimple_bb (stmt2));
+  if (gimple_uid (stmt1) > gimple_uid (stmt2))
     return stmt1_info;
   else
     return stmt2_info;