https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116463
--- Comment #33 from Richard Biener <rguenth at gcc dot gnu.org> ---
I see. Note it's SLP discoveries association code that figures out a SLP
graph, disabling this ends up with single-lane (store-lanes) from the start.
The association that "succeeds" first wins, and it's an unfortunate one (for
SLP pattern detection).
The thing is that the re-association greedily figures the best operand
order as well. We start with
t.c:3:21: note: pre-sorted chains of plus_expr
plus_expr _19 plus_expr _27 minus_expr _26
plus_expr _18 minus_expr _29 minus_expr _28
and if we'd start with
plus_expr _19 plus_expr _27 minus_expr _26
plus_expr _18 minus_expr _28 minus_expr _29
instead we get the desired SLP pattern match but still store-lanes is
prefered it seems (not sure how we got away with no store-lanes in GCC 13).
We could simply refuse to override the SLP graph with laod/store-lanes
when patterns were found:
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 3892e1be3f2..4fb57a76f85 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -5064,7 +5065,7 @@ vect_analyze_slp (vec_info *vinfo, unsigned
max_tree_size,
to cancel SLP when this applied to all instances in a loop but now
we decide this per SLP instance. It's important to do this only
after SLP pattern recognition. */
- if (is_a <loop_vec_info> (vinfo))
+ if (!pattern_found && is_a <loop_vec_info> (vinfo))
FOR_EACH_VEC_ELT (LOOP_VINFO_SLP_INSTANCES (vinfo), i, instance)
if (SLP_INSTANCE_KIND (instance) == slp_inst_kind_store
&& !SLP_INSTANCE_TREE (instance)->ldst_lanes)
when starting with the swapped ops above we then get the desired code
again. I've hacked that in with
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 3892e1be3f2..4fb57a76f85 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -2275,6 +2275,7 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
/* 1. pre-sort according to def_type and operation. */
for (unsigned lane = 0; lane < group_size; ++lane)
chains[lane].stablesort (dt_sort_cmp, vinfo);
+ std::swap (chains[1][2], chains[1][1]);
if (dump_enabled_p ())
{
dump_printf_loc (MSG_NOTE, vect_location,
it happens that in this specific case the optimal operand order matches
stmt order so the following produces that - but I'm not positively sure
that's always good (though the 'stablesort' also tries to not disturb
order - but in this case it's the DFS order collecting the scalar ops).
In reality there's not enough info on the op or its definition to locally
decide a better order for future pattern matching.
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 3892e1be3f2..f21e8b909ff 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -1684,7 +1684,12 @@ dt_sort_cmp (const void *op1_, const void *op2_, void *)
auto *op2 = (const chain_op_t *) op2_;
if (op1->dt != op2->dt)
return (int)op1->dt - (int)op2->dt;
- return (int)op1->code - (int)op2->code;
+ if ((int)op1->code != (int)op2->code)
+ return (int)op1->code - (int)op2->code;
+ if (TREE_CODE (op1->op) == SSA_NAME && TREE_CODE (op2->op) == SSA_NAME)
+ return (gimple_uid (SSA_NAME_DEF_STMT (op1->op))
+ - gimple_uid (SSA_NAME_DEF_STMT (op2->op)));
+ return 0;
}
/* Linearize the associatable expression chain at START with the
That said, I don't have a good idea on how to make this work better, not
even after re-doing SLP discovery. Maybe SLP patterns need to work on the
initial single-lane SLP graph? But then we'd have to find lane-matches
on two unconnected SLP sub-graphs which complicates the pattern matching
part.
We basically form SLP nodes from two sets of (two lanes) plus/minus ops
(three each) but we of course try to avoid SLP build of all 3! permutations
possible and stop at the first one that succeeds.