On 14/01/2022 09:57, Richard Biener wrote:
The 'used_vector_modes' is also a heuristic by itself since it registers
every vector type we query, not only those that are used in the end ...
So it's really all heuristics that can eventually go bad.
IMHO remembering the VF that we ended up with (maybe w/o unrolling)
for each analyzed vector_mode[] might be really the easiest thing to do,
that should make it easy to skip those modes where the VF is larger
or equal as the VF of the main loop for the purpose of epilogue
vectorization. Likewise those vector_mode[] that failed analysis can
be remembered (with -1U VF for example).
Richard.
I liked the caching suggestion, so here it is. Sorry for the delay,
wanted to post this after pushing the vect unroll which was waiting on
some retesting for the rebase.
gcc/ChangeLog:
PR 103997
* tree-vect-loop.c (vect_analyze_loop): Fix mode skipping for
epilogue
vectorization.
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index
0fe3529b2d1cf36617c04c1d0f1c4c7bb363607c..6d3dbb2187b4096a7db947b6343257a47cd9fb3d
100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -3004,6 +3004,12 @@ vect_analyze_loop (class loop *loop, vec_info_shared
*shared)
unsigned int mode_i = 0;
unsigned HOST_WIDE_INT simdlen = loop->simdlen;
+ /* Keep track of the VF for each mode. Initialize all to 0 which indicates
+ a mode has not been analyzed. */
+ auto_vec<poly_uint64, 8> cached_vf_per_mode;
+ for (unsigned i = 0; i < vector_modes.length (); ++i)
+ cached_vf_per_mode.safe_push (0);
+
/* First determine the main loop vectorization mode, either the first
one that works, starting with auto-detecting the vector mode and then
following the targets order of preference, or the one with the
@@ -3011,6 +3017,10 @@ vect_analyze_loop (class loop *loop, vec_info_shared
*shared)
while (1)
{
bool fatal;
+ unsigned int last_mode_i = mode_i;
+ /* Set cached VF to -1 prior to analysis, which indicates a mode has
+ failed. */
+ cached_vf_per_mode[last_mode_i] = -1;
opt_loop_vec_info loop_vinfo
= vect_analyze_loop_1 (loop, shared, &loop_form_info,
NULL, vector_modes, mode_i,
@@ -3020,6 +3030,12 @@ vect_analyze_loop (class loop *loop, vec_info_shared
*shared)
if (loop_vinfo)
{
+ /* Analyzis has been successful so update the VF value. The
+ VF should always be a multiple of unroll_factor and we want to
+ capture the original VF here. */
+ cached_vf_per_mode[last_mode_i]
+ = exact_div (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
+ loop_vinfo->suggested_unroll_factor);
/* Once we hit the desired simdlen for the first time,
discard any previous attempts. */
if (simdlen
@@ -3105,7 +3121,7 @@ vect_analyze_loop (class loop *loop, vec_info_shared
*shared)
would be at least as high as the main loop's and we would be
vectorizing for more scalar iterations than there would be left. */
if (!supports_partial_vectors
- && maybe_ge (GET_MODE_NUNITS (vector_modes[mode_i]), first_vinfo_vf))
+ && maybe_ge (cached_vf_per_mode[mode_i], first_vinfo_vf))
{
mode_i++;
if (mode_i == vector_modes.length ())