https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82436

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
Ok.  So what happens is that during analysis we can perform all permutations,
but later during code-gen we would fail (code-gen is not allowed to fail, and
we don't actually check for it there).

This is because during analysis we have vf == 2 and during transform vf == 8.
group_size is 5 so we go 0 1 5 6 10 11 15 16 20 21 ...
and for 10 11 15 16 we cross three vectors which we do not support.
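
To make the index arithmetic concrete, here is a small standalone sketch (not
GCC code).  It assumes 4 elements per vector, which is what the "10 11 15 16"
grouping above implies, and a load permutation that picks elements 0 and 1 of
each group of 5.  The helper name max_input_vectors and the MAX_NUNITS limit
are made up for illustration.  It counts how many distinct input vectors a
single output vector has to read from for a given vf:

```c
#include <assert.h>

#define MAX_NUNITS 16

/* Hypothetical helper, not part of GCC: return the largest number of
   distinct input vectors that any one full output vector of the permuted
   load reads from.  PERM holds the N_PERM element offsets selected from
   each group of GROUP_SIZE scalars; NUNITS is the assumed number of
   elements per vector.  */
static unsigned
max_input_vectors (unsigned group_size, const unsigned *perm,
                   unsigned n_perm, unsigned nunits, unsigned vf)
{
  assert (nunits > 0 && nunits <= MAX_NUNITS);
  unsigned total = n_perm * vf;   /* scalar elements produced overall */
  unsigned worst = 0;

  for (unsigned start = 0; start + nunits <= total; start += nunits)
    {
      unsigned seen[MAX_NUNITS];
      unsigned n_seen = 0;

      for (unsigned j = 0; j < nunits; j++)
        {
          unsigned k = start + j;
          /* Scalar index in memory: 0 1 5 6 10 11 15 16 ... for the
             example in this comment.  */
          unsigned idx = (k / n_perm) * group_size + perm[k % n_perm];
          unsigned vec = idx / nunits;   /* input vector holding idx */

          unsigned m;
          for (m = 0; m < n_seen; m++)
            if (seen[m] == vec)
              break;
          if (m == n_seen)
            seen[n_seen++] = vec;
        }

      if (n_seen > worst)
        worst = n_seen;
    }
  return worst;
}
```

Under these assumptions, calling it with group_size 5, the permutation
{ 0, 1 } and nunits 4 gives 2 for vf == 2 (only 0 1 5 6 is generated) and 3
for vf == 8 (the output vector holding 10 11 15 16 reads input vectors 2, 3
and 4).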

Either we can try to be more conservative in

  /* For loop vectorization verify we can generate the permutation.  */
  unsigned n_perms;
  FOR_EACH_VEC_ELT (SLP_INSTANCE_LOADS (slp_instn), i, node)
    if (node->load_permutation.exists ()
        && !vect_transform_slp_perm_load
              (node, vNULL, NULL,
               SLP_INSTANCE_UNROLLING_FACTOR (slp_instn), slp_instn, true,
               &n_perms))
      return false;

given the vectorization factor can end up as least-common-multiple of the
loop VF and the maximum SLP instance unrolling factor.  But at this point
it's hard to guess.
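
As a rough illustration of that least-common-multiple relation, a standalone
sketch follows.  The names gcd_u and lcm_u are hypothetical and are not the
helpers GCC uses; the values 2 and 8 are the analysis-time and transform-time
vf quoted above, and treating 8 as the loop VF is an assumption.

```c
#include <assert.h>

/* Greatest common divisor by Euclid's algorithm.  */
static unsigned
gcd_u (unsigned a, unsigned b)
{
  while (b != 0)
    {
      unsigned t = a % b;
      a = b;
      b = t;
    }
  return a;
}

/* Hypothetical stand-in for a least-common-multiple helper.  Dividing
   before multiplying keeps the intermediate value small.  */
static unsigned
lcm_u (unsigned a, unsigned b)
{
  assert (a != 0 && b != 0);
  return a / gcd_u (a, b) * b;
}
```

With an SLP instance unrolling factor of 2 and a loop VF of 8, lcm_u (2, 8)
is 8, so a check done with vf == 2 never sees the output vectors that only
appear at vf == 8.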

We can also re-do the analysis after finalizing the vectorization factor
and give up on SLP if failing (as the only option).

Or we can enhance code-generation to support the three-vector case if it
happens (but still reject it if it occurs at analysis time).

First option, which may reject SLP cases we can actually handle (and isn't
100% fool-proof either):

Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c (revision 253439)
+++ gcc/tree-vect-slp.c (working copy)
@@ -1567,14 +1567,20 @@ vect_supported_load_permutation_p (slp_i
       return true;
     }

-  /* For loop vectorization verify we can generate the permutation.  */
+  /* For loop vectorization verify we can generate the permutation.  Be
+     conservative about the vectorization factor, there are permutations
+     that will use three vector inputs only starting from a specific factor
+     and the vectorization factor is not yet final.
+     ???  The SLP instance unrolling factor might not be the maximum one.  */
   unsigned n_perms;
+  unsigned test_vf
+    = least_common_multiple (SLP_INSTANCE_UNROLLING_FACTOR (slp_instn),
+                            LOOP_VINFO_VECT_FACTOR
+                              (STMT_VINFO_LOOP_VINFO (vinfo_for_stmt (stmt))));
   FOR_EACH_VEC_ELT (SLP_INSTANCE_LOADS (slp_instn), i, node)
     if (node->load_permutation.exists ()
-       && !vect_transform_slp_perm_load
-             (node, vNULL, NULL,
-              SLP_INSTANCE_UNROLLING_FACTOR (slp_instn), slp_instn, true,
-              &n_perms))
+       && !vect_transform_slp_perm_load (node, vNULL, NULL, test_vf,
+                                         slp_instn, true, &n_perms))
       return false;

   return true;
@@ -3560,6 +3566,7 @@ vect_transform_slp_perm_load (slp_tree n
                  dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
                                    stmt, 0);
                }
+             gcc_assert (analyze_only);
              return false;
            }

@@ -3583,6 +3590,7 @@ vect_transform_slp_perm_load (slp_tree n
                        dump_printf (MSG_MISSED_OPTIMIZATION, "%d ", mask[i]);
                      dump_printf (MSG_MISSED_OPTIMIZATION, "}\n");
                    }
+                 gcc_assert (analyze_only);
                  return false;
                }
