https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82436
--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
Ok.  So what happens is that during analysis we can perform all permutations,
but later, during code generation, we would fail (or rather, we may not even
fail there, since we don't actually check for it).  This is because during
analysis we have vf == 2 while during transform we have vf == 8.  group_size
is 5, so the loaded elements go 0 1 5 6 10 11 15 16 20 21 ..., and the output
vector for 10 11 15 16 needs elements from three input vectors, which we do
not support.

Either we can try to be more conservative in

  /* For loop vectorization verify we can generate the permutation.  */
  unsigned n_perms;
  FOR_EACH_VEC_ELT (SLP_INSTANCE_LOADS (slp_instn), i, node)
    if (node->load_permutation.exists ()
        && !vect_transform_slp_perm_load
              (node, vNULL, NULL,
               SLP_INSTANCE_UNROLLING_FACTOR (slp_instn), slp_instn, true,
               &n_perms))
      return false;

given that the vectorization factor can end up as the least common multiple
of the loop VF and the maximum SLP instance unrolling factor.  But at this
point it's hard to guess.

We can also redo the analysis after finalizing the vectorization factor and,
if it fails, give up on SLP (the only option at that point).

Or we can enhance code generation to support the three-vector case if it
happens (but still reject it if it occurs at analysis time).

First option, which may reject SLP cases we can actually handle (and isn't
100% fool-proof either):

Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c (revision 253439)
+++ gcc/tree-vect-slp.c (working copy)
@@ -1567,14 +1567,20 @@ vect_supported_load_permutation_p (slp_i
       return true;
     }

-  /* For loop vectorization verify we can generate the permutation.  */
+  /* For loop vectorization verify we can generate the permutation.  Be
+     conservative about the vectorization factor, there are permutations
+     that will use three vector inputs only starting from a specific factor
+     and the vectorization factor is not yet final.
+     ???  The SLP instance unrolling factor might not be the maximum one.  */
   unsigned n_perms;
+  unsigned test_vf
+    = least_common_multiple (SLP_INSTANCE_UNROLLING_FACTOR (slp_instn),
+                             LOOP_VINFO_VECT_FACTOR
+                               (STMT_VINFO_LOOP_VINFO (vinfo_for_stmt (stmt))));
   FOR_EACH_VEC_ELT (SLP_INSTANCE_LOADS (slp_instn), i, node)
     if (node->load_permutation.exists ()
-        && !vect_transform_slp_perm_load
-              (node, vNULL, NULL,
-               SLP_INSTANCE_UNROLLING_FACTOR (slp_instn), slp_instn, true,
-               &n_perms))
+        && !vect_transform_slp_perm_load (node, vNULL, NULL, test_vf,
+                                          slp_instn, true, &n_perms))
       return false;

   return true;
@@ -3560,6 +3566,7 @@ vect_transform_slp_perm_load (slp_tree n
                   dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
                                     stmt, 0);
                 }
+              gcc_assert (analyze_only);
               return false;
             }

@@ -3583,6 +3590,7 @@ vect_transform_slp_perm_load (slp_tree n
                     dump_printf (MSG_MISSED_OPTIMIZATION, "%d ", mask[i]);
                   dump_printf (MSG_MISSED_OPTIMIZATION, "}\n");
                 }
+              gcc_assert (analyze_only);
               return false;
             }