https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81303
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
So one useful change is the following, which makes the runtime profitability
threshold 6, and thus the vector loop is never entered.  Even though that
should be quite a predictable conditional jump, it turns out we mess up BB
placement so that the result isn't a big improvement (254s -> 250s).  This is
probably also due to the fact that we end up peeling the inner loop completely
(we know it iterates <= profitability threshold times).  Plus we do not version
the loop but share the non-profitable part with the peeled copy, making RA's
job harder :/

Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c        (revision 250384)
+++ gcc/tree-vect-loop.c        (working copy)
@@ -3702,8 +3702,9 @@ vect_estimate_min_profitable_iters (loop
                     "  Calculated minimum iters for profitability: %d\n",
                     min_profitable_iters);
 
-  min_profitable_iters =
-    min_profitable_iters < vf ? vf : min_profitable_iters;
+  /* We want the vectorized loop to execute at least once.  */
+  if (min_profitable_iters < (vf + peel_iters_prologue + peel_iters_epilogue))
+    min_profitable_iters = vf + peel_iters_prologue + peel_iters_epilogue;
 
   if (dump_enabled_p ())
     dump_printf_loc (MSG_NOTE, vect_location,
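
To make the effect of the changed clamp easier to see outside of GCC, here is a
minimal standalone sketch.  The function names clamp_old and clamp_new are
hypothetical and not part of GCC; only the variable names are taken from the
patch.  The numbers in the note after the code are made up for illustration and
are not the values from this bug.

```c
/* Standalone sketch of the clamp in vect_estimate_min_profitable_iters.
   clamp_old and clamp_new are hypothetical helper names, not GCC code.  */

/* Before the patch: the threshold is raised to at least one full
   vector iteration (vf scalar iterations).  */
int
clamp_old (int min_profitable_iters, int vf)
{
  return min_profitable_iters < vf ? vf : min_profitable_iters;
}

/* After the patch: the threshold also accounts for the scalar iterations
   spent in the peeled prologue and epilogue, so that the vectorized loop
   body itself executes at least once.  */
int
clamp_new (int min_profitable_iters, int vf,
	   int peel_iters_prologue, int peel_iters_epilogue)
{
  int min_iters = vf + peel_iters_prologue + peel_iters_epilogue;

  if (min_profitable_iters < min_iters)
    min_profitable_iters = min_iters;
  return min_profitable_iters;
}
```

With assumed values vf = 4, peel_iters_prologue = 1 and peel_iters_epilogue = 3,
a computed minimum of 3 is raised to 4 by the old clamp but to 8 by the new
one.  A computed minimum that is already larger than both bounds (say 10) is
left unchanged by either variant.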