http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53355
--- Comment #4 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-05-16 14:46:33 UTC ---
One major remaining issue is that the entry checks for the versioned loops are all based on the number of iterations of said loop instead of on the feature (in the case of peeling for alignment we don't check whether the access is aligned, we check whether the alignment loop runs zero times). That causes unnecessary code to be executed on the path that does not need it.

Another remaining issue is that we allow for

  if (unaligned)
    for (;;)
  if (any-vectorized-iterations)
    for (;;)
  if (any-remaining-iterations)
    for (;;)

thus any-vectorized-iterations may be false when we leave the alignment loop. That sounds bogus - this case should drop into a single purely scalar loop instead, avoiding the check and making all cases faster. In the testcase this gets optimized away (because we have a known number of overall iterations), but with a variable upper bound, cost considerations should not allow this case.

In general the scalar loop version, the epilogue and only then the prologue loop should be considered as fallback for the non-profitable case (in this order). Not the prologue loop first.
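
To make the two control-flow shapes concrete, here is a minimal source-level sketch in C. It is only an illustration, not what the vectorizer emits: the function names, the vectorization factor VF = 4 and the 4-wide unrolled body standing in for an aligned vector operation are all assumptions made for the example. sum_current mirrors the layout criticized above, while sum_proposed decides up front whether any vector iteration can run and otherwise falls into a single scalar loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { VF = 4 };  /* hypothetical vectorization factor (ints per vector) */

/* Scalar iterations needed to reach a VF * sizeof(int) boundary.
   Assumes 'a' is at least int-aligned. */
static size_t peel_count(const int *a)
{
    size_t misalign = ((uintptr_t)a % (VF * sizeof(int))) / sizeof(int);
    return misalign ? VF - misalign : 0;
}

/* Layout described in the comment: three independently guarded loops.
   The middle check can still fail after the alignment loop has run. */
static long sum_current(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;
    size_t peel = peel_count(a);
    if (peel > n)
        peel = n;

    if (peel != 0)                      /* iteration-count based entry check */
        for (; i < peel; i++)
            s += a[i];
    if (n - i >= VF)                    /* any-vectorized-iterations */
        for (; i + VF <= n; i += VF)    /* stand-in for the vector body */
            s += (long)a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    if (i < n)                          /* any-remaining-iterations */
        for (; i < n; i++)
            s += a[i];
    return s;
}

/* Suggested layout: if no vector iteration would run after peeling,
   use one purely scalar loop and skip the later checks entirely. */
static long sum_proposed(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;
    size_t peel = peel_count(a);

    if (n < peel + VF) {                /* not profitable: scalar fallback */
        for (; i < n; i++)
            s += a[i];
        return s;
    }

    for (; i < peel; i++)               /* prologue (may run zero times) */
        s += a[i];
    assert(n - i >= VF);                /* vector body is guaranteed to run */
    for (; i + VF <= n; i += VF)        /* stand-in for the vector body */
        s += (long)a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    for (; i < n; i++)                  /* epilogue */
        s += a[i];
    return s;
}
```

Both functions compute the same sum; the difference is only which checks sit on which path. In sum_proposed the short-trip-count case pays for one comparison and one loop, and the vectorized path no longer needs the any-vectorized-iterations test after the prologue.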