http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53355

--- Comment #4 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-05-16 
14:46:33 UTC ---
One major remaining issue is that the entry checks for the versioned loops
are all based on the number of iterations of said loop instead of on the
feature itself (in the case of peeling for alignment we don't check whether
the access is aligned; we check whether the alignment loop runs zero times).
That causes unnecessary code to be executed on the path that does not need it.
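
To make the distinction concrete, here is a hand-written C sketch of the two
styles of entry check for the alignment prologue.  It is not what the
vectorizer emits; the names (prologue_iterations, enter_prologue_by_count,
enter_prologue_by_feature) and the 16-byte / 4-float vector shape are
assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed for illustration: 16-byte vectors holding 4 floats. */
enum { VECT_ALIGN = 16, LANES = 4 };

/* Scalar iterations needed before 'a' reaches vector alignment,
   clamped to the total trip count n. */
static size_t prologue_iterations(const float *a, size_t n)
{
    /* Assumed: the pointer is at least element-aligned. */
    assert((uintptr_t)a % sizeof(float) == 0);
    size_t misalign = ((uintptr_t)a % VECT_ALIGN) / sizeof(float);
    size_t peel = (LANES - misalign) % LANES;
    return peel < n ? peel : n;
}

/* Check based on the iteration count: the peel count has to be
   computed even when the pointer is already aligned. */
static int enter_prologue_by_count(const float *a, size_t n)
{
    return prologue_iterations(a, n) != 0;
}

/* Check based on the feature: a single test of the address. */
static int enter_prologue_by_feature(const float *a)
{
    return (uintptr_t)a % VECT_ALIGN != 0;
}
```

For n > 0 both predicates give the same answer, but the count-based form
makes the already-aligned path pay for computing the peel count first.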

Another remaining issue is that we allow for

  if (unaligned)
    for (;;)
  if (any-vectorized-iterations)
    for (;;)
  if (any-remaining-iterations)
    for (;;)

thus any-vectorized-iterations may be false when we leave the alignment
loop.  That sounds bogus - this case should drop into a single purely
scalar loop instead, avoiding the check and making all cases faster.
In the testcase this gets optimized away (because we have a known number
of overall iterations), but with a variable upper bound cost considerations
should not allow this case.
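
As a rough illustration of the two shapes, the following C sketch writes both
out by hand for a simple in-place scaling loop.  Everything here is assumed
for the example: the function names, the vectorization factor of 4, the
16-byte alignment, and the use of a small scalar helper (vector_step) to stand
in for one vector operation.  It shows the control flow only, not generated
code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { VF = 4, ALIGN = 16 };   /* assumed vectorization factor and alignment */

/* Scalar iterations needed to reach an ALIGN-byte boundary. */
static size_t peel_count(const float *a)
{
    assert((uintptr_t)a % sizeof(float) == 0);   /* assumed element-aligned */
    size_t misalign = ((uintptr_t)a % ALIGN) / sizeof(float);
    return (VF - misalign) % VF;
}

/* Stand-in for one vector operation on VF elements. */
static void vector_step(float *p, float k)
{
    for (size_t l = 0; l < VF; l++)
        p[l] *= k;
}

/* The shape from the comment: three independently guarded loops. */
static void scale_three_loops(float *a, size_t n, float k)
{
    size_t i = 0;
    size_t peel = peel_count(a);
    if (peel > n)
        peel = n;

    if (peel != 0)                      /* if (unaligned) */
        for (; i < peel; i++)
            a[i] *= k;
    if (n - i >= VF)                    /* if (any-vectorized-iterations) */
        for (; n - i >= VF; i += VF)
            vector_step(a + i, k);
    if (i < n)                          /* if (any-remaining-iterations) */
        for (; i < n; i++)
            a[i] *= k;
}

/* Suggested shape: decide up front whether any vector iteration can
   run after peeling; if not, use one purely scalar loop. */
static void scale_with_scalar_fallback(float *a, size_t n, float k)
{
    size_t i = 0;
    size_t peel = peel_count(a);

    if (n < peel + VF) {                /* no vector iteration possible */
        for (; i < n; i++)
            a[i] *= k;
        return;
    }
    for (; i < peel; i++)               /* prologue */
        a[i] *= k;
    for (; n - i >= VF; i += VF)        /* runs at least once here */
        vector_step(a + i, k);
    for (; i < n; i++)                  /* epilogue */
        a[i] *= k;
}
```

Both functions compute the same result; the second never enters the prologue
only to find that no vectorized iteration follows.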

In general the scalar loop version, the epilogue and only then the prologue
loop should be considered as fallback for the non-profitable case (in this
order), rather than the prologue loop first.
