The auto-vectorizer does not seem to recognize that this loop iterates at most 3 times, and it generates quite messy code for it.
int a[1024], b[1024];

void foo (int n)
{
  int i;
  for (i = (n/4)*4; i < n; i++)
    a[i] = a[i] + b[i];
}
How can we correctly estimate the number of iterations in this case and
make that information available to the vectorizer?
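For reference, the "at most 3" bound can be checked by brute force. The following is only a minimal sketch, assuming C99 semantics where integer division truncates toward zero; the helper names trip_count, expected_trip_count and check_bound are hypothetical and not part of the test case above. Under that assumption the loop should run n % 4 times for positive n and not at all otherwise.

```c
#include <assert.h>

/* Hypothetical helper: counts the iterations of the loop in foo()
   for a given n, without touching the arrays.  */
static int trip_count (int n)
{
  int count = 0;
  for (int i = (n / 4) * 4; i < n; i++)
    count++;
  return count;
}

/* Assumed closed form: with truncating division, (n/4)*4 is the largest
   multiple of 4 not exceeding n when n >= 0, and is >= n when n < 0.  */
static int expected_trip_count (int n)
{
  return n > 0 ? n % 4 : 0;
}

/* Checks the closed form and the upper bound of 3 over a sample range.
   Returns 1 if every n in [-1024, 1024] satisfies both.  */
static int check_bound (void)
{
  for (int n = -1024; n <= 1024; n++)
    {
      int t = trip_count (n);
      assert (t == expected_trip_count (n));
      assert (t >= 0 && t <= 3);
    }
  return 1;
}
```

So the information an iteration-count estimate would need to carry here is an upper bound of 3, which is smaller than a typical vectorization factor of 4 for int.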
Thanks,
Changpeng
