>> It seems the auto-vectorizer could not recognize that this loop will
>> roll at most 3 times.
>> And it will generate quite messy code.
>>
>> int a[1024], b[1024];
>> void foo (int n)
>> {
>>   int i;
>>   for (i = (n/4)*4; i< n; i++)
>>     a[i] = a[i] + b[i];
>> }
>>
>> How can we correctly estimate the number of iterations for this case
>> and use this info for the vectorizer?
>Does it recognise it if you rewrite the loop as follows:
>for (i = n&~0x3; i< n; i++)
> a[i] = a[i] + b[i];
No, it does not.
But it does work for the following case:
  for (i = n-3; i< n; i++)
    a[i] = a[i] + b[i];
It seems to fail when the trip count is "unknown but small". In any case, this
mostly affects compilation time and code size, and has limited impact on
performance.
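The bound itself is easy to confirm: for non-negative n, both start
expressions round n down to a multiple of 4, so the loop runs n & 3 times,
i.e. at most 3. A minimal sketch that checks this; the helper names are
hypothetical and are not part of GCC:

```c
#include <assert.h>

/* Hypothetical helpers, for illustration only (not part of GCC). */

/* Trip count of: for (i = (n/4)*4; i < n; i++) */
int trip_count_div(int n)
{
    int start = (n / 4) * 4;
    return n > start ? n - start : 0;
}

/* Trip count of: for (i = n & ~0x3; i < n; i++) */
int trip_count_mask(int n)
{
    int start = n & ~0x3;
    return n > start ? n - start : 0;
}

/* Largest trip count seen for any n in [0, max_n].
   Also asserts that both forms agree and equal n & 3. */
int max_trip_count(int max_n)
{
    int n, max = 0;
    for (n = 0; n <= max_n; n++) {
        int t = trip_count_mask(n);
        assert(t == trip_count_div(n));
        assert(t == (n & 3));
        if (t > max)
            max = t;
    }
    return max;
}
```

Calling max_trip_count(1024) covers every valid n for the arrays in the
test case. Negative n is deliberately left out, since the two start
expressions differ there and the array accesses would be out of bounds anyway.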
For the loop
  for (i = n&~0x3; i< n; i++)
    a[i] = a[i] + b[i];
the attached foo-O3-no-tree-vectorize.s is what we expect from the optimizer.
The code in foo-O3.s is considerably worse.
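As a possible source-level workaround (a sketch only, not something proposed
in this thread), the remainder can be written without a loop at all, so the
loop vectorizer has nothing to transform. The name foo_switch is
hypothetical, and the code assumes 0 <= n <= 1024. Whether it actually yields
better code at -O3 would have to be checked against the generated assembly.

```c
int a[1024], b[1024];

/* Same effect as foo() for 0 <= n <= 1024: updates the last n & 3
   elements before index n, using straight-line code instead of a loop. */
void foo_switch(int n)
{
    int i = n & ~0x3;        /* first index of the remainder */

    switch (n & 0x3) {       /* number of leftover elements: 0..3 */
    case 3:
        a[i + 2] += b[i + 2];
        /* fall through */
    case 2:
        a[i + 1] += b[i + 1];
        /* fall through */
    case 1:
        a[i] += b[i];
        break;
    default:
        break;
    }
}
```

This trades generality for predictability: it only fits a remainder whose
maximum length is known at the source level, as it is here.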
Thanks,
Changpeng
foo-O3-no-tree-vectorize.s
foo-O3.s
