Hi,
I have a question about a loop that when vectorized does not get unrolled
(by the rtl-level unroller), whereas the same loop when not vectorized does
get unrolled. This is the testcase:
#define N 40
#define M 10
float in[N+M], coeff[M], out[N];
void fir (){
  int i,j,k;
  float diff;
  for (i = 0; i < N; i++) {
    diff = 0;
    for (j = 0; j < M; j++) {
      diff += in[j+i]*coeff[j];
    }
    out[i] = diff;
  }
}
When compiled as follows:
  gcc -O3 -funroll-loops vect-outer-fir-kernel.c -S
      --param max-completely-peeled-insns=5000
      --param max-completely-peel-times=40 [-ftree-vectorize -maltivec]
(the bracketed flags are added for the vectorized case), the inner loop
gets completely unrolled (by the tree-level unroller), but the outer loop
(the i-loop, which is now the innermost loop) does not later get unrolled
by the rtl unroller, although without vectorization it does.
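To make the starting point concrete, here is a hand-written sketch of what the i-loop looks like once the j-loop has been completely unrolled. It is only an illustration of the loop shape, not compiler output; the names fir_unrolled and forms_agree are made up for this example, and the inputs are small integer values chosen so that the float comparison is exact.

```c
#define N 40
#define M 10

float in[N + M], coeff[M], out[N];

/* The testcase as written: the j-loop has a constant trip count of M. */
void fir(void)
{
    int i, j;
    for (i = 0; i < N; i++) {
        float diff = 0;
        for (j = 0; j < M; j++)
            diff += in[j + i] * coeff[j];
        out[i] = diff;
    }
}

/* Hand-unrolled sketch (hypothetical name): the j-loop is gone,
   so the i-loop is now the innermost loop. */
void fir_unrolled(void)
{
    int i;
    for (i = 0; i < N; i++) {
        float diff = 0;
        diff += in[i + 0] * coeff[0];
        diff += in[i + 1] * coeff[1];
        diff += in[i + 2] * coeff[2];
        diff += in[i + 3] * coeff[3];
        diff += in[i + 4] * coeff[4];
        diff += in[i + 5] * coeff[5];
        diff += in[i + 6] * coeff[6];
        diff += in[i + 7] * coeff[7];
        diff += in[i + 8] * coeff[8];
        diff += in[i + 9] * coeff[9];
        out[i] = diff;
    }
}

/* Returns 1 if both forms give identical output.  The inputs are small
   integers, so every product and sum is exactly representable. */
int forms_agree(void)
{
    float ref[N];
    int i;
    for (i = 0; i < N + M; i++)
        in[i] = (float)(i % 7);
    for (i = 0; i < M; i++)
        coeff[i] = (float)(i + 1);
    fir();
    for (i = 0; i < N; i++) {
        ref[i] = out[i];
        out[i] = 0;
    }
    fir_unrolled();
    for (i = 0; i < N; i++)
        if (out[i] != ref[i])
            return 0;
    return 1;
}
```

What remains is a single loop over i with a constant trip count of N, which is the loop the rtl unroller is expected to handle.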
In both cases we start from this iv form (after iv_canon):
[1] loop:
      iv = phi (iv, 40)
      iv = iv - 1;
      if (iv != 0) then goto loop else exit
In the case that does get unrolled (i.e. without vectorization), after
ivopts it looks like this:
[2] loop:
      iv = phi (iv, 0)
      iv = iv + 4;
      if (iv != 160) then goto loop else exit
And finally just before rtl unrolling it looks like this:
[3] r258 = 0;
    loop:
      r258 = r258 + 4
      if (r258 != 160) then goto loop else exit
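As a sanity check on the forms above, the following sketch models [1] and [3] as plain C loops and counts iterations. The function names are invented for the illustration; the only point is that both forms have the same constant trip count, with a constant initial value, step, and bound.

```c
/* Form [1]: count down from 40 until the counter reaches 0. */
int trips_form1(void)
{
    int iv = 40, trips = 0;
    do {
        trips++;
        iv = iv - 1;
    } while (iv != 0);
    return trips;
}

/* Forms [2]/[3]: a byte offset stepping by 4 from 0 until it reaches 160. */
int trips_form3(void)
{
    int r258 = 0, trips = 0;
    do {
        trips++;
        r258 = r258 + 4;
    } while (r258 != 160);
    return trips;
}
```

Both functions return 40, which matches (160 - 0) / 4 for the second form.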
In the case that the loop gets vectorized, then we start from [1], and
after vectorization we have:
[4] loop:
      iv = phi (iv, 0)
      iv = iv + 1;
      if (iv < 10) then goto loop else exit
After that, ivopts transforms it to the following:
[5] loop:
      iv0 = &out
      iv = phi (iv0, iv)
      iv = iv + 16;
      LB = &out + 160;
      if (iv != LB) then goto loop else exit
And finally at the stage of rtl unrolling it looks like this:
[6] r186 = r2 + C;
    r318 = r186 + 160;
    loop:
      r186 = r186 + 16
      if (r186 != r318) then goto loop else exit
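For comparison, here is a sketch of form [6] in C, using a byte pointer for the iv. The function name is hypothetical, and the sizes are written with sizeof(float), which gives the 16 and 160 above on targets with a 4-byte float. Neither the initial value nor the bound is a compile-time constant here, but the bound is the initial value plus a constant, so the trip count is still constant.

```c
#define N 40

float out[N];

/* Form [6]: pointer iv with a loop-invariant, non-constant bound. */
int trips_form6(void)
{
    char *r186 = (char *)out;               /* r186 = r2 + C, i.e. &out */
    char *r318 = r186 + N * sizeof(float);  /* r318 = r186 + 160        */
    int trips = 0;
    do {
        trips++;
        r186 = r186 + 4 * sizeof(float);    /* r186 = r186 + 16         */
    } while (r186 != r318);
    return trips;
}
```

This returns 10, i.e. (r318 - r186) / 16 evaluated before the loop, independent of the actual address of out.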
Then, in loop-unroll.c we call iv_number_of_iterations, which eventually
calls iv_analyze_biv (with r258/r186), which in turn calls
latch_dominating_def.
There, when processing the vectorized case, it encounters the def in the
loop ('r186+16'), and then the def outside the loop ('r2+C'), at which
point it fails because it has found two defs (and so we get that this is
not a simple iv, and not a simple loop, and unrolling fails: "Unable to
prove that the loop iterates constant times").
When processing the unvectorized case, we also first encounter the def in
the loop ('r258+4'), and then the def outside the loop ('0'), but this def
satisfies the test "if (!bitmapset (bb_info->out,....))", and so we don't
fail when we encounter the second def, and all is well.
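The behaviour described above can be modelled with a toy scan like the one below. This is not the GCC source: the struct, the function names, and the use of an unsigned word as the bitmap are all invented for the illustration, and the two bitmaps are simply inputs chosen to mirror what I observe in each case, not something the sketch derives.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one definition of a register. */
struct toy_def {
    int id;        /* bit index of this def in the bitmap */
    bool in_loop;  /* whether the def is inside the loop  */
};

/* Scan all defs of a register.  Defs whose bit is not set in reach_out
   are skipped; a second def with its bit set makes the scan fail. */
bool toy_single_def(const struct toy_def *defs, size_t n,
                    unsigned reach_out, const struct toy_def **single)
{
    const struct toy_def *found = NULL;
    size_t i;
    for (i = 0; i < n; i++) {
        if (!(reach_out & (1u << defs[i].id)))
            continue;       /* bit not set: this def is ignored */
        if (found)
            return false;   /* two defs: not a simple iv        */
        found = &defs[i];
    }
    *single = found;
    return found != NULL;
}

/* Unvectorized case as observed: only the in-loop def has its bit set. */
int toy_unvectorized(void)
{
    const struct toy_def defs[] = { { 0, true }, { 1, false } };
    const struct toy_def *single = NULL;
    return toy_single_def(defs, 2, 0x1u, &single) && single->in_loop;
}

/* Vectorized case as observed: both defs have their bit set. */
int toy_vectorized(void)
{
    const struct toy_def defs[] = { { 0, true }, { 1, false } };
    const struct toy_def *single = NULL;
    return toy_single_def(defs, 2, 0x3u, &single);
}
```

With these inputs toy_unvectorized() returns 1 and toy_vectorized() returns 0, so in this model the whole difference between the two cases is whether the bit for the def outside the loop is set.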
So one question I have is what is that bitmap exactly, and why does loop
[6] fail rtl iv-analysis?
The other question is which is the most appropriate place to fix this:
the vectorizer, ivopts, or the rtl iv-analysis?
Note that even when I change the vectorizer to use the same iv form as in
[1], so that instead of [4] it produces this:
[4'] loop:
       iv = phi (iv, 10)
       iv = iv - 1;
       if (iv != 0) then goto loop else exit
ivopts still changes it to [5], and the loop still doesn't get unrolled, so
I don't see what can be done in the vectorizer (?).
thanks,
dorit