On Mon, Dec 18, 2017 at 2:35 PM, Michael Matz <m...@suse.de> wrote: > Hi, > > On Mon, 18 Dec 2017, Richard Biener wrote: > >> where *unroll is similar to *max_vf I think. dist_v[0] is the innermost >> loop. > > [0] is always outermost loop. > >> The vectorizer does way more complicated things and only looks at the >> distance with respect to the outer loop as far as I can see which can be >> negative. >> >> Not sure if fusion and vectorizer "interleaving" makes a difference >> here. I think the idea was that when interleaving stmt-by-stmt then >> forward dependences would be preserved and thus we don't need to check >> the inner loop dependences. speaking with "forward vs. backward" >> dependences again, not distances... >> >> This also means that unroll-and-jam could be enhanced to "interleave" >> stmts and thus cover more cases? >> >> Again, I hope Micha can have a look here... > > Haven't yet looked at the patch, but some comments anyway: > > fusion and interleaving interact in the following way in outer loop > vectorization, conceptually: > * (1) the outer loop is unrolled > * (2) the inner loops are fused > * (3) the (now single) inner body is rescheduled/shuffled/interleaved. > Thanks Michael for explaining issue clearer, this is what I meant. As for PR60276, I think it's actually the other side of the problem, which only relates to dependence validity of interleaving.
Thanks, bin > (1) is always okay. But (2) and (3) as individual transformations must > both be checked for validity. If fusion isn't possible the whole > transformation is invalid, and if interleaving isn't possible the same is > true. In the specific example: > > for (b = 4; b >= 0; b--) > for (c = 0; c <= 6; c++) > t = a[c][b + 1]; // S1 > a[c + 1][b + 2] = t; // S2 > > it's already the fusion step that's invalid. There's a > dependence between S1 and S2, e.g. for (b,c) = (4,1) comes-before (3,0) > with S1(4,1) reading a[1][5] and S2(3,0) writing a[1][5]. So a > write-after-read. After fusing: > > for (c = 0; c <= 6; c++) > { > t = a[c][5]; // S1 > a[c + 1][6] = t; > t = a[c][4]; > a[c + 1][5] = t; // S2 > a[c + 1][4] = a[c][3]; > a[c + 1][3] = a[c][2]; > } > > here we have at iterations (c) = (0) comes-before (1), at S2(0) writing > a[1][5] and S1(1) writing a[1][5]. I.e. now it's a read-after-write (the > write in iteration 0 overwrites the value that is going to be read at > iteration 1, which wasn't the case in the original loop). The dependence > switched direction --> invalid. > > The simple interleaving of statements can't rectify this. > Interleaving is an inner-body reordering but the brokenness comes from a > cross-iteration ordering. > > This example can be unroll-jammed or outer-loop vectorized if one of the > two loops is reversed. Let's say we reverse the inner loop, so that it > runs in the same direction as the outer loop (reversal is possible here). > > It'd then be something like: > > for (c = 6; c >= 0; c--) > { > t = a[c][5]; // S1 > a[c + 1][6] = t; > t = a[c][4]; > a[c + 1][5] = t; // S2 > a[c + 1][4] = a[c][3]; > a[c + 1][3] = a[c][2]; > } > > The dependence between S1/S2 would still be a write-after-read, and all > would be well. This reversal of the inner loop can partly be simulated by > not only interleaving the inner insns, but by also _reodering_ them. But > AFAIK the vectorizer doesn't do this? > > > Ciao, > Michael.