On Mon, Dec 18, 2017 at 2:35 PM, Michael Matz <m...@suse.de> wrote:
> Hi,
>
> On Mon, 18 Dec 2017, Richard Biener wrote:
>
>> where *unroll is similar to *max_vf I think.  dist_v[0] is the innermost 
>> loop.
>
> [0] is always the outermost loop.
>
>> The vectorizer does way more complicated things and only looks at the
>> distance with respect to the outer loop as far as I can see which can be
>> negative.
>>
>> Not sure if fusion and vectorizer "interleaving" makes a difference
>> here. I think the idea was that when interleaving stmt-by-stmt then
>> forward dependences would be preserved and thus we don't need to check
>> the inner loop dependences.  Speaking of "forward vs. backward"
>> dependences again, not distances...
>>
>> This also means that unroll-and-jam could be enhanced to "interleave"
>> stmts and thus cover more cases?
>>
>> Again, I hope Micha can have a look here...
>
> Haven't yet looked at the patch, but some comments anyway:
>
> fusion and interleaving interact in the following way in outer loop
> vectorization, conceptually:
> * (1) the outer loop is unrolled
> * (2) the inner loops are fused
> * (3) the (now single) inner body is rescheduled/shuffled/interleaved.
>
Thanks, Michael, for explaining the issue more clearly; this is what I
meant.  As for PR60276, I think it's actually the other side of the
problem, which relates only to the dependence validity of interleaving.

Thanks,
bin
> (1) is always okay.  But (2) and (3) as individual transformations must
> both be checked for validity.  If fusion isn't possible the whole
> transformation is invalid, and if interleaving isn't possible the same is
> true.  In the specific example:
>
>   for (b = 4; b >= 0; b--)
>     for (c = 0; c <= 6; c++)
>       {
>         t = a[c][b + 1];      // S1
>         a[c + 1][b + 2] = t;  // S2
>       }
>
> it's already the fusion step that's invalid.  There's a
> dependence between S1 and S2, e.g. for (b,c) = (4,1) comes-before (3,0)
> with S1(4,1) reading a[1][5] and S2(3,0) writing a[1][5].  So a
> write-after-read.  After fusing:
>
>    for (c = 0; c <= 6; c++)
>      {
>        t = a[c][5];              // S1
>        a[c + 1][6] = t;
>        t = a[c][4];
>        a[c + 1][5] = t;          // S2
>        a[c + 1][4] = a[c][3];
>        a[c + 1][3] = a[c][2];
>      }
>
> here we have at iterations (c) = (0) comes-before (1), at S2(0) writing
> a[1][5] and S1(1) reading a[1][5].  I.e. now it's a read-after-write (the
> write in iteration 0 overwrites the value that is going to be read at
> iteration 1, which wasn't the case in the original loop).  The dependence
> switched direction --> invalid.
>
> The simple interleaving of statements can't rectify this.
> Interleaving is an inner-body reordering but the brokenness comes from a
> cross-iteration ordering.
>
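
The direction switch can be checked by running both forms on the same
data.  The following is only a minimal sketch, not code from the patch
or from GCC: the helper names (init_grid, run_original, run_fused,
grids_equal), the 8x7 array bounds and the initial values are
assumptions made for illustration.  The fused form writes the unrolled
copies as a loop over b and covers all five b iterations, whereas the
quoted listing spells out four copies.

```c
#include <string.h>

enum { ROWS = 8, COLS = 7 };  /* assumed bounds, enough for a[c + 1][b + 2] */

/* Fill the grid with distinct values so a changed ordering is visible.  */
void init_grid(int a[ROWS][COLS])
{
  for (int i = 0; i < ROWS; i++)
    for (int j = 0; j < COLS; j++)
      a[i][j] = 10 * i + j;
}

/* The original nest: b outer (descending), c inner (ascending).  */
void run_original(int a[ROWS][COLS])
{
  for (int b = 4; b >= 0; b--)
    for (int c = 0; c <= 6; c++)
      {
        int t = a[c][b + 1];      /* S1 */
        a[c + 1][b + 2] = t;      /* S2 */
      }
}

/* Outer loop fully unrolled and the copies fused into one c loop.
   The unrolled copies are expressed as a loop over b for brevity.  */
void run_fused(int a[ROWS][COLS])
{
  for (int c = 0; c <= 6; c++)
    for (int b = 4; b >= 0; b--)
      {
        int t = a[c][b + 1];      /* S1 copy for this b */
        a[c + 1][b + 2] = t;      /* S2 copy for this b */
      }
}

/* Return nonzero if both grids hold identical contents.  */
int grids_equal(int x[ROWS][COLS], int y[ROWS][COLS])
{
  return memcmp(x, y, sizeof(int[ROWS][COLS])) == 0;
}
```

With this initialisation run_original leaves a[2][6] == 15 (the old
a[1][5]), while run_fused leaves a[2][6] == 4, because iteration c = 0
has already overwritten a[1][5] with the old a[0][4] before iteration
c = 1 reads it.
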
> This example can be unroll-jammed or outer-loop vectorized if one of the
> two loops is reversed.  Let's say we reverse the inner loop, so that it
> runs in the same direction as the outer loop (reversal is possible here).
>
> It'd then be something like:
>
>    for (c = 6; c >= 0; c--)
>      {
>        t = a[c][5];              // S1
>        a[c + 1][6] = t;
>        t = a[c][4];
>        a[c + 1][5] = t;          // S2
>        a[c + 1][4] = a[c][3];
>        a[c + 1][3] = a[c][2];
>      }
>
> The dependence between S1/S2 would still be a write-after-read, and all
> would be well.  This reversal of the inner loop can partly be simulated by
> not only interleaving the inner insns, but by also _reordering_ them.  But
> AFAIK the vectorizer doesn't do this?
>
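
The effect of reversing the inner loop can be checked the same way.
Again this is only a sketch under assumptions: the helper names
(init_grid, run_original, run_fused_reversed, grids_equal), the 8x7
bounds and the initial values are hypothetical and not taken from GCC.
The unrolled copies are written as a loop over b covering all five b
iterations.

```c
#include <string.h>

enum { ROWS = 8, COLS = 7 };  /* assumed bounds, enough for a[c + 1][b + 2] */

/* Fill the grid with distinct values so a changed ordering is visible.  */
void init_grid(int a[ROWS][COLS])
{
  for (int i = 0; i < ROWS; i++)
    for (int j = 0; j < COLS; j++)
      a[i][j] = 10 * i + j;
}

/* The original nest: b outer (descending), c inner (ascending).  */
void run_original(int a[ROWS][COLS])
{
  for (int b = 4; b >= 0; b--)
    for (int c = 0; c <= 6; c++)
      {
        int t = a[c][b + 1];      /* S1 */
        a[c + 1][b + 2] = t;      /* S2 */
      }
}

/* Inner loop reversed, then outer loop unrolled and fused.  Row c is
   read before the later iteration c - 1 writes it, so every read still
   sees the original value (write-after-read is preserved).  */
void run_fused_reversed(int a[ROWS][COLS])
{
  for (int c = 6; c >= 0; c--)
    for (int b = 4; b >= 0; b--)
      {
        int t = a[c][b + 1];      /* S1 copy for this b */
        a[c + 1][b + 2] = t;      /* S2 copy for this b */
      }
}

/* Return nonzero if both grids hold identical contents.  */
int grids_equal(int x[ROWS][COLS], int y[ROWS][COLS])
{
  return memcmp(x, y, sizeof(int[ROWS][COLS])) == 0;
}
```

Here both versions leave a[2][6] == 15 and the two grids compare equal,
which matches the claim that the dependence stays a write-after-read
once the inner loop runs in the same direction as the outer one.
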
>
> Ciao,
> Michael.
