On October 28, 2017 12:03:58 AM GMT+02:00, Thomas Koenig 
<tkoe...@netcologne.de> wrote:
>Hello world,
>
>this is a draft patch which interchanges the indices for FORALL and
>DO CONCURRENT loops for cases like PR 82471, where code like
>
>   DO CONCURRENT( K=1:N, J=1:M, I=1:L)
>      C(I,J,K) = A(I,J,K) + B(I,J,K)
>   END DO
>
>led to very poor code because of stride issues.  Currently,
>Graphite is not able to do this.
>
>Without the patch, the code above is translated as
>
>
>         i.7 = 1;
>         count.10 = 512;
>         while (1)
>           {
>             if (ANNOTATE_EXPR <count.10 <= 0, ivdep>) goto L.4;
>             j.6 = 1;
>             count.9 = 512;
>             while (1)
>               {
>                 if (ANNOTATE_EXPR <count.9 <= 0, ivdep>) goto L.3;
>                 k.5 = 1;
>                 count.8 = 512;
>                 while (1)
>                   {
>                     if (ANNOTATE_EXPR <count.8 <= 0, ivdep>) goto L.2;
>                    (*(real(kind=4)[0:] * restrict) c.data)[((c.offset 
>+ (integer(kind=8)) k.5 * c.dim[2].stride) + (integer(kind=8)) j.6 * 
>c.dim[1].stride) + (integer(kind=8)) i.7] = (*(real(kind=4)[0:] * 
>restrict) a.data)[((a.offset + (integer(kind=8)) k.5 * a.dim[2].stride)
>
>+ (integer(kind=8)) j.6 * a.dim[1].stride) + (integer(kind=8)) i.7] + 
>(*(real(kind=4)[0:] * restrict) b.data)[((b.offset + (integer(kind=8)) 
>k.5 * b.dim[2].stride) + (integer(kind=8)) j.6 * b.dim[1].stride) + 
>(integer(kind=8)) i.7];
>                     L.1:;
>                     k.5 = k.5 + 1;
>                     count.8 = count.8 + -1;
>                   }
>                 L.2:;
>                 j.6 = j.6 + 1;
>                 count.9 = count.9 + -1;
>               }
>             L.3:;
>             i.7 = i.7 + 1;
>             count.10 = count.10 + -1;
>           }
>         L.4:;
>
>so the innermost loop has the biggest stride. With the patch
>(and front-end optimization turned on), this is turned into
>
>         k.7 = 1;
>         count.10 = 512;
>         while (1)
>           {
>             if (ANNOTATE_EXPR <count.10 <= 0, ivdep>) goto L.4;
>             j.6 = 1;
>             count.9 = 512;
>             while (1)
>               {
>                 if (ANNOTATE_EXPR <count.9 <= 0, ivdep>) goto L.3;
>                 i.5 = 1;
>                 count.8 = 512;
>                 while (1)
>                   {
>                     if (ANNOTATE_EXPR <count.8 <= 0, ivdep>) goto L.2;
>                    (*(real(kind=4)[0:] * restrict) c.data)[((c.offset 
>+ (integer(kind=8)) k.7 * c.dim[2].stride) + (integer(kind=8)) j.6 * 
>c.dim[1].stride) + (integer(kind=8)) i.5] = (*(real(kind=4)[0:] * 
>restrict) a.data)[((a.offset + (integer(kind=8)) k.7 * a.dim[2].stride)
>
>+ (integer(kind=8)) j.6 * a.dim[1].stride) + (integer(kind=8)) i.5] + 
>(*(real(kind=4)[0:] * restrict) b.data)[((b.offset + (integer(kind=8)) 
>k.7 * b.dim[2].stride) + (integer(kind=8)) j.6 * b.dim[1].stride) + 
>(integer(kind=8)) i.5];
>                     L.1:;
>                     i.5 = i.5 + 1;
>                     count.8 = count.8 + -1;
>                   }
>                 L.2:;
>                 j.6 = j.6 + 1;
>                 count.9 = count.9 + -1;
>               }
>             L.3:;
>             k.7 = k.7 + 1;
>             count.10 = count.10 + -1;
>           }
>         L.4:;
>
>so the innermost loop is the one that gets traversed with unity stride
>(the way it should have been done).
>
>Although the algorithm here is quite simple, it resolves the issues
>raised in the PR so far, and it definitely worth fixing.
>
>If we do this kind of thing, it might also be possible to
>interchange normal DO loops in a similar way (which Graphite also
>cannot do at the moment, at least not if the bounds are variables).
>
>So, comments? Suggestions? Other ideas? Any ideas how to write
>a test case? Any volunteers to re-implement Graphite in the
>Fortran front end (didn't think so) or to make Graphite catch
>this sort of pattern (which it currently doesn't) instead?

Graphite has major issues with the scalarization of multidimensional arrays to 
a single array dimension. Indeed doing graphite optimization on the Fortran FE 
IL would be an interesting experiment.  Maybe list that as summer of code 
project idea. 

Richard. 

>Regards
>
>       Thomas
>
>2017-10-27  Thomas Koenig  <tkoe...@gcc.gnu.org>
>
>         * frontend-passes.c (index_interchange): New funciton,
>         prototype.
>         (optimize_namespace): Call index_interchange.
>         (ind_type): New function.
>         (has_var): New function.
>         (index_cost): New function.
>         (loop_comp): New function.

Reply via email to