On October 28, 2017 12:03:58 AM GMT+02:00, Thomas Koenig <tkoe...@netcologne.de> wrote: >Hello world, > >this is a draft patch which interchanges the indices for FORALL and >DO CONCURRENT loops for cases like PR 82471, where code like > > DO CONCURRENT( K=1:N, J=1:M, I=1:L) > C(I,J,K) = A(I,J,K) + B(I,J,K) > END DO > >led to very poor code because of stride issues. Currently, >Graphite is not able to do this. > >Without the patch, the code above is translated as > > > i.7 = 1; > count.10 = 512; > while (1) > { > if (ANNOTATE_EXPR <count.10 <= 0, ivdep>) goto L.4; > j.6 = 1; > count.9 = 512; > while (1) > { > if (ANNOTATE_EXPR <count.9 <= 0, ivdep>) goto L.3; > k.5 = 1; > count.8 = 512; > while (1) > { > if (ANNOTATE_EXPR <count.8 <= 0, ivdep>) goto L.2; > (*(real(kind=4)[0:] * restrict) c.data)[((c.offset >+ (integer(kind=8)) k.5 * c.dim[2].stride) + (integer(kind=8)) j.6 * >c.dim[1].stride) + (integer(kind=8)) i.7] = (*(real(kind=4)[0:] * >restrict) a.data)[((a.offset + (integer(kind=8)) k.5 * a.dim[2].stride) > >+ (integer(kind=8)) j.6 * a.dim[1].stride) + (integer(kind=8)) i.7] + >(*(real(kind=4)[0:] * restrict) b.data)[((b.offset + (integer(kind=8)) >k.5 * b.dim[2].stride) + (integer(kind=8)) j.6 * b.dim[1].stride) + >(integer(kind=8)) i.7]; > L.1:; > k.5 = k.5 + 1; > count.8 = count.8 + -1; > } > L.2:; > j.6 = j.6 + 1; > count.9 = count.9 + -1; > } > L.3:; > i.7 = i.7 + 1; > count.10 = count.10 + -1; > } > L.4:; > >so the innermost loop has the biggest stride. With the patch >(and front-end optimization turned on), this is turned into > > k.7 = 1; > count.10 = 512; > while (1) > { > if (ANNOTATE_EXPR <count.10 <= 0, ivdep>) goto L.4; > j.6 = 1; > count.9 = 512; > while (1) > { > if (ANNOTATE_EXPR <count.9 <= 0, ivdep>) goto L.3; > i.5 = 1; > count.8 = 512; > while (1) > { > if (ANNOTATE_EXPR <count.8 <= 0, ivdep>) goto L.2; > (*(real(kind=4)[0:] * restrict) c.data)[((c.offset >+ (integer(kind=8)) k.7 * c.dim[2].stride) + (integer(kind=8)) j.6 * >c.dim[1].stride) + (integer(kind=8)) i.5] = (*(real(kind=4)[0:] * >restrict) a.data)[((a.offset + (integer(kind=8)) k.7 * a.dim[2].stride) > >+ (integer(kind=8)) j.6 * a.dim[1].stride) + (integer(kind=8)) i.5] + >(*(real(kind=4)[0:] * restrict) b.data)[((b.offset + (integer(kind=8)) >k.7 * b.dim[2].stride) + (integer(kind=8)) j.6 * b.dim[1].stride) + >(integer(kind=8)) i.5]; > L.1:; > i.5 = i.5 + 1; > count.8 = count.8 + -1; > } > L.2:; > j.6 = j.6 + 1; > count.9 = count.9 + -1; > } > L.3:; > k.7 = k.7 + 1; > count.10 = count.10 + -1; > } > L.4:; > >so the innermost loop is the one that gets traversed with unity stride >(the way it should have been done). > >Although the algorithm here is quite simple, it resolves the issues >raised in the PR so far, and it definitely worth fixing. > >If we do this kind of thing, it might also be possible to >interchange normal DO loops in a similar way (which Graphite also >cannot do at the moment, at least not if the bounds are variables). > >So, comments? Suggestions? Other ideas? Any ideas how to write >a test case? Any volunteers to re-implement Graphite in the >Fortran front end (didn't think so) or to make Graphite catch >this sort of pattern (which it currently doesn't) instead?
Graphite has major issues with the scalarization of multidimensional arrays to a single array dimension. Indeed doing graphite optimization on the Fortran FE IL would be an interesting experiment. Maybe list that as summer of code project idea. Richard. >Regards > > Thomas > >2017-10-27 Thomas Koenig <tkoe...@gcc.gnu.org> > > * frontend-passes.c (index_interchange): New funciton, > prototype. > (optimize_namespace): Call index_interchange. > (ind_type): New function. > (has_var): New function. > (index_cost): New function. > (loop_comp): New function.