Loop-im and PRE can hoist loads out of loops, creating artificial dependencies that inhibit graphite's analysis.

    do k = 1,4096
      do j = 1,4096
        do i = 1,4096
          c(i,j) = c(i,j) + a(k,j) * b(i,k)
        enddo
      enddo
    enddo

In the preceding loop body, the a(k,j) load can be hoisted out of the innermost loop. This is a valuable optimization, but it also creates a cross-iteration dependency that inhibits polyhedral transformation of the loop nest. An attempt to tile the nest fails because graphite assumes a dependency between loop iterations.

By inhibiting hoists until after graphite has run, we preserve the loop structure while still allowing the hoists to be performed by later passes of loop-im and/or PRE. If graphite is not enabled on the command line, hoists are performed early as usual.

This change gives an ~8% improvement on spec2017/fotonik3d on my system. It should also make graphite applicable to further applications that are currently blocked by such data dependencies.
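At source level, the early hoist of a(k,j) can be pictured roughly as follows. This is a hand-written sketch, not compiler output: GCC performs the hoist on its internal representation, and the module name, routine names, the scalar `tmp`, and the assumption of square n-by-n arrays are all hypothetical. Both routines compute the same result. The difference is that in `nest_hoisted` the scalar `tmp` is written in the j loop and read on every i iteration, so the nest is no longer perfectly nested. That extra scalar is the kind of dependency described above as blocking graphite.

```fortran
! Hypothetical illustration: names are made up for this sketch and do not
! correspond to anything in GCC. Arrays are assumed to be square (n x n).
module hoist_sketch
  implicit none
contains
  ! The loop nest as written: a perfect nest, every statement innermost.
  subroutine nest_original(a, b, c)
    real(8), intent(in)    :: a(:,:), b(:,:)
    real(8), intent(inout) :: c(:,:)
    integer :: i, j, k, n
    n = size(c, 1)
    do k = 1, n
      do j = 1, n
        do i = 1, n
          c(i,j) = c(i,j) + a(k,j) * b(i,k)
        end do
      end do
    end do
  end subroutine nest_original

  ! The same nest after hoisting the i-invariant load a(k,j) into a scalar.
  ! tmp is defined in the j loop and used by every i iteration, so the
  ! nest is no longer perfect.
  subroutine nest_hoisted(a, b, c)
    real(8), intent(in)    :: a(:,:), b(:,:)
    real(8), intent(inout) :: c(:,:)
    real(8) :: tmp
    integer :: i, j, k, n
    n = size(c, 1)
    do k = 1, n
      do j = 1, n
        tmp = a(k,j)
        do i = 1, n
          c(i,j) = c(i,j) + tmp * b(i,k)
        end do
      end do
    end do
  end subroutine nest_hoisted
end module hoist_sketch
```

Starting from c = 0, both routines leave c equal to matmul(b, a), since c(i,j) accumulates b(i,k) * a(k,j) over k. The hoist saves one load per inner iteration; the cost, per the description above, is that graphite can no longer treat the nest as a single tileable unit.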