Loop-im (loop invariant motion) and PRE can hoist loads out of loops, creating 
artificial dependencies that inhibit graphite's analysis.

do k = 1, 4096
    do j = 1, 4096
        do i = 1, 4096
            c(i,j) = c(i,j) + a(k,j) * b(i,k)
        enddo
    enddo
enddo

In the preceding loop body, the a(k,j) load is invariant in the inner-most 
loop and can be hoisted out of it. This is a valuable optimization, but it 
also creates a cross-iteration dependency that inhibits polyhedral 
transformation of the loop nest. An attempt to tile will fail because graphite 
assumes a dependency between loop iterations.
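
As a rough source-level sketch of the hoist described above (the actual 
transformation happens on GCC's intermediate representation, not on the 
Fortran source), the program below runs the loop nest in its original form 
and in a hand-hoisted form. The temporary t, the array names c_ref and 
c_hoist, and the reduced size n = 64 are assumptions made for illustration 
and do not come from the patch. In the hoisted form, t is written once per 
(k,j) iteration and read by every i iteration, which is the kind of scalar 
dependency the paragraph above refers to.

```fortran
program hoist_sketch
    implicit none
    ! Small size for illustration only; the example above uses 4096.
    integer, parameter :: n = 64
    real :: a(n,n), b(n,n), c_ref(n,n), c_hoist(n,n)
    real :: t        ! hypothetical stand-in for the compiler's temporary
    integer :: i, j, k

    call random_number(a)
    call random_number(b)
    c_ref = 0.0
    c_hoist = 0.0

    ! Original form: a(k,j) is loaded inside the inner-most loop.
    do k = 1, n
        do j = 1, n
            do i = 1, n
                c_ref(i,j) = c_ref(i,j) + a(k,j) * b(i,k)
            enddo
        enddo
    enddo

    ! Hand-hoisted form: the invariant load moves out of the i loop
    ! into the scalar t, which is rewritten on every (k,j) iteration.
    do k = 1, n
        do j = 1, n
            t = a(k,j)
            do i = 1, n
                c_hoist(i,j) = c_hoist(i,j) + t * b(i,k)
            enddo
        enddo
    enddo

    ! Both forms compute the same update; print the largest difference.
    print *, 'max abs difference:', maxval(abs(c_ref - c_hoist))
end program hoist_sketch
```

Both nests perform the same arithmetic, so the hoist is purely a change in 
where the load happens; the difference that matters here is the extra scalar 
that the analysis has to reason about.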

By inhibiting hoists until after graphite has run, we preserve the loop 
structure, while allowing hoists to be performed by later passes of loop-im 
and/or PRE. If graphite is not enabled on the command line, hoists are 
performed early as normal.

This change gives ~8% improvement on spec2017/fotonik3d on my system. It should 
also help graphite become applicable to further applications that are blocked 
by data dependency.
