https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82604
--- Comment #12 from amker at gcc dot gnu.org ---
(In reply to rguent...@suse.de from comment #11)
> On Thu, 18 Jan 2018, amker at gcc dot gnu.org wrote:
> 
> I think the zeroing stmt can be distributed into a separate loop nest
> (up to whatever level we choose) and in the then non-parallelized nest
> the memset can stay at the current level.  So distribute
> 
> >   do j=1,ny
> >     jm1=mod(j+ny-2,ny)+1
> >     jp1=mod(j,ny)+1
> >     do i=1,nx
> >       im1=mod(i+nx-2,nx)+1
> >       ip1=mod(i,nx)+1
> >       do l=1,nb
> >         y(l,i,j,k)=0.0d0
> >         do m=1,nb
> >           y(l,i,j,k)=y(l,i,j,k)+
> >             ;; ....
> >         enddo
> >       enddo
> >     enddo
> >   enddo
> 
> to
> 
> >   do j=1,ny
> >     jm1=mod(j+ny-2,ny)+1
> >     jp1=mod(j,ny)+1
> >     do i=1,nx
> >       im1=mod(i+nx-2,nx)+1
> >       ip1=mod(i,nx)+1
> >       do l=1,nb
> >         y(l,i,j,k)=0.0d0
> >       enddo
> >     enddo
> >   enddo
> >   do j=1,ny
> >     jm1=mod(j+ny-2,ny)+1
> >     jp1=mod(j,ny)+1
> >     do i=1,nx
> >       im1=mod(i+nx-2,nx)+1
> >       ip1=mod(i,nx)+1
> >       do l=1,nb
> >         do m=1,nb
> >           y(l,i,j,k)=y(l,i,j,k)+
> >             ;; ....
> >         enddo
> >       enddo
> >     enddo
> >   enddo

Yes, this can be done.  For now it's disabled: unless the zeroing stmt is
classified as a builtin partition, it gets fused with the accumulation
because both share the memory reference y(l,i,j,k).  This step can be done
with cost model changes.  The only problem is that such a cost model change
doesn't make sense here on its own (without considering the builtin
partition stuff, the two should be fused, right?)

> And then do memset replacement in the first loop.

I guess this step is as hard as what I mentioned?  We still need to prove
that the loops around the zeroing statement don't leave bubbles in memory,
i.e. that together they zero one contiguous block.

> I think the current cost modeling doesn't consider this because
> of the re-use of y.  IIRC this is what my original nest distribution
> patches did.
> 
> This might be doable by just cost model changes?
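To make the "no bubbles" condition concrete, here is a small C sketch of what the memset replacement has to prove for the distributed zeroing nest. It is not GCC's implementation; the helpers idx, zero_with_loops, nest_is_contiguous and zero_with_memset are hypothetical. It assumes y is stored column-major (Fortran order) with declared extents dl, di, dj for its first three dimensions, and it checks only a simple sufficient condition: the l and i loops span the full declared extents, so for a fixed k the nest touches one contiguous block starting at y(1,1,1,k).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: column-major offset of y(l,i,j,k) with 1-based
   indices, for an array declared y(dl, di, dj, *). */
static size_t idx(size_t l, size_t i, size_t j, size_t k,
                  size_t dl, size_t di, size_t dj)
{
    return (l - 1) + dl * ((i - 1) + di * ((j - 1) + dj * (k - 1)));
}

/* The distributed zeroing nest, written out as plain loops. */
static void zero_with_loops(double *y, size_t nb, size_t nx, size_t ny,
                            size_t k, size_t dl, size_t di, size_t dj)
{
    assert(nb <= dl && nx <= di && ny <= dj); /* stay inside y */
    for (size_t j = 1; j <= ny; j++)
        for (size_t i = 1; i <= nx; i++)
            for (size_t l = 1; l <= nb; l++)
                y[idx(l, i, j, k, dl, di, dj)] = 0.0;
}

/* Sufficient (not necessary) "no bubbles" check: if l and i sweep the
   full declared extents, consecutive iterations hit consecutive
   elements, so j=1..ny covers nb*nx*ny elements without gaps. */
static bool nest_is_contiguous(size_t nb, size_t nx, size_t ny,
                               size_t dl, size_t di, size_t dj)
{
    return nb == dl && nx == di && ny <= dj;
}

/* Replace the whole nest by a single memset when the check passes;
   return false when the caller must keep the loop nest instead. */
static bool zero_with_memset(double *y, size_t nb, size_t nx, size_t ny,
                             size_t k, size_t dl, size_t di, size_t dj)
{
    if (!nest_is_contiguous(nb, nx, ny, dl, di, dj))
        return false;
    /* All-bits-zero is +0.0 for IEEE 754 doubles. */
    memset(&y[idx(1, 1, 1, k, dl, di, dj)], 0,
           nb * nx * ny * sizeof(double));
    return true;
}
```

When nb or nx is smaller than the declared extent, the nest skips elements between consecutive i (or j) iterations, so a single memset over nb*nx*ny elements would clear data the loops never touched; that is the case that has to be ruled out before the replacement.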