https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82604
--- Comment #10 from amker at gcc dot gnu.org ---
For the record, there is another possible fix. Quoted loop nest from
gcc/testsuite/gfortran.dg/pr81303.f:

      do j=1,ny
        jm1=mod(j+ny-2,ny)+1
        jp1=mod(j,ny)+1
        do i=1,nx
          im1=mod(i+nx-2,nx)+1
          ip1=mod(i,nx)+1
          do l=1,nb
            y(l,i,j,k)=0.0d0
            do m=1,nb
              y(l,i,j,k)=y(l,i,j,k)+
              ;; ....
            enddo
          enddo
        enddo
      enddo

Originally GCC could parallelize the loop nest at the i loop, but now it
only parallelizes at the l loop, because the stmt "y(l,i,j,k)=0.0d0" is
distributed into a memset in the i loop, and the distributed memset call
can't be analyzed by the data reference analyzer.

An idea is to distribute the stmt to the outer loop j instead, so that at
least we can parallelize at loop level i as before. Unfortunately this is
not easy. To distribute it into a memset at loop level j, we have to prove
that the memory range set to ZERO at each loop level doesn't leave any
bubble in it. Given that the array bounds and loop niters are not constant,
we need to prove non-trivial equalities between different expressions.
This needs to be done in function
tree-loop-distribution.c:compute_access_range. Specifically, in this
function we have:

  <bb 2>:
  _1 = *nb_113(D);
  ubound.86_114 = (integer(kind=8)) _1;
  stride.88_115 = MAX_EXPR <ubound.86_114, 0>;
  ...
  <bb 34>:  // thus in loop nest we have _1 > 0
  if (_1 <= 0)
    goto <bb 24>; [15.00%]
  else
    goto <bb 35>; [85.00%]
  ...

And in the end, we need to prove:

  ((sizetype) ((unsigned int) _1 + 4294967295) + 1) * 8
    == (sizetype) stride.88_115 * 8

We first need to prove:

  ((sizetype) ((unsigned int) _1 + 4294967295) + 1) == (sizetype) _1

using the pre-condition "_1 > 0". Then we need to prove:

  MAX_EXPR <ubound.86_114, 0> == ubound.86_114

also because of "_1 > 0".

I doubt this can be done (without heavy, messy code) in GCC now. Or might
there be another way out of this?

Thanks,
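
For illustration only, the equalities above can be sanity-checked numerically outside of GCC. The following is a minimal Python sketch, not GCC code: it assumes a 32-bit unsigned int (as the constant 4294967295 suggests) and a 64-bit sizetype, and the helper names (niters_plus_one, stride, sizes_match) are hypothetical. It only shows that the two sides agree when _1 > 0 and disagree otherwise, which is why the pre-condition is needed.

```python
# Assumptions (not stated in the dump): unsigned int is 32 bits wide and
# sizetype is 64 bits wide. All helper names here are hypothetical.
U32 = 0xFFFFFFFF
U64 = 0xFFFFFFFFFFFFFFFF


def niters_plus_one(n):
    """Model of ((sizetype) ((unsigned int) _1 + 4294967295) + 1), with _1 == n."""
    as_unsigned = n & U32                       # (unsigned int) _1
    wrapped = (as_unsigned + 4294967295) & U32  # addition wraps modulo 2**32
    return (wrapped + 1) & U64                  # widen to sizetype, then add 1


def stride(n):
    """Model of (sizetype) MAX_EXPR <(integer(kind=8)) _1, 0>, with _1 == n."""
    return max(n, 0) & U64


def sizes_match(n):
    """True if both sides of the final '* 8' equality agree for _1 == n."""
    lhs = (niters_plus_one(n) * 8) & U64
    rhs = (stride(n) * 8) & U64
    return lhs == rhs


if __name__ == "__main__":
    for n in (-1, 0, 1, 2, 1000, 2**31 - 1):
        print(n, sizes_match(n))
```

With _1 <= 0 the unsigned addition does not wrap back to _1 - 1 and MAX_EXPR clamps to 0, so the two sides differ; with _1 > 0 both simplifications hold. A check like this says nothing about how to carry out the proof inside compute_access_range.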