https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713
--- Comment #17 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
What an inline packing would (approximately) produce is this:

subroutine processBPP(X, BPP, N)
  integer, intent(in) :: N
  real, dimension(N,3), intent(out) :: X
  real, dimension(N,10), intent(in) :: BPP
  integer :: i
  real :: tmp1(3)
  real :: tmp2(6)
  integer :: k1, k2

  do concurrent (i = 1:N)
     k1 = 0
     do
        if (.not. k1 < 3) exit
        tmp1(k1+1) = BPP(i,k1+1)
        k1 = k1 + 1
     end do
     k2 = 0
     do
        if (.not. k2 < 6) exit
        tmp2(k2+1) = BPP(i,k2+5)
        k2 = k2 + 1
     end do
     X(i,:) = fpdbacksolve(tmp1, tmp2)
  end do
end subroutine processBPP

With gfortran, I see no timing difference between this and the (:) version.

Chris, can you confirm this? And is flang still faster by a factor of
two if you use this version?