http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48636
--- Comment #10 from Thomas Koenig <tkoenig at gcc dot gnu.org> 2011-04-20 16:40:46 UTC ---
(In reply to comment #6)
> Not strictly related to inlining, but in the new descriptor we'll have a field
> specifying whether the array is simply contiguous, so it might make sense to
> generate two loops for each loop over the array in the source, one for the
> contiguous case where it can be vectorized etc. and another loop for the
> general case. This might reduce the profitability of inlining.

Consider the hand-crafted matmul below.

Here, we have three nested loops. The most interesting one is the innermost
loop of the matmul. When compiling with -fwhole-program -O3, we vectorize it
by inlining, provided we omit the call to my_matmul that passes a with
non-unity stride.

How many versions of the loop should we generate? Two or eight, depending on
what the caller may do? ;-)

module foo
  implicit none
contains
  subroutine my_matmul(a,b,c)
    implicit none
    integer :: count, m, n
    real, dimension(:,:), intent(in) :: a,b
    real, dimension(:,:), intent(out) :: c
    integer :: i,j,k

    m = ubound(a,1)
    n = ubound(b,2)
    count = ubound(a,2)
    c = 0
    do j=1,n
      do k=1, count
        do i=1,m
          c(i,j) = c(i,j) + a(i,k) * b(k,j)
        end do
      end do
    end do
  end subroutine my_matmul
end module foo

program main
  use foo
  implicit none
  integer, parameter :: factor=100
  integer, parameter :: n = 2*factor, m = 3*factor, count = 4*factor
  real, dimension(m, count) :: a
  real, dimension(count, n) :: b
  real, dimension(m,n) :: c1, c2
  real, dimension(m/2, n) :: ch_1, ch_2

  call random_number(a)
  call random_number(b)
  call my_matmul(a,b,c1)
  c2 = matmul(a,b)
  if (any(abs(c1 - c2) > 1e-5)) call abort
  call my_matmul(a(1:m:2,:),b,ch_1)
  ch_2 = matmul(a(1:m:2,:),b)
  if (any(abs(ch_1 - ch_2) > 1e-5)) call abort
end program main
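
For illustration, the two-version scheme from the quoted comment could be written by hand roughly as follows. This is only a sketch under stated assumptions, not what the compiler generates: the names foo_versioned, my_matmul_versioned, kernel_contiguous_a and kernel_general are hypothetical, and it relies on the Fortran 2008 IS_CONTIGUOUS intrinsic and CONTIGUOUS attribute being available. Only a is tested for contiguity. Versioning on a, b and c independently would presumably give the 2**3 = 8 variants asked about above.

```fortran
module foo_versioned
  implicit none
contains
  ! Hypothetical dispatcher: selects a loop version from the contiguity of a only.
  subroutine my_matmul_versioned(a, b, c)
    real, dimension(:,:), intent(in)  :: a, b
    real, dimension(:,:), intent(out) :: c

    if (is_contiguous(a)) then
      call kernel_contiguous_a(a, b, c)
    else
      call kernel_general(a, b, c)
    end if
  end subroutine my_matmul_versioned

  ! Version 1: a is declared contiguous, so a(i,k) has unit stride in i.
  ! b and c are still general assumed-shape arrays (an assumption of this sketch).
  subroutine kernel_contiguous_a(a, b, c)
    real, dimension(:,:), contiguous, intent(in) :: a
    real, dimension(:,:), intent(in)  :: b
    real, dimension(:,:), intent(out) :: c
    integer :: i, j, k

    c = 0
    do j = 1, ubound(b,2)
      do k = 1, ubound(a,2)
        do i = 1, ubound(a,1)
          c(i,j) = c(i,j) + a(i,k) * b(k,j)
        end do
      end do
    end do
  end subroutine kernel_contiguous_a

  ! Version 2: same loops, but nothing is assumed about the stride of a.
  subroutine kernel_general(a, b, c)
    real, dimension(:,:), intent(in)  :: a, b
    real, dimension(:,:), intent(out) :: c
    integer :: i, j, k

    c = 0
    do j = 1, ubound(b,2)
      do k = 1, ubound(a,2)
        do i = 1, ubound(a,1)
          c(i,j) = c(i,j) + a(i,k) * b(k,j)
        end do
      end do
    end do
  end subroutine kernel_general
end module foo_versioned
```

The dispatcher takes the same arguments as my_matmul, so the two calls in the test program above would read call my_matmul_versioned(a,b,c1) and call my_matmul_versioned(a(1:m:2,:),b,ch_1). The first would take the contiguous branch and the second the general one.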