http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48636

--- Comment #10 from Thomas Koenig <tkoenig at gcc dot gnu.org> 2011-04-20 16:40:46 UTC ---
(In reply to comment #6)

> Not strictly related to inlining, but in the new descriptor we'll have a field
> specifying whether the array is simply contiguous, so it might make sense to
> generate two loops for each loop over the array in the source, one for the
> contiguous case where it can be vectorized etc. and another loop for the
> general case.  This might reduce the profitability of inlining.

Consider the hand-crafted matmul at the end of this comment.

Here, we have three nested loops.  The most interesting one is the
innermost loop of my_matmul.  When compiling with -fwhole-program -O3,
that loop is vectorized (via inlining), but only if we omit the call to
my_matmul that passes a with non-unit stride.

How many versions of the loop should we generate?  Two or eight, depending
on what the caller may do? ;-)

module foo
  implicit none
contains
  subroutine my_matmul(a,b,c)
    implicit none
    integer :: count, m, n
    real, dimension(:,:), intent(in) :: a,b
    real, dimension(:,:), intent(out) :: c
    integer :: i,j,k

    m = ubound(a,1)
    n = ubound(b,2)
    count = ubound(a,2)
    c = 0
    do j=1,n
       do k=1, count
          do i=1,m
             c(i,j) = c(i,j) + a(i,k) * b(k,j)
          end do
       end do
    end do
  end subroutine my_matmul
end module foo

program main
  use foo
  implicit none
  integer, parameter :: factor=100
  integer, parameter :: n = 2*factor, m = 3*factor, count = 4*factor
  real, dimension(m, count) :: a
  real, dimension(count, n) :: b
  real, dimension(m,n) :: c1, c2
  real, dimension(m/2, n) :: ch_1, ch_2

  call random_number(a)
  call random_number(b)
  call my_matmul(a,b,c1)
  c2 = matmul(a,b)
  if (any(abs(c1 - c2) > 1e-5)) call abort
  call my_matmul(a(1:m:2,:),b,ch_1)
  ch_2 = matmul(a(1:m:2,:),b)
  if (any(abs(ch_1 - ch_2) > 1e-5)) call abort
end program main
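
For illustration only, here is a hand-written sketch of what the "two
loops" idea from comment #6 could look like at the source level.  It is
not what gfortran generates.  The names my_matmul_versioned,
kernel_contiguous and kernel_general are made up for this example, and
the run-time test uses the Fortran 2008 intrinsic is_contiguous as a
stand-in for the contiguity field proposed for the new descriptor.  A
single all-or-nothing test like this gives two versions; presumably,
testing a, b and c separately is what leads to 2**3 = 8.

```fortran
module versioned_matmul
  implicit none
contains
  ! Hypothetical wrapper: choose a loop version once per call.
  subroutine my_matmul_versioned(a, b, c)
    real, dimension(:,:), intent(in) :: a, b
    real, dimension(:,:), intent(out) :: c

    if (is_contiguous(a) .and. is_contiguous(b) .and. is_contiguous(c)) then
       call kernel_contiguous(a, b, c)   ! unit stride known at compile time
    else
       call kernel_general(a, b, c)      ! strides come from the descriptors
    end if
  end subroutine my_matmul_versioned

  ! Version 1: the CONTIGUOUS attribute tells the compiler that the
  ! innermost loop over i walks a and c with unit stride.
  subroutine kernel_contiguous(a, b, c)
    real, dimension(:,:), contiguous, intent(in) :: a, b
    real, dimension(:,:), contiguous, intent(out) :: c
    integer :: i, j, k

    c = 0
    do j = 1, size(b,2)
       do k = 1, size(a,2)
          do i = 1, size(a,1)
             c(i,j) = c(i,j) + a(i,k) * b(k,j)
          end do
       end do
    end do
  end subroutine kernel_contiguous

  ! Version 2: same loops, but no assumption about strides.
  subroutine kernel_general(a, b, c)
    real, dimension(:,:), intent(in) :: a, b
    real, dimension(:,:), intent(out) :: c
    integer :: i, j, k

    c = 0
    do j = 1, size(b,2)
       do k = 1, size(a,2)
          do i = 1, size(a,1)
             c(i,j) = c(i,j) + a(i,k) * b(k,j)
          end do
       end do
    end do
  end subroutine kernel_general
end module versioned_matmul
```

With this sketch, a call passing whole arrays takes the contiguous
branch, while a call passing a(1:m:2,:) takes the general one.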
