[Bug fortran/85531] Implement some loop fusion in the Fortran front end

2018-04-26 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85531

--- Comment #5 from rguenther at suse dot de  ---
On April 26, 2018 6:09:40 PM GMT+02:00, "tkoenig at gcc dot gnu.org"
 wrote:
>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85531
>
>--- Comment #4 from Thomas Koenig  ---
>What is the best strategy on this?
>
>I assume the Fortran front end could do a dependency analysis;
>the existing code could be extended for this.
>
>We could then either do the scalarization in the front end, or
>annotate the generated loops in some way to indicate that it
>is OK to merge them.
>
>What would be preferred?

Well, I think we need something in the middle end. In the end the question will be
whether that's good enough or whether the frontend can do better in some cases.
We _do_ have issues with the frontend lowering everything to 1-dimensional
accesses.
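
To illustrate the point about 1-dimensional accesses, here is a minimal
sketch in plain Fortran. It is hand-written source, not the front end's
actual output or GCC's intermediate representation, and the names
lowering_sketch, scale_2d and scale_linear are hypothetical. The second
routine addresses the same storage through a single column-major offset,
so the separate subscripts i and j are no longer visible as distinct
dimensions to anything that analyzes the access.

```fortran
module lowering_sketch
  implicit none
contains
  ! Two-dimensional form: the subscripts i and j stay separate.
  subroutine scale_2d(a, n1, n2, s)
    integer, intent(in) :: n1, n2
    double precision, intent(inout) :: a(n1, n2)
    double precision, intent(in) :: s
    integer :: i, j
    do j = 1, n2
      do i = 1, n1
        a(i, j) = s * a(i, j)
      end do
    end do
  end subroutine scale_2d

  ! Linearized form: the same storage viewed as a rank-1 array, with
  ! the column-major offset i + (j-1)*n1 computed by hand.
  subroutine scale_linear(a, n1, n2, s)
    integer, intent(in) :: n1, n2
    double precision, intent(inout) :: a(n1*n2)
    double precision, intent(in) :: s
    integer :: i, j
    do j = 1, n2
      do i = 1, n1
        a(i + (j-1)*n1) = s * a(i + (j-1)*n1)
      end do
    end do
  end subroutine scale_linear
end module lowering_sketch
```

Both routines give the same result when passed the same rank-2 array; the
difference is only in how much of the index structure remains visible.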

[Bug fortran/85531] Implement some loop fusion in the Fortran front end

2018-04-26 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85531

--- Comment #4 from Thomas Koenig  ---
What is the best strategy on this?

I assume the Fortran front end could do a dependency analysis;
the existing code could be extended for this.

We could then either do the scalarization in the front end, or
annotate the generated loops in some way to indicate that it
is OK to merge them.

What would be preferred?

[Bug fortran/85531] Implement some loop fusion in the Fortran front end

2018-04-26 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85531

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2018-04-26
                 CC|                            |rguenth at gcc dot gnu.org
            Version|unknown                     |9.0
     Ever confirmed|0                           |1

--- Comment #3 from Richard Biener  ---
Thanks.  So -floop-nest-optimize (graphite) doesn't do anything here: it
detects the two loops just fine but simply doesn't apply any transform.
Probably, as with the interchange failure, we fail to provide it with spatial
constraints to minimize, or so.

The loop distribution pass is presented with a CFG and IL that should indeed
be trivially analyzable (if we solve the dependence analysis issue).

[Bug fortran/85531] Implement some loop fusion in the Fortran front end

2018-04-26 Thread tkoenig at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85531

--- Comment #1 from Richard Biener  ---
Can you provide a testcase that can be compiled?

--- Comment #2 from Thomas Koenig  ---
Here it is. The internal writes are there just to confuse the
optimizer.

module x
  implicit none
contains
  subroutine foo(a,b,c, n)
    integer, intent(in) :: n
    double precision, dimension(n), intent(in) :: a
    double precision, dimension(n), intent(out) :: b,c
    b = a
    c = a
  end subroutine foo
  subroutine bar(a,b,c,n)
    integer, intent(in) :: n
    double precision, dimension(n), intent(in) :: a
    double precision, dimension(n), intent(out) :: b,c
    integer :: i
    do concurrent (i=1:n)
       b(i) = a(i)
       c(i) = a(i)
    end do
  end subroutine bar
end module x

program main
  use x
  implicit none
  double precision, dimension(:), allocatable :: a, b, c
  integer, parameter :: n = 10**7
  double precision :: t1, t2
  character(len=80) :: line, line2
  integer :: i
  allocate (a(n), b(n), c(n))
  call random_number(a)

  line = '20'
  call cpu_time(t1)
  call foo(a,b,c,n)
  call cpu_time(t2)
  print *,t2-t1
  read (unit=line,fmt=*) i
  write (unit=line2, fmt=*) b(i),c(i)
  line = '20'
  call cpu_time(t1)
  call bar(a,b,c,n)
  call cpu_time(t2)
  print *,t2-t1
  read (unit=line,fmt=*) i
  write (unit=line2, fmt=*) b(i),c(i)

end program main
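
For readers following the thread, the transformation being requested can be
sketched by hand. The module below is an illustration only, with hypothetical
names (fusion_sketch, copy_unfused, copy_fused); it approximates what
scalarizing the two array assignments in foo one statement at a time would
look like, next to the fused loop that bar already expresses. It is not the
code gfortran actually generates.

```fortran
module fusion_sketch
  implicit none
contains
  ! Roughly what scalarizing "b = a" and "c = a" separately yields:
  ! one loop per array assignment, so a is traversed twice.
  subroutine copy_unfused(a, b, c, n)
    integer, intent(in) :: n
    double precision, dimension(n), intent(in) :: a
    double precision, dimension(n), intent(out) :: b, c
    integer :: i
    do i = 1, n
      b(i) = a(i)
    end do
    do i = 1, n
      c(i) = a(i)
    end do
  end subroutine copy_unfused

  ! The fused form: a single traversal of a. Merging is safe here
  ! because neither loop writes anything that the other one reads.
  subroutine copy_fused(a, b, c, n)
    integer, intent(in) :: n
    double precision, dimension(n), intent(in) :: a
    double precision, dimension(n), intent(out) :: b, c
    integer :: i
    do i = 1, n
      b(i) = a(i)
      c(i) = a(i)
    end do
  end subroutine copy_fused
end module fusion_sketch
```

Both routines leave b and c equal to a; any timing difference between them
would depend on the compiler, the options used and the machine.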