[Bug middle-end/38318] moving the allocation of temps out of loops.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38318 --- Comment #12 from Joost VandeVondele --- (In reply to Marc Glisse from comment #11) > So S2 cannot call free (or realloc) on the pointer and then exit or call > longjmp or do an infinite loop or anything like that in fortran? Maybe we'll > need a flag set by the front-end that says whether (all) functions are safe. well, not free or realloc or longjmp, but infinite loops are allowed in S2. The point is, one is really not passing a pointer to S2 (from a Fortran point of view).
[Bug middle-end/38318] moving the allocation of temps out of loops.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38318 --- Comment #11 from Marc Glisse --- (In reply to Joost VandeVondele from comment #10) > (In reply to Marc Glisse from comment #9) > > Ok. If you used __builtin_abort instead of _gfortran_os_error, I think my > > current patch would handle it. It is hard for gcc to guess that > > _gfortran_os_error is safe. > > For the Fortran FE people (not me, I'm a user), but _gfortran_os_error > should have an attribute like 'abort' or 'noreturn'. abort doesn't exist, and noreturn is not sufficient, as a function that calls free on the pointer then exits is noreturn but unsafe. > However, the compiler > should also be able to figure out this can never be called (if B is > 'allocated on the stack') in this subroutine. Yes. > > Er, no, I missed the call to s2. I would also need some attribute on s2 so > > the compiler knows that s2 doesn't do anything too weird. > > Actually, in Fortran, S2 can't do anything 'weird' with B, in the sense that > your optimization should certainly apply. Not so sure about the correct > terms here, but in approximate C-speak, B 'as a pointer' is guaranteed to be > pointing to exactly the same address, nothing has happened to its target, > and no pointer can be pointing to whatever B was pointing to So S2 cannot call free (or realloc) on the pointer and then exit or call longjmp or do an infinite loop or anything like that in fortran? Maybe we'll need a flag set by the front-end that says whether (all) functions are safe.
[Bug middle-end/38318] moving the allocation of temps out of loops.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38318 --- Comment #10 from Joost VandeVondele --- (In reply to Marc Glisse from comment #9) > > Ok. If you used __builtin_abort instead of _gfortran_os_error, I think my > current patch would handle it. It is hard for gcc to guess that > _gfortran_os_error is safe. For the Fortran FE people (not me, I'm a user), but _gfortran_os_error should have an attribute like 'abort' or 'noreturn'. However, the compiler should also be able to figure out this can never be called (if B is 'allocated on the stack') in this subroutine. > Er, no, I missed the call to s2. I would also need some attribute on s2 so > the compiler knows that s2 doesn't do anything too weird. Actually, in Fortran, S2 can't do anything 'weird' with B, in the sense that your optimization should certainly apply. Not so sure about the correct terms here, but in approximate C-speak, B 'as a pointer' is guaranteed to be pointing to exactly the same address, nothing has happened to its target, and no pointer can be pointing to whatever B was pointing to
[Bug middle-end/38318] moving the allocation of temps out of loops.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38318 --- Comment #9 from Marc Glisse --- (In reply to Joost VandeVondele from comment #8) > Marc, looks like the fortran FE changed a lot since this bug was filed, and > there is no explicit allocate anymore, in fact the variable is created on > stack by the frontend... Cool, the best optimizations are those you don't need to do ;-) > So now, the proper testcase would be: > > cat PR38318-3.f90 > SUBROUTINE S1(N,A) > REAL :: A(3) > REAL, DIMENSION(:), ALLOCATABLE :: B > DO I=1,N >ALLOCATE(B(3)) >B=-A >CALL S2(B) >DEALLOCATE(B) > ENDDO > END SUBROUTINE > > which really should contain any call to _gfortran_runtime_error_at, > _gfortran_os_error, __builtin_malloc, __builtin_free if all were perfect, > and certainly not in the loop Ok. If you used __builtin_abort instead of _gfortran_os_error, I think my current patch would handle it. It is hard for gcc to guess that _gfortran_os_error is safe. On the other hand, if I special case the test if(VAR==0) as mentioned in a comment in my patch, it won't look at that branch anymore and the optimization should apply. Er, no, I missed the call to s2. I would also need some attribute on s2 so the compiler knows that s2 doesn't do anything too weird. Hopefully, when the compiler has the sources for s2, we could later let it guess those attributes...
[Bug middle-end/38318] moving the allocation of temps out of loops.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38318 --- Comment #8 from Joost VandeVondele --- (In reply to Marc Glisse from comment #7) > (In reply to Joost VandeVondele from comment #6) > > Marc, I think your recently posted patch: > > http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01049.html > > could fix the problem with the testcase subroutine S1, even though 'moving > > allocations out of loops' is more or less a side effect. > > I don't speak fortran fluently so I tried compiling S1 with an unpatched > compiler and -O2 -fdump-tree-optimized, but I don't see any call to malloc > in there. Could you explain, with references to a dump, what the internal > functions mean and where my patch might help? Marc, looks like the fortran FE changed a lot since this bug was filed, and there is no explicit allocate anymore, in fact the variable is created on stack by the frontend... this is controlled by -fmax-stack-var-size=0 (putting it to zero, will yield your __builtin_malloc() that I recalled, in the PR38318.f90.003t.original dump). You have a precedent for getting the a reasonable size (32768 for fortran). The _gfortran_internal_(un)pack is a fortran FE thing, that guarantees that memory is contiguous... clearly a missed frontend optimization in this case. So now, the proper testcase would be: > cat PR38318-3.f90 SUBROUTINE S1(N,A) REAL :: A(3) REAL, DIMENSION(:), ALLOCATABLE :: B DO I=1,N ALLOCATE(B(3)) B=-A CALL S2(B) DEALLOCATE(B) ENDDO END SUBROUTINE which really should contain any call to _gfortran_runtime_error_at, _gfortran_os_error, __builtin_malloc, __builtin_free if all were perfect, and certainly not in the loop
[Bug middle-end/38318] moving the allocation of temps out of loops.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38318 --- Comment #7 from Marc Glisse --- (In reply to Joost VandeVondele from comment #6) > Marc, I think your recently posted patch: > http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01049.html > could fix the problem with the testcase subroutine S1, even though 'moving > allocations out of loops' is more or less a side effect. I don't speak fortran fluently so I tried compiling S1 with an unpatched compiler and -O2 -fdump-tree-optimized, but I don't see any call to malloc in there. Could you explain, with references to a dump, what the internal functions mean and where my patch might help?
[Bug middle-end/38318] moving the allocation of temps out of loops.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38318 Joost VandeVondele changed: What|Removed |Added CC||glisse at gcc dot gnu.org, ||Joost.VandeVondele at mat dot ethz ||.ch --- Comment #6 from Joost VandeVondele --- Marc, I think your recently posted patch: http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01049.html could fix the problem with the testcase subroutine S1, even though 'moving allocations out of loops' is more or less a side effect.
[Bug middle-end/38318] moving the allocation of temps out of loops.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38318 Dominique d'Humieres changed: What|Removed |Added CC||jh at suse dot cz, ||rguenther at suse dot de --- Comment #5 from Dominique d'Humieres 2010-10-07 15:04:34 UTC --- Another case of interest is "automatic arrays". An interesting example is the polyhedron test nf.f90. On Core2 Duo and Darwin the following patch --- nf.f902005-10-11 22:53:32.0 +0200 +++ nf_v2.f902010-10-07 16:49:38.0 +0200 @@ -153,7 +153,7 @@ integer :: nx , nxy , nxyz , maxiter real(dpkind),dimension(nxyz):: ad,au1,au2,au3,x,b real(dpkind)::targrms -real(dpkind),allocatable,dimension(:) :: r,q,p,z,g,gi +real(dpkind),allocatable,dimension(:) :: r,q,p,z,g,gi,t,u real(dpkind):: alpha,beta,qr,qrp,rmserr integer :: iter , tbase , tgi , tcg , tickspersec , maxticks @@ -163,7 +163,7 @@ call GetGI3D(1,nxyz) ! c call system_clock(tgi,tickspersec,maxticks) deallocate(g) -allocate (r(nxyz),q(nxyz),p(nxyz),z(nxyz)) +allocate (r(nxyz),q(nxyz),p(nxyz),z(nxyz),t(nxyz),u(nxyz)) CALL SPMMULT(x,r) ; r = b - r! compute initial residual vector write(*,'(A)') ' Iter AlphaBeta RMS Residual Sum of Residuals' @@ -171,12 +171,12 @@ write(*,'(I4,24X,2G18.7)') 0,sqrt(DOT_PR ! Do a single iteration with alpha =1 ! to reduce sum of residuals to 0 -p = r ; CALL NF3DPrecon(p,1,nxyz) ; CALL SPMMULT(p,z) +p = r ; CALL NF3DPrecon(p,t,u,1,nxyz) ; CALL SPMMULT(p,z) x = x + p ; r = r - z write(*,'(I4,F12.5,12X,2G18.7)') 0,1.0,sqrt(DOT_PRODUCT(r,r)/nxyz),sum(r) do iter = 1 , maxiter - q = r ; CALL NF3DPrecon(q,1,nxyz) + q = r ; CALL NF3DPrecon(q,t,u,1,nxyz) qr = DOT_PRODUCT(q,r) if ( iter==1 ) then beta = 0.0 @@ -197,7 +197,7 @@ call system_clock(tcg,tickspersec,maxtic write(*,'(/A,F10.3/A,F10.3/A,F10.3)') ' Time for setup ',REAL(tgi-tbase)/REAL(tickspersec) , & ' Time per iteration ',REAL(tcg-tgi)/REAL(tickspersec*min(iter,maxiter)) , & ' Total Time ',REAL(tcg-tbase)/REAL(tickspersec) -deallocate(r,q,p,z,gi) +deallocate(r,q,p,z,gi,t,u) contains != ! Banded matrix multiply b = A.x = @@ -253,7 +253,7 @@ end subroutine GetGI2D !== != ! solve for a plane of cells using == -subroutine NF2DPrecon(x,i1,i2) ! 2D NF Preconditioning matrix +subroutine NF2DPrecon(x,t,i1,i2) ! 2D NF Preconditioning matrix integer :: i1 , i2 real(dpkind),dimension(i2)::x,t integer :: i @@ -272,11 +272,12 @@ end subroutine NF2DPrecon!== subroutine GetGI3D(i1,i2)! compute gi for a 3D block of cells = integer :: i1 , i2 integer :: i +real(dpkind),dimension(nxyz)::t g = ad do i = i1 , i2 , nxy ! advance one plane at a time if ( i>i1 ) then ! get contribution from previous plane g(i-nxy:i-1) = au3(i-nxy:i-1) - call NF2DPrecon(g,i-nxy,i-1) + call NF2DPrecon(g,t,i-nxy,i-1) g(i:i+nxy-1) = g(i:i+nxy-1) - au3(i-nxy:i-1)*g(i-nxy:i-1) endif call GetGI2D(i,i+nxy-1) ! get contribution from this plane @@ -285,17 +286,17 @@ end subroutine GetGI3D !== != ! solve for a 3D block of cells using -subroutine NF3DPrecon(x,i1,i2) ! 3D Preconditioning matrix +subroutine NF3DPrecon(x,t,u,i1,i2) ! 3D Preconditioning matrix integer :: i1 , i2 -real(dpkind),dimension(i2)::x,t +real(dpkind),dimension(i2)::x,t,u integer :: i do i = i1 , i2 , nxy if ( i>i1 ) x(i:i+nxy-1) = x(i:i+nxy-1) - au3(i-nxy:i-1)*x(i-nxy:i-1) - call NF2DPrecon(x,i,i+nxy-1) + call NF2DPrecon(x,u,i,i+nxy-1) enddo do i = i2-2*nxy+1 , i1 , -nxy t(i:i+nxy-1) = au3(i:i+nxy-1)*x(i+nxy:i+2*nxy-1) - call NF2DPrecon(t,i,i+nxy-1) + call NF2DPrecon(t,u,i,i+nxy-1) x(i:i+nxy-1) = x(i:i+nxy-1) - t(i:i+nxy-1) enddo end subroutine NF3DPrecon != cuts the execution time from ~28s to ~20s (Note that with the options I use all the procs are inlined).
[Bug middle-end/38318] moving the allocation of temps out of loops.
--- Comment #4 from jv244 at cam dot ac dot uk 2010-02-21 12:11 --- (In reply to comment #3) > (In reply to comment #2) > > seemingly being discussed, since useful for tonto > > > > http://gcc.gnu.org/ml/fortran/2010-02/msg00157.html > > > > But there: "it's unfortunately not possible to avoid the temporary creation > without serious data-flow analysis work - too late for the frontend" > > Thus, this seems to be more a middle-end item. right, changing component as such. This would actually be much more powerful as a middle-end thing, since it would also capture the case where a programmer would explicitly allocate/deallocate stuff in a loop. -- jv244 at cam dot ac dot uk changed: What|Removed |Added Component|fortran |middle-end http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38318