[Bug tree-optimization/49851] IVOPTs makes a mess out of polyhedron air derivx and derivy
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49851

--- Comment #1 from Richard Guenther rguenth at gcc dot gnu.org 2011-07-26 12:53:47 UTC ---
Testcase with caller, use -fwhole-program to force inlining.

      IMPLICIT REAL*8(a-H,O-Z)
      PARAMETER (NX=150,NY=150)
      DIMENSION ux(NX,NY) , uy(NX,NY) , vx(NX,NY) , vy(NX,NY) , tx(NX,NY)
      DIMENSION DX(NX,33) , DY(NY,33)
      DIMENSION U(NX,NY) , V(NX,NY) , P(NX,NY) , RHO(NX,NY) , E(NX,NY)
      DIMENSION NPX(30) , x(NX) , NPY(30) , y(NY) , ALX(30) , bex(30)
      COMMON /XD1 / FP1 , FM1 , FP2 , FM2 , FP3 , FM3 , FP4 , FM4 , &
             FP1x , FM1x , FP2x , FM2x , FP3x , FM3x , FP4x , FM4x , &
             FV2 , FV3 , FV4 , DXP2 , DXM2 , DXP3 , DXM3 , DXP4 , &
             DXM4 , DX , NPX , ALX , NDX , MXPy
      COMMON /BNDRY / U , V , P , RHO , T , E , AS1 , AS2 , AS3 , AS4 , &
             NX1 , NY1 , NX2 , NY2 , SIG , GMA , S0 , T0 , P0 , PE , &
             HOO , RR , MINlet
      CALL DERIVX(DX,U,ux,ALX,NPX,NDX,MXPy)
      END

      SUBROUTINE DERIVX(D,U,Ux,Al,Np,Nd,M)
      IMPLICIT REAL*8(A-H,O-Z)
      PARAMETER (NX=150,NY=150)
      DIMENSION D(NX,33) , U(NX,NY) , Ux(NX,NY) , Al(30) , Np(30)
      DO jm = 1 , M
         jmax = 0
         jmin = 1
         DO i = 1 , Nd
            jmax = jmax + Np(i) + 1
            DO j = jmin , jmax
               uxt = 0.
               DO k = 0 , Np(i)
                  uxt = uxt + D(j,k+1)*U(jmin+k,jm)
               ENDDO
               Ux(j,jm) = uxt*Al(i)
            ENDDO
            jmin = jmin + Np(i) + 1
         ENDDO
      ENDDO
      CONTINUE
      END

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49851

--- Comment #2 from Richard Guenther rguenth at gcc dot gnu.org 2011-07-26 13:59:26 UTC ---
AIR spends 86% of its time in DERIV[XY] (for ICC), 78% of its time there
for GCC. The performance difference also reproduces when not inlining
DERIV[XY] at all (though it's slightly less of a difference - GCC doesn't
care).

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49851

--- Comment #3 from Richard Guenther rguenth at gcc dot gnu.org 2011-07-26 14:32:35 UTC ---
(In reply to comment #2)
> AIR spends 86% of its time in DERIV[XY] (for ICC), 78% of its time there
> for GCC. The performance difference also reproduces when not inlining
> DERIV[XY] at all (though it's slightly less of a difference - GCC doesn't
> care).

Actually it does not. Without inlining:

GCC:

air 2.42 4728556 3.99 10 0.1798

SUBROUTINE DERIVX(D,U,Ux,Al,Np,Nd,M) /* derivx_ total: 8194999 42.3750 */
SUBROUTINE DERIVY(D,U,Uy,Al,Np,Nd,M) /* derivy_ total: 8176250 42.2781 */

ICC:

air 2.90 4072563 3.45 10 0.1809

SUBROUTINE DERIVX(D,U,Ux,Al,Np,Nd,M) /* derivx_ total: 8060834 47.2620 */
SUBROUTINE DERIVY(D,U,Uy,Al,Np,Nd,M) /* derivy_ total: 7070627 41.4563 */

So there is not much difference in the total hits, which means ICC performs
some context-dependent optimization.
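
The per-function totals above can be cross-checked with a few lines of Python. This is only a sketch: it assumes the two numbers after "total:" are a sample count followed by that function's percentage of all samples in the run, which the comment does not state explicitly. The helper names `summarize` and `implied_total` are made up for illustration.

```python
# Sample counts and percentages copied from the per-function profile totals
# quoted in comment #3 (the runs without inlining).
# Assumption: each pair is (samples, percent of all samples in that run).
PROFILE = {
    "GCC": {"derivx_": (8194999, 42.3750), "derivy_": (8176250, 42.2781)},
    "ICC": {"derivx_": (8060834, 47.2620), "derivy_": (7070627, 41.4563)},
}


def summarize(functions):
    """Return (combined samples, combined percent) for one compiler."""
    samples = sum(count for count, _ in functions.values())
    percent = sum(share for _, share in functions.values())
    return samples, percent


def implied_total(count, percent):
    """Estimate the whole-program sample count from one function's row."""
    return count / (percent / 100.0)


if __name__ == "__main__":
    for compiler, functions in PROFILE.items():
        samples, percent = summarize(functions)
        total = implied_total(*functions["derivx_"])
        print(f"{compiler}: DERIVX+DERIVY = {samples} samples, "
              f"{percent:.2f}% of about {total:.0f} total samples")
```

Under that assumption, DERIVX and DERIVY together account for about 84.65% of the samples in the GCC run and about 88.72% in the ICC run.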