[Bug fortran/36599] major execution regression for induct.f90 polyhedron benchmark in 4.3.1 on Intel
--- Comment #14 from howarth at nitro dot med dot uc dot edu 2008-09-04 02:07 --- Paul, I'll check current gcc trunk later this week but I think this issue has gone latent in gcc 4.4 so it only exists in the gcc 4.3 releases. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36599
--- Comment #13 from pault at gcc dot gnu dot org 2008-09-03 21:20 ---
(In reply to comment #12)
> I just checked on an Opteron and a Xeon with Fedora 9. Neither shows the
> problem so it may be Core 2 specific.
>

Jack,

We're in regression fixing - what do you want to do with this one?

Cheers

Paul

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36599
--- Comment #12 from howarth at nitro dot med dot uc dot edu 2008-06-23 15:22 --- I just checked on an Opteron and a Xeon with Fedora 9. Neither shows the problem so it may be Core 2 specific. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36599
--- Comment #11 from howarth at nitro dot med dot uc dot edu 2008-06-23 14:54 ---
Any suggestions for debugging this further on the gcc 4.3 branch? Currently we have the following observations...

1) The problem is specific to -fassociative-math -fno-signed-zeros -fno-trapping-math (-fno-signed-zeros and -fno-trapping-math are required for -fassociative-math to be active) and happens with -O3 but not -O2.
2) The problem doesn't occur on powerpc-apple-darwin9.
3) The problem occurs on i686-apple-darwin9 at both -m32 and -m64.
4) -funroll-loops --param min-vect-loop-bound=2 suppresses the problem on Macintel (but is it just going latent, as with r134730 on gcc trunk?).

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36599
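Observation 4 can be read against the GCC manual's description of min-vect-loop-bound: vectorization is allowed only if the number of iterations after vectorization is greater than the parameter value. The sketch below is a rough Python paraphrase of that rule, not GCC's actual code. The function names are hypothetical, the trip count of 3 is the inner-loop length reported in comment #5, and the vectorization factor of 2 (two doubles per 128-bit SSE2 register) is an assumption about how induct is compiled.

```python
def vectorized_iterations(trip_count: int, vf: int) -> int:
    """Full vector iterations for a loop with `trip_count` scalar iterations."""
    return trip_count // vf


def passes_min_vect_loop_bound(trip_count: int, vf: int, min_bound: int) -> bool:
    # Paraphrase of the manual: the iteration count after vectorization
    # must be greater than min-vect-loop-bound for vectorization to proceed.
    return vectorized_iterations(trip_count, vf) > min_bound


TRIP_COUNT = 3  # inner loop length reported in comment #5
VF = 2          # assumption: two doubles per SSE2 vector

for bound in (0, 2):
    allowed = passes_min_vect_loop_bound(TRIP_COUNT, VF, bound)
    print(f"min-vect-loop-bound={bound}: vectorization allowed = {allowed}")
```

Under these assumptions a length-3 loop yields a single vector iteration plus a scalar remainder, so a bound of 2 rejects it. That matches the observation that the parameter suppresses the slowdown, but it does not by itself say whether the underlying cost-model issue is fixed or only hidden.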
--- Comment #10 from rguenth at gcc dot gnu dot org 2008-06-23 14:02 --- It probably made it latent. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36599
--- Comment #9 from howarth at nitro dot med dot uc dot edu 2008-06-23 13:41 --- Since r134730 represents a new feature and can't be backported, this leaves one question. Did that change in gcc trunk really 'fix' the induct benchmark performance problem with -fassociative-math and -O3, or did it just make it go latent? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36599
--- Comment #8 from howarth at nitro dot med dot uc dot edu 2008-06-23 00:46 ---
Looking at the SuSe polyhedron benchmark servers, the induct regression in trunk was eliminated between 2008-04-27 and 2008-04-28. My guess is this was fixed with...

r134730 | rguenth | 2008-04-27 12:27:08 -0400 (Sun, 27 Apr 2008) | 42 lines

2008-04-27  Richard Guenther  <[EMAIL PROTECTED]>

    PR tree-optimization/18754
    PR tree-optimization/34223
    * tree-pass.h (pass_complete_unrolli): Declare.
    * tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Print
    loop size before and after unconditionally of UL_NO_GROWTH
    in effect.  Rewrite loop into loop closed SSA form if it is not
    already.
    (tree_unroll_loops_completely): Re-structure to iterate over
    innermost loops with intermediate CFG cleanups.  Unroll outermost
    loops only if requested or the code does not grow doing so.
    * tree-ssa-loop.c (gate_tree_vectorize): Don't shortcut if no
    loops are available.
    (tree_vectorize): Instead do so here.
    (tree_complete_unroll): Also unroll outermost loops.
    (tree_complete_unroll_inner): New function.
    (gate_tree_complete_unroll_inner): Likewise.
    (pass_complete_unrolli): New pass.
    * tree-ssa-loop-manip.c (find_uses_to_rename_use): Only record
    uses outside of the loop.
    (tree_duplicate_loop_to_header_edge): Only verify loop-closed SSA
    form if it is available.
    * tree-flow.h (tree_unroll_loops_completely): Add extra parameter.
    * passes.c (init_optimization_passes): Schedule complete inner
    loop unrolling pass before the first CCP pass after final inlining.
    * gcc.dg/tree-ssa/loop-36.c: New testcase.
    * gcc.dg/tree-ssa/loop-37.c: Likewise.
    * gcc.dg/vect/vect-118.c: Likewise.
    * gcc.dg/Wunreachable-8.c: XFAIL bogus warning.
    * gcc.dg/vect/vect-66.c: Increase loop trip count.
    * gcc.dg/vect/no-section-anchors-vect-66.c: Likewise.
    * gcc.dg/vect/no-section-anchors-vect-69.c: Likewise.
    * gcc.dg/vect/vect-76.c: Likewise.
    * gcc.dg/vect/vect-outer-6.c: Likewise.
    * gcc.dg/vect/vect-outer-1.c: Likewise.
    * gcc.dg/vect/vect-outer-1a.c: Likewise.
    * gcc.dg/vect/vect-11a.c: Likewise.
    * gcc.dg/vect/vect-shift-1.c: Likewise.
    * gcc.target/i386/vectorize1.c: Likewise.

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36599
--- Comment #7 from howarth at nitro dot med dot uc dot edu 2008-06-23 00:07 --- Looking at the polyhedron benchmark data from http://physik.fu-berlin.de/~tburnus/gcc-trunk/benchmark/, there appears to be a big jump in performance in the induct runtime with -ffast-math and -O3 between 2008-04-24 and 2008-05-01. Unfortunately, there is a gap in the data between those dates, with the latter showing the increase. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36599
--- Comment #6 from howarth at nitro dot med dot uc dot edu 2008-06-22 22:30 --- IMHO when a new optimization technique is enabled by default in -O3 and degrades common benchmark performance, it qualifies as a performance regression for that release. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36599
--- Comment #5 from dominiq at lps dot ens dot fr 2008-06-22 20:43 ---
I think the problem is that the vector cost model is not tuned for the Intel Core family. My understanding of the problem is that without the relevant suboption of -ffast-math the inner implicit loops in induct are not vectorized, while they are wrongly vectorized (they are of length 3) with -ffast-math. This can be prevented by using '--param min-vect-loop-bound=2':

[ibook-dhum] lin/test% gfortran -O3 induct.f90
76.901u 0.099s 1:17.11 99.8% 0+0k 0+1io 34pf+0w
[ibook-dhum] lin/test% gfortran -ffast-math -O3 induct.f90
96.605u 0.133s 1:36.82 99.9% 0+0k 0+0io 0pf+0w
[ibook-dhum] lin/test% gfortran -ffast-math -O3 --param min-vect-loop-bound=2 induct.f90
73.239u 0.093s 1:13.39 99.9% 0+0k 0+0io 0pf+0w
[ibook-dhum] lin/test% gfortran -m64 -O3 induct.f90
65.322u 0.075s 1:05.44 99.9% 0+0k 0+0io 0pf+0w
[ibook-dhum] lin/test% gfortran -m64 -ffast-math -O3 induct.f90
90.604u 0.097s 1:30.77 99.9% 0+0k 0+0io 0pf+0w
[ibook-dhum] lin/test% gfortran -m64 -ffast-math -O3 --param min-vect-loop-bound=2 induct.f90
61.007u 0.049s 1:01.13 99.8% 0+0k 0+0io 41pf+0w

In trunk these inner loops are unrolled before vectorization and the run time is now ~36s. So I am not sure that the observed behavior is really a regression rather than the lack of a suitable cost model.

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36599
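To make the size of the effect easier to compare across the two ABIs, the user-CPU times from comment #5 can be normalized against the plain -O3 run. The following is only a small arithmetic sketch in Python. The dictionary layout, the labels, and the helper name are illustrative choices; the numbers are the user times quoted in the comment.

```python
# User-CPU seconds copied from the timings in comment #5.
timings = {
    "default (no -m64)": {
        "-O3": 76.901,
        "-ffast-math -O3": 96.605,
        "-ffast-math -O3 --param min-vect-loop-bound=2": 73.239,
    },
    "-m64": {
        "-O3": 65.322,
        "-ffast-math -O3": 90.604,
        "-ffast-math -O3 --param min-vect-loop-bound=2": 61.007,
    },
}


def relative_to_baseline(runs: dict, baseline: str = "-O3") -> dict:
    """Return each run time divided by the baseline run time."""
    base = runs[baseline]
    return {flags: seconds / base for flags, seconds in runs.items()}


ratios = {abi: relative_to_baseline(runs) for abi, runs in timings.items()}

for abi, per_flag in ratios.items():
    for flags, ratio in per_flag.items():
        print(f"{abi:18s} {flags:48s} {ratio:.2f}x")
```

By these figures, -ffast-math costs roughly 26% (default) and 39% (-m64) over plain -O3, while adding --param min-vect-loop-bound=2 brings the run slightly below the -O3 baseline in both cases.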
--- Comment #4 from howarth at nitro dot med dot uc dot edu 2008-06-22 18:30 --- If the problem is due to the honoring of parentheses in Fortran, why doesn't this issue manifest itself on powerpc-apple-darwin as well? http://gcc.gnu.org/ml/fortran/2008-06/msg00249.html -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36599
--- Comment #3 from burnus at gcc dot gnu dot org 2008-06-22 17:54 --- Fortran requires that parentheses are always honoured. All other associative maths operations are allowed in Fortran, but as IEEE is also allowed, this gives problems, e.g. when INF or signed zeros are involved. Therefore, gfortran does no associative operations by default, only if -ffinite-math-only is enabled. But even in this case, the parentheses are still always honoured. At least this is my knowledge about the optimizations done at the moment. Though, at the moment it is unclear to me why -O3 is slower than -O2. See also PR 35259. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36599
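The hazards named in comment #3 (INF and signed zeros) are properties of IEEE arithmetic itself and can be reproduced outside the compiler. The following is a minimal Python sketch using IEEE double precision. The values are chosen purely for illustration and are not taken from induct.f90; it shows only why regrouping is unsafe in general, not what gfortran does to any particular expression.

```python
import math

# 1) Rounding: regrouping a sum changes the result.
a, b, c = 1.0e16, -1.0e16, 1.0
left = (a + b) + c    # a + b cancels exactly, then c is added
right = a + (b + c)   # c is lost when added to b (spacing of doubles near 1e16 is 2)
print(left, right)

# 2) Overflow: one grouping produces INF, the other stays finite.
big = 1.0e308
overflow_first = (big + big) - big   # big + big overflows to inf
reassociated = big + (big - big)     # big - big is 0.0, result stays finite
print(overflow_first, reassociated)

# 3) Signed zeros: simplifying x + 0.0 to x is wrong for x = -0.0.
x = -0.0
sign_of_x = math.copysign(1.0, x)
sign_of_sum = math.copysign(1.0, x + 0.0)
print(sign_of_x, sign_of_sum)
```

Cases 2 and 3 illustrate why reassociation has to be paired with options that tell the compiler such values need not be preserved, consistent with the note in comment #11 that -fno-signed-zeros and -fno-trapping-math are required for -fassociative-math to take effect.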
--- Comment #2 from howarth at nitro dot med dot uc dot edu 2008-06-22 17:00 --- Created an attachment (id=15803) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15803&action=view) assembly file generated with -fassociative-math -fno-signed-zeros -fno-trapping-math -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36599
--- Comment #1 from howarth at nitro dot med dot uc dot edu 2008-06-22 16:58 --- Created an attachment (id=15802) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15802&action=view) assembly file generated with -fno-signed-zeros -fno-trapping-math -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36599