[Bug fortran/36599] major execution regression for induct.f90 polyhedron benchmark in 4.3.1 on Intel

2008-09-03 Thread howarth at nitro dot med dot uc dot edu


--- Comment #14 from howarth at nitro dot med dot uc dot edu  2008-09-04 02:07 ---
Paul,
   I'll check current gcc trunk later this week but I think this issue has gone
latent in gcc 4.4 so it only exists in the gcc 4.3 releases.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36599



[Bug fortran/36599] major execution regression for induct.f90 polyhedron benchmark in 4.3.1 on Intel

2008-09-03 Thread pault at gcc dot gnu dot org


--- Comment #13 from pault at gcc dot gnu dot org  2008-09-03 21:20 ---
(In reply to comment #12)
> I just checked on an Opteron and a Xeon with Fedora 9. Neither shows the
> problem so it may be Core 2 specific.
> 
Jack,

We're in regression fixing - what do you want to do with this one?

Cheers

Paul





[Bug fortran/36599] major execution regression for induct.f90 polyhedron benchmark in 4.3.1 on Intel

2008-06-23 Thread howarth at nitro dot med dot uc dot edu


--- Comment #12 from howarth at nitro dot med dot uc dot edu  2008-06-23 15:22 ---
I just checked on an Opteron and a Xeon with Fedora 9. Neither shows the
problem so it may be Core 2 specific.





[Bug fortran/36599] major execution regression for induct.f90 polyhedron benchmark in 4.3.1 on Intel

2008-06-23 Thread howarth at nitro dot med dot uc dot edu


--- Comment #11 from howarth at nitro dot med dot uc dot edu  2008-06-23 14:54 ---
Any suggestions for debugging this further in gcc 4.3 branch? Currently we have
the following observations...

1) The problem is specific to -fassociative-math -fno-signed-zeros
-fno-trapping-math (-fno-signed-zeros -fno-trapping-math are required for
-fassociative-math to be active) and happens with -O3 but not -O2.
2) The problem doesn't occur on powerpc-apple-darwin9.
3) The problem occurs on i686-apple-darwin9 at both -m32 and -m64.
4) -funroll-loops --param min-vect-loop-bound=2 suppresses the problem on
Macintel (but is it just going latent, as with r134730 on gcc trunk?).





[Bug fortran/36599] major execution regression for induct.f90 polyhedron benchmark in 4.3.1 on Intel

2008-06-23 Thread rguenth at gcc dot gnu dot org


--- Comment #10 from rguenth at gcc dot gnu dot org  2008-06-23 14:02 ---
It probably made it latent.





[Bug fortran/36599] major execution regression for induct.f90 polyhedron benchmark in 4.3.1 on Intel

2008-06-23 Thread howarth at nitro dot med dot uc dot edu


--- Comment #9 from howarth at nitro dot med dot uc dot edu  2008-06-23 13:41 ---
Since r134730 represents a new feature and can't be backported, this leaves one
question. Did that change in gcc trunk really 'fix' the induct benchmark
performance problem with -fassociative-math and -O3, or did it just make it go
latent?





[Bug fortran/36599] major execution regression for induct.f90 polyhedron benchmark in 4.3.1 on Intel

2008-06-22 Thread howarth at nitro dot med dot uc dot edu


--- Comment #8 from howarth at nitro dot med dot uc dot edu  2008-06-23 00:46 ---
Looking at the SuSe polyhedron benchmark servers, the induct regression in
trunk was eliminated between 2008-04-27 and 2008-04-28. My guess is this was
fixed with...

r134730 | rguenth | 2008-04-27 12:27:08 -0400 (Sun, 27 Apr 2008) | 42 lines

2008-04-27  Richard Guenther  <[EMAIL PROTECTED]>

PR tree-optimization/18754
PR tree-optimization/34223
* tree-pass.h (pass_complete_unrolli): Declare.
* tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Print
loop size before and after unconditionally of UL_NO_GROWTH in effect.
Rewrite loop into loop closed SSA form if it is not already.
(tree_unroll_loops_completely): Re-structure to iterate over
innermost loops with intermediate CFG cleanups.
Unroll outermost loops only if requested or the code does not grow
doing so.
* tree-ssa-loop.c (gate_tree_vectorize): Don't shortcut if no
loops are available.
(tree_vectorize): Instead do so here.
(tree_complete_unroll): Also unroll outermost loops.
(tree_complete_unroll_inner): New function.
(gate_tree_complete_unroll_inner): Likewise.
(pass_complete_unrolli): New pass.
* tree-ssa-loop-manip.c (find_uses_to_rename_use): Only record
uses outside of the loop.
(tree_duplicate_loop_to_header_edge): Only verify loop-closed SSA
form if it is available.  
* tree-flow.h (tree_unroll_loops_completely): Add extra parameter.
* passes.c (init_optimization_passes): Schedule complete inner
loop unrolling pass before the first CCP pass after final inlining.

* gcc.dg/tree-ssa/loop-36.c: New testcase.
* gcc.dg/tree-ssa/loop-37.c: Likewise.
* gcc.dg/vect/vect-118.c: Likewise.
* gcc.dg/Wunreachable-8.c: XFAIL bogus warning.
* gcc.dg/vect/vect-66.c: Increase loop trip count.
* gcc.dg/vect/no-section-anchors-vect-66.c: Likewise.
* gcc.dg/vect/no-section-anchors-vect-69.c: Likewise.
* gcc.dg/vect/vect-76.c: Likewise.
* gcc.dg/vect/vect-outer-6.c: Likewise.
* gcc.dg/vect/vect-outer-1.c: Likewise.
* gcc.dg/vect/vect-outer-1a.c: Likewise.
* gcc.dg/vect/vect-11a.c: Likewise.
* gcc.dg/vect/vect-shift-1.c: Likewise.
* gcc.target/i386/vectorize1.c: Likewise.





[Bug fortran/36599] major execution regression for induct.f90 polyhedron benchmark in 4.3.1 on Intel

2008-06-22 Thread howarth at nitro dot med dot uc dot edu


--- Comment #7 from howarth at nitro dot med dot uc dot edu  2008-06-23 00:07 ---
Looking at the polyhedron benchmark data from
http://physik.fu-berlin.de/~tburnus/gcc-trunk/benchmark/, there appears to be a
big jump in performance in the induct runtime with -ffast-math and -O3 between
2008-04-24 and 2008-05-01. Unfortunately, there is a gap in the data between
those dates, with the later date showing the increase.





[Bug fortran/36599] major execution regression for induct.f90 polyhedron benchmark in 4.3.1 on Intel

2008-06-22 Thread howarth at nitro dot med dot uc dot edu


--- Comment #6 from howarth at nitro dot med dot uc dot edu  2008-06-22 22:30 ---
IMHO when a new optimization technique is enabled by default in -O3
and degrades common benchmark performance, it qualifies as a 
performance regression for that release.





[Bug fortran/36599] major execution regression for induct.f90 polyhedron benchmark in 4.3.1 on Intel

2008-06-22 Thread dominiq at lps dot ens dot fr


--- Comment #5 from dominiq at lps dot ens dot fr  2008-06-22 20:43 ---
I think the problem is that the vector cost model is not tuned for the Intel
Core family.
My understanding of the problem is that without the relevant suboption of
-ffast-math the inner implicit loops in induct are not vectorized, while they
are wrongly vectorized (they are of length 3) with -ffast-math. This can be
prevented by using '--param min-vect-loop-bound=2':

[ibook-dhum] lin/test% gfortran -O3 induct.f90
76.901u 0.099s 1:17.11 99.8% 0+0k 0+1io 34pf+0w
[ibook-dhum] lin/test% gfortran -ffast-math -O3 induct.f90
96.605u 0.133s 1:36.82 99.9% 0+0k 0+0io 0pf+0w
[ibook-dhum] lin/test% gfortran -ffast-math -O3 --param min-vect-loop-bound=2 induct.f90
73.239u 0.093s 1:13.39 99.9% 0+0k 0+0io 0pf+0w
[ibook-dhum] lin/test% gfortran -m64 -O3 induct.f90
65.322u 0.075s 1:05.44 99.9% 0+0k 0+0io 0pf+0w
[ibook-dhum] lin/test% gfortran -m64 -ffast-math -O3 induct.f90
90.604u 0.097s 1:30.77 99.9% 0+0k 0+0io 0pf+0w
[ibook-dhum] lin/test% gfortran -m64 -ffast-math -O3 --param min-vect-loop-bound=2 induct.f90
61.007u 0.049s 1:01.13 99.8% 0+0k 0+0io 41pf+0w

In trunk these inner loops are unrolled before vectorization and the run time
is now ~36s.
So I am not sure that the observed behavior is really a regression, but rather
the lack of a suitable cost model.





[Bug fortran/36599] major execution regression for induct.f90 polyhedron benchmark in 4.3.1 on Intel

2008-06-22 Thread howarth at nitro dot med dot uc dot edu


--- Comment #4 from howarth at nitro dot med dot uc dot edu  2008-06-22 18:30 ---
If the problem is due to the honoring of parentheses in Fortran, why doesn't
this issue manifest itself on powerpc-apple-darwin as well?

http://gcc.gnu.org/ml/fortran/2008-06/msg00249.html





[Bug fortran/36599] major execution regression for induct.f90 polyhedron benchmark in 4.3.1 on Intel

2008-06-22 Thread burnus at gcc dot gnu dot org


--- Comment #3 from burnus at gcc dot gnu dot org  2008-06-22 17:54 ---
Fortran requires that parentheses are always honoured. All other associative
maths operations are allowed in Fortran, but as IEEE is also allowed, this
causes problems when, e.g., INF or signed zeros are involved. Therefore,
gfortran does no associative operations by default, only if -ffinite-math-only
is enabled. But even in this case, the parentheses are still always honoured.

At least this is my knowledge about the optimizations done at the moment.

Though at the moment it is unclear to me why -O3 is slower than -O2.
See also PR 35259.





[Bug fortran/36599] major execution regression for induct.f90 polyhedron benchmark in 4.3.1 on Intel

2008-06-22 Thread howarth at nitro dot med dot uc dot edu


--- Comment #2 from howarth at nitro dot med dot uc dot edu  2008-06-22 17:00 ---
Created an attachment (id=15803)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15803&action=view)
assembly file generated with -fassociative-math -fno-signed-zeros
-fno-trapping-math





[Bug fortran/36599] major execution regression for induct.f90 polyhedron benchmark in 4.3.1 on Intel

2008-06-22 Thread howarth at nitro dot med dot uc dot edu


--- Comment #1 from howarth at nitro dot med dot uc dot edu  2008-06-22 16:58 ---
Created an attachment (id=15802)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15802&action=view)
assembly file generated with -fno-signed-zeros -fno-trapping-math

