------- Comment #5 from dominiq at lps dot ens dot fr  2008-06-22 20:43 -------
I think the problem is that the vector cost model is not tuned for the Intel
Core family.
My understanding of the problem is that without the relevant suboption of
-ffast-math the inner implicit loops in induct are not vectorized, while with
-ffast-math they are wrongly vectorized (they are of length 3, too short for
vectorization to pay off). This can be prevented by using
'--param min-vect-loop-bound=2':

[ibook-dhum] lin/test% gfortran -O3 induct.f90
76.901u 0.099s 1:17.11 99.8%    0+0k 0+1io 34pf+0w
[ibook-dhum] lin/test% gfortran -ffast-math -O3 induct.f90
96.605u 0.133s 1:36.82 99.9%    0+0k 0+0io 0pf+0w
[ibook-dhum] lin/test% gfortran -ffast-math -O3 --param min-vect-loop-bound=2 induct.f90
73.239u 0.093s 1:13.39 99.9%    0+0k 0+0io 0pf+0w
[ibook-dhum] lin/test% gfortran -m64 -O3 induct.f90
65.322u 0.075s 1:05.44 99.9%    0+0k 0+0io 0pf+0w
[ibook-dhum] lin/test% gfortran -m64 -ffast-math -O3 induct.f90
90.604u 0.097s 1:30.77 99.9%    0+0k 0+0io 0pf+0w
[ibook-dhum] lin/test% gfortran -m64 -ffast-math -O3 --param min-vect-loop-bound=2 induct.f90
61.007u 0.049s 1:01.13 99.8%    0+0k 0+0io 41pf+0w

In trunk these inner loops are unrolled before vectorization and the run time
is now ~36s.
So I am not sure that the observed behavior is really a regression; it looks
rather like the lack of a suitable cost model.
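
To make the kind of loop under discussion concrete, here is a minimal C
sketch of a length-3 floating-point reduction. It is only a stand-in: the
name dot3 and the constant VEC_LEN are hypothetical and nothing here is
taken from induct.f90. It assumes the implicit loops behave like a short
sum of products, which would explain why a suboption of -ffast-math is
needed before the vectorizer touches them.

```c
/* Hypothetical stand-in for one short implicit loop; not from induct.f90. */
enum { VEC_LEN = 3 };

double dot3(const double *a, const double *b)
{
    double sum = 0.0;
    int i;

    /* The trip count is the compile-time constant 3. Accumulating in SIMD
       lanes reorders the additions, so the compiler may only vectorize
       this reduction when reassociation is permitted, for example via
       -fassociative-math, which -ffast-math enables. */
    for (i = 0; i < VEC_LEN; ++i)
        sum += a[i] * b[i];

    return sum;
}
```

With only three iterations, the setup and the final horizontal add can
easily cost more than the scalar loop, which is the case the
min-vect-loop-bound parameter is meant to filter out. On recent GCC
releases, compiling such a file with -O3 -ffast-math -fopt-info-vec
should report whether the loop was vectorized.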


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36599
