------- Comment #5 from dominiq at lps dot ens dot fr 2008-06-22 20:43 -------

I think the problem is that the vector cost model is not tuned for the Intel Core family. My understanding of the problem is that, without the relevant suboption of -ffast-math, the inner implicit loops in induct are not vectorized, whereas with -ffast-math they are wrongly vectorized (they are only of length 3). This can be prevented by using '--param min-vect-loop-bound=2':

[ibook-dhum] lin/test% gfortran -O3 induct.f90
76.901u 0.099s 1:17.11 99.8% 0+0k 0+1io 34pf+0w
[ibook-dhum] lin/test% gfortran -ffast-math -O3 induct.f90
96.605u 0.133s 1:36.82 99.9% 0+0k 0+0io 0pf+0w
[ibook-dhum] lin/test% gfortran -ffast-math -O3 --param min-vect-loop-bound=2 induct.f90
73.239u 0.093s 1:13.39 99.9% 0+0k 0+0io 0pf+0w
[ibook-dhum] lin/test% gfortran -m64 -O3 induct.f90
65.322u 0.075s 1:05.44 99.9% 0+0k 0+0io 0pf+0w
[ibook-dhum] lin/test% gfortran -m64 -ffast-math -O3 induct.f90
90.604u 0.097s 1:30.77 99.9% 0+0k 0+0io 0pf+0w
[ibook-dhum] lin/test% gfortran -m64 -ffast-math -O3 --param min-vect-loop-bound=2 induct.f90
61.007u 0.049s 1:01.13 99.8% 0+0k 0+0io 41pf+0w

In trunk these inner loops are unrolled before vectorization and the run time is now ~36s. So I am not sure that the observed behavior is really a regression; it looks more like the lack of a suitable cost model.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36599
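
To make the comparison easier to read, the user times quoted in this comment can be turned into changes relative to the plain -O3 run. This is only a small sketch: the numbers are copied from the timings in the comment, while the dictionary layout and the helper name `relative_change` are made up for illustration.

```python
# User times in seconds, copied from the timings quoted in the comment.
# The grouping into "32-bit" and "-m64" is only a labeling convenience.
timings = {
    "32-bit": {
        "-O3": 76.901,
        "-ffast-math -O3": 96.605,
        "-ffast-math -O3 --param min-vect-loop-bound=2": 73.239,
    },
    "-m64": {
        "-O3": 65.322,
        "-ffast-math -O3": 90.604,
        "-ffast-math -O3 --param min-vect-loop-bound=2": 61.007,
    },
}


def relative_change(seconds, baseline):
    """Return the change of `seconds` relative to `baseline`, in percent."""
    return (seconds - baseline) / baseline * 100.0


for target, runs in timings.items():
    baseline = runs["-O3"]  # plain -O3 is the reference for each target
    for flags, seconds in runs.items():
        change = relative_change(seconds, baseline)
        print(f"{target:7s} {flags:46s} {seconds:8.3f}s {change:+6.1f}%")
```

With these figures, -ffast-math alone is roughly 26% slower than plain -O3 in the default build and roughly 39% slower with -m64, while adding the min-vect-loop-bound parameter brings the run about 5% and 7% below the respective -O3 baselines.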