------- Comment #4 from dominiq at lps dot ens dot fr 2010-01-05 14:35 -------

Note that the inner loops in subroutine mutual_ind_quad_rec_coil are not vectorized at -O3 unless -ffast-math is used. Timing the code with and without -ffast-math, and with and without -floop-block, gives:
[macbook] lin/test% gfc -O3 induct.f90
[macbook] lin/test% time a.out > /dev/null
23.048u 0.037s 0:23.09 99.9% 0+0k 0+0io 0pf+0w
[macbook] lin/test% gfc -O3 -ffast-math induct.f90
[macbook] lin/test% time a.out > /dev/null
13.747u 0.033s 0:13.78 99.9% 0+0k 0+0io 0pf+0w
[macbook] lin/test% gfc -O3 -floop-block induct.f90
[macbook] lin/test% time a.out > /dev/null
8.452u 0.028s 0:08.48 99.8% 0+0k 0+0io 0pf+0w
[macbook] lin/test% gfc -O3 -ffast-math -floop-block induct.f90
[macbook] lin/test% time a.out > /dev/null
8.154u 0.030s 0:08.18 100.0% 0+0k 0+0io 0pf+0w

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42479