http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47298
--- Comment #2 from Joost VandeVondele <Joost.VandeVondele at pci dot uzh.ch> 2011-01-14 20:52:54 UTC ---
(In reply to comment #1)
> It's faster for me with -O3 (Athlon64, using -march=native).

well not on

model name      : Intel(R) Xeon(R) CPU X5550 @ 2.67GHz
stepping        : 5

I have 8Gflops with -O2 and somewhat more than 4 with -O3

BTW, the proper test program is

> cat test_compare.f90
REAL(KIND=8), DIMENSION(12,12) :: A,B,C
A=0 ; B=0 ; C=0
DO I=1,10000000
  CALL HARD_NN_12_12_12(C,A,B)
ENDDO
END