The (to be) attached code runs about 15% slower with 4.4 than with 4.2 when compiled with:

gfortran -O3 -march=native -funroll-loops -ffast-math test.f90

4.4: 5.060s
4.3: 4.376s
4.2: 4.316s

Most time would be spent in PD2VAL.

FYI, the cpu is:

cpu family : 15
model : 65
model name : Dual-Core AMD Opteron(tm) Processor 8218
stepping : 2
cpu MHz : 2612.084
cache size : 1024 KB
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow pni cx16 lahf_lm cmp_legacy svm cr8_legacy

(-march -> -march=k8-sse3 -mcx16 -msahf --param l1-cache-size=64 --param l1-cache-line-size=64 --param l2-cache-size=1024 -mtune=k8)

On Core2, 4.4 is actually faster:

4.4: 4.236s
4.3.0: 4.572s

-march=core2 -mcx16 -msahf --param l1-cache-size=32 --param l1-cache-line-size=64 -mtune=core2

--
Summary: [4.4 Regression] 15% slowdown of computational kernel
Product: gcc
Version: 4.4.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: jv244 at cam dot ac dot uk
GCC host triplet: x86_64-unknown-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306