The (to be) attached code runs about 15% slower with 4.4 than with 4.2 when compiled with:
gfortran -O3 -march=native -funroll-loops  -ffast-math test.f90

4.4:  5.060s
4.3:  4.376s
4.2:  4.316s

Most of the time is expected to be spent in PD2VAL.

FYI, the cpu is:

cpu family      : 15
model           : 65
model name      : Dual-Core AMD Opteron(tm) Processor 8218
stepping        : 2
cpu MHz         : 2612.084
cache size      : 1024 KB
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext
3dnow pni cx16 lahf_lm cmp_legacy svm cr8_legacy

(on this machine -march=native expands to: -march=k8-sse3 -mcx16 -msahf
--param l1-cache-size=64 --param l1-cache-line-size=64
--param l2-cache-size=1024 -mtune=k8)

On Core2, 4.4 is actually faster:

4.4: 4.236s
4.3.0: 4.572s

(on this machine -march=native expands to: -march=core2 -mcx16 -msahf
--param l1-cache-size=32 --param l1-cache-line-size=64 -mtune=core2)
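
For reference, the relative differences implied by the timings above can be
recomputed with a few lines of Python. This is only a sketch: the run times
are copied from this report, while the helper name slowdown_pct and the
dictionary layout are made up for illustration.

```python
def slowdown_pct(new_time, old_time):
    """Percent change in run time of new relative to old (positive = slower)."""
    return (new_time / old_time - 1.0) * 100.0

# Wall-clock times in seconds, copied from the report.
opteron = {"4.4": 5.060, "4.3": 4.376, "4.2": 4.316}
core2 = {"4.4": 4.236, "4.3.0": 4.572}

# Opteron 8218: 4.4 compared with each older release.
for old in ("4.3", "4.2"):
    pct = slowdown_pct(opteron["4.4"], opteron[old])
    print(f"Opteron 4.4 vs {old}: {pct:+.1f}%")

# Core2: 4.4 compared with 4.3.0 (a negative value means 4.4 is faster).
pct = slowdown_pct(core2["4.4"], core2["4.3.0"])
print(f"Core2   4.4 vs 4.3.0: {pct:+.1f}%")
```

On the Opteron this gives a slowdown of roughly 16% against 4.3 and roughly
17% against 4.2; on Core2, 4.4 comes out roughly 7% faster than 4.3.0.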


-- 
           Summary: [4.4 Regression] 15% slowdown of computational kernel
           Product: gcc
           Version: 4.4.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: jv244 at cam dot ac dot uk
  GCC host triplet: x86_64-unknown-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38306
