------- Comment #8 from jv244 at cam dot ac dot uk  2008-08-19 05:43 -------
(In reply to comment #7)
> That is, GCCs inner loop is
> 
> .L6:
>         addl    $1, %eax
>         addsd   %xmm12, %xmm11
>         cmpl    $100000000, %eax
>         addsd   %xmm14, %xmm3
>         addsd   %xmm15, %xmm2
>         addsd   %xmm13, %xmm1
>         jne     .L6
> 
> which doesn't necessarily look slower than ICCs.
> 

Right... I checked trunk, and it now does something very smart with the testcase
from comment #4: it is now roughly 8 times faster than ifort (9.1/11.0):

> gfortran -O3 -ftree-vectorize -ffast-math -march=native -S PR31079_4.f90
> ./a.out
  0.25201499

> ifort -xT -O2 PR31079_4.f90
> ./a.out
   2.040127

I'll see if there is a way to make the testcase somewhat smarter. I checked the
very first program (comment #0), and it is still slower with gfortran (Intel
3.51 vs. gfortran 4.1). For completeness, I attach the Fortran source and the
Intel assembly.
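For readers less used to x86 assembly, the inner loop quoted from comment #7 can be read as the C sketch below. This is a hypothetical model of the data flow only, not the PR31079 testcase (which is Fortran and not shown here). The function name `accumulate4` and the increment values are illustrative assumptions. The counter in %eax runs up to a limit, and on each pass four independent accumulators (%xmm11, %xmm3, %xmm2, %xmm1) each add a loop-invariant value (%xmm12, %xmm14, %xmm15, %xmm13).

```c
/* Hypothetical C model of the quoted .L6 loop (NOT the actual testcase).
   n plays the role of the 100000000 bound compared against %eax.
   Register mapping, per the quoted addsd instructions:
     acc[0] ~ %xmm11 += %xmm12 ~ inc[0]
     acc[1] ~ %xmm3  += %xmm14 ~ inc[1]
     acc[2] ~ %xmm2  += %xmm15 ~ inc[2]
     acc[3] ~ %xmm1  += %xmm13 ~ inc[3] */
void accumulate4(long n, const double inc[4], double acc[4])
{
    /* Keep the sums in locals, as the compiler keeps them in registers. */
    double a0 = acc[0], a1 = acc[1], a2 = acc[2], a3 = acc[3];

    for (long i = 0; i < n; i++) {
        /* The four additions do not depend on one another, so an
           out-of-order CPU can overlap them within each iteration. */
        a0 += inc[0];
        a1 += inc[1];
        a2 += inc[2];
        a3 += inc[3];
    }

    acc[0] = a0;
    acc[1] = a1;
    acc[2] = a2;
    acc[3] = a3;
}
```

For example, calling `accumulate4(1000, inc, acc)` with `inc = {0.5, 0.25, 1.0, 2.0}` and zeroed `acc` leaves `{500.0, 250.0, 1000.0, 2000.0}` in `acc`. Whether a given compiler emits exactly the `.L6` sequence for this C depends on flags and version; the sketch is only meant to make the dependency structure of the loop readable.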


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31079
