------- Comment #6 from burnus at gcc dot gnu dot org  2009-07-15 20:27 -------
You should also add -march=native to the command line; it probably does not
help much, but it should help a bit. I also recall that the standard GLIBC lacks
optimized versions of some math functions on x86-64, while AMD provides patches
for those (applied by default on SUSE Linux). However, I am not sure whether
this is still an issue.

On an AMD Athlon 64 X2 4800+ running openSUSE Factory (x86_64, glibc 2.10.1,
GCC 4.5.0), I get the following timings, which do not look too bad:

$ ifort -O3 -xHost aa.f90; time ./a.out
real  1m59.997s    user  1m59.651s   sys   0m0.252s

$ gfortran -O3 -ffast-math -march=native aa.f90; time ./a.out
real  2m29.711s    user  2m28.841s   sys   0m0.236s

$ gfortran -O3 -ffast-math  -mveclibabi=acml -march=native aa.f90 \
  -L /opt/acml4.2.0/gfortran64_mp/lib/ -lacml_mv   #(Note: current is ACML 4.3)
real  2m29.693s    user  2m29.373s   sys   0m0.192s

$ gfortran -O3 -ffast-math  -mveclibabi=svml -march=native aa.f90 \
  -L /opt/intel/Compiler/11.1/038/lib/intel64 -lsvml -limf -lintlc; \
  time ./a.out
real  3m56.189s    user  3m55.839s   sys   0m0.200s

Thus with the GLIBC (with AMD patches) or with the ACML, gfortran is only about
25% slower than ifort, which is still acceptable. Why the Intel routines are so
slow on my AMD, I do not know.
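
The percentages can be rechecked from the wall-clock ("real") times listed
above. A minimal sketch in Python; the dictionary labels and the to_seconds
helper are illustrative names only, and the numbers are copied from the four
runs in this comment:

```python
def to_seconds(minutes, seconds):
    """Convert a `time` reading such as 1m59.997s to seconds."""
    return 60 * minutes + seconds

# "real" times from the four runs above (labels are illustrative)
timings = {
    "ifort -xHost": to_seconds(1, 59.997),
    "gfortran (glibc)": to_seconds(2, 29.711),
    "gfortran + acml": to_seconds(2, 29.693),
    "gfortran + svml": to_seconds(3, 56.189),
}

baseline = timings["ifort -xHost"]

# Slowdown relative to ifort, in percent
slowdown = {name: 100.0 * (t / baseline - 1.0) for name, t in timings.items()}

for name, t in timings.items():
    print(f"{name:18s} {t:8.3f} s  {slowdown[name]:+6.1f}%")
```

This gives roughly +25% for both the GLIBC and the ACML builds and roughly +97%
for the SVML build, all relative to ifort.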

With -mveclibabi=svml, sincosf and tanf are linked; with -mveclibabi=acml and
with no -mvec* option, sincosf and tanf@@GLIBC_2.2.5 are linked. ifort, by
contrast, calls:
vmlsSinCos4 vmlsTan4

Thus the question is really: why is neither vmlsSinCos4 nor vmlsTan4 called,
nor, for ACML, vrs4_sincosf/vrsa_sincosf (vrs*_tan* does not exist)?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40766
