------- Comment #67 from whaley at cs dot utsa dot edu  2006-08-11 15:22 -------
Uros,

>Slightly offtopic, but to put some numbers to comment #8 and comment #11,
>equivalent SSE code now reaches only 50% of x87 single performance and 60% of
>x87 double performance on AMD x86_64

FYI, you *may* get slightly better single SSE performance with these flags:
   -fomit-frame-pointer -march=athlon64 -O2 -mfpmath=sse \
   -msse -msse2 -msse3 -fargument-noalias-global
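
For a side-by-side comparison, the flags above can be paired with an x87
counterpart, roughly as sketched below.  This is only an illustration:
kernel.c and the output names are hypothetical, and the x87 line is an
assumption (the same flags with -mfpmath=387 and the -msse* switches
dropped), not the exact flags used in anyone's timings here.  The script only
prints the two command lines so they can be inspected before being run.

```shell
#!/bin/sh
# Flags quoted in this comment for the SSE build.
SSE_FLAGS="-fomit-frame-pointer -march=athlon64 -O2 -mfpmath=sse -msse -msse2 -msse3 -fargument-noalias-global"

# Assumed x87 counterpart: same flags, but with the x87 unit selected.
X87_FLAGS="-fomit-frame-pointer -march=athlon64 -O2 -mfpmath=387 -fargument-noalias-global"

# kernel.c is a hypothetical source file holding the loop under test.
SRC="kernel.c"

# Dry run: print the command lines instead of invoking the compiler.
echo "gcc $SSE_FLAGS -c $SRC -o kernel_sse.o"
echo "gcc $X87_FLAGS -c $SRC -o kernel_x87.o"
```

Timing the two resulting objects with the same driver keeps everything but
the floating point unit constant.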

Also, when ATLAS is allowed to exercise the code generator to find the best
kernel, for double precision gcc 4's SSE could be made to almost tie gcc 3's x87
performance (gcc 3's double x87 performance is roughly 92% of the patched gcc 4
for this platform).  However, single precision SSE, even allowing the code
generator to go crazy, could only achieve about 2/3 of double *SSE*
performance, and since single perf is actually greater than double for x87 . . .

You can find some details at:
  
https://sourceforge.net/mailarchive/forum.php?thread_id=10026092&forum_id=426

Cheers,
Clint


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827
