------- Comment #67 from whaley at cs dot utsa dot edu 2006-08-11 15:22 -------

Uros,
>Slightly offtopic, but to put some numbers to comment #8 and comment #11,
>equivalent SSE code now reaches only 50% of x87 single performance and 60% of
>x87 double performance on AMD x86_64

FYI, you *may* get slightly better single SSE performance with these flags:
   -fomit-frame-pointer -march=athlon64 -O2 -mfpmath=sse \
   -msse -msse2 -msse3 -fargument-noalias-global

Also, when ATLAS is allowed to exercise the code generator to find the best
kernel, for double precision gcc 4's SSE could be made to almost tie gcc3's
x87 performance (gcc3's double x87 performance is roughly 92% of the patched
gcc 4 for this platform). However, single precision SSE, even allowing the
code generator to go crazy, could only achieve about 2/3 of double *SSE*
performance, and since x87 single perf is actually greater for x87 . . .

You can find some details at:
   https://sourceforge.net/mailarchive/forum.php?thread_id=10026092&forum_id=426

Cheers,
Clint

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827