------- Comment #64 from uros at kss-loka dot si 2006-08-11 09:18 -------
Slightly off-topic, but to put some numbers behind comment #8 and comment #11: on AMD x86_64, equivalent SSE code now reaches only about 50% of x87 performance in single precision and about 60% in double precision:
ALGORITHM     NB   REPS        TIME      MFLOPS
=========  =====  =====  ==========  ==========
[float] -O2 -mfpmath=sse -march=k8:
atlasmm       60   1000       0.273     1582.66
[float] -O2 -mfpmath=387 -march=k8:
atlasmm       60   1000       0.138     3130.91
[double] -O2 -mfpmath=sse -march=k8:
atlasmm       60   1000       0.252     1714.54
[double] -O2 -mfpmath=387 -march=k8:
atlasmm       60   1000       0.152     2842.55

This effect was first observed in PR19780.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827