http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29874
--- Comment #2 from stevenj at alum dot mit.edu 2011-03-07 23:13:41 UTC ---
Created attachment 23579 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=23579
benchmark extracted from FFTW3 - size 64 FFT with SSE2

I extracted a small benchmark from FFTW3: a size-64 FFT using double-precision SSE2. This is a hard-coded (program-generated) routine specialized for transforms of that size, and it is usually a good test of the optimizer.

I played around with the compiler flags a bit, but plain "-O3" seems to be about as good as anything, i.e.:

    gcc -O3 n1fv_64.c -o n1fv_64

I then ran a few timing tests on my Debian/x86_64 box (2.83GHz Intel Xeon E5440) with the command:

    (for n in `seq 1 40`; do time ./n1fv_64; done) 2>&1 |grep user |sort

This times the program 40 times. I kept only the fastest result to try to remove random variations, and the results seemed pretty repeatable.

Results (fastest user time):

    gcc 3.4.6: 0m0.208s
    gcc 4.1.3: 0m0.216s
    gcc 4.3.2: 0m0.232s

So there does seem to be a definite, if slight, slowdown: relative to gcc 3.4.6, gcc 4.1.3 is about 3.8% slower and gcc 4.3.2 about 11.5% slower. I haven't tried gcc 4.4 or 4.5, since they are not installed on this box, but it seems worthwhile for someone to try them.
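The "run it 40 times and keep the fastest" approach used above can also be done inside a single process. The following is only a minimal sketch of min-of-N timing, not part of the attached benchmark. The names `workload` and `min_cpu_seconds` are hypothetical, and `workload` is a placeholder loop, not the FFTW3 size-64 codelet from n1fv_64.c. It also assumes that C's `clock()` (process CPU time) is an acceptable stand-in for the `user` figure reported by the shell's `time`. Unlike the shell loop, it times only the function call and excludes process startup.

```c
#include <time.h>

/* Hypothetical placeholder for the routine being benchmarked; the real
   test would call the FFTW3 size-64 codelet instead of this loop. */
static volatile double sink; /* volatile so the work is not optimized away */

static void workload(void)
{
    double acc = 0.0;
    for (int i = 0; i < 1000000; ++i)
        acc += (double)i * 0.5;
    sink = acc;
}

/* Call fn `runs` times and return the smallest CPU time seen, in seconds.
   Keeping the minimum discards runs slowed down by scheduler or cache
   noise. Returns -1.0 if runs <= 0. */
static double min_cpu_seconds(void (*fn)(void), int runs)
{
    double best = -1.0;
    for (int r = 0; r < runs; ++r) {
        clock_t start = clock();
        fn();
        double elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
        if (best < 0.0 || elapsed < best)
            best = elapsed;
    }
    return best;
}
```

A driver would call `min_cpu_seconds(workload, 40)` to mirror the 40 runs of the shell loop. Taking the minimum rather than the mean is the same choice made above: noise can only make a run slower, never faster.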