http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29874

--- Comment #2 from stevenj at alum dot mit.edu 2011-03-07 23:13:41 UTC ---
Created attachment 23579
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=23579
benchmark extracted from FFTW3 - size 64 FFT with SSE2

I extracted a little benchmark of a size-64 FFT using double-precision SSE2
from FFTW3; this is a hard-coded (program-generated) routine specifically for
transforms of that size, and is usually a good test of the optimizer.

I played around with the compiler flags a bit, but plain "-O3" seems to be
about as good as anything, i.e.:
      gcc -O3 n1fv_64.c -o n1fv_64

I then ran a few timing tests on my Debian/x86_64 box (2.83GHz Intel Xeon
E5440), with the command:
      (for n in `seq 1 40`; do time ./n1fv_64; done) 2>&1 |grep user |sort
to run it 40 times and sort the user times; I kept only the fastest result to
try to remove random variations.  The results seemed pretty repeatable.
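
For anyone who wants to repeat this on another box, here is a rough sketch of
the "keep the fastest run" step in Python.  It assumes the bash builtin `time`
output format ("user<TAB>0m0.208s"); the function names and the sample lines
are made up for illustration and are not part of the attachment.

```python
import re

# Assumed format of bash's builtin `time`, e.g. "user\t0m0.208s".
_USER_RE = re.compile(r"^user\s+(\d+)m(\d+(?:\.\d+)?)s$")


def parse_user_seconds(line):
    """Return the user time in seconds for a 'user 0m0.208s' line, else None."""
    match = _USER_RE.match(line.strip())
    if match is None:
        return None
    return int(match.group(1)) * 60 + float(match.group(2))


def fastest_user_time(lines):
    """Return the smallest user time found among the given output lines."""
    times = [t for t in map(parse_user_seconds, lines) if t is not None]
    if not times:
        raise ValueError("no 'user' lines found")
    return min(times)


if __name__ == "__main__":
    # Hypothetical sample of captured `time` output, for illustration only.
    sample = ["real\t0m0.221s", "user\t0m0.216s", "sys\t0m0.004s",
              "user\t0m0.208s"]
    print(fastest_user_time(sample))
```

Taking the minimum numerically avoids relying on the text sort, which only
orders the times correctly when they all have the same number of digits.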

Results:
   gcc 3.4.6:   0m0.208s
   gcc 4.1.3:   0m0.216s
   gcc 4.3.2:   0m0.232s

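So, there does seem to be a small but definite slowdown: relative to gcc 3.4.6,
the code is roughly 4% slower with gcc 4.1.3 and roughly 12% slower with gcc
4.3.2.  I haven't tried gcc 4.4 or 4.5, since they are not installed on this
box, but it seems worthwhile for someone to try them.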
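
To put numbers on the slowdown, a small Python sketch like the following
computes each version's time relative to the gcc 3.4.6 baseline.  The timings
are the ones listed under Results; the function name and the choice of 3.4.6
as the baseline are just my assumptions for illustration.

```python
# Fastest user times in seconds, copied from the results above.
TIMES = {
    "gcc 3.4.6": 0.208,
    "gcc 4.1.3": 0.216,
    "gcc 4.3.2": 0.232,
}


def slowdown_percent(times, baseline):
    """Return {version: percent slower than the baseline version}."""
    base = times[baseline]
    return {name: (t / base - 1.0) * 100.0 for name, t in times.items()}


if __name__ == "__main__":
    for name, pct in slowdown_percent(TIMES, "gcc 3.4.6").items():
        print(f"{name}: {pct:+.1f}%")
```

If someone adds gcc 4.4 or 4.5 timings from a different machine, only ratios
measured on the same box are comparable, so the baseline would need to be
re-measured there as well.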
