On 25 January 2012 00:04, a <a...@a.com> wrote:
> Since SIMD types were added to D I've ported an FFT that I was writing in
> C++ to D. The code is here:
>
> https://github.com/jerro/pfft
>
> Because dmd currently doesn't have an intrinsic for the SHUFPS instruction
> I've included a version block with some GDC-specific code (this gave me a
> speedup of up to 80%). I've benchmarked the scalar and SSE versions of the
> code compiled with both DMD and GDC, and also the C++ code using SSE. The
> results are below. The left column is the base-2 logarithm of the array size
> and the right column is GFLOPS, defined as the number of floating point
> operations that the most basic FFT algorithm would perform divided by the
> time taken (the algorithm I used performs slightly fewer operations):
>
> GFLOPS = 5 n log2(n) / (time for one FFT in nanoseconds)
>
> (I took that definition from http://www.fftw.org/speed/ )
>
> Chart: http://cloud.github.com/downloads/jerro/pfft/image.png
>
> Results:
>
> GDC SSE:
>
> 2   0.833648
> 3   1.23383
> 4   6.92712
> 5   8.93348
> 6   10.9212
> 7   11.9306
> 8   12.5338
> 9   13.4025
> 10  13.5835
> 11  13.6992
> 12  13.493
> 13  12.7082
> 14  9.32621
> 15  9.15256
> 16  9.31431
> 17  8.38154
> 18  8.267
> 19  7.61852
> 20  7.14305
> 21  7.01786
> 22  6.58934
>
> G++ SSE:
>
> 2   1.65933
> 3   1.96071
> 4   7.09683
> 5   9.66308
> 6   11.1498
> 7   11.9315
> 8   12.5712
> 9   13.4241
> 10  13.4907
> 11  13.6524
> 12  13.4215
> 13  12.6472
> 14  9.62755
> 15  9.24289
> 16  9.64412
> 17  8.88006
> 18  8.66819
> 19  8.28623
> 20  7.74581
> 21  7.6395
> 22  7.33506
>
> GDC scalar:
>
> 2   0.808422
> 3   1.20835
> 4   2.66921
> 5   2.81166
> 6   2.99551
> 7   3.26423
> 8   3.61477
> 9   3.90741
> 10  4.04009
> 11  4.20405
> 12  4.21491
> 13  4.30896
> 14  3.79835
> 15  3.80497
> 16  3.94784
> 17  3.98417
> 18  3.58506
> 19  3.33992
> 20  3.42309
> 21  3.21923
> 22  3.25673
>
> DMD SSE:
>
> 2   0.497946
> 3   0.773551
> 4   3.79912
> 5   3.78027
> 6   3.85155
> 7   4.06491
> 8   4.30895
> 9   4.53038
> 10  4.61006
> 11  4.82098
> 12  4.7455
> 13  4.85332
> 14  3.37768
> 15  3.44962
> 16  3.54049
> 17  3.40236
> 18  3.47339
> 19  3.40212
> 20  3.15997
> 21  3.32644
> 22  3.22767
>
> DMD scalar:
>
> 2   0.478998
> 3   0.772341
> 4   1.6106
> 5   1.68516
> 6   1.7083
> 7   1.70625
> 8   1.68684
> 9   1.66931
> 10  1.66125
> 11  1.63756
> 12  1.61885
> 13  1.60459
> 14  1.402
> 15  1.39665
> 16  1.37894
> 17  1.36306
> 18  1.27189
> 19  1.21033
> 20  1.25719
> 21  1.21315
> 22  1.21606
>
> SIMD gives a speedup of between 2x and 3.5x for GDC-compiled code and
> between 2.5x and 3x for DMD. Code compiled with GDC is just a little bit
> slower than G++ (and just for some values of n), which is really nice.
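
For anyone who wants to relate the quoted GFLOPS figures back to wall-clock time, here is a minimal Python sketch of the formula above. It is not taken from pfft; the function names gflops and time_ns_from_gflops are made up for illustration, and it assumes n is a power of two, as in the tables.

```python
import math


def gflops(n: int, time_ns: float) -> float:
    """Nominal GFLOPS for one FFT of size n that took time_ns nanoseconds.

    Uses the quoted definition: 5 * n * log2(n) / time_ns.
    """
    return 5 * n * math.log2(n) / time_ns


def time_ns_from_gflops(log2_n: int, rate: float) -> float:
    """Invert the definition: time for one FFT (in ns) implied by a table row."""
    n = 2 ** log2_n
    return 5 * n * log2_n / rate


# Example row from the quoted results: GDC SSE, log2(n) = 10, 13.5835 GFLOPS.
t = time_ns_from_gflops(10, 13.5835)
print(f"one 1024-point FFT takes roughly {t / 1000:.2f} microseconds")
```

By this definition the GDC SSE row for log2(n) = 10 corresponds to roughly 3.77 microseconds per transform. Since the count 5 n log2(n) is nominal, the figure compares implementations on a common scale and is not a count of the operations actually executed.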

That is quite interesting, and really cool at the same time. :)

Did you run into any issues with GDC's implementation of vectors?

-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';