a:

> Because dmd currently doesn't have an intrinsic for the SHUFPS
> instruction I've included a version block with some GDC specific
> code (this gave me a speedup of up to 80%).

It seems an instruction worth having in dmd too.

> Chart: http://cloud.github.com/downloads/jerro/pfft/image.png

I know your code is relatively simple, so it's not meant to be the fastest around, but in your nice graph I'd like to see a line for FFTW too, _as a reference point_. Such a line would show us how close to or how far from industry-standard performance all this is. (And if possible I'd like to see two lines for the LDC2 compiler too.)

Bye,
bearophile