On Saturday, 22 June 2013 at 15:41:43 UTC, Benjamin Thaut wrote:
On 22.06.2013 15:53, jerro wrote:
In its current state you don't want to be using SIMD with dmd because
the generated assembly will be significantly slower than if you just
use the default FPU math.
That may be true for some kinds of code, but it isn't true in general.
For example, see the comparison of pfft's performance when built with
64-bit DMD using SIMD and without SIMD:
http://i.imgur.com/kYYI9R9.png
This benchmark was run on a Core i5 2500K on 64-bit Debian Wheezy.
Ok, I saw that you did write quite a few critical functions in
inline assembly. Not really a good argument for DMD's codegen
with SIMD intrinsics.
Kind Regards
Benjamin Thaut
I have actually run that benchmark with the code from this branch:
https://github.com/jerro/pfft/tree/experimental
The only function in sse_float.d on that branch that uses inline
assembly is scalar_to_vector. The reason why I used more inline
assembly in the master branch is that DMD didn't have intrinsics
for some instructions such as shufps at the time.
I'm not really arguing for DMD's codegen with SIMD intrinsics.
It's more that, from what I've seen, it doesn't produce very good
scalar floating point code either (at least when compared to LDC
or GDC). Whether I use scalar floating point or SIMD, pfft is
about twice as slow when I compile it with DMD as it is when I
compile it with GDC.