On Saturday, 22 June 2013 at 15:41:43 UTC, Benjamin Thaut wrote:
On 22.06.2013 15:53, jerro wrote:
In its current state you don't want to be using SIMD with dmd, because the generated assembly will be significantly slower than if you just use the default FPU math.

That may be true for some kinds of code, but it isn't true in general. For example, see this comparison of pfft's performance when built with 64-bit DMD with and without SIMD:

http://i.imgur.com/kYYI9R9.png

This benchmark was run on a Core i5-2500K on 64-bit Debian Wheezy.

OK, I saw that you wrote quite a few critical functions in inline assembly. That's not really a good argument for DMD's codegen with SIMD intrinsics.

Kind Regards
Benjamin Thaut

I actually ran that benchmark with the code from this branch:

https://github.com/jerro/pfft/tree/experimental

The only function in sse_float.d on that branch that uses inline assembly is scalar_to_vector. I used more inline assembly in the master branch because, at the time, DMD didn't have intrinsics for some instructions, such as shufps.

I'm not really arguing for DMD's codegen with SIMD intrinsics. My point is that, from what I've seen, it doesn't produce very good scalar floating point code either (at least compared to LDC or GDC). Whether I use scalar floating point or SIMD, pfft is about twice as slow when compiled with DMD as when compiled with GDC.
