Re: SSE in D

bearophile Sat, 02 Oct 2010 18:35:37 -0700

Emil Madsen:

You are asking many different things, let's disentangle your questions a little.


>Is there a D equivalent of the "xmmintrin.h", or any other convenient way of 
>doing SSE in D?<

The D2 language is not designed to be an academic language, it's designed to 
be a reasonably practical one (even though some of its features are not just 
buggy or unfinished, but also contain new design ideas that are far from being 
"battle tested", so no one knows if they will actually turn out to be good in 
large or very large D2 programs).

But its implementation is not fully practical yet. In a compiler like GCC you 
can see a ton of dirty or smelly little features, absent from the C standard, 
that turn out to be practically useful or even almost necessary for real-world 
code. The D2 compiler lacks a large number of such dirty utility corner cases. 
Even the (D1) compiler LDC has some of these necessary dirty little features, 
like the allow_inline pragma that allows inlining of functions that contain 
asm, and so on. I guess that when D2 is more finished, and some people write a 
more efficient implementation of D2, those little smelly things will be added 
in abundance.

The little dirty xmmintrin intrinsics are absent from DMD and D, both in 
practice and by design. GNU C is not designed much, they just add those SIMD 
operations to the ball of mud named GNU C (plus handy operator overloading if 
you want to sum or multiply two registers represented as special arrays of 
doubles or floats or ints). D here is designed in a bit more idealistic way, 
and it tries to be semantically cleaner, so instead of those intrinsics you are 
supposed to use vector operations on arrays (both static and dynamic).

Many of these operations are already implemented and more or less work, but 
unless your arrays are large they usually slow down your code, because they 
are chunks of pre-written asm (that use SSE+ registers too) designed for large 
arrays, and they are not inlined. In theory, in the future the D front-end will 
be able to replace a sum of two 4-float static arrays with a single SSE 
instruction (or little more), if you have compiled the code for SSE-enabled 
CPUs. In practice DMD is far from this point, and the development efforts are 
(rightly!) focused on finishing core features and removing the worst 
implementation (or even design) bugs. Optimization of code generation is for 
later.
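
To make the small-array case concrete, here is a minimal sketch that computes 
the same 4-float sum twice: once with a plain loop (which the compiler can 
inline) and once with the array operation (which goes through the runtime 
routine). The helper names addLoop and addArrayOp are made up for this 
example, and no timing is claimed here; measure on your own compiler.

```d
import std.stdio : writeln;

// Hypothetical helper: element-wise sum written as a plain loop.
void addLoop(ref const(float[4]) a, ref const(float[4]) b, ref float[4] c)
{
    foreach (i; 0 .. 4)
        c[i] = a[i] + b[i];
}

// Hypothetical helper: the same sum written as a D array operation.
void addArrayOp(ref const(float[4]) a, ref const(float[4]) b, ref float[4] c)
{
    c[] = a[] + b[];
}

void main()
{
    float[4] a = [1, 2, 3, 4];
    float[4] b = [10, 20, 30, 40];
    float[4] c1, c2;

    addLoop(a, b, c1);
    addArrayOp(a, b, c2);

    // Both versions must produce the same result.
    assert(c1 == c2);
    writeln(c1);
}
```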


> - I've been looking into the Array Operators, but will those work, for
> instance if I'm doing something alike:
> a[3], b[4]
> c[4] = a+b;

The right D syntax is:

float[4] a, b, c;
c[] = a[] + b[];

You must always use [] after the array name. Arrays must have the same length.
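
The same syntax works on dynamic arrays and on slices, as long as the 
destination already has the right length. A small sketch follows; the helper 
name sum is hypothetical, not a Phobos function:

```d
import std.stdio : writeln;

// Hypothetical helper: returns a new array holding the element-wise sum.
float[] sum(const(float)[] a, const(float)[] b)
{
    assert(a.length == b.length, "operands must have the same length");
    auto c = new float[a.length]; // the destination must be allocated first
    c[] = a[] + b[];
    return c;
}

void main()
{
    auto a = [1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f];
    auto b = [10.0f, 20.0f, 30.0f, 40.0f, 50.0f, 60.0f];

    auto c = sum(a, b);
    assert(c == [11.0f, 22.0f, 33.0f, 44.0f, 55.0f, 66.0f]);

    // Slices of equal length and scalar operands can be mixed in.
    c[0 .. 3] = a[0 .. 3] * 2.0f - b[3 .. 6];
    assert(c[0 .. 3] == [-38.0f, -46.0f, -54.0f]);

    writeln(c);
}
```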

And currently you can't use this syntax:

void main() {
    float[4] a, b;
    float[4] c[] = a[] + b[];
}


That gives the error:

test.d(3): Error: cannot implicitly convert expression (a[] + b[]) of type 
float[] to float[4u][]

Probably because of an unforeseen design bug: a collision between the D syntax 
and the C-style array declaration syntax that D still accepts.

See this bug report for more info about this design problem, that so far most 
people (including the main designers) seem to happily ignore:
http://d.puremagic.com/issues/show_bug.cgi?id=3971
Here I have suggested a possible solution, the introduction of a -cstyle 
compiler flag, that was ignored even more:
http://d.puremagic.com/issues/show_bug.cgi?id=4580

So this code works:

void main() {
    float[4] a, b, c;
    c[] = a[] + b[];
}

But it calls a pre-written asm routine that computes the vector sum c=a+b, 
and that routine uses SSE registers too if your CPU (detected at runtime) 
supports them.
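
If you want to see what such a runtime check reports on your machine, a 
minimal sketch is below. It assumes your druntime ships the core.cpuid module 
with its sse and sse2 properties; I am not claiming this is the exact check 
the array routines perform. The helper name describeSse is made up.

```d
import core.cpuid : sse, sse2;
import std.stdio : writeln;

// Hypothetical helper: summarizes the SSE level detected at runtime.
string describeSse()
{
    if (sse2)
        return "SSE2";
    if (sse)
        return "SSE";
    return "none";
}

void main()
{
    writeln("Detected SSE level: ", describeSse());
}
```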


> and when will the compiler write SSE asm for the array operators?

DMD currently never writes SSE asm, unless you write those instructions 
yourself in inline asm code. The 64 bit DMD will probably be able to use those 
registers too, but I have no idea whether the 32 bit DMD will then use them as 
well. I hope so, but I have little hope. I'd like to know this.
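
For reference, this is roughly what SSE through inline asm looks like. It is 
only a sketch: the function name add4 is made up, it assumes a compiler that 
accepts DMD-style inline asm (the version identifiers select the 32 or 64 bit 
block), and it falls back to the array operation elsewhere. movups is used so 
the arrays don't need 16-byte alignment.

```d
import std.stdio : writeln;

// Hypothetical helper: c[0..4] = a[0..4] + b[0..4] using one SSE addps.
void add4(const(float)* a, const(float)* b, float* c)
{
    version (D_InlineAsm_X86)
    {
        asm
        {
            mov EAX, a;          // load the three pointers
            mov ECX, b;
            mov EDX, c;
            movups XMM0, [EAX];  // unaligned load of 4 floats
            movups XMM1, [ECX];
            addps XMM0, XMM1;    // 4 additions in one instruction
            movups [EDX], XMM0;  // unaligned store of the result
        }
    }
    else version (D_InlineAsm_X86_64)
    {
        asm
        {
            mov RAX, a;
            mov RCX, b;
            mov RDX, c;
            movups XMM0, [RAX];
            movups XMM1, [RCX];
            addps XMM0, XMM1;
            movups [RDX], XMM0;
        }
    }
    else
    {
        // No DMD-style inline asm: use the array operation instead.
        c[0 .. 4] = a[0 .. 4] + b[0 .. 4];
    }
}

void main()
{
    float[4] a = [1, 2, 3, 4];
    float[4] b = [10, 20, 30, 40];
    float[4] c;

    add4(a.ptr, b.ptr, c.ptr);
    assert(c == [11.0f, 22.0f, 33.0f, 44.0f]);
    writeln(c);
}
```

Note that a function containing asm is generally not inlined by DMD, so for 
a single 4-float sum the call overhead can easily eat the gain.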

D1 LDC now uses SSE registers for most of its floating point operations because 
LLVM is very bad at using the X86 floating point stack.

Low-level D code written for D1 LDC is usually about as efficient as C code 
written for GCC. This is a very good thing. But recently the development of LDC 
has slowed down a lot: there is no D2 version of it, and it's not updated to 
the latest versions of LLVM. There's also no Windows support, because LLVM devs 
are paid by Apple and they don't care to make LLVM work fully (== with 
exceptions too) on Windows, they just need to give people the illusion that 
LLVM is multi-platform. I used to help LLVM development, but I have stopped 
until they add good support for exceptions on Windows.

There is a GCC-based D compiler too, named GDC, and I think it works, but I 
have never appreciated it much on Windows. Other people may give you 
more/better info on it.


> - is there a target=architecture for the compiler? or will it simply write
> SSE if one defines something alike -msse4? -

LDC D1 allows you to specify the target a little, while I think DMD always 
targets a Pentium1.


> I'm having a bit of trouble finding stuff
> about SSE for D, sources on the subject anyone?

There is not much to search :-)

Bye,
bearophile
