Emil Madsen:

>>uses SSE registers too if your CPU (detected at runtime) supports them.

>How is this done? - using codepaths after a call to cpuid?

Your DMD distribution includes the compiler, druntime and Phobos source code too, 
so take a peek there. This source file shows how it's done:

http://www.dsource.org/projects/druntime/browser/trunk/src/rt/arraydouble.d
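Yes, the idea is to pick a code path from the CPUID feature flags. Below is a minimal sketch of one common way to do that, assuming D2's druntime, whose core.cpuid module exposes flags such as sse2. The names addScalar, addSSE2 and addImpl are made up for illustration. addSSE2 uses an array operation only as a stand-in where a real version would hold the hand-written SSE2 asm loop:

```d
import core.cpuid : sse2;

// Plain scalar loop: works on any x86 CPU.
void addScalar(double[] dst, const(double)[] a, const(double)[] b)
{
    foreach (i; 0 .. dst.length)
        dst[i] = a[i] + b[i];
}

// Hypothetical "SSE2" path. This stand-in uses a D array operation;
// a real one would contain the hand-written SSE2 asm loop.
void addSSE2(double[] dst, const(double)[] a, const(double)[] b)
{
    dst[] = a[] + b[];
}

// Which implementation to call, chosen once at program start-up.
__gshared void function(double[], const(double)[], const(double)[]) addImpl;

shared static this()
{
    // core.cpuid has already queried CPUID; just read the flag.
    addImpl = sse2 ? &addSSE2 : &addScalar;
}
```

Callers then always go through addImpl(dst, a, b), paying one indirect call instead of re-testing the flag every time. Testing the flag inside the function on each call is another option.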


> what if I want to for instance shuffle? - would it
> be possible to overload >> for that, or something? and how would it shuffle?
> 4 elements or the entire thing? - Say I want to shuffle elements once to the
> right like this:
> a b c d

At the moment I think you have to write a little function that performs the 
shuffle (and if it contains asm, DMD will not inline it). A similar solution is 
to use a little shuffling struct that uses opDispatch to give a nice shuffling 
syntax.
You may also use a string mixin if your asm code must be inlined, but that is 
clumsy.
I think there is currently no really good way to do what you need. I think 
Don or someone else will need to invent something good enough for efficient 
shuffling :-)
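A minimal sketch of the opDispatch shuffling struct, assuming D2. Vec4, isSwizzle and laneIndex are hypothetical names. I read "shuffle once to the right" as [a, b, c, d] -> [d, a, b, c], which with lanes named x y z w is v.wxyz. This plain-D version uses no SSE at all and only shows the syntax. A real one would put shufps asm behind the same interface, and DMD would then refuse to inline it:

```d
// Maps a lane letter to its index: x=0, y=1, z=2, w=3.
size_t laneIndex(char c) pure nothrow @safe
{
    switch (c)
    {
        case 'x': return 0;
        case 'y': return 1;
        case 'z': return 2;
        default:  return 3; // 'w' (the constraint below rejects anything else)
    }
}

// True if every character of s names a lane; evaluated at compile time.
bool isSwizzle(string s) pure nothrow @safe
{
    foreach (c; s)
        if (c != 'x' && c != 'y' && c != 'z' && c != 'w')
            return false;
    return true;
}

struct Vec4
{
    float[4] e;

    // v.wxyz, v.yzwx, ...: builds a new Vec4 whose lanes are picked
    // by the letters of the member name.
    Vec4 opDispatch(string s)() const
        if (s.length == 4 && isSwizzle(s))
    {
        Vec4 r;
        foreach (i, c; s)
            r.e[i] = e[laneIndex(c)];
        return r;
    }
}
```

With v = Vec4([1.0f, 2.0f, 3.0f, 4.0f]), v.wxyz gives lanes [4, 1, 2, 3], and a misspelled name such as v.abcd is rejected at compile time by the template constraint.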


> implementing
> "xmmintrin.h" using bits of small inline asm? - that however wouldn't yield
> any speed, if its not getting inlined?

In DMD, functions that contain asm don't get inlined, so those small snippets 
become nearly useless if your goal is maximum performance: the call overhead 
can eat up what the few SSE instructions save.

The LDC (D1) compiler is more practical here and has two different ways to do 
what you need. The first is pragma(allow_inline):
http://www.dsource.org/projects/ldc/wiki/Docs#allow_inline
The second is inline assembly expressions:
http://www.dsource.org/projects/ldc/wiki/InlineAsmExpressions

In DMD you probably have to build your asm code as a string at compile time and 
then mix it into the normal code. This is neither handy nor clean, but it may 
work.
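A sketch of the string-mixin route, assuming D2 compile-time function evaluation. shuffleCode and rotateRight are hypothetical names. To keep it easy to check, the generator emits plain D assignments. The same trick can emit the text of an asm { ... } block instead. Since the generated code is pasted straight into the caller, no function call is left for the inliner to refuse:

```d
// Builds D source that copies the lanes of `src` into `dst` in the
// order given by `order` ([3, 0, 1, 2] rotates right by one lane).
// Runs at compile time when used inside mixin(); indices must be 0..9.
string shuffleCode(string dst, string src, int[4] order)
{
    string code;
    foreach (i, idx; order)
        code ~= dst ~ "[" ~ cast(char)('0' + i) ~ "] = "
              ~ src ~ "[" ~ cast(char)('0' + idx) ~ "];\n";
    return code;
}

// Wrapper used only so the result can be tested. In real code the
// mixin line goes directly where the shuffle is needed.
float[4] rotateRight(float[4] v)
{
    float[4] r;
    mixin(shuffleCode("r", "v", [3, 0, 1, 2]));
    return r;
}
```

For [3, 0, 1, 2] the generated text is four statements, r[0] = v[3]; through r[3] = v[2];, so rotateRight([1, 2, 3, 4]) returns [4, 1, 2, 3].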

Bye,
bearophile
