Re: core.simd woes

jerro Tue, 02 Oct 2012 03:50:38 -0700

On Tuesday, 2 October 2012 at 08:17:33 UTC, Manu wrote:

On 7 August 2012 16:56, jerro <a...@a.com> wrote:
That said, almost all simd opcodes are directly accessible in std.simd.
There are relatively few obscure operations that don't have a representing
function.
The unpck/shuf example above for instance, they both effectively perform a
sort of swizzle, and both are accessible through swizzle!().
They aren't. Swizzle only takes one argument, so you can't use it to select elements from two vectors. Both unpcklps and shufps take two arguments.
Writing a swizzle with two arguments would be much harder.
Any usages I've missed/haven't thought of; I'm all ears.
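
To make the one-argument versus two-argument distinction concrete, here is a minimal scalar sketch of what shufps and unpcklps compute, using float[4] in place of a real vector type. The function names are hypothetical and are not part of std.simd or core.simd; the lane selection follows the usual x86 definition of the two instructions.

```d
// Scalar model of two x86 SSE two-source shuffles.
// shufpsModel and unpcklpsModel are hypothetical names, not library API.

// shufps: the two low result lanes are picked from a, the two high lanes
// from b. Each 2-bit field of imm selects a lane within its source.
float[4] shufpsModel(int imm)(float[4] a, float[4] b)
{
    static assert(imm >= 0 && imm < 256, "imm must fit in 8 bits");
    return [a[imm & 3], a[(imm >> 2) & 3], b[(imm >> 4) & 3], b[(imm >> 6) & 3]];
}

// unpcklps: interleave the low halves of a and b.
float[4] unpcklpsModel(float[4] a, float[4] b)
{
    return [a[0], b[0], a[1], b[1]];
}
```

In both cases the result mixes lanes of a and b, which a swizzle taking a single vector cannot express.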

I don't think it is possible to think of all usages of this, but for every simd instruction there are valid usages. At least for writing pfft, I found shuffling two vectors very useful. For example, I needed a function that takes a small, square, power of two number of elements stored in vectors and bit-reverses them - it rearranges them so that you can calculate the new index of each element by reversing bits of the old index (for 16 elements using 4 element vectors this can actually be done using std.simd.transpose, but for AVX it was more efficient to make this function work on 64 elements). There are other places in pfft where I need to select elements from two vectors (for example, here https://github.com/jerro/pfft/blob/sine-transform/pfft/avx_float.d#L141 is the platform specific code for AVX).
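
For readers unfamiliar with the bit-reversal permutation described above, the following is a minimal scalar sketch on a plain array. It is not the pfft code, which works on SIMD vectors with shuffles; the names reverseBits and bitReversed are hypothetical and exist only for this illustration.

```d
// Reverse the lowest `bits` bits of index i.
size_t reverseBits(size_t i, uint bits)
{
    size_t r = 0;
    foreach (_; 0 .. bits)
    {
        r = (r << 1) | (i & 1);
        i >>= 1;
    }
    return r;
}

// Return a copy of src in bit-reversed order: the element at old index i
// ends up at the index obtained by reversing the bits of i.
// src.length must be a power of two.
float[] bitReversed(const float[] src)
{
    uint bits = 0;
    while ((cast(size_t) 1 << bits) < src.length)
        ++bits;
    assert((cast(size_t) 1 << bits) == src.length, "length must be a power of two");

    auto dst = new float[src.length];
    foreach (i, x; src)
        dst[reverseBits(i, bits)] = x;
    return dst;
}
```

With 8 elements, for example, index 1 (binary 001) moves to index 4 (binary 100), while indices 0, 2, 5 and 7 stay where they are.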

I don't think these are the kind of things that should be implemented in std.simd. If you wanted to implement all such operations (for example bit reversing a small array) that somebody may find useful at some time, std.simd would need to be huge, and most of it would never be used.

I can imagine, I'll have a go at it... it's something I considered, but not
all architectures can do it efficiently.
That said, a most-efficient implementation would probably still be useful on all architectures, but for cross platform code, I usually prefer to encourage people taking another approach rather than supply a function that
is not particularly portable (or not efficient when ported).

One way to do it would be to do the following for every set of selected indices: go through all the two element one instruction operations, and check if any of them does exactly what you need, and use it if it does. Otherwise do something that will always work although it may not always be optimal. One option would be to use swizzle on both vectors to get each of the elements to their final index and then blend the two vectors together. For sse 1, 2 and 3 you would need to use xorps to blend them, so I guess this is one more place where you would need vector literals.
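
The fallback path (swizzle both vectors, then blend) could be sketched roughly as below. This is a scalar model on float[4] with a hypothetical name, shuffle2Fallback, and an assumed index convention (0..3 select from a, 4..7 select from b). A real implementation would first check at compile time for a matching single instruction, and would do the blend with vector instructions rather than a per-lane conditional.

```d
// Scalar model of the always-works fallback for a two-vector shuffle.
// Indices 0..3 pick a lane of a, indices 4..7 pick a lane of b.
// shuffle2Fallback is a hypothetical name, not part of std.simd.
float[4] shuffle2Fallback(int i0, int i1, int i2, int i3)(float[4] a, float[4] b)
{
    static assert(i0 >= 0 && i0 < 8 && i1 >= 0 && i1 < 8 &&
                  i2 >= 0 && i2 < 8 && i3 >= 0 && i3 < 8,
                  "indices must be in 0..7");
    static immutable int[4] idx = [i0, i1, i2, i3];

    float[4] fromA, fromB, result;
    foreach (lane; 0 .. 4)
    {
        // Step 1: one-argument swizzle of each source, so that every wanted
        // element already sits in its final lane (index wrapped to 0..3).
        fromA[lane] = a[idx[lane] & 3];
        fromB[lane] = b[idx[lane] & 3];
    }
    foreach (lane; 0 .. 4)
    {
        // Step 2: blend, taking each lane from whichever source the index names.
        result[lane] = idx[lane] < 4 ? fromA[lane] : fromB[lane];
    }
    return result;
}
```

With this convention, shuffle2Fallback!(0, 4, 1, 5) reproduces the unpcklps interleave, which is the kind of pattern a real implementation would map to a single instruction instead of two swizzles and a blend.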

Someone who knows which two element shuffling operations the platform supports could still write optimal platform specific (but portable across compilers) code this way and for others this would still be useful to some degree (the documentation should mention that it may not be very efficient, though). But I think that it would be better to have platform specific APIs for platform specific code, as I said earlier in this thread.

Unfortunately I can't, at least not a clean one. Using string mixins would be one way but I think no one wants that kind of API in Druntime or Phobos.
Yeah, absolutely not.
This is possibly the most compelling motivation behind a __forceinline
mechanism that I've seen come up... ;)

 I'm already unhappy that
std.simd produces redundant function calls.

<rant> please  please please can haz __forceinline! </rant>
I agree that we need that.
Huzzah! :)


Walter opposes this, right? I wonder how we could convince him.

There's one more thing that I wanted to ask you. If I were to add LDC support to std.simd, should I just add version(LDC) blocks to all the functions? Sounds like a lot of duplicated code...
