On 9 October 2012 20:46, jerro <a...@a.com> wrote:

> On Tuesday, 9 October 2012 at 16:59:58 UTC, Jacob Carlborg wrote:
>
>> On 2012-10-09 16:52, Simen Kjaeraas wrote:
>>
>>> Nope, like:
>>>
>>> module std.simd;
>>>
>>> version(Linux64) {
>>>     public import std.internal.simd_linux64;
>>> }
>>>
>>> Then all std.internal.simd_* modules have the same public interface, and
>>> only the version that fits /your/ platform will be included.
>>
>> Exactly, what he said.
>
> I'm guessing the platform in this case would be the CPU architecture,
> since that determines which SIMD instructions are available, not the OS.
> But anyway, this does not address the problem Manu was talking about. The
> problem is that the API for the intrinsics for the same architecture is
> not consistent across compilers. So for example, if you wanted to generate
> the instruction "shufps XMM1, XMM2, 0x88" (this extracts all even elements
> from two vectors), you would need to write:
>
> version(GNU)
> {
>     return __builtin_ia32_shufps(a, b, 0x88);
> }
> else version(LDC)
> {
>     return shufflevector(a, b, 0, 2, 4, 6);
> }
> else version(DMD)
> {
>     // can't do that in DMD yet, but the way to do it will probably be
>     // different from the way it is done in LDC and GDC
> }
>
> What Manu meant by having std.simd.sse and std.simd.neon was to have
> modules that would provide access to the platform-dependent instructions
> in a way that is portable across compilers. So for the shufps instruction
> above you would have something like this in std.simd.sse:
>
> float4 shufps(int i0, int i1, int i2, int i3)(float4 a, float4 b){ ... }
>
> std.simd currently takes care of cases where the code can be written in a
> cross-platform way. But when you need to use platform-specific
> instructions directly, std.simd doesn't currently help you, while
> std.simd.sse, std.simd.neon and others would. What Manu is worried about
> is that having instructions wrapped in another level of functions would
> hurt performance. It certainly would slow things down in debug builds
> (and IIRC he has written in his previous posts that he does care about
> that). I don't think it would make much of a difference when compiled
> with optimizations turned on, at least not with LDC and GDC.
Perfect! You saved me writing anything at all ;)

I do indeed care about debug builds, but one interesting possibility that I discussed with Walter last week was a #pragma inline statement, which may force-enable inlining even in debug builds. I'm not sure how that would translate to GDC/LDC, and that's an important consideration. I'd also like to prove that the code-gen does work well with 2 or 3 levels of inlining, and that the optimiser is still able to perform sensible code reordering in the target context.
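For illustration, here is a minimal sketch of the kind of compiler-portable wrapper discussed above, modelling only its semantics. Several assumptions apply. `float[4]` stands in for `core.simd`'s `float4` so the sketch runs on any target. The lane convention (i0 and i1 select from `a`, i2 and i3 select from `b`, each in 0..3) mirrors the SSE shufps instruction, but the post does not specify it. `shufpsImm` is a hypothetical helper. The per-compiler `version` branches are left as comments rather than guessed at.

```d
// Hypothetical sketch of a std.simd.sse-style wrapper. float[4] stands in
// for core.simd's float4 so the example runs on any target; a real module
// would take and return vector types and call a compiler intrinsic.
import std.stdio;

/// Packs four 2-bit lane selectors into the shufps immediate byte.
/// (Hypothetical helper, not part of Phobos.)
ubyte shufpsImm(int i0, int i1, int i2, int i3)
{
    return cast(ubyte)(i0 | (i1 << 2) | (i2 << 4) | (i3 << 6));
}

/// Portable model of shufps: lanes 0-1 of the result come from `a`,
/// lanes 2-3 from `b`. The selectors are template parameters, so they are
/// compile-time constants, as the instruction's immediate operand must be.
float[4] shufps(int i0, int i1, int i2, int i3)(float[4] a, float[4] b)
{
    static assert(i0 >= 0 && i0 < 4 && i1 >= 0 && i1 < 4
               && i2 >= 0 && i2 < 4 && i3 >= 0 && i3 < 4,
               "shufps lane selectors must be in 0..3");

    // A real implementation would branch here with version(GNU),
    // version(LDC) and version(DMD), forwarding shufpsImm(i0, i1, i2, i3)
    // or the equivalent shuffle indices to each compiler's intrinsic.
    // This scalar path only models the result.
    float[4] r = [a[i0], a[i1], b[i2], b[i3]];
    return r;
}

void main()
{
    float[4] a = [0, 1, 2, 3];
    float[4] b = [4, 5, 6, 7];

    // Extract the even elements of both vectors, as in the 0x88 example.
    auto even = shufps!(0, 2, 0, 2)(a, b);
    writeln(even);                            // [0, 2, 4, 6]
    writefln("0x%X", shufpsImm(0, 2, 0, 2));  // 0x88
}
```

Making the lane selectors template parameters lets every backend receive a constant immediate. Whether the extra call layer costs anything is the open question raised in the thread, particularly in debug builds where it would not be inlined.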