On Wednesday, 7 September 2016 at 10:31:13 UTC, finalpatch wrote:
I think the problem here is twofold.

First question: how do we combine pipeline stages with minimal overhead?

I think the key to this problem is reliable *forceinline*

For example, consider a pipeline like this:

input.map!(x=>x.f1().f2().f3().store(output));

If we could make sure f1(), f2(), f3(), store(), and map() itself are all inlined, we would end up with a single loop with no function calls, and the compiler would be free to perform cross-function optimizations. This is about as good as it gets. Unfortunately, I hear it is currently difficult to guarantee that D functions get inlined.


If the compiler is unable to inline (or wrongly decides it is too costly), I'd consider this a compiler bug. Of course, workarounds like `pragma(inline, true)` or `@forceinline` might be needed from time to time in practice, but they shouldn't influence the design of the pipeline interface.
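As a minimal sketch of that workaround, here is how `pragma(inline, true)` can be attached to individual stages so the whole pipeline can collapse into one loop. The stage bodies (`f1`, `f2`) are invented for illustration; only the pragma and the `map!` pattern come from the discussion above.

```d
import std.algorithm : map;
import std.array : array;

// Illustrative stage bodies; the pragma asks the compiler to inline
// each call (DMD reports an error if it cannot).
pragma(inline, true) int f1(int x) { return x + 1; }
pragma(inline, true) int f2(int x) { return x * 2; }

void main()
{
    auto input = [1, 2, 3];
    // With every stage inlined, this can lower to a single loop.
    auto result = input.map!(x => x.f1().f2()).array;
    assert(result == [4, 6, 8]);
}
```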

Second question: how do we combine SIMD pipeline stages with minimal overhead?

Besides reliable inlining, we also need some template code to repeat stages until their strides match. This requires knowing each stage's logical unit size and input/output type and size at compile time. I can't yet picture what the interface for this would look like, but the current map!() is likely insufficient to support it.
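The stride-matching idea can be sketched with plain compile-time arithmetic: repeat each stage until both advance by the least common multiple of their strides. The stage strides here are made-up numbers, purely for illustration.

```d
import std.numeric : gcd;

// Hypothetical per-stage stride info, assumed known at compile time.
enum strideA = 4;  // e.g. a float4 stage consuming 4 elements per step
enum strideB = 6;  // e.g. a stage consuming 6 elements per step

// Repeat each stage until both advance by the same number of elements.
enum lcm = strideA / gcd(strideA, strideB) * strideB;
enum repeatA = lcm / strideA;  // run stage A this many times per block
enum repeatB = lcm / strideB;  // run stage B this many times per block

static assert(lcm == 12 && repeatA == 3 && repeatB == 2);

void main() {}
```

A pipeline combinator could compute these repeat counts from each stage's advertised unit size and emit the unrolled loop body accordingly.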

Would a `vectorize` range adapter be feasible that prepares the input to make it SIMD-compatible? That is, one that forces alignment, processes leftover elements at the end, etc.? As far as I understand, the problems with auto-vectorization stem from compilers' difficulty in recognizing vectorization opportunities, and (as Manu described) from incompatible semantics of scalar and vector types that the compiler needs to preserve. But if that hypothetical `vectorize` helper forced the input data into one of a handful of well-known formats and types, wouldn't it be possible to make compilers recognize those (especially if they were accompanied by suitable pragmas or other compiler hints)?
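To make the idea concrete, here is one hand-written instance of what such a hypothetical `vectorize` helper might generate for a single stage (`x => x * 2 + 1` over a `float[]`): a scalar head until the data is aligned, a `float4` body, and a scalar tail for the leftovers. The function name and stage are invented; this assumes an x86-64 target where `core.simd`'s `float4` is available.

```d
import core.simd;

// Hand-expanded sketch of a hypothetical `vectorize` adapter applied
// to the stage `x => x * 2 + 1` on a float slice.
void vectorizeDouble(float[] data)
{
    size_t i = 0;
    // Scalar head: advance until the pointer is 16-byte aligned.
    while (i < data.length && (cast(size_t)&data[i] & 15) != 0)
    {
        data[i] = data[i] * 2 + 1;
        ++i;
    }
    // SIMD body: four floats at a time.
    float4 two = 2.0f;  // scalar broadcast into all four lanes
    float4 one = 1.0f;
    for (; i + 4 <= data.length; i += 4)
    {
        float4 v = *cast(float4*)&data[i];
        v = v * two + one;
        *cast(float4*)&data[i] = v;
    }
    // Scalar tail: leftover elements.
    for (; i < data.length; ++i)
        data[i] = data[i] * 2 + 1;
}

void main()
{
    auto data = new float[](10);
    foreach (j, ref x; data) x = j;
    vectorizeDouble(data);
    foreach (j, x; data) assert(x == j * 2 + 1);
}
```

Because the head/body/tail structure and the element type are fixed by the adapter, the compiler sees only a handful of well-known loop shapes, which is exactly what hints or pragmas could then target.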


I still don't believe auto-selecting between scalar and vector paths would be a very useful feature. Normally I would only consider a SIMD solution when I know in advance that something is a performance hotspot. When the amount of data is small, I simply don't care about performance and would just choose the simplest way to do it, like map!(), because the performance impact is not noticeable and definitely not worth the increased complexity.

In the above scenario, you can add `.vectorize` to the pipeline to enable vectorization wherever you need it.
