On Wednesday, 7 September 2016 at 10:31:13 UTC, finalpatch wrote:
> I think the problem here is twofold.
>
> First question: how do we combine pipeline stages with minimal
> overhead?
>
> I think the key to this problem is reliable *forceinline*.
> For example, take a pipeline like this:
>
>     input.map!(x=>x.f1().f2().f3().store(output));
>
> If we can make sure f1(), f2(), f3(), store(), and map()
> itself are all inlined, then we end up with a single loop with
> no function calls, and the compiler is free to perform cross
> function optimizations. This is about as good as you can get.
> Unfortunately, at the moment I hear it's difficult to make sure
> D functions get inlined.
If the compiler is unable to inline (or wrongly decides it is too
costly), I'd consider that a compiler bug. Workarounds like
`pragma(inline, true)` or `@forceinline` may still be needed in
practice, but they shouldn't influence the design of the pipeline
interface.
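For what it's worth, `pragma(inline, true)` already gives a hard guarantee with DMD: if the function cannot be inlined, the compiler issues an error rather than silently emitting a call. A minimal sketch (the stage names f1/f2 are illustrative, not from the thread):

```d
import std.algorithm.iteration : map;
import std.array : array;
import std.range : iota;

// Illustrative pipeline stages. pragma(inline, true) makes DMD
// emit an error if it cannot inline the function, instead of
// silently falling back to a call.
pragma(inline, true)
int f1(int x) { return x + 1; }

pragma(inline, true)
int f2(int x) { return x * 3; }

void main()
{
    // map!() is a template, so once the stages inline, the whole
    // chain can collapse into a single loop with no calls.
    auto result = iota(4).map!(x => x.f1().f2()).array;
    assert(result == [3, 6, 9, 12]);
}
```

So the "reliable forceinline" part is arguably already expressible today; the open question is whether the optimizer then performs the cross-function optimizations one hopes for.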
> Second question: how do we combine SIMD pipeline stages with
> minimal overhead?
>
> Besides reliable inlining, we also need some template code to
> repeat stages until their strides match. This requires
> compile-time knowledge of each stage's logical unit size and
> its input/output type and size. I can't think of what the
> interface for this would look like, but the current map!() is
> likely insufficient to support it.
Would a `vectorize` range adapter be feasible that prepares the
input to make it SIMD-compatible? That is, one that forces
alignment, processes the left-over elements at the end, and so on.
As far as I understand, the problems with auto-vectorization stem
from compilers having difficulty recognizing vectorization
opportunities, and (as Manu described) from incompatible semantics
of scalar and vector types that the compiler needs to preserve.
But if that hypothetical `vectorize` helper forced the input data
into one of a handful of well-known formats and types, wouldn't it
be possible to make the compilers recognize those (especially if
they were accompanied by suitable pragmas or other compiler
hints)?
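No such adapter exists in Phobos, but as a rough illustration of the bulk-plus-tail structure it would have to generate, here is a hand-written sketch using `core.simd`'s `float4`, `loadUnaligned`, and `storeUnaligned` (the function name `scaleAll` is made up for this example):

```d
import core.simd;

// Rough sketch of the code a `vectorize` adapter would need to
// generate: a SIMD bulk loop plus a scalar loop for the leftover
// tail elements.
void scaleAll(float[] data, float factor)
{
    size_t i = 0;
    static if (is(float4))
    {
        float4 vfactor = factor;  // broadcast the scalar
        // Bulk: four floats per iteration; unaligned load/store
        // so the caller need not guarantee 16-byte alignment.
        for (; i + 4 <= data.length; i += 4)
        {
            float4 v = loadUnaligned(cast(const(float4)*)(data.ptr + i));
            v *= vfactor;
            storeUnaligned(cast(float4*)(data.ptr + i), v);
        }
    }
    // Tail (or the whole array on targets without float4).
    for (; i < data.length; ++i)
        data[i] *= factor;
}
```

The point of a generic `vectorize` would be to produce exactly this shape from the scalar lambda, so the user never writes the tail handling by hand.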
> I still don't believe auto-selecting between scalar and vector
> paths would be a very useful feature. Normally I would only
> consider a SIMD solution when I know in advance that this is a
> performance hotspot. When the amount of data is small I simply
> don't care about performance and would just choose the simplest
> way to do it, like map!(), because the performance impact is not
> noticeable and definitely not worth the increased complexity.
In that scenario, you could add `.vectorize` to the pipeline to
enable vectorization exactly where you need it.
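Concretely, the opt-in pipeline might read like this (again, `vectorize` is an imagined adapter, not an existing Phobos symbol):

```d
// Hypothetical usage sketch; `vectorize` does not exist today.
input.vectorize            // align data, pick SIMD width, keep a scalar tail
     .map!(x => x.f1().f2().f3())
     .copy(output);
```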