On Wednesday, 7 September 2016 at 10:31:13 UTC, finalpatch wrote:
> I think the problem here is twofold.
>
> First question: how do we combine pipeline stages with minimal
> overhead?
>
> I think the key to this problem is reliable *forceinline*.
> For example, take a pipeline like this:
>
>     input.map!(x=>x.f1().f2().f3().store(output));
>
> If we can make sure f1(), f2(), f3(), store(), and map()
> itself are all inlined, then we end up with a single loop with
> no function calls, and the compiler is free to perform cross
> function optimizations. This is about as good as you can get.
> Unfortunately, at the moment I hear it's difficult to make sure
> D functions get inlined.
If the compiler is unable to inline (or wrongly decides it is too
costly), I'd consider that a compiler bug. Workarounds like
`pragma(inline, true)` or `@forceinline` may still be needed in
practice, but they shouldn't influence the design of the pipeline
interface.
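For what it's worth, `pragma(inline, true)` already gives a hard guarantee with DMD: if the function cannot be inlined, the compiler issues an error rather than silently emitting a call. A minimal sketch (the stage names f1/f2 are illustrative, not from the thread):

```d
import std.algorithm.iteration : map;
import std.array : array;
import std.range : iota;

// Illustrative pipeline stages. pragma(inline, true) makes DMD
// emit an error if it cannot inline the function, instead of
// silently falling back to a call.
pragma(inline, true)
int f1(int x) { return x + 1; }

pragma(inline, true)
int f2(int x) { return x * 3; }

void main()
{
    // map!() is a template, so once the stages inline, the whole
    // chain can collapse into a single loop with no calls.
    auto result = iota(4).map!(x => x.f1().f2()).array;
    assert(result == [3, 6, 9, 12]);
}
```

So the "reliable forceinline" part is arguably already expressible today; the open question is whether the optimizer then performs the cross-function optimizations one hopes for.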
> Second question: how do we combine SIMD pipeline stages with
> minimal overhead?
>
> Besides reliable inlining, we also need some template code to
> repeat stages until their strides match. This requires
> compile-time knowledge of each stage's logical unit size and
> its input/output type and size. I can't think of what the
> interface for this would look like, but the current map!() is
> likely insufficient to support it.
Would a `vectorize` range adapter be feasible that prepares the
input to make it SIMD-compatible? That is, one that forces
alignment, processes the left-over elements at the end, and so on.
As far as I understand, the problems with auto-vectorization stem
from compilers having difficulty recognizing vectorization
opportunities, and (as Manu described) from incompatible semantics
of scalar and vector types that the compiler needs to preserve.
But if that hypothetical `vectorize` helper forced the input data
into one of a handful of well-known formats and types, wouldn't it
be possible to make the compilers recognize those (especially if
they were accompanied by suitable pragmas or other compiler
hints)?
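No such adapter exists in Phobos, but as a rough illustration of the bulk-plus-tail structure it would have to generate, here is a hand-written sketch using `core.simd`'s `float4`, `loadUnaligned`, and `storeUnaligned` (the function name `scaleAll` is made up for this example):

```d
import core.simd;

// Rough sketch of the code a `vectorize` adapter would need to
// generate: a SIMD bulk loop plus a scalar loop for the leftover
// tail elements.
void scaleAll(float[] data, float factor)
{
    size_t i = 0;
    static if (is(float4))
    {
        float4 vfactor = factor;  // broadcast the scalar
        // Bulk: four floats per iteration; unaligned load/store
        // so the caller need not guarantee 16-byte alignment.
        for (; i + 4 <= data.length; i += 4)
        {
            float4 v = loadUnaligned(cast(const(float4)*)(data.ptr + i));
            v *= vfactor;
            storeUnaligned(cast(float4*)(data.ptr + i), v);
        }
    }
    // Tail (or the whole array on targets without float4).
    for (; i < data.length; ++i)
        data[i] *= factor;
}
```

The point of a generic `vectorize` would be to produce exactly this shape from the scalar lambda, so the user never writes the tail handling by hand.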
> I still don't believe auto-selecting between scalar and vector
> paths would be a very useful feature. Normally I would only
> consider a SIMD solution when I know in advance that this is a
> performance hotspot. When the amount of data is small I simply
> don't care about performance and would just choose the simplest
> way to do it, like map!(), because the performance impact is not
> noticeable and definitely not worth the increased complexity.
In that scenario, you could add `.vectorize` to the pipeline to
enable vectorization exactly where you need it.
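Concretely, the opt-in pipeline might read like this (again, `vectorize` is an imagined adapter, not an existing Phobos symbol):

```d
// Hypothetical usage sketch; `vectorize` does not exist today.
input.vectorize            // align data, pick SIMD width, keep a scalar tail
     .map!(x => x.f1().f2().f3())
     .copy(output);
```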