Re: Taking pipeline processing to the next level

finalpatch via Digitalmars-d Wed, 07 Sep 2016 03:36:07 -0700

On Wednesday, 7 September 2016 at 02:09:17 UTC, Manu wrote:

The lesson I learned from this is that you need the user code to provide a lot of extra information about the algorithm at compile time for the templates to work out a way to fuse pipeline stages together efficiently.
I believe it is possible to get something similar in D because D has more powerful templates than C++ and D also has some type introspection which C++ lacks. Unfortunately I'm not as good at D so I can only provide some ideas rather than actual working code.
Once this problem is solved, the benefit is huge. It allowed me to perform high level optimizations (streaming load/save, prefetching, dynamic dispatching depending on data alignment etc.) in the main loop which automatically benefits all kernels and pipelines.
Exactly!


I think the problem here is twofold.

First question: how do we combine pipeline stages with minimal overhead?


I think the key to this problem is reliable *forceinline*.

For example, take a pipeline like this:

input.map!(x=>x.f1().f2().f3().store(output));

If we could make sure f1(), f2(), f3(), store(), and map() itself are all inlined, then we end up with a single loop with no function calls and the compiler is free to perform cross function optimizations. This is about as good as you can get. Unfortunately at the moment I hear it's difficult to make sure D functions get inlined.
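
As a rough sketch of the inlining request (not a guarantee), D's pragma(inline, true) can be put on each stage. The names f1, f2, f3 and runPipeline below are placeholders invented for illustration, and the stage bodies are arbitrary. How strictly the pragma is honoured depends on the compiler and version, and it does not cover the lambda or map() itself, which is exactly the unreliable part.

```d
import std.algorithm.iteration : map;
import std.array : array;

// Placeholder stages (hypothetical). The pragma asks the compiler to
// inline each call; enforcement varies between compilers and versions.
pragma(inline, true) int f1(int x) { return x + 1; }
pragma(inline, true) int f2(int x) { return x * 2; }
pragma(inline, true) int f3(int x) { return x - 3; }

// Hypothetical wrapper: runs the three stages over every element.
// If the stages, the lambda and map() are all inlined, this becomes
// a single loop with no function calls.
int[] runPipeline(const(int)[] input)
{
    return input.map!(x => x.f1().f2().f3()).array;
}
```

Calling runPipeline([1, 2, 3]) applies the three stages to each element in turn. Whether the result really is one fused loop can only be confirmed by inspecting the generated code.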

Second question: how do we combine SIMD pipeline stages with minimal overhead?

Besides reliable inlining, we also need some template code to repeat stages until their strides match. This requires details about each stage's logical unit size, input/output type and size at compile time. I can't think of what the interface of this would look like but the current map!() is likely insufficient to support this.
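
One possible shape for this, purely as a sketch under assumptions: each stage is a type that exposes its logical unit size as a compile-time constant, and a fusing template repeats each stage until both have covered the same stride (the least common multiple of the two unit sizes). Everything here (Scale4, Offset8, fuse, the unit member, the run signature) is hypothetical and is not an existing Phobos interface. The stage bodies use plain array operations rather than real SIMD intrinsics.

```d
// Hypothetical stage: handles 4 floats per call.
struct Scale4
{
    enum size_t unit = 4;
    static void run(const(float)[] src, float[] dst) { dst[] = src[] * 2.0f; }
}

// Hypothetical stage: handles 8 floats per call.
struct Offset8
{
    enum size_t unit = 8;
    static void run(const(float)[] src, float[] dst) { dst[] = src[] + 1.0f; }
}

// Small helpers, usable at compile time.
size_t gcd(size_t a, size_t b)
{
    while (b) { auto t = a % b; a = b; b = t; }
    return a;
}
size_t lcm(size_t a, size_t b) { return a / gcd(a, b) * b; }

// Fuses two stages into one loop. Each stage is repeated until both
// have processed `stride` elements, so their strides match.
void fuse(A, B)(const(float)[] input, float[] output)
{
    enum size_t stride = lcm(A.unit, B.unit);
    assert(input.length == output.length);
    assert(input.length % stride == 0); // tail handling is left out

    float[stride] tmp; // intermediate results for one stride
    for (size_t i = 0; i < input.length; i += stride)
    {
        // Unrolled at compile time: stride / A.unit calls of stage A.
        static foreach (k; 0 .. stride / A.unit)
            A.run(input[i + k * A.unit .. i + (k + 1) * A.unit],
                  tmp[k * A.unit .. (k + 1) * A.unit]);
        // Then stride / B.unit calls of stage B on the intermediate data.
        static foreach (k; 0 .. stride / B.unit)
            B.run(tmp[k * B.unit .. (k + 1) * B.unit],
                  output[i + k * B.unit .. i + (k + 1) * B.unit]);
    }
}
```

With these two stages, fuse!(Scale4, Offset8) runs Scale4 twice and Offset8 once per iteration. A real interface would also need to carry the input/output element types and handle leftover elements, which this sketch ignores.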

I still don't believe auto-selecting between scalar and vector paths would be a very useful feature. Normally I would only consider a SIMD solution when I know in advance that this is a performance hotspot. When the amount of data is small I simply don't care about performance and would just choose whatever is the simplest way to do it, like map!(), because the performance impact is not noticeable and definitely not worth the increased complexity.
