On Wednesday, 7 September 2016 at 01:38:47 UTC, Manu wrote:
On 7 September 2016 at 11:04, finalpatch via Digitalmars-d <digitalmars-d@puremagic.com> wrote:

It shouldn't be hard to have the framework look at the buffer size and choose the scalar version when the number of elements is small; it wasn't done that way simply because we didn't need it.

No, what's hard is working this into D's pipeline patterns seamlessly.

The lesson I learned from this is that you need the user code to provide a lot of extra information about the algorithm at compile time for the templates to work out a way to fuse pipeline stages together efficiently.
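To make that concrete, here is a minimal C++ sketch of stages that advertise facts about themselves at compile time so that a template can fuse them into a single pass. It is not the framework discussed in this thread. The names `Scale`, `Offset`, `Fused`, `fuse`, `run`, `demo`, and the `has_wide` flag are all hypothetical, and the only "extra information" modelled is whether a stage has a vectorizable form.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stages. Each one exposes a compile-time fact about itself
// (has_wide: "a vectorized form exists", an assumption for illustration)
// alongside its per-element operation.
struct Scale {
    static constexpr bool has_wide = true;
    float factor;
    float apply(float x) const { return x * factor; }
};

struct Offset {
    static constexpr bool has_wide = true;
    float amount;
    float apply(float x) const { return x + amount; }
};

// Fusing two stages yields a new stage, so longer pipelines are built by
// nesting. The fused stage is wide-capable only if both halves are.
template <typename First, typename Second>
struct Fused {
    static constexpr bool has_wide = First::has_wide && Second::has_wide;
    First first;
    Second second;
    float apply(float x) const { return second.apply(first.apply(x)); }
};

template <typename First, typename Second>
Fused<First, Second> fuse(First a, Second b) {
    return Fused<First, Second>{a, b};
}

// One loop over the data: every element passes through all stages
// without any intermediate buffer between them.
template <typename Pipeline>
void run(const Pipeline& p, const float* in, float* out, std::size_t n) {
    assert(n == 0 || (in != nullptr && out != nullptr));
    for (std::size_t i = 0; i < n; ++i) out[i] = p.apply(in[i]);
}

// Usage: (x * 2 + 1) * 0.5 applied to three elements in a single pass.
inline std::vector<float> demo() {
    std::vector<float> in = {1.0f, 2.0f, 3.0f};
    std::vector<float> out(in.size());
    auto p = fuse(fuse(Scale{2.0f}, Offset{1.0f}), Scale{0.5f});
    static_assert(decltype(p)::has_wide, "every stage advertises a wide form");
    run(p, in.data(), out.data(), in.size());
    return out;
}
```

Because `has_wide` is known at compile time, a driver could select a different loop body for the whole fused pipeline without any runtime cost per stage. A real design would need to carry much more than one flag.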

I believe it is possible to get something similar in D, because D has more powerful templates than C++ and also has some type introspection, which C++ lacks. Unfortunately I'm not as proficient in D, so I can only provide some ideas rather than actual working code.

Once this problem is solved, the benefit is huge. It allowed me to perform high-level optimizations (streaming loads/stores, prefetching, dynamic dispatch depending on data alignment, etc.) in the main loop, which automatically benefit all kernels and pipelines.
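As a rough illustration of why putting this logic in the main loop pays off, the sketch below shows a shared driver that chooses a path per call, so any kernel plugged into it inherits the same policy. It is an assumption-laden stand-in, not the real framework: `Doubler`, `Path`, `aligned16`, `run`, and the 8-element threshold are invented for the example, and the "wide" form is plain C++ rather than actual SIMD.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical kernel with a scalar form and a 4-wide block form.
// A real kernel would use SIMD intrinsics in apply4.
struct Doubler {
    static float apply1(float x) { return x * 2.0f; }
    static void apply4(const float* in, float* out) {
        for (int i = 0; i < 4; ++i) out[i] = in[i] * 2.0f;
    }
};

enum class Path { Scalar, WideUnaligned, WideAligned };

inline bool aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Shared driver: the dispatch policy lives here, once, for all kernels.
// Returns the path taken so the choice can be observed.
template <typename Kernel>
Path run(const float* in, float* out, std::size_t n) {
    assert(n == 0 || (in != nullptr && out != nullptr));
    const std::size_t small_threshold = 8;  // arbitrary illustrative value

    // Small buffers: skip the wide machinery entirely.
    if (n < small_threshold) {
        for (std::size_t i = 0; i < n; ++i) out[i] = Kernel::apply1(in[i]);
        return Path::Scalar;
    }

    // A real driver would select aligned or unaligned loads and stores
    // here; this sketch only records which case it saw.
    const Path path = (aligned16(in) && aligned16(out)) ? Path::WideAligned
                                                        : Path::WideUnaligned;

    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) Kernel::apply4(in + i, out + i);  // wide body
    for (; i < n; ++i) out[i] = Kernel::apply1(in[i]);           // scalar tail
    return path;
}
```

Calling `run<Doubler>(in, out, n)` with `n` below the threshold takes the scalar path, while larger buffers go through the block loop plus a scalar tail. Prefetching or streaming stores would be added in the same place and would likewise apply to every kernel.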
