On 7 September 2016 at 12:00, finalpatch via Digitalmars-d
<digitalmars-d@puremagic.com> wrote:
> On Wednesday, 7 September 2016 at 01:38:47 UTC, Manu wrote:
>> On 7 September 2016 at 11:04, finalpatch via Digitalmars-d
>> <digitalmars-d@puremagic.com> wrote:
>>> It shouldn't be hard to have the framework look at the buffer size and
>>> choose the scalar version when the number of elements is small; it wasn't
>>> done that way simply because we didn't need it.
>> No, what's hard is working this into D's pipeline patterns seamlessly.
> The lesson I learned from this is that you need the user code to provide a
> lot of extra information about the algorithm at compile time for the
> templates to work out a way to fuse pipeline stages together efficiently.
> I believe it is possible to get something similar in D because D has more
> powerful templates than C++ and D also has some type introspection which C++
> lacks.  Unfortunately I'm not as good at D so I can only provide some ideas
> rather than actual working code.
> Once this problem is solved, the benefit is huge.  It allowed me to perform
> high level optimizations (streaming load/save, prefetching, dynamic
> dispatching depending on data alignment etc.) in the main loop which
> automatically benefits all kernels and pipelines.
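
As a rough illustration of the compile-time fusion and small-buffer fallback
discussed above, here is a minimal C++ sketch.  It is not the framework
described in this thread: the names Scale, Offset, Fused, run and
kSmallBuffer are hypothetical, the threshold value is arbitrary, and the
4-wide inner loop is only a stand-in for a real SIMD path.

```cpp
#include <cstddef>

// Hypothetical pipeline stages: stateless functors applied per element.
struct Scale  { float operator()(float x) const { return x * 2.0f; } };
struct Offset { float operator()(float x) const { return x + 1.0f; } };

// Fuse stages at compile time so the whole pipeline becomes one kernel.
// The empty pipeline is the identity.
template <typename... Stages> struct Fused;

template <> struct Fused<> {
    float operator()(float x) const { return x; }
};

template <typename First, typename... Rest>
struct Fused<First, Rest...> {
    // Apply the first stage, then hand the result to the remaining stages.
    float operator()(float x) const { return Fused<Rest...>{}(First{}(x)); }
};

// Assumed threshold below which the blocked path is skipped entirely.
constexpr std::size_t kSmallBuffer = 8;

// Shared main loop: every kernel and pipeline goes through it, so any
// optimization added here benefits all of them.
template <typename Kernel>
void run(const float* in, float* out, std::size_t n) {
    Kernel k;
    std::size_t i = 0;
    if (n >= kSmallBuffer) {
        // Blocked path, 4 elements per iteration (placeholder for SIMD).
        for (; i + 4 <= n; i += 4)
            for (std::size_t j = 0; j < 4; ++j)
                out[i + j] = k(in[i + j]);
    }
    // Scalar path: small buffers and any leftover tail elements.
    for (; i < n; ++i)
        out[i] = k(in[i]);
}
```

With these definitions, run<Fused<Scale, Offset>>(in, out, n) writes
in[i] * 2 + 1 into out[i] in a single pass, with no intermediate buffer
between the two stages.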

