On 9 October 2016 at 18:25, Nicholas Wilson via Digitalmars-d
<digitalmars-d@puremagic.com> wrote:
> On Sunday, 9 October 2016 at 05:21:32 UTC, Manu wrote:
>>
>> On 9 October 2016 at 14:03, Nicholas Wilson via Digitalmars-d
>> <digitalmars-d@puremagic.com> wrote:
>>>
>>> How far would `r.inBatchesOf!(N)` go in terms of compiler optimisations
>>> (e.g. vectorisation) if N is a power of 2?
>>>
>>> auto inBatchesOf(size_t N,R)(R r) if(N!=0 &&isInputRange!R && hasLength!R)
>>> {
>>>     struct InBatchesOfN
>>>     {
>>>         R r;
>>>         ElementType!(R)[N] batch;
>>>         this(R _r)
>>>         {
>>>             assert(_r.length % N ==0);// could have overloads where undefined elements == ElementType!(R).init
>>>             r = _r;
>>>             foreach( i; 0..N)
>>>             {
>>>                 batch[i] = r.front;
>>>                 r.popFront;
>>>             }
>>>         }
>>>
>>>         bool empty() { return r.empty; }
>>>         auto front() { return batch; }
>>>         void popFront()
>>>         {
>>>             foreach( i; 0..N)
>>>             {
>>>                 batch[i] = r.front;
>>>                 r.popFront;
>>>             }
>>>         }
>>>     }
>>>
>>>     return InBatchesOfN(r);
>>> }
>>
>>
>> Well the trouble is the lambda that you might give to 'map' won't work
>> anymore. Operators don't work on batches, you need to use a completely
>> different API, and I think that's unfortunate.
>
>
> How? All you need is an extra `each` e.g.
> r.inBatchesOf!(8).each!(a => a[].map!(convertColor!RGBA8))
>
> perhaps define a helper function for it that does each + the explicit
> slice + map, but it certainly doesn't scream completely different API to me.
As you demonstrate, convertColor doesn't accept RGBA8[16]; it accepts a single RGBA8. There's no way the optimiser will be able to magic-up an efficient inline of convertColor that works on 16 elements at a time, but I could easily write a super-fast version by hand.

My point about the separate API is that any function that works on a single element would need a complement of functions that work on 'n' elements, where 'n' is some context-specific number of elements that suits that particular workload. Now, that's conceivable, and it's even possible to make the magic meta that calls these functions work out whether there is a batch overload and call it if it can, but we're miles away from std.algorithm and common ranges now.

The other issue is that every such efficient batch version would need to be hand-written, and that sucks because there are too many permutations.
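For illustration, a minimal D sketch of that "magic meta", under stated assumptions: `RGBA8` here is a stand-in struct, the hypothetical `toGray` plays the role of `convertColor`, and `Gray8`, `mapBatched` and `batchCalls` are likewise hypothetical names, not Phobos API. The helper asks via `__traits(compiles, ...)` whether `fun` has an overload taking a `T[N]` static array; if so, full batches go through it and only the leftover tail uses the per-element overload. The "batch" overload is just a loop standing in for a hand-written SIMD version.

```d
import std.stdio : writeln;

// Hypothetical pixel types, for illustration only.
struct RGBA8 { ubyte r, g, b, a; }
struct Gray8 { ubyte v; }

// Counts how often the batch overload runs, so the dispatch is observable.
size_t batchCalls;

// Hypothetical per-element conversion: average of the colour channels.
Gray8 toGray(RGBA8 c)
{
    return Gray8(cast(ubyte)((c.r + c.g + c.b) / 3));
}

// Hypothetical hand-written batch overload for 4 pixels. A real one would
// use SIMD; this loop only stands in for it and gives the same results.
Gray8[4] toGray(const RGBA8[4] c)
{
    ++batchCalls;
    Gray8[4] result;
    foreach (i; 0 .. 4)
        result[i] = toGray(c[i]);
    return result;
}

// Hypothetical helper: applies `fun` to every element of `input`. If `fun`
// has an overload taking T[N] and returning Out[N], full batches go through
// it; the leftover tail (or everything, if there is no such overload) goes
// through the per-element overload.
auto mapBatched(alias fun, size_t N, T)(const(T)[] input)
{
    alias Out = typeof(fun(input[0]));
    auto output = new Out[input.length];
    size_t i = 0;

    static if (__traits(compiles, { T[N] b; Out[N] r = fun(b); }))
    {
        for (; i + N <= input.length; i += N)
        {
            T[N] batch;
            batch[] = input[i .. i + N];   // copy one full batch
            Out[N] converted = fun(batch); // batch overload
            output[i .. i + N] = converted[];
        }
    }

    foreach (j; i .. input.length)         // remainder, per element
        output[j] = fun(input[j]);
    return output;
}

void main()
{
    batchCalls = 0;
    auto pixels = [RGBA8(30, 60, 90, 255), RGBA8(0, 0, 0, 255),
                   RGBA8(255, 255, 255, 255), RGBA8(10, 20, 30, 255),
                   RGBA8(3, 3, 3, 255), RGBA8(9, 0, 0, 255)];
    auto gray = mapBatched!(toGray, 4)(pixels);
    writeln(gray.length, " pixels, ", batchCalls, " batch call(s)");
}
```

With six pixels and N = 4, one batch call covers the first four and the last two go through the single-pixel `toGray`; passing a plain lambda with no batch overload silently takes the per-element path. Note this only addresses the dispatch: each batch overload still has to be written by hand per function and per N, which is exactly the permutation problem above.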