Re: protocol for using InputRanges

Andrei Alexandrescu Tue, 25 Mar 2014 14:31:27 -0700

I thought I had only 1-2 comments, but I have a few more.


On 3/25/14, 1:15 PM, Walter Bright wrote:

1. the protocol is COMPLETELY undocumented and undefined.


s/COMPLETELY/loosely/

Since ranges+algorithms are central to D, this is a very serious
problem.


Agreed.

We want to appeal to the high performance coders. To maximize
performance, ranges should be optimized to the inner loop case, which is:

     while (!r.empty) { auto e = r.front; ... e ...; r.popFront(); }


Actually generic code often prefers:

while (!r.empty) { ... r.front ...; r.popFront(); }

That saves copying r.front if it returns by ref. A bunch of algos in stddo that.

Since front is guaranteed to succeed if !empty, this puts a requirement
on many ranges that they have a non-trivial constructor that 'primes the
pump'. Of course, priming may fail, and so construction may throw, which
is not good design.


Again, I disagree with this assertion.

Next, priming requires a buffer in the range.

Priming has nothing do to with the range being buffered. The entiredesign of ranges is a one-element buffer.

Buffers add size, making them slower to copy, and will often require
another field saying if the buffer has data in it or not, further
bumping the size.


That's an argument for adding an unbuffered range abstraction.

All this saves for the user is one call to empty for the entire
algorithm, at a cost incurred with every iteration. I.e. it selects O(n)
to save O(1).


I don't understand how that O() reasoning works.

B) I want to be able to call front multiple times in a row, and expect
to get the same value.


Correct.

This can force the range to incorporate a one element buffer and a flag
indicating the status of that buffer.


It may, but many ranges naturally work that way (e.g. arrays).

The range instance gets bigger and
more expensive to copy, and the cost of manipulating the flag and the
buffer is added to every loop iteration. Note that the user of a range
can trivially use:
      auto e = r.front;
      ... using e multiple times ...
instead.


That would pessimize code using arrays of large structs.

The big problem with (A) and (B) is these costs become present in every
range, despite being rarely needed and having simple alternatives. This
is the wrong place to put cost and complexity. The user should be making
these decisions based on his needs.

Hence, I propose that the protocol for using input ranges be defined as:

     while (!r.empty) { auto e = r.front; ... e ...; r.popFront(); }

This makes it possible to build pipelines that are firehoses with no
kinks or constrictions in them. It optimizes for the inner loop case,
not boundary cases.

The proposed protocol pessimizes arrays of large structs and will tripthe unwary if calling r.front again returns something else. I don'tthink the proposal is good.



Andrei

Re: protocol for using InputRanges

Reply via email to