On 02/04/2011 11:46 PM, Andrei Alexandrescu wrote:
I've had the opportunity today to put some solid hours of thinking into
the relationship (better said the relatedness) of what would be called
"buffered streams" and ranges. They have some commonalities and some
differences, but it's been difficult to capture them. I think now I have
a clear view, caused by a few recent discussions. One was the CSV reader
discussed on the Phobos list; another was the discussion on defining the
"right" std.xml.
First, let's start with the humblest abstraction of all - an input
range, which only defines the troika empty/front/popFront with the known
semantics.
An input range consumes input destructively and has a one-element
horizon. It may as well considered a buffered stream with the buffer
length exactly one.
Going from there, we may say that certain streaming can be done by using
an input range of ubyte (or dchar for text). That would be the
UTFpowered equivalent of getchar(). The readf function operates that way
- it only needs to look one character ahead. Incidentally, the CSV
format also requires lookahead of 1, so it also can operate on a range
of dchar.
At this point we need to ask ourselves an essential question. Since we
have this "input range" abstraction for a 1-element buffer, what would
its n-elements buffer representation look like? How do we go from "input
range of T" (which really is "unbuffered input range of T" to "buffered
input range of T"?
Honestly, the answer was extremely unclear to me for the longest time. I
thought that such a range would be an extension of the unbuffered one,
e.g. a range that still offers T from front() but also offers some
additional functions - e.g. a lookahead in the form of a random-access
operator. I still think something can be defined along those lines, but
today I came together with a design that is considerably simpler both
for the client and the designer of the range.
I hereby suggest we define "buffered input range of T" any range R that
satisfies the following conditions:
1. R is an input range of T[]
2. R defines a primitive shiftFront(size_t n). The semantics of the
primitive is that, if r.front.length >= n, then shiftFront(n) discards
the first n elements in r.front. Subsequently r.front will return a
slice of the remaining elements.
Does shiftFront literally move element n to index 0 and so on? It seems
to me that if you do, its going to have horrid performance, and if you
don't, then you will eventually run into situations where appendToFront
will require a wrap around, which loses you your contiguity, or a
reallocation of the buffer.