On 2011-02-05 00:46:40 -0500, Andrei Alexandrescu
<seewebsiteforem...@erdani.org> said:
I've had the opportunity today to put some solid hours of thinking into
the relationship (better said the relatedness) of what would be called
"buffered streams" and ranges. They have some commonalities and some
differences, but it's been difficult to capture them. I think I now
have a clear view, prompted by a few recent discussions. One was the CSV
reader discussed on the Phobos list; another was the discussion on
defining the "right" std.xml.
First, let's start with the humblest abstraction of all - an input
range, which only defines the troika empty/front/popFront with the
known semantics.
An input range consumes input destructively and has a one-element
horizon. It may as well be considered a buffered stream with a buffer
length of exactly one.
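To make that concrete, here is a minimal sketch (the name ByteRange and
the choice of std.stdio.File are mine, purely for illustration) of an
input range of ubyte whose front really is a one-element buffer:

// A sketch only, not Phobos code: an input range of ubyte over a
// std.stdio.File, where front is literally a one-byte buffer.
import std.stdio;

struct ByteRange
{
    private File file;
    private ubyte[1] buf;   // the one-element buffer
    private bool loaded;

    this(File f) { file = f; popFront(); }

    @property bool empty() { return !loaded; }
    @property ubyte front() { return buf[0]; }

    void popFront() { loaded = file.rawRead(buf[]).length == 1; }
}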
Going from there, we may say that certain streaming tasks can be done by
using an input range of ubyte (or dchar for text). That would be the
UTF-powered equivalent of getchar(). The readf function operates that
way - it only needs to look one character ahead. Incidentally, the CSV
format also requires lookahead of 1, so it also can operate on a range
of dchar.
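For illustration only, here is the kind of lookahead-of-1 code involved,
a sketch that needs nothing beyond empty/front/popFront:

// Sketch: skip leading whitespace using nothing but a lookahead of
// one (empty/front/popFront) over any input range of dchar.
void skipWhite(R)(ref R r)
{
    while (!r.empty && (r.front == ' ' || r.front == '\t'
                        || r.front == '\n' || r.front == '\r'))
        r.popFront();
}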
At this point we need to ask ourselves an essential question. Since we
have this "input range" abstraction for a 1-element buffer, what would
its n-elements buffer representation look like? How do we go from
"input range of T" (which really is "unbuffered input range of T" to
"buffered input range of T"?
Honestly, the answer was extremely unclear to me for the longest time.
I thought that such a range would be an extension of the unbuffered
one, e.g. a range that still offers T from front() but also offers some
additional functions - e.g. a lookahead in the form of a random-access
operator. I still think something can be defined along those lines, but
today I came up with a design that is considerably simpler both
for the client and the designer of the range.
I hereby suggest we define "buffered input range of T" as any range R that
satisfies the following conditions:
1. R is an input range of T[]
2. R defines a primitive shiftFront(size_t n). The semantics of this
primitive are that, if r.front.length >= n, then r.shiftFront(n) discards
the first n elements of r.front. Subsequently r.front will return a
slice of the remaining elements.
3. R defines a primitive appendToFront(size_t n). Semantics: adds at
most n more elements from the underlying stream and makes them
available in addition to whatever was in front. For example, if
r.front.length was 1024, then after the call r.appendToFront(512), r.front
will have length 1536, of which the first 1024 elements will be the old
front and the rest will be newly-read elements (assuming that the stream had
enough data). If n = 0, this instructs the stream to add any number of
elements at its own discretion.
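To make these conditions concrete, here is one possible sketch over a
std.stdio.File. The type name, the growth policy, and the default chunk
size are arbitrary choices of mine; only the interface shape is what I'm
proposing:

// Sketch only: a buffered input range of ubyte over a File. The type
// name, growth policy, and default chunk size are arbitrary; the
// point is the front/popFront/empty + shiftFront/appendToFront shape.
import std.stdio;

struct BufferedFile
{
    private File file;
    private ubyte[] buf;       // backing storage
    private size_t beg, end;   // live data is buf[beg .. end]

    this(File f, size_t chunk = 4096)
    {
        file = f;
        buf = new ubyte[chunk];
        end = file.rawRead(buf).length;
    }

    @property bool empty() { return beg == end && file.eof; }
    @property ubyte[] front() { return buf[beg .. end]; }

    // discard the whole current front and fetch a fresh buffer
    void popFront()
    {
        beg = 0;
        end = file.rawRead(buf).length;
    }

    void shiftFront(size_t n)
    {
        assert(front.length >= n);
        beg += n;
    }

    void appendToFront(size_t n)
    {
        if (n == 0) n = 4096;              // at the range's discretion
        immutable live = end - beg;
        if (beg > 0)                       // compact live data to the start
        {
            foreach (i; 0 .. live) buf[i] = buf[beg + i];
            beg = 0;
            end = live;
        }
        if (buf.length < live + n) buf.length = live + n;
        end += file.rawRead(buf[end .. end + n]).length;
    }
}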
This is it. I like many things about this design, although I still fear
some fatal flaw may be found with it.
With these primitives, a lot of good code operating on buffered
streams can be written efficiently. The range is allowed to reuse data
in its buffers (unless that would contradict language invariants, e.g.
if T is invariant), so if client code wants to stash away parts of the
input, it needs to make a copy.
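As a client-side sketch (assuming only the primitives above), here is how
one could pull the next newline-terminated line out of a buffered range
of ubyte without knowing anything about its internals:

// Sketch: pull the next '\n'-terminated line out of a buffered input
// range of ubyte, growing front on demand via appendToFront.
import std.algorithm : countUntil;

ubyte[] nextLine(R)(ref R r)
{
    for (;;)
    {
        auto i = countUntil(r.front, cast(ubyte) '\n');
        if (i >= 0)
        {
            // the slice points into the range's buffer; .dup it to keep it
            auto line = r.front[0 .. i];
            r.shiftFront(i + 1);        // consume the line and its newline
            return line;
        }
        immutable before = r.front.length;
        r.appendToFront(0);             // let the range decide how much to add
        if (r.front.length == before)   // nothing more: return what's left
        {
            auto line = r.front;
            r.shiftFront(line.length);
            return line;
        }
    }
}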
One great thing is that buffered ranges as defined above play very well
with both ranges and built-in arrays - two quintessential parts of D. I
look at this and say, "this all makes sense". For example the design
could be generalized to operate on some random-access range other than
the built-in array, but then I'm thinking, unless some advantage comes
about, why not give T[] a little special status? Probably everyone
thinks of contiguous memory when thinking "buffers", so here
generalization may be excessive (albeit meaningful).
Finally, this design is very easy to experiment with and causes no
disruption to ranges. I can readily add the primitives to byLine and
byChunk so we can see what streaming we can do that way.
What do you all think?
One thing I'm wondering is whether it'd be more efficient if we could
provide our own buffer to be filled. In cases where you want to
preserve the data, this could let you avoid double-copying: first into
the temporary buffer and then to the permanent storage location. If
you need the data only temporarily, however, providing your own buffer to
be filled might be less efficient for a range that can't avoid copying to
its own temporary buffer for some reason.
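To illustrate, here is a sketch of mine for emulating "fill this buffer"
on top of the proposed primitives; a range that implemented such a
primitive natively could read straight into the caller's storage and skip
the marked copy:

// Sketch of mine: emulating "fill this buffer" on top of the proposed
// primitives. A range implementing readInto natively could read
// straight into dest and skip the copy marked below.
import std.algorithm : min;

size_t readInto(R)(ref R r, ubyte[] dest)
{
    size_t done;
    while (done < dest.length)
    {
        if (r.front.length == 0)
        {
            r.appendToFront(0);                 // ask for more data
            if (r.front.length == 0) break;     // stream exhausted
        }
        immutable n = min(r.front.length, dest.length - done);
        dest[done .. done + n] = r.front[0 .. n];   // the extra copy
        r.shiftFront(n);
        done += n;
    }
    return done;
}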
Overall, it looks like a good design. It's quite low-level, but that's
not a bad thing. I'll have to think a little to see how I could
integrate it into my XML parser (which only deals with complete files in
memory at this time). Being able to say "fill this buffer" would
certainly make things easier for me.
--
Michel Fortin
michel.for...@michelf.com
http://michelf.com/