On 2/6/11 3:22 EST, Jonathan M Davis wrote:
Okay. I think that I've been misunderstanding some stuff here. I forgot that we
were dealing with input ranges rather than forward ranges, and many range
functions just don't work with input ranges, since they lack save(). Bleh.

Okay. Honestly, what I'd normally want to be dealing with when reading a stream
or file is a buffered forward range implemented in a manner that minimizes
copies. Having to deal with an input range, let alone what Andrei is
suggesting here, would definitely be annoying, to say the least.

Couldn't we do something which created a new buffer each time that it read in
data from a file, and then it could be a forward range with infinite look-ahead?
The cost of creating a new buffer would likely be minimal, if not outright
negligible, in comparison to reading in the data from a file, and having multiple
buffers would allow it to be a forward range. Perhaps the creation of a new
buffer could even be skipped if save had never been called and therefore no
external references to the buffer exist - at least as long as we're talking
about bytes or characters or other value types.
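
For concreteness, a rough sketch of that scheme might look like the following
(hypothetical names, not actual Phobos code): every read gets a fresh buffer,
buffers are chained into a lazily grown list, and a range is just a pointer
into that list, so save() is a cheap struct copy.

import std.stdio;

// One chunk of file data; buffers are never reused, only chained.
struct Chunk
{
    ubyte[] data;
    Chunk* next;   // filled in lazily by whichever range is frontmost
    bool eof;      // set once a read past this chunk hits end of file
}

struct ChunkedForwardRange
{
    private File file;
    private Chunk* cur;
    private size_t chunkSize;

    this(string name, size_t chunkSize = 4096)
    {
        file = File(name, "rb");
        this.chunkSize = chunkSize;
        cur = readChunk();
    }

    private Chunk* readChunk()
    {
        auto buf = new ubyte[chunkSize];   // fresh buffer every time
        auto got = file.rawRead(buf);
        return got.length ? new Chunk(got, null, false) : null;
    }

    @property bool empty() { return cur is null; }
    @property ubyte[] front() { return cur.data; }

    void popFront()
    {
        if (cur.next is null && !cur.eof)
        {
            cur.next = readChunk();
            if (cur.next is null) cur.eof = true;
        }
        cur = cur.next;
    }

    // save() just copies the struct; all copies walk the same shared
    // list of chunks, so the file is read only once.
    @property ChunkedForwardRange save() { return this; }
}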

APIs predicated on the notion that I/O is so expensive that extra overheads are not measurable have paid dearly for that assumption (e.g. C++'s iostreams).

Maybe there's some major flaw in that basic idea. I don't know. But Andrei's
suggestion sounds like a royal pain for basic I/O. If that's all I had to deal
with when trying to lazily read in a file and process it, I'd just use readText()
instead, since it would be way easier to use.

Clearly, reading the entire file into an in-memory structure simplifies things. But the proposed streaming interface is as convenient as it always was; the two added APIs help people who need extra flexibility, without hurting efficiency.

If you want to read a file in Java: http://www.java-tips.org/java-se-tips/java.io/how-to-read-file-in-java.html

In C (with many caveats): http://www.phanderson.com/files/file_read.html

In D:

foreach (line; File("name").byLine()) {
   ...
}

I plan to add a simpler API:

foreach (line; File.byLine("name")) {
   ...
}

To read fixed-size chunks, use byChunk. This covers the vast majority of file I/O needs.
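
For example (the file name and the 4096-byte chunk size are placeholders):

import std.stdio;

void main()
{
    // Each iteration yields a ubyte[] of up to 4096 bytes; the buffer
    // is reused between iterations, so dup a chunk if you need to keep it.
    foreach (ubyte[] chunk; File("name").byChunk(4096))
    {
        // ... process chunk ...
    }
}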

There are two limitations of the current APIs:

1. You can't append a new line to the existing line (or a new buffer to the existing buffer), which you need when you want to process multiple lines as one logical unit (some programs and file formats need that, as does composing streams); see the first sketch after this list.

2. You can't comfortably read data of a user-specified size when that size varies. This is the case with, e.g., binary formats where you need to read "doped chunks", i.e. chunks prefixed by their lengths.
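
To illustrate limitation 1, here is a hypothetical workaround with today's
byLine, gluing backslash-continued lines into one logical line; every line
must be copied out of byLine's reused buffer precisely because the buffer
cannot be grown in place:

import std.stdio;
import std.algorithm : endsWith;

void main()
{
    char[] logical;
    foreach (line; File("name").byLine())
    {
        if (line.endsWith('\\'))
        {
            logical ~= line[0 .. $ - 1];   // copy and keep accumulating
            continue;
        }
        logical ~= line;
        // ... process the complete logical line ...
        logical.length = 0;
    }
}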
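
And to illustrate limitation 2, a hypothetical doped-chunk read with today's
rawRead, assuming a made-up binary file with a 4-byte native-endian length
prefix:

import std.stdio;

void main()
{
    auto f = File("data.bin", "rb");
    uint[1] len;
    f.rawRead(len[]);                  // read the 4-byte length prefix
    auto payload = new ubyte[len[0]];
    if (payload.length)
        f.rawRead(payload);            // read exactly that many bytes
    // ... decode payload ...
}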

My proposal addresses 1 and makes 2 possible.


Andrei
