On Sat, 03 Sep 2011 15:54:05 -0400, Andrei Alexandrescu <seewebsiteforem...@erdani.org> wrote:

Hello,


There are a number of issues related to D's current handling of streams, including the existence of the imperfect etc.stream and the over-specialization of std.stdio.

Steve has worked on an extensive overhaul of std.stdio which would obviate the need for etc.stream and would improve both the generality and efficiency of std.stdio.

Please chime in with feedback; he's away from the Usenet but allowed me to post this on his behalf. I uploaded the docs to

http://erdani.com/d/new-stdio/phobos-prerelease/std_stdio.html


Thank you Andrei for posting this. Before I add some more details, let me first say, this is a very early version, but it does work (and spanks the pants off of the current stdio in the tests I've run).

I'll add several very important things:

1. At the moment, this is written for Linux *ONLY*. I have very good experience with Windows i/o, and I am 100% certain I can implement this library for it. However, it's not my main OS, so I wanted to first get something working with my main working environment.
2. This is *not* currently multithread aware. But it will be. However, I think one important aspect to consider is to make a *thread-local* aware i/o library to avoid unnecessary locking when an i/o connection is only used in one thread. But please leave that part alone for now, I'm working on how to make the code reusable as shared types. Actually, if anyone has good ideas on that, please share!
3. Although I am dead-set on getting *something* into Phobos, I am not attached at all to the symbol names, or even some major design choices. I have seen so far it's one of the major concerns, and I think we can find good names. The names I came up with are not exactly arbitrary, but they are somewhat based on earlier designs that I have since abandoned, so renaming is definitely in order.
4. You can get the full source here: https://github.com/schveiguy/phobos/tree/new-io
   I used the 2.054 stock compiler, and a version of druntime that includes Lars' new std-process changes, also on my github account: https://github.com/schveiguy/druntime/tree/new-std-process
   Please use those when trying out the code.

--------------------------

So let me tell you about the library design and why I did it the way I did it. Then, I'll respond to individual concerns already posted.

The major problem I think the current std.stdio has is that its buffered solution is based on C's FILE * implementation. Specifically, we have very little control over, or access to, the buffer implementation. I think the key (or at least one of the keys) to uber-fast I/O is to avoid copying data *needlessly*, and seamless, safe buffer access is what makes that possible. In addition to that, C's FILE * has several limitations:

1. On Windows, it's based on DMC's runtime, which limits you to 60 simultaneously open files (Windows OS limit is 10,000 I think)
2. 64-bit support is not standard in all C implementations (namely Windows)
3. All FILE * objects are inherently shared, meaning lock-free I/O is very cumbersome, especially considering we have D's shared/unshared system.
4. C supports UTF-8, and it's supposed to support UTF-16 (but I can't get UTF-16 to work). I think D ought to support all forms of UTF, since UTF is an integral part of the language.

In addition to this, we have numerous D tools at our disposal -- delegates, closures, ranges, etc. In other words, limiting us to C's interfaces means either duct-taping on those features, or abandoning them. A prime example is the LockingFileReader range in std.stdio: a noble effort, and probably the best we could get, but just reading it made me cringe. Have a look: https://github.com/D-Programming-Language/phobos/blob/master/std/stdio.d#L1282

I felt we must be able to do something better.

So I started creating what I thought would be a good i/o library. I did not start from the existing code, but just rewrote everything. The basic concept is, we implement buffering once, and implement low-level devices that can be wrapped by the buffering implementation. Almost everything that would use I/O wants to use a buffered version of it, so make the low-level aggregate minimal, and put all the useful functionality into the buffer. I also wanted to make sure it is very easy to implement *efficient* ranges.

One design decision early on is that the device-level should be a class. There are a few good reasons for this:

1. An I/O device is a reference type. Copying it does not open another handle. So even if we *wanted* structs, they would be pImpl structs.
2. One simple idea that works very well at the OS level is the file descriptor concept. The file descriptor provides an *interface* to user code for operating on a stream, and descriptors are easily interchangeable. This means a fd could be a network socket, a file, a pipe, a COM port, and the basic interface never changes. So we should use that same concept -- define a simple interface for a low-level device, and then you can implement the buffer around that interface. Since classes are the only types which support interfaces, I chose them.
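
To make the device/buffer split concrete, here is a minimal sketch in D. It is not the code from the new-io branch: the names InputDevice, MemoryDevice, BufferedReader, read and take are made up for illustration, and the in-memory device only stands in for an OS handle. It shows a buffer written once against a small interface, handing out slices of its own storage instead of copies.

```d
import std.algorithm : min;

// Hypothetical low-level device interface (names are assumptions).
interface InputDevice
{
    // Fills buf with up to buf.length bytes; returns the count, 0 at end of stream.
    size_t read(ubyte[] buf);
}

// Stand-in device backed by memory, so the sketch runs without any OS handle.
final class MemoryDevice : InputDevice
{
    private const(ubyte)[] data;
    size_t calls; // number of low-level reads performed

    this(const(ubyte)[] data) { this.data = data; }

    size_t read(ubyte[] buf)
    {
        ++calls;
        immutable n = min(buf.length, data.length);
        buf[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

// Buffering implemented once, around any InputDevice.
final class BufferedReader
{
    private InputDevice dev;
    private ubyte[] buf;
    private size_t pos, end;

    this(InputDevice dev, size_t size = 4096)
    {
        this.dev = dev;
        buf = new ubyte[size];
    }

    // Returns a slice of the internal buffer (no copy) holding at most
    // `limit` bytes; empty at end of stream. The slice is only valid
    // until the next call, which may refill the buffer.
    const(ubyte)[] take(size_t limit)
    {
        if (pos == end)
        {
            pos = 0;
            end = dev.read(buf);
        }
        immutable n = min(limit, end - pos);
        auto window = buf[pos .. pos + n];
        pos += n;
        return window;
    }
}

unittest
{
    immutable ubyte[] data = [1, 2, 3, 4, 5];
    auto dev = new MemoryDevice(data);
    auto input = new BufferedReader(dev, 4);

    assert(input.take(2) == data[0 .. 2]);
    assert(input.take(2) == data[2 .. 4]);
    assert(dev.calls == 1); // both requests were served by a single device read
    assert(input.take(2) == data[4 .. 5]);
    assert(input.take(2).length == 0);
}
```

Any other device (a socket, a pipe) would only need to implement the one interface method to reuse the same buffer class.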

Yes, I know classes suffer from the dreaded "I don't know when the GC is going to get around to closing this file" problem. I think, though, that we have ways to mitigate that (I'll post some responses to points about that elsewhere in the thread).

One other important design decision I made was that the standard handles *must* be changeable at runtime to C-based i/o. This was mainly to appease Walter, as he insists on having compatible I/O with C functions (such as printf). I think he has a good point, but I think limiting this to basically the standard handles is the right level of compatibility.

After going through many iterations (you can look at the github history if you are interested), I settled on this basic tree. Note that I'm very open to changing any parts of this, as long as the basic concept of a common buffer type surrounding a low-level device type is kept intact.

interface Seekable => an interface defining seek functions for a device.
interface InputStream : Seekable => an interface defining functions that can be called on an input device. This is non-buffered.
interface OutputStream : Seekable => an interface defining functions that can be called on an output device. Also non-buffered.

class File : InputStream, OutputStream => The implementation for the OS handle-based input output stream. This is akin to a file descriptor. (Note, I realize this is a poor name choice for this, it should probably be changed).

final class DInput => The buffered input stream. This implements the buffer which surrounds an InputStream.
final class DOutput => The buffered output stream. This implements the buffer which surrounds an OutputStream.
final class CStream => A Buffered Input and output stream based on C's FILE *. This is used if you want to be compatible with C input or output, and is used in TextInput and TextOutput when using the C standard handles.

struct TextInput => A text-based input stream. This implements UTF translation of all forms and handles formatted input. Main member function is readf.
struct TextOutput => A text-based output stream. This implements UTF translation of all forms and handles formatted output. Main member functions are the write* family.

It seems like a lot. But keep in mind that almost everyone will only ever use DInput, DOutput, TextInput and TextOutput. These replace the current std.stdio.File. The low-level devices are for implementing low-level devices. They are not really meant to be used directly, except to be wrapped in a buffer. I expect that convenience functions will exist to create the correct buffered stream when given the right parameters. The most obvious example is the function openFile (which is included). The nice thing is, due to the auto return feature and templates, this takes care of some of the mess of having 4 main types to deal with.
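
As a rough illustration of how auto return plus templates can hide the choice of type, here is a small sketch. It does not reproduce the real openFile signature: openStream, Access, BufferedIn and BufferedOut are hypothetical names, and the stand-in classes only record the path instead of opening anything.

```d
// Hypothetical stand-ins for the buffered stream types; no file is opened.
final class BufferedIn
{
    string path;
    this(string path) { this.path = path; }
}

final class BufferedOut
{
    string path;
    this(string path) { this.path = path; }
}

enum Access { read, write }

// `auto` lets one factory name return a different concrete type per mode,
// selected at compile time by the template argument.
auto openStream(Access mode)(string path)
{
    static if (mode == Access.read)
        return new BufferedIn(path);
    else
        return new BufferedOut(path);
}

unittest
{
    auto input = openStream!(Access.read)("in.txt");
    auto output = openStream!(Access.write)("out.txt");

    // The caller never spells out the stream type; the compiler picks it.
    static assert(is(typeof(input) == BufferedIn));
    static assert(is(typeof(output) == BufferedOut));
    assert(input.path == "in.txt");
    assert(output.path == "out.txt");
}
```

The caller writes a single function name and `auto`, and still gets a statically typed stream.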

I want to reiterate, I have created something that works, not something that is perfect. I want everyone's input on how it should be changed -- including major design decisions. I'm open to changing just about everything. The *only* major concept I want to keep is the buffering surrounding a low-level device.

Thanks for taking the time to look at this. I hope it will become good enough to be included in Phobos. I plan to do everything I can to make it happen.

-Steve
