Re: An IO Streams Library

Jason White via Digitalmars-d Sun, 07 Feb 2016 17:05:43 -0800

On Sunday, 7 February 2016 at 10:50:24 UTC, Johannes Pfau wrote:

I saw this on code.dlang.org some time ago and had a quick look. First of all this would have to go into phobos to make sure it's used as some kind of a standard. Conflicting stream libraries would only cause more trouble.
Then if you want to go for phobos inclusion I'd recommend looking at other stream implementations and learning from their mistakes ;-)
There's
https://github.com/schveiguy/phobos/tree/babe9fe338f03cafc0fb50fc0d37ea96505da3e3/std/io
which was supposed to be a stream replacement for phobos. Then there
are also vibe.d streams*.

I saw Steven's stream implementation quite some time ago and I had a look at vibe's stream implementation just now. I think it is a mistake to use classes over structs for this sort of thing. I briefly tried implementing it with classes, but ran into problems. The non-deterministic destruction of classes is probably the biggest issue. One has to be careful about calling f.close() in order to avoid accumulating too many open file descriptors in programs that open a lot of files. Reference counting takes care of this problem nicely and has less overhead. This is one area where classes relying on the GC are not ideal. Rust's ownership system solves this problem quite well. Python also solves this with "with" statements.
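As a small illustration of the deterministic cleanup mentioned above, here is a Python sketch of the "with" approach. The helper name first_line and the temporary file are only assumptions for the example; the point is that the descriptor is released when the block exits rather than whenever a collector runs.

```python
import os
import tempfile

def first_line(path):
    # The file is closed as soon as the with block exits,
    # whether it exits normally or through an exception.
    with open(path, "rb") as f:
        line = f.readline()
    # f.closed is already True here; no explicit f.close() was needed.
    return line, f.closed

# Create a throwaway file to read from.
fd, path = tempfile.mkstemp()
os.write(fd, b"hello\nworld\n")
os.close(fd)

line, closed = first_line(path)
os.remove(path)
```

A reference-counted struct gives the same guarantee without any special syntax at the call site, which is the trade-off argued for in the paragraph above.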

Your Stream interfaces look like standard stream implementations (which is a good thing) which also work for unbuffered streams. I think it's a good idea to support partial reads and writes. For an explanation why partial reads, see the vibe.d rant below. Partial writes are useful as a write syscall can be interrupted by posix signals to stop the write. I'm not sure if the API should expose this feature (e.g. by returning a partial write on EINTR) but it can sometimes be useful.

I don't want to assume what the user wants to do in the event of an EINTR unless a certain behavior is desired 100% of the time. I don't think that is the case here. Thus, that is probably something the user should handle manually, if needed.

Still readExactly / writeAll helper functions are useful. I would try to implement these as UFCS functions instead of as a struct wrapper.


I agree. I went ahead and made that change.
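To make the idea concrete, here is a rough Python sketch of exact-read and write-all helpers written as free functions on top of a stream that only promises partial reads and writes. TrickleStream is a hypothetical test stream invented for this example, and read_exactly / write_all are illustrative names, not the library's actual D API.

```python
class TrickleStream:
    """Hypothetical in-memory stream that moves at most `limit` bytes per call."""

    def __init__(self, data=b"", limit=3):
        self.data = bytearray(data)
        self.limit = limit
        self.pos = 0

    def read(self, n):
        # Partial read: may return fewer than n bytes; b"" means end of stream.
        chunk = bytes(self.data[self.pos:self.pos + min(n, self.limit)])
        self.pos += len(chunk)
        return chunk

    def write(self, buf):
        # Partial write: accepts at most `limit` bytes and reports how many.
        accepted = bytes(buf[:self.limit])
        self.data += accepted
        return len(accepted)


def read_exactly(stream, n):
    """Loop over partial reads until exactly n bytes have arrived."""
    parts = []
    remaining = n
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            raise EOFError(f"stream ended with {remaining} of {n} bytes missing")
        parts.append(chunk)
        remaining -= len(chunk)
    return b"".join(parts)


def write_all(stream, buf):
    """Loop over partial writes until the whole buffer has been accepted."""
    view = memoryview(buf)
    while len(view) > 0:
        written = stream.write(view)
        if written == 0:
            raise IOError("stream accepted no data")
        view = view[written:]
```

Because the helpers only rely on read and write, every stream type gets them for free, and the streams themselves never have to implement the retry loop.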

For some streams you'll need a TimeoutException. An interesting question is whether users should be able to recover from TimeoutExceptions. This essentially means if a read/write function internally calls read/write posix calls more than once and only the last one timed out, we already processed some data and it's not possible to recover from a TimeoutException if the amount of already processed data is unknown.
The simplest solution is using only one syscall internally. Then TimeoutException => no data was processed. But this doesn't work for read/writeExactly (Another reason why read/writeExactly shouldn't be the default. vibe.d...)

In the current implementation of readExactly/writeExactly, one cannot assume how much was read or written in the event of an exception anyway. The only way around this I can see is to return the number of bytes read/written in the exception itself. In fact, that might solve the TimeoutException problem, too. Hmm...
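One possible shape for that idea, sketched in Python rather than D. PartialTransferError, read_exactly and make_flaky_reader are all hypothetical names made up for this illustration; the only point is that the exception carries what was already transferred, so the caller can decide whether to resume.

```python
class PartialTransferError(Exception):
    """Hypothetical exception recording the data handled before a failure."""

    def __init__(self, data, cause):
        super().__init__(f"transfer stopped after {len(data)} bytes: {cause!r}")
        self.data = data    # bytes successfully read before the failure
        self.cause = cause  # the underlying timeout or end-of-stream condition


def read_exactly(read, n):
    """`read` is any callable read(max_bytes) -> bytes doing one attempt per call."""
    got = bytearray()
    while len(got) < n:
        try:
            chunk = read(n - len(got))
        except TimeoutError as exc:
            # Keep the partial result instead of losing it with the exception.
            raise PartialTransferError(bytes(got), exc) from exc
        if not chunk:
            raise PartialTransferError(bytes(got), EOFError("end of stream"))
        got += chunk
    return bytes(got)


def make_flaky_reader(chunks):
    """Test double: hands out the given chunks, then times out."""
    it = iter(chunks)

    def read(max_bytes):
        try:
            return next(it)[:max_bytes]
        except StopIteration:
            raise TimeoutError("no data within deadline") from None

    return read


reader = make_flaky_reader([b"ab", b"cd"])
try:
    read_exactly(reader, 6)
except PartialTransferError as err:
    salvaged = err.data  # the four bytes that arrived before the timeout
```

The single-call read stays all-or-nothing with respect to timeouts, while the multi-call helper reports exactly how far it got.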

I'd like to keep the fundamental read/write functions at just one system call each in order to guarantee that they are atomic in relation to each other.

Regarding buffers / sliding windows I'd have a look at https://github.com/schveiguy/phobos/blob/babe9fe338f03cafc0fb50fc0d37ea96505da3e3/std/io/buffer.d
Another design question is whether there should be an interface for such buffered streams or whether it's OK to have only unbuffered streams + one buffer struct / class. Basically the question is whether there might be streams that can offer a buffer interface but can't use the standard implementation.

I think it's OK to re-implement buffering for different types of streams where it is more efficient to do so. For example, there is no need to implement buffering for an in-memory stream because, by definition, it is already buffered.

I'm not sure if having multiple buffering strategies would be useful. Right now, there is only the fixed-sized sliding window. If multiple buffering strategies are useful, then it makes sense to have all streams unbuffered by default and have separate buffering implementations.

There is an interesting buffering approach here that is mainly geared towards parsing: https://github.com/DmitryOlshansky/datapicked/blob/master/dpick/buffer/buffer.d

* vibe.d stream rant ahead:
vibe.d streams get some things right and some things very wrong. For example their leastSize/empty/read combo means you might actually have to implement reading data in any of these functions. Users have to handle timeouts or other errors for any of these as well.
Then the API requires a buffered stream, it simply won't work for unbuffered IO (leastSize, empty). And the fact that read reads exactly n bytes makes stream implementations more complicated (re-reading until enough data has been read should be done by a generic function, not reimplemented in every stream). It even makes some user code more complicated: I've implemented a serial port library for vibe-d. If I don't know how many bytes will arrive with the next packet, the read posix function usually returns the expected/available amount of data. But now vibe.d requires me to specify a fixed length when calling the stream read method. This leads to ugly code using peek...
Then vibe.d also mixes the sliding window / buffer concept into the stream class, but does so in a bad way. A sliding window should expose the internal buffer so that it's possible to consume bytes from the buffer, skip bytes, refill... In vibe.d you can peek at the buffer. But you can't discard data. You'll have to call read instead which copies from the internal buffer to an external buffer, even if you only want to skip data. Even worse, your external buffer size is limited. So you have to implement some loop logic if you want to skip more data than fits your buffer. And all you need is a discard(size_t n) function which does _buffer = _buffer[n .. $] in the stream class...

These are the golden nuggets of experience I was looking for when making this post. They definitely help to guide an ergonomic API design. Standing on the shoulders of giants and such. Thanks!
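For reference, the peek / discard / refill interface described in the rant could look roughly like the following Python sketch. SlidingWindow and skip are hypothetical names, the capacity of 8 is arbitrary, and io.BytesIO merely stands in for an unbuffered source; none of this is taken from vibe.d or the library under discussion.

```python
import io


class SlidingWindow:
    """Hypothetical fixed-capacity buffer placed in front of a source stream."""

    def __init__(self, source, capacity=8):
        self.source = source      # any object with read(n) -> bytes
        self.capacity = capacity
        self.buffer = b""

    def refill(self):
        """Do one read on the source; return the number of bytes added."""
        room = self.capacity - len(self.buffer)
        if room == 0:
            return 0
        chunk = self.source.read(room)
        self.buffer += chunk
        return len(chunk)

    def peek(self):
        """Expose the buffered bytes without consuming them."""
        return self.buffer

    def discard(self, n):
        """Drop up to n buffered bytes without copying them anywhere."""
        n = min(n, len(self.buffer))
        self.buffer = self.buffer[n:]
        return n


def skip(window, n):
    """Skip n bytes, even when n exceeds the window capacity."""
    skipped = 0
    while skipped < n:
        if not window.peek() and window.refill() == 0:
            break  # source is exhausted
        skipped += window.discard(n - skipped)
    return skipped


window = SlidingWindow(io.BytesIO(b"0123456789abcdefXYZ"), capacity=8)
skipped = skip(window, 16)  # crosses two refills, needs no external buffer
window.refill()
tail = window.peek()
```

Skipping here never copies into a caller-supplied buffer, which is the cost the rant objects to.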

TLDR: API design is very important.


Completely agree.
