Re: Streaming library

Denis Koroskin Wed, 13 Oct 2010 09:20:52 -0700

On Wed, 13 Oct 2010 18:32:15 +0400, Andrei Alexandrescu <seewebsiteforem...@erdani.org> wrote:

On 10/11/2010 07:49 PM, Daniel Gibson wrote:

Andrei Alexandrescu schrieb:

Agreed. Maybe this is a good time to start making a requirements list
for streams. What are the essential features/feature groups?


Andrei


Maybe something like the following (I hope it's not too extensive):

* Input-, Output-, and InputAndOutput-Streams
  - having InputStream and OutputStream as interfaces, like in the old design, may be a good idea
  - implementing the standard operations that are mostly independent of the data source/sink (read/write for basic types, strings, ...) in mixin templates is probably an elegant way to create streams that are both Input and Output (one mixin that implements most of InputStream and one that implements most of OutputStream)

So far so good. I will point out, however, that the classic read/write routines are not all that good. For example, if you want to implement a line-buffered stream on top of a block-buffered stream, you'll be forced to write inefficient code.

Never heard of filesystems that allow reading files in lines - they always read in blocks, and that's what streams should do. That's because most streams are binary streams, and there is no such thing as a "line" in them (e.g. how often do you need to read a line from a SocketStream?).

I don't think streams should buffer anything either (whatever the underlying OS I/O API caches should suffice). Buffered stream adapters can do that in a stream-independent way (why duplicate code when you can do it just as efficiently with external methods?).

Besides, as you noted, the buffering is redundant for byChunk/byLine adapter ranges. It means that byChunk/byLine should operate on unbuffered streams.

I'll explain my I/O streams implementation below in case you didn't read my message (I've changed some stuff a little since then). My Stream interface is very simple:


// A generic stream
interface Stream
{
    @property InputStream input();
    @property OutputStream output();
    @property SeekableStream seekable();
    @property bool endOfStream();
    void close();
}

You may ask, why separate Input and Output streams? Well, that's because you either read from them, write to them, or both. Some streams are read-only (think Stdin), some write-only (Stdout), some support both, like FileStream. Right?

Not exactly. Does FileStream support writing when you open a file for reading? Does it support reading when you open it for writing? So, you may or may not be able to read from a generic stream, and you also may or may not be able to write to it. With a design like that you can't make a mistake: if a stream isn't readable, you have no reference to invoke the read() method on.
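To illustrate the point, here is a minimal sketch of how calling code might look. It assumes that an unsupported direction is reported by the property returning null, which is only one possible reading of "no reference". SinkStream and writeIfPossible are hypothetical names made up for this example, and the Stream interface is trimmed to the two properties that matter here.

```d
interface InputStream  { size_t read(ubyte[] buffer); }
interface OutputStream { size_t write(const(ubyte)[] buffer); }

// Trimmed-down version of the Stream interface, for illustration only.
interface Stream
{
    @property InputStream input();
    @property OutputStream output();
}

// Hypothetical write-only stream that collects bytes in memory.
final class SinkStream : Stream, OutputStream
{
    ubyte[] written;

    // Assumption: null means "this stream is not readable".
    @property InputStream input() { return null; }
    @property OutputStream output() { return this; }

    size_t write(const(ubyte)[] buffer)
    {
        written ~= buffer;
        return buffer.length;
    }
}

// Writes only if the stream exposes an output side; returns bytes written.
size_t writeIfPossible(Stream stream, const(ubyte)[] data)
{
    if (auto sink = stream.output)
        return sink.write(data);
    return 0;
}

unittest
{
    ubyte[] data = [1, 2, 3];
    auto s = new SinkStream;
    assert(s.input is null);              // nothing to call read() on
    assert(writeIfPossible(s, data) == 3);
    assert(s.written == data);
}
```

The caller has to obtain an InputStream or OutputStream reference before it can read or write, so the capability check cannot be skipped by accident.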

Similarly, a stream is either seekable, or not. SeekableStreams allow stream cursor manipulation:


interface SeekableStream : Stream
{
    long getPosition(Anchor whence = Anchor.begin);
    void setPosition(long position, Anchor whence = Anchor.begin);
}

InputStream doesn't really have many methods:

interface InputStream
{
        // reads up to buffer.length bytes from a stream
        // returns number of bytes read
        // throws on error
        size_t read(ubyte[] buffer);

        // reads from current position
        AsyncReadRequest readAsync(ubyte[] buffer, Mailbox* mailbox = null);
}

Neither does OutputStream:

interface OutputStream
{
        // returns number of bytes written
        // throws on error
        size_t write(const(ubyte)[] buffer);

        // writes from current position
        AsyncWriteRequest writeAsync(const(ubyte)[] buffer, Mailbox* mailbox = null);
}

They basically support only reading and writing in blocks, nothing else. However, they support asynchronous reads/writes, too (think of mailbox as a std.concurrency's Tid).
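Because the interface exposes nothing but block reads, an adapter range in the spirit of byChunk can be layered on top of any unbuffered stream without the stream knowing about it. The following is a rough sketch, not the actual Phobos byChunk. ChunkRange and MemoryInput are hypothetical names, InputStream is reduced to its synchronous read, and the sketch assumes that read returns 0 at end of stream.

```d
import std.algorithm.comparison : min;

interface InputStream { size_t read(ubyte[] buffer); }

// Hypothetical in-memory source, present only to make the sketch runnable.
final class MemoryInput : InputStream
{
    private const(ubyte)[] data;

    this(const(ubyte)[] data) { this.data = data; }

    size_t read(ubyte[] buffer)
    {
        immutable n = min(buffer.length, data.length);
        buffer[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

// Input range that yields whatever each read() call delivers.
struct ChunkRange
{
    private InputStream source;
    private ubyte[] buffer;   // reused between iterations
    private ubyte[] current;  // slice of buffer that holds valid data
    private bool done;

    this(InputStream source, size_t chunkSize)
    {
        this.source = source;
        this.buffer = new ubyte[chunkSize];
        popFront(); // prime the first chunk
    }

    @property bool empty() const { return done; }
    @property ubyte[] front() { return current; }

    void popFront()
    {
        immutable n = source.read(buffer);
        current = buffer[0 .. n];
        done = (n == 0); // assumption: 0 bytes means end of stream
    }
}

unittest
{
    auto src = new MemoryInput(cast(const(ubyte)[]) "hello world");
    size_t total, chunks;
    foreach (chunk; ChunkRange(src, 4))
    {
        total += chunk.length;
        ++chunks;
    }
    assert(total == 11);
    assert(chunks == 3); // 4 + 4 + 3 bytes
}
```

The only buffer involved belongs to the range, which is why the underlying stream does not need one of its own.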

Unlike Daniel's proposal, my design reads up to buffer size bytes for two reasons:

- it avoids potential buffering and multiple sys calls

- it is the only way to go with SocketStreams. I mean, you often don't know how many bytes an incoming socket message contains. You either have to read it byte-by-byte, or your application might stall for a potentially infinite time (if the message was shorter than your buffer and no more messages are being sent)
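A caller that really does need a fixed number of bytes can still get them by looping over the "read up to buffer.length" primitive. Here is a minimal sketch of such a helper. readFully and TrickleInput are hypothetical names, InputStream is reduced to its synchronous read, and the sketch assumes that read returns 0 once the stream is exhausted.

```d
import std.algorithm.comparison : min;

interface InputStream { size_t read(ubyte[] buffer); }

// Hypothetical source that delivers at most `burst` bytes per call,
// roughly the way a socket hands over one small message at a time.
final class TrickleInput : InputStream
{
    private const(ubyte)[] pending;
    private size_t burst;

    this(const(ubyte)[] pending, size_t burst)
    {
        this.pending = pending;
        this.burst = burst;
    }

    size_t read(ubyte[] buffer)
    {
        immutable n = min(buffer.length, pending.length, burst);
        buffer[0 .. n] = pending[0 .. n];
        pending = pending[n .. $];
        return n;
    }
}

// Keeps reading until the buffer is full or the stream ends.
// Returns the number of bytes actually stored in the buffer.
size_t readFully(InputStream input, ubyte[] buffer)
{
    size_t filled = 0;
    while (filled < buffer.length)
    {
        immutable n = input.read(buffer[filled .. $]);
        if (n == 0)
            break; // assumption: 0 bytes means end of stream
        filled += n;
    }
    return filled;
}

unittest
{
    auto input = new TrickleInput(cast(const(ubyte)[]) "abcdefg", 2);
    auto buffer = new ubyte[5];
    assert(input.read(buffer) == 2);             // a single read may come up short
    assert(readFully(input, buffer) == 5);       // the loop fills the buffer
    assert(cast(const(char)[]) buffer == "cdefg");
}
```

The reverse is not possible: a primitive that always blocks until the buffer is full cannot be turned into one that returns early.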

Why do my streams provide async methods? Because it's the modern approach to I/O - blocking I/O (aka one thread per client) doesn't scale. E.g. Java adds a second revision of its new I/O API in JDK 7 (called NIO2; the first revision, NIO, appeared in February 2002), and C# has had asynchronous operations as part of its Stream interface since .NET 1.1 (April 2003).

With async I/O you can serve many clients with one thread. Here is an example (pseudo-code, using std.concurrency):


foreach (connection; networkConnections) {
    // start an async receive; the result is delivered to this thread's mailbox
    connection.receiveMessage(thisTid);
}

receive( (NetworkMessage message) { /* do stuff */ } );

This is still not the most performant solution, but it's still a lot better than one thread per client.
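The receiving half of that pattern can be tried out with std.concurrency alone. In the sketch below the network layer is an assumption: short-lived helper threads (fakeConnection, a hypothetical name) stand in for the completions that a real async I/O backend would post, so only the single-threaded mailbox loop in serveAll corresponds to the idea above. NetworkMessage is given made-up fields for the example.

```d
import std.algorithm.sorting : sort;
import std.concurrency : ownerTid, receive, send, spawn;

// Hypothetical message layout.
struct NetworkMessage
{
    int clientId;
    string text;
}

// Stand-in for an async receive completing: posts one message to the owner.
void fakeConnection(int clientId)
{
    ownerTid.send(NetworkMessage(clientId, "ping"));
}

// One thread handles the messages of all clients from its own mailbox.
int[] serveAll(int clientCount)
{
    foreach (id; 0 .. clientCount)
        spawn(&fakeConnection, id);

    int[] served;
    foreach (_; 0 .. clientCount)
        receive((NetworkMessage message) { served ~= message.clientId; });
    return served;
}

unittest
{
    auto served = serveAll(3);
    sort(served); // arrival order is not deterministic
    assert(served == [0, 1, 2]);
}
```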

Async I/O is not only needed for network stuff. Here is a code snippet from DMD (comments added):


#define ASYNCREAD 1
#if ASYNCREAD
    AsyncRead *aw = AsyncRead::create(modules.dim);
    for (i = 0; i < modules.dim; i++)
    {
        m = (Module *)modules.data[i];
        aw->addFile(m->srcfile);
    }
    aw->start(); // executes async request, doesn't block
#else
    // Single threaded
    for (i = 0; i < modules.dim; i++)
    {
        m = (Module *)modules.data[i];
        m->read(0); // blocks
    }
#endif

    // Do some other stuff

    for (i = 0; i < modules.dim; i++)
    {
        ...
#if ASYNCREAD
        aw->read(i); // waits until async operation finishes
#endif
    }

Walter said that this small change gave quite a speedup in compilation time.

Also, my async methods return a reference to an AsyncRequest interface that allows waiting for completion (that's what Walter does in DMD), canceling, querying the status (complete, in progress, failed), reporting an error, etc., and that's very useful, too.
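To give a rough idea of the shape such an interface might take, here is a sketch. The member names (wait, status, error) and the ThreadRequest class are assumptions made up for this example, not the actual declarations. The sketch runs the work on a plain core.thread Thread purely to stay self-contained, omits canceling, and leaves out the synchronization that a real implementation would need for status queries made before wait() returns.

```d
import core.thread : Thread;

enum Status { inProgress, complete, failed }

// Hypothetical shape of the request handle.
interface AsyncRequest
{
    void wait();                  // block until the operation finishes
    @property Status status();
    @property Exception error();  // null unless status is failed
}

// Toy implementation: runs the work on a helper thread.
final class ThreadRequest : AsyncRequest
{
    private Thread worker;
    private void delegate() work;
    private Status state = Status.inProgress;
    private Exception failure;

    this(void delegate() work)
    {
        this.work = work;
        worker = new Thread(&run);
        worker.start(); // returns immediately, like aw->start() in DMD
    }

    private void run()
    {
        try
        {
            work();
            state = Status.complete;
        }
        catch (Exception e)
        {
            failure = e;
            state = Status.failed;
        }
    }

    void wait() { worker.join(); }
    @property Status status() { return state; }
    @property Exception error() { return failure; }
}

unittest
{
    int result;
    auto ok = new ThreadRequest({ result = 42; });
    // ... the caller is free to do other stuff here ...
    ok.wait();
    assert(ok.status == Status.complete && result == 42);

    void fail() { throw new Exception("boom"); }
    auto bad = new ThreadRequest(&fail);
    bad.wait();
    assert(bad.status == Status.failed);
    assert(bad.error.msg == "boom");
}
```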


I strongly believe we shouldn't ignore this type of API.

P.S. For threads this deep it's better to fork a new one, especially when changing the subject.
