Re: Streaming library

Denis Koroskin Wed, 13 Oct 2010 12:05:35 -0700

On Wed, 13 Oct 2010 20:55:04 +0400, Andrei Alexandrescu <seewebsiteforem...@erdani.org> wrote:

On 10/13/10 11:16 CDT, Denis Koroskin wrote:

On Wed, 13 Oct 2010 18:32:15 +0400, Andrei Alexandrescu

So far so good. I will point out, however, that the classic read/write
routines are not all that good. For example if you want to implement a
line-buffered stream on top of a block-buffered stream you'll be
forced to write inefficient code.


Never heard of filesystems that allow reading files in lines - they
always read in blocks, and that's what streams should do.


http://www.gnu.org/s/libc/manual/html_node/Buffering-Concepts.html

I don't think streams must mimic the low-level OS I/O interface.

I, in contrast, think that Streams should be the lowest-level possible platform-independent abstraction. No buffering besides what an OS provides, no additional functionality. If you need to be able to read something up to some character (besides, what should be considered a new-line separator: \r, \n, \r\n?), this should be done manually in "byLine".

That's because
most of the streams are binary streams, and there is no such thing as a
"line" in them (e.g. how often do you need to read a line from a
SocketStream?).


http://www.opengroup.org/onlinepubs/009695399/functions/isatty.html

These are special cases I don't like. There is no such thing in Windows anyway.

You need a line when e.g. you parse an HTML header or an email header or an FTP response. Again, if at a low level the transfer occurs in blocks, that doesn't mean the API must do the same at all levels.

BSD sockets transmit in blocks. If you need to find a special sequence in a socket stream, you are forced to fetch a chunk and manually search it for the needed sequence. My position is that you should do it with an external predicate (e.g. read until whitespace).
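The "fetch a chunk, search it with an external predicate" approach described above could look roughly like the sketch below. It is only an illustration under stated assumptions: `ChunkedSource` and `readUntil` are hypothetical names that do not come from the proposed library, and `ChunkedSource` merely imitates a stream whose `read` returns partial blocks.

```d
// Hypothetical stand-in for InputStream.read(): hands out at most
// maxPerCall bytes per call, the way a socket delivers partial data.
struct ChunkedSource
{
    const(ubyte)[] pending;
    size_t maxPerCall;

    size_t read(ubyte[] buffer)
    {
        size_t n = buffer.length < maxPerCall ? buffer.length : maxPerCall;
        if (n > pending.length) n = pending.length;
        buffer[0 .. n] = pending[0 .. n];
        pending = pending[n .. $];
        return n;
    }
}

// Reads block by block until pred matches a byte or the source is exhausted.
// Returns the bytes before the match (the matching byte is dropped).
// Bytes fetched past the match are kept in `leftover` for the next call.
ubyte[] readUntil(Source)(ref Source src, ref ubyte[] leftover,
                          bool function(ubyte) pred)
{
    ubyte[] result;
    ubyte[4] chunk; // deliberately tiny so the loop runs several times
    for (;;)
    {
        if (leftover.length == 0)
        {
            immutable n = src.read(chunk[]);
            if (n == 0) return result;       // source exhausted
            leftover = chunk[0 .. n].dup;
        }
        // Search the fetched block with the caller-supplied predicate.
        size_t idx = 0;
        while (idx < leftover.length && !pred(leftover[idx])) ++idx;
        result ~= leftover[0 .. idx];
        if (idx < leftover.length)
        {
            leftover = leftover[idx + 1 .. $];
            return result;
        }
        leftover = null;                     // no match yet, fetch more
    }
}
```

Note that the caller has to carry `leftover` between calls, which is a small buffer living outside the stream; the stream itself stays unbuffered.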

I don't think streams should buffer anything either (what an underlying
OS I/O API caches should suffice), buffered stream adapters can do that
in a stream-independent way (why duplicate code when you can do that as
efficiently with external methods?).
Most OS primitives don't give access to their own internal buffers. Instead, they ask user code to provide a buffer and transfer data into it.


Right. This is why Stream may not cache.

So clearly buffering on the client side is a must.


I don't see how that follows from the above.

Besides, as you noted, the buffering is redundant for byChunk/byLine
adapter ranges. It means that byChunk/byLine should operate on
unbuffered streams.
Chunks keep their own buffer so indeed they could operate on streams that don't do additional buffering. The story with lines is a fair amount more complicated if it needs to be done efficiently.

Yes. But line-reading is a case that I don't see a need to be handled specially.

I'll explain my I/O streams implementation below in case you didn't read
my message (I've changed some stuff a little since then).
Honest, I opened it to remember to read it but somehow your fonts are small and make my eyes hurt.
My Stream
interface is very simple:

// A generic stream
interface Stream
{
    @property InputStream input();
    @property OutputStream output();
    @property SeekableStream seekable();
    @property bool endOfStream();
    void close();
}

You may ask, why separate Input and Output streams?
I think my first question is: why doesn't Stream inherit InputStream and OutputStream? My hypothesis: you want to sometimes return null. Nice.


Right.

Well, that's because
you either read from them, write to them, or both.
Some streams are read-only (think Stdin), some write-only (Stdout), some
support both, like FileStream. Right?


Sounds good. But then where's flush()? Must be in OutputStream.


That's probably because unbuffered streams don't need them.

Not exactly. Does FileStream support writing when you open file for
reading? Does it support reading when you open for writing?
So, you may or may not read from a generic stream, and you also may or
may not write to a generic stream. With a design like that you can't make
a mistake: if a stream isn't readable, you have no reference to invoke
the read() method on.
That is indeed pretty nifty. I hope you would allow us to copy that feature in Phobos (unless you are considering submitting your library wholesale). Let me know.


Would love to contribute with design and implementation.
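The "return null when a capability is missing" idea discussed above can be illustrated with a minimal sketch. The interfaces are cut down to the members needed here, and `ReadOnlyMemoryStream` is a hypothetical class invented for the example, not part of the proposed library.

```d
interface InputStream
{
    size_t read(ubyte[] buffer);
}

interface OutputStream
{
    size_t write(const(ubyte)[] buffer);
}

// Reduced version of the Stream interface from the thread.
interface Stream
{
    @property InputStream input();
    @property OutputStream output();
}

// Hypothetical read-only stream over a fixed byte array.
class ReadOnlyMemoryStream : Stream, InputStream
{
    private const(ubyte)[] data;

    this(const(ubyte)[] data) { this.data = data; }

    @property InputStream input() { return this; }

    // Writing is not supported, so there is no OutputStream to hand out.
    @property OutputStream output() { return null; }

    size_t read(ubyte[] buffer)
    {
        immutable n = buffer.length < data.length ? buffer.length : data.length;
        buffer[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}
```

A caller checks `stream.output is null` once up front; there is no `write` reachable on a stream that cannot be written to, so the check cannot be forgotten at the call site.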

Similarly, a stream is either seekable, or not. SeekableStreams allow
stream cursor manipulation:

interface SeekableStream : Stream
{
    long getPosition(Anchor whence = Anchor.begin);
    void setPosition(long position, Anchor whence = Anchor.begin);
}

Makes sense. Why is getPosition signed? Why do you need an anchor for getPosition?

long is chosen to be consistent with setPosition. Also getPosition may return a negative value:


long pos = getPosition(Anchor.end); // how far is it till file end?

Also this is how you can get file size (need to invert though). This is consistent with setPosition:


setPosition(getPosition(anchor), anchor); // a no-op for any kind of anchor

I just thought why not? I'm okay with dropping it, but I find it nice.
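To make the anchor arithmetic concrete, here is a small sketch that models only the position bookkeeping. `Cursor` is a hypothetical type, and `Anchor.current` is an assumption: the thread only shows `Anchor.begin` and `Anchor.end`.

```d
// Anchor.current is assumed; only begin and end appear in the thread.
enum Anchor { begin, current, end }

// Hypothetical cursor that tracks a position inside `size` bytes.
struct Cursor
{
    long size;   // total length in bytes
    long offset; // absolute position from the start

    // Position relative to the given anchor.
    long getPosition(Anchor whence = Anchor.begin) const
    {
        final switch (whence)
        {
            case Anchor.begin:   return offset;
            case Anchor.current: return 0;
            case Anchor.end:     return offset - size; // <= 0 inside the data
        }
    }

    // Moves the cursor to `position`, interpreted relative to the anchor.
    void setPosition(long position, Anchor whence = Anchor.begin)
    {
        final switch (whence)
        {
            case Anchor.begin:   offset = position;        break;
            case Anchor.current: offset += position;       break;
            case Anchor.end:     offset = size + position; break;
        }
    }
}
```

With the cursor at the start, `-getPosition(Anchor.end)` equals the size, and `setPosition(getPosition(a), a)` leaves the offset unchanged for each of the three anchors.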

InputStream doesn't really have many methods:

interface InputStream
{
    // reads up to buffer.length bytes from a stream
    // returns number of bytes read
    // throws on error
    size_t read(ubyte[] buffer);


That makes implementation of line buffering inefficient :o).

There is no way you can do it more efficiently on Windows. Fetch a chunk; search for a line end; found ? return : continue.

    // reads from current position
    AsyncReadRequest readAsync(ubyte[] buffer, Mailbox* mailbox = null);
}
Why doesn't Sean's concurrency API scale for your needs? Can that be fixed? Would you consider submitting some informed bug reports?

It's rather a design issue than a bug on its own. I'll write a separate letter on that.

So is OutputStream:

interface OutputStream
{
    // returns number of bytes written
    // throws on error
    size_t write(const(ubyte)[] buffer);

    // writes from current position
    AsyncWriteRequest writeAsync(const(ubyte)[] buffer, Mailbox* mailbox = null);
}

They basically support only reading and writing in blocks, nothing else.


I'm surprised there's no flush().


No buffering - no flush.

However, they support asynchronous reads/writes, too (think of mailbox
as a std.concurrency's Tid).

Unlike Daniel's proposal, my design reads up to buffer size bytes for
two reasons:
- it avoids potential buffering and multiple sys calls
But there's a problem. It's very rare that the user knows what a good buffer size is. And often there are size and alignment restrictions at the low level.

I agree, but he can guess. Or a library can give him a hint. E.g. BUFFER_SIZE is a good buffer size to start with :)

So somewhere there is still buffering going on, and also there are potential inefficiencies (if a user reads small buffers).
- it is the only way to go with SocketStreams. I mean, you often don't
know how many bytes an incoming socket message contains. You either have
to read it byte-by-byte, or your application might stall for potentially
infinite time (if message was shorter than your buffer, and no more
messages are being sent)
But if you don't know how many bytes are in an incoming socket message, a better design is to do this:
void read(ref ubyte[] buffer);


That could work, too.

and resize the buffer to accommodate the incoming packet. Your design _imposes_ that the socket does additional buffering.

The socket API does it anyway. I just don't complicate it even further by providing an additional layer of buffering.
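Under the "reads up to buffer.length bytes" contract, a caller who really wants a full buffer can loop on top of `read`. The sketch below is an assumption-laden illustration: `ShortReadSource` and `readFully` are hypothetical names, and the source only imitates a socket that returns fewer bytes than requested.

```d
// Hypothetical source that returns short reads, as a socket might.
struct ShortReadSource
{
    const(ubyte)[] pending;
    size_t maxPerCall;

    // Same contract as InputStream.read in the thread:
    // copies up to buffer.length bytes and returns the count.
    size_t read(ubyte[] buffer)
    {
        size_t n = buffer.length < maxPerCall ? buffer.length : maxPerCall;
        if (n > pending.length) n = pending.length;
        buffer[0 .. n] = pending[0 .. n];
        pending = pending[n .. $];
        return n;
    }
}

// Keeps calling read until the buffer is full or the source reports
// end of data by returning 0. Returns the filled prefix of the buffer.
ubyte[] readFully(Source)(ref Source src, ubyte[] buffer)
{
    size_t filled = 0;
    while (filled < buffer.length)
    {
        immutable n = src.read(buffer[filled .. $]);
        if (n == 0) break;
        filled += n;
    }
    return buffer[0 .. filled];
}
```

On a real socket such a loop would block until enough data arrives, which is the stall described earlier; keeping it outside the stream makes that an explicit choice by the caller rather than a property of `read`.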

Why do my streams provide async methods? Because it's the modern
approach to I/O - blocking I/O (aka one thread per client) doesn't
scale. E.g. Java adds a second revision of its async I/O API in JDK 7
(called NIO2; the original NIO first appeared in February 2002), and C#
has had asynchronous operations as part of its Stream interface since
.NET 1.1 (April 2003).

Async I/O is nice, no two ways about that. I have on my list to define byChunkAsync that works exactly like byChunk from the client's perspective, except it does I/O concurrently with client code.


[snip]

I strongly believe we shouldn't ignore this type of API.

P.S. For threads this deep it's better to fork a new one, especially when
changing the subject.


I thought I did by changing the title...


Andrei


No, changing title isn't enough.
