Re: Streaming transport interfaces: input

Andrei Alexandrescu Thu, 14 Oct 2010 11:45:30 -0700

On 10/14/10 13:14 CDT, Steven Schveighoffer wrote:

On Thu, 14 Oct 2010 13:39:03 -0400, Andrei Alexandrescu
<[email protected]> wrote:

On 10/14/10 12:27 CDT, Steven Schveighoffer wrote:

On Thu, 14 Oct 2010 11:34:12 -0400, Andrei Alexandrescu
<[email protected]> wrote:
Please, use the term "seek", and allow an anchor. Every OS allows this,
it makes no sense not to provide it.


I've always thought that's a crappy appendix. Every OS that ever
allows seek/tell with anchors allows ALL anchors, and always allows
either both or none of seek and tell. So I decided to cut through the
crap and simplify. You want to seek 100 bytes from here, you write
stream.position = stream.position + 100.


Um.. yuck. We need to use two system calls to seek 100 bytes?


seek and tell don't always issue system calls.

Oh, that reminds me I need to provide length as a property as well.
This would save us crap like seek(0, SEEK_END); ftell() to figure out
the length of a file.


So now you need to do stream.position = stream.length to seek to the end
of the file instead of stream.seek(0, Anchor.END)?


Yes.

Plus, how will you
implement length, probably like this:
auto curpos = seek(0, SEEK_CUR);
auto len = seek(0, SEEK_END);
seek(curpos, SEEK_BEG);
return len;


Depends. For files, you can just use stat.
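
To make the two options concrete, here is a minimal sketch, assuming a POSIX-style file descriptor. The function names length_by_seek and length_by_stat are illustrative only and are not part of any proposed interface.

```python
import os

def length_by_seek(fd):
    """Length via three lseek calls: save position, jump to end, restore."""
    cur = os.lseek(fd, 0, os.SEEK_CUR)   # where are we now?
    end = os.lseek(fd, 0, os.SEEK_END)   # offset of end == length
    os.lseek(fd, cur, os.SEEK_SET)       # undo the side effect
    return end

def length_by_stat(fd):
    """Length via one fstat call; the file position is never touched."""
    return os.fstat(fd).st_size
```

Both return the same number for a regular file; the stat variant issues one call and has no side effect on the position.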

So that looks like 3 system calls instead of one, plus you just wasted
time seeking back to the current position.

Well again they don't always issue system calls, but point taken. I do
see a need for fast positioning at end of stream. Perhaps we could
accommodate an enum equal to ulong.max such that this goes to the end of
stream:


stream.position = StreamBase.atEnd;
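
A rough sketch of how such a position property with an end-of-stream sentinel might behave, assuming a POSIX file descriptor underneath. The class name FileStream and the constant AT_END are hypothetical stand-ins for StreamBase and atEnd.

```python
import os

class FileStream:
    # Stand-in for the proposed sentinel equal to ulong.max.
    AT_END = 2**64 - 1

    def __init__(self, fd):
        self._fd = fd

    @property
    def position(self):
        # "tell": a zero-byte relative seek reports the current offset.
        return os.lseek(self._fd, 0, os.SEEK_CUR)

    @position.setter
    def position(self, value):
        if value == self.AT_END:
            # The sentinel maps to a single seek-to-end call.
            os.lseek(self._fd, 0, os.SEEK_END)
        else:
            os.lseek(self._fd, value, os.SEEK_SET)
```

With this shape, stream.position = stream.position + 100 costs two calls in the unbuffered case, while stream.position = FileStream.AT_END costs one.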

I don't like appendDelim. We don't need to define that until we have
buffering.


Why?


Because appendDelim deals with buffering. If I defined a buffered
stream, I'd include a function like this:

size_t read(bool delegate(T[] data) sink);

which buffers data until sink returned false (passing each read chunk
into sink), extending the buffer as necessary.

Then it's trivial to implement readDelim on top of this.


Interesting. But that would still force readDelim to store leftover bytes.
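
One possible reading of the sink-based read, sketched under stated assumptions: the raw transport is any object with read(n), the delimiter is a single byte, and the names BufferedInput and read_delim are invented for illustration. The leftover bytes mentioned above are whatever stays in the internal buffer after the delimiter.

```python
class BufferedInput:
    def __init__(self, raw, chunk_size=4096):
        self._raw = raw                # assumed: object with read(n) -> bytes
        self._chunk_size = chunk_size
        self._buf = bytearray()        # bytes read but not yet consumed

    def read(self, sink):
        """Grow the buffer chunk by chunk until sink returns False or EOF.

        Returns the number of bytes currently held in the buffer.
        """
        # Offer leftovers from a previous call first.
        if self._buf and not sink(bytes(self._buf)):
            return len(self._buf)
        while True:
            chunk = self._raw.read(self._chunk_size)
            if not chunk:              # end of stream
                break
            self._buf += chunk
            if not sink(chunk):
                break
        return len(self._buf)

    def read_delim(self, delim):
        """Return data up to and including delim (single byte assumed)."""
        assert len(delim) == 1
        self.read(lambda chunk: delim not in chunk)
        cut = self._buf.find(delim)
        end = len(self._buf) if cut < 0 else cut + 1
        line = bytes(self._buf[:end])
        del self._buf[:end]            # the rest stays behind as leftover
        return line
```

The sketch copies when it slices the line out; a zero-copy version would hand out a view into the buffer instead, which is where direct buffer access comes in.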

The simple function of an input stream is to read data.


It does read data.


I mean, that's *all* it should do. It should not be appending to buffers.

This comes from a practical need. I've often had a buffer and wanted to
read one more line into it, keeping the existing content. It was
impossible without extra allocation and copying.
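
The proposed signature of appendDelim is not shown in this message, so the following is only a sketch of the behavior described: append one more line to a caller-owned buffer while keeping what is already there. The function name append_delim, its parameters, and the single-byte delimiter are assumptions, and the byte-at-a-time loop shows the semantics only, not a fast implementation.

```python
def append_delim(stream, buf, delim=b"\n"):
    """Append bytes from stream to buf up to and including delim.

    stream: assumed to be any object with read(n) -> bytes.
    buf:    caller-owned bytearray; existing content is preserved.
    Returns the number of bytes appended (0 at end of stream).
    """
    assert len(delim) == 1
    start = len(buf)
    while True:
        b = stream.read(1)
        if not b:          # end of stream
            break
        buf += b           # grows the caller's buffer in place
        if b == delim:
            break
    return len(buf) - start
```

Because the data lands directly in the caller's buffer, no intermediate line object has to be allocated and then copied over.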

I think the appendDelim method allows fast and simple implementations
of a variety of patterns. As I (thought I) have shown elsethread,
without appendDelim there's no way to efficiently implement a
line-oriented stream on top of a block-oriented one.


Um... the read system call is the same interface as the proposed
block-oriented interface. How are you avoiding using system calls?

I think we don't have the same definition for "system call". For example,
by my definition fread is NOT a system call.

Basically, appendDelim can be defined outside this class, because the
primitive read is enough.


You can only define it if you accept extra copying. I'd say one extra
interface function is acceptable for fast I/O.


No, you can define it without extra copying.

How? Denis' implementation has two copies in the mix. (I'm not counting
.dup etc.) Anyhow, let's do this - write down your interfaces so I can
comment on them. We talk "oh that's a buffering interface" and "that
requires buffering" and "that's an extra copy" and so on but we have
few concrete contenders. I put my cards on the table, you put yours.

If you don't allow direct
access to the buffer, then you have extra copying. But we don't have to
mimic C here. We should not be encouraging constant reinventing of the
buffer wheel here. Buffering is a well-defined task that can be
implemented once.

Just as a note, Tango does this, and it's very fast. There is certainly
no extra copying there.

Shouldn't the text transport be defined on top of the binary transport?


No, because there are transports that genuinely do not accept binary
data.


I mean, a text transport uses a binary transport underneath. What text
transport doesn't use a binary transport to do its dirty work? And what
exactly does a text transport do so differently that it needs to be a
separate interface?

A text transport does not accept raw binary data and requires e.g.
Base64 encoding (e.g. mail attachments do that). The console is a text
device - makes no sense to dump binary data on it. A JSON encoder is
also a text transport.

In other words, if 90% of the text transport duplicates the binary
transport, I see an opportunity for consolidation.

Consolidation brings simplification, which is good. But I believe there
exist text entities that do make the distinction worthwhile.

And in any case, I'd expect buffering to go between the two.


How do you define buffering? Would a buffered transport implement a
different interface?


Yes, but if we expect to reuse code, I'd expect a buffered transport to
use a primitive transport underneath for actually reading/writing data.
If you have multiple versions of the class that actually reads/writes
data (such as binary vs. text), then the buffer which uses it must
support all of them.

Text based processing to me seems to be a buffered activity (reading
lines, ensuring you don't have sliced utf-8 data, etc.).

Yes. What may be not so obvious is that binary processing with
user-imposed data lengths is ALSO a buffered activity. This is because the
low-level buffers do NOT come at arbitrary positions (alignment
restrictions) and do NOT come at arbitrary lengths.
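
A small sketch of why user-chosen read lengths imply buffering, assuming a source that only hands back whole blocks. The class name RecordReader and the read_block callable are hypothetical.

```python
class RecordReader:
    """Serves reads of arbitrary length on top of a block-only source."""

    def __init__(self, read_block):
        # Assumed: read_block() returns the next block as bytes,
        # or b"" once the source is exhausted.
        self._read_block = read_block
        self._pending = bytearray()    # block bytes not yet handed out

    def read(self, n):
        """Return up to n bytes; fewer only at end of stream."""
        while len(self._pending) < n:
            block = self._read_block()
            if not block:
                break
            self._pending += block
        out = bytes(self._pending[:n])
        del self._pending[:n]          # the remainder waits for the next read
        return out
```

A request that ends mid-block leaves the tail in _pending, so even a purely binary reader ends up owning a buffer.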

If all you
are adding are the different widths of characters, I don't think you
need this extra layer. It's going to make the buffering layer more
difficult to implement (now it must handle both a text version and
a binary version).


I don't understand this.


buffer uses a transport. If you have two different transport interfaces,
the buffer must support them both. And if the benefit is, one simply
defines [w|d]char versions of read, then we haven't gained much for the
trouble of having to support both.


I'll be looking forward to seeing your interfaces.


Andrei
