On Thu, 14 Oct 2010 00:19:45 +0400, Andrei Alexandrescu <seewebsiteforem...@erdani.org> wrote:

On 10/13/10 14:02 CDT, Denis Koroskin wrote:
On Wed, 13 Oct 2010 20:55:04 +0400, Andrei Alexandrescu
<seewebsiteforem...@erdani.org> wrote:
http://www.gnu.org/s/libc/manual/html_node/Buffering-Concepts.html

I don't think streams must mimic the low-level OS I/O interface.


I, in contrast, think that streams should be the lowest-level possible
platform-independent abstraction.
No buffering besides what an OS provides, no additional functionality.
If you need to read up to some character (and besides, what should count
as a newline separator: \r, \n, or \r\n?), that should be done manually
in "byLine".
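To make the separator question concrete, here is a minimal D sketch of how a byLine-style splitter could accept "\n", "\r\n" and a bare "\r" as terminators. `splitLinesAnyEol` is a hypothetical helper, not a Phobos function, and for brevity it works on bytes that have already been read rather than on a stream.

```d
import std.array : appender;

// Hypothetical helper: splits raw bytes into lines, accepting "\n", "\r\n"
// and a bare "\r" as terminators. Terminators are not part of the lines.
ubyte[][] splitLinesAnyEol(const(ubyte)[] data)
{
    auto lines = appender!(ubyte[][])();
    size_t start = 0;
    size_t i = 0;
    while (i < data.length)
    {
        if (data[i] == '\n' || data[i] == '\r')
        {
            lines.put(data[start .. i].dup);
            // "\r\n" counts as a single terminator, not two.
            if (data[i] == '\r' && i + 1 < data.length && data[i + 1] == '\n')
                ++i;
            ++i;
            start = i;
        }
        else
        {
            ++i;
        }
    }
    // Trailing bytes without a terminator still form a final line.
    if (start < data.length)
        lines.put(data[start .. $].dup);
    return lines.data;
}
```

Wherever this policy ends up living (stream or line reader), it has to be decided once; Phobos's `std.string.lineSplitter` makes a similar choice for strings.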

This aggravates client code for the sake of simplicity in a library that was supposed to make streaming easy. I'm not seeing progress.


This library code needs to be put somewhere. I just believe it belongs in a line reader, not in a generic stream. Putting line reading into the stream interface won't make it more efficient.

That's because
most streams are binary streams, and there is no such thing as a
"line" in them (e.g. how often do you need to read a line from a
SocketStream?).

http://www.opengroup.org/onlinepubs/009695399/functions/isatty.html


These are special cases I don't like. There is no such thing in Windows
anyway.

I didn't say I like them. Windows has _isatty: http://msdn.microsoft.com/en-us/library/f4s0ddew(v=VS.80).aspx


I stand corrected. Windows pretends to be POSIX-compliant, yes, but that's a sad story to tell. I don't see why would

You need a line when, e.g., you parse an HTML header, an email header, or
an FTP response. Again, if at a low level the transfer occurs in
blocks, that doesn't mean the API must do the same at all levels.


BSD sockets transmit data in blocks. If you need to find a special sequence
in a socket stream, you are forced to fetch a chunk, and manually search
for a needed sequence. My position is that you should do it with an
external predicate (e.g. read until whitespace).
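Reading "until whitespace" with an external predicate on top of the bare `read(ubyte[])` interface can be sketched in D as below. Because a raw `read()` may return bytes past the delimiter and cannot give them back, the reader has to keep them in a `pending` slice, so some buffering ends up outside the stream either way. `ChunkedArrayStream` and `PredicateReader` are hypothetical names used only for illustration.

```d
import std.algorithm.comparison : min;
import std.array : appender;

interface InputStream
{
    // Reads up to buf.length bytes into buf; returns 0 at end of stream.
    size_t read(ubyte[] buf);
}

// Hypothetical in-memory stream that returns at most `chunk` bytes per call,
// mimicking a socket that delivers data in blocks.
final class ChunkedArrayStream : InputStream
{
    private const(ubyte)[] data;
    private size_t chunk;

    this(const(ubyte)[] data, size_t chunk)
    {
        this.data = data;
        this.chunk = chunk;
    }

    size_t read(ubyte[] buf)
    {
        size_t n = min(buf.length, chunk, data.length);
        buf[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

// Hypothetical reader that splits a raw stream on an external predicate.
final class PredicateReader
{
    private InputStream stream;
    private ubyte[] chunk;    // scratch buffer for raw reads
    private ubyte[] pending;  // bytes already read but not yet consumed

    this(InputStream stream, size_t chunkSize = 4096)
    {
        this.stream = stream;
        this.chunk = new ubyte[chunkSize];
    }

    // Stores in `token` the bytes before the first byte for which `isDelim`
    // returns true and consumes that delimiter. At end of stream the
    // remaining bytes form the last token. Returns false when nothing is left.
    bool readUntil(scope bool delegate(ubyte) isDelim, out ubyte[] token)
    {
        auto acc = appender!(ubyte[])();
        for (;;)
        {
            foreach (i, b; pending)
            {
                if (isDelim(b))
                {
                    acc.put(pending[0 .. i]);
                    pending = pending[i + 1 .. $];  // keep the overshoot
                    token = acc.data;
                    return true;
                }
            }
            acc.put(pending);
            pending = null;

            size_t n = stream.read(chunk);
            if (n == 0)
            {
                token = acc.data;
                return token.length > 0;
            }
            pending = chunk[0 .. n];  // consumed before the next read()
        }
    }
}
```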

Problem is how you set up interfaces to avoid inefficiencies and contortions in the client.

I don't think streams should buffer anything either (whatever the underlying OS I/O API caches should suffice); buffered stream adapters can do that in a stream-independent way (why duplicate code when you can do it just as
efficiently with external methods?).

Most OS primitives don't give access to their own internal buffers.
Instead, they ask user code to provide a buffer and transfer data into
it.

Right. This is why a Stream shouldn't cache.

This is a big misunderstanding. If the interface is:

size_t read(byte[] buffer);

then *I*, the client, need to provide the buffer. It's in client space. This means willing or not I need to do buffering, regardless of whatever internal buffering is going on under the wraps.


Use a BufferedStream adapter if you need buffering, and raw streams if you do the buffering manually. That's the way it's implemented in C#, Java, Tango and many other APIs.
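The adapter approach can be sketched in D as follows, assuming the same minimal `InputStream` interface quoted in this thread. `BufferedInputStream`, `ArrayStream` and the `sourceReads` counter are hypothetical; they only show that many one-byte reads turn into a few large reads on the underlying stream. This is not the C#, Java or Tango API.

```d
import std.algorithm.comparison : min;

interface InputStream
{
    // Reads up to buf.length bytes into buf; returns 0 at end of stream.
    size_t read(ubyte[] buf);
}

// Hypothetical raw stream over an in-memory array (no buffering of its own).
final class ArrayStream : InputStream
{
    private const(ubyte)[] data;

    this(const(ubyte)[] data) { this.data = data; }

    size_t read(ubyte[] buf)
    {
        size_t n = min(buf.length, data.length);
        buf[0 .. n] = data[0 .. n];
        data = data[n .. $];
        return n;
    }
}

// Hypothetical adapter: works with any InputStream and serves small reads
// from an internal buffer that is refilled with one large read on the source.
final class BufferedInputStream : InputStream
{
    private InputStream source;
    private ubyte[] buffer;
    private size_t pos, end;  // unread bytes are buffer[pos .. end]
    size_t sourceReads;       // read() calls made on `source`, for illustration

    this(InputStream source, size_t bufferSize = 4096)
    {
        this.source = source;
        this.buffer = new ubyte[bufferSize];
    }

    size_t read(ubyte[] dst)
    {
        if (pos == end)
        {
            ++sourceReads;
            // Requests at least as large as the buffer bypass it entirely.
            if (dst.length >= buffer.length)
                return source.read(dst);
            end = source.read(buffer);
            pos = 0;
            if (end == 0)
                return 0;
        }
        size_t n = min(dst.length, end - pos);
        dst[0 .. n] = buffer[pos .. pos + n];
        pos += n;
        return n;
    }
}
```

Because the adapter itself implements `InputStream`, it can wrap a file, a socket or any other raw stream without either of them knowing about buffering; letting large requests skip the internal buffer avoids a pointless extra copy.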

So clearly buffering on the client side is a must.


I don't see how that follows from the above.

Please implement an abstraction that given this:

interface InputStream
{
     size_t read(ubyte[] buf);
}

defines a line reader.


I thought we agreed that byLine/byChunk need to do buffering manually anyway.

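import std.array : Appender;

enum BUFFER_SIZE = 4096;

class ByLine
{
        // Returns the next line without its '\n'. At end of stream, returns
        // whatever is left (possibly empty).
        ubyte[] nextLine()
        {
                for (;;) {
                        // Leftover bytes from an earlier read may already contain a full line.
                        foreach (i, ubyte c; pending) {
                                if (c != '\n') {
                                        continue;
                                }

                                appender.put(pending[0..i]);
                                pending = pending[i+1..$];
                                ubyte[] line = appender.data.dup;
                                appender.clear();
                                return line;
                        }
                        appender.put(pending);
                        pending = null;

                        // InputStream has no endOfStream(); read() returning 0 marks the end.
                        size_t bytesRead = inputStream.read(buffer[]);
                        if (bytesRead == 0) {
                                break;
                        }
                        pending = buffer[0..bytesRead];
                }

                ubyte[] line = appender.data.dup;
                appender.clear();
                return line;
        }

        InputStream inputStream;
        Appender!(ubyte[]) appender;
        ubyte[BUFFER_SIZE] buffer; // reused across calls
        ubyte[] pending;           // unconsumed tail of buffer
}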

(I've skipped the range interface for the sake of simplicity and replaced it with a nextLine() function. I also don't remember the exact Appender interface, so some function names may not match the real one.)

Once again, what's the point of byLine if all it does is call stream.readLine()? That's moving code from one place to many unrelated ones. I don't agree with that.

I'm not convinced we need a line-based API at the core stream level. I don't think we need to sacrifice performance in the general case to avoid a performance hit in a special case. Who even told you it would be any less efficient that way?


Andrei
