Re: streaming redux

Andrei Alexandrescu Tue, 28 Dec 2010 15:45:25 -0800

On 12/28/10 5:14 PM, Haruki Shigemori wrote:

(2010/12/28 16:02), Andrei Alexandrescu wrote:

I've put together over the past days an embryonic streaming interface.
It separates transport from formatting, input from output, and buffered
from unbuffered operation.


http://erdani.com/d/phobos/std_stream2.html

There are a number of questions interspersed. It would be great to start
a discussion using that design as a baseline. Please voice any related
thoughts - thanks!


Andrei


I've waited so long for this day.
Excuse me, could you show me some user-side code and library-side code
using std.stream2?
I don't know of any concrete implementation of the std.stream2 interfaces.

There isn't one. The source code exists only to support the documentation, and I attach it to this message.

Thanks for participating! I know there has been some good stream-related activity in the Japanese D community.



Andrei

// Written in the D programming language.

/**
Streams are structured in two layers. At the bottom there's the
transport layer, which is responsible for opening and closing a
stream, positioning in the stream, and transferring bytes. Atop the
transport layer sits the formatting layer, which is concerned with
formatting typed data into raw bytes, which are then passed to the
underlying transport (and, for input, with converting raw bytes
obtained from the transport back into typed data).

Macros:
WIKI = Phobos/StdAlgorithm
QUESTION = $(I <font color=red>Question:</font> $0)

Copyright: Andrei Alexandrescu 2010-.

License: $(WEB boost.org/LICENSE_1_0.txt, Boost License 1.0).

Authors:   $(WEB erdani.com, Andrei Alexandrescu)
 */

module std.stream2;
import std.traits : isSomeChar;
import std.variant;

/**
The base transport interface $(D TransportBase) supports primitives
for checking whether the transport is opened, closing the transport,
and positioning in the stream. Opening is not part of this interface;
it is assumed that a factory function opens the transport with the
appropriate parameters. Some streams may not actually be positionable,
in which case the positioning primitives throw.

$(QUESTION Should we offer an $(D open) primitive at this level? If
so, what parameter(s) should it take?)

$(QUESTION Should we offer a primitive $(D rewind) that takes the
stream back to the beginning? That might be supported even by some
streams that don't support general $(D seek) calls. Alternatively,
some streams might support $(D seek(0, SeekAnchor.start)) but not
other calls to $(D seek).)
 */
interface TransportBase 
{
    /**
       Positions the stream $(D position) bytes from the beginning and
       returns the new absolute _position. Throws on error.
     */
    ulong seek(ulong position);

    /**
       Seeks the stream $(D position) bytes from the stream's current
       position and returns the new absolute _position. Throws on error.
     */
    ulong seekFromCurrent(long position);

    /**
       Seeks the stream $(D position) bytes from the stream's end and
       returns the new absolute _position. Throws on error. The semantics of
       this primitive for $(D position > 0) are defined by the stream
       implementation (e.g. on certain file systems, such calls may
       allow writing sparse files).

       $(QUESTION May we eliminate $(D seekFromCurrent) and $(D
       seekFromEnd) and just have $(D seek) with absolute positioning?
       I don't know of streams that allow $(D seek) without allowing
       $(D tell). Even if some stream doesn't, it's easy to add
       support for $(D tell) in a wrapper. The marginal cost of
       calling $(D tell) is small enough compared to the cost of $(D
       seek).)
     */
    ulong seekFromEnd(long position);

    /**
       Returns the absolute position in the stream. Throws on error. 
     */
    ulong tell() const;

    /**
       Returns whether the stream is at its logical end. Subsequent
       reads from the stream will yield no data, and subsequent writes
       to the stream will append new data.
     */
    @property bool atEnd() const;
    
    /**
      Is this stream open?
    */
    @property bool isOpen() const;

    /**
       Closes the stream. Does nothing on an unopened stream. Throws on error.

       $(QUESTION Should this throw on an unopened stream? I don't
       think so, because throwing does not offer any additional
       information that user code didn't have, and the idiom $(D if
       (s.isOpen) s.close()) is verbose and frequently encountered.)
    */
    void close();
}

/**
Unbuffered transport interfaces hold no buffers of their own and
therefore rely on user-supplied buffers to do their deed.
 */
interface UnbufferedInputTransport : TransportBase
{
    /**
       Reads data off the stream and returns the data _read (which is
       a slice of $(D buffer)). If this function returns an empty
       slice, the stream has ended. Reading from a stream that is $(D
       atEnd) just returns empty slices. If the stream is closed or
       some error occurs during reading, an exception is thrown.

       $(QUESTION Should we allow $(D read) to return an empty slice
       even if $(D atEnd) is $(D false)? If we do, we allow
       non-blocking streams with burst transfer. However, naive client
       code on non-blocking streams would be inefficient because it
       would essentially busy-wait.)
     */
    ubyte[] read(ubyte[] buffer);
}

/**
Unbuffered output transport offers one primitive for writing. Client
code should never assume that unbuffered writes in fact go straight to
the stream's underlying hardware. There are at least two reasons for
this. First, the underlying operating system-specific primitives
might not offer guaranteed write-through (as is the case, e.g., for
unbuffered files on Linux). Second, $(D BufferedOutputTransport) (below)
inherits $(D UnbufferedOutputTransport) and adds guaranteed buffering,
so an object accessed through an $(D UnbufferedOutputTransport)
reference may well be buffered.

So $(D UnbufferedOutputTransport) is best understood as "transport
without guaranteed buffering".
 */
interface UnbufferedOutputTransport : TransportBase
{
    /**
       Writes data to the stream. Throws on error.
     */
    void write(in ubyte[] buffer); 
    /**
       Alias for $(D write) that supports the output range interface.
     */
    alias write put;
}

/**
Buffered transport interfaces hold internal buffers as intermediaries
between the data source and client code.

The $(D BufferedInputTransport) interface is formally an input range
of $(D ubyte[]), which means it can be used directly with a variety of
algorithms.
 */
interface BufferedInputTransport : UnbufferedInputTransport
{
    /**
       Alias for $(D atEnd) for compliance with the input range
       interface.
    */
    alias atEnd empty;

    /**
       If the internal buffer is not empty, returns the
       already-buffered data, which user code may inspect or copy as
       it sees fit; no data is read from the stream in that case. If
       there is no buffered data, first reads more data from the
       stream and returns it. The amount of data read depends on the
       actual stream.

       $(QUESTION Should we allow an empty _front on a non-empty
       stream? This goes back to handling non-blocking streams.)
     */
    @property ubyte[] front();
    
    /**
       Discards the existing buffer, reads a new buffer.
    */
    void popFront();

    /**
       Peeks $(D n) bytes forward in the stream. The buffer returned
       may be shorter than $(D n) only if the stream has
       ended. Following a call to $(D peek(n)), $(D front) will yield
       the same buffer.
     */
    ubyte[] peek(size_t n);

    /**
       Discards $(D n) bytes off the stream. Returns the number of
       bytes discarded, which may be less than $(D n) if and only if
       the stream has ended. The stream need not be seekable.

       $(QUESTION Should we eliminate this function? Theoretically
       calling $(D advance(n)) is equivalent to $(D
       seekFromCurrent(n)). However, in practice a file-based stream
       will have to implement $(D advance) even when the underlying
       file is not seekable (e.g. a pipe).)
     */
    ulong advance(ulong n);
}

/**
Buffered transport interfaces hold internal buffers as intermediaries
between client code and the data destination.

The $(D BufferedOutputTransport) interface is formally an output range
of $(D ubyte[]), which means it can be used with a variety of
algorithms directly.
 */
interface BufferedOutputTransport : UnbufferedOutputTransport
{
    /**
       Data written to a buffered transport may not reach the stream
       immediately; $(D flush) makes sure that buffered data is
       actually written to the stream. It is up to the stream to
       ensure that data is written to its actual destination (e.g. disk).
    */
    void flush(); 
}

/**
   The $(D Formatter) interface is concerned with formatting typed
   objects into bytes. The resulting bytes are passed to a backend
   transport object.
 */
interface Formatter
{
    /**
       Gets and sets the underlying _transport object. Each formatter
       is associated with one _transport object and forwards to it the
       bytes produced by formatting. It is an error to attempt
       writes to a $(D Formatter) that has a $(D null)
       _transport. Also, certain formatters might enforce at
       runtime that the _transport must be buffered.

       $(QUESTION Should all formatters require buffered _transport?
       Otherwise they might need to keep their own buffering, which
       ends up being less efficient with buffered transports.)
     */
    @property UnbufferedOutputTransport transport();
    /// Ditto
    @property void transport(UnbufferedOutputTransport);

    /**
       Formats and writes an integral _value, including a UTF character.
     */
    void put(ubyte value);
    /// Ditto
    void put(ushort value);
    /// Ditto
    void put(uint value);
    /// Ditto
    void put(ulong value);
    /// Ditto
    void put(byte value);
    /// Ditto
    void put(short value);
    /// Ditto
    void put(int value);
    /// Ditto
    void put(long value);
    /// Ditto
    void put(char value);
    /// Ditto
    void put(wchar value);
    /// Ditto
    void put(dchar value);

    /**
       Formats and writes a floating-point _value.
     */
    void put(float value);
    /// Ditto
    void put(double value);
    /// Ditto
    void put(real value);

    /**
       Formats and writes a UTF-encoded string.

       $(QUESTION Should we also define $(D putln) that writes the string
       and then a line terminator?)
     */
    void put(in char[] value);
    /// Ditto
    void put(in wchar[] value);
    /// Ditto
    void put(in dchar[] value);

    /**
       Formats and writes an array (other than strings). The type of
       the array element is passed dynamically as $(D elementType).
     */
    void put(in void[] value, TypeInfo elementType);

    /**
       Convenience generic function that accepts an array of any type
       and forwards it to $(D put(array, typeid(T.init))). Due to a
       bug in the implementation, this function temporarily has the
       name $(D put_) although it will ultimately be $(D put).
     */
    final void put_(T)(in T[] array) if (!isSomeChar!T) {
      return put(array, typeid(T.init));
    }

    /**
       Writes a class object to the stream. The object must implement
       $(D toString(Formatter)). This function simply calls $(D
       obj.toString(this)), thereby closing a double dispatch
       loop. The responsibility of formatting the object's contents is
       left to the object.

       $(QUESTION Should we define a more involved protocol? For
       example, even for objects that don't implement formatting, a
       $(D Formatter) might define a reasonable output routine by
       using introspection to figure out the object's layout. This
       approach has the nice consequence that one implementation can
       be applied to many objects. But that also means we need to wait
       for better reflection support. We also need to figure out a way
       to detect that an object does not override $(D
       toString(Formatter)), which at the moment I consider a
       to-be-added primitive method of $(D Object).)
     */
    void put(Object obj);

    /**
       Writes a struct to the stream. This final function writes a
       customizable "header" and a customizable "footer". Inside, the
       elements of the struct are formatted transitively. Due to a bug
       in the implementation, this function temporarily has the name
       $(D put_) although it will ultimately be $(D put).
       
       $(QUESTION Should we put some support for avoiding writing the
       same subobject twice, or is that more of a charter of
       serialization?)
     */
    final void put_(S)(auto ref S) if (is(S == struct)) {
    }

    /**
       Overridable hooks called before and after writing a $(D
       struct)'s fields.

       $(QUESTION How to handle associative arrays? They don't have a
       common base, as arrays do. Should we offer some overridable
       hooks similar to these? For example, $(D beforeAssocArray), $(D
       afterAssocArray), $(D beforeAssocArrayElement), $(D
       afterAssocArrayElement).)
     */
    void beforeStruct(void * s, TypeInfo ti);
    /// Ditto
    void afterStruct(void * s, TypeInfo ti);

    /**
       Formats and writes _data according to an extended $(D
       printf)-like format specifier.

       $(QUESTION How to define format specifiers for $(D struct)s and
       $(D class)es in ways that extend $(D printf) specifiers naturally?)

       $(QUESTION Should we define $(D writefln) too? Note that that
       only makes sense for streams that use a text-based transport.)
     */
    void writef(in char[] format, Variant[] data...);
}

/**
   $(D Unformatter) is an interface for formatted reading. The name $(D
   Parser) has been avoided in order to prevent confusion with the
   meaning of "parser" in formal grammars.
 */
interface Unformatter
{
    /**
       Gets and sets the underlying _transport object. Each
       unformatter is associated with one _transport object. It is an
       error to attempt reads from an $(D Unformatter) that has a $(D
       null) _transport. Also, certain unformatters might enforce at
       runtime that the transport must be buffered.
     */
    @property UnbufferedInputTransport transport();
    /// Ditto
    @property void transport(UnbufferedInputTransport);

    /**
       Reads an integral _value, including a UTF character.
     */
    void read(ref ubyte value);
    /// Ditto
    void read(ref ushort value);
    /// Ditto
    void read(ref uint value);
    /// Ditto
    void read(ref ulong value);
    /// Ditto
    void read(ref byte value);
    /// Ditto
    void read(ref short value);
    /// Ditto
    void read(ref int value);
    /// Ditto
    void read(ref long value);
    /// Ditto
    void read(ref char value);
    /// Ditto
    void read(ref wchar value);
    /// Ditto
    void read(ref dchar value);

    /**
       Reads a floating-point _value.
     */
    void read(ref float value);
    /// Ditto
    void read(ref double value);
    /// Ditto
    void read(ref real value);

    /**
       Reads a UTF-encoded string.

       $(QUESTION Should we pass the size in advance, or make the
       stream responsible for inferring it?)
     */
    void read(ref char[] value);
    /// Ditto
    void read(ref wchar[] value);
    /// Ditto
    void read(ref dchar[] value);

    /**
       Reads an array (other than strings). The type of
       the array element is passed dynamically as $(D elementType).
     */
    void read(ref void[] value, TypeInfo elementType);

    /**
       Convenience generic function that accepts an array of any type
       and forwards it to $(D read(array, typeid(T.init))). Due to a
       bug in the implementation, this function temporarily has the
       name $(D read_) although it will ultimately be $(D read).
     */
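    final void read_(T)(ref T[] array) if (!isSomeChar!T) {
      // A T[] lvalue cannot bind to ref void[], so go through a temporary.
      void[] raw = array;
      read(raw, typeid(T.init));
      array = cast(T[]) raw;
    }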

    /**
       Reads a class object from the stream. By analogy with $(D
       Formatter.put(Object)), the object is expected to read its own
       contents, thereby closing a double dispatch loop.

       $(QUESTION Should we define a more involved protocol? For
       example, even for objects that don't implement unformatting, an
       $(D Unformatter) might define a reasonable input routine by
       using introspection to figure out the object's layout. This
       approach has the nice consequence that one implementation can
       be applied to many objects. But that also means we need to wait
       for better reflection support. We also need to figure out a way
       to detect that an object does not implement its own reading
       primitive.)
     */
    void read(ref Object obj);

    /**
       Reads a struct from the stream. This final function reads a
       customizable "header" and a customizable "footer". Inside, the
       elements of the struct are read transitively. Due to a bug
       in the implementation, this function temporarily has the name
       $(D read_) although it will ultimately be $(D read).
     */
    final void read_(S)(ref S) if (is(S == struct)) {
    }

    /**
       Overridable hooks called before and after reading a $(D
       struct)'s fields.

       $(QUESTION How to handle associative arrays? They don't have a
       common base, as arrays do. Should we offer some overridable
       hooks similar to these? For example, $(D beforeAssocArray), $(D
       afterAssocArray), $(D beforeAssocArrayElement), $(D
       afterAssocArrayElement).)
     */
    void beforeStruct(void * s, TypeInfo ti);
    /// Ditto
    void afterStruct(void * s, TypeInfo ti);

    /**
       Convenience function that forwards to the appropriate
       by-reference overload. Due to a bug in the implementation, this
       function temporarily has the name $(D read_) although it will
       ultimately be $(D read).
     */
    final T read_(T)() {
        T result;
        read(result);
        return result;
    }

    /**
       Reads _data according to an extended $(D scanf)-like format
       specifier.
     */
    void readf(in char[] format, Variant[] data...);
}
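
Because no implementation of these interfaces exists, here is a minimal sketch of library-side and user-side code for the input interfaces. It is not part of the proposal: the interfaces are abridged local copies (the `seek*` primitives are left out), and `MemoryInput`, its `chunk` parameter, and `sumBytes` are hypothetical names. `MemoryInput` serves an in-memory array in buffers of at most `chunk` bytes, and `sumBytes` relies only on the `BufferedInputTransport` range primitives.

```d
// Sketch only: abridged local copies of the proposed input interfaces.
// MemoryInput and sumBytes are hypothetical names.
import std.algorithm.comparison : min;
import std.exception : enforce;

interface TransportBase
{
    ulong tell() const;
    @property bool atEnd() const;
    @property bool isOpen() const;
    void close();
}

interface UnbufferedInputTransport : TransportBase
{
    ubyte[] read(ubyte[] buffer);
}

interface BufferedInputTransport : UnbufferedInputTransport
{
    alias empty = atEnd;          // input range primitive
    @property ubyte[] front();
    void popFront();
    ubyte[] peek(size_t n);
    ulong advance(ulong n);
}

/// Library side: buffers are slices of an in-memory array, at most
/// `chunk` bytes long unless enlarged by `peek`.
class MemoryInput : BufferedInputTransport
{
    private ubyte[] data;
    private size_t pos;      // absolute position of the first unconsumed byte
    private size_t bufLen;   // length of the current buffer; 0 = none yet
    private size_t chunk;
    private bool open = true;

    this(const(ubyte)[] bytes, size_t chunk)
    {
        data = bytes.dup;
        this.chunk = chunk;
    }

    ulong tell() const { return pos; }
    @property bool atEnd() const { return pos >= data.length; }
    @property bool isOpen() const { return open; }
    void close() { open = false; }   // closing twice is harmless

    // Unbuffered read: copies into the caller's buffer and returns the
    // filled part; an empty slice means the stream has ended.
    ubyte[] read(ubyte[] buffer)
    {
        enforce(open, "stream is closed");
        immutable n = min(buffer.length, data.length - pos);
        buffer[0 .. n] = data[pos .. pos + n];
        pos += n;
        bufLen = 0;
        return buffer[0 .. n];
    }

    // Buffered access: returns a slice of internal storage, no copying.
    @property ubyte[] front()
    {
        enforce(open, "stream is closed");
        if (bufLen == 0)
            bufLen = min(chunk, data.length - pos);
        return data[pos .. pos + bufLen];
    }

    void popFront()
    {
        pos += front.length;
        bufLen = 0;
    }

    // For n > 0, a subsequent `front` returns the same bytes.
    ubyte[] peek(size_t n)
    {
        enforce(open, "stream is closed");
        bufLen = min(n, data.length - pos);
        return data[pos .. pos + bufLen];
    }

    ulong advance(ulong n)
    {
        enforce(open, "stream is closed");
        immutable k = cast(size_t) min(n, data.length - pos);
        pos += k;
        bufLen = 0;
        return k;
    }
}

/// User side: written only against the interface, walking it as a range.
ulong sumBytes(BufferedInputTransport t)
{
    ulong total;
    for (; !t.empty; t.popFront())
        foreach (b; t.front)
            total += b;
    return total;
}
```

Because `front` hands out a slice of internal storage, client code that wants to keep the bytes past the next `popFront` has to copy them. This is consistent with the `front` documentation, which leaves inspecting or copying to user code.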
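
The output side can be sketched in the same way. The `alias put = write;` line is the newer spelling of the proposal's `alias write put;`, and it makes the transport satisfy Phobos' `isOutputRange`. `MemoryOutput`, `sink`, `capacity`, and `writeRecord` are hypothetical names. Flushing automatically once `capacity` bytes are pending is an assumption; the proposal does not specify it.

```d
// Sketch only: abridged local copies of the proposed output interfaces.
// MemoryOutput, sink, capacity and writeRecord are hypothetical names.
import std.range.primitives : isOutputRange;

interface UnbufferedOutputTransport
{
    void write(in ubyte[] buffer);
    alias put = write;   // new-style spelling of `alias write put;`
}

interface BufferedOutputTransport : UnbufferedOutputTransport
{
    void flush();
}

// Thanks to the alias, range-based code sees an output range of ubyte[].
static assert(isOutputRange!(UnbufferedOutputTransport, ubyte[]));

/// Library side: bytes accumulate in `pending` and reach `sink` (standing
/// in for the real destination) on `flush`, or automatically once at
/// least `capacity` bytes are pending.
class MemoryOutput : BufferedOutputTransport
{
    ubyte[] sink;
    private ubyte[] pending;
    private size_t capacity;

    this(size_t capacity) { this.capacity = capacity; }

    void write(in ubyte[] buffer)
    {
        pending ~= buffer;
        if (pending.length >= capacity)
            flush();
    }

    void flush()
    {
        sink ~= pending;
        pending = null;
    }
}

/// User side: writes a two-byte header and a payload, then flushes.
void writeRecord(BufferedOutputTransport t, const(ubyte)[] payload)
{
    ubyte[] header = [0xCA, 0xFE];
    t.put(header);       // resolves to write() through the alias
    t.write(payload);
    t.flush();
}
```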
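
`Formatter.put(Object)` is described as closing a double dispatch loop through a `toString(Formatter)` method that `Object` does not have yet. The sketch below stands in a hypothetical `Formattable` interface for that method and keeps only three `put` overloads. It collects text in an `Appender` instead of forwarding bytes to a transport, so it illustrates only the dispatch, not the transport wiring. `TextFormatter` and `Point` are illustrative names.

```d
// Sketch only: abridged Formatter (three overloads) and a hypothetical
// Formattable interface standing in for Object.toString(Formatter).
import std.array : Appender;
import std.conv : to;

interface Formatter
{
    void put(int value);
    void put(in char[] value);
    void put(Object obj);
}

/// Hypothetical stand-in for the to-be-added Object.toString(Formatter).
interface Formattable
{
    void toString(Formatter f);
}

/// Hypothetical text formatter that collects its output in memory.
class TextFormatter : Formatter
{
    Appender!string output;

    void put(int value) { output.put(value.to!string); }
    void put(in char[] value) { output.put(value); }

    void put(Object obj)
    {
        // Second dispatch: the object decides how its contents are formatted.
        if (auto f = cast(Formattable) obj)
            f.toString(this);
        else
            output.put(obj.toString());   // fallback: plain Object.toString()
    }
}

class Point : Formattable
{
    int x, y;
    this(int x, int y) { this.x = x; this.y = y; }

    alias toString = Object.toString;   // keep the inherited overload visible

    void toString(Formatter f)
    {
        f.put("Point(");
        f.put(x);
        f.put(", ");
        f.put(y);
        f.put(")");
    }
}
```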
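
On the question of keeping only absolute `seek`: relative positioning can be layered on top of `seek` and `tell`. The sketch below shows this with a hypothetical `seekRelative` helper over an abridged interface. `Positionable`, `seekRelative`, and `Cursor` are illustrative names. `seekFromEnd` cannot be recovered the same way unless the stream's length is also known, and the proposed `TransportBase` does not expose the length.

```d
// Sketch only: an abridged interface with just the absolute-positioning
// primitives; Positionable, seekRelative and Cursor are hypothetical.
import std.exception : enforce;

interface Positionable
{
    ulong seek(ulong position);   // absolute; returns the new position
    ulong tell() const;
}

/// Relative seek expressed through `tell` plus absolute `seek`. Throws if
/// the target would lie before the start of the stream.
ulong seekRelative(Positionable s, long offset)
{
    immutable ulong current = s.tell();
    if (offset >= 0)
        return s.seek(current + offset);
    immutable ulong back = cast(ulong) -offset;
    enforce(back <= current, "cannot seek before the start of the stream");
    return s.seek(current - back);
}

/// Trivial positionable stream that only tracks its position.
class Cursor : Positionable
{
    private ulong pos;
    ulong seek(ulong position) { pos = position; return pos; }
    ulong tell() const { return pos; }
}
```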
