16-Jan-2014 19:55, Steven Schveighoffer wrote:
On Tue, 07 Jan 2014 05:04:07 -0500, Dmitry Olshansky
<dmitry.o...@gmail.com> wrote:
Then our goals are aligned. Be sure to take a peek at (if you haven't
already):
https://github.com/schveiguy/phobos/blob/new-io/std/io.d

Yes, I'm gearing up to revisit that after a long D hiatus, and I came
across this thread.

At this point, I really really like the ideas that you have in this. It
solves an issue that I struggled with, and my solution was quite clunky.

I am thinking of this layout for streams/buffers:

1. Unbuffered stream used for raw i/o, based on a class hierarchy (which
I have pretty much written)
2. Buffer like you have, based on a struct, with specific primitives.
Its job is to collect data from the underlying stream, and present it
to consumers as a random-access buffer.

The only interesting thing I'd add here is that some buffers may work without an underlying stream. The best examples are arrays and memory-mapped files.

3. Filter that has access to transform the buffer data/copy it.
4. Ranges that use the buffer/filter to process/present the data.


Yes, yes and yes. It is a pleasant surprise to see that our visions match. I was half-expecting you'd come along and destroy it all ;)

The problem I struggled with is the presentation of UTF data of any
format as char[], wchar[] or dchar[]. Two things need to happen. First is
that the data needs to be post-processed to perform any necessary byte
swapping. The second is to transcode the data into the correct width.

In this way, you can process UTF data of any type (I even have code to
detect the encoding and automatically process it), and then use it in a
way that makes sense for your code.

My solution was to paste in a "processing" delegate into the class
hierarchy of buffered streams that allowed one read/write access to the
buffer. But it's clunky, and difficult to deal with in a generalized
fashion.

But the idea of using a buffer in between the stream and the range, and
possibly bolting together multiple transformations in a clean way, makes
this problem easy to solve, and I think it is closer to the vision
Andrei/Walter have.

In essence a transcoding filter for UTF-16 would wrap a buffer of ubyte and itself present a buffer interface (but of wchar).

My own stuff currently deals only in ubyte, and the limited decoding is represented by a "decode" function that takes a buffer of ubyte and decodes UTF-8. I think typed buffers/filters are the way to go.
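
A rough sketch of what such a typed filter could look like. This is not code from either branch: the names Utf16Filter and utf16Filter are hypothetical, and a plain input range of ubyte stands in for the real buffer primitives, which are not spelled out in this message. It does the byte-order step and presents wchar code units; surrogate pairs are passed through undecoded.

```d
import std.range.primitives : isInputRange, ElementType, empty, front, popFront;

// Hypothetical filter: wraps a source of ubyte that holds UTF-16 data and
// presents it as a range of wchar code units, fixing byte order on the way.
struct Utf16Filter(R)
    if (isInputRange!R && is(ElementType!R : ubyte))
{
    private R src;
    private bool bigEndian;
    private wchar cur;
    private bool haveCur;

    this(R src, bool bigEndian)
    {
        this.src = src;
        this.bigEndian = bigEndian;
        prime();
    }

    @property bool empty() { return !haveCur; }
    @property wchar front() { return cur; }
    void popFront() { prime(); }

    // Pull two bytes from the source and combine them into one code unit.
    private void prime()
    {
        haveCur = false;
        if (src.empty) return;
        ubyte a = src.front;
        src.popFront();
        if (src.empty) return; // a dangling odd byte is dropped in this sketch
        ubyte b = src.front;
        src.popFront();
        cur = bigEndian ? cast(wchar)((a << 8) | b)
                        : cast(wchar)((b << 8) | a);
        haveCur = true;
    }
}

// Convenience constructor so the element type is inferred.
auto utf16Filter(R)(R src, bool bigEndian)
{
    return Utf16Filter!R(src, bigEndian);
}

unittest
{
    ubyte[] raw = [0x00, 0x44, 0x00, 0x21]; // "D!" encoded as UTF-16BE
    wchar[] text;
    foreach (wchar c; utf16Filter(raw, true))
        text ~= c;
    assert(text == "D!"w);
}
```

A real filter would expose the buffer interface rather than range primitives, so that another filter (or a range) can sit on top of it in the same way.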


I also like the idea of "pinning" the data instead of my mechanism of
using a delegate (which was similar but not as general). It also has
better opportunities for optimization.

Other ideas that came to me that buffer filters could represent:

* compression/decompression
* encryption

I am going to study your code some more and see how I can update my code
to use it. I still need to maintain the std.stdio.File interface, and
Walter is insistent that the initial state of stdout/err/in must be
synchronous with C (which kind of sucks, but I have plans on how to make
it not be so bad).

I seriously don't see how interfacing with the C runtime could be fast enough.

There is still a lot of work left to do, but I think one of the hard
parts is done, namely dealing with UTF transcoding. The remaining sticky
part is dealing with shared. But with structs, this should make things
much easier.

I'm thinking a generic locking wrapper is possible along the lines of:

shared Locked!(GenericBuffer!char) stdin; //usage

struct Locked(T){
shared:
private:
        T _this;
        Mutex mut;
public:
        //forwarded methods
}

The wrapper will introduce a lock, and implement every method of the wrapped struct roughly like this:
mut.lock();
scope(exit) mut.unlock();
(cast(T*)&_this).method(args); // strip shared while the lock is held

I'm sure it could be pretty automatic.
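
One way the forwarding might be automated is opDispatch. The sketch below is only an illustration under assumptions: Counter is a hypothetical stand-in for a buffer type, the field is called payload here, and only plain method calls are forwarded (no fields, properties or template methods). It also does nothing to prevent copying the wrapper, which a real version would have to address.

```d
import core.sync.mutex : Mutex;

// Hypothetical stand-in for a buffer type; not from the code under discussion.
struct Counter
{
    int value;
    void add(int n) { value += n; }
    int get() { return value; }
}

// Locking wrapper: every forwarded call runs with the mutex held.
struct Locked(T)
{
    private T payload;
    private Mutex mut;

    this(T initial) shared
    {
        payload = cast(shared) initial;
        mut = new shared Mutex();
    }

    // Forward any method call: take the lock, strip shared, call through.
    auto opDispatch(string name, Args...)(Args args) shared
    {
        mut.lock();
        scope(exit) mut.unlock();
        return mixin("(cast(T*) &payload)." ~ name ~ "(args)");
    }
}

unittest
{
    auto c = shared(Locked!Counter)(Counter(0));
    c.add(5);
    c.add(2);
    assert(c.get() == 7);
}
```

The cast away from shared is only sound because nothing else can reach payload except through the lock.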

One question, is there a reason a buffer type has to be a range at all?
I can see where it's easy to make it a range, but I don't see
higher-level code using the range primitives when dealing with chunks of
a stream.

Lexers and parsers enjoy it, i.e. they work pretty much as ranges, especially when skipping spaces and the like. As I said, the main reason was: if it fits as a range, why not? After all, it makes one-pass processing of data trivial, as it rides on top of foreach:

foreach(octet; mybuffer)
{
        if(interesting(octet))
                do_cool_stuff();
}

Things like countUntil make perfect sense when called on a buffer (e.g. to find a matching sentinel).
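
A small illustration of the sentinel case, with a plain ubyte[] standing in for the buffer's current window (the variable names are made up for the example):

```d
import std.algorithm.searching : countUntil;

unittest
{
    // A ubyte slice stands in for the buffered window of a stream.
    ubyte[] window = cast(ubyte[]) "key=value\nrest".dup;

    // Number of octets before the first newline sentinel, or -1 if absent.
    auto n = window.countUntil(cast(ubyte) '\n');
    assert(n == 9);
    assert(cast(char[]) window[0 .. n] == "key=value");

    // No sentinel in the window: the caller would load more data and retry.
    assert(window[0 .. n].countUntil(cast(ubyte) '\n') == -1);
}
```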

--
Dmitry Olshansky
