Re: Another new io library

2016-02-19 Thread Chad Joan via Digitalmars-d
On Friday, 19 February 2016 at 01:29:15 UTC, Steven Schveighoffer 
wrote:

On 2/18/16 6:52 PM, Chad Joan wrote:

...

This is why I think it will be much more important to have at least
these two interfaces take front-and-center:
A.  The presence of a .popAs!(...) operation (mentioned by Wyatt in this
thread, IIRC) for simple deserialization, and maybe for other
miscellaneous things like structured user interaction.


To me, this is a higher-level function. popAs cannot assume to 
know how to read what it is reading. If you mean something like 
reading an entire struct in binary form, that's not difficult 
to do.




I think I understand what you mean.  We are entering the problem 
domain of serializing and deserializing arbitrary types.


I think what I'd expect is to have the basic language types 
(ubyte, int, char, string, etc) all covered, and to provide some 
way (or ways) to integrate with serialization code provided by 
other types.  So you can do ".popAs!int" out of the box, but 
".popAs!MyType" will require MyType to provide a .deserialize 
member function.  Understandably, this may require some thought 
(ex: what if MyType is already under constraints from some other 
API that expects serialization? what does this look like if there 
are multiple serialization frameworks? etc etc).  I don't have 
the answer right now and I don't expect it to be solved quickly ;)
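A minimal sketch of that idea, assuming a plain byte slice stands in for the stream and big-endian wire order. The names popAs, Point and the static deserialize hook are hypothetical and are not part of iopipe or Phobos; only std.bitmanip.read is an existing library call.

```d
import std.bitmanip : read;
import std.system : Endian;
import std.traits : isIntegral;

/// Hypothetical helper: pop one integral value off the front of a byte slice.
T popAs(T)(ref const(ubyte)[] src)
    if (isIntegral!T)
{
    // std.bitmanip.read consumes T.sizeof bytes from the front of the slice.
    return src.read!(T, Endian.bigEndian)();
}

/// User-defined types opt in by providing a static deserialize function.
T popAs(T)(ref const(ubyte)[] src)
    if (is(typeof(T.deserialize(src)) : T))
{
    return T.deserialize(src);
}

/// Example type (an assumption for illustration only).
struct Point
{
    int x, y;

    static Point deserialize(ref const(ubyte)[] src)
    {
        Point p;
        p.x = src.popAs!int;
        p.y = src.popAs!int;
        return p;
    }
}

unittest
{
    const(ubyte)[] data = [0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 3];
    assert(data.popAs!int == 1);      // built-in type, works out of the box
    auto p = data.popAs!Point;        // user type, goes through deserialize
    assert(p.x == 2 && p.y == 3);
    assert(data.length == 0);
}
```

The open questions above (competing serialization frameworks, types already bound to another API) are not addressed by this sketch; it only shows the dispatch between built-in and user-provided deserialization.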


B.  The ability to attach parsers to streams easily.  This might be as
easy as coercing the input stream into the basic encoding that the
parser expects (ex: char/wchar/dchar Ranges for compilers, or maybe
ubyte Ranges for our PostgreSQL client's network layer), though it might
need (A) to help a bit first if the encoding isn't known in advance
(text files can be represented in sooo many ways!  isn't it fabulous!).


This is the fundamental goal for my library -- enabling parsers 
to read data from a "stream" efficiently no matter how that 
data is sourced. I know your time is limited, but I would 
invite you to take a look at the convert program example that I 
created in my library. In it, I handle converting any UTF 
format to any other UTF format.


https://github.com/schveiguy/iopipe/blob/master/examples/convert/convert.d



Awesome!



I understand that most unsuspecting programmers will arrive at a stream
library expecting to immediately see an InputRange interface.  This
/probably/ is not what they really want at the end of the day.  So, I
think it will be very important for any such library to concisely and
convincingly explain the design methodology and rationale early and
aggressively.  Neglect to do this, and the library and its
documentation will become a frustration and a violation of expectations
(an "astonishment"). Do it right, and the library's documentation will
become a teaching tool that leaves visitors feeling enlightened and
empowered.


Good points! I will definitely spend some time explaining this.



Best of luck :)

Of course, I have to wonder if someone else has contrasting experiences
with stream use-cases.  Maybe they really would be frustrated with a
range-agnostic design.  I don't want to alienate this hypothetical
individual either, so if this is you, then please share your experiences.


I hope this helps and is worth making a bunch of you read a 
wall of text ;)


Thanks for taking the time.

-Steve


Thank you for making progress on this problem!

- Chad


Re: Another new io library

2016-02-19 Thread Steven Schveighoffer via Digitalmars-d

On 2/19/16 6:27 AM, Dejan Lekic wrote:

Steven, this is superb!

Some 10+ years ago, I talked to Tango guys when they worked on I/O part
of the Tango library and told them that in my head ideal abstraction for
any I/O work is pipe and that I would actually build an I/O library
around this abstraction instead of the Channel in Java or Conduit in
Tango (well, we all know Tango borrowed ideas from Java API).

Your work is precisely what I was talking about. Well-done!



Thanks! It is definitely true that my time with Tango opened up my eyes 
to how I/O could be better. I actually wrote the ThreadPipe conduit: 
https://github.com/SiegeLord/Tango-D2/blob/d2port/tango/io/device/ThreadPipe.d


This is one of those libraries where the source code is almost writing 
itself. I feel like I got it right :) Took 5 tries though...


-Steve


Re: Another new io library

2016-02-19 Thread Steven Schveighoffer via Digitalmars-d

On 2/19/16 5:22 AM, Kagamin wrote:

On Thursday, 18 February 2016 at 18:27:28 UTC, Steven Schveighoffer wrote:

The philosophy that I settled on is to create an iopipe that extends
one "item" at a time, even if more are available. Then, apply the
range interface on that.

When I first started to write byLine, I made it a range. Then I
thought, "what if you wanted to iterate by 2 lines at a time, or
iterate by one line at a time, but see the last 2 for context?", well,
then that would be another type, and I'd have to abstract out the
functionality of line searching.


You mean window has current element and context - lookahead and
lookbehind? I stumbled across this article
http://blog.jooq.org/2016/01/06/2016-will-be-the-year-remembered-as-when-java-finally-had-window-functions/
it suggests that such window abstraction is generally useful for data
analysis.


window doesn't have any "current" pointer. The window itself is the 
current data. But with byLine, you could potentially remember where the 
last N lines were delineated. Hm...


auto byLineWithContext(size_t extraLines = 1, Chain)(Chain c)
{
   auto input = byLine(c);
   static struct ByLineWithContext
   {
      typeof(input) chain;
      // prevLines[i] is the window offset at which context line i ends
      size_t[extraLines] prevLines;
      // the current line is everything after the last context line
      auto front() { return chain.window[prevLines[$-1] .. $]; }
      void popFront()
      {
         // drop the oldest context line and shift the remaining offsets
         auto offset = prevLines[0];
         foreach(i; 0 .. prevLines.length-1)
         {
            prevLines[i] = prevLines[i+1] - offset;
         }
         prevLines[$-1] = chain.window.length - offset;
         chain.release(offset);
         chain.extend(0); // extend in the next line
      }
      bool empty()
      {
         // empty when no data follows the last context line
         return chain.window.length == prevLines[$-1];
      }
      // previous line of context (i = 0 is the oldest context line)
      auto contextLine(size_t i)
      {
         assert(i < prevLines.length);
         return chain.window[(i == 0 ? 0 : prevLines[i-1]) .. prevLines[i]];
      }
   }
   return ByLineWithContext(input);
}

It's an interesting transition to think about looking at an entire 
buffer of data instead of some pointer to a single point in a stream as 
the primitive that you have.
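To make the "whole buffer as the primitive" idea concrete, here is a small self-contained sketch. ToyLinePipe is a hypothetical stand-in over an in-memory string; it only borrows the names window, extend and release from the discussion above and is not the iopipe implementation.

```d
import std.string : indexOf;

/// Toy pipe that grows its window by exactly one line per extend call.
struct ToyLinePipe
{
    string source;   // all data that will ever arrive
    size_t start;    // first byte still held in the window
    size_t end;      // one past the last byte in the window

    /// The data currently visible to the consumer.
    @property string window() const { return source[start .. end]; }

    /// Add one more line to the window; returns the number of bytes added
    /// (0 at end of data). The argument is ignored in this toy.
    size_t extend(size_t elements = 0)
    {
        auto rest = source[end .. $];
        auto nl = rest.indexOf('\n');
        size_t added = nl < 0 ? rest.length : cast(size_t) nl + 1;
        end += added;
        return added;
    }

    /// Drop n bytes from the front of the window.
    void release(size_t n)
    {
        assert(n <= end - start);
        start += n;
    }
}

unittest
{
    auto pipe = ToyLinePipe("one\ntwo\nthree\n");
    pipe.extend();
    size_t prev = pipe.window.length;    // where the context line ends
    pipe.extend();
    auto w = pipe.window;
    assert(w[0 .. prev] == "one\n");     // context line
    assert(w[prev .. $] == "two\n");     // current line

    // Slide forward: release the old context, keep "two" as the new context.
    auto len = w.length;
    pipe.release(prev);
    prev = len - prev;
    pipe.extend();
    w = pipe.window;
    assert(w[0 .. prev] == "two\n");
    assert(w[prev .. $] == "three\n");
}
```

The consumer never holds a "current position" in the stream; it only decides how much of the window to keep and how much to release.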


-Steve


Re: Another new io library

2016-02-19 Thread Dejan Lekic via Digitalmars-d

Steven, this is superb!

Some 10+ years ago, I talked to Tango guys when they worked on 
I/O part of the Tango library and told them that in my head ideal 
abstraction for any I/O work is pipe and that I would actually 
build an I/O library around this abstraction instead of the 
Channel in Java or Conduit in Tango (well, we all know Tango 
borrowed ideas from Java API).


Your work is precisely what I was talking about. Well-done!



Re: Another new io library

2016-02-19 Thread Kagamin via Digitalmars-d
On Thursday, 18 February 2016 at 18:27:28 UTC, Steven 
Schveighoffer wrote:
The philosophy that I settled on is to create an iopipe that 
extends one "item" at a time, even if more are available. Then, 
apply the range interface on that.


When I first started to write byLine, I made it a range. Then I 
thought, "what if you wanted to iterate by 2 lines at a time, 
or iterate by one line at a time, but see the last 2 for 
context?", well, then that would be another type, and I'd have 
to abstract out the functionality of line searching.


You mean window has current element and context - lookahead and 
lookbehind? I stumbled across this article 
http://blog.jooq.org/2016/01/06/2016-will-be-the-year-remembered-as-when-java-finally-had-window-functions/
it suggests that such window abstraction is generally useful for data 
analysis.


Re: Another new io library

2016-02-18 Thread Steven Schveighoffer via Digitalmars-d

On 2/18/16 6:52 PM, Chad Joan wrote:

Steve: My apologies in advance if I misunderstood any of the
functionality of your IO library.  I haven't read any of the
documentation, just this thread, and my time is over-committed as usual.


Understandable.



Anyhow...

I believe that when I am dealing with streams, >90% of the time I am
dealing with data that is *structured* and *heterogeneous*. Here are
some use-cases:
1. Parsing/writing configuration files (ex: XML, TOML, etc)
2. Parsing/writing messages from some protocol, possibly over a network
socket (or sockets).  Example: I am writing a PostgreSQL client and need
to deserialize messages:
http://www.postgresql.org/docs/9.2/static/protocol-message-formats.html
3. Serializing/deserializing some data structures to/from disk. Example:
I am writing a game and I need to implement save/load functionality.
4. Serializing/deserializing tabular data to/from disk (ex: .CSV files).
5. Reading/writing binary data, such as images or video, from/to disk.
This will probably involve doing a bunch of (3), which is kind of like
(2), but followed by large homogenous arrays of some data (ex: pixels).
6. Receiving unstructured user input.  This is my <10%.

Note that (6) is likely to happen eventually but also likely to be
minuscule: why are we receiving user input?  Maybe it's just to store it
for retrieval later.  BUT, maybe we actually want it to DO something.
If we want it to do something, then we need to structure it before code
will be able to operate on it.

(5) is a mix of structured heterogeneous data and structured homogenous
data.  In aggregate, this is structured heterogeneous data, because you
need to do parsing to figure out where the arrays of homogeneous data
start and end (and what they *mean*).

This is why I think it will be much more important to have at least
these two interfaces take front-and-center:
A.  The presence of a .popAs!(...) operation (mentioned by Wyatt in this
thread, IIRC) for simple deserialization, and maybe for other
miscellaneous things like structured user interaction.


To me, this is a higher-level function. popAs cannot assume to know how 
to read what it is reading. If you mean something like reading an entire 
struct in binary form, that's not difficult to do.



B.  The ability to attach parsers to streams easily.  This might be as
easy as coercing the input stream into the basic encoding that the
parser expects (ex: char/wchar/dchar Ranges for compilers, or maybe
ubyte Ranges for our PostgreSQL client's network layer), though it might
need (A) to help a bit first if the encoding isn't known in advance
(text files can be represented in sooo many ways!  isn't it fabulous!).


This is the fundamental goal for my library -- enabling parsers to read 
data from a "stream" efficiently no matter how that data is sourced. I 
know your time is limited, but I would invite you to take a look at the 
convert program example that I created in my library. In it, I handle 
converting any UTF format to any other UTF format.


https://github.com/schveiguy/iopipe/blob/master/examples/convert/convert.d



I understand that most unsuspecting programmers will arrive at a stream
library expecting to immediately see an InputRange interface.  This
/probably/ is not what they really want at the end of the day.  So, I
think it will be very important for any such library to concisely and
convincingly explain the design methodology and rationale early and
aggressively.  Neglect to do this, and the library and its
documentation will become a frustration and a violation of expectations
(an "astonishment"). Do it right, and the library's documentation will
become a teaching tool that leaves visitors feeling enlightened and
empowered.


Good points! I will definitely spend some time explaining this.


Of course, I have to wonder if someone else has contrasting experiences
with stream use-cases.  Maybe they really would be frustrated with a
range-agnostic design.  I don't want to alienate this hypothetical
individual either, so if this is you, then please share your experiences.

I hope this helps and is worth making a bunch of you read a wall of text ;)


Thanks for taking the time.

-Steve


Re: Another new io library

2016-02-18 Thread Chad Joan via Digitalmars-d
On Wednesday, 17 February 2016 at 06:45:41 UTC, Steven 
Schveighoffer wrote:
It's no secret that I've been looking to create an updated io 
library for phobos. In fact, I've been working on one on and 
off since 2011 (ouch).


...


Hi everyone, it's been a while.

I wanted to chime in on the streams-as-ranges thing, since I've 
thought about this quite a bit in the past and discussed it with 
Wyatt outside of the forum.


Steve: My apologies in advance if I misunderstood any of the 
functionality of your IO library.  I haven't read any of the 
documentation, just this thread, and my time is over-committed 
as usual.


Anyhow...

I believe that when I am dealing with streams, >90% of the time I 
am dealing with data that is *structured* and *heterogeneous*.  
Here are some use-cases:

1. Parsing/writing configuration files (ex: XML, TOML, etc)
2. Parsing/writing messages from some protocol, possibly over a 
network socket (or sockets).  Example: I am writing a PostgreSQL 
client and need to deserialize messages: 
http://www.postgresql.org/docs/9.2/static/protocol-message-formats.html
3. Serializing/deserializing some data structures to/from disk.  
Example: I am writing a game and I need to implement save/load 
functionality.
4. Serializing/deserializing tabular data to/from disk (ex: .CSV 
files).
5. Reading/writing binary data, such as images or video, from/to 
disk.  This will probably involve doing a bunch of (3), which is 
kind of like (2), but followed by large homogenous arrays of some 
data (ex: pixels).

6. Receiving unstructured user input.  This is my <10%.

Note that (6) is likely to happen eventually but also likely to 
be minuscule: why are we receiving user input?  Maybe it's just 
to store it for retrieval later.  BUT, maybe we actually want it 
to DO something.  If we want it to do something, then we need to 
structure it before code will be able to operate on it.


(5) is a mix of structured heterogeneous data and structured 
homogenous data.  In aggregate, this is structured heterogeneous 
data, because you need to do parsing to figure out where the 
arrays of homogeneous data start and end (and what they *mean*).


This is why I think it will be much more important to have at 
least these two interfaces take front-and-center:
A.  The presence of a .popAs!(...) operation (mentioned by Wyatt 
in this thread, IIRC) for simple deserialization, and maybe for 
other miscellaneous things like structured user interaction.
B.  The ability to attach parsers to streams easily.  This might 
be as easy as coercing the input stream into the basic encoding 
that the parser expects (ex: char/wchar/dchar Ranges for 
compilers, or maybe ubyte Ranges for our PostgreSQL client's 
network layer), though it might need (A) to help a bit first if 
the encoding isn't known in advance (text files can be 
represented in sooo many ways!  isn't it fabulous!).


I understand that most unsuspecting programmers will arrive at a 
stream library expecting to immediately see an InputRange 
interface.  This /probably/ is not what they really want at the 
end of the day.  So, I think it will be very important for any 
such library to concisely and convincingly explain the design 
methodology and rationale early and aggressively.  Neglect to do 
this, and the library and its documentation will become a 
frustration and a violation of expectations (an "astonishment").  
Do it right, and the library's documentation will become a 
teaching tool that leaves visitors feeling enlightened and 
empowered.


Of course, I have to wonder if someone else has contrasting 
experiences with stream use-cases.  Maybe they really would be 
frustrated with a range-agnostic design.  I don't want to 
alienate this hypothetical individual either, so if this is you, 
then please share your experiences.


I hope this helps and is worth making a bunch of you read a wall 
of text ;)


- Chad


Re: Another new io library

2016-02-18 Thread H. S. Teoh via Digitalmars-d
On Thu, Feb 18, 2016 at 03:20:58PM -0500, Steven Schveighoffer via 
Digitalmars-d wrote:
> On 2/18/16 2:53 PM, Wyatt wrote:
> >On Thursday, 18 February 2016 at 18:35:40 UTC, Steven Schveighoffer wrote:
> 
> >>But the concept of what constitutes an "item" in a stream may not be
> >>the "element type". That's what I'm getting at.
> >>
> >Hmm, I guess I'm not seeing it.  Like, what even is an "item" in a
> >stream?  It sort of precludes that by definition, which is why we
> >have to give it a type manually.  What benefit is there to giving the
> >buffer type separately from the window that gives you a typed slice
> >into it? (I like that, btw.)
> 
> An "item" in a stream may be a line of text, it may be a packet of
> data, it may actually be a byte. But the compiler requires we type the
> buffer as something rigid that it can work with.
> 
> The elements of the stream are the basic fixed-sized units we use (the
> array element type). The items are less concrete.
[...]

But array elements don't necessarily have to be fixed-sized, do they?
For example, an array of lines can be string[] (or const(char)[][]). Of
course, dealing with variable-sized items is messy, and probably rather
annoying to implement.  But it's *possible*, in theory.


T

-- 
People tell me that I'm paranoid, but they're just out to get me.


Re: Another new io library

2016-02-18 Thread Steven Schveighoffer via Digitalmars-d

On 2/18/16 4:02 PM, H. S. Teoh via Digitalmars-d wrote:

On Thu, Feb 18, 2016 at 03:20:58PM -0500, Steven Schveighoffer via 
Digitalmars-d wrote:

On 2/18/16 2:53 PM, Wyatt wrote:

On Thursday, 18 February 2016 at 18:35:40 UTC, Steven Schveighoffer wrote:



But the concept of what constitutes an "item" in a stream may not be
the "element type". That's what I'm getting at.


Hmm, I guess I'm not seeing it.  Like, what even is an "item" in a
stream?  It sort of precludes that by definition, which is why we
have to give it a type manually.  What benefit is there to giving the
buffer type separately from the window that gives you a typed slice
into it? (I like that, btw.)


An "item" in a stream may be a line of text, it may be a packet of
data, it may actually be a byte. But the compiler requires we type the
buffer as something rigid that it can work with.

The elements of the stream are the basic fixed-sized units we use (the
array element type). The items are less concrete.

[...]

But array elements don't necessarily have to be fixed-sized, do they?
For example, an array of lines can be string[] (or const(char)[][]). Of
course, dealing with variable-sized items is messy, and probably rather
annoying to implement.  But it's *possible*, in theory.


But the point of a stream is that it's contiguous data. A string[] has 
contiguous data that are pointers and lengths of a fixed size 
(sizeof(string) is fixed).
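A short illustration of that point: in D a slice is a (length, pointer) pair, so the elements of a string[] are fixed-size even though the text they refer to is not. The helper payloadBytes is a hypothetical name used only for this sketch.

```d
// A D slice is a (length, pointer) pair, so its size does not depend on
// how much text it refers to.
static assert(string.sizeof == 2 * size_t.sizeof);

/// Total bytes of character data referenced by the lines. This data lives
/// outside the array itself and need not be contiguous.
size_t payloadBytes(const string[] lines)
{
    size_t total;
    foreach (line; lines)
        total += line.length;
    return total;
}

unittest
{
    string[] lines = ["a", "a much longer line of text"];
    // The array stores two fixed-size slices back to back.
    assert(cast(ubyte*) &lines[1] - cast(ubyte*) &lines[0] == string.sizeof);
    // The variable-sized part is only reachable through the pointers.
    assert(payloadBytes(lines) == lines[0].length + lines[1].length);
}
```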


This is not how you'd get data from a file or socket.

Since this library doesn't discriminate what the data source provides 
(it will accept string[] as window type), it's possible. In this case, 
the element type might make sense as the range front type, but it's not 
a typical case. However, it might be interesting as, say, a message 
stream from one thread to another.


-Steve


Re: Another new io library

2016-02-18 Thread Steven Schveighoffer via Digitalmars-d

On 2/18/16 2:53 PM, Wyatt wrote:

On Thursday, 18 February 2016 at 18:35:40 UTC, Steven Schveighoffer wrote:



But the concept of what constitutes an "item" in a stream may not be
the "element type". That's what I'm getting at.


Hmm, I guess I'm not seeing it.  Like, what even is an "item" in a
stream?  It sort of precludes that by definition, which is why we have
to give it a type manually.  What benefit is there to giving the buffer
type separately from the window that gives you a typed slice into it? (I
like that, btw.)


An "item" in a stream may be a line of text, it may be a packet of data, 
it may actually be a byte. But the compiler requires we type the buffer 
as something rigid that it can work with.


The elements of the stream are the basic fixed-sized units we use (the 
array element type). The items are less concrete.
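As a sketch of the distinction, assume a made-up framing in which each packet is one length byte followed by that many payload bytes. The elements are ubytes; the items are packets of varying size. The function name and the framing are assumptions for illustration only.

```d
/// Returns the length of the first complete item in `window`, or 0 if more
/// elements are needed. Assumed framing: 1 length byte, then the payload.
size_t firstItemLength(const(ubyte)[] window)
{
    if (window.length == 0)
        return 0;
    size_t needed = 1 + window[0];
    return window.length >= needed ? needed : 0;
}

unittest
{
    // Seven elements, but only two items.
    const(ubyte)[] window = [2, 0xAA, 0xBB, 3, 1, 2, 3];

    auto n = firstItemLength(window);
    assert(n == 3);
    assert(window[1] == 0xAA && window[2] == 0xBB);

    window = window[n .. $];                        // move past the first item
    assert(firstItemLength(window) == 4);
    assert(firstItemLength(window[0 .. 2]) == 0);   // incomplete item
}
```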



And I think parsing/processing stream data works better by examining
the buffer than shoehorning range functions in there.


I think it's debatable.  But part of stream semantics is being able to
use it like a stream, and my BER toy was in that vein. Sorry again, this
is probably not the place for it unless you try to replace the
std.stream for real.


I think stream semantics are what you should use. I haven't used 
std.stream, so I don't know what the API looks like.


I assumed as! was something that returns a range of that type. Maybe I'm 
wrong?


-Steve


Re: Another new io library

2016-02-18 Thread Wyatt via Digitalmars-d
On Thursday, 18 February 2016 at 18:35:40 UTC, Steven 
Schveighoffer wrote:

On 2/18/16 12:08 PM, Wyatt wrote:


I hadn't thought of this before, but if we accept that a stream is raw,
untyped data, it may be best _not_ to provide a range interface
directly.  It's easy enough to

alias source = sourceStream.as!ubyte;

anyway, right?


An iopipe is typed however you want it to be.

Sorry, sorry, just thinking (too much?) in terms of the 
conceptual underpinnings.


But I don't think we really disagree, either: if you don't give a 
stream a type it doesn't have one "naturally", so it's best to be 
explicit even if you're just asking for raw bytes.  That's all 
I'm really saying there.


But the concept of what constitutes an "item" in a stream may 
not be the "element type". That's what I'm getting at.


Hmm, I guess I'm not seeing it.  Like, what even is an "item" in 
a stream?  It sort of precludes that by definition, which is why 
we have to give it a type manually.  What benefit is there to 
giving the buffer type separately from the window that gives you 
a typed slice into it? (I like that, btw.)


However, you have some issues there :) popFront doesn't return 
anything.


Clearly, as!() returns the data! ;)

But criminy, I do actually forget that ALL the damn time!  (I 
blame Broadcom.)  The worst part is I think I've even read the 
rationale for why it's like that and agreed with it with much 
nodding of the head and all that. :(


And I think parsing/processing stream data works better by 
examining the buffer than shoehorning range functions in there.


I think it's debatable.  But part of stream semantics is being 
able to use it like a stream, and my BER toy was in that vein.  
Sorry again, this is probably not the place for it unless you try 
to replace the std.stream for real.


-Wyatt


Re: Another new io library

2016-02-18 Thread Steven Schveighoffer via Digitalmars-d

On 2/18/16 12:08 PM, Wyatt wrote:

On Thursday, 18 February 2016 at 15:44:00 UTC, Steven Schveighoffer wrote:

On 2/17/16 5:54 AM, John Colvin wrote:

On Wednesday, 17 February 2016 at 07:15:01 UTC, Steven Schveighoffer
wrote:

On 2/17/16 1:58 AM, Rikki Cattermole wrote:


A few things:
https://github.com/schveiguy/iopipe/blob/master/source/iopipe/traits.d#L126


why isn't that used more especially with e.g. window?
After all, window seems like a very well used word...


Not sure what you mean.


I don't like that a stream isn't inherently an input range.
This seems to me like a good place to use this abstraction by default.


What is front for an input stream? A byte? A character? A word? A line?


Why not just say it's a ubyte and then compose with ranges from there?


If I provide a range by element (it may not be ubyte), then that's
likely not the most useful range to have.


I hadn't thought of this before, but if we accept that a stream is raw,
untyped data, it may be best _not_ to provide a range interface
directly.  It's easy enough to

alias source = sourceStream.as!ubyte;

anyway, right?


An iopipe is typed however you want it to be.

bufferedInput by default uses an ArrayBuffer!ubyte. You can have it use 
any type of buffer you want, it doesn't discriminate. The only 
requirement is that the buffer's window is a random-access range 
(although I'm having thoughts that I should just require it to be an array).


But the concept of what constitutes an "item" in a stream may not be the 
"element type". That's what I'm getting at.





This is why I think it's better to have the user specifically tell me
"this is how I want to range-ify this stream" rather than assume.


I think this makes more sense with TLV encodings, too.  Thinking of
things like:

switch(source.as!(BERType).popFront){
 case(UNIVERSAL|PRIMITIVE|UTF8STRING){
 int len;
 if(source.as!(BERLength).front & 0b10_00_00_00) {
 // X.690? Never heard of 'em!
 } else {
 len = source.as!(BERLength).popFront;
 }
 return source.buffered(len).as!(string).popFront;
 }
 ...etc.
}


Very cool looking!

However, you have some issues there :) popFront doesn't return anything. 
And I think parsing/processing stream data works better by examining the 
buffer than shoehorning range functions in there.
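A rough sketch of what "examining the buffer" could look like for the UTF8String case, assuming the window is a plain byte slice. The function name is hypothetical, only short-form BER lengths are handled, and the bytes are not validated as UTF-8.

```d
enum ubyte berUtf8String = 0x0C;   // universal, primitive, tag number 12

/// Tries to read one short-form UTF8String TLV from the front of `window`.
/// On success stores the text in `text` and returns the bytes consumed;
/// returns 0 if the window is incomplete or holds something else.
size_t readUtf8String(const(ubyte)[] window, out const(char)[] text)
{
    if (window.length < 2 || window[0] != berUtf8String)
        return 0;
    if (window[1] & 0x80)          // long-form length: not handled here
        return 0;
    size_t len = window[1];
    if (window.length < 2 + len)
        return 0;                  // caller should extend the window and retry
    // Reinterpret the payload in place; nothing is copied.
    text = cast(const(char)[]) window[2 .. 2 + len];
    return 2 + len;
}

unittest
{
    // Tag, length 2, "hi", then the first byte of a following TLV.
    const(ubyte)[] window = [0x0C, 0x02, 0x68, 0x69, 0x0C];
    const(char)[] text;

    auto used = readUtf8String(window, text);
    assert(used == 4 && text == "hi");

    window = window[used .. $];                  // the equivalent of release(used)
    assert(readUtf8String(window, text) == 0);   // needs more data
}
```

The parser looks at the window, decides how much it understood, and reports that count; advancing the stream is a separate step.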


-Steve


Re: Another new io library

2016-02-18 Thread Steven Schveighoffer via Digitalmars-d

On 2/18/16 12:16 PM, Wyatt wrote:

On Thursday, 18 February 2016 at 16:36:37 UTC, Steven Schveighoffer wrote:

Note, asInputRange may not do what you want here. If multiple
zmqPollItems come in at once (I'm not sure how your socket works), the
input range's front will provide the entire window of data, and flush
it on popFront.


Not so great!  That's really not what I'd expect at all. :( (This isn't
to say it doesn't make sense semantically, but I don't like how it feels.)


The philosophy that I settled on is to create an iopipe that extends one 
"item" at a time, even if more are available. Then, apply the range 
interface on that.


When I first started to write byLine, I made it a range. Then I thought, 
"what if you wanted to iterate by 2 lines at a time, or iterate by one 
line at a time, but see the last 2 for context?", well, then that would 
be another type, and I'd have to abstract out the functionality of line 
searching.


So I decided to just make an abstract "asInputRange" and just wrap the 
functionality of extending data one line at a time. The idea is to make 
building blocks as simple and useful as possible.


So what I think may be a good fit for your application (without knowing 
all the details) is to create an iopipe that delineates each message and 
extends exactly one message per call to extend. Then, you can wrap that 
in asInputRange, or create your own range which translates the actual 
binary data to a nicer object for each call to front.


So something like:

foreach(pollItem; zmqSocket.bufferedInput
.byZmqPacket
.asInputRange)

I'm still not 100% sure that this is the right way to do it...

Hm... if asInputRange took a template parameter of what type it should 
return, then asInputRange!zmqPacket could return zmqPacket(pipe.window) 
for front. That's kind of nice.
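One way to picture that last idea, assuming the per-message windows are already available as a range of byte slices. Packet and asRangeOf are hypothetical names for this sketch; the real asInputRange may look quite different.

```d
import std.algorithm.iteration : map;

/// Hypothetical typed view over one message: a kind byte plus payload.
struct Packet
{
    const(ubyte)[] raw;
    ubyte kind() const { return raw[0]; }
    const(ubyte)[] payload() const { return raw[1 .. $]; }
}

/// Wraps each window produced by `windows` in a T, without copying bytes.
auto asRangeOf(T, R)(R windows)
{
    return windows.map!(w => T(w));
}

unittest
{
    const(ubyte)[] first = [1, 10, 11];
    const(ubyte)[] second = [2, 20];
    auto messages = [first, second];   // one window per message

    auto packets = messages.asRangeOf!Packet;
    assert(packets.front.kind == 1);
    assert(packets.front.payload.length == 2);
    packets.popFront();
    assert(packets.front.kind == 2);
}
```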



I'm thinking I'll change the name byInputRange to byWindow, and add a
byElement for an element-wise input range.


Oh, I see.  Naming.  Naming is hard.


Yes. It's especially hard when you haven't seen how others react to it :)

-Steve


Re: Another new io library

2016-02-18 Thread Wyatt via Digitalmars-d
On Thursday, 18 February 2016 at 16:36:37 UTC, Steven 
Schveighoffer wrote:

On 2/18/16 11:07 AM, Wyatt wrote:
This looks pretty all-right so far.  Would something like this 
work?


foreach(pollItem; zmqSocket.bufferedInput
 .as!(zmqPollItem)
 .asInputRange)


Yes, that is the intent. All without copying.


Great!

Note, asInputRange may not do what you want here. If multiple 
zmqPollItems come in at once (I'm not sure how your socket 
works), the input range's front will provide the entire window 
of data, and flush it on popFront.


Not so great!  That's really not what I'd expect at all. :(  
(This isn't to say it doesn't make sense semantically, but I 
don't like how it feels.)


I'm thinking I'll change the name byInputRange to byWindow, and 
add a byElement for an element-wise input range.



Oh, I see.  Naming.  Naming is hard.

-Wyatt


Re: Another new io library

2016-02-18 Thread Wyatt via Digitalmars-d
On Thursday, 18 February 2016 at 15:44:00 UTC, Steven 
Schveighoffer wrote:

On 2/17/16 5:54 AM, John Colvin wrote:
On Wednesday, 17 February 2016 at 07:15:01 UTC, Steven 
Schveighoffer wrote:

On 2/17/16 1:58 AM, Rikki Cattermole wrote:


A few things:
https://github.com/schveiguy/iopipe/blob/master/source/iopipe/traits.d#L126

why isn't that used more especially with e.g. window?
After all, window seems like a very well used word...


Not sure what you mean.


I don't like that a stream isn't inherently an input range.
This seems to me like a good place to use this abstraction 
by default.


What is front for an input stream? A byte? A character? A 
word? A line?


Why not just say it's a ubyte and then compose with ranges 
from there?


If I provide a range by element (it may not be ubyte), then 
that's likely not the most useful range to have.


I hadn't thought of this before, but if we accept that a stream 
is raw, untyped data, it may be best _not_ to provide a range 
interface directly.  It's easy enough to


alias source = sourceStream.as!ubyte;

anyway, right?

This is why I think it's better to have the user specifically 
tell me "this is how I want to range-ify this stream" rather 
than assume.


I think this makes more sense with TLV encodings, too.  Thinking 
of things like:


switch(source.as!(BERType).popFront){
case(UNIVERSAL|PRIMITIVE|UTF8STRING){
int len;
if(source.as!(BERLength).front & 0b10_00_00_00) {
// X.690? Never heard of 'em!
} else {
len = source.as!(BERLength).popFront;
}
return source.buffered(len).as!(string).popFront;
}
...etc.
}

Musing: I'd probably want a helper like popAs!() so I don't 
forget popFront()...
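A sketch of such a helper for ordinary input ranges, leaving out the type conversion part. The name popNext is made up for this example; it relies only on the standard range primitives.

```d
import std.range.primitives : isInputRange, ElementType, front, popFront, empty;

/// Hypothetical helper: return the current front and advance the range,
/// so the caller cannot forget the popFront.
ElementType!R popNext(R)(ref R r)
    if (isInputRange!R)
{
    assert(!r.empty, "popNext called on an empty range");
    auto value = r.front;
    r.popFront();
    return value;
}

unittest
{
    int[] data = [10, 20, 30];
    assert(data.popNext == 10);
    assert(data.popNext == 20);
    assert(data == [30]);
}
```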


-Wyatt


Re: Another new io library

2016-02-18 Thread Steven Schveighoffer via Digitalmars-d

On 2/18/16 11:07 AM, Wyatt wrote:

On Wednesday, 17 February 2016 at 06:45:41 UTC, Steven Schveighoffer wrote:


foreach(line; (new IODevice(0)).bufferedInput
.asText!(UTFType.UTF8)
.byLine
.asInputRange)
   // handle line


This looks pretty all-right so far.  Would something like this work?

foreach(pollItem; zmqSocket.bufferedInput
 .as!(zmqPollItem)
 .asInputRange)


Yes, that is the intent. All without copying.

Note, asInputRange may not do what you want here. If multiple 
zmqPollItems come in at once (I'm not sure how your socket works), the 
input range's front will provide the entire window of data, and flush it 
on popFront.


I'll also point at arrayCastPipe 
(https://github.com/schveiguy/iopipe/blob/master/source/iopipe/bufpipe.d#L399), 
which simply casts the input array window to a new type of array window 
(if the items are coming in binary form).
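The underlying language feature can be sketched as follows, assuming the window is a plain byte slice. castWindow is a hypothetical name and is not the arrayCastPipe implementation; it only shows the kind of array cast involved.

```d
/// Reinterpret a byte window as a window of T without copying.
/// The byte length must be a multiple of T.sizeof (D checks this at run time).
const(T)[] castWindow(T)(const(ubyte)[] window)
{
    return cast(const(T)[]) window;
}

unittest
{
    // Start from uint[] so the bytes are suitably aligned for uint.
    uint[] values = [1, 2];
    auto bytes = cast(const(ubyte)[]) values;
    assert(bytes.length == 8);

    auto back = castWindow!uint(bytes);
    assert(back == values);
    assert(back.ptr is values.ptr);   // same memory, nothing was copied
}
```

The values seen after the cast depend on the byte order of the machine, so data from a file or socket may still need byte swapping.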


I'm thinking I'll change the name byInputRange to byWindow, and add a 
byElement for an element-wise input range.





6. There is a concept in here I called "valves". It's very weird, but
it allows unifying input and output into one seamless chain. In fact,
I can't think of how I could have done output in this regime without
them. See the convert example application for details on how it is used.


This... might be cool?  It bears some similarity to my own ideas.  I'd
like to see more examples, though.


I'm hoping people can come up with ideas for other uses for them. I 
really like the concept, but the only use case I have right now is 
output streams.


It would be cool to see if there's a use case for multiple valves.

-Steve


Re: Another new io library

2016-02-18 Thread Steven Schveighoffer via Digitalmars-d

On 2/17/16 5:47 PM, deadalnix wrote:

First, I'm very happy to see that. Sounds like a good project. Some
remarks:
  - You seem to be using classes. These are good to compose at runtime,


I have one class, the IODevice. As I said in the announcement, this 
isn't a focus of the library, just a way to play with the other pieces 
:) Its utility isn't very important. One thing it does do (a relic from 
when I was thinking of trying to replace stdio.File innards) is take 
over a FILE *, and close the FILE * on destruction.


But I'm steadfastly against using classes for the meat of the library 
(i.e. the range-like pipeline types). I do happen to think classes work 
well for raw i/o, since the OS treats i/o items that way (e.g. a network 
socket is a file descriptor, not some other type), but it would be nice 
if you could have class features for non-GC lifetimes. Classes are bad 
for correct deallocation of i/o resources.



  - Being able to read/write from an io device in a generator-like
manner is I think important if we are rolling out something new.


I'm not quite sure what this means.


Literally the only thing that can explain the success of Node.js is this
(everything else is crap). See async/await in C#


I was hoping async I/O could be handled the way vibe does (i.e. under the 
hood with fibers).



  - Please explain valves more.


Valves allow all the types that process buffered input to process 
buffered output without changing pretty much anything. It allows me to 
have a "push" mechanism by pulling from the other end automatically.


In essence, the problem of buffered input is very different from the 
problem of buffered output. One is pulling data chunks at a time, and 
processing in finer detail, the other is processing data in finer detail 
and then pushing out chunks that are ready.


The big difference is the end of the pipe that needs user intervention. 
For input, the user is the consumer of data. With output, the user is 
the provider of data.


The problem is, how do you construct such a pipeline? The iopipe 
convention is to wrap the upstream data. For output, the upstream data 
is what you need access to. A std.algorithm.map doesn't give you access 
to the underlying range, right? So if you need access to the earlier 
part of the pipeline, how do you get to it? And how do you know how FAR 
to reach for it (i.e. pipeline.subpipe.subpipe.subpipe)?


This is what the valve is for. The valve has 3 parts, the inlet, the 
processed data, and the outlet. The inlet works like a normal iopipe, 
but instead of releasing data upstream, it pushes the data to the 
processed data area. The outlet can only pull data from the processed 
data. So this really provides a way for the user to control the flow of 
data. (note, a lot of this is documented in the concepts.txt document)


The reason it's special is because every iopipe is required to provide 
access to an upstream valve inlet if it exists. This makes the API of 
accessing the upstream data MUCH easier to deal with. (i.e. pipeline.valve)


Then I have this wrapper called autoValve, which automatically flushes 
the downstream data when more space is needed, and makes it look like 
you are just dealing with the upstream end. This is exactly the model we 
need for buffered output.


This way, I can have a push mechanism for output, and all the processing 
pieces (for instance, byte swapping, converting to a different array 
type, etc.) don't even need to care about providing a push mechanism.
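
To make the three parts concrete, here is a toy model of a valve as described above. It is only a sketch under my own assumptions: the names ToyValve, held, filled and drain are made up for illustration, and iopipe's actual valve types are organized differently (see concepts.txt).

```d
// Toy model of the three regions described above. All names are
// hypothetical; this is not iopipe's implementation.
struct ToyValve
{
    char[] buf;      // backing storage shared by both sides
    size_t held;     // buf[0 .. held] is processed data awaiting the outlet
    size_t filled;   // buf[held .. filled] is the inlet's current window

    // Inlet: the window the user (or a processor) works in.
    char[] window() { return buf[held .. filled]; }

    // Inlet: ask for n more elements of space in the window.
    void extend(size_t n)
    {
        if (filled + n > buf.length)
            buf.length = filled + n;
        filled += n;
    }

    // Inlet: "release" does not discard; it hands data to the processed area.
    void release(size_t n) { held += n; }

    // Outlet: take everything processed so far, then compact the buffer.
    string drain()
    {
        auto result = buf[0 .. held].idup;
        // Forward copy is safe here because the destination precedes the source.
        foreach (i; 0 .. filled - held)
            buf[i] = buf[held + i];
        filled -= held;
        held = 0;
        return result;
    }
}

unittest
{
    ToyValve v;
    v.extend(5);
    v.window()[] = "hello";
    v.release(5);              // pushed downstream instead of discarded
    v.extend(3);
    v.window()[] = "abc";      // still in the inlet, not visible downstream
    assert(v.drain() == "hello");
    assert(v.window() == "abc");
}
```

In this picture, something like autoValve would call the equivalent of drain for you whenever extend needs more room, so the user only ever touches the inlet side.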



  - Profit ?


Yes, absolutely :)

-Steve


Re: Another new io library

2016-02-18 Thread Wyatt via Digitalmars-d
On Wednesday, 17 February 2016 at 06:45:41 UTC, Steven 
Schveighoffer wrote:


foreach(line; (new IODevice(0)).bufferedInput
        .asText!(UTFType.UTF8)
        .byLine
        .asInputRange)
    // handle line

This looks pretty all-right so far.  Would something like this 
work?


foreach(pollItem; zmqSocket.bufferedInput
        .as!(zmqPollItem)
        .asInputRange)

3. The focus of this library is NOT replacement of std.stream, 
or even low-level i/o in general.


Oh.  Well maybe that's not the case, but it may have potential 
anyway.  If nothing else, for testing API concepts.


6. There is a concept in here I called "valves". It's very 
weird, but it allows unifying input and output into one 
seamless chain. In fact, I can't think of how I could have done 
output in this regime without them. See the convert example 
application for details on how it is used.


This... might be cool?  It bears some similarity to my own ideas. 
 I'd like to see more examples, though.


-Wyatt


Re: Another new io library

2016-02-18 Thread Steven Schveighoffer via Digitalmars-d

On 2/17/16 9:52 AM, Adam D. Ruppe wrote:

On Wednesday, 17 February 2016 at 10:54:56 UTC, John Colvin wrote:

Why not just say it's a ubyte and then compose with ranges from there?


You could put a range interface on it... but I think it would be of very
limited value. For one, what about fseek? How does that interact with
the range interface?


Seeking a stream is not a focus of my library. I'm focusing on raw data 
throughput for an established pipeline that you expect not to move around.


A seek would require resetting the pipeline (something that is possible, 
but I haven't planned for it).



Or, what about reading a network interface where you get variable-sized
packets?


This I HAVE planned for, and it should work quite nicely. I agree that 
providing a by-default range interface may not be the most useful thing.



Copying it into a buffer is probably the most sane... but it is a
wasteful copy if your existing buffer has enough space. But how do you
say that to a range? popFront takes no arguments.


The asInputRange adapter in iopipe/bufpipe.d provides the following 
crude interface:


1. front is the current window
2. empty returns true if the window is empty.
3. popFront discards the window, and extends in the next window.

With this, any ioPipe can be turned into a crude range. It should be 
good enough for things like std.algorithm.copy. And in the case of 
byLine, it allows one to create an iopipe that caters to creating a 
range, while also giving useful functionality as a pipe.
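
The three rules above can be sketched as follows. This is a hedged illustration, not the code in iopipe/bufpipe.d: ChunkPipe, WindowRange and asWindowRange are hypothetical names, and the toy pipe's extend takes no arguments to keep the example short.

```d
import std.algorithm.comparison : min;

// Toy pipe over an in-memory string. The window/extend/release vocabulary
// follows this thread, but the type itself is made up for illustration.
struct ChunkPipe
{
    string data;        // the whole underlying "stream"
    size_t start;       // window is data[start .. end]
    size_t end;
    size_t chunk = 4;   // how many elements each extend() tries to add

    string window() { return data[start .. end]; }

    // Grow the window by up to `chunk` elements; returns how many were added.
    size_t extend()
    {
        immutable n = min(chunk, data.length - end);
        end += n;
        return n;
    }

    // Drop n elements from the front of the window.
    void release(size_t n) { start += n; }
}

// Crude range view: front is the window, popFront swaps in the next one.
struct WindowRange(Pipe)
{
    Pipe pipe;

    @property auto front() { return pipe.window(); }
    @property bool empty() { return pipe.window().length == 0; }

    void popFront()
    {
        pipe.release(pipe.window().length);   // discard the current window
        pipe.extend();                        // extend into the next one
    }
}

auto asWindowRange(Pipe)(Pipe pipe)
{
    pipe.extend();   // prime the first window
    return WindowRange!Pipe(pipe);
}

unittest
{
    string[] seen;
    foreach (w; asWindowRange(ChunkPipe("abcdefghij")))
        seen ~= w;
    assert(seen == ["abcd", "efgh", "ij"]);
}
```

Note that each front is a whole window, not a single element, which is why this only counts as a "crude" range.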


I'm on the fence as to whether all ioPipes should be ranges. Yes, it's 
easy to do (though a lot of boilerplate, you can't UFCS this), but I 
just can't see the use case being worth it.



Ranges are great for a sequence of data that is the same type on each
call. Files, however, tend to have variable length (which you might want
to skip large sections of) and different types of data as you iterate
through them.


Very much agree.


I find std.stdio's byChunk and byLine to be almost completely useless in
my cases.


byLine I find useful (think of grep), byChunk I've never found a reason 
to use.


-Steve


Re: Another new io library

2016-02-18 Thread Steven Schveighoffer via Digitalmars-d

On 2/17/16 5:54 AM, John Colvin wrote:

On Wednesday, 17 February 2016 at 07:15:01 UTC, Steven Schveighoffer wrote:

On 2/17/16 1:58 AM, Rikki Cattermole wrote:


A few things:
https://github.com/schveiguy/iopipe/blob/master/source/iopipe/traits.d#L126

why isn't that used more especially with e.g. window?
After all, window seems like a very well used word...


Not sure what you mean.


I don't like that a stream isn't inherently an input range.
This seems to me like a good place to use this abstraction by default.


What is front for an input stream? A byte? A character? A word? A line?


Why not just say it's a ubyte and then compose with ranges from there?


If I provide a range by element (it may not be ubyte), then that's 
likely not the most useful range to have.


For example, the byLine iopipe gives you one more line of data each time 
you call extend. But the data in the window is not necessarily one line, 
and the element type is char, wchar, or dchar. None of those, I would 
think, is what someone would expect or want.


This is why I think it's better to have the user specifically tell me 
"this is how I want to range-ify this stream" rather than assume.


-Steve


Re: Another new io library

2016-02-18 Thread Steven Schveighoffer via Digitalmars-d

On 2/17/16 3:54 AM, yawniek wrote:

On Wednesday, 17 February 2016 at 07:15:01 UTC, Steven Schveighoffer wrote:

On 2/17/16 1:58 AM, Rikki Cattermole wrote:
What would be the benefit of having it an input range by default?


https://en.wikipedia.org/wiki/Principle_of_least_astonishment
something the D community is lacking a bit in general imho.


There are exceptions (e.g. byLine), but the likelihood that a default 
range interface is the one the user would expect is pretty low.



but awesome library, will definitely use, thanks!


Thanks! Please let me know what you think if you end up using it.

-Steve


Re: Another new io library

2016-02-17 Thread deadalnix via Digitalmars-d
On Wednesday, 17 February 2016 at 23:15:51 UTC, Jonathan M Davis 
wrote:

On Wednesday, 17 February 2016 at 22:47:27 UTC, deadalnix wrote:
See async/await in C# 
(https://msdn.microsoft.com/fr-fr/library/hh191443.aspx)


Or for those poor souls who can't read French... ;)

https://msdn.microsoft.com/en-us/library/hh191443.aspx

- Jonathan M Davis


Thank you for the fixup :)


Re: Another new io library

2016-02-17 Thread Jonathan M Davis via Digitalmars-d

On Wednesday, 17 February 2016 at 22:47:27 UTC, deadalnix wrote:
See async/await in C# 
(https://msdn.microsoft.com/fr-fr/library/hh191443.aspx)


Or for those poor souls who can't read French... ;)

https://msdn.microsoft.com/en-us/library/hh191443.aspx

- Jonathan M Davis


Re: Another new io library

2016-02-17 Thread deadalnix via Digitalmars-d
First, I'm very happy to see that. Sounds like a good project. 
Some remarks:
 - You seem to be using classes. These are good to compose at 
runtime, but we can do better at compile time using value types. 
I suggest using value types and have a class wrapper that can be 
used to make things composable at runtime if desirable.
 - Being able to read/write from an io device in a generator-like 
manner is I think important if we are rolling out something new. 
Literally the only thing that can explain the success of Node.js 
is this (everything else is crap). See async/await in C# 
(https://msdn.microsoft.com/fr-fr/library/hh191443.aspx) or Hack 
(https://docs.hhvm.com/hack/async/introduction).

 - I like the input range stuff. Input ranges need more love.
 - Please explain valves more.
 - ...
 - Profit ?


Re: Another new io library

2016-02-17 Thread Adam D. Ruppe via Digitalmars-d

On Wednesday, 17 February 2016 at 10:54:56 UTC, John Colvin wrote:
Why not just say it's a ubyte and then compose with ranges from 
there?


You could put a range interface on it... but I think it would be 
of very limited value. For one, what about fseek? How does that 
interact with the range interface?



Or, what about reading a network interface where you get 
variable-sized packets?


A ubyte[] is probably the closest thing you can get to 
usefulness, but even then you'd need non-range buffering controls 
to make it efficient and usable. Consider the following:


Packet 1: 11\nHello
Packet 2:  World05\nD ro
Packet 3: x


You take the ubyte[] thing that gives each packet at a time as it 
comes off the hardware interface. Good, you can process as it 
comes and it fits the range interface.


But it isn't terribly useful. Are you going to copy the partial 
message into another buffer so the next range.popFront doesn't 
overwrite it? Or will you present the incomplete message from 
packet 1 to the consumer? The former is less than efficient (and 
still needs to wrap the range in some other interface to make the 
user code pretty) and the latter leads to ugly user code being 
directly exposed.


Copying it into a buffer is probably the most sane... but it is a 
wasteful copy if your existing buffer has enough space. But how 
do you say that to a range? popFront takes no arguments.


What about packet 2, which has part of the first message and part 
of the second message? Can you tell it that you already consumed 
the first six bytes and it can now append the next packet to the 
existing buffer, but please return that slice on the next call?
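
To show the bookkeeping involved, here is a small sketch that reassembles the three packets above, assuming the framing is "decimal length, newline, payload". The Reassembler type is hypothetical, and it deliberately takes the simple route of copying every packet into one growing buffer, which is exactly the copy the paragraphs above would like to avoid.

```d
import std.conv : to;
import std.string : indexOf;

// Hypothetical reassembler for "<length>\n<payload>" framing.
struct Reassembler
{
    char[] pending;   // bytes received but not yet consumed

    // Append one packet, then return every message that is now complete.
    string[] feed(const(char)[] packet)
    {
        pending ~= packet;   // the copy a smarter buffer could avoid
        string[] messages;
        while (true)
        {
            immutable nl = pending.indexOf('\n');
            if (nl < 0)
                break;                        // length header still incomplete
            immutable len = pending[0 .. nl].to!size_t;
            immutable end = nl + 1 + len;
            if (pending.length < end)
                break;                        // payload still incomplete
            messages ~= pending[nl + 1 .. end].idup;
            pending = pending[end .. $];      // drop the consumed prefix
        }
        return messages;
    }
}

unittest
{
    Reassembler r;
    assert(r.feed("11\nHello").length == 0);               // partial message
    assert(r.feed(" World05\nD ro") == ["Hello World"]);   // finishes #1, starts #2
    assert(r.feed("x") == ["D rox"]);                      // finishes #2
}
```

A plain range of ubyte[] packets has no place to express the "keep the tail, append the next packet" step; that state has to live in something like the pending buffer here.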




Ranges are great for a sequence of data that is the same type on 
each call. Files, however, tend to have variable length (which 
you might want to skip large sections of) and different types of 
data as you iterate through them.


I find std.stdio's byChunk and byLine to be almost completely 
useless in my cases.


Re: Another new io library

2016-02-17 Thread John Colvin via Digitalmars-d
On Wednesday, 17 February 2016 at 07:15:01 UTC, Steven 
Schveighoffer wrote:

On 2/17/16 1:58 AM, Rikki Cattermole wrote:


A few things:
https://github.com/schveiguy/iopipe/blob/master/source/iopipe/traits.d#L126
why isn't that used more especially with e.g. window?
After all, window seems like a very well used word...


Not sure what you mean.


I don't like that a stream isn't inherently an input range.
This seems to me like a good place to use this abstraction by 
default.


What is front for an input stream? A byte? A character? A word? 
A line?


Why not just say it's a ubyte and then compose with ranges from 
there?


Re: Another new io library

2016-02-17 Thread yawniek via Digitalmars-d
On Wednesday, 17 February 2016 at 07:15:01 UTC, Steven 
Schveighoffer wrote:

On 2/17/16 1:58 AM, Rikki Cattermole wrote:
What would be the benefit of having it an input range by 
default?


-Steve


https://en.wikipedia.org/wiki/Principle_of_least_astonishment
something the D community is lacking a bit in general imho.

but awesome library, will definitely use, thanks!


Re: Another new io library

2016-02-16 Thread Steven Schveighoffer via Digitalmars-d

On 2/17/16 1:58 AM, Rikki Cattermole wrote:


A few things:
https://github.com/schveiguy/iopipe/blob/master/source/iopipe/traits.d#L126
why isn't that used more especially with e.g. window?
After all, window seems like a very well used word...


Not sure what you mean.


I don't like that a stream isn't inherently an input range.
This seems to me like a good place to use this abstraction by default.


What is front for an input stream? A byte? A character? A word? A line?

It's not there by default because that would assume too much, IMO. You can 
create an input range out of a stream quite easily.


e.g. 
https://github.com/schveiguy/iopipe/blob/master/source/iopipe/bufpipe.d#L664


What would be the benefit of having it an input range by default?

-Steve


Re: Another new io library

2016-02-16 Thread Rikki Cattermole via Digitalmars-d

On 17/02/16 7:45 PM, Steven Schveighoffer wrote:

It's no secret that I've been looking to create an updated io library
for phobos. In fact, I've been working on one on and off since 2011 (ouch).

After about 5 iterations of API and design, and testing out ideas, I
think I have come up with something pretty interesting. It started out
as a plan to replace std.stdio (and that did not go over well:
https://forum.dlang.org/post/j3u0l4$1atr$1...@digitalmars.com), in addition
to trying to find a better way to deal with i/o. However, I've scaled
back my plan of world domination to just try for the latter, and save
tackling the replacement of Phobos's i/o guts for a later battle, if at
all. It's much easier to reason about something new than to muddle the
discussion with how it will break code. It's also much easier to build
something that doesn't have to be a drop-in replacement of something so
insanely complex.

I also have been inspired over the last few years by various great
presentations and libraries, two being Dmitry's proof-of-concept library
to have buffers that automatically move/fill when more data is needed,
and Andrei's std.allocator library. They have changed drastically the
way I have approached this challenge.

Therefore, I now have a new dub-based repository available for playing
with: https://github.com/schveiguy/iopipe. First, the candy:

- This is a piping library. It allows one to hook buffered i/o through
various processors/transformers much like unix pipes or range
functions/algorithms. However, unlike unix pipes, this library attempts
to make as few copies as possible of the data.

example:

foreach(line; (new IODevice(0)).bufferedInput
        .asText!(UTFType.UTF8)
        .byLine
        .asInputRange)
    // handle line

- It can handle 5 forms of UTF encoding - UTF8, UTF16, UTF16LE, UTF32,
UTF32LE (phobos only partially handles UTF8). Sorry, no grapheme support
or other utf-related things, but this of course can be added later.

- Arrays are first-class ioPipe types. This works:

foreach(line; "one\ntwo\nthree\nfour\n".byLine.asInputRange)

- Everything is compile-time for the most part, and uses lots of
introspection. The intent is to give the compiler full gamut of
optimization capabilities.

- I added rudimentary compression/decompression support using
etc.c.zlib. Using compression is done like so:

foreach(line; (new IODevice(0)).bufferedInput
        .unzip
        .asText!(UTFType.UTF8)
        .byLine
        .asInputRange)

- The plan is for this to be a basis to make super-fast and modular
parsing libraries. I plan to write a JSON one as a proof of concept. So
all you have to do is add a parseJSON function to the end of any chain,
as long as the input is some pipe of text data (including a string
literal).


=

I will stress some very very important things:

1. This library is FAR from finished. Even the concepts probably need
some tweaking. But I'm very happy with the current API/usage.

2. Docs are very thin. Unit tests are sparse (but do pass).

3. The focus of this library is NOT replacement of std.stream, or even
low-level i/o in general. In fact, I have copied over my stream class
from previous attempts at this i/o rewrite ONLY as a mechanism to have
something that can read/write from file descriptors with the right API
(located in iopipe/stream.d). I admit to never having looked at
std.stream really, so I have no idea how it would compare.

4. As the stream framework is only for playing with the other useful
parts of the library, I only wrote it for my OS (OSX), so you won't be
able to play out of the box on Windows (probably can be added without
much effort, or use another stream library such as this one that was
recently announced:
https://forum.dlang.org/post/xtxiuxcmewxnhseub...@forum.dlang.org), but
it will likely work on other Unixen.

5. This is NOT thread-aware out of the box.

6. There is a concept in here I called "valves". It's very weird, but it
allows unifying input and output into one seamless chain. In fact, I
can't think of how I could have done output in this regime without them.
See the convert example application for details on how it is used.

7. I expect to be changing the buffer API, as I think perhaps I have the
wrong abstraction for buffers. However, I did attempt to have a
std.allocator version of the buffer.

8. It's not on code.dlang.org yet. I'll work on this.

Destroy!

-Steve


A few things: 
https://github.com/schveiguy/iopipe/blob/master/source/iopipe/traits.d#L126 
why isn't that used more especially with e.g. window?

After all, window seems like a very well used word...

I don't like that a stream isn't inherently an input range.
This seems to me like a good place to use this abstraction by default.