@tim

The API that I used in this blog post is a simplified version of the API I 
implemented in streamline. I simplified it in the blog post because I just 
wanted to demo the equivalence between the two styles of API.

The streams module that I am using 
(https://github.com/Sage/streamlinejs/blob/master/lib/streams/server/streams.md) 
has most of the features that you saw missing:

* an optional "len" parameter in the read call.
* low and high water mark options in the ReadableStream constructor.

The "len" parameter has your "bytes" semantics and I use it exactly the way 
you describe (typically to read 4 bytes to get a frame length and then read 
N bytes for a frame). I did not implement "maxBytes" semantics because I 
did not need it (which does not mean it would not be useful). The thing is 
that all the additional bells and whistles can be implemented around the 
basic read(cb) call (called readChunk in my module).
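
For example, here is a minimal sketch of that frame pattern with plain 
callbacks (not streamline syntax). I'm assuming a read(len, cb) signature 
with the "len" semantics above, so the names are illustrative:

    function readFrame(stream, cb) {
      // read the 4-byte big-endian length header first
      stream.read(4, function (err, header) {
        if (err) return cb(err);
        if (!header) return cb(null, null); // end of stream
        var len = header.readUInt32BE(0);
        // then read exactly `len` bytes for the frame body
        stream.read(len, function (err, body) {
          if (err) return cb(err);
          cb(null, body);
        });
      });
    }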

I introduced low and high mark options because I wanted to avoid a 
pause/resume dance around every data event when the data arrives faster 
than it is consumed. My assumption was that a little queue with high and 
low marks would reduce the number of pause/resume calls and improve 
performance, basically trading a bit of space for speed. But I have to 
admit that I did not bench it. So, if the pause/resume dance costs very 
little, this may be overkill.
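
To make the idea concrete, here is a rough sketch of the queue I have in 
mind (illustrative names, no end/error handling, and only one pending read 
at a time):

    function buffered(source, lowMark, highMark) {
      var queue = [], paused = false, pending = null;
      source.on('data', function (chunk) {
        if (pending) {
          // a reader is waiting: hand the chunk over directly
          var cb = pending; pending = null; cb(null, chunk);
        } else {
          queue.push(chunk);
          if (!paused && queue.length >= highMark) {
            paused = true;
            source.pause(); // stop the flood until the queue drains
          }
        }
      });
      return {
        read: function (cb) {
          if (queue.length) {
            cb(null, queue.shift());
            if (paused && queue.length <= lowMark) {
              paused = false;
              source.resume(); // drained below the low mark
            }
          } else pending = cb; // wait for the next chunk
        }
      };
    }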

@isaac and mikeal,

This callback proposal may sound very "anti-eventish" and it may give the 
impression that I'm sorta trying to eradicate events from node's APIs 
(nobody said it but I can see how it could be perceived this way). This is 
not the case. I like node's event API and I find it very elegant. But node 
gives us two API styles (callbacks and events) and it is not always easy to 
choose between the two. Here is the rationale that I use to decide between 
them:

My main criterion is CORRELATION. Basically, I start with the assumption 
that the API is event-oriented and then I analyze the degree of correlation 
between the various events. If the events are highly correlated, I choose 
the callback style. If they are loosely correlated, I keep the event 
style. Some examples:

* User events (browser side) are very loosely correlated => event style
* Incoming HTTP requests (server side) are also very loosely correlated => 
event style
* Data streams vary. If each data chunk is a complete message which is more 
or less independent from other messages, the event style is best. If, on 
the other hand, the chunks are correlated (because the whole stream has a 
strong internal structure, or because it has been chunked on arbitrary 
boundaries that don't match its internal structure), then the callback 
style is best.
* Confirmation events (like the "connect/error" events that follow a 
connection attempt, or the "drain" event that follows a write returning 
false) are fully correlated => callback style (see the sketch after this 
list).
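
Here is what I mean for the confirmation case, sketched with plain node 
APIs (net.connect); the wrapper folds the correlated connect/error pair 
into a single callback:

    var net = require('net');

    function connect(port, host, cb) {
      var socket = net.connect(port, host);
      socket.once('connect', function () {
        socket.removeListener('error', onError);
        cb(null, socket); // success: hand the socket to the caller
      });
      socket.once('error', onError);
      function onError(err) { cb(err); } // failure: same callback, err first
    }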

Also, the event style API is more powerful than the callback style API as 
it supports multiple listeners. 
BUT:

* It is very easy to wrap a callback API with an event listener (see the 
sketch after this list).
* Very often, in the correlated case, there is a "main" consumer which 
needs to correlate the events, and auxiliary consumers that don't care that 
much about the correlations (logging, feeding statistics, etc.). A dual API 
with callbacks for the main consumer and events for the auxiliary ones 
works great.
* Wrapping an event style API with a callback style API is a lot more 
difficult.
* Callback style APIs are easier to use when the events are correlated 
because you don't need to set up state machines to re-correlate the events.
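
To illustrate the first point above, wrapping the basic read(cb) call with 
events takes just a small pump loop. The emitter plumbing is illustrative; 
the 'data'/'end'/'error' names follow node's convention and I'm assuming 
read passes null at end of stream:

    var EventEmitter = require('events').EventEmitter;

    function toEvents(stream) {
      var emitter = new EventEmitter();
      (function pump() {
        stream.read(function (err, chunk) {
          if (err) return emitter.emit('error', err);
          if (chunk == null) return emitter.emit('end'); // stream exhausted
          emitter.emit('data', chunk);
          pump(); // ask for the next chunk
        });
      })();
      return emitter;
    }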

Given this, I probably favor the callback style a lot more than most node 
developers. But this is not a systematic "anti-event" attitude; there is a 
rationale behind it and I wanted to share it with you.

Bruno


On Saturday, July 28, 2012 9:14:11 PM UTC+2, Mikeal Rogers wrote:
>
>
> On Jul 28, 2012, at 12:05 PM, Tim Caswell <t...@creationix.com> 
> wrote: 
>
> > FWIW, I actually like Bruno's proposal.  It doesn't cover all the use 
> > cases, but it makes backpressure enabled pumps really easy. 
> > 
> > One use case missing that's easy to add is when consuming a binary 
> > protocol, I often only want part of the input.  For example, I might 
> > want to get the first 4 bytes, decode that as a uint32 length header 
> > and then read n more bytes for the body.  Without being able to 
> > request how many bytes I want, I have to handle putting data back in 
> > the stream that I don't need.  That's very error prone and tedious. 
> > So on the read function, add an optional "maxBytes" or "bytes" 
> > parameter.  The difference is in the maxBytes case, I want the data as 
> > soon as there is anything, even if it's less than the number of bytes 
> > I want.   In the "bytes" case I want to wait till that many bytes are 
> > available.  Both are valid for different use cases. 
>
> The early stuff I saw included a "length" option. 
>
> > 
> > Also streams (both readable and writable) need a configurable 
> > low-water mark.  I don't want to wait till the pipe is empty before I 
> > start piping data again.  This mark would control how soon writable 
> > streams called my write callback and how much readable streams would 
> > readahead from their data source before waiting for me to call read. 
> > I want to keep it always full.  It would be great if this was handled 
> > internally in the stream and consumers of the stream simply configured 
> > what the mark should be. 
>
> I think you're missing how this works. Nobody automatically asks for data 
> so watermarks aren't strictly necessary. You ask for data if it's available 
> and you read as much as you can handle. 
>
> There is no "readahead". If someone stops calling read() then the buffer 
> fills and, if it's a TCP stream, it's asked to stop sending data. 
>
> Remember that when the "readable" event goes off it's expected that the 
> pending data is read in the same event loop cycle. 
