On Friday, October 05, 2012 04:56:17 PM Jason Miller wrote:
> The only thing that is hard is streaming a large amount of data that
> isn't a file.
I have this problem, as I'm developing an HTTP proxy. I want to be able to send data back to the client as each packet of an HTTP response is received, rather than buffering the entire HTTP response in the proxy before sending it to the client. So this is not a file source, but I do need some kind of flow control. Sending at a fixed rate is an unfortunate solution.

> What are the use cases for that? I know there are some, but there might
> be few enough (or the majority might be similar enough) that a targeted
> solution would make more sense than a general purpose "This will let you
> stream anything without using too much RAM or loading down the server
> too much", which is a non-trivial problem to solve.
>
> For example, the extended reply system is flexible enough that we could
> use it to establish a plain-old TCP (or unix domain) connection between
> m2 and the handler to send the data over on a 1 connection per client ID
> basis. The problem with zeromq is that it buffers arbitrarily large
> amounts of data under-the-hood. Streaming might make more sense over
> TCP.

Yes, 1-to-1 TCP connections for streaming could work. It's probably the most straightforward option, but it does mean lots of extra fds being used, and it may require some socket judo to ensure you don't wind up with a thread-per-connection design.

The other idea is credits-based flow control, which I mentioned in an earlier email. Here's how I would do it:

Have M2 offer a third socket, of ROUTER type, intended for delivering messages directly to known handlers. Basically it will be used to send credits, although it could have more uses in the future. Handlers that need to stream responses should connect to this socket so that they can receive the credits messages. Handlers should set ZMQ_IDENTITY to some value (even a random value generated by the handler is fine; it just needs to be set to /something/ so that the handler becomes addressable).
When M2 pushes a request to an arbitrary handler, an initial number of credits in bytes, e.g. 200000, is provided as an integer value in the request. When a handler publishes its first response message for this request ID, it should include the identity of the socket it used to connect to M2's ROUTER socket. M2 will then associate the handler's socket identity with the request. Whenever M2 successfully writes data to a client connection, it sends a message over the ROUTER socket directly to the handler associated with the request, containing an integer number of credits equal to the number of bytes written.

The handler's job is to ensure it does not send more data to M2 than the credits allow. M2 itself doesn't actually have to enforce or even record credits; it can trust handlers not to be evil. This approach would keep the number of sockets and workers fixed.

Note: The third socket also opens the door for non-file-based streaming of large inbound requests. This is another feature I want, too. ;) Some handler could ack the initial request message and include its identity, and then M2 could stream the rest of the body over the ROUTER socket, sending specifically to the handler that claimed responsibility for reading the request.

Justin
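To make the credit accounting concrete, here is a minimal sketch of the handler side of the scheme proposed above, written in plain Python with no zeromq transport. `CreditGate` and `simulate` are hypothetical names for illustration only, not part of Mongrel2 or any library; the simulation stands in for M2 by "writing" bytes to a slow client and granting back exactly as many credits as it wrote.

```python
from collections import deque


class CreditGate:
    """Hypothetical handler-side credit accounting for a single request ID."""

    def __init__(self, initial_credits):
        self.credits = initial_credits  # e.g. the 200000 bytes granted with the request
        self.pending = deque()          # response chunks not yet sent to M2

    def queue(self, chunk):
        self.pending.append(chunk)

    def on_credits(self, n):
        # M2 reports n bytes written to the client, so n more bytes may be sent.
        self.credits += n

    def drain(self):
        """Return the chunks that may be sent now, never exceeding the credits."""
        out = []
        while self.pending and self.credits > 0:
            chunk = self.pending[0]
            if len(chunk) <= self.credits:
                self.pending.popleft()
            else:
                # Send only the prefix we have credit for; keep the rest queued.
                self.pending[0] = chunk[self.credits:]
                chunk = chunk[:self.credits]
            self.credits -= len(chunk)
            out.append(chunk)
        return out


def simulate(total, initial_credits, client_write_size):
    """Stand-in for M2: writes at most client_write_size bytes per round to
    the client, then grants credits equal to the bytes written."""
    gate = CreditGate(initial_credits)
    gate.queue(b"x" * total)
    delivered = 0
    in_flight = 0  # bytes M2 has received but not yet written to the client
    max_in_flight = 0
    while delivered < total:
        for chunk in gate.drain():
            in_flight += len(chunk)
        max_in_flight = max(max_in_flight, in_flight)
        written = min(client_write_size, in_flight)
        in_flight -= written
        delivered += written
        gate.on_credits(written)
    return delivered, max_in_flight
```

Because every byte the handler sends consumes a credit and every credit returned corresponds to a byte M2 has already written, credits plus in-flight bytes stay equal to the initial grant. In this sketch M2 therefore never buffers more than that many bytes per request, without having to track credits itself.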
