Re: [mongrel2] Need save memory sending big responses from handlers

Jason Miller Mon, 08 Oct 2012 11:18:21 -0700
On 17:45 Fri 05 Oct, Justin Karneges wrote:
> On Friday, October 05, 2012 04:56:17 PM Jason Miller wrote:
> > The only thing that is hard is streaming a large amount of data that
> > isn't a file.
> 
> I have this problem as I'm developing an HTTP proxy. I want to be able to
> send data back to the client as each packet of an HTTP response is
> received, as opposed to buffering an entire HTTP response in the proxy
> before sending it to the client. So, this is not a file source, but I do
> need some kind of flow control. Fixed rate is an unfortunate solution.
> 
Fair enough.
> > For example, the extended reply system is flexible enough that we could
> > use it to establish a plain-old TCP (or unix domain) connection between
> > m2 and the handler to send the data over on a 1 connection per client ID
> > basis.  The problem with zeromq is that it buffers arbitrarily large
> > amounts of data under-the-hood.  Streaming might make more sense over
> > TCP.
> 
> Yes, 1 to 1 TCP connections for streaming could work. It's probably the
> most straightforward, but it does mean lots of extra fds being used, and
> it may require some socket judo to ensure you don't wind up with a
> thread-per-connection.
> 
> The other idea is credits-based flow control, which I mentioned in an earlier 
> email. Here's how I would do it:
> 
> Have M2 offer a third socket, ROUTER type, intended for delivering messages 
> directly to known handlers. Basically it will be used to send credits, 
> although it could have more uses in the future. Handlers that need to stream 
> responses should connect to this socket so that they may receive the credits 
> messages. Handlers should set ZMQ_IDENTITY to some value (even a random value 
> generated by the handler is fine, it just needs to be set to /something/ so 
> that the handler becomes referenceable).
> 
> When M2 pushes a request to an arbitrary handler, an initial number of
> credits in bytes, e.g. 200000, is provided as an integer value in the
> request. When a handler pubs its first response message for this request
> id, it should include the identity of the socket it used to connect to
> M2's ROUTER socket. M2 will then associate the handler's socket id to
> the request.
> 
> Whenever M2 successfully writes data to a client connection, it sends a 
> message over the ROUTER socket directly to the handler associated with the 
> request containing an integer value of credits equal to the number of bytes 
> written.
> 
> The handler's job is to ensure it does not send more data to M2 than is
> allowed by the credits. M2 itself doesn't actually have to enforce
> credits nor even record credits. It can trust handlers to not be evil.
> 
> This approach would allow keeping the number of sockets and workers fixed.
> 
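For what it's worth, the handler-side bookkeeping for this scheme looks
small. Here is a minimal sketch in Python of how I read it, with the
zeromq plumbing left out. CreditWindow and its method names are
hypothetical, not part of mongrel2 or any handler library; the only
things taken from the proposal are the initial credit count in the
request and the per-write credit messages.

```python
class CreditWindow:
    """Handler-side credit accounting for one request id (hypothetical helper)."""

    def __init__(self, initial_credits):
        # Initial credits, in bytes, as delivered with the request (e.g. 200000).
        self.credits = initial_credits
        # Response bytes produced by the handler but not yet sent to M2.
        self.pending = bytearray()

    def queue(self, data):
        """Buffer response bytes until there are credits to send them."""
        self.pending.extend(data)

    def grant(self, n):
        """Apply a credit message from M2: n bytes were written to the client."""
        self.credits += n

    def take(self):
        """Return the largest chunk the current credits allow, and spend them."""
        n = min(self.credits, len(self.pending))
        chunk = bytes(self.pending[:n])
        del self.pending[:n]
        self.credits -= n
        return chunk


# Usage: the handler sends whatever take() returns and calls grant()
# each time a credit message arrives for this request.
window = CreditWindow(200000)
window.queue(b"x" * 250000)
first = window.take()   # limited to the initial 200000 bytes
window.grant(50000)     # M2 reports 50000 bytes written to the client
second = window.take()  # the remaining 50000 bytes
```

The handler would keep one such window per in-flight request id, so the
number of sockets stays fixed no matter how many clients are streaming.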
> Note: The third socket also opens the door for non-file-based streaming of
> large inbound requests. This is another feature I want, too. ;) Some
> handler could ack the initial request message and include its identity,
> and then M2 could stream the rest of the body using the ROUTER socket to
> send specifically to the handler that claimed responsibility for reading
> the request.
> 
> Justin
> 
This is similar to an idea I had that would be simpler to implement
with the current architecture: have a 3rd socket that is PUB/SUB (with
mongrel2 being the PUB side and handlers being the SUB side), and
whenever a connection gets within a certain amount of the limit,
mongrel2 publishes a flow-control message.  It's basically the same
idea, but bang/bang rather than value based, and the messages are just
published for any handler that cares about them.  If the handler
ignores them, then mongrel2 will close the connection the same way it
does now.  Unless mongrel2 closes highly buffered connections, it
becomes too easy to accidentally write a handler that is vulnerable
to DoS attacks, which is something I want to avoid.
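
To make the bang/bang part concrete, here is a rough sketch in Python
of the decision mongrel2 would make each time a connection's buffered
byte count changes. Nothing here exists in mongrel2: the class name,
the PAUSE/RESUME/CLOSE labels, and the margin parameter are made up for
illustration, and the RESUME message in particular is my assumption
about how a handler would learn it can start sending again.

```python
class BangBangSignal:
    """Per-connection on/off flow-control decision (hypothetical sketch)."""

    def __init__(self, limit, margin):
        self.limit = limit          # buffer size at which the connection is closed
        self.high = limit - margin  # "within a certain amount of the limit"
        self.paused = False

    def update(self, buffered):
        """Return the event to act on for the new buffer level, or None."""
        if buffered > self.limit:
            # The handler ignored the flow-control message: close as today.
            return "CLOSE"
        if not self.paused and buffered >= self.high:
            self.paused = True
            return "PAUSE"   # published once on the PUB socket
        if self.paused and buffered < self.high:
            self.paused = False
            return "RESUME"  # assumed counterpart to PAUSE
        return None


# Usage: feed the signal the buffer level after each enqueue or write.
signal = BangBangSignal(limit=100, margin=20)
events = [signal.update(n) for n in (50, 85, 90, 40, 101)]
```

Because the messages are on/off rather than byte counts, mongrel2 only
publishes when a connection crosses the threshold, and handlers that do
not subscribe simply keep today's behavior.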