Hi Ross,

On Wed, Jan 13, 2010 at 12:12:04PM -0500, Ross West wrote:
> I can see a small confusion here because I've used the wrong
> terminology. Proxy is not the correct term, as there are actual proxy
> devices out there (eg: Squid) which are generally visible to the
> client/server and shouldn't be intentionally resending requests upon
> failure.
Indeed you used the wrong term, but I'm used to that mistake and I had translated :-) In HTTP terminology, haproxy is a "gateway" (RFC2616, page 8):

   A server which acts as an intermediary for some other server. Unlike
   a proxy, a gateway receives requests as if it were the origin server
   for the requested resource; the requesting client may not be aware
   that it is communicating with a gateway.

> To describe what I mean is that the loadbalancer would keep a copy
> (silently) of the client's request until a server gave a valid
> response.

That's what I understood, and I refuse. Performing a copy of a request is:
  1) memory-expensive
  2) time-consuming

It's totally unacceptable to perform a copy of *every* request that passes through the LB just for the 1/1000 or fewer that fail. Not to mention that there would have to be a limit on those requests. There are some situations where we could know that the request has not yet been overwritten in the buffer and might be rewound. But even doing only that would disable pipelining and prevent buffer pooling for people who need to support large numbers of concurrent connections.

> So should the connection drop unexpectedly with server "A"
> after the request, the load balancer would assume something went wrong
> with that server, and then resend the request to Server "B".

I understood that you asked for that. But as I explained, you can only do that for idempotent requests. And very often you don't even know that an apparently idempotent request is not. Remember the CGI counters we all used 15 years ago? Your browser just had to send a "GET /cgi-bin/counter.gif" and it got back an image with all the digits representing the number of calls. Doing it again would simply increase the value. This is an example of a non-idempotent request that must never be replayed blindly, although nothing in the request itself lets any equipment in the whole chain know about it.
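To make that counter example concrete, here is a tiny sketch (the names are mine, not real CGI or haproxy code) showing why a gateway cannot blindly replay such a GET: from the wire it looks like a plain read, yet each call mutates server state.

```python
# Sketch of the CGI-counter problem: nothing in the request line says
# it has side effects, yet replaying it changes server state.
# hit counter state and handler names are illustrative only.

def make_counter_handler():
    state = {"hits": 0}
    def handle_get(path):
        # Looks like a plain, cacheable GET from the outside...
        if path == "/cgi-bin/counter.gif":
            state["hits"] += 1          # ...but it mutates state
            return f"image showing {state['hits']}"
        return "404"
    return handle_get, state

handle_get, state = make_counter_handler()
handle_get("/cgi-bin/counter.gif")      # the client's single request
handle_get("/cgi-bin/counter.gif")      # a blind replay by the LB
# state["hits"] is now 2 although the client sent only one request
```

The client sent one request, the server counted two: exactly the silent corruption a replaying load balancer would introduce.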
> Throughout this, the end client would have only sent 1 request to the
> loadbalancer (as it sees the LB as the "end server").

Yes, and that's precisely what I don't want to see: a client sending ONE request without being aware that TWO have been played on the server on its behalf. Better to enumerate the uncovered corner cases, document them clearly, and have people configure their whole architecture accordingly and decide whether or not they accept the residual risks. For instance, a counter such as the one above might very well fail once in 1 million without it being an issue, and it could then make sense to share connections to the servers to save a bit of their CPU. Those with riskier services would simply disable the feature so as not to take any risk, and would not be bothered anyway, because risky services are generally not the most solicited ones.

Also, I'd like to add some numbers here to illustrate what I mean. Without keep-alive, my dual-core 2.66 GHz server supports 100000 requests per second, which also means 108000 connections per second. That's a full connection-request-response-close cycle. With keep-alive enabled, I can reach 167000 requests per second. That means that keep-alive has saved 1/108000 - 1/167000 = 3.2 microseconds of CPU time per request. Yes, MICROSECONDS. Since I'm pretty sure that most servers are not running at those rates, let's assume a moderately loaded server which supports 10000 requests per second without keep-alive. That means one request every 100 microseconds. Adding keep-alive to the mix brings that down to 96.8 microseconds per request, i.e. about 10330 requests per second. A small 3% improvement for a low-end server (by today's standards) running at 10k hits/s, with all the complexity and risks added. At only 1000 requests/s, you'd only get a 0.3% improvement.

However, on the load balancer, you still have to cope with that load. On the same server as above, haproxy reaches 42000 requests/s (no keep-alive).
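If you want to check the arithmetic yourself, it's a one-liner: the per-request saving is the difference of the two per-request times, and applying that saving to a 100 µs baseline gives the ~3% figure.

```python
# Back-of-the-envelope check of the keep-alive figures above.
t_close = 1 / 108_000      # seconds per request, one connection each
t_keepalive = 1 / 167_000  # seconds per request with keep-alive
saved = t_close - t_keepalive          # CPU time saved per request
print(f"{saved * 1e6:.1f} us saved per request")   # ~3.3 us

# Moderately loaded server: 10000 req/s = 100 us per request.
t_new = 100e-6 - saved                 # new per-request time
print(f"{1 / t_new:.0f} req/s with keep-alive")    # ~10338 req/s
print(f"{(1 / t_new - 10_000) / 10_000:.1%} gain") # ~3%
```

The rounding lands on 3.3 µs rather than 3.2, but the conclusion is the same: a ~3% gain at 10k hits/s, and proportionally less at lower rates.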
Simply duplicating a request in memory at this rate, and using a second buffer for it, will certainly hit that by more than 3%! So in the end, what you're trying to save on a single scalable point can be lost on the overall architecture.

Don't get me wrong, I'm not saying that aggregating connections is always wrong. There are legitimate uses for it. But people often consider it a panacea when in fact it just hides a wrong setup on their servers. And in my opinion, tweaking the whole chain for this to work reliably is a lot harder than disabling ip_conntrack on the target server (for instance).

What would *really* save resources would be to enable pipelining on such connections. With pipelining, haproxy already reaches 2 million requests per second on the machine above. Yes, 2 MILLION. This is because you considerably reduce the number of packets on the network and merge many requests into one packet. It was the first time I saw it consume more user than system CPU. But doing that with multiple connections is almost impossible, because very often a server will close a connection when responding with an error, and all the other clients' requests will be lost. With a single client this is correctly handled, because the client knows that it may only send idempotent requests in pipeline mode and that, in case of error, it must resend the ones left without a response.

And at the rates we would reach with pipelining, and with the added high complexity, it would not even be remotely thinkable to copy requests in memory just for the sake of replaying them on session close. So basically: yes, I'm OK with adding the ability to reuse connections to servers; no, I don't want to play dirty tricks to cover the uncoverable cases; and I'm still trying to figure out how to make that available while at the same time ensuring that people won't enable it without understanding the consequences.
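To show what pipelining looks like on the wire, here is a small sketch (it only builds the byte stream, it doesn't talk to a real server): several idempotent GETs are written back-to-back on one connection, so they can share a single packet instead of costing one round trip each.

```python
# Sketch of request pipelining: N requests concatenated into one
# buffer for a single write on one connection. No real network I/O.

def pipelined_requests(host, paths):
    # Idempotent GETs only -- the client must be able to resend the
    # unanswered tail if the server closes the connection mid-stream.
    reqs = [
        f"GET {p} HTTP/1.1\r\nHost: {host}\r\n\r\n".encode()
        for p in paths
    ]
    return b"".join(reqs)   # one buffer, potentially one packet

buf = pipelined_requests("example.com", [f"/obj/{i}" for i in range(10)])
print(len(buf), "bytes for 10 requests")
# Without pipelining, each of these would cost at least one packet in
# each direction, plus a wait for the response before the next send.
```

This is also why a mid-stream server close is recoverable here: the client knows exactly which requests got no response and, since they are idempotent, can safely resend that tail.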
"it works fast" is marketting lies here, "it basically works better but may fail from time to time" is reality. (...) > WT> I believe you that it worked fine. But my concern is not to confirm > WT> after some tests that finally it works fine, but rather to design it > WT> so that it works fine. Unfortunately HTTP doesn't permit it, so there > WT> are tradeoffs to make, and that causes me a real problem you see. > > Yes, the more I re-read the rfc, the more I feel your pain when they > specify "SHOULD/MAY" rather than "MUST/MUST NOT" allowing for those > corner cases to occur in the first place. Thanks, at least someone who cares to read that awful pile of junk that tries to standardize all the crap that has been invented by vendors over the years :-) BTW, the httpbis working group has finally written down some of the exceptions to standard rules, such as the "non-mergeability" of the set-cookie header. > WT> Indeed. Not to mention that applications today use more and more resources > WT> because they're written by stacking up piles of crap and sometimes the > WT> network has absolutely no impact at all due to the amount of crap being > WT> executed during a request. > > I don't want to get started in the [non-]quality of the asp programmer's > code of that project. I still have nightmares. It's not limited to ASP, believe me. Some people who program in 4-letter language create awful things that prove they believe that a computer is the magic box that connects keyboard, mouse and display together to have fun. Regards, Willy