Hi Ross,

On Wed, Jan 13, 2010 at 12:12:04PM -0500, Ross West wrote:
> I can see a small confusion here because I've used the wrong
> terminology. Proxy is not the correct term, as there are actual proxy
> devices out there (eg: Squid) which are generally visible to the
> client/server and shouldn't be intentionally resending requests upon
> failure.

Indeed you used the wrong term, but I'm used to that mistake and
I had translated :-) In HTTP terminology, haproxy is a "gateway"
(RFC2616, page 8):

   A server which acts as an intermediary for some other server.
   Unlike a proxy, a gateway receives requests as if it were the
   origin server for the requested resource; the requesting client
   may not be aware that it is communicating with a gateway.

> To describe what I mean is that the loadbalancer would keep a copy
> (silently) of the client's request until a server gave a valid
> response.

That's what I understood, and I refuse to do it. Performing a copy of
a request is:
   1) memory-expensive
   2) time-consuming

It's totally unacceptable to perform a copy of *every* request that
passes through the LB just for the 1 in 1000 or fewer that fail. Not to
mention that there would have to be a limit on the number of saved
requests. There are some situations where we could know that the request
has not yet been overwritten in the buffer and might be rewound. But even
doing only that would disable pipelining and prevent buffer pooling for
people who need to support large numbers of concurrent connections.
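
To make the cost concrete, here is a minimal sketch (purely illustrative,
not haproxy code) of what "silently keep a copy of every request" implies:
one extra allocation and one extra copy per request, paid on every request
for the tiny fraction that would ever need replaying:

    /* Illustrative sketch only, not haproxy code: the price of keeping
     * a replay copy of every request "just in case". */
    #include <stdlib.h>
    #include <string.h>

    struct replay_buf {
        char   *data;
        size_t  len;
    };

    /* Called once per request: one malloc() plus one memcpy() per
     * request, paid even for the 999+ in 1000 requests that will never
     * need replaying; a hard limit and a buffer pool would be needed
     * on top of this. */
    static int save_for_replay(struct replay_buf *rb,
                               const char *req, size_t len)
    {
        rb->data = malloc(len);
        if (!rb->data)
            return -1;
        memcpy(rb->data, req, len);
        rb->len = len;
        return 0;
    }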

> So should the connection drop unexpectedly with server "A"
> after the request, the load balancer would assume something went wrong
> with that server, and then resend the request to Server "B".

I understood that you asked for that. But as I explained, you can
only do that for idempotent requests. And very often you don't even
know that an apparently idempotent request is not. Do you remember the
CGI counters we all used 15 years ago? Your browser just had to send
a "GET /cgi-bin/counter.gif" and it got back an image with all the digits
representing the number of calls. Doing it again would simply increase
the value. This is an example of a non-idempotent request that must
never be replayed blindly, although nothing in the request itself lets
any equipment in the whole chain know about it.
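
As a rough illustration (my sketch, nothing from haproxy), the best an
intermediary could do is a method-based test like the one below, and the
counter example above shows exactly why it is not sufficient:

    #include <stdbool.h>
    #include <string.h>

    /* RFC 2616 (section 9.1.2) defines GET, HEAD, PUT, DELETE, OPTIONS
     * and TRACE as idempotent; POST is not. */
    static bool method_may_be_replayed(const char *method)
    {
        static const char *idempotent[] = {
            "GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE", NULL
        };
        for (int i = 0; idempotent[i]; i++)
            if (strcmp(method, idempotent[i]) == 0)
                return true;
        return false;
    }

    /* "GET /cgi-bin/counter.gif" passes this test, yet replaying it
     * changes server state: the method alone cannot tell you that. */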

> Throughout this, the end client would have only sent 1 request to the
> loadbalancer (as it sees the LB as the "end server").

Yes, precisely what I don't want to see: a client sending ONE request
while not being aware that TWO have been played on the server on its
behalf. Better to enumerate the uncovered corner cases, document them
clearly, and have people configure their whole architecture accordingly
and decide whether or not they accept the residual risks. For instance,
a counter such as the one above might very well fail once in 1 million
without it being an issue, and it could then make sense to share
connections to the servers to save a bit of their CPU. Those with more
risky services would simply disable the feature so as not to take any
risk, and will not be bothered anyway because risky services are
generally not the most solicited ones.

Also, I'd like to add some numbers here to illustrate what I mean.
Without keep-alive, my dual-core 2.66 GHz server supports 108000 requests
per second, which also means 108000 connections per second since that's
a full connection-request-response-close cycle. With keep-alive enabled,
I can reach 167000 requests per second. That means that keep-alive has
saved 1/108000 - 1/167000 = 3.2 microseconds of CPU time per request.
Yes, MICROSECONDS.

Since I'm pretty sure that most servers are not running at those rates,
let's assume a moderately loaded server which supports 10000 requests
per second without keep-alive. That means one request every 100
microseconds. Adding keep-alive to the mix brings that down to 96.8
microseconds per request, i.e. about 10330 requests per second. A small
3% improvement for a low-end server (by today's standards) running at
10k hits/s, with all the complexity and risks added. At only 1000
requests/s, you'd only get a 0.3% improvement. However, on the load
balancer, you still have to cope with that load. On the same server as
above, haproxy reaches 42000 requests/s (no keep-alive). Simply
duplicating every request in memory at this rate and using a second
buffer for that would certainly hit it by more than 3%! So in the end,
what you're trying to save on a single scalable point can be lost on
the overall architecture.
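
For the curious, the arithmetic above can be rechecked in a few lines
(same figures as in the text, modulo rounding):

    #include <stdio.h>

    int main(void)
    {
        double t_close = 1e6 / 108000;  /* us/req, one conn/req: ~9.26 */
        double t_ka    = 1e6 / 167000;  /* us/req with keep-alive: ~5.99 */
        double saved   = t_close - t_ka;        /* ~3.3 us */
        double t_10k   = 1e6 / 10000;           /* 100 us/req at 10k/s */

        printf("saved per request: %.1f us\n", saved);
        printf("10k server with keep-alive: %.0f req/s (%.1f%% gain)\n",
               1e6 / (t_10k - saved), 100.0 * saved / t_10k);
        return 0;
    }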

Don't get me wrong, I'm not saying that aggregating connections is
always wrong. There are legitimate uses for it. But people often consider
it a panacea when in fact it just hides a wrong setup on their servers.
And in my opinion, tweaking the whole chain for this to work reliably
is a lot harder than disabling ip_conntrack on the target server (for
instance).

What would *really* save resources would be to enable pipelining on such
connections. With pipelining, haproxy already reaches 2 million requests
per second on the machine above. Yes, 2 MILLION. This is because you
considerably reduce the number of packets on the network and merge many
requests into one packet. It was the first time I saw it consume more
user time than system time. But doing that with multiple connections is
almost impossible, because very often a server will close a connection
when responding with an error, and all the other clients' requests will
be lost. With a single client this is handled correctly because the
client knows that it may only send idempotent requests in pipeline mode
and that in case of error, it must resend the ones left without a
response. And at the rates we would reach with pipelining, with the
added high complexity, it would not even be remotely thinkable to copy
requests in memory just for the sake of replaying them on session close.
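
To show where the saving comes from, here is a toy pipelining client
(my own sketch, not haproxy's code, assuming a hypothetical helper
read_one_response() that consumes exactly one HTTP response):

    #include <string.h>
    #include <unistd.h>

    #define NREQ 3

    /* hypothetical helper: reads exactly one HTTP response from fd,
     * returns <= 0 if the connection was closed prematurely */
    extern int read_one_response(int fd, char *buf, size_t len);

    static const char *reqs[NREQ] = {
        "GET /a HTTP/1.1\r\nHost: x\r\n\r\n",
        "GET /b HTTP/1.1\r\nHost: x\r\n\r\n",
        "GET /c HTTP/1.1\r\nHost: x\r\n\r\n",
    };

    static void pipeline(int fd)
    {
        char buf[4096];
        int answered = 0;

        /* send all requests back-to-back: this is what merges many
         * requests into few packets and cuts the per-request cost */
        for (int i = 0; i < NREQ; i++)
            write(fd, reqs[i], strlen(reqs[i]));

        /* read responses in order; if the server closes early, this
         * client alone knows that reqs[answered..NREQ-1] were
         * idempotent and may be replayed on a fresh connection. An
         * intermediary multiplexing several clients cannot know that. */
        while (answered < NREQ &&
               read_one_response(fd, buf, sizeof(buf)) > 0)
            answered++;
    }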

So basically: yes, I'm OK with adding the ability to reuse connections
to servers; no, I don't want to play dirty tricks to cover the
uncoverable cases; and I'm still trying to figure out how to make that
available while at the same time ensuring that people won't enable it
without understanding the consequences. "It works fast" is marketing
lies here; "it basically works better but may fail from time to time"
is the reality.

(...)
> WT> I believe you that it worked fine. But my concern is not to confirm
> WT> after some tests that finally it works fine, but rather to design it
> WT> so that it works fine. Unfortunately HTTP doesn't permit it, so there
> WT> are tradeoffs to make, and that causes me a real problem you see.
> 
> Yes, the more I re-read the rfc, the more I feel your pain when they
> specify "SHOULD/MAY" rather than "MUST/MUST NOT" allowing for those
> corner cases to occur in the first place.

Thanks, at least someone who cares to read that awful pile of junk that
tries to standardize all the crap that has been invented by vendors over
the years :-)

BTW, the httpbis working group has finally written down some of the
exceptions to the standard rules, such as the "non-mergeability" of the
Set-Cookie header.
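
A concrete illustration of why it cannot be merged: generic header
folding joins multiple values with a comma, but cookie expiry dates
themselves contain commas, so the merged form below becomes ambiguous
(values here are made up):

   Set-Cookie: a=1; Expires=Wed, 13 Jan 2010 12:00:00 GMT
   Set-Cookie: b=2

   merged: Set-Cookie: a=1; Expires=Wed, 13 Jan 2010 12:00:00 GMT, b=2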

> WT> Indeed. Not to mention that applications today use more and more resources
> WT> because they're written by stacking up piles of crap and sometimes the
> WT> network has absolutely no impact at all due to the amount of crap being
> WT> executed during a request.
> 
> I don't want to get started in the [non-]quality of the asp programmer's
> code of that project.  I still have nightmares.

It's not limited to ASP, believe me. Some people who program in a
four-letter language create awful things which prove they believe that
a computer is a magic box connecting the keyboard, mouse and display
together for fun.

Regards,
Willy

