On Tue, Jan 12, 2010 at 07:01:52PM -0500, Ross West wrote:
> 
> WT> It's not only a matter of caching the request to replay it, it is that
> WT> you're simply not allowed to. I know a guy who ordered a book at a
> WT> large well-known site. His order was processed twice. Maybe there is
> WT> something on this site which grants itself the right to replay a user's
> WT> request when a server connection suddenly closes on keep-alive timeout
> WT> or count.
> 
> That's more of an issue with the site than a (proxy based) load
> balancer - the LB would be doing the exact same thing as the client.

Precisely not, and that's the problem. The proxy cannot ask the user
whether he wants to retry a sensitive request, and it cannot precisely
know what is at risk and what is not. The client knows a lot more about
that. For instance, I think that a client will not necessarily repost
a GET form without asking the user, but it might automatically repost a
request for an image.

Also, another difference is that the client *builds* the request, so it
is easy for it to replay. The LB does not build the request, it only
sees it pass through. In order to replay anything, it would have to
keep a copy of it, which is extremely expensive to do every time just
for very rare cases.

Last, it's the proxy which knows it's working on a keep-alive
connection, not the client.
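
To illustrate that asymmetry, here is a tiny Python sketch of the kind
of decision only the client can take, because only it knows how the
request was built and why it was emitted. The method list and the
"user_initiated" flag are purely illustrative; they don't describe any
real browser's policy:

SAFE_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE"}

def may_retry_silently(method, user_initiated):
    """Return True if the request may be replayed without asking the user."""
    if method not in SAFE_METHODS:
        return False           # e.g. POST: never replay without confirmation
    return not user_initiated  # automatic sub-requests (images, CSS) are fine

A proxy in the middle has neither that context nor, as said above, a
cheap copy of the request to replay.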

> According to the rfc, if a connection is prematurely closed, then the
> client would (silently) retry the request. In our case the LB just
> emulated the client's behavior towards the servers.

Only after the first request, in order to cover a premature close of a
keep-alive connection. There is no such provision for the first request,
and it's precisely the one causing the problem. If I forward an early
close to the client on the first request, the browser either displays
a blank page (older ones) or immediately shows a cryptic error message
(newer ones). I was also thinking about returning a redirect to the
client in such a case, so that it automatically re-issues the request
instead of displaying an error. But I'm not sure we can cover all
cases with that.
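
Just to make the idea concrete, here is a minimal Python sketch of what
such a response could look like; the choice of status code, protection
against redirect loops and non-idempotent methods are exactly the open
points:

def build_self_redirect(uri):
    # Illustration of the idea only: when the server closes early on the
    # first request, answer the client with a redirect to the same URI so
    # that the browser re-issues the request itself instead of displaying
    # an error.
    return ("HTTP/1.1 307 Temporary Redirect\r\n"
            "Location: " + uri + "\r\n"
            "Content-Length: 0\r\n"
            "Connection: close\r\n"
            "\r\n").encode("ascii")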

> Unfortunately for your friend, it could mean the code on the site
> didn't do any duplicate order checking.  A corner case taken care of
> by their support department I guess.

100% agreed, but they may argue that their application conforms to
the RFC and that if they receive duplicate non-idempotent requests,
it's the customer's fault. And normally it is. But when you put a
request duplicator in the middle, that changes things a lot... There
are reasons why the specs are written with MUST/MUST NOT/SHOULD/SHOULD NOT/MAY...

> WT> So probably that a reasonable balance can be found but it is
> WT> clear that from time to time a user will get an error.
> 
> That sounds like the mantra of the internet in general.  :-)

I don't 100% agree. There are sites with perpetual problems and
admins playing the geeks, and there are others which never have a
single problem, with everything under control. The farther you go
from the specs, the more issues you encounter, and the more you're
tied to specific, proprietary software or equipment and have to
live with its bugs. When you respect the standards very closely,
you very rarely have issues.

> WT> Maybe your LB was regularly sending dummy requests on the connections
> WT> to keep them alive, but since there is no NOP instruction in HTTP, you
> WT> have to send real work anyway.
> 
> Well, the site was busy enough that it didn't need to do the
> equivalent of a NOP to keep connections open. :-) But the need for
> NOPs can be mitigated by adjusting timeouts on stale connections.

That's what I was thinking about too. In fact, if the LB's timeout is
shorter than the server's, you can most often kill idle connections
before the server does. But there are servers which randomly kill idle
connections when they are under load in order to reclaim memory, and
those are a real problem in this case.
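
Just to put invented numbers on that relationship, the idea is simply
to derive the proxy's idle limit from the server's keep-alive timeout
with a good safety margin, so that idle connections normally die on
the proxy side first (and of course the servers which close early
under load defeat this):

import time

SERVER_KEEPALIVE_TIMEOUT = 15.0                    # assumed server setting (s)
PROXY_IDLE_TIMEOUT = SERVER_KEEPALIVE_TIMEOUT / 3  # stay well below the server

def safe_to_reuse(last_used_at):
    """Refuse to reuse a connection the server might be about to close."""
    return time.monotonic() - last_used_at < PROXY_IDLE_TIMEOUT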

> My understanding was that the loadbalancer actually just used a pool
> of open tcp sessions, and would send the next request (from any of
> its clients) down the next open tcp connection that wasn't busy. If
> none were free, a new connection was established, which would
> eventually timeout and close naturally. I don't believe it was
> pipelining the requests.
> 
> This would mean that multiple requests from clients A, B, C may go
> down tcp connections X, Y, Z in a 'random' order. (eg: tcp connection
> "X" may have requests from A, B, A, A, C, B)

Oh, I see precisely how it works; it goes by different names from one
vendor to another, but they often call it "connection pooling". Doing
a naive implementation is not hard at all if you don't care about
errors. The problems start when you want to add the expected
reliability to the process...

In practice, instead of doing an expensive copy, I think that
1) configuring a maximum number of times a connection can be used,
2) configuring a maximum duration for a connection, and 3) configuring
a small idle timeout on a connection can prevent most of the issues.
Then we could also tag some requests "at risk" and other ones
"riskless" and have an option to always renew the connection for risky
requests. In practice, on a web site, most of the requests are for
images and only a few are transactions. You can already lower the load
by keeping 95% of the requests on keep-alive connections.
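
Here is a very rough Python sketch of those rules, with invented
limits, just to show how cheap they are compared to duplicating every
request; a request flagged as risky simply never reuses an existing
connection:

import time

# Illustrative limits only; real values would of course be configurable.
MAX_USES = 100       # 1) max number of requests per connection
MAX_AGE = 60.0       # 2) max lifetime of a connection (seconds)
IDLE_TIMEOUT = 5.0   # 3) max idle time before the connection is dropped

class PooledConnection:
    def __init__(self, sock):
        self.sock = sock
        self.created_at = time.monotonic()
        self.last_used_at = self.created_at
        self.uses = 0

    def usable(self, now):
        return (self.uses < MAX_USES
                and now - self.created_at < MAX_AGE
                and now - self.last_used_at < IDLE_TIMEOUT)

class Pool:
    def __init__(self, connect):
        self.connect = connect   # callable opening a fresh server connection
        self.idle = []

    def acquire(self, risky=False):
        now = time.monotonic()
        conn = None
        if not risky:
            while self.idle:
                cand = self.idle.pop()
                if cand.usable(now):
                    conn = cand
                    break
                cand.sock.close()  # expired: drop it rather than risk an error
        if conn is None:
            # risky requests (e.g. POST) always get a brand new connection
            conn = PooledConnection(self.connect())
        conn.uses += 1
        conn.last_used_at = now
        return conn

    def release(self, conn):
        conn.last_used_at = time.monotonic()
        self.idle.append(conn)

With limits like these, the vast majority of image requests keep riding
on pooled connections while the few transactions are never exposed to a
connection the server is about to close.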

> Sounds rather chaotic, but actually worked fine.

I believe you that it worked fine. But my concern is not to confirm
after some tests that it finally works fine, but rather to design it
so that it works fine. Unfortunately HTTP doesn't permit that, so
there are tradeoffs to make, and that's a real problem for me, you
see.

> >> Last time I looked into it, the squid people had made some progress into
> >> it, but hadn't gotten it to successfully proxy.
> 
> After checking, I stand corrected - it looks to be that Squid have a
> working proxy helper application to make ntlm authentication work.

From what I remember, it was an authentication agent to authenticate
the users on the Squid server, not to forward auth to the origin
server itself. But I may be wrong.

> WT> Was it really just an issue with the TCP stack ? Maybe there was a firewall
> WT> loaded on the machine ? Maybe IIS was logging connections and not requests,
> WT> so that it almost stopped logging ?
> 
> There were additional security measures on the machines, so yes, I
> should say the stack wasn't fully the issue, but once they got
> disabled in testing, we definitely still had better performance than
> before.

OK.

> WT> It depends a lot on what the server does behind. File serving will not
> WT> change, it's generally I/O bound. However if the server was CPU-bound,
> WT> you might have won something, especially if there was a firewall on
> WT> the server.
> 
> CPU was our main issue - as this was quite a while ago, things have
> since dramatically improved with better offload support in drivers and
> on network cards, plus much profiling having been done by OS vendors in their
> kernels with regards to network performance.  So I doubt people would
> get the same level of performance increase these days that we saw back
> then.

Indeed. Not to mention that applications today use more and more
resources because they're written by stacking up piles of crap, and
sometimes the network has absolutely no impact at all because of the
amount of crap being executed during a request. I recently got a Java
backtrace that was 300 lines long. That means a stack of 300 function
calls! How can anybody imagine designing something like that and
expect it to work fast (or to work at all)? Such crap is more and more
common, and often written by the same people who tell you that HTTP is
a "network protocol"...

So here too, there is less and less to be saved by doing that.

Regards,
Willy

