Hi Ross,

First, thanks for bringing your experience here, it's much appreciated.
On Tue, Jan 12, 2010 at 10:27:09AM -0500, Ross West wrote:
> I'll enter in this conversation as I've used (successfully) a load
> balancer which did server-side keep-alive a while ago.
>
> WT> Hmmm that's different. There are issues with the HTTP protocol
> WT> itself making this extremely difficult. When you're keeping a
> WT> connection alive in order to send a second request, you never
> WT> know if the server will suddenly close or not. If it does, then
> WT> the client must retransmit the request because only the client
> WT> knows if it takes a risk to resend or not. An intermediate
> WT> equipment is not allowed to do so because it might send two
> WT> orders for one request.
>
> This might be an architecture based issue and probably depends on the
> amount of caching/proxying of the request that the load balancer does
> (ie: holds the full request until server side completes successfully).

It's not only a matter of caching the request to replay it; it's that
you're simply not allowed to. I know a guy who ordered a book at a
large, well-known site, and his order was processed twice. Maybe
something on that site grants itself the right to replay a user's
request when a server connection suddenly closes on keep-alive timeout
or count.

A cache is different: it generally knows whether it can replay a
request, because it already holds an old response in its cache and
mostly performs revalidations. But it's not acceptable to replay a
request that was completely sent to the server unless you're
absolutely certain it's idempotent. And BTW, it's not practical either
to keep a copy of every request sent to a server; that would slow the
LB down a lot. So a reasonable balance can probably be found, but it
is clear that from time to time a user will get an error.
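To make the constraint concrete, here is a small sketch (not haproxy code, just illustrative Python) of the rule an intermediary would have to apply before resending a request after a server-side keep-alive connection dies. The method classification follows RFC 2616 section 9.1.2; the function name is made up for the example.

```python
# Methods that RFC 2616 (section 9.1.2) defines as idempotent:
# repeating them must not change the outcome, so they may be retried.
IDEMPOTENT_METHODS = {"GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE"}

def may_replay(method: str, fully_sent: bool) -> bool:
    """Decide whether an intermediary may resend a request on a fresh
    connection after the server closed the previous one.

    If the request was never sent in full, the server cannot have
    acted on it, so resending is safe. Otherwise, only idempotent
    methods may be replayed: replaying a completed POST could submit
    the same order twice, as described above.
    """
    if not fully_sent:
        return True
    return method.upper() in IDEMPOTENT_METHODS

# A POST that was completely sent must not be replayed:
# may_replay("POST", fully_sent=True) -> False
```

Only the client knows whether resending is acceptable for its application, which is why the decision cannot safely be delegated to a load balancer.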
> WT> So by doing what you describe, your clients would regularly get some
> WT> random server errors when a server closes a connection it does not
> WT> want to sustain anymore before haproxy has a chance to detect it.
>
> Never had any complaints of random server issues that could be
> attributed to connection issues. But that's probably attributable to
> the above architectural comment.

Maybe your LB was regularly sending dummy requests on the connections
to keep them alive, but since there is no NOP instruction in HTTP, you
have to send real work anyway.

> WT> Another issue is that there are (still) some buggy applications which
> WT> believe that all the requests from a same session were initiated by
> WT> the same client. So such a feature must be used with extreme care.
>
> We found the biggest culprit is Microsoft's NTLM authentication
> system. It actually breaks the http spec by authenticating the tcp
> session, not the individual http requests (except the first one in the
> tcp session).

Exactly, and I'd say it's the *only* motivation I have for advancing
on keep-alive, because all my experiences with keep-alive on various
types of servers have been negative: mass reservation of idle
connections preventing users from connecting under high loads.

> Last time I looked into it, the squid people had made some progress into
> it, but hadn't gotten it to successfully proxy.

I'm not surprised, it's flawed by design. If the server offered a
one-time token in the challenge, the user could reuse it with any
connection. Unfortunately, they broke HTTP so much that they
authenticate TCP, as you said. And on such servers you must absolutely
not merge multiple users' requests into a single keep-alive session,
because they'd all work with the first user's credentials.

> WT> Last, I'd say there is in my opinion little benefit to do that. Where
> WT> the most time is elapsed is between the client and haproxy.
> WT> Haproxy and the server are on the same LAN, so a connection
> WT> setup/teardown here is extremely cheap, as it's where we manage
> WT> to run at more than 40000 connections per second (including
> WT> connection setup, send request, receive response and close).
> WT> That means only 25 microseconds for the whole process, which
> WT> isn't measurable at all by the client and is extremely cheap
> WT> for the server.
>
> When we placed the load balancer in front of our IIS based cluster, we
> got around an 80-100% (!!) performance improvement immediately. We
> were estimating around a 25% increase only with our experience with
> Microsoft's tcp stack.

Was it really just an issue with the TCP stack? Maybe there was a
firewall loaded on the machine? Maybe IIS was logging connections and
not requests, so that it almost stopped logging?

> Running against a unix based stack (Solaris & BSD) got us a much more
> realistic 5-10% improvement.

I agree, that's in the range we can expect; connection setups are
extremely cheap. The difference is only the extra packet processing.
That's why I spent some time working on saving packets on both sides
in 1.4, BTW. For instance, FIN packets are merged with the last data,
and you can enable tcp-smart-connect and tcp-smart-accept to remove
one packet in each direction during an accept(). That's up to 3
packets saved on an 8-10 packet session, which becomes quite
noticeable.

> nb: "Improvement" mainly being defined as a reduction in server side
> processing/load. Actual request speed was about the same.

It depends a lot on what the server does behind. File serving will not
change; it's generally I/O bound. However, if the server was
CPU-bound, you might have gained something, especially if there was a
firewall on the server.
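For reference, here is a minimal sketch of how the packet-saving options mentioned above would appear in a 1.4-style configuration. The section and server names are made up for the example; only the two option lines come from the discussion.

```
listen web-farm                 # hypothetical proxy name
    bind :80
    option tcp-smart-accept     # save one packet client-side on accept()
    option tcp-smart-connect    # save one packet server-side on connect()
    server srv1 10.0.0.11:80    # hypothetical backend server
```

Together with merging FIN packets into the last data segment, these trim up to 3 packets from a typical 8-10 packet HTTP session.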
> Obviously over the years OS vendors have improved their systems'
> stacks greatly, but server side keep-alives did work quite well for
> us in saving server resources, as have the better integration of
> network stacks and the hardware (chipsets) they use. I doubt that
> you'd get the same kind of performance improvements we did.

I really doubt it too! As I said in my other mail, I'm not against
improvements when there's a real gain, but I have a strong fear of
people doing stupid things with settings they don't understand and
then coming here to seek debugging help with their nasty setup. My
first goal is to prevent unskilled people from shooting themselves in
the foot, and I can tell you that no documentation helps with that!

Regards,
Willy

