I'll jump into this conversation, as a while ago I successfully used a load balancer which did server-side keep-alive.
WT> Hmmm that's different. There are issues with the HTTP protocol
WT> itself making this extremely difficult. When you're keeping a
WT> connection alive in order to send a second request, you never
WT> know if the server will suddenly close or not. If it does, then
WT> the client must retransmit the request because only the client
WT> knows if it takes a risk to resend or not. An intermediate
WT> equipment is not allowed to do so because it might send two
WT> orders for one request.

This might be an architecture-based issue, and it probably depends on how
much caching/proxying of the request the load balancer does (i.e. whether it
holds the full request until the server side completes successfully).

WT> So by doing what you describe, your clients would regularly get some
WT> random server errors when a server closes a connection it does not
WT> want to sustain anymore before haproxy has a chance to detect it.

We never had any complaints of random server errors that could be attributed
to connection issues, but that is probably explained by the architectural
point above.

WT> Another issue is that there are (still) some buggy applications which
WT> believe that all the requests from a same session were initiated by
WT> the same client. So such a feature must be used with extreme care.

We found the biggest culprit to be Microsoft's NTLM authentication system.
It actually breaks the HTTP spec by authenticating the TCP connection rather
than the individual HTTP requests (except the first one on that connection).
Last time I looked into it, the Squid people had made some progress, but
hadn't gotten it to proxy NTLM successfully.

WT> Last, I'd say there is in my opinion little benefit to do that. Where
WT> the most time is elapsed is between the client and haproxy. Haproxy
WT> and the server are on the same LAN, so a connection setup/teardown
WT> here is extremely cheap, as it's where we manage to run at more than
WT> 40000 connections per second (including connection setup, send request,
WT> receive response and close). That means only 25 microseconds for the
WT> whole process which isn't measurable at all by the client and is
WT> extremely cheap for the server.

When we placed the load balancer in front of our IIS-based cluster, we saw
roughly an 80-100% (!!) performance improvement immediately; based on our
experience with Microsoft's TCP stack we had only estimated around a 25%
increase. Running against a Unix-based stack (Solaris and BSD) gave us a much
more realistic 5-10% improvement.

NB: "improvement" here mainly means a reduction in server-side
processing/load; actual request speed was about the same.

Obviously, over the years OS vendors have improved their systems' network
stacks greatly, as has the integration between those stacks and the hardware
(chipsets) they use, but server-side keep-alive did work quite well for us in
saving server resources. I doubt you'd get the same kind of performance
improvement we did.

Cheers,
Ross.
--
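
P.S. For anyone curious what "holds the full request until the server side
completes" looks like in practice, here is a rough sketch in Python of the
retry rule such a box can apply: it only ever replays a request when the
backend clearly did no work (the connect failed, or the connection was closed
before a single response byte came back), which is exactly the keep-alive
race described above. The backend addresses and function name are made up
for illustration; this is not how haproxy or our load balancer was actually
implemented.

import socket

# Hypothetical backend pool; the addresses are made up for illustration.
BACKENDS = [("10.0.0.1", 80), ("10.0.0.2", 80)]


def forward_with_safe_retry(raw_request):
    """Forward a fully buffered HTTP request, retrying on another backend
    only when it is unambiguous that the previous one did no work: the
    connection was refused or reset, or the backend closed it before
    sending a single response byte (the classic keep-alive race).  A
    timeout after the request has gone out is NOT retried, because the
    server may already be processing it and a resend could place the same
    order twice."""
    last_error = None
    for host, port in BACKENDS:
        try:
            with socket.create_connection((host, port), timeout=5) as conn:
                conn.sendall(raw_request)      # the whole request was buffered up front
                first = conn.recv(65536)       # blocks until data arrives or the peer closes
                if not first:
                    # Closed with zero response bytes: nothing was processed,
                    # so it is safe to replay the request on the next backend.
                    last_error = ConnectionError("%s:%d closed before replying" % (host, port))
                    continue
                chunks = [first]
                while True:                    # drain until the backend closes (assumes no
                    data = conn.recv(65536)    # keep-alive on the proxy-to-server side)
                    if not data:
                        break
                    chunks.append(data)
                return b"".join(chunks)
        except (ConnectionRefusedError, ConnectionResetError) as exc:
            last_error = exc                   # connection never produced a byte; move on
        except socket.timeout:
            raise                              # ambiguous once the request is out: don't resend
    raise RuntimeError("all backends failed") from last_error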