Hi, I'm hitting the errorfile and keep-alive behavior described in this post: https://www.marc.info/?t=141626324400001&r=1&w=1. A WIP patch is attached that helps with one tangential portion of the problem; pointers on how to work around the rest would be appreciated.
The high-level summary is that responses generated directly by HAProxy (e.g. 503, 504, 403) each terminate the client connection, and those responses additionally poison downstream HTTP proxies' ability to maintain client keep-alive.

The use-case that makes this an issue for me is use of a third party as a 'TLS funnel': we terminate a large number of requests at Akamai over a large number of connections (the requests tend to be one per connection), and Akamai then proxies those requests to us over a much smaller number of connections with very aggressive keep-alive settings. This works beautifully in the steady state - what would be a requirement of many tens of thousands of TLS terminations per second becomes low hundreds of connections per second. Additionally, TLS session key reuse skyrockets, as the number of remote hosts involved drops dramatically.

Unfortunately, a problem arises when a high-volume backend becomes unavailable. For every timeout a client hits while waiting in the queue, HAProxy generates a 503. This 503, like other responses generated directly by HAProxy, has the following notable attributes:

- It causes termination of the socket after the response buffers are cleared
- It's HTTP/1.0
- It has an explicit 'Connection: close' header
- It doesn't have a 'Transfer-Encoding: chunked' header
- It doesn't have a 'Content-Length' header

The first is, as with the previous list posting, the core of the issue: it immediately causes TLS turn-up requirements to go from nominal to nominal + error rate. Given that the error rate can be hundreds of times the nominal rate, this can strain TLS termination resources and negatively impact the entire frontend.

The other attributes cause a slightly less obvious side effect. Where there is another full HTTP proxy upstream, client keep-alive can be maintained by dynamically translating the response to HTTP/1.1 where possible and removing any 'Connection: close' header.
Unfortunately, the lack of any header indicating how to determine when the response is over means that, unless it buffers the full response, the upstream proxy can't modify the response in a way that would let the receiving client know when the response ends, breaking its ability to keep the connection alive.

The previous paragraph is something I've already had to deal with for HAProxy backends: some Jetty servers omit Transfer-Encoding and Content-Length when the request they receive is marked 'Connection: close', as those headers are then hypothetically not required. To maintain client keep-alive in that case while not using server keep-alive, you have to fool the backend with http-pretend-keepalive.

As for solutions, I'm thinking down the following paths:

* Option 1: Modify the responses so that the content length is apparent, then add an upstream full proxy (such as another HAProxy) that can tolerate the connection closes from its server (HAProxy) and hide them from the client. This would work as long as the connections between the frontend proxy and the backend HAProxy could be guaranteed to stay up, but it's slightly kludgy and complicated, and making such a guarantee is hard even when both proxies are on the same host.

* Option 2: Add ACLs that route traffic to static backends that return errors for all requests. Some of the ACL fetches could be useful for this, such as querying the number of free connection slots available to a request. Unfortunately, that would not seem to help clients that have already been queued, so a misbehaving backend could still cause a large number of connection closes. It would lower the worst case, though I don't think by much, and it would significantly complicate routing.

* Option 3: Modify HAProxy to be more careful about terminating connections, and about the header details that might cause other things to terminate connections.
All of the HTTP responses are of static length, so modifying them to be HTTP/1.1-compliant is fairly easy; most of this can also be accomplished with liberal use of errorfile. The unfortunate part is that RFC 2616 section 14.10 specifies: "HTTP/1.1 applications that do not support persistent connections MUST include the "close" connection option in every message." My read of this is that, if client keep-alives are off and support is later added for maintaining keep-alive across responses generated by HAProxy, the responses must include "Connection: close" based only on that conditional, which slightly complicates generating them. There doesn't appear to be a downside to adding Content-Length and returning HTTP/1.1 responses without removing Connection: close, though; a patch is attached which I believe positively impacts this part of the issue.

For the 'don't close connections on HTTP error' modification, I'm a bit out of my depth, at least so far. It seems like https://github.com/haproxy/haproxy/blob/master/src/proto_http.c#L936-L941 is the relevant bit, but I haven't been able to figure out the correct incantation to keep the client connection from being closed.

Any feedback on the issue is extremely welcome, workarounds included. Additionally, any pointers on the changes required to get client keep-alive support working in proto_http.c would be appreciated.

Thanks,
Graham
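P.S. To make Option 2 slightly more concrete, here's the rough shape I had in mind. This is only a sketch: the frontend/backend names and the errorfile path are made up, and the connslots fetch is the "free connection slots" query I was referring to.

```
frontend fe_main
    bind :8080
    # If be_app has no free connection slots, route new requests to a
    # server-less backend instead of letting them queue and time out
    # into a connection-closing 503.
    acl be_app_full connslots(be_app) lt 1
    use_backend be_errors if be_app_full
    default_backend be_app

backend be_errors
    # No servers here, so every request gets the 503 errorfile below,
    # which can be hand-built as HTTP/1.1 with a Content-Length header
    # so that upstream proxies can keep their client connections alive.
    errorfile 503 /etc/haproxy/errors/503-keepalive.http
```

As noted above, this doesn't help requests that are already queued, so it only lowers the worst case.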
From 01f4003bd29e63b72a4a30b400322ec96f278838 Mon Sep 17 00:00:00 2001
From: Graham Forest <gra...@urbanairship.com>
Date: Thu, 12 Nov 2015 11:25:56 -0800
Subject: [PATCH] WIP/MINOR: http: Use HTTP 1.1 for local responses

Responses that lack either "Content-Length" or "Transfer-Encoding:
chunked" are of ambiguous length. This causes upstream proxies to fail
to maintain client keep-alive connections, as responses can only be
terminated by connection close if the response body length is unknown.
Upstream proxies are capable of removing "Connection: close" without
buffering the entire response, so use of HTTP/1.1 and addition of
"Content-Length" keeps the lack of keep-alive usage from escaping
through to the eventual client.

Note that "Connection: close" is being left in because RFC 2616 14.10
specifies "HTTP/1.1 applications that do not support persistent
connections MUST include the "close" connection option in every
message". In the event that client keep-alive support gets added to
these responses, client connections not expected to be kept alive must
still contain the explicit close header.
---
 src/proto_http.c | 36 ++++++++++++++++++++++++------------
 1 file changed, 24 insertions(+), 12 deletions(-)

diff --git a/src/proto_http.c b/src/proto_http.c
index 32b9063..8b123bb 100644
--- a/src/proto_http.c
+++ b/src/proto_http.c
@@ -110,18 +110,20 @@ const char *HTTP_308 =
 
 /* Warning: this one is an sprintf() fmt string, with <realm> as its only argument */
 const char *HTTP_401_fmt =
-	"HTTP/1.0 401 Unauthorized\r\n"
+	"HTTP/1.1 401 Unauthorized\r\n"
 	"Cache-Control: no-cache\r\n"
 	"Connection: close\r\n"
+	"Content-Length: 112\r\n"
 	"Content-Type: text/html\r\n"
 	"WWW-Authenticate: Basic realm=\"%s\"\r\n"
 	"\r\n"
 	"<html><body><h1>401 Unauthorized</h1>\nYou need a valid user and password to access this content.\n</body></html>\n";
 
 const char *HTTP_407_fmt =
-	"HTTP/1.0 407 Unauthorized\r\n"
+	"HTTP/1.1 407 Unauthorized\r\n"
 	"Cache-Control: no-cache\r\n"
 	"Connection: close\r\n"
+	"Content-Length: 112\r\n"
 	"Content-Type: text/html\r\n"
 	"Proxy-Authenticate: Basic realm=\"%s\"\r\n"
 	"\r\n"
@@ -143,81 +145,91 @@ const int http_err_codes[HTTP_ERR_SIZE] = {
 
 static const char *http_err_msgs[HTTP_ERR_SIZE] = {
 	[HTTP_ERR_200] =
-	"HTTP/1.0 200 OK\r\n"
+	"HTTP/1.1 200 OK\r\n"
 	"Cache-Control: no-cache\r\n"
 	"Connection: close\r\n"
+	"Content-Length: 58\r\n"
 	"Content-Type: text/html\r\n"
 	"\r\n"
 	"<html><body><h1>200 OK</h1>\nService ready.\n</body></html>\n",
 
 	[HTTP_ERR_400] =
-	"HTTP/1.0 400 Bad request\r\n"
+	"HTTP/1.1 400 Bad request\r\n"
 	"Cache-Control: no-cache\r\n"
 	"Connection: close\r\n"
+	"Content-Length: 90\r\n"
 	"Content-Type: text/html\r\n"
 	"\r\n"
 	"<html><body><h1>400 Bad request</h1>\nYour browser sent an invalid request.\n</body></html>\n",
 
 	[HTTP_ERR_403] =
-	"HTTP/1.0 403 Forbidden\r\n"
+	"HTTP/1.1 403 Forbidden\r\n"
 	"Cache-Control: no-cache\r\n"
 	"Connection: close\r\n"
+	"Content-Length: 93\r\n"
 	"Content-Type: text/html\r\n"
 	"\r\n"
 	"<html><body><h1>403 Forbidden</h1>\nRequest forbidden by administrative rules.\n</body></html>\n",
 
 	[HTTP_ERR_405] =
-	"HTTP/1.0 405 Method Not Allowed\r\n"
+	"HTTP/1.1 405 Method Not Allowed\r\n"
 	"Cache-Control: no-cache\r\n"
 	"Connection: close\r\n"
+	"Content-Length: 146\r\n"
 	"Content-Type: text/html\r\n"
 	"\r\n"
 	"<html><body><h1>405 Method Not Allowed</h1>\nA request was made of a resource using a request method not supported by that resource\n</body></html>\n",
 
 	[HTTP_ERR_408] =
-	"HTTP/1.0 408 Request Time-out\r\n"
+	"HTTP/1.1 408 Request Time-out\r\n"
 	"Cache-Control: no-cache\r\n"
 	"Connection: close\r\n"
+	"Content-Length: 110\r\n"
 	"Content-Type: text/html\r\n"
 	"\r\n"
 	"<html><body><h1>408 Request Time-out</h1>\nYour browser didn't send a complete request in time.\n</body></html>\n",
 
 	[HTTP_ERR_429] =
-	"HTTP/1.0 429 Too Many Requests\r\n"
+	"HTTP/1.1 429 Too Many Requests\r\n"
 	"Cache-Control: no-cache\r\n"
 	"Connection: close\r\n"
+	"Content-Length: 117\r\n"
 	"Content-Type: text/html\r\n"
 	"\r\n"
 	"<html><body><h1>429 Too Many Requests</h1>\nYou have sent too many requests in a given amount of time.\n</body></html>\n",
 
 	[HTTP_ERR_500] =
-	"HTTP/1.0 500 Server Error\r\n"
+	"HTTP/1.1 500 Server Error\r\n"
 	"Cache-Control: no-cache\r\n"
 	"Connection: close\r\n"
+	"Content-Length: 87\r\n"
 	"Content-Type: text/html\r\n"
 	"\r\n"
 	"<html><body><h1>500 Server Error</h1>\nAn internal server error occured.\n</body></html>\n",
 
 	[HTTP_ERR_502] =
-	"HTTP/1.0 502 Bad Gateway\r\n"
+	"HTTP/1.1 502 Bad Gateway\r\n"
 	"Cache-Control: no-cache\r\n"
 	"Connection: close\r\n"
+	"Content-Length: 107\r\n"
 	"Content-Type: text/html\r\n"
 	"\r\n"
 	"<html><body><h1>502 Bad Gateway</h1>\nThe server returned an invalid or incomplete response.\n</body></html>\n",
 
 	[HTTP_ERR_503] =
-	"HTTP/1.0 503 Service Unavailable\r\n"
+	"HTTP/1.1 503 Service Unavailable\r\n"
 	"Cache-Control: no-cache\r\n"
 	"Connection: close\r\n"
+	"Content-Length: 107\r\n"
 	"Content-Type: text/html\r\n"
 	"\r\n"
 	"<html><body><h1>503 Service Unavailable</h1>\nNo server is available to handle this request.\n</body></html>\n",
 
 	[HTTP_ERR_504] =
-	"HTTP/1.0 504 Gateway Time-out\r\n"
+	"HTTP/1.1 504 Gateway Time-out\r\n"
 	"Cache-Control: no-cache\r\n"
 	"Connection: close\r\n"
+	"Content-Length: 92\r\n"
 	"Content-Type: text/html\r\n"
 	"\r\n"
 	"<html><body><h1>504 Gateway Time-out</h1>\nThe server didn't respond in time.\n</body></html>\n",
-- 
2.4.3