Hi, I'm hitting the errorfile and keep-alive behavior described in this post: https://www.marc.info/?t=141626324400001&r=1&w=1. A WIP patch is attached that helps with one tangential portion of the problem; pointers on how to work around the rest would be appreciated.
The high-level summary is that responses generated directly by HAProxy (e.g. 503, 504, 403) each terminate the client connection, and those responses additionally poison downstream HTTP proxies' ability to maintain client keep-alive.

The use-case that makes this an issue for me is use of a third party as a 'TLS funnel': we terminate a large number of requests at Akamai over a large number of connections (the requests tend to be one per connection), and Akamai then proxies those requests to us over a much smaller number of connections with very aggressive keep-alive settings. This works beautifully in the steady state - what would be a requirement of many tens of thousands of TLS terminations per second becomes low hundreds of connections per second. Additionally, TLS session key reuse skyrockets, as the number of remote hosts involved drops dramatically.

Unfortunately, a problem arises when a high-volume backend becomes unavailable. For every timeout a client hits while waiting in the queue, HAProxy generates a 503. This 503, like other responses generated directly by HAProxy, has the following notable attributes:

- It causes termination of the socket after the response buffers are cleared
- It's HTTP/1.0
- It has an explicit 'Connection: close' header
- It doesn't have a 'Transfer-Encoding: chunked' header
- It doesn't have a 'Content-Length' header

The first is, as with the previous list posting, the core of the issue: it immediately causes TLS turn-up requirements to go from nominal to nominal + error rate. Given that the error rate can be hundreds of times the nominal rate, this can strain TLS termination resources and negatively impact the entire frontend.

The other attributes cause a slightly less obvious side effect. Where there is another full HTTP proxy upstream, client keep-alive can be maintained by dynamically translating the response to HTTP/1.1 where possible and removing any 'Connection: close' header.
Unfortunately, the lack of any header indicating how to determine when the response is over means that, unless it buffers the full response, the upstream proxy can't modify the response in a way that would let the receiving client know when the response ends, breaking its ability to keep the connection alive.

The previous paragraph is something I've already had to deal with for HAProxy backends: some Jetty servers omit Transfer-Encoding and Content-Length when the request they receive is marked 'Connection: close', as those headers are then hypothetically not required. To maintain client keep-alive in that case while not using server keep-alive, you have to fool the backend with http-pretend-keepalive.

As for solutions, I'm thinking down the following paths:

* Option 1: Modify the responses so that the content length is apparent, then add an upstream full proxy (such as another HAProxy) that can tolerate the connection closes from its server (HAProxy) and hide them from the client. This would work as long as the connections between the frontend proxy and the backend HAProxy could be guaranteed to stay up, but it's slightly kludgy and complicated, and making such a guarantee is hard even when both proxies are on the same host.

* Option 2: Add ACLs that route traffic to static backends that return errors for all requests. Some of the ACL fetches could be useful for this, such as querying the number of free connection slots available to a request. Unfortunately, that would not seem to help clients that have already been queued, so a misbehaving backend could still cause a large number of connection closes. It would lower the worst case, though I don't think by much, and it would significantly complicate routing.

* Option 3: Modify HAProxy to be more careful about terminating connections, and about the header details that might cause other things to terminate connections.
All of the HTTP responses are of static length, so modifying them to be HTTP/1.1-compliant is fairly easy; most of this can also be accomplished with liberal use of errorfile. The unfortunate part is that RFC 2616 section 14.10 specifies: "HTTP/1.1 applications that do not support persistent connections MUST include the "close" connection option in every message." My read of this is that, if client keep-alives are off and support is later added for maintaining keep-alive across responses generated by HAProxy, the responses must include "Connection: close" based only on that conditional, which slightly complicates generating them. There doesn't appear to be a downside to adding Content-Length and returning HTTP/1.1 responses without removing Connection: close, though; a patch is attached which I believe positively impacts this part of the issue.

For the 'don't close connections on HTTP error' modification, I'm a bit out of my depth, at least so far. It seems like https://github.com/haproxy/haproxy/blob/master/src/proto_http.c#L936-L941 is the relevant bit, but I haven't been able to figure out the correct incantation to keep the client connection from being closed.

Any feedback on the issue is extremely welcome, workarounds included. Additionally, any pointers on the changes required to get client keep-alive support working in proto_http.c would be appreciated.

Thanks,
Graham
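P.S. To make Option 2 slightly more concrete, here's the rough shape I had in mind. This is only a sketch: the frontend/backend names and the errorfile path are made up, and the connslots fetch is the "free connection slots" query I was referring to.

```
frontend fe_main
    bind :8080
    # If be_app has no free connection slots, route new requests to a
    # server-less backend instead of letting them queue and time out
    # into a connection-closing 503.
    acl be_app_full connslots(be_app) lt 1
    use_backend be_errors if be_app_full
    default_backend be_app

backend be_errors
    # No servers here, so every request gets the 503 errorfile below,
    # which can be hand-built as HTTP/1.1 with a Content-Length header
    # so that upstream proxies can keep their client connections alive.
    errorfile 503 /etc/haproxy/errors/503-keepalive.http
```

As noted above, this doesn't help requests that are already queued, so it only lowers the worst case.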
From 01f4003bd29e63b72a4a30b400322ec96f278838 Mon Sep 17 00:00:00 2001
From: Graham Forest <gra...@urbanairship.com>
Date: Thu, 12 Nov 2015 11:25:56 -0800
Subject: [PATCH] WIP/MINOR: http: Use HTTP 1.1 for local responses

Responses that lack either "Content-Length" or "Transfer-Encoding:
chunked" are of ambiguous length. This causes upstream proxies to fail
to maintain client keep-alive connections, as responses can only be
terminated by connection close if the response body length is unknown.
Upstream proxies are capable of removing "Connection: close" without
buffering the entire response, so use of HTTP/1.1 and addition of
"Content-Length" keeps the lack of keep-alive usage from escaping
through to the eventual client.

Note that "Connection: close" is being left in because RFC 2616 14.10
specifies "HTTP/1.1 applications that do not support persistent
connections MUST include the "close" connection option in every
message". In the event that client keep-alive support gets added to
these responses, client connections not expected to be kept alive must
still contain the explicit close header.
---
 src/proto_http.c | 36 ++++++++++++++++++++++++------------
 1 file changed, 24 insertions(+), 12 deletions(-)

diff --git a/src/proto_http.c b/src/proto_http.c
index 32b9063..8b123bb 100644
--- a/src/proto_http.c
+++ b/src/proto_http.c
@@ -110,18 +110,20 @@ const char *HTTP_308 =
 
 /* Warning: this one is an sprintf() fmt string, with <realm> as its only argument */
 const char *HTTP_401_fmt =
-	"HTTP/1.0 401 Unauthorized\r\n"
+	"HTTP/1.1 401 Unauthorized\r\n"
 	"Cache-Control: no-cache\r\n"
 	"Connection: close\r\n"
+	"Content-Length: 112\r\n"
 	"Content-Type: text/html\r\n"
 	"WWW-Authenticate: Basic realm=\"%s\"\r\n"
 	"\r\n"
 	"<html><body><h1>401 Unauthorized</h1>\nYou need a valid user and password to access this content.\n</body></html>\n";
 
 const char *HTTP_407_fmt =
-	"HTTP/1.0 407 Unauthorized\r\n"
+	"HTTP/1.1 407 Unauthorized\r\n"
 	"Cache-Control: no-cache\r\n"
 	"Connection: close\r\n"
+	"Content-Length: 112\r\n"
 	"Content-Type: text/html\r\n"
 	"Proxy-Authenticate: Basic realm=\"%s\"\r\n"
 	"\r\n"
@@ -143,81 +145,91 @@ const int http_err_codes[HTTP_ERR_SIZE] = {
 
 static const char *http_err_msgs[HTTP_ERR_SIZE] = {
 	[HTTP_ERR_200] =
-	"HTTP/1.0 200 OK\r\n"
+	"HTTP/1.1 200 OK\r\n"
 	"Cache-Control: no-cache\r\n"
 	"Connection: close\r\n"
+	"Content-Length: 58\r\n"
 	"Content-Type: text/html\r\n"
 	"\r\n"
 	"<html><body><h1>200 OK</h1>\nService ready.\n</body></html>\n",
 
 	[HTTP_ERR_400] =
-	"HTTP/1.0 400 Bad request\r\n"
+	"HTTP/1.1 400 Bad request\r\n"
 	"Cache-Control: no-cache\r\n"
 	"Connection: close\r\n"
+	"Content-Length: 90\r\n"
 	"Content-Type: text/html\r\n"
 	"\r\n"
 	"<html><body><h1>400 Bad request</h1>\nYour browser sent an invalid request.\n</body></html>\n",
 
 	[HTTP_ERR_403] =
-	"HTTP/1.0 403 Forbidden\r\n"
+	"HTTP/1.1 403 Forbidden\r\n"
 	"Cache-Control: no-cache\r\n"
 	"Connection: close\r\n"
+	"Content-Length: 93\r\n"
 	"Content-Type: text/html\r\n"
 	"\r\n"
 	"<html><body><h1>403 Forbidden</h1>\nRequest forbidden by administrative rules.\n</body></html>\n",
 
 	[HTTP_ERR_405] =
-	"HTTP/1.0 405 Method Not Allowed\r\n"
+	"HTTP/1.1 405 Method Not Allowed\r\n"
 	"Cache-Control: no-cache\r\n"
 	"Connection: close\r\n"
+	"Content-Length: 146\r\n"
 	"Content-Type: text/html\r\n"
 	"\r\n"
 	"<html><body><h1>405 Method Not Allowed</h1>\nA request was made of a resource using a request method not supported by that resource\n</body></html>\n",
 
 	[HTTP_ERR_408] =
-	"HTTP/1.0 408 Request Time-out\r\n"
+	"HTTP/1.1 408 Request Time-out\r\n"
 	"Cache-Control: no-cache\r\n"
 	"Connection: close\r\n"
+	"Content-Length: 110\r\n"
 	"Content-Type: text/html\r\n"
 	"\r\n"
 	"<html><body><h1>408 Request Time-out</h1>\nYour browser didn't send a complete request in time.\n</body></html>\n",
 
 	[HTTP_ERR_429] =
-	"HTTP/1.0 429 Too Many Requests\r\n"
+	"HTTP/1.1 429 Too Many Requests\r\n"
 	"Cache-Control: no-cache\r\n"
 	"Connection: close\r\n"
+	"Content-Length: 117\r\n"
 	"Content-Type: text/html\r\n"
 	"\r\n"
 	"<html><body><h1>429 Too Many Requests</h1>\nYou have sent too many requests in a given amount of time.\n</body></html>\n",
 
 	[HTTP_ERR_500] =
-	"HTTP/1.0 500 Server Error\r\n"
+	"HTTP/1.1 500 Server Error\r\n"
 	"Cache-Control: no-cache\r\n"
 	"Connection: close\r\n"
+	"Content-Length: 87\r\n"
 	"Content-Type: text/html\r\n"
 	"\r\n"
 	"<html><body><h1>500 Server Error</h1>\nAn internal server error occured.\n</body></html>\n",
 
 	[HTTP_ERR_502] =
-	"HTTP/1.0 502 Bad Gateway\r\n"
+	"HTTP/1.1 502 Bad Gateway\r\n"
 	"Cache-Control: no-cache\r\n"
 	"Connection: close\r\n"
+	"Content-Length: 107\r\n"
 	"Content-Type: text/html\r\n"
 	"\r\n"
 	"<html><body><h1>502 Bad Gateway</h1>\nThe server returned an invalid or incomplete response.\n</body></html>\n",
 
 	[HTTP_ERR_503] =
-	"HTTP/1.0 503 Service Unavailable\r\n"
+	"HTTP/1.1 503 Service Unavailable\r\n"
 	"Cache-Control: no-cache\r\n"
 	"Connection: close\r\n"
+	"Content-Length: 107\r\n"
 	"Content-Type: text/html\r\n"
 	"\r\n"
 	"<html><body><h1>503 Service Unavailable</h1>\nNo server is available to handle this request.\n</body></html>\n",
 
 	[HTTP_ERR_504] =
-	"HTTP/1.0 504 Gateway Time-out\r\n"
+	"HTTP/1.1 504 Gateway Time-out\r\n"
 	"Cache-Control: no-cache\r\n"
 	"Connection: close\r\n"
+	"Content-Length: 92\r\n"
 	"Content-Type: text/html\r\n"
 	"\r\n"
 	"<html><body><h1>504 Gateway Time-out</h1>\nThe server didn't respond in time.\n</body></html>\n",
-- 
2.4.3