Hi Nathan,

On Tue, Mar 30, 2021 at 09:21:30AM -0700, Nathan Konopinski wrote:
> Sometimes clients (clients are only http 1.1 and use connection: close) are
> reporting a body length of ~4000 is less than the content length of ~14000.
> The issue does not appear when using nginx as an LB and I've verified
> complete responses are being sent from the backends for the requests
> clients report errors on.
>
> It's not clear why a portion of the clients aren't receiving the entire
> response. I'm unable to replicate the issue with curl. I have a vanilla
> config using https, prometheus metrics, and a h1-case-adjust-bogus-client
> option to adjust a couple headers.
>
> Has anyone come across similar issues? I see an option for request
> buffering but nothing for response buffering. Are there options I can
> adjust that could be related to this type of issue?
No, it's not expected at all and should really never happen. One option could have caused this, "option nolinger", but you don't have it, and your config is really clean and straightforward.

Could you take a capture of the communications between the clients and haproxy?

The fact that you're using close opens the way for a subtle issue that affects certain old clients with POST requests. Some of them send POST requests with a body and then, for no particular reason, emit an extra CRLF half a second to a second later. That CRLF is not part of the current body, so it is not read as part of the request, and it may even arrive after the response. If haproxy has already sent the response back (and 14kB fits perfectly in a single buffer, so that sounds plausible) and closed the connection (since there's the connection: close), and the CRLF from the client arrives *after* the close, then the TCP stack will reset the connection and send a TCP RST back. First, this will cause any pending data to be dropped. Second, when the client receives the RST, it may also drop some of the data it had previously received but not yet read.

You don't necessarily need to decrypt HTTPS to detect this. Simply taking a network capture, looking for RSTs and checking whether some non-empty TCP segments flow from the client to haproxy just before each RST would already be an indication.

What's nasty, if you have to deal with this, is that it's totally timing-dependent, and the possible workarounds are just that: workarounds.

Regards,
Willy
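The capture check described above (an RST from haproxy right after a non-empty client-to-haproxy segment) could be sketched in Python roughly as follows. This is only a minimal heuristic, assuming the TCP segments have already been exported from the capture into simple records; the `Packet` type and the `suspect_rsts` function are hypothetical illustrations, not part of haproxy or of any capture tool. Over HTTPS the late CRLF shows up as a small encrypted TLS record rather than literally 2 bytes, so the check only looks for a non-zero payload.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

Endpoint = Tuple[str, int]  # (IP address, TCP port)


@dataclass(frozen=True)
class Packet:
    """Hypothetical record for one TCP segment exported from a capture."""
    ts: float          # capture timestamp, in seconds
    src: Endpoint      # sender
    dst: Endpoint      # receiver
    flags: str         # TCP flags as letters, e.g. "S", "A", "PA", "FA", "R"
    payload_len: int   # TCP payload bytes (encrypted size when using HTTPS)


def suspect_rsts(packets: Iterable[Packet], server: Endpoint) -> List[Packet]:
    """Heuristic: return the RSTs sent by `server` (haproxy) whose most recent
    preceding client-to-server segment on the same connection carried data."""
    # The server endpoint is fixed, so the client endpoint identifies a connection.
    last_from_client: Dict[Endpoint, Packet] = {}
    suspects: List[Packet] = []
    # sorted() is stable, so packets sharing a timestamp keep their capture order.
    for pkt in sorted(packets, key=lambda p: p.ts):
        if pkt.dst == server:
            last_from_client[pkt.src] = pkt
        elif pkt.src == server and "R" in pkt.flags:
            prev = last_from_client.get(pkt.dst)
            if prev is not None and prev.payload_len > 0:
                suspects.append(pkt)
    return suspects


if __name__ == "__main__":
    haproxy = ("10.0.0.1", 443)      # illustrative addresses
    client = ("192.0.2.10", 50000)
    capture = [
        Packet(0.00, client, haproxy, "PA", 900),    # POST request with body
        Packet(0.01, haproxy, client, "PA", 1448),   # response data (further segments omitted)
        Packet(0.01, haproxy, client, "FA", 0),      # haproxy closes (connection: close)
        Packet(0.02, client, haproxy, "A", 0),       # client ACKs the response
        Packet(0.60, client, haproxy, "PA", 31),     # late CRLF in a small TLS record (size illustrative)
        Packet(0.60, haproxy, client, "R", 0),       # stack resets the closed connection
    ]
    for rst in suspect_rsts(capture, haproxy):
        print(f"suspect RST at t={rst.ts:.2f}s to {rst.dst[0]}:{rst.dst[1]}")
```

Keying connections by the client endpoint keeps the sketch short; a real capture where client ports get reused would need to split connections on SYNs as well.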