I’ll drill into our config a bit, but regardless of the config, I’m confused as to how an “S” error (TCP connection aborted or refused) could happen in the “Q” phase (waiting in the queue, where presumably there has been no server-side activity yet).
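The termination state in the sample log line quoted below ("SQVN") is positional. The first letter is the first event that ended the session, and the second is the state the session was in at that moment. In HTTP mode, the third and fourth letters describe the persistence cookie on the request and on the response. Here is a minimal Python sketch that spells this out. The helper `decode_termination_state` is hypothetical, and the tables are abridged and paraphrased from the HAProxy 1.5 manual's description of session state at disconnection. They are not an official API.

```python
# Abridged tables paraphrased from the HAProxy 1.5 manual ("session state at
# disconnection"). Only a subset of the documented codes is included here.
FIRST_EVENT = {
    "C": "session aborted by the client",
    "S": "session aborted by the server, or the server refused it",
    "P": "session prematurely aborted by the proxy",
    "R": "a resource on the proxy was exhausted",
    "c": "client-side timeout expired",
    "s": "server-side timeout expired",
    "-": "normal session completion",
}

SESSION_STATE = {
    "R": "waiting for a complete, valid request from the client",
    "Q": "waiting in the queue for a connection slot",
    "C": "waiting for the connection to the server to be established",
    "H": "waiting for complete, valid response headers from the server",
    "D": "in the data phase",
    "L": "still transmitting last data to the client",
    "T": "request was tarpitted",
    "-": "normal completion after end of data transfer",
}

# Third letter: persistence cookie sent by the client (HTTP mode only).
CLIENT_COOKIE = {
    "N": "client provided no cookie",
    "I": "client provided an invalid cookie",
    "D": "client's cookie designated a server that was down",
    "V": "client provided a valid cookie and was sent to that server",
    "-": "does not apply (no cookie configured)",
}

# Fourth letter: what happened to the cookie on the response (HTTP mode only).
SERVER_COOKIE = {
    "N": "no cookie from the server, and none inserted",
    "I": "no cookie from the server; the proxy inserted one",
    "P": "server provided a cookie, passed through as-is",
    "-": "does not apply (no cookie configured)",
}


def decode_termination_state(flags: str) -> list:
    """Explain a 2-letter (TCP log) or 4-letter (HTTP log) termination state."""
    if len(flags) not in (2, 4):
        raise ValueError(f"expected 2 or 4 flag characters, got {flags!r}")
    tables = [FIRST_EVENT, SESSION_STATE, CLIENT_COOKIE, SERVER_COOKIE]
    # zip() stops after len(flags) tables, so 2-letter codes use only the first two.
    return [table.get(ch, f"unknown code {ch!r}") for table, ch in zip(tables, flags)]


if __name__ == "__main__":
    for meaning in decode_termination_state("SQVN"):
        print(meaning)
```

Read this way, "SQ" combines "aborted by the server, or refused by it" with "waiting in the queue for a connection slot". That is the combination the question is about. The decoder only restates the documented meanings and does not explain why the combination occurs.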
We do not have "timeout queue" set. Here are the (sanitized) parts of our config file that seem relevant: the defaults section, plus the frontend and backend generating these errors. (There are a couple more services more or less like this, a number of SSL interfaces, and many other proxies that are lightly used. The only global setting that seems relevant is the global maxconn, which is set to 200000. Also note that there are five servers running HAProxy, all unaware of each other but pointing to the same backends, with DNS round robin spreading load between them.)

defaults
    log global
    mode http
    option httplog
    option dontlognull
    option redispatch
    retries 3
    fullconn 4000
    maxconn 200000
    timeout connect 60000
    timeout client 120000
    timeout server 120000
    timeout http-keep-alive 15000

frontend service1
    bind 1.2.3.4:80
    bind 1.2.3.4:81 accept-proxy
    bind-process 1
    default_backend service1

backend service1
    balance source
    hash-type consistent wt6 avalanche
    option forwardfor
    option http-server-close
    option http-pretend-keepalive
    option httplog
    option httpchk GET /healthCheck.htm HTTP/1.1\r\nHost:\ foo.com
    cookie SERVERID insert indirect
    server app1 app1:8080 cookie app1 maxconn 25 maxqueue 5 weight 100 check
    server app2 app2:8080 cookie app2 maxconn 25 maxqueue 5 weight 100 check
    server app3 app3:8080 cookie app3 maxconn 25 maxqueue 5 weight 100 check
    … [there are 80 app servers, all configured similarly]

Any ideas on what “SQ” might be telling us would be appreciated.

Thanks,
-Daniel

> On Nov 28, 2014, at 4:09 AM, Baptiste <bed...@gmail.com> wrote:
>
> On Tue, Nov 25, 2014 at 6:56 AM, Daniel Lieberman
> <dlieber...@bitpusher.com> wrote:
>> We're managing a fairly high-traffic site and we're seeing a lot of HTTP
>> 503s with termination state SQ.
>> I'm trying to figure out what this means,
>> since Q indicates that we're still in the queue (in the server-specific
>> queue in our case) whereas S indicates a TCP connection issue, which I would
>> have thought meant that we'd made it into a connection slot.
>>
>> Here's a sanitized version of a sample log line:
>> Nov 24 03:58:54 lb03 haproxy[20966]: 71.72.73.74:60738
>> [24/Nov/2014:03:58:54.245] foo3 foo/app53 0/-1/-1/-1/0 503 212 - - SQVN
>> 159/131/132/25/0 0/0 "PUT /foo/bar/baz HTTP/1.1"
>>
>> The server maxconn is set to 25, and this shows that the server has 25
>> connections already, so it makes sense that we're in the queue. We do have
>> retries set (to 3) and redispatch on, but neither appears to be happening
>> here.
>>
>> We're running HAproxy 1.5.4 on CentOS 6.5.
>>
>> (Note that we know that the app servers are having issues. Our main goals
>> here are to learn what we can about the app server behavior, and to figure
>> out what we can do in HAproxy to mitigate the issues until the developers
>> manage to fix them.)
>>
>> Thanks,
>> -Daniel
>>
>> Daniel Lieberman
>> BitPusher, LLC
>
> Hi Daniel
>
> Sharing your configuration may help :)
>
> Have you set up a timeout queue? (if not, what's your timeout connect value?)
> redispatch and retries are only used when HAProxy tries to establish
> connections to the server. Here, you were not even in this phase.
>
> Baptiste
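To line up the sample log fields against the configuration, here is a minimal Python sketch that splits the quoted line into its httplog fields. It extracts the timers Tq/Tw/Tc/Tr/Tt, the connection counts actconn/feconn/beconn/srv_conn/retries, and the server/backend queue sizes. The regex assumes the HAProxy 1.5 "option httplog" layout with no header captures. `parse_httplog` and `effective_queue_timeout_ms` are hypothetical helpers. The constants (25, 80, 5, 60000) are copied by hand from the config described above rather than read from it. The queue-timeout helper encodes the documented fallback behind Baptiste's question: when "timeout queue" is unset, HAProxy uses "timeout connect".

```python
import re

# Sample log line from the thread, as posted (already sanitized).
LOG_LINE = (
    "Nov 24 03:58:54 lb03 haproxy[20966]: 71.72.73.74:60738 "
    "[24/Nov/2014:03:58:54.245] foo3 foo/app53 0/-1/-1/-1/0 503 212 - - SQVN "
    '159/131/132/25/0 0/0 "PUT /foo/bar/baz HTTP/1.1"'
)

# Field layout assumed from HAProxy 1.5 "option httplog" with no header
# captures (so no {...} blocks). Some numeric fields may carry a leading "+"
# (e.g. retries after a redispatch), hence the optional sign.
HTTPLOG_RE = re.compile(
    r"(?P<client>\S+) \[(?P<accept_date>[^\]]+)\] (?P<frontend>\S+) "
    r"(?P<backend>[^/\s]+)/(?P<server>\S+) "
    r"(?P<Tq>[+-]?\d+)/(?P<Tw>[+-]?\d+)/(?P<Tc>[+-]?\d+)/(?P<Tr>[+-]?\d+)/(?P<Tt>[+-]?\d+) "
    r"(?P<status>\d{3}) (?P<bytes_read>[+-]?\d+) (?P<req_cookie>\S+) (?P<resp_cookie>\S+) "
    r"(?P<term_state>\S{2,4}) "
    r"(?P<actconn>\d+)/(?P<feconn>\d+)/(?P<beconn>\d+)/(?P<srv_conn>\d+)/(?P<retries>[+-]?\d+) "
    r"(?P<srv_queue>\d+)/(?P<backend_queue>\d+) "
    r'"(?P<request>[^"]*)"'
)

NUMERIC_FIELDS = (
    "Tq", "Tw", "Tc", "Tr", "Tt", "status", "bytes_read",
    "actconn", "feconn", "beconn", "srv_conn", "retries",
    "srv_queue", "backend_queue",
)


def parse_httplog(line: str) -> dict:
    """Split one httplog line into named fields; numeric fields become ints."""
    match = HTTPLOG_RE.search(line)  # search() skips the syslog prefix
    if match is None:
        raise ValueError("not an httplog line in the expected format")
    fields = match.groupdict()
    for name in NUMERIC_FIELDS:
        fields[name] = int(fields[name])  # int() accepts a leading "+" or "-"
    return fields


def effective_queue_timeout_ms(timeout_queue_ms, timeout_connect_ms):
    """HAProxy falls back to "timeout connect" when "timeout queue" is unset."""
    return timeout_connect_ms if timeout_queue_ms is None else timeout_queue_ms


if __name__ == "__main__":
    f = parse_httplog(LOG_LINE)
    server_maxconn = 25    # "maxconn 25" on each server line
    app_servers = 80       # "there are 80 app servers"
    haproxy_instances = 5  # five HAProxy boxes, unaware of each other

    print("timers Tq/Tw/Tc/Tr/Tt:", f["Tq"], f["Tw"], f["Tc"], f["Tr"], f["Tt"])
    print("termination state:", f["term_state"], "retries:", f["retries"])
    print("server slots in use:", f["srv_conn"], "of", server_maxconn)
    print("server/backend queue:", f["srv_queue"], f["backend_queue"])
    print("effective queue timeout (ms):", effective_queue_timeout_ms(None, 60000))
    # maxconn is enforced per HAProxy instance, so the limits multiply.
    print("server slots per HAProxy instance:", app_servers * server_maxconn)
    print("worst-case connections per app server:", haproxy_instances * server_maxconn)
```

Against the sample line, the sketch shows srv_conn at 25 (the server's maxconn), both queue counters at 0, and Tt at 0 ms, next to a 60000 ms effective queue timeout. It only surfaces these numbers side by side; it does not by itself explain the SQ state.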