I’ll drill into our config a bit, but regardless of the config I’m confused as 
to how an “S” error (TCP connection aborted or refused) could happen in the “Q” 
phase (waiting in queue, presumably no server-side activity yet).

We do not have timeout queue set.  Here are the (sanitized) pieces of our 
config file that seem relevant: the defaults, plus the frontend and backend 
generating these errors.  (There are a couple more services more or less like 
this, a number of SSL interfaces, and many other lightly used proxies.  The 
only global setting that seems relevant is maxconn, which is set to 200000.  
Also note that there are five servers running HAProxy, all ignorant of each 
other but pointing to the same backends, with DNS round robin spreading load 
between them.)

defaults
  log     global
  mode    http
  option  httplog

  option dontlognull
  option redispatch

  retries     3
  fullconn    4000
  maxconn     200000
  timeout connect  60000
  timeout client  120000
  timeout server  120000
  timeout http-keep-alive 15000

frontend service1
  bind 1.2.3.4:80
  bind 1.2.3.4:81 accept-proxy
  bind-process 1
  default_backend service1

backend service1
  balance source
  hash-type consistent wt6 avalanche
  option forwardfor
  option http-server-close
  option http-pretend-keepalive
  option httplog
  option httpchk GET /healthCheck.htm HTTP/1.1\r\nHost:\ foo.com

  cookie SERVERID insert indirect

  server app1  app1:8080 cookie app1 maxconn 25 maxqueue 5 weight 100 check
  server app2  app2:8080 cookie app2 maxconn 25 maxqueue 5 weight 100 check
  server app3  app3:8080 cookie app3 maxconn 25 maxqueue 5 weight 100 check
… [there are 80 app servers, all configured similarly]
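
For reference, setting the queue timeout explicitly would look roughly like 
the fragment below.  The 30000 ms value is an arbitrary placeholder for 
illustration, not something from our running config.  As I read the 
documentation, when timeout queue is unset HAProxy falls back to timeout 
connect, which would mean 60000 ms for us.

```
defaults
  # placeholder value for illustration only; not in our running config
  timeout queue    30000
  # existing setting; as I understand it, this is also used as the queue
  # timeout whenever "timeout queue" is not set
  timeout connect  60000
```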



Any ideas on what “SQ” might be telling us would be appreciated.

Thanks,
-Daniel

> On Nov 28, 2014, at 4:09 AM, Baptiste <bed...@gmail.com> wrote:
> 
> On Tue, Nov 25, 2014 at 6:56 AM, Daniel Lieberman
> <dlieber...@bitpusher.com> wrote:
>> We're managing a fairly high-traffic site and we're seeing a lot of HTTP 
>> 503s with termination state SQ.  I'm trying to figure out what this means, 
>> since Q indicates that we're still in the queue (in the server-specific 
>> queue in our case) whereas S indicates a TCP connection issue, which I would 
>> have thought meant that we'd made it into a connection slot.
>> 
>> Here's a sanitized version of a sample log line:
>> Nov 24 03:58:54 lb03 haproxy[20966]: 71.72.73.74:60738 
>> [24/Nov/2014:03:58:54.245] foo3 foo/app53 0/-1/-1/-1/0 503 212 - - SQVN 
>> 159/131/132/25/0 0/0 "PUT /foo/bar/baz HTTP/1.1"
>> 
>> The server maxconn is set to 25, and this shows that the server has 25 
>> connections already, so it makes sense that we're in the queue.  We do have 
>> retries set (to 3) and redispatch on, but neither appears to be happening 
>> here.
>> 
>> We're running HAproxy 1.5.4 on CentOS 6.5.
>> 
>> (Note that we know that the app servers are having issues.  Our main goals 
>> here are to learn what we can about the app server behavior, and to figure 
>> out what we can do in HAproxy to mitigate the issues until the developers 
>> manage to fix them.)
>> 
>> Thanks,
>> -Daniel
>> 
>> Daniel Lieberman
>> BitPusher, LLC
> 
> Hi Daniel
> 
> Sharing your configuration may help :)
> 
> Have you set up a timeout queue? (if not, what's your timeout connect value?)
> redispatch and retries are only used when HAProxy tries to establish
> connections to the server. Here, you were not even in this phase.
> 
> Baptiste

