On Thu, Oct 8, 2015 at 11:51 AM, Igor Cicimov <ig...@encompasscorporation.com> wrote:

>
>
> On Thu, Oct 8, 2015 at 12:18 AM, Dmitry Sivachenko <trtrmi...@gmail.com> wrote:
>
>> Hello,
>>
>> I am using haproxy-1.5.14 and sometimes I see the following errors in the
>> log:
>>
>> Oct  7 08:33:03 srv1 haproxy[77565]: unix:1 [07/Oct/2015:08:33:02.428]
>> MT-front MT_RU_EN-back/<NOSRV> 0/1000/-1/-1/1000 503 212 - - sQ--
>> 125/124/108/0/0 0/28 "POST /some/url HTTP/1.1"
>> (many similar lines at the same moment)
>>
>> What these errors have in common is "1000" in Tw and Tt, and the "sQ--"
>> termination state.
>>
>> Here is the relevant part of my config (I can post more if needed):
>>
>> defaults
>>     balance roundrobin
>>     maxconn 10000
>>     timeout queue 1s
>>     fullconn 3000
>>     default-server inter 5s downinter 1s fastinter 500ms fall 3 rise 1 slowstart 60s maxqueue 1 minconn 5 maxconn 150
>>
>> backend MT_RU_EN-back
>>     mode http
>>     timeout server 30s
>>     server mt1-34 mt1-34:19016 track MT-back/mt1-34 weight 38
>>     server mt1-35 mt1-35:19016 track MT-back/mt1-35 weight 38
>>     <18 similar servers in total>
>>
>> So this log line indicates that the request sat in the queue for the full
>> "timeout queue" (1s) and its turn never came.
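To check that reading against the log line itself, here is a minimal Python sketch that splits it into the fields of HAProxy's HTTP log format: the Tq/Tw/Tc/Tr/Tt timers, the termination state, the actconn/feconn/beconn/srv_conn/retries counters and the srv_queue/backend_queue counters. `parse_httplog` and `expired_in_queue` are hypothetical helpers, and the regex assumes the default "option httplog" layout rather than a custom log-format.

```python
import re

# Hypothetical helper: assumes the default "option httplog" layout; optional
# {captured headers} blocks sit between the queue counters and the request.
LOG_RE = re.compile(
    r'(?P<client>\S+) \[(?P<accept_date>[^\]]+)\] '
    r'(?P<frontend>\S+) (?P<backend>[^/\s]+)/(?P<server>\S+) '
    r'(?P<timers>\S+) (?P<status>\d+) (?P<bytes>\+?\d+) '
    r'(?P<req_cookie>\S+) (?P<res_cookie>\S+) (?P<term>\S{4}) '
    r'(?P<conns>\S+) (?P<queues>\S+) (?:\{[^}]*\} )*"(?P<request>[^"]*)"'
)


def parse_httplog(line):
    m = LOG_RE.search(line)
    if m is None:
        raise ValueError("not an HAProxy HTTP log line")
    f = m.groupdict()
    # Timers are in milliseconds; -1 means that phase never happened.
    tq, tw, tc, tr, tt = (int(x) for x in f["timers"].split("/"))
    actconn, feconn, beconn, srv_conn, retries = (
        int(x) for x in f["conns"].split("/"))
    srv_queue, backend_queue = (int(x) for x in f["queues"].split("/"))
    return {
        "backend": f["backend"], "server": f["server"],
        "status": int(f["status"]), "term": f["term"],
        "Tq": tq, "Tw": tw, "Tc": tc, "Tr": tr, "Tt": tt,
        "actconn": actconn, "feconn": feconn, "beconn": beconn,
        "srv_conn": srv_conn, "retries": retries,
        "srv_queue": srv_queue, "backend_queue": backend_queue,
    }


def expired_in_queue(entry, timeout_queue_ms):
    # "sQ" termination after waiting at least the whole queue timeout.
    return entry["term"].startswith("sQ") and entry["Tw"] >= timeout_queue_ms
```

Applied to the line above, it reports Tw = Tt = 1000 ms (the whole 1s queue timeout), Tc = -1 (no connection to a server was ever attempted), beconn = 108 and 28 requests waiting in the backend queue.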
>>
>> In the stats web interface for the MT_RU_EN-back backend I see the
>> following numbers:
>>
>> Sessions: limit=3000, max=126 (for the whole backend)
>> Limit=150, max=5 or 6 (for each server)
>>
>> If I understand the meaning of minconn/maxconn correctly, each server
>> should accept up to min(150, 3000/18) connections.
>>
>> So according to the stats, the load was far from the limits.
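The expectation above can be checked numerically. This Python sketch evaluates the expression min(150, 3000/18) and, for comparison, the ramp the HAProxy manual describes for "minconn": the per-server limit grows from minconn toward maxconn as the backend's concurrent connections approach fullconn. The linear formula max(minconn, beconn * maxconn / fullconn) is an assumption about that ramp, `slowstart` (which scales the limit further while a server warms up) is ignored, and both function names are hypothetical.

```python
def expected_limit(maxconn, fullconn, servers):
    # The reading in the message above: min(150, 3000/18).
    return min(maxconn, fullconn // servers)


def dynamic_maxconn(beconn, minconn, maxconn, fullconn):
    # Assumed linear ramp between minconn and maxconn, reaching maxconn
    # once the backend holds fullconn concurrent connections.
    if beconn >= fullconn or minconn == maxconn:
        return maxconn
    return max(minconn, beconn * maxconn // fullconn)


print(expected_limit(150, 3000, 18))  # 150
# beconn=108 comes from the log line, 126 from the backend "max" in the stats.
for beconn in (108, 126):
    print(beconn, dynamic_maxconn(beconn, 5, 150, 3000))  # 108 5, then 126 6
```

Under that assumption, a backend holding 108 to 126 connections would give each server a limit of only 5 or 6, which is worth comparing with the per-server "max=5 or 6" shown in the stats.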
>>
>> What can be the cause of such errors?
>>
>> Thanks!
>>
>
> The only thing I can think of is that you have left net.core.somaxconn at
> 128; try increasing it to, let's say, 4096 to match your planned capacity of 3000.
>
>
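The somaxconn suggestion rests on one documented kernel behavior: on Linux, listen(2) silently caps the requested backlog at the value in /proc/sys/net/core/somaxconn. A minimal Python sketch of that check follows; the helper names are hypothetical, and the assumption that HAProxy asks for a backlog close to the planned 3000 is illustrative only.

```python
from pathlib import Path

# Linux location of the net.core.somaxconn sysctl.
SOMAXCONN = Path("/proc/sys/net/core/somaxconn")


def read_somaxconn(path=SOMAXCONN):
    # Return the kernel's somaxconn value, or None if it cannot be read.
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def effective_backlog(requested, somaxconn):
    # Linux silently caps the listen() backlog at somaxconn.
    if somaxconn is None:
        return requested
    return min(requested, somaxconn)


print(effective_backlog(3000, 128))   # 128
print(effective_backlog(3000, 4096))  # 3000
```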
     sQ   The session spent too much time in queue and has been expired. See
          the "timeout queue" and "timeout connect" settings to find out how to
          fix this if it happens too often. If it often happens massively in
          short periods, it may indicate general problems on the affected
          servers due to I/O or database congestion, or saturation caused by
          external attacks.
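The "happens massively in short periods" hint can be checked against the logs by counting sQ terminations per second of accept date. A minimal Python sketch, assuming the default HTTP log layout (termination state followed by the actconn/feconn/... counters); `sq_bursts` and its threshold of 10 per second are hypothetical choices, not HAProxy settings.

```python
import re
from collections import Counter

# Captures the accept date truncated to the second, then requires an "sQ.."
# termination state followed by the connection counters.
SQ_RE = re.compile(
    r'\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2})\.\d{3}\] .* (sQ\S\S) \d+/\d+/'
)


def sq_bursts(lines, threshold=10):
    # Return {second: count} for seconds with at least `threshold` sQ lines.
    per_second = Counter()
    for line in lines:
        m = SQ_RE.search(line)
        if m:
            per_second[m.group(1)] += 1
    return {sec: n for sec, n in per_second.items() if n >= threshold}
```

Seconds that show up with large counts mark the bursts; isolated sQ lines fall below the threshold and are dropped.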

That is another possibility to investigate. If the backends are too slow, or
the connection is delayed by a firewall in between, then tuning
"timeout connect" may help:

If the server is located on the same LAN as haproxy, the connection should be
immediate (less than a few milliseconds). Anyway, it is a good practice to
cover one or several TCP packet losses by specifying timeouts that are
slightly above multiples of 3 seconds (eg: 4 or 5 seconds). By default, the
connect timeout also presets both queue and tarpit timeouts to the same value
if these have not been specified.
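The "multiples of 3 seconds" advice follows from TCP's SYN retransmission schedule. Assuming the classic 3 s initial retransmission timeout, doubled after each loss (newer stacks often start at 1 s, so the constant is an assumption), a lost SYN is resent 3 s, 9 s and 21 s after the first attempt, and a 4 or 5 s connect timeout leaves room for one loss. The second helper restates the last sentence of the excerpt: queue and tarpit timeouts fall back to "timeout connect" only when they are unset. Both function names are hypothetical.

```python
def syn_retransmit_times(initial_rto=3.0, retries=3):
    # Elapsed seconds at which a lost SYN is resent, assuming the RTO
    # starts at initial_rto and doubles after each loss.
    times, elapsed, rto = [], 0.0, initial_rto
    for _ in range(retries):
        elapsed += rto
        times.append(elapsed)
        rto *= 2
    return times


def effective_timeouts(connect_ms, queue_ms=None, tarpit_ms=None):
    # Queue and tarpit timeouts default to "timeout connect" when unset.
    return {
        "connect": connect_ms,
        "queue": connect_ms if queue_ms is None else queue_ms,
        "tarpit": connect_ms if tarpit_ms is None else tarpit_ms,
    }


print(syn_retransmit_times())                          # [3.0, 9.0, 21.0]
print(effective_timeouts(5000, queue_ms=1000)["queue"])  # 1000
```

In the configuration quoted above, "timeout queue 1s" is set explicitly, so under this rule changing "timeout connect" alone would not lengthen the 1 s queue wait.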
