Re: About maxconn and minconn

2015-10-13 Thread Willy Tarreau
Hi Dmitry,

sorry for the delay, I really didn't have time to analyse the config
you sent me.

A few points below:

On Wed, Oct 07, 2015 at 04:18:20PM +0300, Dmitry Sivachenko wrote:
> Oct  7 08:33:03 srv1 haproxy[77565]: unix:1 [07/Oct/2015:08:33:02.428] 
> MT-front MT_RU_EN-back/ 0/1000/-1/-1/1000 503 212 - - sQ-- 
> 125/124/108/0/0 0/28 "POST /some/url HTTP/1.1"
> (many similar at one moment)
> 
> Common part in these errors is "1000" in Tw and Tt, and "sQ--" termination 
> state.
> 
> Here is the relevant part on my config (I can post more if needed):
> 
> defaults
> balance roundrobin
> maxconn 1
> timeout queue 1s
> fullconn 3000
> default-server inter 5s downinter 1s fastinter 500ms fall 3 rise 1 
> slowstart 60s maxqueue 1 minconn 5 maxconn 150
> 
> backend MT_RU_EN-back
> mode http
> timeout server 30s
> server mt1-34 mt1-34:19016 track MT-back/mt1-34 weight 38
> server mt1-35 mt1-35:19016 track MT-back/mt1-35 weight 38
> 
> 
> So this error log indicates that request was sitting in the queue for timeout 
> queue==1s and his turn did not come.
> 
> In the stats web interface for MT_RU_EN-back backend I see the following 
> numbers:
> 
> Sessions: limit=3000, max=126 (for the whole backend)
> Limit=150, max=5 or 6 (for each server)
> 
> If I understand minconn/maxconn meaning right, each server should accept up 
> to min(150, 3000/18) connections
> 
> So according to stats the load were far from limits.

No, look: the log says there were 108 connections on the backend. This
is important because you're using minconn, which means you're using dynamic
queueing. So the effective limit when handling this request was around
maxconn*currconn/fullconn (never below minconn), which is 150*108/3000 = 5.4,
so the limit was 5 connections. Thus the limit for this server was indeed reached.
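Here is a minimal Python sketch of the effective limit described here. It assumes the scaled value is floored at minconn and capped at maxconn once the backend reaches fullconn, and it leaves out the slowstart ramp-up that also applies to these servers. The function name is hypothetical; this is not HAProxy's own code.

```python
def dynamic_maxconn(minconn: int, maxconn: int, beconn: int, fullconn: int) -> int:
    """Effective per-server limit under dynamic queueing (sketch).

    Scales maxconn by the backend's current load (beconn / fullconn),
    never going below minconn. Once the backend reaches fullconn, the
    full maxconn applies. Slowstart is ignored.
    """
    if beconn >= fullconn:
        return maxconn
    # Integer division: 150 * 108 // 3000 == 5 (the 5.4 above, truncated).
    return max(minconn, maxconn * beconn // fullconn)


# Numbers from the sQ log line (beconn=108) and the posted config.
print(dynamic_maxconn(minconn=5, maxconn=150, beconn=108, fullconn=3000))  # 5
```

With these settings a server only reaches its configured maxconn of 150 when the backend carries 3000 concurrent connections. At 300 connections, the limit is 15.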

Playing with minconn and fullconn is hard and strongly advised against
unless you know exactly how to tune it. You must always ensure that a
normal load will be handled without queuing (or with a very small queue),
and that maxconn will quickly be reached to handle high traffic. I tend to
consider that an efficient fullconn is around 10% of the maximum load the
farm may have to deal with (which is the default value IIRC). Regarding
minconn, it's wise not to set it too low. A good rule of thumb is
to estimate what would happen at 10% of fullconn (1% of the max load).
In your case, at 300 concurrent connections, your servers will accept
15 connections each. I have no idea whether or not this is enough to
handle the load. But let's say you have 4 servers: that's only 60
concurrent connections to process 300 front connections. While that can
be perfectly fine, you may need to increase the queue timeout so that
requests can wait long enough for a slot. With a 5:1 overbooking and
your 1s queue timeout, you are expecting the servers' average
response time never to go above 200ms. That may be a bit short for some
applications, especially those sensitive to connection count.
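The arithmetic in that paragraph can be reproduced with a short Python sketch. It relies on three assumptions: the same dynamic scaling (floored at minconn, capped at maxconn), Willy's hypothetical 4 servers, and that a request queued behind an N:1 overbooking waits roughly N average response times. The function name is made up for illustration.

```python
def rule_of_thumb(backend_conns: int, fullconn: int, maxconn: int,
                  minconn: int, n_servers: int, timeout_queue_ms: int):
    # Per-server dynamic limit at this backend load, floored at minconn
    # and never above maxconn.
    per_server = min(maxconn, max(minconn, maxconn * backend_conns // fullconn))
    # Total concurrent slots the servers offer.
    slots = per_server * n_servers
    # Front connections competing for each slot (5.0 means 5:1).
    overbooking = backend_conns / slots
    # Average response time the queue timeout can absorb at that ratio.
    max_avg_response_ms = timeout_queue_ms / overbooking
    return per_server, slots, overbooking, max_avg_response_ms


# 10% of fullconn = 300 connections, 4 servers, "timeout queue 1s".
print(rule_of_thumb(300, 3000, 150, 5, 4, 1000))  # (15, 60, 5.0, 200.0)
```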

Thus I'd suggest that you either lower fullconn or increase minconn, and
in any case that you also increase the queue timeout to cover the worst
overbooking situation with the average server's response time.
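To compare the two suggested directions, here is a short Python sketch that uses the same dynamic-limit assumptions (floored at minconn, integer division). The alternative fullconn of 1000, the minconn of 20, and the 400 ms average response time are hypothetical values chosen only for illustration. They are not recommendations.

```python
def per_server_limit(beconn: int, fullconn: int, minconn: int, maxconn: int) -> int:
    # Dynamic limit: maxconn scaled by beconn/fullconn, floored at minconn,
    # and equal to maxconn once the backend reaches fullconn.
    if beconn >= fullconn:
        return maxconn
    return max(minconn, maxconn * beconn // fullconn)


def min_queue_timeout_ms(worst_overbooking: float, avg_response_ms: float) -> float:
    # Enough queue time to wait out `worst_overbooking` average responses.
    return worst_overbooking * avg_response_ms


beconn = 108  # from the sQ log line
print(per_server_limit(beconn, 3000, 5, 150))   # 5: posted config
print(per_server_limit(beconn, 1000, 5, 150))   # 16: hypothetical lower fullconn
print(per_server_limit(beconn, 3000, 20, 150))  # 20: hypothetical higher minconn
print(min_queue_timeout_ms(5, 400))             # 2000: 5:1 at a hypothetical 400 ms
```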

During the tuning phase, I'd suggest *significantly* increasing the queue
timeout so that you can observe the connection counts and even the average
response time per connection count. That will help you refine the tuning.
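One way to get "average response time per connection count" out of the logs is to bucket the Tr timer by the concurrent connection count recorded on the same line. This minimal sketch assumes the (srv_conn, Tr) pairs have already been extracted from HAProxy HTTP log lines. The function name and sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def avg_response_by_conn_count(samples):
    """Average server response time (Tr, ms) per concurrent connection count.

    samples: iterable of (srv_conn, tr_ms) pairs. Entries with Tr == -1
    (no complete response received) are skipped.
    """
    buckets = defaultdict(list)
    for conns, tr in samples:
        if tr >= 0:
            buckets[conns].append(tr)
    return {conns: mean(trs) for conns, trs in sorted(buckets.items())}


# Hypothetical samples, not taken from this thread.
samples = [(3, 80), (3, 100), (5, 150), (5, 170), (5, -1), (8, 400)]
print(avg_response_by_conn_count(samples))
```

If the average Tr climbs steeply past some connection count, that count hints at where minconn/maxconn should sit.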

Hoping this helps,
Willy




Re: About maxconn and minconn

2015-10-08 Thread Baptiste
Hi Dmitry,

It says what it says: you configured HAProxy to manage a queue to
protect your servers. During your workload, a request remained in the queue
for too long (1s), so HAProxy simply returned an error.

Now the question is why this situation occurs. Simply because your queue
management is improperly set up (either increase minconn and/or
decrease fullconn), combined with a server which might be quite slow
to answer, leading HAProxy to use queues.

Or you hit a bug :)

We need the full configuration and the log lines around the sQ event
(right before and right after) so that we can help.

Baptiste







Re: About maxconn and minconn

2015-10-08 Thread Dmitry Sivachenko

> On 7 Oct 2015, at 16:18, Dmitry Sivachenko  wrote:
> In the stats web interface for MT_RU_EN-back backend I see the following 
> numbers:
> 
> Sessions: limit=3000, max=126 (for the whole backend)
> Limit=150, max=5 or 6 (for each server)


I also forgot to mention the "Queue" values from the stats web interface:
Queue max = 0 for all servers
Queue limit = 1 for all servers (as configured in default-server)
So according to the stats, the queue was never used.


Right under the servers list, there is a "Backend" line, which has the value
"29" in the "Queue Max" column.
What does it mean?






Re: About maxconn and minconn

2015-10-08 Thread Dmitry Sivachenko

> On 8 Oct 2015, at 11:03, Baptiste  wrote:
> 
> Hi Dmitry,
> 
> 
> 
> Now the question is why such situation. Simply because your queue
> management is improperly setup (either increase minconn and or
> decrease fullconn) and combined to a server which might be quite slow
> to answer leading HAProxy to use queues.
> 

What do you mean by "improperly set up"? From the stats I provided, I got the
impression that no limit was reached that would send a request into the
waiting queue.

Or am I wrong?

(I will send you full config and logs in private soon)




Re: About maxconn and minconn

2015-10-08 Thread Daren Sefcik
How about kern.ipc.somaxconn?

On Thu, Oct 8, 2015 at 5:22 AM, Dmitry Sivachenko 
wrote:

>
> > On 8 Oct 2015, at 3:51, Igor Cicimov 
> wrote:
> >
> >
> > The only thing I can think of is you have left net.core.somaxconn = 128,
> try increasing it to 4096 lets say to match your planned capacity of 3000
> >
>
>
> I forgot to mention that I am using FreeBSD, I don't think it has similar
> sysctl.
>


Re: About maxconn and minconn

2015-10-08 Thread Dmitry Sivachenko

> On 8 Oct 2015, at 3:51, Igor Cicimov  
> wrote:
> 
> 
> The only thing I can think of is you have left net.core.somaxconn = 128, try 
> increasing it to 4096 lets say to match your planned capacity of 3000
> 


I forgot to mention that I am using FreeBSD; I don't think it has a similar
sysctl.


Re: About maxconn and minconn

2015-10-08 Thread Dmitry Sivachenko

> On 8 Oct 2015, at 15:30, Daren Sefcik  wrote:
> 
> How about kern.ipc.somaxconn


I have this set to 4096, and when it overflows, the kernel prints a line in the
log (Listen queue overflow...).

I have none of these errors in the logs.

Moreover, connections sitting in the socket accept queue are not seen by haproxy,
so haproxy can't count that time or trigger timeouts on it.







Re: About maxconn and minconn

2015-10-08 Thread Igor Cicimov
On Thu, Oct 8, 2015 at 7:15 PM, Dmitry Sivachenko 
wrote:

>
> > On 7 Oct 2015, at 16:18, Dmitry Sivachenko
> wrote:
>
>
> I also forgot to mention the "Queue" values from stats web-interface:
> Queue max = 0 for all servers
> Queue limit = 1 for all servers (as configured in default-server)
> So according to stats queue was never used.
>
>
> Right under the servers list, there is a "Backend" line, which has the
> value of "29" in "Queue Max" column.
> What does it mean?
>
>
Well, that means you had up to 29 requests in the backend queue waiting for
a connection. In my case, I have never seen this queue go above 0 on the
backend, or on any of the backend servers for that matter. Also, the queue limit
per server is 128, not 1 (I think you are confusing the queue limit with the queue
timeout, which you have indeed set to 1 sec).

So, as mentioned before and as pointed out by Baptiste, your servers are not as
fast as you expect them to be, i.e. you have set your queue size and timeout
too low. First, is haproxy on the same LAN segment as the backend servers?
For example, what is the value of the LastChk column? It should be in ms
(milliseconds) if your servers are close to haproxy and not under heavy load.

If I were in your shoes I would:

- drop the fullconn setting and let haproxy do the math for me
- definitely increase the queue timeout to more than 1 sec (why would you
risk losing messages, unless you are short on RAM?)
- set the connect timeout as per the excerpt I sent previously

and see how I go.
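For the first suggestion, here is a rough sketch of how the default is derived. It assumes the rule from the HAProxy documentation: an unset fullconn becomes 10% of the sum of the maxconn values of all frontends that can route to the backend (through use_backend or default_backend). The function name and frontend values are hypothetical.

```python
def default_fullconn(frontend_maxconns: list[int]) -> int:
    # Assumed rule: 10% of the summed maxconn of every frontend that can
    # route to this backend (integer result).
    return sum(frontend_maxconns) // 10


# Hypothetical frontends with maxconn 20000 and 10000.
print(default_fullconn([20000, 10000]))  # 3000
```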




Re: About maxconn and minconn

2015-10-07 Thread Igor Cicimov
On Thu, Oct 8, 2015 at 11:51 AM, Igor Cicimov <
ig...@encompasscorporation.com> wrote:

> The only thing I can think of is you have left net.core.somaxconn = 128,
> try increasing it to 4096 lets say to match your planned capacity of 3000
>
>
  sQ   The session spent too much time in queue and has been expired. See
       the "timeout queue" and "timeout connect" settings to find out how to
       fix this if it happens too often. If it often happens massively in
       short periods, it may indicate general problems on the affected
       servers due to I/O or database congestion, or saturation caused by
       external attacks.

Another possibility to investigate: if the backends are too slow, or the
connection is delayed by a firewall in the middle or something similar, then
tuning "timeout connect" may help:

If the server is located on the same LAN as haproxy, the connection should be
immediate (less than a few milliseconds). Anyway, it is a good practice to
cover one or several TCP packet losses by specifying timeouts that are
slightly above multiples of 3 seconds (eg: 4 or 5 seconds). By default, the
connect timeout also presets both queue and tarpit timeouts to the same value
if these have not been specified.
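Here is a tiny sketch of the fallback rule from the excerpt above. It assumes only what that excerpt states: an unset queue timeout takes the connect timeout's value. The function name and the 5s value are hypothetical. The sketch mainly shows that with "timeout queue 1s" set explicitly, as in the original config, changing "timeout connect" alone would not change how long requests may wait in the queue.

```python
from typing import Optional


def effective_queue_timeout_ms(timeout_queue_ms: Optional[int],
                               timeout_connect_ms: Optional[int]) -> Optional[int]:
    # An explicit "timeout queue" wins; otherwise it inherits "timeout connect".
    if timeout_queue_ms is not None:
        return timeout_queue_ms
    return timeout_connect_ms


print(effective_queue_timeout_ms(1000, 5000))  # 1000: "timeout queue 1s" is set
print(effective_queue_timeout_ms(None, 5000))  # 5000: falls back to connect
```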


Re: About maxconn and minconn

2015-10-07 Thread Igor Cicimov

The only thing I can think of is that you have left net.core.somaxconn = 128;
try increasing it to, let's say, 4096 to match your planned capacity of 3000.


About maxconn and minconn

2015-10-07 Thread Dmitry Sivachenko
Hello,

I am using haproxy-1.5.14 and sometimes I see the following errors in the log:

Oct  7 08:33:03 srv1 haproxy[77565]: unix:1 [07/Oct/2015:08:33:02.428] MT-front 
MT_RU_EN-back/ 0/1000/-1/-1/1000 503 212 - - sQ-- 125/124/108/0/0 0/28 
"POST /some/url HTTP/1.1"
(many similar at one moment)

The common part of these errors is "1000" in Tw and Tt, and the "sQ--" termination
state.
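To see which numbers can be read from this line, here is a minimal Python sketch. It extracts three groups of fields: the timers (Tq/Tw/Tc/Tr/Tt), the connection counters (actconn/feconn/beconn/srv_conn/retries), and the queue counters (srv_queue/backend_queue). It assumes the default HAProxy 1.5 "option httplog" field order. The regex and function name are illustrative and not part of HAProxy.

```python
import re

# Sketch of a parser for the fields of interest in a HAProxy 1.5 HTTP log
# line. Not a complete grammar: captured headers, for example, are ignored.
LOG_RE = re.compile(
    r"(?P<backend>\S+)/(?P<server>\S*) "
    r"(?P<Tq>-?\d+)/(?P<Tw>-?\d+)/(?P<Tc>-?\d+)/(?P<Tr>-?\d+)/\+?(?P<Tt>\d+) "
    r"(?P<status>\d{3}) \+?\d+ \S+ \S+ (?P<term>\S{4}) "
    r"(?P<actconn>\d+)/(?P<feconn>\d+)/(?P<beconn>\d+)/(?P<srv_conn>\d+)/\+?(?P<retries>\d+) "
    r"(?P<srv_queue>\d+)/(?P<backend_queue>\d+)"
)
TEXT_FIELDS = {"backend", "server", "term"}


def parse_fields(line: str) -> dict:
    m = LOG_RE.search(line)
    if m is None:
        raise ValueError("not a recognised HAProxy HTTP log line")
    # Convert every field except names and termination state to int.
    return {k: (v if k in TEXT_FIELDS else int(v)) for k, v in m.groupdict().items()}


line = ('Oct  7 08:33:03 srv1 haproxy[77565]: unix:1 [07/Oct/2015:08:33:02.428] '
        'MT-front MT_RU_EN-back/ 0/1000/-1/-1/1000 503 212 - - sQ-- '
        '125/124/108/0/0 0/28 "POST /some/url HTTP/1.1"')
f = parse_fields(line)
print(f["Tw"], f["term"], f["beconn"], f["srv_queue"], f["backend_queue"])
# 1000 sQ-- 108 0 28
```

On this line, Tw equals the full 1s queue timeout and beconn is 108. The request waited in the backend queue (backend_queue = 28) rather than in a server queue (srv_queue = 0). That is consistent with the backend-level "Queue Max" of 29 mentioned elsewhere in the thread.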

Here is the relevant part of my config (I can post more if needed):

defaults
    balance roundrobin
    maxconn 1
    timeout queue 1s
    fullconn 3000
    default-server inter 5s downinter 1s fastinter 500ms fall 3 rise 1 slowstart 60s maxqueue 1 minconn 5 maxconn 150

backend MT_RU_EN-back
    mode http
    timeout server 30s
    server mt1-34 mt1-34:19016 track MT-back/mt1-34 weight 38
    server mt1-35 mt1-35:19016 track MT-back/mt1-35 weight 38


So this error log indicates that the request sat in the queue for the whole
"timeout queue" (1s) and its turn never came.

In the stats web interface for MT_RU_EN-back backend I see the following 
numbers:

Sessions: limit=3000, max=126 (for the whole backend)
Limit=150, max=5 or 6 (for each server)

If I understand the meaning of minconn/maxconn correctly, each server should
accept up to min(150, 3000/18) connections.

So according to the stats, the load was far from the limits.

What can be the cause of such errors?

Thanks!