> On 13 Oct 2015, at 18:47, Willy Tarreau <[email protected]> wrote:
> 
> Hi Dmitry,
> 
> sorry for the delay, I really didn't have time to analyse the config
> you sent me.
> 
> A few points below :
> 
> On Wed, Oct 07, 2015 at 04:18:20PM +0300, Dmitry Sivachenko wrote:
>> Oct  7 08:33:03 srv1 haproxy[77565]: unix:1 [07/Oct/2015:08:33:02.428] 
>> MT-front MT_RU_EN-back/<NOSRV> 0/1000/-1/-1/1000 503 212 - - sQ-- 
>> 125/124/108/0/0 0/28 "POST /some/url HTTP/1.1"
>> (many similar at one moment)
>> 
>> Common part in these errors is "1000" in Tw and Tt, and "sQ--" termination 
>> state.
>> 
>> Here is the relevant part on my config (I can post more if needed):
>> 
>> defaults
>>    balance roundrobin
>>    maxconn 10000
>>    timeout queue 1s
>>    fullconn 3000
>>    default-server inter 5s downinter 1s fastinter 500ms fall 3 rise 1 
>> slowstart 60s maxqueue 1 minconn 5 maxconn 150
>> 
>> backend MT_RU_EN-back
>>    mode http
>>    timeout server 30s
>>    server mt1-34 mt1-34:19016 track MT-back/mt1-34 weight 38
>>    server mt1-35 mt1-35:19016 track MT-back/mt1-35 weight 38
>>    <total 18 of similar servers>
>> 
>> So this error log indicates that the request sat in the queue for 
>> timeout queue==1s and its turn did not come.
>> 
>> In the stats web interface for MT_RU_EN-back backend I see the following 
>> numbers:
>> 
>> Sessions: limit=3000, max=126 (for the whole backend)
>> Limit=150, max=5 or 6 (for each server)
>> 
>> If I understand the meaning of minconn/maxconn correctly, each server should 
>> accept up to min(150, 3000/18) connections.
>> 
>> So according to the stats, the load was far from the limits.
> 
> No, look, the log says there were 108 connections on the backend. This
> is important since you're using minconn so you're using dynamic queueing.
> This means that the effective limit when handling this request was around
> maxconn*currconn/fullconn, which is 150*108/3000 = 5.4 so the limit was
> at 5 connections. Thus the limit for this server was indeed reached.
> 
> Playing with minconn and fullconn is hard and strongly advised against,
> unless you know exactly how to tune it. You must always ensure that a
> normal load will be handled without queuing (or with a very small queue),
> and that maxconn will quickly be reached to handle high traffic. I tend to
> consider that an efficient fullconn is around 10% of the maximum load the
> farm may have to deal with (which is the default value IIRC). Regarding
> minconn, it's interesting not to set it too low. A good rule of thumb is
> to estimate what would happen at 10% of fullconn (1% of the max load).
> In your case, at 300 concurrent connections, your servers will accept
> 15 connections each. I have no idea whether this is enough or not to
> handle the load. But let's say you have 4 servers, that's only 60
> concurrent connections to process 300 front connections. While it can
> be perfectly fine, you may need to increase the queue timeout so that
> the requests can wait long enough for a slot. With a 5:1 overbooking and
> your 1s queue timeout, that means you expect that the server's average
> response time will not go above 200ms. That may be a bit short for some
> applications, especially those sensitive to connection count.
> 
> Thus I'd suggest that you either lower fullconn or increase minconn, and
> in any case that you also increase the queue timeout to cover the worst
> overbooking situation with the average server's response time.
> 
> During the tuning phase, I'd suggest *significantly* increasing the queue
> timeout so that you can observe the connection counts and even the average
> response time per connection count; that will help you refine the tuning.
> 


Thanks for the explanation; it looks like I misunderstood the minconn/maxconn logic.
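
To check the numbers from the explanation quoted above, here is a minimal sketch of the arithmetic in Python. It assumes the simplified rule given there (limit is roughly maxconn*currconn/fullconn), floored at minconn and capped at maxconn once the backend reaches fullconn; it ignores slowstart and weights. The function names are hypothetical and are not part of HAProxy.

```python
def effective_maxconn(minconn, maxconn, fullconn, beconn):
    """Approximate per-server limit under dynamic queueing.

    Assumption: limit scales linearly with the backend's current
    connection count (beconn), never below minconn, and equals
    maxconn once beconn reaches fullconn.
    """
    if beconn >= fullconn:
        return maxconn
    return max(minconn, maxconn * beconn // fullconn)


def max_avg_response_time(queue_timeout_s, front_conns, servers, per_server_limit):
    """Largest average response time (seconds) that still lets every
    queued request get a slot before the queue timeout expires,
    given the overbooking ratio front_conns : total server slots."""
    overbooking = front_conns / (servers * per_server_limit)
    return queue_timeout_s / overbooking


# Values from the config and log in this thread:
# minconn 5, maxconn 150, fullconn 3000, 108 connections on the backend.
print(effective_maxconn(5, 150, 3000, 108))    # 150*108/3000 = 5.4, so 5

# The "10% of fullconn" rule of thumb: 300 concurrent connections.
print(effective_maxconn(5, 150, 3000, 300))    # 15

# Hypothetical 4 servers at 15 slots each, 300 front connections,
# 1s queue timeout: 5:1 overbooking.
print(max_avg_response_time(1.0, 300, 4, 15))  # 0.2
```

So with 108 connections on the backend the per-server limit really was 5, which matches the max=5 or 6 per server I saw in the stats page.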

