Re: About maxconn and minconn
> On 13 Oct 2015, at 18:47, Willy Tarreau wrote:
>
> [...]
>
> No, look, the log says there were 108 connections on the backend. This
> is important since you're using minconn so you're using dynamic queueing.
> This means that the effective limit when handling this request was around
> maxconn*currconn/fullconn, which is 150*108/3000 = 5.4 so the limit was
> at 5 connections. Thus the limit for this server was indeed reached.
>
> [...]

Thanks for the explanation, looks like I misunderstood the minconn/maxconn logic.
Re: About maxconn and minconn
Hi Dmitry,

sorry for the delay, I really didn't have time to analyse the config
you sent me.

A few points below :

On Wed, Oct 07, 2015 at 04:18:20PM +0300, Dmitry Sivachenko wrote:
> Oct 7 08:33:03 srv1 haproxy[77565]: unix:1 [07/Oct/2015:08:33:02.428]
> MT-front MT_RU_EN-back/ 0/1000/-1/-1/1000 503 212 - - sQ--
> 125/124/108/0/0 0/28 "POST /some/url HTTP/1.1"
> (many similar at one moment)
>
> Common part in these errors is "1000" in Tw and Tt, and "sQ--" termination
> state.
>
> Here is the relevant part on my config (I can post more if needed):
>
> defaults
>     balance roundrobin
>     maxconn 1
>     timeout queue 1s
>     fullconn 3000
>     default-server inter 5s downinter 1s fastinter 500ms fall 3 rise 1 slowstart 60s maxqueue 1 minconn 5 maxconn 150
>
> backend MT_RU_EN-back
>     mode http
>     timeout server 30s
>     server mt1-34 mt1-34:19016 track MT-back/mt1-34 weight 38
>     server mt1-35 mt1-35:19016 track MT-back/mt1-35 weight 38
>
> So this error log indicates that request was sitting in the queue for timeout
> queue==1s and its turn did not come.
>
> In the stats web interface for MT_RU_EN-back backend I see the following
> numbers:
>
> Sessions: limit=3000, max=126 (for the whole backend)
> Limit=150, max=5 or 6 (for each server)
>
> If I understand minconn/maxconn meaning right, each server should accept up
> to min(150, 3000/18) connections
>
> So according to stats the load was far from limits.

No, look, the log says there were 108 connections on the backend. This
is important since you're using minconn so you're using dynamic queueing.
This means that the effective limit when handling this request was around
maxconn*currconn/fullconn, which is 150*108/3000 = 5.4 so the limit was
at 5 connections. Thus the limit for this server was indeed reached.

Playing with minconn and fullconn is hard and strongly advised against,
unless you know exactly how to tune it. You must always ensure that a
normal load will be handled without queuing (or with a very small queue),
and that maxconn will quickly be reached to handle high traffic. I tend to
consider that an efficient fullconn is around 10% of the maximum load the
farm may have to deal with (which is the default value IIRC). Regarding
minconn, it's interesting not to set it too low. A good rule of thumb is
to estimate what would happen at 10% of fullconn (1% of the max load).
In your case, at 300 concurrent connections, your servers will accept
15 connections each. I have no idea whether this is enough or not to
handle the load. But let's say you have 4 servers, that's only 60
concurrent connections to process 300 front connections. While it can
be perfectly fine, you may need to increase the queue timeout so that
the requests can wait long enough for a slot. With a 5:1 overbooking and
your 1s queue timeout, that means you expect that the server's average
response time will not go above 200ms. That may be a bit short for some
applications, especially those sensitive to connection count.

Thus I'd suggest that you either lower fullconn or increase minconn, and
in any case that you also increase the queue timeout to cover the worst
overbooking situation with the average server's response time.

During the tuning phase, I'd suggest to *significantly* increase the queue
timeout so that you can observe the connection counts and even the average
response time per connection count, that will help you refine the tuning.

Hoping this helps,
Willy
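The arithmetic in Willy's reply can be checked with a small sketch. This is only an approximation of the rule he describes (maxconn*currconn/fullconn), not HAProxy's actual code: the function names are made up for illustration, the clamping between minconn and maxconn and the rounding down are assumptions based on how minconn is documented, and slowstart and server weights are ignored.

```python
def effective_maxconn(minconn, maxconn, fullconn, beconn):
    """Approximate per-server limit under dynamic queueing.

    Assumption: the limit ramps linearly with the backend's current
    connection count (beconn), never drops below minconn and never
    exceeds maxconn. Slowstart and weights are not modelled.
    """
    if beconn >= fullconn:
        return maxconn
    scaled = maxconn * beconn // fullconn  # integer part, e.g. 5.4 -> 5
    return max(minconn, min(maxconn, scaled))


def max_avg_response_time_ms(front_conns, servers, per_server_limit, queue_timeout_ms):
    """Average response time the servers must stay under so that a queued
    request gets a slot before the queue timeout expires."""
    overbooking = front_conns / (servers * per_server_limit)
    return queue_timeout_ms / overbooking


# Values from the thread: minconn 5, maxconn 150, fullconn 3000.
print(effective_maxconn(5, 150, 3000, 108))        # backend at 108 connections
print(effective_maxconn(5, 150, 3000, 300))        # backend at 300 connections
print(max_avg_response_time_ms(300, 4, 15, 1000))  # 4 servers, 1s queue timeout
```

With the values from the log (108 backend connections) the sketch gives a limit of 5, matching the "max=5 or 6" seen in the stats page, and the 300-connection case reproduces the 15 connections per server and the 200ms figure from the reply.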
Re: About maxconn and minconn
On Thu, Oct 8, 2015 at 7:15 PM, Dmitry Sivachenko wrote:

> [...]
>
> I also forgot to mention the "Queue" values from stats web-interface:
> Queue max = 0 for all servers
> Queue limit = 1 for all servers (as configured in default-server)
> So according to stats queue was never used.
>
> Right under the servers list, there is a "Backend" line, which has the
> value of "29" in "Queue Max" column.
> What does it mean?

Well, that means you had up to 29 requests in the backend queue waiting for a connection. In my case I have never seen this queue be more than 0 on the backend or any of the backend servers for that matter. Also the queue limit per server is 128 not 1 (I think you confuse queue limit with queue timeout, which you have indeed set to 1 sec).

So, as mentioned before, and pointed out by Baptiste, your servers are not as fast as you expect them to be, i.e. you have set your queue size and timeout too low.

First, is haproxy on the same LAN segment as the backend servers? For example, what is the value of the LastChk column? It should be in ms (milliseconds) if your servers are close to haproxy and not under big load.

If I were in your shoes I would:

- drop the fullconn setting and let haproxy do the math for me
- definitely increase the queue timeout to more than 1 sec (why would you risk losing messages, except if you are short on RAM)
- set connect timeout as per the excerpt I sent previously

and see how I go.

> [...]
>
> What can be the cause of such errors?
>
> Thanks!
Re: About maxconn and minconn
> On 8 Oct 2015, at 15:30, Daren Sefcik wrote:
>
> How about kern.ipc.somaxconn

I have this set to 4096, and when it overflows it prints a line in the log (Listen queue overflow...). I have no such errors in the logs. Moreover, connections sitting in the socket accept queue are not seen by haproxy, so haproxy can't count this time and trigger timeouts.

> On Thu, Oct 8, 2015 at 5:22 AM, Dmitry Sivachenko wrote:
>
> > On 8 Oct 2015, at 3:51, Igor Cicimov wrote:
> >
> > The only thing I can think of is you have left net.core.somaxconn = 128,
> > try increasing it to 4096 lets say to match your planned capacity of 3000
>
> I forgot to mention that I am using FreeBSD, I don't think it has similar
> sysctl.
Re: About maxconn and minconn
How about kern.ipc.somaxconn

On Thu, Oct 8, 2015 at 5:22 AM, Dmitry Sivachenko wrote:

> > On 8 Oct 2015, at 3:51, Igor Cicimov wrote:
> >
> > The only thing I can think of is you have left net.core.somaxconn = 128,
> > try increasing it to 4096 lets say to match your planned capacity of 3000
>
> I forgot to mention that I am using FreeBSD, I don't think it has similar
> sysctl.
Re: About maxconn and minconn
> On 8 Oct 2015, at 3:51, Igor Cicimov wrote:
>
> The only thing I can think of is you have left net.core.somaxconn = 128, try
> increasing it to 4096 lets say to match your planned capacity of 3000

I forgot to mention that I am using FreeBSD, I don't think it has a similar sysctl.
Re: About maxconn and minconn
> On 8 Oct 2015, at 11:03, Baptiste wrote:
>
> Hi Dmitry,
>
> Now the question is why such situation. Simply because your queue
> management is improperly setup (either increase minconn and/or
> decrease fullconn), combined with a server which might be quite slow
> to answer, leading HAProxy to use queues.

What do you mean by "improperly setup"? From the stats I provided I got the impression that no limits were reached for a request to get into the waiting queue. Or am I wrong?

(I will send you the full config and logs in private soon)
Re: About maxconn and minconn
> On 7 Oct 2015, at 16:18, Dmitry Sivachenko wrote:
>
> [...]
>
> In the stats web interface for MT_RU_EN-back backend I see the following
> numbers:
>
> Sessions: limit=3000, max=126 (for the whole backend)
> Limit=150, max=5 or 6 (for each server)

I also forgot to mention the "Queue" values from the stats web interface:

Queue max = 0 for all servers
Queue limit = 1 for all servers (as configured in default-server)

So according to the stats, the queue was never used.

Right under the servers list, there is a "Backend" line, which has the value of "29" in the "Queue Max" column. What does it mean?

> If I understand minconn/maxconn meaning right, each server should accept up
> to min(150, 3000/18) connections
>
> So according to stats the load were far from limits.
>
> What can be the cause of such errors?
>
> Thanks!
Re: About maxconn and minconn
Hi Dmitry,

It says what it says: you configured HAProxy to manage queues to protect your servers. During your workload, a request remained in the queue for too long (1s), so HAProxy simply returned an error.

Now the question is why such situation. Simply because your queue management is improperly setup (either increase minconn and/or decrease fullconn), combined with a server which might be quite slow to answer, leading HAProxy to use queues.

Or you met a bug :)

We need the full configuration and log lines around the sQ event (right before and right after), so we may help.

Baptiste

On Wed, Oct 7, 2015 at 3:18 PM, Dmitry Sivachenko wrote:
> [...]
>
> What can be the cause of such errors?
>
> Thanks!
Re: About maxconn and minconn
On Thu, Oct 8, 2015 at 11:51 AM, Igor Cicimov <ig...@encompasscorporation.com> wrote:

> On Thu, Oct 8, 2015 at 12:18 AM, Dmitry Sivachenko wrote:
>
>> [...]
>>
>> What can be the cause of such errors?
>>
>> Thanks!
>
> The only thing I can think of is you have left net.core.somaxconn = 128,
> try increasing it to 4096 lets say to match your planned capacity of 3000

    sQ  The session spent too much time in queue and has been expired. See
        the "timeout queue" and "timeout connect" settings to find out how
        to fix this if it happens too often. If it often happens massively
        in short periods, it may indicate general problems on the affected
        servers due to I/O or database congestion, or saturation caused by
        external attacks.

another possibility to investigate. If the backends are too slow, or maybe the connection is delayed by a firewall in the middle or something, then maybe tuning the "connect timeout" may help:

    If the server is located on the same LAN as haproxy, the connection
    should be immediate (less than a few milliseconds). Anyway, it is a
    good practice to cover one or several TCP packet losses by specifying
    timeouts that are slightly above multiples of 3 seconds (eg: 4 or 5
    seconds). By default, the connect timeout also presets both queue and
    tarpit timeouts to the same value if these have not been specified.
Re: About maxconn and minconn
On Thu, Oct 8, 2015 at 12:18 AM, Dmitry Sivachenko wrote:

> [...]
>
> So according to stats the load were far from limits.
>
> What can be the cause of such errors?
>
> Thanks!

The only thing I can think of is you have left net.core.somaxconn = 128, try increasing it to 4096 lets say to match your planned capacity of 3000
About maxconn and minconn
Hello,

I am using haproxy-1.5.14 and sometimes I see the following errors in the log:

Oct 7 08:33:03 srv1 haproxy[77565]: unix:1 [07/Oct/2015:08:33:02.428] MT-front MT_RU_EN-back/ 0/1000/-1/-1/1000 503 212 - - sQ-- 125/124/108/0/0 0/28 "POST /some/url HTTP/1.1"
(many similar at one moment)

Common part in these errors is "1000" in Tw and Tt, and "sQ--" termination state.

Here is the relevant part on my config (I can post more if needed):

defaults
    balance roundrobin
    maxconn 1
    timeout queue 1s
    fullconn 3000
    default-server inter 5s downinter 1s fastinter 500ms fall 3 rise 1 slowstart 60s maxqueue 1 minconn 5 maxconn 150

backend MT_RU_EN-back
    mode http
    timeout server 30s
    server mt1-34 mt1-34:19016 track MT-back/mt1-34 weight 38
    server mt1-35 mt1-35:19016 track MT-back/mt1-35 weight 38

So this error log indicates that the request was sitting in the queue for timeout queue==1s and its turn did not come.

In the stats web interface for the MT_RU_EN-back backend I see the following numbers:

Sessions: limit=3000, max=126 (for the whole backend)
Limit=150, max=5 or 6 (for each server)

If I understand the minconn/maxconn meaning right, each server should accept up to min(150, 3000/18) connections.

So according to the stats the load was far from the limits.

What can be the cause of such errors?

Thanks!
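The slash-separated fields in the log line above are easier to read once split out. The following is a rough sketch, assuming the HAProxy 1.5 HTTP log layout (timers Tq/Tw/Tc/Tr/Tt, then actconn/feconn/beconn/srv_conn/retries, then srv_queue/backend_queue). The regular expression and the function name `parse_haproxy_http_log` are illustrative only and are not part of HAProxy.

```python
import re

# Matches the timers, status, termination state, connection counters and
# queue positions of an HAProxy HTTP log line (field names as in the 1.5 docs).
LOG_RE = re.compile(
    r"(?P<Tq>-?\d+)/(?P<Tw>-?\d+)/(?P<Tc>-?\d+)/(?P<Tr>-?\d+)/(?P<Tt>\+?\d+)\s+"
    r"(?P<status>\d{3})\s+(?P<bytes>\+?\d+)\s+\S+\s+\S+\s+(?P<term>\S{4})\s+"
    r"(?P<actconn>\d+)/(?P<feconn>\d+)/(?P<beconn>\d+)/(?P<srv_conn>\d+)/"
    r"(?P<retries>\+?\d+)\s+(?P<srv_queue>\d+)/(?P<backend_queue>\d+)"
)


def parse_haproxy_http_log(line):
    """Return the numeric fields as ints plus the termination state string,
    or None if the line does not look like an HTTP log line."""
    match = LOG_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    return {k: (v if k == "term" else int(v)) for k, v in fields.items()}


line = (
    "Oct 7 08:33:03 srv1 haproxy[77565]: unix:1 [07/Oct/2015:08:33:02.428] "
    "MT-front MT_RU_EN-back/ 0/1000/-1/-1/1000 503 212 - - sQ-- "
    '125/124/108/0/0 0/28 "POST /some/url HTTP/1.1"'
)
parsed = parse_haproxy_http_log(line)
for name in ("Tw", "Tt", "term", "beconn", "srv_conn", "backend_queue"):
    print(name, parsed[name])
```

Read this way, the line shows a request that waited 1000 ms in the queue (Tw), never reached a server (Tc and Tr are -1), and was logged while the backend had 108 connections and 28 requests ahead of it in the backend queue.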