Re: Theoretical limits for a HAProxy instance

Iago Alonso Wed, 14 Dec 2022 03:50:50 -0800

Hi,

We are not sure what element produces the errors, in the haproxy logs wedon't see them. What does it happen with the new connections when we hitthe limit? What server resource should be affected by it, if any?

We have our logs in `warning` level so we do not see the response timeon the haproxy logs. In the`haproxy_backend_response_time_average_seconds` we see "normal" values(20ms average and some spikes at less than 80ms).

We see a spike in the `haproxy_backend_connect_time_average_seconds`, aswell as in the `haproxy_server_connect_time_average_seconds` metricswhen the errors start. We have more haproxies (that we were notstressing in the test) in front of the same backends, serving withoutissues. We are using https between to communicate to the backends too,and we see that the task consuming the majority of the resources is`ssl_sock_io_cb`.

Our traffic is not request rate intensive, but we do have a high amountof concurrent connections. For example, at 200k connections, we have 10krps, and at 300k connections, we have 14k rps.

In our tests, we never saturate our network bandwidth, we hit as much as50% our available bandwidth for the server.


Regards.


On 14/12/22 8:09, Willy Tarreau wrote:

Hi,

On Tue, Dec 13, 2022 at 03:33:58PM +0100, Iago Alonso wrote:

Hi,

We do hit our defined max ssl/conn rates, but given the spare
resources, we don't expect to suddenly return 5xx.

What bothers me is that once this limit is reached there's no more
connection accepted by haproxy so you should indeed not see such
errors. What element produces these errors ? Since you've enabled
logging, can you check them in your logs and figure whether they're
sent by the server, by haproxy or none (it might well be the load
generator translating connection timeouts to 5xx for user reporting).

If the errors are produced by haproxy, then their exact value and the
termination flags are very important as they'll indicate where the
problem is.

Another thing you'll observe in your logs are the server's response
time. It's possible that your request rate is getting very close to
the server's limits.

Among other limitations the packet rate and network bandwidth might
represent a limit. For example, let's say you're requesting 10kB
objects. At 10k/s it's just less than a gigabit/s, at 11k/s it doesn't
pass anymore.

When you look at the haproxy stats page, the cumulated live network
bandwidth is reported at the top, it might give you an indication of
a possible limit.

But in any case, stats on the status codes and termination codes in
logs would be extremely useful. Note that you can do this natively
using halog (I seem to remember it's halog -st and halog -tc, but
please double-check, and in any case, cut|sort|uniq -c always works).

Regards,
Willy

Re: Theoretical limits for a HAProxy instance

Reply via email to