Hi,

We recently hit an issue where we observed the
haproxy_frontend_current_sessions reported by the prometheus endpoint
plateau at 4095 and some requests start dropping. Increasing the global and
listen maxconn from 4096 to something larger (as well as making the kernel
TCP queues on our Ubuntu 22.04 OS slightly larger) fixed the issue.
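For context, the changes were along these lines (the numbers and listener
name below are illustrative rather than our exact production values):

  # haproxy.cfg: raise the connection ceilings
  global
    maxconn 20000        # was 4096

  listen fe_main         # name made up for this example
    maxconn 20000        # was 4096

  # /etc/sysctl.d/99-haproxy.conf: deeper kernel accept/SYN queues
  net.core.somaxconn = 4096
  net.ipv4.tcp_max_syn_backlog = 4096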

The cause seems to have been a switch from http to https traffic due to a
client side config change, rather than an increase in the number of
requests, so I started looking at CPU usage to see if the SSL load was too
much for our server CPUs. However, on one of the modern 24-core machines
running HAProxy, I noticed top was reporting only around 100% CPU usage,
with both the user and system CPU distributed pretty evenly across all the
cores (4-8% user per core, 0.5-2% system). The idle percentage was in the
high nineties, both as reported by top and by the haproxy socket Idle_pct.
This was just a quick gathering of info and may not be representative,
since our prometheus node exporter only shows overall CPU (which was a low
5% of the total on all cores throughout). This is a bare-metal server
running a single HAProxy instance handling around 200 SSL req/sec, and not
doing much else.

I started wondering if our global settings:

  master-worker
  nbthread 24
  cpu-map auto:1/1-24 0-23
  tune.ssl.cachesize 100000

were appropriate, or if they had caused some inefficiency in using our
machine's cores, which then caused this backlog. Or perhaps what I am
observing is completely normal, given that we are now spending more time on
SSL decryption and so can expect more queuing. (Our backend servers are
very fast and so we run them with a small maxconn, but they don't care
whether the request arrived over SSL or not, so the overall request time
should be the same apart from the SSL processing time.) We are running
either the latest OpenSSL 1.1.1 or WolfSSL, both compiled sensibly
(AES-NI etc.).
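One thing I did consider (but have not tried) is the thread-groups
directive that newer HAProxy versions (2.7+) support; a sketch is below,
where the two-group split is just a guess at a sensible layout for this
machine rather than something I have validated:

  global
    master-worker
    nbthread 24
    thread-groups 2               # split the 24 threads into two groups
    cpu-map auto:1/1-12 0-11      # group 1 -> first 12 cores
    cpu-map auto:2/1-12 12-23     # group 2 -> second 12 cores
    tune.ssl.cachesize 100000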

I turned to https://docs.haproxy.org/2.9/management.html#7, which had some
very interesting advice about pinning haproxy to one CPU core and the
interrupts to another, but it also mentioned nbproc and the "bind ...
process" option for better SSL traffic processing. Given that this seems to
be a bit out of date, I thought I might ask my question here instead.
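As I understand it, the pinning that document describes would look roughly
like the following (the core numbers and IRQ placeholder are made up for
illustration):

  # haproxy.cfg: keep haproxy threads on cores 1-23, leaving core 0 free
  global
    nbthread 23
    cpu-map auto:1/1-23 1-23

  # then steer the NIC's interrupts to core 0 (CPU mask 0x1), as root:
  #   echo 1 > /proc/irq/<irq>/smp_affinity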

Is there a way to use the CPU cores available on our HAProxy machines to
handle SSL requests better than the global config above does? I realise
this is a bit of an open-ended question, but for example I was wondering if
we could reduce the number of active sessions (so we don't hit maxconn) by
increasing the thread count beyond the number of CPU cores. Naively, that
seems like it might increase per-session latency but improve overall
throughput, since we don't appear to be taxing any of the cores (and have
lots of memory available on these machines). As I said, I am not even sure
there is a problem, but I would like to understand a bit better whether
there is anything we can do to help HAProxy use the CPU cores more
effectively, since all the advice I can find is obsolete (nbproc etc.) and
it is quite hard to experiment when I don't know what is worth measuring.

Thanks for your time,

Miles
