On Thu, Apr 15, 2021 at 09:23:07AM +0200, Willy Tarreau wrote:
> On Thu, Apr 15, 2021 at 07:13:53AM +0000, Robin H. Johnson wrote:
> > Thanks; I will need to catch it faster or automate this, because the
> > watchdog does a MUCH better job restarting it than before, less than 30
> > seconds of 100% CPU before the watchdog reliably kills it.
> I see. Then collecting the watchdog outputs can be instructive to see
> if it always happens at the same place or not. And the core dumps will
> indicate what all threads were doing (and if some were competing on a
> lock for example).
The truncation in the log output for the crash was interesting; I couldn't see
why it was being cut off. I wish we could get a clean reproduction in our
testing environment, because the production core dumps absolutely have customer
data in them.

> > Varnish runs on the same host and is used to cache some of the backends.
> > Plenty of free memory at the moment.
> 
> I'm now thinking about something. Do you have at least as many CPUs as the
> total number of threads used by haproxy and varnish ? Otherwise there will
> be some competition and migrations will happen. If neither is bounded, you
> can even end up with two haproxy threads forced to run on the same CPU,
> which is the worst situation as one could be scheduled out with a lock
> held and the other one spinning waiting for this lock.
Single socket, AMD EPYC 7702P 64-Core Processor, 128 threads!
Shows as single NUMA node in our present configuration.
Hopefully the kernel is mostly doing the right thing, but read on.

HAProxy is already pinned to the first 64 threads with:
cpu-map 1/1 0
...
cpu-map 1/64 63
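
For reference, a minimal sketch of how a one-thread-per-CPU map like this can
be generated rather than typed by hand. It assumes the pattern is simply
thread i on CPU i-1; gen_cpu_map is a hypothetical helper name, not part of
HAProxy.

```shell
# Emit one "cpu-map 1/<thread> <cpu>" line per thread, for threads 1..n.
# Assumption: thread i is bound to CPU i-1 (thread 1 -> CPU 0, and so on).
gen_cpu_map() {
    n=$1
    i=1
    while [ "$i" -le "$n" ]; do
        printf 'cpu-map 1/%d %d\n' "$i" "$((i - 1))"
        i=$((i + 1))
    done
}
```

Calling `gen_cpu_map 64` prints the 64 lines. HAProxy also documents a compact
form, `cpu-map auto:1/1-64 0-63`, which should express the same binding in a
single line.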

Varnish isn't explicitly pinned right now, but uses less than 200% CPU
overall (we know most requests aren't cacheable, so they don't get routed to
Varnish at all).

But your thought of CPU pinning was good.
I went to confirm it on the host, and I'm not certain whether the cpu-map is
working correctly.

# pid_haproxy_leader=68839 ; pid_haproxy_follower=68848 
# taskset -pc $pid_haproxy_leader
pid 68839's current affinity list: 0-127
# taskset -pc $pid_haproxy_follower
pid 68848's current affinity list: 0
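
One caveat when reading that output: taskset -p on a PID reports the affinity
of that single task (the main thread), not of every thread in the process, so
the 0 shown for the follower may simply be thread 1. A minimal sketch for
listing every thread, assuming Linux with /proc mounted (thread_affinity is a
hypothetical helper name, not an existing tool):

```shell
# Print the allowed-CPU list of every thread (task) belonging to a process.
# Usage: thread_affinity PID
thread_affinity() {
    pid=$1
    for status in /proc/"$pid"/task/*/status; do
        # Skip threads that exited meanwhile, or an unmatched glob.
        [ -r "$status" ] || continue
        tid=${status%/status}   # strip the trailing "/status"
        tid=${tid##*/}          # keep only the task ID
        cpus=$(awk '/^Cpus_allowed_list:/ {print $2}' "$status")
        printf 'tid %s: %s\n' "$tid" "$cpus"
    done
}
```

Running `thread_affinity $pid_haproxy_follower` should print one line per
thread; `taskset -a -pc $pid_haproxy_follower` from util-linux reports the
same information.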

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
E-Mail   : robb...@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136
