So with the prevalence of issues lately where haproxy goes unresponsive and consumes 100% CPU, I wanted to see what the thoughts are on implementing systemd watchdog functionality.

In our case, haproxy going unresponsive is extremely problematic, as our clustering software (pacemaker+systemd) sees the service as still running and doesn't realize it needs to restart it or fail over. We could look into implementing some sort of custom check resource in pacemaker, but before going down that route I wanted to explore the systemd watchdog functionality.


The watchdog is driven by periodically sending "WATCHDOG=1" on the systemd notification socket. However, there are a few different ways I can see this being implemented.

We could put this in the master control process, but that only tells us whether the master is functioning, not the workers, which are what really matter.
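
As a point of reference, the master-only variant is basically just the loop below. This is only a minimal sketch, assuming the unit runs with Type=notify and WatchdogSec= set and that we link against libsystemd's sd-daemon API (the same protocol could also be spoken by hand over the NOTIFY_SOCKET datagram socket):

    /* Minimal sketch of a master-only watchdog loop.
     * The unit file would need something like:
     *   [Service]
     *   Type=notify
     *   WatchdogSec=10s
     */
    #include <systemd/sd-daemon.h>
    #include <stdint.h>
    #include <unistd.h>

    static void watchdog_loop(void)
    {
        uint64_t usec = 0;

        /* Returns > 0 and fills 'usec' when WatchdogSec= is set for this unit. */
        if (sd_watchdog_enabled(0, &usec) <= 0)
            return;

        for (;;) {
            /* Ping well within the timeout; half the interval is the usual rule. */
            sd_notify(0, "WATCHDOG=1");
            usleep(usec / 2);
        }
    }

In practice this wouldn't be a blocking loop of its own but a periodic task hooked into the master's existing wakeups; the loop just shows the protocol.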

So the next thought would be for all of the workers to listen on a shared socket. The master would periodically send a request to that socket and, as long as it gets a response, ping the watchdog. This tells us that there is at least one worker able to accept traffic.
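
A rough sketch of what that could look like from the master's side. The socket path and the one-byte ping/pong exchange here are made up purely for illustration, not anything haproxy currently has:

    /* Sketch: master probes a shared worker socket and pings the systemd
     * watchdog only if at least one worker answers. The socket path and
     * the 1-byte ping/pong protocol are hypothetical. */
    #include <systemd/sd-daemon.h>
    #include <sys/socket.h>
    #include <sys/time.h>
    #include <sys/un.h>
    #include <string.h>
    #include <unistd.h>

    static int any_worker_alive(const char *path)
    {
        struct sockaddr_un sun;
        struct timeval tv = { .tv_sec = 1, .tv_usec = 0 };
        char pong;
        int fd, ok = 0;

        fd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (fd < 0)
            return 0;

        memset(&sun, 0, sizeof(sun));
        sun.sun_family = AF_UNIX;
        strncpy(sun.sun_path, path, sizeof(sun.sun_path) - 1);

        /* Don't hang the master if no worker ever answers. */
        setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

        /* Whichever worker accept()s the connection answers the ping. */
        if (connect(fd, (struct sockaddr *)&sun, sizeof(sun)) == 0 &&
            write(fd, "P", 1) == 1 &&
            read(fd, &pong, 1) == 1)
            ok = 1;

        close(fd);
        return ok;
    }

    /* Called from the master's periodic wakeup. */
    static void watchdog_tick(void)
    {
        if (any_worker_alive("/var/run/haproxy-health.sock"))
            sd_notify(0, "WATCHDOG=1");
    }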

However, if a frontend is bound to a specific worker and that worker hangs, the frontend would be non-responsive, yet the watchdog wouldn't restart the service, since some other worker still answers. To catch that, the master would have to send a request to each worker separately and require a response from all of them before it pings the watchdog. This would be better at detecting issues, but people who aren't using any bound-to-process frontends can tolerate the failure of a single worker and might prefer to schedule a restart/reload at a less impactful time.
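
In code terms the stricter variant is the same probe as above, just repeated once per worker with a knob for how strict to be. Another hedged sketch, where worker_alive() and nb_workers are hypothetical stand-ins for a per-worker version of the check:

    /* Sketch: probe each worker on its own socket; only ping the watchdog
     * if the required number of workers answer. */
    #include <systemd/sd-daemon.h>
    #include <stdbool.h>

    extern bool worker_alive(int worker_id);   /* hypothetical per-worker probe */
    extern int  nb_workers;                    /* number of worker processes */

    static void watchdog_tick_all(bool require_all)
    {
        int alive = 0;

        for (int id = 0; id < nb_workers; id++)
            if (worker_alive(id))
                alive++;

        /* require_all == true  : any single stuck worker blocks the ping,
         *                        so systemd eventually restarts the service.
         * require_all == false : one live worker is enough; a stuck worker
         *                        can be dealt with at a less impactful time. */
        if ((require_all && alive == nb_workers) ||
            (!require_all && alive > 0))
            sd_notify(0, "WATCHDOG=1");
    }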

The last idea would be to have the watchdog watch only the master, and the master watch the workers in turn. If a worker stops responding, the master would restart just that one worker.
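
For that last option the systemd side stays trivial (the master pings the watchdog as long as it itself is healthy), and the per-worker handling would look roughly like the sketch below. Again, worker_alive(), worker_pid() and respawn_worker() are hypothetical stand-ins; haproxy's master already has its own machinery for managing workers:

    /* Sketch: the master pings the watchdog for itself, and separately
     * kills and respawns any worker that stops answering its probe. */
    #include <systemd/sd-daemon.h>
    #include <sys/types.h>
    #include <signal.h>
    #include <stdbool.h>

    extern bool  worker_alive(int worker_id);      /* hypothetical probe */
    extern pid_t worker_pid(int worker_id);        /* hypothetical lookup */
    extern void  respawn_worker(int worker_id);    /* hypothetical respawn */
    extern int   nb_workers;

    static void supervise_tick(void)
    {
        /* The watchdog only covers the master itself in this model. */
        sd_notify(0, "WATCHDOG=1");

        for (int id = 0; id < nb_workers; id++) {
            if (!worker_alive(id)) {
                kill(worker_pid(id), SIGTERM);
                respawn_worker(id);
            }
        }
    }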


Any thoughts on the matter? Or do we not want to do this at all, and instead rely on a custom check in the cluster management software?

-Patrick
