David Robinson created MESOS-5376:
-------------------------------------

    Summary: Add systemd watchdog support
    Key: MESOS-5376
    URL: https://issues.apache.org/jira/browse/MESOS-5376
    Project: Mesos
    Issue Type: Improvement
    Reporter: David Robinson
It would be great if Mesos had support for systemd's [watchdog|http://0pointer.de/blog/projects/watchdog.html]. Today, users typically run a supervisor like [monit|https://mmonit.com/monit/] to poll the agent's or master's /health endpoint and restart the process after consecutive failures. systemd does not poll services; instead, it uses a watchdog, where the service periodically reports its liveness. If Mesos had watchdog support, supervisors like monit could be replaced with systemd.

Note that simply restarting the service upon failure (i.e., when the process exits) is not sufficient: a deadlock within Mesos would not cause the process to exit, but a watchdog could detect it.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
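For reference, here is a minimal C++ sketch of the service side of the watchdog protocol described in the linked blog post, assuming Linux and no dependency on libsystemd's sd_notify(3). systemd passes the timeout in the WATCHDOG_USEC environment variable (optionally scoped by WATCHDOG_PID) and expects the service to send the datagram "WATCHDOG=1" to the Unix socket named in NOTIFY_SOCKET before the timeout expires. This assumes the unit sets WatchdogSec= and allows notifications (NotifyAccess=main set explicitly, or Type=notify), plus Restart=on-failure or Restart=on-watchdog so that a missed ping triggers a restart. The function names watchdog_timeout_usec and notify_systemd are hypothetical; they are not existing Mesos code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <string>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

// Hypothetical helper: returns the watchdog timeout (microseconds) that
// systemd passed via WATCHDOG_USEC, or 0 if the watchdog is not enabled
// for this process. If WATCHDOG_PID is set, it must match our own PID.
uint64_t watchdog_timeout_usec() {
  const char* usec = std::getenv("WATCHDOG_USEC");
  if (usec == nullptr) return 0;

  const char* pid = std::getenv("WATCHDOG_PID");
  if (pid != nullptr &&
      std::strtol(pid, nullptr, 10) != static_cast<long>(getpid())) {
    return 0;  // The variable was meant for a different process.
  }

  char* end = nullptr;
  unsigned long long value = std::strtoull(usec, &end, 10);
  if (end == usec || *end != '\0') return 0;  // Reject malformed values.
  return static_cast<uint64_t>(value);
}

// Hypothetical helper: sends one notification datagram (e.g. "WATCHDOG=1")
// to the socket named by $NOTIFY_SOCKET. Returns false if the variable is
// unset or the send fails.
bool notify_systemd(const std::string& state) {
  const char* path = std::getenv("NOTIFY_SOCKET");
  if (path == nullptr || path[0] == '\0') return false;

  sockaddr_un addr{};
  addr.sun_family = AF_UNIX;
  size_t len = std::strlen(path);
  if (len >= sizeof(addr.sun_path)) return false;
  std::memcpy(addr.sun_path, path, len);
  // A leading '@' denotes a Linux abstract-namespace socket.
  if (addr.sun_path[0] == '@') addr.sun_path[0] = '\0';
  socklen_t addr_len =
      static_cast<socklen_t>(offsetof(sockaddr_un, sun_path) + len);

  int fd = socket(AF_UNIX, SOCK_DGRAM | SOCK_CLOEXEC, 0);
  if (fd < 0) return false;
  ssize_t sent = sendto(fd, state.data(), state.size(), 0,
                        reinterpret_cast<const sockaddr*>(&addr), addr_len);
  close(fd);
  return sent == static_cast<ssize_t>(state.size());
}
```

In use, the daemon would read watchdog_timeout_usec() once at startup and, if it is non-zero, call notify_systemd("WATCHDOG=1") at roughly half that interval. To catch the deadlock case described above, the ping should come from the same event loop or actor that serves /health rather than from a dedicated thread, since an independent thread could keep pinging while the rest of the process is stuck.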