On 06/18/2016 05:15 AM, Ferenc Wágner wrote:
> Hi,
>
> Could somebody please elaborate a little why the pacemaker systemd
> service file contains "Restart=on-failure"? I mean that a failed node
> gets fenced anyway, so most of the time this would be a futile effort.
> On the other hand, one could argue that restarting failed services
> should be the default behavior of systemd (or any init system). Still,
> it is not. I'd be grateful for some insight into the matter.
To clarify one point, the configuration mentioned here is systemd configuration, not part of pacemaker's own configuration or operation. Systemd monitors the processes it launches. With "Restart=on-failure", systemd will re-launch pacemaker in situations systemd considers a failure (exiting with a nonzero status, dumping core, etc.).

Systemd does have various rate-limiting options, which we leave at their defaults in the pacemaker unit file. Perhaps one day we could try to come up with ideal values, but repeated restarts should be rare, and admins can always tune them as desired for their system using an override file.

The goal of restarting is, of course, to have a slightly better shot at recovery. You're right that if fencing is configured and quorum is retained, the node will almost certainly get fenced anyway, but those conditions don't always hold.

Systemd upstream recommends Restart=on-failure or Restart=on-abnormal for all long-running services. on-abnormal would probably be better for pacemaker, since it restarts after a crash or timeout but not after a plain nonzero exit; however, it's not supported in older systemd versions.

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org