Re: [ClusterLabs] restarting pacemakerd

2016-06-19 Thread Digimer
On 19/06/16 01:59 AM, Andrei Borzenkov wrote:
> On 18.06.2016 22:04, Dmitri Maziuk wrote:
>> On 2016-06-18 05:15, Ferenc Wágner wrote:
>> ...
>>> On the other hand, one could argue that restarting failed services
>>> should be the default behavior of systemd (or any init system).  Still,
>>> it is not.
>>
>> As an off-topic snide comment, I never understood the thinking behind
>> that: restarting without removing the cause of the failure will just
>> make it fail again. If at first you don't succeed, then try, try, try
>> again?
>>
> 
> Some problems are transient, and a restart may succeed (the most obvious
> example is a program crash, which includes an OS kernel crash). What is
> needed here is rate limiting, so that restarts are not attempted indefinitely.

Rgmanager offers this via "max_restarts". I'd be shocked if there wasn't
a version of this in pacemaker already, given that it has far more
flexibility than rgmanager.

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] restarting pacemakerd

2016-06-19 Thread Andrei Borzenkov
On 18.06.2016 22:04, Dmitri Maziuk wrote:
> On 2016-06-18 05:15, Ferenc Wágner wrote:
> ...
>> On the other hand, one could argue that restarting failed services
>> should be the default behavior of systemd (or any init system).  Still,
>> it is not.
> 
> As an off-topic snide comment, I never understood the thinking behind
> that: restarting without removing the cause of the failure will just
> make it fail again. If at first you don't succeed, then try, try, try
> again?
> 

Some problems are transient, and a restart may succeed (the most obvious
example is a program crash, which includes an OS kernel crash). What is
needed here is rate limiting, so that restarts are not attempted indefinitely.
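
To make the rate-limiting idea concrete, here is a minimal sketch in Python.
It is not how systemd, rgmanager, or pacemaker implement it; the names
RestartLimiter and supervise are hypothetical, and the sliding-window policy
(at most N restarts within a time window) is just one reasonable assumption.

```python
import time
from collections import deque


class RestartLimiter:
    """Hypothetical helper: allow at most max_restarts restarts per window seconds."""

    def __init__(self, max_restarts, window, clock=time.monotonic):
        self.max_restarts = max_restarts
        self.window = window
        self.clock = clock          # injectable so the logic can be tested
        self.stamps = deque()       # times of restarts still inside the window

    def allow(self):
        """Return True and record a restart if the budget is not exhausted."""
        now = self.clock()
        # Forget restarts that have aged out of the window.
        while self.stamps and now - self.stamps[0] >= self.window:
            self.stamps.popleft()
        if len(self.stamps) >= self.max_restarts:
            return False
        self.stamps.append(now)
        return True


def supervise(start, limiter):
    """Call start() until it succeeds or the limiter refuses another restart.

    Returns (succeeded, attempts). start is any callable returning True on
    success and False on failure.
    """
    attempts = 0
    while True:
        attempts += 1
        if start():
            return True, attempts
        if not limiter.allow():
            return False, attempts
```

With a transient fault, supervise() recovers after a few attempts; with a
persistent one, it gives up once the budget is spent instead of retrying
forever, which answers the "try, try, try again" objection.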
