Re: [Pacemaker] Timeout, interval & onfail questions
On 07/10/2011 02:53 PM, Lars Marowsky-Bree wrote:
>> 2) I want my resources to *never* go into a failed state. I found the
>> on-fail="restart" option, but it does not seem to work as I expected.
>>
>> For example, if a node is under high load average (LA) and the
>> resource monitor fails, pacemaker will try to run the "stop" action,
>> but because of the high LA that will time out too, and pacemaker
>> decides that the resource is "unmanaged". How can I tune this
>> behaviour? I want pacemaker not to give up, and to try again.
>
> Repeating the same thing over and over again and expecting the result
> to change is one of the clinical tests for irrational and insane
> behaviour. So pacemaker doesn't do that. ;-)
>
> "stop" isn't supposed to fail, we don't support retrying it, and will
> not.

:-) Well, this is not quite true, because the environment can change, e.g. the LA starts to go down. I think I will use a cron job for this.

> Fix it so that it doesn't fail; if it fails due to a too short
> timeout, make the timeout longer.

The sad thing is that this host has a huge LA from time to time, and we can't fix that in the near future. A longer timeout does not really help here (it is 3m now)... well, I have not actually tried making it 10m or so.

--
Best regards,
Proskurin Kirill

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
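The cron job mentioned above is not spelled out in the thread. As a rough sketch only, and assuming the intent is to clear the failed state once the load has dropped so that the cluster re-evaluates the resource, a helper might look like the following. The resource name, the load threshold, and the whole approach are hypothetical; `crm_resource --cleanup --resource <id>` is the standard Pacemaker cleanup call, but check the options against the installed version. This does not replace the advice to fix the "stop" action or its timeout.

```python
import os


def cleanup_command(resource, load_threshold, load_1min=None):
    """Return the cleanup command to run, or None while load is too high.

    resource       -- hypothetical resource id, e.g. "p_myservice"
    load_threshold -- hypothetical 1-minute load average below which a
                      cleanup is considered worth trying
    load_1min      -- current 1-minute load average; read from the OS
                      when not supplied (mainly useful for testing)
    """
    if load_1min is None:
        # os.getloadavg() returns the 1, 5 and 15 minute load averages.
        load_1min = os.getloadavg()[0]
    if load_1min >= load_threshold:
        # Still overloaded: a retry now would most likely time out again.
        return None
    # Assumption: clearing the failure history is what the cron job is for.
    return ["crm_resource", "--cleanup", "--resource", resource]
```

A cron wrapper would call `cleanup_command("p_myservice", 4.0)` and, when the result is not `None`, pass it to `subprocess.run(cmd, check=True)`.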
Re: [Pacemaker] Timeout, interval & onfail questions
On 2011-07-10T00:50:45, Proskurin Kirill wrote:
> Hello all!
>
> I am trying to understand the logic of pacemaker and have some
> questions.
>
> 1) A resource monitor has an interval and a timeout.
> Situation:
> Interval is 20s, timeout is 60s.
>
> A monitor action is started, but the node is under load and it takes
> more than 20 sec to get the result. Will a second monitor action
> start, or does pacemaker understand that it already has one running?

Yes. The interval is counted from completion of the previous op.

> 2) I want my resources to *never* go into a failed state. I found the
> on-fail="restart" option, but it does not seem to work as I expected.
>
> For example, if a node is under high load average (LA) and the
> resource monitor fails, pacemaker will try to run the "stop" action,
> but because of the high LA that will time out too, and pacemaker
> decides that the resource is "unmanaged". How can I tune this
> behaviour? I want pacemaker not to give up, and to try again.

Repeating the same thing over and over again and expecting the result to change is one of the clinical tests for irrational and insane behaviour. So pacemaker doesn't do that. ;-)

"stop" isn't supposed to fail, we don't support retrying it, and will not.

Fix it so that it doesn't fail; if it fails due to a too short timeout, make the timeout longer.

Regards,
Lars

--
Architect Storage/HA, OPS Engineering, Novell, Inc.
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
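To make the timing rule in Lars's answer concrete, here is a small simulation, not Pacemaker code. It assumes only what the reply states: the next monitor is scheduled one interval after the previous one completes, so two runs of the same monitor never overlap. The function name and the sample durations are made up for illustration.

```python
def monitor_start_times(interval, durations):
    """Return the start time (in seconds) of each monitor run.

    interval  -- configured monitor interval in seconds
    durations -- how long each successive monitor run takes, in seconds

    The interval is counted from the completion of the previous run,
    so a slow run delays the next one instead of overlapping with it.
    """
    starts = []
    now = 0.0
    for duration in durations:
        starts.append(now)
        # The next run begins `interval` seconds after this one finishes.
        now += duration + interval
    return starts


# With interval=20s and runs taking 5s, 35s and 5s:
# monitor_start_times(20, [5, 35, 5]) -> [0.0, 25.0, 80.0]
```

The 35-second run exceeds the 20-second interval but stays within the 60-second timeout from the question, so it does not fail; it only pushes the third run out to t=80s.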
[Pacemaker] Timeout, interval & onfail questions
Hello all!

I am trying to understand the logic of pacemaker and have some questions.

1) A resource monitor has an interval and a timeout.
Situation:
Interval is 20s, timeout is 60s.

A monitor action is started, but the node is under load and it takes more than 20 sec to get the result. Will a second monitor action start, or does pacemaker understand that it already has one running?

2) I want my resources to *never* go into a failed state. I found the on-fail="restart" option, but it does not seem to work as I expected.

For example, if a node is under high load average (LA) and the resource monitor fails, pacemaker will try to run the "stop" action, but because of the high LA that will time out too, and pacemaker decides that the resource is "unmanaged". How can I tune this behaviour? I want pacemaker not to give up, and to try again.

--
Best regards,
Proskurin Kirill