The resource failed when the sleep expired, i.e. every 600 secs. Now I changed the resource to
sleep 7200, failure-timeout 3600, i.e. to values far beyond the recheck-interval of 15m.
Now everything behaves as expected.

Kind regards
Holger Teutsch

From: Andrew Beekhof <and...@beekhof.net>
To: The Pacemaker cluster resource manager <pacemaker@oss.clusterlabs.org>
Date: 05.10.2010 11:09
Subject: Re: [Pacemaker] Fail-count and failure timeout

On Tue, Oct 5, 2010 at 11:07 AM, Andrew Beekhof <and...@beekhof.net> wrote:
> On Fri, Oct 1, 2010 at 3:40 PM, <holger.teut...@fresenius-netcare.com> wrote:
>> Hi,
>> I observed the following in pacemaker Versions 1.1.3 and tip up to patch
>> 10258.
>>
>> In a small test environment to study fail-count behavior I have one resource
>>
>> anything
>> doing sleep 600 with monitoring interval 10 secs.
>>
>> The failure-timeout is 300.
>>
>> I would expect to never see a failcount higher than 1.
>
> Why?
>
> The fail-count is only reset when the PE runs... which is on a failure
> and/or after the cluster-recheck-interval
> So I'd expect a maximum of two.

Actually this is wrong.
There is no maximum, because there needs to have been 300s since the
last failure when the PE runs.
And since it only runs when the resource fails, it is never reset.

>
>   cluster-recheck-interval = time [15min]
>     Polling interval for time based changes to options,
> resource parameters and constraints.
>
>     The Cluster is primarily event driven, however the
> configuration can have elements that change based on time. To ensure
> these changes take effect, we can optionally poll the cluster’s
> status for changes. Allowed values: Zero disables
> polling. Positive values are an interval in seconds (unless other SI
> units are specified. eg. 5min)
>
>>
>> I observed some sporadic clears but mostly the count is increasing by 1 each
>> 10 minutes.
>>
>> Am I mistaken or is this a bug ?
>
> Hard to say without logs. What value did it reach?
>
>>
>> Regards
>> Holger
>>
>> -- complete cib for reference ---
>>
>> <cib epoch="32" num_updates="0" admin_epoch="0"
>> validate-with="pacemaker-1.2" crm_feature_set="3.0.4" have-quorum="0"
>> cib-last-written="Fri Oct 1 14:17:31 2010" dc-uuid="hotlx">
>>   <configuration>
>>     <crm_config>
>>       <cluster_property_set id="cib-bootstrap-options">
>>         <nvpair id="cib-bootstrap-options-dc-version" name="dc-version"
>> value="1.1.3-09640bd6069e677d5eed65203a6056d9bf562e67"/>
>>         <nvpair id="cib-bootstrap-options-cluster-infrastructure"
>> name="cluster-infrastructure" value="openais"/>
>>         <nvpair id="cib-bootstrap-options-expected-quorum-votes"
>> name="expected-quorum-votes" value="2"/>
>>         <nvpair id="cib-bootstrap-options-no-quorum-policy"
>> name="no-quorum-policy" value="ignore"/>
>>         <nvpair id="cib-bootstrap-options-stonith-enabled"
>> name="stonith-enabled" value="false"/>
>>         <nvpair id="cib-bootstrap-options-start-failure-is-fatal"
>> name="start-failure-is-fatal" value="false"/>
>>         <nvpair id="cib-bootstrap-options-last-lrm-refresh"
>> name="last-lrm-refresh" value="1285926879"/>
>>       </cluster_property_set>
>>     </crm_config>
>>     <nodes>
>>       <node id="hotlx" uname="hotlx" type="normal"/>
>>     </nodes>
>>     <resources>
>>       <primitive class="ocf" id="test" provider="heartbeat" type="anything">
>>         <meta_attributes id="test-meta_attributes">
>>           <nvpair id="test-meta_attributes-target-role" name="target-role"
>> value="started"/>
>>           <nvpair id="test-meta_attributes-failure-timeout"
>> name="failure-timeout" value="300"/>
>>         </meta_attributes>
>>         <operations id="test-operations">
>>           <op id="test-op-monitor-10" interval="10" name="monitor"
>> on-fail="restart" timeout="20s"/>
>>           <op id="test-op-start-0" interval="0" name="start"
>> on-fail="restart" timeout="20s"/>
>>         </operations>
>>         <instance_attributes id="test-instance_attributes">
>>           <nvpair id="test-instance_attributes-binfile" name="binfile"
>> value="sleep 600"/>
>>         </instance_attributes>
>>       </primitive>
>>     </resources>
>>     <constraints/>
>>   </configuration>
>> </cib>
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs:
>> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>>
>
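
The reset rule discussed in this thread can be illustrated with a small simulation. This is only a rough model, not Pacemaker code. It assumes that the PE runs on every failure and otherwise only once the cluster-recheck-interval has passed since its previous run, and that the failed resource is restarted immediately. The function name and its parameters are made up for this sketch.

```python
def simulate_fail_count(sleep_s, failure_timeout_s, recheck_s, horizon_s):
    """Return the fail-count seen right after each failure (simplified model).

    Assumptions (not taken from the Pacemaker sources):
    - the resource fails each time its sleep expires and restarts at once;
    - the PE runs on every failure, and otherwise only when recheck_s
      seconds have passed since its previous run;
    - a PE run clears the fail-count if at least failure_timeout_s seconds
      have passed since the last failure.
    """
    fail_count = 0
    last_failure = None
    next_failure = sleep_s
    next_recheck = recheck_s
    counts = []

    while min(next_failure, next_recheck) <= horizon_s:
        if next_failure <= next_recheck:
            # The sleep expired: record the failure before the PE looks at it.
            now = next_failure
            fail_count += 1
            last_failure = now
            counts.append(fail_count)
            next_failure = now + sleep_s
        else:
            # Nothing happened for a full recheck interval: timer-driven run.
            now = next_recheck

        # PE run: expire the failures only if they are old enough.
        if fail_count and now - last_failure >= failure_timeout_s:
            fail_count = 0

        # Every PE run restarts the recheck timer.
        next_recheck = now + recheck_s

    return counts


# Original setup: sleep 600, failure-timeout 300, recheck interval 15 min.
print(simulate_fail_count(600, 300, 900, 3600))      # [1, 2, 3, 4, 5, 6]

# Changed setup: sleep 7200, failure-timeout 3600.
print(simulate_fail_count(7200, 3600, 900, 30000))   # [1, 1, 1, 1]
```

With sleep 600 a failure arrives before the 15 minute recheck timer can fire, so in this model the PE never sees a failure that is 300 s old and the count keeps growing. With sleep 7200 and failure-timeout 3600 several timer-driven runs fall between two failures, and the count is cleared before the next failure. The sporadic clears reported in the original message are not reproduced by this simplified model.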