On 2011-02-11 09:16, Uwe Schmeling wrote:
> Hi,
>
> I'm just migrating my recent mon/heartbeat configuration to pacemaker.
> The point of interest is the webservice behavior. Before the monitor
> checked if the service failed twice within 20 sec, switch to other node
> was initiated if this happens. Now I'm trying to configuring the same
> behavior using pacemaker. The webservice is monitored every 10 seconds
> (interval=10), failure timeout is set to 20s (expecting to ignore all
> failures within this time frame)

That is *not* what failure-timeout means. Please reread the docs.

http://www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/s-failure-migration.html

> and it should only happen if a "valid
> failure" occurs twice (migration-theshold=2). Valid-failure means: the
> service fails twice within 20s but is ignored if the service is back
> within 20s.

There is no such thing in Pacemaker as the "valid failure" you're
talking about.

> This is the configuration, which is used to implement this
> behavior:
>
> node lbv01 \
>     attributes standby="off"
> node lbv02 \
>     attributes standby="off"
> primitive apacheIP ocf:heartbeat:IPaddr2 \
>     params ip="10.6.151.190" \
>     op monitor interval="10s" \
>     meta is-managed="true"
> primitive haproxyIP ocf:heartbeat:IPaddr2 \
>     params ip="10.6.151.191" \
>     op monitor interval="10s"
> primitive pingd ocf:pacemaker:ping \
>     params host_list="10.6.151.11" multiplier="100" \
>     op monitor interval="15s" timeout="5s"
> *primitive webservice ocf:heartbeat:webservices \
>     op monitor on-fail="ignore" interval="10s" \
>     meta failure-timeout="20s" migration-threshold="2"*
> group webservice-ips haproxyIP apacheIP webservice \
>     meta target-role="Started"
> colocation all-resources inf: webservice-ips pingd
> property $id="cib-bootstrap-options" \
>     dc-version="1.1.2-f059ec7ced7a86f18e5490b67ebf4a0b963bccfe" \
>     cluster-infrastructure="openais" \
>     expected-quorum-votes="2" \
>     stonith-enabled="false" \
>     no-quorum-policy="ignore" \
>     last-lrm-refresh="1297249441" \
>     cluster-delay="30"
>
> If a webservice monitoring failure is forced, the switchover immediately
> is performed, ignoring timeout and threshold.

I already pointed out that you've got a false impression of
failure-timeout, so that's irrelevant here.

Could it be that you are not just forcing the monitoring failure, but
also keeping the service from restarting? Some "chmod -x" trick?
Because that makes your monitor fail *and* the subsequent restart, and
it's that failing restart that would cause your migration.

Or else your "webservices" agent exits with $OCF_ERR_INSTALLED on your
monitor failure, which will also cause a prompt migration.

Btw, when you write your own RA, *please* don't install it into the
"heartbeat" provider directory; create your own provider directory
instead. Otherwise a casual observer will think you're talking about a
resource agent that lives in our upstream repo, which for your
"webservices" agent is clearly not the case.

Florian
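
To make the exit-code point above concrete, here is a minimal sketch of a
monitor action. The "webservices" agent itself was not posted, so the
function name and the pid file path are hypothetical, and the numeric
fallbacks merely stand in for the values a real agent would get by
sourcing ocf-shellfuncs. The idea is that a service which is simply down
should report $OCF_NOT_RUNNING, so the failure is handled as an ordinary
one and counted against migration-threshold, while $OCF_ERR_INSTALLED is
reserved for a node that lacks the software altogether.

```shell
#!/bin/sh
# Standard OCF return codes; a real agent would source ocf-shellfuncs
# rather than define these fallbacks itself.
: "${OCF_SUCCESS:=0}"
: "${OCF_ERR_INSTALLED:=5}"
: "${OCF_NOT_RUNNING:=7}"

# Hypothetical pid file location, for illustration only.
: "${WEBSERVICE_PIDFILE:=/var/run/webservice.pid}"

webservice_monitor() {
    # No readable pid file: report the service as not running.
    [ -r "$WEBSERVICE_PIDFILE" ] || return "$OCF_NOT_RUNNING"

    pid=$(cat "$WEBSERVICE_PIDFILE")

    # kill -0 sends no signal; it only checks that the process exists.
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null && return "$OCF_SUCCESS"

    # Stale or empty pid file: still "not running", not "not installed".
    return "$OCF_NOT_RUNNING"
}
```

Installed under a provider directory of its own (typically
/usr/lib/ocf/resource.d/<provider>/webservices, with <provider> being
whatever name you pick), the agent would then be referenced as
ocf:<provider>:webservices rather than ocf:heartbeat:webservices.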
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker