Johan Hoeke wrote:
> Two node cluster, RH AS 4.6 w/ CentOS Heartbeat 2.1.3-3 RPMs, riloe
> stonith.
> Testing the cluster's reaction to manually stopping a monitored resource
> of type lsb I observed the following behavior:
>
> 1 - the resource fails as expected
> 2 - the failure triggers a stonith action as expected
> 3 - the next desired action would be for the resources to failover to
>     the other node
> 4 - instead the resources stay stopped until the first server recovers
>     from the stonith reboot
> 5 - then the resources are failed over, and the first server is
>     stonith-ed again.
>
> The analysis.txt from hb_report shows a problem w/ the stonith setup:
>
> Feb 17 09:17:07 koch pengine: [2033]: ERROR: native_add_running:
> Resource stonith::external/riloe:R_koch_ilo appears to be active on 2 nodes.
>
> I went from a stonith setup with clones to one without clones to
> hopefully get rid of this problem. That did not help.
>
> Also tried disabling startup-fencing. That did not make any difference.
>
> Have I configured the cluster somehow to force this unexpected behavior?
> How can I make it so the resources failover at step 3 instead of step 5?
> How can I keep the failed server from being stonith-ed twice?
>
> regards,
>
> Johan

I resolved it by adding monitor operations to the stonith resources.

regards,

Johan
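
For readers looking for what such a change might look like, here is a minimal sketch of a stonith primitive with a monitor operation in Heartbeat 2.x CIB syntax. The resource id R_koch_ilo and the type external/riloe come from the log line quoted above; the op id, the interval and the timeout are assumptions for illustration only, not the values actually used in this cluster. The plugin's existing instance_attributes are left out and would stay as they are.

```xml
<primitive id="R_koch_ilo" class="stonith" type="external/riloe">
  <operations>
    <!-- Assumed values: the id, interval and timeout are illustrative only -->
    <op id="R_koch_ilo_monitor" name="monitor" interval="60s" timeout="30s"/>
  </operations>
  <!-- existing instance_attributes for the riloe plugin remain unchanged -->
</primitive>
```

The same kind of op entry would go on each stonith resource in the cluster, each with its own unique id.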
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems