Johan Hoeke wrote:
> Two-node cluster, RH AS 4.6 with CentOS Heartbeat 2.1.3-3 RPMs, riloe
> stonith.
> While testing the cluster's reaction to manually stopping a monitored
> resource of type lsb, I observed the following behavior:
> 
> 1 - the resource fails as expected
> 2 - the failure triggers a stonith action as expected
> 3 - the next desired action would be for the resources to fail over to
> the other node
> 4 - instead the resources stay stopped until the first server recovers
> from the stonith reboot
> 5 - then the resources are failed over, and the first server is
> stonith-ed again.
> 
> 
> The analysis.txt from hb_report shows a problem w/ the stonith setup:
> 
> Feb 17 09:17:07 koch pengine: [2033]: ERROR: native_add_running:
> Resource stonith::external/riloe:R_koch_ilo appears to be active on 2 nodes.
> 
> I went from a stonith setup with clones to one without clones to
> hopefully get rid of this problem. That did not help.
> 
> Also tried disabling startup-fencing. That did not make any difference.
> 
> Have I configured the cluster somehow to force this unexpected behavior?
> How can I make the resources fail over at step 3 instead of step 5?
> How can I keep the failed server from being stonith-ed twice?
> 
> regards,
> 
> Johan

Resolved it by adding monitor operations to the stonith resources.
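
For anyone finding this in the archives, here is a minimal sketch of what
a monitor operation on a stonith resource could look like in the
Heartbeat 2.x CIB. The resource id and type are taken from the log line
quoted above; the op id, the interval and the timeout are assumed values
for illustration only, and the riloe parameters are left out because
they depend on the local setup.

```xml
<primitive id="R_koch_ilo" class="stonith" type="external/riloe">
  <operations>
    <!-- op id, interval and timeout are assumptions; tune them for your iLO -->
    <op id="R_koch_ilo_monitor" name="monitor" interval="60s" timeout="60s"/>
  </operations>
  <!-- instance_attributes holding the riloe parameters omitted here -->
</primitive>
```

The same kind of op entry would presumably be needed on the stonith
resource for the other node as well.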


regards,

Johan

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
