Tobias Appel wrote:
> Hi,
> 
> I have a very weird error with heartbeat version 2.1.4.
> 
> I have two IPMI resources for my two nodes. The configuration is posted 
> here: http://pastebin.com/m52c1809c
> 
> node1 is named nagios1
> node2 is named nagios2
> 
> Now I have ipmi_nagios1 (which should run on nagios2 to shut down nagios1)
> and ipmi_nagios2 (which should run on nagios1 to shut down nagios2).
> 
> It's confusing, I know.
> 
> Now I set up two constraints which, with score INFINITY, force each
> resource to run only on its designated node.
> 
> For the resource ipmi_nagios2 it works without a problem. It only runs
> on nagios1 and is never started on nagios2. But the other resource, which
> is configured identically (only the hostname differs), does not work:
> heartbeat always wants to start it on nagios1 and only very seldom starts
> it on nagios2. Just now it failed to start on nagios1, I hit "clean up
> resource", waited a bit, it failed again, and after three attempts the
> cluster went haywire and turned off one of the nodes!
> 
> I even tried to set a constraint via the UI (it then shows up labeled
> cli-constraint-name), but even with that in place heartbeat still tried
> to start the resource on the wrong node!
> 
> Now I'm really at a loss: maybe my configuration is wrong, or maybe it
> really is a bug in heartbeat.
> 
> Here is the link to the configuration again: http://pastebin.com/m52c1809c
> 
> I honestly don't know what to do anymore. At the moment I have to keep
> the IPMI resources stopped, because otherwise they might randomly turn
> off one of the nodes, but without them we have no fencing at all, so it's
> quite a delicate situation.
> 
> Any input is greatly appreciated.
> 
> Regards,
> Tobias

The constraints look okay, but without logs we can't say why it doesn't
do what you want.
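
Just so we're talking about the same thing: a location constraint of the
kind you describe would normally look roughly like this in the CIB (only
a sketch; the ids and exact layout are assumed here, not copied from your
pastebin):

  <rsc_location id="loc_ipmi_nagios2_on_nagios1" rsc="ipmi_nagios2">
    <rule id="loc_ipmi_nagios2_on_nagios1_rule" score="INFINITY">
      <!-- prefer nagios1 for the device that fences nagios2 -->
      <expression id="loc_ipmi_nagios2_on_nagios1_expr"
                  attribute="#uname" operation="eq" value="nagios1"/>
    </rule>
  </rsc_location>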

Also, look at the stonith device configuration: is it okay for both
primitives to have the same IP configured? I'd guess the resource will not
start successfully with that!? Maybe that's it already.
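
Normally each stonith primitive talks to the BMC of the node it is
supposed to fence, so the two should have different addresses. Roughly
like this, assuming the external/ipmi plugin (the addresses and
credentials below are made up; "stonith -t external/ipmi -n" should list
the parameters the plugin actually expects):

  <primitive id="ipmi_nagios1" class="stonith" type="external/ipmi">
    <instance_attributes id="ipmi_nagios1_ia">
      <attributes>
        <!-- this device fences nagios1, so it talks to nagios1's BMC -->
        <nvpair id="ipmi_nagios1_host"   name="hostname" value="nagios1"/>
        <nvpair id="ipmi_nagios1_ip"     name="ipaddr"   value="10.0.0.11"/>
        <nvpair id="ipmi_nagios1_user"   name="userid"   value="admin"/>
        <nvpair id="ipmi_nagios1_passwd" name="passwd"   value="secret"/>
      </attributes>
    </instance_attributes>
  </primitive>

  <primitive id="ipmi_nagios2" class="stonith" type="external/ipmi">
    <instance_attributes id="ipmi_nagios2_ia">
      <attributes>
        <!-- this one fences nagios2, so it needs nagios2's BMC address -->
        <nvpair id="ipmi_nagios2_host"   name="hostname" value="nagios2"/>
        <nvpair id="ipmi_nagios2_ip"     name="ipaddr"   value="10.0.0.12"/>
        <nvpair id="ipmi_nagios2_user"   name="userid"   value="admin"/>
        <nvpair id="ipmi_nagios2_passwd" name="passwd"   value="secret"/>
      </attributes>
    </instance_attributes>
  </primitive>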

I'd guess there was some earlier failure that brought about this situation
(probably a stonith start failure followed by a stop failure)?

It shouldn't turn off nodes at random. There's usually a pretty good
reason when the cluster does this.

Btw: the better way to make sure a particular resource is started on one
node but never on the other is usually to configure -INFINITY for the
"other" node instead of INFINITY for the node you want it to run on.

Regards
Dominik