Re: [Linux-HA] Stopping heartbeat on secondary node causes primary to fail

Caspar Smit Mon, 08 Aug 2011 00:13:14 -0700

2011/8/6 Chris Huber-Lantz <[email protected]>:
> Hello All,
>
> I am having an issue in a 2-node heartbeat cluster where the primary
> node's resources are relinquished if the secondary node's heartbeat
> service is stopped. Below is the ha.cf file:
>
> logfacility local0
> logfile /var/log/ha-log
> debugfile /var/log/ha-debug
> udpport 694
> keepalive 2 # 2 second
> deadtime 20
> warntime 10
> initdead 40
> ucast eth5 192.168.0.2


You are using only one ucast directive and no mcast or bcast.
I'm guessing 192.168.0.2 is the secondary node. If that address goes
down (heartbeat stopped), there is no heartbeat traffic left at all, not
even a local heartbeat, so the primary stops.
You should either add an mcast/bcast over another device or add a ucast
for the local IP address (ucast eth5 192.168.0.1, if that is the local
IP address).
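
As a rough sketch of the two options, the relevant ha.cf lines could look
something like this. It assumes 192.168.0.1 really is the local address on
eth5; eth0 is only a placeholder for whatever second interface you have.
Pick one option, or use both if you want a redundant path:

```
# Option 1: ucast lines for both addresses on eth5
# (assumption: 192.168.0.1 = this node, 192.168.0.2 = the peer)
ucast eth5 192.168.0.1
ucast eth5 192.168.0.2

# Option 2: an additional broadcast path over another device
# (eth0 is a hypothetical interface name, adjust to your setup)
bcast eth0
```

Restart heartbeat on the node after changing ha.cf so the new directives
are picked up.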

Best regards,
Caspar

> node node1
> node node2
> auto_failback on
> watchdog /dev/watchdog
>
> As you can see we are using "auto_failback on" which *should* only
> pertain to when the main server is taken down and subsequently brought
> up as to re-assume control of the primary resources. However I have
> noticed several forum posts regarding this setting causing unexplained
> behavior, although not specifically the behavior we are seeing.
>
> Is it possible this setting is the culprit or would there be any other
> reason that stopping heartbeat on the secondary node would cause the
> primary to drop its resources?
>
> Any help is greatly appreciated!
>
> - Chris
>
> --
> Regards,
>     Chris
>
> Chris Huber-Lantz
> ScratchSpace Inc.
> (831) 621-7928
> http://www.scratchspace.com
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
>
