Actually another question....

So I would simply add eth1 to the heartbeat ha.cf, then? And what's the difference between using mcast and bcast? I'm not sure I understand this.

Thanks a bunch
:)


Rob Morin
Dido Internet Inc.
Montreal,Canada
http://www.dido.ca
514-990-4444



Dominik Klein wrote:
Rob Morin wrote:
I have not seen my original email reach the list yet, but after looking through the logs I see this on each node.
See below for log excerpts.

My test involved bringing down eth0 only (heartbeat and replication). Should I also have brought down eth1, the public side of Joe (primary)?

My conf file is:

logfacility     daemon        # This is deprecated
keepalive 2                   # Interval between heartbeat (HB) packets.
deadtime 60                   # How quickly HB determines a dead node.
warntime 5                    # Time HB will issue a late HB.
initdead 120                  # Time delay needed by HB to report a dead node.
udpport 694                   # UDP port HB uses to communicate between nodes.
#ping 192.168.5.1             # Ping VMware Server host to simulate network resource.
bcast eth0

You only use one connection for heartbeat communication. That is a configuration error.
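
For illustration, a redundant heartbeat setup might look like the ha.cf fragment below. This is a minimal sketch, not a tested configuration: it assumes eth1 is a second interface that both nodes can reach each other on, and the multicast group and TTL in the commented-out line are placeholder values. Roughly, bcast sends heartbeats as broadcasts to every host on the interface's subnet, while mcast sends them to a multicast group, so only hosts that joined that group process them.

```
# Sketch only: two independent heartbeat paths, so losing eth0 alone
# no longer looks like a dead peer.
bcast eth0 eth1        # Broadcast heartbeats on both interfaces.

# Alternative for the second link (assumed values): multicast instead of broadcast.
# Syntax: mcast <dev> <mcast group> <udp port> <ttl> <loop>
#mcast eth1 225.0.0.1 694 1 0
```

With either variant, heartbeat only declares the peer dead once every configured link has gone silent.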

When you unplugged that interface for testing, you forced a split-brain situation. Read http://www.linux-ha.org/SplitBrain

Dual split brain, so to speak. Your DRBD replication also runs over this link, so not only does heartbeat lose its connection, DRBD does too. In a standard setup, a disconnected secondary DRBD device can be promoted regardless of the peer's DRBD state.

You might want to read about dopd, too: http://www.drbd.org/users-guide/s-heartbeat-dopd.html It can prevent DRBD split brain, but you need more than one network connection anyway.
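
As a rough sketch of what enabling dopd involves (check the guide linked above before relying on it): the daemon is started from ha.cf, and DRBD is told to call the peer outdater when replication is lost. The resource name r0 is a placeholder, the paths assume the same /usr/lib/heartbeat location used elsewhere in this ha.cf, and the handler name depends on the DRBD version.

```
# ha.cf (both nodes): start dopd and allow it to talk to heartbeat.
respawn hacluster /usr/lib/heartbeat/dopd
apiauth dopd gid=haclient uid=hacluster

# drbd.conf: only the dopd-related parts of the resource are shown.
resource r0 {                       # "r0" is a placeholder resource name.
    handlers {
        # Called fence-peer in newer DRBD releases.
        outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
    }
    disk {
        fencing resource-only;      # Outdate the peer instead of promoting blindly.
    }
}
```

dopd reaches the peer over the remaining heartbeat link, which is why a second network connection is still required.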

#baud 115200
#serial /dev/ttyS0              # Which interface to use for HB packets.
coredumps true
auto_failback on # Auto promotion of primary node upon return to cluster.

Your comment answers your later question on what will happen when a rebooted (stonith'd) node rejoins the cluster.

Regards
Dominik

node    joe      # Node name must be same as uname -n.
node    stewie      # Node name must be same as uname -n.
###
###
respawn hacluster /usr/lib/heartbeat/ipfail
# Specifies which programs to run at startup
# DO not use the below unless you use the /var/lib/heartbeat/crm/cib/xml config file instead
#crm on
use_logd yes                  # Use system logging.
logfile /var/log/hb.log       # Heartbeat logfile.
debugfile /var/log/heartbeat-debug.log # Debugging logfile.


Primary
--------

May  6 23:04:44 joe heartbeat: [4342]: WARN: node stewie: is dead
May  6 23:04:44 joe heartbeat: [4342]: WARN: No STONITH device configured.
May  6 23:04:44 joe heartbeat: [4342]: WARN: Shared disks are not protected.
May  6 23:04:44 joe heartbeat: [4342]: info: Resources being acquired from stewie.
May  6 23:04:44 joe heartbeat: [4342]: info: Link stewie:eth0 dead.
May  6 23:04:44 joe heartbeat: [4249]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
May  6 23:04:44 joe mach_down[4283]: [4328]: info: /usr/lib/heartbeat/mach_down: nice_failback: foreign resources acquired
May  6 23:04:44 joe heartbeat: [4342]: info: mach_down takeover complete.
May  6 23:04:44 joe heartbeat: [4342]: debug: StartNextRemoteRscReq(): child count 1
May  6 23:04:44 joe heartbeat: [4250]: info: Local Resource acquisition completed.


Secondary
-----------

May  6 23:04:46 stewie heartbeat: [21820]: info: Resources being acquired from joe.
May  6 23:04:46 stewie heartbeat: [21820]: info: Link joe:eth0 dead.
May  6 23:04:46 stewie heartbeat: [4946]: info: No local resources [/usr/lib/heartbeat/ResourceManager listkeys stewie] to acquire.
May  6 23:04:46 stewie heartbeat: [21825]: ERROR: MSG[4] : [info=req_our_resources()]
May  6 23:05:10 stewie mach_down[4953]: [6063]: info: /usr/lib/heartbeat/mach_down: nice_failback: foreign resources acquired
May  6 23:05:10 stewie heartbeat: [21820]: info: mach_down takeover complete.
May  6 23:05:10 stewie heartbeat: [21825]: ERROR: MSG[2] : [info=mach_down]

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems