Hello,

On Thu, May 08, 2008 at 08:53:39AM -0400, Rob Morin wrote:
> Actually another question....
>
> I would simply add eth1 to the heartbeat ha.cf then? and whats the diff
> between using mcast vs bcast? I am not sure i understand this ?
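
For orientation, here is a minimal sketch of how the three transports might look in ha.cf. The multicast group and the peer address are placeholders I made up, not values from your setup, and you would normally enable only one transport per interface.

```
# Sketch only, assuming eth0 is the heartbeat link and eth1 the public side.

# Broadcast on both links (the "add eth1" case):
bcast eth0 eth1

# Multicast: mcast <dev> <group> <port> <ttl> <loop>
# 225.0.0.1 is a placeholder group; 694 is the udpport from your conf.
#mcast eth0 225.0.0.1 694 1 0

# Unicast: ucast <dev> <peer-ip>
# 192.168.5.12 is a hypothetical peer address.
#ucast eth0 192.168.5.12
```

With ucast, each node needs a line that points at its peer's address, so check the addresses on both joe and stewie.
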
mcast = multicast
  If your router supports multicast routing, these packets are routed.

bcast = broadcast
  Broadcasts are NEVER routed.

ucast = unicast
  That's my choice. Broadcasts are mostly trash on the net (imho), and I
  always use non-bcast if I can. Unicast is also routed traffic.

Madd

> Thanks a bunch
> :)
>
>
> Rob Morin
> Dido Internet Inc.
> Montreal,Canada
> http://www.dido.ca
> 514-990-4444
>
>
>
> Dominik Klein wrote:
> >Rob Morin wrote:
> >>I have not seen my original email get to the list yet... but after
> >>looking through the logs i see this on each node...
> >>see below for log excerts...
> >>
> >>My test involved bringing down eth0 only(heartbeat & replication),
> >>should i have also brought down eth1 the public side of Joe(primary)
> >>
> >>my conf file is...
> >>
> >>logfacility daemon # This is deprecated
> >>keepalive 2 # Interval between heartbeat (HB) packets.
> >>deadtime 60 # How quickly HB determines a dead node.
> >>warntime 5 # Time HB will issue a late HB.
> >>initdead 120 # Time delay needed by HB to report a
> >>dead node.
> >>udpport 694 # UDP port HB uses to communicate
> >>between nodes.
> >>#ping 192.168.5.1 # Ping VMware Server host to simulate
> >>network resource.
> >>bcast eth0
> >
> >You only use one connection for heartbeat communication. That is a
> >configuration error.
> >
> >As you unplugged that interface for testing, you forced a splitbrain
> >situation. Read http://www.linux-ha.org/SplitBrain
> >
> >Dual split brain so to speak. Your drbd replication is also done over
> >this link. So not only does heartbeat loose connection, but also does
> >drbd. In a standard setup, a not connected secondary drbd device can
> >be promoted disregarding the peer's drbd state.
> >
> >You might want to read about dopd, too:
> >http://www.drbd.org/users-guide/s-heartbeat-dopd.html
> >It can prevent drbd splitbrain, but you need to have >1 network
> >connection anyways.
> >
> >>#baud 115200
> >>#serial /dev/ttyS0 # Which interface to use for HB packets.
> >>coredumps true
> >>auto_failback on # Auto promotion of primary node upon
> >>return to cluster.
> >
> >Your comment answers your later question on what will happen when a
> >rebooted (stonith'd) node rejoins the cluster.
> >
> >Regards
> >Dominik
> >
> >>node joe # Node name must be same as uname -n.
> >>node stewie # Node name must be same as uname -n.
> >>###
> >>###
> >>respawn hacluster /usr/lib/heartbeat/ipfail
> >># Specifies which programs to run at startup
> >># DO not use the below unless you use the
> >>/var/lib/heartbeat/crm/cib/xml config file instead
> >>#crm on
> >>use_logd yes # Use system logging.
> >>logfile /var/log/hb.log # Heartbeat logfile.
> >>debugfile /var/log/heartbeat-debug.log # Debugging logfile.
> >>
> >>
> >>Primary
> >>--------
> >>
> >>May 6 23:04:44 joe heartbeat: [4342]: WARN: node stewie: is dead
> >>May 6 23:04:44 joe heartbeat: [4342]: WARN: No STONITH device
> >>configured.
> >>May 6 23:04:44 joe heartbeat: [4342]: WARN: Shared disks are not
> >>protected.
> >>May 6 23:04:44 joe heartbeat: [4342]: info: Resources being acquired
> >>from stewie.
> >>May 6 23:04:44 joe heartbeat: [4342]: info: Link stewie:eth0 dead.
> >>May 6 23:04:44 joe heartbeat: [4249]: debug: notify_world: setting
> >>SIGCHLD Handler to SIG_DFL
> >>May 6 23:04:44 joe mach_down[4283]: [4328]: info:
> >>/usr/lib/heartbeat/mach_down: nice_failback: foreign resources acquired
> >>May 6 23:04:44 joe heartbeat: [4342]: info: mach_down takeover
> >>complete.
> >>May 6 23:04:44 joe heartbeat: [4342]: debug:
> >>StartNextRemoteRscReq(): child count 1
> >>May 6 23:04:44 joe heartbeat: [4250]: info: Local Resource
> >>acquisition completed.
> >>
> >>
> >>Secondary
> >>-----------
> >>
> >>May 6 23:04:46 stewie heartbeat: [21820]: info: Resources being
> >>acquired from joe.
> >>May 6 23:04:46 stewie heartbeat: [21820]: info: Link joe:eth0 dead.
> >>May 6 23:04:46 stewie heartbeat: [4946]: info: No local resources
> >>[/usr/lib/heartbeat/ResourceManager listkeys stewie] to acquire.
> >>May 6 23:04:46 stewie heartbeat: [21825]: ERROR: MSG[4] :
> >>[info=req_our_resources()]
> >>May 6 23:05:10 stewie mach_down[4953]: [6063]: info:
> >>/usr/lib/heartbeat/mach_down: nice_failback: foreign resources acquired
> >>May 6 23:05:10 stewie heartbeat: [21820]: info: mach_down takeover
> >>complete.
> >>May 6 23:05:10 stewie heartbeat: [21825]: ERROR: MSG[2] :
> >>[info=mach_down]
> >
> >_______________________________________________
> >Linux-HA mailing list
> >[email protected]
> >http://lists.linux-ha.org/mailman/listinfo/linux-ha
> >See also: http://linux-ha.org/ReportingProblems
