Re: [Linux-HA] Primary keeps IP but secondary gets it too...

Rob Morin Thu, 08 May 2008 07:48:17 -0700

Hehehe... so I changed the config to bcast eth0 eth1

and did a /etc/init.d/heartbeat reload

not realizing that this restarts heartbeat. I thought it would just reload the config and start using the other interface as well. Instead it brought down the primary, and the secondary took over flawlessly without my knowing. An hour later I simply brought heartbeat down and then up again on the primary, and everything came back. Wow... I was impressed that I got no phone calls at all from clients! :)

However, I noticed one thing that could be dangerous: MySQL seems to have lost a few files in /var/www/mysql. When the secondary took over, the freshly mounted filesystem was missing some .MYD files for some tables. Only 2 sites were affected, in a minor way, but what happened to these files? When I brought the primary back, the files were still missing (this is normal, I guess). The errors on the sites disappeared, yet the files are still not there...?


Any ideas or suggestions?

Thanks again for all your help... it was exciting to see it work unexpectedly... :)

P.S. Here is my haresources file, in case you were wondering. I thought the problem might be due to startup, where MySQL starts while the database files are not there yet (not mounted)...


joe IPaddr::xx.xx.xx.150 drbddisk::mail drbddisk::web \
Filesystem::/dev/drbd0::/var/mail/virtual::ext3::defaults \
Filesystem::/dev/drbd1::/var/www::ext3::defaults \
postfix courier-authdaemon courier-pop courier-imap mysql apache2 proftpd

Rob Morin
Dido Internet Inc.
Montreal,Canada
http://www.dido.ca
514-990-4444



Madd Sauer wrote:

Hello,
On Thu, May 08, 2008 at 08:53:39AM -0400, Rob Morin wrote:
Actually another question....
I would simply add eth1 to the heartbeat ha.cf then? And what's the difference between using mcast vs bcast? I am not sure I understand this.
mcast = multicast
If your router supports multicast routing, these packets are routed.

bcast = broadcast
Broadcasts are NEVER routed.

ucast = unicast
That's my choice. Broadcasts are mostly trash on the net (imho), and I always use
non-broadcast if I can. Unicast is also routed traffic.
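
For reference, here is a sketch of how the three media types might look in ha.cf. Normally only one style is needed. The multicast group 225.0.0.1 and the peer address 10.0.0.2 are made-up placeholders, not values from this thread; the port matches the udpport 694 in the posted config.

```
# broadcast on both interfaces (never leaves the local segment)
bcast eth0 eth1

# multicast: device, group, UDP port, TTL, loopback flag
# (225.0.0.1 is a placeholder group)
mcast eth0 225.0.0.1 694 1 0

# unicast: device and the peer node's address
# (10.0.0.2 is a placeholder)
ucast eth0 10.0.0.2
```

With ucast, each node's ha.cf points at the other node's address.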

Madd
Thanks a bunch
:)


Rob Morin
Dido Internet Inc.
Montreal,Canada
http://www.dido.ca
514-990-4444



Dominik Klein wrote:
Rob Morin wrote:
I have not seen my original email get to the list yet... but after looking through the logs I see this on each node...
See below for log excerpts...
My test involved bringing down eth0 only (heartbeat & replication). Should I have also brought down eth1, the public side of joe (primary)?
My conf file is...

logfacility     daemon        # This is deprecated
keepalive 2                   # Interval between heartbeat (HB) packets.
deadtime 60                   # How quickly HB determines a dead node.
warntime 5                    # Time HB will issue a late HB.
initdead 120                  # Time delay needed by HB to report a dead node.
udpport 694                   # UDP port HB uses to communicate between nodes.
#ping 192.168.5.1             # Ping VMware Server host to simulate network resource.
bcast eth0
You only use one connection for heartbeat communication. That is a configuration error.
As you unplugged that interface for testing, you forced a split-brain situation. Read http://www.linux-ha.org/SplitBrain
Dual split brain, so to speak. Your drbd replication is also done over this link. So not only does heartbeat lose its connection, but drbd does too. In a standard setup, a disconnected secondary drbd device can be promoted regardless of the peer's drbd state.
You might want to read about dopd, too: http://www.drbd.org/users-guide/s-heartbeat-dopd.html It can prevent drbd split brain, but you need more than one network connection anyway.
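Following the linked DRBD guide, enabling dopd on the heartbeat side would presumably come down to two more ha.cf lines on both nodes. This is only a sketch. It assumes dopd is installed under the same /usr/lib/heartbeat directory as the ipfail used in this config; the path may differ on other distributions.

```
# start the DRBD outdate-peer daemon together with heartbeat
respawn hacluster /usr/lib/heartbeat/dopd
# allow dopd to talk to heartbeat's API
apiauth dopd gid=haclient uid=hacluster
```

DRBD itself also has to be told to call the peer outdater; the guide linked above describes that part of drbd.conf.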
#baud 115200
#serial /dev/ttyS0              # Which interface to use for HB packets.
coredumps true
auto_failback on              # Auto promotion of primary node upon return to cluster.
Your comment answers your later question on what will happen when a rebooted (stonith'd) node rejoins the cluster.
Regards
Dominik
node    joe      # Node name must be same as uname -n.
node    stewie      # Node name must be same as uname -n.
###
###
respawn hacluster /usr/lib/heartbeat/ipfail
# Specifies which programs to run at startup
# Do not use the below unless you use the /var/lib/heartbeat/crm/cib/xml config file instead
#crm on
use_logd yes                  # Use system logging.
logfile /var/log/hb.log       # Heartbeat logfile.
debugfile /var/log/heartbeat-debug.log # Debugging logfile.


Primary
--------

May  6 23:04:44 joe heartbeat: [4342]: WARN: node stewie: is dead
May  6 23:04:44 joe heartbeat: [4342]: WARN: No STONITH device configured.
May  6 23:04:44 joe heartbeat: [4342]: WARN: Shared disks are not protected.
May  6 23:04:44 joe heartbeat: [4342]: info: Resources being acquired from stewie.
May  6 23:04:44 joe heartbeat: [4342]: info: Link stewie:eth0 dead.
May  6 23:04:44 joe heartbeat: [4249]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
May  6 23:04:44 joe mach_down[4283]: [4328]: info: /usr/lib/heartbeat/mach_down: nice_failback: foreign resources acquired
May  6 23:04:44 joe heartbeat: [4342]: info: mach_down takeover complete.
May  6 23:04:44 joe heartbeat: [4342]: debug: StartNextRemoteRscReq(): child count 1
May  6 23:04:44 joe heartbeat: [4250]: info: Local Resource acquisition completed.

Secondary
-----------
May  6 23:04:46 stewie heartbeat: [21820]: info: Resources being acquired from joe.
May  6 23:04:46 stewie heartbeat: [21820]: info: Link joe:eth0 dead.
May  6 23:04:46 stewie heartbeat: [4946]: info: No local resources [/usr/lib/heartbeat/ResourceManager listkeys stewie] to acquire.
May  6 23:04:46 stewie heartbeat: [21825]: ERROR: MSG[4] : [info=req_our_resources()]
May  6 23:05:10 stewie mach_down[4953]: [6063]: info: /usr/lib/heartbeat/mach_down: nice_failback: foreign resources acquired
May  6 23:05:10 stewie heartbeat: [21820]: info: mach_down takeover complete.
May  6 23:05:10 stewie heartbeat: [21825]: ERROR: MSG[2] : [info=mach_down]
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems