Re: [Linux-HA] Primary keeps IP but secondary gets it too...

Rob Morin Thu, 08 May 2008 08:05:35 -0700

OK, forget that: these 2 databases were using InnoDB, which does not use .MYD and .MYI files. Sorry for the mix-up...

Please accept my apologies...


Rob Morin
Dido Internet Inc.
Montreal,Canada
http://www.dido.ca
514-990-4444



Rob Morin wrote:

Hehehe... so I changed the config to bcast eth0 eth1

and did a /etc/init.d/heartbeat reload,
not realizing this initiated a restart of heartbeat. I thought it would just reload the configs, and think, well hey, I have to go to another eth now... But it brought down the primary, and the secondary took over flawlessly without my knowing. Then an hour later I simply brought heartbeat down and then up on the primary, and everything came back... wow... I was impressed that I got no phone calls at all from clients! :)

However, I noticed one thing that can be dangerous: MySQL seemed to have lost a few files in its /var/www/mysql?? When the secondary took over, the freshly mounted filesystem was missing some .MYD files for some tables. Only 2 sites were affected, in a minor way, but what happened to these files? When I brought back the primary, these files were still not there (this is normal, I guess). The errors for the sites disappeared, but the files were still not there...???

Any ideas or suggestions?

Thanks again for all your help... it was exciting to see it work unexpectedly.... :)

P.S. My haresources file, in case you were wondering. I thought it might be due to startup, where the database files are not there (not mounted) but mysql starts...
joe IPaddr::xx.xx.xx.150 drbddisk::mail drbddisk::web \
Filesystem::/dev/drbd0::/var/mail/virtual::ext3::defaults \
Filesystem::/dev/drbd1::/var/www::ext3::defaults \
postfix courier-authdaemon courier-pop courier-imap mysql apache2 proftpd

Rob Morin



Madd Sauer wrote:
Hello,
On Thu, May 08, 2008 at 08:53:39AM -0400, Rob Morin wrote:
Actually another question....
I would simply add eth1 to the heartbeat ha.cf then? And what's the diff between using mcast vs bcast? I am not sure I understand this?
mcast = multicast
If your router supports multicast routing, these packets are routed.

bcast = broadcast
Broadcasts will NEVER be routed.

ucast = unicast
That's my choice; broadcasts are mostly trash on the net (imho), and I always use
non-bcast if I can. Unicast is also routed traffic.

Madd
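
For reference, the three transports Madd describes map onto ha.cf directives roughly as shown here. This is only a sketch and is not taken from anyone's config in this thread: the multicast group 225.0.0.1 and the peer address 192.168.5.2 are placeholder assumptions, and normally only the lines that fit your network would be enabled.

```
# ha.cf fragment (illustrative values only)

# bcast: broadcast on the local segment of each listed interface; never routed
bcast eth0 eth1

# mcast: <dev> <multicast group> <udp port> <ttl> <loop>
# 225.0.0.1 is a placeholder group; ttl 1 keeps packets on the local segment
#mcast eth0 225.0.0.1 694 1 0

# ucast: <dev> <peer IP>; one line per peer, 192.168.5.2 is a placeholder
#ucast eth0 192.168.5.2
```

Whichever transport is chosen, it is usually configured identically on both nodes so that each side listens where the other one sends.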
Thanks a bunch
:)


Rob Morin



Dominik Klein wrote:
Rob Morin wrote:
I have not seen my original email get to the list yet... but after looking through the logs I see this on each node...
See below for log excerpts...
My test involved bringing down eth0 only (heartbeat & replication). Should I have also brought down eth1, the public side of Joe (primary)?
My conf file is...

logfacility     daemon        # This is deprecated
keepalive 2                   # Interval between heartbeat (HB) packets.
deadtime 60                   # How quickly HB determines a dead node.
warntime 5                    # Time HB will issue a late HB.
initdead 120                  # Time delay needed by HB to report a dead node.
udpport 694                   # UDP port HB uses to communicate between nodes.
#ping 192.168.5.1             # Ping VMware Server host to simulate network resource.
bcast eth0
You only use one connection for heartbeat communication. That is a configuration error.
As you unplugged that interface for testing, you forced a split-brain situation. Read http://www.linux-ha.org/SplitBrain
Dual split brain, so to speak. Your drbd replication is also done over this link. So not only does heartbeat lose its connection, but drbd does too. In a standard setup, a disconnected secondary drbd device can be promoted regardless of the peer's drbd state.
You might want to read about dopd, too: http://www.drbd.org/users-guide/s-heartbeat-dopd.html
It can prevent drbd split brain, but you need to have more than one network connection anyway.
#baud 115200
#serial /dev/ttyS0            # Which interface to use for HB packets.
coredumps true
auto_failback on              # Auto promotion of primary node upon return to cluster.
Your comment answers your later question on what will happen when a rebooted (stonith'd) node rejoins the cluster.
Regards
Dominik
node    joe      # Node name must be same as uname -n.
node    stewie      # Node name must be same as uname -n.
###
###
respawn hacluster /usr/lib/heartbeat/ipfail
# Specifies which programs to run at startup
# Do not use the below unless you use the /var/lib/heartbeat/crm/cib/xml config file instead
#crm on
use_logd yes                  # Use system logging.
logfile /var/log/hb.log       # Heartbeat logfile.
debugfile /var/log/heartbeat-debug.log # Debugging logfile.
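
Dominik's two suggestions above (more than one heartbeat path, plus dopd) could look something like the following in ha.cf. Treat it as a sketch under assumptions rather than a tested config: eth1 as the second path comes from Rob's follow-up earlier on this page, the serial lines are his own commented-out entries (they would need a null-modem cable between the nodes before being enabled), and the dopd lines follow the DRBD user's guide page linked above. The /usr/lib/heartbeat path may differ by distribution.

```
# Two independent heartbeat paths, so unplugging eth0 alone
# no longer cuts the nodes off from each other
bcast eth0 eth1

# Optional third path over a null-modem cable
#baud 115200
#serial /dev/ttyS0

# DRBD outdate-peer daemon (dopd), run under heartbeat
respawn hacluster /usr/lib/heartbeat/dopd
apiauth dopd gid=haclient uid=hacluster
```

dopd also needs a matching handler on the DRBD side, described on the same guide page, and it can only help while at least one heartbeat path between the nodes is still up.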


Primary
--------

May  6 23:04:44 joe heartbeat: [4342]: WARN: node stewie: is dead
May  6 23:04:44 joe heartbeat: [4342]: WARN: No STONITH device configured.
May  6 23:04:44 joe heartbeat: [4342]: WARN: Shared disks are not protected.
May  6 23:04:44 joe heartbeat: [4342]: info: Resources being acquired from stewie.
May  6 23:04:44 joe heartbeat: [4342]: info: Link stewie:eth0 dead.
May  6 23:04:44 joe heartbeat: [4249]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
May  6 23:04:44 joe mach_down[4283]: [4328]: info: /usr/lib/heartbeat/mach_down: nice_failback: foreign resources acquired
May  6 23:04:44 joe heartbeat: [4342]: info: mach_down takeover complete.
May  6 23:04:44 joe heartbeat: [4342]: debug: StartNextRemoteRscReq(): child count 1
May  6 23:04:44 joe heartbeat: [4250]: info: Local Resource acquisition completed.
Secondary
-----------
May  6 23:04:46 stewie heartbeat: [21820]: info: Resources being acquired from joe.
May  6 23:04:46 stewie heartbeat: [21820]: info: Link joe:eth0 dead.
May  6 23:04:46 stewie heartbeat: [4946]: info: No local resources [/usr/lib/heartbeat/ResourceManager listkeys stewie] to acquire.
May  6 23:04:46 stewie heartbeat: [21825]: ERROR: MSG[4] : [info=req_our_resources()]
May  6 23:05:10 stewie mach_down[4953]: [6063]: info: /usr/lib/heartbeat/mach_down: nice_failback: foreign resources acquired
May  6 23:05:10 stewie heartbeat: [21820]: info: mach_down takeover complete.
May  6 23:05:10 stewie heartbeat: [21825]: ERROR: MSG[2] : [info=mach_down]
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems