Hehehe... so I changed the config to "bcast eth0 eth1"
and did a /etc/init.d/heartbeat reload.
Not realizing this initiated a restart of heartbeat (I thought it
would just reload the configs and start using the extra interface),
it brought down the primary, and the secondary took over flawlessly
without my knowing. Then an hour later I simply brought heartbeat
down and then up on the primary and everything came back. Wow... I
was impressed that I got no phone calls at all from clients! :)
However, I noticed one thing that could be dangerous: MySQL seemed
to have lost a few files in its /var/www/mysql?? When the secondary
took over, the freshly mounted filesystem was missing some .MYD files
for some tables. Only 2 sites were affected, and only in a minor way,
but what happened to those files? When I brought the primary back the
files were still not there (this is normal, I guess), yet the errors
for the sites disappeared even though the files were still missing...???
Any ideas or suggestions?
Thanks again for all your help... it was exciting to see it work
unexpectedly.... :)
P.S. Here is my haresources file, in case you were wondering. I
thought the missing files might be due to startup order, where the
database files are not there yet (not mounted) but mysql starts...
joe IPaddr::xx.xx.xx.150 drbddisk::mail drbddisk::web \
Filesystem::/dev/drbd0::/var/mail/virtual::ext3::defaults \
Filesystem::/dev/drbd1::/var/www::ext3::defaults \
postfix courier-authdaemon courier-pop courier-imap mysql apache2 proftpd
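For reference, heartbeat processes a haresources line left to right on startup and right to left on shutdown, so (assuming stock haresources semantics) the ordering works out roughly like this:

```
# Startup (left -> right): the DRBD devices are promoted and the
# filesystems mounted before mysql's init script is run.
IPaddr -> drbddisk::mail -> drbddisk::web
       -> Filesystem(/var/mail/virtual) -> Filesystem(/var/www)
       -> postfix -> ... -> mysql -> apache2 -> proftpd

# Shutdown (right -> left): mysql is stopped before /var/www
# is unmounted.
proftpd -> apache2 -> mysql -> ... -> Filesystem(/var/www) -> ...
```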
Rob Morin
Dido Internet Inc.
Montreal,Canada
http://www.dido.ca
514-990-4444
Madd Sauer wrote:
Hello,
On Thu, May 08, 2008 at 08:53:39AM -0400, Rob Morin wrote:
Actually another question....
I would simply add eth1 to the heartbeat ha.cf then? And what's the
difference between using mcast vs. bcast? I am not sure I understand this.
mcast = multicast
If your router supports multicast routing, these packets will be routed.
bcast = broadcast
Broadcasts will NEVER be routed.
ucast = unicast
That's my choice: broadcasts are mostly trash on the net (IMHO), and I
always use non-bcast if I can. Unicast is also routed traffic.
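For concreteness, the three communication types are configured in ha.cf roughly like this (a sketch; the interface names, multicast group, and peer IP are examples, not taken from this setup):

```
# Broadcast heartbeats on two interfaces (never routed):
bcast eth0 eth1

# Multicast: device, multicast group, port, ttl, loop
# (routed only if the network supports multicast routing):
mcast eth0 225.0.0.1 694 1 0

# Unicast heartbeats to the peer's address via eth1
# (routed like any other unicast traffic):
ucast eth1 192.168.5.2
```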
Madd
Thanks a bunch
:)
Dominik Klein wrote:
Rob Morin wrote:
I have not seen my original email get to the list yet... but after
looking through the logs I see this on each node. See below for log
excerpts.
My test involved bringing down eth0 only (heartbeat & replication).
Should I have also brought down eth1, the public side of joe (primary)?
my conf file is...
logfacility daemon # This is deprecated
keepalive 2 # Interval between heartbeat (HB) packets.
deadtime 60 # How quickly HB determines a dead node.
warntime 5 # Time after which HB issues a late-heartbeat warning.
initdead 120 # Time delay needed by HB to report a dead node at startup.
udpport 694 # UDP port HB uses to communicate between nodes.
#ping 192.168.5.1 # Ping VMware Server host to simulate network resource.
bcast eth0
You only use one connection for heartbeat communication. That is a
configuration error. By unplugging that interface for testing, you
forced a split-brain situation. Read http://www.linux-ha.org/SplitBrain
A dual split brain, so to speak: your drbd replication is also done
over this link, so not only does heartbeat lose its connection, drbd
does too. In a standard setup, a disconnected secondary drbd device
can be promoted regardless of the peer's drbd state.
You might want to read about dopd, too:
http://www.drbd.org/users-guide/s-heartbeat-dopd.html
It can prevent drbd split brain, but you need more than one network
connection anyway.
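For reference, a dopd setup is sketched below, based on the DRBD users' guide linked above (the resource name and timeout value are illustrative, not from this cluster):

```
# ha.cf: run the DRBD outdate-peer daemon under heartbeat
respawn hacluster /usr/lib/heartbeat/dopd
apiauth dopd gid=haclient uid=hacluster

# drbd.conf: have DRBD call the outdater when the peer is unreachable
resource web {
  handlers {
    outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
  }
  disk {
    fencing resource-only;
  }
  # ... rest of the resource definition ...
}
```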
#baud 115200
#serial /dev/ttyS0 # Which interface to use for HB packets.
coredumps true
auto_failback on # Auto promotion of the primary node upon return to cluster.
Your comment answers your later question about what happens when a
rebooted (STONITH'd) node rejoins the cluster.
Regards
Dominik
node joe # Node name must be same as uname -n.
node stewie # Node name must be same as uname -n.
###
###
respawn hacluster /usr/lib/heartbeat/ipfail
# Specifies which programs to run at startup.
# Do not use the below unless you use the
# /var/lib/heartbeat/crm/cib.xml config file instead.
#crm on
use_logd yes # Use system logging.
logfile /var/log/hb.log # Heartbeat logfile.
debugfile /var/log/heartbeat-debug.log # Debugging logfile.
Primary
--------
May 6 23:04:44 joe heartbeat: [4342]: WARN: node stewie: is dead
May 6 23:04:44 joe heartbeat: [4342]: WARN: No STONITH device configured.
May 6 23:04:44 joe heartbeat: [4342]: WARN: Shared disks are not protected.
May 6 23:04:44 joe heartbeat: [4342]: info: Resources being acquired from stewie.
May 6 23:04:44 joe heartbeat: [4342]: info: Link stewie:eth0 dead.
May 6 23:04:44 joe heartbeat: [4249]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
May 6 23:04:44 joe mach_down[4283]: [4328]: info: /usr/lib/heartbeat/mach_down: nice_failback: foreign resources acquired
May 6 23:04:44 joe heartbeat: [4342]: info: mach_down takeover complete.
May 6 23:04:44 joe heartbeat: [4342]: debug: StartNextRemoteRscReq(): child count 1
May 6 23:04:44 joe heartbeat: [4250]: info: Local Resource acquisition completed.
Secondary
-----------
May 6 23:04:46 stewie heartbeat: [21820]: info: Resources being acquired from joe.
May 6 23:04:46 stewie heartbeat: [21820]: info: Link joe:eth0 dead.
May 6 23:04:46 stewie heartbeat: [4946]: info: No local resources [/usr/lib/heartbeat/ResourceManager listkeys stewie] to acquire.
May 6 23:04:46 stewie heartbeat: [21825]: ERROR: MSG[4] : [info=req_our_resources()]
May 6 23:05:10 stewie mach_down[4953]: [6063]: info: /usr/lib/heartbeat/mach_down: nice_failback: foreign resources acquired
May 6 23:05:10 stewie heartbeat: [21820]: info: mach_down takeover complete.
May 6 23:05:10 stewie heartbeat: [21825]: ERROR: MSG[2] : [info=mach_down]
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems