Hello everyone.
I have been reading the docs and testing heartbeat/drbd for a few weeks now and
I must admit I don't understand how it works. In particular, I don't understand
how it is decided which node should be active, and how drbd decides which node
to resync from after a split brain. I'm sorry if this is a newbie question, but
after spending so much time, I would appreciate a little help.
First, here is my setup. I have two RHEL5 nodes, each with two network cards,
running heartbeat-2.99.2-8.1.x86_64.rpm and drbd-8.3.1. Both eth1 NICs are
connected through a private vlan which is used for drbd communications and
heartbeat. Both eth0 NICs are connected to the network which is used by
heartbeat and client access. I run ipfail with a ping group of two addresses:
the router and a client workstation on the eth0 network. I started with a very
basic config for each piece of software:
drbd.conf:
global {
    usage-count yes;
}
common {
    protocol C;
    syncer {
        rate 6M;
        al-extents 257;
        verify-alg crc32c;
        csums-alg crc32c;
    }
}
resource r0 {
    device /dev/drbd_r0 minor 0;
    disk /dev/sdb1;
    meta-disk internal;
    on rhel5-hb1.localdomain {
        address ipv4 192.168.79.193:7790;
    }
    on rhel5-hb2.localdomain {
        address ipv4 192.168.79.192:7790;
    }
}
ha.cf (R1 config):
debugfile /var/log/ha-debug
logfile /var/log/ha-log
keepalive 2
deadtime 60
warntime 10
initdead 60
udpport 694
ucast eth0 192.168.2.193 # rhel5-hb1
ucast eth0 192.168.2.192 # rhel5-hb2
ucast eth1 192.168.79.193 # rhel5-hb1
ucast eth1 192.168.79.192 # rhel5-hb2
auto_failback off
watchdog /dev/watchdog
node rhel5-hb1.localdomain
node rhel5-hb2.localdomain
ping_group group-r0 192.168.2.1 192.168.2.7
respawn hacluster /usr/lib64/heartbeat/ipfail
apiauth ipfail gid=haclient uid=hacluster
deadping 30
debug 1
haresources:
rhel5-hb1.localdomain drbddisk::r0 Mount::/mnt/r0 192.168.2.15 Test-r0
If I disconnect both NICs on the active node, I can see the standby node taking
the resources and starting the service, which is what I expect. I would also
expect the active node to stop the service and relinquish the resources, since
none of the addresses in the ping group are accessible, but that is not what
happens. Instead, both nodes hold the resources in standalone.
I believe that in the absence of communications, in a two-node setup, the
partition that can ping the ping group should become active and the partition
that can't ping the ping group should go on standby (I am assuming that if
communications are interrupted, the two nodes can't both ping the ping group).
I would expect this to be the default behavior, but maybe I'm too naive. I
tried to configure the watchdog device but I can't seem to make it work.
Is both nodes becoming active the result of a misconfiguration on my part, or
is it normal and I just don't understand how HA works?
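To make the behavior I expected concrete, here is a rough Python sketch of the
decision rule I had in mind. This is only an illustration of my expectation,
not how heartbeat or ipfail is actually implemented; the function name
expected_role and its parameters are made up for this example.

```python
def expected_role(peer_reachable: bool,
                  ping_group_reachable: bool,
                  currently_active: bool) -> str:
    """Hypothetical rule for the role a node should take in a two-node cluster.

    This models the expectation described in this post only; it is not taken
    from heartbeat, ipfail, or drbd.
    """
    if peer_reachable:
        # The nodes can still talk to each other, so nothing changes:
        # with auto_failback off, the current role is kept.
        return "active" if currently_active else "standby"
    # No communication with the peer: the ping group is the only tie-breaker.
    # A node that cannot reach any ping target should give up the resources.
    return "active" if ping_group_reachable else "standby"


# The test described above: both NICs unplugged on the active node.
print(expected_role(peer_reachable=False, ping_group_reachable=False,
                    currently_active=True))   # isolated node
print(expected_role(peer_reachable=False, ping_group_reachable=True,
                    currently_active=False))  # surviving node
```

With this rule the isolated node would go to standby and the surviving node
would become active, so only one node would hold the resources at a time.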
Thanks.
Patrick.