Hi!

The cluster consists of 2 nodes, db1 and db2, running Debian Squeeze with
heartbeat from backports (heartbeat 1:3.0.4-1~bpo60+1). Heartbeat is
configured to use all 3 available links:

/etc/heartbeat/ha.cf:
logfacility     local0
bcast eth0
bcast eth1
bcast eth2
auto_failback on
node db1
node db2
crm respawn


The network config is:
        db1             db2
eth0    10.10.101.7/24  10.10.101.8/24
eth1    192.168.0.1/24  192.168.0.2/24
eth2    192.168.1.1/24  192.168.1.2/24
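
To double-check the addressing, here is a small Python sketch (not part of
heartbeat, just an illustration) that derives the broadcast address of each
interface from the table above. The dictionary layout and the function name
broadcast_of are my own choices for this example.

```python
import ipaddress

# Addresses copied from the network config table above.
INTERFACES = {
    "db1": {"eth0": "10.10.101.7/24", "eth1": "192.168.0.1/24", "eth2": "192.168.1.1/24"},
    "db2": {"eth0": "10.10.101.8/24", "eth1": "192.168.0.2/24", "eth2": "192.168.1.2/24"},
}


def broadcast_of(cidr):
    """Return the broadcast address of the network that cidr belongs to."""
    return str(ipaddress.ip_interface(cidr).network.broadcast_address)


for node, ifaces in INTERFACES.items():
    for name, cidr in ifaces.items():
        print(node, name, cidr, "->", broadcast_of(cidr))
```

For eth2 this gives 192.168.1.255 on both nodes, so that address is the
subnet broadcast address and not a host address.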


Then, as a test, I shut down eth2 on db1: ifdown eth2

After some time both sides detected the link eth2 to be "dead".

Then I re-enabled the network: ifup eth2

After some time, the cluster on db1 reported all links OK but on db2 the
eth2 link was still "dead".

The problem is that the heartbeat from db1 to db2 was not sent, because db1
tried to ARP-resolve 192.168.1.255. When I sniffed on eth2 I saw ARP
requests: "Who has 192.168.1.255? Tell 192.168.1.1."

It is of course nonsense to send ARP requests for the broadcast IP
address. The network configuration is correct. I have no idea whether this
is a bug in heartbeat or in the Linux network stack. I also tried pinging
the broadcast IP address during ifdown/ifup and ping behaved correctly,
so maybe it is a bug in heartbeat.
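
To test UDP broadcast on eth2 independently of both heartbeat and ping, a
minimal Python probe along the following lines could be used. This is only a
sketch: I have not checked how heartbeat's bcast plugin opens its socket, so
the probe does not claim to reproduce it. PROBE_PORT is a hypothetical test
port picked so the datagram cannot be mistaken for real heartbeat traffic,
and the function names are made up for this example.

```python
import socket

# Hypothetical test port (an assumption for this sketch); any unused UDP
# port works.
PROBE_PORT = 50694


def make_broadcast_socket():
    """Return a UDP socket that is allowed to send to broadcast addresses."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Without SO_BROADCAST the kernel refuses to send to a broadcast address.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    return sock


def send_probe(broadcast_ip, payload=b"bcast-probe"):
    """Send one datagram to broadcast_ip and return the number of bytes sent."""
    sock = make_broadcast_socket()
    try:
        return sock.sendto(payload, (broadcast_ip, PROBE_PORT))
    finally:
        sock.close()
```

Calling send_probe("192.168.1.255") on db1 after the ifdown/ifup cycle, while
sniffing eth2, should show whether a plain UDP broadcast leaves the interface
or triggers the same ARP requests.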

Heartbeat also did not recover on its own (I waited several minutes). To
resolve the problem I had to stop heartbeat on db1, run ifdown eth2 and
ifup eth2, and then start heartbeat again.

Any ideas what is going wrong?

Thanks
Klaus
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems