Hi! The cluster consists of 2 nodes, db1 and db2, running Squeeze backports (heartbeat 1:3.0.4-1~bpo60+1). Heartbeat is configured to use all 3 available links:
/etc/heartbeat/ha.cf:

    logfacility local0
    bcast eth0
    bcast eth1
    bcast eth2
    auto_failback on
    node db1
    node db2
    crm respawn

The network config is:

          db1              db2
    eth0  10.10.101.7/24   10.10.101.8/24
    eth1  192.168.0.1/24   192.168.0.2/24
    eth2  192.168.1.1/24   192.168.1.2/24

Then, as a test, I shut down eth2 on db1:

    ifdown eth2

After some time both sides detected the link eth2 to be "dead". Then I re-enabled the network:

    ifup eth2

After some time, the cluster on db1 reported all links OK, but on db2 the eth2 link was still "dead".

The problem is that the heartbeat from db1 to db2 was not sent, because db1 tried to ARP-resolve 192.168.1.255. When I sniffed on eth2 I saw ARP requests:

    Who has 192.168.1.255? tell 192.168.1.1

It is of course total crap to send ARP requests for the broadcast IP address. The network configuration is correct.

I have no idea if this is a bug in heartbeat or in the Linux network stack. I also tried sending PINGs to the broadcast IP address during ifdown/ifup, and PING's behavior was correct, so maybe it is a bug in heartbeat.

Heartbeat also did not recover on its own (I waited several minutes). To resolve the problem I had to stop heartbeat on db1, run ifdown eth2 and ifup eth2, and then start heartbeat again.

Any ideas what is going wrong?

Thanks
Klaus

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
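
As a side note on the claim that 192.168.1.255 is the broadcast address and should never be ARP-resolved: the small Python sketch below derives the directed broadcast address for every interface in the table above. It is only an arithmetic check on the addresses given in the post; it does not touch heartbeat or the kernel and does not explain why the ARP requests appear. The names `INTERFACES`, `broadcast_of` and `broadcast_table` are made up for this illustration.

```python
import ipaddress

# Addresses copied from the interface table in the post.
INTERFACES = {
    "db1": {
        "eth0": "10.10.101.7/24",
        "eth1": "192.168.0.1/24",
        "eth2": "192.168.1.1/24",
    },
    "db2": {
        "eth0": "10.10.101.8/24",
        "eth1": "192.168.0.2/24",
        "eth2": "192.168.1.2/24",
    },
}


def broadcast_of(cidr):
    """Return the directed broadcast address for an 'address/prefix' string."""
    return str(ipaddress.ip_interface(cidr).network.broadcast_address)


def broadcast_table(interfaces):
    """Map node -> interface -> broadcast address (hypothetical helper)."""
    return {
        node: {ifname: broadcast_of(cidr) for ifname, cidr in ifs.items()}
        for node, ifs in interfaces.items()
    }


if __name__ == "__main__":
    for node, ifs in broadcast_table(INTERFACES).items():
        for ifname, bcast in ifs.items():
            print(node, ifname, bcast)
```

For eth2 on db1 this yields 192.168.1.255, the same address that shows up in the ARP requests, which supports the statement that db1 is trying to resolve its own subnet broadcast address as if it were a unicast host.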