Hi folks, this is my first post here, so don't be cruel ;)
I have a simple 2-node heartbeat (version 1.2.3) setup that has been running without failures for more than 2 years, so let me first congratulate everyone who contributed to this nice piece of software. It's a simple active-passive topology that takes over one IP and just one service.

Yesterday we hit a problem: the passive node (node2) wrongly detected that node1 had failed and started acquiring the resources (public IP and service). I'm about 80% sure the cause was CPU load on node1, but I'm still checking connectivity. Node2 therefore held the public IP (it sent a gratuitous ARP for it) and the service.

After the load on node1 decreased, the two nodes could "speak" again and node2 realized that node1 was still alive. Heartbeat then moved both the IP and the service back to node1, but since node1 already had both of them up, it did not send a gratuitous ARP.

The problem was that, since node2 had sent the gratuitous ARP, the router between the two nodes and the rest of the network kept the public IP bound to node2's MAC. The service was unreachable until the router's ARP cache entry expired (an hour and a half) and the router refreshed the binding.

Our first action was to increase deadtime in heartbeat so that a temporary load spike does not trigger the same issue again, but I'm afraid that's not enough. I would like to ask whether it would be safe to change the ip_start function of the IPaddr script so that every time it is called, whether or not the node already holds the IP, it sends the gratuitous ARP. That way the router's ARP table is refreshed automatically whenever ip_start is called, and this problem should not happen again.

I know I'm running an older version of heartbeat, but it has worked pretty well so far and I would prefer to stay with it, because our topology is simple enough (2 nodes, active-passive) to be properly managed by this version.

Thanks in advance,
Samuel Osorio.
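
P.S. To make the proposal concrete, here is a rough sketch of the logic I have in mind. It is not the real IPaddr code: ip_start_sketch, ip_already_up, garp_command and the ALREADY_UP / DRY_RUN variables are names I made up for illustration, and I'm using the iputils arping tool (-U for unsolicited ARP) as a stand-in announcer. Heartbeat ships its own send_arp helper, but I'd rather not quote its arguments from memory. By default the sketch only prints what it would do.

```shell
#!/bin/sh
# Illustrative sketch only: these function names are hypothetical and are
# NOT taken from the real heartbeat 1.2.3 IPaddr script.

# Print the command that would announce IP on IFACE via gratuitous ARP.
# Assumes iputils arping: -U unsolicited ARP, -I interface, -c packet count.
garp_command() {
    ip="$1"; iface="$2"; count="${3:-3}"
    echo "arping -U -I $iface -c $count $ip"
}

# Hypothetical stand-in for the "do we already hold this IP?" check.
# ALREADY_UP is just a switch for this sketch, not a heartbeat variable.
ip_already_up() {
    [ "${ALREADY_UP:-no}" = "yes" ]
}

ip_start_sketch() {
    ip="$1"; iface="$2"
    if ip_already_up "$ip"; then
        echo "already holding $ip, skipping interface setup"
    else
        echo "would configure $ip on $iface"
    fi
    # The proposed change: announce the address in BOTH cases, so the
    # router's ARP entry is refreshed even when the IP was already up.
    cmd=$(garp_command "$ip" "$iface")
    if [ "${DRY_RUN:-yes}" = "yes" ]; then
        echo "would run: $cmd"
    else
        $cmd
    fi
}
```

For example, (ALREADY_UP=yes; ip_start_sketch 192.0.2.10 eth0) covers the case we hit yesterday: the interface setup is skipped but the announcement is still issued. The address and interface are placeholders, of course.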
_______________________________________________ LinuxVirtualServer.org mailing list - [email protected] Send requests to [EMAIL PROTECTED] or go to http://lists.graemef.net/mailman/listinfo/lvs-users
