gratuitous arps lost during IB switch failure

Sumeet Lahorani Tue, 21 Sep 2010 16:43:37 -0700


Hi All,

We are using dual-ported HCAs, with each port connected to a different IB switch, so that we can tolerate the failure of either switch. We are trying to cut down the time it takes for traffic (TCP & RDS) to resume when an IB switch fails and the hosts fail over from one port to the other.

We have the bonding driver configured in active-backup mode and set up to send out 100 gratuitous ARPs at intervals of 100 ms whenever there is a failover. In most cases, traffic resumes within a few seconds after a failover because these gratuitous ARPs update all the nodes with the new IP:GID mapping.
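For anyone trying to reproduce this, a minimal sketch for dumping the relevant bonding settings on a host and working out how long the ARP burst lasts. It assumes the bond is named bond0 (a placeholder, not our actual interface name) and that the bonding driver exposes these attributes under /sys/class/net/<bond>/bonding/, which depends on the kernel version. The function names are made up for this illustration.

```python
from pathlib import Path

# Attributes exposed by the Linux bonding driver under
# /sys/class/net/<bond>/bonding/. Availability varies by kernel version.
BONDING_ATTRS = ("mode", "active_slave", "miimon", "num_grat_arp")


def read_bonding_settings(bond="bond0", sysfs_root="/sys/class/net"):
    """Return {attribute: value} for the attributes that exist and are readable.

    "bond0" is an assumed interface name; pass the real one.
    """
    base = Path(sysfs_root) / bond / "bonding"
    settings = {}
    for attr in BONDING_ATTRS:
        try:
            settings[attr] = (base / attr).read_text().strip()
        except OSError:
            continue  # attribute missing on this kernel, or not readable
    return settings


def burst_duration_ms(count, interval_ms):
    """Time from the first to the last ARP in a burst of `count` packets."""
    return max(count - 1, 0) * interval_ms


if __name__ == "__main__":
    print(read_bonding_settings())
    # 100 ARPs spaced 100 ms apart, as described above
    print(burst_duration_ms(100, 100), "ms")
```

With 100 ARPs at 100 ms spacing, the burst spans 9900 ms from first to last packet, so a node that receives none of them has missed every ARP over a window of roughly 10 seconds.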

The problem we are seeing is that sometimes one or more nodes on the fabric do not receive even one of these gratuitous ARPs, and re-establishing communication with those nodes takes much longer (over 40 seconds) because it depends on various ARP cache timeouts. Does anyone know why all of these gratuitous ARPs might be lost?

Besides the gratuitous ARP settings, are there any other tunables to look at to minimize the time it takes for IPoIB traffic to resume?


- Sumeet

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
