We have a mixed virtual chassis of two EX4500s and two EX4200s. They are connected to two NetApp filers. Each filer has an LACP aggregate to the VC consisting of two 10-Gig links to each of the 4500s (so four xe interfaces in each bundle). Once things are up and running, it works fine, but things do not always come up cleanly after one of the filers does a "hand back" or reboots.
The problem happens most times, but not every time. It happens with both controllers. It does not happen to the same physical link in a bundle each time, and it does not happen only with links associated with one of the 4500 chassis. That seems to imply a software problem, not a physical one.

The trouble is that one of the links in a bundle ends up stuck in the "Defaulted" state, as seen in "show lacp interfaces" output. The symptom seen by network users is that connectivity to specific machines on a network is lost: for example, the host with 192.168.2.100 is reachable, but 192.168.2.99 is not. I think this has to do with the hashing used to choose a link in the LACP bundle. The address combinations that hash to the "Defaulted" link are lost, while others work.

From the Juniper EX side, it looks like the system is not receiving any LACPDUs on the affected link. The "show lacp statistics interfaces" counters are not incrementing for "Rx" PDUs. However, we have not been able to determine whether the NetApp is not sending PDUs or the Juniper is not processing them.

Recovery from the condition is easy. On the switch side, the interface in the Defaulted state is manually downed and upped,

    # ifconfig xe-0/0/6 down
    # ifconfig xe-0/0/6 up

and LACP happily completes proper negotiations.

We have been trying to work with JTAC and NetApp support. The problem has been finding downtime to reboot the filers. Both Juniper and NetApp have said they have seen issues like this, but that they were resolved by specifying the following settings for the aggregate interface on the switch side,

    aggregated-ether-options {
        lacp {
            active;
            periodic slow;
        }
    }

to make the EX switch match the NetApp's defaults (defaults that cannot be changed on their side). But this did not solve the problem for us.

Has anyone here seen LACP problems with NetApp or other vendors?
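Since the stuck member shows up in "show lacp interfaces" output, spotting it can probably be scripted. Below is a minimal sketch of the parsing step only, assuming the output is laid out like the sample text in the code. The sample itself, the ae0 bundle name, the xe-1/0/6 member, and the find_defaulted function name are all illustrative assumptions, not something captured from our switches.

```python
import re

# Hypothetical sample, laid out the way "show lacp interfaces" typically
# prints. Illustrative only; not captured from the switches in this post.
SAMPLE_OUTPUT = """\
Aggregated interface: ae0
    LACP state:       Role   Exp   Def  Dist  Col  Syn  Aggr  Timeout  Activity
      xe-0/0/6       Actor    No   Yes    No   No   No   Yes     Slow    Active
      xe-0/0/6     Partner    No   Yes    No   No   No   Yes     Fast   Passive
      xe-1/0/6       Actor    No    No   Yes  Yes  Yes   Yes     Slow    Active
      xe-1/0/6     Partner    No    No   Yes  Yes  Yes   Yes     Slow    Active
    LACP protocol:        Receive State  Transmit State          Mux State
      xe-0/0/6              Defaulted   Slow periodic           Detached
      xe-1/0/6                Current   Slow periodic Collecting distributing
"""

# A member line starts with an interface name such as xe-0/0/6.
MEMBER_LINE = re.compile(r"^\s*([a-z]+-\d+/\d+/\d+)\s+(.*)$")


def find_defaulted(output):
    """Return (bundle, member) pairs whose LACP receive state is Defaulted."""
    stuck = []
    bundle = None
    in_protocol = False  # True only while inside an "LACP protocol:" table
    for line in output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Aggregated interface:"):
            bundle = stripped.split(":", 1)[1].strip()
            in_protocol = False
        elif stripped.startswith("LACP state:"):
            in_protocol = False
        elif stripped.startswith("LACP protocol:"):
            in_protocol = True
        elif in_protocol:
            match = MEMBER_LINE.match(line)
            # The receive state is the first column after the interface name.
            if match and match.group(2).startswith("Defaulted"):
                stuck.append((bundle, match.group(1)))
    return stuck


if __name__ == "__main__":
    for bundle, member in find_defaulted(SAMPLE_OUTPUT):
        print(f"{bundle}: {member} is stuck in Defaulted")
```

Fetching the output from the switch and bouncing the port are deliberately left out, since how best to do those reliably is still an open question for us. A real version would probably also want to see the Defaulted state persist across a few polls before acting, so that a link still negotiating is not reset by mistake.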
The plan, if we ever get the chance to do some troubleshooting, is to do analyzer captures to see what is happening with the LACPDUs. In the meantime, we have also been trying to think of a reliable way to automate the reset of interfaces in the bundles if they fall into the "Defaulted" state.
_______________________________________________
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp