[j-nsp] LACP to NetApp

2013-02-19 Thread Crist Clark
We have a mixed virtual chassis of two EX4500s and two EX4200s,
connected to two NetApp filers. Each filer has an LACP aggregate to the
VC consisting of two 10-Gig links to each of the 4500s (so four xe
interfaces in each bundle). Once things are up and running, it works
fine, but things do not always come up cleanly after one of the filers
does a hand back or reboots.

The problem happens most times, but not every time. It happens with
both controllers. It does not happen to the same physical link in a
bundle each time, and it is not limited to links on one of the 4500
chassis. That seems to point to a software problem rather than a
physical one.

The trouble is that one of the links in a bundle ends up stuck in the
Defaulted state, as seen in the output of show lacp interfaces. The
symptom network users see is that connectivity to specific machines on
a network is lost: for example, the host at 192.168.2.100 is reachable,
but 192.168.2.99 is not. I think this has to do with the hashing used
to choose a link in the LACP bundle. Traffic for the address
combinations that hash to the Defaulted link is lost, while the rest
gets through.
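A minimal sketch of why the loss shows up per host rather than for the whole bundle: each address pair is pinned to one member, so every pair that lands on the broken member fails together while the others keep working. The hash here (CRC32 of both addresses, modulo the member count), the filer address 192.168.2.10, and every member name except xe-0/0/6 are hypothetical; neither the EX nor the NetApp necessarily hashes this way.

```python
import ipaddress
import zlib


def pick_member(src: str, dst: str, members: list[str]) -> str:
    """Map an address pair to one LAG member.

    Hypothetical hash (CRC32 of both packed addresses, modulo the member
    count); real switches and filers use their own, configurable hashes.
    """
    key = ipaddress.ip_address(src).packed + ipaddress.ip_address(dst).packed
    return members[zlib.crc32(key) % len(members)]


# Hypothetical bundle: xe-0/0/6 comes from the post, the rest are made up.
MEMBERS = ["xe-0/0/6", "xe-0/0/7", "xe-1/0/6", "xe-1/0/7"]
DEFAULTED = "xe-0/0/6"   # member stuck in Defaulted; traffic hashed here is lost
FILER = "192.168.2.10"   # hypothetical filer address

if __name__ == "__main__":
    for last_octet in range(95, 105):
        host = f"192.168.2.{last_octet}"
        member = pick_member(FILER, host, MEMBERS)
        status = "LOST" if member == DEFAULTED else "ok"
        print(f"{FILER} -> {host}: {member} ({status})")
```

The exact split depends entirely on the real hash inputs (MACs, IPs, ports) configured on each side, so the printed pattern says nothing about which real hosts would fail.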

From the Juniper EX side, it looks as if the system is not receiving
any LACPDUs on the affected link: the Rx PDU counters in show lacp
statistics interfaces are not incrementing. However, we have not been
able to determine whether the NetApp is not sending PDUs or the Juniper
is not processing them.
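One way to make "not incrementing" precise is to compare two snapshots of the per-member Rx LACPDU counters taken some interval apart. This is a sketch that assumes the counters have already been scraped from show lacp statistics interfaces into dictionaries (parsing not shown); the function name and the counter values are hypothetical. With periodic slow the partner sends an LACPDU every 30 seconds (every second with fast), so the sampling interval should cover several periods. Here it uses three, the same multiple LACP uses for its own timeouts (3 x 30 s = 90 s long timeout, 3 x 1 s = 3 s short timeout).

```python
# IEEE 802.1AX periodic transmission intervals, in seconds.
SLOW_PERIODIC_S = 30
FAST_PERIODIC_S = 1
TIMEOUT_MULTIPLIER = 3  # LACP times out after three missed LACPDUs


def stalled_members(before: dict[str, int], after: dict[str, int],
                    interval_s: float, periodic_s: float) -> list[str]:
    """Return members whose Rx LACPDU counter did not increase.

    `before` and `after` map member name -> Rx LACPDU count, sampled
    `interval_s` seconds apart. Members missing from `after` are treated
    as stalled, and so is a counter that went backwards (for example,
    after the statistics were cleared).
    """
    if interval_s < TIMEOUT_MULTIPLIER * periodic_s:
        raise ValueError(
            f"sample at least {TIMEOUT_MULTIPLIER * periodic_s}s apart "
            f"for a {periodic_s}s periodic rate")
    return sorted(m for m, count in before.items()
                  if after.get(m, count) <= count)


if __name__ == "__main__":
    # Hypothetical counter snapshots taken 90 s apart with periodic slow.
    t0 = {"xe-0/0/6": 412, "xe-0/0/7": 415}
    t1 = {"xe-0/0/6": 412, "xe-0/0/7": 418}
    print(stalled_members(t0, t1, 90, SLOW_PERIODIC_S))  # ['xe-0/0/6']
```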

Recovery from the condition is easy. On the switch side, the interface
in the Defaulted state is manually downed and upped,

  # ifconfig xe-0/0/6 down
  # ifconfig xe-0/0/6 up

and LACP happily completes proper negotiation.

We have been trying to work with JTAC and NetApp support. The
difficulty has been finding downtime to reboot the filers.

Both Juniper and NetApp have said they have seen issues like this, and
that they were resolved by specifying the following settings for the
aggregate interface on the switch side,

aggregated-ether-options {
    lacp {
        active;
        periodic slow;
    }
}

This makes the EX switch match the NetApp's defaults (which cannot be
changed on the NetApp side). But it did not solve the problem for us.

Has anyone here seen LACP problems with NetApp or other vendors? The
plan, if we ever get the chance to do some troubleshooting, is to take
analyzer captures to see what is happening with the LACPDUs. In the
meantime, we are also trying to think of a reliable way to automate
resetting interfaces in the bundles if they fall into the Defaulted
state.
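For the automation, here is a sketch of the detection half: scan show lacp interfaces text for members whose receive state is Defaulted. The sample output is an approximation of the Junos layout, not captured from this VC, and the parser assumes the receive state is the second column of the "LACP protocol" rows. Check it against real output before relying on it. How the text is collected and how a flagged member is bounced (for example, with the ifconfig sequence above) are left open, and the function name is hypothetical.

```python
import re

# Member rows start with a physical interface name such as xe-0/0/6.
IFACE = re.compile(r"^(?:ge|xe|et)-\d+/\d+/\d+$")

# Approximate `show lacp interfaces` output (assumed layout, trimmed).
SAMPLE = """\
Aggregated interface: ae0
    LACP state:       Role   Exp   Def  Dist  Col  Syn  Aggr  Timeout  Activity
      xe-0/0/6       Actor    No   Yes    No   No   No   Yes     Slow    Active
      xe-0/0/6     Partner    No   Yes    No   No   No   Yes     Slow   Passive
      xe-0/0/7       Actor    No    No   Yes  Yes  Yes   Yes     Slow    Active
      xe-0/0/7     Partner    No    No   Yes  Yes  Yes   Yes     Slow    Active
    LACP protocol:        Receive State  Transmit State          Mux State
      xe-0/0/6                Defaulted  Slow periodic           Detached
      xe-0/0/7                  Current  Slow periodic Collecting distributing
"""


def defaulted_members(output: str) -> list[tuple[str, str]]:
    """Return (bundle, member) pairs whose LACP receive state is Defaulted."""
    bundle = ""
    found = []
    for line in output.splitlines():
        fields = line.split()
        if line.lstrip().startswith("Aggregated interface:"):
            bundle = fields[-1]
        elif (len(fields) >= 2 and IFACE.match(fields[0])
              and fields[1] == "Defaulted"):
            # In the LACP state rows the second column is Actor/Partner,
            # so only LACP protocol rows can match here.
            found.append((bundle, fields[0]))
    return found


if __name__ == "__main__":
    for bundle, member in defaulted_members(SAMPLE):
        print(f"{bundle}: {member} is Defaulted, candidate for a bounce")
```

A real script would probably also want to rate-limit bounces, so that a link that is genuinely broken is not flapped indefinitely.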
___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp


Re: [j-nsp] LACP to NetApp

2013-02-19 Thread JP Velders

 Date: Tue, 19 Feb 2013 10:43:07 -0800
 From: Crist Clark cjc+j-...@pumpky.net
 Subject: [j-nsp] LACP to NetApp

 aggregated-ether-options {
     lacp {
         active;
         periodic slow;
     }
 }

I prefer to always set fast and active on both sides. So far it has
avoided and fixed more issues than all the vendors telling me I
shouldn't have both sides active...

 Has anyone here seen LACP problems with NetApp or other vendors?

I did see a weird issue once with a Nexus vPC and NetApp. Nothing so
far with mixed-mode EX VCs and various NetApps, with LACP on GEs and
10GEs. Have you tried a monitor traffic session on the EX VC to see
whether you do or do not see LACPDUs?
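If a capture does show LACPDUs arriving, the state octets in them show how each end sees the link. This sketch decodes the actor and partner state octets. It assumes payload is the frame body immediately after the Slow Protocols EtherType 0x8809 and that the PDU uses the standard actor/partner information TLV layout from IEEE 802.1AX. The function names are hypothetical, and the sample bytes are synthetic, not taken from a real capture.

```python
# LACP state octet, bit 0 (LSB) through bit 7, per IEEE 802.1AX.
STATE_FLAGS = ["Activity", "Timeout", "Aggregation", "Synchronization",
               "Collecting", "Distributing", "Defaulted", "Expired"]

LACP_SUBTYPE = 0x01       # Slow Protocols subtype for LACP
INFO_TLV_LEN = 20         # actor/partner information TLV length, in bytes
STATE_OFFSET_IN_TLV = 16  # type, len, sys prio(2), MAC(6), key(2), port prio(2), port(2)
TLVS = (("actor", 0x01, 2), ("partner", 0x02, 2 + INFO_TLV_LEN))


def decode_state(octet: int) -> list[str]:
    """Names of the flags set in an LACP state octet.

    "Timeout" set means short (fast) timeout; clear means long (slow).
    """
    return [name for bit, name in enumerate(STATE_FLAGS) if octet & (1 << bit)]


def lacpdu_states(payload: bytes) -> dict[str, list[str]]:
    """Decode actor and partner state from an LACPDU body (after 0x8809)."""
    if len(payload) < 2 + 2 * INFO_TLV_LEN or payload[0] != LACP_SUBTYPE:
        raise ValueError("not an LACPDU")
    states = {}
    for name, tlv_type, start in TLVS:
        if payload[start] != tlv_type or payload[start + 1] != INFO_TLV_LEN:
            raise ValueError(f"unexpected {name} TLV")
        states[name] = decode_state(payload[start + STATE_OFFSET_IN_TLV])
    return states


def _info_tlv(tlv_type: int, state: int) -> bytes:
    """Synthetic TLV: priority 0x8000, zeroed MAC/key/port fields."""
    return (bytes([tlv_type, INFO_TLV_LEN]) + b"\x80\x00" + bytes(12)
            + bytes([state]) + bytes(3))


# Synthetic body: subtype, version 1, actor TLV, partner TLV (truncated there).
SAMPLE_PDU = (bytes([LACP_SUBTYPE, 0x01])
              + _info_tlv(0x01, 0x3D) + _info_tlv(0x02, 0x47))

if __name__ == "__main__":
    print(lacpdu_states(SAMPLE_PDU))
```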

Also, remember that the NetApp LAG (ifgrp) commands can give you an
idea of what the filer is thinking/believing about it all... Nested
groups in particular seem to have an actual traffic test built in...

Kind regards,
JP Velders