Hi Ben,
On Dec 13, 2006, at 4:02 AM, Ben Rockwood wrote:
I've got a network on which a number of Galaxy systems are using
link aggregation on 2 e1000g interfaces. It works well. And that's
what puzzles me. None of the switches were configured to aggregate
ports, LACP isn't enabled on the switch or on the systems, and yet
the aggr's seem fine, we're not seeing errors on the ports, etc.
It seems to me that this shouldn't work.
Can anyone help me understand why aggr's work without the switch
having any knowledge of the configuration on the system? Does this
present problems that I haven't yet encountered or perhaps simply
don't know to look for? Given that we're using Dell PowerConnect
switches which limit aggr groups to 8 even on a 48 port switch I'm
inclined to leave the setup alone unless I find a good reason to
stop aggregating in this way. Inquiring minds want to know.
It appears to be working, but there might be side effects which are
not obviously visible but can bite later. Let me try to describe what
happens in this case and these possible problems:
When LACP is turned off, Solaris assumes that a port is part of the
aggregation as soon as its link is up and its link speed and duplex
status are compatible with the other ports of the aggregation. Packets
will be sent through the members of the aggregation according to the
configured outbound port policy. The switch will receive these
packets. Unicast packets will be delivered by the switch, but a first
problem is that broadcast packets sent by one of the constituents of
the aggregation will be sent by the switch to the other members of
the aggregation. This will cause unexpected broadcast packets to be
received by the host, which can cause problems such as these to show
up in your logs:
Feb 16 15:01:00 host ip: [ID 903730 kern.warning] WARNING: IP:
Hardware address 'xx:xx:xx:xx:xx:xx' trying to be our address
yyy.yyy.yyy.yyy!
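
To make the broadcast problem concrete, here is a minimal sketch in Python of how a switch floods a broadcast frame. It is a simplified model, not the behavior of any particular switch: the port numbers, the `flood` helper and the `lag_groups` parameter are all made up for illustration.

```python
def flood(ingress_port, all_ports, lag_groups=()):
    """Return the ports a broadcast frame is flooded to.

    The frame is never sent back out of its ingress port. If the switch
    knows the ingress port belongs to an aggregation group, the other
    members of that group are skipped as well (simplified model).
    """
    excluded = {ingress_port}
    for group in lag_groups:
        if ingress_port in group:
            excluded |= set(group)
    return [p for p in all_ports if p not in excluded]


switch_ports = [1, 2, 3, 4]   # hypothetical 4-port switch
host_ports = {1, 2}           # ports cabled to the host's aggregation

# The host sends a broadcast (for example an ARP request) out of port 1.
unaware = flood(1, switch_ports)                       # switch not configured
aware = flood(1, switch_ports, lag_groups=[{1, 2}])    # switch configured

echoed_unaware = [p for p in unaware if p in host_ports]
echoed_aware = [p for p in aware if p in host_ports]

print("echoed back to host (unaware switch):", echoed_unaware)
print("echoed back to host (aware switch):  ", echoed_aware)
```

In the unconfigured case the host gets its own broadcast back on the second port of the aggregation, which is the kind of event that can trigger the warning above.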
On the path to the Solaris host, the misconfigured switch doesn't
know that the ports connected to the host are aggregated. However, the
switch saw packets with a source address corresponding to the
aggregation MAC address coming from these ports. So the switch will
pick one of these ports to send packets to the host. Even though
packets appear to be flowing, it can be problematic since (a)
duplicate broadcast packets will be sent to the host, (b) traffic
might not be properly spread through the different ports of the
aggregation, and (c) packets for the same connection could arrive
from different NICs, which can cause reordering of packets, and a
mismatch between the interrupted CPUs and the CPUs to which squeues
are bound.
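
The inbound side can be sketched the same way. The following assumes the common behavior where a switch remembers the last port on which it saw a given source MAC address; actual switches may differ, and the class, the MAC address and the port numbers are hypothetical.

```python
class LearningSwitch:
    """Toy model of a switch MAC table: source MAC -> last port seen."""

    def __init__(self):
        self.mac_table = {}

    def receive(self, src_mac, ingress_port):
        # Learn (or re-learn) where this source address lives.
        self.mac_table[src_mac] = ingress_port

    def egress_port_for(self, dst_mac):
        # None means the address is unknown and the frame would be flooded.
        return self.mac_table.get(dst_mac)


AGGR_MAC = "00:00:00:00:00:01"   # hypothetical aggregation MAC address

sw = LearningSwitch()
chosen = []
# The host's outbound policy spreads packets over ports 1 and 2,
# always using the same source MAC address.
for out_port in [1, 2, 2, 1]:
    sw.receive(AGGR_MAC, out_port)
    # Port the switch would now use for traffic going back to the host.
    chosen.append(sw.egress_port_for(AGGR_MAC))

print(chosen)
```

At any given time all traffic to the host goes through a single port, and that port changes as the table entry moves, which corresponds to points (b) and (c) above.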
These side effects can cause performance problems, hard-to-diagnose
error messages, and in general suboptimal use of the hardware resources.
With LACP, such misconfigurations are a lot easier to detect, since
ports will not be enabled until the aggregation successfully
completes the LACP exchange with the remote peer. To avoid cases like
this, we're planning to enable LACP by default (see 6433652).
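
As a rough illustration of the difference, the check below models when a port is treated as a usable member of the aggregation, with and without LACP. It is only a sketch of the rule described in this mail, not the Solaris implementation; the `Port` fields and the `usable` function are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Port:
    link_up: bool
    speed_mbps: int
    full_duplex: bool
    lacp_negotiated: bool = False   # True once the LACP exchange completed


def usable(port, reference, lacp_enabled):
    """Return True if the port would carry traffic for the aggregation."""
    compatible = (
        port.link_up
        and port.speed_mbps == reference.speed_mbps
        and port.full_duplex == reference.full_duplex
    )
    if not lacp_enabled:
        # LACP off: link state, speed and duplex are all that is checked.
        return compatible
    # LACP on: the peer must also have completed the LACP exchange.
    return compatible and port.lacp_negotiated


reference = Port(link_up=True, speed_mbps=1000, full_duplex=True)
# Port cabled to a switch that was never configured for aggregation.
port = Port(link_up=True, speed_mbps=1000, full_duplex=True)

print("LACP off:", usable(port, reference, lacp_enabled=False))
print("LACP on: ", usable(port, reference, lacp_enabled=True))
```

With LACP off the port is used even though the switch knows nothing about the aggregation; with LACP on it stays disabled, so the misconfiguration shows up immediately.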
Let me know if you have further questions.
Nicolas.
--
Nicolas Droux - Solaris Networking - Sun Microsystems, Inc.
[EMAIL PROTECTED] - http://blogs.sun.com/droux