Hi Ben,
On Dec 13, 2006, at 4:02 AM, Ben Rockwood wrote:
> I've got a network on which a number of Galaxy systems are using
> link aggregation on 2 e1000g interfaces. It works well. And that's
> what puzzles me. None of the switches were configured to aggregate
> ports, LACP isn't enabled on the switch or on the systems, and yet
> the aggr's seem fine, we're not seeing errors on the ports, etc.
> It seems to me that this shouldn't work.
>
> Can anyone help me understand why aggr's work without the switch
> having any knowledge of the configuration on the system? Does this
> present problems that I haven't yet encountered or perhaps simply
> don't know to look for? Given that we're using Dell PowerConnect
> switches which limit aggr groups to 8 even on a 48 port switch I'm
> inclined to leave the setup alone unless I find a good reason to
> stop aggregating in this way. Inquiring minds want to know.

It appears to be working, but there might be side effects which are
not obviously visible but can bite later. Let me try to describe
what happens in this case and these possible problems:
When LACP is turned off, Solaris assumes that a port is part of the
aggregation as soon as its link is up and its link speed and
duplex status are compatible with the other ports of the aggregation.
Packets will be sent through the members of the aggregation
according to the configured outbound port policy. The switch will
receive these packets. Unicast packets will be delivered by the
switch, but a first problem is that broadcast packets sent by one
of the constituents of the aggregation will be sent by the switch to
the other members of the aggregation. This will cause unexpected
broadcast packets to be received by the host, which can cause
messages such as this one to show up in your logs:
Feb 16 15:01:00 host ip: [ID 903730 kern.warning] WARNING: IP:
Hardware address 'xx:xx:xx:xx:xx:xx' trying to be our address
yyy.yyy.yyy.yyy!
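
To make the broadcast problem concrete, here is a minimal sketch of
how a switch floods a broadcast frame. It is a toy model, not Solaris
or switch firmware code: the port numbers, the flood_ports() helper
and its lag parameter are all made up for illustration.

```python
def flood_ports(all_ports, ingress_port, lag=frozenset()):
    """Return the ports a broadcast frame is flooded to.

    A switch floods to every port except the one the frame came in on.
    If the ingress port belongs to a configured link aggregation group
    (lag), the whole group is treated as one logical port and excluded.
    """
    excluded = lag if ingress_port in lag else {ingress_port}
    return [p for p in all_ports if p not in excluded]


# Hypothetical topology: the host's aggregation is cabled to switch
# ports 1 and 2; ports 3 and 4 lead to other machines.
switch_ports = [1, 2, 3, 4]
host_aggr_ports = {1, 2}

# Switch NOT configured for aggregation: a broadcast sent by the host
# on port 1 is flooded to port 2, i.e. back to the host itself.
egress = flood_ports(switch_ports, ingress_port=1)
reflected = [p for p in egress if p in host_aggr_ports]
print("flooded to:", egress)
print("reflected back to the host on:", reflected)

# Switch configured with ports 1 and 2 in one group: nothing comes back.
print("with a LAG:", flood_ports(switch_ports, 1, lag=host_aggr_ports))
```

In the unconfigured case the host receives its own broadcast (for
example an ARP request carrying its own address) on the second link,
which is the kind of frame that triggers the warning above.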
On the path to the Solaris host, the misconfigured switch doesn't
know that the ports connected to the host are aggregated. However,
the switch has seen packets with a source address corresponding to
the aggregation MAC address coming from these ports, so it
will pick one of these ports to send packets to the host. Even
though packets appear to be flowing, it can be problematic since
(a) duplicate broadcast packets will be sent to the host, (b)
traffic might not be properly spread through the different ports of
the aggregation, and (c) packets for the same connection could
arrive from different NICs, which can cause reordering of packets,
and a mismatch between the interrupted CPUs and the CPUs to which
squeues are bound.
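
A rough way to picture (c): the sketch below assumes the switch keeps
a plain learning table that maps a MAC address to the port on which
it last saw that address as a source. That is an assumption about the
switch, and the MAC address and event sequence are invented for the
example; real switches may behave differently.

```python
# Hypothetical MAC address of the aggregation (locally administered).
AGGR_MAC = "02:00:00:00:00:01"

# Hypothetical sequence of events seen by the switch:
#   ("tx", n)    the host sends a frame that enters the switch on port n
#   ("rx", None) a frame addressed to the host arrives from elsewhere
events = [
    ("tx", 1), ("rx", None),
    ("tx", 2), ("rx", None), ("rx", None),
    ("tx", 1), ("rx", None),
]

mac_table = {}   # MAC address -> switch port where it was last seen
delivered = []   # port used for each frame sent toward the host

for kind, port in events:
    if kind == "tx":
        # Learning step: the table entry follows the latest source port.
        mac_table[AGGR_MAC] = port
    else:
        # Forwarding step: use whatever port the table currently holds.
        delivered.append(mac_table[AGGR_MAC])

print(delivered)  # [1, 2, 2, 1]
```

Under that assumption, frames for a single connection reach the host
on whichever link the host transmitted on most recently, rather than
on a port chosen consistently per connection.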
These side effects can cause performance problems, hard-to-diagnose
error messages, and in general suboptimal use of the hardware
resources.
With LACP, such misconfigurations are a lot easier to detect, since
ports will not be enabled until the aggregation successfully
completes the LACP exchange with the remote peer. To avoid cases
like this, we're planning to enable LACP by default (see 6433652).
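
The difference between the two modes can be summarized with a small
sketch. This is not the actual Solaris logic: the Port class, the
lacp_partner_ok flag and is_active_member() are hypothetical names,
and "compatible" is simplified here to "same speed and duplex".

```python
from dataclasses import dataclass


@dataclass
class Port:
    name: str
    link_up: bool
    speed_mbps: int
    full_duplex: bool
    # Hypothetical flag: True once the LACP exchange with the remote
    # peer has completed successfully on this port.
    lacp_partner_ok: bool = False


def is_active_member(port, aggr_speed_mbps, aggr_full_duplex, lacp_enabled):
    """Decide whether a port carries traffic for the aggregation."""
    compatible = (
        port.link_up
        and port.speed_mbps == aggr_speed_mbps
        and port.full_duplex == aggr_full_duplex
    )
    if not lacp_enabled:
        # LACP off: link state and speed/duplex are all that is checked.
        return compatible
    # LACP on: the peer must also have answered the LACP exchange.
    return compatible and port.lacp_partner_ok


# A port cabled to a switch that was never configured for aggregation.
p = Port("e1000g0", link_up=True, speed_mbps=1000, full_duplex=True)
print(is_active_member(p, 1000, True, lacp_enabled=False))  # True
print(is_active_member(p, 1000, True, lacp_enabled=True))   # False
```

With LACP off the port is used even though the switch knows nothing
about the aggregation; with LACP on it stays disabled, which makes
the misconfiguration visible right away.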
Let me know if you have further questions.
Nicolas.
-- Nicolas Droux - Solaris Networking - Sun Microsystems, Inc.
[EMAIL PROTECTED] - http://blogs.sun.com/droux