Hi Ben,

On Dec 13, 2006, at 4:02 AM, Ben Rockwood wrote:

I've got a network on which a number of Galaxy systems are using link aggregation on 2 e1000g interfaces. It works well. And that's what puzzles me. None of the switches were configured to aggregate ports, LACP isn't enabled on the switch or on the systems, and yet the aggrs seem fine; we're not seeing errors on the ports, etc. It seems to me that this shouldn't work.

Can anyone help me understand why aggrs work without the switch having any knowledge of the configuration on the system? Does this present problems that I haven't yet encountered or perhaps simply don't know to look for? Given that we're using Dell PowerConnect switches, which limit aggregation groups to 8 even on a 48-port switch, I'm inclined to leave the setup alone unless I find a good reason to stop aggregating in this way. Inquiring minds want to know.

It appears to be working, but there may be side effects that are not obviously visible and can bite later. Let me try to describe what happens in this case and what the possible problems are:

When LACP is turned off, Solaris assumes that a port is part of the aggregation as soon as its link is up and its link speed and duplex status are compatible with those of the other ports of the aggregation. Packets are sent through the members of the aggregation according to the configured outbound port policy, and the switch receives them. Unicast packets will be delivered by the switch, but a first problem is that a broadcast packet sent by one constituent of the aggregation will be forwarded by the switch to the other members of the same aggregation, since the switch treats them as unrelated ports. The host therefore receives unexpected broadcast packets, which can cause messages such as the following to show up in your logs:

Feb 16 15:01:00 host ip: [ID 903730 kern.warning] WARNING: IP: Hardware address 'xx:xx:xx:xx:xx:xx' trying to be our address yyy.yyy.yyy.yyy!
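To make the broadcast issue concrete, here is a small simulation. It is only a sketch under assumptions: the MAC address, the port names (sw1, sw2, sw3) and the hash in pick_port are all hypothetical and are not the actual Solaris outbound policy. The host places a broadcast on one aggregation member, and a switch with no aggregation configured floods it to every other port, including the one wired to the host's second member.

```python
import zlib
from dataclasses import dataclass

# Hypothetical values for illustration only.
AGGR_MAC = "00:14:4f:00:00:01"        # MAC address of the aggregation
BROADCAST = "ff:ff:ff:ff:ff:ff"
HOST_TO_SWITCH = {"e1000g0": "sw1", "e1000g1": "sw2"}  # aggregation members -> switch ports
SWITCH_TO_HOST = {sw: nic for nic, sw in HOST_TO_SWITCH.items()}
SWITCH_PORTS = ["sw1", "sw2", "sw3"]  # sw3 leads to some other host on the LAN


@dataclass(frozen=True)
class Frame:
    src: str
    dst: str
    flow: tuple  # header fields the outbound policy hashes on


def pick_port(frame: Frame, nics: list) -> str:
    """Hypothetical outbound policy: hash the flow onto one member.
    Not the actual Solaris hash, just a stable per-flow choice."""
    return nics[zlib.crc32(repr(frame.flow).encode()) % len(nics)]


def unaware_switch_forward(ingress: str, frame: Frame) -> list:
    """A switch with no aggregation configured floods a broadcast to
    every port except the one it arrived on (unicast is not modelled)."""
    if frame.dst == BROADCAST:
        return [p for p in SWITCH_PORTS if p != ingress]
    return []


def send_broadcast(frame: Frame):
    out_nic = pick_port(frame, list(HOST_TO_SWITCH))
    flooded = unaware_switch_forward(HOST_TO_SWITCH[out_nic], frame)
    # Copies flooded to ports wired to our own aggregation come back to us.
    echoed = [SWITCH_TO_HOST[p] for p in flooded if p in SWITCH_TO_HOST]
    return out_nic, echoed


arp = Frame(src=AGGR_MAC, dst=BROADCAST, flow=("10.0.0.5", "10.0.0.255"))
out_nic, echoed = send_broadcast(arp)
print(f"sent on {out_nic}, received back on {echoed}")
```

Whichever member the hash picks, the same broadcast comes back in on the other member. Likewise, a broadcast from another host arriving on sw3 is flooded to both sw1 and sw2, so the host gets two copies of it, which is problem (a) in the next paragraph.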

On the path to the Solaris host, the misconfigured switch doesn't know that the ports connected to the host are aggregated. However, the switch has seen packets whose source address is the aggregation MAC address arriving on these ports, so it will pick one of them to send packets to the host. Even though packets appear to be flowing, this can be problematic since (a) duplicate broadcast packets will be sent to the host, (b) traffic might not be properly spread across the different ports of the aggregation, and (c) packets for the same connection could arrive on different NICs, which can cause packet reordering and a mismatch between the CPUs being interrupted and the CPUs to which the squeues are bound.
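The switch side of this can be sketched with a minimal MAC-learning table. This is a simplified model, not any particular switch's firmware: the MAC address and port names are hypothetical, and real switches may add features such as MAC move detection. Each time the host sends on a different member, the switch relearns the aggregation MAC on that port, and everything destined to the host follows the most recently learned port.

```python
from typing import Dict, Optional

AGGR_MAC = "00:14:4f:00:00:01"  # hypothetical aggregation MAC


class LearningSwitch:
    """Minimal MAC-learning switch that knows nothing about aggregation."""

    def __init__(self) -> None:
        self.table: Dict[str, str] = {}  # MAC address -> switch port
        self.moves = 0                   # times a known MAC showed up on a new port

    def learn(self, src_mac: str, port: str) -> None:
        old = self.table.get(src_mac)
        if old is not None and old != port:
            self.moves += 1              # the address "moves" between ports
        self.table[src_mac] = port

    def egress_for(self, dst_mac: str) -> Optional[str]:
        return self.table.get(dst_mac)   # None would mean "flood"


sw = LearningSwitch()
# Hypothetical outbound pattern: the host's policy places flows on both members.
outbound = ["sw1", "sw2", "sw1", "sw2"]
inbound_flow_a = []
for ingress in outbound:
    sw.learn(AGGR_MAC, ingress)
    # One inbound packet of a single connection arrives between outbound frames;
    # the switch sends it to whatever port it learned most recently.
    inbound_flow_a.append(sw.egress_for(AGGR_MAC))

print(inbound_flow_a, sw.moves)  # ['sw1', 'sw2', 'sw1', 'sw2'] 3
```

At any instant the table holds a single port for the host, so all inbound traffic uses one link (b), while over time the packets of one connection alternate between NICs (c).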

These side effects can cause performance problems, hard-to-diagnose error messages, and, in general, suboptimal use of the hardware resources.

With LACP, such misconfigurations are a lot easier to detect, since ports will not be enabled until the aggregation successfully completes the LACP exchange with the remote peer. To avoid cases like this, we're planning to enable LACP by default (see 6433652).
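The difference between the two modes can be summarized as a rule for which ports carry traffic. The sketch below is a simplification and not the Solaris code path: the Port fields, the lacp_partner_synced flag, and the choice of the first up port as the speed/duplex reference are all hypothetical. Against a switch with no aggregation configured, no LACP partner ever answers, so with LACP nothing gets enabled and the misconfiguration is visible immediately.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Port:
    name: str
    link_up: bool
    speed_mbps: int
    full_duplex: bool
    lacp_partner_synced: bool = False  # hypothetical: LACP exchange completed with the peer


def usable_ports(ports: List[Port], lacp: bool) -> List[str]:
    """Simplified sketch of which members carry traffic."""
    up = [p for p in ports if p.link_up]
    if not up:
        return []
    ref = up[0]  # hypothetical reference port for speed/duplex compatibility
    compatible = [p for p in up
                  if p.speed_mbps == ref.speed_mbps and p.full_duplex == ref.full_duplex]
    if not lacp:
        # Static mode: link up plus compatible speed/duplex is enough.
        return [p.name for p in compatible]
    # LACP mode: also wait for a successful exchange with the remote peer.
    return [p.name for p in compatible if p.lacp_partner_synced]


ports = [Port("e1000g0", True, 1000, True), Port("e1000g1", True, 1000, True)]
print(usable_ports(ports, lacp=False))  # ['e1000g0', 'e1000g1']
print(usable_ports(ports, lacp=True))   # []
```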

Let me know if you have further questions.

Nicolas.

--
Nicolas Droux - Solaris Networking - Sun Microsystems, Inc.
[EMAIL PROTECTED] - http://blogs.sun.com/droux



_______________________________________________
networking-discuss mailing list
[email protected]
