On Apr 8, 2009, at 10:39 PM, Jason King wrote:
> On Wed, Apr 8, 2009 at 11:11 PM, Nicolas Droux
> <Nicolas.Droux at sun.com> wrote:
>>
>> On Apr 7, 2009, at 9:43 PM, Jason King wrote:
>>
>>> On Fri, Mar 13, 2009 at 5:23 PM, Nicolas Droux <droux at sun.com>
>>> wrote:
>>>>
>>>> On Mar 12, 2009, at 6:02 AM, James Carlson wrote:
>>>>
>>>>> Nicolas Droux writes:
>>>>>>
>>>>>> One approach would be to send/receive the PDUs through the
>>>>>> aggregation. On the receive side the PDUs received by the ports
>>>>>> should already be passed up. On the send side the aggregation
>>>>>> could detect the LLDP PDUs (this may not be a big issue since
>>>>>> aggr needs to parse the packet headers to compute the hash for
>>>>>> port selection anyway), and when a LLDP PDU is being sent, send
>>>>>> a corresponding PDU across all members of the aggregation. How
>>>>>> tricky this gets could depend on the contents of the PDUs for
>>>>>> aggregation links. This is not particularly pretty, but at least
>>>>>> the kernel changes would be contained in aggr itself.
>>>>>
>>>>> The LLDPDUs sent are different on each port -- at least the Port
>>>>> ID TLV needs to be different. That means that the aggr code would
>>>>> need to parse the TLVs, find the right one (fortunately, there's
>>>>> a mandatory ordering), and modify it for sending on each of the
>>>>> links in the aggregation.
>>>>>
>>>>> On receive, the listener will need to know which underlying port
>>>>> received a given LLDPDU so that it can keep them straight.
>>>>>
>>>>> I suppose it's possible to do this, but I'm not sure how viable
>>>>> that design would be. I think it'd be much better to provide a
>>>>> way to get real per-port access. You're going to need it anyway
>>>>> if you implement an 802.1X authenticator.
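For what it's worth, locating the Port ID TLV is a short walk over the
TLV headers given the mandatory ordering (Chassis ID, then Port ID,
then TTL, per IEEE 802.1AB; each TLV header is 16 bits: a 7-bit type
and a 9-bit length). A userland sketch, purely illustrative and not
based on any existing kernel code:

```c
/*
 * Locate the Port ID TLV (type 2) in an LLDPDU.  Per IEEE 802.1AB
 * the first three TLVs are Chassis ID (1), Port ID (2), and TTL (3),
 * in that order, so a rewriting aggr would only need to patch this
 * span per port.  Illustrative only; no relation to kernel code.
 */
#include <stddef.h>
#include <stdint.h>

#define	LLDP_TLV_CHASSIS_ID	1
#define	LLDP_TLV_PORT_ID	2

/*
 * On success, sets *offp to the offset of the Port ID TLV header and
 * *lenp to the length of its value, and returns 0; returns -1 on a
 * malformed PDU or a mandatory-ordering violation.
 */
int
lldp_find_port_id(const uint8_t *pdu, size_t pdulen,
    size_t *offp, size_t *lenp)
{
	size_t off = 0;
	int idx;

	for (idx = 0; idx < 2; idx++) {
		uint16_t hdr;
		uint8_t type;
		size_t len;

		if (off + 2 > pdulen)
			return (-1);
		hdr = (uint16_t)((pdu[off] << 8) | pdu[off + 1]);
		type = (uint8_t)(hdr >> 9);	/* top 7 bits */
		len = hdr & 0x1ff;		/* low 9 bits */
		if (off + 2 + len > pdulen)
			return (-1);
		if (idx == 0 && type != LLDP_TLV_CHASSIS_ID)
			return (-1);	/* ordering violated */
		if (idx == 1) {
			if (type != LLDP_TLV_PORT_ID)
				return (-1);
			*offp = off;
			*lenp = len;
			return (0);
		}
		off += 2 + len;
	}
	return (-1);
}
```

Rewriting the value in place is then just a matter of adjusting the
length field if the per-port Port ID differs in size, which is the
part that makes doing this inside aggr unappealing.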
>>>>
>>>> For performance reasons the aggregation owns the underlying MAC,
>>>> and the MAC layer corresponding to the aggregated ports is
>>>> bypassed on the data-path (today we have full bypass on RX and
>>>> partial bypass on TX; full TX bypass is planned as well). This
>>>> doesn't allow other clients to have direct access to
>>>
>>> Can you explain this a bit more? I've done work on either end of
>>> the stack (DLPI clients and NIC drivers using GLDv3), but nothing
>>> in between, so I've been trying to figure out how it all interacts.
>>>
>>> Do you mean the mac instance the aggr creates for itself is
>>> bypassed? I.e., on rx it goes driver->mac->dls instead of
>>> driver->mac->aggr->mac->dls?
>>> If so, does this imply LACP has to be off? (Tangentially related,
>>> but I'm just trying to get the big picture.)
>>
>> It's the MAC layer associated with the port itself that's bypassed,
>> since we still need to have aggr on the receive path for LACP, and
>> on TX for port selection. So instead of driver->mac->aggr->mac->dls,
>> we have driver->aggr->mac->dls (actually dls is also bypassed on
>> the data-path for the most common cases, but you get the drift).
>>
>> In order to do this, MAC provides mac_hwring*() functions to aggr.
>> Aggr uses these functions to obtain the list of rings of the
>> underlying NIC, and exposes corresponding pseudo-rings to the MAC
>> layer above it. These functions also rewire the port's MAC
>> data-path, both on the interrupt and polling paths, to enable the
>> bypass.
>>
>> Hope this helps,
>
> Yes, I suspected something like that, but I only glanced through the
> Crossbow design docs a long time ago, so I'm not familiar with all
> the changes made to the MAC layer (yet).
>
> What I'm wondering about as a solution for LLDP (and 802.1X) is a
> private per-port callback for rx, something conceptually like this
> (the actual types are probably off a bit since I don't have the
> source in front of me, and the names are just placeholders):
>
> typedef enum { AGGR_CB_LLDP, AGGR_CB_8021X } aggr_cb_proto_t;
>
> typedef void (*aggr_per_port_cb_t)(aggr_grp_t *grp,
>     datalink_id_t link, mblk_t *mp, void *cookie);
>
> int aggr_add_proto_cb(aggr_grp_t *grp, aggr_cb_proto_t proto,
>     aggr_per_port_cb_t cb, void *cookie);
> int aggr_del_proto_cb(aggr_grp_t *grp, aggr_cb_proto_t proto,
>     aggr_per_port_cb_t cb);
> int aggr_tx_port(aggr_grp_t *grp, datalink_id_t link, mblk_t *mp);
>
> It has the disadvantage of complicating link add/remove and port
> up/down events, but as a starting point I'd like to get feedback,
> have holes shot in it, etc.
Jason,

This could work. It would be nice to prototype this to validate the
approach. If it becomes a large effort in itself, something to
consider would be handling the aggregated ports through a separate
RFE/project at a later time. That would allow other protocols which
rely on LLDP (e.g. DCBX) to start their development, and users to
start experimenting with the feature.
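If it helps, the registration and dispatch semantics of an interface
like the one you sketched can be exercised in a userland mock before
committing to kernel changes. The sketch below reuses your proposed
names but everything else is an assumption: aggr_grp_t, mblk_t, and
datalink_id_t are stand-in types, AGGR_CB_NPROTO is a sentinel added
for the mock, one consumer per protocol, and no locking:

```c
/*
 * Userland mock of the proposed per-protocol callback table.
 * Stand-in types; the real aggr would also need locking to protect
 * the table against concurrent add/del/dispatch, and per-port state
 * for link add/remove and port up/down events.
 */
#include <stddef.h>

typedef unsigned int datalink_id_t;
typedef struct { const char *b_data; size_t b_len; } mblk_t;

typedef enum {
	AGGR_CB_LLDP,
	AGGR_CB_8021X,
	AGGR_CB_NPROTO		/* sentinel, mock only */
} aggr_cb_proto_t;

typedef struct aggr_grp aggr_grp_t;
typedef void (*aggr_per_port_cb_t)(aggr_grp_t *grp, datalink_id_t link,
    mblk_t *mp, void *cookie);

struct aggr_grp {
	/* one registered consumer per protocol */
	aggr_per_port_cb_t	grp_cb[AGGR_CB_NPROTO];
	void			*grp_cookie[AGGR_CB_NPROTO];
};

int
aggr_add_proto_cb(aggr_grp_t *grp, aggr_cb_proto_t proto,
    aggr_per_port_cb_t cb, void *cookie)
{
	if (proto >= AGGR_CB_NPROTO || cb == NULL ||
	    grp->grp_cb[proto] != NULL)
		return (-1);	/* bad proto or slot already taken */
	grp->grp_cb[proto] = cb;
	grp->grp_cookie[proto] = cookie;
	return (0);
}

int
aggr_del_proto_cb(aggr_grp_t *grp, aggr_cb_proto_t proto,
    aggr_per_port_cb_t cb)
{
	if (proto >= AGGR_CB_NPROTO || grp->grp_cb[proto] != cb)
		return (-1);
	grp->grp_cb[proto] = NULL;
	grp->grp_cookie[proto] = NULL;
	return (0);
}

/*
 * Stands in for the hook on the aggr RX path: hand a PDU of the
 * given protocol, received on `link`, to the registered consumer.
 * Returns 1 if a consumer took the PDU, 0 otherwise.
 */
int
aggr_rx_dispatch(aggr_grp_t *grp, aggr_cb_proto_t proto,
    datalink_id_t link, mblk_t *mp)
{
	if (proto >= AGGR_CB_NPROTO || grp->grp_cb[proto] == NULL)
		return (0);
	grp->grp_cb[proto](grp, link, mp, grp->grp_cookie[proto]);
	return (1);
}

/* trivial consumer used for the usage example */
static int lldp_rx_count;
static datalink_id_t lldp_last_link;

static void
lldp_rx_cb(aggr_grp_t *grp, datalink_id_t link, mblk_t *mp, void *cookie)
{
	(void)grp; (void)mp; (void)cookie;
	lldp_rx_count++;
	lldp_last_link = link;
}
```

The consumer gets the underlying link's datalink_id_t with each PDU,
which addresses the per-port receive problem James raised; the open
questions remain what the callback should see across link add/remove
and port up/down.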
BTW, did you have a chance to look into how DCBX support would be
layered on top of your LLDP implementation? Since DCBX is used by
FCoE, it would be interesting to see how these pieces fit together.

Nicolas.
--
Nicolas Droux - Solaris Kernel Networking - Sun Microsystems, Inc.
droux at sun.com - http://blogs.sun.com/droux