Re: [Bridge] [PATCH RFC WIP 0/5] IGMP snooping for local traffic
On Sat, 26 Aug 2017 22:56:05 +0200 Andrew Lunnwrote: > This is a WIP patchset i would like comments on from bridge, switchdev > and hardware offload people. > > The linux bridge supports IGMP snooping. It will listen to IGMP > reports on bridge ports and keep track of which groups have been > joined on an interface. It will then forward multicast based on this > group membership. > > When the bridge adds or removed groups from an interface, it uses > switchdev to request the hardware add an mdb to a port, so the > hardware can perform the selective forwarding between ports. > > What is not covered by the current bridge code, is IGMP joins/leaves > from the host on the brX interface. No such monitoring is > performed. With a pure software bridge, it is not required. All > mulitcast frames are passed to the brX interface, and the network > stack filters them, as it does for any interface. However, when > hardware offload is involved, things change. We should program the > hardware to only send multcast packets to the host when the host has > in interest in them. > > Thus we need to perform IGMP snooping on the brX interface, just like > any other interface of the bridge. However, currently the brX > interface is missing all the needed data structures to do this. There > is no net_bridge_port structure for the brX interface. This strucuture > is created when an interface is added to the bridge. But the brX > interface is not a member of the bridge. So this patchset makes the > brX interface a first class member of the bridge. When the brX > interface is opened, the interface is added to the bridge. A > net_bridge_port is allocated for it, and IGMP snooping is performed as > usual. > > There are some complexities here. Some assumptions are broken, like > the master interface of a port interface is the bridge interface. The > brX interface cannot be its own master. The use of > netdev_master_upper_dev_get() within the bridge code has been changed > to reflecit this. The bridge receive handler needs to not process > frames for the brX interface, etc. > > The interface downward to the hardware is also an issue. The code > presented here is a hack and needs to change. But that is secondary > and can be solved once it is agreed how the bridge needs to change to > support this use case. > > Comment welcome and wanted. > > Andrew > > Andrew Lunn (5): > net: rtnetlink: Handle bridge port without upper device > net: bridge: Skip receive handler on brX interface > net: bridge: Make the brX interface a member of the bridge > net: dsa: HACK: Handle MDB add/remove for none-switch ports > net: dsa: Don't include CPU port when adding MDB to a port > > include/linux/if_bridge.h | 1 + > net/bridge/br_device.c| 12 ++-- > net/bridge/br_if.c| 37 - > net/bridge/br_input.c | 4 > net/bridge/br_mdb.c | 2 -- > net/bridge/br_multicast.c | 7 --- > net/bridge/br_private.h | 1 + > net/core/rtnetlink.c | 23 +-- > net/dsa/port.c| 19 +-- > net/dsa/switch.c | 2 +- > 10 files changed, 83 insertions(+), 25 deletions(-) > Sorry you can't change the semantics of the bridge like this. There are likely to be scripts and management utilities that won't work after this. Figure out another way. Such as adding IGMP updates in the local packet send/receive path.
Re: [Bridge] [PATCH RFC WIP 0/5] IGMP snooping for local traffic
On Sun, Aug 27, 2017 at 07:44:15PM -0700, Florian Fainelli wrote: > Hi Andrew, > > On 08/26/2017 01:56 PM, Andrew Lunn wrote: > > This is a WIP patchset i would like comments on from bridge, > > switchdev and hardware offload people. > > > > The linux bridge supports IGMP snooping. It will listen to IGMP > > reports on bridge ports and keep track of which groups have been > > joined on an interface. It will then forward multicast based on this > > group membership. > > > > When the bridge adds or removed groups from an interface, it uses > > switchdev to request the hardware add an mdb to a port, so the > > hardware can perform the selective forwarding between ports. > > > > What is not covered by the current bridge code, is IGMP joins/leaves > > from the host on the brX interface. No such monitoring is performed. > > With a pure software bridge, it is not required. All mulitcast frames > > are passed to the brX interface, and the network stack filters them, > > as it does for any interface. However, when hardware offload is > > involved, things change. We should program the hardware to only send > > multcast packets to the host when the host has in interest in them. > > OK, so if I understand this right, without a bridge, we have the > following happen today: with a DSA-enabled setup using any kind of > switch tagging protocol, if a host is interested in receiving particular > multicast traffic, we would receive IGMP joins/leaves through sw0p0, and > the stack should call ndo_set_rx_mode for sw0p0, which would be > dsa_slave_set_rx_mode() and which would synchronize the DSA master > network device with the slave network device, everything works fine > provided that the CPU port is configured to accept multicast traffic. Hi Florian That sounds about right, but it is not a use case i've looked at yet. I do need to look at it though, because there is a chance i break it with IGMP snooping support. > Note here that we don't really add a MDB entry for sw0p0 when that > happens, but it seems like we should for switches that lack IGMP > snooping and/or multicast filtering. Yes. But it is not an MDB in the normal sense, at least from the switches perspective. An MDB passed to a switch using switchdev says that traffic for the specified group should go out that port. However, when the host does a join on sw0p0, we want the exact opposite, multicast traffic coming in that port of the switch which matches the group should be forwarded to the host. So my second attempt at this code adds a new switchdev object, SWITCHDEV_OBJ_ID_PORT_HOST_MDB. > With the current bridge and DSA code, are not we actually always going > to get the CPU port to be added with the multicast address and therefore > no filtering is occurring and snooping is pretty much useless? Correct. At the moment, all multicast traffic gets delivered to the host. > While I understand the reasons why you did it that way, I think this is > going to break a lot of code in bridge that does not expect brX to be a > bridge port member. It does. Nikolay suggestion works a lot better, and i'm following that now. It looks good and only requires some minor tweaks to the bridge code. Andrew
Re: [Bridge] [PATCH RFC WIP 0/5] IGMP snooping for local traffic
Hi Andrew, On 08/26/2017 01:56 PM, Andrew Lunn wrote: > This is a WIP patchset i would like comments on from bridge, > switchdev and hardware offload people. > > The linux bridge supports IGMP snooping. It will listen to IGMP > reports on bridge ports and keep track of which groups have been > joined on an interface. It will then forward multicast based on this > group membership. > > When the bridge adds or removed groups from an interface, it uses > switchdev to request the hardware add an mdb to a port, so the > hardware can perform the selective forwarding between ports. > > What is not covered by the current bridge code, is IGMP joins/leaves > from the host on the brX interface. No such monitoring is performed. > With a pure software bridge, it is not required. All mulitcast frames > are passed to the brX interface, and the network stack filters them, > as it does for any interface. However, when hardware offload is > involved, things change. We should program the hardware to only send > multcast packets to the host when the host has in interest in them. OK, so if I understand this right, without a bridge, we have the following happen today: with a DSA-enabled setup using any kind of switch tagging protocol, if a host is interested in receiving particular multicast traffic, we would receive IGMP joins/leaves through sw0p0, and the stack should call ndo_set_rx_mode for sw0p0, which would be dsa_slave_set_rx_mode() and which would synchronize the DSA master network device with the slave network device, everything works fine provided that the CPU port is configured to accept multicast traffic. Note here that we don't really add a MDB entry for sw0p0 when that happens, but it seems like we should for switches that lack IGMP snooping and/or multicast filtering. With the current bridge and DSA code, are not we actually always going to get the CPU port to be added with the multicast address and therefore no filtering is occurring and snooping is pretty much useless? > > Thus we need to perform IGMP snooping on the brX interface, just > like any other interface of the bridge. However, currently the brX > interface is missing all the needed data structures to do this. > There is no net_bridge_port structure for the brX interface. This > strucuture is created when an interface is added to the bridge. But > the brX interface is not a member of the bridge. So this patchset > makes the brX interface a first class member of the bridge. When the > brX interface is opened, the interface is added to the bridge. A > net_bridge_port is allocated for it, and IGMP snooping is performed > as usual. Would not making brX be part of the bridge have a huge negative performance impact on locally generated traffic either? Even though we do an early return in br_handle_frame() this may become noticeable. > > There are some complexities here. Some assumptions are broken, like > the master interface of a port interface is the bridge interface. > The brX interface cannot be its own master. The use of > netdev_master_upper_dev_get() within the bridge code has been > changed to reflecit this. The bridge receive handler needs to not > process frames for the brX interface, etc. > > The interface downward to the hardware is also an issue. The code > presented here is a hack and needs to change. But that is secondary > and can be solved once it is agreed how the bridge needs to change > to support this use case. > > Comment welcome and wanted. While I understand the reasons why you did it that way, I think this is going to break a lot of code in bridge that does not expect brX to be a bridge port member. Maybe we can just generate switch MDB events targeting the bridge network device and let switch drivers resolve that to whatever their CPU/master port is? It does sound like we are moving more and more to a model where brX becomes one (if not the only one) net_device representor of what the CPU/master port of a switch is (at least with DSA) which sort of makes us go back to the multi-CPU port discussion we had a while ago. Thanks! -- Florian
Re: [Bridge] [PATCH RFC WIP 0/5] IGMP snooping for local traffic
> Hi Andrew, > > Have you taken a look at mglist (the boolean, probably needs a rename) ? It > is for > exactly that purpose, to track which groups the bridge is interested in. > I assume I'm forgetting or missing something here. > > > performed. With a pure software bridge, it is not required. All > > mulitcast frames are passed to the brX interface, and the network > > If mglist (again the boolean) is false then they won't be passed up. > > > stack filters them, as it does for any interface. However, when > > hardware offload is involved, things change. We should program the > > hardware to only send multcast packets to the host when the host has > > in interest in them. > > Granted the boolean mglist might need some changes (esp. with host group > leave) > but I think it can be used to program switchdev for host join/leave, can't > we adjust its behaviour instead of introducing this complexity and avoid many > headaches ? I would like to avoid this complexity as well. I will take a look at mglist. Thanks for the hint. Andrew
Re: [Bridge] [PATCH RFC WIP 0/5] IGMP snooping for local traffic
On 27.08.2017 01:17, Nikolay Aleksandrov wrote: > On 26/08/17 23:56, Andrew Lunn wrote: >> This is a WIP patchset i would like comments on from bridge, switchdev >> and hardware offload people. >> >> The linux bridge supports IGMP snooping. It will listen to IGMP >> reports on bridge ports and keep track of which groups have been >> joined on an interface. It will then forward multicast based on this >> group membership. >> >> When the bridge adds or removed groups from an interface, it uses >> switchdev to request the hardware add an mdb to a port, so the >> hardware can perform the selective forwarding between ports. >> >> What is not covered by the current bridge code, is IGMP joins/leaves >> from the host on the brX interface. No such monitoring is > > Hi Andrew, > > Have you taken a look at mglist (the boolean, probably needs a rename) ? It > is for > exactly that purpose, to track which groups the bridge is interested in. > I assume I'm forgetting or missing something here. > >> performed. With a pure software bridge, it is not required. All >> mulitcast frames are passed to the brX interface, and the network > > If mglist (again the boolean) is false then they won't be passed up. > >> stack filters them, as it does for any interface. However, when >> hardware offload is involved, things change. We should program the >> hardware to only send multcast packets to the host when the host has >> in interest in them. > > Granted the boolean mglist might need some changes (esp. with host group > leave) > but I think it can be used to program switchdev for host join/leave, can't > we adjust its behaviour instead of introducing this complexity and avoid many > headaches ? > >> >> Thus we need to perform IGMP snooping on the brX interface, just like >> any other interface of the bridge. However, currently the brX >> interface is missing all the needed data structures to do this. There >> is no net_bridge_port structure for the brX interface. This strucuture >> is created when an interface is added to the bridge. But the brX >> interface is not a member of the bridge. So this patchset makes the >> brX interface a first class member of the bridge. When the brX >> interface is opened, the interface is added to the bridge. A >> net_bridge_port is allocated for it, and IGMP snooping is performed as >> usual. > > I have actually discussed this idea long time ago with Vlad and it has very > nice > upsides (most important one removing br/port checks everywhere) but it blows > up > fast with special cases for the bridge and things look very similar. You'll > need > to rework the whole bridge and turn every bridge special case into either a > port > generic one or again bridge-specific special case but with a check for the > new flag. > I will not point out every bug that comes out of this, but registering the > bridge > rx handler to itself is simply wrong on many levels and breaks many setups. This was a digression about making the bridge a proper port of itself (e.g. port 0, linked and all), it is only tangential to this implementation as it doesn't link the new port. > >> >> There are some complexities here. Some assumptions are broken, like >> the master interface of a port interface is the bridge interface. The >> brX interface cannot be its own master. The use of >> netdev_master_upper_dev_get() within the bridge code has been changed >> to reflecit this. The bridge receive handler needs to not process >> frames for the brX interface, etc. >> >> The interface downward to the hardware is also an issue. The code >> presented here is a hack and needs to change. But that is secondary >> and can be solved once it is agreed how the bridge needs to change to >> support this use case. > > Definitely agree with this statement. :-) > >> >> Comment welcome and wanted. >> >> Andrew >> >> Andrew Lunn (5): >> net: rtnetlink: Handle bridge port without upper device >> net: bridge: Skip receive handler on brX interface >> net: bridge: Make the brX interface a member of the bridge >> net: dsa: HACK: Handle MDB add/remove for none-switch ports >> net: dsa: Don't include CPU port when adding MDB to a port >> >> include/linux/if_bridge.h | 1 + >> net/bridge/br_device.c| 12 ++-- >> net/bridge/br_if.c| 37 - >> net/bridge/br_input.c | 4 >> net/bridge/br_mdb.c | 2 -- >> net/bridge/br_multicast.c | 7 --- >> net/bridge/br_private.h | 1 + >> net/core/rtnetlink.c | 23 +-- >> net/dsa/port.c| 19 +-- >> net/dsa/switch.c | 2 +- >> 10 files changed, 83 insertions(+), 25 deletions(-) >> >
Re: [Bridge] [PATCH RFC WIP 0/5] IGMP snooping for local traffic
On 26/08/17 23:56, Andrew Lunn wrote: > This is a WIP patchset i would like comments on from bridge, switchdev > and hardware offload people. > > The linux bridge supports IGMP snooping. It will listen to IGMP > reports on bridge ports and keep track of which groups have been > joined on an interface. It will then forward multicast based on this > group membership. > > When the bridge adds or removed groups from an interface, it uses > switchdev to request the hardware add an mdb to a port, so the > hardware can perform the selective forwarding between ports. > > What is not covered by the current bridge code, is IGMP joins/leaves > from the host on the brX interface. No such monitoring is Hi Andrew, Have you taken a look at mglist (the boolean, probably needs a rename) ? It is for exactly that purpose, to track which groups the bridge is interested in. I assume I'm forgetting or missing something here. > performed. With a pure software bridge, it is not required. All > mulitcast frames are passed to the brX interface, and the network If mglist (again the boolean) is false then they won't be passed up. > stack filters them, as it does for any interface. However, when > hardware offload is involved, things change. We should program the > hardware to only send multcast packets to the host when the host has > in interest in them. Granted the boolean mglist might need some changes (esp. with host group leave) but I think it can be used to program switchdev for host join/leave, can't we adjust its behaviour instead of introducing this complexity and avoid many headaches ? > > Thus we need to perform IGMP snooping on the brX interface, just like > any other interface of the bridge. However, currently the brX > interface is missing all the needed data structures to do this. There > is no net_bridge_port structure for the brX interface. This strucuture > is created when an interface is added to the bridge. But the brX > interface is not a member of the bridge. So this patchset makes the > brX interface a first class member of the bridge. When the brX > interface is opened, the interface is added to the bridge. A > net_bridge_port is allocated for it, and IGMP snooping is performed as > usual. I have actually discussed this idea long time ago with Vlad and it has very nice upsides (most important one removing br/port checks everywhere) but it blows up fast with special cases for the bridge and things look very similar. You'll need to rework the whole bridge and turn every bridge special case into either a port generic one or again bridge-specific special case but with a check for the new flag. I will not point out every bug that comes out of this, but registering the bridge rx handler to itself is simply wrong on many levels and breaks many setups. > > There are some complexities here. Some assumptions are broken, like > the master interface of a port interface is the bridge interface. The > brX interface cannot be its own master. The use of > netdev_master_upper_dev_get() within the bridge code has been changed > to reflecit this. The bridge receive handler needs to not process > frames for the brX interface, etc. > > The interface downward to the hardware is also an issue. The code > presented here is a hack and needs to change. But that is secondary > and can be solved once it is agreed how the bridge needs to change to > support this use case. Definitely agree with this statement. :-) > > Comment welcome and wanted. > > Andrew > > Andrew Lunn (5): > net: rtnetlink: Handle bridge port without upper device > net: bridge: Skip receive handler on brX interface > net: bridge: Make the brX interface a member of the bridge > net: dsa: HACK: Handle MDB add/remove for none-switch ports > net: dsa: Don't include CPU port when adding MDB to a port > > include/linux/if_bridge.h | 1 + > net/bridge/br_device.c| 12 ++-- > net/bridge/br_if.c| 37 - > net/bridge/br_input.c | 4 > net/bridge/br_mdb.c | 2 -- > net/bridge/br_multicast.c | 7 --- > net/bridge/br_private.h | 1 + > net/core/rtnetlink.c | 23 +-- > net/dsa/port.c| 19 +-- > net/dsa/switch.c | 2 +- > 10 files changed, 83 insertions(+), 25 deletions(-) >
[Bridge] [PATCH RFC WIP 0/5] IGMP snooping for local traffic
This is a WIP patchset i would like comments on from bridge, switchdev and hardware offload people. The linux bridge supports IGMP snooping. It will listen to IGMP reports on bridge ports and keep track of which groups have been joined on an interface. It will then forward multicast based on this group membership. When the bridge adds or removed groups from an interface, it uses switchdev to request the hardware add an mdb to a port, so the hardware can perform the selective forwarding between ports. What is not covered by the current bridge code, is IGMP joins/leaves from the host on the brX interface. No such monitoring is performed. With a pure software bridge, it is not required. All mulitcast frames are passed to the brX interface, and the network stack filters them, as it does for any interface. However, when hardware offload is involved, things change. We should program the hardware to only send multcast packets to the host when the host has in interest in them. Thus we need to perform IGMP snooping on the brX interface, just like any other interface of the bridge. However, currently the brX interface is missing all the needed data structures to do this. There is no net_bridge_port structure for the brX interface. This strucuture is created when an interface is added to the bridge. But the brX interface is not a member of the bridge. So this patchset makes the brX interface a first class member of the bridge. When the brX interface is opened, the interface is added to the bridge. A net_bridge_port is allocated for it, and IGMP snooping is performed as usual. There are some complexities here. Some assumptions are broken, like the master interface of a port interface is the bridge interface. The brX interface cannot be its own master. The use of netdev_master_upper_dev_get() within the bridge code has been changed to reflecit this. The bridge receive handler needs to not process frames for the brX interface, etc. The interface downward to the hardware is also an issue. The code presented here is a hack and needs to change. But that is secondary and can be solved once it is agreed how the bridge needs to change to support this use case. Comment welcome and wanted. Andrew Andrew Lunn (5): net: rtnetlink: Handle bridge port without upper device net: bridge: Skip receive handler on brX interface net: bridge: Make the brX interface a member of the bridge net: dsa: HACK: Handle MDB add/remove for none-switch ports net: dsa: Don't include CPU port when adding MDB to a port include/linux/if_bridge.h | 1 + net/bridge/br_device.c| 12 ++-- net/bridge/br_if.c| 37 - net/bridge/br_input.c | 4 net/bridge/br_mdb.c | 2 -- net/bridge/br_multicast.c | 7 --- net/bridge/br_private.h | 1 + net/core/rtnetlink.c | 23 +-- net/dsa/port.c| 19 +-- net/dsa/switch.c | 2 +- 10 files changed, 83 insertions(+), 25 deletions(-) -- 2.14.1