Re: [Bridge] [PATCH RFC WIP 0/5] IGMP snooping for local traffic

2017-08-28 Thread Stephen Hemminger
On Sat, 26 Aug 2017 22:56:05 +0200
Andrew Lunn  wrote:

> This is a WIP patchset i would like comments on from bridge, switchdev
> and hardware offload people.
> 
> The linux bridge supports IGMP snooping. It will listen to IGMP
> reports on bridge ports and keep track of which groups have been
> joined on an interface. It will then forward multicast based on this
> group membership.
> 
> When the bridge adds or removed groups from an interface, it uses
> switchdev to request the hardware add an mdb to a port, so the
> hardware can perform the selective forwarding between ports.
> 
> What is not covered by the current bridge code, is IGMP joins/leaves
> from the host on the brX interface. No such monitoring is
> performed. With a pure software bridge, it is not required. All
> mulitcast frames are passed to the brX interface, and the network
> stack filters them, as it does for any interface. However, when
> hardware offload is involved, things change. We should program the
> hardware to only send multcast packets to the host when the host has
> in interest in them.
> 
> Thus we need to perform IGMP snooping on the brX interface, just like
> any other interface of the bridge. However, currently the brX
> interface is missing all the needed data structures to do this. There
> is no net_bridge_port structure for the brX interface. This strucuture
> is created when an interface is added to the bridge. But the brX
> interface is not a member of the bridge. So this patchset makes the
> brX interface a first class member of the bridge. When the brX
> interface is opened, the interface is added to the bridge. A
> net_bridge_port is allocated for it, and IGMP snooping is performed as
> usual.
> 
> There are some complexities here. Some assumptions are broken, like
> the master interface of a port interface is the bridge interface. The
> brX interface cannot be its own master. The use of
> netdev_master_upper_dev_get() within the bridge code has been changed
> to reflecit this. The bridge receive handler needs to not process
> frames for the brX interface, etc.
> 
> The interface downward to the hardware is also an issue. The code
> presented here is a hack and needs to change. But that is secondary
> and can be solved once it is agreed how the bridge needs to change to
> support this use case.
> 
> Comment welcome and wanted.
> 
>   Andrew
> 
> Andrew Lunn (5):
>   net: rtnetlink: Handle bridge port without upper device
>   net: bridge: Skip receive handler on brX interface
>   net: bridge: Make the brX interface a member of the bridge
>   net: dsa: HACK: Handle MDB add/remove for none-switch ports
>   net: dsa: Don't include CPU port when adding MDB to a port
> 
>  include/linux/if_bridge.h |  1 +
>  net/bridge/br_device.c| 12 ++--
>  net/bridge/br_if.c| 37 -
>  net/bridge/br_input.c |  4 
>  net/bridge/br_mdb.c   |  2 --
>  net/bridge/br_multicast.c |  7 ---
>  net/bridge/br_private.h   |  1 +
>  net/core/rtnetlink.c  | 23 +--
>  net/dsa/port.c| 19 +--
>  net/dsa/switch.c  |  2 +-
>  10 files changed, 83 insertions(+), 25 deletions(-)
> 

Sorry you can't change the semantics of the bridge like this.
There are likely to be scripts and management utilities that won't work after 
this.

Figure out another way. Such as adding IGMP updates in the local packet 
send/receive path.


Re: [Bridge] [PATCH RFC WIP 0/5] IGMP snooping for local traffic

2017-08-28 Thread Andrew Lunn
On Sun, Aug 27, 2017 at 07:44:15PM -0700, Florian Fainelli wrote:
> Hi Andrew,
> 
> On 08/26/2017 01:56 PM, Andrew Lunn wrote:
> > This is a WIP patchset i would like comments on from bridge,
> > switchdev and hardware offload people.
> > 
> > The linux bridge supports IGMP snooping. It will listen to IGMP 
> > reports on bridge ports and keep track of which groups have been 
> > joined on an interface. It will then forward multicast based on this 
> > group membership.
> > 
> > When the bridge adds or removed groups from an interface, it uses 
> > switchdev to request the hardware add an mdb to a port, so the 
> > hardware can perform the selective forwarding between ports.
> > 
> > What is not covered by the current bridge code, is IGMP joins/leaves 
> > from the host on the brX interface. No such monitoring is performed.
> > With a pure software bridge, it is not required. All mulitcast frames
> > are passed to the brX interface, and the network stack filters them,
> > as it does for any interface. However, when hardware offload is
> > involved, things change. We should program the hardware to only send
> > multcast packets to the host when the host has in interest in them.
> 
> OK, so if I understand this right, without a bridge, we have the
> following happen today: with a DSA-enabled setup using any kind of
> switch tagging protocol, if a host is interested in receiving particular
> multicast traffic, we would receive IGMP joins/leaves through sw0p0, and
> the stack should call ndo_set_rx_mode for sw0p0, which would be
> dsa_slave_set_rx_mode() and which would synchronize the DSA master
> network device with the slave network device, everything works fine
> provided that the CPU port is configured to accept multicast traffic.

Hi Florian

That sounds about right, but it is not a use case i've looked at yet.
I do need to look at it though, because there is a chance i break it
with IGMP snooping support.

> Note here that we don't really add a MDB entry for sw0p0 when that
> happens, but it seems like we should for switches that lack IGMP
> snooping and/or multicast filtering.

Yes. But it is not an MDB in the normal sense, at least from the
switches perspective. An MDB passed to a switch using switchdev says
that traffic for the specified group should go out that port. However,
when the host does a join on sw0p0, we want the exact opposite,
multicast traffic coming in that port of the switch which matches the
group should be forwarded to the host. So my second attempt at this
code adds a new switchdev object, SWITCHDEV_OBJ_ID_PORT_HOST_MDB.

> With the current bridge and DSA code, are not we actually always going
> to get the CPU port to be added with the multicast address and therefore
> no filtering is occurring and snooping is pretty much useless?

Correct. At the moment, all multicast traffic gets delivered to the
host.

> While I understand the reasons why you did it that way, I think this is
> going to break a lot of code in bridge that does not expect brX to be a
> bridge port member.

It does. 

Nikolay suggestion works a lot better, and i'm following that now. It
looks good and only requires some minor tweaks to the bridge code.

  Andrew


Re: [Bridge] [PATCH RFC WIP 0/5] IGMP snooping for local traffic

2017-08-27 Thread Florian Fainelli
Hi Andrew,

On 08/26/2017 01:56 PM, Andrew Lunn wrote:
> This is a WIP patchset i would like comments on from bridge,
> switchdev and hardware offload people.
> 
> The linux bridge supports IGMP snooping. It will listen to IGMP 
> reports on bridge ports and keep track of which groups have been 
> joined on an interface. It will then forward multicast based on this 
> group membership.
> 
> When the bridge adds or removed groups from an interface, it uses 
> switchdev to request the hardware add an mdb to a port, so the 
> hardware can perform the selective forwarding between ports.
> 
> What is not covered by the current bridge code, is IGMP joins/leaves 
> from the host on the brX interface. No such monitoring is performed.
> With a pure software bridge, it is not required. All mulitcast frames
> are passed to the brX interface, and the network stack filters them,
> as it does for any interface. However, when hardware offload is
> involved, things change. We should program the hardware to only send
> multcast packets to the host when the host has in interest in them.

OK, so if I understand this right, without a bridge, we have the
following happen today: with a DSA-enabled setup using any kind of
switch tagging protocol, if a host is interested in receiving particular
multicast traffic, we would receive IGMP joins/leaves through sw0p0, and
the stack should call ndo_set_rx_mode for sw0p0, which would be
dsa_slave_set_rx_mode() and which would synchronize the DSA master
network device with the slave network device, everything works fine
provided that the CPU port is configured to accept multicast traffic.

Note here that we don't really add a MDB entry for sw0p0 when that
happens, but it seems like we should for switches that lack IGMP
snooping and/or multicast filtering.

With the current bridge and DSA code, are not we actually always going
to get the CPU port to be added with the multicast address and therefore
no filtering is occurring and snooping is pretty much useless?

> 
> Thus we need to perform IGMP snooping on the brX interface, just
> like any other interface of the bridge. However, currently the brX 
> interface is missing all the needed data structures to do this.
> There is no net_bridge_port structure for the brX interface. This
> strucuture is created when an interface is added to the bridge. But
> the brX interface is not a member of the bridge. So this patchset
> makes the brX interface a first class member of the bridge. When the
> brX interface is opened, the interface is added to the bridge. A 
> net_bridge_port is allocated for it, and IGMP snooping is performed
> as usual.

Would not making brX be part of the bridge have a huge negative
performance impact on locally generated traffic either? Even though we
do an early return in br_handle_frame() this may become noticeable.

> 
> There are some complexities here. Some assumptions are broken, like 
> the master interface of a port interface is the bridge interface.
> The brX interface cannot be its own master. The use of 
> netdev_master_upper_dev_get() within the bridge code has been
> changed to reflecit this. The bridge receive handler needs to not
> process frames for the brX interface, etc.
> 
> The interface downward to the hardware is also an issue. The code 
> presented here is a hack and needs to change. But that is secondary 
> and can be solved once it is agreed how the bridge needs to change
> to support this use case.
> 
> Comment welcome and wanted.

While I understand the reasons why you did it that way, I think this is
going to break a lot of code in bridge that does not expect brX to be a
bridge port member.

Maybe we can just generate switch MDB events targeting the bridge
network device and let switch drivers resolve that to whatever their
CPU/master port is?

It does sound like we are moving more and more to a model where brX
becomes one (if not the only one) net_device representor of what the
CPU/master port of a switch is (at least with DSA) which sort of makes
us go back to the multi-CPU port discussion we had a while ago.

Thanks!
-- 
Florian


Re: [Bridge] [PATCH RFC WIP 0/5] IGMP snooping for local traffic

2017-08-26 Thread Andrew Lunn
> Hi Andrew,
> 
> Have you taken a look at mglist (the boolean, probably needs a rename) ? It 
> is for
> exactly that purpose, to track which groups the bridge is interested in.
> I assume I'm forgetting or missing something here.
> 
> > performed. With a pure software bridge, it is not required. All
> > mulitcast frames are passed to the brX interface, and the network
> 
> If mglist (again the boolean) is false then they won't be passed up.
> 
> > stack filters them, as it does for any interface. However, when
> > hardware offload is involved, things change. We should program the
> > hardware to only send multcast packets to the host when the host has
> > in interest in them.
> 
> Granted the boolean mglist might need some changes (esp. with host group 
> leave)
> but I think it can be used to program switchdev for host join/leave, can't
> we adjust its behaviour instead of introducing this complexity and avoid many
> headaches ?

I would like to avoid this complexity as well. I will take a look at
mglist. Thanks for the hint.

Andrew


Re: [Bridge] [PATCH RFC WIP 0/5] IGMP snooping for local traffic

2017-08-26 Thread Nikolay Aleksandrov
On 27.08.2017 01:17, Nikolay Aleksandrov wrote:
> On 26/08/17 23:56, Andrew Lunn wrote:
>> This is a WIP patchset i would like comments on from bridge, switchdev
>> and hardware offload people.
>>
>> The linux bridge supports IGMP snooping. It will listen to IGMP
>> reports on bridge ports and keep track of which groups have been
>> joined on an interface. It will then forward multicast based on this
>> group membership.
>>
>> When the bridge adds or removed groups from an interface, it uses
>> switchdev to request the hardware add an mdb to a port, so the
>> hardware can perform the selective forwarding between ports.
>>
>> What is not covered by the current bridge code, is IGMP joins/leaves
>> from the host on the brX interface. No such monitoring is
> 
> Hi Andrew,
> 
> Have you taken a look at mglist (the boolean, probably needs a rename) ? It 
> is for
> exactly that purpose, to track which groups the bridge is interested in.
> I assume I'm forgetting or missing something here.
> 
>> performed. With a pure software bridge, it is not required. All
>> mulitcast frames are passed to the brX interface, and the network
> 
> If mglist (again the boolean) is false then they won't be passed up.
> 
>> stack filters them, as it does for any interface. However, when
>> hardware offload is involved, things change. We should program the
>> hardware to only send multcast packets to the host when the host has
>> in interest in them.
> 
> Granted the boolean mglist might need some changes (esp. with host group 
> leave)
> but I think it can be used to program switchdev for host join/leave, can't
> we adjust its behaviour instead of introducing this complexity and avoid many
> headaches ?
> 
>>
>> Thus we need to perform IGMP snooping on the brX interface, just like
>> any other interface of the bridge. However, currently the brX
>> interface is missing all the needed data structures to do this. There
>> is no net_bridge_port structure for the brX interface. This strucuture
>> is created when an interface is added to the bridge. But the brX
>> interface is not a member of the bridge. So this patchset makes the
>> brX interface a first class member of the bridge. When the brX
>> interface is opened, the interface is added to the bridge. A
>> net_bridge_port is allocated for it, and IGMP snooping is performed as
>> usual.
> 
> I have actually discussed this idea long time ago with Vlad and it has very 
> nice
> upsides (most important one removing br/port checks everywhere) but it blows 
> up
> fast with special cases for the bridge and things look very similar. You'll 
> need
> to rework the whole bridge and turn every bridge special case into either a 
> port 
> generic one or again bridge-specific special case but with a check for the 
> new flag.
> I will not point out every bug that comes out of this, but registering the 
> bridge
> rx handler to itself is simply wrong on many levels and breaks many setups.

This was a digression about making the bridge a proper port of itself
(e.g. port 0, linked and all), it is only tangential to this
implementation as it doesn't link the new port.

> 
>>
>> There are some complexities here. Some assumptions are broken, like
>> the master interface of a port interface is the bridge interface. The
>> brX interface cannot be its own master. The use of
>> netdev_master_upper_dev_get() within the bridge code has been changed
>> to reflecit this. The bridge receive handler needs to not process
>> frames for the brX interface, etc.
>>
>> The interface downward to the hardware is also an issue. The code
>> presented here is a hack and needs to change. But that is secondary
>> and can be solved once it is agreed how the bridge needs to change to
>> support this use case.
> 
> Definitely agree with this statement. :-)
> 
>>
>> Comment welcome and wanted.
>>
>>  Andrew
>>
>> Andrew Lunn (5):
>>   net: rtnetlink: Handle bridge port without upper device
>>   net: bridge: Skip receive handler on brX interface
>>   net: bridge: Make the brX interface a member of the bridge
>>   net: dsa: HACK: Handle MDB add/remove for none-switch ports
>>   net: dsa: Don't include CPU port when adding MDB to a port
>>
>>  include/linux/if_bridge.h |  1 +
>>  net/bridge/br_device.c| 12 ++--
>>  net/bridge/br_if.c| 37 -
>>  net/bridge/br_input.c |  4 
>>  net/bridge/br_mdb.c   |  2 --
>>  net/bridge/br_multicast.c |  7 ---
>>  net/bridge/br_private.h   |  1 +
>>  net/core/rtnetlink.c  | 23 +--
>>  net/dsa/port.c| 19 +--
>>  net/dsa/switch.c  |  2 +-
>>  10 files changed, 83 insertions(+), 25 deletions(-)
>>
> 



Re: [Bridge] [PATCH RFC WIP 0/5] IGMP snooping for local traffic

2017-08-26 Thread Nikolay Aleksandrov
On 26/08/17 23:56, Andrew Lunn wrote:
> This is a WIP patchset i would like comments on from bridge, switchdev
> and hardware offload people.
> 
> The linux bridge supports IGMP snooping. It will listen to IGMP
> reports on bridge ports and keep track of which groups have been
> joined on an interface. It will then forward multicast based on this
> group membership.
> 
> When the bridge adds or removed groups from an interface, it uses
> switchdev to request the hardware add an mdb to a port, so the
> hardware can perform the selective forwarding between ports.
> 
> What is not covered by the current bridge code, is IGMP joins/leaves
> from the host on the brX interface. No such monitoring is

Hi Andrew,

Have you taken a look at mglist (the boolean, probably needs a rename) ? It is 
for
exactly that purpose, to track which groups the bridge is interested in.
I assume I'm forgetting or missing something here.

> performed. With a pure software bridge, it is not required. All
> mulitcast frames are passed to the brX interface, and the network

If mglist (again the boolean) is false then they won't be passed up.

> stack filters them, as it does for any interface. However, when
> hardware offload is involved, things change. We should program the
> hardware to only send multcast packets to the host when the host has
> in interest in them.

Granted the boolean mglist might need some changes (esp. with host group leave)
but I think it can be used to program switchdev for host join/leave, can't
we adjust its behaviour instead of introducing this complexity and avoid many
headaches ?

> 
> Thus we need to perform IGMP snooping on the brX interface, just like
> any other interface of the bridge. However, currently the brX
> interface is missing all the needed data structures to do this. There
> is no net_bridge_port structure for the brX interface. This strucuture
> is created when an interface is added to the bridge. But the brX
> interface is not a member of the bridge. So this patchset makes the
> brX interface a first class member of the bridge. When the brX
> interface is opened, the interface is added to the bridge. A
> net_bridge_port is allocated for it, and IGMP snooping is performed as
> usual.

I have actually discussed this idea long time ago with Vlad and it has very nice
upsides (most important one removing br/port checks everywhere) but it blows up
fast with special cases for the bridge and things look very similar. You'll need
to rework the whole bridge and turn every bridge special case into either a 
port 
generic one or again bridge-specific special case but with a check for the new 
flag.
I will not point out every bug that comes out of this, but registering the 
bridge
rx handler to itself is simply wrong on many levels and breaks many setups.

> 
> There are some complexities here. Some assumptions are broken, like
> the master interface of a port interface is the bridge interface. The
> brX interface cannot be its own master. The use of
> netdev_master_upper_dev_get() within the bridge code has been changed
> to reflecit this. The bridge receive handler needs to not process
> frames for the brX interface, etc.
> 
> The interface downward to the hardware is also an issue. The code
> presented here is a hack and needs to change. But that is secondary
> and can be solved once it is agreed how the bridge needs to change to
> support this use case.

Definitely agree with this statement. :-)

> 
> Comment welcome and wanted.
> 
>   Andrew
> 
> Andrew Lunn (5):
>   net: rtnetlink: Handle bridge port without upper device
>   net: bridge: Skip receive handler on brX interface
>   net: bridge: Make the brX interface a member of the bridge
>   net: dsa: HACK: Handle MDB add/remove for none-switch ports
>   net: dsa: Don't include CPU port when adding MDB to a port
> 
>  include/linux/if_bridge.h |  1 +
>  net/bridge/br_device.c| 12 ++--
>  net/bridge/br_if.c| 37 -
>  net/bridge/br_input.c |  4 
>  net/bridge/br_mdb.c   |  2 --
>  net/bridge/br_multicast.c |  7 ---
>  net/bridge/br_private.h   |  1 +
>  net/core/rtnetlink.c  | 23 +--
>  net/dsa/port.c| 19 +--
>  net/dsa/switch.c  |  2 +-
>  10 files changed, 83 insertions(+), 25 deletions(-)
> 



[Bridge] [PATCH RFC WIP 0/5] IGMP snooping for local traffic

2017-08-26 Thread Andrew Lunn
This is a WIP patchset i would like comments on from bridge, switchdev
and hardware offload people.

The linux bridge supports IGMP snooping. It will listen to IGMP
reports on bridge ports and keep track of which groups have been
joined on an interface. It will then forward multicast based on this
group membership.

When the bridge adds or removed groups from an interface, it uses
switchdev to request the hardware add an mdb to a port, so the
hardware can perform the selective forwarding between ports.

What is not covered by the current bridge code, is IGMP joins/leaves
from the host on the brX interface. No such monitoring is
performed. With a pure software bridge, it is not required. All
mulitcast frames are passed to the brX interface, and the network
stack filters them, as it does for any interface. However, when
hardware offload is involved, things change. We should program the
hardware to only send multcast packets to the host when the host has
in interest in them.

Thus we need to perform IGMP snooping on the brX interface, just like
any other interface of the bridge. However, currently the brX
interface is missing all the needed data structures to do this. There
is no net_bridge_port structure for the brX interface. This strucuture
is created when an interface is added to the bridge. But the brX
interface is not a member of the bridge. So this patchset makes the
brX interface a first class member of the bridge. When the brX
interface is opened, the interface is added to the bridge. A
net_bridge_port is allocated for it, and IGMP snooping is performed as
usual.

There are some complexities here. Some assumptions are broken, like
the master interface of a port interface is the bridge interface. The
brX interface cannot be its own master. The use of
netdev_master_upper_dev_get() within the bridge code has been changed
to reflecit this. The bridge receive handler needs to not process
frames for the brX interface, etc.

The interface downward to the hardware is also an issue. The code
presented here is a hack and needs to change. But that is secondary
and can be solved once it is agreed how the bridge needs to change to
support this use case.

Comment welcome and wanted.

Andrew

Andrew Lunn (5):
  net: rtnetlink: Handle bridge port without upper device
  net: bridge: Skip receive handler on brX interface
  net: bridge: Make the brX interface a member of the bridge
  net: dsa: HACK: Handle MDB add/remove for none-switch ports
  net: dsa: Don't include CPU port when adding MDB to a port

 include/linux/if_bridge.h |  1 +
 net/bridge/br_device.c| 12 ++--
 net/bridge/br_if.c| 37 -
 net/bridge/br_input.c |  4 
 net/bridge/br_mdb.c   |  2 --
 net/bridge/br_multicast.c |  7 ---
 net/bridge/br_private.h   |  1 +
 net/core/rtnetlink.c  | 23 +--
 net/dsa/port.c| 19 +--
 net/dsa/switch.c  |  2 +-
 10 files changed, 83 insertions(+), 25 deletions(-)

-- 
2.14.1