[net-next,rfc] net: bridge: mdb: Extend with multicast LLADDR

2019-08-01 Thread Horatiu Vultur
Based on the discussion on the topic[1], we extend the functionality of
the 'bridge mdb' command to accept link layer multicast address. This
required only few changes and it fits nicely with the current
implementation and also the old implementation was not changed.

In this patch, we have added a MAC address to the union in 'struct br_ip'.
If we want to continue like this we should properly rename the structure as
it is not an IP any more.

To create a group for two of the front ports the following entries can
be added:
bridge mdb add dev br0 port eth0 grp 01:00:00:00:00:04 permanent vid 1
bridge mdb add dev br0 port eth1 grp 01:00:00:00:00:04 permanent vid 1

Now the entries will be display as following:
dev br0 port eth0 grp 01:00:00:00:00:04 permanent offload vid 1
dev br0 port eth1 grp 01:00:00:00:00:04 permanent offload vid 1

This requires changes to iproute2 as well, see the follogin patch for that.

Now if frame with dmac '01:00:00:00:00:04' will arrive at one of the front
ports. If we have HW offload support, then the frame will be forwarded by
the switch, and need not to go to the CPU. In a pure SW world, the frame is
forwarded by the SW bridge, which will flooded it only the ports which are
part of the group.

So far so good. This is an important part of the problem we wanted to solve.

But, there is one drawback of this approach. If you want to add two of the
front ports and br0 to receive the frame then I can't see a way of doing it
with the bridge mdb command. To do that it requireds many more changes to
the existing code.

Example:
bridge mdb add dev br0 port eth0 grp 01:00:00:00:00:04 permanent vid 1
bridge mdb add dev br0 port eth1 grp 01:00:00:00:00:04 permanent vid 1
bridge mdb add dev br0 port br0 grp 01:00:00:00:00:04 permanent vid 1 // This 
looks wrong.

We believe we come a long way by re-using the facilities in MDB (thanks for
convincing us in doing this), but we are still not completely happy with
the result.

If I only look at the user-interface (iproute2), and completely ignore all
the implementation details, then I still think that the FDB sub command is
more suitable for this. Today, FDB is for unicast, and MDB is for multicast.
We could change this such that MDB is for IP-multicast, and FDB is
forwarding in general (we should prevent FDB in install IP-multicast rules,
but we suggest to allow it to install MAC-Multicast rules).

The example from above would now look like this:
bridge fdb add 01:00:00:00:00:04 dev eth0 static self master
bridge fdb add 01:00:00:00:00:04 dev eth1 static self master
bridge fdb add 01:00:00:00:00:04 dev br0 static self master

It would be very similar to the "bridge vlan" command which also allow to
specify groups with and without br0.

Next observation is on the hashing data structure. In 'net_bridge_mdb_entry'
we only use/need the net_bridge_port_group/ports member (and the MAC
address, which we hacked into the br_ip struct) when we are a L2-multicast
entry. This type allow use to re-use the br_multicast_flood function
which does a lot of the work for us.

Also, the key used to do the lookup in the FDB is already a MAC address
(no need to hack the br_ip).

Regarding the events generated by switchdev: In the current proposal this
is a SWITCHDEV_OBJ_ID_PORT_MDB which refer to the switchdev_obj_port_mdb
type. If we changed to use the SWITCHDEV_FDB_ADD_TO_BRIDGE event, then
the associated data type would be switchdev_notifier_fdb_info - which also
has the information we need.

Using the FDB database, can still reuse the net_bridge_port_group type (and
associated functions), and I other parts from the MDB call graph as well.

If this sounds appealing, then we can do a proposal based on the idea.

If the MDB patch is what we can agree on, then we will continue polish this
and look for a solution to control the inclusion/exclusion of the br0
device (hints will be most appreciated).

[1] https://patchwork.ozlabs.org/patch/1136878/

Signed-off-by: Horatiu Vultur 
Co-developed-by: Allan W. Nielsen 
Signed-off-by: Allan W. Nielsen 
---
 include/linux/if_bridge.h  |  1 +
 include/uapi/linux/if_bridge.h |  1 +
 net/bridge/br_device.c |  7 +--
 net/bridge/br_forward.c|  3 ++-
 net/bridge/br_input.c  | 13 ++--
 net/bridge/br_mdb.c| 47 +++---
 net/bridge/br_multicast.c  |  4 +++-
 net/bridge/br_private.h|  3 ++-
 8 files changed, 64 insertions(+), 15 deletions(-)

diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index f3fab5d..07b092a 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -16,6 +16,7 @@
 struct br_ip {
union {
__be32  ip4;
+   __u8mac[ETH_ALEN];
 #if IS_ENABLED(CONFIG_IPV6)
struct in6_addr ip6;
 #endif
diff --git a/include/uapi/linux/if_bridge.h b/include/uapi/linux/if_bridge.h
index 773e476..e535a81 100644
--- a/include/uapi/linux/if_bridge.h
+++ b/inc

[iproute2,rfc] bridge: mdb: Extend with with LLADDR

2019-08-01 Thread Horatiu Vultur
Extend bridge mdb command to accept as group also link layer multicast
addresss. The old behaviour is not change.

To add new mdb entry:
bridge mdb add dev br0 port eth0 grp 11:22:33:44:55:66 permanent vid 1

To display existing entries:
bridge mdb
dev br0 port eth4 grp 01:00:00:00:00:01 permanent offload vid 1

Signed-off-by: Horatiu Vultur 
---
 bridge/mdb.c   | 29 -
 include/uapi/linux/if_bridge.h |  1 +
 2 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/bridge/mdb.c b/bridge/mdb.c
index 928ae56..057c0b6 100644
--- a/bridge/mdb.c
+++ b/bridge/mdb.c
@@ -138,9 +138,21 @@ static void print_mdb_entry(FILE *f, int ifindex, const 
struct br_mdb_entry *e,
print_string(PRINT_ANY, "port", " port %s",
 ll_index_to_name(e->ifindex));
 
-   print_color_string(PRINT_ANY, ifa_family_color(af),
-   "grp", " grp %s",
-   inet_ntop(af, src, abuf, sizeof(abuf)));
+   if (e->addr.proto == htons(ETH_P_IP) ||
+   e->addr.proto == htons(ETH_P_IPV6)) {
+   print_color_string(PRINT_ANY, ifa_family_color(af),
+  "grp", " grp %s",
+  inet_ntop(af, src, abuf, sizeof(abuf)));
+   } else {
+   const char *lladdr;
+   SPRINT_BUF(b1);
+
+   lladdr = ll_addr_n2a(e->addr.u.mac, ETH_ALEN, 0, b1,
+sizeof(b1));
+
+   print_color_string(PRINT_ANY, COLOR_MAC, "grp", " grp %s",
+  lladdr);
+   }
 
print_string(PRINT_ANY, "state", " %s",
   (e->state & MDB_PERMANENT) ? "permanent" : "temp");
@@ -380,6 +392,7 @@ static int mdb_modify(int cmd, int flags, int argc, char 
**argv)
};
struct br_mdb_entry entry = {};
char *d = NULL, *p = NULL, *grp = NULL;
+   char abuf[ETH_ALEN];
short vid = 0;
 
while (argc > 0) {
@@ -422,8 +435,14 @@ static int mdb_modify(int cmd, int flags, int argc, char 
**argv)
 
if (!inet_pton(AF_INET, grp, &entry.addr.u.ip4)) {
if (!inet_pton(AF_INET6, grp, &entry.addr.u.ip6)) {
-   fprintf(stderr, "Invalid address \"%s\"\n", grp);
-   return -1;
+   if (sscanf(grp, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
+  abuf, abuf+1, abuf+2, abuf+3, abuf+4,
+  abuf+5) != 6) {
+   fprintf(stderr, "Invalid address \"%s\"\n", 
grp);
+   return -1;
+   }
+   memcpy(entry.addr.u.mac, abuf, ETH_ALEN);
+   entry.addr.proto = 0;
} else
entry.addr.proto = htons(ETH_P_IPV6);
} else
diff --git a/include/uapi/linux/if_bridge.h b/include/uapi/linux/if_bridge.h
index 04f763c..88aad9c 100644
--- a/include/uapi/linux/if_bridge.h
+++ b/include/uapi/linux/if_bridge.h
@@ -243,6 +243,7 @@ struct br_mdb_entry {
union {
__be32  ip4;
struct in6_addr ip6;
+   __u8mac[ETH_ALEN];
} u;
__be16  proto;
} addr;
-- 
2.7.4



[Bridge] Fw: [Bug 204389] New: bridge: AF_BRIDGE NEWNEIGH netlink message with ifidx is zero

2019-08-01 Thread Stephen Hemminger



Begin forwarded message:

Date: Wed, 31 Jul 2019 09:32:20 +
From: bugzilla-dae...@bugzilla.kernel.org
To: step...@networkplumber.org
Subject: [Bug 204389] New: bridge: AF_BRIDGE NEWNEIGH netlink message with 
ifidx is zero


https://bugzilla.kernel.org/show_bug.cgi?id=204389

Bug ID: 204389
   Summary: bridge: AF_BRIDGE NEWNEIGH netlink message with ifidx
is zero
   Product: Networking
   Version: 2.5
Kernel Version: 4.19
  Hardware: All
OS: Linux
  Tree: Mainline
Status: NEW
  Severity: normal
  Priority: P1
 Component: Other
  Assignee: step...@networkplumber.org
  Reporter: michael-...@fami-braun.de
Regression: No

My application is receiving the following netlink messages from kernel:

-- BEGIN NETLINK
MESSAGE ---
 [NETLINK HEADER] 16 octets
 .nlmsg_len = 76
 .type = 28 
 .flags = 0 <>
 .seq = 0
 .port = 0
 [PAYLOAD] 12 octets
 07 00 00 00 00 00 00 00 00 80 00 00 
 [ATTR 02] 6 octets
 4e db c4 30 92 f4 N..0..
 [PADDING] 2 octets
 00 00 ..
 .flags = 0 <>
 .seq = 0
 .port = 0
 [PAYLOAD] 12 octets
 07 00 00 00 00 00 00 00 00 80 00 00 
 [ATTR 02] 6 octets
 4e db c4 30 92 f4 N..0..
 [PADDING] 2 octets
 00 00 ..
 [ATTR 09] 4 octets
 00 00 00 00 
 [ATTR 03] 16 octets
 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
 [ATTR 05] 2 octets
 00 01 ..
 [PADDING] 2 octets
 00 00 ..
--- END NETLINK MESSAGE ---

Clearly, ifidex is zero here, so the application cannot correlate this NEWNEIGH
message to  any interface.

Adding a WARN_ON (ifidx=0) to fdb_fill_info gives the following backtrace:

[ 43.071801] [efb47c10] [c0510580] fdb_fill_info+0x180/0x22c (unreliable)
[ 43.071829] [efb47c50] [c0510684] fdb_notify.isra.22+0x58/0xd8
[ 43.071849] [efb47c70] [c0511b44] fdb_insert+0xa4/0x108
[ 43.071870] [efb47c90] [c051254c] br_fdb_insert+0x40/0x64
[ 43.071891] [efb47cb0] [c052b640] __vlan_add+0xe50/0xf24
[ 43.071911] [efb47d20] [c052a77c] br_vlan_add+0x528/0x59c
[ 43.071931] [efb47d70] [c052c38c] br_vlan_init+0x90/0xd4
[ 43.071951] [efb47da0] [c050f96c] br_dev_init+0xc8/0x170
[ 43.071983] [efb47de0] [c03cff8c] register_netdevice+0x148/0x6e8
[ 43.072003] [efb47e20] [c03d0554] register_netdev+0x28/0x50
[ 43.072026] [efb47e30] [c05154f4] br_add_bridge+0x44/0x78
[ 43.072047] [efb47e40] [c0517ce8] br_ioctl_deviceless_stub+0x2c4/0x2f0
[ 43.072069] [efb47e80] [c03a372c] sock_ioctl+0x17c/0x3d8
[ 43.072091] [efb47ed0] [c017655c] do_vfs_ioctl+0x7ac/0x894
[ 43.072111] [efb47f20] [c0176684] ksys_ioctl+0x40/0x74
[ 43.072140] [efb47f40] [c0011288] ret_from_syscall+0x0/0x3c

So clearly br_vlan_init triggers a NEWNEIGH message before bridge creation has
been completed and an ifindex assigned.

-- 
You are receiving this mail because:
You are the assignee for the bug.


Re: [net-next,rfc] net: bridge: mdb: Extend with multicast LLADDR

2019-08-01 Thread Nikolay Aleksandrov
Hi Horatiu,
Overall I think MDB is the right way, we'd like to contain the multicast code.
A few comments below.

On 01/08/2019 15:50, Horatiu Vultur wrote:
> Based on the discussion on the topic[1], we extend the functionality of
> the 'bridge mdb' command to accept link layer multicast address. This
> required only few changes and it fits nicely with the current
> implementation and also the old implementation was not changed.
> 
> In this patch, we have added a MAC address to the union in 'struct br_ip'.
> If we want to continue like this we should properly rename the structure as
> it is not an IP any more.
> 
> To create a group for two of the front ports the following entries can
> be added:
> bridge mdb add dev br0 port eth0 grp 01:00:00:00:00:04 permanent vid 1
> bridge mdb add dev br0 port eth1 grp 01:00:00:00:00:04 permanent vid 1
> 
> Now the entries will be display as following:
> dev br0 port eth0 grp 01:00:00:00:00:04 permanent offload vid 1
> dev br0 port eth1 grp 01:00:00:00:00:04 permanent offload vid 1
> 
> This requires changes to iproute2 as well, see the follogin patch for that.
> 
> Now if frame with dmac '01:00:00:00:00:04' will arrive at one of the front
> ports. If we have HW offload support, then the frame will be forwarded by
> the switch, and need not to go to the CPU. In a pure SW world, the frame is
> forwarded by the SW bridge, which will flooded it only the ports which are
> part of the group.
> 
> So far so good. This is an important part of the problem we wanted to solve.
> 
> But, there is one drawback of this approach. If you want to add two of the
> front ports and br0 to receive the frame then I can't see a way of doing it
> with the bridge mdb command. To do that it requireds many more changes to
> the existing code.
> 
> Example:
> bridge mdb add dev br0 port eth0 grp 01:00:00:00:00:04 permanent vid 1
> bridge mdb add dev br0 port eth1 grp 01:00:00:00:00:04 permanent vid 1
> bridge mdb add dev br0 port br0 grp 01:00:00:00:00:04 permanent vid 1 // This 
> looks wrong.
> 
> We believe we come a long way by re-using the facilities in MDB (thanks for
> convincing us in doing this), but we are still not completely happy with
> the result.
> 

Just add self argument for the bridge mdb command, no need to specify it twice.

> If I only look at the user-interface (iproute2), and completely ignore all
> the implementation details, then I still think that the FDB sub command is
> more suitable for this. Today, FDB is for unicast, and MDB is for multicast.
> We could change this such that MDB is for IP-multicast, and FDB is
> forwarding in general (we should prevent FDB in install IP-multicast rules,
> but we suggest to allow it to install MAC-Multicast rules).
> 
> The example from above would now look like this:
> bridge fdb add 01:00:00:00:00:04 dev eth0 static self master
> bridge fdb add 01:00:00:00:00:04 dev eth1 static self master
> bridge fdb add 01:00:00:00:00:04 dev br0 static self master
> 
> It would be very similar to the "bridge vlan" command which also allow to
> specify groups with and without br0.
> 
> Next observation is on the hashing data structure. In 'net_bridge_mdb_entry'
> we only use/need the net_bridge_port_group/ports member (and the MAC
> address, which we hacked into the br_ip struct) when we are a L2-multicast
> entry. This type allow use to re-use the br_multicast_flood function
> which does a lot of the work for us.
> 
> Also, the key used to do the lookup in the FDB is already a MAC address
> (no need to hack the br_ip).
> 

Look at it as extending br_ip, it's not a hack but a valid mcast use-case.
In fact br_ip is an internal structure which can easily be renamed.

> Regarding the events generated by switchdev: In the current proposal this
> is a SWITCHDEV_OBJ_ID_PORT_MDB which refer to the switchdev_obj_port_mdb
> type. If we changed to use the SWITCHDEV_FDB_ADD_TO_BRIDGE event, then
> the associated data type would be switchdev_notifier_fdb_info - which also
> has the information we need.
> 
> Using the FDB database, can still reuse the net_bridge_port_group type (and
> associated functions), and I other parts from the MDB call graph as well.
> 

We don't want to mix these.

> If this sounds appealing, then we can do a proposal based on the idea.
> 
> If the MDB patch is what we can agree on, then we will continue polish this
> and look for a solution to control the inclusion/exclusion of the br0
> device (hints will be most appreciated).
> 

Yes, please. Let's work on this implementation. Some code comments below.

> [1] https://patchwork.ozlabs.org/patch/1136878/
> 
> Signed-off-by: Horatiu Vultur 
> Co-developed-by: Allan W. Nielsen 
> Signed-off-by: Allan W. Nielsen 
> ---
>  include/linux/if_bridge.h  |  1 +
>  include/uapi/linux/if_bridge.h |  1 +
>  net/bridge/br_device.c |  7 +--
>  net/bridge/br_forward.c|  3 ++-
>  net/bridge/br_input.c  | 13 ++--
>  net/bridge/br_mdb.c| 47 
> 

Re: [net-next,rfc] net: bridge: mdb: Extend with multicast LLADDR

2019-08-01 Thread Nikolay Aleksandrov
On 01/08/2019 17:07, Nikolay Aleksandrov wrote:
> Hi Horatiu,
> Overall I think MDB is the right way, we'd like to contain the multicast code.
> A few comments below.
> 
> On 01/08/2019 15:50, Horatiu Vultur wrote:
[snip]
>>
>> Signed-off-by: Horatiu Vultur 
>> Co-developed-by: Allan W. Nielsen 
>> Signed-off-by: Allan W. Nielsen 
>> ---
>>  include/linux/if_bridge.h  |  1 +
>>  include/uapi/linux/if_bridge.h |  1 +
>>  net/bridge/br_device.c |  7 +--
>>  net/bridge/br_forward.c|  3 ++-
>>  net/bridge/br_input.c  | 13 ++--
>>  net/bridge/br_mdb.c| 47 
>> +++---
>>  net/bridge/br_multicast.c  |  4 +++-
>>  net/bridge/br_private.h|  3 ++-
>>  8 files changed, 64 insertions(+), 15 deletions(-)
>>
> 
> Overall I don't think we need this BR_PKT_MULTICAST_L2, we could do the below 
> much
> easier and without the checks if you use a per-mdb flag that says it's to be 
> treated
> as a MULTICAST_L2 entry. Then you remove all of the BR_PKT_MULTICAST_L2 code 
> (see the
> attached patch based on this one for example). and continue processing it as 
> it is processed today.
> We'll keep the fast-path with minimal number of new conditionals.
> 
> Something like the patch I've attached to this reply, note that it is not 
> complete
> just to show the intent, you'll have to re-work br_mdb_notify() to make it 
> proper
> and there're most probably other details I've missed. If you find even 
> better/less
> complex way to do it then please do.
> 
> Cheers,
>  Nik

Oops, I sent back your original patch. Here's the actually changed version
I was talking about.

Thanks,
 Nik



>From 01dbe0b22da96efcc6fbf46bd3b22353fca32f5d Mon Sep 17 00:00:00 2001
From: Horatiu Vultur 
Date: Thu, 1 Aug 2019 14:50:40 +0200
Subject: [RFC incomplete] net: bridge: mdb: Extend with multicast LLADDR

Based on the discussion on the topic[1], we extend the functionality of
the 'bridge mdb' command to accept link layer multicast address. This
required only few changes and it fits nicely with the current
implementation and also the old implementation was not changed.

In this patch, we have added a MAC address to the union in 'struct br_ip'.
If we want to continue like this we should properly rename the structure as
it is not an IP any more.

To create a group for two of the front ports the following entries can
be added:
bridge mdb add dev br0 port eth0 grp 01:00:00:00:00:04 permanent vid 1
bridge mdb add dev br0 port eth1 grp 01:00:00:00:00:04 permanent vid 1

Now the entries will be display as following:
dev br0 port eth0 grp 01:00:00:00:00:04 permanent offload vid 1
dev br0 port eth1 grp 01:00:00:00:00:04 permanent offload vid 1

This requires changes to iproute2 as well, see the follogin patch for that.

Now if frame with dmac '01:00:00:00:00:04' will arrive at one of the front
ports. If we have HW offload support, then the frame will be forwarded by
the switch, and need not to go to the CPU. In a pure SW world, the frame is
forwarded by the SW bridge, which will flooded it only the ports which are
part of the group.

So far so good. This is an important part of the problem we wanted to solve.

But, there is one drawback of this approach. If you want to add two of the
front ports and br0 to receive the frame then I can't see a way of doing it
with the bridge mdb command. To do that it requireds many more changes to
the existing code.

Example:
bridge mdb add dev br0 port eth0 grp 01:00:00:00:00:04 permanent vid 1
bridge mdb add dev br0 port eth1 grp 01:00:00:00:00:04 permanent vid 1
bridge mdb add dev br0 port br0 grp 01:00:00:00:00:04 permanent vid 1 // This looks wrong.

We believe we come a long way by re-using the facilities in MDB (thanks for
convincing us in doing this), but we are still not completely happy with
the result.

If I only look at the user-interface (iproute2), and completely ignore all
the implementation details, then I still think that the FDB sub command is
more suitable for this. Today, FDB is for unicast, and MDB is for multicast.
We could change this such that MDB is for IP-multicast, and FDB is
forwarding in general (we should prevent FDB in install IP-multicast rules,
but we suggest to allow it to install MAC-Multicast rules).

The example from above would now look like this:
bridge fdb add 01:00:00:00:00:04 dev eth0 static self master
bridge fdb add 01:00:00:00:00:04 dev eth1 static self master
bridge fdb add 01:00:00:00:00:04 dev br0 static self master

It would be very similar to the "bridge vlan" command which also allow to
specify groups with and without br0.

Next observation is on the hashing data structure. In 'net_bridge_mdb_entry'
we only use/need the net_bridge_port_group/ports member (and the MAC
address, which we hacked into the br_ip struct) when we are a L2-multicast
entry. This type allow use to re-use the br_multicast_flood function
which does a lot of the work for us.

Also, the key used to do the look

Re: [net-next,rfc] net: bridge: mdb: Extend with multicast LLADDR

2019-08-01 Thread Nikolay Aleksandrov
On 01/08/2019 17:11, Nikolay Aleksandrov wrote:
> On 01/08/2019 17:07, Nikolay Aleksandrov wrote:
>> Hi Horatiu,
>> Overall I think MDB is the right way, we'd like to contain the multicast 
>> code.
>> A few comments below.
>>
>> On 01/08/2019 15:50, Horatiu Vultur wrote:
> [snip]
>>>
>>> Signed-off-by: Horatiu Vultur 
>>> Co-developed-by: Allan W. Nielsen 
>>> Signed-off-by: Allan W. Nielsen 
>>> ---
>>>  include/linux/if_bridge.h  |  1 +
>>>  include/uapi/linux/if_bridge.h |  1 +
>>>  net/bridge/br_device.c |  7 +--
>>>  net/bridge/br_forward.c|  3 ++-
>>>  net/bridge/br_input.c  | 13 ++--
>>>  net/bridge/br_mdb.c| 47 
>>> +++---
>>>  net/bridge/br_multicast.c  |  4 +++-
>>>  net/bridge/br_private.h|  3 ++-
>>>  8 files changed, 64 insertions(+), 15 deletions(-)
>>>
>>
>> Overall I don't think we need this BR_PKT_MULTICAST_L2, we could do the 
>> below much
>> easier and without the checks if you use a per-mdb flag that says it's to be 
>> treated
>> as a MULTICAST_L2 entry. Then you remove all of the BR_PKT_MULTICAST_L2 code 
>> (see the
>> attached patch based on this one for example). and continue processing it as 
>> it is processed today.
>> We'll keep the fast-path with minimal number of new conditionals.
>>
>> Something like the patch I've attached to this reply, note that it is not 
>> complete
>> just to show the intent, you'll have to re-work br_mdb_notify() to make it 
>> proper
>> and there're most probably other details I've missed. If you find even 
>> better/less
>> complex way to do it then please do.
>>
>> Cheers,
>>  Nik
> 
> Oops, I sent back your original patch. Here's the actually changed version
> I was talking about.
> 
> Thanks,
>  Nik
> 
> 
> 

The querier exists change is a hack just to get the point, I'd prefer
to re-write that portion in a better way which makes more sense, i.e.
get that check out of there since it doesn't mean that an actual querier
exists. :)



Re: [PATCH] net: bridge: Allow bridge to joing multicast groups

2019-08-01 Thread Allan W. Nielsen
The 07/26/2019 15:31, Nikolay Aleksandrov wrote:
...

> You know that in order to not run in promisc mode you'll have to disable
> port flooding and port learning, right ? Otherwise they're always put in 
> promisc.
Yes, we have spend some time looking at nbp_update_port_count and trying to
understand the reasoning behind it.

Our understanding is that this is to make it work with a pure SW bridge
implementation, and this is actually an optimization to allow disable promisc
mode if all forwarding is static (no flooding and no learning).

We also noticed that the Ocelot and the Rocker drivers avoids this "issue" by
not implementing promisc mode.

But promisc mode is a really nice feature for debugging, and we would actually
like to have it, and when HW that can do learning/flooding it does not seem to
be necessary.

I tried to understand how this is handled in the Mellanox drivers, but gave up.
Too big, and we lack the insight in their design.

Do you know if there are better ways to prevent switchdev-offloaded-slave
interfaces to go to promisc mode?

/Allan



Re: [net-next,rfc] net: bridge: mdb: Extend with multicast LLADDR

2019-08-01 Thread Nikolay Aleksandrov
On 01/08/2019 17:15, Nikolay Aleksandrov wrote:
> On 01/08/2019 17:11, Nikolay Aleksandrov wrote:
>> On 01/08/2019 17:07, Nikolay Aleksandrov wrote:
>>> Hi Horatiu,
>>> Overall I think MDB is the right way, we'd like to contain the multicast 
>>> code.
>>> A few comments below.
>>>
>>> On 01/08/2019 15:50, Horatiu Vultur wrote:
>> [snip]

 Signed-off-by: Horatiu Vultur 
 Co-developed-by: Allan W. Nielsen 
 Signed-off-by: Allan W. Nielsen 
 ---
  include/linux/if_bridge.h  |  1 +
  include/uapi/linux/if_bridge.h |  1 +
  net/bridge/br_device.c |  7 +--
  net/bridge/br_forward.c|  3 ++-
  net/bridge/br_input.c  | 13 ++--
  net/bridge/br_mdb.c| 47 
 +++---
  net/bridge/br_multicast.c  |  4 +++-
  net/bridge/br_private.h|  3 ++-
  8 files changed, 64 insertions(+), 15 deletions(-)

>>>
>>> Overall I don't think we need this BR_PKT_MULTICAST_L2, we could do the 
>>> below much
>>> easier and without the checks if you use a per-mdb flag that says it's to 
>>> be treated
>>> as a MULTICAST_L2 entry. Then you remove all of the BR_PKT_MULTICAST_L2 
>>> code (see the
>>> attached patch based on this one for example). and continue processing it 
>>> as it is processed today.
>>> We'll keep the fast-path with minimal number of new conditionals.
>>>
>>> Something like the patch I've attached to this reply, note that it is not 
>>> complete
>>> just to show the intent, you'll have to re-work br_mdb_notify() to make it 
>>> proper
>>> and there're most probably other details I've missed. If you find even 
>>> better/less
>>> complex way to do it then please do.
>>>
>>> Cheers,
>>>  Nik
>>
>> Oops, I sent back your original patch. Here's the actually changed version
>> I was talking about.
>>
>> Thanks,
>>  Nik
>>
>>
>>
> 
> The querier exists change is a hack just to get the point, I'd prefer
> to re-write that portion in a better way which makes more sense, i.e.
> get that check out of there since it doesn't mean that an actual querier
> exists. :)
> 

TBH, I'm inclined to just use proto == 0 *internally* as this even though it's 
reserved,
we're not putting it on the wire or using it to construct packets, it's just 
internal
use which can change into a flag if some day that value needs to be used. 
Obviously
to user-space we need it to be a flag, we can't expose or configure it as a 
proto value
without making it permanent uapi. I haven't looked into detail how feasible 
this is,
just a thought that might make it simpler.





Re: [PATCH] net: bridge: Allow bridge to joing multicast groups

2019-08-01 Thread Vivien Didelot
Hi Horatiu,

On Thu, 25 Jul 2019 13:44:04 +0200, Horatiu Vultur 
 wrote:
> There is no way to configure the bridge, to receive only specific link
> layer multicast addresses. From the description of the command 'bridge
> fdb append' is supposed to do that, but there was no way to notify the
> network driver that the bridge joined a group, because LLADDR was added
> to the unicast netdev_hw_addr_list.
> 
> Therefore update fdb_add_entry to check if the NLM_F_APPEND flag is set
> and if the source is NULL, which represent the bridge itself. Then add
> address to multicast netdev_hw_addr_list for each bridge interfaces.
> And then the .ndo_set_rx_mode function on the driver is called. To notify
> the driver that the list of multicast mac addresses changed.
> 
> Signed-off-by: Horatiu Vultur 
> ---
>  net/bridge/br_fdb.c | 49 ++---
>  1 file changed, 46 insertions(+), 3 deletions(-)
> 
> diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
> index b1d3248..d93746d 100644
> --- a/net/bridge/br_fdb.c
> +++ b/net/bridge/br_fdb.c
> @@ -175,6 +175,29 @@ static void fdb_add_hw_addr(struct net_bridge *br, const 
> unsigned char *addr)
>   }
>  }
>  
> +static void fdb_add_hw_maddr(struct net_bridge *br, const unsigned char 
> *addr)
> +{
> + int err;
> + struct net_bridge_port *p;
> +
> + ASSERT_RTNL();
> +
> + list_for_each_entry(p, &br->port_list, list) {
> + if (!br_promisc_port(p)) {
> + err = dev_mc_add(p->dev, addr);
> + if (err)
> + goto undo;
> + }
> + }
> +
> + return;
> +undo:
> + list_for_each_entry_continue_reverse(p, &br->port_list, list) {
> + if (!br_promisc_port(p))
> + dev_mc_del(p->dev, addr);
> + }
> +}
> +
>  /* When a static FDB entry is deleted, the HW address from that entry is
>   * also removed from the bridge private HW address list and updates all
>   * the ports with needed information.
> @@ -192,13 +215,27 @@ static void fdb_del_hw_addr(struct net_bridge *br, 
> const unsigned char *addr)
>   }
>  }
>  
> +static void fdb_del_hw_maddr(struct net_bridge *br, const unsigned char 
> *addr)
> +{
> + struct net_bridge_port *p;
> +
> + ASSERT_RTNL();
> +
> + list_for_each_entry(p, &br->port_list, list) {
> + if (!br_promisc_port(p))
> + dev_mc_del(p->dev, addr);
> + }
> +}
> +
>  static void fdb_delete(struct net_bridge *br, struct net_bridge_fdb_entry *f,
>  bool swdev_notify)
>  {
>   trace_fdb_delete(br, f);
>  
> - if (f->is_static)
> + if (f->is_static) {
>   fdb_del_hw_addr(br, f->key.addr.addr);
> + fdb_del_hw_maddr(br, f->key.addr.addr);
> + }
>  
>   hlist_del_init_rcu(&f->fdb_node);
>   rhashtable_remove_fast(&br->fdb_hash_tbl, &f->rhnode,
> @@ -843,13 +880,19 @@ static int fdb_add_entry(struct net_bridge *br, struct 
> net_bridge_port *source,
>   fdb->is_local = 1;
>   if (!fdb->is_static) {
>   fdb->is_static = 1;
> - fdb_add_hw_addr(br, addr);
> + if (flags & NLM_F_APPEND && !source)
> + fdb_add_hw_maddr(br, addr);
> + else
> + fdb_add_hw_addr(br, addr);
>   }
>   } else if (state & NUD_NOARP) {
>   fdb->is_local = 0;
>   if (!fdb->is_static) {
>   fdb->is_static = 1;
> - fdb_add_hw_addr(br, addr);
> + if (flags & NLM_F_APPEND && !source)
> + fdb_add_hw_maddr(br, addr);
> + else
> + fdb_add_hw_addr(br, addr);
>   }
>   } else {
>   fdb->is_local = 0;
> -- 
> 2.7.4
> 

I'm a bit late in the conversation. Isn't this what you want?

ip address add  dev br0 autojoin


Thanks,
Vivien


Re: [PATCH] net: bridge: Allow bridge to joing multicast groups

2019-08-01 Thread Horatiu Vultur
Hi Vivien,

The 08/01/2019 15:17, Vivien Didelot wrote:
> External E-Mail
> 
> I'm a bit late in the conversation. Isn't this what you want?
> 
> ip address add  dev br0 autojoin
> 

Not really, I was looking in a way to register the ports to link layer
multicast address. Sorry for the confusion, my description of the patch
was totally missleaning.

If you follow this thread you will get a better idea what we wanted to
achive. We got some really good comments and based on these we send a
RFC[1]. 

> 
> Thanks,
> Vivien

[1] https://patchwork.ozlabs.org/patch/1140468/

-- 
/Horatiu


Re: [PATCH] net: bridge: Allow bridge to joing multicast groups

2019-08-01 Thread Vivien Didelot
Hi Horatiu,

On Thu, 1 Aug 2019 21:48:02 +0200, Horatiu Vultur 
 wrote:
> > I'm a bit late in the conversation. Isn't this what you want?
> > 
> > ip address add  dev br0 autojoin
> > 
> 
> Not really, I was looking in a way to register the ports to link layer
> multicast address. Sorry for the confusion, my description of the patch
> was totally missleaning.
> 
> If you follow this thread you will get a better idea what we wanted to
> achive. We got some really good comments and based on these we send a
> RFC[1]. 

OK great! Keep me in the loop, I enjoy bridge and multicast very much ;-)


Thanks,

Vivien


Re: [Bridge] [PATCH net v3] net: bridge: move vlan init/deinit to NETDEV_REGISTER/UNREGISTER

2019-08-01 Thread Nikolay Aleksandrov
On 8/1/19 1:49 AM, Nikolay Aleksandrov wrote:
[snip]
> [0] https://bugzilla.kernel.org/show_bug.cgi?id=204389
> 
> Reported-by: michael-dev 
> Fixes: 5be5a2df40f0 ("bridge: Add filtering support for default_pvid")
> Signed-off-by: Nikolay Aleksandrov 
> ---
> I tried a few different approaches to resolve this but they were all
> unsuitable for some kernels, this approach can go to stables easily
> and IMO is the way this had to be done from the start. Alternatively
> we could move only the br_vlan_add and pair it with a br_vlan_del of
> default_pvid on the same events, but I don't think it hurts to move
> the whole init/deinit there as it'd help older stable releases as well.
> 
> I also tested the br_vlan_init error handling after the move by always
> returning errors from all over it. Since errors at NETDEV_REGISTER cause
> NETDEV_UNREGISTER we can deinit vlans properly for all cases regardless
> why it happened (e.g. device destruction or init error).
> 
>  net/bridge/br.c |  5 -
>  net/bridge/br_device.c  | 10 --
>  net/bridge/br_private.h | 20 +---
>  net/bridge/br_vlan.c| 25 ++---
>  4 files changed, 27 insertions(+), 33 deletions(-)
> 

Self-NAK, after thinking more about how to best handle this and running more
tests I believe it'll be better to go with the alternative I suggested above -
to move out only the default pvid add out of br_vlan_init to NETDEV_REGSITER
and pair it with br_vlan_delete in NETDEV_UNREGISTER. That way we'll split
the init/deinit in 2 steps, but we'll keep the current order and will reduce
the churn for this fix, functionally it should be equivalent as that is the
problematic part of the init which has to be done after the netdev has been
registered.

I'll spin v4 tomorrow after running more tests with it.