from:"Stephen Hemminger"

Re: [PATCH net-next 1/4] net: bridge: Add support for bridge port in locked mode

2024-05-21 Thread Stephen Hemminger

On Mon,  7 Feb 2022 11:07:39 +0100
Hans Schultz  wrote:

> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
> index 6218f93f5c1a..8fa2648fbc83 100644
> --- a/include/uapi/linux/if_link.h
> +++ b/include/uapi/linux/if_link.h
> @@ -532,6 +532,7 @@ enum {
>   IFLA_BRPORT_GROUP_FWD_MASK,
>   IFLA_BRPORT_NEIGH_SUPPRESS,
>   IFLA_BRPORT_ISOLATED,
> + IFLA_BRPORT_LOCKED,
>   IFLA_BRPORT_BACKUP_PORT,
>   IFLA_BRPORT_MRP_RING_OPEN,
>   IFLA_BRPORT_MRP_IN_OPEN,

NAK
This is userspace API, adding a new value in enum in the middle
will reorder the numbers and break ABI.

Re: [PATCH iproute2-next v3 2/2] bridge: vlan: add compressvlans manpage

2024-03-28 Thread Stephen Hemminger

On Mon, 25 Mar 2024 13:49:16 +0800
Date Huang  wrote:

> Add the missing 'compressvlans' to man page
> 
> Signed-off-by: Date Huang 
> ---

Marked this as superseded since your previous man page patch is already merged
in iproute2 (not next). If you want to update that send another patch.

Re: [PATCH iproute2-next v2 2/2] bridge: vlan: fix compressvlans usage

2024-03-22 Thread Stephen Hemminger

On Fri, 22 Mar 2024 20:39:23 +0800
Date Huang  wrote:

> diff --git a/man/man8/bridge.8 b/man/man8/bridge.8
> index eeea4073..bb02bd27 100644
> --- a/man/man8/bridge.8
> +++ b/man/man8/bridge.8
> @@ -22,6 +22,7 @@ bridge \- show / manipulate bridge addresses and devices
>  \fB\-s\fR[\fItatistics\fR] |
>  \fB\-n\fR[\fIetns\fR] name |
>  \fB\-b\fR[\fIatch\fR] filename |
> +\fB\-com\fR[\fIpressvlans\fR] |
>  \fB\-c\fR[\fIolor\fR] |
>  \fB\-p\fR[\fIretty\fR] |
>  \fB\-j\fR[\fIson\fR] |
> @@ -345,6 +346,11 @@ Don't terminate bridge command on errors in batch mode.
>  If there were any errors during execution of the commands, the application
>  return code will be non zero.
>  
> +.TP
> +.BR "\-com", " \-compressvlans"
> +Show compressed VLAN list. It will show continuous VLANs with the range 
> instead
> +of separated VLANs. Default is off.
> +

Overlapping option strings can cause problems, maybe a better word?

[Bridge] Fw: [Bug 218047] New: linux network bridge kernel set group_fwd_mask 65535

2023-10-26 Thread Stephen Hemminger via Bridge

The kernel intentionally does not allow forwarding of IEEE pause frames.
These need to be handled dropped in the bridge.
Any mask without bit 1 set should work.

Originally LACP, STP and MACPAUSE frames were always blocked and could not be
changed. The restriction was relaxed for LACP and STP frames.

Begin forwarded message:

Date: Thu, 26 Oct 2023 05:16:24 +
From: bugzilla-dae...@kernel.org
To: step...@networkplumber.org
Subject: [Bug 218047] New: linux network bridge kernel set group_fwd_mask 65535


https://bugzilla.kernel.org/show_bug.cgi?id=218047

Bug ID: 218047
   Summary: linux network bridge kernel set group_fwd_mask 65535
   Product: Networking
   Version: 2.5
  Hardware: All
OS: Linux
Status: NEW
  Severity: normal
  Priority: P3
 Component: Other
  Assignee: step...@networkplumber.org
  Reporter: ne-vleza...@yandex.ru
Regression: No

If trying set the parameter from network bridge? going error:

[root@arch user]# ip link set bridge0  type bridge group_fwd_mask 65535
RTNETLINK answers: Invalid argument

setting this parameter is desirable if the switching is routed to a separate
virtual machine on the hypervisor, and on the host it looks like this:

ip link add dev ns-switch.0 up mtu 16384 master bridge0 type veth peer name
host0 address $SWMAC  netns switch mtu 16384
ip link add dev ns-host1.0 up mtu 16384 master bridge0 type veth peer name
host0 netns host1 mtu 16384
ip link add dev ns-host2.0 up mtu 16384 master bridge0 type veth peer name
host0 netns host2 mtu 16384
ip link add dev ns-host3.0 up mtu 16384 master bridge0 type veth peer name
host0 netns host3 mtu 16384
ip link add dev ns-host4.0 up mtu 16384 master bridge0 type veth peer name
host0 netns host4 mtu 16384

tc qdisc add dev ns-switch.0 handle : ingress
tc qdisc add dev ns-host1.0 handle : ingress
tc qdisc add dev ns-host2.0 handle : ingress
tc qdisc add dev ns-host3.0 handle : ingress
tc qdisc add dev ns-host4.0 handle : ingress

tc qdisc add dev gt-ll0 handle : ingress
tc qdisc add dev gt-ll0 handle 1: root htb default 10

tc qdisc add dev ns-switch.0 handle 1: root htb default 10
tc qdisc add dev ns-host1.0 handle 1: root htb default 10
tc qdisc add dev ns-host2.0 handle 1: root htb default 10
tc qdisc add dev ns-host3.0 handle 1: root htb default 10
tc qdisc add dev ns-host4.0 handle 1: root htb default 10


tc filter add dev ns-host1.0 parent : prio 9 protocol all flower dst_mac
01:00:00:00:00:00/8 action vlan push id 10 action mirred egress redirect dev
gt-ll0
tc filter add dev ns-host2.0 parent : prio 9 protocol all flower dst_mac
01:00:00:00:00:00/8 action vlan push id 11 action mirred egress redirect dev
gt-ll0
tc filter add dev ns-host3.0 parent : prio 9 protocol all flower dst_mac
01:00:00:00:00:00/8 action vlan push id 12 action mirred egress redirect dev
gt-ll0
tc filter add dev ns-host4.0 parent : prio 9 protocol all flower dst_mac
01:00:00:00:00:00/8 action vlan push id 13 action mirred egress redirect dev
gt-ll0

tc filter add dev ns-host1.0 parent : prio 10 protocol all matchall action
vlan push id 10 
tc filter add dev ns-host2.0 parent : prio 10 protocol all matchall action
vlan push id 11
tc filter add dev ns-host3.0 parent : prio 10 protocol all matchall action
vlan push id 12
tc filter add dev ns-host4.0 parent : prio 10 protocol all matchall action
vlan push id 13

tc filter add dev ns-host1.0 parent 1: prio 65535 protocol all matchall action
drop
tc filter add dev ns-host1.0 parent 1: prio 1 protocol 802.1Q flower vlan_id 10
action vlan pop

tc filter add dev ns-host2.0 parent 1: prio 65535 protocol all matchall action
drop
tc filter add dev ns-host2.0 parent 1: prio 1 protocol 802.1Q flower vlan_id 11
action vlan pop

tc filter add dev ns-host3.0 parent 1: prio 65535 protocol all matchall action
drop
tc filter add dev ns-host3.0 parent 1: prio 1 protocol 802.1Q flower vlan_id 12
action vlan pop

tc filter add dev ns-host4.0 parent 1: prio 65535 protocol all matchall action
drop
tc filter add dev ns-host4.0 parent 1: prio 1 protocol 802.1Q flower vlan_id 13
action vlan pop 

the config example from be network lab.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are the assignee for the bug.

Re: [Bridge] [PATCH iproute2-next 1/2] configure: add the --color option

2023-09-15 Thread Stephen Hemminger via Bridge

On Wed, 13 Sep 2023 19:58:25 +0200
Andrea Claudi  wrote:

> This commit allows users/packagers to choose a default for the color
> output feature provided by some iproute2 tools.
> 
> The configure script option is documented in the script itself and it is
> pretty much self-explanatory. The default value is set to "never" to
> avoid changes to the current ip, tc, and bridge behaviour.
> 
> Signed-off-by: Andrea Claudi 
> ---

More build time config is not the answer either.
Don't want complaints from distribution users about the change.
Needs to be an environment variable or config file.

Re: [Bridge] Getting + maintaining a consistent MDB view in userspace?

2023-08-07 Thread Stephen Hemminger via Bridge

On Tue, 8 Aug 2023 02:29:09 +0200
Linus Lüssing  wrote:

> Hi,
> 
> Is there a recommended way to get and monitor the MDB state from
> userspace, without potential synchronization races?
> 
> I can use "bridge monitor mdb" to get changes or "bridge mdb show"
> to get the current state (as well as use the underlying netlink
> calls from my own application without the "bridge" command).
> 
> But if I were first calling "bridge mdb show" and then "bridge
> monitor mdb" then there is a brief moment where I might miss some
> changes. Or if I were doing it the other way around then a change
> might wrongly override a valid entry? Or the full dump might not
> be as up to date as the change event I got before?
> 
> Ideally I'd be looking for a variant of "bridge monitor mdb" which
> first dumps the whole, consistent mdb state and after that
> provides updates in the correct order.
> 
> Regards, Linus

Same issue as dumping routing table which routing daemons deal with.
Most applications end up with two netlink sockets, one for monitor
and one for dumping.

You really can't do it from the bridge utility, need to build your
own tools.

[Bridge] [PATCH iproute2 3/7] bridge: make print_vlan_info static

2023-06-01 Thread Stephen Hemminger via Bridge

Function defined and used in only one file.

Signed-off-by: Stephen Hemminger 
---
 bridge/br_common.h | 1 -
 bridge/vlan.c  | 3 ++-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/bridge/br_common.h b/bridge/br_common.h
index 1bdee65844c1..704e76b0acb2 100644
--- a/bridge/br_common.h
+++ b/bridge/br_common.h
@@ -6,7 +6,6 @@
 #define MDB_RTR_RTA(r) \
((struct rtattr *)(((char *)(r)) + RTA_ALIGN(sizeof(__u32
 
-void print_vlan_info(struct rtattr *tb, int ifindex);
 int print_linkinfo(struct nlmsghdr *n, void *arg);
 int print_mdb_mon(struct nlmsghdr *n, void *arg);
 int print_fdb(struct nlmsghdr *n, void *arg);
diff --git a/bridge/vlan.c b/bridge/vlan.c
index 5b304ea94224..dfc62f83a5df 100644
--- a/bridge/vlan.c
+++ b/bridge/vlan.c
@@ -18,6 +18,7 @@
 
 static unsigned int filter_index, filter_vlan;
 static int vlan_rtm_cur_ifidx = -1;
+static void print_vlan_info(struct rtattr *tb, int ifindex);
 
 enum vlan_show_subject {
VLAN_SHOW_VLAN,
@@ -1309,7 +1310,7 @@ static int vlan_global_show(int argc, char **argv)
return 0;
 }
 
-void print_vlan_info(struct rtattr *tb, int ifindex)
+static void print_vlan_info(struct rtattr *tb, int ifindex)
 {
struct rtattr *i, *list = tb;
int rem = RTA_PAYLOAD(list);
-- 
2.39.2

Re: [Bridge] [PATCH net-next 2/2] bridge: Add a sysctl to limit new brides FDB entries

2023-05-15 Thread Stephen Hemminger via Bridge

On Mon, 15 May 2023 10:50:46 +0200
Johannes Nixdorf  wrote:

> +static struct ctl_table br_sysctl_table[] = {
> + {
> + .procname = "bridge-fdb-max-entries-default",

That name is too long.

Also, all the rest of bridge code does not use sysctl's.  Why is this
special and why should the property be global and not per bridge?

NAK

Re: [Bridge] [Question] Any plan to write/update the bridge doc?

2023-04-24 Thread Stephen Hemminger via Bridge

On Mon, 24 Apr 2023 14:28:00 -0700
Jakub Kicinski  wrote:

> On Mon, 24 Apr 2023 17:25:08 +0800 Hangbin Liu wrote:
> > Maybe someone already has asked. The only official Linux bridge document I
> > got is a very ancient wiki page[1] or the ip link man page[2][3]. As there 
> > are
> > many bridge stp/vlan/multicast paramegers. Should we add a detailed kernel
> > document about each parameter? The parameter showed in ip link page seems
> > a little brief.
> > 
> > I'd like to help do this work. But apparently neither my English nor my
> > understanding of the code is good enough. Anyway, if you want, I can help
> > write a draft version first and you (bridge maintainers) keep working on 
> > this.
> > 
> > [1] https://wiki.linuxfoundation.org/networking/bridge
> > [2] https://man7.org/linux/man-pages/man8/bridge.8.html
> > [3] https://man7.org/linux/man-pages/man8/ip-link.8.html  
> 
> Sounds like we have 2 votes for the CLI man pages but I'd like to
> register a vote for in-kernel documentation.
> 
> I work at a large company so my perspective may differ but from what 
> I see:
> 
>  - users who want to call the kernel API should not have to look at 
>the CLI's man

Internal Kernel API's are not stable. So documentation is only the auto
generated kernel docs.

There is an effort to cover netlink API's with YAML. Bridge could/should
be part of that.

>  - man pages use archaic and arcane markup, I'd like to know how many
>people actually know how it works and how many copy / paste / look;
>ReST is prevalent, simple and commonly understood

Yes, but that is what distributions want.

>  - in-kernel docs are rendered on the web as soon as they hit linux-next
>  - we can make sure documentation is provided with the kernel changes,
>in an ideal world it doesn't matter but in practice the CLI support
>may never happen (no to mention that iproute does not hold all CLI)
> 
> Obviously if Stephen and Ido prefer to document the bridge CLI that's
> perfectly fine, it's their call :) For new sections of uAPI, however,
> I personally find in-kernel docs superior.


The in-kernel documents usually only cover the architecture and motivation.
What/why/how... Not the user visible public API's.

Re: [Bridge] [Question] Any plan to write/update the bridge doc?

2023-04-24 Thread Stephen Hemminger via Bridge

On Mon, 24 Apr 2023 18:46:53 +0300
Ido Schimmel  wrote:

> On Mon, Apr 24, 2023 at 05:25:08PM +0800, Hangbin Liu wrote:
> > Hi,
> > 
> > Maybe someone already has asked. The only official Linux bridge document I
> > got is a very ancient wiki page[1] or the ip link man page[2][3]. As there 
> > are
> > many bridge stp/vlan/multicast paramegers. Should we add a detailed kernel
> > document about each parameter? The parameter showed in ip link page seems
> > a little brief.  
> 
> I suggest improving the man pages instead of adding kernel
> documentation. The man pages are the most up to date resource and
> therefore the one users probably refer to the most. Also, it's already
> quite annoying to patch both "ip-link" and "bridge" man pages when
> adding bridge port options. Adding a third document and making sure all
> three resources are patched would be a nightmare...
> 
> > 
> > I'd like to help do this work. But apparently neither my English nor my
> > understanding of the code is good enough. Anyway, if you want, I can help
> > write a draft version first and you (bridge maintainers) keep working on 
> > this.  
> 
> I can help reviewing man page patches if you want. I'm going to send
> some soon. Will copy you.
> 
> > 
> > [1] https://wiki.linuxfoundation.org/networking/bridge
> > [2] https://man7.org/linux/man-pages/man8/bridge.8.html
> > [3] https://man7.org/linux/man-pages/man8/ip-link.8.html
> > 
> > Thanks
> > Hangbin  

Yes, please update the iproute2 man pages.
From there, I can make the old wiki just be a reference to them.
And Michael will pickup the man7.org versions from the current iproute2.

The iproute2 git tree is single source of current documentation please.

Re: [Bridge] [PATCH net-next 01/16] net: bridge: Set strict_start_type at two policies

2023-01-26 Thread Stephen Hemminger via Bridge

On Thu, 26 Jan 2023 18:01:09 +0100
Petr Machata  wrote:

>  static const struct nla_policy br_port_policy[IFLA_BRPORT_MAX + 1] = {
> + [IFLA_BRPORT_UNSPEC]= { .strict_start_type =
> + IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT + 1 },

Is the original IFLA_BRPORT a typo? ETH not EHT

Re: [Bridge] How to connect more than 200 interfaces to a bridge

2022-11-09 Thread Stephen Hemminger via Bridge

On Wed, 09 Nov 2022 19:25:32 +
Ali Shirvani  wrote:

> Sent with Proton Mail secure email.
> 
> --- Original Message ---
> On Wednesday, November 9th, 2022 at 8:21 PM, Stephen Hemminger 
>  wrote:
> 
> 
> > On Wed, 09 Nov 2022 10:51:27 +
> > Ali Shirvani via Bridge bridge@lists.linux-foundation.org wrote:
> >   
> > > Hello everyone,
> > > 
> > > It seems we reach the Linux bridge limitation on the number of interfaces 
> > > in a single bridge. Currently, we have 210 tap interface in a bridge, and 
> > > we suffer from more than 50% packet loss when we ping the IP address of 
> > > the virtual machine that uses one of the tap interfaces in the bridge.
> > > Do you know how we can connect more than 200 VMs virtual interfaces to a 
> > > bridge?
> > > 
> > > Best regards,
> > > Ali
> > > 
> > > Sent with Proton Mail secure email.  
> > 
> > 
> > The upper limit on interfaces per bridge should be 1023.
> > That limitation comes from spanning tree.
> > 
> > You might bet able to improve performance by disabling flooding to those 
> > tap devices.
> > Normally, any broadcast/unknown/multicast must be copied and flooded to 
> > each interface.  
> 
> Thanks a lot for your guidance. I disabled the spanning tree on the bridge 
> with `brctl stp br0 off` but the issue does not resolve. Would you please 
> elaborate more about disabling flooding on tap devices, I don't know how I 
> should disable flooding on tap devices.

Look at the documentation of the bridge command which describes per-port 
options:
https://man7.org/linux/man-pages/man8/bridge.8.html

You do want to leave flooding on for the downstream bridge port.

You can also add some security by limiting where/when STP comes from and disable
learning on the TAP devices so that if VM sends bogus packets, the bridge won't 
get DoS.

Re: [Bridge] How to connect more than 200 interfaces to a bridge

2022-11-09 Thread Stephen Hemminger via Bridge

On Wed, 09 Nov 2022 19:25:32 +
Ali Shirvani  wrote:

> Sent with Proton Mail secure email.
> 
> --- Original Message ---
> On Wednesday, November 9th, 2022 at 8:21 PM, Stephen Hemminger 
>  wrote:
> 
> 
> > On Wed, 09 Nov 2022 10:51:27 +
> > Ali Shirvani via Bridge bridge@lists.linux-foundation.org wrote:
> >   
> > > Hello everyone,
> > > 
> > > It seems we reach the Linux bridge limitation on the number of interfaces 
> > > in a single bridge. Currently, we have 210 tap interface in a bridge, and 
> > > we suffer from more than 50% packet loss when we ping the IP address of 
> > > the virtual machine that uses one of the tap interfaces in the bridge.
> > > Do you know how we can connect more than 200 VMs virtual interfaces to a 
> > > bridge?
> > > 
> > > Best regards,
> > > Ali
> > > 
> > > Sent with Proton Mail secure email.  
> > 
> > 
> > The upper limit on interfaces per bridge should be 1023.
> > That limitation comes from spanning tree.
> > 
> > You might bet able to improve performance by disabling flooding to those 
> > tap devices.
> > Normally, any broadcast/unknown/multicast must be copied and flooded to 
> > each interface.  
> 
> Thanks a lot for your guidance. I disabled the spanning tree on the bridge 
> with `brctl stp br0 off` but the issue does not resolve. Would you please 
> elaborate more about disabling flooding on tap devices, I don't know how I 
> should disable flooding on tap devices.

It is not a spanning tree issue, in fact STP can protect you from bad VM's.
It is more about configuring the bridge ports after setup.

Re: [Bridge] How to connect more than 200 interfaces to a bridge

2022-11-09 Thread Stephen Hemminger via Bridge

On Wed, 09 Nov 2022 10:51:27 +
Ali Shirvani via Bridge  wrote:

> Hello everyone,
> 
> It seems we reach the Linux bridge limitation on the number of interfaces in 
> a single bridge. Currently, we have 210 tap interface in a bridge, and we 
> suffer from more than 50% packet loss when we ping the IP address of the 
> virtual machine that uses one of the tap interfaces in the bridge.
> Do you know how we can connect more than 200 VMs virtual interfaces to a 
> bridge?
> 
> Best regards,
> Ali
> 
> Sent with [Proton Mail](https://proton.me/) secure email.

The upper limit on interfaces per bridge should be 1023.
That limitation comes from spanning tree.

You might bet able to improve performance by disabling flooding to those tap 
devices.
Normally, any broadcast/unknown/multicast must be copied and flooded to each 
interface.

Re: [Bridge] [PATCH iproute2-next 2/2] bridge: fdb: enable FDB blackhole feature

2022-09-29 Thread Stephen Hemminger via Bridge

On Thu, 29 Sep 2022 17:21:37 +0200
Hans Schultz  wrote:

>  
> @@ -493,6 +496,8 @@ static int fdb_modify(int cmd, int flags, int argc, char 
> **argv)
>   req.ndm.ndm_flags |= NTF_EXT_LEARNED;
>   } else if (matches(*argv, "sticky") == 0) {
>   req.ndm.ndm_flags |= NTF_STICKY;
> + } else if (matches(*argv, "blackhole") == 0) {
> + ext_flags |= NTF_EXT_BLACKHOLE;
>   } else {
>   if (strcmp(*argv, "to") == 0)
>   NEXT_ARG();

The parsing of flags is weird here, most of the flags are compared with strcmp()
but some use matches()..  I should have used strcmp() all the time; but at the
time did not realize what kind of confusion matches() can cause.

Re: [Bridge] [PATCH RFC] net: bridge: Clear offload_fwd_mark when passing frame up bridge interface.

2022-05-06 Thread Stephen Hemminger

On Fri, 6 May 2022 03:18:22 +0200
Andrew Lunn  wrote:

> On Thu, May 05, 2022 at 04:07:20PM -0700, Stephen Hemminger wrote:
> > On Fri,  6 May 2022 00:59:04 +0200
> > Andrew Lunn  wrote:
> >   
> > > It is possible to stack bridges on top of each other. Consider the
> > > following which makes use of an Ethernet switch:
> > > 
> > >br1
> > >  /\
> > > /  \
> > >/\
> > >  br0.11wlan0
> > >|
> > >br0
> > >  /  |  \
> > > p1  p2  p3
> > > 
> > > br0 is offloaded to the switch. Above br0 is a vlan interface, for
> > > vlan 11. This vlan interface is then a slave of br1. br1 also has
> > > wireless interface as a slave. This setup trunks wireless lan traffic
> > > over the copper network inside a VLAN.
> > > 
> > > A frame received on p1 which is passed up to the bridge has the
> > > skb->offload_fwd_mark flag set to true, indicating it that the switch
> > > has dealt with forwarding the frame out ports p2 and p3 as
> > > needed. This flag instructs the software bridge it does not need to
> > > pass the frame back down again. However, the flag is not getting reset
> > > when the frame is passed upwards. As a result br1 sees the flag,
> > > wrongly interprets it, and fails to forward the frame to wlan0.
> > > 
> > > When passing a frame upwards, clear the flag.
> > > 
> > > RFC because i don't know the bridge code well enough if this is the
> > > correct place to do this, and if there are any side effects, could the
> > > skb be a clone, etc.
> > > 
> > > Fixes: f1c2eddf4cb6 ("bridge: switchdev: Use an helper to clear forward 
> > > mark")
> > > Signed-off-by: Andrew Lunn   
> > 
> > Bridging of bridges is not supposed to be allowed.
> > See:
> > 
> > bridge:br_if.c
> > 
> > /* No bridging of bridges */
> > if (dev->netdev_ops->ndo_start_xmit == br_dev_xmit) {
> > NL_SET_ERR_MSG(extack,
> >"Can not enslave a bridge to a bridge");
> > return -ELOOP;
> > }  
> 
> This is not direct bridging of bridges. There is a vlan interface in
> the middle. And even if it is not supposed to work, it does work, it
> is being used, and it regressed. This fixes the regression.
> 
>Andrew

The problem is that doing this kind of nested bridging screws up
Spanning Tree.

Re: [Bridge] [PATCH RFC] net: bridge: Clear offload_fwd_mark when passing frame up bridge interface.

2022-05-05 Thread Stephen Hemminger

On Fri,  6 May 2022 00:59:04 +0200
Andrew Lunn  wrote:

> It is possible to stack bridges on top of each other. Consider the
> following which makes use of an Ethernet switch:
> 
>br1
>  /\
> /  \
>/\
>  br0.11wlan0
>|
>br0
>  /  |  \
> p1  p2  p3
> 
> br0 is offloaded to the switch. Above br0 is a vlan interface, for
> vlan 11. This vlan interface is then a slave of br1. br1 also has
> wireless interface as a slave. This setup trunks wireless lan traffic
> over the copper network inside a VLAN.
> 
> A frame received on p1 which is passed up to the bridge has the
> skb->offload_fwd_mark flag set to true, indicating it that the switch
> has dealt with forwarding the frame out ports p2 and p3 as
> needed. This flag instructs the software bridge it does not need to
> pass the frame back down again. However, the flag is not getting reset
> when the frame is passed upwards. As a result br1 sees the flag,
> wrongly interprets it, and fails to forward the frame to wlan0.
> 
> When passing a frame upwards, clear the flag.
> 
> RFC because i don't know the bridge code well enough if this is the
> correct place to do this, and if there are any side effects, could the
> skb be a clone, etc.
> 
> Fixes: f1c2eddf4cb6 ("bridge: switchdev: Use an helper to clear forward mark")
> Signed-off-by: Andrew Lunn 

Bridging of bridges is not supposed to be allowed.
See:

bridge:br_if.c

/* No bridging of bridges */
if (dev->netdev_ops->ndo_start_xmit == br_dev_xmit) {
NL_SET_ERR_MSG(extack,
   "Can not enslave a bridge to a bridge");
return -ELOOP;
}

Re: [Bridge] [PATCH net-next v4 2/2] net: vxlan: vxlan_core.c: Add extack support to vxlan_fdb_delete

2022-04-25 Thread Stephen Hemminger

On Mon, 25 Apr 2022 16:25:07 +0200
Alaa Mohamed  wrote:

>  static int vxlan_fdb_parse(struct nlattr *tb[], struct vxlan_dev *vxlan,
>  union vxlan_addr *ip, __be16 *port, __be32 *src_vni,
> -__be32 *vni, u32 *ifindex, u32 *nhid)
> +__be32 *vni, u32 *ifindex, u32 *nhid, struct 
> netlink_ext_ack *extack)
>  {
>   struct net *net = dev_net(vxlan->dev);
>   int err;
> 
>   if (tb[NDA_NH_ID] && (tb[NDA_DST] || tb[NDA_VNI] || tb[NDA_IFINDEX] ||
> - tb[NDA_PORT]))
> - return -EINVAL;
> + tb[NDA_PORT])) {
> + NL_SET_ERR_MSG(extack,
> +"DST, VNI, ifindex and port 
> are mutually exclusive with NH_ID");
> + return -EINVAL;
> + }

The indentation is off here?

Re: [Bridge] [PATCH] decouple llc/bridge

2022-04-08 Thread Stephen Hemminger

On Thu, 7 Apr 2022 19:48:59 -0700
Jakub Kicinski  wrote:

> On Thu, 7 Apr 2022 09:16:40 -0700 Stephen Hemminger wrote:
> > > I was wondering why the llc code was getting compiled and it turned out
> > > to be because I had bridging enabled. It turns out to only needs it for
> > > a single function (llc_mac_hdr_init).  
> 
> > > +static inline int llc_mac_hdr_init(struct sk_buff *skb,
> > > +const unsigned char *sa, const unsigned char 
> > > *da)
> > > +{
> > > + int rc = -EINVAL;
> > > +
> > > + switch (skb->dev->type) {
> > > + case ARPHRD_ETHER:
> > > + case ARPHRD_LOOPBACK:
> > > + rc = dev_hard_header(skb, skb->dev, ETH_P_802_2, da, sa,
> > > +  skb->len);
> > > + if (rc > 0)
> > > + rc = 0;
> > > + break;
> > > + default:
> > > + break;
> > > + }
> > > + return rc;
> > > +}
> > > +
> > >
> 
> nit: extra new line
> 
> > > -int llc_mac_hdr_init(struct sk_buff *skb,
> > > -  const unsigned char *sa, const unsigned char *da)
> > > -{
> > > - int rc = -EINVAL;
> > > -
> > > - switch (skb->dev->type) {
> > > - case ARPHRD_ETHER:
> > > - case ARPHRD_LOOPBACK:
> > > - rc = dev_hard_header(skb, skb->dev, ETH_P_802_2, da, sa,
> > > -  skb->len);
> > > - if (rc > 0)
> > > - rc = 0;
> > > - break;
> > > - default:
> > > - break;
> > > - }
> > > - return rc;
> > > -}  
> 
> There's also an EXPORT somewhere in this file that has to go.
> 
> > >  /**
> > >   *   llc_build_and_send_ui_pkt - unitdata request interface for 
> > > upper layers
> > >   *   @sap: sap to use
> > 
> > You may break other uses of LLC.
> > 
> > Why not open code as different function.  I used the llc stuff because there
> > were multiple copies of same code (DRY).  
> 
> I didn't quite get what you mean, Stephen, would you mind restating?


Short answer: it was idea that didn't pan out.

Suggestion: get rid of using LLC  in bridge and just rewrite that one place.

Re: [Bridge] [PATCH] decouple llc/bridge

2022-04-07 Thread Stephen Hemminger

On Thu, 7 Apr 2022 11:12:17 -0400
Dave Jones  wrote:

> I was wondering why the llc code was getting compiled and it turned out
> to be because I had bridging enabled. It turns out to only needs it for
> a single function (llc_mac_hdr_init).
> 
> Converting this to a static inline like the other llc functions it uses
> allows to decouple the dependency
> 
> Signed-off-by: Dave Jones 
> 
> diff --git include/net/llc.h include/net/llc.h
> index e250dca03963..edcb120ee6b0 100644
> --- include/net/llc.h
> +++ include/net/llc.h
> @@ -13,6 +13,7 @@
>   */
>  
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -100,8 +101,34 @@ extern struct list_head llc_sap_list;
>  int llc_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type 
> *pt,
>   struct net_device *orig_dev);
>  
> -int llc_mac_hdr_init(struct sk_buff *skb, const unsigned char *sa,
> -  const unsigned char *da);
> +/**
> + *  llc_mac_hdr_init - fills MAC header fields
> + *  @skb: Address of the frame to initialize its MAC header
> + *  @sa: The MAC source address
> + *  @da: The MAC destination address
> + *
> + *  Fills MAC header fields, depending on MAC type. Returns 0, If MAC 
> type
> + *  is a valid type and initialization completes correctly 1, otherwise.
> + */
> +static inline int llc_mac_hdr_init(struct sk_buff *skb,
> +const unsigned char *sa, const unsigned char 
> *da)
> +{
> + int rc = -EINVAL;
> +
> + switch (skb->dev->type) {
> + case ARPHRD_ETHER:
> + case ARPHRD_LOOPBACK:
> + rc = dev_hard_header(skb, skb->dev, ETH_P_802_2, da, sa,
> +  skb->len);
> + if (rc > 0)
> + rc = 0;
> + break;
> + default:
> + break;
> + }
> + return rc;
> +}
> +
>  
>  void llc_add_pack(int type,
> void (*handler)(struct llc_sap *sap, struct sk_buff *skb));
> diff --git net/802/Kconfig net/802/Kconfig
> index aaa83e888240..8bea5d1d5118 100644
> --- net/802/Kconfig
> +++ net/802/Kconfig
> @@ -1,7 +1,6 @@
>  # SPDX-License-Identifier: GPL-2.0-only
>  config STP
>   tristate
> - select LLC
>  
>  config GARP
>   tristate
> diff --git net/bridge/Kconfig net/bridge/Kconfig
> index 3c8ded7d3e84..c011856d3386 100644
> --- net/bridge/Kconfig
> +++ net/bridge/Kconfig
> @@ -5,7 +5,6 @@
>  
>  config BRIDGE
>   tristate "802.1d Ethernet Bridging"
> - select LLC
>   select STP
>   depends on IPV6 || IPV6=n
>   help
> diff --git net/llc/llc_output.c net/llc/llc_output.c
> index 5a6466fc626a..ad66886ed141 100644
> --- net/llc/llc_output.c
> +++ net/llc/llc_output.c
> @@ -13,34 +13,6 @@
>  #include 
>  #include 
>  
> -/**
> - *   llc_mac_hdr_init - fills MAC header fields
> - *   @skb: Address of the frame to initialize its MAC header
> - *   @sa: The MAC source address
> - *   @da: The MAC destination address
> - *
> - *   Fills MAC header fields, depending on MAC type. Returns 0, If MAC type
> - *   is a valid type and initialization completes correctly 1, otherwise.
> - */
> -int llc_mac_hdr_init(struct sk_buff *skb,
> -  const unsigned char *sa, const unsigned char *da)
> -{
> - int rc = -EINVAL;
> -
> - switch (skb->dev->type) {
> - case ARPHRD_ETHER:
> - case ARPHRD_LOOPBACK:
> - rc = dev_hard_header(skb, skb->dev, ETH_P_802_2, da, sa,
> -  skb->len);
> - if (rc > 0)
> - rc = 0;
> - break;
> - default:
> - break;
> - }
> - return rc;
> -}
> -
>  /**
>   *   llc_build_and_send_ui_pkt - unitdata request interface for upper layers
>   *   @sap: sap to use

You may break other uses of LLC.

Why not open code as different function.  I used the llc stuff because there
were multiple copies of same code (DRY).

Re: [Bridge] [PATCH v2 net-next 00/10] net: bridge: Multiple Spanning Trees

2022-03-01 Thread Stephen Hemminger

On Tue, 1 Mar 2022 18:21:42 +0200
Vladimir Oltean  wrote:

> Hi Tobias,
> 
> On Tue, Mar 01, 2022 at 11:03:11AM +0100, Tobias Waldekranz wrote:
> > A proposal for the corresponding iproute2 interface is available here:
> > 
> > https://github.com/wkz/iproute2/tree/mst  
> 
> Please pardon my ignorance. Is there a user-mode STP protocol application
> that supports MSTP, and that you've tested these patches with?
> I'd like to give it a try.

https://github.com/mstpd/mstpd

Re: [Bridge] [PATCH net-next 1/4] net: bridge: Add support for bridge port in locked mode

2022-02-07 Thread Stephen Hemminger

On Mon,  7 Feb 2022 11:07:39 +0100
Hans Schultz  wrote:

> --- a/include/uapi/linux/if_link.h
> +++ b/include/uapi/linux/if_link.h
> @@ -532,6 +532,7 @@ enum {
>   IFLA_BRPORT_GROUP_FWD_MASK,
>   IFLA_BRPORT_NEIGH_SUPPRESS,
>   IFLA_BRPORT_ISOLATED,
> + IFLA_BRPORT_LOCKED,
>   IFLA_BRPORT_BACKUP_PORT,
>   IFLA_BRPORT_MRP_RING_OPEN,

Adding new element in middle of enum causes them to be renumbered.
That breaks the kernel ABI. Please add only at end.

Re: [Bridge] [PATCH net] net: bridge: mcast: Do not allow users to set IGMP counter/timer to zero

2021-10-20 Thread Stephen Hemminger

On Wed, 20 Oct 2021 09:02:01 +0800
Hangbin Liu  wrote:

> On Tue, Oct 19, 2021 at 07:09:42PM +0300, Nikolay Aleksandrov wrote:
> > > I started this patch when I saw there is not limit for setting
> > > multicast_membership_interval to 0, which will cause bridge remove the
> > > mdb directly after adding. Do you think this is a problem.
> > > 
> > > And what about others? I don't think there is a meaning to set other 
> > > intervals
> > > to 0.
> > >   
> > 
> > The problem is not if there is meaning, we cannot start restricting option 
> > values now after
> > they've become uapi (and have been for a very long time) because we can 
> > break user-space even
> > though chances are pretty low. I don't think this patch is acceptable, I 
> > commented on the other
> > patch issues but they don't matter because of this.  
> 
> OK, I got your mean, we should not restrict the configurations based on 
> whether
> there is a meaning.
> 
> Thanks
> Hangbin

Maybe the bridge command could enforce that the value set are sane relative
to the RFC?  We already fixup some things in iproute2 utilities to workaround
places where changing defaults in kernel would break userspace.

Re: [Bridge] net: bridge: multicast: Renaming of flag BRIDGE_IGMP_SNOOPING

2021-06-07 Thread Stephen Hemminger

On Mon, 7 Jun 2021 21:29:34 +0530
Ujjal Roy  wrote:

> Hi Stephen,
> 
> Can we rename this flag BRIDGE_IGMP_SNOOPING into something like
> BRIDGE_MULTICAST_SNOOPING/BRIDGE_MCAST_SNOOPING? I am starting this
> thread because this BRIDGE_IGMP_SNOOPING flag holds information about
> IGMP only but not about MLD. Or this is not a common name to describe
> both IGMP and MLD. Please let me know about my concern, so that I can
> change the code and submit a patch.
> 
> Thanks,
> UjjaL Roy

It is part of user API at this point so not possible to remove old definition.

Re: [Bridge] [PATCH] Include header for use of PATH_MAX

2021-04-12 Thread Stephen Hemminger

On Mon, 12 Apr 2021 09:30:34 -0500
Matt Weber  wrote:

> Since a1f20223cfcf7b9eae8c9e7e7a07e7bf1377f65b in some build
> configurations PATH_MAX is undefined.
> 
> Build log from Buildroot testing failure:
> http://autobuild.buildroot.net/results/3ec818e676827c03f08c264c6c7a53b025b7ba1d/build-end.log
> 
> ...
> /home/giuliobenetti/autobuild/run/instance-0/output-1/host/bin/arm-buildroot-linux-musleabihf-gcc
>  -Wall -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64  -O2  
>  
> -I/home/giuliobenetti/autobuild/run/instance-0/output-1/host/arm-buildroot-linux-musleabihf/sysroot/usr/include
>   -c libbridge_misc.c
> libbridge_init.c: In function 'isbridge':
> libbridge_init.c:49:12: error: 'PATH_MAX' undeclared (first use in this 
> function); did you mean 'AF_MAX'?
>49 |  char path[PATH_MAX];
>   |^~~~
>   |AF_MAX
> libbridge_init.c:49:12: note: each undeclared identifier is reported only 
> once for each function it appears in
> libbridge_init.c:49:7: warning: unused variable 'path' [-Wunused-variable]
>49 |  char path[PATH_MAX];
>   |   ^~~~
> make[2]: *** [Makefile:37: libbridge_init.o] Error 1
> make[2]: *** Waiting for unfinished jobs
> libbridge_devif.c: In function 'fpopen':
> libbridge_devif.c:33:12: error: 'PATH_MAX' undeclared (first use in this 
> function); did you mean 'AF_MAX'?
>33 |  char path[PATH_MAX];
>   |^~~~
>   |AF_MAX
> 
> Signed-off-by: Matthew Weber 
> ---
>  libbridge/libbridge_devif.c | 1 +
>  libbridge/libbridge_init.c  | 1 +
>  2 files changed, 2 insertions(+)
> 
> diff --git a/libbridge/libbridge_devif.c b/libbridge/libbridge_devif.c
> index 8b7d954..95f7769 100644
> --- a/libbridge/libbridge_devif.c
> +++ b/libbridge/libbridge_devif.c
> @@ -24,6 +24,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "libbridge.h"
>  #include "libbridge_private.h"
> diff --git a/libbridge/libbridge_init.c b/libbridge/libbridge_init.c
> index c914971..645dda6 100644
> --- a/libbridge/libbridge_init.c
> +++ b/libbridge/libbridge_init.c
> @@ -24,6 +24,7 @@
>  #include 
>  #include 
>  #include 
> +#include 

Shouldn't refer to Linux version of the file.

According IWYU

libbridge/libbridge_devif.c should add these lines:
#include  // for PATH_MAX
#include// for ifreq, IFNAMSIZ, ifr_data, ifr_name
#include // for __bridge_info, __port_info, __fdb_entry
#include   // for ioctl, SIOCDEVPRIVATE
struct timeval;

libbridge/libbridge_devif.c should remove these lines:
- #include   // lines 21-21

The full include-list for libbridge/libbridge_devif.c:
#include  // for closedir, opendir, DIR
#include   // for errno, EAGAIN
#include   // for open, O_WRONLY, SEEK_SET
#include  // for PATH_MAX
#include// for ifreq, IFNAMSIZ, ifr_data, ifr_name
#include // for __bridge_info, __port_info, __fdb_entry
#include   // for snprintf, fclose, fopen, fscanf, FILE
#include  // for memcpy, memset, strerror
#include   // for ioctl, SIOCDEVPRIVATE
#include  // for close, sleep, write
#include "libbridge.h"  // for bridge_info, port_info, strlcpy, brid...
#include "libbridge_private.h"  // for __jiffies_to_tv, __tv_to_jiffies, SYS...

[Bridge] Minor update to bridge-utils

2021-03-04 Thread Stephen Hemminger

Small changes to bridge-utils to address some minor issues.

1. The default branch is main not master

2. Fixed some compiler warnings because Gcc 10 and Clang now
   do checks for string overflow.

3. Made a backup repository mirror at github.

4. Fixed version string printed

This is not a required update, none of these are important
or critical to users.

Do not intend to do any new features or serious fixes to
bridge-utils; all users should convert to iproute2/bridge command.

Re: [Bridge] [PATCH net-next v7 01/10] net: bridge: extend the process of special frames

2020-10-27 Thread Stephen Hemminger

On Tue, 27 Oct 2020 10:02:42 +
Henrik Bjoernlund via Bridge  wrote:

> +/* Return 0 if the frame was not processed otherwise 1
> + * note: already called with rcu_read_lock
> + */
> +static int br_process_frame_type(struct net_bridge_port *p,
> +  struct sk_buff *skb)
> +{
> + struct br_frame_type *tmp;
> +
> + hlist_for_each_entry_rcu(tmp, >br->frame_type_list, list)
> + if (unlikely(tmp->type == skb->protocol))
> + return tmp->frame_handler(p, skb);
> +
> + return 0;
> +}

Does the linear search of frame types have noticable impact on performance?
Hint: maybe a bitmap or something would be faster.

Re: [Bridge] [RFC PATCH iproute2] bridge: add support for L2 multicast groups

2020-10-22 Thread Stephen Hemminger

On Sat, 17 Oct 2020 21:45:26 +0300
Vladimir Oltean  wrote:

> Extend the 'bridge mdb' command for the following syntax:
> bridge mdb add dev br0 port swp0 grp 01:02:03:04:05:06 permanent
> 
> Signed-off-by: Vladimir Oltean 
> ---
>  bridge/mdb.c   | 54 ++
>  include/uapi/linux/if_bridge.h |  2 ++
>  2 files changed, 43 insertions(+), 13 deletions(-)
> 
> diff --git a/bridge/mdb.c b/bridge/mdb.c
> index 4cd7ca762b78..af160250928e 100644
> --- a/bridge/mdb.c
> +++ b/bridge/mdb.c
> @@ -149,6 +149,7 @@ static void print_mdb_entry(FILE *f, int ifindex, const 
> struct br_mdb_entry *e,
>   struct nlmsghdr *n, struct rtattr **tb)
>  {
>   const void *grp, *src;
> + const char *addr;
>   SPRINT_BUF(abuf);
>   const char *dev;
>   int af;
> @@ -156,9 +157,16 @@ static void print_mdb_entry(FILE *f, int ifindex, const 
> struct br_mdb_entry *e,
>   if (filter_vlan && e->vid != filter_vlan)
>   return;
>  
> - af = e->addr.proto == htons(ETH_P_IP) ? AF_INET : AF_INET6;
> - grp = af == AF_INET ? (const void *)>addr.u.ip4 :
> -   (const void *)>addr.u.ip6;
> + if (!e->addr.proto) {
> + af = AF_PACKET;
> + grp = (const void *)>addr.u.mac_addr;
> + } else if (e->addr.proto == htons(ETH_P_IP)) {
> + af = AF_INET;
> + grp = (const void *)>addr.u.ip4;
> + } else {
> + af = AF_INET6;
> + grp = (const void *)>addr.u.ip6;
> + }
>   dev = ll_index_to_name(ifindex);
>  

In C casts of pointer to void are not necessary.

Re: [Bridge] [PATCH net-next 01/12] net: core: add function dev_fetch_sw_netstats for fetching pcpu_sw_netstats

2020-10-11 Thread Stephen Hemminger

On Sun, 11 Oct 2020 21:36:43 +0200
Heiner Kallweit  wrote:

> +void dev_fetch_sw_netstats(struct rtnl_link_stats64 *s,
> +struct pcpu_sw_netstats __percpu *netstats)

netstats is unmodified, should it be const?

> +{
> + int cpu;
> +
> + if (IS_ERR_OR_NULL(netstats))
> + return;

Any code calling this with a null pointer is broken/buggy, please don't
ignore that.

Re: [Bridge] [PATCH RFC 0/7] net: bridge: cfm: Add support for Connectivity Fault Management(CFM)

2020-09-04 Thread Stephen Hemminger

On Fri, 4 Sep 2020 09:15:20 +
Henrik Bjoernlund  wrote:

> Connectivity Fault Management (CFM) is defined in 802.1Q section 12.14.
> 
> Connectivity Fault Management (CFM) comprises capabilities for
> detecting, verifying, and isolating connectivity failures in
> Virtual Bridged Networks. These capabilities can be used in
> networks operated by multiple independent organizations, each
> with restricted management access to each other’s equipment.
> 
> CFM functions are partitioned as follows:
> — Path discovery
> — Fault detection
> — Fault verification and isolation
> — Fault notification
> — Fault recovery
> 
> The primary CFM protocol shims are called Maintenance Points (MPs).
> A MP can be either a MEP or a MHF.
> The MEP:
> -It is the Maintenance association End Point
>  described in 802.1Q section 19.2.
> -It is created on a specific level (1-7) and is assuring
>  that no CFM frames are passing through this MEP on lower levels.
> -It initiates and terminates/validates CFM frames on its level.
> -It can only exist on a port that is related to a bridge.
> The MHF:
> -It is the Maintenance Domain Intermediate Point
>  (MIP) Half Function (MHF) described in 802.1Q section 19.3.
> -It is created on a specific level (1-7).
> -It is extracting/injecting certain CFM frame on this level.
> -It can only exist on a port that is related to a bridge.
> -Currently not supported.
> 
> There are defined the following CFM protocol functions:
> -Continuity Check
> -Loopback. Currently not supported.
> -Linktrace. Currently not supported.
> 
> This CFM component supports create/delete of MEP instances and
> configuration of the different CFM protocols. Also status information
> can be fetched and delivered through notification due to defect status
> change.
> 
> The user interacts with CFM using the 'cfm' user space client program, the
> client talks with the kernel using netlink. The kernel will try to offload
> the requests to the HW via switchdev API (not implemented yet).
> 
> Any notification emitted by CFM from the kernel can be monitored in user
> space by starting 'cfm_server' program.
> 
> Currently this 'cfm' and 'cfm_server' programs are standalone placed in a
> cfm repository https://github.com/microchip-ung/cfm but it is considered
> to integrate this into 'iproute2'.
> 
> Reviewed-by: Horatiu Vultur  
> Signed-off-by: Henrik Bjoernlund  

Could this be done in userspace? It is a control plane protocol.
Could it be done by using eBPF?

Adding more code in bridge impacts a large number of users of Linux distros.
It creates bloat and potential security vulnerabilities.

Re: [Bridge] [RFC PATCH net-next] bridge: Implement MLD Querier wake-up calls / Android bug workaround

2020-08-23 Thread Stephen Hemminger

On Sun, 23 Aug 2020 17:42:39 +0200
Linus Lüssing  wrote:

> On Sun, Aug 16, 2020 at 03:08:13PM -0700, Stephen Hemminger wrote:
> > Rather than adding yet another feature to the bridge, could this hack be 
> > done by
> > having a BPF hook? or netfilter module?  
> 
> Hi Stephen,
> 
> Thanks for the constructive feedback and suggestions!
> 
> The netfilter approach sounds tempting. However as far as I know
> OpenWrt's firewall has no easy layer 2 netfilter integration yet.
> So it has default layer 3 netfilter rules, but not for layer 2.
> 
> Ideally I'd want to work towards a solution where things "just
> work as expected" when a user enables "IGMP Snooping" in the UI.
> I could hack the netfilter rules into netifd, the OpenWrt network
> manager, when it configures the bridge. But not sure if the
> OpenWrt maintainers would like that...
> 
> Any preferences from the OpenWrt maintainers side?
> 
> Regards, Linus
> 
> 
> PS: With BPF I don't have that much experience yet. I would need
> to write a daemon which would parse the MLD packets and would
> fetch the FDB via netlink, right? If so, sounds like that would
> need way more than 300 lines of code. And that would need to be
> maintained within OpenWrt, right?

With BPF you would need to write a small program that transforms the packet
as you want. The BPF program and the userspace program would share a
map table.  The userspace program would monitor netlink messages about
FDB and update the map. Yes it would be a few hundred lines but not huge.
The userspace could even be selective and only do it for devices where
it knows they are using the broken Android code.

Sorry, no idea how OpenWrt manages their packages.

Re: [Bridge] [RFC PATCH net-next] bridge: Implement MLD Querier wake-up calls / Android bug workaround

2020-08-16 Thread Stephen Hemminger

On Sun, 16 Aug 2020 22:24:24 +0200
Linus Lüssing  wrote:

> Implement a configurable MLD Querier wake-up calls "feature" which
> works around a widely spread Android bug in connection with IGMP/MLD
> snooping.
> 
> Currently there are mobile devices (e.g. Android) which are not able
> to receive and respond to MLD Queries reliably because the Wifi driver
> filters a lot of ICMPv6 when the device is asleep - including
> MLD. This in turn breaks IPv6 communication when MLD Snooping is
> enabled. However there is one ICMPv6 type which is allowed to pass and
> which can be used to wake up the mobile device: ICMPv6 Echo Requests.
> 
> If this bridge is the selected MLD Querier then setting
> "multicast_wakeupcall" to a number n greater than 0 will send n
> ICMPv6 Echo Requests to each host behind this port to wake
> them up with each MLD Query. Upon receiving a matching ICMPv6 Echo
> Reply an MLD Query with a unicast ethernet destination will be sent
> to the specific host(s).
> 
> Link: https://issuetracker.google.com/issues/149630944
> Link: https://github.com/freifunk-gluon/gluon/issues/1832
> 
> Signed-off-by: Linus Lüssing 
> ---
> A version of this patch rebased to Linux 4.14 is currently applied on a
> 400 nodes mesh network (Freifunk Vogtland).
> 
> I'm aware that this is quite a hack, so I'm unsure if this is suitable
> for upstream. On the other hand, the Android ticket isn't moving
> anywhere and even if it were fixed in Android, I'd expect it to take
> years until that fix would propagate or unpatched Android devices to
> vanish. So I'm wondering if it should be treated like a hardware bug
> workaround and by that should be suitable for applying it upstream in
> the Linux kernel?
> 
> I've also raised this issue on the mcast-w...@ietf.org and p...@ietf.org
> mailing list earlier this year but the amount of feedback was a bit
> sparse.
> 
> CC'ing the OpenWrt mailing list, too, as I expect there to be most users
> affected by this issue, if they enabled IGMP/MLD snooping.
> 
> Let me know what you think about this.
> 
>  include/linux/if_bridge.h|   1 +
>  include/uapi/linux/if_link.h |   1 +
>  net/bridge/Kconfig   |  26 
>  net/bridge/br_fdb.c  |  10 ++
>  net/bridge/br_input.c|   4 +-
>  net/bridge/br_multicast.c| 284 ++-
>  net/bridge/br_netlink.c  |  19 +++
>  net/bridge/br_private.h  |  19 +++
>  net/bridge/br_sysfs_if.c |  18 +++
>  9 files changed, 374 insertions(+), 8 deletions(-)
> 
> diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
> index 6479a38e52fa..73bc692e1ae6 100644
> --- a/include/linux/if_bridge.h
> +++ b/include/linux/if_bridge.h
> @@ -50,6 +50,7 @@ struct br_ip_list {
>  #define BR_MRP_AWARE BIT(17)
>  #define BR_MRP_LOST_CONT BIT(18)
>  #define BR_MRP_LOST_IN_CONT  BIT(19)
> +#define BR_MULTICAST_WAKEUPCALL  BIT(20)
>  
>  #define BR_DEFAULT_AGEING_TIME   (300 * HZ)
>  
> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
> index 7fba4de511de..5015f8ce1ad7 100644
> --- a/include/uapi/linux/if_link.h
> +++ b/include/uapi/linux/if_link.h
> @@ -355,6 +355,7 @@ enum {
>   IFLA_BRPORT_BACKUP_PORT,
>   IFLA_BRPORT_MRP_RING_OPEN,
>   IFLA_BRPORT_MRP_IN_OPEN,
> + IFLA_BRPORT_MCAST_WAKEUPCALL,
>   __IFLA_BRPORT_MAX
>  };
>  #define IFLA_BRPORT_MAX (__IFLA_BRPORT_MAX - 1)
> diff --git a/net/bridge/Kconfig b/net/bridge/Kconfig
> index 80879196560c..056e80bf00c4 100644
> --- a/net/bridge/Kconfig
> +++ b/net/bridge/Kconfig
> @@ -48,6 +48,32 @@ config BRIDGE_IGMP_SNOOPING
>  
> If unsure, say Y.
>  
> +config BRIDGE_IGMP_SNOOPING_WAKEUPCALLS
> + bool "MLD Querier wake-up calls"
> + depends on BRIDGE_IGMP_SNOOPING
> + depends on IPV6
> + help
> +   If you say Y here, then the MLD Snooping Querier will be built
> +   with a per bridge port wake-up call "feature"/workaround.
> +
> +   Currently there are mobile devices (e.g. Android) which are not able
> +   to receive and respond to MLD Queries reliably because the Wifi driver
> +   filters a lot of ICMPv6 when the device is asleep - including MLD.
> +   This in turn breaks IPv6 communication when MLD Snooping is enabled.
> +   However there is one ICMPv6 type which is allowed to pass and
> +   which can be used to wake up the mobile device: ICMPv6 Echo Requests.
> +
> +   If this bridge is the selected MLD Querier then setting
> +   "multicast_wakeupcall" to a number n greater than 0 will send n
> +   ICMPv6 Echo Requests to each host behind this port to wake them up
> +   with each MLD Query. Upon receiving a matching ICMPv6 Echo Reply
> +   an MLD Query with a unicast ethernet destination will be sent to the
> +   specific host(s).
> +
> +   Say N to exclude this support and reduce the binary size.
> +
> +   If unsure, say N.
> +
>  config BRIDGE_VLAN_FILTERING
>   bool "VLAN filtering"
>

Re: [Bridge] [PATCH net v3] net: bridge: fix vlan stats use-after-free on destruction

2020-05-20 Thread Stephen Hemminger

On Fri, 16 Nov 2018 18:50:01 +0200
Nikolay Aleksandrov  wrote:

> + if (v->priv_flags & BR_VLFLAG_PER_PORT_STATS)
>   free_percpu(v->stats);

Why not not v->stats == NULL as a flag instead?

Then the fact that free_percpu(NULL) is a Nop would mean less code
in the bridge driver.

Re: [Bridge] [PATCH net v3] net: bridge: fix vlan stats use-after-free on destruction

2020-04-23 Thread Stephen Hemminger

On Fri, 16 Nov 2018 18:50:01 +0200
Nikolay Aleksandrov  wrote:

> Syzbot reported a use-after-free of the global vlan context on port vlan
> destruction. When I added per-port vlan stats I missed the fact that the
> global vlan context can be freed before the per-port vlan rcu callback.
> There're a few different ways to deal with this, I've chosen to add a
> new private flag that is set only when per-port stats are allocated so
> we can directly check it on destruction without dereferencing the global
> context at all. The new field in net_bridge_vlan uses a hole.
> 
> v2: cosmetic change, move the check to br_process_vlan_info where the
> other checks are done
> v3: add change log in the patch, add private (in-kernel only) flags in a
> hole in net_bridge_vlan struct and use that instead of mixing
> user-space flags with private flags
> 
> Fixes: 9163a0fc1f0c ("net: bridge: add support for per-port vlan stats")
> Reported-by: syzbot+04681da557a0e49a5...@syzkaller.appspotmail.com
> Signed-off-by: Nikolay Aleksandrov 

Why not just use v->stats itself as the flag.
Since free of NULL is a nop it would be cleaner?

Re: [Bridge] [PATCH net v2] net: bridge: fix stale eth hdr pointer in br_dev_xmit

2020-02-24 Thread Stephen Hemminger

On Mon, 24 Feb 2020 18:46:22 +0200
Nikolay Aleksandrov  wrote:

> - eth = eth_hdr(skb);
>   skb_pull(skb, ETH_HLEN)

you could just swap these two lines.

Re: [Bridge] Support for long interface names

2020-02-16 Thread Stephen Hemminger

On Mon, 10 Feb 2020 16:00:49 +0100
Marcel Waldvogel  wrote:

> Hi,
> I've added support for long bridge interface names (such as"br-
> 0081a64f416f") to "brctl show" (long physical interface names arenot a
> problem there).
> Please find a branch with the commit at 
> https://github.com/MarcelWaldvogel/bridge-utils/tree/long-ifnames
> 
> -Marcel

Thanks but I have no plans to add any new features to bridge-utils.
Just like ifconfig, users should migrate to the new bridge command
in iproute2.

Re: [Bridge] [PATCH net-next 1/4] net: bridge: check port state before br_allowed_egress

2020-01-23 Thread Stephen Hemminger

On Thu, 23 Jan 2020 15:28:04 +0200
Nikolay Aleksandrov  wrote:

>   return ((p->flags & BR_HAIRPIN_MODE) || skb->dev != p->dev) &&
> - br_allowed_egress(vg, skb) && p->state == BR_STATE_FORWARDING &&
> + p->state == BR_STATE_FORWARDING && br_allowed_egress(vg, skb) &&
>   nbp_switchdev_allowed_egress(p, skb) &&
>   !br_skb_isolated(p, skb);
>  }

Maybe break this complex return for readability?

Re: [Bridge] [RFC net-next Patch 0/3] net: bridge: mrp: Add support for Media Redundancy Protocol(MRP)

2020-01-10 Thread Stephen Hemminger

On Fri, 10 Jan 2020 10:02:06 +0100
Horatiu Vultur  wrote:

> > 
> > Can this be implemented in userspace?  
> 
> The reason for putting this in kernal space is to HW offload this in
> switchdev/dsa driver. The switches which typically supports this are
> small and don't have a lot of CPU power and the bandwidth between the
> CPU and switch core is typically limited(at least this is the case with
> the switches that we are working). Therefor we need to use HW offload
> components which can inject the frames at the needed frequency and other
> components which can terminate the expected frames and just raise and
> interrupt if the test frames are not received as expected(and a few
> other HW features).
> 
> To put this in user-space we see two options:
> 1. We need to define a netlink interface which allows a user-space
> control application to ask the kernel to ask the switchdev driver to
> setup the frame-injector or frame-terminator. In theory this would be
> possible, and we have considered it, but we think that this interface
> will be too specific for our HW and will need to be changed every time
> we want to add support for a new SoC. By focusing the user-space
> interfaces on the protocol requirement, we feel more confident that we
> have an interface which we can continue to be backwards compatible with,
> and also support future/other chips with what ever facilities (if any)
> they have to HW offload.
> 
> 2. Do a UIO driver and keep protocol and driver in user-space. We do not
> really like this approach for many reasons: it pretty much prevents us from
> collaborating with the community to solve this and it will be really hard
> to have the switchdev driver controlling part of the chip and a
> user-space driver controlling other parts.
> 
> > 
> > Putting STP in the kernel was a mistake (even original author says so).
> > Adding more control protocols in kernel is a security and stability risk.  

The principal in networking is to separate control and data plane.
This is widely adopted in many areas: OVS, routing, etc.

There is an existing devlink interface for device control, it would
make sense to extend it to allow for more control of frame inject etc.

Re: [Bridge] [RFC net-next Patch 0/3] net: bridge: mrp: Add support for Media Redundancy Protocol(MRP)

2020-01-09 Thread Stephen Hemminger

On Thu, 9 Jan 2020 16:06:37 +0100
Horatiu Vultur  wrote:

> Media Redundancy Protocol is a data network protocol standardized by
> International Electrotechnical Commission as IEC 62439-2. It allows rings of
> Ethernet switches to overcome any single failure with recovery time faster 
> than
> STP. It is primarily used in Industrial Ethernet applications.
> 
> This is the first proposal of implementing a subset of the standard. It 
> supports
> only 2 roles of an MRP node. It supports only Media Redundancy Manager(MRM) 
> and
> Media Redundancy Client(MRC). In a MRP ring, each node needs to support MRP 
> and
> in a ring can be only one MRM and multiple MRC. It is possible to have 
> multiple
> instances of MRP on a single node. But a port can be part of only one MRP
> instance.
> 
> The MRM is responsible for detecting when there is a loop in the ring. It is
> sending the frame MRP_Test to detect the loops. It would send MRP_Test on both
> ports in the ring and if the frame is received at the other end, then the ring
> is closed. Meaning that there is a loop. In this case it sets the port state 
> to
> BLOCKED, not allowing traffic to pass through except MRP frames. In case it
> stops receiving MRP_Test frames from itself then the MRM will detect that the
> ring is open, therefor it would notify the other nodes of this change and will
> set the state of the port to be FORWARDING.
> 
> The MRC is responsible for forwarding MRP_Test frames between the ring ports
> (and not to flood on other ports) and to listen when there is a change in the
> network to clear the FDB.
> 
> Similar with STP, MRP is implemented on top of the bridge and they can't be
> enable at the same time. While STP runs on all ports of the bridge, MRP needs 
> to
> run only on 2 ports.
> 
> The bridge needs to:
> - notify when the link of one of the ports goes down or up, because MRP 
> instance
>   needs to react to link changes by sending MRP_LinkChange frames.
> - notify when one of the ports are removed from the bridge or when the bridge
>   is destroyed, because if the port is part of the MRP ring then MRP state
>   machine should be stopped.
> - add a handler to allow MRP instance to process MRP frames, if MRP is 
> enabled.
>   This is similar with STP design.
> - add logic for MRP frames inside the bridge. The bridge will just detect MRP
>   frames and it would forward them to the upper layer to allow to process it.
> - update the logic to update non-MRP frames. If MRP is enabled, then look also
>   at the state of the port to decide to forward or not.
> 
> To create a MRP instance on the bridge:
> $ bridge mrp add dev br0 p_port eth0 s_port eth1 ring_role 2 ring_id 1
> 
> Where:
> p_port, s_port: can be any port under the bridge
> ring_role: can have the value 1(MRC - Media Redundancy Client) or
>2(MRM - Media Redundancy Manager). In a ring can be only one MRM.
> ring_id: unique id for each MRP instance.
> 
> It is possible to create multiple instances. Each instance has to have it's 
> own
> ring_id and a port can't be part of multiple instances:
> $ bridge mrp add dev br0 p_port eth2 s_port eth3 ring_role 1 ring_id 2
> 
> To see current MRP instances and their status:
> $ bridge mrp show
> dev br0 p_port eth2 s_port eth3 ring_role 1 ring_id 2 ring_state 3
> dev br0 p_port eth0 s_port eth1 ring_role 2 ring_id 1 ring_state 4
> 
> If this patch series is well received, the in the future it could be extended
> with the following:
> - add support for Media Redundancy Automanager. This role allows a node to
>   detect if needs to behave as a MRM or MRC. The advantage of this role is 
> that
>   the user doesn't need to configure the nodes each time they are 
> added/removed
>   from a ring and it adds redundancy to the manager.
> - add support for Interconnect rings. This allow to connect multiple rings.
> - add HW offloading. The standard defines 4 recovery times (500, 200, 30 and 
> 10
>   ms). To be able to achieve 30 and 10 it is required by the HW to generate 
> the
>   MRP_Test frames and detect when the ring is open/closed.
> 
> Horatiu Vultur (3):
>   net: bridge: mrp: Add support for Media Redundancy Protocol
>   net: bridge: mrp: Integrate MRP into the bridge
>   net: bridge: mrp: Add netlink support to configure MRP
> 
>  include/uapi/linux/if_bridge.h |   27 +
>  include/uapi/linux/if_ether.h  |1 +
>  include/uapi/linux/rtnetlink.h |7 +
>  net/bridge/Kconfig |   12 +
>  net/bridge/Makefile|2 +
>  net/bridge/br.c|   19 +
>  net/bridge/br_device.c |3 +
>  net/bridge/br_forward.c|1 +
>  net/bridge/br_if.c |   10 +
>  net/bridge/br_input.c  |   22 +
>  net/bridge/br_mrp.c| 1517 
>  net/bridge/br_mrp_timer.c  |  227 +
>  net/bridge/br_netlink.c|9 +
>  net/bridge/br_private.h|   30 +
>  net/bridge/br_private_mrp.h|  208 +
>

Re: [Bridge] [PATCH iproute2 v4] iplink: add support for STP xstats

2019-12-16 Thread Stephen Hemminger

On Wed, 11 Dec 2019 20:07:11 -0500
Vivien Didelot  wrote:

> Add support for the BRIDGE_XSTATS_STP xstats, as follow:
> 
> # ip link xstats type bridge_slave dev lan4 stp
> lan4
> STP BPDU:  RX: 0 TX: 61
> STP TCN:   RX: 0 TX: 0
> STP Transitions: Blocked: 2 Forwarding: 1
> 
> Or below as JSON:
> 
> # ip -j -p link xstats type bridge_slave dev lan0 stp
> [ {
> "ifname": "lan0",
> "stp": {
> "rx_bpdu": 0,
> "tx_bpdu": 500,
> "rx_tcn": 0,
> "tx_tcn": 0,
> "transition_blk": 0,
> "transition_fwd": 0
> }
> } ]
> 
> Signed-off-by: Vivien Didelot 

This patch depends on new kernel features, should be to iproute2-next

Re: [Bridge] [PATCH iproute2 v2] iplink: add support for STP xstats

2019-12-11 Thread Stephen Hemminger

On Tue, 10 Dec 2019 13:16:33 -0500
Vivien Didelot  wrote:

> Hi Stephen,
> 
> On Mon, 9 Dec 2019 16:13:45 -0800, Stephen Hemminger 
>  wrote:
> > On Mon,  9 Dec 2019 18:05:22 -0500
> > Vivien Didelot  wrote:
> >   
> > > Add support for the BRIDGE_XSTATS_STP xstats, as follow:
> > > 
> > > # ip link xstats type bridge_slave dev lan5
> > > STP BPDU:
> > >   RX: 0
> > >   TX: 39
> > > STP TCN:
> > >   RX: 0
> > >   TX: 0
> > > STP Transitions:
> > >   Blocked: 0
> > >   Forwarding: 1
> > > IGMP queries:
> > >   RX: v1 0 v2 0 v3 0
> > >   TX: v1 0 v2 0 v3 0
> > > ...  
> > 
> > Might I suggest a more concise format:
> > STP BPDU:  RX: 0 TX: 39
> > STP TCN:   RX: 0 TX:0
> > STP Transitions: Blocked: 0 Forwarding: 1
> > ...  
> 
> I don't mind if you prefer this format ;-)

Just trying to reduce the long output. Which is already too long for the xstat 
as it is.

Re: [Bridge] [PATCH iproute2 v2] iplink: add support for STP xstats

2019-12-09 Thread Stephen Hemminger

On Mon,  9 Dec 2019 18:05:22 -0500
Vivien Didelot  wrote:

> Add support for the BRIDGE_XSTATS_STP xstats, as follow:
> 
> # ip link xstats type bridge_slave dev lan5
> STP BPDU:
>   RX: 0
>   TX: 39
> STP TCN:
>   RX: 0
>   TX: 0
> STP Transitions:
>   Blocked: 0
>   Forwarding: 1
> IGMP queries:
>   RX: v1 0 v2 0 v3 0
>   TX: v1 0 v2 0 v3 0
> ...

Might I suggest a more concise format:
STP BPDU:  RX: 0 TX: 39
STP TCN:   RX: 0 TX:0
STP Transitions: Blocked: 0 Forwarding: 1
...

Re: [Bridge] [PATCH iproute2] iplink: add support for STP xstats

2019-12-09 Thread Stephen Hemminger

On Mon,  9 Dec 2019 16:18:41 -0500
Vivien Didelot  wrote:

> diff --git a/include/uapi/linux/if_bridge.h b/include/uapi/linux/if_bridge.h
> index 31fc51bd..e7f2bb78 100644
> --- a/include/uapi/linux/if_bridge.h
> +++ b/include/uapi/linux/if_bridge.h

These headers are semi-automatically updated from the kernel.
Do not make changes here.

Re: [Bridge] [PATCH net-next] net: bridge: add STP stat counters

2019-11-22 Thread Stephen Hemminger

On Fri, 22 Nov 2019 18:07:42 -0500
Vivien Didelot  wrote:

> This adds rx_bpdu, tx_bpdu, rx_tcn, tx_tcn, transition_blk,
> transition_fwd stat counters to the bridge ports, along with sysfs
> statistics nodes under a "statistics" directory of the "brport" entry,
> providing useful information for STP, for example:
> 
> # cat /sys/class/net/lan0/brport/statistics/tx_bpdu
> 26
> # cat /sys/class/net/lan5/brport/statistics/transition_fwd
> 3
> 
> At the same time, make BRPORT_ATTR define a non-const attribute as
> this is required by the attribute group structure.
> 
> Signed-off-by: Vivien Didelot 

Please don't add more sysfs stuff. put it in netlink.

> ---
>  net/bridge/br_private.h  |  8 
>  net/bridge/br_stp.c  |  8 
>  net/bridge/br_stp_bpdu.c |  4 
>  net/bridge/br_sysfs_if.c | 35 ++-
>  4 files changed, 54 insertions(+), 1 deletion(-)
> 
> diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
> index 36b0367ca1e0..360d8030e3b2 100644
> --- a/net/bridge/br_private.h
> +++ b/net/bridge/br_private.h
> @@ -283,6 +283,14 @@ struct net_bridge_port {
>  #endif
>   u16 group_fwd_mask;
>   u16 backup_redirected_cnt;
> +
> + /* Statistics */
> + atomic_long_t   rx_bpdu;
> + atomic_long_t   tx_bpdu;
> + atomic_long_t   rx_tcn;
> + atomic_long_t   tx_tcn;
> + atomic_long_t   transition_blk;
> + atomic_long_t   transition_fwd;
>  };
>  

There is no these need to be atomic.
Atomic is expensive.

Re: [Bridge] [PATCH 0/3] Add NETIF_F_HW_BRIDGE feature

2019-08-24 Thread Stephen Hemminger

On Thu, 22 Aug 2019 21:07:27 +0200
Horatiu Vultur  wrote:

> Current implementation of the SW bridge is setting the interfaces in
> promisc mode when they are added to bridge if learning of the frames is
> enabled.
> In case of Ocelot which has HW capabilities to switch frames, it is not
> needed to set the ports in promisc mode because the HW already capable of
> doing that. Therefore add NETIF_F_HW_BRIDGE feature to indicate that the
> HW has bridge capabilities. Therefore the SW bridge doesn't need to set
> the ports in promisc mode to do the switching.
> This optimization takes places only if all the interfaces that are part
> of the bridge have this flag and have the same network driver.
> 
> If the bridge interfaces is added in promisc mode then also the ports part
> of the bridge are set in promisc mode.
> 
> Horatiu Vultur (3):
>   net: Add HW_BRIDGE offload feature
>   net: mscc: Use NETIF_F_HW_BRIDGE
>   net: mscc: Implement promisc mode.
> 
>  drivers/net/ethernet/mscc/ocelot.c | 26 --
>  include/linux/netdev_features.h|  3 +++
>  net/bridge/br_if.c | 29 -
>  net/core/ethtool.c |  1 +
>  4 files changed, 56 insertions(+), 3 deletions(-)
> 

IMHO there are already enough nerd knobs in bridge to support this.
If you hardware can't do real bridging, it is only doing MACVLAN so that
is what you should support instead.

Re: [Bridge] Split path for packets, and bridge configuration problem.

2019-08-05 Thread Stephen Hemminger

On Wed, 31 Jul 2019 11:55:37 -0700
John Clark via Bridge  wrote:

> I’m trying to configure a Linux system to bridge an ethernet link for packet 
> ingress from a local network and combine that with a radio link for the reply 
> packet egress.
> 
> I vaguely recall something like this use to be used for Satellite links, 
> where the incoming packets from the outside world were via the Sat link, but 
> outgoing packets were via a terrestrial link, phone modem or DSL, depending.
> 
> What I have working is the packet entering the bridge on the ethernet link, 
> then I see the MAC address of the node from the in the ‘bridge show’ output, 
> but even if I assign the MAC address to the ‘radio’ tap interface, I see both 
> the ‘ethernet’ port as ‘master’, and the tap device as ‘self’, but no packets 
> are sent to the tap device.
> 
> In the past I’ve just written user apps that basically implement tunnels and 
> using collection of such apps and taps, gotten packets to where I need them 
> to be.
> 
> I was trying to use more standard tools, and not write more custom code.
> 
> Thanks for any assistance.
> John Clark.
> 

Hacks like this are best done with netfilter.

Re: [PATCH net v4] net: bridge: move default pvid init/deinit to NETDEV_REGISTER/UNREGISTER

2019-08-02 Thread Stephen Hemminger

On Fri,  2 Aug 2019 13:57:36 +0300
Nikolay Aleksandrov  wrote:

> +int br_vlan_bridge_event(struct net_device *dev, unsigned long event, void 
> *ptr)
>  {
>   struct netdev_notifier_changeupper_info *info;
> - struct net_bridge *br;
> + struct net_bridge *br = netdev_priv(dev);
> + bool changed;
> + int ret = 0;
>  
>   switch (event) {
> + case NETDEV_REGISTER:
> + ret = br_vlan_add(br, br->default_pvid,
> +   BRIDGE_VLAN_INFO_PVID |
> +   BRIDGE_VLAN_INFO_UNTAGGED |
> +   BRIDGE_VLAN_INFO_BRENTRY, , NULL);
> + break;

Looks good.

As minor optimization br_vlan_add could ignore changed pointer if NULL.
This would save places where you don't care.


Something like:
diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
index 021cc9f66804..bacd3543b215 100644
--- a/net/bridge/br_vlan.c
+++ b/net/bridge/br_vlan.c
@@ -626,10 +626,11 @@ static int br_vlan_add_existing(struct net_bridge *br,
refcount_inc(>refcnt);
vlan->flags |= BRIDGE_VLAN_INFO_BRENTRY;
vg->num_vlans++;
-   *changed = true;
+   if (changed)
+   *changed = true;
}
 
-   if (__vlan_add_flags(vlan, flags))
+   if (__vlan_add_flags(vlan, flags) && changed)
*changed = true;
 
return 0;
@@ -653,7 +654,8 @@ int br_vlan_add(struct net_bridge *br, u16 vid, u16 flags, 
bool *changed,
 
ASSERT_RTNL();
 
-   *changed = false;
+   if (changed)
+   *changed = false;
vg = br_vlan_group(br);
vlan = br_vlan_find(vg, vid);
if (vlan)
@@ -679,7 +681,7 @@ int br_vlan_add(struct net_bridge *br, u16 vid, u16 flags, 
bool *changed,
if (ret) {
free_percpu(vlan->stats);
kfree(vlan);
-   } else {
+   } else if (changed) {
*changed = true;
}

[Bridge] Fw: [Bug 204389] New: bridge: AF_BRIDGE NEWNEIGH netlink message with ifidx is zero

2019-08-01 Thread Stephen Hemminger




Begin forwarded message:

Date: Wed, 31 Jul 2019 09:32:20 +
From: bugzilla-dae...@bugzilla.kernel.org
To: step...@networkplumber.org
Subject: [Bug 204389] New: bridge: AF_BRIDGE NEWNEIGH netlink message with 
ifidx is zero


https://bugzilla.kernel.org/show_bug.cgi?id=204389

Bug ID: 204389
   Summary: bridge: AF_BRIDGE NEWNEIGH netlink message with ifidx
is zero
   Product: Networking
   Version: 2.5
Kernel Version: 4.19
  Hardware: All
OS: Linux
  Tree: Mainline
Status: NEW
  Severity: normal
  Priority: P1
 Component: Other
  Assignee: step...@networkplumber.org
  Reporter: michael-...@fami-braun.de
Regression: No

My application is receiving the following netlink messages from kernel:

-- BEGIN NETLINK
MESSAGE ---
 [NETLINK HEADER] 16 octets
 .nlmsg_len = 76
 .type = 28 
 .flags = 0 <>
 .seq = 0
 .port = 0
 [PAYLOAD] 12 octets
 07 00 00 00 00 00 00 00 00 80 00 00 
 [ATTR 02] 6 octets
 4e db c4 30 92 f4 N..0..
 [PADDING] 2 octets
 00 00 ..
 .flags = 0 <>
 .seq = 0
 .port = 0
 [PAYLOAD] 12 octets
 07 00 00 00 00 00 00 00 00 80 00 00 
 [ATTR 02] 6 octets
 4e db c4 30 92 f4 N..0..
 [PADDING] 2 octets
 00 00 ..
 [ATTR 09] 4 octets
 00 00 00 00 
 [ATTR 03] 16 octets
 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
 [ATTR 05] 2 octets
 00 01 ..
 [PADDING] 2 octets
 00 00 ..
--- END NETLINK MESSAGE ---

Clearly, ifidex is zero here, so the application cannot correlate this NEWNEIGH
message to  any interface.

Adding a WARN_ON (ifidx=0) to fdb_fill_info gives the following backtrace:

[ 43.071801] [efb47c10] [c0510580] fdb_fill_info+0x180/0x22c (unreliable)
[ 43.071829] [efb47c50] [c0510684] fdb_notify.isra.22+0x58/0xd8
[ 43.071849] [efb47c70] [c0511b44] fdb_insert+0xa4/0x108
[ 43.071870] [efb47c90] [c051254c] br_fdb_insert+0x40/0x64
[ 43.071891] [efb47cb0] [c052b640] __vlan_add+0xe50/0xf24
[ 43.071911] [efb47d20] [c052a77c] br_vlan_add+0x528/0x59c
[ 43.071931] [efb47d70] [c052c38c] br_vlan_init+0x90/0xd4
[ 43.071951] [efb47da0] [c050f96c] br_dev_init+0xc8/0x170
[ 43.071983] [efb47de0] [c03cff8c] register_netdevice+0x148/0x6e8
[ 43.072003] [efb47e20] [c03d0554] register_netdev+0x28/0x50
[ 43.072026] [efb47e30] [c05154f4] br_add_bridge+0x44/0x78
[ 43.072047] [efb47e40] [c0517ce8] br_ioctl_deviceless_stub+0x2c4/0x2f0
[ 43.072069] [efb47e80] [c03a372c] sock_ioctl+0x17c/0x3d8
[ 43.072091] [efb47ed0] [c017655c] do_vfs_ioctl+0x7ac/0x894
[ 43.072111] [efb47f20] [c0176684] ksys_ioctl+0x40/0x74
[ 43.072140] [efb47f40] [c0011288] ret_from_syscall+0x0/0x3c

So clearly br_vlan_init triggers a NEWNEIGH message before bridge creation has
been completed and an ifindex assigned.

-- 
You are receiving this mail because:
You are the assignee for the bug.

Re: [PATCH net v3] net: bridge: move vlan init/deinit to NETDEV_REGISTER/UNREGISTER

2019-07-31 Thread Stephen Hemminger

  
> -int br_vlan_init(struct net_bridge *br)
> +static int br_vlan_init(struct net_bridge *br)
>  {
>   struct net_bridge_vlan_group *vg;
>   int ret = -ENOMEM;
> @@ -1083,6 +1085,8 @@ int br_vlan_init(struct net_bridge *br)
>   return ret;
>  
>  err_vlan_add:
> + RCU_INIT_POINTER(br->vlgrp, NULL);
> + synchronize_rcu();

Calling sychronize_rcu is expensive. And the callback for
notifier is always called with rtnl_head. 

Why not just keep the pointer initialization back in the
code where bridge is created, it was safe there.

Re: [PATCH RESEND net-next 1/2] br_netfilter: add struct netns_brnf

2019-06-06 Thread Stephen Hemminger

On Thu,  6 Jun 2019 13:41:41 +0200
Christian Brauner  wrote:

> +struct netns_brnf {
> +#ifdef CONFIG_SYSCTL
> + struct ctl_table_header *ctl_hdr;
> +#endif
> +
> + /* default value is 1 */
> + int call_iptables;
> + int call_ip6tables;
> + int call_arptables;
> +
> + /* default value is 0 */
> + int filter_vlan_tagged;
> + int filter_pppoe_tagged;
> + int pass_vlan_indev;
> +};

Do you really need to waste four bytes for each
flag value. If you use a u8 that would work just as well.

Bool would also work but the kernel developers frown on bool
in structures.

Re: [Bridge] [PATCH v2] bridge: Fix error path for kobject_init_and_add()

2019-05-09 Thread Stephen Hemminger

On Fri, 10 May 2019 12:57:02 +1000
"Tobin C. Harding"  wrote:

> On Fri, May 10, 2019 at 12:52:12PM +1000, Tobin C. Harding wrote:
> 
> Please ignore - I forgot about netdev procedure and the merge window.
> My bad.
> 
> Will re-send when you are open.
> 
> thanks,
> Tobin.

That only applies to new features,  bug fixes are allowed all the time.
Also please dont send networking stuff to the greater linux-kernel mailing list.

Re: [Bridge] [PATCH] net:bridge:always disable auto-tuning when the user specified MTU

2019-04-09 Thread Stephen Hemminger

On Wed, 10 Apr 2019 02:32:08 +
Huang Rui  wrote:

> For example.
> My purpose is to create a bridge br0 and join eth0 into br0.
> if we use this following way, the auto-tuning flag will not be disabled.
> 
> If eth0's mtu is 1200
> step 1.brctl addbr br0
> step 2.brctl addif br0 eth0
> step 3.ifconfig br0 mtu 1200
> step 4.ifconfig eth0 mtu 1500
> 
> Result:
> br0's MTU: 1500, eth0's MTU: 1500
> 
> Expected:
> br0's MTU: 1200, eth0's MTU: 1500
> 
> I have specified br0's MTU,  but auto-min policy works. So the MTU is
> not the result what we expected.
> As expected, if i have specified bridge's MTU, it will set the flag:
> BROPT_MTU_SET_BY_USER in net_bridge_opts disabled and auto-min/max
> policy will not work.But in this case, because the dev_set_mtu return
> early, the BROPT_MTU_SET_BY_USER flag will not be disabled and
> auto-min/max policy will still work.
> 
> Signed-off-by: Huang Rui 

A bridge like this going to drop frames.
A frame received with MTU of 1200 will get dropped.
The proper way to do this is to change MTU of both interfaces to match.

Re: [Bridge] [PATCH net v3] net: bridge: Fix ethernet header pointer before check skb forwardable

2019-01-16 Thread Stephen Hemminger

On Wed, 16 Jan 2019 21:04:21 +0800
wangyunjian  wrote:

>   if (unlikely(netpoll_tx_running(to->br->dev))) {
> + skb_push(skb, ETH_HLEN);
>   if (!is_skb_forwardable(skb->dev, skb)) {
>   kfree_skb(skb);
>   } else {
> - skb_push(skb, ETH_HLEN);
>   br_netpoll_send_skb(to, skb);
>   }
>   return;

The patch looks correct.

On minor style issue is that after moving skb_push out of the if statement,
both branches of the statement now contain a single line and therefore
the brackets are no longer necessary.  Suprised that checkpatch did
not complain about this.

Re: [Bridge] [PATCH net-next v2 1/3] net: bridge: add support for user-controlled bool options

2018-11-26 Thread Stephen Hemminger

On Sun, 25 Nov 2018 10:12:45 +0200
Nikolay Aleksandrov  wrote:

> On 24/11/2018 18:46, niko...@cumulusnetworks.com wrote:
> > On 24 November 2018 18:25:41 EET, Andrew Lunn  wrote:  
> >> On Sat, Nov 24, 2018 at 06:18:33PM +0200, niko...@cumulusnetworks.com
> >> wrote:  
> >>> On 24 November 2018 18:10:41 EET, Andrew Lunn  wrote:  
> > +int br_boolopt_toggle(struct net_bridge *br, enum br_boolopt_id  
> >> opt,  
>  bool on,  
> > + struct netlink_ext_ack *extack)
> > +{
> > +   switch (opt) {
> > +   default:
> > +   /* shouldn't be called with unsupported options */
> > +   WARN_ON(1);
> > +   break;  
> 
>  So you return 0 here, meaning the br_debug() lower down will not
>  happen. Maybe return -EOPNOTSUPP?
>   
> >>>
> >>> No, the idea here is that some option in the future might return an  
> >> error.   
> >>> This function cannot be called with unsupported option thus the warn.  

> 

Please don't implement some part of the API until it is used (YAGNI).
If do this kind of "someday will come" design the code will end up
littered with dead ends.

Re: [Bridge] [PATCH net-next 1/2] net: bridge: add support for user-controlled bool options

2018-11-22 Thread Stephen Hemminger

On Thu, 22 Nov 2018 18:01:29 +0200
Nikolay Aleksandrov  wrote:

> On 22/11/2018 17:35, Andrew Lunn wrote:
> > On Thu, Nov 22, 2018 at 06:29:24AM +0200, Nikolay Aleksandrov wrote:  
> >> We have been adding many new bridge options, a big number of which are
> >> boolean but still take up netlink attribute ids and waste space in the skb.
> >> Recently we discussed learning from link-local packets[1] and decided
> >> yet another new boolean option will be needed, thus introducing this API
> >> to save some bridge nl space.
> >> The API supports changing the value of multiple boolean options at once
> >> via the br_boolopt_multi struct which has an optmask (which options to
> >> set, bit per opt) and optval (options' new values). Future boolean
> >> options will only be added to the br_boolopt_id enum and then will have
> >> to be handled in br_boolopt_toggle/get. The API will automatically
> >> add the ability to change and export them via netlink, sysfs can use the
> >> single boolopt function versions to do the same. The behaviour with
> >> failing/succeeding is the same as with normal netlink option changing.
> >>
> >> If an option requires mapping to internal kernel flag or needs special
> >> configuration to be enabled then it should be handled in
> >> br_boolopt_toggle. It should also be able to retrieve an option's current
> >> state via br_boolopt_get.
> >>
> >> [1] https://www.spinics.net/lists/netdev/msg532698.html
> >>
> >> Signed-off-by: Nikolay Aleksandrov 
> >> ---
> >>  include/uapi/linux/if_bridge.h | 18 +
> >>  include/uapi/linux/if_link.h   |  1 +
> >>  net/bridge/br.c| 68 ++
> >>  net/bridge/br_netlink.c| 17 -
> >>  net/bridge/br_private.h|  6 +++
> >>  net/core/rtnetlink.c   |  2 +-
> >>  6 files changed, 110 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/include/uapi/linux/if_bridge.h 
> >> b/include/uapi/linux/if_bridge.h
> >> index e41eda3c71f1..6dc02c03bdf8 100644
> >> --- a/include/uapi/linux/if_bridge.h
> >> +++ b/include/uapi/linux/if_bridge.h
> >> @@ -292,4 +292,22 @@ struct br_mcast_stats {
> >>__u64 mcast_bytes[BR_MCAST_DIR_SIZE];
> >>__u64 mcast_packets[BR_MCAST_DIR_SIZE];
> >>  };
> >> +
> >> +/* bridge boolean options
> >> + * IMPORTANT: if adding a new option do not forget to handle
> >> + *it in br_boolopt_toggle/get and bridge sysfs
> >> + */
> >> +enum br_boolopt_id {
> >> +  BR_BOOLOPT_MAX
> >> +};
> >> +
> >> +/* struct br_boolopt_multi - change multiple bridge boolean options
> >> + *
> >> + * @optval: new option values (bit per option)
> >> + * @optmask: options to change (bit per option)
> >> + */
> >> +struct br_boolopt_multi {
> >> +  __u32 optval;
> >> +  __u32 optmask;
> >> +};  
> > 
> > Hi Nikolay
> > 
> > Thanks for handling this.
> > 
> > How many boolean options do we already have? What it the likelihood a
> > u32 is going to be too small, in a couple of years time?
> >   
> 
> It would mean doubling the number of current options and this is only for
> boolean options so I think we're safe.
> 
> > I recently went through the pain of converting the u32 for
> > representing link modes in the phylib API to a linux bitmap.  I'm just
> > wondering if in the long run, using a linux bitmap right from the
> > beginning would be better?
> >   
> >> +int br_boolopt_multi_toggle(struct net_bridge *br,
> >> +  struct br_boolopt_multi *bm)
> >> +{
> >> +  unsigned long bitmap = bm->optmask;
> >> +  int err = 0;
> >> +  int opt_id;
> >> +
> >> +  for_each_set_bit(opt_id, , BR_BOOLOPT_MAX) {
> >> +  bool on = !!(bm->optval & BIT(opt_id));
> >> +
> >> +  err = br_boolopt_toggle(br, opt_id, on);
> >> +  if (err) {
> >> +  br_debug(br, "boolopt multi-toggle error: option: %d 
> >> current: %d new: %d error: %d\n",
> >> +   opt_id, br_boolopt_get(br, opt_id), on, err);  
> > 
> > Would it be possible to return that to userspace using the extended
> > error infrastructure?
> >   
> 
> No, it doesn't support dynamic messages AFAIK.
> 
> >   Andrew
> >   
> 

My concern is about backwards compatibility. What about old userspace and new 
userspace tools with old kernels.
Having multiple bits does allow handling cases where certain combos won't work.

Re: [Bridge] [PATCH net] net: bridge: remove ipv6 zero address check in mcast queries

2018-10-28 Thread Stephen Hemminger

On Sat, 27 Oct 2018 12:07:47 +0300
Nikolay Aleksandrov  wrote:

> Recently a check was added which prevents marking of routers with zero
> source address, but for IPv6 that cannot happen as the relevant RFCs
> actually forbid such packets:
> RFC 2710 (MLDv1):
> "To be valid, the Query message MUST
>  come from a link-local IPv6 Source Address, be at least 24 octets
>  long, and have a correct MLD checksum."
> 
> Same goes for RFC 3810.
> 
> And also it can be seen as a requirement in ipv6_mc_check_mld_query()
> which is used by the bridge to validate the message before processing
> it. Thus any queries with :: source address won't be processed anyway.
> So just remove the check for zero IPv6 source address from the query
> processing function.
> 
> Fixes: 5a2de63fd1a5 ("bridge: do not add port to router list when receives 
> query with source 0.0.0.0")
> Signed-off-by: Nikolay Aleksandrov 

What about a broken/malicious sender? Could an all zero source be used
to poison the multicast table?

Re: [Bridge] [PATCH net-next] net: bridge: add support for per-port vlan stats

2018-10-12 Thread Stephen Hemminger

On Fri, 12 Oct 2018 13:41:16 +0300
Nikolay Aleksandrov  wrote:

> This patch adds an option to have per-port vlan stats instead of the
> default global stats. The option can be set only when there are no port
> vlans in the bridge since we need to allocate the stats if it is set
> when vlans are being added to ports (and respectively free them
> when being deleted). Also bump RTNL_MAX_TYPE as the bridge is the
> largest user of options. The current stats design allows us to add
> these without any changes to the fast-path, it all comes down to
> the per-vlan stats pointer which, if this option is enabled, will
> be allocated for each port vlan instead of using the global bridge-wide
> one.
> 
> CC: bridge@lists.linux-foundation.org
> CC: Roopa Prabhu 
> Signed-off-by: Nikolay Aleksandrov 

Yes, per-vlan stats could be quite useful.
Most cases of statistics in the kernel are always on, and some API's
get them (and skip others).  Other than the additional memory overhead, why
not make the statistics as always on.

Also, is there any chance of creating too much data in a netlink
message if there are 4K-1 VLAN's?

Re: [Bridge] [PATCH net-next 0/9] net: bridge: convert bool options to bits

2018-09-26 Thread Stephen Hemminger

On Wed, 26 Sep 2018 15:17:53 +0300
Nikolay Aleksandrov  wrote:

> Hi,
> A lot of boolean bridge options have been added around the net_bridge
> structure resulting in holes and more importantly different cache lines
> that need to be fetched in the fast path. This set moves all of those
> to bits in a bitfield which resides in a hot cache line thus reducing
> the size of net_bridge, the number of holes and the number of cache
> lines needed for the fast path.
> The set is also sent in preparation for new boolean options to avoid
> spreading them in the structure and making new holes.
> One nice side-effect is that we avoid potential race conditions by using
> the bitops since some of the options were bits being directly set in
> parallel risking hard to debug issues (has_ipv6_addr).
> 
> Before:
>  size: 1184, holes: 8, sum holes: 30
> After:
>  size: 1160, holes: 2, sum holes: 3
> 
> Patch 01 is a trivial style fix
> Patch 02 adds the new options bitfield and converts the vlan boolean
>  options to bits
> Patches 03-08 convert the rest of the boolean options to bits
> Patch 09 re-arranges a few fields in net_bridge to avoid fetching one
>  extra cache line in fast path and to further reduce size
> 
> Thanks,
>  Nik
> 
> 
> Nikolay Aleksandrov (9):
>   net: bridge: make struct opening bracket consistent
>   net: bridge: add bitfield for options and convert vlan opts
>   net: bridge: convert nf call options to bits
>   net: bridge: convert group_addr_set option to a bit
>   net: bridge: convert and rename mcast disabled
>   net: bridge: convert mcast options to bits
>   net: bridge: convert neigh_suppress_enabled option to a bit
>   net: bridge: convert mtu_set_by_user to a bit
>   net: bridge: pack net_bridge better
> 
>  net/bridge/br.c | 16 +
>  net/bridge/br_arp_nd_proxy.c| 13 +---
>  net/bridge/br_device.c  |  6 ++--
>  net/bridge/br_if.c  |  4 +--
>  net/bridge/br_input.c   |  2 +-
>  net/bridge/br_mdb.c |  6 ++--
>  net/bridge/br_multicast.c   | 54 +++---
>  net/bridge/br_netfilter_hooks.c |  7 ++--
>  net/bridge/br_netlink.c | 31 +
>  net/bridge/br_private.h | 74 
> +++--
>  net/bridge/br_sysfs_br.c| 32 +-
>  net/bridge/br_vlan.c| 30 +++++----
>  12 files changed, 155 insertions(+), 120 deletions(-)
> 

Looks good, these flags grew over time and cleanup was overdue.

Reviewed-by: Stephen Hemminger

Re: [Bridge] [PATCH net-next] net: bridge: add support for sticky fdb entries

2018-09-10 Thread Stephen Hemminger

On Mon, 10 Sep 2018 13:16:01 +0300
Nikolay Aleksandrov  wrote:

> Add support for entries which are "sticky", i.e. will not change their port
> if they show up from a different one. A new ndm flag is introduced for that
> purpose - NTF_STICKY. We allow to set it only to non-local entries.

Is there a name for this in other network switch API's?

> 
> Signed-off-by: Nikolay Aleksandrov 
> ---
> I'll send the selftest for sticky with the iproute2 patch if this one is
> accepted. We've had multiple requests to support such flag and now it's
> also needed for some eVPN and clag setups.
> 
>  include/uapi/linux/neighbour.h |  1 +
>  net/bridge/br_fdb.c| 19 ---
>  net/bridge/br_private.h|  1 +
>  3 files changed, 18 insertions(+), 3 deletions(-)
> 
> diff --git a/include/uapi/linux/neighbour.h b/include/uapi/linux/neighbour.h
> index 904db6148476..998155444e0d 100644
> --- a/include/uapi/linux/neighbour.h
> +++ b/include/uapi/linux/neighbour.h
> @@ -43,6 +43,7 @@ enum {
>  #define NTF_PROXY0x08/* == ATF_PUBL */
>  #define NTF_EXT_LEARNED  0x10
>  #define NTF_OFFLOADED   0x20
> +#define NTF_STICKY   0x40
>  #define NTF_ROUTER   0x80
>  
>  /*
> diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
> index 502f66349530..26569ed06a4d 100644
> --- a/net/bridge/br_fdb.c
> +++ b/net/bridge/br_fdb.c
> @@ -584,7 +584,7 @@ void br_fdb_update(struct net_bridge *br, struct 
> net_bridge_port *source,
>   unsigned long now = jiffies;
>  
>   /* fastpath: update of existing entry */
> - if (unlikely(source != fdb->dst)) {
> + if (unlikely(source != fdb->dst && !fdb->is_sticky)) {
>   fdb->dst = source;
>   fdb_modified = true;
>   /* Take over HW learned entry */
> @@ -656,6 +656,8 @@ static int fdb_fill_info(struct sk_buff *skb, const 
> struct net_bridge *br,
>   ndm->ndm_flags |= NTF_OFFLOADED;
>   if (fdb->added_by_external_learn)
>   ndm->ndm_flags |= NTF_EXT_LEARNED;
> + if (fdb->is_sticky)
> + ndm->ndm_flags |= NTF_STICKY;
>  
>   if (nla_put(skb, NDA_LLADDR, ETH_ALEN, >key.addr))
>   goto nla_put_failure;
> @@ -772,7 +774,8 @@ int br_fdb_dump(struct sk_buff *skb,
>  
>  /* Update (create or replace) forwarding database entry */
>  static int fdb_add_entry(struct net_bridge *br, struct net_bridge_port 
> *source,
> -  const __u8 *addr, __u16 state, __u16 flags, __u16 vid)
> +  const u8 *addr, u16 state, u16 flags, u16 vid,
> +  u8 is_sticky)

Why not change the API to take a full ndm flags, someone is sure to add more 
later.

Re: [Bridge] [PATCH net-next 1/2] net: bridge: add support for raw sysfs port options

2018-07-20 Thread Stephen Hemminger

On Fri, 20 Jul 2018 20:47:26 +0300
Nikolay Aleksandrov  wrote:

> On July 20, 2018 8:26:36 PM GMT+03:00, Nikolay Aleksandrov 
>  wrote:
> >On July 20, 2018 8:20:44 PM GMT+03:00, Stephen Hemminger
> > wrote:  
> >>On Fri, 20 Jul 2018 20:14:43 +0300
> >>Nikolay Aleksandrov  wrote:
> >>  
> >>> >Casting away the const on the buf variable is going to cause  
> >>warnings  
> >>> >and should not be necessary.
> >>> >
> >>> 
> >>> It doesn't when it's casted like that, the new line is changed to  
> >>null byte so we need to drop   
> >>> the const.  
> >>
> >>Then change store function to take a char *?  
> >
> >Sure, will do.  
> 
> Acrually I can't change sysfs_ops store prototype. 
> That is the reason the bond also casts the ptr.
> 
> 
> 

Ok.

Re: [Bridge] [PATCH net-next 1/2] net: bridge: add support for raw sysfs port options

2018-07-20 Thread Stephen Hemminger

On Fri, 20 Jul 2018 20:14:43 +0300
Nikolay Aleksandrov  wrote:

> >Casting away the const on the buf variable is going to cause warnings
> >and should not be necessary.
> >  
> 
> It doesn't when it's casted like that, the new line is changed to null byte 
> so we need to drop 
> the const.

Then change store function to take a char *?

Re: [Bridge] [PATCH net-next 2/2] net: bridge: add support for backup port

2018-07-20 Thread Stephen Hemminger

On Fri, 20 Jul 2018 17:48:26 +0300
Nikolay Aleksandrov  wrote:

> This patch adds a new port attribute - IFLA_BRPORT_BACKUP_PORT, which
> allows to set a backup port to be used for known unicast traffic if the
> port has gone carrier down. The backup pointer is rcu protected and set
> only under RTNL, a counter is maintained so when deleting a port we know
> how many other ports reference it as a backup and we remove it from all.
> Also the pointer is in the first cache line which is hot at the time of
> the check and thus in the common case we only add one more test.
> The backup port will be used only for the non-flooding case since
> it's a part of the bridge and the flooded packets will be forwarded to it
> anyway. To remove the forwarding just send a 0/non-existing backup port.
> This is used to avoid numerous scalability problems when using MLAG most
> notably if we have thousands of fdbs one would need to change all of them
> on port carrier going down which takes too long and causes a storm of fdb
> notifications (and again when the port comes back up). In a Multi-chassis
> Link Aggregation setup usually hosts are connected to two different
> switches which act as a single logical switch. Those switches usually have
> a control and backup link between them called peerlink which might be used
> for communication in case a host loses connectivity to one of them.
> We need a fast way to failover in case a host port goes down and currently
> none of the solutions (like bond) cannot fulfill the requirements because
> the participating ports are actually the "master" devices and must have the
> same peerlink as their backup interface and at the same time all of them
> must participate in the bridge device. As Roopa noted it's normal practice
> in routing called fast re-route where a precalculated backup path is used
> when the main one is down.
> Another use case of this is with EVPN, having a single vxlan device which
> is backup of every port. Due to the nature of master devices it's not
> currently possible to use one device as a backup for many and still have
> all of them participate in the bridge (which is master itself).
> More detailed information about MLAG is available at the link below.
> https://docs.cumulusnetworks.com/display/DOCS/Multi-Chassis+Link+Aggregation+-+MLAG
> 
> Signed-off-by: Nikolay Aleksandrov 

Trying to understand this.

Is it the case that what you are trying to solve is the way MLAG
and bridging interact on the Linux side or more a limitation of how
switches operate?  Wouldn't this work?
 
br0 -- team0 -- eth1
 +- eth2

The bridge would only have fdb entries for the team device.
Why do eth1 and eth2 have to be master devices?  Why would eth1
and eth2 need to be bridge ports.

This kind of thing in the bridge is most likely inevitable, and
I am guilty of introducing same logic into Hyper-V driver.
But still getting pushback that the multi device model is better.

Re: [Bridge] [PATCH net-next 1/2] net: bridge: add support for raw sysfs port options

2018-07-20 Thread Stephen Hemminger

On Fri, 20 Jul 2018 17:48:25 +0300
Nikolay Aleksandrov  wrote:

> This patch adds a new alternative store callback for port sysfs options
> which takes a raw value (buf) and can use it directly. It is needed for the
> backup port sysfs support since we have to pass the device by its name.
> 
> Signed-off-by: Nikolay Aleksandrov 
> ---
> There are a few checkpatch warnings here because of the old code, I've
> noted them in my todo and will send separate patch for simple_strtoul
> and the function prototypes.
> 
>  net/bridge/br_sysfs_if.c | 49 
> +---
>  1 file changed, 34 insertions(+), 15 deletions(-)
> 
> diff --git a/net/bridge/br_sysfs_if.c b/net/bridge/br_sysfs_if.c
> index f99c5bf5c906..38c58879423d 100644
> --- a/net/bridge/br_sysfs_if.c
> +++ b/net/bridge/br_sysfs_if.c
> @@ -25,6 +25,15 @@ struct brport_attribute {
>   struct attributeattr;
>   ssize_t (*show)(struct net_bridge_port *, char *);
>   int (*store)(struct net_bridge_port *, unsigned long);
> + int (*store_raw)(struct net_bridge_port *, char *);
> +};
> +
> +#define BRPORT_ATTR_RAW(_name, _mode, _show, _store) \
> +const struct brport_attribute brport_attr_##_name = {
> \
> + .attr   = {.name = __stringify(_name),  \
> +.mode = _mode }, \
> + .show   = _show,\
> + .store_raw  = _store,   \
>  };
>  
>  #define BRPORT_ATTR(_name, _mode, _show, _store) \
> @@ -270,27 +279,37 @@ static ssize_t brport_store(struct kobject *kobj,
>   struct brport_attribute *brport_attr = to_brport_attr(attr);
>   struct net_bridge_port *p = to_brport(kobj);
>   ssize_t ret = -EINVAL;
> - char *endp;
>   unsigned long val;
> + char *endp;
>  
>   if (!ns_capable(dev_net(p->dev)->user_ns, CAP_NET_ADMIN))
>   return -EPERM;
>  
> - val = simple_strtoul(buf, , 0);
> - if (endp != buf) {
> - if (!rtnl_trylock())
> - return restart_syscall();
> - if (p->dev && p->br && brport_attr->store) {
> - spin_lock_bh(>br->lock);
> - ret = brport_attr->store(p, val);
> - spin_unlock_bh(>br->lock);
> - if (!ret) {
> - br_ifinfo_notify(RTM_NEWLINK, NULL, p);
> - ret = count;
> - }
> - }
> - rtnl_unlock();
> + if (!rtnl_trylock())
> + return restart_syscall();
> +
> + if (!p->dev || !p->br)
> + goto out_unlock;
> +
> + if (brport_attr->store_raw) {
> + spin_lock_bh(>br->lock);
> + ret = brport_attr->store_raw(p, (char *)buf);
> + spin_unlock_bh(>br->lock);
> + } else if (brport_attr->store) {
> + val = simple_strtoul(buf, , 0);
> + if (endp == buf)
> + goto out_unlock;
> + spin_lock_bh(>br->lock);
> + ret = brport_attr->store(p, val);
> + spin_unlock_bh(>br->lock);
> + }
> + if (!ret) {
> + br_ifinfo_notify(RTM_NEWLINK, NULL, p);
> + ret = count;
>   }
> +out_unlock:
> + rtnl_unlock();
> +
>   return ret;
>  }
>  

I see where you are going with this. You want a sysfs attribute where is a 
string
not a number. This makes sense.

The code might be simpler if you always acquired  the lock. Or maybe
move the locking into the store functions.

Casting away the const on the buf variable is going to cause warnings
and should not be necessary.

You might also want an error case if neither store or store_raw are defined.

Re: [Bridge] [PATCH net-next 0/2] net: bridge: add support for backup port

2018-07-20 Thread Stephen Hemminger

On Fri, 20 Jul 2018 17:48:24 +0300
Nikolay Aleksandrov  wrote:

> Hi,
> This set introduces a new bridge port option that allows any port to have
> any other port (in the same bridge of course) as its backup and traffic
> will be forwarded to the backup port when the primary goes down. This is
> mainly used in MLAG and EVPN setups where we have peerlink path which is
> a backup of many (or even all) ports and is a participating bridge port
> itself. There's more detailed information in patch 02. Patch 01 just
> prepares the port sysfs code for options that take raw value. The main
> issues that this set solves are scalability and fallback latency.
> 
> We have used similar code for over 6 months now to bring the fallback
> latency of the backup peerlink down and avoid fdb notification storms.
> Also due to the nature of master devices such setup is currently not
> possible, and last but not least having tens of thousands of fdbs require
> thousands of calls to switch.
> 
> I've also CCed our MLAG experts that have been using similar option.
> 
> Thanks,
>  Nik
> 
> 
> Nikolay Aleksandrov (2):
>   net: bridge: add support for raw sysfs port options
>   net: bridge: add support for backup port
> 
>  include/uapi/linux/if_link.h |  1 +
>  net/bridge/br_forward.c  | 16 -
>  net/bridge/br_if.c   | 53 
>  net/bridge/br_netlink.c  | 30 +++-
>  net/bridge/br_private.h  |  3 ++
>  net/bridge/br_sysfs_if.c | 82 
> 
>  6 files changed, 168 insertions(+), 17 deletions(-)
> 

Not sure why this has to be built into the bridge.
There already is bonding and teaming, why invent yet another?

Re: [Bridge] [PATCH] net: bridge: fix potential null pointer dereference on return from br_port_get_rtnl()

2018-06-21 Thread Stephen Hemminger

On Fri, 22 Jun 2018 07:20:56 +0900 (KST)
David Miller  wrote:

> From: Garry McNulty 
> Date: Thu, 21 Jun 2018 21:14:27 +0100
> 
> > br_port_get_rtnl() can return NULL if the network device is not a bridge
> > port (IFF_BRIDGE_PORT flag not set). br_port_slave_changelink() and
> > br_port_fill_slave_info() callbacks dereference this pointer without
> > checking. Currently this is not a problem because slave devices always
> > set this flag. Add null check in case these conditions ever change.
> > 
> > Detected by CoverityScan, CID 1339613 ("Dereference null return value")
> > 
> > Signed-off-by: Garry McNulty   
> 
> I don't think this is reasonable.
> 
> The bridge code will never, ever, install a slave that doesn't have
> that bit set.  It's the most fundamental aspect of how these objects
> are managed.

Agree with dave. Workarounds for static tools are ok if they don't introduce
other paths. But if your fix introduces another error path which can never
be reached, it is hurting not helping.

Re: [Bridge] [PATCH net-next] net: bridge: Lock before br_fdb_find()

2018-05-28 Thread Stephen Hemminger

On Mon, 28 May 2018 17:44:16 +0200
Petr Machata  wrote:

> Callers of br_fdb_find() need to hold the hash lock, which
> br_fdb_find_port() doesn't do. Add the missing lock/unlock
> pair.
> 
> Signed-off-by: Petr Machata 
> ---
>  net/bridge/br_fdb.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
> index b19e310..3f5691a 100644
> --- a/net/bridge/br_fdb.c
> +++ b/net/bridge/br_fdb.c
> @@ -135,9 +135,11 @@ struct net_device *br_fdb_find_port(const struct 
> net_device *br_dev,
>   return NULL;
>  
>   br = netdev_priv(br_dev);
> + spin_lock_bh(>hash_lock);
>   f = br_fdb_find(br, addr, vid);
>   if (f && f->dst)
>   dev = f->dst->dev;
> + spin_unlock_bh(>hash_lock);
>  
>   return dev;
>  }

Sigh. when did br_fdb_find start needing hash_lock?
What is the point of RCU then?

[Bridge] [PATCH net] bridge: netfilter stp fix reference to uninitialized data

2018-04-27 Thread Stephen Hemminger

The destination mac (destmac) is only valid if EBT_DESTMAC flag
is set. Fix by changing the order of the comparison to look for
the flag first.

Reported-by: syzbot+5c06e318fc558cc27...@syzkaller.appspotmail.com
Signed-off-by: Stephen Hemminger <step...@networkplumber.org>
---
Note: no fixes since this bug goes back to pre-git days.
Should go to stable as well.

 net/bridge/netfilter/ebt_stp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/bridge/netfilter/ebt_stp.c b/net/bridge/netfilter/ebt_stp.c
index 47ba98db145d..46c1fe7637ea 100644
--- a/net/bridge/netfilter/ebt_stp.c
+++ b/net/bridge/netfilter/ebt_stp.c
@@ -161,8 +161,8 @@ static int ebt_stp_mt_check(const struct xt_mtchk_param 
*par)
/* Make sure the match only receives stp frames */
if (!par->nft_compat &&
(!ether_addr_equal(e->destmac, eth_stp_addr) ||
-!is_broadcast_ether_addr(e->destmsk) ||
-!(e->bitmask & EBT_DESTMAC)))
+!(e->bitmask & EBT_DESTMAC) ||
+!is_broadcast_ether_addr(e->destmsk)))
return -EINVAL;
 
return 0;
-- 
2.17.0

Re: [Bridge] [PATCH net-next v2 1/6] net: bridge: Publish bridge accessor functions

2018-04-27 Thread Stephen Hemminger

On Fri, 27 Apr 2018 18:11:06 +0300
Ido Schimmel  wrote:

> +int br_vlan_pvid_rtnl(const struct net_device *dev, u16 *p_pvid)
> +{
> + struct net_bridge_vlan_group *vg;
> +
> + ASSERT_RTNL();
> + if (netif_is_bridge_master(dev))
> + vg = br_vlan_group(netdev_priv(dev));
> + else
> + return -EINVAL;
> +
> + *p_pvid = br_get_pvid(vg);
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(br_vlan_pvid_rtnl);


Unless there is a RCU version, no need for _rtnl suffix.

Minor style issue, why not make error then go on and return -1 on error.



int br_vlan_pvid(const struct net_device *dev)
{
const struct net_bridge_vlan_group *vg;

ASSERT_RTNL();

if (!netif_is_bridge_master(dev))
return -1;

vg = br_vlan_group(netdev_priv(dev));
return br_get_pvid(vg);
}

Re: [Bridge] [PATCH net-next 6/6] mlxsw: spectrum_span: Allow bridge for gretap mirror

2018-04-26 Thread Stephen Hemminger

On Thu, 26 Apr 2018 13:50:12 +0300
Nikolay Aleksandrov  wrote:

> On 26/04/18 12:06, Ido Schimmel wrote:
> > From: Petr Machata 
> > 
> > When handling mirroring to a gretap or ip6gretap netdevice in mlxsw, the
> > underlay address (i.e. the remote address of the tunnel) may be routed
> > to a bridge.
> > 
> > In that case, look up the resolved neighbor Ethernet address in that
> > bridge's FDB. Then configure the offload to direct the mirrored traffic
> > to that port, possibly with tagging.
> > 
> > Signed-off-by: Petr Machata 
> > Signed-off-by: Ido Schimmel 
> > ---
> >   .../net/ethernet/mellanox/mlxsw/spectrum_span.c| 102 
> > +++--
> >   .../net/ethernet/mellanox/mlxsw/spectrum_span.h|   1 +
> >   2 files changed, 97 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_span.c 
> > b/drivers/net/ethernet/mellanox/mlxsw/spectrum_span.c
> > index 65a77708ff61..92fb81839852 100644
> > --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_span.c
> > +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_span.c
> > @@ -32,6 +32,7 @@
> >* POSSIBILITY OF SUCH DAMAGE.
> >*/
> >   
> > +#include 
> >   #include 
> >   #include 
> >   #include 
> > @@ -39,8 +40,9 @@
> >   #include 
> >   
> >   #include "spectrum.h"
> > -#include "spectrum_span.h"
> >   #include "spectrum_ipip.h"
> > +#include "spectrum_span.h"
> > +#include "spectrum_switchdev.h"
> >   
> >   int mlxsw_sp_span_init(struct mlxsw_sp *mlxsw_sp)
> >   {
> > @@ -167,6 +169,79 @@ mlxsw_sp_span_entry_unoffloadable(struct 
> > mlxsw_sp_span_parms *sparmsp)
> > return 0;
> >   }
> >   
> > +static struct net_device *
> > +mlxsw_sp_span_entry_bridge_8021q(const struct net_device *br_dev,
> > +unsigned char *dmac,
> > +u16 *p_vid)
> > +{
> > +   struct net_bridge_vlan_group *vg = br_vlan_group_rtnl(br_dev);
> > +   u16 pvid = br_vlan_group_pvid(vg);
> > +   struct net_device *edev = NULL;
> > +   struct net_bridge_vlan *v;
> > +  
> 
> Hi,
> These structures are not really exported anywhere outside of br_private.h
> Instead of passing them around and risking someone else actually trying to
> dereference an incomplete type, why don't you add just 2 new helpers -
> br_vlan_pvid(netdevice) and br_vlan_info(netdevice, vid, *bridge_vlan_info)
> 
> br_vlan_info can return the exported and already available type 
> "bridge_vlan_info"
> (defined in uapi/linux/if_bridge.h) into the bridge_vlan_info arg
> and br_vlan_pvid is obvious - return the current dev pvid if available.
> 
> Another bonus you'll avoid dealing with the bridge's vlan internals.

I am particularly worried that any locking changes will impact the device
driver. Also lock inversion issues, and implied (or not RTNL).
Also what if a value is changed, there is no notification path back to the 
device.

Having an API that only takes net_device and vlan seems preferable.

Re: [Bridge] [PATCH net-next] bridge: use hlist_entry_safe

2018-04-26 Thread Stephen Hemminger

On Thu, 26 Apr 2018 11:07:05 +0800
YueHaibing <yuehaib...@huawei.com> wrote:

> Use hlist_entry_safe() instead of open-coding it.
> 
> Signed-off-by: YueHaibing <yuehaib...@huawei.com>
> ---
>  net/bridge/br_forward.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
> index b4eed11..7a7fd67 100644
> --- a/net/bridge/br_forward.c
> +++ b/net/bridge/br_forward.c
> @@ -274,8 +274,7 @@ void br_multicast_flood(struct net_bridge_mdb_entry *mdst,
>   struct net_bridge_port *port, *lport, *rport;
>  
>   lport = p ? p->port : NULL;
> - rport = rp ? hlist_entry(rp, struct net_bridge_port, rlist) :
> -  NULL;
> + rport = hlist_entry_safe(rp, struct net_bridge_port, rlist);
>  
>   if ((unsigned long)lport > (unsigned long)rport) {
>
port = lport;

Reviewed-by: Stephen Hemminger <step...@networkplumber.org>

Re: [Bridge] Bridge with 1x eth and 2x tun, adding 2nd eth, hosts become unavailable

2018-04-08 Thread Stephen Hemminger

On Sat, 7 Apr 2018 09:59:37 +0200
"Marc Roos"  wrote:

> In this situation I can ping a vm from br0 connected to 172.16.1.y
> 
> 
> 
>+---+   
>|BR0|   
>172.16.1.y  |   |  172.16.1.x
>|ETH2   TUN1|---   
>|   |   
>|   |  172.16.1.x
>|   TUN2|---   
>|   |
>+---+ 
> 
> When only adding a 2nd eth adapter with 'brctl addif br0 eth1' connected 
> to 10.11.12.y. The same ping is not working anymore.
> 
> 
>+---+   
>|BR0|   
>172.16.1.y  |   |  172.16.1.x
>|ETH2   TUN1|---   
>|   |   
>|   |  172.16.1.x
> 10.11.12.y |   TUN2|---   
>|ETH1   |
>+---+   
>
> When I remove the eth1 again, the ping starts working again.
> 
> Can someone explain me why the bridge is behaving like this? And how 
> this can be fixed?
> 
>   
> 
>  

Are you creating a spanning tree loop?  Also. Bridge is a Layer 2 device
not a Layer 3 device; therefore generally all interfaces in a bridge have same
subnet.

Re: [Bridge] [PATCH] brcmfmac: drop Inter-Access Point Protocol packets by default

2018-03-14 Thread Stephen Hemminger

On Wed, 14 Mar 2018 17:08:48 +0200
Kalle Valo  wrote:

> Arend van Spriel  writes:
> 
> > On 3/14/2018 3:24 PM, Kalle Valo wrote:  
> >>> +config BRCMFMAC_IAPP  
> >>> >+bool "Partial support for obsoleted Inter-Access Point Protocol"
> >>> >+depends on BRCMFMAC
> >>> >+---help---
> >>> >+  Most of Broadcom's firmwares can send 802.11f ADD frame every
> >>> >+  time new STA connects to the AP interface. Some recent ones
> >>> >+  can also disassociate STA when they receive such a frame.
> >>> >+
> >>> >+  It's important to understand this behavior can lead to a local
> >>> >+  DoS security issue. Attacker may trigger disassociation of any
> >>> >+  STA by sending a proper Ethernet frame to the wireless
> >>> >+  interface.
> >>> >+
> >>> >+  Moreover this feature may break AP interfaces in some specific
> >>> >+  setups. This applies e.g. to the bridge with hairpin mode
> >>> >+  enabled and IFLA_BRPORT_MCAST_TO_UCAST set. IAPP packet
> >>> >+  generated by a firmware will get passed back to the wireless
> >>> >+  interface and cause immediate disassociation of just-connected
> >>> >+  STA.  
> >> Sorry for jumping late, but does it really make sense to have a Kconfig
> >> option for this? I don't think we should add a Kconfig option for every
> >> strange feature, there should be stronger reasons (size savings etc)
> >> before adding a Kconfig option.
> >>
> >> And in this case the size savings can't be much. Wouldn't a module
> >> parameter be simpler for a functionality change like this?  
> >
> > Hi Kalle,
> >
> > Good to be wary about Kconfig option.  
> 
> I think Linus doesn't like pointless Kconfig options, me neither for
> that matter, so I try to make sure the justifications are really there
> before adding anything new.
> 
> > So my reason for asking a Kconfig option is that this is directly in
> > the datapaths (tx and rx) so I prefer to disable/enable it compile
> > time rather then runtime.  
> 
> I'm no cpu profile expert but is really one (or two?) if checks of a
> cached variable in the datapath really measurable? My guess is that it's
> just noise in the results.
> 
> But I'm not going to argue about it, if you think it's still needed I'm
> fine with that. Just mention in the commit log the justification the new
> Kconfig option.


If you have to disable it a module parameter is not a complete disaster

Re: [Bridge] [PATCH] brcmfmac: drop Inter-Access Point Protocol packets by default

2018-03-14 Thread Stephen Hemminger

On Wed, 14 Mar 2018 12:01:19 +0100
Rafał Miłecki  wrote:

> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c 
> b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
> index 19048526b4af..db6987015fb1 100644
> --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
> @@ -230,6 +230,34 @@ static void brcmf_netdev_set_multicast_list(struct 
> net_device *ndev)
>   schedule_work(>multicast_work);
>  }
>  
> +/**
> + * brcmf_skb_is_iapp - checks if skb is an IAPP packet
> + *
> + * @skb: skb to check
> + */
> +static bool brcmf_skb_is_iapp(struct sk_buff *skb)
> +{
> + const u8 iapp_l2_update_packet[6] __aligned(2) = {
> + 0x00, 0x01, 0xaf, 0x81, 0x01, 0x00,
> + };
> + unsigned char *eth_data = skb_mac_header(skb) + ETH_HLEN;
> +#if !defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
> + const u16 *a = (const u16 *)eth_data;
> + const u16 *b = (const u16 *)iapp_l2_update_packet;
> +#endif
> +
> + if (skb->len - skb->mac_len != 6 ||
> + !is_multicast_ether_addr(eth_hdr(skb)->h_dest))
> + return false;
> +
> +#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
> + return !(((*(const u32 *)eth_data) ^ (*(const u32 
> *)iapp_l2_update_packet)) |
> +  ((*(const u16 *)(eth_data + 4)) ^ (*(const u16 
> *)(iapp_l2_update_packet + 4;
> +#else
> + return !((a[0] ^ b[0]) | (a[1] ^ b[1]) | (a[2] ^ b[2]));
> +#endif
> +}
> +
>  static netdev_tx_t brcmf_netdev_start_xmit(struct sk_buff *skb,
>  struct net_device *ndev)
>  {
> @@ -250,6 +278,12 @@ static netdev_tx_t brcmf_netdev_start_xmit(struct 
> sk_buff *skb,
>   goto done;
>   }
>  
> + if (!IS_ENABLED(CONFIG_BRCMFMAC_IAPP) && brcmf_skb_is_iapp(skb)) {
> + dev_kfree_skb(skb);
> + ret = -EINVAL;
> + goto done;
> + }
> +

The usual way to handle config options in kernel is either inline
stub function or #define

#ifdef CONFIG_BRFMMAC_IAPP
static bool brcmf_skb_is_app(...) {
real code
}
#else
#define brcmf_skb_is_app  (false)
#endif

Re: [Bridge] Problem with bridge (mcast-to-ucast + hairpin) and Broadcom's 802.11f in their FullMAC fw

2018-03-12 Thread Stephen Hemminger

On Mon, 12 Mar 2018 23:42:48 +0100
Rafał Miłecki  wrote:

> 2) Blame bridge + mcast-to-ucast + hairpin for 802.11f incompatibility
> 
> If we agree that 802.11f support in FullMAC firmware is acceptable, then
> we have to make sure Linux's bridge doesn't break it by passing 802.11f
> (broadcast) frames back to the source interface. That would require a
> check like in below diff + proper code for handling such packets. I'm
> afraid I'm not familiar with bridge code enough to complete that.
> 
> diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
> index edae702..9e5d6ea 100644
> --- a/net/bridge/br_input.c
> +++ b/net/bridge/br_input.c
> @@ -126,6 +126,27 @@ static void br_do_proxy_arp(struct sk_buff *skb, struct 
> net_bridge *br,
>   }
>   }
> 
> +static bool br_skb_is_iapp_add_packet(struct sk_buff *skb)
> +{
> + const u8 iapp_add_packet[6] __aligned(2) = {
> + 0x00, 0x01, 0xaf, 0x81, 0x01, 0x00,
> + };
> +#if !defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
> + const u16 *a = (const u16 *)skb->data;
> + const u16 *b = (const u16 *)iapp_add_packet;
> +#endif
> +
> + if (skb->len != 6)
> + return false;
> +
> +#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
> + return !(((*(const u32 *)skb->data) ^ (*(const u32 *)iapp_add_packet)) |
> +  ((*(const u16 *)(skb->data + 4)) ^ (*(const u16 
> *)(iapp_add_packet + 4;
> +#else
> + return !((a[0] ^ b[0]) | (a[1] ^ b[1]) | (a[2] ^ b[2]));
> +#endif
> +}
> +
>   /* note: already called with rcu_read_lock */
>   int br_handle_frame_finish(struct net *net, struct sock *sk, struct sk_buff 
> *skb)
>   {
> @@ -155,6 +176,8 @@ int br_handle_frame_finish(struct net *net, struct sock 
> *sk, struct sk_buff *skb
>   if (is_multicast_ether_addr(dest)) {
>   /* by definition the broadcast is also a multicast address */
>   if (is_broadcast_ether_addr(dest)) {
> + if (br_skb_is_iapp_add_packet(skb))
> + pr_warn("This packet should not be passed back 
> to the source interface!\n");
>   pkt_type = BR_PKT_BROADCAST;
>   local_rcv = true;
>   } else {


Don't like bridge doing special case code for magic received values directly in 
input path.
Really needs to be generic which is why I suggested ebtables.

Re: [Bridge] [PATCH net-next 1/5] bridge: initialize port flags with switchdev defaults

2018-03-10 Thread Stephen Hemminger

On Fri,  9 Mar 2018 19:03:04 -0800
Igor Mitsyanko  wrote:

> Default bridge port flags for switchdev devices can be different from
> what is used in bridging core. Get default value from switchdev itself
> on port initialization.
> 
> Signed-off-by: Igor Mitsyanko 
> ---

Yes hardware devices may come it with different default values.
But the user experience on Linux needs to be consistent.

Instead, it makes more sense to me for each device driver using switchdev
to program to the values that it sees in the bridge.

Please revise this.

Re: [Bridge] ssh connection not working when ssh server is behind a linux bridge

2018-01-28 Thread Stephen Hemminger

On Sun, 28 Jan 2018 22:15:34 +0200
Adrian Pascalau  wrote:

> Hi,
> 
> I have a strange issue with a linux bridge and a openssh server
> running in a VM connected to that bridge. Basically when I ssh with
> Putty or any other windows based ssh client to an openssh server
> running in a centos VM connected to the external network through a
> linux bridge, the ssh connection hangs before the login prompt is
> shown.
> 
> What I have learned is that when an Ethernet frame that is less then
> 60 bytes in size goes through the network, it is padded with 0x00
> bytes until it has 60 bytes in length (64 with the frame check
> sequence). When this kind of padded Ethernet frame goes from the
> openssh server in my VM through the linux bridge to a windows host
> where the Putty ssh client is, the IP and TCP headers in those frames
> wrongly consider the 0x00 padded bytes as part of the IP / TCP user
> data, therefore the upstream protocol (SSH in my case) tries to
> interpret them, and the ssh session hangs.
> 
> My understanding is that those 0x00 padded bytes are at the layer2
> Ethernet frame level, and should not be considered in the user data of
> the higher level protocols. About the padding bytes I have found some
> info here: https://wiki.wireshark.org/Ethernet#Allowed_Packet_Lengths
> 
> So I suspect this behavior is caused by the linux bridge, because
> without the linux bridge, the ssh connection works without any issue.
> What I mean is that if I ssh to the host operating system, the ssh
> connection works without any issue, however if I ssh to the centos VM
> that is running in that host operating system and that uses the linux
> bridge for external network access, then I have this behavior that I
> describe above. With other words, only when the linux bridge is in the
> path of the ssh packets, this issue happens.
> 
> Now, my centos VM where I have this ssh issue is managed by libvirt,
> and the bridge is in forwarding mode and created by hand. If the same
> centos VM is migrated into an all-in-one OpenStack Pike host, where
> the bridge is managed by neutron, the ssh connection works again
> without any issue. In all cases I am talking about the latest centos
> openssh server, and the default ssh server configuration file.
> 
> So, what do you think? Where should I look further to understand what
> exactly causes this behavior?
> 
> Many thanks,
> Adrian

These symptoms sound like an MTU mismatch.
The padding is not related. More likely, the issue is that one side
is sending a larger frame than the MTU of the underlying interface.
Since the bridge is a pure layer 2 interface, it has not choice
but to drop any frame where the size is greater than the MTU.

Re: [Bridge] [PATCH] bridge: return boolean instead of integer in br_multicast_is_router

2018-01-18 Thread Stephen Hemminger

On Thu, 18 Jan 2018 17:37:45 -0600
"Gustavo A. R. Silva" <garsi...@embeddedor.com> wrote:

> Return statements in functions returning bool should use
> true/false instead of 1/0.
> 
> This issue was detected with the help of Coccinelle.
> 
> Signed-off-by: Gustavo A. R. Silva <garsi...@embeddedor.com>

LGTM

Fixes: 85b352693264 ("bridge: Fix build error when IGMP_SNOOPING is not 
enabled")
Reviewed-by: Stephen Hemminger <step...@networkplumber.org>

Re: [Bridge] [PATCH 3.2 01/06] "bridge should send gratuitous ARP to notify peer while a bond, which is a port of this bridge, changes."

2018-01-06 Thread Stephen Hemminger

On Sat, 6 Jan 2018 17:25:20 +0800
邢庆杰  wrote:

> We create bond0 and add eth0 as slaves. Eth0 is active. Then we add
> bond0
> into br0 as a bridge port. Br0 has ip address. When eth0 is down, after
> bond0's
> failover eth1 become active. At this moment, we need br0 send a gratuitous
> ARP
> to notify peer to update ARP table.
> 
> Signed-off-by: Xing.Qingjie 
> ---
>  net/bridge/br_notify.c |   12 
>  1 files changed, 12 insertions(+), 0 deletions(-)
> diff --git a/net/bridge/br_notify.c b/net/bridge/br_notify.c
> index a76b621..90cf123 100644
> --- a/net/bridge/br_notify.c
> +++ b/net/bridge/br_notify.c
> @@ -34,6 +34,7 @@ static int br_device_event(struct notifier_block *unused,
> unsigned long event, v
> struct net_device *dev = ptr;
> struct net_bridge_port *p;
> struct net_bridge *br;
> +   struct in_device *in_dev;
> bool changed_addr;
> int err;
> @@ -102,6 +103,17 @@ static int br_device_event(struct notifier_block
> *unused, unsigned long event, v
> case NETDEV_PRE_TYPE_CHANGE:
> /* Forbid underlaying device to change its type. */
> return NOTIFY_BAD;
> +
> +   case NETDEV_NOTIFY_PEERS:
> +   /* Send gratuitous arp while bond,a port of bridge, changes
> */
> +   if (dev->priv_flags & IFF_BONDING && dev->flags &
> IFF_MASTER) {
> +   in_dev = __in_dev_get_rtnl(br->dev);
> +
> +   ASSERT_RTNL();
> +   if (in_dev)
> +   inetdev_send_gratuitous_arp(br->dev,
> in_dev);
> +   }
> +   break;
> }
> /* Events that may cause spanning tree to refresh */
> --
> 1.7.1

Thanks for your patch.

There is no reason a bridge has to have an IP address.
This should be done in bonding device, doing it here is error prone.
Plus there are many more layered devices than bonding.

The patch also has many formatting issues.
Your mail client mangles patches.
Conditional expressions with & and && should have parenthesis.

Re: [Bridge] [PATCH v1] bridge: Use helpers to handle MAC address

2017-12-19 Thread Stephen Hemminger

On Tue, 19 Dec 2017 20:10:53 +0200
Andy Shevchenko <andriy.shevche...@linux.intel.com> wrote:

> Use
>   %pM to print MAC
>   mac_pton() to convert it from ASCII to binary format, and
>   ether_addr_copy() to copy.
> 
> Signed-off-by: Andy Shevchenko <andriy.shevche...@linux.intel.com>

Looks good

Signed-off-by: Stephen Hemminger <step...@networkplumber.org>

Re: [Bridge] [PATCH net] net: bridge: fix early call to br_stp_change_bridge_id

2017-12-16 Thread Stephen Hemminger

On Sat, 16 Dec 2017 13:31:36 +0200
Nikolay Aleksandrov <niko...@cumulusnetworks.com> wrote:

> The early call to br_stp_change_bridge_id in bridge's newlink can cause
> a memory leak if an error occurs during the newlink because the fdb
> entries are not cleaned up if a different lladdr was specified, also
> another minor issue is that it generates fdb notifications with
> ifindex = 0. To remove this special case the call is done after netdev
> register and we cleanup any bridge fdb entries on changelink error.
> That also doesn't slow down normal bridge removal, alternative is to call
> it in its ndo_uninit.
> 
> To reproduce the issue:
> $ ip l add br0 address 00:11:22:33:44:55 type bridge group_fwd_mask 1
> RTNETLINK answers: Invalid argument
> 
> $ rmmod bridge
> [ 1822.142525] 
> =
> [ 1822.143640] BUG bridge_fdb_cache (Tainted: G   O): Objects 
> remaining in bridge_fdb_cache on __kmem_cache_shutdown()
> [ 1822.144821] 
> -
> 
> [ 1822.145990] Disabling lock debugging due to kernel taint
> [ 1822.146732] INFO: Slab 0x92a844b2 objects=32 used=2 
> fp=0xfef011b0 flags=0x1800100
> [ 1822.147700] CPU: 2 PID: 13584 Comm: rmmod Tainted: GB  O 
> 4.15.0-rc2+ #87
> [ 1822.148578] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> 1.7.5-20140531_083030-gandalf 04/01/2014
> [ 1822.150008] Call Trace:
> [ 1822.150510]  dump_stack+0x78/0xa9
> [ 1822.151156]  slab_err+0xb1/0xd3
> [ 1822.151834]  ? __kmalloc+0x1bb/0x1ce
> [ 1822.152546]  __kmem_cache_shutdown+0x151/0x28b
> [ 1822.153395]  shutdown_cache+0x13/0x144
> [ 1822.154126]  kmem_cache_destroy+0x1c0/0x1fb
> [ 1822.154669]  SyS_delete_module+0x194/0x244
> [ 1822.155199]  ? trace_hardirqs_on_thunk+0x1a/0x1c
> [ 1822.155773]  entry_SYSCALL_64_fastpath+0x23/0x9a
> [ 1822.156343] RIP: 0033:0x7f929bd38b17
> [ 1822.156859] RSP: 002b:7ffd160e9a98 EFLAGS: 0202 ORIG_RAX: 
> 00b0
> [ 1822.157728] RAX: ffda RBX: 5578316ba090 RCX: 
> 7f929bd38b17
> [ 1822.158422] RDX: 7f929bd9ec60 RSI: 0800 RDI: 
> 5578316ba0f0
> [ 1822.159114] RBP: 0003 R08: 7f929bff5f20 R09: 
> 7ffd160e8a11
> [ 1822.159808] R10: 7ffd160e9860 R11: 0202 R12: 
> 7ffd160e8a80
> [ 1822.160513] R13:  R14:  R15: 
> 5578316ba090
> [ 1822.161278] INFO: Object 0x7645de29 @offset=0
> [ 1822.161666] INFO: Object 0xd5df2ab5 @offset=128
> 
> Fixes: a4b816d8ba1c ("bridge: Change local fdb entries whenever mac address 
> of bridge device changes")
> Signed-off-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>
> ---
> Consequently this also would fix the null ptr deref due to the rhashtable
> not being initialized in net-next when br_stp_change_bridge_id is called.
> 
> Toshiaki, any reason you called br_stp_change_bridge_id before
> register_netdevice when you introduced it in 30313a3d5794 ?
> 
>  net/bridge/br_netlink.c | 14 +-
>  1 file changed, 9 insertions(+), 5 deletions(-)

Thanks for working on this.

I agree that fixing this in ndo_uninit would be wrong. There are less bugs
if init and uninit do logically equivalent steps.

A bridge device can be created either with netlink or ioctl.
This change is also makes both ways of adding MAC have
the same semantics;
If bridge is created with ioctl then the bridge_id (and MAC)
will not be changed until later device is added or MAC address
is set by other operation.

Signed-off-by: Stephen Hemminger <step...@networkplumber.org>

Re: [Bridge] linux bridge does not forward arp reply back packets in a vmware vm

2017-12-16 Thread Stephen Hemminger

On Sat, 16 Dec 2017 16:19:03 +0200
Adrian P  wrote:

> Investigating this further, I have noticed that the mac address of the
> eth0 interface from the cirros VM instance does not appear in the
> bridge forwarding table, and this explains why everything starts
> working only when I set ageing time to 0, because in that case all
> packets are flooded on all ports and the bridge behaves like a hub.
> 
> So now the question is: why the bridge does not learn the mac address
> of the eth0 interface from the cirros VM instance? I am able to see
> the arp request (ARP, Request who-has 10.20.21.1 tell 10.20.21.233)
> going out from the cirros VM instance on tap interface, so the bridge
> should learn the mac address and add it to the forwarding table.
> 
> The reply back to the arp request (Reply 10.20.21.1 is-at
> 00:17:08:c4:52:80) does not reach the cirros VM instance anymore, and
> now I know why: there is no mac address in the forwarding table, so
> the bridge does not know on which port to send the arp reply back.
> 
> This happens with tap interfaces only. I can see many mac addresses
> associated with "physical" interface ens160 (that is interface number
> 1) in the forwarding table, but in case of the tap interfaces, there
> are only two entries, and both entries shows the mac address of the
> tap interfaces only:

VMWare does ARP spoofing maybe it consumes the ARP

Re: [Bridge] linux bridge does not forward arp reply back packets in a vmware vm

2017-12-15 Thread Stephen Hemminger

On Fri, 15 Dec 2017 18:29:58 +0200
Adrian P <adrian27ora...@gmail.com> wrote:

> On Fri, Dec 15, 2017 at 5:55 PM, Stephen Hemminger
> <step...@networkplumber.org> wrote:
> > On Fri, 15 Dec 2017 15:37:39 +0200
> > Adrian P <adrian27ora...@gmail.com> wrote:
> >  
> >> Hello,
> >>
> >> I have a strange issue with a linux bridge created by
> >> openstack-neutron (pike release). This linux bridge is hosted in a
> >> vmware VM running latest CentOS 7, with a single network interface in
> >> promiscuous mode.
> >>
> >> From openstack neutron perspective, the networking setup is simple: a
> >> single flat external provider network, with a single cirros VM
> >> instance connected to it.
> >>
> >> Therefore, in the linux bridge running in the vmware host, I have 3 
> >> interfaces:
> >>
> >> # brctl show
> >> bridge name bridge id   STP enabled interfaces
> >> brq025a9a94-58  8000.005056a6b378   no  ens160
> >> tap2eb4cad6-cd
> >><- neutron DHCP agent tap interface
> >> tap6d31a191-9f
> >><- cirros VM instance tap interface
> >>
> >> The ens160 is the "physical" CentOS 7 host interface, that is in
> >> promiscuous mode.
> >>
> >> The  tap2eb4cad6-cd tap interface is the neutron DHCP agent interface,
> >> and the tap6d31a191-9f tap interface is used by the cirros VM
> >> instance.
> >>
> >> The problem is the following:
> >>
> >> With a tcpdump, I am able to see the arp request (ARP, Request who-has
> >> 10.20.21.1 tell 10.20.21.233) going out from the cirros VM instance on
> >> tap interface tap6d31a191-9f, and well as on the bridge itself
> >> (brq025a9a94-58). However, the reply back to the arp request (Reply
> >> 10.20.21.1 is-at 00:17:08:c4:52:80) does not reach the cirros VM
> >> instance anymore. With tcpdump, I am able to see the arp reply back
> >> packets in the bridge (brq025a9a94-58), however they do not show up
> >> anymore on the cirros VM instance tap interface tap6d31a191-9f.
> >>
> >> To me it seems that for whatever reason, the bridge does not forward
> >> the arp reply back packets to the cirros VM tap interface, and I do
> >> not understand why. The strange thing is that after a while, for
> >> apparently no reason, a single arp reply back packet gets through the
> >> bridge and the tap interface, and the arp table gets updated with
> >> correct IP address in the cirros VM instance. However, if I clean up
> >> the arp table in the cirros VM instance, it takes again 10 to 15
> >> minutes of continuously sending arp requests, until a single arp reply
> >> back packets gets through.
> >>
> >> I was banging my head to the table for a few days with this issue, and
> >> finally, for apparent no reason, I manually configured the bridge max
> >> aging time to 0, to convert it in a hub, and from that moment
> >> everything started to work without any issue. Still, I do no
> >> understand why is this happening, and obviously I cannot manually set
> >> up the bridge aging time to 0 all the time in all the bridges
> >> openstack neutron creates automatically.
> >>
> >> Any thoughts?
> >>
> >> Many thanks in advance.
> >>
> >> Best regards,
> >> Adrian  
> >
> > Does each tap instance and the ens160 have a different and valid Ethernet
> > address?  Also make sure the these are in the bridge forwarding table.  
> 
> Yes, they have valid Ethernet addresses, and they do show up in the
> forwarding table twice, see below:
> 
> # ip addr
> <...>
> 2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master
> brq025a9a94-58 state UP qlen 1000
> link/ether 00:50:56:a6:b3:78 brd ff:ff:ff:ff:ff:ff
> inet6 fe80::250:56ff:fea6:b378/64 scope link
>valid_lft forever preferred_lft forever
> 4: tap2eb4cad6-cd@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
> qdisc noqueue master brq025a9a94-58 state UP qlen 1000
> link/ether 8a:b2:15:4c:96:55 brd ff:ff:ff:ff:ff:ff link-netnsid 0
> 5: brq025a9a94-58: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc
> noqueue state UP qlen 1000
> link/ether 00:50:56:a6:b3:78 brd ff:ff:ff:ff:ff:ff
> inet 10.20.21.249/24 brd 10.20.21.255 scope global brq025a9a94-58
>

Re: [Bridge] linux bridge does not forward arp reply back packets in a vmware vm

2017-12-15 Thread Stephen Hemminger

On Fri, 15 Dec 2017 15:37:39 +0200
Adrian P  wrote:

> Hello,
> 
> I have a strange issue with a linux bridge created by
> openstack-neutron (pike release). This linux bridge is hosted in a
> vmware VM running latest CentOS 7, with a single network interface in
> promiscuous mode.
> 
> From openstack neutron perspective, the networking setup is simple: a
> single flat external provider network, with a single cirros VM
> instance connected to it.
> 
> Therefore, in the linux bridge running in the vmware host, I have 3 
> interfaces:
> 
> # brctl show
> bridge name bridge id   STP enabled interfaces
> brq025a9a94-58  8000.005056a6b378   no  ens160
> tap2eb4cad6-cd
><- neutron DHCP agent tap interface
> tap6d31a191-9f
><- cirros VM instance tap interface
> 
> The ens160 is the "physical" CentOS 7 host interface, that is in
> promiscuous mode.
> 
> The  tap2eb4cad6-cd tap interface is the neutron DHCP agent interface,
> and the tap6d31a191-9f tap interface is used by the cirros VM
> instance.
> 
> The problem is the following:
> 
> With a tcpdump, I am able to see the arp request (ARP, Request who-has
> 10.20.21.1 tell 10.20.21.233) going out from the cirros VM instance on
> tap interface tap6d31a191-9f, and well as on the bridge itself
> (brq025a9a94-58). However, the reply back to the arp request (Reply
> 10.20.21.1 is-at 00:17:08:c4:52:80) does not reach the cirros VM
> instance anymore. With tcpdump, I am able to see the arp reply back
> packets in the bridge (brq025a9a94-58), however they do not show up
> anymore on the cirros VM instance tap interface tap6d31a191-9f.
> 
> To me it seems that for whatever reason, the bridge does not forward
> the arp reply back packets to the cirros VM tap interface, and I do
> not understand why. The strange thing is that after a while, for
> apparently no reason, a single arp reply back packet gets through the
> bridge and the tap interface, and the arp table gets updated with
> correct IP address in the cirros VM instance. However, if I clean up
> the arp table in the cirros VM instance, it takes again 10 to 15
> minutes of continuously sending arp requests, until a single arp reply
> back packets gets through.
> 
> I was banging my head to the table for a few days with this issue, and
> finally, for apparent no reason, I manually configured the bridge max
> aging time to 0, to convert it in a hub, and from that moment
> everything started to work without any issue. Still, I do no
> understand why is this happening, and obviously I cannot manually set
> up the bridge aging time to 0 all the time in all the bridges
> openstack neutron creates automatically.
> 
> Any thoughts?
> 
> Many thanks in advance.
> 
> Best regards,
> Adrian

Does each tap instance and the ens160 have a different and valid Ethernet
address?  Also make sure the these are in the bridge forwarding table.

Re: [Bridge] [PATCH net-next] net: bridge: use rhashtable for fdbs

2017-12-12 Thread Stephen Hemminger

On Tue, 12 Dec 2017 16:02:50 +0200
Nikolay Aleksandrov  wrote:

> Before this patch the bridge used a fixed 256 element hash table which
> was fine for small use cases (in my tests it starts to degrade
> above 1000 entries), but it wasn't enough for medium or large
> scale deployments. Modern setups have thousands of participants in a
> single bridge, even only enabling vlans and adding a few thousand vlan
> entries will cause a few thousand fdbs to be automatically inserted per
> participating port. So we need to scale the fdb table considerably to
> cope with modern workloads, and this patch converts it to use a
> rhashtable for its operations thus improving the bridge scalability.
> Tests show the following results (10 runs each), at up to 1000 entries
> rhashtable is ~3% slower, at 2000 rhashtable is 30% faster, at 3000 it
> is 2 times faster and at 3 it is 50 times faster.
> Obviously this happens because of the properties of the two constructs
> and is expected, rhashtable keeps pretty much a constant time even with
> 1000 entries (tested), while the fixed hash table struggles
> considerably even above 1.
> As a side effect this also reduces the net_bridge struct size from 3248
> bytes to 1344 bytes. Also note that the key struct is 8 bytes.
> 
> Signed-off-by: Nikolay Aleksandrov 
> ---

Thanks for doing this, it was on my list of things that never get done.

Some downsides:
 * size of the FDB entry gets larger.
 * you lost the ability to salt the hash (and rekey) which is important
   for DDoS attacks
 * being slower for small (<10 entries) also matters and is is a common
   use case for containers.

Re: [Bridge] [PATCH net-next] net: bridge: use rhashtable for fdbs

2017-12-12 Thread Stephen Hemminger

On Tue, 12 Dec 2017 16:02:50 +0200
Nikolay Aleksandrov  wrote:

> + memcpy(__entry->addr, f->key.addr.addr, ETH_ALEN);

Maybe use ether_addr_copy() here?

Re: [Bridge] Linux Bridge Static FDB move

2017-10-26 Thread Stephen Hemminger

If you have a moving MAC then you have a network that is flapping,
duplicate MAC, or worse a network loop. All of these are signs of a broken
L2 network. The bridge can't fix these

On Oct 26, 2017 19:33,  wrote:

> Thank you for the reply.
>
> Viraj
> 
> From: Toshiaki Makita 
> Sent: Thursday, October 26, 2017 4:13 AM
> To: Raiyani, Viraj; bridge@lists.linux-foundation.org
> Subject: Re: [Bridge] Linux Bridge Static FDB move
>
> On 2017/10/26 1:47, viraj.raiy...@dell.com wrote:
> > Hi Everyone,
> >
> >
> > Is it possible to program the FDB entry in Linux Bridge which is static,
> > non-local and doesn't move to new interface when the same source MAC
> > packet comes on a different interface in the same bridge  ?
>
> AFAIK no.
> Bridge supports static fdb entries, but br_fdb_update() updates their
> dst even for static entries.
>
> >
> >
> > I tried programming the MAC as permanent that prevents the moving of MAC
> > to a new interface in the same bridge, however it treats the MAC as
> > local and doesn't do the forwarding ?
>
> Yes, local entries will deliver frames to the bridge device itself.
>
> --
> Toshiaki Makita
>
>

Re: [Bridge] [PATCH net-next v3 1/2] bridge: netlink: make setlink/dellink notifications more accurate

2017-10-26 Thread Stephen Hemminger

On Thu, 26 Oct 2017 00:59:41 +0300
Nikolay Aleksandrov <niko...@cumulusnetworks.com> wrote:

> Before this patch we had cases that either sent notifications when there
> were in fact no changes (e.g. non-existent vlan delete) or didn't send
> notifications when there were changes (e.g. vlan add range with an error in
> the middle, port flags change + vlan update error). This patch sends down
> a boolean to the functions setlink/dellink use and if there is even a
> single configuration change (port flag, vlan add/del, port state) then
> we always send a notification. This is all done to keep backwards
> compatibility with the opportunistic vlan delete, where one could
> specify a vlan range that has missing vlans inside and still everything
> in that range will be cleared, this is mostly used to clear the whole
> vlan config with a single call, i.e. range 1-4094.
> 
> Signed-off-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>

This looks correct.
As a general note, generating a notice with no change should be ok for a 
correctly
written application. But missing a notification would cause synchronization 
problems.

Acked-by: Stephen Hemminger <step...@networkplumber.org>

Re: [Bridge] Problem with syslog in bridge(?) with ethernet broadcast packets with QinQ

2017-10-12 Thread Stephen Hemminger

On Mon, 18 Sep 2017 17:21:11 +0200
Andrés Pozo Muñoz  wrote:

> Hi all,
> 
> I'm facing some problem with ethernet multicast traffic and QinQ
> (802.3q in both tags) in -apparently- a bridge in Ubuntu 16.04
> (4.12.0-041200-generic).
> 
> My set up is the following:
> +---+
> | VM|
> +-+-+
>   |
>   | tag 55
> +-++
> |OVS   |
> +---+--+
> | veth3.9
> veth3   +  |
>  +--+
>  | veth2
> +---+--+
> |   BRIDGE |
> +---+--+
>  |
>  |
> +---+--+
> |  ens11f1 |
> +--+
> 
> 
> I am setting the inner tag with OVS and the outer tag with a vlan type veth.
> The problem I have is that every packet sent from the VM to Ethernet
> broadcast addresses (33:33:xx:xx:xx:xx) generates a "IPv6 header not
> found" entry in syslog.
> 
> If I generate a 'ens11f1.9' interface and remove the BRIDGE, the log
> entry dissapears.
> 
> Some questions:
> * Is there any problem with these packets (ethernet multicast with
> QinQ) in bridges?
> * Is there any configuration I should change for the problem to dissapear?
> 
> Thanks in advance for your help! Any hint about what could be
> generating those logs is appreciated.
> 
> Regards,
> Andrés Pozo

Where is the log message coming from? Could you change the printk into
a WARN_ON_ONCE and get backtrace? My guess is the firewall code doesn't
understand the additional tag.

Re: [Bridge] [PATCH net-next 6/7] net: bridge: Pass extack to down to netdev_master_upper_dev_link

2017-10-04 Thread Stephen Hemminger

On Tue,  3 Oct 2017 21:58:53 -0700
David Ahern <dsah...@gmail.com> wrote:

> Pass extack arg to br_add_if. Add messages for a couple of failures
> and pass arg to netdev_master_upper_dev_link.
> 
> Signed-off-by: David Ahern <dsah...@gmail.com>

This looks good. You might want to pass the netlink_ext_ack down as
an immutable pointer (const).

Acked-by: Stephen Hemminger <step...@networkplumber.org>

Re: [Bridge] [PATCH] bridge: Fix format string for %ul

2017-10-03 Thread Stephen Hemminger

On Fri, 26 Aug 2016 23:10:28 -0400
Oleg Drokin  wrote:

> %ul would print an unsigned value and a letter l,
> likely it was %lu that was meant to print the long int,
> but in reality the values printed there are just regular signed
> ints, so just dropping the l altogether.
> 
> Signed-off-by: Oleg Drokin 
> ---
>  net/bridge/br_stp_bpdu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/bridge/br_stp_bpdu.c b/net/bridge/br_stp_bpdu.c
> index 5881fbc..15c4a9c 100644
> --- a/net/bridge/br_stp_bpdu.c
> +++ b/net/bridge/br_stp_bpdu.c
> @@ -230,7 +230,7 @@ void br_stp_rcv(const struct stp_proto *proto, struct 
> sk_buff *skb,
>   if (net_ratelimit())
>   br_notice(p->br,
> "port %u config from %pM"
> -   " (message_age %ul > max_age %ul)\n",
> +   " (message_age %u > max_age %u)\n",
> p->port_no,
> eth_hdr(skb)->h_source,
> bpdu.message_age, bpdu.max_age);

Could you make the format string a single line plwase.
And add Fixes tag.

Re: [Bridge] [PATCH net-next 0/3] bridge: neigh msg proxy and flood suppression support

2017-10-02 Thread Stephen Hemminger

On Mon, 2 Oct 2017 07:49:09 -0700
Roopa Prabhu  wrote:

> On Sun, Oct 1, 2017 at 9:36 PM, Roopa Prabhu  
> wrote:
> > From: Roopa Prabhu 
> >
> > This series implements arp and nd suppression in the bridge
> > driver for ethernet vpns. It implements rfc7432, section 10
> > https://tools.ietf.org/html/rfc7432#section-10
> > for ethernet VPN deployments. It is similar to the existing
> > BR_ARP_PROXY flag but has a few semantic differences to conform
> > to EVPN standard. In case of EVPN, it is mainly used to avoid flooding to
> > tunnel ports like vxlan/mpls. Unlike the existing flags it suppresses flood
> > of all neigh discovery packets (arp, nd) to tunnel ports.
> >
> > Roopa Prabhu (3):
> >   bridge: add new BR_NEIGH_SUPPRESS port flag to suppress arp and nd
> > flood
> >   neigh arp suppress first
> >   bridge: suppress nd messages from going to BR_NEIGH_SUPPRESS ports
> >  
> 
> pls ignore, shows conflict applying over recent net-next bridge
> changes. Will rebase and submit v2.

Ok, but the concept looks good.

Re: [Bridge] [PATCH net-next] net: bridge: add per-port group_fwd_mask with less restrictions

2017-09-29 Thread Stephen Hemminger

On Sat, 30 Sep 2017 00:01:24 +0300
Nikolay Aleksandrov <niko...@cumulusnetworks.com> wrote:

> On 29/09/17 18:14, Stephen Hemminger wrote:
> > On Wed, 27 Sep 2017 16:12:44 +0300
> > Nikolay Aleksandrov <niko...@cumulusnetworks.com> wrote:
> >   
> >> We need to be able to transparently forward most link-local frames via
> >> tunnels (e.g. vxlan, qinq). Currently the bridge's group_fwd_mask has a
> >> mask which restricts the forwarding of STP and LACP, but we need to be able
> >> to forward these over tunnels and control that forwarding on a per-port
> >> basis thus add a new per-port group_fwd_mask option which only disallows
> >> mac pause frames to be forwarded (they're always dropped anyway).
> >> The patch does not change the current default situation - all of the others
> >> are still restricted unless configured for forwarding.
> >> We have successfully tested this patch with LACP and STP forwarding over
> >> VxLAN and qinq tunnels.
> >>
> >> Signed-off-by: Nikolay Aleksandrov <niko...@cumulusnetworks.com>  
> > 
> > LACP is fine, but STP must not be forwarded if STP in user or kernel
> > mode is enabled.
> > 
> > Please update this patch or revert it.
> >   
> 
> The default has not changed, STP is still _not_ forwarded. It can be only if 
> explicitly
> requested by the user and that means the port might be participating in STP 
> but not
> the bridge's STP, that is explicitly forward all STP frames from that port.
> I don't think we have to change anything.
> 

You need to fail if user does something stupid. And DNR.

From 16c42db4852234c13288922c1bffbf7650e08434 Mon Sep 17 00:00:00 2001
From: Stephen Hemminger <sthem...@microsoft.com>
Date: Fri, 29 Sep 2017 14:48:17 -0700
Subject: [PATCH] bridge: check for attempt to forward STP PDU's with STP
 enabled

Move validation into set function and do not allow
configuring forwarding of STP packets if STP is enabled.

Fixes: 5af48b59f35c ("net: bridge: add per-port group_fwd_mask with less 
restrictions")
Signed-off-by: Stephen Hemminger <step...@networkplumber.org>
---
 net/bridge/br_if.c   | 13 +
 net/bridge/br_netlink.c  |  6 +++---
 net/bridge/br_private.h  |  1 +
 net/bridge/br_sysfs_if.c |  6 +-
 4 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index f3aef22931ab..a30e12f76266 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -654,3 +654,16 @@ void br_port_flags_change(struct net_bridge_port *p, 
unsigned long mask)
if (mask & BR_AUTO_MASK)
nbp_update_port_count(br);
 }
+
+int br_set_group_fwd_mask(struct net_bridge_port *p, u16 fwd_mask)
+{
+   if (fwd_mask & BR_GROUPFWD_MACPAUSE)
+   return -EINVAL;
+
+   if ((fwd_mask & BR_GROUPFWD_STP) &&
+   (p->br->stp_enabled != BR_NO_STP))
+   return -EINVAL;
+
+   p->group_fwd_mask = fwd_mask;
+   return 0;
+}
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index dea88a255d26..7b16819ecb59 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -780,9 +780,9 @@ static int br_setport(struct net_bridge_port *p, struct 
nlattr *tb[])
if (tb[IFLA_BRPORT_GROUP_FWD_MASK]) {
u16 fwd_mask = nla_get_u16(tb[IFLA_BRPORT_GROUP_FWD_MASK]);
 
-   if (fwd_mask & BR_GROUPFWD_MACPAUSE)
-   return -EINVAL;
-   p->group_fwd_mask = fwd_mask;
+   err = br_set_group_fwd_mask(p, fwd_mask);
+   if (err)
+   return err;
}
 
br_port_flags_change(p, old_flags ^ p->flags);
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 020c709a017f..734933bebb08 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -573,6 +573,7 @@ netdev_features_t br_features_recompute(struct net_bridge 
*br,
netdev_features_t features);
 void br_port_flags_change(struct net_bridge_port *port, unsigned long mask);
 void br_manage_promisc(struct net_bridge *br);
+int br_set_group_fwd_mask(struct net_bridge_port *p, u16 mask);
 
 /* br_input.c */
 int br_handle_frame_finish(struct net *net, struct sock *sk, struct sk_buff 
*skb);
diff --git a/net/bridge/br_sysfs_if.c b/net/bridge/br_sysfs_if.c
index 9110d5e56085..f5871be10b24 100644
--- a/net/bridge/br_sysfs_if.c
+++ b/net/bridge/br_sysfs_if.c
@@ -173,11 +173,7 @@ static ssize_t show_group_fwd_mask(struct net_bridge_port 
*p, char *buf)
 static int store_group_fwd_mask(struct net_bridge_port *p,
unsigned long v)
 {
-   if (v & BR_GROUPFWD_MACPAUSE)
-   return -EINVAL;
-   p->group_fwd_mask = v;
-
-   return 0;
+   return br_set_group_fwd_mask(p, v);
 }
 static BRPORT_ATTR(group_fwd_mask, S_IRUGO | S_IWUSR, show_group_fwd_mask,
   store_group_fwd_mask);
-- 
2.11.0

Re: [Bridge] [PATCH net-next] net: bridge: add per-port group_fwd_mask with less restrictions

2017-09-29 Thread Stephen Hemminger

On Wed, 27 Sep 2017 16:12:44 +0300
Nikolay Aleksandrov  wrote:

> We need to be able to transparently forward most link-local frames via
> tunnels (e.g. vxlan, qinq). Currently the bridge's group_fwd_mask has a
> mask which restricts the forwarding of STP and LACP, but we need to be able
> to forward these over tunnels and control that forwarding on a per-port
> basis thus add a new per-port group_fwd_mask option which only disallows
> mac pause frames to be forwarded (they're always dropped anyway).
> The patch does not change the current default situation - all of the others
> are still restricted unless configured for forwarding.
> We have successfully tested this patch with LACP and STP forwarding over
> VxLAN and qinq tunnels.
> 
> Signed-off-by: Nikolay Aleksandrov 

LACP is fine, but STP must not be forwarded if STP in user or kernel
mode is enabled.

Please update this patch or revert it.

Re: [Bridge] [net-next v3] bridge: trigger RTM_NEWLINK when interface is modified by bridge ioctl

2017-09-21 Thread Stephen Hemminger

On Thu, 21 Sep 2017 12:05:25 +0200
Vincent Bernat <vinc...@bernat.im> wrote:

> Currently, there is a difference in netlink events received when an
> interface is modified through bridge ioctl() or through netlink. This
> patch generates additional events when an interface is added to or
> removed from a bridge via ioctl().
> 
> When adding then removing an interface from a bridge with netlink, we
> get:
> 
> 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master 
> bridge0 state UNKNOWN group default
> link/ether 9e:da:60:ee:cf:c8 brd ff:ff:ff:ff:ff:ff
> 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 master bridge0 state UNKNOWN
> link/ether 9e:da:60:ee:cf:c8
> 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 master bridge0 state UNKNOWN
> link/ether 9e:da:60:ee:cf:c8
> 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 master bridge0 state UNKNOWN
> link/ether 9e:da:60:ee:cf:c8
> 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 master bridge0 state UNKNOWN
> link/ether 9e:da:60:ee:cf:c8
> 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master 
> bridge0 state UNKNOWN group default
> link/ether 9e:da:60:ee:cf:c8 brd ff:ff:ff:ff:ff:ff
> 
> 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master 
> bridge0 state UNKNOWN group default
> link/ether 9e:da:60:ee:cf:c8 brd ff:ff:ff:ff:ff:ff
> 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 master bridge0 state UNKNOWN
> link/ether 9e:da:60:ee:cf:c8
> Deleted 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 master bridge0 
> state UNKNOWN
> link/ether 9e:da:60:ee:cf:c8
> 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN 
> group default
> link/ether 9e:da:60:ee:cf:c8 brd ff:ff:ff:ff:ff:ff
> 
> When using ioctl():
> 
> 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master 
> bridge0 state UNKNOWN group default
> link/ether 9e:da:60:ee:cf:c8 brd ff:ff:ff:ff:ff:ff
> 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 master bridge0 state UNKNOWN
> link/ether 9e:da:60:ee:cf:c8
> 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 master bridge0 state UNKNOWN
> link/ether 9e:da:60:ee:cf:c8
> 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 master bridge0 state UNKNOWN
> link/ether 9e:da:60:ee:cf:c8
> 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master 
> bridge0 state UNKNOWN group default
> link/ether 9e:da:60:ee:cf:c8 brd ff:ff:ff:ff:ff:ff
> 
> 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue master 
> bridge0 state UNKNOWN group default
> link/ether 9e:da:60:ee:cf:c8 brd ff:ff:ff:ff:ff:ff
> 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 master bridge0 state UNKNOWN
> link/ether 9e:da:60:ee:cf:c8
> Deleted 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 master bridge0 
> state UNKNOWN
> link/ether 9e:da:60:ee:cf:c8
> 5: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN 
> group default
>     link/ether 9e:da:60:ee:cf:c8 brd ff:ff:ff:ff:ff:ff
> 
> Without this patch, the last netlink notification is not sent.
> 
> Signed-off-by: Vincent Bernat <vinc...@bernat.im>

This makes sense, you should probably add a Fixes: tag to help maintainers
of long term stable kernels.

Reviewed-by: Stephen Hemminger <step...@networkplumber.org>

Re: [Bridge] [PATCH net-next v2] bridge: also trigger RTM_NEWLINK when interface is released from bridge

2017-09-20 Thread Stephen Hemminger

On Wed, 20 Sep 2017 15:57:16 -0600
David Ahern  wrote:

> On 9/20/17 3:09 PM, David Miller wrote:
> > From: Vincent Bernat 
> > Date: Sat, 16 Sep 2017 16:18:33 +0200
> > 
> > David, I am CC:'ing you because you've done work in this area over the
> > past year.  I'm applying this patch, it's been sitting since the 16th
> > and likes entirely correct to me.  But if you have objections just let
> > me know.
> >   
> >> Currently, when an interface is released from a bridge via
> >> ioctl(), we get a RTM_DELLINK event through netlink:
> >>
> >> Deleted 2: dummy0:  mtu 1500 master bridge0 
> >> state UNKNOWN
> >> link/ether 6e:23:c2:54:3a:b3
> >>
> >> Userspace has to interpret that as a removal from the bridge, not as a
> >> complete removal of the interface. When an bridged interface is
> >> completely removed, we get two events:
> >>
> >> Deleted 2: dummy0:  mtu 1500 master bridge0 state DOWN
> >> link/ether 6e:23:c2:54:3a:b3
> >> Deleted 2: dummy0:  mtu 1500 qdisc noop state DOWN group 
> >> default
> >> link/ether 6e:23:c2:54:3a:b3 brd ff:ff:ff:ff:ff:ff
> >>
> >> In constrast, when an interface is released from a bond, we get a
> >> RTM_NEWLINK with only the new characteristics (no master):
> >>
> >> 3: dummy1:  mtu 1500 qdisc noqueue 
> >> master bond0 state UNKNOWN group default
> >> link/ether ae:dc:7a:8c:9a:3c brd ff:ff:ff:ff:ff:ff
> >> 3: dummy1:  mtu 1500 qdisc noqueue state 
> >> UNKNOWN group default
> >> link/ether ae:dc:7a:8c:9a:3c brd ff:ff:ff:ff:ff:ff
> >> 4: bond0:  mtu 1500 qdisc noqueue 
> >> state UP group default
> >> link/ether ae:dc:7a:8c:9a:3c brd ff:ff:ff:ff:ff:ff
> >> 3: dummy1:  mtu 1500 qdisc noqueue state DOWN group 
> >> default
> >> link/ether ae:dc:7a:8c:9a:3c brd ff:ff:ff:ff:ff:ff
> >> 3: dummy1:  mtu 1500 qdisc noqueue state DOWN group 
> >> default
> >> link/ether ca:c8:7b:66:f8:25 brd ff:ff:ff:ff:ff:ff
> >> 4: bond0:  mtu 1500 qdisc noqueue 
> >> state UP group default
> >> link/ether ae:dc:7a:8c:9a:3c brd ff:ff:ff:ff:ff:ff
> >>
> >> Userland may be confused by the fact we say a link is deleted while
> >> its characteristics are only modified. A first solution would have
> >> been to turn the RTM_DELLINK event in del_nbp() into a RTM_NEWLINK
> >> event. However, maybe some piece of userland is relying on this
> >> RTM_DELLINK to detect when a bridged interface is released. Instead,
> >> we also emit a RTM_NEWLINK event once the interface is
> >> released (without master info).
> >>
> >> Deleted 2: dummy0:  mtu 1500 master bridge0 
> >> state UNKNOWN
> >> link/ether 8a:bb:e7:94:b1:f8
> >> 2: dummy0:  mtu 1500 qdisc noqueue state 
> >> UNKNOWN group default
> >> link/ether 8a:bb:e7:94:b1:f8 brd ff:ff:ff:ff:ff:ff
> >>
> >> This is done only when using ioctl(). When using Netlink, such an
> >> event is already automatically emitted in do_setlink().  
> 
> The DELLINK is for AF_BRIDGE family (ifi_family). Adding family to
> print_linkinfo and running the monitor I get:
> 
> 
> [LINK]family 0: 35: dummy1:  mtu 1500 qdisc
> noqueue master br0 state UNKNOWN group default
> link/ether d6:c3:73:86:3c:73 brd ff:ff:ff:ff:ff:ff
> 
> [LINK]family 7: 35: dummy1:  mtu 1500
> master br0 state UNKNOWN
> link/ether d6:c3:73:86:3c:73
> 
> [LINK]Deleted family 7: 35: dummy1:  mtu
> 1500 master br0 state UNKNOWN
> link/ether d6:c3:73:86:3c:73
> 
> [LINK]family 0: 35: dummy1:  mtu 1500 qdisc
> noqueue state UNKNOWN group default
> link/ether d6:c3:73:86:3c:73 brd ff:ff:ff:ff:ff:ff
> 
> And that seems correct. So I think the RTM_NEWLINK added by this patch
> should not be needed.

Agreed, thanks for tracing this.

The one concern is that ports added or removed through ioctl should
cause same events as doing the same thing via netlink. Some users use
brctl (ioctl) and others use newer bridge (netlink) API.

Re: [Bridge] [PATCH net-next] bridge: add tracepoint in br_fdb_update

2017-08-31 Thread Stephen Hemminger

On Thu, 31 Aug 2017 23:50:26 +0200
Jesper Dangaard Brouer  wrote:

> On Thu, 31 Aug 2017 11:43:25 -0700 (PDT)
> David Miller  wrote:
> 
> > From: Roopa Prabhu 
> > Date: Wed, 30 Aug 2017 22:18:13 -0700
> >   
> > > From: Roopa Prabhu 
> > > 
> > > This extends bridge fdb table tracepoints to also cover
> > > learned fdb entries in the br_fdb_update path. Note that
> > > unlike other tracepoints I have moved this to when the fdb
> > > is modified because this is in the datapath and can generate
> > > a lot of noise in the trace output. br_fdb_update is also called
> > > from added_by_user context in the NTF_USE case which is already
> > > traced ..hence the !added_by_user check.
> > > 
> > > Signed-off-by: Roopa Prabhu 
> > 
> > Applied.
> > 
> > Let's use dev->name for now and if the tooling can eventually
> > do transparent ifindex->name then we can consider redoing
> > a bunch of networking tracepoints.  
> 
> I agree! :-)
> 

Agreed, but it is yet another case of tracepoints not having
a stable ABI. There is no ABI guarantee on tracing, but it still
makes for user complaints.

Re: [Bridge] [PATCH RFC WIP 0/5] IGMP snooping for local traffic

2017-08-28 Thread Stephen Hemminger

On Sat, 26 Aug 2017 22:56:05 +0200
Andrew Lunn  wrote:

> This is a WIP patchset i would like comments on from bridge, switchdev
> and hardware offload people.
> 
> The linux bridge supports IGMP snooping. It will listen to IGMP
> reports on bridge ports and keep track of which groups have been
> joined on an interface. It will then forward multicast based on this
> group membership.
> 
> When the bridge adds or removed groups from an interface, it uses
> switchdev to request the hardware add an mdb to a port, so the
> hardware can perform the selective forwarding between ports.
> 
> What is not covered by the current bridge code, is IGMP joins/leaves
> from the host on the brX interface. No such monitoring is
> performed. With a pure software bridge, it is not required. All
> mulitcast frames are passed to the brX interface, and the network
> stack filters them, as it does for any interface. However, when
> hardware offload is involved, things change. We should program the
> hardware to only send multcast packets to the host when the host has
> in interest in them.
> 
> Thus we need to perform IGMP snooping on the brX interface, just like
> any other interface of the bridge. However, currently the brX
> interface is missing all the needed data structures to do this. There
> is no net_bridge_port structure for the brX interface. This strucuture
> is created when an interface is added to the bridge. But the brX
> interface is not a member of the bridge. So this patchset makes the
> brX interface a first class member of the bridge. When the brX
> interface is opened, the interface is added to the bridge. A
> net_bridge_port is allocated for it, and IGMP snooping is performed as
> usual.
> 
> There are some complexities here. Some assumptions are broken, like
> the master interface of a port interface is the bridge interface. The
> brX interface cannot be its own master. The use of
> netdev_master_upper_dev_get() within the bridge code has been changed
> to reflecit this. The bridge receive handler needs to not process
> frames for the brX interface, etc.
> 
> The interface downward to the hardware is also an issue. The code
> presented here is a hack and needs to change. But that is secondary
> and can be solved once it is agreed how the bridge needs to change to
> support this use case.
> 
> Comment welcome and wanted.
> 
>   Andrew
> 
> Andrew Lunn (5):
>   net: rtnetlink: Handle bridge port without upper device
>   net: bridge: Skip receive handler on brX interface
>   net: bridge: Make the brX interface a member of the bridge
>   net: dsa: HACK: Handle MDB add/remove for none-switch ports
>   net: dsa: Don't include CPU port when adding MDB to a port
> 
>  include/linux/if_bridge.h |  1 +
>  net/bridge/br_device.c| 12 ++--
>  net/bridge/br_if.c| 37 -
>  net/bridge/br_input.c |  4 
>  net/bridge/br_mdb.c   |  2 --
>  net/bridge/br_multicast.c |  7 ---
>  net/bridge/br_private.h   |  1 +
>  net/core/rtnetlink.c  | 23 +--
>  net/dsa/port.c| 19 +--
>  net/dsa/switch.c  |  2 +-
>  10 files changed, 83 insertions(+), 25 deletions(-)
> 

Sorry you can't change the semantics of the bridge like this.
There are likely to be scripts and management utilities that won't work after 
this.

Figure out another way. Such as adding IGMP updates in the local packet 
send/receive path.

Re: [Bridge] [RFC net-next v2] bridge lwtunnel, VPLS & NVGRE

2017-08-21 Thread Stephen Hemminger

On Mon, 21 Aug 2017 19:15:17 +0200
David Lamparter  wrote:

> Hi all,
> 
> 
> this is an update on the earlier "[RFC net-next] VPLS support".  Note
> I've changed the subject lines on some of the patches to better reflect
> what they really do (tbh the earlier subject lines were crap.)
> 
> As previously, iproute2 / FRR patches are at:
> - https://github.com/eqvinox/vpls-iproute2
> - https://github.com/opensourcerouting/frr/commits/vpls
> while this patchset is also available at:
> - https://github.com/eqvinox/vpls-linux-kernel
> (but please be aware that I'm amending and rebasing commits)
> 
> The NVGRE implementation in the 3rd patch in this series is actually an
> accident - I was just wiring up gretap as a reference;  only after I was
> done I noticed that that sums up to NVGRE, more or less.  IMHO, it does
> serve well to demonstrate the bridge changes are not VPLS-specific.
> 
> To refer some notes from the first announce mail:
> > I've tested some basic setups, the chain from LDP down into the kernel
> > works at least in these.  FRR has some testcases around from OpenBSD
> > VPLS support, I haven't wired that up to run against Linux / this
> > patchset yet.  
> 
> Same as before (API didn't change).
> 
> > The patchset needs a lot of polishing (yes I left my TODO notes in the
> > commit messages), for now my primary concern is overall design
> > feedback.  Roopa has already provided a lot of input (Thanks!);  the
> > major topic I'm expecting to get discussion on is the bridge FDB
> > changes.  
> 
> Got some useful input;  but still need feedback on the bridge FDB
> changes (first 2 patches).  I don't believe it to have a significant
> impact on existing bridge operation, and I believe a multipoint tunnel
> driver without its own FDB (e.g. NVGRE in this set) should perform
> better than one with its own FDB (e.g. existing VXLAN).
> 
> > P.S.: For a little context on the bridge FDB changes - I'm hoping to
> > find some time to extend this to the MDB to allow aggregating dst
> > metadata and handing down a list of dst metas on TX.  This isn't
> > specifically for VPLS but rather to give sufficient information to the
> > 802.11 stack to allow it to optimize selecting rates (or unicasting)
> > for multicast traffic by having the multicast subscriber list known.
> > This is done by major commercial wifi solutions (e.g. google "dynamic
> > multicast optimization".)  
> 
> You can find hacks at this on:
> https://github.com/eqvinox/vpls-linux-kernel/tree/mdb-hack
> Please note that the patches in that branch are not at an acceptable
> quality level, but you can see the semantic relation to 802.11.
> 
> I would, however, like to point out that this branch has pseudo-working
> IGMP/MLD snooping for VPLS, and it'd be 20-ish lines to add it to NVGRE
> (I'll do that as soon as I get to it, it'll pop up on that branch too.)
> 
> This is relevant to the discussion because it's a feature which is
> non-obvious (to me) on how to do with the VXLAN model of having an
> entirely separate FDB.  Meanwhile, with this architecture, the proof of
> concept / hack is coming in at a measly cost of:
> 8 files changed, 176 insertions(+), 15 deletions(-)
> 
> 
> Cheers,
> 
> -David
> 
> 
> --- diffstat:
> include/linux/netdevice.h  |  18 ++
> include/net/dst_metadata.h |  51 ++---
> include/net/ip_tunnels.h   |   5 ++
> include/uapi/linux/lwtunnel.h  |   8 +++
> include/uapi/linux/neighbour.h |   2 +
> include/uapi/linux/rtnetlink.h |   5 ++
> net/bridge/br.c|   2 +-
> net/bridge/br_device.c |   4 ++
> net/bridge/br_fdb.c| 119 
> net/bridge/br_input.c  |   6 +-
> net/bridge/br_private.h|   6 +-
> net/core/lwtunnel.c|   1 +
> net/ipv4/ip_gre.c  |  40 --
> net/ipv4/ip_tunnel.c   |   1 +
> net/ipv4/ip_tunnel_core.c  |  87 +++--
> net/mpls/Kconfig   |  11 
> net/mpls/Makefile  |   1 +
> net/mpls/af_mpls.c | 113 --
> net/mpls/internal.h|  44 +--
> net/mpls/vpls.c| 550 
> +++
> 20 files changed, 990 insertions(+), 84 deletions(-)

I know the bridge is an easy target to extend L2 forwarding, but it is not
the only option. Have you condidered building a new driver (like VXLAN does)
which does the forwarding you want. Having all features in one driver
makes for worse performance, and increased complexity.

Re: [Bridge] [PATCH net] bridge: fix hello and hold timers starting/stopping

2017-05-19 Thread Stephen Hemminger

On Fri, 19 May 2017 18:25:43 +0200
Ivan Vecera  wrote:

> Current bridge code incorrectly handles starting/stopping of hello and
> hold timers during STP enable/disable.
> 
> 1. Timers are stopped in br_stp_start() during NO_STP->USER_STP
>transition. This is not correct as the timers are stopped in NO_STP
>case.
> 
> 2. Timers are started in br_stp_stop() during USER_STP->NO_STP transition.
>This is not also correct as the timers should be stopped in NO_STP
>state.
> 
> 3. Timers are NOT stopped in br_stp_stop() during KERNEL_STP->NO_STP
>transition. They should be stopped as they are running in KERNEL_STP
>state and should not run in NO_STP case.
> 
> The patch is a follow-up for "bridge: start hello_timer when enabling
> KERNEL_STP in br_stp_start" patch from Xin Long.
> 
> Cc: da...@davemloft.net
> Cc: sas...@cumulusnetworks.com
> Cc: step...@networkplumber.org
> Cc: bridge@lists.linux-foundation.org
> Cc: lucien@gmail.com
> Cc: niko...@cumulusnetworks.com
> Signed-off-by: Ivan Vecera 

Overall, this looks correct but the wording of commit message
is too terse.

It would be better to add a more complete description of the impact
of this from a user's point of view. I am concerned that this
might have other side effects.

For example, what is the sequence of commands to validated this.

What is the impact, should this go to stable?

Re: [Bridge] [PATCH] libbridge: Include the configured CFLAGS when compiling

2017-05-15 Thread Stephen Hemminger

On Sat, 13 May 2017 19:23:34 -0700
David Michael  wrote:

> When compiling bridge-utils, the CFLAGS set during "configure" are only
> used in the brctl directory.

Sure make sense. I applied it (with some reordering).

1 2 3 4 5 >

1 - 100 of 438 matches

Mail list logo