Re: [PATCH iproute2-next v2 0/2] bridge: vlan: add compressvlans manpage

2024-03-22 Thread Nikolay Aleksandrov

On 3/22/24 14:39, Date Huang wrote:

Hi maintainers

I followed Nikolay and Jiri's comment and updated the patch to v2.
Please check it.



Your cover letter should contain an overview of what the set is doing.


Date Huang (2):
   bridge: vlan: fix compressvlans usage
   bridge: vlan: add compressvlans manpage

  bridge/bridge.c   | 2 +-
  man/man8/bridge.8 | 6 ++
  2 files changed, 7 insertions(+), 1 deletion(-)



Generally it is good to give people time to review and wait at least
24 hours before reposting another version. Also please write the
changes between versions, something like:

v3: ...
v2: split the patch into two separate patches
changed the option in patch 01
changed the man page description in patch 02

Keep the history from previous postings. Also you can note what changed
in each individual patch for the version under the first ---, for 
example for patch 02:

---
v2: changed the man page description

Cheers,
 Nik




Re: [PATCH iproute2-next v2 1/2] bridge: vlan: fix compressvlans usage

2024-03-22 Thread Nikolay Aleksandrov

On 3/22/24 14:39, Date Huang wrote:

Fix the incorrect short opt for compressvlans and color
in usage

Signed-off-by: Date Huang 
---
  bridge/bridge.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/bridge/bridge.c b/bridge/bridge.c
index f4805092..ef592815 100644
--- a/bridge/bridge.c
+++ b/bridge/bridge.c
@@ -39,7 +39,7 @@ static void usage(void)
  "where  OBJECT := { link | fdb | mdb | vlan | vni | monitor }\n"
  "   OPTIONS := { -V[ersion] | -s[tatistics] | -d[etails] |\n"
  "-o[neline] | -t[imestamp] | -n[etns] name |\n"
-"-c[ompressvlans] -color -p[retty] -j[son] }\n");
+"-com[pressvlans] -c[olor] -p[retty] -j[son] }\n");
    exit(-1);
  }
  


Acked-by: Nikolay Aleksandrov 




Re: [PATCH iproute2-next v2 2/2] bridge: vlan: fix compressvlans usage

2024-03-22 Thread Nikolay Aleksandrov

On 3/22/24 14:39, Date Huang wrote:

Add the missing 'compressvlans' to man page

Signed-off-by: Date Huang 
---
  man/man8/bridge.8 | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/man/man8/bridge.8 b/man/man8/bridge.8
index eeea4073..bb02bd27 100644
--- a/man/man8/bridge.8
+++ b/man/man8/bridge.8
@@ -22,6 +22,7 @@ bridge \- show / manipulate bridge addresses and devices
  \fB\-s\fR[\fItatistics\fR] |
  \fB\-n\fR[\fIetns\fR] name |
  \fB\-b\fR[\fIatch\fR] filename |
+\fB\-com\fR[\fIpressvlans\fR] |
  \fB\-c\fR[\fIolor\fR] |
  \fB\-p\fR[\fIretty\fR] |
  \fB\-j\fR[\fIson\fR] |
@@ -345,6 +346,11 @@ Don't terminate bridge command on errors in batch mode.
  If there were any errors during execution of the commands, the application
  return code will be non zero.
  
+.TP

+.BR "\-com", " \-compressvlans"
+Show compressed VLAN list. It will show continuous VLANs with the range instead
+of separated VLANs. Default is off.
+


How about:
Show a compressed VLAN list of continuous VLAN IDs as ranges. All VLANs 
in a range have identical configuration. Default is off (show each VLAN 
separately).



  .TP
  .BR \-c [ color ][ = { always | auto | never }
  Configure color output. If parameter is omitted or





Re: [PATCH] bridge: vlan: fix compressvlans manpage and usage

2024-03-22 Thread Nikolay Aleksandrov

On 3/22/24 10:56, Date Huang wrote:

Add the missing 'compressvlans' to man page.
Fix the incorrect short opt for compressvlans and color
in usage.

Signed-off-by: Date Huang 
---


Hi,
This should be targeted at iproute2. Nit below,


  bridge/bridge.c   | 2 +-
  man/man8/bridge.8 | 5 +
  2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/bridge/bridge.c b/bridge/bridge.c
index f4805092..345f5b5f 100644
--- a/bridge/bridge.c
+++ b/bridge/bridge.c
@@ -39,7 +39,7 @@ static void usage(void)
  "where  OBJECT := { link | fdb | mdb | vlan | vni | monitor }\n"
  "   OPTIONS := { -V[ersion] | -s[tatistics] | -d[etails] |\n"
  "-o[neline] | -t[imestamp] | -n[etns] name |\n"
-"-c[ompressvlans] -color -p[retty] -j[son] }\n");
+"-compressvlans -c[olor] -p[retty] -j[son] }\n");
exit(-1);
  }
  
diff --git a/man/man8/bridge.8 b/man/man8/bridge.8

index eeea4073..9a023227 100644
--- a/man/man8/bridge.8
+++ b/man/man8/bridge.8
@@ -22,6 +22,7 @@ bridge \- show / manipulate bridge addresses and devices
  \fB\-s\fR[\fItatistics\fR] |
  \fB\-n\fR[\fIetns\fR] name |
  \fB\-b\fR[\fIatch\fR] filename |
+\fB\-compressvlans |
  \fB\-c\fR[\fIolor\fR] |
  \fB\-p\fR[\fIretty\fR] |
  \fB\-j\fR[\fIson\fR] |
@@ -345,6 +346,10 @@ Don't terminate bridge command on errors in batch mode.
  If there were any errors during execution of the commands, the application
  return code will be non zero.
  
+.TP

+.BR \-compressvlans
+Show compressed vlan list


s/vlan/VLAN/
also the explanation is lacking, please add a little bit of details and
what the default is


+
  .TP
  .BR \-c [ color ][ = { always | auto | never }
  Configure color output. If parameter is omitted or


Thanks,
 Nik




Re: [PATCH net-next 9/9] selftests: vxlan_mdb: Add MDB bulk deletion test

2023-12-18 Thread Nikolay Aleksandrov

On 17/12/2023 10:32, Ido Schimmel wrote:

Add test cases to verify the behavior of the MDB bulk deletion
functionality in the VXLAN driver.

Signed-off-by: Ido Schimmel 
Acked-by: Petr Machata 
---
  tools/testing/selftests/net/test_vxlan_mdb.sh | 201 +-
  1 file changed, 199 insertions(+), 2 deletions(-)



Acked-by: Nikolay Aleksandrov 





Re: [PATCH net-next 8/9] selftests: bridge_mdb: Add MDB bulk deletion test

2023-12-18 Thread Nikolay Aleksandrov

On 17/12/2023 10:32, Ido Schimmel wrote:

Add test cases to verify the behavior of the MDB bulk deletion
functionality in the bridge driver.

Signed-off-by: Ido Schimmel 
Acked-by: Petr Machata 
---
  .../selftests/net/forwarding/bridge_mdb.sh| 191 +-
  1 file changed, 189 insertions(+), 2 deletions(-)



Acked-by: Nikolay Aleksandrov 





Re: [PATCH net-next 7/9] rtnetlink: bridge: Enable MDB bulk deletion

2023-12-18 Thread Nikolay Aleksandrov

On 17/12/2023 10:32, Ido Schimmel wrote:

Now that both the common code as well as individual drivers support MDB
bulk deletion, allow user space to make such requests.

Signed-off-by: Ido Schimmel 
Reviewed-by: Petr Machata 
---
  net/core/rtnetlink.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 349255151ad0..33f1e8d8e842 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -6747,5 +6747,6 @@ void __init rtnetlink_init(void)
  
  	rtnl_register(PF_BRIDGE, RTM_GETMDB, rtnl_mdb_get, rtnl_mdb_dump, 0);

rtnl_register(PF_BRIDGE, RTM_NEWMDB, rtnl_mdb_add, NULL, 0);
-   rtnl_register(PF_BRIDGE, RTM_DELMDB, rtnl_mdb_del, NULL, 0);
+   rtnl_register(PF_BRIDGE, RTM_DELMDB, rtnl_mdb_del, NULL,
+ RTNL_FLAG_BULK_DEL_SUPPORTED);
  }


Acked-by: Nikolay Aleksandrov 




Re: [PATCH net-next 6/9] vxlan: mdb: Add MDB bulk deletion support

2023-12-18 Thread Nikolay Aleksandrov

On 17/12/2023 10:32, Ido Schimmel wrote:

Implement MDB bulk deletion support in the VXLAN driver, allowing MDB
entries to be deleted in bulk according to provided parameters.

Signed-off-by: Ido Schimmel 
Reviewed-by: Petr Machata 
---
  drivers/net/vxlan/vxlan_core.c|   1 +
  drivers/net/vxlan/vxlan_mdb.c | 174 +-
  drivers/net/vxlan/vxlan_private.h |   2 +
  3 files changed, 153 insertions(+), 24 deletions(-)



Acked-by: Nikolay Aleksandrov 





Re: [PATCH net-next 5/9] bridge: mdb: Add MDB bulk deletion support

2023-12-18 Thread Nikolay Aleksandrov

On 17/12/2023 10:32, Ido Schimmel wrote:

Implement MDB bulk deletion support in the bridge driver, allowing MDB
entries to be deleted in bulk according to provided parameters.

Signed-off-by: Ido Schimmel 
Reviewed-by: Petr Machata 
---
  net/bridge/br_device.c  |   1 +
  net/bridge/br_mdb.c | 133 
  net/bridge/br_private.h |   8 +++
  3 files changed, 142 insertions(+)



Acked-by: Nikolay Aleksandrov 







Re: [PATCH net-next 4/9] rtnetlink: bridge: Invoke MDB bulk deletion when needed

2023-12-18 Thread Nikolay Aleksandrov

On 17/12/2023 10:32, Ido Schimmel wrote:

Invoke the new MDB bulk deletion device operation when the 'NLM_F_BULK'
flag is set in the netlink message header.

Signed-off-by: Ido Schimmel 
Reviewed-by: Petr Machata 
---
  net/core/rtnetlink.c | 8 
  1 file changed, 8 insertions(+)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 30f030a672f2..349255151ad0 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -6494,6 +6494,14 @@ static int rtnl_mdb_del(struct sk_buff *skb, struct 
nlmsghdr *nlh,
return -EINVAL;
}
  
+	if (del_bulk) {

+   if (!dev->netdev_ops->ndo_mdb_del_bulk) {
+   NL_SET_ERR_MSG(extack, "Device does not support MDB bulk 
deletion");
+   return -EOPNOTSUPP;
+   }
+   return dev->netdev_ops->ndo_mdb_del_bulk(dev, tb, extack);
+   }
+
if (!dev->netdev_ops->ndo_mdb_del) {
NL_SET_ERR_MSG(extack, "Device does not support MDB 
operations");
    return -EOPNOTSUPP;


Acked-by: Nikolay Aleksandrov 




Re: [PATCH net-next 3/9] net: Add MDB bulk deletion device operation

2023-12-18 Thread Nikolay Aleksandrov

On 17/12/2023 10:32, Ido Schimmel wrote:

Add MDB net device operation that will be invoked by rtnetlink code in
response to received 'RTM_DELMDB' messages with the 'NLM_F_BULK' flag
set. Subsequent patches will implement the operation in the bridge and
VXLAN drivers.

Signed-off-by: Ido Schimmel 
Reviewed-by: Petr Machata 
---
  include/linux/netdevice.h | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 1b935ee341b4..75c7725e5e4f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1329,6 +1329,9 @@ struct netdev_net_notifier {
   * int (*ndo_mdb_del)(struct net_device *dev, struct nlattr *tb[],
   *  struct netlink_ext_ack *extack);
   *Deletes the MDB entry from dev.
+ * int (*ndo_mdb_del_bulk)(struct net_device *dev, struct nlattr *tb[],
+ *struct netlink_ext_ack *extack);
+ * Bulk deletes MDB entries from dev.
   * int (*ndo_mdb_dump)(struct net_device *dev, struct sk_buff *skb,
   *   struct netlink_callback *cb);
   *Dumps MDB entries from dev. The first argument (marker) in the netlink
@@ -1611,6 +1614,9 @@ struct net_device_ops {
int (*ndo_mdb_del)(struct net_device *dev,
   struct nlattr *tb[],
   struct netlink_ext_ack *extack);
+   int (*ndo_mdb_del_bulk)(struct net_device *dev,
+   struct nlattr *tb[],
+   struct netlink_ext_ack 
*extack);
int (*ndo_mdb_dump)(struct net_device *dev,
struct sk_buff *skb,
struct netlink_callback *cb);


Acked-by: Nikolay Aleksandrov 




Re: [PATCH net-next 2/9] rtnetlink: bridge: Use a different policy for MDB bulk delete

2023-12-18 Thread Nikolay Aleksandrov

On 17/12/2023 10:32, Ido Schimmel wrote:

For MDB bulk delete we will need to validate 'MDBA_SET_ENTRY'
differently compared to regular delete. Specifically, allow the ifindex
to be zero (in case not filtering on bridge port) and force the address
to be zero as bulk delete based on address is not supported.

Do that by introducing a new policy and choosing the correct policy
based on the presence of the 'NLM_F_BULK' flag in the netlink message
header. Use nlmsg_parse() for strict validation.

Signed-off-by: Ido Schimmel 
Reviewed-by: Petr Machata 
---
  net/core/rtnetlink.c | 51 ++--
  1 file changed, 49 insertions(+), 2 deletions(-)



Acked-by: Nikolay Aleksandrov 





Re: [PATCH net-next 1/9] bridge: add MDB state mask uAPI attribute

2023-12-18 Thread Nikolay Aleksandrov

On 17/12/2023 10:32, Ido Schimmel wrote:

Currently, the 'state' field in 'struct br_port_msg' can be set to 1 if
the MDB entry is permanent or 0 if it is temporary. Additional states
might be added in the future.

In a similar fashion to 'NDA_NDM_STATE_MASK', add an MDB state mask uAPI
attribute that will allow the upcoming bulk deletion API to bulk delete
MDB entries with a certain state or any state.

Signed-off-by: Ido Schimmel 
Reviewed-by: Petr Machata 
---
  include/uapi/linux/if_bridge.h | 1 +
  1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/if_bridge.h b/include/uapi/linux/if_bridge.h
index 2e23f99dc0f1..a5b743a2f775 100644
--- a/include/uapi/linux/if_bridge.h
+++ b/include/uapi/linux/if_bridge.h
@@ -757,6 +757,7 @@ enum {
MDBE_ATTR_VNI,
MDBE_ATTR_IFINDEX,
MDBE_ATTR_SRC_VNI,
+   MDBE_ATTR_STATE_MASK,
__MDBE_ATTR_MAX,
  };
  #define MDBE_ATTR_MAX (__MDBE_ATTR_MAX - 1)


Acked-by: Nikolay Aleksandrov 





[Bridge] [PATCH net-next] net: bridge: fill in MODULE_DESCRIPTION()

2023-10-27 Thread Nikolay Aleksandrov
Fill in bridge's module description.

Suggested-by: Jakub Kicinski 
Signed-off-by: Nikolay Aleksandrov 
---
 net/bridge/br.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/bridge/br.c b/net/bridge/br.c
index a6e94ceb7c9a..cda9d7871f72 100644
--- a/net/bridge/br.c
+++ b/net/bridge/br.c
@@ -477,3 +477,4 @@ module_exit(br_deinit)
 MODULE_LICENSE("GPL");
 MODULE_VERSION(BR_VERSION);
 MODULE_ALIAS_RTNL_LINK("bridge");
+MODULE_DESCRIPTION("Ethernet bridge driver");
-- 
2.38.1



Re: [Bridge] [PATCH net-next v2 09/13] bridge: mcast: Add MDB get support

2023-10-26 Thread Nikolay Aleksandrov

On 10/25/23 15:30, Ido Schimmel wrote:

Implement support for MDB get operation by looking up a matching MDB
entry, allocating the skb according to the entry's size and then filling
in the response. The operation is performed under the bridge multicast
lock to ensure that the entry does not change between the time the reply
size is determined and when the reply is filled in.

Signed-off-by: Ido Schimmel 
---

Notes:
 v2:
 * Add a comment above spin_lock_bh().

  net/bridge/br_device.c  |   1 +
  net/bridge/br_mdb.c | 158 
  net/bridge/br_private.h |   9 +++
  3 files changed, 168 insertions(+)



Acked-by: Nikolay Aleksandrov 




Re: [Bridge] [PATCH net-next v2 07/13] bridge: add MDB get uAPI attributes

2023-10-26 Thread Nikolay Aleksandrov

On 10/25/23 15:30, Ido Schimmel wrote:

Add MDB get attributes that correspond to the MDB set attributes used in
RTM_NEWMDB messages. Specifically, add 'MDBA_GET_ENTRY' which will hold
a 'struct br_mdb_entry' and 'MDBA_GET_ENTRY_ATTRS' which will hold
'MDBE_ATTR_*' attributes that are used as indexes (source IP and source
VNI).

An example request will look as follows:

[ struct nlmsghdr ]
[ struct br_port_msg ]
[ MDBA_GET_ENTRY ]
struct br_mdb_entry
[ MDBA_GET_ENTRY_ATTRS ]
[ MDBE_ATTR_SOURCE ]
struct in_addr / struct in6_addr
[ MDBE_ATTR_SRC_VNI ]
u32

Signed-off-by: Ido Schimmel 
---

Notes:
 v2:
 * Add comment.

  include/uapi/linux/if_bridge.h | 18 ++
  1 file changed, 18 insertions(+)



Acked-by: Nikolay Aleksandrov 



Re: [Bridge] [PATCH iproute2-next v5] iplink: bridge: Add support for bridge FDB learning limits

2023-10-19 Thread Nikolay Aleksandrov

On 10/18/23 10:04, Johannes Nixdorf wrote:

Support setting the FDB limit through ip link. The arguments is:
  - fdb_max_learned: A 32-bit unsigned integer specifying the maximum
 number of learned FDB entries, with 0 disabling
 the limit.

Also support reading back the current number of learned FDB entries in
the bridge by this count. The returned value's name is:
  - fdb_n_learned: A 32-bit unsigned integer specifying the current number
   of learned FDB entries.

Example:

  # ip -d -j -p link show br0
[ {
...
 "linkinfo": {
 "info_kind": "bridge",
 "info_data": {
...
 "fdb_n_learned": 2,
 "fdb_max_learned": 0,
...
 }
 },
...
 } ]
  # ip link set br0 type bridge fdb_max_learned 1024
  # ip -d -j -p link show br0
[ {
...
 "linkinfo": {
 "info_kind": "bridge",
 "info_data": {
...
 "fdb_n_learned": 2,
 "fdb_max_learned": 1024,
...
 }
 },
...
 } ]

Signed-off-by: Johannes Nixdorf 
---
The corresponding kernel changes are in net-next.git as commit
ddd1ad68826d ("net: bridge: Add netlink knobs for number / max learned
FDB entries").
---
Changes in v5:
  - Removed the RFC status again, as the kernel changes landed.
  - Link to v4: 
https://lore.kernel.org/r/20230919-fdb_limit-v4-1-b4d2dc4df...@avm.de

Changes in v4:
  - Removed _entries from the names. (from review)
  - Removed the UAPI change, to be synced from linux separately by the
maintainer. (from review)
For local testing e.g. `make CCOPTS="-O2 -pipe
-I${path_to_dev_kernel_headers}"` works as a workaround.
  - Downgraded to an RFC until the kernel changes land.
  - Link to v3: 
https://lore.kernel.org/netdev/20230905-fdb_limit-v3-1-34bb12455...@avm.de/

Changes in v3:
  - Properly split the net-next and iproute2-next threads. (from review)
  - Changed to *_n_* instead of *_cur_*. (from review)
  - Use strcmp() instead of matches(). (from review)
  - Made names in code and documentation consistent. (from review)
  - Various documentation fixes. (from review)
  - Link to v2: 
https://lore.kernel.org/netdev/20230619071444.14625-1-jnixdorf-...@avm.de/

Changes in v2:
  - Sent out the first corresponding iproute2 patches.
  - Link to v1: 
https://lore.kernel.org/netdev/20230515085046.4457-1-jnixdorf-...@avm.de/
---
  ip/iplink_bridge.c| 21 +
  man/man8/ip-link.8.in | 10 ++
  2 files changed, 31 insertions(+)



Acked-by: Nikolay Aleksandrov 




Re: [Bridge] [PATCH net-next 09/13] bridge: mcast: Add MDB get support

2023-10-17 Thread Nikolay Aleksandrov

On 10/17/23 14:03, Ido Schimmel wrote:

On Tue, Oct 17, 2023 at 12:24:44PM +0300, Nikolay Aleksandrov wrote:

On 10/16/23 16:12, Ido Schimmel wrote:

Implement support for MDB get operation by looking up a matching MDB
entry, allocating the skb according to the entry's size and then filling
in the response. The operation is performed under the bridge multicast
lock to ensure that the entry does not change between the time the reply
size is determined and when the reply is filled in.

Signed-off-by: Ido Schimmel 
---
   net/bridge/br_device.c  |   1 +
   net/bridge/br_mdb.c | 154 
   net/bridge/br_private.h |   9 +++
   3 files changed, 164 insertions(+)


[snip]

+int br_mdb_get(struct net_device *dev, struct nlattr *tb[], u32 portid, u32 
seq,
+  struct netlink_ext_ack *extack)
+{
+   struct net_bridge *br = netdev_priv(dev);
+   struct net_bridge_mdb_entry *mp;
+   struct sk_buff *skb;
+   struct br_ip group;
+   int err;
+
+   err = br_mdb_get_parse(dev, tb, , extack);
+   if (err)
+   return err;
+
+   spin_lock_bh(>multicast_lock);


Since this is only reading, could we use rcu to avoid blocking mcast
processing?


I tried to explain this choice in the commit message. Do you think it's
a non-issue?



Unless you really need a stable snapshot, I think it's worth
not blocking igmp processing for a read. It's not critical,
if you do need a stable snapshot then it's ok.




+
+   mp = br_mdb_ip_get(br, );
+   if (!mp) {
+   NL_SET_ERR_MSG_MOD(extack, "MDB entry not found");
+   err = -ENOENT;
+   goto unlock;
+   }
+
+   skb = br_mdb_get_reply_alloc(mp);
+   if (!skb) {
+   err = -ENOMEM;
+   goto unlock;
+   }
+
+   err = br_mdb_get_reply_fill(skb, mp, portid, seq);
+   if (err) {
+   NL_SET_ERR_MSG_MOD(extack, "Failed to fill MDB get reply");
+   goto free;
+   }
+
+   spin_unlock_bh(>multicast_lock);
+
+   return rtnl_unicast(skb, dev_net(dev), portid);
+
+free:
+   kfree_skb(skb);
+unlock:
+   spin_unlock_bh(>multicast_lock);
+   return err;
+}






Re: [Bridge] [PATCH net-next v5 4/5] net: bridge: Set strict_start_type for br_policy

2023-10-17 Thread Nikolay Aleksandrov

On 10/16/23 16:27, Johannes Nixdorf wrote:

Set any new attributes added to br_policy to be parsed strictly, to
prevent userspace from passing garbage.

Signed-off-by: Johannes Nixdorf 
---
  net/bridge/br_netlink.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 0c3cf6e6dea2..5ad4abfcb7ba 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -1229,6 +1229,8 @@ static size_t br_port_get_slave_size(const struct 
net_device *brdev,
  }
  
  static const struct nla_policy br_policy[IFLA_BR_MAX + 1] = {

+   [IFLA_BR_UNSPEC]= { .strict_start_type =
+   IFLA_BR_FDB_N_LEARNED },
[IFLA_BR_FORWARD_DELAY] = { .type = NLA_U32 },
[IFLA_BR_HELLO_TIME]= { .type = NLA_U32 },
[IFLA_BR_MAX_AGE]   = { .type = NLA_U32 },



Acked-by: Nikolay Aleksandrov 



Re: [Bridge] [PATCH net-next v5 3/5] net: bridge: Add netlink knobs for number / max learned FDB entries

2023-10-17 Thread Nikolay Aleksandrov

On 10/16/23 16:27, Johannes Nixdorf wrote:

The previous patch added accounting and a limit for the number of
dynamically learned FDB entries per bridge. However it did not provide
means to actually configure those bounds or read back the count. This
patch does that.

Two new netlink attributes are added for the accounting and limit of
dynamically learned FDB entries:
  - IFLA_BR_FDB_N_LEARNED (RO) for the number of entries accounted for
a single bridge.
  - IFLA_BR_FDB_MAX_LEARNED (RW) for the configured limit of entries for
the bridge.

The new attributes are used like this:

  # ip link add name br up type bridge fdb_max_learned 256
  # ip link add name v1 up master br type veth peer v2
  # ip link set up dev v2
  # mausezahn -a rand -c 1024 v2
  0.01 seconds (90877 packets per second
  # bridge fdb | grep -v permanent | wc -l
  256
  # ip -d link show dev br
  13: br:  mtu 1500 [...]
  [...] fdb_n_learned 256 fdb_max_learned 256

Signed-off-by: Johannes Nixdorf 
---
  include/uapi/linux/if_link.h |  2 ++
  net/bridge/br_netlink.c  | 15 ++-
  2 files changed, 16 insertions(+), 1 deletion(-)



Acked-by: Nikolay Aleksandrov 




Re: [Bridge] [PATCH net-next 13/13] selftests: vxlan_mdb: Use MDB get instead of dump

2023-10-17 Thread Nikolay Aleksandrov

On 10/16/23 16:12, Ido Schimmel wrote:

Test the new MDB get functionality by converting dump and grep to MDB
get.

Signed-off-by: Ido Schimmel 
---
  tools/testing/selftests/net/test_vxlan_mdb.sh | 108 +-
  1 file changed, 54 insertions(+), 54 deletions(-)



Acked-by: Nikolay Aleksandrov 




Re: [Bridge] [PATCH net-next 12/13] selftests: bridge_mdb: Use MDB get instead of dump

2023-10-17 Thread Nikolay Aleksandrov

On 10/16/23 16:12, Ido Schimmel wrote:

Test the new MDB get functionality by converting dump and grep to MDB
get.

Signed-off-by: Ido Schimmel 
---
  .../selftests/net/forwarding/bridge_mdb.sh| 184 +++---
  1 file changed, 71 insertions(+), 113 deletions(-)



Acked-by: Nikolay Aleksandrov 




Re: [Bridge] [PATCH net-next 11/13] rtnetlink: Add MDB get support

2023-10-17 Thread Nikolay Aleksandrov

On 10/16/23 16:12, Ido Schimmel wrote:

Now that both the bridge and VXLAN drivers implement the MDB get net
device operation, expose the functionality to user space by registering
a handler for RTM_GETMDB messages. Derive the net device from the
ifindex specified in the ancillary header and invoke its MDB get NDO.

Note that unlike other get handlers, the allocation of the skb
containing the response is not performed in the common rtnetlink code as
the size is variable and needs to be determined by the respective
driver.

Signed-off-by: Ido Schimmel 
---
  net/core/rtnetlink.c | 89 +++-
  1 file changed, 88 insertions(+), 1 deletion(-)



Acked-by: Nikolay Aleksandrov 




Re: [Bridge] [PATCH net-next 10/13] vxlan: mdb: Add MDB get support

2023-10-17 Thread Nikolay Aleksandrov

On 10/16/23 16:12, Ido Schimmel wrote:

Implement support for MDB get operation by looking up a matching MDB
entry, allocating the skb according to the entry's size and then filling
in the response.

Signed-off-by: Ido Schimmel 
---
  drivers/net/vxlan/vxlan_core.c|   1 +
  drivers/net/vxlan/vxlan_mdb.c | 150 ++
  drivers/net/vxlan/vxlan_private.h |   2 +
  3 files changed, 153 insertions(+)



Acked-by: Nikolay Aleksandrov 




Re: [Bridge] [PATCH net-next 09/13] bridge: mcast: Add MDB get support

2023-10-17 Thread Nikolay Aleksandrov

On 10/16/23 16:12, Ido Schimmel wrote:

Implement support for MDB get operation by looking up a matching MDB
entry, allocating the skb according to the entry's size and then filling
in the response. The operation is performed under the bridge multicast
lock to ensure that the entry does not change between the time the reply
size is determined and when the reply is filled in.

Signed-off-by: Ido Schimmel 
---
  net/bridge/br_device.c  |   1 +
  net/bridge/br_mdb.c | 154 
  net/bridge/br_private.h |   9 +++
  3 files changed, 164 insertions(+)


[snip]

+int br_mdb_get(struct net_device *dev, struct nlattr *tb[], u32 portid, u32 
seq,
+  struct netlink_ext_ack *extack)
+{
+   struct net_bridge *br = netdev_priv(dev);
+   struct net_bridge_mdb_entry *mp;
+   struct sk_buff *skb;
+   struct br_ip group;
+   int err;
+
+   err = br_mdb_get_parse(dev, tb, , extack);
+   if (err)
+   return err;
+
+   spin_lock_bh(>multicast_lock);


Since this is only reading, could we use rcu to avoid blocking mcast 
processing?



+
+   mp = br_mdb_ip_get(br, );
+   if (!mp) {
+   NL_SET_ERR_MSG_MOD(extack, "MDB entry not found");
+   err = -ENOENT;
+   goto unlock;
+   }
+
+   skb = br_mdb_get_reply_alloc(mp);
+   if (!skb) {
+   err = -ENOMEM;
+   goto unlock;
+   }
+
+   err = br_mdb_get_reply_fill(skb, mp, portid, seq);
+   if (err) {
+   NL_SET_ERR_MSG_MOD(extack, "Failed to fill MDB get reply");
+   goto free;
+   }
+
+   spin_unlock_bh(>multicast_lock);
+
+   return rtnl_unicast(skb, dev_net(dev), portid);
+
+free:
+   kfree_skb(skb);
+unlock:
+   spin_unlock_bh(>multicast_lock);
+   return err;
+}




Re: [Bridge] [PATCH net-next 08/13] net: Add MDB get device operation

2023-10-17 Thread Nikolay Aleksandrov

On 10/16/23 16:12, Ido Schimmel wrote:

Add MDB net device operation that will be invoked by rtnetlink code in
response to received RTM_GETMDB messages. Subsequent patches will
implement the operation in the bridge and VXLAN drivers.

Signed-off-by: Ido Schimmel 
---
  include/linux/netdevice.h | 4 
  1 file changed, 4 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 1c7681263d30..18376b65dc61 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1586,6 +1586,10 @@ struct net_device_ops {
int (*ndo_mdb_dump)(struct net_device *dev,
struct sk_buff *skb,
struct netlink_callback *cb);
+   int (*ndo_mdb_get)(struct net_device *dev,
+  struct nlattr *tb[], u32 portid,
+  u32 seq,
+  struct netlink_ext_ack *extack);
int (*ndo_bridge_setlink)(struct net_device *dev,
  struct nlmsghdr *nlh,
  u16 flags,


Acked-by: Nikolay Aleksandrov 




Re: [Bridge] [PATCH net-next 07/13] bridge: add MDB get uAPI attributes

2023-10-17 Thread Nikolay Aleksandrov

On 10/16/23 16:12, Ido Schimmel wrote:

Add MDB get attributes that correspond to the MDB set attributes used in
RTM_NEWMDB messages. Specifically, add 'MDBA_GET_ENTRY' which will hold
a 'struct br_mdb_entry' and 'MDBA_GET_ENTRY_ATTRS' which will hold
'MDBE_ATTR_*' attributes that are used as indexes (source IP and source
VNI).

An example request will look as follows:

[ struct nlmsghdr ]
[ struct br_port_msg ]
[ MDBA_GET_ENTRY ]
struct br_mdb_entry
[ MDBA_GET_ENTRY_ATTRS ]
[ MDBE_ATTR_SOURCE ]
struct in_addr / struct in6_addr
[ MDBE_ATTR_SRC_VNI ]
u32



Could you please add this info as a comment above the enum?
Similar to the enum below it. It'd be nice to have an example
of what's expected.


Signed-off-by: Ido Schimmel 
---
  include/uapi/linux/if_bridge.h | 8 
  1 file changed, 8 insertions(+)

diff --git a/include/uapi/linux/if_bridge.h b/include/uapi/linux/if_bridge.h
index f95326fce6bb..7e1bf080b414 100644
--- a/include/uapi/linux/if_bridge.h
+++ b/include/uapi/linux/if_bridge.h
@@ -723,6 +723,14 @@ enum {
  };
  #define MDBA_SET_ENTRY_MAX (__MDBA_SET_ENTRY_MAX - 1)
  
+enum {

+   MDBA_GET_ENTRY_UNSPEC,
+   MDBA_GET_ENTRY,
+   MDBA_GET_ENTRY_ATTRS,
+   __MDBA_GET_ENTRY_MAX,
+};
+#define MDBA_GET_ENTRY_MAX (__MDBA_GET_ENTRY_MAX - 1)
+
  /* [MDBA_SET_ENTRY_ATTRS] = {
   *[MDBE_ATTR_xxx]
   *...




Re: [Bridge] [PATCH net-next 06/13] vxlan: mdb: Factor out a helper for remote entry size calculation

2023-10-17 Thread Nikolay Aleksandrov

On 10/16/23 16:12, Ido Schimmel wrote:

Currently, netlink notifications are sent for individual remote entries
and not for the entire MDB entry itself.

Subsequent patches are going to add MDB get support which will require
the VXLAN driver to reply with an entire MDB entry.

Therefore, as a preparation, factor out a helper to calculate the size
of an individual remote entry. When determining the size of the reply
this helper will be invoked for each remote entry in the MDB entry.

No functional changes intended.

Signed-off-by: Ido Schimmel 
---
  drivers/net/vxlan/vxlan_mdb.c | 28 +++-
  1 file changed, 19 insertions(+), 9 deletions(-)



Acked-by: Nikolay Aleksandrov 




Re: [Bridge] [PATCH net-next 05/13] vxlan: mdb: Adjust function arguments

2023-10-17 Thread Nikolay Aleksandrov

On 10/16/23 16:12, Ido Schimmel wrote:

Adjust the function's arguments and rename it to allow it to be reused
by future call sites that only have access to 'struct
vxlan_mdb_entry_key', but not to 'struct vxlan_mdb_config'.

No functional changes intended.

Signed-off-by: Ido Schimmel 
---
  drivers/net/vxlan/vxlan_mdb.c | 10 --
  1 file changed, 4 insertions(+), 6 deletions(-)



Acked-by: Nikolay Aleksandrov 




Re: [Bridge] [PATCH net-next 04/13] bridge: mcast: Rename MDB entry get function

2023-10-17 Thread Nikolay Aleksandrov

On 10/16/23 16:12, Ido Schimmel wrote:

The current name is going to conflict with the upcoming net device
operation for the MDB get operation.

Rename the function to br_mdb_entry_skb_get(). No functional changes
intended.

Signed-off-by: Ido Schimmel 
---
  net/bridge/br_device.c|  2 +-
  net/bridge/br_input.c |  2 +-
  net/bridge/br_multicast.c |  5 +++--
  net/bridge/br_private.h   | 10 ++
  4 files changed, 11 insertions(+), 8 deletions(-)



Acked-by: Nikolay Aleksandrov 




Re: [Bridge] [PATCH net-next 03/13] bridge: mcast: Factor out a helper for PG entry size calculation

2023-10-17 Thread Nikolay Aleksandrov

On 10/16/23 16:12, Ido Schimmel wrote:

Currently, netlink notifications are sent for individual port group
entries and not for the entire MDB entry itself.

Subsequent patches are going to add MDB get support which will require
the bridge driver to reply with an entire MDB entry.

Therefore, as a preparation, factor out an helper to calculate the size
of an individual port group entry. When determining the size of the
reply this helper will be invoked for each port group entry in the MDB
entry.

No functional changes intended.

Signed-off-by: Ido Schimmel 
---
  net/bridge/br_mdb.c | 20 +---
  1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
index 08de94bffc12..42983f6a0abd 100644
--- a/net/bridge/br_mdb.c
+++ b/net/bridge/br_mdb.c
@@ -450,18 +450,13 @@ static int nlmsg_populate_mdb_fill(struct sk_buff *skb,
return -EMSGSIZE;
  }
  
-static size_t rtnl_mdb_nlmsg_size(struct net_bridge_port_group *pg)

+static size_t rtnl_mdb_nlmsg_pg_size(const struct net_bridge_port_group *pg)
  {
struct net_bridge_group_src *ent;
size_t nlmsg_size, addr_size = 0;
  
-	nlmsg_size = NLMSG_ALIGN(sizeof(struct br_port_msg)) +

-/* MDBA_MDB */
-nla_total_size(0) +
-/* MDBA_MDB_ENTRY */
-nla_total_size(0) +
 /* MDBA_MDB_ENTRY_INFO */
-nla_total_size(sizeof(struct br_mdb_entry)) +
+   nlmsg_size = nla_total_size(sizeof(struct br_mdb_entry)) +
 /* MDBA_MDB_EATTR_TIMER */
 nla_total_size(sizeof(u32));
  
@@ -511,6 +506,17 @@ static size_t rtnl_mdb_nlmsg_size(struct net_bridge_port_group *pg)

return nlmsg_size;
  }
  
+static size_t rtnl_mdb_nlmsg_size(const struct net_bridge_port_group *pg)

+{
+   return NLMSG_ALIGN(sizeof(struct br_port_msg)) +
+  /* MDBA_MDB */
+  nla_total_size(0) +
+  /* MDBA_MDB_ENTRY */
+  nla_total_size(0) +
+  /* Port group entry */
+  rtnl_mdb_nlmsg_pg_size(pg);
+}
+
  void br_mdb_notify(struct net_device *dev,
   struct net_bridge_mdb_entry *mp,
   struct net_bridge_port_group *pg,


Acked-by: Nikolay Aleksandrov 



Re: [Bridge] [PATCH net-next 02/13] bridge: mcast: Account for missing attributes

2023-10-17 Thread Nikolay Aleksandrov

On 10/16/23 16:12, Ido Schimmel wrote:

The 'MDBA_MDB' and 'MDBA_MDB_ENTRY' nest attributes are not accounted
for when calculating the size of MDB notifications. Add them along with
comments for existing attributes.

Signed-off-by: Ido Schimmel 
---
  net/bridge/br_mdb.c | 15 +++
  1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
index fb58bb1b60e8..08de94bffc12 100644
--- a/net/bridge/br_mdb.c
+++ b/net/bridge/br_mdb.c
@@ -452,11 +452,18 @@ static int nlmsg_populate_mdb_fill(struct sk_buff *skb,
  
  static size_t rtnl_mdb_nlmsg_size(struct net_bridge_port_group *pg)

  {
-   size_t nlmsg_size = NLMSG_ALIGN(sizeof(struct br_port_msg)) +
-   nla_total_size(sizeof(struct br_mdb_entry)) +
-   nla_total_size(sizeof(u32));
struct net_bridge_group_src *ent;
-   size_t addr_size = 0;
+   size_t nlmsg_size, addr_size = 0;
+
+   nlmsg_size = NLMSG_ALIGN(sizeof(struct br_port_msg)) +
+/* MDBA_MDB */
+nla_total_size(0) +
+/* MDBA_MDB_ENTRY */
+nla_total_size(0) +
+/* MDBA_MDB_ENTRY_INFO */
+nla_total_size(sizeof(struct br_mdb_entry)) +
+/* MDBA_MDB_EATTR_TIMER */
+nla_total_size(sizeof(u32));
  
  	if (!pg)

goto out;


Acked-by: Nikolay Aleksandrov 



Re: [Bridge] [PATCH net-next 01/13] bridge: mcast: Dump MDB entries even when snooping is disabled

2023-10-17 Thread Nikolay Aleksandrov

On 10/16/23 16:12, Ido Schimmel wrote:

Currently, the bridge driver does not dump MDB entries when multicast
snooping is disabled although the entries are present in the kernel:

  # bridge mdb add dev br0 port swp1 grp 239.1.1.1 permanent
  # bridge mdb show dev br0
  dev br0 port swp1 grp 239.1.1.1 permanent
  dev br0 port br0 grp ff02::6a temp
  dev br0 port br0 grp ff02::1:ff9d:e61b temp
  # ip link set dev br0 type bridge mcast_snooping 0
  # bridge mdb show dev br0
  # ip link set dev br0 type bridge mcast_snooping 1
  # bridge mdb show dev br0
  dev br0 port swp1 grp 239.1.1.1 permanent
  dev br0 port br0 grp ff02::6a temp
  dev br0 port br0 grp ff02::1:ff9d:e61b temp

This behavior differs from other netlink dump interfaces that dump
entries regardless if they are used or not. For example, VLANs are
dumped even when VLAN filtering is disabled:

  # ip link set dev br0 type bridge vlan_filtering 0
  # bridge vlan show dev swp1
  port  vlan-id
  swp1  1 PVID Egress Untagged

Remove the check and always dump MDB entries:

  # bridge mdb add dev br0 port swp1 grp 239.1.1.1 permanent
  # bridge mdb show dev br0
  dev br0 port swp1 grp 239.1.1.1 permanent
  dev br0 port br0 grp ff02::6a temp
  dev br0 port br0 grp ff02::1:ffeb:1a4d temp
  # ip link set dev br0 type bridge mcast_snooping 0
  # bridge mdb show dev br0
  dev br0 port swp1 grp 239.1.1.1 permanent
  dev br0 port br0 grp ff02::6a temp
  dev br0 port br0 grp ff02::1:ffeb:1a4d temp
  # ip link set dev br0 type bridge mcast_snooping 1
  # bridge mdb show dev br0
  dev br0 port swp1 grp 239.1.1.1 permanent
  dev br0 port br0 grp ff02::6a temp
  dev br0 port br0 grp ff02::1:ffeb:1a4d temp

Signed-off-by: Ido Schimmel 
---
  net/bridge/br_mdb.c | 3 ---
  1 file changed, 3 deletions(-)

diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
index 7305f5f8215c..fb58bb1b60e8 100644
--- a/net/bridge/br_mdb.c
+++ b/net/bridge/br_mdb.c
@@ -323,9 +323,6 @@ static int br_mdb_fill_info(struct sk_buff *skb, struct 
netlink_callback *cb,
struct net_bridge_mdb_entry *mp;
struct nlattr *nest, *nest2;
  
-	if (!br_opt_get(br, BROPT_MULTICAST_ENABLED))

-   return 0;
-
nest = nla_nest_start_noflag(skb, MDBA_MDB);
if (nest == NULL)
return -EMSGSIZE;


Finally! Thanks :) this has been a long-standing annoyance.
Acked-by: Nikolay Aleksandrov 



Re: [Bridge] [PATCH net-next 00/11] Extend VXLAN driver to support FDB flushing

2023-10-10 Thread Nikolay Aleksandrov

On 10/9/23 13:06, Amit Cohen wrote:

The merge commit 92716869375b ("Merge branch 'br-flush-filtering'") added
support for FDB flushing in bridge driver. Extend VXLAN driver to support
FDB flushing also. Add support for filtering by fields which are relevant
for VXLAN FDBs:
* Source VNI
* Nexthop ID
* 'router' flag
* Destination VNI
* Destination Port
* Destination IP

Without this set, flush for VXLAN device fails:
$ bridge fdb flush dev vx10
RTNETLINK answers: Operation not supported

With this set, such flush works with the relevant arguments, for example:
$ bridge fdb flush dev vx10 vni 5000 dst 193.2.2.1
< flush all vx10 entries with VNI 5000 and destination IP 193.2.2.1>

Some preparations are required, handle them before adding flushing support
in VXLAN driver. See more details in commit messages.

Patch set overview:
Patch #1 prepares flush policy to be used by VXLAN driver
Patches #2-#3 are preparations in VXLAN driver
Patch #4 adds an initial support for flushing in VXLAN driver
Patches #5-#9 add support for filtering by several attributes
Patch #10 adds a test for FDB flush with VXLAN
Patch #11 extends the test to check FDB flush with bridge

Amit Cohen (11):
   net: Handle bulk delete policy in bridge driver
   vxlan: vxlan_core: Make vxlan_flush() more generic for future use
   vxlan: vxlan_core: Do not skip default entry in vxlan_flush() by
 default
   vxlan: vxlan_core: Add support for FDB flush
   vxlan: vxlan_core: Support FDB flushing by source VNI
   vxlan: vxlan_core: Support FDB flushing by nexthop ID
   vxlan: vxlan_core: Support FDB flushing by destination VNI
   vxlan: vxlan_core: Support FDB flushing by destination port
   vxlan: vxlan_core: Support FDB flushing by destination IP
   selftests: Add test cases for FDB flush with VXLAN device
   selftests: fdb_flush: Add test cases for FDB flush with bridge device

  drivers/net/vxlan/vxlan_core.c   | 207 +-
  include/linux/netdevice.h|   8 +-
  net/bridge/br_fdb.c  |  29 +-
  net/bridge/br_private.h  |   3 +-
  net/core/rtnetlink.c |  27 +-
  tools/testing/selftests/net/Makefile |   1 +
  tools/testing/selftests/net/fdb_flush.sh | 812 +++
  7 files changed, 1049 insertions(+), 38 deletions(-)
  create mode 100755 tools/testing/selftests/net/fdb_flush.sh



Nice use of the flush api. :)
The set looks good to me. For the set:
Acked-by: Nikolay Aleksandrov 


Re: [Bridge] [PATCH] bridge: MTU auto tuning ignores IFLA_MTU on NEWLINK

2023-09-28 Thread Nikolay Aleksandrov

On 9/27/23 11:10, Nikolay Aleksandrov wrote:

On 9/27/23 10:57, Trent Lloyd wrote:

Commit 804b854d374e ("net: bridge: disable bridge MTU auto tuning if it
was set manually") disabled auto-tuning of the bridge MTU when the MTU
was explicitly set by the user, however that would only happen when the
MTU was set after creation. This commit ensures auto-tuning is also
disabled when the MTU is set during bridge creation.

Currently when the br_netdev_ops br_change_mtu function is called, the
flag BROPT_MTU_SET_BY_USER is set. However this function is only called
when the MTU is changed after interface creation and is not called if
the MTU is specified during creation with IFLA_MTU (br_dev_newlink).

br_change_mtu also does not get called if the MTU is set to the same
value it currently has, which makes it difficult to work around this
issue (especially for the default MTU of 1500) as you have to first
change the MTU to some other value and then back to the desired value.



Yep, I think I also described this in the commit message of my patch.


Add new selftests to ensure the bridge MTU is handled correctly:
  - Bridge created with user-specified MTU (1500)
  - Bridge created with user-specified MTU (2000)
  - Bridge created without user-specified MTU
  - Bridge created with user-specified MTU set after creation (2000)

Regression risk: Any workload which erroneously specified an MTU during
creation but accidentally relied upon auto-tuning to a different value
may be broken by this change.



Hmm, you're right. There's a risk of regression. Also it acts 
differently when set to 1500 as you've mentioned. I think they should 
act the same, also bridge's fake rtable RTAX_MTU is not set.



Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2034099
Fixes: 804b854d374e ("net: bridge: disable bridge MTU auto tuning if 
it was set manually")

Signed-off-by: Trent Lloyd 
---




So I've been thinking about this and to elaborate - my concerns are two
first is inconsistency between setting MTU at create vs later when it's 
the default (i.e. this way disables auto-tuning, while later it doesn't)
and second is potential breakage as some network managers always set the 
mtu when creating devices. That would suddenly start disabling 
auto-tuning and that will surprise some people which expect it to work.


I think if you make them both act the same and leave out that case with 
a big comment why, this would be good for -net. To fully solve the 
problem without breaking anyone I think we should add control over the 
autotuning flag so it can be turned on/off by the users. That would be 
explicit and will work for all cases, but that is net-next material.


Thanks,
 Nik



Re: [Bridge] [PATCH] bridge: MTU auto tuning ignores IFLA_MTU on NEWLINK

2023-09-27 Thread Nikolay Aleksandrov

On 9/27/23 11:10, Nikolay Aleksandrov wrote:

On 9/27/23 10:57, Trent Lloyd wrote:

Commit 804b854d374e ("net: bridge: disable bridge MTU auto tuning if it
was set manually") disabled auto-tuning of the bridge MTU when the MTU
was explicitly set by the user, however that would only happen when the
MTU was set after creation. This commit ensures auto-tuning is also
disabled when the MTU is set during bridge creation.

Currently when the br_netdev_ops br_change_mtu function is called, the
flag BROPT_MTU_SET_BY_USER is set. However this function is only called
when the MTU is changed after interface creation and is not called if
the MTU is specified during creation with IFLA_MTU (br_dev_newlink).

br_change_mtu also does not get called if the MTU is set to the same
value it currently has, which makes it difficult to work around this
issue (especially for the default MTU of 1500) as you have to first
change the MTU to some other value and then back to the desired value.



Yep, I think I also described this in the commit message of my patch.


Add new selftests to ensure the bridge MTU is handled correctly:
  - Bridge created with user-specified MTU (1500)
  - Bridge created with user-specified MTU (2000)
  - Bridge created without user-specified MTU
  - Bridge created with user-specified MTU set after creation (2000)

Regression risk: Any workload which erroneously specified an MTU during
creation but accidentally relied upon auto-tuning to a different value
may be broken by this change.



Hmm, you're right. There's a risk of regression. Also it acts 
differently when set to 1500 as you've mentioned. I think they should 
act the same, also bridge's fake rtable RTAX_MTU is not set.




The last part about RTAX_MTU is probably a separate issue. :)


Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2034099
Fixes: 804b854d374e ("net: bridge: disable bridge MTU auto tuning if 
it was set manually")

Signed-off-by: Trent Lloyd 
---






Re: [Bridge] [PATCH] bridge: MTU auto tuning ignores IFLA_MTU on NEWLINK

2023-09-27 Thread Nikolay Aleksandrov

On 9/27/23 10:57, Trent Lloyd wrote:

Commit 804b854d374e ("net: bridge: disable bridge MTU auto tuning if it
was set manually") disabled auto-tuning of the bridge MTU when the MTU
was explicitly set by the user, however that would only happen when the
MTU was set after creation. This commit ensures auto-tuning is also
disabled when the MTU is set during bridge creation.

Currently when the br_netdev_ops br_change_mtu function is called, the
flag BROPT_MTU_SET_BY_USER is set. However this function is only called
when the MTU is changed after interface creation and is not called if
the MTU is specified during creation with IFLA_MTU (br_dev_newlink).

br_change_mtu also does not get called if the MTU is set to the same
value it currently has, which makes it difficult to work around this
issue (especially for the default MTU of 1500) as you have to first
change the MTU to some other value and then back to the desired value.



Yep, I think I also described this in the commit message of my patch.


Add new selftests to ensure the bridge MTU is handled correctly:
  - Bridge created with user-specified MTU (1500)
  - Bridge created with user-specified MTU (2000)
  - Bridge created without user-specified MTU
  - Bridge created with user-specified MTU set after creation (2000)

Regression risk: Any workload which erroneously specified an MTU during
creation but accidentally relied upon auto-tuning to a different value
may be broken by this change.



Hmm, you're right. There's a risk of regression. Also it acts 
differently when set to 1500 as you've mentioned. I think they should 
act the same, also bridge's fake rtable RTAX_MTU is not set.



Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2034099
Fixes: 804b854d374e ("net: bridge: disable bridge MTU auto tuning if it was set 
manually")
Signed-off-by: Trent Lloyd 
---




Re: [Bridge] [PATCH net-next v4 4/6] net: bridge: Add netlink knobs for number / max learned FDB entries

2023-09-21 Thread Nikolay Aleksandrov

On 9/21/23 15:41, Nikolay Aleksandrov wrote:

On 9/19/23 11:12, Johannes Nixdorf wrote:

The previous patch added accounting and a limit for the number of
dynamically learned FDB entries per bridge. However it did not provide
means to actually configure those bounds or read back the count. This
patch does that.

Two new netlink attributes are added for the accounting and limit of
dynamically learned FDB entries:
  - IFLA_BR_FDB_N_LEARNED (RO) for the number of entries accounted for
    a single bridge.
  - IFLA_BR_FDB_MAX_LEARNED (RW) for the configured limit of entries for
    the bridge.

The new attributes are used like this:

  # ip link add name br up type bridge fdb_max_learned 256
  # ip link add name v1 up master br type veth peer v2
  # ip link set up dev v2
  # mausezahn -a rand -c 1024 v2
  0.01 seconds (90877 packets per second
  # bridge fdb | grep -v permanent | wc -l
  256
  # ip -d link show dev br
  13: br:  mtu 1500 [...]
  [...] fdb_n_learned 256 fdb_max_learned 256

Signed-off-by: Johannes Nixdorf 
---
  include/uapi/linux/if_link.h |  2 ++
  net/bridge/br_netlink.c  | 15 ++-
  2 files changed, 16 insertions(+), 1 deletion(-)

[snip]
@@ -1670,7 +1680,10 @@ static int br_fill_info(struct sk_buff *skb, 
const struct net_device *brdev)

  nla_put_u8(skb, IFLA_BR_TOPOLOGY_CHANGE_DETECTED,
 br->topology_change_detected) ||
  nla_put(skb, IFLA_BR_GROUP_ADDR, ETH_ALEN, br->group_addr) ||
-    nla_put(skb, IFLA_BR_MULTI_BOOLOPT, sizeof(bm), ))
+    nla_put(skb, IFLA_BR_MULTI_BOOLOPT, sizeof(bm), ) ||
+    nla_put_u32(skb, IFLA_BR_FDB_N_LEARNED,
+    atomic_read(>fdb_n_learned)) ||
+    nla_put_u32(skb, IFLA_BR_FDB_MAX_LEARNED, br->fdb_max_learned))
  return -EMSGSIZE;
  #ifdef CONFIG_BRIDGE_VLAN_FILTERING



Actually you're using atomic for counting, but using a u32 for the 
limit, you should cap it because the count can overflow. Or you should

use atomic64 for the counting.



Scratch all that, I'm speaking nonsense. Need to refresh my mind. :)
EVerything's alright. Sorry for the noise.



Re: [Bridge] [PATCH net-next v4 4/6] net: bridge: Add netlink knobs for number / max learned FDB entries

2023-09-21 Thread Nikolay Aleksandrov

On 9/19/23 11:12, Johannes Nixdorf wrote:

The previous patch added accounting and a limit for the number of
dynamically learned FDB entries per bridge. However it did not provide
means to actually configure those bounds or read back the count. This
patch does that.

Two new netlink attributes are added for the accounting and limit of
dynamically learned FDB entries:
  - IFLA_BR_FDB_N_LEARNED (RO) for the number of entries accounted for
a single bridge.
  - IFLA_BR_FDB_MAX_LEARNED (RW) for the configured limit of entries for
the bridge.

The new attributes are used like this:

  # ip link add name br up type bridge fdb_max_learned 256
  # ip link add name v1 up master br type veth peer v2
  # ip link set up dev v2
  # mausezahn -a rand -c 1024 v2
  0.01 seconds (90877 packets per second
  # bridge fdb | grep -v permanent | wc -l
  256
  # ip -d link show dev br
  13: br:  mtu 1500 [...]
  [...] fdb_n_learned 256 fdb_max_learned 256

Signed-off-by: Johannes Nixdorf 
---
  include/uapi/linux/if_link.h |  2 ++
  net/bridge/br_netlink.c  | 15 ++-
  2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index ce3117df9cec..0486f314c176 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -510,6 +510,8 @@ enum {
IFLA_BR_VLAN_STATS_PER_PORT,
IFLA_BR_MULTI_BOOLOPT,
IFLA_BR_MCAST_QUERIER_STATE,
+   IFLA_BR_FDB_N_LEARNED,
+   IFLA_BR_FDB_MAX_LEARNED,
__IFLA_BR_MAX,
  };
  
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c

index 505683ef9a26..f5d49a05e61b 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -1267,6 +1267,8 @@ static const struct nla_policy br_policy[IFLA_BR_MAX + 1] 
= {
[IFLA_BR_VLAN_STATS_PER_PORT] = { .type = NLA_U8 },
[IFLA_BR_MULTI_BOOLOPT] =
NLA_POLICY_EXACT_LEN(sizeof(struct br_boolopt_multi)),
+   [IFLA_BR_FDB_N_LEARNED] = { .type = NLA_U32 },
+   [IFLA_BR_FDB_MAX_LEARNED] = { .type = NLA_U32 },
  };
  
  static int br_changelink(struct net_device *brdev, struct nlattr *tb[],

@@ -1541,6 +1543,12 @@ static int br_changelink(struct net_device *brdev, 
struct nlattr *tb[],
return err;
}
  
+	if (data[IFLA_BR_FDB_MAX_LEARNED]) {

+   u32 val = nla_get_u32(data[IFLA_BR_FDB_MAX_LEARNED]);
+
+   WRITE_ONCE(br->fdb_max_learned, val);
+   }
+
return 0;
  }
  
@@ -1595,6 +1603,8 @@ static size_t br_get_size(const struct net_device *brdev)

   nla_total_size_64bit(sizeof(u64)) + /* 
IFLA_BR_TOPOLOGY_CHANGE_TIMER */
   nla_total_size_64bit(sizeof(u64)) + /* IFLA_BR_GC_TIMER */
   nla_total_size(ETH_ALEN) +   /* IFLA_BR_GROUP_ADDR */
+  nla_total_size(sizeof(u32)) +/* IFLA_BR_FDB_N_LEARNED */
+  nla_total_size(sizeof(u32)) +/* IFLA_BR_FDB_MAX_LEARNED */
  #ifdef CONFIG_BRIDGE_IGMP_SNOOPING
   nla_total_size(sizeof(u8)) + /* IFLA_BR_MCAST_ROUTER */
   nla_total_size(sizeof(u8)) + /* IFLA_BR_MCAST_SNOOPING */
@@ -1670,7 +1680,10 @@ static int br_fill_info(struct sk_buff *skb, const 
struct net_device *brdev)
nla_put_u8(skb, IFLA_BR_TOPOLOGY_CHANGE_DETECTED,
   br->topology_change_detected) ||
nla_put(skb, IFLA_BR_GROUP_ADDR, ETH_ALEN, br->group_addr) ||
-   nla_put(skb, IFLA_BR_MULTI_BOOLOPT, sizeof(bm), ))
+   nla_put(skb, IFLA_BR_MULTI_BOOLOPT, sizeof(bm), ) ||
+   nla_put_u32(skb, IFLA_BR_FDB_N_LEARNED,
+   atomic_read(>fdb_n_learned)) ||
+   nla_put_u32(skb, IFLA_BR_FDB_MAX_LEARNED, br->fdb_max_learned))
return -EMSGSIZE;
  
  #ifdef CONFIG_BRIDGE_VLAN_FILTERING




Actually you're using atomic for counting, but using a u32 for the 
limit, you should cap it because the count can overflow. Or you should

use atomic64 for the counting.





Re: [Bridge] [PATCH net-next v4 5/6] net: bridge: Add a configurable default FDB learning limit

2023-09-21 Thread Nikolay Aleksandrov

On 9/21/23 11:06, Johannes Nixdorf wrote:

On Wed, Sep 20, 2023 at 02:00:27PM +0300, Nikolay Aleksandrov wrote:

On 9/19/23 11:12, Johannes Nixdorf wrote:

Add a Kconfig option to configure a default FDB learning limit system
wide, so a distributor building a special purpose kernel can limit all
created bridges by default.

The limit is only a soft default setting and overrideable on a per bridge
basis using netlink.

Signed-off-by: Johannes Nixdorf 
---
   net/bridge/Kconfig | 13 +
   net/bridge/br_device.c |  2 ++
   2 files changed, 15 insertions(+)

diff --git a/net/bridge/Kconfig b/net/bridge/Kconfig
index 3c8ded7d3e84..c0d9c08088c4 100644
--- a/net/bridge/Kconfig
+++ b/net/bridge/Kconfig
@@ -84,3 +84,16 @@ config BRIDGE_CFM
  Say N to exclude this support and reduce the binary size.
  If unsure, say N.
+
+config BRIDGE_DEFAULT_FDB_MAX_LEARNED
+   int "Default FDB learning limit"
+   default 0
+   depends on BRIDGE
+   help
+ Sets a default limit on the number of learned FDB entries on
+ new bridges. This limit can be overwritten via netlink on a


overwritten doesn't sound good, how about This limit can be set (or changed)


+ per bridge basis.
+
+ The default of 0 disables the limit.
+
+ If unsure, say 0.
diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index 9a5ea06236bd..3214391c15a0 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -531,6 +531,8 @@ void br_dev_setup(struct net_device *dev)
br->bridge_ageing_time = br->ageing_time = BR_DEFAULT_AGEING_TIME;
dev->max_mtu = ETH_MAX_MTU;
+   br->fdb_max_learned = CONFIG_BRIDGE_DEFAULT_FDB_MAX_LEARNED;
+
br_netfilter_rtable_init(br);
br_stp_timer_init(br);
br_multicast_init(br);



This one I'm not sure about at all. Distributions can just create the bridge
with a predefined limit. This is not flexible and just adds
one more kconfig option that is rather unnecessary. Why having a kconfig
knob is better than bridge creation time limit setting? You still have
to create the bridge, so why not set the limit then?


The problem I'm trying to solve here are unaware applications. Assuming
this change lands in the next Linux release there will still be quite
some time until the major applications that create bridges (distribution
specific or common network management tools, the container solution of
they day, for embedded some random vendor tools, etc.) will pick it
up. In this series I chose a default of 0 to not break existing setups
that rely on some arbitrary amount of FDB entries, so those unaware
applications will create bridges without limits. I added the Kconfig
setting so someone who knows their use cases can still set a more fitting
default limit.

More specifically to our use case as an embedded vendor that builds their
own kernels and knows they have no use case that requires huge FDB tables,
the kernel config allows us to set a safe default limit before starting
to teach all our applications and our upstream vendors' code about the
new netlink attribute. As this patch is relatively simple, we can also
keep it downstream if there is opposition to it here though.


I'm not strongly against, just IMO it is unnecessary. I won't block the 
set because of this, but it would be nice to get input from others as
well. If you can recompile your kernel to set a limit, it should be 
easier to change your app to set the same limit via netlink, but I'm not 
familiar with your use case.




Re: [Bridge] [PATCH net-next v4 2/6] net: bridge: Set strict_start_type for br_policy

2023-09-21 Thread Nikolay Aleksandrov

On 9/21/23 10:23, Johannes Nixdorf wrote:

On Wed, Sep 20, 2023 at 01:46:02PM +0300, Nikolay Aleksandrov wrote:

On 9/19/23 11:12, Johannes Nixdorf wrote:

Set any new attributes added to br_policy to be parsed strictly, to
prevent userspace from passing garbage.

Signed-off-by: Johannes Nixdorf 
---
   net/bridge/br_netlink.c | 2 ++
   1 file changed, 2 insertions(+)

diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 10f0d33d8ccf..505683ef9a26 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -1229,6 +1229,8 @@ static size_t br_port_get_slave_size(const struct 
net_device *brdev,
   }
   static const struct nla_policy br_policy[IFLA_BR_MAX + 1] = {
+   [IFLA_BR_UNSPEC]= { .strict_start_type =
+   IFLA_BR_MCAST_QUERIER_STATE + 1 },
[IFLA_BR_FORWARD_DELAY] = { .type = NLA_U32 },
[IFLA_BR_HELLO_TIME]= { .type = NLA_U32 },
[IFLA_BR_MAX_AGE]   = { .type = NLA_U32 },



instead of IFLA_BR_MCAST_QUERIER_STATE + 1, why not move around the patch
and just use the new attribute name?
These are uapi, they won't change.


I wanted to avoid having a state between the two commits where the new
attributes are already added, but not yet strictly verified. Otherwise
they would present a slightly different UAPI at that one commit boundary
than after this commit.



That's not really a problem, the attribute is the same.


This is also not the only place in the kernel where strict_start_type
is specified that way. See e.g. commit c00041cf1cb8 ("net: bridge: Set
strict_start_type at two policies"), even though that seems mostly be
done to turn on strict_start_type preemtively, not in the same series
that adds the new attribute.


Please, just use the new attribute to be more explicit where the strict 
parsing starts.


Thanks,
 Nik




Re: [Bridge] [PATCH net-next v4 6/6] selftests: forwarding: bridge_fdb_learning_limit: Add a new selftest

2023-09-20 Thread Nikolay Aleksandrov

On 9/19/23 11:12, Johannes Nixdorf wrote:

Add a suite covering the fdb_n_learned and fdb_max_learned bridge
features, touching all special cases in accounting at least once.

Signed-off-by: Johannes Nixdorf 
---
  tools/testing/selftests/net/forwarding/Makefile|   3 +-
  .../net/forwarding/bridge_fdb_learning_limit.sh| 283 +
  2 files changed, 285 insertions(+), 1 deletion(-)



Always nice to see new tests. Thanks,
Acked-by: Nikolay Aleksandrov 




Re: [Bridge] [PATCH net-next v4 5/6] net: bridge: Add a configurable default FDB learning limit

2023-09-20 Thread Nikolay Aleksandrov

On 9/19/23 11:12, Johannes Nixdorf wrote:

Add a Kconfig option to configure a default FDB learning limit system
wide, so a distributor building a special purpose kernel can limit all
created bridges by default.

The limit is only a soft default setting and overrideable on a per bridge
basis using netlink.

Signed-off-by: Johannes Nixdorf 
---
  net/bridge/Kconfig | 13 +
  net/bridge/br_device.c |  2 ++
  2 files changed, 15 insertions(+)

diff --git a/net/bridge/Kconfig b/net/bridge/Kconfig
index 3c8ded7d3e84..c0d9c08088c4 100644
--- a/net/bridge/Kconfig
+++ b/net/bridge/Kconfig
@@ -84,3 +84,16 @@ config BRIDGE_CFM
  Say N to exclude this support and reduce the binary size.
  
  	  If unsure, say N.

+
+config BRIDGE_DEFAULT_FDB_MAX_LEARNED
+   int "Default FDB learning limit"
+   default 0
+   depends on BRIDGE
+   help
+ Sets a default limit on the number of learned FDB entries on
+ new bridges. This limit can be overwritten via netlink on a
+ per bridge basis.
+
+ The default of 0 disables the limit.
+
+ If unsure, say 0.
diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index 9a5ea06236bd..3214391c15a0 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -531,6 +531,8 @@ void br_dev_setup(struct net_device *dev)
br->bridge_ageing_time = br->ageing_time = BR_DEFAULT_AGEING_TIME;
dev->max_mtu = ETH_MAX_MTU;
  
+	br->fdb_max_learned = CONFIG_BRIDGE_DEFAULT_FDB_MAX_LEARNED;

+
br_netfilter_rtable_init(br);
br_stp_timer_init(br);
br_multicast_init(br);



This one I'm not sure about at all. Distributions can just create the 
bridge with a predefined limit. This is not flexible and just adds

one more kconfig option that is rather unnecessary. Why having a kconfig
knob is better than bridge creation time limit setting? You still have
to create the bridge, so why not set the limit then?


Re: [Bridge] [PATCH net-next v4 4/6] net: bridge: Add netlink knobs for number / max learned FDB entries

2023-09-20 Thread Nikolay Aleksandrov

On 9/19/23 11:12, Johannes Nixdorf wrote:

The previous patch added accounting and a limit for the number of
dynamically learned FDB entries per bridge. However it did not provide
means to actually configure those bounds or read back the count. This
patch does that.

Two new netlink attributes are added for the accounting and limit of
dynamically learned FDB entries:
  - IFLA_BR_FDB_N_LEARNED (RO) for the number of entries accounted for
a single bridge.
  - IFLA_BR_FDB_MAX_LEARNED (RW) for the configured limit of entries for
the bridge.

The new attributes are used like this:

  # ip link add name br up type bridge fdb_max_learned 256
  # ip link add name v1 up master br type veth peer v2
  # ip link set up dev v2
  # mausezahn -a rand -c 1024 v2
  0.01 seconds (90877 packets per second
  # bridge fdb | grep -v permanent | wc -l
  256
  # ip -d link show dev br
  13: br:  mtu 1500 [...]
  [...] fdb_n_learned 256 fdb_max_learned 256

Signed-off-by: Johannes Nixdorf 
---
  include/uapi/linux/if_link.h |  2 ++
  net/bridge/br_netlink.c  | 15 ++-
  2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index ce3117df9cec..0486f314c176 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -510,6 +510,8 @@ enum {
IFLA_BR_VLAN_STATS_PER_PORT,
IFLA_BR_MULTI_BOOLOPT,
IFLA_BR_MCAST_QUERIER_STATE,
+   IFLA_BR_FDB_N_LEARNED,
+   IFLA_BR_FDB_MAX_LEARNED,
__IFLA_BR_MAX,
  };
  
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c

index 505683ef9a26..f5d49a05e61b 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -1267,6 +1267,8 @@ static const struct nla_policy br_policy[IFLA_BR_MAX + 1] 
= {
[IFLA_BR_VLAN_STATS_PER_PORT] = { .type = NLA_U8 },
[IFLA_BR_MULTI_BOOLOPT] =
NLA_POLICY_EXACT_LEN(sizeof(struct br_boolopt_multi)),
+   [IFLA_BR_FDB_N_LEARNED] = { .type = NLA_U32 },


hmm? I thought this one was RO.


+   [IFLA_BR_FDB_MAX_LEARNED] = { .type = NLA_U32 },
  };
  
  static int br_changelink(struct net_device *brdev, struct nlattr *tb[],

@@ -1541,6 +1543,12 @@ static int br_changelink(struct net_device *brdev, 
struct nlattr *tb[],
return err;
}
  
+	if (data[IFLA_BR_FDB_MAX_LEARNED]) {

+   u32 val = nla_get_u32(data[IFLA_BR_FDB_MAX_LEARNED]);
+
+   WRITE_ONCE(br->fdb_max_learned, val);
+   }
+
return 0;
  }
  
@@ -1595,6 +1603,8 @@ static size_t br_get_size(const struct net_device *brdev)

   nla_total_size_64bit(sizeof(u64)) + /* 
IFLA_BR_TOPOLOGY_CHANGE_TIMER */
   nla_total_size_64bit(sizeof(u64)) + /* IFLA_BR_GC_TIMER */
   nla_total_size(ETH_ALEN) +   /* IFLA_BR_GROUP_ADDR */
+  nla_total_size(sizeof(u32)) +/* IFLA_BR_FDB_N_LEARNED */
+  nla_total_size(sizeof(u32)) +/* IFLA_BR_FDB_MAX_LEARNED */
  #ifdef CONFIG_BRIDGE_IGMP_SNOOPING
   nla_total_size(sizeof(u8)) + /* IFLA_BR_MCAST_ROUTER */
   nla_total_size(sizeof(u8)) + /* IFLA_BR_MCAST_SNOOPING */
@@ -1670,7 +1680,10 @@ static int br_fill_info(struct sk_buff *skb, const 
struct net_device *brdev)
nla_put_u8(skb, IFLA_BR_TOPOLOGY_CHANGE_DETECTED,
   br->topology_change_detected) ||
nla_put(skb, IFLA_BR_GROUP_ADDR, ETH_ALEN, br->group_addr) ||
-   nla_put(skb, IFLA_BR_MULTI_BOOLOPT, sizeof(bm), ))
+   nla_put(skb, IFLA_BR_MULTI_BOOLOPT, sizeof(bm), ) ||
+   nla_put_u32(skb, IFLA_BR_FDB_N_LEARNED,
+   atomic_read(>fdb_n_learned)) ||
+   nla_put_u32(skb, IFLA_BR_FDB_MAX_LEARNED, br->fdb_max_learned))
return -EMSGSIZE;
  
  #ifdef CONFIG_BRIDGE_VLAN_FILTERING






Re: [Bridge] [PATCH net-next v4 3/6] net: bridge: Track and limit dynamically learned FDB entries

2023-09-20 Thread Nikolay Aleksandrov

On 9/19/23 11:12, Johannes Nixdorf wrote:

A malicious actor behind one bridge port may spam the kernel with packets
with a random source MAC address, each of which will create an FDB entry,
each of which is a dynamic allocation in the kernel.

There are roughly 2^48 different MAC addresses, further limited by the
rhashtable they are stored in to 2^31. Each entry is of the type struct
net_bridge_fdb_entry, which is currently 128 bytes big. This means the
maximum amount of memory allocated for FDB entries is 2^31 * 128B =
256GiB, which is too much for most computers.

Mitigate this by maintaining a per bridge count of those automatically
generated entries in fdb_n_learned, and a limit in fdb_max_learned. If
the limit is hit new entries are not learned anymore.

For backwards compatibility the default setting of 0 disables the limit.

User-added entries by netlink or from bridge or bridge port addresses
are never blocked and do not count towards that limit.

Introduce a new fdb entry flag BR_FDB_DYNAMIC_LEARNED to keep track of
whether an FDB entry is included in the count. The flag is enabled for
dynamically learned entries, and disabled for all other entries. This
should be equivalent to BR_FDB_ADDED_BY_USER and BR_FDB_LOCAL being unset,
but contrary to the two flags it can be toggled atomically.

Atomicity is required here, as there are multiple callers that modify the
flags, but are not under a common lock (br_fdb_update is the exception
for br->hash_lock, br_fdb_external_learn_add for RTNL).

Signed-off-by: Johannes Nixdorf 
---
  net/bridge/br_fdb.c | 35 +--
  net/bridge/br_private.h |  4 
  2 files changed, 37 insertions(+), 2 deletions(-)



I think this is a good counting start. :) It'd be nice to get
more eyes on this one.
Acked-by: Nikolay Aleksandrov 




Re: [Bridge] [PATCH net-next v4 2/6] net: bridge: Set strict_start_type for br_policy

2023-09-20 Thread Nikolay Aleksandrov

On 9/19/23 11:12, Johannes Nixdorf wrote:

Set any new attributes added to br_policy to be parsed strictly, to
prevent userspace from passing garbage.

Signed-off-by: Johannes Nixdorf 
---
  net/bridge/br_netlink.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 10f0d33d8ccf..505683ef9a26 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -1229,6 +1229,8 @@ static size_t br_port_get_slave_size(const struct 
net_device *brdev,
  }
  
  static const struct nla_policy br_policy[IFLA_BR_MAX + 1] = {

+   [IFLA_BR_UNSPEC]= { .strict_start_type =
+   IFLA_BR_MCAST_QUERIER_STATE + 1 },
[IFLA_BR_FORWARD_DELAY] = { .type = NLA_U32 },
[IFLA_BR_HELLO_TIME]= { .type = NLA_U32 },
[IFLA_BR_MAX_AGE]   = { .type = NLA_U32 },



instead of IFLA_BR_MCAST_QUERIER_STATE + 1, why not move around the 
patch and just use the new attribute name?

These are uapi, they won't change.


Re: [Bridge] [PATCH net-next v4 1/6] net: bridge: Set BR_FDB_ADDED_BY_USER early in fdb_add_entry

2023-09-20 Thread Nikolay Aleksandrov

On 9/19/23 11:12, Johannes Nixdorf wrote:

In preparation of the following fdb limit for dynamically learned entries,
allow fdb_create to detect that the entry was added by the user. This
way it can skip applying the limit in this case.

Signed-off-by: Johannes Nixdorf 
---
  net/bridge/br_fdb.c | 7 ---
  1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index e69a872bfc1d..f517ea92132c 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -1056,7 +1056,8 @@ static int fdb_add_entry(struct net_bridge *br, struct 
net_bridge_port *source,
if (!(flags & NLM_F_CREATE))
return -ENOENT;
  
-		fdb = fdb_create(br, source, addr, vid, 0);

+   fdb = fdb_create(br, source, addr, vid,
+BIT(BR_FDB_ADDED_BY_USER));
if (!fdb)
return -ENOMEM;
  
@@ -1069,6 +1070,8 @@ static int fdb_add_entry(struct net_bridge *br, struct net_bridge_port *source,

WRITE_ONCE(fdb->dst, source);
modified = true;
}
+
+   set_bit(BR_FDB_ADDED_BY_USER, >flags);
}
  
  	if (fdb_to_nud(br, fdb) != state) {

@@ -1100,8 +1103,6 @@ static int fdb_add_entry(struct net_bridge *br, struct 
net_bridge_port *source,
if (fdb_handle_notify(fdb, notify))
modified = true;
  
-	set_bit(BR_FDB_ADDED_BY_USER, >flags);

-
fdb->used = jiffies;
if (modified) {
if (refresh)



Acked-by: Nikolay Aleksandrov 



Re: [Bridge] [PATCH net] net: bridge: use DEV_STATS_INC()

2023-09-18 Thread Nikolay Aleksandrov

On 9/18/23 12:13, Eric Dumazet wrote:

syzbot/KCSAN reported data-races in br_handle_frame_finish() [1]
This function can run from multiple cpus without mutual exclusion.

Adopt SMP safe DEV_STATS_INC() to update dev->stats fields.

Handles updates to dev->stats.tx_dropped while we are at it.

[1]
BUG: KCSAN: data-race in br_handle_frame_finish / br_handle_frame_finish

read-write to 0x8881374b2178 of 8 bytes by interrupt on cpu 1:
br_handle_frame_finish+0xd4f/0xef0 net/bridge/br_input.c:189
br_nf_hook_thresh+0x1ed/0x220
br_nf_pre_routing_finish_ipv6+0x50f/0x540
NF_HOOK include/linux/netfilter.h:304 [inline]
br_nf_pre_routing_ipv6+0x1e3/0x2a0 net/bridge/br_netfilter_ipv6.c:178
br_nf_pre_routing+0x526/0xba0 net/bridge/br_netfilter_hooks.c:508
nf_hook_entry_hookfn include/linux/netfilter.h:144 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:272 [inline]
br_handle_frame+0x4c9/0x940 net/bridge/br_input.c:417
__netif_receive_skb_core+0xa8a/0x21e0 net/core/dev.c:5417
__netif_receive_skb_one_core net/core/dev.c:5521 [inline]
__netif_receive_skb+0x57/0x1b0 net/core/dev.c:5637
process_backlog+0x21f/0x380 net/core/dev.c:5965
__napi_poll+0x60/0x3b0 net/core/dev.c:6527
napi_poll net/core/dev.c:6594 [inline]
net_rx_action+0x32b/0x750 net/core/dev.c:6727
__do_softirq+0xc1/0x265 kernel/softirq.c:553
run_ksoftirqd+0x17/0x20 kernel/softirq.c:921
smpboot_thread_fn+0x30a/0x4a0 kernel/smpboot.c:164
kthread+0x1d7/0x210 kernel/kthread.c:388
ret_from_fork+0x48/0x60 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304

read-write to 0x8881374b2178 of 8 bytes by interrupt on cpu 0:
br_handle_frame_finish+0xd4f/0xef0 net/bridge/br_input.c:189
br_nf_hook_thresh+0x1ed/0x220
br_nf_pre_routing_finish_ipv6+0x50f/0x540
NF_HOOK include/linux/netfilter.h:304 [inline]
br_nf_pre_routing_ipv6+0x1e3/0x2a0 net/bridge/br_netfilter_ipv6.c:178
br_nf_pre_routing+0x526/0xba0 net/bridge/br_netfilter_hooks.c:508
nf_hook_entry_hookfn include/linux/netfilter.h:144 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:272 [inline]
br_handle_frame+0x4c9/0x940 net/bridge/br_input.c:417
__netif_receive_skb_core+0xa8a/0x21e0 net/core/dev.c:5417
__netif_receive_skb_one_core net/core/dev.c:5521 [inline]
__netif_receive_skb+0x57/0x1b0 net/core/dev.c:5637
process_backlog+0x21f/0x380 net/core/dev.c:5965
__napi_poll+0x60/0x3b0 net/core/dev.c:6527
napi_poll net/core/dev.c:6594 [inline]
net_rx_action+0x32b/0x750 net/core/dev.c:6727
__do_softirq+0xc1/0x265 kernel/softirq.c:553
do_softirq+0x5e/0x90 kernel/softirq.c:454
__local_bh_enable_ip+0x64/0x70 kernel/softirq.c:381
__raw_spin_unlock_bh include/linux/spinlock_api_smp.h:167 [inline]
_raw_spin_unlock_bh+0x36/0x40 kernel/locking/spinlock.c:210
spin_unlock_bh include/linux/spinlock.h:396 [inline]
batadv_tt_local_purge+0x1a8/0x1f0 net/batman-adv/translation-table.c:1356
batadv_tt_purge+0x2b/0x630 net/batman-adv/translation-table.c:3560
process_one_work kernel/workqueue.c:2630 [inline]
process_scheduled_works+0x5b8/0xa30 kernel/workqueue.c:2703
worker_thread+0x525/0x730 kernel/workqueue.c:2784
kthread+0x1d7/0x210 kernel/kthread.c:388
ret_from_fork+0x48/0x60 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304

value changed: 0x000d7190 -> 0x000d7191

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 14848 Comm: kworker/u4:11 Not tainted 
6.6.0-rc1-syzkaller-00236-gad8a69f361b9 #0

Fixes: 1c29fc4989bc ("[BRIDGE]: keep track of received multicast packets")
Reported-by: syzbot 
Signed-off-by: Eric Dumazet 
Cc: Roopa Prabhu 
Cc: Nikolay Aleksandrov 
Cc: bridge@lists.linux-foundation.org
---
  net/bridge/br_forward.c | 4 ++--
  net/bridge/br_input.c   | 4 ++--
  2 files changed, 4 insertions(+), 4 deletions(-)



Thanks,
Acked-by: Nikolay Aleksandrov 




Re: [Bridge] [PATCH net-next v3 3/6] net: bridge: Track and limit dynamically learned FDB entries

2023-09-14 Thread Nikolay Aleksandrov

On 9/5/23 14:47, Johannes Nixdorf wrote:

A malicious actor behind one bridge port may spam the kernel with packets
with a random source MAC address, each of which will create an FDB entry,
each of which is a dynamic allocation in the kernel.

There are roughly 2^48 different MAC addresses, further limited by the
rhashtable they are stored in to 2^31. Each entry is of the type struct
net_bridge_fdb_entry, which is currently 128 bytes big. This means the
maximum amount of memory allocated for FDB entries is 2^31 * 128B =
256GiB, which is too much for most computers.

Mitigate this by maintaining a per bridge count of those automatically
generated entries in fdb_n_learned_entries, and a limit in
fdb_max_learned_entries. If the limit is hit new entries are not learned
anymore.

For backwards compatibility the default setting of 0 disables the limit.

User-added entries by netlink or from bridge or bridge port addresses
are never blocked and do not count towards that limit.

Introduce a new fdb entry flag BR_FDB_DYNAMIC_LEARNED to keep track of
whether an FDB entry is included in the count. The flag is enabled for
dynamically learned entries, and disabled for all other entries. This
should be equivalent to BR_FDB_ADDED_BY_USER and BR_FDB_LOCAL being unset,
but contrary to the two flags it can be toggled atomically.

Atomicity is required here, as there are multiple callers that modify the
flags, but are not under a common lock (br_fdb_update is the exception
for br->hash_lock, br_fdb_external_learn_add for RTNL).

Signed-off-by: Johannes Nixdorf 
---
  net/bridge/br_fdb.c | 34 --
  net/bridge/br_private.h |  4 
  2 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index 06e28ef8d9ff..f8a96ed9a338 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -329,11 +329,18 @@ static void fdb_delete(struct net_bridge *br, struct 
net_bridge_fdb_entry *f,
hlist_del_init_rcu(>fdb_node);
rhashtable_remove_fast(>fdb_hash_tbl, >rhnode,
   br_fdb_rht_params);
+   if (test_bit(BR_FDB_DYNAMIC_LEARNED, >flags))


this is racy with br_fdb_update(), the fdb entry is still visible 
because a grace period hasn't passed, so in theory br_fdb_update() can

unset the bit after this test and...

+   atomic_dec(>fdb_n_learned_entries);

... this will decrease once more wrongly


fdb_notify(br, f, RTM_DELNEIGH, swdev_notify);
call_rcu(>rcu, fdb_rcu_free);
  }
  
-/* Delete a local entry if no other port had the same address. */

+/* Delete a local entry if no other port had the same address.
+ *
+ * This function should only be called on entries with BR_FDB_LOCAL set,
+ * so even with BR_FDB_ADDED_BY_USER cleared we never need to increase
+ * the accounting for dynamically learned entries again.
+ */
  static void fdb_delete_local(struct net_bridge *br,
 const struct net_bridge_port *p,
 struct net_bridge_fdb_entry *f)
@@ -388,9 +395,20 @@ static struct net_bridge_fdb_entry *fdb_create(struct 
net_bridge *br,
   __u16 vid,
   unsigned long flags)
  {
+   bool learned = !test_bit(BR_FDB_ADDED_BY_USER, ) &&
+  !test_bit(BR_FDB_LOCAL, );
+   u32 max_learned = READ_ONCE(br->fdb_max_learned_entries);
struct net_bridge_fdb_entry *fdb;
int err;
  
+	if (likely(learned)) {

+   int n_learned = atomic_read(>fdb_n_learned_entries);
+
+   if (unlikely(max_learned && n_learned >= max_learned))
+   return NULL;
+   __set_bit(BR_FDB_DYNAMIC_LEARNED, );
+   }
+
fdb = kmem_cache_alloc(br_fdb_cache, GFP_ATOMIC);
if (!fdb)
return NULL;
@@ -407,6 +425,9 @@ static struct net_bridge_fdb_entry *fdb_create(struct 
net_bridge *br,
return NULL;
}
  
+	if (likely(learned))

+   atomic_inc(>fdb_n_learned_entries);
+
hlist_add_head_rcu(>fdb_node, >fdb_list);
  
  	return fdb;

@@ -893,8 +914,11 @@ void br_fdb_update(struct net_bridge *br, struct 
net_bridge_port *source,
clear_bit(BR_FDB_LOCKED, >flags);
}
  
-			if (unlikely(test_bit(BR_FDB_ADDED_BY_USER, )))

+   if (unlikely(test_bit(BR_FDB_ADDED_BY_USER, ))) {
set_bit(BR_FDB_ADDED_BY_USER, >flags);
+   if (test_and_clear_bit(BR_FDB_DYNAMIC_LEARNED, 
>flags))
+   atomic_dec(>fdb_n_learned_entries);
+   }
if (unlikely(fdb_modified)) {
trace_br_fdb_update(br, source, addr, vid, 
flags);
fdb_notify(br, fdb, RTM_NEWNEIGH, true);
@@ -1071,6 

Re: [Bridge] [PATCH] net: bridge: Fix refcnt issues in dev_ioctl

2023-08-22 Thread Nikolay Aleksandrov

On 8/20/23 01:50, Ziqi Zhao wrote:

On Sat, Aug 19, 2023 at 12:25:15PM +0300, Nikolay Aleksandrov wrote:
Hi Nik,

Thank you so much for reviewing the patch and getting back to me!


IIRC there was no bug, it was a false-positive. The reference is held a bit
longer but then released, so the device is deleted later.



If you reproduced it, is the device later removed or is it really stuck?


I ran the reproducer again without the patch and it seems you are
correct. It was trying to create a very short-lived bridge, then delete
it immediately in the next call. The device in question "wpan4" never
showed up when I polled with `ip link` in the VM, so I'd say it did not
get stuck.


How would it leak a reference, could you elaborate?
The reference is always taken and always released after the call.


This was where I got a bit confused too. The system had a timeout of
140 seconds for the unregister_netdevice check. If the bridge in
question was created and deleted repeatedly, the warning would indeed
not be an actual reference leak. But how could its reference show up
after 140 seconds if the bridge's creation and deletion were all within
a couple of milliseconds?

So I let the system run for a bit longer with the reproducer, and after
~200 seconds, the kernel crashed and complained that some tasks had
been waiting for too long (more than 143 seconds) trying to get hold of
the br_ioctl_mutex. This was also quite strange to me, since on the
surface it definitely looked like a deadlock, but the strict locking
order as I described previously should prevent any deadlocks from
happening.

Anyways, I decided to test switching up the lock order, since both the
refcnt warning and the task stall seemed closely related to the above
mentioned locks. When I ran the reproducer again after the patch, both
the warning and the stall issue went away. So I guess the patch is
still relevant in preventing bugs in some extreme cases -- although the
scenario created by the reproducer would probably never happen in real
usages?



Thank you for testing, but we really need to understand what is going on 
and why the device isn't getting deleted for so long. Currently I don't 
have the time to debug it properly (I'll be able to next week at the 
earliest). We can't apply the patch based only on tests without 
understanding the underlying issue. I'd look into what
the reproducer is doing exactly and also check the system state while 
the deadlock has happened. Also you can list the currently held locks 
(if CONFIG_LOCKDEP is enabled) via magic sysrq + d for example. See 
which process is holding them, what are their priorities and so on.

Try to build some theory of how a deadlock might happen and then go
about proving it. Does the 8021q module have the same problem? It uses
similar code to set its hook.


Please let me know whether you have any thoughts on how the above
issues were triggered, and what other information I could gather to
further demystify this bug. Thank you again for your help!

Best regards,
Ziqi




Re: [Bridge] [PATCH] net: bridge: Fix refcnt issues in dev_ioctl

2023-08-19 Thread Nikolay Aleksandrov

Hi Ziqi,
On 8/19/23 11:10, Ziqi Zhao wrote:

In the bug reported by Syzbot, certain bridge devices would have a
leaked reference created by race conditions in dev_ioctl, specifically,
under SIOCBRADDIF or SIOCBRDELIF operations. The reference leak would


How would it leak a reference, could you elaborate?
The reference is always taken and always released after the call.


be shown in the periodic unregister_netdevice call, which throws a
warning and cause Syzbot to report a crash. Upon inspection of the


If you reproduced it, is the device later removed or is it really stuck?


logic in dev_ioctl, it seems the reference was introduced to ensure
proper access to the bridge device after rtnl_unlock. and the latter
function is necessary to maintain the following lock order in any
bridge related ioctl calls:

1) br_ioctl_mutex => 2) rtnl_lock

Conceptually, though, br_ioctl_mutex could be considered more specific
than rtnl_lock given their usages, hence swapping their order would be
a reasonable proposal. This patch changes all related call sites to
maintain the reversed order of the two locks:

1) rtnl_lock => 2) br_ioctl_mutex

By doing so, the extra reference introduced in dev_ioctl is no longer
needed, and hence the reference leak bug is now resolved.


IIRC there was no bug, it was a false-positive. The reference is held a 
bit longer but then released, so the device is deleted later.
I might be remembering wrong, but I think I briefly looked into this 
when it was reported. If that's not the case I'd be interested to see

a new report/trace because the bug might be somewhere else.



Reported-by: syzbot+881d65229ca4f9ae8...@syzkaller.appspotmail.com
Fixes: ad2f99aedf8f ("net: bridge: move bridge ioctls out of .ndo_do_ioctl")
Signed-off-by: Ziqi Zhao 
---
  net/bridge/br_ioctl.c | 4 
  net/core/dev_ioctl.c  | 8 +---
  net/socket.c  | 2 ++
  3 files changed, 3 insertions(+), 11 deletions(-)



Thanks,
 Nik




Re: [Bridge] [PATCH net-next] bridge: Remove unused declaration br_multicast_set_hash_max()

2023-07-27 Thread Nikolay Aleksandrov

On 7/26/23 17:31, YueHaibing wrote:

Since commit 19e3a9c90c53 ("net: bridge: convert multicast to generic 
rhashtable")
this is not used, so can remove it.

Signed-off-by: YueHaibing 
---
  net/bridge/br_private.h | 1 -
  1 file changed, 1 deletion(-)

diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 51e4ca54b537..a1f4acfa6994 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -974,7 +974,6 @@ int br_multicast_set_vlan_router(struct net_bridge_vlan *v, 
u8 mcast_router);
  int br_multicast_toggle(struct net_bridge *br, unsigned long val,
struct netlink_ext_ack *extack);
  int br_multicast_set_querier(struct net_bridge_mcast *brmctx, unsigned long 
val);
-int br_multicast_set_hash_max(struct net_bridge *br, unsigned long val);
  int br_multicast_set_igmp_version(struct net_bridge_mcast *brmctx,
  unsigned long val);
  #if IS_ENABLED(CONFIG_IPV6)


Acked-by: Nikolay Aleksandrov 



Re: [Bridge] [RFC PATCH net-next 3/4] bridge: Add backup nexthop ID support

2023-07-13 Thread Nikolay Aleksandrov

On 13/07/2023 12:33, Nikolay Aleksandrov wrote:

On 13/07/2023 10:09, Ido Schimmel wrote:

Add a new bridge port attribute that allows attaching a nexthop object
ID to an skb that is redirected to a backup bridge port with VLAN
tunneling enabled.

Specifically, when redirecting a known unicast packet, read the backup
nexthop ID from the bridge port that lost its carrier and set it in the
bridge control block of the skb before forwarding it via the backup
port. Note that reading the ID from the bridge port should not result in
a cache miss as the ID is added next to the 'backup_port' field that was
already accessed. After this change, the 'state' field still stays on
the first cache line, together with other data path related fields such
as 'flags and 'vlgrp':

struct net_bridge_port {
 struct net_bridge *    br;   /* 0 8 */
 struct net_device *    dev;  /* 8 8 */
 netdevice_tracker  dev_tracker;  /*    16 0 */
 struct list_head   list; /*    16    16 */
 long unsigned int  flags;    /*    32 8 */
 struct net_bridge_vlan_group * vlgrp;    /*    40 8 */
 struct net_bridge_port *   backup_port;  /*    48 8 */
 u32    backup_nhid;  /*    56 4 */
 u8 priority; /*    60 1 */
 u8 state;    /*    61 1 */
 u16    port_no;  /*    62 2 */
 /* --- cacheline 1 boundary (64 bytes) --- */
[...]
} __attribute__((__aligned__(8)));

When forwarding an skb via a bridge port that has VLAN tunneling
enabled, check if the backup nexthop ID stored in the bridge control
block is valid (i.e., not zero). If so, instead of attaching the
pre-allocated metadata (that only has the tunnel key set), allocate a
new metadata, set both the tunnel key and the nexthop object ID and
attach it to the skb.

By default, do not dump the new attribute to user space as a value of
zero is an invalid nexthop object ID.

The above is useful for EVPN multihoming. When one of the links
composing an Ethernet Segment (ES) fails, traffic needs to be redirected
towards the host via one of the other ES peers. For example, if a host
is multihomed to three different VTEPs, the backup port of each ES link
needs to be set to the VXLAN device and the backup nexthop ID needs to
point to an FDB nexthop group that includes the IP addresses of the
other two VTEPs. The VXLAN driver will extract the ID from the metadata
of the redirected skb, calculate its flow hash and forward it towards
one of the other VTEPs. If the ID does not exist, or represents an
invalid nexthop object, the VXLAN driver will drop the skb. This
relieves the bridge driver from the need to validate the ID.

Signed-off-by: Ido Schimmel 
---
  include/uapi/linux/if_link.h |  1 +
  net/bridge/br_forward.c  |  1 +
  net/bridge/br_netlink.c  | 12 
  net/bridge/br_private.h  |  3 +++
  net/bridge/br_vlan_tunnel.c  | 15 +++
  net/core/rtnetlink.c |  2 +-
  6 files changed, 33 insertions(+), 1 deletion(-)


[snip]

diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
index 6116eba1bd89..9d7bc8b96b53 100644
--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
@@ -154,6 +154,7 @@ void br_forward(const struct net_bridge_port *to,
  backup_port = rcu_dereference(to->backup_port);
  if (unlikely(!backup_port))
  goto out;
+    BR_INPUT_SKB_CB(skb)->backup_nhid = READ_ONCE(to->backup_nhid);
  to = backup_port;
  }
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 05c5863d2e20..10f0d33d8ccf 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -211,6 +211,7 @@ static inline size_t br_port_info_size(void)
  + nla_total_size(sizeof(u8))    /* IFLA_BRPORT_MRP_IN_OPEN */
  + nla_total_size(sizeof(u32))    /* IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT 
*/
  + nla_total_size(sizeof(u32))    /* IFLA_BRPORT_MCAST_EHT_HOSTS_CNT */
+    + nla_total_size(sizeof(u32))    /* IFLA_BRPORT_BACKUP_NHID */
  + 0;
  }
@@ -319,6 +320,10 @@ static int br_port_fill_attrs(struct sk_buff *skb,
  backup_p->dev->ifindex);
  rcu_read_unlock();
+    if (p->backup_nhid &&
+    nla_put_u32(skb, IFLA_BRPORT_BACKUP_NHID, p->backup_nhid))
+    return -EMSGSIZE;
+


READ_ONCE(), see the comment above backup_port :)



Actually the updates are done with port lock as well.
This should be fine. Patch looks good.





Re: [Bridge] [RFC PATCH net-next 4/4] selftests: net: Add bridge backup port and backup nexthop ID test

2023-07-13 Thread Nikolay Aleksandrov

On 13/07/2023 10:09, Ido Schimmel wrote:

Add test cases for bridge backup port and backup nexthop ID, testing
both good and bad flows.

Example truncated output:

  # ./test_bridge_backup_port.sh
  [...]
  Tests passed:  83
  Tests failed:   0

Signed-off-by: Ido Schimmel 
---
  tools/testing/selftests/net/Makefile  |   1 +
  .../selftests/net/test_bridge_backup_port.sh  | 759 ++
  2 files changed, 760 insertions(+)
  create mode 100755 tools/testing/selftests/net/test_bridge_backup_port.sh



Well explained
Acked-by: Nikolay Aleksandrov 



Re: [Bridge] [RFC PATCH net-next 3/4] bridge: Add backup nexthop ID support

2023-07-13 Thread Nikolay Aleksandrov

On 13/07/2023 10:09, Ido Schimmel wrote:

Add a new bridge port attribute that allows attaching a nexthop object
ID to an skb that is redirected to a backup bridge port with VLAN
tunneling enabled.

Specifically, when redirecting a known unicast packet, read the backup
nexthop ID from the bridge port that lost its carrier and set it in the
bridge control block of the skb before forwarding it via the backup
port. Note that reading the ID from the bridge port should not result in
a cache miss as the ID is added next to the 'backup_port' field that was
already accessed. After this change, the 'state' field still stays on
the first cache line, together with other data path related fields such
as 'flags and 'vlgrp':

struct net_bridge_port {
 struct net_bridge *br;   /* 0 8 */
 struct net_device *dev;  /* 8 8 */
 netdevice_tracker  dev_tracker;  /*16 0 */
 struct list_head   list; /*1616 */
 long unsigned int  flags;/*32 8 */
 struct net_bridge_vlan_group * vlgrp;/*40 8 */
 struct net_bridge_port *   backup_port;  /*48 8 */
 u32backup_nhid;  /*56 4 */
 u8 priority; /*60 1 */
 u8 state;/*61 1 */
 u16port_no;  /*62 2 */
 /* --- cacheline 1 boundary (64 bytes) --- */
[...]
} __attribute__((__aligned__(8)));

When forwarding an skb via a bridge port that has VLAN tunneling
enabled, check if the backup nexthop ID stored in the bridge control
block is valid (i.e., not zero). If so, instead of attaching the
pre-allocated metadata (that only has the tunnel key set), allocate a
new metadata, set both the tunnel key and the nexthop object ID and
attach it to the skb.

By default, do not dump the new attribute to user space as a value of
zero is an invalid nexthop object ID.

The above is useful for EVPN multihoming. When one of the links
composing an Ethernet Segment (ES) fails, traffic needs to be redirected
towards the host via one of the other ES peers. For example, if a host
is multihomed to three different VTEPs, the backup port of each ES link
needs to be set to the VXLAN device and the backup nexthop ID needs to
point to an FDB nexthop group that includes the IP addresses of the
other two VTEPs. The VXLAN driver will extract the ID from the metadata
of the redirected skb, calculate its flow hash and forward it towards
one of the other VTEPs. If the ID does not exist, or represents an
invalid nexthop object, the VXLAN driver will drop the skb. This
relieves the bridge driver from the need to validate the ID.

Signed-off-by: Ido Schimmel 
---
  include/uapi/linux/if_link.h |  1 +
  net/bridge/br_forward.c  |  1 +
  net/bridge/br_netlink.c  | 12 
  net/bridge/br_private.h  |  3 +++
  net/bridge/br_vlan_tunnel.c  | 15 +++
  net/core/rtnetlink.c |  2 +-
  6 files changed, 33 insertions(+), 1 deletion(-)



One comment below, with that fixed you can add
Acked-by: Nikolay Aleksandrov 


diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 0f6a0fe09bdb..ce3117df9cec 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -570,6 +570,7 @@ enum {
IFLA_BRPORT_MCAST_N_GROUPS,
IFLA_BRPORT_MCAST_MAX_GROUPS,
IFLA_BRPORT_NEIGH_VLAN_SUPPRESS,
+   IFLA_BRPORT_BACKUP_NHID,
__IFLA_BRPORT_MAX
  };
  #define IFLA_BRPORT_MAX (__IFLA_BRPORT_MAX - 1)
diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
index 6116eba1bd89..9d7bc8b96b53 100644
--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
@@ -154,6 +154,7 @@ void br_forward(const struct net_bridge_port *to,
backup_port = rcu_dereference(to->backup_port);
if (unlikely(!backup_port))
goto out;
+   BR_INPUT_SKB_CB(skb)->backup_nhid = READ_ONCE(to->backup_nhid);
to = backup_port;
}
  
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c

index 05c5863d2e20..10f0d33d8ccf 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -211,6 +211,7 @@ static inline size_t br_port_info_size(void)
+ nla_total_size(sizeof(u8))/* IFLA_BRPORT_MRP_IN_OPEN */
+ nla_total_size(sizeof(u32))   /* 
IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT */
+ nla_total_size(sizeof(u32))   /* 
IFLA_BRPORT_MCAST_EHT_HOSTS_CNT */
+   + nla_total_size(sizeof(u32))   /* IFLA_BRPORT_BACKUP_NHID */
+ 0;
  }
  
@@ -319,6 +320,10 @@ static int br_port_fill_attrs(struct sk_buff *skb,

backup_p->

Re: [Bridge] [RFC PATCH net-next 2/4] vxlan: Add support for nexthop ID metadata

2023-07-13 Thread Nikolay Aleksandrov

On 13/07/2023 10:09, Ido Schimmel wrote:

VXLAN FDB entries can point to FDB nexthop objects. Each such object
includes the IP address(es) of remote VTEP(s) via which the target host
is accessible. Example:

  # ip nexthop add id 1 via 192.0.2.1 fdb
  # ip nexthop add id 2 via 192.0.2.17 fdb
  # ip nexthop add id 1000 group 1/2 fdb
  # bridge fdb add 00:11:22:33:44:55 dev vx0 self static nhid 1000 src_vni 10020

This is useful for EVPN multihoming where a single host can be connected
to multiple VTEPs. The source VTEP will calculate the flow hash of the
skb and forward it towards the IP address of one of the VTEPs member in
the nexthop group.

There are cases where an external entity (e.g., the bridge driver) can
provide not only the tunnel ID (i.e., VNI) of the skb, but also the ID
of the nexthop object via which the skb should be forwarded.

Therefore, in order to support such cases, when the VXLAN device is in
external / collect metadata mode and the tunnel info attached to the skb
is of bridge type, extract the nexthop ID from the tunnel info. If the
ID is valid (i.e., non-zero), forward the skb via the nexthop object
associated with the ID, as if the skb hit an FDB entry associated with
this ID.

Signed-off-by: Ido Schimmel 
---
  drivers/net/vxlan/vxlan_core.c | 44 ++
  1 file changed, 44 insertions(+)



LGTM
Acked-by: Nikolay Aleksandrov 



Re: [Bridge] [RFC PATCH net-next 1/4] ip_tunnels: Add nexthop ID field to ip_tunnel_key

2023-07-13 Thread Nikolay Aleksandrov

On 13/07/2023 10:09, Ido Schimmel wrote:

Extend the ip_tunnel_key structure with a field indicating the ID of the
nexthop object via which the skb should be routed.

The field is going to be populated in subsequent patches by the bridge
driver in order to indicate to the VXLAN driver which FDB nexthop object
to use in order to reach the target host.

Signed-off-by: Ido Schimmel 
---
  include/net/ip_tunnels.h | 1 +
  1 file changed, 1 insertion(+)

diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index ed4b6ad3fcac..e8750b4ef7e1 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -52,6 +52,7 @@ struct ip_tunnel_key {
u8  tos;/* TOS for IPv4, TC for IPv6 */
u8  ttl;/* TTL for IPv4, HL for IPv6 */
__be32  label;  /* Flow Label for IPv6 */
+   u32 nhid;
__be16  tp_src;
__be16  tp_dst;
__u8flow_flags;


Reviewed-by: Nikolay Aleksandrov 



Re: [Bridge] [PATCH v2 net] bridge: Add extack warning when enabling STP in netns.

2023-07-12 Thread Nikolay Aleksandrov
On 12/07/2023 18:44, Kuniyuki Iwashima wrote:
> When we create an L2 loop on a bridge in netns, we will see packets storm
> even if STP is enabled.
> 
>   # unshare -n
>   # ip link add br0 type bridge
>   # ip link add veth0 type veth peer name veth1
>   # ip link set veth0 master br0 up
>   # ip link set veth1 master br0 up
>   # ip link set br0 type bridge stp_state 1
>   # ip link set br0 up
>   # sleep 30
>   # ip -s link show br0
>   2: br0:  mtu 1500 qdisc noqueue state UP 
> mode DEFAULT group default qlen 1000
>   link/ether b6:61:98:1c:1c:b5 brd ff:ff:ff:ff:ff:ff
>   RX: bytes  packets  errors  dropped missed  mcast
>   956553768  12861249 0   0   0   12861249  <-. Keep
>   TX: bytes  packets  errors  dropped carrier collsns |  increasing
>   1027834119510   0   0   0 <-'   rapidly
> 
> This is because llc_rcv() drops all packets in non-root netns and BPDU
> is dropped.
> 
> Let's add extack warning when enabling STP in netns.
> 
>   # unshare -n
>   # ip link add br0 type bridge
>   # ip link set br0 type bridge stp_state 1
>   Warning: bridge: STP does not work in non-root netns.
> 
> Note this commit will be reverted later when we namespacify the whole LLC
> infra.
> 
> Fixes: e730c15519d0 ("[NET]: Make packet reception network namespace safe")
> Suggested-by: Harry Coin 
> Link: 
> https://lore.kernel.org/netdev/0f531295-e289-022d-5add-5ceffa0df...@quietfountain.com/
> Suggested-by: Ido Schimmel 
> Signed-off-by: Kuniyuki Iwashima 
> ---
> v2:
>   - Just add extack instead of returning error (Ido Schimmel)
> 
> v1: https://lore.kernel.org/netdev/20230711235415.92166-1-kun...@amazon.com/
> ---
>  net/bridge/br_stp_if.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c
> index 75204d36d7f9..b65962682771 100644
> --- a/net/bridge/br_stp_if.c
> +++ b/net/bridge/br_stp_if.c
> @@ -201,6 +201,9 @@ int br_stp_set_enabled(struct net_bridge *br, unsigned 
> long val,
>  {
>   ASSERT_RTNL();
>  
> + if (!net_eq(dev_net(br->dev), _net))
> + NL_SET_ERR_MSG_MOD(extack, "STP does not work in non-root 
> netns");
> +
>   if (br_mrp_enabled(br)) {
>   NL_SET_ERR_MSG_MOD(extack,
>  "STP can't be enabled if MRP is already 
> enabled");

Thanks,
Acked-by: Nikolay Aleksandrov 


Re: [Bridge] [PATCH v1 net] bridge: Return an error when enabling STP in netns.

2023-07-12 Thread Nikolay Aleksandrov
On 12/07/2023 17:48, Ido Schimmel wrote:
> On Tue, Jul 11, 2023 at 04:54:15PM -0700, Kuniyuki Iwashima wrote:
>> When we create an L2 loop on a bridge in netns, we will see packets storm
>> even if STP is enabled.
>>
>>   # unshare -n
>>   # ip link add br0 type bridge
>>   # ip link add veth0 type veth peer name veth1
>>   # ip link set veth0 master br0 up
>>   # ip link set veth1 master br0 up
>>   # ip link set br0 type bridge stp_state 1
>>   # ip link set br0 up
>>   # sleep 30
>>   # ip -s link show br0
>>   2: br0:  mtu 1500 qdisc noqueue state UP 
>> mode DEFAULT group default qlen 1000
>>   link/ether b6:61:98:1c:1c:b5 brd ff:ff:ff:ff:ff:ff
>>   RX: bytes  packets  errors  dropped missed  mcast
>>   956553768  12861249 0   0   0   12861249  <-. Keep
>>   TX: bytes  packets  errors  dropped carrier collsns |  increasing
>>   1027834119510   0   0   0 <-'   rapidly
>>
>> This is because llc_rcv() drops all packets in non-root netns and BPDU
>> is dropped.
>>
>> Let's show an error when enabling STP in netns.
>>
>>   # unshare -n
>>   # ip link add br0 type bridge
>>   # ip link set br0 type bridge stp_state 1
>>   Error: bridge: STP can't be enabled in non-root netns.
>>
>> Note this commit will be reverted later when we namespacify the whole LLC
>> infra.
>>
>> Fixes: e730c15519d0 ("[NET]: Make packet reception network namespace safe")
>> Suggested-by: Harry Coin 
> 
> I'm not sure that's accurate. I read his response in the link below and
> he says "I'd rather be warned than blocked" and "But better warned and
> awaiting a fix than blocked", which I agree with. The patch has the
> potential to cause a lot of regressions, but without actually fixing the
> problem.
> 
> How about simply removing the error [1]? Since iproute2 commit
> 844c37b42373 ("libnetlink: Handle extack messages for non-error case"),
> it can print extack warnings and not only errors. With the diff below:
> 
>  # unshare -n 
>  # ip link add name br0 type bridge
>  # ip link set dev br0 type bridge stp_state 1
>  Warning: bridge: STP can't be enabled in non-root netns.
>  # echo $?
>  0
> 
> [1]
> diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c
> index a807996ac56b..b5143de37938 100644
> --- a/net/bridge/br_stp_if.c
> +++ b/net/bridge/br_stp_if.c
> @@ -201,10 +201,8 @@ int br_stp_set_enabled(struct net_bridge *br, unsigned 
> long val,
>  {
> ASSERT_RTNL();
>  
> -   if (!net_eq(dev_net(br->dev), _net)) {
> +   if (!net_eq(dev_net(br->dev), _net))
> NL_SET_ERR_MSG_MOD(extack, "STP can't be enabled in non-root 
> netns");
> -   return -EINVAL;
> -   }
>  
> if (br_mrp_enabled(br)) {
> NL_SET_ERR_MSG_MOD(extack,
> 

I'd prefer this approach to changing user-visible behaviour and potential 
regressions.
Just change the warning message.

Thanks,
 Nik



Re: [Bridge] [PATCH v1 net] bridge: Return an error when enabling STP in netns.

2023-07-12 Thread Nikolay Aleksandrov
On 12/07/2023 02:54, Kuniyuki Iwashima wrote:
> When we create an L2 loop on a bridge in netns, we will see packets storm
> even if STP is enabled.
> 
>   # unshare -n
>   # ip link add br0 type bridge
>   # ip link add veth0 type veth peer name veth1
>   # ip link set veth0 master br0 up
>   # ip link set veth1 master br0 up
>   # ip link set br0 type bridge stp_state 1
>   # ip link set br0 up
>   # sleep 30
>   # ip -s link show br0
>   2: br0:  mtu 1500 qdisc noqueue state UP 
> mode DEFAULT group default qlen 1000
>   link/ether b6:61:98:1c:1c:b5 brd ff:ff:ff:ff:ff:ff
>   RX: bytes  packets  errors  dropped missed  mcast
>   956553768  12861249 0   0   0   12861249  <-. Keep
>   TX: bytes  packets  errors  dropped carrier collsns |  increasing
>   1027834119510   0   0   0 <-'   rapidly
> 
> This is because llc_rcv() drops all packets in non-root netns and BPDU
> is dropped.
> 
> Let's show an error when enabling STP in netns.
> 
>   # unshare -n
>   # ip link add br0 type bridge
>   # ip link set br0 type bridge stp_state 1
>   Error: bridge: STP can't be enabled in non-root netns.
> 
> Note this commit will be reverted later when we namespacify the whole LLC
> infra.
> 
> Fixes: e730c15519d0 ("[NET]: Make packet reception network namespace safe")
> Suggested-by: Harry Coin 
> Link: 
> https://lore.kernel.org/netdev/0f531295-e289-022d-5add-5ceffa0df...@quietfountain.com/
> Signed-off-by: Kuniyuki Iwashima 
> ---
>  net/bridge/br_stp_if.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c
> index 75204d36d7f9..a807996ac56b 100644
> --- a/net/bridge/br_stp_if.c
> +++ b/net/bridge/br_stp_if.c
> @@ -201,6 +201,11 @@ int br_stp_set_enabled(struct net_bridge *br, unsigned 
> long val,
>  {
>   ASSERT_RTNL();
>  
> + if (!net_eq(dev_net(br->dev), _net)) {
> + NL_SET_ERR_MSG_MOD(extack, "STP can't be enabled in non-root 
> netns");
> + return -EINVAL;
> + }
> +
>   if (br_mrp_enabled(br)) {
>   NL_SET_ERR_MSG_MOD(extack,
>  "STP can't be enabled if MRP is already 
> enabled");

Oh well, I guess this should do for now. Hopefully we don't break scripts that 
rely
on configuring the bridge without any errors. Thanks for the patch,

Acked-by: Nikolay Aleksandrov 





Re: [Bridge] [PATCH net-next v2 2/3] bridge: Add a limit on learned FDB entries

2023-06-22 Thread Nikolay Aleksandrov
On 22/06/2023 15:27, Nikolay Aleksandrov wrote:
> On 20/06/2023 16:35, Johannes Nixdorf wrote:
>> On Tue, Jun 20, 2023 at 09:55:31AM +0300, Nikolay Aleksandrov wrote:
>>> On 6/19/23 10:14, Johannes Nixdorf wrote:
>>>> +/* Set a FDB flag that implies the entry was not learned, and account
>>>> + * for changes in the learned status.
>>>> + */
>>>> +static void __fdb_set_flag_not_learned(struct net_bridge *br,
>>>> + struct net_bridge_fdb_entry *fdb,
>>>> + long nr)
>>>> +{
>>>> +  WARN_ON_ONCE(!(BIT(nr) & BR_FDB_NOT_LEARNED_MASK));
>>>
>>> Please use *_bit
>>
>> Can you tell me which *_bit helper you had in mind? The shortest option I 
>> could
>> come up with the ones I found seemed needlessly verbose and wasteful:
>>
>>   static const unsigned long br_fdb_not_learned_mask = 
>> BR_FDB_NOT_LEARNED_MASK;
>>   ...
>>   WARN_ON_ONCE(test_bit(nr, _fdb_not_learned_mask));
>>
>>>> +
>>>> +  /* learned before, but we set a flag that implies it's manually added */
>>>> +  if (!(fdb->flags & BR_FDB_NOT_LEARNED_MASK))
>>>
>>> Please use *_bit
>>
>> This will be fixed by the redesign to get rid of my use of hash_lock
>> (proposed later in this mail), as I'll only have to test one bit and can
>> use test_and_clear_bit then.
>>
>>>> +  br->fdb_cur_learned_entries--;
>>>> +  set_bit(nr, >flags);
>>>> +}
>>>
>>> Having a helper that conditionally decrements only is counterintuitive and
>>> people can get confused. Either add 2 helpers for inc/dec and use
>>> them where appropriate or don't use helpers at all.
>>
>> The *_set_bit helper can only cause the count to drop, as there
>> is currently no flag that could turn a manually added entry back into
>> a dynamically learned one.
>>
>> The analogous helper that increments the value would be *_clear_bit,
>> which I did not add because it has no users.
>>
>>>> +  spin_unlock_bh(>hash_lock);
>>>> +}
>>>> +
>>>>   /* When a static FDB entry is deleted, the HW address from that entry is
>>>>* also removed from the bridge private HW address list and updates all
>>>>* the ports with needed information.
>>>> @@ -321,6 +353,8 @@ static void fdb_del_hw_addr(struct net_bridge *br, 
>>>> const unsigned char *addr)
>>>>   static void fdb_delete(struct net_bridge *br, struct 
>>>> net_bridge_fdb_entry *f,
>>>>   bool swdev_notify)
>>>>   {
>>>> +  bool learned = !(f->flags & BR_FDB_NOT_LEARNED_MASK);
>>>
>>> *_bit
>>
>> I do not know a *_bit helper that would help me test the intersection
>> of multiple bits on both sides. Do you have any in mind?
>>
>>>> +
>>>>return fdb;
>>>>   }
>>>> @@ -894,7 +940,7 @@ void br_fdb_update(struct net_bridge *br, struct 
>>>> net_bridge_port *source,
>>>>}
>>>>if (unlikely(test_bit(BR_FDB_ADDED_BY_USER, )))
>>>> -  set_bit(BR_FDB_ADDED_BY_USER, >flags);
>>>> +  fdb_set_flag_not_learned(br, fdb, 
>>>> BR_FDB_ADDED_BY_USER);
>>>
>>> Unacceptable to take hash_lock and block all learning here, eventual
>>> consistency is ok or some other method that is much lighter and doesn't
>>> block all learning or requires a lock.
>>
>> At the time of writing v2, this seemed difficult because we want to test
>> multiple bits and increment a counter, but remembering that clear_bit
>> is never called for the bits I care about I came up with the following
>> approach:
>>
>>   a) Add a new flag BR_FDB_DYNAMIC_LEARNED, which is set to 1 iff
>>  BR_FDB_ADDED_BY_USER or BR_FDB_LOCAL are set in br_create.
>>  Every time BR_FDB_ADDED_BY_USER or BR_FDB_LOCAL is set, also clear
>>  BR_FDB_DYNAMIC_LEARNED, and decrement the count if it was 1 before.
>>  This solves the problem of testing two bits at once, and would not
>>  have been possible if we had a code path that could clear both bits,
>>  as it is not as easy to decide when to set BR_FDB_DYNAMIC_LEARNED
>>  again in that case.
> 
> I think you can try without adding any new flags, the places that add dynamic
> entries are known for the inc part of th

Re: [Bridge] [PATCH net-next v2 2/3] bridge: Add a limit on learned FDB entries

2023-06-22 Thread Nikolay Aleksandrov
On 20/06/2023 16:35, Johannes Nixdorf wrote:
> On Tue, Jun 20, 2023 at 09:55:31AM +0300, Nikolay Aleksandrov wrote:
>> On 6/19/23 10:14, Johannes Nixdorf wrote:
>>> +/* Set a FDB flag that implies the entry was not learned, and account
>>> + * for changes in the learned status.
>>> + */
>>> +static void __fdb_set_flag_not_learned(struct net_bridge *br,
>>> +  struct net_bridge_fdb_entry *fdb,
>>> +  long nr)
>>> +{
>>> +   WARN_ON_ONCE(!(BIT(nr) & BR_FDB_NOT_LEARNED_MASK));
>>
>> Please use *_bit
> 
> Can you tell me which *_bit helper you had in mind? The shortest option I 
> could
> come up with the ones I found seemed needlessly verbose and wasteful:
> 
>   static const unsigned long br_fdb_not_learned_mask = 
> BR_FDB_NOT_LEARNED_MASK;
>   ...
>   WARN_ON_ONCE(test_bit(nr, _fdb_not_learned_mask));
> 
>>> +
>>> +   /* learned before, but we set a flag that implies it's manually added */
>>> +   if (!(fdb->flags & BR_FDB_NOT_LEARNED_MASK))
>>
>> Please use *_bit
> 
> This will be fixed by the redesign to get rid of my use of hash_lock
> (proposed later in this mail), as I'll only have to test one bit and can
> use test_and_clear_bit then.
> 
>>> +   br->fdb_cur_learned_entries--;
>>> +   set_bit(nr, >flags);
>>> +}
>>
>> Having a helper that conditionally decrements only is counterintuitive and
>> people can get confused. Either add 2 helpers for inc/dec and use
>> them where appropriate or don't use helpers at all.
> 
> The *_set_bit helper can only cause the count to drop, as there
> is currently no flag that could turn a manually added entry back into
> a dynamically learned one.
> 
> The analogous helper that increments the value would be *_clear_bit,
> which I did not add because it has no users.
> 
>>> +   spin_unlock_bh(>hash_lock);
>>> +}
>>> +
>>>   /* When a static FDB entry is deleted, the HW address from that entry is
>>>* also removed from the bridge private HW address list and updates all
>>>* the ports with needed information.
>>> @@ -321,6 +353,8 @@ static void fdb_del_hw_addr(struct net_bridge *br, 
>>> const unsigned char *addr)
>>>   static void fdb_delete(struct net_bridge *br, struct net_bridge_fdb_entry 
>>> *f,
>>>bool swdev_notify)
>>>   {
>>> +   bool learned = !(f->flags & BR_FDB_NOT_LEARNED_MASK);
>>
>> *_bit
> 
> I do not know a *_bit helper that would help me test the intersection
> of multiple bits on both sides. Do you have any in mind?
> 
>>> +
>>> return fdb;
>>>   }
>>> @@ -894,7 +940,7 @@ void br_fdb_update(struct net_bridge *br, struct 
>>> net_bridge_port *source,
>>> }
>>> if (unlikely(test_bit(BR_FDB_ADDED_BY_USER, )))
>>> -   set_bit(BR_FDB_ADDED_BY_USER, >flags);
>>> +   fdb_set_flag_not_learned(br, fdb, 
>>> BR_FDB_ADDED_BY_USER);
>>
>> Unacceptable to take hash_lock and block all learning here, eventual
>> consistency is ok or some other method that is much lighter and doesn't
>> block all learning or requires a lock.
> 
> At the time of writing v2, this seemed difficult because we want to test
> multiple bits and increment a counter, but remembering that clear_bit
> is never called for the bits I care about I came up with the following
> approach:
> 
>   a) Add a new flag BR_FDB_DYNAMIC_LEARNED, which is set to 1 iff
>  BR_FDB_ADDED_BY_USER or BR_FDB_LOCAL are set in br_create.
>  Every time BR_FDB_ADDED_BY_USER or BR_FDB_LOCAL is set, also clear
>  BR_FDB_DYNAMIC_LEARNED, and decrement the count if it was 1 before.
>  This solves the problem of testing two bits at once, and would not
>  have been possible if we had a code path that could clear both bits,
>  as it is not as easy to decide when to set BR_FDB_DYNAMIC_LEARNED
>  again in that case.

I think you can try without adding any new flags, the places that add dynamic
entries are known for the inc part of the problem, and an entry can become
local/added_by_user again only through well known paths as well. You may be 
able to
infer whether to inc/dec and make it work with careful fn argument passing.
Could you please look into that way? I'd prefer that we don't add new flags as
there are already so many.

>   b) Replace the current count with an atomic_t.
> 

Sounds good.

> I'll change it this way f

Re: [Bridge] [PATCH net-next v2 3/3] net: bridge: Add a configurable default FDB learning limit

2023-06-20 Thread Nikolay Aleksandrov

On 6/19/23 10:14, Johannes Nixdorf wrote:

This adds a Kconfig option to configure a default FDB learning limit
system wide, so a distributor building a special purpose kernel can
limit all created bridges by default.

The limit is only a soft default setting and overridable per bridge
using netlink.

Signed-off-by: Johannes Nixdorf 

---

Changes since v1:
  - Added a default limit in Kconfig. (deemed acceptable in review
comments)

  net/bridge/Kconfig | 13 +
  net/bridge/br_device.c |  2 ++
  2 files changed, 15 insertions(+)

diff --git a/net/bridge/Kconfig b/net/bridge/Kconfig
index 3c8ded7d3e84..c0d9c08088c4 100644
--- a/net/bridge/Kconfig
+++ b/net/bridge/Kconfig
@@ -84,3 +84,16 @@ config BRIDGE_CFM
  Say N to exclude this support and reduce the binary size.
  
  	  If unsure, say N.

+
+config BRIDGE_DEFAULT_FDB_MAX_LEARNED
+   int "Default FDB learning limit"
+   default 0
+   depends on BRIDGE
+   help
+ Sets a default limit on the number of learned FDB entries on
+ new bridges. This limit can be overwritten via netlink on a
+ per bridge basis.
+
+ The default of 0 disables the limit.
+
+ If unsure, say 0.
diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index 8eca8a5c80c6..93f081ce8195 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -530,6 +530,8 @@ void br_dev_setup(struct net_device *dev)
br->bridge_ageing_time = br->ageing_time = BR_DEFAULT_AGEING_TIME;
dev->max_mtu = ETH_MAX_MTU;
  
+	br->fdb_max_learned_entries = CONFIG_BRIDGE_DEFAULT_FDB_MAX_LEARNED;

+
br_netfilter_rtable_init(br);
br_stp_timer_init(br);
br_multicast_init(br);


IMO this is pointless, noone will set the kconfig option except very 
specific users. I prefer if we leave it to the distribution to set a 
maximum on bridge creation, i.e. make it a distro policy.




Re: [Bridge] [PATCH net-next v2 2/3] bridge: Add a limit on learned FDB entries

2023-06-20 Thread Nikolay Aleksandrov

On 6/19/23 10:14, Johannes Nixdorf wrote:

A malicious actor behind one bridge port may spam the kernel with packets
with a random source MAC address, each of which will create an FDB entry,
each of which is a dynamic allocation in the kernel.

There are roughly 2^48 different MAC addresses, further limited by the
rhashtable they are stored in to 2^31. Each entry is of the type struct
net_bridge_fdb_entry, which is currently 128 bytes big. This means the
maximum amount of memory allocated for FDB entries is 2^31 * 128B =
256GiB, which is too much for most computers.

Mitigate this by adding a bridge netlink setting
IFLA_BR_FDB_MAX_LEARNED_ENTRIES, which, if nonzero, limits the amount
of learned entries to a user specified maximum.

For backwards compatibility the default setting of 0 disables the limit.

User-added entries by netlink or from bridge or bridge port addresses
are never blocked and do not count towards that limit.

All changes to fdb_n_entries are under br->hash_lock, which means we do
not need additional locking. The call paths are (✓ denotes that
br->hash_lock is taken around the next call):

  - fdb_delete <-+- fdb_delete_local <-+- br_fdb_changeaddr ✓
 | +- br_fdb_change_mac_address ✓
 | +- br_fdb_delete_by_port ✓
 +- br_fdb_find_delete_local ✓
 +- fdb_add_local <-+- br_fdb_changeaddr ✓
 |  +- br_fdb_change_mac_address ✓
 |  +- br_fdb_add_local ✓
 +- br_fdb_cleanup ✓
 +- br_fdb_flush ✓
 +- br_fdb_delete_by_port ✓
 +- fdb_delete_by_addr_and_port <--- __br_fdb_delete ✓
 +- br_fdb_external_learn_del ✓
  - fdb_create <-+- fdb_add_local <-+- br_fdb_changeaddr ✓
 |  +- br_fdb_change_mac_address ✓
 |  +- br_fdb_add_local ✓
 +- br_fdb_update ✓
 +- fdb_add_entry <--- __br_fdb_add ✓
 +- br_fdb_external_learn_add ✓

The flags that imply an entry does not come from learning
(BR_FDB_NOT_LEARNED_MASK) are now only set or cleared under br->hash_lock
as well, and when the boolean value of (fdb->flags &
BR_FDB_NOT_LEARNED_MASK) changes the accounting is updated.

This introduces one additional locked update in br_fdb_update if
BR_FDB_ADDED_BY_USER was set. This is only the case when creating a new
entry via netlink, and never in the packet handling fast path.

Signed-off-by: Johannes Nixdorf 

---

Changes since v1:
  - Do not initialize fdb_*_entries to 0. (from review)
  - Do not skip decrementing on 0. (from review)
  - Moved the counters to a conditional hole in struct net_bridge to
avoid growing the struct. (from review, it still grows the struct as
there are 2 32-bit values)
  - Add IFLA_BR_FDB_CUR_LEARNED_ENTRIES (from review)
  - Fix br_get_size()
  - Only limit learned entries, rename to
*_(CUR|MAX)_LEARNED_ENTRIES. (from review)

Obsolete v1 review comments:
  - Return better errors to users: Due to limiting the limit to
automatically created entries, netlink fdb add requests and changing
bridge ports are never rejected, so they do not yet need a more
friendly error returned.

  include/uapi/linux/if_link.h |  2 ++
  net/bridge/br_fdb.c  | 67 +---
  net/bridge/br_netlink.c  | 13 ++-
  net/bridge/br_private.h  |  6 
  4 files changed, 83 insertions(+), 5 deletions(-)

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 4ac1000b0ef2..165b9014379b 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -510,6 +510,8 @@ enum {
IFLA_BR_VLAN_STATS_PER_PORT,
IFLA_BR_MULTI_BOOLOPT,
IFLA_BR_MCAST_QUERIER_STATE,
+   IFLA_BR_FDB_CUR_LEARNED_ENTRIES,
+   IFLA_BR_FDB_MAX_LEARNED_ENTRIES,
__IFLA_BR_MAX,
  };
  
diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c

index ac1dc8723b9c..bc61d1fd5fcf 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -301,6 +301,38 @@ static void fdb_add_hw_addr(struct net_bridge *br, const 
unsigned char *addr)
}
  }
  
+/* Set a FDB flag that implies the entry was not learned, and account

+ * for changes in the learned status.
+ */
+static void __fdb_set_flag_not_learned(struct net_bridge *br,
+  struct net_bridge_fdb_entry *fdb,
+  long nr)
+{
+   WARN_ON_ONCE(!(BIT(nr) & BR_FDB_NOT_LEARNED_MASK));


Please use *_bit


+
+   /* learned before, but we set a flag that implies it's manually added */
+   if (!(fdb->flags & BR_FDB_NOT_LEARNED_MASK))


Please use *_bit


+   br->fdb_cur_learned_entries--;
+   set_bit(nr, >flags);
+}


Having a helper that conditionally decrements only is counterintuitive 
and people can get confused. Either 

Re: [Bridge] [PATCH iproute2 3/7] bridge: make print_vlan_info static

2023-06-01 Thread Nikolay Aleksandrov
On 01/06/2023 20:21, Stephen Hemminger wrote:
> Function defined and used in only one file.
> 
> Signed-off-by: Stephen Hemminger 
> ---
>  bridge/br_common.h | 1 -
>  bridge/vlan.c  | 3 ++-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/bridge/br_common.h b/bridge/br_common.h
> index 1bdee65844c1..704e76b0acb2 100644
> --- a/bridge/br_common.h
> +++ b/bridge/br_common.h
> @@ -6,7 +6,6 @@
>  #define MDB_RTR_RTA(r) \
>   ((struct rtattr *)(((char *)(r)) + RTA_ALIGN(sizeof(__u32
>  
> -void print_vlan_info(struct rtattr *tb, int ifindex);
>  int print_linkinfo(struct nlmsghdr *n, void *arg);
>  int print_mdb_mon(struct nlmsghdr *n, void *arg);
>  int print_fdb(struct nlmsghdr *n, void *arg);
> diff --git a/bridge/vlan.c b/bridge/vlan.c
> index 5b304ea94224..dfc62f83a5df 100644
> --- a/bridge/vlan.c
> +++ b/bridge/vlan.c
> @@ -18,6 +18,7 @@
>  
>  static unsigned int filter_index, filter_vlan;
>  static int vlan_rtm_cur_ifidx = -1;
> +static void print_vlan_info(struct rtattr *tb, int ifindex);
>  
>  enum vlan_show_subject {
>   VLAN_SHOW_VLAN,
> @@ -1309,7 +1310,7 @@ static int vlan_global_show(int argc, char **argv)
>   return 0;
>  }
>  
> -void print_vlan_info(struct rtattr *tb, int ifindex)
> +static void print_vlan_info(struct rtattr *tb, int ifindex)
>  {
>   struct rtattr *i, *list = tb;
>   int rem = RTA_PAYLOAD(list);

Thanks,
Acked-by: Nikolay Aleksandrov 



Re: [Bridge] [PATCH net-next v2 7/8] mlxsw: spectrum_flower: Add ability to match on layer 2 miss

2023-05-29 Thread Nikolay Aleksandrov
On 29/05/2023 14:48, Ido Schimmel wrote:
> Add the 'fdb_miss' key element to supported key blocks and make use of
> it to match on layer 2 miss.
> 
> The key is only supported on Spectrum-{2,3,4}. An error is returned for
> Spectrum-1 since the key element is not present in any of its key
> blocks.
> 
> Signed-off-by: Ido Schimmel 
> ---
> 
> Notes:
> v2:
> * Use 'fdb_miss' key element instead of 'dmac_type'.
> 
>  drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_keys.c| 1 +
>  drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_keys.h| 3 ++-
>  .../net/ethernet/mellanox/mlxsw/spectrum_acl_flex_keys.c| 2 ++
>  drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c   | 6 ++
>  4 files changed, 7 insertions(+), 5 deletions(-)
> 

Reviewed-by: Nikolay Aleksandrov 




Re: [Bridge] [PATCH net-next v2 6/8] mlxsw: spectrum_flower: Do not force matching on iif

2023-05-29 Thread Nikolay Aleksandrov
On 29/05/2023 14:48, Ido Schimmel wrote:
> Currently, mlxsw only supports the 'ingress_ifindex' field in the
> 'FLOW_DISSECTOR_KEY_META' key, but subsequent patches are going to add
> support for the 'l2_miss' field as well. It is valid to only match on
> 'l2_miss' without 'ingress_ifindex', so do not force matching on it.
> 
> Signed-off-by: Ido Schimmel 
> ---
> 
> Notes:
> v2:
> * New patch.
> 
>  drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c 
> b/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c
> index 2b0bae847eb9..9c62c12e410b 100644
> --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c
> +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c
> @@ -290,6 +290,9 @@ mlxsw_sp_flower_parse_meta_iif(struct 
> mlxsw_sp_acl_rule_info *rulei,
>   struct mlxsw_sp_port *mlxsw_sp_port;
>   struct net_device *ingress_dev;
>  
> + if (!match->mask->ingress_ifindex)
> + return 0;
> +
>   if (match->mask->ingress_ifindex != 0x) {
>   NL_SET_ERR_MSG_MOD(extack, "Unsupported ingress ifindex mask");
>   return -EINVAL;


Reviewed-by: Nikolay Aleksandrov 



Re: [Bridge] [PATCH net-next v2 5/8] mlxsw: spectrum_flower: Split iif parsing to a separate function

2023-05-29 Thread Nikolay Aleksandrov
On 29/05/2023 14:48, Ido Schimmel wrote:
> Currently, mlxsw only supports the 'ingress_ifindex' field in the
> 'FLOW_DISSECTOR_KEY_META' key, but subsequent patches are going to add
> support for the 'l2_miss' field as well. Split the parsing of the
> 'ingress_ifindex' field to a separate function to avoid nesting. No
> functional changes intended.
> 
> Signed-off-by: Ido Schimmel 
> ---
> 
> Notes:
> v2:
> * New patch.
> 
>  .../ethernet/mellanox/mlxsw/spectrum_flower.c | 54 +++
>  1 file changed, 33 insertions(+), 21 deletions(-)
> 

Reviewed-by: Nikolay Aleksandrov 




Re: [Bridge] [PATCH net-next v2 8/8] selftests: forwarding: Add layer 2 miss test cases

2023-05-29 Thread Nikolay Aleksandrov
On 29/05/2023 14:48, Ido Schimmel wrote:
> Add test cases to verify that the bridge driver correctly marks layer 2
> misses only when it should and that the flower classifier can match on
> this metadata.
> 
> Example output:
> 
>  # ./tc_flower_l2_miss.sh
>  TEST: L2 miss - Unicast [ OK ]
>  TEST: L2 miss - Multicast (IPv4)[ OK ]
>  TEST: L2 miss - Multicast (IPv6)[ OK ]
>  TEST: L2 miss - Link-local multicast (IPv4) [ OK ]
>  TEST: L2 miss - Link-local multicast (IPv6) [ OK ]
>  TEST: L2 miss - Broadcast   [ OK ]
> 
> Signed-off-by: Ido Schimmel 
> ---
> 
> Notes:
> v2:
> * Test that broadcast does not hit miss filter.
> 
>  .../testing/selftests/net/forwarding/Makefile |   1 +
>  .../net/forwarding/tc_flower_l2_miss.sh   | 350 ++
>  2 files changed, 351 insertions(+)
>  create mode 100755 
> tools/testing/selftests/net/forwarding/tc_flower_l2_miss.sh

Reviewed-by: Nikolay Aleksandrov 




Re: [Bridge] [PATCH net-next v2 3/8] net/sched: flower: Allow matching on layer 2 miss

2023-05-29 Thread Nikolay Aleksandrov
On 29/05/2023 14:48, Ido Schimmel wrote:
> Add the 'TCA_FLOWER_L2_MISS' netlink attribute that allows user space to
> match on packets that encountered a layer 2 miss. The miss indication is
> set as metadata in the tc skb extension by the bridge driver upon FDB or
> MDB lookup miss and dissected by the flow dissector to the
> 'FLOW_DISSECTOR_KEY_META' key.
> 
> The use of this skb extension is guarded by the 'tc_skb_ext_tc' static
> key. As such, enable / disable this key when filters that match on layer
> 2 miss are added / deleted.
> 
> Tested:
> 
>  # cat tc_skb_ext_tc.py
>  #!/usr/bin/env -S drgn -s vmlinux
> 
>  refcount = prog["tc_skb_ext_tc"].key.enabled.counter.value_()
>  print(f"tc_skb_ext_tc reference count is {refcount}")
> 
>  # ./tc_skb_ext_tc.py
>  tc_skb_ext_tc reference count is 0
> 
>  # tc filter add dev swp1 egress proto all handle 101 pref 1 flower src_mac 
> 00:11:22:33:44:55 action drop
>  # tc filter add dev swp1 egress proto all handle 102 pref 2 flower src_mac 
> 00:11:22:33:44:55 l2_miss true action drop
>  # tc filter add dev swp1 egress proto all handle 103 pref 3 flower src_mac 
> 00:11:22:33:44:55 l2_miss false action drop
> 
>  # ./tc_skb_ext_tc.py
>  tc_skb_ext_tc reference count is 2
> 
>  # tc filter replace dev swp1 egress proto all handle 102 pref 2 flower 
> src_mac 00:01:02:03:04:05 l2_miss false action drop
> 
>  # ./tc_skb_ext_tc.py
>  tc_skb_ext_tc reference count is 2
> 
>  # tc filter del dev swp1 egress proto all handle 103 pref 3 flower
>  # tc filter del dev swp1 egress proto all handle 102 pref 2 flower
>  # tc filter del dev swp1 egress proto all handle 101 pref 1 flower
> 
>  # ./tc_skb_ext_tc.py
>  tc_skb_ext_tc reference count is 0
> 
> Signed-off-by: Ido Schimmel 
> ---
> 
> Notes:
> v2:
> * Split flow_dissector changes to a previous patch.
> * Use tc skb extension instead of 'skb->l2_miss'.
> 
>  include/uapi/linux/pkt_cls.h |  2 ++
>  net/sched/cls_flower.c   | 30 --
>  2 files changed, 30 insertions(+), 2 deletions(-)

Reviewed-by: Nikolay Aleksandrov 




Re: [Bridge] [PATCH net-next v2 2/8] flow_dissector: Dissect layer 2 miss from tc skb extension

2023-05-29 Thread Nikolay Aleksandrov
On 29/05/2023 14:48, Ido Schimmel wrote:
> Extend the 'FLOW_DISSECTOR_KEY_META' key with a new 'l2_miss' field and
> populate it from a field with the same name in the tc skb extension.
> This field is set by the bridge driver for packets that incur an FDB or
> MDB miss.
> 
> The next patch will extend the flower classifier to be able to match on
> layer 2 misses.
> 
> Signed-off-by: Ido Schimmel 
> ---
> 
> Notes:
> v2:
> * Split from flower patch.
> * Use tc skb extension instead of 'skb->l2_miss'.
> 
>  include/net/flow_dissector.h |  2 ++
>  net/core/flow_dissector.c| 10 ++
>  2 files changed, 12 insertions(+)
> 

Reviewed-by: Nikolay Aleksandrov 




Re: [Bridge] [PATCH net-next v2 1/8] skbuff: bridge: Add layer 2 miss indication

2023-05-29 Thread Nikolay Aleksandrov
On 29/05/2023 14:48, Ido Schimmel wrote:
> For EVPN non-DF (Designated Forwarder) filtering we need to be able to
> prevent decapsulated traffic from being flooded to a multi-homed host.
> Filtering of multicast and broadcast traffic can be achieved using the
> following flower filter:
> 
>  # tc filter add dev bond0 egress pref 1 proto all flower indev vxlan0 
> dst_mac 01:00:00:00:00:00/01:00:00:00:00:00 action drop
> 
> Unlike broadcast and multicast traffic, it is not currently possible to
> filter unknown unicast traffic. The classification into unknown unicast
> is performed by the bridge driver, but is not visible to other layers
> such as tc.
> 
> Solve this by adding a new 'l2_miss' bit to the tc skb extension. Clear
> the bit whenever a packet enters the bridge (received from a bridge port
> or transmitted via the bridge) and set it if the packet did not match an
> FDB or MDB entry. If there is no skb extension and the bit needs to be
> cleared, then do not allocate one as no extension is equivalent to the
> bit being cleared. The bit is not set for broadcast packets as they
> never perform a lookup and therefore never incur a miss.
> 
> A bit that is set for every flooded packet would also work for the
> current use case, but it does not allow us to differentiate between
> registered and unregistered multicast traffic, which might be useful in
> the future.
> 
> To keep the performance impact to a minimum, the marking of packets is
> guarded by the 'tc_skb_ext_tc' static key. When 'false', the skb is not
> touched and an skb extension is not allocated. Instead, only a
> 5 bytes nop is executed, as demonstrated below for the call site in
> br_handle_frame().
> 
> Before the patch:
> 
> ```
> memset(skb->cb, 0, sizeof(struct br_input_skb_cb));
>   c37b09:   49 c7 44 24 28 00 00movq   $0x0,0x28(%r12)
>   c37b10:   00 00
> 
> p = br_port_get_rcu(skb->dev);
>   c37b12:   49 8b 44 24 10  mov0x10(%r12),%rax
> memset(skb->cb, 0, sizeof(struct br_input_skb_cb));
>   c37b17:   49 c7 44 24 30 00 00movq   $0x0,0x30(%r12)
>   c37b1e:   00 00
>   c37b20:   49 c7 44 24 38 00 00movq   $0x0,0x38(%r12)
>   c37b27:   00 00
> ```
> 
> After the patch (when static key is disabled):
> 
> ```
> memset(skb->cb, 0, sizeof(struct br_input_skb_cb));
>   c37c29:   49 c7 44 24 28 00 00movq   $0x0,0x28(%r12)
>   c37c30:   00 00
>   c37c32:   49 8d 44 24 28  lea0x28(%r12),%rax
>   c37c37:   48 c7 40 08 00 00 00movq   $0x0,0x8(%rax)
>   c37c3e:   00
>   c37c3f:   48 c7 40 10 00 00 00movq   $0x0,0x10(%rax)
>   c37c46:   00
> 
> #ifdef CONFIG_HAVE_JUMP_LABEL_HACK
> 
> static __always_inline bool arch_static_branch(struct static_key *key, bool 
> branch)
> {
> asm_volatile_goto("1:"
>   c37c47:   0f 1f 44 00 00  nopl   0x0(%rax,%rax,1)
> br_tc_skb_miss_set(skb, false);
> 
> p = br_port_get_rcu(skb->dev);
>   c37c4c:   49 8b 44 24 10  mov0x10(%r12),%rax
> ```
> 
> Subsequent patches will extend the flower classifier to be able to match
> on the new 'l2_miss' bit and enable / disable the static key when
> filters that match on it are added / deleted.
> 
> Signed-off-by: Ido Schimmel 
> ---
> 
> Notes:
> v2:
> * Use tc skb extension instead of adding a bit to the skb.
> * Do not mark broadcast packets as they never perform a lookup and
>   therefore never incur a miss.
> 
>  include/linux/skbuff.h  |  1 +
>  net/bridge/br_device.c  |  1 +
>  net/bridge/br_forward.c |  3 +++
>  net/bridge/br_input.c   |  1 +
>  net/bridge/br_private.h | 27 +++
>  5 files changed, 33 insertions(+)
> 

Nice approach.
Acked-by: Nikolay Aleksandrov 




Re: [Bridge] [PATCH net-next 1/5] skbuff: bridge: Add layer 2 miss indication

2023-05-23 Thread Nikolay Aleksandrov
On 23/05/2023 11:10, Ido Schimmel wrote:
> On Fri, May 19, 2023 at 02:52:18PM -0700, Jakub Kicinski wrote:
>> On Fri, 19 May 2023 16:51:48 +0300 Ido Schimmel wrote:
>>> diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
>>> index fc17b9fd93e6..274e55455b15 100644
>>> --- a/net/bridge/br_input.c
>>> +++ b/net/bridge/br_input.c
>>> @@ -46,6 +46,8 @@ static int br_pass_frame_up(struct sk_buff *skb)
>>>  */
>>> br_switchdev_frame_unmark(skb);
>>>  
>>> +   skb->l2_miss = BR_INPUT_SKB_CB(skb)->miss;
>>> +
>>> /* Bridge is just like any other port.  Make sure the
>>>  * packet is allowed except in promisc mode when someone
>>>  * may be running packet capture.
>>>
>>> Ran these changes through the selftest and it seems to work.
>>
>> Can we possibly put the new field at the end of the CB and then have TC
>> look at it in the CB? We already do a bit of such CB juggling in strp
>> (first member of struct sk_skb_cb).
> 
> Using the CB between different layers is very fragile and I would like
> to avoid it. Note that the skb can pass various layers until hitting the
> classifier, each of which can decide to memset() the CB.
> 
> Anyway, I think I have a better alternative. I added the 'l2_miss' bit
> to the tc skb extension and adjusted the bridge to mark packets via this
> extension. The entire thing is protected by the existing 'tc_skb_ext_tc'
> static key, so overhead is kept to a minimum when feature is disabled.
> Extended flower to enable / disable this key when filters that match on
> 'l2_miss' are added / removed.
> 
> bridge change to mark the packet:
> https://github.com/idosch/linux/commit/3fab206492fcad9177f2340680f02ced1b9a0dec.patch
> 
> flow_dissector change to dissect the info from the extension:
> https://github.com/idosch/linux/commit/1533c078b02586547817a4e63989a0db62aa5315.patch
> 
> flower change to enable / disable the key:
> https://github.com/idosch/linux/commit/cf84b277511ec80fe565c41271abc6b2e2f629af.patch
> 
> Advantages compared to the previous approach are that we do not need a
> new bit in the skb and that overhead is kept to a minimum when feature
> is disabled. Disadvantage is that overhead is higher when feature is
> enabled.
> 
> WDYT?
> 
> To be clear, merely asking for feedback on the general approach, not
> code review.
> 
> Thanks

TBH, I like this approach much better for obvious reasons. :)
Thanks for working on it.


Re: [Bridge] [PATCH net-next 1/5] skbuff: bridge: Add layer 2 miss indication

2023-05-21 Thread Nikolay Aleksandrov
On 19/05/2023 16:51, Ido Schimmel wrote:
> On Thu, May 18, 2023 at 07:08:47PM +0300, Nikolay Aleksandrov wrote:
>> On 18/05/2023 14:33, Ido Schimmel wrote:
>>> diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
>>> index fc17b9fd93e6..d8ab5890cbe6 100644
>>> --- a/net/bridge/br_input.c
>>> +++ b/net/bridge/br_input.c
>>> @@ -334,6 +334,7 @@ static rx_handler_result_t br_handle_frame(struct 
>>> sk_buff **pskb)
>>> return RX_HANDLER_CONSUMED;
>>>  
>>> memset(skb->cb, 0, sizeof(struct br_input_skb_cb));
>>> +   skb->l2_miss = 0;
>>>  
>>> p = br_port_get_rcu(skb->dev);
>>> if (p->flags & BR_VLAN_TUNNEL)
>>
>> Overall looks good, only this part is a bit worrisome and needs some 
>> additional
>> investigation because now we'll unconditionally dirty a cache line for every
>> packet that is forwarded. Could you please check the effect with perf?
> 
> To eliminate it I tried the approach we discussed yesterday:
> 
> First, add the miss indication to the bridge's control block which is
> zeroed for every skb entering the bridge:
> 
> diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
> index 2119729ded2b..bd5c18286a40 100644
> --- a/net/bridge/br_private.h
> +++ b/net/bridge/br_private.h
> @@ -581,6 +581,7 @@ struct br_input_skb_cb {
>  #endif
> u8 proxyarp_replied:1;
> u8 src_port_isolated:1;
> +   u8 miss:1;  /* FDB or MDB lookup miss */
>  #ifdef CONFIG_BRIDGE_VLAN_FILTERING
> u8 vlan_filtered:1;
>  #endif
> 
> And set this bit upon misses instead of skb->l2_miss:
> 
> @@ -203,6 +205,8 @@ void br_flood(struct net_bridge *br, struct sk_buff *skb,
> struct net_bridge_port *prev = NULL;
> struct net_bridge_port *p;
>  
> +   BR_INPUT_SKB_CB(skb)->miss = 1;
> +
> list_for_each_entry_rcu(p, >port_list, list) {
> /* Do not flood unicast traffic to ports that turn it off, nor
>  * other traffic if flood off, except for traffic we originate
> @@ -295,6 +299,7 @@ void br_multicast_flood(struct net_bridge_mdb_entry *mdst,
> allow_mode_include = false;
> } else {
> p = NULL;
> +   BR_INPUT_SKB_CB(skb)->miss = 1;
> }
>  
> while (p || rp) {
> 
> Then copy it to skb->l2_miss at the very end where the cache line
> containing this field is already written to:
> 
> diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
> index 84d6dd5e5b1a..89f65564e338 100644
> --- a/net/bridge/br_forward.c
> +++ b/net/bridge/br_forward.c
> @@ -50,6 +50,8 @@ int br_dev_queue_push_xmit(struct net *net, struct sock 
> *sk, struct sk_buff *skb
>  
> br_switchdev_frame_set_offload_fwd_mark(skb);
>  
> +   skb->l2_miss = BR_INPUT_SKB_CB(skb)->miss;
> +
> dev_queue_xmit(skb);
>  
> return 0;
> 
> Also for locally received packets:
> 
> diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
> index fc17b9fd93e6..274e55455b15 100644
> --- a/net/bridge/br_input.c
> +++ b/net/bridge/br_input.c
> @@ -46,6 +46,8 @@ static int br_pass_frame_up(struct sk_buff *skb)
>  */
> br_switchdev_frame_unmark(skb);
>  
> +   skb->l2_miss = BR_INPUT_SKB_CB(skb)->miss;
> +
> /* Bridge is just like any other port.  Make sure the
>  * packet is allowed except in promisc mode when someone
>  * may be running packet capture.
> 
> Ran these changes through the selftest and it seems to work.
> 
> WDYT?

Looks good to me, this is what I had in mind wrt cache line dirtying.
The swdev mark already does it, so putting them together is nice. From
bridge POV this is good.

Thanks,
 Nik




Re: [Bridge] [PATCH net-next 1/5] skbuff: bridge: Add layer 2 miss indication

2023-05-18 Thread Nikolay Aleksandrov
On 18/05/2023 14:33, Ido Schimmel wrote:
> Allow the bridge driver to mark packets that did not match a layer 2
> entry during forwarding by adding a 'l2_miss' bit to the skb.
> 
> Clear the bit whenever a packet enters the bridge (received from a
> bridge port or transmitted via the bridge) and set it if the packet did
> not match an FDB/MDB entry.
> 
> Subsequent patches will allow the flower classifier to match on this
> bit. The motivating use case in non-DF (Designated Forwarder) filtering
> where we would like to prevent decapsulated packets from being flooded
> to a multi-homed host.
> 
> Do not allocate the bit if the kernel was not compiled with bridge
> support and place it after the two bit fields in accordance with commit
> 4c60d04c2888 ("net: skbuff: push nf_trace down the bitfield"). The bit
> does not increase the size of the structure as it is placed at an
> existing hole. Layout with allmodconfig:
> 
> struct sk_buff {
> [...]
>   __u8   csum_not_inet:1;  /*   132: 3  1 */
>   __u8   l2_miss:1;/*   132: 4  1 */
> 
>   /* XXX 3 bits hole, try to pack */
>   /* XXX 1 byte hole, try to pack */
> 
>   __u16  tc_index; /*   134 2 */
>   u16alloc_cpu;/*   136 2 */
> [...]
> } __attribute__((__aligned__(8)));
> 
> Signed-off-by: Ido Schimmel 
> ---
>  include/linux/skbuff.h  | 4 
>  net/bridge/br_device.c  | 1 +
>  net/bridge/br_forward.c | 3 +++
>  net/bridge/br_input.c   | 1 +
>  4 files changed, 9 insertions(+)
> 
[snip]
>   while (p || rp) {
> diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
> index fc17b9fd93e6..d8ab5890cbe6 100644
> --- a/net/bridge/br_input.c
> +++ b/net/bridge/br_input.c
> @@ -334,6 +334,7 @@ static rx_handler_result_t br_handle_frame(struct sk_buff 
> **pskb)
>   return RX_HANDLER_CONSUMED;
>  
>   memset(skb->cb, 0, sizeof(struct br_input_skb_cb));
> + skb->l2_miss = 0;
>  
>   p = br_port_get_rcu(skb->dev);
>   if (p->flags & BR_VLAN_TUNNEL)

Overall looks good, only this part is a bit worrisome and needs some additional
investigation because now we'll unconditionally dirty a cache line for every
packet that is forwarded. Could you please check the effect with perf?

Thanks,
 Nik



Re: [Bridge] [PATCH 3/4] bridge: always declare tunnel functions

2023-05-17 Thread Nikolay Aleksandrov
On 16/05/2023 22:45, Arnd Bergmann wrote:
> From: Arnd Bergmann 
> 
> When CONFIG_BRIDGE_VLAN_FILTERING is disabled, two functions are still
> defined but have no prototype or caller. This causes a W=1 warning for
> the missing prototypes:
> 
> net/bridge/br_netlink_tunnel.c:29:6: error: no previous prototype for 
> 'vlan_tunid_inrange' [-Werror=missing-prototypes]
> net/bridge/br_netlink_tunnel.c:199:5: error: no previous prototype for 
> 'br_vlan_tunnel_info' [-Werror=missing-prototypes]
> 
> The functions are already contitional on CONFIG_BRIDGE_VLAN_FILTERING,
> and I coulnd't easily figure out the right set of #ifdefs, so just
> move the declarations out of the #ifdef to avoid the warning,
> at a small cost in code size over a more elaborate fix.
> 
> Signed-off-by: Arnd Bergmann 
> ---
>  net/bridge/br_private_tunnel.h | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 

This should be for -net.

Fixes: 188c67dd1906 ("net: bridge: vlan options: add support for tunnel id 
dumping")
Fixes: 569da0822808 ("net: bridge: vlan options: add support for tunnel mapping 
set/del")
Acked-by: Nikolay Aleksandrov 

Thanks,
 Nik





Re: [Bridge] [PATCH net-next 1/2] bridge: Add a limit on FDB entries

2023-05-16 Thread Nikolay Aleksandrov
On 16/05/2023 14:10, Vladimir Oltean wrote:
> On Tue, May 16, 2023 at 02:04:30PM +0300, Nikolay Aleksandrov wrote:
>> That was one of the questions actually. More that I'm thinking about this, 
>> the more
>> I want to break it apart by type because we discussed being able to specify 
>> a flag
>> mask for the limit (all, dynamic, dynamic+static etc). If we embed these 
>> stats into a
>> bridge fdb count attribute, it can be easily extended later if anything new 
>> comes along.
>> If switchdev doesn't support some of these global limit configs, we can pass 
>> the option
>> and it can deny setting it later. I think this should be more than enough as 
>> a first step.
> 
> Ok, and by "type" you actually mean the impossibly hard to understand
> neighbor discovery states used by the bridge UAPI? Like having

Yes, that is what I mean. It's not that hard, we can limit it to a few 
combinations
that are well understood and defined.

> (overlapping) limits per NUD_REACHABLE, NUD_NOARP etc flags set in
> ndm->ndm_state? Or how should the UAPI look like?

Hmm, perhaps for the time being and for keeping it simpler it'd be best if the 
type initially is just about
dynamic entries and the count reflects only those. We can add an abstraction 
later if we want "per-type"/mask limits.
Adding such abstraction should be pretty-straight forward if we keep in mind 
the flags that can change only
under lock, otherwise proper counting would have to be revisited.


Re: [Bridge] [PATCH net-next 1/2] bridge: Add a limit on FDB entries

2023-05-16 Thread Nikolay Aleksandrov
On 16/05/2023 13:55, Vladimir Oltean wrote:
> On Tue, May 16, 2023 at 01:47:47PM +0300, Nikolay Aleksandrov wrote:
>> Having the current count is just a helper, if you have a high limit dumping 
>> the table
>> and counting might take awhile. Thanks for the feedback, then we'll polish 
>> and move
>> on with the set for a global limit.
> 
> Ok, but to be useful, the current count will have to be directly
> comparable to the limit, I guess. So the current count will also be for
> dynamically learned entries? Or is the plan to enforce the global limit
> for any kind of FDB entries?

That was one of the questions actually. More that I'm thinking about this, the 
more
I want to break it apart by type because we discussed being able to specify a 
flag
mask for the limit (all, dynamic, dynamic+static etc). If we embed these stats 
into a
bridge fdb count attribute, it can be easily extended later if anything new 
comes along.
If switchdev doesn't support some of these global limit configs, we can pass 
the option
and it can deny setting it later. I think this should be more than enough as a 
first step.


Re: [Bridge] [PATCH net-next 1/2] bridge: Add a limit on FDB entries

2023-05-16 Thread Nikolay Aleksandrov
On 16/05/2023 13:44, Vladimir Oltean wrote:
> On Tue, May 16, 2023 at 01:32:05PM +0300, Nikolay Aleksandrov wrote:
>> Let's take a step back, I wasn't suggesting we start with a full-fledged 
>> switchdev
>> implementation. :) I meant only to see if the minimum global limit 
>> implementation
>> suggested would suffice and would be able to later extend so switchdev can 
>> use and
>> potentially modify (e.g. drivers setting limits etc). We can start with a 
>> simple
>> support for limits and then extend accordingly. The important part here is to
>> not add any uAPI that can't be changed later which would impact future 
>> changes.
> 
> I guess adding a global per-bridge learning limit now makes sense and
> would not unreasonably hinder switchdev later on. The focus is on
> "learning limit" and not a limit to user-created entries as Johannes has
> currently done in v1. I don't necessarily see an urgent need for
> IFLA_BR_FDB_CUR_ENTRIES, given the fact that user space can dump the FDB
> and count what it needs, filtering for FDB types accordingly.

Having the current count is just a helper, if you have a high limit dumping the 
table
and counting might take awhile. Thanks for the feedback, then we'll polish and 
move
on with the set for a global limit.


Re: [Bridge] [PATCH net-next 1/2] bridge: Add a limit on FDB entries

2023-05-16 Thread Nikolay Aleksandrov
On 16/05/2023 13:21, Vladimir Oltean wrote:
> Hi,
> 
> On Tue, May 16, 2023 at 11:56:41AM +0300, Nikolay Aleksandrov wrote:
>> Hmm.. perhaps we can add a flag mask of entries to count. Initially it can be
>> only dynamic entries. We should include more people in this discussion (+CC 
>> Ido and Vladimir).
>> Switchdev folks might have more specific requirements and restrictions, so 
>> it'd be nice to get
>> their input as well.
> 
> I have some other things to do until I can take a closer look at this
> discussion, but in principle, switchdev drivers will likely want to
> impose their own limit on FDB entries because the hardware itself is
> inherently limited in size, so I'm thinking there should be another way
> for the software bridge to be informed about this limit other than UAPI.

Yep, that's ok but it can be added later. This is pretty much internal.

> Which ports that limit should affect (think bridging between ports of
> different switches with different FDB sizes) I don't know. If we only
> consider switchdev, FDB limits should probably be per hwdom.
> 

Now, that's a whole different issue (per-port limits). I've prototype patches
for that too, but it's a much harder problem to solve and scale in software.
Let's please focus on the single global limit for the moment.

> Also, in terms of static vs dynamic limits, I've seen hardware
> implementations where static FDB entries go to a different FDB table
> compared to dynamic ones (Microchip KSZ DSA switches), implementations
> where static partitioning between static and dynamic FDB entries is
> possible but configurable, and implementations where they all consume
> from the shared space and you'd have to evict a dynamic entry to install
> a static one. So it's hard to really say what's the size. That, plus not
> to mention, many hardware FDBs are not fully associative, and due to
> hash collisions, you may be unable to install an entry in the 4-way
> associative bin where its {MAC,VID} hash says it should go, even though
> the FDB at large is not full.
> 
> It sounds sexy to take switchdev into consideration, but I'm not really
> sure what we want. Something flexible to cater for the above, probably.
> This discussion should probably be merged with:
> https://lore.kernel.org/netdev/20230324144917.32lnpgtw5auuyovy@skbuf/T/#ma600839815582ca61886e83ba533b1dfbe447557
> so I'm CCing Oleksij too, since he probably knows better than me what he
> wants.
> 

Let's take a step back, I wasn't suggesting we start with a full-fledged 
switchdev
implementation. :) I meant only to see if the minimum global limit 
implementation
suggested would suffice and would be able to later extend so switchdev can use 
and
potentially modify (e.g. drivers setting limits etc). We can start with a simple
support for limits and then extend accordingly. The important part here is to
not add any uAPI that can't be changed later which would impact future changes.

> In the thread with DSA trace events, there also was a short talk about
> user space theoretically being able to infer FDB sizes and utilization
> degree based on instrumenting with ftrace, which is something we wouldn't
> like to have to maintain. So I'm adding the DSA maintainers too, since
> there is interest for agreeing on a different API.
> https://lore.kernel.org/netdev/2f150ad4-34f4-4af9-b3ce-c1aff208e...@lunn.ch/T/#mfa895245fd012e8f66db784fa568109dba396aa7



Re: [Bridge] [PATCH net-next 1/2] bridge: Add a limit on FDB entries

2023-05-16 Thread Nikolay Aleksandrov
On 16/05/2023 11:53, Johannes Nixdorf wrote:
> On Tue, May 16, 2023 at 11:38:11AM +0300, Nikolay Aleksandrov wrote:
>> On 15/05/2023 11:50, Johannes Nixdorf wrote:
>>> A malicious actor behind one bridge port may spam the kernel with packets
>>> with a random source MAC address, each of which will create an FDB entry,
>>> each of which is a dynamic allocation in the kernel.
>>>
>>> There are roughly 2^48 different MAC addresses, further limited by the
>>> rhashtable they are stored in to 2^31. Each entry is of the type struct
>>> net_bridge_fdb_entry, which is currently 128 bytes big. This means the
>>> maximum amount of memory allocated for FDB entries is 2^31 * 128B =
>>> 256GiB, which is too much for most computers.
>>>
>>> Mitigate this by adding a bridge netlink setting IFLA_BR_FDB_MAX_ENTRIES,
>>> which, if nonzero, limits the amount of entries to a user specified
>>> maximum.
>>>
>>> For backwards compatibility the default setting of 0 disables the limit.
>>>
>>> All changes to fdb_n_entries are under br->hash_lock, which means we do
>>> not need additional locking. The call paths are (✓ denotes that
>>> br->hash_lock is taken around the next call):
>>>
>>>  - fdb_delete <-+- fdb_delete_local <-+- br_fdb_changeaddr ✓
>>> | +- br_fdb_change_mac_address ✓
>>> | +- br_fdb_delete_by_port ✓
>>> +- br_fdb_find_delete_local ✓
>>> +- fdb_add_local <-+- br_fdb_changeaddr ✓
>>> |  +- br_fdb_change_mac_address ✓
>>> |  +- br_fdb_add_local ✓
>>> +- br_fdb_cleanup ✓
>>> +- br_fdb_flush ✓
>>> +- br_fdb_delete_by_port ✓
>>> +- fdb_delete_by_addr_and_port <--- __br_fdb_delete ✓
>>> +- br_fdb_external_learn_del ✓
>>>  - fdb_create <-+- fdb_add_local <-+- br_fdb_changeaddr ✓
>>> |  +- br_fdb_change_mac_address ✓
>>> |  +- br_fdb_add_local ✓
>>> +- br_fdb_update ✓
>>> +- fdb_add_entry <--- __br_fdb_add ✓
>>> +- br_fdb_external_learn_add ✓
>>>
>>> Signed-off-by: Johannes Nixdorf 
>>> ---
>>>  include/uapi/linux/if_link.h | 1 +
>>>  net/bridge/br_device.c   | 2 ++
>>>  net/bridge/br_fdb.c  | 6 ++
>>>  net/bridge/br_netlink.c  | 9 -
>>>  net/bridge/br_private.h  | 2 ++
>>>  5 files changed, 19 insertions(+), 1 deletion(-)
>>>
>>
>> I completely missed the fact that you don't deal with the situation where 
>> you already have fdbs created
>> and a limit is set later, then it would be useless because it will start 
>> counting from 0 even though
>> there are already entries.
> 
> This should not be an issue. The accounting starts with the bridge
> creation and is never suspended, so if the user sets a limit later we
> do not restart counting at 0.
> 
> The only corner case I can see there is if the user sets a new limit
> lower than the current number of FDB entries. In that case the code
> currently leaves the bridge in a state where the limit is violated,
> but refuses new FDB entries until the total is back below the limit. The
> alternative of cleaning out old FDB entries until their number is under
> the limit again seems to be more error prone to me as well, so I'd rather
> leave it that way.
> 

Ah, good. That's ok then.

>> Also another issue that came to mind is that you don't deal with fdb_create()
>> for "special" entries, i.e. when adding a port. Currently it will print an 
>> error, but you should revisit
>> all callers and see where it might be a problem.
> 
> I'll have a look again, also to see whether only counting dynamic
> entries created as a reaction to observed packets might be a viable
> alternative. If the user creates the entries by adding a port or manually
> via netlink I see no reason to restrict them to the same limit.

Hmm.. perhaps we can add a flag mask of entries to count. Initially it can be
only dynamic entries. We should include more people in this discussion (+CC Ido 
and Vladimir).
Switchdev folks might have more specific requirements and restrictions, so it'd 
be nice to get
their input as well.







Re: [Bridge] [PATCH net-next 1/2] bridge: Add a limit on FDB entries

2023-05-16 Thread Nikolay Aleksandrov
On 15/05/2023 11:50, Johannes Nixdorf wrote:
> A malicious actor behind one bridge port may spam the kernel with packets
> with a random source MAC address, each of which will create an FDB entry,
> each of which is a dynamic allocation in the kernel.
> 
> There are roughly 2^48 different MAC addresses, further limited by the
> rhashtable they are stored in to 2^31. Each entry is of the type struct
> net_bridge_fdb_entry, which is currently 128 bytes big. This means the
> maximum amount of memory allocated for FDB entries is 2^31 * 128B =
> 256GiB, which is too much for most computers.
> 
> Mitigate this by adding a bridge netlink setting IFLA_BR_FDB_MAX_ENTRIES,
> which, if nonzero, limits the amount of entries to a user specified
> maximum.
> 
> For backwards compatibility the default setting of 0 disables the limit.
> 
> All changes to fdb_n_entries are under br->hash_lock, which means we do
> not need additional locking. The call paths are (✓ denotes that
> br->hash_lock is taken around the next call):
> 
>  - fdb_delete <-+- fdb_delete_local <-+- br_fdb_changeaddr ✓
> | +- br_fdb_change_mac_address ✓
> | +- br_fdb_delete_by_port ✓
> +- br_fdb_find_delete_local ✓
> +- fdb_add_local <-+- br_fdb_changeaddr ✓
> |  +- br_fdb_change_mac_address ✓
> |  +- br_fdb_add_local ✓
> +- br_fdb_cleanup ✓
> +- br_fdb_flush ✓
> +- br_fdb_delete_by_port ✓
> +- fdb_delete_by_addr_and_port <--- __br_fdb_delete ✓
> +- br_fdb_external_learn_del ✓
>  - fdb_create <-+- fdb_add_local <-+- br_fdb_changeaddr ✓
> |  +- br_fdb_change_mac_address ✓
> |  +- br_fdb_add_local ✓
> +- br_fdb_update ✓
> +- fdb_add_entry <--- __br_fdb_add ✓
> +- br_fdb_external_learn_add ✓
> 
> Signed-off-by: Johannes Nixdorf 
> ---
>  include/uapi/linux/if_link.h | 1 +
>  net/bridge/br_device.c   | 2 ++
>  net/bridge/br_fdb.c  | 6 ++
>  net/bridge/br_netlink.c  | 9 -
>  net/bridge/br_private.h  | 2 ++
>  5 files changed, 19 insertions(+), 1 deletion(-)
> 

I completely missed the fact that you don't deal with the situation where you 
already have fdbs created
and a limit is set later, then it would be useless because it will start 
counting from 0 even though
there are already entries. Also another issue that came to mind is that you 
don't deal with fdb_create()
for "special" entries, i.e. when adding a port. Currently it will print an 
error, but you should revisit
all callers and see where it might be a problem.





Re: [Bridge] [PATCH net-next 1/2] bridge: Add a limit on FDB entries

2023-05-16 Thread Nikolay Aleksandrov
On 16/05/2023 11:12, Johannes Nixdorf wrote:
[snip]
>>> return -EMSGSIZE;
>>>  
>>>  #ifdef CONFIG_BRIDGE_VLAN_FILTERING
>>> diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
>>> index 2119729ded2b..64fb359c6e3e 100644
>>> --- a/net/bridge/br_private.h
>>> +++ b/net/bridge/br_private.h
>>> @@ -494,6 +494,8 @@ struct net_bridge {
>>>  #endif
>>>  
>>> struct rhashtable   fdb_hash_tbl;
>>> +   u32 fdb_n_entries;
>>> +   u32 fdb_max_entries;
>>
>> These are not critical, so I'd use 4 byte holes in net_bridge and pack it 
>> better
>> instead of making it larger.
> 
> For a v2 I now moved it into (conditional) holes now in front of
> CONFIG_BRIDGE_VLAN_FILTERING (only a hole if it is enabled) and
> CONFIG_SWITCHDEV (only a hole if it is disabled). I could not find any
> other holes, but please tell me if you had any others in mind.
> 

Just please don't add them in the first 64 bytes (first cache line) as we use 
that
in the hot path and keep it for variables used there. I'd say use any of the 
other
4 byte holes and just add both, so another 4 byte hole would be left after the 
second one.

>>> struct list_headport_list;
>>>  #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
>>> union {
>>
> 
> Thanks for your detailed feedback.



Re: [Bridge] [PATCH net-next 2/2] bridge: Add a sysctl to limit new brides FDB entries

2023-05-16 Thread Nikolay Aleksandrov
On 15/05/2023 14:27, Johannes Nixdorf wrote:
> On Mon, May 15, 2023 at 12:35:47PM +0300, Nikolay Aleksandrov wrote:
>> On 15/05/2023 11:50, Johannes Nixdorf wrote:
>>> This is a convenience setting, which allows the administrator to limit
>>> the default limit of FDB entries for all created bridges, instead of
>>> having to set it for each created bridge using the netlink property.
>>>
>>> The setting is network namespace local, and defaults to 0, which means
>>> unlimited, for backwards compatibility reasons.
>>>
>>> Signed-off-by: Johannes Nixdorf 
>>> ---
>>>  net/bridge/br.c | 83 +
>>>  net/bridge/br_device.c  |  4 +-
>>>  net/bridge/br_private.h |  9 +
>>>  3 files changed, 95 insertions(+), 1 deletion(-)
>>>
>>
>> The bridge doesn't need private sysctls. Netlink is enough.
>> Nacked-by: Nikolay Aleksandrov 
> 
> Fair enough.
> 
> I originally included the setting so there is a global setting an
> administrator could toggle instead of having to hunt down each process
> that might create a bridge, and teaching them to create them with an
> FDB limit.
> 
> Does any of the following alternatives sound acceptable to you?:
>  - Having the default limit (instead of the proposed default to unlimited)
>configurable in Kbuild. This would solve our problem, as we build
>our kernels ourselves, but I don't know whether putting a limit there
>would be acceptable for e.g. distributions.

I don't mind, but it would be useless for everyone else. Kernels would be built
without that limit set.

>  - Hardcoding a default limit != 0. I was afraid I'd break someones
>use-case with far too large bridged networks if I don't default to
>unlimited, but if you maintainers have a number in mind with which
>you don't see a problem, I'd be fine with it as well.
> 
> (Sorry for sending this mail twice, I accidentally dropped the list and
> CC on the fist try)


Right, that has been discussed before. So far there hasn't been any good
option, so I'd say for the time being (or unless you have some better idea)
we should stick with the netlink max attribute and distributions/admins
would have to set it on bridge creation. We could add a warning when creating
a bridge without fdb limit to remind people that it's advisable to set it.
That warning can be removed when we come up with a proper solution.


Re: [Bridge] [PATCH net-next 1/2] bridge: Add a limit on FDB entries

2023-05-16 Thread Nikolay Aleksandrov
On 16/05/2023 11:12, Johannes Nixdorf wrote:
> Hi,
> 
> On Mon, May 15, 2023 at 12:35:03PM +0300, Nikolay Aleksandrov wrote:
>> On 15/05/2023 11:50, Johannes Nixdorf wrote:
>>> A malicious actor behind one bridge port may spam the kernel with packets
>>> with a random source MAC address, each of which will create an FDB entry,
>>> each of which is a dynamic allocation in the kernel.
>>>
>>> There are roughly 2^48 different MAC addresses, further limited by the
>>> rhashtable they are stored in to 2^31. Each entry is of the type struct
>>> net_bridge_fdb_entry, which is currently 128 bytes big. This means the
>>> maximum amount of memory allocated for FDB entries is 2^31 * 128B =
>>> 256GiB, which is too much for most computers.
>>>
>>> Mitigate this by adding a bridge netlink setting IFLA_BR_FDB_MAX_ENTRIES,
>>> which, if nonzero, limits the amount of entries to a user specified
>>> maximum.
>>>
>>> For backwards compatibility the default setting of 0 disables the limit.
>>>
>>> All changes to fdb_n_entries are under br->hash_lock, which means we do
>>> not need additional locking. The call paths are (✓ denotes that
>>> br->hash_lock is taken around the next call):
>>>
>>>  - fdb_delete <-+- fdb_delete_local <-+- br_fdb_changeaddr ✓
>>> | +- br_fdb_change_mac_address ✓
>>> | +- br_fdb_delete_by_port ✓
>>> +- br_fdb_find_delete_local ✓
>>> +- fdb_add_local <-+- br_fdb_changeaddr ✓
>>> |  +- br_fdb_change_mac_address ✓
>>> |  +- br_fdb_add_local ✓
>>> +- br_fdb_cleanup ✓
>>> +- br_fdb_flush ✓
>>> +- br_fdb_delete_by_port ✓
>>> +- fdb_delete_by_addr_and_port <--- __br_fdb_delete ✓
>>> +- br_fdb_external_learn_del ✓
>>>  - fdb_create <-+- fdb_add_local <-+- br_fdb_changeaddr ✓
>>> |  +- br_fdb_change_mac_address ✓
>>> |  +- br_fdb_add_local ✓
>>> +- br_fdb_update ✓
>>> +- fdb_add_entry <--- __br_fdb_add ✓
>>> +- br_fdb_external_learn_add ✓
>>>
>>> Signed-off-by: Johannes Nixdorf 
>>> ---
>>>  include/uapi/linux/if_link.h | 1 +
>>>  net/bridge/br_device.c   | 2 ++
>>>  net/bridge/br_fdb.c  | 6 ++
>>>  net/bridge/br_netlink.c  | 9 -
>>>  net/bridge/br_private.h  | 2 ++
>>>  5 files changed, 19 insertions(+), 1 deletion(-)
>>>
>>
>> Hi,
>> If you're sending a patch series please add a cover letter (--cover-letter) 
>> which
>> explains what the series are trying to do and why.
> 
> Thanks for the hint. I'll do that in the future, including a potential v2.
> 
>> I've had a patch that implements this feature for a while but didn't get to 
>> upstreaming it. :)
> 
> I'm not too attached to my name on it, so if your patch is further along,
> I'd be happy if you submitted your version instead.
> 

Nah, just mentioning it. I'd be happy to review and ack your patches. I don't 
have time
at the moment to polish and upstream it.

Please add a selftest as well.

>> Anyway more comments below,
> 
>>> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
>>> index 4ac1000b0ef2..27cf5f2d8790 100644
>>> --- a/include/uapi/linux/if_link.h
>>> +++ b/include/uapi/linux/if_link.h
>>> @@ -510,6 +510,7 @@ enum {
>>> IFLA_BR_VLAN_STATS_PER_PORT,
>>> IFLA_BR_MULTI_BOOLOPT,
>>> IFLA_BR_MCAST_QUERIER_STATE,
>>> +   IFLA_BR_FDB_MAX_ENTRIES,
>>> __IFLA_BR_MAX,
>>>  };
>>>  
>>> diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
>>> index 8eca8a5c80c6..d455a28df7c9 100644
>>> --- a/net/bridge/br_device.c
>>> +++ b/net/bridge/br_device.c
>>> @@ -528,6 +528,8 @@ void br_dev_setup(struct net_device *dev)
>>> br->bridge_hello_time = br->hello_time = 2 * HZ;
>>> br->bridge_forward_delay = br->forward_delay = 15 * HZ;
>>> br->bridge_ageing_time = br->ageing_time = BR_DEFAULT_AGEING_TIME;
>>> +   br->fdb_n_entries = 0;
>>> +   br->fdb_max_entries = 0;
>>
>> Unnecessary, the private area is already cleared.
> 
> This will be taken o

Re: [Bridge] [PATCH net-next 2/2] bridge: Add a sysctl to limit new brides FDB entries

2023-05-15 Thread Nikolay Aleksandrov
On 15/05/2023 11:50, Johannes Nixdorf wrote:
> This is a convenience setting, which allows the administrator to limit
> the default limit of FDB entries for all created bridges, instead of
> having to set it for each created bridge using the netlink property.
> 
> The setting is network namespace local, and defaults to 0, which means
> unlimited, for backwards compatibility reasons.
> 
> Signed-off-by: Johannes Nixdorf 
> ---
>  net/bridge/br.c | 83 +
>  net/bridge/br_device.c  |  4 +-
>  net/bridge/br_private.h |  9 +
>  3 files changed, 95 insertions(+), 1 deletion(-)
> 

The bridge doesn't need private sysctls. Netlink is enough.
Nacked-by: Nikolay Aleksandrov 


Re: [Bridge] [PATCH net-next 1/2] bridge: Add a limit on FDB entries

2023-05-15 Thread Nikolay Aleksandrov
On 15/05/2023 11:50, Johannes Nixdorf wrote:
> A malicious actor behind one bridge port may spam the kernel with packets
> with a random source MAC address, each of which will create an FDB entry,
> each of which is a dynamic allocation in the kernel.
> 
> There are roughly 2^48 different MAC addresses, further limited by the
> rhashtable they are stored in to 2^31. Each entry is of the type struct
> net_bridge_fdb_entry, which is currently 128 bytes big. This means the
> maximum amount of memory allocated for FDB entries is 2^31 * 128B =
> 256GiB, which is too much for most computers.
> 
> Mitigate this by adding a bridge netlink setting IFLA_BR_FDB_MAX_ENTRIES,
> which, if nonzero, limits the amount of entries to a user specified
> maximum.
> 
> For backwards compatibility the default setting of 0 disables the limit.
> 
> All changes to fdb_n_entries are under br->hash_lock, which means we do
> not need additional locking. The call paths are (✓ denotes that
> br->hash_lock is taken around the next call):
> 
>  - fdb_delete <-+- fdb_delete_local <-+- br_fdb_changeaddr ✓
> | +- br_fdb_change_mac_address ✓
> | +- br_fdb_delete_by_port ✓
> +- br_fdb_find_delete_local ✓
> +- fdb_add_local <-+- br_fdb_changeaddr ✓
> |  +- br_fdb_change_mac_address ✓
> |  +- br_fdb_add_local ✓
> +- br_fdb_cleanup ✓
> +- br_fdb_flush ✓
> +- br_fdb_delete_by_port ✓
> +- fdb_delete_by_addr_and_port <--- __br_fdb_delete ✓
> +- br_fdb_external_learn_del ✓
>  - fdb_create <-+- fdb_add_local <-+- br_fdb_changeaddr ✓
> |  +- br_fdb_change_mac_address ✓
> |  +- br_fdb_add_local ✓
> +- br_fdb_update ✓
> +- fdb_add_entry <--- __br_fdb_add ✓
> +- br_fdb_external_learn_add ✓
> 
> Signed-off-by: Johannes Nixdorf 
> ---
>  include/uapi/linux/if_link.h | 1 +
>  net/bridge/br_device.c   | 2 ++
>  net/bridge/br_fdb.c  | 6 ++
>  net/bridge/br_netlink.c  | 9 -
>  net/bridge/br_private.h  | 2 ++
>  5 files changed, 19 insertions(+), 1 deletion(-)
> 

Hi,
If you're sending a patch series please add a cover letter (--cover-letter) 
which
explains what the series are trying to do and why.
I've had a patch that implements this feature for a while but didn't get to 
upstreaming it. :)
Anyway more comments below,

> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
> index 4ac1000b0ef2..27cf5f2d8790 100644
> --- a/include/uapi/linux/if_link.h
> +++ b/include/uapi/linux/if_link.h
> @@ -510,6 +510,7 @@ enum {
>   IFLA_BR_VLAN_STATS_PER_PORT,
>   IFLA_BR_MULTI_BOOLOPT,
>   IFLA_BR_MCAST_QUERIER_STATE,
> + IFLA_BR_FDB_MAX_ENTRIES,
>   __IFLA_BR_MAX,
>  };
>  
> diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
> index 8eca8a5c80c6..d455a28df7c9 100644
> --- a/net/bridge/br_device.c
> +++ b/net/bridge/br_device.c
> @@ -528,6 +528,8 @@ void br_dev_setup(struct net_device *dev)
>   br->bridge_hello_time = br->hello_time = 2 * HZ;
>   br->bridge_forward_delay = br->forward_delay = 15 * HZ;
>   br->bridge_ageing_time = br->ageing_time = BR_DEFAULT_AGEING_TIME;
> + br->fdb_n_entries = 0;
> + br->fdb_max_entries = 0;

Unnecessary, the private area is already cleared.

>   dev->max_mtu = ETH_MAX_MTU;
>  
>   br_netfilter_rtable_init(br);
> diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
> index e69a872bfc1d..8a833e6dee92 100644
> --- a/net/bridge/br_fdb.c
> +++ b/net/bridge/br_fdb.c
> @@ -329,6 +329,8 @@ static void fdb_delete(struct net_bridge *br, struct 
> net_bridge_fdb_entry *f,
>   hlist_del_init_rcu(>fdb_node);
>   rhashtable_remove_fast(>fdb_hash_tbl, >rhnode,
>  br_fdb_rht_params);
> + if (!WARN_ON(!br->fdb_n_entries))
> + br->fdb_n_entries--;

This is pointless, just put the WARN_ON(!br->fdb_n_entries) above decrementing, 
if we
hit that we are already in trouble and not decrementing doesn't help us.

>   fdb_notify(br, f, RTM_DELNEIGH, swdev_notify);
>   call_rcu(>rcu, fdb_rcu_free);
>  }
> @@ -391,6 +393,9 @@ static struct net_bridge_fdb_entry *fdb_create(struct 
> net_bridge *br,
>   struct net_bridge_fdb_entry *fdb;
>   int err;
>  
> + if (unlikely(br->fdb_max_entries && br->fdb_n_entries >= 
> br->fdb_max_entries))
> + return NULL;
> +

This one needs more work, fdb_create() is also used when user-space is adding 
new
entries, so it would be nice to return a proper error.

>   fdb = kmem_cache_alloc(br_fdb_cache, GFP_ATOMIC);
>   if (!fdb)
>   return NULL;
> @@ -408,6 +413,7 @@ static struct net_bridge_fdb_entry *fdb_create(struct 
> net_bridge *br,
>   }
>  
>   

Re: [Bridge] [Question] Any plan to write/update the bridge doc?

2023-04-25 Thread Nikolay Aleksandrov
On 24/04/2023 18:46, Ido Schimmel wrote:
> On Mon, Apr 24, 2023 at 05:25:08PM +0800, Hangbin Liu wrote:
>> Hi,
>>
>> Maybe someone already has asked. The only official Linux bridge document I
>> got is a very ancient wiki page[1] or the ip link man page[2][3]. As there 
>> are
>> many bridge stp/vlan/multicast paramegers. Should we add a detailed kernel
>> document about each parameter? The parameter showed in ip link page seems
>> a little brief.
> 
> I suggest improving the man pages instead of adding kernel
> documentation. The man pages are the most up to date resource and
> therefore the one users probably refer to the most. Also, it's already
> quite annoying to patch both "ip-link" and "bridge" man pages when
> adding bridge port options. Adding a third document and making sure all
> three resources are patched would be a nightmare...
> 
>>
>> I'd like to help do this work. But apparently neither my English nor my
>> understanding of the code is good enough. Anyway, if you want, I can help
>> write a draft version first and you (bridge maintainers) keep working on 
>> this.
> 
> I can help reviewing man page patches if you want. I'm going to send
> some soon. Will copy you.
> 
>>
>> [1] https://wiki.linuxfoundation.org/networking/bridge
>> [2] https://man7.org/linux/man-pages/man8/bridge.8.html
>> [3] https://man7.org/linux/man-pages/man8/ip-link.8.html
>>
>> Thanks
>> Hangbin

Always +1 for keeping the man pages up-to-date, but I tend to agree with Jakub 
as well
that it'd be nice to have an in-kernel doc which explains the uapi and 
potentially
at least some more obscure internals (if not all), we can insist on updating it
for new changes

I'd be happy to help fill such doc, but at the moment I don't have the
time to write the basis for it. As Hangbin nicely offered, I think we can start
there. For a start it'd be nice to make an initial outline of the different 
sections
and go on filling them from there.

E.g. as a starter something like (feel free to edit):
Introduction
Bridge internals (fdb, timers, MTU handling, fwding decisions, ports, 
synchronization)
STP (mst, rstp, timers, user-space stp etc)
Multicast (mdb, igmp, eht, vlan-mcast etc)
VLAN (filtering, options, tunnel...)
Switchdev
Netfilter
MRP/CFM (?)
FAQ

Each of these having uapi sections with descriptions. We can include references
to the iproute2 docs for cmd explanations and examples, but in this doc we'll 
have
the uapi descriptions and maybe some helpful information about internal 
implementation
that would save future contributors time.

At the very least we can do the uapi part for each section so options are 
described
and uapi nl attribute structures are explained.

Cheers,
 Nik



Re: [Bridge] [RFC PATCH net-next 0/9] bridge: Add per-{Port, VLAN} neighbor suppression

2023-04-19 Thread Nikolay Aleksandrov
On 13/04/2023 12:58, Ido Schimmel wrote:
> Background
> ==
> 
> In order to minimize the flooding of ARP and ND messages in the VXLAN
> network, EVPN includes provisions [1] that allow participating VTEPs to
> suppress such messages in case they know the MAC-IP binding and can
> reply on behalf of the remote host. In Linux, the above is implemented
> in the bridge driver using a per-port option called "neigh_suppress"
> that was added in kernel version 4.15 [2].
> 
> Motivation
> ==
> 
> Some applications use ARP messages as keepalives between the application
> nodes in the network. This works perfectly well when two nodes are
> connected to the same VTEP. When a node goes down it will stop
> responding to ARP requests and the other node will notice it
> immediately.
> 
> However, when the two nodes are connected to different VTEPs and
> neighbor suppression is enabled, the local VTEP will reply to ARP
> requests even after the remote node went down, until certain timers
> expire and the EVPN control plane decides to withdraw the MAC/IP
> Advertisement route for the address. Therefore, some users would like to
> be able to disable neighbor suppression on VLANs where such applications
> reside and keep it enabled on the rest.
> 
> Implementation
> ==
> 
> The proposed solution is to allow user space to control neighbor
> suppression on a per-{Port, VLAN} basis, in a similar fashion to other
> per-port options that gained per-{Port, VLAN} counterparts such as
> "mcast_router". This allows users to benefit from the operational
> simplicity and scalability associated with shared VXLAN devices (i.e.,
> external / collect-metadata mode), while still allowing for per-VLAN/VNI
> neighbor suppression control.
> 
> The user interface is extended with a new "neigh_vlan_suppress" bridge
> port option that allows user space to enable per-{Port, VLAN} neighbor
> suppression on the bridge port. When enabled, the existing
> "neigh_suppress" option has no effect and neighbor suppression is
> controlled using a new "neigh_suppress" VLAN option. Example usage:
> 
>  # bridge link set dev vxlan0 neigh_vlan_suppress on
>  # bridge vlan add vid 10 dev vxlan0
>  # bridge vlan set vid 10 dev vxlan0 neigh_suppress on
> 
> Testing
> ===
> 
> Tested using existing bridge selftests. Added a dedicated selftest in
> the last patch.
> 
> Patchset overview
> =
> 
> Patches #1-#5 are preparations.
> 
> Patch #6 adds per-{Port, VLAN} neighbor suppression support to the
> bridge's data path.
> 
> Patches #7-#8 add the required netlink attributes to enable the feature.
> 
> Patch #9 adds a selftest.
> 
> iproute2 patches can be found here [3].
> 
> [1] https://www.rfc-editor.org/rfc/rfc7432#section-10
> [2] 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a42317785c898c0ed46db45a33b0cc71b671bf29
> [3] https://github.com/idosch/iproute2/tree/submit/neigh_suppress_v1
> 
> Ido Schimmel (9):
>   bridge: Reorder neighbor suppression check when flooding
>   bridge: Pass VLAN ID to br_flood()
>   bridge: Add internal flags for per-{Port, VLAN} neighbor suppression
>   bridge: Take per-{Port, VLAN} neighbor suppression into account
>   bridge: Encapsulate data path neighbor suppression logic
>   bridge: Add per-{Port, VLAN} neighbor suppression data path support
>   bridge: vlan: Allow setting VLAN neighbor suppression state
>   bridge: Allow setting per-{Port, VLAN} neighbor suppression state
>   selftests: net: Add bridge neighbor suppression test
> 
>  include/linux/if_bridge.h |   1 +
>  include/uapi/linux/if_bridge.h|   1 +
>  include/uapi/linux/if_link.h  |   1 +
>  net/bridge/br_arp_nd_proxy.c  |  33 +-
>  net/bridge/br_device.c|   8 +-
>  net/bridge/br_forward.c   |   8 +-
>  net/bridge/br_if.c|   2 +-
>  net/bridge/br_input.c |   2 +-
>  net/bridge/br_netlink.c   |   8 +-
>  net/bridge/br_private.h   |   5 +-
>  net/bridge/br_vlan.c  |   1 +
>  net/bridge/br_vlan_options.c  |  20 +-
>  net/core/rtnetlink.c  |   2 +-
>  tools/testing/selftests/net/Makefile  |   1 +
>  .../net/test_bridge_neigh_suppress.sh | 862 ++
>  15 files changed, 936 insertions(+), 19 deletions(-)
>  create mode 100755 tools/testing/selftests/net/test_bridge_neigh_suppress.sh
> 

The set looks good to me, nicely split and pretty straight-forward.
For the set:
Acked-by: Nikolay Aleksandrov 




Re: [Bridge] [PATCH net-next] net/bridge: add drop reasons for bridge forwarding

2023-04-11 Thread Nikolay Aleksandrov
On 06/04/2023 14:30, yang.yan...@zte.com.cn wrote:
> From: xu xin 
> 
> This creates six drop reasons as follows, which will help users know the
> specific reason why bridge drops the packets when forwarding.
> 
> 1) SKB_DROP_REASON_BRIDGE_FWD_NO_BACKUP_PORT: failed to get a backup
>port link when the destination port is down.
> 
> 2) SKB_DROP_REASON_BRIDGE_FWD_SAME_PORT: destination port is the same
>with originating port when forwarding by a bridge.
> 
> 3) SKB_DROP_REASON_BRIDGE_NON_FORWARDING_STATE: the bridge's state is
>not forwarding.
> 
> 4) SKB_DROP_REASON_BRIDGE_NOT_ALLOWED_EGRESS: the packet is not allowed
>to go out through the port due to vlan filtering.
> 
> 5) SKB_DROP_REASON_BRIDGE_SWDEV_NOT_ALLOWED_EGRESS: the packet is not
>allowed to go out through the port which is offloaded by a hardware
>switchdev, checked by nbp_switchdev_allowed_egress().
> 
> 6) SKB_DROP_REASON_BRIDGE_BOTH_PORT_ISOLATED: both source port and dest
>port are in BR_ISOLATED state when bridge forwarding.
> 
> Signed-off-by: xu xin 
> Reviewed-by: Zhang Yunkai 
> Reviewed-by: Yang Yang 
> Cc: Xuexin Jiang 
> ---
>  include/net/dropreason.h | 33 
>  net/bridge/br_forward.c  | 49 
> +---
>  2 files changed, 71 insertions(+), 11 deletions(-)
> 

In addition to Jakub's comments, next time please CC bridge maintainers.
I just noticed this patch now.

Thanks,
 Nik




Re: [Bridge] [PATCH v2 net-next 6/6] selftests: forwarding: add dynamic FDB test

2023-03-30 Thread Nikolay Aleksandrov
On 30/03/2023 13:29, Hans Schultz wrote:
> On Thu, Mar 30, 2023 at 09:37, Ido Schimmel  wrote:
>> On Tue, Mar 28, 2023 at 09:30:08PM +0200, Hans Schultz wrote:
>>>
>>> Sorry, but I have sent you several emails telling you about the problems
>>> I have with running the selftests due to changes in the phy etc. Maybe
>>> you have just not received all those emails?
>>>
>>> Have you checked spamfilters?
>>>
>>> With the kernels now, I cannot even test with the software bridge and
>>> selftests as the compile fails - probably due to changes in uapi headers
>>> compared to what the packages my system uses expects.
>>
>> My spam filters are fine. I saw your emails where you basically said
>> that you are too lazy to setup a VM to test your patches and that your
>> time is more valuable than mine, which is why I should be testing them.
>> Stop making your problems our problems. It's hardly the first time. If
>> you are unable to test your patches, then invest the time in fixing your
>> setup instead of submitting completely broken patches and making it our
>> problem to test and fix them. I refuse to invest time in reviewing /
>> testing / reworking your submissions as long as you insist on doing less
>> than the bare minimum.
>>
>> Good luck
> 
> I never said or indicated that my time is more valuable than yours. I
> have a VM to run these things that some have spent countless hours to
> develop with the right tools etc installed and set up. Fixing that
> system will take quite many hours for me, so I am asking for some simple
> assistance from someone who already has a system running supporting the
> newest kernel.
> 
> Alternatively if there is an open sourced system available that would be
> great.
> 
> As this patch-set is for the community and some companies that would
> like to use it and not for myself, I am asking for some help from the
> community with a task that when someone has the system in place should
> not take more than 15-20 minutes, maybe even less.

I'm sorry but this is absolutely the wrong way to go about this. Your setup's
problems are yours to figure out and fix, if you are going to send *any* future
patches make absolutely sure they build, run and work as intended.
Please do not send any future patches without them being fully tested and, as
Ido mentioned, cover at least the bare minimum for a submission.

Thanks,
 Nik



Re: [Bridge] Multicast: handling of STA disconnect

2023-03-20 Thread Nikolay Aleksandrov
On 20/03/2023 19:25, Ujjal Roy wrote:
> Hi Nik,
> 
> Flushing MDB can only be done when we are managing it per station not
> per port. For that we need to have MCAST_TO_UCAST, EHT and FAST_LEAVE.
> 
> Here one more point is - some vendors may offload MCAST_TO_UCAST
> conversion in their own FW to reduce CPU.
> 
> So, the best way is to have MCAST_TO_UCAST enabled and MDB will become
> per station, so we can delete MDB on disconnect. Shall, I create one
> patch for review?
> 
> Thanks,
> UjjaL Roy
> 
> On Mon, Mar 20, 2023 at 5:38 PM Nikolay Aleksandrov  
> wrote:
>>
>> On 20/03/2023 13:45, Ujjal Roy wrote:
>>> Hi Nikolay,
>>>
>>> I have some query on multicast. When streams running on an STA and STA
>>> disconnected due to some reason. So, until the MDB is timed out the
>>> stream will be forwarded to the port and in turn to the driver and
>>> dropps there as no such STA.
>>>
>>> So, is the multicast_eht handling this scenario to take any action
>>> immediately? If not, can we do this to take quick action to reduce
>>> overhead of memory and driver?
>>>
>>> I have an idea on this. Can we mark this port group (MDB entry) as
>>> INACTIVE from the WiFi disconnect event and skip forwarding the stream
>>> to this port in br_multicast_flood by applying the check? I can share
>>> the patch on this.
>>>
>>> Thanks,
>>> UjjaL Roy
>>
>> Hi,
>> Fast leave and EHT (as that's v3's fast leave version) are about quickly 
>> converging when
>> a leave is received (e.g. when there are no listeners to quickly remove the 
>> mdb). They
>> don't deal with interface states (IIUC). Why don't you just flush the port's 
>> mdb entries
>> on disconnect? That would stop fwding.
>>
>> Cheers,
>>  Nik
>>
>>

Hi,
Alright, let's see the patch to better understand what is necessary.
Also please don't top post on netdev@.

Cheers,
 Nik



Re: [Bridge] Multicast: handling of STA disconnect

2023-03-20 Thread Nikolay Aleksandrov
On 20/03/2023 13:45, Ujjal Roy wrote:
> Hi Nikolay,
> 
> I have some query on multicast. When streams running on an STA and STA
> disconnected due to some reason. So, until the MDB is timed out the
> stream will be forwarded to the port and in turn to the driver and
> dropps there as no such STA.
> 
> So, is the multicast_eht handling this scenario to take any action
> immediately? If not, can we do this to take quick action to reduce
> overhead of memory and driver?
> 
> I have an idea on this. Can we mark this port group (MDB entry) as
> INACTIVE from the WiFi disconnect event and skip forwarding the stream
> to this port in br_multicast_flood by applying the check? I can share
> the patch on this.
> 
> Thanks,
> UjjaL Roy

Hi,
Fast leave and EHT (as that's v3's fast leave version) are about quickly 
converging when
a leave is received (e.g. when there are no listeners to quickly remove the 
mdb). They
don't deal with interface states (IIUC). Why don't you just flush the port's 
mdb entries
on disconnect? That would stop fwding.

Cheers,
 Nik




Re: [Bridge] [PATCH net-next v2 09/11] vxlan: Add MDB data path support

2023-03-16 Thread Nikolay Aleksandrov
On 15/03/2023 15:11, Ido Schimmel wrote:
> Integrate MDB support into the Tx path of the VXLAN driver, allowing it
> to selectively forward IP multicast traffic according to the matched MDB
> entry.
> 
> If MDB entries are configured (i.e., 'VXLAN_F_MDB' is set) and the
> packet is an IP multicast packet, perform up to three different lookups
> according to the following priority:
> 
> 1. For an (S, G) entry, using {Source VNI, Source IP, Destination IP}.
> 2. For a (*, G) entry, using {Source VNI, Destination IP}.
> 3. For the catchall MDB entry (0.0.0.0 or ::), using the source VNI.
> 
> The catchall MDB entry is similar to the catchall FDB entry
> (00:00:00:00:00:00) that is currently used to transmit BUM (broadcast,
> unknown unicast and multicast) traffic. However, unlike the catchall FDB
> entry, this entry is only used to transmit unregistered IP multicast
> traffic that is not link-local. Therefore, when configured, the catchall
> FDB entry will only transmit BULL (broadcast, unknown unicast,
> link-local multicast) traffic.
> 
> The catchall MDB entry is useful in deployments where inter-subnet
> multicast forwarding is used and not all the VTEPs in a tenant domain
> are members in all the broadcast domains. In such deployments it is
> advantageous to transmit BULL (broadcast, unknown unicast and link-local
> multicast) and unregistered IP multicast traffic on different tunnels.
> If the same tunnel was used, a VTEP only interested in IP multicast
> traffic would also pull all the BULL traffic and drop it as it is not a
> member in the originating broadcast domain [1].
> 
> If the packet did not match an MDB entry (or if the packet is not an IP
> multicast packet), return it to the Tx path, allowing it to be forwarded
> according to the FDB.
> 
> If the packet did match an MDB entry, forward it to the associated
> remote VTEPs. However, if the entry is a (*, G) entry and the associated
> remote is in INCLUDE mode, then skip over it as the source IP is not in
> its source list (otherwise the packet would have matched on an (S, G)
> entry). Similarly, if the associated remote is marked as BLOCKED (can
> only be set on (S, G) entries), then skip over it as well as the remote
> is in EXCLUDE mode and the source IP is in its source list.
> 
> [1] 
> https://datatracker.ietf.org/doc/html/draft-ietf-bess-evpn-irb-mcast#section-2.6
> 
> Signed-off-by: Ido Schimmel 
> ---
> 
> Notes:
> v2:
> * Use htons() in 'case' instead of ntohs() in 'switch'.
> 
>  drivers/net/vxlan/vxlan_core.c|  15 
>  drivers/net/vxlan/vxlan_mdb.c     | 114 ++
>  drivers/net/vxlan/vxlan_private.h |   6 ++
>  3 files changed, 135 insertions(+)
> 

Reviewed-by: Nikolay Aleksandrov 




Re: [Bridge] [PATCH net-next 11/11] selftests: net: Add VXLAN MDB test

2023-03-14 Thread Nikolay Aleksandrov
On 13/03/2023 16:53, Ido Schimmel wrote:
> Add test cases for VXLAN MDB, testing the control and data paths. Two
> different sets of namespaces (i.e., ns{1,2}_v4 and ns{1,2}_v6) are used
> in order to test VXLAN MDB with both IPv4 and IPv6 underlays,
> respectively.
> 
> Example truncated output:
> 
>  # ./test_vxlan_mdb.sh
>  [...]
>  Tests passed: 620
>  Tests failed:   0
> 
> Signed-off-by: Ido Schimmel 
> ---
> 
> Notes:
> v1:
> * New patch.
> 
>  tools/testing/selftests/net/Makefile  |1 +
>  tools/testing/selftests/net/config|1 +
>  tools/testing/selftests/net/test_vxlan_mdb.sh | 2318 +
>  3 files changed, 2320 insertions(+)
>  create mode 100755 tools/testing/selftests/net/test_vxlan_mdb.sh
> 

Wohoo, nice!
Reviewed-by: Nikolay Aleksandrov 


Re: [Bridge] [PATCH net-next 10/11] vxlan: Enable MDB support

2023-03-14 Thread Nikolay Aleksandrov
On 13/03/2023 16:53, Ido Schimmel wrote:
> Now that the VXLAN MDB control and data paths are in place we can expose
> the VXLAN MDB functionality to user space.
> 
> Set the VXLAN MDB net device operations to the appropriate functions,
> thereby allowing the rtnetlink code to reach the VXLAN driver.
> 
> Signed-off-by: Ido Schimmel 
> ---
>  drivers/net/vxlan/vxlan_core.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/net/vxlan/vxlan_core.c b/drivers/net/vxlan/vxlan_core.c
> index 1e55c5582e67..e473200b731f 100644
> --- a/drivers/net/vxlan/vxlan_core.c
> +++ b/drivers/net/vxlan/vxlan_core.c
> @@ -3083,6 +3083,9 @@ static const struct net_device_ops 
> vxlan_netdev_ether_ops = {
>   .ndo_fdb_del= vxlan_fdb_delete,
>   .ndo_fdb_dump   = vxlan_fdb_dump,
>   .ndo_fdb_get= vxlan_fdb_get,
> + .ndo_mdb_add= vxlan_mdb_add,
> + .ndo_mdb_del= vxlan_mdb_del,
> + .ndo_mdb_dump   = vxlan_mdb_dump,
>   .ndo_fill_metadata_dst  = vxlan_fill_metadata_dst,
>  };
>  

Reviewed-by: Nikolay Aleksandrov 



Re: [Bridge] [PATCH net-next 09/11] vxlan: Add MDB data path support

2023-03-14 Thread Nikolay Aleksandrov
On 13/03/2023 16:53, Ido Schimmel wrote:
> Integrate MDB support into the Tx path of the VXLAN driver, allowing it
> to selectively forward IP multicast traffic according to the matched MDB
> entry.
> 
> If MDB entries are configured (i.e., 'VXLAN_F_MDB' is set) and the
> packet is an IP multicast packet, perform up to three different lookups
> according to the following priority:
> 
> 1. For an (S, G) entry, using {Source VNI, Source IP, Destination IP}.
> 2. For a (*, G) entry, using {Source VNI, Destination IP}.
> 3. For the catchall MDB entry (0.0.0.0 or ::), using the source VNI.
> 
> The catchall MDB entry is similar to the catchall FDB entry
> (00:00:00:00:00:00) that is currently used to transmit BUM (broadcast,
> unknown unicast and multicast) traffic. However, unlike the catchall FDB
> entry, this entry is only used to transmit unregistered IP multicast
> traffic that is not link-local. Therefore, when configured, the catchall
> FDB entry will only transmit BULL (broadcast, unknown unicast,
> link-local multicast) traffic.
> 
> The catchall MDB entry is useful in deployments where inter-subnet
> multicast forwarding is used and not all the VTEPs in a tenant domain
> are members in all the broadcast domains. In such deployments it is
> advantageous to transmit BULL (broadcast, unknown unicast and link-local
> multicast) and unregistered IP multicast traffic on different tunnels.
> If the same tunnel was used, a VTEP only interested in IP multicast
> traffic would also pull all the BULL traffic and drop it as it is not a
> member in the originating broadcast domain [1].
> 
> If the packet did not match an MDB entry (or if the packet is not an IP
> multicast packet), return it to the Tx path, allowing it to be forwarded
> according to the FDB.
> 
> If the packet did match an MDB entry, forward it to the associated
> remote VTEPs. However, if the entry is a (*, G) entry and the associated
> remote is in INCLUDE mode, then skip over it as the source IP is not in
> its source list (otherwise the packet would have matched on an (S, G)
> entry). Similarly, if the associated remote is marked as BLOCKED (can
> only be set on (S, G) entries), then skip over it as well as the remote
> is in EXCLUDE mode and the source IP is in its source list.
> 
> [1] 
> https://datatracker.ietf.org/doc/html/draft-ietf-bess-evpn-irb-mcast#section-2.6
> 
> Signed-off-by: Ido Schimmel 
> ---
>  drivers/net/vxlan/vxlan_core.c|  15 
>  drivers/net/vxlan/vxlan_mdb.c | 114 ++
>  drivers/net/vxlan/vxlan_private.h |   6 ++
>  3 files changed, 135 insertions(+)
> 
[snip]> diff --git a/drivers/net/vxlan/vxlan_mdb.c 
b/drivers/net/vxlan/vxlan_mdb.c
> index b32b1fb4a74a..ea63c5178718 100644
> --- a/drivers/net/vxlan/vxlan_mdb.c
> +++ b/drivers/net/vxlan/vxlan_mdb.c
> @@ -1298,6 +1298,120 @@ int vxlan_mdb_del(struct net_device *dev, struct 
> nlattr *tb[],
>   return err;
>  }
>  
> +struct vxlan_mdb_entry *vxlan_mdb_entry_skb_get(struct vxlan_dev *vxlan,
> + struct sk_buff *skb,
> + __be32 src_vni)
> +{
> + struct vxlan_mdb_entry *mdb_entry;
> + struct vxlan_mdb_entry_key group;
> +
> + if (!is_multicast_ether_addr(eth_hdr(skb)->h_dest) ||
> + is_broadcast_ether_addr(eth_hdr(skb)->h_dest))
> + return NULL;
> +
> + /* When not in collect metadata mode, 'src_vni' is zero, but MDB
> +  * entries are stored with the VNI of the VXLAN device.
> +  */
> + if (!(vxlan->cfg.flags & VXLAN_F_COLLECT_METADATA))
> + src_vni = vxlan->default_dst.remote_vni;
> +
> + memset(, 0, sizeof(group));
> + group.vni = src_vni;
> +
> + switch (ntohs(skb->protocol)) {

drop the ntohs and..

> + case ETH_P_IP:

htons(ETH_P_IP)

> + if (!pskb_may_pull(skb, sizeof(struct iphdr)))
> + return NULL;
> + group.dst.sa.sa_family = AF_INET;
> + group.dst.sin.sin_addr.s_addr = ip_hdr(skb)->daddr;
> + group.src.sa.sa_family = AF_INET;
> + group.src.sin.sin_addr.s_addr = ip_hdr(skb)->saddr;
> + break;
> +#if IS_ENABLED(CONFIG_IPV6)
> + case ETH_P_IPV6:

htons(ETH_P_IPV6)

> + if (!pskb_may_pull(skb, sizeof(struct ipv6hdr)))
> + return NULL;
> + group.dst.sa.sa_family = AF_INET6;
> + group.dst.sin6.sin6_addr = ipv6_hdr(skb)->daddr;
> + group.src.sa.sa_family = AF_INET6;
> + group.src.sin6.sin6_addr = ipv6_hdr(skb)->saddr;
> + break;
> +#endif
> + default:
> + return NULL;
> + }
> +
> + mdb_entry = vxlan_mdb_entry_lookup(vxlan, );
> + if (mdb_entry)
> + return mdb_entry;
> +
> + memset(, 0, sizeof(group.src));
> + mdb_entry = vxlan_mdb_entry_lookup(vxlan, );
> + if (mdb_entry)
> + return mdb_entry;
> +
> + /* No (S, G) 

Re: [Bridge] [PATCH net-next 08/11] vxlan: mdb: Add an internal flag to indicate MDB usage

2023-03-14 Thread Nikolay Aleksandrov
On 13/03/2023 16:53, Ido Schimmel wrote:
> Add an internal flag to indicate whether MDB entries are configured or
> not. Set the flag after installing the first MDB entry and clear it
> before deleting the last one.
> 
> The flag will be consulted by the data path which will only perform an
> MDB lookup if the flag is set, thereby keeping the MDB overhead to a
> minimum when the MDB is not used.
> 
> Another option would have been to use a static key, but it is global and
> not per-device, unlike the current approach.
> 
> Signed-off-by: Ido Schimmel 
> ---
>  drivers/net/vxlan/vxlan_mdb.c | 7 +++
>  include/net/vxlan.h   | 1 +
>  2 files changed, 8 insertions(+)
> 

Reviewed-by: Nikolay Aleksandrov 




Re: [Bridge] [PATCH net-next 07/11] vxlan: mdb: Add MDB control path support

2023-03-14 Thread Nikolay Aleksandrov
On 13/03/2023 16:53, Ido Schimmel wrote:
> Implement MDB control path support, enabling the creation, deletion,
> replacement and dumping of MDB entries in a similar fashion to the
> bridge driver. Unlike the bridge driver, each entry stores a list of
> remote VTEPs to which matched packets need to be replicated to and not a
> list of bridge ports.
> 
> The motivating use case is the installation of MDB entries by a user
> space control plane in response to received EVPN routes. As such, only
> allow permanent MDB entries to be installed and do not implement
> snooping functionality, avoiding a lot of unnecessary complexity.
> 
> Since entries can only be modified by user space under RTNL, use RTNL as
> the write lock. Use RCU to ensure that MDB entries and remotes are not
> freed while being accessed from the data path during transmission.
> 
> In terms of uAPI, reuse the existing MDB netlink interface, but add a
> few new attributes to request and response messages:
> 
> * IP address of the destination VXLAN tunnel endpoint where the
>   multicast receivers reside.
> 
> * UDP destination port number to use to connect to the remote VXLAN
>   tunnel endpoint.
> 
> * VXLAN VNI Network Identifier to use to connect to the remote VXLAN
>   tunnel endpoint. Required when Ingress Replication (IR) is used and
>   the remote VTEP is not a member of originating broadcast domain
>   (VLAN/VNI) [1].
> 
> * Source VNI Network Identifier the MDB entry belongs to. Used only when
>   the VXLAN device is in external mode.
> 
> * Interface index of the outgoing interface to reach the remote VXLAN
>   tunnel endpoint. This is required when the underlay destination IP is
>   multicast (P2MP), as the multicast routing tables are not consulted.
> 
> All the new attributes are added under the 'MDBA_SET_ENTRY_ATTRS' nest
> which is strictly validated by the bridge driver, thereby automatically
> rejecting the new attributes.
> 
> [1] 
> https://datatracker.ietf.org/doc/html/draft-ietf-bess-evpn-irb-mcast#section-3.2.2
> 
> Signed-off-by: Ido Schimmel 
> ---
> 
> Notes:
> v1:
> * Remove restrictions regarding mixing of multicast and unicast remote
>   destination IPs in an MDB entry. While such configuration does not
>   make sense to me, it is no forbidden by the VXLAN FDB code and does
>   not crash the kernel.
> * Fix check regarding all-zeros MDB entry and source.
> 
>  drivers/net/vxlan/Makefile|2 +-
>  drivers/net/vxlan/vxlan_core.c|8 +
>  drivers/net/vxlan/vxlan_mdb.c | 1341 +
>  drivers/net/vxlan/vxlan_private.h |   31 +
>  include/net/vxlan.h   |5 +
>  include/uapi/linux/if_bridge.h|   10 +
>  6 files changed, 1396 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/net/vxlan/vxlan_mdb.c


Reviewed-by: Nikolay Aleksandrov 



Re: [Bridge] [PATCH net-next 06/11] vxlan: Expose vxlan_xmit_one()

2023-03-14 Thread Nikolay Aleksandrov
On 13/03/2023 16:53, Ido Schimmel wrote:
> Given a packet and a remote destination, the function will take care of
> encapsulating the packet and transmitting it to the destination.
> 
> Expose it so that it could be used in subsequent patches by the MDB code
> to transmit a packet to the remote destination(s) stored in the MDB
> entry.
> 
> It will allow us to keep the MDB code self-contained, not exposing its
> data structures to the rest of the VXLAN driver.
> 
> Signed-off-by: Ido Schimmel 
> ---
>  drivers/net/vxlan/vxlan_core.c| 5 ++---
>  drivers/net/vxlan/vxlan_private.h | 2 ++
>  2 files changed, 4 insertions(+), 3 deletions(-)
> 

Reviewed-by: Nikolay Aleksandrov 




  1   2   3   4   5   6   7   8   9   10   >