from:"\"nikolay\""

[PATCH] tracing/kprobes: Add missing check for skipping symbol counting logic

2024-07-04 Thread Nikolay Kuratov

In commit b022f0c7e404
("tracing/kprobes: Return EADDRNOTAVAIL when func matches several symbols")
it was already done for symbol in __trace_kprobe_create(), but func in
create_local_trace_kprobe() is also valid usage.
Not doing so leads to ENOENT for module_name:symbol_name
constructions passed over perf_event_open().

Fixes: b022f0c7e404 ("tracing/kprobes: Return EADDRNOTAVAIL when func matches 
several symbols")
Signed-off-by: Nikolay Kuratov 
---
 kernel/trace/trace_kprobe.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 16383247bdbf..3f4c9195dc0d 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1834,7 +1834,7 @@ create_local_trace_kprobe(char *func, void *addr, 
unsigned long offs,
int ret;
char *event;
 
-   if (func) {
+   if (func && !strchr(func, ':')) {
unsigned int count;
 
count = number_of_same_symbols(func);
-- 
2.34.1

Re: [PATCH v3] x86/paravirt: Disable virt spinlock on bare metal

2024-06-25 Thread Nikolay Borisov





On 25.06.24 г. 15:54 ч., Chen Yu wrote:

The kernel can change spinlock behavior when running as a guest. But
this guest-friendly behavior causes performance problems on bare metal.
So there's a 'virt_spin_lock_key' static key to switch between the two
modes.

The static key is always enabled by default (run in guest mode) and
should be disabled for bare metal (and in some guests that want native
behavior).

Performance drop is reported when running encode/decode workload and
BenchSEE cache sub-workload.
Bisect points to commit ce0a1b608bfc ("x86/paravirt: Silence unused
native_pv_lock_init() function warning"). When CONFIG_PARAVIRT_SPINLOCKS
is disabled the virt_spin_lock_key is incorrectly set to true on bare
metal. The qspinlock degenerates to test-and-set spinlock, which
decrease the performance on bare metal.

Set the default value of virt_spin_lock_key to false. If booting in a VM,
enable this key. Later during the VM initialization, if other
high-efficient spinlock is preferred(paravirt-spinlock eg), the
virt_spin_lock_key is disabled accordingly. The relation is described as
below:

X86_FEATURE_HYPERVISOR YYY N
CONFIG_PARAVIRT_SPINLOCKS  YYN Y/N
PV spinlockYNN Y/N

virt_spin_lock_key NNY N

Fixes: ce0a1b608bfc ("x86/paravirt: Silence unused native_pv_lock_init() function 
warning")
Suggested-by: Dave Hansen 
Suggested-by: Qiuxu Zhuo 
Suggested-by: Nikolay Borisov 
Reported-by: Prem Nath Dey 
Reported-by: Xiaoping Zhou 
Signed-off-by: Chen Yu 


Reviewed-by: Nikolay Borisov

Re: [PATCH v3] x86/paravirt: Disable virt spinlock on bare metal

2024-06-25 Thread Nikolay Borisov





On 25.06.24 г. 17:50 ч., Chen Yu wrote:

On 2024-06-25 at 16:42:11 +0300, Nikolay Borisov wrote:



On 25.06.24 г. 15:54 ч., Chen Yu wrote:

The kernel can change spinlock behavior when running as a guest. But
this guest-friendly behavior causes performance problems on bare metal.
So there's a 'virt_spin_lock_key' static key to switch between the two
modes.

The static key is always enabled by default (run in guest mode) and
should be disabled for bare metal (and in some guests that want native
behavior).

Performance drop is reported when running encode/decode workload and
BenchSEE cache sub-workload.
Bisect points to commit ce0a1b608bfc ("x86/paravirt: Silence unused
native_pv_lock_init() function warning"). When CONFIG_PARAVIRT_SPINLOCKS
is disabled the virt_spin_lock_key is incorrectly set to true on bare
metal. The qspinlock degenerates to test-and-set spinlock, which
decrease the performance on bare metal.

Set the default value of virt_spin_lock_key to false. If booting in a VM,
enable this key. Later during the VM initialization, if other
high-efficient spinlock is preferred(paravirt-spinlock eg), the
virt_spin_lock_key is disabled accordingly. The relation is described as
below:

X86_FEATURE_HYPERVISOR YYY N
CONFIG_PARAVIRT_SPINLOCKS  YYN Y/N
PV spinlockYNN Y/N

virt_spin_lock_key NNY N

-DEFINE_STATIC_KEY_TRUE(virt_spin_lock_key);
+DEFINE_STATIC_KEY_FALSE(virt_spin_lock_key);
   void __init native_pv_lock_init(void)
   {
-   if (IS_ENABLED(CONFIG_PARAVIRT_SPINLOCKS) &&


Actually now shouldn't the CONFIG_PARAVIRT_SPINLOCKS check be retained?
Otherwise we'll have the virtspinlock enabled even if we are a guest but
CONFIG_PARAVIRT_SPINLOCKS is disabled, no ?



It seems to be the expected behavior? If CONFIG_PARAVIRT_SPINLOCKS is disabled,
should the virt_spin_lock_key be enabled in the guest?


No, but if it's disabled and we are under a hypervisor shouldn't the 
virt spinlock be kept disabled? As it stands now everytime we are under 
a hypervisor the virt spinlock is enabled irrespective of the 
PARAVIRT_SPINLOCK config state.



The previous behavior before commit ce0a1b608bfc ("x86/paravirt: Silence unused
native_pv_lock_init() function warning"): kvm_spinlock_init() is NULL if
CONFIG_PARAVIRT_SPINLOCKS is disabled, and 
static_branch_disable(&virt_spin_lock_key)
can not be invoked, so the virt_spin_lock_key keeps enabled.

thanks,
Chenyu

Re: [PATCH v3] x86/paravirt: Disable virt spinlock on bare metal

2024-06-25 Thread Nikolay Borisov





On 25.06.24 г. 15:54 ч., Chen Yu wrote:

The kernel can change spinlock behavior when running as a guest. But
this guest-friendly behavior causes performance problems on bare metal.
So there's a 'virt_spin_lock_key' static key to switch between the two
modes.

The static key is always enabled by default (run in guest mode) and
should be disabled for bare metal (and in some guests that want native
behavior).

Performance drop is reported when running encode/decode workload and
BenchSEE cache sub-workload.
Bisect points to commit ce0a1b608bfc ("x86/paravirt: Silence unused
native_pv_lock_init() function warning"). When CONFIG_PARAVIRT_SPINLOCKS
is disabled the virt_spin_lock_key is incorrectly set to true on bare
metal. The qspinlock degenerates to test-and-set spinlock, which
decrease the performance on bare metal.

Set the default value of virt_spin_lock_key to false. If booting in a VM,
enable this key. Later during the VM initialization, if other
high-efficient spinlock is preferred(paravirt-spinlock eg), the
virt_spin_lock_key is disabled accordingly. The relation is described as
below:

X86_FEATURE_HYPERVISOR YYY N
CONFIG_PARAVIRT_SPINLOCKS  YYN Y/N
PV spinlockYNN Y/N

virt_spin_lock_key NNY N

Fixes: ce0a1b608bfc ("x86/paravirt: Silence unused native_pv_lock_init() function 
warning")
Suggested-by: Dave Hansen 
Suggested-by: Qiuxu Zhuo 
Suggested-by: Nikolay Borisov 
Reported-by: Prem Nath Dey 
Reported-by: Xiaoping Zhou 
Signed-off-by: Chen Yu 
---
v2._v3:
   Change the default value of virt_spin_lock_key from true to false.
   Enable this key when it is in the VM, and disable it when needed.
   This makes the code more readable. (Nikolay Borisov)
   Dropped Reviewed-by because the code has been changed.
v1->v2:
   Refine the commit log per Dave's suggestion.
   Simplify the fix by directly disabling the virt_spin_lock_key on bare metal.
   Collect Reviewed-by from Juergen.
---
  arch/x86/include/asm/qspinlock.h | 4 ++--
  arch/x86/kernel/paravirt.c   | 7 +++
  2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h
index a053c1293975..a32bd2aabdf9 100644
--- a/arch/x86/include/asm/qspinlock.h
+++ b/arch/x86/include/asm/qspinlock.h
@@ -66,13 +66,13 @@ static inline bool vcpu_is_preempted(long cpu)
  
  #ifdef CONFIG_PARAVIRT

  /*
- * virt_spin_lock_key - enables (by default) the virt_spin_lock() hijack.
+ * virt_spin_lock_key - disables (by default) the virt_spin_lock() hijack.
   *
   * Native (and PV wanting native due to vCPU pinning) should disable this key.
   * It is done in this backwards fashion to only have a single direction 
change,
   * which removes ordering between native_pv_spin_init() and HV setup.
   */
-DECLARE_STATIC_KEY_TRUE(virt_spin_lock_key);
+DECLARE_STATIC_KEY_FALSE(virt_spin_lock_key);
  
  /*

   * Shortcut for the queued_spin_lock_slowpath() function that allows
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 5358d43886ad..fec381533555 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -51,13 +51,12 @@ DEFINE_ASM_FUNC(pv_native_irq_enable, "sti", .noinstr.text);
  DEFINE_ASM_FUNC(pv_native_read_cr2, "mov %cr2, %rax", .noinstr.text);
  #endif
  
-DEFINE_STATIC_KEY_TRUE(virt_spin_lock_key);

+DEFINE_STATIC_KEY_FALSE(virt_spin_lock_key);
  
  void __init native_pv_lock_init(void)

  {
-   if (IS_ENABLED(CONFIG_PARAVIRT_SPINLOCKS) &&


Actually now shouldn't the CONFIG_PARAVIRT_SPINLOCKS check be retained? 
Otherwise we'll have the virtspinlock enabled even if we are a guest but 
CONFIG_PARAVIRT_SPINLOCKS is disabled, no ?



-   !boot_cpu_has(X86_FEATURE_HYPERVISOR))
-   static_branch_disable(&virt_spin_lock_key);
+   if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
+   static_branch_enable(&virt_spin_lock_key);
  }
  
  static void native_tlb_remove_table(struct mmu_gather *tlb, void *table)

Re: [PATCH v2] x86/paravirt: Disable virt spinlock on bare metal

2024-06-19 Thread Nikolay Borisov





On 19.06.24 г. 18:25 ч., Chen Yu wrote:

Hi Nikolay,

On 2024-06-18 at 11:24:42 +0300, Nikolay Borisov wrote:



On 26.05.24 г. 4:58 ч., Chen Yu wrote:

The kernel can change spinlock behavior when running as a guest. But
this guest-friendly behavior causes performance problems on bare metal.
So there's a 'virt_spin_lock_key' static key to switch between the two
modes.

The static key is always enabled by default (run in guest mode) and
should be disabled for bare metal (and in some guests that want native
behavior).

Performance drop is reported when running encode/decode workload and
BenchSEE cache sub-workload.
Bisect points to commit ce0a1b608bfc ("x86/paravirt: Silence unused
native_pv_lock_init() function warning"). When CONFIG_PARAVIRT_SPINLOCKS
is disabled the virt_spin_lock_key is incorrectly set to true on bare
metal. The qspinlock degenerates to test-and-set spinlock, which
decrease the performance on bare metal.

Fix this by disabling virt_spin_lock_key if it is on bare metal,
regardless of CONFIG_PARAVIRT_SPINLOCKS.



nit:

This bug wouldn't have happened if the key was defined FALSE by default and
only enabled in the appropriate case. I think it makes more sense to invert
the logic and have the key FALSE by default and only enable it iff the
kernel is running under a hypervisor... At worst only the virtualization
case would suffer if the lock is falsely not enabled.


Thank you for your review. I agree, initializing the key to FALSE by default 
seems
to be more readable. Could this change be the subsequent adjustment based
on current fix, which could be more bisectible?


Why can't this change be squashed in the current proposed patch?




Set the default key to false. If booting in a VM, enable this key. Later during
the VM initialization, if other high-efficient spinlock is preferred, like
paravirt-spinlock, the virt_spin_lock_key will be disabled accordingly.


Yep, or simply during the initialization stage the correct flavor will 
be chosen, no need to do the on-off dance. But that's a topic for a 
different discussion.




diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h
index cde8357bb226..a7d3ba00e70e 100644
--- a/arch/x86/include/asm/qspinlock.h
+++ b/arch/x86/include/asm/qspinlock.h
@@ -66,13 +66,13 @@ static inline bool vcpu_is_preempted(long cpu)
  
  #ifdef CONFIG_PARAVIRT

  /*
- * virt_spin_lock_key - enables (by default) the virt_spin_lock() hijack.
+ * virt_spin_lock_key - disables (by default) the virt_spin_lock() hijack.
   *
   * Native (and PV wanting native due to vCPU pinning) should disable this key.
   * It is done in this backwards fashion to only have a single direction 
change,
   * which removes ordering between native_pv_spin_init() and HV setup.
   */
-DECLARE_STATIC_KEY_TRUE(virt_spin_lock_key);
+DECLARE_STATIC_KEY_FALSE(virt_spin_lock_key);
  
  /*

   * Shortcut for the queued_spin_lock_slowpath() function that allows
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index c193c9e60a1b..fec381533555 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -51,12 +51,12 @@ DEFINE_ASM_FUNC(pv_native_irq_enable, "sti", .noinstr.text);
  DEFINE_ASM_FUNC(pv_native_read_cr2, "mov %cr2, %rax", .noinstr.text);
  #endif
  
-DEFINE_STATIC_KEY_TRUE(virt_spin_lock_key);

+DEFINE_STATIC_KEY_FALSE(virt_spin_lock_key);
  
  void __init native_pv_lock_init(void)

  {
-   if (!boot_cpu_has(X86_FEATURE_HYPERVISOR))
-   static_branch_disable(&virt_spin_lock_key);
+   if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
+   static_branch_enable(&virt_spin_lock_key);
  }
  
  static void native_tlb_remove_table(struct mmu_gather *tlb, void *table)

Re: [PATCH v2] x86/paravirt: Disable virt spinlock on bare metal

2024-06-18 Thread Nikolay Borisov





On 26.05.24 г. 4:58 ч., Chen Yu wrote:

The kernel can change spinlock behavior when running as a guest. But
this guest-friendly behavior causes performance problems on bare metal.
So there's a 'virt_spin_lock_key' static key to switch between the two
modes.

The static key is always enabled by default (run in guest mode) and
should be disabled for bare metal (and in some guests that want native
behavior).

Performance drop is reported when running encode/decode workload and
BenchSEE cache sub-workload.
Bisect points to commit ce0a1b608bfc ("x86/paravirt: Silence unused
native_pv_lock_init() function warning"). When CONFIG_PARAVIRT_SPINLOCKS
is disabled the virt_spin_lock_key is incorrectly set to true on bare
metal. The qspinlock degenerates to test-and-set spinlock, which
decrease the performance on bare metal.

Fix this by disabling virt_spin_lock_key if it is on bare metal,
regardless of CONFIG_PARAVIRT_SPINLOCKS.



nit:

This bug wouldn't have happened if the key was defined FALSE by default 
and only enabled in the appropriate case. I think it makes more sense to 
invert the logic and have the key FALSE by default and only enable it 
iff the kernel is running under a hypervisor... At worst only the 
virtualization case would suffer if the lock is falsely not enabled.

[PATCH v2] vsock/virtio: Fix unsigned integer wrap around in virtio_transport_has_space()

2023-12-11 Thread Nikolay Kuratov

We need to do signed arithmetic if we expect condition
`if (bytes < 0)` to be possible

Found by Linux Verification Center (linuxtesting.org) with SVACE

Fixes: 06a8fc78367d ("VSOCK: Introduce virtio_vsock_common.ko")
Signed-off-by: Nikolay Kuratov 
---

V1 -> V2: Added Fixes section

 net/vmw_vsock/virtio_transport_common.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/vmw_vsock/virtio_transport_common.c 
b/net/vmw_vsock/virtio_transport_common.c
index c8e162c9d1df..6df246b53260 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -843,7 +843,7 @@ static s64 virtio_transport_has_space(struct vsock_sock 
*vsk)
struct virtio_vsock_sock *vvs = vsk->trans;
s64 bytes;
 
-   bytes = vvs->peer_buf_alloc - (vvs->tx_cnt - vvs->peer_fwd_cnt);
+   bytes = (s64)vvs->peer_buf_alloc - (vvs->tx_cnt - vvs->peer_fwd_cnt);
if (bytes < 0)
bytes = 0;
 
-- 
2.34.1

[PATCH] vsock/virtio: Fix unsigned integer wrap around in virtio_transport_has_space()

2023-12-11 Thread Nikolay Kuratov

We need to do signed arithmetic if we expect condition
`if (bytes < 0)` to be possible

Found by Linux Verification Center (linuxtesting.org) with SVACE

Signed-off-by: Nikolay Kuratov 
---
 net/vmw_vsock/virtio_transport_common.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/vmw_vsock/virtio_transport_common.c 
b/net/vmw_vsock/virtio_transport_common.c
index c8e162c9d1df..6df246b53260 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -843,7 +843,7 @@ static s64 virtio_transport_has_space(struct vsock_sock 
*vsk)
struct virtio_vsock_sock *vvs = vsk->trans;
s64 bytes;
 
-   bytes = vvs->peer_buf_alloc - (vvs->tx_cnt - vvs->peer_fwd_cnt);
+   bytes = (s64)vvs->peer_buf_alloc - (vvs->tx_cnt - vvs->peer_fwd_cnt);
if (bytes < 0)
bytes = 0;
 
-- 
2.34.1

Re: [PATCH] fs: btrfs: Remove repeated struct declaration

2021-04-01 Thread Nikolay Borisov




On 1.04.21 г. 11:03, Wan Jiabing wrote:
> struct btrfs_inode is declared twice. One is declared at 67th line.
> The blew declaration is not needed. Remove the duplicate.
> struct btrfs_fs_info should be declared in the forward declarations.
> Move it to the forward declarations.
> 
> Signed-off-by: Wan Jiabing 

Reviewed-by: Nikolay Borisov

Re: [PATCH][next] net: bridge: Fix missing return assignment from br_vlan_replay_one call

2021-03-24 Thread Nikolay Aleksandrov

On 24/03/2021 17:09, Colin King wrote:
> From: Colin Ian King 
> 
> The call to br_vlan_replay_one is returning an error return value but
> this is not being assigned to err and the following check on err is
> currently always false because err was initialized to zero. Fix this
> by assigning err.
> 
> Addresses-Coverity: ("'Constant' variable guards dead code")
> Fixes: 22f67cdfae6a ("net: bridge: add helper to replay VLANs installed on 
> port")
> Signed-off-by: Colin Ian King 
> ---
>  net/bridge/br_vlan.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
> index ca8daccff217..7422691230b1 100644
> --- a/net/bridge/br_vlan.c
> +++ b/net/bridge/br_vlan.c
> @@ -1815,7 +1815,7 @@ int br_vlan_replay(struct net_device *br_dev, struct 
> net_device *dev,
>   if (!br_vlan_should_use(v))
>   continue;
>  
> - br_vlan_replay_one(nb, dev, &vlan, extack);
> + err = br_vlan_replay_one(nb, dev, &vlan, extack);
>   if (err)
>   return err;
>   }
> 

Thanks,
Acked-by: Nikolay Aleksandrov

Re: [PATCH v4 net-next 05/11] net: bridge: add helper to replay VLANs installed on port

2021-03-23 Thread Nikolay Aleksandrov

On 23/03/2021 01:51, Vladimir Oltean wrote:
> From: Vladimir Oltean 
> 
> Currently this simple setup with DSA:
> 
> ip link add br0 type bridge vlan_filtering 1
> ip link add bond0 type bond
> ip link set bond0 master br0
> ip link set swp0 master bond0
> 
> will not work because the bridge has created the PVID in br_add_if ->
> nbp_vlan_init, and it has notified switchdev of the existence of VLAN 1,
> but that was too early, since swp0 was not yet a lower of bond0, so it
> had no reason to act upon that notification.
> 
> We need a helper in the bridge to replay the switchdev VLAN objects that
> were notified since the bridge port creation, because some of them may
> have been missed.
> 
> As opposed to the br_mdb_replay function, the vg->vlan_list write side
> protection is offered by the rtnl_mutex which is sleepable, so we don't
> need to queue up the objects in atomic context, we can replay them right
> away.
> 
> Signed-off-by: Vladimir Oltean 
> ---
>  include/linux/if_bridge.h | 10 ++
>  net/bridge/br_vlan.c  | 73 +++
>  2 files changed, 83 insertions(+)
> 

Same comments about the const qualifiers as the other patches.
The code looks good to me otherwise.

Acked-by: Nikolay Aleksandrov 

> diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
> index b564c4486a45..2cc35038a8ca 100644
> --- a/include/linux/if_bridge.h
> +++ b/include/linux/if_bridge.h
> @@ -111,6 +111,8 @@ int br_vlan_get_pvid_rcu(const struct net_device *dev, 
> u16 *p_pvid);
>  int br_vlan_get_proto(const struct net_device *dev, u16 *p_proto);
>  int br_vlan_get_info(const struct net_device *dev, u16 vid,
>struct bridge_vlan_info *p_vinfo);
> +int br_vlan_replay(struct net_device *br_dev, struct net_device *dev,
> +struct notifier_block *nb, struct netlink_ext_ack *extack);
>  #else
>  static inline bool br_vlan_enabled(const struct net_device *dev)
>  {
> @@ -137,6 +139,14 @@ static inline int br_vlan_get_info(const struct 
> net_device *dev, u16 vid,
>  {
>   return -EINVAL;
>  }
> +
> +static inline int br_vlan_replay(struct net_device *br_dev,
> +  struct net_device *dev,
> +  struct notifier_block *nb,
> +  struct netlink_ext_ack *extack)
> +{
> + return -EOPNOTSUPP;
> +}
>  #endif
>  
>  #if IS_ENABLED(CONFIG_BRIDGE)
> diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
> index 8829f621b8ec..ca8daccff217 100644
> --- a/net/bridge/br_vlan.c
> +++ b/net/bridge/br_vlan.c
> @@ -1751,6 +1751,79 @@ void br_vlan_notify(const struct net_bridge *br,
>   kfree_skb(skb);
>  }
>  
> +static int br_vlan_replay_one(struct notifier_block *nb,
> +   struct net_device *dev,
> +   struct switchdev_obj_port_vlan *vlan,
> +   struct netlink_ext_ack *extack)
> +{
> + struct switchdev_notifier_port_obj_info obj_info = {
> + .info = {
> + .dev = dev,
> + .extack = extack,
> + },
> + .obj = &vlan->obj,
> + };
> + int err;
> +
> + err = nb->notifier_call(nb, SWITCHDEV_PORT_OBJ_ADD, &obj_info);
> + return notifier_to_errno(err);
> +}
> +
> +int br_vlan_replay(struct net_device *br_dev, struct net_device *dev,
> +struct notifier_block *nb, struct netlink_ext_ack *extack)
> +{
> + struct net_bridge_vlan_group *vg;
> + struct net_bridge_vlan *v;
> + struct net_bridge_port *p;
> + struct net_bridge *br;
> + int err = 0;
> + u16 pvid;
> +
> + ASSERT_RTNL();
> +
> + if (!netif_is_bridge_master(br_dev))
> + return -EINVAL;
> +
> + if (!netif_is_bridge_master(dev) && !netif_is_bridge_port(dev))
> + return -EINVAL;
> +
> + if (netif_is_bridge_master(dev)) {
> + br = netdev_priv(dev);
> + vg = br_vlan_group(br);
> + p = NULL;
> + } else {
> + p = br_port_get_rtnl(dev);
> + if (WARN_ON(!p))
> + return -EINVAL;
> + vg = nbp_vlan_group(p);
> + br = p->br;
> + }
> +
> + if (!vg)
> + return 0;
> +
> + pvid = br_get_pvid(vg);
> +
> + list_for_each_entry(v, &vg->vlan_list, vlist) {
> + struct switchdev_obj_port_vlan vlan = {
> + .obj.orig_dev = dev,
> + .obj.id = SWITCHDEV_OBJ_ID_PORT_VLAN,
> + .flags = br_vlan_fla

Re: [PATCH v4 net-next 03/11] net: bridge: add helper to replay port and host-joined mdb entries

2021-03-23 Thread Nikolay Aleksandrov

On 23/03/2021 01:51, Vladimir Oltean wrote:
> From: Vladimir Oltean 
> 
> I have a system with DSA ports, and udhcpcd is configured to bring
> interfaces up as soon as they are created.
> 
> I create a bridge as follows:
> 
> ip link add br0 type bridge
> 
> As soon as I create the bridge and udhcpcd brings it up, I also have
> avahi which automatically starts sending IPv6 packets to advertise some
> local services, and because of that, the br0 bridge joins the following
> IPv6 groups due to the code path detailed below:
> 
> 33:33:ff:6d:c1:9c vid 0
> 33:33:00:00:00:6a vid 0
> 33:33:00:00:00:fb vid 0
> 
> br_dev_xmit
> -> br_multicast_rcv
>-> br_ip6_multicast_add_group
>   -> __br_multicast_add_group
>  -> br_multicast_host_join
> -> br_mdb_notify
> 
> This is all fine, but inside br_mdb_notify we have br_mdb_switchdev_host
> hooked up, and switchdev will attempt to offload the host joined groups
> to an empty list of ports. Of course nobody offloads them.
> 
> Then when we add a port to br0:
> 
> ip link set swp0 master br0
> 
> the bridge doesn't replay the host-joined MDB entries from br_add_if,
> and eventually the host joined addresses expire, and a switchdev
> notification for deleting it is emitted, but surprise, the original
> addition was already completely missed.
> 
> The strategy to address this problem is to replay the MDB entries (both
> the port ones and the host joined ones) when the new port joins the
> bridge, similar to what vxlan_fdb_replay does (in that case, its FDB can
> be populated and only then attached to a bridge that you offload).
> However there are 2 possibilities: the addresses can be 'pushed' by the
> bridge into the port, or the port can 'pull' them from the bridge.
> 
> Considering that in the general case, the new port can be really late to
> the party, and there may have been many other switchdev ports that
> already received the initial notification, we would like to avoid
> delivering duplicate events to them, since they might misbehave. And
> currently, the bridge calls the entire switchdev notifier chain, whereas
> for replaying it should just call the notifier block of the new guy.
> But the bridge doesn't know what is the new guy's notifier block, it
> just knows where the switchdev notifier chain is. So for simplification,
> we make this a driver-initiated pull for now, and the notifier block is
> passed as an argument.
> 
> To emulate the calling context for mdb objects (deferred and put on the
> blocking notifier chain), we must iterate under RCU protection through
> the bridge's mdb entries, queue them, and only call them once we're out
> of the RCU read-side critical section.
> 
> There was some opportunity for reuse between br_mdb_switchdev_host_port,
> br_mdb_notify and the newly added br_mdb_queue_one in how the switchdev
> mdb object is created, so a helper was created.
> 
> Suggested-by: Ido Schimmel 
> Signed-off-by: Vladimir Oltean 
> ---
>  include/linux/if_bridge.h |   9 +++
>  include/net/switchdev.h   |   1 +
>  net/bridge/br_mdb.c   | 148 +-
>  3 files changed, 141 insertions(+), 17 deletions(-)
> 

Absolutely the same comments here as for the fdb version.
The code looks correct.

Acked-by: Nikolay Aleksandrov 

> diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
> index ebd16495459c..f6472969bb44 100644
> --- a/include/linux/if_bridge.h
> +++ b/include/linux/if_bridge.h
> @@ -69,6 +69,8 @@ bool br_multicast_has_querier_anywhere(struct net_device 
> *dev, int proto);
>  bool br_multicast_has_querier_adjacent(struct net_device *dev, int proto);
>  bool br_multicast_enabled(const struct net_device *dev);
>  bool br_multicast_router(const struct net_device *dev);
> +int br_mdb_replay(struct net_device *br_dev, struct net_device *dev,
> +   struct notifier_block *nb, struct netlink_ext_ack *extack);
>  #else
>  static inline int br_multicast_list_adjacent(struct net_device *dev,
>struct list_head *br_ip_list)
> @@ -93,6 +95,13 @@ static inline bool br_multicast_router(const struct 
> net_device *dev)
>  {
>   return false;
>  }
> +static inline int br_mdb_replay(struct net_device *br_dev,
> + struct net_device *dev,
> + struct notifier_block *nb,
> + struct netlink_ext_ack *extack)
> +{
> + return -EOPNOTSUPP;
> +}
>  #endif
>  
>  #if IS_ENABLED(CONFIG_BRIDGE) && IS_ENABLED(CONFIG_BRIDGE_VLAN_FILTERING)
> diff --git a/include/net/switchdev.h b/i

Re: [PATCH v4 net-next 04/11] net: bridge: add helper to replay port and local fdb entries

2021-03-23 Thread Nikolay Aleksandrov

On 23/03/2021 01:51, Vladimir Oltean wrote:
> From: Vladimir Oltean 
> 
> When a switchdev port starts offloading a LAG that is already in a
> bridge and has an FDB entry pointing to it:
> 
> ip link set bond0 master br0
> bridge fdb add dev bond0 00:01:02:03:04:05 master static
> ip link set swp0 master bond0
> 
> the switchdev driver will have no idea that this FDB entry is there,
> because it missed the switchdev event emitted at its creation.
> 
> Ido Schimmel pointed this out during a discussion about challenges with
> switchdev offloading of stacked interfaces between the physical port and
> the bridge, and recommended to just catch that condition and deny the
> CHANGEUPPER event:
> https://lore.kernel.org/netdev/20210210105949.gb287...@shredder.lan/
> 
> But in fact, we might need to deal with the hard thing anyway, which is
> to replay all FDB addresses relevant to this port, because it isn't just
> static FDB entries, but also local addresses (ones that are not
> forwarded but terminated by the bridge). There, we can't just say 'oh
> yeah, there was an upper already so I'm not joining that'.
> 
> So, similar to the logic for replaying MDB entries, add a function that
> must be called by individual switchdev drivers and replays local FDB
> entries as well as ones pointing towards a bridge port. This time, we
> use the atomic switchdev notifier block, since that's what FDB entries
> expect for some reason.
> 

I get the reason to have both bridge and bridge port devices (although the 
bridge
is really unnecessary as it can be inferred from the port), but it looks kind of
weird at first glance, I mean we get all of the port's fdbs and all of the 
bridge
fdbs every time (dst == NULL). The code itself is correct and the alternative
to take only 1 net_device and act based on its type would add another
step to the process per-port which also doesn't sound good...
There are a few minor const nits below too, again if there is another version
please take care of them, for the patch:

Acked-by: Nikolay Aleksandrov 

> Reported-by: Ido Schimmel 
> Signed-off-by: Vladimir Oltean 
> ---
>  include/linux/if_bridge.h |  9 +++
>  net/bridge/br_fdb.c   | 50 +++
>  2 files changed, 59 insertions(+)
> 
> diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
> index f6472969bb44..b564c4486a45 100644
> --- a/include/linux/if_bridge.h
> +++ b/include/linux/if_bridge.h
> @@ -147,6 +147,8 @@ void br_fdb_clear_offload(const struct net_device *dev, 
> u16 vid);
>  bool br_port_flag_is_set(const struct net_device *dev, unsigned long flag);
>  u8 br_port_get_stp_state(const struct net_device *dev);
>  clock_t br_get_ageing_time(struct net_device *br_dev);
> +int br_fdb_replay(struct net_device *br_dev, struct net_device *dev,
> +   struct notifier_block *nb);
>  #else
>  static inline struct net_device *
>  br_fdb_find_port(const struct net_device *br_dev,
> @@ -175,6 +177,13 @@ static inline clock_t br_get_ageing_time(struct 
> net_device *br_dev)
>  {
>   return 0;
>  }
> +
> +static inline int br_fdb_replay(struct net_device *br_dev,
> + struct net_device *dev,
> + struct notifier_block *nb)
> +{
> + return -EOPNOTSUPP;
> +}
>  #endif
>  
>  #endif
> diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
> index b7490237f3fc..698b79747d32 100644
> --- a/net/bridge/br_fdb.c
> +++ b/net/bridge/br_fdb.c
> @@ -726,6 +726,56 @@ static inline size_t fdb_nlmsg_size(void)
>   + nla_total_size(sizeof(u8)); /* NFEA_ACTIVITY_NOTIFY */
>  }
>  
> +static int br_fdb_replay_one(struct notifier_block *nb,
> +  struct net_bridge_fdb_entry *fdb,
> +  struct net_device *dev)
> +{
> + struct switchdev_notifier_fdb_info item;
> + int err;
> +
> + item.addr = fdb->key.addr.addr;
> + item.vid = fdb->key.vlan_id;
> + item.added_by_user = test_bit(BR_FDB_ADDED_BY_USER, &fdb->flags);
> + item.offloaded = test_bit(BR_FDB_OFFLOADED, &fdb->flags);
> + item.info.dev = dev;
> +
> + err = nb->notifier_call(nb, SWITCHDEV_FDB_ADD_TO_DEVICE, &item);
> + return notifier_to_errno(err);
> +}
> +
> +int br_fdb_replay(struct net_device *br_dev, struct net_device *dev,
> +   struct notifier_block *nb)

The devices can be const

> +{
> + struct net_bridge_fdb_entry *fdb;
> + struct net_bridge *br;
> + int err = 0;
> +
> + if (!netif_is_bridge_master(br_dev) || !netif_is_bridge_port(dev))
> + return -EINVAL;

Re: [PATCH v4 net-next 02/11] net: bridge: add helper to retrieve the current ageing time

2021-03-23 Thread Nikolay Aleksandrov

On 23/03/2021 01:51, Vladimir Oltean wrote:
> From: Vladimir Oltean 
> 
> The SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME attribute is only emitted from:
> 
> sysfs/ioctl/netlink
> -> br_set_ageing_time
>-> __set_ageing_time
> 
> therefore not at bridge port creation time, so:
> (a) switchdev drivers have to hardcode the initial value for the address
> ageing time, because they didn't get any notification
> (b) that hardcoded value can be out of sync, if the user changes the
> ageing time before enslaving the port to the bridge
> 
> We need a helper in the bridge, such that switchdev drivers can query
> the current value of the bridge ageing time when they start offloading
> it.
> 
> Signed-off-by: Vladimir Oltean 
> Reviewed-by: Florian Fainelli 
> Reviewed-by: Tobias Waldekranz 
> ---
>  include/linux/if_bridge.h |  6 ++
>  net/bridge/br_stp.c   | 13 +
>  2 files changed, 19 insertions(+)
> 

The patch is mostly fine, there are a few minor nits (const qualifiers). If 
there
is another version of the patch-set please add them, either way:

Acked-by: Nikolay Aleksandrov 

> diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
> index 920d3a02cc68..ebd16495459c 100644
> --- a/include/linux/if_bridge.h
> +++ b/include/linux/if_bridge.h
> @@ -137,6 +137,7 @@ struct net_device *br_fdb_find_port(const struct 
> net_device *br_dev,
>  void br_fdb_clear_offload(const struct net_device *dev, u16 vid);
>  bool br_port_flag_is_set(const struct net_device *dev, unsigned long flag);
>  u8 br_port_get_stp_state(const struct net_device *dev);
> +clock_t br_get_ageing_time(struct net_device *br_dev);
>  #else
>  static inline struct net_device *
>  br_fdb_find_port(const struct net_device *br_dev,
> @@ -160,6 +161,11 @@ static inline u8 br_port_get_stp_state(const struct 
> net_device *dev)
>  {
>   return BR_STATE_DISABLED;
>  }
> +
> +static inline clock_t br_get_ageing_time(struct net_device *br_dev)

const

> +{
> + return 0;
> +}
>  #endif
>  
>  #endif
> diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
> index 86b5e05d3f21..3dafb6143cff 100644
> --- a/net/bridge/br_stp.c
> +++ b/net/bridge/br_stp.c
> @@ -639,6 +639,19 @@ int br_set_ageing_time(struct net_bridge *br, clock_t 
> ageing_time)
>   return 0;
>  }
>  
> +clock_t br_get_ageing_time(struct net_device *br_dev)

const

> +{
> + struct net_bridge *br;

const

> +
> + if (!netif_is_bridge_master(br_dev))
> + return 0;
> +
> + br = netdev_priv(br_dev);
> +
> + return jiffies_to_clock_t(br->ageing_time);
> +}
> +EXPORT_SYMBOL_GPL(br_get_ageing_time);
> +
>  /* called under bridge lock */
>  void __br_set_topology_change(struct net_bridge *br, unsigned char val)
>  {
>

Re: [PATCH v4 net-next 01/11] net: bridge: add helper for retrieving the current bridge port STP state

2021-03-23 Thread Nikolay Aleksandrov

On 23/03/2021 01:51, Vladimir Oltean wrote:
> From: Vladimir Oltean 
> 
> It may happen that we have the following topology with DSA or any other
> switchdev driver with LAG offload:
> 
> ip link add br0 type bridge stp_state 1
> ip link add bond0 type bond
> ip link set bond0 master br0
> ip link set swp0 master bond0
> ip link set swp1 master bond0
> 
> STP decides that it should put bond0 into the BLOCKING state, and
> that's that. The ports that are actively listening for the switchdev
> port attributes emitted for the bond0 bridge port (because they are
> offloading it) and have the honor of seeing that switchdev port
> attribute can react to it, so we can program swp0 and swp1 into the
> BLOCKING state.
> 
> But if then we do:
> 
> ip link set swp2 master bond0
> 
> then as far as the bridge is concerned, nothing has changed: it still
> has one bridge port. But this new bridge port will not see any STP state
> change notification and will remain FORWARDING, which is how the
> standalone code leaves it in.
> 
> We need a function in the bridge driver which retrieves the current STP
> state, such that drivers can synchronize to it when they may have missed
> switchdev events.
> 
> Signed-off-by: Vladimir Oltean 
> Reviewed-by: Florian Fainelli 
> Reviewed-by: Tobias Waldekranz 
> ---
>  include/linux/if_bridge.h |  6 ++
>  net/bridge/br_stp.c   | 14 ++
>  2 files changed, 20 insertions(+)
> 

Acked-by: Nikolay Aleksandrov

Re: [PATCH v3 net-next 08/12] net: dsa: replay port and host-joined mdb entries when joining the bridge

2021-03-22 Thread Nikolay Aleksandrov

On 22/03/2021 18:56, Vladimir Oltean wrote:
> On Mon, Mar 22, 2021 at 06:35:10PM +0200, Nikolay Aleksandrov wrote:
>>> +   hlist_for_each_entry(mp, &br->mdb_list, mdb_node) {
>>
>> You cannot walk over these lists without the multicast lock or RCU. RTNL is 
>> not
>> enough because of various timers and leave messages that can alter both the 
>> mdb_list
>> and the port group lists. I'd prefer RCU to avoid blocking the bridge mcast.
> 
> The trouble is that I need to emulate the calling context that is
> provided to SWITCHDEV_OBJ_ID_HOST_MDB and SWITCHDEV_OBJ_ID_PORT_MDB, and
> that means blocking context.
> 
> So if I hold rcu_read_lock(), I need to queue up the mdb entries, and
> notify the driver only after I leave the RCU critical section. The
> memory footprint may temporarily blow up.
> 
> In fact this is what I did in v1:
> https://patchwork.kernel.org/project/netdevbpf/patch/20210224114350.2791260-15-olte...@gmail.com/
> 
> I just figured I could get away with rtnl_mutex protection, but it looks
> like I can't. So I guess you prefer my v1?
> 

Indeed, if you need a blocking context then you'd have to go with v1.

Re: [PATCH v3 net-next 10/12] net: dsa: replay VLANs installed on port when joining the bridge

2021-03-22 Thread Nikolay Aleksandrov

On 21/03/2021 00:34, Vladimir Oltean wrote:
> From: Vladimir Oltean 
> 
> Currently this simple setup:
> 
> ip link add br0 type bridge vlan_filtering 1
> ip link add bond0 type bond
> ip link set bond0 master br0
> ip link set swp0 master bond0
> 
> will not work because the bridge has created the PVID in br_add_if ->
> nbp_vlan_init, and it has notified switchdev of the existence of VLAN 1,
> but that was too early, since swp0 was not yet a lower of bond0, so it
> had no reason to act upon that notification.
> 
> Signed-off-by: Vladimir Oltean 
> ---
> Changes in v3:
> Made the br_vlan_replay shim return -EOPNOTSUPP.
> 
>  include/linux/if_bridge.h | 10 ++
>  net/bridge/br_vlan.c  | 71 +++
>  net/dsa/port.c|  6 
>  3 files changed, 87 insertions(+)
[snip]
> +int br_vlan_replay(struct net_device *br_dev, struct net_device *dev,
> +struct notifier_block *nb, struct netlink_ext_ack *extack)
> +{
> + struct net_bridge_vlan_group *vg;
> + struct net_bridge_vlan *v;
> + struct net_bridge_port *p;
> + struct net_bridge *br;
> + int err = 0;
> + u16 pvid;
> +
> + ASSERT_RTNL();
> +
> + if (!netif_is_bridge_master(br_dev))
> + return -EINVAL;
> +
> + if (!netif_is_bridge_master(dev) && !netif_is_bridge_port(dev))
> + return -EINVAL;
> +
> + if (netif_is_bridge_master(dev)) {
> + br = netdev_priv(dev);
> + vg = br_vlan_group(br);
> + p = NULL;
> + } else {
> + p = br_port_get_rtnl(dev);
> + if (WARN_ON(!p))
> + return -EINVAL;
> + vg = nbp_vlan_group(p);
> + br = p->br;
> + }
> +
> + if (!vg)
> + return 0;
> +
> + pvid = br_get_pvid(vg);
> +
> + list_for_each_entry(v, &vg->vlan_list, vlist) {
> + struct switchdev_obj_port_vlan vlan = {
> + .obj.orig_dev = dev,
> + .obj.id = SWITCHDEV_OBJ_ID_PORT_VLAN,
> + .flags = br_vlan_flags(v, pvid),
> + .vid = v->vid,
> + };
> +
> + if (!br_vlan_should_use(v))
> + continue;
> +
> + br_vlan_replay_one(nb, dev, &vlan, extack);
> + if (err)
> + return err;
> + }
> +
> + return err;
> +}

EXPORT_SYMBOL_GPL ?

>  /* check if v_curr can enter a range ending in range_end */
>  bool br_vlan_can_enter_range(const struct net_bridge_vlan *v_curr,
>const struct net_bridge_vlan *range_end)
> diff --git a/net/dsa/port.c b/net/dsa/port.c
> index d21a511f1e16..84775e253ee8 100644
> --- a/net/dsa/port.c
> +++ b/net/dsa/port.c
> @@ -209,6 +209,12 @@ static int dsa_port_switchdev_sync(struct dsa_port *dp,
>   if (err && err != -EOPNOTSUPP)
>   return err;
>  
> + err = br_vlan_replay(br, brport_dev,
> +  &dsa_slave_switchdev_blocking_notifier,
> +  extack);
> + if (err && err != -EOPNOTSUPP)
> + return err;
> +
>   return 0;
>  }
>  
>

Re: [PATCH v3 net-next 09/12] net: dsa: replay port and local fdb entries when joining the bridge

2021-03-22 Thread Nikolay Aleksandrov

On 21/03/2021 00:34, Vladimir Oltean wrote:
> From: Vladimir Oltean 
> 
> When a DSA port joins a LAG that already had an FDB entry pointing to it:
> 
> ip link set bond0 master br0
> bridge fdb add dev bond0 00:01:02:03:04:05 master static
> ip link set swp0 master bond0
> 
> the DSA port will have no idea that this FDB entry is there, because it
> missed the switchdev event emitted at its creation.
> 
> Ido Schimmel pointed this out during a discussion about challenges with
> switchdev offloading of stacked interfaces between the physical port and
> the bridge, and recommended to just catch that condition and deny the
> CHANGEUPPER event:
> https://lore.kernel.org/netdev/20210210105949.gb287...@shredder.lan/
> 
> But in fact, we might need to deal with the hard thing anyway, which is
> to replay all FDB addresses relevant to this port, because it isn't just
> static FDB entries, but also local addresses (ones that are not
> forwarded but terminated by the bridge). There, we can't just say 'oh
> yeah, there was an upper already so I'm not joining that'.
> 
> So, similar to the logic for replaying MDB entries, add a function that
> must be called by individual switchdev drivers and replays local FDB
> entries as well as ones pointing towards a bridge port. This time, we
> use the atomic switchdev notifier block, since that's what FDB entries
> expect for some reason.
> 
> Reported-by: Ido Schimmel 
> Signed-off-by: Vladimir Oltean 
> ---
> Changes in v3:
> Made the br_fdb_replay shim return -EOPNOTSUPP.
> 
>  include/linux/if_bridge.h |  9 +++
>  include/net/switchdev.h   |  1 +
>  net/bridge/br_fdb.c   | 52 +++
>  net/dsa/dsa_priv.h|  1 +
>  net/dsa/port.c|  4 +++
>  net/dsa/slave.c   |  2 +-
>  6 files changed, 68 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
> index f6472969bb44..b564c4486a45 100644
> --- a/include/linux/if_bridge.h
> +++ b/include/linux/if_bridge.h
> @@ -147,6 +147,8 @@ void br_fdb_clear_offload(const struct net_device *dev, 
> u16 vid);
>  bool br_port_flag_is_set(const struct net_device *dev, unsigned long flag);
>  u8 br_port_get_stp_state(const struct net_device *dev);
>  clock_t br_get_ageing_time(struct net_device *br_dev);
> +int br_fdb_replay(struct net_device *br_dev, struct net_device *dev,
> +   struct notifier_block *nb);
>  #else
>  static inline struct net_device *
>  br_fdb_find_port(const struct net_device *br_dev,
> @@ -175,6 +177,13 @@ static inline clock_t br_get_ageing_time(struct 
> net_device *br_dev)
>  {
>   return 0;
>  }
> +
> +static inline int br_fdb_replay(struct net_device *br_dev,
> + struct net_device *dev,
> + struct notifier_block *nb)
> +{
> + return -EOPNOTSUPP;
> +}
>  #endif
>  
>  #endif
> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
> index b7fc7d0f54e2..7688ec572757 100644
> --- a/include/net/switchdev.h
> +++ b/include/net/switchdev.h
> @@ -205,6 +205,7 @@ struct switchdev_notifier_info {
>  
>  struct switchdev_notifier_fdb_info {
>   struct switchdev_notifier_info info; /* must be first */
> + struct list_head list;
>   const unsigned char *addr;
>   u16 vid;
>   u8 added_by_user:1,
> diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
> index b7490237f3fc..49125cc196ac 100644
> --- a/net/bridge/br_fdb.c
> +++ b/net/bridge/br_fdb.c
> @@ -726,6 +726,58 @@ static inline size_t fdb_nlmsg_size(void)
>   + nla_total_size(sizeof(u8)); /* NFEA_ACTIVITY_NOTIFY */
>  }
>  
> +static int br_fdb_replay_one(struct notifier_block *nb,
> +  struct net_bridge_fdb_entry *fdb,
> +  struct net_device *dev)
> +{
> + struct switchdev_notifier_fdb_info item;
> + int err;
> +
> + item.addr = fdb->key.addr.addr;
> + item.vid = fdb->key.vlan_id;
> + item.added_by_user = test_bit(BR_FDB_ADDED_BY_USER, &fdb->flags);
> + item.offloaded = test_bit(BR_FDB_OFFLOADED, &fdb->flags);
> + item.info.dev = dev;
> +
> + err = nb->notifier_call(nb, SWITCHDEV_FDB_ADD_TO_DEVICE, &item);
> + return notifier_to_errno(err);
> +}
> +
> +int br_fdb_replay(struct net_device *br_dev, struct net_device *dev,
> +   struct notifier_block *nb)
> +{
> + struct net_bridge_fdb_entry *fdb;
> + struct net_bridge *br;
> + int err = 0;
> +
> + if (!netif_is_bridge_master(br_dev))
> + return -EINVAL;
> +
> + if (!netif_is_bridge_port(dev))
> + return -EINVAL;
> +
> + br = netdev_priv(br_dev);
> +
> + rcu_read_lock();
> +
> + hlist_for_each_entry_rcu(fdb, &br->fdb_list, fdb_node) {
> + struct net_device *dst_dev;
> +
> + dst_dev = fdb->dst ? fdb->dst->dev : br->dev;

Please use READ_ONCE() to read fdb->dst and then check the result here.
I'll soon send patches to annotate all f

Re: [PATCH v3 net-next 08/12] net: dsa: replay port and host-joined mdb entries when joining the bridge

2021-03-22 Thread Nikolay Aleksandrov

On 21/03/2021 00:34, Vladimir Oltean wrote:
> From: Vladimir Oltean 
> 
> I have udhcpcd in my system and this is configured to bring interfaces
> up as soon as they are created.
> 
> I create a bridge as follows:
> 
> ip link add br0 type bridge
> 
> As soon as I create the bridge and udhcpcd brings it up, I also have
> avahi which automatically starts sending IPv6 packets to advertise some
> local services, and because of that, the br0 bridge joins the following
> IPv6 groups due to the code path detailed below:
> 
> 33:33:ff:6d:c1:9c vid 0
> 33:33:00:00:00:6a vid 0
> 33:33:00:00:00:fb vid 0
> 
> br_dev_xmit
> -> br_multicast_rcv
>-> br_ip6_multicast_add_group
>   -> __br_multicast_add_group
>  -> br_multicast_host_join
> -> br_mdb_notify
> 
> This is all fine, but inside br_mdb_notify we have br_mdb_switchdev_host
> hooked up, and switchdev will attempt to offload the host joined groups
> to an empty list of ports. Of course nobody offloads them.
> 
> Then when we add a port to br0:
> 
> ip link set swp0 master br0
> 
> the bridge doesn't replay the host-joined MDB entries from br_add_if,
> and eventually the host joined addresses expire, and a switchdev
> notification for deleting it is emitted, but surprise, the original
> addition was already completely missed.
> 
> The strategy to address this problem is to replay the MDB entries (both
> the port ones and the host joined ones) when the new port joins the
> bridge, similar to what vxlan_fdb_replay does (in that case, its FDB can
> be populated and only then attached to a bridge that you offload).
> However there are 2 possibilities: the addresses can be 'pushed' by the
> bridge into the port, or the port can 'pull' them from the bridge.
> 
> Considering that in the general case, the new port can be really late to
> the party, and there may have been many other switchdev ports that
> already received the initial notification, we would like to avoid
> delivering duplicate events to them, since they might misbehave. And
> currently, the bridge calls the entire switchdev notifier chain, whereas
> for replaying it should just call the notifier block of the new guy.
> But the bridge doesn't know what is the new guy's notifier block, it
> just knows where the switchdev notifier chain is. So for simplification,
> we make this a driver-initiated pull for now, and the notifier block is
> passed as an argument.
> 
> To emulate the calling context for mdb objects (deferred and put on the
> blocking notifier chain), we must iterate under RCU protection through
> the bridge's mdb entries, queue them, and only call them once we're out
> of the RCU read-side critical section.
> 
> Suggested-by: Ido Schimmel 
> Signed-off-by: Vladimir Oltean 
> ---
> Changes in v3:
> - Removed the implication that avahi is crap from the commit message.
> - Made the br_mdb_replay shim return -EOPNOTSUPP.
> 
>  include/linux/if_bridge.h |  9 +
>  net/bridge/br_mdb.c   | 84 +++
>  net/dsa/dsa_priv.h|  2 +
>  net/dsa/port.c|  6 +++
>  net/dsa/slave.c   |  2 +-
>  5 files changed, 102 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
> index ebd16495459c..f6472969bb44 100644
> --- a/include/linux/if_bridge.h
> +++ b/include/linux/if_bridge.h
> @@ -69,6 +69,8 @@ bool br_multicast_has_querier_anywhere(struct net_device 
> *dev, int proto);
>  bool br_multicast_has_querier_adjacent(struct net_device *dev, int proto);
>  bool br_multicast_enabled(const struct net_device *dev);
>  bool br_multicast_router(const struct net_device *dev);
> +int br_mdb_replay(struct net_device *br_dev, struct net_device *dev,
> +   struct notifier_block *nb, struct netlink_ext_ack *extack);
>  #else
>  static inline int br_multicast_list_adjacent(struct net_device *dev,
>struct list_head *br_ip_list)
> @@ -93,6 +95,13 @@ static inline bool br_multicast_router(const struct 
> net_device *dev)
>  {
>   return false;
>  }
> +static inline int br_mdb_replay(struct net_device *br_dev,
> + struct net_device *dev,
> + struct notifier_block *nb,
> + struct netlink_ext_ack *extack)
> +{
> + return -EOPNOTSUPP;
> +}
>  #endif
>  
>  #if IS_ENABLED(CONFIG_BRIDGE) && IS_ENABLED(CONFIG_BRIDGE_VLAN_FILTERING)
> diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
> index 8846c5bcd075..23973186094c 100644
> --- a/net/bridge/br_mdb.c
> +++ b/net/bridge/br_mdb.c
> @@ -506,6 +506,90 @@ static void br_mdb_complete(struct net_device *dev, int 
> err, void *priv)
>   kfree(priv);
>  }
>  
> +static int br_mdb_replay_one(struct notifier_block *nb, struct net_device 
> *dev,
> +  struct net_bridge_mdb_entry *mp, int obj_id,
> +  struct net_device *orig_dev,
> +  str

Re: [PATCH v3 net-next 00/12] Better support for sandwiched LAGs with bridge and DSA

2021-03-22 Thread Nikolay Aleksandrov

On 21/03/2021 00:34, Vladimir Oltean wrote:
> From: Vladimir Oltean 
> 
> The objective of this series is to make LAG uppers on top of switchdev
> ports work regardless of which order we link interfaces to their masters
> (first make the port join the LAG, then the LAG join the bridge, or the
> other way around).
> 
> There was a design decision to be made in patches 2-4 on whether we
> should adopt the "push" model (which attempts to solve the problem
> centrally, in the bridge layer) where the driver just calls:
> 
>   switchdev_bridge_port_offloaded(brport_dev,
>   &atomic_notifier_block,
>   &blocking_notifier_block,
>   extack);
> 
> and the bridge just replays the entire collection of switchdev port
> attributes and objects that it has, in some predefined order and with
> some predefined error handling logic;
> 
> 
> or the "pull" model (which attempts to solve the problem by giving the
> driver the rope to hang itself), where the driver, apart from calling:
> 
>   switchdev_bridge_port_offloaded(brport_dev, extack);
> 
> has the task of "dumpster diving" (as Tobias puts it) through the bridge
> attributes and objects by itself, by calling:
> 
>   - br_vlan_replay
>   - br_fdb_replay
>   - br_mdb_replay
>   - br_vlan_enabled
>   - br_port_flag_is_set
>   - br_port_get_stp_state
>   - br_multicast_router
>   - br_get_ageing_time
> 
> (not necessarily all of them, and not necessarily in this order, and
> with driver-defined error handling).
> 
> Even though I'm not in love myself with the "pull" model, I chose it
> because there is a fundamental trick with replaying switchdev events
> like this:
> 
> ip link add br0 type bridge
> ip link add bond0 type bond
> ip link set bond0 master br0
> ip link set swp0 master bond0 <- this will replay the objects once for
>  the bond0 bridge port, and the swp0
>  switchdev port will process them
> ip link set swp1 master bond0 <- this will replay the objects again for
>  the bond0 bridge port, and the swp1
>  switchdev port will see them, but swp0
>  will see them for the second time now
> 
> Basically I believe that it is implementation defined whether the driver
> wants to error out on switchdev objects seen twice on a port, and the
> bridge should not enforce a certain model for that. For example, for FDB
> entries added to a bonding interface, the underling switchdev driver
> might have an abstraction for just that: an FDB entry pointing towards a
> logical (as opposed to physical) port. So when the second port joins the
> bridge, it doesn't realy need to replay FDB entries, since there is
> already at least one hardware port which has been receiving those
> events, and the FDB entries don't need to be added a second time to the
> same logical port.
> In the other corner, we have the drivers that handle switchdev port
> attributes on a LAG as individual switchdev port attributes on physical
> ports (example: VLAN filtering). In fact, the switchdev_handle_port_attr_set
> helper facilitates this: it is a fan-out from a single orig_dev towards
> multiple lowers that pass the check_cb().
> But that's the point: switchdev_handle_port_attr_set is just a helper
> which the driver _opts_ to use. The bridge can't enforce the "push"
> model, because that would assume that all drivers handle port attributes
> in the same way, which is probably false.
> 
> For this reason, I preferred to go with the "pull" mode for this patch
> set. Just to see how bad it is for other switchdev drivers to copy-paste
> this logic, I added the pull support to ocelot too, and I think it's
> pretty manageable.
> 
> Vladimir Oltean (12):
>   net: dsa: call dsa_port_bridge_join when joining a LAG that is already
> in a bridge
>   net: dsa: pass extack to dsa_port_{bridge,lag}_join
>   net: dsa: inherit the actual bridge port flags at join time
>   net: dsa: sync up with bridge port's STP state when joining
>   net: dsa: sync up VLAN filtering state when joining the bridge
>   net: dsa: sync multicast router state when joining the bridge
>   net: dsa: sync ageing time when joining the bridge
>   net: dsa: replay port and host-joined mdb entries when joining the
> bridge
>   net: dsa: replay port and local fdb entries when joining the bridge
>   net: dsa: replay VLANs installed on port when joining the bridge
>   net: ocelot: call ocelot_netdevice_bridge_join when joining a bridged
> LAG
>   net: ocelot: replay switchdev events when joining bridge
> 
>  drivers/net/dsa/ocelot/felix.c |   4 +-
>  drivers/net/ethernet/mscc/ocelot.c |  18 +--
>  drivers/net/ethernet/mscc/ocelot_net.c | 208 +
>  include/linux/if_bridge.h  |  40 +
>  include/net/switchdev.h|   1 +
>  inclu

Re: [PATCH] btrfs: Use immediate assignment when referencing cc-option

2021-03-16 Thread Nikolay Borisov




On 17.03.21 г. 0:46 ч., Victor Erminpour wrote:
> Calling cc-option will use KBUILD_CFLAGS, which when lazy setting
> subdir-ccflags-y produces the following build error:
> 
> scripts/Makefile.lib:10: *** Recursive variable `KBUILD_CFLAGS' \
>   references itself (eventually).  Stop.
> 
> Use := assignment to subdir-ccflags-y when referencing cc-option.
> This causes make to also evaluate += immediately, cc-option
> calls are done right away and we don't end up with KBUILD_CFLAGS
> referencing itself.
> 
> Signed-off-by: Victor Erminpour 
> ---
>  fs/btrfs/Makefile | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
> index b634c42115ea..3dba1336fa95 100644
> --- a/fs/btrfs/Makefile
> +++ b/fs/btrfs/Makefile
> @@ -7,10 +7,10 @@ subdir-ccflags-y += -Wmissing-format-attribute
>  subdir-ccflags-y += -Wmissing-prototypes
>  subdir-ccflags-y += -Wold-style-definition
>  subdir-ccflags-y += -Wmissing-include-dirs
> -subdir-ccflags-y += $(call cc-option, -Wunused-but-set-variable)
> -subdir-ccflags-y += $(call cc-option, -Wunused-const-variable)
> -subdir-ccflags-y += $(call cc-option, -Wpacked-not-aligned)
> -subdir-ccflags-y += $(call cc-option, -Wstringop-truncation)
> +subdir-ccflags-y := $(call cc-option, -Wunused-but-set-variable)
> +subdir-ccflags-y := $(call cc-option, -Wunused-const-variable)
> +subdir-ccflags-y := $(call cc-option, -Wpacked-not-aligned)
> +subdir-ccflags-y := $(call cc-option, -Wstringop-truncation)
>  # The following turn off the warnings enabled by -Wextra
>  subdir-ccflags-y += -Wno-missing-field-initializers
>  subdir-ccflags-y += -Wno-sign-compare
> 

Why does this patch change only some assignments and others are left as
they were?

Re: [btrfs] 5297199a8b: xfstests.btrfs.220.fail

2021-03-09 Thread Nikolay Borisov




On 9.03.21 г. 10:49 ч., kernel test robot wrote:
> 
> 
> Greeting,
> 
> FYI, we noticed the following commit (built with gcc-9):
> 
> commit: 5297199a8bca12b8b96afcbf2341605efb6005de ("btrfs: remove inode number 
> cache feature")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> 
> 
> in testcase: xfstests
> version: xfstests-x86_64-d41dcbd-1_20201218
> with following parameters:
> 
>   disk: 6HDD
>   fs: btrfs
>   test: btrfs-group-22
>   ucode: 0x28
> 
> test-description: xfstests is a regression test suite for xfs and other files 
> ystems.
> test-url: git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git
> 
> 
> on test machine: 8 threads Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz with 8G 
> memory
> 
> caused below changes (please refer to attached dmesg/kmsg for entire 
> log/backtrace):
> 
> 
> 
> 
> If you fix the issue, kindly add following tag
> Reported-by: kernel test robot 
> 
> 2021-03-09 04:13:26 export TEST_DIR=/fs/sdb1
> 2021-03-09 04:13:26 export TEST_DEV=/dev/sdb1
> 2021-03-09 04:13:26 export FSTYP=btrfs
> 2021-03-09 04:13:26 export SCRATCH_MNT=/fs/scratch
> 2021-03-09 04:13:26 mkdir /fs/scratch -p
> 2021-03-09 04:13:26 export SCRATCH_DEV_POOL="/dev/sdb2 /dev/sdb3 /dev/sdb4 
> /dev/sdb5 /dev/sdb6"
> 2021-03-09 04:13:26 sed "s:^:btrfs/:" 
> //lkp/benchmarks/xfstests/tests/btrfs-group-22
> 2021-03-09 04:13:26 ./check btrfs/220 btrfs/221 btrfs/222 btrfs/223 btrfs/224 
> btrfs/225 btrfs/226 btrfs/227
> FSTYP -- btrfs
> PLATFORM  -- Linux/x86_64 lkp-hsw-d01 5.10.0-rc7-00162-g5297199a8bca #1 
> SMP Sat Feb 27 21:06:26 CST 2021
> MKFS_OPTIONS  -- /dev/sdb2
> MOUNT_OPTIONS -- /dev/sdb2 /fs/scratch
> 
> btrfs/220 - output mismatch (see 
> /lkp/benchmarks/xfstests/results//btrfs/220.out.bad)
> --- tests/btrfs/220.out   2021-01-14 07:40:58.0 +
> +++ /lkp/benchmarks/xfstests/results//btrfs/220.out.bad   2021-03-09 
> 04:13:32.880794446 +
> @@ -1,2 +1,3 @@
>  QA output created by 220
> +Unexepcted mount options, checking for 
> 'inode_cache,relatime,space_cache,subvol=/,subvolid=5' in 
> 'rw,relatime,space_cache,subvolid=5,subvol=/' using 'inode_cache'


Given that the commit removes the inode_cache feature that's expected, I
assume you need to adjust your fstests configuration to not use
inode_cache.

[PATCH] staging: rtl8192e: remove redundant variable shadowing

2021-03-02 Thread Nikolay Kyx

In function rtl92e_start_adapter() automatic variable 'i' referenced only
within certain loops, used as iteration counter. Control flow can't get
into such loop w/o 'i = 0' assignment.

It's redundant to shadow this variable by creating scope around loop.

This patch fixes the following sparse warning:

warning: symbol 'i' shadows an earlier one

Signed-off-by: Nikolay Kyx 
---
 drivers/staging/rtl8192e/rtl8192e/r8192E_dev.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/staging/rtl8192e/rtl8192e/r8192E_dev.c 
b/drivers/staging/rtl8192e/rtl8192e/r8192E_dev.c
index ff843d7ec606..8cd085ebea81 100644
--- a/drivers/staging/rtl8192e/rtl8192e/r8192E_dev.c
+++ b/drivers/staging/rtl8192e/rtl8192e/r8192E_dev.c
@@ -800,12 +800,10 @@ bool rtl92e_start_adapter(struct net_device *dev)
}
rtl92e_writew(dev, ATIMWND, 2);
rtl92e_writew(dev, BCN_INTERVAL, 100);
-   {
-   int i;
 
-   for (i = 0; i < QOS_QUEUE_NUM; i++)
-   rtl92e_writel(dev, WDCAPARA_ADD[i], 0x005e4332);
-   }
+   for (i = 0; i < QOS_QUEUE_NUM; i++)
+   rtl92e_writel(dev, WDCAPARA_ADD[i], 0x005e4332);
+
rtl92e_writeb(dev, 0xbe, 0xc0);
 
rtl92e_config_mac(dev);
-- 
2.30.1

Re: [PATCH] staging: media: omap4iss: code style - avoid macro argument precedence issues

2021-02-21 Thread Nikolay K.

Hi Laurent,

Thank you for the review.
I think that if we drop the unneeded parentheses here, we need to drop
them everywhere in the file for consistency, even in places checkpatch.pl
doesn't warn about. It'll increase patch size without actual usefullness 
gain, as for me. I am very (very) novice to the kernel,
but who wants slightly more readable one-line simple macros?

[PATCH] staging: media: omap4iss: code style - avoid macro argument precedence issues

2021-02-21 Thread Nikolay Kyx

This patch fixes the following checkpatch.pl check:

CHECK: Macro argument 'i' may be better as '(i)' to avoid precedence issues

in file iss_regs.h

Signed-off-by: Nikolay Kyx 
---

Additionally some style warnings remain valid here and could be fixed by
another patch.

 drivers/staging/media/omap4iss/iss_regs.h | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/staging/media/omap4iss/iss_regs.h 
b/drivers/staging/media/omap4iss/iss_regs.h
index 09a7375c89ac..cfe0bb075072 100644
--- a/drivers/staging/media/omap4iss/iss_regs.h
+++ b/drivers/staging/media/omap4iss/iss_regs.h
@@ -197,7 +197,7 @@
 #define CSI2_TIMING_STOP_STATE_COUNTER_IO1_MASK(0x1fff << 0)
 #define CSI2_TIMING_STOP_STATE_COUNTER_IO1_SHIFT   0
 
-#define CSI2_CTX_CTRL1(i)  (0x70 + (0x20 * i))
+#define CSI2_CTX_CTRL1(i)  (0x70 + (0x20 * (i)))
 #define CSI2_CTX_CTRL1_GENERIC BIT(30)
 #define CSI2_CTX_CTRL1_TRANSCODE   (0xf << 24)
 #define CSI2_CTX_CTRL1_FEC_NUMBER_MASK (0xff << 16)
@@ -210,7 +210,7 @@
 #define CSI2_CTX_CTRL1_PING_PONG   BIT(3)
 #define CSI2_CTX_CTRL1_CTX_EN  BIT(0)
 
-#define CSI2_CTX_CTRL2(i)  (0x74 + (0x20 * i))
+#define CSI2_CTX_CTRL2(i)  (0x74 + (0x20 * (i)))
 #define CSI2_CTX_CTRL2_FRAME_MASK  (0x << 16)
 #define CSI2_CTX_CTRL2_FRAME_SHIFT 16
 #define CSI2_CTX_CTRL2_USER_DEF_MAP_SHIFT  13
@@ -222,19 +222,19 @@
 #define CSI2_CTX_CTRL2_FORMAT_MASK (0x3ff << 0)
 #define CSI2_CTX_CTRL2_FORMAT_SHIFT0
 
-#define CSI2_CTX_DAT_OFST(i)   (0x78 + (0x20 * i))
+#define CSI2_CTX_DAT_OFST(i)   (0x78 + (0x20 * (i)))
 #define CSI2_CTX_DAT_OFST_MASK (0xfff << 5)
 
-#define CSI2_CTX_PING_ADDR(i)  (0x7c + (0x20 * i))
+#define CSI2_CTX_PING_ADDR(i)  (0x7c + (0x20 * (i)))
 #define CSI2_CTX_PING_ADDR_MASK0xffe0
 
-#define CSI2_CTX_PONG_ADDR(i)  (0x80 + (0x20 * i))
+#define CSI2_CTX_PONG_ADDR(i)  (0x80 + (0x20 * (i)))
 #define CSI2_CTX_PONG_ADDR_MASK
CSI2_CTX_PING_ADDR_MASK
 
-#define CSI2_CTX_IRQENABLE(i)  (0x84 + (0x20 * i))
-#define CSI2_CTX_IRQSTATUS(i)  (0x88 + (0x20 * i))
+#define CSI2_CTX_IRQENABLE(i)  (0x84 + (0x20 * (i)))
+#define CSI2_CTX_IRQSTATUS(i)  (0x88 + (0x20 * (i)))
 
-#define CSI2_CTX_CTRL3(i)  (0x8c + (0x20 * i))
+#define CSI2_CTX_CTRL3(i)  (0x8c + (0x20 * (i)))
 #define CSI2_CTX_CTRL3_ALPHA_SHIFT 5
 #define CSI2_CTX_CTRL3_ALPHA_MASK  \
(0x3fff << CSI2_CTX_CTRL3_ALPHA_SHIFT)
-- 
2.30.1

[PATCH v4 02/02] staging: kpc2000: code style: fix line length issue

2021-02-21 Thread Nikolay Kyx

This patch fixes the following checkpatch.pl warning:

WARNING: line length of 124 exceeds 100 columns

in file kpc2000_i2c.c

Signed-off-by: Nikolay Kyx 
---

Additionally some style warnings remain valid here and could be fixed by
another patch.

v2: Edited changelog, as suggested by Greg KH 
v3: Splitted patch in two parts, as suggested by Greg KH 

v4: Changed patch subject line, as suggested by Greg KH 


 drivers/staging/kpc2000/kpc2000_i2c.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/staging/kpc2000/kpc2000_i2c.c 
b/drivers/staging/kpc2000/kpc2000_i2c.c
index 25bb5c97dd21..68f5ec000365 100644
--- a/drivers/staging/kpc2000/kpc2000_i2c.c
+++ b/drivers/staging/kpc2000/kpc2000_i2c.c
@@ -200,7 +200,9 @@ static int i801_check_post(struct kpc_i2c *priv, int 
status, int timeout)
outb_p(status & STATUS_FLAGS, SMBHSTSTS(priv));
status = inb_p(SMBHSTSTS(priv)) & STATUS_FLAGS;
if (status)
-   dev_warn(&priv->adapter.dev, "Failed clearing status 
flags at end of transaction (%02x)\n", status);
+   dev_warn(&priv->adapter.dev,
+"Failed clearing status flags at end of 
transaction (%02x)\n",
+status);
}
 
return result;
-- 
2.30.1

[PATCH v4 01/02] staging: kpc2000: code style: match alignment with open parenthesis

2021-02-21 Thread Nikolay Kyx

This patch fixes the following checkpatch.pl check:

CHECK: Alignment should match open parenthesis

in files kpc2000_i2c.c kpc2000_spi.c

Signed-off-by: Nikolay Kyx 
---

Additionally some style warnings remain valid here and could be fixed by
another patch.

v2: Edited changelog, as suggested by Greg KH 
v3: Splitted patch in two parts, as suggested by Greg KH 

v4: Changed patch subject line, as suggested by Greg KH 


 drivers/staging/kpc2000/kpc2000_i2c.c | 2 +-
 drivers/staging/kpc2000/kpc2000_spi.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/kpc2000/kpc2000_i2c.c 
b/drivers/staging/kpc2000/kpc2000_i2c.c
index 25bb5c97dd21..3f1f833d3b51 100644
--- a/drivers/staging/kpc2000/kpc2000_i2c.c
+++ b/drivers/staging/kpc2000/kpc2000_i2c.c
@@ -269,7 +269,7 @@ static int i801_block_transaction_by_block(struct kpc_i2c 
*priv,
}
 
status = i801_transaction(priv,
-   I801_BLOCK_DATA | ENABLE_INT9 | I801_PEC_EN * hwpec);
+ I801_BLOCK_DATA | ENABLE_INT9 | I801_PEC_EN * 
hwpec);
if (status)
return status;
 
diff --git a/drivers/staging/kpc2000/kpc2000_spi.c 
b/drivers/staging/kpc2000/kpc2000_spi.c
index 44017d523da5..16ca18b8aa15 100644
--- a/drivers/staging/kpc2000/kpc2000_spi.c
+++ b/drivers/staging/kpc2000/kpc2000_spi.c
@@ -465,7 +465,7 @@ kp_spi_probe(struct platform_device *pldev)
}
 
kpspi->base = devm_ioremap(&pldev->dev, r->start,
-  resource_size(r));
+  resource_size(r));
 
status = spi_register_master(master);
if (status < 0) {
-- 
2.30.1

[PATCH v2] staging: media: ipu3: code style fix - avoid multiple line dereference

2021-02-21 Thread Nikolay Kyx

This patch fixes the following checkpatch.pl warning:

WARNING: Avoid multiple line dereference

in file ipu3-css.c

Signed-off-by: Nikolay Kyx 
---

Additionally some style warnings remain valid here and could be fixed by
another patch.

v2: Removed second part of patch which fixes non-existent problem

 drivers/staging/media/ipu3/ipu3-css.c | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/staging/media/ipu3/ipu3-css.c 
b/drivers/staging/media/ipu3/ipu3-css.c
index 608dcacf12b2..29894ee566c1 100644
--- a/drivers/staging/media/ipu3/ipu3-css.c
+++ b/drivers/staging/media/ipu3/ipu3-css.c
@@ -1206,14 +1206,14 @@ static int imgu_css_binary_preallocate(struct imgu_css 
*css, unsigned int pipe)
 
for (i = 0; i < IPU3_CSS_AUX_FRAMES; i++)
if (!imgu_dmamap_alloc(imgu,
-  
&css_pipe->aux_frames[IPU3_CSS_AUX_FRAME_REF].
-  mem[i], CSS_BDS_SIZE))
+  
&css_pipe->aux_frames[IPU3_CSS_AUX_FRAME_REF].mem[i],
+  CSS_BDS_SIZE))
goto out_of_memory;
 
for (i = 0; i < IPU3_CSS_AUX_FRAMES; i++)
if (!imgu_dmamap_alloc(imgu,
-  
&css_pipe->aux_frames[IPU3_CSS_AUX_FRAME_TNR].
-  mem[i], CSS_GDC_SIZE))
+  
&css_pipe->aux_frames[IPU3_CSS_AUX_FRAME_TNR].mem[i],
+  CSS_GDC_SIZE))
goto out_of_memory;
 
return 0;
@@ -1441,13 +1441,11 @@ static int imgu_css_map_init(struct imgu_css *css, 
unsigned int pipe)
for (p = 0; p < IPU3_CSS_PIPE_ID_NUM; p++)
for (i = 0; i < IMGU_ABI_MAX_STAGES; i++) {
if (!imgu_dmamap_alloc(imgu,
-  &css_pipe->
-  xmem_sp_stage_ptrs[p][i],
+  
&css_pipe->xmem_sp_stage_ptrs[p][i],
   sizeof(struct 
imgu_abi_sp_stage)))
return -ENOMEM;
if (!imgu_dmamap_alloc(imgu,
-  &css_pipe->
-  xmem_isp_stage_ptrs[p][i],
+  
&css_pipe->xmem_isp_stage_ptrs[p][i],
   sizeof(struct 
imgu_abi_isp_stage)))
return -ENOMEM;
}
-- 
2.30.1

Re: [PATCH 2/2] staging: media: ipu3: code style fix - missing a blank line after declarations

2021-02-21 Thread Nikolay K.

I can't find any change in struct imgu_fw_info layout after this patch.
But warning is strange, because declarations don't actually end here.
So I think this warning should be suppressed to reduce noise
in checkpatch.pl output.

[PATCH 2/2] staging: media: ipu3: code style fix - missing a blank line after declarations

2021-02-21 Thread Nikolay Kyx

This patch fixes the following checkpatch.pl warning:

WARNING: Missing a blank line after declarations

in file ipu3-css-fw.h

Signed-off-by: Nikolay Kyx 
---

Additionally some style warnings remain valid here and could be fixed by
another patch.

 drivers/staging/media/ipu3/ipu3-css-fw.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/staging/media/ipu3/ipu3-css-fw.h 
b/drivers/staging/media/ipu3/ipu3-css-fw.h
index 79ffa7045139..3c078f15a295 100644
--- a/drivers/staging/media/ipu3/ipu3-css-fw.h
+++ b/drivers/staging/media/ipu3/ipu3-css-fw.h
@@ -148,6 +148,7 @@ union imgu_fw_union {
 struct imgu_fw_info {
size_t header_size; /* size of fw header */
u32 type __aligned(8);  /* enum imgu_fw_type */
+
union imgu_fw_union info;   /* Binary info */
struct imgu_abi_blob_info blob; /* Blob info */
/* Dynamic part */
-- 
2.30.1

[PATCH 1/2] staging: media: ipu3: code style fix - avoid multiple line dereference

2021-02-21 Thread Nikolay Kyx

This patch fixes the following checkpatch.pl warning:

WARNING: Avoid multiple line dereference

in file ipu3-css.c

Signed-off-by: Nikolay Kyx 
---

Additionally some style warnings remain valid here and could be fixed by
another patch.

 drivers/staging/media/ipu3/ipu3-css.c | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/staging/media/ipu3/ipu3-css.c 
b/drivers/staging/media/ipu3/ipu3-css.c
index 608dcacf12b2..29894ee566c1 100644
--- a/drivers/staging/media/ipu3/ipu3-css.c
+++ b/drivers/staging/media/ipu3/ipu3-css.c
@@ -1206,14 +1206,14 @@ static int imgu_css_binary_preallocate(struct imgu_css 
*css, unsigned int pipe)
 
for (i = 0; i < IPU3_CSS_AUX_FRAMES; i++)
if (!imgu_dmamap_alloc(imgu,
-  
&css_pipe->aux_frames[IPU3_CSS_AUX_FRAME_REF].
-  mem[i], CSS_BDS_SIZE))
+  
&css_pipe->aux_frames[IPU3_CSS_AUX_FRAME_REF].mem[i],
+  CSS_BDS_SIZE))
goto out_of_memory;
 
for (i = 0; i < IPU3_CSS_AUX_FRAMES; i++)
if (!imgu_dmamap_alloc(imgu,
-  
&css_pipe->aux_frames[IPU3_CSS_AUX_FRAME_TNR].
-  mem[i], CSS_GDC_SIZE))
+  
&css_pipe->aux_frames[IPU3_CSS_AUX_FRAME_TNR].mem[i],
+  CSS_GDC_SIZE))
goto out_of_memory;
 
return 0;
@@ -1441,13 +1441,11 @@ static int imgu_css_map_init(struct imgu_css *css, 
unsigned int pipe)
for (p = 0; p < IPU3_CSS_PIPE_ID_NUM; p++)
for (i = 0; i < IMGU_ABI_MAX_STAGES; i++) {
if (!imgu_dmamap_alloc(imgu,
-  &css_pipe->
-  xmem_sp_stage_ptrs[p][i],
+  
&css_pipe->xmem_sp_stage_ptrs[p][i],
   sizeof(struct 
imgu_abi_sp_stage)))
return -ENOMEM;
if (!imgu_dmamap_alloc(imgu,
-  &css_pipe->
-  xmem_isp_stage_ptrs[p][i],
+  
&css_pipe->xmem_isp_stage_ptrs[p][i],
   sizeof(struct 
imgu_abi_isp_stage)))
return -ENOMEM;
}
-- 
2.30.1

[PATCH v3 02/02] staging: kpc2000: code style: fix alignment issues

2021-02-19 Thread Nikolay Kyx

This patch fixes the following checkpatch.pl warning:

WARNING: line length of 124 exceeds 100 columns

in file kpc2000_i2c.c

Signed-off-by: Nikolay Kyx 
---

Additionally some style warnings remain valid here and could be fixed by
another patch.

v2: Edited changelog, as suggested by Greg KH 
v3: Splitted patch in two parts, as suggested by Greg KH 


 drivers/staging/kpc2000/kpc2000_i2c.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/staging/kpc2000/kpc2000_i2c.c 
b/drivers/staging/kpc2000/kpc2000_i2c.c
index 25bb5c97dd21..68f5ec000365 100644
--- a/drivers/staging/kpc2000/kpc2000_i2c.c
+++ b/drivers/staging/kpc2000/kpc2000_i2c.c
@@ -200,7 +200,9 @@ static int i801_check_post(struct kpc_i2c *priv, int 
status, int timeout)
outb_p(status & STATUS_FLAGS, SMBHSTSTS(priv));
status = inb_p(SMBHSTSTS(priv)) & STATUS_FLAGS;
if (status)
-   dev_warn(&priv->adapter.dev, "Failed clearing status 
flags at end of transaction (%02x)\n", status);
+   dev_warn(&priv->adapter.dev,
+"Failed clearing status flags at end of 
transaction (%02x)\n",
+status);
}
 
return result;
-- 
2.30.1

[PATCH v3 01/02] staging: kpc2000: code style: fix alignment issues

2021-02-19 Thread Nikolay Kyx

This patch fixes the following checkpatch.pl check:

CHECK: Alignment should match open parenthesis

in files kpc2000_i2c.c kpc2000_spi.c

Signed-off-by: Nikolay Kyx 
---

Additionally some style warnings remain valid here and could be fixed by
another patch.

v2: Edited changelog, as suggested by Greg KH 
v3: Splitted patch in two parts, as suggested by Greg KH 


 drivers/staging/kpc2000/kpc2000_i2c.c | 2 +-
 drivers/staging/kpc2000/kpc2000_spi.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/kpc2000/kpc2000_i2c.c 
b/drivers/staging/kpc2000/kpc2000_i2c.c
index 25bb5c97dd21..3f1f833d3b51 100644
--- a/drivers/staging/kpc2000/kpc2000_i2c.c
+++ b/drivers/staging/kpc2000/kpc2000_i2c.c
@@ -269,7 +269,7 @@ static int i801_block_transaction_by_block(struct kpc_i2c 
*priv,
}
 
status = i801_transaction(priv,
-   I801_BLOCK_DATA | ENABLE_INT9 | I801_PEC_EN * hwpec);
+ I801_BLOCK_DATA | ENABLE_INT9 | I801_PEC_EN * 
hwpec);
if (status)
return status;
 
diff --git a/drivers/staging/kpc2000/kpc2000_spi.c 
b/drivers/staging/kpc2000/kpc2000_spi.c
index 44017d523da5..16ca18b8aa15 100644
--- a/drivers/staging/kpc2000/kpc2000_spi.c
+++ b/drivers/staging/kpc2000/kpc2000_spi.c
@@ -465,7 +465,7 @@ kp_spi_probe(struct platform_device *pldev)
}
 
kpspi->base = devm_ioremap(&pldev->dev, r->start,
-  resource_size(r));
+  resource_size(r));
 
status = spi_register_master(master);
if (status < 0) {
-- 
2.30.1

[PATCH v3] staging: fwserial: match alignment with open parenthesis

2021-02-19 Thread Nikolay Kyx

This patch fixes the following checkpatch.pl check:

CHECK: Alignment should match open parenthesis

in file fwserial.c

Signed-off-by: Nikolay Kyx 
---

Additionally some style warnings remain valid here and could be fixed by
another patch.

v2: Edited changelog, as suggested by Greg KH 

v3: Moved comment about remaining warnings under the cut-off line,
as suggested by Dan Carpenter  

 drivers/staging/fwserial/fwserial.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/fwserial/fwserial.c 
b/drivers/staging/fwserial/fwserial.c
index c368082aae1a..137e97c9406c 100644
--- a/drivers/staging/fwserial/fwserial.c
+++ b/drivers/staging/fwserial/fwserial.c
@@ -1318,8 +1318,8 @@ static int fwtty_break_ctl(struct tty_struct *tty, int 
state)
if (state == -1) {
set_bit(STOP_TX, &port->flags);
ret = wait_event_interruptible_timeout(port->wait_tx,
-  !test_bit(IN_TX, &port->flags),
-  10);
+  !test_bit(IN_TX, 
&port->flags),
+  10);
if (ret == 0 || ret == -ERESTARTSYS) {
clear_bit(STOP_TX, &port->flags);
fwtty_restart_tx(port);
-- 
2.30.1

[PATCH v2] staging: fwserial: match alignment with open parenthesis

2021-02-19 Thread Nikolay Kyx

This patch fixes the following checkpatch.pl check:

CHECK: Alignment should match open parenthesis

in file fwserial.c

Additionally some style warnings remain valid here and could be fixed by
another patch.

Signed-off-by: Nikolay Kyx 
---

v2: Edited changelog, as suggested by Greg KH 

 drivers/staging/fwserial/fwserial.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/fwserial/fwserial.c 
b/drivers/staging/fwserial/fwserial.c
index c368082aae1a..137e97c9406c 100644
--- a/drivers/staging/fwserial/fwserial.c
+++ b/drivers/staging/fwserial/fwserial.c
@@ -1318,8 +1318,8 @@ static int fwtty_break_ctl(struct tty_struct *tty, int 
state)
if (state == -1) {
set_bit(STOP_TX, &port->flags);
ret = wait_event_interruptible_timeout(port->wait_tx,
-  !test_bit(IN_TX, &port->flags),
-  10);
+  !test_bit(IN_TX, 
&port->flags),
+  10);
if (ret == 0 || ret == -ERESTARTSYS) {
clear_bit(STOP_TX, &port->flags);
fwtty_restart_tx(port);
-- 
2.30.1

[PATCH v2] staging: kpc2000: code style: fix alignment issues

2021-02-19 Thread Nikolay Kyx

This patch fixes the following checkpatch.pl warnings:

WARNING: line length of 124 exceeds 100 columns
CHECK: Alignment should match open parenthesis

in files kpc2000_i2c.c kpc2000_spi.c

Additionally some style warnings remain valid here and could be fixed by
another patch.

Signed-off-by: Nikolay Kyx 
---

v2: Edited changelog, as suggested by Greg KH 

 drivers/staging/kpc2000/kpc2000_i2c.c | 6 --
 drivers/staging/kpc2000/kpc2000_spi.c | 2 +-
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/kpc2000/kpc2000_i2c.c 
b/drivers/staging/kpc2000/kpc2000_i2c.c
index 25bb5c97dd21..14f7940fa4fb 100644
--- a/drivers/staging/kpc2000/kpc2000_i2c.c
+++ b/drivers/staging/kpc2000/kpc2000_i2c.c
@@ -200,7 +200,9 @@ static int i801_check_post(struct kpc_i2c *priv, int 
status, int timeout)
outb_p(status & STATUS_FLAGS, SMBHSTSTS(priv));
status = inb_p(SMBHSTSTS(priv)) & STATUS_FLAGS;
if (status)
-   dev_warn(&priv->adapter.dev, "Failed clearing status 
flags at end of transaction (%02x)\n", status);
+   dev_warn(&priv->adapter.dev,
+"Failed clearing status flags at end of 
transaction (%02x)\n",
+status);
}
 
return result;
@@ -269,7 +271,7 @@ static int i801_block_transaction_by_block(struct kpc_i2c 
*priv,
}
 
status = i801_transaction(priv,
-   I801_BLOCK_DATA | ENABLE_INT9 | I801_PEC_EN * hwpec);
+ I801_BLOCK_DATA | ENABLE_INT9 | I801_PEC_EN * 
hwpec);
if (status)
return status;
 
diff --git a/drivers/staging/kpc2000/kpc2000_spi.c 
b/drivers/staging/kpc2000/kpc2000_spi.c
index 44017d523da5..16ca18b8aa15 100644
--- a/drivers/staging/kpc2000/kpc2000_spi.c
+++ b/drivers/staging/kpc2000/kpc2000_spi.c
@@ -465,7 +465,7 @@ kp_spi_probe(struct platform_device *pldev)
}
 
kpspi->base = devm_ioremap(&pldev->dev, r->start,
-  resource_size(r));
+  resource_size(r));
 
status = spi_register_master(master);
if (status < 0) {
-- 
2.30.1

[PATCH] staging: kpc2000: code style: fix alignment issues

2021-02-18 Thread Nikolay Kyx

kpc2000_i2c.c:
fix WARNING: line length of 124 exceeds 100 columns
fix CHECK: Alignment should match open parenthesis

kpc2000_spi.c:
fix CHECK: Alignment should match open parenthesis

Signed-off-by: Nikolay Kyx 
---
 drivers/staging/kpc2000/kpc2000_i2c.c | 6 --
 drivers/staging/kpc2000/kpc2000_spi.c | 2 +-
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/kpc2000/kpc2000_i2c.c 
b/drivers/staging/kpc2000/kpc2000_i2c.c
index 25bb5c97dd21..14f7940fa4fb 100644
--- a/drivers/staging/kpc2000/kpc2000_i2c.c
+++ b/drivers/staging/kpc2000/kpc2000_i2c.c
@@ -200,7 +200,9 @@ static int i801_check_post(struct kpc_i2c *priv, int 
status, int timeout)
outb_p(status & STATUS_FLAGS, SMBHSTSTS(priv));
status = inb_p(SMBHSTSTS(priv)) & STATUS_FLAGS;
if (status)
-   dev_warn(&priv->adapter.dev, "Failed clearing status 
flags at end of transaction (%02x)\n", status);
+   dev_warn(&priv->adapter.dev,
+"Failed clearing status flags at end of 
transaction (%02x)\n",
+status);
}
 
return result;
@@ -269,7 +271,7 @@ static int i801_block_transaction_by_block(struct kpc_i2c 
*priv,
}
 
status = i801_transaction(priv,
-   I801_BLOCK_DATA | ENABLE_INT9 | I801_PEC_EN * hwpec);
+ I801_BLOCK_DATA | ENABLE_INT9 | I801_PEC_EN * 
hwpec);
if (status)
return status;
 
diff --git a/drivers/staging/kpc2000/kpc2000_spi.c 
b/drivers/staging/kpc2000/kpc2000_spi.c
index 44017d523da5..16ca18b8aa15 100644
--- a/drivers/staging/kpc2000/kpc2000_spi.c
+++ b/drivers/staging/kpc2000/kpc2000_spi.c
@@ -465,7 +465,7 @@ kp_spi_probe(struct platform_device *pldev)
}
 
kpspi->base = devm_ioremap(&pldev->dev, r->start,
-  resource_size(r));
+  resource_size(r));
 
status = spi_register_master(master);
if (status < 0) {
-- 
2.30.1

[PATCH] staging: fwserial: match alignment with open parenthesis

2021-02-18 Thread Nikolay Kyx

fwserial.c:

fix CHECK: Alignment should match open parenthesis

Signed-off-by: Nikolay Kyx 
---
 drivers/staging/fwserial/fwserial.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/fwserial/fwserial.c 
b/drivers/staging/fwserial/fwserial.c
index c368082aae1a..137e97c9406c 100644
--- a/drivers/staging/fwserial/fwserial.c
+++ b/drivers/staging/fwserial/fwserial.c
@@ -1318,8 +1318,8 @@ static int fwtty_break_ctl(struct tty_struct *tty, int 
state)
if (state == -1) {
set_bit(STOP_TX, &port->flags);
ret = wait_event_interruptible_timeout(port->wait_tx,
-  !test_bit(IN_TX, &port->flags),
-  10);
+  !test_bit(IN_TX, 
&port->flags),
+  10);
if (ret == 0 || ret == -ERESTARTSYS) {
clear_bit(STOP_TX, &port->flags);
fwtty_restart_tx(port);
-- 
2.30.1

Re: [PATCH v5 net-next 03/10] net: bridge: don't print in br_switchdev_set_port_flag

2021-02-12 Thread Nikolay Aleksandrov

On 12/02/2021 17:15, Vladimir Oltean wrote:
> From: Vladimir Oltean 
> 
> For the netlink interface, propagate errors through extack rather than
> simply printing them to the console. For the sysfs interface, we still
> print to the console, but at least that's one layer higher than in
> switchdev, which also allows us to silently ignore the offloading of
> flags if that is ever needed in the future.
> 
> Signed-off-by: Vladimir Oltean 
> ---
> Changes in v5:
> None.
> 
> Changes in v4:
> - Adjust the commit message now that we aren't notifying initial and
>   final port flags from the bridge any longer.
> 
> Changes in v3:
> - Deal with the br_switchdev_set_port_flag call from sysfs too.
> 
> Changes in v2:
> - br_set_port_flag now returns void, so no extack there.
> - don't overwrite extack in br_switchdev_set_port_flag if already
>   populated.
> 
>  net/bridge/br_netlink.c   |  9 +
>  net/bridge/br_private.h   |  6 --
>  net/bridge/br_switchdev.c | 13 +++--
>  net/bridge/br_sysfs_if.c  |  7 +--
>  4 files changed, 21 insertions(+), 14 deletions(-)
> 

Acked-by: Nikolay Aleksandrov

Re: [PATCH v5 net-next 01/10] net: switchdev: propagate extack to port attributes

2021-02-12 Thread Nikolay Aleksandrov

On 12/02/2021 17:15, Vladimir Oltean wrote:
> From: Vladimir Oltean 
> 
> When a struct switchdev_attr is notified through switchdev, there is no
> way to report informational messages, unlike for struct switchdev_obj.
> 
> Signed-off-by: Vladimir Oltean 
> Reviewed-by: Ido Schimmel 
> Reviewed-by: Florian Fainelli 
> ---
> Changes in v5:
> Rebased on top of AM65 CPSW driver merge.
> 
> Changes in v4:
> None.
> 
> Changes in v3:
> None.
> 
> Changes in v2:
> Patch is new.
> 
>  .../ethernet/marvell/prestera/prestera_switchdev.c|  3 ++-
>  .../net/ethernet/mellanox/mlxsw/spectrum_switchdev.c  |  3 ++-
>  drivers/net/ethernet/mscc/ocelot_net.c|  3 ++-
>  drivers/net/ethernet/ti/am65-cpsw-switchdev.c |  3 ++-
>  drivers/net/ethernet/ti/cpsw_switchdev.c  |  3 ++-
>  include/net/switchdev.h   |  6 --
>  net/dsa/slave.c   |  3 ++-
>  net/switchdev/switchdev.c | 11 ++++---
>  8 files changed, 24 insertions(+), 11 deletions(-)
> 

Reviewed-by: Nikolay Aleksandrov

Re: [PATCH v5 net-next 02/10] net: bridge: offload all port flags at once in br_setport

2021-02-12 Thread Nikolay Aleksandrov

On 12/02/2021 17:15, Vladimir Oltean wrote:
> From: Vladimir Oltean 
> 
> If for example this command:
> 
> ip link set swp0 type bridge_slave flood off mcast_flood off learning off
> 
> succeeded at configuring BR_FLOOD and BR_MCAST_FLOOD but not at
> BR_LEARNING, there would be no attempt to revert the partial state in
> any way. Arguably, if the user changes more than one flag through the
> same netlink command, this one _should_ be all or nothing, which means
> it should be passed through switchdev as all or nothing.
> 
> Signed-off-by: Vladimir Oltean 
> ---
> Changes in v5:
> None.
> 
> Changes in v4:
> Leave br->lock alone completely.
> 
> Changes in v3:
> Don't attempt to drop br->lock around br_switchdev_set_port_flag now,
> move that part to a later patch.
> 
> Changes in v2:
> Patch is new.
> 
> Changes in v2:
> Patch is new.
> 
>  net/bridge/br_netlink.c   | 109 --
>  net/bridge/br_switchdev.c |   6 ++-
>  2 files changed, 39 insertions(+), 76 deletions(-)
> 

LGTM, thanks!
Acked-by: Nikolay Aleksandrov

Re: [PATCH v3 net-next 00/11] Cleanup in brport flags switchdev offload for DSA

2021-02-10 Thread Nikolay Aleksandrov

On 10/02/2021 14:01, Vladimir Oltean wrote:
> On Wed, Feb 10, 2021 at 01:05:57PM +0200, Nikolay Aleksandrov wrote:
>> On 10/02/2021 13:01, Vladimir Oltean wrote:
>>> On Wed, Feb 10, 2021 at 12:52:33PM +0200, Nikolay Aleksandrov wrote:
>>>> On 10/02/2021 12:45, Vladimir Oltean wrote:
>>>>> Hi Nikolay,
>>>>>
>>>>> On Wed, Feb 10, 2021 at 12:31:43PM +0200, Nikolay Aleksandrov wrote:
>>>>>> Hi Vladimir,
>>>>>> Let's take a step back for a moment and discuss the bridge unlock/lock 
>>>>>> sequences
>>>>>> that come with this set. I'd really like to avoid those as they're a 
>>>>>> recipe
>>>>>> for future problems. The only good way to achieve that currently is to 
>>>>>> keep
>>>>>> the PRE_FLAGS call and do that in unsleepable context but move the FLAGS 
>>>>>> call
>>>>>> after the flags have been changed (if they have changed obviously). That 
>>>>>> would
>>>>>> make the code read much easier since we'll have all our lock/unlock 
>>>>>> sequences
>>>>>> in the same code blocks and won't play games to get sleepable context.
>>>>>> Please let's think and work in that direction, rather than having:
>>>>>> +spin_lock_bh(&p->br->lock);
>>>>>> +if (err) {
>>>>>> +netdev_err(p->dev, "%s\n", extack._msg);
>>>>>> +return err;
>>>>>>  }
>>>>>> +
>>>>>>
>>>>>> which immediately looks like a bug even though after some code checking 
>>>>>> we can
>>>>>> verify it's ok. WDYT?
>>>>>>
>>>>>> I plan to get rid of most of the br->lock since it's been abused for a 
>>>>>> very long
>>>>>> time because it's essentially STP lock, but people have started using it 
>>>>>> for other
>>>>>> things and I plan to fix that when I get more time.
>>>>>
>>>>> This won't make the sysfs codepath any nicer, will it?
>>>>>
>>>>
>>>> Currently we'll have to live with a hack that checks if the flags have 
>>>> changed. I agree
>>>> it won't be pretty, but we won't have to unlock and lock again in the 
>>>> middle of the
>>>> called function and we'll have all our locking in the same place, easier 
>>>> to verify and
>>>> later easier to remove. Once I get rid of most of the br->lock usage we 
>>>> can revisit
>>>> the drop of PRE_FLAGS if it's a problem. The alternative is to change the 
>>>> flags, then
>>>> send the switchdev notification outside of the lock and revert the flags 
>>>> if it doesn't
>>>> go through which doesn't sound much better.
>>>> I'm open to any other suggestions, but definitely would like to avoid 
>>>> playing locking games.
>>>> Even if it means casing out flag setting from all other store_ functions 
>>>> for sysfs.
>>>
>>> By casing out flag settings you mean something like this?
>>>
>>>
>>> #define BRPORT_ATTR(_name, _mode, _show, _store)\
>>> const struct brport_attribute brport_attr_##_name = {   \
>>> .attr = {.name = __stringify(_name),\
>>>  .mode = _mode },   \
>>> .show   = _show,\
>>> .store_unlocked = _store,   \
>>> };
>>>
>>> #define BRPORT_ATTR_FLAG(_name, _mask)  \
>>> static ssize_t show_##_name(struct net_bridge_port *p, char *buf) \
>>> {   \
>>> return sprintf(buf, "%d\n", !!(p->flags & _mask));  \
>>> }   \
>>> static int store_##_name(struct net_bridge_port *p, unsigned long v) \
>>> {   \
>>> return store_flag(p, v, _mask); \
>>> }   \
>>> static BRPORT_ATTR(_name, 0644,

Re: [PATCH v3 net-next 00/11] Cleanup in brport flags switchdev offload for DSA

2021-02-10 Thread Nikolay Aleksandrov

On 10/02/2021 13:01, Vladimir Oltean wrote:
> On Wed, Feb 10, 2021 at 12:52:33PM +0200, Nikolay Aleksandrov wrote:
>> On 10/02/2021 12:45, Vladimir Oltean wrote:
>>> Hi Nikolay,
>>>
>>> On Wed, Feb 10, 2021 at 12:31:43PM +0200, Nikolay Aleksandrov wrote:
>>>> Hi Vladimir,
>>>> Let's take a step back for a moment and discuss the bridge unlock/lock 
>>>> sequences
>>>> that come with this set. I'd really like to avoid those as they're a recipe
>>>> for future problems. The only good way to achieve that currently is to keep
>>>> the PRE_FLAGS call and do that in unsleepable context but move the FLAGS 
>>>> call
>>>> after the flags have been changed (if they have changed obviously). That 
>>>> would
>>>> make the code read much easier since we'll have all our lock/unlock 
>>>> sequences
>>>> in the same code blocks and won't play games to get sleepable context.
>>>> Please let's think and work in that direction, rather than having:
>>>> +  spin_lock_bh(&p->br->lock);
>>>> +  if (err) {
>>>> +  netdev_err(p->dev, "%s\n", extack._msg);
>>>> +  return err;
>>>>}
>>>> +
>>>>
>>>> which immediately looks like a bug even though after some code checking we 
>>>> can
>>>> verify it's ok. WDYT?
>>>>
>>>> I plan to get rid of most of the br->lock since it's been abused for a 
>>>> very long
>>>> time because it's essentially STP lock, but people have started using it 
>>>> for other
>>>> things and I plan to fix that when I get more time.
>>>
>>> This won't make the sysfs codepath any nicer, will it?
>>>
>>
>> Currently we'll have to live with a hack that checks if the flags have 
>> changed. I agree
>> it won't be pretty, but we won't have to unlock and lock again in the middle 
>> of the 
>> called function and we'll have all our locking in the same place, easier to 
>> verify and
>> later easier to remove. Once I get rid of most of the br->lock usage we can 
>> revisit
>> the drop of PRE_FLAGS if it's a problem. The alternative is to change the 
>> flags, then
>> send the switchdev notification outside of the lock and revert the flags if 
>> it doesn't
>> go through which doesn't sound much better.
>> I'm open to any other suggestions, but definitely would like to avoid 
>> playing locking games.
>> Even if it means casing out flag setting from all other store_ functions for 
>> sysfs.
> 
> By casing out flag settings you mean something like this?
> 
> 
> #define BRPORT_ATTR(_name, _mode, _show, _store)  \
> const struct brport_attribute brport_attr_##_name = { \
>   .attr = {.name = __stringify(_name),\
>.mode = _mode },   \
>   .show   = _show,\
>   .store_unlocked = _store,   \
> };
> 
> #define BRPORT_ATTR_FLAG(_name, _mask)\
> static ssize_t show_##_name(struct net_bridge_port *p, char *buf) \
> { \
>   return sprintf(buf, "%d\n", !!(p->flags & _mask));  \
> } \
> static int store_##_name(struct net_bridge_port *p, unsigned long v) \
> { \
>   return store_flag(p, v, _mask); \
> } \
> static BRPORT_ATTR(_name, 0644,   \
>  show_##_name, store_##_name)
> 
> static ssize_t brport_store(struct kobject *kobj,
>   struct attribute *attr,
>   const char *buf, size_t count)
> {
>   ...
> 
>   } else if (brport_attr->store_unlocked) {
>   val = simple_strtoul(buf, &endp, 0);
>   if (endp == buf)
>   goto out_unlock;
>   ret = brport_attr->store_unlocked(p, val);
>   }
> 

Yes, this can work but will need a bit more changes because of 
br_port_flags_change().
Then the netlink side can be modeled in a similar way.

Re: [PATCH v3 net-next 00/11] Cleanup in brport flags switchdev offload for DSA

2021-02-10 Thread Nikolay Aleksandrov

On 10/02/2021 12:45, Vladimir Oltean wrote:
> Hi Nikolay,
> 
> On Wed, Feb 10, 2021 at 12:31:43PM +0200, Nikolay Aleksandrov wrote:
>> Hi Vladimir,
>> Let's take a step back for a moment and discuss the bridge unlock/lock 
>> sequences
>> that come with this set. I'd really like to avoid those as they're a recipe
>> for future problems. The only good way to achieve that currently is to keep
>> the PRE_FLAGS call and do that in unsleepable context but move the FLAGS call
>> after the flags have been changed (if they have changed obviously). That 
>> would
>> make the code read much easier since we'll have all our lock/unlock sequences
>> in the same code blocks and won't play games to get sleepable context.
>> Please let's think and work in that direction, rather than having:
>> +spin_lock_bh(&p->br->lock);
>> +if (err) {
>> +netdev_err(p->dev, "%s\n", extack._msg);
>> +return err;
>>  }
>> +
>>
>> which immediately looks like a bug even though after some code checking we 
>> can
>> verify it's ok. WDYT?
>>
>> I plan to get rid of most of the br->lock since it's been abused for a very 
>> long
>> time because it's essentially STP lock, but people have started using it for 
>> other
>> things and I plan to fix that when I get more time.
> 
> This won't make the sysfs codepath any nicer, will it?
> 

Currently we'll have to live with a hack that checks if the flags have changed. 
I agree
it won't be pretty, but we won't have to unlock and lock again in the middle of 
the 
called function and we'll have all our locking in the same place, easier to 
verify and
later easier to remove. Once I get rid of most of the br->lock usage we can 
revisit
the drop of PRE_FLAGS if it's a problem. The alternative is to change the 
flags, then
send the switchdev notification outside of the lock and revert the flags if it 
doesn't
go through which doesn't sound much better.
I'm open to any other suggestions, but definitely would like to avoid playing 
locking games.
Even if it means casing out flag setting from all other store_ functions for 
sysfs.

Re: [PATCH v3 net-next 00/11] Cleanup in brport flags switchdev offload for DSA

2021-02-10 Thread Nikolay Aleksandrov

On 10/02/2021 11:14, Vladimir Oltean wrote:
> From: Vladimir Oltean 
> 
> The initial goal of this series was to have better support for
> standalone ports mode and multiple bridges on the DSA drivers like
> ocelot/felix and sja1105. Proper support for standalone mode requires
> disabling address learning, which in turn requires interaction with the
> switchdev notifier, which is actually where most of the patches are.
> 
> I also noticed that most of the drivers are actually talking either to
> firmware or SPI/MDIO connected devices from the brport flags switchdev
> attribute handler, so it makes sense to actually make it sleepable
> instead of atomic.
> 

Hi Vladimir,
Let's take a step back for a moment and discuss the bridge unlock/lock sequences
that come with this set. I'd really like to avoid those as they're a recipe
for future problems. The only good way to achieve that currently is to keep
the PRE_FLAGS call and do that in unsleepable context but move the FLAGS call
after the flags have been changed (if they have changed obviously). That would
make the code read much easier since we'll have all our lock/unlock sequences
in the same code blocks and won't play games to get sleepable context.
Please let's think and work in that direction, rather than having:
+   spin_lock_bh(&p->br->lock);
+   if (err) {
+   netdev_err(p->dev, "%s\n", extack._msg);
+   return err;
}
+

which immediately looks like a bug even though after some code checking we can
verify it's ok. WDYT?

I plan to get rid of most of the br->lock since it's been abused for a very long
time because it's essentially STP lock, but people have started using it for 
other
things and I plan to fix that when I get more time.

Thanks,
 Nik

Re: [PATCH v3 net-next 08/11] net: bridge: put SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS on the blocking call chain

2021-02-10 Thread Nikolay Aleksandrov

On 10/02/2021 11:14, Vladimir Oltean wrote:
> From: Vladimir Oltean 
> 
> Since we would like br_switchdev_set_port_flag to not use an atomic
> notifier, it should be called from outside spinlock context.
> 
> We can temporarily drop br->lock, but that creates some concurrency
> complications (example below is given for sysfs):
> - There might be an "echo 1 > multicast_flood" simultaneous with an
>   "echo 0 > multicast_flood". The result of this is nondeterministic
>   either way, so I'm not too concerned as long as the result is
>   consistent (no other flags have changed).
> - There might be an "echo 1 > multicast_flood" simultaneous with an
>   "echo 0 > learning". My expectation is that none of the two writes are
>   "eaten", and the final flags contain BR_MCAST_FLOOD=1 and BR_LEARNING=0
>   regardless of the order of execution. That is actually possible if, on
>   the commit path, we don't do a trivial "p->flags = flags" which might
>   overwrite bits outside of our mask, but instead we just change the
>   flags corresponding to our mask.
> 

Not sure I follow here, how do we get any concurrency issues with sysfs or 
netlink
when both take rtnl before doing any changes ?

> Now that br_switchdev_set_port_flag is never called from under br->lock,
> it runs in sleepable context.
> 
> All switchdev drivers handle SWITCHDEV_PORT_ATTR_SET as both blocking
> and atomic, so no changes are needed on that front.
> 
> Signed-off-by: Vladimir Oltean 
> ---
> Changes in v3:
> - Drop the br->lock around br_switchdev_set_port_flag in this patch, for
>   both sysfs and netlink.
> - Only set/restore the masked bits in p->flags to avoid concurrency
>   issues.
> 
> Changes in v2:
> Patch is new.
> 
>  net/bridge/br_netlink.c   | 10 +++---
>  net/bridge/br_switchdev.c |  5 ++---
>  net/bridge/br_sysfs_if.c  | 22 ++
>  3 files changed, 23 insertions(+), 14 deletions(-)
> 
> diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
> index b7731614c036..8f09106966c4 100644
> --- a/net/bridge/br_netlink.c
> +++ b/net/bridge/br_netlink.c
> @@ -869,7 +869,7 @@ static void br_set_port_flag(struct net_bridge_port *p, 
> struct nlattr *tb[],
>  static int br_setport(struct net_bridge_port *p, struct nlattr *tb[],
> struct netlink_ext_ack *extack)
>  {
> - unsigned long old_flags, changed_mask;
> + unsigned long flags, old_flags, changed_mask;
>   bool br_vlan_tunnel_old;
>   int err;
>  
> @@ -896,10 +896,14 @@ static int br_setport(struct net_bridge_port *p, struct 
> nlattr *tb[],
>   br_set_port_flag(p, tb, IFLA_BRPORT_ISOLATED, BR_ISOLATED);
>  
>   changed_mask = old_flags ^ p->flags;
> + flags = p->flags;
>  
> - err = br_switchdev_set_port_flag(p, p->flags, changed_mask, extack);
> + spin_unlock_bh(&p->br->lock);
> + err = br_switchdev_set_port_flag(p, flags, changed_mask, extack);
> + spin_lock_bh(&p->br->lock);
>   if (err) {
> - p->flags = old_flags;
> + p->flags &= ~changed_mask;
> + p->flags |= (old_flags & changed_mask);
>   goto out;
>   }
>  
> diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c
> index dbd94156960f..a79164ee65b9 100644
> --- a/net/bridge/br_switchdev.c
> +++ b/net/bridge/br_switchdev.c
> @@ -79,9 +79,8 @@ int br_switchdev_set_port_flag(struct net_bridge_port *p,
>   attr.u.brport_flags.val = flags & mask;
>   attr.u.brport_flags.mask = mask;
>  
> - /* We run from atomic context here */
> - err = call_switchdev_notifiers(SWITCHDEV_PORT_ATTR_SET, p->dev,
> -&info.info, extack);
> + err = call_switchdev_blocking_notifiers(SWITCHDEV_PORT_ATTR_SET, p->dev,
> + &info.info, extack);
>   err = notifier_to_errno(err);
>   if (err == -EOPNOTSUPP)
>   return 0;
> diff --git a/net/bridge/br_sysfs_if.c b/net/bridge/br_sysfs_if.c
> index 72e92376eef1..3f21fdd1cdaa 100644
> --- a/net/bridge/br_sysfs_if.c
> +++ b/net/bridge/br_sysfs_if.c
> @@ -68,16 +68,22 @@ static int store_flag(struct net_bridge_port *p, unsigned 
> long v,
>   else
>   flags &= ~mask;
>  
> - if (flags != p->flags) {
> - err = br_switchdev_set_port_flag(p, flags, mask, &extack);
> - if (err) {
> - netdev_err(p->dev, "%s\n", extack._msg);
> - return err;
> - }
> + if (flags == p->flags)
> + return 0;
>  
> - p->flags = flags;
> - br_port_flags_change(p, mask);
> + spin_unlock_bh(&p->br->lock);
> + err = br_switchdev_set_port_flag(p, flags, mask, &extack);
> + spin_lock_bh(&p->br->lock);
> + if (err) {
> + netdev_err(p->dev, "%s\n", extack._msg);
> + return err;
>   }
> +
> + p->flags &= ~mask;
> + p->flags |= (flags & mask);
> +
> + br_port_flags_change(p, mask);
> +
>

Re: [PATCH net-next 2/9] net: bridge: offload initial and final port flags through switchdev

2021-02-08 Thread Nikolay Aleksandrov

On 08/02/2021 13:45, Vladimir Oltean wrote:
> On Mon, Feb 08, 2021 at 01:37:03PM +0200, Nikolay Aleksandrov wrote:
>> Hi Vladimir,
>> I think this patch potentially breaks some use cases. There are a few 
>> problems, I'll
>> start with the more serious one: before the ports would have a set of flags 
>> that were
>> always set when joining, now due to how nbp_flags_change() handles flag 
>> setting some might
>> not be set which would immediately change behaviour w.r.t software fwding. 
>> I'll use your
>> example of BR_BCAST_FLOOD: a lot of drivers will return an error for it and 
>> any broadcast
>> towards these ports will be dropped, we have mixed environments with 
>> software ports that
>> sometimes have traffic (e.g. decapped ARP requests) software forwarded which 
>> will stop working.
> 
> Yes, you're right. The only solution I can think of is to add a "bool 
> ignore_errors"
> to nbp_flags_change, set to true from new_nbp and del_nbp, and to false from 
> the
> netlink code.
> 

Indeed, I can't think of any better solution right now, but that would make it 
more or less
equal to the current situation where the flags are just set. You can 
read/restore them on add/del
of bridge port, but I guess that's what you'd like to avoid. :)
I don't mind adding the add/del_nbp() notifications, but both of them seem 
redundant with
the port add/del notifications which you can handle in the driver.

>> The other lesser issue is with the style below, I mean these three calls for 
>> each flag are
>> just ugly and look weird as you've also noted, since these APIs are internal 
>> can we do better?
> 
> Doing better would mean allowing nbp_flags_change() to have a bit mask with
> potentially more brport flags set, and to call br_switchdev_set_port_flag in
> a for_each_set_bit() loop?
> 

Sure, that sounds better for now. I think you've described the ideal case in 
your
commit message.

Re: [PATCH net-next 2/9] net: bridge: offload initial and final port flags through switchdev

2021-02-08 Thread Nikolay Aleksandrov

On 08/02/2021 01:21, Vladimir Oltean wrote:
> From: Vladimir Oltean 
> 
> It must first be admitted that switchdev device drivers have a life
> beyond the bridge, and when they aren't offloading the bridge driver
> they are operating with forwarding disabled between ports, emulating as
> closely as possible N standalone network interfaces.
> 
> Now it must be said that for a switchdev port operating in standalone
> mode, address learning doesn't make much sense since that is a bridge
> function. In fact, address learning even breaks setups such as this one:
> 
>+-+
>| |
>| +---+   |
>| |br0|send  receive  |
>| ++-++ ++ ++ |
>| || || || || |
>| |  swp0  | |  swp1  | |  swp2  | |  swp3  | |
>| || || || || |
>+-++-++-++-++-+
>   | ^   |  ^
>   | |   |  |
>   | +---+  |
>   ||
>   ++
> 
> because if the ASIC has a single FDB (can offload a single bridge)
> then source address learning on swp3 can "steal" the source MAC address
> of swp2 from br0's FDB, because learning frames coming from swp2 will be
> done twice: first on the swp1 ingress port, second on the swp3 ingress
> port. So the hardware FDB will become out of sync with the software
> bridge, and when swp2 tries to send one more packet towards swp1, the
> ASIC will attempt to short-circuit the forwarding path and send it
> directly to swp3 (since that's the last port it learned that address on),
> which it obviously can't, because swp3 operates in standalone mode.
> 
> So switchdev drivers operating in standalone mode should disable address
> learning. As a matter of practicality, we can reduce code duplication in
> drivers by having the bridge notify through switchdev of the initial and
> final brport flags. Then, drivers can simply start up hardcoded for no
> address learning (similar to how they already start up hardcoded for no
> forwarding), then they only need to listen for
> SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS and their job is basically done, no
> need for special cases when the port joins or leaves the bridge etc.
> 
> When a port leaves the bridge (and therefore becomes standalone), we
> issue a switchdev attribute that apart from disabling address learning,
> enables flooding of all kinds. This is also done for pragmatic reasons,
> because even though standalone switchdev ports might not need to have
> flooding enabled in order to inject traffic with any MAC DA from the
> control interface, it certainly doesn't hurt either, and it even makes
> more sense than disabling flooding of unknown traffic towards that port.
> 
> Note that the implementation is a bit wacky because the switchdev API
> for port attributes is very counterproductive. Instead of issuing a
> single switchdev notification with a bitwise OR of all flags that we're
> modifying, we need to issue 4 individual notifications, one for each bit.
> This is because the SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS notifier
> forces you to refuse the entire operation if there's at least one bit
> which you can't offload, and that is currently BR_BCAST_FLOOD which
> nobody does. So this change would do nothing for no one if we offloaded
> all flags at once, but the idea is to offload as much as possible
> instead of all or nothing.
> 
> Signed-off-by: Vladimir Oltean 
> ---
>  net/bridge/br_if.c  | 24 +++-
>  net/bridge/br_netlink.c | 16 
>  net/bridge/br_private.h |  2 ++
>  3 files changed, 29 insertions(+), 13 deletions(-)
> 

Hi Vladimir,
I think this patch potentially breaks some use cases. There are a few problems, 
I'll
start with the more serious one: before the ports would have a set of flags 
that were
always set when joining, now due to how nbp_flags_change() handles flag setting 
some might
not be set which would immediately change behaviour w.r.t software fwding. I'll 
use your
example of BR_BCAST_FLOOD: a lot of drivers will return an error for it and any 
broadcast
towards these ports will be dropped, we have mixed environments with software 
ports that
sometimes have traffic (e.g. decapped ARP requests) software forwarded which 
will stop working.
The other lesser issue is with the style below, I mean these three calls for 
each flag are
just ugly and look weird as you've also noted, since these APIs are internal 
can we do better?

Cheers,
 Nik

> diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
> index f7d2f472ae24..890654f0 100644
> --- a/net/bridge/br_if.c
> +++ b/net/bridge/br_if.c
> @@ -89,6 +89,21 @@ void br_port_carrier_check(struct net_bridge_

Re: [PATCH net] net: bridge: use switchdev for port flags set through sysfs too

2021-02-08 Thread Nikolay Aleksandrov

On 07/02/2021 21:47, Vladimir Oltean wrote:
> From: Vladimir Oltean 
> 
> Looking through patchwork I don't see that there was any consensus to
> use switchdev notifiers only in case of netlink provided port flags but
> not sysfs (as a sort of deprecation, punishment or anything like that),
> so we should probably keep the user interface consistent in terms of
> functionality.
> 
> http://patchwork.ozlabs.org/project/netdev/patch/20170605092043.3523-3-j...@resnulli.us/
> http://patchwork.ozlabs.org/project/netdev/patch/20170608064428.4785-3-j...@resnulli.us/
> 
> Fixes: 3922285d96e7 ("net: bridge: Add support for offloading port 
> attributes")
> Signed-off-by: Vladimir Oltean 
> ---
>  net/bridge/br_sysfs_if.c | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/net/bridge/br_sysfs_if.c b/net/bridge/br_sysfs_if.c
> index 96ff63cde1be..5aea9427ffe1 100644
> --- a/net/bridge/br_sysfs_if.c
> +++ b/net/bridge/br_sysfs_if.c
> @@ -59,9 +59,8 @@ static BRPORT_ATTR(_name, 0644, 
> \
>  static int store_flag(struct net_bridge_port *p, unsigned long v,
> unsigned long mask)
>  {
> - unsigned long flags;
> -
> - flags = p->flags;
> + unsigned long flags = p->flags;
> + int err;
>  
>   if (v)
>   flags |= mask;
> @@ -69,6 +68,10 @@ static int store_flag(struct net_bridge_port *p, unsigned 
> long v,
>   flags &= ~mask;
>  
>   if (flags != p->flags) {
> + err = br_switchdev_set_port_flag(p, flags, mask);
> + if (err)
> + return err;
> +
>   p->flags = flags;
>   br_port_flags_change(p, mask);
>   }
> 

Acked-by: Nikolay Aleksandrov

Re: [PATCH 106/141] net: bridge: Fix fall-through warnings for Clang

2021-02-02 Thread Nikolay Aleksandrov

On 20/11/2020 20:37, Gustavo A. R. Silva wrote:
> In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning
> by explicitly adding a break statement instead of letting the code fall
> through to the next case.
> 
> Link: https://github.com/KSPP/linux/issues/115
> Signed-off-by: Gustavo A. R. Silva 
> ---
>  net/bridge/br_input.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
> index 59a318b9f646..8db219d979c5 100644
> --- a/net/bridge/br_input.c
> +++ b/net/bridge/br_input.c
> @@ -148,6 +148,7 @@ int br_handle_frame_finish(struct net *net, struct sock 
> *sk, struct sk_buff *skb
>   break;
>   case BR_PKT_UNICAST:
>   dst = br_fdb_find_rcu(br, eth_hdr(skb)->h_dest, vid);
> + break;
>   default:
>   break;
>   }
> 

Somehow this hasn't hit my inbox, good thing I just got the reply and saw the
patch. Anyway, thanks!

Acked-by: Nikolay Aleksandrov

[tip: locking/core] locking/rwsem: Remove empty rwsem.h

2021-02-01 Thread tip-bot2 for Nikolay Borisov

The following commit has been merged into the locking/core branch of tip:

Commit-ID: 442187f3c2de40bab13b8f9751b37925bede73b0
Gitweb:
https://git.kernel.org/tip/442187f3c2de40bab13b8f9751b37925bede73b0
Author:Nikolay Borisov 
AuthorDate:Tue, 26 Jan 2021 12:17:21 +02:00
Committer: Peter Zijlstra 
CommitterDate: Fri, 29 Jan 2021 20:02:34 +01:00

locking/rwsem: Remove empty rwsem.h

This is a leftover from 7f26482a872c ("locking/percpu-rwsem: Remove the 
embedded rwsem")

Signed-off-by: Nikolay Borisov 
Signed-off-by: Peter Zijlstra (Intel) 
Acked-by: Will Deacon 
Link: https://lkml.kernel.org/r/20210126101721.976027-1-nbori...@suse.com
---
 kernel/locking/rwsem.h | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 delete mode 100644 kernel/locking/rwsem.h

diff --git a/kernel/locking/rwsem.h b/kernel/locking/rwsem.h
deleted file mode 100644
index e69de29..000
--- a/kernel/locking/rwsem.h
+++ /dev/null

Re: [PATCH] x86: Disable CET instrumentation in the kernel

2021-01-29 Thread Nikolay Borisov




On 29.01.21 г. 18:49 ч., Josh Poimboeuf wrote:
> Agreed, stable is a good idea.   I think Nikolay saw it with GCC 9.


Yes I did, with the default Ubuntu compiler as well as the default gcc-10 
compiler: 

# gcc -v -Q -O2 --help=target | grep protection

gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04) 
COLLECT_GCC_OPTIONS='-v' '-Q' '-O2' '--help=target' '-mtune=generic' 
'-march=x86-64'
 /usr/lib/gcc/x86_64-linux-gnu/9/cc1 -v -imultiarch x86_64-linux-gnu help-dummy 
-dumpbase help-dummy -mtune=generic -march=x86-64 -auxbase help-dummy -O2 
-version --help=target -fasynchronous-unwind-tables -fstack-protector-strong 
-Wformat -Wformat-security -fstack-clash-protection -fcf-protection -o 
/tmp/ccSecttk.s
GNU C17 (Ubuntu 9.3.0-17ubuntu1~20.04) version 9.3.0 (x86_64-linux-gnu)
compiled by GNU C version 9.3.0, GMP version 6.2.0, MPFR version 4.0.2, 
MPC version 1.1.0, isl version isl-0.22.1-GMP


It has -fcf-protection turned on by default it seems.

Re: kprobes broken since 0d00449c7a28 ("x86: Replace ist_enter() with nmi_enter()")

2021-01-28 Thread Nikolay Borisov




On 29.01.21 г. 3:34 ч., Alexei Starovoitov wrote:
> On Thu, Jan 28, 2021 at 07:24:14PM +0100, Peter Zijlstra wrote:
>> On Thu, Jan 28, 2021 at 06:45:56PM +0200, Nikolay Borisov wrote:
>>> it would be placed on the __fentry__ (and not endbr64) hence it works.
>>> So perhaps a workaround outside of bpf could essentially detect this
>>> scenario and adjust the probe to be on the __fentry__ and not preceding
>>> instruction if it's detected to be endbr64 ?
>>
>> Arguably the fentry handler should also set the nmi context, it can,
>> after all, interrupt pretty much any other context by construction.
> 
> But that doesn't make it a real nmi.
> nmi can actually interrupt anything. Whereas kprobe via int3 cannot
> do nokprobe and noinstr sections. The exposure is a lot different.
> ftrace is even more contained. It's only at the start of the functions.
> It's even smaller subset of places than kprobes.
> So ftrace < kprobe < nmi.
> Grouping them all into nmi doesn't make sense to me.
> That bpf breaking change came unnoticed mostly because people use
> kprobes in the beginning of the functions only, but there are cases
> where kprobes are in the middle too. People just didn't upgrade kernels
> fast enough to notice.

nit: slight correction - I observed while I was putting kprobes at the
beginning of the function but __fentry__ wasn't the first thing in the
function's code. The reason why people haven't observed is because
everyone is running with retpolines enabled which disables CFI/CET.

> imo appropriate solution would be to have some distinction between
> ftrace < kprobe_via_int3 < nmi, so that bpf side can react accordingly.
> That code in trace_call_bpf:
>   if (in_nmi()) /* not supported yet */
> was necessary in the past. Now we have all sorts of protections built in.
> So it's probably ok to just drop it, but I think adding
> called_via_ftrace vs called_via_kprobe_int3 vs called_via_nmi
> is more appropriate solution long term.
>

Re: [PATCH] x86: Disable CET instrumentation in the kernel

2021-01-28 Thread Nikolay Borisov




On 28.01.21 г. 23:52 ч., Josh Poimboeuf wrote:
> 
> With retpolines disabled, some configurations of GCC will add Intel CET
> instrumentation to the kernel by default.  That breaks certain tracing
> scenarios by adding a superfluous ENDBR64 instruction before the fentry
> call, for functions which can be called indirectly.
> 
> CET instrumentation isn't currently necessary in the kernel, as CET is
> only supported in user space.  Disable it unconditionally.
> 
> Reported-by: Nikolay Borisov 
> Signed-off-by: Josh Poimboeuf 

Tested-by: Nikolay Borisov 
Reviewed-by: Nikolay Borisov

Re: kprobes broken since 0d00449c7a28 ("x86: Replace ist_enter() with nmi_enter()")

2021-01-28 Thread Nikolay Borisov

On 28.01.21 г. 18:12 ч., Nikolay Borisov wrote:
> 
> 
> On 28.01.21 г. 5:38 ч., Masami Hiramatsu wrote:
>> Hi,
> 
> 
> 
>>
>> Alexei, could you tell me what is the concerning situation for bpf?
> 
> Another data point masami is that this affects bpf kprobes which are
> entered via int3, alternatively if the kprobe is entered via
> kprobe_ftrace_handler it works as expected. I haven't been able to
> determine why a particular bpf probe won't use ftrace's infrastructure
> if it's put at the beginning of the function.  An alternative call chain
> is :
> 
>  => __ftrace_trace_stack
>  => trace_call_bpf
>  => kprobe_perf_func
>  => kprobe_ftrace_handler
>  => 0xc095d0c8
>  => btrfs_validate_metadata_buffer
>  => end_bio_extent_readpage
>  => end_workqueue_fn
>  => btrfs_work_helper
>  => process_one_work
>  => worker_thread
>  => kthread
>  => ret_from_fork
> 
>>

I have a working theory why I'm seeing this. My kernel (broken) was
compiled with retpolines off and with the gcc that comes with ubuntu
(both 9 and 10:
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
gcc-10 (Ubuntu 10.2.0-5ubuntu1~20.04) 10.2.0
)

this results in CFI being enabled so functions look like:
0x81493890 <+0>: endbr64
0x81493894 <+4>: callq  0x8104d820 <__fentry__>

i.e fentry's thunk is not the first instruction on the function hence
it's not going through the optimized ftrace handler. Instead it's using
int3 which is broken as ascertained.

After testing with my testcase I confirm that with cfi off and
__fentry__ being the first entry bpf starts working. And indeed, even
with CFI turned on if I use a probe like :

bpftrace -e 'kprobe:btrfs_sync_file+4 {printf("kprobe: %s\n",
kstack());}' &>bpf-output &

it would be placed on the __fentry__ (and not endbr64) hence it works.
So perhaps a workaround outside of bpf could essentially detect this
scenario and adjust the probe to be on the __fentry__ and not preceding
instruction if it's detected to be endbr64 ?

Re: kprobes broken since 0d00449c7a28 ("x86: Replace ist_enter() with nmi_enter()")

2021-01-28 Thread Nikolay Borisov




On 28.01.21 г. 5:38 ч., Masami Hiramatsu wrote:
> Hi,



> 
> Alexei, could you tell me what is the concerning situation for bpf?

Another data point masami is that this affects bpf kprobes which are
entered via int3, alternatively if the kprobe is entered via
kprobe_ftrace_handler it works as expected. I haven't been able to
determine why a particular bpf probe won't use ftrace's infrastructure
if it's put at the beginning of the function.  An alternative call chain
is :

 => __ftrace_trace_stack
 => trace_call_bpf
 => kprobe_perf_func
 => kprobe_ftrace_handler
 => 0xc095d0c8
 => btrfs_validate_metadata_buffer
 => end_bio_extent_readpage
 => end_workqueue_fn
 => btrfs_work_helper
 => process_one_work
 => worker_thread
 => kthread
 => ret_from_fork

> 
> Thank you,
> 
> From c5cd0e5f60ef6494c9e1579ec1b82b7344c41f9a Mon Sep 17 00:00:00 2001
> From: Masami Hiramatsu 
> Date: Thu, 28 Jan 2021 12:31:02 +0900
> Subject: [PATCH] tracing: bpf: Remove in_nmi() check from kprobe handler
> 
> Since commit 0d00449c7a28 ("x86: Replace ist_enter() with nmi_enter()") has
> changed the kprobe handler to run in the NMI context, in_nmi() always returns
> true. This means the bpf events on kprobes always skipped.
> 
> Signed-off-by: Masami Hiramatsu 
> ---
>  kernel/trace/bpf_trace.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index 6c0018abe68a..764400260eb6 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -96,9 +96,6 @@ unsigned int trace_call_bpf(struct trace_event_call *call, 
> void *ctx)
>  {
>   unsigned int ret;
>  
> - if (in_nmi()) /* not supported yet */
> - return 1;
> -
>   cant_sleep();
>  
>   if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1)) {
>

Re: kprobes broken since 0d00449c7a28 ("x86: Replace ist_enter() with nmi_enter()")

2021-01-27 Thread Nikolay Borisov




On 28.01.21 г. 5:38 ч., Masami Hiramatsu wrote:
> Hi,
> 



> 
> Yeah, there is. Nikolay, could you try this tentative patch?
I can confirm that with this patch everything is working. I also applied
the following diff:

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 6c0018abe68a..cc5a3a18816d 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -96,8 +96,10 @@ unsigned int trace_call_bpf(struct trace_event_call
*call, void *ctx)
 {
unsigned int ret;

-   if (in_nmi()) /* not supported yet */
+   if (in_nmi()) /* not supported yet */ {
+   trace_dump_stack(0);
return 1;
+   }

cant_sleep();



And can confirm that the branch is being hit and the following call
trace is produced:

 => __ftrace_trace_stack
 => trace_call_bpf
 => kprobe_perf_func
 => kprobe_int3_handler
 => exc_int3
 => asm_exc_int3
 => btrfs_sync_file
 => do_fsync
 => __x64_sys_fsync
 => do_syscall_64
 => entry_SYSCALL_64_after_hwframe


> 
> Of course this just drops the NMI check from the handler, so alternative
> checker is required. But I'm not sure what the original code concerns.
> As far as I can see, there seems no re-entrant block flag, nor locks
> among ebpf programs in runtime.
> 
> Alexei, could you tell me what is the concerning situation for bpf?
> 
> Thank you,
> 
> From c5cd0e5f60ef6494c9e1579ec1b82b7344c41f9a Mon Sep 17 00:00:00 2001
> From: Masami Hiramatsu 
> Date: Thu, 28 Jan 2021 12:31:02 +0900
> Subject: [PATCH] tracing: bpf: Remove in_nmi() check from kprobe handler
> 
> Since commit 0d00449c7a28 ("x86: Replace ist_enter() with nmi_enter()") has
> changed the kprobe handler to run in the NMI context, in_nmi() always returns
> true. This means the bpf events on kprobes always skipped.

FWIW I'd prefer if in addition to the original commit you also mention:

ba1f2b2eaa2a ("x86/entry: Fix NMI vs IRQ state tracking")
b6be002bcd1d ("x86/entry: Move nmi entry/exit into common code")

Since they changed the way nmi state is managed in exc_int3 and not in
the original do_int3. THe latter no longer contains any references to
nmi-related code.

> 
> Signed-off-by: Masami Hiramatsu 
> ---
>  kernel/trace/bpf_trace.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index 6c0018abe68a..764400260eb6 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -96,9 +96,6 @@ unsigned int trace_call_bpf(struct trace_event_call *call, 
> void *ctx)
>  {
>   unsigned int ret;
>  
> - if (in_nmi()) /* not supported yet */
> - return 1;
> -
>   cant_sleep();
>  
>   if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1)) {
>

Re: kprobes broken since 0d00449c7a28 ("x86: Replace ist_enter() with nmi_enter()")

2021-01-27 Thread Nikolay Borisov



On 27.01.21 г. 17:24 ч., Masami Hiramatsu wrote:
> On Thu, 28 Jan 2021 00:13:53 +0900
> Masami Hiramatsu  wrote:
> 
>> Hi Nikolay,
>>
>> On Wed, 27 Jan 2021 15:43:29 +0200
>> Nikolay Borisov  wrote:
>>
>>> Hello,
>>>
>>> I'm currently seeing latest Linus' master being somewhat broken w.r.t
>>> krpobes. In particular I have the following test-case:
>>>
>>> #!/bin/bash
>>>
>>> mkfs.btrfs -f /dev/vdc &> /dev/null
>>> mount /dev/vdc /media/scratch/
>>>
>>> bpftrace -e 'kprobe:btrfs_sync_file {printf("kprobe: %s\n", kstack());}'
>>> &>bpf-output &
>>> bpf_trace_pid=$!
>>>
>>> # force btrfs_sync_file to be called
>>> sleep 2
>>> xfs_io -f -c "pwrite 0 4m" -c "fsync" /media/scratch/file5
>>>
>>> kill $bpf_trace_pid
>>> sleep 1
>>>
>>> grep -q kprobe bpf-output
>>> retval=$?
>>> rm -f bpf-output
>>> umount /media/scratch
>>>
>>> exit $retval
>>>
>>> It traces btrfs_sync_file which is called when fsync is executed on a
>>> btrfs file, however I don't see the stacktrace being printed i.e the
>>> kprobe doesn't fire at all. The following alternative program:
>>>
>>> bpftrace -e 'tracepoint:btrfs:btrfs_sync_file {printf("tracepoint:
>>> %s\n", kstack());} kprobe:btrfs_sync_file {printf("kprobe: %s\n",
>>> kstack());}'
>>>
>>> only prints the stack from the tracepoint and not from the kprobe, given
>>> that the tracepoint is called from the btrfs_sync_file function.
>>
>> Thank you for reporting!
>>
>> If you don't mind, could you confirm it with ftrace (tracefs)?
>> bpftrace etc. involves too many things. It is better to test with
>> simpler way to test it.
>> I'm not familer with the bpftrace, but I think you can check it with
>>
>> # cd /sys/kernel/debug/tracing
>> # echo p:myevent btrfs_sync_file >> kprobe_events
>> # echo stacktrace > events/kprobes/myevent/trigger
>>  (or echo 1 > options/stacktrace , if trigger file doesn't exist)
> 
> Of course, also you have to enable the event.
>  # echo 1 > events/kprobes/myevent/enable
> 
> And check the results
> 
>  # cat trace
> 
> 
>> Could you also share your kernel config, so that we can reproduce it?
> 

I've attached the config and indeed with the scenario you proposed it
seems to works. I see:

   xfs_io-20280   [000] d.Z.  9900.748633: myevent:
(btrfs_sync_file+0x0/0x580)
  xfs_io-20280   [000] d.Z.  9900.748647: 
 => kprobe_trace_func
 => kprobe_dispatcher
 => kprobe_int3_handler
 => exc_int3
 => asm_exc_int3
 => btrfs_sync_file
 => do_fsync
 => __x64_sys_fsync
 => do_syscall_64
 => entry_SYSCALL_64_after_hwframe



#
# Automatically generated file; DO NOT EDIT.
# Linux/x86 5.11.0-rc5 Kernel Configuration
#
CONFIG_CC_VERSION_TEXT="gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0"
CONFIG_CC_IS_GCC=y
CONFIG_GCC_VERSION=90300
CONFIG_LD_VERSION=23400
CONFIG_CLANG_VERSION=0
CONFIG_LLD_VERSION=0
CONFIG_CC_CAN_LINK=y
CONFIG_CC_CAN_LINK_STATIC=y
CONFIG_CC_HAS_ASM_GOTO=y
CONFIG_CC_HAS_ASM_INLINE=y
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_TABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION="-default"
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_BUILD_SALT=""
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
CONFIG_HAVE_KERNEL_ZSTD=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
# CONFIG_KERNEL_ZSTD is not set
CONFIG_DEFAULT_INIT=""
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
# CONFIG_WATCH_QUEUE is not set
CONFIG_CROSS_MEMORY_ATTACH=y
CONFIG_USELIB=y
# CONFIG_AUDIT is not set
CONFIG_HAVE_ARCH_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_GENERIC_IRQ_MIGRATION=y
CONFIG_HARDIRQS_SW_RESEND=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_GENERIC_MSI_IRQ_DOMAIN=y
CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR=y
CONFIG_GENERIC_IRQ_RESERVATION_MODE=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
# CO

Re: [PATCH v2] btrfs: Avoid calling btrfs_get_chunk_map() twice

2021-01-27 Thread Nikolay Borisov




On 27.01.21 г. 15:57 ч., Michal Rostecki wrote:
> From: Michal Rostecki 
> 
> Before this change, the btrfs_get_io_geometry() function was calling
> btrfs_get_chunk_map() to get the extent mapping, necessary for
> calculating the I/O geometry. It was using that extent mapping only
> internally and freeing the pointer after its execution.
> 
> That resulted in calling btrfs_get_chunk_map() de facto twice by the
> __btrfs_map_block() function. It was calling btrfs_get_io_geometry()
> first and then calling btrfs_get_chunk_map() directly to get the extent
> mapping, used by the rest of the function.
> 
> This change fixes that by passing the extent mapping to the
> btrfs_get_io_geometry() function as an argument.
> 
> v2:
> When btrfs_get_chunk_map() returns an error in btrfs_submit_direct():
> - Use errno_to_blk_status(PTR_ERR(em)) as the status
> - Set em to NULL
> 
> Signed-off-by: Michal Rostecki 
> ---
>  fs/btrfs/inode.c   | 38 +-
>  fs/btrfs/volumes.c | 39 ---
>  fs/btrfs/volumes.h |  5 +++--
>  3 files changed, 48 insertions(+), 34 deletions(-)

So this adds more code but for what benefit? In your reply to Filipe you
said you didn't observe this being a performance-affecting change so

> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 0dbe1aaa0b71..e2ee3a9c1140 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -2183,9 +2183,10 @@ int btrfs_bio_fits_in_stripe(struct page *page, size_t 
> size, struct bio *bio,
>   struct inode *inode = page->mapping->host;
>   struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
>   u64 logical = bio->bi_iter.bi_sector << 9;
> + struct extent_map *em;
>   u64 length = 0;
>   u64 map_length;
> - int ret;
> + int ret = 0;
>   struct btrfs_io_geometry geom;
>  
>   if (bio_flags & EXTENT_BIO_COMPRESSED)
> @@ -2193,14 +2194,21 @@ int btrfs_bio_fits_in_stripe(struct page *page, 
> size_t size, struct bio *bio,
>  
>   length = bio->bi_iter.bi_size;
>   map_length = length;
> - ret = btrfs_get_io_geometry(fs_info, btrfs_op(bio), logical, map_length,
> - &geom);
> + em = btrfs_get_chunk_map(fs_info, logical, map_length);
> + if (IS_ERR(em))
> + return PTR_ERR(em);
> + ret = btrfs_get_io_geometry(fs_info, em, btrfs_op(bio), logical,
> + map_length, &geom);
>   if (ret < 0)
> - return ret;
> + goto out;
>  
> - if (geom.len < length + size)
> - return 1;
> - return 0;
> + if (geom.len < length + size) {
> + ret = 1;
> + goto out;
> + }

this could be simply

if (geom.len  +out:
> + free_extent_map(em);
> + return ret;
>  }
>  
>  /*
> @@ -7941,10 +7949,12 @@ static blk_qc_t btrfs_submit_direct(struct inode 
> *inode, struct iomap *iomap,
>   u64 submit_len;
>   int clone_offset = 0;
>   int clone_len;
> + int logical;
>   int ret;
>   blk_status_t status;
>   struct btrfs_io_geometry geom;
>   struct btrfs_dio_data *dio_data = iomap->private;
> + struct extent_map *em;
>  
>   dip = btrfs_create_dio_private(dio_bio, inode, file_offset);
>   if (!dip) {
> @@ -7970,11 +7980,18 @@ static blk_qc_t btrfs_submit_direct(struct inode 
> *inode, struct iomap *iomap,
>   }
>  
>   start_sector = dio_bio->bi_iter.bi_sector;
> + logical = start_sector << 9;
>   submit_len = dio_bio->bi_iter.bi_size;
>  
>   do {
> - ret = btrfs_get_io_geometry(fs_info, btrfs_op(dio_bio),
> - start_sector << 9, submit_len,
> + em = btrfs_get_chunk_map(fs_info, logical, submit_len);
> + if (IS_ERR(em)) {
> + status = errno_to_blk_status(PTR_ERR(em));
> + em = NULL;
> + goto out_err;
> + }
> + ret = btrfs_get_io_geometry(fs_info, em, btrfs_op(dio_bio),
> + logical, submit_len,
>   &geom);
>   if (ret) {
>   status = errno_to_blk_status(ret);
> @@ -8030,12 +8047,15 @@ static blk_qc_t btrfs_submit_direct(struct inode 
> *inode, struct iomap *iomap,
>   clone_offset += clone_len;
>   start_sector += clone_len >> 9;
>   file_offset += clone_len;
> +
> + free_extent_map(em);
>   } while (submit_len > 0);
>   return BLK_QC_T_NONE;
>  
>  out_err:
>   dip->dio_bio->bi_status = status;
>   btrfs_dio_private_put(dip);
> + free_extent_map(em);
>   return BLK_QC_T_NONE;
>  }

For example in this function you increase complexity by having to deal
with free_extent_map as well so I'm not sure this is a net-win.

>  
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index a8ec8539cd8d..4c753b17

kprobes broken since 0d00449c7a28 ("x86: Replace ist_enter() with nmi_enter()")

2021-01-27 Thread Nikolay Borisov

Hello,

I'm currently seeing latest Linus' master being somewhat broken w.r.t
krpobes. In particular I have the following test-case:

#!/bin/bash

mkfs.btrfs -f /dev/vdc &> /dev/null
mount /dev/vdc /media/scratch/

bpftrace -e 'kprobe:btrfs_sync_file {printf("kprobe: %s\n", kstack());}'
&>bpf-output &
bpf_trace_pid=$!

# force btrfs_sync_file to be called
sleep 2
xfs_io -f -c "pwrite 0 4m" -c "fsync" /media/scratch/file5

kill $bpf_trace_pid
sleep 1

grep -q kprobe bpf-output
retval=$?
rm -f bpf-output
umount /media/scratch

exit $retval

It traces btrfs_sync_file which is called when fsync is executed on a
btrfs file, however I don't see the stacktrace being printed i.e the
kprobe doesn't fire at all. The following alternative program:

bpftrace -e 'tracepoint:btrfs:btrfs_sync_file {printf("tracepoint:
%s\n", kstack());} kprobe:btrfs_sync_file {printf("kprobe: %s\n",
kstack());}'

only prints the stack from the tracepoint and not from the kprobe, given
that the tracepoint is called from the btrfs_sync_file function.

I started bisecting this and arrived at the following commit:

0d00449c7a28 ("x86: Replace ist_enter() with nmi_enter()")

FWIW the following series is applied on the kernel I was testing:
https://lore.kernel.org/lkml/159870598914.1229682.15230803449082078353.stgit@devnote2/

but it's still broken.

[PATCH v2] locking/rwsem: Remove empty rwsem.h

2021-01-26 Thread Nikolay Borisov

This is a leftover from 7f26482a872c ("locking/percpu-rwsem: Remove the 
embedded rwsem")

Signed-off-by: Nikolay Borisov 
---
V2:
 * Add reference to commit which made the file useless.

 kernel/locking/rwsem.h | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 delete mode 100644 kernel/locking/rwsem.h

diff --git a/kernel/locking/rwsem.h b/kernel/locking/rwsem.h
deleted file mode 100644
index e69de29bb2d1..
--
2.25.1

Re: [PATCH] bridge: Use PTR_ERR_OR_ZERO instead if(IS_ERR(...)) + PTR_ERR

2021-01-25 Thread Nikolay Aleksandrov

On 25/01/2021 04:39, Jiapeng Zhong wrote:
> coccicheck suggested using PTR_ERR_OR_ZERO() and looking at the code.
> 
> Fix the following coccicheck warnings:
> 
> ./net/bridge/br_multicast.c:1295:7-13: WARNING: PTR_ERR_OR_ZERO can be
> used.
> 
> Reported-by: Abaci 
> Signed-off-by: Jiapeng Zhong 
> ---
>  net/bridge/br_multicast.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
> index 257ac4e..2229d10 100644
> --- a/net/bridge/br_multicast.c
> +++ b/net/bridge/br_multicast.c
> @@ -1292,7 +1292,7 @@ static int br_multicast_add_group(struct net_bridge *br,
>   pg = __br_multicast_add_group(br, port, group, src, filter_mode,
> igmpv2_mldv1, false);
>   /* NULL is considered valid for host joined groups */
> - err = IS_ERR(pg) ? PTR_ERR(pg) : 0;
> + err = PTR_ERR_OR_ZERO(pg);
>   spin_unlock(&br->multicast_lock);
>  
>   return err;
> 

This should be targeted at net-next.
Acked-by: Nikolay Aleksandrov

Re: [RFC PATCH 04/37] btrfs: use bio_init_fields in volumes

2021-01-19 Thread Nikolay Borisov




On 19.01.21 г. 7:05 ч., Chaitanya Kulkarni wrote:
> Signed-off-by: Chaitanya Kulkarni 
> ---
>  fs/btrfs/volumes.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index ee086fc56c30..836167212252 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -6371,14 +6371,12 @@ static void submit_stripe_bio(struct btrfs_bio *bbio, 
> struct bio *bio,
>  
>   bio->bi_private = bbio;
This line can be removed as ->private is initialized by bio_init_fields.

>   btrfs_io_bio(bio)->device = dev;
> - bio->bi_end_io = btrfs_end_bio;
> - bio->bi_iter.bi_sector = physical >> 9;
> + bio_init_fields(bio, dev->bdev, physical >> 9, bbio, btrfs_end_bio, 0, 
> 0);
>   btrfs_debug_in_rcu(fs_info,
>   "btrfs_map_bio: rw %d 0x%x, sector=%llu, dev=%lu (%s id %llu), size=%u",
>   bio_op(bio), bio->bi_opf, bio->bi_iter.bi_sector,
>   (unsigned long)dev->bdev->bd_dev, rcu_str_deref(dev->name),
>   dev->devid, bio->bi_iter.bi_size);
> - bio_set_dev(bio, dev->bdev);
>  
>   btrfs_bio_counter_inc_noblocked(fs_info);
>  
>

Re: [PATCH v1] ipv4: add iPv4_is_multicast() check in ip_mc_leave_group().

2021-01-19 Thread Nikolay Aleksandrov

On 19/01/2021 06:39, Jakub Kicinski wrote:
> On Sun, 17 Jan 2021 05:34:16 -0800 wangyingji...@126.com wrote:
>> From: Yingjie Wang 
>>
>> There is no iPv4_is_multicast() check added to ip_mc_leave_group()
>> to check if imr->imr_multiaddr.s_addr is a multicast address.
>> If not a multicast address, it may result in an error.
> 
> Could you please say more? From looking at the code it seems like
> no address should match if group is non-mcast, and -EADDRNOTAVAIL 
> will be returned.
> 
> Adding Nik to CC.
> 

Thanks, and absolutely right. I don't see any point in changing the code, also
you are definitely not fixing any bug. 

>> In some cases, the callers of ip_mc_leave_group() don't check
>> whether it is multicast address or not before calling
>> such as do_ip_setsockopt(). So I suggest adding the ipv4_is_multicast()
>> check to the ip_mc_leave_group() to prevent this from happening.
>>
>> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
>> Signed-off-by: Yingjie Wang 
>> ---
>>  net/ipv4/igmp.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
>> index 7b272bbed2b4..1b6f91271cfd 100644
>> --- a/net/ipv4/igmp.c
>> +++ b/net/ipv4/igmp.c
>> @@ -2248,6 +2248,9 @@ int ip_mc_leave_group(struct sock *sk, struct ip_mreqn 
>> *imr)
>>  u32 ifindex;
>>  int ret = -EADDRNOTAVAIL;
>>  
>> +if (!ipv4_is_multicast(group))
>> +return -EINVAL;
>> +
>>  ASSERT_RTNL();
>>  
>>  in_dev = ip_mc_find_dev(net, imr);
>

Re: [PATCH v4 net-next] net: bridge: check vlan with eth_type_vlan() method

2021-01-18 Thread Nikolay Aleksandrov

On 17/01/2021 10:09, menglong8.d...@gmail.com wrote:
> From: Menglong Dong 
> 
> Replace some checks for ETH_P_8021Q and ETH_P_8021AD with
> eth_type_vlan().
> 
> Signed-off-by: Menglong Dong 
> ---
> v4:
> - remove unnecessary brackets.
> 
> v3:
> - fix compile warning in br_vlan_set_proto() by casting 'val' to
>   be16.
> 
> v2:
> - use eth_type_vlan() in br_validate() and __br_vlan_set_proto()
>   too.
> ---
>  net/bridge/br_forward.c |  3 +--
>  net/bridge/br_netlink.c | 12 +++-
>  net/bridge/br_vlan.c|  2 +-
>  3 files changed, 5 insertions(+), 12 deletions(-)
> 

Acked-by: Nikolay Aleksandrov

Re: [RFC PATCH v2] net: bridge: igmp: Extend IGMP query to be per vlan

2021-01-18 Thread Nikolay Aleksandrov

On 16/01/2021 17:39, Joachim Wiberg wrote:
> On Wed, Jan 13, 2021 at 14:15, Nikolay Aleksandrov  wrote:
>> On 12/01/2021 15:59, Horatiu Vultur wrote:
>>> Based on the comments of the previous version, we started to work on a
>>> new version, so it would be possible to enable/disable queries per vlan.
>>> [snip]
>>> We were wondering if this what you had in mind when you proposed to have
>>> this per vlan? Or we are completely off? Or we should fix some of the
>>> issues that I mentioned, before you can see more clearly the direction?
>> No, unfortunately not even close. We already have per-port per-vlan and 
>> global per-vlan
>> contexts which are also linked together for each vlan, those must be used 
>> for any vlan
>> configuration and state. The problem is that you'd have to mix igmp and vlan 
>> code and
>> those two live under two different kconfig options, and worse rely on 
>> different locks, so
>> extra care must be taken.
>> [snip]
>> If you don't need this asap, I'll probably get to it in two months
>> after EHT and the new bridge flush api, even we are still carrying an 
>> out-of-tree patch
>> for this which someone (not from cumulus) tried to upstream a few years 
>> back, but it also has
>> wrong design in general. :)
> 
> Hi,
> 
> very interesting thread this!  I believe I may be the one who posted the
> patch[1] a few years ago, and I fully agree with Nik here.  We developed
> the basic concepts further at Westermo, but it's been really difficult
> to get it stable.
> 
> We have discussed at length at work if an IGMP snooping implementation
> really belongs in the bridge, or if it's better suited as a user space
> daemon?  Similar to what was decided for RSTP/MSTP support, i.e., the
> bridge only has STP and RSTP/MSTP is handled by mstpd[2].
> 
> Most of what's required for a user space implementation is available,
> but it would've been nice if a single AF_PACKET socket on br0 could be
> used to catch what brport (ifindex) a query or report comes in on.  As
> it is now that information is lost/replaced with the ifindex of br0.
> And then there's the issue of detecting and forwarding to a multicast
> routing daemon on top of br0.  That br0 is not a brport in the MDB, or
> that host_joined cannot be set/seen with iproute2 is quite limiting.
> These issues can of course be addressed, but are they of interest to
> the community at large?
> 
> 
> Best regards
>  /Joachim
> 
> [1]: https://lore.kernel.org/netdev/20180418120713.GA10742@troglobit/
> [2]: https://github.com/mstpd/mstpd
> 

Hi Joachim,
I actually had started implementing IGMPv3/MLDv2 as a user-space daemon part of
FRRouting (since it already has a lot of the required infra to talk to the 
kernel).
It also has IGMPv3/MLDv2 support within pimd, so a lot of code can be shared.
Obviously there are pros and cons to each choice, but I'd be interested to see a
full user-space implementation. I decided to make the kernel support more 
complete
since it already did IGMPv2 and so stopped with the new FRR daemon. If needed 
I'd be
happy to help with the kernel support for a new user-space daemon, and also can
contribute to the daemon itself if time permits.

Thanks,
 Nik

Re: [PATCH net-next] net: bridge: use eth_type_vlan in br_dev_queue_push_xmit

2021-01-14 Thread Nikolay Aleksandrov

On 14/01/2021 09:51, menglong8.d...@gmail.com wrote:
> From: Menglong Dong 
> 
> Replace the check for ETH_P_8021Q and ETH_P_8021AD in
> br_dev_queue_push_xmit with eth_type_vlan.
> 
> Signed-off-by: Menglong Dong 
> ---
>  net/bridge/br_forward.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
> index e28ffadd1371..6e9b049ae521 100644
> --- a/net/bridge/br_forward.c
> +++ b/net/bridge/br_forward.c
> @@ -39,8 +39,7 @@ int br_dev_queue_push_xmit(struct net *net, struct sock 
> *sk, struct sk_buff *skb
>   br_drop_fake_rtable(skb);
>  
>   if (skb->ip_summed == CHECKSUM_PARTIAL &&
> - (skb->protocol == htons(ETH_P_8021Q) ||
> -  skb->protocol == htons(ETH_P_8021AD))) {
> + eth_type_vlan(skb->protocol)) {
>   int depth;
>  
>   if (!__vlan_get_protocol(skb, skb->protocol, &depth))
> 


Please change all similar checks, there are also:
br_netlink.c - br_validate()
br_vlan.c - br_vlan_set_proto()

Thanks.

[PATCH] locking/rwsem: Remove empty rwsem.h

2021-01-13 Thread Nikolay Borisov

Signed-off-by: Nikolay Borisov 
---
 kernel/locking/rwsem.h | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 delete mode 100644 kernel/locking/rwsem.h

diff --git a/kernel/locking/rwsem.h b/kernel/locking/rwsem.h
deleted file mode 100644
index e69de29bb2d1..
-- 
2.25.1

Re: [PATCH] net/bridge: Fix inconsistent format argument types

2021-01-13 Thread Nikolay Aleksandrov

On 13/01/2021 11:44, Jiapeng Zhong wrote:
> Fix the following warnings:
> 
> net/bridge/br_sysfs_if.c(162): warning: %ld in format string (no. 1)
> requires 'long' but the argument type is 'unsigned long'.
> net/bridge/br_sysfs_if.c(155): warning: %ld in format string (no. 1)
> requires 'long' but the argument type is 'unsigned long'.
> net/bridge/br_sysfs_if.c(148): warning: %ld in format string (no. 1)
> requires 'long' but the argument type is 'unsigned long'.
> 
> Signed-off-by: Jiapeng Zhong 
> Reported-by: Abaci Robot
> ---
>  net/bridge/br_sysfs_if.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 

As I replied to your other patch with the same subject please squash them
together and send them targeted at net-next.

Thanks.

> diff --git a/net/bridge/br_sysfs_if.c b/net/bridge/br_sysfs_if.c
> index 7a59cdd..16a7d41 100644
> --- a/net/bridge/br_sysfs_if.c
> +++ b/net/bridge/br_sysfs_if.c
> @@ -145,21 +145,21 @@ static ssize_t show_port_state(struct net_bridge_port 
> *p, char *buf)
>  static ssize_t show_message_age_timer(struct net_bridge_port *p,
>   char *buf)
>  {
> - return sprintf(buf, "%ld\n", br_timer_value(&p->message_age_timer));
> + return sprintf(buf, "%lu\n", br_timer_value(&p->message_age_timer));
>  }
>  static BRPORT_ATTR(message_age_timer, 0444, show_message_age_timer, NULL);
>  
>  static ssize_t show_forward_delay_timer(struct net_bridge_port *p,
>   char *buf)
>  {
> - return sprintf(buf, "%ld\n", br_timer_value(&p->forward_delay_timer));
> + return sprintf(buf, "%lu\n", br_timer_value(&p->forward_delay_timer));
>  }
>  static BRPORT_ATTR(forward_delay_timer, 0444, show_forward_delay_timer, 
> NULL);
>  
>  static ssize_t show_hold_timer(struct net_bridge_port *p,
>   char *buf)
>  {
> - return sprintf(buf, "%ld\n", br_timer_value(&p->hold_timer));
> + return sprintf(buf, "%lu\n", br_timer_value(&p->hold_timer));
>  }
>  static BRPORT_ATTR(hold_timer, 0444, show_hold_timer, NULL);
>  
>

Re: [PATCH] net/bridge: Fix inconsistent format argument types

2021-01-13 Thread Nikolay Aleksandrov

On 13/01/2021 11:36, Jiapeng Zhong wrote:
> Fix the following warnings:
> 
> net/bridge/br_sysfs_br.c(833): warning: %u in format string (no. 1)
> requires 'unsigned int' but the argument type is 'signed int'.
> net/bridge/br_sysfs_br.c(817): warning: %u in format string (no. 1)
> requires 'unsigned int' but the argument type is 'signed int'.
> net/bridge/br_sysfs_br.c(261): warning: %ld in format string (no. 1)
> requires 'long' but the argument type is 'unsigned long'.
> net/bridge/br_sysfs_br.c(253): warning: %ld in format string (no. 1)
> requires 'long' but the argument type is 'unsigned long'.
> net/bridge/br_sysfs_br.c(244): warning: %ld in format string (no. 1)
> requires 'long' but the argument type is 'unsigned long'.
> net/bridge/br_sysfs_br.c(236): warning: %ld in format string (no. 1)
> requires 'long' but the argument type is 'unsigned long'.
> 
> Signed-off-by: Jiapeng Zhong 
> Reported-by: Abaci Robot
> ---
>  net/bridge/br_sysfs_br.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 

Hi,
You have sent 2 patches with the same subject.. Please squash them into a single
patch and target it to net-next, these don't need to be backported.

Thanks,
 Nik

> diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c
> index 7db06e3..7512921 100644
> --- a/net/bridge/br_sysfs_br.c
> +++ b/net/bridge/br_sysfs_br.c
> @@ -233,7 +233,7 @@ static ssize_t hello_timer_show(struct device *d,
>   struct device_attribute *attr, char *buf)
>  {
>   struct net_bridge *br = to_bridge(d);
> - return sprintf(buf, "%ld\n", br_timer_value(&br->hello_timer));
> + return sprintf(buf, "%lu\n", br_timer_value(&br->hello_timer));
>  }
>  static DEVICE_ATTR_RO(hello_timer);
>  
> @@ -241,7 +241,7 @@ static ssize_t tcn_timer_show(struct device *d, struct 
> device_attribute *attr,
> char *buf)
>  {
>   struct net_bridge *br = to_bridge(d);
> - return sprintf(buf, "%ld\n", br_timer_value(&br->tcn_timer));
> + return sprintf(buf, "%lu\n", br_timer_value(&br->tcn_timer));
>  }
>  static DEVICE_ATTR_RO(tcn_timer);
>  
> @@ -250,7 +250,7 @@ static ssize_t topology_change_timer_show(struct device 
> *d,
> char *buf)
>  {
>   struct net_bridge *br = to_bridge(d);
> - return sprintf(buf, "%ld\n", 
> br_timer_value(&br->topology_change_timer));
> + return sprintf(buf, "%lu\n", 
> br_timer_value(&br->topology_change_timer));
>  }
>  static DEVICE_ATTR_RO(topology_change_timer);
>  
> @@ -258,7 +258,7 @@ static ssize_t gc_timer_show(struct device *d, struct 
> device_attribute *attr,
>char *buf)
>  {
>   struct net_bridge *br = to_bridge(d);
> - return sprintf(buf, "%ld\n", br_timer_value(&br->gc_work.timer));
> + return sprintf(buf, "%lu\n", br_timer_value(&br->gc_work.timer));
>  }
>  static DEVICE_ATTR_RO(gc_timer);
>  
> @@ -814,7 +814,7 @@ static ssize_t vlan_stats_enabled_show(struct device *d,
>  char *buf)
>  {
>   struct net_bridge *br = to_bridge(d);
> - return sprintf(buf, "%u\n", br_opt_get(br, BROPT_VLAN_STATS_ENABLED));
> + return sprintf(buf, "%d\n", br_opt_get(br, BROPT_VLAN_STATS_ENABLED));
>  }
>  
>  static ssize_t vlan_stats_enabled_store(struct device *d,
> @@ -830,7 +830,7 @@ static ssize_t vlan_stats_per_port_show(struct device *d,
>   char *buf)
>  {
>   struct net_bridge *br = to_bridge(d);
> - return sprintf(buf, "%u\n", br_opt_get(br, BROPT_VLAN_STATS_PER_PORT));
> + return sprintf(buf, "%d\n", br_opt_get(br, BROPT_VLAN_STATS_PER_PORT));
>  }
>  
>  static ssize_t vlan_stats_per_port_store(struct device *d,
>

Re: [RFC PATCH v2] net: bridge: igmp: Extend IGMP query to be per vlan

2021-01-13 Thread Nikolay Aleksandrov

On 12/01/2021 15:59, Horatiu Vultur wrote:
> Based on the comments of the previous version, we started to work on a
> new version, so it would be possible to enable/disable queries per vlan.
> This is still work in progress and there are plenty of things that are
> not implemented and tested:
> - ipv6 support
> - the fast path needs to be improved
> - currently it is possible only to enable/disable the queries per vlan,
>   all the other configurations are global
> - toggling vlan_filtering is not tested
> - remove duplicated information
> - etc...
> 
> But there are few things that are working like:
> - sending queries per vlan
> - stop sending queries if there is a better querier per vlan
> - when ports are added/removed from vlan
> - etc...
> 
> We were wondering if this what you had in mind when you proposed to have
> this per vlan? Or we are completely off? Or we should fix some of the
> issues that I mentioned, before you can see more clearly the direction?
> 
> Signed-off-by: Horatiu Vultur 
> ---
>  include/uapi/linux/if_link.h |   1 +
>  net/bridge/br_device.c   |   2 +-
>  net/bridge/br_input.c|   2 +-
>  net/bridge/br_multicast.c| 505 ++-
>  net/bridge/br_netlink.c  |   9 +-
>  net/bridge/br_private.h  |  90 ++-
>  net/bridge/br_sysfs_br.c |  31 ++-
>  net/bridge/br_vlan.c |   3 +
>  8 files changed, 560 insertions(+), 83 deletions(-)
> 

Hi Horatiu,
No, unfortunately not even close. We already have per-port per-vlan and global 
per-vlan
contexts which are also linked together for each vlan, those must be used for 
any vlan
configuration and state. The problem is that you'd have to mix igmp and vlan 
code and
those two live under two different kconfig options, and worse rely on different 
locks, so
extra care must be taken. Any vlan lookups must use the vlan hashes, (almost) 
_no_ linear
walks or new lists are needed (the exception is obviously port going down where 
a walk
over port's vlans is needed). In almost all contexts below a vlan lookup has 
already been
done by the input functions, the result of that lookup must be saved and 
re-used. The
vlan options API needs to be used for configuring vlans (per-vlan mcast 
options), unfortunately
I still haven't upstreamed the iproute2 part, so you might have to do that as 
well.
Obviously with all of the above the current default situation must not change 
unless the
user configures it so. If you don't need this asap, I'll probably get to it in 
two months
after EHT and the new bridge flush api, even we are still carrying an 
out-of-tree patch
for this which someone (not from cumulus) tried to upstream a few years back, 
but it also has
wrong design in general. :)

Thanks,
 Nik

> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
> index 82708c6db432..11ec1d45c24e 100644
> --- a/include/uapi/linux/if_link.h
> +++ b/include/uapi/linux/if_link.h
> @@ -472,6 +472,7 @@ enum {
>   IFLA_BR_MCAST_MLD_VERSION,
>   IFLA_BR_VLAN_STATS_PER_PORT,
>   IFLA_BR_MULTI_BOOLOPT,
> + IFLA_BR_MCAST_QUERIER_VID,
>   __IFLA_BR_MAX,
>  };
>  
> diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
> index 3f2f06b4dd27..aca4e8074a8f 100644
> --- a/net/bridge/br_device.c
> +++ b/net/bridge/br_device.c
> @@ -89,7 +89,7 @@ netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct 
> net_device *dev)
>  
>   mdst = br_mdb_get(br, skb, vid);
>   if ((mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb)) &&
> - br_multicast_querier_exists(br, eth_hdr(skb), mdst))
> + br_multicast_querier_exists(br, eth_hdr(skb), mdst, vid))
>   br_multicast_flood(mdst, skb, false, true);
>   else
>   br_flood(br, skb, BR_PKT_MULTICAST, false, true);
> diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
> index 85d9dae2..03e445af6c1f 100644
> --- a/net/bridge/br_input.c
> +++ b/net/bridge/br_input.c
> @@ -130,7 +130,7 @@ int br_handle_frame_finish(struct net *net, struct sock 
> *sk, struct sk_buff *skb
>   case BR_PKT_MULTICAST:
>   mdst = br_mdb_get(br, skb, vid);
>   if ((mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb)) &&
> - br_multicast_querier_exists(br, eth_hdr(skb), mdst)) {
> + br_multicast_querier_exists(br, eth_hdr(skb), mdst, vid)) {
>   if ((mdst && mdst->host_joined) ||
>   br_multicast_is_router(br)) {
>   local_rcv = true;
> diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
> index 257ac4e25f6d..b4fac25101e4 100644
> --- a/net/bridge/br_multicast.c
> +++ b/net/bridge/br_multicast.c
> @@ -48,8 +48,11 @@ static const struct rhashtable_params 
> br_sg_port_rht_params = {
>   .automatic_shrinking = true,
>  };
>  
> +static void br_ip4_multicast_query_expired(struct timer_list *t);
> +static void br_ip4_multicast_qu

Re: [PATCH v2 net-next 1/6] net: bridge: notify switchdev of disappearance of old FDB entry upon migration

2020-12-13 Thread Nikolay Aleksandrov

On 13/12/2020 15:55, Vladimir Oltean wrote:
> Hi Nik,
> 
> On Sun, Dec 13, 2020 at 03:22:16PM +0200, Nikolay Aleksandrov wrote:
>> Hi Vladimir,
>> Thank you for the good explanation, it really helps a lot to understand the 
>> issue.
>> Even though it's deceptively simple, that call adds another lock/unlock for 
>> everyone
>> when moving or learning (due to notifier lock)
> 
> This unlikely code path is just on movement, as far as I understand it.
> How often do we expect that to happen? Is there any practical use case
> where an FDB entry ping pongs between ports?
> 

It was my bad because I was looking at the wrong atomic notifier call function.
Switchdev uses the standard atomic notifier call chain with RCU only which is 
fine
and there are no locks involved.
I was looking at the _robust version with a spin_lock and that would've meant 
that
learning (because of notifications) would also block movements and vice versa.

Anyway as I said all of that is not an issue, the patch is good. I've replied 
to my comment
and acked it a few minutes ago.

>> , but I do like how simple the solution
>> becomes with this change, so I'm not strictly against it. I think I'll add a 
>> "refcnt"-like
>> check in the switchdev fn which would process the chain only when there are 
>> registered users
>> to avoid any locks when moving fdbs on pure software bridges (like it was 
>> before swdev).
> 
> That makes sense.
> 
>> I get that the alternative is to track these within DSA, I'm tempted to say 
>> that's not such
>> a bad alternative as this change would make moving fdbs slower in general.
> 
> I deliberately said "rule" instead of "static FDB entry" and "control
> interface" instead of "CPU port" because this is not only about DSA.
> I know of at least one other switchdev device which doesn't support
> source address learning for host-injected traffic. It isn't even so much
> of a quirk as it is the way that the hardware works. If you think of it
> as a "switch with queues", there would be little reason for a hardware
> designer to not just provide you the means to inject directly into the
> queues of the egress port, therefore bypassing the normal analyzer and
> forwarding logic.
> 
> Everything we do in DSA must be copied sooner or later in other similar
> drivers, to get the same functionality. So I would really like to keep
> this interface simple, and not inflict unnecessary complications if
> possible.
> 

Right, I like how the solution and this set look.

>> Have you thought about another way to find out, e.g. if more fdb
>> information is passed to the notifications ?
> 
> Like what, keep emitting just the ADD notification, but put some
> metadata in it letting listeners know that it was actually migrated from
> a different bridge port, in order to save one notification? That would
> mean that we would need to:
> 
>   case SWITCHDEV_FDB_ADD_TO_DEVICE:
>   fdb_info = ptr;
> 
>   if (dsa_slave_dev_check(dev)) {
>   if (!fdb_info->migrated_from_dev || 
> dsa_slave_dev_check(fdb_info->migrated_from_dev)) {
>   if (!fdb_info->added_by_user)
>   return NOTIFY_OK;
> 
>   dp = dsa_slave_to_port(dev);
> 
>   add = true;
>   } else if (fdb_info->migrated_from_dev && 
> !dsa_slave_dev_check(fdb_info->migrated_from_dev)) {
>   /* An address has migrated from a non-DSA port
>* to a DSA port. Check if that non-DSA port was
>* bridged with us, aka if we previously had 
> that
>* address installed towards the CPU.
>*/
>   struct net_device *br_dev;
>   struct dsa_slave_priv *p;
> 
>   br_dev = netdev_master_upper_dev_get_rcu(dev);
>   if (!br_dev)
>   return NOTIFY_DONE;
> 
>   if (!netif_is_bridge_master(br_dev))
>   return NOTIFY_DONE;
> 
>   p = dsa_slave_dev_lower_find(br_dev);
>   if (!p)
>   return NOTIFY_DONE;
> 
>   delete = true;
>   }
>   } else {
>

Re: [PATCH v2 net-next 1/6] net: bridge: notify switchdev of disappearance of old FDB entry upon migration

2020-12-13 Thread Nikolay Aleksandrov

On 13/12/2020 15:22, Nikolay Aleksandrov wrote:
> On 13/12/2020 04:40, Vladimir Oltean wrote:
>> Currently the bridge emits atomic switchdev notifications for
>> dynamically learnt FDB entries. Monitoring these notifications works
>> wonders for switchdev drivers that want to keep their hardware FDB in
>> sync with the bridge's FDB.
>>
>> For example station A wants to talk to station B in the diagram below,
>> and we are concerned with the behavior of the bridge on the DUT device:
>>
>>DUT
>>  +-+
>>  | br0 |
>>  | +--+ +--+ +--+ +--+ |
>>  | |  | |  | |  | |  | |
>>  | | swp0 | | swp1 | | swp2 | | eth0 | |
>>  +-+
>>   ||  |
>>   Station A|  |
>>|  |
>>  +--+--+--++--+--+--+
>>  |  |  |  ||  |  |  |
>>  |  | swp0 |  ||  | swp0 |  |
>>  Another |  +--+  ||  +--+  | Another
>>   switch | br0|| br0| switch
>>  |  +--+  ||  +--+  |
>>  |  |  |  ||  |  |  |
>>  |  | swp1 |  ||  | swp1 |  |
>>  +--+--+--++--+--+--+
>>   |
>>   Station B
>>
>> Interfaces swp0, swp1, swp2 are handled by a switchdev driver that has
>> the following property: frames injected from its control interface bypass
>> the internal address analyzer logic, and therefore, this hardware does
>> not learn from the source address of packets transmitted by the network
>> stack through it. So, since bridging between eth0 (where Station B is
>> attached) and swp0 (where Station A is attached) is done in software,
>> the switchdev hardware will never learn the source address of Station B.
>> So the traffic towards that destination will be treated as unknown, i.e.
>> flooded.
>>
>> This is where the bridge notifications come in handy. When br0 on the
>> DUT sees frames with Station B's MAC address on eth0, the switchdev
>> driver gets these notifications and can install a rule to send frames
>> towards Station B's address that are incoming from swp0, swp1, swp2,
>> only towards the control interface. This is all switchdev driver private
>> business, which the notification makes possible.
>>
>> All is fine until someone unplugs Station B's cable and moves it to the
>> other switch:
>>
>>DUT
>>  +-+
>>  | br0 |
>>  | +--+ +--+ +--+ +--+ |
>>  | |  | |  | |  | |  | |
>>  | | swp0 | | swp1 | | swp2 | | eth0 | |
>>  +-+
>>   ||  |
>>   Station A|  |
>>|  |
>>  +--+--+--++--+--+--+
>>  |  |  |  ||  |  |  |
>>  |  | swp0 |  ||  | swp0 |  |
>>  Another |  +--+  ||  +--+  | Another
>>   switch | br0|| br0| switch
>>  |  +--+  ||  +--+  |
>>  |  |  |  ||  |  |  |
>>  |  | swp1 |  ||  | swp1 |  |
>>  +--+--+--++--+--+--+
>>|
>>Station B
>>
>> Luckily for the use cases we care about, Station B is noisy enough that
>> the DUT hears it (on swp1 this time). swp1 receives the frames and
>> delivers them to the bridge, who enters the unlikely path in br_fdb_update
>> of updating an existing entry. It moves the entry in the software bridge
>> to swp1 and emits an addition notification towards that.
>>
>> As far as the switchdev driver is concerned, all that it needs to ensure
>> is that traffic between Station A and Station B is not forever broken.
>> If it does nothing, then the stale rule to send frames for Station B
>> towards the control interface remains in place. But Station B is no
>> longer reachable via the control interface, but via a port that can
>> offload the bridge port learning attribute. It's just that the port is
>> prevented from learning this address, since the rule overrides FDB
>> updates. So the rule needs to go. The question is via what mechanism.
>>
>> It sure would be possible for this switchdev driver to keep track of all
>>

Re: [PATCH v2 net-next 1/6] net: bridge: notify switchdev of disappearance of old FDB entry upon migration

2020-12-13 Thread Nikolay Aleksandrov

On 13/12/2020 04:40, Vladimir Oltean wrote:
> Currently the bridge emits atomic switchdev notifications for
> dynamically learnt FDB entries. Monitoring these notifications works
> wonders for switchdev drivers that want to keep their hardware FDB in
> sync with the bridge's FDB.
> 
> For example station A wants to talk to station B in the diagram below,
> and we are concerned with the behavior of the bridge on the DUT device:
> 
>DUT
>  +-+
>  | br0 |
>  | +--+ +--+ +--+ +--+ |
>  | |  | |  | |  | |  | |
>  | | swp0 | | swp1 | | swp2 | | eth0 | |
>  +-+
>   ||  |
>   Station A|  |
>|  |
>  +--+--+--++--+--+--+
>  |  |  |  ||  |  |  |
>  |  | swp0 |  ||  | swp0 |  |
>  Another |  +--+  ||  +--+  | Another
>   switch | br0|| br0| switch
>  |  +--+  ||  +--+  |
>  |  |  |  ||  |  |  |
>  |  | swp1 |  ||  | swp1 |  |
>  +--+--+--++--+--+--+
>   |
>   Station B
> 
> Interfaces swp0, swp1, swp2 are handled by a switchdev driver that has
> the following property: frames injected from its control interface bypass
> the internal address analyzer logic, and therefore, this hardware does
> not learn from the source address of packets transmitted by the network
> stack through it. So, since bridging between eth0 (where Station B is
> attached) and swp0 (where Station A is attached) is done in software,
> the switchdev hardware will never learn the source address of Station B.
> So the traffic towards that destination will be treated as unknown, i.e.
> flooded.
> 
> This is where the bridge notifications come in handy. When br0 on the
> DUT sees frames with Station B's MAC address on eth0, the switchdev
> driver gets these notifications and can install a rule to send frames
> towards Station B's address that are incoming from swp0, swp1, swp2,
> only towards the control interface. This is all switchdev driver private
> business, which the notification makes possible.
> 
> All is fine until someone unplugs Station B's cable and moves it to the
> other switch:
> 
>DUT
>  +-+
>  | br0 |
>  | +--+ +--+ +--+ +--+ |
>  | |  | |  | |  | |  | |
>  | | swp0 | | swp1 | | swp2 | | eth0 | |
>  +-+
>   ||  |
>   Station A|  |
>|  |
>  +--+--+--++--+--+--+
>  |  |  |  ||  |  |  |
>  |  | swp0 |  ||  | swp0 |  |
>  Another |  +--+  ||  +--+  | Another
>   switch | br0|| br0| switch
>  |  +--+  ||  +--+  |
>  |  |  |  ||  |  |  |
>  |  | swp1 |  ||  | swp1 |  |
>  +--+--+--++--+--+--+
>|
>Station B
> 
> Luckily for the use cases we care about, Station B is noisy enough that
> the DUT hears it (on swp1 this time). swp1 receives the frames and
> delivers them to the bridge, who enters the unlikely path in br_fdb_update
> of updating an existing entry. It moves the entry in the software bridge
> to swp1 and emits an addition notification towards that.
> 
> As far as the switchdev driver is concerned, all that it needs to ensure
> is that traffic between Station A and Station B is not forever broken.
> If it does nothing, then the stale rule to send frames for Station B
> towards the control interface remains in place. But Station B is no
> longer reachable via the control interface, but via a port that can
> offload the bridge port learning attribute. It's just that the port is
> prevented from learning this address, since the rule overrides FDB
> updates. So the rule needs to go. The question is via what mechanism.
> 
> It sure would be possible for this switchdev driver to keep track of all
> addresses which are sent to the control interface, and then also listen
> for bridge notifier events on its own ports, searching for the ones that
> have a MAC address which was previously sent to the control interface.
> But this is cumbersome and inefficient. Instead, with one small change,
> the bridge could notify of the address deletion from the old port, in a
> symmetrical manner with how it did for the insertion. Then the switchdev
> driver would not be required to monitor learn/forget events for its own
> ports. It could just delete the rule towards the control interface upon
> bridge entry migration. This would make hardware address learning be
> possible again. Then it would take a

Re: [PATCH v2] net: bridge: Fix a warning when del bridge sysfs

2020-12-11 Thread Nikolay Aleksandrov

On 11/12/2020 14:29, Wang Hai wrote:
> I got a warining report:
> 
> br_sysfs_addbr: can't create group bridge4/bridge
> [ cut here ]
> sysfs group 'bridge' not found for kobject 'bridge4'
> WARNING: CPU: 2 PID: 9004 at fs/sysfs/group.c:279 sysfs_remove_group 
> fs/sysfs/group.c:279 [inline]
> WARNING: CPU: 2 PID: 9004 at fs/sysfs/group.c:279 
> sysfs_remove_group+0x153/0x1b0 fs/sysfs/group.c:270
> Modules linked in: iptable_nat
> ...
> Call Trace:
>   br_dev_delete+0x112/0x190 net/bridge/br_if.c:384
>   br_dev_newlink net/bridge/br_netlink.c:1381 [inline]
>   br_dev_newlink+0xdb/0x100 net/bridge/br_netlink.c:1362
>   __rtnl_newlink+0xe11/0x13f0 net/core/rtnetlink.c:3441
>   rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3500
>   rtnetlink_rcv_msg+0x385/0x980 net/core/rtnetlink.c:5562
>   netlink_rcv_skb+0x134/0x3d0 net/netlink/af_netlink.c:2494
>   netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
>   netlink_unicast+0x4a0/0x6a0 net/netlink/af_netlink.c:1330
>   netlink_sendmsg+0x793/0xc80 net/netlink/af_netlink.c:1919
>   sock_sendmsg_nosec net/socket.c:651 [inline]
>   sock_sendmsg+0x139/0x170 net/socket.c:671
>   sys_sendmsg+0x658/0x7d0 net/socket.c:2353
>   ___sys_sendmsg+0xf8/0x170 net/socket.c:2407
>   __sys_sendmsg+0xd3/0x190 net/socket.c:2440
>   do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46
>   entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> In br_device_event(), if the bridge sysfs fails to be added,
> br_device_event() should return error. This can prevent warining
> when removing bridge sysfs that do not exist.
> 
> Fixes: bb900b27a2f4 ("bridge: allow creating bridge devices with netlink")
> Reported-by: Hulk Robot 
> Signed-off-by: Wang Hai 
> ---
> v1->v2: Fix this by check br_sysfs_addbr() return value as Nik's suggestion
>  net/bridge/br.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/net/bridge/br.c b/net/bridge/br.c
> index 401eeb9142eb..1b169f8e7491 100644
> --- a/net/bridge/br.c
> +++ b/net/bridge/br.c
> @@ -43,7 +43,10 @@ static int br_device_event(struct notifier_block *unused, 
> unsigned long event, v
>  
>   if (event == NETDEV_REGISTER) {
>   /* register of bridge completed, add sysfs entries */
> - br_sysfs_addbr(dev);
> + err = br_sysfs_addbr(dev);
> + if (err)
> +     return notifier_from_errno(err);
> +
>   return NOTIFY_DONE;
>   }
>   }
> 

Patch looks good, I also tested it with a notifier error injecting.
Tested-by: Nikolay Aleksandrov 
Acked-by: Nikolay Aleksandrov

Re: [RFC net-next] net: bridge: igmp: Extend IGMP query with vlan support

2020-12-11 Thread Nikolay Aleksandrov

On 11/12/2020 11:26, Horatiu Vultur wrote:
> This patch tries to add vlan support to IGMP queries.
> It extends the function 'br_ip4_multicast_alloc_query' to add
> also a vlan tag if vlan is enabled. Therefore the bridge will send
> queries for each vlan the ports are in.
> 
> There are few other places that needs to be updated to be fully
> functional. But I am curious if this is the way to go forward or is
> there a different way of implementing this?
> 
> Signed-off-by: Horatiu Vultur 
> ---
>  net/bridge/br_multicast.c | 31 ++-
>  1 file changed, 26 insertions(+), 5 deletions(-)
> 

Hi Horatiu,
We've discussed this with other people on netdev before, the way forward is to
implement it as a per-vlan option and then have a per-vlan querier. Which would 
also
make the change much bigger and more complex. In general some of the multicast 
options
need to be replicated for vlans to get proper per-vlan multicast control and 
operation, but
that would require to change a lot of logic around the whole bridge (fast-path 
included,
where it'd be most sensitive). The good news is that these days we have 
per-vlan options
support and so only the actual per-vlan multicast implementation is left to be 
done.
I have this on my TODO list, unfortunately that list gets longer and longer,
so I'd be happy to review patches if someone decides to do it sooner. :)

Sorry, I couldn't find the previous discussion, it was a few years back.

Cheers,
 Nik

> diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
> index 484820c223a3..4c2db8a9efe0 100644
> --- a/net/bridge/br_multicast.c
> +++ b/net/bridge/br_multicast.c
> @@ -688,7 +688,8 @@ static struct sk_buff 
> *br_ip4_multicast_alloc_query(struct net_bridge *br,
>   __be32 ip_dst, __be32 group,
>   bool with_srcs, bool 
> over_lmqt,
>   u8 sflag, u8 *igmp_type,
> - bool *need_rexmit)
> + bool *need_rexmit,
> + __u16 vid)
>  {
>   struct net_bridge_port *p = pg ? pg->key.port : NULL;
>   struct net_bridge_group_src *ent;
> @@ -724,6 +725,9 @@ static struct sk_buff 
> *br_ip4_multicast_alloc_query(struct net_bridge *br,
>   }
>  
>   pkt_size = sizeof(*eth) + sizeof(*iph) + 4 + igmp_hdr_size;
> + if (br_vlan_enabled(br->dev) && vid != 0)
> + pkt_size += 4;
> +
>   if ((p && pkt_size > p->dev->mtu) ||
>   pkt_size > br->dev->mtu)
>   return NULL;
> @@ -732,6 +736,9 @@ static struct sk_buff 
> *br_ip4_multicast_alloc_query(struct net_bridge *br,
>   if (!skb)
>   goto out;
>  
> + if (br_vlan_enabled(br->dev) && vid != 0)
> + __vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), vid);
> +
>   skb->protocol = htons(ETH_P_IP);
>  
>   skb_reset_mac_header(skb);
> @@ -1008,7 +1015,8 @@ static struct sk_buff *br_multicast_alloc_query(struct 
> net_bridge *br,
>   ip4_dst, group->dst.ip4,
>   with_srcs, over_lmqt,
>   sflag, igmp_type,
> - need_rexmit);
> + need_rexmit,
> + group->vid);
>  #if IS_ENABLED(CONFIG_IPV6)
>   case htons(ETH_P_IPV6): {
>   struct in6_addr ip6_dst;
> @@ -1477,6 +1485,8 @@ static void br_multicast_send_query(struct net_bridge 
> *br,
>   struct bridge_mcast_own_query *own_query)
>  {
>   struct bridge_mcast_other_query *other_query = NULL;
> + struct net_bridge_vlan_group *vg;
> + struct net_bridge_vlan *v;
>   struct br_ip br_group;
>   unsigned long time;
>  
> @@ -1485,7 +1495,7 @@ static void br_multicast_send_query(struct net_bridge 
> *br,
>   !br_opt_get(br, BROPT_MULTICAST_QUERIER))
>   return;
>  
> - memset(&br_group.dst, 0, sizeof(br_group.dst));
> + memset(&br_group, 0, sizeof(br_group));
>  
>   if (port ? (own_query == &port->ip4_own_query) :
>  (own_query == &br->ip4_own_query)) {
> @@ -1501,8 +1511,19 @@ static void br_multicast_send_query(struct net_bridge 
> *br,
>   if (!other_query || timer_pending(&other_query->timer))
>   return;
>  
> - __br_multicast_send_query(br, port, NULL, NULL, &br_group, false, 0,
> -   NULL);
> + if (br_vlan_enabled(br->dev) && port) {
> + vg = nbp_vlan_group(port);
> +
> + list_for_each_entry(v, &vg->vlan_list, vlist) {
> + br_group.vid = v->vid == vg->pvid ? 0 : v->vid;
> +
> + __br_mult

Re: [RFC PATCH 10/11] mm/filemap: Add folio_add_to_page_cache

2020-12-11 Thread Nikolay Borisov




On 8.12.20 г. 21:46 ч., Matthew Wilcox (Oracle) wrote:
> Pages being added to the page cache should already be folios, so
> turn add_to_page_cache_lru() into a wrapper.  Saves hundreds of
> bytes of text.
> 
> Signed-off-by: Matthew Wilcox (Oracle) 
> ---
>  include/linux/pagemap.h | 13 +++--
>  mm/filemap.c| 62 -
>  2 files changed, 41 insertions(+), 34 deletions(-)
> 
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 060faeb8d701..3bc56b3aa384 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -778,9 +778,9 @@ static inline int fault_in_pages_readable(const char 
> __user *uaddr, int size)
>  }
>  
>  int add_to_page_cache_locked(struct page *page, struct address_space 
> *mapping,
> - pgoff_t index, gfp_t gfp_mask);
> -int add_to_page_cache_lru(struct page *page, struct address_space *mapping,
> - pgoff_t index, gfp_t gfp_mask);
> + pgoff_t index, gfp_t gfp);
> +int folio_add_to_page_cache(struct folio *folio, struct address_space 
> *mapping,
> + pgoff_t index, gfp_t gfp);
>  extern void delete_from_page_cache(struct page *page);
>  extern void __delete_from_page_cache(struct page *page, void *shadow);
>  int replace_page_cache_page(struct page *old, struct page *new, gfp_t 
> gfp_mask);
> @@ -805,6 +805,13 @@ static inline int add_to_page_cache(struct page *page,
>   return error;
>  }
>  
> +static inline int add_to_page_cache_lru(struct page *page,
> + struct address_space *mapping, pgoff_t index, gfp_t gfp)
> +{
> + return folio_add_to_page_cache((struct folio *)page, mapping,
> + index, gfp);
> +}
> +
>  /**
>   * struct readahead_control - Describes a readahead request.
>   *
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 56ff6aa24265..297144524f58 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -828,25 +828,25 @@ int replace_page_cache_page(struct page *old, struct 
> page *new, gfp_t gfp_mask)
>  }
>  EXPORT_SYMBOL_GPL(replace_page_cache_page);
>  
> -static noinline int __add_to_page_cache_locked(struct page *page,
> +static noinline int __add_to_page_cache_locked(struct folio *folio,
>   struct address_space *mapping,
> - pgoff_t offset, gfp_t gfp,
> + pgoff_t index, gfp_t gfp,
>   void **shadowp)
>  {
> - XA_STATE(xas, &mapping->i_pages, offset);
> - int huge = PageHuge(page);
> + XA_STATE(xas, &mapping->i_pages, index);
> + int huge = PageHuge(&folio->page);

PageHuge also does page_compound, since you know this is either the head
page or not you could use PageHeadHuge which simply checks if it's a
head page and then goes directly to perform the hugepage check via the
dtor.

Re: [PATCH v2 1/8] lib/find_bit.c: Add find_last_zero_bit

2020-12-06 Thread Nikolay Borisov

On 6.12.20 г. 10:56 ч., Yun Levi wrote:
>> This, and the change above this, are not related to this patch so you
>> might not want to include them.
> 
>> Also, why is this patch series even needed?  I don't see a justification
>> for it anywhere, only "what" this patch is, not "why".
> 
> I think the find_last_zero_bit will help to improve in
> 7th patch's change and It can be used in the future.
> But if my thinking is bad.. Please let me know..
> 
> Thanks.
> Levi.
> 

btrfs' free space cache v1 is going to be removed some time in the
future so introducing kernel-wide change just for its own sake is a bit
premature. Also do you have measurements showing it indeed improves
performances?

Re: [PATCH v3] bridge: Fix a deadlock when enabling multicast snooping

2020-12-05 Thread Nikolay Aleksandrov

On 05/12/2020 01:56, Joseph Huang wrote:
> When enabling multicast snooping, bridge module deadlocks on multicast_lock
> if 1) IPv6 is enabled, and 2) there is an existing querier on the same L2
> network.
> 
> The deadlock was caused by the following sequence: While holding the lock,
> br_multicast_open calls br_multicast_join_snoopers, which eventually causes
> IP stack to (attempt to) send out a Listener Report (in igmp6_join_group).
> Since the destination Ethernet address is a multicast address, br_dev_xmit
> feeds the packet back to the bridge via br_multicast_rcv, which in turn
> calls br_multicast_add_group, which then deadlocks on multicast_lock.
> 
> The fix is to move the call br_multicast_join_snoopers outside of the
> critical section. This works since br_multicast_join_snoopers only deals
> with IP and does not modify any multicast data structures of the bridge,
> so there's no need to hold the lock.
> 
> Steps to reproduce:
> 1. sysctl net.ipv6.conf.all.force_mld_version=1
> 2. have another querier
> 3. ip link set dev bridge type bridge mcast_snooping 0 && \
>ip link set dev bridge type bridge mcast_snooping 1 < deadlock >
> 
> A typical call trace looks like the following:
> 
> [  936.251495]  _raw_spin_lock+0x5c/0x68
> [  936.255221]  br_multicast_add_group+0x40/0x170 [bridge]
> [  936.260491]  br_multicast_rcv+0x7ac/0xe30 [bridge]
> [  936.265322]  br_dev_xmit+0x140/0x368 [bridge]
> [  936.269689]  dev_hard_start_xmit+0x94/0x158
> [  936.273876]  __dev_queue_xmit+0x5ac/0x7f8
> [  936.277890]  dev_queue_xmit+0x10/0x18
> [  936.281563]  neigh_resolve_output+0xec/0x198
> [  936.285845]  ip6_finish_output2+0x240/0x710
> [  936.290039]  __ip6_finish_output+0x130/0x170
> [  936.294318]  ip6_output+0x6c/0x1c8
> [  936.297731]  NF_HOOK.constprop.0+0xd8/0xe8
> [  936.301834]  igmp6_send+0x358/0x558
> [  936.305326]  igmp6_join_group.part.0+0x30/0xf0
> [  936.309774]  igmp6_group_added+0xfc/0x110
> [  936.313787]  __ipv6_dev_mc_inc+0x1a4/0x290
> [  936.317885]  ipv6_dev_mc_inc+0x10/0x18
> [  936.321677]  br_multicast_open+0xbc/0x110 [bridge]
> [  936.326506]  br_multicast_toggle+0xec/0x140 [bridge]
> 
> Fixes: 4effd28c1245 ("bridge: join all-snoopers multicast address")
> Signed-off-by: Joseph Huang 
> ---
>  net/bridge/br_device.c|  6 ++
>  net/bridge/br_multicast.c | 34 +-
>  net/bridge/br_private.h   | 10 ++
>  3 files changed, 41 insertions(+), 9 deletions(-)
> 

LGTM, thanks!
Acked-by: Nikolay Aleksandrov

Re: [PATCH v2] bridge: Fix a deadlock when enabling multicast snooping

2020-12-04 Thread Nikolay Aleksandrov

On 04/12/2020 23:39, Joseph Huang wrote:
> When enabling multicast snooping, bridge module deadlocks on multicast_lock
> if 1) IPv6 is enabled, and 2) there is an existing querier on the same L2
> network.
> 
> The deadlock was caused by the following sequence: While holding the lock,
> br_multicast_open calls br_multicast_join_snoopers, which eventually causes
> IP stack to (attempt to) send out a Listener Report (in igmp6_join_group).
> Since the destination Ethernet address is a multicast address, br_dev_xmit
> feeds the packet back to the bridge via br_multicast_rcv, which in turn
> calls br_multicast_add_group, which then deadlocks on multicast_lock.
> 
> The fix is to move the call br_multicast_join_snoopers outside of the
> critical section. This works since br_multicast_join_snoopers only deals
> with IP and does not modify any multicast data structures of the bridge,
> so there's no need to hold the lock.
> 
> Steps to reproduce:
> 1. sysctl net.ipv6.conf.all.force_mld_version=1
> 2. have another querier
> 3. ip link set dev bridge type bridge mcast_snooping 0 && \
>ip link set dev bridge type bridge mcast_snooping 1 < deadlock >
> 
> A typical call trace looks like the following:
> 
> [  936.251495]  _raw_spin_lock+0x5c/0x68
> [  936.255221]  br_multicast_add_group+0x40/0x170 [bridge]
> [  936.260491]  br_multicast_rcv+0x7ac/0xe30 [bridge]
> [  936.265322]  br_dev_xmit+0x140/0x368 [bridge]
> [  936.269689]  dev_hard_start_xmit+0x94/0x158
> [  936.273876]  __dev_queue_xmit+0x5ac/0x7f8
> [  936.277890]  dev_queue_xmit+0x10/0x18
> [  936.281563]  neigh_resolve_output+0xec/0x198
> [  936.285845]  ip6_finish_output2+0x240/0x710
> [  936.290039]  __ip6_finish_output+0x130/0x170
> [  936.294318]  ip6_output+0x6c/0x1c8
> [  936.297731]  NF_HOOK.constprop.0+0xd8/0xe8
> [  936.301834]  igmp6_send+0x358/0x558
> [  936.305326]  igmp6_join_group.part.0+0x30/0xf0
> [  936.309774]  igmp6_group_added+0xfc/0x110
> [  936.313787]  __ipv6_dev_mc_inc+0x1a4/0x290
> [  936.317885]  ipv6_dev_mc_inc+0x10/0x18
> [  936.321677]  br_multicast_open+0xbc/0x110 [bridge]
> [  936.326506]  br_multicast_toggle+0xec/0x140 [bridge]
> 
> Fixes: 4effd28c1245 ("bridge: join all-snoopers multicast address")
> Signed-off-by: Joseph Huang 
> ---

Hi,
Thank you for fixing it up, a few minor nits below. Overall the patch
looks good.


>  net/bridge/br_device.c|  6 ++
>  net/bridge/br_multicast.c | 33 -
>  net/bridge/br_private.h   | 10 ++
>  3 files changed, 40 insertions(+), 9 deletions(-)
> 
> diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
> index 7730c8f3cb53..d3ea9d0779fb 100644
> --- a/net/bridge/br_device.c
> +++ b/net/bridge/br_device.c
> @@ -177,6 +177,9 @@ static int br_dev_open(struct net_device *dev)
>   br_stp_enable_bridge(br);
>   br_multicast_open(br);
>  
> + if (br_opt_get(br, BROPT_MULTICAST_ENABLED))
> + br_multicast_join_snoopers(br);
> +
>   return 0;
>  }
>  
> @@ -197,6 +200,9 @@ static int br_dev_stop(struct net_device *dev)
>   br_stp_disable_bridge(br);
>   br_multicast_stop(br);
>  
> + if (br_opt_get(br, BROPT_MULTICAST_ENABLED))
> + br_multicast_leave_snoopers(br);
> +
>   netif_stop_queue(dev);
>  
>   return 0;
> diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
> index eae898c3cff7..426fe00db708 100644
> --- a/net/bridge/br_multicast.c
> +++ b/net/bridge/br_multicast.c
> @@ -3286,7 +3286,7 @@ static inline void 
> br_ip6_multicast_join_snoopers(struct net_bridge *br)
>  }
>  #endif
>  
> -static void br_multicast_join_snoopers(struct net_bridge *br)
> +void br_multicast_join_snoopers(struct net_bridge *br)
>  {
>   br_ip4_multicast_join_snoopers(br);
>   br_ip6_multicast_join_snoopers(br);
> @@ -3317,7 +3317,7 @@ static inline void 
> br_ip6_multicast_leave_snoopers(struct net_bridge *br)
>  }
>  #endif
>  
> -static void br_multicast_leave_snoopers(struct net_bridge *br)
> +void br_multicast_leave_snoopers(struct net_bridge *br)
>  {
>   br_ip4_multicast_leave_snoopers(br);
>   br_ip6_multicast_leave_snoopers(br);
> @@ -3336,9 +3336,6 @@ static void __br_multicast_open(struct net_bridge *br,
>  
>  void br_multicast_open(struct net_bridge *br)
>  {
> - if (br_opt_get(br, BROPT_MULTICAST_ENABLED))
> - br_multicast_join_snoopers(br);
> -
>   __br_multicast_open(br, &br->ip4_own_query);
>  #if IS_ENABLED(CONFIG_IPV6)
>   __br_multicast_open(br, &br->ip6_own_query);
> @@ -3354,9 +3351,6 @@ void br_multicast_stop(struct net_bridge *br)
>   del_timer_sync(&br->ip6_other_query.timer);
>   del_timer_sync(&br->ip6_own_query.timer);
>  #endif
> -
> - if (br_opt_get(br, BROPT_MULTICAST_ENABLED))
> - br_multicast_leave_snoopers(br);
>  }
>  
>  void br_multicast_dev_del(struct net_bridge *br)
> @@ -3487,6 +3481,8 @@ static void br_multicast_start_querier(struct 
> net_bridge *br,
>  int br_multicast_toggle(struct n

Re: [PATCH net] net: bridge: vlan: fix error return code in __vlan_add()

2020-12-04 Thread Nikolay Aleksandrov

On 04/12/2020 10:48, Zhang Changzhong wrote:
> Fix to return a negative error code from the error handling
> case instead of 0, as done elsewhere in this function.
> 
> Fixes: f8ed289fab84 ("bridge: vlan: use br_vlan_(get|put)_master to deal with 
> refcounts")
> Reported-by: Hulk Robot 
> Signed-off-by: Zhang Changzhong 
> ---
>  net/bridge/br_vlan.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
> index 3e493eb..08c7741 100644
> --- a/net/bridge/br_vlan.c
> +++ b/net/bridge/br_vlan.c
> @@ -266,8 +266,10 @@ static int __vlan_add(struct net_bridge_vlan *v, u16 
> flags,
>   }
>  
>   masterv = br_vlan_get_master(br, v->vid, extack);
> - if (!masterv)
> + if (!masterv) {
> + err = -ENOMEM;
>   goto out_filt;
> + }
>   v->brvlan = masterv;
>   if (br_opt_get(br, BROPT_VLAN_STATS_PER_PORT)) {
>   v->stats = netdev_alloc_pcpu_stats(struct 
> br_vlan_stats);
> 

Acked-by: Nikolay Aleksandrov

Re: [PATCH] bridge: Fix a deadlock when enabling multicast snooping

2020-12-03 Thread Nikolay Aleksandrov

On 04/12/2020 00:42, Huang, Joseph wrote:
>> From: Huang, Joseph
>> Sent: Thursday, December 3, 2020 4:53 PM
>> To: Nikolay Aleksandrov ; Jakub Kicinski
>> 
>> Cc: Roopa Prabhu ; David S. Miller
>> ; bri...@lists.linux-foundation.org;
>> net...@vger.kernel.org; linux-kernel@vger.kernel.org; Linus Lüssing
>> 
>> Subject: RE: [PATCH] bridge: Fix a deadlock when enabling multicast snooping
>>
>>> From: Nikolay Aleksandrov 
>>> Sent: Thursday, December 3, 2020 3:47 PM
>>> To: Jakub Kicinski ; Huang, Joseph
>>> 
>>> Cc: Roopa Prabhu ; David S. Miller
>>> ; bri...@lists.linux-foundation.org;
>>> net...@vger.kernel.org; linux-kernel@vger.kernel.org; Linus Lüssing
>>> 
>>> Subject: Re: [PATCH] bridge: Fix a deadlock when enabling multicast
>>> snooping
>>>
>>> On 03/12/2020 20:28, Jakub Kicinski wrote:
>>>> On Tue, 1 Dec 2020 16:40:47 -0500 Joseph Huang wrote:
>>>>> When enabling multicast snooping, bridge module deadlocks on
>>>>> multicast_lock if 1) IPv6 is enabled, and 2) there is an existing
>>>>> querier on the same L2 network.
>>>>>
>>>>> The deadlock was caused by the following sequence: While holding the
>>>>> lock, br_multicast_open calls br_multicast_join_snoopers, which
>>>>> eventually causes IP stack to (attempt to) send out a Listener Report (in
>>> igmp6_join_group).
>>>>> Since the destination Ethernet address is a multicast address,
>>>>> br_dev_xmit feeds the packet back to the bridge via br_multicast_rcv,
>>>>> which in turn calls br_multicast_add_group, which then deadlocks on
>>> multicast_lock.
>>>>>
>>>>> The fix is to move the call br_multicast_join_snoopers outside of the
>>>>> critical section. This works since br_multicast_join_snoopers only
>>>>> deals with IP and does not modify any multicast data structures of
>>>>> the bridge, so there's no need to hold the lock.
>>>>>
>>>>> Fixes: 4effd28c1245 ("bridge: join all-snoopers multicast address")
>>>>>
>>>>> Signed-off-by: Joseph Huang 
>>>>
>>>> Nik, Linus - how does this one look?
>>>>
>>>
>>> Hi,
>>> Thanks, somehow I missed this one too. Need to check my email config. :) I
>>> believe I see how it can happen, although it's not straight-forward to 
>>> follow.
>>> A selftest for this case would be great, and any traces (e.g. hung task)
>> would
>>> help a lot as well.
>>> Correct me if I'm wrong but the sequence is something like:
>>> br_multicast_join_snoopers -> ipv6_dev_mc_inc -> __ipv6_dev_mc_inc ->
>>> igmp6_group_added
>>> -> MLDv1 (mode) igmp6_join_group() -> Again MLDv1 mode
>>> -> igmp6_join_group() -> igmp6_join_group
>>> -> igmp6_send() on the bridge device -> br_dev_xmit and onto the bridge
>>> -> mcast processing code
>>> which uses the multicast_lock spinlock. Right?
>>
>> That is correct.
>>
>> Here's a stack trace from a typical run:
>>
>> echo -n 1 > /sys/devices/virtual/net/gmn0/bridge/multicast_snooping
>> [  936.146754] rcu: INFO: rcu_preempt self-detected stall on CPU
>> [  936.152534] rcu:   0-: (5594 ticks this GP)
>> idle=75a/1/0x4002 softirq=2787/2789 fqs=2625
>> [  936.162026](t=5253 jiffies g=4205 q=12)
>> [  936.166041] Task dump for CPU 0:
>> [  936.169272] sh  R  running task0  1315   1295 
>> 0x0002
>> [  936.176332] Call trace:
>> [  936.178797]  dump_backtrace+0x0/0x140
>> [  936.182469]  show_stack+0x14/0x20
>> [  936.185793]  sched_show_task+0x108/0x138
>> [  936.189727]  dump_cpu_task+0x40/0x50
>> [  936.193313]  rcu_dump_cpu_stacks+0x94/0xd0
>> [  936.197420]  rcu_sched_clock_irq+0x75c/0x9c0
>> [  936.201698]  update_process_times+0x2c/0x68
>> [  936.205893]  tick_sched_handle.isra.0+0x30/0x50
>> [  936.210432]  tick_sched_timer+0x48/0x98
>> [  936.214272]  __hrtimer_run_queues+0x110/0x1b0
>> [  936.218635]  hrtimer_interrupt+0xe4/0x240
>> [  936.222656]  arch_timer_handler_phys+0x30/0x40
>> [  936.227106]  handle_percpu_devid_irq+0x80/0x140
>> [  936.231654]  generic_handle_irq+0x24/0x38
>> [  936.235669]  __handle_domain_irq+0x60/0xb8
>> [  936.239774]  gic_handle_irq+0x5c/0x148
>> [  936.243535]  el1_irq+0xb8/0x180
>> [  936.246689]  queued_spin_lo

Re: [PATCH] bridge: Fix a deadlock when enabling multicast snooping

2020-12-03 Thread Nikolay Aleksandrov

On 03/12/2020 20:28, Jakub Kicinski wrote:
> On Tue, 1 Dec 2020 16:40:47 -0500 Joseph Huang wrote:
>> When enabling multicast snooping, bridge module deadlocks on multicast_lock
>> if 1) IPv6 is enabled, and 2) there is an existing querier on the same L2
>> network.
>>
>> The deadlock was caused by the following sequence: While holding the lock,
>> br_multicast_open calls br_multicast_join_snoopers, which eventually causes
>> IP stack to (attempt to) send out a Listener Report (in igmp6_join_group).
>> Since the destination Ethernet address is a multicast address, br_dev_xmit
>> feeds the packet back to the bridge via br_multicast_rcv, which in turn
>> calls br_multicast_add_group, which then deadlocks on multicast_lock.
>>
>> The fix is to move the call br_multicast_join_snoopers outside of the
>> critical section. This works since br_multicast_join_snoopers only deals
>> with IP and does not modify any multicast data structures of the bridge,
>> so there's no need to hold the lock.
>>
>> Fixes: 4effd28c1245 ("bridge: join all-snoopers multicast address")
>>
>> Signed-off-by: Joseph Huang 
> 
> Nik, Linus - how does this one look?
> 

Hi,
Thanks, somehow I missed this one too. Need to check my email config. :)
I believe I see how it can happen, although it's not straight-forward to follow.
A selftest for this case would be great, and any traces (e.g. hung task) would
help a lot as well.
Correct me if I'm wrong but the sequence is something like:
br_multicast_join_snoopers -> ipv6_dev_mc_inc -> __ipv6_dev_mc_inc -> 
igmp6_group_added
-> MLDv1 (mode) igmp6_join_group() -> Again MLDv1 mode igmp6_join_group() -> 
igmp6_join_group
-> igmp6_send() on the bridge device -> br_dev_xmit and onto the bridge mcast 
processing code
which uses the multicast_lock spinlock. Right?

One question - shouldn't leaving have the same problem? I.e. 
br_multicast_toggle -> br_multicast_leave_snoopers
-> br_ip6_multicast_leave_snoopers -> ipv6_dev_mc_dec -> igmp6_group_dropped -> 
igmp6_leave_group ->
MLDv1 mode && last reporter -> igmp6_send() ?

I think it was saved by the fact that !br_opt_get(br, BROPT_MULTICAST_ENABLED) 
would be true and the
multicast lock won't be acquired in the br_dev_xmit path? If so, I'd appreciate 
a comment about that
because it's not really trivial to find out. :)

Anyhow, the patch is fine as-is too:
Acked-by: Nikolay Aleksandrov 

Thanks,
 Nik

Re: [PATCH net] net: bridge: Fix a warning when del bridge sysfs

2020-12-03 Thread Nikolay Aleksandrov

On 03/12/2020 03:03, Jakub Kicinski wrote:
> On Tue, 1 Dec 2020 22:01:14 +0800 Wang Hai wrote:
>> If adding bridge sysfs fails, br->ifobj will be NULL, there is no
>> need to delete its non-existent sysfs when deleting the bridge device,
>> otherwise, it will cause a warning. So, when br->ifobj == NULL,
>> directly return can fix this bug.
>>
>> br_sysfs_addbr: can't create group bridge4/bridge
>> [ cut here ]
>> sysfs group 'bridge' not found for kobject 'bridge4'
>> WARNING: CPU: 2 PID: 9004 at fs/sysfs/group.c:279 sysfs_remove_group 
>> fs/sysfs/group.c:279 [inline]
>> WARNING: CPU: 2 PID: 9004 at fs/sysfs/group.c:279 
>> sysfs_remove_group+0x153/0x1b0 fs/sysfs/group.c:270
>> Modules linked in: iptable_nat
>> ...
>> Call Trace:
>>   br_dev_delete+0x112/0x190 net/bridge/br_if.c:384
>>   br_dev_newlink net/bridge/br_netlink.c:1381 [inline]
>>   br_dev_newlink+0xdb/0x100 net/bridge/br_netlink.c:1362
>>   __rtnl_newlink+0xe11/0x13f0 net/core/rtnetlink.c:3441
>>   rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3500
>>   rtnetlink_rcv_msg+0x385/0x980 net/core/rtnetlink.c:5562
>>   netlink_rcv_skb+0x134/0x3d0 net/netlink/af_netlink.c:2494
>>   netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
>>   netlink_unicast+0x4a0/0x6a0 net/netlink/af_netlink.c:1330
>>   netlink_sendmsg+0x793/0xc80 net/netlink/af_netlink.c:1919
>>   sock_sendmsg_nosec net/socket.c:651 [inline]
>>   sock_sendmsg+0x139/0x170 net/socket.c:671
>>   sys_sendmsg+0x658/0x7d0 net/socket.c:2353
>>   ___sys_sendmsg+0xf8/0x170 net/socket.c:2407
>>   __sys_sendmsg+0xd3/0x190 net/socket.c:2440
>>   do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46
>>   entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>
>> Reported-by: Hulk Robot 
>> Signed-off-by: Wang Hai 
> 
> Nik, is this the way you want to handle this?
> 
> Should the notifier not fail if sysfs files cannot be created?
> Currently br_sysfs_addbr() returns an int but the only caller 
> ignores it.
> 

Hi,
The fix is wrong because this is not the only user of ifobj. The bridge
port sysfs code also uses it and br_sysfs_addif() will create the new
symlink in sysfs_root_kn due to NULL kobj passed which basically means
only one port will be enslaved, the others will fail in creating their
sysfs entries and thus fail to be added as ports.

I'd prefer to just fail from the notifier based on the return value.
The only catch would be to test it with br_vlan_bridge_event() which
is called on bridge master device events, it should be fine as
br_vlan_find() deals with NULL vlan groups but at least a comment
above it in br.c's notifier would be good so if anyone decides to add
any NETDEVICE_UNREGISTER handling would be warned about it.

Also please add proper fixes tag, the bug seems to be since:
bb900b27a2f4 ("bridge: allow creating bridge devices with netlink")

It actually changed the behaviour, before that the return value of 
br_sysfs_addbr()
was checked and the device got unregistered on failure.

Thanks,
 Nik

Re: [PATCH net-next] bridge: mrp: Implement LC mode for MRP

2020-11-23 Thread Nikolay Aleksandrov

On 23/11/2020 14:31, Horatiu Vultur wrote:
> The 11/23/2020 14:13, Nikolay Aleksandrov wrote:
>> EXTERNAL EMAIL: Do not click links or open attachments unless you know the 
>> content is safe
>>
>> On 23/11/2020 13:14, Horatiu Vultur wrote:
>>> Extend MRP to support LC mode(link check) for the interconnect port.
>>> This applies only to the interconnect ring.
>>>
>>> Opposite to RC mode(ring check) the LC mode is using CFM frames to
>>> detect when the link goes up or down and based on that the userspace
>>> will need to react.
>>> One advantage of the LC mode over RC mode is that there will be fewer
>>> frames in the normal rings. Because RC mode generates InTest on all
>>> ports while LC mode sends CFM frame only on the interconnect port.
>>>
>>> All 4 nodes part of the interconnect ring needs to have the same mode.
>>> And it is not possible to have running LC and RC mode at the same time
>>> on a node.
>>>
>>> Whenever the MIM starts it needs to detect the status of the other 3
>>> nodes in the interconnect ring so it would send a frame called
>>> InLinkStatus, on which the clients needs to reply with their link
>>> status.
>>>
>>> This patch adds the frame header for the frame InLinkStatus and
>>> extends existing rules on how to forward this frame.
>>>
>>> Signed-off-by: Horatiu Vultur 
>>> ---
>>>  include/uapi/linux/mrp_bridge.h |  7 +++
>>>  net/bridge/br_mrp.c | 18 +++---
>>>  2 files changed, 22 insertions(+), 3 deletions(-)
>>>
>>
>> Hi Horatiu,
>> The patch looks good overall, just one question below.
> 
> Hi Nik,
> 
> Thanks for taking time to review the patch.
> 
>>
>>> diff --git a/include/uapi/linux/mrp_bridge.h 
>>> b/include/uapi/linux/mrp_bridge.h
>>> index 6aeb13ef0b1e..450f6941a5a1 100644
>>> --- a/include/uapi/linux/mrp_bridge.h
>>> +++ b/include/uapi/linux/mrp_bridge.h
>>> @@ -61,6 +61,7 @@ enum br_mrp_tlv_header_type {
>>>   BR_MRP_TLV_HEADER_IN_TOPO = 0x7,
>>>   BR_MRP_TLV_HEADER_IN_LINK_DOWN = 0x8,
>>>   BR_MRP_TLV_HEADER_IN_LINK_UP = 0x9,
>>> + BR_MRP_TLV_HEADER_IN_LINK_STATUS = 0xa,
>>>   BR_MRP_TLV_HEADER_OPTION = 0x7f,
>>>  };
>>>
>>> @@ -156,4 +157,10 @@ struct br_mrp_in_link_hdr {
>>>   __be16 interval;
>>>  };
>>>
>>> +struct br_mrp_in_link_status_hdr {
>>> + __u8 sa[ETH_ALEN];
>>> + __be16 port_role;
>>> + __be16 id;
>>> +};
>>> +
>>
>> I didn't see this struct used anywhere, am I missing anything?
> 
> Yes, you are right, the struct is not used any. But I put it there as I
> put the other frame types for MRP.
> 

I see, we don't usually add unused code. The patch is fine as-is and since
this is already the case for other MRP parts I'm not strictly against it, so:

Acked-by: Nikolay Aleksandrov 

If Jakub decides to adhere to that rule you can keep my acked-by and just remove
the struct for v2.

Thanks,
 Nik

Re: [PATCH net-next] bridge: mrp: Implement LC mode for MRP

2020-11-23 Thread Nikolay Aleksandrov

On 23/11/2020 13:14, Horatiu Vultur wrote:
> Extend MRP to support LC mode(link check) for the interconnect port.
> This applies only to the interconnect ring.
> 
> Opposite to RC mode(ring check) the LC mode is using CFM frames to
> detect when the link goes up or down and based on that the userspace
> will need to react.
> One advantage of the LC mode over RC mode is that there will be fewer
> frames in the normal rings. Because RC mode generates InTest on all
> ports while LC mode sends CFM frame only on the interconnect port.
> 
> All 4 nodes part of the interconnect ring needs to have the same mode.
> And it is not possible to have running LC and RC mode at the same time
> on a node.
> 
> Whenever the MIM starts it needs to detect the status of the other 3
> nodes in the interconnect ring so it would send a frame called
> InLinkStatus, on which the clients needs to reply with their link
> status.
> 
> This patch adds the frame header for the frame InLinkStatus and
> extends existing rules on how to forward this frame.
> 
> Signed-off-by: Horatiu Vultur 
> ---
>  include/uapi/linux/mrp_bridge.h |  7 +++
>  net/bridge/br_mrp.c | 18 +++---
>  2 files changed, 22 insertions(+), 3 deletions(-)
> 

Hi Horatiu,
The patch looks good overall, just one question below.

> diff --git a/include/uapi/linux/mrp_bridge.h b/include/uapi/linux/mrp_bridge.h
> index 6aeb13ef0b1e..450f6941a5a1 100644
> --- a/include/uapi/linux/mrp_bridge.h
> +++ b/include/uapi/linux/mrp_bridge.h
> @@ -61,6 +61,7 @@ enum br_mrp_tlv_header_type {
>   BR_MRP_TLV_HEADER_IN_TOPO = 0x7,
>   BR_MRP_TLV_HEADER_IN_LINK_DOWN = 0x8,
>   BR_MRP_TLV_HEADER_IN_LINK_UP = 0x9,
> + BR_MRP_TLV_HEADER_IN_LINK_STATUS = 0xa,
>   BR_MRP_TLV_HEADER_OPTION = 0x7f,
>  };
>  
> @@ -156,4 +157,10 @@ struct br_mrp_in_link_hdr {
>   __be16 interval;
>  };
>  
> +struct br_mrp_in_link_status_hdr {
> + __u8 sa[ETH_ALEN];
> + __be16 port_role;
> + __be16 id;
> +};
> +

I didn't see this struct used anywhere, am I missing anything?

Cheers,
 Nik

>  #endif
> diff --git a/net/bridge/br_mrp.c b/net/bridge/br_mrp.c
> index bb12fbf9aaf2..cec2c4e4561d 100644
> --- a/net/bridge/br_mrp.c
> +++ b/net/bridge/br_mrp.c
> @@ -858,7 +858,8 @@ static bool br_mrp_in_frame(struct sk_buff *skb)
>   if (hdr->type == BR_MRP_TLV_HEADER_IN_TEST ||
>   hdr->type == BR_MRP_TLV_HEADER_IN_TOPO ||
>   hdr->type == BR_MRP_TLV_HEADER_IN_LINK_DOWN ||
> - hdr->type == BR_MRP_TLV_HEADER_IN_LINK_UP)
> + hdr->type == BR_MRP_TLV_HEADER_IN_LINK_UP ||
> + hdr->type == BR_MRP_TLV_HEADER_IN_LINK_STATUS)
>   return true;
>  
>   return false;
> @@ -1126,9 +1127,9 @@ static int br_mrp_rcv(struct net_bridge_port *p,
>   goto no_forward;
>   }
>   } else {
> - /* MIM should forward IntLinkChange and
> + /* MIM should forward IntLinkChange/Status and
>* IntTopoChange between ring ports but MIM
> -  * should not forward IntLinkChange and
> +  * should not forward IntLinkChange/Status and
>* IntTopoChange if the frame was received at
>* the interconnect port
>*/
> @@ -1155,6 +1156,17 @@ static int br_mrp_rcv(struct net_bridge_port *p,
>in_type == BR_MRP_TLV_HEADER_IN_LINK_DOWN))
>   goto forward;
>  
> + /* MIC should forward IntLinkStatus frames only to
> +  * interconnect port if it was received on a ring port.
> +  * If it is received on interconnect port then, it
> +  * should be forward on both ring ports
> +  */
> + if (br_mrp_is_ring_port(p_port, s_port, p) &&
> + in_type == BR_MRP_TLV_HEADER_IN_LINK_STATUS) {
> + p_dst = NULL;
> + s_dst = NULL;
> + }
> +
>   /* Should forward the InTopo frames only between the
>* ring ports
>*/
>

Re: [PATCH] printk: ringbuffer: Convert function argument to local variable

2020-11-10 Thread Nikolay Borisov




On 10.11.20 г. 15:14 ч., John Ogness wrote:
> On 2020-11-10, Nikolay Borisov  wrote:
>> data_alloc's 2nd argument is always rb::text_data_ring and that functino
>> always takes a struct printk_ringbuffer. Instead of passing the data
>> ring buffer as an argument simply make it a local variable.
> 
> This is a relic of when we had a second data ring (for
> dictionaries). The patch is a nice cleanup, but there are actually
> several functions that could use this exact same cleanup:
> 
> - data_make_reusable()
> - data_push_tail()
> - data_alloc()
> - data_realloc()
> 
> Perhaps we should fix them all in a single patch?

I observed that right after sending this patch, so I have authored the
necessary changes I can squash them and send them.

> 
> John Ogness
>

[PATCH] printk: ringbuffer: Convert function argument to local variable

2020-11-10 Thread Nikolay Borisov

data_alloc's 2nd argument is always rb::text_data_ring and that functino
always takes a struct printk_ringbuffer. Instead of passing the data
ring buffer as an argument simply make it a local variable.

Signed-off-by: Nikolay Borisov 
---
 kernel/printk/printk_ringbuffer.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/kernel/printk/printk_ringbuffer.c 
b/kernel/printk/printk_ringbuffer.c
index 6b1525685277..7f2713e1bbcc 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -1021,10 +1021,10 @@ static unsigned long get_next_lpos(struct prb_data_ring 
*data_ring,
  * if necessary. This function also associates the data block with
  * a specified descriptor.
  */
-static char *data_alloc(struct printk_ringbuffer *rb,
-   struct prb_data_ring *data_ring, unsigned int size,
+static char *data_alloc(struct printk_ringbuffer *rb, unsigned int size,
struct prb_data_blk_lpos *blk_lpos, unsigned long id)
 {
+   struct prb_data_ring *data_ring = &rb->text_data_ring;
struct prb_data_block *blk;
unsigned long begin_lpos;
unsigned long next_lpos;
@@ -1397,7 +1397,7 @@ bool prb_reserve_in_last(struct prb_reserved_entry *e, 
struct printk_ringbuffer
if (r->text_buf_size > max_size)
goto fail;
 
-   r->text_buf = data_alloc(rb, &rb->text_data_ring, 
r->text_buf_size,
+   r->text_buf = data_alloc(rb, r->text_buf_size,
 &d->text_blk_lpos, id);
} else {
if (!get_data(&rb->text_data_ring, &d->text_blk_lpos, 
&data_size))
@@ -1549,8 +1549,7 @@ bool prb_reserve(struct prb_reserved_entry *e, struct 
printk_ringbuffer *rb,
if (info->seq > 0)
desc_make_final(desc_ring, DESC_ID(id - 1));
 
-   r->text_buf = data_alloc(rb, &rb->text_data_ring, r->text_buf_size,
-&d->text_blk_lpos, id);
+   r->text_buf = data_alloc(rb, r->text_buf_size, &d->text_blk_lpos, id);
/* If text data allocation fails, a data-less record is committed. */
if (r->text_buf_size && !r->text_buf) {
prb_commit(e);
-- 
2.25.1

Re: [PATCH] net: bridge: disable multicast while delete bridge

2020-11-03 Thread Nikolay Aleksandrov

On Mon, 2020-11-02 at 22:38 +0800, Menglong Dong wrote:
> From: Menglong Dong 
> 
> This commit seems make no sense, as bridge is destroyed when
> br_multicast_dev_del is called.
> 
> In commit b1b9d366028f
> ("bridge: move bridge multicast cleanup to ndo_uninit"), Xin Long
> fixed the use-after-free panic in br_multicast_group_expired by
> moving br_multicast_dev_del to ndo_uninit. However, that patch is
> not applied to 4.4.X, and the bug exists.
> 
> Fix that bug by disabling multicast in br_multicast_dev_del for
> 4.4.X, and there is no harm for other branches.
> 
> Signed-off-by: Menglong Dong 
> ---
>  net/bridge/br_multicast.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
> index eae898c3cff7..9992fdff2951 100644
> --- a/net/bridge/br_multicast.c
> +++ b/net/bridge/br_multicast.c
> @@ -3369,6 +3369,7 @@ void br_multicast_dev_del(struct net_bridge *br)
>   hlist_for_each_entry_safe(mp, tmp, &br->mdb_list, mdb_node)
>   br_multicast_del_mdb_entry(mp);
>   hlist_move_list(&br->mcast_gc_list, &deleted_head);
> + br_opt_toggle(br, BROPT_MULTICAST_ENABLED, false);
>   spin_unlock_bh(&br->multicast_lock);
>  
>   br_multicast_gc(&deleted_head);

This doesn't make any sense. It doesn't fix anything.
If 4.4 has a problem then the relevant patches should get backported to it.
We don't add random changes to fix older releases.

Cheers,
 Nik

Nacked-by: Nikolay Aleksandrov

Re: [PATCH v3 net-next] net: bridge: mcast: add support for raw L2 multicast groups

2020-10-28 Thread Nikolay Aleksandrov

On Wed, 2020-10-28 at 12:54 +0200, Vladimir Oltean wrote:
> From: Nikolay Aleksandrov 
> 
> Extend the bridge multicast control and data path to configure routes
> for L2 (non-IP) multicast groups.
> 
> The uapi struct br_mdb_entry union u is extended with another variant,
> mac_addr, which does not change the structure size, and which is valid
> when the proto field is zero.
> 
> To be compatible with the forwarding code that is already in place,
> which acts as an IGMP/MLD snooping bridge with querier capabilities, we
> need to declare that for L2 MDB entries (for which there exists no such
> thing as IGMP/MLD snooping/querying), that there is always a querier.
> Otherwise, these entries would be flooded to all bridge ports and not
> just to those that are members of the L2 multicast group.
> 
> Needless to say, only permanent L2 multicast groups can be installed on
> a bridge port.
> 
> Signed-off-by: Nikolay Aleksandrov 
> Signed-off-by: Vladimir Oltean 
> ---
> Changes in v3:
> - Removed some noise in the diff.
> 
> Changes in v2:
> - Removed redundant MDB_FLAGS_L2 (we are simply signalling an L2 entry
>   through proto == 0)
> - Moved mac_addr inside union dst of struct br_ip.
> - Validation that L2 multicast address is indeed multicast
> 
>  include/linux/if_bridge.h  |  1 +
>  include/uapi/linux/if_bridge.h |  1 +
>  net/bridge/br_device.c |  2 +-
>  net/bridge/br_input.c  |  2 +-
>  net/bridge/br_mdb.c| 24 ++--
>  net/bridge/br_multicast.c  |  9 +++--
>  net/bridge/br_private.h| 10 --
>  7 files changed, 41 insertions(+), 8 deletions(-)
>
[snip]
> @@ -857,6 +872,11 @@ static int br_mdb_add_group(struct net_bridge *br, 
> struct net_bridge_port *port,
>   return err;
>   }
>  
> + if (entry->state != MDB_PERMANENT && br_group_is_l2(&mp->addr)) {
> + NL_SET_ERR_MSG_MOD(extack, "Only permanent L2 entries allowed");
> + return -EINVAL;
> + }
> +

Sorry, but I didn't notice this earlier. We need to check for this error before
creating the mdb group otherwise we can end up with empty groups that can't be
deleted due to errors. I.e. it must be before the br_multicast_new_group() call.

The rest looks good to me, thanks!

>   /* host join */
>   if (!port) {
>   if (mp->host_joined) {
> diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
> index eae898c3cff7..98de0acb0307 100644
> --- a/net/bridge/br_multicast.c
> +++ b/net/bridge/br_multicast.c
> @@ -179,7 +179,8 @@ struct net_bridge_mdb_entry *br_mdb_get(struct net_bridge 
> *br,

Re: [Bridge] [PATCH net-next v7 01/10] net: bridge: extend the process of special frames

2020-10-27 Thread Nikolay Aleksandrov

On Tue, 2020-10-27 at 07:59 -0700, Stephen Hemminger wrote:
> On Tue, 27 Oct 2020 10:02:42 +
> Henrik Bjoernlund via Bridge  wrote:
> 
> > +/* Return 0 if the frame was not processed otherwise 1
> > + * note: already called with rcu_read_lock
> > + */
> > +static int br_process_frame_type(struct net_bridge_port *p,
> > +struct sk_buff *skb)
> > +{
> > +   struct br_frame_type *tmp;
> > +
> > +   hlist_for_each_entry_rcu(tmp, &p->br->frame_type_list, list)
> > +   if (unlikely(tmp->type == skb->protocol))
> > +   return tmp->frame_handler(p, skb);
> > +
> > +   return 0;
> > +}
> 
> Does the linear search of frame types have noticable impact on performance?
> Hint: maybe a bitmap or something would be faster.

I don't think it's necessary to optimize it so early. There are only 2 possible
types so far (with this set included) if CfM and MRP both are in use, if at some
point it grows we can turn it into a hash or bitmap, at the moment a simple and
easier to maintain solution seems better to me. We could mask the search itself
behind a static key and do it only if a protocol is registered to minimize the
impact further.

Cheers,
 Nik

Re: [RFC PATCH] net: bridge: multicast: add support for L2 entries

2020-10-25 Thread Nikolay Aleksandrov

On Sun, 2020-10-25 at 06:59 +, Vladimir Oltean wrote:
> On Wed, Oct 21, 2020 at 09:17:07AM +0000, Nikolay Aleksandrov wrote:
> > > diff --git a/include/uapi/linux/if_bridge.h 
> > > b/include/uapi/linux/if_bridge.h
> > > index 4c687686aa8f..a25f6f9aa8c3 100644
> > > --- a/include/uapi/linux/if_bridge.h
> > > +++ b/include/uapi/linux/if_bridge.h
> > > @@ -520,12 +520,14 @@ struct br_mdb_entry {
> > >  #define MDB_FLAGS_FAST_LEAVE (1 << 1)
> > >  #define MDB_FLAGS_STAR_EXCL  (1 << 2)
> > >  #define MDB_FLAGS_BLOCKED(1 << 3)
> > > +#define MDB_FLAGS_L2 (1 << 5)
> > 
> > I think this should be 4.
> > 
> 
> Shouldn't this be in sync with MDB_PG_FLAGS_L2 though? We also have
> MDB_PG_FLAGS_BLOCKED which is BIT(4).

Unfortunately they haven't been in sync from the start. MDB_FLAGS bit
0 is offload, while MDB_PG_FLAGS bit 0 is permanent. As you can see
here blocked is bit 3, while internally it's 4 due to the same reason.
We can't afford to skip 1 bit since this is uAPI and we only got 8 
available bits. I wonder if we need these L2 bits at all, why not use
only proto == 0 to denote it's a L2 entry? I can't remember why I added
the bits back then, but until now proto == 0 wasn't allowed and the
kernel couldn't export it as such, so it seems possible to use it.

Re: [RFC PATCH] net: bridge: multicast: add support for L2 entries

2020-10-21 Thread Nikolay Aleksandrov

On Wed, 2020-10-21 at 09:17 +, Nikolay Aleksandrov wrote:
> On Sat, 2020-10-17 at 21:41 +0300, Vladimir Oltean wrote:
> > From: Nikolay Aleksandrov 
> > 
> > Extend the bridge multicast control and data path to configure routes
> > for L2 (non-IP) multicast groups.
> > 
> > The uapi struct br_mdb_entry union u is extended with another variant,
> > interpretation, mac_addr, which does not change the structure size, and
> > which is valid when the MDB_FLAGS_L2 flag is found set.
> > 
> > To be compatible with the forwarding code that is already in place,
> > which acts as an IGMP/MLD snooping bridge with querier capabilities, we
> > need to declare that for L2 MDB entries (for which there exists no such
> > thing as IGMP/MLD snooping/querying), that there is always a querier.
> > Otherwise, these entries would be flooded to all bridge ports and not
> > just to those that are members of the L2 multicast group.
> > 
> > Needless to say, only permanent L2 multicast groups can be installed on
> > a bridge port.
> > 
> > Signed-off-by: Nikolay Aleksandrov 
> > Signed-off-by: Vladimir Oltean 
> > ---
> > This patch is adapted from the version that Nikolay posted here:
> > https://lore.kernel.org/netdev/20200708090454.zvb6o7jr2woirw3i@skbuf/
> > 
> > There, he marked the patch as "unfinished". I haven't made any major
> > modifications to it, but I've tested it and it appears to work ok,
> > including with offloading. Hence, I would appreciate some tips regarding
> > things that might be missing.
> > 
> 
> Hi,
> I almost missed this one, thank you for fixing it up. I was wondering if we
> can move br_ip's mac_addr in the "dst" union to save some space and reduce
> ops when matching, since we're also matching on the protocol field. In general
> do we need the ->l2 field at all, can we use proto == 0 ? In order to make it
> more readable it can be in a helper with a descriptive name so we don't wonder
> what proto == 0 meant later. A few more minor comments below.
> 

Oh, one more thing, I don't think we validate that the dst mac that's being
added is actually a multicast one.

> >  include/linux/if_bridge.h  |  1 +
> >  include/uapi/linux/if_bridge.h |  2 ++
> >  net/bridge/br_device.c |  2 +-
> >  net/bridge/br_input.c  |  2 +-
> >  net/bridge/br_mdb.c| 24 
> >  net/bridge/br_multicast.c  | 12 ++--
> >  net/bridge/br_private.h|  7 +--
> >  7 files changed, 40 insertions(+), 10 deletions(-)
> > 
> > diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
> > index 556caed00258..b135ad714383 100644
> > --- a/include/linux/if_bridge.h
> > +++ b/include/linux/if_bridge.h
> > @@ -26,6 +26,7 @@ struct br_ip {
> > struct in6_addr ip6;
> >  #endif
> > } dst;
> > +   unsigned char   mac_addr[ETH_ALEN];
> > __be16  proto;
> > __u16   vid;
> >  };
> > diff --git a/include/uapi/linux/if_bridge.h b/include/uapi/linux/if_bridge.h
> > index 4c687686aa8f..a25f6f9aa8c3 100644
> > --- a/include/uapi/linux/if_bridge.h
> > +++ b/include/uapi/linux/if_bridge.h
> > @@ -520,12 +520,14 @@ struct br_mdb_entry {
> >  #define MDB_FLAGS_FAST_LEAVE   (1 << 1)
> >  #define MDB_FLAGS_STAR_EXCL(1 << 2)
> >  #define MDB_FLAGS_BLOCKED  (1 << 3)
> > +#define MDB_FLAGS_L2   (1 << 5)
> 
> I think this should be 4.
> 
> > __u8 flags;
> > __u16 vid;
> > struct {
> > union {
> > __be32  ip4;
> > struct in6_addr ip6;
> > +   unsigned char mac_addr[ETH_ALEN];
> > } u;
> > __be16  proto;
> > } addr;
> > diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
> > index 6f742fee874a..06c28753b911 100644
> > --- a/net/bridge/br_device.c
> > +++ b/net/bridge/br_device.c
> > @@ -93,7 +93,7 @@ netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct 
> > net_device *dev)
> >  
> > mdst = br_mdb_get(br, skb, vid);
> > if ((mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb)) &&
> > -   br_multicast_querier_exists(br, eth_hdr(skb)))
> > +   br_multicast_querier_exists(br, eth_hdr(skb), mdst))
> > br_multicast_flood(mdst, skb, false, true);
> > else
> > br_flood(br, skb, BR_PKT_MULTIC

Re: [RFC PATCH] net: bridge: multicast: add support for L2 entries

2020-10-21 Thread Nikolay Aleksandrov

On Sat, 2020-10-17 at 21:41 +0300, Vladimir Oltean wrote:
> From: Nikolay Aleksandrov 
> 
> Extend the bridge multicast control and data path to configure routes
> for L2 (non-IP) multicast groups.
> 
> The uapi struct br_mdb_entry union u is extended with another variant,
> interpretation, mac_addr, which does not change the structure size, and
> which is valid when the MDB_FLAGS_L2 flag is found set.
> 
> To be compatible with the forwarding code that is already in place,
> which acts as an IGMP/MLD snooping bridge with querier capabilities, we
> need to declare that for L2 MDB entries (for which there exists no such
> thing as IGMP/MLD snooping/querying), that there is always a querier.
> Otherwise, these entries would be flooded to all bridge ports and not
> just to those that are members of the L2 multicast group.
> 
> Needless to say, only permanent L2 multicast groups can be installed on
> a bridge port.
> 
> Signed-off-by: Nikolay Aleksandrov 
> Signed-off-by: Vladimir Oltean 
> ---
> This patch is adapted from the version that Nikolay posted here:
> https://lore.kernel.org/netdev/20200708090454.zvb6o7jr2woirw3i@skbuf/
> 
> There, he marked the patch as "unfinished". I haven't made any major
> modifications to it, but I've tested it and it appears to work ok,
> including with offloading. Hence, I would appreciate some tips regarding
> things that might be missing.
> 

Hi,
I almost missed this one, thank you for fixing it up. I was wondering if we
can move br_ip's mac_addr in the "dst" union to save some space and reduce
ops when matching, since we're also matching on the protocol field. In general
do we need the ->l2 field at all, can we use proto == 0 ? In order to make it
more readable it can be in a helper with a descriptive name so we don't wonder
what proto == 0 meant later. A few more minor comments below.

>  include/linux/if_bridge.h  |  1 +
>  include/uapi/linux/if_bridge.h |  2 ++
>  net/bridge/br_device.c |  2 +-
>  net/bridge/br_input.c  |  2 +-
>  net/bridge/br_mdb.c| 24 
>  net/bridge/br_multicast.c  | 12 ++--
>  net/bridge/br_private.h|  7 +--
>  7 files changed, 40 insertions(+), 10 deletions(-)
> 
> diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
> index 556caed00258..b135ad714383 100644
> --- a/include/linux/if_bridge.h
> +++ b/include/linux/if_bridge.h
> @@ -26,6 +26,7 @@ struct br_ip {
>   struct in6_addr ip6;
>  #endif
>   } dst;
> + unsigned char   mac_addr[ETH_ALEN];
>   __be16  proto;
>   __u16   vid;
>  };
> diff --git a/include/uapi/linux/if_bridge.h b/include/uapi/linux/if_bridge.h
> index 4c687686aa8f..a25f6f9aa8c3 100644
> --- a/include/uapi/linux/if_bridge.h
> +++ b/include/uapi/linux/if_bridge.h
> @@ -520,12 +520,14 @@ struct br_mdb_entry {
>  #define MDB_FLAGS_FAST_LEAVE (1 << 1)
>  #define MDB_FLAGS_STAR_EXCL  (1 << 2)
>  #define MDB_FLAGS_BLOCKED(1 << 3)
> +#define MDB_FLAGS_L2 (1 << 5)

I think this should be 4.

>   __u8 flags;
>   __u16 vid;
>   struct {
>   union {
>   __be32  ip4;
>   struct in6_addr ip6;
> + unsigned char mac_addr[ETH_ALEN];
>   } u;
>   __be16  proto;
>   } addr;
> diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
> index 6f742fee874a..06c28753b911 100644
> --- a/net/bridge/br_device.c
> +++ b/net/bridge/br_device.c
> @@ -93,7 +93,7 @@ netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct 
> net_device *dev)
>  
>   mdst = br_mdb_get(br, skb, vid);
>   if ((mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb)) &&
> - br_multicast_querier_exists(br, eth_hdr(skb)))
> + br_multicast_querier_exists(br, eth_hdr(skb), mdst))
>   br_multicast_flood(mdst, skb, false, true);
>   else
>   br_flood(br, skb, BR_PKT_MULTICAST, false, true);
> diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
> index 59a318b9f646..d31b5c18c6a1 100644
> --- a/net/bridge/br_input.c
> +++ b/net/bridge/br_input.c
> @@ -134,7 +134,7 @@ int br_handle_frame_finish(struct net *net, struct sock 
> *sk, struct sk_buff *skb
>   case BR_PKT_MULTICAST:
>   mdst = br_mdb_get(br, skb, vid);
>   if ((mdst || BR_INPUT_SKB_CB_MROUTERS_ONLY(skb)) &&
> - br_multicast_querier_exists(br, eth_hdr(skb))) {
> + br_multicast_querier_exists(br, eth_hdr(skb), mdst)) {
>

Re: [RFC PATCH] net: bridge: call br_multicast_del_port before the port leaves

2020-10-16 Thread Nikolay Aleksandrov

On Thu, 2020-10-15 at 20:33 +0300, Vladimir Oltean wrote:
> Switchdev drivers often have different VLAN semantics than the bridge.
> For example, consider this:
> 
> ip link add br0 type bridge
> ip link set swp0 master br0
> bridge mdb add dev br0 port swp0 grp 01:02:03:04:05:06 permanent
> ip link del br0
> [   26.085816] mscc_felix :00:00.5 swp0: failed (err=-2) to del object 
> (id=2)
> 
> This is because the mscc_ocelot driver, when VLAN awareness is disabled,
> classifies all traffic to the port-based VLAN (pvid). The pvid is 0 when
> the port is standalone, and it is inherited from the bridge default pvid
> (which is 1 by default, but it may take other values) when it joins the
> VLAN-unaware bridge, and then the pvid resets to 0 when the port leaves
> the bridge again.
> 
> Now because the mscc_ocelot switch classifies all traffic to its private
> pvid, it needs to translate between the vid that the mdb comes with, and
> the vid that will actually be programmed into hardware. The bridge uses
> the vid of 0 in VLAN-unaware mode, while the hardware uses the pvid
> inherited from the bridge, that's the difference.
> 
> So what will happen is:
> 
> Step 1 (addition):
> br_mdb_notify(RTM_NEWMDB)
> -> ocelot_port_mdb_add(mdb->addr=01:02:03:04:05:06, mdb->vid=0)
>-> mdb->vid is remapped from 0 to 1 and installed into ocelot->multicast
> 
> Step 2 (removal):
> del_nbp
> -> netdev_upper_dev_unlink(dev, br->dev)
>-> ocelot_port_bridge_leave
>   -> ocelot_port_set_pvid(ocelot, port, 0)
> -> br_multicast_del_port is called and the switchdev notifier is
>deferred for some time later
>-> ocelot_port_mdb_del(mdb->addr=01:02:03:04:05:06, mdb->vid=0)
>   -> mdb->vid is remapped from 0 to 0, the port pvid (!!!)
>   -> the remapped mdb (addr=01:02:03:04:05:06, vid=0) is not found
>  inside the ocelot->multicast list, and -ENOENT is returned
> 
> So the problem is that mscc_ocelot assumes that the port is removed
> _after_ the multicast entries have been deleted. And this is not an
> unreasonable assumption, presumably it isn't the only switchdev that
> needs to remap the vid. So we can reorder the teardown path in order
> for that assumption to hold true.
> 
> Since br_mdb_notify() issues a SWITCHDEV_F_DEFER operation, we must move
> the call not only before netdev_upper_dev_unlink(), but in fact before
> switchdev_deferred_process().
> 
> Signed-off-by: Vladimir Oltean 
> ---
>  net/bridge/br_if.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 

It can potentially use after free, multicast resources (per-cpu stats) are freed
in br_multicast_del_port() and can be used due to a race with port state
sync on other CPUs since the handler can still process packets. That has a
chance of happening if vlans are not used.

Interesting that br_stp_disable_port() calls br_multicast_disable_port() which
flushes all non-permanent mdb entries, so I'm guessing you have problem only
with permanent ones? Perhaps we can flush them all before. Either by passing an
argument to br_stp_disable_port() that we're deleting the port which will be
passed down to br_multicast_disable_port() or by calling an additional helper to
flush all which can be re-used by both disable_port() and stop_multicast()
calls. Adding an argument to br_stp_disable_port() to be passed down sounds
cleaner to me. What do you think?

Cheers,
 Nik

Re: [PATCH net-next v5 09/10] bridge: cfm: Netlink GET status Interface.

2020-10-14 Thread Nikolay Aleksandrov

On Mon, 2020-10-12 at 14:04 +, Henrik Bjoernlund wrote:
> This is the implementation of CFM netlink status
> get information interface.
> 
> Add new nested netlink attributes. These attributes are used by the
> user space to get status information.
> 
> GETLINK:
> Request filter RTEXT_FILTER_CFM_STATUS:
> Indicating that CFM status information must be delivered.
> 
> IFLA_BRIDGE_CFM:
> Points to the CFM information.
> 
> IFLA_BRIDGE_CFM_MEP_STATUS_INFO:
> This indicate that the MEP instance status are following.
> IFLA_BRIDGE_CFM_CC_PEER_STATUS_INFO:
> This indicate that the peer MEP status are following.
> 
> CFM nested attribute has the following attributes in next level.
> 
> GETLINK RTEXT_FILTER_CFM_STATUS:
> IFLA_BRIDGE_CFM_MEP_STATUS_INSTANCE:
> The MEP instance number of the delivered status.
> The type is u32.
> IFLA_BRIDGE_CFM_MEP_STATUS_OPCODE_UNEXP_SEEN:
> The MEP instance received CFM PDU with unexpected Opcode.
> The type is u32 (bool).
> IFLA_BRIDGE_CFM_MEP_STATUS_VERSION_UNEXP_SEEN:
> The MEP instance received CFM PDU with unexpected version.
> The type is u32 (bool).
> IFLA_BRIDGE_CFM_MEP_STATUS_RX_LEVEL_LOW_SEEN:
> The MEP instance received CCM PDU with MD level lower than
> configured level. This frame is discarded.
> The type is u32 (bool).
> 
> IFLA_BRIDGE_CFM_CC_PEER_STATUS_INSTANCE:
> The MEP instance number of the delivered status.
> The type is u32.
> IFLA_BRIDGE_CFM_CC_PEER_STATUS_PEER_MEPID:
> The added Peer MEP ID of the delivered status.
> The type is u32.
> IFLA_BRIDGE_CFM_CC_PEER_STATUS_CCM_DEFECT:
> The CCM defect status.
> The type is u32 (bool).
> True means no CCM frame is received for 3.25 intervals.
> IFLA_BRIDGE_CFM_CC_CONFIG_EXP_INTERVAL.
> IFLA_BRIDGE_CFM_CC_PEER_STATUS_RDI:
> The last received CCM PDU RDI.
> The type is u32 (bool).
> IFLA_BRIDGE_CFM_CC_PEER_STATUS_PORT_TLV_VALUE:
> The last received CCM PDU Port Status TLV value field.
> The type is u8.
> IFLA_BRIDGE_CFM_CC_PEER_STATUS_IF_TLV_VALUE:
> The last received CCM PDU Interface Status TLV value field.
> The type is u8.
> IFLA_BRIDGE_CFM_CC_PEER_STATUS_SEEN:
> A CCM frame has been received from Peer MEP.
> The type is u32 (bool).
> This is cleared after GETLINK IFLA_BRIDGE_CFM_CC_PEER_STATUS_INFO.
> IFLA_BRIDGE_CFM_CC_PEER_STATUS_TLV_SEEN:
> A CCM frame with TLV has been received from Peer MEP.
> The type is u32 (bool).
> This is cleared after GETLINK IFLA_BRIDGE_CFM_CC_PEER_STATUS_INFO.
> IFLA_BRIDGE_CFM_CC_PEER_STATUS_SEQ_UNEXP_SEEN:
> A CCM frame with unexpected sequence number has been received
> from Peer MEP.
> The type is u32 (bool).
> When a sequence number is not one higher than previously received
> then it is unexpected.
> This is cleared after GETLINK IFLA_BRIDGE_CFM_CC_PEER_STATUS_INFO.
> 
> Signed-off-by: Henrik Bjoernlund  
> Reviewed-by: Horatiu Vultur  
> ---
>  include/uapi/linux/if_bridge.h |  29 +
>  include/uapi/linux/rtnetlink.h |   1 +
>  net/bridge/br_cfm_netlink.c| 105 +
>  net/bridge/br_netlink.c|  16 -
>  net/bridge/br_private.h|   6 ++
>  5 files changed, 154 insertions(+), 3 deletions(-)
> 
> 

Acked-by: Nikolay Aleksandrov

Re: [PATCH net-next v5 08/10] bridge: cfm: Netlink GET configuration Interface.

2020-10-14 Thread Nikolay Aleksandrov

crement) of sequence
> number is enabled or disabled.
> The type is u32 (bool).
> IFLA_BRIDGE_CFM_CC_CCM_TX_PERIOD:
> The period of time where CCM frame are transmitted.
> The type is u32.
> The time is given in seconds. SETLINK IFLA_BRIDGE_CFM_CC_CCM_TX
> must be done before timeout to keep transmission alive.
> When period is zero any ongoing CCM frame transmission
> will be stopped.
> IFLA_BRIDGE_CFM_CC_CCM_TX_IF_TLV:
> The transmitted CCM frame update with Interface Status TLV
> is enabled or disabled.
> The type is u32 (bool).
> IFLA_BRIDGE_CFM_CC_CCM_TX_IF_TLV_VALUE:
> The transmitted Interface Status TLV value field.
> The type is u8.
> IFLA_BRIDGE_CFM_CC_CCM_TX_PORT_TLV:
> The transmitted CCM frame update with Port Status TLV is enabled
> or disabled.
> The type is u32 (bool).
> IFLA_BRIDGE_CFM_CC_CCM_TX_PORT_TLV_VALUE:
> The transmitted Port Status TLV value field.
> The type is u8.
> 
> Signed-off-by: Henrik Bjoernlund  
> Reviewed-by: Horatiu Vultur  
> ---
>  include/uapi/linux/if_bridge.h |   6 ++
>  net/bridge/br_cfm_netlink.c| 161 +
>  net/bridge/br_netlink.c|  29 +-
>  net/bridge/br_private.h|   6 ++
>  4 files changed, 200 insertions(+), 2 deletions(-)
> 

Acked-by: Nikolay Aleksandrov

Re: [PATCH net-next v5 01/10] net: bridge: extend the process of special frames

2020-10-14 Thread Nikolay Aleksandrov

On Mon, 2020-10-12 at 14:04 +, Henrik Bjoernlund wrote:
> This patch extends the processing of frames in the bridge. Currently MRP
> frames needs special processing and the current implementation doesn't
> allow a nice way to process different frame types. Therefore try to
> improve this by adding a list that contains frame types that need
> special processing. This list is iterated for each input frame and if
> there is a match based on frame type then these functions will be called
> and decide what to do with the frame. It can process the frame then the
> bridge doesn't need to do anything or don't process so then the bridge
> will do normal forwarding.
> 
> Signed-off-by: Henrik Bjoernlund  
> Reviewed-by: Horatiu Vultur  
> ---
>  net/bridge/br_device.c  |  1 +
>  net/bridge/br_input.c   | 33 -
>  net/bridge/br_mrp.c | 19 +++
>  net/bridge/br_private.h | 19 ---
>  4 files changed, 60 insertions(+), 12 deletions(-)
> 

Looks good.
Acked-by: Nikolay Aleksandrov

Re: [PATCH net-next v4 01/10] net: bridge: extend the process of special frames

2020-10-12 Thread Nikolay Aleksandrov

On Fri, 2020-10-09 at 14:35 +, Henrik Bjoernlund wrote:
> This patch extends the processing of frames in the bridge. Currently MRP
> frames needs special processing and the current implementation doesn't
> allow a nice way to process different frame types. Therefore try to
> improve this by adding a list that contains frame types that need
> special processing. This list is iterated for each input frame and if
> there is a match based on frame type then these functions will be called
> and decide what to do with the frame. It can process the frame then the
> bridge doesn't need to do anything or don't process so then the bridge
> will do normal forwarding.
> 
> Signed-off-by: Henrik Bjoernlund  
> Reviewed-by: Horatiu Vultur  
> ---
>  net/bridge/br_device.c  |  1 +
>  net/bridge/br_input.c   | 33 -
>  net/bridge/br_mrp.c | 19 +++
>  net/bridge/br_private.h | 18 --
>  4 files changed, 60 insertions(+), 11 deletions(-)
> 
[snip]
> diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
> index 345118e35c42..3e62ce2ef8a5 100644
> --- a/net/bridge/br_private.h
> +++ b/net/bridge/br_private.h
> @@ -480,6 +480,8 @@ struct net_bridge {
>  #endif
>   struct hlist_head   fdb_list;
>  
> + struct hlist_head   frame_type_list;

Since there will be a v5, I'd suggest to move this struct member in the first
cache line as it will be always used in the bridge fast-path for all cases.
In order to make room for it there you can move port_list after fdb_hash_tbl and
add this in its place, port_list is currently used only when flooding and soon
I'll even change that.

Thanks,
 Nik

> +
>  #if IS_ENABLED(CONFIG_BRIDGE_MRP)
>   struct list_headmrp_list;
>  #endif
> @@ -755,6 +757,16 @@ int nbp_backup_change(struct net_bridge_port *p, struct 
> net_device *backup_dev);
>  int br_handle_frame_finish(struct net *net, struct sock *sk, struct sk_buff 
> *skb);
>  rx_handler_func_t *br_get_rx_handler(const struct net_device *dev);
>  
> +struct br_frame_type {
> + __be16  type;
> + int (*frame_handler)(struct net_bridge_port *port,
> +  struct sk_buff *skb);
> + struct hlist_node   list;
> +};
> +
> +void br_add_frame(struct net_bridge *br, struct br_frame_type *ft);
> +void br_del_frame(struct net_bridge *br, struct br_frame_type *ft);
> +
>  static inline bool br_rx_handler_check_rcu(const struct net_device *dev)
>  {
>   return rcu_dereference(dev->rx_handler) == br_get_rx_handler(dev);
> @@ -1417,7 +1429,6 @@ extern int (*br_fdb_test_addr_hook)(struct net_device 
> *dev, unsigned char *addr)
>  #if IS_ENABLED(CONFIG_BRIDGE_MRP)
>  int br_mrp_parse(struct net_bridge *br, struct net_bridge_port *p,
>struct nlattr *attr, int cmd, struct netlink_ext_ack *extack);
> -int br_mrp_process(struct net_bridge_port *p, struct sk_buff *skb);
>  bool br_mrp_enabled(struct net_bridge *br);
>  void br_mrp_port_del(struct net_bridge *br, struct net_bridge_port *p);
>  int br_mrp_fill_info(struct sk_buff *skb, struct net_bridge *br);
> @@ -1429,11 +1440,6 @@ static inline int br_mrp_parse(struct net_bridge *br, 
> struct net_bridge_port *p,
>   return -EOPNOTSUPP;
>  }
>  
> -static inline int br_mrp_process(struct net_bridge_port *p, struct sk_buff 
> *skb)
> -{
> - return 0;
> -}
> -
>  static inline bool br_mrp_enabled(struct net_bridge *br)
>  {
>   return false;

Re: [PATCH net-next v4 10/10] bridge: cfm: Netlink Notifications.

2020-10-09 Thread Nikolay Aleksandrov

On Fri, 2020-10-09 at 14:35 +, Henrik Bjoernlund wrote:
> This is the implementation of Netlink notifications out of CFM.
> 
> Notifications are initiated whenever a state change happens in CFM.
> 
> IFLA_BRIDGE_CFM:
> Points to the CFM information.
> 
> IFLA_BRIDGE_CFM_MEP_STATUS_INFO:
> This indicate that the MEP instance status are following.
> IFLA_BRIDGE_CFM_CC_PEER_STATUS_INFO:
> This indicate that the peer MEP status are following.
> 
> CFM nested attribute has the following attributes in next level.
> 
> IFLA_BRIDGE_CFM_MEP_STATUS_INSTANCE:
> The MEP instance number of the delivered status.
> The type is NLA_U32.
> IFLA_BRIDGE_CFM_MEP_STATUS_OPCODE_UNEXP_SEEN:
> The MEP instance received CFM PDU with unexpected Opcode.
> The type is NLA_U32 (bool).
> IFLA_BRIDGE_CFM_MEP_STATUS_VERSION_UNEXP_SEEN:
> The MEP instance received CFM PDU with unexpected version.
> The type is NLA_U32 (bool).
> IFLA_BRIDGE_CFM_MEP_STATUS_RX_LEVEL_LOW_SEEN:
> The MEP instance received CCM PDU with MD level lower than
> configured level. This frame is discarded.
> The type is NLA_U32 (bool).
> 
> IFLA_BRIDGE_CFM_CC_PEER_STATUS_INSTANCE:
> The MEP instance number of the delivered status.
> The type is NLA_U32.
> IFLA_BRIDGE_CFM_CC_PEER_STATUS_PEER_MEPID:
> The added Peer MEP ID of the delivered status.
> The type is NLA_U32.
> IFLA_BRIDGE_CFM_CC_PEER_STATUS_CCM_DEFECT:
> The CCM defect status.
> The type is NLA_U32 (bool).
> True means no CCM frame is received for 3.25 intervals.
> IFLA_BRIDGE_CFM_CC_CONFIG_EXP_INTERVAL.
> IFLA_BRIDGE_CFM_CC_PEER_STATUS_RDI:
> The last received CCM PDU RDI.
> The type is NLA_U32 (bool).
> IFLA_BRIDGE_CFM_CC_PEER_STATUS_PORT_TLV_VALUE:
> The last received CCM PDU Port Status TLV value field.
> The type is NLA_U8.
> IFLA_BRIDGE_CFM_CC_PEER_STATUS_IF_TLV_VALUE:
> The last received CCM PDU Interface Status TLV value field.
> The type is NLA_U8.
> IFLA_BRIDGE_CFM_CC_PEER_STATUS_SEEN:
> A CCM frame has been received from Peer MEP.
> The type is NLA_U32 (bool).
> This is cleared after GETLINK IFLA_BRIDGE_CFM_CC_PEER_STATUS_INFO.
> IFLA_BRIDGE_CFM_CC_PEER_STATUS_TLV_SEEN:
> A CCM frame with TLV has been received from Peer MEP.
> The type is NLA_U32 (bool).
> This is cleared after GETLINK IFLA_BRIDGE_CFM_CC_PEER_STATUS_INFO.
> IFLA_BRIDGE_CFM_CC_PEER_STATUS_SEQ_UNEXP_SEEN:
> A CCM frame with unexpected sequence number has been received
> from Peer MEP.
> The type is NLA_U32 (bool).
> When a sequence number is not one higher than previously received
> then it is unexpected.
> This is cleared after GETLINK IFLA_BRIDGE_CFM_CC_PEER_STATUS_INFO.
> 
> Signed-off-by: Henrik Bjoernlund  
> Reviewed-by: Horatiu Vultur  
> ---
>  net/bridge/br_cfm.c | 48 
>  net/bridge/br_cfm_netlink.c | 25 ++++-
>  net/bridge/br_netlink.c | 73 -
>  net/bridge/br_private.h | 22 ++-
>  4 files changed, 147 insertions(+), 21 deletions(-)
> 

Acked-by: Nikolay Aleksandrov

1 2 3 4 5 6 7 >

1 - 100 of 653 matches

Mail list logo