from:"Ido Schimmel"

Re: [PATCH net v2] selftests: forwarding: Avoid false MDB delete/flush failures

2024-09-19 Thread Ido Schimmel

Hi,

Thanks for the patch and sorry for the late reply (was OOO).

On Mon, Sep 16, 2024 at 07:49:05PM +1000, Jamie Bainbridge wrote:
> Running this test on a small system produces different failures every
> test checking deletions, and some flushes. From different test runs:
> 
> TEST: Common host entries configuration tests (L2)[FAIL]
>   Failed to delete L2 host entry
> 
> TEST: Common port group entries configuration tests (IPv4 (S, G)) [FAIL]
>   IPv4 (S, G) entry with VLAN 10 not deleted when VLAN was not specified
> 
> TEST: Common port group entries configuration tests (IPv6 (*, G)) [FAIL]
>   IPv6 (*, G) entry with VLAN 10 not deleted when VLAN was not specified
> 
> TEST: Flush tests [FAIL]
>   Entry not flushed by specified VLAN ID
> 
> TEST: Flush tests [FAIL]
>   IPv6 host entry not flushed by "nopermanent" state
> 
> Add a short sleep after deletion and flush to resolve this.

The port group entry is removed from MDB entry's list synchronously, but
the MDB entry itself is removed from the hash table asynchronously and
the MDB get query will only return an error if an entry was not found
there.

IOW, I think that when you do get a response after deletion, the entry
you get is empty.

Can you please test the following patch [1] (w/o yours, obviously)?

[1]
diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
index bc37e47ad829..1a52a0bca086 100644
--- a/net/bridge/br_mdb.c
+++ b/net/bridge/br_mdb.c
@@ -1674,7 +1674,7 @@ int br_mdb_get(struct net_device *dev, struct nlattr 
*tb[], u32 portid, u32 seq,
spin_lock_bh(&br->multicast_lock);
 
mp = br_mdb_ip_get(br, &group);
-   if (!mp) {
+   if (!mp || (!mp->ports && !mp->host_joined)) {
NL_SET_ERR_MSG_MOD(extack, "MDB entry not found");
err = -ENOENT;
goto unlock;

Re: [PATCH][next] mlxsw: spectrum_router: remove redundant initialization of variable force

2021-03-29 Thread Ido Schimmel

On Sat, Mar 27, 2021 at 10:33:34PM +, Colin King wrote:
> From: Colin Ian King 
> 
> The variable force is being initialized with a value that is
> never read and it is being updated later with a new value. The
> initialization is redundant and can be removed.
> 
> Addresses-Coverity: ("Unused value")
> Signed-off-by: Colin Ian King 
> ---
>  drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c 
> b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
> index 6ccaa194733b..ff240e3c29f7 100644
> --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
> +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
> @@ -5059,7 +5059,7 @@ mlxsw_sp_nexthop_obj_bucket_adj_update(struct mlxsw_sp 
> *mlxsw_sp,
>  {
>   u16 bucket_index = info->nh_res_bucket->bucket_index;
>   struct netlink_ext_ack *extack = info->extack;
> - bool force = info->nh_res_bucket->force;
> + bool force;

Actually, there is a bug to be fixed here:

```
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c 
b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index 6ccaa194733b..41259c0004d1 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -5068,8 +5068,9 @@ mlxsw_sp_nexthop_obj_bucket_adj_update(struct mlxsw_sp 
*mlxsw_sp,
/* No point in trying an atomic replacement if the idle timer interval
 * is smaller than the interval in which we query and clear activity.
 */
-   force = info->nh_res_bucket->idle_timer_ms <
-   MLXSW_SP_NH_GRP_ACTIVITY_UPDATE_INTERVAL;
+   if (!force && info->nh_res_bucket->idle_timer_ms <
+   MLXSW_SP_NH_GRP_ACTIVITY_UPDATE_INTERVAL)
+   force = true;
 
adj_index = nh->nhgi->adj_index + bucket_index;
err = mlxsw_sp_nexthop_update(mlxsw_sp, adj_index, nh, force, ratr_pl);
```

We should only fallback to a non-atomic replacement when the current
replacement is atomic and the idle timer is too short.

We currently ignore the value of 'force'. This means that a non-atomic
replacement ('force' is true) can be made atomic if idle timer is larger
than 1 second.

Colin, do you mind if I submit it formally as a fix later this week? I
want to run it through our usual process. Will mention you in
Reported-by, obviously.

>   char ratr_pl_new[MLXSW_REG_RATR_LEN];
>   char ratr_pl[MLXSW_REG_RATR_LEN];
>   u32 adj_index;
> -- 
> 2.30.2
>

Re: [PATCH v3 1/5] thermal/drivers/core: Use a char pointer for the cooling device name

2021-03-14 Thread Ido Schimmel

On Sun, Mar 14, 2021 at 12:13:29PM +0100, Daniel Lezcano wrote:
> We want to have any kind of name for the cooling devices as we do no
> longer want to rely on auto-numbering. Let's replace the cooling
> device's fixed array by a char pointer to be allocated dynamically
> when registering the cooling device, so we don't limit the length of
> the name.
> 
> Rework the error path at the same time as we have to rollback the
> allocations in case of error.
> 
> Tested with a dummy device having the name:
>  "Llanfairpwllgwyngyllgogerychwyrndrobwantysiliogogogoch"
> 
> A village on the island of Anglesey (Wales), known to have the longest
> name in Europe.
> 
> Signed-off-by: Daniel Lezcano 
> Reviewed-by: Lukasz Luba 

Tested-by: Ido Schimmel

Re: [PATCH v2 1/5] thermal/drivers/core: Use a char pointer for the cooling device name

2021-03-14 Thread Ido Schimmel

On Fri, Mar 12, 2021 at 06:03:12PM +0100, Daniel Lezcano wrote:
> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> index 996c038f83a4..9ef8090eb645 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -960,10 +960,7 @@ __thermal_cooling_device_register(struct device_node *np,
>  {
>   struct thermal_cooling_device *cdev;
>   struct thermal_zone_device *pos = NULL;
> - int result;
> -
> - if (type && strlen(type) >= THERMAL_NAME_LENGTH)
> - return ERR_PTR(-EINVAL);
> + int ret;
>  
>   if (!ops || !ops->get_max_state || !ops->get_cur_state ||
>   !ops->set_cur_state)
> @@ -973,14 +970,17 @@ __thermal_cooling_device_register(struct device_node 
> *np,
>   if (!cdev)
>   return ERR_PTR(-ENOMEM);
>  
> - result = ida_simple_get(&thermal_cdev_ida, 0, 0, GFP_KERNEL);
> - if (result < 0) {
> - kfree(cdev);
> - return ERR_PTR(result);
> + ret = ida_simple_get(&thermal_cdev_ida, 0, 0, GFP_KERNEL);
> + if (ret < 0)
> + goto out_kfree_cdev;
> + cdev->id = ret;
> +
> + cdev->type = kstrdup(type ? type : "", GFP_KERNEL);
> + if (!cdev->type) {
> + ret = -ENOMEM;
> + goto out_ida_remove;
>   }
>  
> - cdev->id = result;
> - strlcpy(cdev->type, type ? : "", sizeof(cdev->type));
>   mutex_init(&cdev->lock);
>   INIT_LIST_HEAD(&cdev->thermal_instances);
>   cdev->np = np;
> @@ -990,12 +990,9 @@ __thermal_cooling_device_register(struct device_node *np,
>   cdev->devdata = devdata;
>   thermal_cooling_device_setup_sysfs(cdev);
>   dev_set_name(&cdev->device, "cooling_device%d", cdev->id);
> - result = device_register(&cdev->device);
> - if (result) {
> - ida_simple_remove(&thermal_cdev_ida, cdev->id);
> - put_device(&cdev->device);
> - return ERR_PTR(result);
> - }
> + ret = device_register(&cdev->device);
> + if (ret)
> + goto out_kfree_type;
>  
>   /* Add 'this' new cdev to the global cdev list */
>   mutex_lock(&thermal_list_lock);
> @@ -1013,6 +1010,14 @@ __thermal_cooling_device_register(struct device_node 
> *np,
>   mutex_unlock(&thermal_list_lock);
>  
>   return cdev;
> +
> +out_kfree_type:
> + kfree(cdev->type);
> + put_device(&cdev->device);
> +out_ida_remove:
> + ida_simple_remove(&thermal_cdev_ida, cdev->id);
> +out_kfree_cdev:
> + return ERR_PTR(ret);
>  }
>  
>  /**
> @@ -1172,6 +1177,7 @@ void thermal_cooling_device_unregister(struct 
> thermal_cooling_device *cdev)
>   device_del(&cdev->device);
>   thermal_cooling_device_destroy_sysfs(cdev);
>   put_device(&cdev->device);
> + kfree(cdev->type);
>  }
>  EXPORT_SYMBOL_GPL(thermal_cooling_device_unregister);

I'm getting the following user-after-free with this patch [1]. Fixed by:

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 9ef8090eb645..c8d4010940ef 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -1176,8 +1176,8 @@ void thermal_cooling_device_unregister(struct 
thermal_cooling_device *cdev)
ida_simple_remove(&thermal_cdev_ida, cdev->id);
device_del(&cdev->device);
thermal_cooling_device_destroy_sysfs(cdev);
-   put_device(&cdev->device);
kfree(cdev->type);
+   put_device(&cdev->device);
 }
 EXPORT_SYMBOL_GPL(thermal_cooling_device_unregister);

[1]
[  148.601815] 
==
[  148.610260] BUG: KASAN: use-after-free in 
thermal_cooling_device_unregister+0x6ca/0x6e0
[  148.619304] Read of size 8 at addr 8881510a0808 by task devlink/574
[  148.626768]
[  148.628477] CPU: 2 PID: 574 Comm: devlink Not tainted 
5.12.0-rc2-custom-12525-g7ba8a2feee15 #3301
[  148.638463] Hardware name: Mellanox Technologies Ltd. 
MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016
[  148.648625] Call Trace:
[  148.651408]  dump_stack+0xfa/0x151
[  148.661701]  print_address_description.constprop.0+0x18/0x130
[  148.681014]  kasan_report.cold+0x7f/0x111
[  148.692003]  thermal_cooling_device_unregister+0x6ca/0x6e0
[  148.703984]  mlxsw_thermal_fini+0xd2/0x1f0
[  148.708664]  mlxsw_core_bus_device_unregister+0x158/0x8d0
[  148.714794]  mlxsw_devlink_core_bus_device_reload_down+0x93/0xc0
[  148.721594]  devlink_reload+0x15f/0x5e0
[  148.749669]  devlink_nl_cmd_reload+0x7fc/0x1210
[  148.775992]  genl_family_rcv_msg_doit+0x22a/0x320
[  148.799789]  genl_rcv_msg+0x341/0x5a0
[  148.818789]  netlink_rcv_skb+0x14d/0x430
[  148.836450]  genl_rcv+0x29/0x40
[  148.840034]  netlink_unicast+0x539/0x7e0
[  148.859219]  netlink_sendmsg+0x8d7/0xe10
[  148.879271]  __sys_sendto+0x23f/0x350
[  148.904178]  __x64_sys_sendto+0xe2/0x1b0
[  148.919297]  do_syscall_64+0x2d/0x40
[  148.923365]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  148.929081] RIP: 0033:0x7f17c0dbaefa
[  1

Re: [PATCH] netdevsim: fib: Remove redundant code

2021-03-10 Thread Ido Schimmel

On Wed, Mar 10, 2021 at 10:35:27AM +0800, Jiapeng Chong wrote:
> Fix the following coccicheck warnings:
> 
> ./drivers/net/netdevsim/fib.c:874:5-8: Unneeded variable: "err". Return
> "0" on line 889.
> 
> Reported-by: Abaci Robot 
> Signed-off-by: Jiapeng Chong 
> ---
>  drivers/net/netdevsim/fib.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/drivers/net/netdevsim/fib.c b/drivers/net/netdevsim/fib.c
> index 46fb414..db794f9 100644
> --- a/drivers/net/netdevsim/fib.c
> +++ b/drivers/net/netdevsim/fib.c
> @@ -871,8 +871,6 @@ static int nsim_fib6_event(struct nsim_fib_data *data,
>  
>  static int nsim_fib_event(struct nsim_fib_event *fib_event)

Can you change to 'static void' ?

>  {
> - int err = 0;
> -
>   switch (fib_event->family) {
>   case AF_INET:
>   nsim_fib4_event(fib_event->data, &fib_event->fen_info,
> @@ -886,7 +884,7 @@ static int nsim_fib_event(struct nsim_fib_event 
> *fib_event)
>   break;
>   }
>  
> - return err;
> + return 0;
>  }
>  
>  static int nsim_fib4_prepare_event(struct fib_notifier_info *info,
> -- 
> 1.8.3.1
>

Re: [syzbot] WARNING: ODEBUG bug in net_dm_cmd_trace

2021-03-08 Thread Ido Schimmel

On Sun, Mar 07, 2021 at 06:30:27PM -0800, syzbot wrote:
> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:d310ec03 Merge tag 'perf-core-2021-02-17' of git://git.ker..
> git tree:   net-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=108adb32d0
> kernel config:  https://syzkaller.appspot.com/x/.config?x=2b8307379601586a
> dashboard link: https://syzkaller.appspot.com/bug?extid=779559d6503f3a56213d
> syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=12ad095cd0
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+779559d6503f3a562...@syzkaller.appspotmail.com
> 
> [ cut here ]
> ODEBUG: init active (active state 0) object type: timer_list hint: 
> sched_send_work+0x0/0x60 include/linux/list.h:135

Will check. I think the problem is in the error path that does not
deactivate the timer

> WARNING: CPU: 1 PID: 8649 at lib/debugobjects.c:505 
> debug_print_object+0x16e/0x250 lib/debugobjects.c:505
> Modules linked in:
> CPU: 1 PID: 8649 Comm: syz-executor.0 Not tainted 5.11.0-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
> Google 01/01/2011
> RIP: 0010:debug_print_object+0x16e/0x250 lib/debugobjects.c:505
> Code: ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 af 00 00 00 48 8b 14 dd 40 
> 4c bf 89 4c 89 ee 48 c7 c7 40 40 bf 89 e8 64 79 fa 04 <0f> 0b 83 05 15 e0 ff 
> 09 01 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e c3
> RSP: 0018:c900021df438 EFLAGS: 00010286
> RAX:  RBX: 0003 RCX: 
> RDX: 888020cd1bc0 RSI: 815b4c85 RDI: f5200043be79
> RBP: 0001 R08:  R09: 
> R10: 815ade9e R11:  R12: 896d8ea0
> R13: 89bf4540 R14: 8161d660 R15: 900042b0
> FS:  7f73bb69e700() GS:8880b9c0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 7f7a4370f470 CR3: 1334a000 CR4: 001506f0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> Call Trace:
>  __debug_object_init+0x524/0xd10 lib/debugobjects.c:588
>  debug_timer_init kernel/time/timer.c:722 [inline]
>  debug_init kernel/time/timer.c:770 [inline]
>  init_timer_key+0x2d/0x340 kernel/time/timer.c:814
>  net_dm_trace_on_set net/core/drop_monitor.c: [inline]
>  set_all_monitor_traces net/core/drop_monitor.c:1188 [inline]
>  net_dm_monitor_start net/core/drop_monitor.c:1295 [inline]
>  net_dm_cmd_trace+0x720/0x1220 net/core/drop_monitor.c:1339
>  genl_family_rcv_msg_doit+0x228/0x320 net/netlink/genetlink.c:739
>  genl_family_rcv_msg net/netlink/genetlink.c:783 [inline]
>  genl_rcv_msg+0x328/0x580 net/netlink/genetlink.c:800
>  netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2502
>  genl_rcv+0x24/0x40 net/netlink/genetlink.c:811
>  netlink_unicast_kernel net/netlink/af_netlink.c:1312 [inline]
>  netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1338
>  netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1927
>  sock_sendmsg_nosec net/socket.c:652 [inline]
>  sock_sendmsg+0xcf/0x120 net/socket.c:672
>  sys_sendmsg+0x6e8/0x810 net/socket.c:2348
>  ___sys_sendmsg+0xf3/0x170 net/socket.c:2402
>  __sys_sendmsg+0xe5/0x1b0 net/socket.c:2435
>  do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
>  entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x465ef9
> Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 
> 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 
> 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
> RSP: 002b:7f73bb69e188 EFLAGS: 0246 ORIG_RAX: 002e
> RAX: ffda RBX: 0056bf60 RCX: 00465ef9
> RDX: 0800 RSI: 2500 RDI: 0005
> RBP: 7f73bb69e1d0 R08:  R09: 
> R10:  R11: 0246 R12: 0002
> R13: 7ffdb1c8cb0f R14: 7f73bb69e300 R15: 00022000
> 
> 
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkal...@googlegroups.com.
> 
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> syzbot can test patches for this issue, for details see:
> https://goo.gl/tpsmEJ#testing-patches

Re: [PATCH] net: mellanox: mlxsw: fix error return code of mlxsw_sp_router_nve_promote_decap()

2021-03-07 Thread Ido Schimmel

On Sat, Mar 06, 2021 at 03:32:39PM +0100, Heiner Kallweit wrote:
> On 06.03.2021 15:07, Jia-Ju Bai wrote:
> > When fib_entry is NULL, no error return code of
> > mlxsw_sp_router_nve_promote_decap() is assigned.
> > To fix this bug, err is assigned with -EINVAL in this case.
> > 
> Again, are you sure this is a bug? To me it looks like it is
> intentional to not return an error code if fib_entry is NULL.
> Please don't blindly trust the robot results, there may
> always be false positives.

Yes, it is OK not to return an error. There is even a comment above the
call to mlxsw_sp_router_ip2me_fib_entry_find():

/* It is valid to create a tunnel with a local IP and only later
 * assign this IP address to a local interface
 */

> 
> > Reported-by: TOTE Robot 
> > Signed-off-by: Jia-Ju Bai 
> > ---
> >  drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c 
> > b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
> > index 9ce90841f92d..7b260e25df1b 100644
> > --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
> > +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
> > @@ -1981,8 +1981,10 @@ int mlxsw_sp_router_nve_promote_decap(struct 
> > mlxsw_sp *mlxsw_sp, u32 ul_tb_id,
> > fib_entry = mlxsw_sp_router_ip2me_fib_entry_find(mlxsw_sp, ul_tb_id,
> >  ul_proto, ul_sip,
> >  type);
> > -   if (!fib_entry)
> > +   if (!fib_entry) {
> > +   err = -EINVAL;
> > goto out;
> > +   }
> >  
> > fib_entry->decap.tunnel_index = tunnel_index;
> > fib_entry->type = MLXSW_SP_FIB_ENTRY_TYPE_NVE_DECAP;
> > 
>

Re: [PATCH 10/11] pragma once: delete few backslashes

2021-03-01 Thread Ido Schimmel

On Sun, Feb 28, 2021 at 08:05:14PM +0300, Alexey Dobriyan wrote:
> From 251ca5673886b5bb0a42004944290b9d2b267a4a Mon Sep 17 00:00:00 2001
> From: Alexey Dobriyan 
> Date: Fri, 19 Feb 2021 13:37:24 +0300
> Subject: [PATCH 10/11] pragma once: delete few backslashes
> 
> Some macros contain one backslash too many and end up being the last
> macro in a header file. When #pragma once conversion script truncates
> the last #endif and whitespace before it, such backslash triggers
> a warning about "OMG file ends up in a backslash-newline".
> 
> Needless to say I don't want to handle another case in my script,
> so delete useless backslashes instead.
> 
> Signed-off-by: Alexey Dobriyan 

For mlxsw:

Reviewed-by: Ido Schimmel 
Tested-by: Ido Schimmel 

Thanks

Re: [PATCH v2] netdevsim: fib: remove unneeded semicolon

2021-02-23 Thread Ido Schimmel

On Tue, Feb 23, 2021 at 10:28:46AM +0800, Jiapeng Chong wrote:
> Fix the following coccicheck warnings:
> 
> ./drivers/net/netdevsim/fib.c:564:2-3: Unneeded semicolon.
> 
> Reported-by: Abaci Robot 
> Signed-off-by: Jiapeng Chong 

Reviewed-by: Ido Schimmel

Re: [PATCH] netdevsim: fib: remove unneeded semicolon

2021-02-22 Thread Ido Schimmel

On Mon, Feb 22, 2021 at 05:47:04PM +0800, Jiapeng Chong wrote:
> Fix the following coccicheck warnings:
> 
> ./drivers/net/netdevsim/fib.c:564:2-3: Unneeded semicolon.
> 
> Reported-by: Abaci Robot 
> Signed-off-by: Jiapeng Chong 
> ---
>  drivers/net/netdevsim/fib.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/netdevsim/fib.c b/drivers/net/netdevsim/fib.c
> index 46fb414..11b43ce 100644
> --- a/drivers/net/netdevsim/fib.c
> +++ b/drivers/net/netdevsim/fib.c
> @@ -561,7 +561,7 @@ static void nsim_fib6_rt_nh_del(struct nsim_fib6_rt 
> *fib6_rt,
>  err_fib6_rt_nh_del:
>   for (i--; i >= 0; i--) {
>   nsim_fib6_rt_nh_del(fib6_rt, rt_arr[i]);
> - };
> + }

You can simply remove the braces since they are not needed

>   nsim_fib_rt_fini(&fib6_rt->common);
>   kfree(fib6_rt);
>   return ERR_PTR(err);
> -- 
> 1.8.3.1
>

Re: [PATCH v2 net-next 04/11] net: bridge: offload initial and final port flags through switchdev

2021-02-11 Thread Ido Schimmel

On Thu, Feb 11, 2021 at 11:35:27AM +0200, Vladimir Oltean wrote:
> On Thu, Feb 11, 2021 at 09:44:43AM +0200, Ido Schimmel wrote:
> > On Thu, Feb 11, 2021 at 01:23:52AM +0200, Vladimir Oltean wrote:
> > > On Wed, Feb 10, 2021 at 12:59:49PM +0200, Ido Schimmel wrote:
> > > > > > The reverse, during unlinking, would be to refuse unlinking if the 
> > > > > > upper
> > > > > > has uppers of its own. netdev_upper_dev_unlink() needs to learn to
> > > > > > return an error and callers such as team/bond need to learn to 
> > > > > > handle
> > > > > > it, but it seems patchable.
> > > > >
> > > > > Again, this was treated prior to my deletion in this series and not by
> > > > > erroring out, I just really didn't think it through.
> > > > >
> > > > > So you're saying that if we impose that all switchdev drivers restrict
> > > > > the house of cards to be constructed from the bottom up, and 
> > > > > destructed
> > > > > from the top down, then the notification of bridge port flags can stay
> > > > > in the bridge layer?
> > > >
> > > > I actually don't think it's a good idea to have this in the bridge in
> > > > any case. I understand that it makes sense for some devices where
> > > > learning, flooding, etc are port attributes, but in other devices these
> > > > can be {port,vlan} attributes and then you need to take care of them
> > > > when a vlan is added / deleted and not only when a port is removed from
> > > > the bridge. So for such devices this really won't save anything. I would
> > > > thus leave it to the lower levels to decide.
> > >
> > > Just for my understanding, how are per-{port,vlan} attributes such as
> > > learning and flooding managed by the Linux bridge? How can I disable
> > > flooding only in a certain VLAN?
> >
> > You can't (currently). But it does not change the fact that in some
> > devices these are {port,vlan} attributes and we are talking here about
> > the interface towards these devices. Having these as {port,vlan}
> > attributes allows you to support use cases such as a port being enslaved
> > to a VLAN-aware bridge and its VLAN upper(s) enslaved to VLAN unaware
> > bridge(s).
> 
> I don't think I understand the use case really. You mean something like this?
> 
> br1 (vlan_filtering=0)
> /   \
>/ \
>  swp0.100 \
>|   \
>|(vlan_filtering \
>|  br0  =1)   \
>| /   \\
>|/ \\
>  swp0swp1 swp2
> 
> A packet received on swp0 with VLAN tag 100 will go to swp0.100 which
> will be forwarded according to the FDB of br1, and will be delivered to
> swp2 as untagged? Respectively in the other direction, a packet received
> on swp2 will have a VLAN 100 tag pushed on egress towards swp0, even if
> it is already VLAN-tagged?
> 
> What do you even use this for?

The more common use case is to have multiple VLAN-unaware bridges
instead of one VLAN-aware bridge. I'm not aware of users that use the
hybrid model (VLAN-aware + VLAN-unaware). But regardless, this entails
treating above mentioned attributes as {port,vlan} attributes. A device
that only supports them as port attributes will have problems supporting
such a model.

> And also: if the {port,vlan} attributes can be simulated by making the
> bridge port be an 8021q upper of a physical interface, then as far as
> the bridge is concerned, they still are per-port attributes, and they
> are per-{port,vlan} only as far as the switch driver is concerned -
> therefore I don't see why it isn't okay for the bridge to notify the
> brport flags in exactly the same way for them too.

Look at this hunk from the patch:

@@ -343,6 +360,8 @@ static void del_nbp(struct net_bridge_port *p)
update_headroom(br, get_max_headroom(br));
netdev_reset_rx_headroom(dev);
 
+   nbp_flags_notify(p, BR_PORT_DEFAULT_FLAGS & ~BR_LEARNING,
+BR_PORT_DEFAULT_FLAGS);
nbp_vlan_flush(p);
br_fdb_delete_by_port(br, p, 0, 1);
switchdev_deferred_process();

Devices that treat these attributes as {port,vlan} attributes will undo
this change upon the call to nbp_vlan_flush() when all the VLANs are
flushed.

> 
> > Obviously you need to ensure there is no conflict between the
> > VLANs used by the VLAN-aware bridge and the VLAN device(s).
> 
> On the other hand I think I have a more real-life use case that I think

Re: [PATCH v2 net-next 04/11] net: bridge: offload initial and final port flags through switchdev

2021-02-11 Thread Ido Schimmel

On Thu, Feb 11, 2021 at 01:23:52AM +0200, Vladimir Oltean wrote:
> On Wed, Feb 10, 2021 at 12:59:49PM +0200, Ido Schimmel wrote:
> > > > The reverse, during unlinking, would be to refuse unlinking if the upper
> > > > has uppers of its own. netdev_upper_dev_unlink() needs to learn to
> > > > return an error and callers such as team/bond need to learn to handle
> > > > it, but it seems patchable.
> > >
> > > Again, this was treated prior to my deletion in this series and not by
> > > erroring out, I just really didn't think it through.
> > >
> > > So you're saying that if we impose that all switchdev drivers restrict
> > > the house of cards to be constructed from the bottom up, and destructed
> > > from the top down, then the notification of bridge port flags can stay
> > > in the bridge layer?
> >
> > I actually don't think it's a good idea to have this in the bridge in
> > any case. I understand that it makes sense for some devices where
> > learning, flooding, etc are port attributes, but in other devices these
> > can be {port,vlan} attributes and then you need to take care of them
> > when a vlan is added / deleted and not only when a port is removed from
> > the bridge. So for such devices this really won't save anything. I would
> > thus leave it to the lower levels to decide.
> 
> Just for my understanding, how are per-{port,vlan} attributes such as
> learning and flooding managed by the Linux bridge? How can I disable
> flooding only in a certain VLAN?

You can't (currently). But it does not change the fact that in some
devices these are {port,vlan} attributes and we are talking here about
the interface towards these devices. Having these as {port,vlan}
attributes allows you to support use cases such as a port being enslaved
to a VLAN-aware bridge and its VLAN upper(s) enslaved to VLAN unaware
bridge(s). Obviously you need to ensure there is no conflict between the
VLANs used by the VLAN-aware bridge and the VLAN device(s).

Re: [PATCH v3 net-next 00/11] Cleanup in brport flags switchdev offload for DSA

2021-02-10 Thread Ido Schimmel

On Wed, Feb 10, 2021 at 02:55:01PM +0200, Vladimir Oltean wrote:
> On Wed, Feb 10, 2021 at 02:38:23PM +0200, Ido Schimmel wrote:
> > On Wed, Feb 10, 2021 at 02:29:36PM +0200, Vladimir Oltean wrote:
> > > On Wed, Feb 10, 2021 at 02:21:05PM +0200, Ido Schimmel wrote:
> > > > On Wed, Feb 10, 2021 at 02:01:06PM +0200, Vladimir Oltean wrote:
> > > > > On Wed, Feb 10, 2021 at 01:05:57PM +0200, Nikolay Aleksandrov wrote:
> > > > > > On 10/02/2021 13:01, Vladimir Oltean wrote:
> > > > > > > On Wed, Feb 10, 2021 at 12:52:33PM +0200, Nikolay Aleksandrov 
> > > > > > > wrote:
> > > > > > >> On 10/02/2021 12:45, Vladimir Oltean wrote:
> > > > > > >>> Hi Nikolay,
> > > > > > >>>
> > > > > > >>> On Wed, Feb 10, 2021 at 12:31:43PM +0200, Nikolay Aleksandrov 
> > > > > > >>> wrote:
> > > > > > >>>> Hi Vladimir,
> > > > > > >>>> Let's take a step back for a moment and discuss the bridge 
> > > > > > >>>> unlock/lock sequences
> > > > > > >>>> that come with this set. I'd really like to avoid those as 
> > > > > > >>>> they're a recipe
> > > > > > >>>> for future problems. The only good way to achieve that 
> > > > > > >>>> currently is to keep
> > > > > > >>>> the PRE_FLAGS call and do that in unsleepable context but move 
> > > > > > >>>> the FLAGS call
> > > > > > >>>> after the flags have been changed (if they have changed 
> > > > > > >>>> obviously). That would
> > > > > > >>>> make the code read much easier since we'll have all our 
> > > > > > >>>> lock/unlock sequences
> > > > > > >>>> in the same code blocks and won't play games to get sleepable 
> > > > > > >>>> context.
> > > > > > >>>> Please let's think and work in that direction, rather than 
> > > > > > >>>> having:
> > > > > > >>>> +  spin_lock_bh(&p->br->lock);
> > > > > > >>>> +  if (err) {
> > > > > > >>>> +  netdev_err(p->dev, "%s\n", extack._msg);
> > > > > > >>>> +  return err;
> > > > > > >>>>}
> > > > > > >>>> +
> > > > > > >>>>
> > > > > > >>>> which immediately looks like a bug even though after some code 
> > > > > > >>>> checking we can
> > > > > > >>>> verify it's ok. WDYT?
> > > > > > >>>>
> > > > > > >>>> I plan to get rid of most of the br->lock since it's been 
> > > > > > >>>> abused for a very long
> > > > > > >>>> time because it's essentially STP lock, but people have 
> > > > > > >>>> started using it for other
> > > > > > >>>> things and I plan to fix that when I get more time.
> > > > > > >>>
> > > > > > >>> This won't make the sysfs codepath any nicer, will it?
> > > > > > >>>
> > > > > > >>
> > > > > > >> Currently we'll have to live with a hack that checks if the 
> > > > > > >> flags have changed. I agree
> > > > > > >> it won't be pretty, but we won't have to unlock and lock again 
> > > > > > >> in the middle of the
> > > > > > >> called function and we'll have all our locking in the same 
> > > > > > >> place, easier to verify and
> > > > > > >> later easier to remove. Once I get rid of most of the br->lock 
> > > > > > >> usage we can revisit
> > > > > > >> the drop of PRE_FLAGS if it's a problem. The alternative is to 
> > > > > > >> change the flags, then
> > > > > > >> send the switchdev notification outside of the lock and revert 
> > > > > > &

Re: [PATCH v3 net-next 00/11] Cleanup in brport flags switchdev offload for DSA

2021-02-10 Thread Ido Schimmel

On Wed, Feb 10, 2021 at 02:29:36PM +0200, Vladimir Oltean wrote:
> On Wed, Feb 10, 2021 at 02:21:05PM +0200, Ido Schimmel wrote:
> > On Wed, Feb 10, 2021 at 02:01:06PM +0200, Vladimir Oltean wrote:
> > > On Wed, Feb 10, 2021 at 01:05:57PM +0200, Nikolay Aleksandrov wrote:
> > > > On 10/02/2021 13:01, Vladimir Oltean wrote:
> > > > > On Wed, Feb 10, 2021 at 12:52:33PM +0200, Nikolay Aleksandrov wrote:
> > > > >> On 10/02/2021 12:45, Vladimir Oltean wrote:
> > > > >>> Hi Nikolay,
> > > > >>>
> > > > >>> On Wed, Feb 10, 2021 at 12:31:43PM +0200, Nikolay Aleksandrov wrote:
> > > > >>>> Hi Vladimir,
> > > > >>>> Let's take a step back for a moment and discuss the bridge 
> > > > >>>> unlock/lock sequences
> > > > >>>> that come with this set. I'd really like to avoid those as they're 
> > > > >>>> a recipe
> > > > >>>> for future problems. The only good way to achieve that currently 
> > > > >>>> is to keep
> > > > >>>> the PRE_FLAGS call and do that in unsleepable context but move the 
> > > > >>>> FLAGS call
> > > > >>>> after the flags have been changed (if they have changed 
> > > > >>>> obviously). That would
> > > > >>>> make the code read much easier since we'll have all our 
> > > > >>>> lock/unlock sequences
> > > > >>>> in the same code blocks and won't play games to get sleepable 
> > > > >>>> context.
> > > > >>>> Please let's think and work in that direction, rather than having:
> > > > >>>> +  spin_lock_bh(&p->br->lock);
> > > > >>>> +  if (err) {
> > > > >>>> +  netdev_err(p->dev, "%s\n", extack._msg);
> > > > >>>> +  return err;
> > > > >>>>}
> > > > >>>> +
> > > > >>>>
> > > > >>>> which immediately looks like a bug even though after some code 
> > > > >>>> checking we can
> > > > >>>> verify it's ok. WDYT?
> > > > >>>>
> > > > >>>> I plan to get rid of most of the br->lock since it's been abused 
> > > > >>>> for a very long
> > > > >>>> time because it's essentially STP lock, but people have started 
> > > > >>>> using it for other
> > > > >>>> things and I plan to fix that when I get more time.
> > > > >>>
> > > > >>> This won't make the sysfs codepath any nicer, will it?
> > > > >>>
> > > > >>
> > > > >> Currently we'll have to live with a hack that checks if the flags 
> > > > >> have changed. I agree
> > > > >> it won't be pretty, but we won't have to unlock and lock again in 
> > > > >> the middle of the
> > > > >> called function and we'll have all our locking in the same place, 
> > > > >> easier to verify and
> > > > >> later easier to remove. Once I get rid of most of the br->lock usage 
> > > > >> we can revisit
> > > > >> the drop of PRE_FLAGS if it's a problem. The alternative is to 
> > > > >> change the flags, then
> > > > >> send the switchdev notification outside of the lock and revert the 
> > > > >> flags if it doesn't
> > > > >> go through which doesn't sound much better.
> > > > >> I'm open to any other suggestions, but definitely would like to 
> > > > >> avoid playing locking games.
> > > > >> Even if it means casing out flag setting from all other store_ 
> > > > >> functions for sysfs.
> > > > >
> > > > > By casing out flag settings you mean something like this?
> > > > >
> > > > >
> > > > > #define BRPORT_ATTR(_name, _mode, _show, _store)  \
> > > > > const struct brport_attribute brport_attr_##_name = { 
> > > > > \
> > > > >   .attr = {.name = __stringify(_name),\
> > > &

Re: [PATCH v3 net-next 00/11] Cleanup in brport flags switchdev offload for DSA

2021-02-10 Thread Ido Schimmel

On Wed, Feb 10, 2021 at 02:01:06PM +0200, Vladimir Oltean wrote:
> On Wed, Feb 10, 2021 at 01:05:57PM +0200, Nikolay Aleksandrov wrote:
> > On 10/02/2021 13:01, Vladimir Oltean wrote:
> > > On Wed, Feb 10, 2021 at 12:52:33PM +0200, Nikolay Aleksandrov wrote:
> > >> On 10/02/2021 12:45, Vladimir Oltean wrote:
> > >>> Hi Nikolay,
> > >>>
> > >>> On Wed, Feb 10, 2021 at 12:31:43PM +0200, Nikolay Aleksandrov wrote:
> >  Hi Vladimir,
> >  Let's take a step back for a moment and discuss the bridge unlock/lock 
> >  sequences
> >  that come with this set. I'd really like to avoid those as they're a 
> >  recipe
> >  for future problems. The only good way to achieve that currently is to 
> >  keep
> >  the PRE_FLAGS call and do that in unsleepable context but move the 
> >  FLAGS call
> >  after the flags have been changed (if they have changed obviously). 
> >  That would
> >  make the code read much easier since we'll have all our lock/unlock 
> >  sequences
> >  in the same code blocks and won't play games to get sleepable context.
> >  Please let's think and work in that direction, rather than having:
> >  +  spin_lock_bh(&p->br->lock);
> >  +  if (err) {
> >  +  netdev_err(p->dev, "%s\n", extack._msg);
> >  +  return err;
> > }
> >  +
> > 
> >  which immediately looks like a bug even though after some code 
> >  checking we can
> >  verify it's ok. WDYT?
> > 
> >  I plan to get rid of most of the br->lock since it's been abused for a 
> >  very long
> >  time because it's essentially STP lock, but people have started using 
> >  it for other
> >  things and I plan to fix that when I get more time.
> > >>>
> > >>> This won't make the sysfs codepath any nicer, will it?
> > >>>
> > >>
> > >> Currently we'll have to live with a hack that checks if the flags have 
> > >> changed. I agree
> > >> it won't be pretty, but we won't have to unlock and lock again in the 
> > >> middle of the
> > >> called function and we'll have all our locking in the same place, easier 
> > >> to verify and
> > >> later easier to remove. Once I get rid of most of the br->lock usage we 
> > >> can revisit
> > >> the drop of PRE_FLAGS if it's a problem. The alternative is to change 
> > >> the flags, then
> > >> send the switchdev notification outside of the lock and revert the flags 
> > >> if it doesn't
> > >> go through which doesn't sound much better.
> > >> I'm open to any other suggestions, but definitely would like to avoid 
> > >> playing locking games.
> > >> Even if it means casing out flag setting from all other store_ functions 
> > >> for sysfs.
> > >
> > > By casing out flag settings you mean something like this?
> > >
> > >
> > > #define BRPORT_ATTR(_name, _mode, _show, _store)  \
> > > const struct brport_attribute brport_attr_##_name = { \
> > >   .attr = {.name = __stringify(_name),\
> > >.mode = _mode },   \
> > >   .show   = _show,\
> > >   .store_unlocked = _store,   \
> > > };
> > >
> > > #define BRPORT_ATTR_FLAG(_name, _mask)\
> > > static ssize_t show_##_name(struct net_bridge_port *p, char *buf) \
> > > { \
> > >   return sprintf(buf, "%d\n", !!(p->flags & _mask));  \
> > > } \
> > > static int store_##_name(struct net_bridge_port *p, unsigned long v) \
> > > { \
> > >   return store_flag(p, v, _mask); \
> > > } \
> > > static BRPORT_ATTR(_name, 0644,   \
> > >  show_##_name, store_##_name)
> > >
> > > static ssize_t brport_store(struct kobject *kobj,
> > >   struct attribute *attr,
> > >   const char *buf, size_t count)
> > > {
> > >   ...
> > >
> > >   } else if (brport_attr->store_unlocked) {
> > >   val = simple_strtoul(buf, &endp, 0);
> > >   if (endp == buf)
> > >   goto out_unlock;
> > >   ret = brport_attr->store_unlocked(p, val);
> > >   }
> > >
> >
> > Yes, this can work but will need a bit more changes because of 
> > br_port_flags_change().
> > Then the netlink side can be modeled in a similar way.
> 
> What I just don't understand is how others can get away with doing
> sleepable work in atomic context but I can't make the notifier blocking
> by dropping a spinlock which isn't needed there, because it looks ugly :D

Can you please point to the bug? I'm not following

Re: [PATCH v2 net-next 04/11] net: bridge: offload initial and final port flags through switchdev

2021-02-10 Thread Ido Schimmel

On Wed, Feb 10, 2021 at 12:51:53AM +0200, Vladimir Oltean wrote:
> On Wed, Feb 10, 2021 at 12:01:24AM +0200, Ido Schimmel wrote:
> > On Tue, Feb 09, 2021 at 10:20:45PM +0200, Vladimir Oltean wrote:
> > > On Tue, Feb 09, 2021 at 08:51:00PM +0200, Ido Schimmel wrote:
> > > > On Tue, Feb 09, 2021 at 05:19:29PM +0200, Vladimir Oltean wrote:
> > > > > So switchdev drivers operating in standalone mode should disable 
> > > > > address
> > > > > learning. As a matter of practicality, we can reduce code duplication 
> > > > > in
> > > > > drivers by having the bridge notify through switchdev of the initial 
> > > > > and
> > > > > final brport flags. Then, drivers can simply start up hardcoded for no
> > > > > address learning (similar to how they already start up hardcoded for 
> > > > > no
> > > > > forwarding), then they only need to listen for
> > > > > SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS and their job is basically done, 
> > > > > no
> > > > > need for special cases when the port joins or leaves the bridge etc.
> > > >
> > > > How are you handling the case where a port leaves a LAG that is linked
> > > > to a bridge? In this case the port becomes a standalone port, but will
> > > > not get this notification.
> > >
> > > Apparently the answer to that question is "I delete the code that makes
> > > this use case work", how smart of me. Thanks.
> >
> > Not sure how you expect to interpret this.
> 
> Next patch (05/11) deletes that explicit notification from 
> dsa_port_bridge_leave,
> function which is called from dsa_port_lag_leave too, apparently with good 
> reason.
> 
> > > Unless you have any idea how I could move the logic into the bridge, I
> > > guess I'm stuck with DSA and all the other switchdev drivers having this
> > > forest of corner cases to deal with. At least I can add a comment so I'm
> > > not tempted to delete it next time.
> >
> > There are too many moving pieces with stacked devices. It is not only
> > LAG/bridge. In L3 you have VRFs, SVIs, macvlans etc. It might be better
> > to gracefully / explicitly not handle a case rather than pretending to
> > handle it correctly with complex / buggy code.
> >
> > For example, you should refuse to be enslaved to a LAG that already has
> > upper devices such as a bridge. You are probably not handling this
> > correctly / at all. This is easy. Just a call to
> > netdev_has_any_upper_dev().
> 
> Correct, good point, in particular this means that joining a bridged LAG
> will not get me any notifications of that LAG's CHANGEUPPER because that
> was consumed a long time ago. An equally valid approach seems to be to
> check for netdev_master_upper_dev_get_rcu in dsa_port_lag_join, and call
> dsa_port_bridge_join on the upper if that is present.

The bridge might already have a state you are not familiar with (e.g.,
FDB entry pointing to the LAG), so best to just forbid this. I think
it's fair to impose such limitations (assuming they are properly
communicated to user space) given it results in a much less
buggy/complex code to maintain.

> 
> > The reverse, during unlinking, would be to refuse unlinking if the upper
> > has uppers of its own. netdev_upper_dev_unlink() needs to learn to
> > return an error and callers such as team/bond need to learn to handle
> > it, but it seems patchable.
> 
> Again, this was treated prior to my deletion in this series and not by
> erroring out, I just really didn't think it through.
> 
> So you're saying that if we impose that all switchdev drivers restrict
> the house of cards to be constructed from the bottom up, and destructed
> from the top down, then the notification of bridge port flags can stay
> in the bridge layer?

I actually don't think it's a good idea to have this in the bridge in
any case. I understand that it makes sense for some devices where
learning, flooding, etc are port attributes, but in other devices these
can be {port,vlan} attributes and then you need to take care of them
when a vlan is added / deleted and not only when a port is removed from
the bridge. So for such devices this really won't save anything. I would
thus leave it to the lower levels to decide.

Re: [PATCH v3 net-next 07/11] net: prep switchdev drivers for concurrent SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS

2021-02-10 Thread Ido Schimmel

On Wed, Feb 10, 2021 at 11:14:41AM +0200, Vladimir Oltean wrote:
> From: Vladimir Oltean 
> 
> Because the bridge will start offloading SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS
> while not serialized by any lock such as the br->lock spinlock, existing
> drivers that treat that attribute and cache the brport flags might no
> longer work correctly.

Can you explain the race? This notification is sent from sysfs/netlink
call path, both of which take rtnl.

Re: [PATCH v2 net-next 04/11] net: bridge: offload initial and final port flags through switchdev

2021-02-09 Thread Ido Schimmel

On Tue, Feb 09, 2021 at 10:20:45PM +0200, Vladimir Oltean wrote:
> On Tue, Feb 09, 2021 at 08:51:00PM +0200, Ido Schimmel wrote:
> > On Tue, Feb 09, 2021 at 05:19:29PM +0200, Vladimir Oltean wrote:
> > > So switchdev drivers operating in standalone mode should disable address
> > > learning. As a matter of practicality, we can reduce code duplication in
> > > drivers by having the bridge notify through switchdev of the initial and
> > > final brport flags. Then, drivers can simply start up hardcoded for no
> > > address learning (similar to how they already start up hardcoded for no
> > > forwarding), then they only need to listen for
> > > SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS and their job is basically done, no
> > > need for special cases when the port joins or leaves the bridge etc.
> > 
> > How are you handling the case where a port leaves a LAG that is linked
> > to a bridge? In this case the port becomes a standalone port, but will
> > not get this notification.
> 
> Apparently the answer to that question is "I delete the code that makes
> this use case work", how smart of me. Thanks.

Not sure how you expect to interpret this.

> 
> Unless you have any idea how I could move the logic into the bridge, I
> guess I'm stuck with DSA and all the other switchdev drivers having this
> forest of corner cases to deal with. At least I can add a comment so I'm
> not tempted to delete it next time.

There are too many moving pieces with stacked devices. It is not only
LAG/bridge. In L3 you have VRFs, SVIs, macvlans etc. It might be better
to gracefully / explicitly not handle a case rather than pretending to
handle it correctly with complex / buggy code.

For example, you should refuse to be enslaved to a LAG that already has
upper devices such as a bridge. You are probably not handling this
correctly / at all. This is easy. Just a call to
netdev_has_any_upper_dev().

The reverse, during unlinking, would be to refuse unlinking if the upper
has uppers of its own. netdev_upper_dev_unlink() needs to learn to
return an error and callers such as team/bond need to learn to handle
it, but it seems patchable.

Re: [PATCH v2 net-next 03/11] net: bridge: don't print in br_switchdev_set_port_flag

2021-02-09 Thread Ido Schimmel

On Tue, Feb 09, 2021 at 07:36:31PM +0200, Vladimir Oltean wrote:
> On Tue, Feb 09, 2021 at 05:19:28PM +0200, Vladimir Oltean wrote:
> > From: Vladimir Oltean 
> >
> > Currently br_switchdev_set_port_flag has two options for error handling
> > and neither is good:
> > - The driver returns -EOPNOTSUPP in PRE_BRIDGE_FLAGS if it doesn't
> >   support offloading that flag, and this gets silently ignored and
> >   converted to an errno of 0. Nobody does this.
> > - The driver returns some other error code, like -EINVAL, in
> >   PRE_BRIDGE_FLAGS, and br_switchdev_set_port_flag shouts loudly.
> >
> > The problem is that we'd like to offload some port flags during bridge
> > join and leave, but also not have the bridge shout at us if those fail.
> > But on the other hand we'd like the user to know that we can't offload
> > something when they set that through netlink. And since we can't have
> > the driver return -EOPNOTSUPP or -EINVAL depending on whether it's
> > called by the user or internally by the bridge, let's just add an extack
> > argument to br_switchdev_set_port_flag and propagate it to its callers.
> > Then, when we need offloading to really fail silently, this can simply
> > be passed a NULL argument.
> >
> > Signed-off-by: Vladimir Oltean 
> > ---
> 
> The build fails because since I started working on v2 and until I sent
> it, Jakub merged net into net-next which contained this fix:
> https://patchwork.kernel.org/project/netdevbpf/patch/20210207194733.1811529-1-olte...@gmail.com/
> for which I couldn't change prototype due to it missing in net-next.
> I think I would like to rather wait to gather some feedback first before
> respinning v3, if possible.

It seems that in the sysfs call path br_switchdev_set_port_flag() will
be called with the bridge lock held, which is going to be a problem
given that patch #8 allows this function to block.

Re: [PATCH v2 net-next 04/11] net: bridge: offload initial and final port flags through switchdev

2021-02-09 Thread Ido Schimmel

On Tue, Feb 09, 2021 at 05:19:29PM +0200, Vladimir Oltean wrote:
> So switchdev drivers operating in standalone mode should disable address
> learning. As a matter of practicality, we can reduce code duplication in
> drivers by having the bridge notify through switchdev of the initial and
> final brport flags. Then, drivers can simply start up hardcoded for no
> address learning (similar to how they already start up hardcoded for no
> forwarding), then they only need to listen for
> SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS and their job is basically done, no
> need for special cases when the port joins or leaves the bridge etc.

How are you handling the case where a port leaves a LAG that is linked
to a bridge? In this case the port becomes a standalone port, but will
not get this notification.

Re: [PATCH v2 net-next 01/11] net: switchdev: propagate extack to port attributes

2021-02-09 Thread Ido Schimmel

On Tue, Feb 09, 2021 at 05:19:26PM +0200, Vladimir Oltean wrote:
> From: Vladimir Oltean 
> 
> When a struct switchdev_attr is notified through switchdev, there is no
> way to report informational messages, unlike for struct switchdev_obj.
> 
> Signed-off-by: Vladimir Oltean 

Reviewed-by: Ido Schimmel

Re: [RFC v5 net-next] net: core: devlink: add 'dropped' stats field for traps

2021-02-06 Thread Ido Schimmel

On Thu, Feb 04, 2021 at 01:41:22PM +0200, Oleksandr Mazur wrote:
> Whenever query statistics is issued for trap, devlink subsystem
> would also fill-in statistics 'dropped' field. This field indicates
> the number of packets HW dropped and failed to report to the device driver,
> and thus - to the devlink subsystem itself.
> In case if device driver didn't register callback for hard drop
> statistics querying, 'dropped' field will be omitted and not filled.
> Add trap_drop_counter_get callback implementation to the netdevsim.
> Add new test cases for netdevsim, to test both the callback
> functionality, as well as drop statistics alteration check.
> 
> Signed-off-by: Oleksandr Mazur 

Code looks fine to me, but for non-RFC submission (with actual hw
implementation), you probably want to split it into three patches:
devlink, netdevsim, selftest.

Re: [RFC v3 net-next] net: core: devlink: add 'dropped' stats field for DROP trap action

2021-02-01 Thread Ido Schimmel

I missed this patch. Please Cc me on future versions given I commented
on previous versions.

On Mon, Jan 25, 2021 at 02:38:56PM +0200, Oleksandr Mazur wrote:
> Whenever query statistics is issued for trap with DROP action,
> devlink subsystem would also fill-in statistics 'dropped' field.
> In case if device driver did't register callback for hard drop
> statistics querying, 'dropped' field will be omitted and not filled.
> Add trap_drop_counter_get callback implementation to the netdevsim.
> Add new test cases for netdevsim, to test both the callback
> functionality, as well as drop statistics alteration check.
> 
> Signed-off-by: Oleksandr Mazur 

[...]

> +static int devlink_trap_stats_put(struct sk_buff *msg, struct devlink 
> *devlink,
> +   const struct devlink_trap_item *trap_item)
> +{
> + struct devlink_stats stats;
> + struct nlattr *attr;
> + u64 drops = 0;
> + int err;
> +
> + if (trap_item->action == DEVLINK_TRAP_ACTION_DROP &&
> + devlink->ops->trap_drop_counter_get) {
> + err = devlink->ops->trap_drop_counter_get(devlink,
> +   trap_item->trap,
> +   &drops);
> + if (err)
> + return err;
> + }
> +
> + devlink_trap_stats_read(trap_item->stats, &stats);
> +
> + attr = nla_nest_start(msg, DEVLINK_ATTR_STATS);
> + if (!attr)
> + return -EMSGSIZE;
> +
> + if (nla_put_u64_64bit(msg, DEVLINK_ATTR_STATS_RX_DROPPED, drops,
> +   DEVLINK_ATTR_PAD))

Commit message says: "In case if device driver did't register callback
for hard drop statistics querying, 'dropped' field will be omitted and
not filled."

But looks like this attribute is always reported to user space.

> + goto nla_put_failure;
> +
> + if (trap_item->action == DEVLINK_TRAP_ACTION_DROP &&
> + devlink->ops->trap_drop_counter_get &&
> + nla_put_u64_64bit(msg, DEVLINK_ATTR_STATS_RX_PACKETS,
> +   stats.rx_packets, DEVLINK_ATTR_PAD))

This is needed for DEVLINK_ATTR_STATS_RX_DROPPED, not for
DEVLINK_ATTR_STATS_RX_PACKETS.

I don't think it makes sense for a counter to come and go based on the
action. It should always be reported (if device supports it), regardless
of current action. Note that the first check will result in this counter
being reported as zero when the action is not drop, but as non-zero
otherwise. That's not good because the basic property of a counter is
that it is monotonically increasing.

> + goto nla_put_failure;
> +
> + if (nla_put_u64_64bit(msg, DEVLINK_ATTR_STATS_RX_BYTES,
> +   stats.rx_bytes, DEVLINK_ATTR_PAD))
> + goto nla_put_failure;
> +
> + nla_nest_end(msg, attr);
> +
> + return 0;
> +
> +nla_put_failure:
> + nla_nest_cancel(msg, attr);
> + return -EMSGSIZE;
> +}

Re: [PATCH net-next] net: core: devlink: add new trap action HARD_DROP

2021-01-25 Thread Ido Schimmel

On Mon, Jan 25, 2021 at 03:56:14PM +0100, Jiri Pirko wrote:
> Mon, Jan 25, 2021 at 01:24:27PM CET, oleksandr.ma...@plvision.eu wrote:
> >Thu, Jan 21, 2021 at 06:36:05PM CET, k...@kernel.org wrote:
> >>On Thu, 21 Jan 2021 14:21:52 +0200 Ido Schimmel wrote:
> >>> On Thu, Jan 21, 2021 at 01:29:37PM +0200, Oleksandr Mazur wrote:
> >>> > Add new trap action HARD_DROP, which can be used by the
> >>> > drivers to register traps, where it's impossible to get
> >>> > packet reported to the devlink subsystem by the device
> >>> > driver, because it's impossible to retrieve dropped packet
> >>> > from the device itself.
> >>> > In order to use this action, driver must also register
> >>> > additional devlink operation - callback that is used
> >>> > to retrieve number of packets that have been dropped by
> >>> > the device.  
> >>> 
> >>> Are these global statistics about number of packets the hardware dropped
> >>> for a specific reason or are these per-port statistics?
> >>> 
> >>> It's a creative use of devlink-trap interface, but I think it makes
> >>> sense. Better to re-use an existing interface than creating yet another
> >>> one.
> >>
> >>Not sure if I agree, if we can't trap why is it a trap?
> >>It's just a counter.
> >
> >>+1
> >Device might be unable to trap only the 'DROP' packets, and this information 
> >should be transparent for the user.
> >
> >I agree on the statement, that new action might be an overhead.
> >I could continue on with the solution Ido Schimmel proposed: since no new 
> >action would be needed and no UAPI changes are required, i could simply do 
> >the dropped statistics (additional field) output added upon trap stats 
> >queiring.
> >(In case if driver registerd callback, of course; and do so only for DROP 
> >actions)
> 
> It is not "a trap". You just need to count dropped packet. You don't
> trap anything. That is why I don't think this has anything to do with
> "trap" infra.

>From [1] I understand that it is a trap and the action can be switched,
but when it is 'drop', the hardware can provide statistics about number
of packets that were discarded in hardware. If this is correct, then the
suggestion in [2] looks valid to me.

[1] 
https://lore.kernel.org/netdev/am0p190mb073828252ffda3215387765ce4...@am0p190mb0738.eurp190.prod.outlook.com/
[2] https://lore.kernel.org/netdev/20210123160348.gb2799...@shredder.lan/

Re: [PATCH net-next] net: core: devlink: add new trap action HARD_DROP

2021-01-24 Thread Ido Schimmel

On Sat, Jan 23, 2021 at 12:14:39PM -0800, Jakub Kicinski wrote:
> On Sat, 23 Jan 2021 18:03:48 +0200 Ido Schimmel wrote:
> > [DEVLINK_ATTR_STATS_RX_DROPPED]
> 
> nit: maybe discarded? dropped sounds like may have been due to an
> overflow or something

Well, it's an existing attribute which is why I suggested using it.

Re: [PATCH net-next] net: core: devlink: add new trap action HARD_DROP

2021-01-23 Thread Ido Schimmel

On Fri, Jan 22, 2021 at 08:36:01AM +, Oleksandr Mazur wrote:
> On Thu, 21 Jan 2021 14:21:52 +0200 Ido Schimmel wrote:
> > On Thu, Jan 21, 2021 at 01:29:37PM +0200, Oleksandr Mazur wrote:
> > > Add new trap action HARD_DROP, which can be used by the
> > > drivers to register traps, where it's impossible to get
> > > packet reported to the devlink subsystem by the device
> > > driver, because it's impossible to retrieve dropped packet
> > > from the device itself.
> > > In order to use this action, driver must also register
> > > additional devlink operation - callback that is used
> > > to retrieve number of packets that have been dropped by
> > > the device.  
> > 
> > Are these global statistics about number of packets the hardware dropped
> > for a specific reason or are these per-port statistics?
> > 
> > It's a creative use of devlink-trap interface, but I think it makes
> > sense. Better to re-use an existing interface than creating yet another
> > one.
> 
> > Not sure if I agree, if we can't trap why is it a trap?
> > It's just a counter.
> 
> It's just another ACTION for trap item. Action however can be switched, e.g. 
> from HARD_DROP to MIRROR.
> 
> The thing is to be able to configure specific trap to be dropped, and provide 
> a way for the device to report back how many packets have been dropped.
> If device is able to report the packet itself, then devlink would be in 
> charge of counting. If not, there should be a way to retrieve these 
> statistics from the devlink.

So no need for another action. Just report these stats via
'DEVLINK_ATTR_STATS_RX_DROPPED' if the hardware supports it.

Currently you do:

+static int
+devlink_trap_hard_drop_stats_put(struct sk_buff *msg,
+struct devlink *devlink,
+const struct devlink_trap_item *trap_item)
+{
+   struct nlattr *attr;
+   u64 drops;
+   int err;
+
+   err = devlink->ops->trap_hard_drop_counter_get(devlink, trap_item->trap,
+  &drops);
+   if (err)
+   return err;
+
+   attr = nla_nest_start(msg, DEVLINK_ATTR_STATS);
+   if (!attr)
+   return -EMSGSIZE;
+
+   if (nla_put_u64_64bit(msg, DEVLINK_ATTR_STATS_RX_DROPPED, drops,
+ DEVLINK_ATTR_PAD))
+   goto nla_put_failure;
+
+   nla_nest_end(msg, attr);
+
+   return 0;
+
+nla_put_failure:
+   nla_nest_cancel(msg, attr);
+   return -EMSGSIZE;
+}
+
 static int devlink_nl_trap_fill(struct sk_buff *msg, struct devlink *devlink,
const struct devlink_trap_item *trap_item,
enum devlink_command cmd, u32 portid, u32 seq,
@@ -6857,7 +6889,10 @@ static int devlink_nl_trap_fill(struct sk_buff *msg, 
struct devlink *devlink,
if (err)
goto nla_put_failure;
 
-   err = devlink_trap_stats_put(msg, trap_item->stats);
+   if (trap_item->action == DEVLINK_TRAP_ACTION_HARD_DROP)
+   err = devlink_trap_hard_drop_stats_put(msg, devlink, trap_item);
+   else
+   err = devlink_trap_stats_put(msg, trap_item->stats);
if (err)
goto nla_put_failure;

Which means that user space will see stats come and go based on the
trap's action. That's not desirable. Instead, change:

[DEVLINK_ATTR_STATS]
[DEVLINK_ATTR_STATS_RX_PACKETS]
[DEVLINK_ATTR_STATS_RX_BYTES]

To:

[DEVLINK_ATTR_STATS]
[DEVLINK_ATTR_STATS_RX_PACKETS]
[DEVLINK_ATTR_STATS_RX_BYTES]
[DEVLINK_ATTR_STATS_RX_DROPPED]

Where the last attribute is reported to user space for devices that
support such stats. No changes required in uAPI / iproute2.

Re: [PATCH iproute2-next] devlink: add support for HARD_DROP trap action

2021-01-21 Thread Ido Schimmel

On Thu, Jan 21, 2021 at 01:29:54PM +0200, Oleksandr Mazur wrote:
> Add support for new HARD_DROP action, which is used for
> trap hard drop statistics retrival. It's used whenever
> device is unable to report trapped packet to the devlink
> subsystem, and thus device could only state how many
> packets have been dropped, without passing a packet
> to the devlink subsystem to get track of traffic statistics.

This patch should also be marked as "RFC".

> 
> Signed-off-by: Oleksandr Mazur 
> ---
>  devlink/devlink.c| 4 
>  include/uapi/linux/devlink.h | 4 

Missing man page and bash completion extensions:

man/man8/devlink-trap.8
bash-completion/devlink

>  2 files changed, 8 insertions(+)
> 
> diff --git a/devlink/devlink.c b/devlink/devlink.c
> index a2e06644..77185f7c 100644
> --- a/devlink/devlink.c
> +++ b/devlink/devlink.c
> @@ -1335,6 +1335,8 @@ static int trap_action_get(const char *actionstr,
>  {
>   if (strcmp(actionstr, "drop") == 0) {
>   *p_action = DEVLINK_TRAP_ACTION_DROP;
> + } else if (strcmp(actionstr, "hard_drop") == 0) {
> + *p_action = DEVLINK_TRAP_ACTION_HARD_DROP;
>   } else if (strcmp(actionstr, "trap") == 0) {
>   *p_action = DEVLINK_TRAP_ACTION_TRAP;
>   } else if (strcmp(actionstr, "mirror") == 0) {
> @@ -7726,6 +7728,8 @@ static const char *trap_action_name(uint8_t action)
>   switch (action) {
>   case DEVLINK_TRAP_ACTION_DROP:
>   return "drop";
> + case DEVLINK_TRAP_ACTION_HARD_DROP:
> + return "hard_drop";
>   case DEVLINK_TRAP_ACTION_TRAP:
>   return "trap";
>   case DEVLINK_TRAP_ACTION_MIRROR:
> diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
> index 1414acee..ecee2541 100644
> --- a/include/uapi/linux/devlink.h
> +++ b/include/uapi/linux/devlink.h
> @@ -261,12 +261,16 @@ enum {
>   * enum devlink_trap_action - Packet trap action.
>   * @DEVLINK_TRAP_ACTION_DROP: Packet is dropped by the device and a copy is 
> not
>   *sent to the CPU.
> + * @DEVLINK_TRAP_ACTION_HARD_DROP: Packet was dropped by the underlying 
> device,
> + * and device cannot report packet to devlink
> + * (or inject it into the kernel RX path).
>   * @DEVLINK_TRAP_ACTION_TRAP: The sole copy of the packet is sent to the CPU.
>   * @DEVLINK_TRAP_ACTION_MIRROR: Packet is forwarded by the device and a copy 
> is
>   *  sent to the CPU.
>   */
>  enum devlink_trap_action {
>   DEVLINK_TRAP_ACTION_DROP,
> + DEVLINK_TRAP_ACTION_HARD_DROP,
>   DEVLINK_TRAP_ACTION_TRAP,
>   DEVLINK_TRAP_ACTION_MIRROR,
>  };
> -- 
> 2.17.1
>

Re: [PATCH net-next] net: core: devlink: add new trap action HARD_DROP

2021-01-21 Thread Ido Schimmel

On Thu, Jan 21, 2021 at 01:29:37PM +0200, Oleksandr Mazur wrote:
> Add new trap action HARD_DROP, which can be used by the
> drivers to register traps, where it's impossible to get
> packet reported to the devlink subsystem by the device
> driver, because it's impossible to retrieve dropped packet
> from the device itself.
> In order to use this action, driver must also register
> additional devlink operation - callback that is used
> to retrieve number of packets that have been dropped by
> the device.

Are these global statistics about number of packets the hardware dropped
for a specific reason or are these per-port statistics?

It's a creative use of devlink-trap interface, but I think it makes
sense. Better to re-use an existing interface than creating yet another
one.

Anyway, this patch really needs to be marked as "RFC" since we cannot
add infrastructure without anyone using it.

Additionally, the documentation
(Documentation/networking/devlink/devlink-trap.rst) needs to be updated,
netdevsim needs to be patched and the test over netdevsim
(tools/testing/selftests/drivers/net/netdevsim/devlink_trap.sh) needs to
be extended to cover the new functionality.

More comments below.

> 
> Signed-off-by: Oleksandr Mazur 
> ---
>  include/net/devlink.h| 10 
>  include/uapi/linux/devlink.h |  4 
>  net/core/devlink.c   | 44 +++-
>  3 files changed, 57 insertions(+), 1 deletion(-)
> 
> diff --git a/include/net/devlink.h b/include/net/devlink.h
> index f466819cc477..6811a614f6fd 100644
> --- a/include/net/devlink.h
> +++ b/include/net/devlink.h
> @@ -1294,6 +1294,16 @@ struct devlink_ops {
>const struct devlink_trap_group *group,
>enum devlink_trap_action action,
>struct netlink_ext_ack *extack);
> + /**
> +  * @trap_hard_drop_counter_get: Trap hard drop counter get function.
> +  *
> +  * Should be used by device drivers to report number of packets dropped
> +  * by the underlying device, that have been dropped because device
> +  * failed to pass the trapped packet.
> +  */
> + int (*trap_hard_drop_counter_get)(struct devlink *devlink,
> +   const struct devlink_trap *trap,
> +   u64 *p_drops);
>   /**
>* @trap_policer_init: Trap policer initialization function.
>*
> diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
> index cf89c318f2ac..9247d9c7db03 100644
> --- a/include/uapi/linux/devlink.h
> +++ b/include/uapi/linux/devlink.h
> @@ -261,12 +261,16 @@ enum {
>   * enum devlink_trap_action - Packet trap action.
>   * @DEVLINK_TRAP_ACTION_DROP: Packet is dropped by the device and a copy is 
> not
>   *sent to the CPU.
> + * @DEVLINK_TRAP_ACTION_HARD_DROP: Packet was dropped by the underlying 
> device,
> + * and device cannot report packet to devlink
> + * (or inject it into the kernel RX path).
>   * @DEVLINK_TRAP_ACTION_TRAP: The sole copy of the packet is sent to the CPU.
>   * @DEVLINK_TRAP_ACTION_MIRROR: Packet is forwarded by the device and a copy 
> is
>   *  sent to the CPU.
>   */
>  enum devlink_trap_action {
>   DEVLINK_TRAP_ACTION_DROP,
> + DEVLINK_TRAP_ACTION_HARD_DROP,

This breaks uAPI. New values should be added at the end.

>   DEVLINK_TRAP_ACTION_TRAP,
>   DEVLINK_TRAP_ACTION_MIRROR,
>  };
> diff --git a/net/core/devlink.c b/net/core/devlink.c
> index ee828e4b1007..5a06e00429e1 100644
> --- a/net/core/devlink.c
> +++ b/net/core/devlink.c
> @@ -6732,6 +6732,7 @@ devlink_trap_action_get_from_info(struct genl_info 
> *info,
>   val = nla_get_u8(info->attrs[DEVLINK_ATTR_TRAP_ACTION]);
>   switch (val) {
>   case DEVLINK_TRAP_ACTION_DROP:
> + case DEVLINK_TRAP_ACTION_HARD_DROP:
>   case DEVLINK_TRAP_ACTION_TRAP:
>   case DEVLINK_TRAP_ACTION_MIRROR:
>   *p_trap_action = val;
> @@ -6820,6 +6821,37 @@ static int devlink_trap_stats_put(struct sk_buff *msg,
>   return -EMSGSIZE;
>  }
>  
> +static int
> +devlink_trap_hard_drop_stats_put(struct sk_buff *msg,
> +  struct devlink *devlink,
> +  const struct devlink_trap_item *trap_item)
> +{
> + struct nlattr *attr;
> + u64 drops;
> + int err;
> +
> + err = devlink->ops->trap_hard_drop_counter_get(devlink, trap_item->trap,
> +&drops);
> + if (err)
> + return err;
> +
> + attr = nla_nest_start(msg, DEVLINK_ATTR_STATS);
> + if (!attr)
> + return -EMSGSIZE;
> +
> + if (nla_put_u64_64bit(msg, DEVLINK_ATTR_STATS_RX_DROPPED, drops,
> +   DEVLINK_ATTR_PAD))
> + goto nla_put_failure;

Re: [PATCH] mlxsw: pci: switch from 'pci_' to 'dma_' API

2021-01-14 Thread Ido Schimmel

On Thu, Jan 14, 2021 at 09:47:57AM +0100, Christophe JAILLET wrote:
> The wrappers in include/linux/pci-dma-compat.h should go away.
> 
> The patch has been generated with the coccinelle script below and has been
> hand modified to replace GFP_ with a correct flag.
> It has been compile tested.
> 
> When memory is allocated in 'mlxsw_pci_queue_init()' and
> 'mlxsw_pci_fw_area_init()' GFP_KERNEL can be used because both functions
> are already using this flag and no lock is acquired.
> 
> When memory is allocated in 'mlxsw_pci_mbox_alloc()' GFP_KERNEL can be used
> because it is only called from the probe function and no lock is acquired
> in the between.
> The call chain is:
>   --> mlxsw_pci_probe()
> --> mlxsw_pci_cmd_init()
>   --> mlxsw_pci_mbox_alloc()
> 
> While at it, also replace the 'dma_set_mask/dma_set_coherent_mask' sequence
> by a less verbose 'dma_set_mask_and_coherent() call.

[...]

> 
> Signed-off-by: Christophe JAILLET 

For net-next:

Tested-by: Ido Schimmel 

Thanks

Re: [PATCH net-next] net/mlxfw: Use kzalloc for allocating only one thing

2020-12-30 Thread Ido Schimmel

On Wed, Dec 30, 2020 at 04:18:35PM +0800, Zheng Yongjun wrote:
> Use kzalloc rather than kcalloc(1,...)
> 
> The semantic patch that makes this change is as follows:
> (http://coccinelle.lip6.fr/)
> 
> // 
> @@
> @@
> 
> - kcalloc(1,
> + kzalloc(
>   ...)
> // 
> 
> Signed-off-by: Zheng Yongjun 

Reviewed-by: Ido Schimmel 

Thanks

Re: [PATCH v3 net-next 1/7] net: bridge: notify switchdev of disappearance of old FDB entry upon migration

2020-12-13 Thread Ido Schimmel

ll
> addresses which are sent to the control interface, and then also listen
> for bridge notifier events on its own ports, searching for the ones that
> have a MAC address which was previously sent to the control interface.
> But this is cumbersome and inefficient. Instead, with one small change,
> the bridge could notify of the address deletion from the old port, in a
> symmetrical manner with how it did for the insertion. Then the switchdev
> driver would not be required to monitor learn/forget events for its own
> ports. It could just delete the rule towards the control interface upon
> bridge entry migration. This would make hardware address learning be
> possible again. Then it would take a few more packets until the hardware
> and software FDB would be in sync again.
> 
> Signed-off-by: Vladimir Oltean 
> Acked-by: Nikolay Aleksandrov 

Reviewed-by: Ido Schimmel

Re: [PATCH v5 2/3] net: add kcov handle to skb extensions

2020-11-30 Thread Ido Schimmel

On Mon, Nov 30, 2020 at 05:52:48PM -0800, Jakub Kicinski wrote:
> On Sat, 21 Nov 2020 18:09:41 +0200 Ido Schimmel wrote:
> > + Florian
> > 
> > On Thu, Oct 29, 2020 at 05:36:19PM +, Aleksandr Nogikh wrote:
> > > From: Aleksandr Nogikh 
> > > 
> > > Remote KCOV coverage collection enables coverage-guided fuzzing of the
> > > code that is not reachable during normal system call execution. It is
> > > especially helpful for fuzzing networking subsystems, where it is
> > > common to perform packet handling in separate work queues even for the
> > > packets that originated directly from the user space.
> > > 
> > > Enable coverage-guided frame injection by adding kcov remote handle to
> > > skb extensions. Default initialization in __alloc_skb and
> > > __build_skb_around ensures that no socket buffer that was generated
> > > during a system call will be missed.
> > > 
> > > Code that is of interest and that performs packet processing should be
> > > annotated with kcov_remote_start()/kcov_remote_stop().
> > > 
> > > An alternative approach is to determine kcov_handle solely on the
> > > basis of the device/interface that received the specific socket
> > > buffer. However, in this case it would be impossible to distinguish
> > > between packets that originated during normal background network
> > > processes or were intentionally injected from the user space.
> > > 
> > > Signed-off-by: Aleksandr Nogikh 
> > > Acked-by: Willem de Bruijn   
> > 
> > [...]
> > 
> > > @@ -249,6 +249,9 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t 
> > > gfp_mask,
> > >  
> > >   fclones->skb2.fclone = SKB_FCLONE_CLONE;
> > >   }
> > > +
> > > + skb_set_kcov_handle(skb, kcov_common_handle());  
> > 
> > Hi,
> > 
> > This causes skb extensions to be allocated for the allocated skb, but
> > there are instances that blindly overwrite 'skb->extensions' by invoking
> > skb_copy_header() after __alloc_skb(). For example, skb_copy(),
> > __pskb_copy_fclone() and skb_copy_expand(). This results in the skb
> > extensions being leaked [1].
> > 
> > One possible solution is to try to patch all these instances with
> > skb_ext_put() before skb_copy_header().
> > 
> > Another possible solution is to convert skb_copy_header() to use
> > skb_ext_copy() instead of __skb_ext_copy(). It will first drop the
> > reference on the skb extensions of the new skb, but it assumes that
> > 'skb->active_extensions' is valid. This is not the case in the
> > skb_clone() path so we should probably zero this field in __skb_clone().
> > 
> > Other suggestions?
> 
> Looking at the patch from Marco to move back to a field now I'm
> wondering how you run into this, Ido :D
> 
> AFAIU the extension is only added if process as a KCOV handle.
> 
> Are you using KCOV?

Hi Jakub,

Yes. We have an internal syzkaller instance where this is enabled. See
"syz-executor.0" in the trace below.

> 
> > [1]
> > BUG: memory leak
> > unreferenced object 0x888027f9a490 (size 16):
> >   comm "syz-executor.0", pid 1155, jiffies 4295996826 (age 66.927s)
> >   hex dump (first 16 bytes):
> > 01 00 00 00 01 02 6b 6b 01 00 00 00 00 00 00 00  ..kk
> >   backtrace:
> > [<05a5f2c4>] kmemleak_alloc_recursive 
> > include/linux/kmemleak.h:43 [inline]
> > [<05a5f2c4>] slab_post_alloc_hook mm/slab.h:528 [inline]
> > [<05a5f2c4>] slab_alloc_node mm/slub.c:2891 [inline]
> > [<05a5f2c4>] slab_alloc mm/slub.c:2899 [inline]
> > [<05a5f2c4>] kmem_cache_alloc+0x173/0x800 mm/slub.c:2904
> > [<c5e43ea9>] __skb_ext_alloc+0x22/0x90 net/core/skbuff.c:6173
> > [<0de35e81>] skb_ext_add+0x230/0x4a0 net/core/skbuff.c:6268
> > [<3b7efba4>] skb_set_kcov_handle include/linux/skbuff.h:4622 
> > [inline]
> > [<3b7efba4>] skb_set_kcov_handle include/linux/skbuff.h:4612 
> > [inline]
> > [<3b7efba4>] __alloc_skb+0x47f/0x6a0 net/core/skbuff.c:253
> > [<7f789b23>] skb_copy+0x151/0x310 net/core/skbuff.c:1512
> > [<1ce26864>] mlxsw_emad_transmit+0x4e/0x620 
> > drivers/net/ethernet/mellanox/mlxsw/core.c:585
> > [<5c732123>] mlxsw_emad_reg_access 
> > drivers/net/ethernet/mellanox/mlxsw/core.c:829 [inli

Re: [PATCH net-next 2/3] mlxsw: spectrum_ptp: use PTP wide message type definitions

2020-11-22 Thread Ido Schimmel

On Sun, Nov 22, 2020 at 04:35:58PM +0200, Ido Schimmel wrote:
> Anyway, I added all six patches to our regression as we have some PTP
> tests. Will let you know tomorrow.

Looks good

Re: [PATCH net-next 2/3] mlxsw: spectrum_ptp: use PTP wide message type definitions

2020-11-22 Thread Ido Schimmel

On Sun, Nov 22, 2020 at 08:29:22PM +0100, Christian Eggers wrote:
> On Sunday, 22 November 2020, 15:35:55 CET, Ido Schimmel wrote:
> > On Sun, Nov 22, 2020 at 09:26:35AM +0100, Christian Eggers wrote:
> > > Use recently introduced PTP wide defines instead of a driver internal
> > > enumeration.
> > >
> > > Signed-off-by: Christian Eggers 
> > > Cc: Petr Machata 
> > > Cc: Jiri Pirko 
> > > Cc: Ido Schimmel 
> >
> > Reviewed-by: Ido Schimmel 
> >
> > But:
> >
> > 1. Checkpatch complains about:
> > WARNING: From:/Signed-off-by: email address mismatch: 'From: Christian
> > Eggers ' != 'Signed-off-by: Christian Eggers
> > '
> unfortunately I changed this after running checkpatch. My intention was to
> separate my (private) weekend work from the patches I do while I'm on the job.

No problem. Just make sure that authorship and Signed-off-by agree. You
can use:

# git commit --amend --author="Christian Eggers "

> 
> > 2. This series does not build, which fails the CI [1][2] and also
> > required me to fetch the dependencies that are currently under review
> > [3]. I believe it is generally discouraged to create dependencies
> > between patch sets that are under review for exactly these reasons.
> this was also not by intention. Vladimir found some files I missed in the
> first series. As the whole first series had already been reviewed at that 
> time,
> I wasn't sure whether I am allowed to add further patches to it. Additionally
> I didn't concern that although my local build is successful, I should wait
> until the first series is applied...

Yea, I saw that, no problem :)

> 
> > I don't know what are Jakub's preferences, but had this happened on our
> > internal patchwork instance, I would just ask the author to submit
> > another version with all the patches.
> Please let me know how I shall proceed...

Jakub has the final say, so I assume he will comment on that.

Regardless, thanks for the patches.

Re: [PATCH net-next 2/3] mlxsw: spectrum_ptp: use PTP wide message type definitions

2020-11-22 Thread Ido Schimmel

On Sun, Nov 22, 2020 at 09:26:35AM +0100, Christian Eggers wrote:
> Use recently introduced PTP wide defines instead of a driver internal
> enumeration.
> 
> Signed-off-by: Christian Eggers 
> Cc: Petr Machata 
> Cc: Jiri Pirko 
> Cc: Ido Schimmel 

Reviewed-by: Ido Schimmel 

But:

1. Checkpatch complains about:
WARNING: From:/Signed-off-by: email address mismatch: 'From: Christian Eggers 
' != 'Signed-off-by: Christian Eggers '

2. This series does not build, which fails the CI [1][2] and also
required me to fetch the dependencies that are currently under review
[3]. I believe it is generally discouraged to create dependencies
between patch sets that are under review for exactly these reasons. I
don't know what are Jakub's preferences, but had this happened on our
internal patchwork instance, I would just ask the author to submit
another version with all the patches.

Anyway, I added all six patches to our regression as we have some PTP
tests. Will let you know tomorrow.

Thanks

[1] 
https://lore.kernel.org/netdev/20201122082636.12451-1-cegg...@arri.de/T/#mcef35858585d23b72b8f75450a51618d5c5d3260
[2] 
https://patchwork.hopto.org/static/nipa/389053/11923809/build_allmodconfig_warn/summary
[3] 
https://patchwork.kernel.org/project/netdevbpf/cover/20201120084106.10046-1-cegg...@arri.de/

Re: [PATCH v5 2/3] net: add kcov handle to skb extensions

2020-11-21 Thread Ido Schimmel

+ Florian

On Thu, Oct 29, 2020 at 05:36:19PM +, Aleksandr Nogikh wrote:
> From: Aleksandr Nogikh 
> 
> Remote KCOV coverage collection enables coverage-guided fuzzing of the
> code that is not reachable during normal system call execution. It is
> especially helpful for fuzzing networking subsystems, where it is
> common to perform packet handling in separate work queues even for the
> packets that originated directly from the user space.
> 
> Enable coverage-guided frame injection by adding kcov remote handle to
> skb extensions. Default initialization in __alloc_skb and
> __build_skb_around ensures that no socket buffer that was generated
> during a system call will be missed.
> 
> Code that is of interest and that performs packet processing should be
> annotated with kcov_remote_start()/kcov_remote_stop().
> 
> An alternative approach is to determine kcov_handle solely on the
> basis of the device/interface that received the specific socket
> buffer. However, in this case it would be impossible to distinguish
> between packets that originated during normal background network
> processes or were intentionally injected from the user space.
> 
> Signed-off-by: Aleksandr Nogikh 
> Acked-by: Willem de Bruijn 

[...]

> @@ -249,6 +249,9 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t 
> gfp_mask,
>  
>   fclones->skb2.fclone = SKB_FCLONE_CLONE;
>   }
> +
> + skb_set_kcov_handle(skb, kcov_common_handle());

Hi,

This causes skb extensions to be allocated for the allocated skb, but
there are instances that blindly overwrite 'skb->extensions' by invoking
skb_copy_header() after __alloc_skb(). For example, skb_copy(),
__pskb_copy_fclone() and skb_copy_expand(). This results in the skb
extensions being leaked [1].

One possible solution is to try to patch all these instances with
skb_ext_put() before skb_copy_header().

Another possible solution is to convert skb_copy_header() to use
skb_ext_copy() instead of __skb_ext_copy(). It will first drop the
reference on the skb extensions of the new skb, but it assumes that
'skb->active_extensions' is valid. This is not the case in the
skb_clone() path so we should probably zero this field in __skb_clone().

Other suggestions?

Thanks

[1]
BUG: memory leak
unreferenced object 0x888027f9a490 (size 16):
  comm "syz-executor.0", pid 1155, jiffies 4295996826 (age 66.927s)
  hex dump (first 16 bytes):
01 00 00 00 01 02 6b 6b 01 00 00 00 00 00 00 00  ..kk
  backtrace:
[<05a5f2c4>] kmemleak_alloc_recursive include/linux/kmemleak.h:43 
[inline]
[<05a5f2c4>] slab_post_alloc_hook mm/slab.h:528 [inline]
[<05a5f2c4>] slab_alloc_node mm/slub.c:2891 [inline]
[<05a5f2c4>] slab_alloc mm/slub.c:2899 [inline]
[<05a5f2c4>] kmem_cache_alloc+0x173/0x800 mm/slub.c:2904
[] __skb_ext_alloc+0x22/0x90 net/core/skbuff.c:6173
[<0de35e81>] skb_ext_add+0x230/0x4a0 net/core/skbuff.c:6268
[<3b7efba4>] skb_set_kcov_handle include/linux/skbuff.h:4622 
[inline]
[<3b7efba4>] skb_set_kcov_handle include/linux/skbuff.h:4612 
[inline]
[<3b7efba4>] __alloc_skb+0x47f/0x6a0 net/core/skbuff.c:253
[<7f789b23>] skb_copy+0x151/0x310 net/core/skbuff.c:1512
[<1ce26864>] mlxsw_emad_transmit+0x4e/0x620 
drivers/net/ethernet/mellanox/mlxsw/core.c:585
[<5c732123>] mlxsw_emad_reg_access 
drivers/net/ethernet/mellanox/mlxsw/core.c:829 [inline]
[<5c732123>] mlxsw_core_reg_access_emad+0xda8/0x1770 
drivers/net/ethernet/mellanox/mlxsw/core.c:2408
[] mlxsw_core_reg_access+0x101/0x7f0 
drivers/net/ethernet/mellanox/mlxsw/core.c:2583
[<7c47f30f>] mlxsw_reg_write+0x30/0x40 
drivers/net/ethernet/mellanox/mlxsw/core.c:2603
[<675e3fc7>] mlxsw_sp_port_admin_status_set+0x8a7/0x980 
drivers/net/ethernet/mellanox/mlxsw/spectrum.c:300
[] mlxsw_sp_port_stop+0x63/0x70 
drivers/net/ethernet/mellanox/mlxsw/spectrum.c:537
[] __dev_close_many+0x1c7/0x300 net/core/dev.c:1607
[<628c5987>] __dev_close net/core/dev.c:1619 [inline]
[<628c5987>] __dev_change_flags+0x2b9/0x710 net/core/dev.c:8421
[<8cc810c6>] dev_change_flags+0x97/0x170 net/core/dev.c:8494
[<53274a78>] do_setlink+0xa5b/0x3b80 net/core/rtnetlink.c:2706
[] rtnl_group_changelink net/core/rtnetlink.c:3225 
[inline]
[] __rtnl_newlink+0xe06/0x17d0 net/core/rtnetlink.c:3379

Re: [PATCH -next] mlxsw: spectrum_acl_tcam: simplify the return expression of ishtp_cl_driver_register()

2020-09-21 Thread Ido Schimmel

On Mon, Sep 21, 2020 at 09:10:39PM +0800, Qinglang Miao wrote:
> Simplify the return expression.
> 
> Signed-off-by: Qinglang Miao 
> ---
>  drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c | 8 +---
>  1 file changed, 1 insertion(+), 7 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c 
> b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c
> index 5c0204033..5b4313991 100644
> --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c
> +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c
> @@ -289,17 +289,11 @@ static int
>  mlxsw_sp_acl_tcam_group_add(struct mlxsw_sp_acl_tcam *tcam,
>   struct mlxsw_sp_acl_tcam_group *group)
>  {
> - int err;
> -
>   group->tcam = tcam;
>   mutex_init(&group->lock);
>   INIT_LIST_HEAD(&group->region_list);
>  
> - err = mlxsw_sp_acl_tcam_group_id_get(tcam, &group->id);
> - if (err)
> - return err;
> -
> - return 0;
> + return mlxsw_sp_acl_tcam_group_id_get(tcam, &group->id);

There is actually a problem here. We don't call mutex_destroy() on
error. Should be:

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c 
b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c
index 5c020403342f..741dd69c 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c
@@ -292,13 +292,14 @@ mlxsw_sp_acl_tcam_group_add(struct mlxsw_sp_acl_tcam 
*tcam,
int err;
 
group->tcam = tcam;
-   mutex_init(&group->lock);
INIT_LIST_HEAD(&group->region_list);
 
err = mlxsw_sp_acl_tcam_group_id_get(tcam, &group->id);
if (err)
return err;
 
+   mutex_init(&group->lock);
+
return 0;
 }

Then it's symmetric with mlxsw_sp_acl_tcam_group_del(). Do you want to
send this patch to 'net' or should I? If so, it should have this Fixes
line:

Fixes: 5ec2ee28d27b ("mlxsw: spectrum_acl: Introduce a mutex to guard region 
list updates")

Thanks

>  }
>  
>  static void mlxsw_sp_acl_tcam_group_del(struct mlxsw_sp_acl_tcam_group 
> *group)
> -- 
> 2.23.0
>

Re: [PATCH -next] mlxsw: spectrum_router: simplify the return expression of __mlxsw_sp_router_init()

2020-09-21 Thread Ido Schimmel

On Mon, Sep 21, 2020 at 09:10:41PM +0800, Qinglang Miao wrote:
> Simplify the return expression.
> 
> Signed-off-by: Qinglang Miao 

Reviewed-by: Ido Schimmel

Re: [PATCH net-next RFC v4 04/15] devlink: Add reload actions stats to dev get

2020-09-14 Thread Ido Schimmel

On Mon, Sep 14, 2020 at 03:45:00PM +0200, Jiri Pirko wrote:
> Mon, Sep 14, 2020 at 08:07:51AM CEST, mo...@mellanox.com wrote:
> >Expose devlink reload actions stats to the user through devlink dev
> >get command.
> >
> >Examples:
> >$ devlink dev show
> >pci/:82:00.0:
> >  reload_action_stats:
> >driver_reinit 2
> >fw_activate 1
> >driver_reinit_no_reset 0
> >fw_activate_no_reset 0
> >pci/:82:00.1:
> >  reload_action_stats:
> >driver_reinit 1
> >fw_activate 1
> >driver_reinit_no_reset 0
> >fw_activate_no_reset 0
> 
> I would rather have something like:
>stats:
>  reload_action:
>driver_reinit 1
>fw_activate 1
>driver_reinit_no_reset 0
>fw_activate_no_reset 0
> 
> Then we can easily extend and add other stats in the tree.
> 
> 
> Also, I wonder if these stats could be somehow merged with Ido's metrics
> work:
> https://github.com/idosch/linux/commits/submit/devlink_metric_rfc_v1
> 
> Ido, would it make sense?

I guess. My original idea for devlink-metric was to expose
design-specific metrics to user space where the entity registering the
metrics is the device driver. In this case the entity would be devlink
itself and it would be auto-registered for each device.

> 
> 
> >
> >$ devlink dev show -jp
> >{
> >"dev": {
> >"pci/:82:00.0": {
> >"reload_action_stats": [ {
> >"driver_reinit": 2
> >},{
> >"fw_activate": 1
> >},{
> >"driver_reinit_no_reset": 0
> >},{
> >"fw_activate_no_reset": 0
> >} ]
> >},
> >"pci/:82:00.1": {
> >"reload_action_stats": [ {
> >"driver_reinit": 1
> >},{
> >"fw_activate": 1
> >},{
> >"driver_reinit_no_reset": 0
> >},{
> >"fw_activate_no_reset": 0
> >} ]
> >}
> >}
> >}
> >
> 
> [..]

Re: [PATCH net-next RFC v1 3/4] devlink: Add hierarchy between traps in device level and port level

2020-09-06 Thread Ido Schimmel

On Wed, Sep 02, 2020 at 06:32:13PM +0300, Aya Levin wrote:
> Managing large scale port's traps may be complicated. This patch
> introduces a shortcut: when setting a trap on a device and this trap is
> not registered on this device, the action will take place on all related
> ports that did register this trap.

I'm not really a fan of this and I'm not sure there is precedent for
something similar. Also, it's an optimization, so I wouldn't put it as
part of the first submission before you gather some operational
experience with the initial interface.

In addition, I find it very unintuitive for users. When I do 'devlink
trap show' I will not see anything. I will only see the traps when I
issue 'devlink port trap show', yet 'devlink trap set ...' is expected
to work.

Lets assume that this is a valid change, it would be better implemented
with my suggestion from the previous patch: When devlink sees that a
trap is registered on all the ports it can auto-register a new
per-device trap and user space gets the appropriate notification.

> 
> Signed-off-by: Aya Levin 
> ---
>  net/core/devlink.c | 43 +--
>  1 file changed, 33 insertions(+), 10 deletions(-)
> 
> diff --git a/net/core/devlink.c b/net/core/devlink.c
> index b13e1b40bf1c..dea5482b2517 100644
> --- a/net/core/devlink.c
> +++ b/net/core/devlink.c
> @@ -6501,23 +6501,46 @@ static int devlink_nl_cmd_trap_set_doit(struct 
> sk_buff *skb,
>   struct devlink *devlink = info->user_ptr[0];
>   struct devlink_trap_mngr *trap_mngr;
>   struct devlink_trap_item *trap_item;
> + struct devlink_port *devlink_port;
>   int err;
>  
> - trap_mngr = devlink_trap_get_trap_mngr_from_info(devlink, info);
> - if (list_empty(&trap_mngr->trap_list))
> - return -EOPNOTSUPP;
> + devlink_port = devlink_port_get_from_attrs(devlink, info->attrs);
> + if (IS_ERR(devlink_port)) {
> + trap_mngr =  &devlink->trap_mngr;
> + if (list_empty(&trap_mngr->trap_list))
> + goto loop_over_ports;
>  
> - trap_item = devlink_trap_item_get_from_info(trap_mngr, info);
> - if (!trap_item) {
> - NL_SET_ERR_MSG_MOD(extack, "Device did not register this trap");
> - return -ENOENT;
> + trap_item = devlink_trap_item_get_from_info(trap_mngr, info);
> + if (!trap_item)
> + goto loop_over_ports;
> + } else {
> + trap_mngr = &devlink_port->trap_mngr;
> + if (list_empty(&trap_mngr->trap_list))
> + return -EOPNOTSUPP;
> +
> + trap_item = devlink_trap_item_get_from_info(trap_mngr, info);
> + if (!trap_item) {
> + NL_SET_ERR_MSG_MOD(extack, "Port did not register this 
> trap");
> + return -ENOENT;
> + }
>   }
>   return devlink_trap_action_set(devlink, trap_mngr, trap_item, info);
>  
> - err = devlink_trap_action_set(devlink, trap_mngr, trap_item, info);
> - if (err)
> - return err;
> +loop_over_ports:
> + if (list_empty(&devlink->port_list))
> + return -EOPNOTSUPP;
> + list_for_each_entry(devlink_port, &devlink->port_list, list) {
> + trap_mngr = &devlink_port->trap_mngr;
> + if (list_empty(&trap_mngr->trap_list))
> + continue;
>  
> + trap_item = devlink_trap_item_get_from_info(trap_mngr, info);
> + if (!trap_item)
> + continue;
> + err = devlink_trap_action_set(devlink, trap_mngr, trap_item, 
> info);
> + if (err)
> + return err;
> + }
>   return 0;
>  }
>  
> -- 
> 2.14.1
>

Re: [PATCH net-next RFC v1 2/4] devlink: Add devlink traps under devlink_ports context

2020-09-06 Thread Ido Schimmel

On Wed, Sep 02, 2020 at 06:32:12PM +0300, Aya Levin wrote:
> There are some cases where we would like to trap dropped packets only
> for a single port on a device without affecting the others. For that
> purpose trap_mngr was added to devlink_port and corresponding Trap API
> with devlink_port were added too.
> 
> Signed-off-by: Aya Levin 
> ---
>  drivers/net/ethernet/mellanox/mlxsw/core.c |   1 +
>  include/net/devlink.h  |  25 +++
>  net/core/devlink.c | 332 
> -
>  3 files changed, 353 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c 
> b/drivers/net/ethernet/mellanox/mlxsw/core.c
> index 97460f47e537..cb9567a6a90d 100644
> --- a/drivers/net/ethernet/mellanox/mlxsw/core.c
> +++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
> @@ -1178,6 +1178,7 @@ static void mlxsw_devlink_trap_fini(struct devlink 
> *devlink,
>  static int mlxsw_devlink_trap_action_set(struct devlink *devlink,
>const struct devlink_trap *trap,
>enum devlink_trap_action action,
> +  void *trap_ctx,

This is an unrelated change.

>struct netlink_ext_ack *extack)
>  {
>   struct mlxsw_core *mlxsw_core = devlink_priv(devlink);
> diff --git a/include/net/devlink.h b/include/net/devlink.h
> index d387ea5518c3..b4897ee38209 100644
> --- a/include/net/devlink.h
> +++ b/include/net/devlink.h
> @@ -110,6 +110,7 @@ struct devlink_port {
>   struct delayed_work type_warn_dw;
>   struct list_head reporter_list;
>   struct mutex reporters_lock; /* Protects reporter_list */
> + struct devlink_trap_mngr trap_mngr;
>  };
>  
>  struct devlink_sb_pool_info {
> @@ -1108,6 +1109,7 @@ struct devlink_trap_ops {
>   int (*trap_action_set)(struct devlink *devlink,
>  const struct devlink_trap *trap,
>  enum devlink_trap_action action,
> +void *trap_ctx,

Same.

>  struct netlink_ext_ack *extack);
>   /**
>* @trap_group_init: Trap group initialization function.
> @@ -1414,6 +1416,29 @@ devlink_trap_policers_unregister(struct devlink 
> *devlink,
>const struct devlink_trap_policer *policers,
>size_t policers_count);
>  
> +void devlink_port_traps_ops(struct devlink_port *devlink_port,
> + const struct devlink_trap_ops *ops);
> +int devlink_port_traps_register(struct devlink_port *devlink_port,
> + const struct devlink_trap *traps,
> + size_t traps_count, void *priv);
> +void devlink_port_traps_unregister(struct devlink_port *devlink_port,
> +const struct devlink_trap *traps,
> +size_t traps_count);
> +void devlink_port_trap_report(struct devlink_port *devlink_port, struct 
> sk_buff *skb,
> +   void *trap_ctx, const struct flow_action_cookie 
> *fa_cookie);
> +int devlink_port_trap_groups_register(struct devlink_port *devlink_port,
> +   const struct devlink_trap_group *groups,
> +   size_t groups_count);
> +void devlink_port_trap_groups_unregister(struct devlink_port *devlink_port,
> +  const struct devlink_trap_group 
> *groups,
> +  size_t groups_count);
> +int devlink_port_trap_policers_register(struct devlink_port *devlink_port,
> + const struct devlink_trap_policer 
> *policers,
> + size_t policers_count);
> +void devlink_port_trap_policers_unregister(struct devlink_port *devlink_port,
> +const struct devlink_trap_policer 
> *policers,
> +size_t policers_count);

No driver is calling the last two functions, so lets not add them.

> +
>  #if IS_ENABLED(CONFIG_NET_DEVLINK)
>  
>  void devlink_compat_running_version(struct net_device *dev,
> diff --git a/net/core/devlink.c b/net/core/devlink.c
> index a30b5444289b..b13e1b40bf1c 100644
> --- a/net/core/devlink.c
> +++ b/net/core/devlink.c
> @@ -6155,7 +6155,13 @@ struct devlink_trap_item {
>  static struct devlink_trap_mngr *
>  devlink_trap_get_trap_mngr_from_info(struct devlink *devlink, struct 
> genl_info *info)
>  {
> - return &devlink->trap_mngr;
> + struct devlink_port *devlink_port;
> +
> + devlink_port = devlink_port_get_from_attrs(devlink, info->attrs);
> + if (IS_ERR(devlink_port))
> + return  &devlink->trap_mngr;
> + else
> + return &devlink_port->trap_mngr;
>  }

I understand how this struct allows you to re-use a lot of code

Re: [PATCH][next] mlxsw: spectrum_cnt: Use flex_array_size() helper in memcpy()

2020-07-30 Thread Ido Schimmel

On Wed, Jul 29, 2020 at 05:58:03PM -0500, Gustavo A. R. Silva wrote:
> Make use of the flex_array_size() helper to calculate the size of a
> flexible array member within an enclosing structure.
> 
> This helper offers defense-in-depth against potential integer
> overflows, while at the same time makes it explicitly clear that
> we are dealing witha flexible array member.
> 
> Also, remove unnecessary pointer identifier sub_pool.
> 
> Signed-off-by: Gustavo A. R. Silva 

Reviewed-by: Ido Schimmel 
Tested-by: Ido Schimmel 

Thanks

Re: [PATCH 19/26] net/ipv6: switch ipv6_flowlabel_opt to sockptr_t

2020-07-27 Thread Ido Schimmel

On Mon, Jul 27, 2020 at 06:15:55PM +0200, Christoph Hellwig wrote:
> I have to admit I didn't spot the difference between the good and the
> bad output even after trying hard..
> 
> But can you try the patch below?
> 
> ---
> From cce2d2e1b43ecee5f4af7cf116808b74b330080f Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig 
> Date: Mon, 27 Jul 2020 17:42:27 +0200
> Subject: net: remove sockptr_advance
> 
> sockptr_advance never properly worked.  Replace it with _offset variants
> of copy_from_sockptr and copy_to_sockptr.
> 
> Fixes: ba423fdaa589 ("net: add a new sockptr_t type")
> Reported-by: Jason A. Donenfeld 
> Reported-by: Ido Schimmel 
> Signed-off-by: Christoph Hellwig 

Tested-by: Ido Schimmel 

Thanks!

Re: [PATCH 19/26] net/ipv6: switch ipv6_flowlabel_opt to sockptr_t

2020-07-27 Thread Ido Schimmel

On Mon, Jul 27, 2020 at 03:00:29PM +0200, Christoph Hellwig wrote:
> On Mon, Jul 27, 2020 at 03:15:05PM +0300, Ido Schimmel wrote:
> > I see a regression with IPv6 flowlabel that I bisected to this patch.
> > When passing '-F 0' to 'ping' the flow label should be random, yet it's
> > the same every time after this patch.
> 
> Can you send a reproducer?

```
#!/bin/bash

ip link add name dummy10 up type dummy

ping -q -F 0 -I dummy10 ff02::1 &> /dev/null &
tcpdump -nne -e -i dummy10 -vvv -c 1 dst host ff02::1
pkill ping

echo

ping -F 0 -I dummy10 ff02::1 &> /dev/null &
tcpdump -nne -e -i dummy10 -vvv -c 1 dst host ff02::1
pkill ping

ip link del dev dummy10
```

Output with commit ff6a4cf214ef ("net/ipv6: split up
ipv6_flowlabel_opt"):

```
dropped privs to tcpdump
tcpdump: listening on dummy10, link-type EN10MB (Ethernet), capture size 262144 
bytes
16:26:27.072559 62:80:34:1d:b4:b8 > 33:33:00:00:00:01, ethertype IPv6 (0x86dd), 
length 118: (flowlabel 0x920cf, hlim 1, next-header ICMPv6 (58) payload length: 
64) fe80::6080:34ff:fe1d:b4b8 > ff02::1: [icmp6 sum ok] ICMP6, echo request, 
seq 2
1 packet captured
1 packet received by filter
0 packets dropped by kernel

dropped privs to tcpdump
tcpdump: listening on dummy10, link-type EN10MB (Ethernet), capture size 262144 
bytes
16:26:28.352528 62:80:34:1d:b4:b8 > 33:33:00:00:00:01, ethertype IPv6 (0x86dd), 
length 118: (flowlabel 0xcdd97, hlim 1, next-header ICMPv6 (58) payload length: 
64) fe80::6080:34ff:fe1d:b4b8 > ff02::1: [icmp6 sum ok] ICMP6, echo request, 
seq 2
1 packet captured
1 packet received by filter
0 packets dropped by kernel
```

Output with commit 86298285c9ae ("net/ipv6: switch ipv6_flowlabel_opt to
sockptr_t"):

```
dropped privs to tcpdump
tcpdump: listening on dummy10, link-type EN10MB (Ethernet), capture size 262144 
bytes
16:32:17.848517 f2:9a:05:ff:cb:25 > 33:33:00:00:00:01, ethertype IPv6 (0x86dd), 
length 118: (flowlabel 0xfab36, hlim 1, next-header ICMPv6 (58) payload length: 
64) fe80::f09a:5ff:feff:cb25 > ff02::1: [icmp6 sum ok] ICMP6, echo request, seq 
2
1 packet captured
1 packet received by filter
0 packets dropped by kernel

dropped privs to tcpdump
tcpdump: listening on dummy10, link-type EN10MB (Ethernet), capture size 262144 
bytes
16:32:19.000779 f2:9a:05:ff:cb:25 > 33:33:00:00:00:01, ethertype IPv6 (0x86dd), 
length 118: (flowlabel 0xfab36, hlim 1, next-header ICMPv6 (58) payload length: 
64) fe80::f09
a:5ff:feff:cb25 > ff02::1: [icmp6 sum ok] ICMP6, echo request, seq 2
1 packet captured
1 packet received by filter
0 packets dropped by kernel
```

> 
> > 
> > It seems that the pointer is never advanced after the call to
> > sockptr_advance() because it is passed by value and not by reference.
> > Even if you were to pass it by reference I think you would later need to
> > call sockptr_decrease() or something similar. Otherwise it is very
> > error-prone.
> > 
> > Maybe adding an offset to copy_to_sockptr() and copy_from_sockptr() is
> > better?
> 
> We could do that, although I wouldn't add it to the existing functions
> to avoid the churns and instead add copy_to_sockptr_offset or something
> like that.

Sounds good

Thanks

Re: [PATCH 19/26] net/ipv6: switch ipv6_flowlabel_opt to sockptr_t

2020-07-27 Thread Ido Schimmel

On Thu, Jul 23, 2020 at 08:09:01AM +0200, Christoph Hellwig wrote:
> Pass a sockptr_t to prepare for set_fs-less handling of the kernel
> pointer from bpf-cgroup.
> 
> Note that the get case is pretty weird in that it actually copies data
> back to userspace from setsockopt.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  include/net/ipv6.h   |  2 +-
>  net/ipv6/ip6_flowlabel.c | 16 +---
>  net/ipv6/ipv6_sockglue.c |  2 +-
>  3 files changed, 11 insertions(+), 9 deletions(-)
> 
> diff --git a/include/net/ipv6.h b/include/net/ipv6.h
> index 262fc88dbd7e2f..4c9d89b5d73268 100644
> --- a/include/net/ipv6.h
> +++ b/include/net/ipv6.h
> @@ -406,7 +406,7 @@ struct ipv6_txoptions *fl6_merge_options(struct 
> ipv6_txoptions *opt_space,
>struct ip6_flowlabel *fl,
>struct ipv6_txoptions *fopt);
>  void fl6_free_socklist(struct sock *sk);
> -int ipv6_flowlabel_opt(struct sock *sk, char __user *optval, int optlen);
> +int ipv6_flowlabel_opt(struct sock *sk, sockptr_t optval, int optlen);
>  int ipv6_flowlabel_opt_get(struct sock *sk, struct in6_flowlabel_req *freq,
>  int flags);
>  int ip6_flowlabel_init(void);
> diff --git a/net/ipv6/ip6_flowlabel.c b/net/ipv6/ip6_flowlabel.c
> index 27ee6de9beffc4..6b3c315f3d461a 100644
> --- a/net/ipv6/ip6_flowlabel.c
> +++ b/net/ipv6/ip6_flowlabel.c
> @@ -371,7 +371,7 @@ static int fl6_renew(struct ip6_flowlabel *fl, unsigned 
> long linger, unsigned lo
>  
>  static struct ip6_flowlabel *
>  fl_create(struct net *net, struct sock *sk, struct in6_flowlabel_req *freq,
> -   char __user *optval, int optlen, int *err_p)
> +   sockptr_t optval, int optlen, int *err_p)
>  {
>   struct ip6_flowlabel *fl = NULL;
>   int olen;
> @@ -401,7 +401,8 @@ fl_create(struct net *net, struct sock *sk, struct 
> in6_flowlabel_req *freq,
>   memset(fl->opt, 0, sizeof(*fl->opt));
>   fl->opt->tot_len = sizeof(*fl->opt) + olen;
>   err = -EFAULT;
> - if (copy_from_user(fl->opt+1, optval+CMSG_ALIGN(sizeof(*freq)), 
> olen))
> + sockptr_advance(optval, CMSG_ALIGN(sizeof(*freq)));
> + if (copy_from_sockptr(fl->opt + 1, optval, olen))
>   goto done;
>  
>   msg.msg_controllen = olen;
> @@ -604,7 +605,7 @@ static int ipv6_flowlabel_renew(struct sock *sk, struct 
> in6_flowlabel_req *freq)
>  }
>  
>  static int ipv6_flowlabel_get(struct sock *sk, struct in6_flowlabel_req 
> *freq,
> - void __user *optval, int optlen)
> + sockptr_t optval, int optlen)
>  {
>   struct ipv6_fl_socklist *sfl, *sfl1 = NULL;
>   struct ip6_flowlabel *fl, *fl1 = NULL;
> @@ -702,8 +703,9 @@ static int ipv6_flowlabel_get(struct sock *sk, struct 
> in6_flowlabel_req *freq,
>   goto recheck;
>  
>   if (!freq->flr_label) {
> - if (copy_to_user(&((struct in6_flowlabel_req __user *) 
> optval)->flr_label,
> -  &fl->label, sizeof(fl->label))) {
> + sockptr_advance(optval,
> + offsetof(struct in6_flowlabel_req, flr_label));

Christoph,

I see a regression with IPv6 flowlabel that I bisected to this patch.
When passing '-F 0' to 'ping' the flow label should be random, yet it's
the same every time after this patch.

It seems that the pointer is never advanced after the call to
sockptr_advance() because it is passed by value and not by reference.
Even if you were to pass it by reference I think you would later need to
call sockptr_decrease() or something similar. Otherwise it is very
error-prone.

Maybe adding an offset to copy_to_sockptr() and copy_from_sockptr() is
better?

Thanks

> + if (copy_to_sockptr(optval, &fl->label, sizeof(fl->label))) {
>   /* Intentionally ignore fault. */
>   }
>   }
> @@ -716,13 +718,13 @@ static int ipv6_flowlabel_get(struct sock *sk, struct 
> in6_flowlabel_req *freq,
>   return err;
>  }
>  
> -int ipv6_flowlabel_opt(struct sock *sk, char __user *optval, int optlen)
> +int ipv6_flowlabel_opt(struct sock *sk, sockptr_t optval, int optlen)
>  {
>   struct in6_flowlabel_req freq;
>  
>   if (optlen < sizeof(freq))
>   return -EINVAL;
> - if (copy_from_user(&freq, optval, sizeof(freq)))
> + if (copy_from_sockptr(&freq, optval, sizeof(freq)))
>   return -EFAULT;
>  
>   switch (freq.flr_action) {
> diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
> index 119dfaf5f4bb26..3897fb55372d38 100644
> --- a/net/ipv6/ipv6_sockglue.c
> +++ b/net/ipv6/ipv6_sockglue.c
> @@ -929,7 +929,7 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, 
> int optname,
>   retv = 0;
>   break;
>   case IPV6_FLOWLABEL_MGR:
> - retv = ipv6_flowlabel_opt(sk, optval, optlen);
> + retv = ipv6_flowlabel_opt(sk, US

Re: KASAN: use-after-free Read in devlink_health_reporter_destroy

2020-07-14 Thread Ido Schimmel

On Mon, Jul 13, 2020 at 03:55:21PM -0700, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:71930d61 Merge git://git.kernel.org/pub/scm/linux/kernel/g..
> git tree:   net-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=10c8d15710
> kernel config:  https://syzkaller.appspot.com/x/.config?x=4c5bc87125719cf4
> dashboard link: https://syzkaller.appspot.com/bug?extid=dd0040db0d77d52f98a5
> compiler:   gcc (GCC) 10.1.0-syz 20200507
> syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=1421cd3f10
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=13ccfe4f10
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+dd0040db0d77d52f9...@syzkaller.appspotmail.com

#syz fix: devlink: Fix use-after-free when destroying health reporters

Re: [PATCH net-next v3 5/7] devlink: Add devlink health port reporters API

2020-07-13 Thread Ido Schimmel

On Mon, Jul 13, 2020 at 09:18:25AM -0400, Qian Cai wrote:
> On Fri, Jul 10, 2020 at 03:25:11PM +0300, Moshe Shemesh wrote:
> > From: Vladyslav Tarasiuk 
> > 
> > In order to use new devlink port health reporters infrastructure, add
> > corresponding constructor and destructor functions.
> > 
> > Signed-off-by: Vladyslav Tarasiuk 
> > Reviewed-by: Moshe Shemesh 
> > Reviewed-by: Jiri Pirko 
> 
> This will trigger an use-after-free below while doing SR-IOV,

Yes. I sent a patch for internal review and I'm waiting for Vladyslav
and Moshe to review it. Will copy you once I post it.

Re: [PATCH net-next] mlxsw: spectrum_dcb: Fix a spelling typo in spectrum_dcb.c

2020-06-23 Thread Ido Schimmel

On Tue, Jun 23, 2020 at 11:13:01PM +0900, Masanari Iida wrote:
> This patch fixes a spelling typo in spectrum_dcb.c
> 
> Signed-off-by: Masanari Iida 

Reviewed-by: Ido Schimmel

Re: [v1,net-next 3/4] net: qos: police action add index for tc flower offloading

2020-06-23 Thread Ido Schimmel

On Tue, Jun 23, 2020 at 02:34:11PM +0800, Po Liu wrote:
> From: Po Liu 
> 
> Hardware may own many entries for police flow. So that make one(or
>  multi) flow to be policed by one hardware entry. This patch add the
> police action index provide to the driver side make it mapping the
> driver hardware entry index.

Maybe first mention that it is possible for multiple filters in software
to share the same policer. Something like:

"
It is possible for several tc filters to share the same police action by
specifying the action's index when installing the filters.

Propagate this index to device drivers through the flow offload
intermediate representation, so that drivers could share a single
hardware policer between multiple filters.
"

> 
> Signed-off-by: Po Liu 
> ---
>  include/net/flow_offload.h | 1 +
>  net/sched/cls_api.c| 1 +
>  2 files changed, 2 insertions(+)
> 
> diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
> index c2ef19c6b27d..eed98075b1ae 100644
> --- a/include/net/flow_offload.h
> +++ b/include/net/flow_offload.h
> @@ -232,6 +232,7 @@ struct flow_action_entry {
>   booltruncate;
>   } sample;
>   struct {/* FLOW_ACTION_POLICE */
> + u32 index;
>   s64 burst;
>   u64 rate_bytes_ps;
>   u32 mtu;
> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
> index 6aba7d5ba1ec..fdc4c89ca1fa 100644
> --- a/net/sched/cls_api.c
> +++ b/net/sched/cls_api.c
> @@ -3659,6 +3659,7 @@ int tc_setup_flow_action(struct flow_action 
> *flow_action,
>   entry->police.rate_bytes_ps =
>   tcf_police_rate_bytes_ps(act);
>   entry->police.mtu = tcf_police_tcfp_mtu(act);
> + entry->police.index = act->tcfa_index;
>   } else if (is_tcf_ct(act)) {
>   entry->id = FLOW_ACTION_CT;
>   entry->ct.action = tcf_ct_action(act);
> -- 
> 2.17.1
>

Re: [v1,net-next 1/4] net: qos: add tc police offloading action with max frame size limit

2020-06-23 Thread Ido Schimmel

On Tue, Jun 23, 2020 at 02:34:09PM +0800, Po Liu wrote:
> From: Po Liu 
> 
> Current police offloading support the 'burst'' and 'rate_bytes_ps'. Some

s/support/supports/
s/'burst''/'burst'/

> hardware own the capability to limit the frame size. If the frame size
> larger than the setting, the frame would be dropped. For the police
> action itself already accept the 'mtu' parameter in tc command. But not

s/accept/accepts/

> extend to tc flower offloading. So extend 'mtu' to tc flower offloading.

Throughout the submission you are always using the term 'flower
offloading', but this has nothing to do with flower. Flower is the
classifier, whereas you are extending police action which can be tied to
any classifier.

> 
> Signed-off-by: Po Liu 
> ---
> continue the thread 20200306125608.11717-7-po@nxp.com for the police
> action offloading.

For a patch set you need a cover letter (patch 0). It should include
necessary background, motivation and overview of the patches. You can
mention there that some of the patches were sent as RFC back in March
and provide a link:

https://lore.kernel.org/netdev/20200306125608.11717-1-po@nxp.com/

The code itself looks good to me.

> 
>  include/net/flow_offload.h |  1 +
>  include/net/tc_act/tc_police.h | 10 ++
>  net/sched/cls_api.c|  1 +
>  3 files changed, 12 insertions(+)
> 
> diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
> index 00c15f14c434..c2ef19c6b27d 100644
> --- a/include/net/flow_offload.h
> +++ b/include/net/flow_offload.h
> @@ -234,6 +234,7 @@ struct flow_action_entry {
>   struct {/* FLOW_ACTION_POLICE */
>   s64 burst;
>   u64 rate_bytes_ps;
> + u32 mtu;
>   } police;
>   struct {/* FLOW_ACTION_CT */
>   int action;
> diff --git a/include/net/tc_act/tc_police.h b/include/net/tc_act/tc_police.h
> index f098ad4424be..cd973b10ae8c 100644
> --- a/include/net/tc_act/tc_police.h
> +++ b/include/net/tc_act/tc_police.h
> @@ -69,4 +69,14 @@ static inline s64 tcf_police_tcfp_burst(const struct 
> tc_action *act)
>   return params->tcfp_burst;
>  }
>  
> +static inline u32 tcf_police_tcfp_mtu(const struct tc_action *act)
> +{
> + struct tcf_police *police = to_police(act);
> + struct tcf_police_params *params;
> +
> + params = rcu_dereference_protected(police->params,
> +lockdep_is_held(&police->tcf_lock));
> + return params->tcfp_mtu;
> +}
> +
>  #endif /* __NET_TC_POLICE_H */
> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
> index a00a203b2ef5..6aba7d5ba1ec 100644
> --- a/net/sched/cls_api.c
> +++ b/net/sched/cls_api.c
> @@ -3658,6 +3658,7 @@ int tc_setup_flow_action(struct flow_action 
> *flow_action,
>   entry->police.burst = tcf_police_tcfp_burst(act);
>   entry->police.rate_bytes_ps =
>   tcf_police_rate_bytes_ps(act);
> + entry->police.mtu = tcf_police_tcfp_mtu(act);
>   } else if (is_tcf_ct(act)) {
>   entry->id = FLOW_ACTION_CT;
>   entry->ct.action = tcf_ct_action(act);
> -- 
> 2.17.1
>

Re: [RFC,net-next 8/9] net: qos: police action add index for tc flower offloading

2020-06-21 Thread Ido Schimmel

On Fri, Mar 06, 2020 at 08:56:06PM +0800, Po Liu wrote:
> Hardware may own many entries for police flow. So that make one(or
>  multi) flow to be policed by one hardware entry. This patch add the
> police action index provide to the driver side make it mapping the
> driver hardware entry index.
> 
> Signed-off-by: Po Liu 

Hi,

I started looking into tc-police offload in mlxsw and remembered your
patch. Are you planning to formally submit it? I'm asking because in
mlxsw it is also possible to share the same policer between multiple
filters.

Thanks

> ---
>  include/net/flow_offload.h | 1 +
>  net/sched/cls_api.c| 1 +
>  2 files changed, 2 insertions(+)
> 
> diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
> index 54df87328edc..3b78b15ed20b 100644
> --- a/include/net/flow_offload.h
> +++ b/include/net/flow_offload.h
> @@ -201,6 +201,7 @@ struct flow_action_entry {
>   booltruncate;
>   } sample;
>   struct {/* FLOW_ACTION_POLICE */
> + u32 index;
>   s64 burst;
>   u64 rate_bytes_ps;
>   u32 mtu;
> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
> index 363d3991793d..ce846a9dadc1 100644
> --- a/net/sched/cls_api.c
> +++ b/net/sched/cls_api.c
> @@ -3584,6 +3584,7 @@ int tc_setup_flow_action(struct flow_action 
> *flow_action,
>   entry->police.rate_bytes_ps =
>   tcf_police_rate_bytes_ps(act);
>   entry->police.mtu = tcf_police_mtu(act);
> + entry->police.index = act->tcfa_index;
>   } else if (is_tcf_ct(act)) {
>   entry->id = FLOW_ACTION_CT;
>   entry->ct.action = tcf_ct_action(act);
> -- 
> 2.17.1
>

Re: [net-next 1/6] net: marvell: prestera: Add driver for Prestera family ASIC devices

2020-06-21 Thread Ido Schimmel

On Sat, Jun 20, 2020 at 03:56:39PM +0300, Vadym Kochan wrote:
> But it will look same as prestera_destroy_ports(), do you think
> this is not a problem to have a same logic doubled ?

No, error paths of init() usually share logic with fini(). The benefits
of being consistent, always having init() followed by fini() and making
sure they are symmetric, out-weigh the benefit of saving a few lines of
code.

Re: mlxsw: spectrum: Adjust headroom buffers for 8x ports

2020-06-17 Thread Ido Schimmel

On Wed, Jun 17, 2020 at 06:15:35PM +0100, Colin Ian King wrote:
> Hi
> 
> Static analysis with Coverity has detected an issue that relies on the
> machine endianness to work. The commit in question is:
> 
> commit 60833d54d56c21e7538296eb2e00e104768fd047
> Author: Ido Schimmel 
> Date:   Tue Jun 16 10:14:58 2020 +0300
> 
> mlxsw: spectrum: Adjust headroom buffers for 8x ports
> 
> in line:
> mlxsw_sp_port_headroom_8x_adjust(mlxsw_sp_port, (u16 *) &buffsize);
> 
> 
> The cast of the u32 buffsize to (u16 *) to scale buffsize in the call to
> to mlxsw_sp_port_headroom_8x_adjust() will behave differently on big
> endian architectures to that of little endian architectures.  I'm not
> sure if this is intentional or not.

Not intentional. The hardware interface is quite weird. The buffer size
can be configured via two registers. One takes size as 24 bits and the
second as 16 bits. Either way, the max size is much lower than 2^16-1.

> One solution is to either make buffsize a u16, but I am concerned this
> may be incorrect as the buffsize is assigned from the call
> mlxsw_sp_span_buffsize_get() and this returns a u32 so we may have
> overflow issues. Probably better to make
> mlxsw_sp_port_headroom_8x_adjust handle u32 integers and to return the
> adjusted value rather than modifying it by pass-by-reference.

Thanks for the report, will fix.

> 
> Colin
>

Re: [PATCH AUTOSEL 5.7 264/274] vxlan: Avoid infinite loop when suppressing NS messages with invalid options

2020-06-17 Thread Ido Schimmel

On Wed, Jun 17, 2020 at 12:28:23PM -0400, Sasha Levin wrote:
> On Tue, Jun 09, 2020 at 09:55:48AM +0300, Ido Schimmel wrote:
> > On Mon, Jun 08, 2020 at 07:05:57PM -0400, Sasha Levin wrote:
> > > From: Ido Schimmel 
> > > 
> > > [ Upstream commit 8066e6b449e050675df48e7c4b16c29f00507ff0 ]
> > 
> > Hi,
> > 
> > In the same patch set I also included a similar fix for the bridge
> > module:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=53fc685243bd6fb90d90305cea54598b78d3cbfc
> > 
> > But I don't see it in the patch sets you sent.
> > 
> > Don't see it here as well:
> > https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/tree/queue-5.7
> > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/?h=linux-5.7.y
> > 
> > Did it get lost or it's just pending somewhere else?
> 
> AUTOSEL ignores net/ patches that are maintained by David Miller.
> 
> I can pick it up manually.

No need. Dave queued both patches to stable and they are already in
5.7.y:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.7.y&id=f3f4183f6d36df54f5a867653c30852ec6b5ab9d
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.7.y&id=ecf0b3c5a6684fcf27073365d576164390bc000e

> 
> -- 
> Thanks,
> Sasha

Re: linux-next test error: BUG: using smp_processor_id() in preemptible [ADDR] code: syz-fuzzer/6792

2020-06-12 Thread Ido Schimmel

On Fri, Jun 12, 2020 at 07:09:04PM +0530, Ritesh Harjani wrote:
> I see Ted has already taken v2 of this patch in his dev repo.
> Should be able to see in linux tree soon.
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git/commit/?h=dev&id=811985365378df01386c3cfb7ff716e74ca376d5

Great, thanks a lot. I've replaced previous patch with this one in my
testing tree.

Re: linux-next test error: BUG: using smp_processor_id() in preemptible [ADDR] code: syz-fuzzer/6792

2020-06-12 Thread Ido Schimmel

On Tue, Jun 02, 2020 at 06:11:29PM +0530, Ritesh Harjani wrote:
> #syz test:
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
> 0e21d4620dd047da7952f44a2e1ac777ded2d57e

> >From cc1cf67d99d5fa61db0651c89c288df31bad6b8e Mon Sep 17 00:00:00 2001
> From: Ritesh Harjani 
> Date: Tue, 2 Jun 2020 17:54:12 +0530
> Subject: [PATCH 1/1] ext4: mballoc: Use raw_cpu_ptr in case if preemption is 
> enabled
> 
> It doesn't matter really in ext4_mb_new_blocks() about whether the code
> is rescheduled on any other cpu due to preemption. Because we care
> about discard_pa_seq only when the block allocation fails and then too
> we add the seq counter of all the cpus against the initial sampled one
> to check if anyone has freed any blocks while we were doing allocation.
> 
> So just use raw_cpu_ptr to not trigger this BUG.
> 
> BUG: using smp_processor_id() in preemptible [] code: syz-fuzzer/6927
> caller is ext4_mb_new_blocks+0xa4d/0x3b70 fs/ext4/mballoc.c:4711
> CPU: 1 PID: 6927 Comm: syz-fuzzer Not tainted 5.7.0-next-20200602-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS 
> Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x18f/0x20d lib/dump_stack.c:118
>  check_preemption_disabled+0x20d/0x220 lib/smp_processor_id.c:48
>  ext4_mb_new_blocks+0xa4d/0x3b70 fs/ext4/mballoc.c:4711
>  ext4_ext_map_blocks+0x201b/0x33e0 fs/ext4/extents.c:4244
>  ext4_map_blocks+0x4cb/0x1640 fs/ext4/inode.c:626
>  ext4_getblk+0xad/0x520 fs/ext4/inode.c:833
>  ext4_bread+0x7c/0x380 fs/ext4/inode.c:883
>  ext4_append+0x153/0x360 fs/ext4/namei.c:67
>  ext4_init_new_dir fs/ext4/namei.c:2757 [inline]
>  ext4_mkdir+0x5e0/0xdf0 fs/ext4/namei.c:2802
>  vfs_mkdir+0x419/0x690 fs/namei.c:3632
>  do_mkdirat+0x21e/0x280 fs/namei.c:3655
>  do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:359
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> Signed-off-by: Ritesh Harjani 
> Reported-by: syzbot+82f324bb69744c5f6...@syzkaller.appspotmail.com

Hi,

Are you going to submit this patch formally? Without it I'm constantly
seeing the above splat.

Thanks

> ---
>  fs/ext4/mballoc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index a9083113a8c0..b79b32dbe3ea 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -4708,7 +4708,7 @@ ext4_fsblk_t ext4_mb_new_blocks(handle_t *handle,
>   }
>  
>   ac->ac_op = EXT4_MB_HISTORY_PREALLOC;
> - seq = *this_cpu_ptr(&discard_pa_seq);
> + seq = *raw_cpu_ptr(&discard_pa_seq);
>   if (!ext4_mb_use_preallocated(ac)) {
>   ac->ac_op = EXT4_MB_HISTORY_ALLOC;
>   ext4_mb_normalize_request(ac, ar);
> -- 
> 2.21.3
>

Re: [PATCH AUTOSEL 5.7 264/274] vxlan: Avoid infinite loop when suppressing NS messages with invalid options

2020-06-08 Thread Ido Schimmel

On Mon, Jun 08, 2020 at 07:05:57PM -0400, Sasha Levin wrote:
> From: Ido Schimmel 
> 
> [ Upstream commit 8066e6b449e050675df48e7c4b16c29f00507ff0 ]

Hi,

In the same patch set I also included a similar fix for the bridge
module:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=53fc685243bd6fb90d90305cea54598b78d3cbfc

But I don't see it in the patch sets you sent.

Don't see it here as well:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/tree/queue-5.7
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/?h=linux-5.7.y

Did it get lost or it's just pending somewhere else?

Thanks

> 
> When proxy mode is enabled the vxlan device might reply to Neighbor
> Solicitation (NS) messages on behalf of remote hosts.
> 
> In case the NS message includes the "Source link-layer address" option
> [1], the vxlan device will use the specified address as the link-layer
> destination address in its reply.
> 
> To avoid an infinite loop, break out of the options parsing loop when
> encountering an option with length zero and disregard the NS message.
> 
> This is consistent with the IPv6 ndisc code and RFC 4886 which states
> that "Nodes MUST silently discard an ND packet that contains an option
> with length zero" [2].
> 
> [1] https://tools.ietf.org/html/rfc4861#section-4.3
> [2] https://tools.ietf.org/html/rfc4861#section-4.6
> 
> Fixes: 4b29dba9c085 ("vxlan: fix nonfunctional neigh_reduce()")
> Signed-off-by: Ido Schimmel 
> Acked-by: Nikolay Aleksandrov 
> Signed-off-by: David S. Miller 
> Signed-off-by: Sasha Levin 
> ---
>  drivers/net/vxlan.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index a5b415fed11e..779e56c43d27 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -1924,6 +1924,10 @@ static struct sk_buff *vxlan_na_create(struct sk_buff 
> *request,
>   ns_olen = request->len - skb_network_offset(request) -
>   sizeof(struct ipv6hdr) - sizeof(*ns);
>   for (i = 0; i < ns_olen-1; i += (ns->opt[i+1]<<3)) {
> + if (!ns->opt[i + 1]) {
> + kfree_skb(reply);
> + return NULL;
> + }
>   if (ns->opt[i] == ND_OPT_SOURCE_LL_ADDR) {
>   daddr = ns->opt + i + sizeof(struct nd_opt_hdr);
>   break;
> -- 
> 2.25.1
>

Re: [net-next 3/6] net: marvell: prestera: Add basic devlink support

2020-06-03 Thread Ido Schimmel

On Thu, May 28, 2020 at 06:12:42PM +0300, Vadym Kochan wrote:
> Add very basic support for devlink interface:
> 
> - driver name
> - fw version
> - devlink ports

I suggest adding support for reload while the driver is still simple. I
use it all the time because I run with modules built-in. It's also used
in FIB offload tests to relocate the netdevs to a different namespace
where the test is performed

> 
> Signed-off-by: Vadym Kochan 
> ---
>  drivers/net/ethernet/marvell/prestera/Kconfig |   1 +
>  .../net/ethernet/marvell/prestera/Makefile|   2 +-
>  .../net/ethernet/marvell/prestera/prestera.h  |   4 +
>  .../marvell/prestera/prestera_devlink.c   | 111 ++
>  .../marvell/prestera/prestera_devlink.h   |  25 
>  .../ethernet/marvell/prestera/prestera_main.c |  27 -
>  6 files changed, 165 insertions(+), 5 deletions(-)
>  create mode 100644 drivers/net/ethernet/marvell/prestera/prestera_devlink.c
>  create mode 100644 drivers/net/ethernet/marvell/prestera/prestera_devlink.h
> 
> diff --git a/drivers/net/ethernet/marvell/prestera/Kconfig 
> b/drivers/net/ethernet/marvell/prestera/Kconfig
> index 0848edb272a5..dfd5174d0568 100644
> --- a/drivers/net/ethernet/marvell/prestera/Kconfig
> +++ b/drivers/net/ethernet/marvell/prestera/Kconfig
> @@ -6,6 +6,7 @@
>  config PRESTERA
>   tristate "Marvell Prestera Switch ASICs support"
>   depends on NET_SWITCHDEV && VLAN_8021Q
> + select NET_DEVLINK
>   help
> This driver supports Marvell Prestera Switch ASICs family.
>  
> diff --git a/drivers/net/ethernet/marvell/prestera/Makefile 
> b/drivers/net/ethernet/marvell/prestera/Makefile
> index 2146714eab21..babd71fba809 100644
> --- a/drivers/net/ethernet/marvell/prestera/Makefile
> +++ b/drivers/net/ethernet/marvell/prestera/Makefile
> @@ -1,6 +1,6 @@
>  # SPDX-License-Identifier: GPL-2.0
>  obj-$(CONFIG_PRESTERA)   += prestera.o
>  prestera-objs:= prestera_main.o prestera_hw.o prestera_dsa.o 
> \
> -prestera_rxtx.o
> +prestera_rxtx.o prestera_devlink.o
>  
>  obj-$(CONFIG_PRESTERA_PCI)   += prestera_pci.o
> diff --git a/drivers/net/ethernet/marvell/prestera/prestera.h 
> b/drivers/net/ethernet/marvell/prestera/prestera.h
> index 5079d872e18a..f8abaaff5f21 100644
> --- a/drivers/net/ethernet/marvell/prestera/prestera.h
> +++ b/drivers/net/ethernet/marvell/prestera/prestera.h
> @@ -11,6 +11,9 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +
> +#define PRESTERA_DRV_NAME"prestera"
>  
>  struct prestera_fw_rev {
>   u16 maj;
> @@ -63,6 +66,7 @@ struct prestera_port_caps {
>  struct prestera_port {
>   struct net_device *dev;
>   struct prestera_switch *sw;
> + struct devlink_port dl_port;
>   u32 id;
>   u32 hw_id;
>   u32 dev_id;
> diff --git a/drivers/net/ethernet/marvell/prestera/prestera_devlink.c 
> b/drivers/net/ethernet/marvell/prestera/prestera_devlink.c
> new file mode 100644
> index ..58021057981b
> --- /dev/null
> +++ b/drivers/net/ethernet/marvell/prestera/prestera_devlink.c
> @@ -0,0 +1,111 @@
> +// SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
> +/* Copyright (c) 2019-2020 Marvell International Ltd. All rights reserved */
> +
> +#include 
> +
> +#include "prestera.h"
> +#include "prestera_devlink.h"
> +
> +static int prestera_dl_info_get(struct devlink *dl,
> + struct devlink_info_req *req,
> + struct netlink_ext_ack *extack)
> +{
> + struct prestera_switch *sw = devlink_priv(dl);
> + char buf[16];
> + int err = 0;
> +
> + err = devlink_info_driver_name_put(req, PRESTERA_DRV_NAME);
> + if (err)
> + return err;
> +
> + snprintf(buf, sizeof(buf), "%d.%d.%d",
> +  sw->dev->fw_rev.maj,
> +  sw->dev->fw_rev.min,
> +  sw->dev->fw_rev.sub);
> +
> + err = devlink_info_version_running_put(req,
> +DEVLINK_INFO_VERSION_GENERIC_FW,
> +buf);
> + if (err)
> + return err;
> +
> + return 0;
> +}
> +
> +static const struct devlink_ops prestera_dl_ops = {
> + .info_get = prestera_dl_info_get,
> +};
> +
> +struct prestera_switch *prestera_devlink_alloc(void)
> +{
> + struct devlink *dl;
> +
> + dl = devlink_alloc(&prestera_dl_ops, sizeof(struct prestera_switch));
> +
> + return devlink_priv(dl);
> +}
> +
> +void prestera_devlink_free(struct prestera_switch *sw)
> +{
> + struct devlink *dl = priv_to_devlink(sw);
> +
> + devlink_free(dl);
> +}
> +
> +int prestera_devlink_register(struct prestera_switch *sw)
> +{
> + struct devlink *dl = priv_to_devlink(sw);
> + int err;
> +
> + err = devlink_register(dl, sw->dev->dev);
> + if (err) {
> + dev_warn(sw->dev->dev, "devlink_register failed: %d\n", err);
> + return err;
> + }
> +
> + re

Re: [net-next 2/6] net: marvell: prestera: Add PCI interface support

2020-06-03 Thread Ido Schimmel

On Thu, May 28, 2020 at 06:12:41PM +0300, Vadym Kochan wrote:
>  drivers/net/ethernet/marvell/prestera/Kconfig |  11 +
>  .../net/ethernet/marvell/prestera/Makefile|   2 +
>  .../ethernet/marvell/prestera/prestera_pci.c  | 825 ++
>  3 files changed, 838 insertions(+)
>  create mode 100644 drivers/net/ethernet/marvell/prestera/prestera_pci.c
> 
> diff --git a/drivers/net/ethernet/marvell/prestera/Kconfig 
> b/drivers/net/ethernet/marvell/prestera/Kconfig
> index 76b68613ea7a..0848edb272a5 100644
> --- a/drivers/net/ethernet/marvell/prestera/Kconfig
> +++ b/drivers/net/ethernet/marvell/prestera/Kconfig
> @@ -11,3 +11,14 @@ config PRESTERA
>  
> To compile this driver as a module, choose M here: the
> module will be called prestera.
> +
> +config PRESTERA_PCI
> + tristate "PCI interface driver for Marvell Prestera Switch ASICs family"
> + depends on PCI && HAS_IOMEM && PRESTERA
> + default m
> + ---help---
> +   This is implementation of PCI interface support for Marvell Prestera
> +   Switch ASICs family.
> +
> +   To compile this driver as a module, choose M here: the
> +   module will be called prestera_pci.
> diff --git a/drivers/net/ethernet/marvell/prestera/Makefile 
> b/drivers/net/ethernet/marvell/prestera/Makefile
> index 610d75032b78..2146714eab21 100644
> --- a/drivers/net/ethernet/marvell/prestera/Makefile
> +++ b/drivers/net/ethernet/marvell/prestera/Makefile
> @@ -2,3 +2,5 @@
>  obj-$(CONFIG_PRESTERA)   += prestera.o
>  prestera-objs:= prestera_main.o prestera_hw.o prestera_dsa.o 
> \
>  prestera_rxtx.o
> +
> +obj-$(CONFIG_PRESTERA_PCI)   += prestera_pci.o
> diff --git a/drivers/net/ethernet/marvell/prestera/prestera_pci.c 
> b/drivers/net/ethernet/marvell/prestera/prestera_pci.c
> new file mode 100644
> index ..0ec07732b12e
> --- /dev/null
> +++ b/drivers/net/ethernet/marvell/prestera/prestera_pci.c
> @@ -0,0 +1,825 @@
> +// SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
> +/* Copyright (c) 2019-2020 Marvell International Ltd. All rights reserved */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "prestera.h"
> +
> +#define PRESTERA_MSG_MAX_SIZE 1500
> +
> +#define PRESTERA_SUPP_FW_MAJ_VER 2
> +#define PRESTERA_SUPP_FW_MIN_VER 0
> +
> +#define PRESTERA_FW_PATH \
> + "mrvl/prestera/mvsw_prestera_fw-v" \
> + __stringify(PRESTERA_SUPP_FW_MAJ_VER) \
> + "." __stringify(PRESTERA_SUPP_FW_MIN_VER) ".img"
> +
> +#define PRESTERA_FW_HDR_MAGIC0x351D9D06
> +#define PRESTERA_FW_DL_TIMEOUT   5

#define PRESTERA_FW_DL_TIMEOUT_MS (50 * 1000)

> +#define PRESTERA_FW_BLK_SZ   1024
> +
> +#define PRESTERA_FW_VER_MAJ_MUL 100
> +#define PRESTERA_FW_VER_MIN_MUL 1000
> +
> +#define PRESTERA_FW_VER_MAJ(v)   ((v) / PRESTERA_FW_VER_MAJ_MUL)
> +
> +#define PRESTERA_FW_VER_MIN(v) \
> + (((v) - (PRESTERA_FW_VER_MAJ(v) * PRESTERA_FW_VER_MAJ_MUL)) / \
> + PRESTERA_FW_VER_MIN_MUL)
> +
> +#define PRESTERA_FW_VER_PATCH(v) \
> + ((v) - (PRESTERA_FW_VER_MAJ(v) * PRESTERA_FW_VER_MAJ_MUL) - \
> + (PRESTERA_FW_VER_MIN(v) * PRESTERA_FW_VER_MIN_MUL))
> +
> +struct prestera_fw_header {
> + __be32 magic_number;
> + __be32 version_value;
> + u8 reserved[8];
> +} __packed;
> +
> +struct prestera_ldr_regs {
> + u32 ldr_ready;
> + u32 pad1;
> +
> + u32 ldr_img_size;
> + u32 ldr_ctl_flags;
> +
> + u32 ldr_buf_offs;
> + u32 ldr_buf_size;
> +
> + u32 ldr_buf_rd;
> + u32 pad2;
> + u32 ldr_buf_wr;
> +
> + u32 ldr_status;
> +} __packed __aligned(4);
> +
> +#define PRESTERA_LDR_REG_OFFSET(f)   offsetof(struct prestera_ldr_regs, f)
> +
> +#define PRESTERA_LDR_READY_MAGIC 0xf00dfeed
> +
> +#define PRESTERA_LDR_STATUS_IMG_DL   BIT(0)
> +#define PRESTERA_LDR_STATUS_START_FW BIT(1)
> +#define PRESTERA_LDR_STATUS_INVALID_IMG  BIT(2)
> +#define PRESTERA_LDR_STATUS_NOMEMBIT(3)
> +
> +#define PRESTERA_LDR_REG_BASE(fw)((fw)->ldr_regs)
> +#define PRESTERA_LDR_REG_ADDR(fw, reg)   (PRESTERA_LDR_REG_BASE(fw) + 
> (reg))
> +
> +#define prestera_ldr_write(fw, reg, val) \
> + writel(val, PRESTERA_LDR_REG_ADDR(fw, reg))
> +#define prestera_ldr_read(fw, reg)   \
> + readl(PRESTERA_LDR_REG_ADDR(fw, reg))
> +
> +/* fw loader registers */
> +#define PRESTERA_LDR_READY_REG   
> PRESTERA_LDR_REG_OFFSET(ldr_ready)
> +#define PRESTERA_LDR_IMG_SIZE_REGPRESTERA_LDR_REG_OFFSET(ldr_img_size)
> +#define PRESTERA_LDR_CTL_REG PRESTERA_LDR_REG_OFFSET(ldr_ctl_flags)
> +#define PRESTERA_LDR_BUF_SIZE_REGPRESTERA_LDR_REG_OFFSET(ldr_buf_size)
> +#define PRESTERA_LDR_BUF_OFFS_REGPRESTERA_LDR_REG_OFFSET(ldr_buf_offs)
> +#define PRESTERA_LDR_BUF_RD_REG  
> PRESTERA_LDR_REG_OFFSET(ldr_buf_rd)
> +#define PRESTERA_LDR_BUF_WR_REG  
> PRESTERA_LDR_REG_OFFSET(ldr_buf_wr)
> +#define PR

Re: [net-next 1/6] net: marvell: prestera: Add driver for Prestera family ASIC devices

2020-06-03 Thread Ido Schimmel

On Mon, Jun 01, 2020 at 01:50:13PM +0300, Vadym Kochan wrote:
> Hi Ido,
> 
> On Sat, May 30, 2020 at 06:48:01PM +0300, Ido Schimmel wrote:
> > On Thu, May 28, 2020 at 06:12:40PM +0300, Vadym Kochan wrote:
> > 
> 
> [...]
> 
> > Nit: "From" ?
> > 
> > > + PRESTERA_DSA_CMD_FROM_CPU,
> > > +};
> > > +
> > > +struct prestera_dsa_vlan {
> > > + u16 vid;
> > > + u8 vpt;
> > > + u8 cfi_bit;
> > > + bool is_tagged;
> > > +};
> > > +
> > > +struct prestera_dsa {
> > > + struct prestera_dsa_vlan vlan;
> > > + u32 hw_dev_num;
> > > + u32 port_num;
> > > +};
> > > +
> > > +int prestera_dsa_parse(struct prestera_dsa *dsa, const u8 *dsa_buf);
> > > +int prestera_dsa_build(const struct prestera_dsa *dsa, u8 *dsa_buf);
> > > +
> > > +#endif /* _PRESTERA_DSA_H_ */
> > > diff --git a/drivers/net/ethernet/marvell/prestera/prestera_hw.c 
> > > b/drivers/net/ethernet/marvell/prestera/prestera_hw.c
> > > new file mode 100644
> > > index ..3aa3974f957a
> > > --- /dev/null
> > > +++ b/drivers/net/ethernet/marvell/prestera/prestera_hw.c
> > > @@ -0,0 +1,610 @@
> > > +// SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
> > > +/* Copyright (c) 2019-2020 Marvell International Ltd. All rights 
> > > reserved */
> > > +
> > > +#include 
> > > +#include 
> > > +#include 
> > > +#include 
> > > +
> > > +#include "prestera.h"
> > > +#include "prestera_hw.h"
> > > +
> > > +#define PRESTERA_SWITCH_INIT_TIMEOUT 3000/* 30sec */
> > 
> > Out of curiosity, how long does it actually take you to initialize the
> > hardware?
> > 
> > Also, I find it useful to note the units in the name, so:
> > 
> > #define PRESTERA_SWITCH_INIT_TIMEOUT_US (30 * 1000 * 1000)
> > 
> > BTW, it says 30 seconds in comment, but the call chain where it is used
> > is:
> > 
> > prestera_cmd_ret_wait(, PRESTERA_SWITCH_INIT_TIMEOUT)
> >   __prestera_cmd_ret(..., wait)
> > prestera_fw_send_req(..., waitms)
> >   prestera_fw_cmd_send(..., waitms)
> > prestera_fw_wait_reg32(..., waitms)
> >   readl_poll_timeout(..., waitms * 1000)
> > 
> > So I think you should actually define it as:
> > 
> > #define PRESTERA_SWITCH_INIT_TIMEOUT_MS (30 * 1000)
> > 
> > And rename all these 'wait' arguments to 'waitms' so it's clearer which
> > unit they expect.
> > 
> [...]
> > > +static int __prestera_cmd_ret(struct prestera_switch *sw,
> > > +   enum prestera_cmd_type_t type,
> > > +   struct prestera_msg_cmd *cmd, size_t clen,
> > > +   struct prestera_msg_ret *ret, size_t rlen,
> > > +   int wait)
> > > +{
> > > + struct prestera_device *dev = sw->dev;
> > > + int err;
> > > +
> > > + cmd->type = type;
> > > +
> > > + err = dev->send_req(dev, (u8 *)cmd, clen, (u8 *)ret, rlen, wait);
> > > + if (err)
> > > + return err;
> > > +
> > > + if (ret->cmd.type != PRESTERA_CMD_TYPE_ACK)
> > > + return -EBADE;
> > > + if (ret->status != PRESTERA_CMD_ACK_OK)
> > 
> > You don't have more states here other than OK / FAIL ? It might help you
> > in debugging if you include them. You might find trace_devlink_hwerr()
> > useful.
> Thanks, I will consider this.
> 
> > 
> > > + return -EINVAL;
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +static int prestera_cmd_ret(struct prestera_switch *sw,
> > > + enum prestera_cmd_type_t type,
> > > + struct prestera_msg_cmd *cmd, size_t clen,
> > > + struct prestera_msg_ret *ret, size_t rlen)
> > > +{
> > > + return __prestera_cmd_ret(sw, type, cmd, clen, ret, rlen, 0);
> > > +}
> > > +
> > > +static int prestera_cmd_ret_wait(struct prestera_switch *sw,
> > > +  enum prestera_cmd_type_t type,
> > > +  struct prestera_msg_cmd *cmd, size_t clen,
> > > +  struct prestera_msg_ret *ret, size_t rlen,
> > > +  int wait)
> > > +{
> > > + return __prestera_cmd_ret(sw, type, cm

Re: [net-next 0/6] net: marvell: prestera: Add Switchdev driver for Prestera family ASIC device 98DX326x (AC3x)

2020-05-30 Thread Ido Schimmel

On Sat, May 30, 2020 at 05:52:31PM +0300, Vadym Kochan wrote:
> Hi Ido,
> 
> On Sat, May 30, 2020 at 05:29:28PM +0300, Ido Schimmel wrote:
> > On Thu, May 28, 2020 at 06:12:39PM +0300, Vadym Kochan wrote:
> > > Marvell Prestera 98DX326x integrates up to 24 ports of 1GbE with 8
> > > ports of 10GbE uplinks or 2 ports of 40Gbps stacking for a largely
> > > wireless SMB deployment.
> > > 
> > > Prestera Switchdev is a firmware based driver that operates via PCI bus.  
> > > The
> > > current implementation supports only boards designed for the Marvell 
> > > Switchdev
> > > solution and requires special firmware.
> > > 
> > > This driver implementation includes only L1, basic L2 support, and RX/TX.
> > > 
> > > The core Prestera switching logic is implemented in prestera_main.c, 
> > > there is
> > > an intermediate hw layer between core logic and firmware. It is
> > > implemented in prestera_hw.c, the purpose of it is to encapsulate hw
> > > related logic, in future there is a plan to support more devices with
> > > different HW related configurations.
> > > 
> > > The following Switchdev features are supported:
> > > 
> > > - VLAN-aware bridge offloading
> > > - VLAN-unaware bridge offloading
> > > - FDB offloading (learning, ageing)
> > > - Switchport configuration
> > > 
> > > The firmware image will be uploaded soon to the linux-firmware repository.
> > > 
> > > PATCH:
> > > 1) Fixed W=1 warnings
> > 
> > Hi,
> > 
> > I just applied the patches for review and checkpatch had a lot of
> > complaints. Some are even ERRORs. For example:
> > 
> > WARNING: do not add new typedefs
> > #1064: FILE: drivers/net/ethernet/marvell/prestera/prestera_hw.h:32:
> > +typedef void (*prestera_event_cb_t)
> I may be wrong, as I remember Jiri suggested it and looks like
> it makes sense. I really don't have strong opinion about this.

OK, so I'll let Jiri comment when he is back at work.

> 
> > 
> > WARNING: line over 80 characters
> > #2007: FILE: drivers/net/ethernet/marvell/prestera/prestera_rxtx.c:321:
> > +   __skb_trim(buf->skb, 
> > PRESTERA_SDMA_RX_DESC_PKT_LEN(desc));
> > 
> > WARNING: line over 80 characters
> > #2007: FILE: drivers/net/ethernet/marvell/prestera/prestera_rxtx.c:321:
> > +   __skb_trim(buf->skb, 
> > PRESTERA_SDMA_RX_DESC_PKT_LEN(desc));
> > 
> > ERROR: Macros with complex values should be enclosed in parentheses
> > #196: FILE: drivers/net/ethernet/marvell/prestera/prestera_pci.c:161:
> > +#define PRESTERA_FW_REG_ADDR(fw, reg)  PRESTERA_FW_REG_BASE(fw) + (reg)
> This one makes sense.
> > 
> > WARNING: prefer 'help' over '---help---' for new help texts
> > #52: FILE: drivers/net/ethernet/marvell/prestera/Kconfig:15:
> > +config PRESTERA_PCI
> I will fix it.
> > 
> > ...
> 
> The most are about using ethtool types which are in camel style.
> Regarding > 80 chars is is a required rule ? I saw some discussion
> on LKML that 80+ are acceptable sometimes.

Yea, that's why I didn't include them. Error messages can always exceed
80 characters. Other times I try to follow the rule unless
"the cure is worse than the disease":

https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1818701.html

> 
> > 
> > Also, smatch complaints about:
> > 
> > drivers/net/ethernet/marvell/prestera//prestera_ethtool.c:713
> > prestera_ethtool_get_strings() error: memcpy() '*prestera_cnt_name' too
> > small (32 vs 960)
> > 
> > And coccicheck about:
> > 
> > drivers/net/ethernet/marvell/prestera/prestera_hw.c:681:2-3: Unneeded
> > semicolon
> These looks interesting, I did not use smatch and coccicheck, will look
> on these.
> 
> > 
> > > 
> > > 2) Renamed PCI driver name to be more generic "Prestera DX" because
> > >there will be more devices supported.
> > > 
> > > 3) Changed firmware image dir path: marvell/ -> mrvl/prestera/
> > >to be aligned with location in linux-firmware.git (if such
> > >will be accepted).
> > > 
> > > RFC v3:
> > > 1) Fix prestera prefix in prestera_rxtx.c
> > > 
> > > 2) Protect concurrent access from multiple ports on multiple CPU 
> > > system
> > >on tx path by spinlock in prestera_rxtx.c
> > > 
> > >

Re: [net-next 1/6] net: marvell: prestera: Add driver for Prestera family ASIC devices

2020-05-30 Thread Ido Schimmel

On Thu, May 28, 2020 at 06:12:40PM +0300, Vadym Kochan wrote:
> Marvell Prestera 98DX326x integrates up to 24 ports of 1GbE with 8
> ports of 10GbE uplinks or 2 ports of 40Gbps stacking for a largely
> wireless SMB deployment.
> 
> The current implementation supports only boards designed for the Marvell
> Switchdev solution and requires special firmware.
> 
> The core Prestera switching logic is implemented in prestera_main.c,
> there is an intermediate hw layer between core logic and firmware. It is
> implemented in prestera_hw.c, the purpose of it is to encapsulate hw
> related logic, in future there is a plan to support more devices with
> different HW related configurations.
> 
> This patch contains only basic switch initialization and RX/TX support
> over SDMA mechanism.
> 
> Currently supported devices have DMA access range <= 32bit and require
> ZONE_DMA to be enabled, for such cases SDMA driver checks if the skb
> allocated in proper range supported by the Prestera device.
> 
> Also meanwhile there is no TX interrupt support in current firmware
> version so recycling work is scheduled on each xmit.
> 
> Port's mac address is generated from the switch base mac which may be
> provided via device-tree (static one or as nvme cell), or randomly
> generated.
> 
> Signed-off-by: Andrii Savka 
> Signed-off-by: Oleksandr Mazur 
> Signed-off-by: Serhiy Boiko 
> Signed-off-by: Serhiy Pshyk 
> Signed-off-by: Taras Chornyi 
> Signed-off-by: Volodymyr Mytnyk 
> Signed-off-by: Vadym Kochan 
> ---
>  drivers/net/ethernet/marvell/Kconfig  |   1 +
>  drivers/net/ethernet/marvell/Makefile |   1 +
>  drivers/net/ethernet/marvell/prestera/Kconfig |  13 +
>  .../net/ethernet/marvell/prestera/Makefile|   4 +
>  .../net/ethernet/marvell/prestera/prestera.h  | 172 
>  .../ethernet/marvell/prestera/prestera_dsa.c  | 134 +++
>  .../ethernet/marvell/prestera/prestera_dsa.h  |  37 +
>  .../ethernet/marvell/prestera/prestera_hw.c   | 610 +
>  .../ethernet/marvell/prestera/prestera_hw.h   |  71 ++
>  .../ethernet/marvell/prestera/prestera_main.c | 506 +++
>  .../ethernet/marvell/prestera/prestera_rxtx.c | 860 ++
>  .../ethernet/marvell/prestera/prestera_rxtx.h |  21 +
>  12 files changed, 2430 insertions(+)
>  create mode 100644 drivers/net/ethernet/marvell/prestera/Kconfig
>  create mode 100644 drivers/net/ethernet/marvell/prestera/Makefile
>  create mode 100644 drivers/net/ethernet/marvell/prestera/prestera.h
>  create mode 100644 drivers/net/ethernet/marvell/prestera/prestera_dsa.c
>  create mode 100644 drivers/net/ethernet/marvell/prestera/prestera_dsa.h
>  create mode 100644 drivers/net/ethernet/marvell/prestera/prestera_hw.c
>  create mode 100644 drivers/net/ethernet/marvell/prestera/prestera_hw.h
>  create mode 100644 drivers/net/ethernet/marvell/prestera/prestera_main.c
>  create mode 100644 drivers/net/ethernet/marvell/prestera/prestera_rxtx.c
>  create mode 100644 drivers/net/ethernet/marvell/prestera/prestera_rxtx.h
> 
> diff --git a/drivers/net/ethernet/marvell/Kconfig 
> b/drivers/net/ethernet/marvell/Kconfig
> index 3d5caea096fb..74313d9e1fc0 100644
> --- a/drivers/net/ethernet/marvell/Kconfig
> +++ b/drivers/net/ethernet/marvell/Kconfig
> @@ -171,5 +171,6 @@ config SKY2_DEBUG
>  
>  
>  source "drivers/net/ethernet/marvell/octeontx2/Kconfig"
> +source "drivers/net/ethernet/marvell/prestera/Kconfig"
>  
>  endif # NET_VENDOR_MARVELL
> diff --git a/drivers/net/ethernet/marvell/Makefile 
> b/drivers/net/ethernet/marvell/Makefile
> index 89dea7284d5b..9f88fe822555 100644
> --- a/drivers/net/ethernet/marvell/Makefile
> +++ b/drivers/net/ethernet/marvell/Makefile
> @@ -12,3 +12,4 @@ obj-$(CONFIG_PXA168_ETH) += pxa168_eth.o
>  obj-$(CONFIG_SKGE) += skge.o
>  obj-$(CONFIG_SKY2) += sky2.o
>  obj-y+= octeontx2/
> +obj-y+= prestera/
> diff --git a/drivers/net/ethernet/marvell/prestera/Kconfig 
> b/drivers/net/ethernet/marvell/prestera/Kconfig
> new file mode 100644
> index ..76b68613ea7a
> --- /dev/null
> +++ b/drivers/net/ethernet/marvell/prestera/Kconfig
> @@ -0,0 +1,13 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +#
> +# Marvell Prestera drivers configuration
> +#
> +
> +config PRESTERA
> + tristate "Marvell Prestera Switch ASICs support"
> + depends on NET_SWITCHDEV && VLAN_8021Q
> + help
> +   This driver supports Marvell Prestera Switch ASICs family.
> +
> +   To compile this driver as a module, choose M here: the
> +   module will be called prestera.
> diff --git a/drivers/net/ethernet/marvell/prestera/Makefile 
> b/drivers/net/ethernet/marvell/prestera/Makefile
> new file mode 100644
> index ..610d75032b78
> --- /dev/null
> +++ b/drivers/net/ethernet/marvell/prestera/Makefile
> @@ -0,0 +1,4 @@
> +# SPDX-License-Identifier: GPL-2.0
> +obj-$(CONFIG_PRESTERA)   += prestera.o
> +prestera-objs:= prestera_main.o prestera_hw.o prestera_dsa.o 
> \
> +

Re: [net-next 0/6] net: marvell: prestera: Add Switchdev driver for Prestera family ASIC device 98DX326x (AC3x)

2020-05-30 Thread Ido Schimmel

On Thu, May 28, 2020 at 06:12:39PM +0300, Vadym Kochan wrote:
> Marvell Prestera 98DX326x integrates up to 24 ports of 1GbE with 8
> ports of 10GbE uplinks or 2 ports of 40Gbps stacking for a largely
> wireless SMB deployment.
> 
> Prestera Switchdev is a firmware based driver that operates via PCI bus.  The
> current implementation supports only boards designed for the Marvell Switchdev
> solution and requires special firmware.
> 
> This driver implementation includes only L1, basic L2 support, and RX/TX.
> 
> The core Prestera switching logic is implemented in prestera_main.c, there is
> an intermediate hw layer between core logic and firmware. It is
> implemented in prestera_hw.c, the purpose of it is to encapsulate hw
> related logic, in future there is a plan to support more devices with
> different HW related configurations.
> 
> The following Switchdev features are supported:
> 
> - VLAN-aware bridge offloading
> - VLAN-unaware bridge offloading
> - FDB offloading (learning, ageing)
> - Switchport configuration
> 
> The firmware image will be uploaded soon to the linux-firmware repository.
> 
> PATCH:
> 1) Fixed W=1 warnings

Hi,

I just applied the patches for review and checkpatch had a lot of
complaints. Some are even ERRORs. For example:

WARNING: do not add new typedefs
#1064: FILE: drivers/net/ethernet/marvell/prestera/prestera_hw.h:32:
+typedef void (*prestera_event_cb_t)

WARNING: line over 80 characters
#2007: FILE: drivers/net/ethernet/marvell/prestera/prestera_rxtx.c:321:
+   __skb_trim(buf->skb, 
PRESTERA_SDMA_RX_DESC_PKT_LEN(desc));

WARNING: line over 80 characters
#2007: FILE: drivers/net/ethernet/marvell/prestera/prestera_rxtx.c:321:
+   __skb_trim(buf->skb, 
PRESTERA_SDMA_RX_DESC_PKT_LEN(desc));

ERROR: Macros with complex values should be enclosed in parentheses
#196: FILE: drivers/net/ethernet/marvell/prestera/prestera_pci.c:161:
+#define PRESTERA_FW_REG_ADDR(fw, reg)  PRESTERA_FW_REG_BASE(fw) + (reg)

WARNING: prefer 'help' over '---help---' for new help texts
#52: FILE: drivers/net/ethernet/marvell/prestera/Kconfig:15:
+config PRESTERA_PCI

...

Also, smatch complaints about:

drivers/net/ethernet/marvell/prestera//prestera_ethtool.c:713
prestera_ethtool_get_strings() error: memcpy() '*prestera_cnt_name' too
small (32 vs 960)

And coccicheck about:

drivers/net/ethernet/marvell/prestera/prestera_hw.c:681:2-3: Unneeded
semicolon

> 
> 2) Renamed PCI driver name to be more generic "Prestera DX" because
>there will be more devices supported.
> 
> 3) Changed firmware image dir path: marvell/ -> mrvl/prestera/
>to be aligned with location in linux-firmware.git (if such
>will be accepted).
> 
> RFC v3:
> 1) Fix prestera prefix in prestera_rxtx.c
> 
> 2) Protect concurrent access from multiple ports on multiple CPU system
>on tx path by spinlock in prestera_rxtx.c
> 
> 3) Try to get base mac address from device-tree, otherwise use a random 
> generated one.
> 
> 4) Move ethtool interface support into separate prestera_ethtool.c file.
> 
> 5) Add basic devlink support and get rid of physical port naming ops.
> 
> 6) Add STP support in Switchdev driver.
> 
> 7) Removed MODULE_AUTHOR
> 
> 8) Renamed prestera.c -> prestera_main.c, and kernel module to
>prestera.ko
> 
> RFC v2:
> 1) Use "pestera_" prefix in struct's and functions instead of mvsw_pr_
> 
> 2) Original series split into additional patches for Switchdev ethtool 
> support.
> 
> 3) Use major and minor firmware version numbers in the firmware image 
> filename.
> 
> 4) Removed not needed prints.
> 
> 5) Use iopoll API for waiting on register's value in prestera_pci.c
> 
> 6) Use standart approach for describing PCI ID matching section instead 
> of using
>custom wrappers in prestera_pci.c
> 
> 7) Add RX/TX support in prestera_rxtx.c.
> 
> 8) Rewritten prestera_switchdev.c with following changes:
>- handle netdev events from prestera.c
> 
>- use struct prestera_bridge for bridge objects, and get rid of
>  struct prestera_bridge_device which may confuse.
> 
>- use refcount_t
> 
> 9) Get rid of macro usage for sending fw requests in prestera_hw.c
> 
> 10) Add base_mac setting as module parameter. base_mac is required for
> generation default port's mac.
> 
> Vadym Kochan (6):
>   net: marvell: prestera: Add driver for Prestera family ASIC devices
>   net: marvell: prestera: Add PCI interface support
>   net: marvell: prestera: Add basic devlink support
>   net: marvell: prestera: Add ethtool interface support
>   net: marvell: prestera: Add Switchdev driver implementation
>   dt-bindings: marvell,prestera: Add description for device-tree
> bindings
> 
>  .../bindings/net/marvell,prestera.txt |   34 +
>  drivers/net/ethernet/marvell/Kconfig  |1 +
>  drivers/net/ethernet/marvell/Makefile

Re: [PATCH][V2][net-next] mlxsw: spectrum_router: remove redundant initialization of pointer br_dev

2020-05-27 Thread Ido Schimmel

On Wed, May 27, 2020 at 09:15:55AM +0100, Colin King wrote:
> From: Colin Ian King 
> 
> The pointer br_dev is being initialized with a value that is never read
> and is being updated with a new value later on. The initialization
> is redundant and can be removed.
> 
> Addresses-Coverity: ("Unused value")
> Signed-off-by: Colin Ian King 

Reviewed-by: Ido Schimmel 

Thanks

Re: [PATCH] mlxsw: spectrum_router: remove redundant initialization of pointer br_dev

2020-05-26 Thread Ido Schimmel

On Tue, May 26, 2020 at 11:56:49PM +0100, Colin King wrote:
> From: Colin Ian King 
> diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c 
> b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
> index 71aee4914619..8f485f9a07a7 100644
> --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
> +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
> @@ -7572,11 +7572,12 @@ static struct mlxsw_sp_fid *
>  mlxsw_sp_rif_vlan_fid_get(struct mlxsw_sp_rif *rif,
> struct netlink_ext_ack *extack)
>  {
> - struct net_device *br_dev = rif->dev;
> + struct net_device *br_dev;
>   u16 vid;
>   int err;
>  
>   if (is_vlan_dev(rif->dev)) {
> +

Colin, please drop the extra blank line and mention in subject prefix
that the patch is for net-next. Thanks!

>   vid = vlan_dev_vlan_id(rif->dev);
>   br_dev = vlan_dev_real_dev(rif->dev);
>   if (WARN_ON(!netif_is_bridge_master(br_dev)))
> -- 
> 2.25.1
>

Re: [PATCH] vlan: fix the bug that cannot create vlan4095

2020-05-18 Thread Ido Schimmel

On Mon, May 18, 2020 at 01:27:55PM +0800, Huang Qijun wrote:
> According to the 8021q standard, the VLAN id range is 1 to 4095.

No, on IEEE8021VlanIndex the standard says:

"A value used to index per-VLAN tables: values of 0 and 4095 are not
permitted. If the value is between 1 and 4094 inclusive, it represents
an IEEE 802.1Q VLAN-ID with global scope within a given bridged domain
(see VlanId textual convention). If the value is greater than 4095, then
it represents a VLAN with scope local to the particular agent, i.e., one
without a global VLAN-ID assigned to it. Such VLANs are outside the
scope of IEEE 802.1Q, but it is convenient to be able to manage them"

>From Wikipedia as well:

"A 12-bit field specifying the VLAN to which the frame belongs. The
hexadecimal values of 0x000 and 0xFFF are reserved. All other values may
be used as VLAN identifiers, allowing up to 4,094 VLANs. [...] The VID
value 0xFFF is reserved for implementation use; it must not be
configured or transmitted. 0xFFF can be used to indicate a wildcard
match in management operations or filtering database entries."

https://en.wikipedia.org/wiki/IEEE_802.1Q

> But in the register_vlan_device function, the range is 1 to 4094,
> because ">= VLAN_VID_MASK" is used to determine whether the id
> is illegal. This will prevent the creation of the vlan4095 interface:
> $ vconfig add sit0 4095
> vconfig: ioctl error for add: Numerical result out of range
> 
> To fix this error, this patch uses ">= VLAN_N_VID" instead to
> determine if the id is illegal.
> 
> Signed-off-by: Huang Qijun 
> ---
>  net/8021q/vlan.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
> index d4bcfd8f95bf..5de7861ddf64 100644
> --- a/net/8021q/vlan.c
> +++ b/net/8021q/vlan.c
> @@ -219,7 +219,7 @@ static int register_vlan_device(struct net_device 
> *real_dev, u16 vlan_id)
>   char name[IFNAMSIZ];
>   int err;
>  
> - if (vlan_id >= VLAN_VID_MASK)
> + if (vlan_id >= VLAN_N_VID)
>   return -ERANGE;
>  
>   err = vlan_check_real_dev(real_dev, htons(ETH_P_8021Q), vlan_id,
> -- 
> 2.17.1
>

Re: [PATCH 2/3] net/scm: cleanup scm_detach_fds

2020-05-13 Thread Ido Schimmel

On Wed, May 13, 2020 at 12:17:51PM +0200, Christoph Hellwig wrote:
> On Wed, May 13, 2020 at 12:10:37PM +0200, Christoph Hellwig wrote:
> > Ok.  I'll see what went wrong for real and will hopefully have a
> > different patch for you in a bit.
> 
> Can you try this patch instead of the previous one?

Works. Thanks a lot!

> 
> diff --git a/net/core/scm.c b/net/core/scm.c
> index a75cd637a71ff..875df1c2989db 100644
> --- a/net/core/scm.c
> +++ b/net/core/scm.c
> @@ -307,7 +307,7 @@ static int __scm_install_fd(struct file *file, int __user 
> *ufd, int o_flags)
>   sock_update_classid(&sock->sk->sk_cgrp_data);
>   }
>   fd_install(new_fd, get_file(file));
> - return error;
> + return 0;
>  }
>  
>  static int scm_max_fds(struct msghdr *msg)

Re: [PATCH 2/3] net/scm: cleanup scm_detach_fds

2020-05-13 Thread Ido Schimmel

On Wed, May 13, 2020 at 11:49:08AM +0200, Christoph Hellwig wrote:
> On Wed, May 13, 2020 at 12:29:18PM +0300, Ido Schimmel wrote:
> > On Mon, May 11, 2020 at 01:59:12PM +0200, Christoph Hellwig wrote:
> > > Factor out two helpes to keep the code tidy.
> > > 
> > > Signed-off-by: Christoph Hellwig 
> > 
> > Christoph,
> > 
> > After installing net-next (fb9f2e92864f) on a Fedora 32 machine I cannot
> > ssh to it. Bisected it to this commit [1].
> > 
> > When trying to connect I see these error messages in journal:
> > 
> > sshd[1029]: error: mm_receive_fd: no message header
> > sshd[1029]: fatal: mm_pty_allocate: receive fds failed
> > sshd[1029]: fatal: mm_request_receive_expect: buffer error: incomplete 
> > message
> > sshd[1018]: fatal: mm_request_receive: read: Connection reset by peer
> > 
> > Please let me know if more info is required. I can easily test a patch
> > if you need me to try something.
> 
> To start we can try reverting just this commit, which requires a
> little manual work.  Patch below:

Thanks for the quick reply. With the below patch ssh is working again.

> 
> ---
> From fe4f53219b42aeded3c1464dbe2bbc9365f6a853 Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig 
> Date: Wed, 13 May 2020 11:48:33 +0200
> Subject: Revert "net/scm: cleanup scm_detach_fds"
> 
> This reverts commit 2618d530dd8b7ac0fdcb83f4c95b88f7b0d37ce6.
> ---
>  net/core/scm.c | 94 +++---
>  1 file changed, 43 insertions(+), 51 deletions(-)
> 
> diff --git a/net/core/scm.c b/net/core/scm.c
> index a75cd637a71ff..2d9aa5682bed2 100644
> --- a/net/core/scm.c
> +++ b/net/core/scm.c
> @@ -280,53 +280,18 @@ void put_cmsg_scm_timestamping(struct msghdr *msg, 
> struct scm_timestamping_inter
>  }
>  EXPORT_SYMBOL(put_cmsg_scm_timestamping);
>  
> -static int __scm_install_fd(struct file *file, int __user *ufd, int o_flags)
> -{
> - struct socket *sock;
> - int new_fd;
> - int error;
> -
> - error = security_file_receive(file);
> - if (error)
> - return error;
> -
> - new_fd = get_unused_fd_flags(o_flags);
> - if (new_fd < 0)
> - return new_fd;
> -
> - error = put_user(new_fd, ufd);
> - if (error) {
> - put_unused_fd(new_fd);
> - return error;
> - }
> -
> - /* Bump the usage count and install the file. */
> - sock = sock_from_file(file, &error);
> - if (sock) {
> - sock_update_netprioidx(&sock->sk->sk_cgrp_data);
> - sock_update_classid(&sock->sk->sk_cgrp_data);
> - }
> - fd_install(new_fd, get_file(file));
> - return error;
> -}
> -
> -static int scm_max_fds(struct msghdr *msg)
> -{
> - if (msg->msg_controllen <= sizeof(struct cmsghdr))
> - return 0;
> - return (msg->msg_controllen - sizeof(struct cmsghdr)) / sizeof(int);
> -}
> -
>  void scm_detach_fds(struct msghdr *msg, struct scm_cookie *scm)
>  {
>   struct cmsghdr __user *cm
>   = (__force struct cmsghdr __user*)msg->msg_control;
> - int o_flags = (msg->msg_flags & MSG_CMSG_CLOEXEC) ? O_CLOEXEC : 0;
> - int fdmax = min_t(int, scm_max_fds(msg), scm->fp->count);
> - int __user *cmsg_data = CMSG_USER_DATA(cm);
> +
> + int fdmax = 0;
> + int fdnum = scm->fp->count;
> + struct file **fp = scm->fp->fp;
> + int __user *cmfptr;
>   int err = 0, i;
>  
> - if (msg->msg_flags & MSG_CMSG_COMPAT) {
> + if (MSG_CMSG_COMPAT & msg->msg_flags) {
>   scm_detach_fds_compat(msg, scm);
>   return;
>   }
> @@ -335,35 +300,62 @@ void scm_detach_fds(struct msghdr *msg, struct 
> scm_cookie *scm)
>   if (WARN_ON_ONCE(!msg->msg_control_is_user))
>   return;
>  
> - for (i = 0; i < fdmax; i++) {
> - err = __scm_install_fd(scm->fp->fp[i], cmsg_data + i, o_flags);
> + if (msg->msg_controllen > sizeof(struct cmsghdr))
> + fdmax = ((msg->msg_controllen - sizeof(struct cmsghdr))
> +  / sizeof(int));
> +
> + if (fdnum < fdmax)
> + fdmax = fdnum;
> +
> + for (i=0, cmfptr =(int __user *)CMSG_USER_DATA(cm); i +  i++, cmfptr++)
> + {
> + struct socket *sock;
> + int new_fd;
> + err = security_file_receive(fp[i]);
>   if (err)
>   break;
> + err = get_unused_fd_flags(MSG_CMSG_CLOEXEC &am

Re: [PATCH 2/3] net/scm: cleanup scm_detach_fds

2020-05-13 Thread Ido Schimmel

On Mon, May 11, 2020 at 01:59:12PM +0200, Christoph Hellwig wrote:
> Factor out two helpes to keep the code tidy.
> 
> Signed-off-by: Christoph Hellwig 

Christoph,

After installing net-next (fb9f2e92864f) on a Fedora 32 machine I cannot
ssh to it. Bisected it to this commit [1].

When trying to connect I see these error messages in journal:

sshd[1029]: error: mm_receive_fd: no message header
sshd[1029]: fatal: mm_pty_allocate: receive fds failed
sshd[1029]: fatal: mm_request_receive_expect: buffer error: incomplete message
sshd[1018]: fatal: mm_request_receive: read: Connection reset by peer

Please let me know if more info is required. I can easily test a patch
if you need me to try something.

Thanks

[1]
git bisect start
# bad: [fb9f2e92864f51d25e790947cca2ac4426a12f9c] net: dsa: tag_sja1105: 
appease sparse checks for ethertype accessors
git bisect bad fb9f2e92864f51d25e790947cca2ac4426a12f9c
# good: [3242956bd610af40e884b530b6c6f76f5bf85f3b] Merge branch 
'net-dsa-Constify-two-tagger-ops'
git bisect good 3242956bd610af40e884b530b6c6f76f5bf85f3b
# bad: [1b0cde4091877cd7fe4b29f67645cc391b86c9ca] sfc: siena_check_caps() can 
be static
git bisect bad 1b0cde4091877cd7fe4b29f67645cc391b86c9ca
# bad: [cba155d591aa28689332bc568632d2f868690be1] ionic: add support for more 
xcvr types
git bisect bad cba155d591aa28689332bc568632d2f868690be1
# bad: [97cf0ef9305bba857f04b914fd59e83813f1eae2] Merge branch 
'improve-msg_control-kernel-vs-user-pointer-handling'
git bisect bad 97cf0ef9305bba857f04b914fd59e83813f1eae2
# bad: [2618d530dd8b7ac0fdcb83f4c95b88f7b0d37ce6] net/scm: cleanup 
scm_detach_fds
git bisect bad 2618d530dd8b7ac0fdcb83f4c95b88f7b0d37ce6
# good: [0462b6bdb6445b887b8896f28be92e0d94c92e7b] net: add a CMSG_USER_DATA 
macro
git bisect good 0462b6bdb6445b887b8896f28be92e0d94c92e7b
# first bad commit: [2618d530dd8b7ac0fdcb83f4c95b88f7b0d37ce6] net/scm: cleanup 
scm_detach_fds

> ---
>  net/core/scm.c | 94 +++---
>  1 file changed, 51 insertions(+), 43 deletions(-)
> 
> diff --git a/net/core/scm.c b/net/core/scm.c
> index abfdc85a64c1b..168b006a52ff9 100644
> --- a/net/core/scm.c
> +++ b/net/core/scm.c
> @@ -277,78 +277,86 @@ void put_cmsg_scm_timestamping(struct msghdr *msg, 
> struct scm_timestamping_inter
>  }
>  EXPORT_SYMBOL(put_cmsg_scm_timestamping);
>  
> +static int __scm_install_fd(struct file *file, int __user *ufd, int o_flags)
> +{
> + struct socket *sock;
> + int new_fd;
> + int error;
> +
> + error = security_file_receive(file);
> + if (error)
> + return error;
> +
> + new_fd = get_unused_fd_flags(o_flags);
> + if (new_fd < 0)
> + return new_fd;
> +
> + error = put_user(new_fd, ufd);
> + if (error) {
> + put_unused_fd(new_fd);
> + return error;
> + }
> +
> + /* Bump the usage count and install the file. */
> + sock = sock_from_file(file, &error);
> + if (sock) {
> + sock_update_netprioidx(&sock->sk->sk_cgrp_data);
> + sock_update_classid(&sock->sk->sk_cgrp_data);
> + }
> + fd_install(new_fd, get_file(file));
> + return error;
> +}
> +
> +static int scm_max_fds(struct msghdr *msg)
> +{
> + if (msg->msg_controllen <= sizeof(struct cmsghdr))
> + return 0;
> + return (msg->msg_controllen - sizeof(struct cmsghdr)) / sizeof(int);
> +}
> +
>  void scm_detach_fds(struct msghdr *msg, struct scm_cookie *scm)
>  {
>   struct cmsghdr __user *cm
>   = (__force struct cmsghdr __user*)msg->msg_control;
> -
> - int fdmax = 0;
> - int fdnum = scm->fp->count;
> - struct file **fp = scm->fp->fp;
> - int __user *cmfptr;
> + int o_flags = (msg->msg_flags & MSG_CMSG_CLOEXEC) ? O_CLOEXEC : 0;
> + int fdmax = min_t(int, scm_max_fds(msg), scm->fp->count);
> + int __user *cmsg_data = CMSG_USER_DATA(cm);
>   int err = 0, i;
>  
> - if (MSG_CMSG_COMPAT & msg->msg_flags) {
> + if (msg->msg_flags & MSG_CMSG_COMPAT) {
>   scm_detach_fds_compat(msg, scm);
>   return;
>   }
>  
> - if (msg->msg_controllen > sizeof(struct cmsghdr))
> - fdmax = ((msg->msg_controllen - sizeof(struct cmsghdr))
> -  / sizeof(int));
> -
> - if (fdnum < fdmax)
> - fdmax = fdnum;
> -
> - for (i=0, cmfptr =(int __user *)CMSG_USER_DATA(cm); i -  i++, cmfptr++)
> - {
> - struct socket *sock;
> - int new_fd;
> - err = security_file_receive(fp[i]);
> + for (i = 0; i < fdmax; i++) {
> + err = __scm_install_fd(scm->fp->fp[i], cmsg_data + i, o_flags);
>   if (err)
>   break;
> - err = get_unused_fd_flags(MSG_CMSG_CLOEXEC & msg->msg_flags
> -   ? O_CLOEXEC : 0);
> - if (err < 0)
> - break;
> - new_fd = err;
> - err = put_user(n

Re: [ipv6] d5382fef70: kernel_selftests.net.fib_tests.sh.fail

2019-09-11 Thread Ido Schimmel

On Wed, Sep 11, 2019 at 05:51:21PM +0800, kernel test robot wrote:
> FYI, we noticed the following commit (built with gcc-7):
> 
> commit: d5382fef70ce273608d6fc652c24f075de3737ef ("ipv6: Stop sending 
> in-kernel notifications for each nexthop")
> https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux.git 
> master
> 
> in testcase: kernel_selftests
> with following parameters:
> 
>   group: kselftests-02
> 
> test-description: The kernel contains a set of "self tests" under the 
> tools/testing/selftests/ directory. These are intended to be small unit tests 
> to exercise individual code paths in the kernel.
> test-url: https://www.kernel.org/doc/Documentation/kselftest.txt
> 
> 
> on test machine: 8 threads Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz with 16G 
> memory
> 
> caused below changes (please refer to attached dmesg/kmsg for entire 
> log/backtrace):
> 
> 
> If you fix the issue, kindly add following tag
> Reported-by: kernel test robot 
> 
> 
> # selftests: net: fib_tests.sh

...

> # Tests passed: 137
> # Tests failed:  12
> not ok 13 selftests: net: fib_tests.sh

Thanks for the report, but I can't reproduce this. Used your attached
config and booted a kernel with the HEAD you mentioned:

$ uname -r
5.2.0-rc5-01201-gd5382fef70ce

$ ./fib_tests.sh
...
Tests passed: 150
Tests failed:   0

Also tested: net, net-next, master

All look fine. Plus, I'm pretty sure I ran this test before initially
submitting the patchset.

Can you reliably reproduce these failures on current kernels or is this
some random failure you noticed while testing the mentioned commit?

Thanks

Re: [PATCH] lib/Kconfig: fix OBJAGG in lib/ menu structure

2019-09-09 Thread Ido Schimmel

On Mon, Sep 09, 2019 at 02:54:21PM -0700, Randy Dunlap wrote:
> From: Randy Dunlap 
> 
> Keep the "Library routines" menu intact by moving OBJAGG into it.
> Otherwise OBJAGG is displayed/presented as an orphan in the
> various config menus.
> 
> Fixes: 0a020d416d0a ("lib: introduce initial implementation of object 
> aggregation manager")
> Signed-off-by: Randy Dunlap 
> Cc: Jiri Pirko 
> Cc: Ido Schimmel 
> Cc: David S. Miller 

Tested-by: Ido Schimmel 

Thanks!

Re: [PATCH v3 1/2] net: core: Notify on changes to dev->promiscuity.

2019-09-08 Thread Ido Schimmel

On Tue, Sep 03, 2019 at 10:14:12AM +0200, Allan W. Nielsen wrote:
> The 09/03/2019 09:13, Ido Schimmel wrote:
> > On Mon, Sep 02, 2019 at 07:42:31PM +0200, Allan W. Nielsen wrote:
> > With these patches applied I assume I will see the following traffic
> > when running tcpdump on one of the netdevs exposed by the ocelot driver:
> > 
> > - Ingress: All
> > - Egress: Only locally generated traffic and traffic forwarded by the
> >   kernel from interfaces not belonging to the ocelot driver
> > 
> > The above means I will not see any offloaded traffic transmitted by the
> > port. Is that correct?
> Correct - but maybe we should change this.
> 
> In Ocelot and in LAN (the part we are working on now), we can come pretty
> close. We can get the offloaded TX traffic to the CPU, but it will not be
> re-written (it will look like the ingress frame, which is not always the same 
> as
> the egress frame, vlan tags an others may be re-written).

Yes, this is the same with mlxsw. You can trap the egress frames, but
they will reach the CPU unmodified via the ingress port.

> In some of our chips we can actually do this (not Ocelot, and not the LAN
> part we are working on now) after the frame as been re-written.

Cool.

> > I see that the driver is setting 'offload_fwd_mark' for any traffic trapped
> > from bridged ports, which means the bridge will drop it before it traverses
> > the packet taps on egress.
> Correct.
> 
> > Large parts of the discussion revolve around the fact that switch ports
> > are not any different than other ports. Dave wrote "Please stop
> > portraying switches as special in this regard" and Andrew wrote "[The
> > user] just wants tcpdump to work like on their desktop."
> And we are trying to get as close to this as practical possible, knowing that 
> it
> may not be exactly the same.
> 
> > But if anything, this discussion proves that switch ports are special in
> > this regard and that tcpdump will not work like on the desktop.
> I think it can come really close. Some drivers may be able to fix the TX issue
> you point out, others may not.
> 
> > Beside the fact that I don't agree (but gave up) with the new
> > interpretation of promisc mode, I wonder if we're not asking for trouble
> > with this patchset. Users will see all offloaded traffic on ingress, but
> > none of it on egress. This is in contrast to the sever/desktop, where
> > Linux is much more dominant in comparison to switches (let alone hw
> > accelerated ones) and where all the traffic is visible through tcpdump.
> > I can already see myself having to explain this over and over again to
> > confused users.
> > 
> > Now, I understand that showing egress traffic is inherently difficult.
> > It means one of two things:
> > 
> > 1. We allow packets to be forwarded by both the software and the
> > hardware
> > 2. We trap all ingressing traffic from all the ports
> If the HW cannot copy the egress traffic to the CPU (which our HW cannot), 
> then
> you need to do both. All ingress traffic needs to go to the CPU, you need to
> make all the forwarding decisions in the CPU, to figure out what traffic 
> happens
> to go to the port you want to monitor.
> 
> I really doubt this will work in real life. Too much traffic, and HW may make
> different forwarding decision that the SW (tc rules in HW but not in SW), 
> which
> means that it will not be good for debugging anyway.

I agree.

> 
> > Both options can have devastating effects on the network and therefore
> > should not be triggered by a supposedly innocent invocation of tcpdump.
> Agree.
> 
> > I again wonder if it would not be wiser to solve this by introducing two
> > new flags to tcpdump for ingress/egress (similar to -Q in/out) capturing
> > of offloaded traffic. The capturing of egress offloaded traffic can be
> > documented with the appropriate warnings.
> Not sure I agree, but I will try to spend some more time considering it.
> 
> In the mean while, what TC action was it that Jiri suggestion we should use? 
> The
> trap action is no good, and it prevents the forwarding in silicon, and I'm not
> aware of a "COPY-TO-CPU" action.

I agree. We would either need a new or just extend the existing one with
a new attribute.

> > Anyway, I don't want to hold you up, I merely want to make sure that the
> > above (assuming it's correct) is considered before the patches are
> > applied.
> Sounds good, and thanks for all the time spend on reviewing and asking the
> critical questions.

Thanks for bringing up these issues. I will be happy to review future
patches.

Re: [PATCH v3 1/2] net: core: Notify on changes to dev->promiscuity.

2019-09-02 Thread Ido Schimmel

On Mon, Sep 02, 2019 at 07:42:31PM +0200, Allan W. Nielsen wrote:
> I have been reading through this thread several times and I still do not get 
> it.

Allan,

I kept thinking about this and I want to make sure that I correctly
understand the end result.

With these patches applied I assume I will see the following traffic
when running tcpdump on one of the netdevs exposed by the ocelot driver:

- Ingress: All
- Egress: Only locally generated traffic and traffic forwarded by the
  kernel from interfaces not belonging to the ocelot driver

The above means I will not see any offloaded traffic transmitted by the
port. Is that correct? I see that the driver is setting
'offload_fwd_mark' for any traffic trapped from bridged ports, which
means the bridge will drop it before it traverses the packet taps on
egress.

Large parts of the discussion revolve around the fact that switch ports
are not any different than other ports. Dave wrote "Please stop
portraying switches as special in this regard" and Andrew wrote "[The
user] just wants tcpdump to work like on their desktop."

But if anything, this discussion proves that switch ports are special in
this regard and that tcpdump will not work like on the desktop.

Beside the fact that I don't agree (but gave up) with the new
interpretation of promisc mode, I wonder if we're not asking for trouble
with this patchset. Users will see all offloaded traffic on ingress, but
none of it on egress. This is in contrast to the sever/desktop, where
Linux is much more dominant in comparison to switches (let alone hw
accelerated ones) and where all the traffic is visible through tcpdump.
I can already see myself having to explain this over and over again to
confused users.

Now, I understand that showing egress traffic is inherently difficult.
It means one of two things:

1. We allow packets to be forwarded by both the software and the
hardware
2. We trap all ingressing traffic from all the ports

Both options can have devastating effects on the network and therefore
should not be triggered by a supposedly innocent invocation of tcpdump.

I again wonder if it would not be wiser to solve this by introducing two
new flags to tcpdump for ingress/egress (similar to -Q in/out) capturing
of offloaded traffic. The capturing of egress offloaded traffic can be
documented with the appropriate warnings.

Anyway, I don't want to hold you up, I merely want to make sure that the
above (assuming it's correct) is considered before the patches are
applied.

Thanks

Re: [PATCH v3 1/2] net: core: Notify on changes to dev->promiscuity.

2019-08-31 Thread Ido Schimmel

On Sat, Aug 31, 2019 at 09:35:56PM +0200, Andrew Lunn wrote:
> > Also, what happens when I'm running these application without putting
> > the interface in promisc mode? On an offloaded interface I would not be
> > able to even capture packets addressed to my interface's MAC address.
> 
> Sorry for rejoining the discussion late. I've been travelling and i'm
> now 3/4 of the way to Lisbon.

Hi Andrew,

Have fun!

> That statement i don't get. 

What about the other statements?

> If the frame has the MAC address of the interface, it has to be
> delivered to the CPU. 

So every packet that needs to be routed should be delivered to the CPU?
Definitely not.

> And so pcap will see it when running on the interface. I can pretty
> much guarantee every DSA driver does that.

I assume because you currently only consider L2 forwarding.

> But to address the bigger picture. My understanding is that we want to
> model offloading as a mechanism to accelerate what Linux can already
> do. The user should not have to care about these accelerators. The
> interface should work like a normal Linux interface. I can put an IP
> address on it and ping a peer. I can run a dhcp client and get an IP
> address from a dhcp server. I can add the interface to a bridge, and
> packets will get bridged. I as a user should not need to care if this
> is done in software, or accelerated by offloading it. I can add a
> route, and if the accelerate knows about L3, it can accelerate that as
> well. If not, the kernel will route it.

Yep, and this is how it's all working today.

> So if i run wireshark on an interface, i expect the interface will be
> put into promisc mode and i see all packets ingressing the interface.
> What the accelerator needs to do to achieve this, i as a user don't
> care.
> 
> I can follow the argument that i won't necessarily see every
> packet. But that is always true. For many embedded systems, the CPU is
> too slow to receive at line rate, even when we are talking about 1G
> links. Packets do get dropped. And i hope tcpdump users understand
> that.
> 
> For me, having tcpdump use tc trap is just wrong. It breaks the model
> that the user should not care about the accelerator. If anything, i
> think the driver needs to translate cBPF which pcap passes to the
> kernel to whatever internal format the accelerator can process. That
> is just another example of using hardware acceleration.

Look, this again boils down to what promisc mode means with regards to
hardware offload. You want it to mean punt all traffic to the CPU? Fine.
Does not seem like anyone will be switching sides anyway, so lets move
forward. But the current approach is not good. Each driver needs to have
this special case logic and the semantics of promisc mode change not
only with regards to the value of the promisc counter, but also with
regards to the interface's uppers. This is highly fragile and confusing.

The approach taken in v2 makes much more sense. Add a new flag to
accelerators and have the networking stack check it before putting the
interface in promisc mode. Then the only thing drivers need to do is to
instruct the accelerator to trap all traffic to the CPU.

Re: [PATCH v3 1/2] net: core: Notify on changes to dev->promiscuity.

2019-08-30 Thread Ido Schimmel

On Thu, Aug 29, 2019 at 03:12:01PM -0700, David Miller wrote:
> From: Ido Schimmel 
> Date: Thu, 29 Aug 2019 22:36:13 +0300
> 
> > I fully agree that we should make it easy for users to capture offloaded
> > traffic, which is why I suggested patching libpcap. Add a flag to
> > capable netdevs that tells libpcap that in order to capture all the
> > traffic from this interface it needs to add a tc filter with a trap
> > action. That way zero familiarity with tc is required from users.
> 
> Why not just make setting promisc mode on the device do this rather than
> require all of this tc indirection nonsense?
> 
> That's the whole point of this conversation I thought?

As I understand it, the goal of this series is to be able to capture
offloaded traffic.

Currently, the only indication that drivers get when someone is running
tcpdump/tshark/whatever over an interface is that it's is put in promisc
mode. So this patches use it as an indication to trap all the traffic to
the CPU and special case the bridge in order to prevent the driver from
trapping packets when its ports are bridged. The same will have to be
done for OVS and in any other (current or future) cases where promisc
mode is needed to disable Rx filtering.

If there was another indication that these applications are running over
an interface, would we be bothering with this new interpretation of
promisc mode and special casing? I don't think so.

We can instead teach libpcap that in order to capture offloaded traffic
it should use this "tc indirection nonsense". Or turn on a new knob.
This avoids the need to change each driver with this new special case
logic.

Also, what happens when I'm running these application without putting
the interface in promisc mode? On an offloaded interface I would not be
able to even capture packets addressed to my interface's MAC address.
What happens when the interface is already in promisc mode (because of
the bridge) and I'm again running tcpdump with '-p'? I will not be able
to capture offloaded traffic despite the fact that the interface is
already configured in promisc mode.

Re: [PATCH v3 1/2] net: core: Notify on changes to dev->promiscuity.

2019-08-29 Thread Ido Schimmel

On Thu, Aug 29, 2019 at 08:29:57PM +0200, Andrew Lunn wrote:
> > Hi Andrew,
> > 
> > What happens when you run tcpdump on a routed interface without putting
> > it in promiscuous mode ('-p')? If it is a pure software switch, then you
> > see all unicast packets addressed to your interface's MAC address. What
> > happens when the same is done on a hardware switch? With the proposed
> > solution you will not get the same result.
> > 
> > On a software switch, when you run tcpdump without '-p', do you incur
> > major packet loss? No. Will this happen when you punt several Tbps to
> > your CPU on the hardware switch? Yes.
> 
> Hi Ido
> 
> Please think about the general case, not your hardware. A DSA switch
> generally has 1G ports. And the connection to the host is generally
> 1G, maybe 2.5G. So if i put one interface into promisc mode, i will
> probably receive the majority of the traffic on that port, so long as
> there is not too much traffic from other ports towards the CPU.
> 
> I also don't expect any major packet loss in the switch. It is still
> hardware switching, but also sending a copy to the CPU. That copy will
> have the offload_fwd_mark bit set, so the bridge will discard the
> frame. The switch egress queue towards the CPU might overflow, but
> that means tcpdump does not get to see all the frames, and some
> traffic which is actually heading to the CPU is lost. But that can
> happen anyway.

The potential packet loss was only one example why using promiscuous
mode as an indication to punt all traffic to the CPU is wrong. I also
mentioned that you will not capture any traffic (besides
control/exception) when '-p' is specified.

> We should also think about the different classes of users. Somebody
> using a TOR switch with a NOS is very different to a user of a SOHO
> switch in their WiFi access point. The first probably knows tc very
> well, the second has probably never heard of it, and just wants
> tcpdump to work like on their desktop.

I fully agree that we should make it easy for users to capture offloaded
traffic, which is why I suggested patching libpcap. Add a flag to
capable netdevs that tells libpcap that in order to capture all the
traffic from this interface it needs to add a tc filter with a trap
action. That way zero familiarity with tc is required from users.

I really believe that instead of interpreting IFF_PROMISC in exotic ways
and pushing all this logic into the kernel, we should instead teach user
space utilities to capture offloaded traffic.

Re: [PATCH v3 1/2] net: core: Notify on changes to dev->promiscuity.

2019-08-29 Thread Ido Schimmel

On Thu, Aug 29, 2019 at 04:37:32PM +0200, Andrew Lunn wrote:
> > Wait, I believe there has been some misundestanding. Promisc mode is NOT
> > about getting packets to the cpu. It's about setting hw filters in a way
> > that no rx packet is dropped.
> > 
> > If you want to get packets from the hw forwarding dataplane to cpu, you
> > should not use promisc mode for that. That would be incorrect.
> 
> Hi Jiri
> 
> I'm not sure a wireshark/tcpdump/pcap user would agree with you. They
> want to see packets on an interface, so they use these tools. The fact
> that the interface is a switch interface should not matter. The
> switchdev model is that we try to hide away the interface happens to
> be on a switch, you can just use it as normal. So why should promisc
> mode not work as normal?

Hi Andrew,

What happens when you run tcpdump on a routed interface without putting
it in promiscuous mode ('-p')? If it is a pure software switch, then you
see all unicast packets addressed to your interface's MAC address. What
happens when the same is done on a hardware switch? With the proposed
solution you will not get the same result.

On a software switch, when you run tcpdump without '-p', do you incur
major packet loss? No. Will this happen when you punt several Tbps to
your CPU on the hardware switch? Yes.

Extending the definition of promiscuous mode to mean punt all traffic to
the CPU is wrong, IMO. You will not be able to capture all the packets
anyway. If you have both elephant and mice flows, then it is very likely
you will not be able to see any packets from the mice flows. The way
this kind of monitoring is usually done is by either sampling packets
(see tc-sample) or mirroring it to capable server. Both options are
available in Linux today.

> > If you want to get packets from the hw forwarding dataplane to cpu, you
> > should use tc trap action. It is there exactly for this purpose.
> 
> Do you really think a wireshark/tcpdump/pcap user should need to use
> tc trap for the special case the interface is a switch port? Doesn't that
> break the switchdev model?

I do not object to the overall goal, but I believe to implementation is
wrong. Instead, it would be much better to extend tshark/tcpdump and
with another flag that will instruct libpcap to install a rule that will
trap all traffic to the CPU. You can do that on either ingress or egress
using matchall and trap action.

If you want to do it without specifying a special flag (I think it's
very dangerous due to the potential packet loss), you can add a flag to
the interface that will indicate to libpcap that installing a tc filter
with trap action is required.

> tc trap is more about fine grained selection of packets.

Depends on the filter you associate with the action. If it's matchall,
then it's not fine grained at all :)

> Also, it seems like trapped packets are not forwarded, which is not
> what you would expect from wireshark/tcpdump/pcap.

How do you mean? Not forwarded by the HW? Right. But the trapped packets
are forwarded by the kernel. We can also add another action that means
both trap and forward. In mlxsw terminology it's called mirror.

Re: [net] devlink: Add method for time-stamp on reporter's dump

2019-08-22 Thread Ido Schimmel

On Thu, Aug 22, 2019 at 04:06:35PM +0200, Andrew Lunn wrote:
> On Thu, Aug 22, 2019 at 11:17:51AM +0300, Aya Levin wrote:
> > When setting the dump's time-stamp, use ktime_get_real in addition to
> > jiffies. This simplifies the user space implementation and bypasses
> > some inconsistent behavior with translating jiffies to current time.
> 
> Hi Aya
> 
> Is this year 2038 safe? I don't know enough about this to answer the
> question myself.

Hi Andrew,

Good point. 'struct timespec' is not considered year 2038 safe and
unfortunately I recently made the mistake of using it to communicate
timestamps to user space over netlink. :/ The code is still in net-next,
so I will fix it while I can.

Arnd, would it be acceptable to use 'struct __kernel_timespec' instead?

Thanks

Re: linux-next: Tree for Aug 19 (drivers/net/netdevsim/dev.o)

2019-08-20 Thread Ido Schimmel

On Mon, Aug 19, 2019 at 09:16:13PM -0700, Randy Dunlap wrote:
> On 8/19/19 2:18 AM, Stephen Rothwell wrote:
> > Hi all,
> > 
> > Changes since 20190816:
> > 
> 
> on x86_64:
> # CONFIG_INET is not set
> 
> ld: drivers/net/netdevsim/dev.o: in function `nsim_dev_trap_report_work':
> dev.c:(.text+0x52f): undefined reference to `ip_send_check'
> 
> 
> Full randconfig file is attached.

Randy,

YueHaibing sent a v2 which is available here [1]. Successfully tested
it with your config.

[1] https://patchwork.ozlabs.org/patch/1150183/

Re: [PATCH v2 net-next] netdevsim: Fix build error without CONFIG_INET

2019-08-20 Thread Ido Schimmel

On Tue, Aug 20, 2019 at 10:14:46PM +0800, YueHaibing wrote:
> If CONFIG_INET is not set, building fails:
> 
> drivers/net/netdevsim/dev.o: In function `nsim_dev_trap_report_work':
> dev.c:(.text+0x67b): undefined reference to `ip_send_check'
> 
> Use ip_fast_csum instead of ip_send_check to avoid
> dependencies on CONFIG_INET.
> 
> Reported-by: Hulk Robot 
> Fixes: da58f90f11f5 ("netdevsim: Add devlink-trap support")
> Signed-off-by: YueHaibing 

Reviewed-by: Ido Schimmel 
Tested-by: Ido Schimmel 

Thanks for fixing this in my stead!

Re: linux-next: Tree for Aug 19 (drivers/net/netdevsim/dev.o)

2019-08-20 Thread Ido Schimmel

On Mon, Aug 19, 2019 at 09:16:13PM -0700, Randy Dunlap wrote:
> On 8/19/19 2:18 AM, Stephen Rothwell wrote:
> > Hi all,
> > 
> > Changes since 20190816:
> > 
> 
> on x86_64:
> # CONFIG_INET is not set
> 
> ld: drivers/net/netdevsim/dev.o: in function `nsim_dev_trap_report_work':
> dev.c:(.text+0x52f): undefined reference to `ip_send_check'

Thanks, Randy.

There is a fix here [1], but some changes were requested. I will send a
fix later today and Cc you.

[1] https://patchwork.ozlabs.org/patch/1149229/

Re: [PATCH net-next] netdevsim: Fix build error without CONFIG_INET

2019-08-20 Thread Ido Schimmel

On Mon, Aug 19, 2019 at 02:59:00PM -0700, Jakub Kicinski wrote:
> On Mon, 19 Aug 2019 20:08:25 +0800, YueHaibing wrote:
> > If CONFIG_INET is not set, building fails:
> > 
> > drivers/net/netdevsim/dev.o: In function `nsim_dev_trap_report_work':
> > dev.c:(.text+0x67b): undefined reference to `ip_send_check'
> > 
> > Add CONFIG_INET Kconfig dependency to fix this.
> > 
> > Reported-by: Hulk Robot 
> > Fixes: da58f90f11f5 ("netdevsim: Add devlink-trap support")
> > Signed-off-by: YueHaibing 
> 
> Hmm.. I'd rather the test module did not have hard dependencies on
> marginally important config options. We have done a pretty good job
> so far limiting the requirements though separating the code out at
> compilation object level. The more tests depend on netdevsim and the
> more bots we have running tests against randconfig - the more important
> this is.
> 
> This missing reference here is for calculating a checksum over a
> constant header.. could we perhaps just hard code the checksum?

Sure. I was AFK today, will send a patch later today when I get home.

Thanks

Re: [PATCH net-next] netdevsim: Fix build error without CONFIG_INET

2019-08-19 Thread Ido Schimmel

On Mon, Aug 19, 2019 at 08:08:25PM +0800, YueHaibing wrote:
> If CONFIG_INET is not set, building fails:
> 
> drivers/net/netdevsim/dev.o: In function `nsim_dev_trap_report_work':
> dev.c:(.text+0x67b): undefined reference to `ip_send_check'
> 
> Add CONFIG_INET Kconfig dependency to fix this.
> 
> Reported-by: Hulk Robot 
> Fixes: da58f90f11f5 ("netdevsim: Add devlink-trap support")
> Signed-off-by: YueHaibing 

Reviewed-by: Ido Schimmel 

Thanks for the patch.

> ---
>  drivers/net/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
> index 48e209e..7bb786e 100644
> --- a/drivers/net/Kconfig
> +++ b/drivers/net/Kconfig
> @@ -505,7 +505,7 @@ source "drivers/net/hyperv/Kconfig"
>  
>  config NETDEVSIM
>   tristate "Simulated networking device"
> - depends on DEBUG_FS
> + depends on INET && DEBUG_FS
>   select NET_DEVLINK
>   help
> This driver is a developer testing tool and software model that can
> -- 
> 2.7.4
> 
>

Re: [PATCH net-next] selftests: mlxsw: Fix typo in qos_mc_aware.sh

2019-07-24 Thread Ido Schimmel

On Thu, Jul 25, 2019 at 12:29:51AM +0900, Masanari Iida wrote:
> This patch fix some spelling typo in qos_mc_aware.sh
> 
> Signed-off-by: Masanari Iida 
> Acked-by: Randy Dunlap 

Reviewed-by: Ido Schimmel

Re: [PATCH] selftests: mlxsw: Fix typo in qos_mc_aware.sh

2019-07-24 Thread Ido Schimmel

On Wed, Jul 24, 2019 at 11:15:54PM +0900, Masanari Iida wrote:
> This patch fixes some spelling typo in qos_mc_aware.sh
> 
> Signed-off-by: Masanari Iida 

Thanks for the patch, but it should go through the networking tree.

Please resubmit to net...@vger.kernel.org and make sure subject prefix
is "PATCH net-next". Please read this FAQ before you submit:

https://www.kernel.org/doc/Documentation/networking/netdev-FAQ.txt

Thanks!

Re: general protection fault in call_fib6_multipath_entry_notifiers

2019-06-19 Thread Ido Schimmel

On Wed, Jun 19, 2019 at 08:47:07AM -0700, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:39f58860 net/mlx5: add missing void argument to function m..
> git tree:   net-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=115eb99ea0
> kernel config:  https://syzkaller.appspot.com/x/.config?x=4937094fc643655f
> dashboard link: https://syzkaller.appspot.com/bug?extid=382566d339d52cd1a204
> compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
> syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=120c9e11a0
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=120c3d21a0
> 
> The bug was bisected to:
> 
> commit ebee3cad835f7fe7250213225cf6d62c7cf3b2ca
> Author: Ido Schimmel 
> Date:   Tue Jun 18 15:12:48 2019 +
> 
> ipv6: Add IPv6 multipath notifications for add / replace
> 
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=1529970aa0
> final crash:https://syzkaller.appspot.com/x/report.txt?x=1729970aa0
> console output: https://syzkaller.appspot.com/x/log.txt?x=1329970aa0
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+382566d339d52cd1a...@syzkaller.appspotmail.com
> Fixes: ebee3cad835f ("ipv6: Add IPv6 multipath notifications for add /
> replace")
> 
> kasan: CONFIG_KASAN_INLINE enabled
> kasan: GPF could be caused by NULL-ptr deref or user memory access
> general protection fault:  [#1] PREEMPT SMP KASAN
> CPU: 0 PID: 9190 Comm: syz-executor149 Not tainted 5.2.0-rc5+ #38
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> RIP: 0010:call_fib6_multipath_entry_notifiers+0xd1/0x1a0

Did not consider the case where RTA_MULTIPATH does not encode valid
nexthop information in which case 'rt_notif' can be NULL. Will send a
fix after I run some tests.

> net/ipv6/ip6_fib.c:396
> Code: 8b b5 30 ff ff ff 48 c7 85 68 ff ff ff 00 00 00 00 48 c7 85 70 ff ff
> ff 00 00 00 00 89 45 88 4c 89 e0 48 c1 e8 03 4c 89 65 80 <42> 80 3c 28 00 0f
> 85 9a 00 00 00 48 b8 00 00 00 00 00 fc ff df 4d
> RSP: 0018:88809788f2c0 EFLAGS: 00010246
> RAX:  RBX: 111012f11e59 RCX: 
> RDX:  RSI:  RDI: 
> RBP: 88809788f390 R08: 88809788f8c0 R09: 000c
> R10: 88809788f5d8 R11: 88809788f527 R12: 
> R13: dc00 R14: 88809788f8c0 R15: 89541d80
> FS:  5632c880() GS:8880ae80() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 2080 CR3: 9ba7c000 CR4: 001406f0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> Call Trace:
>  ip6_route_multipath_add+0xc55/0x1490 net/ipv6/route.c:5094
>  inet6_rtm_newroute+0xed/0x180 net/ipv6/route.c:5208
>  rtnetlink_rcv_msg+0x463/0xb00 net/core/rtnetlink.c:5219
>  netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
>  rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5237
>  netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
>  netlink_unicast+0x531/0x710 net/netlink/af_netlink.c:1328
>  netlink_sendmsg+0x8ae/0xd70 net/netlink/af_netlink.c:1917
>  sock_sendmsg_nosec net/socket.c:646 [inline]
>  sock_sendmsg+0xd7/0x130 net/socket.c:665
>  ___sys_sendmsg+0x803/0x920 net/socket.c:2286
>  __sys_sendmsg+0x105/0x1d0 net/socket.c:2324
>  __do_sys_sendmsg net/socket.c:2333 [inline]
>  __se_sys_sendmsg net/socket.c:2331 [inline]
>  __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2331
>  do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x4401f9
> Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7
> 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff
> 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00
> RSP: 002b:7ffc09fd0028 EFLAGS: 0246 ORIG_RAX: 002e
> RAX: ffda RBX: 004002c8 RCX: 004401f9
> RDX:  RSI: 2080 RDI: 0003
> RBP: 006ca018 R08:  R09: 004002c8
> R10:  R11: 0246 R12: 00401a80
> R13: 00401b10 R14:  R15: 
> Modules linked in:
> ---[ end trace 77949df4cfac115c ]---
> RIP: 0010:call_fib6_multipath_entry_notifiers+0xd1/0x1a0
> net/ipv6/ip6_fib.c:396
> Code: 8b b5 30 ff ff ff 48 c7 85 68 ff ff ff 00 00 00 00 48 c7 85 70 ff ff
> ff 00 00 00 00 89 45 88 4c 89 e0 48 c1 e8 03 4c 89 65 80 <42>

Re: [PATCH] mlxsw: spectrum_ptp: fix 32-bit build

2019-06-19 Thread Ido Schimmel

On Wed, Jun 19, 2019 at 03:31:20PM +0200, Arnd Bergmann wrote:
> On 32-bit architectures, we cannot easily device 64-bit numbers:
> 
> ERROR: "__aeabi_uldivmod" 
> [drivers/net/ethernet/mellanox/mlxsw/mlxsw_spectrum.ko] undefined!
> 
> Use do_div() to annotate the fact that we know this is an
> expensive operation.
> 
> Fixes: 992aa864dca0 ("mlxsw: spectrum_ptp: Add implementation for physical 
> hardware clock operations")
> Signed-off-by: Arnd Bergmann 

Arnd, thanks for the patch. We already patched this issue yesterday:
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=cd4bb2a3344cb53d9234cca232edfb2dce0f0a35

Re: [PATCH 2/2] PM / arch: x86: MSR_IA32_ENERGY_PERF_BIAS sysfs interface

2019-05-09 Thread Ido Schimmel

On Thu, May 09, 2019 at 07:18:28PM +0200, Rafael J. Wysocki wrote:
> So does the patch below fix it for you?

Yes. Thanks for the fix. Feel free to add my tag:

Tested-by: Ido Schimmel

Re: [PATCH 2/2] PM / arch: x86: MSR_IA32_ENERGY_PERF_BIAS sysfs interface

2019-05-09 Thread Ido Schimmel

On Thu, Mar 21, 2019 at 11:20:17PM +0100, Rafael J. Wysocki wrote:
> +static struct attribute *intel_epb_attrs[] = {
> + &dev_attr_energy_perf_bias.attr,
> + NULL
> +};
> +
> +static const struct attribute_group intel_epb_attr_group = {
> + .name = power_group_name,
> + .attrs =  intel_epb_attrs
> +};
> +
>  static int intel_epb_online(unsigned int cpu)
>  {
> + struct device *cpu_dev = get_cpu_device(cpu);
> +
>   intel_epb_restore();
> + if (!cpuhp_tasks_frozen)
> + sysfs_merge_group(&cpu_dev->kobj, &intel_epb_attr_group);
> +
>   return 0;
>  }
>  
>  static int intel_epb_offline(unsigned int cpu)
>  {
> - return intel_epb_save();
> + struct device *cpu_dev = get_cpu_device(cpu);
> +
> + if (!cpuhp_tasks_frozen)
> + sysfs_unmerge_group(&cpu_dev->kobj, &intel_epb_attr_group);
> +
> + intel_epb_save();
> + return 0;
>  }

Hi,

I just booted net-next and got the following NULL pointer dereference
[1] during boot. I believe it is caused by this patch.

CONFIG_PM is disabled in my config which means 'power_group_name' is
defined as NULL. When I enable CONFIG_PM the issue is not reproduced.

Thanks

[1]
[1.230241] BUG: kernel NULL pointer dereference, address: 
[1.231043] #PF: supervisor read access in kernel mode
[1.231043] #PF: error_code(0x) - not-present page
[1.231043] PGD 0 P4D 0
[1.231043] Oops:  [#1] SMP
[1.231043] CPU: 0 PID: 12 Comm: cpuhp/0 Not tainted 
5.1.0-custom-07273-g80f232121b69 #1392
[1.231043] Hardware name: Mellanox Technologies Ltd. 
MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016
[1.231043] RIP: 0010:strlen+0x0/0x20
[1.231043] Code: b5 20 75 eb c6 42 01 00 0f b6 10 f6 82 40 bf 4d b5 20 74 
14 48 c7 c1 40 bf 4d b5 48 83 c0 01 0f b6 10 f6 04 11 20 75 f3 c3 90 <80> 3f 00 
74 10 48 89 f8
48 83 c0 01 80 38 00 75 f7 48 29 f8 c3 31
[1.231043] RSP: :b587c0cd3dc8 EFLAGS: 00010246
[1.231043] RAX:  RBX:  RCX: 0100
[1.231043] RDX:  RSI:  RDI: 
[1.231043] RBP:  R08: 8e6137a160c8 R09: 
[1.231043] R10:  R11: 8e613652ec80 R12: 
[1.231043] R13:  R14: 8e6137a160c8 R15: b4690120
[1.231043] FS:  () GS:8e6137a0() 
knlGS:
[1.231043] CS:  0010 DS:  ES:  CR0: 80050033
[1.231043] CR2:  CR3: 000200409000 CR4: 001006f0
[1.231043] Call Trace:
[1.231043]  kernfs_name_hash+0xd/0x80
[1.231043]  kernfs_find_ns+0x30/0xc0
[1.231043]  kernfs_find_and_get_ns+0x27/0x50
[1.231043]  sysfs_merge_group+0x2e/0x100
[1.231043]  ? __switch_to_asm+0x40/0x70
[1.231043]  intel_epb_online+0x2a/0x30
[1.231043]  cpuhp_invoke_callback+0x8f/0x550
[1.231043]  ? sort_range+0x20/0x20
[1.231043]  cpuhp_thread_fun+0x9b/0x100
[1.231043]  smpboot_thread_fn+0xc0/0x160
[1.231043]  kthread+0x10d/0x130
[1.231043]  ? __kthread_create_on_node+0x180/0x180
[1.231043]  ret_from_fork+0x35/0x40
[1.231043] CR2: 
[1.231043] ---[ end trace c8ea60276791261c ]---
[1.231043] RIP: 0010:strlen+0x0/0x20
[1.231043] Code: b5 20 75 eb c6 42 01 00 0f b6 10 f6 82 40 bf 4d b5 20 74 
14 48 c7 c1 40 bf 4d b5 48 83 c0 01 0f b6 10 f6 04 11 20 75 f3 c3 90 <80> 3f 00 
74 10 48 89 f8
48 83 c0 01 80 38 00 75 f7 48 29 f8 c3 31
[1.231043] RSP: :b587c0cd3dc8 EFLAGS: 00010246
[1.231043] RAX:  RBX:  RCX: 0100
[1.231043] RDX:  RSI:  RDI: 
[1.231043] RBP:  R08: 8e6137a160c8 R09: 
[1.231043] R10:  R11: 8e613652ec80 R12: 
[1.231043] R13:  R14: 8e6137a160c8 R15: b4690120
[1.231043] FS:  () GS:8e6137a0() 
knlGS:
[1.231043] CS:  0010 DS:  ES:  CR0: 80050033
[1.231043] CR2:  CR3: 000200409000 CR4: 001006f0

Re: [PATCH] bitops.h: sanitize rotate primitives

2019-04-11 Thread Ido Schimmel

On Wed, Apr 10, 2019 at 11:19:06PM +0200, Rasmus Villemoes wrote:
> The ror32 implementation (word >> shift) | (word << (32 - shift) has
> undefined behaviour if shift is outside the [1, 31] range. Similarly
> for the 64 bit variants. Most callers pass a compile-time
> constant (naturally in that range), but there's an UBSAN report that
> these may actually be called with a shift count of 0.
> 
> Instead of special-casing that, we can make them DTRT for all values
> of shift while also avoiding UB. For some reason, this was already
> partly done for rol32 (which was well-defined for [0, 31]). gcc 8
> recognizes these patterns as rotates, so for example
> 
> __u32 rol32(__u32 word, unsigned int shift)
> {
>   return (word << (shift & 31)) | (word >> ((-shift) & 31));
> }
> 
> compiles to
> 
> 0020 :
>   20:   89 f8   mov%edi,%eax
>   22:   89 f1   mov%esi,%ecx
>   24:   d3 c0   rol%cl,%eax
>   26:   c3  retq
> 
> Older compilers unfortunately do not do as well, but this only affects
> the small minority of users that don't pass constants.
> 
> Due to integer promotions, ro[lr]8 were already well-defined for
> shifts in [0, 8], and ro[lr]16 were mostly well-defined for shifts in
> [0, 16] (only mostly - u16 gets promoted to _signed_ int, so if bit 15
> is set, word << 16 is undefined). For consistency, update those as
> well.
> 
> Reported-by: Ido Schimmel 
> Cc: Vadim Pasternak 
> Signed-off-by: Rasmus Villemoes 

Tested-by: Ido Schimmel 

Thanks!

Re: [PATCH net-next v3 0/8] net: Remove switchdev_ops

2019-02-27 Thread Ido Schimmel

On Wed, Feb 27, 2019 at 11:44:24AM -0800, Florian Fainelli wrote:
> Hi all,
> 
> This patch series completes the removal of the switchdev_ops by
> converting switchdev_port_attr_set() to use either the blocking
> (process) or non-blocking (atomic) notifier since we typically need to
> deal with both depending on where in the bridge code we get called from.
> 
> This was tested with the forwarding selftests and DSA hardware.

I ran some basic tests and nothing exploded :) Thanks, Florian!

Re: [PATCH net-next v3 8/8] net: Remove switchdev_ops

2019-02-27 Thread Ido Schimmel

On Wed, Feb 27, 2019 at 11:44:32AM -0800, Florian Fainelli wrote:
> Now that we have converted all possible callers to using a switchdev
> notifier for attributes we do not have a need for implementing
> switchdev_ops anymore, and this can be removed from all drivers the
> net_device structure.
> 
> Signed-off-by: Florian Fainelli 

Reviewed-by: Ido Schimmel

Re: [PATCH net-next v3 7/8] net: switchdev: Replace port attr set SDO with a notification

2019-02-27 Thread Ido Schimmel

On Wed, Feb 27, 2019 at 11:44:31AM -0800, Florian Fainelli wrote:
> Drop switchdev_ops.switchdev_port_attr_set. Drop the uses of this field
> from all clients, which were migrated to use switchdev notification in
> the previous patches.
> 
> Add a new function switchdev_port_attr_notify() that sends the switchdev
> notifications SWITCHDEV_PORT_ATTR_SET and calls the blocking (process)
> notifier chain.
> 
> We have one odd case within net/bridge/br_switchdev.c with the
> SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS attribute identifier that
> requires executing from atomic context, we deal with that one
> specifically.
> 
> Drop __switchdev_port_attr_set() and update switchdev_port_attr_set()
> likewise.
> 
> Signed-off-by: Florian Fainelli 

Reviewed-by: Ido Schimmel 

One small nit that you can address in a follow-up:

> @@ -67,12 +67,18 @@ int br_switchdev_set_port_flag(struct net_bridge_port *p,
>   .id = SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS,
>   .u.brport_flags = mask,
>   };
> + struct switchdev_notifier_port_attr_info info = {
> + .attr = &attr,
> + };
>   int err;
>  
>   if (mask & ~BR_PORT_FLAGS_HW_OFFLOAD)
>   return 0;
>  
> - err = switchdev_port_attr_set(p->dev, &attr);
> + /* We run from atomic context here */
> + err = call_switchdev_notifiers(SWITCHDEV_PORT_ATTR_SET, p->dev,
> +&info.info, NULL);
> + err = notifier_to_errno(err);
>   if (err == -EOPNOTSUPP)
>   return 0;

This check can be removed. The code below checks `err` and fails the
operation in case of error.

Re: [PATCH net-next v2 8/8] net: Remove switchdev_ops

2019-02-27 Thread Ido Schimmel

On Tue, Feb 26, 2019 at 05:14:27PM -0800, Florian Fainelli wrote:
> diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c 
> b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
> index b00f6f74f91a..995426ea9a43 100644
> --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
> +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
> @@ -3660,7 +3660,6 @@ static int mlxsw_sp_port_create(struct mlxsw_sp 
> *mlxsw_sp, u8 local_port,
>   }
>   mlxsw_sp_port->default_vlan = mlxsw_sp_port_vlan;
>  
> - mlxsw_sp_port_switchdev_init(mlxsw_sp_port);
>   mlxsw_sp->ports[local_port] = mlxsw_sp_port;
>   err = register_netdev(dev);
>   if (err) {
> @@ -3677,7 +3676,6 @@ static int mlxsw_sp_port_create(struct mlxsw_sp 
> *mlxsw_sp, u8 local_port,
>  
>  err_register_netdev:
>   mlxsw_sp->ports[local_port] = NULL;
> - mlxsw_sp_port_switchdev_fini(mlxsw_sp_port);
>   mlxsw_sp_port_vlan_destroy(mlxsw_sp_port_vlan);
>  err_port_vlan_create:
>  err_port_pvid_set:
> @@ -3720,7 +3718,6 @@ static void mlxsw_sp_port_remove(struct mlxsw_sp 
> *mlxsw_sp, u8 local_port)
>   mlxsw_core_port_clear(mlxsw_sp->core, local_port, mlxsw_sp);
>   unregister_netdev(mlxsw_sp_port->dev); /* This calls ndo_stop */
>   mlxsw_sp->ports[local_port] = NULL;
> - mlxsw_sp_port_switchdev_fini(mlxsw_sp_port);
>   mlxsw_sp_port_vlan_flush(mlxsw_sp_port, true);
>   mlxsw_sp_port_nve_fini(mlxsw_sp_port);
>   mlxsw_sp_tc_qdisc_fini(mlxsw_sp_port);
> @@ -4441,12 +4438,6 @@ static int mlxsw_sp_init(struct mlxsw_core *mlxsw_core,
>   goto err_span_init;
>   }
>  
> - err = mlxsw_sp_switchdev_init(mlxsw_sp);

I missed that and got a trace as soon as I tried to enslave a port. You
should only remove mlxsw_sp_port_switchdev_init() and not
mlxsw_sp_switchdev_init()

> - if (err) {
> - dev_err(mlxsw_sp->bus_info->dev, "Failed to initialize 
> switchdev\n");
> - goto err_switchdev_init;
> - }
> -
>   err = mlxsw_sp_counter_pool_init(mlxsw_sp);
>   if (err) {
>   dev_err(mlxsw_sp->bus_info->dev, "Failed to init counter 
> pool\n");
> @@ -4517,8 +4508,6 @@ static int mlxsw_sp_init(struct mlxsw_core *mlxsw_core,
>  err_afa_init:
>   mlxsw_sp_counter_pool_fini(mlxsw_sp);
>  err_counter_pool_init:
> - mlxsw_sp_switchdev_fini(mlxsw_sp);
> -err_switchdev_init:
>   mlxsw_sp_span_fini(mlxsw_sp);
>  err_span_init:
>   mlxsw_sp_lag_fini(mlxsw_sp);
> @@ -4585,7 +4574,6 @@ static void mlxsw_sp_fini(struct mlxsw_core *mlxsw_core)
>   mlxsw_sp_nve_fini(mlxsw_sp);
>   mlxsw_sp_afa_fini(mlxsw_sp);
>   mlxsw_sp_counter_pool_fini(mlxsw_sp);
> - mlxsw_sp_switchdev_fini(mlxsw_sp);
>   mlxsw_sp_span_fini(mlxsw_sp);
>   mlxsw_sp_lag_fini(mlxsw_sp);
>   mlxsw_sp_buffers_fini(mlxsw_sp);
> diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h 
> b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
> index a61c1130d9e3..da6278b0caa4 100644
> --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
> +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
> @@ -407,8 +407,6 @@ extern const struct mlxsw_sp_sb_vals mlxsw_sp2_sb_vals;
>  /* spectrum_switchdev.c */
>  int mlxsw_sp_switchdev_init(struct mlxsw_sp *mlxsw_sp);
>  void mlxsw_sp_switchdev_fini(struct mlxsw_sp *mlxsw_sp);
> -void mlxsw_sp_port_switchdev_init(struct mlxsw_sp_port *mlxsw_sp_port);
> -void mlxsw_sp_port_switchdev_fini(struct mlxsw_sp_port *mlxsw_sp_port);
>  int mlxsw_sp_rif_fdb_op(struct mlxsw_sp *mlxsw_sp, const char *mac, u16 fid,
>   bool adding);
>  void
> diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c 
> b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
> index c1aedfea3a31..f6ce386c3036 100644
> --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
> +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
> @@ -1938,10 +1938,6 @@ static struct mlxsw_sp_port 
> *mlxsw_sp_lag_rep_port(struct mlxsw_sp *mlxsw_sp,
>   return NULL;
>  }
>  
> -static const struct switchdev_ops mlxsw_sp_port_switchdev_ops = {
> - .switchdev_port_attr_set= mlxsw_sp_port_attr_set,
> -};
> -
>  static int
>  mlxsw_sp_bridge_8021q_port_join(struct mlxsw_sp_bridge_device *bridge_device,
>   struct mlxsw_sp_bridge_port *bridge_port,
> @@ -3545,11 +3541,3 @@ void mlxsw_sp_switchdev_fini(struct mlxsw_sp *mlxsw_sp)
>   kfree(mlxsw_sp->bridge);
>  }
>  
> -void mlxsw_sp_port_switchdev_init(struct mlxsw_sp_port *mlxsw_sp_port)
> -{
> - mlxsw_sp_port->dev->switchdev_ops = &mlxsw_sp_port_switchdev_ops;
> -}
> -
> -void mlxsw_sp_port_switchdev_fini(struct mlxsw_sp_port *mlxsw_sp_port)
> -{
> -}

Re: [PATCH net-next 7/8] net: switchdev: Replace port attr set SDO with a notification

2019-02-27 Thread Ido Schimmel

On Mon, Feb 25, 2019 at 11:47:12AM -0800, Florian Fainelli wrote:
> On 2/25/19 1:49 AM, Ido Schimmel wrote:
> > On Sun, Feb 24, 2019 at 08:47:27AM -0800, Florian Fainelli wrote:
> >> Le 2/23/19 à 2:32 AM, Ido Schimmel a écrit :
> >>> On Fri, Feb 22, 2019 at 03:59:25PM -0800, Florian Fainelli wrote:
> >>>> -if (attr->flags & SWITCHDEV_F_NO_RECURSE)
> >>>> +if (attr & SWITCHDEV_F_DEFER)
> >>>> +rc = call_switchdev_blocking_notifiers(nt, dev,
> >>>> +   &attr_info.info, 
> >>>> NULL);
> >>>> +else
> >>>> +rc = call_switchdev_notifiers(nt, dev, &attr_info.info, 
> >>>> NULL);
> >>>
> >>> I don't believe this is needed. You're calling this function from
> >>> switchdev_port_attr_set_now() which is always called from process
> >>> context. switchdev_port_attr_set() takes care of that. Similar to
> >>> switchdev_port_obj_add().
> >>
> >> Except for net/bridge/br_switchdev.c when we check the bridge port's
> >> flags support with PRE_BRIDGE_FLAGS. In that case we are executing from
> >> the caller (atomic) context and we can't defer otherwise that trumps the
> >> whole idea of being able to do a quick check and return that to the
> >> caller that we cannot support specific flags. How would you recommend
> >> approaching that?
> > 
> > In this case you can invoke call_switchdev_notifiers() directly from
> > br_switchdev_set_port_flag(). Eventually switchdev_port_attr_set() will
> > be gone and bridge code will invoke the notifiers directly.
> 
> That can be done, but it still requires the target driver (mlxsw,
> ocelot, dsa, etc.) to support attribute notification from blocking and
> non-blocking context. Are you fine with that?

Yes. Sorry for the latency. I was away yesterday. Reviewed your v2 and
only found one problem. Will run some tests now.

Thanks!

Re: [PATCH net-next v2 8/8] net: Remove switchdev_ops

2019-02-27 Thread Ido Schimmel

On Tue, Feb 26, 2019 at 05:14:27PM -0800, Florian Fainelli wrote:
> Now that we have converted all possible callers to using a switchdev
> notifier for attributes we do not have a need for implementing
> switchdev_ops anymore, and this can be removed from all drivers the
> net_device structure.
> 
> Signed-off-by: Florian Fainelli 

Reviewed-by: Ido Schimmel

Re: [PATCH net-next v2 4/8] mlxsw: spectrum_switchdev: Handle SWITCHDEV_PORT_ATTR_SET

2019-02-27 Thread Ido Schimmel

On Tue, Feb 26, 2019 at 05:14:23PM -0800, Florian Fainelli wrote:
> Following patches will change the way we communicate setting a port's
> attribute and use a notifier to perform those tasks.
> 
> Prepare mlxsw to support receiving notifier events targeting
> SWITCHDEV_PORT_ATTR_SET and utilize the switchdev_handle_port_attr_set()
> to handle stacking of devices.
> 
> Signed-off-by: Florian Fainelli 

Reviewed-by: Ido Schimmel

Re: [PATCH net-next v2 1/8] switchdev: Add SWITCHDEV_PORT_ATTR_SET

2019-02-27 Thread Ido Schimmel

On Tue, Feb 26, 2019 at 05:14:20PM -0800, Florian Fainelli wrote:
> In preparation for allowing switchdev enabled drivers to veto specific
> attribute settings from within the context of the caller, introduce a
> new switchdev notifier type for port attributes.
> 
> Suggested-by: Ido Schimmel 
> Signed-off-by: Florian Fainelli 

Reviewed-by: Ido Schimmel

Re: [PATCH net-next v2 7/8] net: switchdev: Replace port attr set SDO with a notification

2019-02-27 Thread Ido Schimmel

On Tue, Feb 26, 2019 at 05:14:26PM -0800, Florian Fainelli wrote:
> diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c
> index af57c4a2b78a..b7988d49d708 100644
> --- a/net/bridge/br_switchdev.c
> +++ b/net/bridge/br_switchdev.c
> @@ -67,12 +67,17 @@ int br_switchdev_set_port_flag(struct net_bridge_port *p,
>   .id = SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS,
>   .u.brport_flags = mask,
>   };
> + struct switchdev_notifier_port_attr_info info = {
> + .attr = &attr,
> + };
>   int err;
>  
>   if (mask & ~BR_PORT_FLAGS_HW_OFFLOAD)
>   return 0;
>  
> - err = switchdev_port_attr_set(p->dev, &attr);
> + /* We run from atomic context here */
> + err = call_switchdev_notifiers(SWITCHDEV_PORT_ATTR_SET, p->dev,
> +&info.info, NULL);
>   if (err == -EOPNOTSUPP)

Florian, this needs to use notifier_to_errno() and check for any error.
With the ops, `-EOPNOTSUPP` was returned for devices that did not
implement `switchdev_ops`. Now they will simply not listen / reply to
the notification.

>   return 0;

Re: [PATCH net-next] mlxsw: spectrum: acl: Use struct_size() in kzalloc()

2019-02-25 Thread Ido Schimmel

On Mon, Feb 25, 2019 at 01:01:32PM -0600, Gustavo A. R. Silva wrote:
> One of the more common cases of allocation size calculations is finding
> the size of a structure that has a zero-sized array at the end, along
> with memory for some number of elements for that array. For example:
> 
> struct foo {
> int stuff;
> struct boo entry[];
> };
> 
> size = sizeof(struct foo) + count * sizeof(struct boo);
> instance = kzalloc(size, GFP_KERNEL)
> 
> Instead of leaving these open-coded and prone to type mistakes, we can
> now use the new struct_size() helper:
> 
> instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL)
> 
> Notice that, in this case, variable alloc_size is not necessary, hence
> it is removed.
> 
> This code was detected with the help of Coccinelle.
> 
> Signed-off-by: Gustavo A. R. Silva 

Reviewed-by: Ido Schimmel 

Thanks

1 2 >

1 - 100 of 166 matches

Mail list logo