Hi Jeff,

Upstream commit
  50b2412b7e78 net/mlx5: Avoid possible free of command entry while timeout comp handler
was picked into the Ubuntu-5.4.0-56.62 kernel (hash bcd6e98bef76cc8a49a1b736b0fefffbffb75c30)
(v5.4.71 upstream stable release, https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1902110 ).
Now a new issue arises: reloading the mlx5 modules causes this error message in the kernel log buffer:
"cmd_work_handler:887:(pid 292): failed to allocate command entry"

Reproduction:

# modprobe -r mlx5_ib mlx5_core
# modprobe mlx5_core mlx5_ib
# dmesg
[ 142.638490] mlx5_core 0000:08:00.1: E-Switch: cleanup
[ 143.734339] mlx5_core 0000:08:00.0: E-Switch: cleanup
[ 164.171511] mlx5_core: unknown parameter 'mlx5_ib' ignored
[ 164.173501] mlx5_core 0000:08:00.0: firmware version: 16.28.1002
[ 164.173576] mlx5_core 0000:08:00.0: 126.016 Gb/s available PCIe bandwidth (8 GT/s x16 link)
[ 164.457342] mlx5_core 0000:08:00.0: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[ 164.457365] mlx5_core 0000:08:00.0: E-Switch: Total vports 2, per vport: max uc(1024) max mc(16384)
[ 164.484659] port_module: 5 callbacks suppressed
[ 164.484665] mlx5_core 0000:08:00.0: Port module event: module 0, Cable plugged
[ 164.485112] mlx5_core 0000:08:00.0: mlx5_pcie_event:294:(pid 8): PCIe slot advertised sufficient power (75W).
[ 164.494771] mlx5_core 0000:08:00.1: firmware version: 16.28.1002
[ 164.494844] mlx5_core 0000:08:00.1: 126.016 Gb/s available PCIe bandwidth (8 GT/s x16 link)
[ 164.779534] mlx5_core 0000:08:00.1: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[ 164.779552] mlx5_core 0000:08:00.1: E-Switch: Total vports 2, per vport: max uc(1024) max mc(16384)
[ 164.808886] mlx5_core 0000:08:00.1: Port module event: module 1, Cable plugged
[ 164.809228] mlx5_core 0000:08:00.1: mlx5_pcie_event:294:(pid 292): PCIe slot advertised sufficient power (75W).
[ 164.840667] mlx5_core 0000:08:00.0: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
[ 165.081342] mlx5_core 0000:08:00.1: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
[ 165.282793] mlx5_ib: Mellanox Connect-IB Infiniband driver v5.0-0
[ 165.438226] mlx5_core 0000:08:00.0: cmd_work_handler:887:(pid 292): failed to allocate command entry
[ 165.442506] infiniband rocep8s0f0: reg_mr_callback:104:(pid 292): async reg mr failed. status -11
#

The following commits fix this issue:

410bd754cd73 net/mlx5: Add retry mechanism to the command entry index allocation (upstream 5.9)
1d5558b1f0de net/mlx5: poll cmd EQ in case of command timeout (upstream 5.9)
d43b7007dbd1 net/mlx5: Fix a race when moving command interface to events mode (upstream 5.7-rc7)
3ed879965cc4 net/mlx5: Use async EQ setup cleanup helpers for multiple EQs (upstream 5.6-rc1)

They are already on the master-next branch of the focal tree, also synced from linux stable
(v5.4.79 upstream stable release, https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1907151 ):

# git log --oneline Ubuntu-5.4.0-59.65..master-next
....
400ec5bb2816 net/mlx5: Add retry mechanism to the command entry index allocation
2bd608898edd net/mlx5: Fix a race when moving command interface to events mode
bec07c488db0 net/mlx5: poll cmd EQ in case of command timeout
0c9bfdf598e1 net/mlx5: Use async EQ setup cleanup helpers for multiple EQs
.....

I compiled master-next, booted the system with it, and the issue is resolved.

-- 
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1905574

Title:
  Ubuntu 20.10 four needed fixes to 'Add driver for Mellanox Connect-IB adapters'

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Focal:
  Won't Fix

Bug description:
  [Impact]
  Commit
    d43b7007dbd1 net/mlx5: Fix a race when moving command interface to events mode
  from upstream v5.7-rc1 (and in groovy) fixes
    e126ba97dba9 mlx5: Add driver for Mellanox Connect-IB adapters

  This fix should come with four more patches from v5.9:
    410bd754cd73 net/mlx5: Add retry mechanism to the command entry index allocation
    1d5558b1f0de net/mlx5: poll cmd EQ in case of command timeout
    50b2412b7e78 net/mlx5: Avoid possible free of command entry while timeout comp handler
    432161ea26d6 net/mlx5: Fix a race when moving command interface to polling mode

  All four patches apply cleanly on the groovy tree, and we ask to pull them into groovy.

  Please also see this discussion:
  https://www.spinics.net/lists/stable/msg428620.html

  Thanks

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1905574/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp