Public bug reported:

Summary:
The machine hangs when loading the OFED 2310 mlx5 driver on BlueField.

How to reproduce:
Load the OFED mlx5_core driver.

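Reason:
The BlueField (BF) hangs, and the call trace below shows systemd-udevd
blocked in "mlx5_sf_hw_table_init+0xf4/0x2d0 [mlx5_core]".
mlx5_init_one_devl_locked() runs with the devlink instance lock already
held, and mlx5_sf_hw_table_init() then calls the locked
devlink_resource_register(), which tries to take the same mutex and
blocks forever.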

dmesg output from the minicom serial console:
[  726.569928] INFO: task systemd-udevd:297 blocked for more than 604 seconds.
[  726.576895]       Tainted: G           OE     5.15.0-1029-bluefield #31-Ubuntu
[  726.584101] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  726.591913] task:systemd-udevd   state:D stack:    0 pid:  297 ppid:   280 flags:0x0000000d
[  726.600248] Call trace:
[  726.602680]  __switch_to+0xf8/0x150
[  726.606159]  __schedule+0x2b8/0x790
[  726.609634]  schedule+0x64/0x140
[  726.612850]  schedule_preempt_disabled+0x18/0x24
[  726.617453]  __mutex_lock.constprop.0+0x1a0/0x680
[  726.622141]  __mutex_lock_slowpath+0x40/0x90
[  726.626396]  mutex_lock+0x64/0x70
[  726.629695]  devlink_resource_register+0x50/0x1a0
[  726.634386]  mlx5_sf_hw_table_init+0xf4/0x2d0 [mlx5_core]
[  726.639882]  mlx5_init_one_devl_locked+0x1c8/0x784 [mlx5_core]
[  726.645791]  probe_one+0x300/0x5f0 [mlx5_core]
[  726.650307]  local_pci_probe+0x48/0xb4
[  726.654043]  pci_device_probe+0x18c/0x200
[  726.658039]  really_probe+0xd0/0x490
[  726.661600]  __driver_probe_device+0x148/0x190
[  726.666029]  driver_probe_device+0x48/0x180
[  726.670198]  __driver_attach+0x104/0x240
[  726.674106]  bus_for_each_dev+0x78/0xdc
[  726.677927]  driver_attach+0x2c/0x40
[  726.681486]  bus_add_driver+0x154/0x270
[  726.685307]  driver_register+0x80/0x13c
[  726.689129]  __pci_register_driver+0x4c/0x60
[  726.693386]  __init_backport+0xf0/0x1000 [mlx5_core]
[  726.698425]  do_one_initcall+0x4c/0x250
[  726.702248]  do_init_module+0x50/0x260
[  726.705983]  load_module+0x9fc/0xbe0
[  726.709543]  __do_sys_finit_module+0xa8/0x114
[  726.713885]  __arm64_sys_finit_module+0x28/0x3c
[  726.718401]  invoke_syscall+0x78/0x100
[  726.722137]  el0_svc_common.constprop.0+0x54/0x184
[  726.726913]  do_el0_svc+0x30/0xac
[  726.730215]  el0_svc+0x48/0x160
[  726.733341]  el0t_64_sync_handler+0xa4/0x130
[  726.737597]  el0t_64_sync+0x1a4/0x1a8
[  847.401924] INFO: task systemd-udevd:297 blocked for more than 724 seconds.
[  847.408891]       Tainted: G           OE     5.15.0-1029-bluefield #31-Ubuntu

How to fix:
This is related to
https://bugs.launchpad.net/ubuntu/+source/linux-bluefield/+bug/2039869
and more patches from that series need to be backported or cherry-picked.

The patches are listed below:
Backport: f655dacb59ac net: devlink: remove unused locked functions
Backport: 012ec02ae441 netdevsim: convert driver to use unlocked devlink API during init/fini
Cherry-pick: eb0e9fa2c635 net: devlink: add unlocked variants of devlink_region_create/destroy() functions
SKIP: 72a4c8c94efa mlxsw: convert driver to use unlocked devlink API during init/fini
Backport: 70a2ff89369d net: devlink: add unlocked variants of devlink_dpipe*() functions
Cherry-pick: 755cfa69c4ec net: devlink: add unlocked variants of devlink_sb*() functions
Cherry-pick: c223d6a4bf6d net: devlink: add unlocked variants of devlink_resource*() functions
Cherry-pick: 852e85a704c2 net: devlink: add unlocked variants of devling_trap*() functions
Cherry-pick: e26fde2f5bef net: devlink: avoid false DEADLOCK warning reported by lock
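Why the unlocked variants matter can be illustrated with a minimal userspace sketch, assuming a pthread mutex stands in for the devlink instance lock. Every name below (struct fake_devlink, fake_probe_one(), and so on) is hypothetical and only mirrors the roles of devl_lock(), mlx5_init_one_devl_locked(), the locked devlink_resource_register() and its unlocked counterpart devl_resource_register(); it is not kernel or driver code. The locked helper uses pthread_mutex_trylock() and returns -EDEADLK so the demo reports the self-deadlock instead of sleeping forever the way the kernel's mutex_lock() does in the trace above.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a devlink instance (not the kernel's struct devlink). */
struct fake_devlink {
	pthread_mutex_t lock;	/* plays the role of the devlink instance lock */
	int resources;		/* number of registered resources */
};

/* Probe whether the lock is currently held by anyone (trylock fails with EBUSY). */
static bool fake_lock_is_held(struct fake_devlink *dl)
{
	if (pthread_mutex_trylock(&dl->lock) == 0) {
		pthread_mutex_unlock(&dl->lock);
		return false;
	}
	return true;
}

/* Unlocked variant, in the spirit of devl_resource_register(): the caller
 * must already hold the lock, checked here in the spirit of devl_assert_locked(). */
static int fake_devl_resource_register(struct fake_devlink *dl)
{
	assert(fake_lock_is_held(dl));
	dl->resources++;
	return 0;
}

/* Locked variant, in the spirit of devlink_resource_register(): it takes the
 * lock itself. A blocking lock would never return if the caller already held
 * it; trylock is used here so the demo returns -EDEADLK instead of hanging. */
static int fake_devlink_resource_register(struct fake_devlink *dl)
{
	if (pthread_mutex_trylock(&dl->lock) != 0)
		return -EDEADLK;
	int err = fake_devl_resource_register(dl);
	pthread_mutex_unlock(&dl->lock);
	return err;
}

/* Mirrors the probe path: lock the instance (like devl_lock()), then run init
 * code that assumes the lock is held (like mlx5_init_one_devl_locked()). */
static int fake_probe_one(struct fake_devlink *dl, bool use_unlocked_api)
{
	pthread_mutex_lock(&dl->lock);
	int err = use_unlocked_api ? fake_devl_resource_register(dl)
				   : fake_devlink_resource_register(dl);
	pthread_mutex_unlock(&dl->lock);
	return err;
}
```

With the locked call, probe fails with -EDEADLK and nothing is registered; with the unlocked call, registration succeeds while the caller keeps holding the lock, which is the pattern the unlocked-variant patches above make possible.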

Thanks!

** Affects: linux-bluefield (Ubuntu)
     Importance: Undecided
         Status: New

** Summary changed:

- Devlink backport: Fix mlx5 driver hangs
+ Devlink backport: Fix mlx5 driver hangs due to mlx5_sf_hw_table_init

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-bluefield in Ubuntu.
https://bugs.launchpad.net/bugs/2042455

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp