Public bug reported:

SRU Justification:

[Impact]

We occasionally see a race condition (roughly once every 350 reboots) where
napi is still running (mlxbf_gige_poll) while a shutdown has been initiated
through "reboot". Since mlxbf_gige_poll is still running, it tries to access a
NULL pointer, which causes a kernel panic.

[Fix]

The fix is to explicitly disable napi and dequeue it during shutdown.
mlxbf_gige_remove already does this through the following call chain:
unregister_netdev->unregister_netdevice->unregister_netdevice_queue->
rollback_registered->rollback_registered_many->dev_close_many->
__dev_close_many->ndo_stop->mlxbf_gige_stop, which stops napi.

So use mlxbf_gige_remove in place of the existing shutdown logic.
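
As a rough illustration of why routing shutdown through the remove path
closes the race, here is a minimal user-space model in C. It is only a
sketch, not the driver code: struct mock_gige and every mock_* function are
hypothetical stand-ins, and the assumption that the poll routine dereferences
a ring pointer is made purely for illustration (the report does not say which
pointer is NULL).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's private state (not the real struct). */
struct mock_gige {
	bool napi_enabled;	/* models whether napi polling may still run */
	bool registered;	/* models netdev registration */
	int *rx_ring;		/* assumed resource that the poll routine reads */
};

/* Models ndo_stop (mlxbf_gige_stop): stop napi first, then drop resources. */
static void mock_stop(struct mock_gige *p)
{
	p->napi_enabled = false;
	p->rx_ring = NULL;
}

/* Models mlxbf_gige_remove: unregistering the netdev ends up calling stop. */
static void mock_remove(struct mock_gige *p)
{
	if (p->registered) {
		mock_stop(p);
		p->registered = false;
	}
}

/* Models the shutdown callback after the fix: reuse the remove path. */
static void mock_shutdown(struct mock_gige *p)
{
	mock_remove(p);
}

/*
 * Models mlxbf_gige_poll. Returns -1 where the real code would have
 * dereferenced a NULL pointer, 0 when napi is disabled and the poll
 * does nothing, and the first ring entry otherwise.
 */
static int mock_poll(struct mock_gige *p)
{
	if (!p->napi_enabled)
		return 0;
	if (!p->rx_ring)
		return -1;
	return p->rx_ring[0];
}
```

In this model, a poll that runs after mock_shutdown() returns 0 instead of
reaching the NULL access, because napi is disabled before the resource goes
away. Clearing rx_ring while napi_enabled is still true reproduces the
failure case (-1).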

[Test Case]

* Issue at least 1000 reboots from Linux and make sure there is no panic
caused by the mlxbf-gige driver.

[Regression Potential]

* Since this issue is hard to reproduce, the fix has not been tested
thoroughly yet, so it needs several reboot loops to validate it.

** Affects: linux-bluefield (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-bluefield in Ubuntu.
https://bugs.launchpad.net/bugs/2022370

Title:
  mlxbf-gige: Fix kernel panic at shutdown

Status in linux-bluefield package in Ubuntu:
  New
