** Tags removed: verification-needed-bionic
** Tags added: verification-done-bionic
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1764982

Title:
  [bionic] machine stuck and bonding not working well when nvmet_rdma
  module is loaded

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Bionic:
  Fix Committed

Bug description:

== SRU Justification ==
This bug causes the machine to get stuck and bonding to stop working
when the nvmet_rdma module is loaded. Both fixes are in mainline as of
v4.17-rc1.

== Fixes ==
a3dd7d0022c3 ("nvmet-rdma: Don't flush system_wq by default during remove_one")
9bad0404ecd7 ("nvme-rdma: Don't flush delete_wq by default during remove_one")

== Regression Potential ==
Low. The changes are limited to the nvme driver and were tested by
Mellanox.

== Test Case ==
A test kernel was built with these patches and tested by the original
bug reporter, who confirms that the test kernel resolves the bug.

== Original Bug Description ==

Hi,

The machine gets stuck after unregistering the bonding interface while
the nvmet_rdma module is loaded.

Scenario:

# modprobe nvmet_rdma
# modprobe -r bonding
# modprobe bonding -v mode=1 miimon=100 fail_over_mac=0
# ifdown eth4
# ifdown eth5
# ip addr add 15.209.12.173/8 dev bond0
# ip link set bond0 up
# echo +eth5 > /sys/class/net/bond0/bonding/slaves
# echo +eth4 > /sys/class/net/bond0/bonding/slaves
# echo -eth4 > /sys/class/net/bond0/bonding/slaves
# echo -eth5 > /sys/class/net/bond0/bonding/slaves
# echo -bond0 > /sys/class/net/bonding_masters

dmesg:

kernel: [78348.225556] bond0 (unregistering): Released all slaves
kernel: [78358.339631] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78368.419621] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78378.499615] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78388.579625] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78398.659613] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78408.739655] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78418.819634] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78428.899642] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78438.979614] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78449.059619] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78459.139626] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78469.219623] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78479.299619] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78489.379620] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78499.459623] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78509.539631] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78519.619629] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
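The teardown sequence above can be wrapped in a small script for repeated
testing. A minimal sketch, assuming the reporter's setup (two RDMA-capable
ports named eth4/eth5 and the 15.209.12.173 test address; both are
placeholders on any other machine):

#!/bin/sh
# Reproducer sketch for this bug; run as root. eth4/eth5 and the
# address below come from the reporter's environment -- adjust them.
modprobe nvmet_rdma          # the hang only triggers with this module loaded
modprobe -r bonding
modprobe bonding -v mode=1 miimon=100 fail_over_mac=0
ifdown eth4                  # 'ifdown' requires the ifupdown package
ifdown eth5
ip addr add 15.209.12.173/8 dev bond0
ip link set bond0 up
echo +eth5 > /sys/class/net/bond0/bonding/slaves
echo +eth4 > /sys/class/net/bond0/bonding/slaves
echo -eth4 > /sys/class/net/bond0/bonding/slaves
echo -eth5 > /sys/class/net/bond0/bonding/slaves
echo -bond0 > /sys/class/net/bonding_masters
# On an affected kernel the message below repeats every ~10 seconds and
# bond0 is never freed; on a fixed kernel it does not appear.
sleep 15
if dmesg | tail -n 20 | grep -q 'waiting for bond0 to become free'; then
    echo "bug reproduced: bond0 stuck in unregister"
else
    echo "bond0 released cleanly"
fi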
The following upstream commits fix this issue:

commit a3dd7d0022c347207ae931c753a6dc3e6e8fcbc1
Author: Max Gurtovoy <m...@mellanox.com>
Date:   Wed Feb 28 13:12:38 2018 +0200

    nvmet-rdma: Don't flush system_wq by default during remove_one

    The .remove_one function is called for any ib_device removal.
    In case the removed device has no reference in our driver, there
    is no need to flush the system work queue.

    Reviewed-by: Israel Rukshin <isra...@mellanox.com>
    Signed-off-by: Max Gurtovoy <m...@mellanox.com>
    Reviewed-by: Sagi Grimberg <s...@grimberg.me>
    Signed-off-by: Keith Busch <keith.bu...@intel.com>
    Signed-off-by: Jens Axboe <ax...@kernel.dk>

diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index aa8068f..a59263d 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -1469,8 +1469,25 @@ static struct nvmet_fabrics_ops nvmet_rdma_ops = {
 static void nvmet_rdma_remove_one(struct ib_device *ib_device, void *client_data)
 {
 	struct nvmet_rdma_queue *queue, *tmp;
+	struct nvmet_rdma_device *ndev;
+	bool found = false;
+
+	mutex_lock(&device_list_mutex);
+	list_for_each_entry(ndev, &device_list, entry) {
+		if (ndev->device == ib_device) {
+			found = true;
+			break;
+		}
+	}
+	mutex_unlock(&device_list_mutex);
+
+	if (!found)
+		return;
 
-	/* Device is being removed, delete all queues using this device */
+	/*
+	 * IB Device that is used by nvmet controllers is being removed,
+	 * delete all queues using this device.
+	 */
 	mutex_lock(&nvmet_rdma_queue_mutex);
 	list_for_each_entry_safe(queue, tmp, &nvmet_rdma_queue_list, queue_list) {

commit 9bad0404ecd7594265cef04e176adeaa4ffbca4a
Author: Max Gurtovoy <m...@mellanox.com>
Date:   Wed Feb 28 13:12:39 2018 +0200

    nvme-rdma: Don't flush delete_wq by default during remove_one

    The .remove_one function is called for any ib_device removal.
    In case the removed device has no reference in our driver, there
    is no need to flush the work queue.

    Reviewed-by: Israel Rukshin <isra...@mellanox.com>
    Signed-off-by: Max Gurtovoy <m...@mellanox.com>
    Reviewed-by: Sagi Grimberg <s...@grimberg.me>
    Signed-off-by: Keith Busch <keith.bu...@intel.com>
    Signed-off-by: Jens Axboe <ax...@kernel.dk>

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index f5f460b..250b277 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -2024,6 +2024,20 @@ static struct nvmf_transport_ops nvme_rdma_transport = {
 static void nvme_rdma_remove_one(struct ib_device *ib_device, void *client_data)
 {
 	struct nvme_rdma_ctrl *ctrl;
+	struct nvme_rdma_device *ndev;
+	bool found = false;
+
+	mutex_lock(&device_list_mutex);
+	list_for_each_entry(ndev, &device_list, entry) {
+		if (ndev->dev == ib_device) {
+			found = true;
+			break;
+		}
+	}
+	mutex_unlock(&device_list_mutex);
+
+	if (!found)
+		return;
 
 	/* Delete all controllers using this device */
 	mutex_lock(&nvme_rdma_ctrl_mutex);
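To check whether a given kernel tree already carries both fixes, one can
search its git history for the commit subjects; a sketch, assuming a local
checkout of the kernel sources (backported commits get new hashes, so
matching subjects is more reliable than matching the mainline SHAs):

# Run inside the kernel git tree; multiple --grep patterns are OR'ed,
# so both subjects should be printed if both fixes are present.
git log --oneline --no-merges \
    --grep="nvmet-rdma: Don't flush system_wq by default during remove_one" \
    --grep="nvme-rdma: Don't flush delete_wq by default during remove_one"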