Re: Manual driver binding and unbinding broken for SCSI
On Wed, Feb 22, 2017 at 1:14 AM, Jan Kara wrote:
> On Sun 19-02-17 18:19:58, Omar Sandoval wrote:
>> On Fri, Feb 17, 2017 at 04:43:56PM -0800, James Bottomley wrote:
>> > This seems to be related to a 0day test we got on the block tree,
>> > details here:
>> >
>> > http://marc.info/?t=14862406881
>> >
>> > I root caused the above to something not being released when it should
>> > be, so it looks like you have the same problem. It seems to be a
>> > recent commit in the block tree, so could you bisect it since you have
>> > a nice reproducer?
>>
>> These appear to actually be two separate issues.
>>
>> The unbind followed by bind crash only happens with scsi-mq. It reproes
>> since at least 4.0.
>>
>> The unbind followed by a new device coming up crash happens both with
>> and without scsi-mq. The earliest version I was able to check for that
>> was 4.6, which did reproduce.
>>
>> I'll see if I can get some more info on these two issues separately.
>
> Actually, the second issue is only a warning right? And if I understand the
> issue correctly, it should be fixed by either Dan's patches in linux-block
> or my patch 4 in the series which matches your test results. So that is
> dealt with. I have no idea about the first issue though.

Looks like the 1st one is an old issue in blk-mq, and I have sent a
patchset to address it:

http://marc.info/?l=linux-kernel=148775847517071=2

Omar, feel free to give it a test.

thanks,
Ming Lei
Re: Manual driver binding and unbinding broken for SCSI
On Sun 19-02-17 18:19:58, Omar Sandoval wrote:
> On Fri, Feb 17, 2017 at 04:43:56PM -0800, James Bottomley wrote:
> > This seems to be related to a 0day test we got on the block tree,
> > details here:
> >
> > http://marc.info/?t=14862406881
> >
> > I root caused the above to something not being released when it should
> > be, so it looks like you have the same problem. It seems to be a
> > recent commit in the block tree, so could you bisect it since you have
> > a nice reproducer?
>
> These appear to actually be two separate issues.
>
> The unbind followed by bind crash only happens with scsi-mq. It reproes
> since at least 4.0.
>
> The unbind followed by a new device coming up crash happens both with
> and without scsi-mq. The earliest version I was able to check for that
> was 4.6, which did reproduce.
>
> I'll see if I can get some more info on these two issues separately.

Actually, the second issue is only a warning right? And if I understand the
issue correctly, it should be fixed by either Dan's patches in linux-block
or my patch 4 in the series which matches your test results. So that is
dealt with. I have no idea about the first issue though.

Honza
--
Jan Kara
SUSE Labs, CR
Re: Manual driver binding and unbinding broken for SCSI
On Fri, Feb 17, 2017 at 04:43:56PM -0800, James Bottomley wrote:
> This seems to be related to a 0day test we got on the block tree,
> details here:
>
> http://marc.info/?t=14862406881
>
> I root caused the above to something not being released when it should
> be, so it looks like you have the same problem. It seems to be a
> recent commit in the block tree, so could you bisect it since you have
> a nice reproducer?

These appear to actually be two separate issues.

The unbind followed by bind crash only happens with scsi-mq. It reproes
since at least 4.0.

The unbind followed by a new device coming up crash happens both with
and without scsi-mq. The earliest version I was able to check for that
was 4.6, which did reproduce.

I'll see if I can get some more info on these two issues separately.
Re: Manual driver binding and unbinding broken for SCSI
On Fri, 2017-02-17 at 16:30 -0800, Omar Sandoval wrote:
> Hi, everyone,
>
> As per $SUBJECT, I can cause a crash on v4.10-rc8, Jens' block/for-next,
> and Jan's bdi branch [1] by doing this:
>
> # lsscsi
> [0:0:0:0]    disk    QEMU     QEMU HARDDISK    2.5+  /dev/sda
> # echo 0:0:0:0 > /sys/bus/scsi/drivers/sd/unbind
> # echo 0:0:0:0 > /sys/bus/scsi/drivers/sd/bind
>
> The resulting trace looks like this:
>
> [   19.347924] kobject (8800791ea0b8): tried to init an initialized object, something is seriously wrong.
> [   19.349781] CPU: 1 PID: 84 Comm: kworker/u8:1 Not tainted 4.10.0-rc7-00210-g53f39eeaa263 #34
> [   19.350686] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-20161122_114906-anatol 04/01/2014
> [   19.350920] Workqueue: events_unbound async_run_entry_fn
> [   19.350920] Call Trace:
> [   19.350920]  dump_stack+0x63/0x83
> [   19.350920]  kobject_init+0x77/0x90
> [   19.350920]  blk_mq_register_dev+0x40/0x130
> [   19.350920]  blk_register_queue+0xb6/0x190
> [   19.350920]  device_add_disk+0x1ec/0x4b0
> [   19.350920]  sd_probe_async+0x10d/0x1c0 [sd_mod]
> [   19.350920]  async_run_entry_fn+0x48/0x150
> [   19.350920]  process_one_work+0x1d0/0x480
> [   19.350920]  worker_thread+0x48/0x4e0
> [   19.350920]  kthread+0x101/0x140
> [   19.350920]  ? process_one_work+0x480/0x480
> [   19.350920]  ? kthread_create_on_node+0x60/0x60
> [   19.350920]  ret_from_fork+0x2c/0x40
>
> Additionally, on v4.10-rc8, but not on block/for-next or Jan's branch,
> doing this:
>
> # echo 0:0:0:0 > /sys/bus/scsi/drivers/sd/unbind
> # modprobe scsi_debug
>
> Causes this trace:
>
> [   18.876096] ------------[ cut here ]------------
> [   18.877057] WARNING: CPU: 1 PID: 90 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x62/0x80
> [   18.878270] sysfs: cannot create duplicate filename '/devices/virtual/bdi/8:0'
> [   18.879435] Modules linked in: scsi_debug btrfs xor raid6_pq sd_mod virtio_scsi scsi_mod nvme nvme_core virtio_net
> [   18.881118] CPU: 1 PID: 90 Comm: kworker/u8:2 Not tainted 4.10.0-rc8 #34
> [   18.882114] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-20161122_114906-anatol 04/01/2014
> [   18.883872] Workqueue: events_unbound async_run_entry_fn
> [   18.884408] Call Trace:
> [   18.884408]  dump_stack+0x63/0x83
> [   18.884408]  __warn+0xcb/0xf0
> [   18.884408]  warn_slowpath_fmt+0x5f/0x80
> [   18.884408]  ? kernfs_path_from_node+0x4f/0x60
> [   18.884408]  sysfs_warn_dup+0x62/0x80
> [   18.884408]  sysfs_create_dir_ns+0x77/0x90
> [   18.884408]  kobject_add_internal+0xbe/0x350
> [   18.884408]  kobject_add+0x75/0xd0
> [   18.884408]  device_add+0x121/0x680
> [   18.884408]  device_create_groups_vargs+0xe0/0xf0
> [   18.884408]  device_create_vargs+0x1c/0x20
> [   18.884408]  bdi_register+0x90/0x1b0
> [   18.884408]  ? sd_revalidate_disk+0x34a/0x1d00 [sd_mod]
> [   18.884408]  bdi_register_owner+0x36/0x60
> [   18.884408]  device_add_disk+0x165/0x4a0
> [   18.884408]  ? update_autosuspend+0x51/0x60
> [   18.884408]  ? __pm_runtime_use_autosuspend+0x5c/0x70
> [   18.884408]  sd_probe_async+0x10d/0x1c0 [sd_mod]
> [   18.884408]  async_run_entry_fn+0x4a/0x170
> [   18.884408]  process_one_work+0x165/0x430
> [   18.884408]  worker_thread+0x4e/0x490
> [   18.884408]  kthread+0x101/0x140
> [   18.884408]  ? process_one_work+0x430/0x430
> [   18.884408]  ? kthread_create_on_node+0x60/0x60
> [   18.884408]  ret_from_fork+0x2c/0x40
> [   18.913090] ---[ end trace f43b051485c2a749 ]---
>
> On all three kernels, it looks like the bdi sysfs entry hangs around
> after the block device has already been removed:

This seems to be related to a 0day test we got on the block tree,
details here:

http://marc.info/?t=14862406881

I root caused the above to something not being released when it should
be, so it looks like you have the same problem. It seems to be a
recent commit in the block tree, so could you bisect it since you have
a nice reproducer?

Thanks,

James
Manual driver binding and unbinding broken for SCSI
Hi, everyone,

As per $SUBJECT, I can cause a crash on v4.10-rc8, Jens' block/for-next,
and Jan's bdi branch [1] by doing this:

# lsscsi
[0:0:0:0]    disk    QEMU     QEMU HARDDISK    2.5+  /dev/sda
# echo 0:0:0:0 > /sys/bus/scsi/drivers/sd/unbind
# echo 0:0:0:0 > /sys/bus/scsi/drivers/sd/bind

The resulting trace looks like this:

[   19.347924] kobject (8800791ea0b8): tried to init an initialized object, something is seriously wrong.
[   19.349781] CPU: 1 PID: 84 Comm: kworker/u8:1 Not tainted 4.10.0-rc7-00210-g53f39eeaa263 #34
[   19.350686] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-20161122_114906-anatol 04/01/2014
[   19.350920] Workqueue: events_unbound async_run_entry_fn
[   19.350920] Call Trace:
[   19.350920]  dump_stack+0x63/0x83
[   19.350920]  kobject_init+0x77/0x90
[   19.350920]  blk_mq_register_dev+0x40/0x130
[   19.350920]  blk_register_queue+0xb6/0x190
[   19.350920]  device_add_disk+0x1ec/0x4b0
[   19.350920]  sd_probe_async+0x10d/0x1c0 [sd_mod]
[   19.350920]  async_run_entry_fn+0x48/0x150
[   19.350920]  process_one_work+0x1d0/0x480
[   19.350920]  worker_thread+0x48/0x4e0
[   19.350920]  kthread+0x101/0x140
[   19.350920]  ? process_one_work+0x480/0x480
[   19.350920]  ? kthread_create_on_node+0x60/0x60
[   19.350920]  ret_from_fork+0x2c/0x40

Additionally, on v4.10-rc8, but not on block/for-next or Jan's branch,
doing this:

# echo 0:0:0:0 > /sys/bus/scsi/drivers/sd/unbind
# modprobe scsi_debug

Causes this trace:

[   18.876096] ------------[ cut here ]------------
[   18.877057] WARNING: CPU: 1 PID: 90 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x62/0x80
[   18.878270] sysfs: cannot create duplicate filename '/devices/virtual/bdi/8:0'
[   18.879435] Modules linked in: scsi_debug btrfs xor raid6_pq sd_mod virtio_scsi scsi_mod nvme nvme_core virtio_net
[   18.881118] CPU: 1 PID: 90 Comm: kworker/u8:2 Not tainted 4.10.0-rc8 #34
[   18.882114] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-20161122_114906-anatol 04/01/2014
[   18.883872] Workqueue: events_unbound async_run_entry_fn
[   18.884408] Call Trace:
[   18.884408]  dump_stack+0x63/0x83
[   18.884408]  __warn+0xcb/0xf0
[   18.884408]  warn_slowpath_fmt+0x5f/0x80
[   18.884408]  ? kernfs_path_from_node+0x4f/0x60
[   18.884408]  sysfs_warn_dup+0x62/0x80
[   18.884408]  sysfs_create_dir_ns+0x77/0x90
[   18.884408]  kobject_add_internal+0xbe/0x350
[   18.884408]  kobject_add+0x75/0xd0
[   18.884408]  device_add+0x121/0x680
[   18.884408]  device_create_groups_vargs+0xe0/0xf0
[   18.884408]  device_create_vargs+0x1c/0x20
[   18.884408]  bdi_register+0x90/0x1b0
[   18.884408]  ? sd_revalidate_disk+0x34a/0x1d00 [sd_mod]
[   18.884408]  bdi_register_owner+0x36/0x60
[   18.884408]  device_add_disk+0x165/0x4a0
[   18.884408]  ? update_autosuspend+0x51/0x60
[   18.884408]  ? __pm_runtime_use_autosuspend+0x5c/0x70
[   18.884408]  sd_probe_async+0x10d/0x1c0 [sd_mod]
[   18.884408]  async_run_entry_fn+0x4a/0x170
[   18.884408]  process_one_work+0x165/0x430
[   18.884408]  worker_thread+0x4e/0x490
[   18.884408]  kthread+0x101/0x140
[   18.884408]  ? process_one_work+0x430/0x430
[   18.884408]  ? kthread_create_on_node+0x60/0x60
[   18.884408]  ret_from_fork+0x2c/0x40
[   18.913090] ---[ end trace f43b051485c2a749 ]---

On all three kernels, it looks like the bdi sysfs entry hangs around
after the block device has already been removed:

┌[root@silver ~]
└# lsblk /dev/sda
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda    8:0    0  16G  0 disk
┌[root@silver ~]
└# ls -al /sys/devices/virtual/bdi
total 0
drwxr-xr-x  6 root root 0 Feb 17 16:19 .
drwxr-xr-x 13 root root 0 Feb 17 16:19 ..
drwxr-xr-x  3 root root 0 Feb 17 16:19 254:0
drwxr-xr-x  3 root root 0 Feb 17 16:19 259:0
drwxr-xr-x  3 root root 0 Feb 17 16:19 8:0
drwxr-xr-x  3 root root 0 Feb 17 16:19 9p-1
┌[root@silver ~]
└# echo 0:0:0:0 > /sys/bus/scsi/drivers/sd/unbind
┌[root@silver ~]
└# ls -al /sys/devices/virtual/bdi
total 0
drwxr-xr-x  6 root root 0 Feb 17 16:19 .
drwxr-xr-x 13 root root 0 Feb 17 16:19 ..
drwxr-xr-x  3 root root 0 Feb 17 16:19 254:0
drwxr-xr-x  3 root root 0 Feb 17 16:19 259:0
drwxr-xr-x  3 root root 0 Feb 17 16:19 8:0
drwxr-xr-x  3 root root 0 Feb 17 16:19 9p-1
┌[root@silver ~]
└# lsblk /dev/sda
lsblk: /dev/sda: not a block device

Any ideas here?

1: https://git.kernel.org/cgit/linux/kernel/git/jack/linux-fs.git/tree/?h=bdi
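
The manual check in the report (list /sys/devices/virtual/bdi, unbind, list
again, confirm the block device is gone) could be automated roughly as
follows. This is only a sketch, not something posted in the thread. It
assumes /sys/dev/block holds one MAJ:MIN entry per live block device, and
that bdi entries with major 0 belong to anonymous (non-block) devices and
can be skipped. The function name stale_bdi_entries is made up for
illustration.

```python
import os
import re

# bdi entries tied to a device number are named "MAJ:MIN" (e.g. "8:0").
DEV_RE = re.compile(r"^(\d+):(\d+)$")


def stale_bdi_entries(bdi_dir="/sys/devices/virtual/bdi",
                      block_dir="/sys/dev/block"):
    """Return bdi entries named MAJ:MIN with no matching block device.

    Assumption: block_dir contains one entry per live block device,
    named by its MAJ:MIN number.
    """
    try:
        names = os.listdir(bdi_dir)
    except FileNotFoundError:
        return []

    stale = []
    for name in sorted(names):
        match = DEV_RE.match(name)
        if match is None:
            continue  # e.g. "9p-1": not named after a device number
        if int(match.group(1)) == 0:
            continue  # assumption: major 0 is an anonymous device
        if not os.path.lexists(os.path.join(block_dir, name)):
            stale.append(name)
    return stale


if __name__ == "__main__":
    for name in stale_bdi_entries():
        print(f"stale bdi entry: {name}")
```

Run after the unbind step on an affected kernel, a check along these lines
would be expected to flag 8:0, since the bdi directory is still present
while /dev/sda is no longer a block device.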