Hi,

I have a reproducible scenario wherein removing a USB device while
reading /sys/kernel/debug/usb/devices causes a deadlock. This should
not be specific to any USB device. Any USB device removal that causes
a call to debugfs_remove() has inverted lock ordering with respect to
the read() of debug/usb/devices.

e.g.
read thread: srcu_read_lock(&debugfs_srcu);
-- usb unplug --
  remove thread: mutex_lock(&usb_bus_idr_lock);
  remove thread: synchronize_srcu(&debugfs_srcu); <- blocked
read thread: mutex_lock(&usb_bus_idr_lock); <- blocked
read thread: srcu_read_unlock(&debugfs_srcu, ...);

This seems to be another flavor of what Johannes Berg reported:
deadlock in synchronize_srcu() in debugfs?
https://lkml.org/lkml/2017/3/23/415

I applied this patch set from Nicolai Stange and can no longer
reproduce the hang.
[RFC PATCH v2 0/9] debugfs: per-file removal protection
https://lkml.org/lkml/2017/5/3/292

As patch 2/9 in the series indicates, commit 49d200deaa68 ("debugfs:
prevent access to removed files' private data") is where this was
first introduced, and it is reproducible on v4.14-rc4.

How should we move forward with the resolution of this debugfs change?
It seems to me that the USB locking is reasonable but the debugfs
global srcu is overly restrictive. This could lead to unexpected lock
inversion any time a driver shares a mutex between its debugfs read
and removal paths.

Backtrace below. Thanks!

-Tyler Hall

This is easier to reproduce by adding a sleep before the
usb_bus_idr_lock, but I've seen it on an unmodified kernel.

diff --git a/drivers/usb/core/devices.c b/drivers/usb/core/devices.c
index 55dea2e7828f..534650cd0950 100644
--- a/drivers/usb/core/devices.c
+++ b/drivers/usb/core/devices.c
@@ -614,6 +614,7 @@ static ssize_t usb_device_read(struct file *file,
char __user *buf,
     if (!access_ok(VERIFY_WRITE, buf, nbytes))
         return -EFAULT;

+    msleep(1000);
     mutex_lock(&usb_bus_idr_lock);
     /* print devices for all busses */
     idr_for_each_entry(&usb_bus_idr, bus, id) {


[   24.240542] sysrq: SysRq : Show Blocked State
[   24.240765]   task                        PC stack   pid father
[   24.240975] kworker/0:2     D13840   881      2 0x80000000
[   24.241525] Workqueue: usb_hub_wq hub_event
[   24.241682] Call Trace:
[   24.242273]  __schedule+0x317/0x6d0
[   24.242442]  schedule+0x31/0x80
[   24.242514]  schedule_timeout+0x1d0/0x320
[   24.242603]  ? __queue_work+0x135/0x400
[   24.242689]  wait_for_completion+0x92/0xf0
[   24.242765]  ? wait_for_completion+0x92/0xf0
[   24.242841]  ? wake_up_q+0x70/0x70
[   24.242907]  __synchronize_srcu.part.14+0x71/0x90
[   24.242985]  ? trace_event_raw_event_rcu_torture_read+0xe0/0xe0
[   24.243169]  synchronize_srcu_expedited+0x22/0x30
[   24.243265]  ? synchronize_srcu_expedited+0x22/0x30
[   24.243347]  synchronize_srcu+0x9a/0xc0
[   24.243418]  debugfs_remove+0x6d/0xa0
[   24.243490]  bdi_unregister+0x8b/0x170
[   24.243558]  del_gendisk+0x139/0x220
[   24.243624]  sd_remove+0x5c/0xc0
[   24.243685]  device_release_driver_internal+0x150/0x210
[   24.243769]  device_release_driver+0xd/0x10
[   24.243841]  bus_remove_device+0xdb/0x120
[   24.243915]  device_del+0x1c3/0x2e0
[   24.243977]  __scsi_remove_device+0xff/0x130
[   24.244122]  scsi_forget_host+0x5b/0x60
[   24.244203]  scsi_remove_host+0x74/0x140
[   24.244281]  usb_stor_disconnect+0x54/0xc0
[   24.244357]  usb_unbind_interface+0x6d/0x260
[   24.244437]  device_release_driver_internal+0x150/0x210
[   24.244520]  device_release_driver+0xd/0x10
[   24.244591]  bus_remove_device+0xdb/0x120
[   24.244659]  device_del+0x1c3/0x2e0
[   24.244722]  usb_disable_device+0x97/0x1f0
[   24.244792]  usb_disconnect+0x88/0x230
[   24.244853]  hub_event+0x5b9/0x11e0
[   24.244915]  ? add_timer+0x10e/0x230
[   24.244984]  process_one_work+0x146/0x3e0
[   24.245124]  worker_thread+0x43/0x3e0
[   24.245204]  kthread+0x104/0x140
[   24.245266]  ? create_worker+0x190/0x190
[   24.245333]  ? kthread_create_on_node+0x40/0x40
[   24.245406]  ret_from_fork+0x22/0x30

[   24.245542] cat             D13712  1029   1018 0x00000000
[   24.245652] Call Trace:
[   24.245705]  __schedule+0x317/0x6d0
[   24.245770]  schedule+0x31/0x80
[   24.245830]  schedule_preempt_disabled+0x9/0x10
[   24.245903]  __mutex_lock.isra.2+0x225/0x470
[   24.245975]  __mutex_lock_slowpath+0xe/0x10
[   24.246110]  ? __mutex_lock_slowpath+0xe/0x10
[   24.246199]  mutex_lock+0x2a/0x30
[   24.246261]  usb_device_read+0xb6/0x140
[   24.246325]  full_proxy_read+0x4f/0x90
[   24.246394]  __vfs_read+0x23/0x120
[   24.246456]  ? security_file_permission+0x96/0xb0
[   24.246533]  ? rw_verify_area+0x49/0xb0
[   24.246593]  vfs_read+0x8e/0x130
[   24.246646]  SyS_read+0x41/0xa0
[   24.246698]  entry_SYSCALL_64_fastpath+0x13/0x94

Reply via email to