On Wed, Jan 30, 2019 at 02:45:00PM +0800, Anand Jain wrote:
> Circular locking dependency check reports the warning [1]. That's
> because btrfs_scrub_dev() calls stack #0 below with the
> fs_info::scrub_lock held. The test case leading to this warning:
>
> mkfs.btrfs -fq /dev/sdb && mount /dev/sdb /btrfs
> btrfs scrub start -B /btrfs
>
> In fact we have fs_info::scrub_workers_refcnt to track if the init and
> destroy of the scrub workers are needed. So once we have incremented
> and decremented the fs_info::scrub_workers_refcnt value in the thread,
> it's ok to drop the scrub_lock and then actually do the
> btrfs_destroy_workqueue() part. So this patch drops the scrub_lock
> before calling btrfs_destroy_workqueue().
>
> [1]
> [   76.146826] ======================================================
> [   76.147086] WARNING: possible circular locking dependency detected
> [   76.147316] 4.20.0-rc3+ #41 Not tainted
> [   76.147489] ------------------------------------------------------
> [   76.147722] btrfs/4065 is trying to acquire lock:
> [   76.147984] 0000000038593bc0 ((wq_completion)"%s-%s""btrfs", name){+.+.}, at: flush_workqueue+0x70/0x4d0
> [   76.148337]
>                but task is already holding lock:
> [   76.148594] 0000000062392ab7 (&fs_info->scrub_lock){+.+.}, at: btrfs_scrub_dev+0x316/0x5d0 [btrfs]
> [   76.148909]
>                which lock already depends on the new lock.
>
> [   76.149191]
>                the existing dependency chain (in reverse order) is:
> [   76.149446]
>                -> #3 (&fs_info->scrub_lock){+.+.}:
> [   76.149707]        btrfs_scrub_dev+0x11f/0x5d0 [btrfs]
> [   76.149924]        btrfs_ioctl+0x1ac3/0x2d80 [btrfs]
> [   76.150216]        do_vfs_ioctl+0xa9/0x6d0
> [   76.150468]        ksys_ioctl+0x60/0x90
> [   76.150716]        __x64_sys_ioctl+0x16/0x20
> [   76.150911]        do_syscall_64+0x50/0x180
> [   76.151182]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [   76.151469]
>                -> #2 (&fs_devs->device_list_mutex){+.+.}:
> [   76.151851]        reada_start_machine_worker+0xca/0x3f0 [btrfs]
> [   76.152195]        normal_work_helper+0xf0/0x4c0 [btrfs]
> [   76.152489]        process_one_work+0x1f4/0x520
> [   76.152751]        worker_thread+0x46/0x3d0
> [   76.153715]        kthread+0xf8/0x130
> [   76.153912]        ret_from_fork+0x3a/0x50
> [   76.154178]
>                -> #1 ((work_completion)(&work->normal_work)){+.+.}:
> [   76.154575]        worker_thread+0x46/0x3d0
> [   76.154828]        kthread+0xf8/0x130
> [   76.155108]        ret_from_fork+0x3a/0x50
> [   76.155357]
>                -> #0 ((wq_completion)"%s-%s""btrfs", name){+.+.}:
> [   76.155751]        flush_workqueue+0x9a/0x4d0
> [   76.155911]        drain_workqueue+0xca/0x1a0
> [   76.156182]        destroy_workqueue+0x17/0x230
> [   76.156455]        btrfs_destroy_workqueue+0x5d/0x1c0 [btrfs]
> [   76.156756]        scrub_workers_put+0x2e/0x60 [btrfs]
> [   76.156931]        btrfs_scrub_dev+0x329/0x5d0 [btrfs]
> [   76.157219]        btrfs_ioctl+0x1ac3/0x2d80 [btrfs]
> [   76.157491]        do_vfs_ioctl+0xa9/0x6d0
> [   76.157742]        ksys_ioctl+0x60/0x90
> [   76.157910]        __x64_sys_ioctl+0x16/0x20
> [   76.158177]        do_syscall_64+0x50/0x180
> [   76.158429]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [   76.158716]
>                other info that might help us debug this:
>
> [   76.158908] Chain exists of:
>                  (wq_completion)"%s-%s""btrfs", name --> &fs_devs->device_list_mutex --> &fs_info->scrub_lock
>
> [   76.159629]  Possible unsafe locking scenario:
>
> [   76.160607]        CPU0                    CPU1
> [   76.160934]        ----                    ----
> [   76.161210]   lock(&fs_info->scrub_lock);
> [   76.161458]                                lock(&fs_devs->device_list_mutex);
> [   76.161805]                                lock(&fs_info->scrub_lock);
> [   76.161909]   lock((wq_completion)"%s-%s""btrfs", name);
> [   76.162201]
>                 *** DEADLOCK ***
Please next time make sure the log messages are copied without
whitespace damage. In many cases it's easy to glue the lines back
together, but especially the locking scenario above needs to be exact,
as the indentation matters. I'll replace the whole report with the one
from my logs, as it does not have the device_list_mutex in the locking
chain. That would be confusing, since the problem with
device_list_mutex has already been fixed.

> [   76.162627] 2 locks held by btrfs/4065:
> [   76.162897]  #0: 00000000bef2775b (sb_writers#12){.+.+}, at: mnt_want_write_file+0x24/0x50
> [   76.163335]  #1: 0000000062392ab7 (&fs_info->scrub_lock){+.+.}, at: btrfs_scrub_dev+0x316/0x5d0 [btrfs]
> [   76.163796]
>                stack backtrace:
> [   76.163911] CPU: 1 PID: 4065 Comm: btrfs Not tainted 4.20.0-rc3+ #41
> [   76.164228] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
> [   76.164646] Call Trace:
> [   76.164872]  dump_stack+0x5e/0x8b
> [   76.165128]  print_circular_bug.isra.37+0x1f1/0x1fe
> [   76.165398]  __lock_acquire+0x14aa/0x1620
> [   76.165652]  lock_acquire+0xb0/0x190
> [   76.165910]  ? flush_workqueue+0x70/0x4d0
> [   76.166175]  flush_workqueue+0x9a/0x4d0
> [   76.166420]  ? flush_workqueue+0x70/0x4d0
> [   76.166671]  ? drain_workqueue+0x52/0x1a0
> [   76.166911]  drain_workqueue+0xca/0x1a0
> [   76.167167]  destroy_workqueue+0x17/0x230
> [   76.167428]  btrfs_destroy_workqueue+0x5d/0x1c0 [btrfs]
> [   76.167720]  scrub_workers_put+0x2e/0x60 [btrfs]
> [   76.168233]  btrfs_scrub_dev+0x329/0x5d0 [btrfs]
> [   76.168504]  ? __sb_start_write+0x121/0x1b0
> [   76.168759]  ? mnt_want_write_file+0x24/0x50
> [   76.169654]  btrfs_ioctl+0x1ac3/0x2d80 [btrfs]
> [   76.169934]  ? find_held_lock+0x2d/0x90
> [   76.170204]  ? find_held_lock+0x2d/0x90
> [   76.170450]  do_vfs_ioctl+0xa9/0x6d0
> [   76.170690]  ? __fget+0x101/0x1f0
> [   76.170910]  ? __fget+0x5/0x1f0
> [   76.171157]  ksys_ioctl+0x60/0x90
> [   76.171391]  __x64_sys_ioctl+0x16/0x20
> [   76.171634]  do_syscall_64+0x50/0x180
> [   76.171892]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [   76.172186] RIP: 0033:0x7f61d422e567
> [   76.172425] Code: 44 00 00 48 8b 05 29 09 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d f9 08 2d 00 f7 d8 64 89 01 48

This line can be omitted from the log unless we really want to decode
the instructions.
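For the record, the teardown ordering the changelog argues for would
look roughly like the sketch below. This is only an illustration of the
pattern, not the actual patch; the helper shape and the handling of the
three scrub workqueues here are my assumption:

/*
 * Sketch of the proposed ordering (assumed shape, not the verbatim
 * patch): drop the refcount and detach the workqueue pointers while
 * holding scrub_lock, but call btrfs_destroy_workqueue() only after
 * the lock is released. destroy_workqueue() flushes pending work, and
 * a running work item must not end up waiting behind someone blocked
 * on scrub_lock.
 */
static void scrub_workers_put(struct btrfs_fs_info *fs_info)
{
        struct btrfs_workqueue *scrub_workers = NULL;
        struct btrfs_workqueue *scrub_wr_comp = NULL;
        struct btrfs_workqueue *scrub_parity = NULL;

        mutex_lock(&fs_info->scrub_lock);
        if (--fs_info->scrub_workers_refcnt == 0) {
                /* Last user: take ownership of the workqueues. */
                scrub_workers = fs_info->scrub_workers;
                scrub_wr_comp = fs_info->scrub_wr_completion_workers;
                scrub_parity = fs_info->scrub_parity_workers;

                fs_info->scrub_workers = NULL;
                fs_info->scrub_wr_completion_workers = NULL;
                fs_info->scrub_parity_workers = NULL;
        }
        mutex_unlock(&fs_info->scrub_lock);

        /*
         * btrfs_destroy_workqueue() handles NULL, so this is a no-op
         * unless we were the last user.
         */
        btrfs_destroy_workqueue(scrub_workers);
        btrfs_destroy_workqueue(scrub_wr_comp);
        btrfs_destroy_workqueue(scrub_parity);
}

With that, the flush_workqueue() done by destroy no longer nests inside
scrub_lock, which is what breaks the #0 -> #3 cycle in the report above.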