Re: [syzbot] possible deadlock in io_sq_thread_finish
syzbot suspects this issue was fixed by commit:

commit f4e61f0c9add3b00bd5f2df3c814d688849b8707
Author: Wanpeng Li
Date:   Mon Mar 15 06:55:28 2021 +

    x86/kvm: Fix broken irq restoration in kvm_wait

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=1022d7aad0
start commit:   144c79ef Merge tag 'perf-tools-fixes-for-v5.12-2020-03-07'..
git tree:       upstream
kernel config:  https://syzkaller.appspot.com/x/.config?x=db9c6adb4986f2f2
dashboard link: https://syzkaller.appspot.com/bug?extid=ac39856cb1b332dbbdda
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=167574dad0
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=12c8f566d0

If the result looks correct, please mark the issue as fixed by replying with:

#syz fix: x86/kvm: Fix broken irq restoration in kvm_wait

For information about bisection process see: https://goo.gl/tpsmEJ#bisection
Re: [syzbot] possible deadlock in io_sq_thread_finish
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+ac39856cb1b332dbb...@syzkaller.appspotmail.com

Tested on:

commit:         7d41e854 io_uring: remove indirect ctx into sqo injection
git tree:       git://git.kernel.dk/linux-block io_uring-5.12
kernel config:  https://syzkaller.appspot.com/x/.config?x=b3c6cab008c50864
dashboard link: https://syzkaller.appspot.com/bug?extid=ac39856cb1b332dbbdda
compiler:

Note: testing is done by a robot and is best-effort only.
Re: [syzbot] possible deadlock in io_sq_thread_finish
#syz test: git://git.kernel.dk/linux-block io_uring-5.12

--
Jens Axboe
Re: [syzbot] possible deadlock in io_sq_thread_finish
On 3/10/21 6:40 AM, Pavel Begunkov wrote:
> On 10/03/2021 04:10, Hillf Danton wrote:
>> Fix 05ff6c4a0e07 ("io_uring: SQPOLL parking fixes") in the current tree
>> by removing the extra set of IO_SQ_THREAD_SHOULD_STOP in response to
>> the arrival of urgent signal because it misleads io_sq_thread_stop(),
>> though a followup cleanup should go there.
>
> That's actually reasonable, just like
> 8bff1bf8abeda ("io_uring: fix io_sq_offload_create error handling")
>
> Are you going to send a patch?

Agree - Hillf, do you mind if I just fold this one in?

--
Jens Axboe
Re: [syzbot] possible deadlock in io_sq_thread_finish
On 10/03/2021 04:10, Hillf Danton wrote:
> Fix 05ff6c4a0e07 ("io_uring: SQPOLL parking fixes") in the current tree
> by removing the extra set of IO_SQ_THREAD_SHOULD_STOP in response to
> the arrival of urgent signal because it misleads io_sq_thread_stop(),
> though a followup cleanup should go there.

That's actually reasonable, just like
8bff1bf8abeda ("io_uring: fix io_sq_offload_create error handling")

Are you going to send a patch?

>
> --- x/fs/io_uring.c
> +++ y/fs/io_uring.c
> @@ -6689,10 +6689,8 @@ static int io_sq_thread(void *data)
>  			io_sqd_init_new(sqd);
>  			timeout = jiffies + sqd->sq_thread_idle;
>  		}
> -		if (fatal_signal_pending(current)) {
> -			set_bit(IO_SQ_THREAD_SHOULD_STOP, &sqd->state);
> +		if (fatal_signal_pending(current))
>  			break;
> -		}
>  		sqt_spin = false;
>  		cap_entries = !list_is_singular(&sqd->ctx_list);
>  		list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
>

--
Pavel Begunkov
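For readers following along, here is a minimal user-space sketch of why an extra set of the stop flag could mislead the stopper. It is not kernel code. The struct, the function names, and in particular the shape of the stopper (return early if the flag is already set, otherwise set it and wait) are assumptions made for illustration; the thread only says the extra set "misleads io_sq_thread_stop()" and does not show that function.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the SQPOLL data; not the kernel struct. */
struct sq_model {
	bool should_stop;   /* models IO_SQ_THREAD_SHOULD_STOP */
	bool thread_exited; /* models the thread having finished its exit path */
};

/*
 * Thread side: a fatal signal makes the loop break out.
 * set_flag_on_signal == true models the code before the patch,
 * false models the code after it.
 */
static void thread_sees_fatal_signal(struct sq_model *m, bool set_flag_on_signal)
{
	if (set_flag_on_signal)
		m->should_stop = true;
}

/*
 * Stopper side. ASSUMPTION: the stopper returns early when the stop flag
 * is already set; otherwise it sets the flag and waits for the thread.
 * Returns true if it waited.
 */
static bool stopper_stop(struct sq_model *m)
{
	if (m->should_stop)
		return false;    /* stop looks already requested: no wait */
	m->should_stop = true;
	m->thread_exited = true; /* models the wait for thread exit returning */
	return true;
}

/* Freeing the shared data is only safe once the thread has exited. */
static bool safe_to_free(const struct sq_model *m)
{
	return m->thread_exited;
}
```

Under these assumptions, calling thread_sees_fatal_signal() with true and then stopper_stop() leaves safe_to_free() false, while the patched variant (false) makes the stopper wait first.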
Re: [syzbot] possible deadlock in io_sq_thread_finish
Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
KASAN: use-after-free Read in io_sq_thread

==
BUG: KASAN: use-after-free in __lock_acquire+0x3e6f/0x54c0 kernel/locking/lockdep.c:4770
Read of size 8 at addr 88801d418c78 by task iou-sqp-10269/10271

CPU: 1 PID: 10271 Comm: iou-sqp-10269 Not tainted 5.12.0-rc2-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x141/0x1d7 lib/dump_stack.c:120
 print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:232
 __kasan_report mm/kasan/report.c:399 [inline]
 kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:416
 __lock_acquire+0x3e6f/0x54c0 kernel/locking/lockdep.c:4770
 lock_acquire kernel/locking/lockdep.c:5510 [inline]
 lock_acquire+0x1ab/0x740 kernel/locking/lockdep.c:5475
 down_write+0x92/0x150 kernel/locking/rwsem.c:1406
 io_sq_thread+0x1220/0x1b10 fs/io_uring.c:6754
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294

Allocated by task 10269:
 kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
 kasan_set_track mm/kasan/common.c:46 [inline]
 set_alloc_info mm/kasan/common.c:427 [inline]
 kasan_kmalloc mm/kasan/common.c:506 [inline]
 kasan_kmalloc mm/kasan/common.c:465 [inline]
 __kasan_kmalloc+0x99/0xc0 mm/kasan/common.c:515
 kmalloc include/linux/slab.h:554 [inline]
 kzalloc include/linux/slab.h:684 [inline]
 io_get_sq_data fs/io_uring.c:7153 [inline]
 io_sq_offload_create fs/io_uring.c:7827 [inline]
 io_uring_create fs/io_uring.c:9443 [inline]
 io_uring_setup+0x154b/0x2940 fs/io_uring.c:9523
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Freed by task 9:
 kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
 kasan_set_track+0x1c/0x30 mm/kasan/common.c:46
 kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:357
 kasan_slab_free mm/kasan/common.c:360 [inline]
 kasan_slab_free mm/kasan/common.c:325 [inline]
 __kasan_slab_free+0xf5/0x130 mm/kasan/common.c:367
 kasan_slab_free include/linux/kasan.h:199 [inline]
 slab_free_hook mm/slub.c:1562 [inline]
 slab_free_freelist_hook+0x92/0x210 mm/slub.c:1600
 slab_free mm/slub.c:3161 [inline]
 kfree+0xe5/0x7f0 mm/slub.c:4213
 io_put_sq_data fs/io_uring.c:7095 [inline]
 io_sq_thread_finish+0x48e/0x5b0 fs/io_uring.c:7113
 io_ring_ctx_free fs/io_uring.c:8355 [inline]
 io_ring_exit_work+0x333/0xcf0 fs/io_uring.c:8525
 process_one_work+0x98d/0x1600 kernel/workqueue.c:2275
 worker_thread+0x64c/0x1120 kernel/workqueue.c:2421
 kthread+0x3b1/0x4a0 kernel/kthread.c:292
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294

The buggy address belongs to the object at 88801d418c00
 which belongs to the cache kmalloc-512 of size 512
The buggy address is located 120 bytes inside of
 512-byte region [88801d418c00, 88801d418e00)
The buggy address belongs to the page:
page:311e6f59 refcount:1 mapcount:0 mapping: index:0x0 pfn:0x1d418
head:311e6f59 order:2 compound_mapcount:0 compound_pincount:0
flags: 0xfff0010200(slab|head)
raw: 00fff0010200 dead0100 dead0122 88800fc41c80
raw: 00100010 0001
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 88801d418b00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 88801d418b80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>88801d418c00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                            ^
 88801d418c80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 88801d418d00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==

Tested on:

commit:         dc5c40fb io_uring: always wait for sqd exited when stoppin..
git tree:       git://git.kernel.dk/linux-block io_uring-5.12
console output: https://syzkaller.appspot.com/x/log.txt?x=111d175cd0
kernel config:  https://syzkaller.appspot.com/x/.config?x=b3c6cab008c50864
dashboard link: https://syzkaller.appspot.com/bug?extid=ac39856cb1b332dbbdda
compiler:
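As a quick sanity check of the numbers in the report above, the "120 bytes inside of 512-byte region" line follows directly from the addresses. The sketch below uses the addresses exactly as they appear in this archive; the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Offset of a faulting address inside its slab object. */
static uint64_t offset_in_object(uint64_t fault_addr, uint64_t object_start)
{
	return fault_addr - object_start;
}

/* Nonzero if fault_addr lies in the half-open range [start, start + size). */
static int inside_region(uint64_t fault_addr, uint64_t start, uint64_t size)
{
	return fault_addr >= start && fault_addr < start + size;
}
```

With the fault address 0x88801d418c78 and the object start 0x88801d418c00, the offset is 0x78, i.e. 120, and the object start plus 512 gives the region end 0x88801d418e00 quoted in the report.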
Re: [syzbot] possible deadlock in io_sq_thread_finish
Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
KASAN: use-after-free Read in io_sq_thread

==
BUG: KASAN: use-after-free in __lock_acquire+0x3e6f/0x54c0 kernel/locking/lockdep.c:4770
Read of size 8 at addr 888023e47c78 by task iou-sqp-10156/10158

CPU: 0 PID: 10158 Comm: iou-sqp-10156 Not tainted 5.12.0-rc2-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x141/0x1d7 lib/dump_stack.c:120
 print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:232
 __kasan_report mm/kasan/report.c:399 [inline]
 kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:416
 __lock_acquire+0x3e6f/0x54c0 kernel/locking/lockdep.c:4770
 lock_acquire kernel/locking/lockdep.c:5510 [inline]
 lock_acquire+0x1ab/0x740 kernel/locking/lockdep.c:5475
 down_write+0x92/0x150 kernel/locking/rwsem.c:1406
 io_sq_thread+0x1220/0x1b10 fs/io_uring.c:6754
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294

Allocated by task 10156:
 kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
 kasan_set_track mm/kasan/common.c:46 [inline]
 set_alloc_info mm/kasan/common.c:427 [inline]
 kasan_kmalloc mm/kasan/common.c:506 [inline]
 kasan_kmalloc mm/kasan/common.c:465 [inline]
 __kasan_kmalloc+0x99/0xc0 mm/kasan/common.c:515
 kmalloc include/linux/slab.h:554 [inline]
 kzalloc include/linux/slab.h:684 [inline]
 io_get_sq_data fs/io_uring.c:7153 [inline]
 io_sq_offload_create fs/io_uring.c:7827 [inline]
 io_uring_create fs/io_uring.c:9443 [inline]
 io_uring_setup+0x154b/0x2940 fs/io_uring.c:9523
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Freed by task 3392:
 kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
 kasan_set_track+0x1c/0x30 mm/kasan/common.c:46
 kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:357
 kasan_slab_free mm/kasan/common.c:360 [inline]
 kasan_slab_free mm/kasan/common.c:325 [inline]
 __kasan_slab_free+0xf5/0x130 mm/kasan/common.c:367
 kasan_slab_free include/linux/kasan.h:199 [inline]
 slab_free_hook mm/slub.c:1562 [inline]
 slab_free_freelist_hook+0x92/0x210 mm/slub.c:1600
 slab_free mm/slub.c:3161 [inline]
 kfree+0xe5/0x7f0 mm/slub.c:4213
 io_put_sq_data fs/io_uring.c:7095 [inline]
 io_sq_thread_finish+0x48e/0x5b0 fs/io_uring.c:7113
 io_ring_ctx_free fs/io_uring.c:8355 [inline]
 io_ring_exit_work+0x333/0xcf0 fs/io_uring.c:8525
 process_one_work+0x98d/0x1600 kernel/workqueue.c:2275
 worker_thread+0x64c/0x1120 kernel/workqueue.c:2421
 kthread+0x3b1/0x4a0 kernel/kthread.c:292
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294

The buggy address belongs to the object at 888023e47c00
 which belongs to the cache kmalloc-512 of size 512
The buggy address is located 120 bytes inside of
 512-byte region [888023e47c00, 888023e47e00)
The buggy address belongs to the page:
page:200f7571 refcount:1 mapcount:0 mapping: index:0x888023e47400 pfn:0x23e44
head:200f7571 order:2 compound_mapcount:0 compound_pincount:0
flags: 0xfff0010200(slab|head)
raw: 00fff0010200 ea5f6908 ea527508 88800fc41c80
raw: 888023e47400 001f 0001
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 888023e47b00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 888023e47b80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>888023e47c00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                            ^
 888023e47c80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 888023e47d00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==

Tested on:

commit:         dc5c40fb io_uring: always wait for sqd exited when stoppin..
git tree:       git://git.kernel.dk/linux-block io_uring-5.12
console output: https://syzkaller.appspot.com/x/log.txt?x=16cd022cd0
kernel config:  https://syzkaller.appspot.com/x/.config?x=b3c6cab008c50864
dashboard link: https://syzkaller.appspot.com/bug?extid=ac39856cb1b332dbbdda
compiler:
Re: [syzbot] possible deadlock in io_sq_thread_finish
#syz test: git://git.kernel.dk/linux-block io_uring-5.12

--
Jens Axboe
Re: [syzbot] possible deadlock in io_sq_thread_finish
#syz test: git://git.kernel.dk/linux-block io_uring-5.12

--
Jens Axboe
Re: [syzbot] possible deadlock in io_sq_thread_finish
Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
KASAN: use-after-free Read in io_sq_thread

==
BUG: KASAN: use-after-free in __lock_acquire+0x3e6f/0x54c0 kernel/locking/lockdep.c:4770
Read of size 8 at addr 888034cbfc78 by task iou-sqp-10518/10523

CPU: 0 PID: 10523 Comm: iou-sqp-10518 Not tainted 5.12.0-rc2-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x141/0x1d7 lib/dump_stack.c:120
 print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:232
 __kasan_report mm/kasan/report.c:399 [inline]
 kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:416
 __lock_acquire+0x3e6f/0x54c0 kernel/locking/lockdep.c:4770
 lock_acquire kernel/locking/lockdep.c:5510 [inline]
 lock_acquire+0x1ab/0x740 kernel/locking/lockdep.c:5475
 down_write+0x92/0x150 kernel/locking/rwsem.c:1406
 io_sq_thread+0x1220/0x1b10 fs/io_uring.c:6754
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294

Allocated by task 10518:
 kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
 kasan_set_track mm/kasan/common.c:46 [inline]
 set_alloc_info mm/kasan/common.c:427 [inline]
 kasan_kmalloc mm/kasan/common.c:506 [inline]
 kasan_kmalloc mm/kasan/common.c:465 [inline]
 __kasan_kmalloc+0x99/0xc0 mm/kasan/common.c:515
 kmalloc include/linux/slab.h:554 [inline]
 kzalloc include/linux/slab.h:684 [inline]
 io_get_sq_data fs/io_uring.c:7156 [inline]
 io_sq_offload_create fs/io_uring.c:7830 [inline]
 io_uring_create fs/io_uring.c:9443 [inline]
 io_uring_setup+0x1552/0x2860 fs/io_uring.c:9523
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Freed by task 396:
 kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
 kasan_set_track+0x1c/0x30 mm/kasan/common.c:46
 kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:357
 kasan_slab_free mm/kasan/common.c:360 [inline]
 kasan_slab_free mm/kasan/common.c:325 [inline]
 __kasan_slab_free+0xf5/0x130 mm/kasan/common.c:367
 kasan_slab_free include/linux/kasan.h:199 [inline]
 slab_free_hook mm/slub.c:1562 [inline]
 slab_free_freelist_hook+0x92/0x210 mm/slub.c:1600
 slab_free mm/slub.c:3161 [inline]
 kfree+0xe5/0x7f0 mm/slub.c:4213
 io_put_sq_data fs/io_uring.c:7098 [inline]
 io_sq_thread_finish+0x4b0/0x5f0 fs/io_uring.c:7116
 io_ring_ctx_free fs/io_uring.c:8355 [inline]
 io_ring_exit_work+0x333/0xcf0 fs/io_uring.c:8525
 process_one_work+0x98d/0x1600 kernel/workqueue.c:2275
 worker_thread+0x64c/0x1120 kernel/workqueue.c:2421
 kthread+0x3b1/0x4a0 kernel/kthread.c:292
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294

The buggy address belongs to the object at 888034cbfc00
 which belongs to the cache kmalloc-512 of size 512
The buggy address is located 120 bytes inside of
 512-byte region [888034cbfc00, 888034cbfe00)
The buggy address belongs to the page:
page:4a1f04c4 refcount:1 mapcount:0 mapping: index:0x0 pfn:0x34cbc
head:4a1f04c4 order:2 compound_mapcount:0 compound_pincount:0
flags: 0xfff0010200(slab|head)
raw: 00fff0010200 dead0100 dead0122 88800fc41c80
raw: 00100010 0001
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 888034cbfb00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 888034cbfb80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>888034cbfc00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                            ^
 888034cbfc80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 888034cbfd00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==

Tested on:

commit:         8bf06ba6 io_uring: remove unneeded variable 'ret'
git tree:       git://git.kernel.dk/linux-block io_uring-5.12
console output: https://syzkaller.appspot.com/x/log.txt?x=13fcd952d0
kernel config:  https://syzkaller.appspot.com/x/.config?x=b3c6cab008c50864
dashboard link: https://syzkaller.appspot.com/bug?extid=ac39856cb1b332dbbdda
compiler:
Re: [syzbot] possible deadlock in io_sq_thread_finish
On 3/9/21 7:04 AM, syzbot wrote:
> syzbot has found a reproducer for the following issue on:
>
> HEAD commit:    144c79ef Merge tag 'perf-tools-fixes-for-v5.12-2020-03-07'..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=129addbcd0
> kernel config:  https://syzkaller.appspot.com/x/.config?x=db9c6adb4986f2f2
> dashboard link: https://syzkaller.appspot.com/bug?extid=ac39856cb1b332dbbdda
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=167574dad0
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=12c8f566d0

#syz test: git://git.kernel.dk/linux-block io_uring-5.12

--
Jens Axboe
Re: [syzbot] possible deadlock in io_sq_thread_finish
syzbot has found a reproducer for the following issue on:

HEAD commit:    144c79ef Merge tag 'perf-tools-fixes-for-v5.12-2020-03-07'..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=129addbcd0
kernel config:  https://syzkaller.appspot.com/x/.config?x=db9c6adb4986f2f2
dashboard link: https://syzkaller.appspot.com/bug?extid=ac39856cb1b332dbbdda
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=167574dad0
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=12c8f566d0

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+ac39856cb1b332dbb...@syzkaller.appspotmail.com

WARNING: possible recursive locking detected
5.12.0-rc2-syzkaller #0 Not tainted

kworker/u4:7/8696 is trying to acquire lock:
888015395870 (&sqd->lock){+.+.}-{3:3}, at: io_sq_thread_stop fs/io_uring.c:7099 [inline]
888015395870 (&sqd->lock){+.+.}-{3:3}, at: io_put_sq_data fs/io_uring.c:7115 [inline]
888015395870 (&sqd->lock){+.+.}-{3:3}, at: io_sq_thread_finish+0x408/0x650 fs/io_uring.c:7139

but task is already holding lock:
888015395870 (&sqd->lock){+.+.}-{3:3}, at: io_sq_thread_park fs/io_uring.c:7088 [inline]
888015395870 (&sqd->lock){+.+.}-{3:3}, at: io_sq_thread_park+0x63/0xc0 fs/io_uring.c:7082

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0

  lock(&sqd->lock);
  lock(&sqd->lock);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

3 locks held by kworker/u4:7/8696:
 #0: 888010469138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
 #0: 888010469138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: atomic64_set include/asm-generic/atomic-instrumented.h:856 [inline]
 #0: 888010469138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: atomic_long_set include/asm-generic/atomic-long.h:41 [inline]
 #0: 888010469138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:616 [inline]
 #0: 888010469138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:643 [inline]
 #0: 888010469138 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x871/0x1600 kernel/workqueue.c:2246
 #1: c9000253fda8 ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at: process_one_work+0x8a5/0x1600 kernel/workqueue.c:2250
 #2: 888015395870 (&sqd->lock){+.+.}-{3:3}, at: io_sq_thread_park fs/io_uring.c:7088 [inline]
 #2: 888015395870 (&sqd->lock){+.+.}-{3:3}, at: io_sq_thread_park+0x63/0xc0 fs/io_uring.c:7082

stack backtrace:
CPU: 0 PID: 8696 Comm: kworker/u4:7 Not tainted 5.12.0-rc2-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events_unbound io_ring_exit_work
Call Trace:
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x141/0x1d7 lib/dump_stack.c:120
 print_deadlock_bug kernel/locking/lockdep.c:2829 [inline]
 check_deadlock kernel/locking/lockdep.c:2872 [inline]
 validate_chain kernel/locking/lockdep.c:3661 [inline]
 __lock_acquire.cold+0x14c/0x3b4 kernel/locking/lockdep.c:4900
 lock_acquire kernel/locking/lockdep.c:5510 [inline]
 lock_acquire+0x1ab/0x740 kernel/locking/lockdep.c:5475
 __mutex_lock_common kernel/locking/mutex.c:946 [inline]
 __mutex_lock+0x139/0x1120 kernel/locking/mutex.c:1093
 io_sq_thread_stop fs/io_uring.c:7099 [inline]
 io_put_sq_data fs/io_uring.c:7115 [inline]
 io_sq_thread_finish+0x408/0x650 fs/io_uring.c:7139
 io_ring_ctx_free fs/io_uring.c:8408 [inline]
 io_ring_exit_work+0x82/0x9a0 fs/io_uring.c:8539
 process_one_work+0x98d/0x1600 kernel/workqueue.c:2275
 worker_thread+0x64c/0x1120 kernel/workqueue.c:2421
 kthread+0x3b1/0x4a0 kernel/kthread.c:292
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
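For context on the "possible recursive locking" and "missing lock nesting notation" wording in the report above: lockdep tracks lock classes rather than individual lock instances, and flags a task that acquires a class it already holds unless the acquisition is annotated with a different subclass. The sketch below is a deliberately simplified user-space model of that one check. The struct, the function names and the class string are hypothetical and do not reflect lockdep's real data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_HELD 8

/* Hypothetical per-task record of held lock classes (not lockdep's real data). */
struct held_locks {
	const char *cls[MAX_HELD];
	int subclass[MAX_HELD];
	size_t depth;
};

/*
 * Record an acquisition. Returns 1 if it would be flagged as possible
 * recursive locking, i.e. the same class and subclass is already held.
 */
static int acquire(struct held_locks *h, const char *cls, int subclass)
{
	int flagged = 0;

	for (size_t i = 0; i < h->depth; i++)
		if (strcmp(h->cls[i], cls) == 0 && h->subclass[i] == subclass)
			flagged = 1;

	if (h->depth < MAX_HELD) {
		h->cls[h->depth] = cls;
		h->subclass[h->depth] = subclass;
		h->depth++;
	}
	return flagged;
}

/* Drop the most recently acquired lock. */
static void release(struct held_locks *h)
{
	if (h->depth > 0)
		h->depth--;
}
```

In this model, two back-to-back acquisitions of the same class with subclass 0 are flagged, an acquire/release/acquire sequence is not, and a second acquisition with a different subclass is treated as an annotated nesting.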
Re: [syzbot] possible deadlock in io_sq_thread_finish
On 07/03/2021 12:39, Pavel Begunkov wrote:
> On 07/03/2021 09:49, syzbot wrote:
>> Hello,
>>
>> syzbot found the following issue on:
>>
>> HEAD commit:    a38fd874 Linux 5.12-rc2
>> git tree:       upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=143ee02ad0
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=db9c6adb4986f2f2
>> dashboard link: https://syzkaller.appspot.com/bug?extid=ac39856cb1b332dbbdda
>>
>> Unfortunately, I don't have any reproducer for this issue yet.
>>
>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>> Reported-by: syzbot+ac39856cb1b332dbb...@syzkaller.appspotmail.com
>
> Legit error, park() might take an sqd lock, and then we take it again.
> I'll patch it up

I was wrong, it looks fine, io_put_sq_data() and io_sq_thread_park() don't
nest. I wonder if that's a false positive due to conditional locking as below

if (sqd->thread == current)
	return;
mutex_lock(&sqd->lock);

>
>>
>>
>> WARNING: possible recursive locking detected
>> 5.12.0-rc2-syzkaller #0 Not tainted
>>
>> kworker/u4:7/7615 is trying to acquire lock:
>> 888144a02870 (&sqd->lock){+.+.}-{3:3}, at: io_sq_thread_stop
>> fs/io_uring.c:7099 [inline]
>> 888144a02870 (&sqd->lock){+.+.}-{3:3}, at: io_put_sq_data
>> fs/io_uring.c:7115 [inline]
>> 888144a02870 (&sqd->lock){+.+.}-{3:3}, at:
>> io_sq_thread_finish+0x408/0x650 fs/io_uring.c:7139
>>
>> but task is already holding lock:
>> 888144a02870 (&sqd->lock){+.+.}-{3:3}, at: io_sq_thread_park
>> fs/io_uring.c:7088 [inline]
>> 888144a02870 (&sqd->lock){+.+.}-{3:3}, at: io_sq_thread_park+0x63/0xc0
>> fs/io_uring.c:7082
>>
>> other info that might help us debug this:
>>  Possible unsafe locking scenario:
>>
>>        CPU0
>>
>>   lock(&sqd->lock);
>>   lock(&sqd->lock);
>>
>>  *** DEADLOCK ***
>>
>>  May be due to missing lock nesting notation
>>
>> 3 locks held by kworker/u4:7/7615:
>>  #0: 888010469138 ((wq_completion)events_unbound){+.+.}-{0:0}, at:
>> arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
>>  #0: 888010469138 ((wq_completion)events_unbound){+.+.}-{0:0}, at:
>> atomic64_set include/asm-generic/atomic-instrumented.h:856 [inline]
>>  #0: 888010469138 ((wq_completion)events_unbound){+.+.}-{0:0}, at:
>> atomic_long_set include/asm-generic/atomic-long.h:41 [inline]
>>  #0: 888010469138 ((wq_completion)events_unbound){+.+.}-{0:0}, at:
>> set_work_data kernel/workqueue.c:616 [inline]
>>  #0: 888010469138 ((wq_completion)events_unbound){+.+.}-{0:0}, at:
>> set_work_pool_and_clear_pending kernel/workqueue.c:643 [inline]
>>  #0: 888010469138 ((wq_completion)events_unbound){+.+.}-{0:0}, at:
>> process_one_work+0x871/0x1600 kernel/workqueue.c:2246
>>  #1: c900023a7da8 ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at:
>> process_one_work+0x8a5/0x1600 kernel/workqueue.c:2250
>>  #2: 888144a02870 (&sqd->lock){+.+.}-{3:3}, at: io_sq_thread_park
>> fs/io_uring.c:7088 [inline]
>>  #2: 888144a02870 (&sqd->lock){+.+.}-{3:3}, at:
>> io_sq_thread_park+0x63/0xc0 fs/io_uring.c:7082
>>
>> stack backtrace:
>> CPU: 1 PID: 7615 Comm: kworker/u4:7 Not tainted 5.12.0-rc2-syzkaller #0
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
>> Google 01/01/2011
>> Workqueue: events_unbound io_ring_exit_work
>> Call Trace:
>>  __dump_stack lib/dump_stack.c:79 [inline]
>>  dump_stack+0x141/0x1d7 lib/dump_stack.c:120
>>  print_deadlock_bug kernel/locking/lockdep.c:2829 [inline]
>>  check_deadlock kernel/locking/lockdep.c:2872 [inline]
>>  validate_chain kernel/locking/lockdep.c:3661 [inline]
>>  __lock_acquire.cold+0x14c/0x3b4 kernel/locking/lockdep.c:4900
>>  lock_acquire kernel/locking/lockdep.c:5510 [inline]
>>  lock_acquire+0x1ab/0x740 kernel/locking/lockdep.c:5475
>>  __mutex_lock_common kernel/locking/mutex.c:946 [inline]
>>  __mutex_lock+0x139/0x1120 kernel/locking/mutex.c:1093
>>  io_sq_thread_stop fs/io_uring.c:7099 [inline]
>>  io_put_sq_data fs/io_uring.c:7115 [inline]
>>  io_sq_thread_finish+0x408/0x650 fs/io_uring.c:7139
>>  io_ring_ctx_free fs/io_uring.c:8408 [inline]
>>  io_ring_exit_work+0x82/0x9a0 fs/io_uring.c:8539
>>  process_one_work+0x98d/0x1600 kernel/workqueue.c:2275
>>  worker_thread+0x64c/0x1120 kernel/workqueue.c:2421
>>  kthread+0x3b1/0x4a0 kernel/kthread.c:292
>>  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
>>
>>
>> ---
>> This report is generated by a bot. It may contain errors.
>> See https://goo.gl/tpsmEJ for more information about syzbot.
>> syzbot engineers can be reached at syzkal...@googlegroups.com.
>>
>> syzbot will keep track of this issue. See:
>> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
>>
>

--
Pavel Begunkov
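To make the conditional-locking pattern quoted in this message concrete, here is a small user-space model. Everything in it is an assumption for illustration: task ids are plain ints, the mutex is reduced to a hold counter, and the unpark and stop helpers are guesses at a matching shape, not the kernel's functions. It only shows that a balanced park/unpark pair leaves nothing held by the time a later stop path takes the lock, which is the "don't nest" claim above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: task ids are ints, the mutex is a hold counter. */
struct sqd_model {
	int thread;     /* id of the SQ thread */
	int lock_depth; /* how many times the lock is currently held */
};

/*
 * Take the lock unless the caller is the SQ thread itself, mirroring the
 * snippet quoted in the message. Returns true if the lock was taken.
 */
static bool park(struct sqd_model *sqd, int current_task)
{
	if (sqd->thread == current_task)
		return false;
	sqd->lock_depth++;
	return true;
}

/* ASSUMPTION: unpark mirrors park's condition so lock/unlock stay balanced. */
static void unpark(struct sqd_model *sqd, int current_task)
{
	if (sqd->thread == current_task)
		return;
	sqd->lock_depth--;
}

/*
 * ASSUMPTION: the stop path takes the lock unconditionally. Returns the
 * hold depth observed while it holds the lock; 1 means no nesting.
 */
static int stop(struct sqd_model *sqd)
{
	int depth;

	sqd->lock_depth++;
	depth = sqd->lock_depth;
	sqd->lock_depth--;
	return depth;
}
```

A depth of 2 from stop() would correspond to the nesting lockdep complained about; with park and unpark balanced, the model never reaches it.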
Re: [syzbot] possible deadlock in io_sq_thread_finish
On 07/03/2021 09:49, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit:    a38fd874 Linux 5.12-rc2
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=143ee02ad0
> kernel config:  https://syzkaller.appspot.com/x/.config?x=db9c6adb4986f2f2
> dashboard link: https://syzkaller.appspot.com/bug?extid=ac39856cb1b332dbbdda
>
> Unfortunately, I don't have any reproducer for this issue yet.
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+ac39856cb1b332dbb...@syzkaller.appspotmail.com

Legit error, park() might take an sqd lock, and then we take it again.
I'll patch it up

>
>
> WARNING: possible recursive locking detected
> 5.12.0-rc2-syzkaller #0 Not tainted
>
> kworker/u4:7/7615 is trying to acquire lock:
> 888144a02870 (&sqd->lock){+.+.}-{3:3}, at: io_sq_thread_stop
> fs/io_uring.c:7099 [inline]
> 888144a02870 (&sqd->lock){+.+.}-{3:3}, at: io_put_sq_data
> fs/io_uring.c:7115 [inline]
> 888144a02870 (&sqd->lock){+.+.}-{3:3}, at:
> io_sq_thread_finish+0x408/0x650 fs/io_uring.c:7139
>
> but task is already holding lock:
> 888144a02870 (&sqd->lock){+.+.}-{3:3}, at: io_sq_thread_park
> fs/io_uring.c:7088 [inline]
> 888144a02870 (&sqd->lock){+.+.}-{3:3}, at: io_sq_thread_park+0x63/0xc0
> fs/io_uring.c:7082
>
> other info that might help us debug this:
>  Possible unsafe locking scenario:
>
>        CPU0
>
>   lock(&sqd->lock);
>   lock(&sqd->lock);
>
>  *** DEADLOCK ***
>
>  May be due to missing lock nesting notation
>
> 3 locks held by kworker/u4:7/7615:
>  #0: 888010469138 ((wq_completion)events_unbound){+.+.}-{0:0}, at:
> arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
>  #0: 888010469138 ((wq_completion)events_unbound){+.+.}-{0:0}, at:
> atomic64_set include/asm-generic/atomic-instrumented.h:856 [inline]
>  #0: 888010469138 ((wq_completion)events_unbound){+.+.}-{0:0}, at:
> atomic_long_set include/asm-generic/atomic-long.h:41 [inline]
>  #0: 888010469138 ((wq_completion)events_unbound){+.+.}-{0:0}, at:
> set_work_data kernel/workqueue.c:616 [inline]
>  #0: 888010469138 ((wq_completion)events_unbound){+.+.}-{0:0}, at:
> set_work_pool_and_clear_pending kernel/workqueue.c:643 [inline]
>  #0: 888010469138 ((wq_completion)events_unbound){+.+.}-{0:0}, at:
> process_one_work+0x871/0x1600 kernel/workqueue.c:2246
>  #1: c900023a7da8 ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at:
> process_one_work+0x8a5/0x1600 kernel/workqueue.c:2250
>  #2: 888144a02870 (&sqd->lock){+.+.}-{3:3}, at: io_sq_thread_park
> fs/io_uring.c:7088 [inline]
>  #2: 888144a02870 (&sqd->lock){+.+.}-{3:3}, at:
> io_sq_thread_park+0x63/0xc0 fs/io_uring.c:7082
>
> stack backtrace:
> CPU: 1 PID: 7615 Comm: kworker/u4:7 Not tainted 5.12.0-rc2-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Workqueue: events_unbound io_ring_exit_work
> Call Trace:
>  __dump_stack lib/dump_stack.c:79 [inline]
>  dump_stack+0x141/0x1d7 lib/dump_stack.c:120
>  print_deadlock_bug kernel/locking/lockdep.c:2829 [inline]
>  check_deadlock kernel/locking/lockdep.c:2872 [inline]
>  validate_chain kernel/locking/lockdep.c:3661 [inline]
>  __lock_acquire.cold+0x14c/0x3b4 kernel/locking/lockdep.c:4900
>  lock_acquire kernel/locking/lockdep.c:5510 [inline]
>  lock_acquire+0x1ab/0x740 kernel/locking/lockdep.c:5475
>  __mutex_lock_common kernel/locking/mutex.c:946 [inline]
>  __mutex_lock+0x139/0x1120 kernel/locking/mutex.c:1093
>  io_sq_thread_stop fs/io_uring.c:7099 [inline]
>  io_put_sq_data fs/io_uring.c:7115 [inline]
>  io_sq_thread_finish+0x408/0x650 fs/io_uring.c:7139
>  io_ring_ctx_free fs/io_uring.c:8408 [inline]
>  io_ring_exit_work+0x82/0x9a0 fs/io_uring.c:8539
>  process_one_work+0x98d/0x1600 kernel/workqueue.c:2275
>  worker_thread+0x64c/0x1120 kernel/workqueue.c:2421
>  kthread+0x3b1/0x4a0 kernel/kthread.c:292
>  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
>
>
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkal...@googlegroups.com.
>
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
>

--
Pavel Begunkov