Re: INFO: task hung in wb_shutdown (2)
On 5/1/18 4:14 PM, Tetsuo Handa wrote:
>>From 1b90d7f71d60e743c69cdff3ba41edd1f9f86f93 Mon Sep 17 00:00:00 2001
> From: Tetsuo Handa
> Date: Wed, 2 May 2018 07:07:55 +0900
> Subject: [PATCH v2] bdi: wake up concurrent wb_shutdown() callers.
>
> syzbot is reporting hung tasks at wait_on_bit(WB_shutting_down) in
> wb_shutdown() [1]. This seems to be because commit 5318ce7d46866e1d ("bdi:
> Shutdown writeback on all cgwbs in cgwb_bdi_destroy()") forgot to call
> wake_up_bit(WB_shutting_down) after clear_bit(WB_shutting_down).
>
> Introduce a helper function clear_and_wake_up_bit() and use it, in order
> to avoid similar errors in future.

Queued up, thanks Tetsuo!

-- 
Jens Axboe
Re: INFO: task hung in wb_shutdown (2)
On Wed 02-05-18 07:14:51, Tetsuo Handa wrote:
> >From 1b90d7f71d60e743c69cdff3ba41edd1f9f86f93 Mon Sep 17 00:00:00 2001
> From: Tetsuo Handa
> Date: Wed, 2 May 2018 07:07:55 +0900
> Subject: [PATCH v2] bdi: wake up concurrent wb_shutdown() callers.
>
> syzbot is reporting hung tasks at wait_on_bit(WB_shutting_down) in
> wb_shutdown() [1]. This seems to be because commit 5318ce7d46866e1d ("bdi:
> Shutdown writeback on all cgwbs in cgwb_bdi_destroy()") forgot to call
> wake_up_bit(WB_shutting_down) after clear_bit(WB_shutting_down).
>
> Introduce a helper function clear_and_wake_up_bit() and use it, in order
> to avoid similar errors in future.
>
> [1] https://syzkaller.appspot.com/bug?id=b297474817af98d5796bc544e1bb806fc3da0e5e
>
> Signed-off-by: Tetsuo Handa
> Reported-by: syzbot
> Fixes: 5318ce7d46866e1d ("bdi: Shutdown writeback on all cgwbs in cgwb_bdi_destroy()")
> Cc: Tejun Heo
> Cc: Jan Kara
> Cc: Jens Axboe
> Suggested-by: Linus Torvalds

Thanks for debugging this and for the fix Tetsuo! The patch looks good to me.
You can add:

Reviewed-by: Jan Kara

Honza

> ---
>  include/linux/wait_bit.h | 17 +++++++++++++++++
>  mm/backing-dev.c         |  2 +-
>  2 files changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/wait_bit.h b/include/linux/wait_bit.h
> index 9318b21..2b0072f 100644
> --- a/include/linux/wait_bit.h
> +++ b/include/linux/wait_bit.h
> @@ -305,4 +305,21 @@ struct wait_bit_queue_entry {
>  	__ret;	\
>  })
>
> +/**
> + * clear_and_wake_up_bit - clear a bit and wake up anyone waiting on that bit
> + *
> + * @bit: the bit of the word being waited on
> + * @word: the word being waited on, a kernel virtual address
> + *
> + * You can use this helper if bitflags are manipulated atomically rather than
> + * non-atomically under a lock.
> + */
> +static inline void clear_and_wake_up_bit(int bit, void *word)
> +{
> +	clear_bit_unlock(bit, word);
> +	/* See wake_up_bit() for which memory barrier you need to use. */
> +	smp_mb__after_atomic();
> +	wake_up_bit(word, bit);
> +}
> +
>  #endif /* _LINUX_WAIT_BIT_H */
> diff --git a/mm/backing-dev.c b/mm/backing-dev.c
> index 023190c..fa5e6d7 100644
> --- a/mm/backing-dev.c
> +++ b/mm/backing-dev.c
> @@ -383,7 +383,7 @@ static void wb_shutdown(struct bdi_writeback *wb)
>  	 * the barrier provided by test_and_clear_bit() above.
>  	 */
>  	smp_wmb();
> -	clear_bit(WB_shutting_down, &wb->state);
> +	clear_and_wake_up_bit(WB_shutting_down, &wb->state);
>  }
>
>  static void wb_exit(struct bdi_writeback *wb)
> -- 
> 1.8.3.1

-- 
Jan Kara
SUSE Labs, CR
Re: INFO: task hung in wb_shutdown (2)
>From 1b90d7f71d60e743c69cdff3ba41edd1f9f86f93 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa
Date: Wed, 2 May 2018 07:07:55 +0900
Subject: [PATCH v2] bdi: wake up concurrent wb_shutdown() callers.

syzbot is reporting hung tasks at wait_on_bit(WB_shutting_down) in
wb_shutdown() [1]. This seems to be because commit 5318ce7d46866e1d ("bdi:
Shutdown writeback on all cgwbs in cgwb_bdi_destroy()") forgot to call
wake_up_bit(WB_shutting_down) after clear_bit(WB_shutting_down).

Introduce a helper function clear_and_wake_up_bit() and use it, in order
to avoid similar errors in future.

[1] https://syzkaller.appspot.com/bug?id=b297474817af98d5796bc544e1bb806fc3da0e5e

Signed-off-by: Tetsuo Handa
Reported-by: syzbot
Fixes: 5318ce7d46866e1d ("bdi: Shutdown writeback on all cgwbs in cgwb_bdi_destroy()")
Cc: Tejun Heo
Cc: Jan Kara
Cc: Jens Axboe
Suggested-by: Linus Torvalds
---
 include/linux/wait_bit.h | 17 +++++++++++++++++
 mm/backing-dev.c         |  2 +-
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/include/linux/wait_bit.h b/include/linux/wait_bit.h
index 9318b21..2b0072f 100644
--- a/include/linux/wait_bit.h
+++ b/include/linux/wait_bit.h
@@ -305,4 +305,21 @@ struct wait_bit_queue_entry {
 	__ret;	\
 })
 
+/**
+ * clear_and_wake_up_bit - clear a bit and wake up anyone waiting on that bit
+ *
+ * @bit: the bit of the word being waited on
+ * @word: the word being waited on, a kernel virtual address
+ *
+ * You can use this helper if bitflags are manipulated atomically rather than
+ * non-atomically under a lock.
+ */
+static inline void clear_and_wake_up_bit(int bit, void *word)
+{
+	clear_bit_unlock(bit, word);
+	/* See wake_up_bit() for which memory barrier you need to use. */
+	smp_mb__after_atomic();
+	wake_up_bit(word, bit);
+}
+
 #endif /* _LINUX_WAIT_BIT_H */
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 023190c..fa5e6d7 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -383,7 +383,7 @@ static void wb_shutdown(struct bdi_writeback *wb)
 	 * the barrier provided by test_and_clear_bit() above.
 	 */
 	smp_wmb();
-	clear_bit(WB_shutting_down, &wb->state);
+	clear_and_wake_up_bit(WB_shutting_down, &wb->state);
 }
 
 static void wb_exit(struct bdi_writeback *wb)
-- 
1.8.3.1
Re: INFO: task hung in wb_shutdown (2)
On 5/1/18 10:06 AM, Linus Torvalds wrote:
> On Tue, May 1, 2018 at 3:27 AM Tetsuo Handa <penguin-ker...@i-love.sakura.ne.jp> wrote:
>
>> Can you review this patch? syzbot has hit this bug for nearly 4000 times but
>> is still unable to find a reproducer. Therefore, the only way to test would be
>> to apply this patch upstream and test whether the problem is solved.
>
> Looks ok to me, except:
>
>>>  	smp_wmb();
>>>  	clear_bit(WB_shutting_down, &wb->state);
>>> +	smp_mb(); /* advised by wake_up_bit() */
>>> +	wake_up_bit(&wb->state, WB_shutting_down);
>
> This whole sequence really should just be a pattern with a helper function.
>
> And honestly, the pattern probably *should* be
>
> 	clear_bit_unlock(bit, &mem);
> 	smp_mb__after_atomic();
> 	wake_up_bit(&mem, bit);
>
> which looks like it is a bit cleaner wrt memory ordering rules.

Agree, that construct looks saner than introducing a "random" smp_mb().
As a pattern helper, should probably be introduced after the fact.

-- 
Jens Axboe
Re: INFO: task hung in wb_shutdown (2)
On 5/1/18 4:27 AM, Tetsuo Handa wrote:
> Tejun, Jan, Jens,
>
> Can you review this patch? syzbot has hit this bug for nearly 4000 times but
> is still unable to find a reproducer. Therefore, the only way to test would be
> to apply this patch upstream and test whether the problem is solved.

I'll review it today.

-- 
Jens Axboe
Re: INFO: task hung in wb_shutdown (2)
On Tue, May 1, 2018 at 3:27 AM Tetsuo Handa <penguin-ker...@i-love.sakura.ne.jp> wrote:

> Can you review this patch? syzbot has hit this bug for nearly 4000 times but
> is still unable to find a reproducer. Therefore, the only way to test would be
> to apply this patch upstream and test whether the problem is solved.

Looks ok to me, except:

> >  	smp_wmb();
> >  	clear_bit(WB_shutting_down, &wb->state);
> > +	smp_mb(); /* advised by wake_up_bit() */
> > +	wake_up_bit(&wb->state, WB_shutting_down);

This whole sequence really should just be a pattern with a helper function.

And honestly, the pattern probably *should* be

	clear_bit_unlock(bit, &mem);
	smp_mb__after_atomic();
	wake_up_bit(&mem, bit);

which looks like it is a bit cleaner wrt memory ordering rules.

Linus
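A user-space sketch of the ordering argument behind the clear / barrier / wake pattern suggested above. This is not kernel code and rests on stated assumptions: C11 atomics stand in for clear_bit_unlock() and smp_mb__after_atomic(), a waiter counter stands in for the waitqueue_active() test that wake_up_bit() performs internally, and a pthread condition variable stands in for the kernel's bit wait queue. The names struct bit_wq, bit_wait(), clear_and_wake_bit() and demo_clear_and_wake() are hypothetical. The idea it tries to show: the waiter registers itself before re-testing the bit, and the waker has a full fence between clearing the bit and checking for waiters, so at least one side must observe the other and the wakeup cannot be lost.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical stand-in for a flag word and its bit wait queue.
 * nr_waiters plays the role of the waitqueue_active() check. */
struct bit_wq {
	atomic_ulong word;
	atomic_int nr_waiters;
	pthread_mutex_t lock;
	pthread_cond_t cond;
};

/* Waiter: register first, then re-test the bit. The seq_cst increment
 * acts as the full barrier that pairs with the waker's fence. */
static void bit_wait(struct bit_wq *q, int bit)
{
	unsigned long mask = 1UL << bit;

	pthread_mutex_lock(&q->lock);
	atomic_fetch_add(&q->nr_waiters, 1);
	while (atomic_load(&q->word) & mask)
		pthread_cond_wait(&q->cond, &q->lock);
	atomic_fetch_sub(&q->nr_waiters, 1);
	pthread_mutex_unlock(&q->lock);
}

/* Waker: clear, full barrier, then wake only if someone registered. */
static void clear_and_wake_bit(struct bit_wq *q, int bit)
{
	/* like clear_bit_unlock(): release ordering for earlier stores */
	atomic_fetch_and_explicit(&q->word, ~(1UL << bit), memory_order_release);
	/* like smp_mb__after_atomic(): order the clear before the check */
	atomic_thread_fence(memory_order_seq_cst);
	/* like wake_up_bit(): skip the wakeup if nobody is registered */
	if (atomic_load_explicit(&q->nr_waiters, memory_order_relaxed) > 0) {
		pthread_mutex_lock(&q->lock);
		pthread_cond_broadcast(&q->cond);
		pthread_mutex_unlock(&q->lock);
	}
}

struct waiter_arg {
	struct bit_wq *q;
	int bit;
	atomic_int done;
};

static void *waiter_thread(void *p)
{
	struct waiter_arg *a = p;

	bit_wait(a->q, a->bit);
	atomic_store(&a->done, 1);
	return NULL;
}

/* Park a waiter on a set bit, then clear-and-wake it. Returns 1 if the
 * waiter stayed blocked until the clear and then returned. */
static int demo_clear_and_wake(void)
{
	struct bit_wq q = { .lock = PTHREAD_MUTEX_INITIALIZER,
			    .cond = PTHREAD_COND_INITIALIZER };
	struct waiter_arg a = { .q = &q, .bit = 3 };
	struct timespec pause = { 0, 10 * 1000000L }; /* 10 ms */
	pthread_t t;
	int early;

	atomic_init(&q.word, 1UL << 3);
	atomic_init(&q.nr_waiters, 0);
	atomic_init(&a.done, 0);
	pthread_create(&t, NULL, waiter_thread, &a);
	nanosleep(&pause, NULL);      /* give the waiter time to block */
	early = atomic_load(&a.done); /* bit still set, so this must be 0 */
	clear_and_wake_bit(&q, 3);
	pthread_join(t, NULL);        /* would hang on a lost wakeup */
	return !early && atomic_load(&a.done) == 1 && atomic_load(&q.word) == 0;
}
```

demo_clear_and_wake() should return 1; build with -pthread. If the fence were removed, the C11 model would allow a store-buffering outcome in which the waiter still reads the bit as set while the waker reads nr_waiters as 0, which is the kind of race the barrier advice on wake_up_bit() is about.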
Re: INFO: task hung in wb_shutdown (2)
Tejun, Jan, Jens,

Can you review this patch? syzbot has hit this bug for nearly 4000 times but
is still unable to find a reproducer. Therefore, the only way to test would be
to apply this patch upstream and test whether the problem is solved.

On 2018/04/24 21:19, Tetsuo Handa wrote:
>>From 39ed6be8a2c12dfe54feaa5abbc2ec46103022bf Mon Sep 17 00:00:00 2001
> From: Tetsuo Handa
> Date: Tue, 24 Apr 2018 11:59:08 +0900
> Subject: [PATCH] bdi: wake up concurrent wb_shutdown() callers.
>
> syzbot is reporting hung tasks at wait_on_bit(WB_shutting_down) in
> wb_shutdown() [1]. This might be because commit 5318ce7d46866e1d ("bdi:
> Shutdown writeback on all cgwbs in cgwb_bdi_destroy()") forgot to call
> wake_up_bit(WB_shutting_down) after clear_bit(WB_shutting_down).
>
> [1] https://syzkaller.appspot.com/bug?id=b297474817af98d5796bc544e1bb806fc3da0e5e
>
> Signed-off-by: Tetsuo Handa
> Reported-by: syzbot
> Fixes: 5318ce7d46866e1d ("bdi: Shutdown writeback on all cgwbs in cgwb_bdi_destroy()")
> Cc: Tejun Heo
> Cc: Jan Kara
> Cc: Jens Axboe
> ---
>  mm/backing-dev.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/mm/backing-dev.c b/mm/backing-dev.c
> index 023190c..dadac99 100644
> --- a/mm/backing-dev.c
> +++ b/mm/backing-dev.c
> @@ -384,6 +384,8 @@ static void wb_shutdown(struct bdi_writeback *wb)
>  	 */
>  	smp_wmb();
>  	clear_bit(WB_shutting_down, &wb->state);
> +	smp_mb(); /* advised by wake_up_bit() */
> +	wake_up_bit(&wb->state, WB_shutting_down);
>  }
>
>  static void wb_exit(struct bdi_writeback *wb)
Re: INFO: task hung in wb_shutdown (2)
>From 39ed6be8a2c12dfe54feaa5abbc2ec46103022bf Mon Sep 17 00:00:00 2001
From: Tetsuo Handa
Date: Tue, 24 Apr 2018 11:59:08 +0900
Subject: [PATCH] bdi: wake up concurrent wb_shutdown() callers.

syzbot is reporting hung tasks at wait_on_bit(WB_shutting_down) in
wb_shutdown() [1]. This might be because commit 5318ce7d46866e1d ("bdi:
Shutdown writeback on all cgwbs in cgwb_bdi_destroy()") forgot to call
wake_up_bit(WB_shutting_down) after clear_bit(WB_shutting_down).

[1] https://syzkaller.appspot.com/bug?id=b297474817af98d5796bc544e1bb806fc3da0e5e

Signed-off-by: Tetsuo Handa
Reported-by: syzbot
Fixes: 5318ce7d46866e1d ("bdi: Shutdown writeback on all cgwbs in cgwb_bdi_destroy()")
Cc: Tejun Heo
Cc: Jan Kara
Cc: Jens Axboe
---
 mm/backing-dev.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 023190c..dadac99 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -384,6 +384,8 @@ static void wb_shutdown(struct bdi_writeback *wb)
 	 */
 	smp_wmb();
 	clear_bit(WB_shutting_down, &wb->state);
+	smp_mb(); /* advised by wake_up_bit() */
+	wake_up_bit(&wb->state, WB_shutting_down);
 }
 
 static void wb_exit(struct bdi_writeback *wb)
-- 
1.8.3.1
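The hang described in this patch can be modeled in user space. This is a simplified, hypothetical model rather than the kernel code: struct wb_model, its registered and shutting_down flags, and the helpers wb_claim(), wb_finish(), wb_wait() and race_two_shutdowns() are invented for illustration. A mutex-protected bool stands in for the WB_shutting_down bit, a registered flag is an assumed way of choosing which caller performs the shutdown, and a pthread condition variable stands in for wait_on_bit()/wake_up_bit(). The first caller claims the shutdown and sets the flag; a concurrent caller sleeps until the flag clears. With wake_on_clear set to false the model clears the flag without any wakeup, so the sleeper is never signalled. A timeout is used only so the demo can report the hang (-1) instead of blocking forever the way the kworker in the syzbot report did.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical model of the shutdown flags; not kernel code. */
struct wb_model {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool registered;    /* true until some caller claims the shutdown */
	bool shutting_down; /* stands in for the WB_shutting_down bit */
	int nr_waiters;     /* callers currently sleeping on shutting_down */
	bool wake_on_clear; /* false models clear_bit() with no wake_up_bit() */
};

/* Returns true if this caller claimed the shutdown and set the flag. */
static bool wb_claim(struct wb_model *wb)
{
	bool claimed;

	pthread_mutex_lock(&wb->lock);
	claimed = wb->registered;
	if (claimed) {
		wb->registered = false;
		wb->shutting_down = true;
	}
	pthread_mutex_unlock(&wb->lock);
	return claimed;
}

/* Called by the claiming caller once its teardown work is done. */
static void wb_finish(struct wb_model *wb)
{
	pthread_mutex_lock(&wb->lock);
	wb->shutting_down = false;
	if (wb->wake_on_clear)
		pthread_cond_broadcast(&wb->cond); /* the wakeup v1 adds */
	pthread_mutex_unlock(&wb->lock);
}

/* Concurrent caller: sleep until the flag clears. Returns 0 if woken,
 * -1 if the timeout expired (an untimed waiter would sleep forever). */
static int wb_wait(struct wb_model *wb, time_t timeout_s)
{
	struct timespec deadline;
	int ret = 0;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_s;
	pthread_mutex_lock(&wb->lock);
	wb->nr_waiters++;
	while (wb->shutting_down) {
		if (pthread_cond_timedwait(&wb->cond, &wb->lock, &deadline) == ETIMEDOUT) {
			ret = -1;
			break;
		}
	}
	wb->nr_waiters--;
	pthread_mutex_unlock(&wb->lock);
	return ret;
}

struct caller { struct wb_model *wb; int result; };

static void *second_caller(void *p)
{
	struct caller *c = p;

	/* The shutdown is already claimed, so this caller has to wait. */
	c->result = wb_claim(c->wb) ? 1 : wb_wait(c->wb, 1);
	return NULL;
}

/* Race two shutdown callers; returns the second caller's result. */
static int race_two_shutdowns(bool wake_on_clear)
{
	struct wb_model wb = { .lock = PTHREAD_MUTEX_INITIALIZER,
			       .cond = PTHREAD_COND_INITIALIZER,
			       .registered = true,
			       .wake_on_clear = wake_on_clear };
	struct caller c = { .wb = &wb, .result = 1 };
	struct timespec ms = { 0, 1000000L };
	pthread_t t;
	int waiting = 0;

	(void)wb_claim(&wb); /* first caller wins the claim */
	pthread_create(&t, NULL, second_caller, &c);
	while (!waiting) {   /* wait until the second caller is asleep */
		pthread_mutex_lock(&wb.lock);
		waiting = wb.nr_waiters;
		pthread_mutex_unlock(&wb.lock);
		if (!waiting)
			nanosleep(&ms, NULL);
	}
	wb_finish(&wb);      /* clear the flag, and maybe wake */
	pthread_join(t, NULL);
	return c.result;
}
```

race_two_shutdowns(true) should return 0, and race_two_shutdowns(false) should return -1 after the one-second timeout; build with -pthread. Because the model keeps the flag under a mutex, it deliberately sidesteps the lockless memory-barrier question discussed in the replies.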
INFO: task hung in wb_shutdown (2)
Hello,

syzbot hit the following crash on upstream commit
3eb2ce825ea1ad89d20f7a3b5780df850e4be274 (Sun Mar 25 22:44:30 2018 +0000)
Linux 4.16-rc7
syzbot dashboard link: https://syzkaller.appspot.com/bug?extid=c0cf869505e03bdf1a24

So far this crash happened 179 times on upstream.
Unfortunately, I don't have any reproducer for this crash yet.
Raw console output: https://syzkaller.appspot.com/x/log.txt?id=4738516814659584
Kernel config: https://syzkaller.appspot.com/x/.config?id=-8440362230543204781
compiler: gcc (GCC) 7.1.1 20170620

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+c0cf869505e03bdf1...@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for details.
If you forward the report, please keep this part and the footer.

unregister_netdevice: waiting for lo to become free. Usage count = 1
unregister_netdevice: waiting for lo to become free. Usage count = 1
unregister_netdevice: waiting for lo to become free. Usage count = 1
unregister_netdevice: waiting for lo to become free. Usage count = 1
unregister_netdevice: waiting for lo to become free. Usage count = 1
INFO: task kworker/0:5:16458 blocked for more than 120 seconds.
      Not tainted 4.16.0-rc7+ #368
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/0:5     D20928 16458      2 0x8000
Workqueue: events cgwb_release_workfn
Call Trace:
 context_switch kernel/sched/core.c:2862 [inline]
 __schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
 schedule+0xf5/0x430 kernel/sched/core.c:3499
 bit_wait+0x18/0x90 kernel/sched/wait_bit.c:250
 __wait_on_bit+0x88/0x130 kernel/sched/wait_bit.c:51
 out_of_line_wait_on_bit+0x204/0x3a0 kernel/sched/wait_bit.c:64
 wait_on_bit include/linux/wait_bit.h:84 [inline]
 wb_shutdown+0x335/0x430 mm/backing-dev.c:377
 cgwb_release_workfn+0x8b/0x61d mm/backing-dev.c:520
 process_one_work+0xc47/0x1bb0 kernel/workqueue.c:2113
 worker_thread+0x223/0x1990 kernel/workqueue.c:2247
 kthread+0x33c/0x400 kernel/kthread.c:238
 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406

Showing all locks held in the system:
3 locks held by kworker/u4:1/21:
 #0:  ((wq_completion)"%s""netns"){+.+.}, at: [] work_static include/linux/workqueue.h:198 [inline]
 #0:  ((wq_completion)"%s""netns"){+.+.}, at: [ ] set_work_data kernel/workqueue.c:619 [inline]
 #0:  ((wq_completion)"%s""netns"){+.+.}, at: [ ] set_work_pool_and_clear_pending kernel/workqueue.c:646 [inline]
 #0:  ((wq_completion)"%s""netns"){+.+.}, at: [ ] process_one_work+0xb12/0x1bb0 kernel/workqueue.c:2084
 #1:  (net_cleanup_work){+.+.}, at: [<6c4c2cfd>] process_one_work+0xb89/0x1bb0 kernel/workqueue.c:2088
 #2:  (net_mutex){+.+.}, at: [<58427774>] cleanup_net+0x242/0xcb0 net/core/net_namespace.c:484
2 locks held by khungtaskd/869:
 #0:  (rcu_read_lock){}, at: [<9066e5de>] check_hung_uninterruptible_tasks kernel/hung_task.c:175 [inline]
 #0:  (rcu_read_lock){}, at: [<9066e5de>] watchdog+0x1c5/0xd60 kernel/hung_task.c:249
 #1:  (tasklist_lock){.+.+}, at: [<2117cbd8>] debug_show_all_locks+0xd3/0x3d0 kernel/locking/lockdep.c:4470
2 locks held by getty/4407:
 #0:  (&tty->ldisc_sem){}, at: [<15dafb41>] ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
 #1:  (&ldata->atomic_read_lock){+.+.}, at: [<841085d3>] n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by getty/4408:
 #0:  (&tty->ldisc_sem){}, at: [<15dafb41>] ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
 #1:  (&ldata->atomic_read_lock){+.+.}, at: [<841085d3>] n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by getty/4409:
 #0:  (&tty->ldisc_sem){}, at: [<15dafb41>] ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
 #1:  (&ldata->atomic_read_lock){+.+.}, at: [<841085d3>] n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by getty/4410:
 #0:  (&tty->ldisc_sem){}, at: [<15dafb41>] ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
 #1:  (&ldata->atomic_read_lock){+.+.}, at: [<841085d3>] n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by getty/4411:
 #0:  (&tty->ldisc_sem){}, at: [<15dafb41>] ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
 #1:  (&ldata->atomic_read_lock){+.+.}, at: [<841085d3>] n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by getty/4412:
 #0:  (&tty->ldisc_sem){}, at: [<15dafb41>] ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
 #1:  (&ldata->atomic_read_lock){+.+.}, at: [<841085d3>] n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by getty/4413:
 #0:  (&tty->ldisc_sem){}, at: [<15dafb41>] ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
 #1:  (&ldata->atomic_read_lock){+.+.}, at: [<