[PATCH] bdi: Fix oops in wb_workfn()

2018-05-03 Thread Jan Kara
Syzbot has reported that it can hit a NULL pointer dereference in
wb_workfn() due to wb->bdi->dev being NULL. This indicates that
wb_workfn() was called for an already unregistered bdi which should not
happen as wb_shutdown() called from bdi_unregister() should make sure
all pending writeback works are completed before bdi is unregistered.
Except that wb_workfn() itself can requeue the work with:

mod_delayed_work(bdi_wq, &wb->dwork, 0);

and if this happens while wb_shutdown() is waiting in:

flush_delayed_work(&wb->dwork);

the dwork can get executed after wb_shutdown() has finished and
bdi_unregister() has cleared wb->bdi->dev.

Make wb_workfn() requeue the work via wb_wakeup(), which takes all
the necessary precautions against racing with bdi unregistration.

CC: Tetsuo Handa 
CC: Tejun Heo 
Fixes: 839a8e8660b6777e7fe4e80af1a048aebe2b5977
Reported-by: syzbot 
Signed-off-by: Jan Kara 
---
 fs/fs-writeback.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 47d7c151fcba..471d863958bc 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1961,7 +1961,7 @@ void wb_workfn(struct work_struct *work)
 	}
 
 	if (!list_empty(&wb->work_list))
-		mod_delayed_work(bdi_wq, &wb->dwork, 0);
+		wb_wakeup(wb);
 	else if (wb_has_dirty_io(wb) && dirty_writeback_interval)
 		wb_wakeup_delayed(wb);
 
-- 
2.13.6
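
The idea behind the one-line change can be illustrated outside the kernel. The following is only a minimal userspace sketch, assuming a toy structure in place of struct bdi_writeback; the names toy_wb, toy_wakeup(), toy_requeue_unguarded() and toy_unregister() are hypothetical and do not exist in the kernel. It shows that a requeue which first tests a "registered" flag is refused once the flag has been cleared, whereas an open-coded requeue is not.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct bdi_writeback. */
struct toy_wb {
	bool registered;  /* plays the role of the WB_registered bit */
	int queued;       /* how many times the work item was (re)queued */
};

/* Open-coded requeue, as wb_workfn() did before the patch: no check. */
static void toy_requeue_unguarded(struct toy_wb *wb)
{
	wb->queued++;
}

/*
 * Guarded requeue modelled on wb_wakeup(). The real helper also holds
 * wb->work_lock around the test; this single-threaded sketch omits it.
 */
static void toy_wakeup(struct toy_wb *wb)
{
	if (wb->registered)
		wb->queued++;
}

/* Models the part of wb_shutdown() that clears WB_registered. */
static void toy_unregister(struct toy_wb *wb)
{
	wb->registered = false;
}

/* Returns the number of requeues that slip through after unregistration. */
static int toy_requeues_after_unregister(bool guarded)
{
	struct toy_wb wb = { .registered = true, .queued = 0 };

	toy_unregister(&wb);
	assert(!wb.registered);
	if (guarded)
		toy_wakeup(&wb);
	else
		toy_requeue_unguarded(&wb);
	return wb.queued;
}
```

Calling toy_requeues_after_unregister(false) models the old behaviour and toy_requeues_after_unregister(true) the patched one. The sketch says nothing about ordering between CPUs, which is where the real difficulty lies.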



Re: [PATCH] bdi: Fix oops in wb_workfn()

2018-05-03 Thread Dave Chinner
On Thu, May 03, 2018 at 06:26:26PM +0200, Jan Kara wrote:
> Syzbot has reported that it can hit a NULL pointer dereference in
> wb_workfn() due to wb->bdi->dev being NULL. This indicates that
> wb_workfn() was called for an already unregistered bdi which should not
> happen as wb_shutdown() called from bdi_unregister() should make sure
> all pending writeback works are completed before bdi is unregistered.
> Except that wb_workfn() itself can requeue the work with:
> 
>   mod_delayed_work(bdi_wq, &wb->dwork, 0);
> 
> and if this happens while wb_shutdown() is waiting in:
> 
>   flush_delayed_work(&wb->dwork);
> 
> the dwork can get executed after wb_shutdown() has finished and
> bdi_unregister() has cleared wb->bdi->dev.
> 
> Make wb_workfn() use wakeup_wb() for requeueing the work which takes all
> the necessary precautions against racing with bdi unregistration.
> 
> CC: Tetsuo Handa 
> CC: Tejun Heo 
> Fixes: 839a8e8660b6777e7fe4e80af1a048aebe2b5977
> Reported-by: syzbot 
> Signed-off-by: Jan Kara 
> ---
>  fs/fs-writeback.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 47d7c151fcba..471d863958bc 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -1961,7 +1961,7 @@ void wb_workfn(struct work_struct *work)
>   }
>  
>   if (!list_empty(&wb->work_list))
> - mod_delayed_work(bdi_wq, &wb->dwork, 0);
> + wb_wakeup(wb);
>   else if (wb_has_dirty_io(wb) && dirty_writeback_interval)
>   wb_wakeup_delayed(wb);

Yup, looks fine - I can't see any more of these open coded wakeups,
either, so we should be good here.

Reviewed-by: Dave Chinner 

As an aside, why is half the wb infrastructure in fs/fs-writeback.c
and the other half in mm/backing-dev.c? It seems pretty random as to
what is where, e.g. wb_wakeup() and wb_wakeup_delayed() are almost
identical, but are in completely different files...

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH] bdi: Fix oops in wb_workfn()

2018-05-03 Thread Jens Axboe
On 5/3/18 3:55 PM, Dave Chinner wrote:
> On Thu, May 03, 2018 at 06:26:26PM +0200, Jan Kara wrote:
>> Syzbot has reported that it can hit a NULL pointer dereference in
>> wb_workfn() due to wb->bdi->dev being NULL. This indicates that
>> wb_workfn() was called for an already unregistered bdi which should not
>> happen as wb_shutdown() called from bdi_unregister() should make sure
>> all pending writeback works are completed before bdi is unregistered.
>> Except that wb_workfn() itself can requeue the work with:
>>
>>  mod_delayed_work(bdi_wq, &wb->dwork, 0);
>>
>> and if this happens while wb_shutdown() is waiting in:
>>
>>  flush_delayed_work(&wb->dwork);
>>
>> the dwork can get executed after wb_shutdown() has finished and
>> bdi_unregister() has cleared wb->bdi->dev.
>>
>> Make wb_workfn() use wakeup_wb() for requeueing the work which takes all
>> the necessary precautions against racing with bdi unregistration.
>>
>> CC: Tetsuo Handa 
>> CC: Tejun Heo 
>> Fixes: 839a8e8660b6777e7fe4e80af1a048aebe2b5977
>> Reported-by: syzbot 
>> Signed-off-by: Jan Kara 
>> ---
>>  fs/fs-writeback.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
>> index 47d7c151fcba..471d863958bc 100644
>> --- a/fs/fs-writeback.c
>> +++ b/fs/fs-writeback.c
>> @@ -1961,7 +1961,7 @@ void wb_workfn(struct work_struct *work)
>>  }
>>  
>>  if (!list_empty(&wb->work_list))
>> -mod_delayed_work(bdi_wq, &wb->dwork, 0);
>> +wb_wakeup(wb);
>>  else if (wb_has_dirty_io(wb) && dirty_writeback_interval)
>>  wb_wakeup_delayed(wb);
> 
> Yup, looks fine - I can't see any more of these open coded wakeup,
> either, so we should be good here.
> 
> Reviewed-by: Dave Chinner 
> 
> As an aside, why is half the wb infrastructure in fs/fs-writeback.c
> and the other half in mm/backing-dev.c? it seems pretty random as to
> what is where e.g. wb_wakeup() and wb_wakeup_delayed() are almost
> identical, but are in completely different files...

That's always bothered me too. It's due for a cleanup that brings it
all into one location.

-- 
Jens Axboe



Re: [PATCH] bdi: Fix oops in wb_workfn()

2018-05-03 Thread Tetsuo Handa
Jan Kara wrote:
> Make wb_workfn() use wakeup_wb() for requeueing the work which takes all
> the necessary precautions against racing with bdi unregistration.

Yes, this patch will solve the NULL pointer dereference bug. But is it OK to
leave the list_empty(&wb->work_list) == false situation? Who takes over the
role of making list_empty(&wb->work_list) == true?

I am asking for confirmation because Fabiano Rosas is facing a problem where a
"write call hangs in kernel space after virtio hot-remove" and thinks that we
might need to go in the opposite direction
( http://lkml.kernel.org/r/f0787b79-1e50-5f55-a400-44f715451...@linux.ibm.com ).


Re: [PATCH] bdi: Fix oops in wb_workfn()

2018-05-09 Thread Jan Kara
On Fri 04-05-18 07:35:34, Tetsuo Handa wrote:
> Jan Kara wrote:
> > Make wb_workfn() use wakeup_wb() for requeueing the work which takes all
> > the necessary precautions against racing with bdi unregistration.
> 
> Yes, this patch will solve NULL pointer dereference bug. But is it OK to
> leave list_empty(&wb->work_list) == false situation? Who takes over the
> role of making list_empty(&wb->work_list) == true?

That's a good question. The reason is that the last running instance of
wb_workfn() cannot leave with the work_list non-empty. Once WB_registered
is cleared, we cannot add new entries to work_list. Then we'll queue and
flush a last wb_workfn() to clean up the list. The NULL ptr deref has been
triggered not by this last running wb_workfn() but by one running
independently, in parallel with wb_shutdown(). So something like:

CPU0                              CPU1                              CPU2
wb_workfn()
  do {
    ...
  } while (!list_empty(&wb->work_list));
                                  wb_queue_work()
                                    if (test_bit(WB_registered, &wb->state)) {
                                      list_add_tail(&work->list, &wb->work_list);
                                      mod_delayed_work(bdi_wq, &wb->dwork, 0);
                                    }
                                                                    wb_shutdown()
                                                                      if (!test_and_clear_bit(WB_registered, &wb->state)) {
                                                                      ...
                                                                      mod_delayed_work(bdi_wq, &wb->dwork, 0);
                                                                      flush_delayed_work(&wb->dwork);
  if (!list_empty(&wb->work_list))
    mod_delayed_work(bdi_wq, &wb->dwork, 0); -> queues buggy work

> Just a confirmation, for Fabiano Rosas is facing a problem that "write call
> hangs in kernel space after virtio hot-remove" and is thinking that we might
> need to go the opposite direction
> ( http://lkml.kernel.org/r/f0787b79-1e50-5f55-a400-44f715451...@linux.ibm.com ).

Yes, I'm aware of that report and I think it should be solved
differently than what Fabiano suggests.

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: [PATCH] bdi: Fix oops in wb_workfn()

2018-05-09 Thread Jan Kara
On Fri 04-05-18 07:55:58, Dave Chinner wrote:
> On Thu, May 03, 2018 at 06:26:26PM +0200, Jan Kara wrote:
> > Syzbot has reported that it can hit a NULL pointer dereference in
> > wb_workfn() due to wb->bdi->dev being NULL. This indicates that
> > wb_workfn() was called for an already unregistered bdi which should not
> > happen as wb_shutdown() called from bdi_unregister() should make sure
> > all pending writeback works are completed before bdi is unregistered.
> > Except that wb_workfn() itself can requeue the work with:
> > 
> > mod_delayed_work(bdi_wq, &wb->dwork, 0);
> > 
> > and if this happens while wb_shutdown() is waiting in:
> > 
> > flush_delayed_work(&wb->dwork);
> > 
> > the dwork can get executed after wb_shutdown() has finished and
> > bdi_unregister() has cleared wb->bdi->dev.
> > 
> > Make wb_workfn() use wakeup_wb() for requeueing the work which takes all
> > the necessary precautions against racing with bdi unregistration.
> > 
> > CC: Tetsuo Handa 
> > CC: Tejun Heo 
> > Fixes: 839a8e8660b6777e7fe4e80af1a048aebe2b5977
> > Reported-by: syzbot 
> > Signed-off-by: Jan Kara 
> > ---
> >  fs/fs-writeback.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> > index 47d7c151fcba..471d863958bc 100644
> > --- a/fs/fs-writeback.c
> > +++ b/fs/fs-writeback.c
> > @@ -1961,7 +1961,7 @@ void wb_workfn(struct work_struct *work)
> > }
> >  
> > if (!list_empty(&wb->work_list))
> > -   mod_delayed_work(bdi_wq, &wb->dwork, 0);
> > +   wb_wakeup(wb);
> > else if (wb_has_dirty_io(wb) && dirty_writeback_interval)
> > wb_wakeup_delayed(wb);
> 
> Yup, looks fine - I can't see any more of these open coded wakeup,
> either, so we should be good here.
> 
> Reviewed-by: Dave Chinner 

Thanks!

> As an aside, why is half the wb infrastructure in fs/fs-writeback.c
> and the other half in mm/backing-dev.c? it seems pretty random as to
> what is where e.g. wb_wakeup() and wb_wakeup_delayed() are almost
> identical, but are in completely different files...

Yeah, it deserves a cleanup.

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: [PATCH] bdi: Fix oops in wb_workfn()

2018-05-09 Thread Jan Kara
On Thu 03-05-18 18:26:26, Jan Kara wrote:
> Syzbot has reported that it can hit a NULL pointer dereference in
> wb_workfn() due to wb->bdi->dev being NULL. This indicates that
> wb_workfn() was called for an already unregistered bdi which should not
> happen as wb_shutdown() called from bdi_unregister() should make sure
> all pending writeback works are completed before bdi is unregistered.
> Except that wb_workfn() itself can requeue the work with:
> 
>   mod_delayed_work(bdi_wq, &wb->dwork, 0);
> 
> and if this happens while wb_shutdown() is waiting in:
> 
>   flush_delayed_work(&wb->dwork);
> 
> the dwork can get executed after wb_shutdown() has finished and
> bdi_unregister() has cleared wb->bdi->dev.
> 
> Make wb_workfn() use wakeup_wb() for requeueing the work which takes all
> the necessary precautions against racing with bdi unregistration.
> 
> CC: Tetsuo Handa 
> CC: Tejun Heo 
> Fixes: 839a8e8660b6777e7fe4e80af1a048aebe2b5977
> Reported-by: syzbot 
> Signed-off-by: Jan Kara 
> ---
>  fs/fs-writeback.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Jens, can you please pick up this patch? Probably for the next merge window
(I don't see a reason to rush this at this point in release cycle). Thanks!

Honza

> 
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 47d7c151fcba..471d863958bc 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -1961,7 +1961,7 @@ void wb_workfn(struct work_struct *work)
>   }
>  
>   if (!list_empty(&wb->work_list))
> - mod_delayed_work(bdi_wq, &wb->dwork, 0);
> + wb_wakeup(wb);
>   else if (wb_has_dirty_io(wb) && dirty_writeback_interval)
>   wb_wakeup_delayed(wb);
>  
> -- 
> 2.13.6
> 
-- 
Jan Kara 
SUSE Labs, CR


Re: [PATCH] bdi: Fix oops in wb_workfn()

2018-05-09 Thread Jens Axboe
On 5/9/18 4:31 AM, Jan Kara wrote:
> On Thu 03-05-18 18:26:26, Jan Kara wrote:
>> Syzbot has reported that it can hit a NULL pointer dereference in
>> wb_workfn() due to wb->bdi->dev being NULL. This indicates that
>> wb_workfn() was called for an already unregistered bdi which should not
>> happen as wb_shutdown() called from bdi_unregister() should make sure
>> all pending writeback works are completed before bdi is unregistered.
>> Except that wb_workfn() itself can requeue the work with:
>>
>>  mod_delayed_work(bdi_wq, &wb->dwork, 0);
>>
>> and if this happens while wb_shutdown() is waiting in:
>>
>>  flush_delayed_work(&wb->dwork);
>>
>> the dwork can get executed after wb_shutdown() has finished and
>> bdi_unregister() has cleared wb->bdi->dev.
>>
>> Make wb_workfn() use wakeup_wb() for requeueing the work which takes all
>> the necessary precautions against racing with bdi unregistration.
>>
>> CC: Tetsuo Handa 
>> CC: Tejun Heo 
>> Fixes: 839a8e8660b6777e7fe4e80af1a048aebe2b5977
>> Reported-by: syzbot 
>> Signed-off-by: Jan Kara 
>> ---
>>  fs/fs-writeback.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Jens, can you please pick up this patch? Probably for the next merge window
> (I don't see a reason to rush this at this point in release cycle). Thanks!

Looks like I never replied back, but I did pick it up, and it did
in fact go out last week for this series. So we should be all good. I
didn't see a need to postpone it; it's obviously correct and fixes
a real issue.

-- 
Jens Axboe



Re: [PATCH] bdi: Fix oops in wb_workfn()

2018-05-19 Thread Tetsuo Handa
Tetsuo Handa wrote:
> Jan Kara wrote:
> > Make wb_workfn() use wakeup_wb() for requeueing the work which takes all
> > the necessary precautions against racing with bdi unregistration.
> 
> Yes, this patch will solve NULL pointer dereference bug. But is it OK to leave
> list_empty(&wb->work_list) == false situation? Who takes over the role of making
> list_empty(&wb->work_list) == true?

syzbot is again reporting the same NULL pointer dereference.

  general protection fault in wb_workfn (2)
  https://syzkaller.appspot.com/bug?id=e0818ccb7e46190b3f1038b0c794299208ed4206

Didn't we overlook something obvious in commit b8b784958eccbf8f ("bdi: Fix
oops in wb_workfn()")?

At first, I thought that this commit would solve the NULL pointer dereference
bug. But what does

 	if (!list_empty(&wb->work_list))
-		mod_delayed_work(bdi_wq, &wb->dwork, 0);
+		wb_wakeup(wb);
 	else if (wb_has_dirty_io(wb) && dirty_writeback_interval)
 		wb_wakeup_delayed(wb);

mean?

static void wb_wakeup(struct bdi_writeback *wb)
{
	spin_lock_bh(&wb->work_lock);
	if (test_bit(WB_registered, &wb->state))
		mod_delayed_work(bdi_wq, &wb->dwork, 0);
	spin_unlock_bh(&wb->work_lock);
}

It means nothing but "we don't call mod_delayed_work() if the WB_registered
bit was already cleared".

But what if the WB_registered bit is not yet cleared when we hit the
wb_wakeup_delayed() path?

void wb_wakeup_delayed(struct bdi_writeback *wb)
{
	unsigned long timeout;

	timeout = msecs_to_jiffies(dirty_writeback_interval * 10);
	spin_lock_bh(&wb->work_lock);
	if (test_bit(WB_registered, &wb->state))
		queue_delayed_work(bdi_wq, &wb->dwork, timeout);
	spin_unlock_bh(&wb->work_lock);
}

add_timer() is called because (presumably) timeout > 0. And after that timeout
expires, __queue_work() is called even if WB_registered bit is already cleared
before that timeout expires, isn't it?

void delayed_work_timer_fn(struct timer_list *t)
{
	struct delayed_work *dwork = from_timer(dwork, t, timer);

	/* should have been called from irqsafe timer with irq already off */
	__queue_work(dwork->cpu, dwork->wq, &dwork->work);
}

Then, wb_workfn() is after all scheduled even if we check for WB_registered bit,
isn't it?

Then, don't we need to check that

mod_delayed_work(bdi_wq, &wb->dwork, 0);
flush_delayed_work(&wb->dwork);

is really waiting for completion? At least, shouldn't we try below debug output
(not only for debugging this report but also generally desirable)?

diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 7441bd9..ccec8cd 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -376,8 +376,10 @@ static void wb_shutdown(struct bdi_writeback *wb)
 	 * tells wb_workfn() that @wb is dying and its work_list needs to
 	 * be drained no matter what.
 	 */
-	mod_delayed_work(bdi_wq, &wb->dwork, 0);
-	flush_delayed_work(&wb->dwork);
+	if (!mod_delayed_work(bdi_wq, &wb->dwork, 0))
+		printk(KERN_WARNING "wb_shutdown: mod_delayed_work() failed\n");
+	if (!flush_delayed_work(&wb->dwork))
+		printk(KERN_WARNING "wb_shutdown: flush_delayed_work() failed\n");
 	WARN_ON(!list_empty(&wb->work_list));
 	/*
 	 * Make sure bit gets cleared after shutdown is finished. Matches with


Re: [PATCH] bdi: Fix oops in wb_workfn()

2018-05-21 Thread Jan Kara
On Sat 19-05-18 23:27:09, Tetsuo Handa wrote:
> Tetsuo Handa wrote:
> > Jan Kara wrote:
> > > Make wb_workfn() use wakeup_wb() for requeueing the work which takes all
> > > the necessary precautions against racing with bdi unregistration.
> > 
> > Yes, this patch will solve NULL pointer dereference bug. But is it OK to
> > leave list_empty(&wb->work_list) == false situation? Who takes over the
> > role of making list_empty(&wb->work_list) == true?
> 
> syzbot is again reporting the same NULL pointer dereference.
> 
>   general protection fault in wb_workfn (2)
>   https://syzkaller.appspot.com/bug?id=e0818ccb7e46190b3f1038b0c794299208ed4206

Gaah... So we are still missing something.

> Didn't we overlook something obvious in commit b8b784958eccbf8f ("bdi:
> Fix oops in wb_workfn()") ?
> 
> At first, I thought that that commit will solve NULL pointer dereference bug.
> But what does
> 
>   if (!list_empty(&wb->work_list))
> - mod_delayed_work(bdi_wq, &wb->dwork, 0);
> + wb_wakeup(wb);
>   else if (wb_has_dirty_io(wb) && dirty_writeback_interval)
>   wb_wakeup_delayed(wb);
> 
> mean?
> 
> static void wb_wakeup(struct bdi_writeback *wb)
> {
>   spin_lock_bh(&wb->work_lock);
>   if (test_bit(WB_registered, &wb->state))
>   mod_delayed_work(bdi_wq, &wb->dwork, 0);
>   spin_unlock_bh(&wb->work_lock);
> }
> 
> It means nothing but "we don't call mod_delayed_work() if WB_registered
> bit was already cleared".

Exactly.

> But if WB_registered bit is not yet cleared when we hit
> wb_wakeup_delayed() path?
> 
> void wb_wakeup_delayed(struct bdi_writeback *wb)
> {
>   unsigned long timeout;
> 
>   timeout = msecs_to_jiffies(dirty_writeback_interval * 10);
>   spin_lock_bh(&wb->work_lock);
>   if (test_bit(WB_registered, &wb->state))
>   queue_delayed_work(bdi_wq, &wb->dwork, timeout);
>   spin_unlock_bh(&wb->work_lock);
> }
> 
> add_timer() is called because (presumably) timeout > 0. And after that
> timeout expires, __queue_work() is called even if WB_registered bit is
> already cleared before that timeout expires, isn't it?

Yes.

> void delayed_work_timer_fn(struct timer_list *t)
> {
>   struct delayed_work *dwork = from_timer(dwork, t, timer);
> 
>   /* should have been called from irqsafe timer with irq already off */
>   __queue_work(dwork->cpu, dwork->wq, &dwork->work);
> }
> 
> Then, wb_workfn() is after all scheduled even if we check for
> WB_registered bit, isn't it?

It can be queued after WB_registered bit is cleared but it cannot be queued
after mod_delayed_work(bdi_wq, &wb->dwork, 0) has finished. That function
deletes the pending timer (the timer cannot be armed again because
WB_registered is cleared) and queues what should be the last round of
wb_workfn().

> Then, don't we need to check that
> 
>   mod_delayed_work(bdi_wq, &wb->dwork, 0);
>   flush_delayed_work(&wb->dwork);
> 
> is really waiting for completion? At least, shouldn't we try below debug
> output (not only for debugging this report but also generally desirable)?
> 
> diff --git a/mm/backing-dev.c b/mm/backing-dev.c
> index 7441bd9..ccec8cd 100644
> --- a/mm/backing-dev.c
> +++ b/mm/backing-dev.c
> @@ -376,8 +376,10 @@ static void wb_shutdown(struct bdi_writeback *wb)
> 	 * tells wb_workfn() that @wb is dying and its work_list needs to
> 	 * be drained no matter what.
> 	 */
> - mod_delayed_work(bdi_wq, &wb->dwork, 0);
> - flush_delayed_work(&wb->dwork);
> + if (!mod_delayed_work(bdi_wq, &wb->dwork, 0))
> + printk(KERN_WARNING "wb_shutdown: mod_delayed_work() failed\n");

A false return from mod_delayed_work() just means that there was no timer
armed. That is a valid situation if there is no dirty data.

> + if (!flush_delayed_work(&wb->dwork))
> + printk(KERN_WARNING "wb_shutdown: flush_delayed_work() failed\n");

And this is valid as well (although unlikely) if the work managed to
complete on another CPU before flush_delayed_work() was called.

So I don't think your warnings will help us much. But yes, we need to debug
this somehow. For now I have no idea what could be still going wrong.

Honza
-- 
Jan Kara 
SUSE Labs, CR
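
Jan's point about the two return values can be made concrete with a small model. This is a sketch under stated assumptions, not kernel code: struct toy_dwork and the toy_*() helpers are hypothetical names, and the model only encodes the return conventions described above (mod_delayed_work() reports whether the item was already pending, flush_delayed_work() reports whether it had anything to wait for). Both "false" results occur on paths where shutdown still behaves correctly, which is why warning on them would mostly produce noise.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy state for a delayed work item. */
struct toy_dwork {
	bool pending;  /* timer armed or work queued */
	bool running;  /* work function currently executing */
};

/*
 * Models mod_delayed_work(): returns true if the item was already
 * pending, false if it was idle and has now been newly queued.
 */
static bool toy_mod_delayed_work(struct toy_dwork *d)
{
	bool was_pending = d->pending;

	d->pending = true;
	return was_pending;
}

/* Models a worker picking the item up and running it to completion. */
static void toy_run_to_completion(struct toy_dwork *d)
{
	d->pending = false;
	d->running = false;
}

/*
 * Models flush_delayed_work(): returns true if there was something to
 * wait for, false if the item was already idle.
 */
static bool toy_flush_delayed_work(struct toy_dwork *d)
{
	if (!d->pending && !d->running)
		return false;
	toy_run_to_completion(d);
	return true;
}

/* Case 1: no timer was armed (e.g. no dirty data) when shutdown starts. */
static bool toy_mod_on_idle_item(void)
{
	struct toy_dwork d = { .pending = false, .running = false };
	bool ret = toy_mod_delayed_work(&d);

	assert(d.pending);  /* the final round is queued regardless */
	return ret;
}

/* Case 2: the work completes on another CPU before the flush is called. */
static bool toy_flush_after_early_completion(void)
{
	struct toy_dwork d = { .pending = false, .running = false };

	toy_mod_delayed_work(&d);
	toy_run_to_completion(&d);
	return toy_flush_delayed_work(&d);
}
```

In both cases the helper returns false even though the final round of work was queued and finished, so a false return alone does not indicate a failed shutdown.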


Re: [PATCH] bdi: Fix oops in wb_workfn()

2018-05-25 Thread Tetsuo Handa
Jan Kara wrote:
> > void delayed_work_timer_fn(struct timer_list *t)
> > {
> > struct delayed_work *dwork = from_timer(dwork, t, timer);
> > 
> > /* should have been called from irqsafe timer with irq already off */
> > __queue_work(dwork->cpu, dwork->wq, &dwork->work);
> > }
> > 
> > Then, wb_workfn() is after all scheduled even if we check for
> > WB_registered bit, isn't it?
> 
> It can be queued after WB_registered bit is cleared but it cannot be queued
> after mod_delayed_work(bdi_wq, &wb->dwork, 0) has finished. That function
> deletes the pending timer (the timer cannot be armed again because
> WB_registered is cleared) and queues what should be the last round of
> wb_workfn().

mod_delayed_work() deletes the pending timer but does not wait for an already
invoked timer handler to complete, because it uses del_timer() rather than
del_timer_sync(). Then, what happens if __queue_work() is executed almost
concurrently from two CPUs, one from mod_delayed_work(bdi_wq, &wb->dwork, 0) in
the wb_shutdown() path (which is called without spin_lock_bh(&wb->work_lock))
and the other from the delayed_work_timer_fn() path (which is called without
checking the WB_registered bit under spin_lock_bh(&wb->work_lock))?

wb_wakeup_delayed() {
  spin_lock_bh(&wb->work_lock);
  if (test_bit(WB_registered, &wb->state)) // succeeds
    queue_delayed_work(bdi_wq, &wb->dwork, timeout) {
      queue_delayed_work_on(WORK_CPU_UNBOUND, bdi_wq, &wb->dwork, timeout) {
        if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(&wb->dwork.work))) { // succeeds
          __queue_delayed_work(WORK_CPU_UNBOUND, bdi_wq, &wb->dwork, timeout) {
            add_timer(timer); // schedules for delayed_work_timer_fn()
          }
        }
      }
    }
  spin_unlock_bh(&wb->work_lock);
}

delayed_work_timer_fn() {
  // del_timer() already returns false at this point because this timer
  // is already inside the handler. But something here took long enough
  // for __queue_work() from the wb_shutdown() path to finish?
  __queue_work(WORK_CPU_UNBOUND, bdi_wq, &wb->dwork.work) {
    insert_work(pwq, work, worklist, work_flags);
  }
}

wb_shutdown() {
  mod_delayed_work(bdi_wq, &wb->dwork, 0) {
    mod_delayed_work_on(WORK_CPU_UNBOUND, bdi_wq, &wb->dwork, 0) {
      ret = try_to_grab_pending(&wb->dwork.work, true, &flags) {
        if (likely(del_timer(&wb->dwork.timer))) // fails because already in delayed_work_timer_fn()
          return 1;
        if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(&wb->dwork.work))) // fails because already set by queue_delayed_work()
          return 0;
        // Returns 1 or -ENOENT after doing something?
      }
      if (ret >= 0)
        __queue_delayed_work(WORK_CPU_UNBOUND, bdi_wq, &wb->dwork, 0) {
          __queue_work(WORK_CPU_UNBOUND, bdi_wq, &wb->dwork.work) {
            insert_work(pwq, work, worklist, work_flags);
          }
        }
    }
  }
}
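
The open question in the trace above (what try_to_grab_pending() does once del_timer() and test_and_set_bit() have both failed) can be explored with a toy model. This is a simplified sketch, assuming the return convention given in the comment above try_to_grab_pending() in kernel/workqueue.c: 1 when a pending item was stolen from its timer or worklist, 0 when an idle item was claimed, and -EAGAIN when queueing is in progress and the caller should retry. The -ENOENT cancel case is left out, and struct toy_dwork and all toy_*() names are hypothetical. Under those assumptions the in-flight timer handler and the shutdown path do not leave two copies of the work queued.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical toy state for one delayed work item. */
struct toy_dwork {
	bool timer_armed;  /* timer pending, handler not yet started */
	bool pending_bit;  /* plays the role of WORK_STRUCT_PENDING_BIT */
	int on_worklist;   /* how many copies sit on a worklist */
};

/* Models the three outcomes assumed for try_to_grab_pending(). */
static int toy_try_to_grab_pending(struct toy_dwork *d)
{
	if (d->timer_armed) {          /* del_timer() succeeded */
		d->timer_armed = false;
		return 1;
	}
	if (!d->pending_bit) {         /* item was idle, claim PENDING */
		d->pending_bit = true;
		return 0;
	}
	if (d->on_worklist > 0) {      /* already queued, steal it back */
		d->on_worklist--;
		return 1;
	}
	return -EAGAIN;                /* timer handler in flight, retry */
}

/* Models delayed_work_timer_fn() reaching insert_work(). */
static void toy_timer_fn(struct toy_dwork *d)
{
	d->on_worklist++;
}

/* Models mod_delayed_work(..., 0): retry on -EAGAIN, then queue at once. */
static int toy_mod_delayed_work_now(struct toy_dwork *d, int *retries)
{
	int ret;

	*retries = 0;
	while ((ret = toy_try_to_grab_pending(d)) == -EAGAIN) {
		(*retries)++;
		/* In this model the handler finishes on another CPU meanwhile. */
		toy_timer_fn(d);
	}
	d->on_worklist++;
	return ret;
}

/* Replays the race: timer already fired, PENDING set, not yet queued. */
static int toy_copies_after_race(void)
{
	struct toy_dwork d = {
		.timer_armed = false, .pending_bit = true, .on_worklist = 0,
	};
	int retries;
	int ret = toy_mod_delayed_work_now(&d, &retries);

	assert(ret == 1 && retries == 1);
	return d.on_worklist;
}
```

toy_copies_after_race() reports how many copies end up on the worklist. The model only covers the double-insert concern; it does not explain the second syzbot report, and the real code should be checked against kernel/workqueue.c.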