On 3 Oct 2019, at 4:41, Gao Xiang wrote:
> Hi,
>
> On Thu, Oct 03, 2019 at 04:40:22PM +1000, Dave Chinner wrote:
>> [cc linux-fsdevel, linux-block, tejun ]
>>
>> On Wed, Oct 02, 2019 at 06:52:47PM -0700, Darrick J. Wong wrote:
>>> Hi everyone,
>>>
>>> Does anyone /else/ see this crash in generic/299 on a V4 filesystem (tho
>>> afaict V5 configs crash too) and a 5.4-rc1 kernel? It seems to pop up
>>> on generic/299 though only 80% of the time.
>>>
>
> From a quick glance, I guess there could be a race here (complete guess):
>
>
> static void finish_writeback_work(struct bdi_writeback *wb,
>                                   struct wb_writeback_work *work)
> {
>         struct wb_completion *done = work->done;
>
>         if (work->auto_free)
>                 kfree(work);
>         if (done && atomic_dec_and_test(&done->cnt))
>
>                 ^^^ here
>
>                 wake_up_all(done->waitq);
> }
>
> since the wb_completion that the new wake_up_all(done->waitq) reads is
> completely on-stack:
>
>         if (done && atomic_dec_and_test(&done->cnt))
> -               wake_up_all(&wb->bdi->wb_waitq);
> +               wake_up_all(done->waitq);
>  }
>
> which could cause a use-after-free if the on-stack wb_completion is
> already gone by the time done->waitq is read...
> (the previous wb->bdi was safe since it is not on-stack)
>
> see the generic on-stack completion, which takes a wait_queue spin_lock
> between the test and the wake_up...
>
> If I am wrong, ignore me, hmm...
It's a good guess ;) Jens should have this queued up already:
https://lkml.org/lkml/2019/9/23/972
-chris
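For anyone following along, the race discussed above can be modelled in user space. This is only a minimal sketch, not the patch behind the link, and every name in it (struct waitq, struct completion_model, wake_up_all_model, finish_work_model) is a hypothetical stand-in rather than a kernel definition. It shows one way to close the window: load the waitq pointer before the decrement, so the completion is never dereferenced after its count may have reached zero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for wait_queue_head_t; it only counts wakeups. */
struct waitq {
    int wakeups;
};

/* Hypothetical stand-in for struct wb_completion (may live on a stack). */
struct completion_model {
    atomic_int cnt;        /* outstanding work items */
    struct waitq *waitq;   /* long-lived queue the waiter sleeps on */
};

/* Stand-in for wake_up_all(): records that a wakeup was issued. */
static void wake_up_all_model(struct waitq *q)
{
    q->wakeups++;
}

/*
 * Drop one reference on @done. Returns true if this call dropped the
 * last reference and therefore issued the wakeup.
 */
static bool finish_work_model(struct completion_model *done)
{
    if (!done)
        return false;

    /* Read the pointer while @done is still guaranteed to be alive. */
    struct waitq *q = done->waitq;
    assert(q != NULL);

    /*
     * Once the count reaches zero a waiter may return and its stack
     * frame (holding *done) may disappear, so @done must not be
     * touched after this point; only the saved pointer is used.
     */
    if (atomic_fetch_sub(&done->cnt, 1) == 1) {
        wake_up_all_model(q);
        return true;
    }
    return false;
}
```

The sketch assumes the wait queue itself outlives the completion, the same property that made the old wb->bdi->wb_waitq safe. With the count initialised to 2, the first call returns false without a wakeup and the second returns true after issuing exactly one.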