On 9/23/19 7:06 PM, Tejun Heo wrote:
> finish_writeback_work() reads @done->waitq after decrementing
> @done->cnt.  However, once @done->cnt reaches zero, @done may be freed
> (from stack) at any moment and @done->waitq can contain something
> unrelated by the time finish_writeback_work() tries to read it.  This
> led to the following crash.
> 
>    "BUG: kernel NULL pointer dereference, address: 0000000000000002"
>    #PF: supervisor write access in kernel mode
>    #PF: error_code(0x0002) - not-present page
>    PGD 0 P4D 0
>    Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
>    CPU: 40 PID: 555153 Comm: kworker/u98:50 Kdump: loaded Not tainted
>    ...
>    Workqueue: writeback wb_workfn (flush-btrfs-1)
>    RIP: 0010:_raw_spin_lock_irqsave+0x10/0x30
>    Code: 48 89 d8 5b c3 e8 50 db 6b ff eb f4 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 53 9c 5b fa 31 c0 ba 01 00 00 00 <f0> 0f b1 17 75 05 48 89 d8 5b c3 89 c6 e8 fe ca 6b ff eb f2 66 90
>    RSP: 0018:ffffc90049b27d98 EFLAGS: 00010046
>    RAX: 0000000000000000 RBX: 0000000000000246 RCX: 0000000000000000
>    RDX: 0000000000000001 RSI: 0000000000000003 RDI: 0000000000000002
>    RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
>    R10: ffff889fff407600 R11: ffff88ba9395d740 R12: 000000000000e300
>    R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000
>    FS:  0000000000000000(0000) GS:ffff88bfdfa00000(0000) knlGS:0000000000000000
>    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>    CR2: 0000000000000002 CR3: 0000000002409005 CR4: 00000000001606e0
>    Call Trace:
>     __wake_up_common_lock+0x63/0xc0
>     wb_workfn+0xd2/0x3e0
>     process_one_work+0x1f5/0x3f0
>     worker_thread+0x2d/0x3d0
>     kthread+0x111/0x130
>     ret_from_fork+0x1f/0x30
> 
> Fix it by reading and caching @done->waitq before decrementing
> @done->cnt.
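A minimal userspace sketch of the ordering problem described above, assuming pthreads and C11 atomics as stand-ins for the kernel's wait queue and atomic_t. The names completion_sketch, finish_work_fixed and wait_for_completion_sketch are hypothetical and are not the kernel's wb_completion API. The only point it tries to show is that the waitq pointer is cached before the decrement, because once cnt reaches zero the waiter may return and its on-stack completion may be reused.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Long-lived wait queue (hypothetical stand-in for a kernel
 * wait_queue_head_t that outlives the on-stack completion). */
struct waitq {
	pthread_mutex_t lock;
	pthread_cond_t cond;
};

/* Hypothetical stand-in for struct wb_completion. */
struct completion_sketch {
	atomic_int cnt;      /* outstanding work items */
	struct waitq *waitq; /* where the waiter sleeps */
};

static struct waitq global_waitq = {
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER
};

static void wake_up_all_sketch(struct waitq *wq)
{
	pthread_mutex_lock(&wq->lock);
	pthread_cond_broadcast(&wq->cond);
	pthread_mutex_unlock(&wq->lock);
}

/* Fixed ordering: read done->waitq BEFORE the decrement. After the
 * count hits zero, *done must not be touched again. The buggy
 * ordering would be:
 *   if (atomic_fetch_sub(&done->cnt, 1) == 1)
 *       wake_up_all_sketch(done->waitq);   (done may be gone here)
 */
static void finish_work_fixed(struct completion_sketch *done)
{
	struct waitq *wq = done->waitq;

	if (atomic_fetch_sub(&done->cnt, 1) == 1) /* old value 1 => now 0 */
		wake_up_all_sketch(wq);
}

static void *worker(void *arg)
{
	finish_work_fixed(arg);
	return NULL;
}

/* Waits on an on-stack completion; returns the final count. */
static int wait_for_completion_sketch(int nr_workers)
{
	struct completion_sketch done; /* lives on this stack frame */
	pthread_t tids[8];

	assert(nr_workers > 0 && nr_workers <= 8);
	atomic_init(&done.cnt, nr_workers);
	done.waitq = &global_waitq;

	for (int i = 0; i < nr_workers; i++)
		pthread_create(&tids[i], NULL, worker, &done);

	pthread_mutex_lock(&global_waitq.lock);
	while (atomic_load(&done.cnt) != 0)
		pthread_cond_wait(&global_waitq.cond, &global_waitq.lock);
	pthread_mutex_unlock(&global_waitq.lock);

	/* Joined only so the demo exits cleanly. A real waiter returns as
	 * soon as cnt is zero, which is what makes a post-decrement read
	 * of done->waitq unsafe. */
	for (int i = 0; i < nr_workers; i++)
		pthread_join(tids[i], NULL);
	return atomic_load(&done.cnt);
}
```

Calling wait_for_completion_sketch(4) should return 0 once all four workers have decremented the count. Because the demo joins its workers, the buggy ordering would not crash here; the sketch only illustrates which read has to move in front of the decrement.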

That's some nice debugging work.

Reviewed-by: Jens Axboe <ax...@kernel.dk>


> Signed-off-by: Tejun Heo <t...@kernel.org>
> Debugged-by: Chris Mason <c...@fb.com>
> Fixes: 30638b0125e1 ("writeback: Generalize and expose wb_completion")

The commit ID in that Fixes: tag seems wrong, though:

commit 5b9cce4c7eb0696558dfd4946074ae1fb9d8f05d
Author: Tejun Heo <t...@kernel.org>
Date:   Mon Aug 26 09:06:52 2019 -0700

    writeback: Generalize and expose wb_completion

-- 
Jens Axboe
