On 5/16/19 2:53 AM, Roman Penyaev wrote:
> This fixes a couple of races which lead to an infinite wait for park
> completion, with the following backtraces:
>
> [20801.303319] Call Trace:
> [20801.303321] ? __schedule+0x284/0x650
> [20801.303323] schedule+0x33/0xc0
> [20801.303324] schedule_timeout+0x1bc/0x210
> [20801.303326] ? schedule+0x3d/0xc0
> [20801.303327] ? schedule_timeout+0x1bc/0x210
> [20801.303329] ? preempt_count_add+0x79/0xb0
> [20801.303330] wait_for_completion+0xa5/0x120
> [20801.303331] ? wake_up_q+0x70/0x70
> [20801.303333] kthread_park+0x48/0x80
> [20801.303335] io_finish_async+0x2c/0x70
> [20801.303336] io_ring_ctx_wait_and_kill+0x95/0x180
> [20801.303338] io_uring_release+0x1c/0x20
> [20801.303339] __fput+0xad/0x210
> [20801.303341] task_work_run+0x8f/0xb0
> [20801.303342] exit_to_usermode_loop+0xa0/0xb0
> [20801.303343] do_syscall_64+0xe0/0x100
> [20801.303349] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> [20801.303380] Call Trace:
> [20801.303383] ? __schedule+0x284/0x650
> [20801.303384] schedule+0x33/0xc0
> [20801.303386] io_sq_thread+0x38a/0x410
> [20801.303388] ? __switch_to_asm+0x40/0x70
> [20801.303390] ? wait_woken+0x80/0x80
> [20801.303392] ? _raw_spin_lock_irqsave+0x17/0x40
> [20801.303394] ? io_submit_sqes+0x120/0x120
> [20801.303395] kthread+0x112/0x130
> [20801.303396] ? kthread_create_on_node+0x60/0x60
> [20801.303398] ret_from_fork+0x35/0x40
>
> o kthread_park() waits for park completion, so the io_sq_thread() loop
>   should check kthread_should_park() along with kthread_should_stop();
>   otherwise, if kthread_park() is called before prepare_to_wait(),
>   the following schedule() never returns:
>
>  CPU#0                                 CPU#1
>
>  io_sq_thread_stop():                  io_sq_thread():
>
>                                           while (!kthread_should_stop() &&
>                                                  !ctx->sqo_stop) {
>
>     ctx->sqo_stop = 1;
>     kthread_park()
>
>                                              prepare_to_wait();
>                                              if (kthread_should_stop()) {
>                                              }
>                                              schedule();  <<< nobody checks the park
>                                                           <<< flag, so we schedule
>                                                           <<< and never return
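>
> To make the fixed shape concrete, here is a minimal sketch of the wait
> loop after the change (an illustration, not the literal patch: helper
> and field names such as ctx->sqo_wait follow the io_uring code of this
> era, and the per-iteration submission work is elided):
>
> 	DEFINE_WAIT(wait);
>
> 	/* Quit only on park; parking is how the thread gets stopped now. */
> 	while (!kthread_should_park()) {
> 		prepare_to_wait(&ctx->sqo_wait, &wait, TASK_INTERRUPTIBLE);
> 		/* Re-check after prepare_to_wait() to close the race above. */
> 		if (kthread_should_park()) {
> 			finish_wait(&ctx->sqo_wait, &wait);
> 			break;
> 		}
> 		schedule();	/* kthread_park() (or new work) wakes us here */
> 		finish_wait(&ctx->sqo_wait, &wait);
> 	}
> 	/* We left the loop because of the park flag, so park unconditionally. */
> 	kthread_parkme();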
>
> o if the flag ctx->sqo_stop is observed by the io_sq_thread() loop,
>   it is quite possible that the kthread_should_park() check and the
>   following kthread_parkme() are never reached, because kthread_park()
>   has not yet been called; a few moments later it is called and waits
>   there for park completion, which never happens, because the kthread
>   has already exited:
>
>  CPU#0                                 CPU#1
>
>  io_sq_thread_stop():                  io_sq_thread():
>
>     ctx->sqo_stop = 1;
>                                           while (!kthread_should_stop() &&
>                                                  !ctx->sqo_stop) {
>                                              <<< observe sqo_stop and exit the loop
>                                           }
>
>                                           if (kthread_should_park())
>                                              kthread_parkme();  <<< never called, since
>                                                                 <<< we were never parked
>
>     kthread_park()  <<< waits forever for park completion
>
> With this patch we quit the loop only on the kthread_should_park()
> check (kthread_park() is synchronous, so kthread_should_stop() is
> never observed), and we abandon the ->sqo_stop flag, since it is racy.
> At the end of io_sq_thread() we unconditionally call kthread_parkme(),
> since we have exited the loop because of the park flag.
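>
> On the stopping side, the pairing then looks roughly like this (again a
> sketch under the same assumptions, with ctx->sqo_thread as the SQPOLL
> task pointer; the ->sqo_stop flag is gone entirely):
>
> 	static void io_sq_thread_stop(struct io_ring_ctx *ctx)
> 	{
> 		if (ctx->sqo_thread) {
> 			/* Synchronous: returns only once the thread has parked. */
> 			kthread_park(ctx->sqo_thread);
> 			/* Unparks the thread and waits for it to exit. */
> 			kthread_stop(ctx->sqo_thread);
> 			ctx->sqo_thread = NULL;
> 		}
> 	}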
Thanks Roman, looks (and tests) good. Applied.
--
Jens Axboe