> On Jul 19, 2019, at 00:43, Jens Axboe <ax...@kernel.dk> wrote:
>
> On 7/18/19 9:41 AM, Jens Axboe wrote:
>> On 7/18/19 6:44 AM, Zhengyuan Liu wrote:
>>> There is a hang issue when using fio to do some basic tests. The issue can
>>> be easily reproduced with the script below:
>>>
>>> while true
>>> do
>>>     fio --ioengine=io_uring --rw=write --bs=4k --numjobs=1 \
>>>         --size=1G --iodepth=64 --name=uring --filename=/dev/zero
>>> done
>>>
>>> After several minutes, maybe more, fio blocks in io_uring_enter->io_cqring_wait,
>>> waiting for previously submitted sqes to complete, and cannot return to
>>> userspace until we send fio a SIGTERM. After receiving SIGTERM, fio then hangs
>>> in io_ring_ctx_wait_and_kill with a backtrace like this:
>>>
>>> [54133.243816] Call Trace:
>>> [54133.243842] __schedule+0x3a0/0x790
>>> [54133.243868] schedule+0x38/0xa0
>>> [54133.243880] schedule_timeout+0x218/0x3b0
>>> [54133.243891] ? sched_clock+0x9/0x10
>>> [54133.243903] ? wait_for_completion+0xa3/0x130
>>> [54133.243916] ? _raw_spin_unlock_irq+0x2c/0x40
>>> [54133.243930] ? trace_hardirqs_on+0x3f/0xe0
>>> [54133.243951] wait_for_completion+0xab/0x130
>>> [54133.243962] ? wake_up_q+0x70/0x70
>>> [54133.243984] io_ring_ctx_wait_and_kill+0xa0/0x1d0
>>> [54133.243998] io_uring_release+0x20/0x30
>>> [54133.244008] __fput+0xcf/0x270
>>> [54133.244029] ____fput+0xe/0x10
>>> [54133.244040] task_work_run+0x7f/0xa0
>>> [54133.244056] do_exit+0x305/0xc40
>>> [54133.244067] ? get_signal+0x13b/0xbd0
>>> [54133.244088] do_group_exit+0x50/0xd0
>>> [54133.244103] get_signal+0x18d/0xbd0
>>> [54133.244112] ? _raw_spin_unlock_irqrestore+0x36/0x60
>>> [54133.244142] do_signal+0x34/0x720
>>> [54133.244171] ? exit_to_usermode_loop+0x7e/0x130
>>> [54133.244190] exit_to_usermode_loop+0xc0/0x130
>>> [54133.244209] do_syscall_64+0x16b/0x1d0
>>> [54133.244221] entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>>
>>> The reason is that we added a req to ctx->pending_async at the very end, but
>>> it got no chance to be processed anymore. How can this happen?
>>>
>>>    fio#cpu0                              wq#cpu1
>>>
>>>    io_add_to_prev_work                   io_sq_wq_submit_work
>>>
>>>    atomic_read()        <<< 1
>>>
>>>                                          atomic_dec_return()  <<< 1->0
>>>                                          list_empty()         <<< true
>>>
>>>    list_add_tail()
>>>    atomic_read()        <<< 0 or 1?
>>>
>>> As atomic_ops.rst says, atomic_read() does not guarantee that the runtime
>>> initialization by any other thread is visible yet, so we must take care of
>>> that with a proper implicit or explicit memory barrier.
>>
>> Thanks for looking at this and finding this issue, it does look like a problem.
>> But I'm not sure about the fix. Shouldn't we just need an smp_mb__after_atomic()
>> on the atomic_dec_return() side of things? Like the below.
>>
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index 5ec06e5ba0be..3c2a6f88a6b0 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -1881,6 +1881,7 @@ static void io_sq_wq_submit_work(struct work_struct *work)
>>  	 */
>>  	if (async_list) {
>>  		ret = atomic_dec_return(&async_list->cnt);
>> +		smp_mb__after_atomic();
>>  		while (!ret && !list_empty(&async_list->list)) {
>>  			spin_lock(&async_list->lock);
>>  			atomic_inc(&async_list->cnt);
>> @@ -1894,6 +1895,7 @@ static void io_sq_wq_submit_work(struct work_struct *work)
>>  			goto restart;
>>  		}
>>  		ret = atomic_dec_return(&async_list->cnt);
>> +		smp_mb__after_atomic();
>>  	}
>>  }
>>
>>
>
> I don't think this is enough; I actually think your fix is the most
> appropriate. I will apply it, thank you!
>
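For reference, my understanding of the fix being applied is that the barrier
sits between list_add_tail() and the atomic_read() re-check in
io_add_to_prev_work(), roughly like this (a sketch from my side, not a quote
of the actual patch):

	ret = true;
	spin_lock(&list->lock);
	list_add_tail(&req->list, &list->list);
	/*
	 * Make the list addition visible before re-checking whether the
	 * async worker is still running, pairing with its final
	 * atomic_dec_return()/list_empty() sequence.
	 */
	smp_mb();
	if (!atomic_read(&list->cnt)) {
		/* worker already exited, don't leave the req stranded */
		list_del_init(&req->list);
		ret = false;
	}
	spin_unlock(&list->lock);
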
Actually, although we can pass the test using smp_mb(), in the end we still do
not understand where the race condition is; could you explain it? If it is as
Zhengyuan said and the problem is atomic_read, I would expect smp_rmb alone to
be enough, but it is not: smp_rmb does not make the test pass. We also tried
smp_wmb, which failed too. It seems that only smp_mb works correctly.
Is it because list_add_tail requires smp_wmb and atomic_read requires smp_rmb?
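
To make the question concrete, here is a stand-alone C11 analogue of the two
paths as I understand them (my own sketch, not the kernel code; the variable
names and the mapping to list_add_tail()/list_empty() are simplifications).
It builds with cc -pthread:

	#include <pthread.h>
	#include <stdatomic.h>
	#include <stdio.h>

	/* stand-ins for async_list->list being non-empty and async_list->cnt */
	static atomic_int list_nonempty;
	static atomic_int cnt;

	/* fio/cpu0 side: io_add_to_prev_work() */
	static void *cpu0(void *arg)
	{
		/* list_add_tail() */
		atomic_store_explicit(&list_nonempty, 1, memory_order_relaxed);
		/* smp_mb() */
		atomic_thread_fence(memory_order_seq_cst);
		/* atomic_read(&list->cnt) */
		if (atomic_load_explicit(&cnt, memory_order_relaxed) == 0)
			printf("cpu0: worker gone, must not rely on it\n");
		return NULL;
	}

	/* wq/cpu1 side: io_sq_wq_submit_work() */
	static void *cpu1(void *arg)
	{
		/* the store half of atomic_dec_return() going 1->0 */
		atomic_store_explicit(&cnt, 0, memory_order_relaxed);
		/* full barrier on this side as well */
		atomic_thread_fence(memory_order_seq_cst);
		/* list_empty() check */
		if (atomic_load_explicit(&list_nonempty, memory_order_relaxed) == 1)
			printf("cpu1: list not empty, keep processing\n");
		return NULL;
	}

	int main(void)
	{
		pthread_t t0, t1;

		atomic_init(&cnt, 1);
		pthread_create(&t0, NULL, cpu0, NULL);
		pthread_create(&t1, NULL, cpu1, NULL);
		pthread_join(t0, NULL);
		pthread_join(t1, NULL);
		/*
		 * Each thread does a store followed by a load of the other
		 * location.  The hang corresponds to the run where neither
		 * message is printed: cpu0 still sees cnt == 1 and cpu1 still
		 * sees an empty list.  With a full fence between the store and
		 * the load on *both* sides that outcome is forbidden; a
		 * read-only or write-only barrier does not order a store
		 * against a later load, which would match smp_mb() working
		 * while smp_rmb()/smp_wmb() do not.
		 */
		return 0;
	}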
--
Jackie Liu