On 2015/2/9 18:12, Kevin Wolf wrote:
> Am 09.02.2015 um 10:36 hat Bin Wu geschrieben:
>> On 2015/2/9 16:12, Fam Zheng wrote:
>>> On Sat, 02/07 17:51, w00214312 wrote:
>>>> From: Bin Wu <wu.wu...@huawei.com>
>>>>
>>>> When a coroutine holds a lock, other coroutines who want to get
>>>> the lock must wait on a co_queue by adding themselves to the
>>>> CoQueue. However, if a waiting coroutine is woken up with the
>>>> lock still be holding by other coroutine, this waiting coroutine
>>>
>>> Could you explain who wakes up the waiting coroutine? Maybe the bug is that
>>> it shouldn't be awaken in the first place.
>>>
>>
>> During the mirror phase with nbd devices, if we send a cancel command or
>> physical network breaks down, the source qemu process will receive a readable
>> event and the main loop will invoke nbd_reply_ready to deal with it. This
>> function finds the connection is down and then goes into
>> nbd_teardown_connection. nbd_teardown_connection wakes up all working 
>> coroutines
>> by nbd_recv_coroutines_enter_all. These coroutines include the one which 
>> holds
>> the sending lock, the ones which wait for the lock, and the ones which wait 
>> for
>> receiving messages.
> 
> This is the bug. It's not allowed to reenter a coroutine if you don't
> know its state. NBD needs a fix, not the the generic coroutine
> infrastructure.
> 
> If we want to change anything in the lock implementation, it should be
> adding an assertion to catch such violations of the rule. (Untested, but
> I think the assertion should hold true.)
> 
> Kevin
> 
> diff --git a/qemu-coroutine-lock.c b/qemu-coroutine-lock.c
> index e4860ae..25fc111 100644
> --- a/qemu-coroutine-lock.c
> +++ b/qemu-coroutine-lock.c
> @@ -123,9 +123,8 @@ void coroutine_fn qemu_co_mutex_lock(CoMutex *mutex)
>  
>      trace_qemu_co_mutex_lock_entry(mutex, self);
>  
> -    while (mutex->locked) {
> -        qemu_co_queue_wait(&mutex->queue);
> -    }
> +    qemu_co_queue_wait(&mutex->queue);
> +    assert(!mutex->locked);
>  
>      mutex->locked = true;
> 
> .
> 

I see. paolo's patch in his first reply can fix the bug in NBD (not in
coroutine). I will complete that patch to do some tests and send another version
latter.

-- 
Bin Wu


Reply via email to