On 2015/2/9 18:12, Kevin Wolf wrote: > Am 09.02.2015 um 10:36 hat Bin Wu geschrieben: >> On 2015/2/9 16:12, Fam Zheng wrote: >>> On Sat, 02/07 17:51, w00214312 wrote: >>>> From: Bin Wu <wu.wu...@huawei.com> >>>> >>>> When a coroutine holds a lock, other coroutines who want to get >>>> the lock must wait on a co_queue by adding themselves to the >>>> CoQueue. However, if a waiting coroutine is woken up with the >>>> lock still be holding by other coroutine, this waiting coroutine >>> >>> Could you explain who wakes up the waiting coroutine? Maybe the bug is that >>> it shouldn't be awaken in the first place. >>> >> >> During the mirror phase with nbd devices, if we send a cancel command or >> physical network breaks down, the source qemu process will receive a readable >> event and the main loop will invoke nbd_reply_ready to deal with it. This >> function finds the connection is down and then goes into >> nbd_teardown_connection. nbd_teardown_connection wakes up all working >> coroutines >> by nbd_recv_coroutines_enter_all. These coroutines include the one which >> holds >> the sending lock, the ones which wait for the lock, and the ones which wait >> for >> receiving messages. > > This is the bug. It's not allowed to reenter a coroutine if you don't > know its state. NBD needs a fix, not the the generic coroutine > infrastructure. > > If we want to change anything in the lock implementation, it should be > adding an assertion to catch such violations of the rule. (Untested, but > I think the assertion should hold true.) > > Kevin > > diff --git a/qemu-coroutine-lock.c b/qemu-coroutine-lock.c > index e4860ae..25fc111 100644 > --- a/qemu-coroutine-lock.c > +++ b/qemu-coroutine-lock.c > @@ -123,9 +123,8 @@ void coroutine_fn qemu_co_mutex_lock(CoMutex *mutex) > > trace_qemu_co_mutex_lock_entry(mutex, self); > > - while (mutex->locked) { > - qemu_co_queue_wait(&mutex->queue); > - } > + qemu_co_queue_wait(&mutex->queue); > + assert(!mutex->locked); > > mutex->locked = true; > > . >
I see. paolo's patch in his first reply can fix the bug in NBD (not in coroutine). I will complete that patch to do some tests and send another version latter. -- Bin Wu