On 24/10/2016 03:44, Changlong Xie wrote:
> Ping. Any comments? It's really a problem for NBD.

Sorry, I haven't been sending pull requests.  I'll do it this week.

Paolo

> Thanks
>     -Xie
> 
> On 10/12/2016 06:18 PM, Changlong Xie wrote:
>> NBD is using the CoMutex in a way that wasn't anticipated. For example,
>> if there are N (N=26, MAX_NBD_REQUESTS=16) nbd write requests, we will
>> invoke nbd_client_co_pwritev N times.
>> ----------------------------------------------------------------------------------------
>>
>> time request Actions
>> 1    1       in_flight=1, Coroutine=C1
>> 2    2       in_flight=2, Coroutine=C2
>> ...
>> 15   15      in_flight=15, Coroutine=C15
>> 16   16      in_flight=16, Coroutine=C16, free_sema->holder=C16, mutex->locked=true
>> 17   17      in_flight=16, Coroutine=C17, queue C17 into free_sema->queue
>> 18   18      in_flight=16, Coroutine=C18, queue C18 into free_sema->queue
>> ...
>> 26   N       in_flight=16, Coroutine=C26, queue C26 into free_sema->queue
>> ----------------------------------------------------------------------------------------
>>
>>
>> Once the nbd client receives the reply to request No.16, we will
>> re-enter C16. That is OK, because C16 is equal to 'free_sema->holder'.
>> ----------------------------------------------------------------------------------------
>>
>> time request Actions
>> 27   16      in_flight=15, Coroutine=C16, free_sema->holder=C16, mutex->locked=false
>> ----------------------------------------------------------------------------------------
>>
>>
>> Then nbd_coroutine_end invokes qemu_co_mutex_unlock, which will pop a
>> coroutine from the head of free_sema->queue and enter C17. Moreover,
>> free_sema->holder is now C17.
>> ----------------------------------------------------------------------------------------
>>
>> time request Actions
>> 28   17      in_flight=16, Coroutine=C17, free_sema->holder=C17, mutex->locked=true
>> ----------------------------------------------------------------------------------------
>>
>>
>> In the above scenario, we have only received the reply to request
>> No.16. As time goes by, the nbd client will most likely receive replies
>> to requests 1 to 15 rather than to request 17, which owns C17. In this
>> case, we will hit the assertion failure "mutex->holder == self", which
>> exists since Kevin's commit 0e438cdc "coroutine: Let CoMutex remember
>> who holds it". For example, if the nbd client receives the reply to
>> request No.15, qemu will stop unexpectedly:
>> ----------------------------------------------------------------------------------------
>>
>> time request       Actions
>> 29   15(most case) in_flight=15, Coroutine=C15, free_sema->holder=C17, mutex->locked=false
>> ----------------------------------------------------------------------------------------
>>
>>
>> Per Paolo's suggestion "The simplest fix is to change it to CoQueue,
>> which is like a condition
>> variable", this patch replaces CoMutex with CoQueue.
>>
>> Cc: Wen Congyang <we...@cn.fujitsu.com>
>> Reported-by: zhanghailiang <zhang.zhanghaili...@huawei.com>
>> Suggested-by: Paolo Bonzini <pbonz...@redhat.com>
>> Signed-off-by: Changlong Xie <xiecl.f...@cn.fujitsu.com>
>> ---
>>   block/nbd-client.c | 8 ++++----
>>   block/nbd-client.h | 2 +-
>>   2 files changed, 5 insertions(+), 5 deletions(-)
>>
>> diff --git a/block/nbd-client.c b/block/nbd-client.c
>> index 2cf3237..40b28ab 100644
>> --- a/block/nbd-client.c
>> +++ b/block/nbd-client.c
>> @@ -199,8 +199,8 @@ static void nbd_coroutine_start(NbdClientSession *s,
>>   {
>>       /* Poor man semaphore.  The free_sema is locked when no other request
>>        * can be accepted, and unlocked after receiving one reply.  */
>> -    if (s->in_flight >= MAX_NBD_REQUESTS - 1) {
>> -        qemu_co_mutex_lock(&s->free_sema);
>> +    if (s->in_flight == MAX_NBD_REQUESTS) {
>> +        qemu_co_queue_wait(&s->free_sema);
>>           assert(s->in_flight < MAX_NBD_REQUESTS);
>>       }
>>       s->in_flight++;
>> @@ -214,7 +214,7 @@ static void nbd_coroutine_end(NbdClientSession *s,
>>       int i = HANDLE_TO_INDEX(s, request->handle);
>>       s->recv_coroutine[i] = NULL;
>>       if (s->in_flight-- == MAX_NBD_REQUESTS) {
>> -        qemu_co_mutex_unlock(&s->free_sema);
>> +        qemu_co_queue_next(&s->free_sema);
>>       }
>>   }
>>
>> @@ -386,7 +386,7 @@ int nbd_client_init(BlockDriverState *bs,
>>       }
>>
>>       qemu_co_mutex_init(&client->send_mutex);
>> -    qemu_co_mutex_init(&client->free_sema);
>> +    qemu_co_queue_init(&client->free_sema);
>>       client->sioc = sioc;
>>       object_ref(OBJECT(client->sioc));
>>
>> diff --git a/block/nbd-client.h b/block/nbd-client.h
>> index 044aca4..307b8b1 100644
>> --- a/block/nbd-client.h
>> +++ b/block/nbd-client.h
>> @@ -24,7 +24,7 @@ typedef struct NbdClientSession {
>>       off_t size;
>>
>>       CoMutex send_mutex;
>> -    CoMutex free_sema;
>> +    CoQueue free_sema;
>>       Coroutine *send_coroutine;
>>       int in_flight;
>>
>>
> 
> 
> 
> 
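
For anyone following the scenario in the commit message, here is a rough
standalone sketch of the counting logic the patch ends up with. It is not
QEMU code: Sim, sim_start, sim_end and MAX_WAITERS are hypothetical names
made up for illustration, and a plain FIFO array stands in for the CoQueue.
The point it tries to show is that sim_end takes no caller identity, so any
request can finish and wake the first waiter, which is what a CoMutex with
an owner check cannot allow.

```c
#include <assert.h>

#define MAX_REQUESTS 16   /* mirrors MAX_NBD_REQUESTS in the patch */
#define MAX_WAITERS  64   /* arbitrary bound, an assumption of this sketch */

/* Hypothetical stand-in for the client state; not QEMU's NbdClientSession. */
typedef struct {
    int in_flight;             /* requests sent and still awaiting a reply */
    int waiters[MAX_WAITERS];  /* FIFO ring of queued request ids */
    int head, count;
} Sim;

/* Roughly nbd_coroutine_start after the patch.
 * Returns 1 if the request starts now, 0 if it had to be queued. */
static int sim_start(Sim *s, int id)
{
    if (s->in_flight == MAX_REQUESTS) {
        assert(s->count < MAX_WAITERS);
        s->waiters[(s->head + s->count) % MAX_WAITERS] = id;
        s->count++;
        return 0;
    }
    s->in_flight++;
    return 1;
}

/* Roughly nbd_coroutine_end after the patch: any in-flight request may
 * finish. Returns the id of the waiter that was woken, or -1 if none. */
static int sim_end(Sim *s)
{
    assert(s->in_flight > 0);
    if (s->in_flight-- == MAX_REQUESTS && s->count > 0) {
        int id = s->waiters[s->head];
        s->head = (s->head + 1) % MAX_WAITERS;
        s->count--;
        s->in_flight++;        /* the woken request takes the freed slot */
        return id;
    }
    return -1;
}
```

Feeding it requests 1 to 26 leaves 16 in flight and 10 queued; each
completion, from whichever request, then wakes the next queued one in order.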
