On 16/12/2014 12:07, Kevin Wolf wrote:
> On 11.12.2014 at 14:52, Paolo Bonzini wrote:
>> Keep a queue of requests that were not submitted; pass them to
>> the kernel when a completion is reported, unless the queue is
>> plugged.
>>
>> The array of iocbs is rebuilt every time from scratch.  This
>> avoids keeping the iocbs array and list synchronized.
>>
>> Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
> 
> Just found out that in qemu-img bench, this patch seems to cost about
> 5-8% for me.

What parameters did you run it with?  Queue depth=1?

For me the results were noisy, but I couldn't see a pessimization, and this patch
should only add a handful of pointer accesses.  Also, does perf point at
a culprit, and does patch 5 restore some of the performance?

Weird guess: TLB misses from accessing iocbs[0] on the stack (using a
different coroutine stack every time)?  Perf would report that as a
large cost of this line:

        iocbs[len++] = &aiocb->iocb;
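A minimal sketch of the scheme the patch describes, to show where the extra work sits. Requests go on a pending list. The iocb pointer array is rebuilt from that list on every flush, including the `iocbs[len++] = &aiocb->iocb;` store into a stack array. Queued requests are retried when a completion frees a slot, unless the queue is plugged. All names here are hypothetical and only loosely modeled on block/linux-aio.c. `mock_io_submit()` stands in for libaio's io_submit() and accepts at most `free_slots` requests, so the example builds without libaio.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT the real QEMU or libaio definitions. */
#define MAX_QUEUED_IO 128

struct iocb { int id; };                /* placeholder for the kernel iocb */

struct aiocb {
    struct iocb iocb;
    struct aiocb *next;                 /* link in the pending queue */
};

struct io_queue {
    struct aiocb *head, *tail;          /* requests not yet submitted */
    int plugged;                        /* nesting count of plug() calls */
    int free_slots;                     /* mock kernel capacity */
    int submit_calls;                   /* number of mock io_submit calls */
    int submitted;                      /* total iocbs accepted */
};

/* Mock io_submit: accepts as many iocbs as there are free slots. */
static int mock_io_submit(struct io_queue *q, int nr, struct iocb **iocbs)
{
    int n = nr < q->free_slots ? nr : q->free_slots;
    (void)iocbs;
    q->submit_calls++;
    q->free_slots -= n;
    q->submitted += n;
    return n;
}

/* Rebuild the pointer array from the list on every call, so there is no
 * separate array that must be kept in sync with the list. */
static void ioq_submit(struct io_queue *q)
{
    struct iocb *iocbs[MAX_QUEUED_IO];  /* lives on the caller's stack */
    struct aiocb *aiocb;
    int len = 0, ret;

    for (aiocb = q->head; aiocb && len < MAX_QUEUED_IO; aiocb = aiocb->next) {
        iocbs[len++] = &aiocb->iocb;
    }
    if (len == 0) {
        return;
    }
    ret = mock_io_submit(q, len, iocbs);
    assert(ret >= 0 && ret <= len);
    while (ret-- > 0) {                 /* pop the accepted requests */
        q->head = q->head->next;
    }
    if (q->head == NULL) {
        q->tail = NULL;
    }
}

/* Append a request; submit right away only if the queue is not plugged. */
static void enqueue(struct io_queue *q, struct aiocb *a)
{
    a->next = NULL;
    if (q->tail != NULL) {
        q->tail->next = a;
    } else {
        q->head = a;
    }
    q->tail = a;
    if (!q->plugged) {
        ioq_submit(q);
    }
}

static void plug(struct io_queue *q) { q->plugged++; }

static void unplug(struct io_queue *q)
{
    assert(q->plugged > 0);
    if (--q->plugged == 0) {
        ioq_submit(q);
    }
}

/* A completion frees a kernel slot; retry queued requests unless plugged. */
static void on_completion(struct io_queue *q)
{
    q->free_slots++;
    if (!q->plugged) {
        ioq_submit(q);
    }
}
```

Compared with submitting a request directly, the extra per-request cost in this sketch is one list link, one list walk and one pointer store into the stack array. Whether that store also touches a cold page depends on which coroutine stack the flush happens to run on.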

> An optimisation for the unplugged case would probably be easy, but that
> would be cheating, as the devices that we're really interested in always
> plug the queue (perhaps I should extend qemu-img bench to do that
> optionally, too).
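One possible shape for the unplugged-case shortcut, sketched under assumptions. When the queue is not plugged and nothing is pending, the request is passed to the kernel directly and never touches the list. The names are hypothetical, and `mock_io_submit()` replaces io_submit() and always accepts everything. As noted above, only callers that never plug would benefit from this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types and names for illustration; not QEMU's real API. */
#define MAX_BATCH 128

struct iocb { int id; };                 /* placeholder for the kernel iocb */

struct aiocb {
    struct iocb iocb;
    struct aiocb *next;                  /* link in the pending queue */
};

struct io_queue {
    struct aiocb *head, *tail;           /* requests not yet submitted */
    int plugged;                         /* nesting count of plug() calls */
    int fast_path_hits;                  /* requests that skipped the queue */
    int submitted;                       /* total iocbs accepted */
};

/* Mock io_submit: always accepts all nr iocbs. */
static int mock_io_submit(struct io_queue *q, int nr, struct iocb **iocbs)
{
    (void)iocbs;
    q->submitted += nr;
    return nr;
}

/* Submit everything on the pending list in batches of MAX_BATCH. */
static void flush(struct io_queue *q)
{
    while (q->head != NULL) {
        struct iocb *iocbs[MAX_BATCH];
        struct aiocb *a;
        int len = 0, ret;

        for (a = q->head; a != NULL && len < MAX_BATCH; a = a->next) {
            iocbs[len++] = &a->iocb;
        }
        ret = mock_io_submit(q, len, iocbs);
        assert(ret == len);              /* the mock never refuses */
        while (ret-- > 0) {
            q->head = q->head->next;
        }
    }
    q->tail = NULL;
}

static void submit_request(struct io_queue *q, struct aiocb *a)
{
    if (!q->plugged && q->head == NULL) {
        /* Fast path: nothing to batch with, so skip the list entirely. */
        struct iocb *one = &a->iocb;
        int ret = mock_io_submit(q, 1, &one);
        assert(ret == 1);
        q->fast_path_hits++;
        return;
    }
    a->next = NULL;                      /* slow path: append to the list */
    if (q->tail != NULL) {
        q->tail->next = a;
    } else {
        q->head = a;
    }
    q->tail = a;
    if (!q->plugged) {
        flush(q);
    }
}

static void plug(struct io_queue *q) { q->plugged++; }

static void unplug(struct io_queue *q)
{
    assert(q->plugged > 0);
    if (--q->plugged == 0) {
        flush(q);
    }
}
```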

If you want to do that, you also have to move the "refilling" of the
queue to a bottom half.  If you refill from the completion routine, you
always have a single empty slot and plugging doesn't do anything.
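A toy model of the point about refilling, under stated assumptions: each event-loop iteration reaps `reaped` completions at once, and one plugged batch costs one io_submit() call. Refilling from each completion callback submits one request per call. Deferring the refill to a bottom half that runs once per iteration submits all freed slots together. The functions are hypothetical and only count calls; they do not issue real I/O.

```c
#include <assert.h>

/* Hypothetical counters; no real I/O is performed. */
struct stats {
    int submit_calls;   /* mock io_submit invocations */
    int requests;       /* total requests submitted */
};

/* Submit n requests inside one plug/unplug pair, i.e. one io_submit call. */
static void submit_batch(struct stats *s, int n)
{
    assert(n >= 0);
    if (n > 0) {
        s->submit_calls++;
        s->requests += n;
    }
}

/* Refill from the completion callback: each refill sees exactly one free
 * slot, so plugging around it batches nothing. */
static void refill_from_completion(struct stats *s, int iterations, int reaped)
{
    for (int it = 0; it < iterations; it++) {
        for (int c = 0; c < reaped; c++) {
            submit_batch(s, 1);          /* one slot free -> one request */
        }
    }
}

/* Refill from a bottom half: completion callbacks only count freed slots,
 * then the bottom half refills all of them in a single plugged batch. */
static void refill_from_bottom_half(struct stats *s, int iterations, int reaped)
{
    for (int it = 0; it < iterations; it++) {
        int free_slots = 0;
        for (int c = 0; c < reaped; c++) {
            free_slots++;                /* completion: just note the slot */
        }
        submit_batch(s, free_slots);     /* bottom half: plug, refill, unplug */
    }
}
```

For example, with 10 iterations reaping 4 completions each, both strategies submit 40 requests, but the first needs 40 submission calls and the second needs 10. At queue depth 1 only one completion can be reaped at a time, so the two strategies behave the same there.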

Paolo

> Anything clever that we can do about this? Or will we just have to live
> with the fact that sending a single request is now slower than it used
> to be before bdrv_plug?

