Re: [Qemu-devel] [PATCH v2 1/5] linux-aio: queue requests that cannot be submitted

Kevin Wolf Tue, 16 Dec 2014 05:12:40 -0800

Am 16.12.2014 um 12:28 hat Paolo Bonzini geschrieben:
> 
> 
> On 16/12/2014 12:07, Kevin Wolf wrote:
> > Am 11.12.2014 um 14:52 hat Paolo Bonzini geschrieben:
> >> Keep a queue of requests that were not submitted; pass them to
> >> the kernel when a completion is reported, unless the queue is
> >> plugged.
> >>
> >> The array of iocbs is rebuilt every time from scratch.  This
> >> avoids keeping the iocbs array and list synchronized.
> >>
> >> Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
> > 
> > Just found out that in qemu-img bench, this patch seems to cost about
> > 5-8% for me.
> 
> What execution?  Queue depth=1?


My usual one:

$ ./qemu-img bench -t none -c 10000000 -n /dev/loop0
Sending 10000000 requests, 4096 bytes each, 64 in parallel

> For me it was noisy, but I couldn't see a pessimization, and this patch
> should only add a handful of pointer accesses.  Also, does perf point at
> a culprit, and does patch 5 restore some of the performance?
> 
> Weird guess: TLB misses from accessing iocbs[0] on the stack (using a
> different coroutine stack every time)?  Perf would report that as a
> large cost of this line:
> 
>         iocbs[len++] = &aiocb->iocb;

No, I can't read much from the perf results. The cost seems to be spread
fairly evenly across ioq_submit(), except for the instruction right after
the call to io_submit(). I'm not sure why that instruction always takes so
much time (regardless of what it is), but it was already like that before
this patch.
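A rough, self-contained sketch of the scheme the quoted patch describes: requests the kernel does not accept stay on a pending queue, a completion retries them unless the queue is plugged, and the iocb pointer array is rebuilt from the queue on every submission (compare the quoted `iocbs[len++] = &aiocb->iocb;` line). This is not QEMU's linux-aio.c. `struct fake_iocb`, `fake_io_submit()` (a hypothetical stand-in for io_submit() that accepts only `kernel_slots` iocbs), `ioq_enqueue()`, `ioq_complete_one()`, `ioq_pending()` and `MAX_BATCH` are illustrative assumptions, as is submitting immediately on enqueue when unplugged.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BATCH 128 /* hypothetical cap on iocbs per submission */

/* Hypothetical stand-in for libaio's struct iocb. */
struct fake_iocb {
    int id;
};

struct request {
    struct fake_iocb iocb;
    struct request *next; /* link in the pending queue */
};

struct ioq {
    struct request *head;  /* requests not yet accepted by the "kernel" */
    struct request **tail; /* where the next request gets linked in */
    int plugged;           /* while set, completions do not trigger a submit */
    int kernel_slots;      /* how many more iocbs fake_io_submit() accepts */
};

static void ioq_init(struct ioq *q, int kernel_slots)
{
    q->head = NULL;
    q->tail = &q->head;
    q->plugged = 0;
    q->kernel_slots = kernel_slots;
}

/* Hypothetical io_submit() stand-in: accepts only as many iocbs as there
 * are free slots and returns that count (a partial submission). */
static int fake_io_submit(struct ioq *q, int nr, struct fake_iocb **iocbs)
{
    int n = nr < q->kernel_slots ? nr : q->kernel_slots;
    (void)iocbs; /* a real kernel would read these */
    q->kernel_slots -= n;
    return n;
}

/* Rebuild the iocb array from the queue on every call, so there is no
 * second structure to keep in sync with the list. Returns the number of
 * requests the "kernel" accepted. */
static int ioq_submit(struct ioq *q)
{
    struct fake_iocb *iocbs[MAX_BATCH];
    struct request *r;
    int len = 0, done, i;

    for (r = q->head; r != NULL && len < MAX_BATCH; r = r->next) {
        iocbs[len++] = &r->iocb;
    }
    if (len == 0) {
        return 0;
    }

    done = fake_io_submit(q, len, iocbs);
    assert(done >= 0 && done <= len);

    /* Accepted requests are always a prefix of the queue: drop them. */
    for (i = 0; i < done; i++) {
        q->head = q->head->next;
    }
    if (q->head == NULL) {
        q->tail = &q->head;
    }
    return done;
}

/* Queue a new request and, unless plugged, try to submit right away. */
static void ioq_enqueue(struct ioq *q, struct request *r)
{
    r->next = NULL;
    *q->tail = r;
    q->tail = &r->next;
    if (!q->plugged) {
        ioq_submit(q);
    }
}

/* A completion frees a slot; retry queued requests unless plugged. */
static void ioq_complete_one(struct ioq *q)
{
    q->kernel_slots++;
    if (!q->plugged && q->head != NULL) {
        ioq_submit(q);
    }
}

static int ioq_pending(const struct ioq *q)
{
    int n = 0;
    const struct request *r;

    for (r = q->head; r != NULL; r = r->next) {
        n++;
    }
    return n;
}
```

Rebuilding `iocbs` on each call costs a walk over at most `MAX_BATCH` list entries. That is the trade-off the patch description accepts in exchange for not keeping a separate array in sync with the list: a partially accepted batch then only requires advancing the list head.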

I was surprised to see a "rep stos" scoring 10% in laio_submit();
apparently the io_prep_*() functions do a memset on the iocb. I'm not sure
whether that is necessary, but again, it has always been this way.
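To make the memset point concrete, this sketch contrasts a prep helper that clears the whole control block before filling it in (what io_prep_*() is reported to do above) with one that assigns every field individually. `struct demo_iocb`, `DEMO_CMD_PREAD` and both helpers are hypothetical and do not reproduce libaio's real `struct iocb` layout. Both produce byte-identical results. Whichever style is used, unused and reserved fields still have to end up zeroed; whether the field-by-field version avoids a "rep stos" depends on the compiler and is not measured here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 64-byte control block loosely modelled on an AIO iocb;
 * field names and order are illustrative only. */
struct demo_iocb {
    uint64_t user_data;
    uint32_t key;
    uint32_t reserved1; /* assumed: consumer expects this to be zero */
    uint16_t opcode;
    int16_t  reqprio;
    uint32_t fildes;
    uint64_t buf;
    uint64_t nbytes;
    int64_t  offset;
    uint64_t reserved2; /* assumed: consumer expects this to be zero */
    uint32_t flags;
    uint32_t resfd;
};

enum { DEMO_CMD_PREAD = 1 }; /* hypothetical opcode value */

/* Clear-then-fill style: the memset is what compilers often lower to
 * "rep stos" or a run of vector stores. */
static void demo_prep_pread_memset(struct demo_iocb *cb, int fd, void *buf,
                                   size_t count, long long offset)
{
    assert(cb != NULL);
    memset(cb, 0, sizeof(*cb));
    cb->opcode = DEMO_CMD_PREAD;
    cb->fildes = (uint32_t)fd;
    cb->buf = (uint64_t)(uintptr_t)buf;
    cb->nbytes = count;
    cb->offset = offset;
}

/* Field-by-field style: every member is written exactly once, including
 * the ones that must be zero, so no stale contents can survive. */
static void demo_prep_pread_fields(struct demo_iocb *cb, int fd, void *buf,
                                   size_t count, long long offset)
{
    assert(cb != NULL);
    cb->user_data = 0;
    cb->key = 0;
    cb->reserved1 = 0;
    cb->opcode = DEMO_CMD_PREAD;
    cb->reqprio = 0;
    cb->fildes = (uint32_t)fd;
    cb->buf = (uint64_t)(uintptr_t)buf;
    cb->nbytes = count;
    cb->offset = offset;
    cb->reserved2 = 0;
    cb->flags = 0;
    cb->resfd = 0;
}
```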

Patch 5 doesn't restore the performance, which makes sense, as qemu-img
only sends single requests.

Kevin
