On Wed, Jul 2, 2014 at 4:54 PM, Stefan Hajnoczi <stefa...@redhat.com> wrote:
> On Tue, Jul 01, 2014 at 06:49:30PM +0200, Paolo Bonzini wrote:
>> On 01/07/2014 16:49, Ming Lei wrote:
>> >Let me provide some syscall counts observed when running randread
>> >(bs=4k, libaio) from the VM for 10 seconds:
>> >
>> >1), qemu.git/master
>> >- write(): 731K
>> >- rt_sigprocmask(): 417K
>> >- read(): 21K
>> >- ppoll(): 10K
>> >- io_submit(): 5K
>> >- io_getevents(): 4K
>> >
>> >2), qemu 2.0
>> >- write(): 9K
>> >- read(): 28K
>> >- ppoll(): 16K
>> >- io_submit(): 12K
>> >- io_getevents(): 10K
>> >
>> >>> The sigprocmask can probably be optimized away since the thread's
>> >>> signal mask remains unchanged most of the time.
>> >>>
>> >>> I'm not sure what is causing the write().
>> >I am investigating it...
>>
>> I would guess sigprocmask is getcontext (from qemu_coroutine_new) and write
>> is aio_notify (from qemu_bh_schedule).
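
If that guess is right, then every qemu_bh_schedule() costs one write()
on the context's event notifier (I believe aio_notify() boils down to
event_notifier_set()).  Here is a standalone toy, not QEMU code, just to
show where the syscall comes from:

/* Toy model of the notification path: the event loop sleeps in poll()
 * on an eventfd, and waking it up requires a write() syscall.  This is
 * the write() that dominates the counts above. */
#include <stdint.h>
#include <unistd.h>
#include <sys/eventfd.h>

static int notifier_fd;                 /* stands in for ctx->notifier */

static void toy_notify(void)
{
    uint64_t value = 1;
    ssize_t ret = write(notifier_fd, &value, sizeof(value));  /* the write() */
    (void)ret;
}

int main(void)
{
    notifier_fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
    toy_notify();                       /* one syscall per notification */
    close(notifier_fd);
    return 0;
}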
>
> Aha!  We shouldn't be executing qemu_coroutine_new() very often since we
> try to keep a freelist of coroutines.
>
> I think a tweak to the freelist could make the rt_sigprocmask() calls go
> away since we should be reusing coroutines instead of allocating/freeing
> them all the time.
>
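If the freelist works as intended, getcontext() (and with it the
rt_sigprocmask() storm) should only be paid on a freelist miss.  A
minimal sketch of the idea, with stand-in allocation functions and an
invented cap, just to make the point concrete:

/* Illustrative coroutine freelist (not the actual qemu-coroutine.c).
 * A hit recycles a finished coroutine with no syscalls at all; only a
 * miss falls back to creation, which is where getcontext() and hence
 * rt_sigprocmask() come from. */
#include <stdlib.h>

typedef struct Coroutine {
    struct Coroutine *next;     /* freelist link; the real struct holds
                                 * the saved context, entry point, ... */
} Coroutine;

/* Stand-ins for the real create/destroy, which do the expensive work. */
static Coroutine *coroutine_new(void) { return calloc(1, sizeof(Coroutine)); }
static void coroutine_destroy(Coroutine *co) { free(co); }

static Coroutine *freelist;
static unsigned freelist_size;
enum { FREELIST_MAX = 64 };     /* invented cap */

static Coroutine *coroutine_get(void)
{
    Coroutine *co = freelist;
    if (co) {
        freelist = co->next;    /* hit: reuse, no syscalls */
        freelist_size--;
        return co;
    }
    return coroutine_new();     /* miss: pays getcontext()/sigprocmask() */
}

static void coroutine_put(Coroutine *co)
{
    if (freelist_size < FREELIST_MAX) {
        co->next = freelist;    /* recycle instead of destroying */
        freelist = co;
        freelist_size++;
    } else {
        coroutine_destroy(co);
    }
}
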
>> Both can be eliminated by introducing a fast path in bdrv_aio_{read,write}v
>> that bypasses coroutines in the common case of no I/O throttling, no
>> copy-on-write, etc.
>
> I tried that in 2012 and couldn't measure an improvement above the noise
> threshold, although it was without dataplane.
>
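For reference, the rough shape of such a fast path might be the
following.  The types and the condition list are simplified stand-ins
(the real BlockDriverState has many more cases to rule out), so treat
this as a sketch of the idea rather than a patch:

/* Sketch of a coroutine-bypassing fast path.  Types are minimal stand-ins
 * so the snippet is self-contained; the field names mimic QEMU 2.x but the
 * checks are illustrative only. */
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct QEMUIOVector QEMUIOVector;
typedef struct AIOCB AIOCB;
typedef void CompletionFunc(void *opaque, int ret);
typedef struct BlockDriverState BlockDriverState;

typedef struct BlockDriver {
    AIOCB *(*bdrv_aio_readv)(BlockDriverState *bs, int64_t sector_num,
                             QEMUIOVector *qiov, int nb_sectors,
                             CompletionFunc *cb, void *opaque);
} BlockDriver;

struct BlockDriverState {
    BlockDriver *drv;
    BlockDriverState *backing_hd;   /* non-NULL: copy-on-write possible */
    bool io_limits_enabled;         /* I/O throttling active */
    bool copy_on_read;
};

static AIOCB *bdrv_aio_readv_fast(BlockDriverState *bs, int64_t sector_num,
                                  QEMUIOVector *qiov, int nb_sectors,
                                  CompletionFunc *cb, void *opaque)
{
    if (!bs->io_limits_enabled && !bs->backing_hd && !bs->copy_on_read &&
        bs->drv->bdrv_aio_readv) {
        /* Common case: hand the request straight to the driver's native
         * AIO hook, skipping coroutine creation entirely. */
        return bs->drv->bdrv_aio_readv(bs, sector_num, qiov, nb_sectors,
                                       cb, opaque);
    }
    return NULL;    /* caller falls back to the coroutine slow path */
}
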
> BTW, we cannot eliminate the BH because the block layer guarantees that
> callbacks are never invoked reentrantly.  They are always invoked
> directly from the event loop through a BH.  This simplifies callers
> since they don't need to worry about callbacks firing while they are
> still inside bdrv_aio_readv(), for example.
>
> Removing this guarantee (by making callers safe first) is orthogonal to
> coroutines.  But it's hard to do since it requires auditing a lot of
> code.
>
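To make that guarantee concrete: even a request that completes
synchronously gets finished through a BH, along these lines
(CompletionState and complete_request() are invented names, but
aio_bh_new()/qemu_bh_schedule() are the real async.c API quoted in the
patch below):

/* Sketch of deferred completion: the callback only ever runs from the
 * event loop (aio_bh_poll()), never while the submitter is still inside
 * bdrv_aio_readv().  This qemu_bh_schedule() is also exactly where the
 * aio_notify() -> write() pair comes from. */
#include <glib.h>
#include "block/aio.h"

typedef void CompletionFunc(void *opaque, int ret);

typedef struct CompletionState {
    QEMUBH *bh;
    CompletionFunc *cb;
    void *opaque;
    int ret;
} CompletionState;

static void completion_bh(void *p)
{
    CompletionState *s = p;

    s->cb(s->opaque, s->ret);          /* safe: the caller has long returned */
    qemu_bh_delete(s->bh);
    g_free(s);
}

static void complete_request(AioContext *ctx, CompletionState *s, int ret)
{
    s->ret = ret;
    s->bh = aio_bh_new(ctx, completion_bh, s);
    qemu_bh_schedule(s->bh);           /* wakes the loop via aio_notify() */
}
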
> Another idea is to skip aio_notify() when we're sure the event loop
> isn't blocked in g_poll().  Doing this in a thread-safe and lockless way
> might be tricky though.
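
One lockless shape that might work is a Dekker-style flag: the event
loop publishes "about to block" before re-checking for scheduled BHs,
and qemu_bh_schedule() only notifies when that flag is visible.
Untested sketch with invented names, using C11 atomics for the barriers:

/* Untested idea, not existing QEMU code.  Both sides do store-then-load
 * with seq_cst ordering, so at least one of them sees the other: either
 * the loop notices the pending BH and skips blocking, or the scheduler
 * notices the polling flag and pays for the write(). */
#include <stdatomic.h>
#include <stdbool.h>

typedef struct Ctx {
    atomic_bool polling;        /* true while (possibly) blocked in g_poll() */
    atomic_bool bh_pending;     /* stand-in for the scheduled-BH list */
} Ctx;

static void event_loop_iteration(Ctx *ctx)
{
    atomic_store(&ctx->polling, true);      /* publish before re-checking */
    if (!atomic_load(&ctx->bh_pending)) {
        /* g_poll(fds, nfds, timeout);      -- safe to block here */
    }
    atomic_store(&ctx->polling, false);
    /* ... run pending BHs ... */
}

static void bh_schedule(Ctx *ctx)
{
    atomic_store(&ctx->bh_pending, true);
    if (atomic_load(&ctx->polling)) {
        /* aio_notify(ctx);  -- kick only when the loop may be sleeping */
    }
}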

The attached debug patch skips aio_notify() when qemu_bh_schedule()
is called from the current AioContext, but it looks like there are
still 120K write() calls triggered.  (Without the patch, 400K can be
observed in the same test.)

So are there still other write() calls on a path we haven't found?


Thanks,
-- 
Ming Lei
diff --git a/async.c b/async.c
index 5b6fe6b..5aa9982 100644
--- a/async.c
+++ b/async.c
@@ -40,6 +40,18 @@ struct QEMUBH {
     bool deleted;
 };
 
+static __thread AioContext *my_ctx = NULL;
+
+AioContext *get_current_aio_context(void)
+{
+    return my_ctx;
+}
+
+void set_current_aio_context(AioContext *ctx)
+{
+    my_ctx = ctx;
+}
+
 QEMUBH *aio_bh_new(AioContext *ctx, QEMUBHFunc *cb, void *opaque)
 {
     QEMUBH *bh;
@@ -131,7 +143,10 @@ void qemu_bh_schedule(QEMUBH *bh)
      */
     smp_mb();
     bh->scheduled = 1;
-    aio_notify(ctx);
+
+    if (get_current_aio_context() != ctx) {
+        aio_notify(ctx);
+    }
 }
 
 
diff --git a/include/block/aio.h b/include/block/aio.h
index a92511b..29f29e2 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -307,4 +307,7 @@ static inline void aio_timer_init(AioContext *ctx,
     timer_init(ts, ctx->tlg.tl[type], scale, cb, opaque);
 }
 
+AioContext *get_current_aio_context(void);
+void set_current_aio_context(AioContext *ctx);
+
 #endif
diff --git a/iothread.c b/iothread.c
index 1fbf9f1..beb32ad 100644
--- a/iothread.c
+++ b/iothread.c
@@ -36,6 +36,8 @@ static void *iothread_run(void *opaque)
     qemu_cond_signal(&iothread->init_done_cond);
     qemu_mutex_unlock(&iothread->init_done_lock);
 
+    set_current_aio_context(iothread->ctx);
+
     while (!iothread->stopping) {
         aio_context_acquire(iothread->ctx);
         while (!iothread->stopping && aio_poll(iothread->ctx, true)) {
