On Thu, Jul 3, 2014 at 12:21 AM, Paolo Bonzini <pbonz...@redhat.com> wrote:
> On 02/07/2014 17:45, Ming Lei wrote:
>> The attachment debug patch skips aio_notify() if qemu_bh_schedule
>> is running from the current aio context, but it looks like there
>> are still 120K writes triggered. (Without the patch, 400K can be
>> observed in the same test.)
>
> Nice. Another observation is that after aio_dispatch we'll always
> re-evaluate everything (bottom halves, file descriptors and timeouts),

The idea is very good. If aio_notify() is called from the 1st
aio_dispatch() in aio_poll(), ctx->notifier might need to be set, but
it can be handled easily.

> so we can skip the aio_notify if we're inside aio_dispatch.
>
> So what about this untested patch:
>
> diff --git a/aio-posix.c b/aio-posix.c
> index f921d4f..a23d85d 100644
> --- a/aio-posix.c
> +++ b/aio-posix.c

#include "qemu/atomic.h" is needed here for the smp_wmb() barriers.

> @@ -124,6 +124,9 @@ static bool aio_dispatch(AioContext *ctx)
>      AioHandler *node;
>      bool progress = false;
>
> +    /* No need to set the event notifier during aio_notify. */
> +    ctx->running++;
> +
>      /*
>       * We have to walk very carefully in case qemu_aio_set_fd_handler is
>       * called while we're walking.
> @@ -169,6 +171,11 @@ static bool aio_dispatch(AioContext *ctx)
>      /* Run our timers */
>      progress |= timerlistgroup_run_timers(&ctx->tlg);
>
> +    smp_wmb();
> +    ctx->iter_count++;
> +    smp_wmb();
> +    ctx->running--;
> +
>      return progress;
>  }
>
> diff --git a/async.c b/async.c
> index 5b6fe6b..1f56afa 100644
> --- a/async.c
> +++ b/async.c

#include "qemu/atomic.h" is needed here too, for the smp_rmb() barriers.

> @@ -249,7 +249,19 @@ ThreadPool *aio_get_thread_pool(AioContext *ctx)
>
>  void aio_notify(AioContext *ctx)
>  {
> -    event_notifier_set(&ctx->notifier);
> +    uint32_t iter_count;
> +    do {
> +        iter_count = ctx->iter_count;
> +        /* Read ctx->iter_count before ctx->running. */
> +        smb_rmb();

s/smb/smp

> +        if (!ctx->running) {
> +            event_notifier_set(&ctx->notifier);
> +            return;
> +        }
> +        /* Read ctx->running before ctx->iter_count. */
> +        smb_rmb();

s/smb/smp

> +        /* ctx might have gone to sleep. */
> +    } while (iter_count != ctx->iter_count);
>  }

Since both 'running' and 'iter_count' may be read locklessly,
something like ACCESS_ONCE() should be used to keep the compiler from
optimizing the reads (see the sketch below).

>
>  static void aio_timerlist_notify(void *opaque)
> @@ -269,6 +279,7 @@ AioContext *aio_context_new(void)
>      ctx = (AioContext *) g_source_new(&aio_source_funcs, sizeof(AioContext));
>      ctx->pollfds = g_array_new(FALSE, FALSE, sizeof(GPollFD));
>      ctx->thread_pool = NULL;
> +    ctx->iter_count = ctx->running = 0;
>      qemu_mutex_init(&ctx->bh_lock);
>      rfifolock_init(&ctx->lock, aio_rfifolock_cb, ctx);
>      event_notifier_init(&ctx->notifier, false);
> diff --git a/include/block/aio.h b/include/block/aio.h
> index a92511b..9f51c4f 100644
> --- a/include/block/aio.h
> +++ b/include/block/aio.h
> @@ -51,6 +51,9 @@ struct AioContext {
>      /* Protects all fields from multi-threaded access */
>      RFifoLock lock;
>
> +    /* Used to avoid aio_notify while dispatching event handlers.
> +     * Writes protected by lock or BQL, reads are lockless.
> +     */
> +    uint32_t iter_count, running;
> +
>      /* The list of registered AIO handlers */
>      QLIST_HEAD(, AioHandler) aio_handlers;
>

In my test, it does decrease the number of write() calls very much,
and I hope a formal version can be applied soon.

Thanks,
--
Ming Lei
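
P.S. To make the ACCESS_ONCE() point concrete, here is a rough,
untested sketch of aio_notify() with the two fixes above folded in
(s/smb_rmb/smp_rmb/ and marking the lockless reads). It assumes
QEMU's atomic_read() helper from include/qemu/atomic.h, which expands
to an ACCESS_ONCE()-style read:

void aio_notify(AioContext *ctx)
{
    uint32_t iter_count;

    do {
        /* Lockless read; atomic_read() keeps the compiler from
         * caching or tearing the load.
         */
        iter_count = atomic_read(&ctx->iter_count);

        /* Read ctx->iter_count before ctx->running. */
        smp_rmb();

        if (!atomic_read(&ctx->running)) {
            /* Not inside aio_dispatch: the notifier must be set. */
            event_notifier_set(&ctx->notifier);
            return;
        }

        /* Read ctx->running before re-reading ctx->iter_count. */
        smp_rmb();

        /* If iter_count changed, aio_dispatch() has finished and ctx
         * might have gone to sleep, so retry instead of dropping the
         * notification.
         */
    } while (iter_count != atomic_read(&ctx->iter_count));
}

The retry loop is what closes the race where aio_dispatch() completes
between the two reads; without it a notification could be lost while
the context goes back to sleep.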