On Thu, Jul 3, 2014 at 6:29 PM, Paolo Bonzini <pbonz...@redhat.com> wrote:
> On 03/07/2014 06:54, Ming Lei wrote:
>
>> On Thu, Jul 3, 2014 at 12:21 AM, Paolo Bonzini <pbonz...@redhat.com>
>> wrote:
>>>
>>> On 02/07/2014 17:45, Ming Lei wrote:
>>>>
>>>> The attached debug patch skips aio_notify() if qemu_bh_schedule
>>>> is running from the current aio context, but it looks like 120K
>>>> writes are still triggered. (Without the patch, 400K can be
>>>> observed in the same test.)
>>>
>>>
>>> Nice. Another observation is that after aio_dispatch we'll always
>>> re-evaluate everything (bottom halves, file descriptors and timeouts),
>>
>>
>> The idea is very good.
>>
>> If aio_notify() is called from the 1st aio_dispatch() in aio_poll(),
>> ctx->notifier might need to be set, but it can be handled easily.
>
>
> Yes, you can just move the atomic_inc/atomic_dec in aio_poll.
If you mean moving the inc/dec of 'running' into aio_poll, that won't
work: when aio_notify() sees 'running', it won't set the notifier, but
the thread in aio_poll() may then go on to block in ppoll() and miss
the event. We could set a flag in aio_notify() when the notifier is
bypassed, then check that flag before ppoll() to decide whether to
block at all.

>
>
>>> so we can skip the aio_notify if we're inside aio_dispatch.
>>>
>>> So what about this untested patch:
>>>
>>> diff --git a/aio-posix.c b/aio-posix.c
>>> index f921d4f..a23d85d 100644
>>> --- a/aio-posix.c
>>> +++ b/aio-posix.c
>>
>>
>> #include "qemu/atomic.h"
>>
>>> @@ -124,6 +124,9 @@ static bool aio_dispatch(AioContext *ctx)
>>>      AioHandler *node;
>>>      bool progress = false;
>>>
>>> +    /* No need to set the event notifier during aio_notify. */
>>> +    ctx->running++;
>>> +
>>>      /*
>>>       * We have to walk very carefully in case qemu_aio_set_fd_handler is
>>>       * called while we're walking.
>>> @@ -169,6 +171,11 @@ static bool aio_dispatch(AioContext *ctx)
>>>      /* Run our timers */
>>>      progress |= timerlistgroup_run_timers(&ctx->tlg);
>>>
>>> +    smp_wmb();
>>> +    ctx->iter_count++;
>>> +    smp_wmb();
>>> +    ctx->running--;
>>> +
>>>      return progress;
>>>  }
>>>
>>> diff --git a/async.c b/async.c
>>> index 5b6fe6b..1f56afa 100644
>>> --- a/async.c
>>> +++ b/async.c
>>
>>
>> #include "qemu/atomic.h"
>>
>>> @@ -249,7 +249,19 @@ ThreadPool *aio_get_thread_pool(AioContext *ctx)
>>>
>>>  void aio_notify(AioContext *ctx)
>>>  {
>>> -    event_notifier_set(&ctx->notifier);
>>> +    uint32_t iter_count;
>>> +    do {
>>> +        iter_count = ctx->iter_count;
>>> +        /* Read ctx->iter_count before ctx->running. */
>>> +        smb_rmb();
>>
>>
>> s/smb/smp
>>
>>> +        if (!ctx->running) {
>>> +            event_notifier_set(&ctx->notifier);
>>> +            return;
>>> +        }
>>> +        /* Read ctx->running before ctx->iter_count. */
>>> +        smb_rmb();
>>
>>
>> s/smb/smp
>>
>>> +        /* ctx might have gone to sleep. */
>>> +    } while (iter_count != ctx->iter_count);
>>>  }
>>
>>
>> Since both 'running' and 'iter_count' may be read locklessly, something
>> like ACCESS_ONCE() should be used to avoid compiler optimizations.
>
>
> No, smp_rmb() is enough to prevent them. See also include/qemu/seqlock.h.

Yes, there is an ordering between the reads/writes of iter_count and
running.

> The first access to ctx->iter_count _could_ be protected by ACCESS_ONCE(),
> which in QEMU we call atomic_read()/atomic_set(), but it's not necessary.
> See docs/atomics.txt for a description of QEMU's atomic access functions.
>
>
>> In my test, it decreases the number of write() calls very much, and I
>> hope a formal version can be applied soon.
>
>
> Can you take care of that (you can add my Signed-off-by), since you have
> the best testing environment? v5 of the plug/unplug series will be good
> to go.

OK, I will do that, include it in v5, and put your name as
author/Signed-off-by.

BTW, I do not have the best testing environment; all plug/unplug related
tests are done on my laptop. :-)

Thanks,

--
Ming Lei
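
For reference, a minimal self-contained sketch of the seqlock-like
scheme discussed above. It uses C11 atomics with the default
sequentially consistent ordering for simplicity, where the actual patch
uses QEMU's lighter smp_rmb()/smp_wmb() barriers; the names below
(AioCtxSketch, ctx_notify, dispatch_begin/dispatch_end, notifier_set)
are hypothetical stand-ins, not QEMU's API.

#include <stdatomic.h>
#include <stdio.h>

typedef struct {
    atomic_uint running;     /* nonzero while a thread is in a dispatch pass */
    atomic_uint iter_count;  /* bumped at the end of every dispatch pass */
} AioCtxSketch;

/* Stands in for event_notifier_set(): the expensive write() to the eventfd. */
static void notifier_set(AioCtxSketch *ctx)
{
    (void)ctx;
    printf("wakeup written\n");
}

/* Bracket a dispatch pass, as the patch does in aio_dispatch(). */
static void dispatch_begin(AioCtxSketch *ctx)
{
    atomic_fetch_add(&ctx->running, 1);
}

static void dispatch_end(AioCtxSketch *ctx)
{
    atomic_fetch_add(&ctx->iter_count, 1);  /* mark "one pass finished" */
    atomic_fetch_sub(&ctx->running, 1);
}

/* The aio_notify() logic: skip the wakeup while a dispatcher is running. */
static void ctx_notify(AioCtxSketch *ctx)
{
    unsigned iter;
    do {
        iter = atomic_load(&ctx->iter_count);
        if (atomic_load(&ctx->running) == 0) {
            /* Nobody is dispatching: a real wakeup is required. */
            notifier_set(ctx);
            return;
        }
        /* A dispatcher was seen; if iter_count changed it may have
         * finished the pass and gone to sleep, so look again. */
    } while (iter != atomic_load(&ctx->iter_count));
    /* iter_count unchanged: the dispatcher is still inside the same pass
     * and will re-evaluate bottom halves itself, so skip the write(). */
}

int main(void)
{
    AioCtxSketch ctx = { 0, 0 };

    ctx_notify(&ctx);     /* prints "wakeup written" */

    dispatch_begin(&ctx);
    ctx_notify(&ctx);     /* skipped: a dispatch pass is in progress */
    dispatch_end(&ctx);
    return 0;
}

The key design point is the loop's exit condition: skipping the wakeup
is safe only if iter_count did not change across the check of running,
which proves the dispatcher never finished a pass (and so could not
have gone to sleep) while the notifying thread was looking.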