Re: [PATCH] aio-posix: avoid reacquiring rcu_read_lock() when polling
On Tue, Feb 18, 2020 at 06:27:08PM +, Stefan Hajnoczi wrote:
> The first rcu_read_lock/unlock() is expensive. Nested calls are cheap.
>
> This optimization increases IOPS from 73k to 162k with a Linux guest
> that has 2 virtio-blk,num-queues=1 and 99 virtio-blk,num-queues=32
> devices.
>
> Signed-off-by: Stefan Hajnoczi
> ---
>  util/aio-posix.c | 11 +++
>  1 file changed, 11 insertions(+)

Thanks, applied to my block tree:
https://github.com/stefanha/qemu/commits/block

Stefan

Re: [PATCH] aio-posix: avoid reacquiring rcu_read_lock() when polling
On 18/02/20 19:27, Stefan Hajnoczi wrote:
> The first rcu_read_lock/unlock() is expensive. Nested calls are cheap.
>
> This optimization increases IOPS from 73k to 162k with a Linux guest
> that has 2 virtio-blk,num-queues=1 and 99 virtio-blk,num-queues=32
> devices.
>
> Signed-off-by: Stefan Hajnoczi
> ---
>  util/aio-posix.c | 11 +++
>  1 file changed, 11 insertions(+)
>
> diff --git a/util/aio-posix.c b/util/aio-posix.c
> index a4977f538e..f67f5b34e9 100644
> --- a/util/aio-posix.c
> +++ b/util/aio-posix.c
> @@ -15,6 +15,7 @@
>
>  #include "qemu/osdep.h"
>  #include "block/block.h"
> +#include "qemu/rcu.h"
>  #include "qemu/rcu_queue.h"
>  #include "qemu/sockets.h"
>  #include "qemu/cutils.h"
> @@ -514,6 +515,16 @@ static bool run_poll_handlers_once(AioContext *ctx, int64_t *timeout)
>      bool progress = false;
>      AioHandler *node;
>
> +    /*
> +     * Optimization: ->io_poll() handlers often contain RCU read critical
> +     * sections and we therefore see many rcu_read_lock() -> rcu_read_unlock()
> +     * -> rcu_read_lock() -> ... sequences with expensive memory
> +     * synchronization primitives. Make the entire polling loop an RCU
> +     * critical section because nested rcu_read_lock()/rcu_read_unlock() calls
> +     * are cheap.
> +     */
> +    RCU_READ_LOCK_GUARD();
> +
>      QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
>          if (!node->deleted && node->io_poll &&
>              aio_node_check(ctx, node->is_external) &&
>

Reviewed-by: Paolo Bonzini

[PATCH] aio-posix: avoid reacquiring rcu_read_lock() when polling
The first rcu_read_lock/unlock() is expensive. Nested calls are cheap.

This optimization increases IOPS from 73k to 162k with a Linux guest
that has 2 virtio-blk,num-queues=1 and 99 virtio-blk,num-queues=32
devices.

Signed-off-by: Stefan Hajnoczi
---
 util/aio-posix.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/util/aio-posix.c b/util/aio-posix.c
index a4977f538e..f67f5b34e9 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -15,6 +15,7 @@

 #include "qemu/osdep.h"
 #include "block/block.h"
+#include "qemu/rcu.h"
 #include "qemu/rcu_queue.h"
 #include "qemu/sockets.h"
 #include "qemu/cutils.h"
@@ -514,6 +515,16 @@ static bool run_poll_handlers_once(AioContext *ctx, int64_t *timeout)
     bool progress = false;
     AioHandler *node;

+    /*
+     * Optimization: ->io_poll() handlers often contain RCU read critical
+     * sections and we therefore see many rcu_read_lock() -> rcu_read_unlock()
+     * -> rcu_read_lock() -> ... sequences with expensive memory
+     * synchronization primitives. Make the entire polling loop an RCU
+     * critical section because nested rcu_read_lock()/rcu_read_unlock() calls
+     * are cheap.
+     */
+    RCU_READ_LOCK_GUARD();
+
     QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
         if (!node->deleted && node->io_poll &&
             aio_node_check(ctx, node->is_external) &&
--
2.24.1
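
For readers who want to see why nesting helps, here is a rough standalone model of the idea in the commit message. It is not QEMU's RCU implementation: every toy_* name, the TOY_READ_LOCK_GUARD() macro and count_expensive_ops() are hypothetical, and the "expensive" step is assumed to be a full memory fence taken only when the nesting depth goes from 0 to 1 or back. The guard relies on the GCC/Clang cleanup attribute, so the sketch assumes one of those compilers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-thread reader state; QEMU's real bookkeeping differs. */
static _Thread_local unsigned toy_depth;
static _Thread_local unsigned toy_expensive_ops;

static void toy_read_lock(void)
{
    if (toy_depth++ == 0) {
        /* Outermost entry only: assumed to need a full fence. */
        atomic_thread_fence(memory_order_seq_cst);
        toy_expensive_ops++;
    }
    /* Nested entry: just the counter increment above. */
}

static void toy_read_unlock(void)
{
    assert(toy_depth > 0);
    if (--toy_depth == 0) {
        /* Outermost exit only: fence again. */
        atomic_thread_fence(memory_order_seq_cst);
        toy_expensive_ops++;
    }
}

/* Scope guard: lock now, unlock automatically when the scope ends. */
static int toy_guard_acquire(void)
{
    toy_read_lock();
    return 0;
}

static void toy_guard_release(int *unused)
{
    (void)unused;
    toy_read_unlock();
}

#define TOY_READ_LOCK_GUARD() \
    int toy_guard_ __attribute__((cleanup(toy_guard_release), unused)) = \
        toy_guard_acquire()

/* Stand-in for an ->io_poll() handler with its own read-side section. */
static bool toy_io_poll(void)
{
    toy_read_lock();
    /* ... a real handler would read RCU-protected data here ... */
    toy_read_unlock();
    return false;
}

static void toy_poll_all(int handlers)
{
    for (int i = 0; i < handlers; i++) {
        toy_io_poll();
    }
}

/* Returns how many times the expensive path ran for one polling pass. */
unsigned count_expensive_ops(int handlers, bool use_outer_guard)
{
    toy_expensive_ops = 0;
    if (use_outer_guard) {
        TOY_READ_LOCK_GUARD();   /* whole loop is one critical section */
        toy_poll_all(handlers);
    } else {
        toy_poll_all(handlers);  /* each handler locks from depth 0 */
    }
    return toy_expensive_ops;
}
```

In this model, count_expensive_ops(n, false) grows as 2 * n while count_expensive_ops(n, true) stays at 2 regardless of n. That matches the reasoning in the patch, although the real cost per lock and the measured IOPS gain depend on QEMU's actual primitives rather than on this counter.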