Re: [PATCH] aio-posix: avoid reacquiring rcu_read_lock() when polling

2020-02-21 Thread Stefan Hajnoczi
On Tue, Feb 18, 2020 at 06:27:08PM +, Stefan Hajnoczi wrote:
> The first rcu_read_lock/unlock() is expensive.  Nested calls are cheap.
> 
> This optimization increases IOPS from 73k to 162k with a Linux guest
> that has 2 virtio-blk,num-queues=1 and 99 virtio-blk,num-queues=32
> devices.
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  util/aio-posix.c | 11 +++
>  1 file changed, 11 insertions(+)

Thanks, applied to my block tree:
https://github.com/stefanha/qemu/commits/block

Stefan



Re: [PATCH] aio-posix: avoid reacquiring rcu_read_lock() when polling

2020-02-20 Thread Paolo Bonzini
On 18/02/20 19:27, Stefan Hajnoczi wrote:
> The first rcu_read_lock/unlock() is expensive.  Nested calls are cheap.
> 
> This optimization increases IOPS from 73k to 162k with a Linux guest
> that has 2 virtio-blk,num-queues=1 and 99 virtio-blk,num-queues=32
> devices.
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  util/aio-posix.c | 11 +++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/util/aio-posix.c b/util/aio-posix.c
> index a4977f538e..f67f5b34e9 100644
> --- a/util/aio-posix.c
> +++ b/util/aio-posix.c
> @@ -15,6 +15,7 @@
>  
>  #include "qemu/osdep.h"
>  #include "block/block.h"
> +#include "qemu/rcu.h"
>  #include "qemu/rcu_queue.h"
>  #include "qemu/sockets.h"
>  #include "qemu/cutils.h"
> @@ -514,6 +515,16 @@ static bool run_poll_handlers_once(AioContext *ctx, int64_t *timeout)
>      bool progress = false;
>      AioHandler *node;
>  
> +    /*
> +     * Optimization: ->io_poll() handlers often contain RCU read critical
> +     * sections and we therefore see many rcu_read_lock() -> rcu_read_unlock()
> +     * -> rcu_read_lock() -> ... sequences with expensive memory
> +     * synchronization primitives.  Make the entire polling loop an RCU
> +     * critical section because nested rcu_read_lock()/rcu_read_unlock() calls
> +     * are cheap.
> +     */
> +    RCU_READ_LOCK_GUARD();
> +
>      QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
>          if (!node->deleted && node->io_poll &&
>              aio_node_check(ctx, node->is_external) &&
> 

Reviewed-by: Paolo Bonzini 




[PATCH] aio-posix: avoid reacquiring rcu_read_lock() when polling

2020-02-18 Thread Stefan Hajnoczi
The first rcu_read_lock/unlock() is expensive.  Nested calls are cheap.

This optimization increases IOPS from 73k to 162k with a Linux guest
that has 2 virtio-blk,num-queues=1 and 99 virtio-blk,num-queues=32
devices.

Signed-off-by: Stefan Hajnoczi 
---
 util/aio-posix.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/util/aio-posix.c b/util/aio-posix.c
index a4977f538e..f67f5b34e9 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -15,6 +15,7 @@
 
 #include "qemu/osdep.h"
 #include "block/block.h"
+#include "qemu/rcu.h"
 #include "qemu/rcu_queue.h"
 #include "qemu/sockets.h"
 #include "qemu/cutils.h"
@@ -514,6 +515,16 @@ static bool run_poll_handlers_once(AioContext *ctx, int64_t *timeout)
     bool progress = false;
     AioHandler *node;
 
+    /*
+     * Optimization: ->io_poll() handlers often contain RCU read critical
+     * sections and we therefore see many rcu_read_lock() -> rcu_read_unlock()
+     * -> rcu_read_lock() -> ... sequences with expensive memory
+     * synchronization primitives.  Make the entire polling loop an RCU
+     * critical section because nested rcu_read_lock()/rcu_read_unlock() calls
+     * are cheap.
+     */
+    RCU_READ_LOCK_GUARD();
+
     QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
         if (!node->deleted && node->io_poll &&
             aio_node_check(ctx, node->is_external) &&
-- 
2.24.1
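
For readers unfamiliar with why the nested calls are cheap, here is a rough standalone sketch of a depth-counted read lock. It is not QEMU's implementation: every toy_* name is hypothetical, and the fence simply stands in for the "expensive memory synchronization primitives" mentioned in the patch comment. The assumption being illustrated is that only the outermost lock/unlock pair pays for the barrier, while nested pairs just bump a thread-local counter.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical per-thread reader state; not QEMU's actual data structures. */
static _Thread_local unsigned toy_depth;             /* current nesting level */
static _Thread_local unsigned long toy_barriers;     /* expensive paths taken */
static _Thread_local atomic_ulong toy_reader_active; /* stand-in for state a writer would inspect */

static void toy_read_lock(void)
{
    if (toy_depth++ > 0) {
        return; /* nested call: thread-local increment only */
    }
    /* Outermost call: publish reader state and pay for a full barrier. */
    atomic_store_explicit(&toy_reader_active, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    toy_barriers++;
}

static void toy_read_unlock(void)
{
    assert(toy_depth > 0);
    if (--toy_depth > 0) {
        return; /* still inside an outer critical section */
    }
    /* Leaving the outermost critical section: expensive path again. */
    atomic_store_explicit(&toy_reader_active, 0, memory_order_release);
    atomic_thread_fence(memory_order_seq_cst);
    toy_barriers++;
}

/*
 * Simulate one pass over n_handlers poll handlers, each of which has its
 * own read-side critical section.  If hold_outer is non-zero, the whole
 * loop runs inside one outer critical section, as the patch does.
 * Returns the number of expensive barrier paths executed.
 */
unsigned long toy_poll(int n_handlers, int hold_outer)
{
    unsigned long before = toy_barriers;

    if (hold_outer) {
        toy_read_lock();
    }
    for (int i = 0; i < n_handlers; i++) {
        toy_read_lock();   /* what an ->io_poll() handler would do */
        toy_read_unlock();
    }
    if (hold_outer) {
        toy_read_unlock();
    }
    return toy_barriers - before;
}
```

In this toy model, toy_poll(100, 0) takes the expensive path 200 times, while toy_poll(100, 1) takes it twice. The trade-off is that the read-side critical section now spans the whole loop, so any reclamation that waits for readers is deferred until the loop finishes.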