RE: [PATCH v5 3/5] virtio-crypto: wait ctrl queue instead of busy polling

2022-05-06 Thread Gonglei (Arei) via Virtualization



> -----Original Message-----
> From: zhenwei pi [mailto:pizhen...@bytedance.com]
> Sent: Thursday, May 5, 2022 5:24 PM
> To: Gonglei (Arei) ; m...@redhat.com
> Cc: jasow...@redhat.com; herb...@gondor.apana.org.au;
> linux-ker...@vger.kernel.org; virtualization@lists.linux-foundation.org;
> linux-cry...@vger.kernel.org; helei.si...@bytedance.com;
> pizhen...@bytedance.com; da...@davemloft.net
> Subject: [PATCH v5 3/5] virtio-crypto: wait ctrl queue instead of busy polling
> 
> Originally, after submitting a request to the virtio crypto control queue,
> the guest side polls the virtqueue for the result. This works as follows:
> CPU0   CPU1               ...             CPUx  CPUy
>  |      |                                  |     |
>  \      \                                  /     /
>   \--------spin_lock(&vcrypto->ctrl_lock)-------/
>                          |
>                 virtqueue add & kick
>                          |
>                 busy poll virtqueue
>                          |
>           spin_unlock(&vcrypto->ctrl_lock)
>                         ...
> 
> There are two problems:
> 1. The queue depth is always 1, which limits the performance of a
>    virtio crypto device. Multiple user processes share a single control
>    queue and contend for its spin lock. Tested on an Intel Platinum
>    8260, a single worker gets ~35K/s create/close session operations,
>    and 8 workers get ~40K/s operations with 800% CPU utilization.
> 2. A control request is supposed to be handled immediately, but in
>    the current implementation of QEMU (v6.2), the vCPU thread kicks
>    another thread to do this work, so the latency is also unstable.
>    Tracking the latency of virtio_crypto_alg_akcipher_close_session
>    for 5s:
>     usecs          : count
>        0 -> 1      : 0
>        2 -> 3      : 7
>        4 -> 7      : 72
>        8 -> 15     : 186485
>       16 -> 31     : 687
>       32 -> 63     : 5
>       64 -> 127    : 3
>      128 -> 255    : 1
>      256 -> 511    : 0
>      512 -> 1023   : 0
>     1024 -> 2047   : 0
>     2048 -> 4095   : 0
>     4096 -> 8191   : 0
>     8192 -> 16383  : 2
> This means that a CPU may hold vcrypto->ctrl_lock for as long as 8192~16383us.
> 
> To improve the performance of the control queue, a request on the control
> queue now waits for completion instead of busy polling, which reduces lock
> contention. The request is completed by the control queue callback.
> CPU0   CPU1               ...             CPUx  CPUy
>  |      |                                  |     |
>  \      \                                  /     /
>   \--------spin_lock(&vcrypto->ctrl_lock)-------/
>                          |
>                 virtqueue add & kick
>                          |
>   --------spin_unlock(&vcrypto->ctrl_lock)-------
>  /      /                                  \     \
>  |      |                                  |     |
> wait   wait                               wait  wait
> 
> With this patch, the guest side gets ~200K/s operations with 300% CPU
> utilization.
> 
> Cc: Michael S. Tsirkin 
> Cc: Jason Wang 
> Cc: Gonglei 
> Signed-off-by: zhenwei pi 
> ---
>  .../virtio/virtio_crypto_akcipher_algs.c      | 29 ++-
>  drivers/crypto/virtio/virtio_crypto_common.h  |  4 ++
>  drivers/crypto/virtio/virtio_crypto_core.c    | 52 ++-
>  .../virtio/virtio_crypto_skcipher_algs.c      | 34 ++--
>  4 files changed, 64 insertions(+), 55 deletions(-)
> 

Reviewed-by: Gonglei 

Regards,
-Gonglei 

> diff --git a/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c b/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c
> index 698ea57e2649..382ccec9ab12 100644
> --- a/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c
> +++ b/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c
> @@ -103,7 +103,6 @@ static int virtio_crypto_alg_akcipher_init_session(struct virtio_crypto_akcipher
>  	struct scatterlist outhdr_sg, key_sg, inhdr_sg, *sgs[3];
>  	struct virtio_crypto *vcrypto

[PATCH v5 3/5] virtio-crypto: wait ctrl queue instead of busy polling

2022-05-05 Thread zhenwei pi
Originally, after submitting a request to the virtio crypto control
queue, the guest side polls the virtqueue for the result. This works
as follows:
CPU0   CPU1               ...             CPUx  CPUy
 |      |                                  |     |
 \      \                                  /     /
  \--------spin_lock(&vcrypto->ctrl_lock)-------/
                         |
                virtqueue add & kick
                         |
                busy poll virtqueue
                         |
          spin_unlock(&vcrypto->ctrl_lock)
                        ...

There are two problems:
1. The queue depth is always 1, which limits the performance of a
   virtio crypto device. Multiple user processes share a single control
   queue and contend for its spin lock. Tested on an Intel Platinum
   8260, a single worker gets ~35K/s create/close session operations,
   and 8 workers get ~40K/s operations with 800% CPU utilization.
2. A control request is supposed to be handled immediately, but in
   the current implementation of QEMU (v6.2), the vCPU thread kicks
   another thread to do this work, so the latency is also unstable.
   Tracking the latency of virtio_crypto_alg_akcipher_close_session
   for 5s:
    usecs          : count
       0 -> 1      : 0
       2 -> 3      : 7
       4 -> 7      : 72
       8 -> 15     : 186485
      16 -> 31     : 687
      32 -> 63     : 5
      64 -> 127    : 3
     128 -> 255    : 1
     256 -> 511    : 0
     512 -> 1023   : 0
    1024 -> 2047   : 0
    2048 -> 4095   : 0
    4096 -> 8191   : 0
    8192 -> 16383  : 2
This means that a CPU may hold vcrypto->ctrl_lock for as long as 8192~16383us.

To improve the performance of the control queue, a request on the
control queue now waits for completion instead of busy polling, which
reduces lock contention. The request is completed by the control queue
callback.
CPU0   CPU1               ...             CPUx  CPUy
 |      |                                  |     |
 \      \                                  /     /
  \--------spin_lock(&vcrypto->ctrl_lock)-------/
                         |
                virtqueue add & kick
                         |
  --------spin_unlock(&vcrypto->ctrl_lock)-------
 /      /                                  \     \
 |      |                                  |     |
wait   wait                               wait  wait
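
For readers less familiar with the pattern, the flow in the diagram can be
sketched in userspace with pthreads. This is only an analogue under stated
assumptions, not the driver code: struct ctrl_queue, struct ctrl_request and
every function name below are hypothetical, a linked list stands in for the
virtqueue, and a condition variable stands in for whatever wait primitive the
driver uses. The point it illustrates is that ctrl_lock is held only while a
request is enqueued, and each submitter then sleeps on its own request until
the callback side marks it done.

```c
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one in-flight control request. */
struct ctrl_request {
	pthread_mutex_t lock;
	pthread_cond_t done_cond;
	bool done;
	int status;                     /* filled in by the completing side */
	struct ctrl_request *next;
};

/* Hypothetical stand-in for the control queue and its ctrl_lock. */
struct ctrl_queue {
	pthread_mutex_t ctrl_lock;
	struct ctrl_request *head, *tail;
};

static void ctrl_queue_init(struct ctrl_queue *q)
{
	pthread_mutex_init(&q->ctrl_lock, NULL);
	q->head = q->tail = NULL;
}

/* Hold ctrl_lock only while enqueueing, then sleep until completed. */
static int ctrl_submit_and_wait(struct ctrl_queue *q, struct ctrl_request *req)
{
	pthread_mutex_init(&req->lock, NULL);
	pthread_cond_init(&req->done_cond, NULL);
	req->done = false;
	req->next = NULL;

	pthread_mutex_lock(&q->ctrl_lock);      /* "add & kick" */
	if (q->tail)
		q->tail->next = req;
	else
		q->head = req;
	q->tail = req;
	pthread_mutex_unlock(&q->ctrl_lock);    /* lock dropped before waiting */

	pthread_mutex_lock(&req->lock);
	while (!req->done)
		pthread_cond_wait(&req->done_cond, &req->lock);
	pthread_mutex_unlock(&req->lock);
	return req->status;
}

/* Callback side: take every queued request and wake its waiter. */
static int ctrl_complete_all(struct ctrl_queue *q, int status)
{
	struct ctrl_request *req;
	int n = 0;

	pthread_mutex_lock(&q->ctrl_lock);
	req = q->head;
	q->head = q->tail = NULL;
	pthread_mutex_unlock(&q->ctrl_lock);

	while (req) {
		/* Read next first: req lives on the waiter's stack. */
		struct ctrl_request *next = req->next;

		pthread_mutex_lock(&req->lock);
		req->status = status;
		req->done = true;
		pthread_cond_signal(&req->done_cond);
		pthread_mutex_unlock(&req->lock);
		req = next;
		n++;
	}
	return n;
}

struct worker_arg { struct ctrl_queue *q; int status; };

static void *worker(void *p)
{
	struct worker_arg *a = p;
	struct ctrl_request req;

	a->status = ctrl_submit_and_wait(a->q, &req);
	return NULL;
}

/* Start up to 16 submitters; return how many saw status 0. */
static int run_demo(int nworkers)
{
	struct ctrl_queue q;
	pthread_t tids[16];
	struct worker_arg args[16];
	int i, completed = 0, ok = 0;

	if (nworkers > 16)
		nworkers = 16;
	ctrl_queue_init(&q);
	for (i = 0; i < nworkers; i++) {
		args[i].q = &q;
		args[i].status = -1;
		pthread_create(&tids[i], NULL, worker, &args[i]);
	}
	while (completed < nworkers) {          /* plays the callback role */
		completed += ctrl_complete_all(&q, 0);
		sched_yield();
	}
	for (i = 0; i < nworkers; i++) {
		pthread_join(tids[i], NULL);
		ok += (args[i].status == 0);
	}
	return ok;
}
```

Calling run_demo(4) starts four submitters that all have requests in flight
at once, which is the queue-depth improvement the diagram describes. None of
them spins while holding the shared lock.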

With this patch, the guest side gets ~200K/s operations with 300% CPU
utilization.
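
To put the before and after figures on the same footing, the throughput can
be divided by the CPU time spent. The helper names below are made up for
illustration, and the inputs are the approximate numbers quoted in this
message (~40K/s at 800% CPU for 8 busy-polling workers, ~200K/s at 300% CPU
with the patch), so the result is a rough estimate only.

```c
/*
 * Operations per second delivered per fully busy CPU.
 * cpu_percent follows the convention used in this message:
 * 800 means eight CPUs' worth of time.
 */
static double ops_per_cpu(double ops_per_sec, double cpu_percent)
{
	return ops_per_sec / (cpu_percent / 100.0);
}

/* Ratio of per-CPU throughput after the change to before it. */
static double efficiency_gain(void)
{
	double before = ops_per_cpu(40000.0, 800.0);   /* busy polling */
	double after = ops_per_cpu(200000.0, 300.0);   /* wait for completion */

	return after / before;
}
```

With these inputs, busy polling yields about 5K operations per busy CPU per
second, and the patched path yields roughly 67K: about 5x the total
throughput for well under half the CPU time.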

Cc: Michael S. Tsirkin 
Cc: Jason Wang 
Cc: Gonglei 
Signed-off-by: zhenwei pi 
---
 .../virtio/virtio_crypto_akcipher_algs.c      | 29 ++-
 drivers/crypto/virtio/virtio_crypto_common.h  |  4 ++
 drivers/crypto/virtio/virtio_crypto_core.c    | 52 ++-
 .../virtio/virtio_crypto_skcipher_algs.c      | 34 ++--
 4 files changed, 64 insertions(+), 55 deletions(-)

diff --git a/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c b/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c
index 698ea57e2649..382ccec9ab12 100644
--- a/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c
+++ b/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c
@@ -103,7 +103,6 @@ static int virtio_crypto_alg_akcipher_init_session(struct virtio_crypto_akcipher
 	struct scatterlist outhdr_sg, key_sg, inhdr_sg, *sgs[3];
 	struct virtio_crypto *vcrypto = ctx->vcrypto;
 	uint8_t *pkey;
-	unsigned int inlen;
 	int err;
 	unsigned int num_out = 0, num_in = 0;
 	struct virtio_crypto_op_ctrl_req *ctrl;
@@ -135,18 +134,9 @@ static int virtio_crypto_alg_akcipher_init_session(struct virtio_crypto_akcipher
 	sg_init_one(&inhdr_sg, input, sizeof(*input));
 	sgs[num_out + num_in++] = &inhdr_sg;
 
-	spin_lock(&vcrypto->ctrl_lock);
-	err = virtqueue_add_sgs(vcrypto->ctrl_vq, sgs, num_out, num_in, vcrypto, GFP_ATOMIC);
-	if (err < 0) {
-		spin_unlock(&vcrypto->ctrl_lock);
+	err = virtio_crypto_ctrl_vq_request(vcrypto, sgs, num_out, num_in, vc_ctrl_req);
+	if (err < 0)
 		goto out;
-	}
-
-	virtqueue_kick(vcrypto->ctrl_vq);
-	while (!virtqueue_get_buf(vcrypto->ctrl_vq, &inlen) &&
-	       !virtqueue_is_broken(vcrypto->ctrl_vq))
-		cpu_relax();
-	spin_unlock(&vcrypto->ctrl_lock);
 
 	if (le32_to_cpu(input->status) != VIRTIO_CRYPTO_OK) {