Re: [PATCH] virtio: don't zero out memory region cache for indirect descriptors

2023-08-11 Thread Ilya Maximets
On 8/11/23 15:58, Stefan Hajnoczi wrote:
> 
> 
> On Fri, Aug 11, 2023, 08:50 Ilya Maximets  > wrote:
> 
> On 8/10/23 17:50, Stefan Hajnoczi wrote:
> > On Tue, Aug 08, 2023 at 12:28:47AM +0200, Ilya Maximets wrote:
> >> Lots of virtio functions that are on a hot path in data transmission
> >> are initializing indirect descriptor cache at the point of stack
> >> allocation.  It's a 112 byte structure that is getting zeroed out on
> >> each call adding unnecessary overhead.  It's going to be correctly
> >> initialized later via special init function.  The only reason to
> >> actually initialize right away is the ability to safely destruct it.
> >> However, we only need to destruct it when it was used, i.e. when a
> >> desc_cache points to it.
> >>
> >> Removing these unnecessary stack initializations improves throughput
> >> of virtio-net devices in terms of 64B packets per second by 6-14 %
> >> depending on the case.  Tested with a proposed af-xdp network backend
> >> and a dpdk testpmd application in the guest, but should be beneficial
> >> for other virtio devices as well.
> >>
> >> Signed-off-by: Ilya Maximets  >
> >> ---
> >>  hw/virtio/virtio.c | 42 +++---
> >>  1 file changed, 27 insertions(+), 15 deletions(-)
> >
> > Another option is to create an address_space_cache_init_invalid()
> > function that only assigns mrs.mr  = NULL instead of 
> touching all bytes
> > of the struct like = MEMORY_REGION_CACHE_INVALID. There would be less
> > code and the existing mrs.mr  check in 
> address_space_cache_destroy()
> > would serve the same function as the desc_cache == &indirect_desc_cache
> > check added by this patch.
> 
> It does look simpler this way, indeed.  Though I'm not sure about
> a function name.  We have address_space_cache_invalidate() that
> does a completely different thing and the invalidated cache can
> still be used, while the cache initialized with the newly proposed
> address_space_cache_init_invalid() can not be safely used.
> 
> I suppose, the problem is not new, since the macro was named similarly,
> but making it a function seems to make the issue worse.
> 
> Maybe address_space_cache_init_empty() will be a better name?
> E.g.:
> 
> **
>  * address_space_cache_init_empty: Initialize empty #MemoryRegionCache
>  *
>  * @cache: The #MemoryRegionCache to operate on.
>  *
>  * Initializes #MemoryRegionCache structure without memory region 
> attached.
>  * Cache initialized this way can only be safely destroyed, but not used.
>  */
> static inline void address_space_cache_init_empty(MemoryRegionCache 
> *cache)
> {
>     cache->mrs.mr = NULL;
> }
> 
> What do you think?
> 
> 
> init_empty() is good.

I'll use it then.  Will send a v2 shortly.

Thanks!

> 
> Stefan
> 




Re: [PATCH] virtio: don't zero out memory region cache for indirect descriptors

2023-08-11 Thread Stefan Hajnoczi
On Fri, Aug 11, 2023, 08:50 Ilya Maximets  wrote:

> On 8/10/23 17:50, Stefan Hajnoczi wrote:
> > On Tue, Aug 08, 2023 at 12:28:47AM +0200, Ilya Maximets wrote:
> >> Lots of virtio functions that are on a hot path in data transmission
> >> are initializing indirect descriptor cache at the point of stack
> >> allocation.  It's a 112 byte structure that is getting zeroed out on
> >> each call adding unnecessary overhead.  It's going to be correctly
> >> initialized later via special init function.  The only reason to
> >> actually initialize right away is the ability to safely destruct it.
> >> However, we only need to destruct it when it was used, i.e. when a
> >> desc_cache points to it.
> >>
> >> Removing these unnecessary stack initializations improves throughput
> >> of virtio-net devices in terms of 64B packets per second by 6-14 %
> >> depending on the case.  Tested with a proposed af-xdp network backend
> >> and a dpdk testpmd application in the guest, but should be beneficial
> >> for other virtio devices as well.
> >>
> >> Signed-off-by: Ilya Maximets 
> >> ---
> >>  hw/virtio/virtio.c | 42 +++---
> >>  1 file changed, 27 insertions(+), 15 deletions(-)
> >
> > Another option is to create an address_space_cache_init_invalid()
> > function that only assigns mrs.mr = NULL instead of touching all bytes
> > of the struct like = MEMORY_REGION_CACHE_INVALID. There would be less
> > code and the existing mrs.mr check in address_space_cache_destroy()
> > would serve the same function as the desc_cache == &indirect_desc_cache
> > check added by this patch.
>
> It does look simpler this way, indeed.  Though I'm not sure about
> a function name.  We have address_space_cache_invalidate() that
> does a completely different thing and the invalidated cache can
> still be used, while the cache initialized with the newly proposed
> address_space_cache_init_invalid() can not be safely used.
>
> I suppose, the problem is not new, since the macro was named similarly,
> but making it a function seems to make the issue worse.
>
> Maybe address_space_cache_init_empty() will be a better name?
> E.g.:
>
> **
>  * address_space_cache_init_empty: Initialize empty #MemoryRegionCache
>  *
>  * @cache: The #MemoryRegionCache to operate on.
>  *
>  * Initializes #MemoryRegionCache structure without memory region attached.
>  * Cache initialized this way can only be safely destroyed, but not used.
>  */
> static inline void address_space_cache_init_empty(MemoryRegionCache *cache)
> {
> cache->mrs.mr = NULL;
> }
>
> What do you think?
>

init_empty() is good.

Stefan

>


Re: [PATCH] virtio: don't zero out memory region cache for indirect descriptors

2023-08-11 Thread Ilya Maximets
On 8/10/23 17:50, Stefan Hajnoczi wrote:
> On Tue, Aug 08, 2023 at 12:28:47AM +0200, Ilya Maximets wrote:
>> Lots of virtio functions that are on a hot path in data transmission
>> are initializing indirect descriptor cache at the point of stack
>> allocation.  It's a 112 byte structure that is getting zeroed out on
>> each call adding unnecessary overhead.  It's going to be correctly
>> initialized later via special init function.  The only reason to
>> actually initialize right away is the ability to safely destruct it.
>> However, we only need to destruct it when it was used, i.e. when a
>> desc_cache points to it.
>>
>> Removing these unnecessary stack initializations improves throughput
>> of virtio-net devices in terms of 64B packets per second by 6-14 %
>> depending on the case.  Tested with a proposed af-xdp network backend
>> and a dpdk testpmd application in the guest, but should be beneficial
>> for other virtio devices as well.
>>
>> Signed-off-by: Ilya Maximets 
>> ---
>>  hw/virtio/virtio.c | 42 +++---
>>  1 file changed, 27 insertions(+), 15 deletions(-)
> 
> Another option is to create an address_space_cache_init_invalid()
> function that only assigns mrs.mr = NULL instead of touching all bytes
> of the struct like = MEMORY_REGION_CACHE_INVALID. There would be less
> code and the existing mrs.mr check in address_space_cache_destroy()
> would serve the same function as the desc_cache == &indirect_desc_cache
> check added by this patch.

It does look simpler this way, indeed.  Though I'm not sure about
a function name.  We have address_space_cache_invalidate() that
does a completely different thing and the invalidated cache can
still be used, while the cache initialized with the newly proposed
address_space_cache_init_invalid() can not be safely used.

I suppose, the problem is not new, since the macro was named similarly,
but making it a function seems to make the issue worse.

Maybe address_space_cache_init_empty() will be a better name?
E.g.:

**
 * address_space_cache_init_empty: Initialize empty #MemoryRegionCache
 *
 * @cache: The #MemoryRegionCache to operate on.
 *
 * Initializes #MemoryRegionCache structure without memory region attached.
 * Cache initialized this way can only be safely destroyed, but not used.
 */
static inline void address_space_cache_init_empty(MemoryRegionCache *cache)
{
cache->mrs.mr = NULL;
}

What do you think?

Best regards, Ilya Maximets.



Re: [PATCH] virtio: don't zero out memory region cache for indirect descriptors

2023-08-11 Thread Ilya Maximets
On 8/9/23 04:37, Jason Wang wrote:
> On Tue, Aug 8, 2023 at 6:28 AM Ilya Maximets  wrote:
>>
>> Lots of virtio functions that are on a hot path in data transmission
>> are initializing indirect descriptor cache at the point of stack
>> allocation.  It's a 112 byte structure that is getting zeroed out on
>> each call adding unnecessary overhead.  It's going to be correctly
>> initialized later via special init function.  The only reason to
>> actually initialize right away is the ability to safely destruct it.
>> However, we only need to destruct it when it was used, i.e. when a
>> desc_cache points to it.
>>
>> Removing these unnecessary stack initializations improves throughput
>> of virtio-net devices in terms of 64B packets per second by 6-14 %
>> depending on the case.  Tested with a proposed af-xdp network backend
>> and a dpdk testpmd application in the guest, but should be beneficial
>> for other virtio devices as well.
>>
>> Signed-off-by: Ilya Maximets 
> 
> Acked-by: Jason Wang 
> 
> Btw, we can probably remove MEMORY_REGION_CACHE_INVALID.

Good point.  I can include that in the patch.  Or just replace it
with a function, as Stefan suggested.

Best regards, Ilya Maximets.



Re: [PATCH] virtio: don't zero out memory region cache for indirect descriptors

2023-08-10 Thread Stefan Hajnoczi
On Tue, Aug 08, 2023 at 12:28:47AM +0200, Ilya Maximets wrote:
> Lots of virtio functions that are on a hot path in data transmission
> are initializing indirect descriptor cache at the point of stack
> allocation.  It's a 112 byte structure that is getting zeroed out on
> each call adding unnecessary overhead.  It's going to be correctly
> initialized later via special init function.  The only reason to
> actually initialize right away is the ability to safely destruct it.
> However, we only need to destruct it when it was used, i.e. when a
> desc_cache points to it.
> 
> Removing these unnecessary stack initializations improves throughput
> of virtio-net devices in terms of 64B packets per second by 6-14 %
> depending on the case.  Tested with a proposed af-xdp network backend
> and a dpdk testpmd application in the guest, but should be beneficial
> for other virtio devices as well.
> 
> Signed-off-by: Ilya Maximets 
> ---
>  hw/virtio/virtio.c | 42 +++---
>  1 file changed, 27 insertions(+), 15 deletions(-)

Another option is to create an address_space_cache_init_invalid()
function that only assigns mrs.mr = NULL instead of touching all bytes
of the struct like = MEMORY_REGION_CACHE_INVALID. There would be less
code and the existing mrs.mr check in address_space_cache_destroy()
would serve the same function as the desc_cache == &indirect_desc_cache
check added by this patch.

I'm fine with your approach too:

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [PATCH] virtio: don't zero out memory region cache for indirect descriptors

2023-08-08 Thread Jason Wang
On Tue, Aug 8, 2023 at 6:28 AM Ilya Maximets  wrote:
>
> Lots of virtio functions that are on a hot path in data transmission
> are initializing indirect descriptor cache at the point of stack
> allocation.  It's a 112 byte structure that is getting zeroed out on
> each call adding unnecessary overhead.  It's going to be correctly
> initialized later via special init function.  The only reason to
> actually initialize right away is the ability to safely destruct it.
> However, we only need to destruct it when it was used, i.e. when a
> desc_cache points to it.
>
> Removing these unnecessary stack initializations improves throughput
> of virtio-net devices in terms of 64B packets per second by 6-14 %
> depending on the case.  Tested with a proposed af-xdp network backend
> and a dpdk testpmd application in the guest, but should be beneficial
> for other virtio devices as well.
>
> Signed-off-by: Ilya Maximets 

Acked-by: Jason Wang 

Btw, we can probably remove MEMORY_REGION_CACHE_INVALID.

Thanks

> ---
>  hw/virtio/virtio.c | 42 +++---
>  1 file changed, 27 insertions(+), 15 deletions(-)
>
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 309038fd46..a65396e616 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -1071,7 +1071,8 @@ static void virtqueue_split_get_avail_bytes(VirtQueue 
> *vq,
>  VirtIODevice *vdev = vq->vdev;
>  unsigned int idx;
>  unsigned int total_bufs, in_total, out_total;
> -MemoryRegionCache indirect_desc_cache = MEMORY_REGION_CACHE_INVALID;
> +MemoryRegionCache indirect_desc_cache;
> +MemoryRegionCache *desc_cache = NULL;
>  int64_t len = 0;
>  int rc;
>
> @@ -1079,7 +1080,6 @@ static void virtqueue_split_get_avail_bytes(VirtQueue 
> *vq,
>  total_bufs = in_total = out_total = 0;
>
>  while ((rc = virtqueue_num_heads(vq, idx)) > 0) {
> -MemoryRegionCache *desc_cache = &caches->desc;
>  unsigned int num_bufs;
>  VRingDesc desc;
>  unsigned int i;
> @@ -1091,6 +1091,8 @@ static void virtqueue_split_get_avail_bytes(VirtQueue 
> *vq,
>  goto err;
>  }
>
> +desc_cache = &caches->desc;
> +
>  vring_split_desc_read(vdev, &desc, desc_cache, i);
>
>  if (desc.flags & VRING_DESC_F_INDIRECT) {
> @@ -1156,7 +1158,9 @@ static void virtqueue_split_get_avail_bytes(VirtQueue 
> *vq,
>  }
>
>  done:
> -address_space_cache_destroy(&indirect_desc_cache);
> +if (desc_cache == &indirect_desc_cache) {
> +address_space_cache_destroy(&indirect_desc_cache);
> +}
>  if (in_bytes) {
>  *in_bytes = in_total;
>  }
> @@ -1207,8 +1211,8 @@ static void virtqueue_packed_get_avail_bytes(VirtQueue 
> *vq,
>  VirtIODevice *vdev = vq->vdev;
>  unsigned int idx;
>  unsigned int total_bufs, in_total, out_total;
> -MemoryRegionCache *desc_cache;
> -MemoryRegionCache indirect_desc_cache = MEMORY_REGION_CACHE_INVALID;
> +MemoryRegionCache indirect_desc_cache;
> +MemoryRegionCache *desc_cache = NULL;
>  int64_t len = 0;
>  VRingPackedDesc desc;
>  bool wrap_counter;
> @@ -1297,7 +1301,9 @@ static void virtqueue_packed_get_avail_bytes(VirtQueue 
> *vq,
>  vq->shadow_avail_idx = idx;
>  vq->shadow_avail_wrap_counter = wrap_counter;
>  done:
> -address_space_cache_destroy(&indirect_desc_cache);
> +if (desc_cache == &indirect_desc_cache) {
> +address_space_cache_destroy(&indirect_desc_cache);
> +}
>  if (in_bytes) {
>  *in_bytes = in_total;
>  }
> @@ -1487,8 +1493,8 @@ static void *virtqueue_split_pop(VirtQueue *vq, size_t 
> sz)
>  {
>  unsigned int i, head, max;
>  VRingMemoryRegionCaches *caches;
> -MemoryRegionCache indirect_desc_cache = MEMORY_REGION_CACHE_INVALID;
> -MemoryRegionCache *desc_cache;
> +MemoryRegionCache indirect_desc_cache;
> +MemoryRegionCache *desc_cache = NULL;
>  int64_t len;
>  VirtIODevice *vdev = vq->vdev;
>  VirtQueueElement *elem = NULL;
> @@ -1611,7 +1617,9 @@ static void *virtqueue_split_pop(VirtQueue *vq, size_t 
> sz)
>
>  trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
>  done:
> -address_space_cache_destroy(&indirect_desc_cache);
> +if (desc_cache == &indirect_desc_cache) {
> +address_space_cache_destroy(&indirect_desc_cache);
> +}
>
>  return elem;
>
> @@ -1624,8 +1632,8 @@ static void *virtqueue_packed_pop(VirtQueue *vq, size_t 
> sz)
>  {
>  unsigned int i, max;
>  VRingMemoryRegionCaches *caches;
> -MemoryRegionCache indirect_desc_cache = MEMORY_REGION_CACHE_INVALID;
> -MemoryRegionCache *desc_cache;
> +MemoryRegionCache indirect_desc_cache;
> +MemoryRegionCache *desc_cache = NULL;
>  int64_t len;
>  VirtIODevice *vdev = vq->vdev;
>  VirtQueueElement *elem = NULL;
> @@ -1746,7 +1754,9 @@ static void *virtqueue_packed_pop(VirtQueue *vq, size_t 
> sz)
>
>  trace_virtqueue_pop(vq,

[PATCH] virtio: don't zero out memory region cache for indirect descriptors

2023-08-07 Thread Ilya Maximets
Lots of virtio functions that are on a hot path in data transmission
are initializing indirect descriptor cache at the point of stack
allocation.  It's a 112 byte structure that is getting zeroed out on
each call adding unnecessary overhead.  It's going to be correctly
initialized later via special init function.  The only reason to
actually initialize right away is the ability to safely destruct it.
However, we only need to destruct it when it was used, i.e. when a
desc_cache points to it.

Removing these unnecessary stack initializations improves throughput
of virtio-net devices in terms of 64B packets per second by 6-14 %
depending on the case.  Tested with a proposed af-xdp network backend
and a dpdk testpmd application in the guest, but should be beneficial
for other virtio devices as well.

Signed-off-by: Ilya Maximets 
---
 hw/virtio/virtio.c | 42 +++---
 1 file changed, 27 insertions(+), 15 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 309038fd46..a65396e616 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -1071,7 +1071,8 @@ static void virtqueue_split_get_avail_bytes(VirtQueue *vq,
 VirtIODevice *vdev = vq->vdev;
 unsigned int idx;
 unsigned int total_bufs, in_total, out_total;
-MemoryRegionCache indirect_desc_cache = MEMORY_REGION_CACHE_INVALID;
+MemoryRegionCache indirect_desc_cache;
+MemoryRegionCache *desc_cache = NULL;
 int64_t len = 0;
 int rc;
 
@@ -1079,7 +1080,6 @@ static void virtqueue_split_get_avail_bytes(VirtQueue *vq,
 total_bufs = in_total = out_total = 0;
 
 while ((rc = virtqueue_num_heads(vq, idx)) > 0) {
-MemoryRegionCache *desc_cache = &caches->desc;
 unsigned int num_bufs;
 VRingDesc desc;
 unsigned int i;
@@ -1091,6 +1091,8 @@ static void virtqueue_split_get_avail_bytes(VirtQueue *vq,
 goto err;
 }
 
+desc_cache = &caches->desc;
+
 vring_split_desc_read(vdev, &desc, desc_cache, i);
 
 if (desc.flags & VRING_DESC_F_INDIRECT) {
@@ -1156,7 +1158,9 @@ static void virtqueue_split_get_avail_bytes(VirtQueue *vq,
 }
 
 done:
-address_space_cache_destroy(&indirect_desc_cache);
+if (desc_cache == &indirect_desc_cache) {
+address_space_cache_destroy(&indirect_desc_cache);
+}
 if (in_bytes) {
 *in_bytes = in_total;
 }
@@ -1207,8 +1211,8 @@ static void virtqueue_packed_get_avail_bytes(VirtQueue 
*vq,
 VirtIODevice *vdev = vq->vdev;
 unsigned int idx;
 unsigned int total_bufs, in_total, out_total;
-MemoryRegionCache *desc_cache;
-MemoryRegionCache indirect_desc_cache = MEMORY_REGION_CACHE_INVALID;
+MemoryRegionCache indirect_desc_cache;
+MemoryRegionCache *desc_cache = NULL;
 int64_t len = 0;
 VRingPackedDesc desc;
 bool wrap_counter;
@@ -1297,7 +1301,9 @@ static void virtqueue_packed_get_avail_bytes(VirtQueue 
*vq,
 vq->shadow_avail_idx = idx;
 vq->shadow_avail_wrap_counter = wrap_counter;
 done:
-address_space_cache_destroy(&indirect_desc_cache);
+if (desc_cache == &indirect_desc_cache) {
+address_space_cache_destroy(&indirect_desc_cache);
+}
 if (in_bytes) {
 *in_bytes = in_total;
 }
@@ -1487,8 +1493,8 @@ static void *virtqueue_split_pop(VirtQueue *vq, size_t sz)
 {
 unsigned int i, head, max;
 VRingMemoryRegionCaches *caches;
-MemoryRegionCache indirect_desc_cache = MEMORY_REGION_CACHE_INVALID;
-MemoryRegionCache *desc_cache;
+MemoryRegionCache indirect_desc_cache;
+MemoryRegionCache *desc_cache = NULL;
 int64_t len;
 VirtIODevice *vdev = vq->vdev;
 VirtQueueElement *elem = NULL;
@@ -1611,7 +1617,9 @@ static void *virtqueue_split_pop(VirtQueue *vq, size_t sz)
 
 trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
 done:
-address_space_cache_destroy(&indirect_desc_cache);
+if (desc_cache == &indirect_desc_cache) {
+address_space_cache_destroy(&indirect_desc_cache);
+}
 
 return elem;
 
@@ -1624,8 +1632,8 @@ static void *virtqueue_packed_pop(VirtQueue *vq, size_t 
sz)
 {
 unsigned int i, max;
 VRingMemoryRegionCaches *caches;
-MemoryRegionCache indirect_desc_cache = MEMORY_REGION_CACHE_INVALID;
-MemoryRegionCache *desc_cache;
+MemoryRegionCache indirect_desc_cache;
+MemoryRegionCache *desc_cache = NULL;
 int64_t len;
 VirtIODevice *vdev = vq->vdev;
 VirtQueueElement *elem = NULL;
@@ -1746,7 +1754,9 @@ static void *virtqueue_packed_pop(VirtQueue *vq, size_t 
sz)
 
 trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
 done:
-address_space_cache_destroy(&indirect_desc_cache);
+if (desc_cache == &indirect_desc_cache) {
+address_space_cache_destroy(&indirect_desc_cache);
+}
 
 return elem;
 
@@ -3935,8 +3945,8 @@ VirtioQueueElement 
*qmp_x_query_virtio_queue_element(const char *path,
 } else {
 unsigned int head, i, max;
 V