Re: [PATCH 22/23] i2c: virtio: Remove #ifdef guards for PM related functions

2023-07-05 Thread Viresh Kumar
On 05-07-23, 22:45, Paul Cercueil wrote:
> Use the new PM macros so that the suspend and resume functions are
> automatically dropped by the compiler when CONFIG_PM or
> CONFIG_PM_SLEEP is disabled, without having to use #ifdef guards.
> 
> This has the advantage of always compiling these functions in,
> independently of any Kconfig option. Thanks to that, bugs and other
> regressions are subsequently easier to catch.
> 
> Signed-off-by: Paul Cercueil 
> 
> ---
> Cc: Conghui Chen 
> Cc: Viresh Kumar 
> Cc: virtualization@lists.linux-foundation.org
> ---
>  drivers/i2c/busses/i2c-virtio.c | 8 ++--
>  1 file changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/i2c/busses/i2c-virtio.c b/drivers/i2c/busses/i2c-virtio.c
> index 4b9536f50800..c60ae531ba57 100644
> --- a/drivers/i2c/busses/i2c-virtio.c
> +++ b/drivers/i2c/busses/i2c-virtio.c
> @@ -243,7 +243,6 @@ static struct virtio_device_id id_table[] = {
>  };
>  MODULE_DEVICE_TABLE(virtio, id_table);
>  
> -#ifdef CONFIG_PM_SLEEP
>  static int virtio_i2c_freeze(struct virtio_device *vdev)
>  {
>   virtio_i2c_del_vqs(vdev);
> @@ -254,7 +253,6 @@ static int virtio_i2c_restore(struct virtio_device *vdev)
>  {
>   return virtio_i2c_setup_vqs(vdev->priv);
>  }
> -#endif
>  
>  static const unsigned int features[] = {
>   VIRTIO_I2C_F_ZERO_LENGTH_REQUEST,
> @@ -269,10 +267,8 @@ static struct virtio_driver virtio_i2c_driver = {
>   .driver = {
>   .name   = "i2c_virtio",
>   },
> -#ifdef CONFIG_PM_SLEEP
> - .freeze = virtio_i2c_freeze,
> - .restore = virtio_i2c_restore,
> -#endif
> + .freeze = pm_sleep_ptr(virtio_i2c_freeze),
> + .restore = pm_sleep_ptr(virtio_i2c_restore),
>  };
>  module_virtio_driver(virtio_i2c_driver);
>  

Acked-by: Viresh Kumar 

-- 
viresh
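
For reference, pm_sleep_ptr() boils down to a compile-time conditional
pointer. A simplified sketch of the helpers involved (modeled on
<linux/pm.h>; exact definitions may differ between kernel versions):

    /* PTR_IF() folds to NULL when cond is false at compile time. */
    #define PTR_IF(cond, ptr)   ((cond) ? (ptr) : NULL)

    /* With CONFIG_PM_SLEEP=n this evaluates to NULL: the callback is
     * still compiled and type-checked, but ends up unreferenced and is
     * discarded by the optimizer, which is why bugs in it still get
     * caught at build time.
     */
    #define pm_sleep_ptr(_ptr)  PTR_IF(IS_ENABLED(CONFIG_PM_SLEEP), (_ptr))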


Re: [PATCH v3 3/3] vduse: Temporarily disable control queue features

2023-07-05 Thread Jason Wang
On Wed, Jul 5, 2023 at 6:04 PM Maxime Coquelin wrote:
>
> The Virtio-net driver's control queue implementation is not safe
> when used with VDUSE. If the VDUSE application does not
> reply to control queue messages, it currently ends up
> hanging the kernel thread sending the command.
>
> Some work is on-going to make the control queue
> implementation robust with VDUSE. Until it is completed,
> let's filter out the control virtqueue and the features that depend
> on it by keeping only features known to be supported.
>
> Signed-off-by: Maxime Coquelin 

Acked-by: Jason Wang 

Thanks

> ---
>  drivers/vdpa/vdpa_user/vduse_dev.c | 36 ++
>  1 file changed, 36 insertions(+)
>
> diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> index 1271c9796517..7345071db0a8 100644
> --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> @@ -46,6 +46,30 @@
>
>  #define IRQ_UNBOUND -1
>
> +#define VDUSE_NET_VALID_FEATURES_MASK   \
> +   (BIT_ULL(VIRTIO_NET_F_CSUM) |   \
> +BIT_ULL(VIRTIO_NET_F_GUEST_CSUM) | \
> +BIT_ULL(VIRTIO_NET_F_MTU) |\
> +BIT_ULL(VIRTIO_NET_F_MAC) |\
> +BIT_ULL(VIRTIO_NET_F_GUEST_TSO4) | \
> +BIT_ULL(VIRTIO_NET_F_GUEST_TSO6) | \
> +BIT_ULL(VIRTIO_NET_F_GUEST_ECN) |  \
> +BIT_ULL(VIRTIO_NET_F_GUEST_UFO) |  \
> +BIT_ULL(VIRTIO_NET_F_HOST_TSO4) |  \
> +BIT_ULL(VIRTIO_NET_F_HOST_TSO6) |  \
> +BIT_ULL(VIRTIO_NET_F_HOST_ECN) |   \
> +BIT_ULL(VIRTIO_NET_F_HOST_UFO) |   \
> +BIT_ULL(VIRTIO_NET_F_MRG_RXBUF) |  \
> +BIT_ULL(VIRTIO_NET_F_STATUS) | \
> +BIT_ULL(VIRTIO_NET_F_HOST_USO) |   \
> +BIT_ULL(VIRTIO_F_ANY_LAYOUT) | \
> +BIT_ULL(VIRTIO_RING_F_INDIRECT_DESC) | \
> +BIT_ULL(VIRTIO_F_EVENT_IDX) |  \
> +BIT_ULL(VIRTIO_F_VERSION_1) |  \
> +BIT_ULL(VIRTIO_F_IOMMU_PLATFORM) | \
> +BIT_ULL(VIRTIO_F_RING_PACKED) |\
> +BIT_ULL(VIRTIO_F_IN_ORDER))
> +
>  struct vduse_virtqueue {
> u16 index;
> u16 num_max;
> @@ -1778,6 +1802,16 @@ static struct attribute *vduse_dev_attrs[] = {
>
>  ATTRIBUTE_GROUPS(vduse_dev);
>
> +static void vduse_dev_features_filter(struct vduse_dev_config *config)
> +{
> +   /*
> +* Temporarily filter out virtio-net's control virtqueue and features
> +* that depend on it while CVQ is being made more robust for VDUSE.
> +*/
> +   if (config->device_id == VIRTIO_ID_NET)
> +   config->features &= VDUSE_NET_VALID_FEATURES_MASK;
> +}
> +
>  static int vduse_create_dev(struct vduse_dev_config *config,
> void *config_buf, u64 api_version)
>  {
> @@ -1793,6 +1827,8 @@ static int vduse_create_dev(struct vduse_dev_config *config,
> if (!dev)
> goto err;
>
> +   vduse_dev_features_filter(config);
> +
> dev->api_version = api_version;
> dev->device_features = config->features;
> dev->device_id = config->device_id;
> --
> 2.41.0
>
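
The effect of the filter is plain bitwise masking; a standalone sketch
(feature bit values from <uapi/linux/virtio_net.h>; the two-bit mask here
is illustrative, not the full allow list):

    #include <stdio.h>
    #include <stdint.h>

    #define BIT_ULL(n)           (1ULL << (n))
    #define VIRTIO_NET_F_MAC     5   /* in the allow list */
    #define VIRTIO_NET_F_CTRL_VQ 17  /* not in the allow list */

    int main(void)
    {
        uint64_t features = BIT_ULL(VIRTIO_NET_F_MAC) |
                            BIT_ULL(VIRTIO_NET_F_CTRL_VQ);
        uint64_t allow_mask = BIT_ULL(VIRTIO_NET_F_MAC); /* stand-in */

        /* Same operation as vduse_dev_features_filter() above. */
        features &= allow_mask;
        printf("CTRL_VQ %s\n",
               (features & BIT_ULL(VIRTIO_NET_F_CTRL_VQ)) ?
               "kept" : "filtered out");  /* prints "filtered out" */
        return 0;
    }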


Re: [PATCH] vdpa: reject F_ENABLE_AFTER_DRIVER_OK if backend does not support it

2023-07-05 Thread Jason Wang
On Wed, Jul 5, 2023 at 4:42 PM Michael S. Tsirkin  wrote:
>
> On Wed, Jul 05, 2023 at 03:55:23PM +0800, Jason Wang wrote:
> > On Tue, Jul 4, 2023 at 6:38 PM Michael S. Tsirkin  wrote:
> > >
> > > On Tue, Jul 04, 2023 at 12:25:32PM +0200, Eugenio Perez Martin wrote:
> > > > On Mon, Jul 3, 2023 at 4:52 PM Michael S. Tsirkin  
> > > > wrote:
> > > > >
> > > > > On Mon, Jul 03, 2023 at 04:22:18PM +0200, Eugenio Pérez wrote:
> > > > > > With the current code it is accepted as long as userland sends it.
> > > > > >
> > > > > > Although userland should not set a feature flag that has not been
> > > > > > offered to it with VHOST_GET_BACKEND_FEATURES, the current code
> > > > > > will not complain about it.
> > > > > >
> > > > > > Since there is no specific reason for any parent to reject that 
> > > > > > backend
> > > > > > feature bit when it has been proposed, let's control it at vdpa 
> > > > > > frontend
> > > > > > level. Future patches may move this control to the parent driver.
> > > > > >
> > > > > > Fixes: 967800d2d52e ("vdpa: accept 
> > > > > > VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK backend feature")
> > > > > > Signed-off-by: Eugenio Pérez 
> > > > >
> > > > > Please do send v3. And again, I don't want to send "after driver ok" 
> > > > > hack
> > > > > upstream at all, I merged it in next just to give it some testing.
> > > > > We want RING_ACCESS_AFTER_KICK or some such.
> > > > >
> > > >
> > > > Current devices do not support that semantic.
> > >
> > > Which devices specifically access the ring after DRIVER_OK but before
> > > a kick?
> >
> > Vhost-net is one example at last. It polls a socket as well, so it
> > starts to access the ring immediately after DRIVER_OK.
> >
> > Thanks
>
>
> For sure but that is not vdpa.

Somehow, yes: via vp_vdpa, which is what I usually use for testing vDPA patches.

The problem is that it's very hard to audit all vDPA devices if the
behaviour is not defined in the spec before such devices are invented.

Thanks

>
> > >
> > > > My plan was to convert
> > > > it in vp_vdpa if needed, and reuse the current vdpa ops. Sorry if I
> > > > was not explicit enough.
> > > >
> > > > The only solution I can see to that is to trap & emulate in the vdpa
> > > > (parent?) driver, as talked in virtio-comment. But that complicates
> > > > the architecture:
> > > > * Offer VHOST_BACKEND_F_RING_ACCESS_AFTER_KICK
> > > > * Store vq enable state separately, at
> > > > vdpa->config->set_vq_ready(true), but not transmit that enable to hw
> > > > * Store the doorbell state separately, but do not configure it to the
> > > > device directly.
> > > >
> > > > But how to recover if the device cannot configure them at kick time,
> > > > for example?
> > > >
> > > > Maybe we can just fail if the parent driver does not support enabling
> > > > the vq after DRIVER_OK? That way no new feature flag is needed.
> > > >
> > > > Thanks!
> > > >
> > > > >
> > > > > > ---
> > > > > > Sent with Fixes: tag pointing to 
> > > > > > git.kernel.org/pub/scm/linux/kernel/git/mst
> > > > > > commit. Please let me know if I should send a v3 of [1] instead.
> > > > > >
> > > > > > [1] 
> > > > > > https://lore.kernel.org/lkml/20230609121244-mutt-send-email-...@kernel.org/T/
> > > > > > ---
> > > > > >  drivers/vhost/vdpa.c | 7 +--
> > > > > >  1 file changed, 5 insertions(+), 2 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> > > > > > index e1abf29fed5b..a7e554352351 100644
> > > > > > --- a/drivers/vhost/vdpa.c
> > > > > > +++ b/drivers/vhost/vdpa.c
> > > > > > @@ -681,18 +681,21 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
> > > > > >  {
> > > > > >   struct vhost_vdpa *v = filep->private_data;
> > > > > >   struct vhost_dev *d = &v->vdev;
> > > > > > + const struct vdpa_config_ops *ops = v->vdpa->config;
> > > > > >   void __user *argp = (void __user *)arg;
> > > > > >   u64 __user *featurep = argp;
> > > > > > - u64 features;
> > > > > > + u64 features, parent_features = 0;
> > > > > >   long r = 0;
> > > > > >
> > > > > >   if (cmd == VHOST_SET_BACKEND_FEATURES) {
> > > > > >   if (copy_from_user(&features, featurep, sizeof(features)))
> > > > > >   return -EFAULT;
> > > > > > + if (ops->get_backend_features)
> > > > > > + parent_features = ops->get_backend_features(v->vdpa);
> > > > > >   if (features & ~(VHOST_VDPA_BACKEND_FEATURES |
> > > > > >BIT_ULL(VHOST_BACKEND_F_SUSPEND) |
> > > > > >BIT_ULL(VHOST_BACKEND_F_RESUME) |
> > > > > > -  BIT_ULL(VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK)))
> > > > > > +  parent_features))
> > > > > >   return -EOPNOTSUPP;
> > > > > >   if ((features & BIT_ULL(VHOST_BACKEND_F_SUSPEND)) &&
> > > > > >

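With the frontend check above in place, a parent driver that really can
start queues after DRIVER_OK opts in by implementing the new op. A
hypothetical sketch, modeled on the mlx5 patch discussed elsewhere in
this digest (the driver name is a placeholder):

    static u64 foo_vdpa_get_backend_features(const struct vdpa_device *vdpa)
    {
            /* Advertise only what the hardware actually supports. */
            return BIT_ULL(VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK);
    }

    static const struct vdpa_config_ops foo_vdpa_ops = {
            /* ... other ops ... */
            .get_backend_features = foo_vdpa_get_backend_features,
    };
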
Re: [PATCH] vdpa: reject F_ENABLE_AFTER_DRIVER_OK if backend does not support it

2023-07-05 Thread Jason Wang
On Wed, Jul 5, 2023 at 4:41 PM Michael S. Tsirkin  wrote:
>
> On Wed, Jul 05, 2023 at 03:49:58PM +0800, Jason Wang wrote:
> > On Tue, Jul 4, 2023 at 11:45 PM Michael S. Tsirkin  wrote:
> > >
> > > On Tue, Jul 04, 2023 at 01:36:11PM +0200, Eugenio Perez Martin wrote:
> > > > On Tue, Jul 4, 2023 at 12:38 PM Michael S. Tsirkin  
> > > > wrote:
> > > > >
> > > > > On Tue, Jul 04, 2023 at 12:25:32PM +0200, Eugenio Perez Martin wrote:
> > > > > > On Mon, Jul 3, 2023 at 4:52 PM Michael S. Tsirkin  
> > > > > > wrote:
> > > > > > >
> > > > > > > On Mon, Jul 03, 2023 at 04:22:18PM +0200, Eugenio Pérez wrote:
> > > > > > > > With the current code it is accepted as long as userland sends it.
> > > > > > > >
> > > > > > > > Although userland should not set a feature flag that has not been
> > > > > > > > offered to it with VHOST_GET_BACKEND_FEATURES, the current code
> > > > > > > > will not complain about it.
> > > > > > > >
> > > > > > > > Since there is no specific reason for any parent to reject that 
> > > > > > > > backend
> > > > > > > > feature bit when it has been proposed, let's control it at vdpa 
> > > > > > > > frontend
> > > > > > > > level. Future patches may move this control to the parent 
> > > > > > > > driver.
> > > > > > > >
> > > > > > > > Fixes: 967800d2d52e ("vdpa: accept 
> > > > > > > > VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK backend feature")
> > > > > > > > Signed-off-by: Eugenio Pérez 
> > > > > > >
> > > > > > > Please do send v3. And again, I don't want to send "after driver 
> > > > > > > ok" hack
> > > > > > > upstream at all, I merged it in next just to give it some testing.
> > > > > > > We want RING_ACCESS_AFTER_KICK or some such.
> > > > > > >
> > > > > >
> > > > > > Current devices do not support that semantic.
> > > > >
> > > > > Which devices specifically access the ring after DRIVER_OK but before
> > > > > a kick?
> > > > >
> > > >
> > > > Previous versions of the QEMU LM series did a spurious kick to start
> > > > traffic at the LM destination [1]. When it was proposed, that spurious
> > > > kick was removed from the series because to check for descriptors
> > > > after driver_ok, even without a kick, was considered work of the
> > > > parent driver.
> > > >
> > > > I'm ok to go back to this spurious kick, but I'm not sure if the hw
> > > > will read the ring before the kick actually. I can ask.
> > > >
> > > > Thanks!
> > > >
> > > > [1] 
> > > > https://lists.nongnu.org/archive/html/qemu-devel/2023-01/msg02775.html
> > >
> > > Let's find out. We need to check for ENABLE_AFTER_DRIVER_OK too, no?
> >
> > My understanding is that [1] assumes ACCESS_AFTER_KICK. This seems
> > sub-optimal compared to assuming ENABLE_AFTER_DRIVER_OK.
> >
> > But this reminds me of one thing: as the thread is getting too long, I
> > wonder if we could simply assume ENABLE_AFTER_DRIVER_OK if RING_RESET is
> > supported?
> >
> > Thanks
>
> I don't see what one has to do with the other ...
>
> I think with RING_RESET we had another solution: enable the rings
> mapping them to a zero page, then reset and re-enable them later.

As discussed before, this seems to have some problems:

1) The behaviour is not clarified in the document
2) zero is a valid IOVA

Thanks

>
> > >
> > >
> > >
> > > > > > My plan was to convert
> > > > > > it in vp_vdpa if needed, and reuse the current vdpa ops. Sorry if I
> > > > > > was not explicit enough.
> > > > > >
> > > > > > The only solution I can see to that is to trap & emulate in the vdpa
> > > > > > (parent?) driver, as talked in virtio-comment. But that complicates
> > > > > > the architecture:
> > > > > > * Offer VHOST_BACKEND_F_RING_ACCESS_AFTER_KICK
> > > > > > * Store vq enable state separately, at
> > > > > > vdpa->config->set_vq_ready(true), but not transmit that enable to hw
> > > > > > * Store the doorbell state separately, but do not configure it to 
> > > > > > the
> > > > > > device directly.
> > > > > >
> > > > > > But how to recover if the device cannot configure them at kick time,
> > > > > > for example?
> > > > > >
> > > > > > Maybe we can just fail if the parent driver does not support 
> > > > > > enabling
> > > > > > the vq after DRIVER_OK? That way no new feature flag is needed.
> > > > > >
> > > > > > Thanks!
> > > > > >
> > > > > > >
> > > > > > > > ---
> > > > > > > > Sent with Fixes: tag pointing to 
> > > > > > > > git.kernel.org/pub/scm/linux/kernel/git/mst
> > > > > > > > commit. Please let me know if I should send a v3 of [1] instead.
> > > > > > > >
> > > > > > > > [1] 
> > > > > > > > https://lore.kernel.org/lkml/20230609121244-mutt-send-email-...@kernel.org/T/
> > > > > > > > ---
> > > > > > > >  drivers/vhost/vdpa.c | 7 +--
> > > > > > > >  1 file changed, 5 insertions(+), 2 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> > > > > > > > index e1abf29fed5b..a7e554352351 100644
> > > > > > > > --- a/drivers/vhost/vdpa.c
> > > > > > > > +++ 

Re: [PATCH v2 00/24] use vmalloc_array and vcalloc

2023-07-05 Thread Martin K. Petersen


Julia,

> The functions vmalloc_array and vcalloc were introduced in
>
> commit a8749a35c399 ("mm: vmalloc: introduce array allocation functions")
>
> but are not used much yet.  This series introduces uses of
> these functions, to protect against multiplication overflows.

Applied #7 and #24 to 6.5/scsi-staging, thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering
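
For context, what the helpers buy is overflow checking on the count *
size multiplication, mirroring kmalloc_array()/kcalloc(); a minimal
sketch (API per commit a8749a35c399 cited above):

    #include <linux/vmalloc.h>

    struct item { u64 key, val; };

    static struct item *alloc_items(size_t n)
    {
            /* vmalloc(n * sizeof(struct item)) can wrap around and
             * silently under-allocate; vmalloc_array() instead returns
             * NULL when n * size would overflow. vcalloc() does the
             * same and additionally zeroes the memory.
             */
            return vmalloc_array(n, sizeof(struct item));
    }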


Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance

2023-07-05 Thread Liang Chen
On Wed, Jul 5, 2023 at 2:05 PM Jason Wang  wrote:
>
> On Wed, Jul 5, 2023 at 1:41 PM Liang Chen  wrote:
> >
> > On Fri, Jun 9, 2023 at 10:57 AM Liang Chen  
> > wrote:
> > >
> > > On Thu, Jun 8, 2023 at 8:38 AM Jason Wang  wrote:
> > > >
> > > > On Thu, Jun 8, 2023 at 4:17 AM Michael S. Tsirkin  
> > > > wrote:
> > > > >
> > > > > On Wed, Jun 07, 2023 at 05:08:59PM +0800, Liang Chen wrote:
> > > > > > On Tue, May 30, 2023 at 9:19 AM Liang Chen 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Mon, May 29, 2023 at 5:55 PM Michael S. Tsirkin 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Mon, May 29, 2023 at 03:27:56PM +0800, Liang Chen wrote:
> > > > > > > > > On Sun, May 28, 2023 at 2:20 PM Michael S. Tsirkin 
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > On Fri, May 26, 2023 at 01:46:18PM +0800, Liang Chen wrote:
> > > > > > > > > > > The implementation at the moment uses one page per packet in both the
> > > > > > > > > > > normal and XDP path. In addition, a module parameter is introduced to
> > > > > > > > > > > enable or disable the usage of the page pool (disabled by default).
> > > > > > > > > > >
> > > > > > > > > > > In single-core vm testing environments, it gives a modest
> > > > > > > > > > > performance gain in the normal path.
> > > > > > > > > > >   Upstream codebase: 47.5 Gbits/sec
> > > > > > > > > > >   Upstream codebase + page_pool support: 50.2 Gbits/sec
> > > > > > > > > > >
> > > > > > > > > > > In multi-core vm testing environments, the most significant
> > > > > > > > > > > performance gain is observed in XDP cpumap:
> > > > > > > > > > >   Upstream codebase: 1.38 Gbits/sec
> > > > > > > > > > >   Upstream codebase + page_pool support: 9.74 Gbits/sec
> > > > > > > > > > >
> > > > > > > > > > > With this foundation, we can further integrate page pool
> > > > > > > > > > > fragmentation and DMA map/unmap support.
> > > > > > > > > > >
> > > > > > > > > > > Signed-off-by: Liang Chen 
> > > > > > > > > >
> > > > > > > > > > Why off by default?
> > > > > > > > > > I am guessing it sometimes has performance costs too?
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > What happens if we use page pool for big mode too?
> > > > > > > > > > The less modes we have the better...
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Sure, now I believe it makes sense to enable it by default. 
> > > > > > > > > When the
> > > > > > > > > packet size is very small, it reduces the likelihood of skb
> > > > > > > > > coalescing. But such cases are rare.
> > > > > > > >
> > > > > > > > small packets are rare? These workloads are easy to create 
> > > > > > > > actually.
> > > > > > > > Pls try and include benchmark with small packet size.
> > > > > > > >
> > > > > > >
> > > > > > > Sure, Thanks!
> > > > > >
> > > > > > Before going ahead and posting v2 patch, I would like to hear more
> > > > > > advice for the cases of small packets. I have done more performance
> > > > > > benchmark with small packets since then. Here is a list of iperf
> > > > > > output,
> > > > > >
> > > > > > With PP and PP fragmenting:
> > > > > > 256K: [  5] 505.00-510.00 sec  1.34 GBytes  2.31 Gbits/sec  0   144 KBytes
> > > > > > 1K:   [  5]  30.00-35.00  sec  4.63 GBytes  7.95 Gbits/sec  0   223 KBytes
> > > > > > 2K:   [  5]  65.00-70.00  sec  8.33 GBytes  14.3 Gbits/sec  0   324 KBytes
> > > > > > 4K:   [  5]  30.00-35.00  sec  13.3 GBytes  22.8 Gbits/sec  0   1.08 MBytes
> > > > > > 8K:   [  5]  50.00-55.00  sec  18.9 GBytes  32.4 Gbits/sec  0   744 KBytes
> > > > > > 16K:  [  5]  25.00-30.00  sec  24.6 GBytes  42.3 Gbits/sec  0   963 KBytes
> > > > > > 32K:  [  5]  45.00-50.00  sec  29.8 GBytes  51.2 Gbits/sec  0   1.25 MBytes
> > > > > > 64K:  [  5]  35.00-40.00  sec  34.0 GBytes  58.4 Gbits/sec  0   1.70 MBytes
> > > > > > 128K: [  5]  45.00-50.00  sec  36.7 GBytes  63.1 Gbits/sec  0   4.26 MBytes
> > > > > > 256K: [  5]  30.00-35.00  sec  40.0 GBytes  68.8 Gbits/sec  0   3.20 MBytes
> > > >
> > > > Note that the virtio-net driver is lacking things like BQL and others, so
> > > > it might suffer from buffer bloat for TCP performance. Would you mind
> > > > measuring with e.g. testpmd on the vhost side to see the rx PPS?
> > > >
> > >
> > > No problem. Before we proceed to measure with testpmd, could you
> > > please take a look at the PPS measurements we obtained previously and
> > > see if they are sufficient? Though we will only utilize page pool for
> > > xdp on v2.
> > >
> > > netperf -H 192.168.124.197 -p  -t UDP_STREAM -l 0 -- -m $((1))
> > >
> > > with page pool:
> > > 1.
> > > Average:IFACE   rxpck/s   txpck/srxkB/stxkB/s
> > > rxcmp/s   txcmp/s  

Re: [PATCH] vdpa: reject F_ENABLE_AFTER_DRIVER_OK if backend does not support it

2023-07-05 Thread Shannon Nelson via Virtualization

On 7/5/23 11:27 AM, Eugenio Perez Martin wrote:
>

On Wed, Jul 5, 2023 at 9:50 AM Jason Wang  wrote:


On Tue, Jul 4, 2023 at 11:45 PM Michael S. Tsirkin  wrote:


On Tue, Jul 04, 2023 at 01:36:11PM +0200, Eugenio Perez Martin wrote:

On Tue, Jul 4, 2023 at 12:38 PM Michael S. Tsirkin  wrote:


On Tue, Jul 04, 2023 at 12:25:32PM +0200, Eugenio Perez Martin wrote:

On Mon, Jul 3, 2023 at 4:52 PM Michael S. Tsirkin  wrote:


On Mon, Jul 03, 2023 at 04:22:18PM +0200, Eugenio Pérez wrote:

With the current code it is accepted as long as userland sends it.

Although userland should not set a feature flag that has not been
offered to it with VHOST_GET_BACKEND_FEATURES, the current code will not
complain about it.

Since there is no specific reason for any parent to reject that backend
feature bit when it has been proposed, let's control it at vdpa frontend
level. Future patches may move this control to the parent driver.

Fixes: 967800d2d52e ("vdpa: accept VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK backend 
feature")
Signed-off-by: Eugenio Pérez 


Please do send v3. And again, I don't want to send "after driver ok" hack
upstream at all, I merged it in next just to give it some testing.
We want RING_ACCESS_AFTER_KICK or some such.



Current devices do not support that semantic.


Which devices specifically access the ring after DRIVER_OK but before
a kick?


The PDS vdpa device can deal with a call to .set_vq_ready after 
DRIVER_OK is set.  And I'm told that our VQ activity should start 
without a kick.


Our vdpa device FW doesn't currently have support for 
VIRTIO_F_RING_RESET, but I believe it could be added without too much 
trouble.


sln






Previous versions of the QEMU LM series did a spurious kick to start
traffic at the LM destination [1]. When it was proposed, that spurious
kick was removed from the series because to check for descriptors
after driver_ok, even without a kick, was considered work of the
parent driver.

I'm ok to go back to this spurious kick, but I'm not sure if the hw
will read the ring before the kick actually. I can ask.

Thanks!

[1] https://lists.nongnu.org/archive/html/qemu-devel/2023-01/msg02775.html


Let's find out. We need to check for ENABLE_AFTER_DRIVER_OK too, no?


My understanding is that [1] assumes ACCESS_AFTER_KICK. This seems
sub-optimal compared to assuming ENABLE_AFTER_DRIVER_OK.

But this reminds me of one thing: as the thread is getting too long, I
wonder if we could simply assume ENABLE_AFTER_DRIVER_OK if RING_RESET is
supported?



The problem with that is that the device needs to support all of
RING_RESET, such as being able to change the vq address etc. after DRIVER_OK.
Not all HW supports it.

We just need the subset of having the dataplane frozen until all CVQ
commands have been consumed. I'm sure current vDPA code already
supports it in some devices, like MLX and PDS.

Thanks!


Thanks






My plan was to convert
it in vp_vdpa if needed, and reuse the current vdpa ops. Sorry if I
was not explicit enough.

The only solution I can see to that is to trap & emulate in the vdpa
(parent?) driver, as talked in virtio-comment. But that complicates
the architecture:
* Offer VHOST_BACKEND_F_RING_ACCESS_AFTER_KICK
* Store vq enable state separately, at
vdpa->config->set_vq_ready(true), but not transmit that enable to hw
* Store the doorbell state separately, but do not configure it to the
device directly.

But how to recover if the device cannot configure them at kick time,
for example?

Maybe we can just fail if the parent driver does not support enabling
the vq after DRIVER_OK? That way no new feature flag is needed.

Thanks!




---
Sent with Fixes: tag pointing to git.kernel.org/pub/scm/linux/kernel/git/mst
commit. Please let me know if I should send a v3 of [1] instead.

[1] 
https://lore.kernel.org/lkml/20230609121244-mutt-send-email-...@kernel.org/T/
---
  drivers/vhost/vdpa.c | 7 +--
  1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index e1abf29fed5b..a7e554352351 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -681,18 +681,21 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
  {
   struct vhost_vdpa *v = filep->private_data;
   struct vhost_dev *d = &v->vdev;
+ const struct vdpa_config_ops *ops = v->vdpa->config;
   void __user *argp = (void __user *)arg;
   u64 __user *featurep = argp;
- u64 features;
+ u64 features, parent_features = 0;
   long r = 0;

   if (cmd == VHOST_SET_BACKEND_FEATURES) {
   if (copy_from_user(&features, featurep, sizeof(features)))
   return -EFAULT;
+ if (ops->get_backend_features)
+ parent_features = ops->get_backend_features(v->vdpa);
   if (features & ~(VHOST_VDPA_BACKEND_FEATURES |
BIT_ULL(VHOST_BACKEND_F_SUSPEND) |
BIT_ULL(VHOST_BACKEND_F_RESUME) |
-  

[PATCH] vduse: Use proper spinlock for IRQ injection

2023-07-05 Thread Maxime Coquelin
The IRQ injection work used spin_lock_irq() to protect the
scheduling of the softirq, but spin_lock_bh() should be
used.

With spin_lock_irq(), we noticed delays of more than 6
seconds between the time a NAPI polling work item is scheduled
and the time it is executed.

Fixes: c8a6153b6c59 ("vduse: Introduce VDUSE - vDPA Device in Userspace")
Cc: xieyon...@bytedance.com

Suggested-by: Jason Wang 
Signed-off-by: Maxime Coquelin 
---
 drivers/vdpa/vdpa_user/vduse_dev.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index dc38ed21319d..df7869537ef1 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -935,10 +935,10 @@ static void vduse_dev_irq_inject(struct work_struct *work)
 {
struct vduse_dev *dev = container_of(work, struct vduse_dev, inject);
 
-   spin_lock_irq(&dev->irq_lock);
+   spin_lock_bh(&dev->irq_lock);
if (dev->config_cb.callback)
dev->config_cb.callback(dev->config_cb.private);
-   spin_unlock_irq(&dev->irq_lock);
+   spin_unlock_bh(&dev->irq_lock);
 }
 
 static void vduse_vq_irq_inject(struct work_struct *work)
@@ -946,10 +946,10 @@ static void vduse_vq_irq_inject(struct work_struct *work)
struct vduse_virtqueue *vq = container_of(work,
struct vduse_virtqueue, inject);
 
-   spin_lock_irq(&vq->irq_lock);
+   spin_lock_bh(&vq->irq_lock);
if (vq->ready && vq->cb.callback)
vq->cb.callback(vq->cb.private);
-   spin_unlock_irq(&vq->irq_lock);
+   spin_unlock_bh(&vq->irq_lock);
 }
 
 static bool vduse_vq_signal_irqfd(struct vduse_virtqueue *vq)
-- 
2.41.0
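
The distinction the fix relies on: the work item runs in process
context, and spin_unlock_bh() runs any softirq raised inside the
critical section as soon as the lock is dropped, whereas
spin_unlock_irq() only re-enables hardirqs, leaving the pending softirq
to wait for the next interrupt exit or ksoftirqd. A sketch of the
pattern (not the actual driver code; the callback is a stand-in):

    static DEFINE_SPINLOCK(lock);

    static void inject_work(struct work_struct *work)
    {
            spin_lock_bh(&lock);    /* disables bottom halves only */
            /* A vq callback here may schedule NAPI, i.e. raise
             * NET_RX_SOFTIRQ.
             */
            run_injection_callback();
            spin_unlock_bh(&lock);  /* pending softirqs run right here */
    }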



[PATCH v3 3/3] vduse: Temporarily disable control queue features

2023-07-05 Thread Maxime Coquelin
The Virtio-net driver's control queue implementation is not safe
when used with VDUSE. If the VDUSE application does not
reply to control queue messages, it currently ends up
hanging the kernel thread sending the command.

Some work is on-going to make the control queue
implementation robust with VDUSE. Until it is completed,
let's filter out the control virtqueue and the features that depend
on it by keeping only features known to be supported.

Signed-off-by: Maxime Coquelin 
---
 drivers/vdpa/vdpa_user/vduse_dev.c | 36 ++
 1 file changed, 36 insertions(+)

diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index 1271c9796517..7345071db0a8 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -46,6 +46,30 @@
 
 #define IRQ_UNBOUND -1
 
+#define VDUSE_NET_VALID_FEATURES_MASK   \
+   (BIT_ULL(VIRTIO_NET_F_CSUM) |   \
+BIT_ULL(VIRTIO_NET_F_GUEST_CSUM) | \
+BIT_ULL(VIRTIO_NET_F_MTU) |\
+BIT_ULL(VIRTIO_NET_F_MAC) |\
+BIT_ULL(VIRTIO_NET_F_GUEST_TSO4) | \
+BIT_ULL(VIRTIO_NET_F_GUEST_TSO6) | \
+BIT_ULL(VIRTIO_NET_F_GUEST_ECN) |  \
+BIT_ULL(VIRTIO_NET_F_GUEST_UFO) |  \
+BIT_ULL(VIRTIO_NET_F_HOST_TSO4) |  \
+BIT_ULL(VIRTIO_NET_F_HOST_TSO6) |  \
+BIT_ULL(VIRTIO_NET_F_HOST_ECN) |   \
+BIT_ULL(VIRTIO_NET_F_HOST_UFO) |   \
+BIT_ULL(VIRTIO_NET_F_MRG_RXBUF) |  \
+BIT_ULL(VIRTIO_NET_F_STATUS) | \
+BIT_ULL(VIRTIO_NET_F_HOST_USO) |   \
+BIT_ULL(VIRTIO_F_ANY_LAYOUT) | \
+BIT_ULL(VIRTIO_RING_F_INDIRECT_DESC) | \
+BIT_ULL(VIRTIO_F_EVENT_IDX) |  \
+BIT_ULL(VIRTIO_F_VERSION_1) |  \
+BIT_ULL(VIRTIO_F_IOMMU_PLATFORM) | \
+BIT_ULL(VIRTIO_F_RING_PACKED) |\
+BIT_ULL(VIRTIO_F_IN_ORDER))
+
 struct vduse_virtqueue {
u16 index;
u16 num_max;
@@ -1778,6 +1802,16 @@ static struct attribute *vduse_dev_attrs[] = {
 
 ATTRIBUTE_GROUPS(vduse_dev);
 
+static void vduse_dev_features_filter(struct vduse_dev_config *config)
+{
+   /*
+* Temporarily filter out virtio-net's control virtqueue and features
+* that depend on it while CVQ is being made more robust for VDUSE.
+*/
+   if (config->device_id == VIRTIO_ID_NET)
+   config->features &= VDUSE_NET_VALID_FEATURES_MASK;
+}
+
 static int vduse_create_dev(struct vduse_dev_config *config,
void *config_buf, u64 api_version)
 {
@@ -1793,6 +1827,8 @@ static int vduse_create_dev(struct vduse_dev_config *config,
if (!dev)
goto err;
 
+   vduse_dev_features_filter(config);
+
dev->api_version = api_version;
dev->device_features = config->features;
dev->device_id = config->device_id;
-- 
2.41.0



[PATCH v3 1/3] vduse: validate block features only with block devices

2023-07-05 Thread Maxime Coquelin
This patch is preliminary work to enable network device
type support in VDUSE.

As VIRTIO_BLK_F_CONFIG_WCE shares the same value as
VIRTIO_NET_F_HOST_TSO4, we need to restrict its check
to the Virtio-blk device type.

Acked-by: Jason Wang 
Reviewed-by: Xie Yongji 
Signed-off-by: Maxime Coquelin 
---
 drivers/vdpa/vdpa_user/vduse_dev.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index dc38ed21319d..ff9fdd6783fe 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -1662,13 +1662,14 @@ static bool device_is_allowed(u32 device_id)
return false;
 }
 
-static bool features_is_valid(u64 features)
+static bool features_is_valid(struct vduse_dev_config *config)
 {
-   if (!(features & (1ULL << VIRTIO_F_ACCESS_PLATFORM)))
+   if (!(config->features & (1ULL << VIRTIO_F_ACCESS_PLATFORM)))
return false;
 
/* Now we only support read-only configuration space */
-   if (features & (1ULL << VIRTIO_BLK_F_CONFIG_WCE))
+   if ((config->device_id == VIRTIO_ID_BLOCK) &&
+   (config->features & (1ULL << VIRTIO_BLK_F_CONFIG_WCE)))
return false;
 
return true;
@@ -1695,7 +1696,7 @@ static bool vduse_validate_config(struct vduse_dev_config *config)
if (!device_is_allowed(config->device_id))
return false;
 
-   if (!features_is_valid(config->features))
+   if (!features_is_valid(config))
return false;
 
return true;
-- 
2.41.0
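
The collision referred to above is easy to see from the UAPI headers:
both features use bit 11, so the same bit means different things
depending on device type (values per <uapi/linux/virtio_blk.h> and
<uapi/linux/virtio_net.h>):

    #define VIRTIO_BLK_F_CONFIG_WCE 11  /* blk: writeback cache config */
    #define VIRTIO_NET_F_HOST_TSO4  11  /* net: TSOv4 offload to device */

    /* (1ULL << 11) is CONFIG_WCE for a block device but HOST_TSO4 for a
     * net device; rejecting it unconditionally would wrongly forbid TSO
     * on virtio-net, hence the device_id check in the patch.
     */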



[PATCH v3 0/3] vduse: add support for networking devices

2023-07-05 Thread Maxime Coquelin
This small series enables the virtio-net device type in VDUSE.
With it, basic operations have been tested, both with
virtio-vdpa and vhost-vdpa, using the DPDK Vhost library series
adding VDUSE support with split rings layout (merged in
DPDK v23.07-rc1).

Control queue support (and thus multiqueue) has also been
tested, but it requires a kernel series from Jason Wang
relaxing control queue polling [1] to function reliably,
so until Jason's rework is done, a patch is added to disable
CVQ and the features that depend on it (also tested with DPDK
v23.07-rc1).

[1]: 
https://lore.kernel.org/lkml/cacgkmetgrxn3ppwsdo4oosnssljfembez0wvjgrr3whu+qa...@mail.gmail.com/T/

v2 -> v3 changes:
=================
- Use allow list instead of deny list (Michael)

v1 -> v2 changes:
=================
- Add a patch to disable CVQ (Michael)

RFC -> v1 changes:
==================
- Fail device init if it does not support VERSION_1 (Jason)

Maxime Coquelin (3):
  vduse: validate block features only with block devices
  vduse: enable Virtio-net device type
  vduse: Temporarily disable control queue features

 drivers/vdpa/vdpa_user/vduse_dev.c | 51 +++---
 1 file changed, 47 insertions(+), 4 deletions(-)

-- 
2.41.0



[PATCH v3 2/3] vduse: enable Virtio-net device type

2023-07-05 Thread Maxime Coquelin
This patch adds the Virtio-net device type to the supported
device types. Initialization fails if the device does
not support the VIRTIO_F_VERSION_1 feature, in order to
guarantee the configuration space is read-only.

Acked-by: Jason Wang 
Reviewed-by: Xie Yongji 
Signed-off-by: Maxime Coquelin 
---
 drivers/vdpa/vdpa_user/vduse_dev.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index ff9fdd6783fe..1271c9796517 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -142,6 +142,7 @@ static struct workqueue_struct *vduse_irq_bound_wq;
 
 static u32 allowed_device_id[] = {
VIRTIO_ID_BLOCK,
+   VIRTIO_ID_NET,
 };
 
 static inline struct vduse_dev *vdpa_to_vduse(struct vdpa_device *vdpa)
@@ -1672,6 +1673,10 @@ static bool features_is_valid(struct vduse_dev_config *config)
(config->features & (1ULL << VIRTIO_BLK_F_CONFIG_WCE)))
return false;
 
+   if ((config->device_id == VIRTIO_ID_NET) &&
+   !(config->features & (1ULL << VIRTIO_F_VERSION_1)))
+   return false;
+
return true;
 }
 
@@ -2027,6 +2032,7 @@ static const struct vdpa_mgmtdev_ops vdpa_dev_mgmtdev_ops = {
 
 static struct virtio_device_id id_table[] = {
{ VIRTIO_ID_BLOCK, VIRTIO_DEV_ANY_ID },
+   { VIRTIO_ID_NET, VIRTIO_DEV_ANY_ID },
{ 0 },
 };
 
-- 
2.41.0
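
From userspace, the new device type means VDUSE_CREATE_DEV can now be
issued with VIRTIO_ID_NET. A rough sketch, with field usage that is
best-effort against <linux/vduse.h> (error handling omitted; consult
the header for the authoritative layout):

    #include <fcntl.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <linux/vduse.h>
    #include <linux/virtio_ids.h>
    #include <linux/virtio_net.h>
    #include <linux/virtio_config.h>

    static int create_vduse_net(void)
    {
            struct vduse_dev_config *cfg;
            int ctrl, ret;

            cfg = calloc(1, sizeof(*cfg) + sizeof(struct virtio_net_config));
            strcpy(cfg->name, "vduse-net0");
            cfg->device_id = VIRTIO_ID_NET;
            cfg->vq_num = 2;              /* one RX/TX pair, no CVQ */
            cfg->vq_align = 4096;         /* illustrative */
            cfg->config_size = sizeof(struct virtio_net_config);
            /* VERSION_1 is now mandatory for net; ACCESS_PLATFORM is
             * required by features_is_valid() for every device type.
             */
            cfg->features = (1ULL << VIRTIO_F_VERSION_1) |
                            (1ULL << VIRTIO_F_ACCESS_PLATFORM);

            ctrl = open("/dev/vduse/control", O_RDWR);
            ret = ioctl(ctrl, VDUSE_CREATE_DEV, cfg);
            free(cfg);
            return ret;
    }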



Re: [PATCH] vdpa: reject F_ENABLE_AFTER_DRIVER_OK if backend does not support it

2023-07-05 Thread Michael S. Tsirkin
On Wed, Jul 05, 2023 at 03:55:23PM +0800, Jason Wang wrote:
> On Tue, Jul 4, 2023 at 6:38 PM Michael S. Tsirkin  wrote:
> >
> > On Tue, Jul 04, 2023 at 12:25:32PM +0200, Eugenio Perez Martin wrote:
> > > On Mon, Jul 3, 2023 at 4:52 PM Michael S. Tsirkin  wrote:
> > > >
> > > > On Mon, Jul 03, 2023 at 04:22:18PM +0200, Eugenio Pérez wrote:
> > > > > With the current code it is accepted as long as userland sends it.
> > > > >
> > > > > Although userland should not set a feature flag that has not been
> > > > > offered to it with VHOST_GET_BACKEND_FEATURES, the current code
> > > > > will not complain about it.
> > > > >
> > > > > Since there is no specific reason for any parent to reject that 
> > > > > backend
> > > > > feature bit when it has been proposed, let's control it at vdpa 
> > > > > frontend
> > > > > level. Future patches may move this control to the parent driver.
> > > > >
> > > > > Fixes: 967800d2d52e ("vdpa: accept 
> > > > > VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK backend feature")
> > > > > Signed-off-by: Eugenio Pérez 
> > > >
> > > > Please do send v3. And again, I don't want to send "after driver ok" 
> > > > hack
> > > > upstream at all, I merged it in next just to give it some testing.
> > > > We want RING_ACCESS_AFTER_KICK or some such.
> > > >
> > >
> > > Current devices do not support that semantic.
> >
> > Which devices specifically access the ring after DRIVER_OK but before
> > a kick?
> 
> Vhost-net is one example at last. It polls a socket as well, so it
> starts to access the ring immediately after DRIVER_OK.
> 
> Thanks


For sure but that is not vdpa.

> >
> > > My plan was to convert
> > > it in vp_vdpa if needed, and reuse the current vdpa ops. Sorry if I
> > > was not explicit enough.
> > >
> > > The only solution I can see to that is to trap & emulate in the vdpa
> > > (parent?) driver, as talked in virtio-comment. But that complicates
> > > the architecture:
> > > * Offer VHOST_BACKEND_F_RING_ACCESS_AFTER_KICK
> > > * Store vq enable state separately, at
> > > vdpa->config->set_vq_ready(true), but not transmit that enable to hw
> > > * Store the doorbell state separately, but do not configure it to the
> > > device directly.
> > >
> > > But how to recover if the device cannot configure them at kick time,
> > > for example?
> > >
> > > Maybe we can just fail if the parent driver does not support enabling
> > > the vq after DRIVER_OK? That way no new feature flag is needed.
> > >
> > > Thanks!
> > >
> > > >
> > > > > ---
> > > > > Sent with Fixes: tag pointing to 
> > > > > git.kernel.org/pub/scm/linux/kernel/git/mst
> > > > > commit. Please let me know if I should send a v3 of [1] instead.
> > > > >
> > > > > [1] 
> > > > > https://lore.kernel.org/lkml/20230609121244-mutt-send-email-...@kernel.org/T/
> > > > > ---
> > > > >  drivers/vhost/vdpa.c | 7 +--
> > > > >  1 file changed, 5 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> > > > > index e1abf29fed5b..a7e554352351 100644
> > > > > --- a/drivers/vhost/vdpa.c
> > > > > +++ b/drivers/vhost/vdpa.c
> > > > > @@ -681,18 +681,21 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
> > > > >  {
> > > > >   struct vhost_vdpa *v = filep->private_data;
> > > > >   struct vhost_dev *d = &v->vdev;
> > > > > + const struct vdpa_config_ops *ops = v->vdpa->config;
> > > > >   void __user *argp = (void __user *)arg;
> > > > >   u64 __user *featurep = argp;
> > > > > - u64 features;
> > > > > + u64 features, parent_features = 0;
> > > > >   long r = 0;
> > > > >
> > > > >   if (cmd == VHOST_SET_BACKEND_FEATURES) {
> > > > >   if (copy_from_user(&features, featurep, sizeof(features)))
> > > > >   return -EFAULT;
> > > > > + if (ops->get_backend_features)
> > > > > + parent_features = ops->get_backend_features(v->vdpa);
> > > > >   if (features & ~(VHOST_VDPA_BACKEND_FEATURES |
> > > > >BIT_ULL(VHOST_BACKEND_F_SUSPEND) |
> > > > >BIT_ULL(VHOST_BACKEND_F_RESUME) |
> > > > > -  BIT_ULL(VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK)))
> > > > > +  parent_features))
> > > > >   return -EOPNOTSUPP;
> > > > >   if ((features & BIT_ULL(VHOST_BACKEND_F_SUSPEND)) &&
> > > > >!vhost_vdpa_can_suspend(v))
> > > > > --
> > > > > 2.39.3
> > > >
> >


Re: [PATCH] vdpa: reject F_ENABLE_AFTER_DRIVER_OK if backend does not support it

2023-07-05 Thread Michael S. Tsirkin
On Wed, Jul 05, 2023 at 03:49:58PM +0800, Jason Wang wrote:
> On Tue, Jul 4, 2023 at 11:45 PM Michael S. Tsirkin  wrote:
> >
> > On Tue, Jul 04, 2023 at 01:36:11PM +0200, Eugenio Perez Martin wrote:
> > > On Tue, Jul 4, 2023 at 12:38 PM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Tue, Jul 04, 2023 at 12:25:32PM +0200, Eugenio Perez Martin wrote:
> > > > > On Mon, Jul 3, 2023 at 4:52 PM Michael S. Tsirkin  
> > > > > wrote:
> > > > > >
> > > > > > On Mon, Jul 03, 2023 at 04:22:18PM +0200, Eugenio Pérez wrote:
> > > > > > > With the current code it is accepted as long as userland sends it.
> > > > > > >
> > > > > > > Although userland should not set a feature flag that has not been
> > > > > > > offered to it with VHOST_GET_BACKEND_FEATURES, the current code
> > > > > > > will not complain about it.
> > > > > > >
> > > > > > > Since there is no specific reason for any parent to reject that 
> > > > > > > backend
> > > > > > > feature bit when it has been proposed, let's control it at vdpa 
> > > > > > > frontend
> > > > > > > level. Future patches may move this control to the parent driver.
> > > > > > >
> > > > > > > Fixes: 967800d2d52e ("vdpa: accept 
> > > > > > > VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK backend feature")
> > > > > > > Signed-off-by: Eugenio Pérez 
> > > > > >
> > > > > > Please do send v3. And again, I don't want to send "after driver 
> > > > > > ok" hack
> > > > > > upstream at all, I merged it in next just to give it some testing.
> > > > > > We want RING_ACCESS_AFTER_KICK or some such.
> > > > > >
> > > > >
> > > > > Current devices do not support that semantic.
> > > >
> > > > Which devices specifically access the ring after DRIVER_OK but before
> > > > a kick?
> > > >
> > >
> > > Previous versions of the QEMU LM series did a spurious kick to start
> > > traffic at the LM destination [1]. When it was proposed, that spurious
> > > kick was removed from the series because to check for descriptors
> > > after driver_ok, even without a kick, was considered work of the
> > > parent driver.
> > >
> > > I'm ok to go back to this spurious kick, but I'm not sure if the hw
> > > will read the ring before the kick actually. I can ask.
> > >
> > > Thanks!
> > >
> > > [1] https://lists.nongnu.org/archive/html/qemu-devel/2023-01/msg02775.html
> >
> > Let's find out. We need to check for ENABLE_AFTER_DRIVER_OK too, no?
> 
> My understanding is that [1] assumes ACCESS_AFTER_KICK. This seems
> sub-optimal compared to assuming ENABLE_AFTER_DRIVER_OK.
> 
> But this reminds me of one thing: as the thread is getting too long, I
> wonder if we could simply assume ENABLE_AFTER_DRIVER_OK if RING_RESET is
> supported?
> 
> Thanks

I don't see what one has to do with the other ...

I think with RING_RESET we had another solution: enable the rings
mapping them to a zero page, then reset and re-enable them later.

> >
> >
> >
> > > > > My plan was to convert
> > > > > it in vp_vdpa if needed, and reuse the current vdpa ops. Sorry if I
> > > > > was not explicit enough.
> > > > >
> > > > > The only solution I can see to that is to trap & emulate in the vdpa
> > > > > (parent?) driver, as talked in virtio-comment. But that complicates
> > > > > the architecture:
> > > > > * Offer VHOST_BACKEND_F_RING_ACCESS_AFTER_KICK
> > > > > * Store vq enable state separately, at
> > > > > vdpa->config->set_vq_ready(true), but not transmit that enable to hw
> > > > > * Store the doorbell state separately, but do not configure it to the
> > > > > device directly.
> > > > >
> > > > > But how to recover if the device cannot configure them at kick time,
> > > > > for example?
> > > > >
> > > > > Maybe we can just fail if the parent driver does not support enabling
> > > > > the vq after DRIVER_OK? That way no new feature flag is needed.
> > > > >
> > > > > Thanks!
> > > > >
> > > > > >
> > > > > > > ---
> > > > > > > Sent with Fixes: tag pointing to 
> > > > > > > git.kernel.org/pub/scm/linux/kernel/git/mst
> > > > > > > commit. Please let me know if I should send a v3 of [1] instead.
> > > > > > >
> > > > > > > [1] 
> > > > > > > https://lore.kernel.org/lkml/20230609121244-mutt-send-email-...@kernel.org/T/
> > > > > > > ---
> > > > > > >  drivers/vhost/vdpa.c | 7 +--
> > > > > > >  1 file changed, 5 insertions(+), 2 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> > > > > > > index e1abf29fed5b..a7e554352351 100644
> > > > > > > --- a/drivers/vhost/vdpa.c
> > > > > > > +++ b/drivers/vhost/vdpa.c
> > > > > > > @@ -681,18 +681,21 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
> > > > > > >  {
> > > > > > >   struct vhost_vdpa *v = filep->private_data;
> > > > > > >   struct vhost_dev *d = &v->vdev;
> > > > > > > + const struct vdpa_config_ops *ops = v->vdpa->config;
> > > > > > >   void __user *argp = (void __user *)arg;
> > > > > > >   u64 __user *featurep = argp;
> > > > > > > - u64 features;
> > > > > > > + u64 features, 

Re: [PATCH] vdpa: reject F_ENABLE_AFTER_DRIVER_OK if backend does not support it

2023-07-05 Thread Jason Wang
On Tue, Jul 4, 2023 at 6:38 PM Michael S. Tsirkin  wrote:
>
> On Tue, Jul 04, 2023 at 12:25:32PM +0200, Eugenio Perez Martin wrote:
> > On Mon, Jul 3, 2023 at 4:52 PM Michael S. Tsirkin  wrote:
> > >
> > > On Mon, Jul 03, 2023 at 04:22:18PM +0200, Eugenio Pérez wrote:
> > > > With the current code it is accepted as long as userland sends it.
> > > >
> > > > Although userland should not set a feature flag that has not been
> > > > offered to it with VHOST_GET_BACKEND_FEATURES, the current code will not
> > > > complain about it.
> > > >
> > > > Since there is no specific reason for any parent to reject that backend
> > > > feature bit when it has been proposed, let's control it at vdpa frontend
> > > > level. Future patches may move this control to the parent driver.
> > > >
> > > > Fixes: 967800d2d52e ("vdpa: accept 
> > > > VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK backend feature")
> > > > Signed-off-by: Eugenio Pérez 
> > >
> > > Please do send v3. And again, I don't want to send "after driver ok" hack
> > > upstream at all, I merged it in next just to give it some testing.
> > > We want RING_ACCESS_AFTER_KICK or some such.
> > >
> >
> > Current devices do not support that semantic.
>
> Which devices specifically access the ring after DRIVER_OK but before
> a kick?

Vhost-net is one example at last. It polls a socket as well, so it
starts to access the ring immediately after DRIVER_OK.

Thanks

>
> > My plan was to convert
> > it in vp_vdpa if needed, and reuse the current vdpa ops. Sorry if I
> > was not explicit enough.
> >
> > The only solution I can see to that is to trap & emulate in the vdpa
> > (parent?) driver, as talked in virtio-comment. But that complicates
> > the architecture:
> > * Offer VHOST_BACKEND_F_RING_ACCESS_AFTER_KICK
> > * Store vq enable state separately, at
> > vdpa->config->set_vq_ready(true), but not transmit that enable to hw
> > * Store the doorbell state separately, but do not configure it to the
> > device directly.
> >
> > But how to recover if the device cannot configure them at kick time,
> > for example?
> >
> > Maybe we can just fail if the parent driver does not support enabling
> > the vq after DRIVER_OK? That way no new feature flag is needed.
> >
> > Thanks!
> >
> > >
> > > > ---
> > > > Sent with Fixes: tag pointing to 
> > > > git.kernel.org/pub/scm/linux/kernel/git/mst
> > > > commit. Please let me know if I should send a v3 of [1] instead.
> > > >
> > > > [1] 
> > > > https://lore.kernel.org/lkml/20230609121244-mutt-send-email-...@kernel.org/T/
> > > > ---
> > > >  drivers/vhost/vdpa.c | 7 +--
> > > >  1 file changed, 5 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> > > > index e1abf29fed5b..a7e554352351 100644
> > > > --- a/drivers/vhost/vdpa.c
> > > > +++ b/drivers/vhost/vdpa.c
> > > > @@ -681,18 +681,21 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
> > > >  {
> > > >   struct vhost_vdpa *v = filep->private_data;
> > > >   struct vhost_dev *d = &v->vdev;
> > > > + const struct vdpa_config_ops *ops = v->vdpa->config;
> > > >   void __user *argp = (void __user *)arg;
> > > >   u64 __user *featurep = argp;
> > > > - u64 features;
> > > > + u64 features, parent_features = 0;
> > > >   long r = 0;
> > > >
> > > >   if (cmd == VHOST_SET_BACKEND_FEATURES) {
> > > >   if (copy_from_user(&features, featurep, sizeof(features)))
> > > >   return -EFAULT;
> > > > + if (ops->get_backend_features)
> > > > + parent_features = ops->get_backend_features(v->vdpa);
> > > >   if (features & ~(VHOST_VDPA_BACKEND_FEATURES |
> > > >BIT_ULL(VHOST_BACKEND_F_SUSPEND) |
> > > >BIT_ULL(VHOST_BACKEND_F_RESUME) |
> > > > -  BIT_ULL(VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK)))
> > > > +  parent_features))
> > > >   return -EOPNOTSUPP;
> > > >   if ((features & BIT_ULL(VHOST_BACKEND_F_SUSPEND)) &&
> > > >!vhost_vdpa_can_suspend(v))
> > > > --
> > > > 2.39.3
> > >
>


Re: [PATCH] vdpa: reject F_ENABLE_AFTER_DRIVER_OK if backend does not support it

2023-07-05 Thread Jason Wang
On Tue, Jul 4, 2023 at 11:45 PM Michael S. Tsirkin  wrote:
>
> On Tue, Jul 04, 2023 at 01:36:11PM +0200, Eugenio Perez Martin wrote:
> > On Tue, Jul 4, 2023 at 12:38 PM Michael S. Tsirkin  wrote:
> > >
> > > On Tue, Jul 04, 2023 at 12:25:32PM +0200, Eugenio Perez Martin wrote:
> > > > On Mon, Jul 3, 2023 at 4:52 PM Michael S. Tsirkin  
> > > > wrote:
> > > > >
> > > > > On Mon, Jul 03, 2023 at 04:22:18PM +0200, Eugenio Pérez wrote:
> > > > > > With the current code it is accepted as long as userland sends it.
> > > > > >
> > > > > > Although userland should not set a feature flag that has not been
> > > > > > offered to it with VHOST_GET_BACKEND_FEATURES, the current code
> > > > > > will not complain about it.
> > > > > >
> > > > > > Since there is no specific reason for any parent to reject that 
> > > > > > backend
> > > > > > feature bit when it has been proposed, let's control it at vdpa 
> > > > > > frontend
> > > > > > level. Future patches may move this control to the parent driver.
> > > > > >
> > > > > > Fixes: 967800d2d52e ("vdpa: accept 
> > > > > > VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK backend feature")
> > > > > > Signed-off-by: Eugenio Pérez 
> > > > >
> > > > > Please do send v3. And again, I don't want to send "after driver ok" 
> > > > > hack
> > > > > upstream at all, I merged it in next just to give it some testing.
> > > > > We want RING_ACCESS_AFTER_KICK or some such.
> > > > >
> > > >
> > > > Current devices do not support that semantic.
> > >
> > > Which devices specifically access the ring after DRIVER_OK but before
> > > a kick?
> > >
> >
> > Previous versions of the QEMU LM series did a spurious kick to start
> > traffic at the LM destination [1]. When it was proposed, that spurious
> > kick was removed from the series because to check for descriptors
> > after driver_ok, even without a kick, was considered work of the
> > parent driver.
> >
> > I'm ok to go back to this spurious kick, but I'm not sure if the hw
> > will read the ring before the kick actually. I can ask.
> >
> > Thanks!
> >
> > [1] https://lists.nongnu.org/archive/html/qemu-devel/2023-01/msg02775.html
>
> Let's find out. We need to check for ENABLE_AFTER_DRIVER_OK too, no?

My understanding is that [1] assumes ACCESS_AFTER_KICK. This seems
sub-optimal compared to assuming ENABLE_AFTER_DRIVER_OK.

But this reminds me of one thing: as the thread is getting too long, I
wonder if we could simply assume ENABLE_AFTER_DRIVER_OK if RING_RESET is
supported?

Thanks

>
>
>
> > > > My plan was to convert
> > > > it in vp_vdpa if needed, and reuse the current vdpa ops. Sorry if I
> > > > was not explicit enough.
> > > >
> > > > The only solution I can see to that is to trap & emulate in the vdpa
> > > > (parent?) driver, as talked in virtio-comment. But that complicates
> > > > the architecture:
> > > > * Offer VHOST_BACKEND_F_RING_ACCESS_AFTER_KICK
> > > > * Store vq enable state separately, at
> > > > vdpa->config->set_vq_ready(true), but not transmit that enable to hw
> > > > * Store the doorbell state separately, but do not configure it to the
> > > > device directly.
> > > >
> > > > But how to recover if the device cannot configure them at kick time,
> > > > for example?
> > > >
> > > > Maybe we can just fail if the parent driver does not support enabling
> > > > the vq after DRIVER_OK? That way no new feature flag is needed.
> > > >
> > > > Thanks!
> > > >
> > > > >
> > > > > > ---
> > > > > > Sent with Fixes: tag pointing to 
> > > > > > git.kernel.org/pub/scm/linux/kernel/git/mst
> > > > > > commit. Please let me know if I should send a v3 of [1] instead.
> > > > > >
> > > > > > [1] 
> > > > > > https://lore.kernel.org/lkml/20230609121244-mutt-send-email-...@kernel.org/T/
> > > > > > ---
> > > > > >  drivers/vhost/vdpa.c | 7 +--
> > > > > >  1 file changed, 5 insertions(+), 2 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> > > > > > index e1abf29fed5b..a7e554352351 100644
> > > > > > --- a/drivers/vhost/vdpa.c
> > > > > > +++ b/drivers/vhost/vdpa.c
> > > > > > @@ -681,18 +681,21 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
> > > > > >  {
> > > > > >   struct vhost_vdpa *v = filep->private_data;
> > > > > >   struct vhost_dev *d = &v->vdev;
> > > > > > + const struct vdpa_config_ops *ops = v->vdpa->config;
> > > > > >   void __user *argp = (void __user *)arg;
> > > > > >   u64 __user *featurep = argp;
> > > > > > - u64 features;
> > > > > > + u64 features, parent_features = 0;
> > > > > >   long r = 0;
> > > > > >
> > > > > >   if (cmd == VHOST_SET_BACKEND_FEATURES) {
> > > > > >   if (copy_from_user(&features, featurep, sizeof(features)))
> > > > > >   return -EFAULT;
> > > > > > + if (ops->get_backend_features)
> > > > > > + parent_features = ops->get_backend_features(v->vdpa);
> > > > > >   if (features & 

Re: [PATCH] mlx5_vdpa: offer VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK

2023-07-05 Thread Jason Wang
On Wed, Jul 5, 2023 at 2:16 PM Michael S. Tsirkin  wrote:
>
> On Wed, Jul 05, 2023 at 01:47:44PM +0800, Jason Wang wrote:
> > On Wed, Jul 5, 2023 at 1:31 PM Michael S. Tsirkin  wrote:
> > >
> > > On Wed, Jul 05, 2023 at 01:11:37PM +0800, Jason Wang wrote:
> > > > On Tue, Jul 4, 2023 at 6:16 PM Michael S. Tsirkin  
> > > > wrote:
> > > > >
> > > > > On Mon, Jul 03, 2023 at 05:26:02PM -0700, Si-Wei Liu wrote:
> > > > > >
> > > > > >
> > > > > > On 7/3/2023 8:46 AM, Michael S. Tsirkin wrote:
> > > > > > > On Mon, Jul 03, 2023 at 04:25:14PM +0200, Eugenio Pérez wrote:
> > > > > > > > Offer this backend feature as mlx5 is compatible with it. It 
> > > > > > > > allows it
> > > > > > > > to do live migration with CVQ, dynamically switching between 
> > > > > > > > passthrough
> > > > > > > > and shadow virtqueue.
> > > > > > > >
> > > > > > > > Signed-off-by: Eugenio Pérez 
> > > > > > > Same comment.
> > > > > > to which?
> > > > > >
> > > > > > -Siwei
> > > > >
> > > > > VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK is too narrow a use-case to
> > > > > commit to as a kernel/userspace ABI: what if one wants to start
> > > > > rings in some other specific order?
> > > >
> > > > Just enable a queue by writing e.g. 1 to queue_enable in a specific order?
> > >
> > >
> > > But then at driver ok time we don't know how many queues there are.
> >
> > There should be a device specific interface for this, for example,
> > num_queue_pairs. So the device knows at most how many queues there
> > are. Or is there anything I missed?
>
> That's a device limitation. It does not tell the device how many are used.

I think I'm missing something: how does kick differ from queue_enable in this respect?

>
> > >
> > > > > As was discussed on list, a better promise is not to access ring
> > > > > until the 1st kick. vdpa can then do a kick when it wants
> > > > > the device to start accessing rings.
> > > >
> > > > Rethinking ACCESS_AFTER_KICK: it sounds functionally equivalent
> > > > to allowing queue_enable after DRIVER_OK, but it seems to have
> > > > disadvantages:
> > > >
> > > > A busy-polling software device may disable notifications and never
> > > > register any kick notifiers. ACCESS_AFTER_KICK would introduce
> > > > overhead to those implementations.
> > > >
> > > > Thanks
> > >
> > > It's just the 1st kick, then you can disable. No?
> >
> > Yes, but:
> >
> > 1) adding hooks for queue_enable
> > 2) adding new code to register an event notifier and toggle the notifier
> >
> > 1) seems much easier? Or for most devices, it already behaves like this.
> >
> > Thanks
>
> Well, libvhost-user checks enabled queues at DRIVER_OK, does it not?

Probably, but I meant:

1) This behaviour has been supported by some devices (e.g. MLX)
2) This is the current behaviour of Qemu for vhost-net devices:

static void virtio_net_queue_enable(VirtIODevice *vdev, uint32_t queue_index)
{
VirtIONet *n = VIRTIO_NET(vdev);
NetClientState *nc;
int r;



if (get_vhost_net(nc->peer) &&
nc->peer->info->type == NET_CLIENT_DRIVER_TAP) {
r = vhost_net_virtqueue_restart(vdev, nc, queue_index);
if (r < 0) {
error_report("unable to restart vhost net virtqueue: %d, "
"when resetting the queue", queue_index);
}
}
}

Thanks

>
> > >
> > > > >
> > > > > > >
> > > > > > > > ---
> > > > > > > >   drivers/vdpa/mlx5/net/mlx5_vnet.c | 7 +++
> > > > > > > >   1 file changed, 7 insertions(+)
> > > > > > > >
> > > > > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > index 9138ef2fb2c8..5f309a16b9dc 100644
> > > > > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > @@ -7,6 +7,7 @@
> > > > > > > >   #include 
> > > > > > > >   #include 
> > > > > > > >   #include 
> > > > > > > > +#include 
> > > > > > > >   #include 
> > > > > > > >   #include 
> > > > > > > >   #include 
> > > > > > > > @@ -2499,6 +2500,11 @@ static void 
> > > > > > > > unregister_link_notifier(struct mlx5_vdpa_net *ndev)
> > > > > > > >   flush_workqueue(ndev->mvdev.wq);
> > > > > > > >   }
> > > > > > > > +static u64 mlx5_vdpa_get_backend_features(const struct 
> > > > > > > > vdpa_device *vdpa)
> > > > > > > > +{
> > > > > > > > + return BIT_ULL(VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK);
> > > > > > > > +}
> > > > > > > > +
> > > > > > > >   static int mlx5_vdpa_set_driver_features(struct vdpa_device 
> > > > > > > > *vdev, u64 features)
> > > > > > > >   {
> > > > > > > >   struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> > > > > > > > @@ -3140,6 +3146,7 @@ static const struct vdpa_config_ops 
> > > > > > > > mlx5_vdpa_ops = {
> > > > > > > >   .get_vq_align = mlx5_vdpa_get_vq_align,
> > > > > > > >   .get_vq_group = mlx5_vdpa_get_vq_group,
> > > > > > > >   .get_device_features = mlx5_vdpa_get_device_features,
> > > > 
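
A note to make the ACCESS_AFTER_KICK trade-off above concrete: under that
promise, even a busy-polling userspace device has to arm a kick notifier
once, only to tear it down after the first event. A minimal userspace
sketch, with illustrative names and error handling omitted:

#include <stdint.h>
#include <unistd.h>
#include <sys/epoll.h>

/* Block until the first kick arrives on the queue's eventfd, then stop
 * listening so the device can busy-poll the ring from then on. */
static void wait_for_first_kick(int epfd, int kickfd)
{
	struct epoll_event ev = { .events = EPOLLIN };
	struct epoll_event out;
	uint64_t cnt;

	ev.data.fd = kickfd;
	epoll_ctl(epfd, EPOLL_CTL_ADD, kickfd, &ev);
	epoll_wait(epfd, &out, 1, -1);      /* the one kick we must wait for */
	read(kickfd, &cnt, sizeof(cnt));    /* drain the eventfd counter */
	epoll_ctl(epfd, EPOLL_CTL_DEL, kickfd, NULL);
}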

Re: [PATCH] mlx5_vdpa: offer VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK

2023-07-05 Thread Michael S. Tsirkin
On Wed, Jul 05, 2023 at 01:47:44PM +0800, Jason Wang wrote:
> On Wed, Jul 5, 2023 at 1:31 PM Michael S. Tsirkin  wrote:
> >
> > On Wed, Jul 05, 2023 at 01:11:37PM +0800, Jason Wang wrote:
> > > On Tue, Jul 4, 2023 at 6:16 PM Michael S. Tsirkin  wrote:
> > > >
> > > > On Mon, Jul 03, 2023 at 05:26:02PM -0700, Si-Wei Liu wrote:
> > > > >
> > > > >
> > > > > On 7/3/2023 8:46 AM, Michael S. Tsirkin wrote:
> > > > > > On Mon, Jul 03, 2023 at 04:25:14PM +0200, Eugenio Pérez wrote:
> > > > > > > Offer this backend feature as mlx5 is compatible with it. It 
> > > > > > > allows it
> > > > > > > to do live migration with CVQ, dynamically switching between 
> > > > > > > passthrough
> > > > > > > and shadow virtqueue.
> > > > > > >
> > > > > > > Signed-off-by: Eugenio Pérez 
> > > > > > Same comment.
> > > > > to which?
> > > > >
> > > > > -Siwei
> > > >
> > > > VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK is too narrow a use case to
> > > > commit to
> > > > as a kernel/userspace ABI: what if one wants to start rings in some
> > > > other specific order?
> > >
> > > Just enable a queue by writing e.g. 1 to queue_enable in a specific order?
> >
> >
> > But then at DRIVER_OK time we don't know how many queues there are.
> 
> There should be a device-specific interface for this, for example,
> num_queue_pairs. So the device knows at most how many queues there
> are. Or is there anything I missed?

That's a device limitation. It does not tell the device how many queues are
actually used.

> >
> > > > As was discussed on list, a better promise is not to access the ring
> > > > until the 1st kick. vdpa can then do a kick when it wants
> > > > the device to start accessing the rings.
> > >
> > > Rethinking ACCESS_AFTER_KICK: it sounds functionally equivalent
> > > to allowing queue_enable after DRIVER_OK, but it seems to have
> > > disadvantages:
> > >
> > > A busy-polling software device may disable notifications and never
> > > register any kick notifiers. ACCESS_AFTER_KICK would
> > > introduce overhead to those implementations.
> > >
> > > Thanks
> >
> > It's just the 1st kick, then you can disable. No?
> 
> Yes, but:
> 
> 1) adding hooks for queue_enable
> 2) adding new code to register an event notifier and toggle it
> 
> 1) seems much easier? Also, for most devices, it already behaves like this.
> 
> Thanks

Well, libvhost-user checks enabled queues at DRIVER_OK, does it not?

> >
> > > >
> > > > > >
> > > > > > > ---
> > > > > > >   drivers/vdpa/mlx5/net/mlx5_vnet.c | 7 +++
> > > > > > >   1 file changed, 7 insertions(+)
> > > > > > >
> > > > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
> > > > > > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > index 9138ef2fb2c8..5f309a16b9dc 100644
> > > > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > @@ -7,6 +7,7 @@
> > > > > > >   #include 
> > > > > > >   #include 
> > > > > > >   #include 
> > > > > > > +#include 
> > > > > > >   #include 
> > > > > > >   #include 
> > > > > > >   #include 
> > > > > > > @@ -2499,6 +2500,11 @@ static void 
> > > > > > > unregister_link_notifier(struct mlx5_vdpa_net *ndev)
> > > > > > >   flush_workqueue(ndev->mvdev.wq);
> > > > > > >   }
> > > > > > > +static u64 mlx5_vdpa_get_backend_features(const struct 
> > > > > > > vdpa_device *vdpa)
> > > > > > > +{
> > > > > > > + return BIT_ULL(VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK);
> > > > > > > +}
> > > > > > > +
> > > > > > >   static int mlx5_vdpa_set_driver_features(struct vdpa_device 
> > > > > > > *vdev, u64 features)
> > > > > > >   {
> > > > > > >   struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> > > > > > > @@ -3140,6 +3146,7 @@ static const struct vdpa_config_ops 
> > > > > > > mlx5_vdpa_ops = {
> > > > > > >   .get_vq_align = mlx5_vdpa_get_vq_align,
> > > > > > >   .get_vq_group = mlx5_vdpa_get_vq_group,
> > > > > > >   .get_device_features = mlx5_vdpa_get_device_features,
> > > > > > > + .get_backend_features = mlx5_vdpa_get_backend_features,
> > > > > > >   .set_driver_features = mlx5_vdpa_set_driver_features,
> > > > > > >   .get_driver_features = mlx5_vdpa_get_driver_features,
> > > > > > >   .set_config_cb = mlx5_vdpa_set_config_cb,
> > > > > > > --
> > > > > > > 2.39.3
> > > >
> >
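
For completeness: the feature bit offered by the patch reaches userspace
through the regular vhost backend-features handshake on the vhost-vdpa
character device. A sketch of how a VMM might probe and acknowledge it;
the fallback bit value matches the proposal in this series, and error
handling is trimmed:

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

#ifndef VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK
#define VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK 0x6
#endif

static int probe_enable_after_driver_ok(int vdpa_fd)
{
	uint64_t features = 0;

	if (ioctl(vdpa_fd, VHOST_GET_BACKEND_FEATURES, &features) < 0)
		return -1;
	if (!(features & (1ULL << VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK)))
		return 0;	/* backend does not offer it */

	features = 1ULL << VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK;
	if (ioctl(vdpa_fd, VHOST_SET_BACKEND_FEATURES, &features) < 0)
		return -1;
	return 1;	/* queues may now be enabled after DRIVER_OK */
}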


Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance

2023-07-05 Thread Jason Wang
On Wed, Jul 5, 2023 at 1:41 PM Liang Chen  wrote:
>
> On Fri, Jun 9, 2023 at 10:57 AM Liang Chen  wrote:
> >
> > On Thu, Jun 8, 2023 at 8:38 AM Jason Wang  wrote:
> > >
> > > On Thu, Jun 8, 2023 at 4:17 AM Michael S. Tsirkin  wrote:
> > > >
> > > > On Wed, Jun 07, 2023 at 05:08:59PM +0800, Liang Chen wrote:
> > > > > On Tue, May 30, 2023 at 9:19 AM Liang Chen 
> > > > >  wrote:
> > > > > >
> > > > > > On Mon, May 29, 2023 at 5:55 PM Michael S. Tsirkin 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Mon, May 29, 2023 at 03:27:56PM +0800, Liang Chen wrote:
> > > > > > > > On Sun, May 28, 2023 at 2:20 PM Michael S. Tsirkin 
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > On Fri, May 26, 2023 at 01:46:18PM +0800, Liang Chen wrote:
> > > > > > > > > > The implementation at the moment uses one page per packet 
> > > > > > > > > > in both the
> > > > > > > > > > normal and XDP paths. In addition, a module parameter is
> > > > > > > > > > introduced to enable
> > > > > > > > > > or disable the use of page pool (disabled by default).
> > > > > > > > > >
> > > > > > > > > > In single-core VM testing environments, it gives a modest
> > > > > > > > > > performance gain
> > > > > > > > > > in the normal path.
> > > > > > > > > >   Upstream codebase: 47.5 Gbits/sec
> > > > > > > > > >   Upstream codebase + page_pool support: 50.2 Gbits/sec
> > > > > > > > > >
> > > > > > > > > > In multi-core VM testing environments, the most significant
> > > > > > > > > > performance
> > > > > > > > > > gain is observed in XDP cpumap:
> > > > > > > > > >   Upstream codebase: 1.38 Gbits/sec
> > > > > > > > > >   Upstream codebase + page_pool support: 9.74 Gbits/sec
> > > > > > > > > >
> > > > > > > > > > With this foundation, we can further integrate page pool 
> > > > > > > > > > fragmentation and
> > > > > > > > > > DMA map/unmap support.
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Liang Chen 
> > > > > > > > >
> > > > > > > > > Why off by default?
> > > > > > > > > I am guessing it sometimes has performance costs too?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > What happens if we use page pool for big mode too?
> > > > > > > > > The fewer modes we have the better...
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > > Sure, now I believe it makes sense to enable it by default. 
> > > > > > > > When the
> > > > > > > > packet size is very small, it reduces the likelihood of skb
> > > > > > > > coalescing. But such cases are rare.
> > > > > > >
> > > > > > > Small packets are rare? These workloads are actually easy to
> > > > > > > create.
> > > > > > > Please try to include a benchmark with small packet sizes.
> > > > > > >
> > > > > >
> > > > > > Sure, Thanks!
> > > > >
> > > > > Before going ahead and posting the v2 patch, I would like to hear more
> > > > > advice on the small-packet cases. I have done more performance
> > > > > benchmarking with small packets since then. Here is a list of iperf
> > > > > output:
> > > > >
> > > > > With PP and PP fragmenting:
> > > > > 256K: [  5] 505.00-510.00 sec  1.34 GBytes  2.31 Gbits/sec  0   144 KBytes
> > > > > 1K:   [  5]  30.00-35.00  sec  4.63 GBytes  7.95 Gbits/sec  0   223 KBytes
> > > > > 2K:   [  5]  65.00-70.00  sec  8.33 GBytes  14.3 Gbits/sec  0   324 KBytes
> > > > > 4K:   [  5]  30.00-35.00  sec  13.3 GBytes  22.8 Gbits/sec  0  1.08 MBytes
> > > > > 8K:   [  5]  50.00-55.00  sec  18.9 GBytes  32.4 Gbits/sec  0   744 KBytes
> > > > > 16K:  [  5]  25.00-30.00  sec  24.6 GBytes  42.3 Gbits/sec  0   963 KBytes
> > > > > 32K:  [  5]  45.00-50.00  sec  29.8 GBytes  51.2 Gbits/sec  0  1.25 MBytes
> > > > > 64K:  [  5]  35.00-40.00  sec  34.0 GBytes  58.4 Gbits/sec  0  1.70 MBytes
> > > > > 128K: [  5]  45.00-50.00  sec  36.7 GBytes  63.1 Gbits/sec  0  4.26 MBytes
> > > > > 256K: [  5]  30.00-35.00  sec  40.0 GBytes  68.8 Gbits/sec  0  3.20 MBytes
> > >
> > > Note that the virtio-net driver lacks things like BQL, so TCP
> > > performance might suffer from bufferbloat. Would you mind measuring
> > > with e.g. testpmd on the vhost side to see the RX PPS?
> > >
> >
> > No problem. Before we proceed to measure with testpmd, could you
> > please take a look at the PPS measurements we obtained previously and
> > see if they are sufficient? Though we will only use page pool for
> > XDP in v2.
> >
> > netperf -H 192.168.124.197 -p  -t UDP_STREAM -l 0 -- -m $((1))
> >
> > with page pool:
> > 1.
> > Average:        IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
> > Average:       enp8s0 655092.27      0.35  27508.77      0.03      0.00      0.00      0.00      0.00
> > 2.
> > Average:        IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
> > Average:       enp8s0 654749.87      0.63
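
As background for the page_pool numbers above, the core API a driver wires
into its RX path is small. A minimal sketch against the <net/page_pool.h>
API of this era; parameter values are illustrative, not the ones the
series uses:

#include <linux/dma-mapping.h>
#include <net/page_pool.h>

/* One order-0 page per packet, matching the series.  Adding
 * PP_FLAG_DMA_MAP would let the pool own the DMA mapping too,
 * the "further integrate" step the commit message mentions. */
static struct page_pool *rx_pool_create(struct device *dev, unsigned int size)
{
	struct page_pool_params pp = {
		.order		= 0,
		.flags		= 0,
		.pool_size	= size,
		.nid		= NUMA_NO_NODE,
		.dev		= dev,
		.dma_dir	= DMA_FROM_DEVICE,
	};

	return page_pool_create(&pp);	/* ERR_PTR() on failure */
}

static struct page *rx_refill_one(struct page_pool *pool)
{
	/* Served from the pool's cache when possible. */
	return page_pool_dev_alloc_pages(pool);
}

static void rx_recycle_one(struct page_pool *pool, struct page *page)
{
	/* 'true' allows the lockless direct-recycle fast path when
	 * called from the pool's NAPI context. */
	page_pool_put_full_page(pool, page, true);
}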