RE: [PATCH V2 vfio 2/9] virtio-pci: Introduce admin virtqueue

2023-10-31 Thread Parav Pandit via Virtualization


> From: Michael S. Tsirkin 
> Sent: Tuesday, October 31, 2023 1:29 PM
> 
> On Tue, Oct 31, 2023 at 03:11:57AM +, Parav Pandit wrote:
> >
> >
> > > From: Michael S. Tsirkin 
> > > Sent: Tuesday, October 31, 2023 5:02 AM
> > >
> > > On Mon, Oct 30, 2023 at 06:10:06PM +, Parav Pandit wrote:
> > > >
> > > >
> > > > > From: Michael S. Tsirkin 
> > > > > Sent: Monday, October 30, 2023 9:29 PM On Mon, Oct 30, 2023 at
> > > > > 03:51:40PM +, Parav Pandit wrote:
> > > > > >
> > > > > >
> > > > > > > From: Michael S. Tsirkin 
> > > > > > > Sent: Monday, October 30, 2023 1:53 AM
> > > > > > >
> > > > > > > On Sun, Oct 29, 2023 at 05:59:45PM +0200, Yishai Hadas wrote:
> > > > > > > > From: Feng Liu 
> > > > > > > >
> > > > > > > > Introduce support for the admin virtqueue. By negotiating
> > > > > > > > VIRTIO_F_ADMIN_VQ feature, driver detects capability and
> > > > > > > > creates one administration virtqueue. Administration
> > > > > > > > virtqueue implementation in virtio pci generic layer,
> > > > > > > > enables multiple types of upper layer drivers such as vfio, 
> > > > > > > > net, blk
> to utilize it.
> > > > > > > >
> > > > > > > > Signed-off-by: Feng Liu 
> > > > > > > > Reviewed-by: Parav Pandit 
> > > > > > > > Reviewed-by: Jiri Pirko 
> > > > > > > > Signed-off-by: Yishai Hadas 
> > > > > > > > ---
> > > > > > > >  drivers/virtio/virtio.c| 37 ++--
> > > > > > > >  drivers/virtio/virtio_pci_common.c |  3 ++
> > > > > > > >  drivers/virtio/virtio_pci_common.h | 15 ++-
> > > > > > > >  drivers/virtio/virtio_pci_modern.c | 61
> > > +-
> > > > > > > >  drivers/virtio/virtio_pci_modern_dev.c | 18 
> > > > > > > >  include/linux/virtio_config.h  |  4 ++
> > > > > > > >  include/linux/virtio_pci_modern.h  |  5 +++
> > > > > > > >  7 files changed, 137 insertions(+), 6 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/virtio/virtio.c
> > > > > > > > b/drivers/virtio/virtio.c index
> > > > > > > > 3893dc29eb26..f4080692b351 100644
> > > > > > > > --- a/drivers/virtio/virtio.c
> > > > > > > > +++ b/drivers/virtio/virtio.c
> > > > > > > > @@ -302,9 +302,15 @@ static int virtio_dev_probe(struct device
> *_d)
> > > > > > > > if (err)
> > > > > > > > goto err;
> > > > > > > >
> > > > > > > > +   if (dev->config->create_avq) {
> > > > > > > > +   err = dev->config->create_avq(dev);
> > > > > > > > +   if (err)
> > > > > > > > +   goto err;
> > > > > > > > +   }
> > > > > > > > +
> > > > > > > > err = drv->probe(dev);
> > > > > > > > if (err)
> > > > > > > > -   goto err;
> > > > > > > > +   goto err_probe;
> > > > > > > >
> > > > > > > > /* If probe didn't do it, mark device DRIVER_OK 
> > > > > > > > ourselves. */
> > > > > > > > if (!(dev->config->get_status(dev) &
> > > > > > > > VIRTIO_CONFIG_S_DRIVER_OK))
> > > > > > >
> > > > > > > Hmm I am not all that happy that we are just creating avq
> > > unconditionally.
> > > > > > > Can't we do it on demand to avoid wasting resources if no one uses
> it?
> > > > > > >
> > > > > > Virtio queues must be enabled before driver_ok as we discussed
> > > > > > in
> > > > > F_DYNAMIC bit exercise.
> > > > > > So creating AQ when first legacy command is invoked, would be too
> late.
> > > > >
> > > > > Well we didn't release the spec with AQ so I am pretty sure
> > > > > there are no devices using the feature. Do we want to already
> > > > > make an exception for AQ and allow creating AQs after DRIVER_OK
> > > > > even without
> > > F_DYNAMIC?
> > > > >
> > > > No. it would abuse the init time config registers for the dynamic
> > > > things like
> > > this.
> > > > For flow filters and others there is need for dynamic q creation
> > > > with multiple
> > > physical address anyway.
> > >
> > > That seems like a completely unrelated issue.
> > >
> > It isn't.
> > Driver requirements are:
> > 1. Driver needs to dynamically create vqs 2. Sometimes this VQ needs
> > to have multiple physical addresses 3. Driver needs to create them
> > after driver is fully running, past the bootstrap stage using tiny
> > config registers
> >
> > Device requirements are:
> > 1. Not to keep growing 64K VQs *(8+8+8) bytes of address registers +
> > enable bit 2. Ability to return appropriate error code when fail to
> > create queue 3. Above #2
> >
> > Users of this new infrastructure are eth tx,rx queues, flow filter queues, 
> > aq, blk
> rq per cpu.
> > AQs are just one of those.
> > When a generic infrastructure for this will be built in the spec as we 
> > started
> that, all above use cases will be handled.
> >
> > > > So creating virtqueues dynamically using a generic scheme is
> > > > desired with
> > > new feature bit.
> 
> Reducing config registers and returning errors should be handled by a new
> transport.
> VQ with multiple addresses - I can 

RE: [PATCH V2 vfio 2/9] virtio-pci: Introduce admin virtqueue

2023-10-30 Thread Parav Pandit via Virtualization



> From: Michael S. Tsirkin 
> Sent: Tuesday, October 31, 2023 5:02 AM
> 
> On Mon, Oct 30, 2023 at 06:10:06PM +, Parav Pandit wrote:
> >
> >
> > > From: Michael S. Tsirkin 
> > > Sent: Monday, October 30, 2023 9:29 PM On Mon, Oct 30, 2023 at
> > > 03:51:40PM +, Parav Pandit wrote:
> > > >
> > > >
> > > > > From: Michael S. Tsirkin 
> > > > > Sent: Monday, October 30, 2023 1:53 AM
> > > > >
> > > > > On Sun, Oct 29, 2023 at 05:59:45PM +0200, Yishai Hadas wrote:
> > > > > > From: Feng Liu 
> > > > > >
> > > > > > Introduce support for the admin virtqueue. By negotiating
> > > > > > VIRTIO_F_ADMIN_VQ feature, driver detects capability and
> > > > > > creates one administration virtqueue. Administration virtqueue
> > > > > > implementation in virtio pci generic layer, enables multiple
> > > > > > types of upper layer drivers such as vfio, net, blk to utilize it.
> > > > > >
> > > > > > Signed-off-by: Feng Liu 
> > > > > > Reviewed-by: Parav Pandit 
> > > > > > Reviewed-by: Jiri Pirko 
> > > > > > Signed-off-by: Yishai Hadas 
> > > > > > ---
> > > > > >  drivers/virtio/virtio.c| 37 ++--
> > > > > >  drivers/virtio/virtio_pci_common.c |  3 ++
> > > > > >  drivers/virtio/virtio_pci_common.h | 15 ++-
> > > > > >  drivers/virtio/virtio_pci_modern.c | 61
> +-
> > > > > >  drivers/virtio/virtio_pci_modern_dev.c | 18 
> > > > > >  include/linux/virtio_config.h  |  4 ++
> > > > > >  include/linux/virtio_pci_modern.h  |  5 +++
> > > > > >  7 files changed, 137 insertions(+), 6 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > > > > index
> > > > > > 3893dc29eb26..f4080692b351 100644
> > > > > > --- a/drivers/virtio/virtio.c
> > > > > > +++ b/drivers/virtio/virtio.c
> > > > > > @@ -302,9 +302,15 @@ static int virtio_dev_probe(struct device *_d)
> > > > > > if (err)
> > > > > > goto err;
> > > > > >
> > > > > > +   if (dev->config->create_avq) {
> > > > > > +   err = dev->config->create_avq(dev);
> > > > > > +   if (err)
> > > > > > +   goto err;
> > > > > > +   }
> > > > > > +
> > > > > > err = drv->probe(dev);
> > > > > > if (err)
> > > > > > -   goto err;
> > > > > > +   goto err_probe;
> > > > > >
> > > > > > /* If probe didn't do it, mark device DRIVER_OK ourselves. */
> > > > > > if (!(dev->config->get_status(dev) &
> > > > > > VIRTIO_CONFIG_S_DRIVER_OK))
> > > > >
> > > > > Hmm I am not all that happy that we are just creating avq
> unconditionally.
> > > > > Can't we do it on demand to avoid wasting resources if no one uses it?
> > > > >
> > > > Virtio queues must be enabled before driver_ok as we discussed in
> > > F_DYNAMIC bit exercise.
> > > > So creating AQ when first legacy command is invoked, would be too late.
> > >
> > > Well we didn't release the spec with AQ so I am pretty sure there
> > > are no devices using the feature. Do we want to already make an
> > > exception for AQ and allow creating AQs after DRIVER_OK even without
> F_DYNAMIC?
> > >
> > No. it would abuse the init time config registers for the dynamic things 
> > like
> this.
> > For flow filters and others there is need for dynamic q creation with 
> > multiple
> physical address anyway.
> 
> That seems like a completely unrelated issue.
> 
It isn't.
Driver requirements are:
1. The driver needs to dynamically create vqs
2. Sometimes this VQ needs to have multiple physical addresses
3. The driver needs to create them after the driver is fully running, past the
bootstrap stage that uses the tiny config registers

Device requirements are:
1. Not to keep growing 64K VQs * (8+8+8) bytes of address registers plus an
enable bit (see the common-config fields quoted below)
2. Ability to return an appropriate error code when a queue fails to be created
3. Above #2

Users of this new infrastructure are eth tx/rx queues, flow filter queues, AQs,
and per-CPU blk request queues.
AQs are just one of those.
When a generic infrastructure for this is built in the spec, as we have started
doing, all of the above use cases will be handled.
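
For reference, the per-VQ registers that device requirement #1 refers to; this
is an excerpt, lightly paraphrased, of struct virtio_pci_common_cfg from
include/uapi/linux/virtio_pci.h (the wrapper struct name below is only for
illustration):

struct virtio_pci_common_cfg_vq_regs {	/* illustrative name for the excerpt */
	__le16 queue_select;		/* read-write */
	__le16 queue_size;		/* read-write */
	__le16 queue_msix_vector;	/* read-write */
	__le16 queue_enable;		/* read-write */
	__le16 queue_notify_off;	/* read-only */
	__le32 queue_desc_lo;		/* read-write */
	__le32 queue_desc_hi;		/* read-write */
	__le32 queue_avail_lo;		/* read-write */
	__le32 queue_avail_hi;		/* read-write */
	__le32 queue_used_lo;		/* read-write */
	__le32 queue_used_hi;		/* read-write */
};

These are the three 64-bit addresses plus the enable bit that would otherwise
have to be replicated for every VQ of every member device.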

> > So creating virtqueues dynamically using a generic scheme is desired with
> new feature bit.



RE: [PATCH V2 vfio 2/9] virtio-pci: Introduce admin virtqueue

2023-10-30 Thread Parav Pandit via Virtualization



> From: Michael S. Tsirkin 
> Sent: Monday, October 30, 2023 9:29 PM
> On Mon, Oct 30, 2023 at 03:51:40PM +, Parav Pandit wrote:
> >
> >
> > > From: Michael S. Tsirkin 
> > > Sent: Monday, October 30, 2023 1:53 AM
> > >
> > > On Sun, Oct 29, 2023 at 05:59:45PM +0200, Yishai Hadas wrote:
> > > > From: Feng Liu 
> > > >
> > > > Introduce support for the admin virtqueue. By negotiating
> > > > VIRTIO_F_ADMIN_VQ feature, driver detects capability and creates
> > > > one administration virtqueue. Administration virtqueue
> > > > implementation in virtio pci generic layer, enables multiple types
> > > > of upper layer drivers such as vfio, net, blk to utilize it.
> > > >
> > > > Signed-off-by: Feng Liu 
> > > > Reviewed-by: Parav Pandit 
> > > > Reviewed-by: Jiri Pirko 
> > > > Signed-off-by: Yishai Hadas 
> > > > ---
> > > >  drivers/virtio/virtio.c| 37 ++--
> > > >  drivers/virtio/virtio_pci_common.c |  3 ++
> > > >  drivers/virtio/virtio_pci_common.h | 15 ++-
> > > >  drivers/virtio/virtio_pci_modern.c | 61 +-
> > > >  drivers/virtio/virtio_pci_modern_dev.c | 18 
> > > >  include/linux/virtio_config.h  |  4 ++
> > > >  include/linux/virtio_pci_modern.h  |  5 +++
> > > >  7 files changed, 137 insertions(+), 6 deletions(-)
> > > >
> > > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > > index
> > > > 3893dc29eb26..f4080692b351 100644
> > > > --- a/drivers/virtio/virtio.c
> > > > +++ b/drivers/virtio/virtio.c
> > > > @@ -302,9 +302,15 @@ static int virtio_dev_probe(struct device *_d)
> > > > if (err)
> > > > goto err;
> > > >
> > > > +   if (dev->config->create_avq) {
> > > > +   err = dev->config->create_avq(dev);
> > > > +   if (err)
> > > > +   goto err;
> > > > +   }
> > > > +
> > > > err = drv->probe(dev);
> > > > if (err)
> > > > -   goto err;
> > > > +   goto err_probe;
> > > >
> > > > /* If probe didn't do it, mark device DRIVER_OK ourselves. */
> > > > if (!(dev->config->get_status(dev) & VIRTIO_CONFIG_S_DRIVER_OK))
> > >
> > > Hmm I am not all that happy that we are just creating avq unconditionally.
> > > Can't we do it on demand to avoid wasting resources if no one uses it?
> > >
> > Virtio queues must be enabled before driver_ok as we discussed in
> F_DYNAMIC bit exercise.
> > So creating AQ when first legacy command is invoked, would be too late.
> 
> Well we didn't release the spec with AQ so I am pretty sure there are no 
> devices
> using the feature. Do we want to already make an exception for AQ and allow
> creating AQs after DRIVER_OK even without F_DYNAMIC?
> 
No. It would abuse the init-time config registers for dynamic things like
this.
For flow filters and others there is a need for dynamic queue creation with
multiple physical addresses anyway.
So creating virtqueues dynamically using a generic scheme, with a new feature
bit, is desired.


RE: [PATCH V2 vfio 2/9] virtio-pci: Introduce admin virtqueue

2023-10-30 Thread Parav Pandit via Virtualization



> From: Michael S. Tsirkin 
> Sent: Monday, October 30, 2023 1:53 AM
> 
> On Sun, Oct 29, 2023 at 05:59:45PM +0200, Yishai Hadas wrote:
> > From: Feng Liu 
> >
> > Introduce support for the admin virtqueue. By negotiating
> > VIRTIO_F_ADMIN_VQ feature, driver detects capability and creates one
> > administration virtqueue. Administration virtqueue implementation in
> > virtio pci generic layer, enables multiple types of upper layer
> > drivers such as vfio, net, blk to utilize it.
> >
> > Signed-off-by: Feng Liu 
> > Reviewed-by: Parav Pandit 
> > Reviewed-by: Jiri Pirko 
> > Signed-off-by: Yishai Hadas 
> > ---
> >  drivers/virtio/virtio.c| 37 ++--
> >  drivers/virtio/virtio_pci_common.c |  3 ++
> >  drivers/virtio/virtio_pci_common.h | 15 ++-
> >  drivers/virtio/virtio_pci_modern.c | 61 +-
> >  drivers/virtio/virtio_pci_modern_dev.c | 18 
> >  include/linux/virtio_config.h  |  4 ++
> >  include/linux/virtio_pci_modern.h  |  5 +++
> >  7 files changed, 137 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c index
> > 3893dc29eb26..f4080692b351 100644
> > --- a/drivers/virtio/virtio.c
> > +++ b/drivers/virtio/virtio.c
> > @@ -302,9 +302,15 @@ static int virtio_dev_probe(struct device *_d)
> > if (err)
> > goto err;
> >
> > +   if (dev->config->create_avq) {
> > +   err = dev->config->create_avq(dev);
> > +   if (err)
> > +   goto err;
> > +   }
> > +
> > err = drv->probe(dev);
> > if (err)
> > -   goto err;
> > +   goto err_probe;
> >
> > /* If probe didn't do it, mark device DRIVER_OK ourselves. */
> > if (!(dev->config->get_status(dev) & VIRTIO_CONFIG_S_DRIVER_OK))
> 
> Hmm I am not all that happy that we are just creating avq unconditionally.
> Can't we do it on demand to avoid wasting resources if no one uses it?
>
Virtio queues must be enabled before DRIVER_OK, as we discussed in the F_DYNAMIC
bit exercise.
So creating the AQ when the first legacy command is invoked would be too late.

> 
> > @@ -316,6 +322,10 @@ static int virtio_dev_probe(struct device *_d)
> > virtio_config_enable(dev);
> >
> > return 0;
> > +
> > +err_probe:
> > +   if (dev->config->destroy_avq)
> > +   dev->config->destroy_avq(dev);
> >  err:
> > virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
> > return err;
> > @@ -331,6 +341,9 @@ static void virtio_dev_remove(struct device *_d)
> >
> > drv->remove(dev);
> >
> > +   if (dev->config->destroy_avq)
> > +   dev->config->destroy_avq(dev);
> > +
> > /* Driver should have reset device. */
> > WARN_ON_ONCE(dev->config->get_status(dev));
> >
> > @@ -489,13 +502,20 @@ EXPORT_SYMBOL_GPL(unregister_virtio_device);
> >  int virtio_device_freeze(struct virtio_device *dev)  {
> > struct virtio_driver *drv = drv_to_virtio(dev->dev.driver);
> > +   int ret;
> >
> > virtio_config_disable(dev);
> >
> > dev->failed = dev->config->get_status(dev) &
> VIRTIO_CONFIG_S_FAILED;
> >
> > -   if (drv && drv->freeze)
> > -   return drv->freeze(dev);
> > +   if (drv && drv->freeze) {
> > +   ret = drv->freeze(dev);
> > +   if (ret)
> > +   return ret;
> > +   }
> > +
> > +   if (dev->config->destroy_avq)
> > +   dev->config->destroy_avq(dev);
> >
> > return 0;
> >  }
> > @@ -532,10 +552,16 @@ int virtio_device_restore(struct virtio_device *dev)
> > if (ret)
> > goto err;
> >
> > +   if (dev->config->create_avq) {
> > +   ret = dev->config->create_avq(dev);
> > +   if (ret)
> > +   goto err;
> > +   }
> > +
> > if (drv->restore) {
> > ret = drv->restore(dev);
> > if (ret)
> > -   goto err;
> > +   goto err_restore;
> > }
> >
> > /* If restore didn't do it, mark device DRIVER_OK ourselves. */ @@
> > -546,6 +572,9 @@ int virtio_device_restore(struct virtio_device *dev)
> >
> > return 0;
> >
> > +err_restore:
> > +   if (dev->config->destroy_avq)
> > +   dev->config->destroy_avq(dev);
> >  err:
> > virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
> > return ret;
> > diff --git a/drivers/virtio/virtio_pci_common.c
> > b/drivers/virtio/virtio_pci_common.c
> > index c2524a7207cf..6b4766d5abe6 100644
> > --- a/drivers/virtio/virtio_pci_common.c
> > +++ b/drivers/virtio/virtio_pci_common.c
> > @@ -236,6 +236,9 @@ void vp_del_vqs(struct virtio_device *vdev)
> > int i;
> >
> > list_for_each_entry_safe(vq, n, &vdev->vqs, list) {
> > +   if (vp_dev->is_avq(vdev, vq->index))
> > +   continue;
> > +
> > if (vp_dev->per_vq_vectors) {
> > int v = vp_dev->vqs[vq->index]->msix_vector;
> >
> > diff --git a/drivers/virtio/virtio_pci_common.h
> > b/drivers/virtio/virtio_pci_common.h
> > index 4b773bd7c58c..e03af0966a4b 100644
> 
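
For context on the is_avq() check in the vp_del_vqs() hunk above, here is a
minimal sketch of how such a helper could identify the admin VQ by its queue
index (an illustration under assumed field names, not necessarily the exact
code in this series):

static bool vp_is_avq(struct virtio_device *vdev, unsigned int index)
{
	struct virtio_pci_device *vp_dev = to_vp_device(vdev);

	/* Only a device offering VIRTIO_F_ADMIN_VQ has an admin VQ at all. */
	if (!virtio_has_feature(vdev, VIRTIO_F_ADMIN_VQ))
		return false;

	/* The admin VQ is the one set up by create_avq(); match its index. */
	return index == vp_dev->admin_vq.vq_index;
}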

RE: [PATCH V1 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices

2023-10-26 Thread Parav Pandit via Virtualization


> From: Michael S. Tsirkin 
> Sent: Thursday, October 26, 2023 9:16 PM

> On Thu, Oct 26, 2023 at 03:09:13PM +, Parav Pandit wrote:
> >
> > > From: Michael S. Tsirkin 
> > > Sent: Thursday, October 26, 2023 8:36 PM
> > >
> > > On Thu, Oct 26, 2023 at 01:28:18PM +, Parav Pandit wrote:
> > > >
> > > > > From: Michael S. Tsirkin 
> > > > > Sent: Thursday, October 26, 2023 6:45 PM
> > > >
> > > > > > Followed by an open coded driver check for 0x1000 to 0x103f range.
> > > > > > Do you mean windows driver expects specific subsystem vendor
> > > > > > id of
> > > 0x1af4?
> > > > >
> > > > > Look it up, it's open source.
> > > >
> > > > Those are not OS inbox drivers anyway.
> > > > :)
> > >
> > > Does not matter at all if guest has drivers installed.
> > > Either you worry about legacy guests or not.
> > >
> > So, Linux guests have inbox drivers, that we care about and they seems to be
> covered, right?
> >
> > >
> > > > The current vfio driver is following the virtio spec based on
> > > > legacy spec, 1.x
> > > spec following the transitional device sections.
> > > > There is no need to do something out of spec at this point.
> > >
> > > legacy spec wasn't maintained properly, drivers diverged sometimes
> > > significantly. what matters is installed base.
> >
> > So if you know the subsystem vendor id that Windows expects, please
> > share, so we can avoid playing puzzle game. :) It anyway can be reported by
> the device itself.
> 
> I don't know myself offhand. I just know it's not so simple. Looking at the 
> source
> for network drivers I see:
> 
> %kvmnet6.DeviceDesc%= kvmnet6.ndi,
> PCI\VEN_1AF4_1000_0001_INX_SUBSYS_VENDOR_ID_00,
> PCI\VEN_1AF4_1000
> 
Yeah, I was checking the cryptic notation at 
https://github.com/virtio-win/kvm-guest-drivers-windows/tree/master

> 
> So the drivers will:
> A. bind with high priority to subsystem vendor ID used when drivers where 
> built.
>popular drivers built and distributed for free by Red Hat have 1AF4 B. bind
> with low priority to any subsystem device/vendor id as long as
>vendor is 1af4 and device is 1000
> 
> 
> My conclusions:
> - you probably need a way to tweak subsystem vendor id in software
> - default should probably be 1AF4 not whatever actual device uses
Ok.
It is not mandatory, but if you and Alex are OK with also tweaking the
subsystem vendor id, it seems fine to me. It does not hurt anything.
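
If such a software tweak is added, one possible shape is to overlay the value
in the variant driver's config-space read path. The sketch below is purely
hypothetical; the function and macro names are illustrative, not the actual
driver code:

#include <linux/pci_regs.h>

#define VIRTIOVF_DEFAULT_SUBSYS_VENDOR	0x1af4	/* default suggested above */

/* Hypothetical helper: patch the bytes of PCI_SUBSYSTEM_VENDOR_ID (offset
 * 0x2c, 2 bytes) in a config-space read buffer before returning it to the
 * guest/user space. */
static void virtiovf_overlay_subsys_vendor(u8 *buf, loff_t offset, size_t count)
{
	__le16 id = cpu_to_le16(VIRTIOVF_DEFAULT_SUBSYS_VENDOR);
	loff_t reg = PCI_SUBSYSTEM_VENDOR_ID;
	size_t i;

	for (i = 0; i < count; i++) {
		loff_t pos = offset + i;

		if (pos >= reg && pos < reg + sizeof(id))
			buf[i] = ((u8 *)&id)[pos - reg];
	}
}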


RE: [PATCH V1 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices

2023-10-26 Thread Parav Pandit via Virtualization


> From: Michael S. Tsirkin 
> Sent: Thursday, October 26, 2023 8:36 PM
> 
> On Thu, Oct 26, 2023 at 01:28:18PM +, Parav Pandit wrote:
> >
> > > From: Michael S. Tsirkin 
> > > Sent: Thursday, October 26, 2023 6:45 PM
> >
> > > > Followed by an open coded driver check for 0x1000 to 0x103f range.
> > > > Do you mean windows driver expects specific subsystem vendor id of
> 0x1af4?
> > >
> > > Look it up, it's open source.
> >
> > Those are not OS inbox drivers anyway.
> > :)
> 
> Does not matter at all if guest has drivers installed.
> Either you worry about legacy guests or not.
> 
So, Linux guests have inbox drivers that we care about, and they seem to be
covered, right?

> 
> > The current vfio driver is following the virtio spec based on legacy spec, 
> > 1.x
> spec following the transitional device sections.
> > There is no need to do something out of spec at this point.
> 
> legacy spec wasn't maintained properly, drivers diverged sometimes
> significantly. what matters is installed base.

So if you know the subsystem vendor id that Windows expects, please share it, so
we can avoid playing a puzzle game. :)
It can anyway be reported by the device itself.


RE: [PATCH V1 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices

2023-10-26 Thread Parav Pandit via Virtualization


> From: Michael S. Tsirkin 
> Sent: Thursday, October 26, 2023 6:45 PM

> > Followed by an open coded driver check for 0x1000 to 0x103f range.
> > Do you mean windows driver expects specific subsystem vendor id of 0x1af4?
> 
> Look it up, it's open source.

Those are not OS inbox drivers anyway.
:)
The current vfio driver follows the virtio spec: the legacy spec and the 1.x
spec's transitional device sections.
There is no need to do something out of spec at this point.


RE: [PATCH V1 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices

2023-10-26 Thread Parav Pandit via Virtualization
> From: Michael S. Tsirkin 
> Sent: Thursday, October 26, 2023 5:42 PM
> 
> On Thu, Oct 26, 2023 at 03:08:12PM +0300, Yishai Hadas wrote:
> > > > Makes sense ?
> > > So do I understand correctly that virtio dictates the subsystem
> > > device ID for all subsystem vendor IDs that implement a legacy
> > > virtio interface?  Ok, but this device didn't actually implement a
> > > legacy virtio interface.  The device itself is not tranistional,
> > > we're imposing an emulated transitional interface onto it.  So did
> > > the subsystem vendor agree to have their subsystem device ID managed
> > > by the virtio committee or might we create conflicts?  I imagine we
> > > know we don't have a conflict if we also virtualize the subsystem vendor 
> > > ID.
> > >
> > The non transitional net device in the virtio spec defined as the
> > below tuple.
> > T_A: VID=0x1AF4, DID=0x1040, Subsys_VID=FOO, Subsys_DID=0x40.
> >
> > And transitional net device in the virtio spec for a vendor FOO is
> > defined
> > as:
> > T_B: VID=0x1AF4,DID=0x1000,Subsys_VID=FOO, subsys_DID=0x1
> >
> > This driver is converting T_A to T_B, which both are defined by the
> > virtio spec.
> > Hence, it does not conflict for the subsystem vendor, it is fine.
> 
> You are talking about legacy guests, what 1.X spec says about them is much 
> less
> important than what guests actually do.
> Check the INF of the open source windows drivers and linux code, at least.

Linux legacy guest has,

static struct pci_device_id virtio_pci_id_table[] = {
{ 0x1af4, PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0 },
{ 0 },
};
Followed by an open coded driver check for 0x1000 to 0x103f range.
Do you mean windows driver expects specific subsystem vendor id of 0x1af4?
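
For reference, the open-coded range check mentioned above is roughly the
following (paraphrased from the legacy virtio PCI probe path in
drivers/virtio/virtio_pci_legacy.c):

	/* We only own devices >= 0x1000 and <= 0x103f: leave the rest. */
	if (pci_dev->device < 0x1000 || pci_dev->device > 0x103f)
		return -ENODEV;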


RE: [PATCH V1 vfio 9/9] vfio/virtio: Introduce a vfio driver over virtio devices

2023-10-18 Thread Parav Pandit via Virtualization


> From: Alex Williamson 
> Sent: Wednesday, October 18, 2023 6:22 PM

> Are we realistically extending this beyond virtio-net?  Maybe all the 
> descriptions
> should be limited to what is actually supported as proposed rather than
> aspirational goals.  Thanks,
Virtio-blk would be the second user of it.
The series didn't cover testing of virtio-blk, mainly due to time limitations,
which is why Yishai only included the net device id.


RE: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices

2023-10-12 Thread Parav Pandit via Virtualization


> From: Michael S. Tsirkin 
> Sent: Thursday, October 12, 2023 5:00 PM

> I am instead talking about devices that work with existing legacy linux 
> drivers
> with no traps.
> 
Yep, I understood.

> > I am not expecting OASIS to do anything extra for legacy registers.
> >
> > [1] The device MUST reset when 0 is written to device_status, and present a 0
> > in device_status once that is done.
> > [2] After writing 0 to device_status, the driver MUST wait for a read
> > of device_status to return 0 before reinitializing the device.
> 
> We can add a note explaining that legacy drivers do not wait after doing 
> reset,
> that is not a problem.
> If someone wants to make a device that works with existing legacy linux 
> drivers,
> they can do that.
> Won't work with all drivers though, which is why oasis did not want to
> standardize this.

Ok. thanks.


RE: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices

2023-10-12 Thread Parav Pandit via Virtualization


> From: Michael S. Tsirkin 
> Sent: Thursday, October 12, 2023 4:23 PM
> 
> On Tue, Sep 26, 2023 at 03:45:36AM +, Parav Pandit wrote:
> >
> >
> > > From: Michael S. Tsirkin 
> > > Sent: Tuesday, September 26, 2023 12:06 AM
> >
> > > One can thinkably do that wait in hardware, though. Just defer
> > > completion until read is done.
> > >
> > Once OASIS does such new interface and if some hw vendor _actually_ wants
> to do such complex hw, may be vfio driver can adopt to it.
> 
> The reset behaviour I describe is already in the spec. What else do you want
> OASIS to standardize? Virtio currently is just a register map it does not yet
> include suggestions on how exactly do pci express transactions look. You feel 
> we
> should add that?

The reset behavior in the spec for modern devices, as listed in [1] and [2], is just fine.

What I meant is in the context of having MMIO-based legacy registers to "defer
completion until read is done".
I think you meant "just defer the read completion until reset is done".
This means the hw needs to finish the device reset, for thousands of devices,
within the PCI read completion timeout.
So if and when OASIS does such standardization, someone can implement it.

What I recollect is that OASIS did not standardize such an anti-scale approach
and took the admin command approach, which achieves better scale.
Hope I clarified.

I am not expecting OASIS to do anything extra for legacy registers.

[1] The device MUST reset when 0 is written to device_status, and present a 0 
in device_status once that is done.
[2] After writing 0 to device_status, the driver MUST wait for a read of 
device_status to return 0 before reinitializing
the device.


RE: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device

2023-10-11 Thread Parav Pandit via Virtualization
Hi Christoph,

> From: Christoph Hellwig 
> Sent: Wednesday, October 11, 2023 12:29 PM
> 
> On Wed, Oct 11, 2023 at 02:43:37AM -0400, Michael S. Tsirkin wrote:
> > > Btw, what is that intel thing everyone is talking about?  And why
> > > would the virtio core support vendor specific behavior like that?
> >
> > It's not a thing it's Zhu Lingshan :) intel is just one of the vendors
> > that implemented vdpa support and so Zhu Lingshan from intel is
> > working on vdpa and has also proposed virtio spec extensions for migration.
> > intel's driver is called ifcvf.  vdpa composes all this stuff that is
> > added to vfio in userspace, so it's a different approach.
> 
> Well, so let's call it virtio live migration instead of intel.
> 
> And please work all together in the virtio committee that you have one way of
> communication between controlling and controlled functions.
> If one extension does it one way and the other a different way that's just
> creating a giant mess.

We in virtio committee are working on VF device migration where:
VF = controlled function
PF = controlling function

The second proposal is what Michael mentioned from Intel, which somehow combines
the controlled and controlling functions as a single entity on the VF.

The main reasons I find it weird are:
1. It must always do mediation to fake the device reset and FLR flows
2. DMA cannot work, as you explained, for complex device state
3. It needs constant knowledge of every tiny detail of each virtio device type

Such a single entity appears very weird to me, but maybe it is just me.


RE: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device

2023-10-10 Thread Parav Pandit via Virtualization


> From: Jason Gunthorpe 
> Sent: Tuesday, October 10, 2023 9:37 PM
> 
> On Tue, Oct 10, 2023 at 12:03:29PM -0400, Michael S. Tsirkin wrote:
> > On Tue, Oct 10, 2023 at 12:59:37PM -0300, Jason Gunthorpe wrote:
> > > On Tue, Oct 10, 2023 at 11:14:56AM -0400, Michael S. Tsirkin wrote:
> > >
> > > > I suggest 3 but call it on the VF. commands will switch to PF
> > > > internally as needed. For example, intel might be interested in
> > > > exposing admin commands through a memory BAR of VF itself.

If in the future one does admin commands on the VF memory BAR, there is no
need for a cast either.
The vfio-virtio-pci driver can do it on the PCI VF device directly.

(Though per-VF memory registers would be an anti-scale design for real hw; to
be discussed in another forum.)


RE: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device

2023-10-10 Thread Parav Pandit via Virtualization



> From: Yishai Hadas 
> Sent: Tuesday, October 10, 2023 9:14 PM
> 
> On 10/10/2023 18:14, Michael S. Tsirkin wrote:
> > On Tue, Oct 10, 2023 at 06:09:44PM +0300, Yishai Hadas wrote:
> >> On 10/10/2023 17:54, Michael S. Tsirkin wrote:
> >>> On Tue, Oct 10, 2023 at 11:08:49AM -0300, Jason Gunthorpe wrote:
>  On Tue, Oct 10, 2023 at 09:56:00AM -0400, Michael S. Tsirkin wrote:
> 
> >> However - the Intel GPU VFIO driver is such a bad experiance I
> >> don't want to encourage people to make VFIO drivers, or code that
> >> is only used by VFIO drivers, that are not under drivers/vfio review.
> > So if Alex feels it makes sense to add some virtio functionality
> > to vfio and is happy to maintain or let you maintain the UAPI then
> > why would I say no? But we never expected devices to have two
> > drivers like this does, so just exposing device pointer and saying
> > "use regular virtio APIs for the rest" does not cut it, the new
> > APIs have to make sense so virtio drivers can develop normally
> > without fear of stepping on the toes of this admin driver.
>  Please work with Yishai to get something that make sense to you. He
>  can post a v2 with the accumulated comments addressed so far and
>  then go over what the API between the drivers is.
> 
>  Thanks,
>  Jason
> >>> /me shrugs. I pretty much posted suggestions already. Should not be hard.
> >>> Anything unclear - post on list.
> >>>
> >> Yes, this is the plan.
> >>
> >> We are working to address the comments that we got so far in both
> >> VFIO & VIRTIO, retest and send the next version.
> >>
> >> Re the API between the modules, It looks like we have the below
> >> alternatives.
> >>
> >> 1) Proceed with current approach where we exposed a generic API to
> >> execute any admin command, however, make it much more solid inside
> VIRTIO.
> >> 2) Expose extra APIs from VIRTIO for commands that we can consider
> >> future client usage of them as of LIST_QUERY/LIST_USE, however still
> >> have the generic execute admin command for others.
> >> 3) Expose API per command from VIRTIO and fully drop the generic
> >> execute admin command.
> >>
> >> Few notes:
> >> Option #1 looks the most generic one, it drops the need to expose
> >> multiple symbols / APIs per command and for now we have a single
> >> client for them (i.e. VFIO).
> >> Options #2 & #3, may still require to expose the
> >> virtio_pci_vf_get_pf_dev() API to let VFIO get the VIRTIO PF (struct
> >> virtio_device *) from its PCI device, each command will get it as its first
> argument.
> >>
> >> Michael,
> >> What do you suggest here ?
> >>
> >> Thanks,
> >> Yishai
> > I suggest 3 but call it on the VF. commands will switch to PF
> > internally as needed. For example, intel might be interested in
> > exposing admin commands through a memory BAR of VF itself.
> >
> The driver who owns the VF is VFIO, it's not a VIRTIO one.
> 
> The ability to get the VIRTIO PF is from the PCI device (i.e. struct pci_dev).
> 
> In addition,
> virtio_pci_vf_get_pf_dev() was implemented for now in virtio-pci as it worked
> on pci_dev.
> Assuming that we'll put each command inside virtio as the generic layer, we
> won't be able to call/use this API internally to get the PF as of cyclic
> dependencies between the modules, link will fail.
> 
> So in option #3 we may still need to get outside into VFIO the VIRTIO PF and
> give it as pointer to VIRTIO upon each command.
>
I think that for #3 the virtio-level API signature should be:

virtio_admin_legacy_xyz_cmd(struct virtio_device *dev, u64 group_member_id, ...);

This maintains the right abstraction between the vfio, generic virtio and
virtio transport (pci) layers.
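
For illustration only, here is a sketch of how option #3 could look with that
signature style; the command names and parameter lists are assumptions for the
sake of the example, not a final API:

/* Hypothetical per-command wrappers exported by the virtio core; the PF
 * virtio_device plus the group member id identify the target VF. */
int virtio_admin_legacy_common_cfg_write(struct virtio_device *pf_vdev,
					 u64 group_member_id,
					 u8 offset, u8 size, const void *buf);
int virtio_admin_legacy_common_cfg_read(struct virtio_device *pf_vdev,
					u64 group_member_id,
					u8 offset, u8 size, void *buf);
int virtio_admin_legacy_device_cfg_write(struct virtio_device *pf_vdev,
					 u64 group_member_id,
					 u8 offset, u8 size, const void *buf);
int virtio_admin_legacy_device_cfg_read(struct virtio_device *pf_vdev,
					u64 group_member_id,
					u8 offset, u8 size, void *buf);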


RE: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices

2023-10-02 Thread Parav Pandit via Virtualization



> From: Michael S. Tsirkin 
> Sent: Friday, September 22, 2023 9:23 PM

> > +static int virtiovf_pci_probe(struct pci_dev *pdev,
> > + const struct pci_device_id *id) {
> > +   const struct vfio_device_ops *ops = _acc_vfio_pci_ops;
> > +   struct virtiovf_pci_core_device *virtvdev;
> > +   int ret;
> > +
> > +   if (pdev->is_virtfn && virtiovf_support_legacy_access(pdev) &&
> > +   !virtiovf_bar0_exists(pdev) && pdev->msix_cap)
> 
> I see this is the reason you set MSIX to true. But I think it's a 
> misunderstanding -
> that true means MSIX is enabled by guest, not that it exists.

The MSI-X check here just looks like a sanity check to make sure that the guest
can enable MSI-X.
The MSI-X enable check should be in the read()/write() calls to decide which AQ
command to choose,
i.e. whether to access the common config or the device config, as written in
the virtio spec.

Yishai, please fix the read()/write() calls to dynamically use the offset of
24/20 based on the MSI-X enabled state.
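
For reference, the 24/20 offsets come from the legacy virtio PCI layout: the
legacy common config header is 20 bytes, and MSI-X adds two 16-bit vector
registers, so the device-specific config starts at 24 when MSI-X is enabled.
The uapi header already captures this (from include/uapi/linux/virtio_pci.h):

#define VIRTIO_PCI_CONFIG_OFF(msix_enabled)	((msix_enabled) ? 24 : 20)

So the read()/write() paths could simply compute the base as
VIRTIO_PCI_CONFIG_OFF(msix_enabled_by_guest), where msix_enabled_by_guest is
whatever MSI-X enable state the variant driver tracks for the guest (the name
here is illustrative).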


RE: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices

2023-10-01 Thread Parav Pandit via Virtualization



> From: Michael S. Tsirkin 
> Sent: Tuesday, September 26, 2023 10:30 PM

> For example, a transitional device
> must not in theory be safely passed through to guest userspace, because guest
> then might try to use it through the legacy BAR without acknowledging
> ACCESS_PLATFORM.
> Do any guests check this and fail? Hard to say.
>
ACCESS_PLATFORM is not offered on the legacy interface because the legacy
interface spec 0.9.5 didn't have it.
Whether the guest VM maps it to user space and uses GIOVA is completely unknown
to the device.
And all of this is just fine, because the IOMMU through vfio takes care of the
necessary translation, with or without mapping the transitional device to guest
user space.

Hence, it is not a compat problem.
Anyway, a user will only attach a virtio device to the vfio-virtio device when
the user cares to expose a transitional device in the guest.

I can see that in the future, when a user wants to do this optionally, a
devlink/sysfs knob will be added; at that point, one needs to have a
disable_transitional flag.
So it may be worth optionally enabling transitional support on user request, as
Michael suggested.


RE: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices

2023-09-25 Thread Parav Pandit via Virtualization



> From: Jason Wang 
> Sent: Tuesday, September 26, 2023 10:08 AM

> Right, so if we'd consider migration from virtio to vDPA, it needs to be 
> designed
> in a way that allows more involvement from hypervisor other than coupling it
> with a specific interface (like admin virtqueues).
It is not tied to the admin virtqueues.
One way to use it is via admin commands, as in [1].
One can define it without admin commands by explaining the technical
difficulties where admin commands may not or cannot work.

[1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html


RE: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices

2023-09-25 Thread Parav Pandit via Virtualization



> From: Jason Wang 
> Sent: Tuesday, September 26, 2023 10:07 AM


> 
> If you can't find a way to make legacy drivers work, use modern.
>
Understood.
This vfio series makes the legacy drivers work.
Thanks.
 
> That's it.
> 
> Thanks



RE: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices

2023-09-25 Thread Parav Pandit via Virtualization



> From: Jason Wang 
> Sent: Tuesday, September 26, 2023 8:03 AM
> 
> It's the implementation details in legacy. The device needs to make sure 
> (reset)
> the driver can work (is done before get_status return).
It is part of the 0.9.5 and 1.x specifications, as I quoted the text above.


RE: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices

2023-09-25 Thread Parav Pandit via Virtualization



> From: Michael S. Tsirkin 
> Sent: Tuesday, September 26, 2023 12:06 AM

> One can thinkably do that wait in hardware, though. Just defer completion 
> until
> read is done.
>
Once OASIS defines such a new interface, and if some hw vendor _actually_ wants
to build such complex hw, maybe the vfio driver can adapt to it.
When we worked with you, we discussed that such hw does not have enough returns,
and hence the technical committee chose to proceed with admin commands.
I will skip re-discussing it all over again here.

The current virtio spec delivers the best trade-offs of functionality,
performance and lightweight implementation, with a forward path towards more
features, such as migration, as Jason explained.
All with near-zero driver, qemu and sw involvement for a rapidly growing feature
set...


RE: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices

2023-09-25 Thread Parav Pandit via Virtualization


> From: Jason Wang 
> Sent: Monday, September 25, 2023 8:00 AM
> 
> On Fri, Sep 22, 2023 at 8:25 PM Parav Pandit  wrote:
> >
> >
> > > From: Jason Gunthorpe 
> > > Sent: Friday, September 22, 2023 5:53 PM
> >
> >
> > > > And what's more, using MMIO BAR0 then it can work for legacy.
> > >
> > > Oh? How? Our team didn't think so.
> >
> > It does not. It was already discussed.
> > The device reset in legacy is not synchronous.
> 
> How do you know this?
>
Not sure of the motivation for repeating the same discussion already done in
OASIS with you and others in the past.

Anyway, please find the answer below.

About reset:
The legacy device specification did not enforce the below-cited 1.0 driver
requirement.

"The driver SHOULD consider a driver-initiated reset complete when it reads 
device status as 0."
 
[1] https://ozlabs.org/~rusty/virtio-spec/virtio-0.9.5.pdf

> > The drivers do not wait for reset to complete; it was written for the sw
> backend.
> 
> Do you see there's a flush after reset in the legacy driver?
> 
Yes, but it only flushes the write by reading it back. The driver does not
_wait_ for the reset to complete within the device, like the above.

Please see the reset flow of the 1.x device below.
In fact, the comment in the 1.x vp_reset() also needs to be updated to indicate
that the driver needs to wait for the device to finish the reset.
I will send a separate patch improving this comment of vp_reset() to match
the spec.

static void vp_reset(struct virtio_device *vdev)
{
	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
	struct virtio_pci_modern_device *mdev = &vp_dev->mdev;

	/* 0 status means a reset. */
	vp_modern_set_status(mdev, 0);
	/* After writing 0 to device_status, the driver MUST wait for a read of
	 * device_status to return 0 before reinitializing the device.
	 * This will flush out the status write, and flush in device writes,
	 * including MSI-X interrupts, if any.
	 */
	while (vp_modern_get_status(mdev))
		msleep(1);
	/* Flush pending VQ/configuration callbacks. */
	vp_synchronize_vectors(vdev);
}


> static void vp_reset(struct virtio_device *vdev)
> {
> 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
> 	/* 0 status means a reset. */
> 	vp_legacy_set_status(&vp_dev->ldev, 0);
> 	/* Flush out the status write, and flush in device writes,
> 	 * including MSi-X interrupts, if any. */
> 	vp_legacy_get_status(&vp_dev->ldev);
> 	/* Flush pending VQ/configuration callbacks. */
> 	vp_synchronize_vectors(vdev);
> }
> 
> Thanks
> 
> 
> 
> > Hence MMIO BAR0 is not the best option in real implementations.
> >


RE: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices

2023-09-22 Thread Parav Pandit via Virtualization


> From: Jason Gunthorpe 
> Sent: Friday, September 22, 2023 6:07 PM
> 
> On Thu, Sep 21, 2023 at 01:58:32PM -0600, Alex Williamson wrote:
> 
> > If the heart of this driver is simply pretending to have an I/O BAR
> > where I/O accesses into that BAR are translated to accesses in the
> > MMIO BAR, why can't this be done in the VMM, ie. QEMU?
> 
> That isn't exactly what it does, the IO bar access is translated into an admin
> queue command on the PF and excuted by the PCI function.
> 
> So it would be difficult to do that in qemu without also somehow wiring up
> qemu to access the PF's kernel driver's admin queue.
> 
> It would have been nice if it was a trivial 1:1 translation to the MMIO bar, 
> but it
> seems that didn't entirely work with existing VMs. So OASIS standardized this
> approach.
> 
> The bigger picture is there is also a live migration standard & driver in the
> works that will re-use all this admin queue infrastructure anyhow, so the best
> course is to keep this in the kernel.

Additionally, in the future the AQ of the PF will also be used to provision the
VFs (the virtio OASIS spec calls them member devices); such a framework also
resides in the kernel.
Such PFs are in use by the kernel driver.

+1 for keeping this framework in the kernel.


RE: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices

2023-09-22 Thread Parav Pandit via Virtualization


> From: Jason Gunthorpe 
> Sent: Friday, September 22, 2023 5:53 PM


> > And what's more, using MMIO BAR0 then it can work for legacy.
> 
> Oh? How? Our team didn't think so.

It does not. It was already discussed.
The device reset in legacy is not synchronous.
The drivers do not wait for reset to complete; it was written for the sw 
backend.
Hence MMIO BAR0 is not the best option in real implementations.


RE: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices

2023-09-21 Thread Parav Pandit via Virtualization



> From: Michael S. Tsirkin 
> Sent: Thursday, September 21, 2023 10:31 PM

> Another question I'm interested in is whether there's actually a performance
> benefit to using this as compared to just software vhost. I note there's a VM 
> exit
> on each IO access, so ... perhaps?
> Would be nice to see some numbers.

Packet rate and bandwidth are close: only about 10% lower than a modern device,
due to the batching of driver notifications.
Bandwidth was tested with iperf with one and multiple queues.
Packet rate was tested with testpmd.


RE: [PATCH net v3] virtio_net: Fix error unwinding of XDP initialization

2023-05-02 Thread Parav Pandit via Virtualization


> From: Feng Liu 
> Sent: Tuesday, May 2, 2023 8:35 PM

> Issue: 3383038
Remove this internal garbage.

> Fixes: 754b8a21a96d ("virtio_net: setup xdp_rxq_info")
> Signed-off-by: Feng Liu 
> Reviewed-by: William Tu 
> Reviewed-by: Parav Pandit 
> Reviewed-by: Simon Horman 
> Acked-by: Michael S. Tsirkin 

> Change-Id: Ib4c6a97cb7b837cfa484c593dd43a435c47ea68f
Remove this internal garbage.
Please run a local script so as not to forward the above garbage in upstream
patches.

You are missing the changelog from v0->v1->v2->v3.
Please add it and resend with the above things removed and with a changelog.


Re: [PATCH 0/1] posted interrtups support for mlx5

2023-04-03 Thread Parav Pandit via Virtualization




On 4/3/2023 12:20 PM, Eli Cohen wrote:

Hi,

the single patch in this series adds support for posted interrupts in
mlx5.

It depends on the patch series already accepted, which can be seen here:
https://patchwork.kernel.org/project/netdevbpf/patch/20230324231341.29808-1-sa...@kernel.org/

git pull  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git
tags/mlx5-updates-2023-03-20

Eli Cohen (1):
   vdpa/mlx5: Support interrupt bypassing

  drivers/vdpa/mlx5/net/mlx5_vnet.c | 139 --
  drivers/vdpa/mlx5/net/mlx5_vnet.h |  14 +++
  2 files changed, 144 insertions(+), 9 deletions(-)


In subject

s/interrtups/interrupts/


Re: [PATCH 1/1] vdpa/mlx5: Support interrupt bypassing

2023-04-03 Thread Parav Pandit via Virtualization




On 4/3/2023 12:20 PM, Eli Cohen wrote:

Add support for generation of interrupts from the device directly to the
VM to the VCPU thus avoiding the overhead on the host CPU.

When supported, the driver will attempt to allocate vectors for each
data virtqueue. If a vector for a virtqueue cannot be provided it will
use the QP mode where notifications go through the driver.

In addition, we add a shutdown callback to make sure allocated
interrupts are released in case of shutdown to allow clean shutdown.

Signed-off-by: Eli Cohen 
Signed-off-by: Saeed Mahameed 
---
  drivers/vdpa/mlx5/net/mlx5_vnet.c | 139 --
  drivers/vdpa/mlx5/net/mlx5_vnet.h |  14 +++
  2 files changed, 144 insertions(+), 9 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 520646ae7fa0..215a46cf8a98 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -83,6 +83,7 @@ struct mlx5_vq_restore_info {
u64 driver_addr;
u16 avail_index;
u16 used_index;
+   struct msi_map map;
bool ready;
bool restore;
  };
@@ -118,6 +119,7 @@ struct mlx5_vdpa_virtqueue {
u16 avail_idx;
u16 used_idx;
int fw_state;
+   struct msi_map map;
  
  	/* keep last in the struct */

struct mlx5_vq_restore_info ri;
@@ -792,6 +794,13 @@ static bool counters_supported(const struct mlx5_vdpa_dev 
*mvdev)
   BIT_ULL(MLX5_OBJ_TYPE_VIRTIO_Q_COUNTERS);
  }
  
+static bool msix_mode_supported(struct mlx5_vdpa_dev *mvdev)

+{
Better to have a const struct. It makes it clear that it is a read-only API
and there is no accidental modification of the HCA cap or other fields.



+   return (MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, event_mode) &
+   (1 << MLX5_VIRTIO_Q_EVENT_MODE_MSIX_MODE) &&
+   pci_msix_can_alloc_dyn(mvdev->mdev->pdev));
+}
+
  static int create_virtqueue(struct mlx5_vdpa_net *ndev, struct 
mlx5_vdpa_virtqueue *mvq)
  {
int inlen = MLX5_ST_SZ_BYTES(create_virtio_net_q_in);
@@ -829,9 +838,15 @@ static int create_virtqueue(struct mlx5_vdpa_net *ndev, 
struct mlx5_vdpa_virtque
if (vq_is_tx(mvq->index))
MLX5_SET(virtio_net_q_object, obj_context, tisn_or_qpn, 
ndev->res.tisn);
  
-	MLX5_SET(virtio_q, vq_ctx, event_mode, MLX5_VIRTIO_Q_EVENT_MODE_QP_MODE);

+   if (mvq->map.virq) {
+   MLX5_SET(virtio_q, vq_ctx, event_mode, 
MLX5_VIRTIO_Q_EVENT_MODE_MSIX_MODE);
+   MLX5_SET(virtio_q, vq_ctx, event_qpn_or_msix, mvq->map.index);
+   } else {
+   MLX5_SET(virtio_q, vq_ctx, event_mode, 
MLX5_VIRTIO_Q_EVENT_MODE_QP_MODE);
+   MLX5_SET(virtio_q, vq_ctx, event_qpn_or_msix, 
mvq->fwqp.mqp.qpn);
+   }
+
MLX5_SET(virtio_q, vq_ctx, queue_index, mvq->index);
-   MLX5_SET(virtio_q, vq_ctx, event_qpn_or_msix, mvq->fwqp.mqp.qpn);
MLX5_SET(virtio_q, vq_ctx, queue_size, mvq->num_ent);
MLX5_SET(virtio_q, vq_ctx, virtio_version_1_0,
 !!(ndev->mvdev.actual_features & BIT_ULL(VIRTIO_F_VERSION_1)));
@@ -1174,6 +1189,32 @@ static void counter_set_dealloc(struct mlx5_vdpa_net 
*ndev, struct mlx5_vdpa_vir
mlx5_vdpa_warn(&ndev->mvdev, "dealloc counter set 0x%x\n", 
mvq->counter_set_id);
  }
  
+static void alloc_vector(struct mlx5_vdpa_net *ndev,

+struct mlx5_vdpa_virtqueue *mvq)
+{
+   struct mlx5_vdpa_irq_pool *irqp = &ndev->irqp;
+   int i;
+
+   for (i = 0; i < irqp->num_ent; i++) {
+   if (!irqp->entries[i].usecount) {

The usage appears to be boolean.
So it is better to rename it from usecount to used, with used = true/false
(see the sketch after the dealloc_vector() hunk below).

In the future, if you plan to reuse the same vector, maybe at that point
usecount makes more sense.



+   irqp->entries[i].usecount++;
+   mvq->map = irqp->entries[i].map;
+   return;
+   }
+   }
+}
+
+static void dealloc_vector(struct mlx5_vdpa_net *ndev,
+  struct mlx5_vdpa_virtqueue *mvq)
+{
+   struct mlx5_vdpa_irq_pool *irqp = &ndev->irqp;
+   int i;
+
+   for (i = 0; i < irqp->num_ent; i++)
+   if (mvq->map.virq == irqp->entries[i].map.virq)
+   irqp->entries[i].usecount--;
You should add a return here too, like in alloc_vector(), so as not to go over
the rest of the entries once the specific vq is taken care of.



+}
+
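
A minimal sketch of the two suggestions above (illustrative only; it assumes
the entry field is renamed to a boolean 'used'):

	/* alloc_vector(): take the first free entry and mark it used. */
	for (i = 0; i < irqp->num_ent; i++) {
		if (!irqp->entries[i].used) {
			irqp->entries[i].used = true;
			mvq->map = irqp->entries[i].map;
			return;
		}
	}

	/* dealloc_vector(): stop scanning once the matching entry is found. */
	for (i = 0; i < irqp->num_ent; i++) {
		if (mvq->map.virq == irqp->entries[i].map.virq) {
			irqp->entries[i].used = false;
			return;
		}
	}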
  static int setup_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue 
*mvq)
  {
u16 idx = mvq->index;
@@ -1203,27 +1244,31 @@ static int setup_vq(struct mlx5_vdpa_net *ndev, struct 
mlx5_vdpa_virtqueue *mvq)
  
  	err = counter_set_alloc(ndev, mvq);

if (err)
-   goto err_counter;
+   goto err_connect;
  
+	alloc_vector(ndev, mvq);

err = create_virtqueue(ndev, mvq);
if (err)
-   goto err_connect;
+   goto err_vq;
  
  	if (mvq->ready) {

  

RE: [PATCH v2 1/3] virtio_pci_modern: Allow non power of 2 sizes for virtqueues

2023-03-10 Thread Parav Pandit via Virtualization



> From: Feng Liu 
> Sent: Friday, March 10, 2023 12:34 AM

> 
> - if (!is_power_of_2(num)) {
> - dev_warn(_dev->pci_dev->dev, "bad queue size %u",
> num);
> - return ERR_PTR(-EINVAL);
> - }
> -

The check is still valid for a split queue.
Maybe the right place for such a check is not the PCI transport driver, but the
layer below, where the split vs. packed queue knowledge resides; there the
power-of-2 check can be omitted for a packed VQ.
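
For reference, the split ring creation path already has such a check (roughly
as follows, from vring_create_virtqueue_split() in
drivers/virtio/virtio_ring.c), which is one reason the transport-level check
can go away:

	/* We assume num is a power of 2. */
	if (num & (num - 1)) {
		dev_warn(&vdev->dev, "Bad virtqueue length %u\n", num);
		return NULL;
	}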


RE: [patch net-next v2] net: virtio_net: implement exact header length guest feature

2023-02-21 Thread Parav Pandit via Virtualization



> From: Jiri Pirko 
> Sent: Tuesday, February 21, 2023 9:48 AM
> 
> From: Jiri Pirko 
> 
> Virtio spec introduced a feature VIRTIO_NET_F_GUEST_HDRLEN which when
> set implicates that the driver provides the exact size of the header.
> 
> Quoting the original virtio spec:
> "hdr_len is a hint to the device as to how much of the header needs to  be 
> kept
> to copy into each packet"
> 
> "a hint" might not be clear for the reader what does it mean, if it is "maybe 
> like
> that" of "exactly like that". This feature just makes it crystal clear and 
> let the
> device count on the hdr_len being filled up by the exact length of header.
> 
> Also note the spec already has following note about hdr_len:
> "Due to various bugs in implementations, this field is not useful  as a 
> guarantee
> of the transport header size."
> 
> Without this feature the device needs to parse the header in core data path
> handling. Accurate information helps the device to eliminate such header
> parsing and directly use the hardware accelerators for GSO operation.
> 
> virtio_net_hdr_from_skb() fills up hdr_len to skb_headlen(skb).
> The driver already complies to fill the correct value. Introduce the feature 
> and
> advertise it.
> 
> Note that virtio spec also includes following note for device
> implementation:
> "Caution should be taken by the implementation so as to prevent  a malicious
> driver from attacking the device by setting  an incorrect hdr_len."
> 
> There is a plan to support this feature in our emulated device.
> A device of SolidRun offers this feature bit. They claim this feature will 
> save the
> device a few cycles for every GSO packet.
> 
> Signed-off-by: Jiri Pirko 

This also saves a few cycles of L2 to L4 header parsing on every GSO stream for 
virtio PCI PF and VF devices that we have.
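
For reference, the hdr_len the commit message refers to is already filled by
the driver side with the exact header length (paraphrased from
virtio_net_hdr_from_skb() in include/linux/virtio_net.h):

	/* This is a hint as to how much should be linear. */
	hdr->hdr_len = __cpu_to_virtio16(little_endian, skb_headlen(skb));

The feature bit only promises the device that this hint is exact, so the device
can skip re-parsing the L2..L4 headers.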

> ---
> v1->v2:
> - extended patch description
> ---
>  drivers/net/virtio_net.c| 6 --
>  include/uapi/linux/virtio_net.h | 1 +
>  2 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index
> fb5e68ed3ec2..e85b03988733 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -62,7 +62,8 @@ static const unsigned long guest_offloads[] = {
>   VIRTIO_NET_F_GUEST_UFO,
>   VIRTIO_NET_F_GUEST_CSUM,
>   VIRTIO_NET_F_GUEST_USO4,
> - VIRTIO_NET_F_GUEST_USO6
> + VIRTIO_NET_F_GUEST_USO6,
> + VIRTIO_NET_F_GUEST_HDRLEN
>  };
> 
>  #define GUEST_OFFLOAD_GRO_HW_MASK ((1ULL <<
> VIRTIO_NET_F_GUEST_TSO4) | \ @@ -4213,7 +4214,8 @@ static struct
> virtio_device_id id_table[] = {
>   VIRTIO_NET_F_CTRL_MAC_ADDR, \
>   VIRTIO_NET_F_MTU, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS, \
>   VIRTIO_NET_F_SPEED_DUPLEX, VIRTIO_NET_F_STANDBY, \
> - VIRTIO_NET_F_RSS, VIRTIO_NET_F_HASH_REPORT,
> VIRTIO_NET_F_NOTF_COAL
> + VIRTIO_NET_F_RSS, VIRTIO_NET_F_HASH_REPORT,
> VIRTIO_NET_F_NOTF_COAL, \
> + VIRTIO_NET_F_GUEST_HDRLEN
> 
>  static unsigned int features[] = {
>   VIRTNET_FEATURES,
> diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
> index b4062bed186a..12c1c9699935 100644
> --- a/include/uapi/linux/virtio_net.h
> +++ b/include/uapi/linux/virtio_net.h
> @@ -61,6 +61,7 @@
>  #define VIRTIO_NET_F_GUEST_USO6  55  /* Guest can handle USOv6 in.
> */
>  #define VIRTIO_NET_F_HOST_USO56  /* Host can handle USO in. */
>  #define VIRTIO_NET_F_HASH_REPORT  57 /* Supports hash report */
> +#define VIRTIO_NET_F_GUEST_HDRLEN  59/* Guest provides the exact
> hdr_len value. */
>  #define VIRTIO_NET_F_RSS   60/* Supports RSS RX steering */
>  #define VIRTIO_NET_F_RSC_EXT   61/* extended coalescing info */
>  #define VIRTIO_NET_F_STANDBY   62/* Act as standby for another
> device
> --
> 2.39.0

Reviewed-by: Parav Pandit 


[PATCH net-next] virtio-net: Maintain reverse cleanup order

2023-02-03 Thread Parav Pandit via Virtualization
To make the code easier to audit, keep the device stop()
sequence as a mirror of the device open() sequence.

Acked-by: Michael S. Tsirkin 
Reviewed-by: Jiri Pirko 
Signed-off-by: Parav Pandit 
---
 drivers/net/virtio_net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index b7d0b54c3bb0..1f8168e0f64d 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2279,9 +2279,9 @@ static int virtnet_close(struct net_device *dev)
cancel_delayed_work_sync(&vi->refill);
 
for (i = 0; i < vi->max_queue_pairs; i++) {
+   virtnet_napi_tx_disable(&vi->sq[i].napi);
napi_disable(&vi->rq[i].napi);
xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq);
-   virtnet_napi_tx_disable(&vi->sq[i].napi);
}
 
return 0;
-- 
2.26.2



RE: [PATCH v2 3/7] vdpa: validate provisioned device features against specified attribute

2023-02-02 Thread Parav Pandit via Virtualization


> From: Si-Wei Liu 
> Sent: Thursday, February 2, 2023 4:59 PM
> 
> On 2/1/2023 9:05 PM, Parav Pandit wrote:
> >
> >> From: Si-Wei Liu 
> >> Sent: Tuesday, January 31, 2023 6:22 PM
> >>
> >> With device feature provisioning, there's a chance for
> >> misconfiguration that the vdpa feature attribute supplied in 'vdpa
> >> dev add' command doesn't get selected on the device_features to be
> >> provisioned. For instance, when a @mac attribute is specified, the
> >> corresponding feature bit _F_MAC in device_features should be set for
> >> consistency. If there's conflict on provisioned features against the 
> >> attribute,
> it should be treated as an error to fail the ambiguous command.
> >> Noted the opposite is not necessarily true, for e.g. it's okay to
> >> have _F_MAC set in device_features without providing a corresponding
> >> @mac attribute, in which case the vdpa vendor driver could load
> >> certain default value for attribute that is not explicitly specified.
> >>
> >> Generalize this check in vdpa core so that there's no duplicate code
> >> in each vendor driver.
> >>
> >> Signed-off-by: Si-Wei Liu 
> >> ---
> >>   drivers/vdpa/vdpa.c | 18 ++
> >>   1 file changed, 18 insertions(+)
> >>
> >> diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c index
> >> 21c8aa3..1eba978
> >> 100644
> >> --- a/drivers/vdpa/vdpa.c
> >> +++ b/drivers/vdpa/vdpa.c
> >> @@ -601,8 +601,26 @@ static int vdpa_nl_cmd_dev_add_set_doit(struct
> >> sk_buff *skb, struct genl_info *i
> >>config.mask |=
> >> BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MAX_VQP);
> >>}
> >>if (nl_attrs[VDPA_ATTR_DEV_FEATURES]) {
> >> +  u64 missing = 0x0ULL;
> >> +
> >>config.device_features =
> >>nla_get_u64(nl_attrs[VDPA_ATTR_DEV_FEATURES]);
> >> +  if (nl_attrs[VDPA_ATTR_DEV_NET_CFG_MACADDR] &&
> >> +  !(config.device_features & BIT_ULL(VIRTIO_NET_F_MAC)))
> >> +  missing |= BIT_ULL(VIRTIO_NET_F_MAC);
> >> +  if (nl_attrs[VDPA_ATTR_DEV_NET_CFG_MTU] &&
> >> +  !(config.device_features & BIT_ULL(VIRTIO_NET_F_MTU)))
> >> +  missing |= BIT_ULL(VIRTIO_NET_F_MTU);
> >> +  if (nl_attrs[VDPA_ATTR_DEV_NET_CFG_MAX_VQP] &&
> >> +  config.net.max_vq_pairs > 1 &&
> >> +  !(config.device_features & BIT_ULL(VIRTIO_NET_F_MQ)))
> >> +  missing |= BIT_ULL(VIRTIO_NET_F_MQ);
> >> +  if (missing) {
> >> +  NL_SET_ERR_MSG_FMT_MOD(info->extack,
> >> + "Missing features 0x%llx for
> >> provided attributes",
> >> + missing);
> >> +  return -EINVAL;
> >> +  }
> >>config.mask |= BIT_ULL(VDPA_ATTR_DEV_FEATURES);
> >>}
> >>
> >> --
> >> 1.8.3.1
> > Vdpa this layer can likely derive the feature bits for the supplied config 
> > fields
> so that user doesn't need to keep track of both.
> > Only those feature bits which are unrelated to any config, is what user 
> > should
> be setting.
> It's not I can't do this, but Jason wanted to have clear semantics around
> migration compatibility for the driver, and for that users have to explicitly
> provide device_features that we may define new driver behavior (rather that
> inheritance which is implicit and not uniformly define across drivers) for
> compatibility using the new uAPI.
Makes sense to specify it explicitly; it just requires more careful plumbing on
the user-space side.
Eventually it will be orchestrated by software rather than a human user, so it
should be fine to define it explicitly.


[PATCH net] virtio-net: Keep stop() to follow mirror sequence of open()

2023-02-02 Thread Parav Pandit via Virtualization
Cited commit in fixes tag frees rxq xdp info while RQ NAPI is
still enabled and packet processing may be ongoing.

Follow the mirror sequence of open() in the stop() callback.
This ensures that when rxq info is unregistered, no rx
packet processing is ongoing.

Fixes: 754b8a21a96d ("virtio_net: setup xdp_rxq_info")
Acked-by: Michael S. Tsirkin 
Reviewed-by: Jiri Pirko 
Signed-off-by: Parav Pandit 
---
 drivers/net/virtio_net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 7e1a98430190..b7d0b54c3bb0 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2279,8 +2279,8 @@ static int virtnet_close(struct net_device *dev)
cancel_delayed_work_sync(&vi->refill);
 
for (i = 0; i < vi->max_queue_pairs; i++) {
-   xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq);
napi_disable(&vi->rq[i].napi);
+   xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq);
virtnet_napi_tx_disable(&vi->sq[i].napi);
}
 
-- 
2.26.2



RE: [PATCH 2/2] virtio-net: Maintain reverse cleanup order

2023-02-02 Thread Parav Pandit via Virtualization


> From: Jiri Pirko 
> Sent: Thursday, February 2, 2023 10:47 AM
> 
> Thu, Feb 02, 2023 at 04:10:56PM CET, pa...@nvidia.com wrote:
> >
> >> From: Jiri Pirko 
> >> Sent: Thursday, February 2, 2023 7:26 AM
> >>
> >> Thu, Feb 02, 2023 at 06:00:38AM CET, pa...@nvidia.com wrote:
> >> >To easily audit the code, better to keep the device stop() sequence
> >> >to be mirror of the device open() sequence.
> >> >
> >> >Signed-off-by: Parav Pandit 
> >>
> >> Reviewed-by: Jiri Pirko 
> >>
> >> If this is not fixing bug (which I believe is the case), you should
> >> target it to net- next ([patch net-next] ..).
> >>
> >Yes. Right. First one was fix for net-rc, second was for net-next. And 2nd
> depends on the first to avoid merge conflicts.
> >So, I was unsure how to handle it.
> >Can you please suggest?
> 
> 1) Send the fix to -net
> 2) Wait until -net is merged into -net-next
> 3) Send the second patch to -net-next

Got it. Thanks.

Dave, Jakub,
Please drop this series.
I am sending them one by one to net and net-next.


RE: [PATCH 2/2] virtio-net: Maintain reverse cleanup order

2023-02-02 Thread Parav Pandit via Virtualization


> From: Jiri Pirko 
> Sent: Thursday, February 2, 2023 7:26 AM
> 
> Thu, Feb 02, 2023 at 06:00:38AM CET, pa...@nvidia.com wrote:
> >To easily audit the code, better to keep the device stop() sequence to
> >be mirror of the device open() sequence.
> >
> >Signed-off-by: Parav Pandit 
> 
> Reviewed-by: Jiri Pirko 
> 
> If this is not fixing bug (which I believe is the case), you should target it 
> to net-
> next ([patch net-next] ..).
> 
Yes, right. The first one is a fix for net (the rc tree), the second is for
net-next, and the second depends on the first to avoid merge conflicts.
So I was unsure how to handle it.
Can you please suggest?


> 
> >---
> > drivers/net/virtio_net.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> >diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index
> >b7d0b54c3bb0..1f8168e0f64d 100644
> >--- a/drivers/net/virtio_net.c
> >+++ b/drivers/net/virtio_net.c
> >@@ -2279,9 +2279,9 @@ static int virtnet_close(struct net_device *dev)
> > cancel_delayed_work_sync(&vi->refill);
> >
> > for (i = 0; i < vi->max_queue_pairs; i++) {
> >+virtnet_napi_tx_disable(&vi->sq[i].napi);
> > napi_disable(&vi->rq[i].napi);
> > xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq);
> >-virtnet_napi_tx_disable(&vi->sq[i].napi);
> > }
> >
> > return 0;
> >--
> >2.26.2
> >


RE: [PATCH v2 3/7] vdpa: validate provisioned device features against specified attribute

2023-02-01 Thread Parav Pandit via Virtualization



> From: Si-Wei Liu 
> Sent: Tuesday, January 31, 2023 6:22 PM
> 
> With device feature provisioning, there's a chance for misconfiguration that 
> the
> vdpa feature attribute supplied in 'vdpa dev add' command doesn't get
> selected on the device_features to be provisioned. For instance, when a @mac
> attribute is specified, the corresponding feature bit _F_MAC in 
> device_features
> should be set for consistency. If there's conflict on provisioned features 
> against
> the attribute, it should be treated as an error to fail the ambiguous command.
> Noted the opposite is not necessarily true, for e.g. it's okay to have _F_MAC 
> set
> in device_features without providing a corresponding @mac attribute, in which
> case the vdpa vendor driver could load certain default value for attribute 
> that is
> not explicitly specified.
> 
> Generalize this check in vdpa core so that there's no duplicate code in each
> vendor driver.
> 
> Signed-off-by: Si-Wei Liu 
> ---
>  drivers/vdpa/vdpa.c | 18 ++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c index 21c8aa3..1eba978
> 100644
> --- a/drivers/vdpa/vdpa.c
> +++ b/drivers/vdpa/vdpa.c
> @@ -601,8 +601,26 @@ static int vdpa_nl_cmd_dev_add_set_doit(struct
> sk_buff *skb, struct genl_info *i
>   config.mask |=
> BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MAX_VQP);
>   }
>   if (nl_attrs[VDPA_ATTR_DEV_FEATURES]) {
> + u64 missing = 0x0ULL;
> +
>   config.device_features =
>   nla_get_u64(nl_attrs[VDPA_ATTR_DEV_FEATURES]);
> + if (nl_attrs[VDPA_ATTR_DEV_NET_CFG_MACADDR] &&
> + !(config.device_features & BIT_ULL(VIRTIO_NET_F_MAC)))
> + missing |= BIT_ULL(VIRTIO_NET_F_MAC);
> + if (nl_attrs[VDPA_ATTR_DEV_NET_CFG_MTU] &&
> + !(config.device_features & BIT_ULL(VIRTIO_NET_F_MTU)))
> + missing |= BIT_ULL(VIRTIO_NET_F_MTU);
> + if (nl_attrs[VDPA_ATTR_DEV_NET_CFG_MAX_VQP] &&
> + config.net.max_vq_pairs > 1 &&
> + !(config.device_features & BIT_ULL(VIRTIO_NET_F_MQ)))
> + missing |= BIT_ULL(VIRTIO_NET_F_MQ);
> + if (missing) {
> + NL_SET_ERR_MSG_FMT_MOD(info->extack,
> +"Missing features 0x%llx for
> provided attributes",
> +missing);
> + return -EINVAL;
> + }
>   config.mask |= BIT_ULL(VDPA_ATTR_DEV_FEATURES);
>   }
> 
> --
> 1.8.3.1
The vdpa layer here can likely derive the feature bits from the supplied config
fields, so that the user doesn't need to keep track of both.
Only those feature bits which are unrelated to any config field are what the
user should have to set.
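
For illustration, a standalone model of that alternative (the attribute and feature
names follow the quoted patch; this is a sketch of the idea, not the merged behavior):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define VIRTIO_NET_F_MTU  3
#define VIRTIO_NET_F_MAC  5
#define VIRTIO_NET_F_MQ   22
#define BIT_ULL(n)        (1ULL << (n))

/* OR in the feature bits implied by the netlink attributes the user supplied */
static uint64_t derive_features(uint64_t device_features, bool has_mac,
				bool has_mtu, bool has_max_vqp)
{
	if (has_mac)
		device_features |= BIT_ULL(VIRTIO_NET_F_MAC);
	if (has_mtu)
		device_features |= BIT_ULL(VIRTIO_NET_F_MTU);
	if (has_max_vqp)
		device_features |= BIT_ULL(VIRTIO_NET_F_MQ);
	return device_features;
}

int main(void)
{
	/* user passed mac + mtu but did not set the matching feature bits */
	uint64_t f = derive_features(0, true, true, false);

	printf("derived device_features: 0x%llx\n", (unsigned long long)f);
	return 0;
}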



[PATCH 0/2] virtio-net: close() to follow mirror of open()

2023-02-01 Thread Parav Pandit via Virtualization
Hi,

These two small patches improve the ndo_close() callback to follow
the mirror sequence of the ndo_open() callback. This improves code auditing
and also ensures that xdp rxq info is not unregistered while NAPI on the
RQ is ongoing.
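
For illustration, a toy standalone model of the resulting ordering (the function
names only mimic the ones in the diffs; the real driver operates on vi->rq[i] and
vi->sq[i], while this just logs the steps):

#include <stdio.h>

static void open_queue_pair(int i)
{
	printf("open  q%d: xdp_rxq_info_reg\n", i);
	printf("open  q%d: napi_enable(rq)\n", i);
	printf("open  q%d: napi_tx_enable(sq)\n", i);
}

static void close_queue_pair(int i)
{
	/* mirror of open_queue_pair(): last enabled is disabled first */
	printf("close q%d: napi_tx_disable(sq)\n", i);
	printf("close q%d: napi_disable(rq)\n", i);
	printf("close q%d: xdp_rxq_info_unreg\n", i);
}

int main(void)
{
	for (int i = 0; i < 2; i++)
		open_queue_pair(i);
	for (int i = 0; i < 2; i++)
		close_queue_pair(i);
	return 0;
}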

Please review.

Patch summary:
patch-1 ensures that xdp rq info is unregistered after rq napi is disabled
patch-2 keeps the close() sequence as a mirror of the open() sequence

Parav Pandit (2):
  virtio-net: Keep stop() to follow mirror sequence of open()
  virtio-net: Maintain reverse cleanup order

 drivers/net/virtio_net.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

-- 
2.26.2



[PATCH 1/2] virtio-net: Keep stop() to follow mirror sequence of open()

2023-02-01 Thread Parav Pandit via Virtualization
Cited commit in fixes tag frees rxq xdp info while RQ NAPI is
still enabled and packet processing may be ongoing.

Follow the mirror sequence of open() in the stop() callback.
This ensures that when rxq info is unregistered, no rx
packet processing is ongoing.

Fixes: 754b8a21a96d ("virtio_net: setup xdp_rxq_info")
Signed-off-by: Parav Pandit 
---
 drivers/net/virtio_net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 7e1a98430190..b7d0b54c3bb0 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2279,8 +2279,8 @@ static int virtnet_close(struct net_device *dev)
cancel_delayed_work_sync(&vi->refill);
 
for (i = 0; i < vi->max_queue_pairs; i++) {
-   xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq);
napi_disable(&vi->rq[i].napi);
+   xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq);
virtnet_napi_tx_disable(&vi->sq[i].napi);
}
 
-- 
2.26.2



RE: [PATCH v2 2/7] vdpa: conditionally read STATUS in config space

2023-02-01 Thread Parav Pandit via Virtualization



> From: Si-Wei Liu 
> Sent: Tuesday, January 31, 2023 6:22 PM
> 
> The spec says:
> status only exists if VIRTIO_NET_F_STATUS is set
> 
> Similar to MAC and MTU, vdpa_dev_net_config_fill() should read STATUS
> conditionally depending on the feature bits.
> 
> Signed-off-by: Si-Wei Liu 
> ---
>  drivers/vdpa/vdpa.c | 20 +++-
>  1 file changed, 15 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c index
> 3a82ca78..21c8aa3 100644
> --- a/drivers/vdpa/vdpa.c
> +++ b/drivers/vdpa/vdpa.c
> @@ -843,18 +843,25 @@ static int vdpa_dev_net_mac_config_fill(struct
> sk_buff *msg, u64 features,
>   sizeof(config->mac), config->mac);
>  }
> 
> +static int vdpa_dev_net_status_config_fill(struct sk_buff *msg, u64 features,
> +const struct virtio_net_config
> *config) {
> + u16 val_u16;
> +
> + if ((features & BIT_ULL(VIRTIO_NET_F_STATUS)) == 0)
> + return 0;
> +
> + val_u16 = __virtio16_to_cpu(true, config->status);
> + return nla_put_u16(msg, VDPA_ATTR_DEV_NET_STATUS, val_u16); }
> +
>  static int vdpa_dev_net_config_fill(struct vdpa_device *vdev, struct sk_buff
> *msg)  {
>   struct virtio_net_config config = {};
>   u64 features_device;
> - u16 val_u16;
> 
>   vdev->config->get_config(vdev, 0, &config, sizeof(config));
> 
> - val_u16 = __virtio16_to_cpu(true, config.status);
> - if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_STATUS, val_u16))
> - return -EMSGSIZE;
> -
>   features_device = vdev->config->get_device_features(vdev);
> 
>   if (nla_put_u64_64bit(msg, VDPA_ATTR_DEV_FEATURES,
> features_device, @@ -867,6 +874,9 @@ static int
> vdpa_dev_net_config_fill(struct vdpa_device *vdev, struct sk_buff *ms
>   if (vdpa_dev_net_mac_config_fill(msg, features_device, &config))
>   return -EMSGSIZE;
> 
> + if (vdpa_dev_net_status_config_fill(msg, features_device, &config))
> + return -EMSGSIZE;
> +
>   return vdpa_dev_net_mq_config_fill(msg, features_device, &config);  }
> 
> --
> 1.8.3.1

Reviewed-by: Parav Pandit 


RE: [PATCH v2 1/7] vdpa: fix improper error message when adding vdpa dev

2023-02-01 Thread Parav Pandit via Virtualization


> From: Si-Wei Liu 
> Sent: Tuesday, January 31, 2023 6:22 PM
> 
> In below example, before the fix, mtu attribute is supported by the parent
> mgmtdev, but the error message showing "All provided are not supported" is
> just misleading.
> 
> $ vdpa mgmtdev show
> vdpasim_net:
>   supported_classes net
>   max_supported_vqs 3
>   dev_features MTU MAC CTRL_VQ CTRL_MAC_ADDR ANY_LAYOUT VERSION_1
> ACCESS_PLATFORM
> 
> $ vdpa dev add mgmtdev vdpasim_net name vdpasim0 mtu 5000 max_vqp 2
> Error: vdpa: All provided attributes are not supported.
> kernel answers: Operation not supported
> 
> After fix, the relevant error message will be like:
> 
> $ vdpa dev add mgmtdev vdpasim_net name vdpasim0 mtu 5000 max_vqp 2
> Error: vdpa: Some provided attributes are not supported: 0x1000.
> kernel answers: Operation not supported
> 
> Signed-off-by: Si-Wei Liu 
Please add a Fixes tag so that older kernels get this fix.
With that change,
Reviewed-by: Parav Pandit 

> ---
>  drivers/vdpa/vdpa.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c index 8ef7aa1..3a82ca78
> 100644
> --- a/drivers/vdpa/vdpa.c
> +++ b/drivers/vdpa/vdpa.c
> @@ -622,9 +622,11 @@ static int vdpa_nl_cmd_dev_add_set_doit(struct
> sk_buff *skb, struct genl_info *i
>   err = PTR_ERR(mdev);
>   goto err;
>   }
> +
>   if ((config.mask & mdev->config_attr_mask) != config.mask) {
> - NL_SET_ERR_MSG_MOD(info->extack,
> -"All provided attributes are not supported");
> + NL_SET_ERR_MSG_FMT_MOD(info->extack,
> +"Some provided attributes are not
> supported: 0x%llx",
> +config.mask & ~mdev->config_attr_mask);
>   err = -EOPNOTSUPP;
>   goto err;
>   }
> --
> 1.8.3.1



[PATCH net-next] virtio-net: Reduce debug name field size to 16 bytes

2023-01-22 Thread Parav Pandit via Virtualization
A virtio queue index can be at most 65535, so 16 bytes are enough to store
the vq name with the existing string prefix.

With this change, the send queue struct saves 24 bytes and the receive
queue saves a whole cache line worth of 64 bytes per structure,
due to reduced alignment padding.
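
As a quick standalone sanity check of the 16-byte bound (not part of the patch;
the name formats follow the comments in the structures below):

#include <stdio.h>

int main(void)
{
	char name[16];
	int n;

	/* the queue index is at most 65535 */
	n = snprintf(name, sizeof(name), "output.%u", 65535u);
	printf("\"%s\" uses %d chars + NUL (buffer %zu)\n", name, n, sizeof(name));

	n = snprintf(name, sizeof(name), "input.%u", 65535u);
	printf("\"%s\" uses %d chars + NUL (buffer %zu)\n", name, n, sizeof(name));
	return 0;
}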

Pahole results before:

pahole -s drivers/net/virtio_net.o | \
grep -e "send_queue" -e "receive_queue"
send_queue      1112    0
receive_queue   1280    1

Pahole results after:
pahole -s drivers/net/virtio_net.o | \
grep -e "send_queue" -e "receive_queue"
send_queue      1088    0
receive_queue   1216    1

Signed-off-by: Parav Pandit 
---
 drivers/net/virtio_net.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index a170b0075dcf..3855b8524300 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -134,7 +134,7 @@ struct send_queue {
struct scatterlist sg[MAX_SKB_FRAGS + 2];
 
/* Name of the send queue: output.$index */
-   char name[40];
+   char name[16];
 
struct virtnet_sq_stats stats;
 
@@ -171,7 +171,7 @@ struct receive_queue {
unsigned int min_buf_len;
 
/* Name of this receive queue: input.$index */
-   char name[40];
+   char name[16];
 
struct xdp_rxq_info xdp_rxq;
 };
-- 
2.26.2



RE: [PATCH net-next v2] virtio_net: Reuse buffer free function

2023-01-16 Thread Parav Pandit via Virtualization



> From: Michael S. Tsirkin 
> Sent: Monday, January 16, 2023 5:13 PM
> 
> On Mon, Jan 16, 2023 at 10:27:08PM +0200, Parav Pandit wrote:
> > virtnet_rq_free_unused_buf() helper function to free the buffer
> > already exists. Avoid code duplication by reusing existing function.
> >
> > Reviewed-by: Alexander Duyck 
> > Reviewed-by: Xuan Zhuo 
> > Signed-off-by: Parav Pandit 
> > ---
> >  drivers/net/virtio_net.c | 8 +---
> >  1 file changed, 1 insertion(+), 7 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index
> > 7723b2a49d8e..31d037df514f 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -1251,13 +1251,7 @@ static void receive_buf(struct virtnet_info *vi,
> struct receive_queue *rq,
> > if (unlikely(len < vi->hdr_len + ETH_HLEN)) {
> > pr_debug("%s: short packet %i\n", dev->name, len);
> > dev->stats.rx_length_errors++;
> > -   if (vi->mergeable_rx_bufs) {
> > -   put_page(virt_to_head_page(buf));
> > -   } else if (vi->big_packets) {
> > -   give_pages(rq, buf);
> > -   } else {
> > -   put_page(virt_to_head_page(buf));
> > -   }
> > +   virtnet_rq_free_unused_buf(rq->vq, buf);
> > return;
> > }
> 
> Sure.
> 
> Acked-by: Michael S. Tsirkin 
> 
> while we are at it how about a patch moving virtnet_rq_free_unused_buf and
> virtnet_sq_free_unused_buf so we don't need forward declarations?
> 
Yes, it's on my list; I will do it after this patch.
I am also going to add a comment about the history of not doing a ZLEN check
around this area.

> E.g. a good place for virtnet_sq_free_unused_buf is likely after ptr_to_xdp
> group of functions.
> 
> For virtnet_rq_free_unused_buf - after give_pages/get_a_page.
> 
> >
> > --
> > 2.26.2



[PATCH net-next v2] virtio_net: Reuse buffer free function

2023-01-16 Thread Parav Pandit via Virtualization
The virtnet_rq_free_unused_buf() helper function to free the buffer
already exists. Avoid code duplication by reusing the existing function.

Reviewed-by: Alexander Duyck 
Reviewed-by: Xuan Zhuo 
Signed-off-by: Parav Pandit 
---
 drivers/net/virtio_net.c | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 7723b2a49d8e..31d037df514f 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1251,13 +1251,7 @@ static void receive_buf(struct virtnet_info *vi, struct 
receive_queue *rq,
if (unlikely(len < vi->hdr_len + ETH_HLEN)) {
pr_debug("%s: short packet %i\n", dev->name, len);
dev->stats.rx_length_errors++;
-   if (vi->mergeable_rx_bufs) {
-   put_page(virt_to_head_page(buf));
-   } else if (vi->big_packets) {
-   give_pages(rq, buf);
-   } else {
-   put_page(virt_to_head_page(buf));
-   }
+   virtnet_rq_free_unused_buf(rq->vq, buf);
return;
}
 
-- 
2.26.2



RE: [PATCH net-next 0/2] Small packet processing handling changes

2023-01-14 Thread Parav Pandit via Virtualization
Hi Jakub, Dave,

> From: Parav Pandit 
> Sent: Friday, January 13, 2023 5:36 PM
> 
> Hi,
> 
> These two changes improve the small packet handling.
> 
> Patch summary:
> patch-1 fixes the length check by considering Ethernet 60B frame size
> patch-2 avoids code duplication by reuses existing buffer free helper
> 
> Please review.
> 
> Parav Pandit (2):
>   virtio_net: Fix short frame length check
>   virtio_net: Reuse buffer free function
> 
Please drop this series.
I will drop the first patch as it was wrong.
I will send the second patch as v2, which was already reviewed.


RE: [PATCH net-next 1/2] virtio_net: Fix short frame length check

2023-01-14 Thread Parav Pandit via Virtualization

> From: Alexander Duyck 
> Sent: Friday, January 13, 2023 7:24 PM
> 
> On Fri, Jan 13, 2023 at 3:37 PM Parav Pandit  wrote:
> >
> >
> > > From: Alexander H Duyck 
> > > Sent: Friday, January 13, 2023 6:24 PM
> > >
> > > On Sat, 2023-01-14 at 00:36 +0200, Parav Pandit wrote:
> > > > A smallest Ethernet frame defined by IEEE 802.3 is 60 bytes
> > > > without any preemble and CRC.
> > > >
> > > > Current code only checks for minimal 14 bytes of Ethernet header length.
> > > > Correct it to consider the minimum Ethernet frame length.
> > > >
> > > > Fixes: 296f96fcfc16 ("Net driver using virtio")
> > > > Signed-off-by: Parav Pandit 
> > > > ---
> > > >  drivers/net/virtio_net.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > index
> > > > 7723b2a49d8e..d45e140b6852 100644
> > > > --- a/drivers/net/virtio_net.c
> > > > +++ b/drivers/net/virtio_net.c
> > > > @@ -1248,7 +1248,7 @@ static void receive_buf(struct virtnet_info
> > > > *vi,
> > > struct receive_queue *rq,
> > > > struct sk_buff *skb;
> > > > struct virtio_net_hdr_mrg_rxbuf *hdr;
> > > >
> > > > -   if (unlikely(len < vi->hdr_len + ETH_HLEN)) {
> > > > +   if (unlikely(len < vi->hdr_len + ETH_ZLEN)) {
> > > > pr_debug("%s: short packet %i\n", dev->name, len);
> > > > dev->stats.rx_length_errors++;
> > > > if (vi->mergeable_rx_bufs) {
> > >
> > > I'm not sure I agree with this change as packets are only 60B if
> > > they have gone across the wire as they are usually padded out on the
> > > transmit side. There may be cases where software routed packets may not
> be 60B.
> > >
> > Do you mean Linux kernel software? Any link to it would be helpful.
> 
> The problem is there are several software paths involved and that is why I am
> wanting to be cautious. As I recall this would impact Qemu itself, DPDK, the
> Linux Kernel and several others if I am not mistaken. That is why I am 
> tending to
> err on the side of caution as this is a pretty significant change.
> 
> > > As such rather than changing out ETH_HLEN for ETH_ZLEN I wonder if
> > > we should look at maybe making this a "<=" comparison instead since
> > > that is the only case I can think of where the packet would end up
> > > being entirely empty after eth_type_trans is called and we would be 
> > > passing
> an skb with length 0.
> >
> > I likely didn’t understand your comment.
> > This driver check is before creating the skb for the received packet.
> > So, purpose is to not even process the packet header or prepare the skb if 
> > it
> not an Ethernet frame.
> >
> > It is interesting to know when we get < 60B frame.
> 
> If I recall, a UDPv4 frame can easily do it since Ethernet is 14B, IP header 
> is 20,
> and UDP is only 8 so that only comes to 42B if I recall correctly. Similarly 
> I think
> a TCPv4 Frame can be as small as 54B if you disable all the option headers.

Yes, for sure an Ethernet payload below 60B is very common and is usually padded
by the NIC on tx. I am familiar with it. :)

I missed the case where virtio is SW emulated: a short tx frame (not padded by
the stack) never leaves the SW stack and is never sent to the HW NIC.
Hence it is never padded, and when it is looped back it reaches the virtio
driver as a short frame.

So yes, this patch breaks that case. I will drop this patch.
Thanks Alexander for the catch.
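
For reference, the arithmetic behind Alexander's examples as a standalone snippet
(header sizes are the usual minimums; this is only an illustration, not driver code):

#include <stdio.h>

#define ETH_HLEN   14   /* Ethernet header */
#define ETH_ZLEN   60   /* minimum frame length on the wire (no FCS) */
#define IPV4_HLEN  20   /* IPv4 header without options */
#define UDP_HLEN    8
#define TCP_HLEN   20   /* TCP header without options */

int main(void)
{
	int udp = ETH_HLEN + IPV4_HLEN + UDP_HLEN;  /* 42 bytes */
	int tcp = ETH_HLEN + IPV4_HLEN + TCP_HLEN;  /* 54 bytes */

	printf("minimal UDPv4 frame: %d bytes (< ETH_ZLEN=%d)\n", udp, ETH_ZLEN);
	printf("minimal TCPv4 frame: %d bytes (< ETH_ZLEN=%d)\n", tcp, ETH_ZLEN);
	return 0;
}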

RE: [PATCH net-next 1/2] virtio_net: Fix short frame length check

2023-01-13 Thread Parav Pandit via Virtualization

> From: Alexander H Duyck 
> Sent: Friday, January 13, 2023 6:24 PM
> 
> On Sat, 2023-01-14 at 00:36 +0200, Parav Pandit wrote:
> > A smallest Ethernet frame defined by IEEE 802.3 is 60 bytes without
> > any preemble and CRC.
> >
> > Current code only checks for minimal 14 bytes of Ethernet header length.
> > Correct it to consider the minimum Ethernet frame length.
> >
> > Fixes: 296f96fcfc16 ("Net driver using virtio")
> > Signed-off-by: Parav Pandit 
> > ---
> >  drivers/net/virtio_net.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index
> > 7723b2a49d8e..d45e140b6852 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -1248,7 +1248,7 @@ static void receive_buf(struct virtnet_info *vi,
> struct receive_queue *rq,
> > struct sk_buff *skb;
> > struct virtio_net_hdr_mrg_rxbuf *hdr;
> >
> > -   if (unlikely(len < vi->hdr_len + ETH_HLEN)) {
> > +   if (unlikely(len < vi->hdr_len + ETH_ZLEN)) {
> > pr_debug("%s: short packet %i\n", dev->name, len);
> > dev->stats.rx_length_errors++;
> > if (vi->mergeable_rx_bufs) {
> 
> I'm not sure I agree with this change as packets are only 60B if they have 
> gone
> across the wire as they are usually padded out on the transmit side. There may
> be cases where software routed packets may not be 60B.
> 
Do you mean Linux kernel software? Any link to it would be helpful.

> As such rather than changing out ETH_HLEN for ETH_ZLEN I wonder if we
> should look at maybe making this a "<=" comparison instead since that is the
> only case I can think of where the packet would end up being entirely empty
> after eth_type_trans is called and we would be passing an skb with length 0.

I likely didn't understand your comment.
This driver check happens before creating the skb for the received packet,
so the purpose is to not even process the packet header or prepare the skb
if it is not a complete Ethernet frame.

It would be interesting to know when we get a frame shorter than 60B.

[PATCH net-next 2/2] virtio_net: Reuse buffer free function

2023-01-13 Thread Parav Pandit via Virtualization
The virtnet_rq_free_unused_buf() helper function to free the buffer
already exists. Avoid code duplication by reusing the existing function.

Signed-off-by: Parav Pandit 
---
 drivers/net/virtio_net.c | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index d45e140b6852..c1faaf11fbcd 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1251,13 +1251,7 @@ static void receive_buf(struct virtnet_info *vi, struct 
receive_queue *rq,
if (unlikely(len < vi->hdr_len + ETH_ZLEN)) {
pr_debug("%s: short packet %i\n", dev->name, len);
dev->stats.rx_length_errors++;
-   if (vi->mergeable_rx_bufs) {
-   put_page(virt_to_head_page(buf));
-   } else if (vi->big_packets) {
-   give_pages(rq, buf);
-   } else {
-   put_page(virt_to_head_page(buf));
-   }
+   virtnet_rq_free_unused_buf(rq->vq, buf);
return;
}
 
-- 
2.26.2



[PATCH net-next 0/2] Small packet processing handling changes

2023-01-13 Thread Parav Pandit via Virtualization
Hi,

These two changes improve the small packet handling.

Patch summary:
patch-1 fixes the length check by considering Ethernet 60B frame size
patch-2 avoids code duplication by reusing the existing buffer free helper

Please review.

Parav Pandit (2):
  virtio_net: Fix short frame length check
  virtio_net: Reuse buffer free function

 drivers/net/virtio_net.c | 10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

-- 
2.26.2



[PATCH net-next 1/2] virtio_net: Fix short frame length check

2023-01-13 Thread Parav Pandit via Virtualization
The smallest Ethernet frame defined by IEEE 802.3 is 60 bytes, excluding the
preamble and CRC.

The current code only checks for the minimal 14-byte Ethernet header length.
Correct it to consider the minimum Ethernet frame length.

Fixes: 296f96fcfc16 ("Net driver using virtio")
Signed-off-by: Parav Pandit 
---
 drivers/net/virtio_net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 7723b2a49d8e..d45e140b6852 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1248,7 +1248,7 @@ static void receive_buf(struct virtnet_info *vi, struct 
receive_queue *rq,
struct sk_buff *skb;
struct virtio_net_hdr_mrg_rxbuf *hdr;
 
-   if (unlikely(len < vi->hdr_len + ETH_HLEN)) {
+   if (unlikely(len < vi->hdr_len + ETH_ZLEN)) {
pr_debug("%s: short packet %i\n", dev->name, len);
dev->stats.rx_length_errors++;
if (vi->mergeable_rx_bufs) {
-- 
2.26.2



RE: [PATCH v2 3/4] vdpa: show dev config as-is in "vdpa dev show" output

2022-10-20 Thread Parav Pandit via Virtualization
> From: Si-Wei Liu 
> Sent: Thursday, October 20, 2022 2:12 PM
> 
> 
> On 10/19/2022 10:25 PM, Jason Wang wrote:
> > On Wed, Oct 19, 2022 at 8:56 AM Si-Wei Liu 
> wrote:
> >> Live migration of vdpa would typically require re-instate vdpa device
> >> with an idential set of configs on the destination node, same way as
> >> how source node created the device in the first place. In order to
> >> save orchestration software from memorizing and keeping track of vdpa
> >> config, it will be helpful if the vdpa tool provides the aids for
> >> exporting the initial configs from which vdpa device was created
> >> as-is. The "vdpa dev show" command seems to be the right vehicle for
> >> that. It is unlike the "vdpa dev config show" command output that
> >> usually goes with the live value in the device config space, which is
> >> not quite reliable subject to the dynamics of feature negotiation and
> >> possible change in device config space.
> >>
> >> Examples:
> >>
> >> 1) Create vDPA by default without any config attribute
> >>
> >> $ vdpa dev add mgmtdev pci/:41:04.2 name vdpa0 $ vdpa dev show
> >> vdpa0
> >> vdpa0: type network mgmtdev pci/:41:04.2 vendor_id  max_vqs
> 9
> >> max_vq_size 256 $ vdpa dev -jp show vdpa0 {
> >>  "dev": {
> >>  "vdpa0": {
> >>  "type": "network",
> >>  "mgmtdev": "pci/:41:04.2",
> >>  "vendor_id": ,
> >>  "max_vqs": 9,
> >>  "max_vq_size": 256,
> >>  }
> >>  }
> >> }
> >>
> >> 2) Create vDPA with config attribute(s) specified
> >>
> >> $ vdpa dev add mgmtdev pci/:41:04.2 name vdpa0 \
> >>  mac e4:11:c6:d3:45:f0 max_vq_pairs 4 $ vdpa dev show
> >> vdpa0: type network mgmtdev pci/:41:04.2 vendor_id  max_vqs
> 9 max_vq_size 256
> >>virtio_config: mac e4:11:c6:d3:45:f0 max_vq_pairs 4 $ vdpa dev -jp
> >> show {
> >>  "dev": {
> >>  "vdpa0": {
> >>  "type": "network",
> >>  "mgmtdev": "pci/:41:04.2",
> >>  "vendor_id": ,
> >>  "max_vqs": 9,
> >>  "max_vq_size": 256,
> >>  "virtio_config": {
Since most of the config is related to virtio, maybe it is better to do
s/virtio_config/static_config or
s/virtio_config/dev_config

This would clearly indicate that this is the device's static configuration.

> >>  "mac": "e4:11:c6:d3:45:f0",
> >>  "max_vq_pairs": 4
> >>  }
> >>  }
> >>  }
> >> }
> >>
> >> Signed-off-by: Si-Wei Liu 
> >> ---
> >>   drivers/vdpa/vdpa.c | 39 +++
> >>   1 file changed, 39 insertions(+)
> >>
> >> diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c index
> >> 566c1c6..91eca6d 100644
> >> --- a/drivers/vdpa/vdpa.c
> >> +++ b/drivers/vdpa/vdpa.c
> >> @@ -677,6 +677,41 @@ static int vdpa_nl_cmd_dev_del_set_doit(struct
> sk_buff *skb, struct genl_info *i
> >>   }
> >>
> >>   static int
> >> +vdpa_dev_cfgattrs_fill(struct vdpa_device *vdev, struct sk_buff
> >> +*msg, u32 device_id) {
> >> +   struct vdpa_dev_set_config *cfg = &vdev->vdev_cfg;
> >> +   int err = -EMSGSIZE;
> >> +
> >> +   if (!cfg->mask)
> >> +   return 0;
> >> +
> >> +   switch (device_id) {
> >> +   case VIRTIO_ID_NET:
> >> +   if ((cfg->mask &
> BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MACADDR)) != 0 &&
> >> +   nla_put(msg, VDPA_ATTR_DEV_NET_CFG_MACADDR,
> >> +   sizeof(cfg->net.mac), cfg->net.mac))
> >> +   return err;
> >> +   if ((cfg->mask & BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MTU)) !=
> 0 &&
> >> +   nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MTU, cfg-
> >net.mtu))
> >> +   return err;
> >> +   if ((cfg->mask &
> BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MAX_VQP)) != 0 &&
> >> +   nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MAX_VQP,
> >> +   cfg->net.max_vq_pairs))
> >> +   return err;
> >> +   break;
> > This makes me think if we can reuse the virtio_net_config structure
> > other than duplicate it slowly with a dedicated nested structure
> > inside vdpa_dev_set_config then we can reuse the
> > vdpa_dev_net_config_fill().
> Adding Parav.
> 
> I think for now the struct vdpa_dev_set_config has just a few fields, so it's
> not very obvious. But from what I understand, the vdpa_dev_set_config
> struct is designed to be built around vdpa configurables, without getting it
> limited by what's exposed by the virtio device config structure, such as
> virtio_net_config. 
Sure. vdpa_dev_set_config can expand to fields outside of the virtio_net_config
structure space, but it should stay close to the virtio spec definition, as you
described below, or close to Linux kernel objects.

At present it can handle another 62 fields, which I think is good enough
for the midterm.

> For instance, there could be possibility for vdpa user to
> specify 

RE: [PATCH v5 2/2] virtio-net: use mtu size as buffer length for big packets

2022-10-05 Thread Parav Pandit via Virtualization
> From: Jakub Kicinski 
> Sent: Thursday, September 22, 2022 6:05 PM
> 
> On Thu, 22 Sep 2022 06:14:59 -0400 Michael S. Tsirkin wrote:
> > It's nitpicking to be frank. v6 arrived while I was traveling and I
> > didn't notice it.  I see Jason acked that so I guess I will just apply
> > as is.
> 
> Oh, you wanna take it? The split on who takes virtio_net is a touch unclear
> but if it makes more sense here to go via virtio feel free to slap my ack on 
> the
> v6.

Michael,

Did you get a chance to take this?
I don't see it at 
https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git/log/?h=linux-next


RE: [PATCH v5 2/2] virtio-net: use mtu size as buffer length for big packets

2022-09-22 Thread Parav Pandit via Virtualization


> From: Michael S. Tsirkin 
> Sent: Thursday, September 22, 2022 6:15 AM
> 
> It's nitpicking to be frank. v6 arrived while I was traveling and I didn't 
> notice it.
> I see Jason acked that so I guess I will just apply as is. Do you ack v6 too?
> 
Yes. I reviewed it. Gavin added reviewed-by.
Thanks.


RE: [PATCH v5 2/2] virtio-net: use mtu size as buffer length for big packets

2022-09-22 Thread Parav Pandit via Virtualization


> From: Michael S. Tsirkin 
> Sent: Thursday, September 22, 2022 5:27 AM

> > >
> > > And I'd like commit log to include results of perf testing
> > > - with indirect feature on
> > Which device do you suggest using for this test?
> 
> AFAIK most devices support INDIRECT, e.g. don't nvidia cards do this?
The device of interest didn't have INDIRECT support.
But the question is no longer relevant, as Gavin supplied the numbers with
INDIRECT in the commit log of v6.


RE: [PATCH v5 2/2] virtio-net: use mtu size as buffer length for big packets

2022-09-22 Thread Parav Pandit via Virtualization


> From: Michael S. Tsirkin 
> Sent: Thursday, September 22, 2022 5:35 AM
> 
> On Thu, Sep 01, 2022 at 05:10:38AM +0300, Gavin Li wrote:
> > Currently add_recvbuf_big() allocates MAX_SKB_FRAGS segments for big
> > packets even when GUEST_* offloads are not present on the device.
> > However, if guest GSO is not supported, it would be sufficient to
> > allocate segments to cover just up the MTU size and no further.
> > Allocating the maximum amount of segments results in a large waste of
> > buffer space in the queue, which limits the number of packets that can
> > be buffered and can result in reduced performance.
> >
> > Therefore, if guest GSO is not supported, use the MTU to calculate the
> > optimal amount of segments required.
> >
> > When guest offload is enabled at runtime, RQ already has packets of
> > bytes less than 64K. So when packet of 64KB arrives, all the packets
> > of such size will be dropped. and RQ is now not usable.
> >
> > So this means that during set_guest_offloads() phase, RQs have to be
> > destroyed and recreated, which requires almost driver reload.
> >
> > If VIRTIO_NET_F_CTRL_GUEST_OFFLOADS has been negotiated, then it
> > should always treat them as GSO enabled.
> >
> > Accordingly, for now the assumption is that if guest GSO has been
> > negotiated then it has been enabled, even if it's actually been
> > disabled at runtime through VIRTIO_NET_F_CTRL_GUEST_OFFLOADS.
> >
> > Below is the iperf TCP test results over a Mellanox NIC, using vDPA
> > for
> > 1 VQ, queue size 1024, before and after the change, with the iperf
> > server running over the virtio-net interface.
> >
> > MTU(Bytes)/Bandwidth (Gbit/s)
> >  Before   After
> >   150022.5 22.4
> >   900012.8 25.9
> >
> > Signed-off-by: Gavin Li 
> > Reviewed-by: Gavi Teitz 
> > Reviewed-by: Parav Pandit 
> > Reviewed-by: Xuan Zhuo 
> > Reviewed-by: Si-Wei Liu 
> 
> OK I think the logic is correct, it's just a bit harder to read than 
> necessary.
> Small improvement suggestions:
> 
> 
> > ---
> > changelog:
> > v4->v5
> > - Addressed comments from Michael S. Tsirkin
> > - Improve commit message
> > v3->v4
> > - Addressed comments from Si-Wei
> > - Rename big_packets_sg_num with big_packets_num_skbfrags
> > v2->v3
> > - Addressed comments from Si-Wei
> > - Simplify the condition check to enable the optimization
> > v1->v2
> > - Addressed comments from Jason, Michael, Si-Wei.
> > - Remove the flag of guest GSO support, set sg_num for big packets and
> >   use it directly
> > - Recalculate sg_num for big packets in virtnet_set_guest_offloads
> > - Replace the round up algorithm with DIV_ROUND_UP
> > ---
> >  drivers/net/virtio_net.c | 37 -
> >  1 file changed, 24 insertions(+), 13 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index
> > f831a0290998..dbffd5f56fb8 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -225,6 +225,9 @@ struct virtnet_info {
> > /* I like... big packets and I cannot lie! */
> > bool big_packets;
> >
> > +   /* number of sg entries allocated for big packets */
> > +   unsigned int big_packets_num_skbfrags;
> > +
> > /* Host will merge rx buffers for big packets (shake it! shake it!) */
> > bool mergeable_rx_bufs;
> >
> 
> big_packets_num_skbfrags -> big_packet_num_skbfrags
> 
> > @@ -1331,10 +1334,10 @@ static int add_recvbuf_big(struct virtnet_info
> *vi, struct receive_queue *rq,
> > char *p;
> > int i, err, offset;
> >
> > -   sg_init_table(rq->sg, MAX_SKB_FRAGS + 2);
> > +   sg_init_table(rq->sg, vi->big_packets_num_skbfrags + 2);
> >
> > -   /* page in rq->sg[MAX_SKB_FRAGS + 1] is list tail */
> > -   for (i = MAX_SKB_FRAGS + 1; i > 1; --i) {
> > +   /* page in rq->sg[vi->big_packets_num_skbfrags + 1] is list tail */
> > +   for (i = vi->big_packets_num_skbfrags + 1; i > 1; --i) {
> > first = get_a_page(rq, gfp);
> > if (!first) {
> > if (list)
> > @@ -1365,7 +1368,7 @@ static int add_recvbuf_big(struct virtnet_info
> > *vi, struct receive_queue *rq,
> >
> > /* chain first in list head */
> > first->private = (unsigned long)list;
> > -   err = virtqueue_add_inbuf(rq->vq, rq->sg, MAX_SKB_FRAGS + 2,
> > +   err = virtqueue_add_inbuf(rq->vq, rq->sg,
> > +vi->big_packets_num_skbfrags + 2,
> >   first, gfp);
> > if (err < 0)
> > give_pages(rq, first);
> > @@ -3690,13 +3693,27 @@ static bool virtnet_check_guest_gso(const
> struct virtnet_info *vi)
> > virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_UFO);  }
> >
> > +static void virtnet_set_big_packets_fields(struct virtnet_info *vi,
> > +const int mtu) {
> > +   bool guest_gso = virtnet_check_guest_gso(vi);
> > +
> > +   /* If device can receive ANY guest GSO packets, regardless of mtu,
> > +* allocate packets of maximum size, otherwise limit it to only
> > +* mtu size 

RE: [PATCH 0/3] vdpa: device feature provisioning

2022-09-20 Thread Parav Pandit via Virtualization
Hi Jason,

> From: Jason Wang 
> Sent: Thursday, September 15, 2022 4:51 AM
> To: m...@redhat.com; jasow...@redhat.com
> Cc: Eli Cohen ; si-wei@oracle.com; Parav Pandit
> ; wuzongy...@linux.alibaba.com;
> virtualization@lists.linux-foundation.org; linux-ker...@vger.kernel.org;
> epere...@redhat.com; lingshan@intel.com; gda...@xilinx.com;
> l...@redhat.com; xieyon...@bytedance.com
> Subject: [PATCH 0/3] vdpa: device feature provisioning
> 
> Hi All:
> 
> Virtio features are negotiated between the device and the drivers. This allows
> a mediation layer like vDPA to hide some features from the driver to
> facilitate cross-vendor live migration:
> 
> vDPA on the source supports feature set X
> vDPA on the destination supports feature set Y
> 
> Management can simply provision the vDPA instance with feature set X on
> both source and destination so that the vDPA device is migratable between
> the two vDPA devices with different feature support.
> 
> This series tries to allow the device features to be provisioned via netlink 
> to
> achieve this.
Very useful series.
Can you please add example vdpa commands, with and without the -jp option, to
each of the commit logs?

Thanks.


RE: [virtio-dev] RE: [PATCH v5 2/2] virtio-net: use mtu size as buffer length for big packets

2022-09-07 Thread Parav Pandit via Virtualization

> From: Si-Wei Liu 
> Sent: Wednesday, September 7, 2022 5:40 PM
> 
> 
> On 9/7/2022 12:51 PM, Parav Pandit wrote:
> >> And I'd like commit log to include results of perf testing
> >> - with indirect feature on
> > Which device do you suggest using for this test?
> >
> You may use software vhost-net backend with and without fix to compare.
> Since this driver fix effectively lowers down the buffer size for the
> indirect=on case as well, 
Do you have an example of this?

> it's a natural request to make sure no perf
> degradation is seen on devices with indirect descriptor enabled. I don't
> expect degradation though and think this patch should improve efficiency
> with less memory foot print.
> 
Is there any specific reason to leave out a test for the packed vq here? The
change applies to the packed vq too.

It is counter-intuitive to see degradation with smaller buffers.
But sure, code reviews can miss things that introduce regressions, so it should
be tested.

I am not against the test itself; more test coverage is a good thing.
What puzzles me is that I don't see such test results in previous commits and
cover letters, which makes this patch special in terms of test coverage.
Or does a new trend begin with this specific patch?

For example, I don't see test results in commits [1], [2], [3], which heavily
touch the lock in the core data path, indicating that no degradation is seen.

So I want to know the test reporting guidelines for the commit logs of this and
other patches.

[1] commit 5a2f966d0f
[2] commit a7766ef18b33 
[3] commit 22bc63c58e876

RE: [PATCH v5 2/2] virtio-net: use mtu size as buffer length for big packets

2022-09-07 Thread Parav Pandit via Virtualization


> From: Michael S. Tsirkin 
> Sent: Wednesday, September 7, 2022 3:36 PM
> 
> (c) replace mtu = 0 with sensibly not calling the function when mtu is
> unknown.
Even when the mtu is zero, virtnet_set_big_packets() must be called to act on
the GSO bits.
The current handling, where virtnet_set_big_packets() takes care of both mtu
and GSO, seems simpler.
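
For context, a standalone model of that logic (names follow the quoted v5 patch;
the constants assume 4K pages and a 1500-byte default MTU; this is an illustration,
not the driver code):

#include <stdio.h>

#define PAGE_SIZE     4096
#define ETH_DATA_LEN  1500
#define MAX_SKB_FRAGS 17                 /* typical value with 4K pages */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

struct vnet_model {
	int big_packets;
	unsigned int big_packets_num_skbfrags;
};

/* big-packet mode is entered for a jumbo MTU or for any guest GSO feature */
static void set_big_packets(struct vnet_model *vi, int guest_gso, int mtu)
{
	if (mtu > ETH_DATA_LEN || guest_gso) {
		vi->big_packets = 1;
		vi->big_packets_num_skbfrags = guest_gso ?
			MAX_SKB_FRAGS : DIV_ROUND_UP(mtu, PAGE_SIZE);
	}
}

int main(void)
{
	struct vnet_model a = {0}, b = {0}, c = {0};

	set_big_packets(&a, 1, 0);     /* GSO on, MTU feature not offered */
	set_big_packets(&b, 0, 9000);  /* GSO off, 9K MTU */
	set_big_packets(&c, 0, 0);     /* neither: big packets stay off */
	printf("a: big=%d frags=%u\n", a.big_packets, a.big_packets_num_skbfrags);
	printf("b: big=%d frags=%u\n", b.big_packets, b.big_packets_num_skbfrags);
	printf("c: big=%d frags=%u\n", c.big_packets, c.big_packets_num_skbfrags);
	return 0;
}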


RE: [PATCH v5 2/2] virtio-net: use mtu size as buffer length for big packets

2022-09-07 Thread Parav Pandit via Virtualization


> From: Michael S. Tsirkin 
> Sent: Wednesday, September 7, 2022 3:38 PM
> 
> and if possible a larger ring. 1k?

What do you expect to see here that the test report added to the commit log
should show?
What is special about 1k vs. 512, 128 and 2k? Is 1k the default for some
configuration?


RE: [PATCH v5 2/2] virtio-net: use mtu size as buffer length for big packets

2022-09-07 Thread Parav Pandit via Virtualization


> From: Michael S. Tsirkin 
> Sent: Wednesday, September 7, 2022 3:36 PM
> 
> On Wed, Sep 07, 2022 at 07:27:16PM +, Parav Pandit wrote:
> >
> > > From: Michael S. Tsirkin 
> > > Sent: Wednesday, September 7, 2022 3:24 PM
> > >
> > > On Wed, Sep 07, 2022 at 07:18:06PM +, Parav Pandit wrote:
> > > >
> > > > > From: Michael S. Tsirkin 
> > > > > Sent: Wednesday, September 7, 2022 3:12 PM
> > > >
> > > > > > Because of shallow queue of 16 entries deep.
> > > > >
> > > > > but why is the queue just 16 entries?
> > > > I explained the calculation in [1] about 16 entries.
> > > >
> > > > [1]
> > > >
> > >
> https://lore.kernel.org/netdev/PH0PR12MB54812EC7F4711C1EA4CAA119DC
> > > 419@
> > > > PH0PR12MB5481.namprd12.prod.outlook.com/
> > > >
> > > > > does the device not support indirect?
> > > > >
> > > > Yes, indirect feature bit is disabled on the device.
> > >
> > > OK that explains it.
> >
> > So can we proceed with v6 to contain
> > (a) updated commit message and
> > (b) function name change you suggested to drop _fields suffix?
> 
> (c) replace mtu = 0 with sensibly not calling the function when mtu is
> unknown.

> 
> 
> And I'd like commit log to include results of perf testing
> - with indirect feature on
Which device do you suggest using for this test?

> - with mtu feature off
Why is this needed when the patch does not touch the case where the mtu feature is not offered?

> just to make sure nothing breaks.
I am not sure why you demand this.
Can you please share links to other patches that ensured nothing breaks?
For example, I didn't see a similar test ask in the v14 series [1].
What is so special about the current patch vs. [1] that requires these testing
details in the commit log, when they were not required in [1] or past patches?
Do you have a link to the synchronization tests done for commit [2]?

This will help define the test matrix for developers and internal regression,
and a similar report in all subsequent patches like [1].

[1] 
https://lore.kernel.org/bpf/20220801063902.129329-41-xuanz...@linux.alibaba.com/
[2] 6213f07cb54


RE: [PATCH v5 2/2] virtio-net: use mtu size as buffer length for big packets

2022-09-07 Thread Parav Pandit via Virtualization


> From: Michael S. Tsirkin 
> Sent: Wednesday, September 7, 2022 3:24 PM
> 
> On Wed, Sep 07, 2022 at 07:18:06PM +, Parav Pandit wrote:
> >
> > > From: Michael S. Tsirkin 
> > > Sent: Wednesday, September 7, 2022 3:12 PM
> >
> > > > Because of shallow queue of 16 entries deep.
> > >
> > > but why is the queue just 16 entries?
> > I explained the calculation in [1] about 16 entries.
> >
> > [1]
> >
> https://lore.kernel.org/netdev/PH0PR12MB54812EC7F4711C1EA4CAA119DC
> 419@
> > PH0PR12MB5481.namprd12.prod.outlook.com/
> >
> > > does the device not support indirect?
> > >
> > Yes, indirect feature bit is disabled on the device.
> 
> OK that explains it.

So can we proceed with a v6 that contains
(a) the updated commit message and
(b) the function name change you suggested, dropping the _fields suffix?


RE: [PATCH v5 2/2] virtio-net: use mtu size as buffer length for big packets

2022-09-07 Thread Parav Pandit via Virtualization


> From: Michael S. Tsirkin 
> Sent: Wednesday, September 7, 2022 3:12 PM

> > Because of shallow queue of 16 entries deep.
> 
> but why is the queue just 16 entries?
I explained the calculation in [1] about 16 entries.

[1] 
https://lore.kernel.org/netdev/ph0pr12mb54812ec7f4711c1ea4caa119dc...@ph0pr12mb5481.namprd12.prod.outlook.com/

> does the device not support indirect?
>
Yes, indirect feature bit is disabled on the device.
 
> because with indirect you get 256 entries, with 16 s/g each.
> 
Sure. I explained below that indirect comes with a 7x memory cost, which is not
desired (and that ignores the indirect table allocation cost and the extra
latency).

Hence we don't want to enable indirect in this scenario.
This optimization also works with indirect, just with a smaller indirect table.

> 
> > With driver turn around time to repost buffers, device is idle without any
> RQ buffers.
> > With this improvement, device has 85 buffers instead of 16 to receive
> packets.
> >
> > Enabling indirect in device can help at cost of 7x higher memory per VQ in
> the guest VM.



RE: [PATCH v5 2/2] virtio-net: use mtu size as buffer length for big packets

2022-09-07 Thread Parav Pandit via Virtualization
> From: Michael S. Tsirkin 
> Sent: Wednesday, September 7, 2022 2:16 PM
> 
> On Wed, Sep 07, 2022 at 04:12:47PM +, Parav Pandit wrote:
> >
> > > From: Michael S. Tsirkin 
> > > Sent: Wednesday, September 7, 2022 10:40 AM
> > >
> > > On Wed, Sep 07, 2022 at 02:33:02PM +, Parav Pandit wrote:
> > > >
> > > > > From: Michael S. Tsirkin 
> > > > > Sent: Wednesday, September 7, 2022 10:30 AM
> > > >
> > > > [..]
> > > > > > > actually how does this waste space? Is this because your
> > > > > > > device does not have INDIRECT?
> > > > > > VQ is 256 entries deep.
> > > > > > Driver posted total of 256 descriptors.
> > > > > > Each descriptor points to a page of 4K.
> > > > > > These descriptors are chained as 4K * 16.
> > > > >
> > > > > So without indirect then? with indirect each descriptor can
> > > > > point to
> > > > > 16 entries.
> > > > >
> > > > With indirect, can it post 256 * 16 size buffers even though vq
> > > > depth is 256
> > > entries?
> > > > I recall that total number of descriptors with direct/indirect
> > > > descriptors is
> > > limited to vq depth.
> > >
> > >
> > > > Was there some recent clarification occurred in the spec to clarify 
> > > > this?
> > >
> > >
> > > This would make INDIRECT completely pointless.  So I don't think we
> > > ever had such a limitation.
> > > The only thing that comes to mind is this:
> > >
> > >   A driver MUST NOT create a descriptor chain longer than the Queue
> > > Size of
> > >   the device.
> > >
> > > but this limits individual chain length not the total length of all 
> > > chains.
> > >
> > Right.
> > I double checked in virtqueue_add_split() which doesn't count table
> entries towards desc count of VQ for indirect case.
> >
> > With indirect descriptors without this patch the situation is even worse
> with memory usage.
> > Driver will allocate 64K * 256 = 16MB buffer per VQ, while needed (and
> used) buffer is only 2.3 Mbytes.
> 
> Yes. So just so we understand the reason for the performance improvement
> is this because of memory usage? Or is this because device does not have
> INDIRECT?

Because of the shallow effective queue of only 16 entries.
Given the driver's turnaround time to repost buffers, the device sits idle
without any RQ buffers.
With this improvement, the device has 85 buffers instead of 16 to receive
packets.

Enabling indirect in the device can help, at the cost of 7x higher memory per
VQ in the guest VM.



RE: [PATCH v5 2/2] virtio-net: use mtu size as buffer length for big packets

2022-09-07 Thread Parav Pandit via Virtualization


> From: Michael S. Tsirkin 
> Sent: Wednesday, September 7, 2022 10:40 AM
> 
> On Wed, Sep 07, 2022 at 02:33:02PM +, Parav Pandit wrote:
> >
> > > From: Michael S. Tsirkin 
> > > Sent: Wednesday, September 7, 2022 10:30 AM
> >
> > [..]
> > > > > actually how does this waste space? Is this because your device
> > > > > does not have INDIRECT?
> > > > VQ is 256 entries deep.
> > > > Driver posted total of 256 descriptors.
> > > > Each descriptor points to a page of 4K.
> > > > These descriptors are chained as 4K * 16.
> > >
> > > So without indirect then? with indirect each descriptor can point to
> > > 16 entries.
> > >
> > With indirect, can it post 256 * 16 size buffers even though vq depth is 256
> entries?
> > I recall that total number of descriptors with direct/indirect descriptors 
> > is
> limited to vq depth.
> 
> 
> > Was there some recent clarification occurred in the spec to clarify this?
> 
> 
> This would make INDIRECT completely pointless.  So I don't think we ever
> had such a limitation.
> The only thing that comes to mind is this:
> 
>   A driver MUST NOT create a descriptor chain longer than the Queue
> Size of
>   the device.
> 
> but this limits individual chain length not the total length of all chains.
> 
Right.
I double checked virtqueue_add_split(), which does not count indirect table
entries towards the VQ's descriptor count.

With indirect descriptors and without this patch, the memory usage situation is
even worse: the driver will allocate 64K * 256 = 16MB of buffers per VQ, while
the needed (and used) buffer space is only about 2.3 Mbytes.


RE: [PATCH v5 2/2] virtio-net: use mtu size as buffer length for big packets

2022-09-07 Thread Parav Pandit via Virtualization


> From: Michael S. Tsirkin 
> Sent: Wednesday, September 7, 2022 10:30 AM

[..]
> > > actually how does this waste space? Is this because your device does
> > > not have INDIRECT?
> > VQ is 256 entries deep.
> > Driver posted total of 256 descriptors.
> > Each descriptor points to a page of 4K.
> > These descriptors are chained as 4K * 16.
> 
> So without indirect then? with indirect each descriptor can point to 16
> entries.
> 
With indirect, can the driver post 256 * 16 sized buffers even though the vq
depth is 256 entries?
I recall that the total number of descriptors, direct plus indirect, is limited
to the vq depth.
Was there some recent clarification in the spec on this?


RE: [PATCH v5 2/2] virtio-net: use mtu size as buffer length for big packets

2022-09-07 Thread Parav Pandit via Virtualization


> From: Michael S. Tsirkin 
> Sent: Wednesday, September 7, 2022 5:27 AM
> 
> On Wed, Sep 07, 2022 at 04:08:54PM +0800, Gavin Li wrote:
> >
> > On 9/7/2022 1:31 PM, Michael S. Tsirkin wrote:
> > > External email: Use caution opening links or attachments
> > >
> > >
> > > On Thu, Sep 01, 2022 at 05:10:38AM +0300, Gavin Li wrote:
> > > > Currently add_recvbuf_big() allocates MAX_SKB_FRAGS segments for
> > > > big packets even when GUEST_* offloads are not present on the
> device.
> > > > However, if guest GSO is not supported, it would be sufficient to
> > > > allocate segments to cover just up the MTU size and no further.
> > > > Allocating the maximum amount of segments results in a large waste
> > > > of buffer space in the queue, which limits the number of packets
> > > > that can be buffered and can result in reduced performance.
> 
> actually how does this waste space? Is this because your device does not
> have INDIRECT?
VQ is 256 entries deep.
Driver posted total of 256 descriptors.
Each descriptor points to a page of 4K.
These descriptors are chained as 4K * 16.
So total packets that can be serviced are 256/16 = 16.
So effective queue depth = 16.

So, when GSO is off, for 9K mtu, packet buffer needed = 3 pages. (12k).
So, 13 descriptors (= 13 x 4K =52K) per packet buffer is wasted.

After this improvement, these 13 descriptors are available, increasing the 
effective queue depth = 256/3 = 85.

[..]
> > > >
> > > > MTU(Bytes)   Bandwidth before (Gbit/s)   Bandwidth after (Gbit/s)
> > > > 1500         22.5                        22.4
> > > > 9000         12.8                        25.9
> 
> 
> is this buffer space?
The performance numbers above show the improvement in bandwidth, in Gbit/s.

> just the overhead of allocating/freeing the buffers?
> of using INDIRECT?
The effective queue depth is so small that the device cannot receive all the
packets for the given bandwidth-delay product.

> > >
> > > Which configurations were tested?
> > I tested it with DPDK vDPA + qemu vhost. Do you mean the feature set
> > of the VM?
> 
The configuration of interest is the mtu, not the backend; the different mtu
values are shown in the perf numbers above.

> > > Did you test devices without VIRTIO_NET_F_MTU ?
> > No.  It will need code changes.
No, it doesn't need any code changes; that answer is misleading/vague.

This patch has no relation to a device which doesn't offer VIRTIO_NET_F_MTU.
Only the code restructuring touches this area, which may warrant running some
existing tests.
I assume the virtio tree has some automated tests for such a device?

> > > >
> > > > @@ -3853,12 +3866,10 @@ static int virtnet_probe(struct
> > > > virtio_device *vdev)
> > > >
> > > >dev->mtu = mtu;
> > > >dev->max_mtu = mtu;
> > > > -
> > > > - /* TODO: size buffers correctly in this case. */
> > > > - if (dev->mtu > ETH_DATA_LEN)
> > > > - vi->big_packets = true;
> > > >}
> > > >
> > > > + virtnet_set_big_packets_fields(vi, mtu);
> > > > +
> > > If VIRTIO_NET_F_MTU is off, then mtu is uninitialized.
> > > You should move it to within if () above to fix.
> > mtu was initialized to 0 at the beginning of probe if VIRTIO_NET_F_MTU
> > is off.
> >
> > In this case,  big_packets_num_skbfrags will be set according to guest gso.
> >
> > If guest gso is supported, it will be set to MAX_SKB_FRAGS else
> > zero do you
> >
> > think this is a bug to be fixed?
> 
> 
> yes I think with no mtu this should behave as it did historically.
> 
Michael is right.
It should behave as it does today; there is no new bug introduced by this patch.
dev->mtu and dev->max_mtu are set only when VIRTIO_NET_F_MTU is offered, both
with and without this patch.

Please put any mtu related fix/change in a different patch.

> > >
> > > >if (vi->any_header_sg)
> > > >dev->needed_headroom = vi->hdr_len;
> > > >
> > > > --
> > > > 2.31.1



RE: [PATCH v4 1/2] virtio-net: introduce and use helper function for guest gso support checks

2022-08-31 Thread Parav Pandit via Virtualization


> From: Gavin Li 
> Sent: Wednesday, August 31, 2022 5:03 AM
> 
> Probe routine is already several hundred lines.
> Use helper function for guest gso support check.
> 
> Signed-off-by: Gavin Li 
> Reviewed-by: Gavi Teitz 
> Reviewed-by: Parav Pandit 

You missed the Reviewed-by and Acked-by entries that were added in the email
replies for v3 of this patch and of the 2nd patch.
Please add them.


RE: [virtio-dev] [PATCH RESEND v2 2/2] virtio-net: use mtu size as buffer length for big packets

2022-08-26 Thread Parav Pandit via Virtualization



> From: Si-Wei Liu 
> Sent: Friday, August 26, 2022 2:05 PM
> 

> > +   /* If we can receive ANY GSO packets, we must allocate large ones.
> */
> > +   if (mtu > ETH_DATA_LEN || guest_gso) {
> > +   vi->big_packets = true;
> > +   /* if the guest offload is offered by the device, user can
> modify
> > +* offload capability. In such posted buffers may short fall of
> size.
> > +* Hence allocate for max size.
> > +*/
> > +   if (virtio_has_feature(vi->vdev,
> VIRTIO_NET_F_CTRL_GUEST_OFFLOADS))
> > +   vi->big_packets_sg_num = MAX_SKB_FRAGS;
> >> MAX_SKB_FRAGS is needed when any of the guest_gso capability is
> offered. This is per spec regardless if
> VIRTIO_NET_F_CTRL_GUEST_OFFLOADS is negotiated or not. Quoting spec:
> >
> >
> >> If VIRTIO_NET_F_MRG_RXBUF is not negotiated:
> >> If VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6 or
> VIRTIO_NET_F_GUEST_UFO are negotiated, the driver SHOULD populate the
> receive queue(s) with buffers of at least 65562 bytes.
> >
> > Spec recommendation is good here, but Linux driver knows that such
> offload settings cannot change if the above feature is not offered.
> > So I think we should add the comment and reference to the code to have
> this optimization.
> 
> I don't get what you mean by optimization here. Say if
> VIRTIO_NET_F_GUEST_TSO4 is negotiated while
> VIRTIO_NET_F_CTRL_GUEST_OFFLOADS is not offered, why you consider it
> an optimization to post only MTU sized (with roundup to page) buffers?
> Wouldn't it be an issue if the device is passing up aggregated GSO packets of
> up to 64KB? Noted, GUEST_GSO is implied on when the corresponding
> feature bit is negotiated, regardless the presence of
> VIRTIO_NET_F_CTRL_GUEST_OFFLOADS bit.
> 
You are right.
The netdev's NETIF_F_GRO_HW setting is already guarded by a
VIRTIO_NET_F_CTRL_GUEST_OFFLOADS bit check.
So it's fine; I missed that code previously.

I agree the condition check should be simpler, like below.

if (guest_gso || mtu > ETH_DATA_LEN) {
	vi->big_packets = true;
	vi->big_packets_sg_num = guest_gso ? MAX_SKB_FRAGS :
				 DIV_ROUND_UP(mtu, PAGE_SIZE);
}
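
For completeness, here is a self-contained sketch (not the exact code of the
patch under review; the helper and field names are assumed from this thread) of
how the simplified check could look once folded into a helper, with guest_gso
derived from the negotiated GUEST_* features:

/* Sketch only: size big-packet rx buffers to the MTU unless guest GSO is
 * negotiated, in which case frames of up to 64K can arrive and MAX_SKB_FRAGS
 * pages are needed.
 */
static void virtnet_set_big_packets(struct virtnet_info *vi, const int mtu)
{
	bool guest_gso = virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO4) ||
			 virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO6) ||
			 virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_UFO);

	if (guest_gso || mtu > ETH_DATA_LEN) {
		vi->big_packets = true;
		vi->big_packets_sg_num = guest_gso ? MAX_SKB_FRAGS :
					 DIV_ROUND_UP(mtu, PAGE_SIZE);
	}
}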


RE: [virtio-dev] [PATCH RESEND v2 2/2] virtio-net: use mtu size as buffer length for big packets

2022-08-26 Thread Parav Pandit via Virtualization


From: Si-Wei Liu  
Sent: Friday, August 26, 2022 4:52 AM

> Sorry for the delay. Didn't notice as this thread was not addressed to my 
> work email. Please copy to my work email if it needs my immediate attention.

Can you please set up your mail client to post plain-text mail, as required by
the mailing list?
The conversation is close to impossible to track without it.
+	/* If we can receive ANY GSO packets, we must allocate large ones. */
+	if (mtu > ETH_DATA_LEN || guest_gso) {
+		vi->big_packets = true;
+		/* if the guest offload is offered by the device, user can modify
+		 * offload capability. In such posted buffers may short fall of size.
+		 * Hence allocate for max size.
+		 */
+		if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS))
+			vi->big_packets_sg_num = MAX_SKB_FRAGS;
> MAX_SKB_FRAGS is needed when any of the guest_gso capability is offered. This 
> is per spec regardless if VIRTIO_NET_F_CTRL_GUEST_OFFLOADS is negotiated or 
> not. Quoting spec:


> If VIRTIO_NET_F_MRG_RXBUF is not negotiated: 
> If VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6 or VIRTIO_NET_F_GUEST_UFO 
> are negotiated, the driver SHOULD populate the receive queue(s) with buffers 
> of at least 65562 bytes.

The spec recommendation is good here, but the Linux driver knows that such
offload settings cannot change if the above feature is not offered.
So I think we should add a comment and a spec reference to the code to justify
this optimization.


RE: [PATCH v2 0/2] Improve virtio performance for 9k mtu

2022-08-22 Thread Parav Pandit via Virtualization


> From: Gavin Li 
> Sent: Friday, August 19, 2022 10:49 AM
> 
> This small series contains two patches that improves virtio netdevice
> performance for 9K mtu when GRO/ guest TSO is disabled.
> 
> Gavin Li (2):
>   virtio-net: introduce and use helper function for guest gso support
> checks
>   virtio-net: use mtu size as buffer length for big packets
> 
>  drivers/net/virtio_net.c | 52 +++-
>  1 file changed, 36 insertions(+), 16 deletions(-)

This driver change is in drivers/net.
The series must also be sent to net...@vger.kernel.org.
I do not know whether these patches are merged through the MST tree or the
netdev tree.

You might want to wait for a day or two, and if you don't receive comments,
resend V2 with net...@vger.kernel.org included.


RE: [PATCH 2/2] vDPA: conditionally read fields in virtio-net dev

2022-08-18 Thread Parav Pandit via Virtualization

> From: Jason Wang 
> Sent: Thursday, August 18, 2022 12:19 AM

>  On 8/16/2022 10:32 AM, Parav Pandit wrote:
> >> From: Zhu Lingshan 
> >> Sent: Monday, August 15, 2022 5:27 AM
> >>
> >> Some fields of virtio-net device config space are conditional on
> >> the feature bits, the spec says:
> >>
> >> "The mac address field always exists (though is only valid if
> >> VIRTIO_NET_F_MAC is set)"
> >>
> >> "max_virtqueue_pairs only exists if VIRTIO_NET_F_MQ or
> >> VIRTIO_NET_F_RSS is set"
> >>
> >> "mtu only exists if VIRTIO_NET_F_MTU is set"
> >>
> >> so we should read MTU, MAC and MQ in the device config space
> only
> >> when these feature bits are offered.
> > Yes.
> >
> >> For MQ, if both VIRTIO_NET_F_MQ and VIRTIO_NET_F_RSS are not
> set,
>  the
> >> virtio device should have one queue pair as default value, so
> >> when userspace querying queue pair numbers, it should return
> mq=1
> >> than zero.
> > No.
> > No need to treat mac and max_qps differently.
> > It is meaningless to differentiate when field exist/not-exists vs
> > value
>  valid/not valid.
>  as we discussed before, MQ has a default value 1, to be a
>  functional virtio- net device, while MAC has no default value, if
>  no VIRTIO_NET_F_MAC set, the driver should generate a random
> MAC.
> >> For MTU, if VIRTIO_NET_F_MTU is not set, we should not read MTU
> >> from the device config space.
> >> RFC894 <A Standard for the Transmission of IP Datagrams over Ethernet
> >> Networks> says: "The minimum length of the data field of a packet sent
> >> over an Ethernet is 1500 octets, thus the maximum length of an IP
> >> datagram sent over an Ethernet is 1500 octets.  Implementations
> >> are encouraged to support full-length packets"
> > This line in the RFC 894 of 1984 is wrong.
> > Errata already exists for it at [1].
> >
> > [1]
> > https://www.rfc-editor.org/errata_search.php?rfc=894_status=0
>  OK, so I think we should return nothing if _F_MTU not set, like
>  handling the MAC
> >> virtio spec says:"The virtio network device is a virtual ethernet
> >> card", so the default MTU value should be 1500 for virtio-net.
> >>
> > Practically I have seen 1500 and highe mtu.
> > And this derivation is not good of what should be the default mtu
> > as above
>  errata exists.
> > And I see the code below why you need to work so hard to define a
> > default
>  value so that _MQ and _MTU can report default values.
> > There is really no need for this complexity and such a long commit
>  message.
> > Can we please expose feature bits as-is and report config space
> > field which
>  are valid?
> > User space will be querying both.
>  I think MAC and MTU don't have default values, so return nothing if
>  the feature bits not set, for MQ, it is still max_vq_paris == 1 by
>  default.
> >>> I have stressed enough to highlight the fact that we don’t want to
> >>> start digging default/no default, valid/no-valid part of the spec.
> >>> I prefer kernel to reporting fields that _exists_ in the config
> >>> space and are valid.
> >>> I will let MST to handle the maintenance nightmare that this kind of
> >>> patch brings in without any visible gain to user space/orchestration
> >>> apps.
> >>>
> >>> A logic that can be easily build in user space, should be written in
> >>> user space.
> >>> I conclude my thoughts here for this discussion.
> >>>
> >>> I will let MST to decide how he prefers to proceed.
> >>>
> >> +    if ((features & BIT_ULL(VIRTIO_NET_F_MTU)) == 0)
> >> +    val_u16 = 1500;
> >> +    else
> >> +    val_u16 = __virtio16_to_cpu(true, config->mtu);
> >> +
> > Need to work hard to find default values and that too turned out
> > had
>  errata.
> > There are more fields that doesn’t have default values.
> >
> > There is no point in kernel doing this guess work, that user space
> > can figure
>  out of what is valid/invalid.
>  It's not guest work, when guest finds no feature bits set, it can
>  decide what to do.
> >>> Above code of doing 1500 was probably an honest attempt to find a
> >>> legitimate default value, and we saw that it doesn’t work.
> >>> This is second example after _MQ that we both agree should not
> >>> return default.
> >>>
> >>> And there are more fields coming in this area.
> >>> Hence, I prefer to not avoid returning such defaults for MAC, MTU,
> >>> MQ and rest all fields which doesn’t _exists_.
> >>>
> >>> I will let MST to decide how he prefers to proceed for every field
> >>> to come next.
> >>> Thanks.
> >>>
> >>
> >> If MTU does not return a value without _F_MTU, and MAC does not
> >> return a value without _F_MAC then IMO yes, number of queues should
> >> not return a value without _F_MQ.
> > sure I can do this, but may I ask 

RE: [PATCH 2/2] vDPA: conditionally read fields in virtio-net dev

2022-08-16 Thread Parav Pandit via Virtualization

> From: Zhu, Lingshan 
> Sent: Tuesday, August 16, 2022 12:19 AM
> 
> 
> On 8/16/2022 10:32 AM, Parav Pandit wrote:
> >> From: Zhu Lingshan 
> >> Sent: Monday, August 15, 2022 5:27 AM
> >>
> >> Some fields of virtio-net device config space are conditional on the
> >> feature bits, the spec says:
> >>
> >> "The mac address field always exists
> >> (though is only valid if VIRTIO_NET_F_MAC is set)"
> >>
> >> "max_virtqueue_pairs only exists if VIRTIO_NET_F_MQ or
> >> VIRTIO_NET_F_RSS is set"
> >>
> >> "mtu only exists if VIRTIO_NET_F_MTU is set"
> >>
> >> so we should read MTU, MAC and MQ in the device config space only
> >> when these feature bits are offered.
> > Yes.
> >
> >> For MQ, if both VIRTIO_NET_F_MQ and VIRTIO_NET_F_RSS are not set,
> the
> >> virtio device should have one queue pair as default value, so when
> >> userspace querying queue pair numbers, it should return mq=1 than zero.
> > No.
> > No need to treat mac and max_qps differently.
> > It is meaningless to differentiate when field exist/not-exists vs value
> valid/not valid.
> as we discussed before, MQ has a default value 1, to be a functional virtio-
> net device, while MAC has no default value, if no VIRTIO_NET_F_MAC set,
> the driver should generate a random MAC.
> >
> >> For MTU, if VIRTIO_NET_F_MTU is not set, we should not read MTU from
> >> the device config space.
> >> RFC894 <A Standard for the Transmission of IP Datagrams over Ethernet
> >> Networks> says: "The minimum length of the data field of a packet sent over
> >> an Ethernet is 1500 octets, thus the maximum length of an IP datagram
> >> sent over an Ethernet is 1500 octets.  Implementations are encouraged
> >> to support full-length packets"
> > This line in the RFC 894 of 1984 is wrong.
> > Errata already exists for it at [1].
> >
> > [1] https://www.rfc-editor.org/errata_search.php?rfc=894_status=0
> OK, so I think we should return nothing if _F_MTU not set, like handling the
> MAC
> >
> >> virtio spec says:"The virtio network device is a virtual ethernet
> >> card", so the default MTU value should be 1500 for virtio-net.
> >>
> > Practically I have seen 1500 and highe mtu.
> > And this derivation is not good of what should be the default mtu as above
> errata exists.
> >
> > And I see the code below why you need to work so hard to define a default
> value so that _MQ and _MTU can report default values.
> >
> > There is really no need for this complexity and such a long commit
> message.
> >
> > Can we please expose feature bits as-is and report config space field which
> are valid?
> >
> > User space will be querying both.
> I think MAC and MTU don't have default values, so return nothing if the
> feature bits not set, 

> for MQ, it is still max_vq_paris == 1 by default.

I have stressed enough that we don't want to start digging into the default/no
default, valid/not valid parts of the spec.
I prefer the kernel to report only the fields that _exist_ in the config space
and are valid.
I will let MST handle the maintenance nightmare that this kind of patch brings
in without any visible gain to user space/orchestration apps.

Logic that can easily be built in user space should be written in user space.
I conclude my thoughts here for this discussion.

I will let MST decide how he prefers to proceed.

>
> >> +  if ((features & BIT_ULL(VIRTIO_NET_F_MTU)) == 0)
> >> +  val_u16 = 1500;
> >> +  else
> >> +  val_u16 = __virtio16_to_cpu(true, config->mtu);
> >> +
> > Need to work hard to find default values and that too turned out had
> errata.
> > There are more fields that doesn’t have default values.
> >
> > There is no point in kernel doing this guess work, that user space can 
> > figure
> out of what is valid/invalid.
> It's not guest work, when guest finds no feature bits set, it can decide what
> to do. 

The above code returning 1500 was probably an honest attempt to find a
legitimate default value, and we saw that it doesn't work.
This is the second example, after _MQ, where we both agree a default should not
be returned.

And there are more fields coming in this area.
Hence, I prefer to avoid returning such defaults for MAC, MTU, MQ and all other
fields which don't _exist_.

I will let MST decide how he prefers to proceed for every field to come next.
Thanks.


RE: [PATCH 2/2] vDPA: conditionally read fields in virtio-net dev

2022-08-15 Thread Parav Pandit via Virtualization

> From: Zhu Lingshan 
> Sent: Monday, August 15, 2022 5:27 AM
> 
> Some fields of virtio-net device config space are conditional on the feature
> bits, the spec says:
> 
> "The mac address field always exists
> (though is only valid if VIRTIO_NET_F_MAC is set)"
> 
> "max_virtqueue_pairs only exists if VIRTIO_NET_F_MQ or
> VIRTIO_NET_F_RSS is set"
> 
> "mtu only exists if VIRTIO_NET_F_MTU is set"
> 
> so we should read MTU, MAC and MQ in the device config space only when
> these feature bits are offered.
Yes.

> 
> For MQ, if both VIRTIO_NET_F_MQ and VIRTIO_NET_F_RSS are not set, the
> virtio device should have one queue pair as default value, so when userspace
> querying queue pair numbers, it should return mq=1 than zero.
No.
There is no need to treat mac and max_qps differently.
It is meaningless to differentiate between a field existing/not existing and its
value being valid/not valid.

> 
> For MTU, if VIRTIO_NET_F_MTU is not set, we should not read MTU from
> the device config space.
> RFC894 <A Standard for the Transmission of IP Datagrams over Ethernet
> Networks> says: "The minimum length of the data field of a packet sent over
> an Ethernet is 1500 octets, thus the maximum length of an IP datagram sent
> over an Ethernet is 1500 octets.  Implementations are encouraged to support
> full-length packets"
This line in RFC 894 from 1984 is wrong.
An erratum already exists for it at [1].

[1] https://www.rfc-editor.org/errata_search.php?rfc=894_status=0

> 
> virtio spec says:"The virtio network device is a virtual ethernet card", so 
> the
> default MTU value should be 1500 for virtio-net.
> 
Practically, I have seen 1500 and higher mtu values.
And this derivation of what the default mtu should be is not good, given that
the above erratum exists.

And I see from the code below why you need to work so hard to define a default
value, so that _MQ and _MTU can report default values.

There is really no need for this complexity and such a long commit message.

Can we please expose the feature bits as-is and report only the config space
fields which are valid?

User space will be querying both.

> For MAC, the spec says:"If the VIRTIO_NET_F_MAC feature bit is set, the
> configuration space mac entry indicates the “physical” address of the
> network card, otherwise the driver would typically generate a random local
> MAC address." So there is no default MAC address if VIRTIO_NET_F_MAC
> not set.
> 
> This commits introduces functions vdpa_dev_net_mtu_config_fill() and
> vdpa_dev_net_mac_config_fill() to fill MTU and MAC.
> It also fixes vdpa_dev_net_mq_config_fill() to report correct MQ when
> _F_MQ is not present.
> 
Multiple changes in a single patch are not a good idea.
It's fine to split this into smaller patches.

> + if ((features & BIT_ULL(VIRTIO_NET_F_MTU)) == 0)
> + val_u16 = 1500;
> + else
> + val_u16 = __virtio16_to_cpu(true, config->mtu);
> +
You need to work hard to find default values, and even that one turned out to
have an erratum.
There are more fields that don't have default values.

There is no point in the kernel doing this guesswork; user space can figure out
what is valid/invalid.


RE: [PATCH 1/2] vDPA: allow userspace to query features of a vDPA device

2022-08-15 Thread Parav Pandit via Virtualization

> From: Zhu, Lingshan 
> Sent: Monday, August 15, 2022 9:49 PM
> 
> On 8/16/2022 2:15 AM, Si-Wei Liu wrote:
> >
> >
> > On 8/15/2022 2:26 AM, Zhu Lingshan wrote:
> >> This commit adds a new vDPA netlink attribution
> >> VDPA_ATTR_VDPA_DEV_SUPPORTED_FEATURES. Userspace can query
> features
> >> of vDPA devices through this new attr.
> >>
> >> Signed-off-by: Zhu Lingshan 
> >> ---
> >>   drivers/vdpa/vdpa.c   | 17 +
> >>   include/uapi/linux/vdpa.h |  3 +++
> >>   2 files changed, 16 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c index
> >> c06c02704461..efb55a06e961 100644
> >> --- a/drivers/vdpa/vdpa.c
> >> +++ b/drivers/vdpa/vdpa.c
> >> @@ -491,6 +491,8 @@ static int vdpa_mgmtdev_fill(const struct
> >> vdpa_mgmt_dev *mdev, struct sk_buff *m
> >>   err = -EMSGSIZE;
> >>   goto msg_err;
> >>   }
> >> +
> >> +    /* report features of a vDPA management device through
> >> VDPA_ATTR_DEV_SUPPORTED_FEATURES */
> >>   if (nla_put_u64_64bit(msg, VDPA_ATTR_DEV_SUPPORTED_FEATURES,
> >>     mdev->supported_features, VDPA_ATTR_PAD)) {
> >>   err = -EMSGSIZE;
> >> @@ -815,7 +817,7 @@ static int vdpa_dev_net_mq_config_fill(struct
> >> vdpa_device *vdev,
> >>   static int vdpa_dev_net_config_fill(struct vdpa_device *vdev,
> >> struct sk_buff *msg)
> >>   {
> >>   struct virtio_net_config config = {};
> >> -    u64 features;
> >> +    u64 features_device, features_driver;
> >>   u16 val_u16;
> >>     vdpa_get_config_unlocked(vdev, 0, , sizeof(config));
> >> @@ -832,12 +834,19 @@ static int vdpa_dev_net_config_fill(struct
> >> vdpa_device *vdev, struct sk_buff *ms
> >>   if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MTU, val_u16))
> >>   return -EMSGSIZE;
> >>   -    features = vdev->config->get_driver_features(vdev);
> >> -    if (nla_put_u64_64bit(msg,
> VDPA_ATTR_DEV_NEGOTIATED_FEATURES,
> >> features,
> >> +    features_driver = vdev->config->get_driver_features(vdev);
> >> +    if (nla_put_u64_64bit(msg,
> VDPA_ATTR_DEV_NEGOTIATED_FEATURES,
> >> features_driver,
> >> +  VDPA_ATTR_PAD))
> >> +    return -EMSGSIZE;
> >> +
> >> +    features_device = vdev->config->get_device_features(vdev);
> >> +
> >> +    /* report features of a vDPA device through
> >> VDPA_ATTR_VDPA_DEV_SUPPORTED_FEATURES */
> >> +    if (nla_put_u64_64bit(msg,
> >> VDPA_ATTR_VDPA_DEV_SUPPORTED_FEATURES, features_device,
> >>     VDPA_ATTR_PAD))
> >>   return -EMSGSIZE;
> >>   -    return vdpa_dev_net_mq_config_fill(vdev, msg, features,
> >> );
> >> +    return vdpa_dev_net_mq_config_fill(vdev, msg, features_driver,
> >> );
> >>   }
> >>     static int
> >> diff --git a/include/uapi/linux/vdpa.h b/include/uapi/linux/vdpa.h
> >> index 25c55cab3d7c..d171b92ef522 100644
> >> --- a/include/uapi/linux/vdpa.h
> >> +++ b/include/uapi/linux/vdpa.h
> >> @@ -46,7 +46,10 @@ enum vdpa_attr {
> >>     VDPA_ATTR_DEV_NEGOTIATED_FEATURES,    /* u64 */
> >>   VDPA_ATTR_DEV_MGMTDEV_MAX_VQS,    /* u32 */
> >> +    /* features of a vDPA management device */
Say the comment as:

	/* Features of the vDPA management device matching the virtio feature
	 * bits 0 to 63 when queried on the mgmt. device.
	 * When returned on the vdpa device, it indicates virtio feature bits
	 * 0 to 63 of the vdpa device.
	 */

> >>   VDPA_ATTR_DEV_SUPPORTED_FEATURES,    /* u64 */
> >> +    /* features of a vDPA device, e.g., /dev/vhost-vdpa0 */
> >> +    VDPA_ATTR_VDPA_DEV_SUPPORTED_FEATURES,    /* u64 */
> > Append to the end, please. Otherwise it breaks userspace ABI.
> OK, will fix it in V2

I have read Si-Wei's comment on the v4.
However, I disagree; we don't need to continue the past mistake made with the
naming.

Please add a comment to VDPA_ATTR_DEV_SUPPORTED_FEATURES, like the one above,
about the exception.
We established that there is no race condition either.
So there is no need to add a new UAPI entry because of some past history.

RE: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets

2022-08-10 Thread Parav Pandit via Virtualization


> From: Michael S. Tsirkin 
> Sent: Wednesday, August 10, 2022 12:59 PM
> 
> On Wed, Aug 10, 2022 at 04:22:41PM +, Parav Pandit wrote:
> >
> > > From: Michael S. Tsirkin 
> > > Sent: Wednesday, August 10, 2022 12:05 PM
> > >
> > > On Wed, Aug 10, 2022 at 04:00:08PM +, Parav Pandit wrote:
> > > >
> > > > > From: Michael S. Tsirkin 
> > > > > Sent: Wednesday, August 10, 2022 5:03 AM
> > > > > > >
> > > > > > > Should we make this depend on the vq reset ability maybe?
> > > > > >
> > > > > > The advantage of this is to keep TX working. Or we can use
> > > > > > device reset as a fallback if there's no vq reset.
> > > > > >
> > > > > > Thanks
> > > > >
> > > > > Device reset is really annoying in that it loses all the state:
> > > > > rx filters etc etc.
> > > >
> > > > The elegant solution is let driver tell the new mtu to the device.
> > > > One way to do so is by using existing ctrl vq.
> > >
> > > That will need a new feature bit.
> > >
> > Yes. ctrl vq can tell what all configuration does it allow. :) Or you
> > prefer feature bit?
> 
> We did feature bits for this in the past.
> 
OK. I will try to draft the update for the future.

Gavin should repost the patch to address the comments unrelated to this future
bit anyway.
Right?

> > > > If merged buffer is done, and new mtu is > minimum posting size,
> > > > no need
> > > to undergo vq reset.
> > > > If merged buffer is not done, and buffer posted are smaller than
> > > > new mtu,
> > > undergo vq reset optionally.
> > >
> > > This can be done with or without sending mtu to device.
> > Yes, telling mtu to device helps device to optimize and adhere to the spec
> line " The device MUST NOT pass received packets that exceed mtu" in
> section 5.1.4.1.
> 
> Again, that line refers to \field{mtu} which is the max mtu supported,
> irrespective to anything driver does.
> 
> --
> MST



RE: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets

2022-08-10 Thread Parav Pandit via Virtualization


> From: Michael S. Tsirkin 
> Sent: Wednesday, August 10, 2022 12:05 PM
> 
> On Wed, Aug 10, 2022 at 04:00:08PM +, Parav Pandit wrote:
> >
> > > From: Michael S. Tsirkin 
> > > Sent: Wednesday, August 10, 2022 5:03 AM
> > > > >
> > > > > Should we make this depend on the vq reset ability maybe?
> > > >
> > > > The advantage of this is to keep TX working. Or we can use device
> > > > reset as a fallback if there's no vq reset.
> > > >
> > > > Thanks
> > >
> > > Device reset is really annoying in that it loses all the state:
> > > rx filters etc etc.
> >
> > The elegant solution is let driver tell the new mtu to the device.
> > One way to do so is by using existing ctrl vq.
> 
> That will need a new feature bit.
> 
Yes. The ctrl vq can tell what configuration it allows. :)
Or do you prefer a feature bit?

> > If merged buffer is done, and new mtu is > minimum posting size, no need
> to undergo vq reset.
> > If merged buffer is not done, and buffer posted are smaller than new mtu,
> undergo vq reset optionally.
> 
> This can be done with or without sending mtu to device.
Yes, telling the mtu to the device helps the device optimize and adhere to the
spec line "The device MUST NOT pass received packets that exceed mtu" in
section 5.1.4.1.


RE: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets

2022-08-10 Thread Parav Pandit via Virtualization


> From: Michael S. Tsirkin 
> Sent: Wednesday, August 10, 2022 5:03 AM
> > >
> > > Should we make this depend on the vq reset ability maybe?
> >
> > The advantage of this is to keep TX working. Or we can use device
> > reset as a fallback if there's no vq reset.
> >
> > Thanks
> 
> Device reset is really annoying in that it loses all the state:
> rx filters etc etc.

The elegant solution is to let the driver tell the new mtu to the device.
One way to do so is by using the existing ctrl vq.
If mergeable buffers are in use and the new mtu is > the minimum posting size,
there is no need to undergo vq reset.
If mergeable buffers are not in use and the posted buffers are smaller than the
new mtu, optionally undergo vq reset.
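
To illustrate the shape such an extension could take, here is a purely
hypothetical sketch; neither the control class nor the command below exists in
the virtio spec today, and the numeric class id is only a placeholder:

#include <linux/types.h>

/* Hypothetical: a control-virtqueue command letting the driver tell the
 * device which mtu it is currently sizing its receive buffers for.
 */
#define VIRTIO_NET_CTRL_MTU		0x40	/* placeholder class id */
#define VIRTIO_NET_CTRL_MTU_SET		0

struct virtio_net_ctrl_mtu {
	__le16 mtu;	/* mtu the driver will post rx buffers for */
};

Sending such a command after the driver resizes its rx buffers would let the
device know when it is safe to pass larger packets, without a full device or vq
reset.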


RE: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets

2022-08-09 Thread Parav Pandit via Virtualization
> From: Michael S. Tsirkin 
> Sent: Tuesday, August 9, 2022 6:26 PM
> To: Parav Pandit 
> Cc: Si-Wei Liu ; Jason Wang
> ; Gavin Li ; Hemminger,
> Stephen ; davem
> ; virtualization  foundation.org>; Virtio-Dev ;
> jesse.brandeb...@intel.com; alexander.h.du...@intel.com;
> kubak...@wp.pl; sridhar.samudr...@intel.com; losewe...@gmail.com; Gavi
> Teitz 
> Subject: Re: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length 
> for
> big packets
> 
> On Tue, Aug 09, 2022 at 09:49:03PM +, Parav Pandit wrote:
> > > From: Michael S. Tsirkin 
> > > Sent: Tuesday, August 9, 2022 5:38 PM
> >
> > [..]
> > > > > I think virtio-net driver doesn't differentiate MTU and MRU, in
> > > > > which case the receive buffer will be reduced to fit the 1500B
> > > > > payload size when mtu is lowered down to 1500 from 9000.
> > > > How? Driver reduced the mXu to 1500, say it is improved to post
> > > > buffers of
> > > 1500 bytes.
> > > >
> > > > Device doesn't know about it because mtu in config space is RO field.
> > > > Device keep dropping 9K packets because buffers posted are 1500
> bytes.
> > > > This is because device follows the spec " The device MUST NOT pass
> > > received packets that exceed mtu".
> > >
> > >
> > > The "mtu" here is the device config field, which is
> > >
> > > /* Default maximum transmit unit advice */
> > >
> >
> > It is the field from struct virtio_net_config.mtu. right?
> > This is RO field for driver.
> >
> > > there is no guarantee device will not get a bigger packet.
> > Right. That is what I also hinted.
> > Hence, allocating buffers worth upto mtu is safer.
> 
> yes
> 
> > When user overrides it, driver can be further optimized to honor such new
> value on rx buffer posting.
> 
> no, not without a feature bit promising device won't get wedged.
> 
I mean to say that, as it stands today, the driver can decide to post smaller
buffers with a larger mtu.
Why should the device be affected by it?
(I am not proposing such a weird configuration, just asking for the sake of
correctness.)

> > > And there is no guarantee such a packet will be dropped as opposed
> > > to wedging the device if userspace insists on adding smaller buffers.
> > >
> > If user space insists on small buffers, so be it.
> 
> If previously things worked, the "so be it" is a regression and blaming users
> won't help us.
> 
I am not suggesting the above.
This was Si-Wei's suggestion: that the driver may want to post buffers smaller
than the mtu because the user knows what the peer is doing.
So maybe the driver can be extended to give more weight to the user config.

> > It only works when user exactly know what user is doing in the whole
> network.
> 
> If you want to claim this you need a new feature bit.
> 
Why is a new bit needed to tell the device?
The user is applying its own configuration, mismatching the buffers and the mtu.
A solid use case hasn't emerged for this yet.

If the user wants to modify the mtu, we should just make virtio_net_config.mtu
an RW field using a new feature bit.
Is that what you mean?
If so, yes, it makes things very neat, with driver and device aligned to each
other the way they are today.
The only limitation is that it is one-way: the device tells the driver.

> > When user prefers to override the device RO field, device is in the dark and
> things work on best effort basis.
> 
> Dropping packets is best effort. Getting stuck forever isn't, that's a 
> quality of
> implementation issue.
>
I am not sure why things would get stuck forever; maybe you have an example to
explain.
I am most likely missing something.
 
> > This must be a reasonably advance user who has good knowledge of its
> network topology etc.
> >
> > For such case, may be yes, driver should be further optimized.
> >



RE: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets

2022-08-09 Thread Parav Pandit via Virtualization
> From: Michael S. Tsirkin 
> Sent: Tuesday, August 9, 2022 5:38 PM

[..]
> > > I think virtio-net driver doesn't differentiate MTU and MRU, in
> > > which case the receive buffer will be reduced to fit the 1500B
> > > payload size when mtu is lowered down to 1500 from 9000.
> > How? Driver reduced the mXu to 1500, say it is improved to post buffers of
> 1500 bytes.
> >
> > Device doesn't know about it because mtu in config space is RO field.
> > Device keep dropping 9K packets because buffers posted are 1500 bytes.
> > This is because device follows the spec " The device MUST NOT pass
> received packets that exceed mtu".
> 
> 
> The "mtu" here is the device config field, which is
> 
> /* Default maximum transmit unit advice */
> 

It is the field struct virtio_net_config.mtu, right?
This is an RO field for the driver.

> there is no guarantee device will not get a bigger packet.
Right. That is what I also hinted at.
Hence, allocating buffers worth up to the mtu is safer.
When the user overrides it, the driver can be further optimized to honor such a
new value when posting rx buffers.

> And there is no guarantee such a packet will be dropped as opposed to
> wedging the device if userspace insists on adding smaller buffers.
>
If user space insists on small buffers, so be it. That only works when the user
knows exactly what they are doing across the whole network.
When the user prefers to override the device's RO field, the device is in the
dark and things work on a best-effort basis.
This must be a reasonably advanced user who has good knowledge of their network
topology, etc.

For such a case, maybe yes, the driver should be further optimized.



RE: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets

2022-08-09 Thread Parav Pandit via Virtualization
> From: Si-Wei Liu 
> Sent: Tuesday, August 9, 2022 4:33 PM
> 
> On 8/9/2022 12:18 PM, Parav Pandit wrote:
> >> From: Si-Wei Liu 
> >> Sent: Tuesday, August 9, 2022 3:09 PM
>  From: Si-Wei Liu 
>  Sent: Tuesday, August 9, 2022 2:39 PM Currently it is not. Not a
>  single patch nor this patch, but the context for the eventual goal
>  is to allow XDP on a MTU=9000 link when guest users intentionally
>  lower down MTU to 1500.
> >>> Which application benefit by having asymmetry by lowering mtu to
> >>> 1500
> >> to send packets but want to receive 9K packets?
> > Below details doesn’t answer the question of asymmetry. :)
> >
> >> I think virtio-net driver doesn't differentiate MTU and MRU, in which
> >> case the receive buffer will be reduced to fit the 1500B payload size
> >> when mtu is lowered down to 1500 from 9000.
> > How? Driver reduced the mXu to 1500, say it is improved to post buffers of
> 1500 bytes.
> For big_packet path, yes, we need improvement; for mergeable, it's
> adaptable to any incoming packet size so 1500 is what it is today.
> >
> > Device doesn't know about it because mtu in config space is RO field.
> > Device keep dropping 9K packets because buffers posted are 1500 bytes.
> > This is because device follows the spec " The device MUST NOT pass
> received packets that exceed mtu".
> Right, that's what it happens today on device side (i.e. vhost-net, btw
> mlx5 vdpa device seems to have a bug not pro-actively dropping packets that
> exceed the MTU size, causing guest panic in small packet path).
> >
> > So, I am lost what virtio net device user application is trying to achieve 
> > by
> sending smaller packets and dropping all receive packets.
> > (it doesn’t have any relation to mergeable or otherwise).
> 
> Usually, the use case I'm aware of would set the peer's MTU to 1500 (e.g. on
> a virtual network appliance), or it would rely on path mtu discovery to avoid
> packet drop across links.
OK. So somehow the application knows the mtu to set on (all) peer(s) and hopes
for the best.
Understood.

RE: [PATCH V4 5/6] vDPA: answer num of queue pairs = 1 to userspace when VIRTIO_NET_F_MQ == 0

2022-08-09 Thread Parav Pandit via Virtualization
> From: Michael S. Tsirkin 
> Sent: Tuesday, August 9, 2022 3:36 PM

> > > After applying this commit, when MQ = 0, iproute2 output:
> > > $vdpa dev config show vdpa0
> > > vdpa0: mac 00:e8:ca:11:be:05 link up link_announce false
> > > max_vq_pairs 1 mtu 1500
> > >
> > No. We do not want to diverge returning values of config space for fields
> which are not present as discussed in previous versions.
> > Please drop this patch.
> > Nack on this patch.
> 
> Wrt this patch as far as I'm concerned you guys are bikeshedding.
> 
> Parav generally don't send nacks please they are not helpful.

OK. I explained the technical reasoning for not diverging across fields: all of
them are linked to their respective feature bits uniformly.
I explained this repeatedly across roughly v1 to v3.
And reporting 1 is misleading: if _MQ is not negotiated, how can this device
claim its config space has max_vq_pairs = 1?

But if you want to proceed with diverging the kernel per feature bit, go ahead.
It requires inspecting feature by feature.
I just don't see how this can be well maintained.

The commit log doesn't even describe the weird use case that says "as user
space, I do not want to read the device feature bits and just want to read the
config space to decide...".
Odd.



RE: [PATCH V4 3/6] vDPA: allow userspace to query features of a vDPA device

2022-08-09 Thread Parav Pandit via Virtualization
> From: Michael S. Tsirkin 
> Sent: Tuesday, August 9, 2022 3:24 PM
> 
> On Fri, Jul 22, 2022 at 07:53:06PM +0800, Zhu Lingshan wrote:
> > This commit adds a new vDPA netlink attribution
> > VDPA_ATTR_VDPA_DEV_SUPPORTED_FEATURES. Userspace can query
> features of
> > vDPA devices through this new attr.
> >
> > Signed-off-by: Zhu Lingshan 
> 
> 
> I think at least some discussion and documentation of this attribute versus
> VDPA_ATTR_DEV_SUPPORTED_FEATURES is called for.
> 
> Otherwise how do people know which one to use?
> 
> We can't send everyone to go read the lkml thread.

There is no race here. The commit log is missing an example anyway.
I explained multiple times that this patch and/or its previous version cannot
proceed in its current state.


RE: [PATCH V4 3/6] vDPA: allow userspace to query features of a vDPA device

2022-08-09 Thread Parav Pandit via Virtualization
> From: Zhu, Lingshan 
> Sent: Wednesday, July 27, 2022 2:02 AM
> 
> 
> On 7/26/2022 7:06 PM, Parav Pandit wrote:
> >> From: Zhu, Lingshan 
> >> Sent: Tuesday, July 26, 2022 7:03 AM
> >>
> >> On 7/24/2022 11:21 PM, Parav Pandit wrote:
>  From: Zhu, Lingshan 
>  Sent: Saturday, July 23, 2022 7:24 AM
> 
> 
>  On 7/22/2022 9:12 PM, Parav Pandit wrote:
> >> From: Zhu Lingshan 
> >> Sent: Friday, July 22, 2022 7:53 AM
> >>
> >> This commit adds a new vDPA netlink attribution
> >> VDPA_ATTR_VDPA_DEV_SUPPORTED_FEATURES. Userspace can
> >> query
>  features
> >> of vDPA devices through this new attr.
> >>
> >> Signed-off-by: Zhu Lingshan 
> >> ---
> >> drivers/vdpa/vdpa.c   | 13 +
> >> include/uapi/linux/vdpa.h |  1 +
> >> 2 files changed, 10 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c index
> >> ebf2f363fbe7..9b0e39b2f022 100644
> >> --- a/drivers/vdpa/vdpa.c
> >> +++ b/drivers/vdpa/vdpa.c
> >> @@ -815,7 +815,7 @@ static int
> vdpa_dev_net_mq_config_fill(struct
> >> vdpa_device *vdev,  static int vdpa_dev_net_config_fill(struct
> >> vdpa_device *vdev, struct sk_buff *msg)  {
> >>struct virtio_net_config config = {};
> >> -  u64 features;
> >> +  u64 features_device, features_driver;
> >>u16 val_u16;
> >>
> >>vdpa_get_config_unlocked(vdev, 0, , sizeof(config));
> >> @@
> >> -
> >> 832,12 +832,17 @@ static int vdpa_dev_net_config_fill(struct
> >> vdpa_device *vdev, struct sk_buff *ms
> >>if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MTU,
> val_u16))
> >>return -EMSGSIZE;
> >>
> >> -  features = vdev->config->get_driver_features(vdev);
> >> -  if (nla_put_u64_64bit(msg,
> >> VDPA_ATTR_DEV_NEGOTIATED_FEATURES, features,
> >> +  features_driver = vdev->config->get_driver_features(vdev);
> >> +  if (nla_put_u64_64bit(msg,
> >> VDPA_ATTR_DEV_NEGOTIATED_FEATURES, features_driver,
> >> +VDPA_ATTR_PAD))
> >> +  return -EMSGSIZE;
> >> +
> >> +  features_device = vdev->config->get_device_features(vdev);
> >> +  if (nla_put_u64_64bit(msg,
> >> VDPA_ATTR_VDPA_DEV_SUPPORTED_FEATURES,
> >> +features_device,
> >>  VDPA_ATTR_PAD))
> >>return -EMSGSIZE;
> >>
> >> -  return vdpa_dev_net_mq_config_fill(vdev, msg, features,
> );
> >> +  return vdpa_dev_net_mq_config_fill(vdev, msg,
> features_driver,
> >> +);
> >> }
> >>
> >> static int
> >> diff --git a/include/uapi/linux/vdpa.h
> >> b/include/uapi/linux/vdpa.h index
> >> 25c55cab3d7c..39f1c3d7c112 100644
> >> --- a/include/uapi/linux/vdpa.h
> >> +++ b/include/uapi/linux/vdpa.h
> >> @@ -47,6 +47,7 @@ enum vdpa_attr {
> >>VDPA_ATTR_DEV_NEGOTIATED_FEATURES,  /* u64 */
> >>VDPA_ATTR_DEV_MGMTDEV_MAX_VQS,  /*
> u32 */
> >>VDPA_ATTR_DEV_SUPPORTED_FEATURES,   /* u64 */
> >> +  VDPA_ATTR_VDPA_DEV_SUPPORTED_FEATURES,  /*
> u64 */
> >>
> > I have answered in previous emails.
> > I disagree with the change.
> > Please reuse VDPA_ATTR_DEV_SUPPORTED_FEATURES.
>  I believe we have already discussed this before in the V3 thread.
>  I have told you that reusing this attr will lead to a new race condition.
> 
> >>> Returning attribute cannot lead to any race condition.
> >> Please refer to our discussion in the V3 series, I have explained if
> >> re-use this attr, it will be a multiple consumers and multiple
> >> produces model, it is a typical racing condition.
> > I read the emails with subject = " Re: [PATCH V3 3/6] vDPA: allow userspace
> to query features of a vDPA device"
> > I couldn’t find multiple consumers multiple producers working on same nla
> message.
> If this attr is reused, then there can be multiple iproute2 instances or other
> applications querying feature bits of the management device and the vDPA
> device simultaneously, and both kernel side management feature bits filler

> function and vDPA device feature bits filler function can write the NLA
> message at the same time. That's the multiple consumers and producers,
> and no locks
No. Each fill happens in its own process context. There is no race here.

RE: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets

2022-08-09 Thread Parav Pandit via Virtualization
> From: Si-Wei Liu 
> Sent: Tuesday, August 9, 2022 3:09 PM

> >> From: Si-Wei Liu 
> >> Sent: Tuesday, August 9, 2022 2:39 PM Currently it is not. Not a
> >> single patch nor this patch, but the context for the eventual goal is
> >> to allow XDP on a MTU=9000 link when guest users intentionally lower
> >> down MTU to 1500.
> > Which application benefit by having asymmetry by lowering mtu to 1500
> to send packets but want to receive 9K packets?

The details below don't answer the question about asymmetry. :)

> I think virtio-net driver doesn't differentiate MTU and MRU, in which case
> the receive buffer will be reduced to fit the 1500B payload size when mtu is
> lowered down to 1500 from 9000. 
How? The driver reduced the mXu to 1500; say it is improved to post buffers of
1500 bytes.

The device doesn't know about it, because the mtu in the config space is an RO
field.
The device keeps dropping 9K packets because the posted buffers are 1500 bytes.
This is because the device follows the spec: "The device MUST NOT pass received
packets that exceed mtu".

So I am lost as to what the virtio-net user application is trying to achieve by
sending smaller packets while dropping all received packets.
(It has no relation to mergeable buffers or otherwise.)

RE: [virtio-dev] [PATCH] virtio-net: use mtu size as buffer length for big packets

2022-08-09 Thread Parav Pandit via Virtualization
> From: Si-Wei Liu 
> Sent: Tuesday, August 9, 2022 2:39 PM

> Currently it is not. Not a single patch nor this patch, but the context for 
> the
> eventual goal is to allow XDP on a MTU=9000 link when guest users
> intentionally lower down MTU to 1500.

Which application benefits from such asymmetry: lowering the mtu to 1500 to send
packets while still wanting to receive 9K packets?


RE: [PATCH V3 5/6] vDPA: answer num of queue pairs = 1 to userspace when VIRTIO_NET_F_MQ == 0

2022-07-26 Thread Parav Pandit via Virtualization

> From: Zhu, Lingshan 
> Sent: Tuesday, July 26, 2022 10:53 PM
> 
> On 7/27/2022 10:17 AM, Parav Pandit wrote:
> >> From: Zhu, Lingshan 
> >> Sent: Tuesday, July 26, 2022 10:15 PM
> >>
> >> On 7/26/2022 11:56 PM, Parav Pandit wrote:
>  From: Zhu, Lingshan 
>  Sent: Tuesday, July 12, 2022 11:46 PM
> > When the user space which invokes netlink commands, detects that
> >> _MQ
>  is not supported, hence it takes max_queue_pair = 1 by itself.
>  I think the kernel module have all necessary information and it is
>  the only one which have precise information of a device, so it
>  should answer precisely than let the user space guess. The kernel
>  module should be reliable than stay silent, leave the question to
>  the user space
> >> tool.
> >>> Kernel is reliable. It doesn’t expose a config space field if the
> >>> field doesn’t
> >> exist regardless of field should have default or no default.
> >> so when you know it is one queue pair, you should answer one, not try
> >> to guess.
> >>> User space should not guess either. User space gets to see if _MQ
> >> present/not present. If _MQ present than get reliable data from kernel.
> >>> If _MQ not present, it means this device has one VQ pair.
> >> it is still a guess, right? And all user space tools implemented this
> >> feature need to guess
> > No. it is not a guess.
> > It is explicitly checking the _MQ feature and deriving the value.
> > The code you proposed will be present in the user space.
> > It will be uniform for _MQ and 10 other features that are present now and
> in the future.
> MQ and other features like RSS are different. If there is no _RSS_XX, there
> are no attributes like max_rss_key_size, and there is not a default value.
> But for MQ, we know it has to be 1 wihtout _MQ.
"we" = user space.
To keep the consistency among all the config space fields.

> > For feature X, kernel reports default and for feature Y, kernel skip
> reporting it, because there is no default. <- This is what we are trying to
> avoid here.
> Kernel reports one queue pair because there is actually one.
> >


RE: [PATCH V3 5/6] vDPA: answer num of queue pairs = 1 to userspace when VIRTIO_NET_F_MQ == 0

2022-07-26 Thread Parav Pandit via Virtualization

> From: Zhu, Lingshan 
> Sent: Tuesday, July 26, 2022 10:15 PM
> 
> On 7/26/2022 11:56 PM, Parav Pandit wrote:
> >> From: Zhu, Lingshan 
> >> Sent: Tuesday, July 12, 2022 11:46 PM
> >>> When the user space which invokes netlink commands, detects that
> _MQ
> >> is not supported, hence it takes max_queue_pair = 1 by itself.
> >> I think the kernel module have all necessary information and it is
> >> the only one which have precise information of a device, so it should
> >> answer precisely than let the user space guess. The kernel module
> >> should be reliable than stay silent, leave the question to the user space
> tool.
> > Kernel is reliable. It doesn’t expose a config space field if the field 
> > doesn’t
> exist regardless of field should have default or no default.
> so when you know it is one queue pair, you should answer one, not try to
> guess.
> > User space should not guess either. User space gets to see if _MQ
> present/not present. If _MQ present than get reliable data from kernel.
> > If _MQ not present, it means this device has one VQ pair.
> it is still a guess, right? And all user space tools implemented this feature
> need to guess
No, it is not a guess.
It is explicitly checking the _MQ feature and deriving the value.
The code you proposed will be present in user space.
It will be uniform for _MQ and the ten other features that are present now and
in the future.

"For feature X, the kernel reports a default, and for feature Y, the kernel
skips reporting it because there is no default" <- this is what we are trying to
avoid here.


RE: [PATCH V3 5/6] vDPA: answer num of queue pairs = 1 to userspace when VIRTIO_NET_F_MQ == 0

2022-07-26 Thread Parav Pandit via Virtualization


> From: Michael S. Tsirkin 
> Sent: Tuesday, July 26, 2022 3:49 PM
> 
> On Tue, Jul 26, 2022 at 03:54:06PM +, Parav Pandit wrote:
> >
> > > From: Michael S. Tsirkin 
> > > Sent: Wednesday, July 13, 2022 1:27 AM
> > >
> > > On Fri, Jul 01, 2022 at 10:07:59PM +, Parav Pandit wrote:
> > > >
> > > >
> > > > > From: Zhu Lingshan 
> > > > > Sent: Friday, July 1, 2022 9:28 AM If VIRTIO_NET_F_MQ == 0, the
> > > > > virtio device should have one queue pair, so when userspace
> > > > > querying queue pair numbers, it should return mq=1 than zero.
> > > > >
> > > > > Function vdpa_dev_net_config_fill() fills the attributions of
> > > > > the vDPA devices, so that it should call
> > > > > vdpa_dev_net_mq_config_fill() so the parameter in
> > > > > vdpa_dev_net_mq_config_fill() should be feature_device than
> > > > > feature_driver for the vDPA devices themselves
> > > > >
> > > > > Before this change, when MQ = 0, iproute2 output:
> > > > > $vdpa dev config show vdpa0
> > > > > vdpa0: mac 00:e8:ca:11:be:05 link up link_announce false
> > > > > max_vq_pairs 0 mtu 1500
> > > > >
> > > > The fix belongs to user space.
> > > > When a feature bit _MQ is not negotiated, vdpa kernel space will
> > > > not add
> > > attribute VDPA_ATTR_DEV_NET_CFG_MAX_VQP.
> > > > When such attribute is not returned by kernel, max_vq_pairs should
> > > > not be
> > > shown by the iproute2.
> > > >
> > > > We have many config space fields that depend on the feature bits
> > > > and
> > > some of them do not have any defaults.
> > > > To keep consistency of existence of config space fields among all,
> > > > we don't
> > > want to show default like below.
> > > >
> > > > Please fix the iproute2 to not print max_vq_pairs when it is not
> > > > returned by
> > > the kernel.
> > >
> > > Parav I read the discussion and don't get your argument. From
> > > driver's POV _MQ with 1 VQ pair and !_MQ are exactly functionally
> equivalent.
> > But we are talking from user POV here.
> 
> From spec POV there's just driver and device, user would be part of driver 
> here.
User space applications still need to inspect the _MQ bit anyway.
> 
> > >
> > > It's true that iproute probably needs to be fixed too, to handle old 
> > > kernels.
> > > But iproute is not the only userspace, why not make it's life easier
> > > by fixing the kernel?
> > Because it cannot be fixed for other config space fields which are control 
> > by
> feature bits those do not have any defaults.
> > So better to treat all in same way from user POV.
> 
> Consistency is good for sure. What are these other fields though?

> Can you give examples so I understand please?

speed only exists if VIRTIO_NET_F_SPEED_DUPLEX is offered.
rss_max_key_size only exists if VIRTIO_NET_F_RSS is offered.
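
For reference, a simplified sketch of the gating pattern being argued for,
loosely modeled on the existing fill helpers in drivers/vdpa/vdpa.c (not the
exact upstream code): a config space attribute is emitted only when the
corresponding feature bit is present, otherwise it is omitted entirely.

static int vdpa_dev_net_mq_fill(struct sk_buff *msg, u64 features,
				const struct virtio_net_config *config)
{
	u16 val_u16;

	/* _MQ not offered: omit the attribute entirely, report nothing. */
	if ((features & BIT_ULL(VIRTIO_NET_F_MQ)) == 0)
		return 0;

	val_u16 = __virtio16_to_cpu(true, config->max_virtqueue_pairs);
	return nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MAX_VQP, val_u16);
}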



RE: [PATCH V3 5/6] vDPA: answer num of queue pairs = 1 to userspace when VIRTIO_NET_F_MQ == 0

2022-07-26 Thread Parav Pandit via Virtualization

> From: Michael S. Tsirkin 
> Sent: Tuesday, July 26, 2022 3:52 PM
> 
> On Tue, Jul 26, 2022 at 03:56:32PM +, Parav Pandit wrote:
> >
> > > From: Zhu, Lingshan 
> > > Sent: Tuesday, July 12, 2022 11:46 PM
> > > > When the user space which invokes netlink commands, detects that
> > > > _MQ
> > > is not supported, hence it takes max_queue_pair = 1 by itself.
> > > I think the kernel module have all necessary information and it is
> > > the only one which have precise information of a device, so it
> > > should answer precisely than let the user space guess. The kernel
> > > module should be reliable than stay silent, leave the question to the user
> space tool.
> > Kernel is reliable. It doesn’t expose a config space field if the field 
> > doesn’t exist
> regardless of field should have default or no default.
> > User space should not guess either. User space gets to see if _MQ 
> > present/not
> present. If _MQ present than get reliable data from kernel.
> > If _MQ not present, it means this device has one VQ pair.
> 
> Yes that's fine. And if we just didn't return anything without MQ that would 
> be
> fine.  But IIUC netlink reports the # of pairs regardless, it just puts 0 
> there.
I read it differently at [1] which checks for the MQ feature bit.

[1] https://elixir.bootlin.com/linux/latest/source/drivers/vdpa/vdpa.c#L825

> 
> --
> MST


RE: [PATCH V3 5/6] vDPA: answer num of queue pairs = 1 to userspace when VIRTIO_NET_F_MQ == 0

2022-07-26 Thread Parav Pandit via Virtualization

> From: Zhu, Lingshan 
> Sent: Tuesday, July 12, 2022 11:46 PM
> > When the user space which invokes netlink commands, detects that _MQ
> is not supported, hence it takes max_queue_pair = 1 by itself.
> I think the kernel module have all necessary information and it is the only
> one which have precise information of a device, so it should answer precisely
> than let the user space guess. The kernel module should be reliable than stay
> silent, leave the question to the user space tool.
The kernel is reliable. It doesn’t expose a config space field if the field doesn’t
exist, regardless of whether the field has a default or not.
User space should not guess either. User space gets to see whether _MQ is present
or not. If _MQ is present, it gets reliable data from the kernel.
If _MQ is not present, it means this device has one VQ pair.

RE: [PATCH V3 5/6] vDPA: answer num of queue pairs = 1 to userspace when VIRTIO_NET_F_MQ == 0

2022-07-26 Thread Parav Pandit via Virtualization


> From: Michael S. Tsirkin 
> Sent: Wednesday, July 13, 2022 1:27 AM
> 
> On Fri, Jul 01, 2022 at 10:07:59PM +, Parav Pandit wrote:
> >
> >
> > > From: Zhu Lingshan 
> > > Sent: Friday, July 1, 2022 9:28 AM
> > > If VIRTIO_NET_F_MQ == 0, the virtio device should have one queue
> > > pair, so when userspace querying queue pair numbers, it should
> > > return mq=1 than zero.
> > >
> > > Function vdpa_dev_net_config_fill() fills the attributions of the
> > > vDPA devices, so that it should call vdpa_dev_net_mq_config_fill()
> > > so the parameter in vdpa_dev_net_mq_config_fill() should be
> > > feature_device than feature_driver for the vDPA devices themselves
> > >
> > > Before this change, when MQ = 0, iproute2 output:
> > > $vdpa dev config show vdpa0
> > > vdpa0: mac 00:e8:ca:11:be:05 link up link_announce false
> > > max_vq_pairs 0 mtu 1500
> > >
> > The fix belongs to user space.
> > When a feature bit _MQ is not negotiated, vdpa kernel space will not add
> attribute VDPA_ATTR_DEV_NET_CFG_MAX_VQP.
> > When such attribute is not returned by kernel, max_vq_pairs should not be
> shown by the iproute2.
> >
> > We have many config space fields that depend on the feature bits and
> some of them do not have any defaults.
> > To keep consistency of existence of config space fields among all, we don't
> want to show default like below.
> >
> > Please fix the iproute2 to not print max_vq_pairs when it is not returned by
> the kernel.
> 
> Parav I read the discussion and don't get your argument. From driver's POV
> _MQ with 1 VQ pair and !_MQ are exactly functionally equivalent.
But we are talking from user POV here.

> 
> It's true that iproute probably needs to be fixed too, to handle old kernels.
> But iproute is not the only userspace, why not make it's life easier by fixing
> the kernel?
Because it cannot be fixed for other config space fields which are controlled by
feature bits that do not have any defaults.
So it is better to treat them all the same way from the user's POV.


RE: [PATCH V4 3/6] vDPA: allow userspace to query features of a vDPA device

2022-07-26 Thread Parav Pandit via Virtualization

> From: Zhu, Lingshan 
> Sent: Tuesday, July 26, 2022 7:03 AM
> 
> On 7/24/2022 11:21 PM, Parav Pandit wrote:
> >
> >> From: Zhu, Lingshan 
> >> Sent: Saturday, July 23, 2022 7:24 AM
> >>
> >>
> >> On 7/22/2022 9:12 PM, Parav Pandit wrote:
>  From: Zhu Lingshan 
>  Sent: Friday, July 22, 2022 7:53 AM
> 
>  This commit adds a new vDPA netlink attribution
>  VDPA_ATTR_VDPA_DEV_SUPPORTED_FEATURES. Userspace can
> query
> >> features
>  of vDPA devices through this new attr.
> 
>  Signed-off-by: Zhu Lingshan 
>  ---
> drivers/vdpa/vdpa.c   | 13 +
> include/uapi/linux/vdpa.h |  1 +
> 2 files changed, 10 insertions(+), 4 deletions(-)
> 
>  diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c index
>  ebf2f363fbe7..9b0e39b2f022 100644
>  --- a/drivers/vdpa/vdpa.c
>  +++ b/drivers/vdpa/vdpa.c
>  @@ -815,7 +815,7 @@ static int vdpa_dev_net_mq_config_fill(struct
>  vdpa_device *vdev,  static int vdpa_dev_net_config_fill(struct
>  vdpa_device *vdev, struct sk_buff *msg)  {
>   struct virtio_net_config config = {};
>  -u64 features;
>  +u64 features_device, features_driver;
>   u16 val_u16;
> 
>   vdpa_get_config_unlocked(vdev, 0, , sizeof(config)); @@
>  -
>  832,12 +832,17 @@ static int vdpa_dev_net_config_fill(struct
>  vdpa_device *vdev, struct sk_buff *ms
>   if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MTU, val_u16))
>   return -EMSGSIZE;
> 
>  -features = vdev->config->get_driver_features(vdev);
>  -if (nla_put_u64_64bit(msg,
>  VDPA_ATTR_DEV_NEGOTIATED_FEATURES, features,
>  +features_driver = vdev->config->get_driver_features(vdev);
>  +if (nla_put_u64_64bit(msg,
>  VDPA_ATTR_DEV_NEGOTIATED_FEATURES, features_driver,
>  +  VDPA_ATTR_PAD))
>  +return -EMSGSIZE;
>  +
>  +features_device = vdev->config->get_device_features(vdev);
>  +if (nla_put_u64_64bit(msg,
>  VDPA_ATTR_VDPA_DEV_SUPPORTED_FEATURES,
>  +features_device,
> VDPA_ATTR_PAD))
>   return -EMSGSIZE;
> 
>  -return vdpa_dev_net_mq_config_fill(vdev, msg, features, 
>  );
>  +return vdpa_dev_net_mq_config_fill(vdev, msg, features_driver,
>  +);
> }
> 
> static int
>  diff --git a/include/uapi/linux/vdpa.h b/include/uapi/linux/vdpa.h
>  index
>  25c55cab3d7c..39f1c3d7c112 100644
>  --- a/include/uapi/linux/vdpa.h
>  +++ b/include/uapi/linux/vdpa.h
>  @@ -47,6 +47,7 @@ enum vdpa_attr {
>   VDPA_ATTR_DEV_NEGOTIATED_FEATURES,  /* u64 */
>   VDPA_ATTR_DEV_MGMTDEV_MAX_VQS,  /* u32 */
>   VDPA_ATTR_DEV_SUPPORTED_FEATURES,   /* u64 */
>  +VDPA_ATTR_VDPA_DEV_SUPPORTED_FEATURES,  /* u64 */
> 
> >>> I have answered in previous emails.
> >>> I disagree with the change.
> >>> Please reuse VDPA_ATTR_DEV_SUPPORTED_FEATURES.
> >> I believe we have already discussed this before in the V3 thread.
> >> I have told you that reusing this attr will lead to a new race condition.
> >>
> > Returning attribute cannot lead to any race condition.
> Please refer to our discussion in the V3 series, I have explained if re-use 
> this
> attr, it will be a multiple consumers and multiple produces model, it is a
> typical racing condition.

I read the emails with subject = "Re: [PATCH V3 3/6] vDPA: allow userspace to
query features of a vDPA device".
I couldn’t find multiple consumers and multiple producers working on the same nla
message.

RE: [PATCH V3 3/6] vDPA: allow userspace to query features of a vDPA device

2022-07-24 Thread Parav Pandit via Virtualization


> From: Zhu, Lingshan 
> Sent: Saturday, July 23, 2022 7:27 AM
> 
> On 7/6/2022 10:25 AM, Zhu, Lingshan wrote:
> >
> >
> > On 7/6/2022 1:01 AM, Parav Pandit wrote:
> >>> From: Zhu, Lingshan 
> >>> Sent: Tuesday, July 5, 2022 12:56 PM
>  Both can be queried simultaneously. Each will return their own
>  feature bits
> >>> using same attribute.
>  It wont lead to the race.
> >>> How? It is just a piece of memory, [attr], do you see locks in
> >>> nla_put_u64_64bit()? It is a typical race condition, data accessed
> >>> by multiple producers / consumers.
> >> No. There is no race condition in here.
> >> And new attribute enum by no means avoid any race.
> >>
> >> Data put using nla_put cannot be accessed until they are transferred.
> > How this is guaranteed? Do you see errors when calling nla_put_xxx()
> > twice?
> Parav, did you miss this?
It is not called twice, and reading an attribute and packing it into an nla
message is not a race condition.
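
For context, the pattern used by the vdpa netlink handlers allocates a fresh sk_buff per request and replies only to that requester, so two concurrent queries never share a message. A simplified sketch (the handler name is hypothetical and attribute filling is abbreviated; the genetlink calls are the standard ones, and vdpa_nl_family is assumed to be in scope):

#include <net/genetlink.h>
#include <uapi/linux/vdpa.h>

/* Simplified doit handler: every request gets its own msg, filled and sent
 * back to that requester only, so there is no shared buffer for concurrent
 * readers to race on. */
static int dev_config_get_doit(struct sk_buff *skb, struct genl_info *info)
{
	struct sk_buff *msg;
	void *hdr;

	msg = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL);
	if (!msg)
		return -ENOMEM;

	hdr = genlmsg_put(msg, info->snd_portid, info->snd_seq,
			  &vdpa_nl_family, 0, VDPA_CMD_DEV_CONFIG_GET);
	if (!hdr) {
		nlmsg_free(msg);
		return -EMSGSIZE;
	}

	/* ... nla_put_*() calls fill this private msg ... */

	genlmsg_end(msg, hdr);
	return genlmsg_reply(msg, info);
}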


RE: [PATCH V4 3/6] vDPA: allow userspace to query features of a vDPA device

2022-07-24 Thread Parav Pandit via Virtualization



> From: Zhu, Lingshan 
> Sent: Saturday, July 23, 2022 7:24 AM
> 
> 
> On 7/22/2022 9:12 PM, Parav Pandit wrote:
> >> From: Zhu Lingshan 
> >> Sent: Friday, July 22, 2022 7:53 AM
> >>
> >> This commit adds a new vDPA netlink attribution
> >> VDPA_ATTR_VDPA_DEV_SUPPORTED_FEATURES. Userspace can query
> features
> >> of vDPA devices through this new attr.
> >>
> >> Signed-off-by: Zhu Lingshan 
> >> ---
> >>   drivers/vdpa/vdpa.c   | 13 +
> >>   include/uapi/linux/vdpa.h |  1 +
> >>   2 files changed, 10 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c index
> >> ebf2f363fbe7..9b0e39b2f022 100644
> >> --- a/drivers/vdpa/vdpa.c
> >> +++ b/drivers/vdpa/vdpa.c
> >> @@ -815,7 +815,7 @@ static int vdpa_dev_net_mq_config_fill(struct
> >> vdpa_device *vdev,  static int vdpa_dev_net_config_fill(struct
> >> vdpa_device *vdev, struct sk_buff *msg)  {
> >>struct virtio_net_config config = {};
> >> -  u64 features;
> >> +  u64 features_device, features_driver;
> >>u16 val_u16;
> >>
> >>vdpa_get_config_unlocked(vdev, 0, , sizeof(config)); @@ -
> >> 832,12 +832,17 @@ static int vdpa_dev_net_config_fill(struct
> >> vdpa_device *vdev, struct sk_buff *ms
> >>if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MTU, val_u16))
> >>return -EMSGSIZE;
> >>
> >> -  features = vdev->config->get_driver_features(vdev);
> >> -  if (nla_put_u64_64bit(msg,
> >> VDPA_ATTR_DEV_NEGOTIATED_FEATURES, features,
> >> +  features_driver = vdev->config->get_driver_features(vdev);
> >> +  if (nla_put_u64_64bit(msg,
> >> VDPA_ATTR_DEV_NEGOTIATED_FEATURES, features_driver,
> >> +VDPA_ATTR_PAD))
> >> +  return -EMSGSIZE;
> >> +
> >> +  features_device = vdev->config->get_device_features(vdev);
> >> +  if (nla_put_u64_64bit(msg,
> >> VDPA_ATTR_VDPA_DEV_SUPPORTED_FEATURES,
> >> +features_device,
> >>  VDPA_ATTR_PAD))
> >>return -EMSGSIZE;
> >>
> >> -  return vdpa_dev_net_mq_config_fill(vdev, msg, features, );
> >> +  return vdpa_dev_net_mq_config_fill(vdev, msg, features_driver,
> >> +);
> >>   }
> >>
> >>   static int
> >> diff --git a/include/uapi/linux/vdpa.h b/include/uapi/linux/vdpa.h
> >> index
> >> 25c55cab3d7c..39f1c3d7c112 100644
> >> --- a/include/uapi/linux/vdpa.h
> >> +++ b/include/uapi/linux/vdpa.h
> >> @@ -47,6 +47,7 @@ enum vdpa_attr {
> >>VDPA_ATTR_DEV_NEGOTIATED_FEATURES,  /* u64 */
> >>VDPA_ATTR_DEV_MGMTDEV_MAX_VQS,  /* u32 */
> >>VDPA_ATTR_DEV_SUPPORTED_FEATURES,   /* u64 */
> >> +  VDPA_ATTR_VDPA_DEV_SUPPORTED_FEATURES,  /* u64 */
> >>
> > I have answered in previous emails.
> > I disagree with the change.
> > Please reuse VDPA_ATTR_DEV_SUPPORTED_FEATURES.
> I believe we have already discussed this before in the V3 thread.
> I have told you that reusing this attr will lead to a new race condition.
>
Returning an attribute cannot lead to any race condition.

 
> Pleas refer to the previous thread, and you did not answer my questions in
> that thread.
> 
> Thanks,
> Zhu Lingshan
> >
> > MST,
> > I nack this patch.
> > As mentioned in the previous versions, also it is missing the example
> output in the commit log.
> > Please include example output.
> >
> >>VDPA_ATTR_DEV_QUEUE_INDEX,  /* u32 */
> >>VDPA_ATTR_DEV_VENDOR_ATTR_NAME, /* string */
> >> --
> >> 2.31.1



RE: [PATCH V4 5/6] vDPA: answer num of queue pairs = 1 to userspace when VIRTIO_NET_F_MQ == 0

2022-07-22 Thread Parav Pandit via Virtualization



> From: Zhu Lingshan 
> Sent: Friday, July 22, 2022 7:53 AM
> 
> If VIRTIO_NET_F_MQ == 0, the virtio device should have one queue pair, so
> when userspace querying queue pair numbers, it should return mq=1 than
> zero.
> 
> Function vdpa_dev_net_config_fill() fills the attributions of the vDPA
> devices, so that it should call vdpa_dev_net_mq_config_fill() so the
> parameter in vdpa_dev_net_mq_config_fill() should be feature_device than
> feature_driver for the vDPA devices themselves
> 
> Before this change, when MQ = 0, iproute2 output:
> $vdpa dev config show vdpa0
> vdpa0: mac 00:e8:ca:11:be:05 link up link_announce false max_vq_pairs 0
> mtu 1500
> 
> After applying this commit, when MQ = 0, iproute2 output:
> $vdpa dev config show vdpa0
> vdpa0: mac 00:e8:ca:11:be:05 link up link_announce false max_vq_pairs 1
> mtu 1500
> 
No. As discussed in previous versions, we do not want to diverge by returning
values for config space fields which are not present.
Please drop this patch.
Nack on this patch.

> Signed-off-by: Zhu Lingshan 
> ---
>  drivers/vdpa/vdpa.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c index
> d76b22b2f7ae..846dd37f3549 100644
> --- a/drivers/vdpa/vdpa.c
> +++ b/drivers/vdpa/vdpa.c
> @@ -806,9 +806,10 @@ static int vdpa_dev_net_mq_config_fill(struct
> vdpa_device *vdev,
>   u16 val_u16;
> 
>   if ((features & BIT_ULL(VIRTIO_NET_F_MQ)) == 0)
> - return 0;
> + val_u16 = 1;
> + else
> + val_u16 = __virtio16_to_cpu(true, config-
> >max_virtqueue_pairs);
> 
> - val_u16 = le16_to_cpu(config->max_virtqueue_pairs);
>   return nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MAX_VQP,
> val_u16);  }
> 
> @@ -842,7 +843,7 @@ static int vdpa_dev_net_config_fill(struct
> vdpa_device *vdev, struct sk_buff *ms
> VDPA_ATTR_PAD))
>   return -EMSGSIZE;
> 
> - return vdpa_dev_net_mq_config_fill(vdev, msg, features_driver,
> );
> + return vdpa_dev_net_mq_config_fill(vdev, msg, features_device,
> +);
>  }
> 
>  static int
> --
> 2.31.1



RE: [PATCH V4 3/6] vDPA: allow userspace to query features of a vDPA device

2022-07-22 Thread Parav Pandit via Virtualization


> From: Zhu Lingshan 
> Sent: Friday, July 22, 2022 7:53 AM
> 
> This commit adds a new vDPA netlink attribution
> VDPA_ATTR_VDPA_DEV_SUPPORTED_FEATURES. Userspace can query
> features of vDPA devices through this new attr.
> 
> Signed-off-by: Zhu Lingshan 
> ---
>  drivers/vdpa/vdpa.c   | 13 +
>  include/uapi/linux/vdpa.h |  1 +
>  2 files changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c index
> ebf2f363fbe7..9b0e39b2f022 100644
> --- a/drivers/vdpa/vdpa.c
> +++ b/drivers/vdpa/vdpa.c
> @@ -815,7 +815,7 @@ static int vdpa_dev_net_mq_config_fill(struct
> vdpa_device *vdev,  static int vdpa_dev_net_config_fill(struct vdpa_device
> *vdev, struct sk_buff *msg)  {
>   struct virtio_net_config config = {};
> - u64 features;
> + u64 features_device, features_driver;
>   u16 val_u16;
> 
>   vdpa_get_config_unlocked(vdev, 0, , sizeof(config)); @@ -
> 832,12 +832,17 @@ static int vdpa_dev_net_config_fill(struct vdpa_device
> *vdev, struct sk_buff *ms
>   if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MTU, val_u16))
>   return -EMSGSIZE;
> 
> - features = vdev->config->get_driver_features(vdev);
> - if (nla_put_u64_64bit(msg,
> VDPA_ATTR_DEV_NEGOTIATED_FEATURES, features,
> + features_driver = vdev->config->get_driver_features(vdev);
> + if (nla_put_u64_64bit(msg,
> VDPA_ATTR_DEV_NEGOTIATED_FEATURES, features_driver,
> +   VDPA_ATTR_PAD))
> + return -EMSGSIZE;
> +
> + features_device = vdev->config->get_device_features(vdev);
> + if (nla_put_u64_64bit(msg,
> VDPA_ATTR_VDPA_DEV_SUPPORTED_FEATURES,
> +features_device,
> VDPA_ATTR_PAD))
>   return -EMSGSIZE;
> 
> - return vdpa_dev_net_mq_config_fill(vdev, msg, features, );
> + return vdpa_dev_net_mq_config_fill(vdev, msg, features_driver,
> +);
>  }
> 
>  static int
> diff --git a/include/uapi/linux/vdpa.h b/include/uapi/linux/vdpa.h index
> 25c55cab3d7c..39f1c3d7c112 100644
> --- a/include/uapi/linux/vdpa.h
> +++ b/include/uapi/linux/vdpa.h
> @@ -47,6 +47,7 @@ enum vdpa_attr {
>   VDPA_ATTR_DEV_NEGOTIATED_FEATURES,  /* u64 */
>   VDPA_ATTR_DEV_MGMTDEV_MAX_VQS,  /* u32 */
>   VDPA_ATTR_DEV_SUPPORTED_FEATURES,   /* u64 */
> + VDPA_ATTR_VDPA_DEV_SUPPORTED_FEATURES,  /* u64 */
> 
I have answered in previous emails.
I disagree with the change.
Please reuse VDPA_ATTR_DEV_SUPPORTED_FEATURES.

MST,
I nack this patch.
As mentioned in the previous versions, it is also missing the example output in
the commit log.
Please include the example output.
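
For clarity, the reuse being asked for would look roughly like the sketch below; the two wrapper functions are hypothetical, while the attribute id, VDPA_ATTR_PAD and the get_device_features() callback follow the in-tree definitions at the time. Userspace tells the two values apart by the command it issued (mgmtdev show vs. dev config show), not by a new attribute number:

#include <net/netlink.h>
#include <linux/vdpa.h>
#include <uapi/linux/vdpa.h>

/* Capabilities of the parent/management device. */
static int fill_mgmtdev_features(struct sk_buff *msg, struct vdpa_mgmt_dev *mdev)
{
	return nla_put_u64_64bit(msg, VDPA_ATTR_DEV_SUPPORTED_FEATURES,
				 mdev->supported_features, VDPA_ATTR_PAD);
}

/* Capabilities of this specific vdpa device instance. */
static int fill_vdpa_dev_features(struct sk_buff *msg, struct vdpa_device *vdev)
{
	return nla_put_u64_64bit(msg, VDPA_ATTR_DEV_SUPPORTED_FEATURES,
				 vdev->config->get_device_features(vdev),
				 VDPA_ATTR_PAD);
}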

>   VDPA_ATTR_DEV_QUEUE_INDEX,  /* u32 */
>   VDPA_ATTR_DEV_VENDOR_ATTR_NAME, /* string */
> --
> 2.31.1



RE: [PATCH V3 5/6] vDPA: answer num of queue pairs = 1 to userspace when VIRTIO_NET_F_MQ == 0

2022-07-12 Thread Parav Pandit via Virtualization

> From: Zhu, Lingshan 
> Sent: Tuesday, July 12, 2022 11:03 PM
> 
> 
> On 7/13/2022 12:48 AM, Parav Pandit wrote:
> >> From: Zhu, Lingshan 
> >> Sent: Sunday, July 10, 2022 10:30 PM
> >>> Showing max_vq_pairs of 1 even when _MQ is not negotiated,
> >>> incorrectly
> >> says that max_vq_pairs is exposed to the guest, but it is not offered.
> >>> So, please fix the iproute2 to not print max_vq_pairs when it is not
> >> returned by the kernel.
> >> iproute2 can report whether there is MQ feature in the device /
> >> driver feature bits.
> >> I think iproute2 only queries the number of max queues here.
> >>
> >> max_vq_pairs shows how many queue pairs there, this attribute's
> >> existence does not depend on MQ, if no MQ, there are still one queue
> >> pair, so just show one.
> > This netlink attribute's existence is depending on the _MQ feature bit
> existence.
> why? If no MQ, then no queues?
> > We can break that and report the value, but if we break that there are
> many other config space bits who doesn’t have good default like
> max_vq_pairs.
> max_vq_paris may not have a default value, but we know if there is no MQ,
> a virtio-net still have one queue pair to be functional.
> > There is ambiguity for user space what to do with it and so in the kernel
> space..
> > Instead of dealing with them differently in kernel, at present we attach
> each netlink attribute to a respective feature bit wherever applicable.
> > And code in kernel and user space is uniform to handle them.
> I get your point, but you see, by "max_vq_pairs", the user space tool is
> asking how many queue pairs there, it is not asking whether the device have
> MQ.
> Even no _MQ, we still need to tell the users that there are one queue pair, or
> it is not a functional virtio-net, we should detect this error earlier in the
> device initialization.
It is not an error. :)

When the user space that invokes the netlink commands detects that _MQ is not
supported, it takes max_queue_pair = 1 by itself.

> 
> I think it is still uniform, it there is _MQ, we return cfg.max_queue_pair, 
> if no
> _MQ, return 1, still by netlink.
It is better to do that in user space because we cannot do the same for other
config fields.

> 
> Thanks


RE: [PATCH V3 5/6] vDPA: answer num of queue pairs = 1 to userspace when VIRTIO_NET_F_MQ == 0

2022-07-12 Thread Parav Pandit via Virtualization

> From: Zhu, Lingshan 
> Sent: Sunday, July 10, 2022 10:30 PM

> > Showing max_vq_pairs of 1 even when _MQ is not negotiated, incorrectly
> says that max_vq_pairs is exposed to the guest, but it is not offered.
> >
> > So, please fix the iproute2 to not print max_vq_pairs when it is not
> returned by the kernel.
> iproute2 can report whether there is MQ feature in the device / driver
> feature bits.
> I think iproute2 only queries the number of max queues here.
> 
> max_vq_pairs shows how many queue pairs there, this attribute's existence
> does not depend on MQ, if no MQ, there are still one queue pair, so just
> show one.
This netlink attribute's existence depends on the existence of the _MQ feature bit.
We can break that and report a value, but if we do, there are many other config
space fields that do not have a good default the way max_vq_pairs does.
That creates ambiguity for user space about what to do with them, and likewise in
the kernel.
Instead of handling them differently in the kernel, at present we attach each
netlink attribute to its respective feature bit wherever applicable.
That way the kernel and user space code handle them uniformly.

RE: [PATCH V3 5/6] vDPA: answer num of queue pairs = 1 to userspace when VIRTIO_NET_F_MQ == 0

2022-07-08 Thread Parav Pandit via Virtualization

> From: Zhu, Lingshan 
> Sent: Friday, July 8, 2022 2:21 AM
> 
> 
> On 7/2/2022 6:07 AM, Parav Pandit wrote:
> >
> >> From: Zhu Lingshan 
> >> Sent: Friday, July 1, 2022 9:28 AM
> >> If VIRTIO_NET_F_MQ == 0, the virtio device should have one queue
> >> pair, so when userspace querying queue pair numbers, it should return
> >> mq=1 than zero.
> >>
> >> Function vdpa_dev_net_config_fill() fills the attributions of the
> >> vDPA devices, so that it should call vdpa_dev_net_mq_config_fill() so
> >> the parameter in vdpa_dev_net_mq_config_fill() should be
> >> feature_device than feature_driver for the vDPA devices themselves
> >>
> >> Before this change, when MQ = 0, iproute2 output:
> >> $vdpa dev config show vdpa0
> >> vdpa0: mac 00:e8:ca:11:be:05 link up link_announce false max_vq_pairs
> >> 0 mtu 1500
> >>
> > The fix belongs to user space.
> > When a feature bit _MQ is not negotiated, vdpa kernel space will not add
> attribute VDPA_ATTR_DEV_NET_CFG_MAX_VQP.
> > When such attribute is not returned by kernel, max_vq_pairs should not
> be shown by the iproute2.
> I think userspace tool does not need to care whether MQ is offered or
> negotiated, it just needs to read the number of queues there, so if no MQ, it
> is not "not any queues", there are still 1 queue pair to be a virtio-net 
> device,
> means two queues.
> 
> If not, how can you tell the user there are only 2 queues? The end users may
> don't know this is default. They may misunderstand this as an error or
> defects.
> >
When max_vq_pairs is not shown, it means that the device didn’t expose the
MAX_VQ_PAIRS attribute to its guest users (because _MQ was not negotiated).
It is not an error or a defect.
It precisely shows what is exposed.

User space will care when it wants to turn the _MQ feature bit off/on and set
MAX_QP values.

Showing max_vq_pairs of 1 even when _MQ is not negotiated incorrectly says
that max_vq_pairs is exposed to the guest, when it is not offered.

So, please fix iproute2 to not print max_vq_pairs when it is not returned
by the kernel.

> > We have many config space fields that depend on the feature bits and
> some of them do not have any defaults.
> > To keep consistency of existence of config space fields among all, we don't
> want to show default like below.
> >
> > Please fix the iproute2 to not print max_vq_pairs when it is not returned
> by the kernel.
> >
> >> After applying this commit, when MQ = 0, iproute2 output:
> >> $vdpa dev config show vdpa0
> >> vdpa0: mac 00:e8:ca:11:be:05 link up link_announce false max_vq_pairs
> >> 1 mtu 1500
> >>
> >> Fixes: a64917bc2e9b (vdpa: Provide interface to read driver features)
> >> Signed-off-by: Zhu Lingshan 
> >> ---
> >>   drivers/vdpa/vdpa.c | 7 ---
> >>   1 file changed, 4 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c index
> >> d76b22b2f7ae..846dd37f3549 100644
> >> --- a/drivers/vdpa/vdpa.c
> >> +++ b/drivers/vdpa/vdpa.c
> >> @@ -806,9 +806,10 @@ static int vdpa_dev_net_mq_config_fill(struct
> >> vdpa_device *vdev,
> >>u16 val_u16;
> >>
> >>if ((features & BIT_ULL(VIRTIO_NET_F_MQ)) == 0)
> >> -  return 0;
> >> +  val_u16 = 1;
> >> +  else
> >> +  val_u16 = __virtio16_to_cpu(true, config-
> >>> max_virtqueue_pairs);
> >> -  val_u16 = le16_to_cpu(config->max_virtqueue_pairs);
> >>return nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MAX_VQP,
> val_u16);
> >> }
> >>
> >> @@ -842,7 +843,7 @@ static int vdpa_dev_net_config_fill(struct
> >> vdpa_device *vdev, struct sk_buff *ms
> >>  VDPA_ATTR_PAD))
> >>return -EMSGSIZE;
> >>
> >> -  return vdpa_dev_net_mq_config_fill(vdev, msg, features_driver,
> >> );
> >> +  return vdpa_dev_net_mq_config_fill(vdev, msg, features_device,
> >> +);
> >>   }
> >>
> >>   static int
> >> --
> >> 2.31.1


RE: [PATCH V3 3/6] vDPA: allow userspace to query features of a vDPA device

2022-07-08 Thread Parav Pandit via Virtualization



> From: Zhu, Lingshan 
> Sent: Friday, July 8, 2022 2:16 AM
> 
> On 7/2/2022 6:02 AM, Parav Pandit wrote:
> >
> >> From: Zhu Lingshan 
> >> Sent: Friday, July 1, 2022 9:28 AM
> >>
> >> This commit adds a new vDPA netlink attribution
> >> VDPA_ATTR_VDPA_DEV_SUPPORTED_FEATURES. Userspace can query
> features
> >> of vDPA devices through this new attr.
> >>
> >> Fixes: a64917bc2e9b vdpa: (Provide interface to read driver feature)
> > Missing the "" in the line.
> will fix
> > I reviewed the patches again.
> >
> > However, this is not the fix.
> > A fix cannot add a new UAPI.
> I think we have discussed this, on why we can not re-name the existing
> wrong named attr, and why we can not re-use the attr.
> So are you suggesting remove this fixes tag?
> And why a fix can not add a new uAPI?

Because a new attribute cannot fix any existing attribute.

What the patch does is show the current attributes of the vdpa device (which
sometimes contain different values than the mgmt device).
So it is new functionality that cannot carry a Fixes tag.

> >
> > Code is already considering negotiated driver features to return the device
> config space.
> > Hence it is fine.
> No, the spec says:
> The device MUST allow reading of any device-specific configuration field
> before FEATURES_OK is set by the driver.
> >
> > This patch intents to provide device features to user space.
> > First what vdpa device are capable of, are already returned by features
> attribute on the management device.
> > This is done in commit [1].
> we have discussed this in another thread, vDPA device feature bits can be
> different from the management device feature bits.
> >
Yes. 
> > The only reason to have it is, when one management device indicates that
> feature is supported, but device may end up not supporting this feature if
> such feature is shared with other devices on same physical device.
> > For example all VFs may not be symmetric after large number of them are
> in use. In such case features bit of management device can differ (more
> features) than the vdpa device of this VF.
> > Hence, showing on the device is useful.
> >
> > As mentioned before in V2, commit [1] has wrongly named the attribute to
> VDPA_ATTR_DEV_SUPPORTED_FEATURES.
> > It should have been,
> VDPA_ATTR_DEV_MGMTDEV_SUPPORTED_FEATURES.
> > Because it is in UAPI, and since we don't want to break compilation of
> > iproute2, It cannot be renamed anymore.
> Yes, rename it will break current uAPI, so I can not rename it.
> >
I know, which is why this patch needs to make the changes listed in the previous
email.

> > Given that, we do not want to start trend of naming device attributes with
> additional _VDPA_ to it as done in this patch.
> > Error in commit [1] was exception.
> >
> > Hence, please reuse VDPA_ATTR_DEV_SUPPORTED_FEATURES to return
> for device features too.
> >
> > Secondly, you need output example for showing device features in the
> commit log.
> >
> > 3rd, please drop the fixes tag as new capability is not a fix.
> >



RE: [PATCH V3 6/6] vDPA: fix 'cast to restricted le16' warnings in vdpa.c

2022-07-08 Thread Parav Pandit via Virtualization


> From: Zhu, Lingshan 
> Sent: Friday, July 8, 2022 2:25 AM
> 
> 
> On 7/2/2022 6:18 AM, Parav Pandit wrote:
> >> From: Zhu Lingshan 
> >> Sent: Friday, July 1, 2022 9:28 AM
> >>
> >> This commit fixes spars warnings: cast to restricted __le16 in
> >> function
> >> vdpa_dev_net_config_fill() and
> >> vdpa_fill_stats_rec()
> >>
> > Missing fixes tag.
> I am not sure whether this deserve a fix tag, I will look into it.
> >
> > But I fail to understand the warning.
> > config.status is le16, and API used is to convert le16 to cpu.
> > What is the warning about, can you please explain?


I see it. It's not le16, it's __virtio16.
Please add the Fixes tag.
With that, Reviewed-by: Parav Pandit 

> The warnings are:
> drivers/vdpa/vdpa.c:828:19: warning: cast to restricted __le16
> drivers/vdpa/vdpa.c:828:19: warning: cast from restricted __virtio16
> >
> >> Signed-off-by: Zhu Lingshan 
> >> ---
> >>   drivers/vdpa/vdpa.c | 6 +++---
> >>   1 file changed, 3 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c index
> >> 846dd37f3549..ed49fe46a79e 100644
> >> --- a/drivers/vdpa/vdpa.c
> >> +++ b/drivers/vdpa/vdpa.c
> >> @@ -825,11 +825,11 @@ static int vdpa_dev_net_config_fill(struct
> >> vdpa_device *vdev, struct sk_buff *ms
> >>config.mac))
> >>return -EMSGSIZE;
> >>
> >> -  val_u16 = le16_to_cpu(config.status);
> >> +  val_u16 = __virtio16_to_cpu(true, config.status);
> >>if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_STATUS, val_u16))
> >>return -EMSGSIZE;
> >>
> >> -  val_u16 = le16_to_cpu(config.mtu);
> >> +  val_u16 = __virtio16_to_cpu(true, config.mtu);
> >>if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MTU, val_u16))
> >>return -EMSGSIZE;
> >>
> >> @@ -911,7 +911,7 @@ static int vdpa_fill_stats_rec(struct vdpa_device
> >> *vdev, struct sk_buff *msg,
> >>}
> >>vdpa_get_config_unlocked(vdev, 0, , sizeof(config));
> >>
> >> -  max_vqp = le16_to_cpu(config.max_virtqueue_pairs);
> >> +  max_vqp = __virtio16_to_cpu(true, config.max_virtqueue_pairs);
> >>if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_CFG_MAX_VQP,
> >> max_vqp))
> >>return -EMSGSIZE;
> >>
> >> --
> >> 2.31.1
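
For reference, the distinction sparse points at is that these config fields are declared __virtio16 (not __le16), so the typed accessor has to be used; a minimal sketch, assuming a modern VERSION_1 device and hence little_endian = true:

#include <linux/virtio_byteorder.h>
#include <uapi/linux/virtio_net.h>

/* Sketch only: __virtio16_to_cpu() carries the __virtio16 type annotation
 * that sparse expects; for VERSION_1 devices the field is little-endian,
 * so the conversion result is the same as le16_to_cpu() would give. */
static u16 read_max_vqp(const struct virtio_net_config *config)
{
	return __virtio16_to_cpu(true, config->max_virtqueue_pairs);
}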



RE: [PATCH V3 3/6] vDPA: allow userspace to query features of a vDPA device

2022-07-05 Thread Parav Pandit via Virtualization


> From: Zhu, Lingshan 
> Sent: Tuesday, July 5, 2022 10:25 PM
> 1. When migrate the VM to a node which has a more resourceful device. If
> the source side device does not have MQ, RSS or TSO feature, the vDPA
> device assigned to the VM does not have MQ, RSS or TSO as well. When
> migrating to a node which has a device with MQ, RSS or TSO, to provide a
> consistent network device to the guest, to be transparent to the guest, we
> need to mask out MQ, RSS or TSO in the vDPA device when provisioning.
> This is an example that management device may have different feature bits
> than the vDPA device.
Yes. Right. 
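
A small sketch of the kind of feature masking meant here (purely illustrative; the helper and the chosen bits are not from any patch in this thread):

#include <linux/virtio_net.h>

/* Illustrative only: when provisioning a vDPA device that must remain
 * migratable to a less capable node, the orchestration layer strips the
 * features the destination cannot provide before creating the device. */
static __u64 provision_features_for_migration(__u64 mgmtdev_features)
{
	return mgmtdev_features & ~((1ULL << VIRTIO_NET_F_MQ) |
				    (1ULL << VIRTIO_NET_F_RSS) |
				    (1ULL << VIRTIO_NET_F_HOST_TSO4) |
				    (1ULL << VIRTIO_NET_F_HOST_TSO6));
}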


  1   2   3   >