Re: [PATCH v3] vringh: IOMEM support

2023-04-25 Thread Jason Wang
On Tue, Apr 25, 2023 at 6:23 PM Shunsuke Mie  wrote:
>
> Introduce a new memory accessor for vringh. It enables vringh to access
> virtio rings located in an I/O memory region.

Is there a user for this? It would be better if you could describe the
use cases. Maybe you can post the user, or at least a link to the git
tree, as a reference.

>
> Signed-off-by: Shunsuke Mie 
> ---
>
> Changes from v2: 
> https://lore.kernel.org/virtualization/20230202090934.549556-1-...@igel.co.jp/
> - Focus on an adding io memory APIs
> Remove vringh API unification and some fixes.
> - Rebase on next-20230414
>
>  drivers/vhost/Kconfig  |   6 ++
>  drivers/vhost/vringh.c | 129 +
>  include/linux/vringh.h |  33 +++
>  3 files changed, 168 insertions(+)
>
> diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
> index b455d9ab6f3d..4b0dbb4a8ab3 100644
> --- a/drivers/vhost/Kconfig
> +++ b/drivers/vhost/Kconfig
> @@ -6,6 +6,12 @@ config VHOST_IOTLB
>   This option is selected by any driver which needs to support
>   an IOMMU in software.
>
> +config VHOST_RING_IOMEM
> +   tristate
> +   select VHOST_IOMEM
> +   help
> + This option enables vringh APIs to support I/O memory space.

There's no specific Kconfig for all the existing accessors. Any reason
I/O memory is special or do you care about the size of the module?

> +
>  config VHOST_RING
> tristate
> select VHOST_IOTLB
> diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
> index 955d938eb663..ce5a88eecc05 100644
> --- a/drivers/vhost/vringh.c
> +++ b/drivers/vhost/vringh.c
> @@ -1604,4 +1604,133 @@ EXPORT_SYMBOL(vringh_need_notify_iotlb);
>
>  #endif
>

[...]

>
> base-commit: d3f2cd24819158bb70701c3549e586f9df9cee67
> prerequisite-patch-id: 760abbe8c981c52ccc421b8139e8999ab71619aa
> prerequisite-patch-id: 99d8679ab4569545c8af401e84142c66312e953e
> prerequisite-patch-id: aca81516aba75b58c8422d37c2dc7db2f61ffe92
> prerequisite-patch-id: 3d76136200c4e55ba2c41681325f242859dd6dbd
> prerequisite-patch-id: 47a994feb68d95412d81b0fd1fa27bc7ba05ae18
> prerequisite-patch-id: a2f7fc3f35358f70b6dad4c919ce293b10295c4f
> prerequisite-patch-id: 70e2ee32b945be96a0388f0ff564651ac9335220
> prerequisite-patch-id: 2023690f9c47017b56d7f036332a5ca3ece6bde8
> prerequisite-patch-id: 211e113fec6c450d13fbdb437ecfad67dec0a157
> prerequisite-patch-id: f2bcd3168933886e4cd4c39e47446d1bd7cb2691
> prerequisite-patch-id: 37b131560808733a0b8878e85a3d2a46d6ab02ca
> prerequisite-patch-id: 79b0219a715cb5ace227d55666d62fdb2dcc6ffe
> prerequisite-patch-id: 30f1740cd48a19aa1c3c93e625c740cae2845478
> prerequisite-patch-id: 31989e4a521f2fc6f68c4ccdb6960035e87666a7
> prerequisite-patch-id: 3948bb3e0c045e206a714d17bab16c94775d
> prerequisite-patch-id: cf28e0115b9111bcb77aa9c710d98b2be93c7e89
> prerequisite-patch-id: ebf2349c0ae1296663854eee2da0b43fe8972f9b
> prerequisite-patch-id: fc570921d885a2a6000800b4022321e63f1650a5
> prerequisite-patch-id: 1fd5219fef17c2bf2d76000207b25aae58c368f3
> prerequisite-patch-id: 34e5f078202762fe69df471e97b51b1341cbdfa9
> prerequisite-patch-id: 7fa5151b9e0488b48c2b9d1219152cfb047d6586
> prerequisite-patch-id: 33cca272767af04ae9abe7af2f6cbb9972cc0b77
> prerequisite-patch-id: bb1a6befc899dd97bcd946c2d76ce73675a1fa45
> prerequisite-patch-id: 10be04dd92fa451d13676e91d9094b63cd7fbcf8
> prerequisite-patch-id: 87b86eb4ce9501bba9c04ec81094ac9202392431
> prerequisite-patch-id: a5ced28762bf6bd6419dae0e4413d02ccafd72c2
> prerequisite-patch-id: 2db4c9603e00d69bb0184dabcc319e7f74f30305
> prerequisite-patch-id: 41933f9d53e5e9e02efd6157b68ee7d92b10cfa2
> prerequisite-patch-id: df3295b4cdde3a45eaf4c40047179698a4224d05
> prerequisite-patch-id: 9e2fca9ab0ba2b935daa96f1745ff4c909792231
> prerequisite-patch-id: 8948378099ba4d61e10a87e617d69ed2fc4104ae
> prerequisite-patch-id: 5e7466f3f0d74880d1a574a1bd91b12091dcf3f5
> prerequisite-patch-id: 902899e1cd53b7fcc7971f630aed103830fc3e3d
> prerequisite-patch-id: 42126b180500f9ff123db78748972c6ece18ac57
> prerequisite-patch-id: 5236a03ef574074f3c1009a52612051862b31eff
> prerequisite-patch-id: adae1aa80df65bd02a9e3f4db490cf801c1c6119
> prerequisite-patch-id: 22806fcabb973ee5f04ee6212db6161aab5bcbfc
> prerequisite-patch-id: 6eb14cfdc2cf31e90556f6afe7361427a332e8dc

These seem meaningless?

Thanks

> --
> 2.25.1
>

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH net-next v3 12/15] virtio_net: small: optimize code

2023-04-25 Thread Xuan Zhuo
On Wed, 26 Apr 2023 11:08:52 +0800, Jason Wang  wrote:
> On Sun, Apr 23, 2023 at 6:58 PM Xuan Zhuo  wrote:
> >
> > Avoid recomputing some variables (headroom and so on) when
> > processing XDP.
> >
> > Signed-off-by: Xuan Zhuo 
>
> Nit: I think we need to tweak the title; it's better to say what is
> optimized. (And it would be better to tweak the title of patch 11 as
> well.)

Yes, I agree with this.

Thanks.

>
> Acked-by: Jason Wang 
>
> > ---
> >  drivers/net/virtio_net.c | 12 
> >  1 file changed, 8 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 5bc3dca0f60c..601c0e7fc32b 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -1031,11 +1031,10 @@ static struct sk_buff *receive_small(struct 
> > net_device *dev,
> > struct sk_buff *skb;
> > struct bpf_prog *xdp_prog;
> > unsigned int xdp_headroom = (unsigned long)ctx;
> > -   unsigned int header_offset = VIRTNET_RX_PAD + xdp_headroom;
> > -   unsigned int headroom = vi->hdr_len + header_offset;
> > -   unsigned int buflen = SKB_DATA_ALIGN(GOOD_PACKET_LEN + headroom) +
> > - SKB_DATA_ALIGN(sizeof(struct 
> > skb_shared_info));
> > struct page *page = virt_to_head_page(buf);
> > +   unsigned int header_offset;
> > +   unsigned int headroom;
> > +   unsigned int buflen;
> >
> > len -= vi->hdr_len;
> > stats->bytes += len;
> > @@ -1063,6 +1062,11 @@ static struct sk_buff *receive_small(struct 
> > net_device *dev,
> > rcu_read_unlock();
> >
> >  skip_xdp:
> > +   header_offset = VIRTNET_RX_PAD + xdp_headroom;
> > +   headroom = vi->hdr_len + header_offset;
> > +   buflen = SKB_DATA_ALIGN(GOOD_PACKET_LEN + headroom) +
> > +   SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> > +
> > skb = build_skb(buf, buflen);
> > if (!skb)
> > goto err;
> > --
> > 2.32.0.3.g01195cf9f
> >
>

Re: [PATCH 5/5] a vendor driver should not set _CONFIG_S_FAILED

2023-04-25 Thread Jason Wang


在 2023/4/1 04:48, Zhu Lingshan 写道:

VIRTIO_CONFIG_S_FAILED indicates the guest driver has given up on
the device due to fatal errors. So it is the guest's decision;
the vendor driver should not set this status on the device.

Signed-off-by: Zhu Lingshan 



Acked-by: Jason Wang 

Thanks



---
  drivers/vdpa/ifcvf/ifcvf_main.c | 4 +---
  1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c
index 15c6157ee841..f228fba74c61 100644
--- a/drivers/vdpa/ifcvf/ifcvf_main.c
+++ b/drivers/vdpa/ifcvf/ifcvf_main.c
@@ -423,9 +423,7 @@ static void ifcvf_vdpa_set_status(struct vdpa_device 
*vdpa_dev, u8 status)
!(status_old & VIRTIO_CONFIG_S_DRIVER_OK)) {
ret = ifcvf_request_irq(vf);
if (ret) {
-   status = ifcvf_get_status(vf);
-   status |= VIRTIO_CONFIG_S_FAILED;
-   ifcvf_set_status(vf, status);
+   IFCVF_ERR(vf->pdev, "failed to request irq with error %d\n", ret);
return;
}
}



Re: [PATCH 4/5] synchronize irqs in the reset routine

2023-04-25 Thread Jason Wang


在 2023/4/1 04:48, Zhu Lingshan 写道:

This commit synchronizes the IRQs of the virtqueues
and the config space in the reset routine.
Thus ifcvf_stop_hw() and reset() are refactored as well.

Signed-off-by: Zhu Lingshan 
---
  drivers/vdpa/ifcvf/ifcvf_base.c | 61 ++---
  drivers/vdpa/ifcvf/ifcvf_main.c | 45 +++-
  2 files changed, 54 insertions(+), 52 deletions(-)

diff --git a/drivers/vdpa/ifcvf/ifcvf_base.c b/drivers/vdpa/ifcvf/ifcvf_base.c
index 79e313c5e10e..49949aec20ef 100644
--- a/drivers/vdpa/ifcvf/ifcvf_base.c
+++ b/drivers/vdpa/ifcvf/ifcvf_base.c
@@ -170,12 +170,7 @@ void ifcvf_set_status(struct ifcvf_hw *hw, u8 status)
  
  void ifcvf_reset(struct ifcvf_hw *hw)

  {
-   hw->config_cb.callback = NULL;
-   hw->config_cb.private = NULL;
-
ifcvf_set_status(hw, 0);
-   /* flush set_status, make sure VF is stopped, reset */
-   ifcvf_get_status(hw);



If we don't flush or poll, how can we know the reset is done?

E.g modern virtio-pci did:

    /* 0 status means a reset. */
    vp_modern_set_status(mdev, 0);
    /* After writing 0 to device_status, the driver MUST wait for a
     * read of device_status to return 0 before reinitializing the
     * device. This will flush out the status write, and flush in
     * device writes, including MSI-X interrupts, if any.
     */
    while (vp_modern_get_status(mdev))
        msleep(1);
    /* Flush pending VQ/configuration callbacks. */
    vp_synchronize_vectors(vdev);


  }
  
  u64 ifcvf_get_hw_features(struct ifcvf_hw *hw)

@@ -368,20 +363,62 @@ void ifcvf_set_vq_ready(struct ifcvf_hw *hw, u16 qid, 
bool ready)
vp_iowrite16(ready, &cfg->queue_enable);
  }
  
-static void ifcvf_hw_disable(struct ifcvf_hw *hw)

+static void synchronize_per_vq_irq(struct ifcvf_hw *hw)
  {
-   u32 i;
+   u16 qid;
  
-	ifcvf_set_config_vector(hw, VIRTIO_MSI_NO_VECTOR);

-   for (i = 0; i < hw->nr_vring; i++) {
-   ifcvf_set_vq_vector(hw, i, VIRTIO_MSI_NO_VECTOR);
+   for (qid = 0; qid < hw->nr_vring; qid++) {
+   if (hw->vring[qid].irq != -EINVAL)
+   synchronize_irq(hw->vring[qid].irq);
}
  }
  
+static void synchronize_vqs_reused_irq(struct ifcvf_hw *hw)

+{
+   if (hw->vqs_reused_irq != -EINVAL)
+   synchronize_irq(hw->vqs_reused_irq);
+}
+
+static void synchronize_vq_irq(struct ifcvf_hw *hw)
+{
+   u8 status = hw->msix_vector_status;
+
+   if (status == MSIX_VECTOR_PER_VQ_AND_CONFIG)
+   synchronize_per_vq_irq(hw);
+   else
+   synchronize_vqs_reused_irq(hw);
+}



I wonder if we need to go with such a complicated approach; can we
synchronize through the vectors like virtio-pci does?


    for (i = 0; i < vp_dev->msix_vectors; ++i)
    synchronize_irq(pci_irq_vector(vp_dev->pci_dev, i));
?



+
+static void synchronize_config_irq(struct ifcvf_hw *hw)
+{
+   if (hw->config_irq != -EINVAL)
+   synchronize_irq(hw->config_irq);
+}
+
+static void ifcvf_reset_vring(struct ifcvf_hw *hw)
+{
+   u16 qid;
+
+   for (qid = 0; qid < hw->nr_vring; qid++) {
+   synchronize_vq_irq(hw);


Since an IRQ could be shared, this will result in extra complexity;
e.g. an IRQ could be flushed multiple times?


Thanks



+   hw->vring[qid].cb.callback = NULL;
+   hw->vring[qid].cb.private = NULL;
+   ifcvf_set_vq_vector(hw, qid, VIRTIO_MSI_NO_VECTOR);
+   }
+}
+
+static void ifcvf_reset_config_handler(struct ifcvf_hw *hw)
+{
+   synchronize_config_irq(hw);
+   hw->config_cb.callback = NULL;
+   hw->config_cb.private = NULL;
+   ifcvf_set_config_vector(hw, VIRTIO_MSI_NO_VECTOR);
+}
+
  void ifcvf_stop_hw(struct ifcvf_hw *hw)
  {
-   ifcvf_hw_disable(hw);
-   ifcvf_reset(hw);
+   ifcvf_reset_vring(hw);
+   ifcvf_reset_config_handler(hw);
  }
  
  void ifcvf_notify_queue(struct ifcvf_hw *hw, u16 qid)

diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c
index 968687159e44..15c6157ee841 100644
--- a/drivers/vdpa/ifcvf/ifcvf_main.c
+++ b/drivers/vdpa/ifcvf/ifcvf_main.c
@@ -346,33 +346,6 @@ static int ifcvf_request_irq(struct ifcvf_hw *vf)
return 0;
  }
  
-static int ifcvf_stop_datapath(struct ifcvf_adapter *adapter)

-{
-   struct ifcvf_hw *vf = adapter->vf;
-   int i;
-
-   for (i = 0; i < vf->nr_vring; i++)
-   vf->vring[i].cb.callback = NULL;
-
-   ifcvf_stop_hw(vf);
-
-   return 0;
-}
-
-static void ifcvf_reset_vring(struct ifcvf_adapter *adapter)
-{
-   struct ifcvf_hw *vf = adapter->vf;
-   int i;
-
-   for (i = 0; i < vf->nr_vring; i++) {
-   vf->vring[i].last_avail_idx = 0;
-   vf->vring[i].cb.callback = NULL;
-   vf->vring[i].cb.private = NULL;
-   }
-
-   ifcvf_reset(vf);
-}
-
  static struct ifcvf_adapter *vdpa_to_adapter(struc

Re: [PATCH 3/5] retire ifcvf_start_datapath and ifcvf_add_status

2023-04-25 Thread Jason Wang


在 2023/4/1 04:48, Zhu Lingshan 写道:

Rather than the former lazy-initialization mechanism, the virtqueue
operations and driver_features-related ops now access the virtio
registers directly to take immediate action, so ifcvf_start_datapath()
can be retired.

ifcvf_add_status() is retired because the device status should not be
changed at the vendor driver's discretion; this driver should only set
the device status that comes from the virtio drivers via
vdpa_ops.set_status().

Signed-off-by: Zhu Lingshan 



Acked-by: Jason Wang 

Thanks



---
  drivers/vdpa/ifcvf/ifcvf_base.c | 19 ---
  drivers/vdpa/ifcvf/ifcvf_base.h |  1 -
  drivers/vdpa/ifcvf/ifcvf_main.c | 23 ---
  3 files changed, 43 deletions(-)

diff --git a/drivers/vdpa/ifcvf/ifcvf_base.c b/drivers/vdpa/ifcvf/ifcvf_base.c
index 546e923bcd16..79e313c5e10e 100644
--- a/drivers/vdpa/ifcvf/ifcvf_base.c
+++ b/drivers/vdpa/ifcvf/ifcvf_base.c
@@ -178,15 +178,6 @@ void ifcvf_reset(struct ifcvf_hw *hw)
ifcvf_get_status(hw);
  }
  
-static void ifcvf_add_status(struct ifcvf_hw *hw, u8 status)

-{
-   if (status != 0)
-   status |= ifcvf_get_status(hw);
-
-   ifcvf_set_status(hw, status);
-   ifcvf_get_status(hw);
-}
-
  u64 ifcvf_get_hw_features(struct ifcvf_hw *hw)
  {
struct virtio_pci_common_cfg __iomem *cfg = hw->common_cfg;
@@ -387,16 +378,6 @@ static void ifcvf_hw_disable(struct ifcvf_hw *hw)
}
  }
  
-int ifcvf_start_hw(struct ifcvf_hw *hw)

-{
-   ifcvf_add_status(hw, VIRTIO_CONFIG_S_ACKNOWLEDGE);
-   ifcvf_add_status(hw, VIRTIO_CONFIG_S_DRIVER);
-
-   ifcvf_add_status(hw, VIRTIO_CONFIG_S_DRIVER_OK);
-
-   return 0;
-}
-
  void ifcvf_stop_hw(struct ifcvf_hw *hw)
  {
ifcvf_hw_disable(hw);
diff --git a/drivers/vdpa/ifcvf/ifcvf_base.h b/drivers/vdpa/ifcvf/ifcvf_base.h
index cb19196c3ece..d34d3bc0dbf4 100644
--- a/drivers/vdpa/ifcvf/ifcvf_base.h
+++ b/drivers/vdpa/ifcvf/ifcvf_base.h
@@ -110,7 +110,6 @@ struct ifcvf_vdpa_mgmt_dev {
  };
  
  int ifcvf_init_hw(struct ifcvf_hw *hw, struct pci_dev *dev);

-int ifcvf_start_hw(struct ifcvf_hw *hw);
  void ifcvf_stop_hw(struct ifcvf_hw *hw);
  void ifcvf_notify_queue(struct ifcvf_hw *hw, u16 qid);
  void ifcvf_read_dev_config(struct ifcvf_hw *hw, u64 offset,
diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c
index 4588484bd53d..968687159e44 100644
--- a/drivers/vdpa/ifcvf/ifcvf_main.c
+++ b/drivers/vdpa/ifcvf/ifcvf_main.c
@@ -346,22 +346,6 @@ static int ifcvf_request_irq(struct ifcvf_hw *vf)
return 0;
  }
  
-static int ifcvf_start_datapath(struct ifcvf_adapter *adapter)

-{
-   struct ifcvf_hw *vf = adapter->vf;
-   u8 status;
-   int ret;
-
-   ret = ifcvf_start_hw(vf);
-   if (ret < 0) {
-   status = ifcvf_get_status(vf);
-   status |= VIRTIO_CONFIG_S_FAILED;
-   ifcvf_set_status(vf, status);
-   }
-
-   return ret;
-}
-
  static int ifcvf_stop_datapath(struct ifcvf_adapter *adapter)
  {
struct ifcvf_hw *vf = adapter->vf;
@@ -452,13 +436,11 @@ static u8 ifcvf_vdpa_get_status(struct vdpa_device 
*vdpa_dev)
  
  static void ifcvf_vdpa_set_status(struct vdpa_device *vdpa_dev, u8 status)

  {
-   struct ifcvf_adapter *adapter;
struct ifcvf_hw *vf;
u8 status_old;
int ret;
  
  	vf  = vdpa_to_vf(vdpa_dev);

-   adapter = vdpa_to_adapter(vdpa_dev);
status_old = ifcvf_get_status(vf);
  
  	if (status_old == status)

@@ -473,11 +455,6 @@ static void ifcvf_vdpa_set_status(struct vdpa_device 
*vdpa_dev, u8 status)
ifcvf_set_status(vf, status);
return;
}
-
-   if (ifcvf_start_datapath(adapter) < 0)
-   IFCVF_ERR(adapter->pdev,
- "Failed to set ifcvf vdpa  status %u\n",
- status);
}
  
  	ifcvf_set_status(vf, status);



Re: [PATCH 2/5] get_driver_features from virito registers

2023-04-25 Thread Jason Wang


在 2023/4/1 04:48, Zhu Lingshan 写道:

This commit implements a new function ifcvf_get_driver_feature()
which reads driver_features from the virtio registers.

To be less ambiguous, ifcvf_set_features() is renamed to
ifcvf_set_driver_features(), and ifcvf_get_features()
is renamed to ifcvf_get_dev_features(), which returns
the provisioned vDPA device features.

Signed-off-by: Zhu Lingshan 
---
  drivers/vdpa/ifcvf/ifcvf_base.c | 38 +
  drivers/vdpa/ifcvf/ifcvf_base.h |  5 +++--
  drivers/vdpa/ifcvf/ifcvf_main.c |  9 +---
  3 files changed, 29 insertions(+), 23 deletions(-)

diff --git a/drivers/vdpa/ifcvf/ifcvf_base.c b/drivers/vdpa/ifcvf/ifcvf_base.c
index 6c5650f73007..546e923bcd16 100644
--- a/drivers/vdpa/ifcvf/ifcvf_base.c
+++ b/drivers/vdpa/ifcvf/ifcvf_base.c
@@ -204,11 +204,29 @@ u64 ifcvf_get_hw_features(struct ifcvf_hw *hw)
return features;
  }
  
-u64 ifcvf_get_features(struct ifcvf_hw *hw)

+/* return provisioned vDPA dev features */
+u64 ifcvf_get_dev_features(struct ifcvf_hw *hw)
  {
return hw->dev_features;
  }
  
+u64 ifcvf_get_driver_features(struct ifcvf_hw *hw)

+{
+   struct virtio_pci_common_cfg __iomem *cfg = hw->common_cfg;
+   u32 features_lo, features_hi;
+   u64 features;
+
+   vp_iowrite32(0, &cfg->device_feature_select);
+   features_lo = vp_ioread32(&cfg->guest_feature);
+
+   vp_iowrite32(1, &cfg->device_feature_select);
+   features_hi = vp_ioread32(&cfg->guest_feature);
+
+   features = ((u64)features_hi << 32) | features_lo;
+
+   return features;
+}



This duplicates the logic of ifcvf_get_hw_features(); it would be
simpler if we just did a rename.


Thanks



+
  int ifcvf_verify_min_features(struct ifcvf_hw *hw, u64 features)
  {
if (!(features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM)) && features) {
@@ -275,7 +293,7 @@ void ifcvf_write_dev_config(struct ifcvf_hw *hw, u64 offset,
vp_iowrite8(*p++, hw->dev_cfg + offset + i);
  }
  
-static void ifcvf_set_features(struct ifcvf_hw *hw, u64 features)

+void ifcvf_set_driver_features(struct ifcvf_hw *hw, u64 features)
  {
struct virtio_pci_common_cfg __iomem *cfg = hw->common_cfg;
  
@@ -286,19 +304,6 @@ static void ifcvf_set_features(struct ifcvf_hw *hw, u64 features)

vp_iowrite32(features >> 32, &cfg->guest_feature);
  }
  
-static int ifcvf_config_features(struct ifcvf_hw *hw)

-{
-   ifcvf_set_features(hw, hw->req_features);
-   ifcvf_add_status(hw, VIRTIO_CONFIG_S_FEATURES_OK);
-
-   if (!(ifcvf_get_status(hw) & VIRTIO_CONFIG_S_FEATURES_OK)) {
-   IFCVF_ERR(hw->pdev, "Failed to set FEATURES_OK status\n");
-   return -EIO;
-   }
-
-   return 0;
-}
-
  u16 ifcvf_get_vq_state(struct ifcvf_hw *hw, u16 qid)
  {
struct ifcvf_lm_cfg __iomem *ifcvf_lm;
@@ -387,9 +392,6 @@ int ifcvf_start_hw(struct ifcvf_hw *hw)
ifcvf_add_status(hw, VIRTIO_CONFIG_S_ACKNOWLEDGE);
ifcvf_add_status(hw, VIRTIO_CONFIG_S_DRIVER);
  
-	if (ifcvf_config_features(hw) < 0)

-   return -EINVAL;
-
ifcvf_add_status(hw, VIRTIO_CONFIG_S_DRIVER_OK);
  
  	return 0;

diff --git a/drivers/vdpa/ifcvf/ifcvf_base.h b/drivers/vdpa/ifcvf/ifcvf_base.h
index d545a9411143..cb19196c3ece 100644
--- a/drivers/vdpa/ifcvf/ifcvf_base.h
+++ b/drivers/vdpa/ifcvf/ifcvf_base.h
@@ -69,7 +69,6 @@ struct ifcvf_hw {
phys_addr_t notify_base_pa;
u32 notify_off_multiplier;
u32 dev_type;
-   u64 req_features;
u64 hw_features;
/* provisioned device features */
u64 dev_features;
@@ -122,7 +121,7 @@ u8 ifcvf_get_status(struct ifcvf_hw *hw);
  void ifcvf_set_status(struct ifcvf_hw *hw, u8 status);
  void io_write64_twopart(u64 val, u32 *lo, u32 *hi);
  void ifcvf_reset(struct ifcvf_hw *hw);
-u64 ifcvf_get_features(struct ifcvf_hw *hw);
+u64 ifcvf_get_dev_features(struct ifcvf_hw *hw);
  u64 ifcvf_get_hw_features(struct ifcvf_hw *hw);
  int ifcvf_verify_min_features(struct ifcvf_hw *hw, u64 features);
  u16 ifcvf_get_vq_state(struct ifcvf_hw *hw, u16 qid);
@@ -137,4 +136,6 @@ int ifcvf_set_vq_address(struct ifcvf_hw *hw, u16 qid, u64 
desc_area,
 u64 driver_area, u64 device_area);
  bool ifcvf_get_vq_ready(struct ifcvf_hw *hw, u16 qid);
  void ifcvf_set_vq_ready(struct ifcvf_hw *hw, u16 qid, bool ready);
+void ifcvf_set_driver_features(struct ifcvf_hw *hw, u64 features);
+u64 ifcvf_get_driver_features(struct ifcvf_hw *hw);
  #endif /* _IFCVF_H_ */
diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c
index 1357c67014ab..4588484bd53d 100644
--- a/drivers/vdpa/ifcvf/ifcvf_main.c
+++ b/drivers/vdpa/ifcvf/ifcvf_main.c
@@ -410,7 +410,7 @@ static u64 ifcvf_vdpa_get_device_features(struct 
vdpa_device *vdpa_dev)
u64 features;
  
  	if (type == VIRTIO_ID_NET || type == VIRTIO_ID_BLOCK)

-   features = ifcvf_get_features(vf);
+   features = ifcvf_get_dev_features(vf);
 

Re: [PATCH 1/5] virt queue ops take immediate actions

2023-04-25 Thread Jason Wang


在 2023/4/1 04:48, Zhu Lingshan 写道:

In this commit, virtqueue operations including
set_vq_num(), set_vq_address(), set_vq_ready()
and get_vq_ready() access the PCI registers directly
to take immediate action.

Signed-off-by: Zhu Lingshan 
---
  drivers/vdpa/ifcvf/ifcvf_base.c | 58 -
  drivers/vdpa/ifcvf/ifcvf_base.h | 10 +++---
  drivers/vdpa/ifcvf/ifcvf_main.c | 16 +++--
  3 files changed, 45 insertions(+), 39 deletions(-)

diff --git a/drivers/vdpa/ifcvf/ifcvf_base.c b/drivers/vdpa/ifcvf/ifcvf_base.c
index 5563b3a773c7..6c5650f73007 100644
--- a/drivers/vdpa/ifcvf/ifcvf_base.c
+++ b/drivers/vdpa/ifcvf/ifcvf_base.c
@@ -329,31 +329,49 @@ int ifcvf_set_vq_state(struct ifcvf_hw *hw, u16 qid, u16 
num)
return 0;
  }
  
-static int ifcvf_hw_enable(struct ifcvf_hw *hw)

+void ifcvf_set_vq_num(struct ifcvf_hw *hw, u16 qid, u32 num)
  {
-   struct virtio_pci_common_cfg __iomem *cfg;
-   u32 i;
+   struct virtio_pci_common_cfg __iomem *cfg = hw->common_cfg;
  
-	cfg = hw->common_cfg;

-   for (i = 0; i < hw->nr_vring; i++) {
-   if (!hw->vring[i].ready)
-   break;
+   vp_iowrite16(qid, &cfg->queue_select);
+   vp_iowrite16(num, &cfg->queue_size);
+}
  
-		vp_iowrite16(i, &cfg->queue_select);

-   vp_iowrite64_twopart(hw->vring[i].desc, &cfg->queue_desc_lo,
-&cfg->queue_desc_hi);
-   vp_iowrite64_twopart(hw->vring[i].avail, &cfg->queue_avail_lo,
- &cfg->queue_avail_hi);
-   vp_iowrite64_twopart(hw->vring[i].used, &cfg->queue_used_lo,
-&cfg->queue_used_hi);
-   vp_iowrite16(hw->vring[i].size, &cfg->queue_size);
-   ifcvf_set_vq_state(hw, i, hw->vring[i].last_avail_idx);
-   vp_iowrite16(1, &cfg->queue_enable);
-   }
+int ifcvf_set_vq_address(struct ifcvf_hw *hw, u16 qid, u64 desc_area,
+u64 driver_area, u64 device_area)
+{
+   struct virtio_pci_common_cfg __iomem *cfg = hw->common_cfg;
+
+   vp_iowrite16(qid, &cfg->queue_select);
+   vp_iowrite64_twopart(desc_area, &cfg->queue_desc_lo,
+&cfg->queue_desc_hi);
+   vp_iowrite64_twopart(driver_area, &cfg->queue_avail_lo,
+&cfg->queue_avail_hi);
+   vp_iowrite64_twopart(device_area, &cfg->queue_used_lo,
+&cfg->queue_used_hi);
  
  	return 0;

  }
  
+bool ifcvf_get_vq_ready(struct ifcvf_hw *hw, u16 qid)

+{
+   struct virtio_pci_common_cfg __iomem *cfg = hw->common_cfg;
+   u16 queue_enable;
+
+   vp_iowrite16(qid, &cfg->queue_select);
+   queue_enable = vp_ioread16(&cfg->queue_enable);
+
+   return (bool)queue_enable;
+}
+
+void ifcvf_set_vq_ready(struct ifcvf_hw *hw, u16 qid, bool ready)
+{
+   struct virtio_pci_common_cfg __iomem *cfg = hw->common_cfg;
+
+   vp_iowrite16(qid, &cfg->queue_select);
+   vp_iowrite16(ready, &cfg->queue_enable);
+}
+
  static void ifcvf_hw_disable(struct ifcvf_hw *hw)
  {
u32 i;
@@ -366,16 +384,12 @@ static void ifcvf_hw_disable(struct ifcvf_hw *hw)
  
  int ifcvf_start_hw(struct ifcvf_hw *hw)

  {
-   ifcvf_reset(hw);



This seems unrelated to the immediate actions?

The rest looks good.

Thanks



ifcvf_add_status(hw, VIRTIO_CONFIG_S_ACKNOWLEDGE);
ifcvf_add_status(hw, VIRTIO_CONFIG_S_DRIVER);
  
  	if (ifcvf_config_features(hw) < 0)

return -EINVAL;
  
-	if (ifcvf_hw_enable(hw) < 0)

-   return -EINVAL;
-
ifcvf_add_status(hw, VIRTIO_CONFIG_S_DRIVER_OK);
  
  	return 0;

diff --git a/drivers/vdpa/ifcvf/ifcvf_base.h b/drivers/vdpa/ifcvf/ifcvf_base.h
index c20d1c40214e..d545a9411143 100644
--- a/drivers/vdpa/ifcvf/ifcvf_base.h
+++ b/drivers/vdpa/ifcvf/ifcvf_base.h
@@ -47,12 +47,7 @@
  #define MSIX_VECTOR_DEV_SHARED3
  
  struct vring_info {

-   u64 desc;
-   u64 avail;
-   u64 used;
-   u16 size;
u16 last_avail_idx;
-   bool ready;
void __iomem *notify_addr;
phys_addr_t notify_pa;
u32 irq;
@@ -137,4 +132,9 @@ int ifcvf_probed_virtio_net(struct ifcvf_hw *hw);
  u32 ifcvf_get_config_size(struct ifcvf_hw *hw);
  u16 ifcvf_set_vq_vector(struct ifcvf_hw *hw, u16 qid, int vector);
  u16 ifcvf_set_config_vector(struct ifcvf_hw *hw, int vector);
+void ifcvf_set_vq_num(struct ifcvf_hw *hw, u16 qid, u32 num);
+int ifcvf_set_vq_address(struct ifcvf_hw *hw, u16 qid, u64 desc_area,
+u64 driver_area, u64 device_area);
+bool ifcvf_get_vq_ready(struct ifcvf_hw *hw, u16 qid);
+void ifcvf_set_vq_ready(struct ifcvf_hw *hw, u16 qid, bool ready);
  #endif /* _IFCVF_H_ */
diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c
index 7f78c47e40d6..1357c67014ab 100644
--- a/drivers/vdpa/ifcvf/ifcvf_main.c
+++ b/drivers/vdpa/ifcvf/ifcvf_main.c

Re: [PATCH net-next v3 10/15] virtio_net: introduce receive_small_xdp()

2023-04-25 Thread Jason Wang
On Tue, Apr 25, 2023 at 4:10 PM Xuan Zhuo  wrote:
>
> On Tue, 25 Apr 2023 16:00:05 +0800, Xuan Zhuo  
> wrote:
> > On Tue, 25 Apr 2023 15:58:03 +0800, Jason Wang  wrote:
> > > On Sun, Apr 23, 2023 at 6:58 PM Xuan Zhuo  
> > > wrote:
> > > >
> > > > The purpose of this patch is to simplify receive_small() by
> > > > separating all of its XDP logic into its own function.
> > > >
> > > > Signed-off-by: Xuan Zhuo 
> > > > ---
> > > >  drivers/net/virtio_net.c | 165 ---
> > > >  1 file changed, 100 insertions(+), 65 deletions(-)
> > > >
> > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > index de5a579e8603..9b5fd2e0d27f 100644
> > > > --- a/drivers/net/virtio_net.c
> > > > +++ b/drivers/net/virtio_net.c
> > > > @@ -931,6 +931,99 @@ static struct page *xdp_linearize_page(struct 
> > > > receive_queue *rq,
> > > > return NULL;
> > > >  }
> > > >
> > > > +static struct sk_buff *receive_small_xdp(struct net_device *dev,
> > > > +struct virtnet_info *vi,
> > > > +struct receive_queue *rq,
> > > > +struct bpf_prog *xdp_prog,
> > > > +void *buf,
> > > > +unsigned int xdp_headroom,
> > > > +unsigned int len,
> > > > +unsigned int *xdp_xmit,
> > > > +struct virtnet_rq_stats *stats)
> > > > +{
> > > > +   unsigned int header_offset = VIRTNET_RX_PAD + xdp_headroom;
> > > > +   unsigned int headroom = vi->hdr_len + header_offset;
> > > > +   struct virtio_net_hdr_mrg_rxbuf *hdr = buf + header_offset;
> > > > +   struct page *page = virt_to_head_page(buf);
> > > > +   struct page *xdp_page;
> > > > +   unsigned int buflen;
> > > > +   struct xdp_buff xdp;
> > > > +   struct sk_buff *skb;
> > > > +   unsigned int delta = 0;
> > > > +   unsigned int metasize = 0;
> > > > +   void *orig_data;
> > > > +   u32 act;
> > > > +
> > > > +   if (unlikely(hdr->hdr.gso_type))
> > > > +   goto err_xdp;
> > > > +
> > > > +   buflen = SKB_DATA_ALIGN(GOOD_PACKET_LEN + headroom) +
> > > > +   SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> > > > +
> > > > +   if (unlikely(xdp_headroom < virtnet_get_headroom(vi))) {
> > > > +   int offset = buf - page_address(page) + header_offset;
> > > > +   unsigned int tlen = len + vi->hdr_len;
> > > > +   int num_buf = 1;
> > > > +
> > > > +   xdp_headroom = virtnet_get_headroom(vi);
> > > > +   header_offset = VIRTNET_RX_PAD + xdp_headroom;
> > > > +   headroom = vi->hdr_len + header_offset;
> > > > +   buflen = SKB_DATA_ALIGN(GOOD_PACKET_LEN + headroom) +
> > > > +   SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> > > > +   xdp_page = xdp_linearize_page(rq, &num_buf, page,
> > > > + offset, header_offset,
> > > > + &tlen);
> > > > +   if (!xdp_page)
> > > > +   goto err_xdp;
> > > > +
> > > > +   buf = page_address(xdp_page);
> > > > +   put_page(page);
> > > > +   page = xdp_page;
> > > > +   }
> > > > +
> > > > +   xdp_init_buff(&xdp, buflen, &rq->xdp_rxq);
> > > > +   xdp_prepare_buff(&xdp, buf + VIRTNET_RX_PAD + vi->hdr_len,
> > > > +xdp_headroom, len, true);
> > > > +   orig_data = xdp.data;
> > > > +
> > > > +   act = virtnet_xdp_handler(xdp_prog, &xdp, dev, xdp_xmit, stats);
> > > > +
> > > > +   switch (act) {
> > > > +   case XDP_PASS:
> > > > +   /* Recalculate length in case bpf program changed it */
> > > > +   delta = orig_data - xdp.data;
> > > > +   len = xdp.data_end - xdp.data;
> > > > +   metasize = xdp.data - xdp.data_meta;
> > > > +   break;
> > > > +
> > > > +   case XDP_TX:
> > > > +   case XDP_REDIRECT:
> > > > +   goto xdp_xmit;
> > > > +
> > > > +   default:
> > > > +   goto err_xdp;
> > > > +   }
> > > > +
> > > > +   skb = build_skb(buf, buflen);
> > > > +   if (!skb)
> > > > +   goto err;
> > > > +
> > > > +   skb_reserve(skb, headroom - delta);
> > > > +   skb_put(skb, len);
> > > > +   if (metasize)
> > > > +   skb_metadata_set(skb, metasize);
> > > > +
> > > > +   return skb;
> > > > +
> > > > +err_xdp:
> > > > +   stats->xdp_drops++;
> > > > +err:
> > > > +   stats->drops++;
> > > > +   put_page(page);
> > > > +xdp_xmit:
> > > > +   return NULL;
> > > > +}
> > >
> > > It looks like some of the comments of the above v

Re: [PATCH net-next v3 15/15] virtio_net: introduce virtnet_build_skb()

2023-04-25 Thread Jason Wang


在 2023/4/23 18:57, Xuan Zhuo 写道:

This logic is used in multiple places, so separate it into a helper.

Signed-off-by: Xuan Zhuo 



Acked-by: Jason Wang 

Thanks



---
  drivers/net/virtio_net.c | 34 +-
  1 file changed, 21 insertions(+), 13 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 811cf1046df2..f768e683dadb 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -443,6 +443,22 @@ static unsigned int mergeable_ctx_to_truesize(void 
*mrg_ctx)
return (unsigned long)mrg_ctx & ((1 << MRG_CTX_HEADER_SHIFT) - 1);
  }
  
+static struct sk_buff *virtnet_build_skb(void *buf, unsigned int buflen,

+unsigned int headroom,
+unsigned int len)
+{
+   struct sk_buff *skb;
+
+   skb = build_skb(buf, buflen);
+   if (unlikely(!skb))
+   return NULL;
+
+   skb_reserve(skb, headroom);
+   skb_put(skb, len);
+
+   return skb;
+}
+
  /* Called from bottom half context */
  static struct sk_buff *page_to_skb(struct virtnet_info *vi,
   struct receive_queue *rq,
@@ -476,13 +492,10 @@ static struct sk_buff *page_to_skb(struct virtnet_info 
*vi,
  
  	/* copy small packet so we can reuse these pages */

if (!NET_IP_ALIGN && len > GOOD_COPY_LEN && tailroom >= shinfo_size) {
-   skb = build_skb(buf, truesize);
+   skb = virtnet_build_skb(buf, truesize, p - buf, len);
if (unlikely(!skb))
return NULL;
  
-		skb_reserve(skb, p - buf);

-   skb_put(skb, len);
-
page = (struct page *)page->private;
if (page)
give_pages(rq, page);
@@ -946,13 +959,10 @@ static struct sk_buff *receive_small_build_skb(struct 
virtnet_info *vi,
buflen = SKB_DATA_ALIGN(GOOD_PACKET_LEN + headroom) +
SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
  
-	skb = build_skb(buf, buflen);

-   if (!skb)
+   skb = virtnet_build_skb(buf, buflen, headroom, len);
+   if (unlikely(!skb))
return NULL;
  
-	skb_reserve(skb, headroom);

-   skb_put(skb, len);
-
buf += header_offset;
memcpy(skb_vnet_hdr(skb), buf, vi->hdr_len);
  
@@ -1028,12 +1038,10 @@ static struct sk_buff *receive_small_xdp(struct net_device *dev,

goto err_xdp;
}
  
-	skb = build_skb(buf, buflen);

-   if (!skb)
+   skb = virtnet_build_skb(buf, buflen, xdp.data - buf, len);
+   if (unlikely(!skb))
goto err;
  
-	skb_reserve(skb, xdp.data - buf);

-   skb_put(skb, len);
if (metasize)
skb_metadata_set(skb, metasize);
  



Re: [PATCH net-next v3 14/15] virtio_net: introduce receive_small_build_xdp

2023-04-25 Thread Jason Wang


On 2023/4/23 18:57, Xuan Zhuo wrote:

Simplify the receive_small() function, bringing the logic relating to
build_skb() together.

Signed-off-by: Xuan Zhuo 



Acked-by: Jason Wang 

Thanks



---
  drivers/net/virtio_net.c | 48 ++--
  1 file changed, 31 insertions(+), 17 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index d2973c8fa48c..811cf1046df2 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -931,6 +931,34 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
return NULL;
  }
  
+static struct sk_buff *receive_small_build_skb(struct virtnet_info *vi,
+  unsigned int xdp_headroom,
+  void *buf,
+  unsigned int len)
+{
+   unsigned int header_offset;
+   unsigned int headroom;
+   unsigned int buflen;
+   struct sk_buff *skb;
+
+   header_offset = VIRTNET_RX_PAD + xdp_headroom;
+   headroom = vi->hdr_len + header_offset;
+   buflen = SKB_DATA_ALIGN(GOOD_PACKET_LEN + headroom) +
+   SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+
+   skb = build_skb(buf, buflen);
+   if (!skb)
+   return NULL;
+
+   skb_reserve(skb, headroom);
+   skb_put(skb, len);
+
+   buf += header_offset;
+   memcpy(skb_vnet_hdr(skb), buf, vi->hdr_len);
+
+   return skb;
+}
+
  static struct sk_buff *receive_small_xdp(struct net_device *dev,
 struct virtnet_info *vi,
 struct receive_queue *rq,
@@ -1030,9 +1058,6 @@ static struct sk_buff *receive_small(struct net_device *dev,
  {
unsigned int xdp_headroom = (unsigned long)ctx;
struct page *page = virt_to_head_page(buf);
-   unsigned int header_offset;
-   unsigned int headroom;
-   unsigned int buflen;
struct sk_buff *skb;
  
  	len -= vi->hdr_len;

@@ -1060,20 +1085,9 @@ static struct sk_buff *receive_small(struct net_device *dev,
rcu_read_unlock();
}
  
-	header_offset = VIRTNET_RX_PAD + xdp_headroom;
-   headroom = vi->hdr_len + header_offset;
-   buflen = SKB_DATA_ALIGN(GOOD_PACKET_LEN + headroom) +
-   SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
-
-   skb = build_skb(buf, buflen);
-   if (!skb)
-   goto err;
-   skb_reserve(skb, headroom);
-   skb_put(skb, len);
-
-   buf += header_offset;
-   memcpy(skb_vnet_hdr(skb), buf, vi->hdr_len);
-   return skb;
+   skb = receive_small_build_skb(vi, xdp_headroom, buf, len);
+   if (likely(skb))
+   return skb;
  
  err:

stats->drops++;



Re: [PATCH net-next v3 13/15] virtio_net: small: remove skip_xdp

2023-04-25 Thread Jason Wang
On Sun, Apr 23, 2023 at 6:58 PM Xuan Zhuo  wrote:
>
> Now the processing of xdp is simple, so we can remove the skip_xdp label.

I would state the reason why xdp handling is simple: I think it is because
the skb build path is not shared between the XDP and non-XDP cases.

Other than this

Acked-by: Jason Wang 

Thanks


>
> Signed-off-by: Xuan Zhuo 
> ---
>  drivers/net/virtio_net.c | 26 --
>  1 file changed, 12 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 601c0e7fc32b..d2973c8fa48c 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1028,13 +1028,12 @@ static struct sk_buff *receive_small(struct net_device *dev,
>  unsigned int *xdp_xmit,
>  struct virtnet_rq_stats *stats)
>  {
> -   struct sk_buff *skb;
> -   struct bpf_prog *xdp_prog;
> unsigned int xdp_headroom = (unsigned long)ctx;
> struct page *page = virt_to_head_page(buf);
> unsigned int header_offset;
> unsigned int headroom;
> unsigned int buflen;
> +   struct sk_buff *skb;
>
> len -= vi->hdr_len;
> stats->bytes += len;
> @@ -1046,22 +1045,21 @@ static struct sk_buff *receive_small(struct net_device *dev,
> goto err;
> }
>
> -   if (likely(!vi->xdp_enabled)) {
> -   xdp_prog = NULL;
> -   goto skip_xdp;
> -   }
> +   if (unlikely(vi->xdp_enabled)) {
> +   struct bpf_prog *xdp_prog;
>
> -   rcu_read_lock();
> -   xdp_prog = rcu_dereference(rq->xdp_prog);
> -   if (xdp_prog) {
> -   skb = receive_small_xdp(dev, vi, rq, xdp_prog, buf, xdp_headroom,
> -   len, xdp_xmit, stats);
> +   rcu_read_lock();
> +   xdp_prog = rcu_dereference(rq->xdp_prog);
> +   if (xdp_prog) {
> +   skb = receive_small_xdp(dev, vi, rq, xdp_prog, buf,
> +   xdp_headroom, len, xdp_xmit,
> +   stats);
> +   rcu_read_unlock();
> +   return skb;
> +   }
> rcu_read_unlock();
> -   return skb;
> }
> -   rcu_read_unlock();
>
> -skip_xdp:
> header_offset = VIRTNET_RX_PAD + xdp_headroom;
> headroom = vi->hdr_len + header_offset;
> buflen = SKB_DATA_ALIGN(GOOD_PACKET_LEN + headroom) +
> --
> 2.32.0.3.g01195cf9f
>


Re: [PATCH net-next v3 12/15] virtio_net: small: optimize code

2023-04-25 Thread Jason Wang
On Sun, Apr 23, 2023 at 6:58 PM Xuan Zhuo  wrote:
>
> Avoid recalculating some variables (headroom and so on) when
> processing xdp.
>
> Signed-off-by: Xuan Zhuo 

Nit: I think we need to tweak the title, it's better to say what is
optimized. (And it would be better to tweak the title of patch 11 as
well)

Acked-by: Jason Wang 

> ---
>  drivers/net/virtio_net.c | 12 
>  1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 5bc3dca0f60c..601c0e7fc32b 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1031,11 +1031,10 @@ static struct sk_buff *receive_small(struct net_device *dev,
> struct sk_buff *skb;
> struct bpf_prog *xdp_prog;
> unsigned int xdp_headroom = (unsigned long)ctx;
> -   unsigned int header_offset = VIRTNET_RX_PAD + xdp_headroom;
> -   unsigned int headroom = vi->hdr_len + header_offset;
> -   unsigned int buflen = SKB_DATA_ALIGN(GOOD_PACKET_LEN + headroom) +
> - SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> struct page *page = virt_to_head_page(buf);
> +   unsigned int header_offset;
> +   unsigned int headroom;
> +   unsigned int buflen;
>
> len -= vi->hdr_len;
> stats->bytes += len;
> @@ -1063,6 +1062,11 @@ static struct sk_buff *receive_small(struct net_device *dev,
> rcu_read_unlock();
>
>  skip_xdp:
> +   header_offset = VIRTNET_RX_PAD + xdp_headroom;
> +   headroom = vi->hdr_len + header_offset;
> +   buflen = SKB_DATA_ALIGN(GOOD_PACKET_LEN + headroom) +
> +   SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> +
> skb = build_skb(buf, buflen);
> if (!skb)
> goto err;
> --
> 2.32.0.3.g01195cf9f
>


Re: [PATCH v4 virtio 01/10] virtio: allow caller to override device id and DMA mask

2023-04-25 Thread Shannon Nelson via Virtualization

On 4/25/23 7:09 PM, Xuan Zhuo wrote:


On Tue, 25 Apr 2023 14:25:53 -0700, Shannon Nelson  wrote:

To add a bit of flexibility with various virtio based devices, allow
the caller to specify a different device id and DMA mask.  This adds
fields to struct virtio_pci_modern_device to specify an override device
id check and a DMA mask.

int (*device_id_check)(struct pci_dev *pdev);
   If defined by the driver, this function will be called to check
   that the PCI device is the vendor's expected device, and will
   return the found device id to be stored in mdev->id.device.
   This allows vendors with alternative vendor device ids to use
   this library on their own device BAR.

u64 dma_mask;
   If defined by the driver, this mask will be used in a call to
   dma_set_mask_and_coherent() instead of the traditional
   DMA_BIT_MASK(64).  This allows limiting the DMA space on
   vendor devices with address limitations.

Signed-off-by: Shannon Nelson 
---
  drivers/virtio/virtio_pci_modern_dev.c | 37 +-
  include/linux/virtio_pci_modern.h  |  6 +
  2 files changed, 31 insertions(+), 12 deletions(-)

diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
index 869cb46bef96..1f2db76e8f91 100644
--- a/drivers/virtio/virtio_pci_modern_dev.c
+++ b/drivers/virtio/virtio_pci_modern_dev.c
@@ -218,21 +218,29 @@ int vp_modern_probe(struct virtio_pci_modern_device *mdev)
   int err, common, isr, notify, device;
   u32 notify_length;
   u32 notify_offset;
+ int devid;

   check_offsets();

- /* We only own devices >= 0x1000 and <= 0x107f: leave the rest. */
- if (pci_dev->device < 0x1000 || pci_dev->device > 0x107f)
- return -ENODEV;
-
- if (pci_dev->device < 0x1040) {
- /* Transitional devices: use the PCI subsystem device id as
-  * virtio device id, same as legacy driver always did.
-  */
- mdev->id.device = pci_dev->subsystem_device;
+ if (mdev->device_id_check) {
+ devid = mdev->device_id_check(pci_dev);
+ if (devid < 0)
+ return devid;


I would want to know: is there any other reason to return the errno?
How about returning -ENODEV directly?


Because if device_id_check() is returning an errno, it is trying to 
communicate some information about what went wrong, and I really get 
annoyed when an intermediate layer stomps on the value and makes that 
potentially useful information disappear.


sln



Thanks.



+ mdev->id.device = devid;
   } else {
- /* Modern devices: simply use PCI device id, but start from 0x1040. */
- mdev->id.device = pci_dev->device - 0x1040;
+ /* We only own devices >= 0x1000 and <= 0x107f: leave the rest. */
+ if (pci_dev->device < 0x1000 || pci_dev->device > 0x107f)
+ return -ENODEV;
+
+ if (pci_dev->device < 0x1040) {
+ /* Transitional devices: use the PCI subsystem device id as
+  * virtio device id, same as legacy driver always did.
+  */
+ mdev->id.device = pci_dev->subsystem_device;
+ } else {
+ /* Modern devices: simply use PCI device id, but start from 0x1040. */
+ mdev->id.device = pci_dev->device - 0x1040;
+ }
   }
   mdev->id.vendor = pci_dev->subsystem_vendor;

@@ -260,7 +268,12 @@ int vp_modern_probe(struct virtio_pci_modern_device *mdev)
   return -EINVAL;
   }

- err = dma_set_mask_and_coherent(&pci_dev->dev, DMA_BIT_MASK(64));
+ if (mdev->dma_mask)
+ err = dma_set_mask_and_coherent(&pci_dev->dev,
+ mdev->dma_mask);
+ else
+ err = dma_set_mask_and_coherent(&pci_dev->dev,
+ DMA_BIT_MASK(64));
   if (err)
   err = dma_set_mask_and_coherent(&pci_dev->dev,
   DMA_BIT_MASK(32));
diff --git a/include/linux/virtio_pci_modern.h b/include/linux/virtio_pci_modern.h
index c4eeb79b0139..067ac1d789bc 100644
--- a/include/linux/virtio_pci_modern.h
+++ b/include/linux/virtio_pci_modern.h
@@ -38,6 +38,12 @@ struct virtio_pci_modern_device {
   int modern_bars;

   struct virtio_device_id id;
+
+ /* optional check for vendor virtio device, returns dev_id or -ERRNO */
+ int (*device_id_check)(struct pci_dev *pdev);
+
+ /* optional mask for devices with limited DMA space */
+ u64 dma_mask;
  };

  /*
--
2.17.1




Re: [PATCH v4 virtio 01/10] virtio: allow caller to override device id and DMA mask

2023-04-25 Thread Xuan Zhuo
On Tue, 25 Apr 2023 14:25:53 -0700, Shannon Nelson  wrote:
> To add a bit of flexibility with various virtio based devices, allow
> the caller to specify a different device id and DMA mask.  This adds
> fields to struct virtio_pci_modern_device to specify an override device
> id check and a DMA mask.
>
> int (*device_id_check)(struct pci_dev *pdev);
>   If defined by the driver, this function will be called to check
>   that the PCI device is the vendor's expected device, and will
>   return the found device id to be stored in mdev->id.device.
>   This allows vendors with alternative vendor device ids to use
>   this library on their own device BAR.
>
> u64 dma_mask;
>   If defined by the driver, this mask will be used in a call to
>   dma_set_mask_and_coherent() instead of the traditional
>   DMA_BIT_MASK(64).  This allows limiting the DMA space on
>   vendor devices with address limitations.
>
> Signed-off-by: Shannon Nelson 
> ---
>  drivers/virtio/virtio_pci_modern_dev.c | 37 +-
>  include/linux/virtio_pci_modern.h  |  6 +
>  2 files changed, 31 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
> index 869cb46bef96..1f2db76e8f91 100644
> --- a/drivers/virtio/virtio_pci_modern_dev.c
> +++ b/drivers/virtio/virtio_pci_modern_dev.c
> @@ -218,21 +218,29 @@ int vp_modern_probe(struct virtio_pci_modern_device *mdev)
>   int err, common, isr, notify, device;
>   u32 notify_length;
>   u32 notify_offset;
> + int devid;
>
>   check_offsets();
>
> - /* We only own devices >= 0x1000 and <= 0x107f: leave the rest. */
> - if (pci_dev->device < 0x1000 || pci_dev->device > 0x107f)
> - return -ENODEV;
> -
> - if (pci_dev->device < 0x1040) {
> - /* Transitional devices: use the PCI subsystem device id as
> -  * virtio device id, same as legacy driver always did.
> -  */
> - mdev->id.device = pci_dev->subsystem_device;
> + if (mdev->device_id_check) {
> + devid = mdev->device_id_check(pci_dev);
> + if (devid < 0)
> + return devid;

I would want to know is there any other reason to return the errno?
How about return -ENODEV directly?

Thanks.


> + mdev->id.device = devid;
>   } else {
> - /* Modern devices: simply use PCI device id, but start from 0x1040. */
> - mdev->id.device = pci_dev->device - 0x1040;
> + /* We only own devices >= 0x1000 and <= 0x107f: leave the rest. */
> + if (pci_dev->device < 0x1000 || pci_dev->device > 0x107f)
> + return -ENODEV;
> +
> + if (pci_dev->device < 0x1040) {
> + /* Transitional devices: use the PCI subsystem device id as
> +  * virtio device id, same as legacy driver always did.
> +  */
> + mdev->id.device = pci_dev->subsystem_device;
> + } else {
> + /* Modern devices: simply use PCI device id, but start from 0x1040. */
> + mdev->id.device = pci_dev->device - 0x1040;
> + }
>   }
>   mdev->id.vendor = pci_dev->subsystem_vendor;
>
> @@ -260,7 +268,12 @@ int vp_modern_probe(struct virtio_pci_modern_device *mdev)
>   return -EINVAL;
>   }
>
> - err = dma_set_mask_and_coherent(&pci_dev->dev, DMA_BIT_MASK(64));
> + if (mdev->dma_mask)
> + err = dma_set_mask_and_coherent(&pci_dev->dev,
> + mdev->dma_mask);
> + else
> + err = dma_set_mask_and_coherent(&pci_dev->dev,
> + DMA_BIT_MASK(64));
>   if (err)
>   err = dma_set_mask_and_coherent(&pci_dev->dev,
>   DMA_BIT_MASK(32));
> diff --git a/include/linux/virtio_pci_modern.h b/include/linux/virtio_pci_modern.h
> index c4eeb79b0139..067ac1d789bc 100644
> --- a/include/linux/virtio_pci_modern.h
> +++ b/include/linux/virtio_pci_modern.h
> @@ -38,6 +38,12 @@ struct virtio_pci_modern_device {
>   int modern_bars;
>
>   struct virtio_device_id id;
> +
> + /* optional check for vendor virtio device, returns dev_id or -ERRNO */
> + int (*device_id_check)(struct pci_dev *pdev);
> +
> + /* optional mask for devices with limited DMA space */
> + u64 dma_mask;
>  };
>
>  /*
> --
> 2.17.1
>


[PATCH v4 virtio 08/10] pds_vdpa: add support for vdpa and vdpamgmt interfaces

2023-04-25 Thread Shannon Nelson via Virtualization
This is the vDPA device support, where we advertise that we can
support the virtio queues and deal with the configuration work
through the pds_core's adminq.

Signed-off-by: Shannon Nelson 
---
 drivers/vdpa/pds/aux_drv.c  |  15 +
 drivers/vdpa/pds/aux_drv.h  |   1 +
 drivers/vdpa/pds/debugfs.c  | 261 ++
 drivers/vdpa/pds/debugfs.h  |   5 +
 drivers/vdpa/pds/vdpa_dev.c | 532 +++-
 5 files changed, 813 insertions(+), 1 deletion(-)

diff --git a/drivers/vdpa/pds/aux_drv.c b/drivers/vdpa/pds/aux_drv.c
index 0c4a135b1484..186e9ee22eb1 100644
--- a/drivers/vdpa/pds/aux_drv.c
+++ b/drivers/vdpa/pds/aux_drv.c
@@ -63,8 +63,21 @@ static int pds_vdpa_probe(struct auxiliary_device *aux_dev,
goto err_free_mgmt_info;
}
 
+   /* Let vdpa know that we can provide devices */
+   err = vdpa_mgmtdev_register(&vdpa_aux->vdpa_mdev);
+   if (err) {
+   dev_err(dev, "%s: Failed to initialize vdpa_mgmt interface: %pe\n",
+   __func__, ERR_PTR(err));
+   goto err_free_virtio;
+   }
+
+   pds_vdpa_debugfs_add_pcidev(vdpa_aux);
+   pds_vdpa_debugfs_add_ident(vdpa_aux);
+
return 0;
 
+err_free_virtio:
+   vp_modern_remove(&vdpa_aux->vd_mdev);
 err_free_mgmt_info:
pci_free_irq_vectors(padev->vf_pdev);
 err_free_mem:
@@ -79,9 +92,11 @@ static void pds_vdpa_remove(struct auxiliary_device *aux_dev)
struct pds_vdpa_aux *vdpa_aux = auxiliary_get_drvdata(aux_dev);
struct device *dev = &aux_dev->dev;
 
+   vdpa_mgmtdev_unregister(&vdpa_aux->vdpa_mdev);
vp_modern_remove(&vdpa_aux->vd_mdev);
pci_free_irq_vectors(vdpa_aux->padev->vf_pdev);
 
+   pds_vdpa_debugfs_del_vdpadev(vdpa_aux);
kfree(vdpa_aux);
auxiliary_set_drvdata(aux_dev, NULL);
 
diff --git a/drivers/vdpa/pds/aux_drv.h b/drivers/vdpa/pds/aux_drv.h
index 99e0ff340bfa..26b75344156e 100644
--- a/drivers/vdpa/pds/aux_drv.h
+++ b/drivers/vdpa/pds/aux_drv.h
@@ -13,6 +13,7 @@ struct pds_vdpa_aux {
struct pds_auxiliary_dev *padev;
 
struct vdpa_mgmt_dev vdpa_mdev;
+   struct pds_vdpa_device *pdsv;
 
struct pds_vdpa_ident ident;
 
diff --git a/drivers/vdpa/pds/debugfs.c b/drivers/vdpa/pds/debugfs.c
index d91dceb07380..0ecd0e2ec6b9 100644
--- a/drivers/vdpa/pds/debugfs.c
+++ b/drivers/vdpa/pds/debugfs.c
@@ -10,6 +10,7 @@
 #include 
 
 #include "aux_drv.h"
+#include "vdpa_dev.h"
 #include "debugfs.h"
 
 static struct dentry *dbfs_dir;
@@ -24,3 +25,263 @@ void pds_vdpa_debugfs_destroy(void)
debugfs_remove_recursive(dbfs_dir);
dbfs_dir = NULL;
 }
+
+#define PRINT_SBIT_NAME(__seq, __f, __name) \
+   do {\
+   if ((__f) & (__name))   \
+   seq_printf(__seq, " %s", &#__name[16]); \
+   } while (0)
+
+static void print_status_bits(struct seq_file *seq, u8 status)
+{
+   seq_puts(seq, "status:");
+   PRINT_SBIT_NAME(seq, status, VIRTIO_CONFIG_S_ACKNOWLEDGE);
+   PRINT_SBIT_NAME(seq, status, VIRTIO_CONFIG_S_DRIVER);
+   PRINT_SBIT_NAME(seq, status, VIRTIO_CONFIG_S_DRIVER_OK);
+   PRINT_SBIT_NAME(seq, status, VIRTIO_CONFIG_S_FEATURES_OK);
+   PRINT_SBIT_NAME(seq, status, VIRTIO_CONFIG_S_NEEDS_RESET);
+   PRINT_SBIT_NAME(seq, status, VIRTIO_CONFIG_S_FAILED);
+   seq_puts(seq, "\n");
+}
+
+static void print_feature_bits_all(struct seq_file *seq, u64 features)
+{
+   int i;
+
+   seq_puts(seq, "features:");
+
+   for (i = 0; i < (sizeof(u64) * 8); i++) {
+   u64 mask = BIT_ULL(i);
+
+   switch (features & mask) {
+   case BIT_ULL(VIRTIO_NET_F_CSUM):
+   seq_puts(seq, " VIRTIO_NET_F_CSUM");
+   break;
+   case BIT_ULL(VIRTIO_NET_F_GUEST_CSUM):
+   seq_puts(seq, " VIRTIO_NET_F_GUEST_CSUM");
+   break;
+   case BIT_ULL(VIRTIO_NET_F_CTRL_GUEST_OFFLOADS):
+   seq_puts(seq, " VIRTIO_NET_F_CTRL_GUEST_OFFLOADS");
+   break;
+   case BIT_ULL(VIRTIO_NET_F_MTU):
+   seq_puts(seq, " VIRTIO_NET_F_MTU");
+   break;
+   case BIT_ULL(VIRTIO_NET_F_MAC):
+   seq_puts(seq, " VIRTIO_NET_F_MAC");
+   break;
+   case BIT_ULL(VIRTIO_NET_F_GUEST_TSO4):
+   seq_puts(seq, " VIRTIO_NET_F_GUEST_TSO4");
+   break;
+   case BIT_ULL(VIRTIO_NET_F_GUEST_TSO6):
+   seq_puts(seq, " VIRTIO_NET_F_GUEST_TSO6");
+   break;
+   case BIT_ULL(VIRTIO_NET_F_GUEST_ECN):
+   seq_puts(seq, " VIRTIO_NET_F_GUEST_ECN");
+   break;
+   case BIT_ULL(VIRTIO_NET_F_GUEST_UFO):
+ 

[PATCH v4 virtio 09/10] pds_vdpa: subscribe to the pds_core events

2023-04-25 Thread Shannon Nelson via Virtualization
Register for the pds_core's notification events, primarily to
find out when the FW has been reset so we can pass this on
back up the chain.

Signed-off-by: Shannon Nelson 
---
 drivers/vdpa/pds/vdpa_dev.c | 68 -
 drivers/vdpa/pds/vdpa_dev.h |  1 +
 2 files changed, 68 insertions(+), 1 deletion(-)

diff --git a/drivers/vdpa/pds/vdpa_dev.c b/drivers/vdpa/pds/vdpa_dev.c
index c3316f0faa0c..93b12f73423f 100644
--- a/drivers/vdpa/pds/vdpa_dev.c
+++ b/drivers/vdpa/pds/vdpa_dev.c
@@ -21,6 +21,61 @@ static struct pds_vdpa_device *vdpa_to_pdsv(struct vdpa_device *vdpa_dev)
return container_of(vdpa_dev, struct pds_vdpa_device, vdpa_dev);
 }
 
+static int pds_vdpa_notify_handler(struct notifier_block *nb,
+  unsigned long ecode,
+  void *data)
+{
+   struct pds_vdpa_device *pdsv = container_of(nb, struct pds_vdpa_device, nb);
+   struct device *dev = &pdsv->vdpa_aux->padev->aux_dev.dev;
+
+   dev_dbg(dev, "%s: event code %lu\n", __func__, ecode);
+
+   /* Give the upper layers a hint that something interesting
+* may have happened.  It seems that the only thing this
+* triggers in the virtio-net drivers above us is a check
+* of link status.
+*
+* We don't set the NEEDS_RESET flag for EVENT_RESET
+* because we're likely going through a recovery or
+* fw_update and will be back up and running soon.
+*/
+   if (ecode == PDS_EVENT_RESET || ecode == PDS_EVENT_LINK_CHANGE) {
+   if (pdsv->config_cb.callback)
+   pdsv->config_cb.callback(pdsv->config_cb.private);
+   }
+
+   return 0;
+}
+
+static int pds_vdpa_register_event_handler(struct pds_vdpa_device *pdsv)
+{
+   struct device *dev = &pdsv->vdpa_aux->padev->aux_dev.dev;
+   struct notifier_block *nb = &pdsv->nb;
+   int err;
+
+   if (!nb->notifier_call) {
+   nb->notifier_call = pds_vdpa_notify_handler;
+   err = pdsc_register_notify(nb);
+   if (err) {
+   nb->notifier_call = NULL;
+   dev_err(dev, "failed to register pds event handler: %ps\n",
+   ERR_PTR(err));
+   return -EINVAL;
+   }
+   dev_dbg(dev, "pds event handler registered\n");
+   }
+
+   return 0;
+}
+
+static void pds_vdpa_unregister_event_handler(struct pds_vdpa_device *pdsv)
+{
+   if (pdsv->nb.notifier_call) {
+   pdsc_unregister_notify(&pdsv->nb);
+   pdsv->nb.notifier_call = NULL;
+   }
+}
+
 static int pds_vdpa_set_vq_address(struct vdpa_device *vdpa_dev, u16 qid,
   u64 desc_addr, u64 driver_addr, u64 device_addr)
 {
@@ -522,6 +577,12 @@ static int pds_vdpa_dev_add(struct vdpa_mgmt_dev *mdev, const char *name,
 
pdsv->vdpa_dev.mdev = &vdpa_aux->vdpa_mdev;
 
+   err = pds_vdpa_register_event_handler(pdsv);
+   if (err) {
+   dev_err(dev, "Failed to register for PDS events: %pe\n", ERR_PTR(err));
+   goto err_unmap;
+   }
+
/* We use the _vdpa_register_device() call rather than the
 * vdpa_register_device() to avoid a deadlock because our
 * dev_add() is called with the vdpa_dev_lock already set
@@ -530,13 +591,15 @@ static int pds_vdpa_dev_add(struct vdpa_mgmt_dev *mdev, const char *name,
err = _vdpa_register_device(&pdsv->vdpa_dev, pdsv->num_vqs);
if (err) {
dev_err(dev, "Failed to register to vDPA bus: %pe\n", ERR_PTR(err));
-   goto err_unmap;
+   goto err_unevent;
}
 
pds_vdpa_debugfs_add_vdpadev(vdpa_aux);
 
return 0;
 
+err_unevent:
+   pds_vdpa_unregister_event_handler(pdsv);
 err_unmap:
put_device(&pdsv->vdpa_dev.dev);
vdpa_aux->pdsv = NULL;
@@ -546,8 +609,11 @@ static int pds_vdpa_dev_add(struct vdpa_mgmt_dev *mdev, const char *name,
 static void pds_vdpa_dev_del(struct vdpa_mgmt_dev *mdev,
 struct vdpa_device *vdpa_dev)
 {
+   struct pds_vdpa_device *pdsv = vdpa_to_pdsv(vdpa_dev);
struct pds_vdpa_aux *vdpa_aux;
 
+   pds_vdpa_unregister_event_handler(pdsv);
+
vdpa_aux = container_of(mdev, struct pds_vdpa_aux, vdpa_mdev);
_vdpa_unregister_device(vdpa_dev);
 
diff --git a/drivers/vdpa/pds/vdpa_dev.h b/drivers/vdpa/pds/vdpa_dev.h
index a21596f438c1..1650a2b08845 100644
--- a/drivers/vdpa/pds/vdpa_dev.h
+++ b/drivers/vdpa/pds/vdpa_dev.h
@@ -40,6 +40,7 @@ struct pds_vdpa_device {
u8 vdpa_index;  /* rsvd for future subdevice use */
u8 num_vqs; /* num vqs in use */
struct vdpa_callback config_cb;
+   struct notifier_block nb;
 };
 
 int pds_vdpa_get_mgmt_info(struct pds_vdpa_aux *vdpa_aux);
-- 
2.17.1


[PATCH v4 virtio 07/10] pds_vdpa: add vdpa config client commands

2023-04-25 Thread Shannon Nelson via Virtualization
These are the adminq commands that will be needed for
setting up and using the vDPA device.  There are a number
of commands defined in the FW's API, but by making use of
the FW's virtio BAR we only need a few of these commands
for vDPA support.

Signed-off-by: Shannon Nelson 
---

Note: the previous version was Acked-by Jason Wang, but this has gone
  through some rework of the adminq command API usage and added
  two new commands.  It is still essentially the same code, but
  I've dropped the Acked-by for now.

 drivers/vdpa/pds/Makefile   |   1 +
 drivers/vdpa/pds/cmds.c | 207 
 drivers/vdpa/pds/cmds.h |  20 
 drivers/vdpa/pds/vdpa_dev.h |  33 +-
 4 files changed, 260 insertions(+), 1 deletion(-)
 create mode 100644 drivers/vdpa/pds/cmds.c
 create mode 100644 drivers/vdpa/pds/cmds.h

diff --git a/drivers/vdpa/pds/Makefile b/drivers/vdpa/pds/Makefile
index 13b50394ec64..2e22418e3ab3 100644
--- a/drivers/vdpa/pds/Makefile
+++ b/drivers/vdpa/pds/Makefile
@@ -4,6 +4,7 @@
 obj-$(CONFIG_PDS_VDPA) := pds_vdpa.o
 
 pds_vdpa-y := aux_drv.o \
+ cmds.o \
  vdpa_dev.o
 
 pds_vdpa-$(CONFIG_DEBUG_FS) += debugfs.o
diff --git a/drivers/vdpa/pds/cmds.c b/drivers/vdpa/pds/cmds.c
new file mode 100644
index ..405711a0a0f8
--- /dev/null
+++ b/drivers/vdpa/pds/cmds.c
@@ -0,0 +1,207 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright(c) 2023 Advanced Micro Devices, Inc */
+
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+
+#include "vdpa_dev.h"
+#include "aux_drv.h"
+#include "cmds.h"
+
+int pds_vdpa_init_hw(struct pds_vdpa_device *pdsv)
+{
+   struct pds_auxiliary_dev *padev = pdsv->vdpa_aux->padev;
+   struct device *dev = &padev->aux_dev.dev;
+   union pds_core_adminq_cmd cmd = {
+   .vdpa_init.opcode = PDS_VDPA_CMD_INIT,
+   .vdpa_init.vdpa_index = pdsv->vdpa_index,
+   .vdpa_init.vf_id = cpu_to_le16(pdsv->vdpa_aux->vf_id),
+   };
+   union pds_core_adminq_comp comp = {};
+   int err;
+
+   /* Initialize the vdpa/virtio device */
+   err = pds_client_adminq_cmd(padev, &cmd, sizeof(cmd.vdpa_init),
+   &comp, 0);
+   if (err)
+   dev_dbg(dev, "Failed to init hw, status %d: %pe\n",
+   comp.status, ERR_PTR(err));
+
+   return err;
+}
+
+int pds_vdpa_cmd_reset(struct pds_vdpa_device *pdsv)
+{
+   struct pds_auxiliary_dev *padev = pdsv->vdpa_aux->padev;
+   struct device *dev = &padev->aux_dev.dev;
+   union pds_core_adminq_cmd cmd = {
+   .vdpa.opcode = PDS_VDPA_CMD_RESET,
+   .vdpa.vdpa_index = pdsv->vdpa_index,
+   .vdpa.vf_id = cpu_to_le16(pdsv->vdpa_aux->vf_id),
+   };
+   union pds_core_adminq_comp comp = {};
+   int err;
+
+   err = pds_client_adminq_cmd(padev, &cmd, sizeof(cmd.vdpa), &comp, 0);
+   if (err)
+   dev_dbg(dev, "Failed to reset hw, status %d: %pe\n",
+   comp.status, ERR_PTR(err));
+
+   return err;
+}
+
+int pds_vdpa_cmd_set_mac(struct pds_vdpa_device *pdsv, u8 *mac)
+{
+   struct pds_auxiliary_dev *padev = pdsv->vdpa_aux->padev;
+   struct device *dev = &padev->aux_dev.dev;
+   union pds_core_adminq_cmd cmd = {
+   .vdpa_setattr.opcode = PDS_VDPA_CMD_SET_ATTR,
+   .vdpa_setattr.vdpa_index = pdsv->vdpa_index,
+   .vdpa_setattr.vf_id = cpu_to_le16(pdsv->vdpa_aux->vf_id),
+   .vdpa_setattr.attr = PDS_VDPA_ATTR_MAC,
+   };
+   union pds_core_adminq_comp comp = {};
+   int err;
+
+   ether_addr_copy(cmd.vdpa_setattr.mac, mac);
+   err = pds_client_adminq_cmd(padev, &cmd, sizeof(cmd.vdpa_setattr),
+   &comp, 0);
+   if (err)
+   dev_dbg(dev, "Failed to set mac address %pM, status %d: %pe\n",
+   mac, comp.status, ERR_PTR(err));
+
+   return err;
+}
+
+int pds_vdpa_cmd_set_max_vq_pairs(struct pds_vdpa_device *pdsv, u16 max_vqp)
+{
+   struct pds_auxiliary_dev *padev = pdsv->vdpa_aux->padev;
+   struct device *dev = &padev->aux_dev.dev;
+   union pds_core_adminq_cmd cmd = {
+   .vdpa_setattr.opcode = PDS_VDPA_CMD_SET_ATTR,
+   .vdpa_setattr.vdpa_index = pdsv->vdpa_index,
+   .vdpa_setattr.vf_id = cpu_to_le16(pdsv->vdpa_aux->vf_id),
+   .vdpa_setattr.attr = PDS_VDPA_ATTR_MAX_VQ_PAIRS,
+   .vdpa_setattr.max_vq_pairs = cpu_to_le16(max_vqp),
+   };
+   union pds_core_adminq_comp comp = {};
+   int err;
+
+   err = pds_client_adminq_cmd(padev, &cmd, sizeof(cmd.vdpa_setattr),
+   &comp, 0);
+   if (err)
+   dev_dbg(dev, "Failed to set max vq pairs %u, status %d: %pe\n",
+   max_vqp, comp.status, ERR_PTR(err));
+
+   return err;
+}
+
+int pds_vdpa_c

[PATCH v4 virtio 10/10] pds_vdpa: pds_vdps.rst and Kconfig

2023-04-25 Thread Shannon Nelson via Virtualization
Add the documentation and Kconfig entry for pds_vdpa driver.

Signed-off-by: Shannon Nelson 
---
 .../device_drivers/ethernet/amd/pds_vdpa.rst  | 85 +++
 .../device_drivers/ethernet/index.rst |  1 +
 MAINTAINERS   |  4 +
 drivers/vdpa/Kconfig  |  8 ++
 4 files changed, 98 insertions(+)
 create mode 100644 Documentation/networking/device_drivers/ethernet/amd/pds_vdpa.rst

diff --git a/Documentation/networking/device_drivers/ethernet/amd/pds_vdpa.rst b/Documentation/networking/device_drivers/ethernet/amd/pds_vdpa.rst
new file mode 100644
index ..587927d3de92
--- /dev/null
+++ b/Documentation/networking/device_drivers/ethernet/amd/pds_vdpa.rst
@@ -0,0 +1,85 @@
+.. SPDX-License-Identifier: GPL-2.0+
+.. note: can be edited and viewed with /usr/bin/formiko-vim
+
+==========================================================
+PCI vDPA driver for the AMD/Pensando(R) DSC adapter family
+==========================================================
+
+AMD/Pensando vDPA VF Device Driver
+
+Copyright(c) 2023 Advanced Micro Devices, Inc
+
+Overview
+========
+
+The ``pds_vdpa`` driver is an auxiliary bus driver that supplies
+a vDPA device for use by the virtio network stack.  It is used with
+the Pensando Virtual Function devices that offer vDPA and virtio queue
+services.  It depends on the ``pds_core`` driver and hardware for the PF
+and VF PCI handling as well as for device configuration services.
+
+Using the device
+================
+
+The ``pds_vdpa`` device is enabled via multiple configuration steps and
+depends on the ``pds_core`` driver to create and enable SR-IOV Virtual
+Function devices.  After the VFs are enabled, we enable the vDPA service
+in the ``pds_core`` device to create the auxiliary devices used by pds_vdpa.
+
+Example steps:
+
+.. code-block:: bash
+
+  #!/bin/bash
+
+  modprobe pds_core
+  modprobe vdpa
+  modprobe pds_vdpa
+
+  PF_BDF=`ls /sys/module/pds_core/drivers/pci\:pds_core/*/sriov_numvfs | awk -F / '{print $7}'`
+
+  # Enable vDPA VF auxiliary device(s) in the PF
+  devlink dev param set pci/$PF_BDF name enable_vnet cmode runtime value true
+
+  # Create a VF for vDPA use
+  echo 1 > /sys/bus/pci/drivers/pds_core/$PF_BDF/sriov_numvfs
+
+  # Find the vDPA services/devices available
+  PDS_VDPA_MGMT=`vdpa mgmtdev show | grep vDPA | head -1 | cut -d: -f1`
+
+  # Create a vDPA device for use in virtio network configurations
+  vdpa dev add name vdpa1 mgmtdev $PDS_VDPA_MGMT mac 00:11:22:33:44:55
+
+  # Set up an ethernet interface on the vdpa device
+  modprobe virtio_vdpa
+
+
+
+Enabling the driver
+===================
+
+The driver is enabled via the standard kernel configuration system,
+using the make command::
+
+  make oldconfig/menuconfig/etc.
+
+The driver is located in the menu structure at:
+
+  -> Device Drivers
+-> Network device support (NETDEVICES [=y])
+  -> Ethernet driver support
+-> Pensando devices
+  -> Pensando Ethernet PDS_VDPA Support
+
+Support
+=======
+
+For general Linux networking support, please use the netdev mailing
+list, which is monitored by Pensando personnel::
+
+  net...@vger.kernel.org
+
+For more specific support needs, please use the Pensando driver support
+email::
+
+  driv...@pensando.io
diff --git a/Documentation/networking/device_drivers/ethernet/index.rst 
b/Documentation/networking/device_drivers/ethernet/index.rst
index 417ca514a4d0..94ecb67c0885 100644
--- a/Documentation/networking/device_drivers/ethernet/index.rst
+++ b/Documentation/networking/device_drivers/ethernet/index.rst
@@ -15,6 +15,7 @@ Contents:
amazon/ena
altera/altera_tse
amd/pds_core
+   amd/pds_vdpa
aquantia/atlantic
chelsio/cxgb
cirrus/cs89x0
diff --git a/MAINTAINERS b/MAINTAINERS
index 6ac562e0381e..93210a8ac74f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -22148,6 +22148,10 @@ SNET DPU VIRTIO DATA PATH ACCELERATOR
 R: Alvaro Karsz 
 F: drivers/vdpa/solidrun/
 
+PDS DSC VIRTIO DATA PATH ACCELERATOR
+R: Shannon Nelson 
+F: drivers/vdpa/pds/
+
 VIRTIO BALLOON
 M: "Michael S. Tsirkin" 
 M: David Hildenbrand 
diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig
index cd6ad92f3f05..2ee1b288691d 100644
--- a/drivers/vdpa/Kconfig
+++ b/drivers/vdpa/Kconfig
@@ -116,4 +116,12 @@ config ALIBABA_ENI_VDPA
  This driver includes a HW monitor device that
  reads health values from the DPU.
 
+config PDS_VDPA
+   tristate "vDPA driver for AMD/Pensando DSC devices"
+   depends on PDS_CORE
+   help
+ vDPA network driver for AMD/Pensando's PDS Core devices.
+ With this driver, the VirtIO dataplane can be
+ offloaded to an AMD/Pensando DSC device.
+
 endif # VDPA
-- 
2.17.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v4 virtio 06/10] pds_vdpa: virtio bar setup for vdpa

2023-04-25 Thread Shannon Nelson via Virtualization
Prep and use the "modern" virtio bar utilities to get our
virtio config space ready.

Signed-off-by: Shannon Nelson 
---
 drivers/vdpa/pds/aux_drv.c | 25 +
 drivers/vdpa/pds/aux_drv.h |  3 +++
 2 files changed, 28 insertions(+)

diff --git a/drivers/vdpa/pds/aux_drv.c b/drivers/vdpa/pds/aux_drv.c
index aa748cf55d2b..0c4a135b1484 100644
--- a/drivers/vdpa/pds/aux_drv.c
+++ b/drivers/vdpa/pds/aux_drv.c
@@ -4,6 +4,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -19,12 +20,22 @@ static const struct auxiliary_device_id pds_vdpa_id_table[] = {
{},
 };
 
+static int pds_vdpa_device_id_check(struct pci_dev *pdev)
+{
+   if (pdev->device != PCI_DEVICE_ID_PENSANDO_VDPA_VF ||
+   pdev->vendor != PCI_VENDOR_ID_PENSANDO)
+   return -ENODEV;
+
+   return PCI_DEVICE_ID_PENSANDO_VDPA_VF;
+}
+
 static int pds_vdpa_probe(struct auxiliary_device *aux_dev,
  const struct auxiliary_device_id *id)
 
 {
struct pds_auxiliary_dev *padev =
container_of(aux_dev, struct pds_auxiliary_dev, aux_dev);
+   struct device *dev = &aux_dev->dev;
struct pds_vdpa_aux *vdpa_aux;
int err;
 
@@ -41,8 +52,21 @@ static int pds_vdpa_probe(struct auxiliary_device *aux_dev,
if (err)
goto err_free_mem;
 
+   /* Find the virtio configuration */
+   vdpa_aux->vd_mdev.pci_dev = padev->vf_pdev;
+   vdpa_aux->vd_mdev.device_id_check = pds_vdpa_device_id_check;
+   vdpa_aux->vd_mdev.dma_mask = DMA_BIT_MASK(PDS_CORE_ADDR_LEN);
+   err = vp_modern_probe(&vdpa_aux->vd_mdev);
+   if (err) {
+   dev_err(dev, "Unable to probe for virtio configuration: %pe\n",
+   ERR_PTR(err));
+   goto err_free_mgmt_info;
+   }
+
return 0;
 
+err_free_mgmt_info:
+   pci_free_irq_vectors(padev->vf_pdev);
 err_free_mem:
kfree(vdpa_aux);
auxiliary_set_drvdata(aux_dev, NULL);
@@ -55,6 +79,7 @@ static void pds_vdpa_remove(struct auxiliary_device *aux_dev)
struct pds_vdpa_aux *vdpa_aux = auxiliary_get_drvdata(aux_dev);
struct device *dev = &aux_dev->dev;
 
+   vp_modern_remove(&vdpa_aux->vd_mdev);
pci_free_irq_vectors(vdpa_aux->padev->vf_pdev);
 
kfree(vdpa_aux);
diff --git a/drivers/vdpa/pds/aux_drv.h b/drivers/vdpa/pds/aux_drv.h
index dcec782e79eb..99e0ff340bfa 100644
--- a/drivers/vdpa/pds/aux_drv.h
+++ b/drivers/vdpa/pds/aux_drv.h
@@ -4,6 +4,8 @@
 #ifndef _AUX_DRV_H_
 #define _AUX_DRV_H_
 
+#include 
+
+#define PDS_VDPA_DRV_DESCRIPTION    "AMD/Pensando vDPA VF Device Driver"
 #define PDS_VDPA_DRV_NAME   KBUILD_MODNAME
 
@@ -16,6 +18,7 @@ struct pds_vdpa_aux {
 
int vf_id;
struct dentry *dentry;
+   struct virtio_pci_modern_device vd_mdev;
 
int nintrs;
 };
-- 
2.17.1



[PATCH v4 virtio 02/10] pds_vdpa: Add new vDPA driver for AMD/Pensando DSC

2023-04-25 Thread Shannon Nelson via Virtualization
This is the initial auxiliary driver framework for a new vDPA
device driver, an auxiliary_bus client of the pds_core driver.
The pds_core driver supplies the PCI services for the VF device
and for accessing the adminq in the PF device.

This patch adds the very basics of registering for the auxiliary
device and setting up debugfs entries.

Signed-off-by: Shannon Nelson 
Acked-by: Jason Wang 
---
 drivers/vdpa/Makefile  |  1 +
 drivers/vdpa/pds/Makefile  |  8 
 drivers/vdpa/pds/aux_drv.c | 83 ++
 drivers/vdpa/pds/aux_drv.h | 15 ++
 drivers/vdpa/pds/debugfs.c | 25 ++
 drivers/vdpa/pds/debugfs.h | 12 +
 include/linux/pds/pds_common.h |  2 +
 7 files changed, 146 insertions(+)
 create mode 100644 drivers/vdpa/pds/Makefile
 create mode 100644 drivers/vdpa/pds/aux_drv.c
 create mode 100644 drivers/vdpa/pds/aux_drv.h
 create mode 100644 drivers/vdpa/pds/debugfs.c
 create mode 100644 drivers/vdpa/pds/debugfs.h

diff --git a/drivers/vdpa/Makefile b/drivers/vdpa/Makefile
index 59396ff2a318..8f53c6f3cca7 100644
--- a/drivers/vdpa/Makefile
+++ b/drivers/vdpa/Makefile
@@ -7,3 +7,4 @@ obj-$(CONFIG_MLX5_VDPA) += mlx5/
obj-$(CONFIG_VP_VDPA) += virtio_pci/
 obj-$(CONFIG_ALIBABA_ENI_VDPA) += alibaba/
 obj-$(CONFIG_SNET_VDPA) += solidrun/
+obj-$(CONFIG_PDS_VDPA) += pds/
diff --git a/drivers/vdpa/pds/Makefile b/drivers/vdpa/pds/Makefile
new file mode 100644
index ..a9cd2f450ae1
--- /dev/null
+++ b/drivers/vdpa/pds/Makefile
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: GPL-2.0-only
+# Copyright(c) 2023 Advanced Micro Devices, Inc
+
+obj-$(CONFIG_PDS_VDPA) := pds_vdpa.o
+
+pds_vdpa-y := aux_drv.o
+
+pds_vdpa-$(CONFIG_DEBUG_FS) += debugfs.o
diff --git a/drivers/vdpa/pds/aux_drv.c b/drivers/vdpa/pds/aux_drv.c
new file mode 100644
index ..e4a0ad61ea22
--- /dev/null
+++ b/drivers/vdpa/pds/aux_drv.c
@@ -0,0 +1,83 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright(c) 2023 Advanced Micro Devices, Inc */
+
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+
+#include "aux_drv.h"
+#include "debugfs.h"
+
+static const struct auxiliary_device_id pds_vdpa_id_table[] = {
+   { .name = PDS_VDPA_DEV_NAME, },
+   {},
+};
+
+static int pds_vdpa_probe(struct auxiliary_device *aux_dev,
+ const struct auxiliary_device_id *id)
+
+{
+   struct pds_auxiliary_dev *padev =
+   container_of(aux_dev, struct pds_auxiliary_dev, aux_dev);
+   struct pds_vdpa_aux *vdpa_aux;
+
+   vdpa_aux = kzalloc(sizeof(*vdpa_aux), GFP_KERNEL);
+   if (!vdpa_aux)
+   return -ENOMEM;
+
+   vdpa_aux->padev = padev;
+   auxiliary_set_drvdata(aux_dev, vdpa_aux);
+
+   return 0;
+}
+
+static void pds_vdpa_remove(struct auxiliary_device *aux_dev)
+{
+   struct pds_vdpa_aux *vdpa_aux = auxiliary_get_drvdata(aux_dev);
+   struct device *dev = &aux_dev->dev;
+
+   kfree(vdpa_aux);
+   auxiliary_set_drvdata(aux_dev, NULL);
+
+   dev_info(dev, "Removed\n");
+}
+
+static struct auxiliary_driver pds_vdpa_driver = {
+   .name = PDS_DEV_TYPE_VDPA_STR,
+   .probe = pds_vdpa_probe,
+   .remove = pds_vdpa_remove,
+   .id_table = pds_vdpa_id_table,
+};
+
+static void __exit pds_vdpa_cleanup(void)
+{
+   auxiliary_driver_unregister(&pds_vdpa_driver);
+
+   pds_vdpa_debugfs_destroy();
+}
+module_exit(pds_vdpa_cleanup);
+
+static int __init pds_vdpa_init(void)
+{
+   int err;
+
+   pds_vdpa_debugfs_create();
+
+   err = auxiliary_driver_register(&pds_vdpa_driver);
+   if (err) {
+   pr_err("%s: aux driver register failed: %pe\n",
+  PDS_VDPA_DRV_NAME, ERR_PTR(err));
+   pds_vdpa_debugfs_destroy();
+   }
+
+   return err;
+}
+module_init(pds_vdpa_init);
+
+MODULE_DESCRIPTION(PDS_VDPA_DRV_DESCRIPTION);
+MODULE_AUTHOR("Advanced Micro Devices, Inc");
+MODULE_LICENSE("GPL");
diff --git a/drivers/vdpa/pds/aux_drv.h b/drivers/vdpa/pds/aux_drv.h
new file mode 100644
index ..f1e99359424e
--- /dev/null
+++ b/drivers/vdpa/pds/aux_drv.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright(c) 2023 Advanced Micro Devices, Inc */
+
+#ifndef _AUX_DRV_H_
+#define _AUX_DRV_H_
+
+#define PDS_VDPA_DRV_DESCRIPTION    "AMD/Pensando vDPA VF Device Driver"
+#define PDS_VDPA_DRV_NAME   KBUILD_MODNAME
+
+struct pds_vdpa_aux {
+   struct pds_auxiliary_dev *padev;
+
+   struct dentry *dentry;
+};
+#endif /* _AUX_DRV_H_ */
diff --git a/drivers/vdpa/pds/debugfs.c b/drivers/vdpa/pds/debugfs.c
new file mode 100644
index ..5be22fb7a76a
--- /dev/null
+++ b/drivers/vdpa/pds/debugfs.c
@@ -0,0 +1,25 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright(c) 2023 Advanced Micro Devices, Inc */
+
+#include 
+
+#include 
+#include 
+#include 
+#include 
+
+#include "aux_drv.h"
+#include "debugfs.h"
+
+static struct dentry *dbfs_dir;
+
+void pds_vd

[PATCH v4 virtio 04/10] pds_vdpa: new adminq entries

2023-04-25 Thread Shannon Nelson via Virtualization
Add new adminq definitions in support for vDPA operations.

Signed-off-by: Shannon Nelson 
---
 include/linux/pds/pds_adminq.h | 266 +
 1 file changed, 266 insertions(+)

diff --git a/include/linux/pds/pds_adminq.h b/include/linux/pds/pds_adminq.h
index 61b0a8634e1a..c66ead725434 100644
--- a/include/linux/pds/pds_adminq.h
+++ b/include/linux/pds/pds_adminq.h
@@ -605,6 +605,257 @@ struct pds_core_q_init_comp {
u8 color;
 };
 
+/*
+ * enum pds_vdpa_cmd_opcode - vDPA Device commands
+ */
+enum pds_vdpa_cmd_opcode {
+   PDS_VDPA_CMD_INIT   = 48,
+   PDS_VDPA_CMD_IDENT  = 49,
+   PDS_VDPA_CMD_RESET  = 51,
+   PDS_VDPA_CMD_VQ_RESET   = 52,
+   PDS_VDPA_CMD_VQ_INIT= 53,
+   PDS_VDPA_CMD_STATUS_UPDATE  = 54,
+   PDS_VDPA_CMD_SET_FEATURES   = 55,
+   PDS_VDPA_CMD_SET_ATTR   = 56,
+   PDS_VDPA_CMD_VQ_SET_STATE   = 57,
+   PDS_VDPA_CMD_VQ_GET_STATE   = 58,
+};
+
+/**
+ * struct pds_vdpa_cmd - generic command
+ * @opcode: Opcode
+ * @vdpa_index: Index for vdpa subdevice
+ * @vf_id: VF id
+ */
+struct pds_vdpa_cmd {
+   u8 opcode;
+   u8 vdpa_index;
+   __le16 vf_id;
+};
+
+/**
+ * struct pds_vdpa_init_cmd - INIT command
+ * @opcode: Opcode PDS_VDPA_CMD_INIT
+ * @vdpa_index: Index for vdpa subdevice
+ * @vf_id: VF id
+ */
+struct pds_vdpa_init_cmd {
+   u8 opcode;
+   u8 vdpa_index;
+   __le16 vf_id;
+};
+
+/**
+ * struct pds_vdpa_ident - vDPA identification data
+ * @hw_features:   vDPA features supported by device
+ * @max_vqs:   max queues available (2 queues for a single queuepair)
+ * @max_qlen:  log(2) of maximum number of descriptors
+ * @min_qlen:  log(2) of minimum number of descriptors
+ *
+ * This struct is used in a DMA block that is set up for the PDS_VDPA_CMD_IDENT
+ * transaction.  Set up the DMA block and send the address in the IDENT cmd
+ * data, the DSC will write the ident information, then we can remove the DMA
+ * block after reading the answer.  If the completion status is 0, then there
+ * is valid information, else there was an error and the data should be invalid.
+ */
+struct pds_vdpa_ident {
+   __le64 hw_features;
+   __le16 max_vqs;
+   __le16 max_qlen;
+   __le16 min_qlen;
+};
+
+/**
+ * struct pds_vdpa_ident_cmd - IDENT command
+ * @opcode: Opcode PDS_VDPA_CMD_IDENT
+ * @rsvd:   Word boundary padding
+ * @vf_id: VF id
+ * @len:   length of ident info DMA space
+ * @ident_pa:  address for DMA of ident info (struct pds_vdpa_ident)
+ * only used for this transaction, then forgotten by DSC
+ */
+struct pds_vdpa_ident_cmd {
+   u8 opcode;
+   u8 rsvd;
+   __le16 vf_id;
+   __le32 len;
+   __le64 ident_pa;
+};
+
+/**
+ * struct pds_vdpa_status_cmd - STATUS_UPDATE command
+ * @opcode: Opcode PDS_VDPA_CMD_STATUS_UPDATE
+ * @vdpa_index: Index for vdpa subdevice
+ * @vf_id: VF id
+ * @status: new status bits
+ */
+struct pds_vdpa_status_cmd {
+   u8 opcode;
+   u8 vdpa_index;
+   __le16 vf_id;
+   u8 status;
+};
+
+/**
+ * enum pds_vdpa_attr - List of VDPA device attributes
+ * @PDS_VDPA_ATTR_MAC:  MAC address
+ * @PDS_VDPA_ATTR_MAX_VQ_PAIRS: Max virtqueue pairs
+ */
+enum pds_vdpa_attr {
+   PDS_VDPA_ATTR_MAC  = 1,
+   PDS_VDPA_ATTR_MAX_VQ_PAIRS = 2,
+};
+
+/**
+ * struct pds_vdpa_setattr_cmd - SET_ATTR command
+ * @opcode: Opcode PDS_VDPA_CMD_SET_ATTR
+ * @vdpa_index: Index for vdpa subdevice
+ * @vf_id: VF id
+ * @attr:  attribute to be changed (enum pds_vdpa_attr)
+ * @pad:   Word boundary padding
+ * @mac:   new mac address to be assigned as vdpa device address
+ * @max_vq_pairs:  new limit of virtqueue pairs
+ */
+struct pds_vdpa_setattr_cmd {
+   u8 opcode;
+   u8 vdpa_index;
+   __le16 vf_id;
+   u8 attr;
+   u8 pad[3];
+   union {
+   u8 mac[6];
+   __le16 max_vq_pairs;
+   } __packed;
+};
+
+/**
+ * struct pds_vdpa_vq_init_cmd - queue init command
+ * @opcode: Opcode PDS_VDPA_CMD_VQ_INIT
+ * @vdpa_index: Index for vdpa subdevice
+ * @vf_id: VF id
+ * @qid:   Queue id (bit0 clear = rx, bit0 set = tx, qid=N is ctrlq)
+ * @len:   log(2) of max descriptor count
+ * @desc_addr: DMA address of descriptor area
+ * @avail_addr: DMA address of available descriptors (aka driver area)
+ * @used_addr: DMA address of used descriptors (aka device area)
+ * @intr_index: interrupt index
+ */
+struct pds_vdpa_vq_init_cmd {
+   u8 opcode;
+   u8 vdpa_index;
+   __le16 vf_id;
+   __le16 qid;
+   __le16 len;
+   __le64 desc_addr;
+   __le64 avail_addr;
+   __le64 used_addr;
+   __le16 intr_index;
+};

[PATCH v4 virtio 03/10] pds_vdpa: move enum from common to adminq header

2023-04-25 Thread Shannon Nelson via Virtualization
The pds_core_logical_qtype enum and IFNAMSIZ are not needed
in the common PDS header, only needed when working with the
adminq, so move them to the adminq header.

Note: This patch might conflict with pds_vfio patches that are
  in review, depending on which patchset gets pulled first.

Signed-off-by: Shannon Nelson 
---
 include/linux/pds/pds_adminq.h | 21 +
 include/linux/pds/pds_common.h | 21 -
 2 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/include/linux/pds/pds_adminq.h b/include/linux/pds/pds_adminq.h
index 98a60ce87b92..61b0a8634e1a 100644
--- a/include/linux/pds/pds_adminq.h
+++ b/include/linux/pds/pds_adminq.h
@@ -222,6 +222,27 @@ enum pds_core_lif_type {
PDS_CORE_LIF_TYPE_DEFAULT = 0,
 };
 
+#define PDS_CORE_IFNAMSIZ  16
+
+/**
+ * enum pds_core_logical_qtype - Logical Queue Types
+ * @PDS_CORE_QTYPE_ADMINQ: Administrative Queue
+ * @PDS_CORE_QTYPE_NOTIFYQ:   Notify Queue
+ * @PDS_CORE_QTYPE_RXQ:   Receive Queue
+ * @PDS_CORE_QTYPE_TXQ:   Transmit Queue
+ * @PDS_CORE_QTYPE_EQ:Event Queue
+ * @PDS_CORE_QTYPE_MAX:   Max queue type supported
+ */
+enum pds_core_logical_qtype {
+   PDS_CORE_QTYPE_ADMINQ  = 0,
+   PDS_CORE_QTYPE_NOTIFYQ = 1,
+   PDS_CORE_QTYPE_RXQ = 2,
+   PDS_CORE_QTYPE_TXQ = 3,
+   PDS_CORE_QTYPE_EQ  = 4,
+
+   PDS_CORE_QTYPE_MAX = 16   /* don't change - used in struct size */
+};
+
 /**
  * union pds_core_lif_config - LIF configuration
  * @state: LIF state (enum pds_core_lif_state)
diff --git a/include/linux/pds/pds_common.h b/include/linux/pds/pds_common.h
index 2a0d1669cfd0..435c8e8161c2 100644
--- a/include/linux/pds/pds_common.h
+++ b/include/linux/pds/pds_common.h
@@ -41,27 +41,6 @@ enum pds_core_vif_types {
 
 #define PDS_VDPA_DEV_NAME  PDS_CORE_DRV_NAME "." PDS_DEV_TYPE_VDPA_STR
 
-#define PDS_CORE_IFNAMSIZ  16
-
-/**
- * enum pds_core_logical_qtype - Logical Queue Types
- * @PDS_CORE_QTYPE_ADMINQ: Administrative Queue
- * @PDS_CORE_QTYPE_NOTIFYQ:   Notify Queue
- * @PDS_CORE_QTYPE_RXQ:   Receive Queue
- * @PDS_CORE_QTYPE_TXQ:   Transmit Queue
- * @PDS_CORE_QTYPE_EQ:Event Queue
- * @PDS_CORE_QTYPE_MAX:   Max queue type supported
- */
-enum pds_core_logical_qtype {
-   PDS_CORE_QTYPE_ADMINQ  = 0,
-   PDS_CORE_QTYPE_NOTIFYQ = 1,
-   PDS_CORE_QTYPE_RXQ = 2,
-   PDS_CORE_QTYPE_TXQ = 3,
-   PDS_CORE_QTYPE_EQ  = 4,
-
-   PDS_CORE_QTYPE_MAX = 16   /* don't change - used in struct size */
-};
-
 int pdsc_register_notify(struct notifier_block *nb);
 void pdsc_unregister_notify(struct notifier_block *nb);
 void *pdsc_get_pf_struct(struct pci_dev *vf_pdev);
-- 
2.17.1



[PATCH v4 virtio 05/10] pds_vdpa: get vdpa management info

2023-04-25 Thread Shannon Nelson via Virtualization
Find the vDPA management information from the DSC in order to
advertise it to the vdpa subsystem.

Signed-off-by: Shannon Nelson 
Acked-by: Jason Wang 
---
 drivers/vdpa/pds/Makefile   |   3 +-
 drivers/vdpa/pds/aux_drv.c  |  17 ++
 drivers/vdpa/pds/aux_drv.h  |   7 +++
 drivers/vdpa/pds/debugfs.c  |   1 +
 drivers/vdpa/pds/vdpa_dev.c | 108 
 drivers/vdpa/pds/vdpa_dev.h |  15 +
 6 files changed, 150 insertions(+), 1 deletion(-)
 create mode 100644 drivers/vdpa/pds/vdpa_dev.c
 create mode 100644 drivers/vdpa/pds/vdpa_dev.h

diff --git a/drivers/vdpa/pds/Makefile b/drivers/vdpa/pds/Makefile
index a9cd2f450ae1..13b50394ec64 100644
--- a/drivers/vdpa/pds/Makefile
+++ b/drivers/vdpa/pds/Makefile
@@ -3,6 +3,7 @@
 
 obj-$(CONFIG_PDS_VDPA) := pds_vdpa.o
 
-pds_vdpa-y := aux_drv.o
+pds_vdpa-y := aux_drv.o \
+ vdpa_dev.o
 
 pds_vdpa-$(CONFIG_DEBUG_FS) += debugfs.o
diff --git a/drivers/vdpa/pds/aux_drv.c b/drivers/vdpa/pds/aux_drv.c
index e4a0ad61ea22..aa748cf55d2b 100644
--- a/drivers/vdpa/pds/aux_drv.c
+++ b/drivers/vdpa/pds/aux_drv.c
@@ -3,6 +3,7 @@
 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -11,6 +12,7 @@
 
 #include "aux_drv.h"
 #include "debugfs.h"
+#include "vdpa_dev.h"
 
 static const struct auxiliary_device_id pds_vdpa_id_table[] = {
{ .name = PDS_VDPA_DEV_NAME, },
@@ -24,15 +26,28 @@ static int pds_vdpa_probe(struct auxiliary_device *aux_dev,
struct pds_auxiliary_dev *padev =
container_of(aux_dev, struct pds_auxiliary_dev, aux_dev);
struct pds_vdpa_aux *vdpa_aux;
+   int err;
 
vdpa_aux = kzalloc(sizeof(*vdpa_aux), GFP_KERNEL);
if (!vdpa_aux)
return -ENOMEM;
 
vdpa_aux->padev = padev;
+   vdpa_aux->vf_id = pci_iov_vf_id(padev->vf_pdev);
auxiliary_set_drvdata(aux_dev, vdpa_aux);
 
+   /* Get device ident info and set up the vdpa_mgmt_dev */
+   err = pds_vdpa_get_mgmt_info(vdpa_aux);
+   if (err)
+   goto err_free_mem;
+
return 0;
+
+err_free_mem:
+   kfree(vdpa_aux);
+   auxiliary_set_drvdata(aux_dev, NULL);
+
+   return err;
 }
 
 static void pds_vdpa_remove(struct auxiliary_device *aux_dev)
@@ -40,6 +55,8 @@ static void pds_vdpa_remove(struct auxiliary_device *aux_dev)
struct pds_vdpa_aux *vdpa_aux = auxiliary_get_drvdata(aux_dev);
struct device *dev = &aux_dev->dev;
 
+   pci_free_irq_vectors(vdpa_aux->padev->vf_pdev);
+
kfree(vdpa_aux);
auxiliary_set_drvdata(aux_dev, NULL);
 
diff --git a/drivers/vdpa/pds/aux_drv.h b/drivers/vdpa/pds/aux_drv.h
index f1e99359424e..dcec782e79eb 100644
--- a/drivers/vdpa/pds/aux_drv.h
+++ b/drivers/vdpa/pds/aux_drv.h
@@ -10,6 +10,13 @@
 struct pds_vdpa_aux {
struct pds_auxiliary_dev *padev;
 
+   struct vdpa_mgmt_dev vdpa_mdev;
+
+   struct pds_vdpa_ident ident;
+
+   int vf_id;
struct dentry *dentry;
+
+   int nintrs;
 };
 #endif /* _AUX_DRV_H_ */
diff --git a/drivers/vdpa/pds/debugfs.c b/drivers/vdpa/pds/debugfs.c
index 5be22fb7a76a..d91dceb07380 100644
--- a/drivers/vdpa/pds/debugfs.c
+++ b/drivers/vdpa/pds/debugfs.c
@@ -2,6 +2,7 @@
 /* Copyright(c) 2023 Advanced Micro Devices, Inc */
 
 #include 
+#include 
 
 #include 
 #include 
diff --git a/drivers/vdpa/pds/vdpa_dev.c b/drivers/vdpa/pds/vdpa_dev.c
new file mode 100644
index ..0f0f0ab8b811
--- /dev/null
+++ b/drivers/vdpa/pds/vdpa_dev.c
@@ -0,0 +1,108 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright(c) 2023 Advanced Micro Devices, Inc */
+
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+
+#include "vdpa_dev.h"
+#include "aux_drv.h"
+
+static struct virtio_device_id pds_vdpa_id_table[] = {
+   {VIRTIO_ID_NET, VIRTIO_DEV_ANY_ID},
+   {0},
+};
+
+static int pds_vdpa_dev_add(struct vdpa_mgmt_dev *mdev, const char *name,
+   const struct vdpa_dev_set_config *add_config)
+{
+   return -EOPNOTSUPP;
+}
+
+static void pds_vdpa_dev_del(struct vdpa_mgmt_dev *mdev,
+struct vdpa_device *vdpa_dev)
+{
+}
+
+static const struct vdpa_mgmtdev_ops pds_vdpa_mgmt_dev_ops = {
+   .dev_add = pds_vdpa_dev_add,
+   .dev_del = pds_vdpa_dev_del
+};
+
+int pds_vdpa_get_mgmt_info(struct pds_vdpa_aux *vdpa_aux)
+{
+   union pds_core_adminq_cmd cmd = {
+   .vdpa_ident.opcode = PDS_VDPA_CMD_IDENT,
+   .vdpa_ident.vf_id = cpu_to_le16(vdpa_aux->vf_id),
+   };
+   union pds_core_adminq_comp comp = {};
+   struct vdpa_mgmt_dev *mgmt;
+   struct pci_dev *pf_pdev;
+   struct device *pf_dev;
+   struct pci_dev *pdev;
+   dma_addr_t ident_pa;
+   struct device *dev;
+   u16 dev_intrs;
+   u16 max_vqs;
+   int err;
+
+   dev = &vdpa_aux->padev->aux_dev.dev;
+   pdev = vdpa_aux->padev->vf_pdev;
+   mgmt = &vdpa_aux->vdpa_mdev;
+
+   /* Get resource info through the PF's 

[PATCH v4 virtio 01/10] virtio: allow caller to override device id and DMA mask

2023-04-25 Thread Shannon Nelson via Virtualization
To add a bit of flexibility with various virtio based devices, allow
the caller to specify a different device id and DMA mask.  This adds
fields to struct virtio_pci_modern_device to specify an override device
id check and a DMA mask.

int (*device_id_check)(struct pci_dev *pdev);
If defined by the driver, this function will be called to check
that the PCI device is the vendor's expected device, and will
return the found device id to be stored in mdev->id.device.
This allows vendors with alternative vendor device ids to use
this library on their own device BAR.

u64 dma_mask;
If defined by the driver, this mask will be used in a call to
dma_set_mask_and_coherent() instead of the traditional
DMA_BIT_MASK(64).  This allows limiting the DMA space on
vendor devices with address limitations.

Signed-off-by: Shannon Nelson 
---
 drivers/virtio/virtio_pci_modern_dev.c | 37 +-
 include/linux/virtio_pci_modern.h  |  6 +
 2 files changed, 31 insertions(+), 12 deletions(-)

diff --git a/drivers/virtio/virtio_pci_modern_dev.c b/drivers/virtio/virtio_pci_modern_dev.c
index 869cb46bef96..1f2db76e8f91 100644
--- a/drivers/virtio/virtio_pci_modern_dev.c
+++ b/drivers/virtio/virtio_pci_modern_dev.c
@@ -218,21 +218,29 @@ int vp_modern_probe(struct virtio_pci_modern_device *mdev)
int err, common, isr, notify, device;
u32 notify_length;
u32 notify_offset;
+   int devid;
 
check_offsets();
 
-   /* We only own devices >= 0x1000 and <= 0x107f: leave the rest. */
-   if (pci_dev->device < 0x1000 || pci_dev->device > 0x107f)
-   return -ENODEV;
-
-   if (pci_dev->device < 0x1040) {
-   /* Transitional devices: use the PCI subsystem device id as
-* virtio device id, same as legacy driver always did.
-*/
-   mdev->id.device = pci_dev->subsystem_device;
+   if (mdev->device_id_check) {
+   devid = mdev->device_id_check(pci_dev);
+   if (devid < 0)
+   return devid;
+   mdev->id.device = devid;
} else {
-   /* Modern devices: simply use PCI device id, but start from 0x1040. */
-   mdev->id.device = pci_dev->device - 0x1040;
+   /* We only own devices >= 0x1000 and <= 0x107f: leave the rest. */
+   if (pci_dev->device < 0x1000 || pci_dev->device > 0x107f)
+   return -ENODEV;
+
+   if (pci_dev->device < 0x1040) {
+   /* Transitional devices: use the PCI subsystem device id as
+    * virtio device id, same as legacy driver always did.
+    */
+   mdev->id.device = pci_dev->subsystem_device;
+   } else {
+   /* Modern devices: simply use PCI device id, but start from 0x1040. */
+   mdev->id.device = pci_dev->device - 0x1040;
+   }
}
mdev->id.vendor = pci_dev->subsystem_vendor;
 
@@ -260,7 +268,12 @@ int vp_modern_probe(struct virtio_pci_modern_device *mdev)
return -EINVAL;
}
 
-   err = dma_set_mask_and_coherent(&pci_dev->dev, DMA_BIT_MASK(64));
+   if (mdev->dma_mask)
+   err = dma_set_mask_and_coherent(&pci_dev->dev,
+   mdev->dma_mask);
+   else
+   err = dma_set_mask_and_coherent(&pci_dev->dev,
+   DMA_BIT_MASK(64));
if (err)
err = dma_set_mask_and_coherent(&pci_dev->dev,
DMA_BIT_MASK(32));
diff --git a/include/linux/virtio_pci_modern.h b/include/linux/virtio_pci_modern.h
index c4eeb79b0139..067ac1d789bc 100644
--- a/include/linux/virtio_pci_modern.h
+++ b/include/linux/virtio_pci_modern.h
@@ -38,6 +38,12 @@ struct virtio_pci_modern_device {
int modern_bars;
 
struct virtio_device_id id;
+
+   /* optional check for vendor virtio device, returns dev_id or -ERRNO */
+   int (*device_id_check)(struct pci_dev *pdev);
+
+   /* optional mask for devices with limited DMA space */
+   u64 dma_mask;
 };
 
 /*
-- 
2.17.1



[PATCH v4 virtio 00/10] pds_vdpa driver

2023-04-25 Thread Shannon Nelson via Virtualization
This patchset implements a new module for the AMD/Pensando DSC that
supports vDPA services on PDS Core VF devices.  This code is based on
and depends on include files from the pds_core driver described here[0].
The pds_core driver creates the auxiliary_bus devices that this module
connects to, and this creates vdpa devices for use by the vdpa module.

The first version of this driver was a part of the original pds_core RFC
[1] but has since been reworked to pull out the PCI driver and to make
better use of the virtio and virtio_net configuration spaces made available
by the DSC's PCI configuration.  As the device development has progressed,
the ability to rely on the virtio config spaces has grown.

This patchset includes a modification to the existing vp_modern_probe()
which implements overrides for the PCI device id check and the DMA mask.
These are intended to be used with vendor vDPA devices that implement
enough of the virtio config space to be used directly, but don't use the
virtio device id.

To use this module, enable the VFs and turn on the vDPA services in the
pds_core PF, then use the 'vdpa' utility to create devices for use by
virtio_vdpa or vhost_vdpa:
   echo 1 > /sys/bus/pci/drivers/pds_core/$PF_BDF/sriov_numvfs
   devlink dev param set pci/$PF_BDF name enable_vnet value true cmode runtime
   PDS_VDPA_MGMT=`vdpa mgmtdev show | grep vDPA | head -1 | cut -d: -f1`
   vdpa dev add name vdpa1 mgmtdev $PDS_VDPA_MGMT mac 00:11:22:33:44:55

[0] Link: https://lore.kernel.org/netdev/20230322185626.38758-1-shannon.nel...@amd.com/
[1] Link: https://lore.kernel.org/netdev/20221118225656.48309-1-snel...@pensando.io/

Changes:
 v4:
 - rename device_id_check_override() to device_id_check()
 - make device_id_check() return the device_id found and checked
 - removed pds_vdpa.h, put its adminq changes into pds_adminq.h
 - added a patch to separate out the adminq changes
 - added a patch to move an adminq enum from pds_common.h to pds_adminq.h
 - moved adminq calls for get/set_vq_state into cmds.c
 - limit max_vqs by number of msix available
 - don't increment nintrs for CVQ, it should already be covered from max_vqs
 - pds_core API related rework following pds_core inclusion to net-next
 - use non-debugfs method to find PF pci address in pds_vdpa.rst instructions

 v3:
Link: https://lore.kernel.org/netdev/20230330192313.62018-1-shannon.nel...@amd.com/
 - added a patch to modify vp_modern_probe() such that specific device id and
   DMA mask overrides can be used
 - add pds_vdpa.rst into index file
 - dev_dbg instead of dev_err on most of the adminq commands
 - rework use of pds_vdpa_cmd_reset() and pds_vdpa_init_hw() for better
   firmware setup in start-stop-start scenarios
 - removed unused pds_vdpa_cmd_set_features(), we can rely on vp_modern_set_features()
 - remove unused hw_qtype and hw_qindex from pds_vdpa_vq_info
 - reworked debugfs print_feature_bits to also print unknown bits
 - changed use of PAGE_SIZE to local PDS_PAGE_SIZE to keep with FW layout needs
   without regard to kernel PAGE_SIZE configuration

 v2:
https://lore.kernel.org/netdev/20230309013046.23523-1-shannon.nel...@amd.com/
 - removed PCI driver code
 - replaced home-grown event listener with notifier
 - replaced many adminq uses with direct virtio_net config access
 - reworked irqs to follow virtio layout
 - removed local_mac_bit logic
 - replaced uses of devm_ interfaces as suggested in pds_core reviews
 - updated copyright strings to reflect the new owner

Shannon Nelson (10):
  virtio: allow caller to override device id and DMA mask
  pds_vdpa: Add new vDPA driver for AMD/Pensando DSC
  pds_vdpa: move enum from common to adminq header
  pds_vdpa: new adminq entries
  pds_vdpa: get vdpa management info
  pds_vdpa: virtio bar setup for vdpa
  pds_vdpa: add vdpa config client commands
  pds_vdpa: add support for vdpa and vdpamgmt interfaces
  pds_vdpa: subscribe to the pds_core events
  pds_vdpa: pds_vdps.rst and Kconfig

 .../device_drivers/ethernet/amd/pds_vdpa.rst  |  85 +++
 .../device_drivers/ethernet/index.rst |   1 +
 MAINTAINERS   |   4 +
 drivers/vdpa/Kconfig  |   8 +
 drivers/vdpa/Makefile |   1 +
 drivers/vdpa/pds/Makefile |  10 +
 drivers/vdpa/pds/aux_drv.c| 140 
 drivers/vdpa/pds/aux_drv.h|  26 +
 drivers/vdpa/pds/cmds.c   | 207 +
 drivers/vdpa/pds/cmds.h   |  20 +
 drivers/vdpa/pds/debugfs.c| 287 +++
 drivers/vdpa/pds/debugfs.h|  17 +
 drivers/vdpa/pds/vdpa_dev.c   | 704 ++
 drivers/vdpa/pds/vdpa_dev.h   |  47 ++
 drivers/virtio/virtio_pci_modern_dev.c|  37 +-
 include/linux/pds/pds_adminq.h| 287 +++
 include/linux/pds/pds_common.h|  21 +-
 include/linux/virtio_pci_modern.h |   6 

Re: [PATCH v2 0/3] vhost_vdpa: better PACKED support

2023-04-25 Thread Shannon Nelson via Virtualization

On 4/24/23 11:08 PM, Michael S. Tsirkin wrote:


On Mon, Apr 24, 2023 at 03:50:28PM -0700, Shannon Nelson wrote:

While testing our vDPA driver with QEMU we found that vhost_vdpa was
missing some support for PACKED VQs.  Adding these helped us get further
in our testing.  The first patch makes sure that the vhost_vdpa vqs are
given the feature flags, as done in other vhost variants.  The second
and third patches use the feature flags to properly format and pass
set/get_vq requests.


This missed the merge window most likely, unless there's
a reason to send a second pull towards the end of it. We'll see.


Understood - we'll keep our fingers crossed for good luck :-)
Thanks for the help.

sln

v1:
  - included wrap counter in packing answers for VHOST_GET_VRING_BASE
  - added comments to vhost_virtqueue fields last_avail_idx and last_used_idx

v0:
https://lists.linuxfoundation.org/pipermail/virtualization/2023-April/066253.html
  - initial version

Shannon Nelson (3):
   vhost_vdpa: tell vqs about the negotiated
   vhost: support PACKED when setting-getting vring_base
   vhost_vdpa: support PACKED when setting-getting vring_base

  drivers/vhost/vdpa.c  | 34 ++
  drivers/vhost/vhost.c | 18 +-
  drivers/vhost/vhost.h |  8 ++--
  3 files changed, 49 insertions(+), 11 deletions(-)

--
2.17.1





Re: [PATCH net] virtio-net: reject small vring sizes

2023-04-25 Thread Michael S. Tsirkin
On Tue, Apr 25, 2023 at 01:02:38PM +, Alvaro Karsz wrote:
> > > In the virtnet case, we'll decide which features to block based on the 
> > > ring size.
> > > 2 < ring < MAX_FRAGS + 2  -> BLOCK GRO + MRG_RXBUF
> > > ring < 2  -> BLOCK GRO + MRG_RXBUF + CTRL_VQ
> > 
> > why MRG_RXBUF? what does it matter?
> > 
> 
> You're right, it should be blocked only when ring < 2.
> Or we should let this pass, and let the device figure out that MRG_RXBUF is 
> meaningless with 1 entry..

yep, later I think.

> > > So we'll need a new virtio callback instead of flags.
> > > Furthermore, other virtio drivers may decide which features to block 
> > > based on parameters different than ring size (I don't have a good example 
> > > at the moment).
> > > So maybe we should leave it to the driver to handle (during probe), and 
> > > offer a virtio core function to re-negotiate the features?
> > >
> > > In the solution I'm working on, I expose a new virtio core function that 
> > > resets the device and renegotiates the received features.
> > > + A new virtio_config_ops callback peek_vqs_len to peek at the VQ lengths 
> > > before calling find_vqs. (The callback must be called after the features 
> > > negotiation)
> > >
> > > So, the flow is something like:
> > >
> > > * Super early in virtnet probe, we peek at the VQ lengths and decide if 
> > > we are
> > >using small vrings, if so, we reset and renegotiate the features.
> > 
> > Using which APIs? What does peek_vqs_len do and why does it matter that
> > it is super early?
> > 
> 
> We peek at the lengths using a new virtio_config.h function that calls a 
> transport specific callback.
> We renegotiate calling the new, exported virtio core function.
> 
> peek_vqs_len fills an array of u16 variables with the max length of every VQ.
> 
> The idea here is not to fail probe.
> So we start probe, check if the ring is small, renegotiate the features and 
> then continue with the new features.
> This needs to be super early because otherwise, some virtio_has_feature calls 
> before re-negotiating may be invalid, meaning a lot of reconfigurations.
> 
> > > * We continue normally and create the VQs.
> > > * We check if the created rings are small.
> > >If they are and some blocked features were negotiated anyway (may 
> > > occur if
> > >the re-negotiation fails, or if the transport has no implementation for
> > >peek_vqs_len), we fail probe.
> > >If the ring is small and the features are ok, we mark the virtnet 
> > > device as
> > >vring_small and fixup some variables.
> > >
> > >
> > > peek_vqs_len is needed because we must know the VQ length before calling 
> > > init_vqs.
> > >
> > > During virtnet_find_vqs we check the following:
> > > vi->has_cvq
> > > vi->big_packets
> > > vi->mergeable_rx_bufs
> > >
> > > But these will change if the ring is small..
> > >
> > > (Of course, another solution will be to re-negotiate features after 
> > > init_vqs, but this will make a big mess, tons of things to clean and 
> > > reconfigure)
> > >
> > >
> > > The 2 < ring < MAX_FRAGS + 2 part is ready, I have tested a few cases and 
> > > it is working.
> > >
> > > I'm considering splitting the effort into 2 series.
> > > A 2 < ring < MAX_FRAGS + 2  series, and a follow up series with the ring 
> > > < 2 case.
> > >
> > > I'm also thinking about sending the first series as an RFC soon, so it 
> > > will be more broadly tested.
> > >
> > > What do you think?
> > 
> > Lots of work spilling over to transports.
> > 
> > And I especially don't like that it slows down boot on good path.
> 
> Yes, but I don't think that this is really significant.
> It's just a call to the transport to get the length of the VQs.

With lots of VQs that is lots of exits.

> If ring is not small, we continue as normal.
> If ring is small, we renegotiate and continue, without failing probe.
> 
> > 
> > I have the following idea:
> > - add a blocked features value in virtio_device
> > - before calling probe, core saves blocked features
> > - if probe fails, checks blocked features.
> >   if any were added, reset, negotiate all features
> >   except blocked ones and do the validate/probe dance again
> > 
> > 
> > This will mean mostly no changes to drivers: just check condition,
> > block feature and fail probe.
> > 
> 
> I like the idea, will try to implement it.
> 
> Thanks,



Re: [PATCH net] virtio-net: reject small vring sizes

2023-04-25 Thread Alvaro Karsz
> > In the virtnet case, we'll decide which features to block based on the ring 
> > size.
> > 2 < ring < MAX_FRAGS + 2  -> BLOCK GRO + MRG_RXBUF
> > ring < 2  -> BLOCK GRO + MRG_RXBUF + CTRL_VQ
> 
> why MRG_RXBUF? what does it matter?
> 

You're right, it should be blocked only when ring < 2.
Or we should let this pass, and let the device figure out that MRG_RXBUF is 
meaningless with 1 entry..

> > So we'll need a new virtio callback instead of flags.
> > Furthermore, other virtio drivers may decide which features to block based 
> > on parameters different than ring size (I don't have a good example at the 
> > moment).
> > So maybe we should leave it to the driver to handle (during probe), and 
> > offer a virtio core function to re-negotiate the features?
> >
> > In the solution I'm working on, I expose a new virtio core function that 
> > resets the device and renegotiates the received features.
> > + A new virtio_config_ops callback peek_vqs_len to peek at the VQ lengths 
> > before calling find_vqs. (The callback must be called after the features 
> > negotiation)
> >
> > So, the flow is something like:
> >
> > * Super early in virtnet probe, we peek at the VQ lengths and decide if we 
> > are
> >using small vrings, if so, we reset and renegotiate the features.
> 
> Using which APIs? What does peek_vqs_len do and why does it matter that
> it is super early?
> 

We peek at the lengths using a new virtio_config.h function that calls a 
transport specific callback.
We renegotiate calling the new, exported virtio core function.

peek_vqs_len fills an array of u16 variables with the max length of every VQ.

The idea here is not to fail probe.
So we start probe, check if the ring is small, renegotiate the features and 
then continue with the new features.
This needs to be super early because otherwise, some virtio_has_feature calls 
before re-negotiating may be invalid, meaning a lot of reconfigurations.

> > * We continue normally and create the VQs.
> > * We check if the created rings are small.
> >If they are and some blocked features were negotiated anyway (may occur 
> > if
> >the re-negotiation fails, or if the transport has no implementation for
> >peek_vqs_len), we fail probe.
> >If the ring is small and the features are ok, we mark the virtnet device 
> > as
> >vring_small and fixup some variables.
> >
> >
> > peek_vqs_len is needed because we must know the VQ length before calling 
> > init_vqs.
> >
> > During virtnet_find_vqs we check the following:
> > vi->has_cvq
> > vi->big_packets
> > vi->mergeable_rx_bufs
> >
> > But these will change if the ring is small..
> >
> > (Of course, another solution will be to re-negotiate features after 
> > init_vqs, but this will make a big mess, tons of things to clean and 
> > reconfigure)
> >
> >
> > The 2 < ring < MAX_FRAGS + 2 part is ready, I have tested a few cases and 
> > it is working.
> >
> > I'm considering splitting the effort into 2 series.
> > A 2 < ring < MAX_FRAGS + 2  series, and a follow up series with the ring < 
> > 2 case.
> >
> > I'm also thinking about sending the first series as an RFC soon, so it will 
> > be more broadly tested.
> >
> > What do you think?
> 
> Lots of work spilling over to transports.
> 
> And I especially don't like that it slows down boot on good path.

Yes, but I don't think that this is really significant.
It's just a call to the transport to get the length of the VQs.
If ring is not small, we continue as normal.
If ring is small, we renegotiate and continue, without failing probe.

> 
> I have the following idea:
> - add a blocked features value in virtio_device
> - before calling probe, core saves blocked features
> - if probe fails, checks blocked features.
>   if any were added, reset, negotiate all features
>   except blocked ones and do the validate/probe dance again
> 
> 
> This will mean mostly no changes to drivers: just check condition,
> block feature and fail probe.
> 

I like the idea, will try to implement it.

Thanks,


Re: [PATCH] can: virtio-can: cleanups

2023-04-25 Thread Michael S. Tsirkin
On Tue, Apr 25, 2023 at 11:17:20AM +0200, Marc Kleine-Budde wrote:
> On 24.04.2023 17:09:23, Michael S. Tsirkin wrote:
> > On Mon, Apr 24, 2023 at 09:47:58PM +0200, Marc Kleine-Budde wrote:
> > > Address the topics raised in
> > > 
> > > https://lore.kernel.org/20230424-footwear-daily-9339bd0ec428-...@pengutronix.de
> > > 
> > > Signed-off-by: Marc Kleine-Budde 
> > 
> > given base patch is rfc this should be too?
> 
> This is an incremental patch that fixes the topics I raised in the
> review of "[RFC PATCH v2] can: virtio: Initial virtio CAN driver.", see
> linked discussion thread.
> 
> regards,
> Marc

and that's fine, just pls put RFC in the subject.

-- 
MST



Re: [PATCH vhost v7 00/11] virtio core prepares for AF_XDP

2023-04-25 Thread Michael S. Tsirkin
On Tue, Apr 25, 2023 at 04:21:53PM +0800, Xuan Zhuo wrote:
> On Tue, 25 Apr 2023 04:13:09 -0400, "Michael S. Tsirkin"  
> wrote:
> > On Tue, Apr 25, 2023 at 04:05:03PM +0800, Xuan Zhuo wrote:
> > > On Tue, 25 Apr 2023 03:51:47 -0400, "Michael S. Tsirkin" 
> > >  wrote:
> > > > On Tue, Apr 25, 2023 at 03:36:02PM +0800, Xuan Zhuo wrote:
> > > > > ## About DMA APIs
> > > > >
> > > > > Now, virtio may not work with the DMA APIs when the virtio features do
> > > > > not have
> > > > > VIRTIO_F_ACCESS_PLATFORM.
> > > > >
> > > > > 1. I tried to let DMA APIs return phy address by virtio-device. But 
> > > > > DMA APIs just
> > > > >work with the "real" devices.
> > > > > 2. I tried to let xsk support callbacks to get phy address from 
> > > > > virtio-net
> > > > >driver as the dma address. But the maintainers of xsk may want to 
> > > > > use dma-buf
> > > > >to replace the DMA APIs. I think that may be a larger effort. We 
> > > > > will wait
> > > > >too long.
> > > > >
> > > > > So rethinking this, firstly, we can support premapped-dma only for 
> > > > > devices with
> > > > > VIRTIO_F_ACCESS_PLATFORM. In the case of af-xdp, if the users want to 
> > > > > use it,
> > > > > they have to update the device to support VIRTIO_F_RING_RESET, and 
> > > > > they can also
> > > > > enable the device's VIRTIO_F_ACCESS_PLATFORM feature by the way.
> > > >
> > > > I don't understand this last sentence. If you think ring
> > > > reset can change device features then the answer is no, it can't.
> > >
> > >
> > > Sorry, I should remove "by the way".
> > >
> > >
> > > >
> > > > If you are saying device has to set VIRTIO_F_ACCESS_PLATFORM to
> > > > benefit from this work, that's fine at least as a first approach.
> > > > Note that setting VIRTIO_F_ACCESS_PLATFORM breaks old guests
> > > > (it's a security boundary), e.g. it is not available for
> > > > transitional devices.
> > > > So to support transitional devices, we might want to find another way to
> > > > address this down the road,
> > >
> > > Maybe dma-buf is a way. I'll look into it, especially some practice on 
> > > xsk.
> > >
> > > > but as a first step, I agree just going with
> > > > DMA is fine.
> > >
> > >
> > > Thanks.
> >
> > Pls do make sure to disable the feature when !VIRTIO_F_ACCESS_PLATFORM
> > though.
> 
> If you refer to the implementation inside virtio-net, this feature will depend
> on the return of virtqueue_dma_dev().
> 
> But virtqueue_dma_dev() depends on "use_dma_api". When xen_domain() is true and
> !VIRTIO_F_ACCESS_PLATFORM, the "use_dma_api" is true.
> 
> So what kind of situation do you mean?
> 
> Thanks.

E.g. a legacy device.

-- 
MST



Re: [PATCH net] virtio-net: reject small vring sizes

2023-04-25 Thread Michael S. Tsirkin
On Tue, Apr 25, 2023 at 11:11:54AM +, Alvaro Karsz wrote:
> > > So, let's add some funky flags in virtio device to block out
> > > features, have core compare these before and after,
> > > detect change, reset and retry?
> > 
> > In the virtnet case, we'll decide which features to block based on the ring 
> > size.
> > 2 < ring < MAX_FRAGS + 2  -> BLOCK GRO + MRG_RXBUF
> > ring < 2  -> BLOCK GRO + MRG_RXBUF + CTRL_VQ
> > 
> > So we'll need a new virtio callback instead of flags.
> > 
> > Furthermore, other virtio drivers may decide which features to block based 
> > on parameters different than ring size (I don't have a good example at the 
> > moment).
> > So maybe we should leave it to the driver to handle (during probe), and 
> > offer a virtio core function to re-negotiate the features?
> > 
> > In the solution I'm working on, I expose a new virtio core function that 
> > resets the device and renegotiates the received features.
> > + A new virtio_config_ops callback peek_vqs_len to peek at the VQ lengths 
> > before calling find_vqs. (The callback must be called after the features 
> > negotiation)
> > 
> > So, the flow is something like:
> > 
> > * Super early in virtnet probe, we peek at the VQ lengths and decide if we 
> > are
> >using small vrings, if so, we reset and renegotiate the features.
> > * We continue normally and create the VQs.
> > * We check if the created rings are small.
> >If they are and some blocked features were negotiated anyway (may occur 
> > if
> >the re-negotiation fails, or if the transport has no implementation for
> >peek_vqs_len), we fail probe.
> 
> Small fix: if the re-negotiation fails, we fail probe immediately.
> The only way to negotiate blocked features with a small vring is if the 
> transport has no implementation for peek_vqs_len.

with my idea, you can go iteratively: fail one condition, core will
retry with a feature blocked, we can block more, retry again.
up to 64 times :)

> >If the ring is small and the features are ok, we mark the virtnet device 
> > as
> >vring_small and fixup some variables.
> > 
> > 
> > peek_vqs_len is needed because we must know the VQ length before calling 
> > init_vqs.
> > 
> > During virtnet_find_vqs we check the following:
> > vi->has_cvq
> > vi->big_packets
> > vi->mergeable_rx_bufs
> > 
> > But these will change if the ring is small..
> > 
> > (Of course, another solution will be to re-negotiate features after 
> > init_vqs, but this will make a big mess, tons of things to clean and 
> > reconfigure)
> > 
> > 
> > The 2 < ring < MAX_FRAGS + 2 part is ready, I have tested a few cases and 
> > it is working.
> > 
> > I'm considering splitting the effort into 2 series.
> > A 2 < ring < MAX_FRAGS + 2  series, and a follow up series with the ring < 
> > 2 case.
> > 
> > I'm also thinking about sending the first series as an RFC soon, so it will 
> > be more broadly tested.
> > 
> > What do you think?
> > 



Re: [PATCH net] virtio-net: reject small vring sizes

2023-04-25 Thread Michael S. Tsirkin
On Tue, Apr 25, 2023 at 09:41:35AM +, Alvaro Karsz wrote:
> > So, let's add some funky flags in virtio device to block out
> > features, have core compare these before and after,
> > detect change, reset and retry?
> 
> In the virtnet case, we'll decide which features to block based on the ring 
> size.
> 2 < ring < MAX_FRAGS + 2  -> BLOCK GRO + MRG_RXBUF
> ring < 2  -> BLOCK GRO + MRG_RXBUF + CTRL_VQ

why MRG_RXBUF? what does it matter?

> So we'll need a new virtio callback instead of flags.
> Furthermore, other virtio drivers may decide which features to block based on 
> parameters different than ring size (I don't have a good example at the 
> moment).
> So maybe we should leave it to the driver to handle (during probe), and offer 
> a virtio core function to re-negotiate the features?
> 
> In the solution I'm working on, I expose a new virtio core function that 
> resets the device and renegotiates the received features.
> + A new virtio_config_ops callback peek_vqs_len to peek at the VQ lengths 
> before calling find_vqs. (The callback must be called after the features 
> negotiation)
> 
> So, the flow is something like:
> 
> * Super early in virtnet probe, we peek at the VQ lengths and decide if we 
> are 
>using small vrings, if so, we reset and renegotiate the features.

Using which APIs? What does peek_vqs_len do and why does it matter that
it is super early?

> * We continue normally and create the VQs.
> * We check if the created rings are small.
>If they are and some blocked features were negotiated anyway (may occur if 
>the re-negotiation fails, or if the transport has no implementation for 
>peek_vqs_len), we fail probe.
>If the ring is small and the features are ok, we mark the virtnet device 
> as 
>vring_small and fixup some variables.
>  
> 
> peek_vqs_len is needed because we must know the VQ length before calling 
> init_vqs.
> 
> During virtnet_find_vqs we check the following:
> vi->has_cvq
> vi->big_packets
> vi->mergeable_rx_bufs
> 
> But these will change if the ring is small..
> 
> (Of course, another solution will be to re-negotiate features after init_vqs, 
> but this will make a big mess, tons of things to clean and reconfigure)
> 
> 
> The 2 < ring < MAX_FRAGS + 2 part is ready, I have tested a few cases and it 
> is working.
> 
> I'm considering splitting the effort into 2 series.
> A 2 < ring < MAX_FRAGS + 2  series, and a follow up series with the ring < 2 
> case.
> 
> I'm also thinking about sending the first series as an RFC soon, so it will 
> be more broadly tested.
> 
> What do you think?

Lots of work spilling over to transports.

And I especially don't like that it slows down boot on good path.

I have the following idea:
- add a blocked features value in virtio_device
- before calling probe, core saves blocked features
- if probe fails, checks blocked features.
  if any were added, reset, negotiate all features
  except blocked ones and do the validate/probe dance again


This will mean mostly no changes to drivers: just check condition,
block feature and fail probe.


-- 
MST



Re: [PATCH net] virtio-net: reject small vring sizes

2023-04-25 Thread Alvaro Karsz
> > So, let's add some funky flags in virtio device to block out
> > features, have core compare these before and after,
> > detect change, reset and retry?
> 
> In the virtnet case, we'll decide which features to block based on the ring 
> size.
> 2 < ring < MAX_FRAGS + 2  -> BLOCK GRO + MRG_RXBUF
> ring < 2  -> BLOCK GRO + MRG_RXBUF + CTRL_VQ
> 
> So we'll need a new virtio callback instead of flags.
> 
> Furthermore, other virtio drivers may decide which features to block based on 
> parameters different than ring size (I don't have a good example at the 
> moment).
> So maybe we should leave it to the driver to handle (during probe), and 
> offer a virtio core function to re-negotiate the features?
> 
> In the solution I'm working on, I expose a new virtio core function that 
> resets the device and renegotiates the received features.
> + A new virtio_config_ops callback peek_vqs_len to peek at the VQ lengths 
> before calling find_vqs. (The callback must be called after the features 
> negotiation)
> 
> So, the flow is something like:
> 
> * Super early in virtnet probe, we peek at the VQ lengths and decide if we are
>using small vrings, if so, we reset and renegotiate the features.
> * We continue normally and create the VQs.
> * We check if the created rings are small.
>If they are and some blocked features were negotiated anyway (may occur if
>the re-negotiation fails, or if the transport has no implementation for
>peek_vqs_len), we fail probe.

Small fix: if the re-negotiation fails, we fail probe immediately.
The only way to negotiate blocked features with a small vring is if the 
transport has no implementation for peek_vqs_len.

>If the ring is small and the features are ok, we mark the virtnet device as
>vring_small and fixup some variables.
> 
> 
> peek_vqs_len is needed because we must know the VQ length before calling 
> init_vqs.
> 
> During virtnet_find_vqs we check the following:
> vi->has_cvq
> vi->big_packets
> vi->mergeable_rx_bufs
> 
> But these will change if the ring is small..
> 
> (Of course, another solution will be to re-negotiate features after init_vqs, 
> but this will make a big mess, tons of things to clean and reconfigure)
> 
> 
> The 2 < ring < MAX_FRAGS + 2 part is ready, I have tested a few cases and it 
> is working.
> 
> I'm considering splitting the effort into 2 series.
> A 2 < ring < MAX_FRAGS + 2  series, and a follow up series with the ring < 2 
> case.
> 
> I'm also thinking about sending the first series as an RFC soon, so it will 
> be more broadly tested.
> 
> What do you think?
> 


[PATCH v3] vringh: IOMEM support

2023-04-25 Thread Shunsuke Mie
Introduce a new memory accessor for vringh. It enables vringh to be used
with virtio rings located in an I/O memory region.

Signed-off-by: Shunsuke Mie 
---

Changes from v2: 
https://lore.kernel.org/virtualization/20230202090934.549556-1-...@igel.co.jp/
- Focus on adding the I/O memory APIs;
  remove the vringh API unification and some fixes.
- Rebase on next-20230414

 drivers/vhost/Kconfig  |   6 ++
 drivers/vhost/vringh.c | 129 +
 include/linux/vringh.h |  33 +++
 3 files changed, 168 insertions(+)

diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index b455d9ab6f3d..4b0dbb4a8ab3 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -6,6 +6,12 @@ config VHOST_IOTLB
  This option is selected by any driver which needs to support
  an IOMMU in software.
 
+config VHOST_RING_IOMEM
+   tristate
+   select VHOST_IOMEM
+   help
+ This option enables the vringh APIs to support I/O memory space.
+
 config VHOST_RING
tristate
select VHOST_IOTLB
diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
index 955d938eb663..ce5a88eecc05 100644
--- a/drivers/vhost/vringh.c
+++ b/drivers/vhost/vringh.c
@@ -1604,4 +1604,133 @@ EXPORT_SYMBOL(vringh_need_notify_iotlb);
 
 #endif
 
+#if IS_REACHABLE(CONFIG_VHOST_RING_IOMEM)
+
+static inline int getu16_iomem(const struct vringh *vrh, u16 *val,
+  const __virtio16 *p)
+{
+   *val = vringh16_to_cpu(vrh, ioread16(p));
+   return 0;
+}
+
+static inline int putu16_iomem(const struct vringh *vrh, __virtio16 *p, u16 val)
+{
+   iowrite16(cpu_to_vringh16(vrh, val), p);
+   return 0;
+}
+
+static inline int copydesc_iomem(const struct vringh *vrh, void *dst,
+const void *src, size_t len)
+{
+   memcpy_fromio(dst, src, len);
+   return 0;
+}
+
+static int putused_iomem(const struct vringh *vrh, struct vring_used_elem *dst,
+const struct vring_used_elem *src, unsigned int num)
+{
+   memcpy_toio(dst, src, num * sizeof(*dst));
+   return 0;
+}
+
+static inline int xfer_from_iomem(const struct vringh *vrh, void *src,
+ void *dst, size_t len)
+{
+   memcpy_fromio(dst, src, len);
+   return 0;
+}
+
+static inline int xfer_to_iomem(const struct vringh *vrh, void *dst, void *src,
+   size_t len)
+{
+   memcpy_toio(dst, src, len);
+   return 0;
+}
+
+int vringh_init_iomem(struct vringh *vrh, u64 features, unsigned int num,
+ bool weak_barriers, struct vring_desc *desc,
+ struct vring_avail *avail, struct vring_used *used)
+{
+   return vringh_init_kern(vrh, features, num, weak_barriers, desc, avail,
+   used);
+}
+EXPORT_SYMBOL(vringh_init_iomem);
+
+int vringh_getdesc_iomem(struct vringh *vrh, struct vringh_kiov *riov,
+struct vringh_kiov *wiov, u16 *head, gfp_t gfp)
+{
+   int err;
+
+   err = __vringh_get_head(vrh, getu16_iomem, &vrh->last_avail_idx);
+   if (err < 0)
+   return err;
+
+   /* Empty... */
+   if (err == vrh->vring.num)
+   return 0;
+
+   *head = err;
+   err = __vringh_iov(vrh, *head, riov, wiov, no_range_check, NULL, gfp,
+  copydesc_iomem);
+   if (err)
+   return err;
+
+   return 1;
+}
+EXPORT_SYMBOL(vringh_getdesc_iomem);
+
+ssize_t vringh_iov_pull_iomem(struct vringh *vrh, struct vringh_kiov *riov,
+ void *dst, size_t len)
+{
+   return vringh_iov_xfer(vrh, riov, dst, len, xfer_from_iomem);
+}
+EXPORT_SYMBOL(vringh_iov_pull_iomem);
+
+ssize_t vringh_iov_push_iomem(struct vringh *vrh, struct vringh_kiov *wiov,
+ const void *src, size_t len)
+{
+   return vringh_iov_xfer(vrh, wiov, (void *)src, len, xfer_to_iomem);
+}
+EXPORT_SYMBOL(vringh_iov_push_iomem);
+
+void vringh_abandon_iomem(struct vringh *vrh, unsigned int num)
+{
+   /* We only update vring_avail_event(vr) when we want to be notified,
+* so we haven't changed that yet.
+*/
+   vrh->last_avail_idx -= num;
+}
+EXPORT_SYMBOL(vringh_abandon_iomem);
+
+int vringh_complete_iomem(struct vringh *vrh, u16 head, u32 len)
+{
+   struct vring_used_elem used;
+
+   used.id = cpu_to_vringh32(vrh, head);
+   used.len = cpu_to_vringh32(vrh, len);
+
+   return __vringh_complete(vrh, &used, 1, putu16_iomem, putused_iomem);
+}
+EXPORT_SYMBOL(vringh_complete_iomem);
+
+bool vringh_notify_enable_iomem(struct vringh *vrh)
+{
+   return __vringh_notify_enable(vrh, getu16_iomem, putu16_iomem);
+}
+EXPORT_SYMBOL(vringh_notify_enable_iomem);
+
+void vringh_notify_disable_iomem(struct vringh *vrh)
+{
+   __vringh_notify_disable(vrh, putu16_iomem);
+}
+EXPORT_SYMBOL(vringh_notify_disable_iomem);
+
+int vringh_need_notify_iomem(struct vringh *v

Re: [PATCH net] virtio-net: reject small vring sizes

2023-04-25 Thread Alvaro Karsz
> So, let's add some funky flags in virtio device to block out
> features, have core compare these before and after,
> detect change, reset and retry?

In the virtnet case, we'll decide which features to block based on the ring 
size.
2 < ring < MAX_FRAGS + 2  -> BLOCK GRO + MRG_RXBUF
ring < 2  -> BLOCK GRO + MRG_RXBUF + CTRL_VQ

So we'll need a new virtio callback instead of flags.

Furthermore, other virtio drivers may decide which features to block based on 
parameters different than ring size (I don't have a good example at the moment).
So maybe we should leave it to the driver to handle (during probe), and offer a 
virtio core function to re-negotiate the features?

In the solution I'm working on, I expose a new virtio core function that resets 
the device and renegotiates the received features.
+ A new virtio_config_ops callback peek_vqs_len to peek at the VQ lengths 
before calling find_vqs. (The callback must be called after the features 
negotiation)

So, the flow is something like:

* Super early in virtnet probe, we peek at the VQ lengths and decide if we are 
   using small vrings, if so, we reset and renegotiate the features.
* We continue normally and create the VQs.
* We check if the created rings are small.
   If they are and some blocked features were negotiated anyway (may occur if 
   the re-negotiation fails, or if the transport has no implementation for 
   peek_vqs_len), we fail probe.
   If the ring is small and the features are ok, we mark the virtnet device as 
   vring_small and fixup some variables.
 

peek_vqs_len is needed because we must know the VQ length before calling 
init_vqs.

During virtnet_find_vqs we check the following:
vi->has_cvq
vi->big_packets
vi->mergeable_rx_bufs

But these will change if the ring is small..

(Of course, another solution will be to re-negotiate features after init_vqs, 
but this will make a big mess, tons of things to clean and reconfigure)


The 2 < ring < MAX_FRAGS + 2 part is ready, I have tested a few cases and it is 
working.

I'm considering splitting the effort into 2 series.
A 2 < ring < MAX_FRAGS + 2  series, and a follow up series with the ring < 2 
case.

I'm also thinking about sending the first series as an RFC soon, so it will be 
more broadly tested.

What do you think?


Re: [PATCH net] virtio-net: reject small vring sizes

2023-04-25 Thread Michael S. Tsirkin
On Sun, Apr 23, 2023 at 12:28:49PM +, Alvaro Karsz wrote:
> 
> > > > The rest of stuff can probably just be moved to after find_vqs without
> > > > much pain.
> > > >
> > > Actually, I think that with a little bit of pain :)
> > > If we use small vrings and a GRO feature bit is set, Linux will need to 
> > > allocate 64KB of contiguous memory for every receive descriptor.
> > 
> > Oh right. Hmm. Well this is same as big packets though, isn't it?
> > 
> 
> Well, when VIRTIO_NET_F_MRG_RXBUF is not negotiated and one of the GRO 
> features is, the receive buffers are page size buffers chained together to 
> form a 64K buffer.
> In this case, do all the chained descriptors actually point to a single block 
> of contiguous memory, or is it possible for the descriptors to point to pages 
> spread all over?
> 
> > 
> > > Instead of failing probe if GRO/CVQ are set, can we just reset the device 
> > > if we discover small vrings and start over?
> > > Can we remember that this device uses small vrings, and then just avoid 
> > > negotiating the features that we cannot support?
> > 
> > 
> > We technically can of course. I am just not sure supporting CVQ with just 1 
> > s/g entry will
> > ever be viable.
> 
> Even if we won't support 1 s/g entry, do we want to fail probe in such cases?
> We could just disable the CVQ feature (with reset, as suggested before).
> I'm not saying that we should, just raising the option.
> 

So, let's add some funky flags in virtio device to block out
features, have core compare these before and after,
detect change, reset and retry?


-- 
MST



Re: [PATCH vhost v7 00/11] virtio core prepares for AF_XDP

2023-04-25 Thread Xuan Zhuo
On Tue, 25 Apr 2023 04:13:09 -0400, "Michael S. Tsirkin"  
wrote:
> On Tue, Apr 25, 2023 at 04:05:03PM +0800, Xuan Zhuo wrote:
> > On Tue, 25 Apr 2023 03:51:47 -0400, "Michael S. Tsirkin"  
> > wrote:
> > > On Tue, Apr 25, 2023 at 03:36:02PM +0800, Xuan Zhuo wrote:
> > > > ## About DMA APIs
> > > >
> > > > Now, virtio may not work with the DMA APIs when the virtio features do not 
> > > > have
> > > > VIRTIO_F_ACCESS_PLATFORM.
> > > >
> > > > 1. I tried to let DMA APIs return phy address by virtio-device. But DMA 
> > > > APIs just
> > > >work with the "real" devices.
> > > > 2. I tried to let xsk support callbacks to get phy address from 
> > > > virtio-net
> > > >driver as the dma address. But the maintainers of xsk may want to 
> > > > use dma-buf
> > > >to replace the DMA APIs. I think that may be a larger effort. We 
> > > > will wait
> > > >too long.
> > > >
> > > > So rethinking this, firstly, we can support premapped-dma only for 
> > > > devices with
> > > > VIRTIO_F_ACCESS_PLATFORM. In the case of af-xdp, if the users want to 
> > > > use it,
> > > > they have to update the device to support VIRTIO_F_RING_RESET, and they 
> > > > can also
> > > > enable the device's VIRTIO_F_ACCESS_PLATFORM feature by the way.
> > >
> > > I don't understand this last sentence. If you think ring
> > > reset can change device features then the answer is no, it can't.
> >
> >
> > Sorry, I should remove "by the way".
> >
> >
> > >
> > > If you are saying device has to set VIRTIO_F_ACCESS_PLATFORM to
> > > benefit from this work, that's fine at least as a first approach.
> > > Note that setting VIRTIO_F_ACCESS_PLATFORM breaks old guests
> > > (it's a security boundary), e.g. it is not available for
> > > transitional devices.
> > > So to support transitional devices, we might want to find another way to
> > > address this down the road,
> >
> > Maybe dma-buf is a way. I'll look into it, especially some practice on xsk.
> >
> > > but as a first step, I agree just going with
> > > DMA is fine.
> >
> >
> > Thanks.
>
> Pls do make sure to disable the feature when !VIRTIO_F_ACCESS_PLATFORM
> though.

If you refer to the implementation inside virtio-net, this feature will depend
on the return of virtqueue_dma_dev().

But virtqueue_dma_dev() depends on "use_dma_api". When xen_domain() is true and
!VIRTIO_F_ACCESS_PLATFORM, the "use_dma_api" is true.

So what kind of situation do you mean?

Thanks.

>
> >
> > >
> > >
> > > > Thanks for the help from Christoph.
> > > >
> > > > =
> > > >
> > > > XDP socket (AF_XDP) is an excellent kernel-bypass network framework. The 
> > > > zero
> > > > copy feature of xsk (XDP socket) needs to be supported by the driver. 
> > > > The
> > > > performance of zero copy is very good.
> > > >
> > > > ENV: Qemu with vhost.
> > > >
> > > >vhost cpu | Guest APP CPU |Guest Softirq CPU | PPS
> > > > -|---|--|
> > > > xmit by sockperf: 90%|   100%|  |  318967
> > > > xmit by xsk:  100%   |   30% |   33%| 1192064
> > > > recv by sockperf: 100%   |   68% |   100%   |  692288
> > > > recv by xsk:  100%   |   33% |   43%|  771670
> > > >
> > > > Before achieving the function of Virtio-Net, we also have to let virtio 
> > > > core
> > > > support these features:
> > > >
> > > > 1. virtio core support premapped
> > > > 2. virtio core support reset per-queue
> > > > 3. introduce DMA APIs to virtio core
> > > >
> > > > Please review.
> > > >
> > > > Thanks.
> > > >
> > > > v7:
> > > >  1. virtqueue_dma_dev() return NULL when virtio is without DMA API.
> > > >
> > > > v6:
> > > >  1. change the size of the flags to u32.
> > > >
> > > > v5:
> > > >  1. fix for error handler
> > > >  2. add flags to record internal dma mapping
> > > >
> > > > v4:
> > > >  1. rename map_inter to dma_map_internal
> > > >  2. fix: Excess function parameter 'vq' description in 
> > > > 'virtqueue_dma_dev'
> > > >
> > > > v3:
> > > >  1. add map_inter to struct desc state to record whether virtio core does 
> > > > dma map
> > > >
> > > > v2:
> > > >  1. based on sgs[0]->dma_address to judge whether it is premapped
> > > >  2. based on extra.addr to judge whether to do unmap for non-indirect desc
> > > >  3. based on indir_desc to judge whether to do unmap for indirect desc
> > > >  4. rename virtqueue_get_dma_dev to virtqueue_dma_dev
> > > >
> > > > v1:
> > > >  1. expose dma device. NO introduce the api for dma and sync
> > > >  2. split some commit for review.
> > > >
> > > > Xuan Zhuo (11):
> > > >   virtio_ring: split: separate dma codes
> > > >   virtio_ring: packed: separate dma codes
> > > >   virtio_ring: packed-indirect: separate dma codes
> > > >   virtio_ring: split: support premapped
> > > >   virtio_ring: packed: support premapped
> > > >   virtio_ring: packed-indirect: support premapped
> 

Re: [PATCH vhost v7 00/11] virtio core prepares for AF_XDP

2023-04-25 Thread Michael S. Tsirkin
On Tue, Apr 25, 2023 at 04:05:03PM +0800, Xuan Zhuo wrote:
> On Tue, 25 Apr 2023 03:51:47 -0400, "Michael S. Tsirkin"  
> wrote:
> > On Tue, Apr 25, 2023 at 03:36:02PM +0800, Xuan Zhuo wrote:
> > > ## About DMA APIs
> > >
> > > Now, virtio may not be able to work with DMA APIs when virtio features do not 
> > > have
> > > VIRTIO_F_ACCESS_PLATFORM.
> > >
> > > 1. I tried to let DMA APIs return phy address by virtio-device. But DMA 
> > > APIs just
> > >work with the "real" devices.
> > > 2. I tried to let xsk support callbacks to get phy address from virtio-net
> > >driver as the dma address. But the maintainers of xsk may want to use 
> > > dma-buf
> > >to replace the DMA APIs. I think that may be a larger effort. We will 
> > > wait
> > >too long.
> > >
> > > So rethinking this, firstly, we can support premapped-dma only for 
> > > devices with
> > > VIRTIO_F_ACCESS_PLATFORM. In the case of af-xdp, if the users want to use 
> > > it,
> > > they have to update the device to support VIRTIO_F_RING_RESET, and they 
> > > can also
> > > enable the device's VIRTIO_F_ACCESS_PLATFORM feature by the way.
> >
> > I don't understand this last sentence. If you think ring
> > reset can change device features then the answer is no, it can't.
> 
> 
> Sorry, I should remove "by the way".
> 
> 
> >
> > If you are saying device has to set VIRTIO_F_ACCESS_PLATFORM to
> > benefit from this work, that's fine at least as a first approach.
> > Note that setting VIRTIO_F_ACCESS_PLATFORM breaks old guests
> > (it's a security boundary), e.g. it is not available for
> > transitional devices.
> > So to support transitional devices, we might want to find another way to
> > address this down the road,
> 
> Maybe dma-buf is a way. I'll look into it, especially some practice on xsk.
> 
> > but as a first step, I agree just going with
> > DMA is fine.
> 
> 
> Thanks.

Pls do make sure to disable the feature when !VIRTIO_F_ACCESS_PLATFORM
though.

> 
> >
> >
> > > Thanks for the help from Christoph.
> > >
> > > =
> > >
> > > XDP socket (AF_XDP) is an excellent kernel-bypass network framework. The 
> > > zero
> > > copy feature of xsk (XDP socket) needs to be supported by the driver. The
> > > performance of zero copy is very good.
> > >
> > > ENV: Qemu with vhost.
> > >
> > >vhost cpu | Guest APP CPU |Guest Softirq CPU | PPS
> > > -|---|--|
> > > xmit by sockperf: 90%|   100%|  |  318967
> > > xmit by xsk:  100%   |   30% |   33%| 1192064
> > > recv by sockperf: 100%   |   68% |   100%   |  692288
> > > recv by xsk:  100%   |   33% |   43%|  771670
> > >
> > > Before achieving the function of Virtio-Net, we also have to let virtio 
> > > core
> > > support these features:
> > >
> > > 1. virtio core support premapped
> > > 2. virtio core support reset per-queue
> > > 3. introduce DMA APIs to virtio core
> > >
> > > Please review.
> > >
> > > Thanks.
> > >
> > > v7:
> > >  1. virtqueue_dma_dev() return NULL when virtio is without DMA API.
> > >
> > > v6:
> > >  1. change the size of the flags to u32.
> > >
> > > v5:
> > >  1. fix for error handler
> > >  2. add flags to record internal dma mapping
> > >
> > > v4:
> > >  1. rename map_inter to dma_map_internal
> > >  2. fix: Excess function parameter 'vq' description in 'virtqueue_dma_dev'
> > >
> > > v3:
> > >  1. add map_inter to struct desc state to record whether virtio core does 
> > > dma map
> > >
> > > v2:
> > >  1. based on sgs[0]->dma_address to judge whether it is premapped
> > >  2. based on extra.addr to judge whether to do unmap for non-indirect desc
> > >  3. based on indir_desc to judge whether to do unmap for indirect desc
> > >  4. rename virtqueue_get_dma_dev to virtqueue_dma_dev
> > >
> > > v1:
> > >  1. expose dma device. NO introduce the api for dma and sync
> > >  2. split some commit for review.
> > >
> > > Xuan Zhuo (11):
> > >   virtio_ring: split: separate dma codes
> > >   virtio_ring: packed: separate dma codes
> > >   virtio_ring: packed-indirect: separate dma codes
> > >   virtio_ring: split: support premapped
> > >   virtio_ring: packed: support premapped
> > >   virtio_ring: packed-indirect: support premapped
> > >   virtio_ring: update document for virtqueue_add_*
> > >   virtio_ring: introduce virtqueue_dma_dev()
> > >   virtio_ring: correct the expression of the description of
> > > virtqueue_resize()
> > >   virtio_ring: separate the logic of reset/enable from virtqueue_resize
> > >   virtio_ring: introduce virtqueue_reset()
> > >
> > >  drivers/virtio/virtio_ring.c | 352 +--
> > >  include/linux/virtio.h   |   4 +
> > >  2 files changed, 259 insertions(+), 97 deletions(-)
> > >
> > > --
> > > 2.32.0.3.g01195cf9f
> >

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH net-next] xsk: introduce xsk_dma_ops

2023-04-25 Thread Michael S. Tsirkin
On Thu, Apr 20, 2023 at 09:18:21AM -0700, Christoph Hellwig wrote:
> On Thu, Apr 20, 2023 at 05:11:48PM +0800, Xuan Zhuo wrote:
> > I know that the current design of DMA API only supports some physical 
> > hardware,
> > but can it be modified or expanded?
> 
> I think the important point is that for some cases there is no need
> to dma map at all, and upper layers should be fine by that by just
> doing the dma mapping in helpers called by the driver.
> 
> The virtio drivers then check if platform_access is set, then call the
> generic dma mapping helper, or if not just allocate memory using
> alloc_pages and also skip all the sync calls.

In theory, absolutely. In practice modern virtio devices are ok,
the reason we are stuck supporting old legacy ones is because legacy
devices are needed to run old guests. And then people turn
around and run a new guest on the same device,
for example because they switch back and forth e.g.
for data recovery? Or because whoever is selling the
host wants to opt for maximum compatibility.

Teaching all of linux to sometimes use dma and sometimes not
is a lot of work, and for limited benefit of these legacy systems.
We do it in a limited number of cases but generally
making DMA itself DTRT sounds more attractive.

So special DMA ops for these makes some sense: yes the
firmware described DMA is wrong on these boxes but
buggy firmware is not so unusual, is it?
Given virtio devices actually are on a virtual bus (the virtio bus)
sticking the fake DMA ops on this bus seems to make sense
as a way to express this quirk.

No?

-- 
MST



Re: [PATCH net-next v3 10/15] virtio_net: introduce receive_small_xdp()

2023-04-25 Thread Xuan Zhuo
On Tue, 25 Apr 2023 16:00:05 +0800, Xuan Zhuo  
wrote:
> On Tue, 25 Apr 2023 15:58:03 +0800, Jason Wang  wrote:
> > On Sun, Apr 23, 2023 at 6:58 PM Xuan Zhuo  
> > wrote:
> > >
> > > The purpose of this patch is to simplify the receive_small().
> > > Separate all the logic of XDP of small into a function.
> > >
> > > Signed-off-by: Xuan Zhuo 
> > > ---
> > >  drivers/net/virtio_net.c | 165 ---
> > >  1 file changed, 100 insertions(+), 65 deletions(-)
> > >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index de5a579e8603..9b5fd2e0d27f 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -931,6 +931,99 @@ static struct page *xdp_linearize_page(struct 
> > > receive_queue *rq,
> > > return NULL;
> > >  }
> > >
> > > +static struct sk_buff *receive_small_xdp(struct net_device *dev,
> > > +struct virtnet_info *vi,
> > > +struct receive_queue *rq,
> > > +struct bpf_prog *xdp_prog,
> > > +void *buf,
> > > +unsigned int xdp_headroom,
> > > +unsigned int len,
> > > +unsigned int *xdp_xmit,
> > > +struct virtnet_rq_stats *stats)
> > > +{
> > > +   unsigned int header_offset = VIRTNET_RX_PAD + xdp_headroom;
> > > +   unsigned int headroom = vi->hdr_len + header_offset;
> > > +   struct virtio_net_hdr_mrg_rxbuf *hdr = buf + header_offset;
> > > +   struct page *page = virt_to_head_page(buf);
> > > +   struct page *xdp_page;
> > > +   unsigned int buflen;
> > > +   struct xdp_buff xdp;
> > > +   struct sk_buff *skb;
> > > +   unsigned int delta = 0;
> > > +   unsigned int metasize = 0;
> > > +   void *orig_data;
> > > +   u32 act;
> > > +
> > > +   if (unlikely(hdr->hdr.gso_type))
> > > +   goto err_xdp;
> > > +
> > > +   buflen = SKB_DATA_ALIGN(GOOD_PACKET_LEN + headroom) +
> > > +   SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> > > +
> > > +   if (unlikely(xdp_headroom < virtnet_get_headroom(vi))) {
> > > +   int offset = buf - page_address(page) + header_offset;
> > > +   unsigned int tlen = len + vi->hdr_len;
> > > +   int num_buf = 1;
> > > +
> > > +   xdp_headroom = virtnet_get_headroom(vi);
> > > +   header_offset = VIRTNET_RX_PAD + xdp_headroom;
> > > +   headroom = vi->hdr_len + header_offset;
> > > +   buflen = SKB_DATA_ALIGN(GOOD_PACKET_LEN + headroom) +
> > > +   SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> > > +   xdp_page = xdp_linearize_page(rq, &num_buf, page,
> > > + offset, header_offset,
> > > + &tlen);
> > > +   if (!xdp_page)
> > > +   goto err_xdp;
> > > +
> > > +   buf = page_address(xdp_page);
> > > +   put_page(page);
> > > +   page = xdp_page;
> > > +   }
> > > +
> > > +   xdp_init_buff(&xdp, buflen, &rq->xdp_rxq);
> > > +   xdp_prepare_buff(&xdp, buf + VIRTNET_RX_PAD + vi->hdr_len,
> > > +xdp_headroom, len, true);
> > > +   orig_data = xdp.data;
> > > +
> > > +   act = virtnet_xdp_handler(xdp_prog, &xdp, dev, xdp_xmit, stats);
> > > +
> > > +   switch (act) {
> > > +   case XDP_PASS:
> > > +   /* Recalculate length in case bpf program changed it */
> > > +   delta = orig_data - xdp.data;
> > > +   len = xdp.data_end - xdp.data;
> > > +   metasize = xdp.data - xdp.data_meta;
> > > +   break;
> > > +
> > > +   case XDP_TX:
> > > +   case XDP_REDIRECT:
> > > +   goto xdp_xmit;
> > > +
> > > +   default:
> > > +   goto err_xdp;
> > > +   }
> > > +
> > > +   skb = build_skb(buf, buflen);
> > > +   if (!skb)
> > > +   goto err;
> > > +
> > > +   skb_reserve(skb, headroom - delta);
> > > +   skb_put(skb, len);
> > > +   if (metasize)
> > > +   skb_metadata_set(skb, metasize);
> > > +
> > > +   return skb;
> > > +
> > > +err_xdp:
> > > +   stats->xdp_drops++;
> > > +err:
> > > +   stats->drops++;
> > > +   put_page(page);
> > > +xdp_xmit:
> > > +   return NULL;
> > > +}
> >
> > It looks like some of the comments of the above version are not addressed?
> >
> > "
> > So we end up with some code duplication between receive_small() and
> > receive_small_xdp() on building skbs. Is this intended?
> > "
>
> I answered you in the #13 commit of the above version. This patch-set has
> optimized this with the last two commits.

Re: [PATCH vhost v7 00/11] virtio core prepares for AF_XDP

2023-04-25 Thread Xuan Zhuo
On Tue, 25 Apr 2023 03:51:47 -0400, "Michael S. Tsirkin"  
wrote:
> On Tue, Apr 25, 2023 at 03:36:02PM +0800, Xuan Zhuo wrote:
> > ## About DMA APIs
> >
> > Now, virtio may not be able to work with DMA APIs when virtio features do not have
> > VIRTIO_F_ACCESS_PLATFORM.
> >
> > 1. I tried to let DMA APIs return phy address by virtio-device. But DMA 
> > APIs just
> >work with the "real" devices.
> > 2. I tried to let xsk support callbacks to get phy address from virtio-net
> >driver as the dma address. But the maintainers of xsk may want to use 
> > dma-buf
> >to replace the DMA APIs. I think that may be a larger effort. We will 
> > wait
> >too long.
> >
> > So rethinking this, firstly, we can support premapped-dma only for devices 
> > with
> > VIRTIO_F_ACCESS_PLATFORM. In the case of af-xdp, if the users want to use 
> > it,
> > they have to update the device to support VIRTIO_F_RING_RESET, and they can 
> > also
> > enable the device's VIRTIO_F_ACCESS_PLATFORM feature by the way.
>
> I don't understand this last sentence. If you think ring
> reset can change device features then the answer is no, it can't.


Sorry, I should remove "by the way".


>
> If you are saying device has to set VIRTIO_F_ACCESS_PLATFORM to
> benefit from this work, that's fine at least as a first approach.
> Note that setting VIRTIO_F_ACCESS_PLATFORM breaks old guests
> (it's a security boundary), e.g. it is not available for
> transitional devices.
> So to support transitional devices, we might want to find another way to
> address this down the road,

Maybe dma-buf is a way. I'll look into it, especially some practice on xsk.

> but as a first step, I agree just going with
> DMA is fine.


Thanks.


>
>
> > Thanks for the help from Christoph.
> >
> > =
> >
> > XDP socket (AF_XDP) is an excellent kernel-bypass network framework. The zero
> > copy feature of xsk (XDP socket) needs to be supported by the driver. The
> > performance of zero copy is very good.
> >
> > ENV: Qemu with vhost.
> >
> >vhost cpu | Guest APP CPU |Guest Softirq CPU | PPS
> > -|---|--|
> > xmit by sockperf: 90%|   100%|  |  318967
> > xmit by xsk:  100%   |   30% |   33%| 1192064
> > recv by sockperf: 100%   |   68% |   100%   |  692288
> > recv by xsk:  100%   |   33% |   43%|  771670
> >
> > Before achieving the function of Virtio-Net, we also have to let virtio core
> > support these features:
> >
> > 1. virtio core support premapped
> > 2. virtio core support reset per-queue
> > 3. introduce DMA APIs to virtio core
> >
> > Please review.
> >
> > Thanks.
> >
> > v7:
> >  1. virtqueue_dma_dev() return NULL when virtio is without DMA API.
> >
> > v6:
> >  1. change the size of the flags to u32.
> >
> > v5:
> >  1. fix for error handler
> >  2. add flags to record internal dma mapping
> >
> > v4:
> >  1. rename map_inter to dma_map_internal
> >  2. fix: Excess function parameter 'vq' description in 'virtqueue_dma_dev'
> >
> > v3:
> >  1. add map_inter to struct desc state to record whether virtio core does dma 
> > map
> >
> > v2:
> >  1. based on sgs[0]->dma_address to judge whether it is premapped
> >  2. based on extra.addr to judge whether to do unmap for non-indirect desc
> >  3. based on indir_desc to judge whether to do unmap for indirect desc
> >  4. rename virtqueue_get_dma_dev to virtqueue_dma_dev
> >
> > v1:
> >  1. expose dma device. NO introduce the api for dma and sync
> >  2. split some commit for review.
> >
> > Xuan Zhuo (11):
> >   virtio_ring: split: separate dma codes
> >   virtio_ring: packed: separate dma codes
> >   virtio_ring: packed-indirect: separate dma codes
> >   virtio_ring: split: support premapped
> >   virtio_ring: packed: support premapped
> >   virtio_ring: packed-indirect: support premapped
> >   virtio_ring: update document for virtqueue_add_*
> >   virtio_ring: introduce virtqueue_dma_dev()
> >   virtio_ring: correct the expression of the description of
> > virtqueue_resize()
> >   virtio_ring: separate the logic of reset/enable from virtqueue_resize
> >   virtio_ring: introduce virtqueue_reset()
> >
> >  drivers/virtio/virtio_ring.c | 352 +--
> >  include/linux/virtio.h   |   4 +
> >  2 files changed, 259 insertions(+), 97 deletions(-)
> >
> > --
> > 2.32.0.3.g01195cf9f
>


Re: [PATCH net-next v3 10/15] virtio_net: introduce receive_small_xdp()

2023-04-25 Thread Xuan Zhuo
On Tue, 25 Apr 2023 15:58:03 +0800, Jason Wang  wrote:
> On Sun, Apr 23, 2023 at 6:58 PM Xuan Zhuo  wrote:
> >
> > The purpose of this patch is to simplify the receive_small().
> > Separate all the logic of XDP of small into a function.
> >
> > Signed-off-by: Xuan Zhuo 
> > ---
> >  drivers/net/virtio_net.c | 165 ---
> >  1 file changed, 100 insertions(+), 65 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index de5a579e8603..9b5fd2e0d27f 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -931,6 +931,99 @@ static struct page *xdp_linearize_page(struct 
> > receive_queue *rq,
> > return NULL;
> >  }
> >
> > +static struct sk_buff *receive_small_xdp(struct net_device *dev,
> > +struct virtnet_info *vi,
> > +struct receive_queue *rq,
> > +struct bpf_prog *xdp_prog,
> > +void *buf,
> > +unsigned int xdp_headroom,
> > +unsigned int len,
> > +unsigned int *xdp_xmit,
> > +struct virtnet_rq_stats *stats)
> > +{
> > +   unsigned int header_offset = VIRTNET_RX_PAD + xdp_headroom;
> > +   unsigned int headroom = vi->hdr_len + header_offset;
> > +   struct virtio_net_hdr_mrg_rxbuf *hdr = buf + header_offset;
> > +   struct page *page = virt_to_head_page(buf);
> > +   struct page *xdp_page;
> > +   unsigned int buflen;
> > +   struct xdp_buff xdp;
> > +   struct sk_buff *skb;
> > +   unsigned int delta = 0;
> > +   unsigned int metasize = 0;
> > +   void *orig_data;
> > +   u32 act;
> > +
> > +   if (unlikely(hdr->hdr.gso_type))
> > +   goto err_xdp;
> > +
> > +   buflen = SKB_DATA_ALIGN(GOOD_PACKET_LEN + headroom) +
> > +   SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> > +
> > +   if (unlikely(xdp_headroom < virtnet_get_headroom(vi))) {
> > +   int offset = buf - page_address(page) + header_offset;
> > +   unsigned int tlen = len + vi->hdr_len;
> > +   int num_buf = 1;
> > +
> > +   xdp_headroom = virtnet_get_headroom(vi);
> > +   header_offset = VIRTNET_RX_PAD + xdp_headroom;
> > +   headroom = vi->hdr_len + header_offset;
> > +   buflen = SKB_DATA_ALIGN(GOOD_PACKET_LEN + headroom) +
> > +   SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> > +   xdp_page = xdp_linearize_page(rq, &num_buf, page,
> > + offset, header_offset,
> > + &tlen);
> > +   if (!xdp_page)
> > +   goto err_xdp;
> > +
> > +   buf = page_address(xdp_page);
> > +   put_page(page);
> > +   page = xdp_page;
> > +   }
> > +
> > +   xdp_init_buff(&xdp, buflen, &rq->xdp_rxq);
> > +   xdp_prepare_buff(&xdp, buf + VIRTNET_RX_PAD + vi->hdr_len,
> > +xdp_headroom, len, true);
> > +   orig_data = xdp.data;
> > +
> > +   act = virtnet_xdp_handler(xdp_prog, &xdp, dev, xdp_xmit, stats);
> > +
> > +   switch (act) {
> > +   case XDP_PASS:
> > +   /* Recalculate length in case bpf program changed it */
> > +   delta = orig_data - xdp.data;
> > +   len = xdp.data_end - xdp.data;
> > +   metasize = xdp.data - xdp.data_meta;
> > +   break;
> > +
> > +   case XDP_TX:
> > +   case XDP_REDIRECT:
> > +   goto xdp_xmit;
> > +
> > +   default:
> > +   goto err_xdp;
> > +   }
> > +
> > +   skb = build_skb(buf, buflen);
> > +   if (!skb)
> > +   goto err;
> > +
> > +   skb_reserve(skb, headroom - delta);
> > +   skb_put(skb, len);
> > +   if (metasize)
> > +   skb_metadata_set(skb, metasize);
> > +
> > +   return skb;
> > +
> > +err_xdp:
> > +   stats->xdp_drops++;
> > +err:
> > +   stats->drops++;
> > +   put_page(page);
> > +xdp_xmit:
> > +   return NULL;
> > +}
>
> It looks like some of the comments of the above version are not addressed?
>
> "
> So we end up with some code duplication between receive_small() and
> receive_small_xdp() on building skbs. Is this intended?
> "

I answered you in the #13 commit of the above version. This patch-set has
optimized this with the last two commits; this commit itself is unchanged.

Thanks.


>
> Thanks
>
> > +
> >  static struct sk_buff *receive_small(struct net_device *dev,
> >  struct virtnet_info *vi,
> >  struct receive_queue *rq,
> > @@ -947,9 +1040,6 @@ static struct

Re: [PATCH net-next v3 10/15] virtio_net: introduce receive_small_xdp()

2023-04-25 Thread Jason Wang
On Sun, Apr 23, 2023 at 6:58 PM Xuan Zhuo  wrote:
>
> The purpose of this patch is to simplify the receive_small().
> Separate all the logic of XDP of small into a function.
>
> Signed-off-by: Xuan Zhuo 
> ---
>  drivers/net/virtio_net.c | 165 ---
>  1 file changed, 100 insertions(+), 65 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index de5a579e8603..9b5fd2e0d27f 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -931,6 +931,99 @@ static struct page *xdp_linearize_page(struct 
> receive_queue *rq,
> return NULL;
>  }
>
> +static struct sk_buff *receive_small_xdp(struct net_device *dev,
> +struct virtnet_info *vi,
> +struct receive_queue *rq,
> +struct bpf_prog *xdp_prog,
> +void *buf,
> +unsigned int xdp_headroom,
> +unsigned int len,
> +unsigned int *xdp_xmit,
> +struct virtnet_rq_stats *stats)
> +{
> +   unsigned int header_offset = VIRTNET_RX_PAD + xdp_headroom;
> +   unsigned int headroom = vi->hdr_len + header_offset;
> +   struct virtio_net_hdr_mrg_rxbuf *hdr = buf + header_offset;
> +   struct page *page = virt_to_head_page(buf);
> +   struct page *xdp_page;
> +   unsigned int buflen;
> +   struct xdp_buff xdp;
> +   struct sk_buff *skb;
> +   unsigned int delta = 0;
> +   unsigned int metasize = 0;
> +   void *orig_data;
> +   u32 act;
> +
> +   if (unlikely(hdr->hdr.gso_type))
> +   goto err_xdp;
> +
> +   buflen = SKB_DATA_ALIGN(GOOD_PACKET_LEN + headroom) +
> +   SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> +
> +   if (unlikely(xdp_headroom < virtnet_get_headroom(vi))) {
> +   int offset = buf - page_address(page) + header_offset;
> +   unsigned int tlen = len + vi->hdr_len;
> +   int num_buf = 1;
> +
> +   xdp_headroom = virtnet_get_headroom(vi);
> +   header_offset = VIRTNET_RX_PAD + xdp_headroom;
> +   headroom = vi->hdr_len + header_offset;
> +   buflen = SKB_DATA_ALIGN(GOOD_PACKET_LEN + headroom) +
> +   SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> +   xdp_page = xdp_linearize_page(rq, &num_buf, page,
> + offset, header_offset,
> + &tlen);
> +   if (!xdp_page)
> +   goto err_xdp;
> +
> +   buf = page_address(xdp_page);
> +   put_page(page);
> +   page = xdp_page;
> +   }
> +
> +   xdp_init_buff(&xdp, buflen, &rq->xdp_rxq);
> +   xdp_prepare_buff(&xdp, buf + VIRTNET_RX_PAD + vi->hdr_len,
> +xdp_headroom, len, true);
> +   orig_data = xdp.data;
> +
> +   act = virtnet_xdp_handler(xdp_prog, &xdp, dev, xdp_xmit, stats);
> +
> +   switch (act) {
> +   case XDP_PASS:
> +   /* Recalculate length in case bpf program changed it */
> +   delta = orig_data - xdp.data;
> +   len = xdp.data_end - xdp.data;
> +   metasize = xdp.data - xdp.data_meta;
> +   break;
> +
> +   case XDP_TX:
> +   case XDP_REDIRECT:
> +   goto xdp_xmit;
> +
> +   default:
> +   goto err_xdp;
> +   }
> +
> +   skb = build_skb(buf, buflen);
> +   if (!skb)
> +   goto err;
> +
> +   skb_reserve(skb, headroom - delta);
> +   skb_put(skb, len);
> +   if (metasize)
> +   skb_metadata_set(skb, metasize);
> +
> +   return skb;
> +
> +err_xdp:
> +   stats->xdp_drops++;
> +err:
> +   stats->drops++;
> +   put_page(page);
> +xdp_xmit:
> +   return NULL;
> +}

It looks like some of the comments of the above version are not addressed?

"
So we end up with some code duplication between receive_small() and
receive_small_xdp() on building skbs. Is this intended?
"

Thanks

> +
>  static struct sk_buff *receive_small(struct net_device *dev,
>  struct virtnet_info *vi,
>  struct receive_queue *rq,
> @@ -947,9 +1040,6 @@ static struct sk_buff *receive_small(struct net_device 
> *dev,
> unsigned int buflen = SKB_DATA_ALIGN(GOOD_PACKET_LEN + headroom) +
>   SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> struct page *page = virt_to_head_page(buf);
> -   unsigned int delta = 0;
> -   struct page *xdp_page;
> -   unsigned int metasize = 0;
>
> len -= vi->hdr_len;
> stats->bytes += len;
> @@ -969,56 +1059,10 @@ stat

Re: [PATCH vhost v7 00/11] virtio core prepares for AF_XDP

2023-04-25 Thread Michael S. Tsirkin
On Tue, Apr 25, 2023 at 03:36:02PM +0800, Xuan Zhuo wrote:
> ## About DMA APIs
> 
> Now, virtio may not be able to work with DMA APIs when virtio features do not have
> VIRTIO_F_ACCESS_PLATFORM.
> 
> 1. I tried to let DMA APIs return phy address by virtio-device. But DMA APIs 
> just
>work with the "real" devices.
> 2. I tried to let xsk support callbacks to get phy address from virtio-net
>driver as the dma address. But the maintainers of xsk may want to use 
> dma-buf
>to replace the DMA APIs. I think that may be a larger effort. We will wait
>too long.
> 
> So rethinking this, firstly, we can support premapped-dma only for devices 
> with
> VIRTIO_F_ACCESS_PLATFORM. In the case of af-xdp, if the users want to use it,
> they have to update the device to support VIRTIO_F_RING_RESET, and they can 
> also
> enable the device's VIRTIO_F_ACCESS_PLATFORM feature by the way.

I don't understand this last sentence. If you think ring
reset can change device features then the answer is no, it can't.

If you are saying device has to set VIRTIO_F_ACCESS_PLATFORM to
benefit from this work, that's fine at least as a first approach.
Note that setting VIRTIO_F_ACCESS_PLATFORM breaks old guests
(it's a security boundary), e.g. it is not available for
transitional devices.
So to support transitional devices, we might want to find another way to
address this down the road, but as a first step, I agree just going with
DMA is fine.


> Thanks for the help from Christoph.
> 
> =
> 
> XDP socket (AF_XDP) is an excellent kernel-bypass network framework. The zero
> copy feature of xsk (XDP socket) needs to be supported by the driver. The
> performance of zero copy is very good.
> 
> ENV: Qemu with vhost.
> 
>vhost cpu | Guest APP CPU |Guest Softirq CPU | PPS
> -|---|--|
> xmit by sockperf: 90%|   100%|  |  318967
> xmit by xsk:  100%   |   30% |   33%| 1192064
> recv by sockperf: 100%   |   68% |   100%   |  692288
> recv by xsk:  100%   |   33% |   43%|  771670
> 
> Before achieving the function of Virtio-Net, we also have to let virtio core
> support these features:
> 
> 1. virtio core support premapped
> 2. virtio core support reset per-queue
> 3. introduce DMA APIs to virtio core
> 
> Please review.
> 
> Thanks.
> 
> v7:
>  1. virtqueue_dma_dev() return NULL when virtio is without DMA API.
> 
> v6:
>  1. change the size of the flags to u32.
> 
> v5:
>  1. fix for error handler
>  2. add flags to record internal dma mapping
> 
> v4:
>  1. rename map_inter to dma_map_internal
>  2. fix: Excess function parameter 'vq' description in 'virtqueue_dma_dev'
> 
> v3:
>  1. add map_inter to struct desc state to record whether virtio core does dma 
> map
> 
> v2:
>  1. based on sgs[0]->dma_address to judge whether it is premapped
>  2. based on extra.addr to judge whether to do unmap for non-indirect desc
>  3. based on indir_desc to judge whether to do unmap for indirect desc
>  4. rename virtqueue_get_dma_dev to virtqueue_dma_dev
> 
> v1:
>  1. expose dma device. NO introduce the api for dma and sync
>  2. split some commit for review.
> 
> Xuan Zhuo (11):
>   virtio_ring: split: separate dma codes
>   virtio_ring: packed: separate dma codes
>   virtio_ring: packed-indirect: separate dma codes
>   virtio_ring: split: support premapped
>   virtio_ring: packed: support premapped
>   virtio_ring: packed-indirect: support premapped
>   virtio_ring: update document for virtqueue_add_*
>   virtio_ring: introduce virtqueue_dma_dev()
>   virtio_ring: correct the expression of the description of
> virtqueue_resize()
>   virtio_ring: separate the logic of reset/enable from virtqueue_resize
>   virtio_ring: introduce virtqueue_reset()
> 
>  drivers/virtio/virtio_ring.c | 352 +--
>  include/linux/virtio.h   |   4 +
>  2 files changed, 259 insertions(+), 97 deletions(-)
> 
> --
> 2.32.0.3.g01195cf9f



Re: [PATCH net-next v3 07/15] virtio_net: auto release xdp shinfo

2023-04-25 Thread Xuan Zhuo
On Tue, 25 Apr 2023 15:41:28 +0800, Jason Wang  wrote:
> On Sun, Apr 23, 2023 at 6:57 PM Xuan Zhuo  wrote:
> >
> > virtnet_build_xdp_buff_mrg() and virtnet_xdp_handler() automatically
> > release the xdp shinfo, so the caller no longer needs to take care of it.
>
> Thinking of this, I think releasing frags in
> virtnet_build_xdp_buff_mrg() is fine. But for virtnet_xdp_handler(),
> it's better to be done by the caller, since the frags were prepared by
> the caller anyhow.


I agree this.

Thanks.


>
> Thanks
>
[...]

Re: [PATCH net-next v3 07/15] virtio_net: auto release xdp shinfo

2023-04-25 Thread Jason Wang
On Sun, Apr 23, 2023 at 6:57 PM Xuan Zhuo  wrote:
>
> virtnet_build_xdp_buff_mrg() and virtnet_xdp_handler() now release the
> xdp shinfo automatically, so the caller no longer needs to take care of it.

Thinking of this, I think releasing frags in
virtnet_build_xdp_buff_mrg() is fine. But for virtnet_xdp_handler(),
it's better to be done by the caller, since the frags were prepared by
the caller anyhow.

Thanks

>
> Signed-off-by: Xuan Zhuo 
> ---
>  drivers/net/virtio_net.c | 29 +
>  1 file changed, 17 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 5f37a1cef61e..c6bf425e8844 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -825,7 +825,7 @@ static int virtnet_xdp_handler(struct bpf_prog *xdp_prog, 
> struct xdp_buff *xdp,
> xdpf = xdp_convert_buff_to_frame(xdp);
> if (unlikely(!xdpf)) {
> netdev_dbg(dev, "convert buff to frame failed for 
> xdp\n");
> -   return XDP_DROP;
> +   goto drop;
> }
>
> err = virtnet_xdp_xmit(dev, 1, &xdpf, 0);
> @@ -833,7 +833,7 @@ static int virtnet_xdp_handler(struct bpf_prog *xdp_prog, 
> struct xdp_buff *xdp,
> xdp_return_frame_rx_napi(xdpf);
> } else if (unlikely(err < 0)) {
> trace_xdp_exception(dev, xdp_prog, act);
> -   return XDP_DROP;
> +   goto drop;
> }
> *xdp_xmit |= VIRTIO_XDP_TX;
> return act;
> @@ -842,7 +842,7 @@ static int virtnet_xdp_handler(struct bpf_prog *xdp_prog, 
> struct xdp_buff *xdp,
> stats->xdp_redirects++;
> err = xdp_do_redirect(dev, xdp, xdp_prog);
> if (err)
> -   return XDP_DROP;
> +   goto drop;
>
> *xdp_xmit |= VIRTIO_XDP_REDIR;
> return act;
> @@ -854,8 +854,12 @@ static int virtnet_xdp_handler(struct bpf_prog 
> *xdp_prog, struct xdp_buff *xdp,
> trace_xdp_exception(dev, xdp_prog, act);
> fallthrough;
> case XDP_DROP:
> -   return XDP_DROP;
> +   break;
> }
> +
> +drop:
> +   put_xdp_frags(xdp);
> +   return XDP_DROP;
>  }
>
>  static unsigned int virtnet_get_headroom(struct virtnet_info *vi)
> @@ -1190,7 +1194,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device 
> *dev,
>  dev->name, *num_buf,
>  virtio16_to_cpu(vi->vdev, hdr->num_buffers));
> dev->stats.rx_length_errors++;
> -   return -EINVAL;
> +   goto err;
> }
>
> stats->bytes += len;
> @@ -1209,7 +1213,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device 
> *dev,
> pr_debug("%s: rx error: len %u exceeds truesize 
> %lu\n",
>  dev->name, len, (unsigned long)(truesize - 
> room));
> dev->stats.rx_length_errors++;
> -   return -EINVAL;
> +   goto err;
> }
>
> frag = &shinfo->frags[shinfo->nr_frags++];
> @@ -1224,6 +1228,10 @@ static int virtnet_build_xdp_buff_mrg(struct 
> net_device *dev,
>
> *xdp_frags_truesize = xdp_frags_truesz;
> return 0;
> +
> +err:
> +   put_xdp_frags(xdp);
> +   return -EINVAL;
>  }
>
>  static void *mergeable_xdp_get_buf(struct virtnet_info *vi,
> @@ -1353,7 +1361,7 @@ static struct sk_buff *receive_mergeable(struct 
> net_device *dev,
> err = virtnet_build_xdp_buff_mrg(dev, vi, rq, &xdp, data, 
> len, frame_sz,
>  &num_buf, &xdp_frags_truesz, 
> stats);
> if (unlikely(err))
> -   goto err_xdp_frags;
> +   goto err_xdp;
>
> act = virtnet_xdp_handler(xdp_prog, &xdp, dev, xdp_xmit, 
> stats);
>
> @@ -1361,7 +1369,7 @@ static struct sk_buff *receive_mergeable(struct 
> net_device *dev,
> case XDP_PASS:
> head_skb = build_skb_from_xdp_buff(dev, vi, &xdp, 
> xdp_frags_truesz);
> if (unlikely(!head_skb))
> -   goto err_xdp_frags;
> +   goto err_xdp;
>
> rcu_read_unlock();
> return head_skb;
> @@ -1370,11 +1378,8 @@ static struct sk_buff *receive_mergeable(struct 
> net_device *dev,
> rcu_read_unlock();
> goto xdp_xmit;
> default:
> -   break;
> +   goto err_xdp;
> }
> -err_xdp_frags:
> -   put_xdp_frags(&xdp);
> -

[PATCH vhost v7 11/11] virtio_ring: introduce virtqueue_reset()

2023-04-25 Thread Xuan Zhuo
Introduce virtqueue_reset() to release all buffers inside the vq.

Signed-off-by: Xuan Zhuo 
Acked-by: Jason Wang 
---
 drivers/virtio/virtio_ring.c | 33 +
 include/linux/virtio.h   |  2 ++
 2 files changed, 35 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 98b3931bf346..764488a6ee0d 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2809,6 +2809,39 @@ int virtqueue_resize(struct virtqueue *_vq, u32 num,
 }
 EXPORT_SYMBOL_GPL(virtqueue_resize);
 
+/**
+ * virtqueue_reset - detach and recycle all unused buffers
+ * @_vq: the struct virtqueue we're talking about.
+ * @recycle: callback to recycle unused buffers
+ *
+ * Caller must ensure we don't call this with other virtqueue operations
+ * at the same time (except where noted).
+ *
+ * Returns zero or a negative error.
+ * 0: success.
+ * -EBUSY: Failed to sync with device, vq may not work properly
+ * -ENOENT: Transport or device not supported
+ * -EPERM: Operation not permitted
+ */
+int virtqueue_reset(struct virtqueue *_vq,
+   void (*recycle)(struct virtqueue *vq, void *buf))
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+   int err;
+
+   err = virtqueue_disable_and_recycle(_vq, recycle);
+   if (err)
+   return err;
+
+   if (vq->packed_ring)
+   virtqueue_reinit_packed(vq);
+   else
+   virtqueue_reinit_split(vq);
+
+   return virtqueue_enable_after_reset(_vq);
+}
+EXPORT_SYMBOL_GPL(virtqueue_reset);
+
 /* Only available for split ring */
 struct virtqueue *vring_new_virtqueue(unsigned int index,
  unsigned int num,
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 5ebef5d50f04..bb449b857829 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -97,6 +97,8 @@ dma_addr_t virtqueue_get_used_addr(const struct virtqueue 
*vq);
 
 int virtqueue_resize(struct virtqueue *vq, u32 num,
 void (*recycle)(struct virtqueue *vq, void *buf));
+int virtqueue_reset(struct virtqueue *vq,
+   void (*recycle)(struct virtqueue *vq, void *buf));
 
 /**
  * struct virtio_device - representation of a device using virtio
-- 
2.32.0.3.g01195cf9f

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH vhost v7 09/11] virtio_ring: correct the expression of the description of virtqueue_resize()

2023-04-25 Thread Xuan Zhuo
Replace "useless" with the more accurate "unused".

Signed-off-by: Xuan Zhuo 
Acked-by: Jason Wang 
---
 drivers/virtio/virtio_ring.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 871ac552b3b6..af46746c9637 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2724,7 +2724,7 @@ EXPORT_SYMBOL_GPL(vring_create_virtqueue_dma);
  * virtqueue_resize - resize the vring of vq
  * @_vq: the struct virtqueue we're talking about.
  * @num: new ring num
- * @recycle: callback for recycle the useless buffer
+ * @recycle: callback to recycle unused buffers
  *
  * When it is really necessary to create a new vring, it will set the current 
vq
  * into the reset state. Then call the passed callback to recycle the buffer
-- 
2.32.0.3.g01195cf9f



[PATCH vhost v7 10/11] virtio_ring: separate the logic of reset/enable from virtqueue_resize

2023-04-25 Thread Xuan Zhuo
The subsequent reset function will reuse this logic.

Signed-off-by: Xuan Zhuo 
Acked-by: Jason Wang 
---
 drivers/virtio/virtio_ring.c | 58 
 1 file changed, 39 insertions(+), 19 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index af46746c9637..98b3931bf346 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2184,6 +2184,43 @@ static int virtqueue_resize_packed(struct virtqueue 
*_vq, u32 num)
return -ENOMEM;
 }
 
+static int virtqueue_disable_and_recycle(struct virtqueue *_vq,
+void (*recycle)(struct virtqueue *vq, 
void *buf))
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+   struct virtio_device *vdev = vq->vq.vdev;
+   void *buf;
+   int err;
+
+   if (!vq->we_own_ring)
+   return -EPERM;
+
+   if (!vdev->config->disable_vq_and_reset)
+   return -ENOENT;
+
+   if (!vdev->config->enable_vq_after_reset)
+   return -ENOENT;
+
+   err = vdev->config->disable_vq_and_reset(_vq);
+   if (err)
+   return err;
+
+   while ((buf = virtqueue_detach_unused_buf(_vq)) != NULL)
+   recycle(_vq, buf);
+
+   return 0;
+}
+
+static int virtqueue_enable_after_reset(struct virtqueue *_vq)
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+   struct virtio_device *vdev = vq->vq.vdev;
+
+   if (vdev->config->enable_vq_after_reset(_vq))
+   return -EBUSY;
+
+   return 0;
+}
 
 /*
  * Generic functions and exported symbols.
@@ -2748,13 +2785,8 @@ int virtqueue_resize(struct virtqueue *_vq, u32 num,
 void (*recycle)(struct virtqueue *vq, void *buf))
 {
struct vring_virtqueue *vq = to_vvq(_vq);
-   struct virtio_device *vdev = vq->vq.vdev;
-   void *buf;
int err;
 
-   if (!vq->we_own_ring)
-   return -EPERM;
-
if (num > vq->vq.num_max)
return -E2BIG;
 
@@ -2764,28 +2796,16 @@ int virtqueue_resize(struct virtqueue *_vq, u32 num,
if ((vq->packed_ring ? vq->packed.vring.num : vq->split.vring.num) == 
num)
return 0;
 
-   if (!vdev->config->disable_vq_and_reset)
-   return -ENOENT;
-
-   if (!vdev->config->enable_vq_after_reset)
-   return -ENOENT;
-
-   err = vdev->config->disable_vq_and_reset(_vq);
+   err = virtqueue_disable_and_recycle(_vq, recycle);
if (err)
return err;
 
-   while ((buf = virtqueue_detach_unused_buf(_vq)) != NULL)
-   recycle(_vq, buf);
-
if (vq->packed_ring)
err = virtqueue_resize_packed(_vq, num);
else
err = virtqueue_resize_split(_vq, num);
 
-   if (vdev->config->enable_vq_after_reset(_vq))
-   return -EBUSY;
-
-   return err;
+   return virtqueue_enable_after_reset(_vq);
 }
 EXPORT_SYMBOL_GPL(virtqueue_resize);
 
-- 
2.32.0.3.g01195cf9f



[PATCH vhost v7 06/11] virtio_ring: packed-indirect: support premapped

2023-04-25 Thread Xuan Zhuo
The virtio core only supports virtual addresses; DMA mapping is
completed inside the virtio core.

In some scenarios (such as AF_XDP), the memory is allocated and DMA
mapping is completed in advance, so it is necessary for us to support
passing the DMA address to the virtio core.

Drivers can use sg->dma_address to pass a premapped DMA address to the
virtio core. If one sg->dma_address is used, then all sgs must use
sg->dma_address; otherwise every sg->dma_address must be NULL when
passing sgs to the virtio APIs.

Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_ring.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 4b6c6da4cdb4..0bc011e6e96e 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1346,6 +1346,7 @@ static int virtqueue_add_indirect_packed(struct 
vring_virtqueue *vq,
unsigned int i, n;
u16 head, id;
dma_addr_t addr;
+   bool dma_map_internal;
 
head = vq->packed.next_avail_idx;
desc = alloc_indirect_packed(total_sg, gfp);
@@ -1363,7 +1364,8 @@ static int virtqueue_add_indirect_packed(struct 
vring_virtqueue *vq,
id = vq->free_head;
BUG_ON(id == vq->packed.vring.num);
 
-   if (virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, in_sgs))
+   dma_map_internal = !sgs[0]->dma_address;
+   if (dma_map_internal && virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, 
in_sgs))
goto err_map;
 
for (n = 0; n < out_sgs + in_sgs; n++) {
@@ -1425,6 +1427,8 @@ static int virtqueue_add_indirect_packed(struct 
vring_virtqueue *vq,
vq->packed.desc_state[id].data = data;
vq->packed.desc_state[id].indir_desc = desc;
vq->packed.desc_state[id].last = id;
+   vq->packed.desc_state[id].flags = dma_map_internal ? 
VRING_STATE_F_MAP_INTERNAL : 0;
+
 
vq->num_added += 1;
 
@@ -1434,7 +1438,8 @@ static int virtqueue_add_indirect_packed(struct 
vring_virtqueue *vq,
return 0;
 
 unmap_release:
-   virtqueue_unmap_sgs(vq, sgs, total_sg, out_sgs, in_sgs);
+   if (dma_map_internal)
+   virtqueue_unmap_sgs(vq, sgs, total_sg, out_sgs, in_sgs);
 
 err_map:
kfree(desc);
@@ -1661,7 +1666,7 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
if (!desc)
return;
 
-   if (vq->use_dma_api) {
+   if (vq->use_dma_api && dma_map_internal) {
len = vq->packed.desc_extra[id].len;
for (i = 0; i < len / sizeof(struct vring_packed_desc);
i++)
-- 
2.32.0.3.g01195cf9f



[PATCH vhost v7 05/11] virtio_ring: packed: support premapped

2023-04-25 Thread Xuan Zhuo
The virtio core only supports virtual addresses; DMA mapping is
completed inside the virtio core.

In some scenarios (such as AF_XDP), the memory is allocated and DMA
mapping is completed in advance, so it is necessary for us to support
passing the DMA address to the virtio core.

Drivers can use sg->dma_address to pass a premapped DMA address to the
virtio core. If one sg->dma_address is used, then all sgs must use
sg->dma_address; otherwise every sg->dma_address must be NULL when
passing sgs to the virtio APIs.

Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_ring.c | 17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 055c1c069fff..4b6c6da4cdb4 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -81,6 +81,7 @@ struct vring_desc_state_packed {
struct vring_packed_desc *indir_desc; /* Indirect descriptor, if any. */
u16 num;/* Descriptor list length. */
u16 last;   /* The last desc state in a list. */
+   u32 flags;  /* State flags. */
 };
 
 struct vring_desc_extra {
@@ -1272,7 +1273,8 @@ static u16 packed_last_used(u16 last_used_idx)
 }
 
 static void vring_unmap_extra_packed(const struct vring_virtqueue *vq,
-const struct vring_desc_extra *extra)
+const struct vring_desc_extra *extra,
+bool dma_map_internal)
 {
u16 flags;
 
@@ -1287,6 +1289,9 @@ static void vring_unmap_extra_packed(const struct 
vring_virtqueue *vq,
 (flags & VRING_DESC_F_WRITE) ?
 DMA_FROM_DEVICE : DMA_TO_DEVICE);
} else {
+   if (!dma_map_internal)
+   return;
+
dma_unmap_page(vring_dma_dev(vq),
   extra->addr, extra->len,
   (flags & VRING_DESC_F_WRITE) ?
@@ -1453,6 +1458,7 @@ static inline int virtqueue_add_packed(struct virtqueue 
*_vq,
unsigned int i, n, c, descs_used;
__le16 head_flags, flags;
u16 head, id, prev, curr;
+   bool dma_map_internal;
int err;
 
START_USE(vq);
@@ -1498,7 +1504,8 @@ static inline int virtqueue_add_packed(struct virtqueue 
*_vq,
id = vq->free_head;
BUG_ON(id == vq->packed.vring.num);
 
-   if (virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, in_sgs)) {
+   dma_map_internal = !sgs[0]->dma_address;
+   if (dma_map_internal && virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, 
in_sgs)) {
END_USE(vq);
return -EIO;
}
@@ -1552,6 +1559,7 @@ static inline int virtqueue_add_packed(struct virtqueue 
*_vq,
vq->packed.desc_state[id].data = data;
vq->packed.desc_state[id].indir_desc = ctx;
vq->packed.desc_state[id].last = prev;
+   vq->packed.desc_state[id].flags = dma_map_internal ? 
VRING_STATE_F_MAP_INTERNAL : 0;
 
/*
 * A driver MUST NOT make the first descriptor in the list
@@ -1623,8 +1631,10 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
struct vring_desc_state_packed *state = NULL;
struct vring_packed_desc *desc;
unsigned int i, curr;
+   bool dma_map_internal;
 
state = &vq->packed.desc_state[id];
+   dma_map_internal = !!(state->flags & VRING_STATE_F_MAP_INTERNAL);
 
/* Clear data ptr. */
state->data = NULL;
@@ -1637,7 +1647,8 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
curr = id;
for (i = 0; i < state->num; i++) {
vring_unmap_extra_packed(vq,
-&vq->packed.desc_extra[curr]);
+&vq->packed.desc_extra[curr],
+dma_map_internal);
curr = vq->packed.desc_extra[curr].next;
}
}
-- 
2.32.0.3.g01195cf9f



[PATCH vhost v7 07/11] virtio_ring: update document for virtqueue_add_*

2023-04-25 Thread Xuan Zhuo
Update the documentation of the virtqueue_add_* family of APIs, allowing
callers to use sg->dma_address to pass a DMA address to the virtio core.

Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_ring.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 0bc011e6e96e..e7921b27bb01 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2218,6 +2218,10 @@ static inline int virtqueue_add(struct virtqueue *_vq,
  * Caller must ensure we don't call this with other virtqueue operations
  * at the same time (except where noted).
  *
+ * If the caller has done dma map then use sg->dma_address to pass dma address.
+ * If one sg->dma_address is used, then all sgs must use sg->dma_address;
+ * otherwise all sg->dma_address must be NULL.
+ *
  * Returns zero or a negative error (ie. ENOSPC, ENOMEM, EIO).
  */
 int virtqueue_add_sgs(struct virtqueue *_vq,
@@ -2252,6 +2256,10 @@ EXPORT_SYMBOL_GPL(virtqueue_add_sgs);
  * Caller must ensure we don't call this with other virtqueue operations
  * at the same time (except where noted).
  *
+ * If the caller has done dma map then use sg->dma_address to pass dma address.
+ * If one sg->dma_address is used, then all sgs must use sg->dma_address;
+ * otherwise all sg->dma_address must be NULL.
+ *
  * Returns zero or a negative error (ie. ENOSPC, ENOMEM, EIO).
  */
 int virtqueue_add_outbuf(struct virtqueue *vq,
@@ -2274,6 +2282,10 @@ EXPORT_SYMBOL_GPL(virtqueue_add_outbuf);
  * Caller must ensure we don't call this with other virtqueue operations
  * at the same time (except where noted).
  *
+ * If the caller has done dma map then use sg->dma_address to pass dma address.
+ * If one sg->dma_address is used, then all sgs must use sg->dma_address;
+ * otherwise all sg->dma_address must be NULL.
+ *
  * Returns zero or a negative error (ie. ENOSPC, ENOMEM, EIO).
  */
 int virtqueue_add_inbuf(struct virtqueue *vq,
@@ -2297,6 +2309,10 @@ EXPORT_SYMBOL_GPL(virtqueue_add_inbuf);
  * Caller must ensure we don't call this with other virtqueue operations
  * at the same time (except where noted).
  *
+ * If the caller has done dma map then use sg->dma_address to pass dma address.
+ * If one sg->dma_address is used, then all sgs must use sg->dma_address;
+ * otherwise all sg->dma_address must be NULL.
+ *
  * Returns zero or a negative error (ie. ENOSPC, ENOMEM, EIO).
  */
 int virtqueue_add_inbuf_ctx(struct virtqueue *vq,
-- 
2.32.0.3.g01195cf9f



[PATCH vhost v7 01/11] virtio_ring: split: separate dma codes

2023-04-25 Thread Xuan Zhuo
DMA-related logic is separated out of virtqueue_add_split() into new
functions. The DMA address is saved as sg->dma_address when use_dma_api
is true, and virtqueue_add_split() then uses it directly. The unmap
operation becomes simpler.

The purpose is to facilitate subsequent support for receiving DMA
addresses mapped by drivers.

Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_ring.c | 122 +++
 1 file changed, 94 insertions(+), 28 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index c5310eaf8b46..b073a70c1291 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -379,6 +379,14 @@ static dma_addr_t vring_map_one_sg(const struct 
vring_virtqueue *vq,
direction);
 }
 
+static dma_addr_t vring_sg_address(struct scatterlist *sg)
+{
+   if (sg->dma_address)
+   return sg->dma_address;
+
+   return (dma_addr_t)sg_phys(sg);
+}
+
 static dma_addr_t vring_map_single(const struct vring_virtqueue *vq,
   void *cpu_addr, size_t size,
   enum dma_data_direction direction)
@@ -520,6 +528,80 @@ static inline unsigned int virtqueue_add_desc_split(struct 
virtqueue *vq,
return next;
 }
 
+static void virtqueue_unmap_sgs(struct vring_virtqueue *vq,
+   struct scatterlist *sgs[],
+   unsigned int total_sg,
+   unsigned int out_sgs,
+   unsigned int in_sgs)
+{
+   struct scatterlist *sg;
+   unsigned int n;
+
+   if (!vq->use_dma_api)
+   return;
+
+   for (n = 0; n < out_sgs; n++) {
+   for (sg = sgs[n]; sg; sg = sg_next(sg)) {
+   if (!sg->dma_address)
+   return;
+
+   dma_unmap_page(vring_dma_dev(vq), sg->dma_address,
+  sg->length, DMA_TO_DEVICE);
+   }
+   }
+
+   for (; n < (out_sgs + in_sgs); n++) {
+   for (sg = sgs[n]; sg; sg = sg_next(sg)) {
+   if (!sg->dma_address)
+   return;
+
+   dma_unmap_page(vring_dma_dev(vq), sg->dma_address,
+  sg->length, DMA_FROM_DEVICE);
+   }
+   }
+}
+
+static int virtqueue_map_sgs(struct vring_virtqueue *vq,
+struct scatterlist *sgs[],
+unsigned int total_sg,
+unsigned int out_sgs,
+unsigned int in_sgs)
+{
+   struct scatterlist *sg;
+   unsigned int n;
+
+   if (!vq->use_dma_api)
+   return 0;
+
+   for (n = 0; n < out_sgs; n++) {
+   for (sg = sgs[n]; sg; sg = sg_next(sg)) {
+   dma_addr_t addr = vring_map_one_sg(vq, sg, 
DMA_TO_DEVICE);
+
+   if (vring_mapping_error(vq, addr))
+   goto err;
+
+   sg->dma_address = addr;
+   }
+   }
+
+   for (; n < (out_sgs + in_sgs); n++) {
+   for (sg = sgs[n]; sg; sg = sg_next(sg)) {
+   dma_addr_t addr = vring_map_one_sg(vq, sg, 
DMA_FROM_DEVICE);
+
+   if (vring_mapping_error(vq, addr))
+   goto err;
+
+   sg->dma_address = addr;
+   }
+   }
+
+   return 0;
+
+err:
+   virtqueue_unmap_sgs(vq, sgs, total_sg, out_sgs, in_sgs);
+   return -ENOMEM;
+}
+
 static inline int virtqueue_add_split(struct virtqueue *_vq,
  struct scatterlist *sgs[],
  unsigned int total_sg,
@@ -532,9 +614,9 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
struct vring_virtqueue *vq = to_vvq(_vq);
struct scatterlist *sg;
struct vring_desc *desc;
-   unsigned int i, n, avail, descs_used, prev, err_idx;
-   int head;
+   unsigned int i, n, avail, descs_used, prev;
bool indirect;
+   int head;
 
START_USE(vq);
 
@@ -586,32 +668,30 @@ static inline int virtqueue_add_split(struct virtqueue 
*_vq,
return -ENOSPC;
}
 
+   if (virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, in_sgs))
+   goto err_map;
+
for (n = 0; n < out_sgs; n++) {
for (sg = sgs[n]; sg; sg = sg_next(sg)) {
-   dma_addr_t addr = vring_map_one_sg(vq, sg, 
DMA_TO_DEVICE);
-   if (vring_mapping_error(vq, addr))
-   goto unmap_release;
-
prev = i;
/* Note that we trust indirect descriptor
 * table since it use stream DMA mapping.
 */
- 

[PATCH vhost v7 08/11] virtio_ring: introduce virtqueue_dma_dev()

2023-04-25 Thread Xuan Zhuo
Add virtqueue_dma_dev() to get the DMA device for virtio, so the caller
can perform DMA operations in advance. The purpose is to keep memory
mapped across multiple add/get buf operations.

Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_ring.c | 17 +
 include/linux/virtio.h   |  2 ++
 2 files changed, 19 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index e7921b27bb01..871ac552b3b6 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2325,6 +2325,23 @@ int virtqueue_add_inbuf_ctx(struct virtqueue *vq,
 }
 EXPORT_SYMBOL_GPL(virtqueue_add_inbuf_ctx);
 
+/**
+ * virtqueue_dma_dev - get the dma dev
+ * @_vq: the struct virtqueue we're talking about.
+ *
+ * Returns the dma dev. That can been used for dma api.
+ */
+struct device *virtqueue_dma_dev(struct virtqueue *_vq)
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+
+   if (vq->use_dma_api)
+   return vring_dma_dev(vq);
+   else
+   return NULL;
+}
+EXPORT_SYMBOL_GPL(virtqueue_dma_dev);
+
 /**
  * virtqueue_kick_prepare - first half of split virtqueue_kick call.
  * @_vq: the struct virtqueue
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index b93238db94e3..5ebef5d50f04 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -61,6 +61,8 @@ int virtqueue_add_sgs(struct virtqueue *vq,
  void *data,
  gfp_t gfp);
 
+struct device *virtqueue_dma_dev(struct virtqueue *vq);
+
 bool virtqueue_kick(struct virtqueue *vq);
 
 bool virtqueue_kick_prepare(struct virtqueue *vq);
-- 
2.32.0.3.g01195cf9f



[PATCH vhost v7 03/11] virtio_ring: packed-indirect: separate dma codes

2023-04-25 Thread Xuan Zhuo
DMA-related logic is separated out of virtqueue_add_indirect_packed().

The DMA address is saved as sg->dma_address, and
virtqueue_add_indirect_packed() then uses it directly. The unmap
operation becomes simpler.

The purpose is to facilitate subsequent support for receiving DMA
addresses mapped by drivers.

Signed-off-by: Xuan Zhuo 
Acked-by: Jason Wang 
---
 drivers/virtio/virtio_ring.c | 18 +++---
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 3e7797ee45ea..62db16a15ae7 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1323,7 +1323,7 @@ static int virtqueue_add_indirect_packed(struct 
vring_virtqueue *vq,
 {
struct vring_packed_desc *desc;
struct scatterlist *sg;
-   unsigned int i, n, err_idx;
+   unsigned int i, n;
u16 head, id;
dma_addr_t addr;
 
@@ -1343,16 +1343,14 @@ static int virtqueue_add_indirect_packed(struct 
vring_virtqueue *vq,
id = vq->free_head;
BUG_ON(id == vq->packed.vring.num);
 
+   if (virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, in_sgs))
+   goto err_map;
+
for (n = 0; n < out_sgs + in_sgs; n++) {
for (sg = sgs[n]; sg; sg = sg_next(sg)) {
-   addr = vring_map_one_sg(vq, sg, n < out_sgs ?
-   DMA_TO_DEVICE : DMA_FROM_DEVICE);
-   if (vring_mapping_error(vq, addr))
-   goto unmap_release;
-
desc[i].flags = cpu_to_le16(n < out_sgs ?
0 : VRING_DESC_F_WRITE);
-   desc[i].addr = cpu_to_le64(addr);
+   desc[i].addr = cpu_to_le64(vring_sg_address(sg));
desc[i].len = cpu_to_le32(sg->length);
i++;
}
@@ -1416,11 +1414,9 @@ static int virtqueue_add_indirect_packed(struct 
vring_virtqueue *vq,
return 0;
 
 unmap_release:
-   err_idx = i;
-
-   for (i = 0; i < err_idx; i++)
-   vring_unmap_desc_packed(vq, &desc[i]);
+   virtqueue_unmap_sgs(vq, sgs, total_sg, out_sgs, in_sgs);
 
+err_map:
kfree(desc);
 
END_USE(vq);
-- 
2.32.0.3.g01195cf9f



[PATCH vhost v7 04/11] virtio_ring: split: support premapped

2023-04-25 Thread Xuan Zhuo
The virtio core only supports virtual addresses; DMA mapping is
completed inside the virtio core.

In some scenarios (such as AF_XDP), the memory is allocated and DMA
mapping is completed in advance, so it is necessary for us to support
passing the DMA address to the virtio core.

Drivers can use sg->dma_address to pass a premapped DMA address to the
virtio core. If one sg->dma_address is used, then all sgs must use
sg->dma_address; otherwise every sg->dma_address must be NULL when
passing sgs to the virtio APIs.

Signed-off-by: Xuan Zhuo 
---
 drivers/virtio/virtio_ring.c | 31 +++
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 62db16a15ae7..055c1c069fff 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -67,9 +67,13 @@
 #define LAST_ADD_TIME_INVALID(vq)
 #endif
 
+#define VRING_STATE_F_MAP_INTERNAL BIT(0)
+
 struct vring_desc_state_split {
void *data; /* Data for callback. */
struct vring_desc *indir_desc;  /* Indirect descriptor, if any. */
+   u32 flags;  /* State flags. */
+   u32 padding;
 };
 
 struct vring_desc_state_packed {
@@ -448,7 +452,7 @@ static void vring_unmap_one_split_indirect(const struct 
vring_virtqueue *vq,
 }
 
 static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
- unsigned int i)
+ unsigned int i, bool dma_map_internal)
 {
struct vring_desc_extra *extra = vq->split.desc_extra;
u16 flags;
@@ -465,6 +469,9 @@ static unsigned int vring_unmap_one_split(const struct 
vring_virtqueue *vq,
 (flags & VRING_DESC_F_WRITE) ?
 DMA_FROM_DEVICE : DMA_TO_DEVICE);
} else {
+   if (!dma_map_internal)
+   goto out;
+
dma_unmap_page(vring_dma_dev(vq),
   extra[i].addr,
   extra[i].len,
@@ -615,7 +622,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
struct scatterlist *sg;
struct vring_desc *desc;
unsigned int i, n, avail, descs_used, prev;
-   bool indirect;
+   bool indirect, dma_map_internal;
int head;
 
START_USE(vq);
@@ -668,7 +675,8 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
return -ENOSPC;
}
 
-   if (virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, in_sgs))
+   dma_map_internal = !sgs[0]->dma_address;
+   if (dma_map_internal && virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, 
in_sgs))
goto err_map;
 
for (n = 0; n < out_sgs; n++) {
@@ -735,6 +743,8 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
else
vq->split.desc_state[head].indir_desc = ctx;
 
+   vq->split.desc_state[head].flags = dma_map_internal ? 
VRING_STATE_F_MAP_INTERNAL : 0;
+
/* Put entry in available array (but don't update avail->idx until they
 * do sync). */
avail = vq->split.avail_idx_shadow & (vq->split.vring.num - 1);
@@ -759,7 +769,8 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
return 0;
 
 unmap_release:
-   virtqueue_unmap_sgs(vq, sgs, total_sg, out_sgs, in_sgs);
+   if (dma_map_internal)
+   virtqueue_unmap_sgs(vq, sgs, total_sg, out_sgs, in_sgs);
 
 err_map:
if (indirect)
@@ -805,20 +816,22 @@ static void detach_buf_split(struct vring_virtqueue *vq, 
unsigned int head,
 {
unsigned int i, j;
__virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
+   bool dma_map_internal;
 
/* Clear data ptr. */
vq->split.desc_state[head].data = NULL;
+   dma_map_internal = !!(vq->split.desc_state[head].flags & 
VRING_STATE_F_MAP_INTERNAL);
 
/* Put back on free list: unmap first-level descriptors and find end */
i = head;
 
while (vq->split.vring.desc[i].flags & nextflag) {
-   vring_unmap_one_split(vq, i);
+   vring_unmap_one_split(vq, i, dma_map_internal);
i = vq->split.desc_extra[i].next;
vq->vq.num_free++;
}
 
-   vring_unmap_one_split(vq, i);
+   vring_unmap_one_split(vq, i, dma_map_internal);
vq->split.desc_extra[i].next = vq->free_head;
vq->free_head = head;
 
@@ -840,8 +853,10 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
VRING_DESC_F_INDIRECT));
BUG_ON(len == 0 || len % sizeof(struct vring_desc));
 
-   for (j = 0; j < len / sizeof(struct vring_desc); j++)
-   vring_unmap_one_split_indirect(vq, &indir_desc[j]);
+   if (dma_map_internal) {
+   for (j = 0; j < len / sizeof(struct vring_desc); j++)
+

[PATCH vhost v7 00/11] virtio core prepares for AF_XDP

2023-04-25 Thread Xuan Zhuo
## About DMA APIs

Currently, virtio cannot work with the DMA APIs when the device does not
offer VIRTIO_F_ACCESS_PLATFORM.

1. I tried to make the DMA APIs return the physical address for a virtio
   device, but the DMA APIs only work with "real" devices.
2. I tried to let xsk use callbacks to get the physical address from the
   virtio-net driver as the DMA address, but the xsk maintainers may want to
   replace the DMA APIs with dma-buf. I think that would be a larger effort,
   and we would wait too long.

So, rethinking this, we can first support premapped DMA only for devices with
VIRTIO_F_ACCESS_PLATFORM. In the case of AF_XDP, users who want to use it
have to update the device to support VIRTIO_F_RING_RESET anyway, and they can
enable the device's VIRTIO_F_ACCESS_PLATFORM feature at the same time.

Thanks for the help from Christoph.

=

XDP socket (AF_XDP) is an excellent kernel-bypass network framework. Its zero
copy feature must be supported by the driver, and the zero-copy performance is
very good.

ENV: Qemu with vhost.

                  vhost cpu | Guest APP CPU | Guest Softirq CPU |     PPS
------------------|---------|---------------|-------------------|---------
 xmit by sockperf |   90%   |     100%      |                   |   318967
 xmit by xsk      |  100%   |      30%      |        33%        |  1192064
 recv by sockperf |  100%   |      68%      |       100%        |   692288
 recv by xsk      |  100%   |      33%      |        43%        |   771670

Before implementing this in virtio-net, we first have to make the virtio core
support these features:

1. virtio core support premapped
2. virtio core support reset per-queue
3. introduce DMA APIs to virtio core

Please review.

Thanks.

v7:
 1. virtqueue_dma_dev() return NULL when virtio is without DMA API.

v6:
 1. change the size of the flags to u32.

v5:
 1. fix for error handler
 2. add flags to record internal dma mapping

v4:
 1. rename map_inter to dma_map_internal
 2. fix: Excess function parameter 'vq' description in 'virtqueue_dma_dev'

v3:
 1. add map_inter to struct desc state to record whether the virtio core did the DMA map

v2:
 1. use sgs[0]->dma_address to judge whether the buffer is premapped
 2. use extra.addr to decide whether to unmap non-indirect descriptors
 3. use indir_desc to decide whether to unmap indirect descriptors
 4. rename virtqueue_get_dma_dev to virtqueue_dma_dev

v1:
 1. expose the DMA device; do NOT introduce APIs for DMA map and sync
 2. split some commits for easier review
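The v2 premapped judgment (a non-zero sgs[0]->dma_address means the driver
already mapped the buffers) together with the v5 per-descriptor flag can be
sketched roughly as follows. This is a simplified, self-contained illustration
only: struct sg, struct desc_state, and add_buf() are stand-ins invented here,
not the real kernel scatterlist/vring definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VRING_STATE_F_MAP_INTERNAL (1u << 0)

/* Simplified stand-ins for scatterlist and vring_desc_state_split. */
struct sg { uint64_t dma_address; };
struct desc_state { uint32_t flags; };

/* Add path: map internally only when the driver did not premap.
 * A non-zero dma_address on the first sg means "premapped". */
static bool add_buf(struct desc_state *st, const struct sg *first)
{
	bool dma_map_internal = !first->dma_address;

	/* ... virtqueue_map_sgs() would run here when dma_map_internal ... */
	st->flags = dma_map_internal ? VRING_STATE_F_MAP_INTERNAL : 0;
	return dma_map_internal;
}

/* Detach path: unmap only what the core itself mapped. */
static bool must_unmap(const struct desc_state *st)
{
	return st->flags & VRING_STATE_F_MAP_INTERNAL;
}
```

The point of the flag is that the detach/unmap path cannot re-derive "who
mapped this" from the descriptor alone, so the add path records it once.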

Xuan Zhuo (11):
  virtio_ring: split: separate dma codes
  virtio_ring: packed: separate dma codes
  virtio_ring: packed-indirect: separate dma codes
  virtio_ring: split: support premapped
  virtio_ring: packed: support premapped
  virtio_ring: packed-indirect: support premapped
  virtio_ring: update document for virtqueue_add_*
  virtio_ring: introduce virtqueue_dma_dev()
  virtio_ring: correct the expression of the description of
virtqueue_resize()
  virtio_ring: separate the logic of reset/enable from virtqueue_resize
  virtio_ring: introduce virtqueue_reset()

 drivers/virtio/virtio_ring.c | 352 +--
 include/linux/virtio.h   |   4 +
 2 files changed, 259 insertions(+), 97 deletions(-)

--
2.32.0.3.g01195cf9f

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH vhost v7 02/11] virtio_ring: packed: separate dma codes

2023-04-25 Thread Xuan Zhuo
The DMA-related logic is separated out of virtqueue_add_packed(). The DMA
address is saved as sg->dma_address, and virtqueue_add_packed() then uses it
directly, which also makes the unmap operation simpler.

The purpose of this is to facilitate subsequent support for receiving DMA
addresses premapped by drivers.
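The "map first, then fill" split this patch performs can be sketched as below.
The types and the identity "mapping" are simplified stand-ins invented for
illustration, not the real vring/scatterlist code; only the control flow
(one mapping pass with its own error path, then a fill loop that reads
sg->dma_address) mirrors the patch.

```c
#include <stdint.h>

struct sg { void *buf; uint64_t dma_address; };
struct desc { uint64_t addr; };

/* Stand-in for virtqueue_map_sgs(): map every entry up front so that
 * a failure leaves nothing half-mapped to roll back in the fill loop. */
static int map_sgs(struct sg *sgs, int n)
{
	for (int i = 0; i < n; i++) {
		sgs[i].dma_address = (uint64_t)(uintptr_t)sgs[i].buf;
		if (!sgs[i].dma_address)
			return -1;  /* mapping error */
	}
	return 0;
}

static int add_packed(struct desc *ring, struct sg *sgs, int n)
{
	if (map_sgs(sgs, n))
		return -5;  /* -EIO, as in the patch */

	/* The fill loop no longer maps; it only consumes dma_address,
	 * so the old unmap_release rollback logic disappears. */
	for (int i = 0; i < n; i++)
		ring[i].addr = sgs[i].dma_address;
	return 0;
}
```

Moving the mapping out of the fill loop is what lets the unmap_release
rollback in virtqueue_add_packed() be deleted entirely.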

Signed-off-by: Xuan Zhuo 
Acked-by: Jason Wang 
---
 drivers/virtio/virtio_ring.c | 39 +---
 1 file changed, 9 insertions(+), 30 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index b073a70c1291..3e7797ee45ea 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1439,9 +1439,9 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
struct vring_virtqueue *vq = to_vvq(_vq);
struct vring_packed_desc *desc;
struct scatterlist *sg;
-   unsigned int i, n, c, descs_used, err_idx;
+   unsigned int i, n, c, descs_used;
__le16 head_flags, flags;
-   u16 head, id, prev, curr, avail_used_flags;
+   u16 head, id, prev, curr;
int err;
 
START_USE(vq);
@@ -1470,7 +1470,6 @@ static inline int virtqueue_add_packed(struct virtqueue 
*_vq,
}
 
head = vq->packed.next_avail_idx;
-   avail_used_flags = vq->packed.avail_used_flags;
 
WARN_ON_ONCE(total_sg > vq->packed.vring.num && !vq->indirect);
 
@@ -1488,15 +1487,15 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
id = vq->free_head;
BUG_ON(id == vq->packed.vring.num);
 
+   if (virtqueue_map_sgs(vq, sgs, total_sg, out_sgs, in_sgs)) {
+   END_USE(vq);
+   return -EIO;
+   }
+
curr = id;
c = 0;
for (n = 0; n < out_sgs + in_sgs; n++) {
for (sg = sgs[n]; sg; sg = sg_next(sg)) {
-   dma_addr_t addr = vring_map_one_sg(vq, sg, n < out_sgs ?
-   DMA_TO_DEVICE : DMA_FROM_DEVICE);
-   if (vring_mapping_error(vq, addr))
-   goto unmap_release;
-
flags = cpu_to_le16(vq->packed.avail_used_flags |
(++c == total_sg ? 0 : VRING_DESC_F_NEXT) |
(n < out_sgs ? 0 : VRING_DESC_F_WRITE));
@@ -1505,12 +1504,12 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
else
desc[i].flags = flags;
 
-   desc[i].addr = cpu_to_le64(addr);
+   desc[i].addr = cpu_to_le64(vring_sg_address(sg));
desc[i].len = cpu_to_le32(sg->length);
desc[i].id = cpu_to_le16(id);
 
if (unlikely(vq->use_dma_api)) {
-   vq->packed.desc_extra[curr].addr = addr;
+   vq->packed.desc_extra[curr].addr = vring_sg_address(sg);
vq->packed.desc_extra[curr].len = sg->length;
vq->packed.desc_extra[curr].flags =
le16_to_cpu(flags);
@@ -1556,26 +1555,6 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
END_USE(vq);
 
return 0;
-
-unmap_release:
-   err_idx = i;
-   i = head;
-   curr = vq->free_head;
-
-   vq->packed.avail_used_flags = avail_used_flags;
-
-   for (n = 0; n < total_sg; n++) {
-   if (i == err_idx)
-   break;
-   vring_unmap_extra_packed(vq, &vq->packed.desc_extra[curr]);
-   curr = vq->packed.desc_extra[curr].next;
-   i++;
-   if (i >= vq->packed.vring.num)
-   i = 0;
-   }
-
-   END_USE(vq);
-   return -EIO;
 }
 
 static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
-- 
2.32.0.3.g01195cf9f
