[PATCH vhost] virtio_net: fix the missing of the dma cpu sync

2023-09-26 Thread Xuan Zhuo
Commit 295525e29a5b ("virtio_net: merge dma operations when filling
mergeable buffers") unmaps the buffer with DMA_ATTR_SKIP_CPU_SYNC when
dma->ref reaches zero. We pass DMA_ATTR_SKIP_CPU_SYNC because we do not
want to sync the entire page_frag, but that also skips the sync for the
area that was just used.

This patch does cpu sync regardless of whether the ref is zero or not.
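As a plain userspace model of the fixed control flow (all names invented; a sketch only, not the driver code), the decision this patch changes looks like this:

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model of the fixed virtnet_rq_unmap() flow.  The struct and
 * function names are invented stand-ins for the kernel's
 * struct virtnet_rq_dma; this only illustrates the ordering of the
 * sync and unmap decisions. */
struct rq_dma_model {
	unsigned int ref;   /* buffers still referencing this mapping */
	bool need_sync;     /* device needs explicit cpu syncs */
};

enum unmap_action { DO_NOTHING, DO_SYNC, DO_SYNC_AND_UNMAP, DO_UNMAP };

/* After the fix: sync the just-used area whenever need_sync && len,
 * and unmap (with DMA_ATTR_SKIP_CPU_SYNC) only when ref drops to 0.
 * Before the fix, the sync was skipped whenever ref reached 0. */
static enum unmap_action rq_unmap_model(struct rq_dma_model *dma,
					unsigned int len)
{
	bool sync;

	--dma->ref;

	sync = dma->need_sync && len;

	if (dma->ref)
		return sync ? DO_SYNC : DO_NOTHING;

	return sync ? DO_SYNC_AND_UNMAP : DO_UNMAP;
}
```

The previously broken case is the last buffer of a page_frag: ref hits zero and the area was never synced for the cpu.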

Fixes: 295525e29a5b ("virtio_net: merge dma operations when filling mergeable buffers")
Reported-by: Michael Roth 
Closes: http://lore.kernel.org/all/20230926130451.axgodaa6tvwqs...@amd.com
Signed-off-by: Xuan Zhuo 
---
 drivers/net/virtio_net.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 98dc9b49d56b..9ece27dc5144 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -589,16 +589,16 @@ static void virtnet_rq_unmap(struct receive_queue *rq, void *buf, u32 len)
 
--dma->ref;
 
-   if (dma->ref) {
-   if (dma->need_sync && len) {
-   offset = buf - (head + sizeof(*dma));
+   if (dma->need_sync && len) {
+   offset = buf - (head + sizeof(*dma));
 
-   virtqueue_dma_sync_single_range_for_cpu(rq->vq, dma->addr, offset,
-   len, DMA_FROM_DEVICE);
-   }
+   virtqueue_dma_sync_single_range_for_cpu(rq->vq, dma->addr,
+   offset, len,
+   DMA_FROM_DEVICE);
+   }
 
+   if (dma->ref)
return;
-   }
 
virtqueue_dma_unmap_single_attrs(rq->vq, dma->addr, dma->len,
 DMA_FROM_DEVICE, DMA_ATTR_SKIP_CPU_SYNC);
-- 
2.32.0.3.g01195cf9f

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH vhost v13 12/12] virtio_net: merge dma operations when filling mergeable buffers

2023-09-26 Thread Xuan Zhuo
On Tue, 26 Sep 2023 12:01:34 -0400, "Michael S. Tsirkin"  
wrote:
> On Thu, Aug 10, 2023 at 08:30:57PM +0800, Xuan Zhuo wrote:
> > Currently, the virtio core performs a dma operation for each
> > buffer, even though the same page may be operated on multiple times.
> >
> > With this patch, the driver does the dma operation itself and manages
> > the dma address, based on the premapped feature of the virtio core.
> >
> > This way, we can perform only one dma operation for the pages of the
> > alloc frag. This is beneficial for the iommu device.
> >
> > kernel command line: intel_iommu=on iommu.passthrough=0
> >
> >|  strict=0  | strict=1
> > Before |  775496pps | 428614pps
> > After  | 1109316pps | 742853pps
> >
> > Signed-off-by: Xuan Zhuo 
>
> Hi Xuan Zhuo,
> looks like this patch is causing regressions. Do you have time to debug
> or should I revert?


I have sent a fix to Michael Roth.

I will wait for his response. If it is OK, I will post that fix commit.

Thanks.


>
> > ---
> >  drivers/net/virtio_net.c | 228 ++-
> >  1 file changed, 202 insertions(+), 26 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 486b5849033d..16adb5ef18f8 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -126,6 +126,14 @@ static const struct virtnet_stat_desc virtnet_rq_stats_desc[] = {
> >  #define VIRTNET_SQ_STATS_LEN   ARRAY_SIZE(virtnet_sq_stats_desc)
> >  #define VIRTNET_RQ_STATS_LEN   ARRAY_SIZE(virtnet_rq_stats_desc)
> >
> > +/* The dma information of pages allocated at a time. */
> > +struct virtnet_rq_dma {
> > +   dma_addr_t addr;
> > +   u32 ref;
> > +   u16 len;
> > +   u16 need_sync;
> > +};
> > +
> >  /* Internal representation of a send virtqueue */
> >  struct send_queue {
> > /* Virtqueue associated with this send _queue */
> > @@ -175,6 +183,12 @@ struct receive_queue {
> > char name[16];
> >
> > struct xdp_rxq_info xdp_rxq;
> > +
> > +   /* Record the last dma info to free after new pages is allocated. */
> > +   struct virtnet_rq_dma *last_dma;
> > +
> > +   /* Do dma by self */
> > +   bool do_dma;
> >  };
> >
> >  /* This structure can contain rss message with maximum settings for 
> > indirection table and keysize
> > @@ -549,6 +563,156 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > return skb;
> >  }
> >
> > +static void virtnet_rq_unmap(struct receive_queue *rq, void *buf, u32 len)
> > +{
> > +   struct page *page = virt_to_head_page(buf);
> > +   struct virtnet_rq_dma *dma;
> > +   void *head;
> > +   int offset;
> > +
> > +   head = page_address(page);
> > +
> > +   dma = head;
> > +
> > +   --dma->ref;
> > +
> > +   if (dma->ref) {
> > +   if (dma->need_sync && len) {
> > +   offset = buf - (head + sizeof(*dma));
> > +
> > +   virtqueue_dma_sync_single_range_for_cpu(rq->vq, dma->addr, offset,
> > +   len, DMA_FROM_DEVICE);
> > +   }
> > +
> > +   return;
> > +   }
> > +
> > +   virtqueue_dma_unmap_single_attrs(rq->vq, dma->addr, dma->len,
> > +DMA_FROM_DEVICE, DMA_ATTR_SKIP_CPU_SYNC);
> > +   put_page(page);
> > +}
> > +
> > +static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
> > +{
> > +   void *buf;
> > +
> > +   buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> > +   if (buf && rq->do_dma)
> > +   virtnet_rq_unmap(rq, buf, *len);
> > +
> > +   return buf;
> > +}
> > +
> > +static void *virtnet_rq_detach_unused_buf(struct receive_queue *rq)
> > +{
> > +   void *buf;
> > +
> > +   buf = virtqueue_detach_unused_buf(rq->vq);
> > +   if (buf && rq->do_dma)
> > +   virtnet_rq_unmap(rq, buf, 0);
> > +
> > +   return buf;
> > +}
> > +
> > +static void virtnet_rq_init_one_sg(struct receive_queue *rq, void *buf, u32 len)
> > +{
> > +   struct virtnet_rq_dma *dma;
> > +   dma_addr_t addr;
> > +   u32 offset;
> > +   void *head;
> > +
> > +   if (!rq->do_dma) {
> > +   sg_init_one(rq->sg, buf, len);
> > +   return;
> > +   }
> > +
> > +   head = page_address(rq->alloc_frag.page);
> > +
> > +   offset = buf - head;
> > +
> > +   dma = head;
> > +
> > +   addr = dma->addr - sizeof(*dma) + offset;
> > +
> > +   sg_init_table(rq->sg, 1);
> > +   rq->sg[0].dma_address = addr;
> > +   rq->sg[0].length = len;
> > +}
> > +
> > +static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
> > +{
> > +   struct page_frag *alloc_frag = &rq->alloc_frag;
> > +   struct virtnet_rq_dma *dma;
> > +   void *buf, *head;
> > +   dma_addr_t addr;
> > +
> > +   if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp)))
> > +   return NULL;
> > +
> > +   head = page_address(alloc_frag->page);
> > +
> > +   if (rq->do_dma) {
> > +   dma = head;
> > +
> > +   /* new pages */
> > +   if 

Re: [GIT PULL] virtio: features

2023-09-26 Thread Xuan Zhuo
On Tue, 26 Sep 2023 08:04:51 -0500, Michael Roth  wrote:
> On Sun, Sep 03, 2023 at 06:13:38PM -0400, Michael S. Tsirkin wrote:
> > The following changes since commit 2dde18cd1d8fac735875f2e4987f11817cc0bc2c:
> >
> >   Linux 6.5 (2023-08-27 14:49:51 -0700)
> >
> > are available in the Git repository at:
> >
> >   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git 
> > tags/for_linus
> >
> > for you to fetch changes up to 1acfe2c1225899eab5ab724c91b7e1eb2881b9ab:
> >
> >   virtio_ring: fix avail_wrap_counter in virtqueue_add_packed (2023-09-03 
> > 18:10:24 -0400)
> >
> > 
> > virtio: features
> >
> > a small pull request this time around, mostly because the
> > vduse network got postponed to the next release so we can be sure
> > we got the security story right.
> >
> > Signed-off-by: Michael S. Tsirkin 
> >
> > 
> > Eugenio Pérez (4):
> >   vdpa: add VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK flag
> >   vdpa: accept VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK backend feature
> >   vdpa: add get_backend_features vdpa operation
> >   vdpa_sim: offer VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK
> >
> > Jason Wang (1):
> >   virtio_vdpa: build affinity masks conditionally
> >
> > Xuan Zhuo (12):
> >   virtio_ring: check use_dma_api before unmap desc for indirect
> >   virtio_ring: put mapping error check in vring_map_one_sg
> >   virtio_ring: introduce virtqueue_set_dma_premapped()
> >   virtio_ring: support add premapped buf
> >   virtio_ring: introduce virtqueue_dma_dev()
> >   virtio_ring: skip unmap for premapped
> >   virtio_ring: correct the expression of the description of 
> > virtqueue_resize()
> >   virtio_ring: separate the logic of reset/enable from virtqueue_resize
> >   virtio_ring: introduce virtqueue_reset()
> >   virtio_ring: introduce dma map api for virtqueue
> >   virtio_ring: introduce dma sync api for virtqueue
> >   virtio_net: merge dma operations when filling mergeable buffers
>
> This ^ patch (upstream commit 295525e29a) seems to cause a
> network-related regression when using SWIOTLB in the guest. I noticed
> this initially testing SEV guests, which use SWIOTLB by default, but
> it can also be seen with normal guests when forcing SWIOTLB via
> swiotlb=force kernel cmdline option. I see it with both 6.6-rc1 and
> 6.6-rc2 (haven't tried rc3 yet, but don't see any related changes
> there), and reverting 714073495f seems to avoid the issue.
>
> Steps to reproduce:
>
> 1) Boot QEMU/KVM guest with 6.6-rc2 with swiotlb=force via something like the 
> following cmdline:
>
>qemu-system-x86_64 \
>-machine q35 -smp 4,maxcpus=255 -cpu EPYC-Milan-v2 \
>-enable-kvm -m 16G,slots=5,maxmem=256G -vga none \
>-device virtio-scsi-pci,id=scsi0,disable-legacy=on,iommu_platform=true \
>-drive 
> file=/home/mroth/storage/ubuntu-18.04-seves2.qcow2,if=none,id=drive0,snapshot=off
>  \
>-device scsi-hd,id=hd0,drive=drive0,bus=scsi0.0 \
>-device 
> virtio-net-pci,netdev=netdev0,id=net0,disable-legacy=on,iommu_platform=true,romfile=
>  \
>-netdev tap,script=/home/mroth/qemu-ifup,id=netdev0 \
>-L /home/mroth/storage/AMDSEV2/snp-release-2023-09-23/usr/local/share/qemu 
> \
>-drive 
> if=pflash,format=raw,unit=0,file=/home/mroth/storage/AMDSEV2/snp-release-2023-09-23/usr/local/share/qemu/OVMF_CODE.fd,readonly
>  \
>-drive 
> if=pflash,format=raw,unit=1,file=/home/mroth/storage/AMDSEV2/snp-release-2023-09-23/usr/local/share/qemu/OVMF_VARS.fd
>  \
>-debugcon file:debug.log -global isa-debugcon.iobase=0x402 -msg 
> timestamp=on \
>-kernel /boot/vmlinuz-6.6.0-rc2-vanilla0+ \
>-initrd /boot/initrd.img-6.6.0-rc2-vanilla0+ \
>-append "root=UUID=d72a6d1c-06cf-4b79-af43-f1bac4f620f9 ro 
> console=ttyS0,115200n8 earlyprintk=serial,ttyS0,115200 debug=1 sev=debug 
> page_poison=0 spec_rstack_overflow=off swiotlb=force"
>
> 2) scp a small file from the host to the guest IP via its virtio-net device.
>Smaller file sizes succeed, but the larger the file the more likely
>it will fail. e.g.:
>
>mroth@host:~$ dd if=/dev/zero of=test bs=1K count=19
>19+0 records in
>19+0 records out
>19456 bytes (19 kB, 19 KiB) copied, 0.000940134 s, 20.7 MB/s
>mroth@host:~$ scp test vm0:
>test
> 100%   19KB  10.1MB/s   00:00
>mroth@host:~$ dd if=/dev/zero of=test bs=1K count=20
>20+0 records in
>20+0 records out
>20480 bytes (20 kB, 20 KiB) copied, 0.00093774 s, 21.8 MB/s
>mroth@host:~$ scp test vm0:
>test  
> 0%0 0.0KB/s   --:-- ETA
>client_loop: send disconnect: Broken pipe
>lost connection
>mroth@host:~$


Hi Michael,

Thanks for the report.

Could you try this fix?  I reproduced this issue, and 

Re: [PATCH vfio 03/11] virtio-pci: Introduce admin virtqueue

2023-09-26 Thread Feng Liu via Virtualization



On 2023-09-21 a.m.9:57, Michael S. Tsirkin wrote:



On Thu, Sep 21, 2023 at 03:40:32PM +0300, Yishai Hadas wrote:

From: Feng Liu 

Introduce support for the admin virtqueue. By negotiating the
VIRTIO_F_ADMIN_VQ feature, the driver detects the capability and creates
one administration virtqueue. Implementing the administration virtqueue
in the generic virtio-pci layer enables multiple types of upper-layer
drivers, such as vfio, net and blk, to utilize it.

Signed-off-by: Feng Liu 
Reviewed-by: Parav Pandit 
Reviewed-by: Jiri Pirko 
Signed-off-by: Yishai Hadas 
---
  drivers/virtio/Makefile|  2 +-
  drivers/virtio/virtio.c| 37 +--
  drivers/virtio/virtio_pci_common.h | 15 +-
  drivers/virtio/virtio_pci_modern.c | 10 +++-
  drivers/virtio/virtio_pci_modern_avq.c | 65 ++


if you have a .c file without a .h file you know there's something
fishy. Just add this inside drivers/virtio/virtio_pci_modern.c ?


Will do.


  include/linux/virtio_config.h  |  4 ++
  include/linux/virtio_pci_modern.h  |  3 ++
  7 files changed, 129 insertions(+), 7 deletions(-)
  create mode 100644 drivers/virtio/virtio_pci_modern_avq.c

diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
index 8e98d24917cc..dcc535b5b4d9 100644
--- a/drivers/virtio/Makefile
+++ b/drivers/virtio/Makefile
@@ -5,7 +5,7 @@ obj-$(CONFIG_VIRTIO_PCI_LIB) += virtio_pci_modern_dev.o
  obj-$(CONFIG_VIRTIO_PCI_LIB_LEGACY) += virtio_pci_legacy_dev.o
  obj-$(CONFIG_VIRTIO_MMIO) += virtio_mmio.o
  obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o
-virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
+virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o virtio_pci_modern_avq.o
  virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
  obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
  obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 3893dc29eb26..f4080692b351 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -302,9 +302,15 @@ static int virtio_dev_probe(struct device *_d)
   if (err)
   goto err;

+ if (dev->config->create_avq) {
+ err = dev->config->create_avq(dev);
+ if (err)
+ goto err;
+ }
+
   err = drv->probe(dev);
   if (err)
- goto err;
+ goto err_probe;

   /* If probe didn't do it, mark device DRIVER_OK ourselves. */
   if (!(dev->config->get_status(dev) & VIRTIO_CONFIG_S_DRIVER_OK))
@@ -316,6 +322,10 @@ static int virtio_dev_probe(struct device *_d)
   virtio_config_enable(dev);

   return 0;
+
+err_probe:
+ if (dev->config->destroy_avq)
+ dev->config->destroy_avq(dev);
  err:
   virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
   return err;
@@ -331,6 +341,9 @@ static void virtio_dev_remove(struct device *_d)

   drv->remove(dev);

+ if (dev->config->destroy_avq)
+ dev->config->destroy_avq(dev);
+
   /* Driver should have reset device. */
   WARN_ON_ONCE(dev->config->get_status(dev));

@@ -489,13 +502,20 @@ EXPORT_SYMBOL_GPL(unregister_virtio_device);
  int virtio_device_freeze(struct virtio_device *dev)
  {
   struct virtio_driver *drv = drv_to_virtio(dev->dev.driver);
+ int ret;

   virtio_config_disable(dev);

   dev->failed = dev->config->get_status(dev) & VIRTIO_CONFIG_S_FAILED;

- if (drv && drv->freeze)
- return drv->freeze(dev);
+ if (drv && drv->freeze) {
+ ret = drv->freeze(dev);
+ if (ret)
+ return ret;
+ }
+
+ if (dev->config->destroy_avq)
+ dev->config->destroy_avq(dev);

   return 0;
  }
@@ -532,10 +552,16 @@ int virtio_device_restore(struct virtio_device *dev)
   if (ret)
   goto err;

+ if (dev->config->create_avq) {
+ ret = dev->config->create_avq(dev);
+ if (ret)
+ goto err;
+ }
+
   if (drv->restore) {
   ret = drv->restore(dev);
   if (ret)
- goto err;
+ goto err_restore;
   }

   /* If restore didn't do it, mark device DRIVER_OK ourselves. */
@@ -546,6 +572,9 @@ int virtio_device_restore(struct virtio_device *dev)

   return 0;

+err_restore:
+ if (dev->config->destroy_avq)
+ dev->config->destroy_avq(dev);
  err:
   virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
   return ret;
diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h
index 602021967aaa..9bffa95274b6 100644
--- a/drivers/virtio/virtio_pci_common.h
+++ b/drivers/virtio/virtio_pci_common.h
@@ -41,6 +41,14 @@ struct virtio_pci_vq_info {
   unsigned int msix_vector;
  };

+struct virtio_avq {


admin_vq would be better. and this is pci specific yes? so virtio_pci_



Will do.



Re: [PATCH vfio 01/11] virtio-pci: Use virtio pci device layer vq info instead of generic one

2023-09-26 Thread Feng Liu via Virtualization




On 2023-09-21 a.m.9:46, Michael S. Tsirkin wrote:



On Thu, Sep 21, 2023 at 03:40:30PM +0300, Yishai Hadas wrote:

From: Feng Liu 

Currently the VQ deletion callback vp_del_vqs() processes the generic
virtio_device-level VQ list instead of the VQ information available at
the PCI layer.

To adhere to the layering, use the PCI-device-level VQ information
stored in the virtqueues or vqs.

This also prepares the code to handle the PCI-layer admin vq life cycle
within the PCI layer, and thereby avoid undesired deletion of the admin
vq by upper-layer drivers (net, console, vfio) in the del_vqs()
callback.



Signed-off-by: Feng Liu 
Reviewed-by: Parav Pandit 
Reviewed-by: Jiri Pirko 
Signed-off-by: Yishai Hadas 
---
  drivers/virtio/virtio_pci_common.c | 12 +---
  drivers/virtio/virtio_pci_common.h |  1 +
  2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
index c2524a7207cf..7a3e6edc4dd6 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -232,12 +232,16 @@ static void vp_del_vq(struct virtqueue *vq)
  void vp_del_vqs(struct virtio_device *vdev)
  {
   struct virtio_pci_device *vp_dev = to_vp_device(vdev);
- struct virtqueue *vq, *n;
+ struct virtqueue *vq;
   int i;

- list_for_each_entry_safe(vq, n, &vdev->vqs, list) {
+ for (i = 0; i < vp_dev->nvqs; i++) {
+ if (!vp_dev->vqs[i])
+ continue;
+
+ vq = vp_dev->vqs[i]->vq;
   if (vp_dev->per_vq_vectors) {
- int v = vp_dev->vqs[vq->index]->msix_vector;
+ int v = vp_dev->vqs[i]->msix_vector;

   if (v != VIRTIO_MSI_NO_VECTOR) {
   int irq = pci_irq_vector(vp_dev->pci_dev, v);
@@ -294,6 +298,7 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, unsigned int nvqs,
   vp_dev->vqs = kcalloc(nvqs, sizeof(*vp_dev->vqs), GFP_KERNEL);
   if (!vp_dev->vqs)
   return -ENOMEM;
+ vp_dev->nvqs = nvqs;

   if (per_vq_vectors) {
   /* Best option: one for change interrupt, one per vq. */
@@ -365,6 +370,7 @@ static int vp_find_vqs_intx(struct virtio_device *vdev, unsigned int nvqs,
   vp_dev->vqs = kcalloc(nvqs, sizeof(*vp_dev->vqs), GFP_KERNEL);
   if (!vp_dev->vqs)
   return -ENOMEM;
+ vp_dev->nvqs = nvqs;

   err = request_irq(vp_dev->pci_dev->irq, vp_interrupt, IRQF_SHARED,
   dev_name(&vdev->dev), vp_dev);
diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h
index 4b773bd7c58c..602021967aaa 100644
--- a/drivers/virtio/virtio_pci_common.h
+++ b/drivers/virtio/virtio_pci_common.h
@@ -60,6 +60,7 @@ struct virtio_pci_device {

   /* array of all queues for house-keeping */
   struct virtio_pci_vq_info **vqs;
+ u32 nvqs;


I don't much like it that we are adding more duplicated info here.
In fact, we tried removing the vqs array in
5c34d002dcc7a6dd665a19d098b4f4cd5501ba1a - there was some bug in that
patch and the author didn't have the time to debug
so I reverted but I don't really think we need to add to that.



Hi Michael

As explained in the commit message, this patch mainly prepares for the
subsequent admin vq patches.


The admin vq is also established using the common vring mechanism, and
is added to vdev->vqs in __vring_new_virtqueue(). So vdev->vqs contains
all virtqueues, including rxq, txq, ctrl vq and the admin vq.


The admin vq should be managed by the virtio_pci layer and should not be
created or deleted by an upper driver (net, blk). When an upper driver
is unloaded, it calls the del_vqs() interface, which calls vp_del_vqs(),
and vp_del_vqs() should not delete the admin vq, but only the virtqueues
created by the upper driver, such as rxq, txq and ctrlq.


The vp_dev->vqs[] array contains only the virtqueues created by the
upper driver, such as rxq, txq and ctrlq. Traversing the vp_dev->vqs
array deletes only those upper virtqueues, without touching the admin
vq. Using the vdev->vqs linked list cannot meet this need.


Does that explain it clearly? Or do you have another approach in mind?
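The split Feng describes can be sketched in plain userspace C (all types and names hypothetical; the real code lives in virtio_pci_common.c): the PCI-layer array tracks only the queues the upper driver created, so deleting by that array naturally leaves the admin vq alone.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the two bookkeeping structures: vdev->vqs (a list of every
 * queue, admin vq included) versus vp_dev->vqs/nvqs (an array holding
 * only the upper driver's queues).  Types are invented for illustration. */
enum vq_kind { VQ_RX, VQ_TX, VQ_CTRL, VQ_ADMIN };

struct vq_model {
	enum vq_kind kind;
	int deleted;
};

/* Model of the patched vp_del_vqs(): walk the PCI-layer array, so only
 * upper-driver queues are deleted; the admin vq is never in this array. */
static void del_driver_vqs(struct vq_model **driver_vqs, size_t nvqs)
{
	size_t i;

	for (i = 0; i < nvqs; i++)
		if (driver_vqs[i])
			driver_vqs[i]->deleted = 1;
}
```

Iterating the device-level list instead would visit (and delete) the admin vq as well, which is exactly what the patch avoids.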




   /* MSI-X support */
   int msix_enabled;
--
2.27.0



https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] crypto: virtio-crypto: call finalize with bh disabled

2023-09-26 Thread Michael S. Tsirkin
On Tue, Sep 26, 2023 at 06:41:58PM +0200, Halil Pasic wrote:
> [..]
> > --- a/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c
> > +++ b/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c
> > @@ -61,8 +61,9 @@ static void virtio_crypto_akcipher_finalize_req(
> > vc_akcipher_req->src_buf = NULL;
> > vc_akcipher_req->dst_buf = NULL;
> > virtcrypto_clear_request(&vc_akcipher_req->base);
> > -
> > +   local_bh_disable();
> > crypto_finalize_akcipher_request(vc_akcipher_req->base.dataq->engine, req, err);
> > +   local_bh_enable();
> 
> Thanks Gonglei!
> 
> I did this a quick spin, and it does not seem to be sufficient on s390x.
> Which does not come as a surprise to me, because 
> 
> #define lockdep_assert_in_softirq() \
> do {\
> WARN_ON_ONCE(__lockdep_enabled  &&  \
>  (!in_softirq() || in_irq() || in_nmi()));  \
> } while (0)
> 
> will still warn because  in_irq() still evaluates to true (your patch
> addresses the !in_softirq() part).
> 
> I don't have any results on x86 yet. My current understanding is that the
> virtio-pci transport code disables interrupts locally somewhere in the
> call chain (actually in vp_vring_interrupt() via spin_lock_irqsave())
> and then x86 would be fine. But I will get that verified.
> 
> On the other hand virtio_airq_handler() calls vring_interrupt() with
> interrupts enabled. (While vring_interrupt() is called in a (read)
> critical section in virtio_airq_handler() we use read_lock() and
> not read_lock_irqsave() to grab the lock. Whether that is correct in
> it self (i.e. disregarding the crypto problem) or not I'm not sure right
> now. Will think some more about it tomorrow.) If the way to go forward
> is disabling interrupts in virtio-ccw before vring_interrupt() is
> called, I would be glad to spin a patch for that.
> 
> Copying Conny, as she may have an opinion on this (if I'm not wrong she
> authored that code).
> 
> Regards,
> Halil

On a related note, config change callback is also handled incorrectly
in this driver, it takes a mutex from interrupt context.

https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices

2023-09-26 Thread Michael S. Tsirkin
On Tue, Sep 26, 2023 at 06:20:45PM +0300, Yishai Hadas wrote:
> On 21/09/2023 22:58, Alex Williamson wrote:
> > On Thu, 21 Sep 2023 15:40:40 +0300
> > Yishai Hadas  wrote:
> > 
> > > Introduce a vfio driver over virtio devices to support the legacy
> > > interface functionality for VFs.
> > > 
> > > Background, from the virtio spec [1].
> > > 
> > > In some systems, there is a need to support a virtio legacy driver with
> > > a device that does not directly support the legacy interface. In such
> > > scenarios, a group owner device can provide the legacy interface
> > > functionality for the group member devices. The driver of the owner
> > > device can then access the legacy interface of a member device on behalf
> > > of the legacy member device driver.
> > > 
> > > For example, with the SR-IOV group type, group members (VFs) can not
> > > present the legacy interface in an I/O BAR in BAR0 as expected by the
> > > legacy pci driver. If the legacy driver is running inside a virtual
> > > machine, the hypervisor executing the virtual machine can present a
> > > virtual device with an I/O BAR in BAR0. The hypervisor intercepts the
> > > legacy driver accesses to this I/O BAR and forwards them to the group
> > > owner device (PF) using group administration commands.
> > > 
> > > 
> > > Specifically, this driver adds support for a virtio-net VF to be exposed
> > > as a transitional device to a guest driver and allows the legacy IO BAR
> > > functionality on top.
> > > 
> > > This allows a VM which uses a legacy virtio-net driver in the guest to
> > > work transparently over a VF which its driver in the host is that new
> > > driver.
> > > 
> > > The driver can be extended easily to support some other types of virtio
> > > devices (e.g virtio-blk), by adding in a few places the specific type
> > > properties as was done for virtio-net.
> > > 
> > > For now, only the virtio-net use case was tested and as such we introduce
> > > the support only for such a device.
> > > 
> > > Practically,
> > > Upon probing a VF for a virtio-net device, in case its PF supports
> > > legacy access over the virtio admin commands and the VF doesn't have BAR
> > > 0, we set some specific 'vfio_device_ops' to be able to simulate in SW a
> > > transitional device with I/O BAR in BAR 0.
> > > 
> > > The existence of the simulated I/O bar is reported later on by
> > > overwriting the VFIO_DEVICE_GET_REGION_INFO command and the device
> > > exposes itself as a transitional device by overwriting some properties
> > > upon reading its config space.
> > > 
> > > Once we report the existence of I/O BAR as BAR 0 a legacy driver in the
> > > guest may use it via read/write calls according to the virtio
> > > specification.
> > > 
> > > Any read/write towards the control parts of the BAR will be captured by
> > > the new driver and will be translated into admin commands towards the
> > > device.
> > > 
> > > Any data path read/write access (i.e. virtio driver notifications) will
> > > be forwarded to the physical BAR which its properties were supplied by
> > > the command VIRTIO_PCI_QUEUE_NOTIFY upon the probing/init flow.
> > > 
> > > With that code in place a legacy driver in the guest has the look and
> > > feel as if having a transitional device with legacy support for both its
> > > control and data path flows.
> > > 
> > > [1]
> > > https://github.com/oasis-tcs/virtio-spec/commit/03c2d32e5093ca9f2a17797242fbef88efe94b8c
> > > 
> > > Signed-off-by: Yishai Hadas 
> > > ---
> > >   MAINTAINERS  |   6 +
> > >   drivers/vfio/pci/Kconfig |   2 +
> > >   drivers/vfio/pci/Makefile|   2 +
> > >   drivers/vfio/pci/virtio/Kconfig  |  15 +
> > >   drivers/vfio/pci/virtio/Makefile |   4 +
> > >   drivers/vfio/pci/virtio/cmd.c|   4 +-
> > >   drivers/vfio/pci/virtio/cmd.h|   8 +
> > >   drivers/vfio/pci/virtio/main.c   | 546 +++
> > >   8 files changed, 585 insertions(+), 2 deletions(-)
> > >   create mode 100644 drivers/vfio/pci/virtio/Kconfig
> > >   create mode 100644 drivers/vfio/pci/virtio/Makefile
> > >   create mode 100644 drivers/vfio/pci/virtio/main.c
> > > 
> > > diff --git a/MAINTAINERS b/MAINTAINERS
> > > index bf0f54c24f81..5098418c8389 100644
> > > --- a/MAINTAINERS
> > > +++ b/MAINTAINERS
> > > @@ -22624,6 +22624,12 @@ L:   k...@vger.kernel.org
> > >   S:  Maintained
> > >   F:  drivers/vfio/pci/mlx5/
> > > +VFIO VIRTIO PCI DRIVER
> > > +M:   Yishai Hadas 
> > > +L:   k...@vger.kernel.org
> > > +S:   Maintained
> > > +F:   drivers/vfio/pci/virtio
> > > +
> > >   VFIO PCI DEVICE SPECIFIC DRIVERS
> > >   R:  Jason Gunthorpe 
> > >   R:  Yishai Hadas 
> > > diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
> > > index 8125e5f37832..18c397df566d 100644
> > > --- 

Re: [PATCH] crypto: virtio-crypto: call finalize with bh disabled

2023-09-26 Thread Halil Pasic
[..]
> --- a/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c
> +++ b/drivers/crypto/virtio/virtio_crypto_akcipher_algs.c
> @@ -61,8 +61,9 @@ static void virtio_crypto_akcipher_finalize_req(
>   vc_akcipher_req->src_buf = NULL;
>   vc_akcipher_req->dst_buf = NULL;
>   virtcrypto_clear_request(&vc_akcipher_req->base);
> -
> + local_bh_disable();
>   crypto_finalize_akcipher_request(vc_akcipher_req->base.dataq->engine, req, err);
> + local_bh_enable();

Thanks Gonglei!

I did this a quick spin, and it does not seem to be sufficient on s390x.
Which does not come as a surprise to me, because 

#define lockdep_assert_in_softirq() \
do {\
WARN_ON_ONCE(__lockdep_enabled  &&  \
 (!in_softirq() || in_irq() || in_nmi()));  \
} while (0)

will still warn because  in_irq() still evaluates to true (your patch
addresses the !in_softirq() part).
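The condition in the quoted macro can be modelled directly (a sketch of the check only, not the kernel code): local_bh_disable() makes in_softirq() true but does nothing about in_irq(), so the warning still fires from hard-irq context.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model of lockdep_assert_in_softirq()'s condition: it warns
 * when we are not in softirq context, or when we are in hard-irq or NMI
 * context. */
static bool would_warn(bool in_softirq, bool in_irq, bool in_nmi)
{
	return !in_softirq || in_irq || in_nmi;
}
```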

I don't have any results on x86 yet. My current understanding is that the
virtio-pci transport code disables interrupts locally somewhere in the
call chain (actually in vp_vring_interrupt() via spin_lock_irqsave())
and then x86 would be fine. But I will get that verified.

On the other hand virtio_airq_handler() calls vring_interrupt() with
interrupts enabled. (While vring_interrupt() is called in a (read)
critical section in virtio_airq_handler() we use read_lock() and
not read_lock_irqsave() to grab the lock. Whether that is correct in
it self (i.e. disregarding the crypto problem) or not I'm not sure right
now. Will think some more about it tomorrow.) If the way to go forward
is disabling interrupts in virtio-ccw before vring_interrupt() is
called, I would be glad to spin a patch for that.

Copying Conny, as she may have an opinion on this (if I'm not wrong she
authored that code).

Regards,
Halil
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH V3] io_uring: fix IO hang in io_wq_put_and_exit from do_exit()

2023-09-26 Thread Stefan Hajnoczi
On Tue, Sep 26, 2023 at 09:28:15AM +0800, Ming Lei wrote:
> On Mon, Sep 25, 2023 at 05:17:10PM -0400, Stefan Hajnoczi wrote:
> > On Fri, Sep 15, 2023 at 03:04:05PM +0800, Jason Wang wrote:
> > > On Fri, Sep 8, 2023 at 11:25 PM Ming Lei  wrote:
> > > >
> > > > On Fri, Sep 08, 2023 at 08:44:45AM -0600, Jens Axboe wrote:
> > > > > On 9/8/23 8:34 AM, Ming Lei wrote:
> > > > > > On Fri, Sep 08, 2023 at 07:49:53AM -0600, Jens Axboe wrote:
> > > > > >> On 9/8/23 3:30 AM, Ming Lei wrote:
> > > > > >>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> > > > > >>> index ad636954abae..95a3d31a1ef1 100644
> > > > > >>> --- a/io_uring/io_uring.c
> > > > > >>> +++ b/io_uring/io_uring.c
> > > > > >>> @@ -1930,6 +1930,10 @@ void io_wq_submit_work(struct io_wq_work *work)
> > > > > >>>   }
> > > > > >>>   }
> > > > > >>>
> > > > > >>> + /* It is fragile to block POLLED IO, so switch to NON_BLOCK */
> > > > > >>> + if ((req->ctx->flags & IORING_SETUP_IOPOLL) && def->iopoll_queue)
> > > > > >>> + issue_flags |= IO_URING_F_NONBLOCK;
> > > > > >>> +
> > > > > >>
> > > > > >> I think this comment deserves to be more descriptive. Normally we
> > > > > >> absolutely cannot block for polled IO, it's only OK here because 
> > > > > >> io-wq
> > > > > >
> > > > > > Yeah, we don't do that until commit 2bc057692599 ("block: don't 
> > > > > > make REQ_POLLED
> > > > > > imply REQ_NOWAIT") which actually push the responsibility/risk up to
> > > > > > io_uring.
> > > > > >
> > > > > >> is the issuer and not necessarily the poller of it. That generally 
> > > > > >> falls
> > > > > >> upon the original issuer to poll these requests.
> > > > > >>
> > > > > >> I think this should be a separate commit, coming before the main 
> > > > > >> fix
> > > > > >> which is below.
> > > > > >
> > > > > > Looks fine, actually IO_URING_F_NONBLOCK change isn't a must, and 
> > > > > > the
> > > > > > approach in V2 doesn't need this change.
> > > > > >
> > > > > >>
> > > > > >>> @@ -3363,6 +3367,12 @@ __cold void io_uring_cancel_generic(bool cancel_all, struct io_sq_data *sqd)
> > > > > >>>   finish_wait(&tctx->wait, &wait);
> > > > > >>>   } while (1);
> > > > > >>>
> > > > > >>> + /*
> > > > > >>> +  * Reap events from each ctx, otherwise these requests may take
> > > > > >>> +  * resources and prevent other contexts from being moved on.
> > > > > >>> +  */
> > > > > >>> + xa_for_each(&tctx->xa, index, node)
> > > > > >>> + io_iopoll_try_reap_events(node->ctx);
> > > > > >>
> > > > > >> The main issue here is that if someone isn't polling for them, 
> > > > > >> then we
> > > > > >
> > > > > > That is actually what this patch is addressing, :-)
> > > > >
> > > > > Right, that part is obvious :)
> > > > >
> > > > > >> get to wait for a timeout before they complete. This can delay 
> > > > > >> exit, for
> > > > > >> example, as we're now just waiting 30 seconds (or whatever the 
> > > > > >> timeout
> > > > > >> is on the underlying device) for them to get timed out before exit 
> > > > > >> can
> > > > > >> finish.
> > > > > >
> > > > > > For the issue on null_blk, device timeout handler provides
> > > > > > forward-progress, such as requests are released, so new IO can be
> > > > > > handled.
> > > > > >
> > > > > > However, not all devices support timeout, such as virtio device.
> > > > >
> > > > > That's a bug in the driver, you cannot sanely support polled IO and 
> > > > > not
> > > > > be able to deal with timeouts. Someone HAS to reap the requests and
> > > > > there are only two things that can do that - the application doing the
> > > > > polled IO, or if that doesn't happen, a timeout.
> > > >
> > > > OK, then device driver timeout handler has new responsibility of 
> > > > covering
> > > > userspace accident, :-)
> > 
> > Sorry, I don't have enough context so this is probably a silly question:
> > 
> > When an application doesn't reap a polled request, why doesn't the block
> > layer take care of this in a generic way? I don't see anything
> > driver-specific about this.
> 
> block layer doesn't have knowledge to handle that, io_uring knows the
> application is exiting, and can help to reap the events.

I thought the discussion was about I/O timeouts in general but here
you're only mentioning application exit. Are we talking about I/O
timeouts or purely about cleaning up I/O requests when an application
exits?

> 
> But the big question is whether there really is an IO timeout for virtio-blk.
> If there is, the reap done in io_uring may never return and cause other
> issues, so if it is done in io_uring, it can only be regarded as a sort of
> improvement.

virtio-blk drivers have no way of specifying timeouts on the device or
aborting/canceling requests.

virtio-blk devices may fail requests if they implement an internal
timeout mechanism (e.g. the host kernel fails requests after a host
timeout), but this is not controlled by the driver and there is no
guarantee that the device implements such an internal timeout at all.

Re: [PATCH vhost v13 12/12] virtio_net: merge dma operations when filling mergeable buffers

2023-09-26 Thread Michael S. Tsirkin
On Thu, Aug 10, 2023 at 08:30:57PM +0800, Xuan Zhuo wrote:
> Currently, the virtio core performs a dma operation for each
> buffer, even though the same page may be operated on multiple times.
> 
> With this patch, the driver does the dma operation itself and manages
> the dma addresses, based on the premapped feature of the virtio core.
> 
> This way, we can perform only one dma operation for the pages of the
> alloc frag. This is beneficial for the iommu device.
> 
> kernel command line: intel_iommu=on iommu.passthrough=0
> 
>|  strict=0  | strict=1
> Before |  775496pps | 428614pps
> After  | 1109316pps | 742853pps
> 
> Signed-off-by: Xuan Zhuo 

Hi Xuan Zhuo,
looks like this patch is causing regressions. Do you have time to debug
or should I revert?

> ---
>  drivers/net/virtio_net.c | 228 ++-
>  1 file changed, 202 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 486b5849033d..16adb5ef18f8 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -126,6 +126,14 @@ static const struct virtnet_stat_desc 
> virtnet_rq_stats_desc[] = {
>  #define VIRTNET_SQ_STATS_LEN ARRAY_SIZE(virtnet_sq_stats_desc)
>  #define VIRTNET_RQ_STATS_LEN ARRAY_SIZE(virtnet_rq_stats_desc)
>  
> +/* The dma information of pages allocated at a time. */
> +struct virtnet_rq_dma {
> + dma_addr_t addr;
> + u32 ref;
> + u16 len;
> + u16 need_sync;
> +};
> +
>  /* Internal representation of a send virtqueue */
>  struct send_queue {
>   /* Virtqueue associated with this send _queue */
> @@ -175,6 +183,12 @@ struct receive_queue {
>   char name[16];
>  
>   struct xdp_rxq_info xdp_rxq;
> +
> + /* Record the last dma info to free after new pages is allocated. */
> + struct virtnet_rq_dma *last_dma;
> +
> + /* Do dma by self */
> + bool do_dma;
>  };
>  
>  /* This structure can contain rss message with maximum settings for 
> indirection table and keysize
> @@ -549,6 +563,156 @@ static struct sk_buff *page_to_skb(struct virtnet_info 
> *vi,
>   return skb;
>  }
>  
> +static void virtnet_rq_unmap(struct receive_queue *rq, void *buf, u32 len)
> +{
> + struct page *page = virt_to_head_page(buf);
> + struct virtnet_rq_dma *dma;
> + void *head;
> + int offset;
> +
> + head = page_address(page);
> +
> + dma = head;
> +
> + --dma->ref;
> +
> + if (dma->ref) {
> + if (dma->need_sync && len) {
> + offset = buf - (head + sizeof(*dma));
> +
> + virtqueue_dma_sync_single_range_for_cpu(rq->vq, 
> dma->addr, offset,
> + len, 
> DMA_FROM_DEVICE);
> + }
> +
> + return;
> + }
> +
> + virtqueue_dma_unmap_single_attrs(rq->vq, dma->addr, dma->len,
> +  DMA_FROM_DEVICE, 
> DMA_ATTR_SKIP_CPU_SYNC);
> + put_page(page);
> +}
> +
> +static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void 
> **ctx)
> +{
> + void *buf;
> +
> + buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> + if (buf && rq->do_dma)
> + virtnet_rq_unmap(rq, buf, *len);
> +
> + return buf;
> +}
> +
> +static void *virtnet_rq_detach_unused_buf(struct receive_queue *rq)
> +{
> + void *buf;
> +
> + buf = virtqueue_detach_unused_buf(rq->vq);
> + if (buf && rq->do_dma)
> + virtnet_rq_unmap(rq, buf, 0);
> +
> + return buf;
> +}
> +
> +static void virtnet_rq_init_one_sg(struct receive_queue *rq, void *buf, u32 
> len)
> +{
> + struct virtnet_rq_dma *dma;
> + dma_addr_t addr;
> + u32 offset;
> + void *head;
> +
> + if (!rq->do_dma) {
> + sg_init_one(rq->sg, buf, len);
> + return;
> + }
> +
> + head = page_address(rq->alloc_frag.page);
> +
> + offset = buf - head;
> +
> + dma = head;
> +
> + addr = dma->addr - sizeof(*dma) + offset;
> +
> + sg_init_table(rq->sg, 1);
> + rq->sg[0].dma_address = addr;
> + rq->sg[0].length = len;
> +}
> +
> +static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
> +{
> + struct page_frag *alloc_frag = &rq->alloc_frag;
> + struct virtnet_rq_dma *dma;
> + void *buf, *head;
> + dma_addr_t addr;
> +
> + if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp)))
> + return NULL;
> +
> + head = page_address(alloc_frag->page);
> +
> + if (rq->do_dma) {
> + dma = head;
> +
> + /* new pages */
> + if (!alloc_frag->offset) {
> + if (rq->last_dma) {
> + /* Now, the new page is allocated, the last dma
> +  * will not be used. So the dma can be unmapped
> +  * if the ref is 0.
> +  */
> + virtnet_rq_unmap(rq, rq->last_dma, 0);
> +

Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices

2023-09-26 Thread Yishai Hadas via Virtualization

On 21/09/2023 22:58, Alex Williamson wrote:

On Thu, 21 Sep 2023 15:40:40 +0300
Yishai Hadas  wrote:


Introduce a vfio driver over virtio devices to support the legacy
interface functionality for VFs.

Background, from the virtio spec [1].

In some systems, there is a need to support a virtio legacy driver with
a device that does not directly support the legacy interface. In such
scenarios, a group owner device can provide the legacy interface
functionality for the group member devices. The driver of the owner
device can then access the legacy interface of a member device on behalf
of the legacy member device driver.

For example, with the SR-IOV group type, group members (VFs) can not
present the legacy interface in an I/O BAR in BAR0 as expected by the
legacy pci driver. If the legacy driver is running inside a virtual
machine, the hypervisor executing the virtual machine can present a
virtual device with an I/O BAR in BAR0. The hypervisor intercepts the
legacy driver accesses to this I/O BAR and forwards them to the group
owner device (PF) using group administration commands.


Specifically, this driver adds support for a virtio-net VF to be exposed
as a transitional device to a guest driver and allows the legacy IO BAR
functionality on top.

This allows a VM which uses a legacy virtio-net driver in the guest to
work transparently over a VF whose host-side driver is this new driver.

The driver can be extended easily to support some other types of virtio
devices (e.g virtio-blk), by adding in a few places the specific type
properties as was done for virtio-net.

For now, only the virtio-net use case has been tested, and as such we
introduce support only for that device.

Practically,
Upon probing a VF for a virtio-net device, in case its PF supports
legacy access over the virtio admin commands and the VF doesn't have BAR
0, we set some specific 'vfio_device_ops' to be able to simulate in SW a
transitional device with I/O BAR in BAR 0.

The existence of the simulated I/O bar is reported later on by
overwriting the VFIO_DEVICE_GET_REGION_INFO command and the device
exposes itself as a transitional device by overwriting some properties
upon reading its config space.

Once we report the existence of I/O BAR as BAR 0 a legacy driver in the
guest may use it via read/write calls according to the virtio
specification.

Any read/write towards the control parts of the BAR will be captured by
the new driver and will be translated into admin commands towards the
device.

Any data path read/write access (i.e. virtio driver notifications) will
be forwarded to the physical BAR which its properties were supplied by
the command VIRTIO_PCI_QUEUE_NOTIFY upon the probing/init flow.

With that code in place a legacy driver in the guest has the look and
feel as if having a transitional device with legacy support for both its
control and data path flows.

[1]
https://github.com/oasis-tcs/virtio-spec/commit/03c2d32e5093ca9f2a17797242fbef88efe94b8c

Signed-off-by: Yishai Hadas 
---
  MAINTAINERS  |   6 +
  drivers/vfio/pci/Kconfig |   2 +
  drivers/vfio/pci/Makefile|   2 +
  drivers/vfio/pci/virtio/Kconfig  |  15 +
  drivers/vfio/pci/virtio/Makefile |   4 +
  drivers/vfio/pci/virtio/cmd.c|   4 +-
  drivers/vfio/pci/virtio/cmd.h|   8 +
  drivers/vfio/pci/virtio/main.c   | 546 +++
  8 files changed, 585 insertions(+), 2 deletions(-)
  create mode 100644 drivers/vfio/pci/virtio/Kconfig
  create mode 100644 drivers/vfio/pci/virtio/Makefile
  create mode 100644 drivers/vfio/pci/virtio/main.c

diff --git a/MAINTAINERS b/MAINTAINERS
index bf0f54c24f81..5098418c8389 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -22624,6 +22624,12 @@ L: k...@vger.kernel.org
  S:Maintained
  F:drivers/vfio/pci/mlx5/
  
+VFIO VIRTIO PCI DRIVER
+M: Yishai Hadas 
+L: k...@vger.kernel.org
+S: Maintained
+F: drivers/vfio/pci/virtio
+
  VFIO PCI DEVICE SPECIFIC DRIVERS
  R:Jason Gunthorpe 
  R:Yishai Hadas 
diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
index 8125e5f37832..18c397df566d 100644
--- a/drivers/vfio/pci/Kconfig
+++ b/drivers/vfio/pci/Kconfig
@@ -65,4 +65,6 @@ source "drivers/vfio/pci/hisilicon/Kconfig"
  
  source "drivers/vfio/pci/pds/Kconfig"
  
+source "drivers/vfio/pci/virtio/Kconfig"
+
  endmenu
diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
index 45167be462d8..046139a4eca5 100644
--- a/drivers/vfio/pci/Makefile
+++ b/drivers/vfio/pci/Makefile
@@ -13,3 +13,5 @@ obj-$(CONFIG_MLX5_VFIO_PCI)   += mlx5/
  obj-$(CONFIG_HISI_ACC_VFIO_PCI) += hisilicon/
  
  obj-$(CONFIG_PDS_VFIO_PCI) += pds/
+
+obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio/
diff --git a/drivers/vfio/pci/virtio/Kconfig b/drivers/vfio/pci/virtio/Kconfig
new file mode 100644
index 

Re: [PATCH net-next v1 00/12] vsock/virtio: continue MSG_ZEROCOPY support

2023-09-26 Thread Stefano Garzarella

Hi Arseniy,

On Fri, Sep 22, 2023 at 08:24:16AM +0300, Arseniy Krasnov wrote:

Hello,

this patchset contains second and third parts of another big patchset
for MSG_ZEROCOPY flag support:
https://lore.kernel.org/netdev/20230701063947.3422088-1-avkras...@sberdevices.ru/

During review of this series, Stefano Garzarella 
suggested to split it for three parts to simplify review and merging:

1) virtio and vhost updates (for fragged skbs) (merged to net-next, see
  link below)
2) AF_VSOCK updates (allows to enable MSG_ZEROCOPY mode and read
  tx completions) and update for Documentation/. <-- this patchset
3) Updates for tests and utils. <-- this patchset

Part 1) was merged:
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=71b263e79370348349553ecdf46f4a69eb436dc7

Head for this patchset is:
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=71b263e79370348349553ecdf46f4a69eb436dc7


Thanks for the series.
I did a quick review highlighting some things that need to be changed.

Overall, the series seems to be in good shape. The tests went well.

In the next few days I'll see if I can get a better look at the larger 
patches like the tests, or I'll check in the next version.


Thanks,
Stefano

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net-next v1 12/12] test/vsock: io_uring rx/tx tests

2023-09-26 Thread Stefano Garzarella

On Fri, Sep 22, 2023 at 08:24:28AM +0300, Arseniy Krasnov wrote:

This adds set of tests which use io_uring for rx/tx. This test suite is
implemented as separated util like 'vsock_test' and has the same set of
input arguments as 'vsock_test'. These tests only cover cases of data
transmission (no connect/bind/accept etc).

Signed-off-by: Arseniy Krasnov 
---
Changelog:
v5(big patchset) -> v1:
 * Use LDLIBS instead of LDFLAGS.

tools/testing/vsock/Makefile   |   7 +-
tools/testing/vsock/vsock_uring_test.c | 321 +
2 files changed, 327 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/vsock/vsock_uring_test.c

diff --git a/tools/testing/vsock/Makefile b/tools/testing/vsock/Makefile
index 1a26f60a596c..c84380bfc18d 100644
--- a/tools/testing/vsock/Makefile
+++ b/tools/testing/vsock/Makefile
@@ -1,12 +1,17 @@
# SPDX-License-Identifier: GPL-2.0-only
+ifeq ($(MAKECMDGOALS),vsock_uring_test)
+LDLIBS = -luring
+endif
+


This will fails if for example we call make with more targets,
e.g. `make vsock_test vsock_uring_test`.

I'd suggest to use something like this:

--- a/tools/testing/vsock/Makefile
+++ b/tools/testing/vsock/Makefile
@@ -1,13 +1,11 @@
 # SPDX-License-Identifier: GPL-2.0-only
-ifeq ($(MAKECMDGOALS),vsock_uring_test)
-LDLIBS = -luring
-endif
-
 all: test vsock_perf
 test: vsock_test vsock_diag_test
 vsock_test: vsock_test.o vsock_test_zerocopy.o timeout.o control.o util.o
 vsock_diag_test: vsock_diag_test.o timeout.o control.o util.o
 vsock_perf: vsock_perf.o
+
+vsock_uring_test: LDLIBS = -luring
 vsock_uring_test: control.o util.o vsock_uring_test.o timeout.o

 CFLAGS += -g -O2 -Werror -Wall -I. -I../../include -I../../../usr/include 
-Wno-pointer-sign -fno-strict-overflow -fno-strict-aliasing -fno-common -MMD 
-U_FORTIFY_SOURCE -D_GNU_SOURCE


all: test vsock_perf
test: vsock_test vsock_diag_test
vsock_test: vsock_test.o vsock_test_zerocopy.o timeout.o control.o util.o
vsock_diag_test: vsock_diag_test.o timeout.o control.o util.o
vsock_perf: vsock_perf.o
+vsock_uring_test: control.o util.o vsock_uring_test.o timeout.o


Shoud we add this new test to the "test" target as well?

Stefano



Re: [PATCH net-next v1 10/12] test/vsock: MSG_ZEROCOPY flag tests

2023-09-26 Thread Stefano Garzarella

On Fri, Sep 22, 2023 at 08:24:26AM +0300, Arseniy Krasnov wrote:

This adds three tests for MSG_ZEROCOPY feature:
1) SOCK_STREAM tx with different buffers.
2) SOCK_SEQPACKET tx with different buffers.
3) SOCK_STREAM test to read empty error queue of the socket.

Signed-off-by: Arseniy Krasnov 
---
tools/testing/vsock/Makefile  |   2 +-
tools/testing/vsock/util.c| 222 +++
tools/testing/vsock/util.h|  19 ++
tools/testing/vsock/vsock_test.c  |  16 ++
tools/testing/vsock/vsock_test_zerocopy.c | 314 ++
tools/testing/vsock/vsock_test_zerocopy.h |  15 ++
6 files changed, 587 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/vsock/vsock_test_zerocopy.c
create mode 100644 tools/testing/vsock/vsock_test_zerocopy.h

diff --git a/tools/testing/vsock/Makefile b/tools/testing/vsock/Makefile
index 21a98ba565ab..1a26f60a596c 100644
--- a/tools/testing/vsock/Makefile
+++ b/tools/testing/vsock/Makefile
@@ -1,7 +1,7 @@
# SPDX-License-Identifier: GPL-2.0-only
all: test vsock_perf
test: vsock_test vsock_diag_test
-vsock_test: vsock_test.o timeout.o control.o util.o
+vsock_test: vsock_test.o vsock_test_zerocopy.o timeout.o control.o util.o
vsock_diag_test: vsock_diag_test.o timeout.o control.o util.o
vsock_perf: vsock_perf.o

diff --git a/tools/testing/vsock/util.c b/tools/testing/vsock/util.c
index 6779d5008b27..d531dbbfa8ff 100644
--- a/tools/testing/vsock/util.c
+++ b/tools/testing/vsock/util.c
@@ -11,15 +11,27 @@
#include 
#include 
#include 
+#include 
#include 
#include 
#include 
#include 
+#include 
+#include 
+#include 

#include "timeout.h"
#include "control.h"
#include "util.h"

+#ifndef SOL_VSOCK
+#define SOL_VSOCK  287
+#endif
+
+#ifndef VSOCK_RECVERR
+#define VSOCK_RECVERR  1
+#endif


Maybe better to re-define them in util.h where we include vm_socktes.h


+
/* Install signal handlers */
void init_signals(void)
{
@@ -444,3 +456,213 @@ unsigned long hash_djb2(const void *data, size_t len)

return hash;
}
+
+void enable_so_zerocopy(int fd)
+{
+   int val = 1;
+
+   if (setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &val, sizeof(val))) {
+   perror("setsockopt");
+   exit(EXIT_FAILURE);
+   }
+}
+
+static void *mmap_no_fail(size_t bytes)
+{
+   void *res;
+
+   res = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
+  MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
+   if (res == MAP_FAILED) {
+   perror("mmap");
+   exit(EXIT_FAILURE);
+   }
+
+   return res;
+}
+
+size_t iovec_bytes(const struct iovec *iov, size_t iovnum)
+{
+   size_t bytes;
+   int i;
+
+   for (bytes = 0, i = 0; i < iovnum; i++)
+   bytes += iov[i].iov_len;
+
+   return bytes;
+}
+
+static void iovec_random_init(struct iovec *iov,
+ const struct vsock_test_data *test_data)
+{
+   int i;
+
+   for (i = 0; i < test_data->vecs_cnt; i++) {
+   int j;
+
+   if (test_data->vecs[i].iov_base == MAP_FAILED)
+   continue;
+
+   for (j = 0; j < iov[i].iov_len; j++)
+   ((uint8_t *)iov[i].iov_base)[j] = rand() & 0xff;
+   }
+}
+
+unsigned long iovec_hash_djb2(struct iovec *iov, size_t iovnum)
+{
+   unsigned long hash;
+   size_t iov_bytes;
+   size_t offs;
+   void *tmp;
+   int i;
+
+   iov_bytes = iovec_bytes(iov, iovnum);
+
+   tmp = malloc(iov_bytes);
+   if (!tmp) {
+   perror("malloc");
+   exit(EXIT_FAILURE);
+   }
+
+   for (offs = 0, i = 0; i < iovnum; i++) {
+   memcpy(tmp + offs, iov[i].iov_base, iov[i].iov_len);
+   offs += iov[i].iov_len;
+   }
+
+   hash = hash_djb2(tmp, iov_bytes);
+   free(tmp);
+
+   return hash;
+}
+
+struct iovec *iovec_from_test_data(const struct vsock_test_data *test_data)
+{
+   const struct iovec *test_iovec;
+   struct iovec *iovec;
+   int i;
+
+   iovec = malloc(sizeof(*iovec) * test_data->vecs_cnt);
+   if (!iovec) {
+   perror("malloc");
+   exit(EXIT_FAILURE);
+   }
+
+   test_iovec = test_data->vecs;
+
+   for (i = 0; i < test_data->vecs_cnt; i++) {
+   iovec[i].iov_len = test_iovec[i].iov_len;
+   iovec[i].iov_base = mmap_no_fail(test_iovec[i].iov_len);
+
+   if (test_iovec[i].iov_base != MAP_FAILED &&
+   test_iovec[i].iov_base)
+   iovec[i].iov_base += (uintptr_t)test_iovec[i].iov_base;
+   }
+
+   /* Unmap "invalid" elements. */
+   for (i = 0; i < test_data->vecs_cnt; i++) {
+   if (test_iovec[i].iov_base == MAP_FAILED) {
+   if (munmap(iovec[i].iov_base, iovec[i].iov_len)) {
+   perror("munmap");
+   exit(EXIT_FAILURE);
+   }
+  

Re: [PATCH net-next v1 08/12] vsock: enable setting SO_ZEROCOPY

2023-09-26 Thread Stefano Garzarella

On Fri, Sep 22, 2023 at 08:24:24AM +0300, Arseniy Krasnov wrote:

For AF_VSOCK, zerocopy tx mode depends on the transport, so this option
must be set in the AF_VSOCK implementation, where the transport is
accessible. If the transport is not yet set when SO_ZEROCOPY is enabled
(for example, the socket is not connected), SO_ZEROCOPY is recorded, and
support for this type of transmission is checked later, once the
transport is assigned.

To handle SO_ZEROCOPY, AF_VSOCK implementation uses SOCK_CUSTOM_SOCKOPT
bit, thus handling SOL_SOCKET option operations, but all of them except
SO_ZEROCOPY will be forwarded to the generic handler by calling
'sock_setsockopt()'.

Signed-off-by: Arseniy Krasnov 
---
Changelog:
v5(big patchset) -> v1:
 * Compact 'if' conditions.
 * Rename 'zc_val' to 'zerocopy'.
 * Use 'zerocopy' value directly in 'sock_valbool_flag()', without
   ?: operator.
 * Set 'SOCK_CUSTOM_SOCKOPT' bit for connectible sockets only, as
   suggested by Bobby Eshleman .

net/vmw_vsock/af_vsock.c | 46 ++--
1 file changed, 44 insertions(+), 2 deletions(-)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 482300eb88e0..c05a42e02a17 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -1406,8 +1406,16 @@ static int vsock_connect(struct socket *sock, struct 
sockaddr *addr,
goto out;
}

-   if (vsock_msgzerocopy_allow(transport))
+   if (vsock_msgzerocopy_allow(transport)) {
set_bit(SOCK_SUPPORT_ZC, >sk_socket->flags);
+   } else if (sock_flag(sk, SOCK_ZEROCOPY)) {
+   /* If this option was set before 'connect()',
+* when transport was unknown, check that this
+* feature is supported here.
+*/
+   err = -EOPNOTSUPP;
+   goto out;
+   }

err = vsock_auto_bind(vsk);
if (err)
@@ -1643,7 +1651,7 @@ static int vsock_connectible_setsockopt(struct socket 
*sock,
const struct vsock_transport *transport;
u64 val;

-   if (level != AF_VSOCK)
+   if (level != AF_VSOCK && level != SOL_SOCKET)
return -ENOPROTOOPT;

#define COPY_IN(_v)   \
@@ -1666,6 +1674,34 @@ static int vsock_connectible_setsockopt(struct socket 
*sock,

transport = vsk->transport;

+   if (level == SOL_SOCKET) {
+   int zerocopy;
+
+   if (optname != SO_ZEROCOPY) {
+   release_sock(sk);
+   return sock_setsockopt(sock, level, optname, optval, 
optlen);
+   }
+
+   /* Use 'int' type here, because variable to
+* set this option usually has this type.
+*/
+   COPY_IN(zerocopy);
+
+   if (zerocopy < 0 || zerocopy > 1) {
+   err = -EINVAL;
+   goto exit;
+   }
+
+   if (transport && !vsock_msgzerocopy_allow(transport)) {
+   err = -EOPNOTSUPP;
+   goto exit;
+   }
+
+   sock_valbool_flag(sk, SOCK_ZEROCOPY,
+ zerocopy);


it's not necessary to wrap this call.


+   goto exit;
+   }
+
switch (optname) {
case SO_VM_SOCKETS_BUFFER_SIZE:
COPY_IN(val);
@@ -2322,6 +2358,12 @@ static int vsock_create(struct net *net, struct socket 
*sock,
}
}

+   /* SOCK_DGRAM doesn't have 'setsockopt' callback set in its
+* proto_ops, so there is no handler for custom logic.
+*/
+   if (sock_type_connectible(sock->type))
+   set_bit(SOCK_CUSTOM_SOCKOPT, >sk_socket->flags);
+
vsock_insert_unbound(vsk);

return 0;
--
2.25.1





Re: [PATCH net-next v1 02/12] vsock: read from socket's error queue

2023-09-26 Thread Stefano Garzarella

On Fri, Sep 22, 2023 at 08:24:18AM +0300, Arseniy Krasnov wrote:

This adds handling of the MSG_ERRQUEUE input flag in the receive call.
This flag is used to read the socket's error queue instead of the data
queue. A possible scenario for error queue usage is receiving
completions for transmissions with the MSG_ZEROCOPY flag. This patch
also adds new defines: 'SOL_VSOCK' and 'VSOCK_RECVERR'.

Signed-off-by: Arseniy Krasnov 
---
Changelog:
v5(big patchset) -> v1:
 * R-b tag removed, due to added defines to 'include/uapi/linux/vsock.h'.
   Both 'SOL_VSOCK' and 'VSOCK_RECVERR' are needed by userspace, so
   they were placed to 'include/uapi/linux/vsock.h'. At the same time,
   the same define for 'SOL_VSOCK' was placed to 'include/linux/socket.h'.
   This is needed because this file contains SOL_XXX defines for different
   types of socket, so it prevents situation when another new SOL_XXX
   will use constant 287.

include/linux/socket.h | 1 +
include/uapi/linux/vsock.h | 9 +
net/vmw_vsock/af_vsock.c   | 6 ++
3 files changed, 16 insertions(+)
create mode 100644 include/uapi/linux/vsock.h

diff --git a/include/linux/socket.h b/include/linux/socket.h
index 39b74d83c7c4..cfcb7e2c3813 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -383,6 +383,7 @@ struct ucred {
#define SOL_MPTCP   284
#define SOL_MCTP285
#define SOL_SMC 286
+#define SOL_VSOCK  287

/* IPX options */
#define IPX_TYPE1
diff --git a/include/uapi/linux/vsock.h b/include/uapi/linux/vsock.h
new file mode 100644
index ..b25c1347a3b8
--- /dev/null
+++ b/include/uapi/linux/vsock.h


We already have include/uapi/linux/vm_sockets.h

Should we include these changes there instead of creating a new header?


@@ -0,0 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _UAPI_LINUX_VSOCK_H
+#define _UAPI_LINUX_VSOCK_H
+
+#define SOL_VSOCK  287


Why we need to re-define this also here?

In that case, should we protect with some guards to avoid double
defines?


+
+#define VSOCK_RECVERR  1
+
+#endif /* _UAPI_LINUX_VSOCK_H */
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index d841f4de33b0..4fd11bf34bc7 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -110,6 +110,8 @@
#include 
#include 
#include 
+#include 
+#include 

static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr);
static void vsock_sk_destruct(struct sock *sk);
@@ -2137,6 +2139,10 @@ vsock_connectible_recvmsg(struct socket *sock, struct 
msghdr *msg, size_t len,
int err;

sk = sock->sk;
+
+   if (unlikely(flags & MSG_ERRQUEUE))
+   return sock_recv_errqueue(sk, msg, len, SOL_VSOCK, 
VSOCK_RECVERR);
+
vsk = vsock_sk(sk);
err = 0;

--
2.25.1





Re: [PATCH net-next v1 01/12] vsock: fix EPOLLERR set on non-empty error queue

2023-09-26 Thread Stefano Garzarella

On Fri, Sep 22, 2023 at 08:24:17AM +0300, Arseniy Krasnov wrote:

If the socket's error queue is not empty, EPOLLERR must be set.
Otherwise, a reader of the error queue won't detect data in it using the
EPOLLERR bit. Currently for AF_VSOCK this is reproducible only with
MSG_ZEROCOPY, as this feature is the only user of the socket's error
queue.


So this is not really a fix. I'd use a different title to avoid
confusion on backporting this on stable branches or not.

Maybe just "vsock: set EPOLLERR on non-empty error queue"

The change LGTM.

Stefano



Signed-off-by: Arseniy Krasnov 
---
net/vmw_vsock/af_vsock.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 013b65241b65..d841f4de33b0 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -1030,7 +1030,7 @@ static __poll_t vsock_poll(struct file *file, struct 
socket *sock,
poll_wait(file, sk_sleep(sk), wait);
mask = 0;

-   if (sk->sk_err)
+   if (sk->sk_err || !skb_queue_empty_lockless(&sk->sk_error_queue))
/* Signify that there has been an error on this socket. */
mask |= EPOLLERR;

--
2.25.1





Re: [PATCH 00/16] vdpa: Add support for vq descriptor mappings

2023-09-26 Thread Dragos Tatulea via Virtualization
On Tue, 2023-09-26 at 05:22 -0700, Si-Wei Liu wrote:
> 
> 
> On 9/25/2023 12:59 AM, Dragos Tatulea wrote:
> > On Tue, 2023-09-12 at 16:01 +0300, Dragos Tatulea wrote:
> > > This patch series adds support for vq descriptor table mappings which
> > > are used to improve vdpa live migration downtime. The improvement comes
> > > from using smaller mappings which take less time to create and destroy
> > > in hw.
> > > 
> > Gentle ping.
> > 
> > Note that I will have to send a v2. The changes in mlx5_ifc.h will need to
> > be
> > merged first separately into the mlx5-next branch [0] and then pulled from
> > there
> > when the series is applied.
> This separation is unnecessary, as historically the virtio emulation 
> portion of the update to mlx5_ifc.h often had to go through the vhost 
> tree. See commits 1892a3d425bf and e13cd45d352d. Especially the 
> additions from this series (mainly desc group mkey) have nothing to do 
> with any networking or NIC driver feature.
> 
The reason for doing that is to avoid conflicts in Linus's tree on the
mlx5_ifc.h file.

Thanks,
Dragos

> -Siwei
> 
> > 
> > [0]
> > https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git/log/?h=mlx5-next
> > 
> > Thanks,
> > Dragos
> > 
> > > The first part adds the vdpa core changes from Si-Wei [0].
> > > 
> > > The second part adds support in mlx5_vdpa:
> > > - Refactor the mr code to be able to cleanly add descriptor mappings.
> > > - Add hardware descriptor mr support.
> > > - Properly update iotlb for cvq during ASID switch.
> > > 
> > > [0]
> > > https://lore.kernel.org/virtualization/1694248959-13369-1-git-send-email-si-wei@oracle.com
> > > 
> > > Dragos Tatulea (13):
> > >    vdpa/mlx5: Create helper function for dma mappings
> > >    vdpa/mlx5: Decouple cvq iotlb handling from hw mapping code
> > >    vdpa/mlx5: Take cvq iotlb lock during refresh
> > >    vdpa/mlx5: Collapse "dvq" mr add/delete functions
> > >    vdpa/mlx5: Rename mr destroy functions
> > >    vdpa/mlx5: Allow creation/deletion of any given mr struct
> > >    vdpa/mlx5: Move mr mutex out of mr struct
> > >    vdpa/mlx5: Improve mr update flow
> > >    vdpa/mlx5: Introduce mr for vq descriptor
> > >    vdpa/mlx5: Enable hw support for vq descriptor mapping
> > >    vdpa/mlx5: Make iotlb helper functions more generic
> > >    vdpa/mlx5: Update cvq iotlb mapping on ASID change
> > >    Cover letter: vdpa/mlx5: Add support for vq descriptor mappings
> > > 
> > > Si-Wei Liu (3):
> > >    vdpa: introduce dedicated descriptor group for virtqueue
> > >    vhost-vdpa: introduce descriptor group backend feature
> > >    vhost-vdpa: uAPI to get dedicated descriptor group id
> > > 
> > >   drivers/vdpa/mlx5/core/mlx5_vdpa.h |  31 +++--
> > >   drivers/vdpa/mlx5/core/mr.c    | 191 -
> > >   drivers/vdpa/mlx5/core/resources.c |   6 +-
> > >   drivers/vdpa/mlx5/net/mlx5_vnet.c  | 100 ++-
> > >   drivers/vhost/vdpa.c   |  27 
> > >   include/linux/mlx5/mlx5_ifc.h  |   8 +-
> > >   include/linux/mlx5/mlx5_ifc_vdpa.h |   7 +-
> > >   include/linux/vdpa.h   |  11 ++
> > >   include/uapi/linux/vhost.h |   8 ++
> > >   include/uapi/linux/vhost_types.h   |   5 +
> > >   10 files changed, 264 insertions(+), 130 deletions(-)
> > > 
> 


Re: [PATCH 00/16] vdpa: Add support for vq descriptor mappings

2023-09-26 Thread Si-Wei Liu



On 9/25/2023 12:59 AM, Dragos Tatulea wrote:

On Tue, 2023-09-12 at 16:01 +0300, Dragos Tatulea wrote:

This patch series adds support for vq descriptor table mappings which
are used to improve vdpa live migration downtime. The improvement comes
from using smaller mappings which take less time to create and destroy
in hw.


Gentle ping.

Note that I will have to send a v2. The changes in mlx5_ifc.h will need to be
merged first separately into the mlx5-next branch [0] and then pulled from there
when the series is applied.
This separation is unnecessary, as historically the virtio emulation 
portion of the update to mlx5_ifc.h often had to go through the vhost 
tree. See commits 1892a3d425bf and e13cd45d352d. Especially the 
additions from this series (mainly desc group mkey) have nothing to do 
with any networking or NIC driver feature.


-Siwei



[0]
https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git/log/?h=mlx5-next

Thanks,
Dragos


The first part adds the vdpa core changes from Si-Wei [0].

The second part adds support in mlx5_vdpa:
- Refactor the mr code to be able to cleanly add descriptor mappings.
- Add hardware descriptor mr support.
- Properly update iotlb for cvq during ASID switch.

[0]
https://lore.kernel.org/virtualization/1694248959-13369-1-git-send-email-si-wei@oracle.com

Dragos Tatulea (13):
   vdpa/mlx5: Create helper function for dma mappings
   vdpa/mlx5: Decouple cvq iotlb handling from hw mapping code
   vdpa/mlx5: Take cvq iotlb lock during refresh
   vdpa/mlx5: Collapse "dvq" mr add/delete functions
   vdpa/mlx5: Rename mr destroy functions
   vdpa/mlx5: Allow creation/deletion of any given mr struct
   vdpa/mlx5: Move mr mutex out of mr struct
   vdpa/mlx5: Improve mr update flow
   vdpa/mlx5: Introduce mr for vq descriptor
   vdpa/mlx5: Enable hw support for vq descriptor mapping
   vdpa/mlx5: Make iotlb helper functions more generic
   vdpa/mlx5: Update cvq iotlb mapping on ASID change
   Cover letter: vdpa/mlx5: Add support for vq descriptor mappings

Si-Wei Liu (3):
   vdpa: introduce dedicated descriptor group for virtqueue
   vhost-vdpa: introduce descriptor group backend feature
   vhost-vdpa: uAPI to get dedicated descriptor group id

  drivers/vdpa/mlx5/core/mlx5_vdpa.h |  31 +++--
  drivers/vdpa/mlx5/core/mr.c    | 191 -
  drivers/vdpa/mlx5/core/resources.c |   6 +-
  drivers/vdpa/mlx5/net/mlx5_vnet.c  | 100 ++-
  drivers/vhost/vdpa.c   |  27 
  include/linux/mlx5/mlx5_ifc.h  |   8 +-
  include/linux/mlx5/mlx5_ifc_vdpa.h |   7 +-
  include/linux/vdpa.h   |  11 ++
  include/uapi/linux/vhost.h |   8 ++
  include/uapi/linux/vhost_types.h   |   5 +
  10 files changed, 264 insertions(+), 130 deletions(-)




Re: [PATCH vfio 11/11] vfio/virtio: Introduce a vfio driver over virtio devices

2023-09-26 Thread Michael S. Tsirkin
On Tue, Sep 26, 2023 at 10:32:39AM +0800, Jason Wang wrote:
> It's the implementation details in legacy. The device needs to make
> sure (reset) the driver can work (is done before get_status return).

I think that there's no way to make it reliably work for all legacy drivers.

They just assumed a software backend and did not bother with DMA
ordering. You can try to avoid resets, they are not that common so
things will tend to mostly work if you don't stress them too much with
things like hot plug/unplug in a loop.  Or you can try to use a driver
after 2011 which is more aware of hardware ordering and flushes the
reset write with a read.  One of these two tricks, I think, is the magic
behind the device exposing memory bar 0 that you mention.
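A minimal sketch of the read-back trick described above (hypothetical names standing in for the kernel's MMIO accessors, not actual driver code): a legacy reset is a posted MMIO write, so post-2011 drivers read the status register back to force the write to reach the device before continuing.

```c
#include <assert.h>

static unsigned char fake_status = 0x0f;            /* pretend device status */

static void mmio_write8(volatile unsigned char *reg, unsigned char v)
{
	*reg = v;                                   /* stands in for iowrite8() */
}

static unsigned char mmio_read8(volatile unsigned char *reg)
{
	return *reg;                                /* stands in for ioread8() */
}

static void legacy_reset_and_flush(volatile unsigned char *status)
{
	mmio_write8(status, 0);   /* reset request: may be posted/buffered */
	(void)mmio_read8(status); /* read back: flushes the posted write */
}
```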

-- 
MST



Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device

2023-09-26 Thread Michael S. Tsirkin
On Tue, Sep 26, 2023 at 02:14:01PM +0300, Yishai Hadas wrote:
> On 22/09/2023 12:54, Michael S. Tsirkin wrote:
> > On Thu, Sep 21, 2023 at 03:40:39PM +0300, Yishai Hadas wrote:
> > > Expose admin commands over the virtio device, to be used by the
> > > vfio-virtio driver in the next patches.
> > > 
> > > It includes: list query/use, legacy write/read, read notify_info.
> > > 
> > > Signed-off-by: Yishai Hadas 
> > 
> > This stuff is pure virtio spec. I think it should live under
> > drivers/virtio, too.
> 
> The motivation to put it in the vfio layer came from the following main reasons:
> 
> 1) Having it inside virtio may require exporting a symbol/function per
> command.
> 
>    This will end up today with 5 exported symbols and, in the future (e.g.
> live migration), with many more.
>
>    With current code we export only 2 generic symbols,
> virtio_pci_vf_get_pf_dev() and virtio_admin_cmd_exec(), which may fit any
> further extension.

Except, there's no reasonable way for virtio to know what is done with
the device then. You are not using just 2 symbols at all, instead you
are using the rich vq API which was explicitly designed for the driver
running the device being responsible for serializing accesses. Which is
actually loaded and running. And I *think* your use won't conflict ATM
mostly by luck. Witness the hack in patch 01 as exhibit 1 - nothing
at all even hints at the fact that the reason for the complicated
dance is because another driver pokes at some of the vqs.


> 2) For now there is no logic in this vfio layer, however, in the future we
> may have some DMA/other logic that should better fit to the caller/client
> layer (i.e. vfio).

You are poking at the device without any locks etc. Maybe it looks like
no logic to you but it does not look like that from where I stand.

> By the way, this follows what was already done between the vfio/mlx5 and
> mlx5_core modules, where mlx5_core exposes generic APIs to execute a command
> and to get the PF from a given mlx5 VF.

This is up to mlx5 maintainers. In particular they only need to worry
that their patches work with specific hardware which they likely have.
virtio has to work with multiple vendors - hardware and software -
and exposing a low level API that I can't test on my laptop
is not at all my ideal.

> This way, we can enable further commands to be added/extended
> easily/cleanly.

Something for vfio maintainer to consider in case it was
assumed that it's just this one weird thing
but otherwise it's all generic vfio. It's not going to stop there,
will it? The duplication of functionality with vdpa will continue :(


I am much more interested in adding reusable functionality that
everyone benefits from than in vfio poking at the device in its
own weird ways that only benefit specific hardware.


> See for example here [1, 2].
> 
> [1] 
> https://elixir.bootlin.com/linux/v6.6-rc3/source/drivers/vfio/pci/mlx5/cmd.c#L210
> 
> [2] 
> https://elixir.bootlin.com/linux/v6.6-rc3/source/drivers/vfio/pci/mlx5/cmd.c#L683
> 
> Yishai



> > 
> > > ---
> > >   drivers/vfio/pci/virtio/cmd.c | 146 ++
> > >   drivers/vfio/pci/virtio/cmd.h |  27 +++
> > >   2 files changed, 173 insertions(+)
> > >   create mode 100644 drivers/vfio/pci/virtio/cmd.c
> > >   create mode 100644 drivers/vfio/pci/virtio/cmd.h
> > > 
> > > diff --git a/drivers/vfio/pci/virtio/cmd.c b/drivers/vfio/pci/virtio/cmd.c
> > > new file mode 100644
> > > index ..f068239cdbb0
> > > --- /dev/null
> > > +++ b/drivers/vfio/pci/virtio/cmd.c
> > > @@ -0,0 +1,146 @@
> > > +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
> > > +/*
> > > + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights 
> > > reserved
> > > + */
> > > +
> > > +#include "cmd.h"
> > > +
> > > +int virtiovf_cmd_list_query(struct pci_dev *pdev, u8 *buf, int buf_size)
> > > +{
> > > + struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
> > > + struct scatterlist out_sg;
> > > + struct virtio_admin_cmd cmd = {};
> > > +
> > > + if (!virtio_dev)
> > > + return -ENOTCONN;
> > > +
> > > + sg_init_one(&out_sg, buf, buf_size);
> > > + cmd.opcode = VIRTIO_ADMIN_CMD_LIST_QUERY;
> > > + cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
> > > + cmd.result_sg = &out_sg;
> > > +
> > > + return virtio_admin_cmd_exec(virtio_dev, &cmd);
> > > +}
> > > +
> > > +int virtiovf_cmd_list_use(struct pci_dev *pdev, u8 *buf, int buf_size)
> > > +{
> > > + struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
> > > + struct scatterlist in_sg;
> > > + struct virtio_admin_cmd cmd = {};
> > > +
> > > + if (!virtio_dev)
> > > + return -ENOTCONN;
> > > +
> > > + sg_init_one(&in_sg, buf, buf_size);
> > > + cmd.opcode = VIRTIO_ADMIN_CMD_LIST_USE;
> > > + cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
> > > + cmd.data_sg = &in_sg;
> > > +
> > > + return virtio_admin_cmd_exec(virtio_dev, &cmd);
> > > +}
> > > +
> > > +int virtiovf_cmd_lr_write(struct virtiovf_pci_core_device *virtvdev, u16 
> 

Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device

2023-09-26 Thread Michael S. Tsirkin
On Tue, Sep 26, 2023 at 01:51:13PM +0300, Yishai Hadas wrote:
> On 21/09/2023 23:34, Michael S. Tsirkin wrote:
> > On Thu, Sep 21, 2023 at 03:40:39PM +0300, Yishai Hadas wrote:
> > > Expose admin commands over the virtio device, to be used by the
> > > vfio-virtio driver in the next patches.
> > > 
> > > It includes: list query/use, legacy write/read, read notify_info.
> > > 
> > > Signed-off-by: Yishai Hadas 
> > > ---
> > >   drivers/vfio/pci/virtio/cmd.c | 146 ++
> > >   drivers/vfio/pci/virtio/cmd.h |  27 +++
> > >   2 files changed, 173 insertions(+)
> > >   create mode 100644 drivers/vfio/pci/virtio/cmd.c
> > >   create mode 100644 drivers/vfio/pci/virtio/cmd.h
> > > 
> > > diff --git a/drivers/vfio/pci/virtio/cmd.c b/drivers/vfio/pci/virtio/cmd.c
> > > new file mode 100644
> > > index ..f068239cdbb0
> > > --- /dev/null
> > > +++ b/drivers/vfio/pci/virtio/cmd.c
> > > @@ -0,0 +1,146 @@
> > > +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
> > > +/*
> > > + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights 
> > > reserved
> > > + */
> > > +
> > > +#include "cmd.h"
> > > +
> > > +int virtiovf_cmd_list_query(struct pci_dev *pdev, u8 *buf, int buf_size)
> > > +{
> > > + struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
> > > + struct scatterlist out_sg;
> > > + struct virtio_admin_cmd cmd = {};
> > > +
> > > + if (!virtio_dev)
> > > + return -ENOTCONN;
> > > +
> > > + sg_init_one(&out_sg, buf, buf_size);
> > > + cmd.opcode = VIRTIO_ADMIN_CMD_LIST_QUERY;
> > > + cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
> > > + cmd.result_sg = &out_sg;
> > > +
> > > + return virtio_admin_cmd_exec(virtio_dev, &cmd);
> > > +}
> > > +
> > in/out seem all wrong here. In virtio terminology, in means from
> > device to driver, out means from driver to device.
> I referred here to in/out from the vfio POV, which prepares the command.
> 
> However, I can replace it to follow the virtio terminology as you suggested
> if this makes more sense.
> 
> Please see also my coming answer on your suggestion to put all of this in
> the virtio layer.
> 
> > > +int virtiovf_cmd_list_use(struct pci_dev *pdev, u8 *buf, int buf_size)
> > > +{
> > > + struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
> > > + struct scatterlist in_sg;
> > > + struct virtio_admin_cmd cmd = {};
> > > +
> > > + if (!virtio_dev)
> > > + return -ENOTCONN;
> > > +
> > > + sg_init_one(&in_sg, buf, buf_size);
> > > + cmd.opcode = VIRTIO_ADMIN_CMD_LIST_USE;
> > > + cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
> > > + cmd.data_sg = &in_sg;
> > > +
> > > + return virtio_admin_cmd_exec(virtio_dev, &cmd);
> > > +}
> > > +
> > > +int virtiovf_cmd_lr_write(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
> > 
> > what is _lr short for?
> 
> This was an acronym to legacy_read.
> 
> The actual command is according to the given opcode which can be one among
> LEGACY_COMMON_CFG_READ, LEGACY_DEV_CFG_READ.
> 
> I can rename it to '_legacy_read' (i.e. virtiovf_issue_legacy_read_cmd) to
> be clearer.
> 
> > 
> > > +   u8 offset, u8 size, u8 *buf)
> > > +{
> > > + struct virtio_device *virtio_dev =
> > > + virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
> > > + struct virtio_admin_cmd_data_lr_write *in;
> > > + struct scatterlist in_sg;
> > > + struct virtio_admin_cmd cmd = {};
> > > + int ret;
> > > +
> > > + if (!virtio_dev)
> > > + return -ENOTCONN;
> > > +
> > > + in = kzalloc(sizeof(*in) + size, GFP_KERNEL);
> > > + if (!in)
> > > + return -ENOMEM;
> > > +
> > > + in->offset = offset;
> > > + memcpy(in->registers, buf, size);
> > > + sg_init_one(&in_sg, in, sizeof(*in) + size);
> > > + cmd.opcode = opcode;
> > > + cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
> > > + cmd.group_member_id = virtvdev->vf_id + 1;
> > weird. why + 1?
> 
> This follows the virtio spec in that area.
> 
> "When sending commands with the SR-IOV group type, the driver specify a
> value for group_member_id
> between 1 and NumVFs inclusive."

Ah, I get it. Pls add a comment.

> The 'virtvdev->vf_id' was set upon vfio/virtio driver initialization by
> pci_iov_vf_id(), whose first index is 0.
> 
> > > + cmd.data_sg = &in_sg;
> > > + ret = virtio_admin_cmd_exec(virtio_dev, &cmd);
> > > +
> > > + kfree(in);
> > > + return ret;
> > > +}
> > How do you know it's safe to send this command, in particular at
> > this time? This seems to be doing zero checks, and zero synchronization
> > with the PF driver.
> > 
> The virtiovf_cmd_lr_read()/other gets a virtio VF and it gets its PF by
> calling virtio_pci_vf_get_pf_dev().
> 
> The VF can't go away via 'disable sriov', as it's owned/used by vfio.
> 
> The PF can't go away via rmmod/'modprobe -r' of virtio, due to the 'module
> in use' dependencies between VFIO and VIRTIO.
> 
> The below check [1] was done only from a clean code perspective, which might
> theoretically fail in case the given VF doesn't use a virtio driver.
> 
> [1] if 

Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device

2023-09-26 Thread Yishai Hadas via Virtualization

On 22/09/2023 12:54, Michael S. Tsirkin wrote:

On Thu, Sep 21, 2023 at 03:40:39PM +0300, Yishai Hadas wrote:

Expose admin commands over the virtio device, to be used by the
vfio-virtio driver in the next patches.

It includes: list query/use, legacy write/read, read notify_info.

Signed-off-by: Yishai Hadas 


This stuff is pure virtio spec. I think it should live under
drivers/virtio, too.


The motivation to put it in the vfio layer came from the following main 
reasons:

1) Having it inside virtio may require exporting a symbol/function per 
command.


   This will end up today with 5 exported symbols and, in the future 
(e.g. live migration), with many more.


   With current code we export only 2 generic symbols, 
virtio_pci_vf_get_pf_dev() and virtio_admin_cmd_exec(), which may fit 
any further extension.


2) For now there is no logic in this vfio layer, however, in the future 
we may have some DMA/other logic that should better fit to the 
caller/client layer (i.e. vfio).


By the way, this follows what was already done between the vfio/mlx5 and 
mlx5_core modules, where mlx5_core exposes generic APIs to execute a 
command and to get the PF from a given mlx5 VF.


This way, we can enable further commands to be added/extended 
easily/cleanly.


See for example here [1, 2].

[1] 
https://elixir.bootlin.com/linux/v6.6-rc3/source/drivers/vfio/pci/mlx5/cmd.c#L210


[2] 
https://elixir.bootlin.com/linux/v6.6-rc3/source/drivers/vfio/pci/mlx5/cmd.c#L683
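As a sketch of the two-symbol pattern argued for above (one generic command descriptor plus one exec() entry point, instead of exporting a function per admin command) — names here are toy stand-ins, not the actual kernel API:

```c
#include <assert.h>
#include <stddef.h>

struct toy_admin_cmd {
	int opcode;
	const void *data_sg;   /* driver -> device payload */
	void *result_sg;       /* device -> driver payload */
};

/* single generic entry point: any command, present or future, flows here */
static int toy_admin_cmd_exec(const struct toy_admin_cmd *cmd)
{
	return cmd->opcode > 0 ? 0 : -1;
}

/* callers build per-command wrappers without new exported symbols */
static int toy_cmd_list_query(void *buf)
{
	struct toy_admin_cmd cmd = { .opcode = 2 /* e.g. LIST_QUERY */,
				     .result_sg = buf };
	return toy_admin_cmd_exec(&cmd);
}
```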


Yishai




---
  drivers/vfio/pci/virtio/cmd.c | 146 ++
  drivers/vfio/pci/virtio/cmd.h |  27 +++
  2 files changed, 173 insertions(+)
  create mode 100644 drivers/vfio/pci/virtio/cmd.c
  create mode 100644 drivers/vfio/pci/virtio/cmd.h

diff --git a/drivers/vfio/pci/virtio/cmd.c b/drivers/vfio/pci/virtio/cmd.c
new file mode 100644
index ..f068239cdbb0
--- /dev/null
+++ b/drivers/vfio/pci/virtio/cmd.c
@@ -0,0 +1,146 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/*
+ * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved
+ */
+
+#include "cmd.h"
+
+int virtiovf_cmd_list_query(struct pci_dev *pdev, u8 *buf, int buf_size)
+{
+   struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
+   struct scatterlist out_sg;
+   struct virtio_admin_cmd cmd = {};
+
+   if (!virtio_dev)
+   return -ENOTCONN;
+
+   sg_init_one(&out_sg, buf, buf_size);
+   cmd.opcode = VIRTIO_ADMIN_CMD_LIST_QUERY;
+   cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
+   cmd.result_sg = &out_sg;
+
+   return virtio_admin_cmd_exec(virtio_dev, &cmd);
+}
+
+int virtiovf_cmd_list_use(struct pci_dev *pdev, u8 *buf, int buf_size)
+{
+   struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
+   struct scatterlist in_sg;
+   struct virtio_admin_cmd cmd = {};
+
+   if (!virtio_dev)
+   return -ENOTCONN;
+
+   sg_init_one(&in_sg, buf, buf_size);
+   cmd.opcode = VIRTIO_ADMIN_CMD_LIST_USE;
+   cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
+   cmd.data_sg = &in_sg;
+
+   return virtio_admin_cmd_exec(virtio_dev, &cmd);
+}
+
+int virtiovf_cmd_lr_write(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
+ u8 offset, u8 size, u8 *buf)
+{
+   struct virtio_device *virtio_dev =
+   virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
+   struct virtio_admin_cmd_data_lr_write *in;
+   struct scatterlist in_sg;
+   struct virtio_admin_cmd cmd = {};
+   int ret;
+
+   if (!virtio_dev)
+   return -ENOTCONN;
+
+   in = kzalloc(sizeof(*in) + size, GFP_KERNEL);
+   if (!in)
+   return -ENOMEM;
+
+   in->offset = offset;
+   memcpy(in->registers, buf, size);
+   sg_init_one(&in_sg, in, sizeof(*in) + size);
+   cmd.opcode = opcode;
+   cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
+   cmd.group_member_id = virtvdev->vf_id + 1;
+   cmd.data_sg = &in_sg;
+   ret = virtio_admin_cmd_exec(virtio_dev, &cmd);
+
+   kfree(in);
+   return ret;
+}
+
+int virtiovf_cmd_lr_read(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
+u8 offset, u8 size, u8 *buf)
+{
+   struct virtio_device *virtio_dev =
+   virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
+   struct virtio_admin_cmd_data_lr_read *in;
+   struct scatterlist in_sg, out_sg;
+   struct virtio_admin_cmd cmd = {};
+   int ret;
+
+   if (!virtio_dev)
+   return -ENOTCONN;
+
+   in = kzalloc(sizeof(*in), GFP_KERNEL);
+   if (!in)
+   return -ENOMEM;
+
+   in->offset = offset;
+   sg_init_one(&in_sg, in, sizeof(*in));
+   sg_init_one(&out_sg, buf, size);
+   cmd.opcode = opcode;
+   cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
+   cmd.data_sg = &in_sg;
+   cmd.result_sg = &out_sg;
+   cmd.group_member_id = virtvdev->vf_id + 1;
+   ret = 

Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device

2023-09-26 Thread Yishai Hadas via Virtualization

On 21/09/2023 23:34, Michael S. Tsirkin wrote:

On Thu, Sep 21, 2023 at 03:40:39PM +0300, Yishai Hadas wrote:

Expose admin commands over the virtio device, to be used by the
vfio-virtio driver in the next patches.

It includes: list query/use, legacy write/read, read notify_info.

Signed-off-by: Yishai Hadas 
---
  drivers/vfio/pci/virtio/cmd.c | 146 ++
  drivers/vfio/pci/virtio/cmd.h |  27 +++
  2 files changed, 173 insertions(+)
  create mode 100644 drivers/vfio/pci/virtio/cmd.c
  create mode 100644 drivers/vfio/pci/virtio/cmd.h

diff --git a/drivers/vfio/pci/virtio/cmd.c b/drivers/vfio/pci/virtio/cmd.c
new file mode 100644
index ..f068239cdbb0
--- /dev/null
+++ b/drivers/vfio/pci/virtio/cmd.c
@@ -0,0 +1,146 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/*
+ * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved
+ */
+
+#include "cmd.h"
+
+int virtiovf_cmd_list_query(struct pci_dev *pdev, u8 *buf, int buf_size)
+{
+   struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
+   struct scatterlist out_sg;
+   struct virtio_admin_cmd cmd = {};
+
+   if (!virtio_dev)
+   return -ENOTCONN;
+
+   sg_init_one(&out_sg, buf, buf_size);
+   cmd.opcode = VIRTIO_ADMIN_CMD_LIST_QUERY;
+   cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
+   cmd.result_sg = &out_sg;
+
+   return virtio_admin_cmd_exec(virtio_dev, &cmd);
+}
+

in/out seem all wrong here. In virtio terminology, in means from
device to driver, out means from driver to device.

I referred here to in/out from the vfio POV, which prepares the command.

However, I can replace it to follow the virtio terminology as you 
suggested if this makes more sense.


Please see also my coming answer on your suggestion to put all of this 
in the virtio layer.



+int virtiovf_cmd_list_use(struct pci_dev *pdev, u8 *buf, int buf_size)
+{
+   struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev);
+   struct scatterlist in_sg;
+   struct virtio_admin_cmd cmd = {};
+
+   if (!virtio_dev)
+   return -ENOTCONN;
+
+   sg_init_one(&in_sg, buf, buf_size);
+   cmd.opcode = VIRTIO_ADMIN_CMD_LIST_USE;
+   cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
+   cmd.data_sg = &in_sg;
+
+   return virtio_admin_cmd_exec(virtio_dev, &cmd);
+}
+
+int virtiovf_cmd_lr_write(struct virtiovf_pci_core_device *virtvdev, u16 opcode,


what is _lr short for?


This was an acronym to legacy_read.

The actual command is according to the given opcode which can be one 
among LEGACY_COMMON_CFG_READ, LEGACY_DEV_CFG_READ.


I can rename it to '_legacy_read' (i.e. virtiovf_issue_legacy_read_cmd) 
to be clearer.





+ u8 offset, u8 size, u8 *buf)
+{
+   struct virtio_device *virtio_dev =
+   virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
+   struct virtio_admin_cmd_data_lr_write *in;
+   struct scatterlist in_sg;
+   struct virtio_admin_cmd cmd = {};
+   int ret;
+
+   if (!virtio_dev)
+   return -ENOTCONN;
+
+   in = kzalloc(sizeof(*in) + size, GFP_KERNEL);
+   if (!in)
+   return -ENOMEM;
+
+   in->offset = offset;
+   memcpy(in->registers, buf, size);
+   sg_init_one(&in_sg, in, sizeof(*in) + size);
+   cmd.opcode = opcode;
+   cmd.group_type = VIRTIO_ADMIN_GROUP_TYPE_SRIOV;
+   cmd.group_member_id = virtvdev->vf_id + 1;

weird. why + 1?


This follows the virtio spec in that area.

"When sending commands with the SR-IOV group type, the driver specify a 
value for group_member_id

between 1 and NumVFs inclusive."

The 'virtvdev->vf_id' was set upon vfio/virtio driver initialization by 
pci_iov_vf_id(), whose first index is 0.
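The off-by-one being discussed can be sketched as a one-line helper (the name is hypothetical, for illustration only): pci_iov_vf_id() is 0-based, while the spec's SR-IOV group_member_id runs from 1 to NumVFs inclusive.

```c
#include <assert.h>

/* VF index 0 -> group member 1, VF index N-1 -> group member NumVFs */
static unsigned int vf_id_to_group_member_id(int vf_id)
{
	return vf_id + 1;
}
```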



+   cmd.data_sg = &in_sg;
+   ret = virtio_admin_cmd_exec(virtio_dev, &cmd);
+
+   kfree(in);
+   return ret;
+}

How do you know it's safe to send this command, in particular at
this time? This seems to be doing zero checks, and zero synchronization
with the PF driver.

The virtiovf_cmd_lr_read()/other gets a virtio VF and it gets its PF by 
calling virtio_pci_vf_get_pf_dev().


The VF can't go away via 'disable sriov', as it's owned/used by vfio.

The PF can't go away via rmmod/'modprobe -r' of virtio, due to the 
'module in use' dependencies between VFIO and VIRTIO.


The below check [1] was done only from a clean code perspective, which 
might theoretically fail in case the given VF doesn't use a virtio driver.


[1] if (!virtio_dev)
    return -ENOTCONN;

So, it looks safe as is.


+
+int virtiovf_cmd_lr_read(struct virtiovf_pci_core_device *virtvdev, u16 opcode,
+u8 offset, u8 size, u8 *buf)
+{
+   struct virtio_device *virtio_dev =
+   virtio_pci_vf_get_pf_dev(virtvdev->core_device.pdev);
+   struct virtio_admin_cmd_data_lr_read *in;
+   struct scatterlist in_sg, out_sg;
+   struct virtio_admin_cmd cmd = {};
+   int ret;
+
+   

Re: [PATCH vfio 08/11] vfio/pci: Expose vfio_pci_core_setup_barmap()

2023-09-26 Thread Yishai Hadas via Virtualization

On 21/09/2023 19:35, Alex Williamson wrote:

On Thu, 21 Sep 2023 15:40:37 +0300
Yishai Hadas  wrote:


Expose vfio_pci_core_setup_barmap() to be used by drivers.

This will let drivers mmap a BAR and re-use it from both vfio and the
driver when applicable.

This API will be used in the next patches by the coming vfio/virtio
driver.

Signed-off-by: Yishai Hadas 
---
  drivers/vfio/pci/vfio_pci_core.c | 25 +
  drivers/vfio/pci/vfio_pci_rdwr.c | 28 ++--
  include/linux/vfio_pci_core.h|  1 +
  3 files changed, 28 insertions(+), 26 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 1929103ee59a..b56111ed8a8c 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -684,6 +684,31 @@ void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
  }
  EXPORT_SYMBOL_GPL(vfio_pci_core_disable);
  
+int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar)
+{
+   struct pci_dev *pdev = vdev->pdev;
+   void __iomem *io;
+   int ret;
+
+   if (vdev->barmap[bar])
+   return 0;
+
+   ret = pci_request_selected_regions(pdev, 1 << bar, "vfio");
+   if (ret)
+   return ret;
+
+   io = pci_iomap(pdev, bar, 0);
+   if (!io) {
+   pci_release_selected_regions(pdev, 1 << bar);
+   return -ENOMEM;
+   }
+
+   vdev->barmap[bar] = io;
+
+   return 0;
+}
+EXPORT_SYMBOL(vfio_pci_core_setup_barmap);

Not to endorse the rest of this yet, but minimally _GPL, same for the
following patch.  Thanks,

Alex


Sure, will change to EXPORT_SYMBOL_GPL as part of V1.

Yishai


+
  void vfio_pci_core_close_device(struct vfio_device *core_vdev)
  {
struct vfio_pci_core_device *vdev =
diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index e27de61ac9fe..6f08b3ecbb89 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -200,30 +200,6 @@ static ssize_t do_io_rw(struct vfio_pci_core_device *vdev, bool test_mem,
return done;
  }
  
-static int vfio_pci_setup_barmap(struct vfio_pci_core_device *vdev, int bar)
-{
-   struct pci_dev *pdev = vdev->pdev;
-   int ret;
-   void __iomem *io;
-
-   if (vdev->barmap[bar])
-   return 0;
-
-   ret = pci_request_selected_regions(pdev, 1 << bar, "vfio");
-   if (ret)
-   return ret;
-
-   io = pci_iomap(pdev, bar, 0);
-   if (!io) {
-   pci_release_selected_regions(pdev, 1 << bar);
-   return -ENOMEM;
-   }
-
-   vdev->barmap[bar] = io;
-
-   return 0;
-}
-
  ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf,
size_t count, loff_t *ppos, bool iswrite)
  {
@@ -262,7 +238,7 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf,
}
x_end = end;
} else {
-   int ret = vfio_pci_setup_barmap(vdev, bar);
+   int ret = vfio_pci_core_setup_barmap(vdev, bar);
if (ret) {
done = ret;
goto out;
@@ -438,7 +414,7 @@ int vfio_pci_ioeventfd(struct vfio_pci_core_device *vdev, loff_t offset,
return -EINVAL;
  #endif
  
-	ret = vfio_pci_setup_barmap(vdev, bar);
+	ret = vfio_pci_core_setup_barmap(vdev, bar);
if (ret)
return ret;
  
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h

index 562e8754869d..67ac58e20e1d 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -127,6 +127,7 @@ int vfio_pci_core_match(struct vfio_device *core_vdev, char *buf);
  int vfio_pci_core_enable(struct vfio_pci_core_device *vdev);
  void vfio_pci_core_disable(struct vfio_pci_core_device *vdev);
  void vfio_pci_core_finish_enable(struct vfio_pci_core_device *vdev);
+int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar);
  pci_ers_result_t vfio_pci_core_aer_err_detected(struct pci_dev *pdev,
pci_channel_state_t state);
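For illustration, the caching behavior that the newly exported helper provides (map a BAR once, return the cached mapping on later calls) can be modeled in userspace as follows; all names and the malloc() stand-in for pci_iomap() are toy assumptions, not the kernel code:

```c
#include <assert.h>
#include <stdlib.h>

#define NUM_BARS 6

struct toy_vdev {
	void *barmap[NUM_BARS];
	int map_calls;           /* how many real "iomap" operations happened */
};

static void *toy_iomap(struct toy_vdev *vdev)
{
	vdev->map_calls++;
	return malloc(16);       /* stands in for pci_iomap() */
}

static int toy_setup_barmap(struct toy_vdev *vdev, int bar)
{
	void *io;

	if (vdev->barmap[bar])   /* already mapped: nothing to do */
		return 0;

	io = toy_iomap(vdev);
	if (!io)
		return -1;       /* stands in for -ENOMEM */

	vdev->barmap[bar] = io;
	return 0;
}
```

This idempotence is what lets both vfio and a dependent driver call the helper without coordinating who maps first.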
  





Re: [PATCH 09/16] vdpa/mlx5: Allow creation/deletion of any given mr struct

2023-09-26 Thread Dragos Tatulea via Virtualization
On Tue, 2023-09-26 at 12:44 +0800, Jason Wang wrote:
> On Tue, Sep 12, 2023 at 9:02 PM Dragos Tatulea  wrote:
> > 
> > This patch adapts the mr creation/deletion code to be able to work with
> > any given mr struct pointer. All the APIs are adapted to take an extra
> > parameter for the mr.
> > 
> > mlx5_vdpa_create/delete_mr doesn't need an ASID parameter anymore. The
> > check is done in the caller instead (mlx5_set_map).
> > 
> > This change is needed for a followup patch which will introduce an
> > additional mr for the vq descriptor data.
> > 
> > Signed-off-by: Dragos Tatulea 
> > ---
> 
> Thinking of this decoupling I think I have a question.
> 
> We advertise 2 address spaces and 2 groups. So we actually don't know
> for example which address spaces will be used by dvq.
> 
> And actually we allow the user space to do something like
> 
> set_group_asid(dvq_group, 0)
> set_map(0)
> set_group_asid(dvq_group, 1)
> set_map(1)
> 
> I wonder if the decoupling like this patch can work and why.
> 
This scenario could indeed work, especially if you look at the 13th patch [0]
where hw support is added. Are you wondering if this should work at all or if it
should be blocked?

> It looks to me the most easy way is to let each AS be backed by an MR.
> Then we don't even need to care about the dvq, cvq.
That's what this patch series does.

Thanks,
Dragos

[0]https://lore.kernel.org/virtualization/20230912130132.561193-14-dtatu...@nvidia.com/T/#u
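The uAPI sequence quoted above can be sketched as a small model where each address space (ASID) is backed by its own mr and a vq group just points at an ASID — which is why the decoupling works even when the group's ASID changes. All names here are illustrative stand-ins, not the kernel API:

```c
#include <assert.h>

#define NUM_GROUPS 2
#define NUM_ASID   2

static int group_asid[NUM_GROUPS]; /* which ASID each vq group uses */
static int asid_mr[NUM_ASID];      /* generation of the mr backing each ASID */

static void set_group_asid(int group, int asid)
{
	group_asid[group] = asid;
}

static void set_map(int asid)
{
	asid_mr[asid]++;           /* (re)create the mr for this ASID */
}

/* mr generation a group currently sees, via its ASID */
static int group_mr(int group)
{
	return asid_mr[group_asid[group]];
}
```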