Re: [PATCH 15/17] iommu: remove DOMAIN_ATTR_NESTING

2021-03-24 Thread Christoph Hellwig
On Thu, Mar 25, 2021 at 06:12:37AM +, Tian, Kevin wrote:
> Agree. The vSVA series is still undergoing a refactor according to Jason's
> comment thus won't be ready in short term. It's better to let this one
> go in first.

Would be great to get a few more reviews while we're at it :)
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v5 09/11] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-03-24 Thread Jason Wang


On 2021/3/24 at 4:55 PM, Yongji Xie wrote:

On Wed, Mar 24, 2021 at 12:43 PM Jason Wang  wrote:


On 2021/3/15 at 1:37 PM, Xie Yongji wrote:

This VDUSE driver enables implementing vDPA devices in userspace.
Both control path and data path of vDPA devices will be able to
be handled in userspace.

In the control path, the VDUSE driver will use a message
mechanism to forward config operations from the vdpa bus driver
to userspace. Userspace can use read()/write() to receive and
reply to those control messages.

In the data path, userspace can use mmap() to access vDPA device's
iova regions obtained through VDUSE_IOTLB_GET_ENTRY ioctl. Besides,
userspace can use ioctl() to inject interrupt and use the eventfd
mechanism to receive virtqueue kicks.

Signed-off-by: Xie Yongji 
---
   Documentation/userspace-api/ioctl/ioctl-number.rst |    1 +
   drivers/vdpa/Kconfig                               |   10 +
   drivers/vdpa/Makefile                              |    1 +
   drivers/vdpa/vdpa_user/Makefile                    |    5 +
   drivers/vdpa/vdpa_user/vduse_dev.c                 | 1281
   include/uapi/linux/vduse.h                         |  153 +++
   6 files changed, 1451 insertions(+)
   create mode 100644 drivers/vdpa/vdpa_user/Makefile
   create mode 100644 drivers/vdpa/vdpa_user/vduse_dev.c
   create mode 100644 include/uapi/linux/vduse.h

diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst
index a4c75a28c839..71722e6f8f23 100644
--- a/Documentation/userspace-api/ioctl/ioctl-number.rst
+++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
@@ -300,6 +300,7 @@ Code  Seq#    Include File                           Comments
   'z'   10-4F  drivers/s390/crypto/zcrypt_api.h        conflict!
   '|'   00-7F  linux/media.h
   0x80  00-1F  linux/fb.h
+0x81  00-1F  linux/vduse.h
   0x89  00-06  arch/x86/include/asm/sockios.h
   0x89  0B-DF  linux/sockios.h
   0x89  E0-EF  linux/sockios.h                         SIOCPROTOPRIVATE range
diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig
index a245809c99d0..77a1da522c21 100644
--- a/drivers/vdpa/Kconfig
+++ b/drivers/vdpa/Kconfig
@@ -25,6 +25,16 @@ config VDPA_SIM_NET
   help
 vDPA networking device simulator which loops TX traffic back to RX.

+config VDPA_USER
+ tristate "VDUSE (vDPA Device in Userspace) support"
+ depends on EVENTFD && MMU && HAS_DMA
+ select DMA_OPS
+ select VHOST_IOTLB
+ select IOMMU_IOVA
+ help
+   With VDUSE it is possible to emulate a vDPA Device
+   in a userspace program.
+
   config IFCVF
   tristate "Intel IFC VF vDPA driver"
   depends on PCI_MSI
diff --git a/drivers/vdpa/Makefile b/drivers/vdpa/Makefile
index 67fe7f3d6943..f02ebed33f19 100644
--- a/drivers/vdpa/Makefile
+++ b/drivers/vdpa/Makefile
@@ -1,6 +1,7 @@
   # SPDX-License-Identifier: GPL-2.0
   obj-$(CONFIG_VDPA) += vdpa.o
   obj-$(CONFIG_VDPA_SIM) += vdpa_sim/
+obj-$(CONFIG_VDPA_USER) += vdpa_user/
   obj-$(CONFIG_IFCVF)+= ifcvf/
   obj-$(CONFIG_MLX5_VDPA) += mlx5/
   obj-$(CONFIG_VP_VDPA)+= virtio_pci/
diff --git a/drivers/vdpa/vdpa_user/Makefile b/drivers/vdpa/vdpa_user/Makefile
new file mode 100644
index ..260e0b26af99
--- /dev/null
+++ b/drivers/vdpa/vdpa_user/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0
+
+vduse-y := vduse_dev.o iova_domain.o
+
+obj-$(CONFIG_VDPA_USER) += vduse.o
diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
new file mode 100644
index ..07d0ae92d470
--- /dev/null
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -0,0 +1,1281 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * VDUSE: vDPA Device in Userspace
+ *
+ * Copyright (C) 2020 Bytedance Inc. and/or its affiliates. All rights reserved.
+ *
+ * Author: Xie Yongji 
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "iova_domain.h"
+
+#define DRV_VERSION  "1.0"
+#define DRV_AUTHOR   "Yongji Xie "
+#define DRV_DESC "vDPA Device in Userspace"
+#define DRV_LICENSE  "GPL v2"
+
+#define VDUSE_DEV_MAX (1U << MINORBITS)
+
+struct vduse_virtqueue {
+ u16 index;
+ bool ready;
+ spinlock_t kick_lock;
+ spinlock_t irq_lock;
+ struct eventfd_ctx *kickfd;
+ struct vdpa_callback cb;
+ struct work_struct inject;
+};
+
+struct vduse_dev;
+
+struct vduse_vdpa {
+ struct vdpa_device vdpa;
+ struct vduse_dev *dev;
+};
+
+struct vduse_dev {
+ struct vduse_vdpa *vdev;
+ struct device dev;
+ struct cdev cdev;
+ struct vduse_virtqueue *vqs;
+ struct vduse_iova_domain *domain;
+ spinlock_t msg_lock;
+ atomic64_t msg_unique;
+ wait_queue_head_t waitq;
+ struct list_head send_list;
+ struct list_head recv_list;
+ struct list_head 
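
The commit message above says userspace uses the eventfd mechanism to
receive virtqueue kicks. As a rough illustration of the eventfd counter
semantics only, here is a minimal userspace sketch. It does not use any
VDUSE ioctl (how the kick fd is handed to the driver is not shown in this
excerpt); the helper names kick() and drain_kicks() are hypothetical, and
both ends live in one process purely for demonstration.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical helper: signal one kick by adding 1 to the eventfd counter.
 * Returns 0 on success, -1 on failure. */
static int kick(int efd)
{
	uint64_t one = 1;

	return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/* Hypothetical helper: consume all pending kicks in one read.
 * For a non-semaphore eventfd, read() returns the accumulated counter and
 * resets it to zero, so several kicks may be coalesced into one wakeup.
 * Returns 0 if nothing is pending (assumes the fd is non-blocking). */
static uint64_t drain_kicks(int efd)
{
	uint64_t count = 0;

	if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
		return 0;
	return count;
}
```

Because kicks are coalesced, a device loop built this way would need to
process every available descriptor after each wakeup rather than assume
one kick per request.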

RE: [PATCH 15/17] iommu: remove DOMAIN_ATTR_NESTING

2021-03-24 Thread Tian, Kevin
> From: Auger Eric
> Sent: Monday, March 15, 2021 3:52 PM
> To: Christoph Hellwig 
> Cc: k...@vger.kernel.org; Will Deacon ; linuxppc-
> d...@lists.ozlabs.org; dri-de...@lists.freedesktop.org; Li Yang
> ; io...@lists.linux-foundation.org;
> 
> Hi Christoph,
> 
> On 3/14/21 4:58 PM, Christoph Hellwig wrote:
> > On Sun, Mar 14, 2021 at 11:44:52AM +0100, Auger Eric wrote:
> >> As mentioned by Robin, there are series planning to use
> >> DOMAIN_ATTR_NESTING to get info about the nested caps of the iommu
> (ARM
> >> and Intel):
> >>
> >> [Patch v8 00/10] vfio: expose virtual Shared Virtual Addressing to VMs
> >> patches 1, 2, 3
> >>
> >> Is the plan to introduce a new domain_get_nesting_info ops then?
> >
> > The plan as usual would be to add it in the series adding that support.
> > Not sure what the merge plans are - if the series is ready to be
> > merged I could rebase on top of it, otherwise that series will need
> > to add the method.
> OK I think your series may be upstreamed first.
> 

Agree. The vSVA series is still undergoing a refactor according to Jason's
comment thus won't be ready in short term. It's better to let this one
go in first.

Thanks
Kevin


Re: [PATCH v5 08/11] vduse: Implement an MMU-based IOMMU driver

2021-03-24 Thread Jason Wang


On 2021/3/24 at 3:39 PM, Yongji Xie wrote:

On Wed, Mar 24, 2021 at 11:54 AM Jason Wang  wrote:


On 2021/3/15 at 1:37 PM, Xie Yongji wrote:

This implements an MMU-based IOMMU driver to support mapping
kernel DMA buffers into userspace. The basic idea is to treat
the MMU (VA->PA) as an IOMMU (IOVA->PA). The driver sets up an
MMU mapping instead of an IOMMU mapping for the DMA transfer so
that the userspace process can use its virtual address to access
the DMA buffer in the kernel.

To avoid security issues, a bounce-buffering mechanism is
introduced to prevent userspace from accessing the original
buffer directly.

Signed-off-by: Xie Yongji 
---
   drivers/vdpa/vdpa_user/iova_domain.c | 535 +++
   drivers/vdpa/vdpa_user/iova_domain.h |  75 +
   2 files changed, 610 insertions(+)
   create mode 100644 drivers/vdpa/vdpa_user/iova_domain.c
   create mode 100644 drivers/vdpa/vdpa_user/iova_domain.h

diff --git a/drivers/vdpa/vdpa_user/iova_domain.c b/drivers/vdpa/vdpa_user/iova_domain.c
new file mode 100644
index ..83de216b0e51
--- /dev/null
+++ b/drivers/vdpa/vdpa_user/iova_domain.c
@@ -0,0 +1,535 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * MMU-based IOMMU implementation
+ *
+ * Copyright (C) 2020 Bytedance Inc. and/or its affiliates. All rights reserved.


2021 as well.


Sure.


+ *
+ * Author: Xie Yongji 
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "iova_domain.h"
+
+static int vduse_iotlb_add_range(struct vduse_iova_domain *domain,
+  u64 start, u64 last,
+  u64 addr, unsigned int perm,
+  struct file *file, u64 offset)
+{
+ struct vdpa_map_file *map_file;
+ int ret;
+
+ map_file = kmalloc(sizeof(*map_file), GFP_ATOMIC);
+ if (!map_file)
+ return -ENOMEM;
+
+ map_file->file = get_file(file);
+ map_file->offset = offset;
+
+ ret = vhost_iotlb_add_range_ctx(domain->iotlb, start, last,
+ addr, perm, map_file);
+ if (ret) {
+ fput(map_file->file);
+ kfree(map_file);
+ return ret;
+ }
+ return 0;
+}
+
+static void vduse_iotlb_del_range(struct vduse_iova_domain *domain,
+   u64 start, u64 last)
+{
+ struct vdpa_map_file *map_file;
+ struct vhost_iotlb_map *map;
+
+ while ((map = vhost_iotlb_itree_first(domain->iotlb, start, last))) {
+ map_file = (struct vdpa_map_file *)map->opaque;
+ fput(map_file->file);
+ kfree(map_file);
+ vhost_iotlb_map_free(domain->iotlb, map);
+ }
+}
+
+int vduse_domain_set_map(struct vduse_iova_domain *domain,
+  struct vhost_iotlb *iotlb)
+{
+ struct vdpa_map_file *map_file;
+ struct vhost_iotlb_map *map;
+ u64 start = 0ULL, last = ULLONG_MAX;
+ int ret;
+
+ spin_lock(&domain->iotlb_lock);
+ vduse_iotlb_del_range(domain, start, last);
+
+ for (map = vhost_iotlb_itree_first(iotlb, start, last); map;
+  map = vhost_iotlb_itree_next(map, start, last)) {
+ map_file = (struct vdpa_map_file *)map->opaque;
+ ret = vduse_iotlb_add_range(domain, map->start, map->last,
+ map->addr, map->perm,
+ map_file->file,
+ map_file->offset);
+ if (ret)
+ goto err;
+ }
+ spin_unlock(&domain->iotlb_lock);
+
+ return 0;
+err:
+ vduse_iotlb_del_range(domain, start, last);
+ spin_unlock(&domain->iotlb_lock);
+ return ret;
+}
+
+static void vduse_domain_map_bounce_page(struct vduse_iova_domain *domain,
+  u64 iova, u64 size, u64 paddr)
+{
+ struct vduse_bounce_map *map;
+ unsigned int index;
+ u64 last = iova + size - 1;
+
+ while (iova < last) {
+ map = &domain->bounce_maps[iova >> PAGE_SHIFT];
+ index = offset_in_page(iova) >> IOVA_ALLOC_ORDER;
+ map->orig_phys[index] = paddr;
+ paddr += IOVA_ALLOC_SIZE;
+ iova += IOVA_ALLOC_SIZE;
+ }
+}
+
+static void vduse_domain_unmap_bounce_page(struct vduse_iova_domain *domain,
+u64 iova, u64 size)
+{
+ struct vduse_bounce_map *map;
+ unsigned int index;
+ u64 last = iova + size - 1;
+
+ while (iova < last) {
+ map = &domain->bounce_maps[iova >> PAGE_SHIFT];
+ index = offset_in_page(iova) >> IOVA_ALLOC_ORDER;
+ map->orig_phys[index] = INVALID_PHYS_ADDR;
+ iova += IOVA_ALLOC_SIZE;
+ }
+}
+
+static void do_bounce(phys_addr_t orig, void *addr, size_t size,
+   enum dma_data_direction dir)
+{
+ unsigned long pfn = PFN_DOWN(orig);
+
+ if (PageHighMem(pfn_to_page(pfn))) {
+ unsigned in
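
The bounce-map bookkeeping in vduse_domain_map_bounce_page() above can be
followed with a small standalone sketch. It mirrors the loop as quoted:
the IOVA selects a per-page map entry, and the offset within the page
selects a slot. The constants are assumptions for illustration only:
iova_domain.h is not part of this excerpt, so EX_IOVA_ALLOC_ORDER (2 KiB
granules) is hypothetical, as are the ex_ names.

```c
#include <assert.h>
#include <stdint.h>

#define EX_PAGE_SHIFT        12                       /* assumption: 4 KiB pages */
#define EX_PAGE_SIZE         (1ULL << EX_PAGE_SHIFT)
#define EX_IOVA_ALLOC_ORDER  11                       /* hypothetical granule order */
#define EX_IOVA_ALLOC_SIZE   (1ULL << EX_IOVA_ALLOC_ORDER)
#define EX_SLOTS_PER_PAGE    (EX_PAGE_SIZE / EX_IOVA_ALLOC_SIZE)
#define EX_INVALID_PHYS_ADDR (~0ULL)

/* One entry per bounce page; each slot remembers the original physical
 * address backing that granule of the page. */
struct ex_bounce_map {
	uint64_t orig_phys[EX_SLOTS_PER_PAGE];
};

/* Same shape as the quoted loop: walk the range one granule at a time and
 * record the original physical address for each granule. */
static void ex_map_range(struct ex_bounce_map *maps, uint64_t iova,
			 uint64_t size, uint64_t paddr)
{
	uint64_t last = iova + size - 1;

	while (iova < last) {
		struct ex_bounce_map *map = &maps[iova >> EX_PAGE_SHIFT];
		unsigned int index =
			(iova & (EX_PAGE_SIZE - 1)) >> EX_IOVA_ALLOC_ORDER;

		map->orig_phys[index] = paddr;
		paddr += EX_IOVA_ALLOC_SIZE;
		iova += EX_IOVA_ALLOC_SIZE;
	}
}
```

With these assumed sizes, a 4 KiB range starting at IOVA 0x800 touches the
second slot of page 0 and the first slot of page 1.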

Re: [PATCH 2/3] virtiofs: split requests that exceed virtqueue size

2021-03-24 Thread Connor Kuehl

On 3/24/21 10:30 AM, Miklos Szeredi wrote:

On Wed, Mar 24, 2021 at 4:09 PM Connor Kuehl  wrote:


On 3/18/21 10:17 AM, Miklos Szeredi wrote:

I removed the conditional compilation and renamed the limit.  Also made
virtio_fs_get_tree() bail out if it hit the WARN_ON().  Updated patch below.


Hi Miklos,

Has this patch been queued?


It's in my internal patch queue at the moment.   Will push to
fuse.git#for-next in a couple of days.


Cool! Thank you :-)

Connor



Re: [PATCH 2/3] virtiofs: split requests that exceed virtqueue size

2021-03-24 Thread Connor Kuehl

On 3/18/21 10:17 AM, Miklos Szeredi wrote:

I removed the conditional compilation and renamed the limit.  Also made
virtio_fs_get_tree() bail out if it hit the WARN_ON().  Updated patch below.


Hi Miklos,

Has this patch been queued?

Connor


---
From: Connor Kuehl 
Subject: virtiofs: split requests that exceed virtqueue size
Date: Thu, 18 Mar 2021 08:52:22 -0500

If an incoming FUSE request can't fit on the virtqueue, the request is
placed onto a workqueue so a worker can try to resubmit it later where
there will (hopefully) be space for it next time.

This is fine for requests that aren't larger than a virtqueue's maximum
capacity.  However, if a request's size exceeds the maximum capacity of the
virtqueue (even if the virtqueue is empty), it will be doomed to a life of
being placed on the workqueue, removed, discovered it won't fit, and placed
on the workqueue yet again.

Furthermore, from section 2.6.5.3.1 (Driver Requirements: Indirect
Descriptors) of the virtio spec:

   "A driver MUST NOT create a descriptor chain longer than the Queue
   Size of the device."

To fix this, limit the number of pages FUSE will use for an overall
request.  This way, each request can realistically fit on the virtqueue
when it is decomposed into a scattergather list and avoid violating section
2.6.5.3.1 of the virtio spec.

Signed-off-by: Connor Kuehl 
Signed-off-by: Miklos Szeredi 
---
  fs/fuse/fuse_i.h    |    3 +++
  fs/fuse/inode.c     |    3 ++-
  fs/fuse/virtio_fs.c |   19 +--
  3 files changed, 22 insertions(+), 3 deletions(-)

--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -555,6 +555,9 @@ struct fuse_conn {
/** Maxmum number of pages that can be used in a single request */
unsigned int max_pages;
  
+	/** Constrain ->max_pages to this value during feature negotiation */
+	unsigned int max_pages_limit;
+
/** Input queue */
struct fuse_iqueue iq;
  
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -712,6 +712,7 @@ void fuse_conn_init(struct fuse_conn *fc
fc->pid_ns = get_pid_ns(task_active_pid_ns(current));
fc->user_ns = get_user_ns(user_ns);
fc->max_pages = FUSE_DEFAULT_MAX_PAGES_PER_REQ;
+   fc->max_pages_limit = FUSE_MAX_MAX_PAGES;
  
  	INIT_LIST_HEAD(&fc->mounts);
  	list_add(&fm->fc_entry, &fc->mounts);
@@ -1040,7 +1041,7 @@ static void process_init_reply(struct fu
fc->abort_err = 1;
if (arg->flags & FUSE_MAX_PAGES) {
fc->max_pages =
-   min_t(unsigned int, FUSE_MAX_MAX_PAGES,
+   min_t(unsigned int, fc->max_pages_limit,
max_t(unsigned int, arg->max_pages, 1));
}
if (IS_ENABLED(CONFIG_FUSE_DAX) &&
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -18,6 +18,12 @@
  #include 
  #include "fuse_i.h"
  
+/* Used to help calculate the FUSE connection's max_pages limit for a request's
+ * size. Parts of the struct fuse_req are sliced into scattergather lists in
+ * addition to the pages used, so this can help account for that overhead.
+ */
+#define FUSE_HEADER_OVERHEAD    4
+
  /* List of virtio-fs device instances and a lock for the list. Also provides
   * mutual exclusion in device removal and mounting path
   */
@@ -1413,9 +1419,10 @@ static int virtio_fs_get_tree(struct fs_
  {
struct virtio_fs *fs;
struct super_block *sb;
-   struct fuse_conn *fc;
+   struct fuse_conn *fc = NULL;
struct fuse_mount *fm;
-   int err;
+   unsigned int virtqueue_size;
+   int err = -EIO;
  
  	/* This gets a reference on virtio_fs object. This ptr gets installed
  	 * in fc->iq->priv. Once fuse_conn is going away, it calls ->put()
@@ -1427,6 +1434,10 @@ static int virtio_fs_get_tree(struct fs_
return -EINVAL;
}
  
+	virtqueue_size = virtqueue_get_vring_size(fs->vqs[VQ_REQUEST].vq);
+	if (WARN_ON(virtqueue_size <= FUSE_HEADER_OVERHEAD))
+		goto out_err;
+
err = -ENOMEM;
fc = kzalloc(sizeof(struct fuse_conn), GFP_KERNEL);
if (!fc)
@@ -1442,6 +1453,10 @@ static int virtio_fs_get_tree(struct fs_
fc->delete_stale = true;
fc->auto_submounts = true;
  
+	/* Tell FUSE to split requests that exceed the virtqueue's size */
+	fc->max_pages_limit = min_t(unsigned int, fc->max_pages_limit,
+				    virtqueue_size - FUSE_HEADER_OVERHEAD);
+
fsc->s_fs_info = fm;
sb = sget_fc(fsc, virtio_fs_test_super, set_anon_super_fc);
if (fsc->s_fs_info) {
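
To see how the two clamps in the patch above interact, here is a small
standalone sketch of the arithmetic. It is not kernel code: the ex_ names
are hypothetical, and the value 256 used for the default ceiling is an
assumption (FUSE_MAX_MAX_PAGES is referenced but not defined in this
excerpt). The overhead of 4 comes from the patch itself.

```c
#include <assert.h>

#define EX_FUSE_MAX_MAX_PAGES   256u /* assumption: not defined in this excerpt */
#define EX_FUSE_HEADER_OVERHEAD 4u   /* value taken from the patch */

static unsigned int ex_min(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

static unsigned int ex_max(unsigned int a, unsigned int b)
{
	return a > b ? a : b;
}

/* Mirrors the virtio_fs_get_tree() hunk: cap the limit so that pages plus
 * header descriptors fit on the virtqueue. The caller must ensure
 * virtqueue_size > EX_FUSE_HEADER_OVERHEAD, as the WARN_ON() check does. */
static unsigned int ex_max_pages_limit(unsigned int virtqueue_size)
{
	return ex_min(EX_FUSE_MAX_MAX_PAGES,
		      virtqueue_size - EX_FUSE_HEADER_OVERHEAD);
}

/* Mirrors the process_init_reply() hunk: the value requested by the server
 * is raised to at least 1 and then capped by the per-connection limit. */
static unsigned int ex_negotiated_max_pages(unsigned int limit,
					    unsigned int requested)
{
	return ex_min(limit, ex_max(requested, 1u));
}
```

Under these assumptions, a 128-entry virtqueue yields a limit of 124 pages,
so a server asking for 256 pages would be held to 124, while a 1024-entry
virtqueue leaves the default ceiling in place.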





Re: [PATCH] fs/fuse/virtio_fs: Fix a potential memory allocation failure

2021-03-24 Thread Connor Kuehl

On 3/24/21 7:38 AM, zhouchuangao wrote:

Allocating memory for struct fuse_conn may fail, so we should not jump to
out_err and kfree(fc).


Why not? If fc's allocation fails then it is NULL, and calling kfree() on
a NULL pointer is a no-op[1].


Connor

[1] 
https://www.kernel.org/doc/html/latest/core-api/mm-api.html?highlight=kfree#c.kfree
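
The point about a single error label can be illustrated with a userspace
analogue. This is only a sketch: it uses free(), which the C standard also
defines as a no-op for NULL, rather than kfree(), and the ex_ types and
the fail_at parameter are hypothetical devices for exercising each path.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct ex_conn { int id; };
struct ex_mount { struct ex_conn *conn; };

/* fail_at: 0 = no simulated failure, 1 = first allocation fails,
 * 2 = second allocation fails. On success *out owns both objects. */
static int ex_setup(int fail_at, struct ex_mount **out)
{
	struct ex_conn *fc = NULL;
	struct ex_mount *fm = NULL;

	fc = (fail_at == 1) ? NULL : calloc(1, sizeof(*fc));
	if (!fc)
		goto out_err;

	fm = (fail_at == 2) ? NULL : calloc(1, sizeof(*fm));
	if (!fm)
		goto out_err;

	fm->conn = fc;
	*out = fm;
	return 0;

out_err:
	/* Either pointer may still be NULL here; free(NULL) does nothing,
	 * so one label serves every failure point. */
	free(fm);
	free(fc);
	return -ENOMEM;
}
```

Initializing the pointers to NULL up front is what makes the shared label
safe, which matches the "fc = NULL" initialization in the virtiofs patch.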




Re: [RFC PATCH v4] virtio-vsock: use C style defines for constants

2021-03-24 Thread Stefan Hajnoczi
On Tue, Mar 23, 2021 at 06:48:36PM +0300, Arseny Krasnov wrote:
> This:
> 1) Replaces enums with C style "defines", because
>use of enums is not documented, while "defines"
>are widely used in spec.
> 2) Adds defines for some constants.
> 
> Signed-off-by: Arseny Krasnov 
> ---
>  virtio-vsock.tex | 54 +---
>  1 file changed, 28 insertions(+), 26 deletions(-)

Awesome, thanks!

Reviewed-by: Stefan Hajnoczi 

