Re: [PATCH 15/17] iommu: remove DOMAIN_ATTR_NESTING
On Thu, Mar 25, 2021 at 06:12:37AM +, Tian, Kevin wrote:
> Agree. The vSVA series is still undergoing a refactor according to Jason's
> comment thus won't be ready in short term. It's better to let this one
> go in first.

Would be great to get a few more reviews while we're at it :)

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v5 09/11] vduse: Introduce VDUSE - vDPA Device in Userspace
On 2021/3/24 at 4:55 PM, Yongji Xie wrote:
On Wed, Mar 24, 2021 at 12:43 PM Jason Wang wrote:
On 2021/3/15 at 1:37 PM, Xie Yongji wrote:

This VDUSE driver enables implementing vDPA devices in userspace.
Both control path and data path of vDPA devices will be able to
be handled in userspace.

In the control path, the VDUSE driver will make use of a message
mechanism to forward the config operation from the vdpa bus driver
to userspace. Userspace can use read()/write() to receive/reply to
those control messages.

In the data path, userspace can use mmap() to access the vDPA device's
iova regions obtained through the VDUSE_IOTLB_GET_ENTRY ioctl. Besides,
userspace can use ioctl() to inject interrupts and use the eventfd
mechanism to receive virtqueue kicks.

Signed-off-by: Xie Yongji
---
 Documentation/userspace-api/ioctl/ioctl-number.rst |    1 +
 drivers/vdpa/Kconfig                               |   10 +
 drivers/vdpa/Makefile                              |    1 +
 drivers/vdpa/vdpa_user/Makefile                    |    5 +
 drivers/vdpa/vdpa_user/vduse_dev.c                 | 1281
 include/uapi/linux/vduse.h                         |  153 +++
 6 files changed, 1451 insertions(+)
 create mode 100644 drivers/vdpa/vdpa_user/Makefile
 create mode 100644 drivers/vdpa/vdpa_user/vduse_dev.c
 create mode 100644 include/uapi/linux/vduse.h

diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst
index a4c75a28c839..71722e6f8f23 100644
--- a/Documentation/userspace-api/ioctl/ioctl-number.rst
+++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
@@ -300,6 +300,7 @@ Code  Seq#    Include File                     Comments
 'z'   10-4F  drivers/s390/crypto/zcrypt_api.h  conflict!
 '|'   00-7F  linux/media.h
 0x80  00-1F  linux/fb.h
+0x81  00-1F  linux/vduse.h
 0x89  00-06  arch/x86/include/asm/sockios.h
 0x89  0B-DF  linux/sockios.h
 0x89  E0-EF  linux/sockios.h                   SIOCPROTOPRIVATE range
diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig
index a245809c99d0..77a1da522c21 100644
--- a/drivers/vdpa/Kconfig
+++ b/drivers/vdpa/Kconfig
@@ -25,6 +25,16 @@ config VDPA_SIM_NET
 	help
 	  vDPA networking device simulator which loops TX traffic back to RX.
 
+config VDPA_USER
+	tristate "VDUSE (vDPA Device in Userspace) support"
+	depends on EVENTFD && MMU && HAS_DMA
+	select DMA_OPS
+	select VHOST_IOTLB
+	select IOMMU_IOVA
+	help
+	  With VDUSE it is possible to emulate a vDPA Device
+	  in a userspace program.
+
 config IFCVF
 	tristate "Intel IFC VF vDPA driver"
 	depends on PCI_MSI
diff --git a/drivers/vdpa/Makefile b/drivers/vdpa/Makefile
index 67fe7f3d6943..f02ebed33f19 100644
--- a/drivers/vdpa/Makefile
+++ b/drivers/vdpa/Makefile
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
 obj-$(CONFIG_VDPA) += vdpa.o
 obj-$(CONFIG_VDPA_SIM) += vdpa_sim/
+obj-$(CONFIG_VDPA_USER) += vdpa_user/
 obj-$(CONFIG_IFCVF)    += ifcvf/
 obj-$(CONFIG_MLX5_VDPA) += mlx5/
 obj-$(CONFIG_VP_VDPA)    += virtio_pci/
diff --git a/drivers/vdpa/vdpa_user/Makefile b/drivers/vdpa/vdpa_user/Makefile
new file mode 100644
index ..260e0b26af99
--- /dev/null
+++ b/drivers/vdpa/vdpa_user/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0
+
+vduse-y := vduse_dev.o iova_domain.o
+
+obj-$(CONFIG_VDPA_USER) += vduse.o
diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
new file mode 100644
index ..07d0ae92d470
--- /dev/null
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -0,0 +1,1281 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * VDUSE: vDPA Device in Userspace
+ *
+ * Copyright (C) 2020 Bytedance Inc. and/or its affiliates. All rights reserved.
+ *
+ * Author: Xie Yongji
+ *
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "iova_domain.h"
+
+#define DRV_VERSION  "1.0"
+#define DRV_AUTHOR   "Yongji Xie "
+#define DRV_DESC     "vDPA Device in Userspace"
+#define DRV_LICENSE  "GPL v2"
+
+#define VDUSE_DEV_MAX (1U << MINORBITS)
+
+struct vduse_virtqueue {
+	u16 index;
+	bool ready;
+	spinlock_t kick_lock;
+	spinlock_t irq_lock;
+	struct eventfd_ctx *kickfd;
+	struct vdpa_callback cb;
+	struct work_struct inject;
+};
+
+struct vduse_dev;
+
+struct vduse_vdpa {
+	struct vdpa_device vdpa;
+	struct vduse_dev *dev;
+};
+
+struct vduse_dev {
+	struct vduse_vdpa *vdev;
+	struct device dev;
+	struct cdev cdev;
+	struct vduse_virtqueue *vqs;
+	struct vduse_iova_domain *domain;
+	spinlock_t msg_lock;
+	atomic64_t msg_unique;
+	wait_queue_head_t waitq;
+	struct list_head send_list;
+	struct list_head recv_list;
+	struct list_head
RE: [PATCH 15/17] iommu: remove DOMAIN_ATTR_NESTING
> From: Auger Eric
> Sent: Monday, March 15, 2021 3:52 PM
> To: Christoph Hellwig
> Cc: k...@vger.kernel.org; Will Deacon ; linuxppc-
> d...@lists.ozlabs.org; dri-de...@lists.freedesktop.org; Li Yang
> ; io...@lists.linux-foundation.org;
>
> Hi Christoph,
>
> On 3/14/21 4:58 PM, Christoph Hellwig wrote:
> > On Sun, Mar 14, 2021 at 11:44:52AM +0100, Auger Eric wrote:
> >> As mentioned by Robin, there are series planning to use
> >> DOMAIN_ATTR_NESTING to get info about the nested caps of the iommu
> >> (ARM and Intel):
> >>
> >> [Patch v8 00/10] vfio: expose virtual Shared Virtual Addressing to VMs
> >> patches 1, 2, 3
> >>
> >> Is the plan to introduce a new domain_get_nesting_info ops then?
> >
> > The plan as usual would be to add it in the series adding that support.
> > Not sure what the merge plans are - if the series is ready to be
> > merged I could rebase on top of it, otherwise that series will need
> > to add the method.
> OK I think your series may be upstreamed first.
>

Agree. The vSVA series is still undergoing a refactor according to Jason's
comment thus won't be ready in short term. It's better to let this one go
in first.

Thanks
Kevin
Re: [PATCH v5 08/11] vduse: Implement an MMU-based IOMMU driver
On 2021/3/24 at 3:39 PM, Yongji Xie wrote:
On Wed, Mar 24, 2021 at 11:54 AM Jason Wang wrote:
On 2021/3/15 at 1:37 PM, Xie Yongji wrote:

This implements an MMU-based IOMMU driver to support mapping
a kernel DMA buffer into userspace. The basic idea behind it is
treating MMU (VA->PA) as IOMMU (IOVA->PA). The driver will set
up MMU mapping instead of IOMMU mapping for the DMA transfer so
that the userspace process is able to use its virtual address to
access the DMA buffer in the kernel.

To avoid security issues, a bounce-buffering mechanism is
introduced to prevent userspace from accessing the original buffer
directly.

Signed-off-by: Xie Yongji
---
 drivers/vdpa/vdpa_user/iova_domain.c | 535 +++
 drivers/vdpa/vdpa_user/iova_domain.h |  75 +
 2 files changed, 610 insertions(+)
 create mode 100644 drivers/vdpa/vdpa_user/iova_domain.c
 create mode 100644 drivers/vdpa/vdpa_user/iova_domain.h

diff --git a/drivers/vdpa/vdpa_user/iova_domain.c b/drivers/vdpa/vdpa_user/iova_domain.c
new file mode 100644
index ..83de216b0e51
--- /dev/null
+++ b/drivers/vdpa/vdpa_user/iova_domain.c
@@ -0,0 +1,535 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * MMU-based IOMMU implementation
+ *
+ * Copyright (C) 2020 Bytedance Inc. and/or its affiliates. All rights reserved.

2021 as well.

Sure.

+ *
+ * Author: Xie Yongji
+ *
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "iova_domain.h"
+
+static int vduse_iotlb_add_range(struct vduse_iova_domain *domain,
+				 u64 start, u64 last,
+				 u64 addr, unsigned int perm,
+				 struct file *file, u64 offset)
+{
+	struct vdpa_map_file *map_file;
+	int ret;
+
+	map_file = kmalloc(sizeof(*map_file), GFP_ATOMIC);
+	if (!map_file)
+		return -ENOMEM;
+
+	map_file->file = get_file(file);
+	map_file->offset = offset;
+
+	ret = vhost_iotlb_add_range_ctx(domain->iotlb, start, last,
+					addr, perm, map_file);
+	if (ret) {
+		fput(map_file->file);
+		kfree(map_file);
+		return ret;
+	}
+	return 0;
+}
+
+static void vduse_iotlb_del_range(struct vduse_iova_domain *domain,
+				  u64 start, u64 last)
+{
+	struct vdpa_map_file *map_file;
+	struct vhost_iotlb_map *map;
+
+	while ((map = vhost_iotlb_itree_first(domain->iotlb, start, last))) {
+		map_file = (struct vdpa_map_file *)map->opaque;
+		fput(map_file->file);
+		kfree(map_file);
+		vhost_iotlb_map_free(domain->iotlb, map);
+	}
+}
+
+int vduse_domain_set_map(struct vduse_iova_domain *domain,
+			 struct vhost_iotlb *iotlb)
+{
+	struct vdpa_map_file *map_file;
+	struct vhost_iotlb_map *map;
+	u64 start = 0ULL, last = ULLONG_MAX;
+	int ret;
+
+	spin_lock(&domain->iotlb_lock);
+	vduse_iotlb_del_range(domain, start, last);
+
+	for (map = vhost_iotlb_itree_first(iotlb, start, last); map;
+	     map = vhost_iotlb_itree_next(map, start, last)) {
+		map_file = (struct vdpa_map_file *)map->opaque;
+		ret = vduse_iotlb_add_range(domain, map->start, map->last,
+					    map->addr, map->perm,
+					    map_file->file,
+					    map_file->offset);
+		if (ret)
+			goto err;
+	}
+	spin_unlock(&domain->iotlb_lock);
+
+	return 0;
+err:
+	vduse_iotlb_del_range(domain, start, last);
+	spin_unlock(&domain->iotlb_lock);
+	return ret;
+}
+
+static void vduse_domain_map_bounce_page(struct vduse_iova_domain *domain,
+					 u64 iova, u64 size, u64 paddr)
+{
+	struct vduse_bounce_map *map;
+	unsigned int index;
+	u64 last = iova + size - 1;
+
+	while (iova < last) {
+		map = &domain->bounce_maps[iova >> PAGE_SHIFT];
+		index = offset_in_page(iova) >> IOVA_ALLOC_ORDER;
+		map->orig_phys[index] = paddr;
+		paddr += IOVA_ALLOC_SIZE;
+		iova += IOVA_ALLOC_SIZE;
+	}
+}
+
+static void vduse_domain_unmap_bounce_page(struct vduse_iova_domain *domain,
+					   u64 iova, u64 size)
+{
+	struct vduse_bounce_map *map;
+	unsigned int index;
+	u64 last = iova + size - 1;
+
+	while (iova < last) {
+		map = &domain->bounce_maps[iova >> PAGE_SHIFT];
+		index = offset_in_page(iova) >> IOVA_ALLOC_ORDER;
+		map->orig_phys[index] = INVALID_PHYS_ADDR;
+		iova += IOVA_ALLOC_SIZE;
+	}
+}
+
+static void do_bounce(phys_addr_t orig, void *addr, size_t size,
+		      enum dma_data_direction dir)
+{
+	unsigned long pfn = PFN_DOWN(orig);
+
+	if (PageHighMem(pfn_to_page(pfn))) {
+		unsigned in
Re: [PATCH 2/3] virtiofs: split requests that exceed virtqueue size
On 3/24/21 10:30 AM, Miklos Szeredi wrote:
> On Wed, Mar 24, 2021 at 4:09 PM Connor Kuehl wrote:
>> On 3/18/21 10:17 AM, Miklos Szeredi wrote:
>>> I removed the conditional compilation and renamed the limit. Also made
>>> virtio_fs_get_tree() bail out if it hit the WARN_ON(). Updated patch below.
>>
>> Hi Miklos,
>>
>> Has this patch been queued?
>
> It's in my internal patch queue at the moment. Will push to
> fuse.git#for-next in a couple of days.

Cool! Thank you :-)

Connor
Re: [PATCH 2/3] virtiofs: split requests that exceed virtqueue size
On 3/18/21 10:17 AM, Miklos Szeredi wrote:
I removed the conditional compilation and renamed the limit. Also made
virtio_fs_get_tree() bail out if it hit the WARN_ON(). Updated patch below.

Hi Miklos,

Has this patch been queued?

Connor

---
From: Connor Kuehl
Subject: virtiofs: split requests that exceed virtqueue size
Date: Thu, 18 Mar 2021 08:52:22 -0500

If an incoming FUSE request can't fit on the virtqueue, the request is
placed onto a workqueue so a worker can try to resubmit it later where
there will (hopefully) be space for it next time.

This is fine for requests that aren't larger than a virtqueue's maximum
capacity. However, if a request's size exceeds the maximum capacity of the
virtqueue (even if the virtqueue is empty), it will be doomed to a life of
being placed on the workqueue, removed, discovered it won't fit, and
placed on the workqueue yet again.

Furthermore, from section 2.6.5.3.1 (Driver Requirements: Indirect
Descriptors) of the virtio spec:

  "A driver MUST NOT create a descriptor chain longer than the Queue
  Size of the device."

To fix this, limit the number of pages FUSE will use for an overall
request. This way, each request can realistically fit on the virtqueue
when it is decomposed into a scattergather list and avoid violating
section 2.6.5.3.1 of the virtio spec.

Signed-off-by: Connor Kuehl
Signed-off-by: Miklos Szeredi
---
 fs/fuse/fuse_i.h    |    3 +++
 fs/fuse/inode.c     |    3 ++-
 fs/fuse/virtio_fs.c |   19 +--
 3 files changed, 22 insertions(+), 3 deletions(-)

--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -555,6 +555,9 @@ struct fuse_conn {
 	/** Maxmum number of pages that can be used in a single request */
 	unsigned int max_pages;
 
+	/** Constrain ->max_pages to this value during feature negotiation */
+	unsigned int max_pages_limit;
+
 	/** Input queue */
 	struct fuse_iqueue iq;
 
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -712,6 +712,7 @@ void fuse_conn_init(struct fuse_conn *fc
 	fc->pid_ns = get_pid_ns(task_active_pid_ns(current));
 	fc->user_ns = get_user_ns(user_ns);
 	fc->max_pages = FUSE_DEFAULT_MAX_PAGES_PER_REQ;
+	fc->max_pages_limit = FUSE_MAX_MAX_PAGES;
 
 	INIT_LIST_HEAD(&fc->mounts);
 	list_add(&fm->fc_entry, &fc->mounts);
@@ -1040,7 +1041,7 @@ static void process_init_reply(struct fu
 				fc->abort_err = 1;
 			if (arg->flags & FUSE_MAX_PAGES) {
 				fc->max_pages =
-					min_t(unsigned int, FUSE_MAX_MAX_PAGES,
+					min_t(unsigned int, fc->max_pages_limit,
 					max_t(unsigned int, arg->max_pages, 1));
 			}
 			if (IS_ENABLED(CONFIG_FUSE_DAX) &&
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -18,6 +18,12 @@
 #include
 #include "fuse_i.h"
 
+/* Used to help calculate the FUSE connection's max_pages limit for a request's
+ * size. Parts of the struct fuse_req are sliced into scattergather lists in
+ * addition to the pages used, so this can help account for that overhead.
+ */
+#define FUSE_HEADER_OVERHEAD	4
+
 /* List of virtio-fs device instances and a lock for the list. Also provides
  * mutual exclusion in device removal and mounting path
  */
@@ -1413,9 +1419,10 @@ static int virtio_fs_get_tree(struct fs_
 {
 	struct virtio_fs *fs;
 	struct super_block *sb;
-	struct fuse_conn *fc;
+	struct fuse_conn *fc = NULL;
 	struct fuse_mount *fm;
-	int err;
+	unsigned int virtqueue_size;
+	int err = -EIO;
 
 	/* This gets a reference on virtio_fs object. This ptr gets installed
 	 * in fc->iq->priv. Once fuse_conn is going away, it calls ->put()
@@ -1427,6 +1434,10 @@ static int virtio_fs_get_tree(struct fs_
 		return -EINVAL;
 	}
 
+	virtqueue_size = virtqueue_get_vring_size(fs->vqs[VQ_REQUEST].vq);
+	if (WARN_ON(virtqueue_size <= FUSE_HEADER_OVERHEAD))
+		goto out_err;
+
 	err = -ENOMEM;
 	fc = kzalloc(sizeof(struct fuse_conn), GFP_KERNEL);
 	if (!fc)
@@ -1442,6 +1453,10 @@ static int virtio_fs_get_tree(struct fs_
 	fc->delete_stale = true;
 	fc->auto_submounts = true;
 
+	/* Tell FUSE to split requests that exceed the virtqueue's size */
+	fc->max_pages_limit = min_t(unsigned int, fc->max_pages_limit,
+				    virtqueue_size - FUSE_HEADER_OVERHEAD);
+
 	fsc->s_fs_info = fm;
 	sb = sget_fc(fsc, virtio_fs_test_super, set_anon_super_fc);
 	if (fsc->s_fs_info) {
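For readers following the arithmetic, the two clamping steps in the patch above can be modelled in plain userspace C. This is only a sketch: the function names virtqueue_page_limit() and negotiated_max_pages() are hypothetical, and the value 256 for FUSE_MAX_MAX_PAGES is an assumption about the kernel headers of that era, not something stated in this thread.

```c
/* Userspace model of the max_pages clamping; not kernel code. */

/* Assumed value for illustration; check the kernel tree you build against. */
#define FUSE_MAX_MAX_PAGES	256u
/* Taken from the patch: descriptors reserved for the non-page parts of a request. */
#define FUSE_HEADER_OVERHEAD	4u

static unsigned int min_u(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

static unsigned int max_u(unsigned int a, unsigned int b)
{
	return a > b ? a : b;
}

/*
 * Mount-time step (hypothetical helper): derive the page limit from the
 * virtqueue size. Returns 0 when the queue is too small, which is the case
 * where the patch bails out of virtio_fs_get_tree() via WARN_ON().
 */
static unsigned int virtqueue_page_limit(unsigned int virtqueue_size)
{
	if (virtqueue_size <= FUSE_HEADER_OVERHEAD)
		return 0;
	return min_u(FUSE_MAX_MAX_PAGES, virtqueue_size - FUSE_HEADER_OVERHEAD);
}

/*
 * Negotiation step (hypothetical helper): mirrors the min_t()/max_t() pair
 * in process_init_reply(), clamping the server's request to [1, limit].
 */
static unsigned int negotiated_max_pages(unsigned int limit,
					 unsigned int requested)
{
	return min_u(limit, max_u(requested, 1));
}
```

With a 128-entry virtqueue, virtqueue_page_limit(128) leaves 124 descriptors for pages, so a server asking for 256 pages would be held to 124 under these assumptions.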
Re: [PATCH] fs/fuse/virtio_fs: Fix a potential memory allocation failure
On 3/24/21 7:38 AM, zhouchuangao wrote:
> Allocate memory for struct fuse_conn may fail, we should not jump to
> out_err to kfree(fc).

Why not? If fc's allocation fails then it is NULL and calling kfree() on
a NULL pointer is a noop[1].

Connor

[1] https://www.kernel.org/doc/html/latest/core-api/mm-api.html?highlight=kfree#c.kfree
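The same cleanup pattern can be tried in userspace, where the C standard gives free(NULL) the same no-op guarantee that kfree(NULL) has in the kernel. This is a rough analogue only: struct conn and conn_create() are hypothetical names made up for the illustration, not the virtiofs code.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct fuse_conn. */
struct conn {
	int id;
};

/*
 * Single error label shared by every failure path. The pointer starts
 * out NULL, so jumping to out_err before (or after a failed) allocation
 * ends up calling free(NULL), which does nothing.
 */
static int conn_create(int fail_early, struct conn **out)
{
	struct conn *c = NULL;
	int err = -1;

	if (fail_early)
		goto out_err;	/* c is still NULL here */

	c = calloc(1, sizeof(*c));
	if (!c)
		goto out_err;	/* allocation failed: c is NULL */

	c->id = 1;
	*out = c;
	return 0;

out_err:
	free(c);		/* safe for NULL */
	return err;
}
```

Initializing the pointer to NULL is what makes the shared label safe, which matches the `struct fuse_conn *fc = NULL;` change in the virtiofs patch discussed earlier in this thread.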
Re: [RFC PATCH v4] virtio-vsock: use C style defines for constants
On Tue, Mar 23, 2021 at 06:48:36PM +0300, Arseny Krasnov wrote:
> This:
> 1) Replaces enums with C style "defines", because
>    use of enums is not documented, while "defines"
>    are widely used in spec.
> 2) Adds defines for some constants.
>
> Signed-off-by: Arseny Krasnov
> ---
>  virtio-vsock.tex | 54 +---
>  1 file changed, 28 insertions(+), 26 deletions(-)

Awesome, thanks!

Reviewed-by: Stefan Hajnoczi