Re: [Intel-gfx] [PATCH v8 21/24] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
On Thu, Mar 30, 2023 at 12:52:39PM +, Liu, Yi L wrote: > > > "Use a negative value for no-iommu, where supported", or better, should > > > we define this explicitly as -1, or why not use a flag bit to specify > > > no-iommu? Maybe minsz is only through flags for the noiommu use case. > > > > I was happy enough for this to be defined as -1. We could give it a > > formal sounding constant too > > are you suggesting having something like "#define VFIO_NOIOMMU_FD -1"? Yeah something like that Jason
Re: [Intel-gfx] [PATCH v8 21/24] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
> From: Jason Gunthorpe > Sent: Thursday, March 30, 2023 7:22 AM > > On Wed, Mar 29, 2023 at 03:00:55PM -0600, Alex Williamson wrote: > > > > + * The user should provide a device cookie when calling this ioctl. The > > > + * cookie is carried only in event e.g. I/O fault reported to userspace > > > + * via iommufd. The user should use devid returned by this ioctl to mark > > > + * the target device in other ioctls (e.g. iommu hardware infomration > > > query > > > + * via iommufd, and etc.). > > > > AFAICT, the whole concept of this dev_cookie is a fantasy. It only > > exists in this series in these comments and the structure below. It's > > not even defined whether it needs to be unique within an iommufd > > context, and clearly nothing here validates that. There's not enough > > implementation for it to exist in this series. Maybe dev_cookie is > > appended to the end of the structure and indicated with a flag when it > > has some meaning. > > Yes, I've asked for this to be punted to the PRI series enough times > already, why does it keep coming back ?? yes, I promise to remove it in next version. > > > + * @argsz:user filled size of this data. > > > + * @flags:reserved for future extension. > > > + * @dev_cookie: a per device cookie provided by userspace. > > > + * @iommufd: iommufd to bind. a negative value means noiommu. > > > > "Use a negative value for no-iommu, where supported", or better, should > > we define this explicitly as -1, or why not use a flag bit to specify > > no-iommu? Maybe minsz is only through flags for the noiommu use case. > > I was happy enough for this to be defined as -1. We could give it a > formal sounding constant too are you suggesting having something like "#define VFIO_NOIOMMU_FD -1"? Regards, Yi Liu
Re: [Intel-gfx] [PATCH v8 21/24] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
> From: Jason Gunthorpe > Sent: Thursday, March 30, 2023 7:52 PM > > On Thu, Mar 30, 2023 at 07:09:31AM +, Liu, Yi L wrote: > > > > > +long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df, > > > > + struct vfio_device_bind_iommufd > > > > __user *arg) > > > > +{ > > > > + struct vfio_device *device = df->device; > > > > + struct vfio_device_bind_iommufd bind; > > > > + struct iommufd_ctx *iommufd = NULL; > > > > + unsigned long minsz; > > > > + int ret; > > > > + > > > > + static_assert(__same_type(arg->out_devid, bind.out_devid)); > > > > > > They're the same field in the same structure, how could they be > > > otherwise? > > > > @Jason, should I remove this check? > > Yes, it was from something that looked very different from this ok, I'll remove it here and next patch.
Re: [Intel-gfx] [PATCH v8 21/24] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
On Thu, Mar 30, 2023 at 07:09:31AM +, Liu, Yi L wrote: > > > +long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df, > > > + struct vfio_device_bind_iommufd __user *arg) > > > +{ > > > + struct vfio_device *device = df->device; > > > + struct vfio_device_bind_iommufd bind; > > > + struct iommufd_ctx *iommufd = NULL; > > > + unsigned long minsz; > > > + int ret; > > > + > > > + static_assert(__same_type(arg->out_devid, bind.out_devid)); > > > > They're the same field in the same structure, how could they be > > otherwise? > > @Jason, should I remove this check? Yes, it was from something that looked very different from this Jason
Re: [Intel-gfx] [PATCH v8 21/24] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
> From: Alex Williamson > Sent: Thursday, March 30, 2023 5:01 AM > > On Mon, 27 Mar 2023 02:40:44 -0700 > Yi Liu wrote: > > > This adds ioctl for userspace to bind device cdev fd to iommufd. > > > > VFIO_DEVICE_BIND_IOMMUFD: bind device to an iommufd, hence gain DMA > > control provided by the iommufd. open_device > > op is called after bind_iommufd op. > > VFIO no iommu mode is indicated by passing > > a negative iommufd value. > > > > Reviewed-by: Kevin Tian > > Tested-by: Terrence Xu > > Signed-off-by: Yi Liu > > --- > > drivers/vfio/device_cdev.c | 153 + > > drivers/vfio/vfio.h| 13 > > drivers/vfio/vfio_main.c | 5 ++ > > include/uapi/linux/vfio.h | 37 + > > 4 files changed, 208 insertions(+) > > > > diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c > > index 1c640016a824..2b563bac50b9 100644 > > --- a/drivers/vfio/device_cdev.c > > +++ b/drivers/vfio/device_cdev.c > > @@ -3,6 +3,7 @@ > > * Copyright (c) 2023 Intel Corporation. > > */ > > #include > > +#include > > > > #include "vfio.h" > > > > @@ -44,6 +45,158 @@ int vfio_device_fops_cdev_open(struct inode *inode, > > struct > file *filep) > > return ret; > > } > > > > +static void vfio_device_get_kvm_safe(struct vfio_device_file *df) > > +{ > > + spin_lock(>kvm_ref_lock); > > + if (df->kvm) > > + _vfio_device_get_kvm_safe(df->device, df->kvm); > > + spin_unlock(>kvm_ref_lock); > > +} > > + > > +void vfio_device_cdev_close(struct vfio_device_file *df) > > +{ > > + struct vfio_device *device = df->device; > > + > > + /* > > +* As df->access_granted writer is under dev_set->lock as well, > > +* so this read no need to use smp_load_acquire() to pair with > > Nit, "no need to use" -> "does not need to use" got it. > > > +* smp_store_release() in the caller of vfio_device_open(). > > +*/ > > + if (!df->access_granted) > > + return; > > + > > Isn't the lock we're acquiring below the one that we claim to have in > the comment above to make the non-smp_load_acquire() test safe? the comment may be not accurate enough. The the non-smp_load_acquire() and no lock test were according to the below two remarks in v4 and v5. https://lore.kernel.org/kvm/y%2fyrx7jluyeol...@nvidia.com/ https://lore.kernel.org/kvm/y%2f0cv1k0ynha+...@nvidia.com/ Perhaps the comment should be: "In the time of close, there is no contention with another one changing this flag. So test df->access_granted without lock nor smp_load_acquire() is ok." > > + mutex_lock(>dev_set->lock); > > + vfio_device_close(df); > > + vfio_device_put_kvm(device); > > + if (df->iommufd) > > + iommufd_ctx_put(df->iommufd); > > + mutex_unlock(>dev_set->lock); > > + vfio_device_unblock_group(device); > > +} > > + > > +static int vfio_device_cdev_enable_noiommu(struct vfio_device *device) > > +{ > > + if (!capable(CAP_SYS_RAWIO)) > > + return -EPERM; > > + > > + if (!device->noiommu) > > + return -EINVAL; > > + > > + return 0; > > +} > > This is testing, not enabling. ie. naming nit. how about probe_noiommu or test_noiommu? > > > + > > +static struct iommufd_ctx *vfio_get_iommufd_from_fd(int fd) > > +{ > > + struct fd f; > > + struct iommufd_ctx *iommufd; > > + > > + f = fdget(fd); > > + if (!f.file) > > + return ERR_PTR(-EBADF); > > + > > + iommufd = iommufd_ctx_from_file(f.file); > > + > > + fdput(f); > > + return iommufd; > > +} > > + > > +long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df, > > + struct vfio_device_bind_iommufd __user *arg) > > +{ > > + struct vfio_device *device = df->device; > > + struct vfio_device_bind_iommufd bind; > > + struct iommufd_ctx *iommufd = NULL; > > + unsigned long minsz; > > + int ret; > > + > > + static_assert(__same_type(arg->out_devid, bind.out_devid)); > > They're the same field in the same structure, how could they be > otherwise? @Jason, should I remove this check? > > + > > + minsz = offsetofend(struct vfio_device_bind_iommufd, out_devid); > > + > > + if (copy_from_user(, arg, minsz)) > > + return -EFAULT; > > + > > + if (bind.argsz < minsz || bind.flags) > > + return -EINVAL; > > + > > + if (!device->ops->bind_iommufd) > > + return -ENODEV; > > This test seems beyond normal paranoia since we test in > __vfio_register_dev() yes. The whole c file depends on VFIO_DEVICE_CDEV which depends on IOMMUFD, and if IOMMUFD is enabled, __vfio_register_dev() already checks this callback. > > > + > > + /* BIND_IOMMUFD only allowed for cdev fds */ > > + if (df->group) > > + return -EINVAL; > > + > > + ret = vfio_device_block_group(device); > > + if (ret) > > + return ret; > > + > > + mutex_lock(>dev_set->lock);
Re: [Intel-gfx] [PATCH v8 21/24] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
On Wed, Mar 29, 2023 at 03:00:55PM -0600, Alex Williamson wrote: > > + * The user should provide a device cookie when calling this ioctl. The > > + * cookie is carried only in event e.g. I/O fault reported to userspace > > + * via iommufd. The user should use devid returned by this ioctl to mark > > + * the target device in other ioctls (e.g. iommu hardware infomration query > > + * via iommufd, and etc.). > > AFAICT, the whole concept of this dev_cookie is a fantasy. It only > exists in this series in these comments and the structure below. It's > not even defined whether it needs to be unique within an iommufd > context, and clearly nothing here validates that. There's not enough > implementation for it to exist in this series. Maybe dev_cookie is > appended to the end of the structure and indicated with a flag when it > has some meaning. Yes, I've asked for this to be punted to the PRI series enough times already, why does it keep coming back ?? > > + * @argsz: user filled size of this data. > > + * @flags: reserved for future extension. > > + * @dev_cookie: a per device cookie provided by userspace. > > + * @iommufd:iommufd to bind. a negative value means noiommu. > > "Use a negative value for no-iommu, where supported", or better, should > we define this explicitly as -1, or why not use a flag bit to specify > no-iommu? Maybe minsz is only through flags for the noiommu use case. I was happy enough for this to be defined as -1. We could give it a formal sounding constant too Jason
Re: [Intel-gfx] [PATCH v8 21/24] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
On Mon, 27 Mar 2023 02:40:44 -0700 Yi Liu wrote: > This adds ioctl for userspace to bind device cdev fd to iommufd. > > VFIO_DEVICE_BIND_IOMMUFD: bind device to an iommufd, hence gain DMA > control provided by the iommufd. open_device > op is called after bind_iommufd op. > VFIO no iommu mode is indicated by passing > a negative iommufd value. > > Reviewed-by: Kevin Tian > Tested-by: Terrence Xu > Signed-off-by: Yi Liu > --- > drivers/vfio/device_cdev.c | 153 + > drivers/vfio/vfio.h| 13 > drivers/vfio/vfio_main.c | 5 ++ > include/uapi/linux/vfio.h | 37 + > 4 files changed, 208 insertions(+) > > diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c > index 1c640016a824..2b563bac50b9 100644 > --- a/drivers/vfio/device_cdev.c > +++ b/drivers/vfio/device_cdev.c > @@ -3,6 +3,7 @@ > * Copyright (c) 2023 Intel Corporation. > */ > #include > +#include > > #include "vfio.h" > > @@ -44,6 +45,158 @@ int vfio_device_fops_cdev_open(struct inode *inode, > struct file *filep) > return ret; > } > > +static void vfio_device_get_kvm_safe(struct vfio_device_file *df) > +{ > + spin_lock(>kvm_ref_lock); > + if (df->kvm) > + _vfio_device_get_kvm_safe(df->device, df->kvm); > + spin_unlock(>kvm_ref_lock); > +} > + > +void vfio_device_cdev_close(struct vfio_device_file *df) > +{ > + struct vfio_device *device = df->device; > + > + /* > + * As df->access_granted writer is under dev_set->lock as well, > + * so this read no need to use smp_load_acquire() to pair with Nit, "no need to use" -> "does not need to use" > + * smp_store_release() in the caller of vfio_device_open(). > + */ > + if (!df->access_granted) > + return; > + Isn't the lock we're acquiring below the one that we claim to have in the comment above to make the non-smp_load_acquire() test safe? > + mutex_lock(>dev_set->lock); > + vfio_device_close(df); > + vfio_device_put_kvm(device); > + if (df->iommufd) > + iommufd_ctx_put(df->iommufd); > + mutex_unlock(>dev_set->lock); > + vfio_device_unblock_group(device); > +} > + > +static int vfio_device_cdev_enable_noiommu(struct vfio_device *device) > +{ > + if (!capable(CAP_SYS_RAWIO)) > + return -EPERM; > + > + if (!device->noiommu) > + return -EINVAL; > + > + return 0; > +} This is testing, not enabling. ie. naming nit. > + > +static struct iommufd_ctx *vfio_get_iommufd_from_fd(int fd) > +{ > + struct fd f; > + struct iommufd_ctx *iommufd; > + > + f = fdget(fd); > + if (!f.file) > + return ERR_PTR(-EBADF); > + > + iommufd = iommufd_ctx_from_file(f.file); > + > + fdput(f); > + return iommufd; > +} > + > +long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df, > + struct vfio_device_bind_iommufd __user *arg) > +{ > + struct vfio_device *device = df->device; > + struct vfio_device_bind_iommufd bind; > + struct iommufd_ctx *iommufd = NULL; > + unsigned long minsz; > + int ret; > + > + static_assert(__same_type(arg->out_devid, bind.out_devid)); They're the same field in the same structure, how could they be otherwise? > + > + minsz = offsetofend(struct vfio_device_bind_iommufd, out_devid); > + > + if (copy_from_user(, arg, minsz)) > + return -EFAULT; > + > + if (bind.argsz < minsz || bind.flags) > + return -EINVAL; > + > + if (!device->ops->bind_iommufd) > + return -ENODEV; This test seems beyond normal paranoia since we test in __vfio_register_dev() > + > + /* BIND_IOMMUFD only allowed for cdev fds */ > + if (df->group) > + return -EINVAL; > + > + ret = vfio_device_block_group(device); > + if (ret) > + return ret; > + > + mutex_lock(>dev_set->lock); > + /* one device cannot be bound twice */ > + if (df->access_granted) { > + ret = -EINVAL; > + goto out_unlock; > + } > + > + /* iommufd < 0 means noiommu mode */ > + if (bind.iommufd < 0) { > + ret = vfio_device_cdev_enable_noiommu(device); > + if (ret) > + goto out_unlock; > + } else { > + iommufd = vfio_get_iommufd_from_fd(bind.iommufd); > + if (IS_ERR(iommufd)) { > + ret = PTR_ERR(iommufd); > + goto out_unlock; > + } > + } > + > + /* > + * Before the device open, get the KVM pointer currently > + * associated with the device file (if there is) and obtain > + * a reference. This reference is held until device closed. > + * Save the pointer in the device for use by drivers. > + */ > +
[Intel-gfx] [PATCH v8 21/24] vfio: Add VFIO_DEVICE_BIND_IOMMUFD
This adds ioctl for userspace to bind device cdev fd to iommufd. VFIO_DEVICE_BIND_IOMMUFD: bind device to an iommufd, hence gain DMA control provided by the iommufd. open_device op is called after bind_iommufd op. VFIO no iommu mode is indicated by passing a negative iommufd value. Reviewed-by: Kevin Tian Tested-by: Terrence Xu Signed-off-by: Yi Liu --- drivers/vfio/device_cdev.c | 153 + drivers/vfio/vfio.h| 13 drivers/vfio/vfio_main.c | 5 ++ include/uapi/linux/vfio.h | 37 + 4 files changed, 208 insertions(+) diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c index 1c640016a824..2b563bac50b9 100644 --- a/drivers/vfio/device_cdev.c +++ b/drivers/vfio/device_cdev.c @@ -3,6 +3,7 @@ * Copyright (c) 2023 Intel Corporation. */ #include +#include #include "vfio.h" @@ -44,6 +45,158 @@ int vfio_device_fops_cdev_open(struct inode *inode, struct file *filep) return ret; } +static void vfio_device_get_kvm_safe(struct vfio_device_file *df) +{ + spin_lock(>kvm_ref_lock); + if (df->kvm) + _vfio_device_get_kvm_safe(df->device, df->kvm); + spin_unlock(>kvm_ref_lock); +} + +void vfio_device_cdev_close(struct vfio_device_file *df) +{ + struct vfio_device *device = df->device; + + /* +* As df->access_granted writer is under dev_set->lock as well, +* so this read no need to use smp_load_acquire() to pair with +* smp_store_release() in the caller of vfio_device_open(). +*/ + if (!df->access_granted) + return; + + mutex_lock(>dev_set->lock); + vfio_device_close(df); + vfio_device_put_kvm(device); + if (df->iommufd) + iommufd_ctx_put(df->iommufd); + mutex_unlock(>dev_set->lock); + vfio_device_unblock_group(device); +} + +static int vfio_device_cdev_enable_noiommu(struct vfio_device *device) +{ + if (!capable(CAP_SYS_RAWIO)) + return -EPERM; + + if (!device->noiommu) + return -EINVAL; + + return 0; +} + +static struct iommufd_ctx *vfio_get_iommufd_from_fd(int fd) +{ + struct fd f; + struct iommufd_ctx *iommufd; + + f = fdget(fd); + if (!f.file) + return ERR_PTR(-EBADF); + + iommufd = iommufd_ctx_from_file(f.file); + + fdput(f); + return iommufd; +} + +long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df, + struct vfio_device_bind_iommufd __user *arg) +{ + struct vfio_device *device = df->device; + struct vfio_device_bind_iommufd bind; + struct iommufd_ctx *iommufd = NULL; + unsigned long minsz; + int ret; + + static_assert(__same_type(arg->out_devid, bind.out_devid)); + + minsz = offsetofend(struct vfio_device_bind_iommufd, out_devid); + + if (copy_from_user(, arg, minsz)) + return -EFAULT; + + if (bind.argsz < minsz || bind.flags) + return -EINVAL; + + if (!device->ops->bind_iommufd) + return -ENODEV; + + /* BIND_IOMMUFD only allowed for cdev fds */ + if (df->group) + return -EINVAL; + + ret = vfio_device_block_group(device); + if (ret) + return ret; + + mutex_lock(>dev_set->lock); + /* one device cannot be bound twice */ + if (df->access_granted) { + ret = -EINVAL; + goto out_unlock; + } + + /* iommufd < 0 means noiommu mode */ + if (bind.iommufd < 0) { + ret = vfio_device_cdev_enable_noiommu(device); + if (ret) + goto out_unlock; + } else { + iommufd = vfio_get_iommufd_from_fd(bind.iommufd); + if (IS_ERR(iommufd)) { + ret = PTR_ERR(iommufd); + goto out_unlock; + } + } + + /* +* Before the device open, get the KVM pointer currently +* associated with the device file (if there is) and obtain +* a reference. This reference is held until device closed. +* Save the pointer in the device for use by drivers. +*/ + vfio_device_get_kvm_safe(df); + + df->iommufd = iommufd; + ret = vfio_device_open(df); + if (ret) + goto out_put_kvm; + + if (df->iommufd) + bind.out_devid = df->devid; + + ret = copy_to_user(>out_devid, _devid, + sizeof(bind.out_devid)) ? -EFAULT : 0; + if (ret) + goto out_close_device; + + if (bind.iommufd < 0) + dev_warn(device->dev, "device is bound to vfio-noiommu by user " +"(%s:%d)\n", current->comm, task_pid_nr(current)); + + /* +