RE: [RFC 02/20] vfio: Add device class for /dev/vfio/devices
Hi Jason, > From: Jason Gunthorpe > Sent: Wednesday, November 3, 2021 9:26 PM > > On Tue, Nov 02, 2021 at 09:53:29AM +, Liu, Yi L wrote: > > > > vfio_uninit_group_dev(&mdev_state->vdev); > > > kfree(mdev_state->pages); > > > kfree(mdev_state->vconfig); > > > kfree(mdev_state); > > > > > > pages/vconfig would logically be in a release function > > > > I see. So the criteria is: the pointer fields pointing to a memory buffer > > allocated by the device driver should be logically be free in a release > > function. right? > > Often yes, that is usually a good idea > > >I can see there are such fields in struct vfio_pci_core_device > > and mdev_state (both mbochs and mdpy). So we may go with your option > #2. > > Is it? otherwise, needs to add release callback for all the related drivers. > > Yes, that is the approx trade off > > > > On the other hand ccw needs to rcu free the vfio_device, so that would > > > have to be global overhead with this api design. > > > > not quite get. why ccw is special here? could you elaborate? > > I added a rcu usage to it in order to fix a race > > +static inline struct vfio_ccw_private *vfio_ccw_get_priv(struct subchannel > *sch) > +{ > + struct vfio_ccw_private *private; > + > + rcu_read_lock(); > + private = dev_get_drvdata(&sch->dev); > + if (private && !vfio_device_try_get(&private->vdev)) > + private = NULL; > + rcu_read_unlock(); > + return private; > +} you are right. After checking your ccw patch, the private free triggered by vfio_ccw_free_private() should use kfree_rcu(). So it is not quite same with other vfio_device users which only need kfree() to free the vfio_device. So how can I address the difference when moving the vfio_device alloc/free into vfio core? any suggestion? @@ -164,14 +173,14 @@ static void vfio_ccw_free_private(struct vfio_ccw_private *private) kmem_cache_free(vfio_ccw_io_region, private->io_region); kfree(private->cp.guest_cp); mutex_destroy(&private->io_mutex); - kfree(private); + vfio_uninit_group_dev(&private->vdev); + kfree_rcu(private, rcu); } https://lore.kernel.org/kvm/10-v3-57c1502c62fd+2190-ccw_mdev_...@nvidia.com/ Regards, Yi Liu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC 02/20] vfio: Add device class for /dev/vfio/devices
On Tue, Nov 02, 2021 at 09:53:29AM +, Liu, Yi L wrote: > > vfio_uninit_group_dev(&mdev_state->vdev); > > kfree(mdev_state->pages); > > kfree(mdev_state->vconfig); > > kfree(mdev_state); > > > > pages/vconfig would logically be in a release function > > I see. So the criteria is: the pointer fields pointing to a memory buffer > allocated by the device driver should be logically be free in a release > function. right? Often yes, that is usually a good idea >I can see there are such fields in struct vfio_pci_core_device > and mdev_state (both mbochs and mdpy). So we may go with your option #2. > Is it? otherwise, needs to add release callback for all the related drivers. Yes, that is the approx trade off > > On the other hand ccw needs to rcu free the vfio_device, so that would > > have to be global overhead with this api design. > > not quite get. why ccw is special here? could you elaborate? I added a rcu usage to it in order to fix a race +static inline struct vfio_ccw_private *vfio_ccw_get_priv(struct subchannel *sch) +{ + struct vfio_ccw_private *private; + + rcu_read_lock(); + private = dev_get_drvdata(&sch->dev); + if (private && !vfio_device_try_get(&private->vdev)) + private = NULL; + rcu_read_unlock(); + return private; +} Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [RFC 02/20] vfio: Add device class for /dev/vfio/devices
> From: Jason Gunthorpe > Sent: Monday, November 1, 2021 8:50 PM > > On Fri, Oct 29, 2021 at 09:47:27AM +, Liu, Yi L wrote: > > Hi Jason, > > > > > From: Jason Gunthorpe > > > Sent: Monday, October 25, 2021 8:53 PM > > > > > > On Mon, Oct 25, 2021 at 06:28:09AM +, Liu, Yi L wrote: > > > >thanks for the guiding. will also refer to your vfio_group_cdev > > > > series. > > > > > > > >Need to double confirm here. Not quite following on the kfree. Is > > > >this kfree to free the vfio_device structure? But now the > > > >vfio_device pointer is provided by callers (e.g. vfio-pci). Do > > > >you want to let vfio core allocate the vfio_device struct and > > > >return the pointer to callers? > > > > > > There are several common patterns for this problem, two that would be > > > suitable: > > > > > > - Require each driver to provide a release op inside vfio_device_ops > > > that does the kfree. Have the core provide a struct device release > > > op that calls this one. Keep the kalloc/kfree in the drivers > > > > this way sees to suit the existing vfio registration manner listed > > below. right? > > Not really, most drivers are just doing kfree. The need for release > comes if the drivers are doing more stuff. > > > But device drivers needs to do the kfree in the > > newly added release op instead of doing it on their own (e.g. > > doing kfree in remove). > > Yes > > > > struct ib_device *_ib_alloc_device(size_t size); > > > #define ib_alloc_device(drv_struct, member) > > > \ > > > container_of(_ib_alloc_device(sizeof(struct drv_struct) + > > > \ > > > BUILD_BUG_ON_ZERO(offsetof( > > > \ > > > struct drv_struct, > > > member))),\ > > > struct drv_struct, member) > > > > > > > thanks for the example. If this way, still requires driver to provide > > a release op inside vfio_device_ops. right? > > No, it would optional. It would contain the stuff the driver is doing > before kfree() > > For instance mdev looks like the only driver that cares: > > vfio_uninit_group_dev(&mdev_state->vdev); > kfree(mdev_state->pages); > kfree(mdev_state->vconfig); > kfree(mdev_state); > > pages/vconfig would logically be in a release function I see. So the criteria is: the pointer fields pointing to a memory buffer allocated by the device driver should be logically be free in a release function. right? I can see there are such fields in struct vfio_pci_core_device and mdev_state (both mbochs and mdpy). So we may go with your option #2. Is it? otherwise, needs to add release callback for all the related drivers. struct vfio_pci_core_device { struct vifo_device vdev; ... u8 *pci_config_map; u8 *vconfig; ... }; struct mdev_state { struct vifo_device vdev; ... u8 *vconfig; struct page **pages; ... }; > On the other hand ccw needs to rcu free the vfio_device, so that would > have to be global overhead with this api design. not quite get. why ccw is special here? could you elaborate? Thanks, Yi Liu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC 02/20] vfio: Add device class for /dev/vfio/devices
On Fri, Oct 29, 2021 at 09:47:27AM +, Liu, Yi L wrote: > Hi Jason, > > > From: Jason Gunthorpe > > Sent: Monday, October 25, 2021 8:53 PM > > > > On Mon, Oct 25, 2021 at 06:28:09AM +, Liu, Yi L wrote: > > >thanks for the guiding. will also refer to your vfio_group_cdev series. > > > > > >Need to double confirm here. Not quite following on the kfree. Is > > >this kfree to free the vfio_device structure? But now the > > >vfio_device pointer is provided by callers (e.g. vfio-pci). Do > > >you want to let vfio core allocate the vfio_device struct and > > >return the pointer to callers? > > > > There are several common patterns for this problem, two that would be > > suitable: > > > > - Require each driver to provide a release op inside vfio_device_ops > > that does the kfree. Have the core provide a struct device release > > op that calls this one. Keep the kalloc/kfree in the drivers > > this way sees to suit the existing vfio registration manner listed > below. right? Not really, most drivers are just doing kfree. The need for release comes if the drivers are doing more stuff. > But device drivers needs to do the kfree in the > newly added release op instead of doing it on their own (e.g. > doing kfree in remove). Yes > > struct ib_device *_ib_alloc_device(size_t size); > > #define ib_alloc_device(drv_struct, member) > >\ > > container_of(_ib_alloc_device(sizeof(struct drv_struct) + > >\ > > BUILD_BUG_ON_ZERO(offsetof( > >\ > > struct drv_struct, member))), > >\ > > struct drv_struct, member) > > > > thanks for the example. If this way, still requires driver to provide > a release op inside vfio_device_ops. right? No, it would optional. It would contain the stuff the driver is doing before kfree() For instance mdev looks like the only driver that cares: vfio_uninit_group_dev(&mdev_state->vdev); kfree(mdev_state->pages); kfree(mdev_state->vconfig); kfree(mdev_state); pages/vconfig would logically be in a release function On the other hand ccw needs to rcu free the vfio_device, so that would have to be global overhead with this api design. Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [RFC 02/20] vfio: Add device class for /dev/vfio/devices
Hi Jason, > From: Jason Gunthorpe > Sent: Monday, October 25, 2021 8:53 PM > > On Mon, Oct 25, 2021 at 06:28:09AM +, Liu, Yi L wrote: > >thanks for the guiding. will also refer to your vfio_group_cdev series. > > > >Need to double confirm here. Not quite following on the kfree. Is > >this kfree to free the vfio_device structure? But now the > >vfio_device pointer is provided by callers (e.g. vfio-pci). Do > >you want to let vfio core allocate the vfio_device struct and > >return the pointer to callers? > > There are several common patterns for this problem, two that would be > suitable: > > - Require each driver to provide a release op inside vfio_device_ops > that does the kfree. Have the core provide a struct device release > op that calls this one. Keep the kalloc/kfree in the drivers this way sees to suit the existing vfio registration manner listed below. right? But device drivers needs to do the kfree in the newly added release op instead of doing it on their own (e.g. doing kfree in remove). vfio_init_group_dev() vfio_register_group_dev() vfio_unregister_group_dev() vfio_uninit_group_dev() > - Move the kalloc into the core and have the core provide the kfree > with an optional release callback for anydriver specific cleanup > > This requires some macro to make the memory layout work. RDMA has > a version of this: > > struct ib_device *_ib_alloc_device(size_t size); > #define ib_alloc_device(drv_struct, member) > \ > container_of(_ib_alloc_device(sizeof(struct drv_struct) + > \ > BUILD_BUG_ON_ZERO(offsetof( > \ > struct drv_struct, member))), > \ > struct drv_struct, member) > thanks for the example. If this way, still requires driver to provide a release op inside vfio_device_ops. right? > In part the choice is how many drivers require a release callback > anyhow, if they all do then the first is easier to understand. If only > few or none do then the latter is less code in drivers, and never > exposes the driver to the tricky transition from alloc to refcount > cleanup. I'm not quite sure. But per my understanding, since the vfio_device is expected to be embedded in the device state struct (e.g. vfio_pci_core_device), I guess most of the drivers will require callback to do driver specific cleanup. Seems like option #1 may make sense? Regards, Yi Liu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC 02/20] vfio: Add device class for /dev/vfio/devices
On Mon, Oct 25, 2021 at 06:28:09AM +, Liu, Yi L wrote: >thanks for the guiding. will also refer to your vfio_group_cdev series. > >Need to double confirm here. Not quite following on the kfree. Is >this kfree to free the vfio_device structure? But now the >vfio_device pointer is provided by callers (e.g. vfio-pci). Do >you want to let vfio core allocate the vfio_device struct and >return the pointer to callers? There are several common patterns for this problem, two that would be suitable: - Require each driver to provide a release op inside vfio_device_ops that does the kfree. Have the core provide a struct device release op that calls this one. Keep the kalloc/kfree in the drivers - Move the kalloc into the core and have the core provide the kfree with an optional release callback for anydriver specific cleanup This requires some macro to make the memory layout work. RDMA has a version of this: struct ib_device *_ib_alloc_device(size_t size); #define ib_alloc_device(drv_struct, member)\ container_of(_ib_alloc_device(sizeof(struct drv_struct) + \ BUILD_BUG_ON_ZERO(offsetof( \ struct drv_struct, member))),\ struct drv_struct, member) In part the choice is how many drivers require a release callback anyhow, if they all do then the first is easier to understand. If only few or none do then the latter is less code in drivers, and never exposes the driver to the tricky transition from alloc to refcount cleanup. Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [RFC 02/20] vfio: Add device class for /dev/vfio/devices
> From: Jason Gunthorpe mailto:j...@nvidia.com>> > Sent: Tuesday, September 21, 2021 11:57 PM > > On Sun, Sep 19, 2021 at 02:38:30PM +0800, Liu Yi L wrote: > > This patch introduces a new interface (/dev/vfio/devices/$DEVICE) for > > userspace to directly open a vfio device w/o relying on container/group > > (/dev/vfio/$GROUP). Anything related to group is now hidden behind > > iommufd (more specifically in iommu core by this RFC) in a device-centric > > manner. > > > > In case a device is exposed in both legacy and new interfaces (see next > > patch for how to decide it), this patch also ensures that when the device > > is already opened via one interface then the other one must be blocked. > > > > Signed-off-by: Liu Yi L mailto:yi.l@intel.com>> > > --- > > drivers/vfio/vfio.c | 228 +++--- > - > > include/linux/vfio.h | 2 + > > 2 files changed, 213 insertions(+), 17 deletions(-) > > > +static int vfio_init_device_class(void) > > +{ > > + int ret; > > + > > + mutex_init(&vfio.device_lock); > > + idr_init(&vfio.device_idr); > > + > > + /* /dev/vfio/devices/$DEVICE */ > > + vfio.device_class = class_create(THIS_MODULE, "vfio-device"); > > + if (IS_ERR(vfio.device_class)) > > + return PTR_ERR(vfio.device_class); > > + > > + vfio.device_class->devnode = vfio_device_devnode; > > + > > + ret = alloc_chrdev_region(&vfio.device_devt, 0, MINORMASK + 1, > "vfio-device"); > > + if (ret) > > + goto err_alloc_chrdev; > > + > > + cdev_init(&vfio.device_cdev, &vfio_device_fops); > > + ret = cdev_add(&vfio.device_cdev, vfio.device_devt, MINORMASK + > 1); > > + if (ret) > > + goto err_cdev_add; > > Huh? This is not how cdevs are used. This patch needs rewriting. > > The struct vfio_device should gain a 'struct device' and 'struct cdev' > as non-pointer members > > vfio register path should end up doing cdev_device_add() for each > vfio_device > > vfio_unregister path should do cdev_device_del() > > No idr should be needed, an ida is used to allocate minor numbers > > The struct device release function should trigger a kfree which > requires some reworking of the callers thanks for the guiding. will also refer to your vfio_group_cdev series. Need to double confirm here. Not quite following on the kfree. Is this kfree to free the vfio_device structure? But now the vfio_device pointer is provided by callers (e.g. vfio-pci). Do you want to let vfio core allocate the vfio_device struct and return the pointer to callers? Thanks, Yi Liu > vfio_init_group_dev() should do a device_initialize() > vfio_uninit_group_dev() should do a device_put() > > The opened atomic is aweful. A newly created fd should start in a > state where it has a disabled fops > > The only thing the disabled fops can do is register the device to the > iommu fd. When successfully registered the device gets the normal fops. > > The registration steps should be done under a normal lock inside the > vfio_device. If a vfio_device is already registered then further > registration should fail. > > Getting the device fd via the group fd triggers the same sequence as > above. > > Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [RFC 02/20] vfio: Add device class for /dev/vfio/devices
> From: David Gibson > Sent: Wednesday, September 29, 2021 10:09 AM > > On Sun, Sep 19, 2021 at 02:38:30PM +0800, Liu Yi L wrote: > > This patch introduces a new interface (/dev/vfio/devices/$DEVICE) for > > userspace to directly open a vfio device w/o relying on container/group > > (/dev/vfio/$GROUP). Anything related to group is now hidden behind > > iommufd (more specifically in iommu core by this RFC) in a device-centric > > manner. > > > > In case a device is exposed in both legacy and new interfaces (see next > > patch for how to decide it), this patch also ensures that when the device > > is already opened via one interface then the other one must be blocked. > > > > Signed-off-by: Liu Yi L > [snip] > > > +static bool vfio_device_in_container(struct vfio_device *device) > > +{ > > + return !!(device->group && device->group->container); > > You don't need !! here. && is already a logical operation, so returns > a valid bool. got it. thanks. Regards, Yi Liu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC 02/20] vfio: Add device class for /dev/vfio/devices
On Wed, Sep 29, 2021 at 01:05:21PM -0600, Alex Williamson wrote: > On Wed, 29 Sep 2021 12:08:59 +1000 > David Gibson wrote: > > > On Sun, Sep 19, 2021 at 02:38:30PM +0800, Liu Yi L wrote: > > > This patch introduces a new interface (/dev/vfio/devices/$DEVICE) for > > > userspace to directly open a vfio device w/o relying on container/group > > > (/dev/vfio/$GROUP). Anything related to group is now hidden behind > > > iommufd (more specifically in iommu core by this RFC) in a device-centric > > > manner. > > > > > > In case a device is exposed in both legacy and new interfaces (see next > > > patch for how to decide it), this patch also ensures that when the device > > > is already opened via one interface then the other one must be blocked. > > > > > > Signed-off-by: Liu Yi L > > [snip] > > > > > +static bool vfio_device_in_container(struct vfio_device *device) > > > +{ > > > + return !!(device->group && device->group->container); > > > > You don't need !! here. && is already a logical operation, so returns > > a valid bool. > > > > > +} > > > + > > > static int vfio_device_fops_release(struct inode *inode, struct file > > > *filep) > > > { > > > struct vfio_device *device = filep->private_data; > > > @@ -1560,7 +1691,16 @@ static int vfio_device_fops_release(struct inode > > > *inode, struct file *filep) > > > > > > module_put(device->dev->driver->owner); > > > > > > - vfio_group_try_dissolve_container(device->group); > > > + if (vfio_device_in_container(device)) { > > > + vfio_group_try_dissolve_container(device->group); > > > + } else { > > > + atomic_dec(&device->opened); > > > + if (device->group) { > > > + mutex_lock(&device->group->opened_lock); > > > + device->group->opened--; > > > + mutex_unlock(&device->group->opened_lock); > > > + } > > > + } > > > > > > vfio_device_put(device); > > > > > > @@ -1613,6 +1753,7 @@ static int vfio_device_fops_mmap(struct file > > > *filep, struct vm_area_struct *vma) > > > > > > static const struct file_operations vfio_device_fops = { > > > .owner = THIS_MODULE, > > > + .open = vfio_device_fops_open, > > > .release= vfio_device_fops_release, > > > .read = vfio_device_fops_read, > > > .write = vfio_device_fops_write, > > > @@ -2295,6 +2436,52 @@ static struct miscdevice vfio_dev = { > > > .mode = S_IRUGO | S_IWUGO, > > > }; > > > > > > +static char *vfio_device_devnode(struct device *dev, umode_t *mode) > > > +{ > > > + return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev)); > > > > Others have pointed out some problems with the use of dev_name() > > here. I'll add that I think you'll make things much easier if instead > > of using one huge "devices" subdir, you use a separate subdir for each > > vfio sub-driver (so, one for PCI, one for each type of mdev, one for > > platform, etc.). That should make avoiding name conflicts a lot simpler. > > It seems like this is unnecessary if we use the vfioX naming approach. > Conflicts are trivial to ignore if we don't involve dev_name() and > looking for the correct major:minor chardev in the correct subdirectory > seems like a hassle for userspace. Thanks, Right.. it does sound like a hassle, but AFAICT that's *more* necessary with /dev/vfio/vfioXX than with /dev/vfio/pci/:BB:SS.F, since you have to look up a meaningful name in sysfs to find the right devnode. -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson signature.asc Description: PGP signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC 02/20] vfio: Add device class for /dev/vfio/devices
On Wed, 29 Sep 2021 12:08:59 +1000 David Gibson wrote: > On Sun, Sep 19, 2021 at 02:38:30PM +0800, Liu Yi L wrote: > > This patch introduces a new interface (/dev/vfio/devices/$DEVICE) for > > userspace to directly open a vfio device w/o relying on container/group > > (/dev/vfio/$GROUP). Anything related to group is now hidden behind > > iommufd (more specifically in iommu core by this RFC) in a device-centric > > manner. > > > > In case a device is exposed in both legacy and new interfaces (see next > > patch for how to decide it), this patch also ensures that when the device > > is already opened via one interface then the other one must be blocked. > > > > Signed-off-by: Liu Yi L > [snip] > > > +static bool vfio_device_in_container(struct vfio_device *device) > > +{ > > + return !!(device->group && device->group->container); > > You don't need !! here. && is already a logical operation, so returns > a valid bool. > > > +} > > + > > static int vfio_device_fops_release(struct inode *inode, struct file > > *filep) > > { > > struct vfio_device *device = filep->private_data; > > @@ -1560,7 +1691,16 @@ static int vfio_device_fops_release(struct inode > > *inode, struct file *filep) > > > > module_put(device->dev->driver->owner); > > > > - vfio_group_try_dissolve_container(device->group); > > + if (vfio_device_in_container(device)) { > > + vfio_group_try_dissolve_container(device->group); > > + } else { > > + atomic_dec(&device->opened); > > + if (device->group) { > > + mutex_lock(&device->group->opened_lock); > > + device->group->opened--; > > + mutex_unlock(&device->group->opened_lock); > > + } > > + } > > > > vfio_device_put(device); > > > > @@ -1613,6 +1753,7 @@ static int vfio_device_fops_mmap(struct file *filep, > > struct vm_area_struct *vma) > > > > static const struct file_operations vfio_device_fops = { > > .owner = THIS_MODULE, > > + .open = vfio_device_fops_open, > > .release= vfio_device_fops_release, > > .read = vfio_device_fops_read, > > .write = vfio_device_fops_write, > > @@ -2295,6 +2436,52 @@ static struct miscdevice vfio_dev = { > > .mode = S_IRUGO | S_IWUGO, > > }; > > > > +static char *vfio_device_devnode(struct device *dev, umode_t *mode) > > +{ > > + return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev)); > > Others have pointed out some problems with the use of dev_name() > here. I'll add that I think you'll make things much easier if instead > of using one huge "devices" subdir, you use a separate subdir for each > vfio sub-driver (so, one for PCI, one for each type of mdev, one for > platform, etc.). That should make avoiding name conflicts a lot simpler. It seems like this is unnecessary if we use the vfioX naming approach. Conflicts are trivial to ignore if we don't involve dev_name() and looking for the correct major:minor chardev in the correct subdirectory seems like a hassle for userspace. Thanks, Alex ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC 02/20] vfio: Add device class for /dev/vfio/devices
On Sun, Sep 19, 2021 at 02:38:30PM +0800, Liu Yi L wrote: > This patch introduces a new interface (/dev/vfio/devices/$DEVICE) for > userspace to directly open a vfio device w/o relying on container/group > (/dev/vfio/$GROUP). Anything related to group is now hidden behind > iommufd (more specifically in iommu core by this RFC) in a device-centric > manner. > > In case a device is exposed in both legacy and new interfaces (see next > patch for how to decide it), this patch also ensures that when the device > is already opened via one interface then the other one must be blocked. > > Signed-off-by: Liu Yi L [snip] > +static bool vfio_device_in_container(struct vfio_device *device) > +{ > + return !!(device->group && device->group->container); You don't need !! here. && is already a logical operation, so returns a valid bool. > +} > + > static int vfio_device_fops_release(struct inode *inode, struct file *filep) > { > struct vfio_device *device = filep->private_data; > @@ -1560,7 +1691,16 @@ static int vfio_device_fops_release(struct inode > *inode, struct file *filep) > > module_put(device->dev->driver->owner); > > - vfio_group_try_dissolve_container(device->group); > + if (vfio_device_in_container(device)) { > + vfio_group_try_dissolve_container(device->group); > + } else { > + atomic_dec(&device->opened); > + if (device->group) { > + mutex_lock(&device->group->opened_lock); > + device->group->opened--; > + mutex_unlock(&device->group->opened_lock); > + } > + } > > vfio_device_put(device); > > @@ -1613,6 +1753,7 @@ static int vfio_device_fops_mmap(struct file *filep, > struct vm_area_struct *vma) > > static const struct file_operations vfio_device_fops = { > .owner = THIS_MODULE, > + .open = vfio_device_fops_open, > .release= vfio_device_fops_release, > .read = vfio_device_fops_read, > .write = vfio_device_fops_write, > @@ -2295,6 +2436,52 @@ static struct miscdevice vfio_dev = { > .mode = S_IRUGO | S_IWUGO, > }; > > +static char *vfio_device_devnode(struct device *dev, umode_t *mode) > +{ > + return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev)); Others have pointed out some problems with the use of dev_name() here. I'll add that I think you'll make things much easier if instead of using one huge "devices" subdir, you use a separate subdir for each vfio sub-driver (so, one for PCI, one for each type of mdev, one for platform, etc.). That should make avoiding name conflicts a lot simpler. -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson signature.asc Description: PGP signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [RFC 02/20] vfio: Add device class for /dev/vfio/devices
> From: Jason Gunthorpe > Sent: Wednesday, September 22, 2021 8:51 PM > > On Wed, Sep 22, 2021 at 03:22:42AM +, Tian, Kevin wrote: > > > From: Tian, Kevin > > > Sent: Wednesday, September 22, 2021 9:07 AM > > > > > > > From: Jason Gunthorpe > > > > Sent: Wednesday, September 22, 2021 8:55 AM > > > > > > > > On Tue, Sep 21, 2021 at 11:56:06PM +, Tian, Kevin wrote: > > > > > > The opened atomic is aweful. A newly created fd should start in a > > > > > > state where it has a disabled fops > > > > > > > > > > > > The only thing the disabled fops can do is register the device to > > > > > > the > > > > > > iommu fd. When successfully registered the device gets the normal > fops. > > > > > > > > > > > > The registration steps should be done under a normal lock inside > the > > > > > > vfio_device. If a vfio_device is already registered then further > > > > > > registration should fail. > > > > > > > > > > > > Getting the device fd via the group fd triggers the same sequence as > > > > > > above. > > > > > > > > > > > > > > > > Above works if the group interface is also connected to iommufd, i.e. > > > > > making vfio type1 as a shim. In this case we can use the registration > > > > > status as the exclusive switch. But if we keep vfio type1 separate as > > > > > today, then a new atomic is still necessary. This all depends on how > > > > > we want to deal with vfio type1 and iommufd, and possibly what's > > > > > discussed here just adds another pound to the shim option... > > > > > > > > No, it works the same either way, the group FD path is identical to > > > > the normal FD path, it just triggers some of the state transitions > > > > automatically internally instead of requiring external ioctls. > > > > > > > > The device FDs starts disabled, an internal API binds it to the iommu > > > > via open coding with the group API, and then the rest of the APIs can > > > > be enabled. Same as today. > > > > > > > > After reading your comments on patch08, I may have a clearer picture > > on your suggestion. The key is to handle exclusive access at the binding > > time (based on vdev->iommu_dev). Please see whether below makes > > sense: > > > > Shared sequence: > > > > 1) initialize the device with a parked fops; > > 2) need binding (explicit or implicit) to move away from parked fops; > > 3) switch to normal fops after successful binding; > > > > 1) happens at device probe. > > 1 happens when the cdev is setup with the parked fops, yes. I'd say it > happens at fd open time. > > > for nongroup 2) and 3) are done together in VFIO_DEVICE_GET_IOMMUFD: > > > > - 2) is done by calling .bind_iommufd() callback; > > - 3) could be done within .bind_iommufd(), or via a new callback e.g. > > .finalize_device(). The latter may be preferred for the group interface; > > - Two threads may open the same device simultaneously, with exclusive > > access guaranteed by iommufd_bind_device(); > > - Open() after successful binding is rejected, since normal fops has been > > activated. This is checked upon vdev->iommu_dev; > > Almost, open is always successful, what fails is > VFIO_DEVICE_GET_IOMMUFD (or the group equivilant). The user ends up > with a FD that is useless, cannot reach the ops and thus cannot impact > the device it doesn't own in any way. make sense. I had an wrong impression that once a normal fops is activated it is also visible to other threads. But in concept this fops replacement should be local to each thread thus another thread opening the device always gets a parked fops. > > It is similar to opening a group FD > > > for group 2/3) are done together in VFIO_GROUP_GET_DEVICE_FD: > > > > - 2) is done by open coding bind_iommufd + attach_ioas. Create an > > iommufd_device object and record it to vdev->iommu_dev > > - 3) is done by calling .finalize_device(); > > - open() after a valid vdev->iommu_dev is rejected. this also ensures > > exclusive ownership with the nongroup path. > > Same comment as above, groups should go through the same sequence of > steps, create a FD, attempt to bind, if successuful make the FD > operational. > > The only difference is that failure in these steps does not call > fd_install(). For this reason alone the FD could start out with > operational fops, but it feels like a needless optimization. > > > If Alex also agrees with it, this might be another mini-series to be merged > > (just for group path) before this one. Doing so sort of nullifies the > > existing > > group/container attaching process, where attach_ioas will be skipped and > > now the security context is established when the device is opened. > > I think it is really important to unify DMA exclusion model and lower > to the core iommu code. If there is a reason the exclusion must be > triggered on group fd open then the iommu core code should provide an > API to do that which interworks with the device API iommufd will work. > > But I would start here because it is mu
Re: [RFC 02/20] vfio: Add device class for /dev/vfio/devices
On Wed, Sep 22, 2021 at 03:22:42AM +, Tian, Kevin wrote: > > From: Tian, Kevin > > Sent: Wednesday, September 22, 2021 9:07 AM > > > > > From: Jason Gunthorpe > > > Sent: Wednesday, September 22, 2021 8:55 AM > > > > > > On Tue, Sep 21, 2021 at 11:56:06PM +, Tian, Kevin wrote: > > > > > The opened atomic is aweful. A newly created fd should start in a > > > > > state where it has a disabled fops > > > > > > > > > > The only thing the disabled fops can do is register the device to the > > > > > iommu fd. When successfully registered the device gets the normal > > > > > fops. > > > > > > > > > > The registration steps should be done under a normal lock inside the > > > > > vfio_device. If a vfio_device is already registered then further > > > > > registration should fail. > > > > > > > > > > Getting the device fd via the group fd triggers the same sequence as > > > > > above. > > > > > > > > > > > > > Above works if the group interface is also connected to iommufd, i.e. > > > > making vfio type1 as a shim. In this case we can use the registration > > > > status as the exclusive switch. But if we keep vfio type1 separate as > > > > today, then a new atomic is still necessary. This all depends on how > > > > we want to deal with vfio type1 and iommufd, and possibly what's > > > > discussed here just adds another pound to the shim option... > > > > > > No, it works the same either way, the group FD path is identical to > > > the normal FD path, it just triggers some of the state transitions > > > automatically internally instead of requiring external ioctls. > > > > > > The device FDs starts disabled, an internal API binds it to the iommu > > > via open coding with the group API, and then the rest of the APIs can > > > be enabled. Same as today. > > > > > After reading your comments on patch08, I may have a clearer picture > on your suggestion. The key is to handle exclusive access at the binding > time (based on vdev->iommu_dev). Please see whether below makes > sense: > > Shared sequence: > > 1) initialize the device with a parked fops; > 2) need binding (explicit or implicit) to move away from parked fops; > 3) switch to normal fops after successful binding; > > 1) happens at device probe. 1 happens when the cdev is setup with the parked fops, yes. I'd say it happens at fd open time. > for nongroup 2) and 3) are done together in VFIO_DEVICE_GET_IOMMUFD: > > - 2) is done by calling .bind_iommufd() callback; > - 3) could be done within .bind_iommufd(), or via a new callback e.g. > .finalize_device(). The latter may be preferred for the group interface; > - Two threads may open the same device simultaneously, with exclusive > access guaranteed by iommufd_bind_device(); > - Open() after successful binding is rejected, since normal fops has been > activated. This is checked upon vdev->iommu_dev; Almost, open is always successful, what fails is VFIO_DEVICE_GET_IOMMUFD (or the group equivilant). The user ends up with a FD that is useless, cannot reach the ops and thus cannot impact the device it doesn't own in any way. It is similar to opening a group FD > for group 2/3) are done together in VFIO_GROUP_GET_DEVICE_FD: > > - 2) is done by open coding bind_iommufd + attach_ioas. Create an > iommufd_device object and record it to vdev->iommu_dev > - 3) is done by calling .finalize_device(); > - open() after a valid vdev->iommu_dev is rejected. this also ensures > exclusive ownership with the nongroup path. Same comment as above, groups should go through the same sequence of steps, create a FD, attempt to bind, if successuful make the FD operational. The only difference is that failure in these steps does not call fd_install(). For this reason alone the FD could start out with operational fops, but it feels like a needless optimization. > If Alex also agrees with it, this might be another mini-series to be merged > (just for group path) before this one. Doing so sort of nullifies the existing > group/container attaching process, where attach_ioas will be skipped and > now the security context is established when the device is opened. I think it is really important to unify DMA exclusion model and lower to the core iommu code. If there is a reason the exclusion must be triggered on group fd open then the iommu core code should provide an API to do that which interworks with the device API iommufd will work. But I would start here because it is much simpler to understand.. Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC 02/20] vfio: Add device class for /dev/vfio/devices
On Wed, Sep 22, 2021 at 01:07:11AM +, Tian, Kevin wrote: > > From: Jason Gunthorpe > > Sent: Wednesday, September 22, 2021 8:55 AM > > > > On Tue, Sep 21, 2021 at 11:56:06PM +, Tian, Kevin wrote: > > > > The opened atomic is aweful. A newly created fd should start in a > > > > state where it has a disabled fops > > > > > > > > The only thing the disabled fops can do is register the device to the > > > > iommu fd. When successfully registered the device gets the normal fops. > > > > > > > > The registration steps should be done under a normal lock inside the > > > > vfio_device. If a vfio_device is already registered then further > > > > registration should fail. > > > > > > > > Getting the device fd via the group fd triggers the same sequence as > > > > above. > > > > > > > > > > Above works if the group interface is also connected to iommufd, i.e. > > > making vfio type1 as a shim. In this case we can use the registration > > > status as the exclusive switch. But if we keep vfio type1 separate as > > > today, then a new atomic is still necessary. This all depends on how > > > we want to deal with vfio type1 and iommufd, and possibly what's > > > discussed here just adds another pound to the shim option... > > > > No, it works the same either way, the group FD path is identical to > > the normal FD path, it just triggers some of the state transitions > > automatically internally instead of requiring external ioctls. > > > > The device FDs starts disabled, an internal API binds it to the iommu > > via open coding with the group API, and then the rest of the APIs can > > be enabled. Same as today. > > > > Still a bit confused. if vfio type1 also connects to iommufd, whether > the device is registered can be centrally checked based on whether > an iommu_ctx is recorded. But if type1 doesn't talk to iommufd at > all, don't we still need introduce a new state (calling it 'opened' or > 'registered') to protect the two interfaces? The "new state" is if the fops are pointing at the real fops or the pre-fops, which in turn protects everything. You could imagine this as some state in front of every fop call if you want. > In this case what is the point of keeping device FD disabled even > for the group path? I have a feeling when you go through the APIs it will make sense to have some symmetry here. eg creating a device FD should have basically the same flow no matter what triggers it, not confusing special cases where the group code skips steps Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [RFC 02/20] vfio: Add device class for /dev/vfio/devices
> From: Tian, Kevin > Sent: Wednesday, September 22, 2021 9:07 AM > > > From: Jason Gunthorpe > > Sent: Wednesday, September 22, 2021 8:55 AM > > > > On Tue, Sep 21, 2021 at 11:56:06PM +, Tian, Kevin wrote: > > > > The opened atomic is aweful. A newly created fd should start in a > > > > state where it has a disabled fops > > > > > > > > The only thing the disabled fops can do is register the device to the > > > > iommu fd. When successfully registered the device gets the normal fops. > > > > > > > > The registration steps should be done under a normal lock inside the > > > > vfio_device. If a vfio_device is already registered then further > > > > registration should fail. > > > > > > > > Getting the device fd via the group fd triggers the same sequence as > > > > above. > > > > > > > > > > Above works if the group interface is also connected to iommufd, i.e. > > > making vfio type1 as a shim. In this case we can use the registration > > > status as the exclusive switch. But if we keep vfio type1 separate as > > > today, then a new atomic is still necessary. This all depends on how > > > we want to deal with vfio type1 and iommufd, and possibly what's > > > discussed here just adds another pound to the shim option... > > > > No, it works the same either way, the group FD path is identical to > > the normal FD path, it just triggers some of the state transitions > > automatically internally instead of requiring external ioctls. > > > > The device FDs starts disabled, an internal API binds it to the iommu > > via open coding with the group API, and then the rest of the APIs can > > be enabled. Same as today. > > After reading your comments on patch08, I may have a clearer picture on your suggestion. The key is to handle exclusive access at the binding time (based on vdev->iommu_dev). Please see whether below makes sense: Shared sequence: 1) initialize the device with a parked fops; 2) need binding (explicit or implicit) to move away from parked fops; 3) switch to normal fops after successful binding; 1) happens at device probe. for nongroup 2) and 3) are done together in VFIO_DEVICE_GET_IOMMUFD: - 2) is done by calling .bind_iommufd() callback; - 3) could be done within .bind_iommufd(), or via a new callback e.g. .finalize_device(). The latter may be preferred for the group interface; - Two threads may open the same device simultaneously, with exclusive access guaranteed by iommufd_bind_device(); - Open() after successful binding is rejected, since normal fops has been activated. This is checked upon vdev->iommu_dev; for group 2/3) are done together in VFIO_GROUP_GET_DEVICE_FD: - 2) is done by open coding bind_iommufd + attach_ioas. Create an iommufd_device object and record it to vdev->iommu_dev - 3) is done by calling .finalize_device(); - open() after a valid vdev->iommu_dev is rejected. this also ensures exclusive ownership with the nongroup path. If Alex also agrees with it, this might be another mini-series to be merged (just for group path) before this one. Doing so sort of nullifies the existing group/container attaching process, where attach_ioas will be skipped and now the security context is established when the device is opened. Thanks Kevin ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [RFC 02/20] vfio: Add device class for /dev/vfio/devices
> From: Jason Gunthorpe > Sent: Wednesday, September 22, 2021 8:55 AM > > On Tue, Sep 21, 2021 at 11:56:06PM +, Tian, Kevin wrote: > > > The opened atomic is aweful. A newly created fd should start in a > > > state where it has a disabled fops > > > > > > The only thing the disabled fops can do is register the device to the > > > iommu fd. When successfully registered the device gets the normal fops. > > > > > > The registration steps should be done under a normal lock inside the > > > vfio_device. If a vfio_device is already registered then further > > > registration should fail. > > > > > > Getting the device fd via the group fd triggers the same sequence as > > > above. > > > > > > > Above works if the group interface is also connected to iommufd, i.e. > > making vfio type1 as a shim. In this case we can use the registration > > status as the exclusive switch. But if we keep vfio type1 separate as > > today, then a new atomic is still necessary. This all depends on how > > we want to deal with vfio type1 and iommufd, and possibly what's > > discussed here just adds another pound to the shim option... > > No, it works the same either way, the group FD path is identical to > the normal FD path, it just triggers some of the state transitions > automatically internally instead of requiring external ioctls. > > The device FDs starts disabled, an internal API binds it to the iommu > via open coding with the group API, and then the rest of the APIs can > be enabled. Same as today. > Still a bit confused. if vfio type1 also connects to iommufd, whether the device is registered can be centrally checked based on whether an iommu_ctx is recorded. But if type1 doesn't talk to iommufd at all, don't we still need introduce a new state (calling it 'opened' or 'registered') to protect the two interfaces? In this case what is the point of keeping device FD disabled even for the group path? ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [RFC 02/20] vfio: Add device class for /dev/vfio/devices
> From: Alex Williamson > Sent: Wednesday, September 22, 2021 3:56 AM > > On Sun, 19 Sep 2021 14:38:30 +0800 > Liu Yi L wrote: > > > This patch introduces a new interface (/dev/vfio/devices/$DEVICE) for > > userspace to directly open a vfio device w/o relying on container/group > > (/dev/vfio/$GROUP). Anything related to group is now hidden behind > > iommufd (more specifically in iommu core by this RFC) in a device-centric > > manner. > > > > In case a device is exposed in both legacy and new interfaces (see next > > patch for how to decide it), this patch also ensures that when the device > > is already opened via one interface then the other one must be blocked. > > > > Signed-off-by: Liu Yi L > > --- > > drivers/vfio/vfio.c | 228 +++ > > include/linux/vfio.h | 2 + > > 2 files changed, 213 insertions(+), 17 deletions(-) > > > > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c > > index 02cc51ce6891..84436d7abedd 100644 > > --- a/drivers/vfio/vfio.c > > +++ b/drivers/vfio/vfio.c > ... > > @@ -2295,6 +2436,52 @@ static struct miscdevice vfio_dev = { > > .mode = S_IRUGO | S_IWUGO, > > }; > > > > +static char *vfio_device_devnode(struct device *dev, umode_t *mode) > > +{ > > + return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev)); > > +} > > dev_name() doesn't provide us with any uniqueness guarantees, so this > could potentially generate naming conflicts. The similar scheme for > devices within an iommu group appends an instance number if a conflict > occurs, but that solution doesn't work here where the name isn't just a > link to the actual device. Devices within an iommu group are also > likely associated within a bus_type, so the potential for conflict is > pretty negligible, that's not the case as vfio is adopted for new > device types. Thanks, > This is also our concern. Thanks for confirming it. Appreciate if you can help think out some better alternative to deal with it. Thanks Kevin ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC 02/20] vfio: Add device class for /dev/vfio/devices
On Tue, Sep 21, 2021 at 11:56:06PM +, Tian, Kevin wrote: > > The opened atomic is aweful. A newly created fd should start in a > > state where it has a disabled fops > > > > The only thing the disabled fops can do is register the device to the > > iommu fd. When successfully registered the device gets the normal fops. > > > > The registration steps should be done under a normal lock inside the > > vfio_device. If a vfio_device is already registered then further > > registration should fail. > > > > Getting the device fd via the group fd triggers the same sequence as > > above. > > > > Above works if the group interface is also connected to iommufd, i.e. > making vfio type1 as a shim. In this case we can use the registration > status as the exclusive switch. But if we keep vfio type1 separate as > today, then a new atomic is still necessary. This all depends on how > we want to deal with vfio type1 and iommufd, and possibly what's > discussed here just adds another pound to the shim option... No, it works the same either way, the group FD path is identical to the normal FD path, it just triggers some of the state transitions automatically internally instead of requiring external ioctls. The device FDs starts disabled, an internal API binds it to the iommu via open coding with the group API, and then the rest of the APIs can be enabled. Same as today. Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [RFC 02/20] vfio: Add device class for /dev/vfio/devices
> From: Jason Gunthorpe > Sent: Tuesday, September 21, 2021 11:57 PM > > On Sun, Sep 19, 2021 at 02:38:30PM +0800, Liu Yi L wrote: > > This patch introduces a new interface (/dev/vfio/devices/$DEVICE) for > > userspace to directly open a vfio device w/o relying on container/group > > (/dev/vfio/$GROUP). Anything related to group is now hidden behind > > iommufd (more specifically in iommu core by this RFC) in a device-centric > > manner. > > > > In case a device is exposed in both legacy and new interfaces (see next > > patch for how to decide it), this patch also ensures that when the device > > is already opened via one interface then the other one must be blocked. > > > > Signed-off-by: Liu Yi L > > --- > > drivers/vfio/vfio.c | 228 +++ > > include/linux/vfio.h | 2 + > > 2 files changed, 213 insertions(+), 17 deletions(-) > > > +static int vfio_init_device_class(void) > > +{ > > + int ret; > > + > > + mutex_init(&vfio.device_lock); > > + idr_init(&vfio.device_idr); > > + > > + /* /dev/vfio/devices/$DEVICE */ > > + vfio.device_class = class_create(THIS_MODULE, "vfio-device"); > > + if (IS_ERR(vfio.device_class)) > > + return PTR_ERR(vfio.device_class); > > + > > + vfio.device_class->devnode = vfio_device_devnode; > > + > > + ret = alloc_chrdev_region(&vfio.device_devt, 0, MINORMASK + 1, > "vfio-device"); > > + if (ret) > > + goto err_alloc_chrdev; > > + > > + cdev_init(&vfio.device_cdev, &vfio_device_fops); > > + ret = cdev_add(&vfio.device_cdev, vfio.device_devt, MINORMASK + > 1); > > + if (ret) > > + goto err_cdev_add; > > Huh? This is not how cdevs are used. This patch needs rewriting. > > The struct vfio_device should gain a 'struct device' and 'struct cdev' > as non-pointer members > > vfio register path should end up doing cdev_device_add() for each > vfio_device > > vfio_unregister path should do cdev_device_del() > > No idr should be needed, an ida is used to allocate minor numbers > > The struct device release function should trigger a kfree which > requires some reworking of the callers > > vfio_init_group_dev() should do a device_initialize() > vfio_uninit_group_dev() should do a device_put() All above are good suggestions! > > The opened atomic is aweful. A newly created fd should start in a > state where it has a disabled fops > > The only thing the disabled fops can do is register the device to the > iommu fd. When successfully registered the device gets the normal fops. > > The registration steps should be done under a normal lock inside the > vfio_device. If a vfio_device is already registered then further > registration should fail. > > Getting the device fd via the group fd triggers the same sequence as > above. > Above works if the group interface is also connected to iommufd, i.e. making vfio type1 as a shim. In this case we can use the registration status as the exclusive switch. But if we keep vfio type1 separate as today, then a new atomic is still necessary. This all depends on how we want to deal with vfio type1 and iommufd, and possibly what's discussed here just adds another pound to the shim option... Thanks Kevin ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC 02/20] vfio: Add device class for /dev/vfio/devices
On Sun, 19 Sep 2021 14:38:30 +0800 Liu Yi L wrote: > This patch introduces a new interface (/dev/vfio/devices/$DEVICE) for > userspace to directly open a vfio device w/o relying on container/group > (/dev/vfio/$GROUP). Anything related to group is now hidden behind > iommufd (more specifically in iommu core by this RFC) in a device-centric > manner. > > In case a device is exposed in both legacy and new interfaces (see next > patch for how to decide it), this patch also ensures that when the device > is already opened via one interface then the other one must be blocked. > > Signed-off-by: Liu Yi L > --- > drivers/vfio/vfio.c | 228 +++ > include/linux/vfio.h | 2 + > 2 files changed, 213 insertions(+), 17 deletions(-) > > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c > index 02cc51ce6891..84436d7abedd 100644 > --- a/drivers/vfio/vfio.c > +++ b/drivers/vfio/vfio.c ... > @@ -2295,6 +2436,52 @@ static struct miscdevice vfio_dev = { > .mode = S_IRUGO | S_IWUGO, > }; > > +static char *vfio_device_devnode(struct device *dev, umode_t *mode) > +{ > + return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev)); > +} dev_name() doesn't provide us with any uniqueness guarantees, so this could potentially generate naming conflicts. The similar scheme for devices within an iommu group appends an instance number if a conflict occurs, but that solution doesn't work here where the name isn't just a link to the actual device. Devices within an iommu group are also likely associated within a bus_type, so the potential for conflict is pretty negligible, that's not the case as vfio is adopted for new device types. Thanks, Alex ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC 02/20] vfio: Add device class for /dev/vfio/devices
On Sun, Sep 19, 2021 at 02:38:30PM +0800, Liu Yi L wrote: > This patch introduces a new interface (/dev/vfio/devices/$DEVICE) for > userspace to directly open a vfio device w/o relying on container/group > (/dev/vfio/$GROUP). Anything related to group is now hidden behind > iommufd (more specifically in iommu core by this RFC) in a device-centric > manner. > > In case a device is exposed in both legacy and new interfaces (see next > patch for how to decide it), this patch also ensures that when the device > is already opened via one interface then the other one must be blocked. > > Signed-off-by: Liu Yi L > --- > drivers/vfio/vfio.c | 228 +++ > include/linux/vfio.h | 2 + > 2 files changed, 213 insertions(+), 17 deletions(-) > +static int vfio_init_device_class(void) > +{ > + int ret; > + > + mutex_init(&vfio.device_lock); > + idr_init(&vfio.device_idr); > + > + /* /dev/vfio/devices/$DEVICE */ > + vfio.device_class = class_create(THIS_MODULE, "vfio-device"); > + if (IS_ERR(vfio.device_class)) > + return PTR_ERR(vfio.device_class); > + > + vfio.device_class->devnode = vfio_device_devnode; > + > + ret = alloc_chrdev_region(&vfio.device_devt, 0, MINORMASK + 1, > "vfio-device"); > + if (ret) > + goto err_alloc_chrdev; > + > + cdev_init(&vfio.device_cdev, &vfio_device_fops); > + ret = cdev_add(&vfio.device_cdev, vfio.device_devt, MINORMASK + 1); > + if (ret) > + goto err_cdev_add; Huh? This is not how cdevs are used. This patch needs rewriting. The struct vfio_device should gain a 'struct device' and 'struct cdev' as non-pointer members vfio register path should end up doing cdev_device_add() for each vfio_device vfio_unregister path should do cdev_device_del() No idr should be needed, an ida is used to allocate minor numbers The struct device release function should trigger a kfree which requires some reworking of the callers vfio_init_group_dev() should do a device_initialize() vfio_uninit_group_dev() should do a device_put() The opened atomic is aweful. A newly created fd should start in a state where it has a disabled fops The only thing the disabled fops can do is register the device to the iommu fd. When successfully registered the device gets the normal fops. The registration steps should be done under a normal lock inside the vfio_device. If a vfio_device is already registered then further registration should fail. Getting the device fd via the group fd triggers the same sequence as above. Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu