RE: [RFC 02/20] vfio: Add device class for /dev/vfio/devices

2021-11-11 Thread Liu, Yi L
Hi Jason,

> From: Jason Gunthorpe 
> Sent: Wednesday, November 3, 2021 9:26 PM
> 
> On Tue, Nov 02, 2021 at 09:53:29AM +, Liu, Yi L wrote:
> 
> > >   vfio_uninit_group_dev(_state->vdev);
> > >   kfree(mdev_state->pages);
> > >   kfree(mdev_state->vconfig);
> > >   kfree(mdev_state);
> > >
> > > pages/vconfig would logically be in a release function
> >
> > I see. So the criteria is: the pointer fields pointing to a memory buffer
> > allocated by the device driver should be logically be free in a release
> > function. right?
> 
> Often yes, that is usually a good idea
> 
> >I can see there are such fields in struct vfio_pci_core_device
> > and mdev_state (both mbochs and mdpy). So we may go with your option
> #2.
> > Is it? otherwise, needs to add release callback for all the related drivers.
> 
> Yes, that is the approx trade off
> 
> > > On the other hand ccw needs to rcu free the vfio_device, so that would
> > > have to be global overhead with this api design.
> >
> > not quite get. why ccw is special here? could you elaborate?
> 
> I added a rcu usage to it in order to fix a race
> 
> +static inline struct vfio_ccw_private *vfio_ccw_get_priv(struct subchannel
> *sch)
> +{
> +   struct vfio_ccw_private *private;
> +
> +   rcu_read_lock();
> +   private = dev_get_drvdata(>dev);
> +   if (private && !vfio_device_try_get(>vdev))
> +   private = NULL;
> +   rcu_read_unlock();
> +   return private;
> +}

you are right. After checking your ccw patch, the private free triggered
by vfio_ccw_free_private() should use kfree_rcu(). So it is not quite
same with other vfio_device users which only need kfree() to free the
vfio_device. So how can I address the difference when moving the vfio_device
alloc/free into vfio core? any suggestion?

@@ -164,14 +173,14 @@ static void vfio_ccw_free_private(struct vfio_ccw_private 
*private)
kmem_cache_free(vfio_ccw_io_region, private->io_region);
kfree(private->cp.guest_cp);
mutex_destroy(>io_mutex);
-   kfree(private);
+   vfio_uninit_group_dev(>vdev);
+   kfree_rcu(private, rcu);
 }

https://lore.kernel.org/kvm/10-v3-57c1502c62fd+2190-ccw_mdev_...@nvidia.com/

Regards,
Yi Liu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC 02/20] vfio: Add device class for /dev/vfio/devices

2021-11-03 Thread Jason Gunthorpe via iommu
On Tue, Nov 02, 2021 at 09:53:29AM +, Liu, Yi L wrote:

> > vfio_uninit_group_dev(_state->vdev);
> > kfree(mdev_state->pages);
> > kfree(mdev_state->vconfig);
> > kfree(mdev_state);
> > 
> > pages/vconfig would logically be in a release function
> 
> I see. So the criteria is: the pointer fields pointing to a memory buffer
> allocated by the device driver should be logically be free in a release
> function. right? 

Often yes, that is usually a good idea

>I can see there are such fields in struct vfio_pci_core_device
> and mdev_state (both mbochs and mdpy). So we may go with your option #2.
> Is it? otherwise, needs to add release callback for all the related drivers.

Yes, that is the approx trade off

> > On the other hand ccw needs to rcu free the vfio_device, so that would
> > have to be global overhead with this api design.
> 
> not quite get. why ccw is special here? could you elaborate?

I added a rcu usage to it in order to fix a race

+static inline struct vfio_ccw_private *vfio_ccw_get_priv(struct subchannel 
*sch)
+{
+   struct vfio_ccw_private *private;
+
+   rcu_read_lock();
+   private = dev_get_drvdata(>dev);
+   if (private && !vfio_device_try_get(>vdev))
+   private = NULL;
+   rcu_read_unlock();
+   return private;
+}
 

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [RFC 02/20] vfio: Add device class for /dev/vfio/devices

2021-11-02 Thread Liu, Yi L
> From: Jason Gunthorpe 
> Sent: Monday, November 1, 2021 8:50 PM
> 
> On Fri, Oct 29, 2021 at 09:47:27AM +, Liu, Yi L wrote:
> > Hi Jason,
> >
> > > From: Jason Gunthorpe 
> > > Sent: Monday, October 25, 2021 8:53 PM
> > >
> > > On Mon, Oct 25, 2021 at 06:28:09AM +, Liu, Yi L wrote:
> > > >thanks for the guiding. will also refer to your vfio_group_cdev 
> > > > series.
> > > >
> > > >Need to double confirm here. Not quite following on the kfree. Is
> > > >this kfree to free the vfio_device structure? But now the
> > > >vfio_device pointer is provided by callers (e.g. vfio-pci). Do
> > > >you want to let vfio core allocate the vfio_device struct and
> > > >return the pointer to callers?
> > >
> > > There are several common patterns for this problem, two that would be
> > > suitable:
> > >
> > > - Require each driver to provide a release op inside vfio_device_ops
> > >   that does the kfree. Have the core provide a struct device release
> > >   op that calls this one. Keep the kalloc/kfree in the drivers
> >
> > this way sees to suit the existing vfio registration manner listed
> > below. right?
> 
> Not really, most drivers are just doing kfree. The need for release
> comes if the drivers are doing more stuff.
> 
> > But device drivers needs to do the kfree in the
> > newly added release op instead of doing it on their own (e.g.
> > doing kfree in remove).
> 
> Yes
> 
> > > struct ib_device *_ib_alloc_device(size_t size);
> > > #define ib_alloc_device(drv_struct, member)   
> > >  \
> > > container_of(_ib_alloc_device(sizeof(struct drv_struct) + 
> > >  \
> > >   BUILD_BUG_ON_ZERO(offsetof( 
> > >  \
> > >   struct drv_struct, 
> > > member))),\
> > >  struct drv_struct, member)
> > >
> >
> > thanks for the example. If this way, still requires driver to provide
> > a release op inside vfio_device_ops. right?
> 
> No, it would optional. It would contain the stuff the driver is doing
> before kfree()
> 
> For instance mdev looks like the only driver that cares:
> 
>   vfio_uninit_group_dev(_state->vdev);
>   kfree(mdev_state->pages);
>   kfree(mdev_state->vconfig);
>   kfree(mdev_state);
> 
> pages/vconfig would logically be in a release function

I see. So the criteria is: the pointer fields pointing to a memory buffer
allocated by the device driver should be logically be free in a release
function. right? I can see there are such fields in struct vfio_pci_core_device
and mdev_state (both mbochs and mdpy). So we may go with your option #2.
Is it? otherwise, needs to add release callback for all the related drivers.

struct vfio_pci_core_device {
struct vifo_device vdev;
...
u8 *pci_config_map;
u8 *vconfig;
...
};

struct mdev_state {
struct vifo_device vdev;
...
u8 *vconfig;
struct page **pages;
...
};

> On the other hand ccw needs to rcu free the vfio_device, so that would
> have to be global overhead with this api design.

not quite get. why ccw is special here? could you elaborate?

Thanks,
Yi Liu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC 02/20] vfio: Add device class for /dev/vfio/devices

2021-11-01 Thread Jason Gunthorpe via iommu
On Fri, Oct 29, 2021 at 09:47:27AM +, Liu, Yi L wrote:
> Hi Jason,
> 
> > From: Jason Gunthorpe 
> > Sent: Monday, October 25, 2021 8:53 PM
> > 
> > On Mon, Oct 25, 2021 at 06:28:09AM +, Liu, Yi L wrote:
> > >thanks for the guiding. will also refer to your vfio_group_cdev series.
> > >
> > >Need to double confirm here. Not quite following on the kfree. Is
> > >this kfree to free the vfio_device structure? But now the
> > >vfio_device pointer is provided by callers (e.g. vfio-pci). Do
> > >you want to let vfio core allocate the vfio_device struct and
> > >return the pointer to callers?
> > 
> > There are several common patterns for this problem, two that would be
> > suitable:
> > 
> > - Require each driver to provide a release op inside vfio_device_ops
> >   that does the kfree. Have the core provide a struct device release
> >   op that calls this one. Keep the kalloc/kfree in the drivers
> 
> this way sees to suit the existing vfio registration manner listed
> below. right? 

Not really, most drivers are just doing kfree. The need for release
comes if the drivers are doing more stuff.

> But device drivers needs to do the kfree in the
> newly added release op instead of doing it on their own (e.g.
> doing kfree in remove).

Yes

> > struct ib_device *_ib_alloc_device(size_t size);
> > #define ib_alloc_device(drv_struct, member) 
> >\
> > container_of(_ib_alloc_device(sizeof(struct drv_struct) +   
> >\
> >   BUILD_BUG_ON_ZERO(offsetof(   
> >\
> >   struct drv_struct, member))), 
> >\
> >  struct drv_struct, member)
> > 
> 
> thanks for the example. If this way, still requires driver to provide
> a release op inside vfio_device_ops. right?

No, it would optional. It would contain the stuff the driver is doing
before kfree()

For instance mdev looks like the only driver that cares:

vfio_uninit_group_dev(_state->vdev);
kfree(mdev_state->pages);
kfree(mdev_state->vconfig);
kfree(mdev_state);

pages/vconfig would logically be in a release function

On the other hand ccw needs to rcu free the vfio_device, so that would
have to be global overhead with this api design.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [RFC 02/20] vfio: Add device class for /dev/vfio/devices

2021-10-29 Thread Liu, Yi L
Hi Jason,

> From: Jason Gunthorpe 
> Sent: Monday, October 25, 2021 8:53 PM
> 
> On Mon, Oct 25, 2021 at 06:28:09AM +, Liu, Yi L wrote:
> >thanks for the guiding. will also refer to your vfio_group_cdev series.
> >
> >Need to double confirm here. Not quite following on the kfree. Is
> >this kfree to free the vfio_device structure? But now the
> >vfio_device pointer is provided by callers (e.g. vfio-pci). Do
> >you want to let vfio core allocate the vfio_device struct and
> >return the pointer to callers?
> 
> There are several common patterns for this problem, two that would be
> suitable:
> 
> - Require each driver to provide a release op inside vfio_device_ops
>   that does the kfree. Have the core provide a struct device release
>   op that calls this one. Keep the kalloc/kfree in the drivers

this way sees to suit the existing vfio registration manner listed
below. right? But device drivers needs to do the kfree in the
newly added release op instead of doing it on their own (e.g.
doing kfree in remove).

vfio_init_group_dev()
vfio_register_group_dev()
vfio_unregister_group_dev()
vfio_uninit_group_dev()

> - Move the kalloc into the core and have the core provide the kfree
>   with an optional release callback for anydriver specific cleanup
> 
>   This requires some macro to make the memory layout work. RDMA has
>   a version of this:
> 
> struct ib_device *_ib_alloc_device(size_t size);
> #define ib_alloc_device(drv_struct, member)   
>  \
> container_of(_ib_alloc_device(sizeof(struct drv_struct) + 
>  \
>   BUILD_BUG_ON_ZERO(offsetof( 
>  \
>   struct drv_struct, member))),   
>  \
>  struct drv_struct, member)
> 

thanks for the example. If this way, still requires driver to provide
a release op inside vfio_device_ops. right?

> In part the choice is how many drivers require a release callback
> anyhow, if they all do then the first is easier to understand. If only
> few or none do then the latter is less code in drivers, and never
> exposes the driver to the tricky transition from alloc to refcount
> cleanup.

I'm not quite sure. But per my understanding, since the vfio_device
is expected to be embedded in the device state struct (e.g.
vfio_pci_core_device), I guess most of the drivers will require callback
to do driver specific cleanup. Seems like option #1 may make sense?

Regards,
Yi Liu

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC 02/20] vfio: Add device class for /dev/vfio/devices

2021-10-25 Thread Jason Gunthorpe via iommu
On Mon, Oct 25, 2021 at 06:28:09AM +, Liu, Yi L wrote:
>thanks for the guiding. will also refer to your vfio_group_cdev series.
> 
>Need to double confirm here. Not quite following on the kfree. Is
>this kfree to free the vfio_device structure? But now the
>vfio_device pointer is provided by callers (e.g. vfio-pci). Do
>you want to let vfio core allocate the vfio_device struct and
>return the pointer to callers?

There are several common patterns for this problem, two that would be
suitable:

- Require each driver to provide a release op inside vfio_device_ops
  that does the kfree. Have the core provide a struct device release
  op that calls this one. Keep the kalloc/kfree in the drivers

- Move the kalloc into the core and have the core provide the kfree
  with an optional release callback for anydriver specific cleanup

  This requires some macro to make the memory layout work. RDMA has
  a version of this:

struct ib_device *_ib_alloc_device(size_t size);
#define ib_alloc_device(drv_struct, member)\
container_of(_ib_alloc_device(sizeof(struct drv_struct) +  \
  BUILD_BUG_ON_ZERO(offsetof(  \
  struct drv_struct, member))),\
 struct drv_struct, member)


In part the choice is how many drivers require a release callback
anyhow, if they all do then the first is easier to understand. If only
few or none do then the latter is less code in drivers, and never
exposes the driver to the tricky transition from alloc to refcount
cleanup.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [RFC 02/20] vfio: Add device class for /dev/vfio/devices

2021-10-25 Thread Liu, Yi L
> From: Jason Gunthorpe mailto:j...@nvidia.com>>

> Sent: Tuesday, September 21, 2021 11:57 PM

>

> On Sun, Sep 19, 2021 at 02:38:30PM +0800, Liu Yi L wrote:

> > This patch introduces a new interface (/dev/vfio/devices/$DEVICE) for

> > userspace to directly open a vfio device w/o relying on container/group

> > (/dev/vfio/$GROUP). Anything related to group is now hidden behind

> > iommufd (more specifically in iommu core by this RFC) in a device-centric

> > manner.

> >

> > In case a device is exposed in both legacy and new interfaces (see next

> > patch for how to decide it), this patch also ensures that when the device

> > is already opened via one interface then the other one must be blocked.

> >

> > Signed-off-by: Liu Yi L mailto:yi.l@intel.com>>

> > ---

> >  drivers/vfio/vfio.c  | 228 +++---

> -

> >  include/linux/vfio.h |   2 +

> >  2 files changed, 213 insertions(+), 17 deletions(-)

>

> > +static int vfio_init_device_class(void)

> > +{

> > +  int ret;

> > +

> > +  mutex_init(_lock);

> > +  idr_init(_idr);

> > +

> > +  /* /dev/vfio/devices/$DEVICE */

> > +  vfio.device_class = class_create(THIS_MODULE, "vfio-device");

> > +  if (IS_ERR(vfio.device_class))

> > + return PTR_ERR(vfio.device_class);

> > +

> > +  vfio.device_class->devnode = vfio_device_devnode;

> > +

> > +  ret = alloc_chrdev_region(_devt, 0, MINORMASK + 1,

> "vfio-device");

> > +  if (ret)

> > + goto err_alloc_chrdev;

> > +

> > +  cdev_init(_cdev, _device_fops);

> > +  ret = cdev_add(_cdev, vfio.device_devt, MINORMASK +

> 1);

> > +  if (ret)

> > + goto err_cdev_add;

>

> Huh? This is not how cdevs are used. This patch needs rewriting.

>

> The struct vfio_device should gain a 'struct device' and 'struct cdev'

> as non-pointer members

>

> vfio register path should end up doing cdev_device_add() for each

> vfio_device

>

> vfio_unregister path should do cdev_device_del()

>

> No idr should be needed, an ida is used to allocate minor numbers

>

> The struct device release function should trigger a kfree which

> requires some reworking of the callers



thanks for the guiding. will also refer to your vfio_group_cdev series.



Need to double confirm here. Not quite following on the kfree. Is this

kfree to free the vfio_device structure? But now the vfio_device pointer

is provided by callers (e.g. vfio-pci). Do you want to let vfio core

allocate the vfio_device struct and return the pointer to callers?



Thanks,

Yi Liu



> vfio_init_group_dev() should do a device_initialize()

> vfio_uninit_group_dev() should do a device_put()

>

> The opened atomic is aweful. A newly created fd should start in a

> state where it has a disabled fops

>

> The only thing the disabled fops can do is register the device to the

> iommu fd. When successfully registered the device gets the normal fops.

>

> The registration steps should be done under a normal lock inside the

> vfio_device. If a vfio_device is already registered then further

> registration should fail.

>

> Getting the device fd via the group fd triggers the same sequence as

> above.

>

> Jason

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

RE: [RFC 02/20] vfio: Add device class for /dev/vfio/devices

2021-10-20 Thread Liu, Yi L
> From: David Gibson 
> Sent: Wednesday, September 29, 2021 10:09 AM
> 
> On Sun, Sep 19, 2021 at 02:38:30PM +0800, Liu Yi L wrote:
> > This patch introduces a new interface (/dev/vfio/devices/$DEVICE) for
> > userspace to directly open a vfio device w/o relying on container/group
> > (/dev/vfio/$GROUP). Anything related to group is now hidden behind
> > iommufd (more specifically in iommu core by this RFC) in a device-centric
> > manner.
> >
> > In case a device is exposed in both legacy and new interfaces (see next
> > patch for how to decide it), this patch also ensures that when the device
> > is already opened via one interface then the other one must be blocked.
> >
> > Signed-off-by: Liu Yi L 
> [snip]
> 
> > +static bool vfio_device_in_container(struct vfio_device *device)
> > +{
> > +   return !!(device->group && device->group->container);
> 
> You don't need !! here.  && is already a logical operation, so returns
> a valid bool.

got it. thanks.

Regards,
Yi Liu

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC 02/20] vfio: Add device class for /dev/vfio/devices

2021-09-29 Thread David Gibson
On Wed, Sep 29, 2021 at 01:05:21PM -0600, Alex Williamson wrote:
> On Wed, 29 Sep 2021 12:08:59 +1000
> David Gibson  wrote:
> 
> > On Sun, Sep 19, 2021 at 02:38:30PM +0800, Liu Yi L wrote:
> > > This patch introduces a new interface (/dev/vfio/devices/$DEVICE) for
> > > userspace to directly open a vfio device w/o relying on container/group
> > > (/dev/vfio/$GROUP). Anything related to group is now hidden behind
> > > iommufd (more specifically in iommu core by this RFC) in a device-centric
> > > manner.
> > > 
> > > In case a device is exposed in both legacy and new interfaces (see next
> > > patch for how to decide it), this patch also ensures that when the device
> > > is already opened via one interface then the other one must be blocked.
> > > 
> > > Signed-off-by: Liu Yi L   
> > [snip]
> > 
> > > +static bool vfio_device_in_container(struct vfio_device *device)
> > > +{
> > > + return !!(device->group && device->group->container);  
> > 
> > You don't need !! here.  && is already a logical operation, so returns
> > a valid bool.
> > 
> > > +}
> > > +
> > >  static int vfio_device_fops_release(struct inode *inode, struct file 
> > > *filep)
> > >  {
> > >   struct vfio_device *device = filep->private_data;
> > > @@ -1560,7 +1691,16 @@ static int vfio_device_fops_release(struct inode 
> > > *inode, struct file *filep)
> > >  
> > >   module_put(device->dev->driver->owner);
> > >  
> > > - vfio_group_try_dissolve_container(device->group);
> > > + if (vfio_device_in_container(device)) {
> > > + vfio_group_try_dissolve_container(device->group);
> > > + } else {
> > > + atomic_dec(>opened);
> > > + if (device->group) {
> > > + mutex_lock(>group->opened_lock);
> > > + device->group->opened--;
> > > + mutex_unlock(>group->opened_lock);
> > > + }
> > > + }
> > >  
> > >   vfio_device_put(device);
> > >  
> > > @@ -1613,6 +1753,7 @@ static int vfio_device_fops_mmap(struct file 
> > > *filep, struct vm_area_struct *vma)
> > >  
> > >  static const struct file_operations vfio_device_fops = {
> > >   .owner  = THIS_MODULE,
> > > + .open   = vfio_device_fops_open,
> > >   .release= vfio_device_fops_release,
> > >   .read   = vfio_device_fops_read,
> > >   .write  = vfio_device_fops_write,
> > > @@ -2295,6 +2436,52 @@ static struct miscdevice vfio_dev = {
> > >   .mode = S_IRUGO | S_IWUGO,
> > >  };
> > >  
> > > +static char *vfio_device_devnode(struct device *dev, umode_t *mode)
> > > +{
> > > + return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev));  
> > 
> > Others have pointed out some problems with the use of dev_name()
> > here.  I'll add that I think you'll make things much easier if instead
> > of using one huge "devices" subdir, you use a separate subdir for each
> > vfio sub-driver (so, one for PCI, one for each type of mdev, one for
> > platform, etc.).  That should make avoiding name conflicts a lot simpler.
> 
> It seems like this is unnecessary if we use the vfioX naming approach.
> Conflicts are trivial to ignore if we don't involve dev_name() and
> looking for the correct major:minor chardev in the correct subdirectory
> seems like a hassle for userspace.  Thanks,

Right.. it does sound like a hassle, but AFAICT that's *more*
necessary with /dev/vfio/vfioXX than with /dev/vfio/pci/:BB:SS.F,
since you have to look up a meaningful name in sysfs to find the right
devnode.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [RFC 02/20] vfio: Add device class for /dev/vfio/devices

2021-09-29 Thread Alex Williamson
On Wed, 29 Sep 2021 12:08:59 +1000
David Gibson  wrote:

> On Sun, Sep 19, 2021 at 02:38:30PM +0800, Liu Yi L wrote:
> > This patch introduces a new interface (/dev/vfio/devices/$DEVICE) for
> > userspace to directly open a vfio device w/o relying on container/group
> > (/dev/vfio/$GROUP). Anything related to group is now hidden behind
> > iommufd (more specifically in iommu core by this RFC) in a device-centric
> > manner.
> > 
> > In case a device is exposed in both legacy and new interfaces (see next
> > patch for how to decide it), this patch also ensures that when the device
> > is already opened via one interface then the other one must be blocked.
> > 
> > Signed-off-by: Liu Yi L   
> [snip]
> 
> > +static bool vfio_device_in_container(struct vfio_device *device)
> > +{
> > +   return !!(device->group && device->group->container);  
> 
> You don't need !! here.  && is already a logical operation, so returns
> a valid bool.
> 
> > +}
> > +
> >  static int vfio_device_fops_release(struct inode *inode, struct file 
> > *filep)
> >  {
> > struct vfio_device *device = filep->private_data;
> > @@ -1560,7 +1691,16 @@ static int vfio_device_fops_release(struct inode 
> > *inode, struct file *filep)
> >  
> > module_put(device->dev->driver->owner);
> >  
> > -   vfio_group_try_dissolve_container(device->group);
> > +   if (vfio_device_in_container(device)) {
> > +   vfio_group_try_dissolve_container(device->group);
> > +   } else {
> > +   atomic_dec(>opened);
> > +   if (device->group) {
> > +   mutex_lock(>group->opened_lock);
> > +   device->group->opened--;
> > +   mutex_unlock(>group->opened_lock);
> > +   }
> > +   }
> >  
> > vfio_device_put(device);
> >  
> > @@ -1613,6 +1753,7 @@ static int vfio_device_fops_mmap(struct file *filep, 
> > struct vm_area_struct *vma)
> >  
> >  static const struct file_operations vfio_device_fops = {
> > .owner  = THIS_MODULE,
> > +   .open   = vfio_device_fops_open,
> > .release= vfio_device_fops_release,
> > .read   = vfio_device_fops_read,
> > .write  = vfio_device_fops_write,
> > @@ -2295,6 +2436,52 @@ static struct miscdevice vfio_dev = {
> > .mode = S_IRUGO | S_IWUGO,
> >  };
> >  
> > +static char *vfio_device_devnode(struct device *dev, umode_t *mode)
> > +{
> > +   return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev));  
> 
> Others have pointed out some problems with the use of dev_name()
> here.  I'll add that I think you'll make things much easier if instead
> of using one huge "devices" subdir, you use a separate subdir for each
> vfio sub-driver (so, one for PCI, one for each type of mdev, one for
> platform, etc.).  That should make avoiding name conflicts a lot simpler.

It seems like this is unnecessary if we use the vfioX naming approach.
Conflicts are trivial to ignore if we don't involve dev_name() and
looking for the correct major:minor chardev in the correct subdirectory
seems like a hassle for userspace.  Thanks,

Alex

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC 02/20] vfio: Add device class for /dev/vfio/devices

2021-09-28 Thread David Gibson
On Sun, Sep 19, 2021 at 02:38:30PM +0800, Liu Yi L wrote:
> This patch introduces a new interface (/dev/vfio/devices/$DEVICE) for
> userspace to directly open a vfio device w/o relying on container/group
> (/dev/vfio/$GROUP). Anything related to group is now hidden behind
> iommufd (more specifically in iommu core by this RFC) in a device-centric
> manner.
> 
> In case a device is exposed in both legacy and new interfaces (see next
> patch for how to decide it), this patch also ensures that when the device
> is already opened via one interface then the other one must be blocked.
> 
> Signed-off-by: Liu Yi L 
[snip]

> +static bool vfio_device_in_container(struct vfio_device *device)
> +{
> + return !!(device->group && device->group->container);

You don't need !! here.  && is already a logical operation, so returns
a valid bool.

> +}
> +
>  static int vfio_device_fops_release(struct inode *inode, struct file *filep)
>  {
>   struct vfio_device *device = filep->private_data;
> @@ -1560,7 +1691,16 @@ static int vfio_device_fops_release(struct inode 
> *inode, struct file *filep)
>  
>   module_put(device->dev->driver->owner);
>  
> - vfio_group_try_dissolve_container(device->group);
> + if (vfio_device_in_container(device)) {
> + vfio_group_try_dissolve_container(device->group);
> + } else {
> + atomic_dec(>opened);
> + if (device->group) {
> + mutex_lock(>group->opened_lock);
> + device->group->opened--;
> + mutex_unlock(>group->opened_lock);
> + }
> + }
>  
>   vfio_device_put(device);
>  
> @@ -1613,6 +1753,7 @@ static int vfio_device_fops_mmap(struct file *filep, 
> struct vm_area_struct *vma)
>  
>  static const struct file_operations vfio_device_fops = {
>   .owner  = THIS_MODULE,
> + .open   = vfio_device_fops_open,
>   .release= vfio_device_fops_release,
>   .read   = vfio_device_fops_read,
>   .write  = vfio_device_fops_write,
> @@ -2295,6 +2436,52 @@ static struct miscdevice vfio_dev = {
>   .mode = S_IRUGO | S_IWUGO,
>  };
>  
> +static char *vfio_device_devnode(struct device *dev, umode_t *mode)
> +{
> + return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev));

Others have pointed out some problems with the use of dev_name()
here.  I'll add that I think you'll make things much easier if instead
of using one huge "devices" subdir, you use a separate subdir for each
vfio sub-driver (so, one for PCI, one for each type of mdev, one for
platform, etc.).  That should make avoiding name conflicts a lot simpler.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

RE: [RFC 02/20] vfio: Add device class for /dev/vfio/devices

2021-09-22 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Wednesday, September 22, 2021 8:51 PM
> 
> On Wed, Sep 22, 2021 at 03:22:42AM +, Tian, Kevin wrote:
> > > From: Tian, Kevin
> > > Sent: Wednesday, September 22, 2021 9:07 AM
> > >
> > > > From: Jason Gunthorpe 
> > > > Sent: Wednesday, September 22, 2021 8:55 AM
> > > >
> > > > On Tue, Sep 21, 2021 at 11:56:06PM +, Tian, Kevin wrote:
> > > > > > The opened atomic is aweful. A newly created fd should start in a
> > > > > > state where it has a disabled fops
> > > > > >
> > > > > > The only thing the disabled fops can do is register the device to 
> > > > > > the
> > > > > > iommu fd. When successfully registered the device gets the normal
> fops.
> > > > > >
> > > > > > The registration steps should be done under a normal lock inside
> the
> > > > > > vfio_device. If a vfio_device is already registered then further
> > > > > > registration should fail.
> > > > > >
> > > > > > Getting the device fd via the group fd triggers the same sequence as
> > > > > > above.
> > > > > >
> > > > >
> > > > > Above works if the group interface is also connected to iommufd, i.e.
> > > > > making vfio type1 as a shim. In this case we can use the registration
> > > > > status as the exclusive switch. But if we keep vfio type1 separate as
> > > > > today, then a new atomic is still necessary. This all depends on how
> > > > > we want to deal with vfio type1 and iommufd, and possibly what's
> > > > > discussed here just adds another pound to the shim option...
> > > >
> > > > No, it works the same either way, the group FD path is identical to
> > > > the normal FD path, it just triggers some of the state transitions
> > > > automatically internally instead of requiring external ioctls.
> > > >
> > > > The device FDs starts disabled, an internal API binds it to the iommu
> > > > via open coding with the group API, and then the rest of the APIs can
> > > > be enabled. Same as today.
> > > >
> >
> > After reading your comments on patch08, I may have a clearer picture
> > on your suggestion. The key is to handle exclusive access at the binding
> > time (based on vdev->iommu_dev). Please see whether below makes
> > sense:
> >
> > Shared sequence:
> >
> > 1)  initialize the device with a parked fops;
> > 2)  need binding (explicit or implicit) to move away from parked fops;
> > 3)  switch to normal fops after successful binding;
> >
> > 1) happens at device probe.
> 
> 1 happens when the cdev is setup with the parked fops, yes. I'd say it
> happens at fd open time.
> 
> > for nongroup 2) and 3) are done together in VFIO_DEVICE_GET_IOMMUFD:
> >
> >   - 2) is done by calling .bind_iommufd() callback;
> >   - 3) could be done within .bind_iommufd(), or via a new callback e.g.
> > .finalize_device(). The latter may be preferred for the group interface;
> >   - Two threads may open the same device simultaneously, with exclusive
> > access guaranteed by iommufd_bind_device();
> >   - Open() after successful binding is rejected, since normal fops has been
> > activated. This is checked upon vdev->iommu_dev;
> 
> Almost, open is always successful, what fails is
> VFIO_DEVICE_GET_IOMMUFD (or the group equivilant). The user ends up
> with a FD that is useless, cannot reach the ops and thus cannot impact
> the device it doesn't own in any way.

make sense. I had an wrong impression that once a normal fops is
activated it is also visible to other threads. But in concept this fops
replacement should be local to each thread thus another thread
opening the device always gets a parked fops.

> 
> It is similar to opening a group FD
> 
> > for group 2/3) are done together in VFIO_GROUP_GET_DEVICE_FD:
> >
> >   - 2) is done by open coding bind_iommufd + attach_ioas. Create an
> > iommufd_device object and record it to vdev->iommu_dev
> >   - 3) is done by calling .finalize_device();
> >   - open() after a valid vdev->iommu_dev is rejected. this also ensures
> > exclusive ownership with the nongroup path.
> 
> Same comment as above, groups should go through the same sequence of
> steps, create a FD, attempt to bind, if successuful make the FD
> operational.
> 
> The only difference is that failure in these steps does not call
> fd_install(). For this reason alone the FD could start out with
> operational fops, but it feels like a needless optimization.
> 
> > If Alex also agrees with it, this might be another mini-series to be merged
> > (just for group path) before this one. Doing so sort of nullifies the 
> > existing
> > group/container attaching process, where attach_ioas will be skipped and
> > now the security context is established when the device is opened.
> 
> I think it is really important to unify DMA exclusion model and lower
> to the core iommu code. If there is a reason the exclusion must be
> triggered on group fd open then the iommu core code should provide an
> API to do that which interworks with the device API iommufd will work.
> 
> But I would start here because it is 

Re: [RFC 02/20] vfio: Add device class for /dev/vfio/devices

2021-09-22 Thread Jason Gunthorpe via iommu
On Wed, Sep 22, 2021 at 03:22:42AM +, Tian, Kevin wrote:
> > From: Tian, Kevin
> > Sent: Wednesday, September 22, 2021 9:07 AM
> > 
> > > From: Jason Gunthorpe 
> > > Sent: Wednesday, September 22, 2021 8:55 AM
> > >
> > > On Tue, Sep 21, 2021 at 11:56:06PM +, Tian, Kevin wrote:
> > > > > The opened atomic is aweful. A newly created fd should start in a
> > > > > state where it has a disabled fops
> > > > >
> > > > > The only thing the disabled fops can do is register the device to the
> > > > > iommu fd. When successfully registered the device gets the normal 
> > > > > fops.
> > > > >
> > > > > The registration steps should be done under a normal lock inside the
> > > > > vfio_device. If a vfio_device is already registered then further
> > > > > registration should fail.
> > > > >
> > > > > Getting the device fd via the group fd triggers the same sequence as
> > > > > above.
> > > > >
> > > >
> > > > Above works if the group interface is also connected to iommufd, i.e.
> > > > making vfio type1 as a shim. In this case we can use the registration
> > > > status as the exclusive switch. But if we keep vfio type1 separate as
> > > > today, then a new atomic is still necessary. This all depends on how
> > > > we want to deal with vfio type1 and iommufd, and possibly what's
> > > > discussed here just adds another pound to the shim option...
> > >
> > > No, it works the same either way, the group FD path is identical to
> > > the normal FD path, it just triggers some of the state transitions
> > > automatically internally instead of requiring external ioctls.
> > >
> > > The device FDs starts disabled, an internal API binds it to the iommu
> > > via open coding with the group API, and then the rest of the APIs can
> > > be enabled. Same as today.
> > >
> 
> After reading your comments on patch08, I may have a clearer picture
> on your suggestion. The key is to handle exclusive access at the binding
> time (based on vdev->iommu_dev). Please see whether below makes 
> sense:
> 
> Shared sequence:
> 
> 1)  initialize the device with a parked fops;
> 2)  need binding (explicit or implicit) to move away from parked fops;
> 3)  switch to normal fops after successful binding;
> 
> 1) happens at device probe.

1 happens when the cdev is setup with the parked fops, yes. I'd say it
happens at fd open time.

> for nongroup 2) and 3) are done together in VFIO_DEVICE_GET_IOMMUFD:
> 
>   - 2) is done by calling .bind_iommufd() callback;
>   - 3) could be done within .bind_iommufd(), or via a new callback e.g.
> .finalize_device(). The latter may be preferred for the group interface;
>   - Two threads may open the same device simultaneously, with exclusive 
> access guaranteed by iommufd_bind_device();
>   - Open() after successful binding is rejected, since normal fops has been
> activated. This is checked upon vdev->iommu_dev;

Almost, open is always successful, what fails is
VFIO_DEVICE_GET_IOMMUFD (or the group equivilant). The user ends up
with a FD that is useless, cannot reach the ops and thus cannot impact
the device it doesn't own in any way.

It is similar to opening a group FD

> for group 2/3) are done together in VFIO_GROUP_GET_DEVICE_FD:
> 
>   - 2) is done by open coding bind_iommufd + attach_ioas. Create an 
> iommufd_device object and record it to vdev->iommu_dev
>   - 3) is done by calling .finalize_device();
>   - open() after a valid vdev->iommu_dev is rejected. this also ensures
> exclusive ownership with the nongroup path.

Same comment as above, groups should go through the same sequence of
steps, create a FD, attempt to bind, if successuful make the FD
operational.

The only difference is that failure in these steps does not call
fd_install(). For this reason alone the FD could start out with
operational fops, but it feels like a needless optimization.

> If Alex also agrees with it, this might be another mini-series to be merged
> (just for group path) before this one. Doing so sort of nullifies the existing
> group/container attaching process, where attach_ioas will be skipped and
> now the security context is established when the device is opened.

I think it is really important to unify DMA exclusion model and lower
to the core iommu code. If there is a reason the exclusion must be
triggered on group fd open then the iommu core code should provide an
API to do that which interworks with the device API iommufd will work.

But I would start here because it is much simpler to understand..

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC 02/20] vfio: Add device class for /dev/vfio/devices

2021-09-22 Thread Jason Gunthorpe via iommu
On Wed, Sep 22, 2021 at 01:07:11AM +, Tian, Kevin wrote:
> > From: Jason Gunthorpe 
> > Sent: Wednesday, September 22, 2021 8:55 AM
> > 
> > On Tue, Sep 21, 2021 at 11:56:06PM +, Tian, Kevin wrote:
> > > > The opened atomic is aweful. A newly created fd should start in a
> > > > state where it has a disabled fops
> > > >
> > > > The only thing the disabled fops can do is register the device to the
> > > > iommu fd. When successfully registered the device gets the normal fops.
> > > >
> > > > The registration steps should be done under a normal lock inside the
> > > > vfio_device. If a vfio_device is already registered then further
> > > > registration should fail.
> > > >
> > > > Getting the device fd via the group fd triggers the same sequence as
> > > > above.
> > > >
> > >
> > > Above works if the group interface is also connected to iommufd, i.e.
> > > making vfio type1 as a shim. In this case we can use the registration
> > > status as the exclusive switch. But if we keep vfio type1 separate as
> > > today, then a new atomic is still necessary. This all depends on how
> > > we want to deal with vfio type1 and iommufd, and possibly what's
> > > discussed here just adds another pound to the shim option...
> > 
> > No, it works the same either way, the group FD path is identical to
> > the normal FD path, it just triggers some of the state transitions
> > automatically internally instead of requiring external ioctls.
> > 
> > The device FDs starts disabled, an internal API binds it to the iommu
> > via open coding with the group API, and then the rest of the APIs can
> > be enabled. Same as today.
> > 
> 
> Still a bit confused. if vfio type1 also connects to iommufd, whether 
> the device is registered can be centrally checked based on whether
> an iommu_ctx is recorded. But if type1 doesn't talk to iommufd at
> all, don't we still need introduce a new state (calling it 'opened' or
> 'registered') to protect the two interfaces? 

The "new state" is if the fops are pointing at the real fops or the
pre-fops, which in turn protects everything. You could imagine this as
some state in front of every fop call if you want.

> In this case what is the point of keeping device FD disabled even
> for the group path?

I have a feeling when you go through the APIs it will make sense to
have some symmetry here.

eg creating a device FD should have basically the same flow no matter
what triggers it, not confusing special cases where the group code
skips steps

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [RFC 02/20] vfio: Add device class for /dev/vfio/devices

2021-09-21 Thread Tian, Kevin
> From: Tian, Kevin
> Sent: Wednesday, September 22, 2021 9:07 AM
> 
> > From: Jason Gunthorpe 
> > Sent: Wednesday, September 22, 2021 8:55 AM
> >
> > On Tue, Sep 21, 2021 at 11:56:06PM +, Tian, Kevin wrote:
> > > > The opened atomic is aweful. A newly created fd should start in a
> > > > state where it has a disabled fops
> > > >
> > > > The only thing the disabled fops can do is register the device to the
> > > > iommu fd. When successfully registered the device gets the normal fops.
> > > >
> > > > The registration steps should be done under a normal lock inside the
> > > > vfio_device. If a vfio_device is already registered then further
> > > > registration should fail.
> > > >
> > > > Getting the device fd via the group fd triggers the same sequence as
> > > > above.
> > > >
> > >
> > > Above works if the group interface is also connected to iommufd, i.e.
> > > making vfio type1 as a shim. In this case we can use the registration
> > > status as the exclusive switch. But if we keep vfio type1 separate as
> > > today, then a new atomic is still necessary. This all depends on how
> > > we want to deal with vfio type1 and iommufd, and possibly what's
> > > discussed here just adds another pound to the shim option...
> >
> > No, it works the same either way, the group FD path is identical to
> > the normal FD path, it just triggers some of the state transitions
> > automatically internally instead of requiring external ioctls.
> >
> > The device FDs starts disabled, an internal API binds it to the iommu
> > via open coding with the group API, and then the rest of the APIs can
> > be enabled. Same as today.
> >

After reading your comments on patch08, I may have a clearer picture
on your suggestion. The key is to handle exclusive access at the binding
time (based on vdev->iommu_dev). Please see whether below makes 
sense:

Shared sequence:

1)  initialize the device with a parked fops;
2)  need binding (explicit or implicit) to move away from parked fops;
3)  switch to normal fops after successful binding;

1) happens at device probe.

for nongroup 2) and 3) are done together in VFIO_DEVICE_GET_IOMMUFD:

  - 2) is done by calling .bind_iommufd() callback;
  - 3) could be done within .bind_iommufd(), or via a new callback e.g.
.finalize_device(). The latter may be preferred for the group interface;
  - Two threads may open the same device simultaneously, with exclusive 
access guaranteed by iommufd_bind_device();
  - Open() after successful binding is rejected, since normal fops has been
activated. This is checked upon vdev->iommu_dev;

for group 2/3) are done together in VFIO_GROUP_GET_DEVICE_FD:

  - 2) is done by open coding bind_iommufd + attach_ioas. Create an 
iommufd_device object and record it to vdev->iommu_dev
  - 3) is done by calling .finalize_device();
  - open() after a valid vdev->iommu_dev is rejected. this also ensures
exclusive ownership with the nongroup path.

If Alex also agrees with it, this might be another mini-series to be merged
(just for group path) before this one. Doing so sort of nullifies the existing
group/container attaching process, where attach_ioas will be skipped and
now the security context is established when the device is opened.

Thanks
Kevin
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [RFC 02/20] vfio: Add device class for /dev/vfio/devices

2021-09-21 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Wednesday, September 22, 2021 8:55 AM
> 
> On Tue, Sep 21, 2021 at 11:56:06PM +, Tian, Kevin wrote:
> > > The opened atomic is aweful. A newly created fd should start in a
> > > state where it has a disabled fops
> > >
> > > The only thing the disabled fops can do is register the device to the
> > > iommu fd. When successfully registered the device gets the normal fops.
> > >
> > > The registration steps should be done under a normal lock inside the
> > > vfio_device. If a vfio_device is already registered then further
> > > registration should fail.
> > >
> > > Getting the device fd via the group fd triggers the same sequence as
> > > above.
> > >
> >
> > Above works if the group interface is also connected to iommufd, i.e.
> > making vfio type1 as a shim. In this case we can use the registration
> > status as the exclusive switch. But if we keep vfio type1 separate as
> > today, then a new atomic is still necessary. This all depends on how
> > we want to deal with vfio type1 and iommufd, and possibly what's
> > discussed here just adds another pound to the shim option...
> 
> No, it works the same either way, the group FD path is identical to
> the normal FD path, it just triggers some of the state transitions
> automatically internally instead of requiring external ioctls.
> 
> The device FDs starts disabled, an internal API binds it to the iommu
> via open coding with the group API, and then the rest of the APIs can
> be enabled. Same as today.
> 

Still a bit confused. if vfio type1 also connects to iommufd, whether 
the device is registered can be centrally checked based on whether
an iommu_ctx is recorded. But if type1 doesn't talk to iommufd at
all, don't we still need introduce a new state (calling it 'opened' or
'registered') to protect the two interfaces? In this case what is the
point of keeping device FD disabled even for the group path?
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [RFC 02/20] vfio: Add device class for /dev/vfio/devices

2021-09-21 Thread Tian, Kevin
> From: Alex Williamson 
> Sent: Wednesday, September 22, 2021 3:56 AM
> 
> On Sun, 19 Sep 2021 14:38:30 +0800
> Liu Yi L  wrote:
> 
> > This patch introduces a new interface (/dev/vfio/devices/$DEVICE) for
> > userspace to directly open a vfio device w/o relying on container/group
> > (/dev/vfio/$GROUP). Anything related to group is now hidden behind
> > iommufd (more specifically in iommu core by this RFC) in a device-centric
> > manner.
> >
> > In case a device is exposed in both legacy and new interfaces (see next
> > patch for how to decide it), this patch also ensures that when the device
> > is already opened via one interface then the other one must be blocked.
> >
> > Signed-off-by: Liu Yi L 
> > ---
> >  drivers/vfio/vfio.c  | 228 +++
> >  include/linux/vfio.h |   2 +
> >  2 files changed, 213 insertions(+), 17 deletions(-)
> >
> > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> > index 02cc51ce6891..84436d7abedd 100644
> > --- a/drivers/vfio/vfio.c
> > +++ b/drivers/vfio/vfio.c
> ...
> > @@ -2295,6 +2436,52 @@ static struct miscdevice vfio_dev = {
> > .mode = S_IRUGO | S_IWUGO,
> >  };
> >
> > +static char *vfio_device_devnode(struct device *dev, umode_t *mode)
> > +{
> > +   return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev));
> > +}
> 
> dev_name() doesn't provide us with any uniqueness guarantees, so this
> could potentially generate naming conflicts.  The similar scheme for
> devices within an iommu group appends an instance number if a conflict
> occurs, but that solution doesn't work here where the name isn't just a
> link to the actual device.  Devices within an iommu group are also
> likely associated within a bus_type, so the potential for conflict is
> pretty negligible, that's not the case as vfio is adopted for new
> device types.  Thanks,
> 

This is also our concern. Thanks for confirming it. Appreciate if you
can help think out some better alternative to deal with it.

Thanks
Kevin
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC 02/20] vfio: Add device class for /dev/vfio/devices

2021-09-21 Thread Jason Gunthorpe via iommu
On Tue, Sep 21, 2021 at 11:56:06PM +, Tian, Kevin wrote:
> > The opened atomic is aweful. A newly created fd should start in a
> > state where it has a disabled fops
> > 
> > The only thing the disabled fops can do is register the device to the
> > iommu fd. When successfully registered the device gets the normal fops.
> > 
> > The registration steps should be done under a normal lock inside the
> > vfio_device. If a vfio_device is already registered then further
> > registration should fail.
> > 
> > Getting the device fd via the group fd triggers the same sequence as
> > above.
> > 
> 
> Above works if the group interface is also connected to iommufd, i.e.
> making vfio type1 as a shim. In this case we can use the registration
> status as the exclusive switch. But if we keep vfio type1 separate as
> today, then a new atomic is still necessary. This all depends on how
> we want to deal with vfio type1 and iommufd, and possibly what's
> discussed here just adds another pound to the shim option...

No, it works the same either way, the group FD path is identical to
the normal FD path, it just triggers some of the state transitions
automatically internally instead of requiring external ioctls.

The device FDs starts disabled, an internal API binds it to the iommu
via open coding with the group API, and then the rest of the APIs can
be enabled. Same as today.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [RFC 02/20] vfio: Add device class for /dev/vfio/devices

2021-09-21 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Tuesday, September 21, 2021 11:57 PM
> 
> On Sun, Sep 19, 2021 at 02:38:30PM +0800, Liu Yi L wrote:
> > This patch introduces a new interface (/dev/vfio/devices/$DEVICE) for
> > userspace to directly open a vfio device w/o relying on container/group
> > (/dev/vfio/$GROUP). Anything related to group is now hidden behind
> > iommufd (more specifically in iommu core by this RFC) in a device-centric
> > manner.
> >
> > In case a device is exposed in both legacy and new interfaces (see next
> > patch for how to decide it), this patch also ensures that when the device
> > is already opened via one interface then the other one must be blocked.
> >
> > Signed-off-by: Liu Yi L 
> > ---
> >  drivers/vfio/vfio.c  | 228 +++
> >  include/linux/vfio.h |   2 +
> >  2 files changed, 213 insertions(+), 17 deletions(-)
> 
> > +static int vfio_init_device_class(void)
> > +{
> > +   int ret;
> > +
> > +   mutex_init(_lock);
> > +   idr_init(_idr);
> > +
> > +   /* /dev/vfio/devices/$DEVICE */
> > +   vfio.device_class = class_create(THIS_MODULE, "vfio-device");
> > +   if (IS_ERR(vfio.device_class))
> > +   return PTR_ERR(vfio.device_class);
> > +
> > +   vfio.device_class->devnode = vfio_device_devnode;
> > +
> > +   ret = alloc_chrdev_region(_devt, 0, MINORMASK + 1,
> "vfio-device");
> > +   if (ret)
> > +   goto err_alloc_chrdev;
> > +
> > +   cdev_init(_cdev, _device_fops);
> > +   ret = cdev_add(_cdev, vfio.device_devt, MINORMASK +
> 1);
> > +   if (ret)
> > +   goto err_cdev_add;
> 
> Huh? This is not how cdevs are used. This patch needs rewriting.
> 
> The struct vfio_device should gain a 'struct device' and 'struct cdev'
> as non-pointer members
> 
> vfio register path should end up doing cdev_device_add() for each
> vfio_device
> 
> vfio_unregister path should do cdev_device_del()
> 
> No idr should be needed, an ida is used to allocate minor numbers
> 
> The struct device release function should trigger a kfree which
> requires some reworking of the callers
> 
> vfio_init_group_dev() should do a device_initialize()
> vfio_uninit_group_dev() should do a device_put()

All above are good suggestions!

> 
> The opened atomic is aweful. A newly created fd should start in a
> state where it has a disabled fops
> 
> The only thing the disabled fops can do is register the device to the
> iommu fd. When successfully registered the device gets the normal fops.
> 
> The registration steps should be done under a normal lock inside the
> vfio_device. If a vfio_device is already registered then further
> registration should fail.
> 
> Getting the device fd via the group fd triggers the same sequence as
> above.
> 

Above works if the group interface is also connected to iommufd, i.e.
making vfio type1 as a shim. In this case we can use the registration
status as the exclusive switch. But if we keep vfio type1 separate as
today, then a new atomic is still necessary. This all depends on how
we want to deal with vfio type1 and iommufd, and possibly what's
discussed here just adds another pound to the shim option...

Thanks
Kevin
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC 02/20] vfio: Add device class for /dev/vfio/devices

2021-09-21 Thread Alex Williamson
On Sun, 19 Sep 2021 14:38:30 +0800
Liu Yi L  wrote:

> This patch introduces a new interface (/dev/vfio/devices/$DEVICE) for
> userspace to directly open a vfio device w/o relying on container/group
> (/dev/vfio/$GROUP). Anything related to group is now hidden behind
> iommufd (more specifically in iommu core by this RFC) in a device-centric
> manner.
> 
> In case a device is exposed in both legacy and new interfaces (see next
> patch for how to decide it), this patch also ensures that when the device
> is already opened via one interface then the other one must be blocked.
> 
> Signed-off-by: Liu Yi L 
> ---
>  drivers/vfio/vfio.c  | 228 +++
>  include/linux/vfio.h |   2 +
>  2 files changed, 213 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index 02cc51ce6891..84436d7abedd 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
...
> @@ -2295,6 +2436,52 @@ static struct miscdevice vfio_dev = {
>   .mode = S_IRUGO | S_IWUGO,
>  };
>  
> +static char *vfio_device_devnode(struct device *dev, umode_t *mode)
> +{
> + return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev));
> +}

dev_name() doesn't provide us with any uniqueness guarantees, so this
could potentially generate naming conflicts.  The similar scheme for
devices within an iommu group appends an instance number if a conflict
occurs, but that solution doesn't work here where the name isn't just a
link to the actual device.  Devices within an iommu group are also
likely associated within a bus_type, so the potential for conflict is
pretty negligible, that's not the case as vfio is adopted for new
device types.  Thanks,

Alex

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC 02/20] vfio: Add device class for /dev/vfio/devices

2021-09-21 Thread Jason Gunthorpe via iommu
On Sun, Sep 19, 2021 at 02:38:30PM +0800, Liu Yi L wrote:
> This patch introduces a new interface (/dev/vfio/devices/$DEVICE) for
> userspace to directly open a vfio device w/o relying on container/group
> (/dev/vfio/$GROUP). Anything related to group is now hidden behind
> iommufd (more specifically in iommu core by this RFC) in a device-centric
> manner.
> 
> In case a device is exposed in both legacy and new interfaces (see next
> patch for how to decide it), this patch also ensures that when the device
> is already opened via one interface then the other one must be blocked.
> 
> Signed-off-by: Liu Yi L 
> ---
>  drivers/vfio/vfio.c  | 228 +++
>  include/linux/vfio.h |   2 +
>  2 files changed, 213 insertions(+), 17 deletions(-)

> +static int vfio_init_device_class(void)
> +{
> + int ret;
> +
> + mutex_init(_lock);
> + idr_init(_idr);
> +
> + /* /dev/vfio/devices/$DEVICE */
> + vfio.device_class = class_create(THIS_MODULE, "vfio-device");
> + if (IS_ERR(vfio.device_class))
> + return PTR_ERR(vfio.device_class);
> +
> + vfio.device_class->devnode = vfio_device_devnode;
> +
> + ret = alloc_chrdev_region(_devt, 0, MINORMASK + 1, 
> "vfio-device");
> + if (ret)
> + goto err_alloc_chrdev;
> +
> + cdev_init(_cdev, _device_fops);
> + ret = cdev_add(_cdev, vfio.device_devt, MINORMASK + 1);
> + if (ret)
> + goto err_cdev_add;

Huh? This is not how cdevs are used. This patch needs rewriting.

The struct vfio_device should gain a 'struct device' and 'struct cdev'
as non-pointer members

vfio register path should end up doing cdev_device_add() for each
vfio_device

vfio_unregister path should do cdev_device_del()

No idr should be needed, an ida is used to allocate minor numbers

The struct device release function should trigger a kfree which
requires some reworking of the callers

vfio_init_group_dev() should do a device_initialize()
vfio_uninit_group_dev() should do a device_put()

The opened atomic is aweful. A newly created fd should start in a
state where it has a disabled fops

The only thing the disabled fops can do is register the device to the
iommu fd. When successfully registered the device gets the normal fops.

The registration steps should be done under a normal lock inside the
vfio_device. If a vfio_device is already registered then further
registration should fail.

Getting the device fd via the group fd triggers the same sequence as
above.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu