from:"Liu, Yi L"

Re: [Intel-gfx] [PATCH v13 21/22] vfio: Compile vfio_group infrastructure optionally

2023-07-17 Thread Liu, Yi L

> From: Alex Williamson 
> Sent: Tuesday, July 18, 2023 2:46 AM
> 
> On Mon, 17 Jul 2023 08:08:59 +0000
> "Liu, Yi L"  wrote:
> 
> > > From: Liu, Yi L 
> > > Sent: Monday, July 17, 2023 2:36 PM
> > >
> > > > From: Liu, Yi L 
> > > > Sent: Friday, June 16, 2023 5:40 PM
> > > >
> > > > vfio_group is not needed for vfio device cdev, so with vfio device cdev
> > > > introduced, the vfio_group infrastructures can be compiled out if only
> > > > cdev is needed.
> > > >
> > > > Tested-by: Nicolin Chen 
> > > > Tested-by: Matthew Rosato 
> > > > Tested-by: Yanting Jiang 
> > > > Tested-by: Shameer Kolothum 
> > > > Tested-by: Terrence Xu 
> > > > Signed-off-by: Yi Liu 
> > > > ---
> > > >  drivers/iommu/iommufd/Kconfig |  4 +-
> > > >  drivers/vfio/Kconfig  | 15 ++
> > > >  drivers/vfio/Makefile |  2 +-
> > > >  drivers/vfio/vfio.h   | 89 ---
> > > >  include/linux/vfio.h  | 25 --
> > > >  5 files changed, 123 insertions(+), 12 deletions(-)
> > > >
> > > > diff --git a/drivers/iommu/iommufd/Kconfig b/drivers/iommu/iommufd/Kconfig
> > > > index ada693ea51a7..99d4b075df49 100644
> > > > --- a/drivers/iommu/iommufd/Kconfig
> > > > +++ b/drivers/iommu/iommufd/Kconfig
> > > > @@ -14,8 +14,8 @@ config IOMMUFD
> > > >  if IOMMUFD
> > > >  config IOMMUFD_VFIO_CONTAINER
> > > > bool "IOMMUFD provides the VFIO container /dev/vfio/vfio"
> > > > -   depends on VFIO && !VFIO_CONTAINER
> > > > -   default VFIO && !VFIO_CONTAINER
> > > > +   depends on VFIO_GROUP && !VFIO_CONTAINER
> > > > +   default VFIO_GROUP && !VFIO_CONTAINER
> > >
> > > Hi Alex, Jason,
> > >
> > > I found a minor nit in the Kconfig. The configuration below is valid,
> > > but the user cannot use vfio directly because there is no /dev/vfio/vfio,
> > > although the user can open /dev/iommu instead. This is not good.
> > >
> > > CONFIG_IOMMUFD=y
> > > CONFIG_VFIO_DEVICE_CDEV=n
> > > CONFIG_VFIO_GROUP=y
> > > CONFIG_VFIO_CONTAINER=n
> > > CONFIG_IOMMUFD_VFIO_CONTAINER=n
> > >
> > > So we need the change below. I'll incorporate it into this series
> > > after your ack.
> > >
> > > diff --git a/drivers/iommu/iommufd/Kconfig b/drivers/iommu/iommufd/Kconfig
> > > index 99d4b075df49..d675c96c2bbb 100644
> > > --- a/drivers/iommu/iommufd/Kconfig
> > > +++ b/drivers/iommu/iommufd/Kconfig
> > > @@ -14,8 +14,8 @@ config IOMMUFD
> > >  if IOMMUFD
> > >  config IOMMUFD_VFIO_CONTAINER
> > >   bool "IOMMUFD provides the VFIO container /dev/vfio/vfio"
> > > - depends on VFIO_GROUP && !VFIO_CONTAINER
> > > - default VFIO_GROUP && !VFIO_CONTAINER
> > > + depends on VFIO_GROUP
> > > + default n
> > >   help
> > > IOMMUFD will provide /dev/vfio/vfio instead of VFIO. This relies on
> > > IOMMUFD providing compatibility emulation to give the same ioctls.
> > > diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
> > > index 6bda6dbb4878..ee3bbad6beb8 100644
> > > --- a/drivers/vfio/Kconfig
> > > +++ b/drivers/vfio/Kconfig
> > > @@ -6,7 +6,7 @@ menuconfig VFIO
> > >   select INTERVAL_TREE
> > >   select VFIO_GROUP if SPAPR_TCE_IOMMU || IOMMUFD=n
> > >   select VFIO_DEVICE_CDEV if !VFIO_GROUP
> > > - select VFIO_CONTAINER if IOMMUFD=n
> > > + select VFIO_CONTAINER if IOMMUFD_VFIO_CONTAINER=n
> > >   help
> > > VFIO provides a framework for secure userspace device drivers.
> > > See Documentation/driver-api/vfio.rst for more details.
> > >
> >
> > I just realized that it is possible to set both VFIO_CONTAINER and
> > IOMMUFD_VFIO_CONTAINER to "y". Then there will be a conflict when
> > registering /dev/vfio/vfio. Any suggestions?
> 
> This is only an issue with the proposed change, right?

Yes.

>  I agree with
> Jason, removing /dev/vfio/vfio entirely should be possible.  That's
> actually our ultimate goal, but obviously it breaks current userspace
> depending on vfio container compatibility.  It's a configuration error,
> not a Kconfig error if someone finds themselves w

Re: [Intel-gfx] [PATCH v13 21/22] vfio: Compile vfio_group infrastructure optionally

2023-07-17 Thread Liu, Yi L

> From: Jason Gunthorpe 
> Sent: Monday, July 17, 2023 8:34 PM
> 
> On Mon, Jul 17, 2023 at 06:36:19AM +0000, Liu, Yi L wrote:
> > Hi Alex, Jason,
> >
> > I found a minor nit in the Kconfig. The configuration below is valid,
> > but the user cannot use vfio directly because there is no /dev/vfio/vfio,
> > although the user can open /dev/iommu instead. This is not good.
> >
> > CONFIG_IOMMUFD=y
> > CONFIG_VFIO_DEVICE_CDEV=n
> > CONFIG_VFIO_GROUP=y
> > CONFIG_VFIO_CONTAINER=n
> > CONFIG_IOMMUFD_VFIO_CONTAINER=n
> >
> > So we need the change below. I'll incorporate it into this series
> > after your ack.
> 
> This is fine, we decided to allow this in the original vfio series. It
> is usable in that you have to use iommufd natively with the group
> interface. It won't be backwards compatible with current userspace.

Sure. I'll keep the current code.

Regards,
Yi Liu

Re: [Intel-gfx] [PATCH v13 21/22] vfio: Compile vfio_group infrastructure optionally

2023-07-17 Thread Liu, Yi L

> From: Liu, Yi L 
> Sent: Monday, July 17, 2023 2:36 PM
> 
> > From: Liu, Yi L 
> > Sent: Friday, June 16, 2023 5:40 PM
> >
> > vfio_group is not needed for vfio device cdev, so with vfio device cdev
> > introduced, the vfio_group infrastructures can be compiled out if only
> > cdev is needed.
> >
> > Tested-by: Nicolin Chen 
> > Tested-by: Matthew Rosato 
> > Tested-by: Yanting Jiang 
> > Tested-by: Shameer Kolothum 
> > Tested-by: Terrence Xu 
> > Signed-off-by: Yi Liu 
> > ---
> >  drivers/iommu/iommufd/Kconfig |  4 +-
> >  drivers/vfio/Kconfig  | 15 ++
> >  drivers/vfio/Makefile |  2 +-
> >  drivers/vfio/vfio.h   | 89 ---
> >  include/linux/vfio.h  | 25 --
> >  5 files changed, 123 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/iommu/iommufd/Kconfig b/drivers/iommu/iommufd/Kconfig
> > index ada693ea51a7..99d4b075df49 100644
> > --- a/drivers/iommu/iommufd/Kconfig
> > +++ b/drivers/iommu/iommufd/Kconfig
> > @@ -14,8 +14,8 @@ config IOMMUFD
> >  if IOMMUFD
> >  config IOMMUFD_VFIO_CONTAINER
> > bool "IOMMUFD provides the VFIO container /dev/vfio/vfio"
> > -   depends on VFIO && !VFIO_CONTAINER
> > -   default VFIO && !VFIO_CONTAINER
> > +   depends on VFIO_GROUP && !VFIO_CONTAINER
> > +   default VFIO_GROUP && !VFIO_CONTAINER
> 
> Hi Alex, Jason,
> 
> I found a minor nit in the Kconfig. The configuration below is valid,
> but the user cannot use vfio directly because there is no /dev/vfio/vfio,
> although the user can open /dev/iommu instead. This is not good.
> 
> CONFIG_IOMMUFD=y
> CONFIG_VFIO_DEVICE_CDEV=n
> CONFIG_VFIO_GROUP=y
> CONFIG_VFIO_CONTAINER=n
> CONFIG_IOMMUFD_VFIO_CONTAINER=n
> 
> So we need the change below. I'll incorporate it into this series
> after your ack.
> 
> diff --git a/drivers/iommu/iommufd/Kconfig b/drivers/iommu/iommufd/Kconfig
> index 99d4b075df49..d675c96c2bbb 100644
> --- a/drivers/iommu/iommufd/Kconfig
> +++ b/drivers/iommu/iommufd/Kconfig
> @@ -14,8 +14,8 @@ config IOMMUFD
>  if IOMMUFD
>  config IOMMUFD_VFIO_CONTAINER
>   bool "IOMMUFD provides the VFIO container /dev/vfio/vfio"
> - depends on VFIO_GROUP && !VFIO_CONTAINER
> - default VFIO_GROUP && !VFIO_CONTAINER
> + depends on VFIO_GROUP
> + default n
>   help
> IOMMUFD will provide /dev/vfio/vfio instead of VFIO. This relies on
> IOMMUFD providing compatibility emulation to give the same ioctls.
> diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
> index 6bda6dbb4878..ee3bbad6beb8 100644
> --- a/drivers/vfio/Kconfig
> +++ b/drivers/vfio/Kconfig
> @@ -6,7 +6,7 @@ menuconfig VFIO
>   select INTERVAL_TREE
>   select VFIO_GROUP if SPAPR_TCE_IOMMU || IOMMUFD=n
>   select VFIO_DEVICE_CDEV if !VFIO_GROUP
> - select VFIO_CONTAINER if IOMMUFD=n
> + select VFIO_CONTAINER if IOMMUFD_VFIO_CONTAINER=n
>   help
> VFIO provides a framework for secure userspace device drivers.
> See Documentation/driver-api/vfio.rst for more details.
> 

I just realized that it is possible to set both VFIO_CONTAINER and
IOMMUFD_VFIO_CONTAINER to "y". Then there will be a conflict when
registering /dev/vfio/vfio. Any suggestions?

Regards,
Yi Liu
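
A minimal userspace sketch of the two Kconfig outcomes discussed in this
thread, assuming each option is modeled as a plain boolean. The enum and
function names are hypothetical and do not exist in the kernel; the sketch
only illustrates that VFIO_CONTAINER=n with IOMMUFD_VFIO_CONTAINER=n leaves
no /dev/vfio/vfio node, and that setting both to "y" would make two
providers compete for the same node.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names, for illustration only. */
enum container_provider {
	PROVIDER_NONE,     /* nobody registers /dev/vfio/vfio */
	PROVIDER_VFIO,     /* legacy container: CONFIG_VFIO_CONTAINER=y */
	PROVIDER_IOMMUFD,  /* compat container: CONFIG_IOMMUFD_VFIO_CONTAINER=y */
	PROVIDER_CONFLICT, /* both would try to register the same node */
};

static enum container_provider
who_provides_container(bool vfio_container, bool iommufd_vfio_container)
{
	if (vfio_container && iommufd_vfio_container)
		return PROVIDER_CONFLICT;
	if (vfio_container)
		return PROVIDER_VFIO;
	if (iommufd_vfio_container)
		return PROVIDER_IOMMUFD;
	return PROVIDER_NONE;
}

/* Usage: the two configurations raised in the thread. */
static void container_provider_examples(void)
{
	/* CONFIG_VFIO_CONTAINER=n, CONFIG_IOMMUFD_VFIO_CONTAINER=n */
	assert(who_provides_container(false, false) == PROVIDER_NONE);
	/* Both set to "y", possible only with the proposed change */
	assert(who_provides_container(true, true) == PROVIDER_CONFLICT);
}
```

The original "depends on ... && !VFIO_CONTAINER" line is what rules out the
conflict case, which is why the proposed change reintroduces it.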

> > help
> >   IOMMUFD will provide /dev/vfio/vfio instead of VFIO. This relies on
> >   IOMMUFD providing compatibility emulation to give the same ioctls.
> > diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
> > index 1cab8e4729de..35ab8ab87688 100644
> > --- a/drivers/vfio/Kconfig
> > +++ b/drivers/vfio/Kconfig
> > @@ -4,6 +4,8 @@ menuconfig VFIO
> > select IOMMU_API
> > depends on IOMMUFD || !IOMMUFD
> > select INTERVAL_TREE
> > +   select VFIO_GROUP if SPAPR_TCE_IOMMU || IOMMUFD=n
> > +   select VFIO_DEVICE_CDEV if !VFIO_GROUP
> > select VFIO_CONTAINER if IOMMUFD=n
> 
> This should be " select VFIO_CONTAINER if IOMMUFD_VFIO_CONTAINER=n"
> 
> Regards,
> Yi Liu
> 
> > help
> >   VFIO provides a framework for secure userspace device drivers.
> > @@ -15,6 +17,7 @@ if VFIO
> >  config VFIO_DEVICE_CDEV
> > bool "Support for the VFIO cdev /dev/vfio/devices/vfioX"
> > depends on IOMMUFD
> > +   default !VFIO_GROUP
> > help
> >   The VFIO device cdev is another way for userspace to get device
> >   access. Userspace gets device fd by opening device cdev under
> > @@ -24

Re: [Intel-gfx] [PATCH v13 21/22] vfio: Compile vfio_group infrastructure optionally

2023-07-17 Thread Liu, Yi L

> From: Liu, Yi L 
> Sent: Friday, June 16, 2023 5:40 PM
> 
> vfio_group is not needed for vfio device cdev, so with vfio device cdev
> introduced, the vfio_group infrastructures can be compiled out if only
> cdev is needed.
> 
> Tested-by: Nicolin Chen 
> Tested-by: Matthew Rosato 
> Tested-by: Yanting Jiang 
> Tested-by: Shameer Kolothum 
> Tested-by: Terrence Xu 
> Signed-off-by: Yi Liu 
> ---
>  drivers/iommu/iommufd/Kconfig |  4 +-
>  drivers/vfio/Kconfig  | 15 ++
>  drivers/vfio/Makefile |  2 +-
>  drivers/vfio/vfio.h   | 89 ---
>  include/linux/vfio.h  | 25 --
>  5 files changed, 123 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/iommu/iommufd/Kconfig b/drivers/iommu/iommufd/Kconfig
> index ada693ea51a7..99d4b075df49 100644
> --- a/drivers/iommu/iommufd/Kconfig
> +++ b/drivers/iommu/iommufd/Kconfig
> @@ -14,8 +14,8 @@ config IOMMUFD
>  if IOMMUFD
>  config IOMMUFD_VFIO_CONTAINER
>   bool "IOMMUFD provides the VFIO container /dev/vfio/vfio"
> - depends on VFIO && !VFIO_CONTAINER
> - default VFIO && !VFIO_CONTAINER
> + depends on VFIO_GROUP && !VFIO_CONTAINER
> + default VFIO_GROUP && !VFIO_CONTAINER

Hi Alex, Jason,

I found a minor nit in the Kconfig. The configuration below is valid,
but the user cannot use vfio directly because there is no /dev/vfio/vfio,
although the user can open /dev/iommu instead. This is not good.

CONFIG_IOMMUFD=y
CONFIG_VFIO_DEVICE_CDEV=n
CONFIG_VFIO_GROUP=y
CONFIG_VFIO_CONTAINER=n
CONFIG_IOMMUFD_VFIO_CONTAINER=n

So we need the change below. I'll incorporate it into this series
after your ack.

diff --git a/drivers/iommu/iommufd/Kconfig b/drivers/iommu/iommufd/Kconfig
index 99d4b075df49..d675c96c2bbb 100644
--- a/drivers/iommu/iommufd/Kconfig
+++ b/drivers/iommu/iommufd/Kconfig
@@ -14,8 +14,8 @@ config IOMMUFD
 if IOMMUFD
 config IOMMUFD_VFIO_CONTAINER
bool "IOMMUFD provides the VFIO container /dev/vfio/vfio"
-   depends on VFIO_GROUP && !VFIO_CONTAINER
-   default VFIO_GROUP && !VFIO_CONTAINER
+   depends on VFIO_GROUP
+   default n
help
  IOMMUFD will provide /dev/vfio/vfio instead of VFIO. This relies on
  IOMMUFD providing compatibility emulation to give the same ioctls.
diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index 6bda6dbb4878..ee3bbad6beb8 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -6,7 +6,7 @@ menuconfig VFIO
select INTERVAL_TREE
select VFIO_GROUP if SPAPR_TCE_IOMMU || IOMMUFD=n
select VFIO_DEVICE_CDEV if !VFIO_GROUP
-   select VFIO_CONTAINER if IOMMUFD=n
+   select VFIO_CONTAINER if IOMMUFD_VFIO_CONTAINER=n
help
  VFIO provides a framework for secure userspace device drivers.
  See Documentation/driver-api/vfio.rst for more details.

>   help
> IOMMUFD will provide /dev/vfio/vfio instead of VFIO. This relies on
> IOMMUFD providing compatibility emulation to give the same ioctls.
> diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
> index 1cab8e4729de..35ab8ab87688 100644
> --- a/drivers/vfio/Kconfig
> +++ b/drivers/vfio/Kconfig
> @@ -4,6 +4,8 @@ menuconfig VFIO
>   select IOMMU_API
>   depends on IOMMUFD || !IOMMUFD
>   select INTERVAL_TREE
> + select VFIO_GROUP if SPAPR_TCE_IOMMU || IOMMUFD=n
> + select VFIO_DEVICE_CDEV if !VFIO_GROUP
>   select VFIO_CONTAINER if IOMMUFD=n

This should be " select VFIO_CONTAINER if IOMMUFD_VFIO_CONTAINER=n"

Regards,
Yi Liu

>   help
> VFIO provides a framework for secure userspace device drivers.
> @@ -15,6 +17,7 @@ if VFIO
>  config VFIO_DEVICE_CDEV
>   bool "Support for the VFIO cdev /dev/vfio/devices/vfioX"
>   depends on IOMMUFD
> + default !VFIO_GROUP
>   help
> The VFIO device cdev is another way for userspace to get device
> access. Userspace gets device fd by opening device cdev under
> @@ -24,9 +27,20 @@ config VFIO_DEVICE_CDEV
> 
> If you don't know what to do here, say N.
> 
> +config VFIO_GROUP
> + bool "Support for the VFIO group /dev/vfio/$group_id"
> + default y
> + help
> +VFIO group support provides the traditional model for accessing
> +devices through VFIO and is used by the majority of userspace
> +applications and drivers making use of VFIO.
> +
> +If you don't know what to do here, say Y.
> +
>  config VFIO_CONTAINER
>   bool "Support for the VFIO container /dev/vfio/vfio"
>   select VFIO_IOMMU_TYPE1 if MMU && (X86 || S390 || ARM || ARM64)
> + depends on VFIO_GROUP
>

Re: [Intel-gfx] [PATCH v14 22/26] vfio: Add VFIO_DEVICE_BIND_IOMMUFD

2023-07-15 Thread Liu, Yi L

> From: Jason Gunthorpe 
> Sent: Saturday, July 15, 2023 9:05 PM
> 
> On Sat, Jul 15, 2023 at 04:16:52AM +0000, Liu, Yi L wrote:
> > > From: Jason Gunthorpe 
> > > Sent: Friday, July 14, 2023 10:42 PM
> > >
> > > On Mon, Jul 10, 2023 at 07:59:24PM -0700, Yi Liu wrote:
> > >
> > > > +static inline long vfio_df_ioctl_bind_iommufd(struct vfio_device_file *df,
> > > > +                                              struct vfio_device_bind_iommufd __user *arg)
> > > > +{
> > > > +   return -EOPNOTSUPP;
> > > > +}
> > >
> > > This should be -ENOTTY
> >
> > Okay. There are quite a few stub functions in drivers/vfio/vfio.h, so
> > let me check the rule: should all the stub functions that return int
> > return -ENOTTY in the !IS_ENABLED(CONFIG_XXX) case?
> 
> No, just ioctl returns ENOTTY, so really just this function.

Ok. I see.

Regards,
Yi Liu
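
A minimal userspace sketch of the rule settled above, assuming a
hypothetical DEMO_FEATURE_ENABLED macro in place of a real CONFIG_* option:
only the stub that stands in for an ioctl handler reports -ENOTTY, while
other stubs keep their own error code. The demo_* functions are
illustrative, and the -EOPNOTSUPP used for the non-ioctl helper is an
assumption rather than something this thread specifies.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical switch standing in for a CONFIG_* option. */
#ifndef DEMO_FEATURE_ENABLED
#define DEMO_FEATURE_ENABLED 0
#endif

#if DEMO_FEATURE_ENABLED
/* Real implementations would live here when the feature is built in. */
static long demo_ioctl_bind(int arg)
{
	return arg >= 0 ? 0 : -EINVAL;
}

static int demo_helper_attach(int arg)
{
	return arg >= 0 ? 0 : -EINVAL;
}
#else
/* ioctl entry point: the command does not exist in this build. */
static inline long demo_ioctl_bind(int arg)
{
	(void)arg;
	return -ENOTTY;
}

/* Non-ioctl helper: assumed to keep a "not supported" error code. */
static inline int demo_helper_attach(int arg)
{
	(void)arg;
	return -EOPNOTSUPP;
}
#endif

/* Usage: with the feature compiled out, only the ioctl stub says ENOTTY. */
static void demo_stub_example(void)
{
#if !DEMO_FEATURE_ENABLED
	assert(demo_ioctl_bind(0) == -ENOTTY);
	assert(demo_helper_attach(0) == -EOPNOTSUPP);
#endif
}
```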

Re: [Intel-gfx] [PATCH v9 09/10] vfio/pci: Copy hot-reset device info to userspace in the devices loop

2023-07-14 Thread Liu, Yi L

> From: Jason Gunthorpe 
> Sent: Friday, July 14, 2023 9:37 PM
> 
> On Mon, Jul 10, 2023 at 07:31:25PM -0700, Yi Liu wrote:
> 
> > @@ -1311,29 +1296,17 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
> > ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_fill_devs,
> >     &fill, slot);
> > mutex_unlock(&vdev->vdev.dev_set->lock);
> > +   if (ret)
> > +   return ret;
> >
> > -   /*
> > -* If a device was removed between counting and filling, we may come up
> > -* short of fill.max.  If a device was added, we'll have a return of
> > -* -EAGAIN above.
> > -*/
> > -   if (!ret) {
> > -   hdr.count = fill.cur;
> > -   hdr.flags = fill.flags;
> > -   }
> > -
> > -reset_info_exit:
> > +   hdr.count = fill.count;
> > +   hdr.flags = fill.flags;
> > if (copy_to_user(arg, &hdr, minsz))
> > -   ret = -EFAULT;
> > -
> > -   if (!ret) {
> > -   if (copy_to_user(&arg->devices, devices,
> > -hdr.count * sizeof(*devices)))
> > -   ret = -EFAULT;
> > -   }
> > +   return -EFAULT;
> >
> > -   kfree(devices);
> > -   return ret;
> > +   if (fill.count != fill.devices - arg->devices)
> > +   return -ENOSPC;
> 
> This should be > right? The previous code returned ENOSPC only if
> there were more devices than requested, not less.

Yes, it should.

Regards,
Yi Liu
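
A minimal userspace sketch of the comparison agreed on above. The helper
name and its two parameters are hypothetical: "found" models fill.count
(devices seen during the walk) and "copied" models the number of entries
actually written to the user buffer. The point is only that -ENOSPC is
reported when more devices exist than were copied, not when fewer do.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical helper; not a kernel function. */
static int demo_hot_reset_info_status(size_t found, size_t copied)
{
	/* Using "!=" here would also reject the harmless "fewer" case. */
	if (found > copied)
		return -ENOSPC;
	return 0;
}

/* Usage: only the "buffer too small" case fails. */
static void demo_hot_reset_info_example(void)
{
	assert(demo_hot_reset_info_status(3, 2) == -ENOSPC);
	assert(demo_hot_reset_info_status(2, 2) == 0);
}
```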

Re: [Intel-gfx] [PATCH v14 22/26] vfio: Add VFIO_DEVICE_BIND_IOMMUFD

2023-07-14 Thread Liu, Yi L

> From: Jason Gunthorpe 
> Sent: Friday, July 14, 2023 10:42 PM
> 
> On Mon, Jul 10, 2023 at 07:59:24PM -0700, Yi Liu wrote:
> 
> > +static inline long vfio_df_ioctl_bind_iommufd(struct vfio_device_file *df,
> > +                                              struct vfio_device_bind_iommufd __user *arg)
> > +{
> > +   return -EOPNOTSUPP;
> > +}
> 
> This should be -ENOTTY

Okay. There are quite a few stub functions in drivers/vfio/vfio.h, so
let me check the rule: should all the stub functions that return int
return -ENOTTY in the !IS_ENABLED(CONFIG_XXX) case?

> > @@ -1149,6 +1151,9 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
> > void __user *uptr = (void __user *)arg;
> > int ret;
> >
> > +   if (cmd == VFIO_DEVICE_BIND_IOMMUFD)
> > +   return vfio_df_ioctl_bind_iommufd(df, uptr);
> > +
> 
> And this function has a mistake too:
> 
>   default:
>   if (unlikely(!device->ops->ioctl))
>   ret = -EINVAL;
> 
> Should also be -ENOTTY
> 
> All the implementations of the ops already return -ENOTTY
> 
> However, I think this is all slightly not quite right, the proper
> return code is supposed to be ENOIOCTLCMD which vfs_ioctl() then
> translates into ENOTTY for some reason..
> 
> It looks Ok otherwise

This is outside the scope of this series; it may need a separate fix patch. @Alex?

Regards,
Yi Liu
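
A minimal userspace sketch of the translation Jason describes: a handler
returns ENOIOCTLCMD for an unknown command and the generic ioctl layer
turns that into ENOTTY before it reaches the caller. ENOIOCTLCMD is
kernel-internal and not visible in userspace <errno.h>, so the sketch
defines its own DEMO_ENOIOCTLCMD with the value used in
include/linux/errno.h. The demo_* functions and the command number are
hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Kernel-internal error code; defined locally for this sketch. */
#define DEMO_ENOIOCTLCMD 515

/* Hypothetical command number understood by the toy driver. */
#define DEMO_CMD_KNOWN 1u

/* Driver-level handler: unknown commands yield -ENOIOCTLCMD. */
static long demo_driver_ioctl(unsigned int cmd)
{
	switch (cmd) {
	case DEMO_CMD_KNOWN:
		return 0;
	default:
		return -DEMO_ENOIOCTLCMD;
	}
}

/* Generic layer: translate the internal code into -ENOTTY. */
static long demo_vfs_ioctl(unsigned int cmd)
{
	long error = demo_driver_ioctl(cmd);

	if (error == -DEMO_ENOIOCTLCMD)
		error = -ENOTTY;
	return error;
}

/* Usage: the caller never sees the internal code. */
static void demo_vfs_ioctl_example(void)
{
	assert(demo_vfs_ioctl(DEMO_CMD_KNOWN) == 0);
	assert(demo_vfs_ioctl(99u) == -ENOTTY);
}
```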

Re: [Intel-gfx] [PATCH v14 20/26] iommufd: Add iommufd_ctx_from_fd()

2023-07-14 Thread Liu, Yi L

> From: Jason Gunthorpe 
> Sent: Friday, July 14, 2023 10:21 PM
> 
> On Mon, Jul 10, 2023 at 07:59:22PM -0700, Yi Liu wrote:
> > It's common to get a reference to the iommufd context from a given file
> > descriptor. So adds an API for it. Existing users of this API are compiled
> > only when IOMMUFD is enabled, so no need to have a stub for the IOMMUFD
> > disabled case.
> >
> > Signed-off-by: Yi Liu 
> > ---
> >  drivers/iommu/iommufd/main.c | 23 +++
> >  include/linux/iommufd.h  |  1 +
> >  2 files changed, 24 insertions(+)
> >
> > diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
> > index 32ce7befc8dd..e99a338d4fdf 100644
> > --- a/drivers/iommu/iommufd/main.c
> > +++ b/drivers/iommu/iommufd/main.c
> > @@ -377,6 +377,29 @@ struct iommufd_ctx *iommufd_ctx_from_file(struct file *file)
> >  }
> >  EXPORT_SYMBOL_NS_GPL(iommufd_ctx_from_file, IOMMUFD);
> >
> > +/**
> > + * iommufd_ctx_from_fd - Acquires a reference to the iommufd context
> > + * @fd: File descriptor to obtain the reference from
> > + *
> > + * Returns a pointer to the iommufd_ctx, otherwise ERR_PTR. On success
> > + * the caller is responsible to call iommufd_ctx_put().
> > + */
> > +struct iommufd_ctx *iommufd_ctx_from_fd(int fd)
> > +{
> > +   struct iommufd_ctx *iommufd;
> > +   struct fd f;
> > +
> > +   f = fdget(fd);
> > +   if (!f.file)
> > +   return ERR_PTR(-EBADF);
> > +
> > +   iommufd = iommufd_ctx_from_file(f.file);
> > +
> > +   fdput(f);
> > +   return iommufd;
> > +}
> > +EXPORT_SYMBOL_NS_GPL(iommufd_ctx_from_fd, IOMMUFD);
> 
> This is a little wonky since iommufd_ctx_from_file() also obtains a
> reference

Yes, that's why fdput() is always needed.

> Just needs to be like this:
> 
> struct iommufd_ctx *iommufd_ctx_from_fd(int fd)
> {
>   struct file *file;
> 
>   file = fget(fd);
>   if (!file)
>   return ERR_PTR(-EBADF);
> 
>   if (file->f_op != &iommufd_fops) {
>   fput(file);
>   return ERR_PTR(-EBADFD);
>   }
>   /* fget is the same as iommufd_ctx_get() */
>   return file->private_data;
> }
> EXPORT_SYMBOL_NS_GPL(iommufd_ctx_from_fd, IOMMUFD);

This one looks ok to me. Thanks.

Regards,
Yi Liu
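
A minimal userspace sketch of the reference handling in Jason's version,
assuming a toy descriptor table in place of the real file table. Every
demo_* name is hypothetical, and the error is returned as an int with the
context passed back through a pointer, since ERR_PTR() is kernel-only.
EBADFD is a Linux errno value. The point is that the reference taken by the
lookup is kept on success, so it serves as the context reference, and is
dropped only when the file is of the wrong type.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-ins for kernel objects; all names are hypothetical. */
struct demo_file {
	int refcount;        /* models the struct file reference count */
	const void *f_op;    /* identifies which driver owns the file */
	void *private_data;  /* the context object for iommufd-like files */
};

static const int demo_iommufd_fops = 0; /* only its address matters */
static const int demo_other_fops = 0;

/*
 * Look up fd and return its context through *ctx. On success the
 * reference taken here is kept by the caller. On failure no reference
 * is left behind.
 */
static int demo_ctx_from_fd(struct demo_file **table, int nfds, int fd,
			    void **ctx)
{
	struct demo_file *file;

	if (fd < 0 || fd >= nfds || !table[fd])
		return -EBADF;

	file = table[fd];
	file->refcount++;             /* models fget() */

	if (file->f_op != &demo_iommufd_fops) {
		file->refcount--;     /* models fput() */
		return -EBADFD;
	}

	*ctx = file->private_data;
	return 0;
}

/* Usage: one iommufd-like file and one unrelated file. */
static void demo_ctx_from_fd_example(void)
{
	int context = 42;
	struct demo_file good = { 1, &demo_iommufd_fops, &context };
	struct demo_file other = { 1, &demo_other_fops, NULL };
	struct demo_file *table[] = { &good, &other };
	void *ctx = NULL;

	assert(demo_ctx_from_fd(table, 2, 0, &ctx) == 0);
	assert(ctx == &context && good.refcount == 2);
	assert(demo_ctx_from_fd(table, 2, 1, &ctx) == -EBADFD);
	assert(other.refcount == 1);
	assert(demo_ctx_from_fd(table, 2, 5, &ctx) == -EBADF);
}
```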

Re: [Intel-gfx] [PATCH v12 18/24] vfio: Add VFIO_DEVICE_BIND_IOMMUFD

2023-06-28 Thread Liu, Yi L

> From: Jason Gunthorpe 
> Sent: Wednesday, June 28, 2023 10:34 PM
> 
> On Mon, Jun 26, 2023 at 02:51:29PM +0000, Liu, Yi L wrote:
> > > > >
> > > > > Ah, any suggestion on the naming? How about
> > > > > vfio_device_get_kvm_safe_locked()?
> > > >
> > > > I thought you were using _df_ now for these functions?
> > > >
> > >
> > > I see. Your point is to pass df to _vfio_device_get_kvm_safe() and
> > > test df->kvm within it, hence the rename to _df_. I think the group
> > > path should be OK with this change as well. Let me make it.
> >
> > Passing vfio_device_file to _vfio_device_get_kvm_safe() requires the
> > changes below to the group path. If the goal is only to test for a NULL
> > kvm in _vfio_device_get_kvm_safe(), it may be simpler to keep the current
> > parameters and just move the NULL kvm test into this function. Is that right?
> 
> This does seem a bit nicer, yes
> 
> > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> > index 8a9ebcc6980b..4e6ea2943d28 100644
> > --- a/drivers/vfio/vfio_main.c
> > +++ b/drivers/vfio/vfio_main.c
> > @@ -373,14 +373,22 @@ void vfio_unregister_group_dev(struct vfio_device *device)
> >  EXPORT_SYMBOL_GPL(vfio_unregister_group_dev);
> >
> >  #ifdef CONFIG_HAVE_KVM
> > -void _vfio_device_get_kvm_safe(struct vfio_device *device, struct kvm *kvm)
> > +void _vfio_df_get_kvm_safe(struct vfio_device_file *df)
> 
> But still avoid the leading _ here

OK, I'll move the kvm pointer test into _vfio_device_get_kvm_safe()
and also rename it to vfio_device_get_kvm_safe().

Regards,
Yi Liu
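
A minimal userspace sketch of the refactoring agreed on above: the NULL
kvm test moves into the helper, so each caller can invoke it
unconditionally. The demo_* types and function are hypothetical, and
locking and symbol_get() are left out; only the placement of the NULL
check is illustrated.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; all names are hypothetical. */
struct demo_kvm {
	int users;              /* models the kvm reference count */
};

struct demo_device {
	struct demo_kvm *kvm;   /* pointer stored for use by drivers */
};

/* The helper owns the NULL test, so callers need no "if (kvm)". */
static void demo_device_get_kvm_safe(struct demo_device *device,
				     struct demo_kvm *kvm)
{
	if (!kvm)
		return;

	kvm->users++;
	device->kvm = kvm;
}

/* Usage: both the group path and the cdev path can call it blindly. */
static void demo_device_get_kvm_safe_example(void)
{
	struct demo_kvm kvm = { 0 };
	struct demo_device device = { NULL };

	demo_device_get_kvm_safe(&device, NULL);
	assert(device.kvm == NULL);

	demo_device_get_kvm_safe(&device, &kvm);
	assert(device.kvm == &kvm && kvm.users == 1);
}
```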

Re: [Intel-gfx] [PATCH v13 22/22] docs: vfio: Add vfio device cdev description

2023-06-27 Thread Liu, Yi L

> From: Alex Williamson 
> Sent: Wednesday, June 28, 2023 1:35 AM

[The Cc list got broken in Alex's reply to Jason, so I am replying to
Alex's email with the Cc list fixed. @Alex, this looks like the same symptom
as last time. Do you have any idea what causes it?]

> On Tue, 27 Jun 2023 13:12:14 -0300
> Jason Gunthorpe  wrote:
> 
> > On Tue, Jun 27, 2023 at 08:54:33AM +0000, Liu, Yi L wrote:
> > > > From: Alex Williamson 
> > > > Sent: Thursday, June 22, 2023 5:54 AM
> > > >
> > > > On Fri, 16 Jun 2023 02:39:46 -0700
> > > > Yi Liu  wrote:
> > >
> > > > > +VFIO device cdev doesn't rely on VFIO group/container/iommu drivers.
> > > > > +Hence those modules can be fully compiled out in an environment
> > > > > +where no legacy VFIO application exists.
> > > > > +
> > > > > +So far SPAPR does not support IOMMUFD yet.  So it cannot support device
> > > > > +cdev either.
> > > >
> > > > Why isn't this enforced via Kconfig?  At the vfio level we could simply
> > > > add the following in patch 17/:
> > > >
> > > > config VFIO_DEVICE_CDEV
> > > > bool "Support for the VFIO cdev /dev/vfio/devices/vfioX"
> > > > depends on IOMMUFD && !SPAPR_TCE_IOMMU
> > > >^^^
> > > >
> > > > Or if Jason wants, IOMMUFD could depend on !SPAPR_TCE_IOMMU for now and
> > > > the existing Kconfig options would exclude it.  If we know it doesn't
> > > > work, let's not put the burden on the user to figure that out.  A
> > > > follow-up patch for this would be fine if there's no other reason to
> > > > respin the series.
> > >
> > > @Jason,
> > > What is your opinion? It seems reasonable to make IOMMUFD
> > > depend on !SPAPR_TCE_IOMMU. Is that right?
> >
> > The right kconfig would be to list all the iommu drivers that can
> > support iommufd and allow it to be selected if any of them are
> > enabled.
> >
> > This seems too complex to bother with, so I like Alex's version above..
> >
> > > > Otherwise the series is looking pretty good to me.  It still requires
> > > > some reviews/acks in the iommufd space and it would be good to see more
> > > > reviews for the remainder given the amount of collaboration here.
> > > >
> > > > I'm out for the rest of the week, but I'll leave open accepting this
> > > > and the hot-reset series next week for the merge window.  Thanks,
> > >
> > > @Alex,
> > > > Given Jason's remarks on cdev v12, I've already got a new version as below.
> > > > I can post it once the open Kconfig question above is closed.
> >
> > I think we don't need to bend the rules, Linus would not be happy to
> > see 30 major patches that never hit linux-next at all.
> >
> > I'm happy if we put it on a branch at RC1 and merge it to the vfio &
> > iommufd trees, it is functionally the same outcome in the same time
> > frame.
> 
> Not sure I'm clear on the plan.  My intention would have been to apply
> v14 to my next branch, make sure it did see linux-next exposure,
> and send a pull request for rc1 next week.
> 
> Are you suggesting a post-merge-window pull request for v6.5 (also
> frowned on) or are you suggesting that it simmers in both our next
> branches until v6.6?  Thanks,

It appears to me to be the latter. Once 6.5-rc1 is released, we immediately
apply the hot-reset and cdev series onto it and put them in a shared tree to
assist other iommufd feature development (e.g. nesting). Jason, is that right?

Regards,
Yi Liu

Re: [Intel-gfx] [PATCH v13 22/22] docs: vfio: Add vfio device cdev description

2023-06-27 Thread Liu, Yi L

> From: Jason Gunthorpe 
> Sent: Wednesday, June 28, 2023 12:12 AM
> 
> On Tue, Jun 27, 2023 at 08:54:33AM +0000, Liu, Yi L wrote:
> > > From: Alex Williamson 
> > > Sent: Thursday, June 22, 2023 5:54 AM
> > >
> > > On Fri, 16 Jun 2023 02:39:46 -0700
> > > Yi Liu  wrote:
> >
> > > > +VFIO device cdev doesn't rely on VFIO group/container/iommu drivers.
> > > > +Hence those modules can be fully compiled out in an environment
> > > > +where no legacy VFIO application exists.
> > > > +
> > > > +So far SPAPR does not support IOMMUFD yet.  So it cannot support device
> > > > +cdev either.
> > >
> > > Why isn't this enforced via Kconfig?  At the vfio level we could simply
> > > add the following in patch 17/:
> > >
> > > config VFIO_DEVICE_CDEV
> > > bool "Support for the VFIO cdev /dev/vfio/devices/vfioX"
> > > depends on IOMMUFD && !SPAPR_TCE_IOMMU
> > >^^^
> > >

Proposal A.

> > > Or if Jason wants, IOMMUFD could depend on !SPAPR_TCE_IOMMU for now and
> > > the existing Kconfig options would exclude it.  If we know it doesn't
> > > work, let's not put the burden on the user to figure that out.  A
> > > follow-up patch for this would be fine if there's no other reason to
> > > respin the series.

Proposal B.

> >
> > @Jason,
> > What is your opinion? It seems reasonable to make IOMMUFD
> > depend on !SPAPR_TCE_IOMMU. Is that right?
> 
> The right kconfig would be to list all the iommu drivers that can
> support iommufd and allow it to be selected if any of them are
> enabled.
> 
> This seems too complex to bother with, so I like Alex's version above..

Sorry, I'm not quite clear. Alex has two proposals above (A and B). Which
one do you mean? It looks like you prefer A, is that right? :-)

Regards,
Yi Liu
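
A minimal userspace sketch of how proposals A and B differ, assuming each
Kconfig symbol is modeled as a boolean and ignoring every other dependency.
The struct and function names are hypothetical. Both proposals keep device
cdev off when SPAPR_TCE_IOMMU is set; they differ in whether IOMMUFD itself
stays selectable.

```c
#include <assert.h>
#include <stdbool.h>

/* Resulting option values; hypothetical model, not Kconfig itself. */
struct demo_cfg {
	bool iommufd;  /* CONFIG_IOMMUFD */
	bool cdev;     /* CONFIG_VFIO_DEVICE_CDEV */
};

/* Proposal A: VFIO_DEVICE_CDEV depends on IOMMUFD && !SPAPR_TCE_IOMMU. */
static struct demo_cfg proposal_a(bool want_iommufd, bool want_cdev,
				  bool spapr)
{
	struct demo_cfg c;

	c.iommufd = want_iommufd;
	c.cdev = want_cdev && c.iommufd && !spapr;
	return c;
}

/* Proposal B: IOMMUFD depends on !SPAPR_TCE_IOMMU; cdev follows IOMMUFD. */
static struct demo_cfg proposal_b(bool want_iommufd, bool want_cdev,
				  bool spapr)
{
	struct demo_cfg c;

	c.iommufd = want_iommufd && !spapr;
	c.cdev = want_cdev && c.iommufd;
	return c;
}

/* Usage: a SPAPR build that asks for both options. */
static void demo_proposals_example(void)
{
	struct demo_cfg a = proposal_a(true, true, true);
	struct demo_cfg b = proposal_b(true, true, true);

	assert(a.iommufd && !a.cdev);   /* A: iommufd stays, cdev is off */
	assert(!b.iommufd && !b.cdev);  /* B: both are off */
}
```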

Re: [Intel-gfx] [PATCH v13 22/22] docs: vfio: Add vfio device cdev description

2023-06-27 Thread Liu, Yi L

> From: Alex Williamson 
> Sent: Thursday, June 22, 2023 5:54 AM
> 
> On Fri, 16 Jun 2023 02:39:46 -0700
> Yi Liu  wrote:

> > +VFIO device cdev doesn't rely on VFIO group/container/iommu drivers.
> > +Hence those modules can be fully compiled out in an environment
> > +where no legacy VFIO application exists.
> > +
> > +So far SPAPR does not support IOMMUFD yet.  So it cannot support device
> > +cdev either.
> 
> Why isn't this enforced via Kconfig?  At the vfio level we could simply
> add the following in patch 17/:
> 
> config VFIO_DEVICE_CDEV
> bool "Support for the VFIO cdev /dev/vfio/devices/vfioX"
> depends on IOMMUFD && !SPAPR_TCE_IOMMU
>^^^
> 
> Or if Jason wants, IOMMUFD could depend on !SPAPR_TCE_IOMMU for now and
> the existing Kconfig options would exclude it.  If we know it doesn't
> work, let's not put the burden on the user to figure that out.  A
> follow-up patch for this would be fine if there's no other reason to
> respin the series.

@Jason,
What is your opinion? It seems reasonable to make IOMMUFD
depend on !SPAPR_TCE_IOMMU. Is that right?

> Otherwise the series is looking pretty good to me.  It still requires
> some reviews/acks in the iommufd space and it would be good to see more
> reviews for the remainder given the amount of collaboration here.
> 
> I'm out for the rest of the week, but I'll leave open accepting this
> and the hot-reset series next week for the merge window.  Thanks,

@Alex,
Given Jason's remarks on cdev v12, I've already got a new version as below.
I can post it once the open Kconfig question above is closed.

https://github.com/yiliu1765/iommufd/tree/wip/vfio_device_cdev_v14

Regards,
Yi Liu

Re: [Intel-gfx] [PATCH v12 18/24] vfio: Add VFIO_DEVICE_BIND_IOMMUFD

2023-06-26 Thread Liu, Yi L

> From: Liu, Yi L 
> Sent: Monday, June 26, 2023 9:35 PM
> 
> > From: Jason Gunthorpe 
> > Sent: Monday, June 26, 2023 8:56 PM
> >
> > On Mon, Jun 26, 2023 at 08:34:26AM +0000, Liu, Yi L wrote:
> > > > From: Jason Gunthorpe 
> > > > Sent: Saturday, June 24, 2023 12:15 AM
> > >
> > > > >  }
> > > > >
> > > > > +static void vfio_device_get_kvm_safe(struct vfio_device_file *df)
> > > > > +{
> > > > > + spin_lock(&df->kvm_ref_lock);
> > > > > + if (df->kvm)
> > > > > +         _vfio_device_get_kvm_safe(df->device, df->kvm);
> > > > > + spin_unlock(&df->kvm_ref_lock);
> > > > > +}
> > > >
> > > > I'm surprised symbol_get() can be called from a spinlock, but it sure
> > > > looks like it can..
> > > >
> > > > Also moving the if kvm is null test into _vfio_device_get_kvm_safe()
> > > > will save a few lines.
> > > >
> > > > Also shouldn't be called _vfio_device...
> > >
> > > Ah, any suggestion on the naming? How about 
> > > vfio_device_get_kvm_safe_locked()?
> >
> > I thought you were using _df_ now for these functions?
> >
> 
> I see. Your point is to pass df to _vfio_device_get_kvm_safe() and
> test df->kvm within it, hence the rename to _df_. I think the group
> path should be OK with this change as well. Let me make it.

Passing vfio_device_file to _vfio_device_get_kvm_safe() requires the
changes below to the group path. If the goal is only to test for a NULL
kvm in _vfio_device_get_kvm_safe(), it may be simpler to keep the current
parameters and just move the NULL kvm test into this function. Is that right?

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index 41a09a2df690..c2e880c15c44 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -157,15 +157,15 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group,
return ret;
 }
 
-static void vfio_device_group_get_kvm_safe(struct vfio_device *device)
+static void vfio_device_group_get_kvm_safe(struct vfio_device_file *df)
 {
-   spin_lock(&device->group->kvm_ref_lock);
-   if (!device->group->kvm)
-   goto unlock;
-
-   _vfio_device_get_kvm_safe(device, device->group->kvm);
+   struct vfio_device *device = df->device;
 
-unlock:
+   spin_lock(&device->group->kvm_ref_lock);
+   spin_lock(&df->kvm_ref_lock);
+   df->kvm = device->group->kvm;
+   _vfio_df_get_kvm_safe(df);
+   spin_unlock(&df->kvm_ref_lock);
spin_unlock(&device->group->kvm_ref_lock);
 }
 
@@ -189,7 +189,7 @@ static int vfio_df_group_open(struct vfio_device_file *df)
 * the pointer in the device for use by drivers.
 */
if (device->open_count == 0)
-   vfio_device_group_get_kvm_safe(device);
+   vfio_device_group_get_kvm_safe(df);
 
df->iommufd = device->group->iommufd;
if (df->iommufd && vfio_device_is_noiommu(device) && device->open_count == 0) {
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index fb8f2fac3d23..066766d43bdc 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -340,11 +340,10 @@ enum { vfio_noiommu = false };
 #endif
 
 #ifdef CONFIG_HAVE_KVM
-void _vfio_device_get_kvm_safe(struct vfio_device *device, struct kvm *kvm);
+void _vfio_df_get_kvm_safe(struct vfio_device_file *df);
 void vfio_device_put_kvm(struct vfio_device *device);
 #else
-static inline void _vfio_device_get_kvm_safe(struct vfio_device *device,
-struct kvm *kvm)
+static inline void _vfio_df_get_kvm_safe(struct vfio_device_file *df)
 {
 }
 
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 8a9ebcc6980b..4e6ea2943d28 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -373,14 +373,22 @@ void vfio_unregister_group_dev(struct vfio_device *device)
 EXPORT_SYMBOL_GPL(vfio_unregister_group_dev);
 
 #ifdef CONFIG_HAVE_KVM
-void _vfio_device_get_kvm_safe(struct vfio_device *device, struct kvm *kvm)
+void _vfio_df_get_kvm_safe(struct vfio_device_file *df)
 {
+   struct vfio_device *device = df->device;
    void (*pfn)(struct kvm *kvm);
    bool (*fn)(struct kvm *kvm);
+   struct kvm *kvm;
    bool ret;
 
+   lockdep_assert_held(&df->kvm_ref_lock);
    lockdep_assert_held(&device->dev_set->lock);
 
+   kvm = df->kvm;
+
+   if (!kvm)
+       return;
+
    pfn = symbol_get(kvm_put_kvm);
    if (WARN_ON(!pfn))
        return;

Re: [Intel-gfx] [PATCH v12 18/24] vfio: Add VFIO_DEVICE_BIND_IOMMUFD

2023-06-26 Thread Liu, Yi L

> From: Jason Gunthorpe 
> Sent: Monday, June 26, 2023 8:56 PM
> 
> On Mon, Jun 26, 2023 at 08:34:26AM +0000, Liu, Yi L wrote:
> > > From: Jason Gunthorpe 
> > > Sent: Saturday, June 24, 2023 12:15 AM
> >
> > > >  }
> > > >
> > > > +static void vfio_device_get_kvm_safe(struct vfio_device_file *df)
> > > > +{
> > > > > +   spin_lock(&df->kvm_ref_lock);
> > > > > +   if (df->kvm)
> > > > > +       _vfio_device_get_kvm_safe(df->device, df->kvm);
> > > > > +   spin_unlock(&df->kvm_ref_lock);
> > > > +}
> > >
> > > I'm surprised symbol_get() can be called from a spinlock, but it sure
> > > looks like it can..
> > >
> > > Also moving the if kvm is null test into _vfio_device_get_kvm_safe()
> > > will save a few lines.
> > >
> > > Also shouldn't be called _vfio_device...
> >
> > Ah, any suggestion on the naming? How about 
> > vfio_device_get_kvm_safe_locked()?
> 
> I thought you were using _df_ now for these functions?
> 

I see. Your point is to pass df to _vfio_device_get_kvm_safe() and
test df->kvm within it, hence the rename to _df_. I think the group
path should be fine with this change as well. Let me make it.

Regards,
Yi Liu

Re: [Intel-gfx] [PATCH v12 18/24] vfio: Add VFIO_DEVICE_BIND_IOMMUFD

2023-06-26 Thread Liu, Yi L

> From: Jason Gunthorpe 
> Sent: Saturday, June 24, 2023 12:15 AM

> >  }
> >
> > +static void vfio_device_get_kvm_safe(struct vfio_device_file *df)
> > +{
> > +   spin_lock(&df->kvm_ref_lock);
> > +   if (df->kvm)
> > +       _vfio_device_get_kvm_safe(df->device, df->kvm);
> > +   spin_unlock(&df->kvm_ref_lock);
> > +}
> 
> I'm surprised symbol_get() can be called from a spinlock, but it sure
> looks like it can..
> 
> Also moving the if kvm is null test into _vfio_device_get_kvm_safe()
> will save a few lines.
> 
> Also shouldn't be called _vfio_device...

Ah, any suggestion on the naming? How about vfio_device_get_kvm_safe_locked()?

Regards,
Yi Liu

Re: [Intel-gfx] ✗ Fi.CI.BUILD: git am --abort

2023-06-18 Thread Liu, Yi L

Maybe you could use the below branch? It’s based on Alex’s vfio next branch.

https://github.com/yiliu1765/iommufd/tree/vfio_device_cdev_v13
(config CONFIG_IOMMUFD=y CONFIG_VFIO_DEVICE_CDEV=y)

From: philly j 
Sent: Monday, June 19, 2023 4:56 AM
To: Intel-Gfx 
Cc: Liu, Yi L 
Subject: Re: [Intel-gfx] ✗ Fi.CI.BUILD: git am --abort

"git am --abort"

On Jun 16, 2023 at 10:03 AM, <patchw...@emeril.freedesktop.org> wrote:

== Series Details ==

Series: Add vfio_device cdev for iommufd support (rev17)
URL   : https://patchwork.freedesktop.org/series/113696/
State : failure

== Summary ==

Error: patch https://patchwork.freedesktop.org/api/1.0/series/113696/revisions/17/mbox/ not applied
Applying: vfio: Allocate per device file structure
Applying: vfio: Refine vfio file kAPIs for KVM
Applying: vfio: Accept vfio device file in the KVM facing kAPI
Applying: kvm/vfio: Prepare for accepting vfio device fd
Applying: kvm/vfio: Accept vfio device file from userspace
Applying: vfio: Pass struct vfio_device_file * to vfio_device_open/close()
Applying: vfio: Block device access via device fd until device is opened
Applying: vfio: Add cdev_device_open_cnt to vfio_group
Applying: vfio: Make vfio_df_open() single open for device cdev path
Applying: vfio-iommufd: Move noiommu compat validation out of vfio_iommufd_bind()
Applying: vfio-iommufd: Split bind/attach into two steps
Applying: vfio: Record devid in vfio_device_file
Applying: vfio-iommufd: Add detach_ioas support for physical VFIO devices
Applying: iommufd/device: Add iommufd_access_detach() API
Applying: vfio-iommufd: Add detach_ioas support for emulated VFIO devices
error: sha1 information is lacking or useless (drivers/vfio/iommufd.c).
error: could not build fake ancestor
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0015 vfio-iommufd: Add detach_ioas support for emulated VFIO devices
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
Build failed, no error log produced

Re: [Intel-gfx] [PATCH v7 8/9] vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device cdev

2023-06-14 Thread Liu, Yi L

> From: Jason Gunthorpe 
> Sent: Wednesday, June 14, 2023 9:38 PM
> 
> On Wed, Jun 14, 2023 at 01:05:45PM +0000, Liu, Yi L wrote:
> > > -EAGAIN basically means the kernel internally malfunctioned - eg it
> > > allocated too little space for the actual size of devices. That is no
> > > longer possible in this version so it should never return -EAGAIN.
> >
> > I still have one doubt. Per my understanding, this is to handle newly
> > plugged devices during the info reporting path. I don’t think holding
> > dev_set lock can prevent it. but maybe -ENOSPC is enough. @Alex,
> > what about your opinion?
> 
> If the device was plug instantly before we computed the size we returned
> ENOSPC
> 
> If it was plugged instantly after we computed the size we returned
> EAGAIN

Yes.

> Here we just resolve this race consistently to always return ENOSPC,
> which always means we ran out of space in the user provided buffer.

This makes sense.
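
To make the agreed behavior concrete, here is a minimal userspace sketch of the "keep counting, store only while there is room" pattern. It is an illustration under assumptions, not the kernel code: struct dep_device, struct fill_info, fill_one() and report_devices() are hypothetical names, and a plain array stands in for the __user buffer and copy_to_user().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vfio_pci_dependent_device. */
struct dep_device {
	unsigned int id;
};

struct fill_info {
	struct dep_device *devices;     /* next free slot in the caller's buffer */
	struct dep_device *devices_end; /* one past the last usable slot */
	unsigned int count;             /* devices seen, stored or not */
};

/* Per-device callback: always count, store only while there is room. */
static int fill_one(struct fill_info *fill, unsigned int id)
{
	fill->count++;
	if (fill->devices >= fill->devices_end)
		return 0; /* keep walking so the total is still counted */

	fill->devices->id = id;
	fill->devices++;
	return 0;
}

/*
 * Walk all devices. *needed always receives the total number of devices,
 * and -ENOSPC is returned when the buffer could not hold all of them.
 */
static int report_devices(const unsigned int *ids, size_t n_ids,
			  struct dep_device *buf, size_t buf_len,
			  unsigned int *needed)
{
	struct fill_info fill = {
		.devices = buf,
		.devices_end = buf + buf_len,
		.count = 0,
	};
	size_t i;

	for (i = 0; i < n_ids; i++)
		fill_one(&fill, ids[i]);

	*needed = fill.count;
	/* Typed pointer difference: number of entries actually stored. */
	if (fill.count != (size_t)(fill.devices - buf))
		return -ENOSPC;
	return 0;
}
```

With this shape the caller can retry once with a buffer sized from *needed, and there is no internal allocation that could turn out too small, so no -EAGAIN case remains.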

> > > - kfree(devices);
> > > - return ret;
> > > + if (fill.count != fill.devices - arg->devices)
> >
> > Should be "if (fill.count != (fill.devices - arg->devices) / 
> > sizeof(arg->devices[0]))" 
> 
> devices is already a typed pointer so the compiler computes the
> /sizeof() itself
> 
> Your version  above is needed if it was void *

Got it.
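
For reference, a small userspace sketch of the point about typed pointers. The struct and helper names are hypothetical; only the C pointer-arithmetic rule itself is being shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct vfio_pci_dependent_device. */
struct dep_device {
	uint32_t id;
	uint16_t segment;
	uint8_t bus;
	uint8_t devfn;
};

/* Typed pointers: the difference is already a count of elements. */
static size_t entries_written(const struct dep_device *start,
			      const struct dep_device *cur)
{
	return (size_t)(cur - start);
}

/* Untyped pointers: the byte distance must be divided by the element size. */
static size_t entries_written_bytes(const void *start, const void *cur)
{
	return (size_t)((const char *)cur - (const char *)start) /
	       sizeof(struct dep_device);
}
```

Both helpers return the same count for the same pair of positions; the explicit division is only needed in the void * variant.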

Regards,
Yi Liu

Re: [Intel-gfx] [PATCH v12 21/24] vfio: Determine noiommu device in __vfio_register_dev()

2023-06-14 Thread Liu, Yi L

> From: Jason Gunthorpe 
> Sent: Wednesday, June 14, 2023 8:23 PM
> On Wed, Jun 14, 2023 at 06:20:10AM +0000, Tian, Kevin wrote:
> > > From: Liu, Yi L 
> > > Sent: Wednesday, June 14, 2023 2:14 PM
> > >
> > >
> > > > With that I think Jason's suggestion is to lift that test into main.c:
> > > >
> > > > int vfio_register_group_dev(struct vfio_device *device)
> > > > {
> > > > /*
> > > >  * VFIO always sets IOMMU_CACHE because we offer no way for
> > > userspace to
> > > >  * restore cache coherency. It has to be checked here because 
> > > > it is
> > > only
> > > >  * valid for cases where we are using iommu groups.
> > > >  */
> > > > if (type == VFIO_IOMMU && !vfio_device_is_noiommu(device) &&
> > > > !device_iommu_capable(dev, IOMMU_CAP_CACHE_COHERENCY))
> > > > return ERR_PTR(-EINVAL);
> > >
> > > vfio_device_is_noiommu() needs to be called after vfio_device_set_group().
> > > Otherwise, it's always false. So still needs to call it in the
> > > __vfio_register_dev().
> >
> > yes
> 
> Right, but it needs to be in vfio_main.c, not in the group.c - so
> another patch should be added to move it.

I've got a patch as below to move it.

From 306e71325d255eef34a1c44312bf9cdc8c302faa Mon Sep 17 00:00:00 2001
From: Yi Liu 
Date: Wed, 14 Jun 2023 00:37:52 -0700
Subject: [PATCH] vfio: Move the IOMMU_CAP_CACHE_COHERENCY check in
 __vfio_register_dev()

The IOMMU_CAP_CACHE_COHERENCY check only applies to physical devices
that are IOMMU-backed. This change prepares for compiling the vfio_group
infrastructure optionally, as cdev does not support physical devices
that are not IOMMU-backed. The check helps fail device registration
for such devices when the vfio_group infrastructure is compiled out.

Signed-off-by: Yi Liu 
---
 drivers/vfio/group.c | 10 --
 drivers/vfio/vfio_main.c | 11 +++
 2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index 41a09a2df690..c2e0128323a7 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -687,16 +687,6 @@ static struct vfio_group *vfio_group_find_or_alloc(struct 
device *dev)
if (!iommu_group)
return ERR_PTR(-EINVAL);
 
-   /*
-* VFIO always sets IOMMU_CACHE because we offer no way for userspace to
-* restore cache coherency. It has to be checked here because it is only
-* valid for cases where we are using iommu groups.
-*/
-   if (!device_iommu_capable(dev, IOMMU_CAP_CACHE_COHERENCY)) {
-   iommu_group_put(iommu_group);
-   return ERR_PTR(-EINVAL);
-   }
-
    mutex_lock(&vfio.group_lock);
group = vfio_group_find_from_iommu(iommu_group);
if (group) {
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 51c80eb32af6..ffb4585b7f0e 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -292,6 +292,17 @@ static int __vfio_register_dev(struct vfio_device *device,
if (ret)
return ret;
 
+   /*
+* VFIO always sets IOMMU_CACHE because we offer no way for userspace to
+* restore cache coherency. It has to be checked here because it is only
+* valid for cases where we are using iommu groups.
+*/
+   if (type == VFIO_IOMMU && !vfio_device_is_noiommu(device) &&
+   !device_iommu_capable(device->dev, IOMMU_CAP_CACHE_COHERENCY)) {
+   ret = -EINVAL;
+   goto err_out;
+   }
+
ret = vfio_device_add(device);
if (ret)
goto err_out;
-- 
2.34.1

> I prefer the idea that vfio_device_is_noiommu() works in all the
> kconfig scenarios rather than adding #ifdefs.

But the vfio_group definition would be empty when CONFIG_VFIO_GROUP is unset.
In what I have now, the stub function always returns false when
CONFIG_VFIO_GROUP is unset.

#if IS_ENABLED(CONFIG_VFIO_GROUP)
struct vfio_group {
    ...;
};

static inline bool vfio_device_is_noiommu(struct vfio_device *vdev)
{
    return IS_ENABLED(CONFIG_VFIO_NOIOMMU) &&
           vdev->group->type == VFIO_NO_IOMMU;
}
#else
struct vfio_group;

static inline bool vfio_device_is_noiommu(struct vfio_device *vdev)
{
    return false;
}
#endif
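
The arrangement above can be sketched in userspace to show why callers then need no #ifdef of their own. This is only a model under assumptions: MODEL_VFIO_GROUP is a hypothetical stand-in for CONFIG_VFIO_GROUP, and every model_* name is invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#ifndef MODEL_VFIO_GROUP
#define MODEL_VFIO_GROUP 0 /* hypothetical stand-in for CONFIG_VFIO_GROUP */
#endif

enum model_group_type { MODEL_IOMMU, MODEL_NO_IOMMU };

#if MODEL_VFIO_GROUP
struct model_group {
	enum model_group_type type;
};
#else
struct model_group; /* opaque: no members may be dereferenced */
#endif

struct model_device {
	struct model_group *group;
};

#if MODEL_VFIO_GROUP
static inline bool model_device_is_noiommu(const struct model_device *d)
{
	return d->group->type == MODEL_NO_IOMMU;
}
#else
/* Stub: without group support there is no noiommu designation at all. */
static inline bool model_device_is_noiommu(const struct model_device *d)
{
	(void)d;
	return false;
}
#endif

/* Caller compiles unchanged in both configurations. */
static int model_register(const struct model_device *d, bool cache_coherent)
{
	if (!model_device_is_noiommu(d) && !cache_coherent)
		return -EINVAL;
	return 0;
}
```

With the option off, every device takes the cache coherency check, which matches the intent that devices without IOMMU backing fail registration when only cdev is built.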

Regards,
Yi Liu

Re: [Intel-gfx] [PATCH v7 8/9] vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device cdev

2023-06-14 Thread Liu, Yi L

> From: Jason Gunthorpe 
> Sent: Wednesday, June 14, 2023 8:17 PM
> 
> On Wed, Jun 14, 2023 at 10:35:10AM +0000, Liu, Yi L wrote:
> 
> > > - if (fill->cur == fill->max)
> > > - return -EAGAIN; /* Something changed, try again */
> > > + if (fill->devices_end >= fill->devices)
> > > + return -ENOSPC;
> >
> > This should be fill->devices_end <= fill->devices.
> 
> Yep
> 
> > Even it's corrected. The
> > new code does not return -EAGAIN.
> 
> Right, there is no EAGAIN. If the caller didn't provide enough space
> the previous version returned ENOSPC:
> 
> > > - if (hdr.argsz < sizeof(hdr) + (fill.max * sizeof(*devices))) {
> > > - ret = -ENOSPC;
> > > - hdr.count = fill.max;
> > > - goto reset_info_exit;
> > > - }
> 
> -EAGAIN basically means the kernel internally malfunctioned - eg it
> allocated too little space for the actual size of devices. That is no
> longer possible in this version so it should never return -EAGAIN.

I still have one doubt. As I understand it, this is to handle devices that
are newly plugged during the info reporting path. I don't think holding the
dev_set lock can prevent that, but maybe -ENOSPC is enough. @Alex,
what is your opinion?

> > And if return -ENOSPC, the expected
> > size should be returned. But I didn't see it. As the hunk below[1] is 
> > removed,
> > seems no way to know how many memory it requires.
> 
> Yes, I missed that, it should keep counting
> 
> Like this then
> 
> diff --git a/drivers/vfio/pci/vfio_pci_core.c 
> b/drivers/vfio/pci/vfio_pci_core.c
> index b0eadafcbcf502..05c064896a7a94 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -775,19 +775,25 @@ static int vfio_pci_count_devs(struct pci_dev *pdev, 
> void
> *data)
>  }
> 
>  struct vfio_pci_fill_info {
> - int max;
> - int cur;
> - struct vfio_pci_dependent_device *devices;
> + struct vfio_pci_dependent_device __user *devices;
> + struct vfio_pci_dependent_device __user *devices_end;
>   struct vfio_device *vdev;
> + u32 count;
>   u32 flags;
>  };
> 
>  static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
>  {
> + struct vfio_pci_dependent_device info = {
> + .segment = pci_domain_nr(pdev->bus),
> + .bus = pdev->bus->number,
> + .devfn = pdev->devfn,
> + };
>   struct vfio_pci_fill_info *fill = data;
> 
> - if (fill->cur == fill->max)
> - return -EAGAIN; /* Something changed, try again */
> + fill->count++;
> + if (fill->devices >= fill->devices_end)
> + return 0;
> 
>   if (fill->flags & VFIO_PCI_HOT_RESET_FLAG_DEV_ID) {
>   struct iommufd_ctx *iommufd = 
> vfio_iommufd_device_ictx(fill->vdev);
> @@ -800,12 +806,12 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, 
> void *data)
>*/
>   vdev = vfio_find_device_in_devset(dev_set, &pdev->dev);
>   if (!vdev)
> - fill->devices[fill->cur].devid = 
> VFIO_PCI_DEVID_NOT_OWNED;
> + info.devid = VFIO_PCI_DEVID_NOT_OWNED;
>   else
> - fill->devices[fill->cur].devid =
> - vfio_iommufd_device_hot_reset_devid(vdev, 
> iommufd);
> + info.devid = vfio_iommufd_device_hot_reset_devid(
> + vdev, iommufd);
>   /* If devid is VFIO_PCI_DEVID_NOT_OWNED, clear owned flag. */
> - if (fill->devices[fill->cur].devid == VFIO_PCI_DEVID_NOT_OWNED)
> + if (info.devid == VFIO_PCI_DEVID_NOT_OWNED)
>   fill->flags &= ~VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED;
>   } else {
>   struct iommu_group *iommu_group;
> @@ -814,13 +820,13 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, 
> void *data)
>   if (!iommu_group)
>   return -EPERM; /* Cannot reset non-isolated devices */
> 
> - fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
> + info.group_id = iommu_group_id(iommu_group);
>   iommu_group_put(iommu_group);
>   }
> - fill->devices[fill->cur].segment = pci_domain_nr(pdev->bus);
> - fill->devices[fill->cur].bus = pdev->bus->number;
> - fill->devices[fill->cur].devfn = pdev->devfn;
> - fill->cur++;
> +
> + if (copy_to_user(fill->devices, , siz

Re: [Intel-gfx] [PATCH v7 8/9] vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device cdev

2023-06-14 Thread Liu, Yi L

> From: Jason Gunthorpe 
> Sent: Wednesday, June 14, 2023 2:23 AM
> 
> On Fri, Jun 02, 2023 at 05:15:14AM -0700, Yi Liu wrote:
> > This allows VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl use the iommufd_ctx
> > of the cdev device to check the ownership of the other affected devices.
> >
> > When VFIO_DEVICE_GET_PCI_HOT_RESET_INFO is called on an IOMMUFD managed
> > device, the new flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID is reported to indicate
> > the values returned are IOMMUFD devids rather than group IDs as used when
> > accessing vfio devices through the conventional vfio group interface.
> > Additionally the flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED will be reported
> > in this mode if all of the devices affected by the hot-reset are owned by
> > either virtue of being directly bound to the same iommufd context as the
> > calling device, or implicitly owned via a shared IOMMU group.
> >
> > Suggested-by: Jason Gunthorpe 
> > Suggested-by: Alex Williamson 
> > Signed-off-by: Yi Liu 
> > ---
> >  drivers/vfio/iommufd.c   | 49 +++
> >  drivers/vfio/pci/vfio_pci_core.c | 47 +-
> >  include/linux/vfio.h | 16 ++
> >  include/uapi/linux/vfio.h| 50 +++-
> >  4 files changed, 154 insertions(+), 8 deletions(-)
> 
> This could use some more fiddling, like we could copy each
> vfio_pci_dependent_device to user memory inside the loop instead of
> allocating an array.

I understand the motivation. But have some concerns. Please check
inline.

> Add another patch with something like this in it:
> 
> diff --git a/drivers/vfio/pci/vfio_pci_core.c 
> b/drivers/vfio/pci/vfio_pci_core.c
> index b0eadafcbcf502..516e0fda74bec9 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -775,19 +775,23 @@ static int vfio_pci_count_devs(struct pci_dev *pdev, 
> void
> *data)
>  }
> 
>  struct vfio_pci_fill_info {
> - int max;
> - int cur;
> - struct vfio_pci_dependent_device *devices;
> + struct vfio_pci_dependent_device __user *devices;
> + struct vfio_pci_dependent_device __user *devices_end;
>   struct vfio_device *vdev;
>   u32 flags;
>  };
> 
>  static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
>  {
> + struct vfio_pci_dependent_device info = {
> + .segment = pci_domain_nr(pdev->bus),
> + .bus = pdev->bus->number,
> + .devfn = pdev->devfn,
> + };
>   struct vfio_pci_fill_info *fill = data;
> 
> - if (fill->cur == fill->max)
> - return -EAGAIN; /* Something changed, try again */
> + if (fill->devices_end >= fill->devices)
> + return -ENOSPC;

This should be fill->devices_end <= fill->devices. Even with that corrected,
the new code no longer returns -EAGAIN. And if it returns -ENOSPC, the expected
size should be returned as well, but I don't see that. As the hunk below[1] is
removed, there seems to be no way to know how much memory is required.

> 
>   if (fill->flags & VFIO_PCI_HOT_RESET_FLAG_DEV_ID) {
>   struct iommufd_ctx *iommufd = 
> vfio_iommufd_device_ictx(fill->vdev);
> @@ -800,12 +804,12 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, 
> void *data)
>*/
>   vdev = vfio_find_device_in_devset(dev_set, &pdev->dev);
>   if (!vdev)
> - fill->devices[fill->cur].devid = 
> VFIO_PCI_DEVID_NOT_OWNED;
> + info.devid = VFIO_PCI_DEVID_NOT_OWNED;
>   else
> - fill->devices[fill->cur].devid =
> - vfio_iommufd_device_hot_reset_devid(vdev, 
> iommufd);
> + info.devid = vfio_iommufd_device_hot_reset_devid(
> + vdev, iommufd);
>   /* If devid is VFIO_PCI_DEVID_NOT_OWNED, clear owned flag. */
> - if (fill->devices[fill->cur].devid == VFIO_PCI_DEVID_NOT_OWNED)
> + if (info.devid == VFIO_PCI_DEVID_NOT_OWNED)
>   fill->flags &= ~VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED;
>   } else {
>   struct iommu_group *iommu_group;
> @@ -814,13 +818,13 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, 
> void *data)
>   if (!iommu_group)
>   return -EPERM; /* Cannot reset non-isolated devices */
> 
> - fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
> + info.group_id = iommu_group_id(iommu_group);
>   iommu_group_put(iommu_group);
>   }
> - fill->devices[fill->cur].segment = pci_domain_nr(pdev->bus);
> - fill->devices[fill->cur].bus = pdev->bus->number;
> - fill->devices[fill->cur].devfn = pdev->devfn;
> - fill->cur++;
> +
> + if (copy_to_user(fill->devices, &info, sizeof(info)))
> + return -EFAULT;
> + fill->devices++;
>   return 0;
>  }
> 
> @@ -1212,8 +1216,7 @@ static int vfio_pci_ioctl_get_pci_hot_reset_info(
>

Re: [Intel-gfx] [PATCH v12 21/24] vfio: Determine noiommu device in __vfio_register_dev()

2023-06-14 Thread Liu, Yi L

> From: Tian, Kevin 
> Sent: Wednesday, June 14, 2023 1:42 PM
> 
> > From: Liu, Yi L 
> > Sent: Wednesday, June 14, 2023 11:24 AM
> >
> > > From: Alex Williamson 
> > > Sent: Wednesday, June 14, 2023 4:11 AM
> > >
> > > On Tue, 13 Jun 2023 14:35:09 -0300
> > > Jason Gunthorpe  wrote:
> > >
> > > > On Tue, Jun 13, 2023 at 11:15:11AM -0600, Alex Williamson wrote:
> > > > > [Sorry for breaking threading, replying to my own message id with 
> > > > > reply
> > > > >  content from Yi since the Cc list got broken]
> > > >
> > > > Yikes it is really busted, I think I fixed it?
> > > >
> > > > > If we renamed your function above to vfio_device_has_iommu_group(),
> > > > > couldn't we just wrap device_add like below instead to not have cdev
> > > > > setup for a noiommu device, generate an error for a physical device
> > w/o
> > > > > IOMMU backing, and otherwise setup the cdev device?
> > > > >
> > > > > static inline int vfio_device_add(struct vfio_device *device, enum
> > vfio_group_type
> > > type)
> > > > > {
> > > > > #if IS_ENABLED(CONFIG_VFIO_GROUP)
> > > > >   if (device->group->type == VFIO_NO_IOMMU)
> > > > > >   return device_add(&device->device);
> > > >
> > > > vfio_device_is_noiommu() embeds the IS_ENABLED
> > >
> > > But patch 23/ makes the definition of struct vfio_group conditional on
> > > CONFIG_VFIO_GROUP, so while CONFIG_VFIO_NOIOMMU depends on
> > > CONFIG_VFIO_GROUP and the result could be determined, I think the
> > > compiler is still unhappy about the undefined reference.  We'd need a
> > > !CONFIG_VFIO_GROUP stub for the function.
> > >
> > > > > #else
> > > > >   if (type == VFIO_IOMMU && !vfio_device_has_iommu_group(device))
> > > > >   return -EINVAL;
> > > > > #endif
> > > >
> > > > The require test is this from the group code:
> > > >
> > > > if (!device_iommu_capable(dev, IOMMU_CAP_CACHE_COHERENCY))
> > {
> > > >
> > > > We could lift it out of the group code and call it from vfio_main.c 
> > > > like:
> > > >
> > > > if (type == VFIO_IOMMU && !vfio_device_is_noiommu(vdev)
> > > && !device_iommu_capable(dev,
> > > >  IOMMU_CAP_CACHE_COHERENCY))
> > > >FAIL
> > >
> > > Ack.  Thanks,
> >
> > So, what I got is:
> >
> > 1) Add the check below in __vfio_register_dev() to fail the physical devices
> > that don't have IOMMU protection.
> >
> >     /*
> >      * noiommu device is a special type supported by the group interface.
> >      * Such type represents the physical devices that are not iommu backed.
> >      */
> >     if (type == VFIO_IOMMU && !vfio_device_is_noiommu(device) &&
> >         !vfio_device_has_iommu_group(device))
> >         return -EINVAL; //or maybe -EOPNOTSUPP?
> >
> > Nit: require a vfio_device_is_noiommu() stub which returns false for
> > the VFIO_GROUP unset case.
> 
> device_iommu_capable(dev, IOMMU_CAP_CACHE_COHERENCY) is valid
> only for cases with iommu groups. So that check already  covers the
> group verification indirectly.

Okay. This IOMMU_CAP_CACHE_COHERENCY check is missing from the cdev
path.

> With that I think Jason's suggestion is to lift that test into main.c:
> 
> int vfio_register_group_dev(struct vfio_device *device)
> {
>   /*
>* VFIO always sets IOMMU_CACHE because we offer no way for userspace to
>* restore cache coherency. It has to be checked here because it is only
>* valid for cases where we are using iommu groups.
>*/
>   if (type == VFIO_IOMMU && !vfio_device_is_noiommu(device) &&
>   !device_iommu_capable(dev, IOMMU_CAP_CACHE_COHERENCY))
>   return ERR_PTR(-EINVAL);

vfio_device_is_noiommu() needs to be called after vfio_device_set_group().
Otherwise, it's always false. So it still needs to be called in
__vfio_register_dev().

>   return __vfio_register_dev(device, VFIO_IOMMU);
> }
> 
> >
> > 2) Have below functions to add device
> >
> > #if IS_ENABLED(CONFIG_VFIO_DEVICE_CDEV)
> > static inline int vfio_device_add(struct vfio_device *device)
> > {
> >     if (vfio_device_is_noiommu(device))
> >         return device_add(&device->device);
> >     vfio_init_device_cdev(device);
> >     return cdev_device_add(&device->cdev, &device->device);
> > }
> >
> > static inline void vfio_device_del(struct vfio_device *device)
> > {
> >     if (vfio_device_is_noiommu(device))
> >         return device_del(&device->device);
> >     cdev_device_del(&device->cdev, &device->device);
> > }
> 
> Correct

Regards,
Yi Liu

Re: [Intel-gfx] [PATCH v7 6/9] vfio: Mark cdev usage in vfio_device

2023-06-13 Thread Liu, Yi L

> From: Jason Gunthorpe 
> Sent: Wednesday, June 14, 2023 1:56 AM
> 
> On Fri, Jun 02, 2023 at 05:15:12AM -0700, Yi Liu wrote:
> > This can be used to differentiate whether to report group_id or devid in
> > the revised VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl. At this moment, no
> > cdev path yet, so the vfio_device_cdev_opened() helper always returns false.
> >
> > Reviewed-by: Kevin Tian 
> > Tested-by: Terrence Xu 
> > Signed-off-by: Yi Liu 
> > ---
> >  include/linux/vfio.h | 5 +
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> > index 2c137ea94a3e..2a45853773a6 100644
> > --- a/include/linux/vfio.h
> > +++ b/include/linux/vfio.h
> > @@ -139,6 +139,11 @@ int vfio_iommufd_emulated_attach_ioas(struct 
> > vfio_device
> *vdev, u32 *pt_id);
> > ((int (*)(struct vfio_device *vdev, u32 *pt_id)) NULL)
> >  #endif
> >
> > +static inline bool vfio_device_cdev_opened(struct vfio_device *device)
> > +{
> > +   return false;
> > +}
> 
> This and the two hunks in the other two patches that use this function
> should be folded into the cdev series, probably just flattened to one
> patch

Hmmm. I have a doubt about the rule. I think the reason to have this
sub-series is to avoid bumping the cdev series. So perhaps we can still
keep this and patch 9 in this series? Otherwise, most of this series
would need to move into the cdev series.

Regards,
Yi Liu

Re: [Intel-gfx] [PATCH v12 21/24] vfio: Determine noiommu device in __vfio_register_dev()

2023-06-13 Thread Liu, Yi L

> From: Alex Williamson 
> Sent: Wednesday, June 14, 2023 4:11 AM
> 
> On Tue, 13 Jun 2023 14:35:09 -0300
> Jason Gunthorpe  wrote:
> 
> > On Tue, Jun 13, 2023 at 11:15:11AM -0600, Alex Williamson wrote:
> > > [Sorry for breaking threading, replying to my own message id with reply
> > >  content from Yi since the Cc list got broken]
> >
> > Yikes it is really busted, I think I fixed it?
> >
> > > If we renamed your function above to vfio_device_has_iommu_group(),
> > > couldn't we just wrap device_add like below instead to not have cdev
> > > setup for a noiommu device, generate an error for a physical device w/o
> > > IOMMU backing, and otherwise setup the cdev device?
> > >
> > > static inline int vfio_device_add(struct vfio_device *device, enum 
> > > vfio_group_type
> type)
> > > {
> > > #if IS_ENABLED(CONFIG_VFIO_GROUP)
> > >   if (device->group->type == VFIO_NO_IOMMU)
> > >   return device_add(&device->device);
> >
> > vfio_device_is_noiommu() embeds the IS_ENABLED
> 
> But patch 23/ makes the definition of struct vfio_group conditional on
> CONFIG_VFIO_GROUP, so while CONFIG_VFIO_NOIOMMU depends on
> CONFIG_VFIO_GROUP and the result could be determined, I think the
> compiler is still unhappy about the undefined reference.  We'd need a
> !CONFIG_VFIO_GROUP stub for the function.
> 
> > > #else
> > >   if (type == VFIO_IOMMU && !vfio_device_has_iommu_group(device))
> > >   return -EINVAL;
> > > #endif
> >
> > The require test is this from the group code:
> >
> > if (!device_iommu_capable(dev, IOMMU_CAP_CACHE_COHERENCY)) {
> >
> > We could lift it out of the group code and call it from vfio_main.c like:
> >
> > if (type == VFIO_IOMMU && !vfio_device_is_noiommu(vdev)
> && !device_iommu_capable(dev,
> >  IOMMU_CAP_CACHE_COHERENCY))
> >FAIL
> 
> Ack.  Thanks,

So, what I got is:

1) Add the check below in __vfio_register_dev() to fail the physical devices
that don't have IOMMU protection.

    /*
     * noiommu device is a special type supported by the group interface.
     * Such type represents the physical devices that are not iommu backed.
     */
    if (type == VFIO_IOMMU && !vfio_device_is_noiommu(device) &&
        !vfio_device_has_iommu_group(device))
        return -EINVAL; //or maybe -EOPNOTSUPP?

Nit: this requires a vfio_device_is_noiommu() stub which returns false for
the VFIO_GROUP unset case.

2) Have below functions to add device

#if IS_ENABLED(CONFIG_VFIO_DEVICE_CDEV)
static inline int vfio_device_add(struct vfio_device *device)
{
    if (vfio_device_is_noiommu(device))
        return device_add(&device->device);
    vfio_init_device_cdev(device);
    return cdev_device_add(&device->cdev, &device->device);
}

static inline void vfio_device_del(struct vfio_device *device)
{
    if (vfio_device_is_noiommu(device))
        return device_del(&device->device);
    cdev_device_del(&device->cdev, &device->device);
}
#else
blabla
#endif

Regards,
Yi Liu

Re: [Intel-gfx] [PATCH v12 24/24] docs: vfio: Add vfio device cdev description

2023-06-13 Thread Liu, Yi L

> From: Alex Williamson 
> Sent: Tuesday, June 13, 2023 11:04 PM
> 
> > > >
> > > > >
> > > > > Unless I missed it, we've not described that vfio device cdev access 
> > > > > is
> > > > > still bound by IOMMU group semantics, ie. there can be one DMA owner
> > > > > for the group.  That's a pretty common failure point for 
> > > > > multi-function
> > > > > consumer device use cases, so the why, where, and how it fails should
> > > > > be well covered.
> > > >
> > > > Yes. this needs to be documented. How about below words:
> > > >
> > > > vfio device cdev access is still bound by IOMMU group semantics, ie. 
> > > > there
> > > > can be only one DMA owner for the group.  Devices belonging to the same
> > > > group can not be bound to multiple iommufd_ctx.
> > >
> > > ... or shared between native kernel and vfio drivers.
> >
> > I suppose you mean the devices in one group are bound to different
> > drivers. right?
> 
> Essentially, but we need to be careful that we're developing multiple
> vfio drivers for a given bus now, which is why I try to distinguish
> between the two sets of drivers.  Thanks,

Indeed. There is a set of vfio drivers. Can even pci-stub be considered
part of this set? Perhaps it is more precise to say: or shared between drivers
that set the struct pci_driver::driver_managed_dma flag and drivers
that do not.
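
As a way to picture the rule for the documentation, here is a toy userspace model of "one DMA owner per group". It is a sketch under assumptions, not the kernel's ownership code: struct group_model, group_claim() and group_release() are hypothetical, and the errno values are only illustrative.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of an IOMMU group's DMA ownership state. */
struct group_model {
	const void *owner;           /* NULL: no user owner has claimed the group */
	unsigned int owner_cnt;      /* devices bound under that owner */
	unsigned int kernel_drivers; /* devices bound to drivers that do not set
				      * driver_managed_dma */
};

/* Claim the group on behalf of one owner, e.g. one iommufd context. */
static int group_claim(struct group_model *g, const void *owner)
{
	if (!owner)
		return -EINVAL;
	if (g->kernel_drivers)
		return -EBUSY; /* shared with a native kernel driver */
	if (g->owner && g->owner != owner)
		return -EPERM; /* already owned by a different context */

	g->owner = owner;
	g->owner_cnt++;
	return 0;
}

/* Drop one claim; the group becomes free when the last claim goes away. */
static void group_release(struct group_model *g)
{
	if (g->owner_cnt && --g->owner_cnt == 0)
		g->owner = NULL;
}
```

In this model two devices of one group can be bound under the same context, while a second context, or a sibling device still attached to a native driver, makes the claim fail. That is the multi-function failure case the documentation should spell out.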

Regards,
Yi Liu

Re: [Intel-gfx] [PATCH v12 21/24] vfio: Determine noiommu device in __vfio_register_dev()

2023-06-13 Thread Liu, Yi L

> From: Alex Williamson 
> Sent: Tuesday, June 13, 2023 10:48 PM
> 
> On Tue, 13 Jun 2023 14:33:01 +0000
> "Liu, Yi L"  wrote:
> 
> > > From: Alex Williamson 
> > > Sent: Tuesday, June 13, 2023 10:19 PM
> > >
> > > On Tue, 13 Jun 2023 05:53:42 +0000
> > > "Liu, Yi L"  wrote:
> > >
> > > > > From: Alex Williamson 
> > > > > Sent: Tuesday, June 13, 2023 6:42 AM
> > > > >
> > > > > On Fri,  2 Jun 2023 05:16:50 -0700
> > > > > Yi Liu  wrote:
> > > > >
> > > > > > This moves the noiommu device determination and noiommu taint out of
> > > > > > vfio_group_find_or_alloc(). noiommu device is determined in
> > > > > > __vfio_register_dev() and result is stored in flag 
> > > > > > vfio_device->noiommu,
> > > > > > the noiommu taint is added in the end of __vfio_register_dev().
> > > > > >
> > > > > > This is also a preparation for compiling out vfio_group 
> > > > > > infrastructure
> > > > > > as it makes the noiommu detection and taint common between the cdev 
> > > > > > path
> > > > > > and group path though cdev path does not support noiommu.
> > > > >
> > > > > Does this really still make sense?  The motivation for the change is
> > > > > really not clear without cdev support for noiommu.  Thanks,
> > > >
> > > > I think it still makes sense. When CONFIG_VFIO_GROUP==n, the kernel
> > > > only supports cdev interface. If there is noiommu device, vfio should
> > > > fail the registration. So, the noiommu determination is still needed. 
> > > > But
> > > > I'd admit the taint might still be in the group code.
> > >
> > > How is there going to be a noiommu device when VFIO_GROUP is unset?
> >
> > How about booting a kernel with iommu disabled, then all the devices
> > are not protected by iommu. I suppose they are noiommu devices. If
> > user wants to bound them to vfio, the kernel should have VFIO_GROUP.
> > Otherwise, needs to fail.
> 
> "noiommu" is a vfio designation of a device, it must be created by
> vfio.  

Sure.

> There can certainly be devices which are not IOMMU backed, but
> without vfio designating them as noiommu devices, which is only done
> via the legacy and compat paths, there's no such thing as a noiommu
> device. 

Yes.

> Devices without an IOMMU are simply out of scope for cdev,
> there should never be a vfio cdev entry created for them.  Thanks,

Actually, this is what I want to solve. I need to check whether a device is
IOMMU backed and, based on this, either prevent creating a cdev entry for
it in the coming cdev support or fail registration if VFIO_GROUP is
unset.

If this patch is not good, I can write vfio_device_is_noiommu() as
below for the case where VFIO_GROUP is unset. What is your
opinion?

static inline bool vfio_device_is_noiommu(struct vfio_device *vdev)
{
	struct iommu_group *iommu_group;

	iommu_group = iommu_group_get(vdev->dev);
	iommu_group_put(iommu_group); /* Accepts NULL */
	return !iommu_group;
}
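
To make the proposal easier to discuss, the registration decision can be
modeled in plain userspace C. This is only a sketch: model_register(),
struct model_cfg and its fields are hypothetical names, and the rule that a
device without an IOMMU group fails registration when the group code is
compiled out is what this mail proposes, not merged kernel behavior.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel configuration and module option. */
struct model_cfg {
	bool group_built;	/* models CONFIG_VFIO_GROUP */
	bool noiommu_opt;	/* models the vfio_noiommu option */
};

/*
 * Returns 0 and sets *noiommu on success, or -EINVAL when the device
 * cannot be registered.
 */
static int model_register(const struct model_cfg *cfg, bool has_iommu_group,
			  bool *noiommu)
{
	*noiommu = false;
	if (has_iommu_group)
		return 0;
	/* No IOMMU group: only the group path could expose the device. */
	if (!cfg->group_built || !cfg->noiommu_opt)
		return -EINVAL;
	*noiommu = true;
	return 0;
}
```

In this model a cdev-only build never ends up with a noiommu device: the
device either has an IOMMU group or registration fails.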

Regards,
Yi Liu

Re: [Intel-gfx] [PATCH v12 24/24] docs: vfio: Add vfio device cdev description

2023-06-13 Thread Liu, Yi L

> From: Alex Williamson 
> Sent: Tuesday, June 13, 2023 10:24 PM
> 
> On Tue, 13 Jun 2023 12:01:51 +0000
> "Liu, Yi L"  wrote:
> 
> > > From: Alex Williamson 
> > > Sent: Tuesday, June 13, 2023 7:06 AM
> > >
> > > On Fri,  2 Jun 2023 05:16:53 -0700
> > > Yi Liu  wrote:
> > >
> > > > This gives notes for userspace applications on device cdev usage.
> > > >
> > > > Reviewed-by: Kevin Tian 
> > > > Signed-off-by: Yi Liu 
> > > > ---
> > > >  Documentation/driver-api/vfio.rst | 132 ++
> > > >  1 file changed, 132 insertions(+)
> > > >
> > > > diff --git a/Documentation/driver-api/vfio.rst 
> > > > b/Documentation/driver-api/vfio.rst
> > > > index 363e12c90b87..f00c9b86bda0 100644
> > > > --- a/Documentation/driver-api/vfio.rst
> > > > +++ b/Documentation/driver-api/vfio.rst
> > > > @@ -239,6 +239,130 @@ group and can access them as follows::
> > > > /* Gratuitous device reset and go... */
> > > > ioctl(device, VFIO_DEVICE_RESET);
> > > >
> > > > +IOMMUFD and vfio_iommu_type1
> > > > +
> > > > +
> > > > +IOMMUFD is the new user API to manage I/O page tables from userspace.
> > > > +It intends to be the portal of delivering advanced userspace DMA
> > > > +features (nested translation [5]_, PASID [6]_, etc.) while also 
> > > > providing
> > > > +a backwards compatibility interface for existing VFIO_TYPE1v2_IOMMU use
> > > > +cases.  Eventually the vfio_iommu_type1 driver, as well as the legacy
> > > > +vfio container and group model is intended to be deprecated.
> > > > +
> > > > +The IOMMUFD backwards compatibility interface can be enabled two ways.
> > > > +In the first method, the kernel can be configured with
> > > > +CONFIG_IOMMUFD_VFIO_CONTAINER, in which case the IOMMUFD subsystem
> > > > +transparently provides the entire infrastructure for the VFIO
> > > > +container and IOMMU backend interfaces.  The compatibility mode can
> > > > +also be accessed if the VFIO container interface, ie. /dev/vfio/vfio is
> > > > +simply symlink'd to /dev/iommu.  Note that at the time of writing, the
> > > > +compatibility mode is not entirely feature complete relative to
> > > > +VFIO_TYPE1v2_IOMMU (ex. DMA mapping MMIO) and does not attempt to
> > > > +provide compatibility to the VFIO_SPAPR_TCE_IOMMU interface.  Therefore
> > > > +it is not generally advisable at this time to switch from native VFIO
> > > > +implementations to the IOMMUFD compatibility interfaces.
> > > > +
> > > > +Long term, VFIO users should migrate to device access through the cdev
> > > > +interface described below, and native access through the IOMMUFD
> > > > +provided interfaces.
> > > > +
> > > > +VFIO Device cdev
> > > > +
> > > > +
> > > > +Traditionally user acquires a device fd via VFIO_GROUP_GET_DEVICE_FD
> > > > +in a VFIO group.
> > > > +
> > > > +With CONFIG_VFIO_DEVICE_CDEV=y the user can now acquire a device fd
> > > > +by directly opening a character device /dev/vfio/devices/vfioX where
> > > > +"X" is the number allocated uniquely by VFIO for registered devices.
> > > > +cdev interface does not support noiommu, so user should use the legacy
> > > > +group interface if noiommu is needed.
> > > > +
> > > > +The cdev only works with IOMMUFD.  Both VFIO drivers and applications
> > > > +must adapt to the new cdev security model which requires using
> > > > +VFIO_DEVICE_BIND_IOMMUFD to claim DMA ownership before starting to
> > > > +actually use the device.  Once BIND succeeds then a VFIO device can
> > > > +be fully accessed by the user.
> > > > +
> > > > +VFIO device cdev doesn't rely on VFIO group/container/iommu drivers.
> > > > +Hence those modules can be fully compiled out in an environment
> > > > +where no legacy VFIO application exists.
> > > > +
> > > > +So far SPAPR does not support IOMMUFD yet.  So it cannot support device
> > > > +cdev neither.
> > >
> > > s/neither/either/
> >
> > Got it.
> >
> > >
> > > Unless I missed it, we've not described that vfio device cdev access is

Re: [Intel-gfx] [PATCH v12 07/24] vfio: Block device access via device fd until device is opened

2023-06-13 Thread Liu, Yi L

> From: Alex Williamson 
> Sent: Tuesday, June 13, 2023 10:42 PM
> 
> On Tue, 13 Jun 2023 14:36:14 +0000
> "Liu, Yi L"  wrote:

> > > > > >
> > > > > > diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> > > > > > index f9eb52eb9ed7..fdf2fc73f880 100644
> > > > > > --- a/drivers/vfio/vfio.h
> > > > > > +++ b/drivers/vfio/vfio.h
> > > > > > @@ -18,6 +18,7 @@ struct vfio_container;
> > > > > >
> > > > > >  struct vfio_device_file {
> > > > > > struct vfio_device *device;
> > > > > > +   bool access_granted;
> > > > >
> > > > > Should we make this a more strongly defined data type and later move
> > > > > devid (u32) here to partially fill the hole created?
> > > >
> > > > Before your question, let me describe how I place the fields
> > > > of this structure to see if it is common practice. The first two
> > > > fields are static, so they are in the beginning. The access_granted
> > > > is lockless and other fields are protected by locks. So I tried to
> > > > put the lock and the fields it protects closely. So this is why I put
> > > > devid behind iommufd as both are protected by the same lock.
> > >
> > > I think the primary considerations are locality and compactness.  Hot
> > > paths data should be within the first cache line of the structure,
> > > related data should share a cache line, and we should use the space
> > > efficiently.  What you describe seems largely an aesthetic concern,
> > > which was not evident to me by the segmentation alone.
> >
> > Sure.
> >
> > >
> > > > struct vfio_device_file {
> > > > struct vfio_device *device;
> > > > struct vfio_group *group;
> > > >
> > > > bool access_granted;
> > > > spinlock_t kvm_ref_lock; /* protect kvm field */
> > > > struct kvm *kvm;
> > > > struct iommufd_ctx *iommufd; /* protected by struct 
> > > > vfio_device_set::lock */
> > > > u32 devid; /* only valid when iommufd is valid */
> > > > };
> > > >
> > > > >
> > > > > I think this is being placed towards the front of the data structure
> > > > > for cache line locality given this is a hot path for file operations.
> > > > > But bool types have an implementation dependent size, making them
> > > > > difficult to pack.  Also there will be a tendency to want to make this
> > > > > a bit field, which is probably not compatible with the smp lockless
> > > > > operations being used here.  We might get in front of these issues if
> > > > > we just define it as a u8 now.  Thanks,
> > > >
> > > > I don't quite get why a bit field would be incompatible with smp
> > > > lockless operations. Could you elaborate a bit? And should I define
> > > > access_granted as u8 or "u8:1"?
> > >
> > > Perhaps FUD on my part, but load-acquire type operations have specific
> > > semantics and it's not clear to me that they interact with compiler
> > > generated bit operations.  Thanks,
> >
> > I see. How about below?
> >
> > struct vfio_device_file {
> > struct vfio_device *device;
> > struct vfio_group *group;
> > u8 access_granted;
> > u32 devid; /* only valid when iommufd is valid */
> > spinlock_t kvm_ref_lock; /* protect kvm field */
> > struct kvm *kvm;
> > struct iommufd_ctx *iommufd; /* protected by struct 
> > vfio_device_set::lock */
> > };
> 
> Yep, that's essentially what I was suggesting.  Thanks,

Got it. 
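
For reference, the pairing that motivates keeping access_granted as a whole
byte can be sketched in userspace with C11 atomics. This is an analogy
only: struct model_df and the two helpers are hypothetical, and
atomic_store_explicit()/atomic_load_explicit() stand in for the kernel's
smp_store_release()/smp_load_acquire(). The -EINVAL return is illustrative.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical userspace model of struct vfio_device_file. */
struct model_df {
	int device_state;		/* set up before access is granted */
	_Atomic uint8_t access_granted;	/* whole byte, never a bit field */
};

/* Writer: finish setup, then publish the flag with release ordering. */
static void model_grant_access(struct model_df *df, int state)
{
	df->device_state = state;
	atomic_store_explicit(&df->access_granted, 1, memory_order_release);
}

/* Reader: the acquire load makes the earlier setup visible. */
static int model_read_state(struct model_df *df)
{
	if (!atomic_load_explicit(&df->access_granted, memory_order_acquire))
		return -EINVAL;
	return df->device_state;
}
```

The flag is loaded and stored as one object, so there is no compiler
generated read-modify-write of neighbouring bits involved.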

Regards,
Yi Liu

Re: [Intel-gfx] [PATCH v12 18/24] vfio: Add VFIO_DEVICE_BIND_IOMMUFD

2023-06-13 Thread Liu, Yi L

> From: Alex Williamson 
> Sent: Tuesday, June 13, 2023 10:40 PM
> 
> On Tue, 13 Jun 2023 14:28:43 +0000
> "Liu, Yi L"  wrote:
> 
> > > From: Alex Williamson 
> > > Sent: Tuesday, June 13, 2023 10:18 PM
> >
> > > > > > diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> > > > > > index 83cc5dc28b7a..e80a8ac86e46 100644
> > > > > > --- a/include/linux/vfio.h
> > > > > > +++ b/include/linux/vfio.h
> > > > > > @@ -66,6 +66,7 @@ struct vfio_device {
> > > > > > struct iommufd_device *iommufd_device;
> > > > > > bool iommufd_attached;
> > > > > >  #endif
> > > > > > +   bool cdev_opened:1;
> > > > >
> > > > > Perhaps a more strongly defined data type here as well and roll
> > > > > iommufd_attached into the same bit field scheme.
> > > >
> > > > Ok, then needs to make iommufd_attached always defined.
> > >
> > > That does not follow.  Thanks,
> >
> > Well, I meant that iommufd_attached is currently defined only when
> > CONFIG_IOMMUFD is enabled. To roll it in with cdev_opened, this needs
> > to change.
> 
> Understood, but I don't think it's true.  If defined we use one more
> bit of the bit field, which is a consideration when we approach filling
> it, but we're not using bit-shift operations to address these bits, so
> why does it matter if one has compiler conditional usage?  Thanks,

Aha, I see. So you are suggesting something like the below. Is that right?

#if IS_ENABLED(CONFIG_IOMMUFD)
struct iommufd_device *iommufd_device;
u8 iommufd_attached:1;
#endif
u8 cdev_opened:1;
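
A quick userspace check of that layout, as a sketch only: MODEL_IOMMUFD is a
hypothetical stand-in for IS_ENABLED(CONFIG_IOMMUFD), uint8_t stands in for
u8, and the helper names are made up. Bit fields of type uint8_t are a
compiler extension that GCC and Clang accept.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for IS_ENABLED(CONFIG_IOMMUFD). */
#define MODEL_IOMMUFD 1

struct model_device {
#if MODEL_IOMMUFD
	void *iommufd_device;
	uint8_t iommufd_attached:1;
#endif
	uint8_t cdev_opened:1;
};

/* Bits are addressed by name, so a conditionally compiled neighbour is fine. */
static void model_cdev_open(struct model_device *dev)
{
	dev->cdev_opened = 1;
}

/* Counts the flags that are currently set. */
static int model_flags_set(const struct model_device *dev)
{
	int n = dev->cdev_opened;
#if MODEL_IOMMUFD
	n += dev->iommufd_attached;
#endif
	return n;
}
```

Setting MODEL_IOMMUFD to 0 still compiles; cdev_opened then simply becomes
the first bit of the field.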

Regards,
Yi Liu

Re: [Intel-gfx] [PATCH v12 07/24] vfio: Block device access via device fd until device is opened

2023-06-13 Thread Liu, Yi L

> From: Alex Williamson 
> Sent: Tuesday, June 13, 2023 10:17 PM
> 
> On Tue, 13 Jun 2023 05:46:32 +0000
> "Liu, Yi L"  wrote:
> 
> > > From: Alex Williamson 
> > > Sent: Tuesday, June 13, 2023 5:52 AM
> > >
> > > On Fri,  2 Jun 2023 05:16:36 -0700
> > > Yi Liu  wrote:
> > >
> > > > Allow the vfio_device file to be in a state where the device FD is
> > > > opened but the device cannot be used by userspace (i.e. its 
> > > > .open_device()
> > > > hasn't been called). This inbetween state is not used when the device
> > > > FD is spawned from the group FD, however when we create the device FD
> > > > directly by opening a cdev it will be opened in the blocked state.
> > > >
> > > > The reason for the inbetween state is that userspace only gets a FD but
> > > > doesn't gain access permission until binding the FD to an iommufd. So in
> > > > the blocked state, only the bind operation is allowed. Completing bind
> > > > will allow user to further access the device.
> > > >
> > > > This is implemented by adding a flag in struct vfio_device_file to mark
> > > > the blocked state and using a simple smp_load_acquire() to obtain the
> > > > flag value and serialize all the device setup with the thread accessing
> > > > this device.
> > > >
> > > > Following this lockless scheme, it can safely handle the device FD
> > > > unbound->bound but it cannot handle bound->unbound. To allow this we'd
> > > > need to add a lock on all the vfio ioctls which seems costly. So once
> > > > device FD is bound, it remains bound until the FD is closed.
> > > >
> > > > Suggested-by: Jason Gunthorpe 
> > > > Reviewed-by: Kevin Tian 
> > > > Reviewed-by: Jason Gunthorpe 
> > > > Reviewed-by: Eric Auger 
> > > > Tested-by: Terrence Xu 
> > > > Tested-by: Nicolin Chen 
> > > > Tested-by: Matthew Rosato 
> > > > Tested-by: Yanting Jiang 
> > > > Tested-by: Shameer Kolothum 
> > > > Signed-off-by: Yi Liu 
> > > > ---
> > > >  drivers/vfio/group.c | 11 ++-
> > > >  drivers/vfio/vfio.h  |  1 +
> > > >  drivers/vfio/vfio_main.c | 16 
> > > >  3 files changed, 27 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
> > > > index caf53716ddb2..088dd34c8931 100644
> > > > --- a/drivers/vfio/group.c
> > > > +++ b/drivers/vfio/group.c
> > > > @@ -194,9 +194,18 @@ static int vfio_df_group_open(struct 
> > > > vfio_device_file *df)
> > > > df->iommufd = device->group->iommufd;
> > > >
> > > > ret = vfio_df_open(df);
> > > > -   if (ret)
> > > > +   if (ret) {
> > > > df->iommufd = NULL;
> > > > +   goto out_put_kvm;
> > > > +   }
> > > > +
> > > > +   /*
> > > > +* Paired with smp_load_acquire() in vfio_device_fops::ioctl/
> > > > +* read/write/mmap and vfio_file_has_device_access()
> > > > +*/
> > > > > +   smp_store_release(&df->access_granted, true);
> > > >
> > > > +out_put_kvm:
> > > > if (device->open_count == 0)
> > > > vfio_device_put_kvm(device);
> > > >
> > > > diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> > > > index f9eb52eb9ed7..fdf2fc73f880 100644
> > > > --- a/drivers/vfio/vfio.h
> > > > +++ b/drivers/vfio/vfio.h
> > > > @@ -18,6 +18,7 @@ struct vfio_container;
> > > >
> > > >  struct vfio_device_file {
> > > > struct vfio_device *device;
> > > > +   bool access_granted;
> > >
> > > Should we make this a more strongly defined data type and later move
> > > devid (u32) here to partially fill the hole created?
> >
> > Before your question, let me describe how I place the fields
> > of this structure to see if it is common practice. The first two
> > fields are static, so they are in the beginning. The access_granted
> > is lockless and other fields are protected by locks. So I tried to
> > put the lock and the fields it protects closely. So this is why I put
> > devid behind iommufd as both are pr

Re: [Intel-gfx] [PATCH v12 21/24] vfio: Determine noiommu device in __vfio_register_dev()

2023-06-13 Thread Liu, Yi L

> From: Alex Williamson 
> Sent: Tuesday, June 13, 2023 10:19 PM
> 
> On Tue, 13 Jun 2023 05:53:42 +0000
> "Liu, Yi L"  wrote:
> 
> > > From: Alex Williamson 
> > > Sent: Tuesday, June 13, 2023 6:42 AM
> > >
> > > On Fri,  2 Jun 2023 05:16:50 -0700
> > > Yi Liu  wrote:
> > >
> > > > This moves the noiommu device determination and noiommu taint out of
> > > > vfio_group_find_or_alloc(). noiommu device is determined in
> > > > __vfio_register_dev() and result is stored in flag vfio_device->noiommu,
> > > > the noiommu taint is added in the end of __vfio_register_dev().
> > > >
> > > > This is also a preparation for compiling out vfio_group infrastructure
> > > > as it makes the noiommu detection and taint common between the cdev path
> > > > and group path though cdev path does not support noiommu.
> > >
> > > Does this really still make sense?  The motivation for the change is
> > > really not clear without cdev support for noiommu.  Thanks,
> >
> > I think it still makes sense. When CONFIG_VFIO_GROUP==n, the kernel
> > only supports cdev interface. If there is noiommu device, vfio should
> > fail the registration. So, the noiommu determination is still needed. But
> > I'd admit the taint might still be in the group code.
> 
> How is there going to be a noiommu device when VFIO_GROUP is unset?

How about booting a kernel with the IOMMU disabled? Then none of the
devices are protected by an IOMMU, and I suppose they are noiommu devices.
If the user wants to bind them to vfio, the kernel should have VFIO_GROUP.
Otherwise, registration needs to fail.

Regards,
Yi Liu

> Thanks,
> 
> Alex
> 
> 
> > > > Suggested-by: Alex Williamson 
> > > > Signed-off-by: Yi Liu 
> > > > ---
> > > >  drivers/vfio/group.c | 15 ---
> > > >  drivers/vfio/vfio_main.c | 31 ++-
> > > >  include/linux/vfio.h |  1 +
> > > >  3 files changed, 31 insertions(+), 16 deletions(-)
> > > >
> > > > diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
> > > > index 653b62f93474..64cdd0ea8825 100644
> > > > --- a/drivers/vfio/group.c
> > > > +++ b/drivers/vfio/group.c
> > > > @@ -668,21 +668,6 @@ static struct vfio_group 
> > > > *vfio_group_find_or_alloc(struct
> > > device *dev)
> > > > struct vfio_group *group;
> > > >
> > > > iommu_group = iommu_group_get(dev);
> > > > -   if (!iommu_group && vfio_noiommu) {
> > > > -   /*
> > > > -* With noiommu enabled, create an IOMMU group for 
> > > > devices that
> > > > -* don't already have one, implying no IOMMU 
> > > > hardware/driver
> > > > -* exists.  Taint the kernel because we're about to 
> > > > give a DMA
> > > > -* capable device to a user without IOMMU protection.
> > > > -*/
> > > > -   group = vfio_noiommu_group_alloc(dev, VFIO_NO_IOMMU);
> > > > -   if (!IS_ERR(group)) {
> > > > -   add_taint(TAINT_USER, LOCKDEP_STILL_OK);
> > > > -   dev_warn(dev, "Adding kernel taint for 
> > > > vfio-noiommu group on
> > > device\n");
> > > > -   }
> > > > -   return group;
> > > > -   }
> > > > -
> > > > if (!iommu_group)
> > > > return ERR_PTR(-EINVAL);
> > > >
> > > > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> > > > index 6d8f9b0f3637..00a699b9f76b 100644
> > > > --- a/drivers/vfio/vfio_main.c
> > > > +++ b/drivers/vfio/vfio_main.c
> > > > @@ -265,6 +265,18 @@ static int vfio_init_device(struct vfio_device 
> > > > *device,
> struct
> > > device *dev,
> > > > return ret;
> > > >  }
> > > >
> > > > +static int vfio_device_set_noiommu(struct vfio_device *device)
> > > > +{
> > > > +   struct iommu_group *iommu_group = iommu_group_get(device->dev);
> > > > +
> > > > +   if (!iommu_group && !vfio_noiommu)
> > > > +   return -EINVAL;
> > > > +
> > > > +   device->noiommu = !iommu_group;
> > > > +

Re: [Intel-gfx] [PATCH v12 18/24] vfio: Add VFIO_DEVICE_BIND_IOMMUFD

2023-06-13 Thread Liu, Yi L

> From: Alex Williamson 
> Sent: Tuesday, June 13, 2023 10:18 PM

> > > > diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> > > > index 83cc5dc28b7a..e80a8ac86e46 100644
> > > > --- a/include/linux/vfio.h
> > > > +++ b/include/linux/vfio.h
> > > > @@ -66,6 +66,7 @@ struct vfio_device {
> > > > struct iommufd_device *iommufd_device;
> > > > bool iommufd_attached;
> > > >  #endif
> > > > +   bool cdev_opened:1;
> > >
> > > Perhaps a more strongly defined data type here as well and roll
> > > iommufd_attached into the same bit field scheme.
> >
> > Ok, then needs to make iommufd_attached always defined.
> 
> That does not follow.  Thanks,

Well, I meant that iommufd_attached is currently defined only when
CONFIG_IOMMUFD is enabled. To roll it in with cdev_opened, this needs
to change.

Regards,
Yi Liu

Re: [Intel-gfx] [PATCH v7 8/9] vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device cdev

2023-06-13 Thread Liu, Yi L

> From: Jason Gunthorpe 
> Sent: Tuesday, June 13, 2023 7:47 PM
> 
> On Fri, Jun 02, 2023 at 05:15:14AM -0700, Yi Liu wrote:
> > +/*
> > + * Return devid for a device which is affected by hot-reset.
> > + * - valid devid > 0 for the device that is bound to the input
> > + *   iommufd_ctx.
> > + * - devid == VFIO_PCI_DEVID_OWNED for the device that has not
> > + *   been bound to any iommufd_ctx but other device within its
> > + *   group has been bound to the input iommufd_ctx.
> > + * - devid == VFIO_PCI_DEVID_NOT_OWNED for others. e.g. device
> > + *   is bound to other iommufd_ctx etc.
> > + */
> > +int vfio_iommufd_device_hot_reset_devid(struct vfio_device *vdev,
> > +   struct iommufd_ctx *ictx)
> > +{
> > +   struct iommu_group *group;
> > +   int devid;
> > +
> > +   if (vfio_iommufd_device_ictx(vdev) == ictx)
> > +   return vfio_iommufd_device_id(vdev);
> > +
> > +   group = iommu_group_get(vdev->dev);
> > +   if (!group)
> > +   return VFIO_PCI_DEVID_NOT_OWNED;
> > +
> > +   if (iommufd_ctx_has_group(ictx, group))
> > +   devid = VFIO_PCI_DEVID_OWNED;
> > +   else
> > +   devid = VFIO_PCI_DEVID_NOT_OWNED;
> > +
> > +   iommu_group_put(group);
> > +
> > +   return devid;
> > +}
> > +EXPORT_SYMBOL_GPL(vfio_iommufd_device_hot_reset_devid);
> 
> This function really should not be in the core iommufd.c file - it is
> a purely vfio-pci function - why did you have to place it here?

Putting it here avoids calling iommufd_ctx_has_group() in vfio-pci,
which would require importing IOMMUFD_NS. If this reason is not strong
enough, I can move it back to the vfio-pci code.

Regards,
Yi Liu

Re: [Intel-gfx] [PATCH v12 24/24] docs: vfio: Add vfio device cdev description

2023-06-13 Thread Liu, Yi L

> From: Alex Williamson 
> Sent: Tuesday, June 13, 2023 7:06 AM
> 
> On Fri,  2 Jun 2023 05:16:53 -0700
> Yi Liu  wrote:
> 
> > This gives notes for userspace applications on device cdev usage.
> >
> > Reviewed-by: Kevin Tian 
> > Signed-off-by: Yi Liu 
> > ---
> >  Documentation/driver-api/vfio.rst | 132 ++
> >  1 file changed, 132 insertions(+)
> >
> > diff --git a/Documentation/driver-api/vfio.rst 
> > b/Documentation/driver-api/vfio.rst
> > index 363e12c90b87..f00c9b86bda0 100644
> > --- a/Documentation/driver-api/vfio.rst
> > +++ b/Documentation/driver-api/vfio.rst
> > @@ -239,6 +239,130 @@ group and can access them as follows::
> > /* Gratuitous device reset and go... */
> > ioctl(device, VFIO_DEVICE_RESET);
> >
> > +IOMMUFD and vfio_iommu_type1
> > +
> > +
> > +IOMMUFD is the new user API to manage I/O page tables from userspace.
> > +It intends to be the portal of delivering advanced userspace DMA
> > +features (nested translation [5]_, PASID [6]_, etc.) while also providing
> > +a backwards compatibility interface for existing VFIO_TYPE1v2_IOMMU use
> > +cases.  Eventually the vfio_iommu_type1 driver, as well as the legacy
> > +vfio container and group model is intended to be deprecated.
> > +
> > +The IOMMUFD backwards compatibility interface can be enabled two ways.
> > +In the first method, the kernel can be configured with
> > +CONFIG_IOMMUFD_VFIO_CONTAINER, in which case the IOMMUFD subsystem
> > +transparently provides the entire infrastructure for the VFIO
> > +container and IOMMU backend interfaces.  The compatibility mode can
> > +also be accessed if the VFIO container interface, ie. /dev/vfio/vfio is
> > +simply symlink'd to /dev/iommu.  Note that at the time of writing, the
> > +compatibility mode is not entirely feature complete relative to
> > +VFIO_TYPE1v2_IOMMU (ex. DMA mapping MMIO) and does not attempt to
> > +provide compatibility to the VFIO_SPAPR_TCE_IOMMU interface.  Therefore
> > +it is not generally advisable at this time to switch from native VFIO
> > +implementations to the IOMMUFD compatibility interfaces.
> > +
> > +Long term, VFIO users should migrate to device access through the cdev
> > +interface described below, and native access through the IOMMUFD
> > +provided interfaces.
> > +
> > +VFIO Device cdev
> > +
> > +
> > +Traditionally user acquires a device fd via VFIO_GROUP_GET_DEVICE_FD
> > +in a VFIO group.
> > +
> > +With CONFIG_VFIO_DEVICE_CDEV=y the user can now acquire a device fd
> > +by directly opening a character device /dev/vfio/devices/vfioX where
> > +"X" is the number allocated uniquely by VFIO for registered devices.
> > +cdev interface does not support noiommu, so user should use the legacy
> > +group interface if noiommu is needed.
> > +
> > +The cdev only works with IOMMUFD.  Both VFIO drivers and applications
> > +must adapt to the new cdev security model which requires using
> > +VFIO_DEVICE_BIND_IOMMUFD to claim DMA ownership before starting to
> > +actually use the device.  Once BIND succeeds then a VFIO device can
> > +be fully accessed by the user.
> > +
> > +VFIO device cdev doesn't rely on VFIO group/container/iommu drivers.
> > +Hence those modules can be fully compiled out in an environment
> > +where no legacy VFIO application exists.
> > +
> > +So far SPAPR does not support IOMMUFD yet.  So it cannot support device
> > +cdev neither.
> 
> s/neither/either/

Got it.

> 
> Unless I missed it, we've not described that vfio device cdev access is
> still bound by IOMMU group semantics, ie. there can be one DMA owner
> for the group.  That's a pretty common failure point for multi-function
> consumer device use cases, so the why, where, and how it fails should
> be well covered.

Yes, this needs to be documented. How about the wording below:

vfio device cdev access is still bound by IOMMU group semantics, i.e. there
can be only one DMA owner for the group.  Devices belonging to the same
group cannot be bound to multiple iommufd_ctx.  A user that tries to bind
such devices to different iommufds will fail in VFIO_DEVICE_BIND_IOMMUFD,
which is the starting point for getting full access to the device.
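
The failure mode could be illustrated with a small userspace model. It is a
sketch under assumptions: struct model_group, struct model_dev and
model_bind() are hypothetical, and the -EPERM return is only illustrative
of "bind fails", not a statement about the exact errno the kernel returns.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model: an IOMMU group remembers its single DMA owner. */
struct model_group {
	const void *owner;	/* iommufd context owning the group, or NULL */
	unsigned int owner_cnt;	/* devices in the group bound to that owner */
};

struct model_dev {
	struct model_group *group;
};

/* Models the ownership check performed at bind time. */
static int model_bind(struct model_dev *dev, const void *ictx)
{
	struct model_group *g = dev->group;

	if (g->owner_cnt && g->owner != ictx)
		return -EPERM;	/* illustrative errno */
	g->owner = ictx;
	g->owner_cnt++;
	return 0;
}
```

Two functions of a multi-function device that share a group can both be
bound through the same iommufd, but the second bind fails if it goes
through a different one.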

> 
> In general there's been a lot of cross collaboration to get the series
> this far.  I see an abundance of Tested-by, but unfortunately not a lot
> of Reviewed-by beyond about the first 1/3rd of the series.  Thanks,

Yeah. The remaining two thirds have gone through back-and-forth changes since v8.

Regards,
Yi Liu

> Alex
> 
> > +
> > +Device cdev Example
> > +---
> > +
> > +Assume user wants to access PCI device :6a:01.0::
> > +
> > +   $ ls /sys/bus/pci/devices/:6a:01.0/vfio-dev/
> > +   vfio0
> > +
> > +This device is therefore represented as vfio0.  The user can verify
> > +its existence::
> > +
> > +   $ ls -l /dev/vfio/devices/vfio0
> > +   crw--- 1 root root 511, 0 Feb 16 01:22 /dev/vfio/devices/vfio0
> > +   $ cat

Re: [Intel-gfx] [PATCH v12 20/24] vfio: Only check group->type for noiommu test

2023-06-13 Thread Liu, Yi L

> From: Alex Williamson 
> Sent: Tuesday, June 13, 2023 6:38 AM
> On Fri,  2 Jun 2023 05:16:49 -0700
> Yi Liu  wrote:
> 
> > group->type can be VFIO_NO_IOMMU only when vfio_noiommu option is true.
> > And vfio_noiommu option can only be true if CONFIG_VFIO_NOIOMMU is enabled.
> > So checking group->type is enough when testing noiommu.
> >
> > Signed-off-by: Yi Liu 
> > ---
> >  drivers/vfio/group.c | 3 +--
> >  drivers/vfio/vfio.h  | 3 +--
> >  2 files changed, 2 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
> > index 41a09a2df690..653b62f93474 100644
> > --- a/drivers/vfio/group.c
> > +++ b/drivers/vfio/group.c
> > @@ -133,8 +133,7 @@ static int vfio_group_ioctl_set_container(struct 
> > vfio_group
> *group,
> >
> > iommufd = iommufd_ctx_from_file(f.file);
> > if (!IS_ERR(iommufd)) {
> > -   if (IS_ENABLED(CONFIG_VFIO_NOIOMMU) &&
> > -   group->type == VFIO_NO_IOMMU)
> > +   if (group->type == VFIO_NO_IOMMU)
> > ret = iommufd_vfio_compat_set_no_iommu(iommufd);
> > else
> > ret = iommufd_vfio_compat_ioas_create(iommufd);
> > diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> > index 5835c74e97ce..1b89e8bc8571 100644
> > --- a/drivers/vfio/vfio.h
> > +++ b/drivers/vfio/vfio.h
> > @@ -108,8 +108,7 @@ void vfio_group_cleanup(void);
> >
> >  static inline bool vfio_device_is_noiommu(struct vfio_device *vdev)
> >  {
> > -   return IS_ENABLED(CONFIG_VFIO_NOIOMMU) &&
> > -  vdev->group->type == VFIO_NO_IOMMU;
> > +   return vdev->group->type == VFIO_NO_IOMMU;
> >  }
> >
> >  #if IS_ENABLED(CONFIG_VFIO_CONTAINER)
> 
> This patch should be dropped.  It's logically correct, but ignores that
> the config option can be determined at compile time and therefore the
> code can be better optimized based on that test.  I think there was a
> specific case where I questioned it, but this drops an otherwise valid
> compiler optimization.  Thanks,

Yes. In v11, you mentioned the compiler optimization and the fact that
vfio_noiommu can only be valid when VFIO_NOIOMMU is enabled. I'm
OK with dropping this patch to keep the compiler optimization.
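
As I understand the point, it can be shown with a userspace sketch.
MODEL_NOIOMMU_ENABLED is a hypothetical stand-in for
IS_ENABLED(CONFIG_VFIO_NOIOMMU), and the types are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for IS_ENABLED(CONFIG_VFIO_NOIOMMU). */
#define MODEL_NOIOMMU_ENABLED 0

enum model_group_type { MODEL_IOMMU, MODEL_NO_IOMMU };

struct model_group {
	enum model_group_type type;
};

static inline bool model_is_noiommu(const struct model_group *group)
{
	/* With the macro at 0 the compiler can fold this to "return false". */
	return MODEL_NOIOMMU_ENABLED && group->type == MODEL_NO_IOMMU;
}
```

With the constant in front, callers that branch on this helper can have the
noiommu branch removed at compile time, which a runtime check of
group->type alone would not allow.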

Regards,
Yi Liu

Re: [Intel-gfx] [PATCH v12 21/24] vfio: Determine noiommu device in __vfio_register_dev()

2023-06-12 Thread Liu, Yi L

> From: Alex Williamson 
> Sent: Tuesday, June 13, 2023 6:42 AM
> 
> On Fri,  2 Jun 2023 05:16:50 -0700
> Yi Liu  wrote:
> 
> > This moves the noiommu device determination and noiommu taint out of
> > vfio_group_find_or_alloc(). noiommu device is determined in
> > __vfio_register_dev() and result is stored in flag vfio_device->noiommu,
> > the noiommu taint is added in the end of __vfio_register_dev().
> >
> > This is also a preparation for compiling out vfio_group infrastructure
> > as it makes the noiommu detection and taint common between the cdev path
> > and group path though cdev path does not support noiommu.
> 
> Does this really still make sense?  The motivation for the change is
> really not clear without cdev support for noiommu.  Thanks,

I think it still makes sense. When CONFIG_VFIO_GROUP==n, the kernel
only supports the cdev interface. If there is a noiommu device, vfio should
fail the registration. So the noiommu determination is still needed. But
I admit the taint might better stay in the group code.

Regards,
Yi Liu

> Alex
> 
> > Suggested-by: Alex Williamson 
> > Signed-off-by: Yi Liu 
> > ---
> >  drivers/vfio/group.c | 15 ---
> >  drivers/vfio/vfio_main.c | 31 ++-
> >  include/linux/vfio.h |  1 +
> >  3 files changed, 31 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
> > index 653b62f93474..64cdd0ea8825 100644
> > --- a/drivers/vfio/group.c
> > +++ b/drivers/vfio/group.c
> > @@ -668,21 +668,6 @@ static struct vfio_group 
> > *vfio_group_find_or_alloc(struct
> device *dev)
> > struct vfio_group *group;
> >
> > iommu_group = iommu_group_get(dev);
> > -   if (!iommu_group && vfio_noiommu) {
> > -   /*
> > -* With noiommu enabled, create an IOMMU group for devices that
> > -* don't already have one, implying no IOMMU hardware/driver
> > -* exists.  Taint the kernel because we're about to give a DMA
> > -* capable device to a user without IOMMU protection.
> > -*/
> > -   group = vfio_noiommu_group_alloc(dev, VFIO_NO_IOMMU);
> > -   if (!IS_ERR(group)) {
> > -   add_taint(TAINT_USER, LOCKDEP_STILL_OK);
> > -   dev_warn(dev, "Adding kernel taint for vfio-noiommu 
> > group on
> device\n");
> > -   }
> > -   return group;
> > -   }
> > -
> > if (!iommu_group)
> > return ERR_PTR(-EINVAL);
> >
> > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> > index 6d8f9b0f3637..00a699b9f76b 100644
> > --- a/drivers/vfio/vfio_main.c
> > +++ b/drivers/vfio/vfio_main.c
> > @@ -265,6 +265,18 @@ static int vfio_init_device(struct vfio_device 
> > *device, struct
> device *dev,
> > return ret;
> >  }
> >
> > +static int vfio_device_set_noiommu(struct vfio_device *device)
> > +{
> > +   struct iommu_group *iommu_group = iommu_group_get(device->dev);
> > +
> > +   if (!iommu_group && !vfio_noiommu)
> > +   return -EINVAL;
> > +
> > +   device->noiommu = !iommu_group;
> > +   iommu_group_put(iommu_group); /* Accepts NULL */
> > +   return 0;
> > +}
> > +
> >  static int __vfio_register_dev(struct vfio_device *device,
> >enum vfio_group_type type)
> >  {
> > @@ -277,6 +289,13 @@ static int __vfio_register_dev(struct vfio_device 
> > *device,
> >  !device->ops->detach_ioas)))
> > return -EINVAL;
> >
> > +   /* Only physical devices can be noiommu device */
> > +   if (type == VFIO_IOMMU) {
> > +   ret = vfio_device_set_noiommu(device);
> > +   if (ret)
> > +   return ret;
> > +   }
> > +
> > /*
> >  * If the driver doesn't specify a set then the device is added to a
> >  * singleton set just for itself.
> > @@ -288,7 +307,8 @@ static int __vfio_register_dev(struct vfio_device 
> > *device,
> > if (ret)
> > return ret;
> >
> > -   ret = vfio_device_set_group(device, type);
> > +   ret = vfio_device_set_group(device,
> > +   device->noiommu ? VFIO_NO_IOMMU : type);
> > if (ret)
> > return ret;
> >
> > @@ -301,6 +321,15 @@ static int __vfio_register_dev(struct vfio_device 
> > *device,
> >
> > vfio_device_group_register(device);
> >
> > +   if (device->noiommu) {
> > +   /*
> > > +* noiommu devices have no IOMMU hardware/driver.  Taint the
> > +* kernel because we're about to give a DMA capable device to
> > +* a user without IOMMU protection.
> > +*/
> > +   add_taint(TAINT_USER, LOCKDEP_STILL_OK);
> > +   dev_warn(device->dev, "Adding kernel taint for vfio-noiommu on
> device\n");
> > +   }
> > return 0;
> >  err_out:
> > vfio_device_remove_group(device);
> > diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> > index e80a8ac86e46..183e620009e7 100644
> > --- a/include/linux/vfio.h
> > +++

Re: [Intel-gfx] [PATCH v12 18/24] vfio: Add VFIO_DEVICE_BIND_IOMMUFD

2023-06-12 Thread Liu, Yi L

> From: Alex Williamson 
> Sent: Tuesday, June 13, 2023 6:27 AM
> 
> On Fri,  2 Jun 2023 05:16:47 -0700
> Yi Liu  wrote:
> 
> > This adds ioctl for userspace to bind device cdev fd to iommufd.
> >
> > VFIO_DEVICE_BIND_IOMMUFD: bind device to an iommufd, hence gain DMA
> >   control provided by the iommufd. open_device
> >   op is called after bind_iommufd op.
> >
> > Tested-by: Yanting Jiang 
> > Tested-by: Shameer Kolothum 
> > Tested-by: Terrence Xu 
> > Signed-off-by: Yi Liu 
> > ---
> >  drivers/vfio/device_cdev.c | 123 +
> >  drivers/vfio/vfio.h|  13 
> >  drivers/vfio/vfio_main.c   |   5 ++
> >  include/linux/vfio.h   |   3 +-
> >  include/uapi/linux/vfio.h  |  27 
> >  5 files changed, 170 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
> > index 1c640016a824..a4498ddbe774 100644
> > --- a/drivers/vfio/device_cdev.c
> > +++ b/drivers/vfio/device_cdev.c
> > @@ -3,6 +3,7 @@
> >   * Copyright (c) 2023 Intel Corporation.
> >   */
> >  #include 
> > +#include 
> >
> >  #include "vfio.h"
> >
> > @@ -44,6 +45,128 @@ int vfio_device_fops_cdev_open(struct inode *inode, 
> > struct
> file *filep)
> > return ret;
> >  }
> >
> > +static void vfio_device_get_kvm_safe(struct vfio_device_file *df)
> > +{
> > +   spin_lock(&df->kvm_ref_lock);
> > +   if (df->kvm)
> > +   _vfio_device_get_kvm_safe(df->device, df->kvm);
> > +   spin_unlock(&df->kvm_ref_lock);
> > +}
> > +
> > +void vfio_df_cdev_close(struct vfio_device_file *df)
> > +{
> > +   struct vfio_device *device = df->device;
> > +
> > +   /*
> > +* In the time of close, there is no contention with another one
> > +* changing this flag.  So read df->access_granted without lock
> > +* and no smp_load_acquire() is ok.
> > +*/
> > +   if (!df->access_granted)
> > +   return;
> > +
> > +   mutex_lock(&device->dev_set->lock);
> > +   vfio_df_close(df);
> > +   vfio_device_put_kvm(device);
> > +   iommufd_ctx_put(df->iommufd);
> > +   device->cdev_opened = false;
> > +   mutex_unlock(&device->dev_set->lock);
> > +   vfio_device_unblock_group(device);
> > +}
> > +
> > +static struct iommufd_ctx *vfio_get_iommufd_from_fd(int fd)
> > +{
> > +   struct iommufd_ctx *iommufd;
> > +   struct fd f;
> > +
> > +   f = fdget(fd);
> > +   if (!f.file)
> > +   return ERR_PTR(-EBADF);
> > +
> > +   iommufd = iommufd_ctx_from_file(f.file);
> > +
> > +   fdput(f);
> > +   return iommufd;
> > +}
> > +
> > +long vfio_df_ioctl_bind_iommufd(struct vfio_device_file *df,
> > +   struct vfio_device_bind_iommufd __user *arg)
> > +{
> > +   struct vfio_device *device = df->device;
> > +   struct vfio_device_bind_iommufd bind;
> > +   unsigned long minsz;
> > +   int ret;
> > +
> > +   static_assert(__same_type(arg->out_devid, df->devid));
> > +
> > +   minsz = offsetofend(struct vfio_device_bind_iommufd, out_devid);
> > +
> > +   if (copy_from_user(&bind, arg, minsz))
> > +   return -EFAULT;
> > +
> > +   if (bind.argsz < minsz || bind.flags || bind.iommufd < 0)
> > +   return -EINVAL;
> > +
> > +   /* BIND_IOMMUFD only allowed for cdev fds */
> > +   if (df->group)
> > +   return -EINVAL;
> > +
> > +   ret = vfio_device_block_group(device);
> > +   if (ret)
> > +   return ret;
> > +
> > +   mutex_lock(&device->dev_set->lock);
> > +   /* one device cannot be bound twice */
> > +   if (df->access_granted) {
> > +   ret = -EINVAL;
> > +   goto out_unlock;
> > +   }
> > +
> > +   df->iommufd = vfio_get_iommufd_from_fd(bind.iommufd);
> > +   if (IS_ERR(df->iommufd)) {
> > +   ret = PTR_ERR(df->iommufd);
> > +   df->iommufd = NULL;
> > +   goto out_unlock;
> > +   }
> > +
> > +   /*
> > +* Before the device open, get the KVM pointer currently
> > +* associated with the device file (if there is) and obtain
> > +* a reference.  This reference is held until device closed.
> > +* Save the pointer in the device for use by drivers.
> > +*/
> > +   vfio_device_get_kvm_safe(df);
> > +
> > +   ret = vfio_df_open(df);
> > +   if (ret)
> > +   goto out_put_kvm;
> > +
> > +   ret = copy_to_user(&arg->out_devid, &df->devid,
> > +  sizeof(df->devid)) ? -EFAULT : 0;
> > +   if (ret)
> > +   goto out_close_device;
> > +
> > +   /*
> > +* Paired with smp_load_acquire() in vfio_device_fops::ioctl/
> > +* read/write/mmap
> > +*/
> > +   smp_store_release(&df->access_granted, true);
> > +   device->cdev_opened = true;
> > +   mutex_unlock(&device->dev_set->lock);
> > +   return 0;
> > +
> > +out_close_device:
> > +   vfio_df_close(df);
> > +out_put_kvm:
> > +   vfio_device_put_kvm(device);
> > +   iommufd_ctx_put(df->iommufd);
> > +   df->iommufd = NULL;
> > +out_unlock:
> > +   mutex_unlock(&device->dev_set->lock);
> > +   vfio_device_unblock_group(device);
> > +   return ret;
> > +}
> > +
> >  static char

Re: [Intel-gfx] [PATCH v12 07/24] vfio: Block device access via device fd until device is opened

2023-06-12 Thread Liu, Yi L

> From: Alex Williamson 
> Sent: Tuesday, June 13, 2023 5:52 AM
> 
> On Fri,  2 Jun 2023 05:16:36 -0700
> Yi Liu  wrote:
> 
> > Allow the vfio_device file to be in a state where the device FD is
> > opened but the device cannot be used by userspace (i.e. its .open_device()
> > hasn't been called). This inbetween state is not used when the device
> > FD is spawned from the group FD, however when we create the device FD
> > directly by opening a cdev it will be opened in the blocked state.
> >
> > The reason for the inbetween state is that userspace only gets a FD but
> > doesn't gain access permission until binding the FD to an iommufd. So in
> > the blocked state, only the bind operation is allowed. Completing bind
> > will allow user to further access the device.
> >
> > This is implemented by adding a flag in struct vfio_device_file to mark
> > the blocked state and using a simple smp_load_acquire() to obtain the
> > flag value and serialize all the device setup with the thread accessing
> > this device.
> >
> > Following this lockless scheme, it can safely handle the device FD
> > unbound->bound but it cannot handle bound->unbound. To allow this we'd
> > need to add a lock on all the vfio ioctls which seems costly. So once
> > device FD is bound, it remains bound until the FD is closed.
> >
> > Suggested-by: Jason Gunthorpe 
> > Reviewed-by: Kevin Tian 
> > Reviewed-by: Jason Gunthorpe 
> > Reviewed-by: Eric Auger 
> > Tested-by: Terrence Xu 
> > Tested-by: Nicolin Chen 
> > Tested-by: Matthew Rosato 
> > Tested-by: Yanting Jiang 
> > Tested-by: Shameer Kolothum 
> > Signed-off-by: Yi Liu 
> > ---
> >  drivers/vfio/group.c | 11 ++-
> >  drivers/vfio/vfio.h  |  1 +
> >  drivers/vfio/vfio_main.c | 16 
> >  3 files changed, 27 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
> > index caf53716ddb2..088dd34c8931 100644
> > --- a/drivers/vfio/group.c
> > +++ b/drivers/vfio/group.c
> > @@ -194,9 +194,18 @@ static int vfio_df_group_open(struct vfio_device_file 
> > *df)
> > df->iommufd = device->group->iommufd;
> >
> > ret = vfio_df_open(df);
> > -   if (ret)
> > +   if (ret) {
> > df->iommufd = NULL;
> > +   goto out_put_kvm;
> > +   }
> > +
> > +   /*
> > +* Paired with smp_load_acquire() in vfio_device_fops::ioctl/
> > +* read/write/mmap and vfio_file_has_device_access()
> > +*/
> > +   smp_store_release(&df->access_granted, true);
> >
> > +out_put_kvm:
> > if (device->open_count == 0)
> > vfio_device_put_kvm(device);
> >
> > diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> > index f9eb52eb9ed7..fdf2fc73f880 100644
> > --- a/drivers/vfio/vfio.h
> > +++ b/drivers/vfio/vfio.h
> > @@ -18,6 +18,7 @@ struct vfio_container;
> >
> >  struct vfio_device_file {
> > struct vfio_device *device;
> > +   bool access_granted;
> 
> Should we make this a more strongly defined data type and later move
> devid (u32) here to partially fill the hole created?

Before answering, let me describe how I ordered the fields of this
structure, so you can tell me whether it matches common practice. The
first two fields are static, so they come first. access_granted is
lockless, while the other fields are protected by locks, so I tried to
keep each lock next to the fields it protects. That is why devid sits
after iommufd: both are protected by the same lock.

struct vfio_device_file {
	struct vfio_device *device;
	struct vfio_group *group;

	bool access_granted;
	spinlock_t kvm_ref_lock; /* protect kvm field */
	struct kvm *kvm;
	struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */
	u32 devid; /* only valid when iommufd is valid */
};

> 
> I think this is being placed towards the front of the data structure
> for cache line locality given this is a hot path for file operations.
> But bool types have an implementation dependent size, making them
> difficult to pack.  Also there will be a tendency to want to make this
> a bit field, which is probably not compatible with the smp lockless
> operations being used here.  We might get in front of these issues if
> we just define it as a u8 now.  Thanks,

I don't quite see why a bit field would be incompatible with the smp
lockless operations. Could you elaborate a bit? And should I define
access_granted as u8 or "u8:1"?
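
To make the question concrete, here is a userspace sketch of my current
guess at the concern. It is only an illustration: C11 atomics stand in for
smp_store_release()/smp_load_acquire(), and the mock_* names are made up.
A release store needs an addressable object, which a bit-field is not, and
updating a neighboring bit would be a read-modify-write of the shared
storage unit. A whole byte avoids both:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for struct vfio_device_file. */
struct mock_device_file {
	_Atomic uint8_t access_granted;	/* a whole byte, not "u8:1" */
	uint32_t devid;			/* plain data published by the flag */
};

static_assert(sizeof(uint8_t) == 1, "the flag type is one addressable byte");

/* Writer side (bind path): finish the setup, then publish the flag. */
static void mock_grant_access(struct mock_device_file *df, uint32_t devid)
{
	df->devid = devid;
	/* Stand-in for smp_store_release(&df->access_granted, true). */
	atomic_store_explicit(&df->access_granted, 1, memory_order_release);
}

/* Reader side (ioctl/read/write/mmap): test the flag before any use. */
static bool mock_read_devid(struct mock_device_file *df, uint32_t *out)
{
	/* Stand-in for smp_load_acquire(&df->access_granted). */
	if (!atomic_load_explicit(&df->access_granted, memory_order_acquire))
		return false;
	*out = df->devid;
	return true;
}
```

If that reading is right, taking the address of a "u8:1" member would not
even compile, so a plain u8 looks like the safer choice to me.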

Regards,
Yi Liu

> 
> > spinlock_t kvm_ref_lock; /* protect kvm field */
> > struct kvm *kvm;
> > struct iommufd_ctx *iommufd; /* protected by struct 
> > vfio_device_set::lock */
> > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> > index a3c5817fc545..4c8b7713dc3d 100644
> > --- a/drivers/vfio/vfio_main.c
> > +++ b/drivers/vfio/vfio_main.c
> > @@ -1129,6 +1129,10 @@ static long vfio_device_fops_unl_ioctl(struct file 
> > *filep,
> > struct vfio_device *device = df->device;
> > int ret;
> >
> > +   /* Paired with

Re: [Intel-gfx] [PATCH v12 06/24] vfio: Pass struct vfio_device_file * to vfio_device_open/close()

2023-06-12 Thread Liu, Yi L

> From: Alex Williamson 
> Sent: Tuesday, June 13, 2023 5:52 AM
> 
> On Fri,  2 Jun 2023 05:16:35 -0700
> Yi Liu  wrote:
> 
> > This avoids passing too much parameters in multiple functions. Per the
> > input parameter change, rename the function to be vfio_df_open/close().
> >
> > Reviewed-by: Kevin Tian 
> > Reviewed-by: Jason Gunthorpe 
> > Reviewed-by: Eric Auger 
> > Tested-by: Terrence Xu 
> > Tested-by: Nicolin Chen 
> > Tested-by: Matthew Rosato 
> > Tested-by: Yanting Jiang 
> > Tested-by: Shameer Kolothum 
> > Signed-off-by: Yi Liu 
> > ---
> >  drivers/vfio/group.c | 20 ++--
> >  drivers/vfio/vfio.h  |  8 
> >  drivers/vfio/vfio_main.c | 25 +++--
> >  3 files changed, 33 insertions(+), 20 deletions(-)
> >
> > diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
> > index b56e19d2a02d..caf53716ddb2 100644
> > --- a/drivers/vfio/group.c
> > +++ b/drivers/vfio/group.c
> > @@ -169,8 +169,9 @@ static void vfio_device_group_get_kvm_safe(struct 
> > vfio_device
> *device)
> > spin_unlock(&device->group->kvm_ref_lock);
> >  }
> >
> > -static int vfio_device_group_open(struct vfio_device *device)
> > +static int vfio_df_group_open(struct vfio_device_file *df)
> >  {
> > +   struct vfio_device *device = df->device;
> > int ret;
> >
> > mutex_lock(&device->group->group_lock);
> > @@ -190,7 +191,11 @@ static int vfio_device_group_open(struct vfio_device 
> > *device)
> > if (device->open_count == 0)
> > vfio_device_group_get_kvm_safe(device);
> >
> > -   ret = vfio_device_open(device, device->group->iommufd);
> > +   df->iommufd = device->group->iommufd;
> > +
> > +   ret = vfio_df_open(df);
> > +   if (ret)
> > +   df->iommufd = NULL;
> >
> > if (device->open_count == 0)
> > vfio_device_put_kvm(device);
> > @@ -202,12 +207,15 @@ static int vfio_device_group_open(struct vfio_device
> *device)
> > return ret;
> >  }
> >
> > -void vfio_device_group_close(struct vfio_device *device)
> > +void vfio_df_group_close(struct vfio_device_file *df)
> >  {
> > +   struct vfio_device *device = df->device;
> > +
> > mutex_lock(&device->group->group_lock);
> > mutex_lock(&device->dev_set->lock);
> >
> > -   vfio_device_close(device, device->group->iommufd);
> > +   vfio_df_close(df);
> > +   df->iommufd = NULL;
> >
> > if (device->open_count == 0)
> > vfio_device_put_kvm(device);
> > @@ -228,7 +236,7 @@ static struct file *vfio_device_open_file(struct 
> > vfio_device
> *device)
> > goto err_out;
> > }
> >
> > -   ret = vfio_device_group_open(device);
> > +   ret = vfio_df_group_open(df);
> > if (ret)
> > goto err_free;
> >
> > @@ -260,7 +268,7 @@ static struct file *vfio_device_open_file(struct 
> > vfio_device
> *device)
> > return filep;
> >
> >  err_close_device:
> > -   vfio_device_group_close(device);
> > +   vfio_df_group_close(df);
> >  err_free:
> > kfree(df);
> >  err_out:
> > diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> > index 69e1a0692b06..f9eb52eb9ed7 100644
> > --- a/drivers/vfio/vfio.h
> > +++ b/drivers/vfio/vfio.h
> > @@ -20,13 +20,13 @@ struct vfio_device_file {
> > struct vfio_device *device;
> > spinlock_t kvm_ref_lock; /* protect kvm field */
> > struct kvm *kvm;
> > +   struct iommufd_ctx *iommufd; /* protected by struct 
> > vfio_device_set::lock */
> >  };
> >
> >  void vfio_device_put_registration(struct vfio_device *device);
> >  bool vfio_device_try_get_registration(struct vfio_device *device);
> > -int vfio_device_open(struct vfio_device *device, struct iommufd_ctx 
> > *iommufd);
> > -void vfio_device_close(struct vfio_device *device,
> > -  struct iommufd_ctx *iommufd);
> > +int vfio_df_open(struct vfio_device_file *df);
> > +void vfio_df_close(struct vfio_device_file *df);
> >  struct vfio_device_file *
> >  vfio_allocate_device_file(struct vfio_device *device);
> >
> > @@ -91,7 +91,7 @@ void vfio_device_group_register(struct vfio_device 
> > *device);
> >  void vfio_device_group_unregister(struct vfio_device *device);
> >  int vfio_device_group_use_iommu(struct vfio_device *device);
> >  void vfio_device_group_unuse_iommu(struct vfio_device *device);
> > -void vfio_device_group_close(struct vfio_device *device);
> > +void vfio_df_group_close(struct vfio_device_file *df);
> >  struct vfio_group *vfio_group_from_file(struct file *file);
> >  bool vfio_group_enforced_coherent(struct vfio_group *group);
> >  void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm);
> > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> > index 8ef9210ad2aa..a3c5817fc545 100644
> > --- a/drivers/vfio/vfio_main.c
> > +++ b/drivers/vfio/vfio_main.c
> > @@ -434,9 +434,10 @@ vfio_allocate_device_file(struct vfio_device *device)
> > return df;
> >  }
> >
> > -static int vfio_device_first_open(struct vfio_device *device,
> > - struct iommufd_ctx *iommufd)
> > +static int

Re: [Intel-gfx] [PATCH v7 9/9] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET

2023-06-08 Thread Liu, Yi L

> From: Alex Williamson 
> Sent: Friday, June 9, 2023 6:30 AM
> 
> On Fri,  2 Jun 2023 05:15:15 -0700
> Yi Liu  wrote:
> 
> > This is the way user to invoke hot-reset for the devices opened by cdev
> > interface. User should check the flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED
> > in the output of VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl before doing
> > hot-reset for cdev devices.
> >
> > Suggested-by: Jason Gunthorpe 
> > Signed-off-by: Jason Gunthorpe 
> > Reviewed-by: Jason Gunthorpe 
> > Tested-by: Yanting Jiang 
> > Signed-off-by: Yi Liu 
> > ---
> >  drivers/vfio/pci/vfio_pci_core.c | 61 ++--
> >  include/uapi/linux/vfio.h| 14 
> >  2 files changed, 64 insertions(+), 11 deletions(-)
> >
> > diff --git a/drivers/vfio/pci/vfio_pci_core.c 
> > b/drivers/vfio/pci/vfio_pci_core.c
> > index a615a223cdef..b0eadafcbcf5 100644
> > --- a/drivers/vfio/pci/vfio_pci_core.c
> > +++ b/drivers/vfio/pci/vfio_pci_core.c
> > @@ -181,7 +181,8 @@ static void vfio_pci_probe_mmaps(struct 
> > vfio_pci_core_device
> *vdev)
> >  struct vfio_pci_group_info;
> >  static void vfio_pci_dev_set_try_reset(struct vfio_device_set *dev_set);
> >  static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
> > - struct vfio_pci_group_info *groups);
> > + struct vfio_pci_group_info *groups,
> > + struct iommufd_ctx *iommufd_ctx);
> >
> >  /*
> >   * INTx masking requires the ability to disable INTx signaling via 
> > PCI_COMMAND
> > @@ -1308,8 +1309,7 @@ vfio_pci_ioctl_pci_hot_reset_groups(struct
> vfio_pci_core_device *vdev,
> > if (ret)
> > return ret;
> >
> > -   /* Somewhere between 1 and count is OK */
> > -   if (!array_count || array_count > count)
> > +   if (array_count > count)
> > return -EINVAL;
> >
> > group_fds = kcalloc(array_count, sizeof(*group_fds), GFP_KERNEL);
> > @@ -1358,7 +1358,7 @@ vfio_pci_ioctl_pci_hot_reset_groups(struct
> vfio_pci_core_device *vdev,
> > info.count = array_count;
> > info.files = files;
> >
> > -   ret = vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, &info);
> > +   ret = vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, &info, NULL);
> >
> >  hot_reset_release:
> > for (file_idx--; file_idx >= 0; file_idx--)
> > @@ -1381,13 +1381,21 @@ static int vfio_pci_ioctl_pci_hot_reset(struct
> vfio_pci_core_device *vdev,
> > if (hdr.argsz < minsz || hdr.flags)
> > return -EINVAL;
> >
> > +   /* zero-length array is only for cdev opened devices */
> > +   if (!!hdr.count == vfio_device_cdev_opened(&vdev->vdev))
> > +   return -EINVAL;
> > +
> > /* Can we do a slot or bus reset or neither? */
> > if (!pci_probe_reset_slot(vdev->pdev->slot))
> > slot = true;
> > else if (pci_probe_reset_bus(vdev->pdev->bus))
> > return -ENODEV;
> >
> > -   return vfio_pci_ioctl_pci_hot_reset_groups(vdev, hdr.count, slot, arg);
> > +   if (hdr.count)
> > +   return vfio_pci_ioctl_pci_hot_reset_groups(vdev, hdr.count, 
> > slot, arg);
> > +
> > +   return vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, NULL,
> > + 
> > vfio_iommufd_device_ictx(&vdev->vdev));
> >  }
> >
> >  static int vfio_pci_ioctl_ioeventfd(struct vfio_pci_core_device *vdev,
> > @@ -2354,13 +2362,16 @@ const struct pci_error_handlers
> vfio_pci_core_err_handlers = {
> >  };
> >  EXPORT_SYMBOL_GPL(vfio_pci_core_err_handlers);
> >
> > -static bool vfio_dev_in_groups(struct vfio_pci_core_device *vdev,
> > +static bool vfio_dev_in_groups(struct vfio_device *vdev,
> >struct vfio_pci_group_info *groups)
> >  {
> > unsigned int i;
> >
> > +   if (!groups)
> > +   return false;
> > +
> > for (i = 0; i < groups->count; i++)
> > -   if (vfio_file_has_dev(groups->files[i], &vdev->vdev))
> > +   if (vfio_file_has_dev(groups->files[i], vdev))
> > return true;
> > return false;
> >  }
> > @@ -2436,7 +2447,8 @@ static int vfio_pci_dev_set_pm_runtime_get(struct
> vfio_device_set *dev_set)
> >   * get each memory_lock.
> >   */
> >  static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
> > - struct vfio_pci_group_info *groups)
> > + struct vfio_pci_group_info *groups,
> > + struct iommufd_ctx *iommufd_ctx)
> >  {
> > struct vfio_pci_core_device *cur_mem;
> > struct vfio_pci_core_device *cur_vma;
> > @@ -2466,11 +2478,38 @@ static int vfio_pci_dev_set_hot_reset(struct
> vfio_device_set *dev_set,
> > goto err_unlock;
> >
> > list_for_each_entry(cur_vma, &dev_set->device_list, vdev.dev_set_list) {
> > +   bool owned;
> > +
> > /*
> > -* Test whether all the affected devices are contained by the
> > -* set of groups provided by the user.
> > +*

Re: [Intel-gfx] [PATCH v7 8/9] vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device cdev

2023-06-08 Thread Liu, Yi L

> From: Alex Williamson 
> Sent: Friday, June 9, 2023 6:27 AM
> 
> On Fri,  2 Jun 2023 05:15:14 -0700
> Yi Liu  wrote:
> 
> > This allows VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl use the iommufd_ctx
> > of the cdev device to check the ownership of the other affected devices.
> >
> > When VFIO_DEVICE_GET_PCI_HOT_RESET_INFO is called on an IOMMUFD managed
> > device, the new flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID is reported to indicate
> > the values returned are IOMMUFD devids rather than group IDs as used when
> > accessing vfio devices through the conventional vfio group interface.
> > Additionally the flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED will be reported
> > in this mode if all of the devices affected by the hot-reset are owned by
> > either virtue of being directly bound to the same iommufd context as the
> > calling device, or implicitly owned via a shared IOMMU group.
> >
> > Suggested-by: Jason Gunthorpe 
> > Suggested-by: Alex Williamson 
> > Signed-off-by: Yi Liu 
> > ---
> >  drivers/vfio/iommufd.c   | 49 +++
> >  drivers/vfio/pci/vfio_pci_core.c | 47 +-
> >  include/linux/vfio.h | 16 ++
> >  include/uapi/linux/vfio.h| 50 +++-
> >  4 files changed, 154 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
> > index 88b00c501015..a04f3a493437 100644
> > --- a/drivers/vfio/iommufd.c
> > +++ b/drivers/vfio/iommufd.c
> > @@ -66,6 +66,55 @@ void vfio_iommufd_unbind(struct vfio_device *vdev)
> > vdev->ops->unbind_iommufd(vdev);
> >  }
> >
> > +struct iommufd_ctx *vfio_iommufd_device_ictx(struct vfio_device *vdev)
> > +{
> > +   if (vdev->iommufd_device)
> > +   return iommufd_device_to_ictx(vdev->iommufd_device);
> > +   return NULL;
> > +}
> > +EXPORT_SYMBOL_GPL(vfio_iommufd_device_ictx);
> > +
> > +static int vfio_iommufd_device_id(struct vfio_device *vdev)
> > +{
> > +   if (vdev->iommufd_device)
> > +   return iommufd_device_to_id(vdev->iommufd_device);
> > +   return -EINVAL;
> 
> If this is actually reachable, it allows us to return -EINVAL as a
> devid in the reset-info ioctl, which is not a defined value.  Should
> this return VFIO_PCI_DEVID_NOT_OWNED or do you want to catch the errno
> value in the caller?  Thanks,

This error can be reached if the user invokes _INFO or HOT_RESET on an
emulated device, or on a physical device that has not been bound to
iommufd. Both should be considered not-owned, so returning
VFIO_PCI_DEVID_NOT_OWNED makes more sense.
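
A minimal sketch of the change I have in mind, written as standalone C so
it can be compiled outside the kernel. The mock_* types are hypothetical
stand-ins for the vfio/iommufd structures, and the -1 value is my
assumption for VFIO_PCI_DEVID_NOT_OWNED:

```c
#include <assert.h>
#include <stddef.h>

/* Assumed value; the real definition lives in the uapi header. */
#define MOCK_VFIO_PCI_DEVID_NOT_OWNED	(-1)

static_assert(MOCK_VFIO_PCI_DEVID_NOT_OWNED < 0,
	      "must not collide with a valid devid");

/* Hypothetical stand-ins for struct iommufd_device and struct vfio_device. */
struct mock_iommufd_device {
	int id;
};

struct mock_vfio_device {
	struct mock_iommufd_device *iommufd_device; /* NULL when not bound */
};

/*
 * Report the devid of a bound device. An emulated device, or a physical
 * device not yet bound to iommufd, is reported as not-owned instead of
 * leaking an errno value into the reset-info output.
 */
static int mock_vfio_iommufd_device_id(const struct mock_vfio_device *vdev)
{
	if (vdev->iommufd_device)
		return vdev->iommufd_device->id;
	return MOCK_VFIO_PCI_DEVID_NOT_OWNED;
}
```

The caller can then store the return value directly as the devid without
having to catch an errno.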

Regards,
Yi Liu

Re: [Intel-gfx] [PATCH v7 4/9] iommufd: Add iommufd_ctx_has_group()

2023-06-08 Thread Liu, Yi L

> From: Alex Williamson 
> Sent: Friday, June 9, 2023 5:41 AM
> 
> On Fri,  2 Jun 2023 05:15:10 -0700
> Yi Liu  wrote:
> 
> > This adds the helper to check if any device within the given iommu_group
> > has been bound with the iommufd_ctx. This is helpful for the checking on
> > device ownership for the devices which have not been bound but cannot be
> > bound to any other iommufd_ctx as the iommu_group has been bound.
> >
> > Tested-by: Terrence Xu 
> > Signed-off-by: Yi Liu 
> > ---
> >  drivers/iommu/iommufd/device.c | 30 ++
> >  include/linux/iommufd.h|  8 
> >  2 files changed, 38 insertions(+)
> >
> > diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
> > index 4f9b2142274c..4571344c8508 100644
> > --- a/drivers/iommu/iommufd/device.c
> > +++ b/drivers/iommu/iommufd/device.c
> > @@ -98,6 +98,36 @@ struct iommufd_device *iommufd_device_bind(struct
> iommufd_ctx *ictx,
> >  }
> >  EXPORT_SYMBOL_NS_GPL(iommufd_device_bind, IOMMUFD);
> >
> > +/**
> > + * iommufd_ctx_has_group - True if any device within the group is bound
> > + * to the ictx
> > + * @ictx: iommufd file descriptor
> > + * @group: Pointer to a physical iommu_group struct
> > + *
> > + * True if any device within the group has been bound to this ictx, ex. via
> > + * iommufd_device_bind(), therefore implying ictx ownership of the group.
> > + */
> > +bool iommufd_ctx_has_group(struct iommufd_ctx *ictx, struct iommu_group 
> > *group)
> > +{
> > +   struct iommufd_object *obj;
> > +   unsigned long index;
> > +
> > +   if (!ictx || !group)
> > +   return false;
> > +
> > +   xa_lock(&ictx->objects);
> > +   xa_for_each(&ictx->objects, index, obj) {
> > +   if (obj->type == IOMMUFD_OBJ_DEVICE &&
> > +   container_of(obj, struct iommufd_device, obj)->group == 
> > group) {
> > +   xa_unlock(&ictx->objects);
> > +   return true;
> > +   }
> > +   }
> > +   xa_unlock(&ictx->objects);
> > +   return false;
> > +}
> > +EXPORT_SYMBOL_NS_GPL(iommufd_ctx_has_group, IOMMUFD);
> > +
> >  /**
> >   * iommufd_device_unbind - Undo iommufd_device_bind()
> >   * @idev: Device returned by iommufd_device_bind()
> > diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
> > index 1129a36a74c4..33fe57e95e42 100644
> > --- a/include/linux/iommufd.h
> > +++ b/include/linux/iommufd.h
> > @@ -16,6 +16,7 @@ struct page;
> >  struct iommufd_ctx;
> >  struct iommufd_access;
> >  struct file;
> > +struct iommu_group;
> >
> >  struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
> >struct device *dev, u32 *id);
> > @@ -50,6 +51,7 @@ void iommufd_ctx_get(struct iommufd_ctx *ictx);
> >  #if IS_ENABLED(CONFIG_IOMMUFD)
> >  struct iommufd_ctx *iommufd_ctx_from_file(struct file *file);
> >  void iommufd_ctx_put(struct iommufd_ctx *ictx);
> > +bool iommufd_ctx_has_group(struct iommufd_ctx *ictx, struct iommu_group 
> > *group);
> >
> >  int iommufd_access_pin_pages(struct iommufd_access *access, unsigned long 
> > iova,
> >  unsigned long length, struct page **out_pages,
> > @@ -71,6 +73,12 @@ static inline void iommufd_ctx_put(struct iommufd_ctx 
> > *ictx)
> >  {
> >  }
> >
> > +static inline bool iommufd_ctx_has_group(struct iommufd_ctx *ictx,
> > +struct iommu_group *group)
> > +{
> > +   return false;
> > +}
> > +
> >  static inline int iommufd_access_pin_pages(struct iommufd_access *access,
> >unsigned long iova,
> >unsigned long length,
> 
> It looks like the v12 cdev series no longer requires this stub?  We
> haven't used this function except from iommufd specific code since v5.

You are right. It should be dropped.

Regards,
Yi Liu

Re: [Intel-gfx] [PATCH v6 09/10] vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device cdev

2023-06-01 Thread Liu, Yi L

> From: Jason Gunthorpe 
> Sent: Thursday, June 1, 2023 3:00 AM
> 
> On Fri, May 26, 2023 at 10:04:27AM +0800, Baolu Lu wrote:
> > On 5/25/23 9:02 PM, Liu, Yi L wrote:
> > > >   It's possible that requirement
> > > > might be relaxed in the new DMA ownership model, but as it is right
> > > > now, the code enforces that requirement and any new discussion about
> > > > what makes hot-reset available should note both the ownership and
> > > > dev_set requirement.  Thanks,
> > > I think your point is that if an iommufd_ctx has acquired DMA ownership
> > > of an iommu_group, it means the device is owned. And it should not
> > > matter whether all the devices in the iommu_group is present in the
> > > dev_set. It is allowed that some devices are bound to pci-stub or
> > > pcieport driver. Is it?
> > >
> > > Actually I have a doubt on it. IIUC, the above requirement on dev_set
> > > is to ensure the reset to the devices are protected by the dev_set->lock.
> > > So that either the reset issued by driver itself or a hot reset request
> > > from user, there is no race. But if a device is not in the dev_set, then
> > > hot reset request from user might race with the bound driver. DMA 
> > > ownership
> > > only guarantees the drivers won't handle DMA via DMA API which would have
> > > conflict with DMA mappings from user. I'm not sure if it is able to
> > > guarantee reset is exclusive as well. I see pci-stub and pcieport driver
> > > are the only two drivers that set the driver_managed_dma flag besides the
> > > vfio drivers. pci-stub may be fine. not sure about pcieport driver.
> >
> > commit c7d469849747 ("PCI: portdrv: Set driver_managed_dma") described
> > the criteria of adding driver_managed_dma to the pcieport driver.
> >
> > "
> > We achieve this by setting ".driver_managed_dma = true" in pci_driver
> > structure. It is safe because the portdrv driver meets below criteria:
> >
> > - This driver doesn't use DMA, as you can't find any related calls like
> >   pci_set_master() or any kernel DMA API (dma_map_*() and etc.).
> > - It doesn't use MMIO as you can't find ioremap() or similar calls. It's
> >   tolerant to userspace possibly also touching the same MMIO registers
> >   via P2P DMA access.
> > "
> >
> > pci_rest_device() definitely shouldn't be done by the kernel drivers
> > that have driver_managed_dma set.
> 
> Right
> 
> The only time it is safe to reset is if you know there is no attached
> driver or you know VFIO is the attached driver and the caller owns the
> VFIO too.
> 
> We haven't done a no attached driver test due to races.

Ok. @Alex, should we relax the above dev_set requirement now, or should
that be done in a separate series?

Regards,
Yi Liu

Re: [Intel-gfx] [PATCH v6 10/10] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET

2023-05-31 Thread Liu, Yi L

> From: Alex Williamson 
> Sent: Thursday, June 1, 2023 1:22 AM

> > Now, I intend to disallow it. If compat mode user binds the devices
> > to different containers, it shall be able to do hot reset as it can use
> > group fd to prove ownership. But if using the zero-length array, it
> > would be failed. So we may add below logic and remove the
> > vfio_device_cdev_opened() in vfio_pci_ioctl_pci_hot_reset_groups().
> >
> > vfio_pci_ioctl_pci_hot_reset()
> > {
> > ...
> > if (!!hdr.count == !!vfio_device_cdev_opened(&vdev->vdev))
> > return -EINVAL;
> > if (hdr.count)
> > vfio_pci_ioctl_pci_hot_reset_groups(vdev, hdr.count, slot, arg);
> > return vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, NULL,
> >  
> > vfio_iommufd_device_ictx(&vdev->vdev));
> > }
> >
> > >
> > > I thought it would be that this function is called with groups == NULL
> > > and therefore the vfio_dev_in_groups() test below fails, but I don't
> > > think that's true for a compat opened device.  Thanks,
> >
> > How about above logic?
> 
> The double negating a function that already returns bool is a bit

Yes.

> excessive.  I might also move the test closer to the other parameter
> checking with a comment noting the null array interface is only for
> cdev opened devices.  Thanks,

Yes.
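
A minimal sketch of the check with the second double negation dropped,
written as standalone C. The function name and its parameters are
hypothetical; in the driver the two inputs would come from hdr.count and
vfio_device_cdev_opened():

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

static_assert(EINVAL > 0, "errno values are positive, callers see -EINVAL");

/*
 * Validate the fd array length against how the device was opened.
 * A zero-length array is only for cdev opened devices, and a cdev opened
 * device must not pass group fds.
 */
static int mock_hot_reset_check_count(uint32_t count, bool cdev_opened)
{
	/* cdev_opened is already bool, so only count needs normalizing. */
	if (!!count == cdev_opened)
		return -EINVAL;
	return 0;
}
```

Only two combinations pass: a group opened device with a non-empty array,
and a cdev opened device with an empty one.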

Regards,
Yi Liu

Re: [Intel-gfx] [PATCH v6 10/10] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET

2023-05-29 Thread Liu, Yi L

> From: Alex Williamson 
> Sent: Thursday, May 25, 2023 4:20 AM
> 
> On Mon, 22 May 2023 04:57:51 -0700
> Yi Liu  wrote:
> 
> > This is the way user to invoke hot-reset for the devices opened by cdev
> > interface. User should check the flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED
> > in the output of VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl before doing
> > hot-reset for cdev devices.
> >
> > Suggested-by: Jason Gunthorpe 
> > Signed-off-by: Jason Gunthorpe 
> > Reviewed-by: Jason Gunthorpe 
> > Tested-by: Yanting Jiang 
> > Signed-off-by: Yi Liu 
> > ---
> >  drivers/vfio/pci/vfio_pci_core.c | 56 +---
> >  include/uapi/linux/vfio.h| 14 
> >  2 files changed, 59 insertions(+), 11 deletions(-)
> >
> > diff --git a/drivers/vfio/pci/vfio_pci_core.c 
> > b/drivers/vfio/pci/vfio_pci_core.c
> > index 890065f846e4..67f1cb426505 100644
> > --- a/drivers/vfio/pci/vfio_pci_core.c
> > +++ b/drivers/vfio/pci/vfio_pci_core.c
> > @@ -181,7 +181,8 @@ static void vfio_pci_probe_mmaps(struct 
> > vfio_pci_core_device
> *vdev)
> >  struct vfio_pci_group_info;
> >  static void vfio_pci_dev_set_try_reset(struct vfio_device_set *dev_set);
> >  static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
> > - struct vfio_pci_group_info *groups);
> > + struct vfio_pci_group_info *groups,
> > + struct iommufd_ctx *iommufd_ctx);
> >
> >  /*
> >   * INTx masking requires the ability to disable INTx signaling via 
> > PCI_COMMAND
> > @@ -1301,8 +1302,7 @@ vfio_pci_ioctl_pci_hot_reset_groups(struct
> vfio_pci_core_device *vdev,
> > if (ret)
> > return ret;
> >
> > -   /* Somewhere between 1 and count is OK */
> > -   if (!array_count || array_count > count)
> > +   if (array_count > count || vfio_device_cdev_opened(&vdev->vdev))
> > return -EINVAL;
> >
> > group_fds = kcalloc(array_count, sizeof(*group_fds), GFP_KERNEL);
> > @@ -1351,7 +1351,7 @@ vfio_pci_ioctl_pci_hot_reset_groups(struct
> vfio_pci_core_device *vdev,
> > info.count = array_count;
> > info.files = files;
> >
> > -   ret = vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, &info);
> > +   ret = vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, &info, NULL);
> >
> >  hot_reset_release:
> > for (file_idx--; file_idx >= 0; file_idx--)
> > @@ -1380,7 +1380,11 @@ static int vfio_pci_ioctl_pci_hot_reset(struct
> vfio_pci_core_device *vdev,
> > else if (pci_probe_reset_bus(vdev->pdev->bus))
> > return -ENODEV;
> >
> > -   return vfio_pci_ioctl_pci_hot_reset_groups(vdev, hdr.count, slot, arg);
> > +   if (hdr.count)
> > +   return vfio_pci_ioctl_pci_hot_reset_groups(vdev, hdr.count, 
> > slot, arg);
> > +
> > +   return vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, NULL,
> > + 
> > vfio_iommufd_device_ictx(&vdev->vdev));
> >  }
> >
> >  static int vfio_pci_ioctl_ioeventfd(struct vfio_pci_core_device *vdev,
> > @@ -2347,13 +2351,16 @@ const struct pci_error_handlers
> vfio_pci_core_err_handlers = {
> >  };
> >  EXPORT_SYMBOL_GPL(vfio_pci_core_err_handlers);
> >
> > -static bool vfio_dev_in_groups(struct vfio_pci_core_device *vdev,
> > +static bool vfio_dev_in_groups(struct vfio_device *vdev,
> >struct vfio_pci_group_info *groups)
> >  {
> > unsigned int i;
> >
> > +   if (!groups)
> > +   return false;
> > +
> > for (i = 0; i < groups->count; i++)
> > -   if (vfio_file_has_dev(groups->files[i], &vdev->vdev))
> > +   if (vfio_file_has_dev(groups->files[i], vdev))
> > return true;
> > return false;
> >  }
> > @@ -2429,7 +2436,8 @@ static int vfio_pci_dev_set_pm_runtime_get(struct vfio_device_set *dev_set)
> >   * get each memory_lock.
> >   */
> >  static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
> > - struct vfio_pci_group_info *groups)
> > + struct vfio_pci_group_info *groups,
> > + struct iommufd_ctx *iommufd_ctx)
> >  {
> > struct vfio_pci_core_device *cur_mem;
> > struct vfio_pci_core_device *cur_vma;
> > @@ -2459,11 +2467,37 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
> > goto err_unlock;
> >
> > list_for_each_entry(cur_vma, &dev_set->device_list, vdev.dev_set_list) {
> > +   bool owned;
> > +
> > /*
> > -* Test whether all the affected devices are contained by the
> > -* set of groups provided by the user.
> > +* Test whether all the affected devices can be reset by the
> > +* user.
> > +*
> > +* If the user provides a set of groups, all the devices
> > +* in the dev_set should be contained by the set of groups
> > +* provided by the user.
> 
> "If called from a group opened device and the

Re: [Intel-gfx] [PATCH v11 20/23] vfio: Add VFIO_DEVICE_[AT|DE]TACH_IOMMUFD_PT

2023-05-26 Thread Liu, Yi L

> From: Alex Williamson 
> Sent: Friday, May 26, 2023 12:00 AM
> 
> On Thu, 25 May 2023 03:03:54 +0000
> "Liu, Yi L"  wrote:
> 
> > > From: Alex Williamson 
> > > Sent: Wednesday, May 24, 2023 11:32 PM
> > >
> > > > On Wed, 24 May 2023 02:12:14 +0000
> > > "Liu, Yi L"  wrote:
> > >
> > > > > From: Alex Williamson 
> > > > > Sent: Tuesday, May 23, 2023 11:50 PM
> > > > >
> > > > > > On Tue, 23 May 2023 01:20:17 +0000
> > > > > "Liu, Yi L"  wrote:
> > > > >
> > > > > > > From: Alex Williamson 
> > > > > > > Sent: Tuesday, May 23, 2023 6:16 AM
> > > > > > >
> > > > > > > On Sat, 13 May 2023 06:28:24 -0700
> > > > > > > Yi Liu  wrote:
> > > > > > >
> > > > > > > > > return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev));
> > > > > > > > diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
> > > > > > > > index 83575b65ea01..799ea322a7d4 100644
> > > > > > > > --- a/drivers/vfio/iommufd.c
> > > > > > > > +++ b/drivers/vfio/iommufd.c
> > > > > > > > @@ -112,6 +112,24 @@ void vfio_iommufd_unbind(struct 
> > > > > > > > vfio_device_file
> *df)
> > > > > > > > vdev->ops->unbind_iommufd(vdev);
> > > > > > > >  }
> > > > > > > >
> > > > > > > > +int vfio_iommufd_attach(struct vfio_device *vdev, u32 *pt_id)
> > > > > > > > +{
> > > > > > > > > +   lockdep_assert_held(&vdev->dev_set->lock);
> > > > > > > > +
> > > > > > > > +   if (vfio_device_is_noiommu(vdev))
> > > > > > > > +   return 0;
> > > > > > >
> > > > > > > > Isn't this an invalid operation for a noiommu cdev, ie. -EINVAL?  We
> > > > > > > > return success and copy back the provided pt_id, why would a user not
> > > > > > > > consider it a bug that they can't use whatever value was there with
> > > > > > > > iommufd?
> > > > > > >
> > > > > > > Yes, this is the question I asked in [1]. At that time, it appears to me
> > > > > > > that better to allow it [2]. Maybe it's more suitable to ask it here.
> > > > >
> > > > > From an API perspective it seems wrong.  We return success without
> > > > > doing anything.  A user would be right to consider it a bug that the
> > > > > attach operation works but there's not actually any association to the
> > > > > IOAS.  Thanks,
> > > >
> > > > The current version is kind of tradeoff based on prior remarks when
> > > > I asked the question. As prior comment[2], it appears to me the attach
> > > > shall success for noiommu devices as well, but per your remark it seems
> > > > not in plan. So anyway, we may just fail the attach/detach for noiommu
> > > > devices. Is it?
> > >
> > > If a user creates an ioas within an iommufd, attaches a device to that
> > > ioas and populates it with mappings, wouldn't the user expect the
> > > device to have access to and honor those mappings?  I think that's the
> > > path we're headed down if we report a successful attach of a noiommu
> > > device to an ioas.
> >
> > makes sense. Let's just fail attach/detach for noiommu devices.
> >
> > >
> > > We need to keep in mind that noiommu was meant to be a minimally
> > > intrusive mechanism to provide a dummy vfio IOMMU backend and satisfy
> > > the group requirements, solely for the purpose of making use of the
> > > vfio device interface and without providing any DMA mapping services or
> > > expectations.  IMO, an argument that we need the attach op to succeed in
> > > order to avoid too much disruption in userspace code is nonsense.  On
> > > the contrary, userspace needs to be very aware of this difference and
> > > we shouldn't invest effort trying to make noiommu more convenient to
> > > use.  It's inherently unsafe.

Re: [Intel-gfx] [PATCH v6 09/10] vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device cdev

2023-05-25 Thread Liu, Yi L

> From: Alex Williamson 
> Sent: Thursday, May 25, 2023 3:56 AM
> On Mon, 22 May 2023 04:57:50 -0700
> Yi Liu  wrote:
> 
> > +
> > +/*
> > + * Return devid for vfio_device if the device is owned by the input
> > + * ictx.
> > + * - valid devid > 0 for the device that are bound to the input
> > + *   iommufd_ctx.
> > + * - devid == VFIO_PCI_DEVID_OWNED for the devices that have not
> > + *   been opened but but other device within its group has been
> 
> "but but"

Thanks for catching it.

> 
> > + *   bound to the input iommufd_ctx.
> > + * - devid == VFIO_PCI_DEVID_NOT_OWNED for others. e.g. vdev is
> > + *   NULL.
> > + */
> > +int vfio_iommufd_device_hot_reset_devid(struct vfio_device *vdev,
> > +   struct iommufd_ctx *ictx)
> > +{
> > +   struct iommu_group *group;
> > +   int devid;
> > +
> > +   if (!vdev)
> > +   return VFIO_PCI_DEVID_NOT_OWNED;
> > +
> > +   if (vfio_iommufd_device_ictx(vdev) == ictx)
> > +   return vfio_iommufd_device_id(vdev);
> > +
> > +   group = iommu_group_get(vdev->dev);
> > +   if (!group)
> > +   return VFIO_PCI_DEVID_NOT_OWNED;
> > +
> > +   if (iommufd_ctx_has_group(ictx, group))
> > +   devid = VFIO_PCI_DEVID_OWNED;
> > +   else
> > +   devid = VFIO_PCI_DEVID_NOT_OWNED;
> > +
> > +   iommu_group_put(group);
> > +
> > +   return devid;
> > +}

> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -650,11 +650,53 @@ enum {
> >   * VFIO_DEVICE_GET_PCI_HOT_RESET_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 12,
> >   *   struct vfio_pci_hot_reset_info)
> >   *
> > + * This command is used to query the affected devices in the hot reset for
> > + * a given device.
> > + *
> > + * This command always reports the segment, bus, and devfn information for
> > + * each affected device, and selectively reports the group_id or devid per
> > + * the way how the calling device is opened.
> > + *
> > + * - If the calling device is opened via the traditional group/container
> > + *   API, group_id is reported.  User should check if it has owned all
> > + *   the affected devices and provides a set of group fds to prove the
> > + *   ownership in VFIO_DEVICE_PCI_HOT_RESET ioctl.
> > + *
> > + * - If the calling device is opened as a cdev, devid is reported.
> > + *   Flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID is set to indicate this
> > + *   data type.  For a given affected device, it is considered owned by
> > + *   this interface if it meets the following conditions:
> > + *   1) Has a valid devid within the iommufd_ctx of the calling device.
> > + *  Ownership cannot be determined across separate iommufd_ctx and the
> > + *  cdev calling conventions do not support a proof-of-ownership model
> > + *  as provided in the legacy group interface.  In this case a valid
> > + *  devid with value greater than zero is provided in the return
> > + *  structure.
> > + *   2) Does not have a valid devid within the iommufd_ctx of the calling
> > + *  device, but belongs to the same IOMMU group as the calling device
> > + *  or another opened device that has a valid devid within the
> > + *  iommufd_ctx of the calling device.  This provides implicit ownership
> > + *  for devices within the same DMA isolation context.  In this case
> > + *  the invalid devid value of zero is provided in the return structure.
> > + *
> > + *   A devid value of -1 is provided in the return structure for devices
> 
> s/zero/VFIO_PCI_DEVID_OWNED/
> 
> s/-1/VFIO_PCI_DEVID_NOT_OWNED/

Will do.
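
To make the three-way result concrete, here is a rough userspace model of the classification described in the comment above. It is only a sketch under stated assumptions: `struct model_dev`, `ctx_has_group()` and `hot_reset_devid()` are hypothetical stand-ins (not the kernel structures or helpers), contexts and groups are plain integers, and the constant values simply follow the zero and -1 mentioned in the review. It does not model the dev_set requirement.

```c
#include <assert.h>
#include <stddef.h>

/* Values follow the review comment: zero for owned, -1 for not owned. */
#define VFIO_PCI_DEVID_OWNED      0
#define VFIO_PCI_DEVID_NOT_OWNED  (-1)

/* Hypothetical userspace stand-in for a vfio_device; not the kernel struct. */
struct model_dev {
	int bound_ctx;	/* id of the iommufd_ctx the device is bound to, 0 if none */
	int devid;	/* devid within bound_ctx, valid (> 0) only when bound */
	int group;	/* IOMMU group id, 0 if the device has no group */
};

/* Hypothetical stand-in for iommufd_ctx_has_group(): is any device in
 * the given group bound to ctx? */
static int ctx_has_group(const struct model_dev *devs, size_t n,
			 int ctx, int group)
{
	for (size_t i = 0; i < n; i++)
		if (devs[i].bound_ctx == ctx && devs[i].group == group)
			return 1;
	return 0;
}

/* Model of the devid reported for one affected device (ctx is assumed > 0). */
static int hot_reset_devid(const struct model_dev *devs, size_t n,
			   const struct model_dev *vdev, int ctx)
{
	if (!vdev)
		return VFIO_PCI_DEVID_NOT_OWNED;

	/* Bound to the caller's context: report the real devid. */
	if (vdev->bound_ctx == ctx)
		return vdev->devid;

	if (!vdev->group)
		return VFIO_PCI_DEVID_NOT_OWNED;

	/* Not bound, but implicitly owned through its IOMMU group. */
	return ctx_has_group(devs, n, ctx, vdev->group) ?
		VFIO_PCI_DEVID_OWNED : VFIO_PCI_DEVID_NOT_OWNED;
}
```

In this model a device bound to the caller's context reports its real devid, an unopened sibling in the same group reports VFIO_PCI_DEVID_OWNED, and everything else reports VFIO_PCI_DEVID_NOT_OWNED.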

> 2) above and previously in the code comment where I noted the repeated
> "but" still doesn't actually describe the requirement as I noted in the
> last review.  The user implicitly owns a device if they own another
> device within the IOMMU group, but we also impose a dev_set requirement
> in the hot reset path.  All affected devices need to be represented in
> the dev_set, ex. bound to a vfio driver.

Yes, it is. By the way, dev_set is not visible to userspace. Is it
appropriate to mention it in the uapi header, especially given that this
requirement may be relaxed as you note below?

>  It's possible that requirement
> might be relaxed in the new DMA ownership model, but as it is right
> now, the code enforces that requirement and any new discussion about
> what makes hot-reset available should note both the ownership and
> dev_set requirement.  Thanks,

I think your point is that if an iommufd_ctx has acquired DMA ownership
of an iommu_group, it means the device is owned, and it should not
matter whether all the devices in the iommu_group are present in the
dev_set. Some devices would be allowed to stay bound to the pci-stub or
pcieport driver. Is that right?

Actually I have a doubt about it. IIUC, the above dev_set requirement
is there to ensure that resets of the devices are protected by
dev_set->lock, so that there is no race between a reset issued by the
driver itself and a hot reset requested by the user. But

Re: [Intel-gfx] [PATCH v11 20/23] vfio: Add VFIO_DEVICE_[AT|DE]TACH_IOMMUFD_PT

2023-05-24 Thread Liu, Yi L

> From: Alex Williamson 
> Sent: Wednesday, May 24, 2023 11:32 PM
> 
> On Wed, 24 May 2023 02:12:14 +0000
> "Liu, Yi L"  wrote:
> 
> > > From: Alex Williamson 
> > > Sent: Tuesday, May 23, 2023 11:50 PM
> > >
> > > > On Tue, 23 May 2023 01:20:17 +0000
> > > "Liu, Yi L"  wrote:
> > >
> > > > > From: Alex Williamson 
> > > > > Sent: Tuesday, May 23, 2023 6:16 AM
> > > > >
> > > > > On Sat, 13 May 2023 06:28:24 -0700
> > > > > Yi Liu  wrote:
> > > > >
> > > > > > return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev));
> > > > > > diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
> > > > > > index 83575b65ea01..799ea322a7d4 100644
> > > > > > --- a/drivers/vfio/iommufd.c
> > > > > > +++ b/drivers/vfio/iommufd.c
> > > > > > @@ -112,6 +112,24 @@ void vfio_iommufd_unbind(struct 
> > > > > > vfio_device_file *df)
> > > > > > vdev->ops->unbind_iommufd(vdev);
> > > > > >  }
> > > > > >
> > > > > > +int vfio_iommufd_attach(struct vfio_device *vdev, u32 *pt_id)
> > > > > > +{
> > > > > > > +   lockdep_assert_held(&vdev->dev_set->lock);
> > > > > > +
> > > > > > +   if (vfio_device_is_noiommu(vdev))
> > > > > > +   return 0;
> > > > >
> > > > > Isn't this an invalid operation for a noiommu cdev, ie. -EINVAL?  We
> > > > > return success and copy back the provided pt_id, why would a user not
> > > > > consider it a bug that they can't use whatever value was there with
> > > > > iommufd?
> > > >
> > > > Yes, this is the question I asked in [1]. At that time, it appears to me
> > > > that better to allow it [2]. Maybe it's more suitable to ask it here.
> > >
> > > From an API perspective it seems wrong.  We return success without
> > > doing anything.  A user would be right to consider it a bug that the
> > > attach operation works but there's not actually any association to the
> > > IOAS.  Thanks,
> >
> > The current version is kind of tradeoff based on prior remarks when
> > I asked the question. As prior comment[2], it appears to me the attach
> > shall success for noiommu devices as well, but per your remark it seems
> > not in plan. So anyway, we may just fail the attach/detach for noiommu
> > devices. Is it?
> 
> If a user creates an ioas within an iommufd, attaches a device to that
> ioas and populates it with mappings, wouldn't the user expect the
> device to have access to and honor those mappings?  I think that's the
> path we're headed down if we report a successful attach of a noiommu
> device to an ioas.

makes sense. Let's just fail attach/detach for noiommu devices.
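
Roughly, the behavior I have in mind looks like the userspace model below. This is only a sketch, not the actual patch: `struct model_dev`, `model_attach()` and `model_detach()` are hypothetical names, and -EINVAL is assumed as the error code because that is what was suggested earlier in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for a vfio_device; not the kernel struct. */
struct model_dev {
	bool noiommu;	/* device has no IOMMU backing */
	bool attached;	/* currently attached to a page table */
	uint32_t pt_id;	/* id of the attached IOAS/hw_pagetable */
};

/* Attach is rejected for noiommu devices instead of silently succeeding,
 * so a successful return always implies a real association. */
static int model_attach(struct model_dev *dev, const uint32_t *pt_id)
{
	if (dev->noiommu)
		return -EINVAL;

	dev->attached = true;
	dev->pt_id = *pt_id;
	return 0;
}

/* Detach is rejected in the same way, keeping the pair symmetric. */
static int model_detach(struct model_dev *dev)
{
	if (dev->noiommu)
		return -EINVAL;

	dev->attached = false;
	return 0;
}
```

With this shape, userspace that opens a noiommu device learns from the first attach call that there is no IOAS association, rather than discovering it later through mappings that are never honored.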

> 
> We need to keep in mind that noiommu was meant to be a minimally
> intrusive mechanism to provide a dummy vfio IOMMU backend and satisfy
> the group requirements, solely for the purpose of making use of the
> vfio device interface and without providing any DMA mapping services or
> expectations.  IMO, an argument that we need the attach op to succeed in
> order to avoid too much disruption in userspace code is nonsense.  On
> the contrary, userspace needs to be very aware of this difference and
> we shouldn't invest effort trying to make noiommu more convenient to
> use.  It's inherently unsafe.
> 
> I'm not fond of what a mess noiommu has become with cdev, we're well
> beyond the minimal code trickery of the legacy implementation.  I hate
> to ask, but could we reiterate our requirements for noiommu as a part of
> the native iommufd interface for vfio?  The nested userspace requirement
> is gone now that hypervisors have vIOMMU support, so my assumption is
> that this is only for bare metal systems without an IOMMU, which
> ideally are less and less prevalent.  Are there any noiommu userspaces
> that are actually going to adopt the noiommu cdev interface?  What
> terrible things happen if noiommu only exists in the vfio group compat
> interface to iommufd and at some distant point in the future dies when
> that gets disabled?

vIOMMU may introduce some performance degradation if there
are frequent map/unmap operations. As far as I know, some cloud
service providers prefer to use noiommu mode within a VM.
Besides the performance consideration, booting a VM
without vIOMMU is supposed to be more robust. But I'm not
sure whether noiommu userspace will adopt cdev noiommu.
Perhaps yes, if group may be deprecated in the future.

> > btw. Should we document it somewhere as well? E.g. noiommu userspace
> > does not support attach/detach? Userspace should know it is opening
> > noiommu devices.
> 
> Documentation never hurts.  This is such a specialized use case I'm not
> sure we've bothered to do much documentation for noiommu previously.

It seems not; I didn't find any dedicated documentation for noiommu.
Perhaps a comment in the source code is enough. It depends on your
preference.

Regards,
Yi Liu

Re: [Intel-gfx] [PATCH v11 19/23] vfio: Add VFIO_DEVICE_BIND_IOMMUFD

2023-05-24 Thread Liu, Yi L

> From: Liu, Yi L 
> Sent: Wednesday, May 24, 2023 10:41 AM
> 
> > From: Tian, Kevin 
> > Sent: Wednesday, May 24, 2023 10:39 AM
> >
> > > From: Liu, Yi L 
> > > Sent: Wednesday, May 24, 2023 10:21 AM
> > >
> > > > >
> > > > > vfio_device_open_file()
> > > > > {
> > > > >   dev_warn(device->dev, "vfio-noiommu device opened by user "
> > > > >  "(%s:%d)\n", current->comm, task_pid_nr(current));
> > > > > }
> > > >
> > > > There needs to be a taint when VFIO_GROUP is disabled.  Thanks,
> > > I see. I misunderstood you. You are asking for a taint. 
> > >
> > > Actually, I've considered it. But it appears to me the taint in
> > > vfio_group_find_or_alloc() is due to vfio allocates fake iommu_group.
> > > This seems to be a taint to kernel. But now, you are suggesting to add
> > > a taint as long as noiommu device is registered to vfio. Is it? If so,
> >
> > taint is required because the kernel is exposed to user DMA attack
> > due to lacking of IOMMU protection.
> >
> > fake iommu_group is just to meet vfio_group requirement.
> 
> Got it. thanks.

Please refer to the proposed change in [1]. The noiommu taint is
moved to the end of __vfio_register_dev() and relies on the noiommu
flag set by vfio_device_set_noiommu().

[1] https://lore.kernel.org/kvm/ds0pr11mb752907d211e3703145503a12c3...@ds0pr11mb7529.namprd11.prod.outlook.com/

Regards,
Yi Liu

Re: [Intel-gfx] [PATCH v11 21/23] vfio: Determine noiommu device in __vfio_register_dev()

2023-05-24 Thread Liu, Yi L

Hi Alex,

I have two new patches to address the comments on this patch. If
they make sense, I'll include them in cdev v12.

> From: Liu, Yi L 
> Sent: Tuesday, May 23, 2023 10:13 AM
>
> > From: Alex Williamson 
> > Sent: Tuesday, May 23, 2023 7:04 AM
> >
> > On Sat, 13 May 2023 06:28:25 -0700
> > Yi Liu  wrote:
> >
> > > This is to make the cdev path and group path consistent for the noiommu
> > > devices registration. If vfio_noiommu is disabled, such registration
> > > should fail. However, this check is vfio_device_set_group() which is part
> > > of the vfio_group code. If the vfio_group code is compiled out, noiommu
> > > devices would be registered even vfio_noiommu is disabled.
> > >
> > > This adds vfio_device_set_noiommu() which can fail and calls it in the
> > > device registration. For now, it never fails as long as
> > > vfio_device_set_group() is successful. But when the vfio_group code is
> > > compiled out, vfio_device_set_noiommu() would fail the noiommu devices
> > > when vfio_noiommu is disabled.
> >
> > I'm lost.  After the next patch we end up with the following when
> > CONFIG_VFIO_GROUP is set:
> >
> > static inline int vfio_device_set_noiommu(struct vfio_device *device)
> > {
> > device->noiommu = IS_ENABLED(CONFIG_VFIO_NOIOMMU) &&
> >   device->group->type == VFIO_NO_IOMMU;
> > return 0;
> > }
> >
> > I think this is relying on the fact that vfio_device_set_group() which
> > is called immediately prior to this function would have performed the
> > testing for noiommu and failed prior to this function being called and
> > therefore there is no error return here.
> 
> Yes.

Remove the IS_ENABLED(CONFIG_VFIO_NOIOMMU) check:

From 3e93d33dc426350389a89130557a212cf370fee6 Mon Sep 17 00:00:00 2001
From: Yi Liu 
Date: Tue, 23 May 2023 20:48:08 -0700
Subject: [PATCH 19/23] vfio: Only check group->type for noiommu test

group->type can be VFIO_NO_IOMMU only when vfio_noiommu option is true.
And vfio_noiommu option can only be true if CONFIG_VFIO_NOIOMMU is enabled.
So checking group->type is enough when testing noiommu.

Signed-off-by: Yi Liu 
---
 drivers/vfio/group.c | 3 +--
 drivers/vfio/vfio.h  | 3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index c930406cc261..3b56959fcdbb 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -133,8 +133,7 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group,
 
iommufd = iommufd_ctx_from_file(f.file);
if (!IS_ERR(iommufd)) {
-   if (IS_ENABLED(CONFIG_VFIO_NOIOMMU) &&
-   group->type == VFIO_NO_IOMMU)
+   if (group->type == VFIO_NO_IOMMU)
ret = iommufd_vfio_compat_set_no_iommu(iommufd);
else
ret = iommufd_vfio_compat_ioas_create(iommufd);
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 0f66d0934e91..104c2ee93833 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -108,8 +108,7 @@ void vfio_group_cleanup(void);
 
 static inline bool vfio_device_is_noiommu(struct vfio_device *vdev)
 {
-   return IS_ENABLED(CONFIG_VFIO_NOIOMMU) &&
-  vdev->group->type == VFIO_NO_IOMMU;
+   return vdev->group->type == VFIO_NO_IOMMU;
 }
 
 #if IS_ENABLED(CONFIG_VFIO_CONTAINER)
-- 
2.34.1

> >
> > Note also here that I think CONFIG_VFIO_NOIOMMU was only being tested
> > here previously so that a smart enough compiler would optimize out the
> > entire function, we can never set a VFIO_NO_IOMMU type when
> > !CONFIG_VFIO_NOIOMMU.
> 
> You are right. VFIO_NO_IOMMU type is only set when vfio_noiommu==true.
> 
> > That's no longer the case if the function is
> > refactored like this.
> >
> > When !CONFIG_VFIO_GROUP:
> >
> > static inline int vfio_device_set_noiommu(struct vfio_device *device)
> > {
> > struct iommu_group *iommu_group;
> >
> > iommu_group = iommu_group_get(device->dev);
> > if (!iommu_group) {
> > if (!IS_ENABLED(CONFIG_VFIO_NOIOMMU) || !vfio_noiommu)
> > return -EINVAL;
> > device->noiommu = true;
> > } else {
> > iommu_group_put(iommu_group);
> > device->noiommu = false;
> > }
> >
> > return 0;
> > }
> >
> > Here again, the NOIOMMU config option is irrelevant, vfio_noiommu can
> > only be true if the config option is enabled.  Therefore if there's

Re: [Intel-gfx] [PATCH v11 19/23] vfio: Add VFIO_DEVICE_BIND_IOMMUFD

2023-05-23 Thread Liu, Yi L

> From: Tian, Kevin 
> Sent: Wednesday, May 24, 2023 10:39 AM
> 
> > From: Liu, Yi L 
> > Sent: Wednesday, May 24, 2023 10:21 AM
> >
> > > >
> > > > vfio_device_open_file()
> > > > {
> > > > dev_warn(device->dev, "vfio-noiommu device opened by user "
> > > >"(%s:%d)\n", current->comm, task_pid_nr(current));
> > > > }
> > >
> > > There needs to be a taint when VFIO_GROUP is disabled.  Thanks,
> > I see. I misunderstood you. You are asking for a taint. 
> >
> > Actually, I've considered it. But it appears to me the taint in
> > vfio_group_find_or_alloc() is due to vfio allocates fake iommu_group.
> > This seems to be a taint to kernel. But now, you are suggesting to add
> > a taint as long as noiommu device is registered to vfio. Is it? If so,
> 
> taint is required because the kernel is exposed to user DMA attack
> due to lacking of IOMMU protection.
> 
> fake iommu_group is just to meet vfio_group requirement.

Got it. thanks.

Regards,
Yi Liu

Re: [Intel-gfx] [PATCH v11 19/23] vfio: Add VFIO_DEVICE_BIND_IOMMUFD

2023-05-23 Thread Liu, Yi L

> From: Alex Williamson 
> Sent: Tuesday, May 23, 2023 11:51 PM
> 
> On Tue, 23 May 2023 01:41:36 +0000
> "Liu, Yi L"  wrote:
> 
> > > From: Alex Williamson 
> > > Sent: Tuesday, May 23, 2023 6:01 AM
> > >
> > > On Sat, 13 May 2023 06:28:23 -0700
> > > Yi Liu  wrote:
> > >
> > > > This adds ioctl for userspace to bind device cdev fd to iommufd.
> > > >
> > > > VFIO_DEVICE_BIND_IOMMUFD: bind device to an iommufd, hence gain DMA
> > > > >   control provided by the iommufd. open_device
> > > > >   op is called after bind_iommufd op.
> > > >
> > > > Tested-by: Yanting Jiang 
> > > > Tested-by: Shameer Kolothum 
> > > > Signed-off-by: Yi Liu 
> > > > ---
> > > >  drivers/vfio/device_cdev.c | 130 +
> > > >  drivers/vfio/vfio.h|  13 
> > > >  drivers/vfio/vfio_main.c   |   5 ++
> > > >  include/linux/vfio.h   |   3 +-
> > > >  include/uapi/linux/vfio.h  |  28 
> > > >  5 files changed, 178 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
> > > > index 1c640016a824..291cc678a18b 100644
> > > > --- a/drivers/vfio/device_cdev.c
> > > > +++ b/drivers/vfio/device_cdev.c
> > > > @@ -3,6 +3,7 @@
> > > >   * Copyright (c) 2023 Intel Corporation.
> > > >   */
> > > > >  #include <linux/vfio.h>
> > > > > +#include <linux/iommufd.h>
> > > >
> > > >  #include "vfio.h"
> > > >
> > > > @@ -44,6 +45,135 @@ int vfio_device_fops_cdev_open(struct inode *inode, 
> > > > struct
> > > file *filep)
> > > > return ret;
> > > >  }
> > > >
> > > > +static void vfio_device_get_kvm_safe(struct vfio_device_file *df)
> > > > +{
> > > > > +   spin_lock(&df->kvm_ref_lock);
> > > > > +   if (df->kvm)
> > > > > +   _vfio_device_get_kvm_safe(df->device, df->kvm);
> > > > > +   spin_unlock(&df->kvm_ref_lock);
> > > > +}
> > > > +
> > > > +void vfio_device_cdev_close(struct vfio_device_file *df)
> > > > +{
> > > > +   struct vfio_device *device = df->device;
> > > > +
> > > > +   /*
> > > > +* In the time of close, there is no contention with another one
> > > > +* changing this flag.  So read df->access_granted without lock
> > > > +* and no smp_load_acquire() is ok.
> > > > +*/
> > > > +   if (!df->access_granted)
> > > > +   return;
> > > > +
> > > > > +   mutex_lock(&device->dev_set->lock);
> > > > > +   vfio_device_close(df);
> > > > > +   vfio_device_put_kvm(device);
> > > > > +   iommufd_ctx_put(df->iommufd);
> > > > > +   device->cdev_opened = false;
> > > > > +   mutex_unlock(&device->dev_set->lock);
> > > > +   vfio_device_unblock_group(device);
> > > > +}
> > > > +
> > > > +static struct iommufd_ctx *vfio_get_iommufd_from_fd(int fd)
> > > > +{
> > > > +   struct iommufd_ctx *iommufd;
> > > > +   struct fd f;
> > > > +
> > > > +   f = fdget(fd);
> > > > +   if (!f.file)
> > > > +   return ERR_PTR(-EBADF);
> > > > +
> > > > +   iommufd = iommufd_ctx_from_file(f.file);
> > > > +
> > > > +   fdput(f);
> > > > +   return iommufd;
> > > > +}
> > > > +
> > > > +long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
> > > > > +   struct vfio_device_bind_iommufd __user *arg)
> > > > +{
> > > > +   struct vfio_device *device = df->device;
> > > > +   struct vfio_device_bind_iommufd bind;
> > > > +   unsigned long minsz;
> > > > +   int ret;
> > > > +
> > > > +   static_assert(__same_type(arg->out_devid, df->devid));
> > > > +
> > > > +   minsz = offsetofend(struct vfio_device_bind_iommufd, out_devid);
> > > > +
> > > > > +   if (copy_from_user(&bind, arg, mins

Re: [Intel-gfx] [PATCH v11 20/23] vfio: Add VFIO_DEVICE_[AT|DE]TACH_IOMMUFD_PT

2023-05-23 Thread Liu, Yi L

> From: Alex Williamson 
> Sent: Tuesday, May 23, 2023 11:50 PM
> 
> On Tue, 23 May 2023 01:20:17 +0000
> "Liu, Yi L"  wrote:
> 
> > > From: Alex Williamson 
> > > Sent: Tuesday, May 23, 2023 6:16 AM
> > >
> > > On Sat, 13 May 2023 06:28:24 -0700
> > > Yi Liu  wrote:
> > >
> > > > This adds ioctl for userspace to attach device cdev fd to and detach
> > > > from IOAS/hw_pagetable managed by iommufd.
> > > >
> > > > > VFIO_DEVICE_ATTACH_IOMMUFD_PT: attach vfio device to IOAS, hw_pagetable
> > > > >    managed by iommufd. Attach can be
> > > > >    undone by VFIO_DEVICE_DETACH_IOMMUFD_PT
> > > > >    or device fd close.
> > > > > VFIO_DEVICE_DETACH_IOMMUFD_PT: detach vfio device from the current attached
> > > > >    IOAS or hw_pagetable managed by iommufd.
> > > >
> > > > Tested-by: Yanting Jiang 
> > > > Tested-by: Shameer Kolothum 
> > > > Signed-off-by: Yi Liu 
> > > > ---
> > > >  drivers/vfio/device_cdev.c | 66 ++
> > > >  drivers/vfio/iommufd.c | 18 +++
> > > >  drivers/vfio/vfio.h| 18 +++
> > > >  drivers/vfio/vfio_main.c   |  8 +
> > > >  include/uapi/linux/vfio.h  | 52 ++
> > > >  5 files changed, 162 insertions(+)
> > > >
> > > > diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
> > > > index 291cc678a18b..3f14edb80a93 100644
> > > > --- a/drivers/vfio/device_cdev.c
> > > > +++ b/drivers/vfio/device_cdev.c
> > > > @@ -174,6 +174,72 @@ long vfio_device_ioctl_bind_iommufd(struct
> vfio_device_file
> > > *df,
> > > > return ret;
> > > >  }
> > > >
> > > > +int vfio_ioctl_device_attach(struct vfio_device_file *df,
> > > > > +struct vfio_device_attach_iommufd_pt __user *arg)
> > > > +{
> > > > +   struct vfio_device *device = df->device;
> > > > +   struct vfio_device_attach_iommufd_pt attach;
> > > > +   unsigned long minsz;
> > > > +   int ret;
> > > > +
> > > > > +   minsz = offsetofend(struct vfio_device_attach_iommufd_pt, pt_id);
> > > > > +
> > > > > +   if (copy_from_user(&attach, arg, minsz))
> > > > +   return -EFAULT;
> > > > +
> > > > +   if (attach.argsz < minsz || attach.flags)
> > > > +   return -EINVAL;
> > > > +
> > > > +   /* ATTACH only allowed for cdev fds */
> > > > +   if (df->group)
> > > > +   return -EINVAL;
> > > > +
> > > > > +   mutex_lock(&device->dev_set->lock);
> > > > > +   ret = vfio_iommufd_attach(device, &attach.pt_id);
> > > > > +   if (ret)
> > > > > +   goto out_unlock;
> > > > > +
> > > > > +   ret = copy_to_user(&arg->pt_id, &attach.pt_id,
> > > > > +  sizeof(attach.pt_id)) ? -EFAULT : 0;
> > > > > +   if (ret)
> > > > > +   goto out_detach;
> > > > > +   mutex_unlock(&device->dev_set->lock);
> > > > +
> > > > +   return 0;
> > > > +
> > > > +out_detach:
> > > > +   vfio_iommufd_detach(device);
> > > > +out_unlock:
> > > > > +   mutex_unlock(&device->dev_set->lock);
> > > > +   return ret;
> > > > +}
> > > > +
> > > > +int vfio_ioctl_device_detach(struct vfio_device_file *df,
> > > > > +struct vfio_device_detach_iommufd_pt __user *arg)
> > > > +{
> > > > +   struct vfio_device *device = df->device;
> > > > +   struct vfio_device_detach_iommufd_pt detach;
> > > > +   unsigned long minsz;
> > > > +
> > > > > +   minsz = offsetofend(struct vfio_device_detach_iommufd_pt, flags);
> > > > > +
> > > > > +   if (copy_from_user(&detach, arg, minsz))
> > > > +   return -EFAULT;
> > > > +
> > > > +   if (detach.argsz < minsz || detach.flags)
> > > > +

Re: [Intel-gfx] [PATCH v11 21/23] vfio: Determine noiommu device in __vfio_register_dev()

2023-05-22 Thread Liu, Yi L

> From: Alex Williamson 
> Sent: Tuesday, May 23, 2023 7:04 AM
> 
> On Sat, 13 May 2023 06:28:25 -0700
> Yi Liu  wrote:
> 
> > This is to make the cdev path and group path consistent for the noiommu
> > devices registration. If vfio_noiommu is disabled, such registration
> > should fail. However, this check is vfio_device_set_group() which is part
> > of the vfio_group code. If the vfio_group code is compiled out, noiommu
> > devices would be registered even vfio_noiommu is disabled.
> >
> > This adds vfio_device_set_noiommu() which can fail and calls it in the
> > device registration. For now, it never fails as long as
> > vfio_device_set_group() is successful. But when the vfio_group code is
> > compiled out, vfio_device_set_noiommu() would fail the noiommu devices
> > when vfio_noiommu is disabled.
> 
> I'm lost.  After the next patch we end up with the following when
> CONFIG_VFIO_GROUP is set:
> 
> static inline int vfio_device_set_noiommu(struct vfio_device *device)
> {
> device->noiommu = IS_ENABLED(CONFIG_VFIO_NOIOMMU) &&
>   device->group->type == VFIO_NO_IOMMU;
> return 0;
> }
> 
> I think this is relying on the fact that vfio_device_set_group() which
> is called immediately prior to this function would have performed the
> testing for noiommu and failed prior to this function being called and
> therefore there is no error return here.

Yes.

> 
> Note also here that I think CONFIG_VFIO_NOIOMMU was only being tested
> here previously so that a smart enough compiler would optimize out the
> entire function, we can never set a VFIO_NO_IOMMU type when
> !CONFIG_VFIO_NOIOMMU.

You are right. VFIO_NO_IOMMU type is only set when vfio_noiommu==true.

> That's no longer the case if the function is
> refactored like this.
> 
> When !CONFIG_VFIO_GROUP:
> 
> static inline int vfio_device_set_noiommu(struct vfio_device *device)
> {
> struct iommu_group *iommu_group;
> 
> iommu_group = iommu_group_get(device->dev);
> if (!iommu_group) {
> if (!IS_ENABLED(CONFIG_VFIO_NOIOMMU) || !vfio_noiommu)
> return -EINVAL;
> device->noiommu = true;
> } else {
> iommu_group_put(iommu_group);
> device->noiommu = false;
> }
> 
> return 0;
> }
> 
> Here again, the NOIOMMU config option is irrelevant, vfio_noiommu can
> only be true if the config option is enabled.  Therefore if there's no
> IOMMU group and the module option to enable noiommu is not set, return
> an error.

Yes. I missed that the vfio_noiommu option can only be true when
CONFIG_VFIO_NOIOMMU is enabled. That is why the check was written as
"IS_ENABLED(CONFIG_VFIO_NOIOMMU) && device->group->type == VFIO_NO_IOMMU;",
which makes the two conditions look orthogonal when they are not.

> 
> It's really quite ugly that in one mode we rely on this function to
> generate the error and in the other mode it happens prior to getting
> here.
> 
> The above could be simplified to something like:
> 
>   iommu_group = iommu_group_get(device->dev);
>   if (!iommu_group && !vfio_noiommu)
>   return -EINVAL;
> 
>   device->noiommu = !iommu_group;
>   iommu_group_put(iommu_group); /* Accepts NULL */
>   return 0;
> 
> Which would actually work regardless of CONFIG_VFIO_GROUP, where maybe
> this could then be moved before vfio_device_set_group() and become the
> de facto exit point for invalid noiommu configurations and maybe we
> could remove the test from the group code (with a comment to note that
> it's been tested prior)?  Thanks,

Yes, this looks much clearer. I think this new logic must be called before
vfio_device_set_group(). Otherwise, iommu_group_get() may return the fake
groups, and the logic would then need to work differently depending on the
CONFIG_VFIO_GROUP configuration.
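
To make the ordering argument concrete, here is a minimal userspace sketch
of the simplified check. All model_* names and the struct layouts are
hypothetical stand-ins, not the kernel definitions; only the control flow
follows the snippet quoted above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct iommu_group { int id; };
struct device { struct iommu_group *group; };
struct vfio_device { struct device *dev; bool noiommu; };

/* Models the vfio_noiommu module option. */
static bool vfio_noiommu;

/* Stand-in for iommu_group_get(): returns the device's group or NULL. */
static struct iommu_group *model_iommu_group_get(struct device *dev)
{
	return dev->group;
}

/* Stand-in for iommu_group_put(); the model also accepts NULL. */
static void model_iommu_group_put(struct iommu_group *group)
{
	(void)group;
}

static int model_device_set_noiommu(struct vfio_device *device)
{
	struct iommu_group *iommu_group;

	assert(device != NULL && device->dev != NULL);
	iommu_group = model_iommu_group_get(device->dev);

	/* No IOMMU group and noiommu not enabled: invalid configuration. */
	if (!iommu_group && !vfio_noiommu)
		return -EINVAL;

	device->noiommu = !iommu_group;
	model_iommu_group_put(iommu_group);
	return 0;
}
```

In this model a device without a group is rejected unless the option is
set, and a device with a group is never marked noiommu. If a fake group
were attached first, the lookup would find it and the device would wrongly
end up with noiommu == false.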

Regards,
Yi Liu
> 
> > As noiommu devices is checked and there are multiple places which needs
> > to test noiommu devices, this also adds a flag to mark noiommu devices.
> > Hence the callers of vfio_device_is_noiommu() can be converted to test
> > vfio_device->noiommu.
> >
> > Reviewed-by: Kevin Tian 
> > Tested-by: Nicolin Chen 
> > Tested-by: Yanting Jiang 
> > Signed-off-by: Yi Liu 
> > ---
> >  drivers/vfio/device_cdev.c |  4 ++--
> >  drivers/vfio/group.c   |  2 +-
> >  drivers/vfio/iommufd.c | 10 +-
> >  drivers/vfio/vfio.h|  7 ---
> >  drivers/vfio/vfio_main.c   |  6 +-
> >  include/linux/vfio.h   |  1 +
> >  6 files changed, 18 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
> > index 3f14edb80a93..6d7f50ee535d 100644
> > --- a/drivers/vfio/device_cdev.c
> > +++ b/drivers/vfio/device_cdev.c
> > @@ -111,7 +111,7 @@ long vfio_device_ioctl_bind_iommufd(struct 
> > vfio_device_file
> *df,
> > if (df->group)
> > return -EINVAL;
> >
> > -   if (vfio_device_is_noiommu(device) && !capable(CAP_SYS_RAWIO))
> > +

Re: [Intel-gfx] [PATCH v11 19/23] vfio: Add VFIO_DEVICE_BIND_IOMMUFD

2023-05-22 Thread Liu, Yi L

> From: Alex Williamson 
> Sent: Tuesday, May 23, 2023 6:01 AM
> 
> On Sat, 13 May 2023 06:28:23 -0700
> Yi Liu  wrote:
> 
> > This adds ioctl for userspace to bind device cdev fd to iommufd.
> >
> > VFIO_DEVICE_BIND_IOMMUFD: bind device to an iommufd, hence gain DMA
> >   control provided by the iommufd. open_device
> >   op is called after bind_iommufd op.
> >
> > Tested-by: Yanting Jiang 
> > Tested-by: Shameer Kolothum 
> > Signed-off-by: Yi Liu 
> > ---
> >  drivers/vfio/device_cdev.c | 130 +
> >  drivers/vfio/vfio.h|  13 
> >  drivers/vfio/vfio_main.c   |   5 ++
> >  include/linux/vfio.h   |   3 +-
> >  include/uapi/linux/vfio.h  |  28 
> >  5 files changed, 178 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
> > index 1c640016a824..291cc678a18b 100644
> > --- a/drivers/vfio/device_cdev.c
> > +++ b/drivers/vfio/device_cdev.c
> > @@ -3,6 +3,7 @@
> >   * Copyright (c) 2023 Intel Corporation.
> >   */
> >  #include 
> > +#include 
> >
> >  #include "vfio.h"
> >
> > @@ -44,6 +45,135 @@ int vfio_device_fops_cdev_open(struct inode *inode, 
> > struct
> file *filep)
> > return ret;
> >  }
> >
> > +static void vfio_device_get_kvm_safe(struct vfio_device_file *df)
> > +{
> > +   spin_lock(&df->kvm_ref_lock);
> > +   if (df->kvm)
> > +   _vfio_device_get_kvm_safe(df->device, df->kvm);
> > +   spin_unlock(&df->kvm_ref_lock);
> > +}
> > +
> > +void vfio_device_cdev_close(struct vfio_device_file *df)
> > +{
> > +   struct vfio_device *device = df->device;
> > +
> > +   /*
> > +* In the time of close, there is no contention with another one
> > +* changing this flag.  So read df->access_granted without lock
> > +* and no smp_load_acquire() is ok.
> > +*/
> > +   if (!df->access_granted)
> > +   return;
> > +
> > +   mutex_lock(&device->dev_set->lock);
> > +   vfio_device_close(df);
> > +   vfio_device_put_kvm(device);
> > +   iommufd_ctx_put(df->iommufd);
> > +   device->cdev_opened = false;
> > +   mutex_unlock(&device->dev_set->lock);
> > +   vfio_device_unblock_group(device);
> > +}
> > +
> > +static struct iommufd_ctx *vfio_get_iommufd_from_fd(int fd)
> > +{
> > +   struct iommufd_ctx *iommufd;
> > +   struct fd f;
> > +
> > +   f = fdget(fd);
> > +   if (!f.file)
> > +   return ERR_PTR(-EBADF);
> > +
> > +   iommufd = iommufd_ctx_from_file(f.file);
> > +
> > +   fdput(f);
> > +   return iommufd;
> > +}
> > +
> > +long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
> > +   struct vfio_device_bind_iommufd __user *arg)
> > +{
> > +   struct vfio_device *device = df->device;
> > +   struct vfio_device_bind_iommufd bind;
> > +   unsigned long minsz;
> > +   int ret;
> > +
> > +   static_assert(__same_type(arg->out_devid, df->devid));
> > +
> > +   minsz = offsetofend(struct vfio_device_bind_iommufd, out_devid);
> > +
> > +   if (copy_from_user(&bind, arg, minsz))
> > +   return -EFAULT;
> > +
> > +   if (bind.argsz < minsz || bind.flags || bind.iommufd < 0)
> > +   return -EINVAL;
> > +
> > +   /* BIND_IOMMUFD only allowed for cdev fds */
> > +   if (df->group)
> > +   return -EINVAL;
> > +
> > +   if (vfio_device_is_noiommu(device) && !capable(CAP_SYS_RAWIO))
> > +   return -EPERM;
> > +
> > +   ret = vfio_device_block_group(device);
> > +   if (ret)
> > +   return ret;
> > +
> > +   mutex_lock(&device->dev_set->lock);
> > +   /* one device cannot be bound twice */
> > +   if (df->access_granted) {
> > +   ret = -EINVAL;
> > +   goto out_unlock;
> > +   }
> > +
> > +   df->iommufd = vfio_get_iommufd_from_fd(bind.iommufd);
> > +   if (IS_ERR(df->iommufd)) {
> > +   ret = PTR_ERR(df->iommufd);
> > +   df->iommufd = NULL;
> > +   goto out_unlock;
> > +   }
> > +
> > +   /*
> > +* Before the device open, get the KVM pointer currently
> > +* associated with the device file (if there is) and obtain
> > +* a reference.  This reference is held until device closed.
> > +* Save the pointer in the device for use by drivers.
> > +*/
> > +   vfio_device_get_kvm_safe(df);
> > +
> > +   ret = vfio_device_open(df);
> > +   if (ret)
> > +   goto out_put_kvm;
> > +
> > +   ret = copy_to_user(&arg->out_devid, &df->devid,
> > +  sizeof(df->devid)) ? -EFAULT : 0;
> > +   if (ret)
> > +   goto out_close_device;
> > +
> > +   /*
> > +* Paired with smp_load_acquire() in vfio_device_fops::ioctl/
> > +* read/write/mmap
> > +*/
> > +   smp_store_release(&df->access_granted, true);
> > +   device->cdev_opened = true;
> > +   mutex_unlock(&device->dev_set->lock);
> > +
> > +   if (vfio_device_is_noiommu(device))
> > +   dev_warn(device->dev, "noiommu device is bound to iommufd by 
> > user
> "
> > +"(%s:%d)\n", current->comm, task_pid_nr(current));
> 
> The noiommu kernel

Re: [Intel-gfx] [PATCH v11 13/23] vfio-iommufd: Add detach_ioas support for physical VFIO devices

2023-05-22 Thread Liu, Yi L

> From: Alex Williamson 
> Sent: Tuesday, May 23, 2023 4:59 AM
> 
> On Mon, 22 May 2023 14:46:17 -0600
> Alex Williamson  wrote:
> 
> > On Sat, 13 May 2023 06:28:17 -0700
> > Yi Liu  wrote:
> >
> > > this prepares for adding DETACH ioctl for physical VFIO devices.
> > >
> > > Reviewed-by: Kevin Tian 
> > > Tested-by: Terrence Xu 
> > > Tested-by: Nicolin Chen 
> > > Tested-by: Matthew Rosato 
> > > Tested-by: Yanting Jiang 
> > > Tested-by: Shameer Kolothum 
> > > Signed-off-by: Yi Liu 
> > > ---
> > >  Documentation/driver-api/vfio.rst |  8 +---
> > >  drivers/vfio/fsl-mc/vfio_fsl_mc.c |  1 +
> > >  drivers/vfio/iommufd.c| 20 +++
> > >  .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c|  2 ++
> > >  drivers/vfio/pci/mlx5/main.c  |  1 +
> > >  drivers/vfio/pci/vfio_pci.c   |  1 +
> > >  drivers/vfio/platform/vfio_amba.c |  1 +
> > >  drivers/vfio/platform/vfio_platform.c |  1 +
> > >  drivers/vfio/vfio_main.c  |  3 ++-
> > >  include/linux/vfio.h  |  8 +++-
> > >  10 files changed, 41 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/Documentation/driver-api/vfio.rst 
> > > b/Documentation/driver-api/vfio.rst
> > > index 68abc089d6dd..363e12c90b87 100644
> > > --- a/Documentation/driver-api/vfio.rst
> > > +++ b/Documentation/driver-api/vfio.rst
> > > @@ -279,6 +279,7 @@ similar to a file operations structure::
> > >   struct iommufd_ctx *ictx, u32 
> > > *out_device_id);
> > >   void(*unbind_iommufd)(struct vfio_device *vdev);
> > >   int (*attach_ioas)(struct vfio_device *vdev, u32 *pt_id);
> > > + void(*detach_ioas)(struct vfio_device *vdev);
> > >   int (*open_device)(struct vfio_device *vdev);
> > >   void(*close_device)(struct vfio_device *vdev);
> > >   ssize_t (*read)(struct vfio_device *vdev, char __user *buf,
> > > @@ -315,9 +316,10 @@ container_of().
> > >   - The [un]bind_iommufd callbacks are issued when the device is bound to
> > > and unbound from iommufd.
> > >
> > > - - The attach_ioas callback is issued when the device is attached to an
> > > -   IOAS managed by the bound iommufd. The attached IOAS is automatically
> > > -   detached when the device is unbound from iommufd.
> > > + - The [de]attach_ioas callback is issued when the device is attached to
> > > +   and detached from an IOAS managed by the bound iommufd. However, the
> > > +   attached IOAS can also be automatically detached when the device is
> > > +   unbound from iommufd.
> > >
> > >   - The read/write/mmap callbacks implement the device region access 
> > > defined
> > > by the device's own VFIO_DEVICE_GET_REGION_INFO ioctl.
> > > diff --git a/drivers/vfio/fsl-mc/vfio_fsl_mc.c 
> > > b/drivers/vfio/fsl-mc/vfio_fsl_mc.c
> > > index c89a047a4cd8..d540cf683d93 100644
> > > --- a/drivers/vfio/fsl-mc/vfio_fsl_mc.c
> > > +++ b/drivers/vfio/fsl-mc/vfio_fsl_mc.c
> > > @@ -594,6 +594,7 @@ static const struct vfio_device_ops vfio_fsl_mc_ops = 
> > > {
> > >   .bind_iommufd   = vfio_iommufd_physical_bind,
> > >   .unbind_iommufd = vfio_iommufd_physical_unbind,
> > >   .attach_ioas= vfio_iommufd_physical_attach_ioas,
> > > + .detach_ioas= vfio_iommufd_physical_detach_ioas,
> > >  };
> > >
> > >  static struct fsl_mc_driver vfio_fsl_mc_driver = {
> > > diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
> > > index 07b58c4625b5..e3953e1617a5 100644
> > > --- a/drivers/vfio/iommufd.c
> > > +++ b/drivers/vfio/iommufd.c
> > > @@ -167,6 +167,14 @@ int vfio_iommufd_physical_attach_ioas(struct 
> > > vfio_device
> *vdev, u32 *pt_id)
> > >  {
> > >   int rc;
> > >
> > > + lockdep_assert_held(&vdev->dev_set->lock);
> > > +
> > > + if (WARN_ON(!vdev->iommufd_device))
> > > + return -EINVAL;
> > > +
> > > + if (vdev->iommufd_attached)
> > > + return -EBUSY;
> > > +
> > >   rc = iommufd_device_attach(vdev->iommufd_device, pt_id);
> > >   if (rc)
> > >   return rc;
> > > @@ -175,6 +183,18 @@ int vfio_iommufd_physical_attach_ioas(struct 
> > > vfio_device
> *vdev, u32 *pt_id)
> > >  }
> > >  EXPORT_SYMBOL_GPL(vfio_iommufd_physical_attach_ioas);
> > >
> > > +void vfio_iommufd_physical_detach_ioas(struct vfio_device *vdev)
> > > +{
> > > + lockdep_assert_held(&vdev->dev_set->lock);
> > > +
> > > + if (WARN_ON(!vdev->iommufd_device) || !vdev->iommufd_attached)
> > > + return;
> > > +
> > > + iommufd_device_detach(vdev->iommufd_device);
> > > + vdev->iommufd_attached = false;
> > > +}
> > > +EXPORT_SYMBOL_GPL(vfio_iommufd_physical_detach_ioas);
> >
> > Can't a user trigger this WARN_ON by simply repeatedly calling the
> > to-be-added detach ioctl?  Thanks,
> 
> Oops, didn't track the close paren correctly, disregard.  Thanks,

Not sure, would it be clearer to have the !vdev->iommufd_attached check on
the next line?
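
A minimal userspace sketch of that split, assuming hypothetical model_*
stand-ins for WARN_ON() and the vfio_device fields. It only illustrates
that the warning is tied to the unbound case, while a repeated detach
returns quietly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static int warn_count;  /* counts how often the modeled WARN_ON fires */

/* Hypothetical stand-in for WARN_ON(): records the hit, returns the condition. */
static bool model_warn_on(bool cond)
{
	if (cond)
		warn_count++;
	return cond;
}

/* Hypothetical subset of the device state used by the detach path. */
struct model_vdev {
	void *iommufd_device;   /* non-NULL once bound */
	bool iommufd_attached;
	int detach_calls;       /* counts modeled iommufd_device_detach() calls */
};

static void model_detach_ioas(struct model_vdev *vdev)
{
	assert(vdev != NULL);

	/* Unbound device: a kernel-side bug, so warn. */
	if (model_warn_on(!vdev->iommufd_device))
		return;

	/* Already detached: reachable from userspace, so return quietly. */
	if (!vdev->iommufd_attached)
		return;

	vdev->detach_calls++;   /* stands in for iommufd_device_detach() */
	vdev->iommufd_attached = false;
}
```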

Regards,
Yi Liu

> Alex
> 
> > > +
>

Re: [Intel-gfx] [PATCH v11 20/23] vfio: Add VFIO_DEVICE_[AT|DE]TACH_IOMMUFD_PT

2023-05-22 Thread Liu, Yi L

> From: Alex Williamson 
> Sent: Tuesday, May 23, 2023 6:16 AM
> 
> On Sat, 13 May 2023 06:28:24 -0700
> Yi Liu  wrote:
> 
> > This adds ioctl for userspace to attach device cdev fd to and detach
> > from IOAS/hw_pagetable managed by iommufd.
> >
> > VFIO_DEVICE_ATTACH_IOMMUFD_PT: attach vfio device to IOAS, hw_pagetable
> >managed by iommufd. Attach can be
> >undo by VFIO_DEVICE_DETACH_IOMMUFD_PT
> >or device fd close.
> > VFIO_DEVICE_DETACH_IOMMUFD_PT: detach vfio device from the current 
> > attached
> >IOAS or hw_pagetable managed by iommufd.
> >
> > Tested-by: Yanting Jiang 
> > Tested-by: Shameer Kolothum 
> > Signed-off-by: Yi Liu 
> > ---
> >  drivers/vfio/device_cdev.c | 66 ++
> >  drivers/vfio/iommufd.c | 18 +++
> >  drivers/vfio/vfio.h| 18 +++
> >  drivers/vfio/vfio_main.c   |  8 +
> >  include/uapi/linux/vfio.h  | 52 ++
> >  5 files changed, 162 insertions(+)
> >
> > diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
> > index 291cc678a18b..3f14edb80a93 100644
> > --- a/drivers/vfio/device_cdev.c
> > +++ b/drivers/vfio/device_cdev.c
> > @@ -174,6 +174,72 @@ long vfio_device_ioctl_bind_iommufd(struct 
> > vfio_device_file
> *df,
> > return ret;
> >  }
> >
> > +int vfio_ioctl_device_attach(struct vfio_device_file *df,
> > +struct vfio_device_attach_iommufd_pt __user *arg)
> > +{
> > +   struct vfio_device *device = df->device;
> > +   struct vfio_device_attach_iommufd_pt attach;
> > +   unsigned long minsz;
> > +   int ret;
> > +
> > +   minsz = offsetofend(struct vfio_device_attach_iommufd_pt, pt_id);
> > +
> > +   if (copy_from_user(&attach, arg, minsz))
> > +   return -EFAULT;
> > +
> > +   if (attach.argsz < minsz || attach.flags)
> > +   return -EINVAL;
> > +
> > +   /* ATTACH only allowed for cdev fds */
> > +   if (df->group)
> > +   return -EINVAL;
> > +
> > +   mutex_lock(&device->dev_set->lock);
> > +   ret = vfio_iommufd_attach(device, &attach.pt_id);
> > +   if (ret)
> > +   goto out_unlock;
> > +
> > +   ret = copy_to_user(&arg->pt_id, &attach.pt_id,
> > +  sizeof(attach.pt_id)) ? -EFAULT : 0;
> > +   if (ret)
> > +   goto out_detach;
> > +   mutex_unlock(&device->dev_set->lock);
> > +
> > +   return 0;
> > +
> > +out_detach:
> > +   vfio_iommufd_detach(device);
> > +out_unlock:
> > +   mutex_unlock(&device->dev_set->lock);
> > +   return ret;
> > +}
> > +
> > +int vfio_ioctl_device_detach(struct vfio_device_file *df,
> > +struct vfio_device_detach_iommufd_pt __user *arg)
> > +{
> > +   struct vfio_device *device = df->device;
> > +   struct vfio_device_detach_iommufd_pt detach;
> > +   unsigned long minsz;
> > +
> > +   minsz = offsetofend(struct vfio_device_detach_iommufd_pt, flags);
> > +
> > +   if (copy_from_user(&detach, arg, minsz))
> > +   return -EFAULT;
> > +
> > +   if (detach.argsz < minsz || detach.flags)
> > +   return -EINVAL;
> > +
> > +   /* DETACH only allowed for cdev fds */
> > +   if (df->group)
> > +   return -EINVAL;
> > +
> > +   mutex_lock(&device->dev_set->lock);
> > +   vfio_iommufd_detach(device);
> > +   mutex_unlock(&device->dev_set->lock);
> > +
> > +   return 0;
> > +}
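
Both ioctls above share the same argsz/flags validation before touching the
device. A minimal userspace sketch of that pattern follows; the struct
layout, MODEL_OFFSETOFEND and model_validate_attach() are hypothetical, and
memcpy() stands in for copy_from_user().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout following the argsz/flags/payload convention. */
struct model_attach_arg {
	uint32_t argsz;
	uint32_t flags;
	uint32_t pt_id;
};

/* Userspace equivalent of the kernel's offsetofend(). */
#define MODEL_OFFSETOFEND(type, member) \
	(offsetof(type, member) + sizeof(((type *)0)->member))

static int model_validate_attach(const void *user_buf,
				 struct model_attach_arg *out)
{
	size_t minsz = MODEL_OFFSETOFEND(struct model_attach_arg, pt_id);

	assert(user_buf != NULL && out != NULL);

	/* Copy only the fields this kernel version knows about. */
	memcpy(out, user_buf, minsz);

	/* Reject short buffers and any flag bits, since none are defined. */
	if (out->argsz < minsz || out->flags)
		return -EINVAL;
	return 0;
}
```

Rejecting unknown flags keeps the bits available for later extensions, and
copying only minsz bytes lets a newer userspace pass a larger structure.
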
> > +
> >  static char *vfio_device_devnode(const struct device *dev, umode_t *mode)
> >  {
> > return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev));
> > diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
> > index 83575b65ea01..799ea322a7d4 100644
> > --- a/drivers/vfio/iommufd.c
> > +++ b/drivers/vfio/iommufd.c
> > @@ -112,6 +112,24 @@ void vfio_iommufd_unbind(struct vfio_device_file *df)
> > vdev->ops->unbind_iommufd(vdev);
> >  }
> >
> > +int vfio_iommufd_attach(struct vfio_device *vdev, u32 *pt_id)
> > +{
> > +   lockdep_assert_held(&vdev->dev_set->lock);
> > +
> > +   if (vfio_device_is_noiommu(vdev))
> > +   return 0;
> 
> Isn't this an invalid operation for a noiommu cdev, ie. -EINVAL?  We
> return success and copy back the provided pt_id, why would a user not
> consider it a bug that they can't use whatever value was there with
> iommufd?

Yes, this is the question I asked in [1]. At that time, it appeared to me
that it was better to allow it [2]. Maybe it's more suitable to ask it here.

[1] https://lore.kernel.org/kvm/c203f11f-4d9f-cf43-03ab-e41a858bd...@intel.com/
[2] https://lore.kernel.org/kvm/zffuyhqid+ltub%...@nvidia.com/

> > +
> > +   return vdev->ops->attach_ioas(vdev, pt_id);
> > +}
> > +
> > +void vfio_iommufd_detach(struct vfio_device *vdev)
> > +{
> > +   lockdep_assert_held(&vdev->dev_set->lock);
> > +
> > +   if (!vfio_device_is_noiommu(vdev))
> > +   vdev->ops->detach_ioas(vdev);
> > +}
> > +
> >  struct iommufd_ctx *vfio_iommufd_physical_ictx(struct vfio_device *vdev)
> >  {
> > if (vdev->iommufd_device)
> > diff --git

Re: [Intel-gfx] [PATCH v11 10/23] vfio-iommufd: Move noiommu compat probe out of vfio_iommufd_bind()

2023-05-22 Thread Liu, Yi L

> From: Alex Williamson 
> Sent: Tuesday, May 23, 2023 4:25 AM
> 
> On Sat, 13 May 2023 06:28:14 -0700
> Yi Liu  wrote:
> 
> > into vfio_device_group_open(). This is more consistent with what will
> > be done in vfio device cdev path.
> 
> Same comment regarding flowing commit subject into body on this series.

Yes, I've modified the commit message in my local branch. :-)

> >
> > Reviewed-by: Kevin Tian 
> > Tested-by: Terrence Xu 
> > Tested-by: Nicolin Chen 
> > Tested-by: Yanting Jiang 
> > Tested-by: Shameer Kolothum 
> > Signed-off-by: Yi Liu 
> > ---
> >  drivers/vfio/group.c   |  6 ++
> >  drivers/vfio/iommufd.c | 32 +++-
> >  drivers/vfio/vfio.h|  9 +
> >  3 files changed, 34 insertions(+), 13 deletions(-)
> >
> > diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
> > index a17584e8be15..cfd0b9254bbc 100644
> > --- a/drivers/vfio/group.c
> > +++ b/drivers/vfio/group.c
> > @@ -192,6 +192,12 @@ static int vfio_device_group_open(struct 
> > vfio_device_file *df)
> > vfio_device_group_get_kvm_safe(device);
> >
> > df->iommufd = device->group->iommufd;
> > +   if (df->iommufd && vfio_device_is_noiommu(device) && device->open_count
> == 0) {
> > +   ret = vfio_iommufd_compat_probe_noiommu(device,
> > +   df->iommufd);
> > +   if (ret)
> > +   goto out_put_kvm;
> > +   }
> >
> > ret = vfio_device_open(df);
> > if (ret) {
> > diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
> > index a18e920be164..7a654a1437f0 100644
> > --- a/drivers/vfio/iommufd.c
> > +++ b/drivers/vfio/iommufd.c
> > @@ -46,6 +46,24 @@ static void vfio_iommufd_noiommu_unbind(struct 
> > vfio_device
> *vdev)
> > }
> >  }
> >
> > +int vfio_iommufd_compat_probe_noiommu(struct vfio_device *device,
> > + struct iommufd_ctx *ictx)
> > +{
> > +   u32 ioas_id;
> > +
> > +   if (!capable(CAP_SYS_RAWIO))
> > +   return -EPERM;
> > +
> > +   /*
> > +* Require no compat ioas to be assigned to proceed.  The basic
> > +* statement is that the user cannot have done something that
> > +* implies they expected translation to exist
> > +*/
> > +   if (!iommufd_vfio_compat_ioas_get_id(ictx, &ioas_id))
> > +   return -EPERM;
> > +   return 0;
> > +}
> 
> I think the purpose of this function is to keep the iommufd namespace
> out of the group,

Yes.

> but we're muddying it as a general grab bag of
> noiommu validation.  What if the caller retained the RAWIO test and
> comment, and this function simply became a wrapper around the iommufd
> function, ex:
> 
> bool vfio_iommufd_device_has_compat_ioas(struct vfio_device *device,
>struct iommufd_ctx *ictx)
> {
>   u32 ioas_id;
> 
>   return !iommufd_vfio_compat_ioas_get_id(ictx, &ioas_id);
> }

Sure. This looks better.

Regards,
Yi Liu
 
> Thanks,
> Alex
> 
> > +
> >  int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx)
> >  {
> > u32 ioas_id;
> > @@ -54,20 +72,8 @@ int vfio_iommufd_bind(struct vfio_device *vdev, struct
> iommufd_ctx *ictx)
> >
> > lockdep_assert_held(&vdev->dev_set->lock);
> >
> > -   if (vfio_device_is_noiommu(vdev)) {
> > -   if (!capable(CAP_SYS_RAWIO))
> > -   return -EPERM;
> > -
> > -   /*
> > -* Require no compat ioas to be assigned to proceed. The basic
> > -* statement is that the user cannot have done something that
> > -* implies they expected translation to exist
> > -*/
> > -   if (!iommufd_vfio_compat_ioas_get_id(ictx, &ioas_id))
> > -   return -EPERM;
> > -
> > +   if (vfio_device_is_noiommu(vdev))
> > return vfio_iommufd_noiommu_bind(vdev, ictx, _id);
> > -   }
> >
> > ret = vdev->ops->bind_iommufd(vdev, ictx, _id);
> > if (ret)
> > diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> > index 785afc40ece8..8884b557fb26 100644
> > --- a/drivers/vfio/vfio.h
> > +++ b/drivers/vfio/vfio.h
> > @@ -234,9 +234,18 @@ static inline void vfio_container_cleanup(void)
> >  #endif
> >
> >  #if IS_ENABLED(CONFIG_IOMMUFD)
> > +int vfio_iommufd_compat_probe_noiommu(struct vfio_device *device,
> > + struct iommufd_ctx *ictx);
> >  int vfio_iommufd_bind(struct vfio_device *device, struct iommufd_ctx 
> > *ictx);
> >  void vfio_iommufd_unbind(struct vfio_device *device);
> >  #else
> > +static inline int
> > +vfio_iommufd_compat_probe_noiommu(struct vfio_device *device,
> > + struct iommufd_ctx *ictx)
> > +{
> > +   return -EOPNOTSUPP;
> > +}
> > +
> >  static inline int vfio_iommufd_bind(struct vfio_device *device,
> > struct iommufd_ctx *ictx)
> >  {

Re: [Intel-gfx] [PATCH v5 09/10] vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device cdev

2023-05-18 Thread Liu, Yi L

> From: Alex Williamson 
> Sent: Thursday, May 18, 2023 6:02 AM
> 
> On Sat, 13 May 2023 06:21:35 -0700
> Yi Liu  wrote:
> 
> > This makes VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl to use the iommufd_ctx
> 
> s/makes/allows/?
> 
> s/to//
> 
> > of the cdev device to check the ownership of the other affected devices.
> >
> > This returns devid for each of the affected devices. If it is bound to the
> > iommufd_ctx of the cdev device, _INFO reports a valid devid > 0; If it is
> > not opened by the calling user, but it belongs to the same iommu_group of
> > a device that is bound to the iommufd_ctx of the cdev device, reports devid
> > value of 0; If the device is un-owned device, configured within a different
> > iommufd, or opened outside of the vfio device cdev API, the _INFO ioctl 
> > shall
> > report devid value of -1.
> >
> > devid >=0 doesn't block hot-reset as the affected devices are considered to
> > be owned, while devid == -1 will block the use of VFIO_DEVICE_PCI_HOT_RESET
> > outside of proof-of-ownership calling conventions (ie. via legacy group
> > accessed devices).
> >
> > This adds flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID to tell the user devid is
> > returned in case of calling user get device fd from other software stack
> 
> "other software stack"?  I think this is trying to say something like:
> 
>   When VFIO_DEVICE_GET_PCI_HOT_RESET_INFO is called on an IOMMUFD
>   managed device, the new flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID is
>   reported to indicate the values returned are IOMMUFD devids rather
>   than group IDs as used when accessing vfio devices through the
>   conventional vfio group interface.  Additionally the flag
>   VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED will be reported in this mode if
>   all of the devices affected by the hot-reset are owned by either
>   virtue of being directly bound to the same iommufd context as the
>   calling device, or implicitly owned via a shared IOMMU group.

Yes, it is.

> > and adds flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED to tell user if all
> > the affected devices are owned, so user can know it without looping all
> > the returned devids.
> >
> > Suggested-by: Jason Gunthorpe 
> > Suggested-by: Alex Williamson 
> > Signed-off-by: Yi Liu 
> > ---
> >  drivers/vfio/pci/vfio_pci_core.c | 52 ++--
> >  include/uapi/linux/vfio.h| 46 +++-
> >  2 files changed, 95 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/vfio/pci/vfio_pci_core.c 
> > b/drivers/vfio/pci/vfio_pci_core.c
> > index 4df2def35bdd..57586be770af 100644
> > --- a/drivers/vfio/pci/vfio_pci_core.c
> > +++ b/drivers/vfio/pci/vfio_pci_core.c
> > @@ -27,6 +27,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #if IS_ENABLED(CONFIG_EEH)
> >  #include 
> >  #endif
> > @@ -36,6 +37,10 @@
> >  #define DRIVER_AUTHOR   "Alex Williamson "
> >  #define DRIVER_DESC "core driver for VFIO based PCI devices"
> >
> > +#ifdef CONFIG_IOMMUFD
> > +MODULE_IMPORT_NS(IOMMUFD);
> > +#endif
> > +
> >  static bool nointxmask;
> >  static bool disable_vga;
> >  static bool disable_idle_d3;
> > @@ -776,6 +781,9 @@ struct vfio_pci_fill_info {
> > int max;
> > int cur;
> > struct vfio_pci_dependent_device *devices;
> > +   struct vfio_device *vdev;
> > +   bool devid:1;
> > +   bool dev_owned:1;
> >  };
> >
> >  static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
> > @@ -790,7 +798,37 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, 
> > void *data)
> > if (!iommu_group)
> > return -EPERM; /* Cannot reset non-isolated devices */
> >
> > -   fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
> > +   if (fill->devid) {
> > +   struct iommufd_ctx *iommufd = 
> > vfio_iommufd_physical_ictx(fill->vdev);
> > +   struct vfio_device_set *dev_set = fill->vdev->dev_set;
> > +   struct vfio_device *vdev;
> > +
> > +   /*
> > +* Report devid for the affected devices:
> > +* - valid devid > 0 for the devices that are bound with
> > +*   the iommufd of the calling device.
> > +* - devid == 0 for the devices that have not been opened
> > +*   but have same group with one of the devices bound to
> > +*   the iommufd of the calling device.
> > +* - devid == -1 for others, and clear dev_owned flag.
> > +*/
> > +   vdev = vfio_find_device_in_devset(dev_set, &pdev->dev);
> > +   if (vdev && iommufd == vfio_iommufd_physical_ictx(vdev)) {
> > +   int ret;
> > +
> > +   ret = vfio_iommufd_physical_devid(vdev);
> > +   if (WARN_ON(ret < 0))
> > +   return ret;
> > +   fill->devices[fill->cur].devid = ret;
> 
> Nit, @devid seems like a better variable name here rather than @ret.
> 
> > +   } else if (vdev && iommufd_ctx_has_group(iommufd, iommu_group)) 
> > {
> > +

Re: [Intel-gfx] [PATCH v5 09/10] vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device cdev

2023-05-18 Thread Liu, Yi L

> From: Liu, Yi L 
> Sent: Thursday, May 18, 2023 9:22 PM
> 
> > From: Alex Williamson 
> > Sent: Thursday, May 18, 2023 6:02 AM
> >
> > On Sat, 13 May 2023 06:21:35 -0700
> > Yi Liu  wrote:

> >
> > static int vfio_hot_reset_devid(struct vfio_device *vdev,
> > struct iommufd_ctx *iommufd_ctx)
> > {
> > struct iommu_group *group;
> > int devid;
> >
> > if (!vdev)
> > return VFIO_PCI_DEVID_NOT_OWNED;
> >
> > if (vfio_iommufd_physical_ictx(vdev) == iommufd_ctx)
> > return vfio_iommufd_physical_devid(vdev);

Do we need to check the return value of this helper? It returns -EINVAL
when iommufd_access and iommufd_device are both NULL, though that is
not possible in this path. Is a WARN_ON needed or not?

Regards,
Yi Liu

> >
> > group = iommu_group_get(vdev->dev);
> > if (!group)
> > return VFIO_PCI_DEVID_NOT_OWNED;
> >
> > if (iommufd_ctx_has_group(iommufd_ctx, group))
> > devid = VFIO_PCI_DEVID_OWNED;
> >
> > iommu_group_put(group);
> >
> > return devid;
> > }
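
For reference, a minimal userspace sketch of the three outcomes this helper
distinguishes. The model_* types are hypothetical, and the OWNED (0) and
NOT_OWNED (-1) values are assumed from the devid convention described in
the commit message. The sketch assigns the result on both branches of the
group check so the not-owned case is explicit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed values, following the devid convention in the commit message. */
#define MODEL_DEVID_OWNED      0
#define MODEL_DEVID_NOT_OWNED  (-1)

/* Hypothetical context: the IOMMU groups it owns through bound devices. */
struct model_ictx {
	int group_ids[4];
	int ngroups;
};

struct model_vdev {
	struct model_ictx *ictx;  /* context the device is bound to, or NULL */
	int devid;                /* meaningful (> 0) only when bound */
	int group_id;             /* < 0 models "no iommu_group" */
};

/* Models iommufd_ctx_has_group(): true if the context owns the group. */
static bool model_ctx_has_group(const struct model_ictx *ictx, int group_id)
{
	for (int i = 0; i < ictx->ngroups; i++)
		if (ictx->group_ids[i] == group_id)
			return true;
	return false;
}

static int model_hot_reset_devid(const struct model_vdev *vdev,
				 const struct model_ictx *ictx)
{
	assert(ictx != NULL);

	if (!vdev)
		return MODEL_DEVID_NOT_OWNED;

	/* Bound to the caller's context: report the real devid. */
	if (vdev->ictx == ictx)
		return vdev->devid;

	if (vdev->group_id < 0)
		return MODEL_DEVID_NOT_OWNED;

	/* Not bound, but possibly owned implicitly through a shared group. */
	return model_ctx_has_group(ictx, vdev->group_id) ?
		MODEL_DEVID_OWNED : MODEL_DEVID_NOT_OWNED;
}
```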

Re: [Intel-gfx] [PATCH v5 06/10] vfio-iommufd: Add helper to retrieve iommufd_ctx and devid for vfio_device

2023-05-18 Thread Liu, Yi L

> From: Alex Williamson 
> Sent: Thursday, May 18, 2023 2:15 AM
> 
> On Sat, 13 May 2023 06:21:32 -0700
> Yi Liu  wrote:
> 
> > This is needed by the vfio-pci driver to report affected devices in the
> > hot reset for a given device.
> >
> > Signed-off-by: Yi Liu 
> > ---
> >  drivers/iommu/iommufd/device.c | 24 
> >  drivers/vfio/iommufd.c | 20 
> >  include/linux/iommufd.h|  6 ++
> >  include/linux/vfio.h   | 14 ++
> >  4 files changed, 64 insertions(+)
> >
> > diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
> > index 4f9b2142274c..81466b97023f 100644
> > --- a/drivers/iommu/iommufd/device.c
> > +++ b/drivers/iommu/iommufd/device.c
> > @@ -116,6 +116,18 @@ void iommufd_device_unbind(struct iommufd_device *idev)
> >  }
> >  EXPORT_SYMBOL_NS_GPL(iommufd_device_unbind, IOMMUFD);
> >
> > +struct iommufd_ctx *iommufd_device_to_ictx(struct iommufd_device *idev)
> > +{
> > +   return idev->ictx;
> > +}
> > +EXPORT_SYMBOL_NS_GPL(iommufd_device_to_ictx, IOMMUFD);
> > +
> > +u32 iommufd_device_to_id(struct iommufd_device *idev)
> > +{
> > +   return idev->obj.id;
> > +}
> > +EXPORT_SYMBOL_NS_GPL(iommufd_device_to_id, IOMMUFD);
> > +
> >  static int iommufd_device_setup_msi(struct iommufd_device *idev,
> > struct iommufd_hw_pagetable *hwpt,
> > phys_addr_t sw_msi_start)
> > @@ -463,6 +475,18 @@ void iommufd_access_destroy(struct iommufd_access
> *access)
> >  }
> >  EXPORT_SYMBOL_NS_GPL(iommufd_access_destroy, IOMMUFD);
> >
> > +struct iommufd_ctx *iommufd_access_to_ictx(struct iommufd_access *access)
> > +{
> > +   return access->ictx;
> > +}
> > +EXPORT_SYMBOL_NS_GPL(iommufd_access_to_ictx, IOMMUFD);
> > +
> > +u32 iommufd_access_to_id(struct iommufd_access *access)
> > +{
> > +   return access->obj.id;
> > +}
> > +EXPORT_SYMBOL_NS_GPL(iommufd_access_to_id, IOMMUFD);
> > +
> >  int iommufd_access_attach(struct iommufd_access *access, u32 ioas_id)
> >  {
> > struct iommufd_ioas *new_ioas;
> > diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
> > index c1379e826052..a18e920be164 100644
> > --- a/drivers/vfio/iommufd.c
> > +++ b/drivers/vfio/iommufd.c
> > @@ -105,6 +105,26 @@ void vfio_iommufd_unbind(struct vfio_device *vdev)
> > vdev->ops->unbind_iommufd(vdev);
> >  }
> >
> > +struct iommufd_ctx *vfio_iommufd_physical_ictx(struct vfio_device *vdev)
> > +{
> > +   if (vdev->iommufd_device)
> > +   return iommufd_device_to_ictx(vdev->iommufd_device);
> > +   if (vdev->noiommu_access)
> > +   return iommufd_access_to_ictx(vdev->noiommu_access);
> > +   return NULL;
> > +}
> > +EXPORT_SYMBOL_GPL(vfio_iommufd_physical_ictx);
> > +
> > +int vfio_iommufd_physical_devid(struct vfio_device *vdev)
> > +{
> > +   if (vdev->iommufd_device)
> > +   return iommufd_device_to_id(vdev->iommufd_device);
> > +   if (vdev->noiommu_access)
> > +   return iommufd_access_to_id(vdev->noiommu_access);
> > +   return -EINVAL;
> > +}
> > +EXPORT_SYMBOL_GPL(vfio_iommufd_physical_devid);
> 
> I think these exemplify that it would be better if both emulated and
> noiommu use the same iommufd_access pointer.  Thanks,

Sure. Then I shall rename this helper to vfio_iommufd_device_devid().
What do you think?

Regards,
Yi Liu

Re: [Intel-gfx] [PATCH v5 08/10] iommufd: Add iommufd_ctx_has_group()

2023-05-18 Thread Liu, Yi L

> From: Alex Williamson 
> Sent: Thursday, May 18, 2023 3:40 AM
> 
> On Sat, 13 May 2023 06:21:34 -0700
> Yi Liu  wrote:
> 
> > to check if any device within the given iommu_group has been bound with
> 
> Nit, I find these commit logs where the subject line is intended to
> flow into the commit log to form a complete sentence difficult to read.
> I expect complete thoughts within the commit log itself and the subject
> should be a separate summary of the log.  Repeating the subject within
> the commit log is ok.

Sure. I'll go through the commit messages.

> 
> > the iommufd_ctx. This helpful for the checking on device ownership for
> 
> s/This/This is/
> 
> > the devices which have been bound but cannot be bound to any other iommufd
> 
> s/have been/have not been/?
> 
> > as the iommu_group has been bound.
> >
> > Signed-off-by: Yi Liu 
> > ---
> >  drivers/iommu/iommufd/device.c | 29 +
> >  include/linux/iommufd.h|  8 
> >  2 files changed, 37 insertions(+)
> >
> > diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
> > index 81466b97023f..5e5f7912807b 100644
> > --- a/drivers/iommu/iommufd/device.c
> > +++ b/drivers/iommu/iommufd/device.c
> > @@ -98,6 +98,35 @@ struct iommufd_device *iommufd_device_bind(struct
> iommufd_ctx *ictx,
> >  }
> >  EXPORT_SYMBOL_NS_GPL(iommufd_device_bind, IOMMUFD);
> >
> > +/**
> > + * iommufd_ctx_has_group - True if the struct device is bound to this ictx
> 
> What struct device?  Isn't this "True if any device within the group is
> bound to the ictx"?

Yes, yes. A poor copy from a prior version.

> 
> > + * @ictx: iommufd file descriptor
> > + * @group: Pointer to a physical iommu_group struct
> > + *
> > + * True if a iommufd_device_bind() is present for any device within the
> > + * group.
> 
> How can a function be present for a device?  Maybe "True if any device
> within the group has been bound to this ictx, ex. via
> iommufd_device_bind(), therefore implying ictx ownership of the group."  
> Thanks,

Yes, this is the meaning of it. Will fix it.

Regards,
Yi Liu

> 
> > + */
> > +bool iommufd_ctx_has_group(struct iommufd_ctx *ictx, struct iommu_group 
> > *group)
> > +{
> > +   struct iommufd_object *obj;
> > +   unsigned long index;
> > +
> > +   if (!ictx || !group)
> > +   return false;
> > +
> > +   xa_lock(&ictx->objects);
> > +   xa_for_each(&ictx->objects, index, obj) {
> > +   if (obj->type == IOMMUFD_OBJ_DEVICE &&
> > +   container_of(obj, struct iommufd_device, obj)->group == 
> > group) {
> > +   xa_unlock(&ictx->objects);
> > +   return true;
> > +   }
> > +   }
> > +   xa_unlock(&ictx->objects);
> > +   return false;
> > +}
> > +EXPORT_SYMBOL_NS_GPL(iommufd_ctx_has_group, IOMMUFD);
> > +
> >  /**
> >   * iommufd_device_unbind - Undo iommufd_device_bind()
> >   * @idev: Device returned by iommufd_device_bind()
> > diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
> > index 68cd65274e28..e49c16cd6831 100644
> > --- a/include/linux/iommufd.h
> > +++ b/include/linux/iommufd.h
> > @@ -16,6 +16,7 @@ struct page;
> >  struct iommufd_ctx;
> >  struct iommufd_access;
> >  struct file;
> > +struct iommu_group;
> >
> >  struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
> >struct device *dev, u32 *id);
> > @@ -56,6 +57,7 @@ void iommufd_ctx_get(struct iommufd_ctx *ictx);
> >  #if IS_ENABLED(CONFIG_IOMMUFD)
> >  struct iommufd_ctx *iommufd_ctx_from_file(struct file *file);
> >  void iommufd_ctx_put(struct iommufd_ctx *ictx);
> > +bool iommufd_ctx_has_group(struct iommufd_ctx *ictx, struct iommu_group 
> > *group);
> >
> >  int iommufd_access_pin_pages(struct iommufd_access *access, unsigned long 
> > iova,
> >  unsigned long length, struct page **out_pages,
> > @@ -77,6 +79,12 @@ static inline void iommufd_ctx_put(struct iommufd_ctx 
> > *ictx)
> >  {
> >  }
> >
> > +static inline bool iommufd_ctx_has_group(struct iommufd_ctx *ictx,
> > +struct iommu_group *group)
> > +{
> > +   return false;
> > +}
> > +
> >  static inline int iommufd_access_pin_pages(struct iommufd_access *access,
> >unsigned long iova,
> >unsigned long length,

Re: [Intel-gfx] [PATCH v5 07/10] vfio: Add helper to search vfio_device in a dev_set

2023-05-18 Thread Liu, Yi L

> From: Alex Williamson 
> Sent: Thursday, May 18, 2023 3:13 AM
> 
> On Sat, 13 May 2023 06:21:33 -0700
> Yi Liu  wrote:
> 
> > There are drivers that need to search vfio_device within a given dev_set.
> > e.g. vfio-pci. So add a helper.
> >
> > Signed-off-by: Yi Liu 
> > ---
> >  drivers/vfio/pci/vfio_pci_core.c |  8 +++-
> >  drivers/vfio/vfio_main.c | 15 +++
> >  include/linux/vfio.h |  3 +++
> >  3 files changed, 21 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/vfio/pci/vfio_pci_core.c 
> > b/drivers/vfio/pci/vfio_pci_core.c
> > index 39e7823088e7..4df2def35bdd 100644
> > --- a/drivers/vfio/pci/vfio_pci_core.c
> > +++ b/drivers/vfio/pci/vfio_pci_core.c
> > @@ -2335,12 +2335,10 @@ static bool vfio_dev_in_groups(struct
> vfio_pci_core_device *vdev,
> >  static int vfio_pci_is_device_in_set(struct pci_dev *pdev, void *data)
> >  {
> > struct vfio_device_set *dev_set = data;
> > -   struct vfio_device *cur;
> >
> > -   list_for_each_entry(cur, &dev_set->device_list, dev_set_list)
> > -   if (cur->dev == &pdev->dev)
> > -   return 0;
> > -   return -EBUSY;
> > +   lockdep_assert_held(&dev_set->lock);
> > +
> > +   return vfio_find_device_in_devset(dev_set, &pdev->dev) ? 0 : -EBUSY;
> 
> Maybe an opportunity to revisit why this returns -EBUSY rather than
> something reasonable like -ENODEV.  It looks like we picked up the
> -EBUSY in a882c16a2b7e where I think it was trying to preserve the
> return of vfio_pci_try_zap_and_vma_lock_cb() but the return value here
> is not even propagated so this looks like an chance to have it make
> sense again.  Thanks,

From the name of this function, yes, -ENODEV is better. Is it
OK to change it to -ENODEV in this patch, or should that be a separate one?
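
For illustration only, here is a userspace sketch of the lookup-plus-errno
pattern under discussion: a search helper that returns the matching entry or
NULL, and a callback-style wrapper that maps "not found" to -ENODEV. All
demo_* names and the simplified singly linked list are assumptions made for
this sketch; they are not the kernel's types or the code in the patch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct device, vfio_device and vfio_device_set. */
struct demo_device {
	int id;
};

struct demo_vfio_device {
	struct demo_device *dev;
	struct demo_vfio_device *next;	/* simplified singly linked list */
};

struct demo_device_set {
	struct demo_vfio_device *head;
};

/* Return the entry that wraps @dev, or NULL if @dev is not in the set. */
static struct demo_vfio_device *
demo_find_device_in_devset(const struct demo_device_set *set,
			   const struct demo_device *dev)
{
	struct demo_vfio_device *cur;

	for (cur = set->head; cur; cur = cur->next)
		if (cur->dev == dev)
			return cur;
	return NULL;
}

/* Callback-style check: 0 if @dev is in the set, -ENODEV if it is absent. */
static int demo_is_device_in_set(const struct demo_device_set *set,
				 const struct demo_device *dev)
{
	return demo_find_device_in_devset(set, dev) ? 0 : -ENODEV;
}
```

With this shape, the error code describes what the helper actually found out
(the device is not there) rather than a condition borrowed from another
callback.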

Regards,
Yi Liu

> 
> >  }
> >
> >  /*
> > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> > index f0ca33b2e1df..ab4f3a794f78 100644
> > --- a/drivers/vfio/vfio_main.c
> > +++ b/drivers/vfio/vfio_main.c
> > @@ -141,6 +141,21 @@ unsigned int vfio_device_set_open_count(struct
> vfio_device_set *dev_set)
> >  }
> >  EXPORT_SYMBOL_GPL(vfio_device_set_open_count);
> >
> > +struct vfio_device *
> > +vfio_find_device_in_devset(struct vfio_device_set *dev_set,
> > +  struct device *dev)
> > +{
> > +   struct vfio_device *cur;
> > +
> > +   lockdep_assert_held(&dev_set->lock);
> > +
> > +   list_for_each_entry(cur, &dev_set->device_list, dev_set_list)
> > +   if (cur->dev == dev)
> > +   return cur;
> > +   return NULL;
> > +}
> > +EXPORT_SYMBOL_GPL(vfio_find_device_in_devset);
> > +
> >  /*
> >   * Device objects - create, release, get, put, search
> >   */
> > diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> > index fcbe084b18c8..4c17395ed4d2 100644
> > --- a/include/linux/vfio.h
> > +++ b/include/linux/vfio.h
> > @@ -259,6 +259,9 @@ void vfio_unregister_group_dev(struct vfio_device 
> > *device);
> >
> >  int vfio_assign_device_set(struct vfio_device *device, void *set_id);
> >  unsigned int vfio_device_set_open_count(struct vfio_device_set *dev_set);
> > +struct vfio_device *
> > +vfio_find_device_in_devset(struct vfio_device_set *dev_set,
> > +  struct device *dev);
> >
> >  int vfio_mig_get_next_state(struct vfio_device *device,
> > enum vfio_device_mig_state cur_fsm,

Re: [Intel-gfx] [PATCH v5 01/10] vfio-iommufd: Create iommufd_access for noiommu devices

2023-05-18 Thread Liu, Yi L

> From: Jason Gunthorpe 
> Sent: Thursday, May 18, 2023 2:21 AM
> 
> On Wed, May 17, 2023 at 11:26:09AM -0600, Alex Williamson wrote:
> 
> > It's not clear to me why we need a separate iommufd_access for
> > noiommu.
> 
> The point was to allocate an ID for the device so we can use that ID
> with the other interfaces in all cases.

I guess Alex's question is why a new pointer named noiommu_access is being
added while there is already an iommufd_access pointer named iommufd_access.

Maybe we should reuse the iommufd_access pointer?

Regards,
Yi Liu

Re: [Intel-gfx] [PATCH v5 09/10] vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device cdev

2023-05-15 Thread Liu, Yi L

> From: Cédric Le Goater 
> Sent: Monday, May 15, 2023 3:30 PM
> 
> On 5/13/23 15:21, Yi Liu wrote:
> > This makes VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl to use the iommufd_ctx
> > of the cdev device to check the ownership of the other affected devices.
> >
> > This returns devid for each of the affected devices. If it is bound to the
> > iommufd_ctx of the cdev device, _INFO reports a valid devid > 0; If it is
> > not opened by the calling user, but it belongs to the same iommu_group of
> > a device that is bound to the iommufd_ctx of the cdev device, reports devid
> > value of 0; If the device is un-owned device, configured within a different
> > iommufd, or opened outside of the vfio device cdev API, the _INFO ioctl 
> > shall
> > report devid value of -1.
> >
> > devid >=0 doesn't block hot-reset as the affected devices are considered to
> > be owned, while devid == -1 will block the use of VFIO_DEVICE_PCI_HOT_RESET
> > outside of proof-of-ownership calling conventions (ie. via legacy group
> > accessed devices).
> >
> > This adds flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID to tell the user devid is
> > returned in case of calling user get device fd from other software stack
> > and adds flag VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED to tell user if all
> > the affected devices are owned, so user can know it without looping all
> > the returned devids.
> >
> > Suggested-by: Jason Gunthorpe 
> > Suggested-by: Alex Williamson 
> > Signed-off-by: Yi Liu 
> > ---
> >   drivers/vfio/pci/vfio_pci_core.c | 52 ++--
> >   include/uapi/linux/vfio.h| 46 +++-
> >   2 files changed, 95 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/vfio/pci/vfio_pci_core.c 
> > b/drivers/vfio/pci/vfio_pci_core.c
> > index 4df2def35bdd..57586be770af 100644
> > --- a/drivers/vfio/pci/vfio_pci_core.c
> > +++ b/drivers/vfio/pci/vfio_pci_core.c
> > @@ -27,6 +27,7 @@
> >   #include 
> >   #include 
> >   #include 
> > +#include 
> >   #if IS_ENABLED(CONFIG_EEH)
> >   #include 
> >   #endif
> > @@ -36,6 +37,10 @@
> >   #define DRIVER_AUTHOR   "Alex Williamson "
> >   #define DRIVER_DESC "core driver for VFIO based PCI devices"
> >
> > +#ifdef CONFIG_IOMMUFD
> 
> To import the IOMMUFD namespace, I had to use :
> 
> #if IS_ENABLED(CONFIG_IOMMUFD)

Thanks. Yes, IOMMUFD is tristate now, so the CONFIG_IOMMUFD=m case needs to
be covered as well, and "#if IS_ENABLED(CONFIG_IOMMUFD)" fixes the compile
failure.
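
As a hedged illustration of why "#ifdef" misses the =m case: for a tristate
option built as a module, Kconfig defines CONFIG_<name>_MODULE and leaves
CONFIG_<name> undefined. The sketch below uses DEMO_* macros modelled on the
helpers in include/linux/kconfig.h (renamed and slightly simplified, so they
are an approximation and not the kernel's exact definitions) together with a
made-up option CONFIG_DEMO.

```c
/* Helpers modelled on include/linux/kconfig.h; DEMO_* names are hypothetical. */
#define DEMO_ARG_PLACEHOLDER_1 0,
#define DEMO_TAKE_SECOND(ignored, val, ...) val

/* Expands to 1 if the argument is a macro defined as 1, otherwise to 0. */
#define DEMO_IS_DEFINED(x) DEMO_IS_DEFINED_(x)
#define DEMO_IS_DEFINED_(val) DEMO_IS_DEFINED__(DEMO_ARG_PLACEHOLDER_##val)
#define DEMO_IS_DEFINED__(arg1_or_junk) DEMO_TAKE_SECOND(arg1_or_junk 1, 0, 0)

/* Preprocessor-level logical OR of two 0/1 values. */
#define DEMO_OR(x, y) DEMO_OR_(x, y)
#define DEMO_OR_(x, y) DEMO_OR__(DEMO_ARG_PLACEHOLDER_##x, y)
#define DEMO_OR__(arg1_or_junk, y) DEMO_TAKE_SECOND(arg1_or_junk 1, y, 0)

#define DEMO_IS_BUILTIN(option) DEMO_IS_DEFINED(option)
#define DEMO_IS_MODULE(option) DEMO_IS_DEFINED(option##_MODULE)
#define DEMO_IS_ENABLED(option) \
	DEMO_OR(DEMO_IS_BUILTIN(option), DEMO_IS_MODULE(option))

/* Model "CONFIG_DEMO=m": only the _MODULE variant is defined. */
#define CONFIG_DEMO_MODULE 1

#ifdef CONFIG_DEMO
#define DEMO_SEEN_BY_IFDEF 1
#else
#define DEMO_SEEN_BY_IFDEF 0	/* taken: CONFIG_DEMO itself is undefined */
#endif

#if DEMO_IS_ENABLED(CONFIG_DEMO)
#define DEMO_SEEN_BY_IS_ENABLED 1	/* taken: the _MODULE variant counts */
#else
#define DEMO_SEEN_BY_IS_ENABLED 0
#endif

/* Small accessors so the two results can be inspected at run time. */
static int demo_seen_by_ifdef(void)
{
	return DEMO_SEEN_BY_IFDEF;
}

static int demo_seen_by_is_enabled(void)
{
	return DEMO_SEEN_BY_IS_ENABLED;
}
```

In this model the "#ifdef" branch is skipped while the IS_ENABLED-style test
is true, which matches the build failure described above when IOMMUFD is
built as a module.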

Regards,
Yi Liu
> 
> 
> > +MODULE_IMPORT_NS(IOMMUFD);
> > +#endif
> > +
> >   static bool nointxmask;
> >   static bool disable_vga;
> >   static bool disable_idle_d3;
> > @@ -776,6 +781,9 @@ struct vfio_pci_fill_info {
> > int max;
> > int cur;
> > struct vfio_pci_dependent_device *devices;
> > +   struct vfio_device *vdev;
> > +   bool devid:1;
> > +   bool dev_owned:1;
> >   };
> >
> >   static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data)
> > @@ -790,7 +798,37 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, 
> > void *data)
> > if (!iommu_group)
> > return -EPERM; /* Cannot reset non-isolated devices */
> >
> > -   fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
> > +   if (fill->devid) {
> > +   struct iommufd_ctx *iommufd = 
> > vfio_iommufd_physical_ictx(fill->vdev);
> > +   struct vfio_device_set *dev_set = fill->vdev->dev_set;
> > +   struct vfio_device *vdev;
> > +
> > +   /*
> > +* Report devid for the affected devices:
> > +* - valid devid > 0 for the devices that are bound with
> > +*   the iommufd of the calling device.
> > +* - devid == 0 for the devices that have not been opened
> > +*   but have same group with one of the devices bound to
> > +*   the iommufd of the calling device.
> > +* - devid == -1 for others, and clear dev_owned flag.
> > +*/
> > +   vdev = vfio_find_device_in_devset(dev_set, &pdev->dev);
> > +   if (vdev && iommufd == vfio_iommufd_physical_ictx(vdev)) {
> > +   int ret;
> > +
> > +   ret = vfio_iommufd_physical_devid(vdev);
> > +   if (WARN_ON(ret < 0))
> > +   return ret;
> > +   fill->devices[fill->cur].devid = ret;
> > +   } else if (vdev && iommufd_ctx_has_group(iommufd, iommu_group)) 
> > {
> > +   fill->devices[fill->cur].devid = VFIO_PCI_DEVID_OWNED;
> > +   } else {
> > +   fill->devices[fill->cur].devid = 
> > VFIO_PCI_DEVID_NOT_OWNED;
> > +   fill->dev_owned = false;
> > +   }
> > +   } else {
> > +   fill->devices[fill->cur].group_id = iommu_group_id(iommu_group);
> > +   }
> > fill->devices[fill->cur].segment = pci_domain_nr(pdev->bus);
> > fill->devices[fill->cur].bus = pdev->bus->number;
> > fill->devices[fill->cur].devfn = pdev->devfn;
> > @@ -1229,17 +1267,27 @@ static

Re: [Intel-gfx] [PATCH v10 05/22] kvm/vfio: Accept vfio device file from userspace

2023-05-12 Thread Liu, Yi L

> From: Cédric Le Goater 
> Sent: Thursday, May 11, 2023 3:11 PM
> 
> On 4/26/23 17:03, Yi Liu wrote:
> > This defines KVM_DEV_VFIO_FILE* and make alias with KVM_DEV_VFIO_GROUP*.
> > Old userspace uses KVM_DEV_VFIO_GROUP* works as well.
> >
> > diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
> > index 8f7fa07e8170..10a3c7ccadf1 100644
> > --- a/virt/kvm/vfio.c
> > +++ b/virt/kvm/vfio.c
> > @@ -286,18 +286,18 @@ static int kvm_vfio_set_file(struct kvm_device *dev, 
> > long
> attr,
> > int32_t fd;
> >
> > switch (attr) {
> > -   case KVM_DEV_VFIO_GROUP_ADD:
> > +   case KVM_DEV_VFIO_FILE_ADD:
> > if (get_user(fd, argp))
> > return -EFAULT;
> > return kvm_vfio_file_add(dev, fd);
> >
> > -   case KVM_DEV_VFIO_GROUP_DEL:
> > +   case KVM_DEV_VFIO_FILE_DEL:
> > if (get_user(fd, argp))
> > return -EFAULT;
> > return kvm_vfio_file_del(dev, fd);
> >
> >   #ifdef CONFIG_SPAPR_TCE_IOMMU
> > -   case KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE:
> > +   case KVM_DEV_VFIO_FILE_SET_SPAPR_TCE:
> 
> This should still be DEV_VFIO_GROUP_SET_SPAPR_TCE. Same below.

Thanks. It's a rebase mistake.

Regards,
Yi Liu

Re: [Intel-gfx] [PATCH v4 2/9] vfio-iommufd: Create iommufd_access for noiommu devices

2023-05-08 Thread Liu, Yi L

> From: Liu, Yi L 
> Sent: Wednesday, May 3, 2023 5:49 PM
> 
> > From: Jason Gunthorpe 
> > Sent: Wednesday, May 3, 2023 2:12 AM
> >
> > On Sat, Apr 29, 2023 at 12:07:24AM +0800, Yi Liu wrote:
> > > > The emulated stuff is for mdev only, it should not be confused with
> > > > no-iommu
> > >
> > > hmmm. I guess the confusion is due to the reuse of
> > > vfio_iommufd_emulated_bind().
> >
> > > This is probably not a good direction
> 
> I see. But without the reuse there would be some code duplication.
> I'm fine with adding separate _bind/unbind() functions for noiommu devices
> if you and Alex prefer that.
> 
> > > > Eg if you had a no_iommu_access value to store the access it would be
> > > > fine and could serve as the 'this is no_iommu' flag
> > >
> > > So this no_iommu_access shall be created per iommufd bind, and call the
> > > iommufd_access_create() with iommufd_access_ops. is it? If so, this is
> > > not 100% the same with no_iommu flag as this flag is static after device
> > > registration.
> >
> > Something like that, yes
> >
> > I don't think it is any real difference with the current flag, both
> > are determined at the first ioctl when the iommufd is presented and
> > both would state permanently until the fd close
> 
> Well, the noiommu flag would be static from registration until unregistration. :-)
> The life cycle of no_iommu_access, in contrast, is between the bind and the fd
> close. But given that it is mainly used while the fd is bound to iommufd and
> not yet closed, it's still possible to let no_iommu_access serve as the
> noiommu flag.

Hi Jason,

I found another reason to use a noiommu flag here.

Existing vfio fails the vfio_device registration if there is no iommu_group,
unless both CONFIG_VFIO_NOIOMMU and vfio_noiommu are set. But that
logic is compiled out when !CONFIG_VFIO_GROUP.

So the cdev path needs to check noiommu explicitly, as in the code below.
It is called during vfio_device registration. If iommu_group is NULL and
noiommu is not enabled, it fails, and hence the vfio_device registration
fails. Since we have such a noiommu check at registration anyway, it seems
more reasonable to record the result in a flag instead of using
no_iommu_access. Does that make sense?

+static inline int vfio_device_set_noiommu(struct vfio_device *device)
+{
+   struct iommu_group *iommu_group;
+
+   device->noiommu = false;
+
+   iommu_group = iommu_group_get(device->dev);
+   if (!iommu_group) {
+   if (!IS_ENABLED(CONFIG_VFIO_NOIOMMU) || !vfio_noiommu)
+   return -EINVAL;
+   device->noiommu = true;
+   } else {
+   iommu_group_put(iommu_group);
+   }
+
+   return 0;
+}
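
The decision made by the helper above can be modelled in plain userspace C.
This is only a sketch: the boolean parameters are hypothetical stand-ins for
"iommu_group_get() returned a group", IS_ENABLED(CONFIG_VFIO_NOIOMMU) and the
vfio_noiommu setting, and demo_set_noiommu() is not a kernel function.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Store the noiommu decision in *noiommu and return 0, or return -EINVAL
 * when the device has no iommu_group and noiommu mode is not available.
 */
static int demo_set_noiommu(bool has_iommu_group, bool config_noiommu,
			    bool noiommu_param, bool *noiommu)
{
	*noiommu = false;

	/* A device with an iommu_group is never a noiommu device. */
	if (has_iommu_group)
		return 0;

	/* No group: registration only succeeds if noiommu is fully enabled. */
	if (!config_noiommu || !noiommu_param)
		return -EINVAL;

	*noiommu = true;
	return 0;
}
```

The flag ends up true only in the one case where there is no group and both
switches are on; every other no-group combination fails registration.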

Regards,
Yi Liu

Re: [Intel-gfx] [PATCH v4 8/9] vfio/pci: Extend VFIO_DEVICE_GET_PCI_HOT_RESET_INFO for vfio device cdev

2023-05-08 Thread Liu, Yi L

> From: Alex Williamson 
> Sent: Friday, April 28, 2023 4:16 AM
>
> > > + *
> > >   * Return: 0 on success, -errno on failure:
> > >   *   -enospc = insufficient buffer, -enodev = unsupported for device.
> > >   */
> > >  struct vfio_pci_dependent_device {
> > > - __u32   group_id;
> > > + union {
> > > + __u32   group_id;
> > > + __u32   dev_id;
> > > +#define VFIO_PCI_DEVID_NONBLOCKING   0
> > > +#define VFIO_PCI_DEVID_BLOCKING  -1
> >
> > The above description seems like it's leaning towards OWNED rather than
> > BLOCKING.
> 
> Also these should be defined relative to something defined in IOMMUFD
> rather than inventing values here.  We can't have the valid devid
> number space owned by IOMMUFD conflict with these definitions.  Thanks,

Jason has proposed to reserve all negative IDs and 0 in iommufd. In that case,
can vfio define the numbers now?
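
To make the convention concrete, here is a minimal sketch that assumes the
scheme discussed in this thread: valid dev IDs are greater than 0, 0 marks an
affected device that does not block the reset, and -1 marks one that does.
The DEMO_* names are hypothetical, and the IDs are treated as signed 32-bit
values purely for readability; the uAPI field in the patch is a __u32.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_DEVID_OWNED	0	/* owned via its group, does not block */
#define DEMO_DEVID_NOT_OWNED	(-1)	/* not owned, blocks hot reset */

/* A negative ID means the device prevents hot reset. */
static bool demo_devid_blocks_reset(int32_t devid)
{
	return devid < 0;
}

/* True if no device in the reported list blocks the reset. */
static bool demo_all_devices_owned(const int32_t *devids, size_t count)
{
	size_t i;

	for (i = 0; i < count; i++)
		if (demo_devid_blocks_reset(devids[i]))
			return false;
	return true;
}
```

If iommufd reserves 0 and all negative values, as proposed, these markers
can never collide with an ID that iommufd hands out.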

Regards,
Yi Liu

Re: [Intel-gfx] [PATCH v4 2/9] vfio-iommufd: Create iommufd_access for noiommu devices

2023-05-03 Thread Liu, Yi L

> From: Jason Gunthorpe 
> Sent: Wednesday, May 3, 2023 2:22 AM
> 
> On Sat, Apr 29, 2023 at 12:13:39AM +0800, Yi Liu wrote:
> 
> > > Whoa, noiommu is inherently unsafe and only meant to expose the vfio
> > > device interface for userspace drivers that are going to do unsafe
> > > things regardless.  Enabling noiommu to work with mdev, pin pages, or
> > > anything else should not be on our agenda.  Userspaces relying on noiommu
> > > get the minimum viable interface and must impose a minuscule
> > > incremental maintenance burden.  The only reason we're spending so much
> > > effort on it here is to make iommufd noiommu support equivalent to
> > > group/container noiommu support.  We should stop at that.  Thanks,
> >
> > btw. I asked a question in [1] to check if we should allow attach/detach
> > on noiommu devices. Jason has replied it. If in future noiommu userspace
> > can pin page, then such userspace will need to attach/detach ioas. So I
> > made cdev series[2] to allow attach ioas on noiommu devices. Supporting
> > it from cdev day-1 may avoid probing if attach/detach is supported or
> > not for specific devices when adding pin page for noiommu userspace.
> >
> > But now I think such support is not planned, is it? If so, would it
> > be better to disallow attach/detach on noiommu devices in patch [2]?
> >
> > [1] https://lore.kernel.org/kvm/zea+khh0tufst...@nvidia.com/
> > [2] https://lore.kernel.org/kvm/20230426150321.454465-21-yi.l@intel.com/
>
> If we block it then userspace has to act quite differently, I think we
> should keep it.

Maybe the kernel can simply fail attach/detach when it happens on noiommu
devices, and noiommu userspace should just know that it would fail. @Alex,
what is your opinion?

> My general idea to complete the no-iommu feature is to add a new IOCTL
> to VFIO that is 'pin iova and return dma addr' that no-iommu userspace
> would call instead of trying to abuse mlock and /proc/ to do it. That
> ioctl would use the IOAS attached to the access just like a mdev would
> do, so it has a real IOVA, but it is not a mdev.

This new ioctl could be an IOMMUFD ioctl, since its inputs are the IOAS and
the address, and nothing in it is related to the device. Is that right?

> unmap callback just does nothing, as Alex says it is all still totally
> unsafe.

Sure. That's also why I added a noiommu test to avoid calling the
unmap callback, although it seems impossible for such devices to have an
unmap callback, as only mdev drivers would implement it.

> 
> This just allows it use the mm a little more properly and safely (eg
> mlock() doesn't set things like page_maybe_dma_pinned(), proc doesn't
> reject things like DAX and it currently doesn't make an adjustment for
> the PCI offset stuff..) So it would make DPDK a little more robust,
> portable and make the whole VFIO no-iommu feature much easier to use.

Thanks for the explanation.

> To do that we need an iommufd access, an access ID and we need to link
> the current IOAS to the special access, like mdev, but in any mdev
> code paths.
> 
> That creating the access ID solves the reset problem as well is a nice
> side effect and is the only part of this you should focus on for now..

Yes. I get this part. We only need access ID so far to fix the noiommu
gap in hot-reset.

Regards,
Yi Liu

Re: [Intel-gfx] [PATCH v4 2/9] vfio-iommufd: Create iommufd_access for noiommu devices

2023-05-03 Thread Liu, Yi L

> From: Jason Gunthorpe 
> Sent: Wednesday, May 3, 2023 2:12 AM
> 
> On Sat, Apr 29, 2023 at 12:07:24AM +0800, Yi Liu wrote:
> > > The emulated stuff is for mdev only, it should not be confused with
> > > no-iommu
> >
> > hmmm. I guess the confusion is due to the reuse of
> > vfio_iommufd_emulated_bind().
> 
> > This is probably not a good direction

I see. But without the reuse there would be some code duplication.
I'm fine with adding separate _bind/unbind() functions for noiommu devices
if you and Alex prefer that.

> > > Eg if you had a no_iommu_access value to store the access it would be
> > > fine and could serve as the 'this is no_iommu' flag
> >
> > So this no_iommu_access shall be created per iommufd bind, and call the
> > iommufd_access_create() with iommufd_access_ops. is it? If so, this is
> > not 100% the same with no_iommu flag as this flag is static after device
> > registration.
> 
> Something like that, yes
> 
> I don't think it is any real difference with the current flag, both
> are determined at the first ioctl when the iommufd is presented and
> both would state permanently until the fd close

Well, the noiommu flag would be static from registration until unregistration. :-)
The life cycle of no_iommu_access, in contrast, is between the bind and the fd
close. But given that it is mainly used while the fd is bound to iommufd and
not yet closed, it's still possible to let no_iommu_access serve as the
noiommu flag.

Regards,
Yi Liu

Re: [Intel-gfx] [PATCH v4 9/9] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET

2023-05-02 Thread Liu, Yi L

> From: Alex Williamson 
> Sent: Friday, April 28, 2023 5:55 AM
> 
> On Wed, 26 Apr 2023 07:54:19 -0700
> Yi Liu  wrote:
> 
> > This is the way user to invoke hot-reset for the devices opened by cdev
> > interface. User should check the flag VFIO_PCI_HOT_RESET_FLAG_RESETTABLE
> > in the output of VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl before doing
> > hot-reset for cdev devices.
> >
> > Suggested-by: Jason Gunthorpe 
> > Signed-off-by: Jason Gunthorpe 
> > Reviewed-by: Jason Gunthorpe 
> > Tested-by: Yanting Jiang 
> > Signed-off-by: Yi Liu 
> > ---
> >  drivers/vfio/pci/vfio_pci_core.c | 66 +++-
> >  include/uapi/linux/vfio.h| 22 +++
> >  2 files changed, 79 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/vfio/pci/vfio_pci_core.c 
> > b/drivers/vfio/pci/vfio_pci_core.c
> > index 43858d471447..f70e3b948b16 100644
> > --- a/drivers/vfio/pci/vfio_pci_core.c
> > +++ b/drivers/vfio/pci/vfio_pci_core.c
> > @@ -180,7 +180,8 @@ static void vfio_pci_probe_mmaps(struct 
> > vfio_pci_core_device
> *vdev)
> >  struct vfio_pci_group_info;
> >  static void vfio_pci_dev_set_try_reset(struct vfio_device_set *dev_set);
> >  static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
> > - struct vfio_pci_group_info *groups);
> > + struct vfio_pci_group_info *groups,
> > + struct iommufd_ctx *iommufd_ctx);
> >
> >  /*
> >   * INTx masking requires the ability to disable INTx signaling via 
> > PCI_COMMAND
> > @@ -1364,8 +1365,7 @@ vfio_pci_ioctl_pci_hot_reset_groups(struct
> vfio_pci_core_device *vdev,
> > if (ret)
> > return ret;
> >
> > -   /* Somewhere between 1 and count is OK */
> > -   if (!array_count || array_count > count)
> > +   if (array_count > count)
> > return -EINVAL;
> 
> Doesn't this need a || vfio_device_cdev_opened(vdev) test as well?
> It's invalid to pass fds for a cdev device.  Presumably it would fail
> later collecting group fds as well, but might as well enforce the
> semantics early.

Yes, it does.
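
A sketch of the resulting argument handling, under the assumptions stated in
this exchange: a non-zero count selects the legacy group-fd path, a zero
count selects the iommufd-based path, and the group-fd path rejects cdev
devices early. The demo_* names and the enum are hypothetical, and
"cdev_opened" stands in for vfio_device_cdev_opened().

```c
#include <errno.h>
#include <stdbool.h>

enum demo_reset_path {
	DEMO_RESET_VIA_GROUP_FDS,
	DEMO_RESET_VIA_IOMMUFD,
};

/* A zero-length fd array selects the iommufd ownership check. */
static enum demo_reset_path demo_pick_reset_path(unsigned int count)
{
	return count ? DEMO_RESET_VIA_GROUP_FDS : DEMO_RESET_VIA_IOMMUFD;
}

/*
 * Validate the fd array for the group-fd path: it must not exceed the
 * number of affected devices, and it is invalid for a cdev-opened device.
 */
static int demo_check_group_fd_array(unsigned int array_count,
				     unsigned int dev_count,
				     bool cdev_opened)
{
	if (array_count > dev_count || cdev_opened)
		return -EINVAL;
	return 0;
}
```

Rejecting the cdev case up front enforces the semantics at the ioctl entry
instead of relying on a later failure while collecting group fds.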

> 
> >
> > group_fds = kcalloc(array_count, sizeof(*group_fds), GFP_KERNEL);
> > @@ -1414,7 +1414,7 @@ vfio_pci_ioctl_pci_hot_reset_groups(struct
> vfio_pci_core_device *vdev,
> > info.count = array_count;
> > info.files = files;
> >
> > -   ret = vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, &info);
> > +   ret = vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, &info, NULL);
> >
> >  hot_reset_release:
> > for (file_idx--; file_idx >= 0; file_idx--)
> > @@ -1429,6 +1429,7 @@ static int vfio_pci_ioctl_pci_hot_reset(struct
> vfio_pci_core_device *vdev,
> >  {
> > unsigned long minsz = offsetofend(struct vfio_pci_hot_reset, count);
> > struct vfio_pci_hot_reset hdr;
> > +   struct iommufd_ctx *iommufd;
> > bool slot = false;
> >
> > if (copy_from_user(&hdr, arg, minsz))
> > @@ -1443,7 +1444,12 @@ static int vfio_pci_ioctl_pci_hot_reset(struct
> vfio_pci_core_device *vdev,
> > else if (pci_probe_reset_bus(vdev->pdev->bus))
> > return -ENODEV;
> >
> > -   return vfio_pci_ioctl_pci_hot_reset_groups(vdev, hdr.count, slot, arg);
> > +   if (hdr.count)
> > +   return vfio_pci_ioctl_pci_hot_reset_groups(vdev, hdr.count, 
> > slot, arg);
> > +
> > +   iommufd = vfio_iommufd_physical_ictx(&vdev->vdev);
> > +
> > +   return vfio_pci_dev_set_hot_reset(vdev->vdev.dev_set, NULL, iommufd);
> 
> Why did we need to store iommufd in a variable?

will remove it.

> >  }
> >
> >  static int vfio_pci_ioctl_ioeventfd(struct vfio_pci_core_device *vdev,
> > @@ -2415,6 +2421,9 @@ static bool vfio_dev_in_groups(struct 
> > vfio_pci_core_device
> *vdev,
> >  {
> > unsigned int i;
> >
> > +   if (!groups)
> > +   return false;
> > +
> > for (i = 0; i < groups->count; i++)
> > if (vfio_file_has_dev(groups->files[i], &vdev->vdev))
> > return true;
> > @@ -2488,13 +2497,38 @@ static int vfio_pci_dev_set_pm_runtime_get(struct
> vfio_device_set *dev_set)
> > return ret;
> >  }
> >
> > +static bool vfio_dev_in_iommufd_ctx(struct vfio_pci_core_device *vdev,
> > +   struct iommufd_ctx *iommufd_ctx)
> > +{
> > +   struct iommufd_ctx *iommufd = vfio_iommufd_physical_ictx(&vdev->vdev);
> > +   struct iommu_group *iommu_group;
> > +
> > +   if (!iommufd_ctx)
> > +   return false;
> > +
> > +   if (iommufd == iommufd_ctx)
> > +   return true;
> > +
> > +   iommu_group = iommu_group_get(vdev->vdev.dev);
> > +   if (!iommu_group)
> > +   return false;
> > +
> > +   /*
> > +* Try to check if any device within iommu_group is bound with
> > +* the input iommufd_ctx.
> > +*/
> > +   return vfio_devset_iommufd_has_group(vdev->vdev.dev_set,
> > +iommufd_ctx, iommu_group);
> > +}
> 
> This last test makes this not do what the function name suggests it
> does.  If it were true, the device

Re: [Intel-gfx] [PATCH v4 7/9] vfio-iommufd: Add helper to retrieve iommufd_ctx and devid for vfio_device

2023-04-27 Thread Liu, Yi L

> From: Tian, Kevin 
> Sent: Thursday, April 27, 2023 2:46 PM
> 
> > From: Yi Liu
> > Sent: Wednesday, April 26, 2023 10:54 PM
> > +
> > +/*
> > + * Return devid for devices that have been bound with iommufd,
> > + * returns 0 if not bound yet.
> > + */
> > +u32 vfio_iommufd_physical_devid(struct vfio_device *vdev)
> > +{
> > +   if (WARN_ON(!vdev->iommufd_device && !vdev->iommufd_access))
> > +   return 0;
> 
> is WARN_ON too restrictive?

This originated from a comment from Eric[1]. At that time, this helper had a
void return type, hence there was no message when there was no devid. Now it
returns 0 if the device is not bound. Maybe check it in the caller and warn
there?

[1] https://lore.kernel.org/kvm/702c2883-1d51-b609-1e99-337295e6e...@redhat.com/

Regards,
Yi Liu

Re: [Intel-gfx] [PATCH v4 6/9] iommufd: Reserved -1 in the iommufd xarray

2023-04-27 Thread Liu, Yi L

> From: Tian, Kevin 
> Sent: Thursday, April 27, 2023 2:42 PM
> 
> > From: Liu, Yi L 
> > Sent: Wednesday, April 26, 2023 10:54 PM
> >
> > VFIO needs two reserved values. 0 is already reserved by initializing
> > xarray with XA_FLAGS_ALLOC1. This reserves -1 by limiting the xa alloc
> > range.
> >
> > Signed-off-by: Yi Liu 
> > ---
> >  drivers/iommu/iommufd/main.c | 5 -
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/iommu/iommufd/main.c
> > b/drivers/iommu/iommufd/main.c
> > index 3fbe636c3d8a..51b27c96c52f 100644
> > --- a/drivers/iommu/iommufd/main.c
> > +++ b/drivers/iommu/iommufd/main.c
> > @@ -28,6 +28,9 @@ struct iommufd_object_ops {
> >  static const struct iommufd_object_ops iommufd_object_ops[];
> >  static struct miscdevice vfio_misc_dev;
> >
> > +/* -1 is reserved */
> > +#define iommufd_xa_limit_32b XA_LIMIT(0, (-2U))
> > +
> >  struct iommufd_object *_iommufd_object_alloc(struct iommufd_ctx *ictx,
> >  size_t size,
> >  enum iommufd_object_type type)
> > @@ -50,7 +53,7 @@ struct iommufd_object *_iommufd_object_alloc(struct
> > iommufd_ctx *ictx,
> >  * before calling iommufd_object_finalize().
> >  */
> > rc = xa_alloc(&ictx->objects, &obj->id, XA_ZERO_ENTRY,
> > - xa_limit_32b, GFP_KERNEL_ACCOUNT);
> > + iommufd_xa_limit_32b, GFP_KERNEL_ACCOUNT);
> 
> Just direct use XA_LIMIT() here.

Ok.
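
For illustration, a toy userspace allocator that mirrors the reservation
described in the patch: 0 is never handed out (as with XA_FLAGS_ALLOC1), and
capping the range at (u32)-2 keeps 0xffffffff, i.e. -1, out of the ID space.
The demo_* names are hypothetical and the allocator is deliberately naive (no
reuse of freed IDs); it is not how the xarray works internally.

```c
#include <errno.h>
#include <stdint.h>

#define DEMO_ID_MIN 1u			/* 0 is reserved */
#define DEMO_ID_MAX ((uint32_t)-2)	/* 0xfffffffe; 0xffffffff is reserved */

struct demo_id_alloc {
	uint32_t next;	/* next ID to hand out */
};

static void demo_id_alloc_init(struct demo_id_alloc *a)
{
	a->next = DEMO_ID_MIN;
}

/*
 * Hand out increasing IDs within [DEMO_ID_MIN, DEMO_ID_MAX].
 * Returns 0 on success, -EBUSY once the range is exhausted.
 */
static int demo_id_alloc_get(struct demo_id_alloc *a, uint32_t *id)
{
	if (a->next < DEMO_ID_MIN || a->next > DEMO_ID_MAX)
		return -EBUSY;
	*id = a->next++;
	return 0;
}
```

Because neither reserved value can ever be allocated, a consumer such as
vfio can use 0 and -1 as markers without clashing with a real object ID.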

> btw do we need a contract so vfio can learn 0 and -1 are reserved or
> fine to have a fixed assumption in later patches?

I'm not sure how to do it. ☹ @Jason, what is your opinion?

Regards,
Yi Liu

Re: [Intel-gfx] [PATCH v4 1/9] vfio: Determine noiommu in vfio_device registration

2023-04-27 Thread Liu, Yi L

> From: Tian, Kevin 
> Sent: Thursday, April 27, 2023 2:36 PM
> 
> > From: Liu, Yi L 
> > Sent: Wednesday, April 26, 2023 10:54 PM
> >
> > -static inline bool vfio_device_is_noiommu(struct vfio_device *vdev)
> > +static inline int vfio_device_set_noiommu(struct vfio_device *device)
> >  {
> > -   return IS_ENABLED(CONFIG_VFIO_NOIOMMU) &&
> > -  vdev->group->type == VFIO_NO_IOMMU;
> > +   device->noiommu = IS_ENABLED(CONFIG_VFIO_NOIOMMU) &&
> > + device->group->type == VFIO_NO_IOMMU;
> > +   return 0;
> 
> Just void. this can't fail.

Hmmm. Yes, before the commit below it cannot fail. Maybe this can be void
for now and converted to int later.

https://lore.kernel.org/kvm/20230426150321.454465-22-yi.l@intel.com/T/#u

Regards,
Yi Liu

Re: [Intel-gfx] [PATCH v4 9/9] vfio/pci: Allow passing zero-length fd array in VFIO_DEVICE_PCI_HOT_RESET

2023-04-27 Thread Liu, Yi L

> From: Tian, Kevin 
> Sent: Thursday, April 27, 2023 2:54 PM
> 
> > From: Liu, Yi L 
> > Sent: Wednesday, April 26, 2023 10:54 PM
> >
> > +static bool vfio_dev_in_iommufd_ctx(struct vfio_pci_core_device *vdev,
> > +   struct iommufd_ctx *iommufd_ctx)
> > +{
> > +   struct iommufd_ctx *iommufd = vfio_iommufd_physical_ictx(
> > &vdev->vdev);
> > +   struct iommu_group *iommu_group;
> > +
> > +   if (!iommufd_ctx)
> > +   return false;
> > +
> > +   if (iommufd == iommufd_ctx)
> > +   return true;
> > +
> > +   iommu_group = iommu_group_get(vdev->vdev.dev);
> > +   if (!iommu_group)
> > +   return false;
> > +
> > +   /*
> > +* Try to check if any device within iommu_group is bound with
> > +* the input iommufd_ctx.
> > +*/
> > +   return vfio_devset_iommufd_has_group(vdev->vdev.dev_set,
> > +iommufd_ctx, iommu_group);
> 
> iommu_group is not put.

Oops. Yes.
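
A hedged sketch of the balanced get/put pattern this review comment asks for.
The demo_* types model a reference-counted group in userspace, and
demo_ctx_has_group() is a hypothetical stand-in for the ownership test; none
of this is the kernel's iommu_group API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy reference-counted object standing in for struct iommu_group. */
struct demo_group {
	int refcount;
};

static struct demo_group *demo_group_get(struct demo_group *g)
{
	if (g)
		g->refcount++;
	return g;
}

static void demo_group_put(struct demo_group *g)
{
	if (g)
		g->refcount--;
}

/* Hypothetical ownership test: does the context own this group? */
static bool demo_ctx_has_group(const struct demo_group *owned,
			       const struct demo_group *g)
{
	return owned == g;
}

/* Take a reference, run the check, and drop the reference before returning. */
static bool demo_dev_in_ctx(struct demo_group *dev_group,
			    const struct demo_group *owned)
{
	struct demo_group *g = demo_group_get(dev_group);
	bool ret;

	if (!g)
		return false;

	ret = demo_ctx_has_group(owned, g);
	demo_group_put(g);	/* balances demo_group_get() above */
	return ret;
}
```

Storing the result in a local and returning it after the put keeps the
reference count unchanged on both the true and the false path.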

Re: [Intel-gfx] [PATCH v4 2/9] vfio-iommufd: Create iommufd_access for noiommu devices

2023-04-27 Thread Liu, Yi L

> From: Tian, Kevin 
> Sent: Thursday, April 27, 2023 2:39 PM
> 
> > From: Liu, Yi L 
> > Sent: Wednesday, April 26, 2023 10:54 PM
> > @@ -121,7 +128,8 @@ static void vfio_emulated_unmap(void *data,
> > unsigned long iova,
> >  {
> > struct vfio_device *vdev = data;
> >
> > -   if (vdev->ops->dma_unmap)
> > +   /* noiommu devices cannot do map/unmap */
> > +   if (vdev->noiommu && vdev->ops->dma_unmap)
> > vdev->ops->dma_unmap(vdev, iova, length);
> 
> Is it necessary? All mdev devices implementing @dma_unmap won't
> set noiommu flag.

Hmmm. Yes, and none of the devices that set noiommu implement @dma_unmap,
as far as I can see. Maybe this noiommu check can be removed.

> 
> Instead in the future if we allow noiommu userspace to pin pages
> we'd need similar logic too.

I'm not quite sure about it so far. For mdev devices, the device driver
may use vfio_pin_pages()/vfio_dma_rw() to pin pages, hence such drivers
need to listen to the dma_unmap() event. But for noiommu users, does the
device driver also participate in page pinning? At least the vfio-pci driver
does not today, though maybe it will in the future when noiommu userspace is
enabled to pin pages. It looks to me that such userspace should order
the DMA before calling the ioctl to unpin pages, instead of letting the device
driver listen to unmap.

Regards,
Yi Liu

Re: [Intel-gfx] [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO

2023-04-26 Thread Liu, Yi L

> From: Alex Williamson 
> Sent: Wednesday, April 26, 2023 9:20 PM
> 
> On Wed, 26 Apr 2023 07:22:17 +0000
> "Liu, Yi L"  wrote:
> 
> > > From: Alex Williamson 
> > > Sent: Thursday, April 20, 2023 10:09 PM
> > [...]
> > > > > Whereas dev-id < 0
> > > > > (== -1) is an affected device which prevents hot-reset, ex. an 
> > > > > un-owned
> > > > > device, device configured within a different iommufd_ctx, or device
> > > > > opened outside of the vfio cdev API."  Is that about right?  Thanks,
> > > >
> > > > Do you mean to have separate err-code for the three possibilities? As
> > > > the devid is generated by iommufd and it is u32. I'm not sure if we can
> > > > have such err-code definition without reserving some ids in iommufd.
> > >
> > > Yes, if we're going to report the full dev-set, I think we need at
> > > least two unique error codes or else the user has no way to determine
> > > the subset of invalid dev-ids which block the reset.  I think Jason is
> > > proposing the set of valid dev-ids are >0, a dev-id of zero indicates
> > > some form of non-blocking, while <0 (or maybe specifically -1)
> > > indicates a blocking device.  I was trying to get consensus on a formal
> > > definition of each of those error codes in my previous reply.  Thanks,
> >
> > Seems like RESETTABLE flag is not needed if we report -1 for the devices
> > that block hotreset. Userspace can deduce if the calling device is resettable
> > or not by checking if there is any -1 in the affected device list.
> 
> There is some redundancy there, yes.  Given the desire for a null array
> on the actual reset ioctl I assumed there would also be a desire to
> streamline the info ioctl such that userspace isn't required to parse
> the return array, for example maybe userspace isn't required to pass a
> full buffer and can get the reset availability status from only the
> header.  Of course it's still the responsibility of userspace to know
> the extent of the reset.  Thanks,

I kept it and have sent a refreshed version of the hot-reset series.

https://lore.kernel.org/kvm/20230426145419.450922-9-yi.l@intel.com/

Regards,
Yi Liu

Re: [Intel-gfx] [PATCH v4 0/9] Enhance vfio PCI hot reset for vfio cdev device

2023-04-26 Thread Liu, Yi L

> From: Liu, Yi L 
> Sent: Wednesday, April 26, 2023 10:54 PM
> 
> VFIO_DEVICE_PCI_HOT_RESET requires user to pass an array of group fds
> to prove that it owns all devices affected by resetting the calling
> device. While for cdev devices, user can use an iommufd-based ownership
> checking model and invoke VFIO_DEVICE_PCI_HOT_RESET with a zero-length
> fd array.
> 
> This series first creates iommufd_access for noiommu devices to fill the
> gap for adding iommufd-based ownership checking model, then extends
> VFIO_DEVICE_GET_PCI_HOT_RESET_INFO to check ownership and return the
> check result and the devid of affected devices to user. In the end, extends
> the VFIO_DEVICE_PCI_HOT_RESET to accept zero-length fd array for hot-reset
> with cdev devices.
> 
> The new hot reset method and updated _INFO ioctl are tested with the
> below qemu:
> 
> https://github.com/yiliu1765/qemu/tree/iommufd_rfcv4.mig.reset.v4_var3
> (requires to test with the cdev kernel)

The cdev kernel is the branch below; this series is part of that branch.

https://github.com/yiliu1765/iommufd/tree/vfio_device_cdev_v10

Regards,
Yi Liu

Re: [Intel-gfx] [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO

2023-04-26 Thread Liu, Yi L

> From: Alex Williamson 
> Sent: Thursday, April 20, 2023 10:09 PM
[...]
> > > Whereas dev-id < 0
> > > (== -1) is an affected device which prevents hot-reset, ex. an un-owned
> > > device, device configured within a different iommufd_ctx, or device
> > > opened outside of the vfio cdev API."  Is that about right?  Thanks,
> >
> > Do you mean to have separate err-code for the three possibilities? As
> > the devid is generated by iommufd and it is u32. I'm not sure if we can
> > have such err-code definition without reserving some ids in iommufd.
> 
> Yes, if we're going to report the full dev-set, I think we need at
> least two unique error codes or else the user has no way to determine
> the subset of invalid dev-ids which block the reset.  I think Jason is
> proposing the set of valid dev-ids are >0, a dev-id of zero indicates
> some form of non-blocking, while <0 (or maybe specifically -1)
> indicates a blocking device.  I was trying to get consensus on a formal
> definition of each of those error codes in my previous reply.  Thanks,

Seems like RESETTABLE flag is not needed if we report -1 for the devices
that block hotreset. Userspace can deduce if the calling device is resettable
or not by checking if there is any -1 in the affected device list.
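
A minimal userspace sketch of that deduction, assuming the encoding discussed
in this thread (dev-id > 0 valid, 0 non-blocking, -1 blocking). The macro and
function names are hypothetical, and the ids are treated as signed 32-bit
values purely for illustration; none of this is a uapi definition:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed encoding from the discussion, not a merged uapi definition. */
#define SKETCH_DEVID_NOT_BLOCKING 0    /* affected, no dev-id, not a blocker */
#define SKETCH_DEVID_BLOCKING     (-1) /* affected and prevents hot-reset */

/* Return true if none of the reported devices blocks the hot-reset. */
static bool hot_reset_possible(const int32_t *devids, size_t count)
{
	assert(devids != NULL || count == 0);

	for (size_t i = 0; i < count; i++) {
		if (devids[i] == SKETCH_DEVID_BLOCKING)
			return false;
	}
	return true;
}
```

With this, a separate RESETTABLE flag would be redundant only for callers that
are willing to fetch and walk the whole array.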

Regards,
Yi Liu

Re: [Intel-gfx] [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO

2023-04-23 Thread Liu, Yi L

> From: Jason Gunthorpe 
> Sent: Saturday, April 22, 2023 6:36 AM
> 
> On Thu, Apr 20, 2023 at 08:08:39AM -0600, Alex Williamson wrote:
> 
> > > Hide this device in the list looks fine to me. But the calling user should
> > > not do any new device open before finishing hot-reset. Otherwise, user may
> > > miss a device that needs to do pre/post reset. I think this requirement is
> > > acceptable. Is it?
> >
> > I think Kevin and Jason are leaning towards reporting the entire
> > dev-set.  The INFO ioctl has always been a point-in-time reading, no
> > guarantees are made if the host or user configuration is changed.
> > Nothing changes in that respect.
> 
> Yeah, I think your point about qemu community forums suggests we should
> err toward having qemu provide some fully detailed debug report.
> 
> > > > Whereas dev-id < 0
> > > > (== -1) is an affected device which prevents hot-reset, ex. an un-owned
> > > > device, device configured within a different iommufd_ctx, or device
> > > > opened outside of the vfio cdev API."  Is that about right?  Thanks,
> > >
> > > Do you mean to have separate err-code for the three possibilities? As
> > > the devid is generated by iommufd and it is u32. I'm not sure if we can
> > > have such err-code definition without reserving some ids in iommufd.
> >
> > Yes, if we're going to report the full dev-set, I think we need at
> > least two unique error codes or else the user has no way to determine
> > the subset of invalid dev-ids which block the reset.
> 
> If you think this is important to report we should report 0 and -1,
> and adjust the iommufd xarray allocator to reserve -1

Then the alloc range should be from 1 to 0x.
 
> 
> It depends what you want to show for the debugging.
> 
> eg if we have debugging where qemu dumps this table:
> 
>BDF   In VM   iommu_group   Has VFIO driver   Has Kernel Driver
> 
> By also doing various sysfs probes based on the BDF, then the admin
> action to remedy the situation is:
> 
> Make "Has VFIO driver = y" or "Has Kernel Driver = n" for every row in
> the table to make the reset work.
> 
> And we don't need the distinction. Adding the 0/-1 lets you make a
> useful table without doing any sysfs work.
>
> > I think Jason is proposing the set of valid dev-ids are >0, a dev-id
> > of zero indicates some form of non-blocking, while <0 (or maybe
> > specifically -1) indicates a blocking device.
> 
> Yes, 0 and -1 would be fine with those definitions. The only use of
> the data is to add a 'blocking use of reset' column to the table
> above..

Should -1 and 0 be defined in the uapi as well? If so, it does not seem
easy to find proper names for them. Or we could just document them in the
vfio uapi header, saying -1 (blocking) and 0 (no-devid-but-not-blocking).
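
To make the naming question concrete, a sketch of what userspace might do with
the two values when filling the 'blocking use of reset' column. All names here
are hypothetical placeholders, and the signed interpretation of the id is an
assumption for illustration only:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the three classes of reported dev-id. */
enum devid_kind {
	DEVID_KIND_VALID,        /* > 0: device opened by the user via cdev */
	DEVID_KIND_NOT_BLOCKING, /*   0: no dev-id, but not blocking reset */
	DEVID_KIND_BLOCKING,     /*  -1: no dev-id, and blocking reset */
};

/* Classify one reported dev-id according to the proposed definitions. */
static enum devid_kind classify_devid(int32_t devid)
{
	if (devid > 0)
		return DEVID_KIND_VALID;
	if (devid == 0)
		return DEVID_KIND_NOT_BLOCKING;
	return DEVID_KIND_BLOCKING;
}

/* Text for the 'blocking use of reset' column of the debug table. */
static const char *blocks_reset_column(int32_t devid)
{
	return classify_devid(devid) == DEVID_KIND_BLOCKING ? "yes" : "no";
}
```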

Regards,
Yi Liu

Re: [Intel-gfx] [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO

2023-04-23 Thread Liu, Yi L

> From: Jason Gunthorpe 
> Sent: Tuesday, April 18, 2023 9:02 PM
> 
> On Tue, Apr 18, 2023 at 10:23:55AM +0000, Liu, Yi L wrote:
> > > From: Jason Gunthorpe 
> > > Sent: Monday, April 17, 2023 9:39 PM
> > >
> > > > On Fri, Apr 14, 2023 at 09:11:30AM +0000, Tian, Kevin wrote:
> > >
> > > > The only corner case with this option is when a user mixes group
> > > > and cdev usages. iirc you mentioned it's a valid usage to be supported.
> > > > In that case the kernel doesn't have sufficient knowledge to judge
> > > > 'resettable' as it doesn't know which groups are opened by this user.
> > >
> > > IMHO we don't need to support this combination.
> >
> > Do you mean we don't support hot-reset for this combination or we don't
> > support user using this combination. I guess the prior one. Right?
> 
> Yes
> 
> > Ditto. We just fail hot-reset for the multiple iommufds case. Is it?
> 
> Yes
> 
> > > I suppose we should have done that from the beginning - no-iommu is an
> > > IOMMUFD access, it just uses a crazy /proc based way to learn the
> > > PFNs. Making it a proper access and making a real VFIO ioctl that
> > > calls iommufd_access_pin_pages() and returns the DMA mapped addresses
> > > to userspace would go a long way to making no-iommu work in a logical,
> > > usable, way.
> >
> > This seems to be an improvement for noiommu mode. It can be done later.
> > For now, generating access_id and binding noiommu devices with iommufdctx
> > is enough for supporting noiommu hot-reset.
> 
> Yes, I'm not sure there is much value in improving no-iommu unless
> someone also wants to go in and update dpdk.
> 
> At some point we will need to revise dpdk to use iommufd, maybe that
> would be a good time to fix this too.

This noiommu improvement would allow the user to attach an ioas to noiommu
devices, right? That may be done by calling iommufd_access_attach(). So there
is a quick question: in the cdev series, shall we allow the attachment for
noiommu? I think the noiommu improvement requires extra effort, so it is not
ready yet. If so, it seems I just need to fail the attachment for noiommu
devices. But when it is ready in the future, how can userspace know that
attach is allowed for noiommu devices? Will that be easy? Or should we just
make attach a noop that always succeeds for noiommu devices? Any suggestions?

Regards,
Yi Liu

Re: [Intel-gfx] [PATCH v3 04/12] vfio-iommufd: Add helper to retrieve iommufd_ctx and devid for vfio_device

2023-04-21 Thread Liu, Yi L

> From: Alex Williamson 
> Sent: Wednesday, April 5, 2023 5:49 AM
> On Tue, 4 Apr 2023 17:28:40 +0200
> Eric Auger  wrote:
> 
> > Hi,
> >
> > On 4/1/23 16:44, Yi Liu wrote:
> > > This is needed by the vfio-pci driver to report affected devices in the
> > > hot reset for a given device.
> > >
> > > Reviewed-by: Jason Gunthorpe 
> > > Tested-by: Yanting Jiang 
> > > Signed-off-by: Yi Liu 
> > > ---
> > >  drivers/iommu/iommufd/device.c | 12 
> > >  drivers/vfio/iommufd.c | 14 ++
> > >  include/linux/iommufd.h|  3 +++
> > >  include/linux/vfio.h   | 13 +
> > >  4 files changed, 42 insertions(+)
> > >
> > > > diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
> > > index 25115d401d8f..04a57aa1ae2c 100644
> > > --- a/drivers/iommu/iommufd/device.c
> > > +++ b/drivers/iommu/iommufd/device.c
> > > @@ -131,6 +131,18 @@ void iommufd_device_unbind(struct iommufd_device
> *idev)
> > >  }
> > >  EXPORT_SYMBOL_NS_GPL(iommufd_device_unbind, IOMMUFD);
> > >
> > > +struct iommufd_ctx *iommufd_device_to_ictx(struct iommufd_device *idev)
> > > +{
> > > + return idev->ictx;
> > > +}
> > > +EXPORT_SYMBOL_NS_GPL(iommufd_device_to_ictx, IOMMUFD);
> > > +
> > > +u32 iommufd_device_to_id(struct iommufd_device *idev)
> > > +{
> > > + return idev->obj.id;
> > > +}
> > > +EXPORT_SYMBOL_NS_GPL(iommufd_device_to_id, IOMMUFD);
> > > +
> > >  static int iommufd_device_setup_msi(struct iommufd_device *idev,
> > >   struct iommufd_hw_pagetable *hwpt,
> > >   phys_addr_t sw_msi_start)
> > > diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
> > > index 88b00c501015..809f2dd73b9e 100644
> > > --- a/drivers/vfio/iommufd.c
> > > +++ b/drivers/vfio/iommufd.c
> > > @@ -66,6 +66,20 @@ void vfio_iommufd_unbind(struct vfio_device *vdev)
> > >   vdev->ops->unbind_iommufd(vdev);
> > >  }
> > >
> > > +struct iommufd_ctx *vfio_iommufd_physical_ictx(struct vfio_device *vdev)
> > > +{
> > > + if (!vdev->iommufd_device)
> > > + return NULL;
> > > + return iommufd_device_to_ictx(vdev->iommufd_device);
> > > +}
> > > +EXPORT_SYMBOL_GPL(vfio_iommufd_physical_ictx);
> > > +
> > > +void vfio_iommufd_physical_devid(struct vfio_device *vdev, u32 *id)
> > > +{
> > > + if (vdev->iommufd_device)
> > > + *id = iommufd_device_to_id(vdev->iommufd_device);
> > since there is no return value, may be worth to add at least a WARN_ON
> > in case of !vdev->iommufd_device

This may be a user-triggerable warning if the input device is not bound
to iommufd.

> Yeah, this is bizarre and makes the one caller of this interface very
> awkward.  We later go on to define IOMMUFD_INVALID_ID, so this should
> simply return that in the case of no iommufd_device and skip this
> unnecessary pointer passing.  Thanks,

Ok. Then it can also return the invalid id when !CONFIG_IOMMUFD. This also
needs to wait for the decision in the thread that is discussing the err-code.
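
A rough sketch of the by-value form, with cut-down stand-in structures. The
SKETCH_INVALID_ID value is a placeholder, since the actual value of
IOMMUFD_INVALID_ID depends on the err-code decision, and the function name is
hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder only; the real invalid-id value is still under discussion. */
#define SKETCH_INVALID_ID 0u

/* Stand-ins for the kernel structures; only the fields used here. */
struct iommufd_device {
	uint32_t id;
};

struct vfio_device {
	struct iommufd_device *iommufd_device;
};

/*
 * Return the dev-id by value, or the invalid id if the device is not bound
 * to an iommufd. No output pointer and no user-triggerable warning.
 */
static uint32_t physical_devid_sketch(const struct vfio_device *vdev)
{
	assert(vdev != NULL);

	if (!vdev->iommufd_device)
		return SKETCH_INVALID_ID;
	return vdev->iommufd_device->id;
}
```

The !CONFIG_IOMMUFD stub could then simply return the same invalid id.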

Regards,
Yi Liu

> Alex
> 
> > > +}
> > > +EXPORT_SYMBOL_GPL(vfio_iommufd_physical_devid);
> > >  /*
> > >   * The physical standard ops mean that the iommufd_device is bound to the
> > > >   * physical device vdev->dev that was provided to vfio_init_group_dev(). Drivers
> > > diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
> > > index 1129a36a74c4..ac96df406833 100644
> > > --- a/include/linux/iommufd.h
> > > +++ b/include/linux/iommufd.h
> > > @@ -24,6 +24,9 @@ void iommufd_device_unbind(struct iommufd_device *idev);
> > >  int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id);
> > >  void iommufd_device_detach(struct iommufd_device *idev);
> > >
> > > +struct iommufd_ctx *iommufd_device_to_ictx(struct iommufd_device *idev);
> > > +u32 iommufd_device_to_id(struct iommufd_device *idev);
> > > +
> > >  struct iommufd_access_ops {
> > >   u8 needs_pin_pages : 1;
> > >   void (*unmap)(void *data, unsigned long iova, unsigned long length);
> > > diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> > > index 3188d8a374bd..97a1174b922f 100644
> > > --- a/include/linux/vfio.h
> > > +++ b/include/linux/vfio.h
> > > @@ -113,6 +113,8 @@ struct vfio_device_ops {
> > >  };
> > >
> > >  #if IS_ENABLED(CONFIG_IOMMUFD)
> > > +struct iommufd_ctx *vfio_iommufd_physical_ictx(struct vfio_device *vdev);
> > > +void vfio_iommufd_physical_devid(struct vfio_device *vdev, u32 *id);
> > >  int vfio_iommufd_physical_bind(struct vfio_device *vdev,
> > >  struct iommufd_ctx *ictx, u32 *out_device_id);
> > >  void vfio_iommufd_physical_unbind(struct vfio_device *vdev);
> > > @@ -122,6 +124,17 @@ int vfio_iommufd_emulated_bind(struct vfio_device
> *vdev,
> > >  void vfio_iommufd_emulated_unbind(struct vfio_device *vdev);
> > >  int vfio_iommufd_emulated_attach_ioas(struct vfio_device *vdev, u32 
> > > *pt_id);
> > >  #else
> > > +static inline struct iommufd_ctx *
> > > +vfio_iommufd_physical_ictx(struct vfio_device *vdev)
> > >

Re: [Intel-gfx] [PATCH v3 10/12] vfio: Mark cdev usage in vfio_device

2023-04-21 Thread Liu, Yi L

> From: Eric Auger 
> Sent: Wednesday, April 5, 2023 7:48 PM
> 
> On 4/1/23 16:44, Yi Liu wrote:
> > There are users that need to check if vfio_device is opened as cdev.
> > e.g. vfio-pci. This adds a flag in vfio_device, it will be set in the
> > cdev path when device is opened. This is not used at this moment, but
> > a preparation for vfio device cdev support.
> 
> better to squash this patch with the patch setting cdev_opened then?

But that would be in the cdev series. Maybe this patch could add only the
helper, returning false, and cdev_opened could be added in the patch below.
Would that be better?

https://lore.kernel.org/kvm/20230401151833.124749-23-yi.l@intel.com/

> Thanks
> 
> Eric
> >
> > Signed-off-by: Yi Liu 
> > ---
> >  include/linux/vfio.h | 7 +++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> > index f8fb9ab25188..d9a0770e5fc1 100644
> > --- a/include/linux/vfio.h
> > +++ b/include/linux/vfio.h
> > @@ -62,6 +62,7 @@ struct vfio_device {
> > struct iommufd_device *iommufd_device;
> > bool iommufd_attached;
> >  #endif
> > +   bool cdev_opened;
> >  };
> >
> >  /**
> > @@ -151,6 +152,12 @@ vfio_iommufd_physical_devid(struct vfio_device *vdev, u32 *id)
> > ((int (*)(struct vfio_device *vdev, u32 *pt_id)) NULL)
> >  #endif
> >
> > +static inline bool vfio_device_cdev_opened(struct vfio_device *device)
> > +{
> > +   lockdep_assert_held(&device->dev_set->lock);
> > +   return device->cdev_opened;
> > +}
> > +
> >  /**
> >   * @migration_set_state: Optional callback to change the migration state for
> >   * devices that support migration. It's mandatory for

Re: [Intel-gfx] [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO

2023-04-20 Thread Liu, Yi L

> From: Alex Williamson 
> Sent: Wednesday, April 19, 2023 2:39 AM
> 
> On Tue, 18 Apr 2023 09:57:32 -0300
> Jason Gunthorpe  wrote:
> 
> > On Mon, Apr 17, 2023 at 02:06:42PM -0600, Alex Williamson wrote:
> > > On Mon, 17 Apr 2023 16:31:56 -0300
> > > Jason Gunthorpe  wrote:
> > >
> > > > On Mon, Apr 17, 2023 at 01:01:40PM -0600, Alex Williamson wrote:
> > > > > Yes, it's not trivial, but Jason is now proposing that we consider
> > > > > mixing groups, cdevs, and multiple iommufd_ctxs as invalid.  I think
> > > > > > this means that regardless of which device calls INFO, there's only one
> > > > > > answer (assuming same set of devices opened, all cdev, all within same
> > > > > > iommufd_ctx).  Based on what I explained about my understanding of INFO2
> > > > > > and Jason agreed to, I think the output would be:
> > > > >
> > > > > flags: NOT_RESETABLE | DEV_ID
> > > > > {
> > > > >   { valid devA-id,  devA-BDF },
> > > > >   { valid devC-id,  devC-BDF },
> > > > >   { valid devD-id,  devD-BDF },
> > > > >   { invalid dev-id, devE-BDF },
> > > > > }
> > > > >
> > > > > Here devB gets dropped because the kernel understands that devB is
> > > > > unopened, affected, and owned.  It's therefore not a blocker for
> > > > > hot-reset.
> > > >
> > > > I don't think we want to drop anything because it makes the API
> > > > ill suited for the debugging purpose.
> > > >
> > > > devb should be returned with an invalid dev_id if I understand your
> > > > example. Maybe it should return with -1 as the dev_id instead of 0, to
> > > > make the debugging a bit better.
> > > >
> > > > Userspace should look at only NOT_RESETTABLE to determine if it
> > > > proceeds or not, and it should use the valid dev_id list to iterate
> > > > over the devices it has open to do the config stuff.
> > >
> > > If an affected device is owned, not opened, and not interfering with
> > > the reset, what is it adding to the API to report it for debugging
> > > purposes?
> >
> > It lets it print the entire group of devices, this is the only way
> > something can learn the actual list of all BDFs affected.
> 
> If we do so, userspace must be able to differentiate which devices are
> blocking, which necessitates at least a bi-modal invalid dev-id.
> 
> > dev_id can just return 0, we don't need a complex bitmap. Userspace
> > looks at the flag, if !NOT_RESETABLE then it ignores dev_id=0.
> 
> I'm having trouble with a succinct definition of dev-id == 0, is it "A
> device affected by the hot-reset reset, which does not directly
> contribute to the availability of the hot-reset, ex. an unopened device
> within the same IOMMU group as an opened device (ie. this is not the
> device responsible if hot-reset is unavailable). 

Hiding this device from the list looks fine to me. But the calling user should
not open any new device before finishing hot-reset. Otherwise, the user may
miss a device that needs pre/post reset handling. I think this requirement is
acceptable. Is it?

> Whereas dev-id < 0
> (== -1) is an affected device which prevents hot-reset, ex. an un-owned
> device, device configured within a different iommufd_ctx, or device
> opened outside of the vfio cdev API."  Is that about right?  Thanks,

Do you mean to have separate err-codes for the three possibilities? The
devid is generated by iommufd and it is a u32, so I'm not sure we can
define such err-codes without reserving some ids in iommufd.

Regards,
Yi Liu

Re: [Intel-gfx] [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO

2023-04-18 Thread Liu, Yi L

> From: Alex Williamson 
> Sent: Tuesday, April 18, 2023 12:11 PM
> 
[...]
>
> We haven't discussed how it fails when called on a group-opened device
> in a mixed environment.  I'd propose that the INFO ioctl behaves
> exactly as it does today, reporting group-id and BDF for each affected
> device.  However, the hot-reset ioctl itself is not extended to accept
> devicefd because there is no proof-of-ownership model for cdevs.
> Therefore even if the user could map group-id to devicefd, they get
> -EINVAL calling HOT_RESET with a devicefd when the ioctl is called from
> a group-opened device.  Thanks,

Would it be better to let userspace know that hot reset will fail for lack
of proof-of-ownership, because the user also has cdev devices? Maybe
the RESETTABLE flag should always be meaningful, even if the device calling
_INFO is a group-opened device. Old user applications do not
need to check it, as they will never have such a mixed environment. But
new applications, or applications that have been updated per the latest
vfio uapi, should strictly check this flag before going ahead with
hot-reset.

Regards,
Yi Liu

Re: [Intel-gfx] [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO

2023-04-18 Thread Liu, Yi L

> From: Jason Gunthorpe 
> Sent: Monday, April 17, 2023 9:39 PM
> 
> On Fri, Apr 14, 2023 at 09:11:30AM +0000, Tian, Kevin wrote:
> 
> > The only corner case with this option is when a user mixes group
> > and cdev usages. iirc you mentioned it's a valid usage to be supported.
> > In that case the kernel doesn't have sufficient knowledge to judge
> > 'resettable' as it doesn't know which groups are opened by this user.
> 
> IMHO we don't need to support this combination.

Do you mean we don't support hot-reset for this combination, or we don't
support users using this combination at all? I guess the former, right?

> 
> We can say that to use the hot reset API the user must put all their
> devices into the same iommufd_ctx and cover 100% of the known use
> cases for this.
> 
> There are already other situations, like nesting, that do force users
> to put everything into one iommufd_ctx.
> 
> No reason to make things harder and more complicated.

Ditto. We just fail hot-reset for the multiple-iommufd case, right?
Otherwise, we need to prevent users from using multiple iommufds.

> I'm coming to the feeling that we should put no-iommu devices in
> iommufd_ctx's as well. They would be an iommufd_access like
> mdevs. That would clean up the complications they cause here.

Ok, the lucky thing is that you have merged the patch series that creates
an iommufd_access for emulated devices in bind. So the cdev series needs
to handle the noiommu case by creating an iommufd_access.

> 
> I suppose we should have done that from the beginning - no-iommu is an
> IOMMUFD access, it just uses a crazy /proc based way to learn the
> PFNs. Making it a proper access and making a real VFIO ioctl that
> calls iommufd_access_pin_pages() and returns the DMA mapped addresses
> to userspace would go a long way to making no-iommu work in a logical,
> usable, way.

This seems to be an improvement for noiommu mode. It can be done later.
For now, generating an access_id and binding noiommu devices to an iommufd_ctx
is enough to support noiommu hot-reset.

Regards,
Yi Liu

Re: [Intel-gfx] [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO

2023-04-16 Thread Liu, Yi L

> From: Alex Williamson 
> Sent: Saturday, April 15, 2023 1:11 AM
> 
> On Fri, 14 Apr 2023 11:38:24 +0000
> "Liu, Yi L"  wrote:
> 
> > > From: Tian, Kevin 
> > > Sent: Friday, April 14, 2023 5:12 PM
> > >
> > > > From: Alex Williamson 
> > > > Sent: Friday, April 14, 2023 2:07 AM
> > > >
> > > > We had already iterated a proposal where the group-id is replaced with
> > > > a dev-id in the existing ioctl and a flag indicates when the return
> > > > value is a dev-id vs group-id.  This had a gap that userspace cannot
> > > > determine if a reset is available given this information since un-owned
> > > > devices report an invalid dev-id and userspace can't know if it has
> > > > implicit ownership.
> > >
> > > >
> > > > > It seems cleaner to me though that we could still re-use INFO in
> > > > a similar way, simply defining a new flag bit which is valid only in
> > > > the case of returning dev-ids and indicates if the reset is available.
> > > > Therefore in one ioctl, userspace knows if hot-reset is available
> > > > (based on a kernel determination) and can pull valid dev-ids from the
> >
> > Need to confirm the meaning of hot-reset available flag. I think it
> > should at least meet below two conditions to set this flag. Although
> > it may not mean hot-reset is for sure to succeed. (but should be
> > a high chance).
> >
> > 1) dev_set is resettable (all affected device are in dev_set)
> > 2) affected device are owned by the current user
> 
> Per thread with Kevin, ownership can't always be known by the kernel.
> Beyond the group vs cdev discussion there, isn't it also possible
> (though perhaps not recommended) that a user can have multiple iommufd
> ctxs?  So I think 2) becomes "ownership of the affected dev-set can be
> inferred from the iommufd_ctx of the calling device", iow, the
> null-array calling model is available and the flag is redefined to
> match.  Reset may still be available via the proof-of-ownership model.

Yes, if there are multiple iommufd ctxs, this falls back to
the proof-of-ownership model.
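
As a sketch of how userspace might pick between the two models from the _INFO
header alone: the flag bits, struct, and names below are hypothetical
placeholders, not the uapi. The only assumption taken from this thread is that
the reset-available flag is valid only when dev-ids are being returned:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits; the real uapi bits are not settled here. */
#define SKETCH_INFO_FLAG_DEV_ID     (1u << 0) /* array holds dev-ids */
#define SKETCH_INFO_FLAG_RESETTABLE (1u << 1) /* null-array reset available */

/* Cut-down stand-in for the header returned by the INFO ioctl. */
struct sketch_info_hdr {
	uint32_t flags;
};

enum reset_model {
	RESET_NULL_ARRAY,         /* zero-length fd array, iommufd_ctx ownership */
	RESET_PROOF_OF_OWNERSHIP, /* pass fds proving ownership of the dev-set */
};

/* Decide which hot-reset calling model to use, from the header only. */
static enum reset_model pick_reset_model(const struct sketch_info_hdr *hdr)
{
	uint32_t need = SKETCH_INFO_FLAG_DEV_ID | SKETCH_INFO_FLAG_RESETTABLE;

	assert(hdr != NULL);
	/* The resettable bit only has meaning when dev-ids are reported. */
	if ((hdr->flags & need) == need)
		return RESET_NULL_ARRAY;
	return RESET_PROOF_OF_OWNERSHIP;
}
```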

> 
> > Also, we need to assume that the two cases below are rare;
> > if a user encounters them, it is just bad luck. I think the existing
> > _INFO and hot-reset already make this assumption, so cdev mode
> > can adopt it as well.
> >
> > a) physical topology change (e.g. new devices plugged to affected slot)
> > b) an affected device is unbound from vfio
> 
> Yes, these are sufficiently rare that we can't do much about them.
> 
> > > So the kernel needs to compare the group id between devices with
> > > valid dev-ids and devices with invalid dev-ids to decide the implicit
> > > ownership. For noiommu device which has no group_id when
> > > VFIO_GROUP is off then it's resettable only if having a valid dev_id.
> >
> > In cdev mode, noiommu device doesn't have dev_id as it is not
> > bound to valid iommufd. So if VFIO_GROUP is off, we may never
> > allow hot-reset for noiommu devices. But we don't want to have
> > regression with noiommu devices. Perhaps we may define the usage
> > of the resettable flag like this:
> > 1) if it is set, user does not need to own all the affected devices as
> > some of them may have been owned implicitly. Kernel should have
> > checked it.
> > 2) if the flag is not set, that means user needs to check ownership
> > by itself. It needs to own all the affected devices. If not, don't
> >    do hot-reset.
> 
> Exactly, the flag essentially indicates that the null-array approach is
> available, lack of the flag indicates proof-of-ownership is required.
> 
> > This way we can still make noiommu devices support hot-reset
> > just like VFIO_GROUP is on. Because noiommu devices have fake
> > groups, such groups are all singleton. So checking all affected
> > devices are opened by user is just same as check all affected
> > groups.
> 
> Yep.
> 
> > > The only corner case with this option is when a user mixes group
> > > and cdev usages. iirc you mentioned it's a valid usage to be supported.
> > > In that case the kernel doesn't have sufficient knowledge to judge
> > > 'resettable' as it doesn't know which groups are opened by this user.
> > >
> > > Not sure whether we can leave it in a ugly way so INFO may not tell
> > > 'resettable' accurately in that weird scenario.
> >
> > This seems not easy to support. If above scenario is allowed the

Re: [Intel-gfx] [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO

2023-04-14 Thread Liu, Yi L

> From: Tian, Kevin 
> Sent: Friday, April 14, 2023 5:12 PM
> 
> > From: Alex Williamson 
> > Sent: Friday, April 14, 2023 2:07 AM
> >
> > We had already iterated a proposal where the group-id is replaced with
> > a dev-id in the existing ioctl and a flag indicates when the return
> > value is a dev-id vs group-id.  This had a gap that userspace cannot
> > determine if a reset is available given this information since un-owned
> > devices report an invalid dev-id and userspace can't know if it has
> > implicit ownership.
>
> >
> > It seems cleaner to me though that we could still re-use INFO in
> > a similar way, simply defining a new flag bit which is valid only in
> > the case of returning dev-ids and indicates if the reset is available.
> > Therefore in one ioctl, userspace knows if hot-reset is available
> > (based on a kernel determination) and can pull valid dev-ids from the

We need to confirm the meaning of the hot-reset available flag. I think at
least the two conditions below should be met to set this flag, although
that may not mean hot-reset is sure to succeed (but the chance should be
high).

1) dev_set is resettable (all affected devices are in dev_set)
2) affected devices are owned by the current user

Also, we need to assume that the two cases below are rare;
if a user encounters them, it is just bad luck. I think the existing
_INFO and hot-reset already make this assumption, so cdev mode
can adopt it as well.

a) physical topology change (e.g. new devices plugged to affected slot)
b) an affected device is unbound from vfio

> So the kernel needs to compare the group id between devices with
> valid dev-ids and devices with invalid dev-ids to decide the implicit
> ownership. For noiommu device which has no group_id when
> VFIO_GROUP is off then it's resettable only if having a valid dev_id.

In cdev mode, a noiommu device doesn't have a dev_id as it is not
bound to a valid iommufd. So if VFIO_GROUP is off, we may never
allow hot-reset for noiommu devices. But we don't want to have a
regression with noiommu devices. Perhaps we may define the usage
of the resettable flag like this:
1) if it is set, the user does not need to own all the affected devices,
   as some of them may be owned implicitly. The kernel should have
   checked it.
2) if the flag is not set, the user needs to check ownership
   by itself. It needs to own all the affected devices. If not, don't
   do hot-reset.

This way noiommu devices can still support hot-reset
just as when VFIO_GROUP is on. Noiommu devices have fake
groups, and such groups are all singletons, so checking that all affected
devices are opened by the user is the same as checking all affected
groups.

> The only corner case with this option is when a user mixes group
> and cdev usages. iirc you mentioned it's a valid usage to be supported.
> In that case the kernel doesn't have sufficient knowledge to judge
> 'resettable' as it doesn't know which groups are opened by this user.
>
> Not sure whether we can leave it in a ugly way so INFO may not tell
> 'resettable' accurately in that weird scenario.

This does not seem easy to support. If the above scenario is allowed, there
can be three cases that return an invalid dev_id:
1) devices not opened by the user but owned implicitly
2) devices not owned by the user
3) devices opened via group but owned by the user

The user would require more info to tell the above cases apart.

> > array to associate affected, owned devices, and still has the
> > equivalent information to know that one or more of the devices listed
> > with an invalid dev-id are preventing the hot-reset from being
> > available.
> >
> > Is that an option?  Thanks,
> >
> 
> This works for me if above corner case can be waived.

One side check, perhaps already confirmed in a prior email. @Alex, the
reason for predicting hot-reset availability is to avoid a possible
vfio_pci_pre_reset(), which does heavy operations like stopping DMA and
copying config space, right? Is there any other special reason? Anyhow, this
reason is enough for the prediction per my understanding.

Regards,
Yi Liu

Re: [Intel-gfx] [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO

2023-04-13 Thread Liu, Yi L

> From: Jason Gunthorpe 
> Sent: Thursday, April 13, 2023 7:51 PM
> 
> On Thu, Apr 13, 2023 at 08:25:52AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe 
> > > Sent: Thursday, April 13, 2023 4:07 AM
> > >
> > >
> > > > in which case we need c) a way to
> > > > report the overall set of affected devices regardless of ownership in
> > > > support of 4), BDF?
> > >
> > > Yes, continue to use INFO unmodified.
> > >
> > > > Are we back to replacing group-ids with dev-ids in the INFO structure,
> > > > where an invalid dev-id either indicates an affected device with
> > > > implied ownership (ok) or a gap in ownership (bad) and a flag somewhere
> > > > is meant to indicate the overall disposition based on the availability
> > > > of reset?
> > >
> > > As you explore in the following this gets ugly. I prefer to keep INFO
> > > unchanged and add INFO2.
> > >
> >
> > INFO needs a change when VFIO_GROUP is disabled. Now it assumes
> > a valid iommu group always exists:
> >
> > vfio_pci_fill_devs()
> > {
> > 	...
> > 	iommu_group = iommu_group_get(&pdev->dev);
> > 	if (!iommu_group)
> > 		return -EPERM; /* Cannot reset non-isolated devices */
> > 	...
> > }
> 
> This can still work in a ugly way. With a INFO2 the only purpose of
> INFO would be debugging, so if someone uses no-iommu, with hotreset
> and misconfigures it then the only downside is they don't get the
> debugging print. But we know of nothing that uses this combination
> anyhow..

Today, at least QEMU will not attempt a hot-reset if _INFO fails. I think
this check may need to be relaxed if we want _INFO to work when there is
no VFIO_GROUP (and hence no fake iommu_group).
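
For reference, the userspace side of this can be sketched as below. This is not QEMU's code, only a minimal illustration of the usual two-step use of VFIO_DEVICE_GET_PCI_HOT_RESET_INFO from <linux/vfio.h>: a header-only call is expected to fail with ENOSPC while still reporting the device count. The helper names hot_reset_affected_count() and can_attempt_hot_reset() are made up for this example.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/*
 * Ask the kernel how many devices a hot reset would affect.
 * Returns the count, or a negative errno if _INFO is refused.
 */
static int hot_reset_affected_count(int device_fd)
{
	struct vfio_pci_hot_reset_info hdr = { .argsz = sizeof(hdr) };

	if (ioctl(device_fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, &hdr) == 0)
		return (int)hdr.count;

	/* ENOSPC only means the buffer was too small; count is still set. */
	if (errno != ENOSPC)
		return -errno;

	return (int)hdr.count;
}

/* A caller that, like the behaviour described above, gives up if _INFO fails. */
static int can_attempt_hot_reset(int device_fd)
{
	return hot_reset_affected_count(device_fd) >= 0;
}
```

With the -EPERM return in vfio_pci_fill_devs(), a caller written this way never reaches the reset ioctl, so the failure of _INFO alone is enough to disable hot-reset for the user.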

Regards,
Yi Liu

Re: [Intel-gfx] [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO

2023-04-12 Thread Liu, Yi L

> From: Jason Gunthorpe 
> Sent: Wednesday, April 12, 2023 8:01 AM
> 
> On Tue, Apr 11, 2023 at 03:58:27PM -0600, Alex Williamson wrote:
> 
> > > Management tools already need to understand dev_set if they want to
> > > offer reliable reset support to the VMs. Same as today.
> >
> > I don't think that's true. Our primary hot-reset use case is GPUs and
> > subordinate functions, where the isolation and reset scope are often
> > sufficiently similar to make hot-reset possible, regardless whether
> > all the functions are assigned to a VM.  I don't think you'll find any
> > management tools that takes reset scope into account otherwise.
> 
> When I think of "reliable reset support" I think of the management
> tool offering a checkbox that says "ensure PCI function reset
> availability" and if checked it will not launch the VM without a
> working reset.
> 
> If the user configures a set of VFIO devices and then hopes they get
> working reset, that is fine, but doesn't require any reporting of
> reset groups, or iommu groups to the management layer to work.
> 
> > > > As I understand the proposal, QEMU now gets to attempt to
> > > > claim ownership of the dev_set, so it opportunistically extends its
> > > > ownership and may block other users from the affected devices.
> > >
> > > We can decide the policy for the kernel to accept a claim. I suggested
> > > below "same as today" - it must hold all the groups within the
> > > iommufd_ctx.
> >
> > It must hold all the groups [that the user doesn't know about because
> > it's not a formal part of the cdev API] within the iommufd_ctx?
> 
> You keep going back to this, but I maintain userspace doesn't
> care. qemu is given a list of VFIO devices to use, all it wants to
> know is if it is allowed to use reset or not. Why should it need to
> know groups and group_ids to get that binary signal out of the kernel?
> 
> > > The simplest option for no-iommu is to require it to pass in every
> > > device fd to the reset ioctl.
> >
> > Which ironically is exactly how it ends up working today, each no-iommu
> > device has a fake IOMMU group, so every affected device (group) needs
> > to be provided.
> 
> Sure, that is probably the way forward for no-iommu. Not that anyone
> uses it..
> 
> The kicker is we don't force the user to generate a de-duplicated list
> of devices FDs, one per group, just because.
> 
> > > I want to re-focus on the basics of what cdev is supposed to be doing,
> > > because several of the ideas you suggested seem against this direction:
> > >
> > >  - cdev does not have, and cannot rely on vfio_groups. We enforce this
> > >    by compiling all the vfio_group infrastructure out. iommu_groups
> > >    continue to exist.
> > >
> > >    So converting a cdev to a vfio_group is not an allowed operation.
> >
> > My only statements in this respect were towards the notion that IOMMU
> > groups continue to exist.  I'm well aware of the desire to deprecate
> > and remove vfio groups.
> 
> Yes
> 
> > >  - no-iommu should not have iommu_groups. We enforce this by compiling
> > >    out all the no-iommu vfio_group infrastructure.
> >
> > This is not logically inferred from the above if IOMMU groups continue
> > to exist and continue to be a basis for describing DMA ownership as
> > well as "reset groups"
> 
> It is not meant to flow out of the above, it is a separate statement. I
> want the iommu_group mechanism to stop being abused outside the iommu
> core code. The only thing that should be creating groups is an
> attached iommu driver operating under ops->device_group().
> 
> VFIO needed this to support mdev and no-iommu. We already have mdev
> free of iommu_groups, I would like no-iommu to also be free of it too,
> we are very close.
> 
> That would leave POWER as the only abuser of the
> iommu_group_add_device() API, and it is only doing it because it
> hasn't got a proper iommu driver implementation yet. It turns out
> their abuse is mislocked and maybe racy to boot :(
> 
> > >  - cdev APIs should ideally not require the user to know the group_id,
> > >    we should try hard to design APIs to avoid this.
> >
> > This is a nuance, group_id vs group, where it's been previously
> > discussed that users will need to continue to know the boundaries of a
> > group for the purpose of DMA isolation and potentially IOAS
> > independence should cdev/iommufd choose to tackle those topics.
> 
> Yes, group_id is a value we have no specific use for and would require
> userspace to keep separate track of. I'd prefer to rely on dev_id as
> much as possible instead.
> 
> > What is the actual proposal here?
> 
> I don't know anymore, you don't seem to like this direction either...
> 
> > You've said that hot-reset works if the iommufd_ctx has
> > representation from each affected group, the INFO ioctl remains as
> > it is, which suggests that it's reporting group ID and BDF, yet only
> > sysfs tells the user the relation between a vfio cdev and a group
> > and we're trying to

Re: [Intel-gfx] [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO

2023-04-11 Thread Liu, Yi L

Hi Alex,

> From: Alex Williamson 
> Sent: Friday, April 7, 2023 8:04 PM
> 
> On Fri, 7 Apr 2023 10:09:58 +0000
> "Liu, Yi L"  wrote:
> 
> > Hi Alex,
> >
> > > From: Alex Williamson 
> > > Sent: Monday, April 3, 2023 11:02 PM
> > >
> > > > On Mon, 3 Apr 2023 09:25:06 +0000
> > > "Liu, Yi L"  wrote:
> > >
> > > > > From: Liu, Yi L 
> > > > > Sent: Saturday, April 1, 2023 10:44 PM
> > > >
> > > > > @@ -791,7 +813,21 @@ static int vfio_pci_fill_devs(struct pci_dev 
> > > > > *pdev, void
> > > *data)
> > > > >   if (!iommu_group)
> > > > >   return -EPERM; /* Cannot reset non-isolated devices */
> > > >
> > > > Hi Alex,
> > > >
> > > > Is disabling iommu a sane way to test vfio noiommu mode?
> > >
> > > Yes
> > >
> > > > > I added intel_iommu=off to disable intel iommu and bind a device to vfio-pci.
> > > > > I can see the /dev/vfio/noiommu-0 and /dev/vfio/devices/noiommu-vfio0. Bind
> > > > > iommufd==-1 can succeed, but failed to get hot reset info due to the above
> > > > > group check. Reason is that this happens to have some affected devices, and
> > > > > these devices have no valid iommu_group (because they are not bound to vfio-pci
> > > > > hence nobody allocates noiommu group for them). So when hot reset info loops
> > > > > such devices, it failed with -EPERM. Is this expected?
> > >
> > > Hmm, I didn't recall that we put in such a limitation, but given the
> > > minimally intrusive approach to no-iommu and the fact that we never
> > > defined an invalid group ID to return to the user, it makes sense that
> > > we just blocked the ioctl for no-iommu use.  I guess we can do the same
> > > for no-iommu cdev.
> >
> > I just realize a further issue related to this limitation. Remember that we
> > may finally compile out the vfio group infrastructure in the future. Say I
> > want to test noiommu, I may boot such a kernel with iommu disabled. I think
> > the _INFO ioctl would fail as there is no iommu_group. Does it mean we will
> > not support hot reset for noiommu in future if vfio group infrastructure is
> > compiled out?
> 
> We're talking about IOMMU groups, IOMMU groups are always present
> regardless of whether we expose a vfio group interface to userspace.
> Remember, we create IOMMU groups even in the no-iommu case.  Even with
> pure cdev, there are underlying IOMMU groups that maintain the DMA
> ownership.

I just realized that there is one case that does not have an iommu group,
although it is not implemented yet. There was a discussion on SIOV support.
IIRC, it was agreed that there is no need to allocate an iommu_group for the
SIOV case. Kevin or Jason can keep me honest here. I failed to find the link
to this discussion.

> > As another thread, we are going to add a new bdf/group capability to
> > DEVICE_GET_INFO. If the above kernel is booted, shall we exclude the new
> > bdf/group capability or add a flag in the capability to mark the group_id
> > is invalid?
> 
> As above, there's always an IOMMU group, it's never invalid.  Thanks,

Regards,
Yi Liu

Re: [Intel-gfx] [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO

2023-04-10 Thread Liu, Yi L

> From: Alex Williamson 
> Sent: Monday, April 10, 2023 10:41 PM
> 
> On Mon, 10 Apr 2023 08:48:54 +0000
> "Liu, Yi L"  wrote:
> 
> > > From: Alex Williamson 
> > > Sent: Sunday, April 9, 2023 9:30 PM
> > [...]
> > > > yeah, needs to move the iommu group creation back to vfio_main.c. This
> > > > would be a prerequisite for [1]
> > > >
> > > > [1] https://lore.kernel.org/kvm/20230401151833.124749-25-yi.l@intel.com/
> > > >
> > > > I'll also try out your suggestion to add a capability like below and link
> > > > it in the vfio_device_info cap chain.
> > > >
> > > > #define VFIO_DEVICE_INFO_CAP_PCI_BDF  5
> > > >
> > > > struct vfio_device_info_cap_pci_bdf {
> > > >  struct vfio_info_cap_header header;
> > > >  __u32   group_id;
> > > >  __u16   segment;
> > > >  __u8    bus;
> > > >  __u8    devfn; /* Use PCI_SLOT/PCI_FUNC */
> > > > };
> > > >
> > >
> > > Group-id and bdf should be separate capabilities, all device should
> > > report a group-id capability and only PCI devices a bdf capability.
> >
> > ok. Since this is to support the device fd passing usage, so we need to
> > let all the vfio device drivers report group-id capability. is it? So may
> > have a below helper in vfio_main.c. How about the sample drivers?
> > seems not necessary for them. right?
> 
> The more common we can make it, the better, but if it ends up that the
> individual drivers need to initialize the capability then it would
> probably be limited to those driver with a need to expose the group.

It looks to be such a case. vfio_device_info is assembled by the individual
drivers. If we want to report the group_id capability as a common behavior,
all of them need to change. I have a quick draft for it in the commit below:

https://github.com/yiliu1765/iommufd/commit/ff4b8bee90761961041126305183a9a7e0f0542d

https://github.com/yiliu1765/iommufd/commits/report_group_id

> Sample drivers for the purpose of illustrating the interface and of
> course anything based on vfio-pci-core which exposes hot-reset.  Thanks

Do you see any sample drivers that need to report the group_id cap? IMHO, it
seems none do.

Regards,
Yi Liu

Re: [Intel-gfx] [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO

2023-04-10 Thread Liu, Yi L

> From: Alex Williamson 
> Sent: Sunday, April 9, 2023 9:30 PM
[...]
> > yeah, needs to move the iommu group creation back to vfio_main.c. This
> > would be a prerequisite for [1]
> >
> > [1] https://lore.kernel.org/kvm/20230401151833.124749-25-yi.l@intel.com/
> >
> > I'll also try out your suggestion to add a capability like below and link
> > it in the vfio_device_info cap chain.
> >
> > #define VFIO_DEVICE_INFO_CAP_PCI_BDF  5
> >
> > struct vfio_device_info_cap_pci_bdf {
> >  struct vfio_info_cap_header header;
> >  __u32   group_id;
> >  __u16   segment;
> > >  __u8    bus;
> > >  __u8    devfn; /* Use PCI_SLOT/PCI_FUNC */
> > };
> >
> 
> Group-id and bdf should be separate capabilities, all device should
> report a group-id capability and only PCI devices a bdf capability.

OK. Since this is to support the device fd passing usage, we need to
let all the vfio device drivers report the group-id capability, is that
right? So we may have a helper like the one below in vfio_main.c. How about
the sample drivers? It seems not necessary for them, right?

int vfio_pci_info_add_group_cap(struct device *dev,
				struct vfio_info_cap *caps)
{
	struct vfio_pci_device_info_cap_group cap = {
		.header.id = VFIO_DEVICE_INFO_CAP_GROUP_ID,
		.header.version = 1,
	};
	struct iommu_group *iommu_group;

	/* Takes a reference on the group; dropped below. */
	iommu_group = iommu_group_get(dev);
	if (!iommu_group) {
		kfree(caps->buf);
		return -EPERM;
	}

	cap.group_id = iommu_group_id(iommu_group);

	iommu_group_put(iommu_group);

	return vfio_info_add_capability(caps, &cap.header, sizeof(cap));
}

Regards,
Yi Liu

Re: [Intel-gfx] [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO

2023-04-07 Thread Liu, Yi L

> From: Alex Williamson 
> Sent: Saturday, April 8, 2023 5:07 AM
> 
> On Fri, 7 Apr 2023 15:47:10 +0000
> "Liu, Yi L"  wrote:
> 
> > > From: Alex Williamson 
> > > Sent: Friday, April 7, 2023 11:14 PM
> > >
> > > On Fri, 7 Apr 2023 14:04:02 +0000
> > > "Liu, Yi L"  wrote:
> > >
> > > > > From: Alex Williamson 
> > > > > Sent: Friday, April 7, 2023 9:52 PM
> > > > >
> > > > > On Fri, 7 Apr 2023 13:24:25 +0000
> > > > > "Liu, Yi L"  wrote:
> > > > >
> > > > > > > From: Alex Williamson 
> > > > > > > Sent: Friday, April 7, 2023 8:04 PM
> > > > > > >
> > > > > > > > > > > @@ -791,7 +813,21 @@ static int vfio_pci_fill_devs(struct 
> > > > > > > > > > > pci_dev
> > > *pdev,
> > > > > void
> > > > > > > > > *data)
> > > > > > > > > > >   if (!iommu_group)
> > > > > > > > > > >   return -EPERM; /* Cannot reset non-isolated 
> > > > > > > > > > > devices
> */
> > > > > >
> > > > > > [1]
> > > > > >
> > > > > > > > > >
> > > > > > > > > > Hi Alex,
> > > > > > > > > >
> > > > > > > > > > Is disabling iommu a sane way to test vfio noiommu mode?
> > > > > > > > >
> > > > > > > > > Yes
> > > > > > > > >
> > > > > > > > > > > I added intel_iommu=off to disable intel iommu and bind a device to vfio-pci.
> > > > > > > > > > > I can see the /dev/vfio/noiommu-0 and /dev/vfio/devices/noiommu-vfio0. Bind
> > > > > > > > > > > iommufd==-1 can succeed, but failed to get hot reset info due to the above
> > > > > > > > > > > group check. Reason is that this happens to have some affected devices, and
> > > > > > > > > > > these devices have no valid iommu_group (because they are not bound to vfio-pci
> > > > > > > > > > > hence nobody allocates noiommu group for them). So when hot reset info loops
> > > > > > > > > > > such devices, it failed with -EPERM. Is this expected?
> > > > > > > > >
> > > > > > > > > > Hmm, I didn't recall that we put in such a limitation, but given the
> > > > > > > > > > minimally intrusive approach to no-iommu and the fact that we never
> > > > > > > > > > defined an invalid group ID to return to the user, it makes sense that
> > > > > > > > > > we just blocked the ioctl for no-iommu use.  I guess we can do the same
> > > > > > > > > > for no-iommu cdev.
> > > > > > > >
> > > > > > > > > I just realize a further issue related to this limitation. Remember that we
> > > > > > > > > may finally compile out the vfio group infrastructure in the future. Say I
> > > > > > > > > want to test noiommu, I may boot such a kernel with iommu disabled. I think
> > > > > > > > > the _INFO ioctl would fail as there is no iommu_group. Does it mean we will
> > > > > > > > > not support hot reset for noiommu in future if vfio group infrastructure is
> > > > > > > > > compiled out?
> > > > > > >
> > > > > > > We're talking about IOMMU groups, IOMMU groups are always present
> > > &g

Re: [Intel-gfx] [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO

2023-04-07 Thread Liu, Yi L

> From: Alex Williamson 
> Sent: Friday, April 7, 2023 11:14 PM
> 
> On Fri, 7 Apr 2023 14:04:02 +0000
> "Liu, Yi L"  wrote:
> 
> > > From: Alex Williamson 
> > > Sent: Friday, April 7, 2023 9:52 PM
> > >
> > > > On Fri, 7 Apr 2023 13:24:25 +0000
> > > "Liu, Yi L"  wrote:
> > >
> > > > > From: Alex Williamson 
> > > > > Sent: Friday, April 7, 2023 8:04 PM
> > > > >
> > > > > > > > > @@ -791,7 +813,21 @@ static int vfio_pci_fill_devs(struct 
> > > > > > > > > pci_dev
> *pdev,
> > > void
> > > > > > > *data)
> > > > > > > > >   if (!iommu_group)
> > > > > > > > >   return -EPERM; /* Cannot reset non-isolated 
> > > > > > > > > devices */
> > > >
> > > > [1]
> > > >
> > > > > > > >
> > > > > > > > Hi Alex,
> > > > > > > >
> > > > > > > > Is disabling iommu a sane way to test vfio noiommu mode?
> > > > > > >
> > > > > > > Yes
> > > > > > >
> > > > > > > > > I added intel_iommu=off to disable intel iommu and bind a device to vfio-pci.
> > > > > > > > > I can see the /dev/vfio/noiommu-0 and /dev/vfio/devices/noiommu-vfio0. Bind
> > > > > > > > > iommufd==-1 can succeed, but failed to get hot reset info due to the above
> > > > > > > > > group check. Reason is that this happens to have some affected devices, and
> > > > > > > > > these devices have no valid iommu_group (because they are not bound to vfio-pci
> > > > > > > > > hence nobody allocates noiommu group for them). So when hot reset info loops
> > > > > > > > > such devices, it failed with -EPERM. Is this expected?
> > > > > > >
> > > > > > > > Hmm, I didn't recall that we put in such a limitation, but given the
> > > > > > > > minimally intrusive approach to no-iommu and the fact that we never
> > > > > > > > defined an invalid group ID to return to the user, it makes sense that
> > > > > > > > we just blocked the ioctl for no-iommu use.  I guess we can do the same
> > > > > > > > for no-iommu cdev.
> > > > > >
> > > > > > > I just realize a further issue related to this limitation. Remember that we
> > > > > > > may finally compile out the vfio group infrastructure in the future. Say I
> > > > > > > want to test noiommu, I may boot such a kernel with iommu disabled. I think
> > > > > > > the _INFO ioctl would fail as there is no iommu_group. Does it mean we will
> > > > > > > not support hot reset for noiommu in future if vfio group infrastructure is
> > > > > > > compiled out?
> > > > >
> > > > > We're talking about IOMMU groups, IOMMU groups are always present
> > > > > regardless of whether we expose a vfio group interface to userspace.
> > > > > Remember, we create IOMMU groups even in the no-iommu case.  Even with
> > > > > pure cdev, there are underlying IOMMU groups that maintain the DMA
> > > > > ownership.
> > > >
> > > > > hmmm. As [1], when iommu is disabled, there will be no iommu_group for a
> > > > > given device unless it is registered to VFIO, which a fake group is created.
> > > > > That's why I hit the limitation [1]. When vfio_group is compiled out, then
> > > > > even fake group goes away.
> > >
> > > In the vfio group case, [1] can be hit with no-iommu only when there
> > > are affected devices which are not bound to vfio.
> >
> > yes. because vfio would allocate fake group when device is registered to
> > it.
> >
> > > Why are we not
> > > allocating an IOMMU group to no-iommu devices when vfio group is
> > > disabled?  Thanks,
> >
> > hmmm. when the vfio group code is configured out. The
> > vfio_device_set_group() just returns 0 after below patch is
> > applied and CONFIG_VFIO_GROUP=n. So when there is no
> > vfio group, the fake group also goes away.
> >
> > https://lore.kernel.org/kvm/20230401151833.124749-25-yi.l@intel.com/
> 
> Is this a fundamental issue or just a problem with the current
> implementation proposal?  It seems like the latter.  FWIW, I also don't
> see a taint happening in the cdev path for no-iommu use.  Thanks,

Yes, the latter case. The reason I raised it here is to confirm the
policy on the new group/bdf capability in DEVICE_GET_INFO. If
there is no iommu group, perhaps I only need to exclude the new
group/bdf capability from the cap chain of DEVICE_GET_INFO. Is that right?

Regards,
Yi Liu

Re: [Intel-gfx] [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO

2023-04-07 Thread Liu, Yi L

> From: Alex Williamson 
> Sent: Friday, April 7, 2023 9:52 PM
> 
> On Fri, 7 Apr 2023 13:24:25 +0000
> "Liu, Yi L"  wrote:
> 
> > > From: Alex Williamson 
> > > Sent: Friday, April 7, 2023 8:04 PM
> > >
> > > > > > > @@ -791,7 +813,21 @@ static int vfio_pci_fill_devs(struct pci_dev 
> > > > > > > *pdev,
> void
> > > > > *data)
> > > > > > >   if (!iommu_group)
> > > > > > >   return -EPERM; /* Cannot reset non-isolated devices */
> >
> > [1]
> >
> > > > > >
> > > > > > Hi Alex,
> > > > > >
> > > > > > Is disabling iommu a sane way to test vfio noiommu mode?
> > > > >
> > > > > Yes
> > > > >
> > > > > > > I added intel_iommu=off to disable intel iommu and bind a device to vfio-pci.
> > > > > > > I can see the /dev/vfio/noiommu-0 and /dev/vfio/devices/noiommu-vfio0. Bind
> > > > > > > iommufd==-1 can succeed, but failed to get hot reset info due to the above
> > > > > > > group check. Reason is that this happens to have some affected devices, and
> > > > > > > these devices have no valid iommu_group (because they are not bound to vfio-pci
> > > > > > > hence nobody allocates noiommu group for them). So when hot reset info loops
> > > > > > > such devices, it failed with -EPERM. Is this expected?
> > > > >
> > > > > Hmm, I didn't recall that we put in such a limitation, but given the
> > > > > minimally intrusive approach to no-iommu and the fact that we never
> > > > > defined an invalid group ID to return to the user, it makes sense that
> > > > > > we just blocked the ioctl for no-iommu use.  I guess we can do the same
> > > > > > for no-iommu cdev.
> > > >
> > > > > I just realize a further issue related to this limitation. Remember that we
> > > > > may finally compile out the vfio group infrastructure in the future. Say I
> > > > > want to test noiommu, I may boot such a kernel with iommu disabled. I think
> > > > > the _INFO ioctl would fail as there is no iommu_group. Does it mean we will
> > > > > not support hot reset for noiommu in future if vfio group infrastructure is
> > > > > compiled out?
> > >
> > > We're talking about IOMMU groups, IOMMU groups are always present
> > > regardless of whether we expose a vfio group interface to userspace.
> > > Remember, we create IOMMU groups even in the no-iommu case.  Even with
> > > pure cdev, there are underlying IOMMU groups that maintain the DMA
> > > ownership.
> >
> > hmmm. As [1], when iommu is disabled, there will be no iommu_group for a
> > given device unless it is registered to VFIO, which a fake group is created.
> > That's why I hit the limitation [1]. When vfio_group is compiled out, then
> > even fake group goes away.
> 
> In the vfio group case, [1] can be hit with no-iommu only when there
> are affected devices which are not bound to vfio.

Yes, because vfio allocates a fake group when a device is registered to
it.

> Why are we not
> allocating an IOMMU group to no-iommu devices when vfio group is
> disabled?  Thanks,

Hmm. When the vfio group code is configured out,
vfio_device_set_group() just returns 0 once the patch below is
applied and CONFIG_VFIO_GROUP=n. So when there is no
vfio group, the fake group also goes away.

https://lore.kernel.org/kvm/20230401151833.124749-25-yi.l@intel.com/
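
The effect of that configuration can be sketched with the usual compile-out pattern below. This is a simplified standalone model, not the patch itself: the structures are minimal stand-ins and the real vfio_device_set_group() takes more arguments than shown here.

```c
#include <stddef.h>

struct vfio_group;

/* Minimal stand-in for the kernel structure, for illustration only. */
struct vfio_device {
	struct vfio_group *group; /* stays NULL when groups are compiled out */
};

#ifdef CONFIG_VFIO_GROUP
/* Real implementation: looks up or creates the (possibly fake) group. */
int vfio_device_set_group(struct vfio_device *device);
#else
/* Group code compiled out: nothing is allocated, not even a fake group. */
static inline int vfio_device_set_group(struct vfio_device *device)
{
	(void)device;
	return 0;
}
#endif
```

Since the stub succeeds without attaching anything, later code that calls iommu_group_get() on a noiommu device finds no group, which is how the -EPERM in [1] gets hit.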

Regards,
Yi Liu

Re: [Intel-gfx] [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO

2023-04-07 Thread Liu, Yi L

> From: Alex Williamson 
> Sent: Friday, April 7, 2023 8:04 PM
> 
> > > > > @@ -791,7 +813,21 @@ static int vfio_pci_fill_devs(struct pci_dev 
> > > > > *pdev, void
> > > *data)
> > > > >   if (!iommu_group)
> > > > >   return -EPERM; /* Cannot reset non-isolated devices */

[1]

> > > >
> > > > Hi Alex,
> > > >
> > > > Is disabling iommu a sane way to test vfio noiommu mode?
> > >
> > > Yes
> > >
> > > > > I added intel_iommu=off to disable intel iommu and bind a device to vfio-pci.
> > > > > I can see the /dev/vfio/noiommu-0 and /dev/vfio/devices/noiommu-vfio0. Bind
> > > > > iommufd==-1 can succeed, but failed to get hot reset info due to the above
> > > > > group check. Reason is that this happens to have some affected devices, and
> > > > > these devices have no valid iommu_group (because they are not bound to vfio-pci
> > > > > hence nobody allocates noiommu group for them). So when hot reset info loops
> > > > > such devices, it failed with -EPERM. Is this expected?
> > >
> > > Hmm, I didn't recall that we put in such a limitation, but given the
> > > minimally intrusive approach to no-iommu and the fact that we never
> > > defined an invalid group ID to return to the user, it makes sense that
> > > we just blocked the ioctl for no-iommu use.  I guess we can do the same
> > > for no-iommu cdev.
> >
> > I just realize a further issue related to this limitation. Remember that we
> > may finally compile out the vfio group infrastructure in the future. Say I
> > want to test noiommu, I may boot such a kernel with iommu disabled. I think
> > the _INFO ioctl would fail as there is no iommu_group. Does it mean we will
> > not support hot reset for noiommu in future if vfio group infrastructure is
> > compiled out?
> 
> We're talking about IOMMU groups, IOMMU groups are always present
> regardless of whether we expose a vfio group interface to userspace.
> Remember, we create IOMMU groups even in the no-iommu case.  Even with
> pure cdev, there are underlying IOMMU groups that maintain the DMA
> ownership.

Hmm. As in [1], when the iommu is disabled, there is no iommu_group for a
given device unless it is registered to VFIO, in which case a fake group is
created. That's why I hit the limitation in [1]. When vfio_group is compiled
out, even the fake group goes away.

>
> > As another thread, we are going to add a new bdf/group capability to
> > DEVICE_GET_INFO. If the above kernel is booted, shall we exclude the new
> > bdf/group capability or add a flag in the capability to mark the group_id
> > is invalid?
> 
> As above, there's always an IOMMU group, it's never invalid.  Thanks,

Regards,
Yi Liu

Re: [Intel-gfx] [PATCH v9 06/25] kvm/vfio: Accept vfio device file from userspace

2023-04-07 Thread Liu, Yi L

Hi Eric,

> From: Eric Auger 
> Sent: Friday, April 7, 2023 4:57 PM
> 
> Hi Yi,
> 
> On 4/7/23 05:42, Liu, Yi L wrote:
> >> From: Alex Williamson 
> >> Sent: Friday, April 7, 2023 2:58 AM
> >>>> You don't say anything about potential restriction, ie. what if the user calls
> >>>> KVM_DEV_VFIO_FILE with device fds while it has been using legacy container/group
> >>>> API?
> >>> legacy container/group path cannot do it as the below enhancement.
> >>> User needs to call KVM_DEV_VFIO_FILE before open devices, so this
> >>> should happen before _GET_DEVICE_FD. So the legacy path can never
> >>> pass device fds in KVM_DEV_VFIO_FILE.
> >>>
> >>>
> >>
> >> https://lore.kernel.org/kvm/20230327102059.333d6976.alex.william...@redhat.com/#t
> >>
> >> Wait, are you suggesting that a comment in the documentation suggesting
> >> a usage policy somehow provides enforcement of that ordering??  That's
> >> not how this works.  Thanks,
> > I don't know if there is a good way to enforce this order in the code. The
> > vfio_device->kvm pointer is optional. If it is NULL, vfio just ignores it.
> > So vfio doesn't have a good way to tell if the order requirement is met or
> > not. Perhaps just trigger NULL pointer dereference when kvm pointer is used
> > in the device drivers like kvmgt if this order is not met.
> >
> > So that's why I come up to document it here. The applications uses kvm
> > should know this and follow this otherwise it may encounter error.
> >
> > Do you have other suggestions for it? This order should be a generic
> > requirement. is it? group path also needs to follow it to make the mdev
> > driver that refers kvm pointer to be workable.
> 
> In the same way as kvm_vfio_file_is_valid() called in kvm_vfio_file_add()
> can't you have a kernel API that checks the fd consistence?

I think we are talking about how to check in the code, rather than just
document here, that the order between KVM_DEV_VFIO_FILE_ADD and the device
open (e.g. invoked by VFIO_GROUP_GET_DEVICE_FD) is met. Am I missing
anything? Maybe I've misunderstood Alex's question. ☹

Regards,
Yi Liu

> Thanks
> 
> Eric
> >
> > Thanks,
> > Yi Liu
> >
> >>>>> -The GROUP_ADD operation above should be invoked prior to accessing the
> >>>>> +The FILE/GROUP_ADD operation above should be invoked prior to accessing the
> >>>>>  device file descriptor via VFIO_GROUP_GET_DEVICE_FD in order to support
> >>>>>  drivers which require a kvm pointer to be set in their .open_device()
> >>>>> -callback.
> >>>>> +callback.  It is the same for device file descriptor via character device
> >>>>> +open which gets device access via VFIO_DEVICE_BIND_IOMMUFD.  For such file
> >>>>> +descriptors, FILE_ADD should be invoked before VFIO_DEVICE_BIND_IOMMUFD
> >>>>> +to support the drivers mentioned in prior sentence as well.
> >>> just as here. This means device fds can only be passed with KVM_DEV_VFIO_FILE
> >>> in the cdev path.
> >>>
> >>> Regards,
> >>> Yi Liu

Re: [Intel-gfx] [PATCH v9 10/25] vfio: Make vfio_device_open() single open for device cdev path

2023-04-07 Thread Liu, Yi L

Hi Eric,

> From: Eric Auger 
> Sent: Friday, April 7, 2023 5:48 PM
> 
> Hi Yi,
> 
> On 4/1/23 17:18, Yi Liu wrote:
> > VFIO group has historically allowed multi-open of the device FD. This
> > was made secure because the "open" was executed via an ioctl to the
> > group FD which is itself only single open.
> >
> > However, no known use of multiple device FDs today. It is kind of a
> > strange thing to do because new device FDs can naturally be created
> > via dup().
> >
> > When we implement the new device uAPI (only used in cdev path) there is
> > no natural way to allow the device itself from being multi-opened in a
> > secure manner. Without the group FD we cannot prove the security context
> > of the opener.
> >
> > Thus, when moving to the new uAPI we block the ability of opening
> > a device multiple times. Given old group path still allows it we store
> > a vfio_group pointer in struct vfio_device_file to differentiate.
> >
> > Reviewed-by: Kevin Tian 
> > Reviewed-by: Jason Gunthorpe 
> > Tested-by: Terrence Xu 
> > Tested-by: Nicolin Chen 
> > Tested-by: Yanting Jiang 
> > Signed-off-by: Yi Liu 
> > ---
> >  drivers/vfio/group.c | 2 ++
> >  drivers/vfio/vfio.h  | 2 ++
> >  drivers/vfio/vfio_main.c | 7 +++
> >  3 files changed, 11 insertions(+)
> >
> > diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
> > index d55ce3ca44b7..1af4b9e012a7 100644
> > --- a/drivers/vfio/group.c
> > +++ b/drivers/vfio/group.c
> > @@ -245,6 +245,8 @@ static struct file *vfio_device_open_file(struct 
> > vfio_device
> *device)
> > goto err_out;
> > }
> >
> > +   df->group = device->group;
> > +
> in previous patches df fields were protected with various locks. I refer
> to vfio_device_group_open() implementation. No need here?

Yes, no lock is needed for the group field. It stays static for the lifecycle of df.

> 
> By the way since the group is set here, wrt [PATCH v9 06/25] kvm/vfio:
> Accept vfio device file from userspace you have a way to determine if a
> device was opened in the legacy way, no?

Yes, with this we can tell whether a device file was opened via the legacy
path or via cdev. But I guess the question in patch 06/25 is whether the
ordering between set_kvm and open_device is required, isn't it? That ordering
requirement exists because the kvm pointer is needed by the open_device()
callback, e.g. in kvmgt. For other vfio users the ordering is not needed, and
KVM_DEV_VFIO_FILE itself is not needed if vfio is not used for device
passthrough.

> > ret = vfio_device_group_open(df);
> > if (ret)
> > goto err_free;
> > diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
> > index b2f20b78a707..f1a448f9d067 100644
> > --- a/drivers/vfio/vfio.h
> > +++ b/drivers/vfio/vfio.h
> > @@ -18,6 +18,8 @@ struct vfio_container;
> >
> >  struct vfio_device_file {
> > struct vfio_device *device;
> > +   struct vfio_group *group;
> > +
> > bool access_granted;
> > spinlock_t kvm_ref_lock; /* protect kvm field */
> > struct kvm *kvm;
> > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> > index 6d5d3c2180c8..c8721d5d05fa 100644
> > --- a/drivers/vfio/vfio_main.c
> > +++ b/drivers/vfio/vfio_main.c
> > @@ -477,6 +477,13 @@ int vfio_device_open(struct vfio_device_file *df)
> >
> > > lockdep_assert_held(&device->dev_set->lock);
> >
> > +   /*
> > +* Only the group path allows the device opened multiple times.
> allows the device to be opened multiple times

got it.

Thanks,
Yi Liu

> > +* The device cdev path doesn't have a secure way for it.
> > +*/
> > +   if (device->open_count != 0 && !df->group)
> > +   return -EINVAL;
> > +
> > device->open_count++;
> > if (device->open_count == 1) {
> > ret = vfio_device_first_open(df);
> Thanks
> 
> Eric
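
The single-open rule discussed in this patch can be illustrated with a small userspace model. This is only a sketch: the `model_*` names are hypothetical stand-ins, not the kernel definitions, and the model covers just the check at the top of vfio_device_open(). It leaves out the dev_set lock and the first-open handling.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel structures (assumption). */
struct model_group {
	int id;
};

struct model_device {
	unsigned int open_count;
};

struct model_device_file {
	struct model_device *device;
	struct model_group *group;	/* non-NULL only on the legacy group path */
};

/*
 * Mirrors the check quoted above: a device file without a group (cdev path)
 * may not open a device that is already open. Group-path files may.
 */
static int model_device_open(struct model_device_file *df)
{
	struct model_device *device = df->device;

	if (device->open_count != 0 && !df->group)
		return -EINVAL;

	device->open_count++;
	return 0;
}
```

In this model a second cdev-style open of the same device is rejected with -EINVAL, while repeated group-path opens keep incrementing open_count.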

Re: [Intel-gfx] [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO

2023-04-07 Thread Liu, Yi L

Hi Alex,

> From: Alex Williamson 
> Sent: Monday, April 3, 2023 11:02 PM
> 
> On Mon, 3 Apr 2023 09:25:06 +0000
> "Liu, Yi L"  wrote:
> 
> > > From: Liu, Yi L 
> > > Sent: Saturday, April 1, 2023 10:44 PM
> >
> > > @@ -791,7 +813,21 @@ static int vfio_pci_fill_devs(struct pci_dev *pdev, 
> > > void
> *data)
> > >   if (!iommu_group)
> > >   return -EPERM; /* Cannot reset non-isolated devices */
> >
> > Hi Alex,
> >
> > Is disabling iommu a sane way to test vfio noiommu mode?
> 
> Yes
> 
> > I added intel_iommu=off to disable intel iommu and bind a device to 
> > vfio-pci.
> > I can see the /dev/vfio/noiommu-0 and /dev/vfio/devices/noiommu-vfio0. Bind
> > iommufd==-1 can succeed, but failed to get hot reset info due to the above
> > group check. Reason is that this happens to have some affected devices, and
> > these devices have no valid iommu_group (because they are not bound to 
> > vfio-pci
> > hence nobody allocates noiommu group for them). So when hot reset info loops
> > such devices, it failed with -EPERM. Is this expected?
> 
> Hmm, I didn't recall that we put in such a limitation, but given the
> minimally intrusive approach to no-iommu and the fact that we never
> defined an invalid group ID to return to the user, it makes sense that
> we just blocked the ioctl for no-iommu use.  I guess we can do the same
> for no-iommu cdev.

I just realized a further issue related to this limitation. Remember that we
may eventually compile out the vfio group infrastructure. Say I want to test
noiommu; I may boot such a kernel with the iommu disabled. I think the _INFO
ioctl would fail as there is no iommu_group. Does it mean we will not support
hot reset for noiommu in the future if the vfio group infrastructure is
compiled out?

As discussed in another thread, we are going to add a new bdf/group capability
to DEVICE_GET_INFO. If the above kernel is booted, shall we exclude the new
bdf/group capability, or add a flag in the capability to mark the group_id
as invalid?

Regards,
Yi Liu
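
The failure mode described above can be sketched as a userspace model. This is an assumption-laden illustration, not the vfio-pci code: `model_pci_dev` and `model_fill_devs` are hypothetical names, and the only behavior modeled is that the info loop aborts with -EPERM as soon as one affected device has no iommu_group.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an affected PCI device (assumption). */
struct model_pci_dev {
	int has_iommu_group;	/* 0: nobody allocated a (no-iommu) group for it */
	int group_id;
};

/*
 * Walks the affected devices and records their group IDs. Returns the number
 * of entries filled, or -EPERM if any device lacks an iommu_group, which is
 * the case hit when an affected device is not bound to vfio-pci in no-iommu
 * mode.
 */
static int model_fill_devs(const struct model_pci_dev *devs, size_t n,
			   int *group_ids)
{
	for (size_t i = 0; i < n; i++) {
		if (!devs[i].has_iommu_group)
			return -EPERM;	/* cannot report non-isolated devices */
		group_ids[i] = devs[i].group_id;
	}
	return (int)n;
}
```

With one bound device and one unbound affected device, the model returns -EPERM for the whole query, matching the observation that the _INFO ioctl fails even though the bind itself succeeded.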

Re: [Intel-gfx] [PATCH v3 12/12] vfio/pci: Report dev_id in VFIO_DEVICE_GET_PCI_HOT_RESET_INFO

2023-04-07 Thread Liu, Yi L

> From: Alex Williamson 
> Sent: Friday, April 7, 2023 1:54 AM
> 
> On Thu, 6 Apr 2023 10:02:10 +0000
> "Liu, Yi L"  wrote:
> 
> > > From: Jason Gunthorpe 
> > > Sent: Thursday, April 6, 2023 7:23 AM
> > >
> > > On Wed, Apr 05, 2023 at 01:49:45PM -0600, Alex Williamson wrote:
> > >
> > > > > > QEMU can make a policy decision today because the kernel provides a
> > > > > > sufficiently reliable interface, ie. based on the set of owned 
> > > > > > groups, a
> > > > > > hot-reset is all but guaranteed to work.
> > > > >
> > > > > And we don't change that with cdev. If qemu wants to make the policy
> > > > > decision it keeps using the exact same _INFO interface to make that
> > > > > decision same it has always made.
> > > > >
> > > > > We weaken the actual reset action to only consider the security side.
> > > > >
> > > > > Applications that want this exclusive reset group policy simply must
> > > > > check it on their own. It is a reasonable API design.
> > > >
> > > > I disagree, as I've argued before, the info ioctl becomes so weak and
> > > > effectively arbitrary from a user perspective at being able to predict
> > > > whether the hot-reset ioctl works that it becomes useless, diminishing
> > > > the entire hot-reset info/execute API.
> > >
> > > reset should be strictly more permissive than INFO. If INFO predicts
> > > reset is permitted then reset should succeed.
> > >
> > > We don't change INFO so it cannot "becomes so weak"  ??
> > >
> > > We don't care about the cases where INFO says it will not succeed but
> > > reset does (temporarily) succeed.
> > >
> > > I don't get what argument you are trying to make or what you think is
> > > diminished..
> > >
> > > Again, userspace calls INFO, if info says yes then reset *always
> > > works*, exactly just like today.
> > >
> > > Userspace will call reset with a 0 length FD list and it uses a
> > > security only check that is strictly more permissive than what
> > > get_info will return. So the new check is simple in the kernel and
> > > always works in the cases we need it to work.
> > >
> > > What is getting things into trouble is insisting that RESET have
> > > additional restrictions beyond the minimum checks required for
> > > security.
> > >
> > > > > > I don't view it as a loophole, it is flexibility to use the API in a
> > > > > way that is different from what qemu wants - eg an app like dpdk may
> > > > > be willing to tolerate a reset group that becomes unavailable after
> > > > > startup. Who knows, why should we force this in the kernel?
> > > >
> > > > Because look at all the problems it's causing to try to introduce these
> > > > loopholes without also introducing subtle bugs.
> > >
> > > > These problems are coming from trying to do this integrated version,
> > > not from my approach!
> > >
> > > AFAICT there was nothing wrong with my original plan of using the
> > > empty fd list for reset. What Yi has here is some mashup of what you
> > > and I both suggested.
> >
> > Hi Alex, Jason,
> >
> > That could be the reason. So let me try to summarize the changes this series
> > makes and their impact as far as I know.
> >
> > 1) only check the ownership of opened devices in the dev_set
> >  in HOT_RESET ioctl.
> >  - Impact: it changes the relationship between _INFO  and HOT_RESET.
> >As " Each group must have IOMMU protection established for the
> >ioctl to succeed." in [1], existing design actually means userspace
> >should own all the affected groups before heading to do HOT_RESET.
> >With the change here, the user does not need to ensure all affected
> >groups are opened and it can do hot-reset successfully as long as the
> >devices in the affected group are just un-opened and can be reset.
> >
> >[1] https://patchwork.kernel.org/project/linux-
> pci/patch/20130814200845.21923.64284.st...@bling.home/
> 
> Where whether a device is opened is subject to change outside of the
> user's control.  This essentially allows the user to perform hot-resets
> of devices outside of their ownership so long as the device is not
> used elsewhere, versus the current
