Re: [PATCH RFC 03/12] iommufd: File descriptor, context, kconfig and makefiles

2022-03-22 Thread Jason Gunthorpe via iommu
On Tue, Mar 22, 2022 at 03:18:51PM +0100, Niklas Schnelle wrote:
> On Fri, 2022-03-18 at 14:27 -0300, Jason Gunthorpe wrote:
> > This is the basic infrastructure of a new miscdevice to hold the iommufd
> > IOCTL API.
> > 
> > It provides:
> >  - A miscdevice to create file descriptors to run the IOCTL interface over
> > 
> >  - A table based ioctl dispatch and centralized extendable pre-validation
> >step
> > 
> >  - An xarray mapping user ID's to kernel objects. The design has multiple
> >inter-related objects held within in a single IOMMUFD fd
> > 
> >  - A simple usage count to build a graph of object relations and protect
> >against hostile userspace racing ioctls
> 
> For me at this point it seems hard to grok what this "graph of object
> relations" is about. Maybe an example would help? I'm assuming this is
> about e.g. the DEVICE -depends-on-> HW_PAGETABLE -depends-on-> IOAS  so
> the arrows in the picture of PATCH 02? 

Yes, it is basically standard reference relations, think
'kref'. Object A referenced B because A has a pointer to B in its
struct.

Therefore B cannot be destroyed before A without creating a dangling
reference.

> Or is it the other way around
> and the IOAS -depends-on-> HW_PAGETABLE because it's about which
> references which? From the rest of the patch I seem to understand that
> this mostly establishes the order of destruction. So is HW_PAGETABLE
> destroyed before the IOAS because a HW_PAGETABLE must never reference
> an invalid/destoryed IOAS or is it the other way arund because the IOAS
> has a reference to the HW_PAGETABLES in it? I'd guess the former but
> I'm a bit confused still.

Yes HW_PAGETABLE is first because it is responsible to remove the
iommu_domain from the IOAS and the IOAS cannot be destroyed with
iommu_domains in it.

> > +/*
> > + * The objects for an acyclic graph through the users refcount. This enum 
> > must
> > + * be sorted by type depth first so that destruction completes lower 
> > objects and
> > + * releases the users refcount before reaching higher objects in the graph.
> > + */
> 
> A bit like my first comment I think this would benefit from an example
> what lower/higher mean in this case. I believe a lower object must only
> reference/depend on higher objects, correct?

Maybe it is too confusing - I was debating using a try and fail
approach instead which achieves the same thing with a little different
complexity. It seems we may need to do this anyhow for nesting..

Would look like this:

/* Destroy the graph from depth first */
while (!xa_empty(&ictx->objects)) {
unsigned int destroyed = 0;
unsigned long index = 0;

xa_for_each (&ictx->objects, index, obj) {
/*
 * Since we are in release elevated users must come from
 * other objects holding the users. We will eventually
 * destroy the object that holds this one and the next
 * pass will progress it.
 */
if (!refcount_dec_if_one(&obj->users))
continue;
destroyed++;
xa_erase(&ictx->objects, index);
iommufd_object_ops[obj->type].destroy(obj);
kfree(obj);
}
/* Bug related to users refcount */
if (WARN_ON(!destroyed))
break;
}
kfree(ictx);

> > +struct iommufd_object *iommufd_get_object(struct iommufd_ctx *ictx, u32 id,
> > + enum iommufd_object_type type)
> > +{
> > +   struct iommufd_object *obj;
> > +
> > +   xa_lock(&ictx->objects);
> > +   obj = xa_load(&ictx->objects, id);
> > +   if (!obj || (type != IOMMUFD_OBJ_ANY && obj->type != type) ||
> > +   !iommufd_lock_obj(obj))
> 
> Looking at the code it seems iommufd_lock_obj() locks against
> destroy_rw_sem and increases the reference count but there is also an
> iommufd_put_object_keep_user() which only unlocks the destroy_rw_sem. I
> think I personally would benefit from a comment/commit message
> explaining the lifecycle.
> 
> There is the below comment in iommufd_object_destroy_user() but I think
> it would be better placed near the destroy_rwsem decleration and to
> also explain the interaction between the destroy_rwsem and the
> reference count.

I do prefer it near the destroy because that is the only place that
actually requires the property it gives. The code outside this layer
shouldn't know about this at all beyond folowing some rules about
iommufd_put_object_keep_user(). Lets add a comment there instead:

/**
 * iommufd_put_object_keep_user() - Release part of the refcount on obj
 * @obj - Object to release
 *
 * Objects have two protections to ensure that userspace has a consistent
 * experience with destruction. Normally objects are lock

Re: [PATCH RFC 03/12] iommufd: File descriptor, context, kconfig and makefiles

2022-03-22 Thread Niklas Schnelle
On Fri, 2022-03-18 at 14:27 -0300, Jason Gunthorpe wrote:
> This is the basic infrastructure of a new miscdevice to hold the iommufd
> IOCTL API.
> 
> It provides:
>  - A miscdevice to create file descriptors to run the IOCTL interface over
> 
>  - A table based ioctl dispatch and centralized extendable pre-validation
>step
> 
>  - An xarray mapping user ID's to kernel objects. The design has multiple
>inter-related objects held within in a single IOMMUFD fd
> 
>  - A simple usage count to build a graph of object relations and protect
>against hostile userspace racing ioctls

For me at this point it seems hard to grok what this "graph of object
relations" is about. Maybe an example would help? I'm assuming this is
about e.g. the DEVICE -depends-on-> HW_PAGETABLE -depends-on-> IOAS  so
the arrows in the picture of PATCH 02? Or is it the other way around
and the IOAS -depends-on-> HW_PAGETABLE because it's about which
references which? From the rest of the patch I seem to understand that
this mostly establishes the order of destruction. So is HW_PAGETABLE
destroyed before the IOAS because a HW_PAGETABLE must never reference
an invalid/destoryed IOAS or is it the other way arund because the IOAS
has a reference to the HW_PAGETABLES in it? I'd guess the former but
I'm a bit confused still.

> 
> The only IOCTL provided in this patch is the generic 'destroy any object
> by handle' operation.
> 
> Signed-off-by: Yi Liu 
> Signed-off-by: Jason Gunthorpe 
> ---
>  .../userspace-api/ioctl/ioctl-number.rst  |   1 +
>  MAINTAINERS   |  10 +
>  drivers/iommu/Kconfig |   1 +
>  drivers/iommu/Makefile|   2 +-
>  drivers/iommu/iommufd/Kconfig |  13 +
>  drivers/iommu/iommufd/Makefile|   5 +
>  drivers/iommu/iommufd/iommufd_private.h   |  95 ++
>  drivers/iommu/iommufd/main.c  | 305 ++
>  include/uapi/linux/iommufd.h  |  55 
>  9 files changed, 486 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/iommu/iommufd/Kconfig
>  create mode 100644 drivers/iommu/iommufd/Makefile
>  create mode 100644 drivers/iommu/iommufd/iommufd_private.h
>  create mode 100644 drivers/iommu/iommufd/main.c
>  create mode 100644 include/uapi/linux/iommufd.h
> 
> diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst 
> b/Documentation/userspace-api/ioctl/ioctl-number.rst
> index e6fce2cbd99ed4..4a041dfc61fe95 100644
> --- a/Documentation/userspace-api/ioctl/ioctl-number.rst
> +++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
> @@ -105,6 +105,7 @@ Code  Seq#Include File
>Comments
>  '8'   allSNP8023 
> advanced NIC card
>   
> 
>  ';'   64-7F  linux/vfio.h
> +';'   80-FF  linux/iommufd.h
>  '='   00-3f  uapi/linux/ptp_clock.h  
> 
>  '@'   00-0F  linux/radeonfb.h
> conflict!
>  '@'   00-0F  drivers/video/aty/aty128fb.c
> conflict!
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 1ba1e4af2cbc80..23a9c631051ee8 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -10038,6 +10038,16 @@ L:   linux-m...@vger.kernel.org
>  S:   Maintained
>  F:   drivers/net/ethernet/sgi/ioc3-eth.c
>  
> +IOMMU FD
> +M:   Jason Gunthorpe 
> +M:   Kevin Tian 
> +L:   iommu@lists.linux-foundation.org
> +S:   Maintained
> +F:   Documentation/userspace-api/iommufd.rst
> +F:   drivers/iommu/iommufd/
> +F:   include/uapi/linux/iommufd.h
> +F:   include/linux/iommufd.h
> +
>  IOMAP FILESYSTEM LIBRARY
>  M:   Christoph Hellwig 
>  M:   Darrick J. Wong 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index 3eb68fa1b8cc02..754d2a9ff64623 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -177,6 +177,7 @@ config MSM_IOMMU
>  
>  source "drivers/iommu/amd/Kconfig"
>  source "drivers/iommu/intel/Kconfig"
> +source "drivers/iommu/iommufd/Kconfig"
>  
>  config IRQ_REMAP
>   bool "Support for Interrupt Remapping"
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index bc7f730edbb0be..6b38d12692b213 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -1,5 +1,5 @@
>  # SPDX-License-Identifier: GPL-2.0
> -obj-y += amd/ intel/ arm/
> +obj-y += amd/ intel/ arm/ iommufd/
>  obj-$(CONFIG_IOMMU_API) += iommu.o
>  obj-$(CONFIG_IOMMU_API) += iommu-traces.o
>  obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
> diff --git a/drivers/iommu/iommufd/Kconfig b/drivers/iommu/iommufd/Kconfig
> new file mode 100644
> index 00..fddd453bb0e764
> --- /dev/null
> +++ b/drivers/iommu/iommufd/Kconfig
> @@ -0,0 +1,13 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +config IOMMUFD
> + tristate