On Tue, 2013-04-30 at 13:28 -0400, Konrad Rzeszutek Wilk wrote:
> On Sat, Apr 27, 2013 at 12:22:28PM +0800, Andrew Cooks wrote:
> > On Fri, Apr 26, 2013 at 6:23 AM, Don Dutile <ddut...@redhat.com> wrote:
> > > On 04/24/2013 10:49 PM, Sethi Varun-B16395 wrote:
> > >>
> > >>
> > >>
> > >>> -----Original Message-----
> > >>> From: iommu-boun...@lists.linux-foundation.org [mailto:iommu-
> > >>> boun...@lists.linux-foundation.org] On Behalf Of Don Dutile
> > >>> Sent: Thursday, April 25, 2013 1:11 AM
> > >>> To: Alex Williamson
> > >>> Cc: Yoder Stuart-B08248; iommu@lists.linux-foundation.org
> > >>> Subject: Re: RFC: vfio / iommu driver for hardware with no iommu
> > >>>
> > >>> On 04/23/2013 03:47 PM, Alex Williamson wrote:
> > >>>>
> > >>>> On Tue, 2013-04-23 at 19:16 +0000, Yoder Stuart-B08248 wrote:
> > >>>>>
> > >>>>>
> > >>>>>> -----Original Message-----
> > >>>>>> From: Alex Williamson [mailto:alex.william...@redhat.com]
> > >>>>>> Sent: Tuesday, April 23, 2013 11:56 AM
> > >>>>>> To: Yoder Stuart-B08248
> > >>>>>> Cc: Joerg Roedel; iommu@lists.linux-foundation.org
> > >>>>>> Subject: Re: RFC: vfio / iommu driver for hardware with no iommu
> > >>>>>>
> > >>>>>> On Tue, 2013-04-23 at 16:13 +0000, Yoder Stuart-B08248 wrote:
> > >>>>>>>
> > >>>>>>> Joerg/Alex,
> > >>>>>>>
> > >>>>>>> We have embedded systems where we use QEMU/KVM and have the
> > >>>>>>> requirement to do device assignment, but have no iommu.  So we
> > >>>>>>> would like to get vfio-pci working on systems like this.
> > >>>>>>>
> > >>>>>>> We're aware of the obvious limitations-- no protection, DMA'able
> > >>>>>>> memory must be physically contiguous and will have no iova->phy
> > >>>>>>> translation.  But there are use cases where all OSes involved are
> > >>>>>>> trusted and customers can live with those limitations.
> > >>>>>>> Virtualization is used here not to sandbox untrusted code, but to
> > >>>>>>> consolidate multiple OSes.
> > >>>>>>>
> > >>>>>>> We would like to get your feedback on the rough idea.  There are
> > >>>>>>> two parts-- iommu driver and vfio-pci.
> > >>>>>>>
> > >>>>>>> 1.  iommu driver
> > >>>>>>>
> > >>>>>>> First, we still need device groups created because vfio is based on
> > >>>>>>> that, so we envision a 'dummy' iommu driver that implements only
> > >>>>>>> the add/remove device ops.  Something like:
> > >>>>>>>
> > >>>>>>>       static struct iommu_ops fsl_none_ops = {
> > >>>>>>>               .add_device     = fsl_none_add_device,
> > >>>>>>>               .remove_device  = fsl_none_remove_device,
> > >>>>>>>       };
> > >>>>>>>
> > >>>>>>>       int fsl_iommu_none_init(void)
> > >>>>>>>       {
> > >>>>>>>               int ret = 0;
> > >>>>>>>
> > >>>>>>>               ret = iommu_init_mempool();
> > >>>>>>>               if (ret)
> > >>>>>>>                       return ret;
> > >>>>>>>
> > >>>>>>>               bus_set_iommu(&platform_bus_type, &fsl_none_ops);
> > >>>>>>>               bus_set_iommu(&pci_bus_type, &fsl_none_ops);
> > >>>>>>>
> > >>>>>>>               return ret;
> > >>>>>>>       }
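A minimal sketch of what those two callbacks might look like, assuming the
generic iommu group helpers from <linux/iommu.h>: with no hardware defining a
wider isolation boundary, each device simply gets a group of its own.

    #include <linux/device.h>
    #include <linux/iommu.h>

    static int fsl_none_add_device(struct device *dev)
    {
            struct iommu_group *group;
            int ret;

            /* One group per device; no hardware groups anything together. */
            group = iommu_group_alloc();
            if (IS_ERR(group))
                    return PTR_ERR(group);

            ret = iommu_group_add_device(group, dev);
            iommu_group_put(group); /* the device keeps the group alive */
            return ret;
    }

    static void fsl_none_remove_device(struct device *dev)
    {
            iommu_group_remove_device(dev);
    }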
> > >>>>>>>
> > >>>>>>> 2.  vfio-pci
> > >>>>>>>
> > >>>>>>> For vfio-pci, we would ideally like to keep user space mostly
> > >>>>>>> unchanged.  User space will have to follow the semantics of mapping
> > >>>>>>> only physically contiguous chunks...and iova will equal phys.
> > >>>>>>>
> > >>>>>>> So, we propose to implement a new vfio iommu type, called
> > >>>>>>> VFIO_TYPE_NONE_IOMMU.  This implements any needed vfio interfaces,
> > >>>>>>> but there are no calls to the iommu layer...e.g. map_dma() is a
> > >>>>>>> noop.
> > >>>>>>>
> > >>>>>>> Would like your feedback.
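On the vfio side, a rough sketch of what such a VFIO_TYPE_NONE_IOMMU backend
could register, assuming the vfio_iommu_driver_ops interface from
<linux/vfio.h>. Every vfio_iommu_none_* name and the extension number are
illustrative only; the point is simply that map/unmap requests are accepted
and ignored because iova == phys.

    #include <linux/err.h>
    #include <linux/module.h>
    #include <linux/vfio.h>

    /* Hypothetical extension id for the proposed type; not an upstream value. */
    #define VFIO_TYPE_NONE_IOMMU    100

    static void *vfio_iommu_none_open(unsigned long arg)
    {
            if (arg != VFIO_TYPE_NONE_IOMMU)
                    return ERR_PTR(-EINVAL);
            /* No per-container state is needed when nothing is ever mapped. */
            return NULL;
    }

    static void vfio_iommu_none_release(void *iommu_data)
    {
    }

    static long vfio_iommu_none_ioctl(void *iommu_data,
                                      unsigned int cmd, unsigned long arg)
    {
            if (cmd == VFIO_CHECK_EXTENSION)
                    return arg == VFIO_TYPE_NONE_IOMMU;

            /*
             * Map/unmap requests would be accepted and ignored here:
             * iova == phys and no translation is ever programmed.
             */
            return 0;
    }

    static int vfio_iommu_none_attach_group(void *iommu_data,
                                            struct iommu_group *group)
    {
            /* Nothing to attach to; there is no hardware iommu domain. */
            return 0;
    }

    static void vfio_iommu_none_detach_group(void *iommu_data,
                                             struct iommu_group *group)
    {
    }

    static const struct vfio_iommu_driver_ops vfio_iommu_none_ops = {
            .name           = "vfio-iommu-none",
            .owner          = THIS_MODULE,
            .open           = vfio_iommu_none_open,
            .release        = vfio_iommu_none_release,
            .ioctl          = vfio_iommu_none_ioctl,
            .attach_group   = vfio_iommu_none_attach_group,
            .detach_group   = vfio_iommu_none_detach_group,
    };

    static int __init vfio_iommu_none_init(void)
    {
            return vfio_register_iommu_driver(&vfio_iommu_none_ops);
    }
    module_init(vfio_iommu_none_init);
    MODULE_LICENSE("GPL");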
> > >>>>>>
> > >>>>>>
> > >>>>>> My first thought is that this really detracts from vfio and iommu
> > >>>>>> groups being a secure interface, so somehow this needs to be clearly
> > >>>>>> an insecure mode that requires an opt-in and maybe taints the
> > >>>>>> kernel.  Any notion of unprivileged use needs to be blocked and it
> > >>>>>> should test CAP_COMPROMISE_KERNEL (or whatever it's called now) at
> > >>>>>> critical access points.  We might even have interfaces exported that
> > >>>>>> would allow this to be an out-of-tree driver (worth a check).
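A sketch of that kind of opt-in gate, assuming a module parameter as the
explicit opt-in, CAP_SYS_RAWIO standing in for CAP_COMPROMISE_KERNEL (which
never landed in mainline), and the two-argument add_taint() from v3.9; the
helper name is made up.

    #include <linux/capability.h>
    #include <linux/kernel.h>
    #include <linux/module.h>

    static bool allow_unsafe_noiommu;       /* off unless explicitly requested */
    module_param(allow_unsafe_noiommu, bool, 0444);
    MODULE_PARM_DESC(allow_unsafe_noiommu,
                     "Enable the no-iommu backend; provides no DMA isolation");

    /* Hypothetical gate to call at the critical access points. */
    static int noiommu_check_access(void)
    {
            if (!allow_unsafe_noiommu)
                    return -EPERM;
            if (!capable(CAP_SYS_RAWIO))    /* stand-in for CAP_COMPROMISE_KERNEL */
                    return -EPERM;
            /* Record in the taint flags that the unsafe mode has been used. */
            add_taint(TAINT_USER, LOCKDEP_STILL_OK);
            return 0;
    }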
> > >>>>>>
> > >>>>>> I would guess that you would probably want to do all the iommu group
> > >>>>>> setup from the vfio fake-iommu driver.  In other words, that driver
> > >>>>>> both creates the fake groups and provides the dummy iommu backend for
> > >>>>>> vfio.
> > >>>>>>
> > >>>>>> That would be a nice way to compartmentalize this as a
> > >>>>>> vfio-noiommu-special.
> > >>>>>
> > >>>>>
> > >>>>> So you mean don't implement any of the iommu driver ops at all and
> > >>>>> keep everything in the vfio layer?
> > >>>>>
> > >>>>> Would you still have real iommu groups?...i.e.
> > >>>>> $ readlink /sys/bus/pci/devices/0000:06:0d.0/iommu_group
> > >>>>> ../../../../kernel/iommu_groups/26
> > >>>>>
> > >>>>> ...and that is created by vfio-noiommu-special?
> > >>>>
> > >>>>
> > >>>> I'm suggesting (but haven't checked if it's possible), to implement
> > >>>> the iommu driver ops as part of the vfio iommu backend driver.  The
> > >>>> primary motivation for this would be to a) keep a fake iommu groups
> > >>>> interface out of the iommu proper (possibly containing it in an
> > >>>> external driver) and b) modularize it so we don't have fake iommu
> > >>>> groups being created by default.  It would have to populate the iommu
> > >>>> groups sysfs interfaces to be compatible with vfio.
> > >>>>
> > >>>>> Right now when the PCI and platform buses are probed, the iommu
> > >>>>> driver add-device callback gets called and that is where the
> > >>>>> per-device group gets created.  Are you envisioning registering a
> > >>>>> callback for the PCI bus to do this in vfio-noiommu-special?
> > >>>>
> > >>>>
> > >>>> Yes.  It's just as easy to walk all the devices rather than doing
> > >>>> callbacks, iirc the group code does this when you register.  In fact,
> > >>>> this noiommu interface may not want to add all devices, we may want to
> > >>>> be very selective and only add some.
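A sketch of that walk, assuming bus_for_each_dev() plus the generic iommu
group helpers; device_is_opted_in() is a hypothetical policy hook standing in
for whatever form the selective opt-in finally takes.

    #include <linux/device.h>
    #include <linux/iommu.h>
    #include <linux/pci.h>

    /* Hypothetical opt-in policy; a real driver might check a
     * user-supplied list of B:D.F addresses instead. */
    static bool device_is_opted_in(struct device *dev)
    {
            return false;
    }

    static int noiommu_add_one(struct device *dev, void *data)
    {
            struct iommu_group *group;
            int ret;

            if (!device_is_opted_in(dev))
                    return 0;       /* stay selective: skip everything else */

            group = iommu_group_alloc();
            if (IS_ERR(group))
                    return PTR_ERR(group);

            ret = iommu_group_add_device(group, dev);
            iommu_group_put(group); /* the device keeps the group alive */
            return ret;
    }

    static int noiommu_scan_pci_bus(void)
    {
            /* Walk every device currently on the PCI bus. */
            return bus_for_each_dev(&pci_bus_type, NULL, NULL, noiommu_add_one);
    }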
> > >>>>
> > >>> Right.
> > >>> Sounds like a no-iommu driver is needed to leave vfio unaffected, and
> > >>> still leverage/use vfio for qemu's device assignment.
> > >>> Just not sure how to 'taint' it as 'not secure' if a no-iommu driver is
> > >>> put in place.
> > >>>
> > >>> btw -- qemu has the inherent assumption that pci cfg cycles are trapped,
> > >>>        so assigned devices are 'remapped' from the system B:D.F to the
> > >>>        virt-machine's (virtualized) B:D.F of the assigned device.
> > >>>        Are pci cfg cycles trapped in the freescale qemu model?
> > >>>
> > >> The vfio-pci device would be visible (to a KVM guest) as a PCI device on
> > >> the virtual PCI bus (emulated by qemu).
> > >>
> > >> -Varun
> > >>
> > > Understood, but as Alex stated, the whole purpose of VFIO is to
> > > be able to do _secure_, user-level-driven I/O.  Since this would
> > > be 'insecure', there should be a way to note that during configuration.
> > >
> > 
> > Does vfio work with swiotlb and if not, can/should swiotlb be
> > extended? Or does the time and space overhead make it a moot point?
> 
> It does not work with SWIOTLB, since SWIOTLB sits behind the DMA API while vfio drives the IOMMU API.
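To make the API split concrete, a small contrast sketch (dev, domain, buf,
iova and phys are all assumed to be set up elsewhere): the DMA API that
swiotlb sits behind hands back whatever bus address the platform picks, while
the IOMMU API that vfio drives programs an explicit iova->phys mapping.

    #include <linux/dma-mapping.h>
    #include <linux/iommu.h>
    #include <linux/mm.h>

    static int map_one_page_both_ways(struct device *dev,
                                      struct iommu_domain *domain,
                                      void *buf, unsigned long iova,
                                      phys_addr_t phys)
    {
            dma_addr_t bus;

            /* DMA API: the caller gets whatever bus address the platform picks. */
            bus = dma_map_single(dev, buf, PAGE_SIZE, DMA_TO_DEVICE);
            if (dma_mapping_error(dev, bus))
                    return -ENOMEM;
            dma_unmap_single(dev, bus, PAGE_SIZE, DMA_TO_DEVICE);

            /* IOMMU API: the caller chooses both the iova and the phys target. */
            return iommu_map(domain, iova, phys, PAGE_SIZE,
                             IOMMU_READ | IOMMU_WRITE);
    }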
> 
> It could be extended to use it. I was toying with this b/c for Xen to
> use VFIO I would have to implement a Xen IOMMU driver that would basically
> piggyback on the SWIOTLB (as Xen itself does the IOMMU parts and takes
> care of all the hard work of securing each guest).
> 
> But your requirement would be the same, so it might as well be a generic
> driver called the SWIOTLB-IOMMU driver.
> 
> If you are up for writing it, I am up for reviewing/Ack-ing/etc.
> 
> The complexity would be to figure out the VFIO group thing and how to assign
> PCI B:D:F devices to the SWIOTLB-IOMMU driver. Perhaps the same way as
> xen-pciback does (or pcistub). That is, by writing the BDF to the "bind"
> attribute in sysfs (or via a kernel parameter).

Just to reiterate, we need to be very, very careful about fake iommu
groups.  iommu groups are meant to express hardware isolation
capabilities.  swiotlb by definition has no hardware isolation
capabilities.  Except for very specific (likely embedded) use cases,
that makes the whole idea of vfio less interesting.  Devices would be
exposed to userspace with neither isolation nor translation.  You might
as well use the uio pci interface at that point.  The qemu use case of
vfio is very difficult to achieve in that model as you either need to
identity map the guest or expose an iommu to the guest.  You won't
achieve transparent device assignment on x86 with such a model if that's
the goal.  Thanks,

Alex

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
