On Mon, 2015-10-05 at 06:27 +0000, Bhushan Bharat wrote: > > > > -----Original Message----- > > From: Alex Williamson [mailto:alex.william...@redhat.com] > > Sent: Saturday, October 03, 2015 4:16 AM > > To: Bhushan Bharat-R65777 <bharat.bhus...@freescale.com> > > Cc: kvm...@lists.cs.columbia.edu; kvm@vger.kernel.org; > > christoffer.d...@linaro.org; eric.au...@linaro.org; pranavku...@linaro.org; > > marc.zyng...@arm.com; will.dea...@arm.com > > Subject: Re: [RFC PATCH 4/6] vfio: Add interface to iommu-map/unmap MSI > > pages > > > > On Wed, 2015-09-30 at 20:26 +0530, Bharat Bhushan wrote: > > > For MSI interrupts to work for a pass-through devices we need to have > > > mapping of msi-pages in iommu. Now on some platforms (like x86) does > > > this msi-pages mapping happens magically and in these case they > > > chooses an iova which they somehow know that it will never overlap > > > with guest memory. But this magic iova selection may not be always > > > true for all platform (like PowerPC and ARM64). > > > > > > Also on x86 platform, there is no problem as long as running a > > > x86-guest on x86-host but there can be issues when running a non-x86 > > > guest on > > > x86 host or other userspace applications like (I think ODP/DPDK). > > > As in these cases there can be chances that it overlaps with guest > > > memory mapping. > > > > Wow, it's amazing anything works... smoke and mirrors. > > > > > This patch add interface to iommu-map and iommu-unmap msi-pages at > > > reserved iova chosen by userspace. > > > > > > Signed-off-by: Bharat Bhushan <bharat.bhus...@freescale.com> > > > --- > > > drivers/vfio/vfio.c | 52 +++++++++++++++++++ > > > drivers/vfio/vfio_iommu_type1.c | 111 > > ++++++++++++++++++++++++++++++++++++++++ > > > include/linux/vfio.h | 9 +++- > > > 3 files changed, 171 insertions(+), 1 deletion(-) > > > > > > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c index > > > 2fb29df..a817d2d 100644 > > > --- a/drivers/vfio/vfio.c > > > +++ b/drivers/vfio/vfio.c > > > @@ -605,6 +605,58 @@ static int vfio_iommu_group_notifier(struct > > notifier_block *nb, > > > return NOTIFY_OK; > > > } > > > > > > +int vfio_device_map_msi(struct vfio_device *device, uint64_t msi_addr, > > > + uint32_t size, uint64_t *msi_iova) { > > > + struct vfio_container *container = device->group->container; > > > + struct vfio_iommu_driver *driver; > > > + int ret; > > > + > > > + /* Validate address and size */ > > > + if (!msi_addr || !size || !msi_iova) > > > + return -EINVAL; > > > + > > > + down_read(&container->group_lock); > > > + > > > + driver = container->iommu_driver; > > > + if (!driver || !driver->ops || !driver->ops->msi_map) { > > > + up_read(&container->group_lock); > > > + return -EINVAL; > > > + } > > > + > > > + ret = driver->ops->msi_map(container->iommu_data, > > > + msi_addr, size, msi_iova); > > > + > > > + up_read(&container->group_lock); > > > + return ret; > > > +} > > > + > > > +int vfio_device_unmap_msi(struct vfio_device *device, uint64_t > > msi_iova, > > > + uint64_t size) > > > +{ > > > + struct vfio_container *container = device->group->container; > > > + struct vfio_iommu_driver *driver; > > > + int ret; > > > + > > > + /* Validate address and size */ > > > + if (!msi_iova || !size) > > > + return -EINVAL; > > > + > > > + down_read(&container->group_lock); > > > + > > > + driver = container->iommu_driver; > > > + if (!driver || !driver->ops || !driver->ops->msi_unmap) { > > > + up_read(&container->group_lock); > > > + return -EINVAL; > > > + } > > > + > > > + ret = driver->ops->msi_unmap(container->iommu_data, > > > + msi_iova, size); > > > + > > > + up_read(&container->group_lock); > > > + return ret; > > > +} > > > + > > > /** > > > * VFIO driver API > > > */ > > > diff --git a/drivers/vfio/vfio_iommu_type1.c > > > b/drivers/vfio/vfio_iommu_type1.c index 3315fb6..ab376c2 100644 > > > --- a/drivers/vfio/vfio_iommu_type1.c > > > +++ b/drivers/vfio/vfio_iommu_type1.c > > > @@ -1003,12 +1003,34 @@ out_free: > > > return ret; > > > } > > > > > > +static void vfio_iommu_unmap_all_reserved_regions(struct vfio_iommu > > > +*iommu) { > > > + struct vfio_resvd_region *region; > > > + struct vfio_domain *d; > > > + > > > + list_for_each_entry(region, &iommu->reserved_iova_list, next) { > > > + list_for_each_entry(d, &iommu->domain_list, next) { > > > + if (!region->map_paddr) > > > + continue; > > > + > > > + if (!iommu_iova_to_phys(d->domain, region->iova)) > > > + continue; > > > + > > > + iommu_unmap(d->domain, region->iova, > > PAGE_SIZE); > > > > PAGE_SIZE? Why not region->size? > > Yes, this should be region->size. > > > > > > + region->map_paddr = 0; > > > + cond_resched(); > > > + } > > > + } > > > +} > > > + > > > static void vfio_iommu_unmap_unpin_all(struct vfio_iommu *iommu) { > > > struct rb_node *node; > > > > > > while ((node = rb_first(&iommu->dma_list))) > > > vfio_remove_dma(iommu, rb_entry(node, struct vfio_dma, > > node)); > > > + > > > + vfio_iommu_unmap_all_reserved_regions(iommu); > > > } > > > > > > static void vfio_iommu_type1_detach_group(void *iommu_data, @@ > > > -1048,6 +1070,93 @@ done: > > > mutex_unlock(&iommu->lock); > > > } > > > > > > +static int vfio_iommu_type1_msi_map(void *iommu_data, uint64_t > > msi_addr, > > > + uint64_t size, uint64_t *msi_iova) { > > > + struct vfio_iommu *iommu = iommu_data; > > > + struct vfio_resvd_region *region; > > > + int ret; > > > + > > > + mutex_lock(&iommu->lock); > > > + > > > + /* Do not try ceate iommu-mapping if msi reconfig not allowed */ > > > + if (!iommu->allow_msi_reconfig) { > > > + mutex_unlock(&iommu->lock); > > > + return 0; > > > + } > > > + > > > + /* Check if there is already region mapping the msi page */ > > > + list_for_each_entry(region, &iommu->reserved_iova_list, next) { > > > + if (region->map_paddr == msi_addr) { > > > + *msi_iova = region->iova; > > > + region->refcount++; > > > + mutex_unlock(&iommu->lock); > > > + return 0; > > > + } > > > + } > > > + > > > + /* Get a unmapped reserved region */ > > > + list_for_each_entry(region, &iommu->reserved_iova_list, next) { > > > + if (!region->map_paddr) > > > + break; > > > + } > > > + > > > + if (region == NULL) { > > > + mutex_unlock(&iommu->lock); > > > + return -ENODEV; > > > + } > > > + > > > + ret = vfio_iommu_map(iommu, region->iova, msi_addr >> > > PAGE_SHIFT, > > > + size >> PAGE_SHIFT, region->prot); > > > > So the reserved region has a size and the msi mapping has a size and we > > arbitrarily decide to use the msi mapping size here? > > Reserved region interface is generic and user can set reserved region of any > size (multiple of page-size). But we do not want to create MSI address > mapping beyond the MSI-page otherwise this can be security issue. But I think > I am not tracking how much reserved iova region is mapped, so unmap is called > for same size. > > > > The overlap checks we've done for the reserved region are meaningless > > then. No wonder > > you're unmapping with PAGE_SIZE, we have no idea. > > Do you think we should divide the reserved region in pages and track > map/unmap per page?
I'd certainly expect as a user to do one large reserved region mapping and be done rather than a large number of smaller mappings. I don't really understand how we're providing isolation with this interface though, we're setting up the IOMMU so the guest has a mapping to the MSI, but our IOMMU granularity is page size. Aren't we giving the guest access to everything else that might be mapped into that page? Don't we need to push an reservation down to the MSI allocation in order to have isolation? If we did that, couldn't we pretty much guarantee that all MSI vectors would fit into a page or two? -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html