RE: [PATCH 2/2] vfio: hugepage support for vfio_iommu_type1

2013-05-27 Thread Sethi Varun-B16395


 -Original Message-
 From: iommu-boun...@lists.linux-foundation.org [mailto:iommu-
 boun...@lists.linux-foundation.org] On Behalf Of Alex Williamson
 Sent: Friday, May 24, 2013 10:55 PM
 To: alex.william...@redhat.com
 Cc: iommu@lists.linux-foundation.org; chegu_vi...@hp.com; qemu-
 de...@nongnu.org; k...@vger.kernel.org; linux-ker...@vger.kernel.org
 Subject: [PATCH 2/2] vfio: hugepage support for vfio_iommu_type1
 
 We currently send all mappings to the iommu in PAGE_SIZE chunks, which
 prevents the iommu from enabling support for larger page sizes.
 We still need to pin pages, which means we step through them in PAGE_SIZE
 chunks, but we can batch up contiguous physical memory chunks to allow
 the iommu the opportunity to use larger pages.  The approach here is a
 bit different from the one currently used for legacy KVM device
 assignment.  Rather than looking at the vma page size and using that as
 the maximum size to pass to the iommu, we instead simply look at whether
 the next page is physically contiguous.  This means we might ask the
 iommu to map a 4MB region, while legacy KVM might limit itself to a

[Sethi Varun-B16395] Wouldn't this depend on the IOMMU page alignment 
constraints?

 maximum of 2MB.
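
The batching idea amounts to something like the following sketch, reusing the
vaddr_get_pfn()/put_pfn() helpers already present in vfio_iommu_type1.c
(variable names are simplified; this is not the literal patch code):

	/* pin the first page, then extend the run while pfns stay contiguous */
	long pinned = 1;
	unsigned long base_pfn, next_pfn;
	int ret;

	if (vaddr_get_pfn(vaddr, prot, &base_pfn))
		return -EFAULT;

	while (pinned < npage) {
		if (vaddr_get_pfn(vaddr + pinned * PAGE_SIZE, prot, &next_pfn))
			break;
		if (next_pfn != base_pfn + pinned) {
			/* not contiguous: drop the extra pin, stop the batch */
			put_pfn(next_pfn, prot);
			break;
		}
		pinned++;
	}

	/* one IOMMU call for the whole physically contiguous run */
	ret = iommu_map(iommu->domain, iova,
			(phys_addr_t)base_pfn << PAGE_SHIFT,
			pinned << PAGE_SHIFT, prot);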
 
 Splitting our mapping path also allows us to be smarter about locked
 memory because we can more easily unwind if the user attempts to exceed
 the limit.  Therefore, rather than assuming that a mapping will result in
 locked memory, we test each page as it is pinned to determine whether it
 locks RAM vs an mmap'd MMIO region.  This should result in better locking
 granularity and fewer locked-page fudge factors in userspace.
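
In code terms the per-page accounting check is roughly the following, built
around the existing is_invalid_reserved_pfn() helper in this file (sketch
only; "locked" and "vaddr" are simplified local names):

	if (vaddr_get_pfn(vaddr, prot, &pfn))
		return -EFAULT;

	/* only ordinary RAM counts toward RLIMIT_MEMLOCK; mmap'd MMIO does not */
	if (!is_invalid_reserved_pfn(pfn))
		locked++;

	/* ... once the batch is pinned and mapped ... */
	vfio_lock_acct(locked);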
 
 The unmap path uses the same algorithm as legacy KVM.  We don't want to
 track the pfn for each mapping ourselves, but we need the pfn in order to
 unpin pages.  We therefore ask the iommu for the iova to physical address
 translation, ask it to unpin a page, and see how many pages were actually
 unpinned.  iommus supporting large pages will often return something
 bigger than a page here, which we know is physically contiguous, so
 we can unpin a batch of pfns.  iommus not supporting large mappings won't
 see an improvement in batching here as they only unmap a page at a time.
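
The unmap path described above boils down to a loop along these lines (a
sketch of the algorithm, not the literal hunk, which is partially quoted
further down):

	while (iova < end) {
		phys_addr_t phys = iommu_iova_to_phys(iommu->domain, iova);
		size_t unmapped = iommu_unmap(iommu->domain, iova, PAGE_SIZE);

		if (!unmapped)
			break;

		/* large-page IOMMUs may unmap more than PAGE_SIZE at once */
		unlocked += vfio_unpin_pages(phys >> PAGE_SHIFT,
					     unmapped >> PAGE_SHIFT,
					     dma->prot, false);
		iova += unmapped;
	}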
 
 With this change, we also make a clarification to the API for mapping and
 unmapping DMA.  We can only guarantee unmaps at the same granularity as
 used for the original mapping.  In other words, unmapping a subregion of
 a previous mapping is not guaranteed and may result in a larger or
 smaller unmapping than requested.  The size field in the unmapping
 structure is updated to reflect this.
 Previously this field was left unmodified by the unmap ioctl, always returning the
 requested unmap size.  This is now updated to return the actual unmap
 size on success, allowing userspace to appropriately track mappings.
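
From userspace the clarified semantics look like this (illustrative only;
container_fd is assumed to be an already-configured VFIO container fd, and
track_unmapped() stands in for whatever bookkeeping the application does):

	struct vfio_iommu_type1_dma_unmap unmap = {
		.argsz = sizeof(unmap),
		.iova  = iova,
		.size  = size,		/* requested unmap size */
	};

	if (!ioctl(container_fd, VFIO_IOMMU_UNMAP_DMA, &unmap))
		/* unmap.size now reports what was actually unmapped,
		 * which may be larger or smaller than requested */
		track_unmapped(unmap.iova, unmap.size);	/* hypothetical helper */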
 
[Sethi Varun-B16395] The main problem here is that the user-space application 
is unaware of physical memory contiguity, right?


 Signed-off-by: Alex Williamson <alex.william...@redhat.com>
 ---
  drivers/vfio/vfio_iommu_type1.c |  523 +--
 +static long vfio_unpin_pages(unsigned long pfn, long npage,
 +  int prot, bool do_accounting)
 +{
 + unsigned long unlocked = 0;
 + long i;
 +
 + for (i = 0; i < npage; i++)
 + unlocked += put_pfn(pfn++, prot);
 +
 + if (do_accounting)
 + vfio_lock_acct(-unlocked);
 +
 + return unlocked;
 +}
 +
 +static int vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
 + dma_addr_t iova, size_t *size)
 +{
 + dma_addr_t start = iova, end = iova + *size;
 + long unlocked = 0;
 +
 + while (iova < end) {
 + size_t unmapped;
 + phys_addr_t phys;
 +
   /*
 -  * Only add actual locked pages to accounting
 -  * XXX We're effectively marking a page locked for every
 -  * IOVA page even though it's possible the user could be
 -  * backing multiple IOVAs with the same vaddr.  This over-
 -  * penalizes the user process, but we currently have no
 -  * easy way to do this properly.
 +  * We use the IOMMU to track the physical address.  This
 +  * saves us from having a lot more entries in our mapping
 +  * tree.  The downside is that we don't track the size
 +  * used to do the mapping.  We request unmap of a single
 +  * page, but expect IOMMUs that support large pages to
 +  * unmap a larger chunk.
*/
 - if (!is_invalid_reserved_pfn(pfn))
 - locked++;
 -
 - ret = iommu_map(iommu->domain, iova,
 - (phys_addr_t)pfn << PAGE_SHIFT,
 - PAGE_SIZE, prot);
 - if (ret) {
 - /* Back out mappings on error */

Re: [PATCH 2/2] vfio: hugepage support for vfio_iommu_type1

2013-05-25 Thread Konrad Rzeszutek Wilk
 + * Turns out AMD IOMMU has a page table bug where it won't map large pages
 + * to a region that previously mapped smaller pages.  This should be fixed
 + * soon, so this is just a temporary workaround to break mappings down into
 + * PAGE_SIZE.  Better to map smaller pages than nothing.
 + */
 +static int map_try_harder(struct vfio_iommu *iommu, dma_addr_t iova,
 +   unsigned long pfn, long npage, int prot)
 +{
 + long i;
 + int ret;
 +
 + for (i = 0; i < npage; i++, pfn++, iova += PAGE_SIZE) {
 + ret = iommu_map(iommu->domain, iova,
 + (phys_addr_t)pfn << PAGE_SHIFT,
 + PAGE_SIZE, prot);
 + if (ret)
 + break;
 + }
 +
 + for (; i < npage && i > 0; i--, iova -= PAGE_SIZE)
 + iommu_unmap(iommu->domain, iova, PAGE_SIZE);
 +
   return ret;
  }

This looks like it belongs in a vfio-quirks file (or something similar) that deals with
various IOMMU quirks.
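
For reference, map_try_harder() is only reached when a large iommu_map() fails
with -EBUSY; the call site elsewhere in the patch looks roughly like this
(paraphrased, not a verbatim quote):

	ret = iommu_map(iommu->domain, iova,
			(phys_addr_t)pfn << PAGE_SHIFT,
			npage << PAGE_SHIFT, prot);
	if (ret) {
		/* page-by-page retry only for the AMD IOMMU -EBUSY case */
		if (ret != -EBUSY ||
		    map_try_harder(iommu, iova, pfn, npage, prot)) {
			vfio_unpin_pages(pfn, npage, prot, true);
			break;
		}
	}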