-----Original Message-----
From: iommu-boun...@lists.linux-foundation.org [mailto:iommu-boun...@lists.linux-foundation.org] On Behalf Of Alex Williamson
Sent: Friday, May 24, 2013 10:55 PM
To: alex.william...@redhat.com
Cc: iommu@lists.linux-foundation.org; chegu_vi...@hp.com; qemu-de...@nongnu.org; k...@vger.kernel.org; linux-ker...@vger.kernel.org
Subject: [PATCH 2/2] vfio: hugepage support for vfio_iommu_type1
We currently send all mappings to the iommu in PAGE_SIZE chunks, which
prevents the iommu from enabling support for larger page sizes.
We still need to pin pages, which means we step through them in PAGE_SIZE
chunks, but we can batch up contiguous physical memory chunks to allow
the iommu the opportunity to use larger pages. The approach here is a
bit different than the one currently used for legacy KVM device
assignment. Rather than looking at the vma page size and using that as
the maximum size to pass to the iommu, we instead simply look at whether
the next page is physically contiguous. This means we might ask the
iommu to map a 4MB region, while legacy KVM might limit itself to a
maximum of 2MB.
[Sethi Varun-B16395] Wouldn't this depend on the IOMMU page alignment
constraints?
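To illustrate the batching approach described above, here is a minimal
sketch (pin_contiguous_batch is a hypothetical name; vaddr_get_pfn() and
put_pfn() mirror helpers in vfio_iommu_type1.c, and the loop itself is
illustrative rather than the patch verbatim):

/*
 * Sketch only: pin pages one at a time, extending the batch while the
 * pfns stay physically contiguous, so the caller can hand the whole
 * run to iommu_map() in one call and let the IOMMU pick a larger page
 * size for it.
 */
static long pin_contiguous_batch(unsigned long vaddr, long npage,
				 int prot, unsigned long *pfn_base)
{
	unsigned long pfn;
	long i;

	if (vaddr_get_pfn(vaddr, prot, pfn_base))
		return -EFAULT;

	for (i = 1; i < npage; i++) {
		if (vaddr_get_pfn(vaddr + i * PAGE_SIZE, prot, &pfn))
			break;
		if (pfn != *pfn_base + i) {
			/* discontiguity: drop the extra pin, end batch */
			put_pfn(pfn, prot);
			break;
		}
	}

	return i;	/* pages pinned, all physically contiguous */
}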
Splitting our mapping path also allows us to be smarter about locked
memory because we can more easily unwind if the user attempts to exceed
the limit. Therefore, rather than assuming that a mapping will result in
locked memory, we test each page as it is pinned to determine whether it
locks RAM vs. an mmap'd MMIO region. This should result in better locking
granularity and fewer locked-page fudge factors in userspace.
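The per-page test amounts to the reserved-pfn check visible in the hunk
quoted below; sketched:

/*
 * Sketch: charge RLIMIT_MEMLOCK only for pages backed by real RAM.
 * Reserved/invalid pfns (e.g. an mmap'd MMIO BAR) are never swappable
 * and so are not counted as locked memory.
 */
if (!is_invalid_reserved_pfn(pfn))
	locked++;	/* RAM page: counts against the locked limit */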
The unmap path uses the same algorithm as legacy KVM. We don't want to
track the pfn for each mapping ourselves, but we need the pfn in order to
unpin pages. We therefore ask the iommu for the iova to physical address
translation, ask it to unmap a page, and see how many pages were actually
unmapped. iommus supporting large pages will often return something
bigger than a page here, which we know will be physically contiguous and
we can unpin a batch of pfns. iommus not supporting large mappings won't
see an improvement in batching here as they only unmap a page at a time.
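A sketch of that unmap loop, mirroring the vfio_unmap_unpin() hunk quoted
below (error handling and the mapping-tree update are trimmed):

/*
 * Sketch: look up the translation, request a single-page unmap, and
 * use the size the IOMMU reports back.  An IOMMU that installed a
 * large page unmaps the whole chunk, which is known to be physically
 * contiguous, so the pfns can be unpinned as one batch.
 */
while (iova < end) {
	phys_addr_t phys = iommu_iova_to_phys(iommu->domain, iova);
	size_t unmapped = iommu_unmap(iommu->domain, iova, PAGE_SIZE);

	if (!unmapped)
		break;

	unlocked += vfio_unpin_pages(phys >> PAGE_SHIFT,
				     unmapped >> PAGE_SHIFT,
				     dma->prot, false);
	iova += unmapped;
}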
With this change, we also make a clarification to the API for mapping and
unmapping DMA. We can only guarantee unmaps at the same granularity as
used for the original mapping. In other words, unmapping a subregion of
a previous mapping is not guaranteed and may result in a larger or
smaller unmapping than requested. The size field in the unmapping
structure is updated to reflect this.
Previously this field was left unmodified, always returning the
requested unmap size. It is now updated to return the actual unmap
size on success, allowing userspace to appropriately track mappings.
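From the userspace side, a minimal sketch of tracking the returned size
(container_fd is assumed to be an open, IOMMU-enabled VFIO container;
account_unmap() is a hypothetical bookkeeping helper):

/* Needs <sys/ioctl.h> and <linux/vfio.h>. */
struct vfio_iommu_type1_dma_unmap unmap = {
	.argsz = sizeof(unmap),
	.iova  = iova,	/* iova of the original mapping */
	.size  = size,	/* size used for the original mapping */
};

if (ioctl(container_fd, VFIO_IOMMU_UNMAP_DMA, &unmap) == 0 &&
    unmap.size != size)
	account_unmap(unmap.iova, unmap.size);	/* larger/smaller unmap */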
[Sethi Varun-B16395] The main problem here is that the user space application
is oblivious to the physical memory contiguity, right?
Signed-off-by: Alex Williamson <alex.william...@redhat.com>
---
drivers/vfio/vfio_iommu_type1.c | 523 ++++++++++++++++++++++++++++++---------
+static long vfio_unpin_pages(unsigned long pfn, long npage,
+			     int prot, bool do_accounting)
+{
+	unsigned long unlocked = 0;
+	long i;
+
+	for (i = 0; i < npage; i++)
+		unlocked += put_pfn(pfn++, prot);
+
+	if (do_accounting)
+		vfio_lock_acct(-unlocked);
+
+	return unlocked;
+}
+
+static int vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
+			    dma_addr_t iova, size_t *size)
+{
+	dma_addr_t start = iova, end = iova + *size;
+	long unlocked = 0;
+
+	while (iova < end) {
+		size_t unmapped;
+		phys_addr_t phys;
+
 		/*
-		 * Only add actual locked pages to accounting
-		 * XXX We're effectively marking a page locked for every
-		 * IOVA page even though it's possible the user could be
-		 * backing multiple IOVAs with the same vaddr.  This over-
-		 * penalizes the user process, but we currently have no
-		 * easy way to do this properly.
+		 * We use the IOMMU to track the physical address.  This
+		 * saves us from having a lot more entries in our mapping
+		 * tree.  The downside is that we don't track the size
+		 * used to do the mapping.  We request unmap of a single
+		 * page, but expect IOMMUs that support large pages to
+		 * unmap a larger chunk.
 		 */
-		if (!is_invalid_reserved_pfn(pfn))
-			locked++;
-
-		ret = iommu_map(iommu->domain, iova,
-				(phys_addr_t)pfn << PAGE_SHIFT,
-				PAGE_SIZE, prot);
-		if (ret) {
-			/* Back out mappings on error