RE: [PATCH 2/2] vfio: hugepage support for vfio_iommu_type1
> -Original Message-
> From: iommu-boun...@lists.linux-foundation.org [mailto:iommu-boun...@lists.linux-foundation.org] On Behalf Of Alex Williamson
> Sent: Friday, May 24, 2013 10:55 PM
> To: alex.william...@redhat.com
> Cc: io...@lists.linux-foundation.org; chegu_vi...@hp.com; qemu-de...@nongnu.org; kvm@vger.kernel.org; linux-ker...@vger.kernel.org
> Subject: [PATCH 2/2] vfio: hugepage support for vfio_iommu_type1
>
> We currently send all mappings to the iommu in PAGE_SIZE chunks, which
> prevents the iommu from enabling support for larger page sizes.
> We still need to pin pages, which means we step through them in PAGE_SIZE
> chunks, but we can batch up contiguous physical memory chunks to allow
> the iommu the opportunity to use larger pages.  The approach here is a
> bit different than the one currently used for legacy KVM device
> assignment.  Rather than looking at the vma page size and using that as
> the maximum size to pass to the iommu, we instead simply look at whether
> the next page is physically contiguous.  This means we might ask the
> iommu to map a 4MB region, while legacy KVM might limit itself to a
> maximum of 2MB.

[Sethi Varun-B16395] Wouldn't this depend on the IOMMU page alignment constraints?

> Splitting our mapping path also allows us to be smarter about locked
> memory because we can more easily unwind if the user attempts to exceed
> the limit.  Therefore, rather than assuming that a mapping will result in
> locked memory, we test each page as it is pinned to determine whether it
> locks RAM vs an mmap'd MMIO region.  This should result in better locking
> granularity and fewer locked page fudge factors in userspace.
>
> The unmap path uses the same algorithm as legacy KVM.  We don't want to
> track the pfn for each mapping ourselves, but we need the pfn in order to
> unpin pages.  We therefore ask the iommu for the iova to physical address
> translation, ask it to unpin a page, and see how many pages were actually
> unpinned.  iommus supporting large pages will often return something
> bigger than a page here, which we know will be physically contiguous and
> we can unpin a batch of pfns.  iommus not supporting large mappings won't
> see an improvement in batching here as they only unmap a page at a time.
>
> With this change, we also make a clarification to the API for mapping and
> unmapping DMA.  We can only guarantee unmaps at the same granularity as
> used for the original mapping.  In other words, unmapping a subregion of
> a previous mapping is not guaranteed and may result in a larger or
> smaller unmapping than requested.  The size field in the unmapping
> structure is updated to reflect this.  Previously this was unmodified on
> mapping, always returning the requested unmap size.  This is now updated
> to return the actual unmap size on success, allowing userspace to
> appropriately track mappings.

[Sethi Varun-B16395] The main problem here is that the user space application is oblivious of the physical memory contiguity, right?

> Signed-off-by: Alex Williamson
> ---
>  drivers/vfio/vfio_iommu_type1.c | 523 +--
>
> +static long vfio_unpin_pages(unsigned long pfn, long npage,
> +			     int prot, bool do_accounting)
> +{
> +	unsigned long unlocked = 0;
> +	long i;
> +
> +	for (i = 0; i < npage; i++)
> +		unlocked += put_pfn(pfn++, prot);
> +
> +	if (do_accounting)
> +		vfio_lock_acct(-unlocked);
> +
> +	return unlocked;
> +}
> +
> +static int vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
> +			    dma_addr_t iova, size_t *size)
> +{
> +	dma_addr_t start = iova, end = iova + *size;
> +	long unlocked = 0;
> +
> +	while (iova < end) {
> +		size_t unmapped;
> +		phys_addr_t phys;
> +
>  		/*
> -		 * Only add actual locked pages to accounting
> -		 * XXX We're effectively marking a page locked for every
> -		 * IOVA page even though it's possible the user could be
> -		 * backing multiple IOVAs with the same vaddr.  This over-
> -		 * penalizes the user process, but we currently have no
> -		 * easy way to do this properly.
> +		 * We use the IOMMU to track the physical address.  This
> +		 * saves us from having a lot more entries in our mapping
> +		 * tree.  The downside is that we don't track the size
> +		 * used to do the mapping.  We request unmap of a single
> +		 * page, but expect IOMMUs that support large pages to
> +
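The pinning path described above batches physically contiguous pfns before handing a chunk to the IOMMU. The core idea can be sketched outside the kernel as a small helper; the function names and array-based interface here are illustrative, not taken from the patch:

```c
#include <stddef.h>

/*
 * Return the length of the run of physically contiguous pages starting
 * at pfns[0].  The batched scheme keeps extending the current chunk
 * while pfn(n+1) == pfn(n) + 1, then maps the whole chunk at once so
 * the IOMMU may use a large page.
 */
static long contig_run(const unsigned long *pfns, long npage)
{
	long i;

	for (i = 1; i < npage; i++)
		if (pfns[i] != pfns[i - 1] + 1)
			break;
	return i;
}

/*
 * Count how many map calls the batched scheme would issue for a pinned
 * pfn list: one per contiguous chunk, versus one per PAGE_SIZE chunk in
 * the old per-page scheme.
 */
static long count_map_calls(const unsigned long *pfns, long npage)
{
	long calls = 0, done = 0;

	while (done < npage) {
		done += contig_run(pfns + done, npage - done);
		calls++;
	}
	return calls;
}
```

For five pinned pages with pfns {100, 101, 102, 200, 201}, the first chunk covers three pages and the second covers two, so only two map calls are issued instead of five.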
RE: [PATCH] vfio powerpc: enabled and supported on powernv platform
> > +	/* Handle a single page request without allocation
> > +	   of pages-to-release array */
> > +	if (pages == 1) {
> > +		spin_lock(&(pool->lock));
> > +		page = free_tce(tbl, entry);
> > +
> > +		if (direction != DMA_NONE)
> > +			ret = put_tce(tbl, entry, tce, direction);
> > +
> > +		tce_flush(tbl);
> > +
> > +		if (page)
> > +			put_page(page);
> > +
> > +		spin_unlock(&(pool->lock));
> > +		return ret;
> > +	}
> > +
> > +	/* Releasing multiple pages */
> > +	/* Allocate an array for pages to be released after TCE table
> > +	   is updated */
> > +	oldpages = kmalloc(PAGE_SIZE, GFP_KERNEL);
> > +	if (!oldpages)
> > +		return -ENOMEM;
> > +
> > +	spin_lock(&(pool->lock));
> > +
> > +	for (i = 0; (i < pages) && !ret; ++i, ++entry, tce += IOMMU_PAGE_SIZE) {
> > +		page = free_tce(tbl, entry);
> > +		if (page) {
> > +			oldpages[pages_to_put] = page;
> > +			++pages_to_put;
> > +		}
> > +
> > +		if (direction != DMA_NONE)
> > +			ret = put_tce(tbl, entry, tce, direction);
> > +
> > +		/* Release old pages if we reached the end of oldpages[] or
> > +		   it is the last page or we are about to exit the loop */
> > +		if ((pages_to_put == oldpagesnum) || (i == pages - 1) || ret) {
> > +			tce_flush(tbl);

Avoiding tce_flush() is the reason for all this extra overhead, right?
I wonder if it'd be cleaner separating map vs unmap, where the map case
can avoid the oldpages array... but that means inserting new mappings on
top of old ones wouldn't put the pages.

> > +
> > +			/* Release pages after removing them from TCE table */
> > +			while (pages_to_put) {
> > +				--pages_to_put;
> > +				put_page(oldpages[pages_to_put]);
> > +			}
> > +		}
> > +	}
> > +
> > +	spin_unlock(&(pool->lock));
> > +	kfree(oldpages);
> > +
> > +	return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_put_tces);
> > +#endif /* CONFIG_IOMMU_API */
> > diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
> > index 05205cf..676f4d9 100644
> > --- a/arch/powerpc/platforms/powernv/pci.c
> > +++ b/arch/powerpc/platforms/powernv/pci.c
> > @@ -8,30 +8,31 @@
> >   * This program is free software; you can redistribute it and/or
> >   * modify it under the terms of the GNU General Public License
> >   * as published by the Free Software Foundation; either version
> >   * 2 of the License, or (at your option) any later version.
> >   */
> >
> >  #include
> >  #include
> >  #include
> >  #include
> >  #include
> >  #include
> >  #include
> >  #include
> >  #include
> > +#include
> >
> >  #include
> >  #include
> >  #include
> >  #include
> >  #include
> >  #include
> >  #include
> >  #include
> >  #include
> >  #include
> >  #include
> >
> >  #include "powernv.h"
> >  #include "pci.h"
> > @@ -601,15 +602,149 @@ void __init pnv_pci_init(void)
> >  	/* Configure IOMMU DMA hooks */
> >  	ppc_md.pci_dma_dev_setup = pnv_pci_dma_dev_setup;
> >  	ppc_md.tce_build = pnv_tce_build;
> >  	ppc_md.tce_free = pnv_tce_free;
> >  	ppc_md.tce_get = pnv_tce_get;
> >  	ppc_md.pci_probe_mode = pnv_pci_probe_mode;
> >  	set_pci_dma_ops(&dma_iommu_ops);
> >
> >  	/* Configure MSIs */
> >  #ifdef CONFIG_PCI_MSI
> >  	ppc_md.msi_check_device = pnv_msi_check_device;
> >  	ppc_md.setup_msi_irqs = pnv_setup_msi_irqs;
> >  	ppc_md.teardown_msi_irqs = pnv_teardown_msi_irqs;
> > #endif
> > }
> > +
> > +#ifdef CONFIG_IOMMU_API
> > +/*
> > + * IOMMU groups support required by VFIO
> > + */
> > +static int add_device(struct device *dev)
> > +{
> > +	struct iommu_table *tbl;
> > +	int ret = 0;
> > +
> > +	if (WARN_ON(dev->iommu_group)) {
> > +		printk(KERN_WARNING "tce_vfio: device %s is already in iommu group %d, skipping\n",
> > +		       dev->kobj.name,
> >
> > dev_name(dev)
> >
> > +
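The shape of the multi-page path above (clear TCE entries under the lock, stash the returned pages, and only put them after a batched tce_flush()) can be modeled in plain C. This is a sketch of the control flow only; BATCH_CAP stands in for oldpagesnum and all names are illustrative:

```c
#include <stddef.h>

#define BATCH_CAP 4	/* stands in for oldpagesnum, the capacity of oldpages[] */

/*
 * Model of the iommu_put_tces() release pattern: every entry yields a
 * page to stash, and a (modeled) tce_flush() plus batched put_page()
 * happens when the stash fills or on the last entry.  Returns the
 * number of flushes issued for 'pages' entries, which is what the
 * batching is trying to minimize.
 */
static long put_tces_model(long pages)
{
	long pages_to_put = 0, flushes = 0, i;

	for (i = 0; i < pages; i++) {
		pages_to_put++;			/* free_tce() returned a page */

		/* flush when the stash is full or this is the last entry */
		if (pages_to_put == BATCH_CAP || i == pages - 1) {
			flushes++;		/* tce_flush(tbl) */
			while (pages_to_put)	/* put_page() each stashed page */
				pages_to_put--;
		}
	}
	return flushes;
}
```

With a stash capacity of 4, releasing 10 entries costs 3 flushes instead of 10, which is the overhead the oldpages array is buying back.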
RE: [PATCH 07/20] KVM/MIPS32: Dynamic binary translation of select privileged instructions.
> -Original Message-
> From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of Avi Kivity
> Sent: Thursday, November 01, 2012 8:54 PM
> To: Sanjay Lal
> Cc: kvm@vger.kernel.org; linux-m...@linux-mips.org
> Subject: Re: [PATCH 07/20] KVM/MIPS32: Dynamic binary translation of select privileged instructions.
>
> On 10/31/2012 05:19 PM, Sanjay Lal wrote:
> > Currently, the following instructions are translated:
> > - CACHE (indexed)
> > - CACHE (va based): translated to a synci, overkill on D-CACHE
> >   operations, but still much faster than a trap.
> > - mfc0/mtc0: the virtual COP0 registers for the guest are implemented
> >   as a 2-D array [COP#][SEL] and this is mapped into the guest kernel
> >   address space @ VA 0x0.  mfc0/mtc0 operations are transformed to
> >   load/stores.
>
> Seems to be more of binary patching, yes?  Binary translation usually
> involves hiding the translated code so the guest is not able to detect
> that it is patched.

Typically, a dynamic binary translation solution should also involve a mechanism to trace guest accesses to the modified pages. I don't think that support is present as part of this patch set. Do you plan to implement it?

-Varun

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
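What makes the mfc0/mtc0 rewrite mechanical is that both have fixed MIPS32 COP0 encodings from which the target register and select fields can be read directly. A hedged decoder sketch follows; the field layout is standard MIPS32, but the offset formula for the [COP#][SEL] shadow array at VA 0x0 (8 sels per register, 32-bit slots) is an assumption about the patch, not taken from it:

```c
#include <stdint.h>

#define OP_COP0	0x10	/* bits 31:26: coprocessor 0 major opcode */
#define RS_MFC0	0x00	/* bits 25:21: move-from-CP0 */
#define RS_MTC0	0x04	/* bits 25:21: move-to-CP0 */

/* Is this instruction an mfc0 (and thus a candidate for a load rewrite)? */
static int is_mfc0(uint32_t insn)
{
	return (insn >> 26) == OP_COP0 && ((insn >> 21) & 0x1f) == RS_MFC0;
}

/*
 * Extract the COP0 register (rd, bits 15:11) and select (bits 2:0) and
 * compute a byte offset into an assumed [reg][sel] array of 32-bit
 * words -- the kind of shadow area the patch maps at guest VA 0x0 so
 * that mfc0/mtc0 can become plain loads/stores.
 */
static uint32_t cop0_offset(uint32_t insn)
{
	uint32_t rd = (insn >> 11) & 0x1f;
	uint32_t sel = insn & 0x7;

	return (rd * 8 + sel) * (uint32_t)sizeof(uint32_t);
}
```

For example, `mfc0 $t0, $12` encodes as 0x40086000, which decodes to rd=12, sel=0; the corresponding mtc0 differs only in the rs field.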
RE: [PATCH 02/10] Driver core: Add iommu_ops to bus_type
> -Original Message-
> From: Roedel, Joerg [mailto:joerg.roe...@amd.com]
> Sent: Monday, September 12, 2011 6:06 PM
> To: Sethi Varun-B16395
> Cc: Joerg Roedel; Greg KH; io...@lists.linux-foundation.org; Alex Williamson; Ohad Ben-Cohen; David Woodhouse; David Brown; kvm@vger.kernel.org; linux-ker...@vger.kernel.org
> Subject: Re: [PATCH 02/10] Driver core: Add iommu_ops to bus_type
>
> On Mon, Sep 12, 2011 at 08:08:41AM -0400, Sethi Varun-B16395 wrote:
> > > The IOMMUs are usually devices on the bus itself, so they are
> > > initialized after the bus is set up and the devices on it are
> > > populated. So the function can not be called on bus initialization
> > > because the IOMMU is not ready at this point.
> > Well, at what point would the add_device_group (referring to the patch
> > set posted by Alex) callback be invoked?
>
> The details are up to Alex Williamson. One option is to register a
> notifier for the bus in the iommu_bus_init() function and react to its
> notifications.
> I think in the end we will have a number of additional callbacks in the
> iommu_ops which are called by the notifier (or from the driver-core
> directly) to handle actions like added or removed devices. All the
> infrastructure for that which is implemented in the iommu-drivers today
> will then be in the iommu-core code.

I am not sure if I understand this, but as per your earlier statement the IOMMU is a device on the bus, and its initialization happens after the bus is set up and the devices are populated. So, at what point would the device notifier invoke an IOMMU callback?

-Varun
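The notifier scheme Joerg describes (the bus comes up first, the IOMMU driver registers later, and from then on it is told about each device added to the bus) maps to bus_register_notifier() in the kernel. The ordering can be modeled in a few lines of userspace C; the names and the single-callback chain below are a deliberate simplification, not the kernel API:

```c
#include <stddef.h>

#define MAX_NOTIFIERS 4

typedef void (*add_device_fn)(const char *name);

/* toy stand-in for a bus_type's notifier chain */
static add_device_fn notifiers[MAX_NOTIFIERS];
static int nr_notifiers;

/* counts devices the (model) IOMMU driver has been told about */
static int iommu_seen;

/* model of the bus_register_notifier() call made from iommu_bus_init() */
static int bus_register_notifier_model(add_device_fn fn)
{
	if (nr_notifiers == MAX_NOTIFIERS)
		return -1;
	notifiers[nr_notifiers++] = fn;
	return 0;
}

/* device_add() side: fire an ADD_DEVICE notification to every listener */
static void device_add_model(const char *name)
{
	int i;

	for (i = 0; i < nr_notifiers; i++)
		notifiers[i](name);
}

/* what an IOMMU driver's add-device callback would hang off the chain */
static void iommu_add_device_model(const char *name)
{
	(void)name;
	iommu_seen++;
}
```

The point of the ordering is visible here: devices added before the IOMMU driver registers produce no callback, which is why the real code also needs a way to walk already-present devices.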
RE: [PATCH 02/10] Driver core: Add iommu_ops to bus_type
> -Original Message-
> From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of Joerg Roedel
> Sent: Thursday, September 08, 2011 12:49 AM
> To: Greg KH
> Cc: Joerg Roedel; io...@lists.linux-foundation.org; Alex Williamson; Ohad Ben-Cohen; David Woodhouse; David Brown; kvm@vger.kernel.org; linux-ker...@vger.kernel.org
> Subject: Re: [PATCH 02/10] Driver core: Add iommu_ops to bus_type
>
> Hi Greg,
>
> the bus_set_iommu() function will be called by the IOMMU driver. There
> can be different drivers for the same bus, depending on the hardware. On
> PCI for example, there can be the Intel or the AMD IOMMU driver that
> implement the iommu-api and that register for that bus.
>
> On Wed, Sep 07, 2011 at 11:47:50AM -0700, Greg KH wrote:
> > > +#ifdef CONFIG_IOMMU_API
> > > +int bus_set_iommu(struct bus_type *bus, struct iommu_ops *ops)
> > > +{
> > > +	if (bus->iommu_ops != NULL)
> > > +		return -EBUSY;
> >
> > Busy?
>
> Yes, it signals to the IOMMU driver that another driver has already
> registered for that bus. In the previous register_iommu() interface this
> was just a BUG(), but I think returning an error to the caller is better.
> It can be turned back into a BUG() if it is considered better, though.
>
> > > +
> > > +	bus->iommu_ops = ops;
> > > +
> > > +	/* Do IOMMU specific setup for this bus-type */
> > > +	iommu_bus_init(bus, ops);
> > > +
> > > +	return 0;
> > > +}
> > > +EXPORT_SYMBOL_GPL(bus_set_iommu);
> >
> > I don't understand what this function is for, and who would call it.
>
> It is called by the IOMMU driver.
>
> > Please provide kerneldoc that explains this.
>
> Will do.
>
> > > @@ -67,6 +68,9 @@ extern void bus_remove_file(struct bus_type *, struct bus_attribute *);
> > >   * @resume:	Called to bring a device on this bus out of sleep mode.
> > >   * @pm:		Power management operations of this bus, callback the specific
> > >   *		device driver's pm-ops.
> > > + * @iommu_ops	IOMMU specific operations for this bus, used to attach IOMMU
> > > + *		driver implementations to a bus and allow the driver to do
> > > + *		bus-specific setup
> >
> > So why is this just not set by the bus itself, making the above
> > function not needed at all?
>
> The IOMMUs are usually devices on the bus itself, so they are initialized
> after the bus is set up and the devices on it are populated. So the
> function can not be called on bus initialization because the IOMMU is not
> ready at this point.

Well, at what point would the add_device_group (referring to the patch set posted by Alex) callback be invoked?

-Varun
RE: [PATCH 03/10] iommu/core: Add bus_type parameter to iommu_domain_alloc
> -Original Message-
> From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of Joerg Roedel
> Sent: Wednesday, September 07, 2011 9:12 PM
> To: io...@lists.linux-foundation.org
> Cc: Greg Kroah-Hartman; Alex Williamson; Ohad Ben-Cohen; David Woodhouse; David Brown; j...@8bytes.org; kvm@vger.kernel.org; linux-ker...@vger.kernel.org; Joerg Roedel
> Subject: [PATCH 03/10] iommu/core: Add bus_type parameter to iommu_domain_alloc
>
> This is necessary to store a pointer to the bus-specific iommu_ops in the
> iommu-domain structure. It will be used later to call into bus-specific
> iommu-ops.
>
> Signed-off-by: Joerg Roedel
> ---
>  drivers/iommu/iommu.c              | 14 +-
>  drivers/media/video/omap3isp/isp.c |  2 +-
>  include/linux/iommu.h              |  6 --
>  virt/kvm/iommu.c                   |  2 +-
>  4 files changed, 19 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 3b24a5b..adaee9b 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -16,6 +16,7 @@
>   * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
>   */
>
> +#include
>  #include
>  #include
>  #include
> @@ -44,15 +45,26 @@ bool iommu_found(void)
>  }
>  EXPORT_SYMBOL_GPL(iommu_found);
>
> -struct iommu_domain *iommu_domain_alloc(void)
> +struct iommu_domain *iommu_domain_alloc(struct bus_type *bus)
>  {
>  	struct iommu_domain *domain;
> +	struct iommu_ops *ops;
>  	int ret;
>
> +	if (bus->iommu_ops)
> +		ops = bus->iommu_ops;
> +	else
> +		ops = iommu_ops;
> +
> +	if (ops == NULL)
> +		return NULL;
> +
>  	domain = kmalloc(sizeof(*domain), GFP_KERNEL);
>  	if (!domain)
>  		return NULL;
>
> +	domain->ops = ops;
> +
>  	ret = iommu_ops->domain_init(domain);
>  	if (ret)
>  		goto out_free;
> diff --git a/drivers/media/video/omap3isp/isp.c b/drivers/media/video/omap3isp/isp.c
> index a4baa61..a7ed985 100644
> --- a/drivers/media/video/omap3isp/isp.c
> +++ b/drivers/media/video/omap3isp/isp.c
> @@ -2141,7 +2141,7 @@ static int isp_probe(struct platform_device *pdev)
>  	/* to be removed once iommu migration is complete */
>  	isp->iommu = to_iommu(isp->iommu_dev);
>
> -	isp->domain = iommu_domain_alloc();
> +	isp->domain = iommu_domain_alloc(pdev->dev.bus);
>  	if (!isp->domain) {
>  		dev_err(isp->dev, "can't alloc iommu domain\n");
>  		ret = -ENOMEM;
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 4739e36..3bd6892 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -25,10 +25,12 @@
>  #define IOMMU_WRITE	(2)
>  #define IOMMU_CACHE	(4)	/* DMA cache coherency */
>
> +struct iommu_ops;
>  struct bus_type;
>  struct device;
>
>  struct iommu_domain {
> +	struct iommu_ops *ops;
>  	void *priv;
>  };
>
> @@ -55,7 +57,7 @@ struct iommu_ops {
>  extern void register_iommu(struct iommu_ops *ops);
>  extern void iommu_bus_init(struct bus_type *bus, struct iommu_ops *ops);
>  extern bool iommu_found(void);
> -extern struct iommu_domain *iommu_domain_alloc(void);
> +extern struct iommu_domain *iommu_domain_alloc(struct bus_type *bus);
>  extern void iommu_domain_free(struct iommu_domain *domain);
>  extern int iommu_attach_device(struct iommu_domain *domain,
>  			       struct device *dev);
> @@ -79,7 +81,7 @@ static inline bool iommu_found(void)
>  	return false;
>  }
>
> -static inline struct iommu_domain *iommu_domain_alloc(void)
> +static inline struct iommu_domain *iommu_domain_alloc(struct bus_type *bus)
> {
>  	return NULL;
>  }
> diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c
> index 78c80f6..20115b1 100644
> --- a/virt/kvm/iommu.c
> +++ b/virt/kvm/iommu.c
> @@ -233,7 +233,7 @@ int kvm_iommu_map_guest(struct kvm *kvm)
>  		return -ENODEV;
>  	}
>
> -	kvm->arch.iommu_domain = iommu_domain_alloc();
> +	kvm->arch.iommu_domain = iommu_domain_alloc(&pci_bus_type);

Although it might require changes all the way up through the QEMU interface, it would certainly be nice if this could be made more extendable/generic in terms of the bus_type usage. This interface would be used by us (Freescale) for direct assignment of SoC devices sitting on the platform bus.

-Varun
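The ops selection in the patched iommu_domain_alloc() (prefer the bus's own iommu_ops, fall back to the legacy global registered via register_iommu(), and fail if neither exists) reduces to a tiny function. A standalone model with dummy types, purely to make the fallback order explicit:

```c
#include <stddef.h>

struct iommu_ops_model { int id; };

struct bus_type_model {
	struct iommu_ops_model *iommu_ops;	/* set via bus_set_iommu() */
};

/* stand-in for the single global ops registered via register_iommu() */
static struct iommu_ops_model *global_ops;

/*
 * Mirror of the selection logic in the patched iommu_domain_alloc():
 * per-bus ops win, the legacy global is the fallback, and NULL means no
 * IOMMU driver ever registered, so domain allocation must fail.
 */
static struct iommu_ops_model *pick_ops(struct bus_type_model *bus)
{
	if (bus->iommu_ops)
		return bus->iommu_ops;
	return global_ops;	/* may be NULL: caller then returns NULL */
}
```

Varun's point fits this shape directly: a Freescale platform-bus IOMMU would simply set iommu_ops on the platform bus, and callers passing their device's bus get the right driver without any PCI assumption.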
RE: [PATCH 02/10] Driver core: Add iommu_ops to bus_type
> -Original Message-
> From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of Joerg Roedel
> Sent: Thursday, September 08, 2011 2:12 PM
> To: Greg KH
> Cc: Joerg Roedel; io...@lists.linux-foundation.org; Alex Williamson; Ohad Ben-Cohen; David Woodhouse; David Brown; kvm@vger.kernel.org; linux-ker...@vger.kernel.org
> Subject: Re: [PATCH 02/10] Driver core: Add iommu_ops to bus_type
>
> On Wed, Sep 07, 2011 at 12:44:45PM -0700, Greg KH wrote:
> > On Wed, Sep 07, 2011 at 09:19:19PM +0200, Joerg Roedel wrote:
> > > Hi Greg,
> > >
> > > the bus_set_iommu() function will be called by the IOMMU driver.
> > > There can be different drivers for the same bus, depending on the
> > > hardware. On PCI for example, there can be the Intel or the AMD
> > > IOMMU driver that implement the iommu-api and that register for that bus.
> >
> > Why are you pushing this down into the driver core?  What other busses
> > besides PCI use/need this?
>
> Currently it is the platform_bus besides pci. The pci iommus are on x86
> and ia64 while all arm iommus use the platform_bus (by 'iommus' I only
> mean those implementing the iommu-api). Currently there are two drivers
> for arm iommus in /drivers/iommu.

We too have an IOMMU on Freescale SoCs (Power core based) sitting on the platform bus, and we are looking at integrating it with the Linux IOMMU API.

-Varun