RE: [PATCH 2/2] vfio: hugepage support for vfio_iommu_type1

2013-05-27 Thread Sethi Varun-B16395


> -Original Message-
> From: iommu-boun...@lists.linux-foundation.org [mailto:iommu-
> boun...@lists.linux-foundation.org] On Behalf Of Alex Williamson
> Sent: Friday, May 24, 2013 10:55 PM
> To: alex.william...@redhat.com
> Cc: io...@lists.linux-foundation.org; chegu_vi...@hp.com; qemu-
> de...@nongnu.org; kvm@vger.kernel.org; linux-ker...@vger.kernel.org
> Subject: [PATCH 2/2] vfio: hugepage support for vfio_iommu_type1
> 
> We currently send all mappings to the iommu in PAGE_SIZE chunks, which
> prevents the iommu from enabling support for larger page sizes.
> We still need to pin pages, which means we step through them in PAGE_SIZE
> chunks, but we can batch up contiguous physical memory chunks to allow
> the iommu the opportunity to use larger pages.  The approach here is a
> bit different than the one currently used for legacy KVM device
> assignment.  Rather than looking at the vma page size and using that as
> the maximum size to pass to the iommu, we instead simply look at whether
> the next page is physically contiguous.  This means we might ask the
> iommu to map a 4MB region, while legacy KVM might limit itself to a

[Sethi Varun-B16395] Wouldn't this depend on the IOMMU page alignment 
constraints?

> maximum of 2MB.
> 
> Splitting our mapping path also allows us to be smarter about locked
> memory because we can more easily unwind if the user attempts to exceed
> the limit.  Therefore, rather than assuming that a mapping will result in
> locked memory, we test each page as it is pinned to determine whether it
> locks RAM vs an mmap'd MMIO region.  This should result in better locking
> granularity and less locked page fudge factors in userspace.
> 
> The unmap path uses the same algorithm as legacy KVM.  We don't want to
> track the pfn for each mapping ourselves, but we need the pfn in order to
> unpin pages.  We therefore ask the iommu for the iova to physical address
> translation, ask it to unpin a page, and see how many pages were actually
> unpinned.  iommus supporting large pages will often return something
> bigger than a page here, which we know will be physically contiguous and
> we can unpin a batch of pfns.  iommus not supporting large mappings won't
> see an improvement in batching here as they only unmap a page at a time.
> 
> With this change, we also make a clarification to the API for mapping and
> unmapping DMA.  We can only guarantee unmaps at the same granularity as
> used for the original mapping.  In other words, unmapping a subregion of
> a previous mapping is not guaranteed and may result in a larger or
> smaller unmapping than requested.  The size field in the unmapping
> structure is updated to reflect this.
> Previously this was unmodified on mapping, always returning the
> requested unmap size.  This is now updated to return the actual unmap
> size on success, allowing userspace to appropriately track mappings.
> 
[Sethi Varun-B16395] The main problem here is that the user-space application 
is oblivious to the physical memory contiguity, right?


> Signed-off-by: Alex Williamson 
> ---
>  drivers/vfio/vfio_iommu_type1.c |  523 +--
> +static long vfio_unpin_pages(unsigned long pfn, long npage,
> +  int prot, bool do_accounting)
> +{
> + unsigned long unlocked = 0;
> + long i;
> +
> + for (i = 0; i < npage; i++)
> + unlocked += put_pfn(pfn++, prot);
> +
> + if (do_accounting)
> + vfio_lock_acct(-unlocked);
> +
> + return unlocked;
> +}
> +
> +static int vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma
> *dma,
> + dma_addr_t iova, size_t *size)
> +{
> + dma_addr_t start = iova, end = iova + *size;
> + long unlocked = 0;
> +
> + while (iova < end) {
> + size_t unmapped;
> + phys_addr_t phys;
> +
>   /*
> -  * Only add actual locked pages to accounting
> -  * XXX We're effectively marking a page locked for every
> -  * IOVA page even though it's possible the user could be
> -  * backing multiple IOVAs with the same vaddr.  This over-
> -  * penalizes the user process, but we currently have no
> -  * easy way to do this properly.
> +  * We use the IOMMU to track the physical address.  This
> +  * saves us from having a lot more entries in our mapping
> +  * tree.  The downside is that we don't track the size
> +  * used to do the mapping.  We request unmap of a single
> +  * page, but expect IOMMUs that support large pages to
> +

RE: [PATCH] vfio powerpc: enabled and supported on powernv platform

2012-11-22 Thread Sethi Varun-B16395
> +   /* Handle a single page request without allocation
> > +  of pages-to-release array */
> > +   if (pages == 1) {
> > +   spin_lock(&(pool->lock));
> > +   page = free_tce(tbl, entry);
> > +
> > +   if (direction != DMA_NONE)
> > +   ret = put_tce(tbl, entry, tce, direction);
> > +
> > +   tce_flush(tbl);
> > +
> > +   if (page)
> > +   put_page(page);
> > +
> > +   spin_unlock(&(pool->lock));
> > +   return ret;
> > +   }
> > +
> > +   /* Releasing multiple pages */
> > +   /* Allocate an array for pages to be released after TCE table
> > +  is updated */
> > +   oldpages = kmalloc(PAGE_SIZE, GFP_KERNEL);
> > +   if (!oldpages)
> > +   return -ENOMEM;
> > +
> > +   spin_lock(&(pool->lock));
> > +
> > +   for (i = 0; (i < pages) && !ret; ++i, ++entry, tce +=
> IOMMU_PAGE_SIZE) {
> > +   page = free_tce(tbl, entry);
> > +   if (page) {
> > +   oldpages[pages_to_put] = page;
> > +   ++pages_to_put;
> > +   }
> > +
> > +   if (direction != DMA_NONE)
> > +   ret = put_tce(tbl, entry, tce, direction);
> > +
> > +   /* Release old pages if we reached the end of oldpages[] or
> > +  it is the last page or we are about to exit the loop */
> > +   if ((pages_to_put == oldpagesnum) || (i == pages - 1) || ret)
> {
> > +   tce_flush(tbl);
> 
> Avoiding tce_flush() is the reason for all this extra overhead, right?
> I wonder if it'd be cleaner separating map vs unmap, where the map case
> can avoid the oldpages array... but that means inserting new mappings on
> top of old ones wouldn't put the pages.
> 
> > +
> > +   /* Release pages after removing them from TCE table */
> > +   while (pages_to_put) {
> > +   --pages_to_put;
> > +   put_page(oldpages[pages_to_put]);
> > +   }
> > +   }
> > +   }
> > +
> > +   spin_unlock(&(pool->lock));
> > +   kfree(oldpages);
> > +
> > +   return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_put_tces);
> > +#endif /* CONFIG_IOMMU_API */
> > diff --git a/arch/powerpc/platforms/powernv/pci.c
> > b/arch/powerpc/platforms/powernv/pci.c
> > index 05205cf..676f4d9 100644
> > --- a/arch/powerpc/platforms/powernv/pci.c
> > +++ b/arch/powerpc/platforms/powernv/pci.c
> > @@ -8,30 +8,31 @@
> >   * This program is free software; you can redistribute it and/or
> >   * modify it under the terms of the GNU General Public License
> >   * as published by the Free Software Foundation; either version
> >   * 2 of the License, or (at your option) any later version.
> >   */
> >
> >  #include 
> >  #include 
> >  #include 
> >  #include 
> >  #include 
> >  #include 
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >
> >  #include 
> >  #include 
> >  #include 
> >  #include 
> >  #include 
> >  #include 
> >  #include 
> >  #include 
> >  #include 
> >  #include 
> >  #include 
> >
> >  #include "powernv.h"
> >  #include "pci.h"
> > @@ -601,15 +602,149 @@ void __init pnv_pci_init(void)
> > /* Configure IOMMU DMA hooks */
> > ppc_md.pci_dma_dev_setup = pnv_pci_dma_dev_setup;
> > ppc_md.tce_build = pnv_tce_build;
> > ppc_md.tce_free = pnv_tce_free;
> > ppc_md.tce_get = pnv_tce_get;
> > ppc_md.pci_probe_mode = pnv_pci_probe_mode;
> > set_pci_dma_ops(&dma_iommu_ops);
> >
> > /* Configure MSIs */
> >  #ifdef CONFIG_PCI_MSI
> > ppc_md.msi_check_device = pnv_msi_check_device;
> > ppc_md.setup_msi_irqs = pnv_setup_msi_irqs;
> > ppc_md.teardown_msi_irqs = pnv_teardown_msi_irqs;
> > #endif
> > }
> > +
> > +#ifdef CONFIG_IOMMU_API
> > +/*
> > + * IOMMU groups support required by VFIO
> > + */
> > +static int add_device(struct device *dev)
> > +{
> > +   struct iommu_table *tbl;
> > +   int ret = 0;
> > +
> > +   if (WARN_ON(dev->iommu_group)) {
> > +   printk(KERN_WARNING "tce_vfio: device %s is already in iommu
> group %d, skipping\n",
> > +   dev->kobj.name,
> 
> dev_name(dev)
> 
> > +   

RE: [PATCH 07/20] KVM/MIPS32: Dynamic binary translation of select privileged instructions.

2012-11-01 Thread Sethi Varun-B16395


> -Original Message-
> From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On
> Behalf Of Avi Kivity
> Sent: Thursday, November 01, 2012 8:54 PM
> To: Sanjay Lal
> Cc: kvm@vger.kernel.org; linux-m...@linux-mips.org
> Subject: Re: [PATCH 07/20] KVM/MIPS32: Dynamic binary translation of
> select privileged instructions.
> 
> On 10/31/2012 05:19 PM, Sanjay Lal wrote:
> > Currently, the following instructions are translated:
> > - CACHE (indexed)
> > - CACHE (va based): translated to a synci, overkill on D-CACHE
> operations, but still much faster than a trap.
> > - mfc0/mtc0: the virtual COP0 registers for the guest are implemented
> as 2-D array
> >   [COP#][SEL] and this is mapped into the guest kernel address space @
> VA 0x0.
> >   mfc0/mtc0 operations are transformed to load/stores.
> >
> 
> Seems to be more of binary patching, yes?  Binary translation usually
> involves hiding the translated code so the guest is not able to detect
> that it is patched.
> 
Typically, a dynamic binary translation solution should also involve a 
mechanism to trace guest accesses to the modified pages. I don't think that 
support is present as part of this patch set. Do you plan to implement it?

-Varun

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 02/10] Driver core: Add iommu_ops to bus_type

2011-09-15 Thread Sethi Varun-B16395


> -Original Message-
> From: Roedel, Joerg [mailto:joerg.roe...@amd.com]
> Sent: Monday, September 12, 2011 6:06 PM
> To: Sethi Varun-B16395
> Cc: Joerg Roedel; Greg KH; io...@lists.linux-foundation.org; Alex
> Williamson; Ohad Ben-Cohen; David Woodhouse; David Brown;
> kvm@vger.kernel.org; linux-ker...@vger.kernel.org
> Subject: Re: [PATCH 02/10] Driver core: Add iommu_ops to bus_type
> 
> On Mon, Sep 12, 2011 at 08:08:41AM -0400, Sethi Varun-B16395 wrote:
> > > The IOMMUs are usually devices on the bus itself, so they are
> > > initialized after the bus is set up and the devices on it are
> > > populated.  So the function can not be called on bus initialization
> > > because the IOMMU is not ready at this point.
> > Well, at what point would the add_device_group (referring to patch set
> posted by Alex) call back be invoked?
> 
> The details are up to Alex Williamson. One option is to register a
> notifier for the bus in the iommu_bus_init() function and react to its
> notifications.
> I think in the end we will have a number of additional call-backs in the
> iommu_ops which are called by the notifier (or from the driver-core
> directly) to handle actions like added or removed devices. All the
> infrastructure for that which is implemented in the iommu-drivers today
> will then be in the iommu-core code.
I am not sure I understand this. As per your earlier statement, the IOMMU is 
a device on the bus, and its initialization happens when the bus is set up 
and the devices on it are populated. So, at what point would the device 
notifier invoke an IOMMU callback?

-Varun
 



RE: [PATCH 02/10] Driver core: Add iommu_ops to bus_type

2011-09-12 Thread Sethi Varun-B16395


> -Original Message-
> From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On
> Behalf Of Joerg Roedel
> Sent: Thursday, September 08, 2011 12:49 AM
> To: Greg KH
> Cc: Joerg Roedel; io...@lists.linux-foundation.org; Alex Williamson; Ohad
> Ben-Cohen; David Woodhouse; David Brown; kvm@vger.kernel.org; linux-
> ker...@vger.kernel.org
> Subject: Re: [PATCH 02/10] Driver core: Add iommu_ops to bus_type
> 
> Hi Greg,
> 
> the bus_set_iommu() function will be called by the IOMMU driver. There
> can be different drivers for the same bus, depending on the hardware. On
> PCI for example, there can be the Intel or the AMD IOMMU driver that
> implement the iommu-api and that register for that bus.
> 
> On Wed, Sep 07, 2011 at 11:47:50AM -0700, Greg KH wrote:
> > > +#ifdef CONFIG_IOMMU_API
> > > +int bus_set_iommu(struct bus_type *bus, struct iommu_ops *ops)
> > > +{
> > > + if (bus->iommu_ops != NULL)
> > > + return -EBUSY;
> >
> > Busy?
> 
> Yes, it signals to the IOMMU driver that another driver has already
> registered for that bus. In the previous register_iommu() interface this
> was just a BUG(), but I think returning an error to the caller is better.
> It can be turned back into a BUG() if it is considered better, though.
> 
> > > +
> > > + bus->iommu_ops = ops;
> > > +
> > > + /* Do IOMMU specific setup for this bus-type */
> > > + iommu_bus_init(bus, ops);
> > > +
> > > + return 0;
> > > +}
> > > +EXPORT_SYMBOL_GPL(bus_set_iommu);
> >
> > I don't understand what this function is for, and who would call it.
> 
> It is called by the IOMMU driver.
> 
> > Please provide kerneldoc that explains this.
> 
> Will do.
> 
> > > @@ -67,6 +68,9 @@ extern void bus_remove_file(struct bus_type *,
> struct bus_attribute *);
> > >   * @resume:  Called to bring a device on this bus out of sleep mode.
> > >   * @pm:  Power management operations of this bus, callback
> the specific
> > >   *   device driver's pm-ops.
> > > + * @iommu_ops   IOMMU specific operations for this bus, used to
> attach IOMMU
> > > + *  driver implementations to a bus and allow the driver
> to do
> > > + *  bus-specific setup
> >
> > So why is this just not set by the bus itself, making the above
> > function not needed at all?
> 
> The IOMMUs are usually devices on the bus itself, so they are initialized
> after the bus is set up and the devices on it are populated.  So the
> function can not be called on bus initialization because the IOMMU is not
> ready at this point.
Well, at what point would the add_device_group (referring to patch set posted 
by Alex) call back be invoked?

-Varun




RE: [PATCH 03/10] iommu/core: Add bus_type parameter to iommu_domain_alloc

2011-09-12 Thread Sethi Varun-B16395


> -Original Message-
> From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On
> Behalf Of Joerg Roedel
> Sent: Wednesday, September 07, 2011 9:12 PM
> To: io...@lists.linux-foundation.org
> Cc: Greg Kroah-Hartman; Alex Williamson; Ohad Ben-Cohen; David Woodhouse;
> David Brown; j...@8bytes.org; kvm@vger.kernel.org; linux-
> ker...@vger.kernel.org; Joerg Roedel
> Subject: [PATCH 03/10] iommu/core: Add bus_type parameter to
> iommu_domain_alloc
> 
> This is necessary to store a pointer to the bus-specific iommu_ops in the
> iommu-domain structure. It will be used later to call into bus-specific
> iommu-ops.
> 
> Signed-off-by: Joerg Roedel 
> ---
>  drivers/iommu/iommu.c  |   14 +-
>  drivers/media/video/omap3isp/isp.c |2 +-
>  include/linux/iommu.h  |6 --
>  virt/kvm/iommu.c   |2 +-
>  4 files changed, 19 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index
> 3b24a5b..adaee9b 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -16,6 +16,7 @@
>   * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307
> USA
>   */
> 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -44,15 +45,26 @@ bool iommu_found(void)
>  }
> EXPORT_SYMBOL_GPL(iommu_found);
> 
> -struct iommu_domain *iommu_domain_alloc(void)
> +struct iommu_domain *iommu_domain_alloc(struct bus_type *bus)
>  {
>   struct iommu_domain *domain;
> + struct iommu_ops *ops;
>   int ret;
> 
> + if (bus->iommu_ops)
> + ops = bus->iommu_ops;
> + else
> + ops = iommu_ops;
> +
> + if (ops == NULL)
> + return NULL;
> +
>   domain = kmalloc(sizeof(*domain), GFP_KERNEL);
>   if (!domain)
>   return NULL;
> 
> + domain->ops = ops;
> +
>   ret = iommu_ops->domain_init(domain);
>   if (ret)
>   goto out_free;
> diff --git a/drivers/media/video/omap3isp/isp.c
> b/drivers/media/video/omap3isp/isp.c
> index a4baa61..a7ed985 100644
> --- a/drivers/media/video/omap3isp/isp.c
> +++ b/drivers/media/video/omap3isp/isp.c
> @@ -2141,7 +2141,7 @@ static int isp_probe(struct platform_device *pdev)
>   /* to be removed once iommu migration is complete */
>   isp->iommu = to_iommu(isp->iommu_dev);
> 
> - isp->domain = iommu_domain_alloc();
> + isp->domain = iommu_domain_alloc(pdev->dev.bus);
>   if (!isp->domain) {
>   dev_err(isp->dev, "can't alloc iommu domain\n");
>   ret = -ENOMEM;
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h index
> 4739e36..3bd6892 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -25,10 +25,12 @@
>  #define IOMMU_WRITE  (2)
>  #define IOMMU_CACHE  (4) /* DMA cache coherency */
> 
> +struct iommu_ops;
>  struct bus_type;
>  struct device;
> 
>  struct iommu_domain {
> + struct iommu_ops *ops;
>   void *priv;
>  };
> 
> @@ -55,7 +57,7 @@ struct iommu_ops {
>  extern void register_iommu(struct iommu_ops *ops);  extern void
> iommu_bus_init(struct bus_type *bus, struct iommu_ops *ops);  extern bool
> iommu_found(void); -extern struct iommu_domain *iommu_domain_alloc(void);
> +extern struct iommu_domain *iommu_domain_alloc(struct bus_type *bus);
>  extern void iommu_domain_free(struct iommu_domain *domain);  extern int
> iommu_attach_device(struct iommu_domain *domain,
>  struct device *dev);
> @@ -79,7 +81,7 @@ static inline bool iommu_found(void)
>   return false;
>  }
> 
> -static inline struct iommu_domain *iommu_domain_alloc(void)
> +static inline struct iommu_domain *iommu_domain_alloc(struct bus_type
> +*bus)
>  {
>   return NULL;
>  }
> diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c index 78c80f6..20115b1
> 100644
> --- a/virt/kvm/iommu.c
> +++ b/virt/kvm/iommu.c
> @@ -233,7 +233,7 @@ int kvm_iommu_map_guest(struct kvm *kvm)
>   return -ENODEV;
>   }
> 
> - kvm->arch.iommu_domain = iommu_domain_alloc();
> + kvm->arch.iommu_domain = iommu_domain_alloc(&pci_bus_type);

Although it might require changes starting all the way from the QEMU 
interface, it would certainly be nice if this could be made more 
extendable/generic in terms of the bus_type usage. This interface would be 
used by us (Freescale) for direct assignment of SoC devices sitting on the 
platform bus.

-Varun



RE: [PATCH 02/10] Driver core: Add iommu_ops to bus_type

2011-09-12 Thread Sethi Varun-B16395


> -Original Message-
> From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On
> Behalf Of Joerg Roedel
> Sent: Thursday, September 08, 2011 2:12 PM
> To: Greg KH
> Cc: Joerg Roedel; io...@lists.linux-foundation.org; Alex Williamson; Ohad
> Ben-Cohen; David Woodhouse; David Brown; kvm@vger.kernel.org; linux-
> ker...@vger.kernel.org
> Subject: Re: [PATCH 02/10] Driver core: Add iommu_ops to bus_type
> 
> On Wed, Sep 07, 2011 at 12:44:45PM -0700, Greg KH wrote:
> > On Wed, Sep 07, 2011 at 09:19:19PM +0200, Joerg Roedel wrote:
> > > Hi Greg,
> > >
> > > the bus_set_iommu() function will be called by the IOMMU driver.
> > > There can be different drivers for the same bus, depending on the
> > > hardware. On PCI for example, there can be the Intel or the AMD
> > > IOMMU driver that implement the iommu-api and that register for that
> bus.
> >
> > Why are you pushing this down into the driver core?  What other busses
> > becides PCI use/need this?
> 
> Currently it is the platform_bus besides pci. The pci iommus are on x86
> and ia64 while all arm iommus use the platform_bus (by 'iommus' I only
> mean those implementing the iommu-api). Currently there are two drivers
> for arm iommus in /drivers/iommu.
> 
We too have an IOMMU on Freescale SoCs (Power core based) sitting on the 
platform bus, and we are looking at integrating it with the Linux IOMMU API.

-Varun
