Re: [PATCH 1/2] vfio/type1: Adopt fast IOTLB flush interface when unmap IOVAs

2017-11-17 Thread Alex Williamson
On Fri, 17 Nov 2017 14:51:52 -0700
Alex Williamson  wrote:

> On Fri, 17 Nov 2017 15:11:19 -0600
> Suravee Suthikulpanit  wrote:
> 
> > From: Suravee Suthikulpanit 
> > 
> > VFIO IOMMU type1 currently unmaps IOVA pages synchronously, which requires
> > IOTLB flushing for every unmapping. This results in large IOTLB flushing
> > overhead when handling pass-through devices with a large number of mapped
> > IOVAs (e.g. GPUs).  
> 
> Of course the type of device is really irrelevant, QEMU maps the entire
> VM address space for any assigned device.
> 
> > This can be avoided by using the new IOTLB flushing interface.
> > 
> > Cc: Alex Williamson 
> > Cc: Joerg Roedel 
> > Signed-off-by: Suravee Suthikulpanit 
> > ---
> >  drivers/vfio/vfio_iommu_type1.c | 12 +---
> >  1 file changed, 9 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/vfio/vfio_iommu_type1.c 
> > b/drivers/vfio/vfio_iommu_type1.c
> > index 92155cc..28a7ab6 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -698,10 +698,12 @@ static long vfio_unmap_unpin(struct vfio_iommu 
> > *iommu, struct vfio_dma *dma,
> > break;
> > }
> >  
> > -   unmapped = iommu_unmap(domain->domain, iova, len);
> > +   unmapped = iommu_unmap_fast(domain->domain, iova, len);
> > if (WARN_ON(!unmapped))
> > break;
> >  
> > +   iommu_tlb_range_add(domain->domain, iova, len);
> > +  
> 
> We should only add @unmapped, not @len, right?

Actually, the problems are deeper than that: if we can't guarantee that
the above iommu_unmap_fast() has removed the IOMMU mapping, then we can't
do the unpin below, as that would potentially allow the device access to
unknown memory.  Thus, to support this, the unpinning would need to be
pushed until after the sync, and we therefore need some mechanism of
remembering the phys addresses that we've unmapped.  Thanks,

Alex
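
[ For illustration only -- not part of the posted patch: one way to defer
  the unpinning described above is to remember each unmapped range and its
  physical address on a local list, then unpin after iommu_tlb_sync().
  All names below are made up for the sketch.

	struct vfio_unmapped_range {
		struct list_head list;
		dma_addr_t iova;
		phys_addr_t phys;
		size_t len;
	};

	/* record a just-unmapped range; the actual unpin is deferred */
	static int vfio_defer_unpin(struct list_head *ranges, dma_addr_t iova,
				    phys_addr_t phys, size_t len)
	{
		struct vfio_unmapped_range *r = kzalloc(sizeof(*r), GFP_KERNEL);

		if (!r)
			return -ENOMEM;	/* caller falls back to synchronous unmap */
		r->iova = iova;
		r->phys = phys;
		r->len = len;
		list_add_tail(&r->list, ranges);
		return 0;
	}

  After iommu_tlb_sync(domain->domain), walk the list, call
  vfio_unpin_pages_remote() for each entry and free it. ]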
 
> > unlocked += vfio_unpin_pages_remote(dma, iova,
> > phys >> PAGE_SHIFT,
> > unmapped >> PAGE_SHIFT,
> > @@ -710,6 +712,7 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, 
> > struct vfio_dma *dma,
> >  
> > cond_resched();
> > }
> > +   iommu_tlb_sync(domain->domain);
> >  
> > dma->iommu_mapped = false;
> > if (do_accounting) {
> > @@ -884,8 +887,11 @@ static int map_try_harder(struct vfio_domain *domain, 
> > dma_addr_t iova,
> > break;
> > }
> >  
> > -   for (; i < npage && i > 0; i--, iova -= PAGE_SIZE)
> > -   iommu_unmap(domain->domain, iova, PAGE_SIZE);
> > +   for (; i < npage && i > 0; i--, iova -= PAGE_SIZE) {
> > +   iommu_unmap_fast(domain->domain, iova, PAGE_SIZE);
> > +   iommu_tlb_range_add(domain->domain, iova, PAGE_SIZE);
> > +   }
> > +   iommu_tlb_sync(domain->domain);
> >  
> > return ret;
> >  }  
> 


Re: [PATCH 2/2] iommu/amd: Add support for fast IOTLB flushing

2017-11-17 Thread Tom Lendacky

On 11/17/2017 3:11 PM, Suravee Suthikulpanit wrote:

From: Suravee Suthikulpanit 

Implement the newly added IOTLB flushing interface by introducing
per-protection-domain IOTLB flush list, which maintains a list of
IOVAs to be invalidated (by INVALIDATE_IOTLB_PAGES command) during
IOTLB sync.

Cc: Joerg Roedel 
Signed-off-by: Suravee Suthikulpanit 
---
  drivers/iommu/amd_iommu.c   | 77 -
  drivers/iommu/amd_iommu_init.c  |  2 --
  drivers/iommu/amd_iommu_types.h |  2 ++
  3 files changed, 78 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 8e8874d..bf92809 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -130,6 +130,12 @@ struct dma_ops_domain {
  static struct iova_domain reserved_iova_ranges;
  static struct lock_class_key reserved_rbtree_key;
  
+struct iotlb_flush_entry {
+   struct list_head list;
+   unsigned long iova;
+   size_t size;
+};
+
  /****************************************************************************
   *
   * Helper functions
@@ -2838,11 +2844,13 @@ static void protection_domain_free(struct 
protection_domain *domain)
  static int protection_domain_init(struct protection_domain *domain)
  {
	spin_lock_init(&domain->lock);
+   spin_lock_init(&domain->iotlb_flush_list_lock);
	mutex_init(&domain->api_lock);
	domain->id = domain_id_alloc();
	if (!domain->id)
	return -ENOMEM;
	INIT_LIST_HEAD(&domain->dev_list);
+   INIT_LIST_HEAD(&domain->iotlb_flush_list);
  
  	return 0;

  }
@@ -3047,7 +3055,6 @@ static size_t amd_iommu_unmap(struct iommu_domain *dom, 
unsigned long iova,
unmap_size = iommu_unmap_page(domain, iova, page_size);
	mutex_unlock(&domain->api_lock);
  
-	domain_flush_tlb_pde(domain);

domain_flush_complete(domain);
  
  	return unmap_size;

@@ -3167,6 +3174,71 @@ static bool amd_iommu_is_attach_deferred(struct 
iommu_domain *domain,
return dev_data->defer_attach;
  }
  
+static void amd_iommu_flush_iotlb_all(struct iommu_domain *domain)

+{
+   struct protection_domain *dom = to_pdomain(domain);
+
+   domain_flush_tlb_pde(dom);
+}
+
+static void amd_iommu_iotlb_range_add(struct iommu_domain *domain,
+ unsigned long iova, size_t size)
+{
+   struct protection_domain *pdom = to_pdomain(domain);
+   struct iotlb_flush_entry *entry, *p;
+   unsigned long flags;
+   bool found = false;
+
+   spin_lock_irqsave(&pdom->iotlb_flush_list_lock, flags);
+   list_for_each_entry(p, &pdom->iotlb_flush_list, list) {
+   if (iova != p->iova)
+   continue;
+
+   if (size > p->size) {
+   p->size = size;
+   pr_debug("%s: update range: iova=%#lx, size = %#lx\n",
+__func__, p->iova, p->size);
+   }
+   found = true;
+   break;
+   }
+
+   if (!found) {
+   entry = kzalloc(sizeof(struct iotlb_flush_entry),
+   GFP_ATOMIC);
+   if (!entry)
+   return;


You need to release the spinlock before returning here.

Thanks,
Tom
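
That is, the error path would need something along the lines of:

		if (!entry) {
			spin_unlock_irqrestore(&pdom->iotlb_flush_list_lock,
					       flags);
			return;
		}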


+
+   pr_debug("%s: new range: iova=%lx, size=%#lx\n",
+__func__, iova, size);
+
+   entry->iova = iova;
+   entry->size = size;
+   list_add(&entry->list, &pdom->iotlb_flush_list);
+   }
+   spin_unlock_irqrestore(&pdom->iotlb_flush_list_lock, flags);
+}
+
+static void amd_iommu_iotlb_sync(struct iommu_domain *domain)
+{
+   struct protection_domain *pdom = to_pdomain(domain);
+   struct iotlb_flush_entry *entry, *next;
+   unsigned long flags;
+
+   /* Note:
+* Currently, IOMMU driver just flushes the whole IO/TLB for
+* a given domain. So, just remove entries from the list here.
+*/
+   spin_lock_irqsave(&pdom->iotlb_flush_list_lock, flags);
+   list_for_each_entry_safe(entry, next, &pdom->iotlb_flush_list, list) {
+   list_del(&entry->list);
+   kfree(entry);
+   }
+   spin_unlock_irqrestore(&pdom->iotlb_flush_list_lock, flags);
+
+   domain_flush_tlb_pde(pdom);
+}
+
  const struct iommu_ops amd_iommu_ops = {
.capable = amd_iommu_capable,
.domain_alloc = amd_iommu_domain_alloc,
@@ -3185,6 +3257,9 @@ static bool amd_iommu_is_attach_deferred(struct 
iommu_domain *domain,
.apply_resv_region = amd_iommu_apply_resv_region,
.is_attach_deferred = amd_iommu_is_attach_deferred,
.pgsize_bitmap  = AMD_IOMMU_PGSIZES,
+   .flush_iotlb_all = amd_iommu_flush_iotlb_all,
+   .iotlb_range_add = amd_iommu_iotlb_range_add,
+   .iotlb_sync = amd_iommu_iotlb_sync,
  };
  
  /*

diff --git 

Re: [PATCH 1/2] vfio/type1: Adopt fast IOTLB flush interface when unmap IOVAs

2017-11-17 Thread Alex Williamson
On Fri, 17 Nov 2017 15:11:19 -0600
Suravee Suthikulpanit  wrote:

> From: Suravee Suthikulpanit 
> 
> VFIO IOMMU type1 currently unmaps IOVA pages synchronously, which requires
> IOTLB flushing for every unmapping. This results in large IOTLB flushing
> overhead when handling pass-through devices with a large number of mapped
> IOVAs (e.g. GPUs).

Of course the type of device is really irrelevant, QEMU maps the entire
VM address space for any assigned device.

> This can be avoided by using the new IOTLB flushing interface.
> 
> Cc: Alex Williamson 
> Cc: Joerg Roedel 
> Signed-off-by: Suravee Suthikulpanit 
> ---
>  drivers/vfio/vfio_iommu_type1.c | 12 +---
>  1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 92155cc..28a7ab6 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -698,10 +698,12 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, 
> struct vfio_dma *dma,
>   break;
>   }
>  
> - unmapped = iommu_unmap(domain->domain, iova, len);
> + unmapped = iommu_unmap_fast(domain->domain, iova, len);
>   if (WARN_ON(!unmapped))
>   break;
>  
> + iommu_tlb_range_add(domain->domain, iova, len);
> +

We should only add @unmapped, not @len, right?

>   unlocked += vfio_unpin_pages_remote(dma, iova,
>   phys >> PAGE_SHIFT,
>   unmapped >> PAGE_SHIFT,
> @@ -710,6 +712,7 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, 
> struct vfio_dma *dma,
>  
>   cond_resched();
>   }
> + iommu_tlb_sync(domain->domain);
>  
>   dma->iommu_mapped = false;
>   if (do_accounting) {
> @@ -884,8 +887,11 @@ static int map_try_harder(struct vfio_domain *domain, 
> dma_addr_t iova,
>   break;
>   }
>  
> - for (; i < npage && i > 0; i--, iova -= PAGE_SIZE)
> - iommu_unmap(domain->domain, iova, PAGE_SIZE);
> + for (; i < npage && i > 0; i--, iova -= PAGE_SIZE) {
> + iommu_unmap_fast(domain->domain, iova, PAGE_SIZE);
> + iommu_tlb_range_add(domain->domain, iova, PAGE_SIZE);
> + }
> + iommu_tlb_sync(domain->domain);
>  
>   return ret;
>  }



[PATCH 1/2] vfio/type1: Adopt fast IOTLB flush interface when unmap IOVAs

2017-11-17 Thread Suravee Suthikulpanit
From: Suravee Suthikulpanit 

VFIO IOMMU type1 currently unmaps IOVA pages synchronously, which requires
IOTLB flushing for every unmapping. This results in large IOTLB flushing
overhead when handling pass-through devices with a large number of mapped
IOVAs (e.g. GPUs).

This can be avoided by using the new IOTLB flushing interface.

Cc: Alex Williamson 
Cc: Joerg Roedel 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/vfio/vfio_iommu_type1.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 92155cc..28a7ab6 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -698,10 +698,12 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, 
struct vfio_dma *dma,
break;
}
 
-   unmapped = iommu_unmap(domain->domain, iova, len);
+   unmapped = iommu_unmap_fast(domain->domain, iova, len);
if (WARN_ON(!unmapped))
break;
 
+   iommu_tlb_range_add(domain->domain, iova, len);
+
unlocked += vfio_unpin_pages_remote(dma, iova,
phys >> PAGE_SHIFT,
unmapped >> PAGE_SHIFT,
@@ -710,6 +712,7 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, 
struct vfio_dma *dma,
 
cond_resched();
}
+   iommu_tlb_sync(domain->domain);
 
dma->iommu_mapped = false;
if (do_accounting) {
@@ -884,8 +887,11 @@ static int map_try_harder(struct vfio_domain *domain, 
dma_addr_t iova,
break;
}
 
-   for (; i < npage && i > 0; i--, iova -= PAGE_SIZE)
-   iommu_unmap(domain->domain, iova, PAGE_SIZE);
+   for (; i < npage && i > 0; i--, iova -= PAGE_SIZE) {
+   iommu_unmap_fast(domain->domain, iova, PAGE_SIZE);
+   iommu_tlb_range_add(domain->domain, iova, PAGE_SIZE);
+   }
+   iommu_tlb_sync(domain->domain);
 
return ret;
 }
-- 
1.8.3.1



[PATCH 0/2] Reduce IOTLB flush when pass-through dGPU devices

2017-11-17 Thread Suravee Suthikulpanit
From: Suravee Suthikulpanit 

Currently, when passing through a dGPU to a guest VM, there are thousands
of IOTLB flush commands sent from the IOMMU to the end-point device. This causes
a performance issue when launching new VMs, and could cause IOTLB invalidation
time-out issues on certain dGPUs.

This can be avoided by adopting the new fast IOTLB flush APIs.

Cc: Alex Williamson 
Cc: Joerg Roedel 

Suravee Suthikulpanit (2):
  vfio/type1: Adopt fast IOTLB flush interface when unmap IOVAs
  iommu/amd: Add support for fast IOTLB flushing

 drivers/iommu/amd_iommu.c   | 77 -
 drivers/iommu/amd_iommu_init.c  |  2 --
 drivers/iommu/amd_iommu_types.h |  2 ++
 drivers/vfio/vfio_iommu_type1.c | 12 +--
 4 files changed, 87 insertions(+), 6 deletions(-)

-- 
1.8.3.1



[PATCH 2/2] iommu/amd: Add support for fast IOTLB flushing

2017-11-17 Thread Suravee Suthikulpanit
From: Suravee Suthikulpanit 

Implement the newly added IOTLB flushing interface by introducing
per-protection-domain IOTLB flush list, which maintains a list of
IOVAs to be invalidated (by INVALIDATE_IOTLB_PAGES command) during
IOTLB sync.

Cc: Joerg Roedel 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd_iommu.c   | 77 -
 drivers/iommu/amd_iommu_init.c  |  2 --
 drivers/iommu/amd_iommu_types.h |  2 ++
 3 files changed, 78 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 8e8874d..bf92809 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -130,6 +130,12 @@ struct dma_ops_domain {
 static struct iova_domain reserved_iova_ranges;
 static struct lock_class_key reserved_rbtree_key;
 
+struct iotlb_flush_entry {
+   struct list_head list;
+   unsigned long iova;
+   size_t size;
+};
+
 /****************************************************************************
  *
  * Helper functions
@@ -2838,11 +2844,13 @@ static void protection_domain_free(struct 
protection_domain *domain)
 static int protection_domain_init(struct protection_domain *domain)
 {
 spin_lock_init(&domain->lock);
+   spin_lock_init(&domain->iotlb_flush_list_lock);
 mutex_init(&domain->api_lock);
 domain->id = domain_id_alloc();
 if (!domain->id)
 return -ENOMEM;
 INIT_LIST_HEAD(&domain->dev_list);
+   INIT_LIST_HEAD(&domain->iotlb_flush_list);
 
return 0;
 }
@@ -3047,7 +3055,6 @@ static size_t amd_iommu_unmap(struct iommu_domain *dom, 
unsigned long iova,
unmap_size = iommu_unmap_page(domain, iova, page_size);
 mutex_unlock(&domain->api_lock);
 
-   domain_flush_tlb_pde(domain);
domain_flush_complete(domain);
 
return unmap_size;
@@ -3167,6 +3174,71 @@ static bool amd_iommu_is_attach_deferred(struct 
iommu_domain *domain,
return dev_data->defer_attach;
 }
 
+static void amd_iommu_flush_iotlb_all(struct iommu_domain *domain)
+{
+   struct protection_domain *dom = to_pdomain(domain);
+
+   domain_flush_tlb_pde(dom);
+}
+
+static void amd_iommu_iotlb_range_add(struct iommu_domain *domain,
+ unsigned long iova, size_t size)
+{
+   struct protection_domain *pdom = to_pdomain(domain);
+   struct iotlb_flush_entry *entry, *p;
+   unsigned long flags;
+   bool found = false;
+
+   spin_lock_irqsave(&pdom->iotlb_flush_list_lock, flags);
+   list_for_each_entry(p, &pdom->iotlb_flush_list, list) {
+   if (iova != p->iova)
+   continue;
+
+   if (size > p->size) {
+   p->size = size;
+   pr_debug("%s: update range: iova=%#lx, size = %#lx\n",
+__func__, p->iova, p->size);
+   }
+   found = true;
+   break;
+   }
+
+   if (!found) {
+   entry = kzalloc(sizeof(struct iotlb_flush_entry),
+   GFP_ATOMIC);
+   if (!entry)
+   return;
+
+   pr_debug("%s: new range: iova=%lx, size=%#lx\n",
+__func__, iova, size);
+
+   entry->iova = iova;
+   entry->size = size;
+   list_add(&entry->list, &pdom->iotlb_flush_list);
+   }
+   spin_unlock_irqrestore(&pdom->iotlb_flush_list_lock, flags);
+}
+
+static void amd_iommu_iotlb_sync(struct iommu_domain *domain)
+{
+   struct protection_domain *pdom = to_pdomain(domain);
+   struct iotlb_flush_entry *entry, *next;
+   unsigned long flags;
+
+   /* Note:
+* Currently, IOMMU driver just flushes the whole IO/TLB for
+* a given domain. So, just remove entries from the list here.
+*/
+   spin_lock_irqsave(&pdom->iotlb_flush_list_lock, flags);
+   list_for_each_entry_safe(entry, next, &pdom->iotlb_flush_list, list) {
+   list_del(&entry->list);
+   kfree(entry);
+   }
+   spin_unlock_irqrestore(&pdom->iotlb_flush_list_lock, flags);
+
+   domain_flush_tlb_pde(pdom);
+}
+
 const struct iommu_ops amd_iommu_ops = {
.capable = amd_iommu_capable,
.domain_alloc = amd_iommu_domain_alloc,
@@ -3185,6 +3257,9 @@ static bool amd_iommu_is_attach_deferred(struct 
iommu_domain *domain,
.apply_resv_region = amd_iommu_apply_resv_region,
.is_attach_deferred = amd_iommu_is_attach_deferred,
.pgsize_bitmap  = AMD_IOMMU_PGSIZES,
+   .flush_iotlb_all = amd_iommu_flush_iotlb_all,
+   .iotlb_range_add = amd_iommu_iotlb_range_add,
+   .iotlb_sync = amd_iommu_iotlb_sync,
 };
 
 /*
diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 6fe2d03..1659377 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ 

[PATCH v3 12/16] iommu/vt-d: report unrecoverable device faults

2017-11-17 Thread Jacob Pan
Currently, when device DMA faults are detected by the IOMMU, the fault
reasons are printed but the driver of the offending device is not
involved in fault handling.
This patch uses the per-device fault reporting API to send fault event
data for further processing.
The offending device is identified by the source ID in the VT-d fault
reason report registers.

Signed-off-by: Liu, Yi L 
Signed-off-by: Jacob Pan 
Signed-off-by: Ashok Raj 
---
 drivers/iommu/dmar.c | 94 +++-
 1 file changed, 93 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 38ee91b..b1f67fc2 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1555,6 +1555,31 @@ static const char *irq_remap_fault_reasons[] =
"Blocked an interrupt request due to source-id verification failure",
 };
 
+/* fault data and status */
+enum intel_iommu_fault_reason {
+   INTEL_IOMMU_FAULT_REASON_SW,
+   INTEL_IOMMU_FAULT_REASON_ROOT_NOT_PRESENT,
+   INTEL_IOMMU_FAULT_REASON_CONTEXT_NOT_PRESENT,
+   INTEL_IOMMU_FAULT_REASON_CONTEXT_INVALID,
+   INTEL_IOMMU_FAULT_REASON_BEYOND_ADDR_WIDTH,
+   INTEL_IOMMU_FAULT_REASON_PTE_WRITE_ACCESS,
+   INTEL_IOMMU_FAULT_REASON_PTE_READ_ACCESS,
+   INTEL_IOMMU_FAULT_REASON_NEXT_PT_INVALID,
+   INTEL_IOMMU_FAULT_REASON_ROOT_ADDR_INVALID,
+   INTEL_IOMMU_FAULT_REASON_CONTEXT_PTR_INVALID,
+   INTEL_IOMMU_FAULT_REASON_NONE_ZERO_RTP,
+   INTEL_IOMMU_FAULT_REASON_NONE_ZERO_CTP,
+   INTEL_IOMMU_FAULT_REASON_NONE_ZERO_PTE,
+   NR_INTEL_IOMMU_FAULT_REASON,
+};
+
+/* fault reasons that are allowed to be reported outside IOMMU subsystem */
+#define INTEL_IOMMU_FAULT_REASON_ALLOWED   \
+   ((1ULL << INTEL_IOMMU_FAULT_REASON_BEYOND_ADDR_WIDTH) | \
+   (1ULL << INTEL_IOMMU_FAULT_REASON_PTE_WRITE_ACCESS) |   \
+   (1ULL << INTEL_IOMMU_FAULT_REASON_PTE_READ_ACCESS))
+
+
 static const char *dmar_get_fault_reason(u8 fault_reason, int *fault_type)
 {
if (fault_reason >= 0x20 && (fault_reason - 0x20 <
@@ -1635,6 +1660,69 @@ void dmar_msi_read(int irq, struct msi_msg *msg)
	raw_spin_unlock_irqrestore(&iommu->register_lock, flag);
 }
 
+static enum iommu_fault_reason to_iommu_fault_reason(u8 reason)
+{
+   if (reason >= NR_INTEL_IOMMU_FAULT_REASON) {
+   pr_warn("unknown DMAR fault reason %d\n", reason);
+   return IOMMU_FAULT_REASON_UNKNOWN;
+   }
+   switch (reason) {
+   case INTEL_IOMMU_FAULT_REASON_SW:
+   case INTEL_IOMMU_FAULT_REASON_ROOT_NOT_PRESENT:
+   case INTEL_IOMMU_FAULT_REASON_CONTEXT_NOT_PRESENT:
+   case INTEL_IOMMU_FAULT_REASON_CONTEXT_INVALID:
+   case INTEL_IOMMU_FAULT_REASON_BEYOND_ADDR_WIDTH:
+   case INTEL_IOMMU_FAULT_REASON_ROOT_ADDR_INVALID:
+   case INTEL_IOMMU_FAULT_REASON_CONTEXT_PTR_INVALID:
+   return IOMMU_FAULT_REASON_INTERNAL;
+   case INTEL_IOMMU_FAULT_REASON_NEXT_PT_INVALID:
+   case INTEL_IOMMU_FAULT_REASON_PTE_WRITE_ACCESS:
+   case INTEL_IOMMU_FAULT_REASON_PTE_READ_ACCESS:
+   return IOMMU_FAULT_REASON_PERMISSION;
+   default:
+   return IOMMU_FAULT_REASON_UNKNOWN;
+   }
+}
+
+static void report_fault_to_device(struct intel_iommu *iommu, u64 addr, int 
type,
+   int fault_type, enum intel_iommu_fault_reason 
reason, u16 sid)
+{
+   struct iommu_fault_event event;
+   struct pci_dev *pdev;
+   u8 bus, devfn;
+
+   /* check if fault reason is worth reporting outside IOMMU */
+   if (!((1 << reason) & INTEL_IOMMU_FAULT_REASON_ALLOWED)) {
+   pr_debug("Fault reason %d not allowed to report to device\n",
+   reason);
+   return;
+   }
+
+   bus = PCI_BUS_NUM(sid);
+   devfn = PCI_DEVFN(PCI_SLOT(sid), PCI_FUNC(sid));
+   /*
+* we need to check if the fault reporting is requested for the
+* offending device.
+*/
+   pdev = pci_get_bus_and_slot(bus, devfn);
+   if (!pdev) {
+   pr_warn("No PCI device found for source ID %x\n", sid);
+   return;
+   }
+   /*
+* unrecoverable fault is reported per IOMMU, notifier handler can
+* resolve PCI device based on source ID.
+*/
+   event.reason = to_iommu_fault_reason(reason);
+   event.addr = addr;
+   event.type = IOMMU_FAULT_DMA_UNRECOV;
+   event.prot = type ? IOMMU_READ : IOMMU_WRITE;
+   dev_warn(&pdev->dev, "report device unrecoverable fault: %d, %x, %d\n",
+   event.reason, sid, event.type);
+   iommu_report_device_fault(&pdev->dev, &event);
+   pci_dev_put(pdev);
+}
+
 static int dmar_fault_do_one(struct intel_iommu *iommu, int type,
u8 fault_reason, u16 source_id, unsigned long long addr)
 {
@@ -1648,11 +1736,15 @@ static int dmar_fault_do_one(struct 

[PATCH v3 14/16] iommu/intel-svm: replace dev ops with fault report API

2017-11-17 Thread Jacob Pan
With the introduction of generic IOMMU device fault reporting API, we
can replace the private fault callback functions with standard function
and event data.
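
With this change a driver binds a PASID without passing callbacks; fault
notifications are instead obtained through the generic per-device handler
introduced earlier in this series. Roughly (handler and private data names
below are illustrative only):

	ret = intel_svm_bind_mm(dev, &pasid, 0);

	/* replaces svm_dev_ops->fault_cb */
	ret = iommu_register_device_fault_handler(dev, my_fault_handler, my_data);

where my_fault_handler has the iommu_dev_fault_handler_t signature,
i.e. int (*)(struct iommu_fault_event *, void *).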

Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel-svm.c |  7 +--
 include/linux/intel-svm.h | 20 +++-
 2 files changed, 4 insertions(+), 23 deletions(-)

diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 77c25d8..93b1849 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -283,7 +283,7 @@ static const struct mmu_notifier_ops intel_mmuops = {
 
 static DEFINE_MUTEX(pasid_mutex);
 
-int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct 
svm_dev_ops *ops)
+int intel_svm_bind_mm(struct device *dev, int *pasid, int flags)
 {
struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
struct intel_svm_dev *sdev;
@@ -329,10 +329,6 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int 
flags, struct svm_dev_
 
	list_for_each_entry(sdev, &svm->devs, list) {
if (dev == sdev->dev) {
-   if (sdev->ops != ops) {
-   ret = -EBUSY;
-   goto out;
-   }
sdev->users++;
goto success;
}
@@ -358,7 +354,6 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int 
flags, struct svm_dev_
}
/* Finish the setup now we know we're keeping it */
sdev->users = 1;
-   sdev->ops = ops;
	init_rcu_head(&sdev->rcu);
 
if (!svm) {
diff --git a/include/linux/intel-svm.h b/include/linux/intel-svm.h
index 99bc5b3..a39a502 100644
--- a/include/linux/intel-svm.h
+++ b/include/linux/intel-svm.h
@@ -18,18 +18,6 @@
 
 struct device;
 
-struct svm_dev_ops {
-   void (*fault_cb)(struct device *dev, int pasid, u64 address,
-u32 private, int rwxp, int response);
-};
-
-/* Values for rxwp in fault_cb callback */
-#define SVM_REQ_READ   (1<<3)
-#define SVM_REQ_WRITE  (1<<2)
-#define SVM_REQ_EXEC   (1<<1)
-#define SVM_REQ_PRIV   (1<<0)
-
-
 /*
  * The SVM_FLAG_PRIVATE_PASID flag requests a PASID which is *not* the "main"
  * PASID for the current process. Even if a PASID already exists, a new one
@@ -60,7 +48,6 @@ struct svm_dev_ops {
  * @dev:   Device to be granted access
  * @pasid: Address for allocated PASID
  * @flags: Flags. Later for requesting supervisor mode, etc.
- * @ops:   Callbacks to device driver
  *
  * This function attempts to enable PASID support for the given device.
  * If the @pasid argument is non-%NULL, a PASID is allocated for access
@@ -82,8 +69,7 @@ struct svm_dev_ops {
  * Multiple calls from the same process may result in the same PASID
  * being re-used. A reference count is kept.
  */
-extern int intel_svm_bind_mm(struct device *dev, int *pasid, int flags,
-struct svm_dev_ops *ops);
+extern int intel_svm_bind_mm(struct device *dev, int *pasid, int flags);
 
 /**
  * intel_svm_unbind_mm() - Unbind a specified PASID
@@ -120,7 +106,7 @@ extern int intel_svm_is_pasid_valid(struct device *dev, int 
pasid);
 #else /* CONFIG_INTEL_IOMMU_SVM */
 
 static inline int intel_svm_bind_mm(struct device *dev, int *pasid,
-   int flags, struct svm_dev_ops *ops)
+   int flags)
 {
return -ENOSYS;
 }
@@ -136,6 +122,6 @@ static int intel_svm_is_pasid_valid(struct device *dev, int 
pasid)
 }
 #endif /* CONFIG_INTEL_IOMMU_SVM */
 
-#define intel_svm_available(dev) (!intel_svm_bind_mm((dev), NULL, 0, NULL))
+#define intel_svm_available(dev) (!intel_svm_bind_mm((dev), NULL, 0))
 
 #endif /* __INTEL_SVM_H__ */
-- 
2.7.4



[PATCH v3 11/16] iommu/vt-d: use threaded irq for dmar_fault

2017-11-17 Thread Jacob Pan
Currently, the dmar fault IRQ handler does nothing more than rate-limited
printk; no critical hardware handling needs to be done in IRQ context.
Converting it to a threaded IRQ allows fault processing that
requires process context, e.g. finding the offending device based
on the source ID in the fault reasons.

Signed-off-by: Jacob Pan 
---
 drivers/iommu/dmar.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index f69f6ee..38ee91b 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1749,7 +1749,8 @@ int dmar_set_interrupt(struct intel_iommu *iommu)
return -EINVAL;
}
 
-   ret = request_irq(irq, dmar_fault, IRQF_NO_THREAD, iommu->name, iommu);
+   ret = request_threaded_irq(irq, NULL, dmar_fault,
+   IRQF_ONESHOT, iommu->name, iommu);
if (ret)
pr_err("Can't request irq\n");
return ret;
-- 
2.7.4



[PATCH v3 08/16] iommu: introduce device fault data

2017-11-17 Thread Jacob Pan
Device faults detected by the IOMMU can be reported outside the IOMMU
subsystem for further processing. This patch intends to provide
generic device fault data such that device drivers can be
informed of IOMMU faults without model-specific knowledge.

The proposed format is the result of discussion at:
https://lkml.org/lkml/2017/11/10/291
Part of the code is based on Jean-Philippe Brucker's patchset
(https://patchwork.kernel.org/patch/9989315/).

The assumption is that the model-specific IOMMU driver can filter and
handle most of the internal faults if the cause is within IOMMU driver
control. Therefore, the fault reasons that can be reported are grouped
and generalized based on common specifications such as PCI ATS.
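
As an illustration (hypothetical consumer code, not part of this patch), a
device driver's fault handler could use the generic data like this:

	static int my_fault_handler(struct iommu_fault_event *evt, void *data)
	{
		if (evt->type == IOMMU_FAULT_DMA_UNRECOV) {
			pr_err("unrecoverable DMA fault at 0x%llx, reason %d\n",
			       evt->addr, evt->reason);
			return 0;
		}

		/*
		 * IOMMU_FAULT_PAGE_REQ: fault in evt->addr for evt->pasid,
		 * then complete the request with a page response.
		 */
		return 0;
	}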

Signed-off-by: Jacob Pan 
Signed-off-by: Liu, Yi L 
Signed-off-by: Ashok Raj 
---
 include/linux/iommu.h | 108 +-
 1 file changed, 106 insertions(+), 2 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index da684a7..dfda89b 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -49,13 +49,17 @@ struct bus_type;
 struct device;
 struct iommu_domain;
 struct notifier_block;
+struct iommu_fault_event;
 
 /* iommu fault flags */
-#define IOMMU_FAULT_READ   0x0
-#define IOMMU_FAULT_WRITE  0x1
+#define IOMMU_FAULT_READ   (1 << 0)
+#define IOMMU_FAULT_WRITE  (1 << 1)
+#define IOMMU_FAULT_EXEC   (1 << 2)
+#define IOMMU_FAULT_PRIV   (1 << 3)
 
 typedef int (*iommu_fault_handler_t)(struct iommu_domain *,
struct device *, unsigned long, int, void *);
+typedef int (*iommu_dev_fault_handler_t)(struct iommu_fault_event *, void *);
 
 struct iommu_domain_geometry {
dma_addr_t aperture_start; /* First address that can be mapped*/
@@ -264,6 +268,105 @@ struct iommu_device {
struct device *dev;
 };
 
+enum iommu_model {
+   IOMMU_MODEL_INTEL = 1,
+   IOMMU_MODEL_AMD,
+   IOMMU_MODEL_SMMU3,
+};
+
+/*  Generic fault types, can be expanded IRQ remapping fault */
+enum iommu_fault_type {
+   IOMMU_FAULT_DMA_UNRECOV = 1,/* unrecoverable fault */
+   IOMMU_FAULT_PAGE_REQ,   /* page request fault */
+};
+
+enum iommu_fault_reason {
+   IOMMU_FAULT_REASON_UNKNOWN = 0,
+
+   /* IOMMU internal error, no specific reason to report out */
+   IOMMU_FAULT_REASON_INTERNAL,
+
+   /* Could not access the PASID table */
+   IOMMU_FAULT_REASON_PASID_FETCH,
+
+   /*
+* PASID is out of range (e.g. exceeds the maximum PASID
+* supported by the IOMMU) or disabled.
+*/
+   IOMMU_FAULT_REASON_PASID_INVALID,
+
+   /* Could not access the page directory (Invalid PASID entry) */
+   IOMMU_FAULT_REASON_PGD_FETCH,
+
+   /* Could not access the page table entry (Bad address) */
+   IOMMU_FAULT_REASON_PTE_FETCH,
+
+   /* Protection flag check failed */
+   IOMMU_FAULT_REASON_PERMISSION,
+};
+
+/**
+ * struct iommu_fault_event - Generic per device fault data
+ *
+ * - PCI and non-PCI devices
+ * - Recoverable faults (e.g. page request), information based on PCI ATS
+ * and PASID spec.
+ * - Un-recoverable faults of device interest
+ * - DMA remapping and IRQ remapping faults
+
+ * @type contains fault type.
+ * @reason fault reasons if relevant outside IOMMU driver, IOMMU driver 
internal
+ * faults are not reported
+ * @addr: tells the offending page address
+ * @pasid: contains process address space ID, used in shared virtual 
memory(SVM)
+ * @rid: requestor ID
+ * @page_req_group_id: page request group index
+ * @last_req: last request in a page request group
+ * @pasid_valid: indicates if the PRQ has a valid PASID
+ * @prot: page access protection flag, e.g. IOMMU_FAULT_READ, IOMMU_FAULT_WRITE
+ * @device_private: if present, uniquely identify device-specific
+ *  private data for an individual page request.
+ * @iommu_private: used by the IOMMU driver for storing fault-specific
+ * data. Users should not modify this field before
+ * sending the fault response.
+ */
+struct iommu_fault_event {
+   enum iommu_fault_type type;
+   enum iommu_fault_reason reason;
+   u64 addr;
+   u32 pasid;
+   u32 page_req_group_id : 9;
+   u32 last_req : 1;
+   u32 pasid_valid : 1;
+   u32 prot;
+   u64 device_private;
+   u64 iommu_private;
+};
+
+/**
+ * struct iommu_fault_param - per-device IOMMU fault data
+ * @dev_fault_handler: Callback function to handle IOMMU faults at device level
+ * @data: handler private data
+ *
+ */
+struct iommu_fault_param {
+   iommu_dev_fault_handler_t handler;
+   void *data;
+};
+
+/**
+ * struct iommu_param - collection of per-device IOMMU data
+ *
+ * @fault_param: IOMMU detected device fault reporting data
+ *
+ * TODO: migrate other per device data pointers 

[PATCH v3 16/16] iommu/vt-d: add intel iommu page response function

2017-11-17 Thread Jacob Pan
This patch adds page response support for Intel VT-d.
Generic response data is taken from the IOMMU API
then parsed into VT-d specific response descriptor format.

Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel-iommu.c | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index e1bd219..7f95827 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5171,6 +5171,35 @@ static int intel_iommu_sva_invalidate(struct 
iommu_domain *domain,
return ret;
 }
 
+int intel_iommu_page_response(struct iommu_domain *domain, struct device *dev,
+   struct page_response_msg *msg)
+{
+   struct qi_desc resp;
+   struct intel_iommu *iommu = dev_to_intel_iommu(dev);
+
+   /* TODO: sanitize response message */
+   if (msg->last_req) {
+   /* Page Group Response */
+   resp.low = QI_PGRP_PASID(msg->pasid) |
+   QI_PGRP_DID(msg->did) |
+   QI_PGRP_PASID_P(msg->pasid_present) |
+   QI_PGRP_RESP_TYPE;
+   /* REVISIT: allow private data passing from device prq */
+   resp.high = QI_PGRP_IDX(msg->page_req_group_id) |
+   QI_PGRP_PRIV(msg->private_data) | 
QI_PGRP_RESP_CODE(msg->resp_code);
+   } else {
+   /* Page Stream Response */
+   resp.low = QI_PSTRM_IDX(msg->page_req_group_id) |
+   QI_PSTRM_PRIV(msg->private_data) | 
QI_PSTRM_BUS(PCI_BUS_NUM(msg->did)) |
+   QI_PSTRM_PASID(msg->pasid) | QI_PSTRM_RESP_TYPE;
+   resp.high = QI_PSTRM_ADDR(msg->paddr) | QI_PSTRM_DEVFN(msg->did 
& 0xff) |
+   QI_PSTRM_RESP_CODE(msg->resp_code);
+   }
	qi_submit_sync(&resp, iommu);
+
+   return 0;
+}
+
 static int intel_iommu_map(struct iommu_domain *domain,
   unsigned long iova, phys_addr_t hpa,
   size_t size, int iommu_prot)
@@ -5606,6 +5635,7 @@ const struct iommu_ops intel_iommu_ops = {
.bind_pasid_table   = intel_iommu_bind_pasid_table,
.unbind_pasid_table = intel_iommu_unbind_pasid_table,
.sva_invalidate = intel_iommu_sva_invalidate,
+   .page_response  = intel_iommu_page_response,
 #endif
.map= intel_iommu_map,
.unmap  = intel_iommu_unmap,
-- 
2.7.4



[PATCH v3 13/16] iommu/intel-svm: notify page request to guest

2017-11-17 Thread Jacob Pan
If the source device of a page request has its PASID table pointer
bound to a guest, the first level page tables are owned by the guest.
In this case, we shall let the guest OS manage the page fault.

This patch uses the IOMMU fault notification API to send notifications,
possibly via VFIO, to the guest OS. Once the guest pages are faulted in,
the guest will issue a page response which will be passed down via the
invalidation passdown APIs.

Signed-off-by: Jacob Pan 
Signed-off-by: Ashok Raj 
---
 drivers/iommu/intel-svm.c | 80 ++-
 include/linux/iommu.h |  1 +
 2 files changed, 74 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index f6697e5..77c25d8 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -555,6 +555,71 @@ static bool is_canonical_address(u64 addr)
return (((saddr << shift) >> shift) == saddr);
 }
 
+static int prq_to_iommu_prot(struct page_req_dsc *req)
+{
+   int prot = 0;
+
+   if (req->rd_req)
+   prot |= IOMMU_FAULT_READ;
+   if (req->wr_req)
+   prot |= IOMMU_FAULT_WRITE;
+   if (req->exe_req)
+   prot |= IOMMU_FAULT_EXEC;
+   if (req->priv_req)
+   prot |= IOMMU_FAULT_PRIV;
+
+   return prot;
+}
+
+static int intel_svm_prq_report(struct device *dev, struct page_req_dsc *desc)
+{
+   int ret = 0;
+   struct iommu_fault_event event;
+   struct pci_dev *pdev;
+
+   /**
+* If caller does not provide struct device, this is the case where
+* guest PASID table is bound to the device. So we need to retrieve
+* struct device from the page request descriptor then proceed.
+*/
+   if (!dev) {
+   pdev = pci_get_bus_and_slot(desc->bus, desc->devfn);
+   if (!pdev) {
+   pr_err("No PCI device found for PRQ [%02x:%02x.%d]\n",
+   desc->bus, PCI_SLOT(desc->devfn),
+   PCI_FUNC(desc->devfn));
+   return -ENODEV;
+   }
+   dev = &pdev->dev;
+   } else if (dev_is_pci(dev)) {
+   pdev = to_pci_dev(dev);
+   pci_dev_get(pdev);
+   } else
+   return -ENODEV;
+
+   pr_debug("Notify PRQ device [%02x:%02x.%d]\n",
+   desc->bus, PCI_SLOT(desc->devfn),
+   PCI_FUNC(desc->devfn));
+
+   /* invoke device fault handler if registered */
+   if (iommu_has_device_fault_handler(dev)) {
+   /* Fill in event data for device specific processing */
+   event.type = IOMMU_FAULT_PAGE_REQ;
+   event.addr = desc->addr;
+   event.pasid = desc->pasid;
+   event.page_req_group_id = desc->prg_index;
+   event.prot = prq_to_iommu_prot(desc);
+   event.last_req = desc->lpig;
+   event.pasid_valid = 1;
+   event.iommu_private = desc->private;
+   ret = iommu_report_device_fault(&pdev->dev, &event);
+   }
+
+   pci_dev_put(pdev);
+
+   return ret;
+}
+
 static irqreturn_t prq_event_thread(int irq, void *d)
 {
struct intel_iommu *iommu = d;
@@ -578,7 +643,12 @@ static irqreturn_t prq_event_thread(int irq, void *d)
handled = 1;
 
	req = &iommu->prq[head / sizeof(*req)];
-
+   /**
+* If prq is to be handled outside iommu driver via receiver of
+* the fault notifiers, we skip the page response here.
+*/
+   if (!intel_svm_prq_report(NULL, req))
+   goto prq_advance;
result = QI_RESP_FAILURE;
address = (u64)req->addr << VTD_PAGE_SHIFT;
if (!req->pasid_present) {
@@ -649,11 +719,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
	if (WARN_ON(&sdev->list == &svm->devs))
sdev = NULL;
 
-   if (sdev && sdev->ops && sdev->ops->fault_cb) {
-   int rwxp = (req->rd_req << 3) | (req->wr_req << 2) |
-   (req->exe_req << 1) | (req->priv_req);
-   sdev->ops->fault_cb(sdev->dev, req->pasid, req->addr, 
req->private, rwxp, result);
-   }
+   intel_svm_prq_report(sdev->dev, req);
/* We get here in the error case where the PASID lookup failed,
   and these can be NULL. Do not use them below this point! */
sdev = NULL;
@@ -679,7 +745,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
 
	qi_submit_sync(&resp, iommu);
}
-
+   prq_advance:
head = (head + sizeof(*req)) & PRQ_RING_MASK;
}
 
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 841c044..3083796b 100644
--- a/include/linux/iommu.h
+++ 

[PATCH v3 09/16] driver core: add iommu device fault reporting data

2017-11-17 Thread Jacob Pan
DMA faults can be detected by the IOMMU at device level. Adding a pointer
to struct device allows the IOMMU subsystem to report relevant faults
back to the device driver for further handling.
For directly assigned devices (or user space drivers), the guest OS holds
responsibility to handle and respond to per-device IOMMU faults.
Therefore we need a fault reporting mechanism to propagate faults beyond
the IOMMU subsystem.

There are two other IOMMU data pointers under struct device today; here
we introduce iommu_param as a parent pointer such that all device IOMMU
data can be consolidated there. The idea was suggested by Greg KH
and Joerg. The name iommu_param is chosen since iommu_data is already in use.
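
Based on how the pointer is used later in the series (patch 10/16 allocates
it and stores the fault handler data there), iommu_param is expected to look
roughly like:

	struct iommu_param {
		struct iommu_fault_param *fault_param;	/* device fault handler + data */
	};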

Suggested-by: Greg Kroah-Hartman 
Signed-off-by: Jacob Pan 
Link: https://lkml.org/lkml/2017/10/6/81
---
 include/linux/device.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/linux/device.h b/include/linux/device.h
index 66fe271..540e5e5 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -42,6 +42,7 @@ struct fwnode_handle;
 struct iommu_ops;
 struct iommu_group;
 struct iommu_fwspec;
+struct iommu_param;
 
 struct bus_attribute {
struct attributeattr;
@@ -871,6 +872,7 @@ struct dev_links_info {
  * device (i.e. the bus driver that discovered the device).
  * @iommu_group: IOMMU group the device belongs to.
  * @iommu_fwspec: IOMMU-specific properties supplied by firmware.
+ * @iommu_param: Per device generic IOMMU runtime data
  *
  * @offline_disabled: If set, the device is permanently online.
  * @offline:   Set after successful invocation of bus type's .offline().
@@ -960,6 +962,7 @@ struct device {
void(*release)(struct device *dev);
struct iommu_group  *iommu_group;
struct iommu_fwspec *iommu_fwspec;
+   struct iommu_param  *iommu_param;
 
booloffline_disabled:1;
booloffline:1;
-- 
2.7.4



[PATCH v3 10/16] iommu: introduce device fault report API

2017-11-17 Thread Jacob Pan
Traditionally, device specific faults are detected and handled within
their own device drivers. When IOMMU is enabled, faults such as DMA
related transactions are detected by IOMMU. There is no generic
reporting mechanism to report faults back to the in-kernel device
driver or the guest OS in case of assigned devices.

Faults detected by the IOMMU are based on the transaction's source ID, which
can be reported on a per-device basis, regardless of whether the device is a
PCI device or not.

The fault types include recoverable (e.g. page request) and
unrecoverable faults (e.g. access error). In most cases, faults can be
handled by IOMMU drivers internally. The primary use cases are as
follows:
1. page request fault originated from an SVM capable device that is
assigned to guest via vIOMMU. In this case, the first level page tables
are owned by the guest. Page request must be propagated to the guest to
let guest OS fault in the pages then send page response. In this
mechanism, the direct receiver of IOMMU fault notification is VFIO,
which can relay notification events to QEMU or other user space
software.

2. faults need more subtle handling by device drivers. Other than
simply invoke reset function, there are needs to let device driver
handle the fault with a smaller impact.

This patchset is intended to create a generic fault report API such
that it can scale as follows:
- all IOMMU types
- PCI and non-PCI devices
- recoverable and unrecoverable faults
- VFIO and other in-kernel users
- DMA & IRQ remapping (TBD)
The original idea was brought up by David Woodhouse and discussions
summarized at https://lwn.net/Articles/608914/.
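
A consumer such as VFIO, or an in-kernel driver, would use the API roughly
as follows (handler name and private data are placeholders):

	ret = iommu_register_device_fault_handler(dev, my_iommu_fault, priv);
	if (ret)
		return ret;

	/* faults for this device are now delivered as my_iommu_fault(evt, priv) */

	iommu_unregister_device_fault_handler(dev);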

Signed-off-by: Jacob Pan 
Signed-off-by: Ashok Raj 
---
 drivers/iommu/iommu.c | 63 ++-
 include/linux/iommu.h | 36 +
 2 files changed, 98 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 829e9e9..97b7990 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -581,6 +581,12 @@ int iommu_group_add_device(struct iommu_group *group, 
struct device *dev)
goto err_free_name;
}
 
+   dev->iommu_param = kzalloc(sizeof(struct iommu_fault_param), 
GFP_KERNEL);
+   if (!dev->iommu_param) {
+   ret = -ENOMEM;
+   goto err_free_name;
+   }
+
kobject_get(group->devices_kobj);
 
dev->iommu_group = group;
@@ -657,7 +663,7 @@ void iommu_group_remove_device(struct device *dev)
	sysfs_remove_link(&dev->kobj, "iommu_group");
 
trace_remove_device_from_group(group->id, dev);
-
+   kfree(dev->iommu_param);
kfree(device->name);
kfree(device);
dev->iommu_group = NULL;
@@ -791,6 +797,61 @@ int iommu_group_unregister_notifier(struct iommu_group 
*group,
 }
 EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier);
 
+int iommu_register_device_fault_handler(struct device *dev,
+   iommu_dev_fault_handler_t handler,
+   void *data)
+{
+   struct iommu_param *idata = dev->iommu_param;
+
+   /*
+* Device iommu_param should have been allocated when device is
+* added to its iommu_group.
+*/
+   if (!idata)
+   return -EINVAL;
+   /* Only allow one fault handler registered for each device */
+   if (idata->fault_param)
+   return -EBUSY;
+   get_device(dev);
+   idata->fault_param =
+   kzalloc(sizeof(struct iommu_fault_param), GFP_KERNEL);
+   if (!idata->fault_param)
+   return -ENOMEM;
+   idata->fault_param->handler = handler;
+   idata->fault_param->data = data;
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(iommu_register_device_fault_handler);
+
+int iommu_unregister_device_fault_handler(struct device *dev)
+{
+   struct iommu_param *idata = dev->iommu_param;
+
+   if (!idata)
+   return -EINVAL;
+
+   kfree(idata->fault_param);
+   idata->fault_param = NULL;
+   put_device(dev);
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(iommu_unregister_device_fault_handler);
+
+
+int iommu_report_device_fault(struct device *dev, struct iommu_fault_event 
*evt)
+{
+   /* we only report device fault if there is a handler registered */
+   if (!dev->iommu_param || !dev->iommu_param->fault_param ||
+   !dev->iommu_param->fault_param->handler)
+   return -ENOSYS;
+
+   return dev->iommu_param->fault_param->handler(evt,
+   
dev->iommu_param->fault_param->data);
+}
+EXPORT_SYMBOL_GPL(iommu_report_device_fault);
+
 /**
  * iommu_group_id - Return ID for a group
  * @group: the group to ID
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index dfda89b..841c044 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -463,6 +463,14 @@ extern int 

[PATCH v3 06/16] iommu/vt-d: add svm/sva invalidate function

2017-11-17 Thread Jacob Pan
This patch adds Intel VT-d specific function to implement
iommu passdown invalidate API for shared virtual address.

The use case is for supporting caching structure invalidation
of assigned SVM capable devices. Emulated IOMMU exposes queue
invalidation capability and passes down all descriptors from the guest
to the physical IOMMU.

The assumption is that guest to host device ID mapping should be
resolved prior to calling IOMMU driver. Based on the device handle,
host IOMMU driver can replace certain fields before submit to the
invalidation queue.

Signed-off-by: Liu, Yi L 
Signed-off-by: Jacob Pan 
Signed-off-by: Ashok Raj 
---
 drivers/iommu/intel-iommu.c | 200 +++-
 include/linux/intel-iommu.h |  17 +++-
 2 files changed, 211 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 556bdd2..000b2b3 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4981,6 +4981,183 @@ static void intel_iommu_detach_device(struct 
iommu_domain *domain,
dmar_remove_one_dev_info(to_dmar_domain(domain), dev);
 }
 
+/*
+ * 3D array for converting IOMMU generic type-granularity to VT-d granularity
+ * X indexed by enum iommu_inv_type
+ * Y indicates request without and with PASID
+ * Z indexed by enum iommu_inv_granularity
+ *
+ * For example, if we want to find the VT-d granularity encoding for IOTLB
+ * type, DMA request with PASID, and page selective, the lookup indices are:
+ * [1][1][8], where
+ * 1: IOMMU_INV_TYPE_TLB
+ * 1: with PASID
+ * 8: IOMMU_INV_GRANU_PAGE_PASID
+ *
+ */
+const static int inv_type_granu_map[IOMMU_INV_NR_TYPE][2][IOMMU_INV_NR_GRANU] 
= {
+   /* extended dev IOTLBs, for dev-IOTLB, only global is valid,
+  for dev-EXIOTLB, two valid granu */
+   {
+   {1},
+   {0, 0, 0, 0, 1, 1, 0, 0, 0}
+   },
+   /* IOTLB and EIOTLB */
+   {
+   {1, 1, 0, 1, 0, 0, 0, 0, 0},
+   {0, 0, 0, 0, 1, 0, 1, 1, 1}
+   },
+   /* PASID cache */
+   {
+   {0},
+   {0, 0, 0, 0, 1, 1, 0, 0, 0}
+   },
+   /* context cache */
+   {
+   {1, 1, 1}
+   }
+};
+
+const static u64 
inv_type_granu_table[IOMMU_INV_NR_TYPE][2][IOMMU_INV_NR_GRANU] = {
+   /* extended dev IOTLBs, only global is valid */
+   {
+   {QI_DEV_IOTLB_GRAN_ALL},
+   {0, 0, 0, 0, QI_DEV_IOTLB_GRAN_ALL, 
QI_DEV_IOTLB_GRAN_PASID_SEL, 0, 0, 0}
+   },
+   /* IOTLB and EIOTLB */
+   {
+   {DMA_TLB_GLOBAL_FLUSH, DMA_TLB_DSI_FLUSH, 0, DMA_TLB_PSI_FLUSH},
+   {0, 0, 0, 0, QI_GRAN_ALL_ALL, 0, QI_GRAN_NONG_ALL, 
QI_GRAN_NONG_PASID, QI_GRAN_PSI_PASID}
+   },
+   /* PASID cache */
+   {
+   {0},
+   {0, 0, 0, 0, QI_PC_ALL_PASIDS, QI_PC_PASID_SEL}
+   },
+   /* context cache */
+   {
+   {DMA_CCMD_GLOBAL_INVL, DMA_CCMD_DOMAIN_INVL, 
DMA_CCMD_DEVICE_INVL}
+   }
+};
+
+static inline int to_vtd_granularity(int type, int granu, int with_pasid, u64 
*vtd_granu)
+{
+   if (type >= IOMMU_INV_NR_TYPE || granu >= IOMMU_INV_NR_GRANU || 
with_pasid > 1)
+   return -EINVAL;
+
+   if (inv_type_granu_map[type][with_pasid][granu] == 0)
+   return -EINVAL;
+
+   *vtd_granu = inv_type_granu_table[type][with_pasid][granu];
+
+   return 0;
+}
+
+static int intel_iommu_sva_invalidate(struct iommu_domain *domain,
+   struct device *dev, struct tlb_invalidate_info *inv_info)
+{
+   struct intel_iommu *iommu;
+   struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+   struct device_domain_info *info;
+   struct pci_dev *pdev;
+   u16 did, sid, pfsid;
+   u8 bus, devfn;
+   int ret = 0;
+   u64 granu;
+   unsigned long flags;
+
+   if (!inv_info || !dmar_domain)
+   return -EINVAL;
+
+   iommu = device_to_iommu(dev, &bus, &devfn);
+   if (!iommu)
+   return -ENODEV;
+
+   if (!dev || !dev_is_pci(dev))
+   return -ENODEV;
+
+   did = dmar_domain->iommu_did[iommu->seq_id];
+   sid = PCI_DEVID(bus, devfn);
+   ret = to_vtd_granularity(inv_info->hdr.type, inv_info->granularity,
+   !!(inv_info->flags & 
IOMMU_INVALIDATE_PASID_TAGGED), );
+   if (ret) {
+   pr_err("Invalid range type %d, granu %d\n", inv_info->hdr.type,
+   inv_info->granularity);
+   return ret;
+   }
+
+   spin_lock(&iommu->lock);
+   spin_lock_irqsave(&device_domain_lock, flags);
+
+   switch (inv_info->hdr.type) {
+   case IOMMU_INV_TYPE_CONTEXT:
+   iommu->flush.flush_context(iommu, did, sid,
+   DMA_CCMD_MASK_NOBIT, granu);
+   break;
+   case IOMMU_INV_TYPE_TLB:
+

[PATCH v3 15/16] iommu: introduce page response function

2017-11-17 Thread Jacob Pan
When nested translation is turned on and the guest owns the
first level page tables, a device page request can be forwarded
to the guest for fault handling. As the page response is returned
by the guest, the IOMMU driver on the host needs to process the
response, which informs the device and completes the page request
transaction.

This patch introduces a generic API function for page response
passing from the guest or other in-kernel users. The definitions of
the generic data are based on the PCI ATS specification and not limited to
any vendor.
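
A caller relaying a response from the guest (e.g. via VFIO) would fill the
generic message and pass it down, roughly as follows; pasid, prgi, domain and
dev come from the original page request:

	struct page_response_msg msg = {
		.type			= IOMMU_PAGE_GROUP_RESP,
		.pasid			= pasid,
		.pasid_present		= 1,
		.page_req_group_id	= prgi,
		.resp_code		= IOMMU_PAGE_RESP_SUCCESS,
		.last_req		= 1,
	};

	ret = iommu_page_response(domain, dev, &msg);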

Signed-off-by: Jacob Pan 
---
 drivers/iommu/iommu.c | 14 ++
 include/linux/iommu.h | 42 ++
 2 files changed, 56 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 97b7990..7aefb40 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1416,6 +1416,20 @@ int iommu_sva_invalidate(struct iommu_domain *domain,
 }
 EXPORT_SYMBOL_GPL(iommu_sva_invalidate);
 
+int iommu_page_response(struct iommu_domain *domain, struct device *dev,
+   struct page_response_msg *msg)
+{
+   int ret = 0;
+
+   if (unlikely(!domain->ops->page_response))
+   return -ENODEV;
+
+   ret = domain->ops->page_response(domain, dev, msg);
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_page_response);
+
 static void __iommu_detach_device(struct iommu_domain *domain,
  struct device *dev)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 3083796b..17f698b 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -163,6 +163,43 @@ struct iommu_resv_region {
 
 #ifdef CONFIG_IOMMU_API
 
+enum page_response_type {
+   IOMMU_PAGE_STREAM_RESP = 1,
+   IOMMU_PAGE_GROUP_RESP,
+};
+
+/**
+ * Generic page response information based on PCI ATS and PASID spec.
+ * @paddr: servicing page address
+ * @pasid: contains process address space ID, used in shared virtual 
memory(SVM)
+ * @rid: requestor ID
+ * @did: destination device ID
+ * @last_req: last request in a page request group
+ * @resp_code: response code
+ * @page_req_group_id: page request group index
+ * @prot: page access protection flag, e.g. IOMMU_FAULT_READ, IOMMU_FAULT_WRITE
+ * @type: group or stream response
+ * @private_data: uniquely identify device-specific private data for an
+ *individual page response
+
+ */
+struct page_response_msg {
+   u64 paddr;
+   u32 pasid;
+   u32 rid:16;
+   u32 did:16;
+   u32 resp_code:4;
+   u32 last_req:1;
+   u32 pasid_present:1;
+#define IOMMU_PAGE_RESP_SUCCESS0
+#define IOMMU_PAGE_RESP_INVALID1
+#define IOMMU_PAGE_RESP_FAILURE0xF
+   u32 page_req_group_id : 9;
+   u32 prot;
+   enum page_response_type type;
+   u32 private_data;
+};
+
 /**
  * struct iommu_ops - iommu ops and capabilities
  * @capable: check capability
@@ -196,6 +233,7 @@ struct iommu_resv_region {
  * @bind_pasid_table: bind pasid table pointer for guest SVM
  * @unbind_pasid_table: unbind pasid table pointer and restore defaults
  * @sva_invalidate: invalidate translation caches of shared virtual address
+ * @page_response: handle page request response
  */
 struct iommu_ops {
bool (*capable)(enum iommu_cap);
@@ -251,6 +289,8 @@ struct iommu_ops {
struct device *dev);
int (*sva_invalidate)(struct iommu_domain *domain,
struct device *dev, struct tlb_invalidate_info *inv_info);
+   int (*page_response)(struct iommu_domain *domain, struct device *dev,
+   struct page_response_msg *msg);
 
unsigned long pgsize_bitmap;
 };
@@ -472,6 +512,8 @@ extern int iommu_unregister_device_fault_handler(struct 
device *dev);
 
 extern int iommu_report_device_fault(struct device *dev, struct 
iommu_fault_event *evt);
 
+extern int iommu_page_response(struct iommu_domain *domain, struct device *dev,
+   struct page_response_msg *msg);
 extern int iommu_group_id(struct iommu_group *group);
 extern struct iommu_group *iommu_group_get_for_dev(struct device *dev);
 extern struct iommu_domain *iommu_group_default_domain(struct iommu_group *);
-- 
2.7.4



[PATCH v3 05/16] iommu/vt-d: support flushing more TLB types

2017-11-17 Thread Jacob Pan
With shared virtual memory virtualization, extended IOTLB invalidation
may be passed down from outside the IOMMU subsystem. This patch adds
invalidation functions that can be used for each IOTLB type.

Signed-off-by: Jacob Pan 
Signed-off-by: Liu, Yi L 
Signed-off-by: Ashok Raj 
---
 drivers/iommu/dmar.c| 54 ++---
 drivers/iommu/intel-iommu.c |  3 ++-
 include/linux/intel-iommu.h | 10 +++--
 3 files changed, 61 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 57c920c..f69f6ee 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1336,11 +1336,25 @@ void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, 
u64 addr,
	qi_submit_sync(&desc, iommu);
 }
 
-void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 qdep,
-   u64 addr, unsigned mask)
+void qi_flush_eiotlb(struct intel_iommu *iommu, u16 did, u64 addr, u32 pasid,
+   unsigned int size_order, u64 granu, bool global)
 {
struct qi_desc desc;
 
+   desc.low = QI_EIOTLB_PASID(pasid) | QI_EIOTLB_DID(did) |
+   QI_EIOTLB_GRAN(granu) | QI_EIOTLB_TYPE;
+   desc.high = QI_EIOTLB_ADDR(addr) | QI_EIOTLB_GL(global) |
+   QI_EIOTLB_IH(0) | QI_EIOTLB_AM(size_order);
+   qi_submit_sync(&desc, iommu);
+}
+
+void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
+   u16 qdep, u64 addr, unsigned mask)
+{
+   struct qi_desc desc;
+
+   pr_debug_ratelimited("%s: sid %d, pfsid %d, qdep %d, addr %llx, mask 
%d\n",
+   __func__, sid, pfsid, qdep, addr, mask);
if (mask) {
BUG_ON(addr & ((1 << (VTD_PAGE_SHIFT + mask)) - 1));
addr |= (1ULL << (VTD_PAGE_SHIFT + mask - 1)) - 1;
@@ -1352,7 +1366,41 @@ void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 
sid, u16 qdep,
qdep = 0;
 
desc.low = QI_DEV_IOTLB_SID(sid) | QI_DEV_IOTLB_QDEP(qdep) |
-  QI_DIOTLB_TYPE;
+  QI_DIOTLB_TYPE | QI_DEV_IOTLB_SID(pfsid);
+
+   qi_submit_sync(&desc, iommu);
+}
+
+void qi_flush_dev_eiotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
+   u32 pasid,  u16 qdep, u64 addr, unsigned size, u64 granu)
+{
+   struct qi_desc desc;
+
+   desc.low = QI_DEV_EIOTLB_PASID(pasid) | QI_DEV_EIOTLB_SID(sid) |
+   QI_DEV_EIOTLB_QDEP(qdep) | QI_DEIOTLB_TYPE |
+   QI_DEV_EIOTLB_PFSID(pfsid);
+   desc.high |= QI_DEV_EIOTLB_GLOB(granu);
+
+   /* If S bit is 0, we only flush a single page. If S bit is set,
+* The least significant zero bit indicates the size. VT-d spec
+* 6.5.2.6
+*/
+   if (!size)
+   desc.high = QI_DEV_EIOTLB_ADDR(addr) & ~QI_DEV_EIOTLB_SIZE;
+   else {
+   unsigned long mask = 1UL << (VTD_PAGE_SHIFT + size);
+
+   desc.high = QI_DEV_EIOTLB_ADDR(addr & ~mask) | 
QI_DEV_EIOTLB_SIZE;
+   }
+   qi_submit_sync(&desc, iommu);
+}
+
+void qi_flush_pasid(struct intel_iommu *iommu, u16 did, u64 granu, int pasid)
+{
+   struct qi_desc desc;
+
+   desc.high = 0;
+   desc.low = QI_PC_TYPE | QI_PC_DID(did) | QI_PC_GRAN(granu) | 
QI_PC_PASID(pasid);
 
	qi_submit_sync(&desc, iommu);
 }
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 399b504..556bdd2 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1524,7 +1524,8 @@ static void iommu_flush_dev_iotlb(struct dmar_domain 
*domain,
 
sid = info->bus << 8 | info->devfn;
qdep = info->ats_qdep;
-   qi_flush_dev_iotlb(info->iommu, sid, qdep, addr, mask);
+   qi_flush_dev_iotlb(info->iommu, sid, info->pfsid,
+   qdep, addr, mask);
}
	spin_unlock_irqrestore(&device_domain_lock, flags);
 }
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 8d38e24..3c83f7e 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -305,6 +305,7 @@ enum {
 #define QI_DEV_EIOTLB_PASID(p) (((u64)p) << 32)
 #define QI_DEV_EIOTLB_SID(sid) ((u64)((sid) & 0x) << 16)
 #define QI_DEV_EIOTLB_QDEP(qd) ((u64)((qd) & 0x1f) << 4)
+#define QI_DEV_EIOTLB_PFSID(pfsid) (((u64)(pfsid & 0xf) << 12) | ((u64)(pfsid 
& 0xff0) << 48))
 #define QI_DEV_EIOTLB_MAX_INVS 32
 
 #define QI_PGRP_IDX(idx)   (((u64)(idx)) << 55)
@@ -496,8 +497,13 @@ extern void qi_flush_context(struct intel_iommu *iommu, 
u16 did, u16 sid,
 u8 fm, u64 type);
 extern void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
  unsigned int size_order, u64 type);
-extern void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 qdep,
-  u64 addr, unsigned mask);
+extern void qi_flush_eiotlb(struct intel_iommu 

[PATCH v3 07/16] iommu/vt-d: assign PFSID in device TLB invalidation

2017-11-17 Thread Jacob Pan
When SRIOV VF device IOTLB is invalidated, we need to provide
the PF source SID such that IOMMU hardware can gauge the depth
of invalidation queue which is shared among VFs. This is needed
when device invalidation throttle (DIT) capability is supported.

Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel-iommu.c | 13 +
 include/linux/intel-iommu.h |  3 +++
 2 files changed, 16 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 000b2b3..e1bd219 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1459,6 +1459,19 @@ static void iommu_enable_dev_iotlb(struct 
device_domain_info *info)
return;
 
pdev = to_pci_dev(info->dev);
+   /* For IOMMU that supports device IOTLB throttling (DIT), we assign
+* PFSID to the invalidation desc of a VF such that IOMMU HW can gauge
+* queue depth at PF level. If DIT is not set, PFSID will be treated as
+* reserved, which should be set to 0.
+*/
+   if (!ecap_dit(info->iommu->ecap))
+   info->pfsid = 0;
+   else if (pdev && pdev->is_virtfn) {
+   if (ecap_dit(info->iommu->ecap))
+   dev_warn(&pdev->dev, "SRIOV VF device IOTLB enabled without flow control\n");
+   info->pfsid = PCI_DEVID(pdev->physfn->bus->number, pdev->physfn->devfn);
+   } else
+   info->pfsid = PCI_DEVID(info->bus, info->devfn);
 
 #ifdef CONFIG_INTEL_IOMMU_SVM
/* The PCIe spec, in its wisdom, declares that the behaviour of
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 7f05e36..6956a4e 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -112,6 +112,7 @@
  * Extended Capability Register
  */
 
+#define ecap_dit(e)((e >> 41) & 0x1)
 #define ecap_pasid(e)  ((e >> 40) & 0x1)
 #define ecap_pss(e)((e >> 35) & 0x1f)
 #define ecap_eafs(e)   ((e >> 34) & 0x1)
@@ -285,6 +286,7 @@ enum {
 #define QI_DEV_IOTLB_SID(sid)  ((u64)((sid) & 0x) << 32)
 #define QI_DEV_IOTLB_QDEP(qdep)(((qdep) & 0x1f) << 16)
 #define QI_DEV_IOTLB_ADDR(addr)((u64)(addr) & VTD_PAGE_MASK)
+#define QI_DEV_IOTLB_PFSID(pfsid) (((u64)(pfsid & 0xf) << 12) | ((u64)(pfsid & 0xff0) << 48))
 #define QI_DEV_IOTLB_SIZE  1
 #define QI_DEV_IOTLB_MAX_INVS  32
 
@@ -475,6 +477,7 @@ struct device_domain_info {
struct list_head global; /* link to global list */
u8 bus; /* PCI bus number */
u8 devfn;   /* PCI devfn number */
+   u16 pfsid;  /* SRIOV physical function source ID */
u8 pasid_supported:3;
u8 pasid_enabled:1;
u8 pri_supported:1;
-- 
2.7.4



[PATCH v3 00/16] IOMMU driver support for SVM virtualization

2017-11-17 Thread Jacob Pan
Hi All,

Shared virtual memory (SVM), or more precisely shared virtual address (SVA),
between device DMA and applications can reduce programming complexity
and enhance security. To enable SVM in the guest, i.e. to share the guest
application address space with physical device DMA, the IOMMU driver must
provide some new functionality.

This patchset is a follow-up on the discussions held at LPC 2017
VFIO/IOMMU/PCI track. Slides and notes can be found here:
https://linuxplumbersconf.org/2017/ocw/events/LPC2017/tracks/636

The complete guest SVM support also involves changes in QEMU and VFIO,
which has been posted earlier.
https://www.spinics.net/lists/kvm/msg148798.html

This is the IOMMU-portion follow-up to the more complete series of
kernel changes supporting vSVM. Please refer to the link below for more
details: https://www.spinics.net/lists/kvm/msg148819.html

Generic APIs are introduced in addition to Intel VT-d specific changes,
the goal is to have common interfaces across IOMMU and device types for
both VFIO and other in-kernel users.

At the top level, new IOMMU interfaces are introduced as follows:
 - bind guest PASID table
 - passdown invalidations of translation caches
 - IOMMU device fault reporting including page request/response and
   non-recoverable faults.
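
For illustration, a hedged caller-side sketch of how an in-kernel user
such as VFIO might invoke the first two interfaces (hypothetical usage,
not part of this series; only the base_ptr and pasid_bits fields of
struct pasid_table_config are taken from the patches, everything else
here is an assumption):

/* Hypothetical in-kernel caller (e.g. VFIO), sketch only. */
static int example_vsvm_setup(struct iommu_domain *domain, struct device *dev,
			      u64 guest_pasid_table_gpa, u8 guest_pasid_bits)
{
	struct pasid_table_config binfo = {
		.base_ptr   = guest_pasid_table_gpa, /* guest PASID table (GPA) */
		.pasid_bits = guest_pasid_bits,
	};

	/* Link the guest PASID table into the device's context entry */
	return iommu_bind_pasid_table(domain, dev, &binfo);
}

/* Pass a guest TLB invalidation straight down to the physical IOMMU. */
static int example_vsvm_invalidate(struct iommu_domain *domain,
				   struct device *dev,
				   struct tlb_invalidate_info *inv_info)
{
	return iommu_sva_invalidate(domain, dev, inv_info);
}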

For IOMMU-detected device fault reporting, struct device is extended to
provide a callback and tracking at the device level. The original proposal
was discussed in "Error handling for I/O memory management units"
(https://lwn.net/Articles/608914/). I have experimented with two
alternative solutions:
1. Use a shared group notifier; this does not scale well and also causes
unwanted notification traffic when a sibling device in the group is
reported with faults.
2. Place the fault callback in the device IOMMU arch data, e.g.
device_domain_info in the Intel/FSL IOMMU drivers. This causes code
duplication, since per-device fault reporting is generic.

The additional patches are Intel VT-d specific; they either implement the
generic interfaces or replace existing private ones with them.

This patchset is based on the work and ideas from many people, especially:
Ashok Raj 
Liu, Yi L 
Jean-Philippe Brucker 

Thanks,

Jacob

V3
- Consolidated fault reporting data format based on discussions on v2,
  including input from ARM and AMD.
- Renamed invalidation APIs from svm to sva based on discussions on v2
- Use a parent pointer under struct device for all iommu per device data
- Simplified device fault callback, allow driver private data to be
  registered. This might make it easy to replace domain fault handler.
V2
- Replaced hybrid interface data model (generic data + vendor specific
data) with all generic data. This will have the security benefit where
data passed from user space can be sanitized by all software layers if
needed.
- Addressed review comments from V1
- Use per device fault report data
- Support page request/response communications between host IOMMU and
guest or other in-kernel users.
- Added unrecoverable fault reporting to DMAR
- Use threaded IRQ function for DMAR fault interrupt and fault
reporting


Jacob Pan (15):
  iommu: introduce bind_pasid_table API function
  iommu/vt-d: add bind_pasid_table function
  iommu/vt-d: move device_domain_info to header
  iommu/vt-d: support flushing more TLB types
  iommu/vt-d: add svm/sva invalidate function
  iommu/vt-d: assign PFSID in device TLB invalidation
  iommu: introduce device fault data
  driver core: add iommu device fault reporting data
  iommu: introduce device fault report API
  iommu/vt-d: use threaded irq for dmar_fault
  iommu/vt-d: report unrecoverable device faults
  iommu/intel-svm: notify page request to guest
  iommu/intel-svm: replace dev ops with fault report API
  iommu: introduce page response function
  iommu/vt-d: add intel iommu page response function

Liu, Yi L (1):
  iommu: introduce iommu invalidate API function

 drivers/iommu/dmar.c  | 151 -
 drivers/iommu/intel-iommu.c   | 365 +++---
 drivers/iommu/intel-svm.c |  87 --
 drivers/iommu/iommu.c | 110 -
 include/linux/device.h|   3 +
 include/linux/dma_remapping.h |   1 +
 include/linux/intel-iommu.h   |  47 +-
 include/linux/intel-svm.h |  20 +--
 include/linux/iommu.h | 223 +-
 include/uapi/linux/iommu.h| 101 
 10 files changed, 1047 insertions(+), 61 deletions(-)
 create mode 100644 include/uapi/linux/iommu.h

-- 
2.7.4



[PATCH v3 02/16] iommu/vt-d: add bind_pasid_table function

2017-11-17 Thread Jacob Pan
Add Intel VT-d ops to the generic iommu_bind_pasid_table API
functions.

The primary use case is direct assignment of an SVM-capable device.
The request originates from the emulated IOMMU in the guest and goes
through many layers (e.g. VFIO). Upon calling the host IOMMU driver, the
caller passes the guest PASID table pointer (a GPA) and its size.

The device context table entry is modified by the Intel IOMMU specific
bind_pasid_table function, which turns on nested mode and the matching
translation type.

The unbind operation restores default context mapping.

Signed-off-by: Jacob Pan 
Signed-off-by: Liu, Yi L 
Signed-off-by: Ashok Raj 
---
 drivers/iommu/intel-iommu.c   | 107 ++
 include/linux/dma_remapping.h |   1 +
 2 files changed, 108 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 2087cd8..3d1901d 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5176,6 +5176,7 @@ static void intel_iommu_put_resv_regions(struct device 
*dev,
 
 #ifdef CONFIG_INTEL_IOMMU_SVM
 #define MAX_NR_PASID_BITS (20)
+#define MIN_NR_PASID_BITS (5)
 static inline unsigned long intel_iommu_get_pts(struct intel_iommu *iommu)
 {
/*
@@ -5302,6 +5303,108 @@ struct intel_iommu *intel_svm_device_to_iommu(struct 
device *dev)
 
return iommu;
 }
+
+static int intel_iommu_bind_pasid_table(struct iommu_domain *domain,
+   struct device *dev, struct pasid_table_config *pasidt_binfo)
+{
+   struct intel_iommu *iommu;
+   struct context_entry *context;
+   struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+   struct device_domain_info *info;
+   struct pci_dev *pdev;
+   u8 bus, devfn, host_table_pasid_bits;
+   u16 did, sid;
+   int ret = 0;
+   unsigned long flags;
+   u64 ctx_lo;
+
+   iommu = device_to_iommu(dev, , );
+   if (!iommu)
+   return -ENODEV;
+   /* VT-d spec 9.4 says pasid table size is encoded as 2^(x+5) */
+   host_table_pasid_bits = intel_iommu_get_pts(iommu) + MIN_NR_PASID_BITS;
+   if (!pasidt_binfo || pasidt_binfo->pasid_bits > host_table_pasid_bits ||
+   pasidt_binfo->pasid_bits < MIN_NR_PASID_BITS) {
+   pr_err("Invalid gPASID bits %d, host range %d - %d\n",
+   pasidt_binfo->pasid_bits,
+   MIN_NR_PASID_BITS, host_table_pasid_bits);
+   return -ERANGE;
+   }
+
+   pdev = to_pci_dev(dev);
+   sid = PCI_DEVID(bus, devfn);
+   info = dev->archdata.iommu;
+
+   if (!info) {
+   dev_err(dev, "Invalid device domain info\n");
+   ret = -EINVAL;
+   goto out;
+   }
+   if (!info->pasid_enabled) {
+   ret = pci_enable_pasid(pdev, info->pasid_supported & ~1);
+   if (ret) {
+   dev_err(dev, "Failed to enable PASID\n");
+   goto out;
+   }
+   }
+   if (!device_context_mapped(iommu, bus, devfn)) {
+   pr_warn("ctx not mapped for bus devfn %x:%x\n", bus, devfn);
+   ret = -EINVAL;
+   goto out;
+   }
+   spin_lock_irqsave(&iommu->lock, flags);
+   context = iommu_context_addr(iommu, bus, devfn, 0);
+   if (!context) {
+   ret = -EINVAL;
+   goto out_unlock;
+   }
+
+   /* Anticipate guest to use SVM and owns the first level, so we turn
+* nested mode on
+*/
+   ctx_lo = context[0].lo;
+   ctx_lo |= CONTEXT_NESTE | CONTEXT_PRS | CONTEXT_PASIDE;
+   ctx_lo &= ~CONTEXT_TT_MASK;
+   ctx_lo |= CONTEXT_TT_DEV_IOTLB << 2;
+   context[0].lo = ctx_lo;
+
+   /* Assign guest PASID table pointer and size order */
+   ctx_lo = (pasidt_binfo->base_ptr & VTD_PAGE_MASK) |
+   (pasidt_binfo->pasid_bits - MIN_NR_PASID_BITS);
+   context[1].lo = ctx_lo;
+   /* make sure context entry is updated before flushing */
+   wmb();
+   did = dmar_domain->iommu_did[iommu->seq_id];
+   iommu->flush.flush_context(iommu, did,
+   (((u16)bus) << 8) | devfn,
+   DMA_CCMD_MASK_NOBIT,
+   DMA_CCMD_DEVICE_INVL);
+   iommu->flush.flush_iotlb(iommu, did, 0, 0, DMA_TLB_DSI_FLUSH);
+
+out_unlock:
+   spin_unlock_irqrestore(&iommu->lock, flags);
+out:
+   return ret;
+}
+
+static void intel_iommu_unbind_pasid_table(struct iommu_domain *domain,
+   struct device *dev)
+{
+   struct intel_iommu *iommu;
+   struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+   u8 bus, devfn;
+
+   assert_spin_locked(&device_domain_lock);
+   iommu = device_to_iommu(dev, , );
+   if (!iommu) {
+   dev_err(dev, "No IOMMU for device to unbind PASID table\n");
+   return;
+   }
+
+  

[PATCH v3 03/16] iommu: introduce iommu invalidate API function

2017-11-17 Thread Jacob Pan
From: "Liu, Yi L" 

When an SVM capable device is assigned to a guest, the first level page
tables are owned by the guest and the guest PASID table pointer is
linked to the device context entry of the physical IOMMU.

The host IOMMU driver has no knowledge of caching structure updates
unless the guest invalidation activities are passed down to the host. The
primary usage is derived from the emulated IOMMU in the guest, where QEMU
can trap invalidation activities before passing them down to the
host/physical IOMMU.
Since the invalidation data is obtained from user space and will be
written into the physical IOMMU, security checks must be allowed at
various layers. Therefore, a generic invalidation data format is proposed
here; model-specific IOMMU drivers need to convert it into their own format.
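
For illustration only, a small sketch of how a caller might map a guest
request onto the granularity enum added by this patch. The guest_inv_scope
encoding below is invented for the example and is not part of the series;
only the IOMMU_INV_GRANU_* values come from include/uapi/linux/iommu.h as
extended here.

/* Invented guest-side encoding, for the example only. */
enum guest_inv_scope {
	GUEST_INV_GLOBAL,
	GUEST_INV_DOMAIN,
	GUEST_INV_PASID,
	GUEST_INV_PAGE,
};

static enum iommu_inv_granularity guest_scope_to_granu(enum guest_inv_scope s)
{
	switch (s) {
	case GUEST_INV_GLOBAL:
		return IOMMU_INV_GRANU_GLOBAL;
	case GUEST_INV_DOMAIN:
		return IOMMU_INV_GRANU_DOMAIN;
	case GUEST_INV_PASID:
		return IOMMU_INV_GRANU_PASID_SEL;
	case GUEST_INV_PAGE:
	default:
		return IOMMU_INV_GRANU_PAGE_PASID;
	}
}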

Signed-off-by: Liu, Yi L 
Signed-off-by: Jacob Pan 
Signed-off-by: Ashok Raj 
---
 drivers/iommu/iommu.c  | 14 +++
 include/linux/iommu.h  | 12 +
 include/uapi/linux/iommu.h | 62 ++
 3 files changed, 88 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index c7e0d64..829e9e9 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1341,6 +1341,20 @@ void iommu_unbind_pasid_table(struct iommu_domain 
*domain, struct device *dev)
 }
 EXPORT_SYMBOL_GPL(iommu_unbind_pasid_table);
 
+int iommu_sva_invalidate(struct iommu_domain *domain,
+   struct device *dev, struct tlb_invalidate_info *inv_info)
+{
+   int ret = 0;
+
+   if (unlikely(!domain->ops->sva_invalidate))
+   return -ENODEV;
+
+   ret = domain->ops->sva_invalidate(domain, dev, inv_info);
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_sva_invalidate);
+
 static void __iommu_detach_device(struct iommu_domain *domain,
  struct device *dev)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 0f6f6c5..da684a7 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -190,6 +190,7 @@ struct iommu_resv_region {
  * @pgsize_bitmap: bitmap of all possible supported page sizes
  * @bind_pasid_table: bind pasid table pointer for guest SVM
  * @unbind_pasid_table: unbind pasid table pointer and restore defaults
+ * @sva_invalidate: invalidate translation caches of shared virtual address
  */
 struct iommu_ops {
bool (*capable)(enum iommu_cap);
@@ -243,6 +244,8 @@ struct iommu_ops {
struct pasid_table_config *pasidt_binfo);
void (*unbind_pasid_table)(struct iommu_domain *domain,
struct device *dev);
+   int (*sva_invalidate)(struct iommu_domain *domain,
+   struct device *dev, struct tlb_invalidate_info *inv_info);
 
unsigned long pgsize_bitmap;
 };
@@ -309,6 +312,9 @@ extern int iommu_bind_pasid_table(struct iommu_domain 
*domain,
struct device *dev, struct pasid_table_config *pasidt_binfo);
 extern void iommu_unbind_pasid_table(struct iommu_domain *domain,
struct device *dev);
+extern int iommu_sva_invalidate(struct iommu_domain *domain,
+   struct device *dev, struct tlb_invalidate_info *inv_info);
+
 extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
 phys_addr_t paddr, size_t size, int prot);
@@ -720,6 +726,12 @@ void iommu_unbind_pasid_table(struct iommu_domain *domain, 
struct device *dev)
 {
 }
 
+static inline int iommu_sva_invalidate(struct iommu_domain *domain,
+   struct device *dev, struct tlb_invalidate_info *inv_info)
+{
+   return -EINVAL;
+}
+
 #endif /* CONFIG_IOMMU_API */
 
 #endif /* __LINUX_IOMMU_H */
diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
index 651ad5d..039ba36 100644
--- a/include/uapi/linux/iommu.h
+++ b/include/uapi/linux/iommu.h
@@ -36,4 +36,66 @@ struct pasid_table_config {
};
 };
 
+enum iommu_inv_granularity {
+   IOMMU_INV_GRANU_GLOBAL, /* all TLBs invalidated */
+   IOMMU_INV_GRANU_DOMAIN, /* all TLBs associated with a domain */
+   IOMMU_INV_GRANU_DEVICE, /* caching structure associated with a
+* device ID
+*/
+   IOMMU_INV_GRANU_DOMAIN_PAGE,/* address range with a domain */
+   IOMMU_INV_GRANU_ALL_PASID,  /* cache of a given PASID */
+   IOMMU_INV_GRANU_PASID_SEL,  /* only invalidate specified PASID */
+
+   IOMMU_INV_GRANU_NG_ALL_PASID,   /* non-global within all PASIDs */
+   IOMMU_INV_GRANU_NG_PASID,   /* non-global within a PASIDs */
+   IOMMU_INV_GRANU_PAGE_PASID, /* page-selective within a PASID */
+   IOMMU_INV_NR_GRANU,
+};
+
+enum iommu_inv_type {
+   

[PATCH v3 01/16] iommu: introduce bind_pasid_table API function

2017-11-17 Thread Jacob Pan
Virtual IOMMU was proposed to support Shared Virtual Memory (SVM)
use in the guest:
https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg05311.html

As part of the proposed architecture, when an SVM capable PCI
device is assigned to a guest, nested mode is turned on. The guest owns
the first level page tables (requests with PASID), which perform GVA->GPA
translation. Second level page tables are owned by the host for GPA->HPA
translation of requests both with and without PASID.

A new IOMMU driver interface is therefore needed to perform tasks as
follows:
* Enable nested translation and appropriate translation type
* Assign guest PASID table pointer (in GPA) and size to host IOMMU

This patch introduces new API functions to bind/unbind guest PASID
tables. Based on the common data, model-specific IOMMU drivers can be
extended to perform the specific steps for binding the PASID table of
an assigned device.
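
As a hedged sketch of the driver side (the names below are placeholders;
the real VT-d implementation is added in a later patch of this series), a
vendor driver would wire the new callbacks into its iommu_ops like this:

static int example_bind_pasid_table(struct iommu_domain *domain,
				    struct device *dev,
				    struct pasid_table_config *pasidt_binfo)
{
	/* program the device context entry with the guest PASID table */
	return 0;
}

static void example_unbind_pasid_table(struct iommu_domain *domain,
				       struct device *dev)
{
	/* restore the default context mapping */
}

static const struct iommu_ops example_iommu_ops = {
	/* ... existing callbacks ... */
	.bind_pasid_table	= example_bind_pasid_table,
	.unbind_pasid_table	= example_unbind_pasid_table,
};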

Signed-off-by: Jacob Pan 
Signed-off-by: Liu, Yi L 
Signed-off-by: Ashok Raj 
---
 drivers/iommu/iommu.c  | 19 +++
 include/linux/iommu.h  | 24 
 include/uapi/linux/iommu.h | 39 +++
 3 files changed, 82 insertions(+)
 create mode 100644 include/uapi/linux/iommu.h

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 3de5c0b..c7e0d64 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1322,6 +1322,25 @@ int iommu_attach_device(struct iommu_domain *domain, 
struct device *dev)
 }
 EXPORT_SYMBOL_GPL(iommu_attach_device);
 
+int iommu_bind_pasid_table(struct iommu_domain *domain, struct device *dev,
+   struct pasid_table_config *pasidt_binfo)
+{
+   if (unlikely(!domain->ops->bind_pasid_table))
+   return -ENODEV;
+
+   return domain->ops->bind_pasid_table(domain, dev, pasidt_binfo);
+}
+EXPORT_SYMBOL_GPL(iommu_bind_pasid_table);
+
+void iommu_unbind_pasid_table(struct iommu_domain *domain, struct device *dev)
+{
+   if (unlikely(!domain->ops->unbind_pasid_table))
+   return;
+
+   domain->ops->unbind_pasid_table(domain, dev);
+}
+EXPORT_SYMBOL_GPL(iommu_unbind_pasid_table);
+
 static void __iommu_detach_device(struct iommu_domain *domain,
  struct device *dev)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 41b8c57..0f6f6c5 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define IOMMU_READ (1 << 0)
 #define IOMMU_WRITE(1 << 1)
@@ -187,6 +188,8 @@ struct iommu_resv_region {
  * @domain_get_windows: Return the number of windows for a domain
  * @of_xlate: add OF master IDs to iommu grouping
  * @pgsize_bitmap: bitmap of all possible supported page sizes
+ * @bind_pasid_table: bind pasid table pointer for guest SVM
+ * @unbind_pasid_table: unbind pasid table pointer and restore defaults
  */
 struct iommu_ops {
bool (*capable)(enum iommu_cap);
@@ -233,8 +236,14 @@ struct iommu_ops {
u32 (*domain_get_windows)(struct iommu_domain *domain);
 
int (*of_xlate)(struct device *dev, struct of_phandle_args *args);
+
bool (*is_attach_deferred)(struct iommu_domain *domain, struct device 
*dev);
 
+   int (*bind_pasid_table)(struct iommu_domain *domain, struct device *dev,
+   struct pasid_table_config *pasidt_binfo);
+   void (*unbind_pasid_table)(struct iommu_domain *domain,
+   struct device *dev);
+
unsigned long pgsize_bitmap;
 };
 
@@ -296,6 +305,10 @@ extern int iommu_attach_device(struct iommu_domain *domain,
   struct device *dev);
 extern void iommu_detach_device(struct iommu_domain *domain,
struct device *dev);
+extern int iommu_bind_pasid_table(struct iommu_domain *domain,
+   struct device *dev, struct pasid_table_config *pasidt_binfo);
+extern void iommu_unbind_pasid_table(struct iommu_domain *domain,
+   struct device *dev);
 extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
 phys_addr_t paddr, size_t size, int prot);
@@ -696,6 +709,17 @@ const struct iommu_ops *iommu_ops_from_fwnode(struct 
fwnode_handle *fwnode)
return NULL;
 }
 
+static inline
+int iommu_bind_pasid_table(struct iommu_domain *domain, struct device *dev,
+   struct pasid_table_config *pasidt_binfo)
+{
+   return -EINVAL;
+}
+static inline
+void iommu_unbind_pasid_table(struct iommu_domain *domain, struct device *dev)
+{
+}
+
 #endif /* CONFIG_IOMMU_API */
 
 #endif /* __LINUX_IOMMU_H */
diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
new file mode 100644
index 000..651ad5d
--- /dev/null
+++ 

[PATCH v3 04/16] iommu/vt-d: move device_domain_info to header

2017-11-17 Thread Jacob Pan
Allow both intel-iommu.c and dmar.c to access device_domain_info.
Prepare for additional per-device arch data used in the TLB flush functions.
Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel-iommu.c | 18 --
 include/linux/intel-iommu.h | 19 +++
 2 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 3d1901d..399b504 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -391,24 +391,6 @@ struct dmar_domain {
   iommu core */
 };
 
-/* PCI domain-device relationship */
-struct device_domain_info {
-   struct list_head link;  /* link to domain siblings */
-   struct list_head global; /* link to global list */
-   u8 bus; /* PCI bus number */
-   u8 devfn;   /* PCI devfn number */
-   u8 pasid_supported:3;
-   u8 pasid_enabled:1;
-   u8 pri_supported:1;
-   u8 pri_enabled:1;
-   u8 ats_supported:1;
-   u8 ats_enabled:1;
-   u8 ats_qdep;
-   struct device *dev; /* it's NULL for PCIe-to-PCI bridge */
-   struct intel_iommu *iommu; /* IOMMU used by this device */
-   struct dmar_domain *domain; /* pointer to domain */
-};
-
 struct dmar_rmrr_unit {
struct list_head list;  /* list of rmrr units   */
struct acpi_dmar_header *hdr;   /* ACPI header  */
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 77ea056..8d38e24 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -458,6 +458,25 @@ struct intel_iommu {
u32 flags;  /* Software defined flags */
 };
 
+/* PCI domain-device relationship */
+struct device_domain_info {
+   struct list_head link;  /* link to domain siblings */
+   struct list_head global; /* link to global list */
+   u8 bus; /* PCI bus number */
+   u8 devfn;   /* PCI devfn number */
+   u8 pasid_supported:3;
+   u8 pasid_enabled:1;
+   u8 pri_supported:1;
+   u8 pri_enabled:1;
+   u8 ats_supported:1;
+   u8 ats_enabled:1;
+   u8 ats_qdep;
+   u64 fault_mask; /* selected IOMMU faults to be reported */
+   struct device *dev; /* it's NULL for PCIe-to-PCI bridge */
+   struct intel_iommu *iommu; /* IOMMU used by this device */
+   struct dmar_domain *domain; /* pointer to domain */
+};
+
 static inline void __iommu_flush_cache(
struct intel_iommu *iommu, void *addr, int size)
 {
-- 
2.7.4



[RFC PATCH v2 5/5] ACPI/IORT: Move IORT to the ACPI folder

2017-11-17 Thread Jean-Philippe Brucker
IORT can be used (by QEMU) to describe a virtual topology containing an
architecture-agnostic paravirtualized device. The rationale behind this
blasphemy is explained in patch 4/5.

In order to build IORT for x86 systems, the driver has to be moved outside
of arm64/. Since there is nothing specific to arm64 in the driver, it
simply requires moving Makefile and Kconfig entries.

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/acpi/Kconfig| 3 +++
 drivers/acpi/Makefile   | 1 +
 drivers/acpi/arm64/Kconfig  | 3 ---
 drivers/acpi/arm64/Makefile | 1 -
 drivers/acpi/{arm64 => }/iort.c | 0
 5 files changed, 4 insertions(+), 4 deletions(-)
 rename drivers/acpi/{arm64 => }/iort.c (100%)

diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index 5b1938f4b626..ce40275646c8 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -536,4 +536,7 @@ if ARM64
 source "drivers/acpi/arm64/Kconfig"
 endif
 
+config ACPI_IORT
+   bool
+
 endif  # ACPI
diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
index cd1abc9bc325..689c470c013b 100644
--- a/drivers/acpi/Makefile
+++ b/drivers/acpi/Makefile
@@ -112,3 +112,4 @@ video-objs  += acpi_video.o video_detect.o
 obj-y  += dptf/
 
 obj-$(CONFIG_ARM64)+= arm64/
+obj-$(CONFIG_ACPI_IORT)+= iort.o
diff --git a/drivers/acpi/arm64/Kconfig b/drivers/acpi/arm64/Kconfig
index 5a6f80fce0d6..403f917ab274 100644
--- a/drivers/acpi/arm64/Kconfig
+++ b/drivers/acpi/arm64/Kconfig
@@ -2,8 +2,5 @@
 # ACPI Configuration for ARM64
 #
 
-config ACPI_IORT
-   bool
-
 config ACPI_GTDT
bool
diff --git a/drivers/acpi/arm64/Makefile b/drivers/acpi/arm64/Makefile
index 1017def2ea12..47925dc6cfc8 100644
--- a/drivers/acpi/arm64/Makefile
+++ b/drivers/acpi/arm64/Makefile
@@ -1,2 +1 @@
-obj-$(CONFIG_ACPI_IORT)+= iort.o
 obj-$(CONFIG_ACPI_GTDT)+= gtdt.o
diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/iort.c
similarity index 100%
rename from drivers/acpi/arm64/iort.c
rename to drivers/acpi/iort.c
-- 
2.14.3



[RFC PATCH v2 4/5] ACPI/IORT: Support paravirtualized IOMMU

2017-11-17 Thread Jean-Philippe Brucker
To describe the virtual topology in relation to a virtio-iommu device,
ACPI-based systems use a "paravirtualized IOMMU" IORT node. Add support
for it.

This is an RFC because the IORT specification doesn't describe the
paravirtualized node at the moment; it is only provided as an example in
the virtio-iommu spec. What we need to do first is confirm that x86
kernels are able to use the IORT driver with the virtio-iommu. There isn't
anything specific to arm64 in the driver but there might be other blockers
we're not aware of (I know for example that x86 also requires custom DMA
ops rather than iommu-dma ones, but it's unrelated) so this needs to be
tested on the x86 prototype.

Rationale: virtio-iommu requires an ACPI table to be passed between host
and guest that describes its relation to PCI and platform endpoints in the
virtual system: a table that maps PCI RIDs and integrated devices to IOMMU
device IDs, telling the IOMMU driver which endpoints it manages.

As far as I'm aware, there are three existing tables that solve this
problem: Intel DMAR, AMD IVRS and ARM IORT. The first two are specific to
Intel VT-d and AMD IOMMU respectively, while the third describes multiple
remapping devices -- currently only ARM IOMMUs and MSI controllers, but it
is easy to extend.

IORT table and drivers are easiest to extend and they do the job, so
rather than introducing a fourth solution to solve a generic problem,
reuse what exists.
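
For readers without the spec draft at hand, the proposed paravirtualized
IOMMU node carries roughly the fields consumed by the code below. This is
only a sketch with types inferred from usage; the authoritative layout is
in the actbl2.h hunk of this patch and in the virtio-iommu specification.

/* Rough shape only; see the actbl2.h change in this patch for the real one. */
struct acpi_iort_pviommu_sketch {
	u64 base_address;	/* MMIO base of the virtio-iommu */
	u64 span;		/* size of the MMIO region */
	u32 flags;		/* e.g. ACPI_IORT_NODE_PV_CACHE_COHERENT */
	u32 interrupt_count;
	u64 interrupts[];	/* GSIV plus trigger flags per entry */
};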

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/acpi/arm64/iort.c | 95 +++
 drivers/iommu/Kconfig |  1 +
 include/acpi/actbl2.h | 18 -
 3 files changed, 106 insertions(+), 8 deletions(-)

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index fde279b0a6d8..c7132e4a0560 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -29,7 +29,8 @@
 #define IORT_TYPE_MASK(type)   (1 << (type))
 #define IORT_MSI_TYPE  (1 << ACPI_IORT_NODE_ITS_GROUP)
 #define IORT_IOMMU_TYPE((1 << ACPI_IORT_NODE_SMMU) |   \
-   (1 << ACPI_IORT_NODE_SMMU_V3))
+   (1 << ACPI_IORT_NODE_SMMU_V3) | \
+   (1 << ACPI_IORT_NODE_PARAVIRT))
 
 /* Until ACPICA headers cover IORT rev. C */
 #ifndef ACPI_IORT_SMMU_V3_CAVIUM_CN99XX
@@ -616,6 +617,8 @@ static inline bool iort_iommu_driver_enabled(u8 type)
return IS_BUILTIN(CONFIG_ARM_SMMU_V3);
case ACPI_IORT_NODE_SMMU:
return IS_BUILTIN(CONFIG_ARM_SMMU);
+   case ACPI_IORT_NODE_PARAVIRT:
+   return IS_BUILTIN(CONFIG_VIRTIO_IOMMU);
default:
pr_warn("IORT node type %u does not describe an SMMU\n", type);
return false;
@@ -1062,6 +1065,48 @@ static bool __init arm_smmu_is_coherent(struct 
acpi_iort_node *node)
return smmu->flags & ACPI_IORT_SMMU_COHERENT_WALK;
 }
 
+static int __init paravirt_count_resources(struct acpi_iort_node *node)
+{
+   struct acpi_iort_pviommu *pviommu;
+
+   pviommu = (struct acpi_iort_pviommu *)node->node_data;
+
+   /* Mem + IRQs */
+   return 1 + pviommu->interrupt_count;
+}
+
+static void __init paravirt_init_resources(struct resource *res,
+  struct acpi_iort_node *node)
+{
+   int i;
+   int num_res = 0;
+   int hw_irq, trigger;
+   struct acpi_iort_pviommu *pviommu;
+
+   pviommu = (struct acpi_iort_pviommu *)node->node_data;
+
+   res[num_res].start = pviommu->base_address;
+   res[num_res].end = pviommu->base_address + pviommu->span - 1;
+   res[num_res].flags = IORESOURCE_MEM;
+   num_res++;
+
+   for (i = 0; i < pviommu->interrupt_count; i++) {
+   hw_irq = IORT_IRQ_MASK(pviommu->interrupts[i]);
+   trigger = IORT_IRQ_TRIGGER_MASK(pviommu->interrupts[i]);
+
+   acpi_iort_register_irq(hw_irq, "pviommu", trigger, &res[num_res++]);
+   }
+}
+
+static bool __init paravirt_is_coherent(struct acpi_iort_node *node)
+{
+   struct acpi_iort_pviommu *pviommu;
+
+   pviommu = (struct acpi_iort_pviommu *)node->node_data;
+
+   return pviommu->flags & ACPI_IORT_NODE_PV_CACHE_COHERENT;
+}
+
 struct iort_iommu_config {
const char *name;
int (*iommu_init)(struct acpi_iort_node *node);
@@ -1088,6 +1133,13 @@ static const struct iort_iommu_config iort_arm_smmu_cfg 
__initconst = {
.iommu_init_resources = arm_smmu_init_resources
 };
 
+static const struct iort_iommu_config iort_paravirt_cfg __initconst = {
+   .name = "pviommu",
+   .iommu_is_coherent = paravirt_is_coherent,
+   .iommu_count_resources = paravirt_count_resources,
+   .iommu_init_resources = paravirt_init_resources
+};
+
 static __init
 const struct iort_iommu_config *iort_get_iommu_cfg(struct acpi_iort_node *node)
 {
@@ -1096,18 +1148,22 @@ const struct iort_iommu_config 

[RFC PATCH v2 2/5] iommu/virtio-iommu: Add probe request

2017-11-17 Thread Jean-Philippe Brucker
When the device offers the probe feature, send a probe request for each
device managed by the IOMMU and extract the RESV_MEM information. When we
encounter an MSI doorbell region, set it up as an IOMMU_RESV_MSI region.
This tells other subsystems that there is no need to map the MSI
doorbell in the virtio-iommu, because MSIs bypass it.
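
The probe reply is a simple type/length/value stream. Below is a
standalone sketch of the walk that viommu_probe_endpoint() performs in
this patch (userspace C; a little-endian host is assumed, and the
endianness conversions and type mask are omitted for brevity):

#include <stdint.h>
#include <stddef.h>

/* Simplified property header: 16-bit type, 16-bit length, then the value. */
struct probe_prop {
	uint16_t type;
	uint16_t length;
	uint8_t value[];
};

#define PROBE_T_NONE 0	/* stand-in for VIRTIO_IOMMU_PROBE_T_NONE */

static void walk_probe_properties(const uint8_t *buf, size_t probe_size,
				  void (*handle)(uint16_t type,
						 const uint8_t *val,
						 uint16_t len))
{
	size_t cur = 0;

	while (cur + sizeof(struct probe_prop) <= probe_size) {
		const struct probe_prop *prop =
			(const struct probe_prop *)(buf + cur);

		if (prop->type == PROBE_T_NONE)
			break;

		handle(prop->type, prop->value, prop->length);
		cur += sizeof(*prop) + prop->length;
	}
}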

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/virtio-iommu.c  | 165 --
 include/uapi/linux/virtio_iommu.h |  37 +
 2 files changed, 195 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index feb8c8925c3a..79e0add94e05 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -45,6 +45,7 @@ struct viommu_dev {
struct iommu_domain_geometrygeometry;
u64 pgsize_bitmap;
u8  domain_bits;
+   u32 probe_size;
 };
 
 struct viommu_mapping {
@@ -72,6 +73,7 @@ struct viommu_domain {
 struct viommu_endpoint {
struct viommu_dev   *viommu;
struct viommu_domain*vdomain;
+   struct list_headresv_regions;
 };
 
 struct viommu_request {
@@ -139,6 +141,10 @@ static int viommu_get_req_size(struct viommu_dev *viommu,
case VIRTIO_IOMMU_T_UNMAP:
size = sizeof(r->unmap);
break;
+   case VIRTIO_IOMMU_T_PROBE:
+   *bottom += viommu->probe_size;
+   size = sizeof(r->probe) + *bottom;
+   break;
default:
return -EINVAL;
}
@@ -448,6 +454,106 @@ static int viommu_replay_mappings(struct viommu_domain 
*vdomain)
return ret;
 }
 
+static int viommu_add_resv_mem(struct viommu_endpoint *vdev,
+  struct virtio_iommu_probe_resv_mem *mem,
+  size_t len)
+{
+   struct iommu_resv_region *region = NULL;
+   unsigned long prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
+
+   u64 addr = le64_to_cpu(mem->addr);
+   u64 size = le64_to_cpu(mem->size);
+
+   if (len < sizeof(*mem))
+   return -EINVAL;
+
+   switch (mem->subtype) {
+   case VIRTIO_IOMMU_RESV_MEM_T_MSI:
+   region = iommu_alloc_resv_region(addr, size, prot,
+IOMMU_RESV_MSI);
+   break;
+   case VIRTIO_IOMMU_RESV_MEM_T_RESERVED:
+   default:
+   region = iommu_alloc_resv_region(addr, size, 0,
+IOMMU_RESV_RESERVED);
+   break;
+   }
+
+   list_add(&vdev->resv_regions, &region->list);
+
+   if (mem->subtype != VIRTIO_IOMMU_RESV_MEM_T_RESERVED &&
+   mem->subtype != VIRTIO_IOMMU_RESV_MEM_T_MSI) {
+   /* Please update your driver. */
+   pr_warn("unknown resv mem subtype 0x%x\n", mem->subtype);
+   return -EINVAL;
+   }
+
+   return 0;
+}
+
+static int viommu_probe_endpoint(struct viommu_dev *viommu, struct device *dev)
+{
+   int ret;
+   u16 type, len;
+   size_t cur = 0;
+   struct virtio_iommu_req_probe *probe;
+   struct virtio_iommu_probe_property *prop;
+   struct iommu_fwspec *fwspec = dev->iommu_fwspec;
+   struct viommu_endpoint *vdev = fwspec->iommu_priv;
+
+   if (!fwspec->num_ids)
+   /* Trouble ahead. */
+   return -EINVAL;
+
+   probe = kzalloc(sizeof(*probe) + viommu->probe_size +
+   sizeof(struct virtio_iommu_req_tail), GFP_KERNEL);
+   if (!probe)
+   return -ENOMEM;
+
+   probe->head.type = VIRTIO_IOMMU_T_PROBE;
+   /*
+* For now, assume that properties of an endpoint that outputs multiple
+* IDs are consistent. Only probe the first one.
+*/
+   probe->endpoint = cpu_to_le32(fwspec->ids[0]);
+
+   ret = viommu_send_req_sync(viommu, probe);
+   if (ret) {
+   kfree(probe);
+   return ret;
+   }
+
+   prop = (void *)probe->properties;
+   type = le16_to_cpu(prop->type) & VIRTIO_IOMMU_PROBE_T_MASK;
+
+   while (type != VIRTIO_IOMMU_PROBE_T_NONE &&
+  cur < viommu->probe_size) {
+   len = le16_to_cpu(prop->length);
+
+   switch (type) {
+   case VIRTIO_IOMMU_PROBE_T_RESV_MEM:
+   ret = viommu_add_resv_mem(vdev, (void *)prop->value, 
len);
+   break;
+   default:
+   dev_dbg(dev, "unknown viommu prop 0x%x\n", type);
+   }
+
+   if (ret)
+   dev_err(dev, "failed to parse viommu prop 0x%x\n", 
type);
+
+   cur += sizeof(*prop) + len;
+   if (cur >= viommu->probe_size)
+   break;
+
+   prop = (void 

[RFC PATCH v2 3/5] iommu/virtio-iommu: Add event queue

2017-11-17 Thread Jean-Philippe Brucker
The event queue offers a way for the device to report access faults coming
from endpoints. It is implemented on virtqueue #1: whenever the host needs
to signal a fault, it fills one of the buffers offered by the guest and
interrupts it.
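
A hedged sketch of the usual virtio pattern this relies on: the guest
keeps event buffers posted on the queue so the device always has
somewhere to write a fault. The helper name and error handling below are
made up for the example; the real code is in this patch.

static int example_fill_evtq(struct viommu_dev *viommu,
			     struct viommu_event *evts, unsigned int nr)
{
	struct virtqueue *vq = viommu->vqs[VIOMMU_EVENT_VQ];
	struct scatterlist sg[1];
	unsigned int i;
	int ret;

	for (i = 0; i < nr; i++) {
		/* Post each event buffer so the device can fill it later */
		sg_init_one(sg, &evts[i], sizeof(*evts));
		ret = virtqueue_add_inbuf(vq, sg, 1, &evts[i], GFP_KERNEL);
		if (ret)
			return ret;
	}
	virtqueue_kick(vq);
	return 0;
}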

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/virtio-iommu.c  | 138 ++
 include/uapi/linux/virtio_iommu.h |  18 +
 2 files changed, 142 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 79e0add94e05..fe0d449bf489 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -30,6 +30,12 @@
 #define MSI_IOVA_BASE  0x800
 #define MSI_IOVA_LENGTH0x10
 
+enum viommu_vq_idx {
+   VIOMMU_REQUEST_VQ   = 0,
+   VIOMMU_EVENT_VQ = 1,
+   VIOMMU_NUM_VQS  = 2,
+};
+
 struct viommu_dev {
struct iommu_device iommu;
struct device   *dev;
@@ -37,7 +43,7 @@ struct viommu_dev {
 
struct ida  domain_ids;
 
-   struct virtqueue*vq;
+   struct virtqueue*vqs[VIOMMU_NUM_VQS];
/* Serialize anything touching the request queue */
spinlock_t  request_lock;
 
@@ -84,6 +90,15 @@ struct viommu_request {
struct list_headlist;
 };
 
+#define VIOMMU_FAULT_RESV_MASK 0xff00
+
+struct viommu_event {
+   union {
+   u32 head;
+   struct virtio_iommu_fault fault;
+   };
+};
+
 #define to_viommu_domain(domain) container_of(domain, struct viommu_domain, 
domain)
 
 /* Virtio transport */
@@ -160,12 +175,13 @@ static int viommu_receive_resp(struct viommu_dev *viommu, 
int nr_sent,
unsigned int len;
int nr_received = 0;
struct viommu_request *req, *pending;
+   struct virtqueue *vq = viommu->vqs[VIOMMU_REQUEST_VQ];
 
pending = list_first_entry_or_null(sent, struct viommu_request, list);
if (WARN_ON(!pending))
return 0;
 
-   while ((req = virtqueue_get_buf(viommu->vq, &len)) != NULL) {
+   while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
if (req != pending) {
dev_warn(viommu->dev, "discarding stale request\n");
continue;
@@ -202,6 +218,7 @@ static int _viommu_send_reqs_sync(struct viommu_dev *viommu,
 * dies.
 */
unsigned long timeout_ms = 1000;
+   struct virtqueue *vq = viommu->vqs[VIOMMU_REQUEST_VQ];
 
*nr_sent = 0;
 
@@ -211,15 +228,14 @@ static int _viommu_send_reqs_sync(struct viommu_dev 
*viommu,
		sg[0] = &req->top;
		sg[1] = &req->bottom;
 
-   ret = virtqueue_add_sgs(viommu->vq, sg, 1, 1, req,
-   GFP_ATOMIC);
+   ret = virtqueue_add_sgs(vq, sg, 1, 1, req, GFP_ATOMIC);
if (ret)
break;
 
list_add_tail(>list, );
}
 
-   if (i && !virtqueue_kick(viommu->vq))
+   if (i && !virtqueue_kick(vq))
return -EPIPE;
 
timeout = ktime_add_ms(ktime_get(), timeout_ms * i);
@@ -554,6 +570,70 @@ static int viommu_probe_endpoint(struct viommu_dev 
*viommu, struct device *dev)
return 0;
 }
 
+static int viommu_fault_handler(struct viommu_dev *viommu,
+   struct virtio_iommu_fault *fault)
+{
+   char *reason_str;
+
+   u8 reason   = fault->reason;
+   u32 flags   = le32_to_cpu(fault->flags);
+   u32 endpoint= le32_to_cpu(fault->endpoint);
+   u64 address = le64_to_cpu(fault->address);
+
+   switch (reason) {
+   case VIRTIO_IOMMU_FAULT_R_DOMAIN:
+   reason_str = "domain";
+   break;
+   case VIRTIO_IOMMU_FAULT_R_MAPPING:
+   reason_str = "page";
+   break;
+   case VIRTIO_IOMMU_FAULT_R_UNKNOWN:
+   default:
+   reason_str = "unknown";
+   break;
+   }
+
+   /* TODO: find EP by ID and report_iommu_fault */
+   if (flags & VIRTIO_IOMMU_FAULT_F_ADDRESS)
+   dev_err_ratelimited(viommu->dev, "%s fault from EP %u at %#llx [%s%s%s]\n",
+   reason_str, endpoint, address,
+   flags & VIRTIO_IOMMU_FAULT_F_READ ? "R" : "",
+   flags & VIRTIO_IOMMU_FAULT_F_WRITE ? "W" : "",
+   flags & VIRTIO_IOMMU_FAULT_F_EXEC ? "X" : "");
+   else
+   dev_err_ratelimited(viommu->dev, "%s fault from EP %u\n",
+   reason_str, endpoint);
+
+   return 0;
+}
+
+static void viommu_event_handler(struct virtqueue *vq)
+{
+   int ret;
+   unsigned int len;
+   struct scatterlist sg[1];
+  

[RFC PATCH v2 1/5] iommu: Add virtio-iommu driver

2017-11-17 Thread Jean-Philippe Brucker
The virtio IOMMU is a para-virtualized device that allows sending IOMMU
requests such as map/unmap over the virtio-mmio transport without
emulating page tables. This implementation handles ATTACH, DETACH, MAP
and UNMAP requests.

The bulk of the code creates requests and sends them through virtio.
Implementing the IOMMU API is fairly straightforward since the
virtio-iommu MAP/UNMAP interface is almost identical to it.
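
One design point worth calling out: every successful map request is also
recorded in an interval tree, so the mappings can be replayed when an
endpoint is attached to an already-populated domain. A hedged sketch of
that bookkeeping follows (the helper name is made up; the real code is in
the patch below, and struct viommu_mapping/viommu_domain are the types it
introduces):

static int example_add_mapping(struct viommu_domain *vdomain,
			       unsigned long iova, phys_addr_t paddr,
			       size_t size)
{
	unsigned long flags;
	struct viommu_mapping *mapping;

	mapping = kzalloc(sizeof(*mapping), GFP_ATOMIC);
	if (!mapping)
		return -ENOMEM;

	mapping->paddr		= paddr;
	mapping->iova.start	= iova;
	mapping->iova.last	= iova + size - 1;

	/* Track the IOVA range so it can be replayed on a later ATTACH */
	spin_lock_irqsave(&vdomain->mappings_lock, flags);
	interval_tree_insert(&mapping->iova, &vdomain->mappings);
	spin_unlock_irqrestore(&vdomain->mappings_lock, flags);

	return 0;
}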

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/Kconfig |  11 +
 drivers/iommu/Makefile|   1 +
 drivers/iommu/virtio-iommu.c  | 958 ++
 include/uapi/linux/virtio_ids.h   |   1 +
 include/uapi/linux/virtio_iommu.h | 140 ++
 5 files changed,  insertions(+)
 create mode 100644 drivers/iommu/virtio-iommu.c
 create mode 100644 include/uapi/linux/virtio_iommu.h

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 17b212f56e6a..7271e59e8b23 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -403,4 +403,15 @@ config QCOM_IOMMU
help
  Support for IOMMU on certain Qualcomm SoCs.
 
+config VIRTIO_IOMMU
+   bool "Virtio IOMMU driver"
+   depends on VIRTIO_MMIO
+   select IOMMU_API
+   select INTERVAL_TREE
+   select ARM_DMA_USE_IOMMU if ARM
+   help
+ Para-virtualised IOMMU driver with virtio.
+
+ Say Y here if you intend to run this kernel as a guest.
+
 endif # IOMMU_SUPPORT
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index dca71fe1c885..432242f3a328 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -31,3 +31,4 @@ obj-$(CONFIG_EXYNOS_IOMMU) += exynos-iommu.o
 obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o
 obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
 obj-$(CONFIG_QCOM_IOMMU) += qcom_iommu.o
+obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
new file mode 100644
index ..feb8c8925c3a
--- /dev/null
+++ b/drivers/iommu/virtio-iommu.c
@@ -0,0 +1,958 @@
+/*
+ * Virtio driver for the paravirtualized IOMMU
+ *
+ * Copyright (C) 2017 ARM Limited
+ * Author: Jean-Philippe Brucker 
+ *
+ * SPDX-License-Identifier: GPL-2.0
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#define MSI_IOVA_BASE  0x800
+#define MSI_IOVA_LENGTH0x10
+
+struct viommu_dev {
+   struct iommu_device iommu;
+   struct device   *dev;
+   struct virtio_device*vdev;
+
+   struct ida  domain_ids;
+
+   struct virtqueue*vq;
+   /* Serialize anything touching the request queue */
+   spinlock_t  request_lock;
+
+   /* Device configuration */
+   struct iommu_domain_geometrygeometry;
+   u64 pgsize_bitmap;
+   u8  domain_bits;
+};
+
+struct viommu_mapping {
+   phys_addr_t paddr;
+   struct interval_tree_node   iova;
+   union {
+   struct virtio_iommu_req_map map;
+   struct virtio_iommu_req_unmap unmap;
+   } req;
+};
+
+struct viommu_domain {
+   struct iommu_domain domain;
+   struct viommu_dev   *viommu;
+   struct mutexmutex;
+   unsigned intid;
+
+   spinlock_t  mappings_lock;
+   struct rb_root_cached   mappings;
+
+   /* Number of endpoints attached to this domain */
+   refcount_t  endpoints;
+};
+
+struct viommu_endpoint {
+   struct viommu_dev   *viommu;
+   struct viommu_domain*vdomain;
+};
+
+struct viommu_request {
+   struct scatterlist  top;
+   struct scatterlist  bottom;
+
+   int written;
+   struct list_headlist;
+};
+
+#define to_viommu_domain(domain) container_of(domain, struct viommu_domain, 
domain)
+
+/* Virtio transport */
+
+static int viommu_status_to_errno(u8 status)
+{
+   switch (status) {
+   case VIRTIO_IOMMU_S_OK:
+   return 0;
+   case VIRTIO_IOMMU_S_UNSUPP:
+   return -ENOSYS;
+   case VIRTIO_IOMMU_S_INVAL:
+   return -EINVAL;
+   case VIRTIO_IOMMU_S_RANGE:
+   return -ERANGE;
+   case VIRTIO_IOMMU_S_NOENT:
+   return -ENOENT;
+   case VIRTIO_IOMMU_S_FAULT:
+   return -EFAULT;
+   case VIRTIO_IOMMU_S_IOERR:
+   case VIRTIO_IOMMU_S_DEVERR:
+   default:
+   return -EIO;
+   }
+}
+
+/*
+ * viommu_get_req_size - 

[RFC PATCH v2 0/5] Add virtio-iommu driver

2017-11-17 Thread Jean-Philippe Brucker
Implement the virtio-iommu driver following version 0.5 of the
specification [1]. The previous version of this code was sent back in
April [2], implementing the first public RFC. Since then there has been a
lot of progress and discussion on the specification side, and I think the
driver is in good shape now.

The reason patches 1-3 are only RFC is that I'm waiting on feedback from
the Virtio TC to reserve a device ID.

List of changes since previous RFC:
* Add per-endpoint probe request, for hardware MSI and reserved regions.
* Add a virtqueue for the device to report translation faults. Only
  non-recoverable ones at the moment.
* Removed the iommu_map_sg specialization for now, because none of the
  device drivers I use for testing (virtio, ixgbe and internal DMA
  engines) seem to use map_sg. This kind of feature is a lot more
  interesting when accompanied by benchmark numbers, and can be added back
  during future optimization work.
* Many fixes and cleanup

The driver works out of the box on DT-based systems, but ACPI support
still needs to be tested and discussed. In the specification I proposed
IORT tables as a nice candidate for describing the virtual topology.
Patches 4 and 5 propose small changes to the IORT driver for
instantiating a paravirtualized IOMMU. The IORT node is described in the
specification [1]. x86 support will also require some hacks since the
driver is based on the IOMMU DMA ops, which x86 doesn't use.

Eric's latest QEMU device [3] works with v0.4. For the moment you can use
the kvmtool device [4] to test v0.5 on arm64, and inject arbitrary fault
with the debug tool. The driver can also be pulled from my Linux tree [5].

[1] https://www.spinics.net/lists/kvm/msg157402.html
[2] https://patchwork.kernel.org/patch/9670273/
[3] https://lists.gnu.org/archive/html/qemu-arm/2017-09/msg00413.html
[4] git://linux-arm.org/kvmtool-jpb.git virtio-iommu/base
[5] git://linux-arm.org/linux-jpb.git virtio-iommu/v0.5-dev

Jean-Philippe Brucker (5):
  iommu: Add virtio-iommu driver
  iommu/virtio-iommu: Add probe request
  iommu/virtio-iommu: Add event queue
  ACPI/IORT: Support paravirtualized IOMMU
  ACPI/IORT: Move IORT to the ACPI folder

 drivers/acpi/Kconfig  |3 +
 drivers/acpi/Makefile |1 +
 drivers/acpi/arm64/Kconfig|3 -
 drivers/acpi/arm64/Makefile   |1 -
 drivers/acpi/{arm64 => }/iort.c   |   95 ++-
 drivers/iommu/Kconfig |   12 +
 drivers/iommu/Makefile|1 +
 drivers/iommu/virtio-iommu.c  | 1219 +
 include/acpi/actbl2.h |   18 +-
 include/uapi/linux/virtio_ids.h   |1 +
 include/uapi/linux/virtio_iommu.h |  195 ++
 11 files changed, 1537 insertions(+), 12 deletions(-)
 rename drivers/acpi/{arm64 => }/iort.c (92%)
 create mode 100644 drivers/iommu/virtio-iommu.c
 create mode 100644 include/uapi/linux/virtio_iommu.h

-- 
2.14.3



Re: [PATCH] iommu/vt-d: Fix scatterlist offset handling

2017-11-17 Thread Jacob Pan
On Fri, 17 Nov 2017 17:44:57 +
Casey Leedom  wrote:

> | From: Raj, Ashok 
> | Sent: Friday, November 17, 2017 7:48 AM
> | 
> | Reported by: Harsh 
> | Reviewed by: Ashok Raj 
> | Tested by: Jacob Pan 
> 
> Thanks everyone!  I've updated our internal bug on this issue
> and noted that we need to track down the remaining problems
> which may be in our own code.
> 
All sounds good to me, let me know if you need further assistance on
vt-d driver.

Jacob
> Casey


Re: [PATCH] iommu/vt-d: Fix scatterlist offset handling

2017-11-17 Thread Casey Leedom
| From: Raj, Ashok 
| Sent: Friday, November 17, 2017 7:48 AM
| 
| Reported by: Harsh 
| Reviewed by: Ashok Raj 
| Tested by: Jacob Pan 

Thanks everyone!  I've updated our internal bug on this issue
and noted that we need to track down the remaining problems
which may be in our own code.

Casey


Re: [PATCH] iommu/vt-d: Fix scatterlist offset handling

2017-11-17 Thread Raj, Ashok
Hi Alex

On Fri, Nov 17, 2017 at 09:18:14AM -0700, Alex Williamson wrote:
> On Thu, 16 Nov 2017 13:09:33 -0800
> "Raj, Ashok"  wrote:
> 
> > > 
> > > What do we do about this?  I certainly can't rip out large page support
> > > and put a stable tag on the patch.  I'm not really spotting what's
> > > wrong with large page support here, other than the comment about it
> > > being a mess.  Suggestions?  Thanks,
> > >   
> > 
> > Largepage seems to work and i don't think we need to rip it out. When
> > Harsh tested it at one point we thought disabling super-page seemed to make
> > the problem go away. Jacob tested and we still saw the need for Robin's 
> > patch.
> > 
> > Yes, the function looks humongous but i don't think we should wait for that 
> > before this merge.
> 
> Ok.  Who wants to toss in review and testing sign-offs?  Clearly
> there's been a lot more eyes and effort on this patch than reflected in
> the original posting.  I'll add a stable cc.  Thanks,

Reported by: Harsh 
Reviewed by: Ashok Raj 
Tested by: Jacob Pan 
> 
> Alex


Re: [PATCH] iommu/vt-d: Fix scatterlist offset handling

2017-11-17 Thread Alex Williamson
On Thu, 16 Nov 2017 13:09:33 -0800
"Raj, Ashok"  wrote:

> Hi Alex
> 
> On Thu, Nov 16, 2017 at 02:32:44PM -0700, Alex Williamson wrote:
> > On Wed, 15 Nov 2017 15:54:56 -0800
> > Jacob Pan  wrote:
> >   
> > > Hi Alex and all,
> > > 
> > > Just wondering if you could merge Robin's patch for the next rc. From
> > > all our testing, this seems to be a solid fix and should be included in
> > > the stable releases as well.  
> > 
> > Hi Jacob,
> > 
> > Sorry, this wasn't on my radar, I only scanned for patches back through
> > about when Joerg refreshed his next branch (others on the list speak up
> > if I didn't pickup your patches for the v4.15 merge window).
> > 
> > This patch makes sense to me and I'm glad you were able to work through
> > the anomaly Harsh saw in testing as an unrelated issue, but...
> > 
> > 
> > What do we do about this?  I certainly can't rip out large page support
> > and put a stable tag on the patch.  I'm not really spotting what's
> > wrong with large page support here, other than the comment about it
> > being a mess.  Suggestions?  Thanks,
> >   
> 
> Largepage seems to work and i don't think we need to rip it out. When
> Harsh tested it at one point we thought disabling super-page seemed to make
> the problem go away. Jacob tested and we still saw the need for Robin's patch.
> 
> Yes, the function looks humongous but i don't think we should wait for that 
> before this merge.

Ok.  Who wants to toss in review and testing sign-offs?  Clearly
there's been a lot more eyes and effort on this patch than reflected in
the original posting.  I'll add a stable cc.  Thanks,

Alex


Re: [RFCv2 PATCH 31/36] iommu/arm-smmu-v3: Add support for PCI ATS

2017-11-17 Thread Jean-Philippe Brucker
On 17/11/17 06:11, Bharat Kumar Gogada wrote:
[...]
> Thanks Jean, I see that currently vfio_group_fops_open does not allow 
> multiple instances. 
> If a device supports multiple PASID there might be different applications 
> running parallel. 
> So why is multiple instances restricted ?

You can't have multiple processes owning the same PCI device; it's
unmanageable.

For using multiple PASIDs, my idea was that the userspace driver ("the
server"), that owns the device, would have a way to partition it into
smaller frames. It forks to create "clients" and assigns a PASID to each
of them (by issuing VFIO_BIND(client_pid) -> pasid, then writing the PASID
into a privileged MMIO frame that defines the partition properties). Each
client accesses an unprivileged MMIO frame to use a device partition (or
sends commands to the server via IPC), and can perform DMA on its own
virtual memory.

This is complete speculation of course, we have very little information on
how PASID-capable devices will be designed, so I'm trying to imagine
likely scenarios.

Thanks,
Jean