Re: [PATCH 1/2] vfio/type1: Adopt fast IOTLB flush interface when unmap IOVAs
On Fri, 17 Nov 2017 14:51:52 -0700
Alex Williamson wrote:

> On Fri, 17 Nov 2017 15:11:19 -0600
> Suravee Suthikulpanit wrote:
>
> > From: Suravee Suthikulpanit
> >
> > VFIO IOMMU type1 currently unmaps IOVA pages synchronously, which requires
> > IOTLB flushing for every unmapping. This results in large IOTLB flushing
> > overhead when handling pass-through devices with a large number of mapped
> > IOVAs (e.g. GPUs).
>
> Of course the type of device is really irrelevant, QEMU maps the entire
> VM address space for any assigned device.
>
> > This can be avoided by using the new IOTLB flushing interface.
> >
> > Cc: Alex Williamson
> > Cc: Joerg Roedel
> > Signed-off-by: Suravee Suthikulpanit
> > ---
> >  drivers/vfio/vfio_iommu_type1.c | 12 +++++++++---
> >  1 file changed, 9 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> > index 92155cc..28a7ab6 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -698,10 +698,12 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
> >  			break;
> >  		}
> >
> > -		unmapped = iommu_unmap(domain->domain, iova, len);
> > +		unmapped = iommu_unmap_fast(domain->domain, iova, len);
> >  		if (WARN_ON(!unmapped))
> >  			break;
> >
> > +		iommu_tlb_range_add(domain->domain, iova, len);
> > +

We should only add @unmapped, not @len, right?  Actually, the problems
are deeper than that: if we can't guarantee that the iommu_unmap_fast()
above has removed the IOMMU mapping, then we can't do the unpin below,
as that would potentially allow the device access to unknown memory.
Thus, to support this, the unpinning would need to be pushed until after
the sync, and we therefore need some mechanism for remembering the
physical addresses that we've unmapped.

Thanks,
Alex

> >  		unlocked += vfio_unpin_pages_remote(dma, iova,
> >  						    phys >> PAGE_SHIFT,
> >  						    unmapped >> PAGE_SHIFT,
> > @@ -710,6 +712,7 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
> >
> >  		cond_resched();
> >  	}
> > +	iommu_tlb_sync(domain->domain);
> >
> >  	dma->iommu_mapped = false;
> >  	if (do_accounting) {
> > @@ -884,8 +887,11 @@ static int map_try_harder(struct vfio_domain *domain, dma_addr_t iova,
> >  		break;
> >  	}
> >
> > -	for (; i < npage && i > 0; i--, iova -= PAGE_SIZE)
> > -		iommu_unmap(domain->domain, iova, PAGE_SIZE);
> > +	for (; i < npage && i > 0; i--, iova -= PAGE_SIZE) {
> > +		iommu_unmap_fast(domain->domain, iova, PAGE_SIZE);
> > +		iommu_tlb_range_add(domain->domain, iova, PAGE_SIZE);
> > +	}
> > +	iommu_tlb_sync(domain->domain);
> >
> >  	return ret;
> >  }

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
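[Editorial note] The deferred-unpin mechanism Alex describes — remember the physical addresses of unmapped ranges and only unpin after the TLB sync — can be sketched as a small userspace model. All names here (`struct unmapped_range`, `defer_unpin`, `unpin_deferred`) are illustrative, not kernel API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record of a range whose IOMMU mapping was removed but
 * whose backing pages are not yet unpinned. */
struct unmapped_range {
	unsigned long iova;
	unsigned long phys;
	size_t len;
	struct unmapped_range *next;
};

/* Remember a just-unmapped range so its pages can be unpinned later,
 * after iommu_tlb_sync() has made the unmap visible to the device. */
static struct unmapped_range *defer_unpin(struct unmapped_range *head,
					  unsigned long iova,
					  unsigned long phys, size_t len)
{
	struct unmapped_range *r = malloc(sizeof(*r));

	if (!r)
		return head;	/* caller would fall back to a synchronous flush */
	r->iova = iova;
	r->phys = phys;
	r->len = len;
	r->next = head;
	return r;
}

/* After the TLB sync, walk the list, "unpin" each range (stand-in for
 * vfio_unpin_pages_remote()) and return the total bytes released. */
static size_t unpin_deferred(struct unmapped_range *head)
{
	size_t total = 0;

	while (head) {
		struct unmapped_range *next = head->next;

		total += head->len;
		free(head);
		head = next;
	}
	return total;
}
```

The key ordering property is that `unpin_deferred()` runs strictly after the sync, so the device can never reach pages that are unpinned but still mapped.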
Re: [PATCH 2/2] iommu/amd: Add support for fast IOTLB flushing
On 11/17/2017 3:11 PM, Suravee Suthikulpanit wrote:
> From: Suravee Suthikulpanit
>
> Implement the newly added IOTLB flushing interface by introducing a
> per-protection-domain IOTLB flush list, which maintains a list of IOVAs
> to be invalidated (by the INVALIDATE_IOTLB_PAGES command) during IOTLB
> sync.
>
> Cc: Joerg Roedel
> Signed-off-by: Suravee Suthikulpanit
> ---
>  drivers/iommu/amd_iommu.c       | 77 ++++++++++++++++++++++++++++++++-
>  drivers/iommu/amd_iommu_init.c  |  2 --
>  drivers/iommu/amd_iommu_types.h |  2 ++
>  3 files changed, 78 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> index 8e8874d..bf92809 100644
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -130,6 +130,12 @@ struct dma_ops_domain {
>  static struct iova_domain reserved_iova_ranges;
>  static struct lock_class_key reserved_rbtree_key;
>
> +struct iotlb_flush_entry {
> +	struct list_head list;
> +	unsigned long iova;
> +	size_t size;
> +};
> +
>  /****************************************************************************
>   *
>   * Helper functions
> @@ -2838,11 +2844,13 @@ static void protection_domain_free(struct protection_domain *domain)
>  static int protection_domain_init(struct protection_domain *domain)
>  {
>  	spin_lock_init(&domain->lock);
> +	spin_lock_init(&domain->iotlb_flush_list_lock);
>  	mutex_init(&domain->api_lock);
>  	domain->id = domain_id_alloc();
>  	if (!domain->id)
>  		return -ENOMEM;
>  	INIT_LIST_HEAD(&domain->dev_list);
> +	INIT_LIST_HEAD(&domain->iotlb_flush_list);
>
>  	return 0;
>  }
> @@ -3047,7 +3055,6 @@ static size_t amd_iommu_unmap(struct iommu_domain *dom, unsigned long iova,
>  	unmap_size = iommu_unmap_page(domain, iova, page_size);
>  	mutex_unlock(&domain->api_lock);
>
> -	domain_flush_tlb_pde(domain);
>  	domain_flush_complete(domain);
>
>  	return unmap_size;
> @@ -3167,6 +3174,71 @@ static bool amd_iommu_is_attach_deferred(struct iommu_domain *domain,
>  	return dev_data->defer_attach;
>  }
>
> +static void amd_iommu_flush_iotlb_all(struct iommu_domain *domain)
> +{
> +	struct protection_domain *dom = to_pdomain(domain);
> +
> +	domain_flush_tlb_pde(dom);
> +}
> +
> +static void amd_iommu_iotlb_range_add(struct iommu_domain *domain,
> +				      unsigned long iova, size_t size)
> +{
> +	struct protection_domain *pdom = to_pdomain(domain);
> +	struct iotlb_flush_entry *entry, *p;
> +	unsigned long flags;
> +	bool found = false;
> +
> +	spin_lock_irqsave(&pdom->iotlb_flush_list_lock, flags);
> +	list_for_each_entry(p, &pdom->iotlb_flush_list, list) {
> +		if (iova != p->iova)
> +			continue;
> +
> +		if (size > p->size) {
> +			p->size = size;
> +			pr_debug("%s: update range: iova=%#lx, size = %#lx\n",
> +				 __func__, p->iova, p->size);
> +		}
> +		found = true;
> +		break;
> +	}
> +
> +	if (!found) {
> +		entry = kzalloc(sizeof(struct iotlb_flush_entry),
> +				GFP_ATOMIC);
> +		if (!entry)
> +			return;

You need to release the spinlock before returning here.

Thanks,
Tom

> +
> +		pr_debug("%s: new range: iova=%lx, size=%#lx\n",
> +			 __func__, iova, size);
> +
> +		entry->iova = iova;
> +		entry->size = size;
> +		list_add(&entry->list, &pdom->iotlb_flush_list);
> +	}
> +	spin_unlock_irqrestore(&pdom->iotlb_flush_list_lock, flags);
> +}
> +
> +static void amd_iommu_iotlb_sync(struct iommu_domain *domain)
> +{
> +	struct protection_domain *pdom = to_pdomain(domain);
> +	struct iotlb_flush_entry *entry, *next;
> +	unsigned long flags;
> +
> +	/* Note:
> +	 * Currently, IOMMU driver just flushes the whole IO/TLB for
> +	 * a given domain. So, just remove entries from the list here.
> +	 */
> +	spin_lock_irqsave(&pdom->iotlb_flush_list_lock, flags);
> +	list_for_each_entry_safe(entry, next, &pdom->iotlb_flush_list, list) {
> +		list_del(&entry->list);
> +		kfree(entry);
> +	}
> +	spin_unlock_irqrestore(&pdom->iotlb_flush_list_lock, flags);
> +
> +	domain_flush_tlb_pde(pdom);
> +}
> +
>  const struct iommu_ops amd_iommu_ops = {
>  	.capable = amd_iommu_capable,
>  	.domain_alloc = amd_iommu_domain_alloc,
> @@ -3185,6 +3257,9 @@ static bool amd_iommu_is_attach_deferred(struct iommu_domain *domain,
>  	.apply_resv_region = amd_iommu_apply_resv_region,
>  	.is_attach_deferred = amd_iommu_is_attach_deferred,
>  	.pgsize_bitmap = AMD_IOMMU_PGSIZES,
> +	.flush_iotlb_all = amd_iommu_flush_iotlb_all,
> +	.iotlb_range_add = amd_iommu_iotlb_range_add,
> +	.iotlb_sync = amd_iommu_iotlb_sync,
>  };
>
>  /*
> diff --git
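[Editorial note] The flush-list bookkeeping in `amd_iommu_iotlb_range_add()`/`amd_iommu_iotlb_sync()` can be modeled in userspace to show its behavior: a matching base IOVA only widens the existing entry, and sync drains the whole list. This is a sketch of the list logic only — kernel locking and `list_head` plumbing are deliberately omitted:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for the per-domain flush list entry. */
struct flush_entry {
	unsigned long iova;
	size_t size;
	struct flush_entry *next;
};

/* Add a range; an entry with the same base IOVA is widened instead of
 * duplicated, mirroring the patch's range-coalescing logic. */
static struct flush_entry *range_add(struct flush_entry *head,
				     unsigned long iova, size_t size)
{
	struct flush_entry *p, *entry;

	for (p = head; p; p = p->next) {
		if (p->iova != iova)
			continue;
		if (size > p->size)
			p->size = size;
		return head;
	}

	entry = calloc(1, sizeof(*entry));
	if (!entry)
		return head;
	entry->iova = iova;
	entry->size = size;
	entry->next = head;	/* list_add() also prepends */
	return entry;
}

/* Sync: the driver issues one domain-wide flush, so every queued entry
 * is simply dropped.  Returns how many entries were drained. */
static unsigned int range_sync(struct flush_entry *head)
{
	unsigned int n = 0;

	while (head) {
		struct flush_entry *next = head->next;

		free(head);
		head = next;
		n++;
	}
	return n;
}
```

Note that in the kernel version the allocation-failure `return` must first drop `iotlb_flush_list_lock`, exactly as Tom points out above.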
Re: [PATCH 1/2] vfio/type1: Adopt fast IOTLB flush interface when unmap IOVAs
On Fri, 17 Nov 2017 15:11:19 -0600
Suravee Suthikulpanit wrote:

> From: Suravee Suthikulpanit
>
> VFIO IOMMU type1 currently unmaps IOVA pages synchronously, which requires
> IOTLB flushing for every unmapping. This results in large IOTLB flushing
> overhead when handling pass-through devices with a large number of mapped
> IOVAs (e.g. GPUs).

Of course the type of device is really irrelevant, QEMU maps the entire
VM address space for any assigned device.

> This can be avoided by using the new IOTLB flushing interface.
>
> Cc: Alex Williamson
> Cc: Joerg Roedel
> Signed-off-by: Suravee Suthikulpanit
> ---
>  drivers/vfio/vfio_iommu_type1.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 92155cc..28a7ab6 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -698,10 +698,12 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
>  			break;
>  		}
>
> -		unmapped = iommu_unmap(domain->domain, iova, len);
> +		unmapped = iommu_unmap_fast(domain->domain, iova, len);
>  		if (WARN_ON(!unmapped))
>  			break;
>
> +		iommu_tlb_range_add(domain->domain, iova, len);
> +

We should only add @unmapped, not @len, right?

>  		unlocked += vfio_unpin_pages_remote(dma, iova,
>  						    phys >> PAGE_SHIFT,
>  						    unmapped >> PAGE_SHIFT,
> @@ -710,6 +712,7 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
>
>  		cond_resched();
>  	}
> +	iommu_tlb_sync(domain->domain);
>
>  	dma->iommu_mapped = false;
>  	if (do_accounting) {
> @@ -884,8 +887,11 @@ static int map_try_harder(struct vfio_domain *domain, dma_addr_t iova,
>  		break;
>  	}
>
> -	for (; i < npage && i > 0; i--, iova -= PAGE_SIZE)
> -		iommu_unmap(domain->domain, iova, PAGE_SIZE);
> +	for (; i < npage && i > 0; i--, iova -= PAGE_SIZE) {
> +		iommu_unmap_fast(domain->domain, iova, PAGE_SIZE);
> +		iommu_tlb_range_add(domain->domain, iova, PAGE_SIZE);
> +	}
> +	iommu_tlb_sync(domain->domain);
>
>  	return ret;
>  }
[PATCH 1/2] vfio/type1: Adopt fast IOTLB flush interface when unmap IOVAs
From: Suravee Suthikulpanit

VFIO IOMMU type1 currently unmaps IOVA pages synchronously, which requires
IOTLB flushing for every unmapping. This results in large IOTLB flushing
overhead when handling pass-through devices with a large number of mapped
IOVAs (e.g. GPUs). This can be avoided by using the new IOTLB flushing
interface.

Cc: Alex Williamson
Cc: Joerg Roedel
Signed-off-by: Suravee Suthikulpanit
---
 drivers/vfio/vfio_iommu_type1.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 92155cc..28a7ab6 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -698,10 +698,12 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
 			break;
 		}

-		unmapped = iommu_unmap(domain->domain, iova, len);
+		unmapped = iommu_unmap_fast(domain->domain, iova, len);
 		if (WARN_ON(!unmapped))
 			break;

+		iommu_tlb_range_add(domain->domain, iova, len);
+
 		unlocked += vfio_unpin_pages_remote(dma, iova,
 						    phys >> PAGE_SHIFT,
 						    unmapped >> PAGE_SHIFT,
@@ -710,6 +712,7 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,

 		cond_resched();
 	}
+	iommu_tlb_sync(domain->domain);

 	dma->iommu_mapped = false;
 	if (do_accounting) {
@@ -884,8 +887,11 @@ static int map_try_harder(struct vfio_domain *domain, dma_addr_t iova,
 		break;
 	}

-	for (; i < npage && i > 0; i--, iova -= PAGE_SIZE)
-		iommu_unmap(domain->domain, iova, PAGE_SIZE);
+	for (; i < npage && i > 0; i--, iova -= PAGE_SIZE) {
+		iommu_unmap_fast(domain->domain, iova, PAGE_SIZE);
+		iommu_tlb_range_add(domain->domain, iova, PAGE_SIZE);
+	}
+	iommu_tlb_sync(domain->domain);

 	return ret;
 }
--
1.8.3.1
[PATCH 0/2] Reduce IOTLB flush when pass-through dGPU devices
From: Suravee Suthikulpanit

Currently, when passing through a dGPU to a guest VM, there are thousands
of IOTLB flush commands sent from the IOMMU to the end-point device. This
causes a performance issue when launching new VMs, and could cause an
IOTLB invalidation time-out issue on certain dGPUs. This can be avoided
by adopting the new fast IOTLB flush APIs.

Cc: Alex Williamson
Cc: Joerg Roedel

Suravee Suthikulpanit (2):
  vfio/type1: Adopt fast IOTLB flush interface when unmap IOVAs
  iommu/amd: Add support for fast IOTLB flushing

 drivers/iommu/amd_iommu.c       | 77 ++++++++++++++++++++++++++++++++-
 drivers/iommu/amd_iommu_init.c  |  2 --
 drivers/iommu/amd_iommu_types.h |  2 ++
 drivers/vfio/vfio_iommu_type1.c | 12 +++++++++---
 4 files changed, 87 insertions(+), 6 deletions(-)

--
1.8.3.1
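[Editorial note] The benefit the cover letter claims — one flush per unmap batch instead of one per page — can be illustrated with a toy model. The `toy_*` names and counters are purely illustrative; they only mirror the shape of the `iommu_unmap_fast()` / `iommu_tlb_range_add()` / `iommu_tlb_sync()` pattern:

```c
#include <assert.h>

/* Toy domain counting invalidation commands issued. */
struct toy_domain {
	unsigned int flushes;	/* invalidation commands sent */
	unsigned int pending;	/* ranges queued but not yet flushed */
};

/* Old path: every unmap carries its own synchronous flush. */
static void toy_unmap_sync(struct toy_domain *d) { d->flushes++; }

/* New path: unmaps only queue ranges... */
static void toy_range_add(struct toy_domain *d) { d->pending++; }

/* ...and one sync covers them all with a single domain-wide flush. */
static void toy_tlb_sync(struct toy_domain *d)
{
	if (d->pending) {
		d->flushes++;
		d->pending = 0;
	}
}

/* Unmap @n pages with each strategy; return the flush count. */
static unsigned int flushes_per_page(unsigned int n)
{
	struct toy_domain d = { 0, 0 };

	while (n--)
		toy_unmap_sync(&d);
	return d.flushes;
}

static unsigned int flushes_batched(unsigned int n)
{
	struct toy_domain d = { 0, 0 };

	while (n--)
		toy_range_add(&d);
	toy_tlb_sync(&d);
	return d.flushes;
}
```

For a VM address-space teardown of thousands of pages, the model collapses thousands of invalidation commands into one, which is the mechanism behind the reported launch-time improvement.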
[PATCH 2/2] iommu/amd: Add support for fast IOTLB flushing
From: Suravee Suthikulpanit

Implement the newly added IOTLB flushing interface by introducing a
per-protection-domain IOTLB flush list, which maintains a list of IOVAs
to be invalidated (by the INVALIDATE_IOTLB_PAGES command) during IOTLB
sync.

Cc: Joerg Roedel
Signed-off-by: Suravee Suthikulpanit
---
 drivers/iommu/amd_iommu.c       | 77 ++++++++++++++++++++++++++++++++-
 drivers/iommu/amd_iommu_init.c  |  2 --
 drivers/iommu/amd_iommu_types.h |  2 ++
 3 files changed, 78 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 8e8874d..bf92809 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -130,6 +130,12 @@ struct dma_ops_domain {
 static struct iova_domain reserved_iova_ranges;
 static struct lock_class_key reserved_rbtree_key;

+struct iotlb_flush_entry {
+	struct list_head list;
+	unsigned long iova;
+	size_t size;
+};
+
 /****************************************************************************
  *
  * Helper functions
@@ -2838,11 +2844,13 @@ static void protection_domain_free(struct protection_domain *domain)
 static int protection_domain_init(struct protection_domain *domain)
 {
 	spin_lock_init(&domain->lock);
+	spin_lock_init(&domain->iotlb_flush_list_lock);
 	mutex_init(&domain->api_lock);
 	domain->id = domain_id_alloc();
 	if (!domain->id)
 		return -ENOMEM;
 	INIT_LIST_HEAD(&domain->dev_list);
+	INIT_LIST_HEAD(&domain->iotlb_flush_list);

 	return 0;
 }
@@ -3047,7 +3055,6 @@ static size_t amd_iommu_unmap(struct iommu_domain *dom, unsigned long iova,
 	unmap_size = iommu_unmap_page(domain, iova, page_size);
 	mutex_unlock(&domain->api_lock);

-	domain_flush_tlb_pde(domain);
 	domain_flush_complete(domain);

 	return unmap_size;
@@ -3167,6 +3174,71 @@ static bool amd_iommu_is_attach_deferred(struct iommu_domain *domain,
 	return dev_data->defer_attach;
 }

+static void amd_iommu_flush_iotlb_all(struct iommu_domain *domain)
+{
+	struct protection_domain *dom = to_pdomain(domain);
+
+	domain_flush_tlb_pde(dom);
+}
+
+static void amd_iommu_iotlb_range_add(struct iommu_domain *domain,
+				      unsigned long iova, size_t size)
+{
+	struct protection_domain *pdom = to_pdomain(domain);
+	struct iotlb_flush_entry *entry, *p;
+	unsigned long flags;
+	bool found = false;
+
+	spin_lock_irqsave(&pdom->iotlb_flush_list_lock, flags);
+	list_for_each_entry(p, &pdom->iotlb_flush_list, list) {
+		if (iova != p->iova)
+			continue;
+
+		if (size > p->size) {
+			p->size = size;
+			pr_debug("%s: update range: iova=%#lx, size = %#lx\n",
+				 __func__, p->iova, p->size);
+		}
+		found = true;
+		break;
+	}
+
+	if (!found) {
+		entry = kzalloc(sizeof(struct iotlb_flush_entry),
+				GFP_ATOMIC);
+		if (!entry)
+			return;
+
+		pr_debug("%s: new range: iova=%lx, size=%#lx\n",
+			 __func__, iova, size);
+
+		entry->iova = iova;
+		entry->size = size;
+		list_add(&entry->list, &pdom->iotlb_flush_list);
+	}
+	spin_unlock_irqrestore(&pdom->iotlb_flush_list_lock, flags);
+}
+
+static void amd_iommu_iotlb_sync(struct iommu_domain *domain)
+{
+	struct protection_domain *pdom = to_pdomain(domain);
+	struct iotlb_flush_entry *entry, *next;
+	unsigned long flags;
+
+	/* Note:
+	 * Currently, IOMMU driver just flushes the whole IO/TLB for
+	 * a given domain. So, just remove entries from the list here.
+	 */
+	spin_lock_irqsave(&pdom->iotlb_flush_list_lock, flags);
+	list_for_each_entry_safe(entry, next, &pdom->iotlb_flush_list, list) {
+		list_del(&entry->list);
+		kfree(entry);
+	}
+	spin_unlock_irqrestore(&pdom->iotlb_flush_list_lock, flags);
+
+	domain_flush_tlb_pde(pdom);
+}
+
 const struct iommu_ops amd_iommu_ops = {
 	.capable = amd_iommu_capable,
 	.domain_alloc = amd_iommu_domain_alloc,
@@ -3185,6 +3257,9 @@ static bool amd_iommu_is_attach_deferred(struct iommu_domain *domain,
 	.apply_resv_region = amd_iommu_apply_resv_region,
 	.is_attach_deferred = amd_iommu_is_attach_deferred,
 	.pgsize_bitmap = AMD_IOMMU_PGSIZES,
+	.flush_iotlb_all = amd_iommu_flush_iotlb_all,
+	.iotlb_range_add = amd_iommu_iotlb_range_add,
+	.iotlb_sync = amd_iommu_iotlb_sync,
 };

 /*
diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 6fe2d03..1659377 100644
--- a/drivers/iommu/amd_iommu_init.c
+++
[PATCH v3 12/16] iommu/vt-d: report unrecoverable device faults
Currently, when device DMA faults are detected by the IOMMU, the fault
reasons are printed, but the driver of the offending device is not
involved in fault handling. This patch uses the per-device fault
reporting API to send fault event data for further processing. The
offending device is identified by the source ID in the VT-d fault
reason report registers.

Signed-off-by: Liu, Yi L
Signed-off-by: Jacob Pan
Signed-off-by: Ashok Raj
---
 drivers/iommu/dmar.c | 94 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 93 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 38ee91b..b1f67fc2 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1555,6 +1555,31 @@ static const char *irq_remap_fault_reasons[] =
 	"Blocked an interrupt request due to source-id verification failure",
 };

+/* fault data and status */
+enum intel_iommu_fault_reason {
+	INTEL_IOMMU_FAULT_REASON_SW,
+	INTEL_IOMMU_FAULT_REASON_ROOT_NOT_PRESENT,
+	INTEL_IOMMU_FAULT_REASON_CONTEXT_NOT_PRESENT,
+	INTEL_IOMMU_FAULT_REASON_CONTEXT_INVALID,
+	INTEL_IOMMU_FAULT_REASON_BEYOND_ADDR_WIDTH,
+	INTEL_IOMMU_FAULT_REASON_PTE_WRITE_ACCESS,
+	INTEL_IOMMU_FAULT_REASON_PTE_READ_ACCESS,
+	INTEL_IOMMU_FAULT_REASON_NEXT_PT_INVALID,
+	INTEL_IOMMU_FAULT_REASON_ROOT_ADDR_INVALID,
+	INTEL_IOMMU_FAULT_REASON_CONTEXT_PTR_INVALID,
+	INTEL_IOMMU_FAULT_REASON_NONE_ZERO_RTP,
+	INTEL_IOMMU_FAULT_REASON_NONE_ZERO_CTP,
+	INTEL_IOMMU_FAULT_REASON_NONE_ZERO_PTE,
+	NR_INTEL_IOMMU_FAULT_REASON,
+};
+
+/* fault reasons that are allowed to be reported outside IOMMU subsystem */
+#define INTEL_IOMMU_FAULT_REASON_ALLOWED			\
+	((1ULL << INTEL_IOMMU_FAULT_REASON_BEYOND_ADDR_WIDTH) |	\
+	(1ULL << INTEL_IOMMU_FAULT_REASON_PTE_WRITE_ACCESS) |	\
+	(1ULL << INTEL_IOMMU_FAULT_REASON_PTE_READ_ACCESS))
+
+
 static const char *dmar_get_fault_reason(u8 fault_reason, int *fault_type)
 {
 	if (fault_reason >= 0x20 && (fault_reason - 0x20 <
@@ -1635,6 +1660,69 @@ void dmar_msi_read(int irq, struct msi_msg *msg)
 	raw_spin_unlock_irqrestore(&iommu->register_lock, flag);
 }

+static enum iommu_fault_reason to_iommu_fault_reason(u8 reason)
+{
+	if (reason >= NR_INTEL_IOMMU_FAULT_REASON) {
+		pr_warn("unknown DMAR fault reason %d\n", reason);
+		return IOMMU_FAULT_REASON_UNKNOWN;
+	}
+	switch (reason) {
+	case INTEL_IOMMU_FAULT_REASON_SW:
+	case INTEL_IOMMU_FAULT_REASON_ROOT_NOT_PRESENT:
+	case INTEL_IOMMU_FAULT_REASON_CONTEXT_NOT_PRESENT:
+	case INTEL_IOMMU_FAULT_REASON_CONTEXT_INVALID:
+	case INTEL_IOMMU_FAULT_REASON_BEYOND_ADDR_WIDTH:
+	case INTEL_IOMMU_FAULT_REASON_ROOT_ADDR_INVALID:
+	case INTEL_IOMMU_FAULT_REASON_CONTEXT_PTR_INVALID:
+		return IOMMU_FAULT_REASON_INTERNAL;
+	case INTEL_IOMMU_FAULT_REASON_NEXT_PT_INVALID:
+	case INTEL_IOMMU_FAULT_REASON_PTE_WRITE_ACCESS:
+	case INTEL_IOMMU_FAULT_REASON_PTE_READ_ACCESS:
+		return IOMMU_FAULT_REASON_PERMISSION;
+	default:
+		return IOMMU_FAULT_REASON_UNKNOWN;
+	}
+}
+
+static void report_fault_to_device(struct intel_iommu *iommu, u64 addr, int type,
+				   int fault_type,
+				   enum intel_iommu_fault_reason reason, u16 sid)
+{
+	struct iommu_fault_event event;
+	struct pci_dev *pdev;
+	u8 bus, devfn;
+
+	/* check if fault reason is worth reporting outside IOMMU */
+	if (!((1 << reason) & INTEL_IOMMU_FAULT_REASON_ALLOWED)) {
+		pr_debug("Fault reason %d not allowed to report to device\n",
+			 reason);
+		return;
+	}
+
+	bus = PCI_BUS_NUM(sid);
+	devfn = PCI_DEVFN(PCI_SLOT(sid), PCI_FUNC(sid));
+	/*
+	 * we need to check if the fault reporting is requested for the
+	 * offending device.
+	 */
+	pdev = pci_get_bus_and_slot(bus, devfn);
+	if (!pdev) {
+		pr_warn("No PCI device found for source ID %x\n", sid);
+		return;
+	}
+	/*
+	 * unrecoverable fault is reported per IOMMU, notifier handler can
+	 * resolve PCI device based on source ID.
+	 */
+	event.reason = to_iommu_fault_reason(reason);
+	event.addr = addr;
+	event.type = IOMMU_FAULT_DMA_UNRECOV;
+	event.prot = type ? IOMMU_READ : IOMMU_WRITE;
+	dev_warn(&pdev->dev, "report device unrecoverable fault: %d, %x, %d\n",
+		 event.reason, sid, event.type);
+	iommu_report_device_fault(&pdev->dev, &event);
+	pci_dev_put(pdev);
+}
+
 static int dmar_fault_do_one(struct intel_iommu *iommu, int type,
 			     u8 fault_reason, u16 source_id, unsigned long long addr)
 {
@@ -1648,11 +1736,15 @@ static int dmar_fault_do_one(struct
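[Editorial note] The source-ID decode used above (`PCI_BUS_NUM(sid)`, `PCI_DEVFN(PCI_SLOT(sid), PCI_FUNC(sid))`) relies on the standard 16-bit PCI requester-ID layout: bus in bits 15:8, device in bits 7:3, function in bits 2:0. A userspace sketch with locally defined macros (the `MY_` prefix marks them as stand-ins for the kernel's `PCI_*` macros):

```c
#include <assert.h>
#include <stdint.h>

/* PCI requester/source ID: bus[15:8] | device[7:3] | function[2:0].
 * These mirror the kernel's PCI_BUS_NUM/PCI_SLOT/PCI_FUNC macros. */
#define MY_PCI_BUS_NUM(sid)	(((uint16_t)(sid) >> 8) & 0xff)
#define MY_PCI_DEVFN_OF(sid)	((uint16_t)(sid) & 0xff)
#define MY_PCI_SLOT(devfn)	(((devfn) >> 3) & 0x1f)
#define MY_PCI_FUNC(devfn)	((devfn) & 0x07)

/* Decode a source ID into its bus/slot/function components, as the
 * fault handler does before looking up the offending pci_dev. */
static void decode_sid(uint16_t sid, unsigned *bus, unsigned *slot,
		       unsigned *func)
{
	unsigned devfn = MY_PCI_DEVFN_OF(sid);

	*bus = MY_PCI_BUS_NUM(sid);
	*slot = MY_PCI_SLOT(devfn);
	*func = MY_PCI_FUNC(devfn);
}
```

The VT-d fault record reports exactly this source ID, so the handler can recover the device without any device-specific cooperation.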
[PATCH v3 14/16] iommu/intel-svm: replace dev ops with fault report API
With the introduction of the generic IOMMU device fault reporting API,
we can replace the private fault callback functions with the standard
function and event data.

Signed-off-by: Jacob Pan
---
 drivers/iommu/intel-svm.c |  7 +------
 include/linux/intel-svm.h | 20 +++-----------------
 2 files changed, 4 insertions(+), 23 deletions(-)

diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 77c25d8..93b1849 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -283,7 +283,7 @@ static const struct mmu_notifier_ops intel_mmuops = {

 static DEFINE_MUTEX(pasid_mutex);

-int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_ops *ops)
+int intel_svm_bind_mm(struct device *dev, int *pasid, int flags)
 {
 	struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
 	struct intel_svm_dev *sdev;
@@ -329,10 +329,6 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
 		list_for_each_entry(sdev, &svm->devs, list) {
 			if (dev == sdev->dev) {
-				if (sdev->ops != ops) {
-					ret = -EBUSY;
-					goto out;
-				}
 				sdev->users++;
 				goto success;
 			}
@@ -358,7 +354,6 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
 	}
 	/* Finish the setup now we know we're keeping it */
 	sdev->users = 1;
-	sdev->ops = ops;
 	init_rcu_head(&sdev->rcu);

 	if (!svm) {
diff --git a/include/linux/intel-svm.h b/include/linux/intel-svm.h
index 99bc5b3..a39a502 100644
--- a/include/linux/intel-svm.h
+++ b/include/linux/intel-svm.h
@@ -18,18 +18,6 @@

 struct device;

-struct svm_dev_ops {
-	void (*fault_cb)(struct device *dev, int pasid, u64 address,
-			 u32 private, int rwxp, int response);
-};
-
-/* Values for rxwp in fault_cb callback */
-#define SVM_REQ_READ	(1<<3)
-#define SVM_REQ_WRITE	(1<<2)
-#define SVM_REQ_EXEC	(1<<1)
-#define SVM_REQ_PRIV	(1<<0)
-
-
 /*
  * The SVM_FLAG_PRIVATE_PASID flag requests a PASID which is *not* the "main"
  * PASID for the current process. Even if a PASID already exists, a new one
@@ -60,7 +48,6 @@ struct svm_dev_ops {
  * @dev:	Device to be granted access
  * @pasid:	Address for allocated PASID
  * @flags:	Flags. Later for requesting supervisor mode, etc.
- * @ops:	Callbacks to device driver
  *
  * This function attempts to enable PASID support for the given device.
  * If the @pasid argument is non-%NULL, a PASID is allocated for access
@@ -82,8 +69,7 @@ struct svm_dev_ops {
  * Multiple calls from the same process may result in the same PASID
  * being re-used. A reference count is kept.
  */
-extern int intel_svm_bind_mm(struct device *dev, int *pasid, int flags,
-			     struct svm_dev_ops *ops);
+extern int intel_svm_bind_mm(struct device *dev, int *pasid, int flags);

 /**
  * intel_svm_unbind_mm() - Unbind a specified PASID
@@ -120,7 +106,7 @@ extern int intel_svm_is_pasid_valid(struct device *dev, int pasid);
 #else /* CONFIG_INTEL_IOMMU_SVM */

 static inline int intel_svm_bind_mm(struct device *dev, int *pasid,
-				    int flags, struct svm_dev_ops *ops)
+				    int flags)
 {
 	return -ENOSYS;
 }
@@ -136,6 +122,6 @@ static int intel_svm_is_pasid_valid(struct device *dev, int pasid)
 }
 #endif /* CONFIG_INTEL_IOMMU_SVM */

-#define intel_svm_available(dev) (!intel_svm_bind_mm((dev), NULL, 0, NULL))
+#define intel_svm_available(dev) (!intel_svm_bind_mm((dev), NULL, 0))

 #endif /* __INTEL_SVM_H__ */
--
2.7.4
[PATCH v3 11/16] iommu/vt-d: use threaded irq for dmar_fault
Currently, the dmar fault IRQ handler does nothing more than a
rate-limited printk; no critical hardware handling needs to be done in
IRQ context. Converting it to a threaded IRQ allows fault processing
that requires process context, e.g. finding the offending device based
on the source ID in the fault reasons.

Signed-off-by: Jacob Pan
---
 drivers/iommu/dmar.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index f69f6ee..38ee91b 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1749,7 +1749,8 @@ int dmar_set_interrupt(struct intel_iommu *iommu)
 		return -EINVAL;
 	}

-	ret = request_irq(irq, dmar_fault, IRQF_NO_THREAD, iommu->name, iommu);
+	ret = request_threaded_irq(irq, NULL, dmar_fault,
+				   IRQF_ONESHOT, iommu->name, iommu);
 	if (ret)
 		pr_err("Can't request irq\n");
 	return ret;
--
2.7.4
[PATCH v3 08/16] iommu: introduce device fault data
Device faults detected by the IOMMU can be reported outside the IOMMU
subsystem for further processing. This patch intends to provide generic
device fault data such that device drivers can be notified of IOMMU
faults without model-specific knowledge.

The proposed format is the result of discussion at:
https://lkml.org/lkml/2017/11/10/291
Part of the code is based on Jean-Philippe Brucker's patchset
(https://patchwork.kernel.org/patch/9989315/).

The assumption is that the model-specific IOMMU driver can filter and
handle most of the internal faults if the cause is within IOMMU driver
control. Therefore, the fault reasons that can be reported are grouped
and generalized based on common specifications such as PCI ATS.

Signed-off-by: Jacob Pan
Signed-off-by: Liu, Yi L
Signed-off-by: Ashok Raj
---
 include/linux/iommu.h | 108 +++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 106 insertions(+), 2 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index da684a7..dfda89b 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -49,13 +49,17 @@ struct bus_type;
 struct device;
 struct iommu_domain;
 struct notifier_block;
+struct iommu_fault_event;

 /* iommu fault flags */
-#define IOMMU_FAULT_READ	0x0
-#define IOMMU_FAULT_WRITE	0x1
+#define IOMMU_FAULT_READ	(1 << 0)
+#define IOMMU_FAULT_WRITE	(1 << 1)
+#define IOMMU_FAULT_EXEC	(1 << 2)
+#define IOMMU_FAULT_PRIV	(1 << 3)

 typedef int (*iommu_fault_handler_t)(struct iommu_domain *,
 				     struct device *, unsigned long, int, void *);
+typedef int (*iommu_dev_fault_handler_t)(struct iommu_fault_event *, void *);

 struct iommu_domain_geometry {
 	dma_addr_t aperture_start; /* First address that can be mapped    */
@@ -264,6 +268,105 @@ struct iommu_device {
 	struct device *dev;
 };

+enum iommu_model {
+	IOMMU_MODEL_INTEL = 1,
+	IOMMU_MODEL_AMD,
+	IOMMU_MODEL_SMMU3,
+};
+
+/* Generic fault types, can be expanded IRQ remapping fault */
+enum iommu_fault_type {
+	IOMMU_FAULT_DMA_UNRECOV = 1,	/* unrecoverable fault */
+	IOMMU_FAULT_PAGE_REQ,		/* page request fault */
+};
+
+enum iommu_fault_reason {
+	IOMMU_FAULT_REASON_UNKNOWN = 0,
+
+	/* IOMMU internal error, no specific reason to report out */
+	IOMMU_FAULT_REASON_INTERNAL,
+
+	/* Could not access the PASID table */
+	IOMMU_FAULT_REASON_PASID_FETCH,
+
+	/*
+	 * PASID is out of range (e.g. exceeds the maximum PASID
+	 * supported by the IOMMU) or disabled.
+	 */
+	IOMMU_FAULT_REASON_PASID_INVALID,
+
+	/* Could not access the page directory (Invalid PASID entry) */
+	IOMMU_FAULT_REASON_PGD_FETCH,
+
+	/* Could not access the page table entry (Bad address) */
+	IOMMU_FAULT_REASON_PTE_FETCH,
+
+	/* Protection flag check failed */
+	IOMMU_FAULT_REASON_PERMISSION,
+};
+
+/**
+ * struct iommu_fault_event - Generic per device fault data
+ *
+ * - PCI and non-PCI devices
+ * - Recoverable faults (e.g. page request), information based on PCI ATS
+ *   and PASID spec.
+ * - Un-recoverable faults of device interest
+ * - DMA remapping and IRQ remapping faults
+ *
+ * @type contains fault type.
+ * @reason fault reasons if relevant outside IOMMU driver, IOMMU driver internal
+ *         faults are not reported
+ * @addr: tells the offending page address
+ * @pasid: contains process address space ID, used in shared virtual memory (SVM)
+ * @rid: requestor ID
+ * @page_req_group_id: page request group index
+ * @last_req: last request in a page request group
+ * @pasid_valid: indicates if the PRQ has a valid PASID
+ * @prot: page access protection flag, e.g. IOMMU_FAULT_READ, IOMMU_FAULT_WRITE
+ * @device_private: if present, uniquely identify device-specific
+ *                  private data for an individual page request.
+ * @iommu_private: used by the IOMMU driver for storing fault-specific
+ *                 data. Users should not modify this field before
+ *                 sending the fault response.
+ */
+struct iommu_fault_event {
+	enum iommu_fault_type type;
+	enum iommu_fault_reason reason;
+	u64 addr;
+	u32 pasid;
+	u32 page_req_group_id : 9;
+	u32 last_req : 1;
+	u32 pasid_valid : 1;
+	u32 prot;
+	u64 device_private;
+	u64 iommu_private;
+};
+
+/**
+ * struct iommu_fault_param - per-device IOMMU fault data
+ * @dev_fault_handler: Callback function to handle IOMMU faults at device level
+ * @data: handler private data
+ *
+ */
+struct iommu_fault_param {
+	iommu_dev_fault_handler_t handler;
+	void *data;
+};
+
+/**
+ * struct iommu_param - collection of per-device IOMMU data
+ *
+ * @fault_param: IOMMU detected device fault reporting data
+ *
+ * TODO: migrate other per device data pointers
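[Editorial note] The `iommu_fault_param` handler/data pair above is a classic callback-registration pattern; its dispatch behavior can be modeled in userspace. Everything below (`fake_*` names, `report_fault`, `count_faults`) is illustrative only, not the kernel API:

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins for the structures introduced by this patch. */
struct fake_fault_event {
	int type;
	unsigned long addr;
};

typedef int (*fault_handler_t)(struct fake_fault_event *, void *);

struct fake_fault_param {
	fault_handler_t handler;	/* per-device fault callback */
	void *data;			/* handler private data */
};

/* Dispatch a fault to the registered per-device handler, the way
 * iommu_report_device_fault() would; -1 means no consumer. */
static int report_fault(struct fake_fault_param *param,
			struct fake_fault_event *evt)
{
	if (!param || !param->handler)
		return -1;
	return param->handler(evt, param->data);
}

/* Example consumer: count faults delivered to this device. */
static int count_faults(struct fake_fault_event *evt, void *data)
{
	(void)evt;
	(*(int *)data)++;
	return 0;
}
```

The generic event struct means a VFIO-style consumer can register one handler and receive both unrecoverable DMA faults and page requests without knowing whether the IOMMU is Intel, AMD, or SMMUv3.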
[PATCH v3 16/16] iommu/vt-d: add intel iommu page response function
This patch adds page response support for Intel VT-d. Generic response
data is taken from the IOMMU API, then parsed into the VT-d specific
response descriptor format.

Signed-off-by: Jacob Pan
---
 drivers/iommu/intel-iommu.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index e1bd219..7f95827 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5171,6 +5171,35 @@ static int intel_iommu_sva_invalidate(struct iommu_domain *domain,
 	return ret;
 }

+int intel_iommu_page_response(struct iommu_domain *domain, struct device *dev,
+			      struct page_response_msg *msg)
+{
+	struct qi_desc resp;
+	struct intel_iommu *iommu = dev_to_intel_iommu(dev);
+
+	/* TODO: sanitize response message */
+	if (msg->last_req) {
+		/* Page Group Response */
+		resp.low = QI_PGRP_PASID(msg->pasid) |
+			QI_PGRP_DID(msg->did) |
+			QI_PGRP_PASID_P(msg->pasid_present) |
+			QI_PGRP_RESP_TYPE;
+		/* REVISIT: allow private data passing from device prq */
+		resp.high = QI_PGRP_IDX(msg->page_req_group_id) |
+			QI_PGRP_PRIV(msg->private_data) |
+			QI_PGRP_RESP_CODE(msg->resp_code);
+	} else {
+		/* Page Stream Response */
+		resp.low = QI_PSTRM_IDX(msg->page_req_group_id) |
+			QI_PSTRM_PRIV(msg->private_data) |
+			QI_PSTRM_BUS(PCI_BUS_NUM(msg->did)) |
+			QI_PSTRM_PASID(msg->pasid) | QI_PSTRM_RESP_TYPE;
+		resp.high = QI_PSTRM_ADDR(msg->paddr) |
+			QI_PSTRM_DEVFN(msg->did & 0xff) |
+			QI_PSTRM_RESP_CODE(msg->resp_code);
+	}
+	qi_submit_sync(&resp, iommu);
+
+	return 0;
+}
+
 static int intel_iommu_map(struct iommu_domain *domain,
 			   unsigned long iova, phys_addr_t hpa,
 			   size_t size, int iommu_prot)
@@ -5606,6 +5635,7 @@ const struct iommu_ops intel_iommu_ops = {
 	.bind_pasid_table	= intel_iommu_bind_pasid_table,
 	.unbind_pasid_table	= intel_iommu_unbind_pasid_table,
 	.sva_invalidate		= intel_iommu_sva_invalidate,
+	.page_response		= intel_iommu_page_response,
 #endif
 	.map			= intel_iommu_map,
 	.unmap			= intel_iommu_unmap,
--
2.7.4
[PATCH v3 13/16] iommu/intel-svm: notify page request to guest
If the source device of a page request has its PASID table pointer
bound to a guest, the first level page tables are owned by the guest.
In this case, we shall let the guest OS manage the page fault. This
patch uses the IOMMU fault notification API to send notifications,
possibly via VFIO, to the guest OS. Once guest pages are faulted in,
the guest will issue a page response which will be passed down via the
invalidation passdown APIs.

Signed-off-by: Jacob Pan
Signed-off-by: Ashok Raj
---
 drivers/iommu/intel-svm.c | 80 ++-
 include/linux/iommu.h     |  1 +
 2 files changed, 74 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index f6697e5..77c25d8 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -555,6 +555,71 @@ static bool is_canonical_address(u64 addr)
 	return (((saddr << shift) >> shift) == saddr);
 }
 
+static int prq_to_iommu_prot(struct page_req_dsc *req)
+{
+	int prot = 0;
+
+	if (req->rd_req)
+		prot |= IOMMU_FAULT_READ;
+	if (req->wr_req)
+		prot |= IOMMU_FAULT_WRITE;
+	if (req->exe_req)
+		prot |= IOMMU_FAULT_EXEC;
+	if (req->priv_req)
+		prot |= IOMMU_FAULT_PRIV;
+
+	return prot;
+}
+
+static int intel_svm_prq_report(struct device *dev, struct page_req_dsc *desc)
+{
+	int ret = 0;
+	struct iommu_fault_event event;
+	struct pci_dev *pdev;
+
+	/*
+	 * If the caller does not provide a struct device, this is the case
+	 * where the guest PASID table is bound to the device. So we need to
+	 * retrieve the struct device from the page request descriptor, then
+	 * proceed.
+	 */
+	if (!dev) {
+		pdev = pci_get_bus_and_slot(desc->bus, desc->devfn);
+		if (!pdev) {
+			pr_err("No PCI device found for PRQ [%02x:%02x.%d]\n",
+				desc->bus, PCI_SLOT(desc->devfn),
+				PCI_FUNC(desc->devfn));
+			return -ENODEV;
+		}
+		dev = &pdev->dev;
+	} else if (dev_is_pci(dev)) {
+		pdev = to_pci_dev(dev);
+		pci_dev_get(pdev);
+	} else
+		return -ENODEV;
+
+	pr_debug("Notify PRQ device [%02x:%02x.%d]\n",
+		desc->bus, PCI_SLOT(desc->devfn),
+		PCI_FUNC(desc->devfn));
+
+	/* invoke device fault handler if registered */
+	if (iommu_has_device_fault_handler(dev)) {
+		/* Fill in event data for device specific processing */
+		event.type = IOMMU_FAULT_PAGE_REQ;
+		event.addr = desc->addr;
+		event.pasid = desc->pasid;
+		event.page_req_group_id = desc->prg_index;
+		event.prot = prq_to_iommu_prot(desc);
+		event.last_req = desc->lpig;
+		event.pasid_valid = 1;
+		event.iommu_private = desc->private;
+		ret = iommu_report_device_fault(&pdev->dev, &event);
+	}
+
+	pci_dev_put(pdev);
+
+	return ret;
+}
+
 static irqreturn_t prq_event_thread(int irq, void *d)
 {
 	struct intel_iommu *iommu = d;
@@ -578,7 +643,12 @@ static irqreturn_t prq_event_thread(int irq, void *d)
 		handled = 1;
 
 		req = &iommu->prq[head / sizeof(*req)];
-
+		/*
+		 * If the prq is to be handled outside the iommu driver via a
+		 * receiver of the fault notifiers, we skip the page response
+		 * here.
+		 */
+		if (!intel_svm_prq_report(NULL, req))
+			goto prq_advance;
 		result = QI_RESP_FAILURE;
 		address = (u64)req->addr << VTD_PAGE_SHIFT;
 		if (!req->pasid_present) {
@@ -649,11 +719,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
 		if (WARN_ON(&sdev->list == &svm->devs))
 			sdev = NULL;
 
-		if (sdev && sdev->ops && sdev->ops->fault_cb) {
-			int rwxp = (req->rd_req << 3) | (req->wr_req << 2) |
-				(req->exe_req << 1) | (req->priv_req);
-			sdev->ops->fault_cb(sdev->dev, req->pasid, req->addr, req->private, rwxp, result);
-		}
+		intel_svm_prq_report(sdev->dev, req);
 		/* We get here in the error case where the PASID lookup failed,
 		   and these can be NULL. Do not use them below this point! */
 		sdev = NULL;
@@ -679,7 +745,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
 			qi_submit_sync(&resp, iommu);
 		}
-
+prq_advance:
 		head = (head + sizeof(*req)) & PRQ_RING_MASK;
 	}
 
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 841c044..3083796b 100644
--- a/include/linux/iommu.h
+++
[PATCH v3 09/16] driver core: add iommu device fault reporting data
DMA faults can be detected by the IOMMU at the device level. Adding a
pointer to struct device allows the IOMMU subsystem to report relevant
faults back to the device driver for further handling. For directly
assigned devices (or user space drivers), the guest OS holds the
responsibility to handle and respond to per-device IOMMU faults.
Therefore we need a fault reporting mechanism to propagate faults
beyond the IOMMU subsystem.

There are two other IOMMU data pointers under struct device today; here
we introduce iommu_param as a parent pointer such that all device IOMMU
data can be consolidated. The idea was suggested by Greg KH and Joerg.
The name iommu_param is chosen since iommu_data has already been used.

Suggested-by: Greg Kroah-Hartman
Signed-off-by: Jacob Pan
Link: https://lkml.org/lkml/2017/10/6/81
---
 include/linux/device.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/linux/device.h b/include/linux/device.h
index 66fe271..540e5e5 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -42,6 +42,7 @@ struct fwnode_handle;
 struct iommu_ops;
 struct iommu_group;
 struct iommu_fwspec;
+struct iommu_param;
 
 struct bus_attribute {
 	struct attribute	attr;
@@ -871,6 +872,7 @@ struct dev_links_info {
  *		device (i.e. the bus driver that discovered the device).
  * @iommu_group: IOMMU group the device belongs to.
  * @iommu_fwspec: IOMMU-specific properties supplied by firmware.
+ * @iommu_param: Per device generic IOMMU runtime data
  *
  * @offline_disabled: If set, the device is permanently online.
  * @offline:	Set after successful invocation of bus type's .offline().
@@ -960,6 +962,7 @@ struct device {
 	void	(*release)(struct device *dev);
 	struct iommu_group	*iommu_group;
 	struct iommu_fwspec	*iommu_fwspec;
+	struct iommu_param	*iommu_param;
 
 	bool			offline_disabled:1;
 	bool			offline:1;
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 10/16] iommu: introduce device fault report API
Traditionally, device specific faults are detected and handled within
their own device drivers. When the IOMMU is enabled, faults such as DMA
related transactions are detected by the IOMMU. There is no generic
reporting mechanism to report faults back to the in-kernel device
driver or the guest OS in the case of assigned devices.

Faults detected by the IOMMU are based on the transaction's source ID,
which can be reported on a per-device basis, regardless of whether the
device is a PCI device or not. The fault types include recoverable
(e.g. page request) and unrecoverable faults (e.g. access error). In
most cases, faults can be handled by IOMMU drivers internally.

The primary use cases are as follows:
1. page request faults originated from an SVM capable device that is
   assigned to a guest via vIOMMU. In this case, the first level page
   tables are owned by the guest. Page requests must be propagated to
   the guest to let the guest OS fault in the pages, then send a page
   response. In this mechanism, the direct receiver of the IOMMU fault
   notification is VFIO, which can relay notification events to QEMU
   or other user space software.
2. faults that need more subtle handling by device drivers. Other than
   simply invoking a reset function, there are needs to let the device
   driver handle the fault with a smaller impact.

This patchset is intended to create a generic fault report API such
that it can scale as follows:
- all IOMMU types
- PCI and non-PCI devices
- recoverable and unrecoverable faults
- VFIO and other in-kernel users
- DMA & IRQ remapping (TBD)

The original idea was brought up by David Woodhouse and discussions
summarized at https://lwn.net/Articles/608914/.
Signed-off-by: Jacob Pan
Signed-off-by: Ashok Raj
---
 drivers/iommu/iommu.c | 63 ++-
 include/linux/iommu.h | 36 +
 2 files changed, 98 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 829e9e9..97b7990 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -581,6 +581,12 @@ int iommu_group_add_device(struct iommu_group *group, struct device *dev)
 		goto err_free_name;
 	}
 
+	dev->iommu_param = kzalloc(sizeof(struct iommu_param), GFP_KERNEL);
+	if (!dev->iommu_param) {
+		ret = -ENOMEM;
+		goto err_free_name;
+	}
+
 	kobject_get(group->devices_kobj);
 
 	dev->iommu_group = group;
@@ -657,7 +663,7 @@ void iommu_group_remove_device(struct device *dev)
 	sysfs_remove_link(&dev->kobj, "iommu_group");
 
 	trace_remove_device_from_group(group->id, dev);
-
+	kfree(dev->iommu_param);
 	kfree(device->name);
 	kfree(device);
 	dev->iommu_group = NULL;
@@ -791,6 +797,61 @@ int iommu_group_unregister_notifier(struct iommu_group *group,
 }
 EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier);
 
+int iommu_register_device_fault_handler(struct device *dev,
+					iommu_dev_fault_handler_t handler,
+					void *data)
+{
+	struct iommu_param *idata = dev->iommu_param;
+
+	/*
+	 * Device iommu_param should have been allocated when the device was
+	 * added to its iommu_group.
+	 */
+	if (!idata)
+		return -EINVAL;
+	/* Only allow one fault handler registered for each device */
+	if (idata->fault_param)
+		return -EBUSY;
+	get_device(dev);
+	idata->fault_param =
+		kzalloc(sizeof(struct iommu_fault_param), GFP_KERNEL);
+	if (!idata->fault_param)
+		return -ENOMEM;
+	idata->fault_param->handler = handler;
+	idata->fault_param->data = data;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(iommu_register_device_fault_handler);
+
+int iommu_unregister_device_fault_handler(struct device *dev)
+{
+	struct iommu_param *idata = dev->iommu_param;
+
+	if (!idata)
+		return -EINVAL;
+
+	kfree(idata->fault_param);
+	idata->fault_param = NULL;
+	put_device(dev);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(iommu_unregister_device_fault_handler);
+
+
+int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt)
+{
+	/* we only report a device fault if there is a handler registered */
+	if (!dev->iommu_param || !dev->iommu_param->fault_param ||
+		!dev->iommu_param->fault_param->handler)
+		return -ENOSYS;
+
+	return dev->iommu_param->fault_param->handler(evt,
+		dev->iommu_param->fault_param->data);
+}
+EXPORT_SYMBOL_GPL(iommu_report_device_fault);
+
 /**
  * iommu_group_id - Return ID for a group
  * @group: the group to ID
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index dfda89b..841c044 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -463,6 +463,14 @@ extern int
[PATCH v3 06/16] iommu/vt-d: add svm/sva invalidate function
This patch adds an Intel VT-d specific function to implement the iommu
passdown invalidate API for shared virtual addresses. The use case is
to support caching structure invalidation of assigned SVM capable
devices. The emulated IOMMU exposes queue invalidation capability and
passes down all descriptors from the guest to the physical IOMMU.

The assumption is that the guest to host device ID mapping should be
resolved prior to calling the IOMMU driver. Based on the device handle,
the host IOMMU driver can replace certain fields before submitting to
the invalidation queue.

Signed-off-by: Liu, Yi L
Signed-off-by: Jacob Pan
Signed-off-by: Ashok Raj
---
 drivers/iommu/intel-iommu.c | 200 +++-
 include/linux/intel-iommu.h |  17 +++-
 2 files changed, 211 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 556bdd2..000b2b3 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4981,6 +4981,183 @@ static void intel_iommu_detach_device(struct iommu_domain *domain,
 	dmar_remove_one_dev_info(to_dmar_domain(domain), dev);
 }
 
+/*
+ * 3D array for converting IOMMU generic type-granularity to VT-d granularity
+ * X indexed by enum iommu_inv_type
+ * Y indicates request without and with PASID
+ * Z indexed by enum iommu_inv_granularity
+ *
+ * For example, if we want to find the VT-d granularity encoding for IOTLB
+ * type, a DMA request with PASID, and page selective, the lookup indices
+ * are: [1][1][8], where
+ *	1: IOMMU_INV_TYPE_TLB
+ *	1: with PASID
+ *	8: IOMMU_INV_GRANU_PAGE_PASID
+ *
+ */
+const static int inv_type_granu_map[IOMMU_INV_NR_TYPE][2][IOMMU_INV_NR_GRANU] = {
+	/* extended dev IOTLBs: for dev-IOTLB, only global is valid;
+	   for dev-EXIOTLB, two valid granu */
+	{
+		{1},
+		{0, 0, 0, 0, 1, 1, 0, 0, 0}
+	},
+	/* IOTLB and EIOTLB */
+	{
+		{1, 1, 0, 1, 0, 0, 0, 0, 0},
+		{0, 0, 0, 0, 1, 0, 1, 1, 1}
+	},
+	/* PASID cache */
+	{
+		{0},
+		{0, 0, 0, 0, 1, 1, 0, 0, 0}
+	},
+	/* context cache */
+	{
+		{1, 1, 1}
+	}
+};
+
+const static u64 inv_type_granu_table[IOMMU_INV_NR_TYPE][2][IOMMU_INV_NR_GRANU] = {
+	/* extended dev IOTLBs, only global is valid */
+	{
+		{QI_DEV_IOTLB_GRAN_ALL},
+		{0, 0, 0, 0, QI_DEV_IOTLB_GRAN_ALL, QI_DEV_IOTLB_GRAN_PASID_SEL, 0, 0, 0}
+	},
+	/* IOTLB and EIOTLB */
+	{
+		{DMA_TLB_GLOBAL_FLUSH, DMA_TLB_DSI_FLUSH, 0, DMA_TLB_PSI_FLUSH},
+		{0, 0, 0, 0, QI_GRAN_ALL_ALL, 0, QI_GRAN_NONG_ALL, QI_GRAN_NONG_PASID, QI_GRAN_PSI_PASID}
+	},
+	/* PASID cache */
+	{
+		{0},
+		{0, 0, 0, 0, QI_PC_ALL_PASIDS, QI_PC_PASID_SEL}
+	},
+	/* context cache */
+	{
+		{DMA_CCMD_GLOBAL_INVL, DMA_CCMD_DOMAIN_INVL, DMA_CCMD_DEVICE_INVL}
+	}
+};
+
+static inline int to_vtd_granularity(int type, int granu, int with_pasid, u64 *vtd_granu)
+{
+	if (type >= IOMMU_INV_NR_TYPE || granu >= IOMMU_INV_NR_GRANU || with_pasid > 1)
+		return -EINVAL;
+
+	if (inv_type_granu_map[type][with_pasid][granu] == 0)
+		return -EINVAL;
+
+	*vtd_granu = inv_type_granu_table[type][with_pasid][granu];
+
+	return 0;
+}
+
+static int intel_iommu_sva_invalidate(struct iommu_domain *domain,
+		struct device *dev, struct tlb_invalidate_info *inv_info)
+{
+	struct intel_iommu *iommu;
+	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+	struct device_domain_info *info;
+	struct pci_dev *pdev;
+	u16 did, sid, pfsid;
+	u8 bus, devfn;
+	int ret = 0;
+	u64 granu;
+	unsigned long flags;
+
+	if (!inv_info || !dmar_domain)
+		return -EINVAL;
+
+	if (!dev || !dev_is_pci(dev))
+		return -ENODEV;
+
+	iommu = device_to_iommu(dev, &bus, &devfn);
+	if (!iommu)
+		return -ENODEV;
+
+	did = dmar_domain->iommu_did[iommu->seq_id];
+	sid = PCI_DEVID(bus, devfn);
+	ret = to_vtd_granularity(inv_info->hdr.type, inv_info->granularity,
+		!!(inv_info->flags & IOMMU_INVALIDATE_PASID_TAGGED), &granu);
+	if (ret) {
+		pr_err("Invalid range type %d, granu %d\n", inv_info->hdr.type,
+			inv_info->granularity);
+		return ret;
+	}
+
+	spin_lock(&iommu->lock);
+	spin_lock_irqsave(&device_domain_lock, flags);
+
+	switch (inv_info->hdr.type) {
+	case IOMMU_INV_TYPE_CONTEXT:
+		iommu->flush.flush_context(iommu, did, sid,
+					DMA_CCMD_MASK_NOBIT, granu);
+		break;
+	case IOMMU_INV_TYPE_TLB:
+
[PATCH v3 15/16] iommu: introduce page response function
When nested translation is turned on and the guest owns the first
level page tables, device page requests can be forwarded to the guest
for handling faults. As the page response is returned by the guest,
the IOMMU driver on the host needs to process the response, which
informs the device and completes the page request transaction.

This patch introduces a generic API function for page response passing
from the guest or other in-kernel users. The definitions of the
generic data are based on the PCI ATS specification, not limited to
any vendor.

Signed-off-by: Jacob Pan
---
 drivers/iommu/iommu.c | 14 ++
 include/linux/iommu.h | 42 ++
 2 files changed, 56 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 97b7990..7aefb40 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1416,6 +1416,20 @@ int iommu_sva_invalidate(struct iommu_domain *domain,
 }
 EXPORT_SYMBOL_GPL(iommu_sva_invalidate);
 
+int iommu_page_response(struct iommu_domain *domain, struct device *dev,
+			struct page_response_msg *msg)
+{
+	int ret = 0;
+
+	if (unlikely(!domain->ops->page_response))
+		return -ENODEV;
+
+	ret = domain->ops->page_response(domain, dev, msg);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_page_response);
+
 static void __iommu_detach_device(struct iommu_domain *domain,
 				  struct device *dev)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 3083796b..17f698b 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -163,6 +163,43 @@ struct iommu_resv_region {
 
 #ifdef CONFIG_IOMMU_API
 
+enum page_response_type {
+	IOMMU_PAGE_STREAM_RESP = 1,
+	IOMMU_PAGE_GROUP_RESP,
+};
+
+/**
+ * Generic page response information based on PCI ATS and PASID spec.
+ * @paddr: servicing page address
+ * @pasid: contains process address space ID, used in shared virtual memory (SVM)
+ * @rid: requestor ID
+ * @did: destination device ID
+ * @last_req: last request in a page request group
+ * @resp_code: response code
+ * @page_req_group_id: page request group index
+ * @prot: page access protection flag, e.g. IOMMU_FAULT_READ, IOMMU_FAULT_WRITE
+ * @type: group or stream response
+ * @private_data: uniquely identify device-specific private data for an
+ *		  individual page response
+ */
+struct page_response_msg {
+	u64 paddr;
+	u32 pasid;
+	u32 rid:16;
+	u32 did:16;
+	u32 resp_code:4;
+	u32 last_req:1;
+	u32 pasid_present:1;
+#define IOMMU_PAGE_RESP_SUCCESS	0
+#define IOMMU_PAGE_RESP_INVALID	1
+#define IOMMU_PAGE_RESP_FAILURE	0xF
+	u32 page_req_group_id : 9;
+	u32 prot;
+	enum page_response_type type;
+	u32 private_data;
+};
+
 /**
  * struct iommu_ops - iommu ops and capabilities
  * @capable: check capability
@@ -196,6 +233,7 @@ struct iommu_resv_region {
  * @bind_pasid_table: bind pasid table pointer for guest SVM
  * @unbind_pasid_table: unbind pasid table pointer and restore defaults
  * @sva_invalidate: invalidate translation caches of shared virtual address
+ * @page_response: handle page request response
  */
 struct iommu_ops {
 	bool (*capable)(enum iommu_cap);
@@ -251,6 +289,8 @@ struct iommu_ops {
 			struct device *dev);
 	int (*sva_invalidate)(struct iommu_domain *domain,
 		struct device *dev, struct tlb_invalidate_info *inv_info);
+	int (*page_response)(struct iommu_domain *domain, struct device *dev,
+			struct page_response_msg *msg);
 
 	unsigned long pgsize_bitmap;
 };
@@ -472,6 +512,8 @@ extern int iommu_unregister_device_fault_handler(struct device *dev);
 
 extern int iommu_report_device_fault(struct device *dev, struct iommu_fault_event *evt);
 
+extern int iommu_page_response(struct iommu_domain *domain, struct device *dev,
+			struct page_response_msg *msg);
 extern int iommu_group_id(struct iommu_group *group);
 extern struct iommu_group
*iommu_group_get_for_dev(struct device *dev); extern struct iommu_domain *iommu_group_default_domain(struct iommu_group *); -- 2.7.4 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 05/16] iommu/vt-d: support flushing more TLB types
With shared virtual memory virtualization, extended IOTLB invalidations
may be passed down from outside the IOMMU subsystem. This patch adds
invalidation functions that can be used for each IOTLB type.

Signed-off-by: Jacob Pan
Signed-off-by: Liu, Yi L
Signed-off-by: Ashok Raj
---
 drivers/iommu/dmar.c        | 54 ++---
 drivers/iommu/intel-iommu.c |  3 ++-
 include/linux/intel-iommu.h | 10 +++--
 3 files changed, 61 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 57c920c..f69f6ee 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1336,11 +1336,25 @@ void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
 	qi_submit_sync(&desc, iommu);
 }
 
-void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 qdep,
-			u64 addr, unsigned mask)
+void qi_flush_eiotlb(struct intel_iommu *iommu, u16 did, u64 addr, u32 pasid,
+		unsigned int size_order, u64 granu, bool global)
 {
 	struct qi_desc desc;
 
+	desc.low = QI_EIOTLB_PASID(pasid) | QI_EIOTLB_DID(did) |
+		QI_EIOTLB_GRAN(granu) | QI_EIOTLB_TYPE;
+	desc.high = QI_EIOTLB_ADDR(addr) | QI_EIOTLB_GL(global) |
+		QI_EIOTLB_IH(0) | QI_EIOTLB_AM(size_order);
+	qi_submit_sync(&desc, iommu);
+}
+
+void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
+		u16 qdep, u64 addr, unsigned mask)
+{
+	struct qi_desc desc;
+
+	pr_debug_ratelimited("%s: sid %d, pfsid %d, qdep %d, addr %llx, mask %d\n",
+			__func__, sid, pfsid, qdep, addr, mask);
 	if (mask) {
 		BUG_ON(addr & ((1 << (VTD_PAGE_SHIFT + mask)) - 1));
 		addr |= (1ULL << (VTD_PAGE_SHIFT + mask - 1)) - 1;
@@ -1352,7 +1366,41 @@ void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 qdep,
 		qdep = 0;
 
 	desc.low = QI_DEV_IOTLB_SID(sid) | QI_DEV_IOTLB_QDEP(qdep) |
-		   QI_DIOTLB_TYPE;
+		   QI_DIOTLB_TYPE | QI_DEV_IOTLB_PFSID(pfsid);
+
+	qi_submit_sync(&desc, iommu);
+}
+
+void qi_flush_dev_eiotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
+		u32 pasid, u16 qdep, u64 addr, unsigned size, u64 granu)
+{
+	struct qi_desc desc;
+
+	desc.low = QI_DEV_EIOTLB_PASID(pasid) | QI_DEV_EIOTLB_SID(sid) |
+		QI_DEV_EIOTLB_QDEP(qdep) | QI_DEIOTLB_TYPE |
+		QI_DEV_EIOTLB_PFSID(pfsid);
+
+	/* If S bit is 0, we only flush a single page. If S bit is set,
+	 * the least significant zero bit indicates the size. VT-d spec
+	 * 6.5.2.6
+	 */
+	if (!size)
+		desc.high = QI_DEV_EIOTLB_ADDR(addr) & ~QI_DEV_EIOTLB_SIZE;
+	else {
+		unsigned long mask = 1UL << (VTD_PAGE_SHIFT + size);
+
+		desc.high = QI_DEV_EIOTLB_ADDR(addr & ~mask) | QI_DEV_EIOTLB_SIZE;
+	}
+	desc.high |= QI_DEV_EIOTLB_GLOB(granu);
+	qi_submit_sync(&desc, iommu);
+}
+
+void qi_flush_pasid(struct intel_iommu *iommu, u16 did, u64 granu, int pasid)
+{
+	struct qi_desc desc;
+
+	desc.high = 0;
+	desc.low = QI_PC_TYPE | QI_PC_DID(did) | QI_PC_GRAN(granu) | QI_PC_PASID(pasid);
 
 	qi_submit_sync(&desc, iommu);
 }
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 399b504..556bdd2 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1524,7 +1524,8 @@ static void iommu_flush_dev_iotlb(struct dmar_domain *domain,
 		sid = info->bus << 8 | info->devfn;
 		qdep = info->ats_qdep;
-		qi_flush_dev_iotlb(info->iommu, sid, qdep, addr, mask);
+		qi_flush_dev_iotlb(info->iommu, sid, info->pfsid,
+				   qdep, addr, mask);
 	}
 	spin_unlock_irqrestore(&device_domain_lock, flags);
 }
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 8d38e24..3c83f7e 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -305,6 +305,7 @@ enum {
 #define QI_DEV_EIOTLB_PASID(p)	(((u64)p) << 32)
 #define QI_DEV_EIOTLB_SID(sid)	((u64)((sid) & 0xffff) << 16)
 #define QI_DEV_EIOTLB_QDEP(qd)	((u64)((qd) & 0x1f) << 4)
+#define QI_DEV_EIOTLB_PFSID(pfsid) (((u64)(pfsid & 0xf) << 12) | ((u64)(pfsid & 0xff0) << 48))
 #define QI_DEV_EIOTLB_MAX_INVS	32
 
 #define QI_PGRP_IDX(idx)	(((u64)(idx)) << 55)
@@ -496,8 +497,13 @@ extern void qi_flush_context(struct intel_iommu *iommu, u16 did, u16 sid,
 			     u8 fm, u64 type);
 extern void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
 			  unsigned int size_order, u64 type);
-extern void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 qdep,
-			       u64 addr, unsigned mask);
+extern void qi_flush_eiotlb(struct intel_iommu
[PATCH v3 07/16] iommu/vt-d: assign PFSID in device TLB invalidation
When an SRIOV VF device IOTLB is invalidated, we need to provide the
PF source SID such that IOMMU hardware can gauge the depth of the
invalidation queue which is shared among VFs. This is needed when the
device invalidation throttle (DIT) capability is supported.

Signed-off-by: Jacob Pan
---
 drivers/iommu/intel-iommu.c | 13 +
 include/linux/intel-iommu.h |  3 +++
 2 files changed, 16 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 000b2b3..e1bd219 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1459,6 +1459,19 @@ static void iommu_enable_dev_iotlb(struct device_domain_info *info)
 		return;
 
 	pdev = to_pci_dev(info->dev);
+	/* For IOMMUs that support device IOTLB throttling (DIT), we assign
+	 * PFSID to the invalidation desc of a VF such that IOMMU HW can gauge
+	 * queue depth at PF level. If DIT is not set, PFSID will be treated as
+	 * reserved, which should be set to 0.
+	 */
+	if (!ecap_dit(info->iommu->ecap))
+		info->pfsid = 0;
+	else if (pdev && pdev->is_virtfn) {
+		if (ecap_dit(info->iommu->ecap))
+			dev_warn(&pdev->dev, "SRIOV VF device IOTLB enabled without flow control\n");
+		info->pfsid = PCI_DEVID(pdev->physfn->bus->number, pdev->physfn->devfn);
+	} else
+		info->pfsid = PCI_DEVID(info->bus, info->devfn);
 
 #ifdef CONFIG_INTEL_IOMMU_SVM
 	/* The PCIe spec, in its wisdom, declares that the behaviour of
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 7f05e36..6956a4e 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -112,6 +112,7 @@
  * Extended Capability Register
  */
 
+#define ecap_dit(e)		((e >> 41) & 0x1)
 #define ecap_pasid(e)		((e >> 40) & 0x1)
 #define ecap_pss(e)		((e >> 35) & 0x1f)
 #define ecap_eafs(e)		((e >> 34) & 0x1)
@@ -285,6 +286,7 @@ enum {
 #define QI_DEV_IOTLB_SID(sid)	((u64)((sid) & 0xffff) << 32)
 #define QI_DEV_IOTLB_QDEP(qdep)	(((qdep) & 0x1f) << 16)
 #define QI_DEV_IOTLB_ADDR(addr)	((u64)(addr) & VTD_PAGE_MASK)
+#define QI_DEV_IOTLB_PFSID(pfsid) (((u64)(pfsid & 0xf) << 12) | ((u64)(pfsid & 0xff0) << 48))
 #define QI_DEV_IOTLB_SIZE	1
 #define QI_DEV_IOTLB_MAX_INVS	32
 
@@ -475,6 +477,7 @@ struct device_domain_info {
 	struct list_head global; /* link to global list */
 	u8 bus;			/* PCI bus number */
 	u8 devfn;		/* PCI devfn number */
+	u16 pfsid;		/* SRIOV physical function source ID */
 	u8 pasid_supported:3;
 	u8 pasid_enabled:1;
 	u8 pri_supported:1;
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 00/16] [PATCH v3 00/16] IOMMU driver support for SVM virtualization
Hi All,

Shared virtual memory (SVM), or more precisely shared virtual address
(SVA), between device DMA and applications can reduce programming
complexity and enhance security. To enable SVM in the guest, i.e. to
share the guest application address space with physical device DMA
addresses, the IOMMU driver must provide some new functionality.

This patchset is a follow-up on the discussions held at the LPC 2017
VFIO/IOMMU/PCI track. Slides and notes can be found here:
https://linuxplumbersconf.org/2017/ocw/events/LPC2017/tracks/636

The complete guest SVM support also involves changes in QEMU and VFIO,
which have been posted earlier.
https://www.spinics.net/lists/kvm/msg148798.html

This is the IOMMU portion follow-up of the more complete series of
kernel changes to support vSVM. Please refer to the link below for
more details.
https://www.spinics.net/lists/kvm/msg148819.html

Generic APIs are introduced in addition to Intel VT-d specific
changes; the goal is to have common interfaces across IOMMU and device
types for both VFIO and other in-kernel users. At the top level, new
IOMMU interfaces are introduced as follows:
- bind guest PASID table
- passdown invalidations of translation caches
- IOMMU device fault reporting, including page request/response and
  non-recoverable faults

For IOMMU detected device fault reporting, struct device is extended
to provide callback and tracking at the device level. The original
proposal was discussed here: "Error handling for I/O memory management
units" (https://lwn.net/Articles/608914/). I have experimented with
two alternative solutions:
1. use a shared group notifier; this does not scale well and also
   causes unwanted notification traffic when a group sibling device is
   reported with faults.
2. place the fault callback at device IOMMU arch data, e.g.
   device_domain_info in the Intel/FSL IOMMU driver. This will cause
   code duplication, since per device fault reporting is generic.

The additional patches are Intel VT-d specific, which either implement
or replace existing private interfaces with the generic ones.

This patchset is based on the work and ideas from many people,
especially:
Ashok Raj
Liu, Yi L
Jean-Philippe Brucker

Thanks,

Jacob

V3
	- Consolidated fault reporting data format based on discussions
	  on v2, including input from ARM and AMD.
	- Renamed invalidation APIs from svm to sva based on discussions on v2
	- Use a parent pointer under struct device for all iommu per device data
	- Simplified device fault callback, allow driver private data to be
	  registered. This might make it easy to replace the domain fault
	  handler.

V2
	- Replaced hybrid interface data model (generic data + vendor specific
	  data) with all generic data. This will have the security benefit
	  where data passed from user space can be sanitized by all software
	  layers if needed.
	- Addressed review comments from V1
	- Use per device fault report data
	- Support page request/response communications between host IOMMU and
	  guest or other in-kernel users.
- Added unrecoverable fault reporting to DMAR - Use threaded IRQ function for DMAR fault interrupt and fault reporting Jacob Pan (15): iommu: introduce bind_pasid_table API function iommu/vt-d: add bind_pasid_table function iommu/vt-d: move device_domain_info to header iommu/vt-d: support flushing more TLB types iommu/vt-d: add svm/sva invalidate function iommu/vt-d: assign PFSID in device TLB invalidation iommu: introduce device fault data driver core: add iommu device fault reporting data iommu: introduce device fault report API iommu/vt-d: use threaded irq for dmar_fault iommu/vt-d: report unrecoverable device faults iommu/intel-svm: notify page request to guest iommu/intel-svm: replace dev ops with fault report API iommu: introduce page response function iommu/vt-d: add intel iommu page response function Liu, Yi L (1): iommu: introduce iommu invalidate API function drivers/iommu/dmar.c | 151 - drivers/iommu/intel-iommu.c | 365 +++--- drivers/iommu/intel-svm.c | 87 -- drivers/iommu/iommu.c | 110 - include/linux/device.h| 3 + include/linux/dma_remapping.h | 1 + include/linux/intel-iommu.h | 47 +- include/linux/intel-svm.h | 20 +-- include/linux/iommu.h | 223 +- include/uapi/linux/iommu.h| 101 10 files changed, 1047 insertions(+), 61 deletions(-) create mode 100644 include/uapi/linux/iommu.h -- 2.7.4 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 02/16] iommu/vt-d: add bind_pasid_table function
Add Intel VT-d ops to the generic iommu_bind_pasid_table API functions. The primary use case is for direct assignment of SVM capable device. Originated from emulated IOMMU in the guest, the request goes through many layers (e.g. VFIO). Upon calling host IOMMU driver, caller passes guest PASID table pointer (GPA) and size. Device context table entry is modified by Intel IOMMU specific bind_pasid_table function. This will turn on nesting mode and matching translation type. The unbind operation restores default context mapping. Signed-off-by: Jacob PanSigned-off-by: Liu, Yi L Signed-off-by: Ashok Raj --- drivers/iommu/intel-iommu.c | 107 ++ include/linux/dma_remapping.h | 1 + 2 files changed, 108 insertions(+) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 2087cd8..3d1901d 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -5176,6 +5176,7 @@ static void intel_iommu_put_resv_regions(struct device *dev, #ifdef CONFIG_INTEL_IOMMU_SVM #define MAX_NR_PASID_BITS (20) +#define MIN_NR_PASID_BITS (5) static inline unsigned long intel_iommu_get_pts(struct intel_iommu *iommu) { /* @@ -5302,6 +5303,108 @@ struct intel_iommu *intel_svm_device_to_iommu(struct device *dev) return iommu; } + +static int intel_iommu_bind_pasid_table(struct iommu_domain *domain, + struct device *dev, struct pasid_table_config *pasidt_binfo) +{ + struct intel_iommu *iommu; + struct context_entry *context; + struct dmar_domain *dmar_domain = to_dmar_domain(domain); + struct device_domain_info *info; + struct pci_dev *pdev; + u8 bus, devfn, host_table_pasid_bits; + u16 did, sid; + int ret = 0; + unsigned long flags; + u64 ctx_lo; + + iommu = device_to_iommu(dev, , ); + if (!iommu) + return -ENODEV; + /* VT-d spec 9.4 says pasid table size is encoded as 2^(x+5) */ + host_table_pasid_bits = intel_iommu_get_pts(iommu) + MIN_NR_PASID_BITS; + if (!pasidt_binfo || pasidt_binfo->pasid_bits > host_table_pasid_bits || + pasidt_binfo->pasid_bits < 
MIN_NR_PASID_BITS) { + pr_err("Invalid gPASID bits %d, host range %d - %d\n", + pasidt_binfo->pasid_bits, + MIN_NR_PASID_BITS, host_table_pasid_bits); + return -ERANGE; + } + + pdev = to_pci_dev(dev); + sid = PCI_DEVID(bus, devfn); + info = dev->archdata.iommu; + + if (!info) { + dev_err(dev, "Invalid device domain info\n"); + ret = -EINVAL; + goto out; + } + if (!info->pasid_enabled) { + ret = pci_enable_pasid(pdev, info->pasid_supported & ~1); + if (ret) { + dev_err(dev, "Failed to enable PASID\n"); + goto out; + } + } + if (!device_context_mapped(iommu, bus, devfn)) { + pr_warn("ctx not mapped for bus devfn %x:%x\n", bus, devfn); + ret = -EINVAL; + goto out; + } + spin_lock_irqsave(&iommu->lock, flags); + context = iommu_context_addr(iommu, bus, devfn, 0); + if (!context) { + ret = -EINVAL; + goto out_unlock; + } + + /* Anticipate that the guest will use SVM and own the first level, +* so turn nested mode on +*/ + ctx_lo = context[0].lo; + ctx_lo |= CONTEXT_NESTE | CONTEXT_PRS | CONTEXT_PASIDE; + ctx_lo &= ~CONTEXT_TT_MASK; + ctx_lo |= CONTEXT_TT_DEV_IOTLB << 2; + context[0].lo = ctx_lo; + + /* Assign guest PASID table pointer and size order */ + ctx_lo = (pasidt_binfo->base_ptr & VTD_PAGE_MASK) | + (pasidt_binfo->pasid_bits - MIN_NR_PASID_BITS); + context[1].lo = ctx_lo; + /* make sure context entry is updated before flushing */ + wmb(); + did = dmar_domain->iommu_did[iommu->seq_id]; + iommu->flush.flush_context(iommu, did, + (((u16)bus) << 8) | devfn, + DMA_CCMD_MASK_NOBIT, + DMA_CCMD_DEVICE_INVL); + iommu->flush.flush_iotlb(iommu, did, 0, 0, DMA_TLB_DSI_FLUSH); + +out_unlock: + spin_unlock_irqrestore(&iommu->lock, flags); +out: + return ret; +} + +static void intel_iommu_unbind_pasid_table(struct iommu_domain *domain, + struct device *dev) +{ + struct intel_iommu *iommu; + struct dmar_domain *dmar_domain = to_dmar_domain(domain); + u8 bus, devfn; + + assert_spin_locked(&device_domain_lock); + iommu = device_to_iommu(dev, &bus, &devfn); + if (!iommu) { + dev_err(dev, "No IOMMU for device to unbind PASID table\n"); + return; + } + +
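The context-entry packing above (guest PASID table pointer combined with a size order) can be illustrated with a small standalone sketch. The constants here are assumptions mirroring the patch, not taken from the kernel headers:

```c
#include <assert.h>
#include <stdint.h>

#define VTD_PAGE_SHIFT    12
#define VTD_PAGE_MASK     (~(((uint64_t)1 << VTD_PAGE_SHIFT) - 1))
#define MIN_NR_PASID_BITS 5 /* assumed floor, mirroring the patch */

/* Page-aligned guest PASID table pointer in the high bits, table size
 * order (pasid_bits - MIN_NR_PASID_BITS) in the low bits, the way the
 * patch fills context[1].lo. */
static uint64_t pack_pasid_table_entry(uint64_t base_ptr,
                                       unsigned int pasid_bits)
{
    return (base_ptr & VTD_PAGE_MASK) | (pasid_bits - MIN_NR_PASID_BITS);
}
```

Note that masking with VTD_PAGE_MASK is what makes the low bits safe to reuse for the size order: any sub-page offset in the guest-provided pointer is discarded.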
[PATCH v3 03/16] iommu: introduce iommu invalidate API function
From: "Liu, Yi L" When an SVM capable device is assigned to a guest, the first level page tables are owned by the guest and the guest PASID table pointer is linked to the device context entry of the physical IOMMU. The host IOMMU driver has no knowledge of caching structure updates unless the guest invalidation activities are passed down to the host. The primary usage is derived from the emulated IOMMU in the guest, where QEMU can trap invalidation activities before passing them down to the host/physical IOMMU. Since the invalidation data are obtained from user space and will be written into the physical IOMMU, we must allow security checks at various layers. Therefore, a generic invalidation data format is proposed here; model-specific IOMMU drivers need to convert it into their own format. Signed-off-by: Liu, Yi L Signed-off-by: Jacob Pan Signed-off-by: Ashok Raj --- drivers/iommu/iommu.c | 14 +++ include/linux/iommu.h | 12 + include/uapi/linux/iommu.h | 62 ++ 3 files changed, 88 insertions(+) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index c7e0d64..829e9e9 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -1341,6 +1341,20 @@ void iommu_unbind_pasid_table(struct iommu_domain *domain, struct device *dev) } EXPORT_SYMBOL_GPL(iommu_unbind_pasid_table); +int iommu_sva_invalidate(struct iommu_domain *domain, + struct device *dev, struct tlb_invalidate_info *inv_info) +{ + int ret = 0; + + if (unlikely(!domain->ops->sva_invalidate)) + return -ENODEV; + + ret = domain->ops->sva_invalidate(domain, dev, inv_info); + + return ret; +} +EXPORT_SYMBOL_GPL(iommu_sva_invalidate); + static void __iommu_detach_device(struct iommu_domain *domain, struct device *dev) { diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 0f6f6c5..da684a7 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -190,6 +190,7 @@ struct iommu_resv_region { * @pgsize_bitmap: bitmap of all possible supported page sizes * @bind_pasid_table: bind pasid table 
pointer for guest SVM * @unbind_pasid_table: unbind pasid table pointer and restore defaults + * @sva_invalidate: invalidate translation caches of shared virtual address */ struct iommu_ops { bool (*capable)(enum iommu_cap); @@ -243,6 +244,8 @@ struct iommu_ops { struct pasid_table_config *pasidt_binfo); void (*unbind_pasid_table)(struct iommu_domain *domain, struct device *dev); + int (*sva_invalidate)(struct iommu_domain *domain, + struct device *dev, struct tlb_invalidate_info *inv_info); unsigned long pgsize_bitmap; }; @@ -309,6 +312,9 @@ extern int iommu_bind_pasid_table(struct iommu_domain *domain, struct device *dev, struct pasid_table_config *pasidt_binfo); extern void iommu_unbind_pasid_table(struct iommu_domain *domain, struct device *dev); +extern int iommu_sva_invalidate(struct iommu_domain *domain, + struct device *dev, struct tlb_invalidate_info *inv_info); + extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev); extern int iommu_map(struct iommu_domain *domain, unsigned long iova, phys_addr_t paddr, size_t size, int prot); @@ -720,6 +726,12 @@ void iommu_unbind_pasid_table(struct iommu_domain *domain, struct device *dev) { } +static inline int iommu_sva_invalidate(struct iommu_domain *domain, + struct device *dev, struct tlb_invalidate_info *inv_info) +{ + return -EINVAL; +} + #endif /* CONFIG_IOMMU_API */ #endif /* __LINUX_IOMMU_H */ diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h index 651ad5d..039ba36 100644 --- a/include/uapi/linux/iommu.h +++ b/include/uapi/linux/iommu.h @@ -36,4 +36,66 @@ struct pasid_table_config { }; }; +enum iommu_inv_granularity { + IOMMU_INV_GRANU_GLOBAL, /* all TLBs invalidated */ + IOMMU_INV_GRANU_DOMAIN, /* all TLBs associated with a domain */ + IOMMU_INV_GRANU_DEVICE, /* caching structure associated with a +* device ID +*/ + IOMMU_INV_GRANU_DOMAIN_PAGE,/* address range with a domain */ + IOMMU_INV_GRANU_ALL_PASID, /* cache of a given PASID */ + IOMMU_INV_GRANU_PASID_SEL, /* only 
invalidate specified PASID */ + + IOMMU_INV_GRANU_NG_ALL_PASID, /* non-global within all PASIDs */ + IOMMU_INV_GRANU_NG_PASID, /* non-global within a PASIDs */ + IOMMU_INV_GRANU_PAGE_PASID, /* page-selective within a PASID */ + IOMMU_INV_NR_GRANU, +}; + +enum iommu_inv_type { +
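Since the invalidation descriptors come from user space, the first check a model-specific driver makes is that the requested granularity is in range. A minimal userspace sketch of that validation, with the enum values copied from the proposed UAPI above (order matters, since it is ABI):

```c
#include <assert.h>
#include <stdbool.h>

/* Mirror of the proposed UAPI enum; values must stay in this order. */
enum iommu_inv_granularity {
    IOMMU_INV_GRANU_GLOBAL,
    IOMMU_INV_GRANU_DOMAIN,
    IOMMU_INV_GRANU_DEVICE,
    IOMMU_INV_GRANU_DOMAIN_PAGE,
    IOMMU_INV_GRANU_ALL_PASID,
    IOMMU_INV_GRANU_PASID_SEL,
    IOMMU_INV_GRANU_NG_ALL_PASID,
    IOMMU_INV_GRANU_NG_PASID,
    IOMMU_INV_GRANU_PAGE_PASID,
    IOMMU_INV_NR_GRANU,
};

/* Range-check a granularity taken from user space before converting
 * it to a model-specific invalidation descriptor. */
static bool granu_valid(unsigned int granu)
{
    return granu < IOMMU_INV_NR_GRANU;
}
```

A driver that does not support a given (valid) granularity would still reject it later with its own error, but this bound check is the generic layer's part of the "security checks at various layers" mentioned in the commit message.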
[PATCH v3 01/16] iommu: introduce bind_pasid_table API function
Virtual IOMMU was proposed to support Shared Virtual Memory (SVM) use in the guest: https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg05311.html As part of the proposed architecture, when an SVM capable PCI device is assigned to a guest, nested mode is turned on. Guest owns the first level page tables (request with PASID) which performs GVA->GPA translation. Second level page tables are owned by the host for GPA->HPA translation for both request with and without PASID. A new IOMMU driver interface is therefore needed to perform tasks as follows: * Enable nested translation and appropriate translation type * Assign guest PASID table pointer (in GPA) and size to host IOMMU This patch introduces new API functions to perform bind/unbind guest PASID tables. Based on common data, model specific IOMMU drivers can be extended to perform the specific steps for binding pasid table of assigned devices. Signed-off-by: Jacob PanSigned-off-by: Liu, Yi L Signed-off-by: Ashok Raj --- drivers/iommu/iommu.c | 19 +++ include/linux/iommu.h | 24 include/uapi/linux/iommu.h | 39 +++ 3 files changed, 82 insertions(+) create mode 100644 include/uapi/linux/iommu.h diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 3de5c0b..c7e0d64 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -1322,6 +1322,25 @@ int iommu_attach_device(struct iommu_domain *domain, struct device *dev) } EXPORT_SYMBOL_GPL(iommu_attach_device); +int iommu_bind_pasid_table(struct iommu_domain *domain, struct device *dev, + struct pasid_table_config *pasidt_binfo) +{ + if (unlikely(!domain->ops->bind_pasid_table)) + return -ENODEV; + + return domain->ops->bind_pasid_table(domain, dev, pasidt_binfo); +} +EXPORT_SYMBOL_GPL(iommu_bind_pasid_table); + +void iommu_unbind_pasid_table(struct iommu_domain *domain, struct device *dev) +{ + if (unlikely(!domain->ops->unbind_pasid_table)) + return; + + domain->ops->unbind_pasid_table(domain, dev); +} +EXPORT_SYMBOL_GPL(iommu_unbind_pasid_table); + 
static void __iommu_detach_device(struct iommu_domain *domain, struct device *dev) { diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 41b8c57..0f6f6c5 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -25,6 +25,7 @@ #include #include #include +#include #define IOMMU_READ (1 << 0) #define IOMMU_WRITE(1 << 1) @@ -187,6 +188,8 @@ struct iommu_resv_region { * @domain_get_windows: Return the number of windows for a domain * @of_xlate: add OF master IDs to iommu grouping * @pgsize_bitmap: bitmap of all possible supported page sizes + * @bind_pasid_table: bind pasid table pointer for guest SVM + * @unbind_pasid_table: unbind pasid table pointer and restore defaults */ struct iommu_ops { bool (*capable)(enum iommu_cap); @@ -233,8 +236,14 @@ struct iommu_ops { u32 (*domain_get_windows)(struct iommu_domain *domain); int (*of_xlate)(struct device *dev, struct of_phandle_args *args); + bool (*is_attach_deferred)(struct iommu_domain *domain, struct device *dev); + int (*bind_pasid_table)(struct iommu_domain *domain, struct device *dev, + struct pasid_table_config *pasidt_binfo); + void (*unbind_pasid_table)(struct iommu_domain *domain, + struct device *dev); + unsigned long pgsize_bitmap; }; @@ -296,6 +305,10 @@ extern int iommu_attach_device(struct iommu_domain *domain, struct device *dev); extern void iommu_detach_device(struct iommu_domain *domain, struct device *dev); +extern int iommu_bind_pasid_table(struct iommu_domain *domain, + struct device *dev, struct pasid_table_config *pasidt_binfo); +extern void iommu_unbind_pasid_table(struct iommu_domain *domain, + struct device *dev); extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev); extern int iommu_map(struct iommu_domain *domain, unsigned long iova, phys_addr_t paddr, size_t size, int prot); @@ -696,6 +709,17 @@ const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode) return NULL; } +static inline +int iommu_bind_pasid_table(struct iommu_domain 
*domain, struct device *dev, + struct pasid_table_config *pasidt_binfo) +{ + return -EINVAL; +} +static inline +void iommu_unbind_pasid_table(struct iommu_domain *domain, struct device *dev) +{ +} + #endif /* CONFIG_IOMMU_API */ #endif /* __LINUX_IOMMU_H */ diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h new file mode 100644 index 000..651ad5d --- /dev/null +++
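The bind/unbind entry points above follow the usual iommu_ops dispatch pattern: look up the op on the domain, and fail with -ENODEV when the driver doesn't implement it. A compilable userspace sketch of that pattern, using simplified stand-in types rather than the kernel structs:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified stand-ins; only the fields needed for the dispatch. */
struct pasid_table_config {
    unsigned long long base_ptr;
    unsigned char pasid_bits;
};

struct iommu_ops_sketch {
    int (*bind_pasid_table)(struct pasid_table_config *cfg);
};

struct iommu_domain_sketch {
    const struct iommu_ops_sketch *ops;
};

/* Mirrors iommu_bind_pasid_table(): drivers that don't implement the
 * hook make the call fail with -ENODEV instead of crashing. */
static int bind_pasid_table_sketch(struct iommu_domain_sketch *domain,
                                   struct pasid_table_config *cfg)
{
    if (!domain->ops->bind_pasid_table)
        return -ENODEV;
    return domain->ops->bind_pasid_table(cfg);
}

/* Hypothetical backend standing in for a model-specific driver. */
static int fake_bind(struct pasid_table_config *cfg)
{
    return cfg->pasid_bits ? 0 : -EINVAL;
}
```

The NULL-op check is why the API can be exposed unconditionally: only IOMMUs whose drivers implement the callback ever reach model-specific code.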
[PATCH v3 04/16] iommu/vt-d: move device_domain_info to header
Allow both intel-iommu.c and dmar.c to access device_domain_info. Prepare for additional per device arch data used in TLB flush function Signed-off-by: Jacob Pan--- drivers/iommu/intel-iommu.c | 18 -- include/linux/intel-iommu.h | 19 +++ 2 files changed, 19 insertions(+), 18 deletions(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 3d1901d..399b504 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -391,24 +391,6 @@ struct dmar_domain { iommu core */ }; -/* PCI domain-device relationship */ -struct device_domain_info { - struct list_head link; /* link to domain siblings */ - struct list_head global; /* link to global list */ - u8 bus; /* PCI bus number */ - u8 devfn; /* PCI devfn number */ - u8 pasid_supported:3; - u8 pasid_enabled:1; - u8 pri_supported:1; - u8 pri_enabled:1; - u8 ats_supported:1; - u8 ats_enabled:1; - u8 ats_qdep; - struct device *dev; /* it's NULL for PCIe-to-PCI bridge */ - struct intel_iommu *iommu; /* IOMMU used by this device */ - struct dmar_domain *domain; /* pointer to domain */ -}; - struct dmar_rmrr_unit { struct list_head list; /* list of rmrr units */ struct acpi_dmar_header *hdr; /* ACPI header */ diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h index 77ea056..8d38e24 100644 --- a/include/linux/intel-iommu.h +++ b/include/linux/intel-iommu.h @@ -458,6 +458,25 @@ struct intel_iommu { u32 flags; /* Software defined flags */ }; +/* PCI domain-device relationship */ +struct device_domain_info { + struct list_head link; /* link to domain siblings */ + struct list_head global; /* link to global list */ + u8 bus; /* PCI bus number */ + u8 devfn; /* PCI devfn number */ + u8 pasid_supported:3; + u8 pasid_enabled:1; + u8 pri_supported:1; + u8 pri_enabled:1; + u8 ats_supported:1; + u8 ats_enabled:1; + u8 ats_qdep; + u64 fault_mask; /* selected IOMMU faults to be reported */ + struct device *dev; /* it's NULL for PCIe-to-PCI bridge */ + struct intel_iommu *iommu; /* 
IOMMU used by this device */ + struct dmar_domain *domain; /* pointer to domain */ +}; + static inline void __iommu_flush_cache( struct intel_iommu *iommu, void *addr, int size) { -- 2.7.4 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
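The capability and enable flags in device_domain_info are bitfields (one 3-bit field plus several 1-bit flags), so the whole per-device feature state packs into a single byte with the usual Linux compilers. A reduced userspace sketch of just those fields (bitfield layout is compiler-dependent, so the packing shown here is typical for GCC/Clang rather than guaranteed by ISO C):

```c
#include <assert.h>
#include <stdint.h>

/* Reduced copy of the feature-state bitfields in device_domain_info:
 * pasid_supported:3 holds the PASID capability width field, the
 * 1-bit flags track per-feature support/enable state. */
struct ddi_flags {
    uint8_t pasid_supported:3;
    uint8_t pasid_enabled:1;
    uint8_t pri_supported:1;
    uint8_t pri_enabled:1;
    uint8_t ats_supported:1;
    uint8_t ats_enabled:1;
};
```

Moving the struct to the header doesn't change this layout; it only makes the same definition visible to dmar.c for the upcoming TLB flush work.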
[RFC PATCH v2 5/5] ACPI/IORT: Move IORT to the ACPI folder
IORT can be used (by QEMU) to describe a virtual topology containing an architecture-agnostic paravirtualized device. The rationale behind this blasphemy is explained in patch 4/5. In order to build IORT for x86 systems, the driver has to be moved outside of arm64/. Since there is nothing specific to arm64 in the driver, it simply requires moving Makefile and Kconfig entries. Signed-off-by: Jean-Philippe Brucker--- drivers/acpi/Kconfig| 3 +++ drivers/acpi/Makefile | 1 + drivers/acpi/arm64/Kconfig | 3 --- drivers/acpi/arm64/Makefile | 1 - drivers/acpi/{arm64 => }/iort.c | 0 5 files changed, 4 insertions(+), 4 deletions(-) rename drivers/acpi/{arm64 => }/iort.c (100%) diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig index 5b1938f4b626..ce40275646c8 100644 --- a/drivers/acpi/Kconfig +++ b/drivers/acpi/Kconfig @@ -536,4 +536,7 @@ if ARM64 source "drivers/acpi/arm64/Kconfig" endif +config ACPI_IORT + bool + endif # ACPI diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile index cd1abc9bc325..689c470c013b 100644 --- a/drivers/acpi/Makefile +++ b/drivers/acpi/Makefile @@ -112,3 +112,4 @@ video-objs += acpi_video.o video_detect.o obj-y += dptf/ obj-$(CONFIG_ARM64)+= arm64/ +obj-$(CONFIG_ACPI_IORT)+= iort.o diff --git a/drivers/acpi/arm64/Kconfig b/drivers/acpi/arm64/Kconfig index 5a6f80fce0d6..403f917ab274 100644 --- a/drivers/acpi/arm64/Kconfig +++ b/drivers/acpi/arm64/Kconfig @@ -2,8 +2,5 @@ # ACPI Configuration for ARM64 # -config ACPI_IORT - bool - config ACPI_GTDT bool diff --git a/drivers/acpi/arm64/Makefile b/drivers/acpi/arm64/Makefile index 1017def2ea12..47925dc6cfc8 100644 --- a/drivers/acpi/arm64/Makefile +++ b/drivers/acpi/arm64/Makefile @@ -1,2 +1 @@ -obj-$(CONFIG_ACPI_IORT)+= iort.o obj-$(CONFIG_ACPI_GTDT)+= gtdt.o diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/iort.c similarity index 100% rename from drivers/acpi/arm64/iort.c rename to drivers/acpi/iort.c -- 2.14.3 ___ iommu mailing list iommu@lists.linux-foundation.org 
[RFC PATCH v2 4/5] ACPI/IORT: Support paravirtualized IOMMU
To describe the virtual topology in relation to a virtio-iommu device, ACPI-based systems use a "paravirtualized IOMMU" IORT node. Add support for it. This is an RFC because the IORT specification doesn't describe the paravirtualized node at the moment; it is only provided as an example in the virtio-iommu spec. What we need to do first is confirm that x86 kernels are able to use the IORT driver with the virtio-iommu. There isn't anything specific to arm64 in the driver, but there might be other blockers we're not aware of (I know for example that x86 also requires custom DMA ops rather than iommu-dma ones, but that's unrelated), so this needs to be tested on the x86 prototype. Rationale: virtio-iommu requires an ACPI table to be passed between host and guest that describes its relation to PCI and platform endpoints in the virtual system: a table that maps PCI RIDs and integrated devices to IOMMU device IDs, telling the IOMMU driver which endpoints it manages. As far as I'm aware, there are three existing tables that solve this problem: Intel DMAR, AMD IVRS and ARM IORT. The first two are specific to Intel VT-d and AMD IOMMU respectively, while the third describes multiple remapping devices -- currently only ARM IOMMUs and MSI controllers, but it is easy to extend. The IORT table and drivers are easiest to extend and they do the job, so rather than introducing a fourth solution to solve a generic problem, reuse what exists. 
Signed-off-by: Jean-Philippe Brucker--- drivers/acpi/arm64/iort.c | 95 +++ drivers/iommu/Kconfig | 1 + include/acpi/actbl2.h | 18 - 3 files changed, 106 insertions(+), 8 deletions(-) diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c index fde279b0a6d8..c7132e4a0560 100644 --- a/drivers/acpi/arm64/iort.c +++ b/drivers/acpi/arm64/iort.c @@ -29,7 +29,8 @@ #define IORT_TYPE_MASK(type) (1 << (type)) #define IORT_MSI_TYPE (1 << ACPI_IORT_NODE_ITS_GROUP) #define IORT_IOMMU_TYPE((1 << ACPI_IORT_NODE_SMMU) | \ - (1 << ACPI_IORT_NODE_SMMU_V3)) + (1 << ACPI_IORT_NODE_SMMU_V3) | \ + (1 << ACPI_IORT_NODE_PARAVIRT)) /* Until ACPICA headers cover IORT rev. C */ #ifndef ACPI_IORT_SMMU_V3_CAVIUM_CN99XX @@ -616,6 +617,8 @@ static inline bool iort_iommu_driver_enabled(u8 type) return IS_BUILTIN(CONFIG_ARM_SMMU_V3); case ACPI_IORT_NODE_SMMU: return IS_BUILTIN(CONFIG_ARM_SMMU); + case ACPI_IORT_NODE_PARAVIRT: + return IS_BUILTIN(CONFIG_VIRTIO_IOMMU); default: pr_warn("IORT node type %u does not describe an SMMU\n", type); return false; @@ -1062,6 +1065,48 @@ static bool __init arm_smmu_is_coherent(struct acpi_iort_node *node) return smmu->flags & ACPI_IORT_SMMU_COHERENT_WALK; } +static int __init paravirt_count_resources(struct acpi_iort_node *node) +{ + struct acpi_iort_pviommu *pviommu; + + pviommu = (struct acpi_iort_pviommu *)node->node_data; + + /* Mem + IRQs */ + return 1 + pviommu->interrupt_count; +} + +static void __init paravirt_init_resources(struct resource *res, + struct acpi_iort_node *node) +{ + int i; + int num_res = 0; + int hw_irq, trigger; + struct acpi_iort_pviommu *pviommu; + + pviommu = (struct acpi_iort_pviommu *)node->node_data; + + res[num_res].start = pviommu->base_address; + res[num_res].end = pviommu->base_address + pviommu->span - 1; + res[num_res].flags = IORESOURCE_MEM; + num_res++; + + for (i = 0; i < pviommu->interrupt_count; i++) { + hw_irq = IORT_IRQ_MASK(pviommu->interrupts[i]); + trigger = 
IORT_IRQ_TRIGGER_MASK(pviommu->interrupts[i]); + + acpi_iort_register_irq(hw_irq, "pviommu", trigger, [num_res++]); + } +} + +static bool __init paravirt_is_coherent(struct acpi_iort_node *node) +{ + struct acpi_iort_pviommu *pviommu; + + pviommu = (struct acpi_iort_pviommu *)node->node_data; + + return pviommu->flags & ACPI_IORT_NODE_PV_CACHE_COHERENT; +} + struct iort_iommu_config { const char *name; int (*iommu_init)(struct acpi_iort_node *node); @@ -1088,6 +1133,13 @@ static const struct iort_iommu_config iort_arm_smmu_cfg __initconst = { .iommu_init_resources = arm_smmu_init_resources }; +static const struct iort_iommu_config iort_paravirt_cfg __initconst = { + .name = "pviommu", + .iommu_is_coherent = paravirt_is_coherent, + .iommu_count_resources = paravirt_count_resources, + .iommu_init_resources = paravirt_init_resources +}; + static __init const struct iort_iommu_config *iort_get_iommu_cfg(struct acpi_iort_node *node) { @@ -1096,18 +1148,22 @@ const struct iort_iommu_config
[RFC PATCH v2 2/5] iommu/virtio-iommu: Add probe request
When the device offers the probe feature, send a probe request for each device managed by the IOMMU. Extract RESV_MEM information. When we encounter a MSI doorbell region, set it up as a IOMMU_RESV_MSI region. This will tell other subsystems that there is no need to map the MSI doorbell in the virtio-iommu, because MSIs bypass it. Signed-off-by: Jean-Philippe Brucker--- drivers/iommu/virtio-iommu.c | 165 -- include/uapi/linux/virtio_iommu.h | 37 + 2 files changed, 195 insertions(+), 7 deletions(-) diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c index feb8c8925c3a..79e0add94e05 100644 --- a/drivers/iommu/virtio-iommu.c +++ b/drivers/iommu/virtio-iommu.c @@ -45,6 +45,7 @@ struct viommu_dev { struct iommu_domain_geometrygeometry; u64 pgsize_bitmap; u8 domain_bits; + u32 probe_size; }; struct viommu_mapping { @@ -72,6 +73,7 @@ struct viommu_domain { struct viommu_endpoint { struct viommu_dev *viommu; struct viommu_domain*vdomain; + struct list_headresv_regions; }; struct viommu_request { @@ -139,6 +141,10 @@ static int viommu_get_req_size(struct viommu_dev *viommu, case VIRTIO_IOMMU_T_UNMAP: size = sizeof(r->unmap); break; + case VIRTIO_IOMMU_T_PROBE: + *bottom += viommu->probe_size; + size = sizeof(r->probe) + *bottom; + break; default: return -EINVAL; } @@ -448,6 +454,106 @@ static int viommu_replay_mappings(struct viommu_domain *vdomain) return ret; } +static int viommu_add_resv_mem(struct viommu_endpoint *vdev, + struct virtio_iommu_probe_resv_mem *mem, + size_t len) +{ + struct iommu_resv_region *region = NULL; + unsigned long prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO; + + u64 addr = le64_to_cpu(mem->addr); + u64 size = le64_to_cpu(mem->size); + + if (len < sizeof(*mem)) + return -EINVAL; + + switch (mem->subtype) { + case VIRTIO_IOMMU_RESV_MEM_T_MSI: + region = iommu_alloc_resv_region(addr, size, prot, +IOMMU_RESV_MSI); + break; + case VIRTIO_IOMMU_RESV_MEM_T_RESERVED: + default: + region = iommu_alloc_resv_region(addr, size, 0, 
+IOMMU_RESV_RESERVED); + break; + } + + list_add(>resv_regions, >list); + + if (mem->subtype != VIRTIO_IOMMU_RESV_MEM_T_RESERVED && + mem->subtype != VIRTIO_IOMMU_RESV_MEM_T_MSI) { + /* Please update your driver. */ + pr_warn("unknown resv mem subtype 0x%x\n", mem->subtype); + return -EINVAL; + } + + return 0; +} + +static int viommu_probe_endpoint(struct viommu_dev *viommu, struct device *dev) +{ + int ret; + u16 type, len; + size_t cur = 0; + struct virtio_iommu_req_probe *probe; + struct virtio_iommu_probe_property *prop; + struct iommu_fwspec *fwspec = dev->iommu_fwspec; + struct viommu_endpoint *vdev = fwspec->iommu_priv; + + if (!fwspec->num_ids) + /* Trouble ahead. */ + return -EINVAL; + + probe = kzalloc(sizeof(*probe) + viommu->probe_size + + sizeof(struct virtio_iommu_req_tail), GFP_KERNEL); + if (!probe) + return -ENOMEM; + + probe->head.type = VIRTIO_IOMMU_T_PROBE; + /* +* For now, assume that properties of an endpoint that outputs multiple +* IDs are consistent. Only probe the first one. +*/ + probe->endpoint = cpu_to_le32(fwspec->ids[0]); + + ret = viommu_send_req_sync(viommu, probe); + if (ret) { + kfree(probe); + return ret; + } + + prop = (void *)probe->properties; + type = le16_to_cpu(prop->type) & VIRTIO_IOMMU_PROBE_T_MASK; + + while (type != VIRTIO_IOMMU_PROBE_T_NONE && + cur < viommu->probe_size) { + len = le16_to_cpu(prop->length); + + switch (type) { + case VIRTIO_IOMMU_PROBE_T_RESV_MEM: + ret = viommu_add_resv_mem(vdev, (void *)prop->value, len); + break; + default: + dev_dbg(dev, "unknown viommu prop 0x%x\n", type); + } + + if (ret) + dev_err(dev, "failed to parse viommu prop 0x%x\n", type); + + cur += sizeof(*prop) + len; + if (cur >= viommu->probe_size) + break; + + prop = (void
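The probe buffer walked by viommu_probe_endpoint() above is a type-length-value (TLV) sequence: a 16-bit type and 16-bit length, then `length` bytes of value, repeated until a NONE type or the end of the buffer. A minimal userspace sketch of that walk (host-endian for brevity, where the driver uses le16_to_cpu(); the struct here is a simplified stand-in for virtio_iommu_probe_property):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VIRTIO_IOMMU_PROBE_T_NONE 0

/* Simplified property header: 16-bit type, 16-bit value length. */
struct probe_prop {
    uint16_t type;
    uint16_t length;
};

/* Count properties in a probe buffer, mirroring the loop in
 * viommu_probe_endpoint(): stop on a NONE type or when the next
 * header would run past the end of the buffer. */
static int count_props(const uint8_t *buf, size_t size)
{
    size_t cur = 0;
    int n = 0;

    while (cur + sizeof(struct probe_prop) <= size) {
        struct probe_prop p;

        memcpy(&p, buf + cur, sizeof(p));
        if (p.type == VIRTIO_IOMMU_PROBE_T_NONE)
            break;
        n++;
        cur += sizeof(p) + p.length;
    }
    return n;
}
```

The bound check before each header read is the important part: `length` comes from the device, so the parser must never trust it to stay inside the probe buffer.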
[RFC PATCH v2 3/5] iommu/virtio-iommu: Add event queue
The event queue offers a way for the device to report access faults from devices. It is implemented on virtqueue #1, whenever the host needs to signal a fault it fills one of the buffers offered by the guest and interrupts it. Signed-off-by: Jean-Philippe Brucker--- drivers/iommu/virtio-iommu.c | 138 ++ include/uapi/linux/virtio_iommu.h | 18 + 2 files changed, 142 insertions(+), 14 deletions(-) diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c index 79e0add94e05..fe0d449bf489 100644 --- a/drivers/iommu/virtio-iommu.c +++ b/drivers/iommu/virtio-iommu.c @@ -30,6 +30,12 @@ #define MSI_IOVA_BASE 0x800 #define MSI_IOVA_LENGTH0x10 +enum viommu_vq_idx { + VIOMMU_REQUEST_VQ = 0, + VIOMMU_EVENT_VQ = 1, + VIOMMU_NUM_VQS = 2, +}; + struct viommu_dev { struct iommu_device iommu; struct device *dev; @@ -37,7 +43,7 @@ struct viommu_dev { struct ida domain_ids; - struct virtqueue*vq; + struct virtqueue*vqs[VIOMMU_NUM_VQS]; /* Serialize anything touching the request queue */ spinlock_t request_lock; @@ -84,6 +90,15 @@ struct viommu_request { struct list_headlist; }; +#define VIOMMU_FAULT_RESV_MASK 0xff00 + +struct viommu_event { + union { + u32 head; + struct virtio_iommu_fault fault; + }; +}; + #define to_viommu_domain(domain) container_of(domain, struct viommu_domain, domain) /* Virtio transport */ @@ -160,12 +175,13 @@ static int viommu_receive_resp(struct viommu_dev *viommu, int nr_sent, unsigned int len; int nr_received = 0; struct viommu_request *req, *pending; + struct virtqueue *vq = viommu->vqs[VIOMMU_REQUEST_VQ]; pending = list_first_entry_or_null(sent, struct viommu_request, list); if (WARN_ON(!pending)) return 0; - while ((req = virtqueue_get_buf(viommu->vq, )) != NULL) { + while ((req = virtqueue_get_buf(vq, )) != NULL) { if (req != pending) { dev_warn(viommu->dev, "discarding stale request\n"); continue; @@ -202,6 +218,7 @@ static int _viommu_send_reqs_sync(struct viommu_dev *viommu, * dies. 
*/ unsigned long timeout_ms = 1000; + struct virtqueue *vq = viommu->vqs[VIOMMU_REQUEST_VQ]; *nr_sent = 0; @@ -211,15 +228,14 @@ static int _viommu_send_reqs_sync(struct viommu_dev *viommu, sg[0] = >top; sg[1] = >bottom; - ret = virtqueue_add_sgs(viommu->vq, sg, 1, 1, req, - GFP_ATOMIC); + ret = virtqueue_add_sgs(vq, sg, 1, 1, req, GFP_ATOMIC); if (ret) break; list_add_tail(>list, ); } - if (i && !virtqueue_kick(viommu->vq)) + if (i && !virtqueue_kick(vq)) return -EPIPE; timeout = ktime_add_ms(ktime_get(), timeout_ms * i); @@ -554,6 +570,70 @@ static int viommu_probe_endpoint(struct viommu_dev *viommu, struct device *dev) return 0; } +static int viommu_fault_handler(struct viommu_dev *viommu, + struct virtio_iommu_fault *fault) +{ + char *reason_str; + + u8 reason = fault->reason; + u32 flags = le32_to_cpu(fault->flags); + u32 endpoint= le32_to_cpu(fault->endpoint); + u64 address = le64_to_cpu(fault->address); + + switch (reason) { + case VIRTIO_IOMMU_FAULT_R_DOMAIN: + reason_str = "domain"; + break; + case VIRTIO_IOMMU_FAULT_R_MAPPING: + reason_str = "page"; + break; + case VIRTIO_IOMMU_FAULT_R_UNKNOWN: + default: + reason_str = "unknown"; + break; + } + + /* TODO: find EP by ID and report_iommu_fault */ + if (flags & VIRTIO_IOMMU_FAULT_F_ADDRESS) + dev_err_ratelimited(viommu->dev, "%s fault from EP %u at %#llx [%s%s%s]\n", + reason_str, endpoint, address, + flags & VIRTIO_IOMMU_FAULT_F_READ ? "R" : "", + flags & VIRTIO_IOMMU_FAULT_F_WRITE ? "W" : "", + flags & VIRTIO_IOMMU_FAULT_F_EXEC ? "X" : ""); + else + dev_err_ratelimited(viommu->dev, "%s fault from EP %u\n", + reason_str, endpoint); + + return 0; +} + +static void viommu_event_handler(struct virtqueue *vq) +{ + int ret; + unsigned int len; + struct scatterlist sg[1]; +
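The fault records delivered on the event queue carry a flags word that viommu_fault_handler() decodes into an "[R][W][X]" access string. A small standalone sketch of that decoding; the exact bit values are an assumption of this sketch, taken from the v0.5 interface rather than from a published header:

```c
#include <assert.h>
#include <stdint.h>

/* Fault flag bits as assumed from the v0.5 virtio-iommu interface. */
#define VIRTIO_IOMMU_FAULT_F_READ    (1u << 0)
#define VIRTIO_IOMMU_FAULT_F_WRITE   (1u << 1)
#define VIRTIO_IOMMU_FAULT_F_EXEC    (1u << 2)
#define VIRTIO_IOMMU_FAULT_F_ADDRESS (1u << 8)

/* Rebuild the access string printed by viommu_fault_handler();
 * out must hold at least 4 bytes ("RWX" plus the terminator). */
static void fault_access_str(uint32_t flags, char out[4])
{
    int i = 0;

    if (flags & VIRTIO_IOMMU_FAULT_F_READ)
        out[i++] = 'R';
    if (flags & VIRTIO_IOMMU_FAULT_F_WRITE)
        out[i++] = 'W';
    if (flags & VIRTIO_IOMMU_FAULT_F_EXEC)
        out[i++] = 'X';
    out[i] = '\0';
}
```

The F_ADDRESS bit gates whether the record's address field is meaningful, which is why the handler prints two different message forms.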
[RFC PATCH v2 1/5] iommu: Add virtio-iommu driver
The virtio IOMMU is a para-virtualized device, allowing the guest to send IOMMU requests such as map/unmap over the virtio-mmio transport without emulating page tables. This implementation handles ATTACH, DETACH, MAP and UNMAP requests. The bulk of the code is to create requests and send them through virtio. Implementing the IOMMU API is fairly straightforward since the virtio-iommu MAP/UNMAP interface is almost identical. Signed-off-by: Jean-Philippe Brucker --- drivers/iommu/Kconfig | 11 + drivers/iommu/Makefile| 1 + drivers/iommu/virtio-iommu.c | 958 ++ include/uapi/linux/virtio_ids.h | 1 + include/uapi/linux/virtio_iommu.h | 140 ++ 5 files changed, insertions(+) create mode 100644 drivers/iommu/virtio-iommu.c create mode 100644 include/uapi/linux/virtio_iommu.h diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index 17b212f56e6a..7271e59e8b23 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -403,4 +403,15 @@ config QCOM_IOMMU help Support for IOMMU on certain Qualcomm SoCs. +config VIRTIO_IOMMU + bool "Virtio IOMMU driver" + depends on VIRTIO_MMIO + select IOMMU_API + select INTERVAL_TREE + select ARM_DMA_USE_IOMMU if ARM + help + Para-virtualised IOMMU driver with virtio. + + Say Y here if you intend to run this kernel as a guest. 
+ endif # IOMMU_SUPPORT diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile index dca71fe1c885..432242f3a328 100644 --- a/drivers/iommu/Makefile +++ b/drivers/iommu/Makefile @@ -31,3 +31,4 @@ obj-$(CONFIG_EXYNOS_IOMMU) += exynos-iommu.o obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o obj-$(CONFIG_S390_IOMMU) += s390-iommu.o obj-$(CONFIG_QCOM_IOMMU) += qcom_iommu.o +obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c new file mode 100644 index ..feb8c8925c3a --- /dev/null +++ b/drivers/iommu/virtio-iommu.c @@ -0,0 +1,958 @@ +/* + * Virtio driver for the paravirtualized IOMMU + * + * Copyright (C) 2017 ARM Limited + * Author: Jean-Philippe Brucker + * + * SPDX-License-Identifier: GPL-2.0 + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +#define MSI_IOVA_BASE 0x800 +#define MSI_IOVA_LENGTH0x10 + +struct viommu_dev { + struct iommu_device iommu; + struct device *dev; + struct virtio_device*vdev; + + struct ida domain_ids; + + struct virtqueue*vq; + /* Serialize anything touching the request queue */ + spinlock_t request_lock; + + /* Device configuration */ + struct iommu_domain_geometrygeometry; + u64 pgsize_bitmap; + u8 domain_bits; +}; + +struct viommu_mapping { + phys_addr_t paddr; + struct interval_tree_node iova; + union { + struct virtio_iommu_req_map map; + struct virtio_iommu_req_unmap unmap; + } req; +}; + +struct viommu_domain { + struct iommu_domain domain; + struct viommu_dev *viommu; + struct mutexmutex; + unsigned intid; + + spinlock_t mappings_lock; + struct rb_root_cached mappings; + + /* Number of endpoints attached to this domain */ + refcount_t endpoints; +}; + +struct viommu_endpoint { + struct viommu_dev *viommu; + struct viommu_domain*vdomain; +}; + +struct viommu_request { + struct scatterlist top; + 
struct scatterlist bottom; + + int written; + struct list_headlist; +}; + +#define to_viommu_domain(domain) container_of(domain, struct viommu_domain, domain) + +/* Virtio transport */ + +static int viommu_status_to_errno(u8 status) +{ + switch (status) { + case VIRTIO_IOMMU_S_OK: + return 0; + case VIRTIO_IOMMU_S_UNSUPP: + return -ENOSYS; + case VIRTIO_IOMMU_S_INVAL: + return -EINVAL; + case VIRTIO_IOMMU_S_RANGE: + return -ERANGE; + case VIRTIO_IOMMU_S_NOENT: + return -ENOENT; + case VIRTIO_IOMMU_S_FAULT: + return -EFAULT; + case VIRTIO_IOMMU_S_IOERR: + case VIRTIO_IOMMU_S_DEVERR: + default: + return -EIO; + } +} + +/* + * viommu_get_req_size -
[RFC PATCH v2 0/5] Add virtio-iommu driver
Implement the virtio-iommu driver following version 0.5 of the specification [1]. The previous version of this code was sent back in April [2], implementing the first public RFC. Since then there has been lots of progress and discussion on the specification side, and I think the driver is in good shape now. The reason patches 1-3 are only RFC is that I'm waiting on feedback from the Virtio TC to reserve a device ID. List of changes since the previous RFC: * Add per-endpoint probe request, for hardware MSI and reserved regions. * Add a virtqueue for the device to report translation faults. Only non-recoverable ones at the moment. * Removed the iommu_map_sg specialization for now, because none of the device drivers I use for testing (virtio, ixgbe and internal DMA engines) seem to use map_sg. This kind of feature is a lot more interesting when accompanied by benchmark numbers, and can be added back during future optimization work. * Many fixes and cleanups The driver works out of the box on DT-based systems, but ACPI support still needs to be tested and discussed. In the specification I proposed IORT tables as a nice candidate for describing the virtual topology. Patches 4 and 5 propose small changes to the IORT driver for instantiating a paravirtualized IOMMU. The IORT node is described in the specification [1]. x86 support will also require some hacks since the driver is based on the IOMMU DMA ops, which x86 doesn't use. Eric's latest QEMU device [3] works with v0.4. For the moment you can use the kvmtool device [4] to test v0.5 on arm64, and inject arbitrary faults with the debug tool. The driver can also be pulled from my Linux tree [5]. 
[1] https://www.spinics.net/lists/kvm/msg157402.html [2] https://patchwork.kernel.org/patch/9670273/ [3] https://lists.gnu.org/archive/html/qemu-arm/2017-09/msg00413.html [4] git://linux-arm.org/kvmtool-jpb.git virtio-iommu/base [5] git://linux-arm.org/linux-jpb.git virtio-iommu/v0.5-dev Jean-Philippe Brucker (5): iommu: Add virtio-iommu driver iommu/virtio-iommu: Add probe request iommu/virtio-iommu: Add event queue ACPI/IORT: Support paravirtualized IOMMU ACPI/IORT: Move IORT to the ACPI folder drivers/acpi/Kconfig |3 + drivers/acpi/Makefile |1 + drivers/acpi/arm64/Kconfig|3 - drivers/acpi/arm64/Makefile |1 - drivers/acpi/{arm64 => }/iort.c | 95 ++- drivers/iommu/Kconfig | 12 + drivers/iommu/Makefile|1 + drivers/iommu/virtio-iommu.c | 1219 + include/acpi/actbl2.h | 18 +- include/uapi/linux/virtio_ids.h |1 + include/uapi/linux/virtio_iommu.h | 195 ++ 11 files changed, 1537 insertions(+), 12 deletions(-) rename drivers/acpi/{arm64 => }/iort.c (92%) create mode 100644 drivers/iommu/virtio-iommu.c create mode 100644 include/uapi/linux/virtio_iommu.h -- 2.14.3
Re: [PATCH] iommu/vt-d: Fix scatterlist offset handling
On Fri, 17 Nov 2017 17:44:57 +
Casey Leedom wrote:
> | From: Raj, Ashok
> | Sent: Friday, November 17, 2017 7:48 AM
> |
> | Reported by: Harsh
> | Reviewed by: Ashok Raj
> | Tested by: Jacob Pan
>
> Thanks everyone! I've updated our internal bug on this issue
> and noted that we need to track down the remaining problems
> which may be in our own code.

All sounds good to me; let me know if you need further assistance on the VT-d driver.

Jacob

> Casey
Re: [PATCH] iommu/vt-d: Fix scatterlist offset handling
| From: Raj, Ashok
| Sent: Friday, November 17, 2017 7:48 AM
|
| Reported by: Harsh
| Reviewed by: Ashok Raj
| Tested by: Jacob Pan

Thanks everyone! I've updated our internal bug on this issue and noted that we need to track down the remaining problems which may be in our own code.

Casey
Re: [PATCH] iommu/vt-d: Fix scatterlist offset handling
Hi Alex,

On Fri, Nov 17, 2017 at 09:18:14AM -0700, Alex Williamson wrote:
> On Thu, 16 Nov 2017 13:09:33 -0800
> "Raj, Ashok" wrote:
>
> > > What do we do about this? I certainly can't rip out large page support
> > > and put a stable tag on the patch. I'm not really spotting what's
> > > wrong with large page support here, other than the comment about it
> > > being a mess. Suggestions? Thanks,
> >
> > Largepage seems to work and i don't think we need to rip it out. When
> > Harsh tested it at one point we thought disabling super-page seemed to
> > make the problem go away. Jacob tested and we still saw the need for
> > Robin's patch.
> >
> > Yes, the function looks humongous but i don't think we should wait for
> > that before this merge.
>
> Ok. Who wants to toss in review and testing sign-offs? Clearly
> there's been a lot more eyes and effort on this patch than reflected in
> the original posting. I'll add a stable cc. Thanks,

Reported by: Harsh
Reviewed by: Ashok Raj
Tested by: Jacob Pan

> Alex
Re: [PATCH] iommu/vt-d: Fix scatterlist offset handling
On Thu, 16 Nov 2017 13:09:33 -0800
"Raj, Ashok" wrote:

> Hi Alex
>
> On Thu, Nov 16, 2017 at 02:32:44PM -0700, Alex Williamson wrote:
> > On Wed, 15 Nov 2017 15:54:56 -0800
> > Jacob Pan wrote:
> >
> > > Hi Alex and all,
> > >
> > > Just wondering if you could merge Robin's patch for the next rc. From
> > > all our testing, this seems to be a solid fix and should be included in
> > > the stable releases as well.
> >
> > Hi Jacob,
> >
> > Sorry, this wasn't on my radar, I only scanned for patches back through
> > about when Joerg refreshed his next branch (others on the list speak up
> > if I didn't pickup your patches for the v4.15 merge window).
> >
> > This patch makes sense to me and I'm glad you were able to work through
> > the anomaly Harsh saw in testing as an unrelated issue, but...
> >
> > What do we do about this? I certainly can't rip out large page support
> > and put a stable tag on the patch. I'm not really spotting what's
> > wrong with large page support here, other than the comment about it
> > being a mess. Suggestions? Thanks,
>
> Largepage seems to work and i don't think we need to rip it out. When
> Harsh tested it at one point we thought disabling super-page seemed to make
> the problem go away. Jacob tested and we still saw the need for Robin's
> patch.
>
> Yes, the function looks humongous but i don't think we should wait for that
> before this merge.

Ok. Who wants to toss in review and testing sign-offs? Clearly there's been a lot more eyes and effort on this patch than reflected in the original posting. I'll add a stable cc. Thanks,

Alex
Re: [RFCv2 PATCH 31/36] iommu/arm-smmu-v3: Add support for PCI ATS
On 17/11/17 06:11, Bharat Kumar Gogada wrote:
[...]
> Thanks Jean, I see that currently vfio_group_fops_open does not allow
> multiple instances.
> If a device supports multiple PASIDs there might be different applications
> running in parallel.
> So why are multiple instances restricted?

You can't have multiple processes owning the same PCI device; it's unmanageable.

For using multiple PASIDs, my idea was that the userspace driver ("the server"), which owns the device, would have a way to partition it into smaller frames. It forks to create "clients" and assigns a PASID to each of them (by issuing VFIO_BIND(client_pid) -> pasid, then writing the PASID into a privileged MMIO frame that defines the partition properties). Each client accesses an unprivileged MMIO frame to use a device partition (or sends commands to the server via IPC), and can perform DMA on its own virtual memory.

This is complete speculation of course; we have very little information on how PASID-capable devices will be designed, so I'm trying to imagine likely scenarios.

Thanks,
Jean