Re: [PATCH] iommu/vt-d: Force to flush iotlb before creating superpage

2021-04-08 Thread Lu Baolu

Hi Longpeng,

On 4/8/21 3:37 PM, Longpeng (Mike, Cloud Infrastructure Service Product 
Dept.) wrote:

Hi Baolu,


-Original Message-
From: Lu Baolu [mailto:baolu...@linux.intel.com]
Sent: Thursday, April 8, 2021 12:32 PM
To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
; iommu@lists.linux-foundation.org;
linux-ker...@vger.kernel.org
Cc: baolu...@linux.intel.com; David Woodhouse ; Nadav
Amit ; Alex Williamson ;
Kevin Tian ; Gonglei (Arei) ;
sta...@vger.kernel.org
Subject: Re: [PATCH] iommu/vt-d: Force to flush iotlb before creating superpage

Hi Longpeng,

On 4/7/21 2:35 PM, Longpeng (Mike, Cloud Infrastructure Service Product
Dept.) wrote:

Hi Baolu,


-Original Message-
From: Lu Baolu [mailto:baolu...@linux.intel.com]
Sent: Friday, April 2, 2021 12:44 PM
To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
; iommu@lists.linux-foundation.org;
linux-ker...@vger.kernel.org
Cc: baolu...@linux.intel.com; David Woodhouse ;
Nadav Amit ; Alex Williamson
; Kevin Tian ;
Gonglei (Arei) ; sta...@vger.kernel.org
Subject: Re: [PATCH] iommu/vt-d: Force to flush iotlb before creating
superpage

Hi Longpeng,

On 4/1/21 3:18 PM, Longpeng(Mike) wrote:

diff --git a/drivers/iommu/intel/iommu.c
b/drivers/iommu/intel/iommu.c index ee09323..cbcb434 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2342,9 +2342,20 @@ static inline int hardware_largepage_caps(struct dmar_domain *domain,

 * removed to make room for superpage(s).
 * We're adding new large pages, so make sure
 * we don't remove their parent tables.
+*
+* We also need to flush the iotlb before creating
+* superpage to ensure it does not preserve any
+* obsolete info.
 */
-   dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
-  largepage_lvl + 1);
+   if (dma_pte_present(pte)) {


The dma_pte_free_pagetable() clears a batch of PTEs. So checking
current PTE is insufficient. How about removing this check and always
performing cache invalidation?



Um... the PTE here may be present (e.g. 4K mapping --> superpage mapping) or
NOT-present (e.g. creating a totally new superpage mapping), but we only need
to call free_pagetable and flush_iotlb in the former case, right?

But this code covers multiple PTEs and perhaps crosses the page boundary.

How about moving this code into a separate function and checking PTE presence
there? Sample code could look like below: [compiled but not tested!]

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index
d334f5b4e382..0e04d450c38a 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2300,6 +2300,41 @@ static inline int hardware_largepage_caps(struct
dmar_domain *domain,
  return level;
   }

+/*
+ * Ensure that old small page tables are removed to make room for superpage(s).
+ * We're going to add new large pages, so make sure we don't remove their
+ * parent tables. The IOTLB/devTLBs should be flushed if any PDE/PTEs are
+ * cleared.
+ */
+static void switch_to_super_page(struct dmar_domain *domain,
+                                 unsigned long start_pfn,
+                                 unsigned long end_pfn, int level)
+{


Maybe "swith_to" will lead people to think "remove old and then setup new", so how about something 
like "remove_room_for_super_page" or "prepare_for_super_page" ?


I named it like this because we also want to have an opposite operation,
split_from_super_page(), which switches a PDE or PDPE from a superpage
setup to small pages; that is needed to optimize dirty bit tracking
during VM live migration.




+   unsigned long lvl_pages = lvl_to_nr_pages(level);
+   struct dma_pte *pte = NULL;
+   int i;
+
+   while (start_pfn <= end_pfn) {


start_pfn < end_pfn ?


end_pfn is inclusive.




+   if (!pte)
+   pte = pfn_to_dma_pte(domain, start_pfn, &level);
+
+   if (dma_pte_present(pte)) {
+   dma_pte_free_pagetable(domain, start_pfn,
+  start_pfn + lvl_pages - 1,
+  level + 1);
+
+   for_each_domain_iommu(i, domain)
+   iommu_flush_iotlb_psi(g_iommus[i], domain,
+ start_pfn, lvl_pages,
+ 0, 0);
+   }
+
+   pte++;
+   start_pfn += lvl_pages;
+   if (first_pte_in_page(pte))
+   pte = NULL;
+   }
+}
+
   static int
   __domain_mapping(struct dmar_domain 

Re: [PATCH v14 06/13] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs

2021-04-08 Thread Kunkun Jiang

Hi Eric,

On 2021/4/8 20:30, Auger Eric wrote:

Hi Kunkun,

On 4/1/21 2:37 PM, Kunkun Jiang wrote:

Hi Eric,

On 2021/2/24 4:56, Eric Auger wrote:

With nested stage support, we will soon need to invalidate
S1 contexts and ranges tagged with an unmanaged ASID, the
latter being managed by the guest. So let's introduce two helpers
that allow invalidation with externally managed ASIDs.

Signed-off-by: Eric Auger 

---

v13 -> v14
- Actually send the NH_ASID command (reported by Xingang Wang)
---
   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 38 -
   1 file changed, 29 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 5579ec4fccc8..4c19a1114de4 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1843,9 +1843,9 @@ int arm_smmu_atc_inv_domain(struct
arm_smmu_domain *smmu_domain, int ssid,
   }
     /* IO_PGTABLE API */
-static void arm_smmu_tlb_inv_context(void *cookie)
+static void __arm_smmu_tlb_inv_context(struct arm_smmu_domain
*smmu_domain,
+   int ext_asid)
   {
-    struct arm_smmu_domain *smmu_domain = cookie;
   struct arm_smmu_device *smmu = smmu_domain->smmu;
   struct arm_smmu_cmdq_ent cmd;
   @@ -1856,7 +1856,13 @@ static void arm_smmu_tlb_inv_context(void
*cookie)
    * insertion to guarantee those are observed before the TLBI. Do be
    * careful, 007.
    */
-    if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
+    if (ext_asid >= 0) { /* guest stage 1 invalidation */
+    cmd.opcode    = CMDQ_OP_TLBI_NH_ASID;
+    cmd.tlbi.asid    = ext_asid;
+    cmd.tlbi.vmid    = smmu_domain->s2_cfg.vmid;
+    arm_smmu_cmdq_issue_cmd(smmu, &cmd);
+    arm_smmu_cmdq_issue_sync(smmu);
+    } else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
   arm_smmu_tlb_inv_asid(smmu, smmu_domain->s1_cfg.cd.asid);
   } else {
   cmd.opcode    = CMDQ_OP_TLBI_S12_VMALL;
@@ -1867,6 +1873,13 @@ static void arm_smmu_tlb_inv_context(void *cookie)
   arm_smmu_atc_inv_domain(smmu_domain, 0, 0, 0);
   }
   +static void arm_smmu_tlb_inv_context(void *cookie)
+{
+    struct arm_smmu_domain *smmu_domain = cookie;
+
+    __arm_smmu_tlb_inv_context(smmu_domain, -1);
+}
+
   static void __arm_smmu_tlb_inv_range(struct arm_smmu_cmdq_ent *cmd,
    unsigned long iova, size_t size,
    size_t granule,
@@ -1926,9 +1939,10 @@ static void __arm_smmu_tlb_inv_range(struct
arm_smmu_cmdq_ent *cmd,
   arm_smmu_cmdq_batch_submit(smmu, &cmds);
   }
   

Here is the part of code in __arm_smmu_tlb_inv_range():

     if (smmu->features & ARM_SMMU_FEAT_RANGE_INV) {
     /* Get the leaf page size */
     tg = __ffs(smmu_domain->domain.pgsize_bitmap);

     /* Convert page size of 12,14,16 (log2) to 1,2,3 */
     cmd->tlbi.tg = (tg - 10) / 2;

     /* Determine what level the granule is at */
     cmd->tlbi.ttl = 4 - ((ilog2(granule) - 3) / (tg - 3));

     num_pages = size >> tg;
     }

When pSMMU supports RIL, we get the leaf page size by
__ffs(smmu_domain->domain.pgsize_bitmap). In nested mode, it is determined
by the host PAGE_SIZE. If the host kernel and guest kernel have different
translation granules (e.g. host 16K, guest 4K), __arm_smmu_tlb_inv_range()
will issue an incorrect TLBI command.

Do you have any idea about this issue?

I think this is the same issue as the one reported by Chenxiang

https://lore.kernel.org/lkml/15938ed5-2095-e903-a290-333c29901...@hisilicon.com/

In case RIL is not supported by the host, the next version will use the
smallest page size supported by the pSMMU, as done in __arm_smmu_tlb_inv_range().

Thanks

Eric

I think they are different. In normal cases, when we want to invalidate the
stage 1 cache, we should use the granule size supported by the vSMMU to build
and issue a TLBI command if the pSMMU supports RIL.

But the current __arm_smmu_tlb_inv_range() always uses the granule size
supported by the host:
(tg = __ffs(smmu_domain->domain.pgsize_bitmap);)

Let me explain more clearly.
Preconditions of this issue:
1. pSMMU supports RIL
2. host and guest use different translation granules (e.g. host 16K, guest 4K)

The guest wants to invalidate 4K, so info->granule_size = 4K.
In __arm_smmu_tlb_inv_range(), if the pSMMU supports RIL and the host uses 16K,
tg = 14, tlbi.tg = 2, tlbi.ttl = 4, tlbi.scale = 0, tlbi.num = -1. That is an
incorrect TLBI command.

So it would be better to pass the leaf page size supported by the vSMMU to the
host. Perhaps this issue and the one reported by Chenxiang can be solved
together.
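
To make the mismatch concrete, below is a small standalone sketch of the field
computation for the host-16K / guest-4K case. It is user-space C for
illustration only: the formulas are copied from the quoted
__arm_smmu_tlb_inv_range() hunk, while the page-size bitmap value, main() and
the printing are my own assumptions, not kernel code.

#include <stdio.h>

int main(void)
{
	/* assumed host pgsize_bitmap for a 16K granule: 16K | 32M (64-bit build) */
	unsigned long host_pgsize_bitmap = (1UL << 14) | (1UL << 25);
	unsigned long granule = 4096;	/* guest invalidates one 4K page */
	unsigned long size = 4096;

	unsigned long tg = __builtin_ctzl(host_pgsize_bitmap);	     /* __ffs(): 14 */
	unsigned long cmd_tg = (tg - 10) / 2;			     /* 2 -> encodes 16K, not 4K */
	unsigned long ilog2_granule = 63 - __builtin_clzl(granule);  /* 12 */
	unsigned long ttl = 4 - ((ilog2_granule - 3) / (tg - 3));    /* 4 -> invalid level */
	unsigned long num_pages = size >> tg;			     /* 0 -> bogus scale/num */

	printf("tg=%lu cmd.tlbi.tg=%lu ttl=%lu num_pages=%lu\n",
	       tg, cmd_tg, ttl, num_pages);
	return 0;
}

Running it reproduces the values quoted above (tg=14, tlbi.tg=2, ttl=4,
num_pages=0), i.e. a TLBI that describes a 16K leaf page instead of the 4K
page the guest asked to invalidate.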

Thanks,
Kunkun Jiang

Best Regards,
Kunkun Jiang

-static void arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t
size,
-  size_t granule, bool leaf,
-  struct arm_smmu_domain *smmu_domain)
+static void

[RFC PATCH v3 7/8] vfio/type1: Add selective DMA faulting support

2021-04-08 Thread Shenming Lu
Some devices only allow selective DMA faulting. Similar to the selective
dirty page tracking, the vendor driver can call vfio_pin_pages() to
indicate the non-faultable scope; we add a new struct vfio_range to
record it. Then, when the IOPF handler receives any page request out
of that scope, we can directly return with an invalid response.

Suggested-by: Kevin Tian 
Signed-off-by: Shenming Lu 
---
 drivers/vfio/vfio.c |   4 +-
 drivers/vfio/vfio_iommu_type1.c | 357 +++-
 include/linux/vfio.h|   1 +
 3 files changed, 358 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 38779e6fd80c..44c8dfabf7de 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -2013,7 +2013,8 @@ int vfio_unpin_pages(struct device *dev, unsigned long 
*user_pfn, int npage)
container = group->container;
driver = container->iommu_driver;
if (likely(driver && driver->ops->unpin_pages))
-   ret = driver->ops->unpin_pages(container->iommu_data, user_pfn,
+   ret = driver->ops->unpin_pages(container->iommu_data,
+  group->iommu_group, user_pfn,
   npage);
else
ret = -ENOTTY;
@@ -2112,6 +2113,7 @@ int vfio_group_unpin_pages(struct vfio_group *group,
driver = container->iommu_driver;
if (likely(driver && driver->ops->unpin_pages))
ret = driver->ops->unpin_pages(container->iommu_data,
+  group->iommu_group,
   user_iova_pfn, npage);
else
ret = -ENOTTY;
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index dcc93c3b258c..ba2b5a1cf6e9 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -150,10 +150,19 @@ struct vfio_regions {
 static struct rb_root iopf_group_list = RB_ROOT;
 static DEFINE_MUTEX(iopf_group_list_lock);
 
+struct vfio_range {
+   struct rb_node  node;
+   dma_addr_t  base_iova;
+   size_t  span;
+   unsigned intref_count;
+};
+
 struct vfio_iopf_group {
struct rb_node  node;
struct iommu_group  *iommu_group;
struct vfio_iommu   *iommu;
+   struct rb_root  pinned_range_list;
+   boolselective_faulting;
 };
 
 #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)\
@@ -496,6 +505,255 @@ static void vfio_unlink_iopf_group(struct vfio_iopf_group 
*old)
	mutex_unlock(&iopf_group_list_lock);
 }
 
+/*
+ * Helper functions for range list, handle one page at a time.
+ */
+static struct vfio_range *vfio_find_range(struct rb_root *range_list,
+ dma_addr_t iova)
+{
+   struct rb_node *node = range_list->rb_node;
+   struct vfio_range *range;
+
+   while (node) {
+   range = rb_entry(node, struct vfio_range, node);
+
+   if (iova + PAGE_SIZE <= range->base_iova)
+   node = node->rb_left;
+   else if (iova >= range->base_iova + range->span)
+   node = node->rb_right;
+   else
+   return range;
+   }
+
+   return NULL;
+}
+
+/* Do the possible merge adjacent to the input range. */
+static void vfio_merge_range_list(struct rb_root *range_list,
+ struct vfio_range *range)
+{
+   struct rb_node *node_prev = rb_prev(&range->node);
+   struct rb_node *node_next = rb_next(&range->node);
+
+   if (node_next) {
+   struct vfio_range *range_next = rb_entry(node_next,
+struct vfio_range,
+node);
+
+   if (range_next->base_iova == (range->base_iova + range->span) &&
+   range_next->ref_count == range->ref_count) {
+   rb_erase(node_next, range_list);
+   range->span += range_next->span;
+   kfree(range_next);
+   }
+   }
+
+   if (node_prev) {
+   struct vfio_range *range_prev = rb_entry(node_prev,
+struct vfio_range,
+node);
+
+   if (range->base_iova == (range_prev->base_iova + range_prev->span) &&
+   range->ref_count == range_prev->ref_count) {
+   rb_erase(&range->node, range_list);
+   range_prev->span += range->span;
+   kfree(range);
+   }
+   }
+}
+
+static void vfio_link_range(struct rb_root *range_list, struct vfio_range *new)
+{
+   struct rb_node **link, *parent = NULL;
+   struct vfio_range 

[RFC PATCH v3 8/8] vfio: Add nested IOPF support

2021-04-08 Thread Shenming Lu
To set up nested mode, drivers such as vfio_pci need to register a
handler to receive stage/level 1 faults from the IOMMU. But since
currently each device can only have one iommu dev fault handler, and
stage 2 IOPF may already be enabled (VFIO_IOMMU_ENABLE_IOPF), we choose
to update the registered handler (a consolidated one) via flags
(set FAULT_REPORT_NESTED_L1), and further deliver the received stage 1
faults in the handler to the guest through a newly added
vfio_device_ops callback.

Signed-off-by: Shenming Lu 
---
 drivers/vfio/vfio.c | 81 +
 drivers/vfio/vfio_iommu_type1.c | 49 +++-
 include/linux/vfio.h| 12 +
 3 files changed, 141 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 44c8dfabf7de..4245f15914bf 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -2356,6 +2356,87 @@ struct iommu_domain *vfio_group_iommu_domain(struct 
vfio_group *group)
 }
 EXPORT_SYMBOL_GPL(vfio_group_iommu_domain);
 
+/*
+ * Register/Update the VFIO IOPF handler to receive
+ * nested stage/level 1 faults.
+ */
+int vfio_iommu_dev_fault_handler_register_nested(struct device *dev)
+{
+   struct vfio_container *container;
+   struct vfio_group *group;
+   struct vfio_iommu_driver *driver;
+   int ret;
+
+   if (!dev)
+   return -EINVAL;
+
+   group = vfio_group_get_from_dev(dev);
+   if (!group)
+   return -ENODEV;
+
+   ret = vfio_group_add_container_user(group);
+   if (ret)
+   goto out;
+
+   container = group->container;
+   driver = container->iommu_driver;
+   if (likely(driver && driver->ops->register_handler))
+   ret = driver->ops->register_handler(container->iommu_data, dev);
+   else
+   ret = -ENOTTY;
+
+   vfio_group_try_dissolve_container(group);
+
+out:
+   vfio_group_put(group);
+   return ret;
+}
+EXPORT_SYMBOL_GPL(vfio_iommu_dev_fault_handler_register_nested);
+
+int vfio_iommu_dev_fault_handler_unregister_nested(struct device *dev)
+{
+   struct vfio_container *container;
+   struct vfio_group *group;
+   struct vfio_iommu_driver *driver;
+   int ret;
+
+   if (!dev)
+   return -EINVAL;
+
+   group = vfio_group_get_from_dev(dev);
+   if (!group)
+   return -ENODEV;
+
+   ret = vfio_group_add_container_user(group);
+   if (ret)
+   goto out;
+
+   container = group->container;
+   driver = container->iommu_driver;
+   if (likely(driver && driver->ops->unregister_handler))
+   ret = driver->ops->unregister_handler(container->iommu_data, 
dev);
+   else
+   ret = -ENOTTY;
+
+   vfio_group_try_dissolve_container(group);
+
+out:
+   vfio_group_put(group);
+   return ret;
+}
+EXPORT_SYMBOL_GPL(vfio_iommu_dev_fault_handler_unregister_nested);
+
+int vfio_transfer_iommu_fault(struct device *dev, struct iommu_fault *fault)
+{
+   struct vfio_device *device = dev_get_drvdata(dev);
+
+   if (unlikely(!device->ops->transfer))
+   return -EOPNOTSUPP;
+
+   return device->ops->transfer(device->device_data, fault);
+}
+EXPORT_SYMBOL_GPL(vfio_transfer_iommu_fault);
+
 /**
  * Module/class support
  */
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index ba2b5a1cf6e9..9d1adeddb303 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -3821,13 +3821,32 @@ static int vfio_iommu_type1_dma_map_iopf(struct 
iommu_fault *fault, void *data)
struct vfio_batch batch;
struct vfio_range *range;
dma_addr_t iova = ALIGN_DOWN(fault->prm.addr, PAGE_SIZE);
-   int access_flags = 0;
+   int access_flags = 0, nested;
size_t premap_len, map_len, mapped_len = 0;
unsigned long bit_offset, vaddr, pfn, i, npages;
int ret;
enum iommu_page_response_code status = IOMMU_PAGE_RESP_INVALID;
struct iommu_page_response resp = {0};
 
+   if (vfio_dev_domian_nested(dev, &nested))
+   return -ENODEV;
+
+   /*
+* When configured in nested mode, further deliver the
+* stage/level 1 faults to the guest.
+*/
+   if (nested) {
+   bool l2;
+
+   if (fault->type == IOMMU_FAULT_PAGE_REQ)
+   l2 = fault->prm.flags & IOMMU_FAULT_PAGE_REQUEST_L2;
+   if (fault->type == IOMMU_FAULT_DMA_UNRECOV)
+   l2 = fault->event.flags & IOMMU_FAULT_UNRECOV_L2;
+
+   if (!l2)
+   return vfio_transfer_iommu_fault(dev, fault);
+   }
+
if (fault->type != IOMMU_FAULT_PAGE_REQ)
return -EOPNOTSUPP;
 
@@ -4201,6 +4220,32 @@ static void vfio_iommu_type1_notify(void *iommu_data,
	wake_up_all(&iommu->vaddr_wait);
 }
 
+static int vfio_iommu_type1_register_handler(void *iommu_data,
+

[RFC PATCH v3 6/8] vfio/type1: No need to statically pin and map if IOPF enabled

2021-04-08 Thread Shenming Lu
If IOPF is enabled for the VFIO container, there is no need to statically
pin and map the entire DMA range; we can do it on demand, and unmap
according to the IOPF mapped bitmap when removing the vfio_dma.

Note that we still mark all pages dirty even if IOPF is enabled; we may
add IOPF-based fine-grained dirty tracking support in the future.

Signed-off-by: Shenming Lu 
---
 drivers/vfio/vfio_iommu_type1.c | 38 +++--
 1 file changed, 32 insertions(+), 6 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 7df5711e743a..dcc93c3b258c 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -175,6 +175,7 @@ struct vfio_iopf_group {
 #define IOPF_MAPPED_BITMAP_GET(dma, i) \
   ((dma->iopf_mapped_bitmap[(i) / BITS_PER_LONG] \
    >> ((i) % BITS_PER_LONG)) & 0x1)
+#define IOPF_MAPPED_BITMAP_BYTES(n)DIRTY_BITMAP_BYTES(n)
 
 #define WAITED 1
 
@@ -959,7 +960,8 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
 * already pinned and accounted. Accouting should be done if there is no
 * iommu capable domain in the container.
 */
-   do_accounting = !IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu);
+   do_accounting = !IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu) ||
+   iommu->iopf_enabled;
 
for (i = 0; i < npage; i++) {
struct vfio_pfn *vpfn;
@@ -1048,7 +1050,8 @@ static int vfio_iommu_type1_unpin_pages(void *iommu_data,
 
mutex_lock(>lock);
 
-   do_accounting = !IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu);
+   do_accounting = !IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu) ||
+   iommu->iopf_enabled;
for (i = 0; i < npage; i++) {
struct vfio_dma *dma;
dma_addr_t iova;
@@ -1169,7 +1172,7 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, 
struct vfio_dma *dma,
if (!dma->size)
return 0;
 
-   if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu))
+   if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu) || iommu->iopf_enabled)
return 0;
 
/*
@@ -1306,11 +1309,20 @@ static void vfio_unmap_partial_iopf(struct vfio_iommu 
*iommu,
}
 }
 
+static void vfio_dma_clean_iopf(struct vfio_iommu *iommu, struct vfio_dma *dma)
+{
+   vfio_unmap_partial_iopf(iommu, dma, dma->iova, dma->iova + dma->size);
+
+   kfree(dma->iopf_mapped_bitmap);
+}
+
 static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
 {
	WARN_ON(!RB_EMPTY_ROOT(&dma->pfn_list));
vfio_unmap_unpin(iommu, dma, true);
vfio_unlink_dma(iommu, dma);
+   if (iommu->iopf_enabled)
+   vfio_dma_clean_iopf(iommu, dma);
put_task_struct(dma->task);
vfio_dma_bitmap_free(dma);
if (dma->vaddr_invalid) {
@@ -1359,7 +1371,8 @@ static int update_user_bitmap(u64 __user *bitmap, struct 
vfio_iommu *iommu,
 * mark all pages dirty if any IOMMU capable device is not able
 * to report dirty pages and all pages are pinned and mapped.
 */
-   if (iommu->num_non_pinned_groups && dma->iommu_mapped)
+   if (iommu->num_non_pinned_groups &&
+   (dma->iommu_mapped || iommu->iopf_enabled))
bitmap_set(dma->bitmap, 0, nbits);
 
if (shift) {
@@ -1772,6 +1785,16 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
goto out_unlock;
}
 
+   if (iommu->iopf_enabled) {
+   dma->iopf_mapped_bitmap = kvzalloc(IOPF_MAPPED_BITMAP_BYTES(
+   size >> PAGE_SHIFT), GFP_KERNEL);
+   if (!dma->iopf_mapped_bitmap) {
+   ret = -ENOMEM;
+   kfree(dma);
+   goto out_unlock;
+   }
+   }
+
iommu->dma_avail--;
dma->iova = iova;
dma->vaddr = vaddr;
@@ -1811,8 +1834,11 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
/* Insert zero-sized and grow as we map chunks of it */
vfio_link_dma(iommu, dma);
 
-   /* Don't pin and map if container doesn't contain IOMMU capable domain*/
-   if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu))
+   /*
+* Don't pin and map if container doesn't contain IOMMU capable domain,
+* or IOPF enabled for the container.
+*/
+   if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu) || iommu->iopf_enabled)
dma->size = size;
else
ret = vfio_pin_map_dma(iommu, dma, size);
-- 
2.19.1



[RFC PATCH v3 5/8] vfio/type1: VFIO_IOMMU_ENABLE_IOPF

2021-04-08 Thread Shenming Lu
Since enabling IOPF for devices may lead to a slow ramp up of performance,
we add an ioctl VFIO_IOMMU_ENABLE_IOPF to make it configurable. And the
IOPF enabling of a VFIO device includes setting IOMMU_DEV_FEAT_IOPF and
registering the VFIO IOPF handler.

Note that VFIO_IOMMU_DISABLE_IOPF is not supported since there may be
inflight page faults when disabling.
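
For reference, a minimal user-space sketch of how a container could opt in
with the proposed ioctl. This is illustrative only: the container/group setup
is elided, and I assume here that VFIO_IOMMU_ENABLE_IOPF (added to the vfio
uapi by this patch) takes no argument.

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

int main(void)
{
	int container = open("/dev/vfio/vfio", O_RDWR);

	if (container < 0) {
		perror("open /dev/vfio/vfio");
		return 1;
	}

	/* ... add groups and set the IOMMU model (e.g. VFIO_TYPE1v2_IOMMU) ... */

#ifdef VFIO_IOMMU_ENABLE_IOPF
	/* Proposed by this patch: handle DMA mappings on demand via IOPF. */
	if (ioctl(container, VFIO_IOMMU_ENABLE_IOPF))
		perror("VFIO_IOMMU_ENABLE_IOPF");
#else
	fprintf(stderr, "headers lack VFIO_IOMMU_ENABLE_IOPF (not mainline)\n");
#endif
	return 0;
}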

Signed-off-by: Shenming Lu 
---
 drivers/vfio/vfio_iommu_type1.c | 223 +++-
 include/uapi/linux/vfio.h   |   6 +
 2 files changed, 226 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 01e296c6dc9e..7df5711e743a 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -71,6 +71,7 @@ struct vfio_iommu {
struct rb_root  dma_list;
struct blocking_notifier_head notifier;
struct mmu_notifier mn;
+   struct mm_struct*mm;
unsigned intdma_avail;
unsigned intvaddr_invalid_count;
uint64_tpgsize_bitmap;
@@ -81,6 +82,7 @@ struct vfio_iommu {
booldirty_page_tracking;
boolpinned_page_dirty_scope;
boolcontainer_open;
+   booliopf_enabled;
 };
 
 struct vfio_domain {
@@ -461,6 +463,38 @@ vfio_find_iopf_group(struct iommu_group *iommu_group)
return node ? iopf_group : NULL;
 }
 
+static void vfio_link_iopf_group(struct vfio_iopf_group *new)
+{
+   struct rb_node **link, *parent = NULL;
+   struct vfio_iopf_group *iopf_group;
+
+   mutex_lock(&iopf_group_list_lock);
+
+   link = &iopf_group_list.rb_node;
+
+   while (*link) {
+   parent = *link;
+   iopf_group = rb_entry(parent, struct vfio_iopf_group, node);
+
+   if (new->iommu_group < iopf_group->iommu_group)
+   link = &(*link)->rb_left;
+   else
+   link = &(*link)->rb_right;
+   }
+
+   rb_link_node(&new->node, parent, link);
+   rb_insert_color(&new->node, &iopf_group_list);
+
+   mutex_unlock(&iopf_group_list_lock);
+}
+
+static void vfio_unlink_iopf_group(struct vfio_iopf_group *old)
+{
+   mutex_lock(&iopf_group_list_lock);
+   rb_erase(&old->node, &iopf_group_list);
+   mutex_unlock(&iopf_group_list_lock);
+}
+
 static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
 {
struct mm_struct *mm;
@@ -2363,6 +2397,68 @@ static void vfio_iommu_iova_insert_copy(struct 
vfio_iommu *iommu,
list_splice_tail(iova_copy, iova);
 }
 
+static int vfio_dev_domian_nested(struct device *dev, int *nested)
+{
+   struct iommu_domain *domain;
+
+   domain = iommu_get_domain_for_dev(dev);
+   if (!domain)
+   return -ENODEV;
+
+   return iommu_domain_get_attr(domain, DOMAIN_ATTR_NESTING, nested);
+}
+
+static int vfio_iommu_type1_dma_map_iopf(struct iommu_fault *fault, void 
*data);
+
+static int dev_enable_iopf(struct device *dev, void *data)
+{
+   int *enabled_dev_cnt = data;
+   int nested;
+   u32 flags;
+   int ret;
+
+   ret = iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_IOPF);
+   if (ret)
+   return ret;
+
+   ret = vfio_dev_domian_nested(dev, &nested);
+   if (ret)
+   goto out_disable;
+
+   if (nested)
+   flags = FAULT_REPORT_NESTED_L2;
+   else
+   flags = FAULT_REPORT_FLAT;
+
+   ret = iommu_register_device_fault_handler(dev,
+   vfio_iommu_type1_dma_map_iopf, flags, dev);
+   if (ret)
+   goto out_disable;
+
+   (*enabled_dev_cnt)++;
+   return 0;
+
+out_disable:
+   iommu_dev_disable_feature(dev, IOMMU_DEV_FEAT_IOPF);
+   return ret;
+}
+
+static int dev_disable_iopf(struct device *dev, void *data)
+{
+   int *enabled_dev_cnt = data;
+
+   if (enabled_dev_cnt && *enabled_dev_cnt <= 0)
+   return -1;
+
+   WARN_ON(iommu_unregister_device_fault_handler(dev));
+   WARN_ON(iommu_dev_disable_feature(dev, IOMMU_DEV_FEAT_IOPF));
+
+   if (enabled_dev_cnt)
+   (*enabled_dev_cnt)--;
+
+   return 0;
+}
+
 static int vfio_iommu_type1_attach_group(void *iommu_data,
 struct iommu_group *iommu_group)
 {
@@ -2376,6 +2472,8 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
struct iommu_domain_geometry geo;
LIST_HEAD(iova_copy);
LIST_HEAD(group_resv_regions);
+   int iopf_enabled_dev_cnt = 0;
+   struct vfio_iopf_group *iopf_group = NULL;
 
mutex_lock(>lock);
 
@@ -2453,6 +2551,24 @@ static int vfio_iommu_type1_attach_group(void 
*iommu_data,
if (ret)
goto out_domain;
 
+   if (iommu->iopf_enabled) {
+   ret = iommu_group_for_each_dev(iommu_group,
+  &iopf_enabled_dev_cnt,
+  

[RFC PATCH v3 0/8] Add IOPF support for VFIO passthrough

2021-04-08 Thread Shenming Lu
Hi,

Requesting for your comments and suggestions. :-)

The static pinning and mapping problem in VFIO and possible solutions
have been discussed a lot [1, 2]. One of the solutions is to add I/O
Page Fault support for VFIO devices. Different from those relatively
complicated software approaches such as presenting a vIOMMU that provides
the DMA buffer information (might include para-virtualized optimizations),
IOPF mainly depends on the hardware faulting capability, such as the PCIe
PRI extension or Arm SMMU stall model. What's more, the IOPF support in
the IOMMU driver has already been implemented in SVA [3]. So we add IOPF
support for VFIO passthrough based on the IOPF part of SVA in this series.

We have measured its performance with UADK [4] (passthrough an accelerator
to a VM(1U16G)) on Hisilicon Kunpeng920 board (and compared with host SVA):

Run hisi_sec_test...
 - with varying sending times and message lengths
 - with/without IOPF enabled (speed slowdown)

when msg_len = 1MB (and PREMAP_LEN (in Patch 4) = 1):
                 slowdown (num of faults)
 times   VFIO IOPF        host SVA
 1       63.4% (518)      82.8% (512)
 100     22.9% (1058)     47.9% (1024)
 1000     2.6% (1071)      8.5% (1024)

when msg_len = 10MB (and PREMAP_LEN = 512):
                 slowdown (num of faults)
 times   VFIO IOPF
 1       32.6% (13)
 100      3.5% (26)
 1000     1.6% (26)

History:

v2 -> v3
 - Nit fixes.
 - No reason to disable reporting the unrecoverable faults. (baolu)
 - Maintain a global IOPF enabled group list.
 - Split the pre-mapping optimization to be a separate patch.
 - Add selective faulting support (use vfio_pin_pages to indicate the
   non-faultable scope and add a new struct vfio_range to record it,
   untested). (Kevin)

v1 -> v2
 - Numerous improvements following the suggestions. Thanks a lot to all
   of you.

Note that PRI is not supported at the moment since there is no hardware.

Links:
[1] Lesokhin I, et al. Page Fault Support for Network Controllers. In ASPLOS,
2016.
[2] Tian K, et al. coIOMMU: A Virtual IOMMU with Cooperative DMA Buffer Tracking
for Efficient Memory Management in Direct I/O. In USENIX ATC, 2020.
[3] 
https://patchwork.kernel.org/project/linux-arm-kernel/cover/20210401154718.307519-1-jean-phili...@linaro.org/
[4] https://github.com/Linaro/uadk

Thanks,
Shenming


Shenming Lu (8):
  iommu: Evolve the device fault reporting framework
  vfio/type1: Add a page fault handler
  vfio/type1: Add an MMU notifier to avoid pinning
  vfio/type1: Pre-map more pages than requested in the IOPF handling
  vfio/type1: VFIO_IOMMU_ENABLE_IOPF
  vfio/type1: No need to statically pin and map if IOPF enabled
  vfio/type1: Add selective DMA faulting support
  vfio: Add nested IOPF support

 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |3 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   |   18 +-
 drivers/iommu/iommu.c |   56 +-
 drivers/vfio/vfio.c   |   85 +-
 drivers/vfio/vfio_iommu_type1.c   | 1000 -
 include/linux/iommu.h |   19 +-
 include/linux/vfio.h  |   13 +
 include/uapi/linux/iommu.h|4 +
 include/uapi/linux/vfio.h |6 +
 9 files changed, 1181 insertions(+), 23 deletions(-)

-- 
2.19.1



[RFC PATCH v3 2/8] vfio/type1: Add a page fault handler

2021-04-08 Thread Shenming Lu
VFIO manages the DMA mapping itself. To support IOPF (on-demand paging)
for VFIO (IOMMU capable) devices, we add a VFIO page fault handler to
serve the reported page faults from the IOMMU driver.

Signed-off-by: Shenming Lu 
---
 drivers/vfio/vfio_iommu_type1.c | 114 
 1 file changed, 114 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 45cbfd4879a5..ab0ff60ee207 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -101,6 +101,7 @@ struct vfio_dma {
struct task_struct  *task;
struct rb_root  pfn_list;   /* Ex-user pinned pfn list */
unsigned long   *bitmap;
+   unsigned long   *iopf_mapped_bitmap;
 };
 
 struct vfio_batch {
@@ -141,6 +142,16 @@ struct vfio_regions {
size_t len;
 };
 
+/* A global IOPF enabled group list */
+static struct rb_root iopf_group_list = RB_ROOT;
+static DEFINE_MUTEX(iopf_group_list_lock);
+
+struct vfio_iopf_group {
+   struct rb_node  node;
+   struct iommu_group  *iommu_group;
+   struct vfio_iommu   *iommu;
+};
+
 #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)\
	(!list_empty(&iommu->domain_list))
 
@@ -157,6 +168,10 @@ struct vfio_regions {
 #define DIRTY_BITMAP_PAGES_MAX  ((u64)INT_MAX)
 #define DIRTY_BITMAP_SIZE_MAX   DIRTY_BITMAP_BYTES(DIRTY_BITMAP_PAGES_MAX)
 
+#define IOPF_MAPPED_BITMAP_GET(dma, i) \
+ ((dma->iopf_mapped_bitmap[(i) / BITS_PER_LONG] \
+  >> ((i) % BITS_PER_LONG)) & 0x1)
+
 #define WAITED 1
 
 static int put_pfn(unsigned long pfn, int prot);
@@ -416,6 +431,34 @@ static int vfio_iova_put_vfio_pfn(struct vfio_dma *dma, 
struct vfio_pfn *vpfn)
return ret;
 }
 
+/*
+ * Helper functions for iopf_group_list
+ */
+static struct vfio_iopf_group *
+vfio_find_iopf_group(struct iommu_group *iommu_group)
+{
+   struct vfio_iopf_group *iopf_group;
+   struct rb_node *node;
+
+   mutex_lock(&iopf_group_list_lock);
+
+   node = iopf_group_list.rb_node;
+
+   while (node) {
+   iopf_group = rb_entry(node, struct vfio_iopf_group, node);
+
+   if (iommu_group < iopf_group->iommu_group)
+   node = node->rb_left;
+   else if (iommu_group > iopf_group->iommu_group)
+   node = node->rb_right;
+   else
+   break;
+   }
+
+   mutex_unlock(&iopf_group_list_lock);
+   return node ? iopf_group : NULL;
+}
+
 static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
 {
struct mm_struct *mm;
@@ -3106,6 +3149,77 @@ static int vfio_iommu_type1_dirty_pages(struct 
vfio_iommu *iommu,
return -EINVAL;
 }
 
+/* VFIO I/O Page Fault handler */
+static int vfio_iommu_type1_dma_map_iopf(struct iommu_fault *fault, void *data)
+{
+   struct device *dev = (struct device *)data;
+   struct iommu_group *iommu_group;
+   struct vfio_iopf_group *iopf_group;
+   struct vfio_iommu *iommu;
+   struct vfio_dma *dma;
+   dma_addr_t iova = ALIGN_DOWN(fault->prm.addr, PAGE_SIZE);
+   int access_flags = 0;
+   unsigned long bit_offset, vaddr, pfn;
+   int ret;
+   enum iommu_page_response_code status = IOMMU_PAGE_RESP_INVALID;
+   struct iommu_page_response resp = {0};
+
+   if (fault->type != IOMMU_FAULT_PAGE_REQ)
+   return -EOPNOTSUPP;
+
+   iommu_group = iommu_group_get(dev);
+   if (!iommu_group)
+   return -ENODEV;
+
+   iopf_group = vfio_find_iopf_group(iommu_group);
+   iommu_group_put(iommu_group);
+   if (!iopf_group)
+   return -ENODEV;
+
+   iommu = iopf_group->iommu;
+
+   mutex_lock(&iommu->lock);
+
+   ret = vfio_find_dma_valid(iommu, iova, PAGE_SIZE, &dma);
+   if (ret < 0)
+   goto out_invalid;
+
+   if (fault->prm.perm & IOMMU_FAULT_PERM_READ)
+   access_flags |= IOMMU_READ;
+   if (fault->prm.perm & IOMMU_FAULT_PERM_WRITE)
+   access_flags |= IOMMU_WRITE;
+   if ((dma->prot & access_flags) != access_flags)
+   goto out_invalid;
+
+   bit_offset = (iova - dma->iova) >> PAGE_SHIFT;
+   if (IOPF_MAPPED_BITMAP_GET(dma, bit_offset))
+   goto out_success;
+
+   vaddr = iova - dma->iova + dma->vaddr;
+
+   if (vfio_pin_page_external(dma, vaddr, &pfn, true))
+   goto out_invalid;
+
+   if (vfio_iommu_map(iommu, iova, pfn, 1, dma->prot)) {
+   if (put_pfn(pfn, dma->prot))
+   vfio_lock_acct(dma, -1, true);
+   goto out_invalid;
+   }
+
+   bitmap_set(dma->iopf_mapped_bitmap, bit_offset, 1);
+
+out_success:
+   status = IOMMU_PAGE_RESP_SUCCESS;
+
+out_invalid:
+   mutex_unlock(&iommu->lock);
+   resp.version= IOMMU_PAGE_RESP_VERSION_1;
+   resp.grpid 

[RFC PATCH v3 3/8] vfio/type1: Add an MMU notifier to avoid pinning

2021-04-08 Thread Shenming Lu
To avoid pinning pages while they are mapped in the IOMMU page tables, we
add an MMU notifier to tell us which addresses are no longer valid
and try to unmap them.

Signed-off-by: Shenming Lu 
---
 drivers/vfio/vfio_iommu_type1.c | 112 +++-
 1 file changed, 109 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index ab0ff60ee207..1cb9d1f2717b 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -40,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define DRIVER_VERSION  "0.2"
 #define DRIVER_AUTHOR   "Alex Williamson "
@@ -69,6 +70,7 @@ struct vfio_iommu {
struct mutexlock;
struct rb_root  dma_list;
struct blocking_notifier_head notifier;
+   struct mmu_notifier mn;
unsigned intdma_avail;
unsigned intvaddr_invalid_count;
uint64_tpgsize_bitmap;
@@ -1204,6 +1206,72 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, 
struct vfio_dma *dma,
return unlocked;
 }
 
+/* Unmap the IOPF mapped pages in the specified range. */
+static void vfio_unmap_partial_iopf(struct vfio_iommu *iommu,
+   struct vfio_dma *dma,
+   dma_addr_t start, dma_addr_t end)
+{
+   struct iommu_iotlb_gather *gathers;
+   struct vfio_domain *d;
+   int i, num_domains = 0;
+
+   list_for_each_entry(d, &iommu->domain_list, next)
+   num_domains++;
+
+   gathers = kzalloc(sizeof(*gathers) * num_domains, GFP_KERNEL);
+   if (gathers) {
+   for (i = 0; i < num_domains; i++)
+   iommu_iotlb_gather_init(&gathers[i]);
+   }
+
+   while (start < end) {
+   unsigned long bit_offset;
+   size_t len;
+
+   bit_offset = (start - dma->iova) >> PAGE_SHIFT;
+
+   for (len = 0; start + len < end; len += PAGE_SIZE) {
+   if (!IOPF_MAPPED_BITMAP_GET(dma,
+   bit_offset + (len >> PAGE_SHIFT)))
+   break;
+   }
+
+   if (len) {
+   i = 0;
+   list_for_each_entry(d, &iommu->domain_list, next) {
+   size_t unmapped;
+
+   if (gathers)
+   unmapped = iommu_unmap_fast(d->domain,
+   start, len,
+   &gathers[i++]);
+   else
+   unmapped = iommu_unmap(d->domain,
+  start, len);
+
+   if (WARN_ON(unmapped != len))
+   goto out;
+   }
+
+   bitmap_clear(dma->iopf_mapped_bitmap,
+bit_offset, len >> PAGE_SHIFT);
+
+   cond_resched();
+   }
+
+   start += (len + PAGE_SIZE);
+   }
+
+out:
+   if (gathers) {
+   i = 0;
+   list_for_each_entry(d, &iommu->domain_list, next)
+   iommu_iotlb_sync(d->domain, &gathers[i++]);
+
+   kfree(gathers);
+   }
+}
+
 static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
 {
	WARN_ON(!RB_EMPTY_ROOT(&dma->pfn_list));
@@ -3197,17 +3265,18 @@ static int vfio_iommu_type1_dma_map_iopf(struct 
iommu_fault *fault, void *data)
 
vaddr = iova - dma->iova + dma->vaddr;
 
-   if (vfio_pin_page_external(dma, vaddr, &pfn, true))
+   if (vfio_pin_page_external(dma, vaddr, &pfn, false))
goto out_invalid;
 
if (vfio_iommu_map(iommu, iova, pfn, 1, dma->prot)) {
-   if (put_pfn(pfn, dma->prot))
-   vfio_lock_acct(dma, -1, true);
+   put_pfn(pfn, dma->prot);
goto out_invalid;
}
 
bitmap_set(dma->iopf_mapped_bitmap, bit_offset, 1);
 
+   put_pfn(pfn, dma->prot);
+
 out_success:
status = IOMMU_PAGE_RESP_SUCCESS;
 
@@ -3220,6 +3289,43 @@ static int vfio_iommu_type1_dma_map_iopf(struct 
iommu_fault *fault, void *data)
return 0;
 }
 
+static void mn_invalidate_range(struct mmu_notifier *mn, struct mm_struct *mm,
+   unsigned long start, unsigned long end)
+{
+   struct vfio_iommu *iommu = container_of(mn, struct vfio_iommu, mn);
+   struct rb_node *n;
+   int ret;
+
+   mutex_lock(&iommu->lock);
+
+   ret = vfio_wait_all_valid(iommu);
+   if (WARN_ON(ret < 0))
+   return;
+
+   for (n = rb_first(&iommu->dma_list); n; n = rb_next(n)) {
+   struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node);
+   unsigned long 

[RFC PATCH v3 4/8] vfio/type1: Pre-map more pages than requested in the IOPF handling

2021-04-08 Thread Shenming Lu
To optimize for fewer page fault handling events, we can pre-map more pages
than requested at once.

Note that IOPF_PREMAP_LEN is just an arbitrary value for now, which we
could tune further.

Signed-off-by: Shenming Lu 
---
 drivers/vfio/vfio_iommu_type1.c | 131 ++--
 1 file changed, 123 insertions(+), 8 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 1cb9d1f2717b..01e296c6dc9e 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -3217,6 +3217,91 @@ static int vfio_iommu_type1_dirty_pages(struct 
vfio_iommu *iommu,
return -EINVAL;
 }
 
+/*
+ * To optimize for fewer page fault handlings, try to
+ * pre-map more pages than requested.
+ */
+#define IOPF_PREMAP_LEN512
+
+/*
+ * Return 0 on success or a negative error code, the
+ * number of pages contiguously pinned is in @pinned.
+ */
+static int pin_pages_iopf(struct vfio_dma *dma, unsigned long vaddr,
+ unsigned long npages, unsigned long *pfn_base,
+ unsigned long *pinned, struct vfio_batch *batch)
+{
+   struct mm_struct *mm;
+   unsigned long pfn;
+   int ret = 0;
+   *pinned = 0;
+
+   mm = get_task_mm(dma->task);
+   if (!mm)
+   return -ENODEV;
+
+   if (batch->size) {
+   *pfn_base = page_to_pfn(batch->pages[batch->offset]);
+   pfn = *pfn_base;
+   } else {
+   *pfn_base = 0;
+   }
+
+   while (npages) {
+   if (!batch->size) {
+   unsigned long req_pages = min_t(unsigned long, npages,
+   batch->capacity);
+
+   ret = vaddr_get_pfns(mm, vaddr, req_pages, dma->prot,
+    &pfn, batch->pages);
+   if (ret < 0)
+   goto out;
+
+   batch->size = ret;
+   batch->offset = 0;
+   ret = 0;
+
+   if (!*pfn_base)
+   *pfn_base = pfn;
+   }
+
+   while (true) {
+   if (pfn != *pfn_base + *pinned)
+   goto out;
+
+   (*pinned)++;
+   npages--;
+   vaddr += PAGE_SIZE;
+   batch->offset++;
+   batch->size--;
+
+   if (!batch->size)
+   break;
+
+   pfn = page_to_pfn(batch->pages[batch->offset]);
+   }
+
+   if (unlikely(disable_hugepages))
+   break;
+   }
+
+out:
+   if (batch->size == 1 && !batch->offset) {
+   put_pfn(pfn, dma->prot);
+   batch->size = 0;
+   }
+
+   mmput(mm);
+   return ret;
+}
+
+static void unpin_pages_iopf(struct vfio_dma *dma,
+unsigned long pfn, unsigned long npages)
+{
+   while (npages--)
+   put_pfn(pfn++, dma->prot);
+}
+
 /* VFIO I/O Page Fault handler */
 static int vfio_iommu_type1_dma_map_iopf(struct iommu_fault *fault, void *data)
 {
@@ -3225,9 +3310,11 @@ static int vfio_iommu_type1_dma_map_iopf(struct 
iommu_fault *fault, void *data)
struct vfio_iopf_group *iopf_group;
struct vfio_iommu *iommu;
struct vfio_dma *dma;
+   struct vfio_batch batch;
dma_addr_t iova = ALIGN_DOWN(fault->prm.addr, PAGE_SIZE);
int access_flags = 0;
-   unsigned long bit_offset, vaddr, pfn;
+   size_t premap_len, map_len, mapped_len = 0;
+   unsigned long bit_offset, vaddr, pfn, i, npages;
int ret;
enum iommu_page_response_code status = IOMMU_PAGE_RESP_INVALID;
struct iommu_page_response resp = {0};
@@ -3263,19 +3350,47 @@ static int vfio_iommu_type1_dma_map_iopf(struct 
iommu_fault *fault, void *data)
if (IOPF_MAPPED_BITMAP_GET(dma, bit_offset))
goto out_success;
 
+   premap_len = IOPF_PREMAP_LEN << PAGE_SHIFT;
+   npages = dma->size >> PAGE_SHIFT;
+   map_len = PAGE_SIZE;
+   for (i = bit_offset + 1; i < npages; i++) {
+   if (map_len >= premap_len || IOPF_MAPPED_BITMAP_GET(dma, i))
+   break;
+   map_len += PAGE_SIZE;
+   }
vaddr = iova - dma->iova + dma->vaddr;
+   vfio_batch_init(&batch);
 
-   if (vfio_pin_page_external(dma, vaddr, , false))
-   goto out_invalid;
+   while (map_len) {
+   ret = pin_pages_iopf(dma, vaddr + mapped_len,
+map_len >> PAGE_SHIFT, &pfn,
+&npages, &batch);
+   if (!npages)
+   break;
 
-   if (vfio_iommu_map(iommu, iova, pfn, 1, dma->prot)) {
-   put_pfn(pfn, dma->prot);
-   goto 

[RFC PATCH v3 1/8] iommu: Evolve the device fault reporting framework

2021-04-08 Thread Shenming Lu
This patch follows the discussion here:

https://lore.kernel.org/linux-acpi/YAaxjmJW+ZMvrhac@myrica/

Besides SVA/vSVA, users such as VFIO may also enable (2nd level) IOPF to
remove the pinning restriction. In order to better support more scenarios
of using device faults, we extend iommu_register_fault_handler() with flags
and introduce FAULT_REPORT_ to describe the device fault reporting capability
under a specific configuration.

Note that we don't further distinguish recoverable and unrecoverable faults
by flags in the fault reporting cap; having PAGE_FAULT_REPORT_ +
UNRECOV_FAULT_REPORT_ does not seem like a clean way.

In addition, still taking VFIO as an example, in nested mode the 1st level
and 2nd level fault reporting may be configured separately, and currently
each device can only register one iommu dev fault handler, so we add a
handler update interface for this.

Signed-off-by: Shenming Lu 
---
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |  3 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 18 --
 drivers/iommu/iommu.c | 56 ++-
 include/linux/iommu.h | 19 ++-
 include/uapi/linux/iommu.h|  4 ++
 5 files changed, 90 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index ee66d1f4cb81..e6d766fb8f1a 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -482,7 +482,8 @@ static int arm_smmu_master_sva_enable_iopf(struct 
arm_smmu_master *master)
if (ret)
return ret;
 
-   ret = iommu_register_device_fault_handler(dev, iommu_queue_iopf, dev);
+   ret = iommu_register_device_fault_handler(dev, iommu_queue_iopf,
+ FAULT_REPORT_FLAT, dev);
if (ret) {
iopf_queue_remove_device(master->smmu->evtq.iopf, dev);
return ret;
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 53abad8fdd91..51843f54a87f 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1448,10 +1448,6 @@ static int arm_smmu_handle_evt(struct arm_smmu_device 
*smmu, u64 *evt)
return -EOPNOTSUPP;
}
 
-   /* Stage-2 is always pinned at the moment */
-   if (evt[1] & EVTQ_1_S2)
-   return -EFAULT;
-
if (evt[1] & EVTQ_1_RnW)
perm |= IOMMU_FAULT_PERM_READ;
else
@@ -1469,26 +1465,36 @@ static int arm_smmu_handle_evt(struct arm_smmu_device 
*smmu, u64 *evt)
.flags = IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE,
.grpid = FIELD_GET(EVTQ_1_STAG, evt[1]),
.perm = perm,
-   .addr = FIELD_GET(EVTQ_2_ADDR, evt[2]),
};
 
if (ssid_valid) {
flt->prm.flags |= IOMMU_FAULT_PAGE_REQUEST_PASID_VALID;
flt->prm.pasid = FIELD_GET(EVTQ_0_SSID, evt[0]);
}
+
+   if (evt[1] & EVTQ_1_S2) {
+   flt->prm.flags |= IOMMU_FAULT_PAGE_REQUEST_L2;
+   flt->prm.addr = FIELD_GET(EVTQ_3_IPA, evt[3]);
+   } else
+   flt->prm.addr = FIELD_GET(EVTQ_2_ADDR, evt[2]);
} else {
flt->type = IOMMU_FAULT_DMA_UNRECOV;
flt->event = (struct iommu_fault_unrecoverable) {
.reason = reason,
.flags = IOMMU_FAULT_UNRECOV_ADDR_VALID,
.perm = perm,
-   .addr = FIELD_GET(EVTQ_2_ADDR, evt[2]),
};
 
if (ssid_valid) {
flt->event.flags |= IOMMU_FAULT_UNRECOV_PASID_VALID;
flt->event.pasid = FIELD_GET(EVTQ_0_SSID, evt[0]);
}
+
+   if (evt[1] & EVTQ_1_S2) {
+   flt->event.flags |= IOMMU_FAULT_UNRECOV_L2;
+   flt->event.addr = FIELD_GET(EVTQ_3_IPA, evt[3]);
+   } else
+   flt->event.addr = FIELD_GET(EVTQ_2_ADDR, evt[2]);
}
 
	mutex_lock(&smmu->streams_mutex);
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index d0b0a15dba84..b50b526b45ac 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1056,6 +1056,40 @@ int iommu_group_unregister_notifier(struct iommu_group 
*group,
 }
 EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier);
 
+/*
+ * iommu_update_device_fault_handler - Update the device fault handler via flags
+ * @dev: the device
+ * @mask: bits to preserve (bits not set here are cleared)
+ * @set: bits to set
+ *
+ * Update the device fault handler installed by
+ * iommu_register_device_fault_handler().
+ *
+ * Return 0 on success, or an error.
+ */
+int iommu_update_device_fault_handler(struct 

[PATCH RESEND] iommu/amd: Remove duplicate check of devid

2021-04-08 Thread Shaokun Zhang
'devid' has been checked in check_device(), so there is no need to double
check it; clean this up.

Cc: Joerg Roedel 
Cc: Will Deacon 
Signed-off-by: Shaokun Zhang 
---
 drivers/iommu/amd/iommu.c | 9 +
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index b4fa94a97446..cd1a85ff2cf4 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1677,9 +1677,6 @@ static struct iommu_device *amd_iommu_probe_device(struct 
device *dev)
return ERR_PTR(-ENODEV);
 
devid = get_device_id(dev);
-   if (devid < 0)
-   return ERR_PTR(devid);
-
iommu = amd_iommu_rlookup_table[devid];
 
if (dev_iommu_priv_get(dev))
@@ -1961,16 +1958,12 @@ static void amd_iommu_detach_device(struct iommu_domain 
*dom,
struct device *dev)
 {
struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev);
+   int devid = get_device_id(dev);
struct amd_iommu *iommu;
-   int devid;
 
if (!check_device(dev))
return;
 
-   devid = get_device_id(dev);
-   if (devid < 0)
-   return;
-
if (dev_data->domain != NULL)
detach_device(dev);
 
-- 
2.7.4



Re: [PATCH v3] swiotlb: Make SWIOTLB_NO_FORCE perform no allocation

2021-04-08 Thread Florian Fainelli



On 3/24/2021 1:42 AM, Christoph Hellwig wrote:
> On Mon, Mar 22, 2021 at 06:53:49PM -0700, Florian Fainelli wrote:
>> When SWIOTLB_NO_FORCE is used, there should really be no allocations of
>> default_nslabs to occur since we are not going to use those slabs. If a
>> platform was somehow setting swiotlb_no_force and a later call to
>> swiotlb_init() was to be made we would still be proceeding with
>> allocating the default SWIOTLB size (64MB), whereas if swiotlb=noforce
>> was set on the kernel command line we would have only allocated 2KB.
>>
>> This would be inconsistent and the point of initializing default_nslabs
>> to 1, was intended to allocate the minimum amount of memory possible, so
>> simply remove that minimal allocation period.
>>
>> Signed-off-by: Florian Fainelli 
> 
> Looks good,
> 
> Reviewed-by: Christoph Hellwig 
> 

Thanks! Konrad, can you apply this patch to your for-linus-5.13 branch
if you are also happy with it?
-- 
Florian


Re: [PATCH] iommu/amd: Remove duplicate check of devid

2021-04-08 Thread Shaokun Zhang
Apologies for my mistake:

devid = get_device_id(dev) is needed; only the check is unnecessary.

Thanks,
Shaokun

On 2021/4/9 9:31, Shaokun Zhang wrote:
> 'devid' has been checked in check_device(), so there is no need to double
> check it; clean this up.
> 
> Cc: Joerg Roedel 
> Cc: Will Deacon 
> Signed-off-by: Shaokun Zhang 
> ---
>  drivers/iommu/amd/iommu.c | 8 
>  1 file changed, 8 deletions(-)
> 
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index b4fa94a97446..4130df7c30c6 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -1676,10 +1676,6 @@ static struct iommu_device 
> *amd_iommu_probe_device(struct device *dev)
>   if (!check_device(dev))
>   return ERR_PTR(-ENODEV);
>  
> - devid = get_device_id(dev);
> - if (devid < 0)
> - return ERR_PTR(devid);
> -
>   iommu = amd_iommu_rlookup_table[devid];
>  
>   if (dev_iommu_priv_get(dev))
> @@ -1967,10 +1963,6 @@ static void amd_iommu_detach_device(struct 
> iommu_domain *dom,
>   if (!check_device(dev))
>   return;
>  
> - devid = get_device_id(dev);
> - if (devid < 0)
> - return;
> -
>   if (dev_data->domain != NULL)
>   detach_device(dev);
>  
> 


[PATCH 2/2] iommu/sva: Remove mm parameter from SVA bind API

2021-04-08 Thread Jacob Pan
The mm parameter in iommu_sva_bind_device() is intended for a privileged
process to perform bind() on behalf of other processes. This use case has
yet to materialize, let alone the potential security implications of
adding kernel hooks without explicit user consent.
In addition, with the agreement that IOASID allocation shall be subject
to the cgroup limit, it will be in line with the misc cgroup proposal if
IOASID allocation as part of the SVA bind is limited to the current task.

Link: https://lore.kernel.org/linux-iommu/20210303160205.151d114e@jacob-builder/
Link: https://lore.kernel.org/linux-iommu/YFhiMLR35WWMW%2FHu@myrica/
Signed-off-by: Jacob Pan 
---
 drivers/dma/idxd/cdev.c |  2 +-
 drivers/dma/idxd/init.c |  2 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c |  6 ++---
 drivers/iommu/iommu-sva-lib.c   | 30 +++--
 drivers/iommu/iommu-sva-lib.h   |  4 ++--
 drivers/iommu/iommu.c   | 16 -
 drivers/misc/uacce/uacce.c  |  2 +-
 include/linux/iommu.h   |  7 +++---
 8 files changed, 45 insertions(+), 24 deletions(-)

diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
index 21ec82b..8c3347c 100644
--- a/drivers/dma/idxd/cdev.c
+++ b/drivers/dma/idxd/cdev.c
@@ -103,7 +103,7 @@ static int idxd_cdev_open(struct inode *inode, struct file 
*filp)
filp->private_data = ctx;
 
if (device_pasid_enabled(idxd)) {
-   sva = iommu_sva_bind_device(dev, current->mm, 0);
+   sva = iommu_sva_bind_device(dev, 0);
if (IS_ERR(sva)) {
rc = PTR_ERR(sva);
dev_err(dev, "pasid allocation failed: %d\n", rc);
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index cdc85f1..a583f79 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -306,7 +306,7 @@ static int idxd_enable_system_pasid(struct idxd_device 
*idxd)
 
flags = IOMMU_SVA_BIND_SUPERVISOR;
 
-   sva = iommu_sva_bind_device(&idxd->pdev->dev, NULL, flags);
+   sva = iommu_sva_bind_device(&idxd->pdev->dev, flags);
if (IS_ERR(sva)) {
		dev_warn(&idxd->pdev->dev,
 "iommu sva bind failed: %ld\n", PTR_ERR(sva));
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index 23e287e..bdd5c79 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -329,7 +329,7 @@ __arm_smmu_sva_bind(struct device *dev, struct mm_struct 
*mm)
return ERR_PTR(-ENOMEM);
 
/* Allocate a PASID for this mm if necessary */
-   ret = iommu_sva_alloc_pasid(mm, 1, (1U << master->ssid_bits) - 1);
+   ret = iommu_sva_alloc_pasid(1, (1U << master->ssid_bits) - 1);
if (ret)
goto err_free_bond;
 
@@ -347,7 +347,7 @@ __arm_smmu_sva_bind(struct device *dev, struct mm_struct 
*mm)
	return &bond->sva;
 
 err_free_pasid:
-   iommu_sva_free_pasid(mm);
+   iommu_sva_free_pasid();
 err_free_bond:
kfree(bond);
return ERR_PTR(ret);
@@ -377,7 +377,7 @@ void arm_smmu_sva_unbind(struct iommu_sva *handle)
	if (refcount_dec_and_test(&bond->refs)) {
		list_del(&bond->list);
arm_smmu_mmu_notifier_put(bond->smmu_mn);
-   iommu_sva_free_pasid(bond->mm);
+   iommu_sva_free_pasid();
kfree(bond);
}
	mutex_unlock(&sva_lock);
diff --git a/drivers/iommu/iommu-sva-lib.c b/drivers/iommu/iommu-sva-lib.c
index bd41405..bd99f6b 100644
--- a/drivers/iommu/iommu-sva-lib.c
+++ b/drivers/iommu/iommu-sva-lib.c
@@ -12,27 +12,33 @@ static DECLARE_IOASID_SET(iommu_sva_pasid);
 
 /**
  * iommu_sva_alloc_pasid - Allocate a PASID for the mm
- * @mm: the mm
  * @min: minimum PASID value (inclusive)
  * @max: maximum PASID value (inclusive)
  *
- * Try to allocate a PASID for this mm, or take a reference to the existing one
- * provided it fits within the [@min, @max] range. On success the PASID is
- * available in mm->pasid, and must be released with iommu_sva_free_pasid().
+ * Try to allocate a PASID for the current mm, or take a reference to the
+ * existing one provided it fits within the [@min, @max] range. On success
+ * the PASID is available in the current mm->pasid, and must be released with
+ * iommu_sva_free_pasid().
  * @min must be greater than 0, because 0 indicates an unused mm->pasid.
  *
  * Returns 0 on success and < 0 on error.
  */
-int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t max)
+int iommu_sva_alloc_pasid(ioasid_t min, ioasid_t max)
 {
int ret = 0;
ioasid_t pasid;
+   struct mm_struct *mm;
 
if (min == INVALID_IOASID || max == INVALID_IOASID ||
min == 0 || max < min)
return -EINVAL;
 
	mutex_lock(&iommu_sva_lock);
+   mm = 

[PATCH 1/2] iommu/sva: Tighten SVA bind API with explicit flags

2021-04-08 Thread Jacob Pan
The void* drvdata parameter isn't really used in the iommu_sva_bind_device()
API; the current IDXD code "borrows" the drvdata for a VT-d private flag
for supervisor SVA usage.

A supervisor/privileged mode request is a generic feature. It should be
promoted from the VT-d vendor driver to the generic code.

This patch replaces void* drvdata with an unsigned int flags parameter
and adjusts callers accordingly.

Link: https://lore.kernel.org/linux-iommu/YFhiMLR35WWMW%2FHu@myrica/
Suggested-by: Jean-Philippe Brucker 
Signed-off-by: Jacob Pan 
---
 drivers/dma/idxd/cdev.c |  2 +-
 drivers/dma/idxd/init.c |  6 +++---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c |  2 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  4 ++--
 drivers/iommu/intel/Kconfig |  1 +
 drivers/iommu/intel/svm.c   | 18 ++
 drivers/iommu/iommu.c   |  9 ++---
 drivers/misc/uacce/uacce.c  |  2 +-
 include/linux/intel-iommu.h |  2 +-
 include/linux/intel-svm.h   | 17 ++---
 include/linux/iommu.h   | 19 ---
 11 files changed, 40 insertions(+), 42 deletions(-)

diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
index 0db9b82..21ec82b 100644
--- a/drivers/dma/idxd/cdev.c
+++ b/drivers/dma/idxd/cdev.c
@@ -103,7 +103,7 @@ static int idxd_cdev_open(struct inode *inode, struct file 
*filp)
filp->private_data = ctx;
 
if (device_pasid_enabled(idxd)) {
-   sva = iommu_sva_bind_device(dev, current->mm, NULL);
+   sva = iommu_sva_bind_device(dev, current->mm, 0);
if (IS_ERR(sva)) {
rc = PTR_ERR(sva);
dev_err(dev, "pasid allocation failed: %d\n", rc);
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index 085a0c3..cdc85f1 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -300,13 +300,13 @@ static struct idxd_device *idxd_alloc(struct pci_dev 
*pdev)
 
 static int idxd_enable_system_pasid(struct idxd_device *idxd)
 {
-   int flags;
+   unsigned int flags;
unsigned int pasid;
struct iommu_sva *sva;
 
-   flags = SVM_FLAG_SUPERVISOR_MODE;
+   flags = IOMMU_SVA_BIND_SUPERVISOR;
 
-   sva = iommu_sva_bind_device(&idxd->pdev->dev, NULL, &flags);
+   sva = iommu_sva_bind_device(&idxd->pdev->dev, NULL, flags);
if (IS_ERR(sva)) {
		dev_warn(&idxd->pdev->dev,
 "iommu sva bind failed: %ld\n", PTR_ERR(sva));
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index bb251ca..23e287e 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -354,7 +354,7 @@ __arm_smmu_sva_bind(struct device *dev, struct mm_struct 
*mm)
 }
 
 struct iommu_sva *
-arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, void *drvdata)
+arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, unsigned int flags)
 {
struct iommu_sva *handle;
struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index f985817..b971d4d 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -711,7 +711,7 @@ bool arm_smmu_master_sva_enabled(struct arm_smmu_master 
*master);
 int arm_smmu_master_enable_sva(struct arm_smmu_master *master);
 int arm_smmu_master_disable_sva(struct arm_smmu_master *master);
 struct iommu_sva *arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm,
-   void *drvdata);
+   unsigned int flags);
 void arm_smmu_sva_unbind(struct iommu_sva *handle);
 u32 arm_smmu_sva_get_pasid(struct iommu_sva *handle);
 void arm_smmu_sva_notifier_synchronize(void);
@@ -742,7 +742,7 @@ static inline int arm_smmu_master_disable_sva(struct 
arm_smmu_master *master)
 }
 
 static inline struct iommu_sva *
-arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, void *drvdata)
+arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, unsigned int flags)
 {
return ERR_PTR(-ENODEV);
 }
diff --git a/drivers/iommu/intel/Kconfig b/drivers/iommu/intel/Kconfig
index 28a3d15..5415052 100644
--- a/drivers/iommu/intel/Kconfig
+++ b/drivers/iommu/intel/Kconfig
@@ -41,6 +41,7 @@ config INTEL_IOMMU_SVM
select PCI_PRI
select MMU_NOTIFIER
select IOASID
+   select IOMMU_SVA_LIB
help
  Shared Virtual Memory (SVM) provides a facility for devices
  to access DMA resources through process address space by
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 574a7e6..4b5f8b0 100644
--- a/drivers/iommu/intel/svm.c
+++ 

[PATCH] iommu/amd: Remove duplicate check of devid

2021-04-08 Thread Shaokun Zhang
'devid' has already been checked in check_device(), so there is no need
to check it a second time; remove the duplicate checks.

Cc: Joerg Roedel 
Cc: Will Deacon 
Signed-off-by: Shaokun Zhang 
---
 drivers/iommu/amd/iommu.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index b4fa94a97446..4130df7c30c6 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1676,10 +1676,6 @@ static struct iommu_device 
*amd_iommu_probe_device(struct device *dev)
if (!check_device(dev))
return ERR_PTR(-ENODEV);
 
-   devid = get_device_id(dev);
-   if (devid < 0)
-   return ERR_PTR(devid);
-
iommu = amd_iommu_rlookup_table[devid];
 
if (dev_iommu_priv_get(dev))
@@ -1967,10 +1963,6 @@ static void amd_iommu_detach_device(struct iommu_domain 
*dom,
if (!check_device(dev))
return;
 
-   devid = get_device_id(dev);
-   if (devid < 0)
-   return;
-
if (dev_data->domain != NULL)
detach_device(dev);
 
-- 
2.7.4



[PATCH] iommu: exynos: remove unneeded local variable initialization

2021-04-08 Thread Krzysztof Kozlowski
The initialization of the 'fault_addr' local variable is not needed, as it
is overwritten shortly afterwards.

Addresses-Coverity: Unused value
Signed-off-by: Krzysztof Kozlowski 
---
 drivers/iommu/exynos-iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index de324b4eedfe..8fa9a591fb96 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -407,7 +407,7 @@ static irqreturn_t exynos_sysmmu_irq(int irq, void *dev_id)
struct sysmmu_drvdata *data = dev_id;
const struct sysmmu_fault_info *finfo;
unsigned int i, n, itype;
-   sysmmu_iova_t fault_addr = -1;
+   sysmmu_iova_t fault_addr;
unsigned short reg_status, reg_clear;
int ret = -ENOSYS;
 
-- 
2.25.1



[RFC PATCH v5 10/15] iommu/io-pgtable-arm: Implement arm_lpae_unmap_pages()

2021-04-08 Thread Isaac J. Manjarres
Implement the unmap_pages() callback for the ARM LPAE io-pgtable
format.

Signed-off-by: Isaac J. Manjarres 
Suggested-by: Will Deacon 
---
 drivers/iommu/io-pgtable-arm.c | 75 ++
 1 file changed, 49 insertions(+), 26 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index ea66b10c04c4..1b690911995a 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -46,6 +46,9 @@
 #define ARM_LPAE_PGD_SIZE(d)   \
(sizeof(arm_lpae_iopte) << (d)->pgd_bits)
 
+#define ARM_LPAE_PTES_PER_TABLE(d) \
+   (ARM_LPAE_GRANULE(d) >> ilog2(sizeof(arm_lpae_iopte)))
+
 /*
  * Calculate the index at level l used to map virtual address a using the
  * pagetable in d.
@@ -253,8 +256,8 @@ static void __arm_lpae_set_pte(arm_lpae_iopte *ptep, 
arm_lpae_iopte pte,
 
 static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
   struct iommu_iotlb_gather *gather,
-  unsigned long iova, size_t size, int lvl,
-  arm_lpae_iopte *ptep);
+  unsigned long iova, size_t size, size_t pgcount,
+  int lvl, arm_lpae_iopte *ptep);
 
 static void __arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
phys_addr_t paddr, arm_lpae_iopte prot,
@@ -298,7 +301,7 @@ static int arm_lpae_init_pte(struct arm_lpae_io_pgtable 
*data,
size_t sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
 
tblp = ptep - ARM_LPAE_LVL_IDX(iova, lvl, data);
-   if (__arm_lpae_unmap(data, NULL, iova + i * sz, sz,
+   if (__arm_lpae_unmap(data, NULL, iova + i * sz, sz, 1,
 lvl, tblp) != sz) {
WARN_ON(1);
return -EINVAL;
@@ -526,14 +529,15 @@ static size_t arm_lpae_split_blk_unmap(struct 
arm_lpae_io_pgtable *data,
   struct iommu_iotlb_gather *gather,
   unsigned long iova, size_t size,
   arm_lpae_iopte blk_pte, int lvl,
-  arm_lpae_iopte *ptep)
+  arm_lpae_iopte *ptep, size_t pgcount)
 {
	struct io_pgtable_cfg *cfg = &data->iop.cfg;
arm_lpae_iopte pte, *tablep;
phys_addr_t blk_paddr;
size_t tablesz = ARM_LPAE_GRANULE(data);
size_t split_sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
-   int i, unmap_idx = -1;
+   int ptes_per_table = ARM_LPAE_PTES_PER_TABLE(data);
+   int i, unmap_idx_start = -1, num_entries = 0, max_entries;
 
if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
return 0;
@@ -542,15 +546,18 @@ static size_t arm_lpae_split_blk_unmap(struct 
arm_lpae_io_pgtable *data,
if (!tablep)
return 0; /* Bytes unmapped */
 
-   if (size == split_sz)
-   unmap_idx = ARM_LPAE_LVL_IDX(iova, lvl, data);
+   if (size == split_sz) {
+   unmap_idx_start = ARM_LPAE_LVL_IDX(iova, lvl, data);
+   max_entries = ptes_per_table - unmap_idx_start;
+   num_entries = min_t(int, pgcount, max_entries);
+   }
 
blk_paddr = iopte_to_paddr(blk_pte, data);
pte = iopte_prot(blk_pte);
 
-   for (i = 0; i < tablesz / sizeof(pte); i++, blk_paddr += split_sz) {
+   for (i = 0; i < ptes_per_table; i++, blk_paddr += split_sz) {
/* Unmap! */
-   if (i == unmap_idx)
+   if (i >= unmap_idx_start && i < (unmap_idx_start + num_entries))
continue;
 
		__arm_lpae_init_pte(data, blk_paddr, pte, lvl, 1, &tablep[i]);
@@ -568,38 +575,44 @@ static size_t arm_lpae_split_blk_unmap(struct 
arm_lpae_io_pgtable *data,
return 0;
 
tablep = iopte_deref(pte, data);
-   } else if (unmap_idx >= 0) {
-   io_pgtable_tlb_add_page(&data->iop, gather, iova, size);
-   return size;
+   } else if (unmap_idx_start >= 0) {
+   for (i = 0; i < num_entries; i++)
+   io_pgtable_tlb_add_page(&data->iop, gather, iova + i * size, size);
+
+   return num_entries * size;
}
 
-   return __arm_lpae_unmap(data, gather, iova, size, lvl, tablep);
+   return __arm_lpae_unmap(data, gather, iova, size, pgcount, lvl, tablep);
 }
 
 static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
   struct iommu_iotlb_gather *gather,
-  unsigned long iova, size_t size, int lvl,
-  arm_lpae_iopte *ptep)
+  unsigned long iova, size_t size, size_t pgcount,
+  int 

[RFC PATCH v5 13/15] iommu/io-pgtable-arm-v7s: Implement arm_v7s_map_pages()

2021-04-08 Thread Isaac J. Manjarres
Implement the map_pages() callback for the ARM v7s io-pgtable
format.

Signed-off-by: Isaac J. Manjarres 
---
 drivers/iommu/io-pgtable-arm-v7s.c | 26 ++
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm-v7s.c 
b/drivers/iommu/io-pgtable-arm-v7s.c
index 1af060686985..3331caafb273 100644
--- a/drivers/iommu/io-pgtable-arm-v7s.c
+++ b/drivers/iommu/io-pgtable-arm-v7s.c
@@ -519,11 +519,12 @@ static int __arm_v7s_map(struct arm_v7s_io_pgtable *data, 
unsigned long iova,
return __arm_v7s_map(data, iova, paddr, size, prot, lvl + 1, cptep, 
gfp);
 }
 
-static int arm_v7s_map(struct io_pgtable_ops *ops, unsigned long iova,
-   phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
+static int arm_v7s_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
+phys_addr_t paddr, size_t pgsize, size_t pgcount,
+int prot, gfp_t gfp, size_t *mapped)
 {
struct arm_v7s_io_pgtable *data = io_pgtable_ops_to_data(ops);
-   int ret;
+   int ret = -EINVAL;
 
if (WARN_ON(iova >= (1ULL << data->iop.cfg.ias) ||
paddr >= (1ULL << data->iop.cfg.oas)))
@@ -533,7 +534,17 @@ static int arm_v7s_map(struct io_pgtable_ops *ops, 
unsigned long iova,
if (!(prot & (IOMMU_READ | IOMMU_WRITE)))
return 0;
 
-   ret = __arm_v7s_map(data, iova, paddr, size, prot, 1, data->pgd, gfp);
+   while (pgcount--) {
+   ret = __arm_v7s_map(data, iova, paddr, pgsize, prot, 1, data->pgd,
+   gfp);
+   if (ret)
+   break;
+
+   iova += pgsize;
+   paddr += pgsize;
+   if (mapped)
+   *mapped += pgsize;
+   }
/*
 * Synchronise all PTE updates for the new mapping before there's
 * a chance for anything to kick off a table walk for the new iova.
@@ -543,6 +554,12 @@ static int arm_v7s_map(struct io_pgtable_ops *ops, 
unsigned long iova,
return ret;
 }
 
+static int arm_v7s_map(struct io_pgtable_ops *ops, unsigned long iova,
+   phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
+{
+   return arm_v7s_map_pages(ops, iova, paddr, size, 1, prot, gfp, NULL);
+}
+
 static void arm_v7s_free_pgtable(struct io_pgtable *iop)
 {
struct arm_v7s_io_pgtable *data = io_pgtable_to_data(iop);
@@ -797,6 +814,7 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct 
io_pgtable_cfg *cfg,
 
data->iop.ops = (struct io_pgtable_ops) {
.map= arm_v7s_map,
+   .map_pages  = arm_v7s_map_pages,
.unmap  = arm_v7s_unmap,
.unmap_pages= arm_v7s_unmap_pages,
.iova_to_phys   = arm_v7s_iova_to_phys,
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[RFC PATCH v5 14/15] iommu/arm-smmu: Implement the unmap_pages() IOMMU driver callback

2021-04-08 Thread Isaac J. Manjarres
Implement the unmap_pages() callback for the ARM SMMU driver
to allow calls from iommu_unmap to unmap multiple pages of
the same size in one call. Also, remove the unmap() callback
for the SMMU driver, as it will no longer be used.

Signed-off-by: Isaac J. Manjarres 
Suggested-by: Will Deacon 
---
 drivers/iommu/arm/arm-smmu/arm-smmu.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index d8c6bfde6a61..188e506d75e1 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -1208,8 +1208,9 @@ static int arm_smmu_map(struct iommu_domain *domain, 
unsigned long iova,
return ret;
 }
 
-static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
-size_t size, struct iommu_iotlb_gather *gather)
+static size_t arm_smmu_unmap_pages(struct iommu_domain *domain, unsigned long iova,
+  size_t pgsize, size_t pgcount,
+  struct iommu_iotlb_gather *iotlb_gather)
 {
struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu;
@@ -1219,7 +1220,7 @@ static size_t arm_smmu_unmap(struct iommu_domain *domain, 
unsigned long iova,
return 0;
 
arm_smmu_rpm_get(smmu);
-   ret = ops->unmap(ops, iova, size, gather);
+   ret = ops->unmap_pages(ops, iova, pgsize, pgcount, iotlb_gather);
arm_smmu_rpm_put(smmu);
 
return ret;
@@ -1624,7 +1625,7 @@ static struct iommu_ops arm_smmu_ops = {
.domain_free= arm_smmu_domain_free,
.attach_dev = arm_smmu_attach_dev,
.map= arm_smmu_map,
-   .unmap  = arm_smmu_unmap,
+   .unmap_pages= arm_smmu_unmap_pages,
.flush_iotlb_all= arm_smmu_flush_iotlb_all,
.iotlb_sync = arm_smmu_iotlb_sync,
.iova_to_phys   = arm_smmu_iova_to_phys,
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[RFC PATCH v5 09/15] iommu/io-pgtable-arm: Prepare PTE methods for handling multiple entries

2021-04-08 Thread Isaac J. Manjarres
The PTE methods currently operate on a single entry. In preparation
for manipulating multiple PTEs in one map or unmap call, allow them
to handle multiple PTEs.
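
As an illustration (not part of the patch), a single helper call can now
populate several contiguous entries and sync them once:

	/*
	 * Writes ptep[0..3] with the same pte and, for a non-coherent walk,
	 * issues one dma_sync_single_for_device() covering
	 * 4 * sizeof(arm_lpae_iopte) bytes instead of four separate syncs.
	 */
	__arm_lpae_set_pte(ptep, pte, 4, cfg);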

Signed-off-by: Isaac J. Manjarres 
Suggested-by: Robin Murphy 
---
 drivers/iommu/io-pgtable-arm.c | 78 +++---
 1 file changed, 44 insertions(+), 34 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 87def58e79b5..ea66b10c04c4 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -232,20 +232,23 @@ static void __arm_lpae_free_pages(void *pages, size_t 
size,
free_pages((unsigned long)pages, get_order(size));
 }
 
-static void __arm_lpae_sync_pte(arm_lpae_iopte *ptep,
+static void __arm_lpae_sync_pte(arm_lpae_iopte *ptep, int num_entries,
struct io_pgtable_cfg *cfg)
 {
dma_sync_single_for_device(cfg->iommu_dev, __arm_lpae_dma_addr(ptep),
-  sizeof(*ptep), DMA_TO_DEVICE);
+  sizeof(*ptep) * num_entries, DMA_TO_DEVICE);
 }
 
 static void __arm_lpae_set_pte(arm_lpae_iopte *ptep, arm_lpae_iopte pte,
-  struct io_pgtable_cfg *cfg)
+  int num_entries, struct io_pgtable_cfg *cfg)
 {
-   *ptep = pte;
+   int i;
+
+   for (i = 0; i < num_entries; i++)
+   ptep[i] = pte;
 
if (!cfg->coherent_walk)
-   __arm_lpae_sync_pte(ptep, cfg);
+   __arm_lpae_sync_pte(ptep, num_entries, cfg);
 }
 
 static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
@@ -255,47 +258,54 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable 
*data,
 
 static void __arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
phys_addr_t paddr, arm_lpae_iopte prot,
-   int lvl, arm_lpae_iopte *ptep)
+   int lvl, int num_entries, arm_lpae_iopte *ptep)
 {
arm_lpae_iopte pte = prot;
+   struct io_pgtable_cfg *cfg = &data->iop.cfg;
+   size_t sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
+   int i;
 
if (data->iop.fmt != ARM_MALI_LPAE && lvl == ARM_LPAE_MAX_LEVELS - 1)
pte |= ARM_LPAE_PTE_TYPE_PAGE;
else
pte |= ARM_LPAE_PTE_TYPE_BLOCK;
 
-   pte |= paddr_to_iopte(paddr, data);
+   for (i = 0; i < num_entries; i++)
+   ptep[i] = pte | paddr_to_iopte(paddr + i * sz, data);
 
-   __arm_lpae_set_pte(ptep, pte, &data->iop.cfg);
+   if (!cfg->coherent_walk)
+   __arm_lpae_sync_pte(ptep, num_entries, cfg);
 }
 
 static int arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
 unsigned long iova, phys_addr_t paddr,
-arm_lpae_iopte prot, int lvl,
+arm_lpae_iopte prot, int lvl, int num_entries,
 arm_lpae_iopte *ptep)
 {
-   arm_lpae_iopte pte = *ptep;
-
-   if (iopte_leaf(pte, lvl, data->iop.fmt)) {
-   /* We require an unmap first */
-   WARN_ON(!selftest_running);
-   return -EEXIST;
-   } else if (iopte_type(pte) == ARM_LPAE_PTE_TYPE_TABLE) {
-   /*
-* We need to unmap and free the old table before
-* overwriting it with a block entry.
-*/
-   arm_lpae_iopte *tblp;
-   size_t sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
-
-   tblp = ptep - ARM_LPAE_LVL_IDX(iova, lvl, data);
-   if (__arm_lpae_unmap(data, NULL, iova, sz, lvl, tblp) != sz) {
-   WARN_ON(1);
-   return -EINVAL;
+   int i;
+
+   for (i = 0; i < num_entries; i++)
+   if (iopte_leaf(ptep[i], lvl, data->iop.fmt)) {
+   /* We require an unmap first */
+   WARN_ON(!selftest_running);
+   return -EEXIST;
+   } else if (iopte_type(ptep[i]) == ARM_LPAE_PTE_TYPE_TABLE) {
+   /*
+* We need to unmap and free the old table before
+* overwriting it with a block entry.
+*/
+   arm_lpae_iopte *tblp;
+   size_t sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
+
+   tblp = ptep - ARM_LPAE_LVL_IDX(iova, lvl, data);
+   if (__arm_lpae_unmap(data, NULL, iova + i * sz, sz,
+lvl, tblp) != sz) {
+   WARN_ON(1);
+   return -EINVAL;
+   }
}
-   }
 
-   __arm_lpae_init_pte(data, paddr, prot, lvl, ptep);
+   __arm_lpae_init_pte(data, paddr, prot, lvl, num_entries, ptep);
return 0;
 }
 
@@ -323,7 +333,7 @@ static arm_lpae_iopte arm_lpae_install_table(arm_lpae_iopte 

[RFC PATCH v5 06/15] iommu: Split 'addr_merge' argument to iommu_pgsize() into separate parts

2021-04-08 Thread Isaac J. Manjarres
From: Will Deacon 

The 'addr_merge' parameter to iommu_pgsize() is a fabricated address
intended to describe the alignment requirements to consider when
choosing an appropriate page size. On the iommu_map() path, this address
is the logical OR of the virtual and physical addresses.

Subsequent improvements to iommu_pgsize() will need to check the
alignment of the virtual and physical components of 'addr_merge'
independently, so pass them in as separate parameters and reconstruct
'addr_merge' locally.

No functional change.
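
A small worked example (illustrative values, not taken from the patch) of
why the two components have to be inspected separately:

	unsigned long iova  = 0x201000;		/* 2M + 4K          */
	phys_addr_t paddr   = 0x80201000;	/* 2M-aligned + 4K  */
	unsigned long merge = iova | paddr;

	/*
	 * __ffs(merge) == 12, so the merged view only ever yields 4K pages.
	 * Looking at iova and paddr separately, (iova ^ paddr) has no bits
	 * set below the 2M boundary, so once the mapping reaches the next
	 * 2M boundary the follow-up patch can switch to 2M pages.
	 */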

Signed-off-by: Will Deacon 
Signed-off-by: Isaac J. Manjarres 
---
 drivers/iommu/iommu.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index bcd623862bf9..624ce3c7ae33 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2357,12 +2357,13 @@ phys_addr_t iommu_iova_to_phys(struct iommu_domain 
*domain, dma_addr_t iova)
 }
 EXPORT_SYMBOL_GPL(iommu_iova_to_phys);
 
-static size_t iommu_pgsize(struct iommu_domain *domain,
-  unsigned long addr_merge, size_t size)
+static size_t iommu_pgsize(struct iommu_domain *domain, unsigned long iova,
+  phys_addr_t paddr, size_t size)
 {
unsigned int pgsize_idx;
unsigned long pgsizes;
size_t pgsize;
+   unsigned long addr_merge = paddr | iova;
 
/* Page sizes supported by the hardware and small enough for @size */
pgsizes = domain->pgsize_bitmap & GENMASK(__fls(size), 0);
@@ -2415,7 +2416,7 @@ static int __iommu_map(struct iommu_domain *domain, 
unsigned long iova,
	pr_debug("map: iova 0x%lx pa %pa size 0x%zx\n", iova, &paddr, size);
 
while (size) {
-   size_t pgsize = iommu_pgsize(domain, iova | paddr, size);
+   size_t pgsize = iommu_pgsize(domain, iova, paddr, size);
 
pr_debug("mapping: iova 0x%lx pa %pa pgsize 0x%zx\n",
 iova, &paddr, pgsize);
@@ -2503,8 +2504,9 @@ static size_t __iommu_unmap(struct iommu_domain *domain,
 * or we hit an area that isn't mapped.
 */
while (unmapped < size) {
-   size_t pgsize = iommu_pgsize(domain, iova, size - unmapped);
+   size_t pgsize;
 
+   pgsize = iommu_pgsize(domain, iova, iova, size - unmapped);
unmapped_page = ops->unmap(domain, iova, pgsize, iotlb_gather);
if (!unmapped_page)
break;
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[RFC PATCH v5 07/15] iommu: Hook up '->unmap_pages' driver callback

2021-04-08 Thread Isaac J. Manjarres
From: Will Deacon 

Extend iommu_pgsize() to populate an optional 'count' parameter so that
we can direct the unmap operation to the ->unmap_pages callback if it
has been provided by the driver.
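
As a worked example (assuming a pgsize_bitmap of SZ_4K | SZ_2M; not taken
from the patch), unmapping 4MB at a 2MB-aligned IOVA now yields:

	size_t count;
	size_t pgsize = iommu_pgsize(domain, iova, iova, SZ_4M, &count);
	/* pgsize == SZ_2M, count == 2 */

so a driver that provides ->unmap_pages() is asked to unmap both 2MB blocks
in one call instead of receiving two separate ->unmap() calls.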

Signed-off-by: Will Deacon 
Signed-off-by: Isaac J. Manjarres 
---
 drivers/iommu/iommu.c | 60 ---
 1 file changed, 51 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 624ce3c7ae33..1fc919ea95ac 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2358,11 +2358,11 @@ phys_addr_t iommu_iova_to_phys(struct iommu_domain 
*domain, dma_addr_t iova)
 EXPORT_SYMBOL_GPL(iommu_iova_to_phys);
 
 static size_t iommu_pgsize(struct iommu_domain *domain, unsigned long iova,
-  phys_addr_t paddr, size_t size)
+  phys_addr_t paddr, size_t size, size_t *count)
 {
-   unsigned int pgsize_idx;
+   unsigned int pgsize_idx, pgsize_idx_next;
unsigned long pgsizes;
-   size_t pgsize;
+   size_t offset, pgsize, pgsize_next;
unsigned long addr_merge = paddr | iova;
 
/* Page sizes supported by the hardware and small enough for @size */
@@ -2378,7 +2378,37 @@ static size_t iommu_pgsize(struct iommu_domain *domain, 
unsigned long iova,
/* Pick the biggest page size remaining */
pgsize_idx = __fls(pgsizes);
pgsize = BIT(pgsize_idx);
+   if (!count)
+   return pgsize;
 
+
+   /* Find the next biggest supported page size, if it exists */
+   pgsizes = domain->pgsize_bitmap & ~GENMASK(pgsize_idx, 0);
+   if (!pgsizes)
+   goto out_set_count;
+
+   pgsize_idx_next = __ffs(pgsizes);
+   pgsize_next = BIT(pgsize_idx_next);
+
+   /*
+* There's no point trying a bigger page size unless the virtual
+* and physical addresses are similarly offset within the larger page.
+*/
+   if ((iova ^ paddr) & (pgsize_next - 1))
+   goto out_set_count;
+
+   /* Calculate the offset to the next page size alignment boundary */
+   offset = pgsize_next - (addr_merge & (pgsize_next - 1));
+
+   /*
+* If size is big enough to accommodate the larger page, reduce
+* the number of smaller pages.
+*/
+   if (offset + pgsize_next <= size)
+   size = offset;
+
+out_set_count:
+   *count = size >> pgsize_idx;
return pgsize;
 }
 
@@ -2416,7 +2446,7 @@ static int __iommu_map(struct iommu_domain *domain, 
unsigned long iova,
	pr_debug("map: iova 0x%lx pa %pa size 0x%zx\n", iova, &paddr, size);
 
while (size) {
-   size_t pgsize = iommu_pgsize(domain, iova, paddr, size);
+   size_t pgsize = iommu_pgsize(domain, iova, paddr, size, NULL);
 
pr_debug("mapping: iova 0x%lx pa %pa pgsize 0x%zx\n",
 iova, &paddr, pgsize);
@@ -2467,6 +2497,19 @@ int iommu_map_atomic(struct iommu_domain *domain, 
unsigned long iova,
 }
 EXPORT_SYMBOL_GPL(iommu_map_atomic);
 
+static size_t __iommu_unmap_pages(struct iommu_domain *domain,
+ unsigned long iova, size_t size,
+ struct iommu_iotlb_gather *iotlb_gather)
+{
+   const struct iommu_ops *ops = domain->ops;
+   size_t pgsize, count;
+
+   pgsize = iommu_pgsize(domain, iova, iova, size, &count);
+   return ops->unmap_pages ?
+  ops->unmap_pages(domain, iova, pgsize, count, iotlb_gather) :
+  ops->unmap(domain, iova, pgsize, iotlb_gather);
+}
+
 static size_t __iommu_unmap(struct iommu_domain *domain,
unsigned long iova, size_t size,
struct iommu_iotlb_gather *iotlb_gather)
@@ -2476,7 +2519,7 @@ static size_t __iommu_unmap(struct iommu_domain *domain,
unsigned long orig_iova = iova;
unsigned int min_pagesz;
 
-   if (unlikely(ops->unmap == NULL ||
+   if (unlikely(!(ops->unmap || ops->unmap_pages) ||
 domain->pgsize_bitmap == 0UL))
return 0;
 
@@ -2504,10 +2547,9 @@ static size_t __iommu_unmap(struct iommu_domain *domain,
 * or we hit an area that isn't mapped.
 */
while (unmapped < size) {
-   size_t pgsize;
-
-   pgsize = iommu_pgsize(domain, iova, iova, size - unmapped);
-   unmapped_page = ops->unmap(domain, iova, pgsize, iotlb_gather);
+   unmapped_page = __iommu_unmap_pages(domain, iova,
+   size - unmapped,
+   iotlb_gather);
if (!unmapped_page)
break;
 
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


[RFC PATCH v5 04/15] iommu: Add a map_pages() op for IOMMU drivers

2021-04-08 Thread Isaac J. Manjarres
Add a callback for IOMMU drivers to provide a path for the
IOMMU framework to call into an IOMMU driver, which can
call into the io-pgtable code, to map a physically contiguous
range of pages of the same size.

For IOMMU drivers that do not specify a map_pages() callback,
the existing logic of mapping memory one page block at a time
will be used.

Signed-off-by: Isaac J. Manjarres 
Suggested-by: Will Deacon 
Acked-by: Lu Baolu 
---
 include/linux/iommu.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 9cf81242581a..528d6a58479e 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -192,6 +192,8 @@ struct iommu_iotlb_gather {
  * @attach_dev: attach device to an iommu domain
  * @detach_dev: detach device from an iommu domain
  * @map: map a physically contiguous memory region to an iommu domain
+ * @map_pages: map a physically contiguous set of pages of the same size to
+ * an iommu domain.
  * @unmap: unmap a physically contiguous memory region from an iommu domain
  * @unmap_pages: unmap a number of pages of the same size from an iommu domain
  * @flush_iotlb_all: Synchronously flush all hardware TLBs for this domain
@@ -244,6 +246,9 @@ struct iommu_ops {
void (*detach_dev)(struct iommu_domain *domain, struct device *dev);
int (*map)(struct iommu_domain *domain, unsigned long iova,
   phys_addr_t paddr, size_t size, int prot, gfp_t gfp);
+   int (*map_pages)(struct iommu_domain *domain, unsigned long iova,
+phys_addr_t paddr, size_t pgsize, size_t pgcount,
+int prot, gfp_t gfp, size_t *mapped);
size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
 size_t size, struct iommu_iotlb_gather *iotlb_gather);
size_t (*unmap_pages)(struct iommu_domain *domain, unsigned long iova,
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[RFC PATCH v5 08/15] iommu: Add support for the map_pages() callback

2021-04-08 Thread Isaac J. Manjarres
Since iommu_pgsize can calculate how many pages of the
same size can be mapped/unmapped before the next largest
page size boundary, add support for invoking an IOMMU
driver's map_pages() callback, if it provides one.

Signed-off-by: Isaac J. Manjarres 
Suggested-by: Will Deacon 
---
 drivers/iommu/iommu.c | 43 +++
 1 file changed, 35 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 1fc919ea95ac..c94b6b3198b6 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2412,6 +2412,30 @@ static size_t iommu_pgsize(struct iommu_domain *domain, 
unsigned long iova,
return pgsize;
 }
 
+static int __iommu_map_pages(struct iommu_domain *domain, unsigned long iova,
+phys_addr_t paddr, size_t size, int prot,
+gfp_t gfp, size_t *mapped)
+{
+   const struct iommu_ops *ops = domain->ops;
+   size_t pgsize, count;
+   int ret;
+
+   pgsize = iommu_pgsize(domain, iova, paddr, size, &count);
+
+   pr_debug("mapping: iova 0x%lx pa %pa pgsize 0x%zx count %ld\n",
+iova, &paddr, pgsize, count);
+
+   if (ops->map_pages) {
+   ret = ops->map_pages(domain, iova, paddr, pgsize, count, prot,
+gfp, mapped);
+   } else {
+   ret = ops->map(domain, iova, paddr, pgsize, prot, gfp);
+   *mapped = ret ? 0 : pgsize;
+   }
+
+   return ret;
+}
+
 static int __iommu_map(struct iommu_domain *domain, unsigned long iova,
   phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
 {
@@ -2422,7 +2446,7 @@ static int __iommu_map(struct iommu_domain *domain, 
unsigned long iova,
phys_addr_t orig_paddr = paddr;
int ret = 0;
 
-   if (unlikely(ops->map == NULL ||
+   if (unlikely(!(ops->map || ops->map_pages) ||
 domain->pgsize_bitmap == 0UL))
return -ENODEV;
 
@@ -2446,18 +2470,21 @@ static int __iommu_map(struct iommu_domain *domain, 
unsigned long iova,
	pr_debug("map: iova 0x%lx pa %pa size 0x%zx\n", iova, &paddr, size);
 
while (size) {
-   size_t pgsize = iommu_pgsize(domain, iova, paddr, size, NULL);
+   size_t mapped = 0;
 
-   pr_debug("mapping: iova 0x%lx pa %pa pgsize 0x%zx\n",
-iova, &paddr, pgsize);
-   ret = ops->map(domain, iova, paddr, pgsize, prot, gfp);
+   ret = __iommu_map_pages(domain, iova, paddr, size, prot, gfp,
+   &mapped);
+   /*
+* Some pages may have been mapped, even if an error occurred,
+* so we should account for those so they can be unmapped.
+*/
+   size -= mapped;
 
if (ret)
break;
 
-   iova += pgsize;
-   paddr += pgsize;
-   size -= pgsize;
+   iova += mapped;
+   paddr += mapped;
}
 
/* unroll mapping in case something went wrong */
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[RFC PATCH v5 15/15] iommu/arm-smmu: Implement the map_pages() IOMMU driver callback

2021-04-08 Thread Isaac J. Manjarres
Implement the map_pages() callback for the ARM SMMU driver
to allow calls from iommu_map to map multiple pages of
the same size in one call. Also, remove the map() callback
for the ARM SMMU driver, as it will no longer be used.

Signed-off-by: Isaac J. Manjarres 
Suggested-by: Will Deacon 
---
 drivers/iommu/arm/arm-smmu/arm-smmu.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 188e506d75e1..8fcc422e2f2f 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -1191,8 +1191,9 @@ static int arm_smmu_attach_dev(struct iommu_domain 
*domain, struct device *dev)
return ret;
 }
 
-static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
-   phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
+static int arm_smmu_map_pages(struct iommu_domain *domain, unsigned long iova,
+ phys_addr_t paddr, size_t pgsize, size_t pgcount,
+ int prot, gfp_t gfp, size_t *mapped)
 {
struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu;
@@ -1202,7 +1203,7 @@ static int arm_smmu_map(struct iommu_domain *domain, 
unsigned long iova,
return -ENODEV;
 
arm_smmu_rpm_get(smmu);
-   ret = ops->map(ops, iova, paddr, size, prot, gfp);
+   ret = ops->map_pages(ops, iova, paddr, pgsize, pgcount, prot, gfp, mapped);
arm_smmu_rpm_put(smmu);
 
return ret;
@@ -1624,7 +1625,7 @@ static struct iommu_ops arm_smmu_ops = {
.domain_alloc   = arm_smmu_domain_alloc,
.domain_free= arm_smmu_domain_free,
.attach_dev = arm_smmu_attach_dev,
-   .map= arm_smmu_map,
+   .map_pages  = arm_smmu_map_pages,
.unmap_pages= arm_smmu_unmap_pages,
.flush_iotlb_all= arm_smmu_flush_iotlb_all,
.iotlb_sync = arm_smmu_iotlb_sync,
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[RFC PATCH v5 12/15] iommu/io-pgtable-arm-v7s: Implement arm_v7s_unmap_pages()

2021-04-08 Thread Isaac J. Manjarres
Implement the unmap_pages() callback for the ARM v7s io-pgtable
format.

Signed-off-by: Isaac J. Manjarres 
---
 drivers/iommu/io-pgtable-arm-v7s.c | 24 +---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm-v7s.c 
b/drivers/iommu/io-pgtable-arm-v7s.c
index d4004bcf333a..1af060686985 100644
--- a/drivers/iommu/io-pgtable-arm-v7s.c
+++ b/drivers/iommu/io-pgtable-arm-v7s.c
@@ -710,15 +710,32 @@ static size_t __arm_v7s_unmap(struct arm_v7s_io_pgtable 
*data,
return __arm_v7s_unmap(data, gather, iova, size, lvl + 1, ptep);
 }
 
-static size_t arm_v7s_unmap(struct io_pgtable_ops *ops, unsigned long iova,
-   size_t size, struct iommu_iotlb_gather *gather)
+static size_t arm_v7s_unmap_pages(struct io_pgtable_ops *ops, unsigned long iova,
+ size_t pgsize, size_t pgcount,
+ struct iommu_iotlb_gather *gather)
 {
struct arm_v7s_io_pgtable *data = io_pgtable_ops_to_data(ops);
+   size_t unmapped = 0, ret;
 
if (WARN_ON(iova >= (1ULL << data->iop.cfg.ias)))
return 0;
 
-   return __arm_v7s_unmap(data, gather, iova, size, 1, data->pgd);
+   while (pgcount--) {
+   ret = __arm_v7s_unmap(data, gather, iova, pgsize, 1, data->pgd);
+   if (!ret)
+   break;
+
+   unmapped += pgsize;
+   iova += pgsize;
+   }
+
+   return unmapped;
+}
+
+static size_t arm_v7s_unmap(struct io_pgtable_ops *ops, unsigned long iova,
+   size_t size, struct iommu_iotlb_gather *gather)
+{
+   return arm_v7s_unmap_pages(ops, iova, size, 1, gather);
 }
 
 static phys_addr_t arm_v7s_iova_to_phys(struct io_pgtable_ops *ops,
@@ -781,6 +798,7 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct 
io_pgtable_cfg *cfg,
data->iop.ops = (struct io_pgtable_ops) {
.map= arm_v7s_map,
.unmap  = arm_v7s_unmap,
+   .unmap_pages= arm_v7s_unmap_pages,
.iova_to_phys   = arm_v7s_iova_to_phys,
};
 
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[RFC PATCH v5 11/15] iommu/io-pgtable-arm: Implement arm_lpae_map_pages()

2021-04-08 Thread Isaac J. Manjarres
Implement the map_pages() callback for the ARM LPAE io-pgtable
format.

Signed-off-by: Isaac J. Manjarres 
---
 drivers/iommu/io-pgtable-arm.c | 42 ++
 1 file changed, 32 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 1b690911995a..92978dd9c885 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -344,20 +344,30 @@ static arm_lpae_iopte 
arm_lpae_install_table(arm_lpae_iopte *table,
 }
 
 static int __arm_lpae_map(struct arm_lpae_io_pgtable *data, unsigned long iova,
- phys_addr_t paddr, size_t size, arm_lpae_iopte prot,
- int lvl, arm_lpae_iopte *ptep, gfp_t gfp)
+ phys_addr_t paddr, size_t size, size_t pgcount,
+ arm_lpae_iopte prot, int lvl, arm_lpae_iopte *ptep,
+ gfp_t gfp, size_t *mapped)
 {
arm_lpae_iopte *cptep, pte;
size_t block_size = ARM_LPAE_BLOCK_SIZE(lvl, data);
size_t tblsz = ARM_LPAE_GRANULE(data);
	struct io_pgtable_cfg *cfg = &data->iop.cfg;
+   int ret = 0, num_entries, max_entries, map_idx_start;
 
/* Find our entry at the current level */
-   ptep += ARM_LPAE_LVL_IDX(iova, lvl, data);
+   map_idx_start = ARM_LPAE_LVL_IDX(iova, lvl, data);
+   ptep += map_idx_start;
 
/* If we can install a leaf entry at this level, then do so */
-   if (size == block_size)
-   return arm_lpae_init_pte(data, iova, paddr, prot, lvl, 1, ptep);
+   if (size == block_size) {
+   max_entries = ARM_LPAE_PTES_PER_TABLE(data) - map_idx_start;
+   num_entries = min_t(int, pgcount, max_entries);
+   ret = arm_lpae_init_pte(data, iova, paddr, prot, lvl, num_entries, ptep);
+   if (!ret && mapped)
+   *mapped += num_entries * size;
+
+   return ret;
+   }
 
/* We can't allocate tables at the final level */
if (WARN_ON(lvl >= ARM_LPAE_MAX_LEVELS - 1))
@@ -386,7 +396,8 @@ static int __arm_lpae_map(struct arm_lpae_io_pgtable *data, 
unsigned long iova,
}
 
/* Rinse, repeat */
-   return __arm_lpae_map(data, iova, paddr, size, prot, lvl + 1, cptep, gfp);
+   return __arm_lpae_map(data, iova, paddr, size, pgcount, prot, lvl + 1,
+ cptep, gfp, mapped);
 }
 
 static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
@@ -453,8 +464,9 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct 
arm_lpae_io_pgtable *data,
return pte;
 }
 
-static int arm_lpae_map(struct io_pgtable_ops *ops, unsigned long iova,
-   phys_addr_t paddr, size_t size, int iommu_prot, gfp_t gfp)
+static int arm_lpae_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
+ phys_addr_t paddr, size_t pgsize, size_t pgcount,
+ int iommu_prot, gfp_t gfp, size_t *mapped)
 {
struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
	struct io_pgtable_cfg *cfg = &data->iop.cfg;
@@ -463,7 +475,7 @@ static int arm_lpae_map(struct io_pgtable_ops *ops, 
unsigned long iova,
arm_lpae_iopte prot;
long iaext = (s64)iova >> cfg->ias;
 
-   if (WARN_ON(!size || (size & cfg->pgsize_bitmap) != size))
+   if (WARN_ON(!pgsize || (pgsize & cfg->pgsize_bitmap) != pgsize))
return -EINVAL;
 
if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)
@@ -476,7 +488,8 @@ static int arm_lpae_map(struct io_pgtable_ops *ops, 
unsigned long iova,
return 0;
 
prot = arm_lpae_prot_to_pte(data, iommu_prot);
-   ret = __arm_lpae_map(data, iova, paddr, size, prot, lvl, ptep, gfp);
+   ret = __arm_lpae_map(data, iova, paddr, pgsize, pgcount, prot, lvl,
+ptep, gfp, NULL);
/*
 * Synchronise all PTE updates for the new mapping before there's
 * a chance for anything to kick off a table walk for the new iova.
@@ -486,6 +499,14 @@ static int arm_lpae_map(struct io_pgtable_ops *ops, 
unsigned long iova,
return ret;
 }
 
+
+static int arm_lpae_map(struct io_pgtable_ops *ops, unsigned long iova,
+   phys_addr_t paddr, size_t size, int iommu_prot, gfp_t gfp)
+{
+   return arm_lpae_map_pages(ops, iova, paddr, size, 1, iommu_prot, gfp,
+ NULL);
+}
+
 static void __arm_lpae_free_pgtable(struct arm_lpae_io_pgtable *data, int lvl,
arm_lpae_iopte *ptep)
 {
@@ -782,6 +803,7 @@ arm_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg)
 
data->iop.ops = (struct io_pgtable_ops) {
.map= arm_lpae_map,
+   .map_pages  = arm_lpae_map_pages,
.unmap  = arm_lpae_unmap,
.unmap_pages= arm_lpae_unmap_pages,

[RFC PATCH v5 05/15] iommu: Use bitmap to calculate page size in iommu_pgsize()

2021-04-08 Thread Isaac J. Manjarres
From: Will Deacon 

Avoid the potential for shifting values by amounts greater than the
width of their type by using a bitmap to compute page size in
iommu_pgsize().

Signed-off-by: Will Deacon 
Signed-off-by: Isaac J. Manjarres 
---
 drivers/iommu/iommu.c | 31 ---
 1 file changed, 12 insertions(+), 19 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index d0b0a15dba84..bcd623862bf9 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -8,6 +8,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -2360,30 +2361,22 @@ static size_t iommu_pgsize(struct iommu_domain *domain,
   unsigned long addr_merge, size_t size)
 {
unsigned int pgsize_idx;
+   unsigned long pgsizes;
size_t pgsize;
 
-   /* Max page size that still fits into 'size' */
-   pgsize_idx = __fls(size);
+   /* Page sizes supported by the hardware and small enough for @size */
+   pgsizes = domain->pgsize_bitmap & GENMASK(__fls(size), 0);
 
-   /* need to consider alignment requirements ? */
-   if (likely(addr_merge)) {
-   /* Max page size allowed by address */
-   unsigned int align_pgsize_idx = __ffs(addr_merge);
-   pgsize_idx = min(pgsize_idx, align_pgsize_idx);
-   }
-
-   /* build a mask of acceptable page sizes */
-   pgsize = (1UL << (pgsize_idx + 1)) - 1;
-
-   /* throw away page sizes not supported by the hardware */
-   pgsize &= domain->pgsize_bitmap;
+   /* Constrain the page sizes further based on the maximum alignment */
+   if (likely(addr_merge))
+   pgsizes &= GENMASK(__ffs(addr_merge), 0);
 
-   /* make sure we're still sane */
-   BUG_ON(!pgsize);
+   /* Make sure we have at least one suitable page size */
+   BUG_ON(!pgsizes);
 
-   /* pick the biggest page */
-   pgsize_idx = __fls(pgsize);
-   pgsize = 1UL << pgsize_idx;
+   /* Pick the biggest page size remaining */
+   pgsize_idx = __fls(pgsizes);
+   pgsize = BIT(pgsize_idx);
 
return pgsize;
 }
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[RFC PATCH v5 03/15] iommu/io-pgtable: Introduce map_pages() as a page table op

2021-04-08 Thread Isaac J. Manjarres
Mapping memory into io-pgtables follows the same semantics
that unmapping memory used to follow (i.e. a buffer will be
mapped one page block per call to the io-pgtable code). This
means that it can be optimized in the same way that unmapping
memory was, so add a map_pages() callback to the io-pgtable
ops structure, so that a range of pages of the same size
can be mapped within the same call.

Signed-off-by: Isaac J. Manjarres 
Suggested-by: Will Deacon 
---
 include/linux/io-pgtable.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index 2ed0c057d9e7..019149b204b8 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -143,6 +143,7 @@ struct io_pgtable_cfg {
  * struct io_pgtable_ops - Page table manipulation API for IOMMU drivers.
  *
  * @map:  Map a physically contiguous memory region.
+ * @map_pages:Map a physically contiguous range of pages of the same size.
  * @unmap:Unmap a physically contiguous memory region.
  * @unmap_pages:  Unmap a range of virtually contiguous pages of the same size.
  * @iova_to_phys: Translate iova to physical address.
@@ -153,6 +154,9 @@ struct io_pgtable_cfg {
 struct io_pgtable_ops {
int (*map)(struct io_pgtable_ops *ops, unsigned long iova,
   phys_addr_t paddr, size_t size, int prot, gfp_t gfp);
+   int (*map_pages)(struct io_pgtable_ops *ops, unsigned long iova,
+phys_addr_t paddr, size_t pgsize, size_t pgcount,
+int prot, gfp_t gfp, size_t *mapped);
size_t (*unmap)(struct io_pgtable_ops *ops, unsigned long iova,
size_t size, struct iommu_iotlb_gather *gather);
size_t (*unmap_pages)(struct io_pgtable_ops *ops, unsigned long iova,
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[RFC PATCH v5 01/15] iommu/io-pgtable: Introduce unmap_pages() as a page table op

2021-04-08 Thread Isaac J. Manjarres
The io-pgtable code expects to operate on a single block or
granule of memory that is supported by the IOMMU hardware when
unmapping memory.

This means that when a large buffer that consists of multiple
such blocks is unmapped, the io-pgtable code will walk the page
tables to the correct level to unmap each block, even for blocks
that are virtually contiguous and at the same level, which can
incur a performance overhead.

Introduce the unmap_pages() page table op to express to the
io-pgtable code that it should unmap a number of blocks of
the same size, instead of a single block. Doing so allows
multiple blocks to be unmapped in one call to the io-pgtable
code, reducing the number of page table walks, and indirect
calls.

Signed-off-by: Isaac J. Manjarres 
Suggested-by: Will Deacon 
Signed-off-by: Will Deacon 
---
 include/linux/io-pgtable.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index a4c9ca2c31f1..2ed0c057d9e7 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -144,6 +144,7 @@ struct io_pgtable_cfg {
  *
  * @map:  Map a physically contiguous memory region.
  * @unmap:Unmap a physically contiguous memory region.
+ * @unmap_pages:  Unmap a range of virtually contiguous pages of the same size.
  * @iova_to_phys: Translate iova to physical address.
  *
  * These functions map directly onto the iommu_ops member functions with
@@ -154,6 +155,9 @@ struct io_pgtable_ops {
   phys_addr_t paddr, size_t size, int prot, gfp_t gfp);
size_t (*unmap)(struct io_pgtable_ops *ops, unsigned long iova,
size_t size, struct iommu_iotlb_gather *gather);
+   size_t (*unmap_pages)(struct io_pgtable_ops *ops, unsigned long iova,
+ size_t pgsize, size_t pgcount,
+ struct iommu_iotlb_gather *gather);
phys_addr_t (*iova_to_phys)(struct io_pgtable_ops *ops,
unsigned long iova);
 };
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[RFC PATCH v5 02/15] iommu: Add an unmap_pages() op for IOMMU drivers

2021-04-08 Thread Isaac J. Manjarres
Add a callback for IOMMU drivers to provide a path for the
IOMMU framework to call into an IOMMU driver, which can call
into the io-pgtable code, to unmap a virtually contiguous
range of pages of the same size.

For IOMMU drivers that do not specify an unmap_pages() callback,
the existing logic of unmapping memory one page block at a time
will be used.

Signed-off-by: Isaac J. Manjarres 
Suggested-by: Will Deacon 
Signed-off-by: Will Deacon 
Acked-by: Lu Baolu 
---
 include/linux/iommu.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 5e7fe519430a..9cf81242581a 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -193,6 +193,7 @@ struct iommu_iotlb_gather {
  * @detach_dev: detach device from an iommu domain
  * @map: map a physically contiguous memory region to an iommu domain
  * @unmap: unmap a physically contiguous memory region from an iommu domain
+ * @unmap_pages: unmap a number of pages of the same size from an iommu domain
  * @flush_iotlb_all: Synchronously flush all hardware TLBs for this domain
  * @iotlb_sync_map: Sync mappings created recently using @map to the hardware
  * @iotlb_sync: Flush all queued ranges from the hardware TLBs and empty flush
@@ -245,6 +246,9 @@ struct iommu_ops {
   phys_addr_t paddr, size_t size, int prot, gfp_t gfp);
size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
 size_t size, struct iommu_iotlb_gather *iotlb_gather);
+   size_t (*unmap_pages)(struct iommu_domain *domain, unsigned long iova,
+ size_t pgsize, size_t pgcount,
+ struct iommu_iotlb_gather *iotlb_gather);
void (*flush_iotlb_all)(struct iommu_domain *domain);
void (*iotlb_sync_map)(struct iommu_domain *domain, unsigned long iova,
   size_t size);
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[RFC PATCH v5 00/15] Optimizing iommu_[map/unmap] performance

2021-04-08 Thread Isaac J. Manjarres
When unmapping a buffer from an IOMMU domain, the IOMMU framework unmaps
the buffer at a granule of the largest page size that is supported by
the IOMMU hardware and fits within the buffer. For every block that
is unmapped, the IOMMU framework will call into the IOMMU driver, and
then the io-pgtable framework to walk the page tables to find the entry
that corresponds to the IOVA, and then unmaps the entry.

This can be suboptimal in scenarios where a buffer or a piece of a
buffer can be split into several contiguous page blocks of the same size.
For example, consider an IOMMU that supports 4 KB page blocks, 2 MB page
blocks, and 1 GB page blocks, and a buffer that is 4 MB in size is being
unmapped at IOVA 0. The current call-flow will result in 4 indirect calls,
and 2 page table walks, to unmap 2 entries that are next to each other in
the page-tables, when both entries could have been unmapped in one shot
by clearing both page table entries in the same call.

The same optimization is applicable to mapping buffers as well, so
these patches implement a set of callbacks called unmap_pages and
map_pages to the io-pgtable code and IOMMU drivers which unmaps or maps
an IOVA range that consists of a number of pages of the same
page size that is supported by the IOMMU hardware, and allows for
manipulating multiple page table entries in the same set of indirect
calls. These are introduced as callbacks to give other IOMMU drivers and
io-pgtable formats time to convert, so that the transition to this
approach can be done piecemeal.
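
The caller-visible API does not change; only the number of internal
indirect calls shrinks. A sketch of the 4 MB example above (assuming
4 KB / 2 MB / 1 GB support):

	/* mapped/unmapped as two adjacent 2 MB blocks */
	iommu_map(domain, iova, paddr, SZ_4M, IOMMU_READ | IOMMU_WRITE);
	/* before: two ops->map() calls; after: one ops->map_pages(pgsize = 2M, pgcount = 2) */

	iommu_unmap(domain, iova, SZ_4M);
	/* before: two ops->unmap() calls; after: one ops->unmap_pages(pgsize = 2M, pgcount = 2) */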

Changes since V4:

* Fixed type for addr_merge from phys_addr_t to unsigned long so
  that GENMASK() can be used.
* Hooked up arm_v7s_[unmap/map]_pages to the io-pgtable ops.
* Introduced a macro for calculating the number of page table entries
  for the ARM LPAE io-pgtable format.

Changes since V3:

* Removed usage of ULL variants of bitops from Will's patches, as
  they were not needed.
* Instead of unmapping/mapping pgcount pages, unmap_pages() and
  map_pages() will at most unmap and map pgcount pages, allowing
  for part of the pages in pgcount to be mapped and unmapped. This
  was done to simplify the handling in the io-pgtable layer.
* Extended the existing PTE manipulation methods in io-pgtable-arm
  to handle multiple entries, per Robin's suggestion, eliminating
  the need to add functions to clear multiple PTEs.
* Implemented a naive form of [map/unmap]_pages() for ARM v7s io-pgtable
  format.
* arm_[v7s/lpae]_[map/unmap] will call
  arm_[v7s/lpae]_[map_pages/unmap_pages] with an argument of 1 page.
* The arm_smmu_[map/unmap] functions have been removed, since they
  have been replaced by arm_smmu_[map/unmap]_pages.

Changes since V2:

* Added a check in __iommu_map() to check for the existence
  of either the map or map_pages callback as per Lu's suggestion.

Changes since V1:

* Implemented the map_pages() callbacks
* Integrated Will's patches into this series which
  address several concerns about how iommu_pgsize() partitioned a
  buffer (I made a minor change to the patch which changes
  iommu_pgsize() to use bitmaps by using the ULL variants of
  the bitops)

Isaac J. Manjarres (12):
  iommu/io-pgtable: Introduce unmap_pages() as a page table op
  iommu: Add an unmap_pages() op for IOMMU drivers
  iommu/io-pgtable: Introduce map_pages() as a page table op
  iommu: Add a map_pages() op for IOMMU drivers
  iommu: Add support for the map_pages() callback
  iommu/io-pgtable-arm: Prepare PTE methods for handling multiple
entries
  iommu/io-pgtable-arm: Implement arm_lpae_unmap_pages()
  iommu/io-pgtable-arm: Implement arm_lpae_map_pages()
  iommu/io-pgtable-arm-v7s: Implement arm_v7s_unmap_pages()
  iommu/io-pgtable-arm-v7s: Implement arm_v7s_map_pages()
  iommu/arm-smmu: Implement the unmap_pages() IOMMU driver callback
  iommu/arm-smmu: Implement the map_pages() IOMMU driver callback

Will Deacon (3):
  iommu: Use bitmap to calculate page size in iommu_pgsize()
  iommu: Split 'addr_merge' argument to iommu_pgsize() into separate
parts
  iommu: Hook up '->unmap_pages' driver callback

 drivers/iommu/arm/arm-smmu/arm-smmu.c |  18 +--
 drivers/iommu/io-pgtable-arm-v7s.c|  50 ++-
 drivers/iommu/io-pgtable-arm.c| 189 +-
 drivers/iommu/iommu.c | 130 +-
 include/linux/io-pgtable.h|   8 ++
 include/linux/iommu.h |   9 ++
 6 files changed, 289 insertions(+), 115 deletions(-)

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[PATCH 08/16] PCI/P2PDMA: Introduce helpers for dma_map_sg implementations

2021-04-08 Thread Logan Gunthorpe
Add pci_p2pdma_map_segment() as a helper for simple dma_map_sg()
implementations. It takes an scatterlist segment that must point to a
pci_p2pdma struct page and will map it if the mapping requires a bus
address.

The return value indicates whether the mapping required a bus address
or whether the caller still needs to map the segment normally. If the
segment should not be mapped, -EREMOTEIO is returned.

This helper uses a state structure to track the changes to the
pgmap across calls and avoids needing to look up the xarray for
every page.

Also add pci_p2pdma_map_bus_segment() which is useful for IOMMU
dma_map_sg() implementations where the sg segment containing the page
differs from the sg segment containing the DMA address.

Signed-off-by: Logan Gunthorpe 
---
 drivers/pci/p2pdma.c   | 65 ++
 include/linux/pci-p2pdma.h | 21 
 2 files changed, 86 insertions(+)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index 38c93f57a941..44ad7664e875 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -923,6 +923,71 @@ void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct 
scatterlist *sg,
 }
 EXPORT_SYMBOL_GPL(pci_p2pdma_unmap_sg_attrs);
 
+/**
+ * pci_p2pdma_map_segment - map an sg segment determining the mapping type
+ * @state: State structure that should be declared on the stack outside of
+ * the for_each_sg() loop and initialized to zero.
+ * @dev: DMA device that's doing the mapping operation
+ * @sg: scatterlist segment to map
+ * @attrs: dma mapping attributes
+ *
+ * This is a helper to be used by non-iommu dma_map_sg() implementations where
+ * the sg segment is the same for the page_link and the dma_address.
+ *
+ * Attempt to map a single segment in an SGL with the PCI bus address.
+ * The segment must point to a PCI P2PDMA page and thus must be
+ * wrapped in a is_pci_p2pdma_page(sg_page(sg)) check.
+ *
+ * Returns 1 if the segment was mapped, 0 if the segment should be mapped
+ * directly (or through the IOMMU) and -EREMOTEIO if the segment should not
+ * be mapped at all.
+ */
+int pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state,
+  struct device *dev, struct scatterlist *sg,
+  unsigned long dma_attrs)
+{
+   if (state->pgmap != sg_page(sg)->pgmap) {
+   state->pgmap = sg_page(sg)->pgmap;
+   state->map = pci_p2pdma_map_type(state->pgmap, dev, dma_attrs);
+   state->bus_off = to_p2p_pgmap(state->pgmap)->bus_offset;
+   }
+
+   switch (state->map) {
+   case PCI_P2PDMA_MAP_BUS_ADDR:
+   sg->dma_address = sg_phys(sg) + state->bus_off;
+   sg_dma_len(sg) = sg->length;
+   sg_mark_pci_p2pdma(sg);
+   return 1;
+   case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
+   return 0;
+   default:
+   return -EREMOTEIO;
+   }
+}
+
+/**
+ * pci_p2pdma_map_bus_segment - map an sg segment pre determined to
+ * be mapped with PCI_P2PDMA_MAP_BUS_ADDR
+ * @pg_sg: scatterlist segment with the page to map
+ * @dma_sg: scatterlist segment to assign a dma address to
+ *
+ * This is a helper for iommu dma_map_sg() implementations when the
+ * segment for the dma address differs from the segment containing the
+ * source page.
+ *
+ * pci_p2pdma_map_type() must have already been called on the pg_sg and
+ * returned PCI_P2PDMA_MAP_BUS_ADDR.
+ */
+void pci_p2pdma_map_bus_segment(struct scatterlist *pg_sg,
+   struct scatterlist *dma_sg)
+{
+   struct pci_p2pdma_pagemap *pgmap = to_p2p_pgmap(sg_page(pg_sg)->pgmap);
+
+   dma_sg->dma_address = sg_phys(pg_sg) + pgmap->bus_offset;
+   sg_dma_len(dma_sg) = pg_sg->length;
+   sg_mark_pci_p2pdma(dma_sg);
+}
+
 /**
  * pci_p2pdma_enable_store - parse a configfs/sysfs attribute store
  * to enable p2pdma
diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h
index a06072ac3a52..49e7679403cf 100644
--- a/include/linux/pci-p2pdma.h
+++ b/include/linux/pci-p2pdma.h
@@ -13,6 +13,12 @@
 
 #include 
 
+struct pci_p2pdma_map_state {
+   struct dev_pagemap *pgmap;
+   int map;
+   u64 bus_off;
+};
+
 struct block_device;
 struct scatterlist;
 
@@ -43,6 +49,11 @@ int pci_p2pdma_map_sg_attrs(struct device *dev, struct 
scatterlist *sg,
int nents, enum dma_data_direction dir, unsigned long attrs);
 void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
int nents, enum dma_data_direction dir, unsigned long attrs);
+int pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state,
+   struct device *dev, struct scatterlist *sg,
+   unsigned long dma_attrs);
+void pci_p2pdma_map_bus_segment(struct scatterlist *pg_sg,
+   struct scatterlist *dma_sg);
 int pci_p2pdma_enable_store(const char *page, struct pci_dev **p2p_dev,
   

[PATCH 05/16] dma-mapping: Introduce dma_map_sg_p2pdma()

2021-04-08 Thread Logan Gunthorpe
dma_map_sg() either returns a positive number indicating the number
of entries mapped or zero indicating that resources were not available
to create the mapping. When zero is returned, it is always safe to retry
the mapping later once resources have been freed.

Once P2PDMA pages are mixed into the SGL there may be pages that may
never be successfully mapped with a given device because that device may
not actually be able to access those pages. Thus, multiple error
conditions will need to be distinguished to determine whether a retry
is safe.

Introduce dma_map_sg_p2pdma[_attrs]() with a different calling
convention from dma_map_sg(). The function will return a positive
integer on success or a negative errno on failure.

ENOMEM will be used to indicate a resource failure and EREMOTEIO to
indicate that a P2PDMA page is not mappable.

The __DMA_ATTR_PCI_P2PDMA attribute is introduced to inform the lower
level implementations that P2PDMA pages are allowed and to warn if a
caller introduces them into the regular dma_map_sg() interface.
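
A caller-side sketch of the new convention (illustrative only):

	int ret = dma_map_sg_p2pdma(dev, sgl, nents, DMA_TO_DEVICE);

	if (ret == -ENOMEM)	/* resource failure: safe to retry later */
		return ret;
	if (ret < 0)		/* e.g. -EREMOTEIO: P2PDMA page not mappable, do not retry */
		return ret;
	/* ret > 0: number of mapped DMA segments */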

Signed-off-by: Logan Gunthorpe 
---
 include/linux/dma-mapping.h | 15 +++
 kernel/dma/mapping.c| 52 -
 2 files changed, 61 insertions(+), 6 deletions(-)

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 2a984cb4d1e0..50b8f586cf59 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -60,6 +60,12 @@
  * at least read-only at lesser-privileged levels).
  */
 #define DMA_ATTR_PRIVILEGED(1UL << 9)
+/*
+ * __DMA_ATTR_PCI_P2PDMA: This should not be used directly, use
+ * dma_map_sg_p2pdma() instead. Used internally to indicate that the
+ * caller is using the dma_map_sg_p2pdma() interface.
+ */
+#define __DMA_ATTR_PCI_P2PDMA  (1UL << 10)
 
 /*
  * A dma_addr_t can hold any valid DMA or bus address for the platform.  It can
@@ -107,6 +113,8 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t 
addr, size_t size,
enum dma_data_direction dir, unsigned long attrs);
 int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg, int nents,
enum dma_data_direction dir, unsigned long attrs);
+int dma_map_sg_p2pdma_attrs(struct device *dev, struct scatterlist *sg,
+   int nents, enum dma_data_direction dir, unsigned long attrs);
 void dma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
  int nents, enum dma_data_direction dir,
  unsigned long attrs);
@@ -160,6 +168,12 @@ static inline int dma_map_sg_attrs(struct device *dev, 
struct scatterlist *sg,
 {
return 0;
 }
+static inline int dma_map_sg_p2pdma_attrs(struct device *dev,
+   struct scatterlist *sg, int nents, enum dma_data_direction dir,
+   unsigned long attrs)
+{
+   return 0;
+}
 static inline void dma_unmap_sg_attrs(struct device *dev,
struct scatterlist *sg, int nents, enum dma_data_direction dir,
unsigned long attrs)
@@ -392,6 +406,7 @@ static inline void dma_sync_sgtable_for_device(struct 
device *dev,
 #define dma_map_single(d, a, s, r) dma_map_single_attrs(d, a, s, r, 0)
 #define dma_unmap_single(d, a, s, r) dma_unmap_single_attrs(d, a, s, r, 0)
 #define dma_map_sg(d, s, n, r) dma_map_sg_attrs(d, s, n, r, 0)
+#define dma_map_sg_p2pdma(d, s, n, r) dma_map_sg_p2pdma_attrs(d, s, n, r, 0)
 #define dma_unmap_sg(d, s, n, r) dma_unmap_sg_attrs(d, s, n, r, 0)
 #define dma_map_page(d, p, o, s, r) dma_map_page_attrs(d, p, o, s, r, 0)
 #define dma_unmap_page(d, a, s, r) dma_unmap_page_attrs(d, a, s, r, 0)
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index b6a633679933..923089c4267b 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -177,12 +177,8 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t 
addr, size_t size,
 }
 EXPORT_SYMBOL(dma_unmap_page_attrs);
 
-/*
- * dma_maps_sg_attrs returns 0 on error and > 0 on success.
- * It should never return a value < 0.
- */
-int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg, int nents,
-   enum dma_data_direction dir, unsigned long attrs)
+static int __dma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
+   int nents, enum dma_data_direction dir, unsigned long attrs)
 {
const struct dma_map_ops *ops = get_dma_ops(dev);
int ents;
@@ -197,6 +193,20 @@ int dma_map_sg_attrs(struct device *dev, struct 
scatterlist *sg, int nents,
ents = dma_direct_map_sg(dev, sg, nents, dir, attrs);
else
ents = ops->map_sg(dev, sg, nents, dir, attrs);
+
+   return ents;
+}
+
+/*
+ * dma_maps_sg_attrs returns 0 on error and > 0 on success.
+ * It should never return a value < 0.
+ */
+int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg, int nents,
+   enum dma_data_direction dir, unsigned long attrs)
+{
+   int ents;
+
+   

[PATCH 10/16] dma-mapping: Add flags to dma_map_ops to indicate PCI P2PDMA support

2021-04-08 Thread Logan Gunthorpe
Add a flags member to the dma_map_ops structure with one flag to
indicate support for PCI P2PDMA.

Also, add a helper to check if a device supports PCI P2PDMA.
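
For illustration, a usage sketch (hypothetical driver check, not part of this
patch) mirroring how later patches in this series gate P2PDMA on the helper:

	/* Only advertise P2PDMA when the device's DMA ops can handle it. */
	static bool example_can_use_p2pdma(struct pci_dev *pdev)
	{
		return dma_pci_p2pdma_supported(&pdev->dev);
	}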

Signed-off-by: Logan Gunthorpe 
---
 include/linux/dma-map-ops.h |  3 +++
 include/linux/dma-mapping.h |  5 +
 kernel/dma/mapping.c| 18 ++
 3 files changed, 26 insertions(+)

diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index 51872e736e7b..481892822104 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -12,6 +12,9 @@
 struct cma;
 
 struct dma_map_ops {
+   unsigned int flags;
+#define DMA_F_PCI_P2PDMA_SUPPORTED (1 << 0)
+
void *(*alloc)(struct device *dev, size_t size,
dma_addr_t *dma_handle, gfp_t gfp,
unsigned long attrs);
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 50b8f586cf59..c31980ecca62 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -146,6 +146,7 @@ int dma_mmap_attrs(struct device *dev, struct 
vm_area_struct *vma,
unsigned long attrs);
 bool dma_can_mmap(struct device *dev);
 int dma_supported(struct device *dev, u64 mask);
+bool dma_pci_p2pdma_supported(struct device *dev);
 int dma_set_mask(struct device *dev, u64 mask);
 int dma_set_coherent_mask(struct device *dev, u64 mask);
 u64 dma_get_required_mask(struct device *dev);
@@ -247,6 +248,10 @@ static inline int dma_supported(struct device *dev, u64 
mask)
 {
return 0;
 }
+static inline bool dma_pci_p2pdma_supported(struct device *dev)
+{
+   return 0;
+}
 static inline int dma_set_mask(struct device *dev, u64 mask)
 {
return -EIO;
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 923089c4267b..ce44a0fcc4e5 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -573,6 +573,24 @@ int dma_supported(struct device *dev, u64 mask)
 }
 EXPORT_SYMBOL(dma_supported);
 
+bool dma_pci_p2pdma_supported(struct device *dev)
+{
+   const struct dma_map_ops *ops = get_dma_ops(dev);
+
+   /* if ops is not set, dma direct will be used which supports P2PDMA */
+   if (!ops)
+   return true;
+
+   /*
+* Note: dma_ops_bypass is not checked here because P2PDMA should
+* not be used with dma mapping ops that do not have support even
+* if the specific device is bypassing them.
+*/
+
+   return ops->flags & DMA_F_PCI_P2PDMA_SUPPORTED;
+}
+EXPORT_SYMBOL_GPL(dma_pci_p2pdma_supported);
+
 #ifdef CONFIG_ARCH_HAS_DMA_SET_MASK
 void arch_dma_set_mask(struct device *dev, u64 mask);
 #else
-- 
2.20.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 09/16] dma-direct: Support PCI P2PDMA pages in dma-direct map_sg

2021-04-08 Thread Logan Gunthorpe
Add PCI P2PDMA support for dma_direct_map_sg() so that it can map
PCI P2PDMA pages directly without a hack in the callers. This allows
for heterogeneous SGLs that contain both P2PDMA and regular pages.

SGL segments that contain PCI bus addresses are marked with
sg_mark_pci_p2pdma() and are ignored when unmapped.

Signed-off-by: Logan Gunthorpe 
---
 kernel/dma/direct.c | 25 ++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 002268262c9a..108dfb4ecbd5 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "direct.h"
 
 /*
@@ -387,19 +388,37 @@ void dma_direct_unmap_sg(struct device *dev, struct 
scatterlist *sgl,
struct scatterlist *sg;
int i;
 
-   for_each_sg(sgl, sg, nents, i)
+   for_each_sg(sgl, sg, nents, i) {
+   if (sg_is_pci_p2pdma(sg)) {
+   sg_unmark_pci_p2pdma(sg);
+   continue;
+   }
+
dma_direct_unmap_page(dev, sg->dma_address, sg_dma_len(sg), dir,
 attrs);
+   }
 }
 #endif
 
 int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
enum dma_data_direction dir, unsigned long attrs)
 {
-   int i;
+   struct pci_p2pdma_map_state p2pdma_state = {};
struct scatterlist *sg;
+   int i, ret = 0;
 
for_each_sg(sgl, sg, nents, i) {
+   if (is_pci_p2pdma_page(sg_page(sg))) {
+   ret = pci_p2pdma_map_segment(&p2pdma_state, dev, sg,
+attrs);
+   if (ret < 0) {
+   goto out_unmap;
+   } else if (ret) {
+   ret = 0;
+   continue;
+   }
+   }
+
sg->dma_address = dma_direct_map_page(dev, sg_page(sg),
sg->offset, sg->length, dir, attrs);
if (sg->dma_address == DMA_MAPPING_ERROR)
@@ -411,7 +430,7 @@ int dma_direct_map_sg(struct device *dev, struct 
scatterlist *sgl, int nents,
 
 out_unmap:
dma_direct_unmap_sg(dev, sgl, i, dir, attrs | DMA_ATTR_SKIP_CPU_SYNC);
-   return 0;
+   return ret;
 }
 
 dma_addr_t dma_direct_map_resource(struct device *dev, phys_addr_t paddr,
-- 
2.20.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 07/16] PCI/P2PDMA: Make pci_p2pdma_map_type() non-static

2021-04-08 Thread Logan Gunthorpe
pci_p2pdma_map_type() will be needed by the dma-iommu map_sg
implementation because it will need to determine the mapping type
ahead of actually doing the mapping to create the actual iommu mapping.
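
For context, a sketch of the intended call pattern (modelled on the dma-iommu
patch later in this series; the wrapper function below is hypothetical):

	static int example_segment_strategy(struct device *dev,
					    struct scatterlist *sg)
	{
		switch (pci_p2pdma_map_type(sg_page(sg)->pgmap, dev,
					    __DMA_ATTR_PCI_P2PDMA)) {
		case PCI_P2PDMA_MAP_BUS_ADDR:
			return 1;	/* use the PCI bus address, skip the IOVA */
		case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
			return 0;	/* map normally through the host bridge */
		default:
			return -EREMOTEIO;	/* not mappable for this device */
		}
	}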

Signed-off-by: Logan Gunthorpe 
---
 drivers/pci/p2pdma.c   | 34 +++---
 include/linux/pci-p2pdma.h | 15 +++
 2 files changed, 38 insertions(+), 11 deletions(-)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index bcb1a6d6119d..38c93f57a941 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -20,13 +20,6 @@
 #include 
 #include 
 
-enum pci_p2pdma_map_type {
-   PCI_P2PDMA_MAP_UNKNOWN = 0,
-   PCI_P2PDMA_MAP_NOT_SUPPORTED,
-   PCI_P2PDMA_MAP_BUS_ADDR,
-   PCI_P2PDMA_MAP_THRU_HOST_BRIDGE,
-};
-
 struct pci_p2pdma {
struct gen_pool *pool;
bool p2pmem_published;
@@ -822,13 +815,30 @@ void pci_p2pmem_publish(struct pci_dev *pdev, bool 
publish)
 }
 EXPORT_SYMBOL_GPL(pci_p2pmem_publish);
 
-static enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
-   struct device *dev)
+/**
+ * pci_p2pdma_map_type - return the type of mapping that should be used for
+ * a given device and pgmap
+ * @pgmap: the pagemap of a page to determine the mapping type for
+ * @dev: device that is mapping the page
+ * @dma_attrs: the attributes passed to the dma_map operation --
+ * this is so they can be checked to ensure P2PDMA pages were not
+ * introduced into an incorrect interface (like dma_map_sg).
+ *
+ * Returns one of:
+ * PCI_P2PDMA_MAP_NOT_SUPPORTED - The mapping should not be done
+ * PCI_P2PDMA_MAP_BUS_ADDR - The mapping should use the PCI bus address
+ * PCI_P2PDMA_MAP_THRU_HOST_BRIDGE - The mapping should be done directly
+ */
+enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
+   struct device *dev, unsigned long dma_attrs)
 {
struct pci_dev *provider = to_p2p_pgmap(pgmap)->provider;
enum pci_p2pdma_map_type ret;
struct pci_dev *client;
 
+   WARN_ONCE(!(dma_attrs & __DMA_ATTR_PCI_P2PDMA),
+ "PCI P2PDMA pages were mapped with dma_map_sg!");
+
if (!provider->p2pdma)
return PCI_P2PDMA_MAP_NOT_SUPPORTED;
 
@@ -879,7 +889,8 @@ int pci_p2pdma_map_sg_attrs(struct device *dev, struct 
scatterlist *sg,
struct pci_p2pdma_pagemap *p2p_pgmap =
to_p2p_pgmap(sg_page(sg)->pgmap);
 
-   switch (pci_p2pdma_map_type(sg_page(sg)->pgmap, dev)) {
+   switch (pci_p2pdma_map_type(sg_page(sg)->pgmap, dev,
+   __DMA_ATTR_PCI_P2PDMA)) {
case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
return dma_map_sg_attrs(dev, sg, nents, dir, attrs);
case PCI_P2PDMA_MAP_BUS_ADDR:
@@ -904,7 +915,8 @@ void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct 
scatterlist *sg,
 {
enum pci_p2pdma_map_type map_type;
 
-   map_type = pci_p2pdma_map_type(sg_page(sg)->pgmap, dev);
+   map_type = pci_p2pdma_map_type(sg_page(sg)->pgmap, dev,
+  __DMA_ATTR_PCI_P2PDMA);
 
if (map_type == PCI_P2PDMA_MAP_THRU_HOST_BRIDGE)
dma_unmap_sg_attrs(dev, sg, nents, dir, attrs);
diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h
index 8318a97c9c61..a06072ac3a52 100644
--- a/include/linux/pci-p2pdma.h
+++ b/include/linux/pci-p2pdma.h
@@ -16,6 +16,13 @@
 struct block_device;
 struct scatterlist;
 
+enum pci_p2pdma_map_type {
+   PCI_P2PDMA_MAP_UNKNOWN = 0,
+   PCI_P2PDMA_MAP_NOT_SUPPORTED,
+   PCI_P2PDMA_MAP_BUS_ADDR,
+   PCI_P2PDMA_MAP_THRU_HOST_BRIDGE,
+};
+
 #ifdef CONFIG_PCI_P2PDMA
 int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
u64 offset);
@@ -30,6 +37,8 @@ struct scatterlist *pci_p2pmem_alloc_sgl(struct pci_dev *pdev,
 unsigned int *nents, u32 length);
 void pci_p2pmem_free_sgl(struct pci_dev *pdev, struct scatterlist *sgl);
 void pci_p2pmem_publish(struct pci_dev *pdev, bool publish);
+enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
+   struct device *dev, unsigned long dma_attrs);
 int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
int nents, enum dma_data_direction dir, unsigned long attrs);
 void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
@@ -83,6 +92,12 @@ static inline void pci_p2pmem_free_sgl(struct pci_dev *pdev,
 static inline void pci_p2pmem_publish(struct pci_dev *pdev, bool publish)
 {
 }
+static inline enum pci_p2pdma_map_type pci_p2pdma_map_type(
+   struct dev_pagemap *pgmap, struct device *dev,
+   unsigned long dma_attrs)
+{
+   return PCI_P2PDMA_MAP_NOT_SUPPORTED;
+}
 static inline int pci_p2pdma_map_sg_attrs(struct device *dev,
struct scatterlist *sg, int 

[PATCH 16/16] PCI/P2PDMA: Remove pci_p2pdma_[un]map_sg()

2021-04-08 Thread Logan Gunthorpe
This interface is superseded by the new dma_map_sg_p2pdma() interface
which supports heterogeneous scatterlists. There are no longer
any users, so remove it.

Signed-off-by: Logan Gunthorpe 
---
 drivers/pci/p2pdma.c   | 67 --
 include/linux/pci-p2pdma.h | 27 ---
 2 files changed, 94 deletions(-)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index 44ad7664e875..2f2adcccfa11 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -856,73 +856,6 @@ enum pci_p2pdma_map_type pci_p2pdma_map_type(struct 
dev_pagemap *pgmap,
 GFP_ATOMIC);
 }
 
-static int __pci_p2pdma_map_sg(struct pci_p2pdma_pagemap *p2p_pgmap,
-   struct device *dev, struct scatterlist *sg, int nents)
-{
-   struct scatterlist *s;
-   int i;
-
-   for_each_sg(sg, s, nents, i) {
-   s->dma_address = sg_phys(s) - p2p_pgmap->bus_offset;
-   sg_dma_len(s) = s->length;
-   }
-
-   return nents;
-}
-
-/**
- * pci_p2pdma_map_sg_attrs - map a PCI peer-to-peer scatterlist for DMA
- * @dev: device doing the DMA request
- * @sg: scatter list to map
- * @nents: elements in the scatterlist
- * @dir: DMA direction
- * @attrs: DMA attributes passed to dma_map_sg() (if called)
- *
- * Scatterlists mapped with this function should be unmapped using
- * pci_p2pdma_unmap_sg_attrs().
- *
- * Returns the number of SG entries mapped or 0 on error.
- */
-int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
-   int nents, enum dma_data_direction dir, unsigned long attrs)
-{
-   struct pci_p2pdma_pagemap *p2p_pgmap =
-   to_p2p_pgmap(sg_page(sg)->pgmap);
-
-   switch (pci_p2pdma_map_type(sg_page(sg)->pgmap, dev,
-   __DMA_ATTR_PCI_P2PDMA)) {
-   case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
-   return dma_map_sg_attrs(dev, sg, nents, dir, attrs);
-   case PCI_P2PDMA_MAP_BUS_ADDR:
-   return __pci_p2pdma_map_sg(p2p_pgmap, dev, sg, nents);
-   default:
-   return 0;
-   }
-}
-EXPORT_SYMBOL_GPL(pci_p2pdma_map_sg_attrs);
-
-/**
- * pci_p2pdma_unmap_sg_attrs - unmap a PCI peer-to-peer scatterlist that was
- * mapped with pci_p2pdma_map_sg()
- * @dev: device doing the DMA request
- * @sg: scatter list to map
- * @nents: number of elements returned by pci_p2pdma_map_sg()
- * @dir: DMA direction
- * @attrs: DMA attributes passed to dma_unmap_sg() (if called)
- */
-void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
-   int nents, enum dma_data_direction dir, unsigned long attrs)
-{
-   enum pci_p2pdma_map_type map_type;
-
-   map_type = pci_p2pdma_map_type(sg_page(sg)->pgmap, dev,
-  __DMA_ATTR_PCI_P2PDMA);
-
-   if (map_type == PCI_P2PDMA_MAP_THRU_HOST_BRIDGE)
-   dma_unmap_sg_attrs(dev, sg, nents, dir, attrs);
-}
-EXPORT_SYMBOL_GPL(pci_p2pdma_unmap_sg_attrs);
-
 /**
  * pci_p2pdma_map_segment - map an sg segment determining the mapping type
  * @state: State structure that should be declared on the stack outside of
diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h
index 49e7679403cf..2ec9c75fa097 100644
--- a/include/linux/pci-p2pdma.h
+++ b/include/linux/pci-p2pdma.h
@@ -45,10 +45,6 @@ void pci_p2pmem_free_sgl(struct pci_dev *pdev, struct 
scatterlist *sgl);
 void pci_p2pmem_publish(struct pci_dev *pdev, bool publish);
 enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
struct device *dev, unsigned long dma_attrs);
-int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
-   int nents, enum dma_data_direction dir, unsigned long attrs);
-void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
-   int nents, enum dma_data_direction dir, unsigned long attrs);
 int pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state,
struct device *dev, struct scatterlist *sg,
unsigned long dma_attrs);
@@ -109,17 +105,6 @@ static inline enum pci_p2pdma_map_type pci_p2pdma_map_type(
 {
return PCI_P2PDMA_MAP_NOT_SUPPORTED;
 }
-static inline int pci_p2pdma_map_sg_attrs(struct device *dev,
-   struct scatterlist *sg, int nents, enum dma_data_direction dir,
-   unsigned long attrs)
-{
-   return 0;
-}
-static inline void pci_p2pdma_unmap_sg_attrs(struct device *dev,
-   struct scatterlist *sg, int nents, enum dma_data_direction dir,
-   unsigned long attrs)
-{
-}
 static inline int pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state,
struct device *dev, struct scatterlist *sg,
unsigned long dma_attrs)
@@ -155,16 +140,4 @@ static inline struct pci_dev *pci_p2pmem_find(struct 
device *client)
	return pci_p2pmem_find_many(&client, 1);
 }
 
-static inline int 

[PATCH 13/16] nvme-pci: Convert to using dma_map_sg_p2pdma for p2pdma pages

2021-04-08 Thread Logan Gunthorpe
Convert to using dma_map_sg_p2pdma() for PCI p2pdma pages.

This should be equivalent but allows for heterogeneous scatterlists
with both P2PDMA and regular pages. However, P2PDMA support will be
slightly more restricted (only dma-direct and dma-iommu are currently
supported).

Signed-off-by: Logan Gunthorpe 
---
 drivers/nvme/host/pci.c | 28 
 1 file changed, 8 insertions(+), 20 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 14f092973792..a1ed07ff38b7 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -577,17 +577,6 @@ static void nvme_free_sgls(struct nvme_dev *dev, struct 
request *req)
 
 }
 
-static void nvme_unmap_sg(struct nvme_dev *dev, struct request *req)
-{
-   struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
-
-   if (is_pci_p2pdma_page(sg_page(iod->sg)))
-   pci_p2pdma_unmap_sg(dev->dev, iod->sg, iod->nents,
-   rq_dma_dir(req));
-   else
-   dma_unmap_sg(dev->dev, iod->sg, iod->nents, rq_dma_dir(req));
-}
-
 static void nvme_unmap_data(struct nvme_dev *dev, struct request *req)
 {
struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
@@ -600,7 +589,7 @@ static void nvme_unmap_data(struct nvme_dev *dev, struct 
request *req)
 
WARN_ON_ONCE(!iod->nents);
 
-   nvme_unmap_sg(dev, req);
+   dma_unmap_sg(dev->dev, iod->sg, iod->nents, rq_dma_dir(req));
if (iod->npages == 0)
dma_pool_free(dev->prp_small_pool, nvme_pci_iod_list(req)[0],
  iod->first_dma);
@@ -868,14 +857,13 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, 
struct request *req,
if (!iod->nents)
goto out_free_sg;
 
-   if (is_pci_p2pdma_page(sg_page(iod->sg)))
-   nr_mapped = pci_p2pdma_map_sg_attrs(dev->dev, iod->sg,
-   iod->nents, rq_dma_dir(req), DMA_ATTR_NO_WARN);
-   else
-   nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents,
-rq_dma_dir(req), DMA_ATTR_NO_WARN);
-   if (!nr_mapped)
+   nr_mapped = dma_map_sg_p2pdma_attrs(dev->dev, iod->sg, iod->nents,
+rq_dma_dir(req), DMA_ATTR_NO_WARN);
+   if (nr_mapped < 0) {
+   if (nr_mapped != -ENOMEM)
+   ret = BLK_STS_TARGET;
goto out_free_sg;
+   }
 
iod->use_sgl = nvme_pci_use_sgls(dev, req);
if (iod->use_sgl)
@@ -887,7 +875,7 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, 
struct request *req,
return BLK_STS_OK;
 
 out_unmap_sg:
-   nvme_unmap_sg(dev, req);
+   dma_unmap_sg(dev->dev, iod->sg, iod->nents, rq_dma_dir(req));
 out_free_sg:
mempool_free(iod->sg, dev->iod_mempool);
return ret;
-- 
2.20.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 06/16] lib/scatterlist: Add flag for indicating P2PDMA segments in an SGL

2021-04-08 Thread Logan Gunthorpe
Make use of the third free LSB in scatterlist's page_link on 64bit systems.

The extra bit will be used by dma_[un]map_sg_p2pdma() to determine when a
given SGL segment's dma_address points to a PCI bus address.
dma_unmap_sg_p2pdma() will need to perform different cleanup when a
segment is marked as P2PDMA.

Using this bit requires adding an additional dependency on CONFIG_64BIT to
CONFIG_PCI_P2PDMA. This should be acceptable as the majority of P2PDMA
use cases are restricted to newer root complexes and roughly require the
extra address space for memory BARs used in the transactions.
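
For illustration, a rough sketch of how the new bit is meant to be used by a
map/unmap implementation (modelled on the dma-direct patch in this series; the
wrapper functions are hypothetical):

	static void example_mark_bus_segment(struct scatterlist *sg,
					     dma_addr_t bus_addr)
	{
		sg->dma_address = bus_addr;	/* PCI bus address, not an IOVA */
		sg_dma_len(sg) = sg->length;
		sg_mark_pci_p2pdma(sg);		/* steals bit 2 of page_link */
	}

	static void example_unmap_segment(struct device *dev,
					  struct scatterlist *sg,
					  enum dma_data_direction dir)
	{
		if (sg_is_pci_p2pdma(sg)) {
			sg_unmark_pci_p2pdma(sg);	/* nothing was mapped */
			return;
		}
		dma_unmap_page(dev, sg->dma_address, sg_dma_len(sg), dir);
	}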

Signed-off-by: Logan Gunthorpe 
---
 drivers/pci/Kconfig |  2 +-
 include/linux/scatterlist.h | 49 ++---
 2 files changed, 46 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index 0c473d75e625..90b4bddb3300 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -163,7 +163,7 @@ config PCI_PASID
 
 config PCI_P2PDMA
bool "PCI peer-to-peer transfer support"
-   depends on ZONE_DEVICE
+   depends on ZONE_DEVICE && 64BIT
select GENERIC_ALLOCATOR
help
  Enables drivers to do PCI peer-to-peer transactions to and from
diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 6f70572b2938..5525d3ebf36f 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -58,6 +58,21 @@ struct sg_table {
 #define SG_CHAIN   0x01UL
 #define SG_END 0x02UL
 
+/*
+ * bit 2 is the third free bit in the page_link on 64bit systems which
+ * is used by dma_unmap_sg() to determine if the dma_address is a PCI
+ * bus address when doing P2PDMA.
+ * Note: CONFIG_PCI_P2PDMA depends on CONFIG_64BIT because of this.
+ */
+
+#ifdef CONFIG_PCI_P2PDMA
+#define SG_PCI_P2PDMA  0x04UL
+#else
+#define SG_PCI_P2PDMA  0x00UL
+#endif
+
+#define SG_PAGE_LINK_MASK (SG_CHAIN | SG_END | SG_PCI_P2PDMA)
+
 /*
  * We overload the LSB of the page pointer to indicate whether it's
  * a valid sg entry, or whether it points to the start of a new scatterlist.
@@ -65,8 +80,9 @@ struct sg_table {
  */
 #define sg_is_chain(sg)((sg)->page_link & SG_CHAIN)
 #define sg_is_last(sg) ((sg)->page_link & SG_END)
+#define sg_is_pci_p2pdma(sg)   ((sg)->page_link & SG_PCI_P2PDMA)
 #define sg_chain_ptr(sg)   \
-   ((struct scatterlist *) ((sg)->page_link & ~(SG_CHAIN | SG_END)))
+   ((struct scatterlist *) ((sg)->page_link & ~SG_PAGE_LINK_MASK))
 
 /**
  * sg_assign_page - Assign a given page to an SG entry
@@ -80,13 +96,13 @@ struct sg_table {
  **/
 static inline void sg_assign_page(struct scatterlist *sg, struct page *page)
 {
-   unsigned long page_link = sg->page_link & (SG_CHAIN | SG_END);
+   unsigned long page_link = sg->page_link & SG_PAGE_LINK_MASK;
 
/*
 * In order for the low bit stealing approach to work, pages
 * must be aligned at a 32-bit boundary as a minimum.
 */
-   BUG_ON((unsigned long) page & (SG_CHAIN | SG_END));
+   BUG_ON((unsigned long) page & SG_PAGE_LINK_MASK);
 #ifdef CONFIG_DEBUG_SG
BUG_ON(sg_is_chain(sg));
 #endif
@@ -120,7 +136,7 @@ static inline struct page *sg_page(struct scatterlist *sg)
 #ifdef CONFIG_DEBUG_SG
BUG_ON(sg_is_chain(sg));
 #endif
-   return (struct page *)((sg)->page_link & ~(SG_CHAIN | SG_END));
+   return (struct page *)((sg)->page_link & ~SG_PAGE_LINK_MASK);
 }
 
 /**
@@ -222,6 +238,31 @@ static inline void sg_unmark_end(struct scatterlist *sg)
sg->page_link &= ~SG_END;
 }
 
+/**
+ * sg_mark_pci_p2pdma - Mark the scatterlist entry for PCI p2pdma
+ * @sg: SG entry
+ *
+ * Description:
+ *   Marks the passed in sg entry to indicate that the dma_address is
+ *   a PCI bus address.
+ **/
+static inline void sg_mark_pci_p2pdma(struct scatterlist *sg)
+{
+   sg->page_link |= SG_PCI_P2PDMA;
+}
+
+/**
+ * sg_unmark_pci_p2pdma - Unmark the scatterlist entry for PCI p2pdma
+ * @sg: SG entry
+ *
+ * Description:
+ *   Clears the PCI P2PDMA mark
+ **/
+static inline void sg_unmark_pci_p2pdma(struct scatterlist *sg)
+{
+   sg->page_link &= ~SG_PCI_P2PDMA;
+}
+
 /**
  * sg_phys - Return physical address of an sg entry
  * @sg: SG entry
-- 
2.20.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH 12/16] nvme-pci: Check DMA ops when indicating support for PCI P2PDMA

2021-04-08 Thread Logan Gunthorpe
Introduce a supports_pci_p2pdma() operation in nvme_ctrl_ops to
replace the fixed NVME_F_PCI_P2PDMA flag such that the dma_map_ops
flags can be checked for PCI P2PDMA support.

Signed-off-by: Logan Gunthorpe 
---
 drivers/nvme/host/core.c |  3 ++-
 drivers/nvme/host/nvme.h |  2 +-
 drivers/nvme/host/pci.c  | 11 +--
 3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 0896e21642be..223419454516 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3907,7 +3907,8 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, 
unsigned nsid,
blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, ns->queue);
 
blk_queue_flag_set(QUEUE_FLAG_NONROT, ns->queue);
-   if (ctrl->ops->flags & NVME_F_PCI_P2PDMA)
+   if (ctrl->ops->supports_pci_p2pdma &&
+   ctrl->ops->supports_pci_p2pdma(ctrl))
blk_queue_flag_set(QUEUE_FLAG_PCI_P2PDMA, ns->queue);
 
ns->queue->queuedata = ns;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 07b34175c6ce..9c04df982d2c 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -473,7 +473,6 @@ struct nvme_ctrl_ops {
unsigned int flags;
 #define NVME_F_FABRICS (1 << 0)
 #define NVME_F_METADATA_SUPPORTED  (1 << 1)
-#define NVME_F_PCI_P2PDMA  (1 << 2)
int (*reg_read32)(struct nvme_ctrl *ctrl, u32 off, u32 *val);
int (*reg_write32)(struct nvme_ctrl *ctrl, u32 off, u32 val);
int (*reg_read64)(struct nvme_ctrl *ctrl, u32 off, u64 *val);
@@ -481,6 +480,7 @@ struct nvme_ctrl_ops {
void (*submit_async_event)(struct nvme_ctrl *ctrl);
void (*delete_ctrl)(struct nvme_ctrl *ctrl);
int (*get_address)(struct nvme_ctrl *ctrl, char *buf, int size);
+   bool (*supports_pci_p2pdma)(struct nvme_ctrl *ctrl);
 };
 
 #ifdef CONFIG_FAULT_INJECTION_DEBUG_FS
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 7249ae74f71f..14f092973792 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2759,17 +2759,24 @@ static int nvme_pci_get_address(struct nvme_ctrl *ctrl, 
char *buf, int size)
return snprintf(buf, size, "%s\n", dev_name(>dev));
 }
 
+static bool nvme_pci_supports_pci_p2pdma(struct nvme_ctrl *ctrl)
+{
+   struct nvme_dev *dev = to_nvme_dev(ctrl);
+
+   return dma_pci_p2pdma_supported(dev->dev);
+}
+
 static const struct nvme_ctrl_ops nvme_pci_ctrl_ops = {
.name   = "pcie",
.module = THIS_MODULE,
-   .flags  = NVME_F_METADATA_SUPPORTED |
- NVME_F_PCI_P2PDMA,
+   .flags  = NVME_F_METADATA_SUPPORTED,
.reg_read32 = nvme_pci_reg_read32,
.reg_write32= nvme_pci_reg_write32,
.reg_read64 = nvme_pci_reg_read64,
.free_ctrl  = nvme_pci_free_ctrl,
.submit_async_event = nvme_pci_submit_async_event,
.get_address= nvme_pci_get_address,
+   .supports_pci_p2pdma= nvme_pci_supports_pci_p2pdma,
 };
 
 static int nvme_dev_map(struct nvme_dev *dev)
-- 
2.20.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 11/16] iommu/dma: Support PCI P2PDMA pages in dma-iommu map_sg

2021-04-08 Thread Logan Gunthorpe
When a PCI P2PDMA page is seen, set the IOVA length of the segment
to zero so that it is not mapped into the IOVA. Then, in finalise_sg(),
apply the appropriate bus address to the segment. The IOVA is not
created if the scatterlist only consists of P2PDMA pages.

Similar to dma-direct, the sg_mark_pci_p2pdma() flag is used to
indicate bus address segments. On unmap, P2PDMA segments are skipped
over when determining the start and end IOVA addresses.

With this change, the flags variable in the dma_map_ops is
set to DMA_F_PCI_P2PDMA_SUPPORTED to indicate support for
P2PDMA pages.
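
Since the unmap hunk is truncated below, a rough sketch of the skip-over logic
described above (assumed shape only; the loop in the actual patch may differ):

	dma_addr_t start = DMA_MAPPING_ERROR, end = 0;
	struct scatterlist *s;
	int i;

	for_each_sg(sg, s, nents, i) {
		if (sg_is_pci_p2pdma(s)) {
			sg_unmark_pci_p2pdma(s); /* bus address: no IOVA to free */
			continue;
		}
		if (start == DMA_MAPPING_ERROR)
			start = sg_dma_address(s);
		end = sg_dma_address(s) + sg_dma_len(s);
	}
	if (start != DMA_MAPPING_ERROR)
		__iommu_dma_unmap(dev, start, end - start);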

Signed-off-by: Logan Gunthorpe 
---
 drivers/iommu/dma-iommu.c | 66 ++-
 1 file changed, 58 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index af765c813cc8..ef49635f9819 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -864,6 +865,16 @@ static int __finalise_sg(struct device *dev, struct 
scatterlist *sg, int nents,
sg_dma_address(s) = DMA_MAPPING_ERROR;
sg_dma_len(s) = 0;
 
+   if (is_pci_p2pdma_page(sg_page(s)) && !s_iova_len) {
+   if (i > 0)
+   cur = sg_next(cur);
+
+   pci_p2pdma_map_bus_segment(s, cur);
+   count++;
+   cur_len = 0;
+   continue;
+   }
+
/*
 * Now fill in the real DMA data. If...
 * - there is a valid output segment to append to
@@ -961,10 +972,12 @@ static int iommu_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
struct iova_domain *iovad = &cookie->iovad;
struct scatterlist *s, *prev = NULL;
int prot = dma_info_to_prot(dir, dev_is_dma_coherent(dev), attrs);
+   struct dev_pagemap *pgmap = NULL;
+   enum pci_p2pdma_map_type map_type;
dma_addr_t iova;
size_t iova_len = 0;
unsigned long mask = dma_get_seg_boundary(dev);
-   int i;
+   int i, ret = 0;
 
if (static_branch_unlikely(&iommu_deferred_attach_enabled) &&
iommu_deferred_attach(dev, domain))
@@ -993,6 +1006,31 @@ static int iommu_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
s_length = iova_align(iovad, s_length + s_iova_off);
s->length = s_length;
 
+   if (is_pci_p2pdma_page(sg_page(s))) {
+   if (sg_page(s)->pgmap != pgmap) {
+   pgmap = sg_page(s)->pgmap;
+   map_type = pci_p2pdma_map_type(pgmap, dev,
+  attrs);
+   }
+
+   switch (map_type) {
+   case PCI_P2PDMA_MAP_BUS_ADDR:
+   /*
+* A zero length will be ignored by
+* iommu_map_sg() and then can be detected
+* in __finalise_sg() to actually map the
+* bus address.
+*/
+   s->length = 0;
+   continue;
+   case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
+   break;
+   default:
+   ret = -EREMOTEIO;
+   goto out_restore_sg;
+   }
+   }
+
/*
 * Due to the alignment of our single IOVA allocation, we can
 * depend on these assumptions about the segment boundary mask:
@@ -1015,6 +1053,9 @@ static int iommu_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
prev = s;
}
 
+   if (!iova_len)
+   return __finalise_sg(dev, sg, nents, 0);
+
iova = iommu_dma_alloc_iova(domain, iova_len, dma_get_mask(dev), dev);
if (!iova)
goto out_restore_sg;
@@ -1032,13 +1073,13 @@ static int iommu_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
iommu_dma_free_iova(cookie, iova, iova_len, NULL);
 out_restore_sg:
__invalidate_sg(sg, nents);
-   return 0;
+   return ret;
 }
 
 static void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg,
int nents, enum dma_data_direction dir, unsigned long attrs)
 {
-   dma_addr_t start, end;
+   dma_addr_t end, start = DMA_MAPPING_ERROR;
struct scatterlist *tmp;
int i;
 
@@ -1054,14 +1095,22 @@ static void iommu_dma_unmap_sg(struct device *dev, 
struct scatterlist *sg,
 * The scatterlist segments are mapped into a single
 * contiguous IOVA allocation, so this is incredibly easy.
 */
-   start = sg_dma_address(sg);
- 

[PATCH 03/16] PCI/P2PDMA: Attempt to set map_type if it has not been set

2021-04-08 Thread Logan Gunthorpe
Attempt to find the mapping type for P2PDMA pages on the first
DMA map attempt if it has not been done ahead of time.

Previously, the mapping type was expected to be calculated ahead of
time, but if pages are to come from userspace then there's no
way to ensure the path was checked ahead of time.

Signed-off-by: Logan Gunthorpe 
---
 drivers/pci/p2pdma.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index 473a08940fbc..2574a062a255 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -825,11 +825,18 @@ EXPORT_SYMBOL_GPL(pci_p2pmem_publish);
 static enum pci_p2pdma_map_type pci_p2pdma_map_type(struct pci_dev *provider,
struct pci_dev *client)
 {
+   enum pci_p2pdma_map_type ret;
+
if (!provider->p2pdma)
return PCI_P2PDMA_MAP_NOT_SUPPORTED;
 
-   return xa_to_value(xa_load(&provider->p2pdma->map_types,
-  map_types_idx(client)));
+   ret = xa_to_value(xa_load(&provider->p2pdma->map_types,
+ map_types_idx(client)));
+   if (ret != PCI_P2PDMA_MAP_UNKNOWN)
+   return ret;
+
+   return upstream_bridge_distance_warn(provider, client, NULL,
+GFP_ATOMIC);
 }
 
 static int __pci_p2pdma_map_sg(struct pci_p2pdma_pagemap *p2p_pgmap,
@@ -877,7 +884,6 @@ int pci_p2pdma_map_sg_attrs(struct device *dev, struct 
scatterlist *sg,
case PCI_P2PDMA_MAP_BUS_ADDR:
return __pci_p2pdma_map_sg(p2p_pgmap, dev, sg, nents);
default:
-   WARN_ON_ONCE(1);
return 0;
}
 }
-- 
2.20.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 14/16] nvme-rdma: Ensure dma support when using p2pdma

2021-04-08 Thread Logan Gunthorpe
Ensure the dma operations support p2pdma before using the RDMA
device for P2PDMA. This allows switching the RDMA driver from
pci_p2pdma_map_sg() to dma_map_sg_p2pdma().

Signed-off-by: Logan Gunthorpe 
---
 drivers/nvme/target/rdma.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index 6c1f3ab7649c..3ec7e77e5416 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -414,7 +414,8 @@ static int nvmet_rdma_alloc_rsp(struct nvmet_rdma_device 
*ndev,
if (ib_dma_mapping_error(ndev->device, r->send_sge.addr))
goto out_free_rsp;
 
-   if (!ib_uses_virt_dma(ndev->device))
+   if (!ib_uses_virt_dma(ndev->device) &&
+   dma_pci_p2pdma_supported(&ndev->device->dev))
r->req.p2p_client = &ndev->device->dev;
r->send_sge.length = sizeof(*r->req.cqe);
r->send_sge.lkey = ndev->pd->local_dma_lkey;
-- 
2.20.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 00/16] Add new DMA mapping operation for P2PDMA

2021-04-08 Thread Logan Gunthorpe
Hi,

This patchset continues my work to add P2PDMA support to the common
dma map operations. This allows for creating SGLs that have both P2PDMA
and regular pages which is a necessary step to allowing P2PDMA pages in
userspace.

The earlier RFC[1] generated a lot of great feedback and I heard no show
stopping objections. Thus, I've incorporated all the feedback and have
decided to post this as a proper patch series with hopes of eventually
getting it in mainline.

I'm happy to do a few more passes if anyone has any further feedback
or better ideas.

This series is based on v5.12-rc6 and a git branch can be found here:

  https://github.com/sbates130272/linux-p2pmem/  p2pdma_map_ops_v1

Thanks,

Logan

[1] 
https://lore.kernel.org/linux-block/20210311233142.7900-1-log...@deltatee.com/


Changes since the RFC:
 * Added comment and fixed up the pci_get_slot patch. (per Bjorn)
 * Fixed glaring sg_phys() double offset bug. (per Robin)
 * Created a new map operation (dma_map_sg_p2pdma()) with a new calling
   convention instead of modifying the calling convention of
   dma_map_sg(). (per Robin)
 * Integrated the two similar pci_p2pdma_dma_map_type() and
   pci_p2pdma_map_type() functions into one (per Ira)
 * Reworked some of the logic in the map_sg() implementations into
   helpers in the p2pdma code. (per Christoph)
 * Dropped a bunch of unnecessary symbol exports (per Christoph)
 * Expanded the code in dma_pci_p2pdma_supported() for clarity. (per
   Ira and Christoph)
 * Finished off using the new dma_map_sg_p2pdma() call in rdma_rw
   and removed the old pci_p2pdma_[un]map_sg(). (per Jason)

--

Logan Gunthorpe (16):
  PCI/P2PDMA: Pass gfp_mask flags to upstream_bridge_distance_warn()
  PCI/P2PDMA: Avoid pci_get_slot() which sleeps
  PCI/P2PDMA: Attempt to set map_type if it has not been set
  PCI/P2PDMA: Refactor pci_p2pdma_map_type() to take pgmap and device
  dma-mapping: Introduce dma_map_sg_p2pdma()
  lib/scatterlist: Add flag for indicating P2PDMA segments in an SGL
  PCI/P2PDMA: Make pci_p2pdma_map_type() non-static
  PCI/P2PDMA: Introduce helpers for dma_map_sg implementations
  dma-direct: Support PCI P2PDMA pages in dma-direct map_sg
  dma-mapping: Add flags to dma_map_ops to indicate PCI P2PDMA support
  iommu/dma: Support PCI P2PDMA pages in dma-iommu map_sg
  nvme-pci: Check DMA ops when indicating support for PCI P2PDMA
  nvme-pci: Convert to using dma_map_sg_p2pdma for p2pdma pages
  nvme-rdma: Ensure dma support when using p2pdma
  RDMA/rw: use dma_map_sg_p2pdma()
  PCI/P2PDMA: Remove pci_p2pdma_[un]map_sg()

 drivers/infiniband/core/rw.c |  50 +++---
 drivers/iommu/dma-iommu.c|  66 ++--
 drivers/nvme/host/core.c |   3 +-
 drivers/nvme/host/nvme.h |   2 +-
 drivers/nvme/host/pci.c  |  39 
 drivers/nvme/target/rdma.c   |   3 +-
 drivers/pci/Kconfig  |   2 +-
 drivers/pci/p2pdma.c | 188 +++
 include/linux/dma-map-ops.h  |   3 +
 include/linux/dma-mapping.h  |  20 
 include/linux/pci-p2pdma.h   |  53 ++
 include/linux/scatterlist.h  |  49 -
 include/rdma/ib_verbs.h  |  32 ++
 kernel/dma/direct.c  |  25 -
 kernel/dma/mapping.c |  70 +++--
 15 files changed, 416 insertions(+), 189 deletions(-)


base-commit: e49d033bddf5b565044e2abe4241353959bc9120
--
2.20.1
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 04/16] PCI/P2PDMA: Refactor pci_p2pdma_map_type() to take pgmap and device

2021-04-08 Thread Logan Gunthorpe
All callers of pci_p2pdma_map_type() have a struct dev_pagemap and a
struct device (of the client doing the DMA transfer). Thus move the
conversion to struct pci_devs for the provider and client into this
function.

Signed-off-by: Logan Gunthorpe 
---
 drivers/pci/p2pdma.c | 29 +++--
 1 file changed, 11 insertions(+), 18 deletions(-)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index 2574a062a255..bcb1a6d6119d 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -822,14 +822,21 @@ void pci_p2pmem_publish(struct pci_dev *pdev, bool 
publish)
 }
 EXPORT_SYMBOL_GPL(pci_p2pmem_publish);
 
-static enum pci_p2pdma_map_type pci_p2pdma_map_type(struct pci_dev *provider,
-   struct pci_dev *client)
+static enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
+   struct device *dev)
 {
+   struct pci_dev *provider = to_p2p_pgmap(pgmap)->provider;
enum pci_p2pdma_map_type ret;
+   struct pci_dev *client;
 
if (!provider->p2pdma)
return PCI_P2PDMA_MAP_NOT_SUPPORTED;
 
+   if (!dev_is_pci(dev))
+   return PCI_P2PDMA_MAP_NOT_SUPPORTED;
+
+   client = to_pci_dev(dev);
+
ret = xa_to_value(xa_load(&provider->p2pdma->map_types,
  map_types_idx(client)));
if (ret != PCI_P2PDMA_MAP_UNKNOWN)
@@ -871,14 +878,8 @@ int pci_p2pdma_map_sg_attrs(struct device *dev, struct 
scatterlist *sg,
 {
struct pci_p2pdma_pagemap *p2p_pgmap =
to_p2p_pgmap(sg_page(sg)->pgmap);
-   struct pci_dev *client;
-
-   if (WARN_ON_ONCE(!dev_is_pci(dev)))
-   return 0;
 
-   client = to_pci_dev(dev);
-
-   switch (pci_p2pdma_map_type(p2p_pgmap->provider, client)) {
+   switch (pci_p2pdma_map_type(sg_page(sg)->pgmap, dev)) {
case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
return dma_map_sg_attrs(dev, sg, nents, dir, attrs);
case PCI_P2PDMA_MAP_BUS_ADDR:
@@ -901,17 +902,9 @@ EXPORT_SYMBOL_GPL(pci_p2pdma_map_sg_attrs);
 void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
int nents, enum dma_data_direction dir, unsigned long attrs)
 {
-   struct pci_p2pdma_pagemap *p2p_pgmap =
-   to_p2p_pgmap(sg_page(sg)->pgmap);
enum pci_p2pdma_map_type map_type;
-   struct pci_dev *client;
-
-   if (WARN_ON_ONCE(!dev_is_pci(dev)))
-   return;
-
-   client = to_pci_dev(dev);
 
-   map_type = pci_p2pdma_map_type(p2p_pgmap->provider, client);
+   map_type = pci_p2pdma_map_type(sg_page(sg)->pgmap, dev);
 
if (map_type == PCI_P2PDMA_MAP_THRU_HOST_BRIDGE)
dma_unmap_sg_attrs(dev, sg, nents, dir, attrs);
-- 
2.20.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 15/16] RDMA/rw: use dma_map_sg_p2pdma()

2021-04-08 Thread Logan Gunthorpe
Drop the use of pci_p2pdma_map_sg() in favour of dma_map_sg_p2pdma().

The new interface allows mapping scatterlists that mix both regular
and P2PDMA pages and will verify that the dma device can communicate
with the device the pages are on.

Signed-off-by: Logan Gunthorpe 
---
 drivers/infiniband/core/rw.c | 50 ++--
 include/rdma/ib_verbs.h  | 32 +++
 2 files changed, 46 insertions(+), 36 deletions(-)

diff --git a/drivers/infiniband/core/rw.c b/drivers/infiniband/core/rw.c
index 31156e22d3e7..0c6213d9b044 100644
--- a/drivers/infiniband/core/rw.c
+++ b/drivers/infiniband/core/rw.c
@@ -273,26 +273,6 @@ static int rdma_rw_init_single_wr(struct rdma_rw_ctx *ctx, 
struct ib_qp *qp,
return 1;
 }
 
-static void rdma_rw_unmap_sg(struct ib_device *dev, struct scatterlist *sg,
-u32 sg_cnt, enum dma_data_direction dir)
-{
-   if (is_pci_p2pdma_page(sg_page(sg)))
-   pci_p2pdma_unmap_sg(dev->dma_device, sg, sg_cnt, dir);
-   else
-   ib_dma_unmap_sg(dev, sg, sg_cnt, dir);
-}
-
-static int rdma_rw_map_sg(struct ib_device *dev, struct scatterlist *sg,
- u32 sg_cnt, enum dma_data_direction dir)
-{
-   if (is_pci_p2pdma_page(sg_page(sg))) {
-   if (WARN_ON_ONCE(ib_uses_virt_dma(dev)))
-   return 0;
-   return pci_p2pdma_map_sg(dev->dma_device, sg, sg_cnt, dir);
-   }
-   return ib_dma_map_sg(dev, sg, sg_cnt, dir);
-}
-
 /**
  * rdma_rw_ctx_init - initialize a RDMA READ/WRITE context
  * @ctx:   context to initialize
@@ -315,9 +295,9 @@ int rdma_rw_ctx_init(struct rdma_rw_ctx *ctx, struct ib_qp 
*qp, u8 port_num,
struct ib_device *dev = qp->pd->device;
int ret;
 
-   ret = rdma_rw_map_sg(dev, sg, sg_cnt, dir);
-   if (!ret)
-   return -ENOMEM;
+   ret = ib_dma_map_sg_p2pdma(dev, sg, sg_cnt, dir);
+   if (ret < 0)
+   return ret;
sg_cnt = ret;
 
/*
@@ -354,7 +334,7 @@ int rdma_rw_ctx_init(struct rdma_rw_ctx *ctx, struct ib_qp 
*qp, u8 port_num,
return ret;
 
 out_unmap_sg:
-   rdma_rw_unmap_sg(dev, sg, sg_cnt, dir);
+   ib_dma_unmap_sg(dev, sg, sg_cnt, dir);
return ret;
 }
 EXPORT_SYMBOL(rdma_rw_ctx_init);
@@ -394,17 +374,15 @@ int rdma_rw_ctx_signature_init(struct rdma_rw_ctx *ctx, 
struct ib_qp *qp,
return -EINVAL;
}
 
-   ret = rdma_rw_map_sg(dev, sg, sg_cnt, dir);
-   if (!ret)
-   return -ENOMEM;
+   ret = ib_dma_map_sg_p2pdma(dev, sg, sg_cnt, dir);
+   if (ret < 0)
+   return ret;
sg_cnt = ret;
 
if (prot_sg_cnt) {
-   ret = rdma_rw_map_sg(dev, prot_sg, prot_sg_cnt, dir);
-   if (!ret) {
-   ret = -ENOMEM;
+   ret = ib_dma_map_sg_p2pdma(dev, prot_sg, prot_sg_cnt, dir);
+   if (ret < 0)
goto out_unmap_sg;
-   }
prot_sg_cnt = ret;
}
 
@@ -469,9 +447,9 @@ int rdma_rw_ctx_signature_init(struct rdma_rw_ctx *ctx, 
struct ib_qp *qp,
kfree(ctx->reg);
 out_unmap_prot_sg:
if (prot_sg_cnt)
-   rdma_rw_unmap_sg(dev, prot_sg, prot_sg_cnt, dir);
+   ib_dma_unmap_sg(dev, prot_sg, prot_sg_cnt, dir);
 out_unmap_sg:
-   rdma_rw_unmap_sg(dev, sg, sg_cnt, dir);
+   ib_dma_unmap_sg(dev, sg, sg_cnt, dir);
return ret;
 }
 EXPORT_SYMBOL(rdma_rw_ctx_signature_init);
@@ -603,7 +581,7 @@ void rdma_rw_ctx_destroy(struct rdma_rw_ctx *ctx, struct 
ib_qp *qp, u8 port_num,
break;
}
 
-   rdma_rw_unmap_sg(qp->pd->device, sg, sg_cnt, dir);
+   ib_dma_unmap_sg(qp->pd->device, sg, sg_cnt, dir);
 }
 EXPORT_SYMBOL(rdma_rw_ctx_destroy);
 
@@ -631,8 +609,8 @@ void rdma_rw_ctx_destroy_signature(struct rdma_rw_ctx *ctx, 
struct ib_qp *qp,
kfree(ctx->reg);
 
if (prot_sg_cnt)
-   rdma_rw_unmap_sg(qp->pd->device, prot_sg, prot_sg_cnt, dir);
-   rdma_rw_unmap_sg(qp->pd->device, sg, sg_cnt, dir);
+   ib_dma_unmap_sg(qp->pd->device, prot_sg, prot_sg_cnt, dir);
+   ib_dma_unmap_sg(qp->pd->device, sg, sg_cnt, dir);
 }
 EXPORT_SYMBOL(rdma_rw_ctx_destroy_signature);
 
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index ca28fca5736b..a541ed1702f5 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -4028,6 +4028,17 @@ static inline int ib_dma_map_sg_attrs(struct ib_device 
*dev,
dma_attrs);
 }
 
+static inline int ib_dma_map_sg_p2pdma_attrs(struct ib_device *dev,
+struct scatterlist *sg, int nents,
+enum dma_data_direction direction,
+unsigned long dma_attrs)
+{
+   if (ib_uses_virt_dma(dev))
+   return 

[PATCH 02/16] PCI/P2PDMA: Avoid pci_get_slot() which sleeps

2021-04-08 Thread Logan Gunthorpe
In order to use upstream_bridge_distance_warn() from a dma_map function,
it must not sleep. However, pci_get_slot() takes the pci_bus_sem so it
might sleep.

In order to avoid this, try to get the host bridge's device from
bus->self, and if that is not set, just get the first element in the
device list. It should be impossible for the host bridge's device to
go away while references are held on child devices, so the first element
should not be able to change and, thus, this should be safe.

Signed-off-by: Logan Gunthorpe 
---
 drivers/pci/p2pdma.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index bd89437faf06..473a08940fbc 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -311,16 +311,26 @@ static const struct pci_p2pdma_whitelist_entry {
 static bool __host_bridge_whitelist(struct pci_host_bridge *host,
bool same_host_bridge)
 {
-   struct pci_dev *root = pci_get_slot(host->bus, PCI_DEVFN(0, 0));
const struct pci_p2pdma_whitelist_entry *entry;
+   struct pci_dev *root = host->bus->self;
unsigned short vendor, device;
 
+   /*
+* This makes the assumption that the first device on the bus is the
+* bridge itself and it has the devfn of 00.0. This assumption should
+* hold for the devices in the white list above, and if there are cases
+* where this isn't true they will have to be dealt with when such a
+* case is added to the whitelist.
+*/
if (!root)
+   root = list_first_entry_or_null(&host->bus->devices,
+   struct pci_dev, bus_list);
+
+   if (!root || root->devfn)
return false;
 
vendor = root->vendor;
device = root->device;
-   pci_dev_put(root);
 
for (entry = pci_p2pdma_whitelist; entry->vendor; entry++) {
if (vendor != entry->vendor || device != entry->device)
-- 
2.20.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 01/16] PCI/P2PDMA: Pass gfp_mask flags to upstream_bridge_distance_warn()

2021-04-08 Thread Logan Gunthorpe
In order to call upstream_bridge_distance_warn() from a dma_map function,
it must not sleep. The only reason it does sleep is to allocate the seqbuf
to print which devices are within the ACS path.

Switch the kmalloc call to use a passed in gfp_mask and don't print that
message if the buffer fails to be allocated.

Signed-off-by: Logan Gunthorpe 
Acked-by: Bjorn Helgaas 
---
 drivers/pci/p2pdma.c | 21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index 196382630363..bd89437faf06 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -267,7 +267,7 @@ static int pci_bridge_has_acs_redir(struct pci_dev *pdev)
 
 static void seq_buf_print_bus_devfn(struct seq_buf *buf, struct pci_dev *pdev)
 {
-   if (!buf)
+   if (!buf || !buf->buffer)
return;
 
seq_buf_printf(buf, "%s;", pci_name(pdev));
@@ -495,25 +495,26 @@ upstream_bridge_distance(struct pci_dev *provider, struct 
pci_dev *client,
 
 static enum pci_p2pdma_map_type
 upstream_bridge_distance_warn(struct pci_dev *provider, struct pci_dev *client,
- int *dist)
+ int *dist, gfp_t gfp_mask)
 {
struct seq_buf acs_list;
bool acs_redirects;
int ret;
 
-   seq_buf_init(&acs_list, kmalloc(PAGE_SIZE, GFP_KERNEL), PAGE_SIZE);
-   if (!acs_list.buffer)
-   return -ENOMEM;
+   seq_buf_init(&acs_list, kmalloc(PAGE_SIZE, gfp_mask), PAGE_SIZE);
 
ret = upstream_bridge_distance(provider, client, dist, &acs_redirects,
   &acs_list);
if (acs_redirects) {
pci_warn(client, "ACS redirect is set between the client and 
provider (%s)\n",
 pci_name(provider));
-   /* Drop final semicolon */
-   acs_list.buffer[acs_list.len-1] = 0;
-   pci_warn(client, "to disable ACS redirect for this path, add 
the kernel parameter: pci=disable_acs_redir=%s\n",
-acs_list.buffer);
+
+   if (acs_list.buffer) {
+   /* Drop final semicolon */
+   acs_list.buffer[acs_list.len - 1] = 0;
+   pci_warn(client, "to disable ACS redirect for this 
path, add the kernel parameter: pci=disable_acs_redir=%s\n",
+acs_list.buffer);
+   }
}
 
if (ret == PCI_P2PDMA_MAP_NOT_SUPPORTED) {
@@ -566,7 +567,7 @@ int pci_p2pdma_distance_many(struct pci_dev *provider, 
struct device **clients,
 
if (verbose)
ret = upstream_bridge_distance_warn(provider,
-   pci_client, &dist);
+   pci_client, &dist, GFP_KERNEL);
else
ret = upstream_bridge_distance(provider, pci_client,
   &dist, NULL, NULL);
-- 
2.20.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/amd: page-specific invalidations for more than one page

2021-04-08 Thread Joerg Roedel
On Tue, Mar 23, 2021 at 02:06:19PM -0700, Nadav Amit wrote:
>  drivers/iommu/amd/iommu.c | 76 +--
>  1 file changed, 42 insertions(+), 34 deletions(-)

Load-testing looks good here too, so this patch is queued now for v5.13,
thanks Nadav.

Regards,

Joerg
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [GIT PULL] iommu/arm-smmu: Updates for 5.13

2021-04-08 Thread Joerg Roedel
On Thu, Apr 08, 2021 at 02:29:59PM +0100, Will Deacon wrote:
>   git://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git 
> tags/arm-smmu-updates

Pulled, thanks Will.

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH v4 10/15] iommu/io-pgtable-arm: Implement arm_lpae_unmap_pages()

2021-04-08 Thread isaacm

On 2021-04-08 07:32, Will Deacon wrote:

On Wed, Apr 07, 2021 at 09:52:36PM -0700, Isaac J. Manjarres wrote:

Implement the unmap_pages() callback for the ARM LPAE io-pgtable
format.

Signed-off-by: Isaac J. Manjarres 
Suggested-by: Will Deacon 
---
 drivers/iommu/io-pgtable-arm.c | 70 
++

 1 file changed, 45 insertions(+), 25 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c 
b/drivers/iommu/io-pgtable-arm.c

index ea66b10c04c4..6700685f81d4 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -253,8 +253,8 @@ static void __arm_lpae_set_pte(arm_lpae_iopte 
*ptep, arm_lpae_iopte pte,


 static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
   struct iommu_iotlb_gather *gather,
-  unsigned long iova, size_t size, int lvl,
-  arm_lpae_iopte *ptep);
+  unsigned long iova, size_t size, size_t pgcount,
+  int lvl, arm_lpae_iopte *ptep);

 static void __arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
phys_addr_t paddr, arm_lpae_iopte prot,
@@ -298,7 +298,7 @@ static int arm_lpae_init_pte(struct 
arm_lpae_io_pgtable *data,

size_t sz = ARM_LPAE_BLOCK_SIZE(lvl, data);

tblp = ptep - ARM_LPAE_LVL_IDX(iova, lvl, data);
-   if (__arm_lpae_unmap(data, NULL, iova + i * sz, sz,
+   if (__arm_lpae_unmap(data, NULL, iova + i * sz, sz, 1,
 lvl, tblp) != sz) {
WARN_ON(1);
return -EINVAL;
@@ -526,14 +526,14 @@ static size_t arm_lpae_split_blk_unmap(struct 
arm_lpae_io_pgtable *data,

   struct iommu_iotlb_gather *gather,
   unsigned long iova, size_t size,
   arm_lpae_iopte blk_pte, int lvl,
-  arm_lpae_iopte *ptep)
+  arm_lpae_iopte *ptep, size_t pgcount)
 {
struct io_pgtable_cfg *cfg = &data->iop.cfg;
arm_lpae_iopte pte, *tablep;
phys_addr_t blk_paddr;
size_t tablesz = ARM_LPAE_GRANULE(data);
size_t split_sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
-   int i, unmap_idx = -1;
+   int i, unmap_idx_start = -1, num_entries = 0, max_entries;

if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
return 0;
@@ -542,15 +542,18 @@ static size_t arm_lpae_split_blk_unmap(struct 
arm_lpae_io_pgtable *data,

if (!tablep)
return 0; /* Bytes unmapped */

-   if (size == split_sz)
-   unmap_idx = ARM_LPAE_LVL_IDX(iova, lvl, data);
+   if (size == split_sz) {
+   unmap_idx_start = ARM_LPAE_LVL_IDX(iova, lvl, data);
+   max_entries = (tablesz >> ilog2(sizeof(pte))) - unmap_idx_start;
+   num_entries = min_t(int, pgcount, max_entries);
+   }

blk_paddr = iopte_to_paddr(blk_pte, data);
pte = iopte_prot(blk_pte);

for (i = 0; i < tablesz / sizeof(pte); i++, blk_paddr += split_sz) {


Given that we already have a 'tablesz / sizeof(pte)' expression here, I'd be
inclined to have either a local variable or a macro helper to get at the
ptes_per_table value that you also need to compute max_entries.

I think a macro might be helpful, as the number of PTEs per table is
useful in a few places.
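
For illustration, one possible form of such a helper (a sketch only; the name
and exact placement are assumptions):

	/* Number of PTEs that fit in one table at this granule size. */
	#define ARM_LPAE_PTES_PER_TABLE(d) \
		(ARM_LPAE_GRANULE(d) >> ilog2(sizeof(arm_lpae_iopte)))

Both the 'tablesz / sizeof(pte)' loop bound and the max_entries computation
above could then be written in terms of ARM_LPAE_PTES_PER_TABLE(data).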



/* Unmap! */
-   if (i == unmap_idx)
+   if (i >= unmap_idx_start && i < (unmap_idx_start + num_entries))
continue;

__arm_lpae_init_pte(data, blk_paddr, pte, lvl, 1, &tablep[i]);
@@ -568,38 +571,45 @@ static size_t arm_lpae_split_blk_unmap(struct 
arm_lpae_io_pgtable *data,

return 0;

tablep = iopte_deref(pte, data);
-   } else if (unmap_idx >= 0) {
-   io_pgtable_tlb_add_page(&data->iop, gather, iova, size);
-   return size;
+   } else if (unmap_idx_start >= 0) {
+   for (i = 0; i < num_entries; i++)
+			io_pgtable_tlb_add_page(&data->iop, gather, iova + i * size, 
size);


I suppose we could add a count parameter to the iotlb gather stuff in
future too, but for now this is fine as this series is already pretty 
big.



Okay. I can keep this in mind for the future.

Will

___
linux-arm-kernel mailing list
linux-arm-ker...@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH v4 12/15] iommu/io-pgtable-arm-v7s: Implement arm_v7s_unmap_pages()

2021-04-08 Thread Will Deacon
On Thu, Apr 08, 2021 at 07:19:29AM -0700, isa...@codeaurora.org wrote:
> On 2021-04-08 06:58, Will Deacon wrote:
> > On Wed, Apr 07, 2021 at 09:52:38PM -0700, Isaac J. Manjarres wrote:
> > > Implement the unmap_pages() callback for the ARM v7s io-pgtable
> > > format.
> > > 
> > > Signed-off-by: Isaac J. Manjarres 
> > > ---
> > >  drivers/iommu/io-pgtable-arm-v7s.c | 23 ---
> > >  1 file changed, 20 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/drivers/iommu/io-pgtable-arm-v7s.c
> > > b/drivers/iommu/io-pgtable-arm-v7s.c
> > > index d4004bcf333a..5e203e03c352 100644
> > > --- a/drivers/iommu/io-pgtable-arm-v7s.c
> > > +++ b/drivers/iommu/io-pgtable-arm-v7s.c
> > > @@ -710,15 +710,32 @@ static size_t __arm_v7s_unmap(struct
> > > arm_v7s_io_pgtable *data,
> > >   return __arm_v7s_unmap(data, gather, iova, size, lvl + 1, ptep);
> > >  }
> > > 
> > > -static size_t arm_v7s_unmap(struct io_pgtable_ops *ops, unsigned
> > > long iova,
> > > - size_t size, struct iommu_iotlb_gather *gather)
> > > +static size_t arm_v7s_unmap_pages(struct io_pgtable_ops *ops,
> > > unsigned long iova,
> > > +   size_t pgsize, size_t pgcount,
> > > +   struct iommu_iotlb_gather *gather)
> > >  {
> > >   struct arm_v7s_io_pgtable *data = io_pgtable_ops_to_data(ops);
> > > + size_t unmapped = 0, ret;
> > > 
> > >   if (WARN_ON(iova >= (1ULL << data->iop.cfg.ias)))
> > >   return 0;
> > > 
> > > - return __arm_v7s_unmap(data, gather, iova, size, 1, data->pgd);
> > > + while (pgcount--) {
> > > + ret = __arm_v7s_unmap(data, gather, iova, pgsize, 1, data->pgd);
> > > + if (!ret)
> > > + break;
> > > +
> > > + unmapped += pgsize;
> > > + iova += pgsize;
> > > + }
> > > +
> > > + return unmapped;
> > > +}
> > 
> > Wait -- don't you need to hook this up somewhere (likewise for
> > ->map_pages)?
> Done. Likewise for map_pages(). I'm not sure how the compiler didn't catch
> this; I'm compile testing this, as I don't have hardware that uses the short
> descriptor format.

Damn, neither do I :/ My seattle has all the memory high up iirc.

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH v4 10/15] iommu/io-pgtable-arm: Implement arm_lpae_unmap_pages()

2021-04-08 Thread Will Deacon
On Wed, Apr 07, 2021 at 09:52:36PM -0700, Isaac J. Manjarres wrote:
> Implement the unmap_pages() callback for the ARM LPAE io-pgtable
> format.
> 
> Signed-off-by: Isaac J. Manjarres 
> Suggested-by: Will Deacon 
> ---
>  drivers/iommu/io-pgtable-arm.c | 70 ++
>  1 file changed, 45 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> index ea66b10c04c4..6700685f81d4 100644
> --- a/drivers/iommu/io-pgtable-arm.c
> +++ b/drivers/iommu/io-pgtable-arm.c
> @@ -253,8 +253,8 @@ static void __arm_lpae_set_pte(arm_lpae_iopte *ptep, 
> arm_lpae_iopte pte,
>  
>  static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
>  struct iommu_iotlb_gather *gather,
> -unsigned long iova, size_t size, int lvl,
> -arm_lpae_iopte *ptep);
> +unsigned long iova, size_t size, size_t pgcount,
> +int lvl, arm_lpae_iopte *ptep);
>  
>  static void __arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
>   phys_addr_t paddr, arm_lpae_iopte prot,
> @@ -298,7 +298,7 @@ static int arm_lpae_init_pte(struct arm_lpae_io_pgtable 
> *data,
>   size_t sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
>  
>   tblp = ptep - ARM_LPAE_LVL_IDX(iova, lvl, data);
> - if (__arm_lpae_unmap(data, NULL, iova + i * sz, sz,
> + if (__arm_lpae_unmap(data, NULL, iova + i * sz, sz, 1,
>lvl, tblp) != sz) {
>   WARN_ON(1);
>   return -EINVAL;
> @@ -526,14 +526,14 @@ static size_t arm_lpae_split_blk_unmap(struct 
> arm_lpae_io_pgtable *data,
>  struct iommu_iotlb_gather *gather,
>  unsigned long iova, size_t size,
>  arm_lpae_iopte blk_pte, int lvl,
> -arm_lpae_iopte *ptep)
> +arm_lpae_iopte *ptep, size_t pgcount)
>  {
> >   struct io_pgtable_cfg *cfg = &data->iop.cfg;
>   arm_lpae_iopte pte, *tablep;
>   phys_addr_t blk_paddr;
>   size_t tablesz = ARM_LPAE_GRANULE(data);
>   size_t split_sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
> - int i, unmap_idx = -1;
> + int i, unmap_idx_start = -1, num_entries = 0, max_entries;
>  
>   if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
>   return 0;
> @@ -542,15 +542,18 @@ static size_t arm_lpae_split_blk_unmap(struct 
> arm_lpae_io_pgtable *data,
>   if (!tablep)
>   return 0; /* Bytes unmapped */
>  
> - if (size == split_sz)
> - unmap_idx = ARM_LPAE_LVL_IDX(iova, lvl, data);
> + if (size == split_sz) {
> + unmap_idx_start = ARM_LPAE_LVL_IDX(iova, lvl, data);
> + max_entries = (tablesz >> ilog2(sizeof(pte))) - unmap_idx_start;
> + num_entries = min_t(int, pgcount, max_entries);
> + }
>  
>   blk_paddr = iopte_to_paddr(blk_pte, data);
>   pte = iopte_prot(blk_pte);
>  
>   for (i = 0; i < tablesz / sizeof(pte); i++, blk_paddr += split_sz) {

Given that we already have a 'tablesz / sizeof(pte)' expression here, I'd be
inclined to have either a local variable or a macro helper to get at the
ptes_per_table value that you also need to compute max_entries.
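Something along these lines would do; the macro name below is only a
suggestion, not taken from the series:

#define ARM_LPAE_PTES_PER_TABLE(d)	\
	(ARM_LPAE_GRANULE(d) >> ilog2(sizeof(arm_lpae_iopte)))

	/* ...so both computations read the same way: */
	max_entries = ARM_LPAE_PTES_PER_TABLE(data) - unmap_idx_start;

	for (i = 0; i < ARM_LPAE_PTES_PER_TABLE(data); i++, blk_paddr += split_sz) {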

>   /* Unmap! */
> - if (i == unmap_idx)
> + if (i >= unmap_idx_start && i < (unmap_idx_start + num_entries))
>   continue;
>  
>   __arm_lpae_init_pte(data, blk_paddr, pte, lvl, 1, &tablep[i]);
> @@ -568,38 +571,45 @@ static size_t arm_lpae_split_blk_unmap(struct 
> arm_lpae_io_pgtable *data,
>   return 0;
>  
>   tablep = iopte_deref(pte, data);
> - } else if (unmap_idx >= 0) {
> - io_pgtable_tlb_add_page(&data->iop, gather, iova, size);
> - return size;
> + } else if (unmap_idx_start >= 0) {
> + for (i = 0; i < num_entries; i++)
> + io_pgtable_tlb_add_page(&data->iop, gather, iova + i * size, size);

I suppose we could add a count parameter to the iotlb gather stuff in
future too, but for now this is fine as this series is already pretty big.
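Purely as a sketch of that possible future direction (this helper does not
exist; the name and signature are hypothetical), a batched variant could fold
the per-entry loop into the core:

static inline void
io_pgtable_tlb_add_pages(struct io_pgtable *iop,
			 struct iommu_iotlb_gather *gather,
			 unsigned long iova, size_t granule, size_t pgcount)
{
	while (pgcount--) {
		io_pgtable_tlb_add_page(iop, gather, iova, granule);
		iova += granule;
	}
}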

Will


Re: [PATCH v1 1/2] iommu/tegra-smmu: Defer attachment of display clients

2021-04-08 Thread Dmitry Osipenko
08.04.2021 16:26, Thierry Reding пишет:
> On Thu, Apr 08, 2021 at 02:42:42AM -0700, Nicolin Chen wrote:
>> On Mon, Mar 29, 2021 at 02:32:55AM +0300, Dmitry Osipenko wrote:
>>> All consumer-grade Android and Chromebook devices show a splash screen
>>> on boot, and the display is left enabled when the kernel is booted. This
>>> behaviour is unacceptable in the case of implicit IOMMU domains to which
>>> devices are attached during kernel boot, since devices like the display
>>> controller may perform DMA at that time. We can work around this problem
>>> by deferring the enabling of SMMU translation for specific devices,
>>> like the display controller, until the first IOMMU mapping is created,
>>> which works well enough in practice because by that time the h/w is
>>> already stopped.
>>>
>>> Signed-off-by: Dmitry Osipenko 
>>
>> For both patches:
>> Acked-by: Nicolin Chen 
>> Tested-by: Nicolin Chen 
>>
>> The WAR looks good to me. Perhaps Thierry would give some input.

Nicolin, thank you very much for the help!

>> Another topic:
>> I think this may help work around the mc-errors, which we have
>> been facing on Tegra210 also when we enable IOMMU_DOMAIN_DMA.
>> (attached a test patch rebasing on these two)
> 
> Ugh... that's exactly what I was afraid of. Now everybody is going to
> think that we can just work around this issue with driver-specific SMMU
> hacks...
> 
>> However, GPU would also report errors using DMA domain:
>>
>>  nouveau 5700.gpu: acr: firmware unavailable
>>  nouveau 5700.gpu: pmu: firmware unavailable
>>  nouveau 5700.gpu: gr: firmware unavailable
>>  tegra-mc 70019000.memory-controller: gpusrd: read @0xfffbe200: 
>> Security violation (TrustZone violation)
>>  nouveau 5700.gpu: DRM: failed to create kernel channel, -22
>>  tegra-mc 70019000.memory-controller: gpusrd: read @0xfffad000: 
>> Security violation (TrustZone violation)
>>  nouveau 5700.gpu: fifo: SCHED_ERROR 20 []
>>  nouveau 5700.gpu: fifo: SCHED_ERROR 20 []
>>
>> Looking at the address, seems that GPU allocated memory in 32-bit
>> physical address space behind SMMU, so a violation happened after
>> turning on DMA domain I guess... 
> 
> The problem with GPU is... extra complicated. You're getting these
> faults because you're enabling the IOMMU-backed DMA API, which then
> causes the Nouveau driver to allocate buffers using the DMA API instead of
> explicitly allocating pages and then mapping them using the IOMMU API.
> However, there are additional patches needed to teach Nouveau about how
> to deal with SMMU and those haven't been merged yet. I've got prototypes
> of this, but before the whole framebuffer carveout passing work makes
> progress there's little sense in moving individual pieces forward.
> 
> One more reason not to try and cut corners. We know what the right solution is,
> even if it takes a lot of work. I'm willing to ack this patch, or some
> version of it, but only as a way of working around things we have no
> realistic chance of fixing properly anymore. I still think it would be
> best if we could derive identity mappings from command-line arguments on
> these platforms because I think most of them will actually set that, and
> then the solution becomes at least uniform at the SMMU level.
> 
> For Tegra210 I've already laid out a path to a solution that's going to
> be generic and extend to Tegra186 and later as well.

We still have issues in the DRM and other drivers that don't allow us to
flip ON the IOMMU_DOMAIN_DMA support.

My patch addresses the issue with the ARM_DMA_USE_IOMMU option, which
allocates an unmanaged domain for DMA purposes on ARM32, causing
trouble in a multiplatform kernel configuration since it's not
possible to opt out of ARM_DMA_USE_IOMMU in this case. Perhaps this
needs to be clarified in the commit message.

https://elixir.bootlin.com/linux/v5.12-rc6/source/arch/arm/mm/dma-mapping.c#L2078

https://elixir.bootlin.com/linux/v5.12-rc6/source/drivers/iommu/iommu.c#L1929

Re: [RFC PATCH v4 09/15] iommu/io-pgtable-arm: Prepare PTE methods for handling multiple entries

2021-04-08 Thread Will Deacon
On Thu, Apr 08, 2021 at 03:02:30PM +0100, Christoph Hellwig wrote:
> On Thu, Apr 08, 2021 at 02:59:26PM +0100, Will Deacon wrote:
> > > -static void __arm_lpae_sync_pte(arm_lpae_iopte *ptep,
> > > +static void __arm_lpae_sync_pte(arm_lpae_iopte *ptep, int num_entries,
> > >   struct io_pgtable_cfg *cfg)
> > >  {
> > >   dma_sync_single_for_device(cfg->iommu_dev, __arm_lpae_dma_addr(ptep),
> > > -sizeof(*ptep), DMA_TO_DEVICE);
> > > +sizeof(*ptep) * num_entries, DMA_TO_DEVICE);
> > >  }
> > 
> > Have you tested this with CONFIG_DMA_API_DEBUG=y? I _think_ it should be
> > ok as long as we don't attempt to sync across a page boundary, but it would
> > be good to give it a spin just to check.
> 
> syncing over a page boundary is perfectly fine.  It just needs to stay in
> the bounds of the original mapping.

Yes, you're right. I got the CPU page size mixed up with the IOMMU page
size, so I think we're good as the allocations here are made at IOMMU
page size granularity.

Will


Re: [RFC PATCH v4 06/15] iommu: Split 'addr_merge' argument to iommu_pgsize() into separate parts

2021-04-08 Thread isaacm

On 2021-04-08 06:59, Will Deacon wrote:

On Wed, Apr 07, 2021 at 09:52:32PM -0700, Isaac J. Manjarres wrote:

From: Will Deacon 

The 'addr_merge' parameter to iommu_pgsize() is a fabricated address
intended to describe the alignment requirements to consider when
choosing an appropriate page size. On the iommu_map() path, this address
is the logical OR of the virtual and physical addresses.

Subsequent improvements to iommu_pgsize() will need to check the
alignment of the virtual and physical components of 'addr_merge'
independently, so pass them in as separate parameters and reconstruct
'addr_merge' locally.

No functional change.

Signed-off-by: Will Deacon 
Signed-off-by: Isaac J. Manjarres 
---
 drivers/iommu/iommu.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index bcd623862bf9..ab689611a03b 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
 @@ -2357,12 +2357,13 @@ phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain, dma_addr_t iova)

 }
 EXPORT_SYMBOL_GPL(iommu_iova_to_phys);

-static size_t iommu_pgsize(struct iommu_domain *domain,
-  unsigned long addr_merge, size_t size)
+static size_t iommu_pgsize(struct iommu_domain *domain, unsigned long iova,
+  phys_addr_t paddr, size_t size)
 {
unsigned int pgsize_idx;
unsigned long pgsizes;
size_t pgsize;
+   phys_addr_t addr_merge = paddr | iova;


^^^ this needs to be 'unsigned long' as it was before (otherwise using
GENMASK _is_ a problem).


Done.

Will



Re: [RFC PATCH v4 12/15] iommu/io-pgtable-arm-v7s: Implement arm_v7s_unmap_pages()

2021-04-08 Thread isaacm

On 2021-04-08 06:58, Will Deacon wrote:

On Wed, Apr 07, 2021 at 09:52:38PM -0700, Isaac J. Manjarres wrote:

Implement the unmap_pages() callback for the ARM v7s io-pgtable
format.

Signed-off-by: Isaac J. Manjarres 
---
 drivers/iommu/io-pgtable-arm-v7s.c | 23 ---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c
index d4004bcf333a..5e203e03c352 100644
--- a/drivers/iommu/io-pgtable-arm-v7s.c
+++ b/drivers/iommu/io-pgtable-arm-v7s.c
@@ -710,15 +710,32 @@ static size_t __arm_v7s_unmap(struct arm_v7s_io_pgtable *data,

return __arm_v7s_unmap(data, gather, iova, size, lvl + 1, ptep);
 }

-static size_t arm_v7s_unmap(struct io_pgtable_ops *ops, unsigned long iova,
-   size_t size, struct iommu_iotlb_gather *gather)
+static size_t arm_v7s_unmap_pages(struct io_pgtable_ops *ops, unsigned long iova,
+ size_t pgsize, size_t pgcount,
+ struct iommu_iotlb_gather *gather)
 {
struct arm_v7s_io_pgtable *data = io_pgtable_ops_to_data(ops);
+   size_t unmapped = 0, ret;

if (WARN_ON(iova >= (1ULL << data->iop.cfg.ias)))
return 0;

-   return __arm_v7s_unmap(data, gather, iova, size, 1, data->pgd);
+   while (pgcount--) {
+   ret = __arm_v7s_unmap(data, gather, iova, pgsize, 1, data->pgd);
+   if (!ret)
+   break;
+
+   unmapped += pgsize;
+   iova += pgsize;
+   }
+
+   return unmapped;
+}


Wait -- don't you need to hook this up somewhere (likewise for ->map_pages)?

Done. Likewise for map_pages(). I'm not sure how the compiler didn't catch
this; I'm compile testing this, as I don't have hardware that uses the
short descriptor format.

How are you testing this?

Will



Re: [PATCH v1 1/2] iommu/tegra-smmu: Defer attachment of display clients

2021-04-08 Thread Dmitry Osipenko
08.04.2021 15:40, Thierry Reding пишет:
> On Mon, Mar 29, 2021 at 02:32:55AM +0300, Dmitry Osipenko wrote:
>> All consumer-grade Android and Chromebook devices show a splash screen
>> on boot, and the display is left enabled when the kernel is booted. This
>> behaviour is unacceptable in the case of implicit IOMMU domains to which
>> devices are attached during kernel boot, since devices like the display
>> controller may perform DMA at that time. We can work around this problem
>> by deferring the enabling of SMMU translation for specific devices,
>> like the display controller, until the first IOMMU mapping is created,
>> which works well enough in practice because by that time the h/w is
>> already stopped.
>>
>> Signed-off-by: Dmitry Osipenko 
>> ---
>>  drivers/iommu/tegra-smmu.c | 71 ++
>>  1 file changed, 71 insertions(+)
> 
> In general I do see why we would want to enable this. However, I think
> this is a bad idea because it's going to proliferate the bad practice of
> not describing things properly in device tree.
> 
> Whatever happened to the idea of creating identity mappings based on the
> obscure tegra_fb_mem (or whatever it was called) command-line option? Is
> that command-line not universally passed to the kernel from bootloaders
> that initialize display?

This is still a good idea! The command line isn't universally passed,
simply because it's up to the user to override the cmdline, and then we
get a hang (a very slow boot in reality) on T30 since the display client
takes out the whole memory bus with constant SMMU faults. For example, I
don't have that cmdline option in my current setups.

> That idealistic objection aside, this seems a bit over-engineered for
> the hack that it is. See below.
> 
>> diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
>> index 602aab98c079..af1e4b5adb27 100644
>> --- a/drivers/iommu/tegra-smmu.c
>> +++ b/drivers/iommu/tegra-smmu.c
>> @@ -60,6 +60,8 @@ struct tegra_smmu_as {
>>  dma_addr_t pd_dma;
>>  unsigned id;
>>  u32 attr;
>> +bool display_attached[2];
>> +bool attached_devices_need_sync;
>>  };
>>  
>>  static struct tegra_smmu_as *to_smmu_as(struct iommu_domain *dom)
>> @@ -78,6 +80,10 @@ static inline u32 smmu_readl(struct tegra_smmu *smmu, 
>> unsigned long offset)
>>  return readl(smmu->regs + offset);
>>  }
>>  
>> +/* all Tegra SoCs use the same group IDs for displays */
>> +#define SMMU_SWGROUP_DC 1
>> +#define SMMU_SWGROUP_DCB2
>> +
>>  #define SMMU_CONFIG 0x010
>>  #define  SMMU_CONFIG_ENABLE (1 << 0)
>>  
>> @@ -253,6 +259,20 @@ static inline void smmu_flush(struct tegra_smmu *smmu)
>>  smmu_readl(smmu, SMMU_PTB_ASID);
>>  }
>>  
>> +static int smmu_swgroup_to_display_id(unsigned int swgroup)
>> +{
>> +switch (swgroup) {
>> +case SMMU_SWGROUP_DC:
>> +return 0;
>> +
>> +case SMMU_SWGROUP_DCB:
>> +return 1;
>> +
>> +default:
>> +return -1;
>> +}
>> +}
>> +
> 
> Why do we need to have this two-level mapping? Do we even need to care
> about the specific swgroup IDs?

It's not clear to me what you mean here; the swgroup IDs are used
here to determine whether a client belongs to a display controller.

> Can we not just simply check at attach
> time if the client that's being attached is a display client and then
> set attached_devices_need_sync = true?

The reason I made attached_devices_need_sync opt-out for display clients
instead of opt-in is to make it clear and easy to override this option
once we support the identity mappings.

- attached_devices_need_sync = true;
+ attached_devices_need_sync = no_identity_mapping_for_display;

Re: [RFC PATCH v4 09/15] iommu/io-pgtable-arm: Prepare PTE methods for handling multiple entries

2021-04-08 Thread Christoph Hellwig
On Thu, Apr 08, 2021 at 02:59:26PM +0100, Will Deacon wrote:
> > -static void __arm_lpae_sync_pte(arm_lpae_iopte *ptep,
> > +static void __arm_lpae_sync_pte(arm_lpae_iopte *ptep, int num_entries,
> > struct io_pgtable_cfg *cfg)
> >  {
> > dma_sync_single_for_device(cfg->iommu_dev, __arm_lpae_dma_addr(ptep),
> > -  sizeof(*ptep), DMA_TO_DEVICE);
> > +  sizeof(*ptep) * num_entries, DMA_TO_DEVICE);
> >  }
> 
> Have you tested this with CONFIG_DMA_API_DEBUG=y? I _think_ it should be
> ok as long as we don't attempt to sync across a page boundary, but it would
> be good to give it a spin just to check.

syncing over a page boundary is perfectly fine.  It just needs to stay in
the bounds of the original mapping.


Re: [RFC PATCH v4 09/15] iommu/io-pgtable-arm: Prepare PTE methods for handling multiple entries

2021-04-08 Thread Will Deacon
On Wed, Apr 07, 2021 at 09:52:35PM -0700, Isaac J. Manjarres wrote:
> The PTE methods currently operate on a single entry. In preparation
> for manipulating multiple PTEs in one map or unmap call, allow them
> to handle multiple PTEs.
> 
> Signed-off-by: Isaac J. Manjarres 
> Suggested-by: Robin Murphy 
> ---
>  drivers/iommu/io-pgtable-arm.c | 78 +++---
>  1 file changed, 44 insertions(+), 34 deletions(-)
> 
> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> index 87def58e79b5..ea66b10c04c4 100644
> --- a/drivers/iommu/io-pgtable-arm.c
> +++ b/drivers/iommu/io-pgtable-arm.c
> @@ -232,20 +232,23 @@ static void __arm_lpae_free_pages(void *pages, size_t 
> size,
>   free_pages((unsigned long)pages, get_order(size));
>  }
>  
> -static void __arm_lpae_sync_pte(arm_lpae_iopte *ptep,
> +static void __arm_lpae_sync_pte(arm_lpae_iopte *ptep, int num_entries,
>   struct io_pgtable_cfg *cfg)
>  {
>   dma_sync_single_for_device(cfg->iommu_dev, __arm_lpae_dma_addr(ptep),
> -sizeof(*ptep), DMA_TO_DEVICE);
> +sizeof(*ptep) * num_entries, DMA_TO_DEVICE);
>  }

Have you tested this with CONFIG_DMA_API_DEBUG=y? I _think_ it should be
ok as long as we don't attempt to sync across a page boundary, but it would
be good to give it a spin just to check.
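For anyone who wants to give it that spin, the relevant knobs are the standard
ones (nothing here is specific to this series):

CONFIG_DMA_API_DEBUG=y
# optional boot parameters:
#   dma_debug=off           disable the checks at boot
#   dma_debug_entries=<n>   preallocate more tracking entries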

Will


Re: [RFC PATCH v4 06/15] iommu: Split 'addr_merge' argument to iommu_pgsize() into separate parts

2021-04-08 Thread Will Deacon
On Wed, Apr 07, 2021 at 09:52:32PM -0700, Isaac J. Manjarres wrote:
> From: Will Deacon 
> 
> The 'addr_merge' parameter to iommu_pgsize() is a fabricated address
> intended to describe the alignment requirements to consider when
> choosing an appropriate page size. On the iommu_map() path, this address
> is the logical OR of the virtual and physical addresses.
> 
> Subsequent improvements to iommu_pgsize() will need to check the
> alignment of the virtual and physical components of 'addr_merge'
> independently, so pass them in as separate parameters and reconstruct
> 'addr_merge' locally.
> 
> No functional change.
> 
> Signed-off-by: Will Deacon 
> Signed-off-by: Isaac J. Manjarres 
> ---
>  drivers/iommu/iommu.c | 10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index bcd623862bf9..ab689611a03b 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -2357,12 +2357,13 @@ phys_addr_t iommu_iova_to_phys(struct iommu_domain 
> *domain, dma_addr_t iova)
>  }
>  EXPORT_SYMBOL_GPL(iommu_iova_to_phys);
>  
> -static size_t iommu_pgsize(struct iommu_domain *domain,
> -unsigned long addr_merge, size_t size)
> +static size_t iommu_pgsize(struct iommu_domain *domain, unsigned long iova,
> +phys_addr_t paddr, size_t size)
>  {
>   unsigned int pgsize_idx;
>   unsigned long pgsizes;
>   size_t pgsize;
> + phys_addr_t addr_merge = paddr | iova;

^^^ this needs to be 'unsigned long' as it was before (otherwise using
GENMASK _is_ a problem).
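Background on why the type matters: GENMASK() and __ffs() operate on unsigned
long (BITS_PER_LONG bits), while phys_addr_t can be 64-bit on a 32-bit LPAE
build, so the merged alignment word should stay unsigned long. A minimal
sketch of the corrected line (illustrative, not the patch itself):

	/* keep this unsigned long so the __ffs()/GENMASK() bit math that
	 * later operates on it stays within BITS_PER_LONG
	 */
	unsigned long addr_merge = paddr | iova;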

Will


Re: [RFC PATCH v4 12/15] iommu/io-pgtable-arm-v7s: Implement arm_v7s_unmap_pages()

2021-04-08 Thread Will Deacon
On Wed, Apr 07, 2021 at 09:52:38PM -0700, Isaac J. Manjarres wrote:
> Implement the unmap_pages() callback for the ARM v7s io-pgtable
> format.
> 
> Signed-off-by: Isaac J. Manjarres 
> ---
>  drivers/iommu/io-pgtable-arm-v7s.c | 23 ---
>  1 file changed, 20 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/iommu/io-pgtable-arm-v7s.c 
> b/drivers/iommu/io-pgtable-arm-v7s.c
> index d4004bcf333a..5e203e03c352 100644
> --- a/drivers/iommu/io-pgtable-arm-v7s.c
> +++ b/drivers/iommu/io-pgtable-arm-v7s.c
> @@ -710,15 +710,32 @@ static size_t __arm_v7s_unmap(struct arm_v7s_io_pgtable 
> *data,
>   return __arm_v7s_unmap(data, gather, iova, size, lvl + 1, ptep);
>  }
>  
> -static size_t arm_v7s_unmap(struct io_pgtable_ops *ops, unsigned long iova,
> - size_t size, struct iommu_iotlb_gather *gather)
> +static size_t arm_v7s_unmap_pages(struct io_pgtable_ops *ops, unsigned long 
> iova,
> +   size_t pgsize, size_t pgcount,
> +   struct iommu_iotlb_gather *gather)
>  {
>   struct arm_v7s_io_pgtable *data = io_pgtable_ops_to_data(ops);
> + size_t unmapped = 0, ret;
>  
>   if (WARN_ON(iova >= (1ULL << data->iop.cfg.ias)))
>   return 0;
>  
> - return __arm_v7s_unmap(data, gather, iova, size, 1, data->pgd);
> + while (pgcount--) {
> + ret = __arm_v7s_unmap(data, gather, iova, pgsize, 1, data->pgd);
> + if (!ret)
> + break;
> +
> + unmapped += pgsize;
> + iova += pgsize;
> + }
> +
> + return unmapped;
> +}

Wait -- don't you need to hook this up somewhere (likewise for ->map_pages)?
How are you testing this?

Will


[GIT PULL] iommu/arm-smmu: Updates for 5.13

2021-04-08 Thread Will Deacon
Hi Joerg,

There's hardly anything on the SMMU front for 5.13, but please pull
these regardless. Summary in the tag.

Cheers,

Will

--->8

The following changes since commit 1e28eed17697bcf343c6743f0028cc3b5dd88bf0:

  Linux 5.12-rc3 (2021-03-14 14:41:02 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git tags/arm-smmu-updates

for you to fetch changes up to e0bb4b73540495111ff2723e41cf5add2f031021:

  iommu/arm-smmu-v3: Remove the unused fields for PREFETCH_CONFIG command 
(2021-04-07 11:30:40 +0100)


Arm SMMU updates for 5.13

- SMMUv3:

  * Drop vestigial PREFETCH_ADDR support

  * Elide TLB sync logic for empty gather

  * Fix "Service Failure Mode" handling

- SMMUv2:

  * New Qualcomm compatible string


Sai Prakash Ranjan (1):
  dt-bindings: arm-smmu: Add compatible for SC7280 SoC

Xiang Chen (1):
  iommu/arm-smmu-v3: Add a check to avoid invalid iotlb sync

Zenghui Yu (1):
  iommu/arm-smmu-v3: Remove the unused fields for PREFETCH_CONFIG command

Zhen Lei (1):
  iommu/arm-smmu-v3: add bit field SFM into GERROR_ERR_MASK

 Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 1 +
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 5 +++--
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   | 4 +---
 3 files changed, 5 insertions(+), 5 deletions(-)


Re: [PATCH v1 1/2] iommu/tegra-smmu: Defer attachment of display clients

2021-04-08 Thread Thierry Reding
On Thu, Apr 08, 2021 at 02:42:42AM -0700, Nicolin Chen wrote:
> On Mon, Mar 29, 2021 at 02:32:55AM +0300, Dmitry Osipenko wrote:
> > All consumer-grade Android and Chromebook devices show a splash screen
> > on boot and then display is left enabled when kernel is booted. This
> > behaviour is unacceptable in a case of implicit IOMMU domains to which
> > devices are attached during kernel boot since devices, like display
> > controller, may perform DMA at that time. We can work around this problem
> > by deferring the enable of SMMU translation for a specific devices,
> > like a display controller, until the first IOMMU mapping is created,
> > which works good enough in practice because by that time h/w is already
> > stopped.
> > 
> > Signed-off-by: Dmitry Osipenko 
> 
> For both patches:
> Acked-by: Nicolin Chen 
> Tested-by: Nicolin Chen 
> 
> The WAR looks good to me. Perhaps Thierry would give some input.
> 
> Another topic:
> I think this may help work around the mc-errors, which we have
> been facing on Tegra210 also when we enable IOMMU_DOMAIN_DMA.
> (attached a test patch rebasing on these two)

Ugh... that's exactly what I was afraid of. Now everybody is going to
think that we can just work around this issue with driver-specific SMMU
hacks...

> However, GPU would also report errors using DMA domain:
> 
>  nouveau 5700.gpu: acr: firmware unavailable
>  nouveau 5700.gpu: pmu: firmware unavailable
>  nouveau 5700.gpu: gr: firmware unavailable
>  tegra-mc 70019000.memory-controller: gpusrd: read @0xfffbe200: 
> Security violation (TrustZone violation)
>  nouveau 5700.gpu: DRM: failed to create kernel channel, -22
>  tegra-mc 70019000.memory-controller: gpusrd: read @0xfffad000: 
> Security violation (TrustZone violation)
>  nouveau 5700.gpu: fifo: SCHED_ERROR 20 []
>  nouveau 5700.gpu: fifo: SCHED_ERROR 20 []
> 
> Looking at the address, seems that GPU allocated memory in 32-bit
> physical address space behind SMMU, so a violation happened after
> turning on DMA domain I guess... 

The problem with GPU is... extra complicated. You're getting these
faults because you're enabling the IOMMU-backed DMA API, which then
causes the Nouveau driver to allocate buffers using the DMA API instead of
explicitly allocating pages and then mapping them using the IOMMU API.
However, there are additional patches needed to teach Nouveau about how
to deal with SMMU and those haven't been merged yet. I've got prototypes
of this, but before the whole framebuffer carveout passing work makes
progress there's little sense in moving individual pieces forward.

One more reason not to try and cut corners. We know what the right solution is,
even if it takes a lot of work. I'm willing to ack this patch, or some
version of it, but only as a way of working around things we have no
realistic chance of fixing properly anymore. I still think it would be
best if we could derive identity mappings from command-line arguments on
these platforms because I think most of them will actually set that, and
then the solution becomes at least uniform at the SMMU level.
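As a sketch of what deriving that identity mapping could look like -- this is
not an existing Tegra patch, and fb_base/fb_size stand in for whatever the
driver would parse from the bootloader's command line -- the framebuffer
could be reported as a direct-mapped reserved region:

static void tegra_smmu_get_resv_regions(struct device *dev,
					struct list_head *head)
{
	struct iommu_resv_region *region;

	if (!fb_size)
		return;

	region = iommu_alloc_resv_region(fb_base, fb_size,
					 IOMMU_READ | IOMMU_WRITE,
					 IOMMU_RESV_DIRECT);
	if (region)
		list_add_tail(&region->list, head);
}

The core would then pre-map that window 1:1 when the device is attached to its
default domain, which keeps the workaround uniform at the SMMU level.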

For Tegra210 I've already laid out a path to a solution that's going to
be generic and extend to Tegra186 and later as well.

Thierry



Re: [PATCH] iommu/amd: page-specific invalidations for more than one page

2021-04-08 Thread Joerg Roedel
On Thu, Apr 08, 2021 at 10:29:25AM +, Nadav Amit wrote:
> In the version that you referred me to, iommu_update_domain_tlb() only
> regards the size of the region to be flushed and disregards the
> alignment:
> 
> + order   = get_order(domain->flush.end - domain->flush.start);
> + mask= (0x1000ULL << order) - 1;
> + address = ((domain->flush.start & ~mask) | (mask >> 1)) & ~0xfffULL;
> 
> 
> If you need to flush for instance the region between 0x1000-0x5000, this
> version would use the address|mask of 0x1000 (16KB page). The version I
> sent regards the alignment, and since the range is not aligned would use
> address|mask of 0x3000 (32KB page).
> 
> IIUC, IOVA allocations today are aligned in such a way, but at least in
> the past (looking at 3.19 for that matter), it was not always like
> that, which can explain the problems.

Yeah, that makes sense and explains the data corruption problems. I will
give your patch a try on one of my test machines and consider it for
v5.13 if all goes well.

Thanks,

Joerg


Re: [PATCH v1 1/2] iommu/tegra-smmu: Defer attachment of display clients

2021-04-08 Thread Thierry Reding
On Mon, Mar 29, 2021 at 02:32:55AM +0300, Dmitry Osipenko wrote:
> All consumer-grade Android and Chromebook devices show a splash screen
> on boot, and the display is left enabled when the kernel is booted. This
> behaviour is unacceptable in the case of implicit IOMMU domains to which
> devices are attached during kernel boot, since devices like the display
> controller may perform DMA at that time. We can work around this problem
> by deferring the enabling of SMMU translation for specific devices,
> like the display controller, until the first IOMMU mapping is created,
> which works well enough in practice because by that time the h/w is
> already stopped.
> 
> Signed-off-by: Dmitry Osipenko 
> ---
>  drivers/iommu/tegra-smmu.c | 71 ++
>  1 file changed, 71 insertions(+)

In general I do see why we would want to enable this. However, I think
this is a bad idea because it's going to proliferate the bad practice of
not describing things properly in device tree.

Whatever happened to the idea of creating identity mappings based on the
obscure tegra_fb_mem (or whatever it was called) command-line option? Is
that command-line not universally passed to the kernel from bootloaders
that initialize display?

That idealistic objection aside, this seems a bit over-engineered for
the hack that it is. See below.

> diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
> index 602aab98c079..af1e4b5adb27 100644
> --- a/drivers/iommu/tegra-smmu.c
> +++ b/drivers/iommu/tegra-smmu.c
> @@ -60,6 +60,8 @@ struct tegra_smmu_as {
>   dma_addr_t pd_dma;
>   unsigned id;
>   u32 attr;
> + bool display_attached[2];
> + bool attached_devices_need_sync;
>  };
>  
>  static struct tegra_smmu_as *to_smmu_as(struct iommu_domain *dom)
> @@ -78,6 +80,10 @@ static inline u32 smmu_readl(struct tegra_smmu *smmu, 
> unsigned long offset)
>   return readl(smmu->regs + offset);
>  }
>  
> +/* all Tegra SoCs use the same group IDs for displays */
> +#define SMMU_SWGROUP_DC  1
> +#define SMMU_SWGROUP_DCB 2
> +
>  #define SMMU_CONFIG 0x010
>  #define  SMMU_CONFIG_ENABLE (1 << 0)
>  
> @@ -253,6 +259,20 @@ static inline void smmu_flush(struct tegra_smmu *smmu)
>   smmu_readl(smmu, SMMU_PTB_ASID);
>  }
>  
> +static int smmu_swgroup_to_display_id(unsigned int swgroup)
> +{
> + switch (swgroup) {
> + case SMMU_SWGROUP_DC:
> + return 0;
> +
> + case SMMU_SWGROUP_DCB:
> + return 1;
> +
> + default:
> + return -1;
> + }
> +}
> +

Why do we need to have this two-level mapping? Do we even need to care
about the specific swgroup IDs? Can we not just simply check at attach
time if the client that's being attached is a display client and then
set attached_devices_need_sync = true?
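A sketch of that simpler shape (not from the posted patch; it only reuses the
SMMU_SWGROUP_DC/DCB constants and the attached_devices_need_sync flag
introduced above, and the exact polarity of the flag is up to the final
design):

static bool tegra_smmu_swgroup_is_display(unsigned int swgroup)
{
	return swgroup == SMMU_SWGROUP_DC || swgroup == SMMU_SWGROUP_DCB;
}

	/* in the attach path, per client being attached: */
	if (tegra_smmu_swgroup_is_display(swgroup))
		as->attached_devices_need_sync = true;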

Thierry



Re: [PATCH v14 06/13] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs

2021-04-08 Thread Auger Eric
Hi Kunkun,

On 4/1/21 2:37 PM, Kunkun Jiang wrote:
> Hi Eric,
> 
> On 2021/2/24 4:56, Eric Auger wrote:
>> With nested stage support, soon we will need to invalidate
>> S1 contexts and ranges tagged with an unmanaged asid, this
>> latter being managed by the guest. So let's introduce 2 helpers
>> that allow to invalidate with externally managed ASIDs
>>
>> Signed-off-by: Eric Auger 
>>
>> ---
>>
>> v13 -> v14
>> - Actually send the NH_ASID command (reported by Xingang Wang)
>> ---
>>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 38 -
>>   1 file changed, 29 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> index 5579ec4fccc8..4c19a1114de4 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> @@ -1843,9 +1843,9 @@ int arm_smmu_atc_inv_domain(struct
>> arm_smmu_domain *smmu_domain, int ssid,
>>   }
>>     /* IO_PGTABLE API */
>> -static void arm_smmu_tlb_inv_context(void *cookie)
>> +static void __arm_smmu_tlb_inv_context(struct arm_smmu_domain
>> *smmu_domain,
>> +   int ext_asid)
>>   {
>> -    struct arm_smmu_domain *smmu_domain = cookie;
>>   struct arm_smmu_device *smmu = smmu_domain->smmu;
>>   struct arm_smmu_cmdq_ent cmd;
>>   @@ -1856,7 +1856,13 @@ static void arm_smmu_tlb_inv_context(void
>> *cookie)
>>    * insertion to guarantee those are observed before the TLBI. Do be
>>    * careful, 007.
>>    */
>> -    if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>> +    if (ext_asid >= 0) { /* guest stage 1 invalidation */
>> +    cmd.opcode    = CMDQ_OP_TLBI_NH_ASID;
>> +    cmd.tlbi.asid    = ext_asid;
>> +    cmd.tlbi.vmid    = smmu_domain->s2_cfg.vmid;
>> +    arm_smmu_cmdq_issue_cmd(smmu, &cmd);
>> +    arm_smmu_cmdq_issue_sync(smmu);
>> +    } else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>>   arm_smmu_tlb_inv_asid(smmu, smmu_domain->s1_cfg.cd.asid);
>>   } else {
>>   cmd.opcode    = CMDQ_OP_TLBI_S12_VMALL;
>> @@ -1867,6 +1873,13 @@ static void arm_smmu_tlb_inv_context(void *cookie)
>>   arm_smmu_atc_inv_domain(smmu_domain, 0, 0, 0);
>>   }
>>   +static void arm_smmu_tlb_inv_context(void *cookie)
>> +{
>> +    struct arm_smmu_domain *smmu_domain = cookie;
>> +
>> +    __arm_smmu_tlb_inv_context(smmu_domain, -1);
>> +}
>> +
>>   static void __arm_smmu_tlb_inv_range(struct arm_smmu_cmdq_ent *cmd,
>>    unsigned long iova, size_t size,
>>    size_t granule,
>> @@ -1926,9 +1939,10 @@ static void __arm_smmu_tlb_inv_range(struct
>> arm_smmu_cmdq_ent *cmd,
>>   arm_smmu_cmdq_batch_submit(smmu, &cmds);
>>   }
>>   
> Here is the part of code in __arm_smmu_tlb_inv_range():
>>     if (smmu->features & ARM_SMMU_FEAT_RANGE_INV) {
>>     /* Get the leaf page size */
>>     tg = __ffs(smmu_domain->domain.pgsize_bitmap);
>>
>>     /* Convert page size of 12,14,16 (log2) to 1,2,3 */
>>     cmd->tlbi.tg = (tg - 10) / 2;
>>
>>     /* Determine what level the granule is at */
>>     cmd->tlbi.ttl = 4 - ((ilog2(granule) - 3) / (tg - 3));
>>
>>     num_pages = size >> tg;
>>     }
> When pSMMU supports RIL, we get the leaf page size by __ffs(smmu_domain->
> domain.pgsize_bitmap). In nested mode, it is determined by the host
> PAGE_SIZE. If the host kernel and guest kernel have different translation
> granules (e.g. host 16K, guest 4K), __arm_smmu_tlb_inv_range() will issue
> an incorrect tlbi command.
> 
> Do you have any idea about this issue?

I think this is the same issue as the one reported by Chenxiang

https://lore.kernel.org/lkml/15938ed5-2095-e903-a290-333c29901...@hisilicon.com/

In case RIL is not supported by the host, the next version will use the
smallest page size supported by the pSMMU, as done in __arm_smmu_tlb_inv_range().
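For concreteness, a worked example of the mismatch with the code quoted above
(numbers only, not a patch):

  host granule 16K:  tg = __ffs(pgsize_bitmap) = 14, so cmd->tlbi.tg = (14 - 10) / 2 = 2
  guest granule 4K:  a single-page invalidation has size = 0x1000
  num_pages = size >> tg = 0x1000 >> 14 = 0

i.e. the range command would describe zero 16K pages instead of one 4K page,
which is one way the resulting TLBI goes wrong.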

Thanks

Eric

> 
> Best Regards,
> Kunkun Jiang
>> -static void arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t
>> size,
>> -  size_t granule, bool leaf,
>> -  struct arm_smmu_domain *smmu_domain)
>> +static void
>> +arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size,
>> +  size_t granule, bool leaf, int ext_asid,
>> +  struct arm_smmu_domain *smmu_domain)
>>   {
>>   struct arm_smmu_cmdq_ent cmd = {
>>   .tlbi = {
>> @@ -1936,7 +1950,12 @@ static void
>> arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size,
>>   },
>>   };
>>   -    if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>> +    if (ext_asid >= 0) {  /* guest stage 1 invalidation */
>> +    cmd.opcode    = smmu_domain->smmu->features &
>> ARM_SMMU_FEAT_E2H ?
>> +  CMDQ_OP_TLBI_EL2_VA : CMDQ_OP_TLBI_NH_VA;
>> +    cmd.tlbi.asid    = ext_asid;
>> +    cmd.tlbi.vmid    = smmu_domain->s2_cfg.vmid;
>> 

Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-04-08 Thread Jason Gunthorpe
On Wed, Apr 07, 2021 at 11:50:02PM +, Tian, Kevin wrote:
> > From: Jason Gunthorpe 
> > Sent: Wednesday, April 7, 2021 8:21 PM
> > 
> > On Wed, Apr 07, 2021 at 02:08:33AM +, Tian, Kevin wrote:
> > 
> > > > Because if you don't then we enter insane world where a PASID is being
> > > > created under /dev/ioasid but its translation path flows through setup
> > > > done by VFIO and the whole user API becomes an incomprehensible
> > mess.
> > > >
> > > > How will you even associate the PASID with the other translation??
> > >
> > > PASID is attached to a specific iommu domain (created by VFIO/VDPA),
> > which
> > > has GPA->HPA mappings already configured. If we view that mapping as an
> > > attribute of the iommu domain, it's reasonable to have the userspace-
> > bound
> > > pgtable through /dev/ioasid to nest on it.
> > 
> > A user controlled page table should absolutely not be an attribute of
> > a hidden kernel object, nor should two parts of the kernel silently
> > connect to each other via a hidden internal objects like this.
> > 
> > Security is important - the kind of connection must use some explicit
> > FD authorization to access shared objects, not be made implicit!
> > 
> > IMHO this direction is a dead end for this reason.
> > 
> 
> Could you elaborate what exact security problem is brought with this 
> approach? Isn't ALLOW_PASID the authorization interface for the
> connection?

If the kernel objects don't come out of FDs then no.

> Is it really the only practice in Linux that any new feature has to be
> blocked as long as refactoring work is identified?

The practice is to define uAPIs that make sense and have a good chance
to be supported over a long time period, as the software evolves, not
to hacky hacky a giant uAPI mess just to get some feature out the
door. 

This proposal as it was originally shown is exactly the kind of hacky
hacky uapi nobody wants to see. Tunneling an IOMMU uapi through a
whole bunch of other FDs is completely nutz.

Intel should basically be investing most of its time building a robust
and well designed uAPI here, and don't complain that the community is
not doing Intel's job for free.

> Don't people accept any balance between enabling new features and
> completing refactoring work through a staging approach, as long as
> we don't introduce an uAPI specifically for the staging purpose? ☹

Since this is all uapi I don't see it as applicable here.

Jason

Re: [PATCH] iommu/amd: page-specific invalidations for more than one page

2021-04-08 Thread Nadav Amit

> On Apr 8, 2021, at 12:18 AM, Joerg Roedel  wrote:
> 
> Hi Nadav,
> 
> On Wed, Apr 07, 2021 at 05:57:31PM +, Nadav Amit wrote:
>> I tested it on real bare-metal hardware. I ran some basic I/O workloads
>> with the IOMMU enabled, checkers enabled/disabled, and so on.
>> 
>> However, I only tested the IOMMU-flushes and I did not test that the
>> device-IOTLB flush work, since I did not have the hardware for that.
>> 
>> If you can refer me to the old patches, I will have a look and see
>> whether I can see a difference in the logic or test them. If you want
>> me to run different tests - let me know. If you want me to remove
>> the device-IOTLB invalidations logic - that is also fine with me.
> 
> Here is the patch-set, it is from 2010 and against a very old version of
> the AMD IOMMU driver:

Thanks. I looked at your code and I see a difference between the
implementations.

As far as I understand, pages are always assumed to be aligned to their
own sizes. I therefore assume that flushes should regard the lower bits
as a “mask” and not just as an encoding of the size.

In the version that you referred me to, iommu_update_domain_tlb() only
regards the size of the region to be flushed and disregards the
alignment:

+   order   = get_order(domain->flush.end - domain->flush.start);
+   mask= (0x1000ULL << order) - 1;
+   address = ((domain->flush.start & ~mask) | (mask >> 1)) & ~0xfffULL;


If you need to flush for instance the region between 0x1000-0x5000, this
version would use the address|mask of 0x1000 (16KB page). The version I
sent regards the alignment, and since the range is not aligned would use
address|mask of 0x3000 (32KB page).
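As a self-contained toy model of that alignment-aware encoding (not the actual
patch; it just reproduces the address|mask arithmetic for the example above):

#include <stdint.h>
#include <stdio.h>

/* Smallest naturally-aligned power-of-two block of 4K pages that covers
 * [start, end], encoded as base | ((block_size - 1) >> 1) with bits 11:0
 * cleared, matching the address|mask form quoted above.
 */
static uint64_t inv_address(uint64_t start, uint64_t end)
{
	uint64_t size = 0x1000;

	while ((start & ~(size - 1)) + size - 1 < end)
		size <<= 1;

	return ((start & ~(size - 1)) | ((size - 1) >> 1)) & ~0xfffULL;
}

int main(void)
{
	/* flushing 0x1000-0x5000 yields 0x3000, i.e. a 32KB invalidation */
	printf("0x%llx\n", (unsigned long long)inv_address(0x1000, 0x5000 - 1));
	return 0;
}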

IIUC, IOVA allocations today are aligned in such a way, but at least in
the past (looking at 3.19 for that matter), it was not always like
that, which can explain the problems.

Thoughts?

Re: [PATCH v1 1/2] iommu/tegra-smmu: Defer attachment of display clients

2021-04-08 Thread Nicolin Chen
On Mon, Mar 29, 2021 at 02:32:55AM +0300, Dmitry Osipenko wrote:
> All consumer-grade Android and Chromebook devices show a splash screen
> on boot, and the display is left enabled when the kernel is booted. This
> behaviour is unacceptable in the case of implicit IOMMU domains to which
> devices are attached during kernel boot, since devices like the display
> controller may perform DMA at that time. We can work around this problem
> by deferring the enabling of SMMU translation for specific devices,
> like the display controller, until the first IOMMU mapping is created,
> which works well enough in practice because by that time the h/w is
> already stopped.
> 
> Signed-off-by: Dmitry Osipenko 

For both patches:
Acked-by: Nicolin Chen 
Tested-by: Nicolin Chen 

The WAR looks good to me. Perhaps Thierry would give some input.

Another topic:
I think this may help work around the mc-errors, which we have
been facing on Tegra210 also when we enable IOMMU_DOMAIN_DMA.
(attached a test patch rebasing on these two)

However, GPU would also report errors using DMA domain:

 nouveau 5700.gpu: acr: firmware unavailable
 nouveau 5700.gpu: pmu: firmware unavailable
 nouveau 5700.gpu: gr: firmware unavailable
 tegra-mc 70019000.memory-controller: gpusrd: read @0xfffbe200: 
Security violation (TrustZone violation)
 nouveau 5700.gpu: DRM: failed to create kernel channel, -22
 tegra-mc 70019000.memory-controller: gpusrd: read @0xfffad000: 
Security violation (TrustZone violation)
 nouveau 5700.gpu: fifo: SCHED_ERROR 20 []
 nouveau 5700.gpu: fifo: SCHED_ERROR 20 []

Looking at the address, seems that GPU allocated memory in 32-bit
physical address space behind SMMU, so a violation happened after
turning on DMA domain I guess... 
>From 20b58a74fee0c7b961b92f9118ad69a12199e6a5 Mon Sep 17 00:00:00 2001
From: Nicolin Chen 
Date: Thu, 12 Dec 2019 17:46:50 -0800
Subject: [PATCH 6/7] iommu/tegra-smmu: Add IOMMU_DOMAIN_DMA

Signed-off-by: Nicolin Chen 
---
 drivers/iommu/tegra-smmu.c | 39 ++
 1 file changed, 23 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index 8104f001e679..eff10d1ec568 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -297,35 +298,29 @@ static struct iommu_domain *tegra_smmu_domain_alloc(unsigned type)
 {
 	struct tegra_smmu_as *as;
 
-	if (type != IOMMU_DOMAIN_UNMANAGED)
+	if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
 		return NULL;
 
 	as = kzalloc(sizeof(*as), GFP_KERNEL);
 	if (!as)
 		return NULL;
 
+	if (type == IOMMU_DOMAIN_DMA && iommu_get_dma_cookie(&as->domain))
+		goto free_as;
+
 	as->attr = SMMU_PD_READABLE | SMMU_PD_WRITABLE | SMMU_PD_NONSECURE;
 
 	as->pd = alloc_page(GFP_KERNEL | __GFP_DMA | __GFP_ZERO);
-	if (!as->pd) {
-		kfree(as);
-		return NULL;
-	}
+	if (!as->pd)
+		goto put_dma_cookie;
 
 	as->count = kcalloc(SMMU_NUM_PDE, sizeof(u32), GFP_KERNEL);
-	if (!as->count) {
-		__free_page(as->pd);
-		kfree(as);
-		return NULL;
-	}
+	if (!as->count)
+		goto free_pd_range;
 
 	as->pts = kcalloc(SMMU_NUM_PDE, sizeof(*as->pts), GFP_KERNEL);
-	if (!as->pts) {
-		kfree(as->count);
-		__free_page(as->pd);
-		kfree(as);
-		return NULL;
-	}
+	if (!as->pts)
+		goto free_pts;
 
 	spin_lock_init(>lock);
 
@@ -335,6 +330,17 @@ static struct iommu_domain *tegra_smmu_domain_alloc(unsigned type)
 	as->attached_devices_need_sync = true;
 
 	return &as->domain;
+
+free_pts:
+	kfree(as->pts);
+free_pd_range:
+	__free_page(as->pd);
+put_dma_cookie:
+	iommu_put_dma_cookie(&as->domain);
+free_as:
+	kfree(as);
+
+	return NULL;
 }
 
 static void tegra_smmu_domain_free(struct iommu_domain *domain)
@@ -346,6 +352,7 @@ static void tegra_smmu_domain_free(struct iommu_domain *domain)
 	WARN_ON_ONCE(as->use_count);
 	kfree(as->count);
 	kfree(as->pts);
+	iommu_put_dma_cookie(domain);
 	kfree(as);
 }
 
-- 
2.17.1


Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

2021-04-08 Thread Jean-Philippe Brucker
On Wed, Apr 07, 2021 at 04:36:54PM -0300, Jason Gunthorpe wrote:
> On Wed, Apr 07, 2021 at 08:43:50PM +0200, Jean-Philippe Brucker wrote:
> 
> > * Get a container handle out of /dev/ioasid (or /dev/iommu, really.)
> >   No operation available since we don't know what the device and IOMMU
> >   capabilities are.
> >
> > * Attach the handle to a VF. With VFIO that would be
> >   VFIO_GROUP_SET_CONTAINER. That causes the kernel to associate an IOMMU
> >   with the handle, and decide which operations are available.
> 
> Right, this is basically the point, - the VFIO container (/dev/vfio)
> and the /dev/ioasid we are talking about have a core of
> similarity. ioasid is the generalized, modernized, and cross-subsystem
> version of the same idea. Instead of calling it "vfio container" we
> call it something that evokes the idea of controlling the iommu.
> 
> The issue is to seperate /dev/vfio generic functionality from vfio and
> share it with every subsystem.
> 
> It may be that /dev/vfio and /dev/ioasid end up sharing a lot of code,
> with a different IOCTL interface around it. The vfio_iommu_driver_ops
> is not particularly VFIOy.
> 
> Creating /dev/ioasid may primarily start as a code reorganization
> exercise.
> 
> > * With a map/unmap vIOMMU (or shadow mappings), a single translation level
> >   is supported. With a nesting vIOMMU, we're populating the level-2
> >   translation (some day maybe by binding the KVM page tables, but
> >   currently with map/unmap ioctl).
> > 
> >   Single-level translation needs single VF per container. 
> 
> Really? Why?

The vIOMMU is started in bypass, so the device can do DMA to the GPA space
until the guest configures the vIOMMU, at which point each VF is either
kept in bypass or gets new DMA mappings, which requires the host to tear
down the bypass mappings and set up the guest mappings on a per-VF basis
(I'm not considering nesting translation in the host kernel for this,
because it's not supported by all pIOMMUs and is expensive in terms of TLB
and pinned memory). So keeping a single VF per container is simpler, but
there are certainly other programming models possible.

Thanks,
Jean

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH] iommu/vt-d: Force to flush iotlb before creating superpage

2021-04-08 Thread Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
Hi Baolu,

> -Original Message-
> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> Sent: Thursday, April 8, 2021 12:32 PM
> To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> ; iommu@lists.linux-foundation.org;
> linux-ker...@vger.kernel.org
> Cc: baolu...@linux.intel.com; David Woodhouse ; Nadav
> Amit ; Alex Williamson ;
> Kevin Tian ; Gonglei (Arei) ;
> sta...@vger.kernel.org
> Subject: Re: [PATCH] iommu/vt-d: Force to flush iotlb before creating 
> superpage
> 
> Hi Longpeng,
> 
> On 4/7/21 2:35 PM, Longpeng (Mike, Cloud Infrastructure Service Product
> Dept.) wrote:
> > Hi Baolu,
> >
> >> -Original Message-
> >> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> >> Sent: Friday, April 2, 2021 12:44 PM
> >> To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> >> ; iommu@lists.linux-foundation.org;
> >> linux-ker...@vger.kernel.org
> >> Cc: baolu...@linux.intel.com; David Woodhouse ;
> >> Nadav Amit ; Alex Williamson
> >> ; Kevin Tian ;
> >> Gonglei (Arei) ; sta...@vger.kernel.org
> >> Subject: Re: [PATCH] iommu/vt-d: Force to flush iotlb before creating
> >> superpage
> >>
> >> Hi Longpeng,
> >>
> >> On 4/1/21 3:18 PM, Longpeng(Mike) wrote:
> >>> diff --git a/drivers/iommu/intel/iommu.c
> >>> b/drivers/iommu/intel/iommu.c index ee09323..cbcb434 100644
> >>> --- a/drivers/iommu/intel/iommu.c
> >>> +++ b/drivers/iommu/intel/iommu.c
> >>> @@ -2342,9 +2342,20 @@ static inline int
> >>> hardware_largepage_caps(struct
> >> dmar_domain *domain,
> >>>* removed to make room for superpage(s).
> >>>* We're adding new large pages, so make sure
> >>>* we don't remove their parent tables.
> >>> +  *
> >>> +  * We also need to flush the iotlb before creating
> >>> +  * superpage to ensure it does not preserve any
> >>> +  * obsolete info.
> >>>*/
> >>> - dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
> >>> -largepage_lvl + 1);
> >>> + if (dma_pte_present(pte)) {
> >>
> >> The dma_pte_free_pagetable() clears a batch of PTEs. So checking
> >> current PTE is insufficient. How about removing this check and always
> >> performing cache invalidation?
> >>
> >
> > Um...the PTE here may be present (e.g. 4K mapping --> superpage mapping)
> > or NOT-present (e.g. create a totally new superpage mapping), but we only
> > need to call free_pagetable and flush_iotlb in the former case, right?
> 
> But this code covers multiple PTEs and perhaps crosses the page boundary.
> 
> How about moving this code into a separated function and check PTE presence
> there. A sample code could look like below: [compiled but not tested!]
> 
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index
> d334f5b4e382..0e04d450c38a 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -2300,6 +2300,41 @@ static inline int hardware_largepage_caps(struct
> dmar_domain *domain,
>  return level;
>   }
> 
> +/*
> + * Ensure that old small page tables are removed to make room for
> superpage(s).
> + * We're going to add new large pages, so make sure we don't remove
> their parent
> + * tables. The IOTLB/devTLBs should be flushed if any PDE/PTEs are cleared.
> + */
> +static void switch_to_super_page(struct dmar_domain *domain,
> +unsigned long start_pfn,
> +unsigned long end_pfn, int level) {

Maybe "switch_to" will lead people to think "remove old and then set up new", so
how about something like "remove_room_for_super_page" or
"prepare_for_super_page"?

> +   unsigned long lvl_pages = lvl_to_nr_pages(level);
> +   struct dma_pte *pte = NULL;
> +   int i;
> +
> +   while (start_pfn <= end_pfn) {

start_pfn < end_pfn ?

> +   if (!pte)
> +   pte = pfn_to_dma_pte(domain, start_pfn, &level);
> +
> +   if (dma_pte_present(pte)) {
> +   dma_pte_free_pagetable(domain, start_pfn,
> +  start_pfn + lvl_pages - 1,
> +  level + 1);
> +
> +   for_each_domain_iommu(i, domain)
> +   iommu_flush_iotlb_psi(g_iommus[i], domain,
> + start_pfn, lvl_pages,
> + 0, 0);
> +   }
> +
> +   pte++;
> +   start_pfn += lvl_pages;
> +   if (first_pte_in_page(pte))
> +   pte = NULL;
> +   }
> +}
> +
>   static int
>   __domain_mapping(struct dmar_domain *domain, unsigned long 

Re: [PATCH] iommu/amd: page-specific invalidations for more than one page

2021-04-08 Thread Joerg Roedel
Hi Nadav,

On Wed, Apr 07, 2021 at 05:57:31PM +, Nadav Amit wrote:
> I tested it on real bare-metal hardware. I ran some basic I/O workloads
> with the IOMMU enabled, checkers enabled/disabled, and so on.
> 
> However, I only tested the IOMMU-flushes and I did not test that the
> device-IOTLB flush work, since I did not have the hardware for that.
> 
> If you can refer me to the old patches, I will have a look and see
> whether I can see a difference in the logic or test them. If you want
> me to run different tests - let me know. If you want me to remove
> the device-IOTLB invalidations logic - that is also fine with me.

Here is the patch-set, it is from 2010 and against a very old version of
the AMD IOMMU driver:


https://lore.kernel.org/lkml/1265898797-32183-1-git-send-email-joerg.roe...@amd.com/

Regards,

Joerg