[PATCH v2 6/6] iommu/vt-d: Make DMAR_UNITS_SUPPORTED default 1024

2022-07-01 Thread Lu Baolu
If the available hardware exceeds DMAR_UNITS_SUPPORTED (previously set
to MAX_IO_APICS, or 128), it causes these messages: "DMAR: Failed to
allocate seq_id", "DMAR: Parse DMAR table failure.", and "x2apic: IRQ
remapping doesn't support X2APIC mode x2apic disabled"; and the system
fails to boot properly.

To support up to 64 sockets with 10 DMAR units each (640), make the
value of DMAR_UNITS_SUPPORTED default 1024.

Signed-off-by: Steve Wahl
Link: 
https://lore.kernel.org/linux-iommu/20220615183650.32075-1-steve.w...@hpe.com/
Signed-off-by: Lu Baolu 
Reviewed-by: Kevin Tian 
---
 include/linux/dmar.h | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/include/linux/dmar.h b/include/linux/dmar.h
index cbd714a198a0..d81a51978d01 100644
--- a/include/linux/dmar.h
+++ b/include/linux/dmar.h
@@ -18,11 +18,7 @@
 
 struct acpi_dmar_header;
 
-#ifdef CONFIG_X86
-# define   DMAR_UNITS_SUPPORTEDMAX_IO_APICS
-#else
-# define   DMAR_UNITS_SUPPORTED64
-#endif
+#define DMAR_UNITS_SUPPORTED   1024
 
 /* DMAR Flags */
 #define DMAR_INTR_REMAP0x1
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 5/6] iommu/vt-d: Remove global g_iommus array

2022-07-01 Thread Lu Baolu
The g_iommus and g_num_of_iommus is not used anywhere. Remove them to
avoid dead code.

Signed-off-by: Lu Baolu 
Reviewed-by: Kevin Tian 
---
 drivers/iommu/intel/iommu.c | 44 -
 1 file changed, 44 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index d79c48c5fc8c..73a48e2d4fbe 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -126,9 +126,6 @@ static inline unsigned long virt_to_dma_pfn(void *p)
return page_to_dma_pfn(virt_to_page(p));
 }
 
-/* global iommu list, set NULL for ignored DMAR units */
-static struct intel_iommu **g_iommus;
-
 static void __init check_tylersburg_isoch(void);
 static int rwbf_quirk;
 
@@ -287,9 +284,6 @@ static LIST_HEAD(dmar_satc_units);
 #define for_each_rmrr_units(rmrr) \
list_for_each_entry(rmrr, _rmrr_units, list)
 
-/* bitmap for indexing intel_iommus */
-static int g_num_of_iommus;
-
 static void dmar_remove_one_dev_info(struct device *dev);
 
 int dmar_disabled = !IS_ENABLED(CONFIG_INTEL_IOMMU_DEFAULT_ON);
@@ -1694,8 +1688,6 @@ static void free_dmar_iommu(struct intel_iommu *iommu)
iommu->domain_ids = NULL;
}
 
-   g_iommus[iommu->seq_id] = NULL;
-
/* free context mapping */
free_context_table(iommu);
 
@@ -2901,36 +2893,6 @@ static int __init init_dmars(void)
struct intel_iommu *iommu;
int ret;
 
-   /*
-* for each drhd
-*allocate root
-*initialize and program root entry to not present
-* endfor
-*/
-   for_each_drhd_unit(drhd) {
-   /*
-* lock not needed as this is only incremented in the single
-* threaded kernel __init code path all other access are read
-* only
-*/
-   if (g_num_of_iommus < DMAR_UNITS_SUPPORTED) {
-   g_num_of_iommus++;
-   continue;
-   }
-   pr_err_once("Exceeded %d IOMMUs\n", DMAR_UNITS_SUPPORTED);
-   }
-
-   /* Preallocate enough resources for IOMMU hot-addition */
-   if (g_num_of_iommus < DMAR_UNITS_SUPPORTED)
-   g_num_of_iommus = DMAR_UNITS_SUPPORTED;
-
-   g_iommus = kcalloc(g_num_of_iommus, sizeof(struct intel_iommu *),
-   GFP_KERNEL);
-   if (!g_iommus) {
-   ret = -ENOMEM;
-   goto error;
-   }
-
ret = intel_cap_audit(CAP_AUDIT_STATIC_DMAR, NULL);
if (ret)
goto free_iommu;
@@ -2953,8 +2915,6 @@ static int __init init_dmars(void)
   intel_pasid_max_id);
}
 
-   g_iommus[iommu->seq_id] = iommu;
-
intel_iommu_init_qi(iommu);
 
ret = iommu_init_domains(iommu);
@@ -3080,9 +3040,6 @@ static int __init init_dmars(void)
free_dmar_iommu(iommu);
}
 
-   kfree(g_iommus);
-
-error:
return ret;
 }
 
@@ -3486,7 +3443,6 @@ static int intel_iommu_add(struct dmar_drhd_unit *dmaru)
if (iommu->gcmd & DMA_GCMD_TE)
iommu_disable_translation(iommu);
 
-   g_iommus[iommu->seq_id] = iommu;
ret = iommu_init_domains(iommu);
if (ret == 0)
ret = iommu_alloc_root_entry(iommu);
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 4/6] iommu/vt-d: Remove unnecessary check in intel_iommu_add()

2022-07-01 Thread Lu Baolu
The Intel IOMMU hot-add process starts from dmar_device_hotplug(). It
uses the global dmar_global_lock to synchronize all the hot-add and
hot-remove paths. In the hot-add path, the new IOMMU data structures
are allocated firstly by dmar_parse_one_drhd() and then initialized by
dmar_hp_add_drhd(). All the IOMMU units are allocated and initialized
in the same synchronized path. There is no case where any IOMMU unit
is created and then initialized for multiple times.

This removes the unnecessary check in intel_iommu_add() which is the
last reference place of the global IOMMU array.

Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel/iommu.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 70408c234f5b..d79c48c5fc8c 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -3463,9 +3463,6 @@ static int intel_iommu_add(struct dmar_drhd_unit *dmaru)
int sp, ret;
struct intel_iommu *iommu = dmaru->iommu;
 
-   if (g_iommus[iommu->seq_id])
-   return 0;
-
ret = intel_cap_audit(CAP_AUDIT_HOTPLUG_DMAR, iommu);
if (ret)
goto out;
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 3/6] iommu/vt-d: Refactor iommu information of each domain

2022-07-01 Thread Lu Baolu
When a DMA domain is attached to a device, it needs to allocate a domain
ID from its IOMMU. Currently, the domain ID information is stored in two
static arrays embedded in the domain structure. This can lead to memory
waste when the driver is running on a small platform.

This optimizes these static arrays by replacing them with an xarray and
consuming memory on demand.

Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel/iommu.h |  29 ++---
 drivers/iommu/intel/iommu.c | 124 +---
 drivers/iommu/intel/pasid.c |   2 +-
 drivers/iommu/intel/svm.c   |   2 +-
 4 files changed, 94 insertions(+), 63 deletions(-)

diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 56e0d8cd2102..fae45bbb0c7f 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -524,17 +525,17 @@ struct context_entry {
  */
 #define DOMAIN_FLAG_USE_FIRST_LEVELBIT(1)
 
-struct dmar_domain {
-   int nid;/* node id */
-
-   unsigned int iommu_refcnt[DMAR_UNITS_SUPPORTED];
-   /* Refcount of devices per iommu */
-
-
-   u16 iommu_did[DMAR_UNITS_SUPPORTED];
-   /* Domain ids per IOMMU. Use u16 since
+struct iommu_domain_info {
+   struct intel_iommu *iommu;
+   unsigned int refcnt;/* Refcount of devices per iommu */
+   u16 did;/* Domain ids per IOMMU. Use u16 since
 * domain ids are 16 bit wide according
 * to VT-d spec, section 9.3 */
+};
+
+struct dmar_domain {
+   int nid;/* node id */
+   struct xarray iommu_array;  /* Attached IOMMU array */
 
u8 has_iotlb_device: 1;
u8 iommu_coherency: 1;  /* indicate coherency of iommu access */
@@ -640,6 +641,16 @@ static inline struct dmar_domain *to_dmar_domain(struct 
iommu_domain *dom)
return container_of(dom, struct dmar_domain, domain);
 }
 
+/* Retrieve the domain ID which has allocated to the domain */
+static inline u16
+domain_id_iommu(struct dmar_domain *domain, struct intel_iommu *iommu)
+{
+   struct iommu_domain_info *info =
+   xa_load(>iommu_array, iommu->seq_id);
+
+   return info->did;
+}
+
 /*
  * 0: readable
  * 1: writable
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 781e060352e6..70408c234f5b 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -254,10 +254,6 @@ static inline void context_clear_entry(struct 
context_entry *context)
 static struct dmar_domain *si_domain;
 static int hw_pass_through = 1;
 
-#define for_each_domain_iommu(idx, domain) \
-   for (idx = 0; idx < g_num_of_iommus; idx++) \
-   if (domain->iommu_refcnt[idx])
-
 struct dmar_rmrr_unit {
struct list_head list;  /* list of rmrr units   */
struct acpi_dmar_header *hdr;   /* ACPI header  */
@@ -453,16 +449,16 @@ static inline bool 
iommu_paging_structure_coherency(struct intel_iommu *iommu)
 
 static void domain_update_iommu_coherency(struct dmar_domain *domain)
 {
+   struct iommu_domain_info *info;
struct dmar_drhd_unit *drhd;
struct intel_iommu *iommu;
bool found = false;
-   int i;
+   unsigned long i;
 
domain->iommu_coherency = true;
-
-   for_each_domain_iommu(i, domain) {
+   xa_for_each(>iommu_array, i, info) {
found = true;
-   if (!iommu_paging_structure_coherency(g_iommus[i])) {
+   if (!iommu_paging_structure_coherency(info->iommu)) {
domain->iommu_coherency = false;
break;
}
@@ -1495,7 +1491,7 @@ static void iommu_flush_iotlb_psi(struct intel_iommu 
*iommu,
unsigned int aligned_pages = __roundup_pow_of_two(pages);
unsigned int mask = ilog2(aligned_pages);
uint64_t addr = (uint64_t)pfn << VTD_PAGE_SHIFT;
-   u16 did = domain->iommu_did[iommu->seq_id];
+   u16 did = domain_id_iommu(domain, iommu);
 
BUG_ON(pages == 0);
 
@@ -1565,11 +1561,12 @@ static inline void __mapping_notify_one(struct 
intel_iommu *iommu,
 static void intel_flush_iotlb_all(struct iommu_domain *domain)
 {
struct dmar_domain *dmar_domain = to_dmar_domain(domain);
-   int idx;
+   struct iommu_domain_info *info;
+   unsigned long idx;
 
-   for_each_domain_iommu(idx, dmar_domain) {
-   struct intel_iommu *iommu = g_iommus[idx];
-   u16 did = dmar_domain->iommu_did[iommu->seq_id];
+   xa_for_each(_domain->iommu_array, idx, info) {
+   struct intel_iommu *iommu = info->iommu;
+   u16 did = domain_id_iommu(dmar_domain, 

[PATCH v2 2/6] iommu/vt-d: Use IDA interface to manage iommu sequence id

2022-07-01 Thread Lu Baolu
Switch dmar unit sequence id allocation and release from bitmap to IDA
interface.

Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel/dmar.c | 35 ---
 1 file changed, 8 insertions(+), 27 deletions(-)

diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index 2a5e0f91e647..6327b34f5aa7 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -60,7 +60,7 @@ LIST_HEAD(dmar_drhd_units);
 
 struct acpi_table_header * __initdata dmar_tbl;
 static int dmar_dev_scope_status = 1;
-static unsigned long dmar_seq_ids[BITS_TO_LONGS(DMAR_UNITS_SUPPORTED)];
+static DEFINE_IDA(dmar_seq_ids);
 
 static int alloc_iommu(struct dmar_drhd_unit *drhd);
 static void free_iommu(struct intel_iommu *iommu);
@@ -1023,28 +1023,6 @@ static int map_iommu(struct intel_iommu *iommu, u64 
phys_addr)
return err;
 }
 
-static int dmar_alloc_seq_id(struct intel_iommu *iommu)
-{
-   iommu->seq_id = find_first_zero_bit(dmar_seq_ids,
-   DMAR_UNITS_SUPPORTED);
-   if (iommu->seq_id >= DMAR_UNITS_SUPPORTED) {
-   iommu->seq_id = -1;
-   } else {
-   set_bit(iommu->seq_id, dmar_seq_ids);
-   sprintf(iommu->name, "dmar%d", iommu->seq_id);
-   }
-
-   return iommu->seq_id;
-}
-
-static void dmar_free_seq_id(struct intel_iommu *iommu)
-{
-   if (iommu->seq_id >= 0) {
-   clear_bit(iommu->seq_id, dmar_seq_ids);
-   iommu->seq_id = -1;
-   }
-}
-
 static int alloc_iommu(struct dmar_drhd_unit *drhd)
 {
struct intel_iommu *iommu;
@@ -1062,11 +1040,14 @@ static int alloc_iommu(struct dmar_drhd_unit *drhd)
if (!iommu)
return -ENOMEM;
 
-   if (dmar_alloc_seq_id(iommu) < 0) {
+   iommu->seq_id = ida_alloc_range(_seq_ids, 0,
+   DMAR_UNITS_SUPPORTED - 1, GFP_KERNEL);
+   if (iommu->seq_id < 0) {
pr_err("Failed to allocate seq_id\n");
-   err = -ENOSPC;
+   err = iommu->seq_id;
goto error;
}
+   sprintf(iommu->name, "dmar%d", iommu->seq_id);
 
err = map_iommu(iommu, drhd->reg_base_addr);
if (err) {
@@ -1150,7 +1131,7 @@ static int alloc_iommu(struct dmar_drhd_unit *drhd)
 err_unmap:
unmap_iommu(iommu);
 error_free_seq_id:
-   dmar_free_seq_id(iommu);
+   ida_free(_seq_ids, iommu->seq_id);
 error:
kfree(iommu);
return err;
@@ -1183,7 +1164,7 @@ static void free_iommu(struct intel_iommu *iommu)
if (iommu->reg)
unmap_iommu(iommu);
 
-   dmar_free_seq_id(iommu);
+   ida_free(_seq_ids, iommu->seq_id);
kfree(iommu);
 }
 
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 1/6] iommu/vt-d: Remove unused domain_get_iommu()

2022-07-01 Thread Lu Baolu
It is not used anywhere. Remove it to avoid dead code.

Signed-off-by: Lu Baolu 
Reviewed-by: Kevin Tian 
---
 drivers/iommu/intel/iommu.h |  1 -
 drivers/iommu/intel/iommu.c | 18 --
 2 files changed, 19 deletions(-)

diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index df64d3d9c49a..56e0d8cd2102 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -725,7 +725,6 @@ extern int dmar_ir_support(void);
 
 void *alloc_pgtable_page(int node);
 void free_pgtable_page(void *vaddr);
-struct intel_iommu *domain_get_iommu(struct dmar_domain *domain);
 void iommu_flush_write_buffer(struct intel_iommu *iommu);
 int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct device *dev);
 struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn);
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index da6cfea0f0d6..781e060352e6 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -445,24 +445,6 @@ int iommu_calculate_agaw(struct intel_iommu *iommu)
return __iommu_calculate_agaw(iommu, DEFAULT_DOMAIN_ADDRESS_WIDTH);
 }
 
-/* This functionin only returns single iommu in a domain */
-struct intel_iommu *domain_get_iommu(struct dmar_domain *domain)
-{
-   int iommu_id;
-
-   /* si_domain and vm domain should not get here. */
-   if (WARN_ON(!iommu_is_dma_domain(>domain)))
-   return NULL;
-
-   for_each_domain_iommu(iommu_id, domain)
-   break;
-
-   if (iommu_id < 0 || iommu_id >= g_num_of_iommus)
-   return NULL;
-
-   return g_iommus[iommu_id];
-}
-
 static inline bool iommu_paging_structure_coherency(struct intel_iommu *iommu)
 {
return sm_supported(iommu) ?
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 0/6] iommu/vt-d: Reset DMAR_UNITS_SUPPORTED

2022-07-01 Thread Lu Baolu
Hi folks,

This is a follow-up series of changes proposed by this patch:

https://lore.kernel.org/linux-iommu/20220615183650.32075-1-steve.w...@hpe.com/

It removes several static arrays of size DMAR_UNITS_SUPPORTED and sets
the DMAR_UNITS_SUPPORTED to 1024.

This series is also available on github:

https://github.com/LuBaolu/intel-iommu/commits/reset-DMAR_UNITS_SUPPORTED-v2

Please help review and suggest.

Best regards,
baolu

Change log:

v2:
 - Set the right @max for ida_alloc_range() and return the right error
   when ida_alloc_range() returns failure.
 - Replace xa_store() with xa_cmpxchg().
 - Set domain->nid to NUMA_NO_NODE when domain is detached from an
   iommu.
 - Avoid adding a new VTD_FLAG_IOMMU_PROBED flag. Remove the duplicate
   check directly.

v1:
 - 
https://lore.kernel.org/lkml/20220625125204.2199437-1-baolu...@linux.intel.com/
 - Initial post.

Lu Baolu (6):
  iommu/vt-d: Remove unused domain_get_iommu()
  iommu/vt-d: Use IDA interface to manage iommu sequence id
  iommu/vt-d: Refactor iommu information of each domain
  iommu/vt-d: Remove unnecessary check in intel_iommu_add()
  iommu/vt-d: Remove global g_iommus array
  iommu/vt-d: Make DMAR_UNITS_SUPPORTED default 1024

 include/linux/dmar.h|   6 +-
 drivers/iommu/intel/iommu.h |  30 --
 drivers/iommu/intel/dmar.c  |  35 ++-
 drivers/iommu/intel/iommu.c | 189 ++--
 drivers/iommu/intel/pasid.c |   2 +-
 drivers/iommu/intel/svm.c   |   2 +-
 6 files changed, 103 insertions(+), 161 deletions(-)

-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 5/5] vfio/iommu_type1: Simplify group attachment

2022-07-01 Thread Nicolin Chen via iommu
Un-inline the domain specific logic from the attach/detach_group ops into
two paired functions vfio_iommu_alloc_attach_domain() and
vfio_iommu_detach_destroy_domain() that strictly deal with creating and
destroying struct vfio_domains.

Add the logic to check for EMEDIUMTYPE return code of iommu_attach_group()
and avoid the extra domain allocations and attach/detach sequences of the
old code. This allows properly detecting an actual attach error, like
-ENOMEM, vs treating all attach errors as an incompatible domain.

Reviewed-by: Kevin Tian 
Co-developed-by: Jason Gunthorpe 
Signed-off-by: Jason Gunthorpe 
Signed-off-by: Nicolin Chen 
---
 drivers/vfio/vfio_iommu_type1.c | 333 +---
 1 file changed, 180 insertions(+), 153 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 5624bbf02ab7..d3a4cedcd082 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -2155,14 +2155,179 @@ static int vfio_iommu_domain_alloc(struct device *dev, 
void *data)
return 1; /* Don't iterate */
 }
 
+static struct vfio_domain *
+vfio_iommu_alloc_attach_domain(struct vfio_iommu *iommu,
+  struct vfio_iommu_group *group,
+  struct list_head *group_resv_regions)
+{
+   struct iommu_domain *new_domain;
+   struct vfio_domain *domain;
+   phys_addr_t resv_msi_base;
+   int ret = 0;
+
+   /* Try to match an existing compatible domain */
+   list_for_each_entry (domain, >domain_list, next) {
+   ret = iommu_attach_group(domain->domain, group->iommu_group);
+   /* -EMEDIUMTYPE means an incompatible domain, so try next one */
+   if (ret == -EMEDIUMTYPE)
+   continue;
+   if (ret)
+   return ERR_PTR(ret);
+   goto done;
+   }
+
+   /*
+* Going via the iommu_group iterator avoids races, and trivially gives
+* us a representative device for the IOMMU API call. We don't actually
+* want to iterate beyond the first device (if any).
+*/
+   iommu_group_for_each_dev(group->iommu_group, _domain,
+vfio_iommu_domain_alloc);
+   if (!new_domain)
+   return ERR_PTR(-EIO);
+
+   if (iommu->nesting) {
+   ret = iommu_enable_nesting(new_domain);
+   if (ret)
+   goto out_free_iommu_domain;
+   }
+
+   ret = iommu_attach_group(new_domain, group->iommu_group);
+   if (ret)
+   goto out_free_iommu_domain;
+
+   domain = kzalloc(sizeof(*domain), GFP_KERNEL);
+   if (!domain) {
+   ret = -ENOMEM;
+   goto out_detach;
+   }
+
+   domain->domain = new_domain;
+   vfio_test_domain_fgsp(domain);
+
+   /*
+* If the IOMMU can block non-coherent operations (ie PCIe TLPs with
+* no-snoop set) then VFIO always turns this feature on because on Intel
+* platforms it optimizes KVM to disable wbinvd emulation.
+*/
+   if (new_domain->ops->enforce_cache_coherency)
+   domain->enforce_cache_coherency =
+   new_domain->ops->enforce_cache_coherency(new_domain);
+
+   /* replay mappings on new domains */
+   ret = vfio_iommu_replay(iommu, domain);
+   if (ret)
+   goto out_free_domain;
+
+   if (vfio_iommu_has_sw_msi(group_resv_regions, _msi_base)) {
+   ret = iommu_get_msi_cookie(domain->domain, resv_msi_base);
+   if (ret && ret != -ENODEV)
+   goto out_free_domain;
+   }
+
+   INIT_LIST_HEAD(>group_list);
+   list_add(>next, >domain_list);
+   vfio_update_pgsize_bitmap(iommu);
+
+done:
+   list_add(>next, >group_list);
+
+   /*
+* An iommu backed group can dirty memory directly and therefore
+* demotes the iommu scope until it declares itself dirty tracking
+* capable via the page pinning interface.
+*/
+   iommu->num_non_pinned_groups++;
+
+   return domain;
+
+out_free_domain:
+   kfree(domain);
+out_detach:
+   iommu_detach_group(new_domain, group->iommu_group);
+out_free_iommu_domain:
+   iommu_domain_free(new_domain);
+   return ERR_PTR(ret);
+}
+
+static void vfio_iommu_unmap_unpin_all(struct vfio_iommu *iommu)
+{
+   struct rb_node *node;
+
+   while ((node = rb_first(>dma_list)))
+   vfio_remove_dma(iommu, rb_entry(node, struct vfio_dma, node));
+}
+
+static void vfio_iommu_unmap_unpin_reaccount(struct vfio_iommu *iommu)
+{
+   struct rb_node *n, *p;
+
+   n = rb_first(>dma_list);
+   for (; n; n = rb_next(n)) {
+   struct vfio_dma *dma;
+   long locked = 0, unlocked = 0;
+
+   dma = rb_entry(n, struct vfio_dma, node);
+   unlocked += vfio_unmap_unpin(iommu, dma, false);
+  

[PATCH v5 4/5] vfio/iommu_type1: Clean up update_dirty_scope in detach_group()

2022-07-01 Thread Nicolin Chen via iommu
All devices in emulated_iommu_groups have pinned_page_dirty_scope
set, so the update_dirty_scope in the first list_for_each_entry
is always false. Clean it up, and move the "if update_dirty_scope"
part from the detach_group_done routine to the domain_list part.

Suggested-by: Jason Gunthorpe 
Reviewed-by: Kevin Tian 
Signed-off-by: Nicolin Chen 
---
 drivers/vfio/vfio_iommu_type1.c | 27 ---
 1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 5992ee2345a0..5624bbf02ab7 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -2456,14 +2456,12 @@ static void vfio_iommu_type1_detach_group(void 
*iommu_data,
struct vfio_iommu *iommu = iommu_data;
struct vfio_domain *domain;
struct vfio_iommu_group *group;
-   bool update_dirty_scope = false;
LIST_HEAD(iova_copy);
 
mutex_lock(>lock);
list_for_each_entry(group, >emulated_iommu_groups, next) {
if (group->iommu_group != iommu_group)
continue;
-   update_dirty_scope = !group->pinned_page_dirty_scope;
list_del(>next);
kfree(group);
 
@@ -2472,7 +2470,8 @@ static void vfio_iommu_type1_detach_group(void 
*iommu_data,
WARN_ON(iommu->notifier.head);
vfio_iommu_unmap_unpin_all(iommu);
}
-   goto detach_group_done;
+   mutex_unlock(>lock);
+   return;
}
 
/*
@@ -2488,9 +2487,7 @@ static void vfio_iommu_type1_detach_group(void 
*iommu_data,
continue;
 
iommu_detach_group(domain->domain, group->iommu_group);
-   update_dirty_scope = !group->pinned_page_dirty_scope;
list_del(>next);
-   kfree(group);
/*
 * Group ownership provides privilege, if the group list is
 * empty, the domain goes away. If it's the last domain with
@@ -2513,6 +2510,16 @@ static void vfio_iommu_type1_detach_group(void 
*iommu_data,
vfio_iommu_aper_expand(iommu, _copy);
vfio_update_pgsize_bitmap(iommu);
}
+   /*
+* Removal of a group without dirty tracking may allow
+* the iommu scope to be promoted.
+*/
+   if (!group->pinned_page_dirty_scope) {
+   iommu->num_non_pinned_groups--;
+   if (iommu->dirty_page_tracking)
+   vfio_iommu_populate_bitmap_full(iommu);
+   }
+   kfree(group);
break;
}
 
@@ -2521,16 +2528,6 @@ static void vfio_iommu_type1_detach_group(void 
*iommu_data,
else
vfio_iommu_iova_free(_copy);
 
-detach_group_done:
-   /*
-* Removal of a group without dirty tracking may allow the iommu scope
-* to be promoted.
-*/
-   if (update_dirty_scope) {
-   iommu->num_non_pinned_groups--;
-   if (iommu->dirty_page_tracking)
-   vfio_iommu_populate_bitmap_full(iommu);
-   }
mutex_unlock(>lock);
 }
 
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 3/5] vfio/iommu_type1: Remove the domain->ops comparison

2022-07-01 Thread Nicolin Chen via iommu
The domain->ops validation was added, as a precaution, for mixed-driver
systems.

Per Robin's remarks,
* While bus_set_iommu() still exists, the core code prevents multiple
  drivers from registering, so we can't really run into a situation of
  having a mixed-driver system:
  https://lore.kernel.org/kvm/6e1280c5-4b22-ebb3-3912-6c72bc169...@arm.com/

* And there's plenty more significant problems than this to fix; in future
  when many can be permitted, we will rely on the IOMMU core code to check
  the domain->ops:
  https://lore.kernel.org/kvm/6575de6d-94ba-c427-5b1e-967750ddf...@arm.com/

So remove the check in VFIO for simplicity.

Reviewed-by: Kevin Tian 
Signed-off-by: Nicolin Chen 
---
 drivers/vfio/vfio_iommu_type1.c | 32 +++-
 1 file changed, 11 insertions(+), 21 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 7530f0d727e5..5992ee2345a0 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -2280,29 +2280,19 @@ static int vfio_iommu_type1_attach_group(void 
*iommu_data,
domain->domain->ops->enforce_cache_coherency(
domain->domain);
 
-   /*
-* Try to match an existing compatible domain.  We don't want to
-* preclude an IOMMU driver supporting multiple bus_types and being
-* able to include different bus_types in the same IOMMU domain, so
-* we test whether the domains use the same iommu_ops rather than
-* testing if they're on the same bus_type.
-*/
+   /* Try to match an existing compatible domain */
list_for_each_entry(d, >domain_list, next) {
-   if (d->domain->ops == domain->domain->ops) {
-   iommu_detach_group(domain->domain, group->iommu_group);
-   if (!iommu_attach_group(d->domain,
-   group->iommu_group)) {
-   list_add(>next, >group_list);
-   iommu_domain_free(domain->domain);
-   kfree(domain);
-   goto done;
-   }
-
-   ret = iommu_attach_group(domain->domain,
-group->iommu_group);
-   if (ret)
-   goto out_domain;
+   iommu_detach_group(domain->domain, group->iommu_group);
+   if (!iommu_attach_group(d->domain, group->iommu_group)) {
+   list_add(>next, >group_list);
+   iommu_domain_free(domain->domain);
+   kfree(domain);
+   goto done;
}
+
+   ret = iommu_attach_group(domain->domain,  group->iommu_group);
+   if (ret)
+   goto out_domain;
}
 
vfio_test_domain_fgsp(domain);
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 2/5] vfio/iommu_type1: Prefer to reuse domains vs match enforced cache coherency

2022-07-01 Thread Nicolin Chen via iommu
From: Jason Gunthorpe 

The KVM mechanism for controlling wbinvd is based on OR of the coherency
property of all devices attached to a guest, no matter whether those
devices are attached to a single domain or multiple domains.

On the other hand, the benefit to using separate domains was that those
devices attached to domains supporting enforced cache coherency always
mapped with the attributes necessary to provide that feature, therefore
if a non-enforced domain was dropped, the associated group removal would
re-trigger an evaluation by KVM.

In practice however, the only known cases of such mixed domains included
an Intel IGD device behind an IOMMU lacking snoop control, where such
devices do not support hotplug, therefore this scenario lacks testing and
is not considered sufficiently relevant to support.

After all, KVM won't take advantage of trying to push a device that could
do enforced cache coherency to a dedicated domain vs re-using an existing
domain, which is non-coherent.

Simplify this code and eliminate the test. This removes the only logic
that needed to have a dummy domain attached prior to searching for a
matching domain and simplifies the next patches.

It's unclear whether we want to further optimize the Intel driver to
update the domain coherency after a device is detached from it, at
least not before KVM can be verified to handle such dynamics in related
emulation paths (wbinvd, vcpu load, write_cr0, ept, etc.). In reality
we don't see an usage requiring such optimization as the only device
which imposes such non-coherency is Intel GPU which even doesn't
support hotplug/hot remove.

Signed-off-by: Jason Gunthorpe 
Reviewed-by: Kevin Tian 
Reviewed-by: Lu Baolu 
Signed-off-by: Nicolin Chen 
---
 drivers/vfio/vfio_iommu_type1.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index c496b7d0b96f..7530f0d727e5 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -2288,9 +2288,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 * testing if they're on the same bus_type.
 */
list_for_each_entry(d, >domain_list, next) {
-   if (d->domain->ops == domain->domain->ops &&
-   d->enforce_cache_coherency ==
-   domain->enforce_cache_coherency) {
+   if (d->domain->ops == domain->domain->ops) {
iommu_detach_group(domain->domain, group->iommu_group);
if (!iommu_attach_group(d->domain,
group->iommu_group)) {
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 1/5] iommu: Return -EMEDIUMTYPE for incompatible domain and device/group

2022-07-01 Thread Nicolin Chen via iommu
Cases like VFIO wish to attach a device to an existing domain that was
not allocated specifically from the device. This raises a condition
where the IOMMU driver can fail the domain attach because the domain and
device are incompatible with each other.

This is a soft failure that can be resolved by using a different domain.

Provide a dedicated errno from the IOMMU driver during attach that the
reason attached failed is because of domain incompatability. EMEDIUMTYPE
is chosen because it is never used within the iommu subsystem today and
evokes a sense that the 'medium' aka the domain is incompatible.

VFIO can use this to know attach is a soft failure and it should continue
searching. Otherwise the attach will be a hard failure and VFIO will
return the code to userspace.

Update all drivers to return EMEDIUMTYPE in their failure paths that are
related to domain incompatability. Also remove adjacent error prints for
these soft failures, to prevent a kernel log spam, since -EMEDIUMTYPE is
clear enough to indicate an incompatability error.

Add kdocs describing this behavior.

Suggested-by: Jason Gunthorpe 
Reviewed-by: Kevin Tian 
Reviewed-by: Lu Baolu 
Signed-off-by: Nicolin Chen 
---
 drivers/iommu/amd/iommu.c   |  2 +-
 drivers/iommu/apple-dart.c  |  4 +--
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 15 +++
 drivers/iommu/arm/arm-smmu/arm-smmu.c   |  5 +---
 drivers/iommu/arm/arm-smmu/qcom_iommu.c |  9 ++-
 drivers/iommu/intel/iommu.c | 10 +++-
 drivers/iommu/iommu.c   | 28 +
 drivers/iommu/ipmmu-vmsa.c  |  4 +--
 drivers/iommu/omap-iommu.c  |  3 +--
 drivers/iommu/s390-iommu.c  |  2 +-
 drivers/iommu/sprd-iommu.c  |  6 ++---
 drivers/iommu/tegra-gart.c  |  2 +-
 drivers/iommu/virtio-iommu.c|  3 +--
 13 files changed, 47 insertions(+), 46 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 840831d5d2ad..ad499658a6b6 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1662,7 +1662,7 @@ static int attach_device(struct device *dev,
if (domain->flags & PD_IOMMUV2_MASK) {
struct iommu_domain *def_domain = iommu_get_dma_domain(dev);
 
-   ret = -EINVAL;
+   ret = -EMEDIUMTYPE;
if (def_domain->type != IOMMU_DOMAIN_IDENTITY)
goto out;
 
diff --git a/drivers/iommu/apple-dart.c b/drivers/iommu/apple-dart.c
index 8af0242a90d9..e58dc310afd7 100644
--- a/drivers/iommu/apple-dart.c
+++ b/drivers/iommu/apple-dart.c
@@ -495,10 +495,10 @@ static int apple_dart_attach_dev(struct iommu_domain 
*domain,
 
if (cfg->stream_maps[0].dart->force_bypass &&
domain->type != IOMMU_DOMAIN_IDENTITY)
-   return -EINVAL;
+   return -EMEDIUMTYPE;
if (!cfg->stream_maps[0].dart->supports_bypass &&
domain->type == IOMMU_DOMAIN_IDENTITY)
-   return -EINVAL;
+   return -EMEDIUMTYPE;
 
ret = apple_dart_finalize_domain(domain, cfg);
if (ret)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 88817a3376ef..5b64138f549d 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2420,24 +2420,15 @@ static int arm_smmu_attach_dev(struct iommu_domain 
*domain, struct device *dev)
goto out_unlock;
}
} else if (smmu_domain->smmu != smmu) {
-   dev_err(dev,
-   "cannot attach to SMMU %s (upstream of %s)\n",
-   dev_name(smmu_domain->smmu->dev),
-   dev_name(smmu->dev));
-   ret = -ENXIO;
+   ret = -EMEDIUMTYPE;
goto out_unlock;
} else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1 &&
   master->ssid_bits != smmu_domain->s1_cfg.s1cdmax) {
-   dev_err(dev,
-   "cannot attach to incompatible domain (%u SSID bits != 
%u)\n",
-   smmu_domain->s1_cfg.s1cdmax, master->ssid_bits);
-   ret = -EINVAL;
+   ret = -EMEDIUMTYPE;
goto out_unlock;
} else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1 &&
   smmu_domain->stall_enabled != master->stall_enabled) {
-   dev_err(dev, "cannot attach to stall-%s domain\n",
-   smmu_domain->stall_enabled ? "enabled" : "disabled");
-   ret = -EINVAL;
+   ret = -EMEDIUMTYPE;
goto out_unlock;
}
 
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 2ed3594f384e..4b95673be076 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ 

[PATCH v5 0/5] cover-letter: Simplify vfio_iommu_type1 attach/detach routine

2022-07-01 Thread Nicolin Chen via iommu
This is a preparatory series for IOMMUFD v2 patches. It enforces error
code -EMEDIUMTYPE in iommu_attach_device() and iommu_attach_group() when
an IOMMU domain and a device/group are incompatible. It also drops the
useless domain->ops check since it won't fail in current environment.

These allow VFIO iommu code to simplify its group attachment routine, by
avoiding the extra IOMMU domain allocations and attach/detach sequences
of the old code.

Worths mentioning the exact match for enforce_cache_coherency is removed
with this series, since there's very less value in doing that as KVM will
not be able to take advantage of it -- this just wastes domain memory.
Instead, we rely on Intel IOMMU driver taking care of that internally.

This is on github:
https://github.com/nicolinc/iommufd/commits/vfio_iommu_attach

Changelog
v5:
 * Rebased on top of Robin's "Simplify bus_type determination".
 * Fixed a wrong change returning -EMEDIUMTYPE in arm-smmu driver.
 * Added Baolu's "Reviewed-by".
v4:
 * Dropped -EMEDIUMTYPE change in mtk_v1 driver per Robin's input
 * Added Baolu's and Kevin's Reviewed-by lines
v3: https://lore.kernel.org/kvm/20220623200029.26007-1-nicol...@nvidia.com/
 * Dropped all dev_err since -EMEDIUMTYPE clearly indicates what error.
 * Updated commit message of enforce_cache_coherency removing patch.
 * Updated commit message of domain->ops removing patch.
 * Replaced "goto out_unlock" with simply mutex_unlock() and return.
 * Added a line of comments for -EMEDIUMTYPE return check.
 * Moved iommu_get_msi_cookie() into alloc_attach_domain() as a cookie
   should be logically tied to the lifetime of a domain itself.
 * Added Kevin's "Reviewed-by".
v2: https://lore.kernel.org/kvm/20220616000304.23890-1-nicol...@nvidia.com/
 * Added -EMEDIUMTYPE to more IOMMU drivers that fit the category.
 * Changed dev_err to dev_dbg for -EMEDIUMTYPE to avoid kernel log spam.
 * Dropped iommu_ops patch, and removed domain->ops in VFIO directly,
   since there's no mixed-driver use case that would fail the sanity.
 * Updated commit log of the patch removing enforce_cache_coherency.
 * Fixed a misplace of "num_non_pinned_groups--" in detach_group patch.
 * Moved "num_non_pinned_groups++" in PATCH-5 to the common path between
   domain-reusing and new-domain pathways, like the code previously did.
 * Fixed a typo in EMEDIUMTYPE patch.
v1: https://lore.kernel.org/kvm/20220606061927.26049-1-nicol...@nvidia.com/

Jason Gunthorpe (1):
  vfio/iommu_type1: Prefer to reuse domains vs match enforced cache
coherency

Nicolin Chen (4):
  iommu: Return -EMEDIUMTYPE for incompatible domain and device/group
  vfio/iommu_type1: Remove the domain->ops comparison
  vfio/iommu_type1: Clean up update_dirty_scope in detach_group()
  vfio/iommu_type1: Simplify group attachment

 drivers/iommu/amd/iommu.c   |   2 +-
 drivers/iommu/apple-dart.c  |   4 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  15 +-
 drivers/iommu/arm/arm-smmu/arm-smmu.c   |   5 +-
 drivers/iommu/arm/arm-smmu/qcom_iommu.c |   9 +-
 drivers/iommu/intel/iommu.c |  10 +-
 drivers/iommu/iommu.c   |  28 ++
 drivers/iommu/ipmmu-vmsa.c  |   4 +-
 drivers/iommu/omap-iommu.c  |   3 +-
 drivers/iommu/s390-iommu.c  |   2 +-
 drivers/iommu/sprd-iommu.c  |   6 +-
 drivers/iommu/tegra-gart.c  |   2 +-
 drivers/iommu/virtio-iommu.c|   3 +-
 drivers/vfio/vfio_iommu_type1.c | 352 ++--
 14 files changed, 229 insertions(+), 216 deletions(-)

-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 4/8] genirq: Drop redundant irq_init_effective_affinity

2022-07-01 Thread Samuel Holland
It does exactly the same thing as irq_data_update_effective_affinity.

Signed-off-by: Samuel Holland 
---

Changes in v3:
 - New patch to drop irq_init_effective_affinity

 kernel/irq/manage.c | 10 +-
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 8c396319d5ac..40fe7806cc8c 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -205,16 +205,8 @@ static void irq_validate_effective_affinity(struct 
irq_data *data)
pr_warn_once("irq_chip %s did not update eff. affinity mask of irq 
%u\n",
 chip->name, data->irq);
 }
-
-static inline void irq_init_effective_affinity(struct irq_data *data,
-  const struct cpumask *mask)
-{
-   cpumask_copy(irq_data_get_effective_affinity_mask(data), mask);
-}
 #else
 static inline void irq_validate_effective_affinity(struct irq_data *data) { }
-static inline void irq_init_effective_affinity(struct irq_data *data,
-  const struct cpumask *mask) { }
 #endif
 
 int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
@@ -347,7 +339,7 @@ static bool irq_set_affinity_deactivated(struct irq_data 
*data,
return false;
 
cpumask_copy(desc->irq_common_data.affinity, mask);
-   irq_init_effective_affinity(data, mask);
+   irq_data_update_effective_affinity(data, mask);
irqd_set(data, IRQD_AFFINITY_SET);
return true;
 }
-- 
2.35.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 6/8] genirq: Add and use an irq_data_update_affinity helper

2022-07-01 Thread Samuel Holland
Some architectures and irqchip drivers modify the cpumask returned by
irq_data_get_affinity_mask, usually by copying in to it. This is
problematic for uniprocessor configurations, where the affinity mask
should be constant, as it is known at compile time.

Add and use a setter for the affinity mask, following the pattern of
irq_data_update_effective_affinity. This allows the getter function to
return a const cpumask pointer.

Signed-off-by: Samuel Holland 
---

Changes in v3:
 - New patch to introduce irq_data_update_affinity

 arch/alpha/kernel/irq.c  | 2 +-
 arch/ia64/kernel/iosapic.c   | 2 +-
 arch/ia64/kernel/irq.c   | 4 ++--
 arch/ia64/kernel/msi_ia64.c  | 4 ++--
 arch/parisc/kernel/irq.c | 2 +-
 drivers/irqchip/irq-bcm6345-l1.c | 4 ++--
 drivers/parisc/iosapic.c | 2 +-
 drivers/sh/intc/chip.c   | 2 +-
 drivers/xen/events/events_base.c | 7 ---
 include/linux/irq.h  | 6 ++
 10 files changed, 21 insertions(+), 14 deletions(-)

diff --git a/arch/alpha/kernel/irq.c b/arch/alpha/kernel/irq.c
index f6d2946edbd2..15f2effd6baf 100644
--- a/arch/alpha/kernel/irq.c
+++ b/arch/alpha/kernel/irq.c
@@ -60,7 +60,7 @@ int irq_select_affinity(unsigned int irq)
cpu = (cpu < (NR_CPUS-1) ? cpu + 1 : 0);
last_cpu = cpu;
 
-   cpumask_copy(irq_data_get_affinity_mask(data), cpumask_of(cpu));
+   irq_data_update_affinity(data, cpumask_of(cpu));
chip->irq_set_affinity(data, cpumask_of(cpu), false);
return 0;
 }
diff --git a/arch/ia64/kernel/iosapic.c b/arch/ia64/kernel/iosapic.c
index 35adcf89035a..99300850abc1 100644
--- a/arch/ia64/kernel/iosapic.c
+++ b/arch/ia64/kernel/iosapic.c
@@ -834,7 +834,7 @@ iosapic_unregister_intr (unsigned int gsi)
if (iosapic_intr_info[irq].count == 0) {
 #ifdef CONFIG_SMP
/* Clear affinity */
-   cpumask_setall(irq_get_affinity_mask(irq));
+   irq_data_update_affinity(irq_get_irq_data(irq), cpu_all_mask);
 #endif
/* Clear the interrupt information */
iosapic_intr_info[irq].dest = 0;
diff --git a/arch/ia64/kernel/irq.c b/arch/ia64/kernel/irq.c
index ecef17c7c35b..275b9ea58c64 100644
--- a/arch/ia64/kernel/irq.c
+++ b/arch/ia64/kernel/irq.c
@@ -57,8 +57,8 @@ static char irq_redir [NR_IRQS]; // = { [0 ... NR_IRQS-1] = 1 
};
 void set_irq_affinity_info (unsigned int irq, int hwid, int redir)
 {
if (irq < NR_IRQS) {
-   cpumask_copy(irq_get_affinity_mask(irq),
-cpumask_of(cpu_logical_id(hwid)));
+   irq_data_update_affinity(irq_get_irq_data(irq),
+cpumask_of(cpu_logical_id(hwid)));
irq_redir[irq] = (char) (redir & 0xff);
}
 }
diff --git a/arch/ia64/kernel/msi_ia64.c b/arch/ia64/kernel/msi_ia64.c
index df5c28f252e3..025e5133c860 100644
--- a/arch/ia64/kernel/msi_ia64.c
+++ b/arch/ia64/kernel/msi_ia64.c
@@ -37,7 +37,7 @@ static int ia64_set_msi_irq_affinity(struct irq_data *idata,
msg.data = data;
 
pci_write_msi_msg(irq, );
-   cpumask_copy(irq_data_get_affinity_mask(idata), cpumask_of(cpu));
+   irq_data_update_affinity(idata, cpumask_of(cpu));
 
return 0;
 }
@@ -132,7 +132,7 @@ static int dmar_msi_set_affinity(struct irq_data *data,
msg.address_lo |= MSI_ADDR_DEST_ID_CPU(cpu_physical_id(cpu));
 
dmar_msi_write(irq, );
-   cpumask_copy(irq_data_get_affinity_mask(data), mask);
+   irq_data_update_affinity(data, mask);
 
return 0;
 }
diff --git a/arch/parisc/kernel/irq.c b/arch/parisc/kernel/irq.c
index 0fe2d79fb123..5ebb1771b4ab 100644
--- a/arch/parisc/kernel/irq.c
+++ b/arch/parisc/kernel/irq.c
@@ -315,7 +315,7 @@ unsigned long txn_affinity_addr(unsigned int irq, int cpu)
 {
 #ifdef CONFIG_SMP
struct irq_data *d = irq_get_irq_data(irq);
-   cpumask_copy(irq_data_get_affinity_mask(d), cpumask_of(cpu));
+   irq_data_update_affinity(d, cpumask_of(cpu));
 #endif
 
return per_cpu(cpu_data, cpu).txn_addr;
diff --git a/drivers/irqchip/irq-bcm6345-l1.c b/drivers/irqchip/irq-bcm6345-l1.c
index 142a7431745f..6899e37810a8 100644
--- a/drivers/irqchip/irq-bcm6345-l1.c
+++ b/drivers/irqchip/irq-bcm6345-l1.c
@@ -216,11 +216,11 @@ static int bcm6345_l1_set_affinity(struct irq_data *d,
enabled = intc->cpus[old_cpu]->enable_cache[word] & mask;
if (enabled)
__bcm6345_l1_mask(d);
-   cpumask_copy(irq_data_get_affinity_mask(d), dest);
+   irq_data_update_affinity(d, dest);
if (enabled)
__bcm6345_l1_unmask(d);
} else {
-   cpumask_copy(irq_data_get_affinity_mask(d), dest);
+   irq_data_update_affinity(d, dest);
}
raw_spin_unlock_irqrestore(>lock, flags);
 
diff --git a/drivers/parisc/iosapic.c b/drivers/parisc/iosapic.c
index 8a3b0c3a1e92..3a8c98615634 

[PATCH v3 7/8] genirq: Return a const cpumask from irq_data_get_affinity_mask

2022-07-01 Thread Samuel Holland
Now that the irq_data_update_affinity helper exists, enforce its use
by returning a a const cpumask from irq_data_get_affinity_mask.

Since the previous commit already updated places that needed to call
irq_data_update_affinity, this commit updates the remaining code that
either did not modify the cpumask or immediately passed the modified
mask to irq_set_affinity.

Signed-off-by: Samuel Holland 
---

Changes in v3:
 - New patch to make the returned cpumasks const

 arch/mips/cavium-octeon/octeon-irq.c |  4 ++--
 arch/sh/kernel/irq.c |  7 ---
 arch/x86/hyperv/irqdomain.c  |  2 +-
 arch/xtensa/kernel/irq.c |  7 ---
 drivers/iommu/hyperv-iommu.c |  2 +-
 drivers/pci/controller/pci-hyperv.c  | 10 +-
 include/linux/irq.h  | 12 +++-
 kernel/irq/chip.c|  8 +---
 kernel/irq/debugfs.c |  2 +-
 kernel/irq/ipi.c | 16 +---
 10 files changed, 39 insertions(+), 31 deletions(-)

diff --git a/arch/mips/cavium-octeon/octeon-irq.c 
b/arch/mips/cavium-octeon/octeon-irq.c
index 6cdcbf4de763..9cb9ed44bcaf 100644
--- a/arch/mips/cavium-octeon/octeon-irq.c
+++ b/arch/mips/cavium-octeon/octeon-irq.c
@@ -263,7 +263,7 @@ static int next_cpu_for_irq(struct irq_data *data)
 
 #ifdef CONFIG_SMP
int cpu;
-   struct cpumask *mask = irq_data_get_affinity_mask(data);
+   const struct cpumask *mask = irq_data_get_affinity_mask(data);
int weight = cpumask_weight(mask);
struct octeon_ciu_chip_data *cd = irq_data_get_irq_chip_data(data);
 
@@ -758,7 +758,7 @@ static void octeon_irq_cpu_offline_ciu(struct irq_data 
*data)
 {
int cpu = smp_processor_id();
cpumask_t new_affinity;
-   struct cpumask *mask = irq_data_get_affinity_mask(data);
+   const struct cpumask *mask = irq_data_get_affinity_mask(data);
 
if (!cpumask_test_cpu(cpu, mask))
return;
diff --git a/arch/sh/kernel/irq.c b/arch/sh/kernel/irq.c
index ef0f0827cf57..56269c2c3414 100644
--- a/arch/sh/kernel/irq.c
+++ b/arch/sh/kernel/irq.c
@@ -230,16 +230,17 @@ void migrate_irqs(void)
struct irq_data *data = irq_get_irq_data(irq);
 
if (irq_data_get_node(data) == cpu) {
-   struct cpumask *mask = irq_data_get_affinity_mask(data);
+   const struct cpumask *mask = 
irq_data_get_affinity_mask(data);
unsigned int newcpu = cpumask_any_and(mask,
  cpu_online_mask);
if (newcpu >= nr_cpu_ids) {
pr_info_ratelimited("IRQ%u no longer affine to 
CPU%u\n",
irq, cpu);
 
-   cpumask_setall(mask);
+   irq_set_affinity(irq, cpu_all_mask);
+   } else {
+   irq_set_affinity(irq, mask);
}
-   irq_set_affinity(irq, mask);
}
}
 }
diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
index 7e0f6bedc248..42c70d28ef27 100644
--- a/arch/x86/hyperv/irqdomain.c
+++ b/arch/x86/hyperv/irqdomain.c
@@ -192,7 +192,7 @@ static void hv_irq_compose_msi_msg(struct irq_data *data, 
struct msi_msg *msg)
struct pci_dev *dev;
struct hv_interrupt_entry out_entry, *stored_entry;
struct irq_cfg *cfg = irqd_cfg(data);
-   cpumask_t *affinity;
+   const cpumask_t *affinity;
int cpu;
u64 status;
 
diff --git a/arch/xtensa/kernel/irq.c b/arch/xtensa/kernel/irq.c
index 529fe9245821..42f106004400 100644
--- a/arch/xtensa/kernel/irq.c
+++ b/arch/xtensa/kernel/irq.c
@@ -169,7 +169,7 @@ void migrate_irqs(void)
 
for_each_active_irq(i) {
struct irq_data *data = irq_get_irq_data(i);
-   struct cpumask *mask;
+   const struct cpumask *mask;
unsigned int newcpu;
 
if (irqd_is_per_cpu(data))
@@ -185,9 +185,10 @@ void migrate_irqs(void)
pr_info_ratelimited("IRQ%u no longer affine to CPU%u\n",
i, cpu);
 
-   cpumask_setall(mask);
+   irq_set_affinity(i, cpu_all_mask);
+   } else {
+   irq_set_affinity(i, mask);
}
-   irq_set_affinity(i, mask);
}
 }
 #endif /* CONFIG_HOTPLUG_CPU */
diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index e285a220c913..51bd66a45a11 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -194,7 +194,7 @@ hyperv_root_ir_compose_msi_msg(struct irq_data *irq_data, 
struct msi_msg *msg)
u32 vector;
struct irq_cfg *cfg;
int ioapic_id;
-   struct cpumask *affinity;
+   

[PATCH v3 5/8] genirq: Refactor accessors to use irq_data_get_affinity_mask

2022-07-01 Thread Samuel Holland
A couple of functions directly reference the affinity mask. Route them
through irq_data_get_affinity_mask so they will pick up any refactoring
done there.

Signed-off-by: Samuel Holland 
---

(no changes since v1)

 include/linux/irq.h | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/linux/irq.h b/include/linux/irq.h
index 505308253d23..69ee4e2f36ce 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -879,16 +879,16 @@ static inline int irq_data_get_node(struct irq_data *d)
return irq_common_data_get_node(d->common);
 }
 
-static inline struct cpumask *irq_get_affinity_mask(int irq)
+static inline struct cpumask *irq_data_get_affinity_mask(struct irq_data *d)
 {
-   struct irq_data *d = irq_get_irq_data(irq);
-
-   return d ? d->common->affinity : NULL;
+   return d->common->affinity;
 }
 
-static inline struct cpumask *irq_data_get_affinity_mask(struct irq_data *d)
+static inline struct cpumask *irq_get_affinity_mask(int irq)
 {
-   return d->common->affinity;
+   struct irq_data *d = irq_get_irq_data(irq);
+
+   return d ? irq_data_get_affinity_mask(d) : NULL;
 }
 
 #ifdef CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK
@@ -910,7 +910,7 @@ static inline void 
irq_data_update_effective_affinity(struct irq_data *d,
 static inline
 struct cpumask *irq_data_get_effective_affinity_mask(struct irq_data *d)
 {
-   return d->common->affinity;
+   return irq_data_get_affinity_mask(d);
 }
 #endif
 
-- 
2.35.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 1/8] irqchip/mips-gic: Only register IPI domain when SMP is enabled

2022-07-01 Thread Samuel Holland
The MIPS GIC irqchip driver may be selected in a uniprocessor
configuration, but it unconditionally registers an IPI domain.

Limit the part of the driver dealing with IPIs to only be compiled when
GENERIC_IRQ_IPI is enabled, which corresponds to an SMP configuration.

Reported-by: kernel test robot 
Signed-off-by: Samuel Holland 
---

Changes in v3:
 - New patch to fix build errors in uniprocessor MIPS configs

 drivers/irqchip/Kconfig|  3 +-
 drivers/irqchip/irq-mips-gic.c | 80 +++---
 2 files changed, 56 insertions(+), 27 deletions(-)

diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig
index 1f23a6be7d88..d26a4ff7c99f 100644
--- a/drivers/irqchip/Kconfig
+++ b/drivers/irqchip/Kconfig
@@ -322,7 +322,8 @@ config KEYSTONE_IRQ
 
 config MIPS_GIC
bool
-   select GENERIC_IRQ_IPI
+   select GENERIC_IRQ_IPI if SMP
+   select IRQ_DOMAIN_HIERARCHY
select MIPS_CM
 
 config INGENIC_IRQ
diff --git a/drivers/irqchip/irq-mips-gic.c b/drivers/irqchip/irq-mips-gic.c
index ff89b36267dd..8a9efb6ae587 100644
--- a/drivers/irqchip/irq-mips-gic.c
+++ b/drivers/irqchip/irq-mips-gic.c
@@ -52,13 +52,15 @@ static DEFINE_PER_CPU_READ_MOSTLY(unsigned 
long[GIC_MAX_LONGS], pcpu_masks);
 
 static DEFINE_SPINLOCK(gic_lock);
 static struct irq_domain *gic_irq_domain;
-static struct irq_domain *gic_ipi_domain;
 static int gic_shared_intrs;
 static unsigned int gic_cpu_pin;
 static unsigned int timer_cpu_pin;
 static struct irq_chip gic_level_irq_controller, gic_edge_irq_controller;
+
+#ifdef CONFIG_GENERIC_IRQ_IPI
 static DECLARE_BITMAP(ipi_resrv, GIC_MAX_INTRS);
 static DECLARE_BITMAP(ipi_available, GIC_MAX_INTRS);
+#endif /* CONFIG_GENERIC_IRQ_IPI */
 
 static struct gic_all_vpes_chip_data {
u32 map;
@@ -472,9 +474,11 @@ static int gic_irq_domain_map(struct irq_domain *d, 
unsigned int virq,
u32 map;
 
if (hwirq >= GIC_SHARED_HWIRQ_BASE) {
+#ifdef CONFIG_GENERIC_IRQ_IPI
/* verify that shared irqs don't conflict with an IPI irq */
if (test_bit(GIC_HWIRQ_TO_SHARED(hwirq), ipi_resrv))
return -EBUSY;
+#endif /* CONFIG_GENERIC_IRQ_IPI */
 
err = irq_domain_set_hwirq_and_chip(d, virq, hwirq,
_level_irq_controller,
@@ -567,6 +571,8 @@ static const struct irq_domain_ops gic_irq_domain_ops = {
.map = gic_irq_domain_map,
 };
 
+#ifdef CONFIG_GENERIC_IRQ_IPI
+
 static int gic_ipi_domain_xlate(struct irq_domain *d, struct device_node 
*ctrlr,
const u32 *intspec, unsigned int intsize,
irq_hw_number_t *out_hwirq,
@@ -670,6 +676,48 @@ static const struct irq_domain_ops gic_ipi_domain_ops = {
.match = gic_ipi_domain_match,
 };
 
+static int gic_register_ipi_domain(struct device_node *node)
+{
+   struct irq_domain *gic_ipi_domain;
+   unsigned int v[2], num_ipis;
+
+   gic_ipi_domain = irq_domain_add_hierarchy(gic_irq_domain,
+ IRQ_DOMAIN_FLAG_IPI_PER_CPU,
+ GIC_NUM_LOCAL_INTRS + 
gic_shared_intrs,
+ node, _ipi_domain_ops, 
NULL);
+   if (!gic_ipi_domain) {
+   pr_err("Failed to add IPI domain");
+   return -ENXIO;
+   }
+
+   irq_domain_update_bus_token(gic_ipi_domain, DOMAIN_BUS_IPI);
+
+   if (node &&
+   !of_property_read_u32_array(node, "mti,reserved-ipi-vectors", v, 
2)) {
+   bitmap_set(ipi_resrv, v[0], v[1]);
+   } else {
+   /*
+* Reserve 2 interrupts per possible CPU/VP for use as IPIs,
+* meeting the requirements of arch/mips SMP.
+*/
+   num_ipis = 2 * num_possible_cpus();
+   bitmap_set(ipi_resrv, gic_shared_intrs - num_ipis, num_ipis);
+   }
+
+   bitmap_copy(ipi_available, ipi_resrv, GIC_MAX_INTRS);
+
+   return 0;
+}
+
+#else /* !CONFIG_GENERIC_IRQ_IPI */
+
+static inline int gic_register_ipi_domain(struct device_node *node)
+{
+   return 0;
+}
+
+#endif /* !CONFIG_GENERIC_IRQ_IPI */
+
 static int gic_cpu_startup(unsigned int cpu)
 {
/* Enable or disable EIC */
@@ -688,11 +736,12 @@ static int gic_cpu_startup(unsigned int cpu)
 static int __init gic_of_init(struct device_node *node,
  struct device_node *parent)
 {
-   unsigned int cpu_vec, i, gicconfig, v[2], num_ipis;
+   unsigned int cpu_vec, i, gicconfig;
unsigned long reserved;
phys_addr_t gic_base;
struct resource res;
size_t gic_len;
+   int ret;
 
/* Find the first available CPU vector. */
i = 0;
@@ -780,30 +829,9 @@ static int __init gic_of_init(struct device_node *node,
return -ENXIO;
}
 
-   gic_ipi_domain = 

[PATCH v3 2/8] genirq: GENERIC_IRQ_IPI depends on SMP

2022-07-01 Thread Samuel Holland
The generic IPI code depends on the IRQ affinity mask being allocated
and initialized. This will not be the case if SMP is disabled. Fix up
the remaining driver that selected GENERIC_IRQ_IPI in a non-SMP config.

Reported-by: kernel test robot 
Signed-off-by: Samuel Holland 
---

(no changes since v2)

Changes in v2:
 - New patch to prevent GENERIC_IRQ_IPI from being selected on !SMP

 drivers/irqchip/Kconfig | 2 +-
 kernel/irq/Kconfig  | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig
index d26a4ff7c99f..5dd98a81efc8 100644
--- a/drivers/irqchip/Kconfig
+++ b/drivers/irqchip/Kconfig
@@ -177,7 +177,7 @@ config MADERA_IRQ
 config IRQ_MIPS_CPU
bool
select GENERIC_IRQ_CHIP
-   select GENERIC_IRQ_IPI if SYS_SUPPORTS_MULTITHREADING
+   select GENERIC_IRQ_IPI if SMP && SYS_SUPPORTS_MULTITHREADING
select IRQ_DOMAIN
select GENERIC_IRQ_EFFECTIVE_AFF_MASK
 
diff --git a/kernel/irq/Kconfig b/kernel/irq/Kconfig
index 10929eda9825..fc760d064a65 100644
--- a/kernel/irq/Kconfig
+++ b/kernel/irq/Kconfig
@@ -82,6 +82,7 @@ config IRQ_FASTEOI_HIERARCHY_HANDLERS
 # Generic IRQ IPI support
 config GENERIC_IRQ_IPI
bool
+   depends on SMP
select IRQ_DOMAIN_HIERARCHY
 
 # Generic MSI interrupt support
-- 
2.35.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 3/8] genirq: GENERIC_IRQ_EFFECTIVE_AFF_MASK depends on SMP

2022-07-01 Thread Samuel Holland
An IRQ's effective affinity can only be different from its configured
affinity if there are multiple CPUs. Make it clear that this option is
only meaningful when SMP is enabled. Most of the relevant code in
irqdesc.c is already hidden behind CONFIG_SMP anyway.

Signed-off-by: Samuel Holland 
---

(no changes since v1)

 arch/arm/mach-hisi/Kconfig |  2 +-
 drivers/irqchip/Kconfig| 14 +++---
 kernel/irq/Kconfig |  1 +
 3 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/arch/arm/mach-hisi/Kconfig b/arch/arm/mach-hisi/Kconfig
index 75cccbd3f05f..7b3440687176 100644
--- a/arch/arm/mach-hisi/Kconfig
+++ b/arch/arm/mach-hisi/Kconfig
@@ -40,7 +40,7 @@ config ARCH_HIP04
select HAVE_ARM_ARCH_TIMER
select MCPM if SMP
select MCPM_QUAD_CLUSTER if SMP
-   select GENERIC_IRQ_EFFECTIVE_AFF_MASK
+   select GENERIC_IRQ_EFFECTIVE_AFF_MASK if SMP
help
  Support for Hisilicon HiP04 SoC family
 
diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig
index 5dd98a81efc8..462adac905a6 100644
--- a/drivers/irqchip/Kconfig
+++ b/drivers/irqchip/Kconfig
@@ -8,7 +8,7 @@ config IRQCHIP
 config ARM_GIC
bool
select IRQ_DOMAIN_HIERARCHY
-   select GENERIC_IRQ_EFFECTIVE_AFF_MASK
+   select GENERIC_IRQ_EFFECTIVE_AFF_MASK if SMP
 
 config ARM_GIC_PM
bool
@@ -34,7 +34,7 @@ config ARM_GIC_V3
bool
select IRQ_DOMAIN_HIERARCHY
select PARTITION_PERCPU
-   select GENERIC_IRQ_EFFECTIVE_AFF_MASK
+   select GENERIC_IRQ_EFFECTIVE_AFF_MASK if SMP
 
 config ARM_GIC_V3_ITS
bool
@@ -76,7 +76,7 @@ config ARMADA_370_XP_IRQ
bool
select GENERIC_IRQ_CHIP
select PCI_MSI if PCI
-   select GENERIC_IRQ_EFFECTIVE_AFF_MASK
+   select GENERIC_IRQ_EFFECTIVE_AFF_MASK if SMP
 
 config ALPINE_MSI
bool
@@ -112,7 +112,7 @@ config BCM6345_L1_IRQ
bool
select GENERIC_IRQ_CHIP
select IRQ_DOMAIN
-   select GENERIC_IRQ_EFFECTIVE_AFF_MASK
+   select GENERIC_IRQ_EFFECTIVE_AFF_MASK if SMP
 
 config BCM7038_L1_IRQ
tristate "Broadcom STB 7038-style L1/L2 interrupt controller driver"
@@ -120,7 +120,7 @@ config BCM7038_L1_IRQ
default ARCH_BRCMSTB || BMIPS_GENERIC
select GENERIC_IRQ_CHIP
select IRQ_DOMAIN
-   select GENERIC_IRQ_EFFECTIVE_AFF_MASK
+   select GENERIC_IRQ_EFFECTIVE_AFF_MASK if SMP
 
 config BCM7120_L2_IRQ
tristate "Broadcom STB 7120-style L2 interrupt controller driver"
@@ -179,7 +179,7 @@ config IRQ_MIPS_CPU
select GENERIC_IRQ_CHIP
select GENERIC_IRQ_IPI if SMP && SYS_SUPPORTS_MULTITHREADING
select IRQ_DOMAIN
-   select GENERIC_IRQ_EFFECTIVE_AFF_MASK
+   select GENERIC_IRQ_EFFECTIVE_AFF_MASK if SMP
 
 config CLPS711X_IRQCHIP
bool
@@ -294,7 +294,7 @@ config VERSATILE_FPGA_IRQ_NR
 config XTENSA_MX
bool
select IRQ_DOMAIN
-   select GENERIC_IRQ_EFFECTIVE_AFF_MASK
+   select GENERIC_IRQ_EFFECTIVE_AFF_MASK if SMP
 
 config XILINX_INTC
bool "Xilinx Interrupt Controller IP"
diff --git a/kernel/irq/Kconfig b/kernel/irq/Kconfig
index fc760d064a65..db3d174c53d4 100644
--- a/kernel/irq/Kconfig
+++ b/kernel/irq/Kconfig
@@ -24,6 +24,7 @@ config GENERIC_IRQ_SHOW_LEVEL
 
 # Supports effective affinity mask
 config GENERIC_IRQ_EFFECTIVE_AFF_MASK
+   depends on SMP
bool
 
 # Support for delayed migration from interrupt context
-- 
2.35.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 0/8] genirq: Provide real IRQ affinity masks in non-SMP configs

2022-07-01 Thread Samuel Holland
This series solves some inconsistency with how IRQ affinity masks are
handled between SMP and non-SMP configurations.

In non-SMP configs, an IRQ's true affinity is always cpumask_of(0), so
irq_{,data_}get_affinity_mask now return that, instead of returning an
uninitialized per-IRQ cpumask. This change makes iterating over the
affinity mask do the right thing in both SMP and non-SMP configurations.

To accomplish that:
 - patches 1-3 disable some library code that was broken anyway on !SMP
 - patches 4-7 refactor the code so that irq_{,data_}get_affinity_mask
   can return a const cpumask, since that is what cpumask_of provides
 - patch 8 drops the per-IRQ cpumask and replaces it with cpumask_of(0)

This series was split from the v2 series here, which uses the new
behavior in the RISC-V PLIC irqchip driver:

https://lore.kernel.org/lkml/20220616064028.57933-1-sam...@sholland.org/

Changes in v3:
 - New patch to fix build errors in uniprocessor MIPS configs
 - New patch to drop irq_init_effective_affinity
 - New patch to introduce irq_data_update_affinity
 - New patch to make the returned cpumasks const
 - Use cpumask_of(0) instead of cpu_possible_mask

Changes in v2:
 - New patch to prevent GENERIC_IRQ_IPI from being selected on !SMP

Samuel Holland (8):
  irqchip/mips-gic: Only register IPI domain when SMP is enabled
  genirq: GENERIC_IRQ_IPI depends on SMP
  genirq: GENERIC_IRQ_EFFECTIVE_AFF_MASK depends on SMP
  genirq: Drop redundant irq_init_effective_affinity
  genirq: Refactor accessors to use irq_data_get_affinity_mask
  genirq: Add and use an irq_data_update_affinity helper
  genirq: Return a const cpumask from irq_data_get_affinity_mask
  genirq: Provide an IRQ affinity mask in non-SMP configs

 arch/alpha/kernel/irq.c  |  2 +-
 arch/arm/mach-hisi/Kconfig   |  2 +-
 arch/ia64/kernel/iosapic.c   |  2 +-
 arch/ia64/kernel/irq.c   |  4 +-
 arch/ia64/kernel/msi_ia64.c  |  4 +-
 arch/mips/cavium-octeon/octeon-irq.c |  4 +-
 arch/parisc/kernel/irq.c |  2 +-
 arch/sh/kernel/irq.c |  7 +--
 arch/x86/hyperv/irqdomain.c  |  2 +-
 arch/xtensa/kernel/irq.c |  7 +--
 drivers/iommu/hyperv-iommu.c |  2 +-
 drivers/irqchip/Kconfig  | 19 +++
 drivers/irqchip/irq-bcm6345-l1.c |  4 +-
 drivers/irqchip/irq-mips-gic.c   | 80 +++-
 drivers/parisc/iosapic.c |  2 +-
 drivers/pci/controller/pci-hyperv.c  | 10 ++--
 drivers/sh/intc/chip.c   |  2 +-
 drivers/xen/events/events_base.c |  7 +--
 include/linux/irq.h  | 34 
 kernel/irq/Kconfig   |  2 +
 kernel/irq/chip.c|  8 +--
 kernel/irq/debugfs.c |  2 +-
 kernel/irq/ipi.c | 16 +++---
 kernel/irq/manage.c  | 10 +---
 24 files changed, 140 insertions(+), 94 deletions(-)

-- 
2.35.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 8/8] genirq: Provide an IRQ affinity mask in non-SMP configs

2022-07-01 Thread Samuel Holland
IRQ affinity masks are not allocated in uniprocessor configurations.
This requires special case non-SMP code in drivers for irqchips which
have per-CPU enable or mask registers.

Since IRQ affinity is always the same in a uniprocessor configuration,
we can provide a correct affinity mask without allocating one per IRQ.

By returning a real cpumask from irq_data_get_affinity_mask even when
SMP is disabled, irqchip drivers which iterate over that mask will
automatically do the right thing.

Signed-off-by: Samuel Holland 
---

Changes in v3:
 - Use cpumask_of(0) instead of cpu_possible_mask

 include/linux/irq.h | 8 
 1 file changed, 8 insertions(+)

diff --git a/include/linux/irq.h b/include/linux/irq.h
index 02073f7a156e..996e22744edd 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -151,7 +151,9 @@ struct irq_common_data {
 #endif
void*handler_data;
struct msi_desc *msi_desc;
+#ifdef CONFIG_SMP
cpumask_var_t   affinity;
+#endif
 #ifdef CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK
cpumask_var_t   effective_affinity;
 #endif
@@ -882,13 +884,19 @@ static inline int irq_data_get_node(struct irq_data *d)
 static inline
 const struct cpumask *irq_data_get_affinity_mask(struct irq_data *d)
 {
+#ifdef CONFIG_SMP
return d->common->affinity;
+#else
+   return cpumask_of(0);
+#endif
 }
 
 static inline void irq_data_update_affinity(struct irq_data *d,
const struct cpumask *m)
 {
+#ifdef CONFIG_SMP
cpumask_copy(d->common->affinity, m);
+#endif
 }
 
 static inline const struct cpumask *irq_get_affinity_mask(int irq)
-- 
2.35.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()

2022-07-01 Thread Saravana Kannan via iommu
On Fri, Jul 1, 2022 at 8:08 AM Sudeep Holla  wrote:
>
> Hi, Saravana,
>
> On Fri, Jul 01, 2022 at 01:26:12AM -0700, Saravana Kannan wrote:
>
> [...]
>
> > Can you check if this hack helps? If so, then I can think about
> > whether we can pick it up without breaking everything else. Copy-paste
> > tab mess up warning.
>
> Sorry for jumping in late and not even sure if this is right thread.
> I have not bisected anything yet, but I am seeing issues on my Juno R2
> with SCMI enabled power domains and Coresight AMBA devices.
>
> OF: amba_device_add() failed (-19) for /etf@2001
> OF: amba_device_add() failed (-19) for /tpiu@2003
> OF: amba_device_add() failed (-19) for /funnel@2004
> OF: amba_device_add() failed (-19) for /etr@2007
> OF: amba_device_add() failed (-19) for /stm@2010
> OF: amba_device_add() failed (-19) for /replicator@2012
> OF: amba_device_add() failed (-19) for /cpu-debug@2201
> OF: amba_device_add() failed (-19) for /etm@2204
> OF: amba_device_add() failed (-19) for /cti@2202
> OF: amba_device_add() failed (-19) for /funnel@220c
> OF: amba_device_add() failed (-19) for /cpu-debug@2211
> OF: amba_device_add() failed (-19) for /etm@2214
> OF: amba_device_add() failed (-19) for /cti@2212
> OF: amba_device_add() failed (-19) for /cpu-debug@2301
> OF: amba_device_add() failed (-19) for /etm@2304
> OF: amba_device_add() failed (-19) for /cti@2302
> OF: amba_device_add() failed (-19) for /funnel@230c
> OF: amba_device_add() failed (-19) for /cpu-debug@2311
> OF: amba_device_add() failed (-19) for /etm@2314
> OF: amba_device_add() failed (-19) for /cti@2312
> OF: amba_device_add() failed (-19) for /cpu-debug@2321
> OF: amba_device_add() failed (-19) for /etm@2324
> OF: amba_device_add() failed (-19) for /cti@2322
> OF: amba_device_add() failed (-19) for /cpu-debug@2331
> OF: amba_device_add() failed (-19) for /etm@2334
> OF: amba_device_add() failed (-19) for /cti@2332
> OF: amba_device_add() failed (-19) for /cti@2002
> OF: amba_device_add() failed (-19) for /cti@2011
> OF: amba_device_add() failed (-19) for /funnel@2013
> OF: amba_device_add() failed (-19) for /etf@2014
> OF: amba_device_add() failed (-19) for /funnel@2015
> OF: amba_device_add() failed (-19) for /cti@2016
>
> These are working fine with deferred probe in the mainline.
> I tried the hack you have suggested here(rather Tony's version),

Thanks for trying that.

> also
> tried with fw_devlink=0 and fw_devlink=1

0 and 1 aren't valid input to fw_devlink. But yeah, I don't expect
disabling it to make anything better.

> && fw_devlink.strict=0
> No change in the behaviour.
>
> The DTS are in arch/arm64/boot/dts/arm/juno-*-scmi.dts and there
> coresight devices are mostly in juno-cs-r1r2.dtsi

Thanks

> Let me know if there is anything obvious or you want me to bisect which
> means I need more time. I can do that next week.

I'll let you know once I poke at the DTS. We need to figure out why
fw_devlink wasn't blocking these from getting to the error (same as in
Tony's case). But since these are amba devices, I think I have some
guesses.

This is an old series that had some issues in some cases and I haven't
gotten around to looking at it. You can give that a shot if you can
apply it to a recent tree.
https://lore.kernel.org/lkml/20210304195101.3843496-1-sarava...@google.com/

After looking at that old patch again, I think I know what's going on.
For normal devices, the pm domain attach happens AFTER the device is
added and fw_devlink has had a chance to set up device links. And if
the suppliers aren't ready, really_probe() won't get as far as
dev_pm_domain_attach(). But for amba, the clock and pm domain
suppliers are "grabbed" before adding the device.

So with that old patch + always returning -EPROBE_DEFER in
amba_device_add() if amba_read_periphid() fails should fix your issue.

-Saravana
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 1/5] iommu: Return -EMEDIUMTYPE for incompatible domain and device/group

2022-07-01 Thread Nicolin Chen via iommu
On Fri, Jul 01, 2022 at 07:17:38PM +0100, Robin Murphy wrote:
> External email: Use caution opening links or attachments
> 
> 
> On 01/07/2022 5:43 pm, Nicolin Chen wrote:
> > On Fri, Jul 01, 2022 at 11:21:48AM +0100, Robin Murphy wrote:
> > 
> > > > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
> > > > b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > > > index 2ed3594f384e..072cac5ab5a4 100644
> > > > --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > > > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > > > @@ -1135,10 +1135,8 @@ static int arm_smmu_attach_dev(struct 
> > > > iommu_domain *domain, struct device *dev)
> > > >struct arm_smmu_device *smmu;
> > > >int ret;
> > > > 
> > > > - if (!fwspec || fwspec->ops != _smmu_ops) {
> > > > - dev_err(dev, "cannot attach to SMMU, is it on the same 
> > > > bus?\n");
> > > > - return -ENXIO;
> > > > - }
> > > > + if (!fwspec || fwspec->ops != _smmu_ops)
> > > > + return -EMEDIUMTYPE;
> > > 
> > > This is the wrong check, you want the "if (smmu_domain->smmu != smmu)"
> > > condition further down. If this one fails it's effectively because the
> > > device doesn't have an IOMMU at all, and similar to patch #3 it will be
> > 
> > Thanks for the review! I will fix that. The "on the same bus" is
> > quite eye-catching.
> > 
> > > removed once the core code takes over properly (I even have both those
> > > patches written now!)
> > 
> > Actually in my v1 the proposal for ops check returned -EMEDIUMTYPE
> > also upon an ops mismatch, treating that too as an incompatibility.
> > Do you mean that we should have fine-grained it further?
> 
> On second look, I think this particular check was already entirely
> redundant by the time I made the fwspec conversion to it, oh well. Since
> it remains harmless for the time being, let's just ignore it entirely
> until we can confidently say goodbye to the whole lot[1].

That looks cleaner!

> I don't think there's any need to differentiate an instance mismatch
> from a driver mismatch, once the latter becomes realistically possible,
> mostly due to iommu_domain_alloc() also having to become device-aware to
> know which driver to allocate from. Thus as far as a user is concerned,
> if attaching a device to an existing domain fails with -EMEDIUMTYPE,
> allocating a new domain using the given device, and attaching to that,
> can be expected to succeed, regardless of why the original attempt was
> rejected. In fact even in the theoretical different-driver-per-bus model
> the same principle still holds up.

I see. Thanks for the explanation. 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 1/5] iommu: Return -EMEDIUMTYPE for incompatible domain and device/group

2022-07-01 Thread Robin Murphy

On 01/07/2022 5:43 pm, Nicolin Chen wrote:

On Fri, Jul 01, 2022 at 11:21:48AM +0100, Robin Murphy wrote:


diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 2ed3594f384e..072cac5ab5a4 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -1135,10 +1135,8 @@ static int arm_smmu_attach_dev(struct iommu_domain 
*domain, struct device *dev)
   struct arm_smmu_device *smmu;
   int ret;

- if (!fwspec || fwspec->ops != _smmu_ops) {
- dev_err(dev, "cannot attach to SMMU, is it on the same bus?\n");
- return -ENXIO;
- }
+ if (!fwspec || fwspec->ops != _smmu_ops)
+ return -EMEDIUMTYPE;


This is the wrong check, you want the "if (smmu_domain->smmu != smmu)"
condition further down. If this one fails it's effectively because the
device doesn't have an IOMMU at all, and similar to patch #3 it will be


Thanks for the review! I will fix that. The "on the same bus" is
quite eye-catching.


removed once the core code takes over properly (I even have both those
patches written now!)


Actually in my v1 the proposal for ops check returned -EMEDIUMTYPE
also upon an ops mismatch, treating that too as an incompatibility.
Do you mean that we should have fine-grained it further?


On second look, I think this particular check was already entirely 
redundant by the time I made the fwspec conversion to it, oh well. Since 
it remains harmless for the time being, let's just ignore it entirely 
until we can confidently say goodbye to the whole lot[1].


I don't think there's any need to differentiate an instance mismatch 
from a driver mismatch, once the latter becomes realistically possible, 
mostly due to iommu_domain_alloc() also having to become device-aware to 
know which driver to allocate from. Thus as far as a user is concerned, 
if attaching a device to an existing domain fails with -EMEDIUMTYPE, 
allocating a new domain using the given device, and attaching to that, 
can be expected to succeed, regardless of why the original attempt was 
rejected. In fact even in the theoretical different-driver-per-bus model 
the same principle still holds up.


Thanks,
Robin.

[1] 
https://gitlab.arm.com/linux-arm/linux-rm/-/commit/aa4accfa4a10e92daad0d51095918e8a89014393

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 1/5] iommu: Return -EMEDIUMTYPE for incompatible domain and device/group

2022-07-01 Thread Nicolin Chen via iommu
On Fri, Jul 01, 2022 at 11:21:48AM +0100, Robin Murphy wrote:

> > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
> > b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > index 2ed3594f384e..072cac5ab5a4 100644
> > --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > @@ -1135,10 +1135,8 @@ static int arm_smmu_attach_dev(struct iommu_domain 
> > *domain, struct device *dev)
> >   struct arm_smmu_device *smmu;
> >   int ret;
> > 
> > - if (!fwspec || fwspec->ops != _smmu_ops) {
> > - dev_err(dev, "cannot attach to SMMU, is it on the same 
> > bus?\n");
> > - return -ENXIO;
> > - }
> > + if (!fwspec || fwspec->ops != _smmu_ops)
> > + return -EMEDIUMTYPE;
> 
> This is the wrong check, you want the "if (smmu_domain->smmu != smmu)"
> condition further down. If this one fails it's effectively because the
> device doesn't have an IOMMU at all, and similar to patch #3 it will be

Thanks for the review! I will fix that. The "on the same bus" is
quite eye-catching.

> removed once the core code takes over properly (I even have both those
> patches written now!)

Actually in my v1 the proposal for ops check returned -EMEDIUMTYPE
also upon an ops mismatch, treating that too as an incompatibility.
Do you mean that we should have fine-grained it further?
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v13 0/9] ACPI/IORT: Support for IORT RMR node

2022-07-01 Thread Shameerali Kolothum Thodi via iommu



> -Original Message-
> From: Shameerali Kolothum Thodi
> Sent: 28 June 2022 09:00
> To: 'Robin Murphy' ; j...@8bytes.org;
> linux-arm-ker...@lists.infradead.org; linux-a...@vger.kernel.org;
> iommu@lists.linux-foundation.org
> Cc: j...@solid-run.com; Linuxarm ;
> h...@infradead.org; Guohanjun (Hanjun Guo) ;
> sami.muja...@arm.com; w...@kernel.org; wanghuiqiang
> ; lpieral...@kernel.org; Steven Price
> ; lorenzo.pieral...@gmail.com
> Subject: RE: [PATCH v13 0/9] ACPI/IORT: Support for IORT RMR node
> > > Hi Will/Robin,
> > >
> > > Appreciate, if you could please take a look at the remaining SMMU
> > > related
> > > patches(7-9) and provide your approval?
> >
> > I said v12 looked fine, but for the avoidance of doubt, here it is
> > again, as formally as can be:
> >
> > Acked-by: Robin Murphy 
> 
> Thanks Robin.
> 
> Hi Joerg,
> 
> Now that we have all the required acks, could you please pick this series via
> IOMMU tree?

Hi Will,

Since Joerg hasn't replied yet, just wondering could you please take it through 
ARM
SMMU tree if that makes sense? Don't want to miss the 5.20 merge window for this
series.

Thanks,
Shameer


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/arm-smmu-v3: Fix undefined behavior in GBPA_UPDATE

2022-07-01 Thread xenia

On 7/1/22 17:34, Will Deacon wrote:

On Thu, Jun 30, 2022 at 09:39:59AM +0300, Xenia Ragiadakou wrote:

The expression 1 << 31 results in undefined behaviour because the type of
integer constant 1 is (signed) int and the result of shifting 1 by 31 bits
is not representable in the (signed) int type.

Change the type of 1 to unsigned int by adding the U suffix.

Signed-off-by: Xenia Ragiadakou 
---
  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index cd48590ada30..44fbd499edea 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -96,7 +96,7 @@
  #define CR2_E2H   (1 << 0)
  
  #define ARM_SMMU_GBPA			0x44

-#define GBPA_UPDATE(1 << 31)
+#define GBPA_UPDATE(1U << 31)

There are loads of these kicking around in the kernel sources and we compile
with -fno-strict-overflow.

If you really want to change these, then let's use the BIT() macro instead,
but I think it's really just churn.

Will

Hi Will,

I thought that since in commit 587e6c10a7ce89a5924fdbeff2ec524fbd6a124b 
there was a similar fix to Q_OVERFLOW_FLAG (see below)


--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -183,7 +183,7 @@

 #define Q_IDX(llq, p)  ((p) & ((1 << 
(llq)->max_n_shift) - 1))

 #define Q_WRP(llq, p)  ((p) & (1 << (llq)->max_n_shift))
-#define Q_OVERFLOW_FLAG    (1 << 31)
+#define Q_OVERFLOW_FLAG    (1U << 31)
 #define Q_OVF(p)   ((p) & Q_OVERFLOW_FLAG)
 #define Q_ENT(q, p)    ((q)->base +    \
 Q_IDX(&((q)->llq), p) *    \

then it would make sense to fix GBPA_UPDATE in the same way.

Xenia

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()

2022-07-01 Thread Sudeep Holla
Hi, Saravana,

On Fri, Jul 01, 2022 at 01:26:12AM -0700, Saravana Kannan wrote:

[...]

> Can you check if this hack helps? If so, then I can think about
> whether we can pick it up without breaking everything else. Copy-paste
> tab mess up warning.

Sorry for jumping in late and not even sure if this is right thread.
I have not bisected anything yet, but I am seeing issues on my Juno R2
with SCMI enabled power domains and Coresight AMBA devices.

OF: amba_device_add() failed (-19) for /etf@2001
OF: amba_device_add() failed (-19) for /tpiu@2003
OF: amba_device_add() failed (-19) for /funnel@2004
OF: amba_device_add() failed (-19) for /etr@2007
OF: amba_device_add() failed (-19) for /stm@2010
OF: amba_device_add() failed (-19) for /replicator@2012
OF: amba_device_add() failed (-19) for /cpu-debug@2201
OF: amba_device_add() failed (-19) for /etm@2204
OF: amba_device_add() failed (-19) for /cti@2202
OF: amba_device_add() failed (-19) for /funnel@220c
OF: amba_device_add() failed (-19) for /cpu-debug@2211
OF: amba_device_add() failed (-19) for /etm@2214
OF: amba_device_add() failed (-19) for /cti@2212
OF: amba_device_add() failed (-19) for /cpu-debug@2301
OF: amba_device_add() failed (-19) for /etm@2304
OF: amba_device_add() failed (-19) for /cti@2302
OF: amba_device_add() failed (-19) for /funnel@230c
OF: amba_device_add() failed (-19) for /cpu-debug@2311
OF: amba_device_add() failed (-19) for /etm@2314
OF: amba_device_add() failed (-19) for /cti@2312
OF: amba_device_add() failed (-19) for /cpu-debug@2321
OF: amba_device_add() failed (-19) for /etm@2324
OF: amba_device_add() failed (-19) for /cti@2322
OF: amba_device_add() failed (-19) for /cpu-debug@2331
OF: amba_device_add() failed (-19) for /etm@2334
OF: amba_device_add() failed (-19) for /cti@2332
OF: amba_device_add() failed (-19) for /cti@2002
OF: amba_device_add() failed (-19) for /cti@2011
OF: amba_device_add() failed (-19) for /funnel@2013
OF: amba_device_add() failed (-19) for /etf@2014
OF: amba_device_add() failed (-19) for /funnel@2015
OF: amba_device_add() failed (-19) for /cti@2016

These are working fine with deferred probe in the mainline.
I tried the hack you have suggested here(rather Tony's version), also
tried with fw_devlink=0 and fw_devlink=1 && fw_devlink.strict=0
No change in the behaviour.

The DTS are in arch/arm64/boot/dts/arm/juno-*-scmi.dts and there
coresight devices are mostly in juno-cs-r1r2.dtsi

Let me know if there is anything obvious or you want me to bisect which
means I need more time. I can do that next week.

-- 
Regards,
Sudeep
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/arm-smmu-v3: Fix undefined behavior in GBPA_UPDATE

2022-07-01 Thread Will Deacon
On Thu, Jun 30, 2022 at 09:39:59AM +0300, Xenia Ragiadakou wrote:
> The expression 1 << 31 results in undefined behaviour because the type of
> integer constant 1 is (signed) int and the result of shifting 1 by 31 bits
> is not representable in the (signed) int type.
> 
> Change the type of 1 to unsigned int by adding the U suffix.
> 
> Signed-off-by: Xenia Ragiadakou 
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index cd48590ada30..44fbd499edea 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -96,7 +96,7 @@
>  #define CR2_E2H  (1 << 0)
>  
>  #define ARM_SMMU_GBPA0x44
> -#define GBPA_UPDATE  (1 << 31)
> +#define GBPA_UPDATE  (1U << 31)

There are loads of these kicking around in the kernel sources and we compile
with -fno-strict-overflow.

If you really want to change these, then let's use the BIT() macro instead,
but I think it's really just churn.

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()

2022-07-01 Thread Tony Lindgren
* Saravana Kannan  [220701 08:21]:
> On Fri, Jul 1, 2022 at 1:10 AM Saravana Kannan  wrote:
> >
> > On Thu, Jun 30, 2022 at 11:12 PM Tony Lindgren  wrote:
> > >
> > > * Tony Lindgren  [220701 08:33]:
> > > > * Saravana Kannan  [220630 23:25]:
> > > > > On Thu, Jun 30, 2022 at 4:26 PM Rob Herring  wrote:
> > > > > >
> > > > > > On Thu, Jun 30, 2022 at 5:11 PM Saravana Kannan 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Mon, Jun 27, 2022 at 2:10 AM Tony Lindgren  
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > * Saravana Kannan  [220623 08:17]:
> > > > > > > > > On Thu, Jun 23, 2022 at 12:01 AM Tony Lindgren 
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > * Saravana Kannan  [220622 19:05]:
> > > > > > > > > > > On Tue, Jun 21, 2022 at 9:59 PM Tony Lindgren 
> > > > > > > > > > >  wrote:
> > > > > > > > > > > > This issue is no directly related fw_devlink. It is a 
> > > > > > > > > > > > side effect of
> > > > > > > > > > > > removing driver_deferred_probe_check_state(). We no 
> > > > > > > > > > > > longer return
> > > > > > > > > > > > -EPROBE_DEFER at the end of 
> > > > > > > > > > > > driver_deferred_probe_check_state().
> > > > > > > > > > >
> > > > > > > > > > > Yes, I understand the issue. But 
> > > > > > > > > > > driver_deferred_probe_check_state()
> > > > > > > > > > > was deleted because fw_devlink=on should have short 
> > > > > > > > > > > circuited the
> > > > > > > > > > > probe attempt with an  -EPROBE_DEFER before reaching the 
> > > > > > > > > > > bus/driver
> > > > > > > > > > > probe function and hitting this -ENOENT failure. That's 
> > > > > > > > > > > why I was
> > > > > > > > > > > asking the other questions.
> > > > > > > > > >
> > > > > > > > > > OK. So where is the -EPROBE_DEFER supposed to happen without
> > > > > > > > > > driver_deferred_probe_check_state() then?
> > > > > > > > >
> > > > > > > > > device_links_check_suppliers() call inside really_probe() 
> > > > > > > > > would short
> > > > > > > > > circuit and return an -EPROBE_DEFER if the device links are 
> > > > > > > > > created as
> > > > > > > > > expected.
> > > > > > > >
> > > > > > > > OK
> > > > > > > >
> > > > > > > > > > Hmm so I'm not seeing any supplier for the top level ocp 
> > > > > > > > > > device in
> > > > > > > > > > the booting case without your patches. I see the suppliers 
> > > > > > > > > > for the
> > > > > > > > > > ocp child device instances only.
> > > > > > > > >
> > > > > > > > > Hmmm... this is strange (that the device link isn't there), 
> > > > > > > > > but this
> > > > > > > > > is what I suspected.
> > > > > > > >
> > > > > > > > Yup, maybe it's because of the supplier being a device in the 
> > > > > > > > child
> > > > > > > > interconnect for the ocp.
> > > > > > >
> > > > > > > Ugh... yeah, this is why the normal (not SYNC_STATE_ONLY) device 
> > > > > > > link
> > > > > > > isn't being created.
> > > > > > >
> > > > > > > So the aggregated view is something like (I had to set tabs = 4 
> > > > > > > space
> > > > > > > to fit it within 80 cols):
> > > > > > >
> > > > > > > ocp: ocp { <= Consumer
> > > > > > > compatible = "simple-pm-bus";
> > > > > > > power-domains = <_per>; <=== Supplier ref
> > > > > > >
> > > > > > > l4_wkup: interconnect@44c0 {
> > > > > > > compatible = "ti,am33xx-l4-wkup", "simple-pm-bus";
> > > > > > >
> > > > > > > segment@20 {  /* 0x44e0 */
> > > > > > > compatible = "simple-pm-bus";
> > > > > > >
> > > > > > > target-module@0 { /* 0x44e0, ap 8 58.0 */
> > > > > > > compatible = "ti,sysc-omap4", "ti,sysc";
> > > > > > >
> > > > > > > prcm: prcm@0 {
> > > > > > > compatible = "ti,am3-prcm", "simple-bus";
> > > > > > >
> > > > > > > prm_per: prm@c00 { <= Actual 
> > > > > > > Supplier
> > > > > > > compatible = "ti,am3-prm-inst", 
> > > > > > > "ti,omap-prm-inst";
> > > > > > > };
> > > > > > > };
> > > > > > > };
> > > > > > > };
> > > > > > > };
> > > > > > > };
> > > > > > >
> > > > > > > The power-domain supplier is the great-great-great-grand-child of 
> > > > > > > the
> > > > > > > consumer. It's not clear to me how this is valid. What does it 
> > > > > > > even
> > > > > > > mean?
> > > > > > >
> > > > > > > Rob, is this considered a valid DT?
> > > > > >
> > > > > > Valid DT for broken h/w.
> > > > >
> > > > > I'm not sure even in that case it's valid. When the parent device is
> > > > > in reset (when the SoC is coming out of reset), there's no way the
> > > > > descendant is functional. And if the descendant is not functional, how
> > > > > is the parent device powered up? This just feels like an incorrect
> > > > > representation of the real h/w.
> > > >
> 

Re: New IOMMU mailing list coming

2022-07-01 Thread Jörg Rödel
On Tue, Jun 14, 2022 at 10:30:21AM +0200, Jörg Rödel wrote:
> Hi all,
> 
> after several problems with the current IOMMU mailing list (no DKIM
> support, poor b4 interoperability) we have decided to move the IOMMU
> development discussions to a new list hosted at lists.linux.dev.
> 
> The new list is up and running already, to subscribe please send an
> email to iommu+subscr...@lists.linux.dev. It is not yet archived, but
> there will be a hard switch between the old and the new list on
> 
>   July 5th, 2022
> 
> After that date the old list will no longer work and the archive will
> switch to the new mailing list.
> 
> Sorry for any inconveniences this might cause.

Gentle reminder that the old IOMMU mailing list at
iommu@lists.linux-foundation.org

will no longer be archived from Tuesday, July 5th on and will disappear
soon after. If not already done, please subscribe to the new mailing
list at

iomm+subscr...@lists.linux.dev

Thanks and see you over there!


Joerg
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/iova: change IOVA_MAG_SIZE to 127 to save memory

2022-07-01 Thread Robin Murphy

On 2022-07-01 12:33, John Garry wrote:

On 01/07/2022 04:56, Feng Tang wrote:

inclination.


ok, what you are saying sounds reasonable. I just remember that when we
analyzed the longterm aging issue that we concluded that the FQ size 
and its
relation to the magazine size was a factor and this change makes me a 
little

worried about new issues. Better the devil you know and all that...

Anyway, if I get some time I might do some testing to see if this 
change has

any influence.

Another thought is if we need even store the size in the 
iova_magazine? mags
in the depot are always full. As such, we only need worry about mags 
loaded

in the cpu rcache and their sizes, so maybe we could have something like
this:

struct iova_magazine {
-   unsigned long size;
    unsigned long pfns[IOVA_MAG_SIZE];
};

@@ -631,6 +630,8 @@ struct iova_cpu_rcache {
    spinlock_t lock;
    struct iova_magazine *loaded;
    struct iova_magazine *prev;
+   int loaded_size;
+   int prev_size;
};

I haven't tried to implement it though..

I have very few knowledge of iova, so you can chose what's the better
solution. I just wanted to raise the problem and will be happy to see
it solved:)


I quickly tested your patch for performance and saw no noticeable 
difference, which is no surprise.


But I'll defer to Robin if he thinks that your patch is a better 
solution - I would guess that he does. For me personally I would prefer 
that this value was not changed, as I mentioned before.


This idea is interesting, but it would mean a bit more fiddly work to 
keep things in sync when magazines are allocated, freed and swapped 
around. It seems like the kind of non-obvious thing that might make 
sense if it gave a significant improvement in cache locality or 
something like that, but for simply fixing an allocation size it feels a 
bit too wacky.


From my perspective, indeed I'd rather do the simple thing for now to 
address the memory wastage issue directly, then we can do the deeper 
performance analysis on top to see if further tweaking of magazine sizes 
and/or design is justified.


Cheers,
Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH] iommu/iova: change IOVA_MAG_SIZE to 127 to save memory

2022-07-01 Thread John Garry via iommu

On 01/07/2022 04:56, Feng Tang wrote:

inclination.


ok, what you are saying sounds reasonable. I just remember that when we
analyzed the longterm aging issue that we concluded that the FQ size and its
relation to the magazine size was a factor and this change makes me a little
worried about new issues. Better the devil you know and all that...

Anyway, if I get some time I might do some testing to see if this change has
any influence.

Another thought is if we need even store the size in the iova_magazine? mags
in the depot are always full. As such, we only need worry about mags loaded
in the cpu rcache and their sizes, so maybe we could have something like
this:

struct iova_magazine {
-   unsigned long size;
unsigned long pfns[IOVA_MAG_SIZE];
};

@@ -631,6 +630,8 @@ struct iova_cpu_rcache {
spinlock_t lock;
struct iova_magazine *loaded;
struct iova_magazine *prev;
+   int loaded_size;
+   int prev_size;
};

I haven't tried to implement it though..

I have very few knowledge of iova, so you can chose what's the better
solution. I just wanted to raise the problem and will be happy to see
it solved:)


I quickly tested your patch for performance and saw no noticeable 
difference, which is no surprise.


But I'll defer to Robin if he thinks that your patch is a better 
solution - I would guess that he does. For me personally I would prefer 
that this value was not changed, as I mentioned before.


thanks,
John
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 1/5] iommu: Return -EMEDIUMTYPE for incompatible domain and device/group

2022-07-01 Thread Robin Murphy

On 2022-06-30 21:36, Nicolin Chen wrote:

Cases like VFIO wish to attach a device to an existing domain that was
not allocated specifically from the device. This raises a condition
where the IOMMU driver can fail the domain attach because the domain and
device are incompatible with each other.

This is a soft failure that can be resolved by using a different domain.

Provide a dedicated errno from the IOMMU driver during attach that the
reason attached failed is because of domain incompatability. EMEDIUMTYPE
is chosen because it is never used within the iommu subsystem today and
evokes a sense that the 'medium' aka the domain is incompatible.

VFIO can use this to know attach is a soft failure and it should continue
searching. Otherwise the attach will be a hard failure and VFIO will
return the code to userspace.

Update all drivers to return EMEDIUMTYPE in their failure paths that are
related to domain incompatability. Also remove adjacent error prints for
these soft failures, to prevent a kernel log spam, since -EMEDIUMTYPE is
clear enough to indicate an incompatability error.

Add kdocs describing this behavior.

Suggested-by: Jason Gunthorpe 
Reviewed-by: Kevin Tian 
Signed-off-by: Nicolin Chen 
---

[...]

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 2ed3594f384e..072cac5ab5a4 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -1135,10 +1135,8 @@ static int arm_smmu_attach_dev(struct iommu_domain 
*domain, struct device *dev)
struct arm_smmu_device *smmu;
int ret;
  
-	if (!fwspec || fwspec->ops != _smmu_ops) {

-   dev_err(dev, "cannot attach to SMMU, is it on the same bus?\n");
-   return -ENXIO;
-   }
+   if (!fwspec || fwspec->ops != _smmu_ops)
+   return -EMEDIUMTYPE;


This is the wrong check, you want the "if (smmu_domain->smmu != smmu)" 
condition further down. If this one fails it's effectively because the 
device doesn't have an IOMMU at all, and similar to patch #3 it will be 
removed once the core code takes over properly (I even have both those 
patches written now!)


Thanks,
Robin.


/*
 * FIXME: The arch/arm DMA API code tries to attach devices to its own

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 2/2] serial: Set probe_no_timeout for all DT based drivers

2022-07-01 Thread Andy Shevchenko
On Fri, Jul 1, 2022 at 3:28 AM Saravana Kannan  wrote:
>
> With commit 71066545b48e ("driver core: Set fw_devlink.strict=1 by
> default") the probing of TTY consoles could get delayed if they have
> optional suppliers that are listed in DT, but those suppliers don't
> probe by the time kernel boot finishes. The console devices will probe
> eventually after driver_probe_timeout expires.
>
> However, since consoles are often used for debugging kernel issues, it
> does not make sense to delay their probe. So, set the newly added
> probe_no_timeout flag for all serial drivers that at DT based. This way,
> fw_devlink will know not to delay the probing of the consoles past
> kernel boot.

Same question, do you think only serial drivers need that?

-- 
With Best Regards,
Andy Shevchenko
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 0/9] Add dynamic iommu backed bounce buffers

2022-07-01 Thread Niklas Schnelle
On Fri, 2022-05-27 at 10:25 +0900, David Stevens wrote:
> On Tue, May 24, 2022 at 9:27 PM Niklas Schnelle  
> wrote:
> > On Fri, 2021-08-06 at 19:34 +0900, David Stevens wrote:
> > > From: David Stevens 
> > > 
> > > This patch series adds support for per-domain dynamic pools of iommu
> > > bounce buffers to the dma-iommu API. This allows iommu mappings to be
> > > reused while still maintaining strict iommu protection.
> > > 
> > > This bounce buffer support is used to add a new config option that, when
> > > enabled, causes all non-direct streaming mappings below a configurable
> > > size to go through the bounce buffers. This serves as an optimization on
> > > systems where manipulating iommu mappings is very expensive. For
> > > example, virtio-iommu operations in a guest on a linux host require a
> > > vmexit, involvement the VMM, and a VFIO syscall. For relatively small
> > > DMA operations, memcpy can be significantly faster.
> > > 
> > > As a performance comparison, on a device with an i5-10210U, I ran fio
> > > with a VFIO passthrough NVMe drive and virtio-iommu with '--direct=1
> > > --rw=read --ioengine=libaio --iodepth=64' and block sizes 4k, 16k, 64k,
> > > and 128k. Test throughput increased by 2.8x, 4.7x, 3.6x, and 3.6x. Time
> > > spent in iommu_dma_unmap_(page|sg) per GB processed decreased by 97%,
> > > 94%, 90%, and 87%. Time spent in iommu_dma_map_(page|sg) decreased
> > > by >99%, as bounce buffers don't require syncing here in the read case.
> > > Running with multiple jobs doesn't serve as a useful performance
> > > comparison because virtio-iommu and vfio_iommu_type1 both have big
> > > locks that significantly limit mulithreaded DMA performance.
> > > 
> > > These pooled bounce buffers are also used for subgranule mappings with
> > > untrusted devices, replacing the single use bounce buffers used
> > > currently. The biggest difference here is that the new implementation
> > > maps a whole sglist using a single bounce buffer. The new implementation
> > > does not support using bounce buffers for only some segments of the
> > > sglist, so it may require more copying. However, the current
> > > implementation requires per-segment iommu map/unmap operations for all
> > > untrusted sglist mappings (fully aligned sglists included). On a
> > > i5-10210U laptop with the internal NVMe drive made to appear untrusted,
> > > fio --direct=1 --rw=read --ioengine=libaio --iodepth=64 --bs=64k showed
> > > a statistically significant decrease in CPU load from 2.28% -> 2.17%
> > > with the new iommu bounce buffer optimization enabled.
> > > 
> > > Each domain's buffer pool is split into multiple power-of-2 size
> > > classes. Each class allocates a fixed number of buffer slot metadata. A
> > > large iova range is allocated, and each slot is assigned an iova from
> > > the range. This allows the iova to be easily mapped back to the slot,
> > > and allows the critical section of most pool operations to be constant
> > > time. The one exception is finding a cached buffer to reuse. These are
> > > only separated according to R/W permissions - the use of other
> > > permissions such as IOMMU_PRIV may require a linear search through the
> > > cache. However, these other permissions are rare and likely exhibit high
> > > locality, so the should not be a bottleneck in practice.
> > > 
> > > Since untrusted devices may require bounce buffers, each domain has a
> > > fallback rbtree to manage single use buffers. This may be necessary if a
> > > very large number of DMA operations are simultaneously in-flight, or for
> > > very large individual DMA operations.
> > > 
> > > This patch set does not use swiotlb. There are two primary ways in which
> > > swiotlb isn't compatible with per-domain buffer pools. First, swiotlb
> > > allocates buffers to be compatible with a single device, whereas
> > > per-domain buffer pools don't handle that during buffer allocation as a
> > > single buffer may end up being used by multiple devices. Second, swiotlb
> > > allocation establishes the original to bounce buffer mapping, which
> > > again doesn't work if buffers can be reused. Effectively the only code
> > > that can be shared between the two use cases is allocating slots from
> > > the swiotlb's memory. However, given that we're going to be allocating
> > > memory for use with an iommu, allocating memory from a block of memory
> > > explicitly set aside to deal with a lack of iommu seems kind of
> > > contradictory. At best there might be a small performance improvement if
> > > wiotlb allocation is faster than regular page allocation, but buffer
> > > allocation isn't on the hot path anyway.
> > > 
> > > Not using the swiotlb has the benefit that memory doesn't have to be
> > > preallocated. Instead, bounce buffers consume memory only for in-flight
> > > dma transactions (ignoring temporarily cached buffers), which is the
> > > smallest amount possible. This makes it easier to use bounce buffers as
> > > an 

Re: [PATCH v3 10/11] iommu/vt-d: Use device_domain_lock accurately

2022-07-01 Thread Baolu Lu

On 2022/7/1 16:15, Tian, Kevin wrote:

From: Lu Baolu 
Sent: Wednesday, June 29, 2022 3:47 PM

+   spin_lock_irqsave(_domain_lock, flags);
list_for_each_entry(info, >devices, link) {
-   if (!info->dev)
-   continue;
-


suppose you can replace all spin_lock_irqsave() with spin_lock()
in patch5 instead of leaving some replacement to next patch.



Make sense. I will update the series.

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 4/5] scsi: scsi_transport_sas: Cap shost max_sectors according to DMA optimal limit

2022-07-01 Thread John Garry via iommu

On 01/07/2022 00:49, Damien Le Moal wrote:
  
+	if (dma_dev) {

+   shost->max_sectors = min_t(unsigned int, shost->max_sectors,
+   dma_opt_mapping_size(dma_dev) >> SECTOR_SHIFT);
+   }


Hi Damien,

> Hmm... shost->max_sectors becomes the max_hw_sectors limit for the block
> dev. So using dma_max_mapping_size(dma_dev) for that limit makes sense.
> Shouldn't dma_opt_mapping_size(dma_dev) be used to limit only the default
> "soft" limit (queue max_sectors limit) instead of the hard limit ?
>

Sure, it would sensible to use dma_opt_mapping_size() to limit the 
default queue max sectors limit, while dma_max_mapping_size() limits the 
host max sectors. But I didn't see in practice how limiting the shost 
max sectors to dma_opt_mapping_size() makes a difference:


- block queue max_hw_sectors_kb file is read-only, so we cannot change 
the queue max sectors from there


- And no SAS driver actually tries to modify upwards from the default.
I do note that USB storage driver as an example of a scsi driver which 
does (modify from shost max sectors): see scsiglue.c::slave_configure()


Finally there is no common method to limit the default request queue max 
sectors for those SAS drivers - I would need to add this limit in each 
of their slave_configure callbacks, and I didn't think that its worth it.


Thanks,
John

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()

2022-07-01 Thread Saravana Kannan via iommu
On Fri, Jul 1, 2022 at 1:10 AM Saravana Kannan  wrote:
>
> On Thu, Jun 30, 2022 at 11:12 PM Tony Lindgren  wrote:
> >
> > * Tony Lindgren  [220701 08:33]:
> > > * Saravana Kannan  [220630 23:25]:
> > > > On Thu, Jun 30, 2022 at 4:26 PM Rob Herring  wrote:
> > > > >
> > > > > On Thu, Jun 30, 2022 at 5:11 PM Saravana Kannan 
> > > > >  wrote:
> > > > > >
> > > > > > On Mon, Jun 27, 2022 at 2:10 AM Tony Lindgren  
> > > > > > wrote:
> > > > > > >
> > > > > > > * Saravana Kannan  [220623 08:17]:
> > > > > > > > On Thu, Jun 23, 2022 at 12:01 AM Tony Lindgren 
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > * Saravana Kannan  [220622 19:05]:
> > > > > > > > > > On Tue, Jun 21, 2022 at 9:59 PM Tony Lindgren 
> > > > > > > > > >  wrote:
> > > > > > > > > > > This issue is no directly related fw_devlink. It is a 
> > > > > > > > > > > side effect of
> > > > > > > > > > > removing driver_deferred_probe_check_state(). We no 
> > > > > > > > > > > longer return
> > > > > > > > > > > -EPROBE_DEFER at the end of 
> > > > > > > > > > > driver_deferred_probe_check_state().
> > > > > > > > > >
> > > > > > > > > > Yes, I understand the issue. But 
> > > > > > > > > > driver_deferred_probe_check_state()
> > > > > > > > > > was deleted because fw_devlink=on should have short 
> > > > > > > > > > circuited the
> > > > > > > > > > probe attempt with an  -EPROBE_DEFER before reaching the 
> > > > > > > > > > bus/driver
> > > > > > > > > > probe function and hitting this -ENOENT failure. That's why 
> > > > > > > > > > I was
> > > > > > > > > > asking the other questions.
> > > > > > > > >
> > > > > > > > > OK. So where is the -EPROBE_DEFER supposed to happen without
> > > > > > > > > driver_deferred_probe_check_state() then?
> > > > > > > >
> > > > > > > > device_links_check_suppliers() call inside really_probe() would 
> > > > > > > > short
> > > > > > > > circuit and return an -EPROBE_DEFER if the device links are 
> > > > > > > > created as
> > > > > > > > expected.
> > > > > > >
> > > > > > > OK
> > > > > > >
> > > > > > > > > Hmm so I'm not seeing any supplier for the top level ocp 
> > > > > > > > > device in
> > > > > > > > > the booting case without your patches. I see the suppliers 
> > > > > > > > > for the
> > > > > > > > > ocp child device instances only.
> > > > > > > >
> > > > > > > > Hmmm... this is strange (that the device link isn't there), but 
> > > > > > > > this
> > > > > > > > is what I suspected.
> > > > > > >
> > > > > > > Yup, maybe it's because of the supplier being a device in the 
> > > > > > > child
> > > > > > > interconnect for the ocp.
> > > > > >
> > > > > > Ugh... yeah, this is why the normal (not SYNC_STATE_ONLY) device 
> > > > > > link
> > > > > > isn't being created.
> > > > > >
> > > > > > So the aggregated view is something like (I had to set tabs = 4 
> > > > > > space
> > > > > > to fit it within 80 cols):
> > > > > >
> > > > > > ocp: ocp { <= Consumer
> > > > > > compatible = "simple-pm-bus";
> > > > > > power-domains = <_per>; <=== Supplier ref
> > > > > >
> > > > > > l4_wkup: interconnect@44c0 {
> > > > > > compatible = "ti,am33xx-l4-wkup", "simple-pm-bus";
> > > > > >
> > > > > > segment@20 {  /* 0x44e0 */
> > > > > > compatible = "simple-pm-bus";
> > > > > >
> > > > > > target-module@0 { /* 0x44e0, ap 8 58.0 */
> > > > > > compatible = "ti,sysc-omap4", "ti,sysc";
> > > > > >
> > > > > > prcm: prcm@0 {
> > > > > > compatible = "ti,am3-prcm", "simple-bus";
> > > > > >
> > > > > > prm_per: prm@c00 { <= Actual 
> > > > > > Supplier
> > > > > > compatible = "ti,am3-prm-inst", 
> > > > > > "ti,omap-prm-inst";
> > > > > > };
> > > > > > };
> > > > > > };
> > > > > > };
> > > > > > };
> > > > > > };
> > > > > >
> > > > > > The power-domain supplier is the great-great-great-grand-child of 
> > > > > > the
> > > > > > consumer. It's not clear to me how this is valid. What does it even
> > > > > > mean?
> > > > > >
> > > > > > Rob, is this considered a valid DT?
> > > > >
> > > > > Valid DT for broken h/w.
> > > >
> > > > I'm not sure even in that case it's valid. When the parent device is
> > > > in reset (when the SoC is coming out of reset), there's no way the
> > > > descendant is functional. And if the descendant is not functional, how
> > > > is the parent device powered up? This just feels like an incorrect
> > > > representation of the real h/w.
> > >
> > > It should be correct representation based on scanning the interconnects
> > > and looking at the documentation. Some interconnect parts are wired
> > > always-on and some interconnect instances may be dual-mapped.
>
> Thanks for helping to debug this. Appreciate it.
>
> 

RE: [PATCH v3 11/11] iommu/vt-d: Convert global spinlock into per domain lock

2022-07-01 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Wednesday, June 29, 2022 3:47 PM
> 
> Using a global device_domain_lock spinlock to protect per-domain device
> tracking lists is an inefficient way, especially considering this lock
> is also needed in the hot paths. This optimizes the locking mechanism
> by converting the global lock to per domain lock.
> 
> On the other hand, as the device tracking lists are never accessed in
> any interrupt context, there is no need to disable interrupts while
> spinning. Replace irqsave variant with spinlock calls.
> 
> Signed-off-by: Lu Baolu 

except the previous comment on where to convert spin_lock_irqsave()
the rest looks good to me.

Reviewed-by: Kevin Tian 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 5/7] iommu/vt-d: Fix suspicious RCU usage in probe_acpi_namespace_devices()

2022-07-01 Thread Robin Murphy

On 2022-07-01 08:19, Baolu Lu wrote:

On 2022/6/29 21:03, Robin Murphy wrote:

On 2019-06-12 01:28, Lu Baolu wrote:

The drhd and device scope list should be iterated with the
iommu global lock held. Otherwise, a suspicious RCU usage
message will be displayed.

[    3.695886] =
[    3.695917] WARNING: suspicious RCU usage
[    3.695950] 5.2.0-rc2+ #2467 Not tainted
[    3.695981] -
[    3.696014] drivers/iommu/intel-iommu.c:4569 suspicious 
rcu_dereference_check() usage!

[    3.696069]
    other info that might help us debug this:

[    3.696126]
    rcu_scheduler_active = 2, debug_locks = 1
[    3.696173] no locks held by swapper/0/1.
[    3.696204]
    stack backtrace:
[    3.696241] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc2+ 
#2467

[    3.696370] Call Trace:
[    3.696404]  dump_stack+0x85/0xcb
[    3.696441]  intel_iommu_init+0x128c/0x13ce
[    3.696478]  ? kmem_cache_free+0x16b/0x2c0
[    3.696516]  ? __fput+0x14b/0x270
[    3.696550]  ? __call_rcu+0xb7/0x300
[    3.696583]  ? get_max_files+0x10/0x10
[    3.696631]  ? set_debug_rodata+0x11/0x11
[    3.696668]  ? e820__memblock_setup+0x60/0x60
[    3.696704]  ? pci_iommu_init+0x16/0x3f
[    3.696737]  ? set_debug_rodata+0x11/0x11
[    3.696770]  pci_iommu_init+0x16/0x3f
[    3.696805]  do_one_initcall+0x5d/0x2e4
[    3.696844]  ? set_debug_rodata+0x11/0x11
[    3.696880]  ? rcu_read_lock_sched_held+0x6b/0x80
[    3.696924]  kernel_init_freeable+0x1f0/0x27c
[    3.696961]  ? rest_init+0x260/0x260
[    3.696997]  kernel_init+0xa/0x110
[    3.697028]  ret_from_fork+0x3a/0x50

Fixes: fa212a97f3a36 ("iommu/vt-d: Probe DMA-capable ACPI name space 
devices")

Signed-off-by: Lu Baolu 
---
  drivers/iommu/intel-iommu.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 19c4c387a3f6..84e650c6a46d 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4793,8 +4793,10 @@ int __init intel_iommu_init(void)
  cpuhp_setup_state(CPUHP_IOMMU_INTEL_DEAD, "iommu/intel:dead", 
NULL,

    intel_iommu_cpu_dead);
+    down_read(_global_lock);
  if (probe_acpi_namespace_devices())
  pr_warn("ACPI name space devices didn't probe correctly\n");
+    up_read(_global_lock);


Doing a bit of archaeology here, is this actually broken? If any ANDD 
entries exist, we'd end up doing:


   down_read(_global_lock)
   probe_acpi_namespace_devices()
   -> iommu_probe_device()
  -> iommu_create_device_direct_mappings()
 -> iommu_get_resv_regions()
    -> intel_iommu_get_resv_regions()
   -> down_read(_global_lock)

I'm wondering whether this might explain why my bus_set_iommu series 
prevented Baolu's machine from booting, since "iommu: Move bus setup 
to IOMMU device registration" creates the same condition where we end 
up in get_resv_regions (via bus_iommu_probe() this time) from the same 
task that already holds dmar_global_lock. Of course that leaves me 
wondering how it *did* manage to boot OK on my Xeon box, but maybe 
there's a config difference or dumb luck at play?


This is really problematic. Where does the latest bus_set_iommu series
locate? I'd like to take a closer look at what happened here. Perhaps
two weeks later? I'm busy with preparing Intel IOMMU patches for v5.20
these days.


I've prepared an up-to-date series here:

https://gitlab.arm.com/linux-arm/linux-rm/-/tree/bus-set-iommu-v3

but I've been hesitant to post it without trying to make *some* progress 
on your breakage. I think last time I was just testing with 
x86_64_defconfig, so I'll double-check it with lockdep this afternoon.


Thanks,
Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

RE: [PATCH v3 10/11] iommu/vt-d: Use device_domain_lock accurately

2022-07-01 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Wednesday, June 29, 2022 3:47 PM
> 
> + spin_lock_irqsave(_domain_lock, flags);
>   list_for_each_entry(info, >devices, link) {
> - if (!info->dev)
> - continue;
> -

suppose you can replace all spin_lock_irqsave() with spin_lock()
in patch5 instead of leaving some replacement to next patch.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 00/11] iommu/vt-d: Optimize the use of locks

2022-07-01 Thread Baolu Lu

On 2022/7/1 15:53, Tian, Kevin wrote:

From: Lu Baolu 
Sent: Wednesday, June 29, 2022 3:47 PM

v3:
  - Split reduction of lock ranges from changing irqsave.
https://lore.kernel.org/linux-
iommu/BN9PR11MB52760A3D7C6BF1AF9C9D34658CAA9@BN9PR11MB5276.
namprd11.prod.outlook.com/
  - Fully initialize the dev_info before adding it to the list.
https://lore.kernel.org/linux-
iommu/BN9PR11MB52764D7CD86448C5E4EB46668CAA9@BN9PR11MB5276.
namprd11.prod.outlook.com/
  - Various code and comments refinement.



This doesn't say why original patch2 was removed:

"iommu/vt-d: Remove for_each_device_domain()"

It took me a while to realize that it's already covered by your another
patch fixing RID2PASID. 


My fault! I forgot to mention it in the change log. Sorry about it.

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()

2022-07-01 Thread Saravana Kannan via iommu
On Thu, Jun 30, 2022 at 11:12 PM Tony Lindgren  wrote:
>
> * Tony Lindgren  [220701 08:33]:
> > * Saravana Kannan  [220630 23:25]:
> > > On Thu, Jun 30, 2022 at 4:26 PM Rob Herring  wrote:
> > > >
> > > > On Thu, Jun 30, 2022 at 5:11 PM Saravana Kannan  
> > > > wrote:
> > > > >
> > > > > On Mon, Jun 27, 2022 at 2:10 AM Tony Lindgren  
> > > > > wrote:
> > > > > >
> > > > > > * Saravana Kannan  [220623 08:17]:
> > > > > > > On Thu, Jun 23, 2022 at 12:01 AM Tony Lindgren  
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > * Saravana Kannan  [220622 19:05]:
> > > > > > > > > On Tue, Jun 21, 2022 at 9:59 PM Tony Lindgren 
> > > > > > > > >  wrote:
> > > > > > > > > > This issue is no directly related fw_devlink. It is a side 
> > > > > > > > > > effect of
> > > > > > > > > > removing driver_deferred_probe_check_state(). We no longer 
> > > > > > > > > > return
> > > > > > > > > > -EPROBE_DEFER at the end of 
> > > > > > > > > > driver_deferred_probe_check_state().
> > > > > > > > >
> > > > > > > > > Yes, I understand the issue. But 
> > > > > > > > > driver_deferred_probe_check_state()
> > > > > > > > > was deleted because fw_devlink=on should have short circuited 
> > > > > > > > > the
> > > > > > > > > probe attempt with an  -EPROBE_DEFER before reaching the 
> > > > > > > > > bus/driver
> > > > > > > > > probe function and hitting this -ENOENT failure. That's why I 
> > > > > > > > > was
> > > > > > > > > asking the other questions.
> > > > > > > >
> > > > > > > > OK. So where is the -EPROBE_DEFER supposed to happen without
> > > > > > > > driver_deferred_probe_check_state() then?
> > > > > > >
> > > > > > > device_links_check_suppliers() call inside really_probe() would 
> > > > > > > short
> > > > > > > circuit and return an -EPROBE_DEFER if the device links are 
> > > > > > > created as
> > > > > > > expected.
> > > > > >
> > > > > > OK
> > > > > >
> > > > > > > > Hmm so I'm not seeing any supplier for the top level ocp device 
> > > > > > > > in
> > > > > > > > the booting case without your patches. I see the suppliers for 
> > > > > > > > the
> > > > > > > > ocp child device instances only.
> > > > > > >
> > > > > > > Hmmm... this is strange (that the device link isn't there), but 
> > > > > > > this
> > > > > > > is what I suspected.
> > > > > >
> > > > > > Yup, maybe it's because of the supplier being a device in the child
> > > > > > interconnect for the ocp.
> > > > >
> > > > > Ugh... yeah, this is why the normal (not SYNC_STATE_ONLY) device link
> > > > > isn't being created.
> > > > >
> > > > > So the aggregated view is something like (I had to set tabs = 4 space
> > > > > to fit it within 80 cols):
> > > > >
> > > > > ocp: ocp { <= Consumer
> > > > > compatible = "simple-pm-bus";
> > > > > power-domains = <_per>; <=== Supplier ref
> > > > >
> > > > > l4_wkup: interconnect@44c0 {
> > > > > compatible = "ti,am33xx-l4-wkup", "simple-pm-bus";
> > > > >
> > > > > segment@20 {  /* 0x44e0 */
> > > > > compatible = "simple-pm-bus";
> > > > >
> > > > > target-module@0 { /* 0x44e0, ap 8 58.0 */
> > > > > compatible = "ti,sysc-omap4", "ti,sysc";
> > > > >
> > > > > prcm: prcm@0 {
> > > > > compatible = "ti,am3-prcm", "simple-bus";
> > > > >
> > > > > prm_per: prm@c00 { <= Actual Supplier
> > > > > compatible = "ti,am3-prm-inst", 
> > > > > "ti,omap-prm-inst";
> > > > > };
> > > > > };
> > > > > };
> > > > > };
> > > > > };
> > > > > };
> > > > >
> > > > > The power-domain supplier is the great-great-great-grand-child of the
> > > > > consumer. It's not clear to me how this is valid. What does it even
> > > > > mean?
> > > > >
> > > > > Rob, is this considered a valid DT?
> > > >
> > > > Valid DT for broken h/w.
> > >
> > > I'm not sure even in that case it's valid. When the parent device is
> > > in reset (when the SoC is coming out of reset), there's no way the
> > > descendant is functional. And if the descendant is not functional, how
> > > is the parent device powered up? This just feels like an incorrect
> > > representation of the real h/w.
> >
> > It should be correct representation based on scanning the interconnects
> > and looking at the documentation. Some interconnect parts are wired
> > always-on and some interconnect instances may be dual-mapped.

Thanks for helping to debug this. Appreciate it.

> >
> > We have a quirk to probe prm/prcm first with pdata_quirks_init_clocks().

:'(

I checked out the code. These prm devices just get populated with NULL
as the parent. So they are effectively top level devices from the
perspective of driver core.

> > Maybe that also now fails in addition to the top level interconnect
> > probing no longer 

RE: [PATCH v3 09/11] iommu/vt-d: Fold __dmar_remove_one_dev_info() into its caller

2022-07-01 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Wednesday, June 29, 2022 3:47 PM
> 
> Fold __dmar_remove_one_dev_info() into dmar_remove_one_dev_info()
> which
> is its only caller. Make the spin lock critical range only cover the
> device list change code and remove some unnecessary checks.
> 
> Signed-off-by: Lu Baolu 

Reviewed-by: Kevin Tian 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v3 06/11] iommu/vt-d: Acquiring lock in domain ID allocation helpers

2022-07-01 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Wednesday, June 29, 2022 3:47 PM
> 
> The iommu->lock is used to protect the per-IOMMU domain ID resource.
> Moving the lock into the ID alloc/free helpers makes the code more
> compact. At the same time, the device_domain_lock is irrelevant to
> the domain ID resource, remove its assertion as well.
> 
> Signed-off-by: Lu Baolu 

Reviewed-by: Kevin Tian 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 3/5] scsi: core: Cap shost max_sectors according to DMA limits only once

2022-07-01 Thread John Garry via iommu

On 01/07/2022 00:41, Damien Le Moal wrote:
  
  	shost->dma_dev = dma_dev;
  
+	if (dma_dev->dma_mask) {

+   shost->max_sectors = min_t(unsigned int, shost->max_sectors,
+   dma_max_mapping_size(dma_dev) >> SECTOR_SHIFT);
+   }

Nit: you could remove the curly brackets... But it being a multi-line
statement, having them is OK too I think.



tglx seems to think that they are ok, and I generally agree (now):

https://lore.kernel.org/linux-arm-kernel/877djwdorz@nanos.tec.linutronix.de/

AFAICT coding-style.rst is ok with them in this scenario too

Cheers,
John
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v3 02/11] iommu/vt-d: Remove clearing translation data in disable_dmar_iommu()

2022-07-01 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Wednesday, June 29, 2022 3:47 PM
> 
> The disable_dmar_iommu() is called when IOMMU initialization fails or
> the IOMMU is hot-removed from the system. In both cases, there is no
> need to clear the IOMMU translation data structures for devices.
> 
> On the initialization path, the device probing only happens after the
> IOMMU is initialized successfully, hence there're no translation data
> structures.
> 
> On the hot-remove path, there is no real use case where the IOMMU is
> hot-removed, but the devices that it manages are still alive in the
> system. The translation data structures were torn down during device
> release, hence there's no need to repeat it in IOMMU hot-remove path
> either. This removes the unnecessary code and only leaves a check.
> 
> Signed-off-by: Lu Baolu 

You probably overlooked my last comment on kexec:

https://lore.kernel.org/lkml/bl1pr11mb52711a71ad9f11b7ae42694c8c...@bl1pr11mb5271.namprd11.prod.outlook.com/

I think my question is still not answered.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v3 01/11] iommu/vt-d: debugfs: Remove device_domain_lock usage

2022-07-01 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Wednesday, June 29, 2022 3:47 PM
> 
> The domain_translation_struct debugfs node is used to dump the DMAR
> page
> tables for the PCI devices. It potentially races with setting domains to
> devices. The existing code uses the global spinlock device_domain_lock to
> avoid the races.
> 
> This removes the use of device_domain_lock outside of iommu.c by replacing
> it with the group mutex lock. Using the group mutex lock is cleaner and
> more compatible to following cleanups.
> 
> Signed-off-by: Lu Baolu 

Reviewed-by: Kevin Tian 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v3 00/11] iommu/vt-d: Optimize the use of locks

2022-07-01 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Wednesday, June 29, 2022 3:47 PM
> 
> v3:
>  - Split reduction of lock ranges from changing irqsave.
>https://lore.kernel.org/linux-
> iommu/BN9PR11MB52760A3D7C6BF1AF9C9D34658CAA9@BN9PR11MB5276.
> namprd11.prod.outlook.com/
>  - Fully initialize the dev_info before adding it to the list.
>https://lore.kernel.org/linux-
> iommu/BN9PR11MB52764D7CD86448C5E4EB46668CAA9@BN9PR11MB5276.
> namprd11.prod.outlook.com/
>  - Various code and comments refinement.
> 

This doesn't say why original patch2 was removed:

"iommu/vt-d: Remove for_each_device_domain()"

It took me a while to realize that it's already covered by your another
patch fixing RID2PASID. 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()

2022-07-01 Thread Geert Uytterhoeven
Hi Saravana,

On Fri, Jul 1, 2022 at 1:11 AM Saravana Kannan  wrote:
> On Mon, Jun 27, 2022 at 2:10 AM Tony Lindgren  wrote:
> > * Saravana Kannan  [220623 08:17]:
> > > On Thu, Jun 23, 2022 at 12:01 AM Tony Lindgren  wrote:
> > > > * Saravana Kannan  [220622 19:05]:
> > > > > On Tue, Jun 21, 2022 at 9:59 PM Tony Lindgren  
> > > > > wrote:
> > > > > > This issue is no directly related fw_devlink. It is a side effect of
> > > > > > removing driver_deferred_probe_check_state(). We no longer return
> > > > > > -EPROBE_DEFER at the end of driver_deferred_probe_check_state().
> > > > >
> > > > > Yes, I understand the issue. But driver_deferred_probe_check_state()
> > > > > was deleted because fw_devlink=on should have short circuited the
> > > > > probe attempt with an  -EPROBE_DEFER before reaching the bus/driver
> > > > > probe function and hitting this -ENOENT failure. That's why I was
> > > > > asking the other questions.
> > > >
> > > > OK. So where is the -EPROBE_DEFER supposed to happen without
> > > > driver_deferred_probe_check_state() then?
> > >
> > > device_links_check_suppliers() call inside really_probe() would short
> > > circuit and return an -EPROBE_DEFER if the device links are created as
> > > expected.
> >
> > OK
> >
> > > > Hmm so I'm not seeing any supplier for the top level ocp device in
> > > > the booting case without your patches. I see the suppliers for the
> > > > ocp child device instances only.
> > >
> > > Hmmm... this is strange (that the device link isn't there), but this
> > > is what I suspected.
> >
> > Yup, maybe it's because of the supplier being a device in the child
> > interconnect for the ocp.
>
> Ugh... yeah, this is why the normal (not SYNC_STATE_ONLY) device link
> isn't being created.
>
> So the aggregated view is something like (I had to set tabs = 4 space
> to fit it within 80 cols):
>
> ocp: ocp { <= Consumer
> compatible = "simple-pm-bus";
> power-domains = <_per>; <=== Supplier ref
>
> l4_wkup: interconnect@44c0 {
> compatible = "ti,am33xx-l4-wkup", "simple-pm-bus";
>
> segment@20 {  /* 0x44e0 */
> compatible = "simple-pm-bus";
>
> target-module@0 { /* 0x44e0, ap 8 58.0 */
> compatible = "ti,sysc-omap4", "ti,sysc";
>
> prcm: prcm@0 {
> compatible = "ti,am3-prcm", "simple-bus";
>
> prm_per: prm@c00 { <= Actual Supplier
> compatible = "ti,am3-prm-inst", 
> "ti,omap-prm-inst";
> };
> };
> };
> };
> };
> };
>
> The power-domain supplier is the great-great-great-grand-child of the
> consumer. It's not clear to me how this is valid. What does it even
> mean?
>
> Rob, is this considered a valid DT?
>
> Geert, thoughts on whether this is a correct use of simple-pm-bus device?

Well, if the hardware is wired that way...

It's not that dissimilar from CPU cores, and interrupt and GPIO
controllers in power domains and clocked by controllable clocks:
you can cut the branch you're sitting on, and you have to be careful
when going to sleep, and make sure your wake-up sources are still
functional.

> Also, how is the power domain attach/get working in this case? As far
> as I can tell, at least for "simple-pm-bus" devices, the pm domain
> attachment is happening under:
> really_probe() -> call_driver_probe -> platform_probe() ->
> dev_pm_domain_attach()
>
> So, how is the pm domain attach succeeding in the first place without
> my changes?

That's a software thing ;-)

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] iommu/amd: Handle return of iommu_device_sysfs_add

2022-07-01 Thread Bo Liu
As iommu_device_sysfs_add() can fail, we should check the return value.

Signed-off-by: Bo Liu 
---
 drivers/iommu/amd/init.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 3c82d9c5f1c0..a3c4fdd40f04 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -2058,8 +2058,11 @@ static int __init iommu_init_pci(struct amd_iommu *iommu)
amd_iommu_erratum_746_workaround(iommu);
amd_iommu_ats_write_check_workaround(iommu);
 
-   iommu_device_sysfs_add(>iommu, >dev->dev,
+   ret = iommu_device_sysfs_add(>iommu, >dev->dev,
   amd_iommu_groups, "ivhd%d", iommu->index);
+   if (ret)
+   return ret;
+
iommu_device_register(>iommu, _iommu_ops, NULL);
 
return pci_enable_device(iommu->dev);
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()

2022-07-01 Thread Geert Uytterhoeven
Hi Saravana,

On Fri, Jul 1, 2022 at 2:37 AM Saravana Kannan  wrote:
> On Thu, Jun 23, 2022 at 5:08 AM Alexander Stein
>  wrote:
> > Am Dienstag, 21. Juni 2022, 09:28:43 CEST schrieb Tony Lindgren:

> > > * Saravana Kannan  [700101 02:00]:
> > > > Now that fw_devlink=on by default and fw_devlink supports
> > > > "power-domains" property, the execution will never get to the point
> > > > where driver_deferred_probe_check_state() is called before the supplier
> > > > has probed successfully or before deferred probe timeout has expired.
> > > >
> > > > So, delete the call and replace it with -ENODEV.
> > >
> > > Looks like this causes omaps to not boot in Linux next. With this
> > > simple-pm-bus fails to probe initially as the power-domain is not
> > > yet available. On platform_probe() genpd_get_from_provider() returns
> > > -ENOENT.
> > >
> > > Seems like other stuff is potentially broken too, any ideas on
> > > how to fix this?
> >
> > I think I'm hit by this as well, although I do not get a lockup.
> > In my case I'm using arch/arm64/boot/dts/freescale/imx8mq-tqma8mq-mba8mx.dts
> > and probing of 3832.blk-ctrl fails as the power-domain is not (yet)
> > registed.
>
> Ok, took a look.
>
> The problem is that there are two drivers for the same device and they
> both initialize this device.
>
> gpc: gpc@303a {
> compatible = "fsl,imx8mq-gpc";
> }
>
> $ git grep -l "fsl,imx7d-gpc" -- drivers/
> drivers/irqchip/irq-imx-gpcv2.c
> drivers/soc/imx/gpcv2.c

You missed the "driver" in arch/arm/mach-imx/src.c ;-)

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 5/7] iommu/vt-d: Fix suspicious RCU usage in probe_acpi_namespace_devices()

2022-07-01 Thread Baolu Lu

On 2022/6/29 21:03, Robin Murphy wrote:

On 2019-06-12 01:28, Lu Baolu wrote:

The drhd and device scope list should be iterated with the
iommu global lock held. Otherwise, a suspicious RCU usage
message will be displayed.

[    3.695886] =
[    3.695917] WARNING: suspicious RCU usage
[    3.695950] 5.2.0-rc2+ #2467 Not tainted
[    3.695981] -
[    3.696014] drivers/iommu/intel-iommu.c:4569 suspicious 
rcu_dereference_check() usage!

[    3.696069]
    other info that might help us debug this:

[    3.696126]
    rcu_scheduler_active = 2, debug_locks = 1
[    3.696173] no locks held by swapper/0/1.
[    3.696204]
    stack backtrace:
[    3.696241] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc2+ #2467
[    3.696370] Call Trace:
[    3.696404]  dump_stack+0x85/0xcb
[    3.696441]  intel_iommu_init+0x128c/0x13ce
[    3.696478]  ? kmem_cache_free+0x16b/0x2c0
[    3.696516]  ? __fput+0x14b/0x270
[    3.696550]  ? __call_rcu+0xb7/0x300
[    3.696583]  ? get_max_files+0x10/0x10
[    3.696631]  ? set_debug_rodata+0x11/0x11
[    3.696668]  ? e820__memblock_setup+0x60/0x60
[    3.696704]  ? pci_iommu_init+0x16/0x3f
[    3.696737]  ? set_debug_rodata+0x11/0x11
[    3.696770]  pci_iommu_init+0x16/0x3f
[    3.696805]  do_one_initcall+0x5d/0x2e4
[    3.696844]  ? set_debug_rodata+0x11/0x11
[    3.696880]  ? rcu_read_lock_sched_held+0x6b/0x80
[    3.696924]  kernel_init_freeable+0x1f0/0x27c
[    3.696961]  ? rest_init+0x260/0x260
[    3.696997]  kernel_init+0xa/0x110
[    3.697028]  ret_from_fork+0x3a/0x50

Fixes: fa212a97f3a36 ("iommu/vt-d: Probe DMA-capable ACPI name space 
devices")

Signed-off-by: Lu Baolu 
---
  drivers/iommu/intel-iommu.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 19c4c387a3f6..84e650c6a46d 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4793,8 +4793,10 @@ int __init intel_iommu_init(void)
  cpuhp_setup_state(CPUHP_IOMMU_INTEL_DEAD, "iommu/intel:dead", NULL,
    intel_iommu_cpu_dead);
+    down_read(_global_lock);
  if (probe_acpi_namespace_devices())
  pr_warn("ACPI name space devices didn't probe correctly\n");
+    up_read(_global_lock);


Doing a bit of archaeology here, is this actually broken? If any ANDD 
entries exist, we'd end up doing:


   down_read(_global_lock)
   probe_acpi_namespace_devices()
   -> iommu_probe_device()
  -> iommu_create_device_direct_mappings()
     -> iommu_get_resv_regions()
    -> intel_iommu_get_resv_regions()
   -> down_read(_global_lock)

I'm wondering whether this might explain why my bus_set_iommu series 
prevented Baolu's machine from booting, since "iommu: Move bus setup to 
IOMMU device registration" creates the same condition where we end up in 
get_resv_regions (via bus_iommu_probe() this time) from the same task 
that already holds dmar_global_lock. Of course that leaves me wondering 
how it *did* manage to boot OK on my Xeon box, but maybe there's a 
config difference or dumb luck at play?


This is really problematic. Where does the latest bus_set_iommu series
locate? I'd like to take a closer look at what happened here. Perhaps
two weeks later? I'm busy with preparing Intel IOMMU patches for v5.20
these days.

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH] iommu/virtio: Handle return of iommu_device_register

2022-07-01 Thread Bo Liu
iommu_device_register returns an error code and, although it currently
never fails, we should check its return value anyway.

Signed-off-by: Bo Liu 
---
 drivers/iommu/virtio-iommu.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 25be4b822aa0..12e7d8364560 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -1144,7 +1144,9 @@ static int viommu_probe(struct virtio_device *vdev)
if (ret)
goto err_free_vqs;
 
-   iommu_device_register(>iommu, _ops, parent_dev);
+   ret = iommu_device_register(>iommu, _ops, parent_dev);
+   if (ret)
+   goto err_remove_sysfs;
 
 #ifdef CONFIG_PCI
if (pci_bus_type.iommu_ops != _ops) {
@@ -1175,8 +1177,9 @@ static int viommu_probe(struct virtio_device *vdev)
return 0;
 
 err_unregister:
-   iommu_device_sysfs_remove(>iommu);
iommu_device_unregister(>iommu);
+err_remove_sysfs:
+   iommu_device_sysfs_remove(>iommu);
 err_free_vqs:
vdev->config->del_vqs(vdev);
 
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: (EXT) Re: [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()

2022-07-01 Thread Saravana Kannan via iommu
On Thu, Jun 30, 2022 at 11:02 PM Alexander Stein
 wrote:
>
> Hi Saravana,
>
> Am Freitag, 1. Juli 2022, 02:37:14 CEST schrieb Saravana Kannan:
> > On Thu, Jun 23, 2022 at 5:08 AM Alexander Stein
> >
> >  wrote:
> > > Hi,
> > >
> > > Am Dienstag, 21. Juni 2022, 09:28:43 CEST schrieb Tony Lindgren:
> > > > Hi,
> > > >
> > > > * Saravana Kannan  [700101 02:00]:
> > > > > Now that fw_devlink=on by default and fw_devlink supports
> > > > > "power-domains" property, the execution will never get to the point
> > > > > where driver_deferred_probe_check_state() is called before the
> > > > > supplier
> > > > > has probed successfully or before deferred probe timeout has expired.
> > > > >
> > > > > So, delete the call and replace it with -ENODEV.
> > > >
> > > > Looks like this causes omaps to not boot in Linux next. With this
> > > > simple-pm-bus fails to probe initially as the power-domain is not
> > > > yet available. On platform_probe() genpd_get_from_provider() returns
> > > > -ENOENT.
> > > >
> > > > Seems like other stuff is potentially broken too, any ideas on
> > > > how to fix this?
> > >
> > > I think I'm hit by this as well, although I do not get a lockup.
> > > In my case I'm using
> > > arch/arm64/boot/dts/freescale/imx8mq-tqma8mq-mba8mx.dts and probing of
> > > 3832.blk-ctrl fails as the power-domain is not (yet) registed.
> >
> > Ok, took a look.
> >
> > The problem is that there are two drivers for the same device and they
> > both initialize this device.
> >
> > gpc: gpc@303a {
> > compatible = "fsl,imx8mq-gpc";
> > }
> >
> > $ git grep -l "fsl,imx7d-gpc" -- drivers/
> > drivers/irqchip/irq-imx-gpcv2.c
> > drivers/soc/imx/gpcv2.c
> >
> > IMHO, this is a bad/broken design.
> >
> > So what's happening is that fw_devlink will block the probe of
> > 3832.blk-ctrl until 303a.gpc is initialized. And it stops
> > blocking the probe of 3832.blk-ctrl as soon as the first driver
> > initializes the device. In this case, it's the irqchip driver.
> >
> > I'd recommend combining these drivers into one. Something like the
> > patch I'm attaching (sorry for the attachment, copy-paste is mangling
> > the tabs). Can you give it a shot please?
>
> I tried this patch and it delayed the driver initialization (those of UART as
> well BTW). Unfortunately the driver fails the same way:

Thanks for testing the patch!

> > [1.125253] imx8m-blk-ctrl 3832.blk-ctrl: error -ENODEV: failed to
> attach power domain "bus"
>
> More than that it even introduced some more errors:
> > [0.008160] irq: no irq domain found for gpc@303a !

So the idea behind my change was that as long as the irqchip isn't the
root of the irqdomain (might be using the terms incorrectly) like the
gic, you can make it a platform driver. And I was trying to hack up a
patch that's the equivalent of platform_irqchip_probe() (which just
ends up eventually calling the callback you use in IRQCHIP_DECLARE().
I probably made some mistake in the quick hack that I'm sure if
fixable.

> > [0.013251] Failed to map interrupt for
> > /soc@0/bus@3040/timer@306a

However, this timer driver also uses TIMER_OF_DECLARE() which can't
handle failure to get the IRQ (because it's can't -EPROBE_DEFER). So,
this means, the timer driver inturn needs to be converted to a
platform driver if it's supposed to work with the IRQCHIP_DECLARE()
being converted to a platform driver.

But that's a can of worms not worth opening. But then I remembered
this simpler workaround will work and it is pretty much a variant of
the workaround that's already in the gpc's irqchip driver to allow two
drivers to probe the same device (people really should stop doing
that).

Can you drop my previous hack patch and try this instead please? I'm
99% sure this will work.

diff --git a/drivers/irqchip/irq-imx-gpcv2.c b/drivers/irqchip/irq-imx-gpcv2.c
index b9c22f764b4d..8a0e82067924 100644
--- a/drivers/irqchip/irq-imx-gpcv2.c
+++ b/drivers/irqchip/irq-imx-gpcv2.c
@@ -283,6 +283,7 @@ static int __init imx_gpcv2_irqchip_init(struct
device_node *node,
 * later the GPC power domain driver will not be skipped.
 */
of_node_clear_flag(node, OF_POPULATED);
+   fwnode_dev_initialized(domain->fwnode, false);
return 0;
 }

-Saravana
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 1/5] iommu: Return -EMEDIUMTYPE for incompatible domain and device/group

2022-07-01 Thread Baolu Lu

On 2022/7/1 04:36, Nicolin Chen wrote:

Cases like VFIO wish to attach a device to an existing domain that was
not allocated specifically from the device. This raises a condition
where the IOMMU driver can fail the domain attach because the domain and
device are incompatible with each other.

This is a soft failure that can be resolved by using a different domain.

Provide a dedicated errno from the IOMMU driver during attach that the
reason attached failed is because of domain incompatability. EMEDIUMTYPE
is chosen because it is never used within the iommu subsystem today and
evokes a sense that the 'medium' aka the domain is incompatible.

VFIO can use this to know attach is a soft failure and it should continue
searching. Otherwise the attach will be a hard failure and VFIO will
return the code to userspace.

Update all drivers to return EMEDIUMTYPE in their failure paths that are
related to domain incompatability. Also remove adjacent error prints for
these soft failures, to prevent a kernel log spam, since -EMEDIUMTYPE is
clear enough to indicate an incompatability error.

Add kdocs describing this behavior.

Suggested-by: Jason Gunthorpe
Reviewed-by: Kevin Tian
Signed-off-by: Nicolin Chen


Reviewed-by: Lu Baolu 

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 4/4] vfio: Require that devices support DMA cache coherence

2022-07-01 Thread Alexey Kardashevskiy



On 7/1/22 16:07, Tian, Kevin wrote:

From: Alexey Kardashevskiy 
Sent: Friday, July 1, 2022 12:58 PM

On 4/8/22 01:23, Jason Gunthorpe via iommu wrote:

IOMMU_CACHE means that normal DMAs do not require any additional

coherency

mechanism and is the basic uAPI that VFIO exposes to userspace. For
instance VFIO applications like DPDK will not work if additional coherency
operations are required.

Therefore check IOMMU_CAP_CACHE_COHERENCY like vdpa & usnic do

before

allowing an IOMMU backed VFIO device to be created.



This just broke VFIO on POWER which does not use iommu_ops.


In this case below check is more reasonable to be put in type1
attach_group(). Do a iommu_group_for_each_dev() to verify
CACHE_COHERENCY similar to what Robin did for INTR_REMAP.



ah makes sense. I've posted an RFC with another problem - "[RFC PATCH 
kernel] vfio: Skip checking for IOMMU_CAP_CACHE_COHERENCY on POWER and 
more", would be great if both addressed, or I try moving those next week 
:) Thanks,





(sorry no access to my build machine now but I suppose Jason
can soon work out a fix once he sees this. )






Signed-off-by: Jason Gunthorpe 
---
   drivers/vfio/vfio.c | 7 +++
   1 file changed, 7 insertions(+)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index a4555014bd1e72..9edad767cfdad3 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -815,6 +815,13 @@ static int __vfio_register_dev(struct vfio_device

*device,


   int vfio_register_group_dev(struct vfio_device *device)
   {
+   /*
+* VFIO always sets IOMMU_CACHE because we offer no way for

userspace to

+* restore cache coherency.
+*/
+   if (!iommu_capable(device->dev->bus,

IOMMU_CAP_CACHE_COHERENCY))

+   return -EINVAL;
+
return __vfio_register_dev(device,
vfio_group_find_or_alloc(device->dev));
   }


--
Alexey


--
Alexey
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()

2022-07-01 Thread Tony Lindgren
* Tony Lindgren  [220701 08:33]:
> * Saravana Kannan  [220630 23:25]:
> > On Thu, Jun 30, 2022 at 4:26 PM Rob Herring  wrote:
> > >
> > > On Thu, Jun 30, 2022 at 5:11 PM Saravana Kannan  
> > > wrote:
> > > >
> > > > On Mon, Jun 27, 2022 at 2:10 AM Tony Lindgren  wrote:
> > > > >
> > > > > * Saravana Kannan  [220623 08:17]:
> > > > > > On Thu, Jun 23, 2022 at 12:01 AM Tony Lindgren  
> > > > > > wrote:
> > > > > > >
> > > > > > > * Saravana Kannan  [220622 19:05]:
> > > > > > > > On Tue, Jun 21, 2022 at 9:59 PM Tony Lindgren 
> > > > > > > >  wrote:
> > > > > > > > > This issue is no directly related fw_devlink. It is a side 
> > > > > > > > > effect of
> > > > > > > > > removing driver_deferred_probe_check_state(). We no longer 
> > > > > > > > > return
> > > > > > > > > -EPROBE_DEFER at the end of 
> > > > > > > > > driver_deferred_probe_check_state().
> > > > > > > >
> > > > > > > > Yes, I understand the issue. But 
> > > > > > > > driver_deferred_probe_check_state()
> > > > > > > > was deleted because fw_devlink=on should have short circuited 
> > > > > > > > the
> > > > > > > > probe attempt with an  -EPROBE_DEFER before reaching the 
> > > > > > > > bus/driver
> > > > > > > > probe function and hitting this -ENOENT failure. That's why I 
> > > > > > > > was
> > > > > > > > asking the other questions.
> > > > > > >
> > > > > > > OK. So where is the -EPROBE_DEFER supposed to happen without
> > > > > > > driver_deferred_probe_check_state() then?
> > > > > >
> > > > > > device_links_check_suppliers() call inside really_probe() would 
> > > > > > short
> > > > > > circuit and return an -EPROBE_DEFER if the device links are created 
> > > > > > as
> > > > > > expected.
> > > > >
> > > > > OK
> > > > >
> > > > > > > Hmm so I'm not seeing any supplier for the top level ocp device in
> > > > > > > the booting case without your patches. I see the suppliers for the
> > > > > > > ocp child device instances only.
> > > > > >
> > > > > > Hmmm... this is strange (that the device link isn't there), but this
> > > > > > is what I suspected.
> > > > >
> > > > > Yup, maybe it's because of the supplier being a device in the child
> > > > > interconnect for the ocp.
> > > >
> > > > Ugh... yeah, this is why the normal (not SYNC_STATE_ONLY) device link
> > > > isn't being created.
> > > >
> > > > So the aggregated view is something like (I had to set tabs = 4 space
> > > > to fit it within 80 cols):
> > > >
> > > > ocp: ocp { <= Consumer
> > > > compatible = "simple-pm-bus";
> > > > power-domains = <_per>; <=== Supplier ref
> > > >
> > > > l4_wkup: interconnect@44c0 {
> > > > compatible = "ti,am33xx-l4-wkup", "simple-pm-bus";
> > > >
> > > > segment@20 {  /* 0x44e0 */
> > > > compatible = "simple-pm-bus";
> > > >
> > > > target-module@0 { /* 0x44e0, ap 8 58.0 */
> > > > compatible = "ti,sysc-omap4", "ti,sysc";
> > > >
> > > > prcm: prcm@0 {
> > > > compatible = "ti,am3-prcm", "simple-bus";
> > > >
> > > > prm_per: prm@c00 { <= Actual Supplier
> > > > compatible = "ti,am3-prm-inst", 
> > > > "ti,omap-prm-inst";
> > > > };
> > > > };
> > > > };
> > > > };
> > > > };
> > > > };
> > > >
> > > > The power-domain supplier is the great-great-great-grand-child of the
> > > > consumer. It's not clear to me how this is valid. What does it even
> > > > mean?
> > > >
> > > > Rob, is this considered a valid DT?
> > >
> > > Valid DT for broken h/w.
> > 
> > I'm not sure even in that case it's valid. When the parent device is
> > in reset (when the SoC is coming out of reset), there's no way the
> > descendant is functional. And if the descendant is not functional, how
> > is the parent device powered up? This just feels like an incorrect
> > representation of the real h/w.
> 
> It should be correct representation based on scanning the interconnects
> and looking at the documentation. Some interconnect parts are wired
> always-on and some interconnect instances may be dual-mapped.
> 
> We have a quirk to probe prm/prcm first with pdata_quirks_init_clocks().
> Maybe that also now fails in addition to the top level interconnect
> probing no longer producing -EPROBE_DEFER.
> 
> > > So the domain must be default on and then simple-pm-bus is going to
> > > hold a reference to the domain preventing it from ever getting powered
> > > off and things seem to work. Except what happens during suspend?
> > 
> > But how can simple-pm-bus even get a reference? The PM domain can't
> > get added until we are well into the probe of the simple-pm-bus and
> > AFAICT the genpd attach is done before the driver probe is even
> > called.
> 
> The prm/prcm gets of_platform_populate() called on it early.

The 

RE: [PATCH v1 4/6] iommu/vt-d: Add VTD_FLAG_IOMMU_PROBED flag

2022-07-01 Thread Tian, Kevin
> From: Baolu Lu 
> Sent: Friday, July 1, 2022 11:13 AM
> 
> On 6/30/22 4:29 PM, Tian, Kevin wrote:
> >> From: Lu Baolu 
> >> Sent: Saturday, June 25, 2022 8:52 PM
> >>
> >> In the IOMMU hot-add path, there's a need to check whether an IOMMU
> >> has been probed. Instead of checking the IOMMU pointer in the global
> >> list, it's better to allocate a flag bit in iommu->flags for this
> >> purpose.
> >
> > Sorry I didn't get the point of original check. This is the hotplug path
> > hence the caller of this function should already figure out it's a new
> > iommu before reaching this point?
> >
> 
> Either did I. It was added by below commit without any comments about
> this check.
> 
> commit ffebeb46dd34736c90ffbca1ccb0bef8f4827c44
> Author: Jiang Liu 
> Date:   Sun Nov 9 22:48:02 2014 +0800
> 
>  iommu/vt-d: Enhance intel-iommu driver to support DMAR unit hotplug
> 
>  Implement required callback functions for intel-iommu driver
>  to support DMAR unit hotplug.
> 
>  Signed-off-by: Jiang Liu 
>  Reviewed-by: Yijing Wang 
>  Signed-off-by: Joerg Roedel 
> 
> I went through the whole hot-add process and found this check seemed to
> be duplicate.
> 
> Hot-add process starts from dmar_device_hotplug(), it uses a rwlock to
> synchronize the hot-add paths.
> 
> 2386 down_write(_global_lock);
> 2387 if (insert)
> 2388 ret = dmar_hotplug_insert(tmp);
> 2389 else
> 2390 ret = dmar_hotplug_remove(tmp);
> 2391 up_write(_global_lock);
> 
> dmar_device_hotplug()
> ->dmar_hotplug_insert()
> -->dmar_parse_one_drhd()   /* the added intel_iommu is allocated here*/
> -->dmar_hp_add_drhd()/* the intel_iommu is about to bring up */
> --->intel_iommu_add()
> 
> The duplicate check here:
> 
>  if (g_iommus[iommu->seq_id])
>  return 0;
> 
> All the iommu units are allocated and then initialized in the same
> synchronized path. There is no need to check a duplicate initialization.
> 
> I would like to remove this check if no objection.
> 

This matches my impression.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v2 4/4] vfio: Require that devices support DMA cache coherence

2022-07-01 Thread Tian, Kevin
> From: Alexey Kardashevskiy 
> Sent: Friday, July 1, 2022 12:58 PM
> 
> On 4/8/22 01:23, Jason Gunthorpe via iommu wrote:
> > IOMMU_CACHE means that normal DMAs do not require any additional
> coherency
> > mechanism and is the basic uAPI that VFIO exposes to userspace. For
> > instance VFIO applications like DPDK will not work if additional coherency
> > operations are required.
> >
> > Therefore check IOMMU_CAP_CACHE_COHERENCY like vdpa & usnic do
> before
> > allowing an IOMMU backed VFIO device to be created.
> 
> 
> This just broke VFIO on POWER which does not use iommu_ops.

In this case below check is more reasonable to be put in type1
attach_group(). Do a iommu_group_for_each_dev() to verify
CACHE_COHERENCY similar to what Robin did for INTR_REMAP.

(sorry no access to my build machine now but I suppose Jason
can soon work out a fix once he sees this. )

> 
> 
> >
> > Signed-off-by: Jason Gunthorpe 
> > ---
> >   drivers/vfio/vfio.c | 7 +++
> >   1 file changed, 7 insertions(+)
> >
> > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> > index a4555014bd1e72..9edad767cfdad3 100644
> > --- a/drivers/vfio/vfio.c
> > +++ b/drivers/vfio/vfio.c
> > @@ -815,6 +815,13 @@ static int __vfio_register_dev(struct vfio_device
> *device,
> >
> >   int vfio_register_group_dev(struct vfio_device *device)
> >   {
> > +   /*
> > +* VFIO always sets IOMMU_CACHE because we offer no way for
> userspace to
> > +* restore cache coherency.
> > +*/
> > +   if (!iommu_capable(device->dev->bus,
> IOMMU_CAP_CACHE_COHERENCY))
> > +   return -EINVAL;
> > +
> > return __vfio_register_dev(device,
> > vfio_group_find_or_alloc(device->dev));
> >   }
> 
> --
> Alexey
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: (EXT) Re: [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()

2022-07-01 Thread Alexander Stein
Hi Saravana,

Am Freitag, 1. Juli 2022, 02:37:14 CEST schrieb Saravana Kannan:
> On Thu, Jun 23, 2022 at 5:08 AM Alexander Stein
> 
>  wrote:
> > Hi,
> > 
> > Am Dienstag, 21. Juni 2022, 09:28:43 CEST schrieb Tony Lindgren:
> > > Hi,
> > > 
> > > * Saravana Kannan  [700101 02:00]:
> > > > Now that fw_devlink=on by default and fw_devlink supports
> > > > "power-domains" property, the execution will never get to the point
> > > > where driver_deferred_probe_check_state() is called before the
> > > > supplier
> > > > has probed successfully or before deferred probe timeout has expired.
> > > > 
> > > > So, delete the call and replace it with -ENODEV.
> > > 
> > > Looks like this causes omaps to not boot in Linux next. With this
> > > simple-pm-bus fails to probe initially as the power-domain is not
> > > yet available. On platform_probe() genpd_get_from_provider() returns
> > > -ENOENT.
> > > 
> > > Seems like other stuff is potentially broken too, any ideas on
> > > how to fix this?
> > 
> > I think I'm hit by this as well, although I do not get a lockup.
> > In my case I'm using
> > arch/arm64/boot/dts/freescale/imx8mq-tqma8mq-mba8mx.dts and probing of
> > 3832.blk-ctrl fails as the power-domain is not (yet) registed.
> 
> Ok, took a look.
> 
> The problem is that there are two drivers for the same device and they
> both initialize this device.
> 
> gpc: gpc@303a {
> compatible = "fsl,imx8mq-gpc";
> }
> 
> $ git grep -l "fsl,imx7d-gpc" -- drivers/
> drivers/irqchip/irq-imx-gpcv2.c
> drivers/soc/imx/gpcv2.c
> 
> IMHO, this is a bad/broken design.
> 
> So what's happening is that fw_devlink will block the probe of
> 3832.blk-ctrl until 303a.gpc is initialized. And it stops
> blocking the probe of 3832.blk-ctrl as soon as the first driver
> initializes the device. In this case, it's the irqchip driver.
> 
> I'd recommend combining these drivers into one. Something like the
> patch I'm attaching (sorry for the attachment, copy-paste is mangling
> the tabs). Can you give it a shot please?

I tried this patch and it delayed the driver initialization (those of UART as 
well BTW). Unfortunately the driver fails the same way:
> [1.125253] imx8m-blk-ctrl 3832.blk-ctrl: error -ENODEV: failed to 
attach power domain "bus"

More than that it even introduced some more errors:
> [0.008160] irq: no irq domain found for gpc@303a !
> [0.013251] Failed to map interrupt for
> /soc@0/bus@3040/timer@306a
> [0.020152] Failed to initialize '/soc@0/bus@3040/timer@306a':
> -22

I kept the timestamps to show that these errors happen very early. So now the 
usage of the "global" interrupt parent, set at line 18,
> interrupt-parent = <>;
is not possible at this point of boot time.

Best regards,
Alexander



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu