[PATCH 0/2] iommu: Fix regression in IOMMU grouping

2014-09-19 Thread Alex Williamson
We've had surprisingly little fallout from the DMA alias changes, but
unfortunately one regression has popped up.  We have an AMD system
that seems to use the SATA controller to master transactions for the
legacy IDE controller and they are different slots on the root complex.
The IVRS reports 00:11.0 (SATA) as an alias from 00:14.1 (IDE), which
doesn't work with the new, converged PCI IOMMU grouping code, where
we've made a simplifying assumption that aliases will be to the same
slot.

To fix this, we need to rip out that assumption and write the alias
search code that I was unable to come up with previously.  I think
this can now do the chaining of aliases, which I referenced in the
removed comments.  Any sort of multi-level aliases are exceptionally
unlikely, but I think this code can now handle whatever firmware and
alias quirks can throw at it.

I know this is late for 3.17, but this is a regression from the prior
code.  If reviews and testing can give us the confidence to put it in
for 3.17, that would be my preference.  I've also marked it for stable
in case we want to loop back through that way.  Thanks,

Alex

---

Alex Williamson (2):
  iommu/amd: Split init_iommu_group() from iommu_init_device()
  iommu: Rework iommu_group_get_for_pci_dev()


 drivers/iommu/amd_iommu.c |   27 ---
 drivers/iommu/iommu.c |  163 +++--
 2 files changed, 109 insertions(+), 81 deletions(-)
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 1/2] iommu: Rework iommu_group_get_for_pci_dev()

2014-09-19 Thread Alex Williamson
It turns out that our assumption that aliases are always to the same
slot isn't true.  One particular platform reports an IVRS alias of the
SATA controller (00:11.0) for the legacy IDE controller (00:14.1).
When we hit this, we attempt to use a single IOMMU group for
everything on the same bus, which in this case is the root complex.
We already have multiple groups defined for the root complex by this
point, resulting in multiple WARN_ON hits.

This patch makes these sorts of aliases work again with IOMMU groups
by reworking how we search through the PCI address space to find
existing groups.  This should also now handle looped dependencies and
all sorts of crazy inter-dependencies that we'll likely never see.

The recursion used here should never be very deep.  It's unlikely to
have individual aliases and only theoretical that we'd ever see a
chain where one alias causes us to search through to yet another
alias.  We're also only dealing with PCIe device on a single bus,
which means we'll typically only see multiple slots in use on the root
complex.  Loops are also a theoretically possibility, which I've
tested using fake DMA alias quirks and prevent from causing problems
using a bitmap of the devfn space that's been visited.

Signed-off-by: Alex Williamson alex.william...@redhat.com
Cc: sta...@vger.kernel.org # 3.17
---

 drivers/iommu/iommu.c |  163 +
 1 file changed, 96 insertions(+), 67 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 0639b92..690818d 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -30,6 +30,7 @@
 #include linux/notifier.h
 #include linux/err.h
 #include linux/pci.h
+#include linux/bitops.h
 #include trace/events/iommu.h
 
 static struct kset *iommu_group_kset;
@@ -519,6 +520,9 @@ int iommu_group_id(struct iommu_group *group)
 }
 EXPORT_SYMBOL_GPL(iommu_group_id);
 
+static struct iommu_group *get_pci_alias_group(struct pci_dev *pdev,
+  unsigned long *devfns);
+
 /*
  * To consider a PCI device isolated, we require ACS to support Source
  * Validation, Request Redirection, Completer Redirection, and Upstream
@@ -529,6 +533,86 @@ EXPORT_SYMBOL_GPL(iommu_group_id);
  */
 #define REQ_ACS_FLAGS   (PCI_ACS_SV | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF)
 
+/*
+ * For multifunction devices which are not isolated from each other, find
+ * all the other non-isolated functions and look for existing groups.  For
+ * each function, we also need to look for aliases to or from other devices
+ * that may already have a group.
+ */
+static struct iommu_group *get_pci_function_alias_group(struct pci_dev *pdev,
+   unsigned long *devfns)
+{
+   struct pci_dev *tmp = NULL;
+   struct iommu_group *group;
+
+   if (!pdev-multifunction || pci_acs_enabled(pdev, REQ_ACS_FLAGS))
+   return NULL;
+
+   for_each_pci_dev(tmp) {
+   if (tmp == pdev || tmp-bus != pdev-bus ||
+   PCI_SLOT(tmp-devfn) != PCI_SLOT(pdev-devfn) ||
+   pci_acs_enabled(tmp, REQ_ACS_FLAGS))
+   continue;
+
+   group = get_pci_alias_group(tmp, devfns);
+   if (group) {
+   pci_dev_put(tmp);
+   return group;
+   }
+   }
+
+   return NULL;
+}
+
+/*
+ * Look for aliases to or from the given device for exisiting groups.  The
+ * dma_alias_devfn only supports aliases on the same bus, therefore the search
+ * space is quite small (especially since we're really only looking at pcie
+ * device, and therefore only expect multiple slots on the root complex or
+ * downstream switch ports).  It's conceivable though that a pair of
+ * multifunction devices could have aliases between them that would cause a
+ * loop.  To prevent this, we use a bitmap to track where we've been.
+ */
+static struct iommu_group *get_pci_alias_group(struct pci_dev *pdev,
+  unsigned long *devfns)
+{
+   struct pci_dev *tmp = NULL;
+   struct iommu_group *group;
+
+   if (test_and_set_bit(pdev-devfn  0xff, devfns))
+   return NULL;
+
+   group = iommu_group_get(pdev-dev);
+   if (group)
+   return group;
+
+   for_each_pci_dev(tmp) {
+   if (tmp == pdev || tmp-bus != pdev-bus)
+   continue;
+
+   /* We alias them or they alias us */
+   if (((pdev-dev_flags  PCI_DEV_FLAGS_DMA_ALIAS_DEVFN) 
+pdev-dma_alias_devfn == tmp-devfn) ||
+   ((tmp-dev_flags  PCI_DEV_FLAGS_DMA_ALIAS_DEVFN) 
+tmp-dma_alias_devfn == pdev-devfn)) {
+
+   group = get_pci_alias_group(tmp, devfns);
+   if (group) {
+   pci_dev_put(tmp);
+   return group;

[PATCH 2/2] iommu/amd: Split init_iommu_group() from iommu_init_device()

2014-09-19 Thread Alex Williamson
For a PCI device, aliases from the IVRS table won't be populated
into dma_alias_devfn until after iommu_init_device() is called on
each device.  We therefore want to split init_iommu_group() to
be called from a separate loop immediately following.

Signed-off-by: Alex Williamson alex.william...@redhat.com
Cc: sta...@vger.kernel.org # 3.17
---

 drivers/iommu/amd_iommu.c |   27 +--
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index ecb0109..5aff937 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -260,17 +260,13 @@ static bool check_device(struct device *dev)
return true;
 }
 
-static int init_iommu_group(struct device *dev)
+static void init_iommu_group(struct device *dev)
 {
struct iommu_group *group;
 
group = iommu_group_get_for_dev(dev);
-
-   if (IS_ERR(group))
-   return PTR_ERR(group);
-
-   iommu_group_put(group);
-   return 0;
+   if (!IS_ERR(group))
+   iommu_group_put(group);
 }
 
 static int __last_alias(struct pci_dev *pdev, u16 alias, void *data)
@@ -340,7 +336,6 @@ static int iommu_init_device(struct device *dev)
struct pci_dev *pdev = to_pci_dev(dev);
struct iommu_dev_data *dev_data;
u16 alias;
-   int ret;
 
if (dev-archdata.iommu)
return 0;
@@ -364,12 +359,6 @@ static int iommu_init_device(struct device *dev)
dev_data-alias_data = alias_data;
}
 
-   ret = init_iommu_group(dev);
-   if (ret) {
-   free_dev_data(dev_data);
-   return ret;
-   }
-
if (pci_iommuv2_capable(pdev)) {
struct amd_iommu *iommu;
 
@@ -455,6 +444,15 @@ int __init amd_iommu_init_devices(void)
goto out_free;
}
 
+   /*
+* Initialize IOMMU groups only after iommu_init_device() has
+* had a chance to populate any IVRS defined aliases.
+*/
+   for_each_pci_dev(pdev) {
+   if (check_device(pdev-dev))
+   init_iommu_group(pdev-dev);
+   }
+
return 0;
 
 out_free:
@@ -2415,6 +2413,7 @@ static int device_change_notifier(struct notifier_block 
*nb,
case BUS_NOTIFY_ADD_DEVICE:
 
iommu_init_device(dev);
+   init_iommu_group(dev);
 
/*
 * dev_data is still NULL and

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 04/13 v2] iommu/arm-smmu: Convert to iommu_capable() API function function

2014-09-19 Thread Will Deacon
On Wed, Sep 17, 2014 at 09:53:12AM +0100, Joerg Roedel wrote:
 Hi Will,

Hello Joerg,

 On Mon, Sep 08, 2014 at 05:51:36PM +0100, Will Deacon wrote:
  On Fri, Sep 05, 2014 at 11:52:56AM +0100, Joerg Roedel wrote:
   From: Joerg Roedel jroe...@suse.de
   
   Cc: Will Deacon will.dea...@arm.com
   Signed-off-by: Joerg Roedel jroe...@suse.de
   ---
drivers/iommu/arm-smmu.c | 35 ++-
1 file changed, 26 insertions(+), 9 deletions(-)
 
 Okay, so here is the updated patch:

Thanks, comments inline.

 From b5d895980849ba1a46a5250cd4cc5f3f9f28235d Mon Sep 17 00:00:00 2001
 From: Joerg Roedel jroe...@suse.de
 Date: Fri, 5 Sep 2014 10:49:34 +0200
 Subject: [PATCH 04/13] iommu/arm-smmu: Convert to iommu_capable() API function
 
 Cc: Will Deacon will.dea...@arm.com
 Signed-off-by: Joerg Roedel jroe...@suse.de
 ---
  drivers/iommu/arm-smmu.c | 20 ++--
  1 file changed, 10 insertions(+), 10 deletions(-)
 
 diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
 index a83cc2a..f5cacf4 100644
 --- a/drivers/iommu/arm-smmu.c
 +++ b/drivers/iommu/arm-smmu.c
 @@ -1526,20 +1526,20 @@ static phys_addr_t arm_smmu_iova_to_phys(struct 
 iommu_domain *domain,
   return __pfn_to_phys(pte_pfn(pte)) | (iova  ~PAGE_MASK);
  }
  
 -static int arm_smmu_domain_has_cap(struct iommu_domain *domain,
 -unsigned long cap)
 +static bool arm_smmu_capable(enum iommu_cap cap)
  {
 - struct arm_smmu_domain *smmu_domain = domain-priv;
 - struct arm_smmu_device *smmu = smmu_domain-smmu;
 - u32 features = smmu ? smmu-features : 0;
 -
   switch (cap) {
   case IOMMU_CAP_CACHE_COHERENCY:
 - return features  ARM_SMMU_FEAT_COHERENT_WALK;
 + /*
 +  * Return false here until we have a way to find out whether the
 +  * SMMUs in the system a coherently connected and able to force
 +  * DMA coherency.
 +  */

s/a/are/

However, I thought about this a bit more and the coherency isn't necessarily
a global property of the SMMU. In reality, it is dependent on the IOTLBs in
use by the domain, so it's not going to be possible to report true here in
many cases.

That means we'd need a way to say `this device is dma coherent when its
downstream IOMMU is enabled with IOMMU_CACHE mappings'. For the moment,
people will probably just add `dma-coherent' to the endpoint and dma-mapping
will request IOMMU_CACHE mappings regardless of the features advertised by
the IOMMU. In that case, it might make more sense to return `true' here as
we can always generated cacheable transactions from the SMMU. The
dma-coherent property on the device would then indicate whether those
transactions will snoop the CPU caches.

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] iommu/arm-smmu: fix bug in pmd construction

2014-09-19 Thread Mitchel Humpherys
We are using the same pfn for every pte we create while constructing the
pmd. Fix this by actually updating the pfn on each iteration of the pmd
construction loop.

It's not clear if we can actually hit this bug right now since iommu_map
splits up the calls to .map based on the page size, so we only ever seem to
iterate this loop once. However, things might change in the future that
might cause us to hit this.

Signed-off-by: Mitchel Humpherys mitch...@codeaurora.org
---
Will, I was unable to come up with a test case to hit this bug based on
what I said in the commit message above. Not sure if my analysis is
completely off base, my head is still spinning from all these page tables
:).
---
 drivers/iommu/arm-smmu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index ca18d6d42a..eba4cb390c 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1368,6 +1368,7 @@ static int arm_smmu_alloc_init_pmd(struct arm_smmu_device 
*smmu, pud_t *pud,
ret = arm_smmu_alloc_init_pte(smmu, pmd, addr, next, pfn,
  prot, stage);
phys += next - addr;
+   pfn = __phys_to_pfn(phys);
} while (pmd++, addr = next, addr  end);
 
return ret;
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 2/2] iommu/arm-smmu: add support for access-protected mappings

2014-09-19 Thread Will Deacon
On Wed, Sep 17, 2014 at 09:16:09PM +0100, Mitchel Humpherys wrote:
 ARM SMMUs support memory access control via some bits in the translation
 table descriptor memory attributes. Currently we assume all translations
 are unprivileged. Add support for privileged mappings, controlled by
 the IOMMU_PRIV prot flag.
 
 Also sneak in a whitespace change for consistency with nearby code.
 
 Signed-off-by: Mitchel Humpherys mitch...@codeaurora.org
 ---
  drivers/iommu/arm-smmu.c | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)
 
 diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
 index ca18d6d42a..93999ec22c 100644
 --- a/drivers/iommu/arm-smmu.c
 +++ b/drivers/iommu/arm-smmu.c
 @@ -1256,10 +1256,11 @@ static int arm_smmu_alloc_init_pte(struct 
 arm_smmu_device *smmu, pmd_t *pmd,
   }
  
   if (stage == 1) {
 - pteval |= ARM_SMMU_PTE_AP_UNPRIV | ARM_SMMU_PTE_nG;
 + pteval |= ARM_SMMU_PTE_nG;
 + if (!(prot  IOMMU_PRIV))
 + pteval |= ARM_SMMU_PTE_AP_UNPRIV;

I think this actually makes more sense if we invert the logic, i.e. have
IOMMU_USER as a flag which sets the UNPRIV bit in the pte.

I don't have the spec to hand, but I guess you can't enforce this at
stage-2? If so, do we also need a new IOMMU capability so people don't try
to use this for stage-2 only SMMUs?

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu