[PATCH 0/2] iommu: Fix regression in IOMMU grouping
We've had surprisingly little fallout from the DMA alias changes, but unfortunately one regression has popped up.  We have an AMD system that seems to use the SATA controller to master transactions for the legacy IDE controller, and they are different slots on the root complex.  The IVRS reports 00:11.0 (SATA) as an alias of 00:14.1 (IDE), which doesn't work with the new, converged PCI IOMMU grouping code, where we've made a simplifying assumption that aliases will be to the same slot.

To fix this, we need to rip out that assumption and write the alias search code that I was unable to come up with previously.  I think this can now do the chaining of aliases, which I referenced in the removed comments.  Any sort of multi-level aliases are exceptionally unlikely, but I think this code can now handle whatever firmware and alias quirks can throw at it.

I know this is late for 3.17, but this is a regression from the prior code.  If reviews and testing can give us the confidence to put it in for 3.17, that would be my preference.  I've also marked it for stable in case we want to loop back through that way.

Thanks,
Alex

---

Alex Williamson (2):
      iommu/amd: Split init_iommu_group() from iommu_init_device()
      iommu: Rework iommu_group_get_for_pci_dev()

 drivers/iommu/amd_iommu.c |   27 ---
 drivers/iommu/iommu.c     |  163 +++--
 2 files changed, 109 insertions(+), 81 deletions(-)

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 1/2] iommu: Rework iommu_group_get_for_pci_dev()
It turns out that our assumption that aliases are always to the same slot isn't true.  One particular platform reports an IVRS alias of the SATA controller (00:11.0) for the legacy IDE controller (00:14.1).  When we hit this, we attempt to use a single IOMMU group for everything on the same bus, which in this case is the root complex.  We already have multiple groups defined for the root complex by this point, resulting in multiple WARN_ON hits.

This patch makes these sorts of aliases work again with IOMMU groups by reworking how we search through the PCI address space to find existing groups.  This should also now handle looped dependencies and all sorts of crazy inter-dependencies that we'll likely never see.

The recursion used here should never be very deep.  It's unlikely that we'll see individual aliases at all, and only theoretical that we'd ever see a chain where one alias causes us to search through to yet another alias.  We're also only dealing with PCIe devices on a single bus, which means we'll typically only see multiple slots in use on the root complex.  Loops are also a theoretical possibility, which I've tested using fake DMA alias quirks; a bitmap of the devfn space that's been visited prevents them from causing problems.
Signed-off-by: Alex Williamson alex.william...@redhat.com
Cc: sta...@vger.kernel.org # 3.17
---
 drivers/iommu/iommu.c | 163 +
 1 file changed, 96 insertions(+), 67 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 0639b92..690818d 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -30,6 +30,7 @@
 #include <linux/notifier.h>
 #include <linux/err.h>
 #include <linux/pci.h>
+#include <linux/bitops.h>
 #include <trace/events/iommu.h>

 static struct kset *iommu_group_kset;
@@ -519,6 +520,9 @@ int iommu_group_id(struct iommu_group *group)
 }
 EXPORT_SYMBOL_GPL(iommu_group_id);

+static struct iommu_group *get_pci_alias_group(struct pci_dev *pdev,
+					       unsigned long *devfns);
+
 /*
  * To consider a PCI device isolated, we require ACS to support Source
  * Validation, Request Redirection, Completer Redirection, and Upstream
@@ -529,6 +533,86 @@ EXPORT_SYMBOL_GPL(iommu_group_id);
  */
 #define REQ_ACS_FLAGS	(PCI_ACS_SV | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF)

+/*
+ * For multifunction devices which are not isolated from each other, find
+ * all the other non-isolated functions and look for existing groups.  For
+ * each function, we also need to look for aliases to or from other devices
+ * that may already have a group.
+ */
+static struct iommu_group *get_pci_function_alias_group(struct pci_dev *pdev,
+							unsigned long *devfns)
+{
+	struct pci_dev *tmp = NULL;
+	struct iommu_group *group;
+
+	if (!pdev->multifunction || pci_acs_enabled(pdev, REQ_ACS_FLAGS))
+		return NULL;
+
+	for_each_pci_dev(tmp) {
+		if (tmp == pdev || tmp->bus != pdev->bus ||
+		    PCI_SLOT(tmp->devfn) != PCI_SLOT(pdev->devfn) ||
+		    pci_acs_enabled(tmp, REQ_ACS_FLAGS))
+			continue;
+
+		group = get_pci_alias_group(tmp, devfns);
+		if (group) {
+			pci_dev_put(tmp);
+			return group;
+		}
+	}
+
+	return NULL;
+}
+
+/*
+ * Look for aliases to or from the given device for existing groups.  The
+ * dma_alias_devfn only supports aliases on the same bus, therefore the search
+ * space is quite small (especially since we're really only looking at PCIe
+ * devices, and therefore only expect multiple slots on the root complex or
+ * downstream switch ports).  It's conceivable though that a pair of
+ * multifunction devices could have aliases between them that would cause a
+ * loop.  To prevent this, we use a bitmap to track where we've been.
+ */
+static struct iommu_group *get_pci_alias_group(struct pci_dev *pdev,
+					       unsigned long *devfns)
+{
+	struct pci_dev *tmp = NULL;
+	struct iommu_group *group;
+
+	if (test_and_set_bit(pdev->devfn & 0xff, devfns))
+		return NULL;
+
+	group = iommu_group_get(&pdev->dev);
+	if (group)
+		return group;
+
+	for_each_pci_dev(tmp) {
+		if (tmp == pdev || tmp->bus != pdev->bus)
+			continue;
+
+		/* We alias them or they alias us */
+		if (((pdev->dev_flags & PCI_DEV_FLAGS_DMA_ALIAS_DEVFN) &&
+		     pdev->dma_alias_devfn == tmp->devfn) ||
+		    ((tmp->dev_flags & PCI_DEV_FLAGS_DMA_ALIAS_DEVFN) &&
+		     tmp->dma_alias_devfn == pdev->devfn)) {
+
+			group = get_pci_alias_group(tmp, devfns);
+			if (group) {
+				pci_dev_put(tmp);
+				return group;
[PATCH 2/2] iommu/amd: Split init_iommu_group() from iommu_init_device()
For a PCI device, aliases from the IVRS table won't be populated into dma_alias_devfn until after iommu_init_device() is called on each device.  We therefore want to split init_iommu_group() out of iommu_init_device() and call it from a separate loop immediately following.

Signed-off-by: Alex Williamson alex.william...@redhat.com
Cc: sta...@vger.kernel.org # 3.17
---
 drivers/iommu/amd_iommu.c | 27 +-
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index ecb0109..5aff937 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -260,17 +260,13 @@ static bool check_device(struct device *dev)
 	return true;
 }

-static int init_iommu_group(struct device *dev)
+static void init_iommu_group(struct device *dev)
 {
 	struct iommu_group *group;

 	group = iommu_group_get_for_dev(dev);
-
-	if (IS_ERR(group))
-		return PTR_ERR(group);
-
-	iommu_group_put(group);
-	return 0;
+	if (!IS_ERR(group))
+		iommu_group_put(group);
 }

 static int __last_alias(struct pci_dev *pdev, u16 alias, void *data)
@@ -340,7 +336,6 @@ static int iommu_init_device(struct device *dev)
 	struct pci_dev *pdev = to_pci_dev(dev);
 	struct iommu_dev_data *dev_data;
 	u16 alias;
-	int ret;

 	if (dev->archdata.iommu)
 		return 0;
@@ -364,12 +359,6 @@ static int iommu_init_device(struct device *dev)
 		dev_data->alias_data = alias_data;
 	}

-	ret = init_iommu_group(dev);
-	if (ret) {
-		free_dev_data(dev_data);
-		return ret;
-	}
-
 	if (pci_iommuv2_capable(pdev)) {
 		struct amd_iommu *iommu;
@@ -455,6 +444,15 @@ int __init amd_iommu_init_devices(void)
 			goto out_free;
 	}

+	/*
+	 * Initialize IOMMU groups only after iommu_init_device() has
+	 * had a chance to populate any IVRS defined aliases.
+	 */
+	for_each_pci_dev(pdev) {
+		if (check_device(&pdev->dev))
+			init_iommu_group(&pdev->dev);
+	}
+
 	return 0;

 out_free:
@@ -2415,6 +2413,7 @@ static int device_change_notifier(struct notifier_block *nb,
 	case BUS_NOTIFY_ADD_DEVICE:

 		iommu_init_device(dev);
+		init_iommu_group(dev);

 		/*
 		 * dev_data is still NULL and
Re: [PATCH 04/13 v2] iommu/arm-smmu: Convert to iommu_capable() API function
On Wed, Sep 17, 2014 at 09:53:12AM +0100, Joerg Roedel wrote:
> Hi Will,

Hello Joerg,

> On Mon, Sep 08, 2014 at 05:51:36PM +0100, Will Deacon wrote:
> > On Fri, Sep 05, 2014 at 11:52:56AM +0100, Joerg Roedel wrote:
> > > From: Joerg Roedel jroe...@suse.de
> > >
> > > Cc: Will Deacon will.dea...@arm.com
> > > Signed-off-by: Joerg Roedel jroe...@suse.de
> > > ---
> > >  drivers/iommu/arm-smmu.c | 35 ++-
> > >  1 file changed, 26 insertions(+), 9 deletions(-)
>
> Okay, so here is the updated patch:

Thanks, comments inline.

> From b5d895980849ba1a46a5250cd4cc5f3f9f28235d Mon Sep 17 00:00:00 2001
> From: Joerg Roedel jroe...@suse.de
> Date: Fri, 5 Sep 2014 10:49:34 +0200
> Subject: [PATCH 04/13] iommu/arm-smmu: Convert to iommu_capable() API function
>
> Cc: Will Deacon will.dea...@arm.com
> Signed-off-by: Joerg Roedel jroe...@suse.de
> ---
>  drivers/iommu/arm-smmu.c | 20 ++--
>  1 file changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index a83cc2a..f5cacf4 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -1526,20 +1526,20 @@ static phys_addr_t arm_smmu_iova_to_phys(struct iommu_domain *domain,
>  	return __pfn_to_phys(pte_pfn(pte)) | (iova & ~PAGE_MASK);
>  }
>
> -static int arm_smmu_domain_has_cap(struct iommu_domain *domain,
> -				   unsigned long cap)
> +static bool arm_smmu_capable(enum iommu_cap cap)
>  {
> -	struct arm_smmu_domain *smmu_domain = domain->priv;
> -	struct arm_smmu_device *smmu = smmu_domain->smmu;
> -	u32 features = smmu ? smmu->features : 0;
> -
>  	switch (cap) {
>  	case IOMMU_CAP_CACHE_COHERENCY:
> -		return features & ARM_SMMU_FEAT_COHERENT_WALK;
> +		/*
> +		 * Return false here until we have a way to find out whether the
> +		 * SMMUs in the system a coherently connected and able to force
> +		 * DMA coherency.
> +		 */

s/a/are/

However, I thought about this a bit more, and the coherency isn't necessarily a global property of the SMMU. In reality, it is dependent on the IOTLBs in use by the domain, so it's not going to be possible to report true here in many cases.
That means we'd need a way to say `this device is dma coherent when its downstream IOMMU is enabled with IOMMU_CACHE mappings'. For the moment, people will probably just add `dma-coherent' to the endpoint, and dma-mapping will request IOMMU_CACHE mappings regardless of the features advertised by the IOMMU.

In that case, it might make more sense to return `true' here, as we can always generate cacheable transactions from the SMMU. The dma-coherent property on the device would then indicate whether those transactions will snoop the CPU caches.

Will
[PATCH] iommu/arm-smmu: fix bug in pmd construction
We are using the same pfn for every pte we create while constructing the pmd. Fix this by actually updating the pfn on each iteration of the pmd construction loop.

It's not clear if we can actually hit this bug right now since iommu_map splits up the calls to .map based on the page size, so we only ever seem to iterate this loop once. However, things might change in the future that cause us to hit this.

Signed-off-by: Mitchel Humpherys mitch...@codeaurora.org
---
Will, I was unable to come up with a test case to hit this bug based on what I said in the commit message above. Not sure if my analysis is completely off base; my head is still spinning from all these page tables :).
---
 drivers/iommu/arm-smmu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index ca18d6d42a..eba4cb390c 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1368,6 +1368,7 @@ static int arm_smmu_alloc_init_pmd(struct arm_smmu_device *smmu, pud_t *pud,
 		ret = arm_smmu_alloc_init_pte(smmu, pmd, addr, next, pfn,
 					      prot, stage);
 		phys += next - addr;
+		pfn = __phys_to_pfn(phys);
 	} while (pmd++, addr = next, addr < end);

 	return ret;
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, hosted by The Linux Foundation
Re: [PATCH 2/2] iommu/arm-smmu: add support for access-protected mappings
On Wed, Sep 17, 2014 at 09:16:09PM +0100, Mitchel Humpherys wrote:
> ARM SMMUs support memory access control via some bits in the translation
> table descriptor memory attributes. Currently we assume all translations
> are unprivileged. Add support for privileged mappings, controlled by the
> IOMMU_PRIV prot flag.
>
> Also sneak in a whitespace change for consistency with nearby code.
>
> Signed-off-by: Mitchel Humpherys mitch...@codeaurora.org
> ---
>  drivers/iommu/arm-smmu.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index ca18d6d42a..93999ec22c 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -1256,10 +1256,11 @@ static int arm_smmu_alloc_init_pte(struct arm_smmu_device *smmu, pmd_t *pmd,
>  	}
>
>  	if (stage == 1) {
> -		pteval |= ARM_SMMU_PTE_AP_UNPRIV | ARM_SMMU_PTE_nG;
> +		pteval |= ARM_SMMU_PTE_nG;
> +		if (!(prot & IOMMU_PRIV))
> +			pteval |= ARM_SMMU_PTE_AP_UNPRIV;

I think this actually makes more sense if we invert the logic, i.e. have IOMMU_USER as a flag which sets the UNPRIV bit in the pte.

I don't have the spec to hand, but I guess you can't enforce this at stage-2? If so, do we also need a new IOMMU capability so people don't try to use this for stage-2 only SMMUs?

Will