Re: [PATCH] iommu/arm-smmu-v3: Set GBPA to abort all transactions
Hi Marc,

On 4/12/2018 7:56 AM, Marc Zyngier wrote:
On 12/04/18 11:17, Robin Murphy wrote:
On 11/04/18 17:54, Marc Zyngier wrote:
Hi Sameer,
On 11/04/18 16:58, Goel, Sameer wrote:
On 3/28/2018 9:00 AM, Marc Zyngier wrote:
On 2018-03-28 15:39, Timur Tabi wrote:

From: Sameer Goel

Set SMMU_GBPA to abort all incoming translations during the SMMU reset when SMMUEN==0. This prevents a race condition where a stray DMA from the crashed primary kernel can try to access an IOVA address as an invalid PA when the SMMU is disabled during reset in the crash kernel.

Signed-off-by: Sameer Goel
---
 drivers/iommu/arm-smmu-v3.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 3f2f1fc68b52..c04a89310c59 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -2458,6 +2458,18 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
 	if (reg & CR0_SMMUEN)
 		dev_warn(smmu->dev, "SMMU currently enabled! Resetting...\n");
 
+	/*
+	 * Abort all incoming translations. This can happen in a kdump case
+	 * where SMMU is initialized when a prior DMA is pending. Just
+	 * disabling the SMMU in this case might result in writes to invalid
+	 * PAs.
+	 */
+	ret = arm_smmu_update_gbpa(smmu, 1, GBPA_ABORT);
+	if (ret) {
+		dev_err(smmu->dev, "GBPA not responding to update\n");
+		return ret;
+	}
+
 	ret = arm_smmu_device_disable(smmu);
 	if (ret)
 		return ret;

A tangential question: can we reliably detect that the SMMU already has valid mappings, which would indicate that we're in pretty bad shape already by the time we set that bit? For all we know, memory could have been corrupted long before we hit this point, and this patch barely narrows the window of opportunity. :)

Yes, that is correct. This only covers the kdump scenario, trying to get some reliability when booting up the crash kernel. The system is already in a bad state. I don't think that this will happen in a normal scenario.
But please point me to the GICv3 change and I'll have a look.

See this:
https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/tree/drivers/irqchip/irq-gic-v3-its.c?h=irq/irqchip-4.17&id=6eb486b66a3094cdcd68dc39c9df3a29d6a51dd5#n3407

The nearest equivalent to that is probably the top-level SMMUEN check that we already have (see the diff context above). To go beyond that, you'd have to chase the old stream table pointer and scan the whole thing looking for valid contexts, then potentially walk the page tables within those contexts to check for live translations, if you really wanted to be sure. That would be a hell of a lot of work to do in the boot path.

Yeah, feels a bit too involved for sanity. I'd simply suggest you taint the kernel if you find the SMMU enabled, as you're already on shaky ground.

Finding the SMMU already enabled does not necessarily indicate that anything catastrophic has occurred. For instance, to support OSes without an SMMUv3 driver, boot FW may have enabled the SMMU and installed 1-to-1 mappings for DDR and the MSI target address(es) to compensate for an MSI-capable master whose default DMA attributes needed tweaking (e.g. non-coherent -> coherent). If such a configuration warrants tainting the kernel, then we should similarly check GBPA for attribute overrides and taint the kernel if any are found there.

Thanks,
M.
--
Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2] iommu/arm-smmu-v3: limit reporting of MSI allocation failures
Currently, the arm-smmu-v3 driver expects to allocate MSIs for all SMMUs with FEAT_MSI set. This results in unwarranted "failed to allocate MSIs" warnings being printed on systems where FW was either deliberately configured to force the use of SMMU wired interrupts -or- is altogether incapable of describing SMMU MSI topology (ACPI IORT prior to rev. C).

Remedy this by checking msi_domain before attempting to allocate SMMU MSIs.

Signed-off-by: Nate Watterson
Signed-off-by: Sinan Kaya
---
 drivers/iommu/arm-smmu-v3.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 744592d..00de028 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -2328,10 +2328,15 @@ static void arm_smmu_setup_msis(struct arm_smmu_device *smmu)
 	if (!(smmu->features & ARM_SMMU_FEAT_MSI))
 		return;
 
+	if (!dev->msi_domain) {
+		dev_info(smmu->dev, "msi_domain absent - falling back to wired irqs\n");
+		return;
+	}
+
 	/* Allocate MSIs for evtq, gerror and priq. Ignore cmdq */
 	ret = platform_msi_domain_alloc_irqs(dev, nvec, arm_smmu_write_msi_msg);
 	if (ret) {
-		dev_warn(dev, "failed to allocate MSIs\n");
+		dev_warn(dev, "failed to allocate MSIs - falling back to wired irqs\n");
 		return;
 	}
[PATCH] iommu/arm-smmu-v3: suppress MSI allocation failure message
From: Sinan Kaya

Even though QDF2400 supports MSI interrupts with SMMUv3, they are not enabled in the ACPI FW in order to preserve compatibility with older kernel versions. The code emits a warning message during boot. This causes some test tools to record a false warning and is creating support issues. Better to reduce the message level.

Signed-off-by: Sinan Kaya
Signed-off-by: Nate Watterson
---
 drivers/iommu/arm-smmu-v3.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 744592d..2118fda 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -2331,7 +2331,7 @@ static void arm_smmu_setup_msis(struct arm_smmu_device *smmu)
 	/* Allocate MSIs for evtq, gerror and priq. Ignore cmdq */
 	ret = platform_msi_domain_alloc_irqs(dev, nvec, arm_smmu_write_msi_msg);
 	if (ret) {
-		dev_warn(dev, "failed to allocate MSIs\n");
+		dev_info(dev, "failed to allocate MSIs\n");
 		return;
 	}
Re: [PATCH 2/4] iommu/io-pgtable-arm: Support 52-bit physical address
Hi Robin, On 11/29/2017 6:29 AM, Robin Murphy wrote: Hi Nate, On 29/11/17 07:07, Nate Watterson wrote: Hi Robin, On 11/28/2017 12:27 PM, Robin Murphy wrote: Bring io-pgtable-arm in line with the ARMv8.2-LPA feature allowing 52-bit physical addresses when using the 64KB translation granule. This will be supported by SMMUv3.1. Signed-off-by: Robin Murphy --- drivers/iommu/io-pgtable-arm.c | 65 ++ 1 file changed, 47 insertions(+), 18 deletions(-) diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index 51e5c43caed1..4d46017b3262 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -21,6 +21,7 @@ #define pr_fmt(fmt) "arm-lpae io-pgtable: " fmt #include +#include #include #include #include @@ -32,7 +33,7 @@ #include "io-pgtable.h" -#define ARM_LPAE_MAX_ADDR_BITS 48 +#define ARM_LPAE_MAX_ADDR_BITS 52 #define ARM_LPAE_S2_MAX_CONCAT_PAGES 16 #define ARM_LPAE_MAX_LEVELS 4 @@ -86,6 +87,8 @@ #define ARM_LPAE_PTE_TYPE_TABLE 3 #define ARM_LPAE_PTE_TYPE_PAGE 3 +#define ARM_LPAE_PTE_ADDR_MASK GENMASK_ULL(47,12) + #define ARM_LPAE_PTE_NSTABLE (((arm_lpae_iopte)1) << 63) #define ARM_LPAE_PTE_XN (((arm_lpae_iopte)3) << 53) #define ARM_LPAE_PTE_AF (((arm_lpae_iopte)1) << 10) @@ -159,6 +162,7 @@ #define ARM_LPAE_TCR_PS_42_BIT 0x3ULL #define ARM_LPAE_TCR_PS_44_BIT 0x4ULL #define ARM_LPAE_TCR_PS_48_BIT 0x5ULL +#define ARM_LPAE_TCR_PS_52_BIT 0x6ULL #define ARM_LPAE_MAIR_ATTR_SHIFT(n) ((n) << 3) #define ARM_LPAE_MAIR_ATTR_MASK 0xff @@ -170,9 +174,7 @@ #define ARM_LPAE_MAIR_ATTR_IDX_DEV 2 /* IOPTE accessors */ -#define iopte_deref(pte,d) \ - (__va((pte) & ((1ULL << ARM_LPAE_MAX_ADDR_BITS) - 1) \ - & ~(ARM_LPAE_GRANULE(d) - 1ULL))) +#define iopte_deref(pte,d) __va(iopte_to_paddr(pte, d)) #define iopte_type(pte,l) \ (((pte) >> ARM_LPAE_PTE_TYPE_SHIFT) & ARM_LPAE_PTE_TYPE_MASK) @@ -184,12 +186,6 @@ (iopte_type(pte,l) == ARM_LPAE_PTE_TYPE_PAGE) : \ (iopte_type(pte,l) == ARM_LPAE_PTE_TYPE_BLOCK)) -#define iopte_to_pfn(pte,d) \ - (((pte) & 
((1ULL << ARM_LPAE_MAX_ADDR_BITS) - 1)) >> (d)->pg_shift) - -#define pfn_to_iopte(pfn,d) \ - (((pfn) << (d)->pg_shift) & ((1ULL << ARM_LPAE_MAX_ADDR_BITS) - 1)) - struct arm_lpae_io_pgtable { struct io_pgtable iop; @@ -203,6 +199,25 @@ struct arm_lpae_io_pgtable { typedef u64 arm_lpae_iopte; +static arm_lpae_iopte paddr_to_iopte(phys_addr_t paddr, + struct arm_lpae_io_pgtable *data) +{ + arm_lpae_iopte pte = paddr; + + /* Of the bits which overlap, either 51:48 or 15:12 are always RES0 */ + return (pte | (pte >> 36)) & ARM_LPAE_PTE_ADDR_MASK; +} + +static phys_addr_t iopte_to_paddr(arm_lpae_iopte pte, + struct arm_lpae_io_pgtable *data) +{ + phys_addr_t paddr = pte & ARM_LPAE_PTE_ADDR_MASK; + phys_addr_t paddr_hi = paddr & (ARM_LPAE_GRANULE(data) - 1); + + /* paddr_hi spans nothing for 4K granule, and only RES0 bits for 16K */ + return (paddr ^ paddr_hi) | (paddr_hi << 36); +} + static bool selftest_running = false; static dma_addr_t __arm_lpae_dma_addr(void *pages) @@ -287,7 +302,7 @@ static void __arm_lpae_init_pte(struct arm_lpae_io_pgtable *data, pte |= ARM_LPAE_PTE_TYPE_BLOCK; pte |= ARM_LPAE_PTE_AF | ARM_LPAE_PTE_SH_IS; - pte |= pfn_to_iopte(paddr >> data->pg_shift, data); + pte |= paddr_to_iopte(paddr, data); __arm_lpae_set_pte(ptep, pte, &data->iop.cfg); } @@ -528,7 +543,7 @@ static int arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data, if (size == split_sz) unmap_idx = ARM_LPAE_LVL_IDX(iova, lvl, data); - blk_paddr = iopte_to_pfn(blk_pte, data) << data->pg_shift; + blk_paddr = iopte_to_paddr(blk_pte, data); pte = iopte_prot(blk_pte); for (i = 0; i < tablesz / sizeof(pte); i++, blk_paddr += split_sz) { @@ -652,12 +667,13 @@ static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops, found_translation: iova &= (ARM_LPAE_BLOCK_SIZE(lvl, data) - 1); - return ((phys_addr_t)iopte_to_pfn(pte,data) << data->pg_shift) | iova; + return iopte_to_paddr(pte, data) | iova; } static void arm_lpae_restrict_pgsizes(struct io_pgtable_cfg *cfg) { - unsigned 
long granule; + unsigned long granule, page_sizes; + unsigned int max_addr_bits = 48; /* * We need to restrict the supported page sizes to match the @@ -677,17 +693,24 @@ static void arm_lpae_restrict_pgsizes(struct io_pgtable_cfg *cfg) switch (granule) { case SZ_4K: - cfg-&g
Re: [PATCH v2 0/3] arm-smmu: performance optimization
Hi Leizhen,

On 9/12/2017 9:00 AM, Zhen Lei wrote:
v1 -> v2: based on (add02cfdc9bc2 "iommu: Introduce Interface for IOMMU TLB Flushing")

Zhen Lei (3):
  iommu/arm-smmu-v3: put off the execution of TLBI* to reduce lock confliction
  iommu/arm-smmu-v3: add support for unmap an iova range with only one tlb sync

I tested these (2) patches on QDF2400 hardware and saw performance improvements in line with those I reported when testing the original series. I don't have any hardware close at hand to test the 3rd patch in the series, so that will have to come from someone else.

Tested-by: Nate Watterson

Thanks,
Nate

  iommu/arm-smmu: add support for unmap a memory range with only one tlb sync

 drivers/iommu/arm-smmu-v3.c        | 52 ++
 drivers/iommu/arm-smmu.c           | 10
 drivers/iommu/io-pgtable-arm-v7s.c | 32 +++
 drivers/iommu/io-pgtable-arm.c     | 30 ++
 drivers/iommu/io-pgtable.h         |  1 +
 5 files changed, 99 insertions(+), 26 deletions(-)
Re: [PATCH 1/1] iommu/iova: Make rcache flush optional on IOVA allocation failure
Hi Tomasz,

On 9/18/2017 12:02 PM, Robin Murphy wrote:
Hi Tomasz,

On 18/09/17 11:56, Tomasz Nowicki wrote:
Since IOVA allocation failure is not an unusual case, we need to flush the CPUs' rcache in the hope that we will succeed in the next round. However, it is useful to decide whether we need the rcache flush step, for two reasons:

- Scalability. On a large system with ~100 CPUs, iterating over and flushing the rcache for each CPU becomes a serious bottleneck, so we may want to deffer it.

s/deffer/defer

- free_cpu_cached_iovas() does not care about the max PFN we are interested in. Thus we may flush our rcaches and still get no new IOVA, as in the commonly used scenario:

if (dma_limit > DMA_BIT_MASK(32) && dev_is_pci(dev))
	iova = alloc_iova_fast(iovad, iova_len, DMA_BIT_MASK(32) >> shift);

if (!iova)
	iova = alloc_iova_fast(iovad, iova_len, dma_limit >> shift);

1. The first alloc_iova_fast() call is limited to DMA_BIT_MASK(32) to get PCI devices a SAC address
2. alloc_iova() fails due to a full 32-bit space
3. The rcaches contain PFNs outside the 32-bit space, so free_cpu_cached_iovas() throws entries away for nothing and alloc_iova() fails again
4. The next alloc_iova_fast() call cannot take advantage of the rcache, since we have just defeated the caches. In this case we pick the slowest option to proceed.

This patch reworks the flushed_rcache local flag to be an additional function argument instead, controlling the rcache flush step. It also updates all users to do the flush as a last resort.

Looks like you've run into the same thing Nate found[1] - I came up with almost the exact same patch, only with separate alloc_iova_fast() and alloc_iova_fast_noretry() wrapper functions, but on reflection, just exposing the bool to callers is probably simpler. One nit: can you document it in the kerneldoc comment too? With that:

Reviewed-by: Robin Murphy

Thanks,
Robin.

[1]: https://www.mail-archive.com/iommu@lists.linux-foundation.org/msg19758.html

This patch completely resolves the issue I reported in [1]!!
Tested-by: Nate Watterson

Thanks,
Nate

Signed-off-by: Tomasz Nowicki
---
 drivers/iommu/amd_iommu.c   | 5 +++--
 drivers/iommu/dma-iommu.c   | 6 --
 drivers/iommu/intel-iommu.c | 5 +++--
 drivers/iommu/iova.c        | 7 +++
 include/linux/iova.h        | 5 +++--
 5 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 8d2ec60..ce68986 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -1604,10 +1604,11 @@ static unsigned long dma_ops_alloc_iova(struct device *dev,
 	if (dma_mask > DMA_BIT_MASK(32))
 		pfn = alloc_iova_fast(&dma_dom->iovad, pages,
-				      IOVA_PFN(DMA_BIT_MASK(32)));
+				      IOVA_PFN(DMA_BIT_MASK(32)), false);
 
 	if (!pfn)
-		pfn = alloc_iova_fast(&dma_dom->iovad, pages, IOVA_PFN(dma_mask));
+		pfn = alloc_iova_fast(&dma_dom->iovad, pages,
+				      IOVA_PFN(dma_mask), true);
 
 	return (pfn << PAGE_SHIFT);
 }
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 191be9c..25914d3 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -370,10 +370,12 @@ static dma_addr_t iommu_dma_alloc_iova(struct iommu_domain *domain,
 	/* Try to get PCI devices a SAC address */
 	if (dma_limit > DMA_BIT_MASK(32) && dev_is_pci(dev))
-		iova = alloc_iova_fast(iovad, iova_len, DMA_BIT_MASK(32) >> shift);
+		iova = alloc_iova_fast(iovad, iova_len,
+				       DMA_BIT_MASK(32) >> shift, false);
 
 	if (!iova)
-		iova = alloc_iova_fast(iovad, iova_len, dma_limit >> shift);
+		iova = alloc_iova_fast(iovad, iova_len, dma_limit >> shift,
+				       true);
 
 	return (dma_addr_t)iova << shift;
 }
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 05c0c3a..75c8320 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3460,11 +3460,12 @@ static unsigned long intel_alloc_iova(struct device *dev,
 		 * from higher range
 		 */
 		iova_pfn = alloc_iova_fast(&domain->iovad, nrpages,
-					   IOVA_PFN(DMA_BIT_MASK(32)));
+					   IOVA_PFN(DMA_BIT_MASK(32)), false);
 		if (iova_pfn)
 			return iova_pfn;
 	}
-	iova_pfn = alloc_iova_fast(&domain->iovad, nrpages, IOVA_PFN(dma_mask));
+	iova_pfn = alloc_iova_fast(&domain->iovad, nrpages,
+				   IOVA_PFN(dma_mask), true);
 	if (unlikely(!iova_pfn)) {
 		pr_err(
Re: [PATCH v3 0/4] Optimise 64-bit IOVA allocations
Hi Robin,

On 8/22/2017 11:17 AM, Robin Murphy wrote:
Hi all,

Just a quick repost of v2[1] with a small fix for the bug reported by Nate.

I tested the series and can confirm that the crash I reported on v2 no longer occurs with this version.

To recap, whilst this mostly only improves worst-case performance, those worst cases have a tendency to be pathologically bad: Ard reports general desktop performance with Chromium on AMD Seattle going from ~1-2 FPS to perfectly usable. Leizhen reports gigabit ethernet throughput going from ~6.5 Mbit/s to line speed. I also inadvertently found that the HiSilicon hns_dsaf driver was taking ~35s to probe simply because of the number of DMA buffers it maps on startup (perf shows around 76% of that was spent under the lock in alloc_iova()). With this series applied it takes a mere ~1s, mostly of unrelated mdelay()s, with alloc_iova() entirely lost in the noise.

Are any of these cases PCI devices attached to domains that have run out of 32-bit IOVAs and have to retry the allocation using dma_limit?

iommu_dma_alloc_iova() {
	[...]
	if (dma_limit > DMA_BIT_MASK(32) && dev_is_pci(dev))	[<- TRUE]
		iova = alloc_iova_fast(DMA_BIT_MASK(32));	[<- NULL]
	if (!iova)
		iova = alloc_iova_fast(dma_limit);		[<- OK  ]
	[...]
}

I am asking because, when using 64k pages, the Mellanox CX4 adapter exhausts the supply of 32-bit IOVAs simply by allocating per-cpu IOVA space during 'ifconfig up', so the code path outlined above is taken for nearly all subsequent allocations. Although I do see a notable (~2x) performance improvement with this series, I would still characterize it as "pathologically bad" at < 10% of iommu passthrough performance.

This was a bit surprising to me, as I thought the iova_rcache would have eliminated the need to walk the rbtree for runtime allocations. Unfortunately, it looks like the failed attempt to allocate a 32-bit IOVA actually drops the cached IOVAs that we could have used when subsequently performing the allocation at dma_limit.

alloc_iova_fast() {
	[...]
	iova_pfn = iova_rcache_get(...);	[<- Fail, no 32-bit IOVAs]
	if (iova_pfn)
		return iova_pfn;
retry:
	new_iova = alloc_iova(...);		[<- Fail, no 32-bit IOVAs]
	if (!new_iova) {
		unsigned int cpu;
		if (flushed_rcache)
			return 0;
		/* Try replenishing IOVAs by flushing rcache. */
		flushed_rcache = true;
		for_each_online_cpu(cpu)
			free_cpu_cached_iovas(cpu, iovad);	[<- :( ]
		goto retry;
	}
}

As an experiment, I added code to skip the rcache flushing/retry for the 32-bit allocations. In this configuration, 100% of passthrough-mode performance was achieved. I made the same change in the baseline and measured performance at ~95% of passthrough mode. I also got similar results by altogether removing the 32-bit allocation from iommu_dma_alloc_iova(), which makes me wonder why we even bother. What (PCIe) workloads have been shown to actually benefit from it?

Tested-by: Nate Watterson

-Nate

Robin.

[1] https://www.mail-archive.com/iommu@lists.linux-foundation.org/msg19139.html

Robin Murphy (1):
  iommu/iova: Extend rbtree node caching

Zhen Lei (3):
  iommu/iova: Optimise rbtree searching
  iommu/iova: Optimise the padding calculation
  iommu/iova: Make dma_32bit_pfn implicit

 drivers/gpu/drm/tegra/drm.c      |   3 +-
 drivers/gpu/host1x/dev.c         |   3 +-
 drivers/iommu/amd_iommu.c        |   7 +--
 drivers/iommu/dma-iommu.c        |  18 +--
 drivers/iommu/intel-iommu.c      |  11 ++--
 drivers/iommu/iova.c             | 114 +--
 drivers/misc/mic/scif/scif_rma.c |   3 +-
 include/linux/iova.h             |   8 +--
 8 files changed, 62 insertions(+), 105 deletions(-)
Re: [PATCH 0/3] SMMUv3 CMD_SYNC optimisation
Hi Robin,

On 8/18/2017 1:33 PM, Robin Murphy wrote:
Hi all,

Waiting for the command queue to drain for CMD_SYNC completion is likely a contention hotspot on high-core-count systems. If the SMMU is coherent and supports MSIs, though, we can use this cool feature (as suggested by the architecture, no less) to make syncs effectively non-blocking for anyone other than the caller. I don't have any hardware that supports MSIs, but this has at least passed muster on the Fast Model with cache modelling enabled - I'm hoping the Qualcomm machines have the appropriate configuration to actually test how well it works in reality.

If it is worthwhile, I do have most of a plan for how we can do something similar in the non-MSI polling case (it's mostly a problem of handling the queue-wrapping edge cases correctly).

I tested this on QDF2400 hardware, which supports MSI as a CMD_SYNC completion signal. As with Thunder's "performance optimization" series, I evaluated the patches using FIO with 4 NVMe drives connected to a single SMMU. Here's how they compared:

FIO - 512k blocksize / io-depth 32 / 1 thread per drive

  Baseline 4.13-rc1 w/SMMU enabled:  25% of SMMU bypass performance
  Baseline + Thunder Patch 1:        28%
  Baseline + CMD_SYNC Optimization:  36%
  Baseline + Thunder Patches 2-5:    86%
  Baseline + Thunder Patches 1-5:   100% [!!]

Seems like it would probably be worthwhile to implement this for the non-MSI case also. Let me know if there are other workloads you're particularly interested in, and I'll try to get those tested too.

-Nate

Robin.

Robin Murphy (3):
  iommu/arm-smmu-v3: Specialise CMD_SYNC handling
  iommu/arm-smmu-v3: Forget about cmdq-sync interrupt
  iommu/arm-smmu-v3: Utilise CMD_SYNC MSI feature

 drivers/iommu/arm-smmu-v3.c | 117 +---
 1 file changed, 77 insertions(+), 40 deletions(-)
Re: [PATCH v2 3/4] iommu/iova: Extend rbtree node caching
Hi Robin, On 7/31/2017 7:42 AM, Robin Murphy wrote: Hi Nate, On 29/07/17 04:57, Nate Watterson wrote: Hi Robin, I am seeing a crash when performing very basic testing on this series with a Mellanox CX4 NIC. I dug into the crash a bit, and think this patch is the culprit, but this rcache business is still mostly witchcraft to me. # ifconfig eth5 up # ifconfig eth5 down Unable to handle kernel NULL pointer dereference at virtual address 0020 user pgtable: 64k pages, 48-bit VAs, pgd = 8007dbf47c00 [0020] *pgd=0006efab0003, *pud=0006efab0003, *pmd=0007d8720003, *pte= Internal error: Oops: 9607 [#1] SMP Modules linked in: CPU: 47 PID: 5082 Comm: ifconfig Not tainted 4.13.0-rtp-enablement+ #3 task: 8007da1e5780 task.stack: 8007ddcb8000 PC is at __cached_rbnode_delete_update+0x2c/0x58 LR is at private_free_iova+0x2c/0x60 pc : [] lr : [] pstate: 204001c5 sp : 8007ddcbba00 x29: 8007ddcbba00 x28: 8007c8350210 x27: 8007d1a8 x26: 8007dcc20800 x25: 0140 x24: 8007c98f0008 x23: fe4e x22: 0140 x21: 8007c98f0008 x20: 8007c9adb240 x19: 8007c98f0018 x18: 0010 x17: x16: x15: 4000 x14: x13: x12: 0001 x11: dead0200 x10: x9 : x8 : 8007c9adb1c0 x7 : 40002000 x6 : 00210d00 x5 : x4 : c57e x3 : ffcf x2 : ffcf x1 : 8007c9adb240 x0 : [...] [] __cached_rbnode_delete_update+0x2c/0x58 [] private_free_iova+0x2c/0x60 [] iova_magazine_free_pfns+0x4c/0xa0 [] free_iova_fast+0x1b0/0x230 [] iommu_dma_free_iova+0x5c/0x80 [] __iommu_dma_unmap+0x5c/0x98 [] iommu_dma_unmap_resource+0x24/0x30 [] iommu_dma_unmap_page+0xc/0x18 [] __iommu_unmap_page+0x40/0x60 [] mlx5e_page_release+0xbc/0x128 [] mlx5e_dealloc_rx_wqe+0x30/0x40 [] mlx5e_close_channel+0x70/0x1f8 [] mlx5e_close_channels+0x2c/0x50 [] mlx5e_close_locked+0x54/0x68 [] mlx5e_close+0x30/0x58 [...] 
** Disassembly for __cached_rbnode_delete_update() near the fault ** 92|if (free->pfn_hi < iovad->dma_32bit_pfn) 0852C6C4|ldr x3,[x1,#0x18]; x3,[free,#24] 0852C6C8|ldr x2,[x0,#0x30]; x2,[iovad,#48] 0852C6CC|cmp x3,x2 0852C6D0|b.cs0x0852C708 |curr = &iovad->cached32_node; 94|if (!curr) 0852C6D4|addsx19,x0,#0x18 ; x19,iovad,#24 0852C6D8|b.eq0x0852C708 | |cached_iova = rb_entry(*curr, struct iova, node); | 99|if (free->pfn_lo >= cached_iova->pfn_lo) 0852C6DC|ldr x0,[x19] ; xiovad,[curr] 0852C6E0|ldr x2,[x1,#0x20]; x2,[free,#32] 0852C6E4|ldr x0,[x0,#0x20]; x0,[x0,#32] Apparently cached_iova was NULL so the pfn_lo access faulted. 0852C6E8|cmp x2,x0 0852C6EC|b.cc0x0852C6FC 0852C6F0|mov x0,x1; x0,free 100|*curr = rb_next(&free->node); After instrumenting the code a bit, this seems to be the culprit. In the previous call, free->pfn_lo was 0x_ which is actually the dma_limit for the domain so rb_next() returns NULL. Let me know if you have any questions or would like additional tests run. I also applied your "DMA domain debug info" patches and dumped the contents of the domain at each of the steps above in case that would be useful. If nothing else, they reinforce how thirsty the CX4 NIC is especially when using 64k pages and many CPUs. Thanks for the report - I somehow managed to reason myself out of keeping the "no cached node" check in __cached_rbnode_delete_update() on the assumption that it must always be set by a previous allocation. However, there is indeed just one case case for which that fails: when you free any IOVA immediately after freeing the very topmost one. Which is something that freeing an entire magazine's worth of IOVAs back to the tree all at once has a very real chance of doing... The obvious straightforward fix is inline below, but I'm now starting to understand the appeal of reserving a sentinel node to ensure the tree can never be empty, so I might have a quick go at that to see if it results in nicer code overall. 
After applying your fix, the crash no longer occurs, but the performance of this use case is only marginally less awful
Re: [PATCH v2 3/4] iommu/iova: Extend rbtree node caching
Hi Robin, I am seeing a crash when performing very basic testing on this series with a Mellanox CX4 NIC. I dug into the crash a bit, and think this patch is the culprit, but this rcache business is still mostly witchcraft to me. # ifconfig eth5 up # ifconfig eth5 down Unable to handle kernel NULL pointer dereference at virtual address 0020 user pgtable: 64k pages, 48-bit VAs, pgd = 8007dbf47c00 [0020] *pgd=0006efab0003, *pud=0006efab0003, *pmd=0007d8720003, *pte= Internal error: Oops: 9607 [#1] SMP Modules linked in: CPU: 47 PID: 5082 Comm: ifconfig Not tainted 4.13.0-rtp-enablement+ #3 task: 8007da1e5780 task.stack: 8007ddcb8000 PC is at __cached_rbnode_delete_update+0x2c/0x58 LR is at private_free_iova+0x2c/0x60 pc : [] lr : [] pstate: 204001c5 sp : 8007ddcbba00 x29: 8007ddcbba00 x28: 8007c8350210 x27: 8007d1a8 x26: 8007dcc20800 x25: 0140 x24: 8007c98f0008 x23: fe4e x22: 0140 x21: 8007c98f0008 x20: 8007c9adb240 x19: 8007c98f0018 x18: 0010 x17: x16: x15: 4000 x14: x13: x12: 0001 x11: dead0200 x10: x9 : x8 : 8007c9adb1c0 x7 : 40002000 x6 : 00210d00 x5 : x4 : c57e x3 : ffcf x2 : ffcf x1 : 8007c9adb240 x0 : [...] [] __cached_rbnode_delete_update+0x2c/0x58 [] private_free_iova+0x2c/0x60 [] iova_magazine_free_pfns+0x4c/0xa0 [] free_iova_fast+0x1b0/0x230 [] iommu_dma_free_iova+0x5c/0x80 [] __iommu_dma_unmap+0x5c/0x98 [] iommu_dma_unmap_resource+0x24/0x30 [] iommu_dma_unmap_page+0xc/0x18 [] __iommu_unmap_page+0x40/0x60 [] mlx5e_page_release+0xbc/0x128 [] mlx5e_dealloc_rx_wqe+0x30/0x40 [] mlx5e_close_channel+0x70/0x1f8 [] mlx5e_close_channels+0x2c/0x50 [] mlx5e_close_locked+0x54/0x68 [] mlx5e_close+0x30/0x58 [...] 
** Disassembly for __cached_rbnode_delete_update() near the fault **

 92|		if (free->pfn_hi < iovad->dma_32bit_pfn)
0852C6C4|	ldr  x3,[x1,#0x18]	; x3,[free,#24]
0852C6C8|	ldr  x2,[x0,#0x30]	; x2,[iovad,#48]
0852C6CC|	cmp  x3,x2
0852C6D0|	b.cs 0x0852C708
   |			curr = &iovad->cached32_node;
 94|		if (!curr)
0852C6D4|	adds x19,x0,#0x18	; x19,iovad,#24
0852C6D8|	b.eq 0x0852C708
   |
   |		cached_iova = rb_entry(*curr, struct iova, node);
   |
 99|		if (free->pfn_lo >= cached_iova->pfn_lo)
0852C6DC|	ldr  x0,[x19]		; x0,[curr]
0852C6E0|	ldr  x2,[x1,#0x20]	; x2,[free,#32]
0852C6E4|	ldr  x0,[x0,#0x20]	; x0,[x0,#32]
Apparently cached_iova was NULL, so the pfn_lo access faulted.
0852C6E8|	cmp  x2,x0
0852C6EC|	b.cc 0x0852C6FC
0852C6F0|	mov  x0,x1		; x0,free
100|			*curr = rb_next(&free->node);

After instrumenting the code a bit, this seems to be the culprit. In the previous call, free->pfn_lo was 0x_ which is actually the dma_limit for the domain, so rb_next() returns NULL.

Let me know if you have any questions or would like additional tests run. I also applied your "DMA domain debug info" patches and dumped the contents of the domain at each of the steps above in case that would be useful. If nothing else, they reinforce how thirsty the CX4 NIC is, especially when using 64k pages and many CPUs.

-Nate

On 7/21/2017 7:42 AM, Robin Murphy wrote:

The cached node mechanism provides a significant performance benefit for allocations using a 32-bit DMA mask, but in the case of non-PCI devices or where the 32-bit space is full, the loss of this benefit can be significant - on large systems there can be many thousands of entries in the tree, such that traversing to the end then walking all the way down to find free space every time becomes increasingly awful. Maintain a similar cached node for the whole IOVA space as a superset of the 32-bit space so that performance can remain much more consistent.

Inspired by work by Zhen Lei.
Tested-by: Ard Biesheuvel Tested-by: Zhen Lei Signed-off-by: Robin Murphy --- v2: No change drivers/iommu/iova.c | 59 +--- include/linux/iova.h | 3 ++- 2 files changed, 30 insertions(+), 32 deletions(-) diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c index d094d1ca8f23..f5809a2ee6c2 100644 --- a/drivers/iommu/iova.c +++ b/drivers/iommu/iova.c @@ -46,6
Re: [PATCH 1/5] iommu/arm-smmu-v3: put off the execution of TLBI* to reduce lock confliction
Hi Jonathan, [...] Hi All,

I'm a bit of a late entry to this discussion. I've just been running some more detailed tests on our d05 boards and wanted to bring some more numbers to the discussion. All tests are against 4.12 with the following additions:

* Robin's series removing the io-pgtable spinlock (and a few recent fixes)
* Cherry-picked updates to the sas driver, merged prior to 4.13-rc1
* An additional HNS (network card) bug fix that will be upstreamed shortly.

I've broken the results down into this patch, and this patch + the remainder of the set. As leizhen mentioned, we got a nice little performance bump from Robin's series, so that was applied first (as it's in mainline now).

SAS tests were fio with the noop scheduler, 4k block size, various io depths, and 1 process per disk. Note this is probably a different setup to leizhen's original numbers. Percentages are relative to the performance seen with the smmu disabled.

SAS, 4.12 - none of this series, SMMU disabled:
  read io-depth 32   - 384K IOPS (100%)
  read io-depth 2048 - 950K IOPS (100%)
  rw   io-depth 32   - 166K IOPS (100%)
  rw   io-depth 2048 - 340K IOPS (100%)

SMMU enabled:
  read io-depth 32   - 201K IOPS (52%)
  read io-depth 2048 - 306K IOPS (32%)
  rw   io-depth 32   -  99K IOPS (60%)
  rw   io-depth 2048 - 150K IOPS (44%)

Robin's recent series with fixes as seen on list (now merged), SMMU enabled:
  read io-depth 32   - 208K IOPS (54%)
  read io-depth 2048 - 335K IOPS (35%)
  rw   io-depth 32   - 105K IOPS (63%)
  rw   io-depth 2048 - 165K IOPS (49%)

4.12 + Robin's series + just this patch, SMMU enabled
(iommu/arm-smmu-v3: put off the execution of TLBI* to reduce lock confliction)
  read io-depth 32   - 225K IOPS (59%)
  read io-depth 2048 - 365K IOPS (38%)
  rw   io-depth 32   - 110K IOPS (66%)
  rw   io-depth 2048 - 179K IOPS (53%)

4.12 + Robin's series + second part of this series
(iommu/arm-smmu-v3: put off the execution of TLBI* to reduce lock confliction)
(iommu: add a new member unmap_tlb_sync into struct iommu_ops)
(iommu/arm-smmu-v3: add support for unmap an iova range with only one tlb sync)
(iommu/arm-smmu: add support for unmap of a memory range with only one tlb sync)
  read io-depth 32   - 225K IOPS (59%)
  read io-depth 2048 - 833K IOPS (88%)
  rw   io-depth 32   - 112K IOPS (67%)
  rw   io-depth 2048 - 220K IOPS (65%)

Robin's series gave us small gains across the board (3-5% recovered) relative to the no-smmu performance (which we are taking as the ideal case). This first patch gets us back another 2-5% of the no-smmu performance. The next few patches gain us very little at the small io-depths but make a large difference at the larger io-depths - in particular the read IOPS, which is over twice as fast as without the series.

For HNS it seems that we are less dependent on the SMMU performance and can reach the non-SMMU speed.

Tests with iperf -t 30 -i 10 -c IPADDRESS -P 3, with the last 10 seconds taken to avoid any initial variability. The server end of the link was always running with smmu v3 disabled so as to act as a fast sink of the data. Some variation was seen across repeat runs.
Mainline v4.12 + network card fix:
  NO SMMU: 9.42 GBits/sec
  SMMU:    4.36 GBits/sec (46%)

Robin's io-pgtable spinlock series:
  6.68 to 7.34 GBits/sec (71% - 78%, variation across runs)

Just this patch, SMMU enabled
(iommu/arm-smmu-v3: put off the execution of TLBI* to reduce lock confliction)
  7.96 - 8.8 GBits/sec (85% - 94%, some variation across runs)

Full series
(iommu/arm-smmu-v3: put off the execution of TLBI* to reduce lock confliction)
(iommu: add a new member unmap_tlb_sync into struct iommu_ops)
(iommu/arm-smmu-v3: add support for unmap an iova range with only one tlb sync)
(iommu/arm-smmu: add support for unmap of a memory range with only one tlb sync)
  9.42 GBits/sec (100%)

So the HNS test shows a greater boost from Robin's series and this first patch. This is most likely because the HNS test is not putting as high a load on the SMMU and associated code as the SAS test. In both cases, however, this shows that both parts of this patch series are beneficial.

So on to the questions ;) Will, you mentioned that along with Robin and Nate you were working on a somewhat related strategy to improve the performance. Any ETA on that?

The strategy I was working on is basically equivalent to the second part of the series. I will test your patches out sometime this week, and I'll also try to have our performance team run it through their whole suite.

Thanks, that's excellent. Look forward to hearing how it goes.

I tested the patches with 4 NVME drives connected to a single SMMU and the results seem to be in line with those you've reported.

FIO - 512k blocksize / io-depth 32 / 1 thread per drive
  Baseline 4.13-rc1 w/SMMU enabled: 25% of SMMU bypass performance
  Baseline + Patch 1              : 28%
  Baseline + Patches 2-5          : 86%
  Baseline + Complete series      : 100% [!!]

I saw performance improvements across all of the other FIO profiles I tested, although not always as substantial as was seen in th
Re: [PATCH 1/5] iommu/arm-smmu-v3: put off the execution of TLBI* to reduce lock confliction
Hi Jonathan, On 7/17/2017 10:23 AM, Jonathan Cameron wrote: On Mon, 17 Jul 2017 14:06:42 +0100 John Garry wrote: + On 29/06/2017 03:08, Leizhen (ThunderTown) wrote: On 2017/6/28 17:32, Will Deacon wrote: Hi Zhen Lei, Nate (CC'd), Robin and I have been working on something very similar to this series, but this patch is different to what we had planned. More below. On Mon, Jun 26, 2017 at 09:38:46PM +0800, Zhen Lei wrote: Because all TLBI commands should be followed by a SYNC command, to make sure that it has been completely finished. So we can just add the TLBI commands into the queue, and put off the execution until meet SYNC or other commands. To prevent the followed SYNC command waiting for a long time because of too many commands have been delayed, restrict the max delayed number. According to my test, I got the same performance data as I replaced writel with writel_relaxed in queue_inc_prod. Signed-off-by: Zhen Lei --- drivers/iommu/arm-smmu-v3.c | 42 +- 1 file changed, 37 insertions(+), 5 deletions(-) diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index 291da5f..4481123 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -337,6 +337,7 @@ /* Command queue */ #define CMDQ_ENT_DWORDS 2 #define CMDQ_MAX_SZ_SHIFT 8 +#define CMDQ_MAX_DELAYED 32 #define CMDQ_ERR_SHIFT24 #define CMDQ_ERR_MASK 0x7f @@ -472,6 +473,7 @@ struct arm_smmu_cmdq_ent { }; } cfgi; + #define CMDQ_OP_TLBI_NH_ALL 0x10 #define CMDQ_OP_TLBI_NH_ASID0x11 #define CMDQ_OP_TLBI_NH_VA 0x12 #define CMDQ_OP_TLBI_EL2_ALL0x20 @@ -499,6 +501,7 @@ struct arm_smmu_cmdq_ent { struct arm_smmu_queue { int irq; /* Wired interrupt */ + u32 nr_delay; __le64 *base; dma_addr_t base_dma; @@ -722,11 +725,16 @@ static int queue_sync_prod(struct arm_smmu_queue *q) return ret; } -static void queue_inc_prod(struct arm_smmu_queue *q) +static void queue_inc_swprod(struct arm_smmu_queue *q) { - u32 prod = (Q_WRP(q, q->prod) | Q_IDX(q, q->prod)) + 1; + u32 prod = q->prod + 1; 
q->prod = Q_OVF(q, q->prod) | Q_WRP(q, prod) | Q_IDX(q, prod); +} + +static void queue_inc_prod(struct arm_smmu_queue *q) +{ + queue_inc_swprod(q); writel(q->prod, q->prod_reg); } @@ -761,13 +769,24 @@ static void queue_write(__le64 *dst, u64 *src, size_t n_dwords) *dst++ = cpu_to_le64(*src++); } -static int queue_insert_raw(struct arm_smmu_queue *q, u64 *ent) +static int queue_insert_raw(struct arm_smmu_queue *q, u64 *ent, int optimize) { if (queue_full(q)) return -ENOSPC; queue_write(Q_ENT(q, q->prod), ent, q->ent_dwords); - queue_inc_prod(q); + + /* +* We don't want too many commands to be delayed, this may lead the +* followed sync command to wait for a long time. +*/ + if (optimize && (++q->nr_delay < CMDQ_MAX_DELAYED)) { + queue_inc_swprod(q); + } else { + queue_inc_prod(q); + q->nr_delay = 0; + } + So here, you're effectively putting invalidation commands into the command queue without updating PROD. Do you actually see a performance advantage from doing so? Another side of the argument would be that we should be Yes, my sas ssd performance test showed that it can improve about 100-150K/s(the same to I directly replace writel with writel_relaxed). And the average execution time of iommu_unmap(which called by iommu_dma_unmap_sg) dropped from 10us to 5us. moving PROD as soon as we can, so that the SMMU can process invalidation commands in the background and reduce the cost of the final SYNC operation when the high-level unmap operation is complete. There maybe that __iowmb() is more expensive than wait for tlbi complete. Except the time of __iowmb() itself, it also protected by spinlock, lock confliction will rise rapidly in the stress scene. __iowmb() average cost 300-500ns(Sorry, I forget the exact value). In addition, after applied this patcheset and Robin's v2, and my earlier dma64 iova optimization patchset. Our net performance test got the same data to global bypass. But sas ssd still have more than 20% dropped. 
Maybe we should still focus on map/unmap, because the average execution time of iova alloc/free is only about 400ns. By the way, patches 2-5 are more effective than this one; they can improve more than 350K/s. And with them, we got about a 100-150K/s improvement from Robin's v2. Otherwise, I saw no effect from Robin's v2. Sorry, I have not tested how this patch performs without patches 2-5. Furthermore, I got the sam
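To make the batching strategy in this patch concrete, here is a toy user-space model of it (illustrative only, with hypothetical field names; not the driver code). TLBI commands are written into the queue, but the PROD doorbell visible to the hardware is only published when a non-TLBI command (such as a SYNC) arrives or CMDQ_MAX_DELAYED commands have accumulated:

```c
#include <assert.h>
#include <stdbool.h>

#define CMDQ_MAX_DELAYED 32

struct cmdq {
	unsigned prod;      /* software producer index (queue_inc_swprod) */
	unsigned hw_prod;   /* last value published to the "hardware" */
	unsigned nr_delay;  /* commands queued but not yet published */
	unsigned publishes; /* count of (expensive) doorbell writes */
};

/* Stands in for writel(q->prod, q->prod_reg) plus the barrier it implies. */
static void publish(struct cmdq *q)
{
	q->hw_prod = q->prod;
	q->publishes++;
	q->nr_delay = 0;
}

static void insert_cmd(struct cmdq *q, bool is_tlbi)
{
	q->prod++;	/* queue_write() + queue_inc_swprod() */

	/*
	 * Delay the doorbell for TLBI commands, but cap the backlog so a
	 * following SYNC never has to wait for too many delayed commands.
	 */
	if (is_tlbi && ++q->nr_delay < CMDQ_MAX_DELAYED)
		return;
	publish(q);
}
```

Batching a run of TLBIs behind one doorbell write is exactly why the expensive __iowmb() (and the spinlock held around it) is paid once per batch rather than once per invalidation.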
Re: [PATCH v2] iommu/arm-smmu-v3: Implement shutdown method
I should have removed the '-v3' since this revision of the patch adds shutdown to arm-smmu.c as well. I'll fix that in a subsequent version after waiting to see if there are additional changes that need to be made. On 6/29/2017 6:18 PM, Nate Watterson wrote: The shutdown method disables the SMMU to avoid corrupting a new kernel started with kexec. Signed-off-by: Nate Watterson --- drivers/iommu/arm-smmu-v3.c | 7 +++ drivers/iommu/arm-smmu.c| 6 ++ 2 files changed, 13 insertions(+) diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index 380969a..3d8ac29 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -2765,9 +2765,15 @@ static int arm_smmu_device_remove(struct platform_device *pdev) struct arm_smmu_device *smmu = platform_get_drvdata(pdev); arm_smmu_device_disable(smmu); + return 0; } +static void arm_smmu_device_shutdown(struct platform_device *pdev) +{ + arm_smmu_device_remove(pdev); +} + static struct of_device_id arm_smmu_of_match[] = { { .compatible = "arm,smmu-v3", }, { }, @@ -2781,6 +2787,7 @@ static int arm_smmu_device_remove(struct platform_device *pdev) }, .probe = arm_smmu_device_probe, .remove = arm_smmu_device_remove, + .shutdown = arm_smmu_device_shutdown, }; module_platform_driver(arm_smmu_driver); diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 7ec30b0..af50bab 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -2319,6 +2319,11 @@ static int arm_smmu_device_remove(struct platform_device *pdev) return 0; } +static void arm_smmu_device_shutdown(struct platform_device *pdev) +{ + arm_smmu_device_remove(pdev); +} + static struct platform_driver arm_smmu_driver = { .driver = { .name = "arm-smmu", @@ -2326,6 +2331,7 @@ static int arm_smmu_device_remove(struct platform_device *pdev) }, .probe = arm_smmu_device_probe, .remove = arm_smmu_device_remove, + .shutdown = arm_smmu_device_shutdown, }; module_platform_driver(arm_smmu_driver); -- Qualcomm Datacenter 
Technologies as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2] iommu/arm-smmu-v3: Implement shutdown method
The shutdown method disables the SMMU to avoid corrupting a new kernel started with kexec. Signed-off-by: Nate Watterson --- drivers/iommu/arm-smmu-v3.c | 7 +++ drivers/iommu/arm-smmu.c| 6 ++ 2 files changed, 13 insertions(+) diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index 380969a..3d8ac29 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -2765,9 +2765,15 @@ static int arm_smmu_device_remove(struct platform_device *pdev) struct arm_smmu_device *smmu = platform_get_drvdata(pdev); arm_smmu_device_disable(smmu); + return 0; } +static void arm_smmu_device_shutdown(struct platform_device *pdev) +{ + arm_smmu_device_remove(pdev); +} + static struct of_device_id arm_smmu_of_match[] = { { .compatible = "arm,smmu-v3", }, { }, @@ -2781,6 +2787,7 @@ static int arm_smmu_device_remove(struct platform_device *pdev) }, .probe = arm_smmu_device_probe, .remove = arm_smmu_device_remove, + .shutdown = arm_smmu_device_shutdown, }; module_platform_driver(arm_smmu_driver); diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 7ec30b0..af50bab 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -2319,6 +2319,11 @@ static int arm_smmu_device_remove(struct platform_device *pdev) return 0; } +static void arm_smmu_device_shutdown(struct platform_device *pdev) +{ + arm_smmu_device_remove(pdev); +} + static struct platform_driver arm_smmu_driver = { .driver = { .name = "arm-smmu", @@ -2326,6 +2331,7 @@ static int arm_smmu_device_remove(struct platform_device *pdev) }, .probe = arm_smmu_device_probe, .remove = arm_smmu_device_remove, + .shutdown = arm_smmu_device_shutdown, }; module_platform_driver(arm_smmu_driver); -- Qualcomm Datacenter Technologies, Inc. on behalf of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project. 
Re: [PATCH] iommu/arm-smmu-v3: Implement shutdown method
On 6/29/2017 2:34 PM, Will Deacon wrote: On Thu, Jun 29, 2017 at 01:40:15PM -0400, Nate Watterson wrote: The shutdown method disables the SMMU and its interrupts to avoid potentially corrupting a new kernel started with kexec. Signed-off-by: Nate Watterson --- drivers/iommu/arm-smmu-v3.c | 11 +++ 1 file changed, 11 insertions(+) We should update arm-smmu.c as well. diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index 380969a..907d576 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -2765,9 +2765,19 @@ static int arm_smmu_device_remove(struct platform_device *pdev) struct arm_smmu_device *smmu = platform_get_drvdata(pdev); arm_smmu_device_disable(smmu); + + /* Disable IRQs */ + arm_smmu_write_reg_sync(smmu, 0, ARM_SMMU_IRQ_CTRL, + ARM_SMMU_IRQ_CTRLACK); + Can you justify the need for this? If we actually need to disable interrupts, then I'd like to understand why so that we can make sure we get the ordering right with respect to disabling the device. Also, do we need to clear the MSI registers too? I have no justification. Based on the number of drivers that take care to prevent their HW from generating an interrupt, I thought it would be required, but I can't find any such requirement explicitly laid out in the documentation. When you mention the MSI registers do you mean, for instance, SMMU_GERROR_IRQ_CFG0? It looks like those are always cleared while initializing the SMMU so the case where an SMMU transitions from using MSIs to using wired interrupts between kernels will be handled properly. My understanding is that kexec will mask irqs at the GIC, so there's not actually an issue here. Will -- Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH] iommu/arm-smmu-v3: Implement shutdown method
The shutdown method disables the SMMU and its interrupts to avoid potentially corrupting a new kernel started with kexec.

Signed-off-by: Nate Watterson
---
 drivers/iommu/arm-smmu-v3.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 380969a..907d576 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -2765,9 +2765,19 @@ static int arm_smmu_device_remove(struct platform_device *pdev)
 	struct arm_smmu_device *smmu = platform_get_drvdata(pdev);
 
 	arm_smmu_device_disable(smmu);
+
+	/* Disable IRQs */
+	arm_smmu_write_reg_sync(smmu, 0, ARM_SMMU_IRQ_CTRL,
+				ARM_SMMU_IRQ_CTRLACK);
+
 	return 0;
 }
 
+static void arm_smmu_device_shutdown(struct platform_device *pdev)
+{
+	arm_smmu_device_remove(pdev);
+}
+
 static struct of_device_id arm_smmu_of_match[] = {
 	{ .compatible = "arm,smmu-v3", },
 	{ },
@@ -2781,6 +2791,7 @@ static int arm_smmu_device_remove(struct platform_device *pdev)
 	},
 	.probe	= arm_smmu_device_probe,
 	.remove	= arm_smmu_device_remove,
+	.shutdown = arm_smmu_device_shutdown,
 };
 module_platform_driver(arm_smmu_driver);
--
Qualcomm Datacenter Technologies, Inc. on behalf of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
Re: [PATCH 0/8] io-pgtable lock removal
Hi Robin, On 6/8/2017 7:51 AM, Robin Murphy wrote: Hi all, Here's the cleaned up nominally-final version of the patches everybody's keen to see. #1 is just a non-critical thing-I-spotted-in-passing fix, #2-#4 do some preparatory work (and bid farewell to everyone's least favourite bit of code, hooray!), and #5-#8 do the dirty deed itself. The branch I've previously shared has been updated too: git://linux-arm.org/linux-rm iommu/pgtable All feedback welcome, as I'd really like to land this for 4.13. I tested the series on a QDF2400 development platform and see notable performance improvements particularly in workloads that make concurrent accesses to a single iommu_domain. Robin. Robin Murphy (8): iommu/io-pgtable-arm-v7s: Check table PTEs more precisely iommu/io-pgtable-arm: Improve split_blk_unmap iommu/io-pgtable-arm-v7s: Refactor split_blk_unmap iommu/io-pgtable: Introduce explicit coherency iommu/io-pgtable-arm: Support lockless operation iommu/io-pgtable-arm-v7s: Support lockless operation iommu/arm-smmu: Remove io-pgtable spinlock iommu/arm-smmu-v3: Remove io-pgtable spinlock drivers/iommu/arm-smmu-v3.c| 36 ++- drivers/iommu/arm-smmu.c | 48 -- drivers/iommu/io-pgtable-arm-v7s.c | 173 + drivers/iommu/io-pgtable-arm.c | 190 - drivers/iommu/io-pgtable.h | 6 ++ 5 files changed, 268 insertions(+), 185 deletions(-) -- Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 0/7] Add PCI ATS support to SMMUv3
Hi Jean-Philippe, On 5/24/2017 2:01 PM, Jean-Philippe Brucker wrote: PCIe devices can implement their own TLB, named Address Translation Cache (ATC). In order to support Address Translation Service (ATS), the following changes are needed in software: * Enable ATS on endpoints when the system supports it. Both PCI root complex and associated SMMU must implement the ATS protocol. * When unmapping an IOVA, send an ATC invalidate request to the endpoint in addition to the usual SMMU IOTLB invalidations. I previously sent this as part of a lengthy RFC [1] adding SVM (ATS + PASID + PRI) support to SMMUv3. The next PASID/PRI version is almost ready, but isn't likely to get merged because it needs hardware testing, so I will send it later. PRI depends on ATS, but ATS should be useful on its own. Without PASID and PRI, ATS is used for accelerating transactions. Instead of having all memory accesses go through SMMU translation, the endpoint can translate IOVA->PA once, store the result in its ATC, then issue subsequent transactions using the PA, partially bypassing the SMMU. So in theory it should be faster while keeping the advantages of an IOMMU, namely scatter-gather and access control. The ATS patches can now be tested on some hardware, even though the lack of compatible PCI endpoints makes it difficult to assess what performance optimizations we need. That's why the ATS implementation is a bit rough at the moment, and we will work on optimizing things like invalidation ranges later. Sinan and I have tested this series on a QDF2400 development platform using a PCIe exerciser card as the ATS capable endpoint. We were able to verify that ATS requests complete with a valid translated address and that DMA transactions using the pre-translated address "bypass" the SMMU. Testing ATC invalidations was a bit more difficult as we could not figure out how to get the exerciser card to automatically send the completion message. 
We ended up having to write a debugger script that would monitor the CMDQ and tell the exerciser to send the completion when a hanging CMD_SYNC following a CMD_ATC_INV was detected. Hopefully we'll get some real ATS capable endpoints to test with soon. Since the RFC [1]: * added DT and ACPI patches, * added invalidate-all on domain detach, * removed smmu_group again, * removed invalidation print from the fast path, * disabled tagged pointers for good, * some style changes. These patches are based on Linux v4.12-rc2 [1] https://www.spinics.net/lists/linux-pci/msg58650.html Jean-Philippe Brucker (7): PCI: Move ATS declarations outside of CONFIG_PCI dt-bindings: PCI: Describe ATS property for root complex nodes iommu/of: Check ATS capability in root complex nodes ACPI/IORT: Check ATS capability in root complex nodes iommu/arm-smmu-v3: Link domains and devices iommu/arm-smmu-v3: Add support for PCI ATS iommu/arm-smmu-v3: Disable tagged pointers .../devicetree/bindings/pci/pci-iommu.txt | 8 + drivers/acpi/arm64/iort.c | 10 + drivers/iommu/arm-smmu-v3.c| 258 - drivers/iommu/of_iommu.c | 8 + include/linux/iommu.h | 4 + include/linux/pci.h| 26 +-- 6 files changed, 293 insertions(+), 21 deletions(-) -- Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
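As a purely conceptual illustration of why an ATC helps (a toy model with made-up slot counts and a fake linear mapping, not the SMMU driver's code): the endpoint caches an IOVA->PA translation on the first access, reuses it on subsequent accesses without involving the SMMU, and must drop it when the driver sends a CMD_ATC_INV before reusing the IOVA.

```c
#include <assert.h>
#include <stdint.h>

#define ATC_SLOTS  8
#define PAGE_SHIFT 12

/* Direct-mapped toy Address Translation Cache, one entry per slot. */
struct atc_entry { uint64_t iova_pfn; uint64_t pa_pfn; int valid; };
static struct atc_entry atc[ATC_SLOTS];
static unsigned smmu_walks; /* how many times we had to ask the SMMU */

/* Stand-in for a full SMMU table walk (arbitrary linear mapping). */
static uint64_t smmu_translate(uint64_t iova_pfn)
{
	smmu_walks++;
	return iova_pfn + 0x100000;
}

/* Endpoint-side translation: hit the ATC if possible, walk otherwise. */
static uint64_t dma_translate(uint64_t iova)
{
	uint64_t pfn = iova >> PAGE_SHIFT;
	struct atc_entry *e = &atc[pfn % ATC_SLOTS];

	if (!(e->valid && e->iova_pfn == pfn)) {	/* ATC miss */
		e->iova_pfn = pfn;
		e->pa_pfn = smmu_translate(pfn);
		e->valid = 1;
	}
	return (e->pa_pfn << PAGE_SHIFT) | (iova & ((1u << PAGE_SHIFT) - 1));
}

/* What a CMD_ATC_INV must achieve before an IOVA may be reused. */
static void atc_invalidate(uint64_t iova_pfn)
{
	struct atc_entry *e = &atc[iova_pfn % ATC_SLOTS];
	if (e->valid && e->iova_pfn == iova_pfn)
		e->valid = 0;
}
```

This is also why the unmap path needs the extra ATC invalidate on top of the usual IOTLB invalidations: a stale ATC entry would let the endpoint keep issuing pre-translated PAs for a freed IOVA.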
Re: [PATCH V11 08/11] drivers: acpi: Handle IOMMU lookup failure with deferred probing or error
Hi Lorenzo, On 5/23/2017 5:26 AM, Lorenzo Pieralisi wrote: On Tue, May 23, 2017 at 02:31:17PM +0530, Sricharan R wrote: Hi Lorenzo, On 5/23/2017 2:22 PM, Lorenzo Pieralisi wrote: On Tue, May 23, 2017 at 02:26:10AM -0400, Nate Watterson wrote: Hi Sricharan, On 4/10/2017 7:21 AM, Sricharan R wrote: This is an equivalent to the DT's handling of the iommu master's probe with deferred probing when the corrsponding iommu is not probed yet. The lack of a registered IOMMU can be caused by the lack of a driver for the IOMMU, the IOMMU device probe not having been performed yet, having been deferred, or having failed. The first case occurs when the firmware describes the bus master and IOMMU topology correctly but no device driver exists for the IOMMU yet or the device driver has not been compiled in. Return NULL, the caller will configure the device without an IOMMU. The second and third cases are handled by deferring the probe of the bus master device which will eventually get reprobed after the IOMMU. The last case is currently handled by deferring the probe of the bus master device as well. A mechanism to either configure the bus master device without an IOMMU or to fail the bus master device probe depending on whether the IOMMU is optional or mandatory would be a good enhancement. 
Tested-by: Hanjun Guo Reviewed-by: Robin Murphy [Lorenzo: Added fixes for dma_coherent_mask overflow, acpi_dma_configure called multiple times for same device] Signed-off-by: Lorenzo Pieralisi Signed-off-by: Sricharan R --- drivers/acpi/arm64/iort.c | 33 - drivers/acpi/scan.c| 11 --- drivers/base/dma-mapping.c | 2 +- include/acpi/acpi_bus.h| 2 +- include/linux/acpi.h | 7 +-- 5 files changed, 47 insertions(+), 8 deletions(-) diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c index 3dd9ec3..e323ece 100644 --- a/drivers/acpi/arm64/iort.c +++ b/drivers/acpi/arm64/iort.c @@ -543,6 +543,14 @@ static const struct iommu_ops *iort_iommu_xlate(struct device *dev, const struct iommu_ops *ops = NULL; int ret = -ENODEV; struct fwnode_handle *iort_fwnode; + struct iommu_fwspec *fwspec = dev->iommu_fwspec; + + /* +* If we already translated the fwspec there +* is nothing left to do, return the iommu_ops. +*/ + if (fwspec && fwspec->ops) + return fwspec->ops; Is this logic strictly required? It breaks masters with multiple SIDs as only the first SID is actually added to the master's fwspec. My bad, that's indeed a silly bug I introduced. Please let me know if the patch below fixes it, we will send it upstream shortly. oops, i think emails crossed. Please let me know if you are ok to add this to the other fixes. No worries, yes I am ok thanks but please give Nate some time to report back to make sure the diff I sent actually fixes the problem. The patch you sent fixes the problem. Thanks for the quick turnaround. Apologies for the breakage. Lorenzo Regards, Sricharan -- Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH V11 08/11] drivers: acpi: Handle IOMMU lookup failure with deferred probing or error
Hi Sricharan, On 4/10/2017 7:21 AM, Sricharan R wrote: This is an equivalent to the DT's handling of the iommu master's probe with deferred probing when the corrsponding iommu is not probed yet. The lack of a registered IOMMU can be caused by the lack of a driver for the IOMMU, the IOMMU device probe not having been performed yet, having been deferred, or having failed. The first case occurs when the firmware describes the bus master and IOMMU topology correctly but no device driver exists for the IOMMU yet or the device driver has not been compiled in. Return NULL, the caller will configure the device without an IOMMU. The second and third cases are handled by deferring the probe of the bus master device which will eventually get reprobed after the IOMMU. The last case is currently handled by deferring the probe of the bus master device as well. A mechanism to either configure the bus master device without an IOMMU or to fail the bus master device probe depending on whether the IOMMU is optional or mandatory would be a good enhancement. Tested-by: Hanjun Guo Reviewed-by: Robin Murphy [Lorenzo: Added fixes for dma_coherent_mask overflow, acpi_dma_configure called multiple times for same device] Signed-off-by: Lorenzo Pieralisi Signed-off-by: Sricharan R --- drivers/acpi/arm64/iort.c | 33 - drivers/acpi/scan.c| 11 --- drivers/base/dma-mapping.c | 2 +- include/acpi/acpi_bus.h| 2 +- include/linux/acpi.h | 7 +-- 5 files changed, 47 insertions(+), 8 deletions(-) diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c index 3dd9ec3..e323ece 100644 --- a/drivers/acpi/arm64/iort.c +++ b/drivers/acpi/arm64/iort.c @@ -543,6 +543,14 @@ static const struct iommu_ops *iort_iommu_xlate(struct device *dev, const struct iommu_ops *ops = NULL; int ret = -ENODEV; struct fwnode_handle *iort_fwnode; + struct iommu_fwspec *fwspec = dev->iommu_fwspec; + + /* +* If we already translated the fwspec there +* is nothing left to do, return the iommu_ops. 
+*/ + if (fwspec && fwspec->ops) + return fwspec->ops; Is this logic strictly required? It breaks masters with multiple SIDs as only the first SID is actually added to the master's fwspec. if (node) { iort_fwnode = iort_get_fwnode(node); @@ -550,8 +558,17 @@ static const struct iommu_ops *iort_iommu_xlate(struct device *dev, return NULL; ops = iommu_ops_from_fwnode(iort_fwnode); + /* +* If the ops look-up fails, this means that either +* the SMMU drivers have not been probed yet or that +* the SMMU drivers are not built in the kernel; +* Depending on whether the SMMU drivers are built-in +* in the kernel or not, defer the IOMMU configuration +* or just abort it. +*/ if (!ops) - return NULL; + return iort_iommu_driver_enabled(node->type) ? + ERR_PTR(-EPROBE_DEFER) : NULL; ret = arm_smmu_iort_xlate(dev, streamid, iort_fwnode, ops); } @@ -625,12 +642,26 @@ const struct iommu_ops *iort_iommu_configure(struct device *dev) while (parent) { ops = iort_iommu_xlate(dev, parent, streamid); + if (IS_ERR_OR_NULL(ops)) + return ops; parent = iort_node_get_id(node, &streamid, IORT_IOMMU_TYPE, i++); } } + /* +* If we have reason to believe the IOMMU driver missed the initial +* add_device callback for dev, replay it to get things in order. 
+*/ + if (!IS_ERR_OR_NULL(ops) && ops->add_device && + dev->bus && !dev->iommu_group) { + int err = ops->add_device(dev); + + if (err) + ops = ERR_PTR(err); + } + return ops; } diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c index 1926918..2a513cc 100644 --- a/drivers/acpi/scan.c +++ b/drivers/acpi/scan.c @@ -1373,20 +1373,25 @@ enum dev_dma_attr acpi_get_dma_attr(struct acpi_device *adev) * @dev: The pointer to the device * @attr: device dma attributes */ -void acpi_dma_configure(struct device *dev, enum dev_dma_attr attr) +int acpi_dma_configure(struct device *dev, enum dev_dma_attr attr) { const struct iommu_ops *iommu; + u64 size; iort_set_dma_mask(dev); iommu = iort_iommu_configure(dev); + if (IS_ERR(iommu)) + return PTR_ERR(iommu); + size = max(dev->coherent_dma_mask, dev->coherent_dma_mask + 1); /* * Assume dma valid range starts at 0 and covers the whole * coherent_dma_mask. */ - arch_setup_dma_
Re: [PATCH] iommu/dma: Setup iova_domain granule for IOMMU_DMA_MSI cookies
On 5/16/2017 3:55 PM, Auger Eric wrote: Hi, On 13/04/2017 21:38, Nate Watterson wrote: Hi Robin, On 4/13/2017 7:21 AM, Robin Murphy wrote: Hi Nate, On 13/04/17 09:55, Nate Watterson wrote: Currently, the __iommu_dma_{map/free} functions call iova_{offset/align} making them unsuitable for use with iommu_domains having an IOMMU_DMA_MSI cookie since the cookie's iova_domain member, iovad, is uninitialized. Now that iommu_dma_get_msi_page() calls __iommu_dma_map() regardless of cookie type, failures are being seen when mapping MSI target addresses for devices attached to UNMANAGED domains. To work around this issue, the iova_domain granule for IOMMU_DMA_MSI cookies is initialized to the value returned by cookie_msi_granule(). Oh bum. Thanks for the report. However, I really don't like bodging around it with deliberate undefined behaviour. Fixing things properly doesn't seem too hard: I was not especially pleased with my solution, but I wanted to avoid potentially missing any other spots in the code where granule was used uninitialized. The compile time check made me feel a little less dirty about inappropriately using the iova_domain with MSI cookies. 
->8- diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 8348f366ddd1..62618e77bedc 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -396,13 +396,13 @@ static void iommu_dma_free_iova(struct iommu_dma_cookie *cookie, dma_addr_t iova, size_t size) { struct iova_domain *iovad = &cookie->iovad; - unsigned long shift = iova_shift(iovad); /* The MSI case is only ever cleaning up its most recent allocation */ if (cookie->type == IOMMU_DMA_MSI_COOKIE) cookie->msi_iova -= size; else - free_iova_fast(iovad, iova >> shift, size >> shift); + free_iova_fast(iovad, iova_pfn(iovad, iova), + size >> iova_shift(iovad)); } static void __iommu_dma_unmap(struct iommu_domain *domain, dma_addr_t dma_addr, @@ -617,11 +617,14 @@ static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys, { struct iommu_domain *domain = iommu_get_domain_for_dev(dev); struct iommu_dma_cookie *cookie = domain->iova_cookie; - struct iova_domain *iovad = &cookie->iovad; - size_t iova_off = iova_offset(iovad, phys); + size_t iova_off = 0; dma_addr_t iova; - size = iova_align(iovad, size + iova_off); + if (cookie->type == IOMMU_DMA_IOVA_COOKIE) { + iova_off = iova_offset(&cookie->iovad, phys); + size = iova_align(&cookie->iovad, size + iova_off); + } + iova = iommu_dma_alloc_iova(domain, size, dma_get_mask(dev), dev); if (!iova) return DMA_ERROR_CODE; -8<- Untested, and you'll probably want to double-check it anyway given that the original oversight was mine in the first place ;) This looks good to me. As Shanker has already mentioned, it does fix the faults we were previously seeing with direct device assignment. I also verified that there aren't any other obvious cases of a granule == 0 being used in the dma_iommu code by adding BUG_ON(!iovad->granule) to iova_{mask/align/offset/...} and running a variety of tests without issue. Are you going to post the patch? I also noticed PCIe passthrough/ARM is broken for me with 4.12-r1. 
I tested Robin's patch as well, on Cavium ThunderX, and this fixes the faults I have seen. Has anyone sent a formal patch? iommu/dma: Don't touch invalid iova_domain members Thanks Eric Robin. Fixes: a44e6657585b ("iommu/dma: Clean up MSI IOVA allocation") Signed-off-by: Nate Watterson --- drivers/iommu/dma-iommu.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 8348f366..d7b0816 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -127,6 +127,16 @@ int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base) cookie->msi_iova = base; domain->iova_cookie = cookie; + +/* + * Setup granule for compatibility with __iommu_dma_{alloc/free} and + * add a compile time check to ensure that writing granule won't + * clobber msi_iova. + */ +cookie->iovad.granule = cookie_msi_granule(cookie); +BUILD_BUG_ON(offsetof(struct iova_domain, granule) < +sizeof(cookie->msi_iova)); + return 0; } EXPORT_SYMBOL(iommu_get_msi_cookie); -- Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH] iommu/dma: Setup iova_domain granule for IOMMU_DMA_MSI cookies
Currently, the __iommu_dma_{map/free} functions call iova_{offset/align} making them unsuitable for use with iommu_domains having an IOMMU_DMA_MSI cookie since the cookie's iova_domain member, iovad, is uninitialized. Now that iommu_dma_get_msi_page() calls __iommu_dma_map() regardless of cookie type, failures are being seen when mapping MSI target addresses for devices attached to UNMANAGED domains. To work around this issue, the iova_domain granule for IOMMU_DMA_MSI cookies is initialized to the value returned by cookie_msi_granule(). Fixes: a44e6657585b ("iommu/dma: Clean up MSI IOVA allocation") Signed-off-by: Nate Watterson --- drivers/iommu/dma-iommu.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 8348f366..d7b0816 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -127,6 +127,16 @@ int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base) cookie->msi_iova = base; domain->iova_cookie = cookie; + + /* +* Setup granule for compatibility with __iommu_dma_{alloc/free} and +* add a compile time check to ensure that writing granule won't +* clobber msi_iova. +*/ + cookie->iovad.granule = cookie_msi_granule(cookie); + BUILD_BUG_ON(offsetof(struct iova_domain, granule) < + sizeof(cookie->msi_iova)); + return 0; } EXPORT_SYMBOL(iommu_get_msi_cookie); -- Qualcomm Datacenter Technologies, Inc. on behalf of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RESEND,3/3] iommu/dma: Plumb in the per-CPU IOVA caches
Hi Robin, On 4/6/2017 2:56 PM, Robin Murphy wrote: On 06/04/17 19:15, Manoj Iyer wrote: On Fri, 31 Mar 2017, Robin Murphy wrote: With IOVA allocation suitably tidied up, we are finally free to opt in to the per-CPU caching mechanism. The caching alone can provide a modest improvement over walking the rbtree for weedier systems (iperf3 shows ~10% more ethernet throughput on an ARM Juno r1 constrained to a single 650MHz Cortex-A53), but the real gain will be in sidestepping the rbtree lock contention which larger ARM-based systems with lots of parallel I/O are starting to feel the pain of. [...] This patch series helps to resolve the Ubuntu bug, where we see the Ubuntu Zesty (4.10 based) kernel reporting multi cpu soft lockups on QDF2400 SDP. https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1680549 Wow, how many separate buffers does that driver have mapped at once to spend 22 seconds walking the rbtree for a single allocation!? I'd almost expect that to indicate a deadlock. I'm guessing you wouldn't have seen this on older kernels, since I assume that particular platform is booting via ACPI, so wouldn't have had the SMMU enabled without the IORT support which landed in 4.10. Although this series does improve performance, the soft lockups seen in the Ubuntu bug Manoj mentioned were actually because while the mlx5 interface was being brought up, a huge number of concurrent calls to alloc_iova() were being made with limit_pfn != dma_32bit_pfn so the optimized iova lookup was being bypassed. Internally we worked around the issue by adding a set_dma_mask handler that would call iommu_dma_init_domain() to adjust dma_32bit_pfn to match the input mask. https://source.codeaurora.org/quic/server/kernel/commit/arch/arm64/mm/dma-mapping.c?h=qserver-4.10&id=503b36fd3866cab216fc51a5a4015aaa99daf173 This worked well, however it clearly would not have played nice with your dma-iommu PCI optimizations that force dma_limit to 32-bits so it was never sent out. 
The application of the "PCI allocation optimisations" patch is what actually remedied the cpu soft lockups seen by Manoj. Back to your question of how many buffers does the mlx5 driver have mapped at once. It seems to scale linearly with core count. For example, with 24-cores, after doing 'ifconfig eth up', ~38k calls to alloc_iova() have been made and the min iova is ~0xF600_. With 48-cores, those numbers jump to ~66k calls with min iova ~0xEF00_. Things get really scary when you start using 64k pages. The number of calls to alloc_iova() stays about the same which, when combined with the reserved PCI windows, ends up consuming all of our 32-bit iovas forcing us to once again call alloc_iova() but this time with a limit_pfn != dma_32bit_pfn. This is actually how I stumbled upon the alloc_iova() underflow bug that I posted about a bit earlier. This patch series along with the following cherry-picks from Linus's tree 632b072f iommu/dma: Implement PCI allocation optimisation de84f5f049d9 iommu/dma: Stop getting dma_32bit_pfn wrong were applied to Ubuntu Zesty 4.10 kernel (Ubuntu-4.10.0-18.20) and tested on a QDF2400 SDP. Tested-by: Manoj Iyer Thanks, Robin. -- Manoj Iyer Ubuntu/Canonical ARM Servers - Cloud ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu -- Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH] iommu/iova: fix underflow bug in __alloc_and_insert_iova_range
Normally, calling alloc_iova() using an iova_domain with insufficient pfns remaining between start_pfn and dma_limit will fail and return a NULL pointer. Unexpectedly, if such a "full" iova_domain contains an iova with pfn_lo == 0, the alloc_iova() call will instead succeed and return an iova containing invalid pfns. This is caused by an underflow bug in __alloc_and_insert_iova_range() that occurs after walking the "full" iova tree when the search ends at the iova with pfn_lo == 0 and limit_pfn is then adjusted to be just below that (-1). This (now huge) limit_pfn gives the impression that a vast amount of space is available between it and start_pfn and thus a new iova is allocated with the invalid pfn_hi value, 0xFFF . To remedy this, a check is introduced to ensure that adjustments to limit_pfn will not underflow. This issue has been observed in the wild, and is easily reproduced with the following sample code. struct iova_domain *iovad = kzalloc(sizeof(*iovad), GFP_KERNEL); struct iova *rsvd_iova, *good_iova, *bad_iova; unsigned long limit_pfn = 3; unsigned long start_pfn = 1; unsigned long va_size = 2; init_iova_domain(iovad, SZ_4K, start_pfn, limit_pfn); rsvd_iova = reserve_iova(iovad, 0, 0); good_iova = alloc_iova(iovad, va_size, limit_pfn, true); bad_iova = alloc_iova(iovad, va_size, limit_pfn, true); Prior to the patch, this yielded: *rsvd_iova == {0, 0} /* Expected */ *good_iova == {2, 3} /* Expected */ *bad_iova == {-2, -1} /* Oh no... */ After the patch, bad_iova is NULL as expected since inadequate space remains between limit_pfn and start_pfn after allocating good_iova. 
Signed-off-by: Nate Watterson --- drivers/iommu/iova.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c index b7268a1..f6533e0 100644 --- a/drivers/iommu/iova.c +++ b/drivers/iommu/iova.c @@ -138,7 +138,7 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad, break; /* found a free slot */ } adjust_limit_pfn: - limit_pfn = curr_iova->pfn_lo - 1; + limit_pfn = curr_iova->pfn_lo ? (curr_iova->pfn_lo - 1) : 0; move_left: prev = curr; curr = rb_prev(curr); -- Qualcomm Datacenter Technologies, Inc. on behalf of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 0/3] IOVA allocation improvements for iommu-dma
On 2017-03-15 09:33, Robin Murphy wrote: Hi all, Hi Robin, Here's the first bit of lock contention removal to chew on - feedback welcome! Note that for the current users of the io-pgtable framework, this is most likely to simply push more contention onto the io-pgtable lock, so may not show a great improvement alone. Will and I both have rough proof-of-concept implementations of lock-free io-pgtable code which we need to sit down and agree on at some point, hopefully fairly soon. I've taken the opportunity to do a bit of cleanup and refactoring within the series to make the final state of the code nicer, but the diffstat still turns out surprisingly reasonable in the end - it would actually be negative but for the new comments! Magnus, Shimoda-san, the first two patches should be of interest as they constitute the allocation rework I mentioned a while back[1] - if you still need to implement that scary workaround, this should make it simple to hook IPMMU-specific calls into the alloc and free paths, and let the driver take care of the details internally. I've tested your patches on a QDF2400 platform and generally see modest improvements in iperf/fio performance. As you suspected would happen, contention has indeed moved to the io-pgtable lock. I am looking forward to testing with the lock-free io-pgtable implementation, however I suspect that there will still be contention issues acquiring the (SMMUv3) cmdq lock on the unmap path. Reviewed/Tested-by: Nate Watterson Robin. [1]:https://lists.linuxfoundation.org/pipermail/iommu/2017-January/020189.html Robin Murphy (3): iommu/dma: Convert to address-based allocation iommu/dma: Clean up MSI IOVA allocation iommu/dma: Plumb in the per-CPU IOVA caches drivers/iommu/dma-iommu.c | 176 -- 1 file changed, 90 insertions(+), 86 deletions(-) -- Qualcomm Datacenter Technologies, Inc. on behalf of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. 
is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
Re: [PATCH v2 3/5] iommu/arm-smmu-v3: Make arm_smmu_install_ste_for_dev return void
Hi Will, On 2017-03-10 15:49, Will Deacon wrote: arm_smmu_install_ste_for_dev cannot fail and always returns 0, however the fact that it returns int means that callers end up implementing redundant error handling code which complicates STE tracking and is never executed. This patch changes the return type of arm_smmu_install_ste_for_dev to avoid, to make it explicit that it cannot fail. Did you mean "a void" or just "void" instead of "avoid"? Signed-off-by: Will Deacon --- drivers/iommu/arm-smmu-v3.c | 12 +++- 1 file changed, 3 insertions(+), 9 deletions(-) diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index 3d38e682071a..e18dbcd26f66 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -1579,7 +1579,7 @@ static __le64 *arm_smmu_get_step_for_sid(struct arm_smmu_device *smmu, u32 sid) return step; } -static int arm_smmu_install_ste_for_dev(struct iommu_fwspec *fwspec) +static void arm_smmu_install_ste_for_dev(struct iommu_fwspec *fwspec) { int i; struct arm_smmu_master_data *master = fwspec->iommu_priv; @@ -1591,8 +1591,6 @@ static int arm_smmu_install_ste_for_dev(struct iommu_fwspec *fwspec) arm_smmu_write_strtab_ent(smmu, sid, step, &master->ste); } - - return 0; } static void arm_smmu_detach_dev(struct device *dev) @@ -1600,8 +1598,7 @@ static void arm_smmu_detach_dev(struct device *dev) struct arm_smmu_master_data *master = dev->iommu_fwspec->iommu_priv; master->ste.bypass = true; - if (arm_smmu_install_ste_for_dev(dev->iommu_fwspec) < 0) - dev_warn(dev, "failed to install bypass STE\n"); + arm_smmu_install_ste_for_dev(dev->iommu_fwspec); } static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) @@ -1653,10 +1650,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) ste->s2_cfg = &smmu_domain->s2_cfg; } - ret = arm_smmu_install_ste_for_dev(dev->iommu_fwspec); - if (ret < 0) - ste->valid = false; - + arm_smmu_install_ste_for_dev(dev->iommu_fwspec); out_unlock: 
mutex_unlock(&smmu_domain->init_mutex); return ret; -- Qualcomm Datacenter Technologies, Inc. on behalf of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 4/5] iommu/arm-smmu-v3: Install bypass STEs for IOMMU_DOMAIN_IDENTITY domains
Hi Will, On 2017-03-10 15:49, Will Deacon wrote: In preparation for allowing the default domain type to be overridden, this patch adds support for IOMMU_DOMAIN_IDENTITY domains to the ARM SMMUv3 driver. An identity domain is created by placing the corresponding stream table entries into "bypass" mode, which allows transactions to flow through the SMMU without any translation. What about masters that require SMMU intervention to override their native memory attributes to make them consistent with the CCA (acpi) or dma-coherent (dt) values specified in FW? To make sure those cases are handled, you could store away the master's coherency setting in its strtab_ent at attach time and then setup STE[MemAttr/ALLOCCFG/SHCFG] so the attributes are forced to the correct values while still bypassing translation. Signed-off-by: Will Deacon --- drivers/iommu/arm-smmu-v3.c | 58 + 1 file changed, 37 insertions(+), 21 deletions(-) diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index e18dbcd26f66..75fa4809f49e 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -554,9 +554,14 @@ struct arm_smmu_s2_cfg { }; struct arm_smmu_strtab_ent { - boolvalid; - - boolbypass; /* Overrides s1/s2 config */ + /* +* An STE is "assigned" if the master emitting the corresponding SID +* is attached to a domain. The behaviour of an unassigned STE is +* determined by the disable_bypass parameter, whereas an assigned +* STE behaves according to s1_cfg/s2_cfg, which themselves are +* configured according to the domain type. 
+*/ + boolassigned; struct arm_smmu_s1_cfg *s1_cfg; struct arm_smmu_s2_cfg *s2_cfg; }; @@ -632,6 +637,7 @@ enum arm_smmu_domain_stage { ARM_SMMU_DOMAIN_S1 = 0, ARM_SMMU_DOMAIN_S2, ARM_SMMU_DOMAIN_NESTED, + ARM_SMMU_DOMAIN_BYPASS, }; struct arm_smmu_domain { @@ -1005,9 +1011,9 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid, * This is hideously complicated, but we only really care about * three cases at the moment: * -* 1. Invalid (all zero) -> bypass (init) -* 2. Bypass -> translation (attach) -* 3. Translation -> bypass (detach) +* 1. Invalid (all zero) -> bypass/fault (init) +* 2. Bypass/fault -> translation/bypass (attach) +* 3. Translation/bypass -> bypass/fault (detach) * * Given that we can't update the STE atomically and the SMMU * doesn't read the thing in a defined order, that leaves us @@ -1046,11 +1052,15 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid, } /* Nuke the existing STE_0 value, as we're going to rewrite it */ - val = ste->valid ? STRTAB_STE_0_V : 0; + val = STRTAB_STE_0_V; + + /* Bypass/fault */ + if (!ste->assigned || !(ste->s1_cfg || ste->s2_cfg)) { + if (!ste->assigned && disable_bypass) + val |= STRTAB_STE_0_CFG_ABORT; + else + val |= STRTAB_STE_0_CFG_BYPASS; - if (ste->bypass) { - val |= disable_bypass ? 
STRTAB_STE_0_CFG_ABORT - : STRTAB_STE_0_CFG_BYPASS; dst[0] = cpu_to_le64(val); dst[1] = cpu_to_le64(STRTAB_STE_1_SHCFG_INCOMING << STRTAB_STE_1_SHCFG_SHIFT); @@ -,10 +1121,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid, static void arm_smmu_init_bypass_stes(u64 *strtab, unsigned int nent) { unsigned int i; - struct arm_smmu_strtab_ent ste = { - .valid = true, - .bypass = true, - }; + struct arm_smmu_strtab_ent ste = { .assigned = false }; for (i = 0; i < nent; ++i) { arm_smmu_write_strtab_ent(NULL, -1, strtab, &ste); @@ -1378,7 +1385,9 @@ static struct iommu_domain *arm_smmu_domain_alloc(unsigned type) { struct arm_smmu_domain *smmu_domain; - if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA) + if (type != IOMMU_DOMAIN_UNMANAGED && + type != IOMMU_DOMAIN_DMA && + type != IOMMU_DOMAIN_IDENTITY) return NULL; /* @@ -1509,6 +1518,11 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain) struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); struct arm_smmu_device *smmu = smmu_domain->smmu; + if (domain->type == IOMMU_DOMAIN_IDENTITY) { + smmu_domain->stage = ARM_SMMU_DOMAIN_BYPASS; + return 0; + } + /* Restrict the stage to what we can actually support */ if (!(smmu->features & ARM_SMMU_FEAT_TRANS_S1))
Re: [PATCH V7 08/11] drivers: acpi: Handle IOMMU lookup failure with deferred probing or error
On 2017-02-01 13:52, Lorenzo Pieralisi wrote: I debugged the issue and Nate's fix is correct, the fact that you can't hit it with mainline is just a matter of timing because it has to do with the CTX pointer value (we OR it with the existing value), so it may work or not depending on how the cdptr memory allocation pattern turns out to be (which explains why Nate and I can hit it with simple PCI device remove/add execution too). So it is neither an ACPI nor an IOMMU probe deferral issue per-se, fix is already queued, so it is all good. What about USB stalls ? Our fault. The USB controller was getting 48-bit IOVAs, but the bus it sits on only supports 44-bits so the controller's DMA transactions ended up targeting addresses that were never mapped. It started working once I applied the iort/dma-mapping patches I sent out earlier this week that use the iort memory_address_limit field to check if a dma_mask is sane. Sorry for the false alarm. Nate Thanks ! Lorenzo -- Qualcomm Datacenter Technologies, Inc. on behalf of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
Re: [PATCH V7 08/11] drivers: acpi: Handle IOMMU lookup failure with deferred probing or error
On 2017-01-30 09:38, Will Deacon wrote: On Mon, Jan 30, 2017 at 09:33:50AM -0500, Sinan Kaya wrote: On 1/30/2017 9:23 AM, Nate Watterson wrote: > On 2017-01-30 08:59, Sinan Kaya wrote: >> On 1/30/2017 7:22 AM, Robin Murphy wrote: >>> On 29/01/17 17:53, Sinan Kaya wrote: >>>> On 1/24/2017 7:37 AM, Lorenzo Pieralisi wrote: >>>>> [+hanjun, tomasz, sinan] >>>>> >>>>> It is quite a key patchset, I would be glad if they can test on their >>>>> respective platforms with IORT. >>>>> >>>> >>>> Tested on top of 4.10-rc5. >>>> >>>> 1.Platform Hidma device passed dmatest >>>> 2.Seeing some USB stalls on a platform USB device. >>>> 3.PCIe NVME drive probed and worked fine with MSI interrupts after boot. >>>> 4. NVMe driver didn't probe following a hotplug insertion and received an >>>> SMMU error event during the insertion. >>> >>> What was the SMMU error - a translation/permission fault (implying the >>> wrong DMA ops) or a bad STE fault (implying we totally failed to tell >>> the SMMU about the device at all)? >>> >> >> root@ubuntu:/sys/bus/pci/slots/4# echo 0 > power >> >> [__204.698522]_iommu:_Removing_device_0003:01:00.0_from_group_0 >> [ 204.708704] pciehp 0003:00:00.0:pcie004: Slot(4): Link Down >> [ 204.708723] pciehp 0003:00:00.0:pcie004: Slot(4): Link Down event >> ignored; already powering off >> >> root@ubuntu:/sys/bus/pci/slots/4# >> >> [__254.820440]_iommu:_Adding_device_0003:01:00.0_to_group_8 >> [ 254.820599] nvme nvme0: pci function 0003:01:00.0 >> [ 254.820621] nvme 0003:01:00.0: enabling device ( -> 0002) >> [ 261.948558] arm-smmu-v3 arm-smmu-v3.0.auto: event 0x0a received: >> [ 261.948561] arm-smmu-v3 arm-smmu-v3.0.auto: 0x010a >> [ 261.948563] arm-smmu-v3 arm-smmu-v3.0.auto: 0x >> [ 261.948564] arm-smmu-v3 arm-smmu-v3.0.auto: 0x >> [ 261.948566] arm-smmu-v3 arm-smmu-v3.0.auto: 0x > Looks like C_BAD_CD. Can you please try with: > iommu/arm-smmu-v3: Clear prior settings when updating STEs This resolved the issue. 
Can we pull Nate's patch to 4.10 so that I don't see this issue again. I already sent the pull request to Joerg for 4.11. Do you see this problem without Sricharan's patches (i.e. vanilla mainline)? If so, we'll need to send the patch to stable after -rc1. Using vanilla mainline, I see it most commonly when directly assigning a device to a guest machine. I think I've also seen it after removing then re-adding a PCI device. Basically anytime an STE's CTX pointer is changed from a non-NULL value and STE[CFG] indicates translation will be performed. Nate Will -- Qualcomm Datacenter Technologies, Inc. on behalf of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH V7 08/11] drivers: acpi: Handle IOMMU lookup failure with deferred probing or error
On 2017-01-30 08:59, Sinan Kaya wrote: On 1/30/2017 7:22 AM, Robin Murphy wrote: On 29/01/17 17:53, Sinan Kaya wrote: On 1/24/2017 7:37 AM, Lorenzo Pieralisi wrote: [+hanjun, tomasz, sinan] It is quite a key patchset, I would be glad if they can test on their respective platforms with IORT. Tested on top of 4.10-rc5. 1. Platform Hidma device passed dmatest 2. Seeing some USB stalls on a platform USB device. 3. PCIe NVME drive probed and worked fine with MSI interrupts after boot. 4. NVMe driver didn't probe following a hotplug insertion and received an SMMU error event during the insertion. What was the SMMU error - a translation/permission fault (implying the wrong DMA ops) or a bad STE fault (implying we totally failed to tell the SMMU about the device at all)? root@ubuntu:/sys/bus/pci/slots/4# echo 0 > power [ 204.698522] iommu: Removing device 0003:01:00.0 from group 0 [ 204.708704] pciehp 0003:00:00.0:pcie004: Slot(4): Link Down [ 204.708723] pciehp 0003:00:00.0:pcie004: Slot(4): Link Down event ignored; already powering off root@ubuntu:/sys/bus/pci/slots/4# [ 254.820440] iommu: Adding device 0003:01:00.0 to group 8 [ 254.820599] nvme nvme0: pci function 0003:01:00.0 [ 254.820621] nvme 0003:01:00.0: enabling device ( -> 0002) [ 261.948558] arm-smmu-v3 arm-smmu-v3.0.auto: event 0x0a received: [ 261.948561] arm-smmu-v3 arm-smmu-v3.0.auto: 0x010a [ 261.948563] arm-smmu-v3 arm-smmu-v3.0.auto: 0x [ 261.948564] arm-smmu-v3 arm-smmu-v3.0.auto: 0x [ 261.948566] arm-smmu-v3 arm-smmu-v3.0.auto: 0x Looks like C_BAD_CD. Can you please try with: iommu/arm-smmu-v3: Clear prior settings when updating STEs root@ubuntu:/sys/bus/pci/slots/4# root@ubuntu:/sys/bus/pci/slots/4#ls /dev/nvme* /dev/nvme0 I should have seen /dev/nvme0n1 partition here. Robin. 
/sys/bus/pci/slots/4 # /sys/bus/pci/slots/4 # dmesg | grep nvme [ 14.041357] nvme nvme0: pci function 0003:01:00.0 [ 198.399521] nvme nvme0: pci function 0003:01:00.0 [ 198.416232] nvme 0003:01:00.0: enabling device ( -> 0002) [ 264.402216] nvme nvme0: I/O 228 QID 0 timeout, disable controller [ 264.402313] nvme nvme0: Identify Controller failed (-4) [ 264.421270] nvme nvme0: Removing after probe failure status: -5 /sys/bus/pci/slots/4 # -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- Qualcomm Datacenter Technologies, Inc. on behalf of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
[PATCH] iommu/arm-smmu-v3: limit use of 2-level stream tables
In the current arm-smmu-v3 driver, all smmus that support 2-level stream tables are being forced to use them. This is suboptimal for smmus that support fewer stream id bits than would fill a single second level table. This patch limits the use of 2-level tables to smmus that both support the feature and whose first level table can possibly contain more than a single entry. Signed-off-by: Nate Watterson --- drivers/iommu/arm-smmu-v3.c | 21 ++--- 1 file changed, 10 insertions(+), 11 deletions(-) diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index 4d6ec44..7d1a7e5 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -1983,17 +1983,9 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu) u32 size, l1size; struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg; - /* -* If we can resolve everything with a single L2 table, then we -* just need a single L1 descriptor. Otherwise, calculate the L1 -* size, capped to the SIDSIZE. -*/ - if (smmu->sid_bits < STRTAB_SPLIT) { - size = 0; - } else { - size = STRTAB_L1_SZ_SHIFT - (ilog2(STRTAB_L1_DESC_DWORDS) + 3); - size = min(size, smmu->sid_bits - STRTAB_SPLIT); - } + /* Calculate the L1 size, capped to the SIDSIZE. */ + size = STRTAB_L1_SZ_SHIFT - (ilog2(STRTAB_L1_DESC_DWORDS) + 3); + size = min(size, smmu->sid_bits - STRTAB_SPLIT); cfg->num_l1_ents = 1 << size; size += STRTAB_SPLIT; @@ -2504,6 +2496,13 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu) smmu->ssid_bits = reg >> IDR1_SSID_SHIFT & IDR1_SSID_MASK; smmu->sid_bits = reg >> IDR1_SID_SHIFT & IDR1_SID_MASK; + /* +* If the SMMU supports fewer bits than would fill a single L2 stream +* table, use a linear table instead. +*/ + if (smmu->sid_bits <= STRTAB_SPLIT) + smmu->features &= ~ARM_SMMU_FEAT_2_LVL_STRTAB; + /* IDR5 */ reg = readl_relaxed(smmu->base + ARM_SMMU_IDR5);
[PATCH] iommu/arm-smmu-v3: Clear prior settings when updating STEs
To prevent corruption of the stage-1 context pointer field when updating STEs, rebuild the entire containing dword instead of clearing individual fields. Signed-off-by: Nate Watterson --- drivers/iommu/arm-smmu-v3.c | 10 ++-------- 1 file changed, 2 insertions(+), 8 deletions(-) diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index 4d6ec44..94f305d 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -1042,13 +1042,8 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid, } } - /* Nuke the existing Config, as we're going to rewrite it */ - val &= ~(STRTAB_STE_0_CFG_MASK << STRTAB_STE_0_CFG_SHIFT); - - if (ste->valid) - val |= STRTAB_STE_0_V; - else - val &= ~STRTAB_STE_0_V; + /* Nuke the existing STE_0 value, as we're going to rewrite it */ + val = ste->valid ? STRTAB_STE_0_V : 0; if (ste->bypass) { val |= disable_bypass ? STRTAB_STE_0_CFG_ABORT @@ -1083,7 +1078,6 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid, val |= (ste->s1_cfg->cdptr_dma & STRTAB_STE_0_S1CTXPTR_MASK << STRTAB_STE_0_S1CTXPTR_SHIFT) | STRTAB_STE_0_CFG_S1_TRANS; - } if (ste->s2_cfg) {
[PATCH] iommu/arm-smmu-v3: prevent corruption of ste stage-1 context ptr
To ensure that the stage-1 context ptr for an ste points to the intended context descriptor, this patch adds code to clear away the stale context ptr value before OR-ing in the new one. Signed-off-by: Nate Watterson --- drivers/iommu/arm-smmu-v3.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index 4d6ec44..093f9f1 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -1080,6 +1080,8 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid, if (smmu->features & ARM_SMMU_FEAT_STALLS) dst[1] |= cpu_to_le64(STRTAB_STE_1_S1STALLD); + val &= ~(STRTAB_STE_0_S1CTXPTR_MASK << +STRTAB_STE_0_S1CTXPTR_SHIFT); val |= (ste->s1_cfg->cdptr_dma & STRTAB_STE_0_S1CTXPTR_MASK << STRTAB_STE_0_S1CTXPTR_SHIFT) | STRTAB_STE_0_CFG_S1_TRANS;
[PATCH] iommu/arm-smmu-v3: avoid over allocating for l2 stream tables
Currently, all l2 stream tables are being allocated with space for (1 << STRTAB_SPLIT) stream table entries, even on smmus whose sid_bits are fewer than STRTAB_SPLIT. This patch avoids the over allocation by limiting the span of an l2 table to min(sid_bits, STRTAB_SPLIT). Signed-off-by: Nate Watterson --- drivers/iommu/arm-smmu-v3.c | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index 4d6ec44..5dca671 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -1129,6 +1129,7 @@ static void arm_smmu_init_bypass_stes(u64 *strtab, unsigned int nent) static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid) { + u8 span; size_t size; void *strtab; struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg; @@ -1137,10 +1138,11 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid) if (desc->l2ptr) return 0; - size = 1 << (STRTAB_SPLIT + ilog2(STRTAB_STE_DWORDS) + 3); + span = (smmu->sid_bits < STRTAB_SPLIT) ? smmu->sid_bits : STRTAB_SPLIT; + size = 1 << (span + ilog2(STRTAB_STE_DWORDS) + 3); strtab = &cfg->strtab[(sid >> STRTAB_SPLIT) * STRTAB_L1_DESC_DWORDS]; - desc->span = STRTAB_SPLIT + 1; + desc->span = span + 1; desc->l2ptr = dmam_alloc_coherent(smmu->dev, size, &desc->l2ptr_dma, GFP_KERNEL | __GFP_ZERO); if (!desc->l2ptr) { @@ -1150,7 +1152,7 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid) return -ENOMEM; } - arm_smmu_init_bypass_stes(desc->l2ptr, 1 << STRTAB_SPLIT); + arm_smmu_init_bypass_stes(desc->l2ptr, 1 << span); arm_smmu_write_strtab_l1_desc(strtab, desc); return 0; } @@ -2001,6 +2003,8 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu) dev_warn(smmu->dev, "2-level strtab only covers %u/%u bits of SID\n", size, smmu->sid_bits); + else if (smmu->sid_bits < size) + size = smmu->sid_bits; l1size = cfg->num_l1_ents * (STRTAB_L1_DESC_DWORDS << 3); strtab = dmam_alloc_coherent(smmu->dev, l1size, &cfg->strtab_dma,
Re: [PATCH v5 14/14] drivers: acpi: iort: introduce iort_iommu_configure
On 2016-09-09 10:23, Lorenzo Pieralisi wrote: DT based systems have a generic kernel API to configure IOMMUs for devices (ie of_iommu_configure()). On ARM based ACPI systems, the of_iommu_configure() equivalent can be implemented atop ACPI IORT kernel API, with the corresponding functions to map device identifiers to IOMMUs and retrieve the corresponding IOMMU operations necessary for DMA operations set-up. By relying on the iommu_fwspec generic kernel infrastructure, implement the IORT based IOMMU configuration for ARM ACPI systems and hook it up in the ACPI kernel layer that implements DMA configuration for a device. Signed-off-by: Lorenzo Pieralisi Cc: Hanjun Guo Cc: Tomasz Nowicki Cc: "Rafael J. Wysocki" --- drivers/acpi/arm64/iort.c | 96 +++ drivers/acpi/scan.c | 7 +++- include/linux/acpi_iort.h | 6 +++ 3 files changed, 108 insertions(+), 1 deletion(-) diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c index 7c68eb4..55a4ae9 100644 --- a/drivers/acpi/arm64/iort.c +++ b/drivers/acpi/arm64/iort.c @@ -19,6 +19,7 @@ #define pr_fmt(fmt) "ACPI: IORT: " fmt #include +#include #include #include #include @@ -27,6 +28,8 @@ #define IORT_TYPE_MASK(type) (1 << (type)) #define IORT_MSI_TYPE (1 << ACPI_IORT_NODE_ITS_GROUP) +#define IORT_IOMMU_TYPE ((1 << ACPI_IORT_NODE_SMMU) | \ + (1 << ACPI_IORT_NODE_SMMU_V3)) struct iort_its_msi_chip { struct list_head list; @@ -467,6 +470,99 @@ struct irq_domain *iort_get_device_domain(struct device *dev, u32 req_id) return irq_find_matching_fwnode(handle, DOMAIN_BUS_PCI_MSI); } +static int __get_pci_rid(struct pci_dev *pdev, u16 alias, void *data) +{ + u32 *rid = data; + + *rid = alias; + return 0; +} + +static int arm_smmu_iort_xlate(struct device *dev, u32 streamid, + struct fwnode_handle *fwnode) +{ + int ret = iommu_fwspec_init(dev, fwnode); + + if (!ret) + ret = iommu_fwspec_add_ids(dev, &streamid, 1); + + return ret; +} + +static const struct iommu_ops *iort_iommu_xlate(struct device *dev, + struct acpi_iort_node
*node, + u32 streamid) +{ + struct fwnode_handle *iort_fwnode = NULL; + int ret; + + if (node) { + iort_fwnode = iort_get_fwnode(node); + if (!iort_fwnode) + return NULL; + + ret = arm_smmu_iort_xlate(dev, streamid, + iort_fwnode); + if (!ret) + return fwspec_iommu_get_ops(iort_fwnode); + } + + return NULL; +} + +/** + * iort_iommu_configure - Set-up IOMMU configuration for a device. + * + * @dev: device to configure + * + * Returns: iommu_ops pointer on configuration success + * NULL on configuration failure + */ +const struct iommu_ops *iort_iommu_configure(struct device *dev) +{ + struct acpi_iort_node *node, *parent; + const struct iommu_ops *ops = NULL; + u32 streamid = 0; + + if (dev_is_pci(dev)) { + struct pci_bus *bus = to_pci_dev(dev)->bus; + u32 rid; + + pci_for_each_dma_alias(to_pci_dev(dev), __get_pci_rid, + &rid); + + node = iort_scan_node(ACPI_IORT_NODE_PCI_ROOT_COMPLEX, + iort_match_node_callback, &bus->dev); + if (!node) + return NULL; + + parent = iort_node_map_rid(node, rid, &streamid, + IORT_IOMMU_TYPE); + + ops = iort_iommu_xlate(dev, parent, streamid); + + } else { + int i = 0; + + node = iort_scan_node(ACPI_IORT_NODE_NAMED_COMPONENT, + iort_match_node_callback, dev); + if (!node) + return NULL; + Nothing wrong with your code here, but wanted to warn you that there appears to be a bug in iort_match_node_callback() for NAMED_COMPONENTS. iort_match_node_callback() { acpi_status status = AE_NOT_FOUND; ... case ACPI_IORT_NODE_NAMED_COMPONENT: { ... status = acpi_get_name(adev->handle, ACPI_FULL_PATHNAME, &buf); if (ACPI_FAILURE(status)) { dev_warn(dev, "Can't get device full path name\n"); break; } ncomp = (struct acpi_iort_named_component *)node->node_data; if (!strcmp(ncomp->device_name, buf.pointer)) status = AE_OK; acpi_os_free(buf.pointer); break; } ... return status; } Notice how if strcmp() fails, status remains set to the AE_OK value returned by the successful acpi_get_name() call, so the scan stops and reports a match on the first named component it visits even when the path names differ.
[PATCH] iommu/iova: validate iova_domain input to put_iova_domain
Passing a NULL or uninitialized iova_domain into put_iova_domain will currently crash the kernel when the unconfigured iova_domain data members are accessed. To prevent this, this patch adds a check that the domain is non-NULL and that the domain granule is non-zero. The granule can be used to check whether the domain was properly initialized, because calling init_iova_domain with a granule of zero would already have triggered a BUG statement crashing the kernel. Signed-off-by: Nate Watterson --- drivers/iommu/iova.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c index e23001b..3511a1c 100644 --- a/drivers/iommu/iova.c +++ b/drivers/iommu/iova.c @@ -459,6 +459,10 @@ void put_iova_domain(struct iova_domain *iovad) struct rb_node *node; unsigned long flags; + /* Only teardown properly initialized domains */ + if (!iovad || !iovad->granule) + return; + free_iova_rcaches(iovad); spin_lock_irqsave(&iovad->iova_rbtree_lock, flags); node = rb_first(&iovad->rbroot);
[PATCH] iommu/dma-iommu: respect established iova region limits
In the current dma-iommu implementation, the upper limit used when allocating iovas is based on the calling device's dma_mask, without considering the potentially more restrictive iova limits established in iommu_dma_init_domain. To ensure that iovas are allocated within the expected iova region, this patch adds logic in __alloc_iova to clip input dma_limit values that are out of bounds. Signed-off-by: Nate Watterson --- drivers/iommu/dma-iommu.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index ea5a9eb..2066066 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -157,11 +157,14 @@ static struct iova *__alloc_iova(struct iova_domain *iovad, size_t size, unsigned long shift = iova_shift(iovad); unsigned long length = iova_align(iovad, size) >> shift; + /* Respect the upper limit established in iommu_dma_init_domain */ + dma_limit = min_t(dma_addr_t, dma_limit >> shift, iovad->dma_32bit_pfn); + /* * Enforce size-alignment to be safe - there could perhaps be an * attribute to control this per-device, or at least per-domain... */ - return alloc_iova(iovad, length, dma_limit >> shift, true); + return alloc_iova(iovad, length, dma_limit, true); } /* The IOVA allocator knows what we mapped, so just unmap whatever that was */
[PATCH v2] iommu/arm-smmu-v3: limit use of 2-level stream tables
In the current arm-smmu-v3 driver, all smmus that support 2-level stream tables are being forced to use them. This is suboptimal for smmus that support fewer stream id bits than would fill a single second level table. This patch limits the use of 2-level tables to smmus that both support the feature and whose first level table can possibly contain more than a single entry. Signed-off-by: Nate Watterson --- drivers/iommu/arm-smmu-v3.c | 21 ++--- 1 file changed, 10 insertions(+), 11 deletions(-) diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index 5f6b3bc..f27b8dc 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -2033,17 +2033,9 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu) u32 size, l1size; struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg; - /* -* If we can resolve everything with a single L2 table, then we -* just need a single L1 descriptor. Otherwise, calculate the L1 -* size, capped to the SIDSIZE. -*/ - if (smmu->sid_bits < STRTAB_SPLIT) { - size = 0; - } else { - size = STRTAB_L1_SZ_SHIFT - (ilog2(STRTAB_L1_DESC_DWORDS) + 3); - size = min(size, smmu->sid_bits - STRTAB_SPLIT); - } + /* Calculate the L1 size, capped to the SIDSIZE. */ + size = STRTAB_L1_SZ_SHIFT - (ilog2(STRTAB_L1_DESC_DWORDS) + 3); + size = min(size, smmu->sid_bits - STRTAB_SPLIT); cfg->num_l1_ents = 1 << size; size += STRTAB_SPLIT; @@ -2531,6 +2523,13 @@ static int arm_smmu_device_probe(struct arm_smmu_device *smmu) smmu->ssid_bits = reg >> IDR1_SSID_SHIFT & IDR1_SSID_MASK; smmu->sid_bits = reg >> IDR1_SID_SHIFT & IDR1_SID_MASK; + /* +* If the SMMU supports fewer bits than would fill a single L2 stream +* table, use a linear table instead. +*/ + if (smmu->sid_bits <= STRTAB_SPLIT) + smmu->features &= ~ARM_SMMU_FEAT_2_LVL_STRTAB; + /* IDR5 */ reg = readl_relaxed(smmu->base + ARM_SMMU_IDR5);
[PATCH] iommu/arm-smmu-v3: limit use of 2-level stream tables
In the current arm-smmu-v3 driver, all smmus that support 2-level stream tables are being forced to use them. This is suboptimal for smmus that support fewer stream id bits than would fill a single second level table. This patch limits the use of 2-level tables to smmus that both support the feature and whose first level table can possibly contain more than a single entry. Signed-off-by: Nate Watterson --- drivers/iommu/arm-smmu-v3.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index 5f6b3bc..742254c 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -2531,6 +2531,17 @@ static int arm_smmu_device_probe(struct arm_smmu_device *smmu) smmu->ssid_bits = reg >> IDR1_SSID_SHIFT & IDR1_SSID_MASK; smmu->sid_bits = reg >> IDR1_SID_SHIFT & IDR1_SID_MASK; + /* +* If the SMMU supports fewer bits than would fill a single L2 stream +* table, use a linear table instead. +*/ + if (smmu->sid_bits <= STRTAB_SPLIT && + smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) { + smmu->features &= ~ARM_SMMU_FEAT_2_LVL_STRTAB; + dev_info(smmu->dev, "SIDSIZE (%d) <= STRTAB_SPLIT (%d) : disabling 2-level stream tables\n", + smmu->sid_bits, STRTAB_SPLIT); + } + /* IDR5 */ reg = readl_relaxed(smmu->base + ARM_SMMU_IDR5);