RE: [PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part)
Tested-by: Krishna Reddy

Validated nested translations with NVMe PCI device assigned to Guest VM.
Tested with both v12 and v13 of Jean-Philippe's patches as base.

> This is based on Jean-Philippe's
> [PATCH v12 00/10] iommu: I/O page faults for SMMUv3
> https://lore.kernel.org/linux-arm-kernel/YBfij71tyYvh8LhB@myrica/T/

With Jean-Philippe's V13, Patch 12 of this series has a conflict that had to be resolved manually.

-KR
RE: [PATCH v12 00/13] SMMUv3 Nested Stage Setup (VFIO part)
Tested-by: Krishna Reddy

Validated Nested SMMUv3 translations for an NVMe PCIe device from the Guest VM; the device is functional.

This patch series resolves the mismatch (seen with the v11 patches) in the VFIO_IOMMU_SET_PASID_TABLE and VFIO_IOMMU_CACHE_INVALIDATE ioctls between Linux and the QEMU patch series "vSMMUv3/pSMMUv3 2 stage VFIO integration" (v5.2.0-2stage-rfcv8).

-KR
RE: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)
> Hi Krishna,
> On 3/15/21 7:04 PM, Krishna Reddy wrote:
> > Tested-by: Krishna Reddy
> >
> >> 1) pass the guest stage 1 configuration
> >
> > Validated Nested SMMUv3 translations for NVMe PCIe device from Guest VM
> > along with patch series "v11 SMMUv3 Nested Stage Setup (VFIO part)" and
> > QEMU patch series "vSMMUv3/pSMMUv3 2 stage VFIO integration" from
> > v5.2.0-2stage-rfcv8.
> >
> > NVMe PCIe device is functional with 2-stage translations and no issues
> > observed.
> Thank you very much for your testing efforts. For your info, there are more
> recent kernel series:
> [PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part) (Feb 23)
> [PATCH v12 00/13] SMMUv3 Nested Stage Setup (VFIO part) (Feb 23)
>
> working along with QEMU RFC
> [RFC v8 00/28] vSMMUv3/pSMMUv3 2 stage VFIO integration (Feb 25)
>
> If you have cycles to test with those, this would be highly appreciated.

Thanks Eric for the latest patches. Will validate and update.
Feel free to reach out to me for validating future patch sets as necessary.

-KR
RE: [PATCH v11 00/13] SMMUv3 Nested Stage Setup (VFIO part)
Tested-by: Krishna Reddy

> 1) pass the guest stage 1 configuration
> 3) invalidate stage 1 related caches

Validated Nested SMMUv3 translations for NVMe PCIe device from Guest VM along with patch series "v13 SMMUv3 Nested Stage Setup (IOMMU part)" and QEMU patch series "vSMMUv3/pSMMUv3 2 stage VFIO integration" from v5.2.0-2stage-rfcv8.

NVMe PCIe device is functional with 2-stage translations and no issues observed.

-KR
RE: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)
Tested-by: Krishna Reddy

> 1) pass the guest stage 1 configuration

Validated Nested SMMUv3 translations for NVMe PCIe device from Guest VM along with patch series "v11 SMMUv3 Nested Stage Setup (VFIO part)" and QEMU patch series "vSMMUv3/pSMMUv3 2 stage VFIO integration" from v5.2.0-2stage-rfcv8.

NVMe PCIe device is functional with 2-stage translations and no issues observed.

-KR
[PATCH v11 1/5] iommu/arm-smmu: move TLB timeout and spin count macros
Move TLB timeout and spin count macros to header file to allow using the
same from vendor specific implementations.

Reviewed-by: Jon Hunter
Reviewed-by: Nicolin Chen
Reviewed-by: Pritesh Raithatha
Reviewed-by: Robin Murphy
Reviewed-by: Thierry Reding
Signed-off-by: Krishna Reddy
---
 drivers/iommu/arm-smmu.c | 3 ---
 drivers/iommu/arm-smmu.h | 2 ++
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 19f906de6420..cdd15ead9bc4 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -52,9 +52,6 @@
  */
 #define QCOM_DUMMY_VAL -1
 
-#define TLB_LOOP_TIMEOUT 1000000 /* 1s! */
-#define TLB_SPIN_COUNT 10
-
 #define MSI_IOVA_BASE 0x8000000
 #define MSI_IOVA_LENGTH 0x100000
 
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index d172c024be61..c7d0122a7c6c 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -236,6 +236,8 @@ enum arm_smmu_cbar_type {
 /* Maximum number of context banks per SMMU */
 #define ARM_SMMU_MAX_CBS 128
 
+#define TLB_LOOP_TIMEOUT 1000000 /* 1s! */
+#define TLB_SPIN_COUNT 10
 
 /* Shared driver definitions */
 enum arm_smmu_arch_version {
--
2.26.2
[PATCH v11 5/5] iommu/arm-smmu: Add global/context fault implementation hooks
Add global/context fault hooks to allow vendor specific implementations
override default fault interrupt handlers.

Update NVIDIA implementation to override the default global/context fault
interrupt handlers and handle interrupts across the two ARM MMU-500s that
are programmed identically.

Reviewed-by: Jon Hunter
Reviewed-by: Nicolin Chen
Reviewed-by: Pritesh Raithatha
Reviewed-by: Robin Murphy
Reviewed-by: Thierry Reding
Signed-off-by: Krishna Reddy
---
 drivers/iommu/arm-smmu-nvidia.c | 99 +
 drivers/iommu/arm-smmu.c | 17 +-
 drivers/iommu/arm-smmu.h | 3 +
 3 files changed, 117 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
index 2f55e5793d34..31368057e9be 100644
--- a/drivers/iommu/arm-smmu-nvidia.c
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -127,6 +127,103 @@ static int nvidia_smmu_reset(struct arm_smmu_device *smmu)
 	return 0;
 }
 
+static irqreturn_t nvidia_smmu_global_fault_inst(int irq,
+						 struct arm_smmu_device *smmu,
+						 int inst)
+{
+	u32 gfsr, gfsynr0, gfsynr1, gfsynr2;
+	void __iomem *gr0_base = nvidia_smmu_page(smmu, inst, 0);
+
+	gfsr = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSR);
+	if (!gfsr)
+		return IRQ_NONE;
+
+	gfsynr0 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR0);
+	gfsynr1 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR1);
+	gfsynr2 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR2);
+
+	dev_err_ratelimited(smmu->dev,
+			    "Unexpected global fault, this could be serious\n");
+	dev_err_ratelimited(smmu->dev,
+			    "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 0x%08x\n",
+			    gfsr, gfsynr0, gfsynr1, gfsynr2);
+
+	writel_relaxed(gfsr, gr0_base + ARM_SMMU_GR0_sGFSR);
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t nvidia_smmu_global_fault(int irq, void *dev)
+{
+	unsigned int inst;
+	irqreturn_t ret = IRQ_NONE;
+	struct arm_smmu_device *smmu = dev;
+
+	for (inst = 0; inst < NUM_SMMU_INSTANCES; inst++) {
+		irqreturn_t irq_ret;
+
+		irq_ret = nvidia_smmu_global_fault_inst(irq, smmu, inst);
+		if (irq_ret == IRQ_HANDLED)
+			ret = IRQ_HANDLED;
+	}
+
+	return ret;
+}
+
+static irqreturn_t nvidia_smmu_context_fault_bank(int irq,
+						  struct arm_smmu_device *smmu,
+						  int idx, int inst)
+{
+	u32 fsr, fsynr, cbfrsynra;
+	unsigned long iova;
+	void __iomem *gr1_base = nvidia_smmu_page(smmu, inst, 1);
+	void __iomem *cb_base = nvidia_smmu_page(smmu, inst, smmu->numpage + idx);
+
+	fsr = readl_relaxed(cb_base + ARM_SMMU_CB_FSR);
+	if (!(fsr & ARM_SMMU_FSR_FAULT))
+		return IRQ_NONE;
+
+	fsynr = readl_relaxed(cb_base + ARM_SMMU_CB_FSYNR0);
+	iova = readq_relaxed(cb_base + ARM_SMMU_CB_FAR);
+	cbfrsynra = readl_relaxed(gr1_base + ARM_SMMU_GR1_CBFRSYNRA(idx));
+
+	dev_err_ratelimited(smmu->dev,
+			    "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
+			    fsr, iova, fsynr, cbfrsynra, idx);
+
+	writel_relaxed(fsr, cb_base + ARM_SMMU_CB_FSR);
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t nvidia_smmu_context_fault(int irq, void *dev)
+{
+	int idx;
+	unsigned int inst;
+	irqreturn_t ret = IRQ_NONE;
+	struct arm_smmu_device *smmu;
+	struct iommu_domain *domain = dev;
+	struct arm_smmu_domain *smmu_domain;
+
+	smmu_domain = container_of(domain, struct arm_smmu_domain, domain);
+	smmu = smmu_domain->smmu;
+
+	for (inst = 0; inst < NUM_SMMU_INSTANCES; inst++) {
+		irqreturn_t irq_ret;
+
+		/*
+		 * Interrupt line is shared between all contexts.
+		 * Check for faults across all contexts.
+		 */
+		for (idx = 0; idx < smmu->num_context_banks; idx++) {
+			irq_ret = nvidia_smmu_context_fault_bank(irq, smmu,
+								 idx, inst);
+			if (irq_ret == IRQ_HANDLED)
+				ret = IRQ_HANDLED;
+		}
+	}
+
+	return ret;
+}
+
 static const struct arm_smmu_impl nvidia_smmu_impl = {
 	.read_reg = nvidia_smmu_read_reg,
 	.write_reg = nvidia_smmu_write_reg,
@@ -134,6 +231,8 @@ static const struct arm_smmu_impl nvidia_smmu_impl = {
 	.write_reg64 = nvidia_smmu_write_reg64,
 	.reset = nvidia_smmu_reset,
 	.tlb_sync = nvidia_sm
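The aggregation pattern in the context fault handler above (scan every bank of every instance, report "handled" if any bank had a fault) can be tried outside the kernel. The following is a rough userspace model, not the driver code: struct fake_smmus, the bank count, the fault mask, and the enum names are all hypothetical stand-ins, and the write-back that clears the status is modelled as clearing the bits that were set.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_SMMU_INSTANCES 2
#define NUM_CONTEXT_BANKS  4    /* hypothetical; the driver probes this */
#define FSR_FAULT_MASK     0x1u /* hypothetical stand-in for ARM_SMMU_FSR_FAULT */

/* Hypothetical stand-ins for IRQ_NONE / IRQ_HANDLED. */
enum irq_result { IRQ_RESULT_NONE = 0, IRQ_RESULT_HANDLED = 1 };

/* Fake fault status registers, one per context bank per instance. */
struct fake_smmus {
	uint32_t fsr[NUM_SMMU_INSTANCES][NUM_CONTEXT_BANKS];
};

/* Check one bank; acknowledge the fault by clearing the bits that were set. */
static enum irq_result model_context_fault_bank(struct fake_smmus *hw,
						unsigned int inst,
						unsigned int idx)
{
	uint32_t fsr;

	assert(inst < NUM_SMMU_INSTANCES && idx < NUM_CONTEXT_BANKS);
	fsr = hw->fsr[inst][idx];
	if (!(fsr & FSR_FAULT_MASK))
		return IRQ_RESULT_NONE;

	hw->fsr[inst][idx] &= ~fsr;
	return IRQ_RESULT_HANDLED;
}

/* The line is shared, so every bank of every instance is inspected. */
static enum irq_result model_context_fault(struct fake_smmus *hw)
{
	enum irq_result ret = IRQ_RESULT_NONE;
	unsigned int inst, idx;

	for (inst = 0; inst < NUM_SMMU_INSTANCES; inst++) {
		for (idx = 0; idx < NUM_CONTEXT_BANKS; idx++) {
			if (model_context_fault_bank(hw, inst, idx) ==
			    IRQ_RESULT_HANDLED)
				ret = IRQ_RESULT_HANDLED;
		}
	}
	return ret;
}
```

The loop deliberately does not stop at the first hit, so simultaneous faults in several banks are all acknowledged in one pass; the cost is one status read per bank per instance on every interrupt.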
[PATCH v11 4/5] dt-bindings: arm-smmu: add binding for Tegra194 SMMU
Add binding for NVIDIA's Tegra194 SoC SMMU.

Reviewed-by: Jon Hunter
Reviewed-by: Rob Herring
Reviewed-by: Robin Murphy
Signed-off-by: Krishna Reddy
---
 .../devicetree/bindings/iommu/arm,smmu.yaml | 25 ++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
index 93fb9fe068b9..503160a7b9a0 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
@@ -44,6 +44,11 @@ properties:
         items:
           - const: marvell,ap806-smmu-500
           - const: arm,mmu-500
+      - description: NVIDIA SoCs that program two ARM MMU-500s identically
+        items:
+          - enum:
+              - nvidia,tegra194-smmu
+          - const: nvidia,smmu-500
       - items:
           - const: arm,mmu-500
           - const: arm,smmu-v2
@@ -61,7 +66,8 @@ properties:
               - cavium,smmu-v2
 
   reg:
-    maxItems: 1
+    minItems: 1
+    maxItems: 2
 
   '#global-interrupts':
     description: The number of global interrupts exposed by the device.
@@ -144,6 +150,23 @@ required:
 
 additionalProperties: false
 
+allOf:
+  - if:
+      properties:
+        compatible:
+          contains:
+            enum:
+              - nvidia,tegra194-smmu
+    then:
+      properties:
+        reg:
+          minItems: 2
+          maxItems: 2
+    else:
+      properties:
+        reg:
+          maxItems: 1
+
 examples:
   - |+
     /* SMMU with stream matching or stream indexing */
--
2.26.2
[PATCH v11 0/5] NVIDIA ARM SMMU Implementation
Changes in v11:
Addressed Rob comment on DT binding patch to set min/maxItems of reg property in else part.
Rebased on top of https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=for-joerg/arm-smmu/updates.

Changes in v10:
Perform SMMU base ioremap before calling implementation init.
Check for Global faults across both ARM MMU-500s during global interrupt.
Check for context faults across all contexts of both ARM MMU-500s during context fault interrupt.
Add new DT binding nvidia,smmu-500 for NVIDIA implementation.
https://lkml.org/lkml/2020/7/8/57

v9 - https://lkml.org/lkml/2020/6/30/1282
v8 - https://lkml.org/lkml/2020/6/29/2385
v7 - https://lkml.org/lkml/2020/6/28/347
v6 - https://lkml.org/lkml/2020/6/4/1018
v5 - https://lkml.org/lkml/2020/5/21/1114
v4 - https://lkml.org/lkml/2019/10/30/1054
v3 - https://lkml.org/lkml/2019/10/18/1601
v2 - https://lkml.org/lkml/2019/9/2/980
v1 - https://lkml.org/lkml/2019/8/29/1588

Krishna Reddy (5):
  iommu/arm-smmu: move TLB timeout and spin count macros
  iommu/arm-smmu: ioremap smmu mmio region before implementation init
  iommu/arm-smmu: add NVIDIA implementation for ARM MMU-500 usage
  dt-bindings: arm-smmu: add binding for Tegra194 SMMU
  iommu/arm-smmu: Add global/context fault implementation hooks

 .../devicetree/bindings/iommu/arm,smmu.yaml | 25 +-
 MAINTAINERS | 2 +
 drivers/iommu/Makefile | 2 +-
 drivers/iommu/arm-smmu-impl.c | 3 +
 drivers/iommu/arm-smmu-nvidia.c | 278 ++
 drivers/iommu/arm-smmu.c | 29 +-
 drivers/iommu/arm-smmu.h | 6 +
 7 files changed, 334 insertions(+), 11 deletions(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c

base-commit: 49fbb25030265c660de732513f18275d88ff99d3
--
2.26.2
[PATCH v11 3/5] iommu/arm-smmu: add NVIDIA implementation for ARM MMU-500 usage
NVIDIA's Tegra194 SoC has three ARM MMU-500 instances.
It uses two of the ARM MMU-500s together to interleave IOVA accesses
across them, and they must be programmed identically.
This implementation supports programming the two ARM MMU-500s that
must be programmed identically.

The third ARM MMU-500 instance is supported by standard arm-smmu.c
driver itself.

Reviewed-by: Jon Hunter
Reviewed-by: Nicolin Chen
Reviewed-by: Pritesh Raithatha
Reviewed-by: Robin Murphy
Reviewed-by: Thierry Reding
Signed-off-by: Krishna Reddy
---
 MAINTAINERS | 2 +
 drivers/iommu/Makefile | 2 +-
 drivers/iommu/arm-smmu-impl.c | 3 +
 drivers/iommu/arm-smmu-nvidia.c | 179
 drivers/iommu/arm-smmu.c | 1 +
 drivers/iommu/arm-smmu.h | 1 +
 6 files changed, 187 insertions(+), 1 deletion(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 496fd4eafb68..ee2c0ba13a0f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16810,8 +16810,10 @@ F: drivers/i2c/busses/i2c-tegra.c
 
 TEGRA IOMMU DRIVERS
 M:	Thierry Reding
+R:	Krishna Reddy
 L:	linux-te...@vger.kernel.org
 S:	Supported
+F:	drivers/iommu/arm-smmu-nvidia.c
 F:	drivers/iommu/tegra*
 
 TEGRA KBC DRIVER
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 342190196dfb..2b8203db73ec 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -15,7 +15,7 @@ obj-$(CONFIG_AMD_IOMMU) += amd/iommu.o amd/init.o amd/quirks.o
 obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd/debugfs.o
 obj-$(CONFIG_AMD_IOMMU_V2) += amd/iommu_v2.o
 obj-$(CONFIG_ARM_SMMU) += arm_smmu.o
-arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o
+arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-nvidia.o arm-smmu-qcom.o
 obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
 obj-$(CONFIG_DMAR_TABLE) += intel/dmar.o
 obj-$(CONFIG_INTEL_IOMMU) += intel/iommu.o intel/pasid.o
diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index c87d825f651e..f4ff124a1967 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -213,6 +213,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu)
 	if (of_property_read_bool(np, "calxeda,smmu-secure-config-access"))
 		smmu->impl = &calxeda_impl;
 
+	if (of_device_is_compatible(np, "nvidia,tegra194-smmu"))
+		return nvidia_smmu_impl_init(smmu);
+
 	if (of_device_is_compatible(np, "qcom,sdm845-smmu-500") ||
 	    of_device_is_compatible(np, "qcom,sc7180-smmu-500") ||
 	    of_device_is_compatible(np, "qcom,sm8150-smmu-500") ||
diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
new file mode 100644
index 000000000000..2f55e5793d34
--- /dev/null
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -0,0 +1,179 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright (C) 2019-2020 NVIDIA CORPORATION. All rights reserved.
+
+#include
+#include
+#include
+#include
+#include
+
+#include "arm-smmu.h"
+
+/*
+ * Tegra194 has three ARM MMU-500 Instances.
+ * Two of them are used together and must be programmed identically for
+ * interleaved IOVA accesses across them and translates accesses from
+ * non-isochronous HW devices.
+ * Third one is used for translating accesses from isochronous HW devices.
+ * This implementation supports programming of the two instances that must
+ * be programmed identically.
+ * The third instance usage is through standard arm-smmu driver itself and
+ * is out of scope of this implementation.
+ */
+#define NUM_SMMU_INSTANCES 2
+
+struct nvidia_smmu {
+	struct arm_smmu_device	smmu;
+	void __iomem		*bases[NUM_SMMU_INSTANCES];
+};
+
+static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu,
+					     unsigned int inst, int page)
+{
+	struct nvidia_smmu *nvidia_smmu;
+
+	nvidia_smmu = container_of(smmu, struct nvidia_smmu, smmu);
+	return nvidia_smmu->bases[inst] + (page << smmu->pgshift);
+}
+
+static u32 nvidia_smmu_read_reg(struct arm_smmu_device *smmu,
+				int page, int offset)
+{
+	void __iomem *reg = nvidia_smmu_page(smmu, 0, page) + offset;
+
+	return readl_relaxed(reg);
+}
+
+static void nvidia_smmu_write_reg(struct arm_smmu_device *smmu,
+				  int page, int offset, u32 val)
+{
+	unsigned int i;
+
+	for (i = 0; i < NUM_SMMU_INSTANCES; i++) {
+		void __iomem *reg = nvidia_smmu_page(smmu, i, page) + offset;
+
+		writel_relaxed(val, reg);
+	}
+}
+
+static u64 nvidia_smmu_read_reg64(struct arm_smmu_device *smmu,
+				  int page, int offset)
+{
+	void __iomem *reg = nvidia_smmu_page(smmu, 0, page) + offset;
+
+	return readq_relaxe
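To make the register access scheme in this patch easier to follow (reads come from instance 0, writes are mirrored to both instances, and the wrapper struct is recovered from the embedded generic struct), here is a small userspace model. It is only a sketch under stated assumptions: the model_* names, the byte-array "MMIO" regions, and the tiny page shift are hypothetical and stand in for the kernel's __iomem accessors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_SMMU_INSTANCES 2
#define MODEL_PGSHIFT      4 /* hypothetical: 16-byte register "pages" */
#define MODEL_NUM_PAGES    4 /* hypothetical region size in pages */

/* Reduced stand-in for the generic device structure. */
struct model_smmu {
	unsigned int pgshift;
};

/* Wrapper that embeds the generic structure, as the patch does. */
struct model_nvidia_smmu {
	struct model_smmu smmu;
	uint8_t *bases[NUM_SMMU_INSTANCES];
};

/* Userspace equivalent of recovering the wrapper from its member. */
#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static uint8_t *model_page(struct model_smmu *smmu, unsigned int inst, int page)
{
	struct model_nvidia_smmu *nv;

	assert(inst < NUM_SMMU_INSTANCES && page >= 0);
	nv = model_container_of(smmu, struct model_nvidia_smmu, smmu);
	return nv->bases[inst] + ((size_t)page << smmu->pgshift);
}

/* Reads only consult instance 0; both are kept identical by the writes. */
static uint32_t model_read_reg(struct model_smmu *smmu, int page, int offset)
{
	uint32_t val;

	memcpy(&val, model_page(smmu, 0, page) + offset, sizeof(val));
	return val;
}

/* Every write is repeated for each instance. */
static void model_write_reg(struct model_smmu *smmu, int page, int offset,
			    uint32_t val)
{
	unsigned int i;

	for (i = 0; i < NUM_SMMU_INSTANCES; i++)
		memcpy(model_page(smmu, i, page) + offset, &val, sizeof(val));
}
```

Because the generic code only ever sees the embedded structure and the read/write hooks, it does not need to know that a second instance exists.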
[PATCH v11 2/5] iommu/arm-smmu: ioremap smmu mmio region before implementation init
ioremap smmu mmio region before calling into implementation init.
This is necessary to make the mapped address available during vendor
specific implementation init.

Reviewed-by: Jon Hunter
Reviewed-by: Nicolin Chen
Reviewed-by: Pritesh Raithatha
Reviewed-by: Robin Murphy
Reviewed-by: Thierry Reding
Signed-off-by: Krishna Reddy
---
 drivers/iommu/arm-smmu.c | 8
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index cdd15ead9bc4..de520115d3df 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -2123,10 +2123,6 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
 	if (err)
 		return err;
 
-	smmu = arm_smmu_impl_init(smmu);
-	if (IS_ERR(smmu))
-		return PTR_ERR(smmu);
-
 	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
 	ioaddr = res->start;
 	smmu->base = devm_ioremap_resource(dev, res);
@@ -2138,6 +2134,10 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
 	 */
 	smmu->numpage = resource_size(res);
 
+	smmu = arm_smmu_impl_init(smmu);
+	if (IS_ERR(smmu))
+		return PTR_ERR(smmu);
+
 	num_irqs = 0;
 	while ((res = platform_get_resource(pdev, IORESOURCE_IRQ, num_irqs))) {
 		num_irqs++;
--
2.26.2
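The ordering dependency this patch addresses can be illustrated with a toy model, offered here only as a sketch. It assumes a vendor init that needs the mapped base to already be set; struct model_smmu and the probe_* and model_* functions are hypothetical and do not mirror the real probe function beyond the order of the two steps.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced model of the probe-time state. */
struct model_smmu {
	void *base;     /* mapped MMIO base, NULL until the mapping step ran */
	int impl_ready; /* set once implementation init succeeded */
};

/* Hypothetical vendor init: it needs the mapped base to be present. */
static int model_impl_init(struct model_smmu *smmu)
{
	if (smmu->base == NULL)
		return -1;
	smmu->impl_ready = 1;
	return 0;
}

/* Stand-in for the ioremap step. */
static void model_ioremap(struct model_smmu *smmu, void *mmio)
{
	assert(mmio != NULL);
	smmu->base = mmio;
}

/* Ordering before the patch: implementation init runs first. */
static int probe_impl_first(struct model_smmu *smmu, void *mmio)
{
	int err = model_impl_init(smmu);

	model_ioremap(smmu, mmio);
	return err;
}

/* Ordering after the patch: map first, then implementation init. */
static int probe_map_first(struct model_smmu *smmu, void *mmio)
{
	model_ioremap(smmu, mmio);
	return model_impl_init(smmu);
}
```

In the model, only the second ordering lets the vendor init see a valid base, which is the property the patch description asks for.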
RE: [PATCH v10 0/5] NVIDIA ARM SMMU Implementation
>On Mon, Jul 13, 2020 at 02:50:20PM +0100, Will Deacon wrote:
>> On Tue, Jul 07, 2020 at 10:00:12PM -0700, Krishna Reddy wrote:
>> > Changes in v10:
>> > Perform SMMU base ioremap before calling implementation init.
>> > Check for Global faults across both ARM MMU-500s during global interrupt.
>> > Check for context faults across all contexts of both ARM MMU-500s during
>> > context fault interrupt.
>> > Add new DT binding nvidia,smmu-500 for NVIDIA implementation.
>>
>> Please repost based on my SMMU queue, as this doesn't currently apply.
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=for-joerg/arm-smmu/updates

>Any update on this, please? It would be a shame to miss 5.9 now that the
>patches look alright.

Thanks for pinging. I have been out of the office this week. Just started working on reposting patches. Will
RE: [PATCH v10 4/5] dt-bindings: arm-smmu: add binding for Tegra194 SMMU
Thanks Rob. One question on setting "minItems: ". Please see below.

>> +allOf:
>> +  - if:
>> +      properties:
>> +        compatible:
>> +          contains:
>> +            enum:
>> +              - nvidia,tegra194-smmu
>> +    then:
>> +      properties:
>> +        reg:
>> +          minItems: 2
>> +          maxItems: 2

>This doesn't work. The main part of the schema already said there's only
>1 reg region. This part is ANDed with that, not an override. You need to add
>an else clause with 'maxItems: 1' and change the base schema to
>{minItems: 1, maxItems: 2}.

As the earlier version of the base schema doesn't have "minItems: " set, should it be set to 0 for backward compatibility? Or can it simply be left out of the base schema as before?
The "else" part setting "maxItems: 1" and setting "maxItems: 2" in the base schema are clear to me.

-KR
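Rob's point that the conditional is ANDed with the base schema, rather than overriding it, can be checked with a few lines of C. This is only a sketch of the logic, not of the schema tooling: reg_valid_v10 and reg_valid_v11 are hypothetical names, the compatible test is reduced to a single string comparison, and the unset "minItems" in the v10 base schema is deliberately not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

static bool is_tegra194(const char *compatible)
{
	assert(compatible != NULL);
	return strcmp(compatible, "nvidia,tegra194-smmu") == 0;
}

/*
 * v10 shape: base schema says maxItems 1, conditional "then" says
 * exactly 2, and there is no "else". Both parts must hold at once.
 */
static bool reg_valid_v10(const char *compatible, unsigned int nregs)
{
	bool base_ok = nregs <= 1;
	bool cond_ok = is_tegra194(compatible) ? (nregs == 2) : true;

	return base_ok && cond_ok;
}

/*
 * Suggested shape: base schema {minItems 1, maxItems 2}, "then" says
 * exactly 2, "else" says maxItems 1.
 */
static bool reg_valid_v11(const char *compatible, unsigned int nregs)
{
	bool base_ok = nregs >= 1 && nregs <= 2;
	bool cond_ok = is_tegra194(compatible) ? (nregs == 2) : (nregs <= 1);

	return base_ok && cond_ok;
}
```

Under the v10 shape no reg count can satisfy both parts for Tegra194, while the suggested shape accepts exactly two regions for Tegra194 and exactly one for everything else.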
[PATCH v10 5/5] iommu/arm-smmu: Add global/context fault implementation hooks
Add global/context fault hooks to allow vendor specific implementations
override default fault interrupt handlers.

Update NVIDIA implementation to override the default global/context fault
interrupt handlers and handle interrupts across the two ARM MMU-500s that
are programmed identically.

Signed-off-by: Krishna Reddy
---
 drivers/iommu/arm-smmu-nvidia.c | 99 +
 drivers/iommu/arm-smmu.c | 17 +-
 drivers/iommu/arm-smmu.h | 3 +
 3 files changed, 117 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
index 2f55e5793d34..31368057e9be 100644
--- a/drivers/iommu/arm-smmu-nvidia.c
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -127,6 +127,103 @@ static int nvidia_smmu_reset(struct arm_smmu_device *smmu)
 	return 0;
 }
 
+static irqreturn_t nvidia_smmu_global_fault_inst(int irq,
+						 struct arm_smmu_device *smmu,
+						 int inst)
+{
+	u32 gfsr, gfsynr0, gfsynr1, gfsynr2;
+	void __iomem *gr0_base = nvidia_smmu_page(smmu, inst, 0);
+
+	gfsr = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSR);
+	if (!gfsr)
+		return IRQ_NONE;
+
+	gfsynr0 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR0);
+	gfsynr1 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR1);
+	gfsynr2 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR2);
+
+	dev_err_ratelimited(smmu->dev,
+			    "Unexpected global fault, this could be serious\n");
+	dev_err_ratelimited(smmu->dev,
+			    "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 0x%08x\n",
+			    gfsr, gfsynr0, gfsynr1, gfsynr2);
+
+	writel_relaxed(gfsr, gr0_base + ARM_SMMU_GR0_sGFSR);
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t nvidia_smmu_global_fault(int irq, void *dev)
+{
+	unsigned int inst;
+	irqreturn_t ret = IRQ_NONE;
+	struct arm_smmu_device *smmu = dev;
+
+	for (inst = 0; inst < NUM_SMMU_INSTANCES; inst++) {
+		irqreturn_t irq_ret;
+
+		irq_ret = nvidia_smmu_global_fault_inst(irq, smmu, inst);
+		if (irq_ret == IRQ_HANDLED)
+			ret = IRQ_HANDLED;
+	}
+
+	return ret;
+}
+
+static irqreturn_t nvidia_smmu_context_fault_bank(int irq,
+						  struct arm_smmu_device *smmu,
+						  int idx, int inst)
+{
+	u32 fsr, fsynr, cbfrsynra;
+	unsigned long iova;
+	void __iomem *gr1_base = nvidia_smmu_page(smmu, inst, 1);
+	void __iomem *cb_base = nvidia_smmu_page(smmu, inst, smmu->numpage + idx);
+
+	fsr = readl_relaxed(cb_base + ARM_SMMU_CB_FSR);
+	if (!(fsr & ARM_SMMU_FSR_FAULT))
+		return IRQ_NONE;
+
+	fsynr = readl_relaxed(cb_base + ARM_SMMU_CB_FSYNR0);
+	iova = readq_relaxed(cb_base + ARM_SMMU_CB_FAR);
+	cbfrsynra = readl_relaxed(gr1_base + ARM_SMMU_GR1_CBFRSYNRA(idx));
+
+	dev_err_ratelimited(smmu->dev,
+			    "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
+			    fsr, iova, fsynr, cbfrsynra, idx);
+
+	writel_relaxed(fsr, cb_base + ARM_SMMU_CB_FSR);
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t nvidia_smmu_context_fault(int irq, void *dev)
+{
+	int idx;
+	unsigned int inst;
+	irqreturn_t ret = IRQ_NONE;
+	struct arm_smmu_device *smmu;
+	struct iommu_domain *domain = dev;
+	struct arm_smmu_domain *smmu_domain;
+
+	smmu_domain = container_of(domain, struct arm_smmu_domain, domain);
+	smmu = smmu_domain->smmu;
+
+	for (inst = 0; inst < NUM_SMMU_INSTANCES; inst++) {
+		irqreturn_t irq_ret;
+
+		/*
+		 * Interrupt line is shared between all contexts.
+		 * Check for faults across all contexts.
+		 */
+		for (idx = 0; idx < smmu->num_context_banks; idx++) {
+			irq_ret = nvidia_smmu_context_fault_bank(irq, smmu,
+								 idx, inst);
+			if (irq_ret == IRQ_HANDLED)
+				ret = IRQ_HANDLED;
+		}
+	}
+
+	return ret;
+}
+
 static const struct arm_smmu_impl nvidia_smmu_impl = {
 	.read_reg = nvidia_smmu_read_reg,
 	.write_reg = nvidia_smmu_write_reg,
@@ -134,6 +231,8 @@ static const struct arm_smmu_impl nvidia_smmu_impl = {
 	.write_reg64 = nvidia_smmu_write_reg64,
 	.reset = nvidia_smmu_reset,
 	.tlb_sync = nvidia_smmu_tlb_sync,
+	.global_fault = nvidia_smmu_global_fault,
+	.context_fault = nvidia_smmu_context_fault,
 };
 
 struct arm_smmu_device *nvidia_smmu
[PATCH v10 1/5] iommu/arm-smmu: move TLB timeout and spin count macros
Move TLB timeout and spin count macros to header file to allow using the
same from vendor specific implementations.

Signed-off-by: Krishna Reddy
---
 drivers/iommu/arm-smmu.c | 3 ---
 drivers/iommu/arm-smmu.h | 2 ++
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 243bc4cb2705..d2054178df35 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -52,9 +52,6 @@
  */
 #define QCOM_DUMMY_VAL -1
 
-#define TLB_LOOP_TIMEOUT 1000000 /* 1s! */
-#define TLB_SPIN_COUNT 10
-
 #define MSI_IOVA_BASE 0x8000000
 #define MSI_IOVA_LENGTH 0x100000
 
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index d172c024be61..c7d0122a7c6c 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -236,6 +236,8 @@ enum arm_smmu_cbar_type {
 /* Maximum number of context banks per SMMU */
 #define ARM_SMMU_MAX_CBS 128
 
+#define TLB_LOOP_TIMEOUT 1000000 /* 1s! */
+#define TLB_SPIN_COUNT 10
 
 /* Shared driver definitions */
 enum arm_smmu_arch_version {
--
2.26.2
[PATCH v10 3/5] iommu/arm-smmu: add NVIDIA implementation for ARM MMU-500 usage
NVIDIA's Tegra194 SoC has three ARM MMU-500 instances.
It uses two of the ARM MMU-500s together to interleave IOVA accesses
across them, and they must be programmed identically.
This implementation supports programming the two ARM MMU-500s that
must be programmed identically.

The third ARM MMU-500 instance is supported by standard arm-smmu.c
driver itself.

Signed-off-by: Krishna Reddy
---
 MAINTAINERS | 2 +
 drivers/iommu/Makefile | 2 +-
 drivers/iommu/arm-smmu-impl.c | 3 +
 drivers/iommu/arm-smmu-nvidia.c | 179
 drivers/iommu/arm-smmu.c | 1 +
 drivers/iommu/arm-smmu.h | 1 +
 6 files changed, 187 insertions(+), 1 deletion(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c

diff --git a/MAINTAINERS b/MAINTAINERS
index c23352059a6b..534cedaf8e55 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16811,8 +16811,10 @@ F: drivers/i2c/busses/i2c-tegra.c
 
 TEGRA IOMMU DRIVERS
 M:	Thierry Reding
+R:	Krishna Reddy
 L:	linux-te...@vger.kernel.org
 S:	Supported
+F:	drivers/iommu/arm-smmu-nvidia.c
 F:	drivers/iommu/tegra*
 
 TEGRA KBC DRIVER
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 342190196dfb..2b8203db73ec 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -15,7 +15,7 @@ obj-$(CONFIG_AMD_IOMMU) += amd/iommu.o amd/init.o amd/quirks.o
 obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd/debugfs.o
 obj-$(CONFIG_AMD_IOMMU_V2) += amd/iommu_v2.o
 obj-$(CONFIG_ARM_SMMU) += arm_smmu.o
-arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o
+arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-nvidia.o arm-smmu-qcom.o
 obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
 obj-$(CONFIG_DMAR_TABLE) += intel/dmar.o
 obj-$(CONFIG_INTEL_IOMMU) += intel/iommu.o intel/pasid.o
diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index c75b9d957b70..f15571d05474 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -171,6 +171,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu)
 	if (of_property_read_bool(np, "calxeda,smmu-secure-config-access"))
 		smmu->impl = &calxeda_impl;
 
+	if (of_device_is_compatible(np, "nvidia,tegra194-smmu"))
+		return nvidia_smmu_impl_init(smmu);
+
 	if (of_device_is_compatible(np, "qcom,sdm845-smmu-500") ||
 	    of_device_is_compatible(np, "qcom,sc7180-smmu-500"))
 		return qcom_smmu_impl_init(smmu);
diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
new file mode 100644
index 000000000000..2f55e5793d34
--- /dev/null
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -0,0 +1,179 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright (C) 2019-2020 NVIDIA CORPORATION. All rights reserved.
+
+#include
+#include
+#include
+#include
+#include
+
+#include "arm-smmu.h"
+
+/*
+ * Tegra194 has three ARM MMU-500 Instances.
+ * Two of them are used together and must be programmed identically for
+ * interleaved IOVA accesses across them and translates accesses from
+ * non-isochronous HW devices.
+ * Third one is used for translating accesses from isochronous HW devices.
+ * This implementation supports programming of the two instances that must
+ * be programmed identically.
+ * The third instance usage is through standard arm-smmu driver itself and
+ * is out of scope of this implementation.
+ */
+#define NUM_SMMU_INSTANCES 2
+
+struct nvidia_smmu {
+	struct arm_smmu_device	smmu;
+	void __iomem		*bases[NUM_SMMU_INSTANCES];
+};
+
+static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu,
+					     unsigned int inst, int page)
+{
+	struct nvidia_smmu *nvidia_smmu;
+
+	nvidia_smmu = container_of(smmu, struct nvidia_smmu, smmu);
+	return nvidia_smmu->bases[inst] + (page << smmu->pgshift);
+}
+
+static u32 nvidia_smmu_read_reg(struct arm_smmu_device *smmu,
+				int page, int offset)
+{
+	void __iomem *reg = nvidia_smmu_page(smmu, 0, page) + offset;
+
+	return readl_relaxed(reg);
+}
+
+static void nvidia_smmu_write_reg(struct arm_smmu_device *smmu,
+				  int page, int offset, u32 val)
+{
+	unsigned int i;
+
+	for (i = 0; i < NUM_SMMU_INSTANCES; i++) {
+		void __iomem *reg = nvidia_smmu_page(smmu, i, page) + offset;
+
+		writel_relaxed(val, reg);
+	}
+}
+
+static u64 nvidia_smmu_read_reg64(struct arm_smmu_device *smmu,
+				  int page, int offset)
+{
+	void __iomem *reg = nvidia_smmu_page(smmu, 0, page) + offset;
+
+	return readq_relaxed(reg);
+}
+
+static void nvidia_smmu_write_reg64(struct arm_smmu_device *smmu,
+				    int page, int offset, u64 val)
+{
+	unsigned int i;
+
+
[PATCH v10 0/5] NVIDIA ARM SMMU Implementation
Changes in v10:
Perform SMMU base ioremap before calling implementation init.
Check for Global faults across both ARM MMU-500s during global interrupt.
Check for context faults across all contexts of both ARM MMU-500s during context fault interrupt.
Add new DT binding nvidia,smmu-500 for NVIDIA implementation.

v9 - https://lkml.org/lkml/2020/6/30/1282
v8 - https://lkml.org/lkml/2020/6/29/2385
v7 - https://lkml.org/lkml/2020/6/28/347
v6 - https://lkml.org/lkml/2020/6/4/1018
v5 - https://lkml.org/lkml/2020/5/21/1114
v4 - https://lkml.org/lkml/2019/10/30/1054
v3 - https://lkml.org/lkml/2019/10/18/1601
v2 - https://lkml.org/lkml/2019/9/2/980
v1 - https://lkml.org/lkml/2019/8/29/1588

Krishna Reddy (5):
  iommu/arm-smmu: move TLB timeout and spin count macros
  iommu/arm-smmu: ioremap smmu mmio region before implementation init
  iommu/arm-smmu: add NVIDIA implementation for ARM MMU-500 usage
  dt-bindings: arm-smmu: add binding for Tegra194 SMMU
  iommu/arm-smmu: Add global/context fault implementation hooks

 .../devicetree/bindings/iommu/arm,smmu.yaml | 18 ++
 MAINTAINERS | 2 +
 drivers/iommu/Makefile | 2 +-
 drivers/iommu/arm-smmu-impl.c | 3 +
 drivers/iommu/arm-smmu-nvidia.c | 278 ++
 drivers/iommu/arm-smmu.c | 29 +-
 drivers/iommu/arm-smmu.h | 6 +
 7 files changed, 328 insertions(+), 10 deletions(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c

base-commit: e5640f21b63d2a5d3e4e0c4111b2b38e99fe5164
--
2.26.2
[PATCH v10 2/5] iommu/arm-smmu: ioremap smmu mmio region before implementation init
ioremap smmu mmio region before calling into implementation init.
This is necessary to make the mapped address available during vendor
specific implementation init.

Signed-off-by: Krishna Reddy
---
 drivers/iommu/arm-smmu.c | 8
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index d2054178df35..e03e873d3bca 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -2120,10 +2120,6 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
 	if (err)
 		return err;
 
-	smmu = arm_smmu_impl_init(smmu);
-	if (IS_ERR(smmu))
-		return PTR_ERR(smmu);
-
 	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
 	ioaddr = res->start;
 	smmu->base = devm_ioremap_resource(dev, res);
@@ -2135,6 +2131,10 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
 	 */
 	smmu->numpage = resource_size(res);
 
+	smmu = arm_smmu_impl_init(smmu);
+	if (IS_ERR(smmu))
+		return PTR_ERR(smmu);
+
 	num_irqs = 0;
 	while ((res = platform_get_resource(pdev, IORESOURCE_IRQ, num_irqs))) {
 		num_irqs++;
--
2.26.2
[PATCH v10 4/5] dt-bindings: arm-smmu: add binding for Tegra194 SMMU
Add binding for NVIDIA's Tegra194 SoC SMMU.

Signed-off-by: Krishna Reddy
---
 .../devicetree/bindings/iommu/arm,smmu.yaml | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
index d7ceb4c34423..ac1f526c3424 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
@@ -38,6 +38,11 @@ properties:
               - qcom,sc7180-smmu-500
               - qcom,sdm845-smmu-500
           - const: arm,mmu-500
+      - description: NVIDIA SoCs that program two ARM MMU-500s identically
+        items:
+          - enum:
+              - nvidia,tegra194-smmu
+          - const: nvidia,smmu-500
       - items:
           - const: arm,mmu-500
           - const: arm,smmu-v2
@@ -138,6 +143,19 @@ required:
 
 additionalProperties: false
 
+allOf:
+  - if:
+      properties:
+        compatible:
+          contains:
+            enum:
+              - nvidia,tegra194-smmu
+    then:
+      properties:
+        reg:
+          minItems: 2
+          maxItems: 2
+
 examples:
   - |+
     /* SMMU with stream matching or stream indexing */
--
2.26.2
RE: [PATCH v8 2/3] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
On 01/07/2020 20:00, Krishna Reddy wrote:
>>>>>> +        items:
>>>>>> +          - enum:
>>>>>> +              - nvidia,tegra194-smmu
>>>>>> +          - const: arm,mmu-500
>>>>
>>>>> Is the fallback compatible appropriate here? If software treats this
>>>>> as a standard MMU-500 it will only program the first instance (because
>>>>> the second isn't presented as a separate MMU-500) - is there any way
>>>>> that isn't going to blow up?
>>>>
>>>> When compatible is set to both nvidia,tegra194-smmu and arm,mmu-500,
>>>> the implementation override ensures that both instances are programmed,
>>>> doesn't it? I am not sure I fully follow your comment.
>>
>>> The problem is, if for some reason someone had a Tegra194, but only set
>>> the compatible string to 'arm,mmu-500' it would assume that it was a
>>> normal arm,mmu-500 and only one instance would be programmed. We always
>>> want at least 2 of the 3 instances programmed and so we should only
>>> match 'nvidia,tegra194-smmu'. In fact, I think that we also need to
>>> update the arm_smmu_of_match table to add 'nvidia,tegra194-smmu' with
>>> the data set to &arm_mmu500.
>>
>> In that case, a new binding "nvidia,smmu-v2" can be added with data set
>> to &arm_mmu500, and the enumeration would have nvidia,tegra194-smmu plus
>> another variant for a next-generation SoC in future.

> I think you would be better off with nvidia,smmu-500 as smmu-v2 appears
> to be something different. I see others have a smmu-v2 but I am not sure
> if that is legacy. We have an smmu-500 and so that would seem more
> appropriate.

I tried to name the binding in line with what other vendors use. v2 is the architecture version; MMU-500 is ARM's actual implementation of the v2 architecture. Since we use the MMU-500 IP as-is, it can just as well be named nvidia,smmu-500 or similar. Other vendors probably have their own implementations based on the v2 architecture.

-KR
--
nvpublic
RE: [PATCH v8 3/3] iommu/arm-smmu: Add global/context fault implementation hooks
>> With a shared IRQ line, context fault identification is already not
>> optimal. Reading all the context banks every time adds MMIO read
>> overhead, but it should not hurt real use cases, as faults happen only
>> when there are bugs.

> Right, I did ponder the idea of a whole programmatic
> "request_context_irq" hook that would allow registering the handler for
> both interrupts with the appropriate context bank and instance data, but
> since all interrupts are currently unexpected it seems somewhat hard to
> justify the extra complexity. Obviously we can revisit this in future if
> you want to start actually doing something with faults like the qcom GPU
> folks do.

Thanks. I would avoid changing the interrupt handlers until it is really necessary. The current code stays simple and functional; it just takes more interrupts when there are multiple faults.

-KR
RE: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
> Yeah, I realised later last night that this probably originated from
> forking the whole driver downstream. But even then you could have treated
> the other one as a separate nsmmu with a single instance ;)

True, but the initial NVIDIA implementation had the limitation that it could handle only one instance of usage. With your implementation-hooks design, it should be able to handle multiple instances of usage now.

> Since it does add a bit of confusion to the code and comments, let's just
> keep things simple. I do like Jon's suggestion of actually enforcing that
> the number of "reg" regions exactly matches the number expected for the
> given compatible - I guess for now that means just hard-coding 2 and
> hoping the hardware folks don't cook up any more of these...

For T194, the number of "reg" regions can just be forced to 2. As of now there is no plan to use more than two MMU-500s together.

-KR
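For readers of the archive, the idea of enforcing an exact "reg" count per compatible could look roughly like the sketch below. It is plain userspace C, not driver code: struct reg_rule, reg_rules[] and check_reg_count() are made-up names for illustration, and returning -EINVAL on a mismatch is an assumption, not what the posted patches do.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical rule: a compatible string and the exact number of
 * "reg" regions expected for it. */
struct reg_rule {
	const char *compatible;
	unsigned int num_regs;
};

/* Only Tegra194 is listed, hard-coded to 2 as discussed above. */
static const struct reg_rule reg_rules[] = {
	{ "nvidia,tegra194-smmu", 2 },
};

/* Return 0 if the number of regions found is acceptable for the given
 * compatible, or -EINVAL (an assumed error code) if it is not. */
static int check_reg_count(const char *compatible, unsigned int found)
{
	size_t i;

	for (i = 0; i < sizeof(reg_rules) / sizeof(reg_rules[0]); i++) {
		if (strcmp(compatible, reg_rules[i].compatible) == 0)
			return found == reg_rules[i].num_regs ? 0 : -EINVAL;
	}

	/* No rule for this compatible: nothing to enforce. */
	return 0;
}
```

A table keeps the expected count next to the compatible string, so a future SoC variant would only need one more entry.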
RE: [PATCH v8 2/3] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
>>>> +        items:
>>>> +          - enum:
>>>> +              - nvidia,tegra194-smmu
>>>> +          - const: arm,mmu-500
>>>
>>> Is the fallback compatible appropriate here? If software treats this as
>>> a standard MMU-500 it will only program the first instance (because the
>>> second isn't presented as a separate MMU-500) - is there any way that
>>> isn't going to blow up?
>>
>> When compatible is set to both nvidia,tegra194-smmu and arm,mmu-500,
>> the implementation override ensures that both instances are programmed,
>> doesn't it? I am not sure I fully follow your comment.

> The problem is, if for some reason someone had a Tegra194, but only set
> the compatible string to 'arm,mmu-500' it would assume that it was a
> normal arm,mmu-500 and only one instance would be programmed. We always
> want at least 2 of the 3 instances programmed and so we should only match
> 'nvidia,tegra194-smmu'. In fact, I think that we also need to update the
> arm_smmu_of_match table to add 'nvidia,tegra194-smmu' with the data set
> to &arm_mmu500.

In that case, a new binding "nvidia,smmu-v2" can be added with data set to &arm_mmu500, and the enumeration would have nvidia,tegra194-smmu plus another variant for a next-generation SoC in future.

-KR
--
nvpublic
RE: [PATCH v8 3/3] iommu/arm-smmu: Add global/context fault implementation hooks
>>> +	for (inst = 0; inst < nvidia_smmu->num_inst; inst++) {
>>> +		irq_ret = nvidia_smmu_global_fault_inst(irq, smmu, inst);
>>> +		if (irq_ret == IRQ_HANDLED)
>>> +			return irq_ret;
>>
>> Any chance there could be more than one SMMU faulting by the time we
>> service the interrupt?

> It certainly seems plausible if the interconnect is automatically
> load-balancing requests across the SMMU instances - say a driver bug
> caused a buffer to be unmapped too early, there could be many in-flight
> accesses to parts of that buffer that aren't all taking the same path and
> thus could now fault in parallel.
> [ And anyone inclined to nitpick global vs. context faults, s/unmap a
> buffer/tear down a domain/ ;) ]
> Either way I think it would be easier to reason about if we just handled
> these like a typical shared interrupt and always checked all the
> instances.

It would be optimal to check all the instances at the same time.

>>> +		for (idx = 0; idx < smmu->num_context_banks; idx++) {
>>> +			irq_ret = nvidia_smmu_context_fault_bank(irq, smmu,
>>> +								 idx, inst);
>>> +
>>> +			if (irq_ret == IRQ_HANDLED)
>>> +				return irq_ret;
>>
>> Any reason why we don't check all banks?

> As above, we certainly shouldn't bail out without checking the bank for
> the offending domain across all of its instances, and I guess the way
> this works means that we would have to iterate all the banks to achieve
> that.

With a shared IRQ line, context fault identification is already not optimal. Reading all the context banks every time adds MMIO read overhead, but it should not hurt real use cases, as faults happen only when there are bugs.

-KR
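The "typical shared interrupt" handling suggested above, where every instance is checked instead of returning on the first hit, can be sketched in userspace C as follows. This is only an illustration under stated assumptions: fake_gfsr[], NUM_INST, global_fault_inst() and global_fault_all() are hypothetical stand-ins, and the register array simulates hardware status registers rather than touching real MMIO.

```c
#include <stdint.h>

/* Mirrors the kernel's irqreturn_t values for the two cases used here. */
typedef enum { IRQ_NONE = 0, IRQ_HANDLED = 1 } irqreturn_t;

#define NUM_INST 2	/* hypothetical: two SMMU instances */

/* Hypothetical stand-in for each instance's global fault status register. */
static uint32_t fake_gfsr[NUM_INST];

/* Check one instance; clear its status if a fault was recorded. */
static irqreturn_t global_fault_inst(unsigned int inst)
{
	uint32_t gfsr = fake_gfsr[inst];

	if (!gfsr)
		return IRQ_NONE;

	/* Models the write-back that clears the fault status bits. */
	fake_gfsr[inst] = 0;
	return IRQ_HANDLED;
}

/* Visit every instance, remembering whether any of them had a fault,
 * instead of bailing out at the first IRQ_HANDLED. */
static irqreturn_t global_fault_all(void)
{
	irqreturn_t ret = IRQ_NONE;
	unsigned int inst;

	for (inst = 0; inst < NUM_INST; inst++) {
		if (global_fault_inst(inst) == IRQ_HANDLED)
			ret = IRQ_HANDLED;
	}

	return ret;
}
```

With this shape, two instances faulting before the interrupt is serviced are both cleared in a single handler invocation; the cost is that every instance is read on every interrupt.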
RE: [PATCH v8 2/3] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
>> +      - description: NVIDIA SoCs that use more than one "arm,mmu-500"

> Hmm, there must be a better way to word that to express that it only
> applies to the sets of SMMUs that must be programmed identically, and not
> any other independent MMU-500s that might also happen to be in the same
> SoC.

Let me reword it to "NVIDIA SoCs that must program multiple MMU-500s identically".

>> +        items:
>> +          - enum:
>> +              - nvidia,tegra194-smmu
>> +          - const: arm,mmu-500

> Is the fallback compatible appropriate here? If software treats this as a
> standard MMU-500 it will only program the first instance (because the
> second isn't presented as a separate MMU-500) - is there any way that
> isn't going to blow up?

When compatible is set to both nvidia,tegra194-smmu and arm,mmu-500, the implementation override ensures that both instances are programmed, doesn't it? I am not sure I fully follow your comment.

-KR
RE: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
>> + * When Linux kernel supports multiple SMMU devices, the SMMU device used for
>> + * isochronous HW devices should be added as a separate ARM MMU-500 device
>> + * in DT and be programmed independently for efficient TLB invalidates.

> I don't understand the "When" there - the driver has always supported
> multiple independent SMMUs, and it's not something that could be
> configured out or otherwise disabled. Plus I really don't see why you
> would ever want to force unrelated SMMUs to be programmed together -
> beyond the TLB thing mentioned it would also waste precious context bank
> resources and might lead to weird device grouping via false stream ID
> aliasing, with no obvious upside at all.

Sorry, I missed this comment. At the time of the initial patches, the iommu_ops differed between the SMMU drivers, so supporting multiple SMMU drivers at the same time was not possible: whichever one got probed last overwrote the platform bus ops. On revisiting the original issue, this problem is no longer relevant. At this point, it makes more sense to drop the third-instance programming from arm-smmu-nvidia.c and limit it to the SMMU instances that need identical programming.

-KR
[PATCH v9 4/4] iommu/arm-smmu: add global/context fault implementation hooks
Add global/context fault hooks to allow NVIDIA SMMU implementation handle faults across multiple SMMUs. Signed-off-by: Krishna Reddy --- drivers/iommu/arm-smmu-nvidia.c | 98 + drivers/iommu/arm-smmu.c| 17 +- drivers/iommu/arm-smmu.h| 3 + 3 files changed, 116 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c index 5c874912e1c1a..d279788eab954 100644 --- a/drivers/iommu/arm-smmu-nvidia.c +++ b/drivers/iommu/arm-smmu-nvidia.c @@ -144,6 +144,102 @@ static int nvidia_smmu_reset(struct arm_smmu_device *smmu) return 0; } +static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom) +{ + return container_of(dom, struct arm_smmu_domain, domain); +} + +static irqreturn_t nvidia_smmu_global_fault_inst(int irq, +struct arm_smmu_device *smmu, +int inst) +{ + u32 gfsr, gfsynr0, gfsynr1, gfsynr2; + void __iomem *gr0_base = nvidia_smmu_page(smmu, inst, 0); + + gfsr = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSR); + if (!gfsr) + return IRQ_NONE; + + gfsynr0 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR0); + gfsynr1 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR1); + gfsynr2 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR2); + + dev_err_ratelimited(smmu->dev, + "Unexpected global fault, this could be serious\n"); + dev_err_ratelimited(smmu->dev, + "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 0x%08x\n", + gfsr, gfsynr0, gfsynr1, gfsynr2); + + writel_relaxed(gfsr, gr0_base + ARM_SMMU_GR0_sGFSR); + return IRQ_HANDLED; +} + +static irqreturn_t nvidia_smmu_global_fault(int irq, void *dev) +{ + int inst; + irqreturn_t irq_ret = IRQ_NONE; + struct arm_smmu_device *smmu = dev; + struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu); + + for (inst = 0; inst < nvidia_smmu->num_inst; inst++) { + irq_ret = nvidia_smmu_global_fault_inst(irq, smmu, inst); + if (irq_ret == IRQ_HANDLED) + return irq_ret; + } + + return irq_ret; +} + +static irqreturn_t nvidia_smmu_context_fault_bank(int irq, + struct arm_smmu_device 
*smmu, + int idx, int inst) +{ + u32 fsr, fsynr, cbfrsynra; + unsigned long iova; + void __iomem *gr1_base = nvidia_smmu_page(smmu, inst, 1); + void __iomem *cb_base = nvidia_smmu_page(smmu, inst, smmu->numpage + idx); + + fsr = readl_relaxed(cb_base + ARM_SMMU_CB_FSR); + if (!(fsr & ARM_SMMU_FSR_FAULT)) + return IRQ_NONE; + + fsynr = readl_relaxed(cb_base + ARM_SMMU_CB_FSYNR0); + iova = readq_relaxed(cb_base + ARM_SMMU_CB_FAR); + cbfrsynra = readl_relaxed(gr1_base + ARM_SMMU_GR1_CBFRSYNRA(idx)); + + dev_err_ratelimited(smmu->dev, + "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n", + fsr, iova, fsynr, cbfrsynra, idx); + + writel_relaxed(fsr, cb_base + ARM_SMMU_CB_FSR); + return IRQ_HANDLED; +} + +static irqreturn_t nvidia_smmu_context_fault(int irq, void *dev) +{ + int inst, idx; + irqreturn_t irq_ret = IRQ_NONE; + struct iommu_domain *domain = dev; + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); + struct arm_smmu_device *smmu = smmu_domain->smmu; + + for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) { + /* +* Interrupt line is shared between all contexts. +* Check for faults across all contexts. +*/ + for (idx = 0; idx < smmu->num_context_banks; idx++) { + irq_ret = nvidia_smmu_context_fault_bank(irq, smmu, +idx, inst); + + if (irq_ret == IRQ_HANDLED) + return irq_ret; + } + } + + return irq_ret; +} + static const struct arm_smmu_impl nvidia_smmu_impl = { .read_reg = nvidia_smmu_read_reg, .write_reg = nvidia_smmu_write_reg, @@ -151,6 +247,8 @@ static const struct arm_smmu_impl nvidia_smmu_impl = { .write_reg64 = nvidia_smmu_write_reg64, .reset = nvidia_smmu_reset, .tlb_sync = nvidia_smmu_tlb_sync, + .global_fault = nvidia_smmu_global_fault, + .context_fault = nvidia_smmu_context_fault, }; struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index d2054
[PATCH v9 2/4] iommu/arm-smmu: add NVIDIA implementation for ARM MMU-500 usage
NVIDIA's Tegra194 SoC has three ARM MMU-500 instances. It uses two of ARM MMU-500s together to interleave IOVA accesses across them and must be programmed identically. The third SMMU instance is used as a regular ARM MMU-500 and it can either be programmed independently or identical to other two ARM MMU-500s. This implementation supports programming two or three ARM MMU-500s identically as per DT config. Signed-off-by: Krishna Reddy --- MAINTAINERS | 2 + drivers/iommu/Makefile | 2 +- drivers/iommu/arm-smmu-impl.c | 3 + drivers/iommu/arm-smmu-nvidia.c | 206 drivers/iommu/arm-smmu.h| 1 + 5 files changed, 213 insertions(+), 1 deletion(-) create mode 100644 drivers/iommu/arm-smmu-nvidia.c diff --git a/MAINTAINERS b/MAINTAINERS index 7b5ffd646c6b9..64c37dbdd4426 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -16808,8 +16808,10 @@ F: drivers/i2c/busses/i2c-tegra.c TEGRA IOMMU DRIVERS M: Thierry Reding +R: Krishna Reddy L: linux-te...@vger.kernel.org S: Supported +F: drivers/iommu/arm-smmu-nvidia.c F: drivers/iommu/tegra* TEGRA KBC DRIVER diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile index 342190196dfb0..2b8203db73ec3 100644 --- a/drivers/iommu/Makefile +++ b/drivers/iommu/Makefile @@ -15,7 +15,7 @@ obj-$(CONFIG_AMD_IOMMU) += amd/iommu.o amd/init.o amd/quirks.o obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd/debugfs.o obj-$(CONFIG_AMD_IOMMU_V2) += amd/iommu_v2.o obj-$(CONFIG_ARM_SMMU) += arm_smmu.o -arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o +arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-nvidia.o arm-smmu-qcom.o obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o obj-$(CONFIG_DMAR_TABLE) += intel/dmar.o obj-$(CONFIG_INTEL_IOMMU) += intel/iommu.o intel/pasid.o diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c index c75b9d957b702..f15571d05474e 100644 --- a/drivers/iommu/arm-smmu-impl.c +++ b/drivers/iommu/arm-smmu-impl.c @@ -171,6 +171,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu) if 
(of_property_read_bool(np, "calxeda,smmu-secure-config-access")) smmu->impl = _impl; + if (of_device_is_compatible(np, "nvidia,tegra194-smmu")) + return nvidia_smmu_impl_init(smmu); + if (of_device_is_compatible(np, "qcom,sdm845-smmu-500") || of_device_is_compatible(np, "qcom,sc7180-smmu-500")) return qcom_smmu_impl_init(smmu); diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c new file mode 100644 index 0..5c874912e1c1a --- /dev/null +++ b/drivers/iommu/arm-smmu-nvidia.c @@ -0,0 +1,206 @@ +// SPDX-License-Identifier: GPL-2.0-only +// NVIDIA ARM SMMU v2 implementation quirks +// Copyright (C) 2019-2020 NVIDIA CORPORATION. All rights reserved. + +#include +#include +#include +#include +#include + +#include "arm-smmu.h" + +/* + * Tegra194 has three ARM MMU-500 Instances. + * Two of them are used together for interleaved IOVA accesses and + * used by non-isochronous HW devices for SMMU translations. + * Third one is used for SMMU translations from isochronous HW devices. + * It is possible to use this implementation to program either + * all three or two of the instances identically as desired through + * DT node. + * + * Programming all the three instances identically comes with redundant TLB + * invalidations as all three never need to be TLB invalidated for a HW device. + * + * When Linux kernel supports multiple SMMU devices, the SMMU device used for + * isochornous HW devices should be added as a separate ARM MMU-500 device + * in DT and be programmed independently for efficient TLB invalidates. 
+ */ +#define MAX_SMMU_INSTANCES 3 + +struct nvidia_smmu { + struct arm_smmu_device smmu; + unsigned intnum_inst; + void __iomem*bases[MAX_SMMU_INSTANCES]; +}; + +static inline struct nvidia_smmu *to_nvidia_smmu(struct arm_smmu_device *smmu) +{ + return container_of(smmu, struct nvidia_smmu, smmu); +} + +static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu, +unsigned int inst, int page) +{ + struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu); + + if (!nvidia_smmu->bases[0]) + nvidia_smmu->bases[0] = smmu->base; + + return nvidia_smmu->bases[inst] + (page << smmu->pgshift); +} + +static u32 nvidia_smmu_read_reg(struct arm_smmu_device *smmu, + int page, int offset) +{ + void __iomem *reg = nvidia_smmu_page(smmu, 0, page) + offset; + + return readl_relaxed(reg); +} + +static void nvidia_smmu_write_reg(struct arm_smmu_device *smmu, + int page, int offset, u32 val) +{ + unsigned int i; + struc
[PATCH v9 3/4] dt-bindings: arm-smmu: add binding for Tegra194 SMMU
Add binding for NVIDIA's Tegra194 SoC SMMU topology that is based on ARM
MMU-500.

Signed-off-by: Krishna Reddy
---
 .../devicetree/bindings/iommu/arm,smmu.yaml    | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
index d7ceb4c34423b..662c46e16f07d 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
@@ -38,6 +38,11 @@ properties:
               - qcom,sc7180-smmu-500
               - qcom,sdm845-smmu-500
           - const: arm,mmu-500
+      - description: NVIDIA SoCs that use more than one "arm,mmu-500"
+        items:
+          - enum:
+              - nvidia,tegra194-smmu
+          - const: arm,mmu-500
       - items:
           - const: arm,mmu-500
           - const: arm,smmu-v2
@@ -138,6 +143,19 @@ required:

 additionalProperties: false

+allOf:
+  - if:
+      properties:
+        compatible:
+          contains:
+            enum:
+              - nvidia,tegra194-smmu
+    then:
+      properties:
+        reg:
+          minItems: 2
+          maxItems: 3
+
 examples:
   - |+
     /* SMMU with stream matching or stream indexing */
--
2.26.2
[PATCH v9 1/4] iommu/arm-smmu: move TLB timeout and spin count macros
Move TLB timeout and spin count macros to header file to allow using the same values from vendor specific implementations. Signed-off-by: Krishna Reddy --- drivers/iommu/arm-smmu.c | 3 --- drivers/iommu/arm-smmu.h | 2 ++ 2 files changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 243bc4cb2705b..d2054178df357 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -52,9 +52,6 @@ */ #define QCOM_DUMMY_VAL -1 -#define TLB_LOOP_TIMEOUT 100 /* 1s! */ -#define TLB_SPIN_COUNT 10 - #define MSI_IOVA_BASE 0x800 #define MSI_IOVA_LENGTH0x10 diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h index d172c024be618..c7d0122a7c6ca 100644 --- a/drivers/iommu/arm-smmu.h +++ b/drivers/iommu/arm-smmu.h @@ -236,6 +236,8 @@ enum arm_smmu_cbar_type { /* Maximum number of context banks per SMMU */ #define ARM_SMMU_MAX_CBS 128 +#define TLB_LOOP_TIMEOUT 100 /* 1s! */ +#define TLB_SPIN_COUNT 10 /* Shared driver definitions */ enum arm_smmu_arch_version { -- 2.26.2
[PATCH v9 0/4] NVIDIA ARM SMMUv2 Implementation
Changes in v9:
Move TLB Timeout and spin count macros to arm-smmu.h header to share with implementation.
Set minItems and maxItems for reg property when compatible contains nvidia,tegra194-smmu.
Update commit message for NVIDIA implementation patch.
Fail single SMMU instance usage through NVIDIA implementation to limit the usage to two or three instances.
Fix checkpatch warnings with --strict checking.

v8 - https://lkml.org/lkml/2020/6/29/2385
v7 - https://lkml.org/lkml/2020/6/28/347
v6 - https://lkml.org/lkml/2020/6/4/1018
v5 - https://lkml.org/lkml/2020/5/21/1114
v4 - https://lkml.org/lkml/2019/10/30/1054
v3 - https://lkml.org/lkml/2019/10/18/1601
v2 - https://lkml.org/lkml/2019/9/2/980
v1 - https://lkml.org/lkml/2019/8/29/1588

Krishna Reddy (4):
  iommu/arm-smmu: move TLB timeout and spin count macros
  iommu/arm-smmu: add NVIDIA implementation for ARM MMU-500 usage
  dt-bindings: arm-smmu: add binding for Tegra194 SMMU
  iommu/arm-smmu: Add global/context fault implementation hooks

 .../devicetree/bindings/iommu/arm,smmu.yaml   |  18 ++
 MAINTAINERS                                   |   2 +
 drivers/iommu/Makefile                        |   2 +-
 drivers/iommu/arm-smmu-impl.c                 |   3 +
 drivers/iommu/arm-smmu-nvidia.c               | 304 ++
 drivers/iommu/arm-smmu.c                      |  20 +-
 drivers/iommu/arm-smmu.h                      |   6 +
 7 files changed, 349 insertions(+), 6 deletions(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c

base-commit: 48f0bcfb7aad2c6eb4c1e66476b58475aa14393e
--
2.26.2
RE: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
>> The driver intends to support up to 3 instances. It doesn't really
>> mandate that all three instances be present in the same DT node.
>> Each MMIO aperture in the "reg" property is an instance here.
>> reg = <...>, <...>, <...>;
>> The reg can have all three or fewer, and the driver just configures
>> based on reg and it works fine.

> So it sounds like we need at least 2 SMMUs (for non-iso and iso) but we
> have up to 3 (for Tegra194). So the question is do we have a use-case
> where we only use 2 and not 3? If not, then it still seems that we should
> require that all 3 are present.

It can be either 2 SMMUs (for non-iso) or 3 SMMUs (for non-iso and iso). Let me make the single-instance case fail, since it can use the regular ARM SMMU implementation and does not need the NVIDIA implementation.

> The other problem I see here is that currently the arm-smmu binding
> defines the 'reg' with a 'maxItems' of 1, whereas we have 3. I believe
> that this will get caught by the 'dt_binding_check' when we try to
> populate the binding.

Thanks for pointing it out! Will update the binding doc.

-KR
--
nvpublic
RE: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
> OK, well I see what you are saying, but if we intended to support all 3
> for Tegra194, then we should ensure all 3 are initialised correctly.

The driver intends to support up to 3 instances. It doesn't really mandate that all three instances be present in the same DT node. Each MMIO aperture in the "reg" property is an instance here.

reg = <...>, <...>, <...>;

The reg can have all three or fewer, and the driver just configures based on reg and it works fine.

> It would be better to query the number of SMMUs populated in device-tree
> and then ensure that all are initialised correctly.

Getting the IORESOURCE_MEM resources is the way to count the instances the driver needs to support. In a way, it is already querying the number of SMMUs through IORESOURCE_MEM here.

-KR
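As a rough illustration of counting instances by walking the memory resources until one is missing, here is a userspace C sketch. The names struct fake_resource, get_mem_resource() and count_instances() are hypothetical; get_mem_resource() merely stands in for a lookup such as platform_get_resource(pdev, IORESOURCE_MEM, i), and the sketch counts from index 0, whereas the posted driver takes instance 0 from the core driver's mapping and starts its own loop at 1.

```c
#include <stddef.h>

#define MAX_SMMU_INSTANCES 3

/* Hypothetical stand-in for one MMIO aperture listed in "reg". */
struct fake_resource {
	unsigned long start;
	unsigned long size;
};

/* Stand-in for a per-index resource lookup: returns NULL once the
 * index runs past the apertures that were described. */
static const struct fake_resource *
get_mem_resource(const struct fake_resource *regs, size_t nregs, size_t i)
{
	return i < nregs ? &regs[i] : NULL;
}

/* Count instances by probing successive indices, capped at the
 * maximum the implementation is prepared to handle. */
static unsigned int count_instances(const struct fake_resource *regs,
				    size_t nregs)
{
	unsigned int n = 0;

	while (n < MAX_SMMU_INSTANCES && get_mem_resource(regs, nregs, n))
		n++;

	return n;
}
```

The count comes purely from what the device tree describes, which is why a missing aperture shows up as a smaller count rather than as an error unless the caller checks the result.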
RE: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
>> NVIDIA's Tegra194 SoC uses two ARM MMU-500s together to interleave
>> IOVA accesses across them.
>> Add NVIDIA implementation for dual ARM MMU-500s and add new compatible
>> string for Tegra194 SoC SMMU topology.

> There is no description here of the 3rd SMMU that you mention below.
> I think that we should describe the full picture here.

This driver is primarily for the dual-SMMU configuration, so the third SMMU was left out of the commit message. However, the implementation supports an option to configure 3 instances identically with one SMMU DT node, and that is documented in the implementation.

>> +
>> +static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu,
>> +		unsigned int inst, int page)

> If you run checkpatch --strict on these you will get a lot of ...
>
> CHECK: Alignment should match open parenthesis
> #116: FILE: drivers/iommu/arm-smmu-nvidia.c:46:
> +static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu,
> +		unsigned int inst, int page)
>
> We should fix these.

I will fix these if I need to push a new patch set.

>> +static void nvidia_smmu_write_reg(struct arm_smmu_device *smmu,
>> +			    int page, int offset, u32 val)
>> +{
>> +	unsigned int i;
>> +	struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);
>> +
>> +	for (i = 0; i < nvidia_smmu->num_inst; i++) {
>> +		void __iomem *reg = nvidia_smmu_page(smmu, i, page) + offset;

> Personally, I would declare 'reg' outside of the loop as I feel it will
> make the code cleaner and easier to read.

It was like that before, and was changed to its current form to limit the scope of variables, as per Thierry's comments on v6. We can leave it as it is, since there is no technical issue here.

-KR
--
nvpublic
RE: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
>> +struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu)
>> +{
>> +	unsigned int i;
>> +	for (i = 1; i < MAX_SMMU_INSTANCES; i++) {
>> +		struct resource *res;
>> +
>> +		res = platform_get_resource(pdev, IORESOURCE_MEM, i);
>> +		if (!res)
>> +			break;

> Currently this driver is only supported for Tegra194 which I understand
> has 3 SMMUs. Therefore, I don't feel that we should fail silently here, I
> think it is better to return an error if all 3 cannot be initialised.

Initialization of all three SMMU instances is not necessary here. The driver can work with any of the possible instance counts (1, 2 or 3) based on the DT config, though it doesn't make much sense to use it with 1 instance. There is no silent failure here from the driver's point of view. If the DT is misconfigured, SMMU faults would catch the issue.

>> +		nvidia_smmu->bases[i] = devm_ioremap_resource(smmu->dev, res);
>> +		if (IS_ERR(nvidia_smmu->bases[i]))
>> +			return ERR_CAST(nvidia_smmu->bases[i]);

> You want to use PTR_ERR() here.

PTR_ERR() returns a long integer, but this function returns a pointer, so ERR_CAST() is the right one to use here.

--
nvpublic
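The PTR_ERR() versus ERR_CAST() distinction above can be tried out in userspace with simplified stand-ins for the kernel's error-pointer helpers. The sketch below is an assumption-laden illustration, not kernel code: the helper definitions are reduced versions of the ones in the kernel's err.h, and struct smmu_device, map_region() and impl_init() are hypothetical names.

```c
#include <errno.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}
/* Re-types an error pointer without converting it to an integer. */
static inline void *ERR_CAST(const void *ptr) { return (void *)ptr; }

struct smmu_device { int id; };	/* hypothetical device type */

/* Hypothetical mapping helper: only indices 0 and 1 can be mapped. */
static void *map_region(int index)
{
	static int regions[2];

	if (index >= 2)
		return ERR_PTR(-ENOMEM);
	return &regions[index];
}

/* The return type is a pointer, so a mapping failure is forwarded with
 * ERR_CAST(); PTR_ERR() would yield a long, which suits functions that
 * return an integer error code instead. */
static struct smmu_device *impl_init(struct smmu_device *smmu, int index)
{
	void *base = map_region(index);

	if (IS_ERR(base))
		return ERR_CAST(base);

	return smmu;
}
```

A caller can still recover the numeric error with PTR_ERR() on the returned pointer, which is the pattern the probe function in this series uses after calling arm_smmu_impl_init().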
[PATCH v8 3/3] iommu/arm-smmu: Add global/context fault implementation hooks
Add global/context fault hooks to allow NVIDIA SMMU implementation handle faults across multiple SMMUs. Signed-off-by: Krishna Reddy --- drivers/iommu/arm-smmu-nvidia.c | 98 + drivers/iommu/arm-smmu.c| 17 +- drivers/iommu/arm-smmu.h| 3 + 3 files changed, 116 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c index 1124f0ac1823a..c9423b4199c65 100644 --- a/drivers/iommu/arm-smmu-nvidia.c +++ b/drivers/iommu/arm-smmu-nvidia.c @@ -147,6 +147,102 @@ static int nvidia_smmu_reset(struct arm_smmu_device *smmu) return 0; } +static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom) +{ + return container_of(dom, struct arm_smmu_domain, domain); +} + +static irqreturn_t nvidia_smmu_global_fault_inst(int irq, + struct arm_smmu_device *smmu, + int inst) +{ + u32 gfsr, gfsynr0, gfsynr1, gfsynr2; + void __iomem *gr0_base = nvidia_smmu_page(smmu, inst, 0); + + gfsr = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSR); + if (!gfsr) + return IRQ_NONE; + + gfsynr0 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR0); + gfsynr1 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR1); + gfsynr2 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR2); + + dev_err_ratelimited(smmu->dev, + "Unexpected global fault, this could be serious\n"); + dev_err_ratelimited(smmu->dev, + "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 0x%08x\n", + gfsr, gfsynr0, gfsynr1, gfsynr2); + + writel_relaxed(gfsr, gr0_base + ARM_SMMU_GR0_sGFSR); + return IRQ_HANDLED; +} + +static irqreturn_t nvidia_smmu_global_fault(int irq, void *dev) +{ + int inst; + irqreturn_t irq_ret = IRQ_NONE; + struct arm_smmu_device *smmu = dev; + struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu); + + for (inst = 0; inst < nvidia_smmu->num_inst; inst++) { + irq_ret = nvidia_smmu_global_fault_inst(irq, smmu, inst); + if (irq_ret == IRQ_HANDLED) + return irq_ret; + } + + return irq_ret; +} + +static irqreturn_t nvidia_smmu_context_fault_bank(int irq, + struct 
arm_smmu_device *smmu, + int idx, int inst) +{ + u32 fsr, fsynr, cbfrsynra; + unsigned long iova; + void __iomem *gr1_base = nvidia_smmu_page(smmu, inst, 1); + void __iomem *cb_base = nvidia_smmu_page(smmu, inst, smmu->numpage + idx); + + fsr = readl_relaxed(cb_base + ARM_SMMU_CB_FSR); + if (!(fsr & ARM_SMMU_FSR_FAULT)) + return IRQ_NONE; + + fsynr = readl_relaxed(cb_base + ARM_SMMU_CB_FSYNR0); + iova = readq_relaxed(cb_base + ARM_SMMU_CB_FAR); + cbfrsynra = readl_relaxed(gr1_base + ARM_SMMU_GR1_CBFRSYNRA(idx)); + + dev_err_ratelimited(smmu->dev, + "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n", + fsr, iova, fsynr, cbfrsynra, idx); + + writel_relaxed(fsr, cb_base + ARM_SMMU_CB_FSR); + return IRQ_HANDLED; +} + +static irqreturn_t nvidia_smmu_context_fault(int irq, void *dev) +{ + int inst, idx; + irqreturn_t irq_ret = IRQ_NONE; + struct iommu_domain *domain = dev; + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); + struct arm_smmu_device *smmu = smmu_domain->smmu; + + for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) { + /* +* Interrupt line shared between all context faults. +* Check for faults across all contexts. 
+*/ + for (idx = 0; idx < smmu->num_context_banks; idx++) { + irq_ret = nvidia_smmu_context_fault_bank(irq, smmu, +idx, inst); + + if (irq_ret == IRQ_HANDLED) + return irq_ret; + } + } + + return irq_ret; +} + static const struct arm_smmu_impl nvidia_smmu_impl = { .read_reg = nvidia_smmu_read_reg, .write_reg = nvidia_smmu_write_reg, @@ -154,6 +250,8 @@ static const struct arm_smmu_impl nvidia_smmu_impl = { .write_reg64 = nvidia_smmu_write_reg64, .reset = nvidia_smmu_reset, .tlb_sync = nvidia_smmu_tlb_sync, + .global_fault = nvidia_smmu_global_fault, + .context_fault = nvidia_smmu_context_fault, }; struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 243bc4cb2705b..3bb0aba15a356 100644 --- a/drivers/iommu/arm-s
[PATCH v8 2/3] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
Add binding for NVIDIA's Tegra194 SoC SMMU topology that is based on ARM
MMU-500.

Signed-off-by: Krishna Reddy
---
 Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
index d7ceb4c34423b..5b2586ac715ed 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
@@ -38,6 +38,11 @@ properties:
               - qcom,sc7180-smmu-500
               - qcom,sdm845-smmu-500
           - const: arm,mmu-500
+      - description: NVIDIA SoCs that use more than one "arm,mmu-500"
+        items:
+          - enum:
+              - nvidia,tegra194-smmu
+          - const: arm,mmu-500
       - items:
           - const: arm,mmu-500
           - const: arm,smmu-v2
--
2.26.2
[PATCH v8 0/3] Nvidia Arm SMMUv2 Implementation
Changes in v8:
Fixed incorrect CB_FSR read issue during context bank fault.
Rebased and validated patches on top of
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git next

v7 - https://lkml.org/lkml/2020/6/28/347
v6 - https://lkml.org/lkml/2020/6/4/1018
v5 - https://lkml.org/lkml/2020/5/21/1114
v4 - https://lkml.org/lkml/2019/10/30/1054
v3 - https://lkml.org/lkml/2019/10/18/1601
v2 - https://lkml.org/lkml/2019/9/2/980
v1 - https://lkml.org/lkml/2019/8/29/1588

Krishna Reddy (3):
  iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
  dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
  iommu/arm-smmu: Add global/context fault implementation hooks

 .../devicetree/bindings/iommu/arm,smmu.yaml   |   5 +
 MAINTAINERS                                   |   2 +
 drivers/iommu/Makefile                        |   2 +-
 drivers/iommu/arm-smmu-impl.c                 |   3 +
 drivers/iommu/arm-smmu-nvidia.c               | 294 ++
 drivers/iommu/arm-smmu.c                      |  17 +-
 drivers/iommu/arm-smmu.h                      |   4 +
 7 files changed, 324 insertions(+), 3 deletions(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c

base-commit: 48f0bcfb7aad2c6eb4c1e66476b58475aa14393e
--
2.26.2
[PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
NVIDIA's Tegra194 SoC uses two ARM MMU-500s together to interleave IOVA accesses across them. Add NVIDIA implementation for dual ARM MMU-500s and add new compatible string for Tegra194 SoC SMMU topology. Signed-off-by: Krishna Reddy --- MAINTAINERS | 2 + drivers/iommu/Makefile | 2 +- drivers/iommu/arm-smmu-impl.c | 3 + drivers/iommu/arm-smmu-nvidia.c | 196 drivers/iommu/arm-smmu.h| 1 + 5 files changed, 203 insertions(+), 1 deletion(-) create mode 100644 drivers/iommu/arm-smmu-nvidia.c diff --git a/MAINTAINERS b/MAINTAINERS index 7b5ffd646c6b9..64c37dbdd4426 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -16808,8 +16808,10 @@ F: drivers/i2c/busses/i2c-tegra.c TEGRA IOMMU DRIVERS M: Thierry Reding +R: Krishna Reddy L: linux-te...@vger.kernel.org S: Supported +F: drivers/iommu/arm-smmu-nvidia.c F: drivers/iommu/tegra* TEGRA KBC DRIVER diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile index 342190196dfb0..2b8203db73ec3 100644 --- a/drivers/iommu/Makefile +++ b/drivers/iommu/Makefile @@ -15,7 +15,7 @@ obj-$(CONFIG_AMD_IOMMU) += amd/iommu.o amd/init.o amd/quirks.o obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd/debugfs.o obj-$(CONFIG_AMD_IOMMU_V2) += amd/iommu_v2.o obj-$(CONFIG_ARM_SMMU) += arm_smmu.o -arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o +arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-nvidia.o arm-smmu-qcom.o obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o obj-$(CONFIG_DMAR_TABLE) += intel/dmar.o obj-$(CONFIG_INTEL_IOMMU) += intel/iommu.o intel/pasid.o diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c index c75b9d957b702..70f7318017617 100644 --- a/drivers/iommu/arm-smmu-impl.c +++ b/drivers/iommu/arm-smmu-impl.c @@ -171,6 +171,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu) if (of_property_read_bool(np, "calxeda,smmu-secure-config-access")) smmu->impl = _impl; + if (of_device_is_compatible(smmu->dev->of_node, "nvidia,tegra194-smmu")) + return nvidia_smmu_impl_init(smmu); + if 
(of_device_is_compatible(np, "qcom,sdm845-smmu-500") || of_device_is_compatible(np, "qcom,sc7180-smmu-500")) return qcom_smmu_impl_init(smmu); diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c new file mode 100644 index 0..1124f0ac1823a --- /dev/null +++ b/drivers/iommu/arm-smmu-nvidia.c @@ -0,0 +1,196 @@ +// SPDX-License-Identifier: GPL-2.0-only +// NVIDIA ARM SMMU v2 implementation quirks +// Copyright (C) 2019-2020 NVIDIA CORPORATION. All rights reserved. + +#include +#include +#include +#include +#include + +#include "arm-smmu.h" + +/* + * Tegra194 has three ARM MMU-500 Instances. + * Two of them are used together for interleaved IOVA accesses and + * used by non-isochronous HW devices for SMMU translations. + * Third one is used for SMMU translations from isochronous HW devices. + * It is possible to use this implementation to program either + * all three or two of the instances identically as desired through + * DT node. + * + * Programming all the three instances identically comes with redundant TLB + * invalidations as all three never need to be TLB invalidated for a HW device. + * + * When Linux kernel supports multiple SMMU devices, the SMMU device used for + * isochornous HW devices should be added as a separate ARM MMU-500 device + * in DT and be programmed independently for efficient TLB invalidates. + */ +#define MAX_SMMU_INSTANCES 3 + +#define TLB_LOOP_TIMEOUT_IN_US 100 /* 1s! 
*/ +#define TLB_SPIN_COUNT 10 + +struct nvidia_smmu { + struct arm_smmu_device smmu; + unsigned intnum_inst; + void __iomem*bases[MAX_SMMU_INSTANCES]; +}; + +static inline struct nvidia_smmu *to_nvidia_smmu(struct arm_smmu_device *smmu) +{ + return container_of(smmu, struct nvidia_smmu, smmu); +} + +static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu, + unsigned int inst, int page) +{ + struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu); + + if (!nvidia_smmu->bases[0]) + nvidia_smmu->bases[0] = smmu->base; + + return nvidia_smmu->bases[inst] + (page << smmu->pgshift); +} + +static u32 nvidia_smmu_read_reg(struct arm_smmu_device *smmu, + int page, int offset) +{ + void __iomem *reg = nvidia_smmu_page(smmu, 0, page) + offset; + + return readl_relaxed(reg); +} + +static void nvidia_smmu_write_reg(struct arm_smmu_device *smmu, + int page, int offset, u32 val) +{ + unsigned int i; + struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu); + + for (i = 0; i < nvidia_smmu->num_ins
RE: [PATCH v7 3/3] iommu/arm-smmu: Add global/context fault implementation hooks
>> +static irqreturn_t nvidia_smmu_context_fault_bank(int irq,
>> + void __iomem *cb_base = nvidia_smmu_page(smmu, inst,
>> + smmu->numpage + idx);
[...]
>> + fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR);
[...]
>> + writel_relaxed(fsr, cb_base + ARM_SMMU_CB_FSR);

>It reads FSR of the default inst (1st), but clears the FSR of corresponding
>inst -- just want to make sure that this is okay and intended.

The FSR should be read from the corresponding instance, not from instance 0. Let me post an updated patch.

-KR
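For readers following the thread, here is a minimal userspace sketch of the mismatch discussed above. It is not the driver code: the register file is a plain array standing in for MMIO, and every sim_/SIM_ name, the instance and bank counts, and the fault bit value are hypothetical. It only illustrates why reading the status from instance 0 while clearing it on instance `inst` can miss a fault, assuming the FSR clears on write-back of the value read, as the writel_relaxed() in the patch suggests.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_NUM_INST  2    /* hypothetical: two interleaved instances */
#define SIM_NUM_CB    4    /* hypothetical: four context banks */
#define SIM_FSR_FAULT 0x2u /* hypothetical fault bit for this sketch */

/* Simulated per-instance, per-context-bank FSR values (stand-in for MMIO). */
static uint32_t sim_fsr[SIM_NUM_INST][SIM_NUM_CB];

/* v7 behaviour: the status is always read from instance 0. */
static int sim_fault_pending_v7(unsigned int inst, unsigned int idx)
{
	(void)inst; /* ignored, which is the problem */
	return (sim_fsr[0][idx] & SIM_FSR_FAULT) != 0;
}

/* Intended behaviour: read and clear the FSR of the same instance. */
static int sim_handle_fault(unsigned int inst, unsigned int idx)
{
	uint32_t fsr = sim_fsr[inst][idx];

	if (!(fsr & SIM_FSR_FAULT))
		return 0; /* would be IRQ_NONE */

	/* Models writing the read value back to clear the set bits. */
	sim_fsr[inst][idx] &= ~fsr;
	return 1; /* would be IRQ_HANDLED */
}
```

With a fault latched only on instance 1, the v7-style check reports nothing pending, while the per-instance version both sees and clears it.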
RE: [PATCH v7 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
>> + if (!nvidia_smmu->bases[0])
>> + nvidia_smmu->bases[0] = smmu->base;
>> +
>> + return nvidia_smmu->bases[inst] + (page << smmu->pgshift);
>> +}

>Not critical -- just a nit: why not put the bases[0] in init()?

smmu->base is not available when nvidia_smmu_impl_init() is called; it is set afterwards in arm-smmu.c. The check can't be avoided without changing the order of the devm_ioremap() and impl_init() calls in arm-smmu.c.

-KR
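A small userspace sketch of the lazy bases[0] setup explained above, under stated assumptions: struct sim_smmu and sim_page() are hypothetical stand-ins for the driver structures, and ordinary memory replaces the ioremapped regions. The point is only that the first page lookup can pick up a base address that was not yet known when the instance array was filled in.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_MAX_INST 3

/* Hypothetical stand-in for the driver state; not the kernel structures. */
struct sim_smmu {
	char *base;                /* set late, after the impl init hook ran */
	unsigned int pgshift;      /* log2 of the register page size */
	char *bases[SIM_MAX_INST]; /* bases[0] is filled in on first use */
};

/* Returns the address of register page `page` of instance `inst`. */
static char *sim_page(struct sim_smmu *s, unsigned int inst, int page)
{
	/* The base of instance 0 is only known once the core driver mapped it. */
	if (!s->bases[0])
		s->bases[0] = s->base;

	return s->bases[inst] + ((size_t)page << s->pgshift);
}
```

The cost is one pointer test per register access; the alternative mentioned in the reply is to change the call order in arm-smmu.c so the base is known at init time.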
[PATCH v7 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
NVIDIA's Tegra194 SoC uses two ARM MMU-500s together to interleave IOVA accesses across them. Add NVIDIA implementation for dual ARM MMU-500s and add new compatible string for Tegra194 SoC SMMU topology. Signed-off-by: Krishna Reddy --- MAINTAINERS | 2 + drivers/iommu/Makefile | 2 +- drivers/iommu/arm-smmu-impl.c | 3 + drivers/iommu/arm-smmu-nvidia.c | 195 drivers/iommu/arm-smmu.h| 1 + 5 files changed, 202 insertions(+), 1 deletion(-) create mode 100644 drivers/iommu/arm-smmu-nvidia.c diff --git a/MAINTAINERS b/MAINTAINERS index 7b5ffd646c6b9..64c37dbdd4426 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -16808,8 +16808,10 @@ F: drivers/i2c/busses/i2c-tegra.c TEGRA IOMMU DRIVERS M: Thierry Reding +R: Krishna Reddy L: linux-te...@vger.kernel.org S: Supported +F: drivers/iommu/arm-smmu-nvidia.c F: drivers/iommu/tegra* TEGRA KBC DRIVER diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile index 342190196dfb0..2b8203db73ec3 100644 --- a/drivers/iommu/Makefile +++ b/drivers/iommu/Makefile @@ -15,7 +15,7 @@ obj-$(CONFIG_AMD_IOMMU) += amd/iommu.o amd/init.o amd/quirks.o obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd/debugfs.o obj-$(CONFIG_AMD_IOMMU_V2) += amd/iommu_v2.o obj-$(CONFIG_ARM_SMMU) += arm_smmu.o -arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o +arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-nvidia.o arm-smmu-qcom.o obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o obj-$(CONFIG_DMAR_TABLE) += intel/dmar.o obj-$(CONFIG_INTEL_IOMMU) += intel/iommu.o intel/pasid.o diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c index c75b9d957b702..70f7318017617 100644 --- a/drivers/iommu/arm-smmu-impl.c +++ b/drivers/iommu/arm-smmu-impl.c @@ -171,6 +171,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu) if (of_property_read_bool(np, "calxeda,smmu-secure-config-access")) smmu->impl = _impl; + if (of_device_is_compatible(smmu->dev->of_node, "nvidia,tegra194-smmu")) + return nvidia_smmu_impl_init(smmu); + if 
(of_device_is_compatible(np, "qcom,sdm845-smmu-500") || of_device_is_compatible(np, "qcom,sc7180-smmu-500")) return qcom_smmu_impl_init(smmu); diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c new file mode 100644 index 0..b73c483fa3376 --- /dev/null +++ b/drivers/iommu/arm-smmu-nvidia.c @@ -0,0 +1,195 @@ +// SPDX-License-Identifier: GPL-2.0-only +// NVIDIA ARM SMMU v2 implementation quirks +// Copyright (C) 2019-2020 NVIDIA CORPORATION. All rights reserved. + +#include +#include +#include +#include +#include + +#include "arm-smmu.h" + +/* + * Tegra194 has three ARM MMU-500 Instances. + * Two of them are used together for interleaved IOVA accesses and + * used by non-isochronous HW devices for SMMU translations. + * Third one is used for SMMU translations from isochronous HW devices. + * It is possible to use this implementation to program either + * all three or two of the instances identically as desired through + * DT node. + * + * Programming all the three instances identically comes with redundant TLB + * invalidations as all three never need to be TLB invalidated for a HW device. + * + * When Linux kernel supports multiple SMMU devices, the SMMU device used for + * isochornous HW devices should be added as a separate ARM MMU-500 device + * in DT and be programmed independently for efficient TLB invalidates. + */ +#define MAX_SMMU_INSTANCES 3 + +#define TLB_LOOP_TIMEOUT_IN_US 100 /* 1s! 
*/ +#define TLB_SPIN_COUNT 10 + +struct nvidia_smmu { + struct arm_smmu_device smmu; + unsigned intnum_inst; + void __iomem*bases[MAX_SMMU_INSTANCES]; +}; + +static inline struct nvidia_smmu *to_nvidia_smmu(struct arm_smmu_device *smmu) +{ + return container_of(smmu, struct nvidia_smmu, smmu); +} + +static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu, + unsigned int inst, int page) +{ + struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu); + + if (!nvidia_smmu->bases[0]) + nvidia_smmu->bases[0] = smmu->base; + + return nvidia_smmu->bases[inst] + (page << smmu->pgshift); +} + +static u32 nvidia_smmu_read_reg(struct arm_smmu_device *smmu, + int page, int offset) +{ + void __iomem *reg = nvidia_smmu_page(smmu, 0, page) + offset; + + return readl_relaxed(reg); +} + +static void nvidia_smmu_write_reg(struct arm_smmu_device *smmu, + int page, int offset, u32 val) +{ + unsigned int i; + struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu); + + for (i = 0; i < nvidia_smmu->num_ins
[PATCH v7 3/3] iommu/arm-smmu: Add global/context fault implementation hooks
Add global/context fault hooks to allow NVIDIA SMMU implementation handle faults across multiple SMMUs. Signed-off-by: Krishna Reddy --- drivers/iommu/arm-smmu-nvidia.c | 101 +++- drivers/iommu/arm-smmu.c| 17 +- drivers/iommu/arm-smmu.h| 3 + 3 files changed, 118 insertions(+), 3 deletions(-) diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c index b73c483fa3376..7276bb203ae79 100644 --- a/drivers/iommu/arm-smmu-nvidia.c +++ b/drivers/iommu/arm-smmu-nvidia.c @@ -147,6 +147,102 @@ static int nvidia_smmu_reset(struct arm_smmu_device *smmu) return 0; } +static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom) +{ + return container_of(dom, struct arm_smmu_domain, domain); +} + +static irqreturn_t nvidia_smmu_global_fault_inst(int irq, + struct arm_smmu_device *smmu, + int inst) +{ + u32 gfsr, gfsynr0, gfsynr1, gfsynr2; + void __iomem *gr0_base = nvidia_smmu_page(smmu, inst, 0); + + gfsr = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSR); + gfsynr0 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR0); + gfsynr1 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR1); + gfsynr2 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR2); + + if (!gfsr) + return IRQ_NONE; + + dev_err_ratelimited(smmu->dev, + "Unexpected global fault, this could be serious\n"); + dev_err_ratelimited(smmu->dev, + "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 0x%08x\n", + gfsr, gfsynr0, gfsynr1, gfsynr2); + + writel_relaxed(gfsr, gr0_base + ARM_SMMU_GR0_sGFSR); + return IRQ_HANDLED; +} + +static irqreturn_t nvidia_smmu_global_fault(int irq, void *dev) +{ + int inst; + irqreturn_t irq_ret = IRQ_NONE; + struct arm_smmu_device *smmu = dev; + struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu); + + for (inst = 0; inst < nvidia_smmu->num_inst; inst++) { + irq_ret = nvidia_smmu_global_fault_inst(irq, smmu, inst); + if (irq_ret == IRQ_HANDLED) + return irq_ret; + } + + return irq_ret; +} + +static irqreturn_t nvidia_smmu_context_fault_bank(int irq, + struct 
arm_smmu_device *smmu, + int idx, int inst) +{ + u32 fsr, fsynr, cbfrsynra; + unsigned long iova; + void __iomem *gr1_base = nvidia_smmu_page(smmu, inst, 1); + void __iomem *cb_base = nvidia_smmu_page(smmu, inst, smmu->numpage + idx); + + fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR); + if (!(fsr & ARM_SMMU_FSR_FAULT)) + return IRQ_NONE; + + fsynr = readl_relaxed(cb_base + ARM_SMMU_CB_FSYNR0); + iova = readq_relaxed(cb_base + ARM_SMMU_CB_FAR); + cbfrsynra = readl_relaxed(gr1_base + ARM_SMMU_GR1_CBFRSYNRA(idx)); + + dev_err_ratelimited(smmu->dev, + "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n", + fsr, iova, fsynr, cbfrsynra, idx); + + writel_relaxed(fsr, cb_base + ARM_SMMU_CB_FSR); + return IRQ_HANDLED; +} + +static irqreturn_t nvidia_smmu_context_fault(int irq, void *dev) +{ + int inst, idx; + irqreturn_t irq_ret = IRQ_NONE; + struct iommu_domain *domain = dev; + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); + struct arm_smmu_device *smmu = smmu_domain->smmu; + + for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) { + /* +* Interrupt line shared between all context faults. +* Check for faults across all contexts. 
+*/ + for (idx = 0; idx < smmu->num_context_banks; idx++) { + irq_ret = nvidia_smmu_context_fault_bank(irq, smmu, +idx, inst); + + if (irq_ret == IRQ_HANDLED) + return irq_ret; + } + } + + return irq_ret; +} + static const struct arm_smmu_impl nvidia_smmu_impl = { .read_reg = nvidia_smmu_read_reg, .write_reg = nvidia_smmu_write_reg, @@ -154,6 +250,8 @@ static const struct arm_smmu_impl nvidia_smmu_impl = { .write_reg64 = nvidia_smmu_write_reg64, .reset = nvidia_smmu_reset, .tlb_sync = nvidia_smmu_tlb_sync, + .global_fault = nvidia_smmu_global_fault, + .context_fault = nvidia_smmu_context_fault, }; struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu) @@ -185,7 +283,8 @@ struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu) } nvidia_smmu->smmu.i
[PATCH v7 0/3] Nvidia Arm SMMUv2 Implementation
Changes in v7:
Incorporated the review feedback from Nicolin Chen, Robin Murphy and Thierry Reding.
Rebased and validated patches on top of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git next

v6 - https://lkml.org/lkml/2020/6/4/1018
v5 - https://lkml.org/lkml/2020/5/21/1114
v4 - https://lkml.org/lkml/2019/10/30/1054
v3 - https://lkml.org/lkml/2019/10/18/1601
v2 - https://lkml.org/lkml/2019/9/2/980
v1 - https://lkml.org/lkml/2019/8/29/1588

Krishna Reddy (3):
  iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
  dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
  iommu/arm-smmu: Add global/context fault implementation hooks

 .../devicetree/bindings/iommu/arm,smmu.yaml | 5 +
 MAINTAINERS | 2 +
 drivers/iommu/Makefile | 2 +-
 drivers/iommu/arm-smmu-impl.c | 3 +
 drivers/iommu/arm-smmu-nvidia.c | 294 ++
 drivers/iommu/arm-smmu.c | 17 +-
 drivers/iommu/arm-smmu.h | 4 +
 7 files changed, 324 insertions(+), 3 deletions(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c

base-commit: 48f0bcfb7aad2c6eb4c1e66476b58475aa14393e
--
2.26.2
[PATCH v7 2/3] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
Add binding for NVIDIA's Tegra194 SoC SMMU topology that is based on ARM MMU-500. Signed-off-by: Krishna Reddy --- Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 5 + 1 file changed, 5 insertions(+) diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml index d7ceb4c34423b..5b2586ac715ed 100644 --- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml @@ -38,6 +38,11 @@ properties: - qcom,sc7180-smmu-500 - qcom,sdm845-smmu-500 - const: arm,mmu-500 + - description: NVIDIA SoCs that use more than one "arm,mmu-500" +items: + - enum: + - nvidia,tegra194-smmu + - const: arm,mmu-500 - items: - const: arm,mmu-500 - const: arm,smmu-v2 -- 2.26.2
RE: [PATCH v6 1/4] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
>Should NVIDIA_TEGRA194_SMMU be a separate value for smmu->model, perhaps? That
>way we avoid this somewhat odd check here.

NVIDIA hasn't made any changes to arm,mmu-500; it is only used in a different topology. A new model would be misleading here. As suggested by Robin, the check can just be moved to the end of the function.

>> diff --git a/drivers/iommu/arm-smmu-nvidia.c
>> b/drivers/iommu/arm-smmu-nvidia.c

>I wonder if it would be better to name this arm-smmu-tegra.c to make it
>clearer that this is for a Tegra chip. We do have regular expressions in
>MAINTAINERS that catch anything with "tegra" in it to make this easier.

>Also, the nsmmu_ prefix looks somewhat odd here. You already use struct
>nvidia_smmu as the name of the structure, so why not be consistent and
>continue to use nvidia_smmu_ as the prefix for function names?
>Or perhaps even use tegra_smmu_ as the prefix to match the filename change I
>suggested earlier.

The prefix can be updated to nvidia_smmu_, as we seem to be okay for now with keeping the file name arm-smmu-nvidia.c, after the vendor name.

>> +#define TLB_LOOP_TIMEOUT 100 /* 1s! */

>USEC_PER_SEC?

It is not meant for a conversion. The timeout variable is reused from arm-smmu.c for the tlb_sync implementation. It can be renamed to TLB_LOOP_TIMEOUT_IN_US.

>> +}
>> +dev_err_ratelimited(smmu->dev,
>> +"TLB sync timed out -- SMMU may be deadlocked\n");

>Same here.
>Also, is there anything we can do when this happens?

This is never expected to happen on silicon. This code and message are reused from arm-smmu.c.

>> +#define nsmmu_page(smmu, inst, page) \
>> +(((inst) ? to_nvidia_smmu(smmu)->bases[(inst)] : smmu->base) + \
>> +((page) << smmu->pgshift))

>Can we simply define to_nvidia_smmu(smmu)->bases[0] = smmu->base in
>nvidia_smmu_impl_init()? Then this would become just:
> to_nvidia_smmu(smmu)->bases[inst] + ((page) << (smmu)->pgshift)

> +
>Maybe add this here to simplify the nsmmu_page() macro above:
> nsmmu->bases[0] = smmu->base;

This is preferred, to avoid the check in nsmmu_page(). But smmu->base is not yet populated when nvidia_smmu_impl_init() is called. Let me look for an alternative place to set it.

-KR
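On the TLB_LOOP_TIMEOUT question raised in this reply, a rough userspace sketch of the kind of spin-then-back-off loop that arm-smmu.c uses for TLB sync may help show why the constant is a bound on a doubling delay rather than a unit conversion. Everything here is an assumption for illustration: the sim_/SIM_ names are hypothetical, the one-million-microsecond value is inferred from the "1s!" comment rather than taken from the quoted text, and the delay is only accumulated in a counter instead of calling udelay().

```c
#include <assert.h>

#define SIM_TIMEOUT_US 1000000u /* assumed: one second, per the "1s!" comment */
#define SIM_SPIN_COUNT 10u      /* polls per round before backing off */

/*
 * poll() returns nonzero once the sync has completed.
 * Returns the simulated microseconds spent waiting, or -1 on timeout.
 */
static long sim_tlb_sync(int (*poll)(void))
{
	unsigned long waited = 0;
	unsigned int delay, spin;

	/* The delay doubles each round until it reaches the timeout bound. */
	for (delay = 1; delay < SIM_TIMEOUT_US; delay *= 2) {
		for (spin = SIM_SPIN_COUNT; spin > 0; spin--)
			if (poll())
				return (long)waited;
		waited += delay; /* stands in for udelay(delay) */
	}
	return -1; /* the driver would log "TLB sync timed out" here */
}

/* Hypothetical pollers used only to exercise the loop. */
static int sim_polls_left;

static int sim_poll_after_n(void)
{
	return --sim_polls_left < 0;
}

static int sim_poll_never(void)
{
	return 0;
}
```

Under these assumptions the delays sum to roughly one second before the loop gives up, which matches the intent of the comment on the define.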
RE: [PATCH v6 2/4] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
>> + - nvdia,tegra194-smmu-500

>The -500 suffix here seems a bit redundant since there's no other type of SMMU
>in Tegra194, correct?

Yes, there is only one type of SMMU supported in T194. The suffix was added to mirror the mmu-500 name; it can be removed.

-KR
[PATCH v6 3/4] iommu/arm-smmu: Add global/context fault implementation hooks
Add global/context fault hooks to allow NVIDIA SMMU implementation handle faults across multiple SMMUs. Signed-off-by: Krishna Reddy --- drivers/iommu/arm-smmu-nvidia.c | 100 drivers/iommu/arm-smmu.c| 11 +++- drivers/iommu/arm-smmu.h| 3 + 3 files changed, 112 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c index dafc293a45217..5999b6a770992 100644 --- a/drivers/iommu/arm-smmu-nvidia.c +++ b/drivers/iommu/arm-smmu-nvidia.c @@ -117,6 +117,104 @@ static int nsmmu_reset(struct arm_smmu_device *smmu) return 0; } +static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom) +{ + return container_of(dom, struct arm_smmu_domain, domain); +} + +static irqreturn_t nsmmu_global_fault_inst(int irq, + struct arm_smmu_device *smmu, + int inst) +{ + u32 gfsr, gfsynr0, gfsynr1, gfsynr2; + + gfsr = readl_relaxed(nsmmu_page(smmu, inst, 0) + ARM_SMMU_GR0_sGFSR); + gfsynr0 = readl_relaxed(nsmmu_page(smmu, inst, 0) + + ARM_SMMU_GR0_sGFSYNR0); + gfsynr1 = readl_relaxed(nsmmu_page(smmu, inst, 0) + + ARM_SMMU_GR0_sGFSYNR1); + gfsynr2 = readl_relaxed(nsmmu_page(smmu, inst, 0) + + ARM_SMMU_GR0_sGFSYNR2); + + if (!gfsr) + return IRQ_NONE; + + dev_err_ratelimited(smmu->dev, + "Unexpected global fault, this could be serious\n"); + dev_err_ratelimited(smmu->dev, + "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 0x%08x\n", + gfsr, gfsynr0, gfsynr1, gfsynr2); + + writel_relaxed(gfsr, nsmmu_page(smmu, inst, 0) + ARM_SMMU_GR0_sGFSR); + return IRQ_HANDLED; +} + +static irqreturn_t nsmmu_global_fault(int irq, void *dev) +{ + int inst; + irqreturn_t irq_ret = IRQ_NONE; + struct arm_smmu_device *smmu = dev; + + for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) { + irq_ret = nsmmu_global_fault_inst(irq, smmu, inst); + if (irq_ret == IRQ_HANDLED) + return irq_ret; + } + + return irq_ret; +} + +static irqreturn_t nsmmu_context_fault_bank(int irq, + struct arm_smmu_device *smmu, + int idx, int inst) +{ + u32 fsr, 
fsynr, cbfrsynra; + unsigned long iova; + + fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR); + if (!(fsr & ARM_SMMU_FSR_FAULT)) + return IRQ_NONE; + + fsynr = readl_relaxed(nsmmu_page(smmu, inst, smmu->numpage + idx) + + ARM_SMMU_CB_FSYNR0); + iova = readq_relaxed(nsmmu_page(smmu, inst, smmu->numpage + idx) + +ARM_SMMU_CB_FAR); + cbfrsynra = readl_relaxed(nsmmu_page(smmu, inst, 1) + + ARM_SMMU_GR1_CBFRSYNRA(idx)); + + dev_err_ratelimited(smmu->dev, + "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n", + fsr, iova, fsynr, cbfrsynra, idx); + + writel_relaxed(fsr, nsmmu_page(smmu, inst, smmu->numpage + idx) + + ARM_SMMU_CB_FSR); + return IRQ_HANDLED; +} + +static irqreturn_t nsmmu_context_fault(int irq, void *dev) +{ + int inst, idx; + irqreturn_t irq_ret = IRQ_NONE; + struct iommu_domain *domain = dev; + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); + struct arm_smmu_device *smmu = smmu_domain->smmu; + + for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) { + /* Interrupt line shared between all context faults. +* Check for faults across all contexts. +*/ + for (idx = 0; idx < smmu->num_context_banks; idx++) { + irq_ret = nsmmu_context_fault_bank(irq, smmu, + idx, inst); + + if (irq_ret == IRQ_HANDLED) + return irq_ret; + } + } + + return irq_ret; +} + static const struct arm_smmu_impl nvidia_smmu_impl = { .read_reg = nsmmu_read_reg, .write_reg = nsmmu_write_reg, @@ -124,6 +222,8 @@ static const struct arm_smmu_impl nvidia_smmu_impl = { .write_reg64 = nsmmu_write_reg64, .reset = nsmmu_reset, .tlb_sync = nsmmu_tlb_sync, + .global_fault = nsmmu_global_fault, + .context_fault = nsmmu_context_fault, }; struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 243bc4cb2705b..d7
[PATCH v6 4/4] iommu/arm-smmu-nvidia: fix the warning reported by kbuild test robot
>> drivers/iommu/arm-smmu-nvidia.c:151:33: sparse: sparse: cast removes >> address space '' of expression Reported-by: kbuild test robot Signed-off-by: Krishna Reddy --- drivers/iommu/arm-smmu-nvidia.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c index 5999b6a770992..6348d8dc17fc2 100644 --- a/drivers/iommu/arm-smmu-nvidia.c +++ b/drivers/iommu/arm-smmu-nvidia.c @@ -248,7 +248,7 @@ struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu) break; nsmmu->bases[i] = devm_ioremap_resource(dev, res); if (IS_ERR(nsmmu->bases[i])) - return (struct arm_smmu_device *)nsmmu->bases[i]; + return ERR_CAST(nsmmu->bases[i]); nsmmu->num_inst++; } -- 2.26.2
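As background for the ERR_CAST() change above, here is a userspace sketch of the error-pointer idiom, modeled on the helpers in linux/err.h but reimplemented with hypothetical sim_ names so it builds outside the kernel. Userspace has no sparse address-space annotations, so this only shows how an error code encoded in one pointer type is handed back as another pointer type without an open-coded struct cast.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SIM_MAX_ERRNO 4095 /* error codes occupy the top 4095 addresses */

/* Encode a negative errno value as a pointer. */
static inline void *sim_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* Decode the errno value from an error pointer. */
static inline long sim_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

/* True if the pointer value lies in the reserved error range. */
static inline int sim_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-SIM_MAX_ERRNO;
}

/* Pass an error pointer through as a generic pointer, dropping its type. */
static inline void *sim_err_cast(const void *p)
{
	return (void *)(uintptr_t)p;
}

struct sim_device {
	int id;
};

/* Hypothetical init: propagates a mapping failure to the caller unchanged. */
static struct sim_device *sim_init(void *mapped)
{
	static struct sim_device dev = { .id = 194 };

	if (sim_is_err(mapped))
		return sim_err_cast(mapped);
	return &dev;
}
```

A caller checks the result with sim_is_err() and recovers the code with sim_ptr_err(), in the same way the core driver would check the return value of the impl init hook.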
[PATCH v6 2/4] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
Add binding for NVIDIA's Tegra194 SoC SMMU that is based on ARM MMU-500. Signed-off-by: Krishna Reddy --- Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 5 + 1 file changed, 5 insertions(+) diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml index e3ef1c69d1326..8f7ffd248f303 100644 --- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml @@ -37,6 +37,11 @@ properties: - qcom,sc7180-smmu-500 - qcom,sdm845-smmu-500 - const: arm,mmu-500 + - description: NVIDIA SoCs that use more than one "arm,mmu-500" +items: + - enum: + - nvidia,tegra194-smmu-500 + - const: arm,mmu-500 - items: - const: arm,mmu-500 - const: arm,smmu-v2 -- 2.26.2
[PATCH v6 1/4] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
NVIDIA's Tegra194 soc uses two ARM MMU-500s together to interleave IOVA accesses across them. Add NVIDIA implementation for dual ARM MMU-500s and add new compatible string for Tegra194 soc. Signed-off-by: Krishna Reddy --- MAINTAINERS | 2 + drivers/iommu/Makefile | 2 +- drivers/iommu/arm-smmu-impl.c | 3 + drivers/iommu/arm-smmu-nvidia.c | 161 drivers/iommu/arm-smmu.h| 1 + 5 files changed, 168 insertions(+), 1 deletion(-) create mode 100644 drivers/iommu/arm-smmu-nvidia.c diff --git a/MAINTAINERS b/MAINTAINERS index 50659d76976b7..118da0893c964 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -16572,9 +16572,11 @@ F: drivers/i2c/busses/i2c-tegra.c TEGRA IOMMU DRIVERS M: Thierry Reding +R: Krishna Reddy L: linux-te...@vger.kernel.org S: Supported F: drivers/iommu/tegra* +F: drivers/iommu/arm-smmu-nvidia.c TEGRA KBC DRIVER M: Laxman Dewangan diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile index 57cf4ba5e27cb..35542df00da72 100644 --- a/drivers/iommu/Makefile +++ b/drivers/iommu/Makefile @@ -15,7 +15,7 @@ obj-$(CONFIG_AMD_IOMMU) += amd_iommu.o amd_iommu_init.o amd_iommu_quirks.o obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd_iommu_debugfs.o obj-$(CONFIG_AMD_IOMMU_V2) += amd_iommu_v2.o obj-$(CONFIG_ARM_SMMU) += arm_smmu.o -arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o +arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o arm-smmu-nvidia.o obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o obj-$(CONFIG_DMAR_TABLE) += dmar.o obj-$(CONFIG_INTEL_IOMMU) += intel-iommu.o intel-pasid.o diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c index c75b9d957b702..52c84c30f83e4 100644 --- a/drivers/iommu/arm-smmu-impl.c +++ b/drivers/iommu/arm-smmu-impl.c @@ -160,6 +160,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu) */ switch (smmu->model) { case ARM_MMU500: + if (of_device_is_compatible(smmu->dev->of_node, + "nvidia,tegra194-smmu-500")) + return nvidia_smmu_impl_init(smmu); smmu->impl = _mmu500_impl; break; case 
CAVIUM_SMMUV2: diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c new file mode 100644 index 0..dafc293a45217 --- /dev/null +++ b/drivers/iommu/arm-smmu-nvidia.c @@ -0,0 +1,161 @@ +// SPDX-License-Identifier: GPL-2.0-only +// Nvidia ARM SMMU v2 implementation quirks +// Copyright (C) 2019 NVIDIA CORPORATION. All rights reserved. + +#define pr_fmt(fmt) "nvidia-smmu: " fmt + +#include +#include +#include +#include +#include + +#include "arm-smmu.h" + +/* Tegra194 has three ARM MMU-500 Instances. + * Two of them are used together for Interleaved IOVA accesses and + * used by Non-Isochronous Hw devices for SMMU translations. + * Third one is used for SMMU translations from Isochronous HW devices. + * It is possible to use this Implementation to program either + * all three or two of the instances identically as desired through + * DT node. + * + * Programming all the three instances identically comes with redundant tlb + * invalidations as all three never need to be tlb invalidated for a HW device. + * + * When Linux Kernel supports multiple SMMU devices, The SMMU device used for + * Isochornous HW devices should be added as a separate ARM MMU-500 device + * in DT and be programmed independently for efficient tlb invalidates. + * + */ +#define MAX_SMMU_INSTANCES 3 + +#define TLB_LOOP_TIMEOUT 100 /* 1s! */ +#define TLB_SPIN_COUNT 10 + +struct nvidia_smmu { + struct arm_smmu_device smmu; + unsigned intnum_inst; + void __iomem*bases[MAX_SMMU_INSTANCES]; +}; + +#define to_nvidia_smmu(s) container_of(s, struct nvidia_smmu, smmu) + +#define nsmmu_page(smmu, inst, page) \ + (((inst) ? 
to_nvidia_smmu(smmu)->bases[(inst)] : smmu->base) + \ + ((page) << smmu->pgshift)) + +static u32 nsmmu_read_reg(struct arm_smmu_device *smmu, + int page, int offset) +{ + return readl_relaxed(nsmmu_page(smmu, 0, page) + offset); +} + +static void nsmmu_write_reg(struct arm_smmu_device *smmu, + int page, int offset, u32 val) +{ + unsigned int i; + + for (i = 0; i < to_nvidia_smmu(smmu)->num_inst; i++) + writel_relaxed(val, nsmmu_page(smmu, i, page) + offset); +} + +static u64 nsmmu_read_reg64(struct arm_smmu_device *smmu, + int page, int offset) +{ + return readq_relaxed(nsmmu_page(smmu, 0, page) + offset); +} + +static void nsmmu_write_reg64(struct arm_smmu_device *smmu, + int page, int offset, u64 val) +{ + unsigned int
[PATCH v6 0/4] Nvidia Arm SMMUv2 Implementation
Changes in v6:
Restricted the patch set to driver specific patches.
Fixed the cast warning reported by kbuild test robot.
Rebased on git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git next

v5 - https://lkml.org/lkml/2020/5/21/1114
v4 - https://lkml.org/lkml/2019/10/30/1054
v3 - https://lkml.org/lkml/2019/10/18/1601
v2 - https://lkml.org/lkml/2019/9/2/980
v1 - https://lkml.org/lkml/2019/8/29/1588

Krishna Reddy (4):
  iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
  dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
  iommu/arm-smmu: Add global/context fault implementation hooks
  iommu/arm-smmu-nvidia: fix the warning reported by kbuild test robot

 .../devicetree/bindings/iommu/arm,smmu.yaml | 5 +
 MAINTAINERS | 2 +
 drivers/iommu/Makefile | 2 +-
 drivers/iommu/arm-smmu-impl.c | 3 +
 drivers/iommu/arm-smmu-nvidia.c | 261 ++
 drivers/iommu/arm-smmu.c | 11 +-
 drivers/iommu/arm-smmu.h | 4 +
 7 files changed, 285 insertions(+), 3 deletions(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c

base-commit: 431275afdc7155415254aef4bd3816a1b8a2ead0
--
2.26.2
RE: [PATCH v5 0/5] Nvidia Arm SMMUv2 Implementation
>For the record: I don't think we should apply these because we don't have a
>good way of testing them. We currently have three problems that prevent us
>from enabling SMMU on Tegra194:

Of the three issues pointed out here, I see only issue 2) as a real blocker for enabling the SMMU HW by default upstream.

>That said, I have tested earlier versions of this patchset on top of my local
>branch with fixes for the above and they do seem to work as expected.
>So I'll leave it up to the IOMMU maintainers whether they're willing to merge
>the driver patches as is.
> But I want to clarify that I won't be applying the DTS patches until we've
> solved all of the above issues and therefore it should be clear that these
> won't be runtime tested until then.

The SMMU driver patches as such are complete and can be used by NVIDIA with a local config change (CONFIG_ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT=n) that turns off disable_bypass. Merging them also protects the driver patches against kernel changes. This config option has already been tested by Nicolin Chen and me.

Robin/Will, can you comment on whether the SMMU driver patches alone (patches 1, 2 and 3 of 5) can be merged without the DT enable patches? Is it reasonable to merge the driver patches alone?

>1) If we enable SMMU support, then the DMA API will automatically try
> to use SMMU domains for allocations. This means that translations
> will happen as soon as a device's IOMMU operations are initialized
> and that is typically a long time (in kernel time at least) before
> a driver is bound and has a chance of configuring the device.
> This causes problems for non-quiesced devices like display
> controllers that the bootloader might have set up to scan out a
> boot splash.
> What we're missing here is a way to:
> a) advertise reserved memory regions for boot splash framebuffers
> b) map reserved memory regions early during SMMU setup
> Patches have been floating on the public mailing lists for b) but
> a) requires changes to the bootloader (both proprietary ones and
> U-Boot for SoCs prior to Tegra194).

This happens if SMMU translations are enabled for display before the reserved memory regions issue is fixed. This issue is not a real blocker for enabling the SMMU.

> 2) Even if we don't enable SMMU for a given device (by not hooking up
> the iommus property), with a default kernel configuration we get a
> bunch of faults during boot because the ARM SMMU driver faults by
> default (rather than bypass) for masters which aren't hooked up to
> the SMMU.
> We could work around that by changing the default configuration or
> overriding it on the command-line, but that's not really an option
> because it decreases security and means that Tegra194 won't work
> out-of-the-box.

This is the real issue that blocks enabling the SMMU. The USF faults for devices that don't have SMMU translations enabled should be fixed or worked around before the SMMU can be enabled. We should look at keeping the SID as 0x7F for devices that can't have the SMMU enabled yet; SID 0x7F bypasses the SMMU externally.

> 3) We don't properly describe the DMA hierarchy, which causes the DMA
> masks to be improperly set. As a bit of background: Tegra194 has a
> special address bit (bit 39) that causes some swizzling to happen
> within the memory controller. As a result, any I/O virtual address
> that has bit 39 set will cause this swizzling to happen on access.
> The DMA/IOMMU allocator always starts allocating from the top of
> the IOVA space, which means that the first couple of gigabytes of
> allocations will cause most devices to fail because of the
> undesired swizzling that occurs.
> We had an initial patch for SDHCI merged that hard-codes the DMA > mask to DMA_BIT_MASK(39) on Tegra194 to work around that. However, > the devices all do support addressing 40 bits and the restriction > on bit 39 is really a property of the bus rather than a capability > of the device. This means that we would have to work around this > for every device driver by adding similar hacks. A better option is > to properly describe the DMA hierarchy (using dma-ranges) because > that will then automatically be applied as a constraint on each > device's DMA mask. > I have been working on patches to address this, but they are fairly > involved because they require device tree bindings changes and so > on. Dma_mask issue is again outside SMMU driver and as long as the clients with Dma_mask issue don't have SMMU enabled, it would be fine. SDHCI can have SMMU enabled in upstream as soon as issue 2 is taken care. >So before we solve all of the above issues we can't really enable SMMU on >Tegra194 and hence won't be able to test it. As such we don't know if these >patches even work, nor can we validate that they continue to work. >As such, I don't think there's any use in applying these patches upstream >since they
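A minimal user-space sketch of the bit 39 constraint described in issue 3), assuming only what the thread states: an I/O virtual address with bit 39 set is swizzled by the Tegra194 memory controller, and a 39-bit DMA mask keeps allocations below that bit. `DMA_BIT_MASK()` is reproduced with the kernel's definition; `T194_SWIZZLE_BIT`, `iova_triggers_swizzle()` and `mask_avoids_swizzle()` are hypothetical names used for illustration and are not part of any patch in this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same definition as the kernel's DMA_BIT_MASK(). */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical name: the address bit that triggers swizzling (per the thread). */
#define T194_SWIZZLE_BIT 39

/* True if an access to this IOVA would be swizzled by the memory controller. */
static bool iova_triggers_swizzle(uint64_t iova)
{
	return ((iova >> T194_SWIZZLE_BIT) & 1) != 0;
}

/*
 * True if no address permitted by this DMA mask can have the swizzle bit
 * set. A top-down allocator starts near the top of the mask, so a mask that
 * includes bit 39 hands out swizzled addresses first.
 */
static bool mask_avoids_swizzle(uint64_t dma_mask)
{
	return (dma_mask & (1ULL << T194_SWIZZLE_BIT)) == 0;
}
```

With these helpers, `mask_avoids_swizzle(DMA_BIT_MASK(40))` is false and `mask_avoids_swizzle(DMA_BIT_MASK(39))` is true, which is why the SDHCI workaround mentioned above hard-codes a 39-bit mask even though the device can address 40 bits.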
[PATCH v5 1/5] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
NVIDIA's Tegra194 soc uses two ARM MMU-500s together to interleave IOVA accesses across them. Add NVIDIA implementation for dual ARM MMU-500s and add new compatible string for Tegra194 soc. Signed-off-by: Krishna Reddy --- MAINTAINERS | 2 + drivers/iommu/Makefile | 2 +- drivers/iommu/arm-smmu-impl.c | 3 + drivers/iommu/arm-smmu-nvidia.c | 161 drivers/iommu/arm-smmu.h| 1 + 5 files changed, 168 insertions(+), 1 deletion(-) create mode 100644 drivers/iommu/arm-smmu-nvidia.c diff --git a/MAINTAINERS b/MAINTAINERS index ecc0749810b0..0d8c966ecf17 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -16560,9 +16560,11 @@ F: drivers/i2c/busses/i2c-tegra.c TEGRA IOMMU DRIVERS M: Thierry Reding +R: Krishna Reddy L: linux-te...@vger.kernel.org S: Supported F: drivers/iommu/tegra* +F: drivers/iommu/arm-smmu-nvidia.c TEGRA KBC DRIVER M: Laxman Dewangan diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile index 57cf4ba5e27c..35542df00da7 100644 --- a/drivers/iommu/Makefile +++ b/drivers/iommu/Makefile @@ -15,7 +15,7 @@ obj-$(CONFIG_AMD_IOMMU) += amd_iommu.o amd_iommu_init.o amd_iommu_quirks.o obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd_iommu_debugfs.o obj-$(CONFIG_AMD_IOMMU_V2) += amd_iommu_v2.o obj-$(CONFIG_ARM_SMMU) += arm_smmu.o -arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o +arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o arm-smmu-nvidia.o obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o obj-$(CONFIG_DMAR_TABLE) += dmar.o obj-$(CONFIG_INTEL_IOMMU) += intel-iommu.o intel-pasid.o diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c index 74d97a886e93..dcdd513323aa 100644 --- a/drivers/iommu/arm-smmu-impl.c +++ b/drivers/iommu/arm-smmu-impl.c @@ -158,6 +158,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu) */ switch (smmu->model) { case ARM_MMU500: + if (of_device_is_compatible(smmu->dev->of_node, + "nvidia,tegra194-smmu-500")) + return nvidia_smmu_impl_init(smmu); smmu->impl = _mmu500_impl; break; case 
CAVIUM_SMMUV2: diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c new file mode 100644 index ..dafc293a4521 --- /dev/null +++ b/drivers/iommu/arm-smmu-nvidia.c @@ -0,0 +1,161 @@ +// SPDX-License-Identifier: GPL-2.0-only +// Nvidia ARM SMMU v2 implementation quirks +// Copyright (C) 2019 NVIDIA CORPORATION. All rights reserved. + +#define pr_fmt(fmt) "nvidia-smmu: " fmt + +#include +#include +#include +#include +#include + +#include "arm-smmu.h" + +/* Tegra194 has three ARM MMU-500 Instances. + * Two of them are used together for Interleaved IOVA accesses and + * used by Non-Isochronous Hw devices for SMMU translations. + * Third one is used for SMMU translations from Isochronous HW devices. + * It is possible to use this Implementation to program either + * all three or two of the instances identically as desired through + * DT node. + * + * Programming all the three instances identically comes with redundant tlb + * invalidations as all three never need to be tlb invalidated for a HW device. + * + * When Linux Kernel supports multiple SMMU devices, The SMMU device used for + * Isochornous HW devices should be added as a separate ARM MMU-500 device + * in DT and be programmed independently for efficient tlb invalidates. + * + */ +#define MAX_SMMU_INSTANCES 3 + +#define TLB_LOOP_TIMEOUT 100 /* 1s! */ +#define TLB_SPIN_COUNT 10 + +struct nvidia_smmu { + struct arm_smmu_device smmu; + unsigned intnum_inst; + void __iomem*bases[MAX_SMMU_INSTANCES]; +}; + +#define to_nvidia_smmu(s) container_of(s, struct nvidia_smmu, smmu) + +#define nsmmu_page(smmu, inst, page) \ + (((inst) ? 
to_nvidia_smmu(smmu)->bases[(inst)] : smmu->base) + \ + ((page) << smmu->pgshift)) + +static u32 nsmmu_read_reg(struct arm_smmu_device *smmu, + int page, int offset) +{ + return readl_relaxed(nsmmu_page(smmu, 0, page) + offset); +} + +static void nsmmu_write_reg(struct arm_smmu_device *smmu, + int page, int offset, u32 val) +{ + unsigned int i; + + for (i = 0; i < to_nvidia_smmu(smmu)->num_inst; i++) + writel_relaxed(val, nsmmu_page(smmu, i, page) + offset); +} + +static u64 nsmmu_read_reg64(struct arm_smmu_device *smmu, + int page, int offset) +{ + return readq_relaxed(nsmmu_page(smmu, 0, page) + offset); +} + +static void nsmmu_write_reg64(struct arm_smmu_device *smmu, + int page, int offset, u64 val) +{ + unsigned int
[PATCH v5 4/5] arm64: tegra: Add DT node for T194 SMMU
Add DT node for T194 SMMU to enable SMMU support. Signed-off-by: Krishna Reddy --- arch/arm64/boot/dts/nvidia/tegra194.dtsi | 77 1 file changed, 77 insertions(+) diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi b/arch/arm64/boot/dts/nvidia/tegra194.dtsi index f4ede86e32b4..f7c4399afb55 100644 --- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi +++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi @@ -1620,6 +1620,83 @@ pcie@141a { 0x8200 0x0 0x4000 0x1f 0x4000 0x0 0xc000>; /* non-prefetchable memory (3GB) */ }; + smmu: iommu@1200 { + compatible = "arm,mmu-500","nvidia,tegra194-smmu-500"; + reg = <0 0x1200 0 0x80>, + <0 0x1100 0 0x80>, + <0 0x1000 0 0x80>; + interrupts = , +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +; + stream-match-mask = <0x7f80>; + #global-interrupts = <3>; + #iommu-cells = <1>; + }; + pcie_ep@1416 { compatible = "nvidia,tegra194-pcie-ep", "snps,dw-pcie-ep"; power-domains = < TEGRA194_POWER_DOMAIN_PCIEX4A>; -- 2.26.2
[PATCH v5 3/5] iommu/arm-smmu: Add global/context fault implementation hooks
Add global/context fault hooks to allow NVIDIA SMMU implementation handle faults across multiple SMMUs. Signed-off-by: Krishna Reddy --- drivers/iommu/arm-smmu-nvidia.c | 100 drivers/iommu/arm-smmu.c| 11 +++- drivers/iommu/arm-smmu.h| 3 + 3 files changed, 112 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c index dafc293a4521..5999b6a77099 100644 --- a/drivers/iommu/arm-smmu-nvidia.c +++ b/drivers/iommu/arm-smmu-nvidia.c @@ -117,6 +117,104 @@ static int nsmmu_reset(struct arm_smmu_device *smmu) return 0; } +static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom) +{ + return container_of(dom, struct arm_smmu_domain, domain); +} + +static irqreturn_t nsmmu_global_fault_inst(int irq, + struct arm_smmu_device *smmu, + int inst) +{ + u32 gfsr, gfsynr0, gfsynr1, gfsynr2; + + gfsr = readl_relaxed(nsmmu_page(smmu, inst, 0) + ARM_SMMU_GR0_sGFSR); + gfsynr0 = readl_relaxed(nsmmu_page(smmu, inst, 0) + + ARM_SMMU_GR0_sGFSYNR0); + gfsynr1 = readl_relaxed(nsmmu_page(smmu, inst, 0) + + ARM_SMMU_GR0_sGFSYNR1); + gfsynr2 = readl_relaxed(nsmmu_page(smmu, inst, 0) + + ARM_SMMU_GR0_sGFSYNR2); + + if (!gfsr) + return IRQ_NONE; + + dev_err_ratelimited(smmu->dev, + "Unexpected global fault, this could be serious\n"); + dev_err_ratelimited(smmu->dev, + "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 0x%08x\n", + gfsr, gfsynr0, gfsynr1, gfsynr2); + + writel_relaxed(gfsr, nsmmu_page(smmu, inst, 0) + ARM_SMMU_GR0_sGFSR); + return IRQ_HANDLED; +} + +static irqreturn_t nsmmu_global_fault(int irq, void *dev) +{ + int inst; + irqreturn_t irq_ret = IRQ_NONE; + struct arm_smmu_device *smmu = dev; + + for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) { + irq_ret = nsmmu_global_fault_inst(irq, smmu, inst); + if (irq_ret == IRQ_HANDLED) + return irq_ret; + } + + return irq_ret; +} + +static irqreturn_t nsmmu_context_fault_bank(int irq, + struct arm_smmu_device *smmu, + int idx, int inst) +{ + u32 fsr, 
fsynr, cbfrsynra; + unsigned long iova; + + fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR); + if (!(fsr & ARM_SMMU_FSR_FAULT)) + return IRQ_NONE; + + fsynr = readl_relaxed(nsmmu_page(smmu, inst, smmu->numpage + idx) + + ARM_SMMU_CB_FSYNR0); + iova = readq_relaxed(nsmmu_page(smmu, inst, smmu->numpage + idx) + +ARM_SMMU_CB_FAR); + cbfrsynra = readl_relaxed(nsmmu_page(smmu, inst, 1) + + ARM_SMMU_GR1_CBFRSYNRA(idx)); + + dev_err_ratelimited(smmu->dev, + "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n", + fsr, iova, fsynr, cbfrsynra, idx); + + writel_relaxed(fsr, nsmmu_page(smmu, inst, smmu->numpage + idx) + + ARM_SMMU_CB_FSR); + return IRQ_HANDLED; +} + +static irqreturn_t nsmmu_context_fault(int irq, void *dev) +{ + int inst, idx; + irqreturn_t irq_ret = IRQ_NONE; + struct iommu_domain *domain = dev; + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); + struct arm_smmu_device *smmu = smmu_domain->smmu; + + for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) { + /* Interrupt line shared between all context faults. +* Check for faults across all contexts. +*/ + for (idx = 0; idx < smmu->num_context_banks; idx++) { + irq_ret = nsmmu_context_fault_bank(irq, smmu, + idx, inst); + + if (irq_ret == IRQ_HANDLED) + return irq_ret; + } + } + + return irq_ret; +} + static const struct arm_smmu_impl nvidia_smmu_impl = { .read_reg = nsmmu_read_reg, .write_reg = nsmmu_write_reg, @@ -124,6 +222,8 @@ static const struct arm_smmu_impl nvidia_smmu_impl = { .write_reg64 = nsmmu_write_reg64, .reset = nsmmu_reset, .tlb_sync = nsmmu_tlb_sync, + .global_fault = nsmmu_global_fault, + .context_fault = nsmmu_context_fault, }; struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index e622f4e33379..9
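The fault handlers in this patch walk every SMMU instance and every context bank because the interrupt line is shared, stopping at the first fault found and clearing it by writing the status value back. A minimal user-space sketch of that scan follows, assuming a plain array in place of the MMIO registers. `struct fake_smmu`, `FSR_FAULT` and the function names are hypothetical, and the sketch reads the fault status from the instance being scanned, which is an assumption of the sketch rather than a statement about the patch.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_INST  3
#define NUM_BANKS 4
#define FSR_FAULT 0x1u	/* hypothetical fault bit for this sketch */

enum irq_ret { IRQ_NONE = 0, IRQ_HANDLED = 1 };

/* Hypothetical stand-in for the per-instance, per-bank FSR registers. */
struct fake_smmu {
	unsigned int num_inst;
	uint32_t fsr[MAX_INST][NUM_BANKS];
};

/* Check one context bank on one instance; clear the fault if present. */
static enum irq_ret context_fault_bank(struct fake_smmu *s,
				       unsigned int inst, unsigned int idx)
{
	uint32_t fsr = s->fsr[inst][idx];

	if (!(fsr & FSR_FAULT))
		return IRQ_NONE;

	/* Model write-one-to-clear: writing the value back clears those bits. */
	s->fsr[inst][idx] &= ~fsr;
	return IRQ_HANDLED;
}

/* Shared interrupt line: scan all instances and banks, stop at first hit. */
static enum irq_ret context_fault(struct fake_smmu *s)
{
	unsigned int inst, idx;

	for (inst = 0; inst < s->num_inst; inst++)
		for (idx = 0; idx < NUM_BANKS; idx++)
			if (context_fault_bank(s, inst, idx) == IRQ_HANDLED)
				return IRQ_HANDLED;

	return IRQ_NONE;
}
```

Returning at the first handled fault keeps each invocation short; if the line is level-triggered and another bank still has a fault pending, the handler would simply run again.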
[PATCH v5 5/5] arm64: tegra: enable SMMU for SDHCI and EQOS on T194
Enable SMMU translations for SDHCI and EQOS transactions on T194. Signed-off-by: Krishna Reddy --- arch/arm64/boot/dts/nvidia/tegra194.dtsi | 4 1 file changed, 4 insertions(+) diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi b/arch/arm64/boot/dts/nvidia/tegra194.dtsi index f7c4399afb55..706bbb439dcd 100644 --- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi +++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi @@ -59,6 +59,7 @@ ethernet@249 { clock-names = "master_bus", "slave_bus", "rx", "tx", "ptp_ref"; resets = < TEGRA194_RESET_EQOS>; reset-names = "eqos"; + iommus = < TEGRA194_SID_EQOS>; status = "disabled"; snps,write-requests = <1>; @@ -457,6 +458,7 @@ sdmmc1: sdhci@340 { clock-names = "sdhci"; resets = < TEGRA194_RESET_SDMMC1>; reset-names = "sdhci"; + iommus = < TEGRA194_SID_SDMMC1>; nvidia,pad-autocal-pull-up-offset-3v3-timeout = <0x07>; nvidia,pad-autocal-pull-down-offset-3v3-timeout = @@ -479,6 +481,7 @@ sdmmc3: sdhci@344 { clock-names = "sdhci"; resets = < TEGRA194_RESET_SDMMC3>; reset-names = "sdhci"; + iommus = < TEGRA194_SID_SDMMC3>; nvidia,pad-autocal-pull-up-offset-1v8 = <0x00>; nvidia,pad-autocal-pull-down-offset-1v8 = <0x7a>; nvidia,pad-autocal-pull-up-offset-3v3-timeout = <0x07>; @@ -506,6 +509,7 @@ sdmmc4: sdhci@346 { < TEGRA194_CLK_PLLC4>; resets = < TEGRA194_RESET_SDMMC4>; reset-names = "sdhci"; + iommus = < TEGRA194_SID_SDMMC4>; nvidia,pad-autocal-pull-up-offset-hs400 = <0x00>; nvidia,pad-autocal-pull-down-offset-hs400 = <0x00>; nvidia,pad-autocal-pull-up-offset-1v8-timeout = <0x0a>; -- 2.26.2
[PATCH v5 2/5] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
Add binding for NVIDIA's Tegra194 SoC SMMU that is based on ARM MMU-500. Signed-off-by: Krishna Reddy --- Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 5 + 1 file changed, 5 insertions(+) diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml index 6515dbe47508..78aba7dd5a61 100644 --- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml @@ -37,6 +37,11 @@ properties: - qcom,sc7180-smmu-500 - qcom,sdm845-smmu-500 - const: arm,mmu-500 + - description: NVIDIA SoCs that use more than one "arm,mmu-500" +items: + - enum: + - nvidia,tegra194-smmu-500 + - const: arm,mmu-500 - items: - const: arm,mmu-500 - const: arm,smmu-v2 -- 2.26.2
[PATCH v5 0/5] Nvidia Arm SMMUv2 Implementation
Changes in v5:
Rebased on top of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git next

v4 - https://lkml.org/lkml/2019/10/30/1054
v3 - https://lkml.org/lkml/2019/10/18/1601
v2 - https://lkml.org/lkml/2019/9/2/980
v1 - https://lkml.org/lkml/2019/8/29/1588

Krishna Reddy (5):
  iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
  dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
  iommu/arm-smmu: Add global/context fault implementation hooks
  arm64: tegra: Add DT node for T194 SMMU
  arm64: tegra: enable SMMU for SDHCI and EQOS on T194

 .../devicetree/bindings/iommu/arm,smmu.yaml | 5 +
 MAINTAINERS | 2 +
 arch/arm64/boot/dts/nvidia/tegra194.dtsi | 81 ++
 drivers/iommu/Makefile | 2 +-
 drivers/iommu/arm-smmu-impl.c | 3 +
 drivers/iommu/arm-smmu-nvidia.c | 261 ++
 drivers/iommu/arm-smmu.c | 11 +-
 drivers/iommu/arm-smmu.h | 4 +
 8 files changed, 366 insertions(+), 3 deletions(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c

base-commit: 365f8d504da50feaebf826d180113529c9383670
--
2.26.2
RE: [PATCH v3 0/7] Nvidia Arm SMMUv2 Implementation
>>https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=for-joerg/arm-smmu/updates Thanks Will! Let me rebase my patches on top of this branch and send it out. -KR
RE: [PATCH v3 0/7] Nvidia Arm SMMUv2 Implementation
Hi Robin,

>> Apologies for crossed wires, but I had a series getting rid of
>> arm_smmu_flush_ops which was also meant to end up making things a bit easier
>> for you:

I was looking to rebase on top of your changes first. Then I read Will's reply, which said that your work is queued for 5.5. Let me know if these patches need to be rebased on top of iommu/devel or a different branch. I can resend the patch set on top of the necessary branch.

-KR
[PATCH v2 0/7] Nvidia Arm SMMUv2 Implementation
Changes in v2:
- Prepare arm_smmu_flush_ops for override.
- Remove NVIDIA_SMMUv2 and use the ARM_SMMUv2 model, as the T194 SMMU hasn't modified ARM MMU-500.
- Add T194-specific compatible string - "nvidia,tegra194-smmu"
- Remove the tlb_sync hook added in v1 and override arm_smmu_flush_ops->tlb_sync() from the implementation.
- Register implementation-specific context/global fault hooks directly for irq handling.
- Update the global/context interrupt list in DT and the relevant fault handling code in arm-smmu-nvidia.c.
- Implement a reset hook in arm-smmu-nvidia.c to clear irq status and sync tlb.

v1 - https://lkml.org/lkml/2019/8/29/1588

Krishna Reddy (7):
  iommu/arm-smmu: prepare arm_smmu_flush_ops for override
  iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
  dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
  iommu/arm-smmu: Add global/context fault implementation hooks
  arm64: tegra: Add Memory controller DT node on T194
  arm64: tegra: Add DT node for T194 SMMU
  arm64: tegra: enable SMMU for SDHCI and EQOS on T194

 .../devicetree/bindings/iommu/arm,smmu.txt | 4 +
 MAINTAINERS | 2 +
 arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi | 4 +
 arch/arm64/boot/dts/nvidia/tegra194.dtsi | 88 +++
 drivers/iommu/Makefile | 2 +-
 drivers/iommu/arm-smmu-impl.c | 3 +
 drivers/iommu/arm-smmu-nvidia.c | 287 +
 drivers/iommu/arm-smmu.c | 27 +-
 drivers/iommu/arm-smmu.h | 8 +-
 9 files changed, 413 insertions(+), 12 deletions(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c

--
2.1.4
[PATCH v2 3/7] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
Add binding for NVIDIA's Tegra194 Soc SMMU that is based on ARM MMU-500. Signed-off-by: Krishna Reddy --- Documentation/devicetree/bindings/iommu/arm,smmu.txt | 4 1 file changed, 4 insertions(+) diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt b/Documentation/devicetree/bindings/iommu/arm,smmu.txt index 3133f3b..1d72fac 100644 --- a/Documentation/devicetree/bindings/iommu/arm,smmu.txt +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt @@ -31,6 +31,10 @@ conditions. as below, SoC-specific compatibles: "qcom,sdm845-smmu-500", "arm,mmu-500" + NVIDIA SoCs that use more than one ARM MMU-500 together + needs following SoC-specific compatibles along with "arm,mmu-500": + "nvidia,tegra194-smmu" + - reg : Base address and size of the SMMU. - #global-interrupts : The number of global interrupts exposed by the -- 2.1.4
RE: [PATCH 1/7] iommu/arm-smmu: add Nvidia SMMUv2 implementation
>>> +ARM_SMMU_MATCH_DATA(nvidia_smmuv2, ARM_SMMU_V2, NVIDIA_SMMUV2);

>> The ARM MMU-500 implementation is unmodified. It is the way they are
>> integrated and used together (for interleaved accesses) that is different
>> from a regular ARM MMU-500.
>> I have added it to get the model number and to be able to differentiate the
>> SMMU implementation in arm-smmu-impl.c.

> In that case, I would rather keep smmu->model representing the MMU-500
> microarchitecture - since you'll still want to pick up errata workarounds
> etc. for that - and detect the Tegra integration via an explicit
> of_device_is_compatible() check in arm_smmu_impl_init().

Looks good to me.

> For comparison, under ACPI we'd probably have to detect integration details
> by looking at table headers, separately from the IORT "Model" field, so I'd
> prefer if the DT vs. ACPI handling didn't diverge more than necessary.

ACPI support for T194 can be added in subsequent patches as needed. For now, I am updating it for DT support.

-KR
[PATCH 4/7] iommu/arm-smmu: Add global/context fault implementation hooks
Add global/context fault hooks to allow Nvidia SMMU implementation handle faults across multiple SMMUs. Signed-off-by: Krishna Reddy --- drivers/iommu/arm-smmu-nvidia.c | 127 drivers/iommu/arm-smmu.c| 6 ++ drivers/iommu/arm-smmu.h| 4 ++ 3 files changed, 137 insertions(+) diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c index a429b2c..b2a3c49 100644 --- a/drivers/iommu/arm-smmu-nvidia.c +++ b/drivers/iommu/arm-smmu-nvidia.c @@ -14,6 +14,10 @@ #define NUM_SMMU_INSTANCES 3 +static irqreturn_t nsmmu_context_fault_inst(int irq, + struct arm_smmu_device *smmu, + int idx, int inst); + struct nvidia_smmu { struct arm_smmu_device smmu; int num_inst; @@ -87,12 +91,135 @@ static void nsmmu_tlb_sync(struct arm_smmu_device *smmu, int page, nsmmu_tlb_sync_wait(smmu, page, sync, status, i); } +static irqreturn_t nsmmu_global_fault_inst(int irq, + struct arm_smmu_device *smmu, + int inst) +{ + u32 gfsr, gfsynr0, gfsynr1, gfsynr2; + + gfsr = readl_relaxed(nsmmu_page(smmu, inst, 0) + ARM_SMMU_GR0_sGFSR); + gfsynr0 = readl_relaxed(nsmmu_page(smmu, inst, 0) + + ARM_SMMU_GR0_sGFSYNR0); + gfsynr1 = readl_relaxed(nsmmu_page(smmu, inst, 0) + + ARM_SMMU_GR0_sGFSYNR1); + gfsynr2 = readl_relaxed(nsmmu_page(smmu, inst, 0) + + ARM_SMMU_GR0_sGFSYNR2); + + if (!gfsr) + return IRQ_NONE; + + dev_err_ratelimited(smmu->dev, + "Unexpected global fault, this could be serious\n"); + dev_err_ratelimited(smmu->dev, + "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 0x%08x\n", + gfsr, gfsynr0, gfsynr1, gfsynr2); + + writel_relaxed(gfsr, nsmmu_page(smmu, inst, 0) + ARM_SMMU_GR0_sGFSR); + return IRQ_HANDLED; +} + +static irqreturn_t nsmmu_global_fault(int irq, struct arm_smmu_device *smmu) +{ + int i; + irqreturn_t irq_ret = IRQ_NONE; + + /* Interrupt line is shared between global and context faults. +* Check for both type of interrupts on either fault handlers. 
+*/ + for (i = 0; i < to_nsmmu(smmu)->num_inst; i++) { + irq_ret = nsmmu_context_fault_inst(irq, smmu, 0, i); + if (irq_ret == IRQ_HANDLED) + return irq_ret; + } + + for (i = 0; i < to_nsmmu(smmu)->num_inst; i++) { + irq_ret = nsmmu_global_fault_inst(irq, smmu, i); + if (irq_ret == IRQ_HANDLED) + return irq_ret; + } + + return irq_ret; +} + +static irqreturn_t nsmmu_context_fault_bank(int irq, + struct arm_smmu_device *smmu, + int idx, int inst) +{ + u32 fsr, fsynr, cbfrsynra; + unsigned long iova; + + fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR); + if (!(fsr & FSR_FAULT)) + return IRQ_NONE; + + fsynr = readl_relaxed(nsmmu_page(smmu, inst, smmu->numpage + idx) + + ARM_SMMU_CB_FSYNR0); + iova = readq_relaxed(nsmmu_page(smmu, inst, smmu->numpage + idx) + +ARM_SMMU_CB_FAR); + cbfrsynra = readl_relaxed(nsmmu_page(smmu, inst, 1) + + ARM_SMMU_GR1_CBFRSYNRA(idx)); + + dev_err_ratelimited(smmu->dev, + "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n", + fsr, iova, fsynr, cbfrsynra, idx); + + writel_relaxed(fsr, nsmmu_page(smmu, inst, smmu->numpage + idx) + + ARM_SMMU_CB_FSR); + return IRQ_HANDLED; +} + +static irqreturn_t nsmmu_context_fault_inst(int irq, + struct arm_smmu_device *smmu, + int idx, int inst) +{ + irqreturn_t irq_ret = IRQ_NONE; + + /* Interrupt line shared between global and all context faults. +* Check for faults across all contexts. +*/ + for (idx = 0; idx < smmu->num_context_banks; idx++) { + irq_ret = nsmmu_context_fault_bank(irq, smmu, idx, inst); + + if (irq_ret == IRQ_HANDLED) + break; + } + + return irq_ret; +} + +static irqreturn_t nsmmu_context_fault(int irq, + struct arm_smmu_device *smmu, + int cbndx) +{ + int i; + irqreturn_t irq_ret = IRQ_NONE; + + /* Interrupt line is shared between g
[PATCH 5/7] arm64: tegra: Add Memory controller DT node on T194
Add Memory controller DT node on T194 and enable it. This patch is a prerequisite for SMMU enable on T194. Signed-off-by: Krishna Reddy --- arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi | 4 arch/arm64/boot/dts/nvidia/tegra194.dtsi | 7 +++ 2 files changed, 11 insertions(+) diff --git a/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi b/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi index 62e07e11..4b3441b 100644 --- a/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi +++ b/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi @@ -47,6 +47,10 @@ }; }; + memory-controller@2c0 { + status = "okay"; + }; + serial@311 { status = "okay"; }; diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi b/arch/arm64/boot/dts/nvidia/tegra194.dtsi index adebbbf..d906958 100644 --- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi +++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi @@ -6,6 +6,7 @@ #include #include #include +#include / { compatible = "nvidia,tegra194"; @@ -130,6 +131,12 @@ }; }; + memory-controller@2c0 { + compatible = "nvidia,tegra186-mc"; + reg = <0x02c0 0xb>; + status = "disabled"; + }; + uarta: serial@310 { compatible = "nvidia,tegra194-uart", "nvidia,tegra20-uart"; reg = <0x0310 0x40>; -- 2.1.4
[PATCH 1/7] iommu/arm-smmu: add Nvidia SMMUv2 implementation
Add Nvidia SMMUv2 implementation and model info. Signed-off-by: Krishna Reddy --- MAINTAINERS | 2 + drivers/iommu/Makefile | 2 +- drivers/iommu/arm-smmu-impl.c | 2 + drivers/iommu/arm-smmu-nvidia.c | 97 + drivers/iommu/arm-smmu.c| 2 + drivers/iommu/arm-smmu.h| 2 + 6 files changed, 106 insertions(+), 1 deletion(-) create mode 100644 drivers/iommu/arm-smmu-nvidia.c diff --git a/MAINTAINERS b/MAINTAINERS index 289fb06..b9d59e51 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -15785,9 +15785,11 @@ F: drivers/i2c/busses/i2c-tegra.c TEGRA IOMMU DRIVERS M: Thierry Reding +R: Krishna Reddy L: linux-te...@vger.kernel.org S: Supported F: drivers/iommu/tegra* +F: drivers/iommu/arm-smmu-nvidia.c TEGRA KBC DRIVER M: Laxman Dewangan diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile index a2729aa..7f5489e 100644 --- a/drivers/iommu/Makefile +++ b/drivers/iommu/Makefile @@ -13,7 +13,7 @@ obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o obj-$(CONFIG_AMD_IOMMU) += amd_iommu.o amd_iommu_init.o obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd_iommu_debugfs.o obj-$(CONFIG_AMD_IOMMU_V2) += amd_iommu_v2.o -obj-$(CONFIG_ARM_SMMU) += arm-smmu.o arm-smmu-impl.o +obj-$(CONFIG_ARM_SMMU) += arm-smmu.o arm-smmu-impl.o arm-smmu-nvidia.o obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o obj-$(CONFIG_DMAR_TABLE) += dmar.o obj-$(CONFIG_INTEL_IOMMU) += intel-iommu.o intel-pasid.o diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c index 5c87a38..e5e595f 100644 --- a/drivers/iommu/arm-smmu-impl.c +++ b/drivers/iommu/arm-smmu-impl.c @@ -162,6 +162,8 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu) break; case CAVIUM_SMMUV2: return cavium_smmu_impl_init(smmu); + case NVIDIA_SMMUV2: + return nvidia_smmu_impl_init(smmu); default: break; } diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c new file mode 100644 index 000..d93ceda --- /dev/null +++ b/drivers/iommu/arm-smmu-nvidia.c @@ -0,0 +1,97 @@ +// SPDX-License-Identifier: GPL-2.0-only +// 
Nvidia ARM SMMU v2 implementation quirks +// Copyright (C) 2019 NVIDIA CORPORATION. All rights reserved. + +#define pr_fmt(fmt) "nvidia-smmu: " fmt + +#include +#include +#include +#include +#include + +#include "arm-smmu.h" + +#define NUM_SMMU_INSTANCES 3 + +struct nvidia_smmu { + struct arm_smmu_device smmu; + int num_inst; + void __iomem*bases[NUM_SMMU_INSTANCES]; +}; + +#define to_nsmmu(s)container_of(s, struct nvidia_smmu, smmu) + +#define nsmmu_page(smmu, inst, page) \ + (((inst) ? to_nsmmu(smmu)->bases[(inst)] : smmu->base) + \ + ((page) << smmu->pgshift)) + +static u32 nsmmu_read_reg(struct arm_smmu_device *smmu, + int page, int offset) +{ + return readl_relaxed(nsmmu_page(smmu, 0, page) + offset); +} + +static void nsmmu_write_reg(struct arm_smmu_device *smmu, + int page, int offset, u32 val) +{ + int i; + + for (i = 0; i < to_nsmmu(smmu)->num_inst; i++) + writel_relaxed(val, nsmmu_page(smmu, i, page) + offset); +} + +static u64 nsmmu_read_reg64(struct arm_smmu_device *smmu, + int page, int offset) +{ + return readq_relaxed(nsmmu_page(smmu, 0, page) + offset); +} + +static void nsmmu_write_reg64(struct arm_smmu_device *smmu, + int page, int offset, u64 val) +{ + int i; + + for (i = 0; i < to_nsmmu(smmu)->num_inst; i++) + writeq_relaxed(val, nsmmu_page(smmu, i, page) + offset); +} + +static const struct arm_smmu_impl nsmmu_impl = { + .read_reg = nsmmu_read_reg, + .write_reg = nsmmu_write_reg, + .read_reg64 = nsmmu_read_reg64, + .write_reg64 = nsmmu_write_reg64, +}; + +struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu) +{ + int i; + struct nvidia_smmu *nsmmu; + struct resource *res; + struct device *dev = smmu->dev; + struct platform_device *pdev = to_platform_device(smmu->dev); + + nsmmu = devm_kzalloc(smmu->dev, sizeof(*nsmmu), GFP_KERNEL); + if (!nsmmu) + return ERR_PTR(-ENOMEM); + + nsmmu->smmu = *smmu; + /* Instance 0 is ioremapped by arm-smmu.c */ + nsmmu->num_inst = 1; + + for (i = 1; i < NUM_SMMU_INSTANCES; i++) { + res = 
platform_get_resource(pdev, IORESOURCE_MEM, i); + if (!res) + break; + nsmmu->bases[i] = devm_ioremap_resource(dev, res); + if (IS_ERR(nsmmu->bases[i])) + return (struct arm_smmu_device *)nsmmu->bases[i]; +
[PATCH 7/7] arm64: tegra: enable SMMU for SDHCI and EQOS
Enable SMMU translations for SDHCI and EQOS transactions. Signed-off-by: Krishna Reddy --- arch/arm64/boot/dts/nvidia/tegra194.dtsi | 4 1 file changed, 4 insertions(+) diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi b/arch/arm64/boot/dts/nvidia/tegra194.dtsi index ad509bb..0496a87 100644 --- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi +++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi @@ -51,6 +51,7 @@ clock-names = "master_bus", "slave_bus", "rx", "tx", "ptp_ref"; resets = < TEGRA194_RESET_EQOS>; reset-names = "eqos"; + iommus = < TEGRA186_SID_EQOS>; status = "disabled"; snps,write-requests = <1>; @@ -381,6 +382,7 @@ clock-names = "sdhci"; resets = < TEGRA194_RESET_SDMMC1>; reset-names = "sdhci"; + iommus = < TEGRA186_SID_SDMMC1>; nvidia,pad-autocal-pull-up-offset-3v3-timeout = <0x07>; nvidia,pad-autocal-pull-down-offset-3v3-timeout = @@ -403,6 +405,7 @@ clock-names = "sdhci"; resets = < TEGRA194_RESET_SDMMC3>; reset-names = "sdhci"; + iommus = < TEGRA186_SID_SDMMC3>; nvidia,pad-autocal-pull-up-offset-1v8 = <0x00>; nvidia,pad-autocal-pull-down-offset-1v8 = <0x7a>; nvidia,pad-autocal-pull-up-offset-3v3-timeout = <0x07>; @@ -430,6 +433,7 @@ < TEGRA194_CLK_PLLC4>; resets = < TEGRA194_RESET_SDMMC4>; reset-names = "sdhci"; + iommus = < TEGRA186_SID_SDMMC4>; nvidia,pad-autocal-pull-up-offset-hs400 = <0x00>; nvidia,pad-autocal-pull-down-offset-hs400 = <0x00>; nvidia,pad-autocal-pull-up-offset-1v8-timeout = <0x0a>; -- 2.1.4
[PATCH 2/7] dt-bindings: arm-smmu: Add binding for nvidia,smmu-v2
Add binding doc for Nvidia's smmu-v2 implementation. Signed-off-by: Krishna Reddy --- Documentation/devicetree/bindings/iommu/arm,smmu.txt | 1 + 1 file changed, 1 insertion(+) diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt b/Documentation/devicetree/bindings/iommu/arm,smmu.txt index 3133f3b..0de3759 100644 --- a/Documentation/devicetree/bindings/iommu/arm,smmu.txt +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt @@ -17,6 +17,7 @@ conditions. "arm,mmu-401" "arm,mmu-500" "cavium,smmu-v2" +"nvidia,smmu-v2" "qcom,smmu-v2" depending on the particular implementation and/or the -- 2.1.4
[PATCH 0/7] Nvidia Arm SMMUv2 Implementation
Hi All,

The Nvidia Arm SMMUv2 implementation has two ARM SMMU (MMU-500) instances that are used together for SMMU translations. The IOVA accesses from HW devices are interleaved across these two SMMU instances, so the instances need to be programmed identically except during tlb sync and fault handling.

This patch set adds the Nvidia Arm SMMUv2 implementation on top of the ARM SMMU driver to handle the Nvidia-specific behavior. It also adds hooks for tlb sync and fault handling to allow vendor-specific implementations of the same.

Please review the patch set and provide feedback.

This patch set is based on the following branch, as it depends on the Arm SMMU refactoring changes from Robin Murphy that are present in this branch:
https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git for-joerg/arm-smmu/updates

Krishna Reddy (7):
  iommu/arm-smmu: add Nvidia SMMUv2 implementation
  dt-bindings: arm-smmu: Add binding for nvidia,smmu-v2
  iommu/arm-smmu: Add tlb_sync implementation hook
  iommu/arm-smmu: Add global/context fault implementation hooks
  arm64: tegra: Add Memory controller DT node on T194
  arm64: tegra: Add DT node for T194 SMMU
  arm64: tegra: enable SMMU for SDHCI and EQOS

 .../devicetree/bindings/iommu/arm,smmu.txt | 1 +
 MAINTAINERS | 2 +
 arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi | 4 +
 arch/arm64/boot/dts/nvidia/tegra194.dtsi | 86 +++
 drivers/iommu/Makefile | 2 +-
 drivers/iommu/arm-smmu-impl.c | 2 +
 drivers/iommu/arm-smmu-nvidia.c | 256 +
 drivers/iommu/arm-smmu.c | 16 +-
 drivers/iommu/arm-smmu.h | 10 +
 9 files changed, 375 insertions(+), 4 deletions(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c

--
2.1.4
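To illustrate "programmed identically except during tlb sync", here is a minimal user-space sketch that models the register files of the instances as arrays: writes are broadcast to every instance, reads come from instance 0, and a sync is complete only once no instance reports it as active. All names (`struct fake_nsmmu`, `STATUS_REG`, `SYNC_ACTIVE`, and the functions) are hypothetical stand-ins, not the driver's API; the real driver uses MMIO accessors and polls with a timeout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_INST      3
#define REGS_PER_INST 16
#define STATUS_REG    5		/* hypothetical sync status register index */
#define SYNC_ACTIVE   0x1u	/* hypothetical "sync still running" bit */

/* Hypothetical model: one small register file per SMMU instance. */
struct fake_nsmmu {
	unsigned int num_inst;
	uint32_t regs[MAX_INST][REGS_PER_INST];
};

/* Configuration writes go to every instance so they stay identical. */
static void nsmmu_write(struct fake_nsmmu *n, unsigned int reg, uint32_t val)
{
	unsigned int i;

	for (i = 0; i < n->num_inst; i++)
		n->regs[i][reg] = val;
}

/* Since all instances hold the same configuration, read from instance 0. */
static uint32_t nsmmu_read(const struct fake_nsmmu *n, unsigned int reg)
{
	return n->regs[0][reg];
}

/* Status is per instance: a sync is done only when every instance is idle. */
static bool nsmmu_sync_done(const struct fake_nsmmu *n)
{
	unsigned int i;

	for (i = 0; i < n->num_inst; i++)
		if (n->regs[i][STATUS_REG] & SYNC_ACTIVE)
			return false;

	return true;
}
```

The asymmetry is the point of the sketch: configuration can be treated as one logical device, while status registers have to be inspected per instance.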
RE: [PATCH v3 2/6] iommu/arm-smmu: Add support to program multiple ARM SMMU's identically
Hi Robin,

Thanks for the feedback :)

> The whole point of the library idea was to factor out the code in such a way that all the details specific to a particular implementation can be kept together. But what this patch does is insert Tegra194-specific handling all through the 'common' code, which is the exact opposite of that concept and just makes more hard-to-maintain mess.

In an attempt to reuse most of the ARM SMMU implementation, which relies heavily on data from arm_smmu_device, the library code ended up with some functionality that is usable only by the Tegra194 SMMU driver. In the V4 patches, I am working on a mechanism to override the writel/readl functions in the library, so that the Tegra SMMU driver can supply its own read/write functions and handle programming of multiple instances on its own.

> The amount of copy-paste duplication in patch #4 has the opposite problem - about 95% of that isn't Tegra194-specific at all (I mean, how many fsl_mc instances does it have?), and having multiple copies of generic code with the potential to diverge is also not what anyone wants.

I have split the code so that the library contains only the code that deals with register programming. I kept the platform driver code and DT parsing code out of the library, which allows the drivers to change independently if necessary in the future.

> Plus I don't think ending up building multiple separate drivers will even work in general - thanks to the current state of bus_set_iommu() etc., you can't use the regular driver for your third SMMU at the same time.

Good point! From the code, platform_dma_configure/of_dma_configure/of_iommu_configure take care of setting the right iommu_ops for devices based on the IOMMU DT node referenced in their iommus=<> entry. If iommu.c is updated to use dev->bus->dma_configure(), then it doesn't really need to use dev->bus->iommu_ops. dev->bus->dma_configure() can be used to set dev->dma_ops to the right one, if dev->dma_ops is not already set.

If this approach looks good, I can make a patch to clean up the bus->iommu_ops usage so that devices can use a specific SMMU instance as they need.

> I think what really needs to be done is to conceptually split the driver into "architecture" and "implementation" layers - at some point after the holidays we're probably going to sit down and go through all the various quirks and specifics we know about to try and figure out what that should actually look like.

If you can provide some high-level details on what to keep in the library vs. the implementation after the holidays, I would be happy to rework the patches. I look forward to further discussions on this.

-KR
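The writel/readl override mentioned in this reply could look roughly like the sketch below. This is a user-space illustration only, not the V4 patch: `smmu_reg_ops`, `smmu_dev`, `smmu_lib_write()` and the `tegra_*`/`default_*` helpers are hypothetical names, and plain arrays stand in for the MMIO register files.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_REGS      16   /* size of the fake register file (assumption) */
#define MAX_INSTANCES 2    /* two interleaved SMMU instances */

struct smmu_dev;

/* Hypothetical hook table that the shared library code calls through. */
struct smmu_reg_ops {
	uint32_t (*read_reg)(struct smmu_dev *smmu, size_t idx);
	void (*write_reg)(struct smmu_dev *smmu, size_t idx, uint32_t val);
};

struct smmu_dev {
	const struct smmu_reg_ops *ops;
	unsigned int num_inst;                   /* 1 for a regular SMMU */
	uint32_t regs[MAX_INSTANCES][NUM_REGS];  /* stand-in for MMIO */
};

/* Default accessors: a single instance, as in the regular driver. */
static uint32_t default_read(struct smmu_dev *smmu, size_t idx)
{
	return smmu->regs[0][idx];
}

static void default_write(struct smmu_dev *smmu, size_t idx, uint32_t val)
{
	smmu->regs[0][idx] = val;
}

/* Tegra-style override: every write is repeated on all instances. */
static void tegra_write(struct smmu_dev *smmu, size_t idx, uint32_t val)
{
	for (unsigned int i = 0; i < smmu->num_inst; i++)
		smmu->regs[i][idx] = val;
}

static const struct smmu_reg_ops default_ops = { default_read, default_write };
static const struct smmu_reg_ops tegra_ops = { default_read, tegra_write };

/* Library code is written once against the hooks. */
static void smmu_lib_write(struct smmu_dev *smmu, size_t idx, uint32_t val)
{
	assert(idx < NUM_REGS);
	smmu->ops->write_reg(smmu, idx, val);
}

static uint32_t smmu_lib_read(struct smmu_dev *smmu, size_t idx)
{
	assert(idx < NUM_REGS);
	return smmu->ops->read_reg(smmu, idx);
}
```

With a device set up as `{ .ops = &tegra_ops, .num_inst = 2 }`, a single `smmu_lib_write()` call lands in both register files, while a device using `default_ops` only touches instance 0. The point of the indirection is that the common code never needs to know how many instances exist.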
[PATCH v2 1/5] iommu/arm-smmu: rearrange arm-smmu.c code
Rearrange arm-smmu.c code into arm-smmu-common.h, arm-smmu-common.c and arm-smmu.c. This patch rearranges the arm-smmu.c code to allow sharing the ARM SMMU driver code with dual ARM SMMU based Tegra194 SMMU driver. Signed-off-by: Krishna Reddy --- drivers/iommu/arm-smmu-common.c | 1922 +++ drivers/iommu/arm-smmu-common.h | 256 + drivers/iommu/arm-smmu.c| 2133 +-- 3 files changed, 2180 insertions(+), 2131 deletions(-) create mode 100644 drivers/iommu/arm-smmu-common.c create mode 100644 drivers/iommu/arm-smmu-common.h diff --git a/drivers/iommu/arm-smmu-common.c b/drivers/iommu/arm-smmu-common.c new file mode 100644 index 000..1ad8e5f --- /dev/null +++ b/drivers/iommu/arm-smmu-common.c @@ -0,0 +1,1922 @@ +/* + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. + * + * Copyright (C) 2013 ARM Limited + * + * Author: Will Deacon + */ + +static int force_stage; +module_param(force_stage, int, S_IRUGO); +MODULE_PARM_DESC(force_stage, + "Force SMMU mappings to be installed at a particular stage of translation. A value of '1' or '2' forces the corresponding stage. All other values are ignored (i.e. no stage is forced). 
Note that selecting a specific stage will disable support for nested translation."); +static bool disable_bypass; +module_param(disable_bypass, bool, S_IRUGO); +MODULE_PARM_DESC(disable_bypass, + "Disable bypass streams such that incoming transactions from devices that are not attached to an iommu domain will report an abort back to the device and will not be allowed to pass through the SMMU."); + +#define ARM_SMMU_MATCH_DATA(name, ver, imp)\ +static struct arm_smmu_match_data name = { .version = ver, .model = imp } + +static void arm_smmu_tlb_sync_global(struct arm_smmu_device *smmu); +static void arm_smmu_tlb_sync_context(void *cookie); +static irqreturn_t arm_smmu_context_fault(int irq, void *dev); +static irqreturn_t arm_smmu_global_fault(int irq, void *dev); + +static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom) +{ + return container_of(dom, struct arm_smmu_domain, domain); +} + +static void parse_driver_options(struct arm_smmu_device *smmu) +{ + int i = 0; + + do { + if (of_property_read_bool(smmu->dev->of_node, + arm_smmu_options[i].prop)) { + smmu->options |= arm_smmu_options[i].opt; + dev_notice(smmu->dev, "option %s\n", + arm_smmu_options[i].prop); + } + } while (arm_smmu_options[++i].opt); +} + +static struct device_node *dev_get_dev_node(struct device *dev) +{ + if (dev_is_pci(dev)) { + struct pci_bus *bus = to_pci_dev(dev)->bus; + + while (!pci_is_root_bus(bus)) + bus = bus->parent; + return of_node_get(bus->bridge->parent->of_node); + } + + return of_node_get(dev->of_node); +} + +static int __arm_smmu_get_pci_sid(struct pci_dev *pdev, u16 alias, void *data) +{ + *((__be32 *)data) = cpu_to_be32(alias); + return 0; /* Continue walking */ +} + +static int __find_legacy_master_phandle(struct device *dev, void *data) +{ + struct of_phandle_iterator *it = *(void **)data; + struct device_node *np = it->node; + int err; + + of_for_each_phandle(it, err, dev->of_node, "mmu-masters", + "#stream-id-cells", 0) + if (it->node == np) { + *(void 
**)data = dev; + return 1; + } + it->node = np; + return err == -ENOENT ? 0 : err; +} + +static struct platform_driver arm_smmu_driver; +static struct iommu_ops arm_smmu_ops; + +static int arm_smmu_register_legacy_master(struct device *dev, + struct arm_smmu_device **smmu) +{ + struct device *smmu_dev; + struct device_node *np; + struct of_phandle_iterator it; + void *data = + u32 *sids; + __be32 pci_sid; + int err; + + np = dev_get_dev_node(dev); + if (!np || !of_find_property(np, "#stream-id-cells", NULL)) { + of_node_put(np);
[PATCH v2 0/5] Add Tegra194 Dual ARM SMMU driver
NVIDIA's Xavier (Tegra194) SoC has three ARM SMMU (MMU-500) instances. Two of them are used together: IOVA accesses from HW devices are interleaved across these two SMMU instances, so they need to be programmed identically.

The existing ARM SMMU driver can't be used in its current form for programming the two SMMU instances identically, but most of the code can be shared between the ARM SMMU driver and the Tegra194 SMMU driver. Page fault handling and TLB sync operations need to know about the specific SMMU instance for correct fault handling and optimal TLB sync wait. The rest of the code doesn't need to know about the number of SMMU instances. Based on this, the patch series rearranges the arm-smmu.c code to allow sharing most of the ARM SMMU programming/iommu_ops code between the ARM SMMU driver and the Tegra194 SMMU driver, and transparently handles programming of the two SMMU instances. The third SMMU instance would use the existing ARM SMMU driver.

Changes in v2:
* Added CONFIG_ARM_SMMU_TEGRA to protect Tegra194 SMMU driver compilation
* Enabled CONFIG_ARM_SMMU_TEGRA in defconfig
* Added SMMU nodes in Tegra194 device tree

Krishna Reddy (5):
  iommu/arm-smmu: rearrange arm-smmu.c code
  iommu/arm-smmu: Prepare fault, probe, sync functions for sharing code
  iommu/tegra194_smmu: Add Tegra194 SMMU driver
  arm64: defconfig: Enable ARM_SMMU_TEGRA
  arm64: tegra: Add SMMU nodes to Tegra194 device tree

 arch/arm64/boot/dts/nvidia/tegra194.dtsi |  148 ++
 arch/arm64/configs/defconfig             |    1 +
 drivers/iommu/Kconfig                    |   10 +
 drivers/iommu/Makefile                   |    1 +
 drivers/iommu/arm-smmu-common.c          | 1971 +++
 drivers/iommu/arm-smmu-common.h          |  256
 drivers/iommu/arm-smmu.c                 | 2167 +-
 drivers/iommu/tegra194-smmu.c            |  201 +++
 8 files changed, 2595 insertions(+), 2160 deletions(-)
 create mode 100644 drivers/iommu/arm-smmu-common.c
 create mode 100644 drivers/iommu/arm-smmu-common.h
 create mode 100644 drivers/iommu/tegra194-smmu.c
--
2.1.4
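To illustrate why TLB sync has to be aware of the individual instances even when everything else is programmed identically, here is a small user-space sketch. It is not taken from the series: `fake_inst`, `read_sync_status()`, `tlb_sync_all()` and the `SYNC_ACTIVE` bit are hypothetical stand-ins for the per-instance sync status register and its polling loop.

```c
#include <assert.h>
#include <stdint.h>

#define SYNC_ACTIVE 0x1u  /* stand-in for a "sync in progress" status bit */

/* Fake SMMU instance: reports "busy" for a given number of status reads. */
struct fake_inst {
	unsigned int busy_polls;
};

static uint32_t read_sync_status(struct fake_inst *inst)
{
	if (inst->busy_polls) {
		inst->busy_polls--;
		return SYNC_ACTIVE;
	}
	return 0;
}

/*
 * Wait for each instance separately. Returns -1 when all instances have
 * finished, or the index of the first instance that is still busy after
 * max_polls status reads, so the caller can report which one is stuck.
 */
static int tlb_sync_all(struct fake_inst inst[], unsigned int n,
			unsigned int max_polls)
{
	assert(max_polls > 0);

	for (unsigned int i = 0; i < n; i++) {
		unsigned int polls = 0;

		while (read_sync_status(&inst[i]) & SYNC_ACTIVE) {
			if (++polls >= max_polls)
				return (int)i;
		}
	}
	return -1;
}
```

Polling each instance on its own lets a timeout be attributed to a specific SMMU, which a single broadcast-style status read could not do.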
[PATCH 2/2] arm64: dts: tegra186: Enable IOMMU for SDHCI
Enable IOMMU for all SDHCI controllers in Tegra186. Signed-off-by: Krishna Reddy --- arch/arm64/boot/dts/nvidia/tegra186.dtsi | 4 1 file changed, 4 insertions(+) diff --git a/arch/arm64/boot/dts/nvidia/tegra186.dtsi b/arch/arm64/boot/dts/nvidia/tegra186.dtsi index 230c0c8..996997e 100644 --- a/arch/arm64/boot/dts/nvidia/tegra186.dtsi +++ b/arch/arm64/boot/dts/nvidia/tegra186.dtsi @@ -234,6 +234,7 @@ compatible = "nvidia,tegra186-sdhci"; reg = <0x0 0x0340 0x0 0x1>; interrupts = ; + iommus = < TEGRA186_SID_SDMMC1>; clocks = < TEGRA186_CLK_SDMMC1>; clock-names = "sdhci"; resets = < TEGRA186_RESET_SDMMC1>; @@ -259,6 +260,7 @@ compatible = "nvidia,tegra186-sdhci"; reg = <0x0 0x0342 0x0 0x1>; interrupts = ; + iommus = < TEGRA186_SID_SDMMC2>; clocks = < TEGRA186_CLK_SDMMC2>; clock-names = "sdhci"; resets = < TEGRA186_RESET_SDMMC2>; @@ -279,6 +281,7 @@ compatible = "nvidia,tegra186-sdhci"; reg = <0x0 0x0344 0x0 0x1>; interrupts = ; + iommus = < TEGRA186_SID_SDMMC3>; clocks = < TEGRA186_CLK_SDMMC3>; clock-names = "sdhci"; resets = < TEGRA186_RESET_SDMMC3>; @@ -301,6 +304,7 @@ compatible = "nvidia,tegra186-sdhci"; reg = <0x0 0x0346 0x0 0x1>; interrupts = ; + iommus = < TEGRA186_SID_SDMMC4>; clocks = < TEGRA186_CLK_SDMMC4>; clock-names = "sdhci"; assigned-clocks = < TEGRA186_CLK_SDMMC4>, -- 2.1.4
[PATCH 1/2] arm64: dts: tegra186: Add dma-ranges to avoid using bounce buffers
Add dma-ranges to avoid using DMA bounce buffers unnecessarily for devices that can address the physical memory and don't have the SMMU enabled. This also resolves failures in attaching devices to the IOMMU. The following error is caused by the check in io-pgtable-arm.c, where the DMA address is expected to match the physical address for IOMMU devices that don't support coherent page table walks. Bounce buffer usage causes the mismatch and the device add failure.

[7.000461] arm-smmu 1200.iommu: Cannot accommodate DMA translation for IOMMU page tables
[7.010513] iommu: Failed to add device 1520.display to group 0: -12

Signed-off-by: Krishna Reddy
---
 arch/arm64/boot/dts/nvidia/tegra186.dtsi | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/boot/dts/nvidia/tegra186.dtsi b/arch/arm64/boot/dts/nvidia/tegra186.dtsi
index 2f3c8e2..230c0c8 100644
--- a/arch/arm64/boot/dts/nvidia/tegra186.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra186.dtsi
@@ -14,6 +14,7 @@
 	interrupt-parent = <>;
 	#address-cells = <2>;
 	#size-cells = <2>;
+	dma-ranges = <0x0 0x0 0x0 0x0 0x4 0x0>;

 	misc@10 {
 		compatible = "nvidia,tegra186-misc";
--
2.1.4
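As a reading aid for the dma-ranges entry in this patch, the sketch below decodes a six-cell entry into child address, parent address and size. It assumes the cell values are exactly as they appear in the diff and that the parent bus also uses two address cells; `dma_range`, `decode_dma_range()` and `dma_range_bits()` are illustrative helpers, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

struct dma_range {
	uint64_t child_addr;   /* address as seen by the device (bus side) */
	uint64_t parent_addr;  /* address in the parent address space */
	uint64_t size;         /* length of the window in bytes */
};

/* Two 32-bit cells form one 64-bit value, most significant cell first. */
static uint64_t cells_to_u64(const uint32_t *cells)
{
	return ((uint64_t)cells[0] << 32) | cells[1];
}

/* Decode one entry laid out as <child(2 cells) parent(2 cells) size(2 cells)>. */
static struct dma_range decode_dma_range(const uint32_t cells[6])
{
	struct dma_range r = {
		.child_addr  = cells_to_u64(&cells[0]),
		.parent_addr = cells_to_u64(&cells[2]),
		.size        = cells_to_u64(&cells[4]),
	};
	return r;
}

/* Number of address bits needed to reach the last byte of the window. */
static unsigned int dma_range_bits(const struct dma_range *r)
{
	uint64_t last;
	unsigned int bits = 0;

	assert(r->size != 0);
	last = r->child_addr + r->size - 1;
	while (last) {
		bits++;
		last >>= 1;
	}
	return bits;
}
```

Under those assumptions, `<0x0 0x0 0x0 0x0 0x4 0x0>` describes an identity-mapped window starting at 0 with a size of 0x4_0000_0000 bytes, which needs 34 address bits.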
[PATCH] arm64: mm: Set MAX_PHYSMEM_BITS based on ARM64_VA_BITS
MAX_PHYSMEM_BITS greater than ARM64_VA_BITS is causing memory access fault, when HMM_DMIRROR test is enabled. In the failing case, ARM64_VA_BITS=39 and MAX_PHYSMEM_BITS=48. HMM_DMIRROR test selects phys memory range from end based on MAX_PHYSMEM_BITS and gets mapped into VA space linearly. As VA space is 39-bit and phys space is 48-bit, this has caused incorrect mapping and leads to memory access fault. Limiting the MAX_PHYSMEM_BITS to ARM64_VA_BITS fixes the issue and is the right thing instead of hard coding it as 48-bit always. [3.378655] Unable to handle kernel paging request at virtual address 3befd00 [3.378662] pgd = ff800a04b000 [3.378900] [3befd00] *pgd=81fa3003, *pud=81fa3003, *pmd=006268200711 [3.378933] Internal error: Oops: 9644 [#1] PREEMPT SMP [3.378938] Modules linked in: [3.378948] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.9.52-tegra-g91402fdc013b-dirty #51 [3.378950] Hardware name: quill (DT) [3.378954] task: ffc1ebac task.stack: ffc1eba64000 [3.378967] PC is at __memset+0x1ac/0x1d0 [3.378976] LR is at sparse_add_one_section+0xf8/0x174 [3.378981] pc : [] lr : [] pstate: 404000c5 [3.378983] sp : ffc1eba67a40 [3.378993] x29: ffc1eba67a40 x28: [3.378999] x27: 0003 x26: 0040 [3.379005] x25: 03ff x24: ffc1e9f6cf80 [3.379010] x23: ff8009ecb2d4 x22: 03befd00 [3.379015] x21: ffc1e9923ff0 x20: 0003 [3.379020] x19: ffef x18: [3.379025] x17: 24d7 x16: [3.379030] x15: ff8009cd8690 x14: ffc1e9f6c70c [3.379035] x13: ffc1e9f6c70b x12: 0030 [3.379039] x11: 0040 x10: 0101010101010101 [3.379044] x9 : x8 : 03befd00 [3.379049] x7 : x6 : 003f [3.379053] x5 : 0040 x4 : [3.379058] x3 : 0004 x2 : 00c0 [3.379063] x1 : x0 : 03befd00 [3.379064] [3.379069] Process swapper/0 (pid: 1, stack limit = 0xffc1eba64028) [3.379071] Call trace: [3.379079] [] __memset+0x1ac/0x1d0 [3.379085] [] __add_pages+0x130/0x2e0 [3.379093] [] hmm_devmem_pages_create+0x20c/0x310 [3.379100] [] hmm_devmem_add+0x1d4/0x270 [3.379128] [] dmirror_probe+0x50/0x158 [3.379137] [] 
platform_drv_probe+0x60/0xc8 [3.379143] [] driver_probe_device+0x26c/0x420 [3.379149] [] __driver_attach+0x124/0x128 [3.379155] [] bus_for_each_dev+0x88/0xe8 [3.379166] [] driver_attach+0x30/0x40 [3.379171] [] bus_add_driver+0x1f8/0x2b0 [3.379177] [] driver_register+0x68/0x100 [3.379183] [] __platform_driver_register+0x5c/0x68 [3.379192] [] hmm_dmirror_init+0x88/0xc4 [3.379200] [] do_one_initcall+0x5c/0x170 [3.379208] [] kernel_init_freeable+0x1b8/0x258 [3.379231] [] kernel_init+0x18/0x108 [3.379236] [] ret_from_fork+0x10/0x40 [3.379246] ---[ end trace 578db63bb139b8b8 ]--- Signed-off-by: Krishna Reddy <vdu...@nvidia.com> --- arch/arm64/include/asm/sparsemem.h | 6 ++ 1 file changed, 6 insertions(+) diff --git a/arch/arm64/include/asm/sparsemem.h b/arch/arm64/include/asm/sparsemem.h index 74a9d301819f..19ecd0b0f3a3 100644 --- a/arch/arm64/include/asm/sparsemem.h +++ b/arch/arm64/include/asm/sparsemem.h @@ -17,7 +17,13 @@ #define __ASM_SPARSEMEM_H #ifdef CONFIG_SPARSEMEM + +#ifdef CONFIG_ARM64_VA_BITS +#define MAX_PHYSMEM_BITS CONFIG_ARM64_VA_BITS +#else #define MAX_PHYSMEM_BITS 48 +#endif + #define SECTION_SIZE_BITS 30 #endif -- 2.1.4
[PATCH] arm64: tegra: Add SMMU node for Tegra186
Add the DT node for ARM SMMU on Tegra186. Signed-off-by: Krishna Reddy <vdu...@nvidia.com> --- arch/arm64/boot/dts/nvidia/tegra186.dtsi | 73 1 file changed, 73 insertions(+) diff --git a/arch/arm64/boot/dts/nvidia/tegra186.dtsi b/arch/arm64/boot/dts/nvidia/tegra186.dtsi index 0b0552c9f7dd..e2c3ad203c93 100644 --- a/arch/arm64/boot/dts/nvidia/tegra186.dtsi +++ b/arch/arm64/boot/dts/nvidia/tegra186.dtsi @@ -355,6 +355,79 @@ nvidia,bpmp = <>; }; + smmu: iommu@1200 { + compatible = "arm,mmu-500"; + reg = <0 0x1200 0 0x80>; + #global-interrupts = <1>; + interrupts = , +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +, +; + #iommu-cells = <1>; + stream-match-mask = <0x7F80>; + }; + gpu@1700 { compatible = "nvidia,gp10b"; reg = <0x0 0x1700 0x0 0x100>, -- 2.1.4
[PATCH] mmc: tegra: Mark 64 bit dma broken on Tegra186
SDHCI controllers on Tegra186 support 40 bit addressing. IOVA addresses are 48-bit wide on Tegra186. SDHCI host common code sets dma mask as either 32-bit or 64-bit. To avoid access issues when SMMU is enabled, disable 64-bit dma. Signed-off-by: Krishna Reddy <vdu...@nvidia.com> --- drivers/mmc/host/sdhci-tegra.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/drivers/mmc/host/sdhci-tegra.c b/drivers/mmc/host/sdhci-tegra.c index 0cd6fa80db66..b877c13184c2 100644 --- a/drivers/mmc/host/sdhci-tegra.c +++ b/drivers/mmc/host/sdhci-tegra.c @@ -422,7 +422,15 @@ static const struct sdhci_pltfm_data sdhci_tegra186_pdata = { SDHCI_QUIRK_NO_HISPD_BIT | SDHCI_QUIRK_BROKEN_ADMA_ZEROLEN_DESC | SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN, - .quirks2 = SDHCI_QUIRK2_PRESET_VALUE_BROKEN, + .quirks2 = SDHCI_QUIRK2_PRESET_VALUE_BROKEN | + /* SDHCI controllers on Tegra186 support 40-bit addressing. + * IOVA addresses are 48-bit wide on Tegra186. + * With 64-bit dma mask used for SDHCI, accesses can + * be broken. Disable 64-bit dma, which would fall back + * to 32-bit dma mask. Ideally 40-bit dma mask would work, + * But it is not supported as of now. + */ + SDHCI_QUIRK2_BROKEN_64_BIT_DMA, .ops = _sdhci_ops, }; -- 2.1.4
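The addressing mismatch described in this commit message can be checked with a few lines of arithmetic. The sketch below is a user-space illustration: `dma_bit_mask()` mirrors the idea of the kernel's DMA_BIT_MASK() macro, while `device_can_reach()` and `highest_iova()` are hypothetical helpers introduced only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* All-ones mask of the given width, in the spirit of DMA_BIT_MASK(). */
static uint64_t dma_bit_mask(unsigned int bits)
{
	assert(bits >= 1 && bits <= 64);
	return bits == 64 ? ~0ULL : (1ULL << bits) - 1;
}

/* A device can reach an address only if it fits in its wired-up bits. */
static int device_can_reach(uint64_t addr, unsigned int hw_bits)
{
	return (addr & ~dma_bit_mask(hw_bits)) == 0;
}

/*
 * Highest address an allocator may hand out: bounded by both the DMA mask
 * the driver set and the width of the IOVA space.
 */
static uint64_t highest_iova(unsigned int mask_bits, unsigned int iova_bits)
{
	uint64_t mask = dma_bit_mask(mask_bits);
	uint64_t space = dma_bit_mask(iova_bits);

	return mask < space ? mask : space;
}
```

With a 64-bit mask and a 48-bit IOVA space, `highest_iova(64, 48)` is a 48-bit address that a 40-bit controller cannot reach; with the 32-bit fallback, every address the allocator can return is reachable.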
RE: [PATCH] iommu/dma: limit the IOVA allocated to dma-ranges region
> OK, so that's really just another variant of the existing problem we have with certain PCI root complexes with restrictive inbound windows. The appropriate way to handle that is to reserve the unusable areas of the IOVA space up-front. Since the support for the ACPI equivalent of "dma-ranges" has just landed, this is now pretty much top of my to-do-list for the upcoming cycle (there's various things still to fix in the DT code, but that's essentially part of the same job).

Reserving up front would work. If you already have it on your to-do list, we will wait for your patches.

> In the case of a 64-bit-capable IP block with only 34 bits of address wired up externally, if that 34-bit interconnect is described by "dma-ranges" then the device will already be created with an appropriate 34-bit DMA mask. The fact that the driver can stomp on that with a 64-bit mask later is entirely down to the implementations of dma_set_mask() etc. (I've had a patch to restrict masks for arm64 for a while, but I worry that it carries quite a high risk of breakage in default cases).

Exactly, it is the stomping issue. The dma-ranges code updates the mask earlier than the sdhci code does, and the sdhci code then overwrites the DMA mask.

-KR
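The stomping problem discussed in this exchange can be modelled in a few lines. This is a simplified sketch, not the kernel's dma_set_mask() implementation: `fake_device`, `bus_dma_limit`, `set_mask_unchecked()` and `set_mask_clamped()` are hypothetical names used to contrast an unrestricted mask update with one that respects the limit derived from dma-ranges.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct fake_device {
	uint64_t dma_mask;       /* mask currently in effect */
	uint64_t bus_dma_limit;  /* limit derived from dma-ranges; 0 means none */
};

/* Models the stomping case: the driver's request replaces the mask as-is. */
static void set_mask_unchecked(struct fake_device *dev, uint64_t mask)
{
	assert(dev != NULL);
	dev->dma_mask = mask;
}

/* Restricted variant: the request is clamped to the interconnect limit. */
static void set_mask_clamped(struct fake_device *dev, uint64_t mask)
{
	assert(dev != NULL);
	if (dev->bus_dma_limit && mask > dev->bus_dma_limit)
		mask = dev->bus_dma_limit;
	dev->dma_mask = mask;
}
```

For a device created with a 34-bit limit, a later 64-bit request through the unchecked path widens the mask beyond what is wired up, whereas the clamped path keeps it at 34 bits. Requests narrower than the limit pass through unchanged in both cases.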
RE: [PATCH] iommu/dma: limit the IOVA allocated to dma-ranges region
Robin,

>> Why? IOVA allocation is already constrained as much as it should be - if the device's DMA mask is wrong that's another problem, and this isn't the right place to fix it.

This is because of the following reasons.

1. Many of the HW modules in Tegra114 and Tegra30 can't access IOVAs that overlap with MMIO. Even though these HW modules support 32-bit addressing, they can't access IOVAs overlapping the range 0xff00: to 0x:f, which is the MMIO region for high vectors. These modules can only see 0x8000: to 0xff00: as IOVA. If the DMA mask is set to 31 bits, the IOVA range reduces from ~2GB to 1GB. The patch makes most of the IOVA range usable by using the dma-ranges property. As we already use the dma-ranges start as the bottom of the IOVA range, the patch restricts the IOVA top as well, using the dma-ranges range.

2. Most drivers in the Linux kernel set the mask as either 32-bit or 64-bit. In particular, I am referring to drivers/mmc/host/sdhci.c (sdhci_set_dma_mask()) here. In Tegra124/210/186, HW modules support 34-bit addressing, but sdhci only supports setting the mask as either 32-bit or 64-bit. As you pointed out, this can be fixed by changing the mask in the sdhci common code. Alternatively, this patch can help here as well without changing the driver code.

Reason 1 is the main driving factor for this patch.

>> dma_32bit_pfn means nothing more than an internal detail of IOVA allocator caching, which is subject to change[1]. As-is, on some platforms this patch will effectively force all allocations to fail already.

I see that your patches remove the end-of-IOVA argument from init_iova_domain() and use the mask as the end of the IOVA range. Do you have any suggestion on how to handle reason 1 in a way that uses most of the IOVA range supported by the HW? Can the IOVA code look for dma-ranges on its own and limit the IOVA top to the lower of the mask and dma-ranges, if it is present? Or are there other ways you can think of?
-KR -Original Message- From: Robin Murphy [mailto:robin.mur...@arm.com] Sent: Friday, September 1, 2017 2:43 AM To: Joerg Roedel <j...@8bytes.org>; Krishna Reddy <vdu...@nvidia.com> Cc: io...@lists.linux-foundation.org; linux-kernel@vger.kernel.org Subject: Re: [PATCH] iommu/dma: limit the IOVA allocated to dma-ranges region On 01/09/17 10:26, Joerg Roedel wrote: > Adding Robin for review. > > On Thu, Aug 31, 2017 at 03:08:21PM -0700, Krishna Reddy wrote: >> Limit the IOVA allocated to dma-ranges specified for the device. >> This is necessary to ensure that IOVA allocated is addressable by >> device. Why? IOVA allocation is already constrained as much as it should be - if the device's DMA mask is wrong that's another problem, and this isn't the right place to fix it. dma_32bit_pfn means nothing more than an internal detail of IOVA allocator caching, which is subject to change[1]. As-is, on some platforms this patch will effectively force all allocations to fail already. Robin. [1]:https://www.mail-archive.com/iommu@lists.linux-foundation.org/msg19694.html >> Signed-off-by: Krishna Reddy <vdu...@nvidia.com> >> --- >> drivers/iommu/dma-iommu.c | 5 + >> 1 file changed, 5 insertions(+) >> >> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c >> index 9d1cebe7f6cb..e8a8320b571b 100644 >> --- a/drivers/iommu/dma-iommu.c >> +++ b/drivers/iommu/dma-iommu.c >> @@ -364,6 +364,7 @@ static dma_addr_t iommu_dma_alloc_iova(struct >> iommu_domain *domain, >> struct iommu_dma_cookie *cookie = domain->iova_cookie; >> struct iova_domain *iovad = >iovad; >> unsigned long shift, iova_len, iova = 0; >> +dma_addr_t dma_end_addr; >> >> if (cookie->type == IOMMU_DMA_MSI_COOKIE) { >> cookie->msi_iova += size; >> @@ -381,6 +382,10 @@ static dma_addr_t iommu_dma_alloc_iova(struct >> iommu_domain *domain, >> if (iova_len < (1 << (IOVA_RANGE_CACHE_MAX_SIZE - 1))) >> iova_len = roundup_pow_of_two(iova_len); >> >> +/* Limit IOVA allocated to device addressable dma-ranges region. 
*/ >> +dma_end_addr = (dma_addr_t)iovad->dma_32bit_pfn << shift; >> +dma_limit = dma_limit > dma_end_addr ? dma_end_addr : dma_limit; > > This looks like a good use-case for min(). > >> + >> if (domain->geometry.force_aperture) >> dma_limit = min(dma_limit, domain->geometry.aperture_end); >> >> -- >> 2.1.4
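For readers following the thread, the clamp being discussed ("limit the IOVA top to the lower of the mask and dma-ranges", written with a min-style helper as Joerg suggests) can be sketched in user-space C. Everything here is an assumption made for illustration: `toy_iova_limit()` and `min_u64()` are hypothetical names, the end of the dma-ranges window is passed in explicitly rather than derived from `dma_32bit_pfn`, and the values in the usage note are not the Tegra addresses from the mail.

```c
#include <stdint.h>

/* Stand-in for the kernel's min() macro, for this user-space sketch only. */
static uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical helper: the highest usable IOVA is the lowest of the DMA
 * mask, the end of the dma-ranges window and, when the aperture is
 * enforced, the end of the domain aperture.
 */
static uint64_t toy_iova_limit(uint64_t dma_mask, uint64_t ranges_end,
			       int force_aperture, uint64_t aperture_end)
{
	uint64_t limit = min_u64(dma_mask, ranges_end);

	if (force_aperture)
		limit = min_u64(limit, aperture_end);
	return limit;
}
```

For example, with a 32-bit mask and a made-up window ending at 0xefffffff, the limit becomes 0xefffffff; a 34-bit window under a 64-bit mask yields the 34-bit limit, which is the sdhci case from reason 2.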
[PATCH] iommu/dma: limit the IOVA allocated to dma-ranges region
Limit the IOVA allocated to dma-ranges specified for the device.
This is necessary to ensure that IOVA allocated is addressable by device.

Signed-off-by: Krishna Reddy <vdu...@nvidia.com>
---
 drivers/iommu/dma-iommu.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 9d1cebe7f6cb..e8a8320b571b 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -364,6 +364,7 @@ static dma_addr_t iommu_dma_alloc_iova(struct iommu_domain *domain,
 	struct iommu_dma_cookie *cookie = domain->iova_cookie;
 	struct iova_domain *iovad = &cookie->iovad;
 	unsigned long shift, iova_len, iova = 0;
+	dma_addr_t dma_end_addr;
 
 	if (cookie->type == IOMMU_DMA_MSI_COOKIE) {
 		cookie->msi_iova += size;
@@ -381,6 +382,10 @@ static dma_addr_t iommu_dma_alloc_iova(struct iommu_domain *domain,
 	if (iova_len < (1 << (IOVA_RANGE_CACHE_MAX_SIZE - 1)))
 		iova_len = roundup_pow_of_two(iova_len);
 
+	/* Limit IOVA allocated to device addressable dma-ranges region. */
+	dma_end_addr = (dma_addr_t)iovad->dma_32bit_pfn << shift;
+	dma_limit = dma_limit > dma_end_addr ? dma_end_addr : dma_limit;
+
 	if (domain->geometry.force_aperture)
 		dma_limit = min(dma_limit, domain->geometry.aperture_end);
 
-- 
2.1.4
RE: [HMM v17 09/14] mm/hmm/mirror: mirror process address space on device with HMM helpers
+/*
+ * struct hmm_mirror_ops - HMM mirror device operations callback
+ *
+ * @update: callback to update range on a device
+ */
+struct hmm_mirror_ops {
+	/* update() - update virtual address range of memory
+	 *
+	 * @mirror: pointer to struct hmm_mirror
+	 * @update: update's type (turn read only, unmap, ...)
+	 * @start: virtual start address of the range to update
+	 * @end: virtual end address of the range to update
...
+	 */
+	void (*update)(struct hmm_mirror *mirror,
+		       enum hmm_update action,
+		       unsigned long start,
+		       unsigned long end);
+};

Minor argument documentation issue: in the comment for update(), @update should be @action, to match the parameter name in the prototype.
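A sketch of how the comment might read with the parameter name corrected is given below. It is only an illustration: the enumerator `TOY_HMM_UPDATE_PLACEHOLDER` and the recording callback `toy_record_update()` are made up here so that the fragment compiles on its own, and `struct hmm_mirror` is left opaque.

```c
#include <stddef.h>

struct hmm_mirror; /* opaque in this sketch; defined by the patch under review */

/* Placeholder enumerator; the real values come from the patch. */
enum hmm_update {
	TOY_HMM_UPDATE_PLACEHOLDER,
};

struct hmm_mirror_ops {
	/* update() - update virtual address range of memory
	 *
	 * @mirror: pointer to struct hmm_mirror
	 * @action: update's type (turn read only, unmap, ...)
	 * @start: virtual start address of the range to update
	 * @end: virtual end address of the range to update
	 */
	void (*update)(struct hmm_mirror *mirror,
		       enum hmm_update action,
		       unsigned long start,
		       unsigned long end);
};

/* Hypothetical callback that only records its arguments. */
static enum hmm_update toy_last_action;
static unsigned long toy_last_start, toy_last_end;

static void toy_record_update(struct hmm_mirror *mirror,
			      enum hmm_update action,
			      unsigned long start, unsigned long end)
{
	(void)mirror;
	toy_last_action = action;
	toy_last_start = start;
	toy_last_end = end;
}
```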
RE: [RFC 0/5] ARM: dma-mapping: New dma_map_ops to control IOVA more precisely
> > The device(H/W controller) need to access few special memory
> > blocks(IOVA==PA) and DRAM as well.
>
> OK, so only /some/ of the VA space is VA==PA, and some is remapped; that's a
> little different that what you originally implied above.
>
> BTW, which HW module is this; AVP/COP or something else. This sounds like an
> odd requirement.

This is not specific to ARM7. There are protected memory regions on Tegra that can be accessed by some controllers such as display, 2D, 3D, VDE and HDA. These are DRAM regions configured as protected by the BootRom. They are not exposed to, and not managed by, the OS page allocator. The H/W controller accesses to these regions still go through the IOMMU.

The IOMMU view is not uniform across the H/W controllers on Tegra. Some controllers see the entire 4GB IOVA space, i.e. all accesses go through the IOMMU. Some controllers see only the IOVA space that doesn't overlap with the MMIO space, i.e. accesses to MMIO addresses bypass the IOMMU and go directly to the MMIO space. The Tegra IOMMU can support multiple address spaces as well. To hide controller-specific behavior, the drivers should take care of the one-to-one mappings and remove the inaccessible IOVA ranges from their address spaces, based on platform device info.

In my initial mail, I referred to the protected memory regions as MMIO blocks, which is incorrect.

-KR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
RE: [RFC 0/5] ARM: dma-mapping: New dma_map_ops to control IOVA more precisely
> > On Tegra, the following use cases need specific IOVA mapping.
> > 1. Few MMIO blocks need IOVA=PA mapping setup.
>
> In that case, why would we enable the IOMMU for that one device; IOMMU
> disabled means VA==PA, right? Perhaps isolation of the device so it can only
> access certain PA ranges for security?

The device (H/W controller) needs to access a few special memory blocks (IOVA==PA) and DRAM as well. If the IOMMU is disabled, the device has to deal with memory fragmentation itself, which defeats the purpose of IOMMU support.

There is also a case where frame buffer memory is passed from the bootloader to the kernel and the display H/W continues to access it with the IOMMU enabled. To support this, the one-to-one mapping has to be set up before enabling the IOMMU.

-KR
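The ordering constraint in the frame buffer case (set up the one-to-one mapping first, enable the IOMMU afterwards) can be shown with a toy model. This is a sketch under stated assumptions: `toy_iommu` and its helpers are hypothetical, the "page table" is a flat array over a 4 MiB IOVA space, and none of this reflects the Tegra SMMU driver.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_NUM_PAGES  1024 /* toy 4 MiB IOVA space */
#define TOY_SPACE      ((uint64_t)TOY_NUM_PAGES << TOY_PAGE_SHIFT)
#define TOY_UNMAPPED   UINT64_MAX

struct toy_iommu {
	bool enabled;
	uint64_t pfn[TOY_NUM_PAGES]; /* IOVA page -> physical page */
};

static void toy_iommu_init(struct toy_iommu *mmu)
{
	mmu->enabled = false;
	for (size_t i = 0; i < TOY_NUM_PAGES; i++)
		mmu->pfn[i] = TOY_UNMAPPED;
}

/* Map [addr, addr + size) so that IOVA == PA. Returns 0, or -1 if out of range. */
static int toy_iommu_identity_map(struct toy_iommu *mmu, uint64_t addr,
				  uint64_t size)
{
	if (size == 0 || addr >= TOY_SPACE || size > TOY_SPACE - addr)
		return -1;

	uint64_t first = addr >> TOY_PAGE_SHIFT;
	uint64_t last = (addr + size - 1) >> TOY_PAGE_SHIFT;

	for (uint64_t page = first; page <= last; page++)
		mmu->pfn[page] = page;
	return 0;
}

/* Translate an IOVA; returns TOY_UNMAPPED when the access would fault. */
static uint64_t toy_iommu_translate(const struct toy_iommu *mmu, uint64_t iova)
{
	uint64_t page = iova >> TOY_PAGE_SHIFT;

	if (!mmu->enabled)
		return iova; /* IOMMU disabled: IOVA == PA */
	if (page >= TOY_NUM_PAGES || mmu->pfn[page] == TOY_UNMAPPED)
		return TOY_UNMAPPED;
	return (mmu->pfn[page] << TOY_PAGE_SHIFT) |
	       (iova & ((UINT64_C(1) << TOY_PAGE_SHIFT) - 1));
}
```

In this model, a frame buffer that was identity-mapped before `enabled` is set keeps translating to the same physical address afterwards, whereas any range left unmapped starts faulting the moment the IOMMU is switched on.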
RE: [RFC 0/5] ARM: dma-mapping: New dma_map_ops to control IOVA more precisely
> When a device driver would only use the IOMMU-API and needs small DMA-able
> areas it has to re-implement something like the DMA-API (basically an
> address allocator) for that. So I don't see a reason why both can't be used
> in a device driver.

On Tegra, the following use cases need a specific IOVA mapping.
1. A few MMIO blocks need an IOVA=PA mapping set up.
2. The CPU side loads the firmware into physical memory, which has to be mapped to a specific IOVA address, because the firmware is statically linked against that IOVA address.

The DMA APIs allow specifying only one address space per platform device.

For #1, the DMA API can't be used, as it doesn't allow mapping a specific IOVA to a PA. The IOMMU API can be used for mapping a specific IOVA to a PA. But in order to use the IOMMU API, the driver has to dereference the dev pointer, get the domain pointer, take the lock, and allocate memory from dma_iommu_mapping. This breaks the abstraction of struct device. Each device driver that needs IOVA=PA has to do this, which is redundant.

For #2, physical memory allocations alone can be done through the DMA API, as it also allocates IOVA space implicitly. Even after allocating physical memory through the DMA API, the driver would have the same problem as in #1 for the IOVA to PA mapping.

If a fake device is expected to be created for each specific IOVA allocation, it may lead to creating multiple fake devices per specific IOVA and per ASID (unique IOVA address space). As domain init would be done based on the device name, the fake device would need to have the same name as the original platform device.

If the DMA API allowed allocating a specific IOVA address and mapping an IOVA to a specific PA, device drivers would not need to know any details of struct device, specifying one mapping per device would be enough, and there would be no need for fake devices.

Comments are much appreciated.

-KR

> -----Original Message-----
> From: Joerg Roedel [mailto:joerg.roe...@amd.com]
> Sent: Wednesday, September 19, 2012 5:50 AM
> To: Arnd Bergmann
> Cc: Hiroshi Doyu; m.szyprow...@samsung.com; li...@arm.linux.org.uk;
> minc...@kernel.org; chunsang.je...@linaro.org;
> linux-ker...@vger.kernel.org; subas...@gmail.com;
> linaro-mm-...@lists.linaro.org; linux...@kvack.org;
> io...@lists.linux-foundation.org; Krishna Reddy;
> linux-te...@vger.kernel.org; kyungmin.p...@samsung.com;
> pullip@samsung.com; linux-arm-ker...@lists.infradead.org
> Subject: Re: [RFC 0/5] ARM: dma-mapping: New dma_map_ops to control IOVA
> more precisely
>
> On Wed, Sep 19, 2012 at 07:59:45AM +, Arnd Bergmann wrote:
> > On Wednesday 19 September 2012, Hiroshi Doyu wrote:
> > > I guess that it would work. Originally I thought that using DMA-API
> > > and IOMMU-API together in driver might be kind of layering violation
> > > since IOMMU-API itself is used in DMA-API. Only DMA-API used in
> > > driver might be cleaner. Considering that DMA API traditionally
> > > handling anonymous {bus,iova} address only, introducing the concept
> > > of specific address in DMA API may not be so encouraged, though.
> > >
> > > It would be nice to listen how other SoCs have solved similar needs.
> >
> > In general, I would recommend using only the IOMMU API when you have a
> > device driver that needs to control the bus virtual address space and
> > that manages a device that resides in its own IOMMU context. I would
> > recommend using only the dma-mapping API when you have a device that
> > lives in a shared bus virtual address space with other devices, and
> > then never ask for a specific bus virtual address.
> >
> > Can you explain what devices you see that don't fit in one of those
> > two categories?
>
> Well, I don't think that a driver should limit to one of these 2 APIs. A
> driver can very well use the IOMMU-API during initialization (for example
> to map the firmware to an address the device expects it to be) and use the
> DMA-API later during normal operation to exchange data with the device.
>
> When a device driver would only use the IOMMU-API and needs small DMA-able
> areas it has to re-implement something like the DMA-API (basically an
> address allocator) for that. So I don't see a reason why both can't be used
> in a device driver.
>
> Regards,
>
> Joerg
>
> --
> AMD Operating System Research Center
>
> Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
> General Managers: Alberto Bozzo
> Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632
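The request in this mail, one allocator per device that can hand out both a caller-chosen IOVA (for firmware or IOVA=PA blocks) and anonymous IOVAs (normal DMA), can be sketched as a toy first-fit page allocator. This is not the ARM dma-mapping code or any proposed API; `toy_iova_space`, `toy_iova_reserve_at()` and `toy_iova_alloc()` are hypothetical names, and the address space is 64 pages to keep the example small.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_IOVA_PAGES 64 /* toy IOVA space, counted in pages */

struct toy_iova_space {
	bool used[TOY_IOVA_PAGES];
};

/* True if pages [start, start + n) are inside the space and all free. */
static bool toy_range_free(const struct toy_iova_space *s, size_t start,
			   size_t n)
{
	if (n == 0 || start >= TOY_IOVA_PAGES || n > TOY_IOVA_PAGES - start)
		return false;
	for (size_t i = start; i < start + n; i++)
		if (s->used[i])
			return false;
	return true;
}

static void toy_range_mark(struct toy_iova_space *s, size_t start, size_t n)
{
	for (size_t i = start; i < start + n; i++)
		s->used[i] = true;
}

/* Claim a caller-chosen range, e.g. where firmware is linked. 0 or -1. */
static int toy_iova_reserve_at(struct toy_iova_space *s, size_t start,
			       size_t n)
{
	if (!toy_range_free(s, start, n))
		return -1;
	toy_range_mark(s, start, n);
	return 0;
}

/* Anonymous first-fit allocation. Returns the start page, or -1 if full. */
static long toy_iova_alloc(struct toy_iova_space *s, size_t n)
{
	if (n == 0 || n > TOY_IOVA_PAGES)
		return -1;
	for (size_t start = 0; start + n <= TOY_IOVA_PAGES; start++) {
		if (toy_range_free(s, start, n)) {
			toy_range_mark(s, start, n);
			return (long)start;
		}
	}
	return -1;
}
```

Because both entry points share one bitmap, an anonymous allocation made after `toy_iova_reserve_at()` simply skips the fixed range, which is the behavior a driver would otherwise have to recreate by mixing the IOMMU API with its own address bookkeeping.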