RE: [PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part)

2021-03-17 Thread Krishna Reddy
Tested-by: Krishna Reddy 

Validated nested translations with an NVMe PCI device assigned to a guest VM.
Tested with both v12 and v13 of Jean-Philippe's patches as the base.

> This is based on Jean-Philippe's
> [PATCH v12 00/10] iommu: I/O page faults for SMMUv3
> https://lore.kernel.org/linux-arm-kernel/YBfij71tyYvh8LhB@myrica/T/

With Jean-Philippe's V13, Patch 12 of this series has a conflict that had to be 
resolved manually.

-KR




RE: [PATCH v12 00/13] SMMUv3 Nested Stage Setup (VFIO part)

2021-03-17 Thread Krishna Reddy
Tested-by: Krishna Reddy 

Validated nested SMMUv3 translations for an NVMe PCIe device assigned to a
guest VM; the device is functional.

This patch series resolved the mismatch (seen with the v11 patches) in the
VFIO_IOMMU_SET_PASID_TABLE and VFIO_IOMMU_CACHE_INVALIDATE ioctls between Linux
and the QEMU patch series "vSMMUv3/pSMMUv3 2 stage VFIO integration"
(v5.2.0-2stage-rfcv8).

-KR



RE: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)

2021-03-16 Thread Krishna Reddy
> Hi Krishna,
> On 3/15/21 7:04 PM, Krishna Reddy wrote:
> > Tested-by: Krishna Reddy 
> >
> >> 1) pass the guest stage 1 configuration
> >
> > Validated Nested SMMUv3 translations for NVMe PCIe device from Guest VM
> along with patch series "v11 SMMUv3 Nested Stage Setup (VFIO part)" and
> QEMU patch series "vSMMUv3/pSMMUv3 2 stage VFIO integration" from
> v5.2.0-2stage-rfcv8.
> > NVMe PCIe device is functional with 2-stage translations and no issues
> observed.
> Thank you very much for your testing efforts. For your info, there are more
> recent kernel series:
> [PATCH v14 00/13] SMMUv3 Nested Stage Setup (IOMMU part) (Feb 23) [PATCH
> v12 00/13] SMMUv3 Nested Stage Setup (VFIO part) (Feb 23)
> 
> working along with QEMU RFC
> [RFC v8 00/28] vSMMUv3/pSMMUv3 2 stage VFIO integration (Feb 25)
> 
> If you have cycles to test with those, this would be highly appreciated.
 
Thanks Eric for the latest patches. Will validate and update.
Feel free to reach out to me for validating future patch sets as necessary.

-KR


RE: [PATCH v11 00/13] SMMUv3 Nested Stage Setup (VFIO part)

2021-03-15 Thread Krishna Reddy
Tested-by: Krishna Reddy 

> 1) pass the guest stage 1 configuration
> 3) invalidate stage 1 related caches

Validated nested SMMUv3 translations for an NVMe PCIe device assigned to a
guest VM, along with the patch series "v13 SMMUv3 Nested Stage Setup (IOMMU
part)" and the QEMU patch series "vSMMUv3/pSMMUv3 2 stage VFIO integration"
from v5.2.0-2stage-rfcv8.
The NVMe PCIe device is functional with 2-stage translations and no issues
were observed.

-KR



RE: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)

2021-03-15 Thread Krishna Reddy
Tested-by: Krishna Reddy 

> 1) pass the guest stage 1 configuration

Validated nested SMMUv3 translations for an NVMe PCIe device assigned to a
guest VM, along with the patch series "v11 SMMUv3 Nested Stage Setup (VFIO
part)" and the QEMU patch series "vSMMUv3/pSMMUv3 2 stage VFIO integration"
from v5.2.0-2stage-rfcv8.
The NVMe PCIe device is functional with 2-stage translations and no issues
were observed.

-KR



[PATCH v11 1/5] iommu/arm-smmu: move TLB timeout and spin count macros

2020-07-18 Thread Krishna Reddy
Move the TLB timeout and spin count macros to the header file so that
vendor-specific implementations can use the same values.

Reviewed-by: Jon Hunter 
Reviewed-by: Nicolin Chen 
Reviewed-by: Pritesh Raithatha 
Reviewed-by: Robin Murphy 
Reviewed-by: Thierry Reding 
Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu.c | 3 ---
 drivers/iommu/arm-smmu.h | 2 ++
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 19f906de6420..cdd15ead9bc4 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -52,9 +52,6 @@
  */
 #define QCOM_DUMMY_VAL -1
 
-#define TLB_LOOP_TIMEOUT   1000000 /* 1s! */
-#define TLB_SPIN_COUNT 10
-
 #define MSI_IOVA_BASE  0x8000000
 #define MSI_IOVA_LENGTH 0x100000
 
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index d172c024be61..c7d0122a7c6c 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -236,6 +236,8 @@ enum arm_smmu_cbar_type {
 /* Maximum number of context banks per SMMU */
 #define ARM_SMMU_MAX_CBS   128
 
+#define TLB_LOOP_TIMEOUT   1000000 /* 1s! */
+#define TLB_SPIN_COUNT 10
 
 /* Shared driver definitions */
 enum arm_smmu_arch_version {
-- 
2.26.2
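
For context, these two macros drive the TLB sync polling loop in the shared
driver code; the sketch below shows that pattern in condensed form. It is
modelled on the upstream __arm_smmu_tlb_sync() and assumes the arm_smmu_readl()
and arm_smmu_writel() helpers plus ARM_SMMU_sTLBGSTATUS_GSACTIVE from
arm-smmu.h; treat it as an illustration, not the exact upstream body.

static void example_tlb_sync_wait(struct arm_smmu_device *smmu,
				  int page, int sync, int status)
{
	unsigned int spin_cnt, delay;

	/* Issue the sync, then busy-wait briefly before backing off. */
	arm_smmu_writel(smmu, page, sync, QCOM_DUMMY_VAL);
	for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) {
		for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) {
			/* GSACTIVE clears once the TLB sync has completed */
			if (!(arm_smmu_readl(smmu, page, status) &
			      ARM_SMMU_sTLBGSTATUS_GSACTIVE))
				return;
			cpu_relax();
		}
		udelay(delay);
	}
	dev_err_ratelimited(smmu->dev,
			    "TLB sync timed out -- SMMU may be deadlocked\n");
}

Once the macros live in arm-smmu.h, vendor code such as arm-smmu-nvidia.c can
reuse TLB_SPIN_COUNT and TLB_LOOP_TIMEOUT for the same back-off behaviour.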



[PATCH v11 5/5] iommu/arm-smmu: Add global/context fault implementation hooks

2020-07-18 Thread Krishna Reddy
Add global/context fault hooks to allow vendor-specific implementations
to override the default fault interrupt handlers.

Update NVIDIA implementation to override the default global/context fault
interrupt handlers and handle interrupts across the two ARM MMU-500s that
are programmed identically.

Reviewed-by: Jon Hunter 
Reviewed-by: Nicolin Chen 
Reviewed-by: Pritesh Raithatha 
Reviewed-by: Robin Murphy 
Reviewed-by: Thierry Reding 
Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu-nvidia.c | 99 +
 drivers/iommu/arm-smmu.c| 17 +-
 drivers/iommu/arm-smmu.h|  3 +
 3 files changed, 117 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
index 2f55e5793d34..31368057e9be 100644
--- a/drivers/iommu/arm-smmu-nvidia.c
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -127,6 +127,103 @@ static int nvidia_smmu_reset(struct arm_smmu_device *smmu)
return 0;
 }
 
+static irqreturn_t nvidia_smmu_global_fault_inst(int irq,
+struct arm_smmu_device *smmu,
+int inst)
+{
+   u32 gfsr, gfsynr0, gfsynr1, gfsynr2;
+   void __iomem *gr0_base = nvidia_smmu_page(smmu, inst, 0);
+
+   gfsr = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSR);
+   if (!gfsr)
+   return IRQ_NONE;
+
+   gfsynr0 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR0);
+   gfsynr1 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR1);
+   gfsynr2 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR2);
+
+   dev_err_ratelimited(smmu->dev,
+   "Unexpected global fault, this could be serious\n");
+   dev_err_ratelimited(smmu->dev,
+   "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 0x%08x\n",
+   gfsr, gfsynr0, gfsynr1, gfsynr2);
+
+   writel_relaxed(gfsr, gr0_base + ARM_SMMU_GR0_sGFSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nvidia_smmu_global_fault(int irq, void *dev)
+{
+   unsigned int inst;
+   irqreturn_t ret = IRQ_NONE;
+   struct arm_smmu_device *smmu = dev;
+
+   for (inst = 0; inst < NUM_SMMU_INSTANCES; inst++) {
+   irqreturn_t irq_ret;
+
+   irq_ret = nvidia_smmu_global_fault_inst(irq, smmu, inst);
+   if (irq_ret == IRQ_HANDLED)
+   ret = IRQ_HANDLED;
+   }
+
+   return ret;
+}
+
+static irqreturn_t nvidia_smmu_context_fault_bank(int irq,
+ struct arm_smmu_device *smmu,
+ int idx, int inst)
+{
+   u32 fsr, fsynr, cbfrsynra;
+   unsigned long iova;
+   void __iomem *gr1_base = nvidia_smmu_page(smmu, inst, 1);
+   void __iomem *cb_base = nvidia_smmu_page(smmu, inst, smmu->numpage + idx);
+
+   fsr = readl_relaxed(cb_base + ARM_SMMU_CB_FSR);
+   if (!(fsr & ARM_SMMU_FSR_FAULT))
+   return IRQ_NONE;
+
+   fsynr = readl_relaxed(cb_base + ARM_SMMU_CB_FSYNR0);
+   iova = readq_relaxed(cb_base + ARM_SMMU_CB_FAR);
+   cbfrsynra = readl_relaxed(gr1_base + ARM_SMMU_GR1_CBFRSYNRA(idx));
+
+   dev_err_ratelimited(smmu->dev,
+   "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
+   fsr, iova, fsynr, cbfrsynra, idx);
+
+   writel_relaxed(fsr, cb_base + ARM_SMMU_CB_FSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nvidia_smmu_context_fault(int irq, void *dev)
+{
+   int idx;
+   unsigned int inst;
+   irqreturn_t ret = IRQ_NONE;
+   struct arm_smmu_device *smmu;
+   struct iommu_domain *domain = dev;
+   struct arm_smmu_domain *smmu_domain;
+
+   smmu_domain = container_of(domain, struct arm_smmu_domain, domain);
+   smmu = smmu_domain->smmu;
+
+   for (inst = 0; inst < NUM_SMMU_INSTANCES; inst++) {
+   irqreturn_t irq_ret;
+
+   /*
+* Interrupt line is shared between all contexts.
+* Check for faults across all contexts.
+*/
+   for (idx = 0; idx < smmu->num_context_banks; idx++) {
+   irq_ret = nvidia_smmu_context_fault_bank(irq, smmu,
+idx, inst);
+   if (irq_ret == IRQ_HANDLED)
+   ret = IRQ_HANDLED;
+   }
+   }
+
+   return ret;
+}
+
 static const struct arm_smmu_impl nvidia_smmu_impl = {
.read_reg = nvidia_smmu_read_reg,
.write_reg = nvidia_smmu_write_reg,
@@ -134,6 +231,8 @@ static const struct arm_smmu_impl nvidia_smmu_impl = {
.write_reg64 = nvidia_smmu_write_reg64,
.reset = nvidia_smmu_reset,
.tlb_sync = nvidia_sm

[PATCH v11 4/5] dt-bindings: arm-smmu: add binding for Tegra194 SMMU

2020-07-18 Thread Krishna Reddy
Add binding for NVIDIA's Tegra194 SoC SMMU.

Reviewed-by: Jon Hunter 
Reviewed-by: Rob Herring 
Reviewed-by: Robin Murphy 
Signed-off-by: Krishna Reddy 
---
 .../devicetree/bindings/iommu/arm,smmu.yaml   | 25 ++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
index 93fb9fe068b9..503160a7b9a0 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
@@ -44,6 +44,11 @@ properties:
         items:
           - const: marvell,ap806-smmu-500
           - const: arm,mmu-500
+      - description: NVIDIA SoCs that program two ARM MMU-500s identically
+        items:
+          - enum:
+              - nvidia,tegra194-smmu
+          - const: nvidia,smmu-500
       - items:
           - const: arm,mmu-500
           - const: arm,smmu-v2
@@ -61,7 +66,8 @@ properties:
   - cavium,smmu-v2
 
   reg:
-maxItems: 1
+minItems: 1
+maxItems: 2
 
   '#global-interrupts':
 description: The number of global interrupts exposed by the device.
@@ -144,6 +150,23 @@ required:
 
 additionalProperties: false
 
+allOf:
+  - if:
+      properties:
+        compatible:
+          contains:
+            enum:
+              - nvidia,tegra194-smmu
+    then:
+      properties:
+        reg:
+          minItems: 2
+          maxItems: 2
+    else:
+      properties:
+        reg:
+          maxItems: 1
+
 examples:
   - |+
 /* SMMU with stream matching or stream indexing */
-- 
2.26.2



[PATCH v11 0/5] NVIDIA ARM SMMU Implementation

2020-07-18 Thread Krishna Reddy
Changes in v11:
Addressed Rob's comment on the DT binding patch to set min/maxItems of the reg
property in the else part.
Rebased on top of 
https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=for-joerg/arm-smmu/updates.

Changes in v10:
Perform SMMU base ioremap before calling implementation init.
Check for Global faults across both ARM MMU-500s during global interrupt.
Check for context faults across all contexts of both ARM MMU-500s during 
context fault interrupt.
Add new DT binding nvidia,smmu-500 for NVIDIA implementation.
https://lkml.org/lkml/2020/7/8/57

v9 - https://lkml.org/lkml/2020/6/30/1282
v8 - https://lkml.org/lkml/2020/6/29/2385
v7 - https://lkml.org/lkml/2020/6/28/347
v6 - https://lkml.org/lkml/2020/6/4/1018
v5 - https://lkml.org/lkml/2020/5/21/1114
v4 - https://lkml.org/lkml/2019/10/30/1054
v3 - https://lkml.org/lkml/2019/10/18/1601
v2 - https://lkml.org/lkml/2019/9/2/980
v1 - https://lkml.org/lkml/2019/8/29/1588


Krishna Reddy (5):
  iommu/arm-smmu: move TLB timeout and spin count macros
  iommu/arm-smmu: ioremap smmu mmio region before implementation init
  iommu/arm-smmu: add NVIDIA implementation for ARM MMU-500 usage
  dt-bindings: arm-smmu: add binding for Tegra194 SMMU
  iommu/arm-smmu: Add global/context fault implementation hooks

 .../devicetree/bindings/iommu/arm,smmu.yaml   |  25 +-
 MAINTAINERS   |   2 +
 drivers/iommu/Makefile|   2 +-
 drivers/iommu/arm-smmu-impl.c |   3 +
 drivers/iommu/arm-smmu-nvidia.c   | 278 ++
 drivers/iommu/arm-smmu.c  |  29 +-
 drivers/iommu/arm-smmu.h  |   6 +
 7 files changed, 334 insertions(+), 11 deletions(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c


base-commit: 49fbb25030265c660de732513f18275d88ff99d3
-- 
2.26.2



[PATCH v11 3/5] iommu/arm-smmu: add NVIDIA implementation for ARM MMU-500 usage

2020-07-18 Thread Krishna Reddy
NVIDIA's Tegra194 SoC has three ARM MMU-500 instances.
It uses two of the ARM MMU-500s together to interleave IOVA
accesses across them, and those two must be programmed identically.
This implementation supports programming the two ARM MMU-500s
that must be programmed identically.

The third ARM MMU-500 instance is supported by the standard
arm-smmu.c driver itself.

Reviewed-by: Jon Hunter 
Reviewed-by: Nicolin Chen 
Reviewed-by: Pritesh Raithatha 
Reviewed-by: Robin Murphy 
Reviewed-by: Thierry Reding 
Signed-off-by: Krishna Reddy 
---
 MAINTAINERS |   2 +
 drivers/iommu/Makefile  |   2 +-
 drivers/iommu/arm-smmu-impl.c   |   3 +
 drivers/iommu/arm-smmu-nvidia.c | 179 
 drivers/iommu/arm-smmu.c|   1 +
 drivers/iommu/arm-smmu.h|   1 +
 6 files changed, 187 insertions(+), 1 deletion(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 496fd4eafb68..ee2c0ba13a0f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16810,8 +16810,10 @@ F: drivers/i2c/busses/i2c-tegra.c
 
 TEGRA IOMMU DRIVERS
 M: Thierry Reding 
+R: Krishna Reddy 
 L: linux-te...@vger.kernel.org
 S: Supported
+F: drivers/iommu/arm-smmu-nvidia.c
 F: drivers/iommu/tegra*
 
 TEGRA KBC DRIVER
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 342190196dfb..2b8203db73ec 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -15,7 +15,7 @@ obj-$(CONFIG_AMD_IOMMU) += amd/iommu.o amd/init.o amd/quirks.o
 obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd/debugfs.o
 obj-$(CONFIG_AMD_IOMMU_V2) += amd/iommu_v2.o
 obj-$(CONFIG_ARM_SMMU) += arm_smmu.o
-arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o
+arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-nvidia.o arm-smmu-qcom.o
 obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
 obj-$(CONFIG_DMAR_TABLE) += intel/dmar.o
 obj-$(CONFIG_INTEL_IOMMU) += intel/iommu.o intel/pasid.o
diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index c87d825f651e..f4ff124a1967 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -213,6 +213,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu)
if (of_property_read_bool(np, "calxeda,smmu-secure-config-access"))
smmu->impl = &calxeda_impl;
 
+   if (of_device_is_compatible(np, "nvidia,tegra194-smmu"))
+   return nvidia_smmu_impl_init(smmu);
+
if (of_device_is_compatible(np, "qcom,sdm845-smmu-500") ||
of_device_is_compatible(np, "qcom,sc7180-smmu-500") ||
of_device_is_compatible(np, "qcom,sm8150-smmu-500") ||
diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
new file mode 100644
index ..2f55e5793d34
--- /dev/null
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -0,0 +1,179 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright (C) 2019-2020 NVIDIA CORPORATION.  All rights reserved.
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "arm-smmu.h"
+
+/*
+ * Tegra194 has three ARM MMU-500 Instances.
+ * Two of them are used together and must be programmed identically for
+ * interleaved IOVA accesses across them and translates accesses from
+ * non-isochronous HW devices.
+ * Third one is used for translating accesses from isochronous HW devices.
+ * This implementation supports programming of the two instances that must
+ * be programmed identically.
+ * The third instance usage is through standard arm-smmu driver itself and
+ * is out of scope of this implementation.
+ */
+#define NUM_SMMU_INSTANCES 2
+
+struct nvidia_smmu {
+   struct arm_smmu_device  smmu;
+   void __iomem*bases[NUM_SMMU_INSTANCES];
+};
+
+static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu,
+unsigned int inst, int page)
+{
+   struct nvidia_smmu *nvidia_smmu;
+
+   nvidia_smmu = container_of(smmu, struct nvidia_smmu, smmu);
+   return nvidia_smmu->bases[inst] + (page << smmu->pgshift);
+}
+
+static u32 nvidia_smmu_read_reg(struct arm_smmu_device *smmu,
+   int page, int offset)
+{
+   void __iomem *reg = nvidia_smmu_page(smmu, 0, page) + offset;
+
+   return readl_relaxed(reg);
+}
+
+static void nvidia_smmu_write_reg(struct arm_smmu_device *smmu,
+ int page, int offset, u32 val)
+{
+   unsigned int i;
+
+   for (i = 0; i < NUM_SMMU_INSTANCES; i++) {
+   void __iomem *reg = nvidia_smmu_page(smmu, i, page) + offset;
+
+   writel_relaxed(val, reg);
+   }
+}
+
+static u64 nvidia_smmu_read_reg64(struct arm_smmu_device *smmu,
+ int page, int offset)
+{
+   void __iomem *reg = nvidia_smmu_page(smmu, 0, page) + offset;
+
+   return readq_relaxe

[PATCH v11 2/5] iommu/arm-smmu: ioremap smmu mmio region before implementation init

2020-07-18 Thread Krishna Reddy
ioremap the SMMU MMIO region before calling into implementation init.
This is necessary to make the mapped address available during
vendor-specific implementation init.

Reviewed-by: Jon Hunter 
Reviewed-by: Nicolin Chen 
Reviewed-by: Pritesh Raithatha 
Reviewed-by: Robin Murphy 
Reviewed-by: Thierry Reding 
Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index cdd15ead9bc4..de520115d3df 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -2123,10 +2123,6 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
if (err)
return err;
 
-   smmu = arm_smmu_impl_init(smmu);
-   if (IS_ERR(smmu))
-   return PTR_ERR(smmu);
-
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
ioaddr = res->start;
smmu->base = devm_ioremap_resource(dev, res);
@@ -2138,6 +2134,10 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
 */
smmu->numpage = resource_size(res);
 
+   smmu = arm_smmu_impl_init(smmu);
+   if (IS_ERR(smmu))
+   return PTR_ERR(smmu);
+
num_irqs = 0;
while ((res = platform_get_resource(pdev, IORESOURCE_IRQ, num_irqs))) {
num_irqs++;
-- 
2.26.2



RE: [PATCH v10 0/5] NVIDIA ARM SMMU Implementation

2020-07-17 Thread Krishna Reddy
>On Mon, Jul 13, 2020 at 02:50:20PM +0100, Will Deacon wrote:
>> On Tue, Jul 07, 2020 at 10:00:12PM -0700, Krishna Reddy wrote:
> >> Changes in v10:
>> > Perform SMMU base ioremap before calling implementation init.
>> > Check for Global faults across both ARM MMU-500s during global interrupt.
>> > Check for context faults across all contexts of both ARM MMU-500s during 
>> > context fault interrupt.
> >> Add new DT binding nvidia,smmu-500 for NVIDIA implementation.
>>
>> Please repost based on my SMMU queue, as this doesn't currently apply.
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=for-joerg/arm-smmu/updates

>Any update on this, please? It would be a shame to miss 5.9 now that the 
>patches look alright.

Thanks for pinging.
I have been out of the office this week and have just started working on
reposting the patches.

Will


RE: [PATCH v10 4/5] dt-bindings: arm-smmu: add binding for Tegra194 SMMU

2020-07-10 Thread Krishna Reddy
Thanks Rob. One question on setting "minItems: ". Please see below.

>> +allOf:
>> +  - if:
>> +  properties:
>> +compatible:
>> +  contains:
>> +enum:
>> +  - nvidia,tegra194-smmu
>> +then:
>> +  properties:
>> +reg:
>> +  minItems: 2
>> +  maxItems: 2

>This doesn't work. The main part of the schema already said there's only
>1 reg region. This part is ANDed with that, not an override. You need to add 
>an else clause with 'maxItems: 1' and change the base schema to
>{minItems: 1, maxItems: 2}.

As the earlier version of the base schema doesn't have "minItems" set, should it
be set to 0 for backward compatibility? Or can it simply be omitted from the
base schema, as before?

The "else" part setting "maxItems: 1", and setting "maxItems: 2" in the base
schema, are clear to me.


-KR


[PATCH v10 5/5] iommu/arm-smmu: Add global/context fault implementation hooks

2020-07-07 Thread Krishna Reddy
Add global/context fault hooks to allow vendor-specific implementations
to override the default fault interrupt handlers.

Update NVIDIA implementation to override the default global/context fault
interrupt handlers and handle interrupts across the two ARM MMU-500s that
are programmed identically.

Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu-nvidia.c | 99 +
 drivers/iommu/arm-smmu.c| 17 +-
 drivers/iommu/arm-smmu.h|  3 +
 3 files changed, 117 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
index 2f55e5793d34..31368057e9be 100644
--- a/drivers/iommu/arm-smmu-nvidia.c
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -127,6 +127,103 @@ static int nvidia_smmu_reset(struct arm_smmu_device *smmu)
return 0;
 }
 
+static irqreturn_t nvidia_smmu_global_fault_inst(int irq,
+struct arm_smmu_device *smmu,
+int inst)
+{
+   u32 gfsr, gfsynr0, gfsynr1, gfsynr2;
+   void __iomem *gr0_base = nvidia_smmu_page(smmu, inst, 0);
+
+   gfsr = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSR);
+   if (!gfsr)
+   return IRQ_NONE;
+
+   gfsynr0 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR0);
+   gfsynr1 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR1);
+   gfsynr2 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR2);
+
+   dev_err_ratelimited(smmu->dev,
+   "Unexpected global fault, this could be serious\n");
+   dev_err_ratelimited(smmu->dev,
+   "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 0x%08x\n",
+   gfsr, gfsynr0, gfsynr1, gfsynr2);
+
+   writel_relaxed(gfsr, gr0_base + ARM_SMMU_GR0_sGFSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nvidia_smmu_global_fault(int irq, void *dev)
+{
+   unsigned int inst;
+   irqreturn_t ret = IRQ_NONE;
+   struct arm_smmu_device *smmu = dev;
+
+   for (inst = 0; inst < NUM_SMMU_INSTANCES; inst++) {
+   irqreturn_t irq_ret;
+
+   irq_ret = nvidia_smmu_global_fault_inst(irq, smmu, inst);
+   if (irq_ret == IRQ_HANDLED)
+   ret = IRQ_HANDLED;
+   }
+
+   return ret;
+}
+
+static irqreturn_t nvidia_smmu_context_fault_bank(int irq,
+ struct arm_smmu_device *smmu,
+ int idx, int inst)
+{
+   u32 fsr, fsynr, cbfrsynra;
+   unsigned long iova;
+   void __iomem *gr1_base = nvidia_smmu_page(smmu, inst, 1);
+   void __iomem *cb_base = nvidia_smmu_page(smmu, inst, smmu->numpage + idx);
+
+   fsr = readl_relaxed(cb_base + ARM_SMMU_CB_FSR);
+   if (!(fsr & ARM_SMMU_FSR_FAULT))
+   return IRQ_NONE;
+
+   fsynr = readl_relaxed(cb_base + ARM_SMMU_CB_FSYNR0);
+   iova = readq_relaxed(cb_base + ARM_SMMU_CB_FAR);
+   cbfrsynra = readl_relaxed(gr1_base + ARM_SMMU_GR1_CBFRSYNRA(idx));
+
+   dev_err_ratelimited(smmu->dev,
+   "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
+   fsr, iova, fsynr, cbfrsynra, idx);
+
+   writel_relaxed(fsr, cb_base + ARM_SMMU_CB_FSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nvidia_smmu_context_fault(int irq, void *dev)
+{
+   int idx;
+   unsigned int inst;
+   irqreturn_t ret = IRQ_NONE;
+   struct arm_smmu_device *smmu;
+   struct iommu_domain *domain = dev;
+   struct arm_smmu_domain *smmu_domain;
+
+   smmu_domain = container_of(domain, struct arm_smmu_domain, domain);
+   smmu = smmu_domain->smmu;
+
+   for (inst = 0; inst < NUM_SMMU_INSTANCES; inst++) {
+   irqreturn_t irq_ret;
+
+   /*
+* Interrupt line is shared between all contexts.
+* Check for faults across all contexts.
+*/
+   for (idx = 0; idx < smmu->num_context_banks; idx++) {
+   irq_ret = nvidia_smmu_context_fault_bank(irq, smmu,
+idx, inst);
+   if (irq_ret == IRQ_HANDLED)
+   ret = IRQ_HANDLED;
+   }
+   }
+
+   return ret;
+}
+
 static const struct arm_smmu_impl nvidia_smmu_impl = {
.read_reg = nvidia_smmu_read_reg,
.write_reg = nvidia_smmu_write_reg,
@@ -134,6 +231,8 @@ static const struct arm_smmu_impl nvidia_smmu_impl = {
.write_reg64 = nvidia_smmu_write_reg64,
.reset = nvidia_smmu_reset,
.tlb_sync = nvidia_smmu_tlb_sync,
+   .global_fault = nvidia_smmu_global_fault,
+   .context_fault = nvidia_smmu_context_fault,
 };
 
 struct arm_smmu_device *nvidia_smmu

[PATCH v10 1/5] iommu/arm-smmu: move TLB timeout and spin count macros

2020-07-07 Thread Krishna Reddy
Move the TLB timeout and spin count macros to the header file so that
vendor-specific implementations can use the same values.

Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu.c | 3 ---
 drivers/iommu/arm-smmu.h | 2 ++
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 243bc4cb2705..d2054178df35 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -52,9 +52,6 @@
  */
 #define QCOM_DUMMY_VAL -1
 
-#define TLB_LOOP_TIMEOUT   1000000 /* 1s! */
-#define TLB_SPIN_COUNT 10
-
 #define MSI_IOVA_BASE  0x8000000
 #define MSI_IOVA_LENGTH 0x100000
 
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index d172c024be61..c7d0122a7c6c 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -236,6 +236,8 @@ enum arm_smmu_cbar_type {
 /* Maximum number of context banks per SMMU */
 #define ARM_SMMU_MAX_CBS   128
 
+#define TLB_LOOP_TIMEOUT   1000000 /* 1s! */
+#define TLB_SPIN_COUNT 10
 
 /* Shared driver definitions */
 enum arm_smmu_arch_version {
-- 
2.26.2



[PATCH v10 3/5] iommu/arm-smmu: add NVIDIA implementation for ARM MMU-500 usage

2020-07-07 Thread Krishna Reddy
NVIDIA's Tegra194 SoC has three ARM MMU-500 instances.
It uses two of the ARM MMU-500s together to interleave IOVA
accesses across them, and those two must be programmed identically.
This implementation supports programming the two ARM MMU-500s
that must be programmed identically.

The third ARM MMU-500 instance is supported by the standard
arm-smmu.c driver itself.

Signed-off-by: Krishna Reddy 
---
 MAINTAINERS |   2 +
 drivers/iommu/Makefile  |   2 +-
 drivers/iommu/arm-smmu-impl.c   |   3 +
 drivers/iommu/arm-smmu-nvidia.c | 179 
 drivers/iommu/arm-smmu.c|   1 +
 drivers/iommu/arm-smmu.h|   1 +
 6 files changed, 187 insertions(+), 1 deletion(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c

diff --git a/MAINTAINERS b/MAINTAINERS
index c23352059a6b..534cedaf8e55 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16811,8 +16811,10 @@ F: drivers/i2c/busses/i2c-tegra.c
 
 TEGRA IOMMU DRIVERS
 M: Thierry Reding 
+R: Krishna Reddy 
 L: linux-te...@vger.kernel.org
 S: Supported
+F: drivers/iommu/arm-smmu-nvidia.c
 F: drivers/iommu/tegra*
 
 TEGRA KBC DRIVER
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 342190196dfb..2b8203db73ec 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -15,7 +15,7 @@ obj-$(CONFIG_AMD_IOMMU) += amd/iommu.o amd/init.o amd/quirks.o
 obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd/debugfs.o
 obj-$(CONFIG_AMD_IOMMU_V2) += amd/iommu_v2.o
 obj-$(CONFIG_ARM_SMMU) += arm_smmu.o
-arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o
+arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-nvidia.o arm-smmu-qcom.o
 obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
 obj-$(CONFIG_DMAR_TABLE) += intel/dmar.o
 obj-$(CONFIG_INTEL_IOMMU) += intel/iommu.o intel/pasid.o
diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index c75b9d957b70..f15571d05474 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -171,6 +171,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu)
if (of_property_read_bool(np, "calxeda,smmu-secure-config-access"))
smmu->impl = &calxeda_impl;
 
+   if (of_device_is_compatible(np, "nvidia,tegra194-smmu"))
+   return nvidia_smmu_impl_init(smmu);
+
if (of_device_is_compatible(np, "qcom,sdm845-smmu-500") ||
of_device_is_compatible(np, "qcom,sc7180-smmu-500"))
return qcom_smmu_impl_init(smmu);
diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
new file mode 100644
index ..2f55e5793d34
--- /dev/null
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -0,0 +1,179 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Copyright (C) 2019-2020 NVIDIA CORPORATION.  All rights reserved.
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "arm-smmu.h"
+
+/*
+ * Tegra194 has three ARM MMU-500 Instances.
+ * Two of them are used together and must be programmed identically for
+ * interleaved IOVA accesses across them and translates accesses from
+ * non-isochronous HW devices.
+ * Third one is used for translating accesses from isochronous HW devices.
+ * This implementation supports programming of the two instances that must
+ * be programmed identically.
+ * The third instance usage is through standard arm-smmu driver itself and
+ * is out of scope of this implementation.
+ */
+#define NUM_SMMU_INSTANCES 2
+
+struct nvidia_smmu {
+   struct arm_smmu_device  smmu;
+   void __iomem*bases[NUM_SMMU_INSTANCES];
+};
+
+static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu,
+unsigned int inst, int page)
+{
+   struct nvidia_smmu *nvidia_smmu;
+
+   nvidia_smmu = container_of(smmu, struct nvidia_smmu, smmu);
+   return nvidia_smmu->bases[inst] + (page << smmu->pgshift);
+}
+
+static u32 nvidia_smmu_read_reg(struct arm_smmu_device *smmu,
+   int page, int offset)
+{
+   void __iomem *reg = nvidia_smmu_page(smmu, 0, page) + offset;
+
+   return readl_relaxed(reg);
+}
+
+static void nvidia_smmu_write_reg(struct arm_smmu_device *smmu,
+ int page, int offset, u32 val)
+{
+   unsigned int i;
+
+   for (i = 0; i < NUM_SMMU_INSTANCES; i++) {
+   void __iomem *reg = nvidia_smmu_page(smmu, i, page) + offset;
+
+   writel_relaxed(val, reg);
+   }
+}
+
+static u64 nvidia_smmu_read_reg64(struct arm_smmu_device *smmu,
+ int page, int offset)
+{
+   void __iomem *reg = nvidia_smmu_page(smmu, 0, page) + offset;
+
+   return readq_relaxed(reg);
+}
+
+static void nvidia_smmu_write_reg64(struct arm_smmu_device *smmu,
+   int page, int offset, u64 val)
+{
+   unsigned int i;
+
+  

[PATCH v10 0/5] NVIDIA ARM SMMU Implementation

2020-07-07 Thread Krishna Reddy
Changes in v10:
Perform SMMU base ioremap before calling implementation init.
Check for Global faults across both ARM MMU-500s during global interrupt.
Check for context faults across all contexts of both ARM MMU-500s during 
context fault interrupt.
Add new DT binding nvidia,smmu-500 for NVIDIA implementation.

v9 - https://lkml.org/lkml/2020/6/30/1282
v8 - https://lkml.org/lkml/2020/6/29/2385
v7 - https://lkml.org/lkml/2020/6/28/347
v6 - https://lkml.org/lkml/2020/6/4/1018
v5 - https://lkml.org/lkml/2020/5/21/1114
v4 - https://lkml.org/lkml/2019/10/30/1054
v3 - https://lkml.org/lkml/2019/10/18/1601
v2 - https://lkml.org/lkml/2019/9/2/980
v1 - https://lkml.org/lkml/2019/8/29/1588


Krishna Reddy (5):
  iommu/arm-smmu: move TLB timeout and spin count macros
  iommu/arm-smmu: ioremap smmu mmio region before implementation init
  iommu/arm-smmu: add NVIDIA implementation for ARM MMU-500 usage
  dt-bindings: arm-smmu: add binding for Tegra194 SMMU
  iommu/arm-smmu: Add global/context fault implementation hooks

 .../devicetree/bindings/iommu/arm,smmu.yaml   |  18 ++
 MAINTAINERS   |   2 +
 drivers/iommu/Makefile|   2 +-
 drivers/iommu/arm-smmu-impl.c |   3 +
 drivers/iommu/arm-smmu-nvidia.c   | 278 ++
 drivers/iommu/arm-smmu.c  |  29 +-
 drivers/iommu/arm-smmu.h  |   6 +
 7 files changed, 328 insertions(+), 10 deletions(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c


base-commit: e5640f21b63d2a5d3e4e0c4111b2b38e99fe5164
-- 
2.26.2



[PATCH v10 2/5] iommu/arm-smmu: ioremap smmu mmio region before implementation init

2020-07-07 Thread Krishna Reddy
ioremap the SMMU MMIO region before calling into implementation init.
This is necessary to make the mapped address available during
vendor-specific implementation init.

Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index d2054178df35..e03e873d3bca 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -2120,10 +2120,6 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
if (err)
return err;
 
-   smmu = arm_smmu_impl_init(smmu);
-   if (IS_ERR(smmu))
-   return PTR_ERR(smmu);
-
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
ioaddr = res->start;
smmu->base = devm_ioremap_resource(dev, res);
@@ -2135,6 +2131,10 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
 */
smmu->numpage = resource_size(res);
 
+   smmu = arm_smmu_impl_init(smmu);
+   if (IS_ERR(smmu))
+   return PTR_ERR(smmu);
+
num_irqs = 0;
while ((res = platform_get_resource(pdev, IORESOURCE_IRQ, num_irqs))) {
num_irqs++;
-- 
2.26.2



[PATCH v10 4/5] dt-bindings: arm-smmu: add binding for Tegra194 SMMU

2020-07-07 Thread Krishna Reddy
Add binding for NVIDIA's Tegra194 SoC SMMU.

Signed-off-by: Krishna Reddy 
---
 .../devicetree/bindings/iommu/arm,smmu.yaml| 18 ++
 1 file changed, 18 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
index d7ceb4c34423..ac1f526c3424 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
@@ -38,6 +38,11 @@ properties:
               - qcom,sc7180-smmu-500
               - qcom,sdm845-smmu-500
           - const: arm,mmu-500
+      - description: NVIDIA SoCs that program two ARM MMU-500s identically
+        items:
+          - enum:
+              - nvidia,tegra194-smmu
+          - const: nvidia,smmu-500
       - items:
           - const: arm,mmu-500
           - const: arm,smmu-v2
@@ -138,6 +143,19 @@ required:
 
 additionalProperties: false
 
+allOf:
+  - if:
+      properties:
+        compatible:
+          contains:
+            enum:
+              - nvidia,tegra194-smmu
+    then:
+      properties:
+        reg:
+          minItems: 2
+          maxItems: 2
+
 examples:
   - |+
 /* SMMU with stream matching or stream indexing */
-- 
2.26.2



RE: [PATCH v8 2/3] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU

2020-07-01 Thread Krishna Reddy
On 01/07/2020 20:00, Krishna Reddy wrote:
>>>>>> +items:
>>>>>> +  - enum:
>>>>>> +  - nvidia,tegra194-smmu
>>>>>> +  - const: arm,mmu-500
>>>>
>>>>> Is the fallback compatible appropriate here? If software treats this as a 
>>>>> standard MMU-500 it will only program the first instance (because the 
>>>>> second isn't presented as a separate MMU-500) - is there any way that 
>>>>> isn't going to blow up?
>>>>
>>>> When compatible is set to both nvidia,tegra194-smmu and arm,mmu-500, 
>>>> implementation override ensure that both instances are programmed. Isn't 
>>>> it? I am not sure I follow your comment fully.
>> 
>>> The problem is, if for some reason someone had a Tegra194, but only set the 
>>> compatible string to 'arm,mmu-500' it would assume that it was a normal 
>>> arm,mmu-500 and only one instance would be programmed. We always want at 
>>> least 2 of the 3 instances >>programmed and so we should only match 
>>> 'nvidia,tegra194-smmu'. In fact, I think that we also need to update the 
>>> arm_smmu_of_match table to add 'nvidia,tegra194-smmu' with the data set to 
>>> &arm_mmu500.
>> 
>> In that case, new binding "nvidia,smmu-v2" can be added with data set to 
>> &arm_mmu500 and enumeration would have nvidia,tegra194-smmu and another 
>> variant for next generation SoC in future. 

>I think you would be better off with nvidia,smmu-500 as smmu-v2 appears to be 
>something different. I see others have a smmu-v2 but I am not sure if that is 
>legacy. We have an smmu-500 and so that would seem more appropriate.

I tried to keep the binding consistent with other vendors.
v2 is the architecture version; MMU-500 is ARM's actual implementation based on
the v2 architecture. As we just use the MMU-500 IP as-is, it can be named
nvidia,smmu-500 or similar as well.
Other vendors probably have their own implementations based on the v2
architecture.

KR
--
nvpublic


RE: [PATCH v8 3/3] iommu/arm-smmu: Add global/context fault implementation hooks

2020-07-01 Thread Krishna Reddy
>> With shared irq line, the context fault identification is not optimal 
>> already.  Reading all the context banks all the time can be additional mmio 
>> read overhead. But, it may not hurt the real use cases as these happen only 
>> when there are bugs.

>Right, I did ponder the idea of a whole programmatic "request_context_irq" 
>hook that would allow registering the handler for both interrupts with the 
>appropriate context bank and instance data, but since all interrupts are 
>currently unexpected it seems somewhat hard to justify the extra complexity. 
>Obviously we can revisit this in future if you want to start actually doing 
>something with faults like the qcom GPU folks do.

Thanks, I would rather avoid changing the interrupt handlers until it is
really necessary. The current code stays simple and functional, at the cost of
taking more interrupts when there are multiple faults.

-KR


RE: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-07-01 Thread Krishna Reddy
>Yeah, I realised later last night that this probably originated from forking 
>the whole driver downstream. But even then you could have treated the other 
>one as a separate nsmmu with a single instance ;)

True, but the initial NVIDIA implementation had the limitation that it could
only handle one instance. With your implementation-hooks design, it should be
able to handle multiple instances now.

>Since it does add a bit of confusion to the code and comments, let's just keep 
>things simple. I do like Jon's suggestion of actually enforcing that the 
>number of "reg" regions exactly matches the number expected for the given 
>compatible - I guess for now that means just hard-coding 2 and hoping the 
>hardware folks don't cook up any more of these...

For T194, reg can just be forced to 2. There is no plan to use more than two
MMU-500s together as of now.

-KR 


RE: [PATCH v8 2/3] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU

2020-07-01 Thread Krishna Reddy
 +items:
 +  - enum:
 +  - nvidia,tegra194-smmu
 +  - const: arm,mmu-500
> >
>>> Is the fallback compatible appropriate here? If software treats this as a 
>>> standard MMU-500 it will only program the first instance (because the 
>>> second isn't presented as a separate MMU-500) - is there any way that isn't 
>>> going to blow up?
> >
>> When compatible is set to both nvidia,tegra194-smmu and arm,mmu-500, 
>> implementation override ensure that both instances are programmed. Isn't it? 
>> I am not sure I follow your comment fully.

>The problem is, if for some reason someone had a Tegra194, but only set the 
>compatible string to 'arm,mmu-500' it would assume that it was a normal 
>arm,mmu-500 and only one instance would be programmed. We always want at least 
>2 of the 3 instances >programmed and so we should only match 
>'nvidia,tegra194-smmu'. In fact, I think that we also need to update the 
>arm_smmu_of_match table to add 'nvidia,tegra194-smmu' with the data set to 
>&arm_mmu500.

In that case, a new binding "nvidia,smmu-v2" can be added with the data set to
&arm_mmu500, and the enumeration would have nvidia,tegra194-smmu plus another
variant for a next-generation SoC in the future.
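
For reference, the arm_smmu_of_match change being discussed would look roughly
like the sketch below. It is modelled on the existing of_device_id table in
arm-smmu.c (arm_mmu500 already exists there via ARM_SMMU_MATCH_DATA()); treat
it as an illustration, not the exact posted patch.

static const struct of_device_id arm_smmu_of_match[] = {
	/* ... existing entries elided ... */
	{ .compatible = "arm,mmu-500", .data = &arm_mmu500 },
	{ .compatible = "nvidia,tegra194-smmu", .data = &arm_mmu500 },
	{ /* sentinel */ }
};

With such an entry, the SoC-specific compatible is matched directly rather
than relying on the arm,mmu-500 fallback, so the NVIDIA implementation hooks
are always selected and both instances get programmed.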

-KR
--
nvpublic


RE: [PATCH v8 3/3] iommu/arm-smmu: Add global/context fault implementation hooks

2020-07-01 Thread Krishna Reddy
>>> +for (inst = 0; inst < nvidia_smmu->num_inst; inst++) {
>>> +irq_ret = nvidia_smmu_global_fault_inst(irq, smmu, inst);
>>> +if (irq_ret == IRQ_HANDLED)
>>> +return irq_ret;
>>
>> Any chance there could be more than one SMMU faulting by the time we 
>> service the interrupt?

>It certainly seems plausible if the interconnect is automatically 
>load-balancing requests across the SMMU instances - say a driver bug caused a 
>buffer to be unmapped too early, there could be many in-flight accesses to 
>parts of that buffer that aren't all taking the same path and thus could now 
>fault in parallel.
>[ And anyone inclined to nitpick global vs. context faults, s/unmap a 
>buffer/tear down a domain/ ;) ]
>Either way I think it would be easier to reason about if we just handled these 
>like a typical shared interrupt and always checked all the instances.

It would be optimal to check across all instances at the same time.

>>> +for (idx = 0; idx < smmu->num_context_banks; idx++) {
>>> +irq_ret = nvidia_smmu_context_fault_bank(irq, smmu,
>>> + idx, 
>>> + inst);
>>> +
>>> +if (irq_ret == IRQ_HANDLED)
>>> +return irq_ret;
>>
>> Any reason why we don't check all banks?

>As above, we certainly shouldn't bail out without checking the bank for the 
>offending domain across all of its instances, and I guess the way this works 
>means that we would have to iterate all the banks to achieve that.

With a shared IRQ line, context fault identification is already not optimal.
Reading all the context banks every time adds MMIO read overhead, but it
should not hurt real use cases, as these faults only happen when there are
bugs.

-KR


RE: [PATCH v8 2/3] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU

2020-07-01 Thread Krishna Reddy
>> +  - description: NVIDIA SoCs that use more than one "arm,mmu-500"
> Hmm, there must be a better way to word that to express that it only applies 
> to the sets of SMMUs that must be programmed identically, and not any other 
> independent MMU-500s that might also happen to be in the same SoC.

Let me reword it to "NVIDIA SoCs that must program multiple MMU-500s 
identically".

>> +items:
>> +  - enum:
>> +  - nvidia,tegra194-smmu
>> +  - const: arm,mmu-500

>Is the fallback compatible appropriate here? If software treats this as a 
>standard MMU-500 it will only program the first instance (because the second 
>isn't presented as a separate MMU-500) - is there any way that isn't going to 
>blow up?

When compatible is set to both nvidia,tegra194-smmu and arm,mmu-500, the
implementation override ensures that both instances are programmed, doesn't
it? I am not sure I follow your comment fully.

-KR



RE: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-07-01 Thread Krishna Reddy
>> + * When Linux kernel supports multiple SMMU devices, the SMMU device used for
>> + * isochronous HW devices should be added as a separate ARM MMU-500 device
>> + * in DT and be programmed independently for efficient TLB invalidates.

>I don't understand the "When" there - the driver has always supported multiple 
>independent SMMUs, and it's not something that could be configured out or 
>otherwise disabled. Plus I really don't see why you would ever want to force 
>unrelated SMMUs to be >programmed together - beyond the TLB thing mentioned it 
>would also waste precious context bank resources and might lead to weird 
>device grouping via false stream ID aliasing, with no obvious upside at all.

Sorry, I missed this comment.
During the initial patches, when the iommu_ops differed between drivers,
supporting multiple SMMU drivers at the same time was not possible, as the one
that got probed last overwrote the platform bus ops.
On revisiting the original issue, this problem is no longer relevant. At this
point, it makes more sense to just drop the 3rd-instance programming from
arm-smmu-nvidia.c and limit it to the SMMU instances that need identical
programming.

-KR





[PATCH v9 4/4] iommu/arm-smmu: add global/context fault implementation hooks

2020-06-30 Thread Krishna Reddy
Add global/context fault hooks to allow the NVIDIA SMMU implementation
to handle faults across multiple SMMUs.

Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu-nvidia.c | 98 +
 drivers/iommu/arm-smmu.c| 17 +-
 drivers/iommu/arm-smmu.h|  3 +
 3 files changed, 116 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
index 5c874912e1c1a..d279788eab954 100644
--- a/drivers/iommu/arm-smmu-nvidia.c
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -144,6 +144,102 @@ static int nvidia_smmu_reset(struct arm_smmu_device *smmu)
return 0;
 }
 
+static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
+{
+   return container_of(dom, struct arm_smmu_domain, domain);
+}
+
+static irqreturn_t nvidia_smmu_global_fault_inst(int irq,
+struct arm_smmu_device *smmu,
+int inst)
+{
+   u32 gfsr, gfsynr0, gfsynr1, gfsynr2;
+   void __iomem *gr0_base = nvidia_smmu_page(smmu, inst, 0);
+
+   gfsr = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSR);
+   if (!gfsr)
+   return IRQ_NONE;
+
+   gfsynr0 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR0);
+   gfsynr1 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR1);
+   gfsynr2 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR2);
+
+   dev_err_ratelimited(smmu->dev,
+   "Unexpected global fault, this could be serious\n");
+   dev_err_ratelimited(smmu->dev,
+   "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 0x%08x\n",
+   gfsr, gfsynr0, gfsynr1, gfsynr2);
+
+   writel_relaxed(gfsr, gr0_base + ARM_SMMU_GR0_sGFSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nvidia_smmu_global_fault(int irq, void *dev)
+{
+   int inst;
+   irqreturn_t irq_ret = IRQ_NONE;
+   struct arm_smmu_device *smmu = dev;
+   struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);
+
+   for (inst = 0; inst < nvidia_smmu->num_inst; inst++) {
+   irq_ret = nvidia_smmu_global_fault_inst(irq, smmu, inst);
+   if (irq_ret == IRQ_HANDLED)
+   return irq_ret;
+   }
+
+   return irq_ret;
+}
+
+static irqreturn_t nvidia_smmu_context_fault_bank(int irq,
+ struct arm_smmu_device *smmu,
+ int idx, int inst)
+{
+   u32 fsr, fsynr, cbfrsynra;
+   unsigned long iova;
+   void __iomem *gr1_base = nvidia_smmu_page(smmu, inst, 1);
+   void __iomem *cb_base = nvidia_smmu_page(smmu, inst, smmu->numpage + idx);
+
+   fsr = readl_relaxed(cb_base + ARM_SMMU_CB_FSR);
+   if (!(fsr & ARM_SMMU_FSR_FAULT))
+   return IRQ_NONE;
+
+   fsynr = readl_relaxed(cb_base + ARM_SMMU_CB_FSYNR0);
+   iova = readq_relaxed(cb_base + ARM_SMMU_CB_FAR);
+   cbfrsynra = readl_relaxed(gr1_base + ARM_SMMU_GR1_CBFRSYNRA(idx));
+
+   dev_err_ratelimited(smmu->dev,
+   "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
+   fsr, iova, fsynr, cbfrsynra, idx);
+
+   writel_relaxed(fsr, cb_base + ARM_SMMU_CB_FSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nvidia_smmu_context_fault(int irq, void *dev)
+{
+   int inst, idx;
+   irqreturn_t irq_ret = IRQ_NONE;
+   struct iommu_domain *domain = dev;
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+   for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) {
+   /*
+* Interrupt line is shared between all contexts.
+* Check for faults across all contexts.
+*/
+   for (idx = 0; idx < smmu->num_context_banks; idx++) {
+   irq_ret = nvidia_smmu_context_fault_bank(irq, smmu,
+idx, inst);
+
+   if (irq_ret == IRQ_HANDLED)
+   return irq_ret;
+   }
+   }
+
+   return irq_ret;
+}
+
 static const struct arm_smmu_impl nvidia_smmu_impl = {
.read_reg = nvidia_smmu_read_reg,
.write_reg = nvidia_smmu_write_reg,
@@ -151,6 +247,8 @@ static const struct arm_smmu_impl nvidia_smmu_impl = {
.write_reg64 = nvidia_smmu_write_reg64,
.reset = nvidia_smmu_reset,
.tlb_sync = nvidia_smmu_tlb_sync,
+   .global_fault = nvidia_smmu_global_fault,
+   .context_fault = nvidia_smmu_context_fault,
 };
 
 struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu)
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index d2054

[PATCH v9 2/4] iommu/arm-smmu: add NVIDIA implementation for ARM MMU-500 usage

2020-06-30 Thread Krishna Reddy
NVIDIA's Tegra194 SoC has three ARM MMU-500 instances.
It uses two of the ARM MMU-500s together to interleave IOVA accesses
across them, and those two must be programmed identically.
The third SMMU instance is used as a regular ARM MMU-500; it can be
programmed either independently or identically to the other two
ARM MMU-500s.

This implementation supports programming two or three ARM MMU-500s
identically, as configured in DT.

Signed-off-by: Krishna Reddy 
---
 MAINTAINERS |   2 +
 drivers/iommu/Makefile  |   2 +-
 drivers/iommu/arm-smmu-impl.c   |   3 +
 drivers/iommu/arm-smmu-nvidia.c | 206 
 drivers/iommu/arm-smmu.h|   1 +
 5 files changed, 213 insertions(+), 1 deletion(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 7b5ffd646c6b9..64c37dbdd4426 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16808,8 +16808,10 @@ F: drivers/i2c/busses/i2c-tegra.c
 
 TEGRA IOMMU DRIVERS
 M: Thierry Reding 
+R: Krishna Reddy 
 L: linux-te...@vger.kernel.org
 S: Supported
+F: drivers/iommu/arm-smmu-nvidia.c
 F: drivers/iommu/tegra*
 
 TEGRA KBC DRIVER
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 342190196dfb0..2b8203db73ec3 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -15,7 +15,7 @@ obj-$(CONFIG_AMD_IOMMU) += amd/iommu.o amd/init.o amd/quirks.o
 obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd/debugfs.o
 obj-$(CONFIG_AMD_IOMMU_V2) += amd/iommu_v2.o
 obj-$(CONFIG_ARM_SMMU) += arm_smmu.o
-arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o
+arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-nvidia.o arm-smmu-qcom.o
 obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
 obj-$(CONFIG_DMAR_TABLE) += intel/dmar.o
 obj-$(CONFIG_INTEL_IOMMU) += intel/iommu.o intel/pasid.o
diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index c75b9d957b702..f15571d05474e 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -171,6 +171,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu)
if (of_property_read_bool(np, "calxeda,smmu-secure-config-access"))
smmu->impl = &calxeda_impl;
 
+   if (of_device_is_compatible(np, "nvidia,tegra194-smmu"))
+   return nvidia_smmu_impl_init(smmu);
+
if (of_device_is_compatible(np, "qcom,sdm845-smmu-500") ||
of_device_is_compatible(np, "qcom,sc7180-smmu-500"))
return qcom_smmu_impl_init(smmu);
diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
new file mode 100644
index 0..5c874912e1c1a
--- /dev/null
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -0,0 +1,206 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// NVIDIA ARM SMMU v2 implementation quirks
+// Copyright (C) 2019-2020 NVIDIA CORPORATION.  All rights reserved.
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "arm-smmu.h"
+
+/*
+ * Tegra194 has three ARM MMU-500 Instances.
+ * Two of them are used together for interleaved IOVA accesses and
+ * used by non-isochronous HW devices for SMMU translations.
+ * Third one is used for SMMU translations from isochronous HW devices.
+ * It is possible to use this implementation to program either
+ * all three or two of the instances identically as desired through
+ * DT node.
+ *
+ * Programming all the three instances identically comes with redundant TLB
+ * invalidations as all three never need to be TLB invalidated for a HW device.
+ *
+ * When Linux kernel supports multiple SMMU devices, the SMMU device used for
+ * isochronous HW devices should be added as a separate ARM MMU-500 device
+ * in DT and be programmed independently for efficient TLB invalidates.
+ */
+#define MAX_SMMU_INSTANCES 3
+
+struct nvidia_smmu {
+   struct arm_smmu_device  smmu;
+   unsigned intnum_inst;
+   void __iomem*bases[MAX_SMMU_INSTANCES];
+};
+
+static inline struct nvidia_smmu *to_nvidia_smmu(struct arm_smmu_device *smmu)
+{
+   return container_of(smmu, struct nvidia_smmu, smmu);
+}
+
+static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu,
+unsigned int inst, int page)
+{
+   struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);
+
+   if (!nvidia_smmu->bases[0])
+   nvidia_smmu->bases[0] = smmu->base;
+
+   return nvidia_smmu->bases[inst] + (page << smmu->pgshift);
+}
+
+static u32 nvidia_smmu_read_reg(struct arm_smmu_device *smmu,
+   int page, int offset)
+{
+   void __iomem *reg = nvidia_smmu_page(smmu, 0, page) + offset;
+
+   return readl_relaxed(reg);
+}
+
+static void nvidia_smmu_write_reg(struct arm_smmu_device *smmu,
+ int page, int offset, u32 val)
+{
+   unsigned int i;
+   struc

[PATCH v9 3/4] dt-bindings: arm-smmu: add binding for Tegra194 SMMU

2020-06-30 Thread Krishna Reddy
Add binding for NVIDIA's Tegra194 SoC SMMU topology that is based
on ARM MMU-500.

Signed-off-by: Krishna Reddy 
---
 .../devicetree/bindings/iommu/arm,smmu.yaml| 18 ++
 1 file changed, 18 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
index d7ceb4c34423b..662c46e16f07d 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
@@ -38,6 +38,11 @@ properties:
               - qcom,sc7180-smmu-500
               - qcom,sdm845-smmu-500
           - const: arm,mmu-500
+      - description: NVIDIA SoCs that use more than one "arm,mmu-500"
+        items:
+          - enum:
+              - nvidia,tegra194-smmu
+          - const: arm,mmu-500
       - items:
           - const: arm,mmu-500
           - const: arm,smmu-v2
@@ -138,6 +143,19 @@ required:
 
 additionalProperties: false
 
+allOf:
+  - if:
+      properties:
+        compatible:
+          contains:
+            enum:
+              - nvidia,tegra194-smmu
+    then:
+      properties:
+        reg:
+          minItems: 2
+          maxItems: 3
+
 examples:
   - |+
 /* SMMU with stream matching or stream indexing */
-- 
2.26.2



[PATCH v9 1/4] iommu/arm-smmu: move TLB timeout and spin count macros

2020-06-30 Thread Krishna Reddy
Move the TLB timeout and spin count macros to the header file so that
vendor-specific implementations can use the same values.

Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu.c | 3 ---
 drivers/iommu/arm-smmu.h | 2 ++
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 243bc4cb2705b..d2054178df357 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -52,9 +52,6 @@
  */
 #define QCOM_DUMMY_VAL -1
 
-#define TLB_LOOP_TIMEOUT   1000000 /* 1s! */
-#define TLB_SPIN_COUNT 10
-
 #define MSI_IOVA_BASE  0x8000000
 #define MSI_IOVA_LENGTH 0x100000
 
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index d172c024be618..c7d0122a7c6ca 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -236,6 +236,8 @@ enum arm_smmu_cbar_type {
 /* Maximum number of context banks per SMMU */
 #define ARM_SMMU_MAX_CBS   128
 
+#define TLB_LOOP_TIMEOUT   1000000 /* 1s! */
+#define TLB_SPIN_COUNT 10
 
 /* Shared driver definitions */
 enum arm_smmu_arch_version {
-- 
2.26.2



[PATCH v9 0/4] NVIDIA ARM SMMUv2 Implementation

2020-06-30 Thread Krishna Reddy
Changes in v9:
Move TLB Timeout and spin count macros to arm-smmu.h header to share with 
implementation.
Set minItems and maxItems for reg property when compatible contains 
nvidia,tegra194-smmu.
Update commit message for NVIDIA implementation patch.
Fail single SMMU instance usage through NVIDIA implementation to limit the 
usage to two or three instances.
Fix checkpatch warnings with --strict checking.

v8 - https://lkml.org/lkml/2020/6/29/2385
v7 - https://lkml.org/lkml/2020/6/28/347
v6 - https://lkml.org/lkml/2020/6/4/1018
v5 - https://lkml.org/lkml/2020/5/21/1114
v4 - https://lkml.org/lkml/2019/10/30/1054
v3 - https://lkml.org/lkml/2019/10/18/1601
v2 - https://lkml.org/lkml/2019/9/2/980
v1 - https://lkml.org/lkml/2019/8/29/1588

Krishna Reddy (4):
  iommu/arm-smmu: move TLB timeout and spin count macros
  iommu/arm-smmu: add NVIDIA implementation for ARM MMU-500 usage
  dt-bindings: arm-smmu: add binding for Tegra194 SMMU
  iommu/arm-smmu: Add global/context fault implementation hooks

 .../devicetree/bindings/iommu/arm,smmu.yaml   |  18 ++
 MAINTAINERS   |   2 +
 drivers/iommu/Makefile|   2 +-
 drivers/iommu/arm-smmu-impl.c |   3 +
 drivers/iommu/arm-smmu-nvidia.c   | 304 ++
 drivers/iommu/arm-smmu.c  |  20 +-
 drivers/iommu/arm-smmu.h  |   6 +
 7 files changed, 349 insertions(+), 6 deletions(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c


base-commit: 48f0bcfb7aad2c6eb4c1e66476b58475aa14393e
-- 
2.26.2



RE: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-30 Thread Krishna Reddy
>> The driver intend to support up to 3 instances. It doesn't really mandate 
>> that all three instances be present in same DT node.
>> Each mmio aperture in "reg" property is an instance here. reg = 
>> , , ; The reg can have 
>> all three or less and driver just configures based on reg and it works fine.

>So it sounds like we need at least 2 SMMUs (for non-iso and iso) but we have 
>up to 3 (for Tegra194). So the question is do we have a use-case where we only 
>use 2 and not 3? If not, then it still seems that we should require that all 3 
>are present.

It can be either 2 SMMUs (for non-iso) or 3 SMMUs (for non-iso and iso). Let
me make the one-instance case fail, as it can use the regular arm-smmu
implementation and doesn't need the NVIDIA implementation explicitly.
 
>The other problem I see here is that currently the arm-smmu binding defines 
>the 'reg' with a 'maxItems' of 1, whereas we have 3. I believe that this will 
>get caught by the 'dt_binding_check' when we try to populate the binding.

Thanks for pointing it out! Will update the binding doc.

-KR



RE: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-30 Thread Krishna Reddy
>OK, well I see what you are saying, but if we intended to support all 3 for 
>Tegra194, then we should ensure all 3 are initialised correctly.

The driver intends to support up to 3 instances. It doesn't really mandate that 
all three instances be present in the same DT node.
Each mmio aperture in the "reg" property is an instance here. reg = , , ;
The reg can have all three or fewer, and the driver just configures based on reg 
and it works fine.

>It would be better to query the number of SMMUs populated in device-tree and 
>then ensure that all are initialised correctly.

Getting the IORESOURCE_MEM entries is the way to count the instances the driver 
needs to support.
In a way, it is already querying through IORESOURCE_MEM here.
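For reference, the counting in nvidia_smmu_impl_init() already works roughly like
this (simplified from this patch; pdev is the SMMU platform device obtained via
to_platform_device(smmu->dev)):

        struct platform_device *pdev = to_platform_device(smmu->dev);
        unsigned int i;

        for (i = 1; i < MAX_SMMU_INSTANCES; i++) {
                struct resource *res;

                /* Each additional IORESOURCE_MEM aperture is one more MMU-500. */
                res = platform_get_resource(pdev, IORESOURCE_MEM, i);
                if (!res)
                        break;

                nvidia_smmu->bases[i] = devm_ioremap_resource(smmu->dev, res);
                if (IS_ERR(nvidia_smmu->bases[i]))
                        return ERR_CAST(nvidia_smmu->bases[i]);

                nvidia_smmu->num_inst++;
        }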


-KR



RE: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-30 Thread Krishna Reddy
>> NVIDIA's Tegra194 SoC uses two ARM MMU-500s together to interleave 
>> IOVA accesses across them.
>> Add NVIDIA implementation for dual ARM MMU-500s and add new compatible 
>> string for Tegra194 SoC SMMU topology.

>There is no description here of the 3rd SMMU that you mention below.
>I think that we should describe the full picture here.

This driver is primarily for the dual-SMMU config, so the third instance is not 
described in the commit message.
However, the implementation supports an option to configure 3 instances 
identically through one SMMU DT node, and this is documented in the 
implementation.

>> +
>> +static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu,
>> +   unsigned int inst, int page)

>If you run checkpatch --strict on these you will get a lot of ...

>CHECK: Alignment should match open parenthesis
>#116: FILE: drivers/iommu/arm-smmu-nvidia.c:46:
>+static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu,
>+ unsigned int inst, int page)

>We should fix these.

I will fix these if I need to push a new patch set.

>> +static void nvidia_smmu_write_reg(struct arm_smmu_device *smmu,
>> +int page, int offset, u32 val) {
>> +unsigned int i;
>> +struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);
>> +
>> +for (i = 0; i < nvidia_smmu->num_inst; i++) {
>> +void __iomem *reg = nvidia_smmu_page(smmu, i, page) + offset;

>Personally, I would declare 'reg' outside of the loop as I feel it will make 
>the code cleaner and easier to read.

It was like that before and was updated to its current form to limit the scope 
of the variables, as per Thierry's comments on v6.
We can just leave it as it is, as there is no technical issue here.

-KR



RE: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-30 Thread Krishna Reddy
>> +struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device 
>> +*smmu) {
>> +unsigned int i;

>> +for (i = 1; i < MAX_SMMU_INSTANCES; i++) {
>> +struct resource *res;
>> +
>> +res = platform_get_resource(pdev, IORESOURCE_MEM, i);
>> +if (!res)
>> +break;

>Currently this driver is only supported for Tegra194 which I understand has 3 
>SMMUs. Therefore, I don't feel that we should fail silently here, I think it 
>is better to return an error if all 3 cannot be initialised.

Initialization of all three SMMU instances is not necessary here.
The driver can work with any of the possible numbers of instances (1, 2 or 3) 
based on the DT config, though it doesn't make much sense to use it with 1 instance.
There is no silent failure here from the driver's point of view. If there is a 
misconfiguration in the DT, SMMU faults would catch the issues.

>> +nvidia_smmu->bases[i] = devm_ioremap_resource(smmu->dev, res);
>> +if (IS_ERR(nvidia_smmu->bases[i]))
>> +return ERR_CAST(nvidia_smmu->bases[i]);

>You want to use PTR_ERR() here.

PTR_ERR() returns a long integer, but this function returns a pointer.
ERR_CAST() is the right one to use here.
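To illustrate the difference (illustrative snippet only, not from the patch):

        void __iomem *base = devm_ioremap_resource(dev, res);

        if (IS_ERR(base))
                return ERR_CAST(base);  /* error still encoded, pointer type changed */

        /*
         * PTR_ERR(base) would instead yield the negative errno as a long,
         * which only suits callers that return an int or a long.
         */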




[PATCH v8 3/3] iommu/arm-smmu: Add global/context fault implementation hooks

2020-06-29 Thread Krishna Reddy
Add global/context fault hooks to allow NVIDIA SMMU implementation
handle faults across multiple SMMUs.

Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu-nvidia.c | 98 +
 drivers/iommu/arm-smmu.c| 17 +-
 drivers/iommu/arm-smmu.h|  3 +
 3 files changed, 116 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
index 1124f0ac1823a..c9423b4199c65 100644
--- a/drivers/iommu/arm-smmu-nvidia.c
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -147,6 +147,102 @@ static int nvidia_smmu_reset(struct arm_smmu_device *smmu)
return 0;
 }
 
+static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
+{
+   return container_of(dom, struct arm_smmu_domain, domain);
+}
+
+static irqreturn_t nvidia_smmu_global_fault_inst(int irq,
+  struct arm_smmu_device *smmu,
+  int inst)
+{
+   u32 gfsr, gfsynr0, gfsynr1, gfsynr2;
+   void __iomem *gr0_base = nvidia_smmu_page(smmu, inst, 0);
+
+   gfsr = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSR);
+   if (!gfsr)
+   return IRQ_NONE;
+
+   gfsynr0 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR0);
+   gfsynr1 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR1);
+   gfsynr2 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR2);
+
+   dev_err_ratelimited(smmu->dev,
+   "Unexpected global fault, this could be serious\n");
+   dev_err_ratelimited(smmu->dev,
+   "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 0x%08x\n",
+   gfsr, gfsynr0, gfsynr1, gfsynr2);
+
+   writel_relaxed(gfsr, gr0_base + ARM_SMMU_GR0_sGFSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nvidia_smmu_global_fault(int irq, void *dev)
+{
+   int inst;
+   irqreturn_t irq_ret = IRQ_NONE;
+   struct arm_smmu_device *smmu = dev;
+   struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);
+
+   for (inst = 0; inst < nvidia_smmu->num_inst; inst++) {
+   irq_ret = nvidia_smmu_global_fault_inst(irq, smmu, inst);
+   if (irq_ret == IRQ_HANDLED)
+   return irq_ret;
+   }
+
+   return irq_ret;
+}
+
+static irqreturn_t nvidia_smmu_context_fault_bank(int irq,
+   struct arm_smmu_device *smmu,
+   int idx, int inst)
+{
+   u32 fsr, fsynr, cbfrsynra;
+   unsigned long iova;
+   void __iomem *gr1_base = nvidia_smmu_page(smmu, inst, 1);
+   void __iomem *cb_base = nvidia_smmu_page(smmu, inst, smmu->numpage + idx);
+
+   fsr = readl_relaxed(cb_base + ARM_SMMU_CB_FSR);
+   if (!(fsr & ARM_SMMU_FSR_FAULT))
+   return IRQ_NONE;
+
+   fsynr = readl_relaxed(cb_base + ARM_SMMU_CB_FSYNR0);
+   iova = readq_relaxed(cb_base + ARM_SMMU_CB_FAR);
+   cbfrsynra = readl_relaxed(gr1_base + ARM_SMMU_GR1_CBFRSYNRA(idx));
+
+   dev_err_ratelimited(smmu->dev,
+   "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
+   fsr, iova, fsynr, cbfrsynra, idx);
+
+   writel_relaxed(fsr, cb_base + ARM_SMMU_CB_FSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nvidia_smmu_context_fault(int irq, void *dev)
+{
+   int inst, idx;
+   irqreturn_t irq_ret = IRQ_NONE;
+   struct iommu_domain *domain = dev;
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+   for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) {
+   /*
+* Interrupt line shared between all context faults.
+* Check for faults across all contexts.
+*/
+   for (idx = 0; idx < smmu->num_context_banks; idx++) {
+   irq_ret = nvidia_smmu_context_fault_bank(irq, smmu,
+idx, inst);
+
+   if (irq_ret == IRQ_HANDLED)
+   return irq_ret;
+   }
+   }
+
+   return irq_ret;
+}
+
 static const struct arm_smmu_impl nvidia_smmu_impl = {
.read_reg = nvidia_smmu_read_reg,
.write_reg = nvidia_smmu_write_reg,
@@ -154,6 +250,8 @@ static const struct arm_smmu_impl nvidia_smmu_impl = {
.write_reg64 = nvidia_smmu_write_reg64,
.reset = nvidia_smmu_reset,
.tlb_sync = nvidia_smmu_tlb_sync,
+   .global_fault = nvidia_smmu_global_fault,
+   .context_fault = nvidia_smmu_context_fault,
 };
 
 struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu)
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 243bc4cb2705b..3bb0aba15a356 100644
--- a/drivers/iommu/arm-s

[PATCH v8 2/3] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU

2020-06-29 Thread Krishna Reddy
Add binding for NVIDIA's Tegra194 SoC SMMU topology that is based
on ARM MMU-500.

Signed-off-by: Krishna Reddy 
---
 Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 5 +
 1 file changed, 5 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml 
b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
index d7ceb4c34423b..5b2586ac715ed 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
@@ -38,6 +38,11 @@ properties:
               - qcom,sc7180-smmu-500
               - qcom,sdm845-smmu-500
           - const: arm,mmu-500
+      - description: NVIDIA SoCs that use more than one "arm,mmu-500"
+        items:
+          - enum:
+              - nvidia,tegra194-smmu
+          - const: arm,mmu-500
       - items:
           - const: arm,mmu-500
           - const: arm,smmu-v2
-- 
2.26.2



[PATCH v8 0/3] Nvidia Arm SMMUv2 Implementation

2020-06-29 Thread Krishna Reddy
Changes in v8:
Fixed incorrect CB_FSR read issue during context bank fault.
Rebased and validated patches on top of 
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git next

v7 - https://lkml.org/lkml/2020/6/28/347
v6 - https://lkml.org/lkml/2020/6/4/1018
v5 - https://lkml.org/lkml/2020/5/21/1114
v4 - https://lkml.org/lkml/2019/10/30/1054
v3 - https://lkml.org/lkml/2019/10/18/1601
v2 - https://lkml.org/lkml/2019/9/2/980
v1 - https://lkml.org/lkml/2019/8/29/1588

Krishna Reddy (3):
  iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
  dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
  iommu/arm-smmu: Add global/context fault implementation hooks

 .../devicetree/bindings/iommu/arm,smmu.yaml   |   5 +
 MAINTAINERS   |   2 +
 drivers/iommu/Makefile|   2 +-
 drivers/iommu/arm-smmu-impl.c |   3 +
 drivers/iommu/arm-smmu-nvidia.c   | 294 ++
 drivers/iommu/arm-smmu.c  |  17 +-
 drivers/iommu/arm-smmu.h  |   4 +
 7 files changed, 324 insertions(+), 3 deletions(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c


base-commit: 48f0bcfb7aad2c6eb4c1e66476b58475aa14393e
-- 
2.26.2



[PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-29 Thread Krishna Reddy
NVIDIA's Tegra194 SoC uses two ARM MMU-500s together to interleave
IOVA accesses across them.
Add NVIDIA implementation for dual ARM MMU-500s and add new compatible
string for Tegra194 SoC SMMU topology.

Signed-off-by: Krishna Reddy 
---
 MAINTAINERS |   2 +
 drivers/iommu/Makefile  |   2 +-
 drivers/iommu/arm-smmu-impl.c   |   3 +
 drivers/iommu/arm-smmu-nvidia.c | 196 
 drivers/iommu/arm-smmu.h|   1 +
 5 files changed, 203 insertions(+), 1 deletion(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 7b5ffd646c6b9..64c37dbdd4426 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16808,8 +16808,10 @@ F: drivers/i2c/busses/i2c-tegra.c
 
 TEGRA IOMMU DRIVERS
 M: Thierry Reding 
+R: Krishna Reddy 
 L: linux-te...@vger.kernel.org
 S: Supported
+F: drivers/iommu/arm-smmu-nvidia.c
 F: drivers/iommu/tegra*
 
 TEGRA KBC DRIVER
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 342190196dfb0..2b8203db73ec3 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -15,7 +15,7 @@ obj-$(CONFIG_AMD_IOMMU) += amd/iommu.o amd/init.o amd/quirks.o
 obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd/debugfs.o
 obj-$(CONFIG_AMD_IOMMU_V2) += amd/iommu_v2.o
 obj-$(CONFIG_ARM_SMMU) += arm_smmu.o
-arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o
+arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-nvidia.o arm-smmu-qcom.o
 obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
 obj-$(CONFIG_DMAR_TABLE) += intel/dmar.o
 obj-$(CONFIG_INTEL_IOMMU) += intel/iommu.o intel/pasid.o
diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index c75b9d957b702..70f7318017617 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -171,6 +171,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct 
arm_smmu_device *smmu)
if (of_property_read_bool(np, "calxeda,smmu-secure-config-access"))
smmu->impl = &calxeda_impl;
 
+   if (of_device_is_compatible(smmu->dev->of_node, "nvidia,tegra194-smmu"))
+   return nvidia_smmu_impl_init(smmu);
+
if (of_device_is_compatible(np, "qcom,sdm845-smmu-500") ||
of_device_is_compatible(np, "qcom,sc7180-smmu-500"))
return qcom_smmu_impl_init(smmu);
diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
new file mode 100644
index 0..1124f0ac1823a
--- /dev/null
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -0,0 +1,196 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// NVIDIA ARM SMMU v2 implementation quirks
+// Copyright (C) 2019-2020 NVIDIA CORPORATION.  All rights reserved.
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "arm-smmu.h"
+
+/*
+ * Tegra194 has three ARM MMU-500 Instances.
+ * Two of them are used together for interleaved IOVA accesses and
+ * used by non-isochronous HW devices for SMMU translations.
+ * Third one is used for SMMU translations from isochronous HW devices.
+ * It is possible to use this implementation to program either
+ * all three or two of the instances identically as desired through
+ * DT node.
+ *
+ * Programming all the three instances identically comes with redundant TLB
+ * invalidations as all three never need to be TLB invalidated for a HW device.
+ *
+ * When Linux kernel supports multiple SMMU devices, the SMMU device used for
+ * isochronous HW devices should be added as a separate ARM MMU-500 device
+ * in DT and be programmed independently for efficient TLB invalidates.
+ */
+#define MAX_SMMU_INSTANCES 3
+
+#define TLB_LOOP_TIMEOUT_IN_US 1000000 /* 1s! */
+#define TLB_SPIN_COUNT 10
+
+struct nvidia_smmu {
+   struct arm_smmu_device  smmu;
+   unsigned intnum_inst;
+   void __iomem*bases[MAX_SMMU_INSTANCES];
+};
+
+static inline struct nvidia_smmu *to_nvidia_smmu(struct arm_smmu_device *smmu)
+{
+   return container_of(smmu, struct nvidia_smmu, smmu);
+}
+
+static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu,
+  unsigned int inst, int page)
+{
+   struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);
+
+   if (!nvidia_smmu->bases[0])
+   nvidia_smmu->bases[0] = smmu->base;
+
+   return nvidia_smmu->bases[inst] + (page << smmu->pgshift);
+}
+
+static u32 nvidia_smmu_read_reg(struct arm_smmu_device *smmu,
+ int page, int offset)
+{
+   void __iomem *reg = nvidia_smmu_page(smmu, 0, page) + offset;
+
+   return readl_relaxed(reg);
+}
+
+static void nvidia_smmu_write_reg(struct arm_smmu_device *smmu,
+   int page, int offset, u32 val)
+{
+   unsigned int i;
+   struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);
+
+   for (i = 0; i < nvidia_smmu->num_ins

RE: [PATCH v7 3/3] iommu/arm-smmu: Add global/context fault implementation hooks

2020-06-29 Thread Krishna Reddy
>> +static irqreturn_t nvidia_smmu_context_fault_bank(int irq,

>> + void __iomem *cb_base = nvidia_smmu_page(smmu, inst, 
>> + smmu->numpage + idx);
[...]
>> + fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR);
[...]
>> + writel_relaxed(fsr, cb_base + ARM_SMMU_CB_FSR);

>It reads FSR of the default inst (1st), but clears the FSR of corresponding 
>inst -- just want to make sure that this is okay and intended.

The FSR should be read from the corresponding instance, not from instance 0.
Let me post an updated patch.
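The updated patch will read (and later clear) the FSR through the per-instance
context-bank mapping, roughly like this (sketch of the relevant part of
nvidia_smmu_context_fault_bank()):

        void __iomem *cb_base = nvidia_smmu_page(smmu, inst, smmu->numpage + idx);
        u32 fsr = readl_relaxed(cb_base + ARM_SMMU_CB_FSR);    /* per-instance FSR */

        if (!(fsr & ARM_SMMU_FSR_FAULT))
                return IRQ_NONE;

        /* ... and the fault is cleared through the same per-instance mapping: */
        writel_relaxed(fsr, cb_base + ARM_SMMU_CB_FSR);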

-KR


RE: [PATCH v7 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-29 Thread Krishna Reddy
>> + if (!nvidia_smmu->bases[0])
>> + nvidia_smmu->bases[0] = smmu->base;
>> +
>> + return nvidia_smmu->bases[inst] + (page << smmu->pgshift); }

>Not critical -- just a nit: why not put the bases[0] in init()?

smmu->base is not available during the nvidia_smmu_impl_init() call; it is set 
afterwards in arm-smmu.c.
It can't be avoided without changing the devm_ioremap() and impl_init() call 
order in arm-smmu.c.

-KR


[PATCH v7 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-28 Thread Krishna Reddy
NVIDIA's Tegra194 SoC uses two ARM MMU-500s together to interleave
IOVA accesses across them.
Add NVIDIA implementation for dual ARM MMU-500s and add new compatible
string for Tegra194 SoC SMMU topology.

Signed-off-by: Krishna Reddy 
---
 MAINTAINERS |   2 +
 drivers/iommu/Makefile  |   2 +-
 drivers/iommu/arm-smmu-impl.c   |   3 +
 drivers/iommu/arm-smmu-nvidia.c | 195 
 drivers/iommu/arm-smmu.h|   1 +
 5 files changed, 202 insertions(+), 1 deletion(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 7b5ffd646c6b9..64c37dbdd4426 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16808,8 +16808,10 @@ F: drivers/i2c/busses/i2c-tegra.c
 
 TEGRA IOMMU DRIVERS
 M: Thierry Reding 
+R: Krishna Reddy 
 L: linux-te...@vger.kernel.org
 S: Supported
+F: drivers/iommu/arm-smmu-nvidia.c
 F: drivers/iommu/tegra*
 
 TEGRA KBC DRIVER
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 342190196dfb0..2b8203db73ec3 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -15,7 +15,7 @@ obj-$(CONFIG_AMD_IOMMU) += amd/iommu.o amd/init.o amd/quirks.o
 obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd/debugfs.o
 obj-$(CONFIG_AMD_IOMMU_V2) += amd/iommu_v2.o
 obj-$(CONFIG_ARM_SMMU) += arm_smmu.o
-arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o
+arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-nvidia.o arm-smmu-qcom.o
 obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
 obj-$(CONFIG_DMAR_TABLE) += intel/dmar.o
 obj-$(CONFIG_INTEL_IOMMU) += intel/iommu.o intel/pasid.o
diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index c75b9d957b702..70f7318017617 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -171,6 +171,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct 
arm_smmu_device *smmu)
if (of_property_read_bool(np, "calxeda,smmu-secure-config-access"))
smmu->impl = &calxeda_impl;
 
+   if (of_device_is_compatible(smmu->dev->of_node, "nvidia,tegra194-smmu"))
+   return nvidia_smmu_impl_init(smmu);
+
if (of_device_is_compatible(np, "qcom,sdm845-smmu-500") ||
of_device_is_compatible(np, "qcom,sc7180-smmu-500"))
return qcom_smmu_impl_init(smmu);
diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
new file mode 100644
index 0..b73c483fa3376
--- /dev/null
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -0,0 +1,195 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// NVIDIA ARM SMMU v2 implementation quirks
+// Copyright (C) 2019-2020 NVIDIA CORPORATION.  All rights reserved.
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "arm-smmu.h"
+
+/*
+ * Tegra194 has three ARM MMU-500 Instances.
+ * Two of them are used together for interleaved IOVA accesses and
+ * used by non-isochronous HW devices for SMMU translations.
+ * Third one is used for SMMU translations from isochronous HW devices.
+ * It is possible to use this implementation to program either
+ * all three or two of the instances identically as desired through
+ * DT node.
+ *
+ * Programming all the three instances identically comes with redundant TLB
+ * invalidations as all three never need to be TLB invalidated for a HW device.
+ *
+ * When Linux kernel supports multiple SMMU devices, the SMMU device used for
+ * isochronous HW devices should be added as a separate ARM MMU-500 device
+ * in DT and be programmed independently for efficient TLB invalidates.
+ */
+#define MAX_SMMU_INSTANCES 3
+
+#define TLB_LOOP_TIMEOUT_IN_US 1000000 /* 1s! */
+#define TLB_SPIN_COUNT 10
+
+struct nvidia_smmu {
+   struct arm_smmu_device  smmu;
+   unsigned intnum_inst;
+   void __iomem*bases[MAX_SMMU_INSTANCES];
+};
+
+static inline struct nvidia_smmu *to_nvidia_smmu(struct arm_smmu_device *smmu)
+{
+   return container_of(smmu, struct nvidia_smmu, smmu);
+}
+
+static inline void __iomem *nvidia_smmu_page(struct arm_smmu_device *smmu,
+  unsigned int inst, int page)
+{
+   struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);
+
+   if (!nvidia_smmu->bases[0])
+   nvidia_smmu->bases[0] = smmu->base;
+
+   return nvidia_smmu->bases[inst] + (page << smmu->pgshift);
+}
+
+static u32 nvidia_smmu_read_reg(struct arm_smmu_device *smmu,
+ int page, int offset)
+{
+   void __iomem *reg = nvidia_smmu_page(smmu, 0, page) + offset;
+
+   return readl_relaxed(reg);
+}
+
+static void nvidia_smmu_write_reg(struct arm_smmu_device *smmu,
+   int page, int offset, u32 val)
+{
+   unsigned int i;
+   struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);
+
+   for (i = 0; i < nvidia_smmu->num_ins

[PATCH v7 3/3] iommu/arm-smmu: Add global/context fault implementation hooks

2020-06-28 Thread Krishna Reddy
Add global/context fault hooks to allow NVIDIA SMMU implementation
handle faults across multiple SMMUs.

Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu-nvidia.c | 101 +++-
 drivers/iommu/arm-smmu.c|  17 +-
 drivers/iommu/arm-smmu.h|   3 +
 3 files changed, 118 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
index b73c483fa3376..7276bb203ae79 100644
--- a/drivers/iommu/arm-smmu-nvidia.c
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -147,6 +147,102 @@ static int nvidia_smmu_reset(struct arm_smmu_device *smmu)
return 0;
 }
 
+static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
+{
+   return container_of(dom, struct arm_smmu_domain, domain);
+}
+
+static irqreturn_t nvidia_smmu_global_fault_inst(int irq,
+  struct arm_smmu_device *smmu,
+  int inst)
+{
+   u32 gfsr, gfsynr0, gfsynr1, gfsynr2;
+   void __iomem *gr0_base = nvidia_smmu_page(smmu, inst, 0);
+
+   gfsr = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSR);
+   gfsynr0 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR0);
+   gfsynr1 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR1);
+   gfsynr2 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR2);
+
+   if (!gfsr)
+   return IRQ_NONE;
+
+   dev_err_ratelimited(smmu->dev,
+   "Unexpected global fault, this could be serious\n");
+   dev_err_ratelimited(smmu->dev,
+   "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 0x%08x\n",
+   gfsr, gfsynr0, gfsynr1, gfsynr2);
+
+   writel_relaxed(gfsr, gr0_base + ARM_SMMU_GR0_sGFSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nvidia_smmu_global_fault(int irq, void *dev)
+{
+   int inst;
+   irqreturn_t irq_ret = IRQ_NONE;
+   struct arm_smmu_device *smmu = dev;
+   struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);
+
+   for (inst = 0; inst < nvidia_smmu->num_inst; inst++) {
+   irq_ret = nvidia_smmu_global_fault_inst(irq, smmu, inst);
+   if (irq_ret == IRQ_HANDLED)
+   return irq_ret;
+   }
+
+   return irq_ret;
+}
+
+static irqreturn_t nvidia_smmu_context_fault_bank(int irq,
+   struct arm_smmu_device *smmu,
+   int idx, int inst)
+{
+   u32 fsr, fsynr, cbfrsynra;
+   unsigned long iova;
+   void __iomem *gr1_base = nvidia_smmu_page(smmu, inst, 1);
+   void __iomem *cb_base = nvidia_smmu_page(smmu, inst, smmu->numpage + idx);
+
+   fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR);
+   if (!(fsr & ARM_SMMU_FSR_FAULT))
+   return IRQ_NONE;
+
+   fsynr = readl_relaxed(cb_base + ARM_SMMU_CB_FSYNR0);
+   iova = readq_relaxed(cb_base + ARM_SMMU_CB_FAR);
+   cbfrsynra = readl_relaxed(gr1_base + ARM_SMMU_GR1_CBFRSYNRA(idx));
+
+   dev_err_ratelimited(smmu->dev,
+   "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
+   fsr, iova, fsynr, cbfrsynra, idx);
+
+   writel_relaxed(fsr, cb_base + ARM_SMMU_CB_FSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nvidia_smmu_context_fault(int irq, void *dev)
+{
+   int inst, idx;
+   irqreturn_t irq_ret = IRQ_NONE;
+   struct iommu_domain *domain = dev;
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+   for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) {
+   /*
+* Interrupt line shared between all context faults.
+* Check for faults across all contexts.
+*/
+   for (idx = 0; idx < smmu->num_context_banks; idx++) {
+   irq_ret = nvidia_smmu_context_fault_bank(irq, smmu,
+idx, inst);
+
+   if (irq_ret == IRQ_HANDLED)
+   return irq_ret;
+   }
+   }
+
+   return irq_ret;
+}
+
 static const struct arm_smmu_impl nvidia_smmu_impl = {
.read_reg = nvidia_smmu_read_reg,
.write_reg = nvidia_smmu_write_reg,
@@ -154,6 +250,8 @@ static const struct arm_smmu_impl nvidia_smmu_impl = {
.write_reg64 = nvidia_smmu_write_reg64,
.reset = nvidia_smmu_reset,
.tlb_sync = nvidia_smmu_tlb_sync,
+   .global_fault = nvidia_smmu_global_fault,
+   .context_fault = nvidia_smmu_context_fault,
 };
 
 struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu)
@@ -185,7 +283,8 @@ struct arm_smmu_device *nvidia_smmu_impl_init(struct 
arm_smmu_device *smmu)
}
 
nvidia_smmu->smmu.i

[PATCH v7 0/3] Nvidia Arm SMMUv2 Implementation

2020-06-28 Thread Krishna Reddy
Changes in v7:
Incorporated the review feedback from Nicolin Chen, Robin Murphy and Thierry 
Reding.
Rebased and validated patches on top of 
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git next

v6- https://lkml.org/lkml/2020/6/4/1018
v5 - https://lkml.org/lkml/2020/5/21/1114
v4 - https://lkml.org/lkml/2019/10/30/1054
v3 - https://lkml.org/lkml/2019/10/18/1601
v2 - https://lkml.org/lkml/2019/9/2/980
v1 - https://lkml.org/lkml/2019/8/29/1588

Krishna Reddy (3):
  iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
  dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
  iommu/arm-smmu: Add global/context fault implementation hooks

 .../devicetree/bindings/iommu/arm,smmu.yaml   |   5 +
 MAINTAINERS   |   2 +
 drivers/iommu/Makefile|   2 +-
 drivers/iommu/arm-smmu-impl.c |   3 +
 drivers/iommu/arm-smmu-nvidia.c   | 294 ++
 drivers/iommu/arm-smmu.c  |  17 +-
 drivers/iommu/arm-smmu.h  |   4 +
 7 files changed, 324 insertions(+), 3 deletions(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c


base-commit: 48f0bcfb7aad2c6eb4c1e66476b58475aa14393e
-- 
2.26.2



[PATCH v7 2/3] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU

2020-06-28 Thread Krishna Reddy
Add binding for NVIDIA's Tegra194 SoC SMMU topology that is based
on ARM MMU-500.

Signed-off-by: Krishna Reddy 
---
 Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 5 +
 1 file changed, 5 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml 
b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
index d7ceb4c34423b..5b2586ac715ed 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
@@ -38,6 +38,11 @@ properties:
               - qcom,sc7180-smmu-500
               - qcom,sdm845-smmu-500
           - const: arm,mmu-500
+      - description: NVIDIA SoCs that use more than one "arm,mmu-500"
+        items:
+          - enum:
+              - nvidia,tegra194-smmu
+          - const: arm,mmu-500
       - items:
           - const: arm,mmu-500
           - const: arm,smmu-v2
-- 
2.26.2



RE: [PATCH v6 1/4] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-25 Thread Krishna Reddy
>Should NVIDIA_TEGRA194_SMMU be a separate value for smmu->model, perhaps? That 
>way we avoid this somewhat odd check here.

NVIDIA hasn't made any changes to the arm,mmu-500; it is only used in a different 
topology. A new model value would be misleading here.
As suggested by Robin, the check can just be moved to the end of the function.

>> diff --git a/drivers/iommu/arm-smmu-nvidia.c 
>> b/drivers/iommu/arm-smmu-nvidia.c
>I wonder if it would be better to name this arm-smmu-tegra.c to make it 
>clearer that this is for a Tegra chip. We do have regular expressions in 
>MAINTAINERS that catch anything with "tegra" in it to make this easier.
>Also, the nsmmu_ prefix looks somewhat odd here. You already use struct 
>nvidia_smmu as the name of the structure, so why not be consistent and 
>continue to use nvidia_smmu_ as the prefix for function names?
>Or perhaps even use tegra_smmu_ as the prefix to match the filename change I 
>suggested earlier.

The prefix can be updated to nvidia_smmu, as we seem to be okay for now with 
keeping the file name as arm-smmu-nvidia.c, after the vendor name.

>> +#define TLB_LOOP_TIMEOUT   1000000 /* 1s! */
>USEC_PER_SEC?

It is not meant for a conversion. The timeout value is reused from arm-smmu.c 
for the tlb_sync implementation. I can rename it to TLB_LOOP_TIMEOUT_IN_US.


>> +}
>> +dev_err_ratelimited(smmu->dev,
>> +"TLB sync timed out -- SMMU may be deadlocked\n");
>Same here.
>Also, is there anything we can do when this happens?

This is never expected to happen on silicon. This code and message are reused 
from arm-smmu.c; the wait loop follows the usual pattern, sketched below.
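A rough per-instance version of that loop (based on the upstream
__arm_smmu_tlb_sync() logic; 'inst', 'page' and 'status' are assumed arguments
of the nsmmu tlb_sync helper, whose body this sketch represents):

        unsigned int delay, spin_cnt;
        u32 reg;

        for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) {
                for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) {
                        reg = readl_relaxed(nsmmu_page(smmu, inst, page) + status);
                        if (!(reg & ARM_SMMU_sTLBGSTATUS_GSACTIVE))
                                return;
                        cpu_relax();
                }
                udelay(delay);
        }
        dev_err_ratelimited(smmu->dev,
                            "TLB sync timed out -- SMMU may be deadlocked\n");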


>> +#define nsmmu_page(smmu, inst, page) \
>> +(((inst) ? to_nvidia_smmu(smmu)->bases[(inst)] : smmu->base) + \
>> +((page) << smmu->pgshift))

>Can we simply define to_nvidia_smmu(smmu)->bases[0] = smmu->base in 
>nvidia_smmu_impl_init()? Then this would become just:
>   to_nvidia_smmu(smmu)->bases[inst] + ((page) << (smmu)->pgshift)
> +
>Maybe add this here to simplify the nsmmu_page() macro above:
>   nsmmu->bases[0] = smmu->base;

That would be preferred, to avoid the check in nsmmu_page(). But smmu->base is 
not yet populated when nvidia_smmu_impl_init() is called.
Let me look at an alternative place to set it.
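One option would be to fill in bases[0] lazily on the first page lookup, once
smmu->base is valid, e.g. (sketch only, written as an inline helper rather than
the current macro):

        static inline void __iomem *nsmmu_page(struct arm_smmu_device *smmu,
                                               unsigned int inst, int page)
        {
                struct nvidia_smmu *nvidia_smmu = to_nvidia_smmu(smmu);

                /* smmu->base is only mapped after impl_init(), so set it lazily. */
                if (!nvidia_smmu->bases[0])
                        nvidia_smmu->bases[0] = smmu->base;

                return nvidia_smmu->bases[inst] + (page << smmu->pgshift);
        }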

-KR


RE: [PATCH v6 2/4] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU

2020-06-25 Thread Krishna Reddy
>> +  - nvdia,tegra194-smmu-500
>The -500 suffix here seems a bit redundant since there's no other type of SMMU 
>in Tegra194, correct?

Yeah, there is only one type of SMMU supported in T194. The suffix was added to 
be synonymous with mmu-500; it can be removed.

-KR


[PATCH v6 3/4] iommu/arm-smmu: Add global/context fault implementation hooks

2020-06-04 Thread Krishna Reddy
Add global/context fault hooks to allow NVIDIA SMMU implementation
handle faults across multiple SMMUs.

Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu-nvidia.c | 100 
 drivers/iommu/arm-smmu.c|  11 +++-
 drivers/iommu/arm-smmu.h|   3 +
 3 files changed, 112 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
index dafc293a45217..5999b6a770992 100644
--- a/drivers/iommu/arm-smmu-nvidia.c
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -117,6 +117,104 @@ static int nsmmu_reset(struct arm_smmu_device *smmu)
return 0;
 }
 
+static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
+{
+   return container_of(dom, struct arm_smmu_domain, domain);
+}
+
+static irqreturn_t nsmmu_global_fault_inst(int irq,
+  struct arm_smmu_device *smmu,
+  int inst)
+{
+   u32 gfsr, gfsynr0, gfsynr1, gfsynr2;
+
+   gfsr = readl_relaxed(nsmmu_page(smmu, inst, 0) + ARM_SMMU_GR0_sGFSR);
+   gfsynr0 = readl_relaxed(nsmmu_page(smmu, inst, 0) +
+   ARM_SMMU_GR0_sGFSYNR0);
+   gfsynr1 = readl_relaxed(nsmmu_page(smmu, inst, 0) +
+   ARM_SMMU_GR0_sGFSYNR1);
+   gfsynr2 = readl_relaxed(nsmmu_page(smmu, inst, 0) +
+   ARM_SMMU_GR0_sGFSYNR2);
+
+   if (!gfsr)
+   return IRQ_NONE;
+
+   dev_err_ratelimited(smmu->dev,
+   "Unexpected global fault, this could be serious\n");
+   dev_err_ratelimited(smmu->dev,
+   "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 0x%08x\n",
+   gfsr, gfsynr0, gfsynr1, gfsynr2);
+
+   writel_relaxed(gfsr, nsmmu_page(smmu, inst, 0) + ARM_SMMU_GR0_sGFSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nsmmu_global_fault(int irq, void *dev)
+{
+   int inst;
+   irqreturn_t irq_ret = IRQ_NONE;
+   struct arm_smmu_device *smmu = dev;
+
+   for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) {
+   irq_ret = nsmmu_global_fault_inst(irq, smmu, inst);
+   if (irq_ret == IRQ_HANDLED)
+   return irq_ret;
+   }
+
+   return irq_ret;
+}
+
+static irqreturn_t nsmmu_context_fault_bank(int irq,
+   struct arm_smmu_device *smmu,
+   int idx, int inst)
+{
+   u32 fsr, fsynr, cbfrsynra;
+   unsigned long iova;
+
+   fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR);
+   if (!(fsr & ARM_SMMU_FSR_FAULT))
+   return IRQ_NONE;
+
+   fsynr = readl_relaxed(nsmmu_page(smmu, inst, smmu->numpage + idx) +
+ ARM_SMMU_CB_FSYNR0);
+   iova = readq_relaxed(nsmmu_page(smmu, inst, smmu->numpage + idx) +
+ARM_SMMU_CB_FAR);
+   cbfrsynra = readl_relaxed(nsmmu_page(smmu, inst, 1) +
+ ARM_SMMU_GR1_CBFRSYNRA(idx));
+
+   dev_err_ratelimited(smmu->dev,
+   "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
+   fsr, iova, fsynr, cbfrsynra, idx);
+
+   writel_relaxed(fsr, nsmmu_page(smmu, inst, smmu->numpage + idx) +
+   ARM_SMMU_CB_FSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nsmmu_context_fault(int irq, void *dev)
+{
+   int inst, idx;
+   irqreturn_t irq_ret = IRQ_NONE;
+   struct iommu_domain *domain = dev;
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+   for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) {
+   /* Interrupt line shared between all context faults.
+* Check for faults across all contexts.
+*/
+   for (idx = 0; idx < smmu->num_context_banks; idx++) {
+   irq_ret = nsmmu_context_fault_bank(irq, smmu,
+  idx, inst);
+
+   if (irq_ret == IRQ_HANDLED)
+   return irq_ret;
+   }
+   }
+
+   return irq_ret;
+}
+
 static const struct arm_smmu_impl nvidia_smmu_impl = {
.read_reg = nsmmu_read_reg,
.write_reg = nsmmu_write_reg,
@@ -124,6 +222,8 @@ static const struct arm_smmu_impl nvidia_smmu_impl = {
.write_reg64 = nsmmu_write_reg64,
.reset = nsmmu_reset,
.tlb_sync = nsmmu_tlb_sync,
+   .global_fault = nsmmu_global_fault,
+   .context_fault = nsmmu_context_fault,
 };
 
 struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu)
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 243bc4cb2705b..d7

[PATCH v6 4/4] iommu/arm-smmu-nvidia: fix the warning reported by kbuild test robot

2020-06-04 Thread Krishna Reddy
>> drivers/iommu/arm-smmu-nvidia.c:151:33: sparse: sparse: cast removes
>> address space '' of expression

Reported-by: kbuild test robot 
Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu-nvidia.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
index 5999b6a770992..6348d8dc17fc2 100644
--- a/drivers/iommu/arm-smmu-nvidia.c
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -248,7 +248,7 @@ struct arm_smmu_device *nvidia_smmu_impl_init(struct 
arm_smmu_device *smmu)
break;
nsmmu->bases[i] = devm_ioremap_resource(dev, res);
if (IS_ERR(nsmmu->bases[i]))
-   return (struct arm_smmu_device *)nsmmu->bases[i];
+   return ERR_CAST(nsmmu->bases[i]);
nsmmu->num_inst++;
}
 
-- 
2.26.2



[PATCH v6 2/4] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU

2020-06-04 Thread Krishna Reddy
Add binding for NVIDIA's Tegra194 SoC SMMU that is based
on ARM MMU-500.

Signed-off-by: Krishna Reddy 
---
 Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 5 +
 1 file changed, 5 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml 
b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
index e3ef1c69d1326..8f7ffd248f303 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
@@ -37,6 +37,11 @@ properties:
               - qcom,sc7180-smmu-500
               - qcom,sdm845-smmu-500
           - const: arm,mmu-500
+      - description: NVIDIA SoCs that use more than one "arm,mmu-500"
+        items:
+          - enum:
+              - nvidia,tegra194-smmu-500
+          - const: arm,mmu-500
       - items:
           - const: arm,mmu-500
           - const: arm,smmu-v2
-- 
2.26.2



[PATCH v6 1/4] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-04 Thread Krishna Reddy
NVIDIA's Tegra194 soc uses two ARM MMU-500s together to interleave
IOVA accesses across them.
Add NVIDIA implementation for dual ARM MMU-500s and add new compatible
string for Tegra194 soc.

Signed-off-by: Krishna Reddy 
---
 MAINTAINERS |   2 +
 drivers/iommu/Makefile  |   2 +-
 drivers/iommu/arm-smmu-impl.c   |   3 +
 drivers/iommu/arm-smmu-nvidia.c | 161 
 drivers/iommu/arm-smmu.h|   1 +
 5 files changed, 168 insertions(+), 1 deletion(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 50659d76976b7..118da0893c964 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16572,9 +16572,11 @@ F: drivers/i2c/busses/i2c-tegra.c
 
 TEGRA IOMMU DRIVERS
 M: Thierry Reding 
+R: Krishna Reddy 
 L: linux-te...@vger.kernel.org
 S: Supported
 F: drivers/iommu/tegra*
+F: drivers/iommu/arm-smmu-nvidia.c
 
 TEGRA KBC DRIVER
 M: Laxman Dewangan 
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 57cf4ba5e27cb..35542df00da72 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -15,7 +15,7 @@ obj-$(CONFIG_AMD_IOMMU) += amd_iommu.o amd_iommu_init.o 
amd_iommu_quirks.o
 obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd_iommu_debugfs.o
 obj-$(CONFIG_AMD_IOMMU_V2) += amd_iommu_v2.o
 obj-$(CONFIG_ARM_SMMU) += arm_smmu.o
-arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o
+arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o arm-smmu-nvidia.o
 obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
 obj-$(CONFIG_DMAR_TABLE) += dmar.o
 obj-$(CONFIG_INTEL_IOMMU) += intel-iommu.o intel-pasid.o
diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index c75b9d957b702..52c84c30f83e4 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -160,6 +160,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct 
arm_smmu_device *smmu)
 */
switch (smmu->model) {
case ARM_MMU500:
+   if (of_device_is_compatible(smmu->dev->of_node,
+   "nvidia,tegra194-smmu-500"))
+   return nvidia_smmu_impl_init(smmu);
smmu->impl = &arm_mmu500_impl;
break;
case CAVIUM_SMMUV2:
diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
new file mode 100644
index 0..dafc293a45217
--- /dev/null
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -0,0 +1,161 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Nvidia ARM SMMU v2 implementation quirks
+// Copyright (C) 2019 NVIDIA CORPORATION.  All rights reserved.
+
+#define pr_fmt(fmt) "nvidia-smmu: " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "arm-smmu.h"
+
+/* Tegra194 has three ARM MMU-500 Instances.
+ * Two of them are used together for Interleaved IOVA accesses and
+ * used by Non-Isochronous Hw devices for SMMU translations.
+ * Third one is used for SMMU translations from Isochronous HW devices.
+ * It is possible to use this Implementation to program either
+ * all three or two of the instances identically as desired through
+ * DT node.
+ *
+ * Programming all the three instances identically comes with redundant tlb
+ * invalidations as all three never need to be tlb invalidated for a HW device.
+ *
+ * When Linux Kernel supports multiple SMMU devices, The SMMU device used for
+ * Isochronous HW devices should be added as a separate ARM MMU-500 device
+ * in DT and be programmed independently for efficient tlb invalidates.
+ *
+ */
+#define MAX_SMMU_INSTANCES 3
+
+#define TLB_LOOP_TIMEOUT   1000000 /* 1s! */
+#define TLB_SPIN_COUNT 10
+
+struct nvidia_smmu {
+   struct arm_smmu_device  smmu;
+   unsigned intnum_inst;
+   void __iomem*bases[MAX_SMMU_INSTANCES];
+};
+
+#define to_nvidia_smmu(s) container_of(s, struct nvidia_smmu, smmu)
+
+#define nsmmu_page(smmu, inst, page) \
+   (((inst) ? to_nvidia_smmu(smmu)->bases[(inst)] : smmu->base) + \
+   ((page) << smmu->pgshift))
+
+static u32 nsmmu_read_reg(struct arm_smmu_device *smmu,
+ int page, int offset)
+{
+   return readl_relaxed(nsmmu_page(smmu, 0, page) + offset);
+}
+
+static void nsmmu_write_reg(struct arm_smmu_device *smmu,
+   int page, int offset, u32 val)
+{
+   unsigned int i;
+
+   for (i = 0; i < to_nvidia_smmu(smmu)->num_inst; i++)
+   writel_relaxed(val, nsmmu_page(smmu, i, page) + offset);
+}
+
+static u64 nsmmu_read_reg64(struct arm_smmu_device *smmu,
+   int page, int offset)
+{
+   return readq_relaxed(nsmmu_page(smmu, 0, page) + offset);
+}
+
+static void nsmmu_write_reg64(struct arm_smmu_device *smmu,
+ int page, int offset, u64 val)
+{
+   unsigned int

[PATCH v6 0/4] Nvidia Arm SMMUv2 Implementation

2020-06-04 Thread Krishna Reddy
Changes in v6:
Restricted the patch set to driver specific patches.
Fixed the cast warning reported by kbuild test robot.
Rebased on git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git next

v5 - https://lkml.org/lkml/2020/5/21/1114
v4 - https://lkml.org/lkml/2019/10/30/1054
v3 - https://lkml.org/lkml/2019/10/18/1601
v2 - https://lkml.org/lkml/2019/9/2/980
v1 - https://lkml.org/lkml/2019/8/29/1588

Krishna Reddy (4):
  iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
  dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
  iommu/arm-smmu: Add global/context fault implementation hooks
  iommu/arm-smmu-nvidia: fix the warning reported by kbuild test robot

 .../devicetree/bindings/iommu/arm,smmu.yaml   |   5 +
 MAINTAINERS   |   2 +
 drivers/iommu/Makefile|   2 +-
 drivers/iommu/arm-smmu-impl.c |   3 +
 drivers/iommu/arm-smmu-nvidia.c   | 261 ++
 drivers/iommu/arm-smmu.c  |  11 +-
 drivers/iommu/arm-smmu.h  |   4 +
 7 files changed, 285 insertions(+), 3 deletions(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c


base-commit: 431275afdc7155415254aef4bd3816a1b8a2ead0
-- 
2.26.2



RE: [PATCH v5 0/5] Nvidia Arm SMMUv2 Implementation

2020-05-22 Thread Krishna Reddy
>For the record: I don't think we should apply these because we don't have a 
>good way of testing them. We currently have three problems that prevent us 
>from enabling SMMU on Tegra194:

Out of the three issues pointed out here, I see that only issue 2) is a real 
blocker for enabling the SMMU HW by default in upstream.

>That said, I have tested earlier versions of this patchset on top of my local 
>branch with fixes for the above and they do seem to work as expected.
>So I'll leave it up to the IOMMU maintainers whether they're willing to merge 
>the driver patches as is.
> But I want to clarify that I won't be applying the DTS patches until we've 
> solved all of the above issues and therefore it should be clear that these 
> won't be runtime tested until then.

The SMMU driver patches as such are complete and can be used by NVIDIA with a 
local disable_bypass config change (CONFIG_ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT=n), 
which also protects the driver patches against kernel changes. This config 
disable option has already been tested by Nicolin Chen and me.

Robin/Will, can you comment on whether the SMMU driver patches alone (patches 1, 
2 and 3 out of 5) can be merged without the DT enable patches? Is it reasonable 
to merge the driver patches alone?

>1) If we enable SMMU support, then the DMA API will automatically try
> to use SMMU domains for allocations. This means that translations
> will happen as soon as a device's IOMMU operations are initialized
> and that is typically a long time (in kernel time at least) before
> a driver is bound and has a chance of configuring the device.

> This causes problems for non-quiesced devices like display
> controllers that the bootloader might have set up to scan out a
> boot splash.

> What we're missing here is a way to:

> a) advertise reserved memory regions for boot splash framebuffers
> b) map reserved memory regions early during SMMU setup

> Patches have been floating on the public mailing lists for b) but
> a) requires changes to the bootloader (both proprietary ones and
> U-Boot for SoCs prior to Tegra194).

This happens if SMMU translation is enabled for display before the reserved 
memory regions issue is fixed. This issue is not a real blocker for enabling 
the SMMU.


>  2) Even if we don't enable SMMU for a given device (by not hooking up
> the iommus property), with a default kernel configuration we get a
> bunch of faults during boot because the ARM SMMU driver faults by
> default (rather than bypass) for masters which aren't hooked up to
> the SMMU.

> We could work around that by changing the default configuration or
> overriding it on the command-line, but that's not really an option
> because it decreases security and means that Tegra194 won't work
> out-of-the-box.

This is the real issue that blocks enabling the SMMU. The USF faults for devices
that don't have SMMU translations enabled should be fixed or worked around before
the SMMU can be enabled. We should look at keeping the SID as 0x7F for the devices
that can't have SMMU enabled yet; SID 0x7F bypasses the SMMU externally.

>  3) We don't properly describe the DMA hierarchy, which causes the DMA
> masks to be improperly set. As a bit of background: Tegra194 has a
> special address bit (bit 39) that causes some swizzling to happen
> within the memory controller. As a result, any I/O virtual address
> that has bit 39 set will cause this swizzling to happen on access.
> The DMA/IOMMU allocator always starts allocating from the top of
> the IOVA space, which means that the first couple of gigabytes of
> allocations will cause most devices to fail because of the
> undesired swizzling that occurs.

> We had an initial patch for SDHCI merged that hard-codes the DMA
> mask to DMA_BIT_MASK(39) on Tegra194 to work around that. However,
> the devices all do support addressing 40 bits and the restriction
> on bit 39 is really a property of the bus rather than a capability
> of the device. This means that we would have to work around this
> for every device driver by adding similar hacks. A better option is
> to properly describe the DMA hierarchy (using dma-ranges) because
> that will then automatically be applied as a constraint on each
> device's DMA mask.

> I have been working on patches to address this, but they are fairly
> involved because they require device tree bindings changes and so
> on.

The dma_mask issue is again outside the SMMU driver, and as long as the clients
with the dma_mask issue don't have SMMU enabled, it would be fine.
SDHCI can have SMMU enabled in upstream as soon as issue 2 is taken care of.

>So before we solve all of the above issues we can't really enable SMMU on 
>Tegra194 and hence won't be able to test it. As such we don't know if these 
>patches even work, nor can we validate that they continue to work.
>As such, I don't think there's any use in applying these patches upstream 
>since they 

[PATCH v5 1/5] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-05-21 Thread Krishna Reddy
NVIDIA's Tegra194 soc uses two ARM MMU-500s together to interleave
IOVA accesses across them.
Add NVIDIA implementation for dual ARM MMU-500s and add new compatible
string for Tegra194 soc.

Signed-off-by: Krishna Reddy 
---
 MAINTAINERS |   2 +
 drivers/iommu/Makefile  |   2 +-
 drivers/iommu/arm-smmu-impl.c   |   3 +
 drivers/iommu/arm-smmu-nvidia.c | 161 
 drivers/iommu/arm-smmu.h|   1 +
 5 files changed, 168 insertions(+), 1 deletion(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c

diff --git a/MAINTAINERS b/MAINTAINERS
index ecc0749810b0..0d8c966ecf17 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16560,9 +16560,11 @@ F: drivers/i2c/busses/i2c-tegra.c
 
 TEGRA IOMMU DRIVERS
 M: Thierry Reding 
+R: Krishna Reddy 
 L: linux-te...@vger.kernel.org
 S: Supported
 F: drivers/iommu/tegra*
+F: drivers/iommu/arm-smmu-nvidia.c
 
 TEGRA KBC DRIVER
 M: Laxman Dewangan 
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 57cf4ba5e27c..35542df00da7 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -15,7 +15,7 @@ obj-$(CONFIG_AMD_IOMMU) += amd_iommu.o amd_iommu_init.o 
amd_iommu_quirks.o
 obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd_iommu_debugfs.o
 obj-$(CONFIG_AMD_IOMMU_V2) += amd_iommu_v2.o
 obj-$(CONFIG_ARM_SMMU) += arm_smmu.o
-arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o
+arm_smmu-objs += arm-smmu.o arm-smmu-impl.o arm-smmu-qcom.o arm-smmu-nvidia.o
 obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
 obj-$(CONFIG_DMAR_TABLE) += dmar.o
 obj-$(CONFIG_INTEL_IOMMU) += intel-iommu.o intel-pasid.o
diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index 74d97a886e93..dcdd513323aa 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -158,6 +158,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct 
arm_smmu_device *smmu)
 */
switch (smmu->model) {
case ARM_MMU500:
+   if (of_device_is_compatible(smmu->dev->of_node,
+   "nvidia,tegra194-smmu-500"))
+   return nvidia_smmu_impl_init(smmu);
smmu->impl = &arm_mmu500_impl;
break;
case CAVIUM_SMMUV2:
diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
new file mode 100644
index ..dafc293a4521
--- /dev/null
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -0,0 +1,161 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Nvidia ARM SMMU v2 implementation quirks
+// Copyright (C) 2019 NVIDIA CORPORATION.  All rights reserved.
+
+#define pr_fmt(fmt) "nvidia-smmu: " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "arm-smmu.h"
+
+/* Tegra194 has three ARM MMU-500 Instances.
+ * Two of them are used together for Interleaved IOVA accesses and
+ * used by Non-Isochronous Hw devices for SMMU translations.
+ * Third one is used for SMMU translations from Isochronous HW devices.
+ * It is possible to use this Implementation to program either
+ * all three or two of the instances identically as desired through
+ * DT node.
+ *
+ * Programming all the three instances identically comes with redundant tlb
+ * invalidations as all three never need to be tlb invalidated for a HW device.
+ *
+ * When Linux Kernel supports multiple SMMU devices, The SMMU device used for
+ * Isochronous HW devices should be added as a separate ARM MMU-500 device
+ * in DT and be programmed independently for efficient tlb invalidates.
+ *
+ */
+#define MAX_SMMU_INSTANCES 3
+
+#define TLB_LOOP_TIMEOUT   1000000 /* 1s! */
+#define TLB_SPIN_COUNT 10
+
+struct nvidia_smmu {
+   struct arm_smmu_device  smmu;
+   unsigned intnum_inst;
+   void __iomem*bases[MAX_SMMU_INSTANCES];
+};
+
+#define to_nvidia_smmu(s) container_of(s, struct nvidia_smmu, smmu)
+
+#define nsmmu_page(smmu, inst, page) \
+   (((inst) ? to_nvidia_smmu(smmu)->bases[(inst)] : smmu->base) + \
+   ((page) << smmu->pgshift))
+
+static u32 nsmmu_read_reg(struct arm_smmu_device *smmu,
+ int page, int offset)
+{
+   return readl_relaxed(nsmmu_page(smmu, 0, page) + offset);
+}
+
+static void nsmmu_write_reg(struct arm_smmu_device *smmu,
+   int page, int offset, u32 val)
+{
+   unsigned int i;
+
+   for (i = 0; i < to_nvidia_smmu(smmu)->num_inst; i++)
+   writel_relaxed(val, nsmmu_page(smmu, i, page) + offset);
+}
+
+static u64 nsmmu_read_reg64(struct arm_smmu_device *smmu,
+   int page, int offset)
+{
+   return readq_relaxed(nsmmu_page(smmu, 0, page) + offset);
+}
+
+static void nsmmu_write_reg64(struct arm_smmu_device *smmu,
+ int page, int offset, u64 val)
+{
+   unsigned int

[PATCH v5 4/5] arm64: tegra: Add DT node for T194 SMMU

2020-05-21 Thread Krishna Reddy
Add DT node for T194 SMMU to enable SMMU support.

Signed-off-by: Krishna Reddy 
---
 arch/arm64/boot/dts/nvidia/tegra194.dtsi | 77 
 1 file changed, 77 insertions(+)

diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi 
b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
index f4ede86e32b4..f7c4399afb55 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
@@ -1620,6 +1620,83 @@ pcie@141a {
  0x8200 0x0  0x4000 0x1f 0x4000 0x0 
0xc000>; /* non-prefetchable memory (3GB) */
};
 
+   smmu: iommu@12000000 {
+   compatible = "arm,mmu-500","nvidia,tegra194-smmu-500";
+   reg = <0 0x12000000 0 0x800000>,
+ <0 0x11000000 0 0x800000>,
+ <0 0x10000000 0 0x800000>;
+   interrupts = ,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+;
+   stream-match-mask = <0x7f80>;
+   #global-interrupts = <3>;
+   #iommu-cells = <1>;
+   };
+
pcie_ep@1416 {
compatible = "nvidia,tegra194-pcie-ep", "snps,dw-pcie-ep";
power-domains = < TEGRA194_POWER_DOMAIN_PCIEX4A>;
-- 
2.26.2



[PATCH v5 3/5] iommu/arm-smmu: Add global/context fault implementation hooks

2020-05-21 Thread Krishna Reddy
Add global/context fault hooks to allow NVIDIA SMMU implementation
handle faults across multiple SMMUs.

Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu-nvidia.c | 100 
 drivers/iommu/arm-smmu.c|  11 +++-
 drivers/iommu/arm-smmu.h|   3 +
 3 files changed, 112 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
index dafc293a4521..5999b6a77099 100644
--- a/drivers/iommu/arm-smmu-nvidia.c
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -117,6 +117,104 @@ static int nsmmu_reset(struct arm_smmu_device *smmu)
return 0;
 }
 
+static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
+{
+   return container_of(dom, struct arm_smmu_domain, domain);
+}
+
+static irqreturn_t nsmmu_global_fault_inst(int irq,
+  struct arm_smmu_device *smmu,
+  int inst)
+{
+   u32 gfsr, gfsynr0, gfsynr1, gfsynr2;
+
+   gfsr = readl_relaxed(nsmmu_page(smmu, inst, 0) + ARM_SMMU_GR0_sGFSR);
+   gfsynr0 = readl_relaxed(nsmmu_page(smmu, inst, 0) +
+   ARM_SMMU_GR0_sGFSYNR0);
+   gfsynr1 = readl_relaxed(nsmmu_page(smmu, inst, 0) +
+   ARM_SMMU_GR0_sGFSYNR1);
+   gfsynr2 = readl_relaxed(nsmmu_page(smmu, inst, 0) +
+   ARM_SMMU_GR0_sGFSYNR2);
+
+   if (!gfsr)
+   return IRQ_NONE;
+
+   dev_err_ratelimited(smmu->dev,
+   "Unexpected global fault, this could be serious\n");
+   dev_err_ratelimited(smmu->dev,
+   "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 0x%08x\n",
+   gfsr, gfsynr0, gfsynr1, gfsynr2);
+
+   writel_relaxed(gfsr, nsmmu_page(smmu, inst, 0) + ARM_SMMU_GR0_sGFSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nsmmu_global_fault(int irq, void *dev)
+{
+   int inst;
+   irqreturn_t irq_ret = IRQ_NONE;
+   struct arm_smmu_device *smmu = dev;
+
+   for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) {
+   irq_ret = nsmmu_global_fault_inst(irq, smmu, inst);
+   if (irq_ret == IRQ_HANDLED)
+   return irq_ret;
+   }
+
+   return irq_ret;
+}
+
+static irqreturn_t nsmmu_context_fault_bank(int irq,
+   struct arm_smmu_device *smmu,
+   int idx, int inst)
+{
+   u32 fsr, fsynr, cbfrsynra;
+   unsigned long iova;
+
+   fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR);
+   if (!(fsr & ARM_SMMU_FSR_FAULT))
+   return IRQ_NONE;
+
+   fsynr = readl_relaxed(nsmmu_page(smmu, inst, smmu->numpage + idx) +
+ ARM_SMMU_CB_FSYNR0);
+   iova = readq_relaxed(nsmmu_page(smmu, inst, smmu->numpage + idx) +
+ARM_SMMU_CB_FAR);
+   cbfrsynra = readl_relaxed(nsmmu_page(smmu, inst, 1) +
+ ARM_SMMU_GR1_CBFRSYNRA(idx));
+
+   dev_err_ratelimited(smmu->dev,
+   "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, cbfrsynra=0x%x, cb=%d\n",
+   fsr, iova, fsynr, cbfrsynra, idx);
+
+   writel_relaxed(fsr, nsmmu_page(smmu, inst, smmu->numpage + idx) +
+   ARM_SMMU_CB_FSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nsmmu_context_fault(int irq, void *dev)
+{
+   int inst, idx;
+   irqreturn_t irq_ret = IRQ_NONE;
+   struct iommu_domain *domain = dev;
+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+   for (inst = 0; inst < to_nvidia_smmu(smmu)->num_inst; inst++) {
+   /* Interrupt line shared between all context faults.
+* Check for faults across all contexts.
+*/
+   for (idx = 0; idx < smmu->num_context_banks; idx++) {
+   irq_ret = nsmmu_context_fault_bank(irq, smmu,
+  idx, inst);
+
+   if (irq_ret == IRQ_HANDLED)
+   return irq_ret;
+   }
+   }
+
+   return irq_ret;
+}
+
 static const struct arm_smmu_impl nvidia_smmu_impl = {
.read_reg = nsmmu_read_reg,
.write_reg = nsmmu_write_reg,
@@ -124,6 +222,8 @@ static const struct arm_smmu_impl nvidia_smmu_impl = {
.write_reg64 = nsmmu_write_reg64,
.reset = nsmmu_reset,
.tlb_sync = nsmmu_tlb_sync,
+   .global_fault = nsmmu_global_fault,
+   .context_fault = nsmmu_context_fault,
 };
 
 struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu)
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index e622f4e33379..9

[PATCH v5 5/5] arm64: tegra: enable SMMU for SDHCI and EQOS on T194

2020-05-21 Thread Krishna Reddy
Enable SMMU translations for SDHCI and EQOS transactions on T194.

Signed-off-by: Krishna Reddy 
---
 arch/arm64/boot/dts/nvidia/tegra194.dtsi | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi 
b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
index f7c4399afb55..706bbb439dcd 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
@@ -59,6 +59,7 @@ ethernet@249 {
clock-names = "master_bus", "slave_bus", "rx", "tx", 
"ptp_ref";
resets = < TEGRA194_RESET_EQOS>;
reset-names = "eqos";
+   iommus = < TEGRA194_SID_EQOS>;
status = "disabled";
 
snps,write-requests = <1>;
@@ -457,6 +458,7 @@ sdmmc1: sdhci@340 {
clock-names = "sdhci";
resets = < TEGRA194_RESET_SDMMC1>;
reset-names = "sdhci";
+   iommus = < TEGRA194_SID_SDMMC1>;
nvidia,pad-autocal-pull-up-offset-3v3-timeout =
<0x07>;
nvidia,pad-autocal-pull-down-offset-3v3-timeout =
@@ -479,6 +481,7 @@ sdmmc3: sdhci@344 {
clock-names = "sdhci";
resets = < TEGRA194_RESET_SDMMC3>;
reset-names = "sdhci";
+   iommus = < TEGRA194_SID_SDMMC3>;
nvidia,pad-autocal-pull-up-offset-1v8 = <0x00>;
nvidia,pad-autocal-pull-down-offset-1v8 = <0x7a>;
nvidia,pad-autocal-pull-up-offset-3v3-timeout = <0x07>;
@@ -506,6 +509,7 @@ sdmmc4: sdhci@346 {
  < TEGRA194_CLK_PLLC4>;
resets = < TEGRA194_RESET_SDMMC4>;
reset-names = "sdhci";
+   iommus = < TEGRA194_SID_SDMMC4>;
nvidia,pad-autocal-pull-up-offset-hs400 = <0x00>;
nvidia,pad-autocal-pull-down-offset-hs400 = <0x00>;
nvidia,pad-autocal-pull-up-offset-1v8-timeout = <0x0a>;
-- 
2.26.2



[PATCH v5 2/5] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU

2020-05-21 Thread Krishna Reddy
Add binding for NVIDIA's Tegra194 Soc SMMU that is based
on ARM MMU-500.

Signed-off-by: Krishna Reddy 
---
 Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 5 +
 1 file changed, 5 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml 
b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
index 6515dbe47508..78aba7dd5a61 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
@@ -37,6 +37,11 @@ properties:
   - qcom,sc7180-smmu-500
   - qcom,sdm845-smmu-500
   - const: arm,mmu-500
+  - description: NVIDIA SoCs that use more than one "arm,mmu-500"
+items:
+  - enum:
+  - nvidia,tegra194-smmu-500
+  - const: arm,mmu-500
   - items:
   - const: arm,mmu-500
   - const: arm,smmu-v2
-- 
2.26.2



[PATCH v5 0/5] Nvidia Arm SMMUv2 Implementation

2020-05-21 Thread Krishna Reddy
Changes in v5:
Rebased on top of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git 
next

v4 - https://lkml.org/lkml/2019/10/30/1054
v3 - https://lkml.org/lkml/2019/10/18/1601
v2 - https://lkml.org/lkml/2019/9/2/980
v1 - https://lkml.org/lkml/2019/8/29/1588

Krishna Reddy (5):
  iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
  dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
  iommu/arm-smmu: Add global/context fault implementation hooks
  arm64: tegra: Add DT node for T194 SMMU
  arm64: tegra: enable SMMU for SDHCI and EQOS on T194

 .../devicetree/bindings/iommu/arm,smmu.yaml   |   5 +
 MAINTAINERS   |   2 +
 arch/arm64/boot/dts/nvidia/tegra194.dtsi  |  81 ++
 drivers/iommu/Makefile|   2 +-
 drivers/iommu/arm-smmu-impl.c |   3 +
 drivers/iommu/arm-smmu-nvidia.c   | 261 ++
 drivers/iommu/arm-smmu.c  |  11 +-
 drivers/iommu/arm-smmu.h  |   4 +
 8 files changed, 366 insertions(+), 3 deletions(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c


base-commit: 365f8d504da50feaebf826d180113529c9383670
-- 
2.26.2



RE: [PATCH v3 0/7] Nvidia Arm SMMUv2 Implementation

2019-10-23 Thread Krishna Reddy
>>https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=for-joerg/arm-smmu/updates

Thanks Will! Let me rebase my patches on top of this branch and send it out.

-KR



RE: [PATCH v3 0/7] Nvidia Arm SMMUv2 Implementation

2019-10-22 Thread Krishna Reddy
Hi Robin,
>>Apologies for crossed wires, but I had a series getting rid of 
>>arm_smmu_flush_ops which was also meant to end up making things a bit easier 
>>for you:

I was looking to rebase on top of your changes first. Then I read Will's reply 
that said your work is queued for 5.5. 
Let me know if these patches need to be rebased on top of iommu/devel or a 
different branch. I can resend the patch set on top of the necessary branch.

-KR


[PATCH v2 0/7] Nvidia Arm SMMUv2 Implementation

2019-09-02 Thread Krishna Reddy
Changes in v2:
- Prepare arm_smmu_flush_ops for override.
- Remove NVIDIA_SMMUv2 and use the ARM_SMMUv2 model as T194 SMMU hasn't modified 
ARM MMU-500.
- Add T194-specific compatible string - "nvidia,tegra194-smmu"
- Remove the tlb_sync hook added in v1 and override arm_smmu_flush_ops->tlb_sync() 
from the implementation.
- Register implementation-specific context/global fault hooks directly for irq 
handling.
- Update global/context interrupt list in DT and relevant fault handling code in 
arm-smmu-nvidia.c.
- Implement reset hook in arm-smmu-nvidia.c to clear irq status and sync tlb.

v1 - https://lkml.org/lkml/2019/8/29/1588

Krishna Reddy (7):
  iommu/arm-smmu: prepare arm_smmu_flush_ops for override
  iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
  dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
  iommu/arm-smmu: Add global/context fault implementation hooks
  arm64: tegra: Add Memory controller DT node on T194
  arm64: tegra: Add DT node for T194 SMMU
  arm64: tegra: enable SMMU for SDHCI and EQOS on T194

 .../devicetree/bindings/iommu/arm,smmu.txt |   4 +
 MAINTAINERS|   2 +
 arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi |   4 +
 arch/arm64/boot/dts/nvidia/tegra194.dtsi   |  88 +++
 drivers/iommu/Makefile |   2 +-
 drivers/iommu/arm-smmu-impl.c  |   3 +
 drivers/iommu/arm-smmu-nvidia.c| 287 +
 drivers/iommu/arm-smmu.c   |  27 +-
 drivers/iommu/arm-smmu.h   |   8 +-
 9 files changed, 413 insertions(+), 12 deletions(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c

-- 
2.1.4



[PATCH v2 3/7] dt-bindings: arm-smmu: Add binding for Tegra194 SMMU

2019-09-02 Thread Krishna Reddy
Add binding for NVIDIA's Tegra194 Soc SMMU that is based
on ARM MMU-500.

Signed-off-by: Krishna Reddy 
---
 Documentation/devicetree/bindings/iommu/arm,smmu.txt | 4 
 1 file changed, 4 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt 
b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
index 3133f3b..1d72fac 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.txt
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
@@ -31,6 +31,10 @@ conditions.
   as below, SoC-specific compatibles:
   "qcom,sdm845-smmu-500", "arm,mmu-500"
 
+  NVIDIA SoCs that use more than one ARM MMU-500 together
+  needs following SoC-specific compatibles along with 
"arm,mmu-500":
+  "nvidia,tegra194-smmu"
+
 - reg   : Base address and size of the SMMU.
 
 - #global-interrupts : The number of global interrupts exposed by the
-- 
2.1.4



RE: [PATCH 1/7] iommu/arm-smmu: add Nvidia SMMUv2 implementation

2019-09-02 Thread Krishna Reddy
>>> +ARM_SMMU_MATCH_DATA(nvidia_smmuv2, ARM_SMMU_V2, NVIDIA_SMMUV2);
 
>> The ARM MMU-500 implementation is unmodified.  It is the way they are 
>> integrated and used together (for interleaved accesses) that is different from 
>> regular ARM MMU-500.
>> I have added it to get the model number and to be able to differentiate the 
>> SMMU implementation in arm-smmu-impl.c.

>In that case, I would rather keep smmu->model representing the MMU-500 
>microarchitecture - 
>since you'll still want to pick up errata workarounds etc. for that - and 
>detect the Tegra integration via an explicit of_device_is_compatible()
> check in arm_smmu_impl_init().

Looks good to me. 

>For comparison, under ACPI we'd probably have to detect integration details by 
>looking at table headers, separately
> from the IORT "Model" field, so I'd prefer if the DT vs. ACPI handling didn't 
> diverge more than necessary.

ACPI support for T194 can be added based on need in subsequent patches. For 
now, I am updating it for DT support.
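
For reference, a minimal sketch of such a compatible check in
arm_smmu_impl_init() could be the following (sketch only, not the exact patch;
the compatible string follows the binding patch in this series):

struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu)
{
	/* Detect the Tegra integration from DT; smmu->model stays MMU-500. */
	if (of_device_is_compatible(smmu->dev->of_node, "nvidia,tegra194-smmu"))
		return nvidia_smmu_impl_init(smmu);

	return smmu;
}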

-KR


[PATCH 4/7] iommu/arm-smmu: Add global/context fault implementation hooks

2019-08-29 Thread Krishna Reddy
Add global/context fault hooks to allow Nvidia SMMU implementation
handle faults across multiple SMMUs.

Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu-nvidia.c | 127 
 drivers/iommu/arm-smmu.c|   6 ++
 drivers/iommu/arm-smmu.h|   4 ++
 3 files changed, 137 insertions(+)

diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
index a429b2c..b2a3c49 100644
--- a/drivers/iommu/arm-smmu-nvidia.c
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -14,6 +14,10 @@
 
 #define NUM_SMMU_INSTANCES 3
 
+static irqreturn_t nsmmu_context_fault_inst(int irq,
+   struct arm_smmu_device *smmu,
+   int idx, int inst);
+
 struct nvidia_smmu {
struct arm_smmu_device  smmu;
int num_inst;
@@ -87,12 +91,135 @@ static void nsmmu_tlb_sync(struct arm_smmu_device *smmu, 
int page,
nsmmu_tlb_sync_wait(smmu, page, sync, status, i);
 }
 
+static irqreturn_t nsmmu_global_fault_inst(int irq,
+  struct arm_smmu_device *smmu,
+  int inst)
+{
+   u32 gfsr, gfsynr0, gfsynr1, gfsynr2;
+
+   gfsr = readl_relaxed(nsmmu_page(smmu, inst, 0) + ARM_SMMU_GR0_sGFSR);
+   gfsynr0 = readl_relaxed(nsmmu_page(smmu, inst, 0) +
+   ARM_SMMU_GR0_sGFSYNR0);
+   gfsynr1 = readl_relaxed(nsmmu_page(smmu, inst, 0) +
+   ARM_SMMU_GR0_sGFSYNR1);
+   gfsynr2 = readl_relaxed(nsmmu_page(smmu, inst, 0) +
+   ARM_SMMU_GR0_sGFSYNR2);
+
+   if (!gfsr)
+   return IRQ_NONE;
+
+   dev_err_ratelimited(smmu->dev,
+   "Unexpected global fault, this could be serious\n");
+   dev_err_ratelimited(smmu->dev,
+   "\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 
0x%08x\n",
+   gfsr, gfsynr0, gfsynr1, gfsynr2);
+
+   writel_relaxed(gfsr, nsmmu_page(smmu, inst, 0) + ARM_SMMU_GR0_sGFSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nsmmu_global_fault(int irq, struct arm_smmu_device *smmu)
+{
+   int i;
+   irqreturn_t irq_ret = IRQ_NONE;
+
+   /* Interrupt line is shared between global and context faults.
+* Check for both type of interrupts on either fault handlers.
+*/
+   for (i = 0; i < to_nsmmu(smmu)->num_inst; i++) {
+   irq_ret = nsmmu_context_fault_inst(irq, smmu, 0, i);
+   if (irq_ret == IRQ_HANDLED)
+   return irq_ret;
+   }
+
+   for (i = 0; i < to_nsmmu(smmu)->num_inst; i++) {
+   irq_ret = nsmmu_global_fault_inst(irq, smmu, i);
+   if (irq_ret == IRQ_HANDLED)
+   return irq_ret;
+   }
+
+   return irq_ret;
+}
+
+static irqreturn_t nsmmu_context_fault_bank(int irq,
+   struct arm_smmu_device *smmu,
+   int idx, int inst)
+{
+   u32 fsr, fsynr, cbfrsynra;
+   unsigned long iova;
+
+   fsr = arm_smmu_cb_read(smmu, idx, ARM_SMMU_CB_FSR);
+   if (!(fsr & FSR_FAULT))
+   return IRQ_NONE;
+
+   fsynr = readl_relaxed(nsmmu_page(smmu, inst, smmu->numpage + idx) +
+ ARM_SMMU_CB_FSYNR0);
+   iova = readq_relaxed(nsmmu_page(smmu, inst, smmu->numpage + idx) +
+ARM_SMMU_CB_FAR);
+   cbfrsynra = readl_relaxed(nsmmu_page(smmu, inst, 1) +
+ ARM_SMMU_GR1_CBFRSYNRA(idx));
+
+   dev_err_ratelimited(smmu->dev,
+   "Unhandled context fault: fsr=0x%x, iova=0x%08lx, fsynr=0x%x, 
cbfrsynra=0x%x, cb=%d\n",
+   fsr, iova, fsynr, cbfrsynra, idx);
+
+   writel_relaxed(fsr, nsmmu_page(smmu, inst, smmu->numpage + idx) +
+   ARM_SMMU_CB_FSR);
+   return IRQ_HANDLED;
+}
+
+static irqreturn_t nsmmu_context_fault_inst(int irq,
+   struct arm_smmu_device *smmu,
+   int idx, int inst)
+{
+   irqreturn_t irq_ret = IRQ_NONE;
+
+   /* Interrupt line shared between global and all context faults.
+* Check for faults across all contexts.
+*/
+   for (idx = 0; idx < smmu->num_context_banks; idx++) {
+   irq_ret = nsmmu_context_fault_bank(irq, smmu, idx, inst);
+
+   if (irq_ret == IRQ_HANDLED)
+   break;
+   }
+
+   return irq_ret;
+}
+
+static irqreturn_t nsmmu_context_fault(int irq,
+  struct arm_smmu_device *smmu,
+  int cbndx)
+{
+   int i;
+   irqreturn_t irq_ret = IRQ_NONE;
+
+   /* Interrupt line is shared between g

[PATCH 5/7] arm64: tegra: Add Memory controller DT node on T194

2019-08-29 Thread Krishna Reddy
Add Memory controller DT node on T194 and enable it.
This patch is a prerequisite for SMMU enable on T194.

Signed-off-by: Krishna Reddy 
---
 arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi | 4 
 arch/arm64/boot/dts/nvidia/tegra194.dtsi   | 7 +++
 2 files changed, 11 insertions(+)

diff --git a/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi 
b/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi
index 62e07e11..4b3441b 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi
@@ -47,6 +47,10 @@
};
};
 
+   memory-controller@2c0 {
+   status = "okay";
+   };
+
serial@311 {
status = "okay";
};
diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi 
b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
index adebbbf..d906958 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 
 / {
compatible = "nvidia,tegra194";
@@ -130,6 +131,12 @@
};
};
 
+   memory-controller@2c0 {
+   compatible = "nvidia,tegra186-mc";
+   reg = <0x02c0 0xb>;
+   status = "disabled";
+   };
+
uarta: serial@310 {
compatible = "nvidia,tegra194-uart", 
"nvidia,tegra20-uart";
reg = <0x0310 0x40>;
-- 
2.1.4



[PATCH 1/7] iommu/arm-smmu: add Nvidia SMMUv2 implementation

2019-08-29 Thread Krishna Reddy
Add Nvidia SMMUv2 implementation and model info.

Signed-off-by: Krishna Reddy 
---
 MAINTAINERS |  2 +
 drivers/iommu/Makefile  |  2 +-
 drivers/iommu/arm-smmu-impl.c   |  2 +
 drivers/iommu/arm-smmu-nvidia.c | 97 +
 drivers/iommu/arm-smmu.c|  2 +
 drivers/iommu/arm-smmu.h|  2 +
 6 files changed, 106 insertions(+), 1 deletion(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 289fb06..b9d59e51 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -15785,9 +15785,11 @@ F: drivers/i2c/busses/i2c-tegra.c
 
 TEGRA IOMMU DRIVERS
 M: Thierry Reding 
+R: Krishna Reddy 
 L: linux-te...@vger.kernel.org
 S: Supported
 F: drivers/iommu/tegra*
+F: drivers/iommu/arm-smmu-nvidia.c
 
 TEGRA KBC DRIVER
 M: Laxman Dewangan 
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index a2729aa..7f5489e 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -13,7 +13,7 @@ obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o
 obj-$(CONFIG_AMD_IOMMU) += amd_iommu.o amd_iommu_init.o
 obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += amd_iommu_debugfs.o
 obj-$(CONFIG_AMD_IOMMU_V2) += amd_iommu_v2.o
-obj-$(CONFIG_ARM_SMMU) += arm-smmu.o arm-smmu-impl.o
+obj-$(CONFIG_ARM_SMMU) += arm-smmu.o arm-smmu-impl.o arm-smmu-nvidia.o
 obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
 obj-$(CONFIG_DMAR_TABLE) += dmar.o
 obj-$(CONFIG_INTEL_IOMMU) += intel-iommu.o intel-pasid.o
diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index 5c87a38..e5e595f 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -162,6 +162,8 @@ struct arm_smmu_device *arm_smmu_impl_init(struct 
arm_smmu_device *smmu)
break;
case CAVIUM_SMMUV2:
return cavium_smmu_impl_init(smmu);
+   case NVIDIA_SMMUV2:
+   return nvidia_smmu_impl_init(smmu);
default:
break;
}
diff --git a/drivers/iommu/arm-smmu-nvidia.c b/drivers/iommu/arm-smmu-nvidia.c
new file mode 100644
index 000..d93ceda
--- /dev/null
+++ b/drivers/iommu/arm-smmu-nvidia.c
@@ -0,0 +1,97 @@
+// SPDX-License-Identifier: GPL-2.0-only
+// Nvidia ARM SMMU v2 implementation quirks
+// Copyright (C) 2019 NVIDIA CORPORATION.  All rights reserved.
+
+#define pr_fmt(fmt) "nvidia-smmu: " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "arm-smmu.h"
+
+#define NUM_SMMU_INSTANCES 3
+
+struct nvidia_smmu {
+   struct arm_smmu_device  smmu;
+   int num_inst;
+   void __iomem*bases[NUM_SMMU_INSTANCES];
+};
+
+#define to_nsmmu(s)container_of(s, struct nvidia_smmu, smmu)
+
+#define nsmmu_page(smmu, inst, page) \
+   (((inst) ? to_nsmmu(smmu)->bases[(inst)] : smmu->base) + \
+   ((page) << smmu->pgshift))
+
+static u32 nsmmu_read_reg(struct arm_smmu_device *smmu,
+ int page, int offset)
+{
+   return readl_relaxed(nsmmu_page(smmu, 0, page) + offset);
+}
+
+static void nsmmu_write_reg(struct arm_smmu_device *smmu,
+   int page, int offset, u32 val)
+{
+   int i;
+
+   for (i = 0; i < to_nsmmu(smmu)->num_inst; i++)
+   writel_relaxed(val, nsmmu_page(smmu, i, page) + offset);
+}
+
+static u64 nsmmu_read_reg64(struct arm_smmu_device *smmu,
+   int page, int offset)
+{
+   return readq_relaxed(nsmmu_page(smmu, 0, page) + offset);
+}
+
+static void nsmmu_write_reg64(struct arm_smmu_device *smmu,
+ int page, int offset, u64 val)
+{
+   int i;
+
+   for (i = 0; i < to_nsmmu(smmu)->num_inst; i++)
+   writeq_relaxed(val, nsmmu_page(smmu, i, page) + offset);
+}
+
+static const struct arm_smmu_impl nsmmu_impl = {
+   .read_reg = nsmmu_read_reg,
+   .write_reg = nsmmu_write_reg,
+   .read_reg64 = nsmmu_read_reg64,
+   .write_reg64 = nsmmu_write_reg64,
+};
+
+struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu)
+{
+   int i;
+   struct nvidia_smmu *nsmmu;
+   struct resource *res;
+   struct device *dev = smmu->dev;
+   struct platform_device *pdev = to_platform_device(smmu->dev);
+
+   nsmmu = devm_kzalloc(smmu->dev, sizeof(*nsmmu), GFP_KERNEL);
+   if (!nsmmu)
+   return ERR_PTR(-ENOMEM);
+
+   nsmmu->smmu = *smmu;
+   /* Instance 0 is ioremapped by arm-smmu.c */
+   nsmmu->num_inst = 1;
+
+   for (i = 1; i < NUM_SMMU_INSTANCES; i++) {
+   res = platform_get_resource(pdev, IORESOURCE_MEM, i);
+   if (!res)
+   break;
+   nsmmu->bases[i] = devm_ioremap_resource(dev, res);
+   if (IS_ERR(nsmmu->bases[i]))
+   return (struct arm_smmu_device *)nsmmu->bases[i];
+

[PATCH 7/7] arm64: tegra: enable SMMU for SDHCI and EQOS

2019-08-29 Thread Krishna Reddy
Enable SMMU translations for SDHCI and EQOS transactions.

Signed-off-by: Krishna Reddy 
---
 arch/arm64/boot/dts/nvidia/tegra194.dtsi | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi 
b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
index ad509bb..0496a87 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
@@ -51,6 +51,7 @@
clock-names = "master_bus", "slave_bus", "rx", "tx", 
"ptp_ref";
resets = < TEGRA194_RESET_EQOS>;
reset-names = "eqos";
+   iommus = < TEGRA186_SID_EQOS>;
status = "disabled";
 
snps,write-requests = <1>;
@@ -381,6 +382,7 @@
clock-names = "sdhci";
resets = < TEGRA194_RESET_SDMMC1>;
reset-names = "sdhci";
+   iommus = < TEGRA186_SID_SDMMC1>;
nvidia,pad-autocal-pull-up-offset-3v3-timeout =
<0x07>;
nvidia,pad-autocal-pull-down-offset-3v3-timeout =
@@ -403,6 +405,7 @@
clock-names = "sdhci";
resets = < TEGRA194_RESET_SDMMC3>;
reset-names = "sdhci";
+   iommus = < TEGRA186_SID_SDMMC3>;
nvidia,pad-autocal-pull-up-offset-1v8 = <0x00>;
nvidia,pad-autocal-pull-down-offset-1v8 = <0x7a>;
nvidia,pad-autocal-pull-up-offset-3v3-timeout = <0x07>;
@@ -430,6 +433,7 @@
  < TEGRA194_CLK_PLLC4>;
resets = < TEGRA194_RESET_SDMMC4>;
reset-names = "sdhci";
+   iommus = < TEGRA186_SID_SDMMC4>;
nvidia,pad-autocal-pull-up-offset-hs400 = <0x00>;
nvidia,pad-autocal-pull-down-offset-hs400 = <0x00>;
nvidia,pad-autocal-pull-up-offset-1v8-timeout = <0x0a>;
-- 
2.1.4



[PATCH 2/7] dt-bindings: arm-smmu: Add binding for nvidia,smmu-v2

2019-08-29 Thread Krishna Reddy
Add binding doc for Nvidia's smmu-v2 implementation.

Signed-off-by: Krishna Reddy 
---
 Documentation/devicetree/bindings/iommu/arm,smmu.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt 
b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
index 3133f3b..0de3759 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.txt
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
@@ -17,6 +17,7 @@ conditions.
 "arm,mmu-401"
 "arm,mmu-500"
 "cavium,smmu-v2"
+"nidia,smmu-v2"
 "qcom,smmu-v2"
 
   depending on the particular implementation and/or the
-- 
2.1.4



[PATCH 0/7] Nvidia Arm SMMUv2 Implementation

2019-08-29 Thread Krishna Reddy
Hi All,
NVIDIA's Arm SMMUv2 implementation has two ARM SMMU (MMU-500) instances
that are used together for SMMU translations. The IOVA accesses from
HW devices are interleaved across these two SMMU instances, and the instances
need to be programmed identically except during TLB sync and fault handling.

This patch set adds the NVIDIA Arm SMMUv2 implementation on top of the ARM SMMU
driver to handle the NVIDIA-specific integration. It also adds
hooks for TLB sync and fault handling so that vendor-specific
implementations can override them.
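
For a quick picture of what "programmed identically" means in practice, the
register accessor overrides in patch 1 serve reads from instance 0 and mirror
writes to every instance (simplified excerpt from this series):

static u32 nsmmu_read_reg(struct arm_smmu_device *smmu, int page, int offset)
{
	/* Reads always come from instance 0. */
	return readl_relaxed(nsmmu_page(smmu, 0, page) + offset);
}

static void nsmmu_write_reg(struct arm_smmu_device *smmu, int page,
			    int offset, u32 val)
{
	int i;

	/* Writes are fanned out to every instance. */
	for (i = 0; i < to_nsmmu(smmu)->num_inst; i++)
		writel_relaxed(val, nsmmu_page(smmu, i, page) + offset);
}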

Please review the patch set and provide the feedback.

This patch set is based on the following branch as it is dependent on the
Arm SMMU Refactor changes from Robin Murphy that are present in this branch.

https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git 
for-joerg/arm-smmu/updates


Krishna Reddy (7):
  iommu/arm-smmu: add Nvidia SMMUv2 implementation
  dt-bindings: arm-smmu: Add binding for nvidia,smmu-v2
  iommu/arm-smmu: Add tlb_sync implementation hook
  iommu/arm-smmu: Add global/context fault implementation hooks
  arm64: tegra: Add Memory controller DT node on T194
  arm64: tegra: Add DT node for T194 SMMU
  arm64: tegra: enable SMMU for SDHCI and EQOS

 .../devicetree/bindings/iommu/arm,smmu.txt |   1 +
 MAINTAINERS|   2 +
 arch/arm64/boot/dts/nvidia/tegra194-p2888.dtsi |   4 +
 arch/arm64/boot/dts/nvidia/tegra194.dtsi   |  86 +++
 drivers/iommu/Makefile |   2 +-
 drivers/iommu/arm-smmu-impl.c  |   2 +
 drivers/iommu/arm-smmu-nvidia.c| 256 +
 drivers/iommu/arm-smmu.c   |  16 +-
 drivers/iommu/arm-smmu.h   |  10 +
 9 files changed, 375 insertions(+), 4 deletions(-)
 create mode 100644 drivers/iommu/arm-smmu-nvidia.c

-- 
2.1.4



RE: [PATCH v3 2/6] iommu/arm-smmu: Add support to program multiple ARM SMMU's identically

2018-12-19 Thread Krishna Reddy
Hi Robin,
Thanks for the feedback :)

>The whole point of the library idea was to factor out the code in such a way 
>that all the details
>specific to a particular implementation can be kept together. But what this 
>patch does is insert
>Tegra194-specific handling all through the 'common' code, which is the exact 
>opposite of that concept
>and just makes more hard-to-maintain mess.

In an attempt to reuse most of the ARM SMMU implementation, which heavily 
relies on data from arm_smmu_device,
the library code was added with some functionality that is only usable by the 
Tegra194 SMMU driver.
For the v4 patches, I am working on adding a mechanism to override the 
writel/readl accessors in the library so that
the Tegra SMMU driver can override the read/write functions and handle 
programming of multiple instances on its own.
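
The shape of that override would be roughly the following (names are
illustrative only, not a final interface): the library would call these
accessors instead of readl/writel directly, so the Tegra driver can fan a
single logical write out to both instances.

struct arm_smmu_reg_accessors {
	u32  (*read_reg)(struct arm_smmu_device *smmu, int page, int offset);
	void (*write_reg)(struct arm_smmu_device *smmu, int page,
			  int offset, u32 val);
};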

>The amount of copy-paste duplication in patch #4 has the opposite problem - 
>about 95% of that isn't 
>Tegra194-specific at all (I mean, how many fsl_mc instances does it have?), 
>and having multiple copies of
> generic code with the potential to diverge is also not what anyone wants.

I have split the code so that the library only contains the code that deals 
with register programming,
and kept the platform driver code and DT parsing code out of the library, 
which allows the drivers to change
independently if necessary in the future.

> Plus I don't think ending up building 
> multiple separate drivers will even work in general - thanks to the current 
> state of
>bus_set_iommu() etc., you can't use the regular driver for your third SMMU at 
>the same time.

Good point!
From the code, platform_dma_configure/of_dma_configure/of_iommu_configure take care
of setting the right iommu_ops for devices based on the iommu DT node they have in 
the iommus=<> entry.

If iommu.c is updated to use dev->bus->dma_configure(), then it doesn't really 
need to use dev->bus->iommu_ops.
dev->bus->dma_configure() can be used to set dev->dma_ops to the right one, if 
dev->dma_ops is not
already set. 
If this approach looks good, I can make a patch to clean up the bus->iommu_ops 
usage-related code to allow
devices to use the specific SMMU instance they need.

>I think what really needs to be done is to conceptually split the driver into 
>"architecture" and "implementation"
> layers - at some point after the holidays we're probably going to sit down 
> and go through all the various quirks and
> specifics we know about to try and figure out what that should actually look 
> like.

If you can provide some high-level details on what to keep in the library vs. the 
implementation after the holidays, I would be
happy to rework the patches. I will look forward to further discussions on 
this.

-KR



[PATCH v2 1/5] iommu/arm-smmu: rearrange arm-smmu.c code

2018-10-31 Thread Krishna Reddy
Rearrange arm-smmu.c code into arm-smmu-common.h, arm-smmu-common.c
and arm-smmu.c.
This patch rearranges the arm-smmu.c code to allow sharing the ARM SMMU
driver code with dual ARM SMMU based Tegra194 SMMU driver.

Signed-off-by: Krishna Reddy 
---
 drivers/iommu/arm-smmu-common.c | 1922 +++
 drivers/iommu/arm-smmu-common.h |  256 +
 drivers/iommu/arm-smmu.c| 2133 +--
 3 files changed, 2180 insertions(+), 2131 deletions(-)
 create mode 100644 drivers/iommu/arm-smmu-common.c
 create mode 100644 drivers/iommu/arm-smmu-common.h

diff --git a/drivers/iommu/arm-smmu-common.c b/drivers/iommu/arm-smmu-common.c
new file mode 100644
index 000..1ad8e5f
--- /dev/null
+++ b/drivers/iommu/arm-smmu-common.c
@@ -0,0 +1,1922 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) 2013 ARM Limited
+ *
+ * Author: Will Deacon 
+ */
+
+static int force_stage;
+module_param(force_stage, int, S_IRUGO);
+MODULE_PARM_DESC(force_stage,
+   "Force SMMU mappings to be installed at a particular stage of 
translation. A value of '1' or '2' forces the corresponding stage. All other 
values are ignored (i.e. no stage is forced). Note that selecting a specific 
stage will disable support for nested translation.");
+static bool disable_bypass;
+module_param(disable_bypass, bool, S_IRUGO);
+MODULE_PARM_DESC(disable_bypass,
+   "Disable bypass streams such that incoming transactions from devices 
that are not attached to an iommu domain will report an abort back to the 
device and will not be allowed to pass through the SMMU.");
+
+#define ARM_SMMU_MATCH_DATA(name, ver, imp)\
+static struct arm_smmu_match_data name = { .version = ver, .model = imp }
+
+static void arm_smmu_tlb_sync_global(struct arm_smmu_device *smmu);
+static void arm_smmu_tlb_sync_context(void *cookie);
+static irqreturn_t arm_smmu_context_fault(int irq, void *dev);
+static irqreturn_t arm_smmu_global_fault(int irq, void *dev);
+
+static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
+{
+   return container_of(dom, struct arm_smmu_domain, domain);
+}
+
+static void parse_driver_options(struct arm_smmu_device *smmu)
+{
+   int i = 0;
+
+   do {
+   if (of_property_read_bool(smmu->dev->of_node,
+   arm_smmu_options[i].prop)) {
+   smmu->options |= arm_smmu_options[i].opt;
+   dev_notice(smmu->dev, "option %s\n",
+   arm_smmu_options[i].prop);
+   }
+   } while (arm_smmu_options[++i].opt);
+}
+
+static struct device_node *dev_get_dev_node(struct device *dev)
+{
+   if (dev_is_pci(dev)) {
+   struct pci_bus *bus = to_pci_dev(dev)->bus;
+
+   while (!pci_is_root_bus(bus))
+   bus = bus->parent;
+   return of_node_get(bus->bridge->parent->of_node);
+   }
+
+   return of_node_get(dev->of_node);
+}
+
+static int __arm_smmu_get_pci_sid(struct pci_dev *pdev, u16 alias, void *data)
+{
+   *((__be32 *)data) = cpu_to_be32(alias);
+   return 0; /* Continue walking */
+}
+
+static int __find_legacy_master_phandle(struct device *dev, void *data)
+{
+   struct of_phandle_iterator *it = *(void **)data;
+   struct device_node *np = it->node;
+   int err;
+
+   of_for_each_phandle(it, err, dev->of_node, "mmu-masters",
+   "#stream-id-cells", 0)
+   if (it->node == np) {
+   *(void **)data = dev;
+   return 1;
+   }
+   it->node = np;
+   return err == -ENOENT ? 0 : err;
+}
+
+static struct platform_driver arm_smmu_driver;
+static struct iommu_ops arm_smmu_ops;
+
+static int arm_smmu_register_legacy_master(struct device *dev,
+  struct arm_smmu_device **smmu)
+{
+   struct device *smmu_dev;
+   struct device_node *np;
+   struct of_phandle_iterator it;
+   void *data = 
+   u32 *sids;
+   __be32 pci_sid;
+   int err;
+
+   np = dev_get_dev_node(dev);
+   if (!np || !of_find_property(np, "#stream-id-cells", NULL)) {
+   of_node_put(np);

[PATCH v2 0/5] Add Tegra194 Dual ARM SMMU driver

2018-10-31 Thread Krishna Reddy
NVIDIA's Xavier (Tegra194) SoC has three ARM SMMU (MMU-500) instances.
Two of the SMMU instances are used to interleave IOVA accesses across them.
The IOVA accesses from HW devices are interleaved across these two SMMU instances
and the instances need to be programmed identically.

The existing ARM SMMU driver can't be used in its current form for programming the
two SMMU instances identically. However, most of the code can be shared between the
ARM SMMU driver and the Tegra194 SMMU driver.
Page fault handling and TLB sync operations need to know about the specific SMMU
instance for correct fault handling and an optimal TLB sync wait. The rest of the
code doesn't need to know about the number of SMMU instances. Based on this, the
patch series here rearranges the arm-smmu.c code so that most of the ARM SMMU
programming/iommu_ops code is shared between the ARM SMMU driver and the Tegra194
SMMU driver, and the programming of the two SMMU instances is handled transparently.

The third SMMU instance would use the existing ARM SMMU driver.
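
To illustrate the "optimal TLB sync wait" point: a single logical sync kicks the
sync register on every instance first and only then waits for each one, so the
waits overlap instead of being serialized. A rough sketch follows (helper names
are illustrative, not the actual code in this series):

static void tegra_smmu_tlb_sync(struct arm_smmu_device *smmu, int page,
				int sync, int status)
{
	int i;

	/* Kick the sync on every instance before waiting on any of them. */
	for (i = 0; i < num_instances(smmu); i++)
		writel_relaxed(0, instance_page(smmu, i, page) + sync);

	/* Now wait for each instance to drain its TLB sync. */
	for (i = 0; i < num_instances(smmu); i++)
		wait_for_tlb_sync(smmu, i, page, status);
}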

Changes in v2:
* Added CONFIG_ARM_SMMU_TEGRA to protect Tegra194 SMMU driver compilation
* Enabled CONFIG_ARM_SMMU_TEGRA in defconfig
* Added SMMU nodes in Tegra194 device tree

Krishna Reddy (5):
  iommu/arm-smmu: rearrange arm-smmu.c code
  iommu/arm-smmu: Prepare fault, probe, sync functions for sharing code
  iommu/tegra194_smmu: Add Tegra194 SMMU driver
  arm64: defconfig: Enable ARM_SMMU_TEGRA
  arm64: tegra: Add SMMU nodes to Tegra194 device tree

 arch/arm64/boot/dts/nvidia/tegra194.dtsi |  148 ++
 arch/arm64/configs/defconfig |1 +
 drivers/iommu/Kconfig|   10 +
 drivers/iommu/Makefile   |1 +
 drivers/iommu/arm-smmu-common.c  | 1971 +++
 drivers/iommu/arm-smmu-common.h  |  256 
 drivers/iommu/arm-smmu.c | 2167 +-
 drivers/iommu/tegra194-smmu.c|  201 +++
 8 files changed, 2595 insertions(+), 2160 deletions(-)
 create mode 100644 drivers/iommu/arm-smmu-common.c
 create mode 100644 drivers/iommu/arm-smmu-common.h
 create mode 100644 drivers/iommu/tegra194-smmu.c

-- 
2.1.4



[PATCH 2/2] arm64: dts: tegra186: Enable IOMMU for SDHCI

2018-10-16 Thread Krishna Reddy
Enable IOMMU for all SDHCI controllers in Tegra186.

Signed-off-by: Krishna Reddy 
---
 arch/arm64/boot/dts/nvidia/tegra186.dtsi | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/boot/dts/nvidia/tegra186.dtsi 
b/arch/arm64/boot/dts/nvidia/tegra186.dtsi
index 230c0c8..996997e 100644
--- a/arch/arm64/boot/dts/nvidia/tegra186.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra186.dtsi
@@ -234,6 +234,7 @@
compatible = "nvidia,tegra186-sdhci";
reg = <0x0 0x0340 0x0 0x1>;
interrupts = ;
+   iommus = < TEGRA186_SID_SDMMC1>;
clocks = < TEGRA186_CLK_SDMMC1>;
clock-names = "sdhci";
resets = < TEGRA186_RESET_SDMMC1>;
@@ -259,6 +260,7 @@
compatible = "nvidia,tegra186-sdhci";
reg = <0x0 0x0342 0x0 0x1>;
interrupts = ;
+   iommus = < TEGRA186_SID_SDMMC2>;
clocks = < TEGRA186_CLK_SDMMC2>;
clock-names = "sdhci";
resets = < TEGRA186_RESET_SDMMC2>;
@@ -279,6 +281,7 @@
compatible = "nvidia,tegra186-sdhci";
reg = <0x0 0x0344 0x0 0x1>;
interrupts = ;
+   iommus = < TEGRA186_SID_SDMMC3>;
clocks = < TEGRA186_CLK_SDMMC3>;
clock-names = "sdhci";
resets = < TEGRA186_RESET_SDMMC3>;
@@ -301,6 +304,7 @@
compatible = "nvidia,tegra186-sdhci";
reg = <0x0 0x0346 0x0 0x1>;
interrupts = ;
+   iommus = < TEGRA186_SID_SDMMC4>;
clocks = < TEGRA186_CLK_SDMMC4>;
clock-names = "sdhci";
assigned-clocks = < TEGRA186_CLK_SDMMC4>,
-- 
2.1.4



[PATCH 1/2] arm64: dts: tegra186: Add dma-ranges to avoid using bounce buffers

2018-10-16 Thread Krishna Reddy
Add dma-ranges to avoid using DMA bounce buffers unnecessarily for
the devices that can address the physical memory and don't
have SMMU enabled.

This also resolves the failures in attaching devices to IOMMU.
The following error is caused by the check in io-pgtable-arm.c, where
the dma address is expected to match the physical address for the IOMMU
devices that don't support coherent page table walking. Bounce buffer
usage is causing the mismatch and device add failure.

[7.000461] arm-smmu 1200.iommu: Cannot accommodate DMA
translation for IOMMU page tables
[7.010513] iommu: Failed to add device 1520.display to group 0:
-12

Signed-off-by: Krishna Reddy 
---
 arch/arm64/boot/dts/nvidia/tegra186.dtsi | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/boot/dts/nvidia/tegra186.dtsi 
b/arch/arm64/boot/dts/nvidia/tegra186.dtsi
index 2f3c8e2..230c0c8 100644
--- a/arch/arm64/boot/dts/nvidia/tegra186.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra186.dtsi
@@ -14,6 +14,7 @@
interrupt-parent = <>;
#address-cells = <2>;
#size-cells = <2>;
+   dma-ranges = <0x0 0x0 0x0 0x0 0x4 0x0>;
 
misc@10 {
compatible = "nvidia,tegra186-misc";
-- 
2.1.4



[PATCH] arm64: mm: Set MAX_PHYSMEM_BITS based on ARM64_VA_BITS

2017-11-09 Thread Krishna Reddy
MAX_PHYSMEM_BITS greater than ARM64_VA_BITS is causing a memory
access fault when the HMM_DMIRROR test is enabled.
In the failing case, ARM64_VA_BITS=39 and MAX_PHYSMEM_BITS=48.
The HMM_DMIRROR test selects a phys memory range from the end based on
MAX_PHYSMEM_BITS, which gets mapped into VA space linearly.
As the VA space is 39-bit and the phys space is 48-bit, this causes an
incorrect mapping and leads to a memory access fault.

Limiting MAX_PHYSMEM_BITS to ARM64_VA_BITS fixes the issue and is
the right thing to do instead of hard-coding it to 48 bits.

[3.378655] Unable to handle kernel paging request at virtual address 
3befd00
[3.378662] pgd = ff800a04b000
[3.378900] [3befd00] *pgd=81fa3003, *pud=81fa3003, 
*pmd=006268200711
[3.378933] Internal error: Oops: 9644 [#1] PREEMPT SMP
[3.378938] Modules linked in:
[3.378948] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 
4.9.52-tegra-g91402fdc013b-dirty #51
[3.378950] Hardware name: quill (DT)
[3.378954] task: ffc1ebac task.stack: ffc1eba64000
[3.378967] PC is at __memset+0x1ac/0x1d0
[3.378976] LR is at sparse_add_one_section+0xf8/0x174
[3.378981] pc : [] lr : [] pstate: 
404000c5
[3.378983] sp : ffc1eba67a40
[3.378993] x29: ffc1eba67a40 x28: 
[3.378999] x27: 0003 x26: 0040
[3.379005] x25: 03ff x24: ffc1e9f6cf80
[3.379010] x23: ff8009ecb2d4 x22: 03befd00
[3.379015] x21: ffc1e9923ff0 x20: 0003
[3.379020] x19: ffef x18: 
[3.379025] x17: 24d7 x16: 
[3.379030] x15: ff8009cd8690 x14: ffc1e9f6c70c
[3.379035] x13: ffc1e9f6c70b x12: 0030
[3.379039] x11: 0040 x10: 0101010101010101
[3.379044] x9 :  x8 : 03befd00
[3.379049] x7 :  x6 : 003f
[3.379053] x5 : 0040 x4 : 
[3.379058] x3 : 0004 x2 : 00c0
[3.379063] x1 :  x0 : 03befd00
[3.379064]
[3.379069] Process swapper/0 (pid: 1, stack limit = 0xffc1eba64028)
[3.379071] Call trace:
[3.379079] [] __memset+0x1ac/0x1d0
[3.379085] [] __add_pages+0x130/0x2e0
[3.379093] [] hmm_devmem_pages_create+0x20c/0x310
[3.379100] [] hmm_devmem_add+0x1d4/0x270
[3.379128] [] dmirror_probe+0x50/0x158
[3.379137] [] platform_drv_probe+0x60/0xc8
[3.379143] [] driver_probe_device+0x26c/0x420
[3.379149] [] __driver_attach+0x124/0x128
[3.379155] [] bus_for_each_dev+0x88/0xe8
[3.379166] [] driver_attach+0x30/0x40
[3.379171] [] bus_add_driver+0x1f8/0x2b0
[3.379177] [] driver_register+0x68/0x100
[3.379183] [] __platform_driver_register+0x5c/0x68
[3.379192] [] hmm_dmirror_init+0x88/0xc4
[3.379200] [] do_one_initcall+0x5c/0x170
[3.379208] [] kernel_init_freeable+0x1b8/0x258
[3.379231] [] kernel_init+0x18/0x108
[3.379236] [] ret_from_fork+0x10/0x40
[3.379246] ---[ end trace 578db63bb139b8b8 ]---

Signed-off-by: Krishna Reddy <vdu...@nvidia.com>
---
 arch/arm64/include/asm/sparsemem.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/arm64/include/asm/sparsemem.h 
b/arch/arm64/include/asm/sparsemem.h
index 74a9d301819f..19ecd0b0f3a3 100644
--- a/arch/arm64/include/asm/sparsemem.h
+++ b/arch/arm64/include/asm/sparsemem.h
@@ -17,7 +17,13 @@
 #define __ASM_SPARSEMEM_H
 
 #ifdef CONFIG_SPARSEMEM
+
+#ifdef CONFIG_ARM64_VA_BITS
+#define MAX_PHYSMEM_BITS   CONFIG_ARM64_VA_BITS
+#else
 #define MAX_PHYSMEM_BITS   48
+#endif
+
 #define SECTION_SIZE_BITS  30
 #endif
 
-- 
2.1.4



[PATCH] arm64: tegra: Add SMMU node for Tegra186

2017-09-13 Thread Krishna Reddy
Add the DT node for ARM SMMU on Tegra186.

Signed-off-by: Krishna Reddy <vdu...@nvidia.com>
---
 arch/arm64/boot/dts/nvidia/tegra186.dtsi | 73 
 1 file changed, 73 insertions(+)

diff --git a/arch/arm64/boot/dts/nvidia/tegra186.dtsi 
b/arch/arm64/boot/dts/nvidia/tegra186.dtsi
index 0b0552c9f7dd..e2c3ad203c93 100644
--- a/arch/arm64/boot/dts/nvidia/tegra186.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra186.dtsi
@@ -355,6 +355,79 @@
nvidia,bpmp = <>;
};
 
+   smmu: iommu@1200 {
+   compatible = "arm,mmu-500";
+   reg = <0 0x1200 0 0x80>;
+   #global-interrupts = <1>;
+   interrupts = ,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+;
+   #iommu-cells = <1>;
+   stream-match-mask = <0x7F80>;
+   };
+
gpu@1700 {
compatible = "nvidia,gp10b";
reg = <0x0 0x1700 0x0 0x100>,
-- 
2.1.4



[PATCH] mmc: tegra: Mark 64 bit dma broken on Tegra186

2017-09-08 Thread Krishna Reddy
SDHCI controllers on Tegra186 support 40-bit addressing.
IOVA addresses are 48-bit wide on Tegra186.
The SDHCI host common code sets the DMA mask to either 32-bit or 64-bit.
To avoid access issues when the SMMU is enabled, disable 64-bit DMA.

Signed-off-by: Krishna Reddy <vdu...@nvidia.com>
---
 drivers/mmc/host/sdhci-tegra.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/mmc/host/sdhci-tegra.c b/drivers/mmc/host/sdhci-tegra.c
index 0cd6fa80db66..b877c13184c2 100644
--- a/drivers/mmc/host/sdhci-tegra.c
+++ b/drivers/mmc/host/sdhci-tegra.c
@@ -422,7 +422,15 @@ static const struct sdhci_pltfm_data sdhci_tegra186_pdata 
= {
  SDHCI_QUIRK_NO_HISPD_BIT |
  SDHCI_QUIRK_BROKEN_ADMA_ZEROLEN_DESC |
  SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN,
-   .quirks2 = SDHCI_QUIRK2_PRESET_VALUE_BROKEN,
+   .quirks2 = SDHCI_QUIRK2_PRESET_VALUE_BROKEN |
+  /* SDHCI controllers on Tegra186 support 40-bit addressing.
+   * IOVA addresses are 48-bit wide on Tegra186.
+   * With 64-bit dma mask used for SDHCI, accesses can
+   * be broken. Disable 64-bit dma, which would fall back
+   * to 32-bit dma mask. Ideally 40-bit dma mask would work,
+   * But it is not supported as of now.
+   */
+  SDHCI_QUIRK2_BROKEN_64_BIT_DMA,
.ops  = _sdhci_ops,
 };
 
-- 
2.1.4



RE: [PATCH] iommu/dma: limit the IOVA allocated to dma-ranges region

2017-09-08 Thread Krishna Reddy
>OK, so that's really just another variant of the existing problem we have with 
>certain PCI root complexes with restrictive inbound windows.
>The appropriate way to handle that is to reserve the unusable areas of the 
>IOVA space up-front.
> Since the support for the ACPI equivalent of "dma-ranges" has just landed, 
> this is now pretty much top of my to-do-list for
> the upcoming cycle (there's various things still to fix in the DT code, but 
> that's essentially part of the same job).

Reserving upfront would work. If you already have it in your to-do list, we will 
wait for your patches.
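
For what it's worth, a rough sketch of the up-front reservation with the
existing IOVA allocator API (reserve_iova()/iova_pfn()) is below;
mmio_start/mmio_end are placeholders for the device-inaccessible window:

	/* Carve the inaccessible window out of the IOVA space once, at
	 * domain setup, so it is never handed out to a device.
	 */
	reserve_iova(iovad, iova_pfn(iovad, mmio_start),
		     iova_pfn(iovad, mmio_end));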

>In the case of a 64-bit-capable IP block with only 34 bits of address wired up 
>externally, if that 34-bit interconnect is described by "dma-ranges"
> then the device will already be created with an appropriate 34-bit DMA mask. 
> The fact that the driver can stomp on that with a 64-bit mask later 
>is entirely down to the implementations of
>dma_set_mask() etc. (I've had a patch to restrict masks for arm64 for a while, 
>but I worry that it carries quite a high risk of breakage in default cases).

Exactly, it is the stomping issue. The dma-ranges code updates the mask earlier 
than the sdhci code, and the sdhci code then overwrites the DMA mask afterwards.
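
As an aside, if the core honoured the narrower bus width, the fix on the driver
side would amount to requesting the real mask with a fallback. This is
illustrative only, not part of this series; dev stands for the SDHCI device:

	/* Request the real 34-bit limit; fall back to 32-bit if unsupported. */
	if (dma_set_mask_and_coherent(dev, DMA_BIT_MASK(34)))
		dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));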

-KR


RE: [PATCH] iommu/dma: limit the IOVA allocated to dma-ranges region

2017-09-05 Thread Krishna Reddy
Robin,
>>Why? IOVA allocation is already constrained as much as it should be - if the 
>>device's DMA mask is wrong that's another problem, and this isn't the right 
>>place to fix it.

This is because of the following reasons.
1. Many of the HW modules in Tegra114 and Tegra30 can't access IOVAs that 
overlap with MMIO. Even though these HW modules support 32-bit addressing, they 
can't access IOVAs overlapping with the range 0xff00: to 0x:f, which is the 
MMIO region for high vectors. These modules can only see 0x8000: to 
0xff00: as IOVA. If the DMA mask is set to 31-bit, the IOVA range reduces 
from ~2GB to 1GB. The patch makes most of the IOVA range usable by using the 
dma-ranges property. As we already use the dma-ranges start as the bottom of the IOVA 
range, the patch restricts the IOVA top as well using the dma-ranges range. 
2. Most of the drivers in the Linux kernel set the mask to either 32-bit or 
64-bit. In particular, I am referring to drivers/mmc/host/sdhci.c 
(sdhci_set_dma_mask()) here. In Tegra124/210/186, HW modules support 34-bit 
addressing, but sdhci only supports setting the mask to either 32-bit or 64-bit. 
As you pointed out, this can be fixed by changing the mask in the sdhci common 
code, or this patch can help here as well without changing the driver code. 
Reason 1 is the main driving factor for this patch.

>>dma_32bit_pfn means nothing more than an internal detail of IOVA allocator 
>>caching, which is subject to change[1]. As-is, on some platforms this patch 
>>will effectively force all allocations to fail already.

I see that your patches remove specifying the end of IOVA during
init_iova_domain() and use the mask as the end of the IOVA range. Do you have any
suggestion on how to handle 1 in a way that uses most of the IOVA range supported
by HW? Can the IOVA code look for dma-ranges on its own and limit the IOVA top to
the lowest of the mask and dma-ranges, if present? Or are there any other ways you
can think of?
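
For illustration, the clamp in the patch could also be written with min_t(), as
Joerg suggests in the review quoted below; this is a sketch only, reusing the
names from the patch:

        /* Limit IOVA allocated to device addressable dma-ranges region. */
        dma_end_addr = (dma_addr_t)iovad->dma_32bit_pfn << shift;
        dma_limit = min_t(dma_addr_t, dma_limit, dma_end_addr);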

-KR

-Original Message-
From: Robin Murphy [mailto:robin.mur...@arm.com] 
Sent: Friday, September 1, 2017 2:43 AM
To: Joerg Roedel <j...@8bytes.org>; Krishna Reddy <vdu...@nvidia.com>
Cc: io...@lists.linux-foundation.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] iommu/dma: limit the IOVA allocated to dma-ranges region

On 01/09/17 10:26, Joerg Roedel wrote:
> Adding Robin for review.
> 
> On Thu, Aug 31, 2017 at 03:08:21PM -0700, Krishna Reddy wrote:
>> Limit the IOVA allocated to dma-ranges specified for the device.
>> This is necessary to ensure that IOVA allocated is addressable by 
>> device.

Why? IOVA allocation is already constrained as much as it should be - if the 
device's DMA mask is wrong that's another problem, and this isn't the right 
place to fix it.

dma_32bit_pfn means nothing more than an internal detail of IOVA allocator 
caching, which is subject to change[1]. As-is, on some platforms this patch 
will effectively force all allocations to fail already.

Robin.

[1]:https://www.mail-archive.com/iommu@lists.linux-foundation.org/msg19694.html
>> Signed-off-by: Krishna Reddy <vdu...@nvidia.com>
>> ---
>>  drivers/iommu/dma-iommu.c | 5 +
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c 
>> index 9d1cebe7f6cb..e8a8320b571b 100644
>> --- a/drivers/iommu/dma-iommu.c
>> +++ b/drivers/iommu/dma-iommu.c
>> @@ -364,6 +364,7 @@ static dma_addr_t iommu_dma_alloc_iova(struct 
>> iommu_domain *domain,
>>  struct iommu_dma_cookie *cookie = domain->iova_cookie;
>>  struct iova_domain *iovad = &cookie->iovad;
>>  unsigned long shift, iova_len, iova = 0;
>> +dma_addr_t dma_end_addr;
>>  
>>  if (cookie->type == IOMMU_DMA_MSI_COOKIE) {
>>  cookie->msi_iova += size;
>> @@ -381,6 +382,10 @@ static dma_addr_t iommu_dma_alloc_iova(struct 
>> iommu_domain *domain,
>>  if (iova_len < (1 << (IOVA_RANGE_CACHE_MAX_SIZE - 1)))
>>  iova_len = roundup_pow_of_two(iova_len);
>>  
>> +/* Limit IOVA allocated to device addressable dma-ranges region. */
>> +dma_end_addr = (dma_addr_t)iovad->dma_32bit_pfn << shift;
>> +dma_limit = dma_limit > dma_end_addr ? dma_end_addr : dma_limit;
> 
> This looks like a good use-case for min().
> 
>> +
>>  if (domain->geometry.force_aperture)
>>  dma_limit = min(dma_limit, domain->geometry.aperture_end);
>>  
>> --
>> 2.1.4



[PATCH] iommu/dma: limit the IOVA allocated to dma-ranges region

2017-08-31 Thread Krishna Reddy
Limit the IOVA allocated to dma-ranges specified for the device.
This is necessary to ensure that IOVA allocated is addressable
by device.

Signed-off-by: Krishna Reddy <vdu...@nvidia.com>
---
 drivers/iommu/dma-iommu.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 9d1cebe7f6cb..e8a8320b571b 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -364,6 +364,7 @@ static dma_addr_t iommu_dma_alloc_iova(struct iommu_domain 
*domain,
struct iommu_dma_cookie *cookie = domain->iova_cookie;
struct iova_domain *iovad = &cookie->iovad;
unsigned long shift, iova_len, iova = 0;
+   dma_addr_t dma_end_addr;
 
if (cookie->type == IOMMU_DMA_MSI_COOKIE) {
cookie->msi_iova += size;
@@ -381,6 +382,10 @@ static dma_addr_t iommu_dma_alloc_iova(struct iommu_domain 
*domain,
if (iova_len < (1 << (IOVA_RANGE_CACHE_MAX_SIZE - 1)))
iova_len = roundup_pow_of_two(iova_len);
 
+   /* Limit IOVA allocated to device addressable dma-ranges region. */
+   dma_end_addr = (dma_addr_t)iovad->dma_32bit_pfn << shift;
+   dma_limit = dma_limit > dma_end_addr ? dma_end_addr : dma_limit;
+
if (domain->geometry.force_aperture)
dma_limit = min(dma_limit, domain->geometry.aperture_end);
 
-- 
2.1.4



RE: [HMM v17 09/14] mm/hmm/mirror: mirror process address space on device with HMM helpers

2017-03-13 Thread Krishna Reddy
+/*
+ * struct hmm_mirror_ops - HMM mirror device operations callback
+ *
+ * @update: callback to update range on a device
+ */
+struct hmm_mirror_ops {
+   /* update() - update virtual address range of memory
+    *
+    * @mirror: pointer to struct hmm_mirror
+    * @update: update's type (turn read only, unmap, ...)
+    * @start: virtual start address of the range to update
+    * @end: virtual end address of the range to update
...
+    */
+   void (*update)(struct hmm_mirror *mirror,
+                  enum hmm_update action,
+                  unsigned long start,
+                  unsigned long end);
+};

Minor arg documentation issue: @update should be @action.



RE: [RFC 0/5] ARM: dma-mapping: New dma_map_ops to control IOVA more precisely

2012-09-21 Thread Krishna Reddy
> > The device(H/W controller) need to access few special memory
> > blocks(IOVA==PA) and DRAM as well.
> 
> OK, so only /some/ of the VA space is VA==PA, and some is remapped; that's a
> little different that what you originally implied above.
> 
> BTW, which HW module is this; AVP/COP or something else. This sounds like an
> odd requirement.

This is not specific to ARM7. There are protected memory regions on Tegra that
can be accessed by some controllers like display, 2D, 3D, VDE, HDA. These are
DRAM regions configured as protected by the BootROM. These memory regions
are not exposed to and not managed by the OS page allocator. The H/W controller
accesses to these regions still go through the IOMMU.
The IOMMU view is not uniform across all the H/W controllers on Tegra.
Some controllers see the entire 4GB IOVA space, i.e. all accesses go through the IOMMU.
Some controllers see only the IOVA space that doesn't overlap with MMIO space, i.e.
accesses to MMIO addresses bypass the IOMMU and go directly to MMIO space.
The Tegra IOMMU can support multiple address spaces as well. To hide
controller-specific behavior, the drivers should take care of the one-to-one
mappings and remove inaccessible IOVA spaces from their address spaces based on
platform device info.

In my initial mail, I referred to the protected memory regions as MMIO blocks, which
is incorrect.




-KR



RE: [RFC 0/5] ARM: dma-mapping: New dma_map_ops to control IOVA more precisely

2012-09-20 Thread Krishna Reddy
> > On Tegra, the following use cases need specific IOVA mapping.
> > 1. Few MMIO blocks need IOVA=PA mapping setup.
> 
> In that case, why would we enable the IOMMU for that one device; IOMMU
> disabled means VA==PA, right? Perhaps isolation of the device so it can only
> access certain PA ranges for security?

The device (H/W controller) needs to access a few special memory blocks (IOVA==PA)
and DRAM as well. If the IOMMU is disabled, then it has to handle memory
fragmentation, which defeats the purpose of IOMMU support.
There is also a case where frame buffer memory is passed from the bootloader to
the kernel and the display H/W continues to access it with the IOMMU enabled. To
support this, the one-to-one mapping has to be set up before enabling the IOMMU.
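
A minimal sketch of that one-to-one setup, assuming the carveout base and size
are already known (fb_base and fb_size are placeholders here, and plain
iommu_map() is used rather than any Tegra-specific helper):

#include <linux/iommu.h>

/* Map the framebuffer carveout so that bus address == physical address. */
static int sketch_identity_map(struct iommu_domain *domain,
                               phys_addr_t fb_base, size_t fb_size)
{
        return iommu_map(domain, fb_base, fb_base, fb_size,
                         IOMMU_READ | IOMMU_WRITE);
}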

-KR



RE: [RFC 0/5] ARM: dma-mapping: New dma_map_ops to control IOVA more precisely

2012-09-19 Thread Krishna Reddy
> When a device driver would only use the IOMMU-API and needs small DMA-
> able areas it has to re-implement something like the DMA-API (basically an
> address allocator) for that. So I don't see a reason why both can't be used 
> in a
> device driver.

On Tegra, the following use cases need specific IOVA mappings.
1. A few MMIO blocks need an IOVA=PA mapping set up.
2. The CPU side loads the firmware into physical memory, which has to be
mapped to a specific IOVA address, as the firmware is statically linked against
that IOVA address.

The DMA APIs allow specifying only one address space per platform device.

For #1, the DMA API can't be used as it doesn't allow mapping a specific IOVA to a PA.
The IOMMU API can be used for mapping a specific IOVA to a PA. But in order to use
the IOMMU API, the driver has to dereference the dev pointer, get the domain pointer,
take the lock, and allocate memory from dma_iommu_mapping. This breaks
the abstraction of struct device. Each device driver that needs IOVA=PA has to
do this, which is redundant.

For #2, the physical memory allocation alone can be done through the DMA API, as it
also allocates IOVA space implicitly. But even after allocating physical memory
through the DMA API, it would have the same problem as #1 for the IOVA to PA mapping.
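
For illustration only, a sketch of use case #2 done with the IOMMU API at init
time; FW_FIXED_IOVA is a hypothetical link-time address and the allocation path
is simplified:

#include <linux/gfp.h>
#include <linux/iommu.h>
#include <linux/mm.h>

#define FW_FIXED_IOVA   0x10000000UL    /* hypothetical, from the firmware link map */

static int sketch_map_firmware(struct iommu_domain *domain, size_t fw_size)
{
        struct page *pages = alloc_pages(GFP_KERNEL, get_order(fw_size));

        if (!pages)
                return -ENOMEM;

        /* Pin the image at the IOVA the firmware expects to run from. */
        return iommu_map(domain, FW_FIXED_IOVA, page_to_phys(pages),
                         PAGE_ALIGN(fw_size), IOMMU_READ | IOMMU_WRITE);
}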

If a fake device is expected to be created for specific IOVA allocation, then it
may lead to creating multiple fake devices, one per specific IOVA and per
ASID (unique IOVA address space). As the domain init would be done based on the
device name, the fake device would have to have the same name as the original
platform device.

If the DMA API allowed allocating a specific IOVA address and mapping an IOVA to a
specific PA, the device driver wouldn't need to know any details of struct device;
specifying one mapping per device would be enough, and there would be no need for
fake devices.

Comments are much appreciated.

-KR


> -Original Message-
> From: Joerg Roedel [mailto:joerg.roe...@amd.com]
> Sent: Wednesday, September 19, 2012 5:50 AM
> To: Arnd Bergmann
> Cc: Hiroshi Doyu; m.szyprow...@samsung.com; li...@arm.linux.org.uk;
> minc...@kernel.org; chunsang.je...@linaro.org; linux-
> ker...@vger.kernel.org; subas...@gmail.com; linaro-mm-...@lists.linaro.org;
> linux...@kvack.org; io...@lists.linux-foundation.org; Krishna Reddy; linux-
> te...@vger.kernel.org; kyungmin.p...@samsung.com;
> pullip@samsung.com; linux-arm-ker...@lists.infradead.org
> Subject: Re: [RFC 0/5] ARM: dma-mapping: New dma_map_ops to control IOVA
> more precisely
> 
> On Wed, Sep 19, 2012 at 07:59:45AM +, Arnd Bergmann wrote:
> > On Wednesday 19 September 2012, Hiroshi Doyu wrote:
> > > I guess that it would work. Originally I thought that using DMA-API
> > > and IOMMU-API together in driver might be kind of layering violation
> > > since IOMMU-API itself is used in DMA-API. Only DMA-API used in
> > > driver might be cleaner. Considering that DMA API traditionally
> > > handling anonymous {bus,iova} address only, introducing the concept
> > > of specific address in DMA API may not be so encouraged, though.
> > >
> > > It would be nice to listen how other SoCs have solved similar needs.
> >
> > In general, I would recommend using only the IOMMU API when you have a
> > device driver that needs to control the bus virtual address space and
> > that manages a device that resides in its own IOMMU context. I would
> > recommend using only the dma-mapping API when you have a device that
> > lives in a shared bus virtual address space with other devices, and
> > then never ask for a specific bus virtual address.
> >
> > Can you explain what devices you see that don't fit in one of those
> > two categories?
> 
> Well, I don't think that a driver should limit to one of these 2 APIs. A 
> driver can
> very well use the IOMMU-API during initialization (for example to map the
> firmware to an address the device expects it to be) and use the DMA-API later
> during normal operation to exchange data with the device.
> 
> When a device driver would only use the IOMMU-API and needs small DMA-
> able areas it has to re-implement something like the DMA-API (basically an
> address allocator) for that. So I don't see a reason why both can't be used 
> in a
> device driver.
> 
> Regards,
> 
>   Joerg
> 
> --
> AMD Operating System Research Center
> 
> Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach General
> Managers: Alberto Bozzo
> Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr.
> 43632
