[PATCH] iommu/amd: Add prefetch iommu pages command build function

2020-09-05 Thread Wesley Sheng
Add function to build prefetch iommu pages command

Signed-off-by: Wesley Sheng 
---
 drivers/iommu/amd/amd_iommu_types.h |  2 ++
 drivers/iommu/amd/iommu.c   | 19 +++
 2 files changed, 21 insertions(+)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index baa31cd2411c..73734a0c4679 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -173,6 +173,7 @@
 #define CMD_INV_IOMMU_PAGES0x03
 #define CMD_INV_IOTLB_PAGES0x04
 #define CMD_INV_IRT0x05
+#define CMD_PF_IOMMU_PAGES 0x06
 #define CMD_COMPLETE_PPR   0x07
 #define CMD_INV_ALL0x08
 
@@ -181,6 +182,7 @@
 #define CMD_INV_IOMMU_PAGES_SIZE_MASK  0x01
 #define CMD_INV_IOMMU_PAGES_PDE_MASK   0x02
 #define CMD_INV_IOMMU_PAGES_GN_MASK0x04
+#define CMD_PF_IOMMU_PAGES_INV_MASK0x10
 
 #define PPR_STATUS_MASK0xf
 #define PPR_STATUS_SHIFT   12
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index ba9f3dbc5b94..b3971595b0e9 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -976,6 +976,25 @@ static void build_inv_irt(struct iommu_cmd *cmd, u16 devid)
CMD_SET_TYPE(cmd, CMD_INV_IRT);
 }
 
+static void build_pf_iommu_pages(struct iommu_cmd *cmd, u64 address,
+   u16 devid, int pfcnt, bool size,
+   bool inv)
+{
+   memset(cmd, 0, sizeof(*cmd));
+
+   address &= PAGE_MASK;
+
+   cmd->data[0]  = devid;
+   cmd->data[0] |= (pfcnt & 0xff) << 24;
+   cmd->data[2]  = lower_32_bits(address);
+   cmd->data[3]  = upper_32_bits(address);
+   if (size)
+   cmd->data[2] |= CMD_INV_IOMMU_PAGES_SIZE_MASK;
+   if (inv)
+   cmd->data[2] |= CMD_PF_IOMMU_PAGES_INV_MASK;
+   CMD_SET_TYPE(cmd, CMD_PF_IOMMU_PAGES);
+}
+
 /*
  * Writes the command to the IOMMUs command buffer and informs the
  * hardware about the new command.
-- 
2.16.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 0/8] iommu/arm-smmu: Support maintaining bootloader mappings

2020-09-05 Thread Rob Clark
On Fri, Sep 4, 2020 at 8:55 AM Bjorn Andersson
 wrote:
>
> Based on previous attempts and discussions this is the latest attempt at
> inheriting stream mappings set up by the bootloader, for e.g. boot splash or
> efifb.
>
> Per Will's request this builds on the work by Jordan and Rob for the Adreno
> SMMU support. It applies cleanly ontop of v16 of their series, which can be
> found at
> https://lore.kernel.org/linux-arm-msm/20200901164707.2645413-1-robdcl...@gmail.com/
>
> Bjorn Andersson (8):
>   iommu/arm-smmu: Refactor context bank allocation
>   iommu/arm-smmu: Delay modifying domain during init
>   iommu/arm-smmu: Consult context bank allocator for identify domains
>   iommu/arm-smmu-qcom: Emulate bypass by using context banks
>   iommu/arm-smmu-qcom: Consistently initialize stream mappings
>   iommu/arm-smmu: Add impl hook for inherit boot mappings
>   iommu/arm-smmu: Provide helper for allocating identity domain
>   iommu/arm-smmu-qcom: Setup identity domain for boot mappings

I have squashed 1/8 into v17 of the adreno-smmu series as suggested by
Bjorn, the remainder are:

Reviewed-by: Rob Clark 

and on the lenovo c630,

Tested-by: Rob Clark 

>  drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 111 ++-
>  drivers/iommu/arm/arm-smmu/arm-smmu.c  | 122 ++---
>  drivers/iommu/arm/arm-smmu/arm-smmu.h  |  14 ++-
>  3 files changed, 205 insertions(+), 42 deletions(-)
>
> --
> 2.28.0
>
> ___
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] iommu/amd: Add prefetch iommu pages command build function

2020-09-05 Thread Wesley Sheng
Add function to build prefetch iommu pages command

Signed-off-by: Wesley Sheng 
---
 drivers/iommu/amd/amd_iommu_types.h |  2 ++
 drivers/iommu/amd/iommu.c   | 19 +++
 2 files changed, 21 insertions(+)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index baa31cd2411c..73734a0c4679 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -173,6 +173,7 @@
 #define CMD_INV_IOMMU_PAGES0x03
 #define CMD_INV_IOTLB_PAGES0x04
 #define CMD_INV_IRT0x05
+#define CMD_PF_IOMMU_PAGES 0x06
 #define CMD_COMPLETE_PPR   0x07
 #define CMD_INV_ALL0x08
 
@@ -181,6 +182,7 @@
 #define CMD_INV_IOMMU_PAGES_SIZE_MASK  0x01
 #define CMD_INV_IOMMU_PAGES_PDE_MASK   0x02
 #define CMD_INV_IOMMU_PAGES_GN_MASK0x04
+#define CMD_PF_IOMMU_PAGES_INV_MASK0x10
 
 #define PPR_STATUS_MASK0xf
 #define PPR_STATUS_SHIFT   12
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index ba9f3dbc5b94..b3971595b0e9 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -976,6 +976,25 @@ static void build_inv_irt(struct iommu_cmd *cmd, u16 devid)
CMD_SET_TYPE(cmd, CMD_INV_IRT);
 }
 
+static void build_pf_iommu_pages(struct iommu_cmd *cmd, u64 address,
+   u16 devid, int pfcnt, bool size,
+   bool inv)
+{
+   memset(cmd, 0, sizeof(*cmd));
+
+   address &= PAGE_MASK;
+
+   cmd->data[0]  = devid;
+   cmd->data[0] |= (pfcnt & 0xff) << 24;
+   cmd->data[2]  = lower_32_bits(address);
+   cmd->data[3]  = upper_32_bits(address;
+   if (size)
+   cmd->data[2] |= CMD_INV_IOMMU_PAGES_SIZE_MASK;
+   if (inv)
+   cmd->data[2] |= CMD_PF_IOMMU_PAGES_INV_MASK;
+   CMD_SET_TYPE(cmd, CMD_PF_IOMMU_PAGES);
+}
+
 /*
  * Writes the command to the IOMMUs command buffer and informs the
  * hardware about the new command.
-- 
2.16.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 2/2] iommu/amd: Revise ga_tag member in struct of fields_remap

2020-09-05 Thread Wesley Sheng
Per <>
ga_tag member is only available when IRTE[GuestMode]=1, this field
should be reserved when IRTE[GuestMode]=0. So change the ga_tag
to rsvd_1 in struct of fields_remap.

Signed-off-by: Wesley Sheng 
---
 drivers/iommu/amd/amd_iommu_types.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index e5b05a97eb46..baa31cd2411c 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -832,7 +832,7 @@ union irte_ga_lo {
/* -- */
guest_mode  : 1,
destination : 24,
-   ga_tag  : 32;
+   rsvd_1  : 32;
} fields_remap;
 
/* For guest vAPIC */
-- 
2.16.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 1/2] iommu/amd: Unify reserved member naming convention in struct

2020-09-05 Thread Wesley Sheng
Unify reserved member naming convention to rsvd_x in struct

Signed-off-by: Wesley Sheng 
---
 drivers/iommu/amd/amd_iommu_types.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 30a5d412255a..e5b05a97eb46 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -841,7 +841,7 @@ union irte_ga_lo {
no_fault: 1,
/* -- */
ga_log_intr : 1,
-   rsvd1   : 3,
+   rsvd_1  : 3,
is_run  : 1,
/* -- */
guest_mode  : 1,
-- 
2.16.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v17 18/20] dt-bindings: arm-smmu: Add compatible string for Adreno GPU SMMU

2020-09-05 Thread Rob Clark
From: Jordan Crouse 

Every Qcom Adreno GPU has an embedded SMMU for its own use. These
devices depend on unique features such as split pagetables,
different stall/halt requirements and other settings. Identify them
with a compatible string so that they can be identified in the
arm-smmu implementation specific code.

Signed-off-by: Jordan Crouse 
Reviewed-by: Rob Herring 
Signed-off-by: Rob Clark 
Reviewed-by: Bjorn Andersson 
---
 Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml 
b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
index 503160a7b9a0..3b63f2ae24db 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
@@ -28,8 +28,6 @@ properties:
   - enum:
   - qcom,msm8996-smmu-v2
   - qcom,msm8998-smmu-v2
-  - qcom,sc7180-smmu-v2
-  - qcom,sdm845-smmu-v2
   - const: qcom,smmu-v2
 
   - description: Qcom SoCs implementing "arm,mmu-500"
@@ -40,6 +38,13 @@ properties:
   - qcom,sm8150-smmu-500
   - qcom,sm8250-smmu-500
   - const: arm,mmu-500
+  - description: Qcom Adreno GPUs implementing "arm,smmu-v2"
+items:
+  - enum:
+  - qcom,sc7180-smmu-v2
+  - qcom,sdm845-smmu-v2
+  - const: qcom,adreno-smmu
+  - const: qcom,smmu-v2
   - description: Marvell SoCs implementing "arm,mmu-500"
 items:
   - const: marvell,ap806-smmu-500
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v17 19/20] arm: dts: qcom: sm845: Set the compatible string for the GPU SMMU

2020-09-05 Thread Rob Clark
From: Jordan Crouse 

Set the qcom,adreno-smmu compatible string for the GPU SMMU to enable
split pagetables and per-instance pagetables for drm/msm.

Signed-off-by: Jordan Crouse 
Signed-off-by: Rob Clark 
Reviewed-by: Bjorn Andersson 
---
 arch/arm64/boot/dts/qcom/sdm845-cheza.dtsi | 9 +
 arch/arm64/boot/dts/qcom/sdm845.dtsi   | 2 +-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/qcom/sdm845-cheza.dtsi 
b/arch/arm64/boot/dts/qcom/sdm845-cheza.dtsi
index 64fc1bfd66fa..39f23cdcbd02 100644
--- a/arch/arm64/boot/dts/qcom/sdm845-cheza.dtsi
+++ b/arch/arm64/boot/dts/qcom/sdm845-cheza.dtsi
@@ -633,6 +633,15 @@ _mdp {
status = "okay";
 };
 
+/*
+ * Cheza fw does not properly program the GPU aperture to allow the
+ * GPU to update the SMMU pagetables for context switches.  Work
+ * around this by dropping the "qcom,adreno-smmu" compat string.
+ */
+_smmu {
+   compatible = "qcom,sdm845-smmu-v2", "qcom,smmu-v2";
+};
+
 _pil {
iommus = <_smmu 0x781 0x0>,
 <_smmu 0x724 0x3>;
diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi 
b/arch/arm64/boot/dts/qcom/sdm845.dtsi
index 2884577dcb77..76a8a34640ae 100644
--- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
+++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
@@ -4058,7 +4058,7 @@ opp-25700 {
};
 
adreno_smmu: iommu@504 {
-   compatible = "qcom,sdm845-smmu-v2", "qcom,smmu-v2";
+   compatible = "qcom,sdm845-smmu-v2", "qcom,adreno-smmu", 
"qcom,smmu-v2";
reg = <0 0x504 0 0x1>;
#iommu-cells = <1>;
#global-interrupts = <2>;
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v17 20/20] arm: dts: qcom: sc7180: Set the compatible string for the GPU SMMU

2020-09-05 Thread Rob Clark
From: Rob Clark 

Set the qcom,adreno-smmu compatible string for the GPU SMMU to enable
split pagetables and per-instance pagetables for drm/msm.

Signed-off-by: Rob Clark 
Reviewed-by: Bjorn Andersson 
---
 arch/arm64/boot/dts/qcom/sc7180.dtsi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/qcom/sc7180.dtsi 
b/arch/arm64/boot/dts/qcom/sc7180.dtsi
index d46b3833e52f..f3bef1cad889 100644
--- a/arch/arm64/boot/dts/qcom/sc7180.dtsi
+++ b/arch/arm64/boot/dts/qcom/sc7180.dtsi
@@ -1937,7 +1937,7 @@ opp-18000 {
};
 
adreno_smmu: iommu@504 {
-   compatible = "qcom,sc7180-smmu-v2", "qcom,smmu-v2";
+   compatible = "qcom,sc7180-smmu-v2", "qcom,adreno-smmu", 
"qcom,smmu-v2";
reg = <0 0x0504 0 0x1>;
#iommu-cells = <1>;
#global-interrupts = <2>;
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v17 07/20] drm/msm: Set the global virtual address range from the IOMMU domain

2020-09-05 Thread Rob Clark
From: Jordan Crouse 

Use the aperture settings from the IOMMU domain to set up the virtual
address range for the GPU. This allows us to transparently deal with
IOMMU side features (like split pagetables).

Signed-off-by: Jordan Crouse 
Signed-off-by: Rob Clark 
Reviewed-by: Bjorn Andersson 
---
 drivers/gpu/drm/msm/adreno/adreno_gpu.c | 13 +++--
 drivers/gpu/drm/msm/msm_iommu.c |  7 +++
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c 
b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index a712e1cfcba8..b703e5308b01 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -192,9 +192,18 @@ adreno_iommu_create_address_space(struct msm_gpu *gpu,
struct iommu_domain *iommu = iommu_domain_alloc(_bus_type);
struct msm_mmu *mmu = msm_iommu_new(>dev, iommu);
struct msm_gem_address_space *aspace;
+   u64 start, size;
 
-   aspace = msm_gem_address_space_create(mmu, "gpu", SZ_16M,
-   0x - SZ_16M);
+   /*
+* Use the aperture start or SZ_16M, whichever is greater. This will
+* ensure that we align with the allocated pagetable range while still
+* allowing room in the lower 32 bits for GMEM and whatnot
+*/
+   start = max_t(u64, SZ_16M, iommu->geometry.aperture_start);
+   size = iommu->geometry.aperture_end - start + 1;
+
+   aspace = msm_gem_address_space_create(mmu, "gpu",
+   start & GENMASK(48, 0), size);
 
if (IS_ERR(aspace) && !IS_ERR(mmu))
mmu->funcs->destroy(mmu);
diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index 3a381a9674c9..1b6635504069 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -36,6 +36,10 @@ static int msm_iommu_map(struct msm_mmu *mmu, uint64_t iova,
struct msm_iommu *iommu = to_msm_iommu(mmu);
size_t ret;
 
+   /* The arm-smmu driver expects the addresses to be sign extended */
+   if (iova & BIT_ULL(48))
+   iova |= GENMASK_ULL(63, 49);
+
ret = iommu_map_sg(iommu->domain, iova, sgt->sgl, sgt->nents, prot);
WARN_ON(!ret);
 
@@ -46,6 +50,9 @@ static int msm_iommu_unmap(struct msm_mmu *mmu, uint64_t 
iova, size_t len)
 {
struct msm_iommu *iommu = to_msm_iommu(mmu);
 
+   if (iova & BIT_ULL(48))
+   iova |= GENMASK_ULL(63, 49);
+
iommu_unmap(iommu->domain, iova, len);
 
return 0;
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v17 09/20] drm/msm: Add support for private address space instances

2020-09-05 Thread Rob Clark
From: Jordan Crouse 

Add support for allocating private address space instances. Targets that
support per-context pagetables should implement their own function to
allocate private address spaces.

The default will return a pointer to the global address space.

Signed-off-by: Jordan Crouse 
Signed-off-by: Rob Clark 
Reviewed-by: Bjorn Andersson 
---
 drivers/gpu/drm/msm/msm_drv.c | 13 +++--
 drivers/gpu/drm/msm/msm_drv.h |  5 +
 drivers/gpu/drm/msm/msm_gem_vma.c |  9 +
 drivers/gpu/drm/msm/msm_gpu.c | 22 ++
 drivers/gpu/drm/msm/msm_gpu.h |  5 +
 5 files changed, 48 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index 75cd7639f560..7e963f707852 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -597,7 +597,7 @@ static int context_init(struct drm_device *dev, struct 
drm_file *file)
kref_init(>ref);
msm_submitqueue_init(dev, ctx);
 
-   ctx->aspace = priv->gpu ? priv->gpu->aspace : NULL;
+   ctx->aspace = msm_gpu_create_private_address_space(priv->gpu);
file->driver_priv = ctx;
 
return 0;
@@ -780,18 +780,19 @@ static int msm_ioctl_gem_cpu_fini(struct drm_device *dev, 
void *data,
 }
 
 static int msm_ioctl_gem_info_iova(struct drm_device *dev,
-   struct drm_gem_object *obj, uint64_t *iova)
+   struct drm_file *file, struct drm_gem_object *obj,
+   uint64_t *iova)
 {
-   struct msm_drm_private *priv = dev->dev_private;
+   struct msm_file_private *ctx = file->driver_priv;
 
-   if (!priv->gpu)
+   if (!ctx->aspace)
return -EINVAL;
 
/*
 * Don't pin the memory here - just get an address so that userspace can
 * be productive
 */
-   return msm_gem_get_iova(obj, priv->gpu->aspace, iova);
+   return msm_gem_get_iova(obj, ctx->aspace, iova);
 }
 
 static int msm_ioctl_gem_info(struct drm_device *dev, void *data,
@@ -830,7 +831,7 @@ static int msm_ioctl_gem_info(struct drm_device *dev, void 
*data,
args->value = msm_gem_mmap_offset(obj);
break;
case MSM_INFO_GET_IOVA:
-   ret = msm_ioctl_gem_info_iova(dev, obj, >value);
+   ret = msm_ioctl_gem_info_iova(dev, file, obj, >value);
break;
case MSM_INFO_SET_NAME:
/* length check should leave room for terminating null: */
diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
index 4561bfb5e745..2ca9c3c03845 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -249,6 +249,10 @@ int msm_gem_map_vma(struct msm_gem_address_space *aspace,
 void msm_gem_close_vma(struct msm_gem_address_space *aspace,
struct msm_gem_vma *vma);
 
+
+struct msm_gem_address_space *
+msm_gem_address_space_get(struct msm_gem_address_space *aspace);
+
 void msm_gem_address_space_put(struct msm_gem_address_space *aspace);
 
 struct msm_gem_address_space *
@@ -434,6 +438,7 @@ static inline void __msm_file_private_destroy(struct kref 
*kref)
struct msm_file_private *ctx = container_of(kref,
struct msm_file_private, ref);
 
+   msm_gem_address_space_put(ctx->aspace);
kfree(ctx);
 }
 
diff --git a/drivers/gpu/drm/msm/msm_gem_vma.c 
b/drivers/gpu/drm/msm/msm_gem_vma.c
index 5f6a11211b64..29cc1305cf37 100644
--- a/drivers/gpu/drm/msm/msm_gem_vma.c
+++ b/drivers/gpu/drm/msm/msm_gem_vma.c
@@ -27,6 +27,15 @@ void msm_gem_address_space_put(struct msm_gem_address_space 
*aspace)
kref_put(>kref, msm_gem_address_space_destroy);
 }
 
+struct msm_gem_address_space *
+msm_gem_address_space_get(struct msm_gem_address_space *aspace)
+{
+   if (!IS_ERR_OR_NULL(aspace))
+   kref_get(>kref);
+
+   return aspace;
+}
+
 /* Actually unmap memory for the vma */
 void msm_gem_purge_vma(struct msm_gem_address_space *aspace,
struct msm_gem_vma *vma)
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index 57532b6b4702..9f1bd17dfa47 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -823,6 +823,28 @@ static int get_clocks(struct platform_device *pdev, struct 
msm_gpu *gpu)
return 0;
 }
 
+/* Return a new address space for a msm_drm_private instance */
+struct msm_gem_address_space *
+msm_gpu_create_private_address_space(struct msm_gpu *gpu)
+{
+   struct msm_gem_address_space *aspace = NULL;
+
+   if (!gpu)
+   return NULL;
+
+   /*
+* If the target doesn't support private address spaces then return
+* the global one
+*/
+   if (gpu->funcs->create_private_address_space)
+   aspace = gpu->funcs->create_private_address_space(gpu);
+
+   if (IS_ERR_OR_NULL(aspace))
+   aspace = msm_gem_address_space_get(gpu->aspace);
+
+   return aspace;

[PATCH v17 06/20] drm/msm: Drop context arg to gpu->submit()

2020-09-05 Thread Rob Clark
From: Jordan Crouse 

Now that we can get the ctx from the submitqueue, the extra arg is
redundant.

Signed-off-by: Jordan Crouse 
[split out of previous patch to reduce churny noise]
Signed-off-by: Rob Clark 
Reviewed-by: Bjorn Andersson 
---
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c   | 12 +---
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c   |  5 ++---
 drivers/gpu/drm/msm/adreno/adreno_gpu.c |  5 ++---
 drivers/gpu/drm/msm/adreno/adreno_gpu.h |  3 +--
 drivers/gpu/drm/msm/msm_gem_submit.c|  2 +-
 drivers/gpu/drm/msm/msm_gpu.c   |  9 -
 drivers/gpu/drm/msm/msm_gpu.h   |  6 ++
 7 files changed, 17 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
index ce3c0b5c167b..616d9e798058 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
@@ -43,8 +43,7 @@ static void a5xx_flush(struct msm_gpu *gpu, struct 
msm_ringbuffer *ring)
gpu_write(gpu, REG_A5XX_CP_RB_WPTR, wptr);
 }
 
-static void a5xx_submit_in_rb(struct msm_gpu *gpu, struct msm_gem_submit 
*submit,
-   struct msm_file_private *ctx)
+static void a5xx_submit_in_rb(struct msm_gpu *gpu, struct msm_gem_submit 
*submit)
 {
struct msm_drm_private *priv = gpu->dev->dev_private;
struct msm_ringbuffer *ring = submit->ring;
@@ -57,7 +56,7 @@ static void a5xx_submit_in_rb(struct msm_gpu *gpu, struct 
msm_gem_submit *submit
case MSM_SUBMIT_CMD_IB_TARGET_BUF:
break;
case MSM_SUBMIT_CMD_CTX_RESTORE_BUF:
-   if (priv->lastctx == ctx)
+   if (priv->lastctx == submit->queue->ctx)
break;
/* fall-thru */
case MSM_SUBMIT_CMD_BUF:
@@ -103,8 +102,7 @@ static void a5xx_submit_in_rb(struct msm_gpu *gpu, struct 
msm_gem_submit *submit
msm_gpu_retire(gpu);
 }
 
-static void a5xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
-   struct msm_file_private *ctx)
+static void a5xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 {
struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
struct a5xx_gpu *a5xx_gpu = to_a5xx_gpu(adreno_gpu);
@@ -114,7 +112,7 @@ static void a5xx_submit(struct msm_gpu *gpu, struct 
msm_gem_submit *submit,
 
if (IS_ENABLED(CONFIG_DRM_MSM_GPU_SUDO) && submit->in_rb) {
priv->lastctx = NULL;
-   a5xx_submit_in_rb(gpu, submit, ctx);
+   a5xx_submit_in_rb(gpu, submit);
return;
}
 
@@ -148,7 +146,7 @@ static void a5xx_submit(struct msm_gpu *gpu, struct 
msm_gem_submit *submit,
case MSM_SUBMIT_CMD_IB_TARGET_BUF:
break;
case MSM_SUBMIT_CMD_CTX_RESTORE_BUF:
-   if (priv->lastctx == ctx)
+   if (priv->lastctx == submit->queue->ctx)
break;
/* fall-thru */
case MSM_SUBMIT_CMD_BUF:
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 74bc27eb4203..f6aad038d8b6 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -81,8 +81,7 @@ static void get_stats_counter(struct msm_ringbuffer *ring, 
u32 counter,
OUT_RING(ring, upper_32_bits(iova));
 }
 
-static void a6xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
-   struct msm_file_private *ctx)
+static void a6xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 {
unsigned int index = submit->seqno % MSM_GPU_SUBMIT_STATS_COUNT;
struct msm_drm_private *priv = gpu->dev->dev_private;
@@ -115,7 +114,7 @@ static void a6xx_submit(struct msm_gpu *gpu, struct 
msm_gem_submit *submit,
case MSM_SUBMIT_CMD_IB_TARGET_BUF:
break;
case MSM_SUBMIT_CMD_CTX_RESTORE_BUF:
-   if (priv->lastctx == ctx)
+   if (priv->lastctx == submit->queue->ctx)
break;
/* fall-thru */
case MSM_SUBMIT_CMD_BUF:
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c 
b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index 459f10a3710b..a712e1cfcba8 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -434,8 +434,7 @@ void adreno_recover(struct msm_gpu *gpu)
}
 }
 
-void adreno_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
-   struct msm_file_private *ctx)
+void adreno_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 {
struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
struct msm_drm_private *priv = gpu->dev->dev_private;
@@ -449,7 +448,7 @@ void adreno_submit(struct msm_gpu *gpu, struct 
msm_gem_submit *submit,
break;
 

[PATCH v17 17/20] iommu/arm-smmu: Add a way for implementations to influence SCTLR

2020-09-05 Thread Rob Clark
From: Rob Clark 

For the Adreno GPU's SMMU, we want SCTLR.HUPCF set to ensure that
pending translations are not terminated on iova fault.  Otherwise
a terminated CP read could hang the GPU by returning invalid
command-stream data.

Signed-off-by: Rob Clark 
Reviewed-by: Bjorn Andersson 
---
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 6 ++
 drivers/iommu/arm/arm-smmu/arm-smmu.c  | 3 +++
 drivers/iommu/arm/arm-smmu/arm-smmu.h  | 3 +++
 3 files changed, 12 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index 1e942eed2dfc..0663d7d26908 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -129,6 +129,12 @@ static int qcom_adreno_smmu_init_context(struct 
arm_smmu_domain *smmu_domain,
(smmu_domain->cfg.fmt == ARM_SMMU_CTX_FMT_AARCH64))
pgtbl_cfg->quirks |= IO_PGTABLE_QUIRK_ARM_TTBR1;
 
+   /*
+* On the GPU device we want to process subsequent transactions after a
+* fault to keep the GPU from hanging
+*/
+   smmu_domain->cfg.sctlr_set |= ARM_SMMU_SCTLR_HUPCF;
+
/*
 * Initialize private interface with GPU:
 */
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index dad7fa86fbd4..1f06ab219819 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -617,6 +617,9 @@ void arm_smmu_write_context_bank(struct arm_smmu_device 
*smmu, int idx)
if (IS_ENABLED(CONFIG_CPU_BIG_ENDIAN))
reg |= ARM_SMMU_SCTLR_E;
 
+   reg |= cfg->sctlr_set;
+   reg &= ~cfg->sctlr_clr;
+
arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_SCTLR, reg);
 }
 
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h 
b/drivers/iommu/arm/arm-smmu/arm-smmu.h
index 6c5ffeae..ddf2ca4c923d 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
@@ -144,6 +144,7 @@ enum arm_smmu_cbar_type {
 #define ARM_SMMU_CB_SCTLR  0x0
 #define ARM_SMMU_SCTLR_S1_ASIDPNE  BIT(12)
 #define ARM_SMMU_SCTLR_CFCFG   BIT(7)
+#define ARM_SMMU_SCTLR_HUPCF   BIT(8)
 #define ARM_SMMU_SCTLR_CFIEBIT(6)
 #define ARM_SMMU_SCTLR_CFREBIT(5)
 #define ARM_SMMU_SCTLR_E   BIT(4)
@@ -341,6 +342,8 @@ struct arm_smmu_cfg {
u16 asid;
u16 vmid;
};
+   u32 sctlr_set;/* extra bits to set in 
SCTLR */
+   u32 sctlr_clr;/* bits to mask in SCTLR 
*/
enum arm_smmu_cbar_type cbar;
enum arm_smmu_context_fmt   fmt;
 };
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v17 08/20] drm/msm: Add support to create a local pagetable

2020-09-05 Thread Rob Clark
From: Jordan Crouse 

Add support to create a io-pgtable for use by targets that support
per-instance pagetables. In order to support per-instance pagetables the
GPU SMMU device needs to have the qcom,adreno-smmu compatible string and
split pagetables enabled.

Signed-off-by: Jordan Crouse 
Signed-off-by: Rob Clark 
Reviewed-by: Bjorn Andersson 
---
 drivers/gpu/drm/msm/Kconfig  |   1 +
 drivers/gpu/drm/msm/msm_gpummu.c |   2 +-
 drivers/gpu/drm/msm/msm_iommu.c  | 199 ++-
 drivers/gpu/drm/msm/msm_mmu.h|  16 ++-
 4 files changed, 215 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/msm/Kconfig b/drivers/gpu/drm/msm/Kconfig
index 6deaa7d01654..5102a58830b9 100644
--- a/drivers/gpu/drm/msm/Kconfig
+++ b/drivers/gpu/drm/msm/Kconfig
@@ -8,6 +8,7 @@ config DRM_MSM
depends on MMU
depends on INTERCONNECT || !INTERCONNECT
depends on QCOM_OCMEM || QCOM_OCMEM=n
+   select IOMMU_IO_PGTABLE
select QCOM_MDT_LOADER if ARCH_QCOM
select REGULATOR
select DRM_KMS_HELPER
diff --git a/drivers/gpu/drm/msm/msm_gpummu.c b/drivers/gpu/drm/msm/msm_gpummu.c
index 310a31b05faa..aab121f4beb7 100644
--- a/drivers/gpu/drm/msm/msm_gpummu.c
+++ b/drivers/gpu/drm/msm/msm_gpummu.c
@@ -102,7 +102,7 @@ struct msm_mmu *msm_gpummu_new(struct device *dev, struct 
msm_gpu *gpu)
}
 
gpummu->gpu = gpu;
-   msm_mmu_init(>base, dev, );
+   msm_mmu_init(>base, dev, , MSM_MMU_GPUMMU);
 
return >base;
 }
diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index 1b6635504069..697cc0a059d6 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -4,15 +4,210 @@
  * Author: Rob Clark 
  */
 
+#include 
+#include 
 #include "msm_drv.h"
 #include "msm_mmu.h"
 
 struct msm_iommu {
struct msm_mmu base;
struct iommu_domain *domain;
+   atomic_t pagetables;
 };
+
 #define to_msm_iommu(x) container_of(x, struct msm_iommu, base)
 
+struct msm_iommu_pagetable {
+   struct msm_mmu base;
+   struct msm_mmu *parent;
+   struct io_pgtable_ops *pgtbl_ops;
+   phys_addr_t ttbr;
+   u32 asid;
+};
+static struct msm_iommu_pagetable *to_pagetable(struct msm_mmu *mmu)
+{
+   return container_of(mmu, struct msm_iommu_pagetable, base);
+}
+
+static int msm_iommu_pagetable_unmap(struct msm_mmu *mmu, u64 iova,
+   size_t size)
+{
+   struct msm_iommu_pagetable *pagetable = to_pagetable(mmu);
+   struct io_pgtable_ops *ops = pagetable->pgtbl_ops;
+   size_t unmapped = 0;
+
+   /* Unmap the block one page at a time */
+   while (size) {
+   unmapped += ops->unmap(ops, iova, 4096, NULL);
+   iova += 4096;
+   size -= 4096;
+   }
+
+   iommu_flush_tlb_all(to_msm_iommu(pagetable->parent)->domain);
+
+   return (unmapped == size) ? 0 : -EINVAL;
+}
+
+static int msm_iommu_pagetable_map(struct msm_mmu *mmu, u64 iova,
+   struct sg_table *sgt, size_t len, int prot)
+{
+   struct msm_iommu_pagetable *pagetable = to_pagetable(mmu);
+   struct io_pgtable_ops *ops = pagetable->pgtbl_ops;
+   struct scatterlist *sg;
+   size_t mapped = 0;
+   u64 addr = iova;
+   unsigned int i;
+
+   for_each_sg(sgt->sgl, sg, sgt->nents, i) {
+   size_t size = sg->length;
+   phys_addr_t phys = sg_phys(sg);
+
+   /* Map the block one page at a time */
+   while (size) {
+   if (ops->map(ops, addr, phys, 4096, prot, GFP_KERNEL)) {
+   msm_iommu_pagetable_unmap(mmu, iova, mapped);
+   return -EINVAL;
+   }
+
+   phys += 4096;
+   addr += 4096;
+   size -= 4096;
+   mapped += 4096;
+   }
+   }
+
+   return 0;
+}
+
+static void msm_iommu_pagetable_destroy(struct msm_mmu *mmu)
+{
+   struct msm_iommu_pagetable *pagetable = to_pagetable(mmu);
+   struct msm_iommu *iommu = to_msm_iommu(pagetable->parent);
+   struct adreno_smmu_priv *adreno_smmu =
+   dev_get_drvdata(pagetable->parent->dev);
+
+   /*
+* If this is the last attached pagetable for the parent,
+* disable TTBR0 in the arm-smmu driver
+*/
+   if (atomic_dec_return(>pagetables) == 0)
+   adreno_smmu->set_ttbr0_cfg(adreno_smmu->cookie, NULL);
+
+   free_io_pgtable_ops(pagetable->pgtbl_ops);
+   kfree(pagetable);
+}
+
+int msm_iommu_pagetable_params(struct msm_mmu *mmu,
+   phys_addr_t *ttbr, int *asid)
+{
+   struct msm_iommu_pagetable *pagetable;
+
+   if (mmu->type != MSM_MMU_IOMMU_PAGETABLE)
+   return -EINVAL;
+
+   pagetable = to_pagetable(mmu);
+
+   if (ttbr)
+   *ttbr = pagetable->ttbr;
+
+   if (asid)
+   *asid = 

[PATCH v17 13/20] iommu/arm-smmu: Add support for split pagetables

2020-09-05 Thread Rob Clark
From: Jordan Crouse 

Enable TTBR1 for a context bank if IO_PGTABLE_QUIRK_ARM_TTBR1 is selected
by the io-pgtable configuration.

Signed-off-by: Jordan Crouse 
Signed-off-by: Rob Clark 
Reviewed-by: Bjorn Andersson 
---
 drivers/iommu/arm/arm-smmu/arm-smmu.c | 19 +++
 drivers/iommu/arm/arm-smmu/arm-smmu.h | 25 +++--
 2 files changed, 34 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 37d8d49299b4..8e884e58f208 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -552,11 +552,15 @@ static void arm_smmu_init_context_bank(struct 
arm_smmu_domain *smmu_domain,
cb->ttbr[0] = pgtbl_cfg->arm_v7s_cfg.ttbr;
cb->ttbr[1] = 0;
} else {
-   cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
-   cb->ttbr[0] |= FIELD_PREP(ARM_SMMU_TTBRn_ASID,
- cfg->asid);
+   cb->ttbr[0] = FIELD_PREP(ARM_SMMU_TTBRn_ASID,
+cfg->asid);
cb->ttbr[1] = FIELD_PREP(ARM_SMMU_TTBRn_ASID,
 cfg->asid);
+
+   if (pgtbl_cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)
+   cb->ttbr[1] |= pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
+   else
+   cb->ttbr[0] |= pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
}
} else {
cb->ttbr[0] = pgtbl_cfg->arm_lpae_s2_cfg.vttbr;
@@ -822,7 +826,14 @@ static int arm_smmu_init_domain_context(struct 
iommu_domain *domain,
 
/* Update the domain's page sizes to reflect the page table format */
domain->pgsize_bitmap = pgtbl_cfg.pgsize_bitmap;
-   domain->geometry.aperture_end = (1UL << ias) - 1;
+
+   if (pgtbl_cfg.quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) {
+   domain->geometry.aperture_start = ~0UL << ias;
+   domain->geometry.aperture_end = ~0UL;
+   } else {
+   domain->geometry.aperture_end = (1UL << ias) - 1;
+   }
+
domain->geometry.force_aperture = true;
 
/* Initialise the context bank with our page table cfg */
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h 
b/drivers/iommu/arm/arm-smmu/arm-smmu.h
index 83294516ac08..f3e456893f28 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
@@ -169,10 +169,12 @@ enum arm_smmu_cbar_type {
 #define ARM_SMMU_CB_TCR0x30
 #define ARM_SMMU_TCR_EAE   BIT(31)
 #define ARM_SMMU_TCR_EPD1  BIT(23)
+#define ARM_SMMU_TCR_A1BIT(22)
 #define ARM_SMMU_TCR_TG0   GENMASK(15, 14)
 #define ARM_SMMU_TCR_SH0   GENMASK(13, 12)
 #define ARM_SMMU_TCR_ORGN0 GENMASK(11, 10)
 #define ARM_SMMU_TCR_IRGN0 GENMASK(9, 8)
+#define ARM_SMMU_TCR_EPD0  BIT(7)
 #define ARM_SMMU_TCR_T0SZ  GENMASK(5, 0)
 
 #define ARM_SMMU_VTCR_RES1 BIT(31)
@@ -350,12 +352,23 @@ struct arm_smmu_domain {
 
 static inline u32 arm_smmu_lpae_tcr(struct io_pgtable_cfg *cfg)
 {
-   return ARM_SMMU_TCR_EPD1 |
-  FIELD_PREP(ARM_SMMU_TCR_TG0, cfg->arm_lpae_s1_cfg.tcr.tg) |
-  FIELD_PREP(ARM_SMMU_TCR_SH0, cfg->arm_lpae_s1_cfg.tcr.sh) |
-  FIELD_PREP(ARM_SMMU_TCR_ORGN0, cfg->arm_lpae_s1_cfg.tcr.orgn) |
-  FIELD_PREP(ARM_SMMU_TCR_IRGN0, cfg->arm_lpae_s1_cfg.tcr.irgn) |
-  FIELD_PREP(ARM_SMMU_TCR_T0SZ, cfg->arm_lpae_s1_cfg.tcr.tsz);
+   u32 tcr = FIELD_PREP(ARM_SMMU_TCR_TG0, cfg->arm_lpae_s1_cfg.tcr.tg) |
+   FIELD_PREP(ARM_SMMU_TCR_SH0, cfg->arm_lpae_s1_cfg.tcr.sh) |
+   FIELD_PREP(ARM_SMMU_TCR_ORGN0, cfg->arm_lpae_s1_cfg.tcr.orgn) |
+   FIELD_PREP(ARM_SMMU_TCR_IRGN0, cfg->arm_lpae_s1_cfg.tcr.irgn) |
+   FIELD_PREP(ARM_SMMU_TCR_T0SZ, cfg->arm_lpae_s1_cfg.tcr.tsz);
+
+   /*
+   * When TTBR1 is selected shift the TCR fields by 16 bits and disable
+   * translation in TTBR0
+   */
+   if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) {
+   tcr = (tcr << 16) & ~ARM_SMMU_TCR_A1;
+   tcr |= ARM_SMMU_TCR_EPD0;
+   } else
+   tcr |= ARM_SMMU_TCR_EPD1;
+
+   return tcr;
 }
 
 static inline u32 arm_smmu_lpae_tcr2(struct io_pgtable_cfg *cfg)
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v17 14/20] iommu/arm-smmu: Prepare for the adreno-smmu implementation

2020-09-05 Thread Rob Clark
From: Jordan Crouse 

Do a bit of prep work to add the upcoming adreno-smmu implementation.

Add an hook to allow the implementation to choose which context banks
to allocate.

Move some of the common structs to arm-smmu.h in anticipation of them
being used by the implementations and update some of the existing hooks
to pass more information that the implementation will need.

These modifications will be used by the upcoming Adreno SMMU
implementation to identify the GPU device and properly configure it
for pagetable switching.

Co-developed-by: Rob Clark 
Signed-off-by: Jordan Crouse 
Signed-off-by: Rob Clark 
Reviewed-by: Bjorn Andersson 
---
 drivers/iommu/arm/arm-smmu/arm-smmu-impl.c |  2 +-
 drivers/iommu/arm/arm-smmu/arm-smmu.c  | 74 ++
 drivers/iommu/arm/arm-smmu/arm-smmu.h  | 52 ++-
 3 files changed, 73 insertions(+), 55 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
index a9861dcd0884..88f17cc33023 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
@@ -69,7 +69,7 @@ static int cavium_cfg_probe(struct arm_smmu_device *smmu)
 }
 
 static int cavium_init_context(struct arm_smmu_domain *smmu_domain,
-   struct io_pgtable_cfg *pgtbl_cfg)
+   struct io_pgtable_cfg *pgtbl_cfg, struct device *dev)
 {
struct cavium_smmu *cs = container_of(smmu_domain->smmu,
  struct cavium_smmu, smmu);
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 8e884e58f208..dad7fa86fbd4 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -65,41 +65,10 @@ module_param(disable_bypass, bool, S_IRUGO);
 MODULE_PARM_DESC(disable_bypass,
"Disable bypass streams such that incoming transactions from devices 
that are not attached to an iommu domain will report an abort back to the 
device and will not be allowed to pass through the SMMU.");
 
-struct arm_smmu_s2cr {
-   struct iommu_group  *group;
-   int count;
-   enum arm_smmu_s2cr_type type;
-   enum arm_smmu_s2cr_privcfg  privcfg;
-   u8  cbndx;
-};
-
 #define s2cr_init_val (struct arm_smmu_s2cr){  \
.type = disable_bypass ? S2CR_TYPE_FAULT : S2CR_TYPE_BYPASS,\
 }
 
-struct arm_smmu_smr {
-   u16 mask;
-   u16 id;
-   boolvalid;
-};
-
-struct arm_smmu_cb {
-   u64 ttbr[2];
-   u32 tcr[2];
-   u32 mair[2];
-   struct arm_smmu_cfg *cfg;
-};
-
-struct arm_smmu_master_cfg {
-   struct arm_smmu_device  *smmu;
-   s16 smendx[];
-};
-#define INVALID_SMENDX -1
-#define cfg_smendx(cfg, fw, i) \
-   (i >= fw->num_ids ? INVALID_SMENDX : cfg->smendx[i])
-#define for_each_cfg_sme(cfg, fw, i, idx) \
-   for (i = 0; idx = cfg_smendx(cfg, fw, i), i < fw->num_ids; ++i)
-
 static bool using_legacy_binding, using_generic_binding;
 
 static inline int arm_smmu_rpm_get(struct arm_smmu_device *smmu)
@@ -234,19 +203,6 @@ static int arm_smmu_register_legacy_master(struct device 
*dev,
 }
 #endif /* CONFIG_ARM_SMMU_LEGACY_DT_BINDINGS */
 
-static int __arm_smmu_alloc_bitmap(unsigned long *map, int start, int end)
-{
-   int idx;
-
-   do {
-   idx = find_next_zero_bit(map, end, start);
-   if (idx == end)
-   return -ENOSPC;
-   } while (test_and_set_bit(idx, map));
-
-   return idx;
-}
-
 static void __arm_smmu_free_bitmap(unsigned long *map, int idx)
 {
clear_bit(idx, map);
@@ -578,7 +534,7 @@ static void arm_smmu_init_context_bank(struct 
arm_smmu_domain *smmu_domain,
}
 }
 
-static void arm_smmu_write_context_bank(struct arm_smmu_device *smmu, int idx)
+void arm_smmu_write_context_bank(struct arm_smmu_device *smmu, int idx)
 {
u32 reg;
bool stage1;
@@ -664,8 +620,19 @@ static void arm_smmu_write_context_bank(struct 
arm_smmu_device *smmu, int idx)
arm_smmu_cb_write(smmu, idx, ARM_SMMU_CB_SCTLR, reg);
 }
 
+static int arm_smmu_alloc_context_bank(struct arm_smmu_domain *smmu_domain,
+  struct arm_smmu_device *smmu,
+  struct device *dev, unsigned int start)
+{
+   if (smmu->impl && smmu->impl->alloc_context_bank)
+   return smmu->impl->alloc_context_bank(smmu_domain, smmu, dev, 
start);
+
+   return __arm_smmu_alloc_bitmap(smmu->context_map, start, 
smmu->num_context_banks);
+}
+
 static int arm_smmu_init_domain_context(struct iommu_domain *domain,
-   

[PATCH v17 10/20] drm/msm/a6xx: Add support for per-instance pagetables

2020-09-05 Thread Rob Clark
From: Jordan Crouse 

Add support for using per-instance pagetables if all the dependencies are
available.

Signed-off-by: Jordan Crouse 
Signed-off-by: Rob Clark 
Reviewed-by: Akhil P Oommen 
---
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 62 +++
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h |  1 +
 drivers/gpu/drm/msm/msm_ringbuffer.h  |  1 +
 3 files changed, 64 insertions(+)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c 
b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index f6aad038d8b6..92ebc73f51e6 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -81,6 +81,49 @@ static void get_stats_counter(struct msm_ringbuffer *ring, 
u32 counter,
OUT_RING(ring, upper_32_bits(iova));
 }
 
+static void a6xx_set_pagetable(struct a6xx_gpu *a6xx_gpu,
+   struct msm_ringbuffer *ring, struct msm_file_private *ctx)
+{
+   phys_addr_t ttbr;
+   u32 asid;
+   u64 memptr = rbmemptr(ring, ttbr0);
+
+   if (ctx == a6xx_gpu->cur_ctx)
+   return;
+
+   if (msm_iommu_pagetable_params(ctx->aspace->mmu, , ))
+   return;
+
+   /* Execute the table update */
+   OUT_PKT7(ring, CP_SMMU_TABLE_UPDATE, 4);
+   OUT_RING(ring, CP_SMMU_TABLE_UPDATE_0_TTBR0_LO(lower_32_bits(ttbr)));
+
+   OUT_RING(ring,
+   CP_SMMU_TABLE_UPDATE_1_TTBR0_HI(upper_32_bits(ttbr)) |
+   CP_SMMU_TABLE_UPDATE_1_ASID(asid));
+   OUT_RING(ring, CP_SMMU_TABLE_UPDATE_2_CONTEXTIDR(0));
+   OUT_RING(ring, CP_SMMU_TABLE_UPDATE_3_CONTEXTBANK(0));
+
+   /*
+* Write the new TTBR0 to the memstore. This is good for debugging.
+*/
+   OUT_PKT7(ring, CP_MEM_WRITE, 4);
+   OUT_RING(ring, CP_MEM_WRITE_0_ADDR_LO(lower_32_bits(memptr)));
+   OUT_RING(ring, CP_MEM_WRITE_1_ADDR_HI(upper_32_bits(memptr)));
+   OUT_RING(ring, lower_32_bits(ttbr));
+   OUT_RING(ring, (asid << 16) | upper_32_bits(ttbr));
+
+   /*
+* And finally, trigger a uche flush to be sure there isn't anything
+* lingering in that part of the GPU
+*/
+
+   OUT_PKT7(ring, CP_EVENT_WRITE, 1);
+   OUT_RING(ring, 0x31);
+
+   a6xx_gpu->cur_ctx = ctx;
+}
+
 static void a6xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 {
unsigned int index = submit->seqno % MSM_GPU_SUBMIT_STATS_COUNT;
@@ -90,6 +133,8 @@ static void a6xx_submit(struct msm_gpu *gpu, struct 
msm_gem_submit *submit)
struct msm_ringbuffer *ring = submit->ring;
unsigned int i;
 
+   a6xx_set_pagetable(a6xx_gpu, ring, submit->queue->ctx);
+
get_stats_counter(ring, REG_A6XX_RBBM_PERFCTR_CP_0_LO,
rbmemptr_stats(ring, index, cpcycles_start));
 
@@ -704,6 +749,8 @@ static int a6xx_hw_init(struct msm_gpu *gpu)
/* Always come up on rb 0 */
a6xx_gpu->cur_ring = gpu->rb[0];
 
+   a6xx_gpu->cur_ctx = NULL;
+
/* Enable the SQE_to start the CP engine */
gpu_write(gpu, REG_A6XX_CP_SQE_CNTL, 1);
 
@@ -1016,6 +1063,20 @@ static unsigned long a6xx_gpu_busy(struct msm_gpu *gpu)
return (unsigned long)busy_time;
 }
 
+static struct msm_gem_address_space *
+a6xx_create_private_address_space(struct msm_gpu *gpu)
+{
+   struct msm_mmu *mmu;
+
+   mmu = msm_iommu_pagetable_create(gpu->aspace->mmu);
+
+   if (IS_ERR(mmu))
+   return ERR_CAST(mmu);
+
+   return msm_gem_address_space_create(mmu,
+   "gpu", 0x1ULL, 0x1ULL);
+}
+
 static const struct adreno_gpu_funcs funcs = {
.base = {
.get_param = adreno_get_param,
@@ -1039,6 +1100,7 @@ static const struct adreno_gpu_funcs funcs = {
.gpu_state_put = a6xx_gpu_state_put,
 #endif
.create_address_space = adreno_iommu_create_address_space,
+   .create_private_address_space = 
a6xx_create_private_address_space,
},
.get_timestamp = a6xx_get_timestamp,
 };
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h 
b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
index 03ba60d5b07f..da22d7549d9b 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
@@ -19,6 +19,7 @@ struct a6xx_gpu {
uint64_t sqe_iova;
 
struct msm_ringbuffer *cur_ring;
+   struct msm_file_private *cur_ctx;
 
struct a6xx_gmu gmu;
 };
diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.h 
b/drivers/gpu/drm/msm/msm_ringbuffer.h
index 7764373d0ed2..0987d6bf848c 100644
--- a/drivers/gpu/drm/msm/msm_ringbuffer.h
+++ b/drivers/gpu/drm/msm/msm_ringbuffer.h
@@ -31,6 +31,7 @@ struct msm_rbmemptrs {
volatile uint32_t fence;
 
volatile struct msm_gpu_submit_stats stats[MSM_GPU_SUBMIT_STATS_COUNT];
+   volatile u64 ttbr0;
 };
 
 struct msm_ringbuffer {
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v17 16/20] iommu/arm-smmu-qcom: Add implementation for the adreno GPU SMMU

2020-09-05 Thread Rob Clark
From: Jordan Crouse 

Add a special implementation for the SMMU attached to most Adreno GPU
target triggered from the qcom,adreno-smmu compatible string.

The new Adreno SMMU implementation will enable split pagetables
(TTBR1) for the domain attached to the GPU device (SID 0) and
hard code it context bank 0 so the GPU hardware can implement
per-instance pagetables.

Co-developed-by: Rob Clark 
Signed-off-by: Jordan Crouse 
Signed-off-by: Rob Clark 
Reviewed-by: Bjorn Andersson 
---
 drivers/iommu/arm/arm-smmu/arm-smmu-impl.c |   3 +
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 151 -
 drivers/iommu/arm/arm-smmu/arm-smmu.h  |   1 +
 3 files changed, 153 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
index 88f17cc33023..d199b4bff15d 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
@@ -223,6 +223,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct 
arm_smmu_device *smmu)
of_device_is_compatible(np, "qcom,sm8250-smmu-500"))
return qcom_smmu_impl_init(smmu);
 
+   if (of_device_is_compatible(smmu->dev->of_node, "qcom,adreno-smmu"))
+   return qcom_adreno_smmu_impl_init(smmu);
+
if (of_device_is_compatible(np, "marvell,ap806-smmu-500"))
smmu->impl = _mmu500_impl;
 
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index be4318044f96..1e942eed2dfc 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -3,6 +3,7 @@
  * Copyright (c) 2019, The Linux Foundation. All rights reserved.
  */
 
+#include 
 #include 
 #include 
 
@@ -12,6 +13,134 @@ struct qcom_smmu {
struct arm_smmu_device smmu;
 };
 
+#define QCOM_ADRENO_SMMU_GPU_SID 0
+
+static bool qcom_adreno_smmu_is_gpu_device(struct device *dev)
+{
+   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
+   int i;
+
+   /*
+* The GPU will always use SID 0 so that is a handy way to uniquely
+* identify it and configure it for per-instance pagetables
+*/
+   for (i = 0; i < fwspec->num_ids; i++) {
+   u16 sid = FIELD_GET(ARM_SMMU_SMR_ID, fwspec->ids[i]);
+
+   if (sid == QCOM_ADRENO_SMMU_GPU_SID)
+   return true;
+   }
+
+   return false;
+}
+
+static const struct io_pgtable_cfg *qcom_adreno_smmu_get_ttbr1_cfg(
+   const void *cookie)
+{
+   struct arm_smmu_domain *smmu_domain = (void *)cookie;
+   struct io_pgtable *pgtable =
+   io_pgtable_ops_to_pgtable(smmu_domain->pgtbl_ops);
+   return >cfg;
+}
+
+/*
+ * Local implementation to configure TTBR0 with the specified pagetable config.
+ * The GPU driver will call this to enable TTBR0 when per-instance pagetables
+ * are active
+ */
+
+static int qcom_adreno_smmu_set_ttbr0_cfg(const void *cookie,
+   const struct io_pgtable_cfg *pgtbl_cfg)
+{
+   struct arm_smmu_domain *smmu_domain = (void *)cookie;
+   struct io_pgtable *pgtable = 
io_pgtable_ops_to_pgtable(smmu_domain->pgtbl_ops);
+   struct arm_smmu_cfg *cfg = _domain->cfg;
+   struct arm_smmu_cb *cb = _domain->smmu->cbs[cfg->cbndx];
+
+   /* The domain must have split pagetables already enabled */
+   if (cb->tcr[0] & ARM_SMMU_TCR_EPD1)
+   return -EINVAL;
+
+   /* If the pagetable config is NULL, disable TTBR0 */
+   if (!pgtbl_cfg) {
+   /* Do nothing if it is already disabled */
+   if ((cb->tcr[0] & ARM_SMMU_TCR_EPD0))
+   return -EINVAL;
+
+   /* Set TCR to the original configuration */
+   cb->tcr[0] = arm_smmu_lpae_tcr(>cfg);
+   cb->ttbr[0] = FIELD_PREP(ARM_SMMU_TTBRn_ASID, cb->cfg->asid);
+   } else {
+   u32 tcr = cb->tcr[0];
+
+   /* Don't call this again if TTBR0 is already enabled */
+   if (!(cb->tcr[0] & ARM_SMMU_TCR_EPD0))
+   return -EINVAL;
+
+   tcr |= arm_smmu_lpae_tcr(pgtbl_cfg);
+   tcr &= ~(ARM_SMMU_TCR_EPD0 | ARM_SMMU_TCR_EPD1);
+
+   cb->tcr[0] = tcr;
+   cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
+   cb->ttbr[0] |= FIELD_PREP(ARM_SMMU_TTBRn_ASID, cb->cfg->asid);
+   }
+
+   arm_smmu_write_context_bank(smmu_domain->smmu, cb->cfg->cbndx);
+
+   return 0;
+}
+
+static int qcom_adreno_smmu_alloc_context_bank(struct arm_smmu_domain 
*smmu_domain,
+  struct arm_smmu_device *smmu,
+  struct device *dev, int start)
+{
+   int count;
+
+   /*
+* Assign context bank 0 to the GPU device so the GPU hardware can
+* switch pagetables
+*/
+   if (qcom_adreno_smmu_is_gpu_device(dev)) {
+

[PATCH v17 12/20] iommu/arm-smmu: Pass io-pgtable config to implementation specific function

2020-09-05 Thread Rob Clark
From: Jordan Crouse 

Construct the io-pgtable config before calling the implementation specific
init_context function and pass it so the implementation specific function
can get a chance to change it before the io-pgtable is created.

Signed-off-by: Jordan Crouse 
Signed-off-by: Rob Clark 
Reviewed-by: Bjorn Andersson 
---
 drivers/iommu/arm/arm-smmu/arm-smmu-impl.c |  3 ++-
 drivers/iommu/arm/arm-smmu/arm-smmu.c  | 11 ++-
 drivers/iommu/arm/arm-smmu/arm-smmu.h  |  3 ++-
 3 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
index f4ff124a1967..a9861dcd0884 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
@@ -68,7 +68,8 @@ static int cavium_cfg_probe(struct arm_smmu_device *smmu)
return 0;
 }
 
-static int cavium_init_context(struct arm_smmu_domain *smmu_domain)
+static int cavium_init_context(struct arm_smmu_domain *smmu_domain,
+   struct io_pgtable_cfg *pgtbl_cfg)
 {
struct cavium_smmu *cs = container_of(smmu_domain->smmu,
  struct cavium_smmu, smmu);
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 09c42af9f31e..37d8d49299b4 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -795,11 +795,6 @@ static int arm_smmu_init_domain_context(struct 
iommu_domain *domain,
cfg->asid = cfg->cbndx;
 
smmu_domain->smmu = smmu;
-   if (smmu->impl && smmu->impl->init_context) {
-   ret = smmu->impl->init_context(smmu_domain);
-   if (ret)
-   goto out_unlock;
-   }
 
pgtbl_cfg = (struct io_pgtable_cfg) {
.pgsize_bitmap  = smmu->pgsize_bitmap,
@@ -810,6 +805,12 @@ static int arm_smmu_init_domain_context(struct 
iommu_domain *domain,
.iommu_dev  = smmu->dev,
};
 
+   if (smmu->impl && smmu->impl->init_context) {
+   ret = smmu->impl->init_context(smmu_domain, _cfg);
+   if (ret)
+   goto out_clear_smmu;
+   }
+
if (smmu_domain->non_strict)
pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
 
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h 
b/drivers/iommu/arm/arm-smmu/arm-smmu.h
index d890a4a968e8..83294516ac08 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
@@ -386,7 +386,8 @@ struct arm_smmu_impl {
u64 val);
int (*cfg_probe)(struct arm_smmu_device *smmu);
int (*reset)(struct arm_smmu_device *smmu);
-   int (*init_context)(struct arm_smmu_domain *smmu_domain);
+   int (*init_context)(struct arm_smmu_domain *smmu_domain,
+   struct io_pgtable_cfg *cfg);
void (*tlb_sync)(struct arm_smmu_device *smmu, int page, int sync,
 int status);
int (*def_domain_type)(struct device *dev);
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v17 11/20] drm/msm: Show process names in gem_describe

2020-09-05 Thread Rob Clark
From: Rob Clark 

In $debugfs/gem we already show any vma(s) associated with an object.
Also show process names if the vma's address space is a per-process
address space.

Signed-off-by: Rob Clark 
Reviewed-by: Jordan Crouse 
Reviewed-by: Bjorn Andersson 
---
 drivers/gpu/drm/msm/msm_drv.c |  2 +-
 drivers/gpu/drm/msm/msm_gem.c | 25 +
 drivers/gpu/drm/msm/msm_gem.h |  5 +
 drivers/gpu/drm/msm/msm_gem_vma.c |  1 +
 drivers/gpu/drm/msm/msm_gpu.c |  8 +---
 drivers/gpu/drm/msm/msm_gpu.h |  2 +-
 6 files changed, 34 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index 7e963f707852..7143756b7e83 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -597,7 +597,7 @@ static int context_init(struct drm_device *dev, struct 
drm_file *file)
kref_init(>ref);
msm_submitqueue_init(dev, ctx);
 
-   ctx->aspace = msm_gpu_create_private_address_space(priv->gpu);
+   ctx->aspace = msm_gpu_create_private_address_space(priv->gpu, current);
file->driver_priv = ctx;
 
return 0;
diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 3cb7aeb93fd3..76a6c5271e57 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -842,11 +842,28 @@ void msm_gem_describe(struct drm_gem_object *obj, struct 
seq_file *m)
 
seq_puts(m, "  vmas:");
 
-   list_for_each_entry(vma, _obj->vmas, list)
-   seq_printf(m, " [%s: %08llx,%s,inuse=%d]",
-   vma->aspace != NULL ? vma->aspace->name : NULL,
-   vma->iova, vma->mapped ? "mapped" : "unmapped",
+   list_for_each_entry(vma, _obj->vmas, list) {
+   const char *name, *comm;
+   if (vma->aspace) {
+   struct msm_gem_address_space *aspace = 
vma->aspace;
+   struct task_struct *task =
+   get_pid_task(aspace->pid, PIDTYPE_PID);
+   if (task) {
+   comm = kstrdup(task->comm, GFP_KERNEL);
+   } else {
+   comm = NULL;
+   }
+   name = aspace->name;
+   } else {
+   name = comm = NULL;
+   }
+   seq_printf(m, " [%s%s%s: aspace=%p, 
%08llx,%s,inuse=%d]",
+   name, comm ? ":" : "", comm ? comm : "",
+   vma->aspace, vma->iova,
+   vma->mapped ? "mapped" : "unmapped",
vma->inuse);
+   kfree(comm);
+   }
 
seq_puts(m, "\n");
}
diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
index 9c573c4269cb..7b1c7a5f8eef 100644
--- a/drivers/gpu/drm/msm/msm_gem.h
+++ b/drivers/gpu/drm/msm/msm_gem.h
@@ -24,6 +24,11 @@ struct msm_gem_address_space {
spinlock_t lock; /* Protects drm_mm node allocation/removal */
struct msm_mmu *mmu;
struct kref kref;
+
+   /* For address spaces associated with a specific process, this
+* will be non-NULL:
+*/
+   struct pid *pid;
 };
 
 struct msm_gem_vma {
diff --git a/drivers/gpu/drm/msm/msm_gem_vma.c 
b/drivers/gpu/drm/msm/msm_gem_vma.c
index 29cc1305cf37..80a8a266d68f 100644
--- a/drivers/gpu/drm/msm/msm_gem_vma.c
+++ b/drivers/gpu/drm/msm/msm_gem_vma.c
@@ -17,6 +17,7 @@ msm_gem_address_space_destroy(struct kref *kref)
drm_mm_takedown(>mm);
if (aspace->mmu)
aspace->mmu->funcs->destroy(aspace->mmu);
+   put_pid(aspace->pid);
kfree(aspace);
 }
 
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index 9f1bd17dfa47..59eed0fb12fc 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -825,10 +825,9 @@ static int get_clocks(struct platform_device *pdev, struct 
msm_gpu *gpu)
 
 /* Return a new address space for a msm_drm_private instance */
 struct msm_gem_address_space *
-msm_gpu_create_private_address_space(struct msm_gpu *gpu)
+msm_gpu_create_private_address_space(struct msm_gpu *gpu, struct task_struct 
*task)
 {
struct msm_gem_address_space *aspace = NULL;
-
if (!gpu)
return NULL;
 
@@ -836,8 +835,11 @@ msm_gpu_create_private_address_space(struct msm_gpu *gpu)
 * If the target doesn't support private address spaces then return
 * the global one
 */
-   if (gpu->funcs->create_private_address_space)
+   if (gpu->funcs->create_private_address_space) {
aspace = gpu->funcs->create_private_address_space(gpu);
+   if 

[PATCH v17 15/20] iommu/arm-smmu: Constify some helpers

2020-09-05 Thread Rob Clark
From: Rob Clark 

Sprinkle a few `const`s where helpers don't need write access.

Signed-off-by: Rob Clark 
Reviewed-by: Bjorn Andersson 
---
 drivers/iommu/arm/arm-smmu/arm-smmu.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h 
b/drivers/iommu/arm/arm-smmu/arm-smmu.h
index 9aaacc906597..1a746476927c 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
@@ -377,7 +377,7 @@ struct arm_smmu_master_cfg {
s16 smendx[];
 };
 
-static inline u32 arm_smmu_lpae_tcr(struct io_pgtable_cfg *cfg)
+static inline u32 arm_smmu_lpae_tcr(const struct io_pgtable_cfg *cfg)
 {
u32 tcr = FIELD_PREP(ARM_SMMU_TCR_TG0, cfg->arm_lpae_s1_cfg.tcr.tg) |
FIELD_PREP(ARM_SMMU_TCR_SH0, cfg->arm_lpae_s1_cfg.tcr.sh) |
@@ -398,13 +398,13 @@ static inline u32 arm_smmu_lpae_tcr(struct io_pgtable_cfg 
*cfg)
return tcr;
 }
 
-static inline u32 arm_smmu_lpae_tcr2(struct io_pgtable_cfg *cfg)
+static inline u32 arm_smmu_lpae_tcr2(const struct io_pgtable_cfg *cfg)
 {
return FIELD_PREP(ARM_SMMU_TCR2_PASIZE, cfg->arm_lpae_s1_cfg.tcr.ips) |
   FIELD_PREP(ARM_SMMU_TCR2_SEP, ARM_SMMU_TCR2_SEP_UPSTREAM);
 }
 
-static inline u32 arm_smmu_lpae_vtcr(struct io_pgtable_cfg *cfg)
+static inline u32 arm_smmu_lpae_vtcr(const struct io_pgtable_cfg *cfg)
 {
return ARM_SMMU_VTCR_RES1 |
   FIELD_PREP(ARM_SMMU_VTCR_PS, cfg->arm_lpae_s2_cfg.vtcr.ps) |
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v17 02/20] drm/msm: Add private interface for adreno-smmu

2020-09-05 Thread Rob Clark
From: Rob Clark 

This interface will be used for drm/msm to coordinate with the
qcom_adreno_smmu_impl to enable/disable TTBR0 translation.

Once TTBR0 translation is enabled, the GPU's CP (Command Processor)
will directly switch TTBR0 pgtables (and do the necessary TLB inv)
synchronized to the GPU's operation.  But help from the SMMU driver
is needed to initially bootstrap TTBR0 translation, which cannot be
done from the GPU.

Since this is a very special case, a private interface is used to
avoid adding highly driver specific things to the public iommu
interface.

Signed-off-by: Rob Clark 
Reviewed-by: Jordan Crouse 
Reviewed-by: Bjorn Andersson 
---
 include/linux/adreno-smmu-priv.h | 36 
 1 file changed, 36 insertions(+)
 create mode 100644 include/linux/adreno-smmu-priv.h

diff --git a/include/linux/adreno-smmu-priv.h b/include/linux/adreno-smmu-priv.h
new file mode 100644
index ..a889f28afb42
--- /dev/null
+++ b/include/linux/adreno-smmu-priv.h
@@ -0,0 +1,36 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 Google, Inc
+ */
+
+#ifndef __ADRENO_SMMU_PRIV_H
+#define __ADRENO_SMMU_PRIV_H
+
+#include 
+
+/**
+ * struct adreno_smmu_priv - private interface between adreno-smmu and GPU
+ *
+ * @cookie:An opque token provided by adreno-smmu and passed
+ * back into the callbacks
+ * @get_ttbr1_cfg: Get the TTBR1 config for the GPUs context-bank
+ * @set_ttbr0_cfg: Set the TTBR0 config for the GPUs context bank.  A
+ * NULL config disables TTBR0 translation, otherwise
+ * TTBR0 translation is enabled with the specified cfg
+ *
+ * The GPU driver (drm/msm) and adreno-smmu work together for controlling
+ * the GPU's SMMU instance.  This is by necessity, as the GPU is directly
+ * updating the SMMU for context switches, while on the other hand we do
+ * not want to duplicate all of the initial setup logic from arm-smmu.
+ *
+ * This private interface is used for the two drivers to coordinate.  The
+ * cookie and callback functions are populated when the GPU driver attaches
+ * it's domain.
+ */
+struct adreno_smmu_priv {
+const void *cookie;
+const struct io_pgtable_cfg *(*get_ttbr1_cfg)(const void *cookie);
+int (*set_ttbr0_cfg)(const void *cookie, const struct io_pgtable_cfg *cfg);
+};
+
+#endif /* __ADRENO_SMMU_PRIV_H */
\ No newline at end of file
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v17 03/20] drm/msm/gpu: Add dev_to_gpu() helper

2020-09-05 Thread Rob Clark
From: Rob Clark 

In a later patch, the drvdata will not directly be 'struct msm_gpu *',
so add a helper to reduce the churn.

Signed-off-by: Rob Clark 
Reviewed-by: Jordan Crouse 
Reviewed-by: Bjorn Andersson 
---
 drivers/gpu/drm/msm/adreno/adreno_device.c | 10 --
 drivers/gpu/drm/msm/msm_gpu.c  |  6 +++---
 drivers/gpu/drm/msm/msm_gpu.h  |  5 +
 3 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c 
b/drivers/gpu/drm/msm/adreno/adreno_device.c
index 9eeb46bf2a5d..26664e1b30c0 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_device.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
@@ -282,7 +282,7 @@ struct msm_gpu *adreno_load_gpu(struct drm_device *dev)
int ret;
 
if (pdev)
-   gpu = platform_get_drvdata(pdev);
+   gpu = dev_to_gpu(>dev);
 
if (!gpu) {
dev_err_once(dev->dev, "no GPU device was found\n");
@@ -425,7 +425,7 @@ static int adreno_bind(struct device *dev, struct device 
*master, void *data)
 static void adreno_unbind(struct device *dev, struct device *master,
void *data)
 {
-   struct msm_gpu *gpu = dev_get_drvdata(dev);
+   struct msm_gpu *gpu = dev_to_gpu(dev);
 
pm_runtime_force_suspend(dev);
gpu->funcs->destroy(gpu);
@@ -490,16 +490,14 @@ static const struct of_device_id dt_match[] = {
 #ifdef CONFIG_PM
 static int adreno_resume(struct device *dev)
 {
-   struct platform_device *pdev = to_platform_device(dev);
-   struct msm_gpu *gpu = platform_get_drvdata(pdev);
+   struct msm_gpu *gpu = dev_to_gpu(dev);
 
return gpu->funcs->pm_resume(gpu);
 }
 
 static int adreno_suspend(struct device *dev)
 {
-   struct platform_device *pdev = to_platform_device(dev);
-   struct msm_gpu *gpu = platform_get_drvdata(pdev);
+   struct msm_gpu *gpu = dev_to_gpu(dev);
 
return gpu->funcs->pm_suspend(gpu);
 }
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index 57ddc9438351..4c67aedc5c33 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -24,7 +24,7 @@
 static int msm_devfreq_target(struct device *dev, unsigned long *freq,
u32 flags)
 {
-   struct msm_gpu *gpu = platform_get_drvdata(to_platform_device(dev));
+   struct msm_gpu *gpu = dev_to_gpu(dev);
struct dev_pm_opp *opp;
 
opp = devfreq_recommended_opp(dev, freq, flags);
@@ -45,7 +45,7 @@ static int msm_devfreq_target(struct device *dev, unsigned 
long *freq,
 static int msm_devfreq_get_dev_status(struct device *dev,
struct devfreq_dev_status *status)
 {
-   struct msm_gpu *gpu = platform_get_drvdata(to_platform_device(dev));
+   struct msm_gpu *gpu = dev_to_gpu(dev);
ktime_t time;
 
if (gpu->funcs->gpu_get_freq)
@@ -64,7 +64,7 @@ static int msm_devfreq_get_dev_status(struct device *dev,
 
 static int msm_devfreq_get_cur_freq(struct device *dev, unsigned long *freq)
 {
-   struct msm_gpu *gpu = platform_get_drvdata(to_platform_device(dev));
+   struct msm_gpu *gpu = dev_to_gpu(dev);
 
if (gpu->funcs->gpu_get_freq)
*freq = gpu->funcs->gpu_get_freq(gpu);
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index 37cffac4cbe3..da1ae2263047 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -144,6 +144,11 @@ struct msm_gpu {
bool hw_apriv;
 };
 
+static inline struct msm_gpu *dev_to_gpu(struct device *dev)
+{
+   return dev_get_drvdata(dev);
+}
+
 /* It turns out that all targets use the same ringbuffer size */
 #define MSM_GPU_RINGBUFFER_SZ SZ_32K
 #define MSM_GPU_RINGBUFFER_BLKSIZE 32
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v17 05/20] drm/msm: Add a context pointer to the submitqueue

2020-09-05 Thread Rob Clark
From: Jordan Crouse 

Each submitqueue is attached to a context. Add a pointer to the
context to the submitqueue at create time and refcount it so
that it stays around through the life of the queue.

Co-developed-by: Rob Clark 
Signed-off-by: Jordan Crouse 
Signed-off-by: Rob Clark 
Reviewed-by: Bjorn Andersson 
---
 drivers/gpu/drm/msm/msm_drv.c |  3 ++-
 drivers/gpu/drm/msm/msm_drv.h | 20 
 drivers/gpu/drm/msm/msm_gem.h |  1 +
 drivers/gpu/drm/msm/msm_gem_submit.c  |  6 +++---
 drivers/gpu/drm/msm/msm_gpu.h |  1 +
 drivers/gpu/drm/msm/msm_submitqueue.c |  3 +++
 6 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index 79333842f70a..75cd7639f560 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -594,6 +594,7 @@ static int context_init(struct drm_device *dev, struct 
drm_file *file)
if (!ctx)
return -ENOMEM;
 
+   kref_init(>ref);
msm_submitqueue_init(dev, ctx);
 
ctx->aspace = priv->gpu ? priv->gpu->aspace : NULL;
@@ -615,7 +616,7 @@ static int msm_open(struct drm_device *dev, struct drm_file 
*file)
 static void context_close(struct msm_file_private *ctx)
 {
msm_submitqueue_close(ctx);
-   kfree(ctx);
+   msm_file_private_put(ctx);
 }
 
 static void msm_postclose(struct drm_device *dev, struct drm_file *file)
diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
index af259b0573ea..4561bfb5e745 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -57,6 +57,7 @@ struct msm_file_private {
struct list_head submitqueues;
int queueid;
struct msm_gem_address_space *aspace;
+   struct kref ref;
 };
 
 enum msm_mdp_plane_property {
@@ -428,6 +429,25 @@ void msm_submitqueue_close(struct msm_file_private *ctx);
 
 void msm_submitqueue_destroy(struct kref *kref);
 
+static inline void __msm_file_private_destroy(struct kref *kref)
+{
+   struct msm_file_private *ctx = container_of(kref,
+   struct msm_file_private, ref);
+
+   kfree(ctx);
+}
+
+static inline void msm_file_private_put(struct msm_file_private *ctx)
+{
+   kref_put(>ref, __msm_file_private_destroy);
+}
+
+static inline struct msm_file_private *msm_file_private_get(
+   struct msm_file_private *ctx)
+{
+   kref_get(>ref);
+   return ctx;
+}
 
 #define DBG(fmt, ...) DRM_DEBUG_DRIVER(fmt"\n", ##__VA_ARGS__)
 #define VERB(fmt, ...) if (0) DRM_DEBUG_DRIVER(fmt"\n", ##__VA_ARGS__)
diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
index 972490b14ba5..9c573c4269cb 100644
--- a/drivers/gpu/drm/msm/msm_gem.h
+++ b/drivers/gpu/drm/msm/msm_gem.h
@@ -142,6 +142,7 @@ struct msm_gem_submit {
bool valid; /* true if no cmdstream patching needed */
bool in_rb; /* "sudo" mode, copy cmds into RB */
struct msm_ringbuffer *ring;
+   struct msm_file_private *ctx;
unsigned int nr_cmds;
unsigned int nr_bos;
u32 ident; /* A "identifier" for the submit for logging */
diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c 
b/drivers/gpu/drm/msm/msm_gem_submit.c
index 8cb9aa15ff90..1464b04d25d3 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -27,7 +27,7 @@
 #define BO_PINNED   0x2000
 
 static struct msm_gem_submit *submit_create(struct drm_device *dev,
-   struct msm_gpu *gpu, struct msm_gem_address_space *aspace,
+   struct msm_gpu *gpu,
struct msm_gpu_submitqueue *queue, uint32_t nr_bos,
uint32_t nr_cmds)
 {
@@ -43,7 +43,7 @@ static struct msm_gem_submit *submit_create(struct drm_device 
*dev,
return NULL;
 
submit->dev = dev;
-   submit->aspace = aspace;
+   submit->aspace = queue->ctx->aspace;
submit->gpu = gpu;
submit->fence = NULL;
submit->cmd = (void *)>bos[nr_bos];
@@ -677,7 +677,7 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
}
}
 
-   submit = submit_create(dev, gpu, ctx->aspace, queue, args->nr_bos,
+   submit = submit_create(dev, gpu, queue, args->nr_bos,
args->nr_cmds);
if (!submit) {
ret = -ENOMEM;
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index 1f65aec57a8f..c4ce462c30c5 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -193,6 +193,7 @@ struct msm_gpu_submitqueue {
u32 flags;
u32 prio;
int faults;
+   struct msm_file_private *ctx;
struct list_head node;
struct kref ref;
 };
diff --git a/drivers/gpu/drm/msm/msm_submitqueue.c 
b/drivers/gpu/drm/msm/msm_submitqueue.c
index 90c9d84e6155..c3d206105d28 100644
--- a/drivers/gpu/drm/msm/msm_submitqueue.c
+++ b/drivers/gpu/drm/msm/msm_submitqueue.c

[PATCH v17 01/20] drm/msm: Remove dangling submitqueue references

2020-09-05 Thread Rob Clark
From: Rob Clark 

Currently it doesn't matter, since we free the ctx immediately.  But
when we start refcnt'ing the ctx, we don't want old dangling list
entries to hang around.

Signed-off-by: Rob Clark 
Reviewed-by: Jordan Crouse 
Reviewed-by: Bjorn Andersson 
---
 drivers/gpu/drm/msm/msm_submitqueue.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/msm_submitqueue.c 
b/drivers/gpu/drm/msm/msm_submitqueue.c
index a1d94be7883a..90c9d84e6155 100644
--- a/drivers/gpu/drm/msm/msm_submitqueue.c
+++ b/drivers/gpu/drm/msm/msm_submitqueue.c
@@ -49,8 +49,10 @@ void msm_submitqueue_close(struct msm_file_private *ctx)
 * No lock needed in close and there won't
 * be any more user ioctls coming our way
 */
-   list_for_each_entry_safe(entry, tmp, >submitqueues, node)
+   list_for_each_entry_safe(entry, tmp, >submitqueues, node) {
+   list_del(>node);
msm_submitqueue_put(entry);
+   }
 }
 
 int msm_submitqueue_create(struct drm_device *drm, struct msm_file_private 
*ctx,
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v17 00/20] iommu/arm-smmu + drm/msm: per-process GPU pgtables

2020-09-05 Thread Rob Clark
From: Rob Clark 

NOTE: I have re-ordered the series, and propose that we could merge this
  series in the following order:

   1) 01-11 - merge via drm / msm-next
   2) 12-15 - merge via iommu, no dependency on msm-next pull req
   3) 16-18 - patch 16 has a dependency on 02 and 04, so it would
  need to come post -rc1 or on following cycle, but I
  think it would be unlikely to conflict with other
  arm-smmu patches (other than Bjorn's smmu handover
  series?)
   4) 19-20 - dt bits should be safe to land in any order without
  breaking anything



This series adds an Adreno SMMU implementation to arm-smmu to allow GPU hardware
pagetable switching.

The Adreno GPU has built in capabilities to switch the TTBR0 pagetable during
runtime to allow each individual instance or application to have its own
pagetable.  In order to take advantage of the HW capabilities there are certain
requirements needed of the SMMU hardware.

This series adds support for an Adreno specific arm-smmu implementation. The new
implementation 1) ensures that the GPU domain is always assigned context bank 0,
2) enables split pagetable support (TTBR1) so that the instance specific
pagetable can be swapped while the global memory remains in place and 3) shares
the current pagetable configuration with the GPU driver to allow it to create
its own io-pgtable instances.

The series then adds the drm/msm code to enable these features. For targets that
support it allocate new pagetables using the io-pgtable configuration shared by
the arm-smmu driver and swap them in during runtime.

This version of the series merges the previous patchset(s) [1] and [2]
with the following improvements:

v17: (Respin by Rob)
  - Squash cleanup from Bjorn into 14/20
  - Small fix in 10/20 for issue found in testing
v16: (Respin by Rob)
  - Fix indentation
  - Re-order series to split drm and iommu parts
v15: (Respin by Rob)
  - Adjust dt bindings to keep SoC specific compatible (Doug)
  - Add dts workaround for cheza fw limitation
  - Add missing 'select IOMMU_IO_PGTABLE' (Guenter)
v14: (Respin by Rob)
  - Minor update to 16/20 (only force ASID to zero in one place)
  - Addition of sc7180 dtsi patch.
v13: (Respin by Rob)
  - Switch to a private interface between adreno-smmu and GPU driver,
dropping the custom domain attr (Will Deacon)
  - Rework the SCTLR.HUPCF patch to add new fields in smmu_domain->cfg
rather than adding new impl hook (Will Deacon)
  - Drop for_each_cfg_sme() in favor of plain for() loop (Will Deacon)
  - Fix context refcnt'ing issue which was causing problems with GPU
crash recover stress testing.
  - Spiff up $debugfs/gem to show process information associated with
VMAs
v12:
  - Nitpick cleanups in gpu/drm/msm/msm_iommu.c (Rob Clark)
  - Reorg in gpu/drm/msm/msm_gpu.c (Rob Clark)
  - Use the default asid for the context bank so that iommu_tlb_flush_all works
  - Flush the UCHE after a page switch
  - Add the SCTLR.HUPCF patch at the end of the series
v11:
  - Add implementation specific get_attr/set_attr functions (per Rob Clark)
  - Fix context bank allocation (per Bjorn Andersson)
v10:
  - arm-smmu: add implementation hook to allocate context banks
  - arm-smmu: Match the GPU domain by stream ID instead of compatible string
  - arm-smmu: Make DOMAIN_ATTR_PGTABLE_CFG bi-directional. The leaf driver
queries the configuration to create a pagetable and then sends the newly
created configuration back to the smmu-driver to enable TTBR0
  - drm/msm: Add context reference counting for submissions
  - drm/msm: Use dummy functions to skip TLB operations on per-instance
pagetables

[1] https://lists.linuxfoundation.org/pipermail/iommu/2020-June/045653.html
[2] https://lists.linuxfoundation.org/pipermail/iommu/2020-June/045659.html

Jordan Crouse (12):
  drm/msm: Add a context pointer to the submitqueue
  drm/msm: Drop context arg to gpu->submit()
  drm/msm: Set the global virtual address range from the IOMMU domain
  drm/msm: Add support to create a local pagetable
  drm/msm: Add support for private address space instances
  drm/msm/a6xx: Add support for per-instance pagetables
  iommu/arm-smmu: Pass io-pgtable config to implementation specific
function
  iommu/arm-smmu: Add support for split pagetables
  iommu/arm-smmu: Prepare for the adreno-smmu implementation
  iommu/arm-smmu-qcom: Add implementation for the adreno GPU SMMU
  dt-bindings: arm-smmu: Add compatible string for Adreno GPU SMMU
  arm: dts: qcom: sm845: Set the compatible string for the GPU SMMU

Rob Clark (8):
  drm/msm: Remove dangling submitqueue references
  drm/msm: Add private interface for adreno-smmu
  drm/msm/gpu: Add dev_to_gpu() helper
  drm/msm: Set adreno_smmu as gpu's drvdata
  drm/msm: Show process names in gem_describe
  iommu/arm-smmu: Constify some helpers
  iommu/arm-smmu: Add a way for implementations to influence SCTLR
  arm: dts: 

[PATCH v17 04/20] drm/msm: Set adreno_smmu as gpu's drvdata

2020-09-05 Thread Rob Clark
From: Rob Clark 

This will be populated by adreno-smmu, to provide a way for coordinating
enabling/disabling TTBR0 translation.

Signed-off-by: Rob Clark 
Reviewed-by: Jordan Crouse 
Reviewed-by: Bjorn Andersson 
---
 drivers/gpu/drm/msm/adreno/adreno_device.c | 2 --
 drivers/gpu/drm/msm/msm_gpu.c  | 2 +-
 drivers/gpu/drm/msm/msm_gpu.h  | 6 +-
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c 
b/drivers/gpu/drm/msm/adreno/adreno_device.c
index 26664e1b30c0..58e03b20e1c7 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_device.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
@@ -417,8 +417,6 @@ static int adreno_bind(struct device *dev, struct device 
*master, void *data)
return PTR_ERR(gpu);
}
 
-   dev_set_drvdata(dev, gpu);
-
return 0;
 }
 
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index 4c67aedc5c33..144dd63e747e 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -892,7 +892,7 @@ int msm_gpu_init(struct drm_device *drm, struct 
platform_device *pdev,
gpu->gpu_cx = NULL;
 
gpu->pdev = pdev;
-   platform_set_drvdata(pdev, gpu);
+   platform_set_drvdata(pdev, >adreno_smmu);
 
msm_devfreq_init(gpu);
 
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index da1ae2263047..1f65aec57a8f 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -7,6 +7,7 @@
 #ifndef __MSM_GPU_H__
 #define __MSM_GPU_H__
 
+#include 
 #include 
 #include 
 #include 
@@ -74,6 +75,8 @@ struct msm_gpu {
struct platform_device *pdev;
const struct msm_gpu_funcs *funcs;
 
+   struct adreno_smmu_priv adreno_smmu;
+
/* performance counters (hw & sw): */
spinlock_t perf_lock;
bool perfcntr_active;
@@ -146,7 +149,8 @@ struct msm_gpu {
 
 static inline struct msm_gpu *dev_to_gpu(struct device *dev)
 {
-   return dev_get_drvdata(dev);
+   struct adreno_smmu_priv *adreno_smmu = dev_get_drvdata(dev);
+   return container_of(adreno_smmu, struct msm_gpu, adreno_smmu);
 }
 
 /* It turns out that all targets use the same ringbuffer size */
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v16 14/20] iommu/arm-smmu: Prepare for the adreno-smmu implementation

2020-09-05 Thread Rob Clark
On Fri, Sep 4, 2020 at 9:00 AM Bjorn Andersson
 wrote:
>
> On Tue 01 Sep 11:46 CDT 2020, Rob Clark wrote:
>
> > From: Jordan Crouse 
> >
> > Do a bit of prep work to add the upcoming adreno-smmu implementation.
> >
> > Add an hook to allow the implementation to choose which context banks
> > to allocate.
> >
> > Move some of the common structs to arm-smmu.h in anticipation of them
> > being used by the implementations and update some of the existing hooks
> > to pass more information that the implementation will need.
> >
> > These modifications will be used by the upcoming Adreno SMMU
> > implementation to identify the GPU device and properly configure it
> > for pagetable switching.
> >
> > Co-developed-by: Rob Clark 
> > Signed-off-by: Jordan Crouse 
> > Signed-off-by: Rob Clark 
>
> As I built the handoff support on top of this patch I ended up
> reworking the alloc_context_bank() prototype to something I found a
> little bit cleaner.
>
> So perhaps you would be interested in squashing
> https://lore.kernel.org/linux-arm-msm/20200904155513.282067-2-bjorn.anders...@linaro.org/
> into this patch?

Yeah, I think this looks nicer, thanks

BR,
-R

> Otherwise, feel free to add my:
>
> Reviewed-by: Bjorn Andersson 
>
> Regards,
> Bjorn
>
> > ---
> >  drivers/iommu/arm/arm-smmu/arm-smmu-impl.c |  2 +-
> >  drivers/iommu/arm/arm-smmu/arm-smmu.c  | 69 ++
> >  drivers/iommu/arm/arm-smmu/arm-smmu.h  | 51 +++-
> >  3 files changed, 68 insertions(+), 54 deletions(-)
> >
> > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c 
> > b/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
> > index a9861dcd0884..88f17cc33023 100644
> > --- a/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
> > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
> > @@ -69,7 +69,7 @@ static int cavium_cfg_probe(struct arm_smmu_device *smmu)
> >  }
> >
> >  static int cavium_init_context(struct arm_smmu_domain *smmu_domain,
> > - struct io_pgtable_cfg *pgtbl_cfg)
> > + struct io_pgtable_cfg *pgtbl_cfg, struct device *dev)
> >  {
> >   struct cavium_smmu *cs = container_of(smmu_domain->smmu,
> > struct cavium_smmu, smmu);
> > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
> > b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > index 8e884e58f208..68b7b9e6140e 100644
> > --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> > @@ -65,41 +65,10 @@ module_param(disable_bypass, bool, S_IRUGO);
> >  MODULE_PARM_DESC(disable_bypass,
> >   "Disable bypass streams such that incoming transactions from devices 
> > that are not attached to an iommu domain will report an abort back to the 
> > device and will not be allowed to pass through the SMMU.");
> >
> > -struct arm_smmu_s2cr {
> > - struct iommu_group  *group;
> > - int count;
> > - enum arm_smmu_s2cr_type type;
> > - enum arm_smmu_s2cr_privcfg  privcfg;
> > - u8  cbndx;
> > -};
> > -
> >  #define s2cr_init_val (struct arm_smmu_s2cr){  
> >   \
> >   .type = disable_bypass ? S2CR_TYPE_FAULT : S2CR_TYPE_BYPASS,\
> >  }
> >
> > -struct arm_smmu_smr {
> > - u16 mask;
> > - u16 id;
> > - boolvalid;
> > -};
> > -
> > -struct arm_smmu_cb {
> > - u64 ttbr[2];
> > - u32 tcr[2];
> > - u32 mair[2];
> > - struct arm_smmu_cfg *cfg;
> > -};
> > -
> > -struct arm_smmu_master_cfg {
> > - struct arm_smmu_device  *smmu;
> > - s16 smendx[];
> > -};
> > -#define INVALID_SMENDX   -1
> > -#define cfg_smendx(cfg, fw, i) \
> > - (i >= fw->num_ids ? INVALID_SMENDX : cfg->smendx[i])
> > -#define for_each_cfg_sme(cfg, fw, i, idx) \
> > - for (i = 0; idx = cfg_smendx(cfg, fw, i), i < fw->num_ids; ++i)
> > -
> >  static bool using_legacy_binding, using_generic_binding;
> >
> >  static inline int arm_smmu_rpm_get(struct arm_smmu_device *smmu)
> > @@ -234,19 +203,6 @@ static int arm_smmu_register_legacy_master(struct 
> > device *dev,
> >  }
> >  #endif /* CONFIG_ARM_SMMU_LEGACY_DT_BINDINGS */
> >
> > -static int __arm_smmu_alloc_bitmap(unsigned long *map, int start, int end)
> > -{
> > - int idx;
> > -
> > - do {
> > - idx = find_next_zero_bit(map, end, start);
> > - if (idx == end)
> > - return -ENOSPC;
> > - } while (test_and_set_bit(idx, map));
> > -
> > - return idx;
> > -}
> > -
> >  static void __arm_smmu_free_bitmap(unsigned long *map, int idx)
> >  {
> >   clear_bit(idx, map);
> > @@ -578,7 +534,7 @@ static void arm_smmu_init_context_bank(struct 
> > arm_smmu_domain *smmu_domain,
> >   }
> >  }
> >
> > 

Re: [PATCH v7 3/9] docs: x86: Add documentation for SVA (Shared Virtual Addressing)

2020-09-05 Thread Randy Dunlap
Hi,

I'll add a few edits other than those that Borislav made.
(nice review job, BP)


On 8/27/20 8:06 AM, Fenghua Yu wrote:
> From: Ashok Raj 
> 
> ENQCMD and Data Streaming Accelerator (DSA) and all of their associated
> features are a complicated stack with lots of interconnected pieces.
> This documentation provides a big picture overview for all of the
> features.
> 
> Signed-off-by: Ashok Raj 
> Co-developed-by: Fenghua Yu 
> Signed-off-by: Fenghua Yu 
> Reviewed-by: Tony Luck 
> ---
> v7:
> - Change the doc for updating PASID by IPI and context switch (Andy).
> 
> v3:
> - Replace deprecated intel_svm_bind_mm() by iommu_sva_bind_mm() (Baolu)
> - Fix a couple of typos (Baolu)
> 
> v2:
> - Fix the doc format and add the doc in toctree (Thomas)
> - Modify the doc for better description (Thomas, Tony, Dave)
> 
>  Documentation/x86/index.rst |   1 +
>  Documentation/x86/sva.rst   | 254 
>  2 files changed, 255 insertions(+)
>  create mode 100644 Documentation/x86/sva.rst


> diff --git a/Documentation/x86/sva.rst b/Documentation/x86/sva.rst
> new file mode 100644
> index ..6e7ac565e127
> --- /dev/null
> +++ b/Documentation/x86/sva.rst
> @@ -0,0 +1,254 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +===
> +Shared Virtual Addressing (SVA) with ENQCMD
> +===
> +
> +Background
> +==
> +

...

> +
> +Shared Hardware Workqueues
> +==
> +
> +Unlike Single Root I/O Virtualization (SRIOV), Scalable IOV (SIOV) permits
> +the use of Shared Work Queues (SWQ) by both applications and Virtual
> +Machines (VM's). This allows better hardware utilization vs. hard
> +partitioning resources that could result in under utilization. In order to
> +allow the hardware to distinguish the context for which work is being
> +executed in the hardware by SWQ interface, SIOV uses Process Address Space
> +ID (PASID), which is a 20bit number defined by the PCIe SIG.

  20-bit

> +
> +PASID value is encoded in all transactions from the device. This allows the
> +IOMMU to track I/O on a per-PASID granularity in addition to using the PCIe
> +Resource Identifier (RID) which is the Bus/Device/Function.
> +
> +
> +ENQCMD
> +==
> +

...

> +
> +Process Address Space Tagging
> +=
> +

...

> +
> +PASID Management
> +
> +

...

> +
> +Relationships
> +=
> +
> + * Each process has many threads, but only one PASID

   (end with) PASID.

> + * Devices have a limited number (~10's to 1000's) of hardware
> +   workqueues and each portal maps down to a single workqueue.
> +   The device driver manages allocating hardware workqueues.
> + * A single mmap() maps a single hardware workqueue as a "portal"

 (end with)  .

> + * For each device with which a process interacts, there must be
> +   one or more mmap()'d portals.
> + * Many threads within a process can share a single portal to access
> +   a single device.
> + * Multiple processes can separately mmap() the same portal, in
> +   which case they still share one device hardware workqueue.
> + * The single process-wide PASID is used by all threads to interact
> +   with all devices.  There is not, for instance, a PASID for each
> +   thread or each thread<->device pair.
> +
> +FAQ
> +===
> +
> +* What is SVA/SVM?
> +
> +Shared Virtual Addressing (SVA) permits I/O hardware and the processor to
> +work in the same address space. In short, sharing the address space. Some
> +call it Shared Virtual Memory (SVM), but Linux community wanted to avoid

waned to avoid 
confusing

> +it with Posix Shared Memory and Secure Virtual Machines which were terms

   POSIX

> +already in circulation.
> +
> +* What is a PASID?
> +
> +A Process Address Space ID (PASID) is a PCIe-defined TLP Prefix. A PASID is

ah, BP already commented about using acronyms to define acronyms.  :)

> +a 20 bit number allocated and managed by the OS. PASID is included in all

 20-bit

> +transactions between the platform and the device.
> +
> +* How are shared work queues different?
> +
> +Traditionally to allow user space applications interact with hardware,
> +there is a separate instance required per process. For example, consider
> +doorbells as a mechanism of informing hardware about work to process. Each
> +doorbell is required to be spaced 4k (or page-size) apart for process
> +isolation. This requires hardware to provision that space and reserve in

 reserve it in

> +MMIO. This doesn't scale as the number of threads becomes quite large. The
> +hardware also manages the queue depth for Shared Work Queues (SWQ), and
> +consumers don't need to track queue depth. If 

Re: [PATCH 5/5] powerpc: use the generic dma_ops_bypass mode

2020-09-05 Thread Alexey Kardashevskiy



On 31/08/2020 16:40, Christoph Hellwig wrote:

On Sun, Aug 30, 2020 at 11:04:21AM +0200, Cédric Le Goater wrote:

Hello,

On 7/8/20 5:24 PM, Christoph Hellwig wrote:

Use the DMA API bypass mechanism for direct window mappings.  This uses
common code and speed up the direct mapping case by avoiding indirect
calls just when not using dma ops at all.  It also fixes a problem where
the sync_* methods were using the bypass check for DMA allocations, but
those are part of the streaming ops.

Note that this patch loses the DMA_ATTR_WEAK_ORDERING override, which
has never been well defined, as is only used by a few drivers, which
IIRC never showed up in the typical Cell blade setups that are affected
by the ordering workaround.

Fixes: efd176a04bef ("powerpc/pseries/dma: Allow SWIOTLB")
Signed-off-by: Christoph Hellwig 
---
  arch/powerpc/Kconfig  |  1 +
  arch/powerpc/include/asm/device.h |  5 --
  arch/powerpc/kernel/dma-iommu.c   | 90 ---
  3 files changed, 10 insertions(+), 86 deletions(-)


I am seeing corruptions on a couple of POWER9 systems (boston) when
stressed with IO. stress-ng gives some results but I have first seen
it when compiling the kernel in a guest and this is still the best way
to raise the issue.

These systems have of a SAS Adaptec controller :

   0003:01:00.0 Serial Attached SCSI controller: Adaptec Series 8 12G SAS/PCIe 
3 (rev 01)

When the failure occurs, the POWERPC EEH interrupt fires and dumps
lowlevel PHB4 registers among which :

   [ 2179.251069490,3] PHB#0003[0:3]:   phbErrorStatus = 
0280
   [ 2179.251117476,3] PHB#0003[0:3]:  phbFirstErrorStatus = 
0200

The bits raised identify a PPC 'TCE' error, which means it is related
to DMAs. See below for more details.


Reverting this patch "fixes" the issue but it is probably else where,
in some other layers or in the aacraid driver. How should I proceed
to get more information ?


The aacraid DMA masks look like a mess.



It kinds does and is. The thing is that after f1565c24b596 the driver 
sets 32 bit DMA mask which in turn enables the small DMA window (not 
bypass) and since the aacraid driver has at least one bug with double 
unmap of the same DMA handle, this somehow leads to EEH (PCI DMA error).



The driver sets 32but mask because it callis dma_get_required_mask() 
_before_ setting the mask so dma_get_required_mask() does not go the 
dma_alloc_direct() path and calls the powerpc's 
dma_iommu_get_required_mask() which:


1. does the math like this (spot 2 bugs):

mask = 1ULL < (fls_long(tbl->it_offset + tbl->it_size) - 1)

2. but even after fixing that, the driver crashes as f1565c24b596 
removed the call to dma_iommu_bypass_supported() so it enforces IOMMU.



The patch below (the first hunk to be precise) brings the things back to 
where they were (64bit mask). The double unmap bug in the driver is 
still to be investigated.




diff --git a/arch/powerpc/kernel/dma-iommu.c 
b/arch/powerpc/kernel/dma-iommu.c

index 569fecd7b5b2..785abccb90fc 100644
--- a/arch/powerpc/kernel/dma-iommu.c
+++ b/arch/powerpc/kernel/dma-iommu.c
@@ -117,10 +117,18 @@ u64 dma_iommu_get_required_mask(struct device *dev)
struct iommu_table *tbl = get_iommu_table_base(dev);
u64 mask;

+   if (dev_is_pci(dev)) {
+   u64 bypass_mask = dma_direct_get_required_mask(dev);
+
+   if (dma_iommu_bypass_supported(dev, bypass_mask))
+   return bypass_mask;
+   }
+
if (!tbl)
return 0;

-   mask = 1ULL < (fls_long(tbl->it_offset + tbl->it_size) - 1);
+   mask = 1ULL << (fls_long(tbl->it_offset + tbl->it_size) +
+   tbl->it_page_shift - 1);
mask += mask - 1;

return mask;



--
Alexey
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v7 3/9] docs: x86: Add documentation for SVA (Shared Virtual Addressing)

2020-09-05 Thread Borislav Petkov
> Subject: Re: [PATCH v7 3/9] docs: x86: Add documentation for SVA (Shared 
> Virtual Addressing)

Fix prefix: Documentation/x86: ...

On Thu, Aug 27, 2020 at 08:06:28AM -0700, Fenghua Yu wrote:
> From: Ashok Raj 
> 
> ENQCMD and Data Streaming Accelerator (DSA) and all of their associated
> features are a complicated stack with lots of interconnected pieces.
> This documentation provides a big picture overview for all of the
> features.
> 
> Signed-off-by: Ashok Raj 
> Co-developed-by: Fenghua Yu 
> Signed-off-by: Fenghua Yu 
> Reviewed-by: Tony Luck 
> ---
> v7:
> - Change the doc for updating PASID by IPI and context switch (Andy).
> 
> v3:
> - Replace deprecated intel_svm_bind_mm() by iommu_sva_bind_mm() (Baolu)
> - Fix a couple of typos (Baolu)
> 
> v2:
> - Fix the doc format and add the doc in toctree (Thomas)
> - Modify the doc for better description (Thomas, Tony, Dave)
> 
>  Documentation/x86/index.rst |   1 +
>  Documentation/x86/sva.rst   | 254 
>  2 files changed, 255 insertions(+)
>  create mode 100644 Documentation/x86/sva.rst
> 
> diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
> index 265d9e9a093b..e5d5ff096685 100644
> --- a/Documentation/x86/index.rst
> +++ b/Documentation/x86/index.rst
> @@ -30,3 +30,4 @@ x86-specific Documentation
> usb-legacy-support
> i386/index
> x86_64/index
> +   sva
> diff --git a/Documentation/x86/sva.rst b/Documentation/x86/sva.rst
> new file mode 100644
> index ..6e7ac565e127
> --- /dev/null
> +++ b/Documentation/x86/sva.rst
> @@ -0,0 +1,254 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +===
> +Shared Virtual Addressing (SVA) with ENQCMD
> +===
> +
> +Background
> +==
> +
> +Shared Virtual Addressing (SVA) allows the processor and device to use the
> +same virtual addresses avoiding the need for software to translate virtual
> +addresses to physical addresses. SVA is what PCIe calls Shared Virtual
> +Memory (SVM)
   ^
   . <-- Fullstop


> +
> +In addition to the convenience of using application virtual addresses
> +by the device, it also doesn't require pinning pages for DMA.
> +PCIe Address Translation Services (ATS) along with Page Request Interface
> +(PRI) allow devices to function much the same way as the CPU handling
> +application page-faults. For more information please refer to PCIe
^
the

> +specification Chapter 10: ATS Specification.
> +
> +Use of SVA requires IOMMU support in the platform. IOMMU also is required
> +to support PCIe features ATS and PRI. ATS allows devices to cache
> +translations for the virtual address. IOMMU driver uses the mmu_notifier()

... for virtual addresses. The IOMMU driver...

> +support to keep the device tlb cache and the CPU cache in sync. PRI allows

  TLB

> +the device to request paging the virtual address before using if they are
^
> +not paged in the CPU page tables.

That sentence is reading strange and needs fixing.

> +
> +
> +Shared Hardware Workqueues
> +==
> +
> +Unlike Single Root I/O Virtualization (SRIOV), Scalable IOV (SIOV) permits
> +the use of Shared Work Queues (SWQ) by both applications and Virtual
> +Machines (VM's). This allows better hardware utilization vs. hard
> +partitioning resources that could result in under utilization. In order to
> +allow the hardware to distinguish the context for which work is being
> +executed in the hardware by SWQ interface, SIOV uses Process Address Space
> +ID (PASID), which is a 20bit number defined by the PCIe SIG.
> +
> +PASID value is encoded in all transactions from the device. This allows the
> +IOMMU to track I/O on a per-PASID granularity in addition to using the PCIe
> +Resource Identifier (RID) which is the Bus/Device/Function.
> +
> +
> +ENQCMD
> +==
> +
> +ENQCMD is a new instruction on Intel platforms that atomically submits a
> +work descriptor to a device. The descriptor includes the operation to be
> +performed, virtual addresses of all parameters, virtual address of a 
> completion
> +record, and the PASID (process address space ID) of the current process.
> +
> +ENQCMD works with non-posted semantics and carries a status back if the
> +command was accepted by hardware. This allows the submitter to know if the
> +submission needs to be retried or other device specific mechanisms to
> +implement fairness or ensure forward progress can be made.

can/should be provided.

> +
> +ENQCMD is the glue that ensures applications can directly submit commands
> +to the hardware and also permit hardware to be aware of application context

permits

> +to 

[PATCH v2 19/23] iommu/mediatek: Support report iova 34bit translation fault in ISR

2020-09-05 Thread Yong Wu
If the iova is over 32bit, the fault status register bit is a little
different. Add a flag for the special register bits.

Signed-off-by: Yong Wu 
---
 drivers/iommu/mtk_iommu.c | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 9fe9a706742c..4da92e5739f3 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -4,6 +4,7 @@
  * Author: Yong Wu 
  */
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -87,6 +88,9 @@
 #define F_REG_MMU1_FAULT_MASK  GENMASK(13, 7)
 
 #define REG_MMU0_FAULT_VA  0x13c
+#define F_MMU_INVAL_VA_31_12_MASK  GENMASK(31, 12)
+#define F_MMU_INVAL_VA_34_32_MASK  GENMASK(11, 9)
+#define F_MMU_INVAL_PA_34_32_MASK  GENMASK(8, 6)
 #define F_MMU_FAULT_VA_WRITE_BIT   BIT(1)
 #define F_MMU_FAULT_VA_LAYER_BIT   BIT(0)
 
@@ -110,6 +114,7 @@
 #define OUT_ORDER_WR_ENBIT(4)
 #define HAS_SUB_COMM   BIT(5)
 #define WR_THROT_ENBIT(6)
+#define IOVA_34_EN BIT(7)
 
 #define MTK_IOMMU_HAS_FLAG(pdata, _x) \
pdata)->flags) & (_x)) == (_x))
@@ -258,8 +263,9 @@ static irqreturn_t mtk_iommu_isr(int irq, void *dev_id)
 {
struct mtk_iommu_data *data = dev_id;
struct mtk_iommu_domain *dom = data->m4u_dom;
-   u32 int_state, regval, fault_iova, fault_pa;
unsigned int fault_larb, fault_port, sub_comm = 0;
+   u32 int_state, regval, va34_32, pa34_32;
+   u64 fault_iova, fault_pa;
bool layer, write;
 
/* Read error info from registers */
@@ -275,6 +281,14 @@ static irqreturn_t mtk_iommu_isr(int irq, void *dev_id)
}
layer = fault_iova & F_MMU_FAULT_VA_LAYER_BIT;
write = fault_iova & F_MMU_FAULT_VA_WRITE_BIT;
+   if (MTK_IOMMU_HAS_FLAG(data->plat_data, IOVA_34_EN)) {
+   va34_32 = FIELD_GET(F_MMU_INVAL_VA_34_32_MASK, fault_iova);
+   pa34_32 = FIELD_GET(F_MMU_INVAL_PA_34_32_MASK, fault_iova);
+   fault_iova = fault_iova & F_MMU_INVAL_VA_31_12_MASK;
+   fault_iova |=  (u64)va34_32 << 32;
+   fault_pa |= (u64)pa34_32 << 32;
+   }
+
fault_port = F_MMU_INT_ID_PORT_ID(regval);
if (MTK_IOMMU_HAS_FLAG(data->plat_data, HAS_SUB_COMM)) {
fault_larb = F_MMU_INT_ID_COMM_ID(regval);
@@ -288,7 +302,7 @@ static irqreturn_t mtk_iommu_isr(int irq, void *dev_id)
   write ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ)) {
dev_err_ratelimited(
data->dev,
-   "fault type=0x%x iova=0x%x pa=0x%x larb=%d port=%d 
layer=%d %s\n",
+   "fault type=0x%x iova=0x%llx pa=0x%llx larb=%d port=%d 
layer=%d %s\n",
int_state, fault_iova, fault_pa, fault_larb, fault_port,
layer, write ? "write" : "read");
}
-- 
2.18.0
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 23/23] memory: mtk-smi: Add mt8192 support

2020-09-05 Thread Yong Wu
Add mt8192 smi support.

Signed-off-by: Yong Wu 
---
 drivers/memory/mtk-smi.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/drivers/memory/mtk-smi.c b/drivers/memory/mtk-smi.c
index e94c99ca2883..0ec3eff4d92d 100644
--- a/drivers/memory/mtk-smi.c
+++ b/drivers/memory/mtk-smi.c
@@ -261,6 +261,10 @@ static const struct mtk_smi_larb_gen mtk_smi_larb_mt8183 = 
{
  /* IPU0 | IPU1 | CCU */
 };
 
+static const struct mtk_smi_larb_gen mtk_smi_larb_mt8192 = {
+   .config_port= mtk_smi_larb_config_port_gen2_general,
+};
+
 static const struct of_device_id mtk_smi_larb_of_ids[] = {
{
.compatible = "mediatek,mt8173-smi-larb",
@@ -282,6 +286,10 @@ static const struct of_device_id mtk_smi_larb_of_ids[] = {
.compatible = "mediatek,mt8183-smi-larb",
.data = _smi_larb_mt8183
},
+   {
+   .compatible = "mediatek,mt8192-smi-larb",
+   .data = _smi_larb_mt8192
+   },
{}
 };
 
@@ -421,6 +429,13 @@ static const struct mtk_smi_common_plat 
mtk_smi_common_mt8183 = {
F_MMU1_LARB(7),
 };
 
+static const struct mtk_smi_common_plat mtk_smi_common_mt8192 = {
+   .gen  = MTK_SMI_GEN2,
+   .has_gals = true,
+   .bus_sel  = F_MMU1_LARB(1) | F_MMU1_LARB(2) | F_MMU1_LARB(5) |
+   F_MMU1_LARB(6),
+};
+
 static const struct of_device_id mtk_smi_common_of_ids[] = {
{
.compatible = "mediatek,mt8173-smi-common",
@@ -442,6 +457,10 @@ static const struct of_device_id mtk_smi_common_of_ids[] = 
{
.compatible = "mediatek,mt8183-smi-common",
.data = _smi_common_mt8183,
},
+   {
+   .compatible = "mediatek,mt8192-smi-common",
+   .data = _smi_common_mt8192,
+   },
{}
 };
 
-- 
2.18.0
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 20/23] iommu/mediatek: Add support for multi domain

2020-09-05 Thread Yong Wu
Some HW IP(ex: CCU) require the special iova range. That means the
iova got from dma_alloc_attrs for that devices must locate in his
special range. In this patch, we allocate a special iova_range for
each a special requirement and create each a iommu domain for each
a iova_range.

meanwhile we still use one pagetable which support 16GB iova.

After this patch, If the iova range of a master is over 4G, the master
should:
a) Declare its special dma_ranges in its dtsi node. For example, If we
   preassign the iova 4G-8G for vcodec, then the vcodec dtsi node should
   add this:
   /*
* iova start at 0x1__, pa still start at 0x4000_
* size is 0x1__.
*/
   dma-ranges = <0x1 0x0 0x0 0x4000 0x1 0x0>;  /* 4G ~ 8G */
 Note: we don't have a actual bus concept here. the master doesn't have its
 special parent node, thus this dma-ranges can only be put in the master's
 node.

b) Update the dma_mask:
  dma_set_mask_and_coherent(dev, DMA_BIT_MASK(33));

Signed-off-by: Yong Wu 
---
 drivers/iommu/mtk_iommu.c | 47 +++
 drivers/iommu/mtk_iommu.h |  3 ++-
 2 files changed, 40 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 4da92e5739f3..93ce4fa806cf 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -350,6 +350,14 @@ static int mtk_iommu_domain_finalise(struct 
mtk_iommu_domain *dom)
 {
struct mtk_iommu_data *data = mtk_iommu_get_m4u_data();
 
+   /* Use the exist domain as there is only one m4u pgtable here. */
+   if (data->m4u_dom) {
+   dom->iop = data->m4u_dom->iop;
+   dom->cfg = data->m4u_dom->cfg;
+   dom->domain.pgsize_bitmap = data->m4u_dom->cfg.pgsize_bitmap;
+   return 0;
+   }
+
dom->cfg = (struct io_pgtable_cfg) {
.quirks = IO_PGTABLE_QUIRK_ARM_NS |
IO_PGTABLE_QUIRK_NO_PERMS |
@@ -375,6 +383,8 @@ static int mtk_iommu_domain_finalise(struct 
mtk_iommu_domain *dom)
 
 static struct iommu_domain *mtk_iommu_domain_alloc(unsigned type)
 {
+   struct mtk_iommu_data *data = mtk_iommu_get_m4u_data();
+   const struct mtk_iommu_iova_region *region;
struct mtk_iommu_domain *dom;
 
if (type != IOMMU_DOMAIN_DMA)
@@ -390,8 +400,9 @@ static struct iommu_domain *mtk_iommu_domain_alloc(unsigned 
type)
if (mtk_iommu_domain_finalise(dom))
goto  put_dma_cookie;
 
-   dom->domain.geometry.aperture_start = 0;
-   dom->domain.geometry.aperture_end = DMA_BIT_MASK(32);
+   region = data->plat_data->iova_region + data->cur_domid;
+   dom->domain.geometry.aperture_start = region->iova_base;
+   dom->domain.geometry.aperture_end = region->iova_base + region->size - 
1;
dom->domain.geometry.force_aperture = true;
 
return >domain;
@@ -535,19 +546,31 @@ static void mtk_iommu_release_device(struct device *dev)
 static struct iommu_group *mtk_iommu_device_group(struct device *dev)
 {
struct mtk_iommu_data *data = mtk_iommu_get_m4u_data();
+   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
+   struct iommu_group *group;
+   int domid;
 
if (!data)
return ERR_PTR(-ENODEV);
 
-   /* All the client devices are in the same m4u iommu-group */
-   if (!data->m4u_group) {
-   data->m4u_group = iommu_group_alloc();
-   if (IS_ERR(data->m4u_group))
+   domid = MTK_M4U_TO_DOM(fwspec->ids[0]);
+   if (domid >= data->plat_data->iova_region_nr) {
+   dev_err(dev, "iommu domain id(%d/%d) is error.\n", domid,
+   data->plat_data->iova_region_nr);
+   return ERR_PTR(-EINVAL);
+   }
+
+   group = data->m4u_group[domid];
+   if (!group) {
+   group = iommu_group_alloc();
+   if (IS_ERR(group))
dev_err(dev, "Failed to allocate M4U IOMMU group\n");
+   data->m4u_group[domid] = group;
} else {
-   iommu_group_ref_get(data->m4u_group);
+   iommu_group_ref_get(group);
}
-   return data->m4u_group;
+   data->cur_domid = domid;
+   return group;
 }
 
 static int mtk_iommu_of_xlate(struct device *dev, struct of_phandle_args *args)
@@ -576,14 +599,20 @@ static void mtk_iommu_get_resv_regions(struct device *dev,
   struct list_head *head)
 {
struct mtk_iommu_data *data = dev_iommu_priv_get(dev);
-   const struct mtk_iommu_iova_region *resv;
+   const struct mtk_iommu_iova_region *resv, *curdom;
struct iommu_resv_region *region;
int prot = IOMMU_WRITE | IOMMU_READ;
unsigned int i;
 
+   curdom = data->plat_data->iova_region + data->cur_domid;
for (i = 0; i < data->plat_data->iova_region_nr; i++) {
resv = data->plat_data->iova_region + i;
 
+   /* Only 

[PATCH v2 22/23] iommu/mediatek: Add mt8192 support

2020-09-05 Thread Yong Wu
Add mt8192 iommu support.

For multi domain, Add 1M gap for the vdec domain size. That is because
vdec HW has a end address register which require (start_addr +
len) rather than (start_addr + len - 1). Take a example, if the start_addr
is 0xfff0, size is 0x10, then the end_address is 0xfff0 +
0x10 = 0x1  . but the register only is 32bit. thus HW will get
the end address is 0. To avoid this issue, I add 1M gap for this.

Signed-off-by: Yong Wu 
---
 drivers/iommu/mtk_iommu.c | 22 ++
 drivers/iommu/mtk_iommu.h |  1 +
 2 files changed, 23 insertions(+)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index d0593d317240..ebc3767b62ab 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -171,6 +171,16 @@ static const struct mtk_iommu_iova_region single_domain[] 
= {
{.iova_base = 0,.size = SZ_4G},
 };
 
+static const struct mtk_iommu_iova_region mt8192_multi_dom[] = {
+   { .iova_base = 0x0, .size = SZ_4G}, /* disp: 0 ~ 4G 
*/
+   #if IS_ENABLED(CONFIG_ARCH_DMA_ADDR_T_64BIT)
+   { .iova_base = SZ_4G,   .size = SZ_4G - SZ_1M}, /* vdec: 4G ~ 
8G gap: 1M */
+   { .iova_base = SZ_4G * 2,   .size = SZ_4G - SZ_1M}, /* CAM/MDP: 8G 
~ 12G */
+   { .iova_base = 0x24000ULL,  .size = 0x400}, /* CCU0 */
+   { .iova_base = 0x24400ULL,  .size = 0x400}, /* CCU1 */
+   #endif
+};
+
 /*
  * There may be 1 or 2 M4U HWs, But we always expect they are in the same 
domain
  * for the performance.
@@ -967,11 +977,23 @@ static const struct mtk_iommu_plat_data mt8183_data = {
.larbid_remap = {{0}, {4}, {5}, {6}, {7}, {2}, {3}, {1}},
 };
 
+static const struct mtk_iommu_plat_data mt8192_data = {
+   .m4u_plat   = M4U_MT8192,
+   .flags  = HAS_BCLK | HAS_SUB_COMM | OUT_ORDER_WR_EN |
+ WR_THROT_EN | IOVA_34_EN,
+   .inv_sel_reg= REG_MMU_INV_SEL_GEN2,
+   .iova_region= mt8192_multi_dom,
+   .iova_region_nr = ARRAY_SIZE(mt8192_multi_dom),
+   .larbid_remap   = {{0}, {1}, {4, 5}, {7}, {2}, {9, 11, 19, 20},
+  {0, 14, 16}, {0, 13, 18, 17}},
+};
+
 static const struct of_device_id mtk_iommu_of_ids[] = {
{ .compatible = "mediatek,mt2712-m4u", .data = _data},
{ .compatible = "mediatek,mt6779-m4u", .data = _data},
{ .compatible = "mediatek,mt8173-m4u", .data = _data},
{ .compatible = "mediatek,mt8183-m4u", .data = _data},
+   { .compatible = "mediatek,mt8192-m4u", .data = _data},
{}
 };
 
diff --git a/drivers/iommu/mtk_iommu.h b/drivers/iommu/mtk_iommu.h
index 5e346464cdf8..d2702eda25d4 100644
--- a/drivers/iommu/mtk_iommu.h
+++ b/drivers/iommu/mtk_iommu.h
@@ -42,6 +42,7 @@ enum mtk_iommu_plat {
M4U_MT6779,
M4U_MT8173,
M4U_MT8183,
+   M4U_MT8192,
 };
 
 struct mtk_iommu_iova_region;
-- 
2.18.0
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 21/23] iommu/mediatek: Adjust the structure

2020-09-05 Thread Yong Wu
Add "struct mtk_iommu_data *" in the "struct mtk_iommu_domain",
reduce the call mtk_iommu_get_m4u_data().
No functional change.

Signed-off-by: Yong Wu 
---
 drivers/iommu/mtk_iommu.c | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 93ce4fa806cf..d0593d317240 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -123,6 +123,7 @@ struct mtk_iommu_domain {
struct io_pgtable_cfg   cfg;
struct io_pgtable_ops   *iop;
 
+   struct mtk_iommu_data   *data;
struct iommu_domain domain;
 };
 
@@ -348,7 +349,7 @@ static void mtk_iommu_config(struct mtk_iommu_data *data,
 
 static int mtk_iommu_domain_finalise(struct mtk_iommu_domain *dom)
 {
-   struct mtk_iommu_data *data = mtk_iommu_get_m4u_data();
+   struct mtk_iommu_data *data = dom->data;
 
/* Use the exist domain as there is only one m4u pgtable here. */
if (data->m4u_dom) {
@@ -397,6 +398,7 @@ static struct iommu_domain *mtk_iommu_domain_alloc(unsigned 
type)
if (iommu_get_dma_cookie(>domain))
goto  free_dom;
 
+   dom->data = data;
if (mtk_iommu_domain_finalise(dom))
goto  put_dma_cookie;
 
@@ -469,10 +471,9 @@ static int mtk_iommu_map(struct iommu_domain *domain, 
unsigned long iova,
 phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
 {
struct mtk_iommu_domain *dom = to_mtk_domain(domain);
-   struct mtk_iommu_data *data = mtk_iommu_get_m4u_data();
 
/* The "4GB mode" M4U physically can not use the lower remap of Dram. */
-   if (data->enable_4GB)
+   if (dom->data->enable_4GB)
paddr |= BIT_ULL(32);
 
/* Synchronize with the tlb_lock */
@@ -490,31 +491,32 @@ static size_t mtk_iommu_unmap(struct iommu_domain *domain,
 
 static void mtk_iommu_flush_iotlb_all(struct iommu_domain *domain)
 {
-   mtk_iommu_tlb_flush_all(mtk_iommu_get_m4u_data());
+   struct mtk_iommu_domain *dom = to_mtk_domain(domain);
+
+   mtk_iommu_tlb_flush_all(dom->data);
 }
 
 static void mtk_iommu_iotlb_sync(struct iommu_domain *domain,
 struct iommu_iotlb_gather *gather)
 {
-   struct mtk_iommu_data *data = mtk_iommu_get_m4u_data();
+   struct mtk_iommu_domain *dom = to_mtk_domain(domain);
size_t length = gather->end - gather->start;
 
if (gather->start == ULONG_MAX)
return;
 
mtk_iommu_tlb_flush_range_sync(gather->start, length, gather->pgsize,
-  data);
+  dom->data);
 }
 
 static phys_addr_t mtk_iommu_iova_to_phys(struct iommu_domain *domain,
  dma_addr_t iova)
 {
struct mtk_iommu_domain *dom = to_mtk_domain(domain);
-   struct mtk_iommu_data *data = mtk_iommu_get_m4u_data();
phys_addr_t pa;
 
pa = dom->iop->iova_to_phys(dom->iop, iova);
-   if (data->enable_4GB && pa >= MTK_IOMMU_4GB_MODE_REMAP_BASE)
+   if (dom->data->enable_4GB && pa >= MTK_IOMMU_4GB_MODE_REMAP_BASE)
pa &= ~BIT_ULL(32);
 
return pa;
-- 
2.18.0
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 18/23] iommu/mediatek: Support up to 34bit iova in tlb flush

2020-09-05 Thread Yong Wu
If the iova is 34bit, the iova[32][33] is the bit0/1 in the tlb flush
register. Add a new macro for this.

there is a minor change unrelated with this patch. it also use the new
macro.

Signed-off-by: Yong Wu 
---
 drivers/iommu/mtk_iommu.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 54cad923ae16..9fe9a706742c 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -125,6 +125,9 @@ static const struct iommu_ops mtk_iommu_ops;
 
 static int mtk_iommu_hw_init(const struct mtk_iommu_data *data);
 
+#define MTK_IOMMU_ADDR(addr) ({unsigned long _addr = addr; \
+ (lower_32_bits(_addr) | upper_32_bits(_addr)); })
+
 /*
  * In M4U 4GB mode, the physical address is remapped as below:
  *
@@ -213,8 +216,9 @@ static void mtk_iommu_tlb_flush_range_sync(unsigned long 
iova, size_t size,
writel_relaxed(F_INVLD_EN1 | F_INVLD_EN0,
   data->base + data->plat_data->inv_sel_reg);
 
-   writel_relaxed(iova, data->base + REG_MMU_INVLD_START_A);
-   writel_relaxed(iova + size - 1,
+   writel_relaxed(MTK_IOMMU_ADDR(iova),
+  data->base + REG_MMU_INVLD_START_A);
+   writel_relaxed(MTK_IOMMU_ADDR(iova + size - 1),
   data->base + REG_MMU_INVLD_END_A);
writel_relaxed(F_MMU_INV_RANGE,
   data->base + REG_MMU_INVALIDATE);
@@ -634,8 +638,7 @@ static int mtk_iommu_hw_init(const struct mtk_iommu_data 
*data)
if (data->plat_data->m4u_plat == M4U_MT8173)
regval = (data->protect_base >> 1) | (data->enable_4GB << 31);
else
-   regval = lower_32_bits(data->protect_base) |
-upper_32_bits(data->protect_base);
+   regval = MTK_IOMMU_ADDR(data->protect_base);
writel_relaxed(regval, data->base + REG_MMU_IVRP_PADDR);
 
if (data->enable_4GB &&
-- 
2.18.0
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 17/23] iommu/mediatek: Support master use iova over 32bit

2020-09-05 Thread Yong Wu
After extending v7s, our pagetable already support iova reach
16GB(34bit). the master got the iova via dma_alloc_attrs may reach
34bits, but its HW register still is 32bit. then how to set the
bit32/bit33 iova? this depend on a SMI larb setting(bank_sel).

we separate whole 16GB iova to four banks:
bank: 0: 0~4G; 1: 4~8G; 2: 8-12G; 3: 12-16G;
The bank number is (iova >> 32).

We will preassign which bank the larbs belong to. currently we don't
have a interface for master to adjust its bank number.

Each a bank is a iova_region which is a independent iommu-domain.
the iova range for each iommu-domain can't cross 4G.

Signed-off-by: Yong Wu 
---
 drivers/iommu/mtk_iommu.c  | 12 +---
 drivers/memory/mtk-smi.c   |  7 +++
 include/soc/mediatek/smi.h |  1 +
 3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 10b9a86ea826..54cad923ae16 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -303,17 +303,23 @@ static void mtk_iommu_config(struct mtk_iommu_data *data,
 struct device *dev, bool enable)
 {
struct mtk_smi_larb_iommu*larb_mmu;
-   unsigned int larbid, portid;
+   unsigned int larbid, portid, domid;
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
+   const struct mtk_iommu_iova_region *region;
int i;
 
for (i = 0; i < fwspec->num_ids; ++i) {
larbid = MTK_M4U_TO_LARB(fwspec->ids[i]);
portid = MTK_M4U_TO_PORT(fwspec->ids[i]);
+   domid = MTK_M4U_TO_DOM(fwspec->ids[i]);
+
larb_mmu = >larb_imu[larbid];
+   region = data->plat_data->iova_region + domid;
+   larb_mmu->bank[portid] = upper_32_bits(region->iova_base);
 
-   dev_dbg(dev, "%s iommu port: %d\n",
-   enable ? "enable" : "disable", portid);
+   dev_dbg(dev, "%s iommu for larb(%s) port %d dom %d bank %d.\n",
+   enable ? "enable" : "disable", dev_name(larb_mmu->dev),
+   portid, domid, larb_mmu->bank[portid]);
 
if (enable)
larb_mmu->mmu |= MTK_SMI_MMU_EN(portid);
diff --git a/drivers/memory/mtk-smi.c b/drivers/memory/mtk-smi.c
index ec83f1ac48b1..e94c99ca2883 100644
--- a/drivers/memory/mtk-smi.c
+++ b/drivers/memory/mtk-smi.c
@@ -41,6 +41,10 @@
 /* mt2712 */
 #define SMI_LARB_NONSEC_CON(id)(0x380 + ((id) * 4))
 #define F_MMU_EN   BIT(0)
+#define BANK_SEL(id)   ({  \
+   u32 _id = (id) & 0x3;   \
+   (_id << 8 | _id << 10 | _id << 12 | _id << 14); \
+})
 
 /* SMI COMMON */
 #define SMI_BUS_SEL0x220
@@ -85,6 +89,7 @@ struct mtk_smi_larb { /* larb: local arbiter */
const struct mtk_smi_larb_gen   *larb_gen;
int larbid;
u32 *mmu;
+   unsigned char   *bank;
 };
 
 static int mtk_smi_clk_enable(const struct mtk_smi *smi)
@@ -151,6 +156,7 @@ mtk_smi_larb_bind(struct device *dev, struct device 
*master, void *data)
if (dev == larb_mmu[i].dev) {
larb->larbid = i;
larb->mmu = _mmu[i].mmu;
+   larb->bank = larb_mmu[i].bank;
return 0;
}
}
@@ -169,6 +175,7 @@ static void mtk_smi_larb_config_port_gen2_general(struct 
device *dev)
for_each_set_bit(i, (unsigned long *)larb->mmu, 32) {
reg = readl_relaxed(larb->base + SMI_LARB_NONSEC_CON(i));
reg |= F_MMU_EN;
+   reg |= BANK_SEL(larb->bank[i]);
writel(reg, larb->base + SMI_LARB_NONSEC_CON(i));
}
 }
diff --git a/include/soc/mediatek/smi.h b/include/soc/mediatek/smi.h
index 9371bf572ab8..4cf445dbbdaa 100644
--- a/include/soc/mediatek/smi.h
+++ b/include/soc/mediatek/smi.h
@@ -16,6 +16,7 @@
 struct mtk_smi_larb_iommu {
struct device *dev;
unsigned int   mmu;
+   unsigned char  bank[32];
 };
 
 /*
-- 
2.18.0
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 16/23] iommu/mediatek: Add single domain

2020-09-05 Thread Yong Wu
Defaultly the iova range is 0-4G. here we add a single-domain(0-4G)
for the previous SoC. this also is a preparing patch for supporting
multi-domains.

Signed-off-by: Yong Wu 
---
 drivers/iommu/mtk_iommu.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 5b63732044de..10b9a86ea826 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -158,6 +158,10 @@ struct mtk_iommu_iova_region {
unsigned long long  size;
 };
 
+static const struct mtk_iommu_iova_region single_domain[] = {
+   {.iova_base = 0,.size = SZ_4G},
+};
+
 /*
  * There may be 1 or 2 M4U HWs, But we always expect they are in the same 
domain
  * for the performance.
@@ -877,6 +881,8 @@ static const struct mtk_iommu_plat_data mt2712_data = {
.m4u_plat = M4U_MT2712,
.flags= HAS_4GB_MODE | HAS_BCLK | HAS_VLD_PA_RNG,
.inv_sel_reg  = REG_MMU_INV_SEL_GEN1,
+   .iova_region  = single_domain,
+   .iova_region_nr = ARRAY_SIZE(single_domain),
.larbid_remap = {{0}, {1}, {2}, {3}, {4}, {5}, {6}, {7}},
 };
 
@@ -884,6 +890,8 @@ static const struct mtk_iommu_plat_data mt6779_data = {
.m4u_plat  = M4U_MT6779,
.flags = HAS_SUB_COMM | OUT_ORDER_WR_EN | WR_THROT_EN,
.inv_sel_reg   = REG_MMU_INV_SEL_GEN2,
+   .iova_region   = single_domain,
+   .iova_region_nr = ARRAY_SIZE(single_domain),
.larbid_remap  = {{0}, {1}, {2}, {3}, {5}, {7, 8}, {10}, {9}},
 };
 
@@ -891,6 +899,8 @@ static const struct mtk_iommu_plat_data mt8173_data = {
.m4u_plat = M4U_MT8173,
.flags= HAS_4GB_MODE | HAS_BCLK | RESET_AXI,
.inv_sel_reg  = REG_MMU_INV_SEL_GEN1,
+   .iova_region  = single_domain,
+   .iova_region_nr = ARRAY_SIZE(single_domain),
.larbid_remap = {{0}, {1}, {2}, {3}, {4}, {5}}, /* Linear mapping. */
 };
 
@@ -898,6 +908,8 @@ static const struct mtk_iommu_plat_data mt8183_data = {
.m4u_plat = M4U_MT8183,
.flags= RESET_AXI,
.inv_sel_reg  = REG_MMU_INV_SEL_GEN1,
+   .iova_region  = single_domain,
+   .iova_region_nr = ARRAY_SIZE(single_domain),
.larbid_remap = {{0}, {4}, {5}, {6}, {7}, {2}, {3}, {1}},
 };
 
-- 
2.18.0
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 15/23] iommu/mediatek: Add iova reserved function

2020-09-05 Thread Yong Wu
For multiple iommu_domains, we need to reserve some iova regions. Take a
example, If the default iova region is 0 ~ 4G, but the 0x4000_ ~
0x43ff_ is only for the special CCU0 domain. Thus we should exclude
this region for the default iova region.

This patch adds iova reserved flow. It's a preparing patch for supporting
multi-domain.

Signed-off-by: Anan sun 
Signed-off-by: Chao Hao 
Signed-off-by: Yong Wu 
---
 drivers/iommu/mtk_iommu.c | 28 
 drivers/iommu/mtk_iommu.h |  5 +
 2 files changed, 33 insertions(+)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index d22772ec64c8..5b63732044de 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -153,6 +153,11 @@ static LIST_HEAD(m4ulist); /* List all the M4U HWs */
 
 #define for_each_m4u(data) list_for_each_entry(data, , list)
 
+struct mtk_iommu_iova_region {
+   dma_addr_t  iova_base;
+   unsigned long long  size;
+};
+
 /*
  * There may be 1 or 2 M4U HWs, But we always expect they are in the same 
domain
  * for the performance.
@@ -539,6 +544,27 @@ static int mtk_iommu_of_xlate(struct device *dev, struct 
of_phandle_args *args)
return iommu_fwspec_add_ids(dev, args->args, 1);
 }
 
+static void mtk_iommu_get_resv_regions(struct device *dev,
+  struct list_head *head)
+{
+   struct mtk_iommu_data *data = dev_iommu_priv_get(dev);
+   const struct mtk_iommu_iova_region *resv;
+   struct iommu_resv_region *region;
+   int prot = IOMMU_WRITE | IOMMU_READ;
+   unsigned int i;
+
+   for (i = 0; i < data->plat_data->iova_region_nr; i++) {
+   resv = data->plat_data->iova_region + i;
+
+   region = iommu_alloc_resv_region(resv->iova_base, resv->size,
+prot, IOMMU_RESV_RESERVED);
+   if (!region)
+   return;
+
+   list_add_tail(>list, head);
+   }
+}
+
 static const struct iommu_ops mtk_iommu_ops = {
.domain_alloc   = mtk_iommu_domain_alloc,
.domain_free= mtk_iommu_domain_free,
@@ -553,6 +579,8 @@ static const struct iommu_ops mtk_iommu_ops = {
.release_device = mtk_iommu_release_device,
.device_group   = mtk_iommu_device_group,
.of_xlate   = mtk_iommu_of_xlate,
+   .get_resv_regions = mtk_iommu_get_resv_regions,
+   .put_resv_regions = generic_iommu_put_resv_regions,
.pgsize_bitmap  = SZ_4K | SZ_64K | SZ_1M | SZ_16M,
 };
 
diff --git a/drivers/iommu/mtk_iommu.h b/drivers/iommu/mtk_iommu.h
index ae7909815cdb..d45c13c9d324 100644
--- a/drivers/iommu/mtk_iommu.h
+++ b/drivers/iommu/mtk_iommu.h
@@ -44,10 +44,15 @@ enum mtk_iommu_plat {
M4U_MT8183,
 };
 
+struct mtk_iommu_iova_region;
+
 struct mtk_iommu_plat_data {
enum mtk_iommu_plat m4u_plat;
u32 flags;
u32 inv_sel_reg;
+
+   unsigned intiova_region_nr;
+   const struct mtk_iommu_iova_region   *iova_region;
unsigned char   larbid_remap[MTK_LARB_COM_MAX][MTK_LARB_SUBCOM_MAX];
 };
 
-- 
2.18.0
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 14/23] iommu/mediatek: Add power-domain operation

2020-09-05 Thread Yong Wu
In the previous SoC, the M4U HW is in the EMI power domain which is
always on. the latest M4U is in the display power domain which may be
turned on/off, thus we have to add pm_runtime interface for it.

When the engine work, the engine always enable the power and clocks for
smi-larb/smi-common, then the M4U's power will always be powered on
automatically via the device link with smi-common.

Note: we don't enable the M4U power in iommu_map/unmap for tlb flush.
If its power already is on, of course it is ok. if the power is off,
the main tlb will be reset while M4U power on, thus the tlb flush while
m4u power off is unnecessary, just skip it.

Signed-off-by: Yong Wu 
---
 drivers/iommu/mtk_iommu.c | 33 +++--
 1 file changed, 27 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 68de59e0d521..d22772ec64c8 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -196,6 +196,10 @@ static void mtk_iommu_tlb_flush_range_sync(unsigned long 
iova, size_t size,
u32 tmp;
 
for_each_m4u(data) {
+   /* skip tlb flush when pm is not active. */
+   if (!pm_runtime_active(data->dev))
+   continue;
+
spin_lock_irqsave(>tlb_lock, flags);
writel_relaxed(F_INVLD_EN1 | F_INVLD_EN0,
   data->base + data->plat_data->inv_sel_reg);
@@ -380,6 +384,7 @@ static int mtk_iommu_attach_device(struct iommu_domain 
*domain,
 {
struct mtk_iommu_data *data = dev_iommu_priv_get(dev);
struct mtk_iommu_domain *dom = to_mtk_domain(domain);
+   struct device *m4udev = data->dev;
int ret;
 
if (!data)
@@ -387,12 +392,18 @@ static int mtk_iommu_attach_device(struct iommu_domain 
*domain,
 
/* Update the pgtable base address register of the M4U HW */
if (!data->m4u_dom) {
+   ret = pm_runtime_get_sync(m4udev);
+   if (ret < 0)
+   return ret;
ret = mtk_iommu_hw_init(data);
-   if (ret)
+   if (ret) {
+   pm_runtime_put(m4udev);
return ret;
+   }
data->m4u_dom = dom;
writel(dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_MASK,
   data->base + REG_MMU_PT_BASE_ADDR);
+   pm_runtime_put(m4udev);
}
 
mtk_iommu_config(data, dev, true);
@@ -722,10 +733,13 @@ static int mtk_iommu_probe(struct platform_device *pdev)
if (dev->pm_domain) {
struct device_link *link;
 
+   pm_runtime_enable(dev);
+
link = device_link_add(data->smicomm_dev, dev,
   DL_FLAG_STATELESS | DL_FLAG_PM_RUNTIME);
if (!link) {
dev_err(dev, "Unable link %s.\n", 
dev_name(data->smicomm_dev));
+   pm_runtime_disable(dev);
return -EINVAL;
}
}
@@ -752,8 +766,10 @@ static int mtk_iommu_probe(struct platform_device *pdev)
 
return component_master_add_with_match(dev, _iommu_com_ops, match);
 out:
-   if (dev->pm_domain)
+   if (dev->pm_domain) {
device_link_remove(data->smicomm_dev, dev);
+   pm_runtime_disable(dev);
+   }
return ret;
 }
 
@@ -768,8 +784,10 @@ static int mtk_iommu_remove(struct platform_device *pdev)
bus_set_iommu(_bus_type, NULL);
 
clk_disable_unprepare(data->bclk);
-   if (pdev->dev.pm_domain)
+   if (pdev->dev.pm_domain) {
device_link_remove(data->smicomm_dev, >dev);
+   pm_runtime_disable(>dev);
+   }
devm_free_irq(>dev, data->irq, data);
component_master_del(>dev, _iommu_com_ops);
return 0;
@@ -801,6 +819,9 @@ static int __maybe_unused mtk_iommu_resume(struct device 
*dev)
void __iomem *base = data->base;
int ret;
 
+   /* Avoid first resume to affect the default value of registers below. */
+   if (!m4u_dom)
+   return 0;
ret = clk_prepare_enable(data->bclk);
if (ret) {
dev_err(data->dev, "Failed to enable clk(%d) in resume\n", ret);
@@ -814,13 +835,13 @@ static int __maybe_unused mtk_iommu_resume(struct device 
*dev)
writel_relaxed(reg->int_main_control, base + REG_MMU_INT_MAIN_CONTROL);
writel_relaxed(reg->ivrp_paddr, base + REG_MMU_IVRP_PADDR);
writel_relaxed(reg->vld_pa_rng, base + REG_MMU_VLD_PA_RNG);
-   if (m4u_dom)
-   writel(m4u_dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_MASK,
-  base + REG_MMU_PT_BASE_ADDR);
+   writel(m4u_dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_MASK,
+  base + REG_MMU_PT_BASE_ADDR);
return 0;
 }
 
 static const struct dev_pm_ops mtk_iommu_pm_ops = {
+   

[PATCH v2 13/23] iommu/mediatek: Add device link for smi-common and m4u

2020-09-05 Thread Yong Wu
In the lastest SoC, M4U has its special power domain. thus, If the engine
begin to work, it should help enable the power for M4U firstly.
Currently if the engine work, it always enable the power/clocks for
smi-larbs/smi-common. This patch adds device_link for smi-common and M4U.
then, if smi-common power is enabled, the M4U power also is powered on
automatically.

Normally M4U connect with several smi-larbs and their smi-common always
are the same, In this patch it get smi-common dev from the first smi-larb
device(i==0), then add the device_link only while m4u has power-domain.

Signed-off-by: Yong Wu 
---
 drivers/iommu/mtk_iommu.c | 33 ++---
 drivers/iommu/mtk_iommu.h |  1 +
 2 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 940b7a9191b2..68de59e0d521 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -681,7 +682,7 @@ static int mtk_iommu_probe(struct platform_device *pdev)
return larb_nr;
 
for (i = 0; i < larb_nr; i++) {
-   struct device_node *larbnode;
+   struct device_node *larbnode, *smicomm_node;
struct platform_device *plarbdev;
u32 id;
 
@@ -707,6 +708,26 @@ static int mtk_iommu_probe(struct platform_device *pdev)
 
component_match_add_release(dev, , release_of,
compare_of, larbnode);
+   if (!i) {
+   smicomm_node = of_parse_phandle(larbnode, 
"mediatek,smi", 0);
+   if (!smicomm_node)
+   return -EINVAL;
+
+   plarbdev = of_find_device_by_node(smicomm_node);
+   of_node_put(smicomm_node);
+   data->smicomm_dev = >dev;
+   }
+   }
+
+   if (dev->pm_domain) {
+   struct device_link *link;
+
+   link = device_link_add(data->smicomm_dev, dev,
+  DL_FLAG_STATELESS | DL_FLAG_PM_RUNTIME);
+   if (!link) {
+   dev_err(dev, "Unable link %s.\n", 
dev_name(data->smicomm_dev));
+   return -EINVAL;
+   }
}
 
platform_set_drvdata(pdev, data);
@@ -714,14 +735,14 @@ static int mtk_iommu_probe(struct platform_device *pdev)
ret = iommu_device_sysfs_add(>iommu, dev, NULL,
 "mtk-iommu.%pa", );
if (ret)
-   return ret;
+   goto out;
 
iommu_device_set_ops(>iommu, _iommu_ops);
iommu_device_set_fwnode(>iommu, >dev.of_node->fwnode);
 
ret = iommu_device_register(>iommu);
if (ret)
-   return ret;
+   goto out;
 
spin_lock_init(>tlb_lock);
list_add_tail(>list, );
@@ -730,6 +751,10 @@ static int mtk_iommu_probe(struct platform_device *pdev)
bus_set_iommu(_bus_type, _iommu_ops);
 
return component_master_add_with_match(dev, _iommu_com_ops, match);
+out:
+   if (dev->pm_domain)
+   device_link_remove(data->smicomm_dev, dev);
+   return ret;
 }
 
 static int mtk_iommu_remove(struct platform_device *pdev)
@@ -743,6 +768,8 @@ static int mtk_iommu_remove(struct platform_device *pdev)
bus_set_iommu(_bus_type, NULL);
 
clk_disable_unprepare(data->bclk);
+   if (pdev->dev.pm_domain)
+   device_link_remove(data->smicomm_dev, >dev);
devm_free_irq(>dev, data->irq, data);
component_master_del(>dev, _iommu_com_ops);
return 0;
diff --git a/drivers/iommu/mtk_iommu.h b/drivers/iommu/mtk_iommu.h
index a2e2c844b96e..ae7909815cdb 100644
--- a/drivers/iommu/mtk_iommu.h
+++ b/drivers/iommu/mtk_iommu.h
@@ -67,6 +67,7 @@ struct mtk_iommu_data {
 
struct iommu_device iommu;
const struct mtk_iommu_plat_data *plat_data;
+   struct device   *smicomm_dev;
 
struct dma_iommu_mapping*mapping; /* For mtk_iommu_v1.c */
 
-- 
2.18.0
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 06/23] dt-bindings: mediatek: Add binding for mt8192 IOMMU and SMI

2020-09-05 Thread Yong Wu
This patch adds decriptions for mt8192 IOMMU and SMI.

mt8192 also is MTK IOMMU gen2 which uses ARM Short-Descriptor translation
table format. The M4U-SMI HW diagram is as below:

  EMI
   |
  M4U
   |
  
   SMI Common
  
   |
  +---+--+--+--+---+
  |   |  |  |   .. |   |
  |   |  |  |  |   |
larb0   larb1  larb2  larb4 ..  larb19   larb20
disp0   disp1   mdpvdec   IPE  IPE

All the connections are HW fixed, SW can NOT adjust it.

mt8192 M4U support 0~16GB iova range. we preassign different engines
into different iova ranges:

domain-id  module iova-range  larbs
   0   disp0 ~ 4G  larb0/1
   1   vcodec  4G ~ 8G larb4/5/7
   2   cam/mdp 8G ~ 12G larb2/9/11/13/14/16/17/18/19/20
   3   CCU00x4000_ ~ 0x43ff_ larb13: port 9/10
   4   CCU10x4400_ ~ 0x47ff_ larb14: port 4/5

The iova range for CCU0/1(camera control unit) is HW requirement.

Signed-off-by: Yong Wu 
---
 .../bindings/iommu/mediatek,iommu.yaml|   9 +-
 .../mediatek,smi-common.yaml  |   5 +-
 .../memory-controllers/mediatek,smi-larb.yaml |   3 +-
 include/dt-bindings/memory/mt8192-larb-port.h | 239 ++
 4 files changed, 251 insertions(+), 5 deletions(-)
 create mode 100644 include/dt-bindings/memory/mt8192-larb-port.h

diff --git a/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml 
b/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml
index d7b31bda3e35..f55fcdee90d1 100644
--- a/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml
+++ b/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml
@@ -76,6 +76,7 @@ properties:
   - mediatek,mt7623-m4u, mediatek,mt2701-m4u #mt7623 generation one HW
   - mediatek,mt8173-m4u #mt8173 generation two HW
   - mediatek,mt8183-m4u #mt8183 generation two HW
+  - mediatek,mt8192-m4u #mt8192 generation two HW
 
   reg:
 maxItems: 1
@@ -86,7 +87,7 @@ properties:
   clocks:
 description: |
   bclk is optional. here is the list which require this bclk:
-  mt2701, mt2712, mt7623 and mt8173.
+  mt2701, mt2712, mt7623, mt8173 and mt8192.
   M4U will use the EMI clock which always has been enabled before
   kernel if there is no this bclk.
 items:
@@ -112,7 +113,11 @@ properties:
   dt-binding/memory/mt2712-larb-port.h for mt2712,
   dt-binding/memory/mt6779-larb-port.h for mt6779,
   dt-binding/memory/mt8173-larb-port.h for mt8173,
-  dt-binding/memory/mt8183-larb-port.h for mt8183.
+  dt-binding/memory/mt8183-larb-port.h for mt8183,
+  dt-binding/memory/mt8192-larb-port.h for mt8192.
+
+  power-domains:
+maxItems: 1
 
 required:
   - compatible
diff --git 
a/Documentation/devicetree/bindings/memory-controllers/mediatek,smi-common.yaml 
b/Documentation/devicetree/bindings/memory-controllers/mediatek,smi-common.yaml
index 781d7b7617d9..bbf6062433fa 100644
--- 
a/Documentation/devicetree/bindings/memory-controllers/mediatek,smi-common.yaml
+++ 
b/Documentation/devicetree/bindings/memory-controllers/mediatek,smi-common.yaml
@@ -15,7 +15,7 @@ description: |+
   MediaTek SMI have two generations of HW architecture, here is the list
   which generation the SoCs use:
   generation 1: mt2701 and mt7623.
-  generation 2: mt2712, mt6779, mt8173 and mt8183.
+  generation 2: mt2712, mt6779, mt8173, mt8183 and mt8192.
 
   There's slight differences between the two SMI, for generation 2, the
   register which control the iommu port is at each larb's register base. But
@@ -33,6 +33,7 @@ properties:
   - mediatek,mt7623-smi-common, mediatek,mt2701-smi-common
   - mediatek,mt8173-smi-common
   - mediatek,mt8183-smi-common
+  - mediatek,mt8192-smi-common
 
   reg:
 maxItems: 1
@@ -41,7 +42,7 @@ properties:
 description: |
   apb and smi are mandatory. the async is only for generation 1 smi HW.
   gals(global async local sync) also is optional, here is the list which
-  require gals: mt6779 and mt8183.
+  require gals: mt6779, mt8183 and mt8192.
 minItems: 2
 maxItems: 4
 items:
diff --git 
a/Documentation/devicetree/bindings/memory-controllers/mediatek,smi-larb.yaml 
b/Documentation/devicetree/bindings/memory-controllers/mediatek,smi-larb.yaml
index e8919f280c84..fa0137d794ba 100644
--- 
a/Documentation/devicetree/bindings/memory-controllers/mediatek,smi-larb.yaml
+++ 
b/Documentation/devicetree/bindings/memory-controllers/mediatek,smi-larb.yaml
@@ -21,6 +21,7 @@ properties:
   - mediatek,mt7623-smi-larb, mediatek,mt2701-smi-larb
   - mediatek,mt8173-smi-larb
 

[PATCH v2 12/23] iommu/mediatek: Move hw_init into attach_device

2020-09-05 Thread Yong Wu
In attach device, it will update the pagetable base address register.
Move the hw_init function also here. Then it only need call
pm_runtime_get/put one time here if m4u has power domain.

Signed-off-by: Yong Wu 
---
 drivers/iommu/mtk_iommu.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 6e85c9976a33..940b7a9191b2 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -122,6 +122,8 @@ struct mtk_iommu_domain {
 
 static const struct iommu_ops mtk_iommu_ops;
 
+static int mtk_iommu_hw_init(const struct mtk_iommu_data *data);
+
 /*
  * In M4U 4GB mode, the physical address is remapped as below:
  *
@@ -377,12 +379,16 @@ static int mtk_iommu_attach_device(struct iommu_domain 
*domain,
 {
struct mtk_iommu_data *data = dev_iommu_priv_get(dev);
struct mtk_iommu_domain *dom = to_mtk_domain(domain);
+   int ret;
 
if (!data)
return -ENODEV;
 
/* Update the pgtable base address register of the M4U HW */
if (!data->m4u_dom) {
+   ret = mtk_iommu_hw_init(data);
+   if (ret)
+   return ret;
data->m4u_dom = dom;
writel(dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_MASK,
   data->base + REG_MMU_PT_BASE_ADDR);
@@ -705,10 +711,6 @@ static int mtk_iommu_probe(struct platform_device *pdev)
 
platform_set_drvdata(pdev, data);
 
-   ret = mtk_iommu_hw_init(data);
-   if (ret)
-   return ret;
-
ret = iommu_device_sysfs_add(>iommu, dev, NULL,
 "mtk-iommu.%pa", );
if (ret)
-- 
2.18.0
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 09/23] iommu/io-pgtable-arm-v7s: Extend PA34 for MediaTek

2020-09-05 Thread Yong Wu
MediaTek extend the bit5 in lvl1 and lvl2 descriptor as PA34.

Signed-off-by: Yong Wu 
---
 drivers/iommu/io-pgtable-arm-v7s.c | 9 +++--
 drivers/iommu/mtk_iommu.c  | 2 +-
 include/linux/io-pgtable.h | 4 ++--
 3 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm-v7s.c 
b/drivers/iommu/io-pgtable-arm-v7s.c
index 4c9d8dccfc5a..a3b3e9147b8d 100644
--- a/drivers/iommu/io-pgtable-arm-v7s.c
+++ b/drivers/iommu/io-pgtable-arm-v7s.c
@@ -112,9 +112,10 @@
 #define ARM_V7S_TEX_MASK   0x7
 #define ARM_V7S_ATTR_TEX(val)  (((val) & ARM_V7S_TEX_MASK) << 
ARM_V7S_TEX_SHIFT)
 
-/* MediaTek extend the two bits for PA 32bit/33bit */
+/* MediaTek extend the bits below for PA 32bit/33bit/34bit */
 #define ARM_V7S_ATTR_MTK_PA_BIT32  BIT(9)
 #define ARM_V7S_ATTR_MTK_PA_BIT33  BIT(4)
+#define ARM_V7S_ATTR_MTK_PA_BIT34  BIT(5)
 
 /* *well, except for TEX on level 2 large pages, of course :( */
 #define ARM_V7S_CONT_PAGE_TEX_SHIFT6
@@ -194,6 +195,8 @@ static arm_v7s_iopte paddr_to_iopte(phys_addr_t paddr, int 
lvl,
pte |= ARM_V7S_ATTR_MTK_PA_BIT32;
if (paddr & BIT_ULL(33))
pte |= ARM_V7S_ATTR_MTK_PA_BIT33;
+   if (paddr & BIT_ULL(34))
+   pte |= ARM_V7S_ATTR_MTK_PA_BIT34;
return pte;
 }
 
@@ -218,6 +221,8 @@ static phys_addr_t iopte_to_paddr(arm_v7s_iopte pte, int 
lvl,
paddr |= BIT_ULL(32);
if (pte & ARM_V7S_ATTR_MTK_PA_BIT33)
paddr |= BIT_ULL(33);
+   if (pte & ARM_V7S_ATTR_MTK_PA_BIT34)
+   paddr |= BIT_ULL(34);
return paddr;
 }
 
@@ -753,7 +758,7 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct 
io_pgtable_cfg *cfg,
if (cfg->ias > ARM_V7S_ADDR_BITS)
return NULL;
 
-   if (cfg->oas > (arm_v7s_is_mtk_enabled(cfg) ? 34 : ARM_V7S_ADDR_BITS))
+   if (cfg->oas > (arm_v7s_is_mtk_enabled(cfg) ? 35 : ARM_V7S_ADDR_BITS))
return NULL;
 
if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS |
diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index fd8322ee4980..f6a2e3eb59d2 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -317,7 +317,7 @@ static int mtk_iommu_domain_finalise(struct 
mtk_iommu_domain *dom)
IO_PGTABLE_QUIRK_ARM_MTK_EXT,
.pgsize_bitmap = mtk_iommu_ops.pgsize_bitmap,
.ias = 32,
-   .oas = 34,
+   .oas = 35,
.tlb = _iommu_flush_ops,
.iommu_dev = data->dev,
};
diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index 23285ba645db..8c9a889d8df9 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -77,8 +77,8 @@ struct io_pgtable_cfg {
 *  TLB maintenance when mapping as well as when unmapping.
 *
 * IO_PGTABLE_QUIRK_ARM_MTK_EXT: (ARM v7s format) MediaTek IOMMUs extend
-*  to support up to 34 bits PA where the bit32 and bit33 are
-*  encoded in the bit9 and bit4 of the PTE respectively.
+*  to support up to 35 bits PA where the bit32, bit33 and bit34 are
+*  encoded in the bit9, bit4 and bit5 of the PTE respectively.
 *
 * IO_PGTABLE_QUIRK_NON_STRICT: Skip issuing synchronous leaf TLBIs
 *  on unmap, for DMA domains using the flush queue mechanism for
-- 
2.18.0
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 11/23] iommu/io-pgtable-arm-v7s: Quad lvl1 pgtable for MediaTek

2020-09-05 Thread Yong Wu
The standard input iova bits is 32. MediaTek quad the lvl1 pagetable
(4 * lvl1). No change for lvl2 pagetable. Then the iova bits can reach
34bit.

Signed-off-by: Yong Wu 
---
 drivers/iommu/io-pgtable-arm-v7s.c | 13 ++---
 drivers/iommu/mtk_iommu.c  |  2 +-
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm-v7s.c 
b/drivers/iommu/io-pgtable-arm-v7s.c
index 8362fdf76657..0e97ca978c04 100644
--- a/drivers/iommu/io-pgtable-arm-v7s.c
+++ b/drivers/iommu/io-pgtable-arm-v7s.c
@@ -50,10 +50,17 @@
  */
 #define ARM_V7S_ADDR_BITS  32
 #define _ARM_V7S_LVL_BITS(lvl) (16 - (lvl) * 4)
+/* MediaTek: totally 34bits, 14bits at lvl1 and 8bits at lvl2. */
+#define _ARM_V7S_LVL_BITS_MTK(lvl) (20 - (lvl) * 6)
 #define ARM_V7S_LVL_SHIFT(lvl) (ARM_V7S_ADDR_BITS - (4 + 8 * (lvl)))
 #define ARM_V7S_TABLE_SHIFT10
 
-#define ARM_V7S_PTES_PER_LVL(lvl, cfg) (1 << _ARM_V7S_LVL_BITS(lvl))
+#define ARM_V7S_PTES_PER_LVL(lvl, cfg) ({  \
+   int _l = lvl;   \
+   !arm_v7s_is_mtk_enabled(cfg) ?  \
+(1 << _ARM_V7S_LVL_BITS(_l)) : (1 << _ARM_V7S_LVL_BITS_MTK(_l));\
+})
+
 #define ARM_V7S_TABLE_SIZE(lvl, cfg)   \
(ARM_V7S_PTES_PER_LVL(lvl, cfg) * sizeof(arm_v7s_iopte))
 
@@ -63,7 +70,7 @@
 #define _ARM_V7S_IDX_MASK(lvl, cfg)(ARM_V7S_PTES_PER_LVL(lvl, cfg) - 1)
 #define ARM_V7S_LVL_IDX(addr, lvl, cfg)({  \
int _l = lvl;   \
-   ((u32)(addr) >> ARM_V7S_LVL_SHIFT(_l)) & _ARM_V7S_IDX_MASK(_l, cfg); \
+   ((addr) >> ARM_V7S_LVL_SHIFT(_l)) & _ARM_V7S_IDX_MASK(_l, cfg); \
 })
 
 /*
@@ -755,7 +762,7 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct 
io_pgtable_cfg *cfg,
 {
struct arm_v7s_io_pgtable *data;
 
-   if (cfg->ias > ARM_V7S_ADDR_BITS)
+   if (cfg->ias > (arm_v7s_is_mtk_enabled(cfg) ? 34 : ARM_V7S_ADDR_BITS))
return NULL;
 
if (cfg->oas > (arm_v7s_is_mtk_enabled(cfg) ? 35 : ARM_V7S_ADDR_BITS))
diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index f6a2e3eb59d2..6e85c9976a33 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -316,7 +316,7 @@ static int mtk_iommu_domain_finalise(struct 
mtk_iommu_domain *dom)
IO_PGTABLE_QUIRK_TLBI_ON_MAP |
IO_PGTABLE_QUIRK_ARM_MTK_EXT,
.pgsize_bitmap = mtk_iommu_ops.pgsize_bitmap,
-   .ias = 32,
+   .ias = 34,
.oas = 35,
.tlb = _iommu_flush_ops,
.iommu_dev = data->dev,
-- 
2.18.0
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 07/23] iommu/mediatek: Use the common mtk-smi-larb-port.h

2020-09-05 Thread Yong Wu
Use the common larb-port header in the source code.

Signed-off-by: Yong Wu 
---
 drivers/iommu/mtk_iommu.c  | 7 ---
 drivers/iommu/mtk_iommu.h  | 1 +
 drivers/memory/mtk-smi.c   | 1 +
 include/soc/mediatek/smi.h | 2 --
 4 files changed, 2 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 785b228d39a6..fd8322ee4980 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -101,13 +101,6 @@
 
 #define MTK_PROTECT_PA_ALIGN   256
 
-/*
- * Get the local arbiter ID and the portid within the larb arbiter
- * from mtk_m4u_id which is defined by MTK_M4U_ID.
- */
-#define MTK_M4U_TO_LARB(id)(((id) >> 5) & 0xf)
-#define MTK_M4U_TO_PORT(id)((id) & 0x1f)
-
 #define HAS_4GB_MODE   BIT(0)
 /* HW will use the EMI clock if there isn't the "bclk". */
 #define HAS_BCLK   BIT(1)
diff --git a/drivers/iommu/mtk_iommu.h b/drivers/iommu/mtk_iommu.h
index 122925dbe547..a2e2c844b96e 100644
--- a/drivers/iommu/mtk_iommu.h
+++ b/drivers/iommu/mtk_iommu.h
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define MTK_LARB_COM_MAX   8
 #define MTK_LARB_SUBCOM_MAX4
diff --git a/drivers/memory/mtk-smi.c b/drivers/memory/mtk-smi.c
index c21262502581..ec83f1ac48b1 100644
--- a/drivers/memory/mtk-smi.c
+++ b/drivers/memory/mtk-smi.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 /* mt8173 */
diff --git a/include/soc/mediatek/smi.h b/include/soc/mediatek/smi.h
index 5a34b87d89e3..9371bf572ab8 100644
--- a/include/soc/mediatek/smi.h
+++ b/include/soc/mediatek/smi.h
@@ -11,8 +11,6 @@
 
 #ifdef CONFIG_MTK_SMI
 
-#define MTK_LARB_NR_MAX16
-
 #define MTK_SMI_MMU_EN(port)   BIT(port)
 
 struct mtk_smi_larb_iommu {
-- 
2.18.0
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 10/23] iommu/io-pgtable-arm-v7s: Add cfg as a param in some macros

2020-09-05 Thread Yong Wu
Add "cfg" as a parameter for some macros. This is a preparing patch for
mediatek extend the lvl1 pgtable. No functional change.

Signed-off-by: Yong Wu 
---
 drivers/iommu/io-pgtable-arm-v7s.c | 34 +++---
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm-v7s.c 
b/drivers/iommu/io-pgtable-arm-v7s.c
index a3b3e9147b8d..8362fdf76657 100644
--- a/drivers/iommu/io-pgtable-arm-v7s.c
+++ b/drivers/iommu/io-pgtable-arm-v7s.c
@@ -53,17 +53,17 @@
 #define ARM_V7S_LVL_SHIFT(lvl) (ARM_V7S_ADDR_BITS - (4 + 8 * (lvl)))
 #define ARM_V7S_TABLE_SHIFT10
 
-#define ARM_V7S_PTES_PER_LVL(lvl)  (1 << _ARM_V7S_LVL_BITS(lvl))
-#define ARM_V7S_TABLE_SIZE(lvl)
\
-   (ARM_V7S_PTES_PER_LVL(lvl) * sizeof(arm_v7s_iopte))
+#define ARM_V7S_PTES_PER_LVL(lvl, cfg) (1 << _ARM_V7S_LVL_BITS(lvl))
+#define ARM_V7S_TABLE_SIZE(lvl, cfg)   \
+   (ARM_V7S_PTES_PER_LVL(lvl, cfg) * sizeof(arm_v7s_iopte))
 
 #define ARM_V7S_BLOCK_SIZE(lvl)(1UL << ARM_V7S_LVL_SHIFT(lvl))
 #define ARM_V7S_LVL_MASK(lvl)  ((u32)(~0U << ARM_V7S_LVL_SHIFT(lvl)))
 #define ARM_V7S_TABLE_MASK ((u32)(~0U << ARM_V7S_TABLE_SHIFT))
-#define _ARM_V7S_IDX_MASK(lvl) (ARM_V7S_PTES_PER_LVL(lvl) - 1)
-#define ARM_V7S_LVL_IDX(addr, lvl) ({  \
+#define _ARM_V7S_IDX_MASK(lvl, cfg)(ARM_V7S_PTES_PER_LVL(lvl, cfg) - 1)
+#define ARM_V7S_LVL_IDX(addr, lvl, cfg)({  \
int _l = lvl;   \
-   ((u32)(addr) >> ARM_V7S_LVL_SHIFT(_l)) & _ARM_V7S_IDX_MASK(_l); \
+   ((u32)(addr) >> ARM_V7S_LVL_SHIFT(_l)) & _ARM_V7S_IDX_MASK(_l, cfg); \
 })
 
 /*
@@ -239,7 +239,7 @@ static void *__arm_v7s_alloc_table(int lvl, gfp_t gfp,
struct device *dev = cfg->iommu_dev;
phys_addr_t phys;
dma_addr_t dma;
-   size_t size = ARM_V7S_TABLE_SIZE(lvl);
+   size_t size = ARM_V7S_TABLE_SIZE(lvl, cfg);
void *table = NULL;
 
if (lvl == 1)
@@ -285,7 +285,7 @@ static void __arm_v7s_free_table(void *table, int lvl,
 {
struct io_pgtable_cfg *cfg = >iop.cfg;
struct device *dev = cfg->iommu_dev;
-   size_t size = ARM_V7S_TABLE_SIZE(lvl);
+   size_t size = ARM_V7S_TABLE_SIZE(lvl, cfg);
 
if (!cfg->coherent_walk)
dma_unmap_single(dev, __arm_v7s_dma_addr(table), size,
@@ -429,7 +429,7 @@ static int arm_v7s_init_pte(struct arm_v7s_io_pgtable *data,
arm_v7s_iopte *tblp;
size_t sz = ARM_V7S_BLOCK_SIZE(lvl);
 
-   tblp = ptep - ARM_V7S_LVL_IDX(iova, lvl);
+   tblp = ptep - ARM_V7S_LVL_IDX(iova, lvl, cfg);
if (WARN_ON(__arm_v7s_unmap(data, NULL, iova + i * sz,
sz, lvl, tblp) != sz))
return -EINVAL;
@@ -482,7 +482,7 @@ static int __arm_v7s_map(struct arm_v7s_io_pgtable *data, 
unsigned long iova,
int num_entries = size >> ARM_V7S_LVL_SHIFT(lvl);
 
/* Find our entry at the current level */
-   ptep += ARM_V7S_LVL_IDX(iova, lvl);
+   ptep += ARM_V7S_LVL_IDX(iova, lvl, cfg);
 
/* If we can install a leaf entry at this level, then do so */
if (num_entries)
@@ -554,7 +554,7 @@ static void arm_v7s_free_pgtable(struct io_pgtable *iop)
struct arm_v7s_io_pgtable *data = io_pgtable_to_data(iop);
int i;
 
-   for (i = 0; i < ARM_V7S_PTES_PER_LVL(1); i++) {
+   for (i = 0; i < ARM_V7S_PTES_PER_LVL(1, >iop.cfg); i++) {
arm_v7s_iopte pte = data->pgd[i];
 
if (ARM_V7S_PTE_IS_TABLE(pte, 1))
@@ -606,9 +606,9 @@ static size_t arm_v7s_split_blk_unmap(struct 
arm_v7s_io_pgtable *data,
if (!tablep)
return 0; /* Bytes unmapped */
 
-   num_ptes = ARM_V7S_PTES_PER_LVL(2);
+   num_ptes = ARM_V7S_PTES_PER_LVL(2, cfg);
num_entries = size >> ARM_V7S_LVL_SHIFT(2);
-   unmap_idx = ARM_V7S_LVL_IDX(iova, 2);
+   unmap_idx = ARM_V7S_LVL_IDX(iova, 2, cfg);
 
pte = arm_v7s_prot_to_pte(arm_v7s_pte_to_prot(blk_pte, 1), 2, cfg);
if (num_entries > 1)
@@ -650,7 +650,7 @@ static size_t __arm_v7s_unmap(struct arm_v7s_io_pgtable 
*data,
if (WARN_ON(lvl > 2))
return 0;
 
-   idx = ARM_V7S_LVL_IDX(iova, lvl);
+   idx = ARM_V7S_LVL_IDX(iova, lvl, >cfg);
ptep += idx;
do {
pte[i] = READ_ONCE(ptep[i]);
@@ -736,7 +736,7 @@ static phys_addr_t arm_v7s_iova_to_phys(struct 
io_pgtable_ops *ops,
u32 mask;
 
do {
-   ptep += ARM_V7S_LVL_IDX(iova, ++lvl);
+   ptep += ARM_V7S_LVL_IDX(iova, ++lvl, >iop.cfg);
pte = READ_ONCE(*ptep);
ptep = iopte_deref(pte, lvl, data);
} 

[PATCH v2 08/23] iommu/io-pgtable-arm-v7s: Use ias to check the valid iova in unmap

2020-09-05 Thread Yong Wu
As title.
BTW, change the ias/oas checking format in v7s_map.

Signed-off-by: Yong Wu 
---
 drivers/iommu/io-pgtable-arm-v7s.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm-v7s.c 
b/drivers/iommu/io-pgtable-arm-v7s.c
index a688f22cbe3b..4c9d8dccfc5a 100644
--- a/drivers/iommu/io-pgtable-arm-v7s.c
+++ b/drivers/iommu/io-pgtable-arm-v7s.c
@@ -526,8 +526,7 @@ static int arm_v7s_map(struct io_pgtable_ops *ops, unsigned 
long iova,
if (!(prot & (IOMMU_READ | IOMMU_WRITE)))
return 0;
 
-   if (WARN_ON(iova >= (1ULL << data->iop.cfg.ias) ||
-   paddr >= (1ULL << data->iop.cfg.oas)))
+   if (WARN_ON(iova >> data->iop.cfg.ias || paddr >> data->iop.cfg.oas))
return -ERANGE;
 
ret = __arm_v7s_map(data, iova, paddr, size, prot, 1, data->pgd, gfp);
@@ -717,7 +716,7 @@ static size_t arm_v7s_unmap(struct io_pgtable_ops *ops, 
unsigned long iova,
 {
struct arm_v7s_io_pgtable *data = io_pgtable_ops_to_data(ops);
 
-   if (WARN_ON(upper_32_bits(iova)))
+   if (WARN_ON(iova >> data->iop.cfg.ias))
return 0;
 
return __arm_v7s_unmap(data, gather, iova, size, 1, data->pgd);
-- 
2.18.0
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 05/23] dt-bindings: memory: mediatek: Add domain definition

2020-09-05 Thread Yong Wu
In the latest SoC, there are several HW IP require a sepecial iova
range, mainly CCU and VPU has this requirement. Take CCU as a example,
CCU require its iova locate in the range(0x4000_ ~ 0x43ff_).

In this patch we add a domain definition for the special port. In the
example of CCU, If we preassign CCU port in domain1, then iommu driver
will prepare a independent iommu domain of the special iova range for it,
then the iova got from dma_alloc_attrs(ccu-dev) will locate in its special
range.

This is a preparing patch for multi-domain support.

Signed-off-by: Yong Wu 
---
 include/dt-bindings/memory/mtk-smi-larb-port.h | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/include/dt-bindings/memory/mtk-smi-larb-port.h 
b/include/dt-bindings/memory/mtk-smi-larb-port.h
index f4d8e3aed0bc..d00f5de8438b 100644
--- a/include/dt-bindings/memory/mtk-smi-larb-port.h
+++ b/include/dt-bindings/memory/mtk-smi-larb-port.h
@@ -7,9 +7,16 @@
 #define __DTS_MTK_IOMMU_PORT_H_
 
 #define MTK_LARB_NR_MAX32
+#define MTK_M4U_DOM_NR_MAX 8
+
+#define MTK_M4U_DOM_ID(domid, larb, port)  \
+   (((domid) & 0x7) << 16 | (((larb) & 0x1f) << 5) | ((port) & 0x1f))
+
+/* The default dom id is 0. */
+#define MTK_M4U_ID(larb, port) MTK_M4U_DOM_ID(0, larb, port)
 
-#define MTK_M4U_ID(larb, port) (((larb) << 5) | (port))
 #define MTK_M4U_TO_LARB(id)(((id) >> 5) & 0x1f)
 #define MTK_M4U_TO_PORT(id)((id) & 0x1f)
+#define MTK_M4U_TO_DOM(id) (((id) >> 16) & 0x7)
 
 #endif
-- 
2.18.0
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 04/23] dt-bindings: memory: mediatek: Extend LARB_NR_MAX to 32

2020-09-05 Thread Yong Wu
Extend the max larb number definition as mt8192 has larb_nr over 16.

Signed-off-by: Yong Wu 
Acked-by: Rob Herring 
---
 include/dt-bindings/memory/mtk-smi-larb-port.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/dt-bindings/memory/mtk-smi-larb-port.h 
b/include/dt-bindings/memory/mtk-smi-larb-port.h
index 2ec7fe5ce4e9..f4d8e3aed0bc 100644
--- a/include/dt-bindings/memory/mtk-smi-larb-port.h
+++ b/include/dt-bindings/memory/mtk-smi-larb-port.h
@@ -6,10 +6,10 @@
 #ifndef __DTS_MTK_IOMMU_PORT_H_
 #define __DTS_MTK_IOMMU_PORT_H_
 
-#define MTK_LARB_NR_MAX16
+#define MTK_LARB_NR_MAX32
 
 #define MTK_M4U_ID(larb, port) (((larb) << 5) | (port))
-#define MTK_M4U_TO_LARB(id)(((id) >> 5) & 0xf)
+#define MTK_M4U_TO_LARB(id)(((id) >> 5) & 0x1f)
 #define MTK_M4U_TO_PORT(id)((id) & 0x1f)
 
 #endif
-- 
2.18.0
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 01/23] dt-bindings: iommu: mediatek: Convert IOMMU to DT schema

2020-09-05 Thread Yong Wu
Convert MediaTek IOMMU to DT schema.

Signed-off-by: Yong Wu 
---
 .../bindings/iommu/mediatek,iommu.txt | 103 
 .../bindings/iommu/mediatek,iommu.yaml| 150 ++
 2 files changed, 150 insertions(+), 103 deletions(-)
 delete mode 100644 Documentation/devicetree/bindings/iommu/mediatek,iommu.txt
 create mode 100644 Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml

diff --git a/Documentation/devicetree/bindings/iommu/mediatek,iommu.txt 
b/Documentation/devicetree/bindings/iommu/mediatek,iommu.txt
deleted file mode 100644
index c1ccd8582eb2..
--- a/Documentation/devicetree/bindings/iommu/mediatek,iommu.txt
+++ /dev/null
@@ -1,103 +0,0 @@
-* Mediatek IOMMU Architecture Implementation
-
-  Some Mediatek SOCs contain a Multimedia Memory Management Unit (M4U), and
-this M4U have two generations of HW architecture. Generation one uses flat
-pagetable, and only supports 4K size page mapping. Generation two uses the
-ARM Short-Descriptor translation table format for address translation.
-
-  About the M4U Hardware Block Diagram, please check below:
-
-  EMI (External Memory Interface)
-   |
-  m4u (Multimedia Memory Management Unit)
-   |
-  ++
-  ||
-  gals0-rx   gals1-rx(Global Async Local Sync rx)
-  ||
-  ||
-  gals0-tx   gals1-tx(Global Async Local Sync tx)
-  ||  Some SoCs may have GALS.
-  ++
-   |
-   SMI Common(Smart Multimedia Interface Common)
-   |
-   ++---
-   ||
-   | gals-rxThere may be GALS in some larbs.
-   ||
-   ||
-   | gals-tx
-   ||
-   SMI larb0SMI larb1   ... SoCs have several SMI local arbiter(larb).
-   (display) (vdec)
-   ||
-   ||
- +-+-+ +++
- | | | |||
- | | |...  |||  ... There are different ports in each larb.
- | | | |||
-OVL0 RDMA0 WDMA0  MC   PP   VLD
-
-  As above, The Multimedia HW will go through SMI and M4U while it
-access EMI. SMI is a bridge between m4u and the Multimedia HW. It contain
-smi local arbiter and smi common. It will control whether the Multimedia
-HW should go though the m4u for translation or bypass it and talk
-directly with EMI. And also SMI help control the power domain and clocks for
-each local arbiter.
-  Normally we specify a local arbiter(larb) for each multimedia HW
-like display, video decode, and camera. And there are different ports
-in each larb. Take a example, There are many ports like MC, PP, VLD in the
-video decode local arbiter, all these ports are according to the video HW.
-  In some SoCs, there may be a GALS(Global Async Local Sync) module between
-smi-common and m4u, and additional GALS module between smi-larb and
-smi-common. GALS can been seen as a "asynchronous fifo" which could help
-synchronize for the modules in different clock frequency.
-
-Required properties:
-- compatible : must be one of the following string:
-   "mediatek,mt2701-m4u" for mt2701 which uses generation one m4u HW.
-   "mediatek,mt2712-m4u" for mt2712 which uses generation two m4u HW.
-   "mediatek,mt6779-m4u" for mt6779 which uses generation two m4u HW.
-   "mediatek,mt7623-m4u", "mediatek,mt2701-m4u" for mt7623 which uses
-generation one m4u HW.
-   "mediatek,mt8173-m4u" for mt8173 which uses generation two m4u HW.
-   "mediatek,mt8183-m4u" for mt8183 which uses generation two m4u HW.
-- reg : m4u register base and size.
-- interrupts : the interrupt of m4u.
-- clocks : must contain one entry for each clock-names.
-- clock-names : Only 1 optional clock:
-  - "bclk": the block clock of m4u.
-  Here is the list which require this "bclk":
-  - mt2701, mt2712, mt7623 and mt8173.
-  Note that m4u use the EMI clock which always has been enabled before kernel
-  if there is no this "bclk".
-- mediatek,larbs : List of phandle to the local arbiters in the current Socs.
-   Refer to bindings/memory-controllers/mediatek,smi-larb.txt. It must sort
-   according to the local arbiter index, like larb0, larb1, larb2...
-- iommu-cells : must be 1. This is the mtk_m4u_id according to the HW.
-   Specifies the mtk_m4u_id as defined in
-   dt-binding/memory/mt2701-larb-port.h for mt2701, mt7623
-   dt-binding/memory/mt2712-larb-port.h for mt2712,
-   dt-binding/memory/mt6779-larb-port.h for mt6779,
-   dt-binding/memory/mt8173-larb-port.h for mt8173, and
-   dt-binding/memory/mt8183-larb-port.h for mt8183.
-
-Example:
-   iommu: iommu@10205000 {
-   compatible = "mediatek,mt8173-m4u";
-   reg = <0 0x10205000 0 

[PATCH v2 03/23] dt-bindings: memory: mediatek: Add a common larb-port header file

2020-09-05 Thread Yong Wu
Put all the macros about smi larb/port togethers, this is a preparing
patch for extending LARB_NR and adding new dom-id support.

Signed-off-by: Yong Wu 
Acked-by: Rob Herring 
---
 include/dt-bindings/memory/mt2712-larb-port.h  |  2 +-
 include/dt-bindings/memory/mt6779-larb-port.h  |  2 +-
 include/dt-bindings/memory/mt8173-larb-port.h  |  2 +-
 include/dt-bindings/memory/mt8183-larb-port.h  |  2 +-
 include/dt-bindings/memory/mtk-smi-larb-port.h | 15 +++
 5 files changed, 19 insertions(+), 4 deletions(-)
 create mode 100644 include/dt-bindings/memory/mtk-smi-larb-port.h

diff --git a/include/dt-bindings/memory/mt2712-larb-port.h 
b/include/dt-bindings/memory/mt2712-larb-port.h
index 6f9aa7349cef..b6b2c6bf4459 100644
--- a/include/dt-bindings/memory/mt2712-larb-port.h
+++ b/include/dt-bindings/memory/mt2712-larb-port.h
@@ -6,7 +6,7 @@
 #ifndef __DTS_IOMMU_PORT_MT2712_H
 #define __DTS_IOMMU_PORT_MT2712_H
 
-#define MTK_M4U_ID(larb, port) (((larb) << 5) | (port))
+#include 
 
 #define M4U_LARB0_ID   0
 #define M4U_LARB1_ID   1
diff --git a/include/dt-bindings/memory/mt6779-larb-port.h 
b/include/dt-bindings/memory/mt6779-larb-port.h
index 2ad0899fbf2f..60f57f54393e 100644
--- a/include/dt-bindings/memory/mt6779-larb-port.h
+++ b/include/dt-bindings/memory/mt6779-larb-port.h
@@ -7,7 +7,7 @@
 #ifndef _DTS_IOMMU_PORT_MT6779_H_
 #define _DTS_IOMMU_PORT_MT6779_H_
 
-#define MTK_M4U_ID(larb, port)  (((larb) << 5) | (port))
+#include 
 
 #define M4U_LARB0_ID0
 #define M4U_LARB1_ID1
diff --git a/include/dt-bindings/memory/mt8173-larb-port.h 
b/include/dt-bindings/memory/mt8173-larb-port.h
index 9f31ccfeca21..d8c99c946053 100644
--- a/include/dt-bindings/memory/mt8173-larb-port.h
+++ b/include/dt-bindings/memory/mt8173-larb-port.h
@@ -6,7 +6,7 @@
 #ifndef __DTS_IOMMU_PORT_MT8173_H
 #define __DTS_IOMMU_PORT_MT8173_H
 
-#define MTK_M4U_ID(larb, port) (((larb) << 5) | (port))
+#include 
 
 #define M4U_LARB0_ID   0
 #define M4U_LARB1_ID   1
diff --git a/include/dt-bindings/memory/mt8183-larb-port.h 
b/include/dt-bindings/memory/mt8183-larb-port.h
index 2c579f305162..275c095a6fd6 100644
--- a/include/dt-bindings/memory/mt8183-larb-port.h
+++ b/include/dt-bindings/memory/mt8183-larb-port.h
@@ -6,7 +6,7 @@
 #ifndef __DTS_IOMMU_PORT_MT8183_H
 #define __DTS_IOMMU_PORT_MT8183_H
 
-#define MTK_M4U_ID(larb, port) (((larb) << 5) | (port))
+#include 
 
 #define M4U_LARB0_ID   0
 #define M4U_LARB1_ID   1
diff --git a/include/dt-bindings/memory/mtk-smi-larb-port.h 
b/include/dt-bindings/memory/mtk-smi-larb-port.h
new file mode 100644
index ..2ec7fe5ce4e9
--- /dev/null
+++ b/include/dt-bindings/memory/mtk-smi-larb-port.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2020 MediaTek Inc.
+ * Author: Yong Wu 
+ */
+#ifndef __DTS_MTK_IOMMU_PORT_H_
+#define __DTS_MTK_IOMMU_PORT_H_
+
+#define MTK_LARB_NR_MAX16
+
+#define MTK_M4U_ID(larb, port) (((larb) << 5) | (port))
+#define MTK_M4U_TO_LARB(id)(((id) >> 5) & 0xf)
+#define MTK_M4U_TO_PORT(id)((id) & 0x1f)
+
+#endif
-- 
2.18.0
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 00/23] MT8192 IOMMU support

2020-09-05 Thread Yong Wu
This patch mainly adds support for mt8192 IOMMU and SMI.

mt8192 also is MTK IOMMU gen2 which uses ARM Short-Descriptor translation
table format. The M4U-SMI HW diagram is as below:

  EMI
   |
  M4U
   |
  
   SMI Common
  
   |
  +---+--+--+--+---+
  |   |  |  |   .. |   |
  |   |  |  |  |   |
larb0   larb1  larb2  larb4 ..  larb19   larb20
disp0   disp1   mdpvdec   IPE  IPE

All the connections are HW fixed, SW can NOT adjust it.

Comparing with the preview SoC, this patchset mainly adds two new functions:
a) add iova 34 bits support.
b) add multi domains support since several HW has the special iova
region requirement.

this patchset depend on v5.9-rc1.

change note:
v2:
  a) Convert IOMMU/SMI dt-binding to DT schema.
  b) Fix some comment from Pi-Hsun and Nicolas. like use
  generic_iommu_put_resv_regions.
  c) Reword some comment, like add how to use domain-id.
  
v1: 
https://lore.kernel.org/linux-iommu/20200711064846.16007-1-yong...@mediatek.com/

Yong Wu (23):
  dt-bindings: iommu: mediatek: Convert IOMMU to DT schema
  dt-bindings: memory: mediatek: Convert SMI to DT schema
  dt-bindings: memory: mediatek: Add a common larb-port header file
  dt-bindings: memory: mediatek: Extend LARB_NR_MAX to 32
  dt-bindings: memory: mediatek: Add domain definition
  dt-bindings: mediatek: Add binding for mt8192 IOMMU and SMI
  iommu/mediatek: Use the common mtk-smi-larb-port.h
  iommu/io-pgtable-arm-v7s: Use ias to check the valid iova in unmap
  iommu/io-pgtable-arm-v7s: Extend PA34 for MediaTek
  iommu/io-pgtable-arm-v7s: Add cfg as a param in some macros
  iommu/io-pgtable-arm-v7s: Quad lvl1 pgtable for MediaTek
  iommu/mediatek: Move hw_init into attach_device
  iommu/mediatek: Add device link for smi-common and m4u
  iommu/mediatek: Add power-domain operation
  iommu/mediatek: Add iova reserved function
  iommu/mediatek: Add single domain
  iommu/mediatek: Support master use iova over 32bit
  iommu/mediatek: Support up to 34bit iova in tlb flush
  iommu/mediatek: Support report iova 34bit translation fault in ISR
  iommu/mediatek: Add support for multi domain
  iommu/mediatek: Adjust the structure
  iommu/mediatek: Add mt8192 support
  memory: mtk-smi: Add mt8192 support

 .../bindings/iommu/mediatek,iommu.txt | 103 
 .../bindings/iommu/mediatek,iommu.yaml| 155 +++
 .../mediatek,smi-common.txt   |  49 
 .../mediatek,smi-common.yaml  |  97 +++
 .../memory-controllers/mediatek,smi-larb.txt  |  49 
 .../memory-controllers/mediatek,smi-larb.yaml |  86 ++
 drivers/iommu/io-pgtable-arm-v7s.c|  57 ++--
 drivers/iommu/mtk_iommu.c | 247 ++
 drivers/iommu/mtk_iommu.h |  11 +-
 drivers/memory/mtk-smi.c  |  27 ++
 include/dt-bindings/memory/mt2712-larb-port.h |   2 +-
 include/dt-bindings/memory/mt6779-larb-port.h |   2 +-
 include/dt-bindings/memory/mt8173-larb-port.h |   2 +-
 include/dt-bindings/memory/mt8183-larb-port.h |   2 +-
 include/dt-bindings/memory/mt8192-larb-port.h | 239 +
 .../dt-bindings/memory/mtk-smi-larb-port.h|  22 ++
 include/linux/io-pgtable.h|   4 +-
 include/soc/mediatek/smi.h|   3 +-
 18 files changed, 880 insertions(+), 277 deletions(-)
 delete mode 100644 Documentation/devicetree/bindings/iommu/mediatek,iommu.txt
 create mode 100644 Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml
 delete mode 100644 
Documentation/devicetree/bindings/memory-controllers/mediatek,smi-common.txt
 create mode 100644 
Documentation/devicetree/bindings/memory-controllers/mediatek,smi-common.yaml
 delete mode 100644 
Documentation/devicetree/bindings/memory-controllers/mediatek,smi-larb.txt
 create mode 100644 
Documentation/devicetree/bindings/memory-controllers/mediatek,smi-larb.yaml
 create mode 100644 include/dt-bindings/memory/mt8192-larb-port.h
 create mode 100644 include/dt-bindings/memory/mtk-smi-larb-port.h

-- 
2.18.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 02/23] dt-bindings: memory: mediatek: Convert SMI to DT schema

2020-09-05 Thread Yong Wu
Convert MediaTek SMI to DT schema.

Signed-off-by: Yong Wu 
---
 .../mediatek,smi-common.txt   | 49 --
 .../mediatek,smi-common.yaml  | 96 +++
 .../memory-controllers/mediatek,smi-larb.txt  | 49 --
 .../memory-controllers/mediatek,smi-larb.yaml | 85 
 4 files changed, 181 insertions(+), 98 deletions(-)
 delete mode 100644 
Documentation/devicetree/bindings/memory-controllers/mediatek,smi-common.txt
 create mode 100644 
Documentation/devicetree/bindings/memory-controllers/mediatek,smi-common.yaml
 delete mode 100644 
Documentation/devicetree/bindings/memory-controllers/mediatek,smi-larb.txt
 create mode 100644 
Documentation/devicetree/bindings/memory-controllers/mediatek,smi-larb.yaml

diff --git 
a/Documentation/devicetree/bindings/memory-controllers/mediatek,smi-common.txt 
b/Documentation/devicetree/bindings/memory-controllers/mediatek,smi-common.txt
deleted file mode 100644
index b64573680b42..
--- 
a/Documentation/devicetree/bindings/memory-controllers/mediatek,smi-common.txt
+++ /dev/null
@@ -1,49 +0,0 @@
-SMI (Smart Multimedia Interface) Common
-
-The hardware block diagram please check bindings/iommu/mediatek,iommu.txt
-
-Mediatek SMI have two generations of HW architecture, here is the list
-which generation the SoCs use:
-generation 1: mt2701 and mt7623.
-generation 2: mt2712, mt6779, mt8173 and mt8183.
-
-There's slight differences between the two SMI, for generation 2, the
-register which control the iommu port is at each larb's register base. But
-for generation 1, the register is at smi ao base(smi always on register
-base). Besides that, the smi async clock should be prepared and enabled for
-SMI generation 1 to transform the smi clock into emi clock domain, but that is
-not needed for SMI generation 2.
-
-Required properties:
-- compatible : must be one of :
-   "mediatek,mt2701-smi-common"
-   "mediatek,mt2712-smi-common"
-   "mediatek,mt6779-smi-common"
-   "mediatek,mt7623-smi-common", "mediatek,mt2701-smi-common"
-   "mediatek,mt8173-smi-common"
-   "mediatek,mt8183-smi-common"
-- reg : the register and size of the SMI block.
-- power-domains : a phandle to the power domain of this local arbiter.
-- clocks : Must contain an entry for each entry in clock-names.
-- clock-names : must contain 3 entries for generation 1 smi HW and 2 entries
-  for generation 2 smi HW as follows:
-  - "apb" : Advanced Peripheral Bus clock, It's the clock for setting
-   the register.
-  - "smi" : It's the clock for transfer data and command.
-   They may be the same if both source clocks are the same.
-  - "async" : asynchronous clock, it help transform the smi clock into the emi
- clock domain, this clock is only needed by generation 1 smi HW.
-  and these 2 option clocks for generation 2 smi HW:
-  - "gals0": the path0 clock of GALS(Global Async Local Sync).
-  - "gals1": the path1 clock of GALS(Global Async Local Sync).
-  Here is the list which has this GALS: mt6779 and mt8183.
-
-Example:
-   smi_common: smi@14022000 {
-   compatible = "mediatek,mt8173-smi-common";
-   reg = <0 0x14022000 0 0x1000>;
-   power-domains = < MT8173_POWER_DOMAIN_MM>;
-   clocks = < CLK_MM_SMI_COMMON>,
-< CLK_MM_SMI_COMMON>;
-   clock-names = "apb", "smi";
-   };
diff --git 
a/Documentation/devicetree/bindings/memory-controllers/mediatek,smi-common.yaml 
b/Documentation/devicetree/bindings/memory-controllers/mediatek,smi-common.yaml
new file mode 100644
index ..781d7b7617d9
--- /dev/null
+++ 
b/Documentation/devicetree/bindings/memory-controllers/mediatek,smi-common.yaml
@@ -0,0 +1,96 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/memory-controllers/mediatek,smi-common.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: SMI (Smart Multimedia Interface) Common
+
+maintainers:
+  - Yong Wu 
+
+description: |+
+  The hardware block diagram please check bindings/iommu/mediatek,iommu.yaml
+
+  MediaTek SMI have two generations of HW architecture, here is the list
+  which generation the SoCs use:
+  generation 1: mt2701 and mt7623.
+  generation 2: mt2712, mt6779, mt8173 and mt8183.
+
+  There's slight differences between the two SMI, for generation 2, the
+  register which control the iommu port is at each larb's register base. But
+  for generation 1, the register is at smi ao base(smi always on register
+  base). Besides that, the smi async clock should be prepared and enabled for
+  SMI generation 1 to transform the smi clock into emi clock domain, but that 
is
+  not needed for SMI generation 2.
+
+properties:
+  compatible:
+enum:
+  - mediatek,mt2701-smi-common
+  - mediatek,mt2712-smi-common
+  - mediatek,mt6779-smi-common
+  - mediatek,mt7623-smi-common, 

RE: [PATCH V6 1/3] iommu: Add support to change default domain of an iommu group

2020-09-05 Thread Prakhya, Sai Praneeth
Hi Joerg,

> -Original Message-
> From: Joerg Roedel 
> Sent: Friday, September 4, 2020 2:42 PM
> To: Prakhya, Sai Praneeth 
> Cc: iommu@lists.linux-foundation.org; Christoph Hellwig ; Raj,
> Ashok ; Will Deacon ; Lu Baolu
> ; Mehta, Sohil ; Robin
> Murphy ; Jacob Pan 
> Subject: Re: [PATCH V6 1/3] iommu: Add support to change default domain of
> an iommu group
> 
> On Fri, Sep 04, 2020 at 07:11:07PM +, Prakhya, Sai Praneeth wrote:
> > But couple of questions..
> > 1. Do you want me to post the entire patch series? (i.e. 3 patches) or
> > do you want me to post just this patch i.e. 1st patch only 2. Do you want 
> > me to
> bump the version number? i.e. post it as V7 ?
> > 3. Didn't get what you meant here.. "woth b4" ☹
> 
> Please resend all 3 patches a v7, b4 is just a tool I am using to download the
> patches from the mailing list and add all tags[1].

Thanks for answering the questions. It's clear to me now.. I will post a V7 

Regards,
Sai

> Regards,
> 
>   Joerg
> 
> [1] https://people.kernel.org/monsieuricon/introducing-b4-and-patch-
> attestation
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu