[PATCH v5 6/6] Revert "iommu/arm-smmu: Treat all device transactions as unprivileged"

2016-07-27 Thread Mitchel Humpherys
This reverts commit d346180e70b9 ("iommu/arm-smmu: Treat all device
transactions as unprivileged") since some platforms actually make use of
privileged transactions.

Reviewed-by: Robin Murphy <robin.mur...@arm.com>
Tested-by: Robin Murphy <robin.mur...@arm.com>
Signed-off-by: Mitchel Humpherys <mitch...@codeaurora.org>
---

Notes:
v2..v3

  - Moved to the end of the series.

 drivers/iommu/arm-smmu.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 4f49fe29f202..46059b06f48d 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -178,9 +178,6 @@
 #define S2CR_TYPE_BYPASS   (1 << S2CR_TYPE_SHIFT)
 #define S2CR_TYPE_FAULT(2 << S2CR_TYPE_SHIFT)
 
-#define S2CR_PRIVCFG_SHIFT 24
-#define S2CR_PRIVCFG_UNPRIV(2 << S2CR_PRIVCFG_SHIFT)
-
 /* Context bank attribute registers */
 #define ARM_SMMU_GR1_CBAR(n)   (0x0 + ((n) << 2))
 #define CBAR_VMID_SHIFT0
@@ -1175,7 +1172,7 @@ static int arm_smmu_domain_add_master(struct 
arm_smmu_domain *smmu_domain,
u32 idx, s2cr;
 
idx = cfg->smrs ? cfg->smrs[i].idx : cfg->streamids[i];
-   s2cr = S2CR_TYPE_TRANS | S2CR_PRIVCFG_UNPRIV |
+   s2cr = S2CR_TYPE_TRANS |
   (smmu_domain->cfg.cbndx << S2CR_CBNDX_SHIFT);
writel_relaxed(s2cr, gr0_base + ARM_SMMU_GR0_S2CR(idx));
}
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 3/6] common: DMA-mapping: add DMA_ATTR_PRIVILEGED attribute

2016-07-27 Thread Mitchel Humpherys
This patch adds the DMA_ATTR_PRIVILEGED attribute to the DMA-mapping
subsystem.

Some advanced peripherals such as remote processors and GPUs perform
accesses to DMA buffers in both privileged "supervisor" and unprivileged
"user" modes.  This attribute is used to indicate to the DMA-mapping
subsystem that the buffer is fully accessible at the elevated privilege
level (and ideally inaccessible or at least read-only at the
lesser-privileged levels).

Cc: linux-...@vger.kernel.org
Reviewed-by: Robin Murphy <robin.mur...@arm.com>
Tested-by: Robin Murphy <robin.mur...@arm.com>
Signed-off-by: Mitchel Humpherys <mitch...@codeaurora.org>
---

Notes:
v2..v3

  - Not worrying about executability.

 Documentation/DMA-attributes.txt | 10 ++
 include/linux/dma-mapping.h  |  6 ++
 2 files changed, 16 insertions(+)

diff --git a/Documentation/DMA-attributes.txt b/Documentation/DMA-attributes.txt
index 2d455a5cf671..7728bda278c9 100644
--- a/Documentation/DMA-attributes.txt
+++ b/Documentation/DMA-attributes.txt
@@ -126,3 +126,13 @@ means that we won't try quite as hard to get them.
 
 NOTE: At the moment DMA_ATTR_ALLOC_SINGLE_PAGES is only implemented on ARM,
 though ARM64 patches will likely be posted soon.
+
+DMA_ATTR_PRIVILEGED
+--
+
+Some advanced peripherals such as remote processors and GPUs perform
+accesses to DMA buffers in both privileged "supervisor" and unprivileged
+"user" modes.  This attribute is used to indicate to the DMA-mapping
+subsystem that the buffer is fully accessible at the elevated privilege
+level (and ideally inaccessible or at least read-only at the
+lesser-privileged levels).
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 66533e18276c..73f477609262 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -56,6 +56,12 @@
  * that gives better TLB efficiency.
  */
 #define DMA_ATTR_ALLOC_SINGLE_PAGES(1UL << 7)
+/*
+ * DMA_ATTR_PRIVILEGED: used to indicate that the buffer is fully
+ * accessible at an elevated privilege level (and ideally inaccessible or
+ * at least read-only at lesser-privileged levels).
+ */
+#define DMA_ATTR_PRIVILEGED(1UL << 8)
 
 /*
  * A dma_addr_t can hold any valid DMA or bus address for the platform.
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 4/6] arm64/dma-mapping: Implement DMA_ATTR_PRIVILEGED

2016-07-27 Thread Mitchel Humpherys
The newly added DMA_ATTR_PRIVILEGED is useful for creating mappings that
are only accessible to privileged DMA engines.  Implement it in
dma-iommu.c so that the ARM64 DMA IOMMU mapper can make use of it.

Reviewed-by: Robin Murphy <robin.mur...@arm.com>
Tested-by: Robin Murphy <robin.mur...@arm.com>
Signed-off-by: Mitchel Humpherys <mitch...@codeaurora.org>
---

Notes:
v4..v5

  - Simplified (suggested by Robin Murphy)

v3..v4

  - Reworked against the new dma attrs format

v2..v3

  - Renamed and redocumented dma_direction_to_prot.
  - Dropped the stuff making all privileged mappings read-only.

 arch/arm64/mm/dma-mapping.c |  6 +++---
 drivers/iommu/dma-iommu.c   | 10 --
 include/linux/dma-iommu.h   |  3 ++-
 3 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index c4284c432ae8..1c6f85c56115 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -556,7 +556,7 @@ static void *__iommu_alloc_attrs(struct device *dev, size_t 
size,
 unsigned long attrs)
 {
bool coherent = is_device_dma_coherent(dev);
-   int ioprot = dma_direction_to_prot(DMA_BIDIRECTIONAL, coherent);
+   int ioprot = dma_info_to_prot(DMA_BIDIRECTIONAL, coherent, attrs);
size_t iosize = size;
void *addr;
 
@@ -710,7 +710,7 @@ static dma_addr_t __iommu_map_page(struct device *dev, 
struct page *page,
   unsigned long attrs)
 {
bool coherent = is_device_dma_coherent(dev);
-   int prot = dma_direction_to_prot(dir, coherent);
+   int prot = dma_info_to_prot(dir, coherent, attrs);
dma_addr_t dev_addr = iommu_dma_map_page(dev, page, offset, size, prot);
 
if (!iommu_dma_mapping_error(dev, dev_addr) &&
@@ -768,7 +768,7 @@ static int __iommu_map_sg_attrs(struct device *dev, struct 
scatterlist *sgl,
__iommu_sync_sg_for_device(dev, sgl, nelems, dir);
 
return iommu_dma_map_sg(dev, sgl, nelems,
-   dma_direction_to_prot(dir, coherent));
+   dma_info_to_prot(dir, coherent, attrs));
 }
 
 static void __iommu_unmap_sg_attrs(struct device *dev,
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 08a1e2f3690f..279764305005 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -129,16 +129,22 @@ int iommu_dma_init_domain(struct iommu_domain *domain, 
dma_addr_t base, u64 size
 EXPORT_SYMBOL(iommu_dma_init_domain);
 
 /**
- * dma_direction_to_prot - Translate DMA API directions to IOMMU API page flags
+ * dma_info_to_prot - Translate DMA API directions and attributes to IOMMU API
+ *page flags.
  * @dir: Direction of DMA transfer
  * @coherent: Is the DMA master cache-coherent?
+ * @attrs: DMA attributes for the mapping
  *
  * Return: corresponding IOMMU API page protection flags
  */
-int dma_direction_to_prot(enum dma_data_direction dir, bool coherent)
+int dma_info_to_prot(enum dma_data_direction dir, bool coherent,
+unsigned long attrs)
 {
int prot = coherent ? IOMMU_CACHE : 0;
 
+   if (attrs & DMA_ATTR_PRIVILEGED)
+   prot |= IOMMU_PRIV;
+
switch (dir) {
case DMA_BIDIRECTIONAL:
return prot | IOMMU_READ | IOMMU_WRITE;
diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
index 81c5c8d167ad..b367613d49ba 100644
--- a/include/linux/dma-iommu.h
+++ b/include/linux/dma-iommu.h
@@ -32,7 +32,8 @@ void iommu_put_dma_cookie(struct iommu_domain *domain);
 int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base, u64 
size);
 
 /* General helpers for DMA-API <-> IOMMU-API interaction */
-int dma_direction_to_prot(enum dma_data_direction dir, bool coherent);
+int dma_info_to_prot(enum dma_data_direction dir, bool coherent,
+unsigned long attrs);
 
 /*
  * These implement the bulk of the relevant DMA mapping callbacks, but require
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 5/6] dmaengine: pl330: Make sure microcode is privileged

2016-07-27 Thread Mitchel Humpherys
The PL330 performs privileged instruction fetches.  This can result in
SMMU permission faults on SMMUs that implement the ARMv8 VMSA, which
specifies that mappings that are writeable at one execution level shall
not be executable at any higher-privileged level.  Fix this by using the
DMA_ATTR_PRIVILEGED attribute, which will ensure that the microcode
IOMMU mapping is only accessible to the privileged level.

Cc: Dan Williams <dan.j.willi...@intel.com>
Cc: Vinod Koul <vinod.k...@intel.com>
Reviewed-by: Robin Murphy <robin.mur...@arm.com>
Tested-by: Robin Murphy <robin.mur...@arm.com>
Signed-off-by: Mitchel Humpherys <mitch...@codeaurora.org>
---

Notes:
v3..v4

  - Reworked against the new dma attrs format.

 drivers/dma/pl330.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/dma/pl330.c b/drivers/dma/pl330.c
index 4fc3ffbd5ca0..8cd624fc3760 100644
--- a/drivers/dma/pl330.c
+++ b/drivers/dma/pl330.c
@@ -1854,14 +1854,16 @@ static int dmac_alloc_resources(struct pl330_dmac 
*pl330)
 {
int chans = pl330->pcfg.num_chan;
int ret;
+   unsigned long dma_attrs = DMA_ATTR_PRIVILEGED;
 
/*
 * Alloc MicroCode buffer for 'chans' Channel threads.
 * A channel's buffer offset is (Channel_Id * MCODE_BUFF_PERCHAN)
 */
-   pl330->mcode_cpu = dma_alloc_coherent(pl330->ddma.dev,
+   pl330->mcode_cpu = dma_alloc_attrs(pl330->ddma.dev,
chans * pl330->mcbufsz,
-   >mcode_bus, GFP_KERNEL);
+   >mcode_bus, GFP_KERNEL,
+   dma_attrs);
if (!pl330->mcode_cpu) {
dev_err(pl330->ddma.dev, "%s:%d Can't allocate memory!\n",
__func__, __LINE__);
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 2/6] iommu/io-pgtable-arm: add support for the IOMMU_PRIV flag

2016-07-27 Thread Mitchel Humpherys
From: Jeremy Gebben 

Allow the creation of privileged mode mappings, for stage 1 only.

Reviewed-by: Robin Murphy 
Tested-by: Robin Murphy 
Signed-off-by: Jeremy Gebben 
---

Notes:
v2..v3

  - Use existing bit definitions.

 drivers/iommu/io-pgtable-arm.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index f5c90e1366ce..69ba83a135f1 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -350,11 +350,14 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct 
arm_lpae_io_pgtable *data,
 
if (data->iop.fmt == ARM_64_LPAE_S1 ||
data->iop.fmt == ARM_32_LPAE_S1) {
-   pte = ARM_LPAE_PTE_AP_UNPRIV | ARM_LPAE_PTE_nG;
+   pte = ARM_LPAE_PTE_nG;
 
if (!(prot & IOMMU_WRITE) && (prot & IOMMU_READ))
pte |= ARM_LPAE_PTE_AP_RDONLY;
 
+   if (!(prot & IOMMU_PRIV))
+   pte |= ARM_LPAE_PTE_AP_UNPRIV;
+
if (prot & IOMMU_MMIO)
pte |= (ARM_LPAE_MAIR_ATTR_IDX_DEV
<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 0/6] Add support for privileged mappings

2016-07-27 Thread Mitchel Humpherys
The following patch to the ARM SMMU driver:

commit d346180e70b91b3d5a1ae7e5603e65593d4622bc
Author: Robin Murphy <robin.mur...@arm.com>
Date:   Tue Jan 26 18:06:34 2016 +

iommu/arm-smmu: Treat all device transactions as unprivileged

started forcing all SMMU transactions to come through as "unprivileged".
The rationale given was that:

  (1) There is no way in the IOMMU API to even request privileged mappings.

  (2) It's difficult to implement a DMA mapper that correctly models the
  ARM VMSAv8 behavior of unprivileged-writeable =>
  privileged-execute-never.

This series rectifies (1) by introducing an IOMMU API for privileged
mappings and implements it in io-pgtable-arm.

This series rectifies (2) by introducing a new dma attribute
(DMA_ATTR_PRIVILEGED) for users of the DMA API that need privileged
mappings which are inaccessible to lesser-privileged execution levels, and
implements it in the arm64 IOMMU DMA mapper.  The one known user (pl330.c)
is converted over to the new attribute.

Jordan and Jeremy can provide more info on the use case if needed, but the
high level is that it's a security feature to prevent attacks such as [1].

Joerg, the v3 series was previously acked by Will [2].  He also recommended
that we take all of this through your tree since it's touching multiple
subsystems [3].  Can you please take a look?  Thanks!

It's also worth noting that I will no longer be at QuIC as of this coming
Monday, but the fine folks with codeaurora email addresses Cc'd here can
provide help getting these through once I'm gone.

[1] https://github.com/robclark/kilroy
[2] http://article.gmane.org/gmane.linux.kernel.iommu/14617
[3] http://article.gmane.org/gmane.linux.kernel/2272531

Changelog:

  v4..v5

- Simplified patch 4/6 (suggested by Robin Murphy).

  v3..v4

- Rebased and reworked on linux next due to the dma attrs rework going
  on over there.  Patches changed: 3/6, 4/6, and 5/6.

  v2..v3

- Incorporated feedback from Robin:
  * Various comments and re-wordings.
  * Use existing bit definitions for IOMMU_PRIV implementation
in io-pgtable-arm.
  * Renamed and redocumented dma_direction_to_prot.
  * Don't worry about executability in new DMA attr.

  v1..v2

- Added a new DMA attribute to make executable privileged mappings
  work, and use that in the pl330 driver (suggested by Will).


Jeremy Gebben (1):
  iommu/io-pgtable-arm: add support for the IOMMU_PRIV flag

Mitchel Humpherys (5):
  iommu: add IOMMU_PRIV attribute
  common: DMA-mapping: add DMA_ATTR_PRIVILEGED attribute
  arm64/dma-mapping: Implement DMA_ATTR_PRIVILEGED
  dmaengine: pl330: Make sure microcode is privileged
  Revert "iommu/arm-smmu: Treat all device transactions as unprivileged"

 Documentation/DMA-attributes.txt | 10 ++
 arch/arm64/mm/dma-mapping.c  |  6 +++---
 drivers/dma/pl330.c  |  6 --
 drivers/iommu/arm-smmu.c |  5 +
 drivers/iommu/dma-iommu.c| 10 --
 drivers/iommu/io-pgtable-arm.c   |  5 -
 include/linux/dma-iommu.h|  3 ++-
 include/linux/dma-mapping.h  |  6 ++
 include/linux/iommu.h|  1 +
 9 files changed, 39 insertions(+), 13 deletions(-)

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 1/6] iommu: add IOMMU_PRIV attribute

2016-07-27 Thread Mitchel Humpherys
Add the IOMMU_PRIV attribute, which is used to indicate privileged
mappings.

Reviewed-by: Robin Murphy <robin.mur...@arm.com>
Tested-by: Robin Murphy <robin.mur...@arm.com>
Signed-off-by: Mitchel Humpherys <mitch...@codeaurora.org>
---

Notes:
v2..v3

  - Added comment

 include/linux/iommu.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index a35fb8b42e1a..35804abbd6cf 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -31,6 +31,7 @@
 #define IOMMU_CACHE(1 << 2) /* DMA cache coherency */
 #define IOMMU_NOEXEC   (1 << 3)
 #define IOMMU_MMIO (1 << 4) /* e.g. things like MSI doorbells */
+#define IOMMU_PRIV (1 << 5) /* privileged */
 
 struct iommu_ops;
 struct iommu_group;
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 4/6] arm64/dma-mapping: Implement DMA_ATTR_PRIVILEGED

2016-07-27 Thread Mitchel Humpherys
On Tue, Jul 26 2016 at 04:05:17 PM, Robin Murphy  wrote:
> + if (attrs & DMA_ATTR_PRIVILEGED)
> + prot |= IOMMU_PRIV;
> +
>
> then drop the rest of the changes to the switch statement below. It's
> taken me an embarrassingly long time to work out why things were blowing
> up in __iommu_sync_single_for_device() all with a VA of phys_to_virt(0) ;)

Ah yes, that is much nicer!  Nice catch...

> With that change, for the whole series:
>
> Reviewed-by: Robin Murphy 
> Tested-by: Robin Murphy 
>
> I guess at this point it may be worth waiting to repost based on -rc1.
> Be sure to CC patch 5 to Vinod as the current dmaengine maintainer, as
> it's his ack we'll need on that.

Thanks for the review and test.  I'll go ahead and send a v5 series
since I'm actually leaving QuIC this Friday and so I won't be around to
send it next week...


-Mitch

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 4/6] arm64/dma-mapping: Implement DMA_ATTR_PRIVILEGED

2016-07-25 Thread Mitchel Humpherys
The newly added DMA_ATTR_PRIVILEGED is useful for creating mappings that
are only accessible to privileged DMA engines.  Implement it in
dma-iommu.c so that the ARM64 DMA IOMMU mapper can make use of it.

Signed-off-by: Mitchel Humpherys <mitch...@codeaurora.org>
---

Notes:
v3..v4

  - Reworked against the new dma attrs format

v2..v3

  - Renamed and redocumented dma_direction_to_prot.
  - Dropped the stuff making all privileged mappings read-only.

 arch/arm64/mm/dma-mapping.c |  6 +++---
 drivers/iommu/dma-iommu.c   | 16 +++-
 include/linux/dma-iommu.h   |  3 ++-
 3 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index c4284c432ae8..1c6f85c56115 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -556,7 +556,7 @@ static void *__iommu_alloc_attrs(struct device *dev, size_t 
size,
 unsigned long attrs)
 {
bool coherent = is_device_dma_coherent(dev);
-   int ioprot = dma_direction_to_prot(DMA_BIDIRECTIONAL, coherent);
+   int ioprot = dma_info_to_prot(DMA_BIDIRECTIONAL, coherent, attrs);
size_t iosize = size;
void *addr;
 
@@ -710,7 +710,7 @@ static dma_addr_t __iommu_map_page(struct device *dev, 
struct page *page,
   unsigned long attrs)
 {
bool coherent = is_device_dma_coherent(dev);
-   int prot = dma_direction_to_prot(dir, coherent);
+   int prot = dma_info_to_prot(dir, coherent, attrs);
dma_addr_t dev_addr = iommu_dma_map_page(dev, page, offset, size, prot);
 
if (!iommu_dma_mapping_error(dev, dev_addr) &&
@@ -768,7 +768,7 @@ static int __iommu_map_sg_attrs(struct device *dev, struct 
scatterlist *sgl,
__iommu_sync_sg_for_device(dev, sgl, nelems, dir);
 
return iommu_dma_map_sg(dev, sgl, nelems,
-   dma_direction_to_prot(dir, coherent));
+   dma_info_to_prot(dir, coherent, attrs));
 }
 
 static void __iommu_unmap_sg_attrs(struct device *dev,
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 08a1e2f3690f..5e1e495b35f8 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -129,26 +129,32 @@ int iommu_dma_init_domain(struct iommu_domain *domain, 
dma_addr_t base, u64 size
 EXPORT_SYMBOL(iommu_dma_init_domain);
 
 /**
- * dma_direction_to_prot - Translate DMA API directions to IOMMU API page flags
+ * dma_info_to_prot - Translate DMA API directions and attributes to IOMMU API
+ *page flags.
  * @dir: Direction of DMA transfer
  * @coherent: Is the DMA master cache-coherent?
+ * @attrs: DMA attributes for the mapping
  *
  * Return: corresponding IOMMU API page protection flags
  */
-int dma_direction_to_prot(enum dma_data_direction dir, bool coherent)
+int dma_info_to_prot(enum dma_data_direction dir, bool coherent,
+unsigned long attrs)
 {
int prot = coherent ? IOMMU_CACHE : 0;
 
switch (dir) {
case DMA_BIDIRECTIONAL:
-   return prot | IOMMU_READ | IOMMU_WRITE;
+   prot |= IOMMU_READ | IOMMU_WRITE;
case DMA_TO_DEVICE:
-   return prot | IOMMU_READ;
+   prot |= IOMMU_READ;
case DMA_FROM_DEVICE:
-   return prot | IOMMU_WRITE;
+   prot |= IOMMU_WRITE;
default:
return 0;
}
+   if (attrs & DMA_ATTR_PRIVILEGED)
+   prot |= IOMMU_PRIV;
+   return prot;
 }
 
 static struct iova *__alloc_iova(struct iova_domain *iovad, size_t size,
diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
index 81c5c8d167ad..b367613d49ba 100644
--- a/include/linux/dma-iommu.h
+++ b/include/linux/dma-iommu.h
@@ -32,7 +32,8 @@ void iommu_put_dma_cookie(struct iommu_domain *domain);
 int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base, u64 
size);
 
 /* General helpers for DMA-API <-> IOMMU-API interaction */
-int dma_direction_to_prot(enum dma_data_direction dir, bool coherent);
+int dma_info_to_prot(enum dma_data_direction dir, bool coherent,
+unsigned long attrs);
 
 /*
  * These implement the bulk of the relevant DMA mapping callbacks, but require
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 0/6] Add support for privileged mappings

2016-07-25 Thread Mitchel Humpherys
The following patch to the ARM SMMU driver:

commit d346180e70b91b3d5a1ae7e5603e65593d4622bc
Author: Robin Murphy <robin.mur...@arm.com>
Date:   Tue Jan 26 18:06:34 2016 +

iommu/arm-smmu: Treat all device transactions as unprivileged

started forcing all SMMU transactions to come through as "unprivileged".
The rationale given was that:

  (1) There is no way in the IOMMU API to even request privileged mappings.

  (2) It's difficult to implement a DMA mapper that correctly models the
  ARM VMSAv8 behavior of unprivileged-writeable =>
  privileged-execute-never.

This series rectifies (1) by introducing an IOMMU API for privileged
mappings and implements it in io-pgtable-arm.

This series rectifies (2) by introducing a new dma attribute
(DMA_ATTR_PRIVILEGED) for users of the DMA API that need privileged
mappings which are inaccessible to lesser-privileged execution levels, and
implements it in the arm64 IOMMU DMA mapper.  The one known user (pl330.c)
is converted over to the new attribute.

Jordan and Jeremy can provide more info on the use case if needed, but the
high level is that it's a security feature to prevent attacks such as [1].

Joerg, the v3 series was previously acked by Will [2].  He also recommended
that we take all of this through your tree since it's touching multiple
subsystems [3].  Can you please take a look?  Thanks!

[1] https://github.com/robclark/kilroy
[2] http://article.gmane.org/gmane.linux.kernel.iommu/14617
[3] http://article.gmane.org/gmane.linux.kernel/2272531

Changelog:

  v3..v4

- Rebased and reworked on linux next due to the dma attrs rework going
  on over there.  Patches changed: 3/6, 4/6, and 5/6.

  v2..v3

- Incorporated feedback from Robin:
  * Various comments and re-wordings.
  * Use existing bit definitions for IOMMU_PRIV implementation
in io-pgtable-arm.
  * Renamed and redocumented dma_direction_to_prot.
  * Don't worry about executability in new DMA attr.

  v1..v2

- Added a new DMA attribute to make executable privileged mappings
  work, and use that in the pl330 driver (suggested by Will).


Jeremy Gebben (1):
  iommu/io-pgtable-arm: add support for the IOMMU_PRIV flag

Mitchel Humpherys (5):
  iommu: add IOMMU_PRIV attribute
  common: DMA-mapping: add DMA_ATTR_PRIVILEGED attribute
  arm64/dma-mapping: Implement DMA_ATTR_PRIVILEGED
  dmaengine: pl330: Make sure microcode is privileged
  Revert "iommu/arm-smmu: Treat all device transactions as unprivileged"

 Documentation/DMA-attributes.txt | 10 ++
 arch/arm64/mm/dma-mapping.c  |  6 +++---
 drivers/dma/pl330.c  |  6 --
 drivers/iommu/arm-smmu.c |  5 +
 drivers/iommu/dma-iommu.c| 16 +++-
 drivers/iommu/io-pgtable-arm.c   |  5 -
 include/linux/dma-iommu.h|  3 ++-
 include/linux/dma-mapping.h  |  6 ++
 include/linux/iommu.h|  1 +
 9 files changed, 42 insertions(+), 16 deletions(-)

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 5/6] dmaengine: pl330: Make sure microcode is privileged

2016-07-25 Thread Mitchel Humpherys
The PL330 performs privileged instruction fetches.  This can result in
SMMU permission faults on SMMUs that implement the ARMv8 VMSA, which
specifies that mappings that are writeable at one execution level shall
not be executable at any higher-privileged level.  Fix this by using the
DMA_ATTR_PRIVILEGED attribute, which will ensure that the microcode
IOMMU mapping is only accessible to the privileged level.

Cc: Dan Williams <dan.j.willi...@intel.com>
Cc: Jassi Brar <jassi.b...@samsung.com>
Signed-off-by: Mitchel Humpherys <mitch...@codeaurora.org>
---

Notes:
v3..v4

  - Reworked against the new dma attrs format.

 drivers/dma/pl330.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/dma/pl330.c b/drivers/dma/pl330.c
index 4fc3ffbd5ca0..8cd624fc3760 100644
--- a/drivers/dma/pl330.c
+++ b/drivers/dma/pl330.c
@@ -1854,14 +1854,16 @@ static int dmac_alloc_resources(struct pl330_dmac 
*pl330)
 {
int chans = pl330->pcfg.num_chan;
int ret;
+   unsigned long dma_attrs = DMA_ATTR_PRIVILEGED;
 
/*
 * Alloc MicroCode buffer for 'chans' Channel threads.
 * A channel's buffer offset is (Channel_Id * MCODE_BUFF_PERCHAN)
 */
-   pl330->mcode_cpu = dma_alloc_coherent(pl330->ddma.dev,
+   pl330->mcode_cpu = dma_alloc_attrs(pl330->ddma.dev,
chans * pl330->mcbufsz,
-   >mcode_bus, GFP_KERNEL);
+   >mcode_bus, GFP_KERNEL,
+   dma_attrs);
if (!pl330->mcode_cpu) {
dev_err(pl330->ddma.dev, "%s:%d Can't allocate memory!\n",
__func__, __LINE__);
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 6/6] Revert "iommu/arm-smmu: Treat all device transactions as unprivileged"

2016-07-25 Thread Mitchel Humpherys
This reverts commit d346180e70b9 ("iommu/arm-smmu: Treat all device
transactions as unprivileged") since some platforms actually make use of
privileged transactions.

Signed-off-by: Mitchel Humpherys <mitch...@codeaurora.org>
---

Notes:
v2..v3

  - Moved to the end of the series.

 drivers/iommu/arm-smmu.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 4f49fe29f202..46059b06f48d 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -178,9 +178,6 @@
 #define S2CR_TYPE_BYPASS   (1 << S2CR_TYPE_SHIFT)
 #define S2CR_TYPE_FAULT(2 << S2CR_TYPE_SHIFT)
 
-#define S2CR_PRIVCFG_SHIFT 24
-#define S2CR_PRIVCFG_UNPRIV(2 << S2CR_PRIVCFG_SHIFT)
-
 /* Context bank attribute registers */
 #define ARM_SMMU_GR1_CBAR(n)   (0x0 + ((n) << 2))
 #define CBAR_VMID_SHIFT0
@@ -1175,7 +1172,7 @@ static int arm_smmu_domain_add_master(struct 
arm_smmu_domain *smmu_domain,
u32 idx, s2cr;
 
idx = cfg->smrs ? cfg->smrs[i].idx : cfg->streamids[i];
-   s2cr = S2CR_TYPE_TRANS | S2CR_PRIVCFG_UNPRIV |
+   s2cr = S2CR_TYPE_TRANS |
   (smmu_domain->cfg.cbndx << S2CR_CBNDX_SHIFT);
writel_relaxed(s2cr, gr0_base + ARM_SMMU_GR0_S2CR(idx));
}
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 3/6] common: DMA-mapping: add DMA_ATTR_PRIVILEGED attribute

2016-07-25 Thread Mitchel Humpherys
This patch adds the DMA_ATTR_PRIVILEGED attribute to the DMA-mapping
subsystem.

Some advanced peripherals such as remote processors and GPUs perform
accesses to DMA buffers in both privileged "supervisor" and unprivileged
"user" modes.  This attribute is used to indicate to the DMA-mapping
subsystem that the buffer is fully accessible at the elevated privilege
level (and ideally inaccessible or at least read-only at the
lesser-privileged levels).

Cc: linux-...@vger.kernel.org
Signed-off-by: Mitchel Humpherys <mitch...@codeaurora.org>
---

Notes:
v3..v4

  - Reworked against the new dma attrs format

v2..v3

  - Not worrying about executability.

 Documentation/DMA-attributes.txt | 10 ++
 include/linux/dma-mapping.h  |  6 ++
 2 files changed, 16 insertions(+)

diff --git a/Documentation/DMA-attributes.txt b/Documentation/DMA-attributes.txt
index 2d455a5cf671..7728bda278c9 100644
--- a/Documentation/DMA-attributes.txt
+++ b/Documentation/DMA-attributes.txt
@@ -126,3 +126,13 @@ means that we won't try quite as hard to get them.
 
 NOTE: At the moment DMA_ATTR_ALLOC_SINGLE_PAGES is only implemented on ARM,
 though ARM64 patches will likely be posted soon.
+
+DMA_ATTR_PRIVILEGED
+--
+
+Some advanced peripherals such as remote processors and GPUs perform
+accesses to DMA buffers in both privileged "supervisor" and unprivileged
+"user" modes.  This attribute is used to indicate to the DMA-mapping
+subsystem that the buffer is fully accessible at the elevated privilege
+level (and ideally inaccessible or at least read-only at the
+lesser-privileged levels).
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 66533e18276c..73f477609262 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -56,6 +56,12 @@
  * that gives better TLB efficiency.
  */
 #define DMA_ATTR_ALLOC_SINGLE_PAGES(1UL << 7)
+/*
+ * DMA_ATTR_PRIVILEGED: used to indicate that the buffer is fully
+ * accessible at an elevated privilege level (and ideally inaccessible or
+ * at least read-only at lesser-privileged levels).
+ */
+#define DMA_ATTR_PRIVILEGED(1UL << 8)
 
 /*
  * A dma_addr_t can hold any valid DMA or bus address for the platform.
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 2/6] iommu/io-pgtable-arm: add support for the IOMMU_PRIV flag

2016-07-25 Thread Mitchel Humpherys
From: Jeremy Gebben 

Allow the creation of privileged mode mappings, for stage 1 only.

Signed-off-by: Jeremy Gebben 
---

Notes:
v2..v3

  - Use existing bit definitions.

 drivers/iommu/io-pgtable-arm.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index f5c90e1366ce..69ba83a135f1 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -350,11 +350,14 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct 
arm_lpae_io_pgtable *data,
 
if (data->iop.fmt == ARM_64_LPAE_S1 ||
data->iop.fmt == ARM_32_LPAE_S1) {
-   pte = ARM_LPAE_PTE_AP_UNPRIV | ARM_LPAE_PTE_nG;
+   pte = ARM_LPAE_PTE_nG;
 
if (!(prot & IOMMU_WRITE) && (prot & IOMMU_READ))
pte |= ARM_LPAE_PTE_AP_RDONLY;
 
+   if (!(prot & IOMMU_PRIV))
+   pte |= ARM_LPAE_PTE_AP_UNPRIV;
+
if (prot & IOMMU_MMIO)
pte |= (ARM_LPAE_MAIR_ATTR_IDX_DEV
<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 1/6] iommu: add IOMMU_PRIV attribute

2016-07-25 Thread Mitchel Humpherys
Add the IOMMU_PRIV attribute, which is used to indicate privileged
mappings.

Signed-off-by: Mitchel Humpherys <mitch...@codeaurora.org>
---

Notes:
v2..v3

  - Added comment

 include/linux/iommu.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index a35fb8b42e1a..35804abbd6cf 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -31,6 +31,7 @@
 #define IOMMU_CACHE(1 << 2) /* DMA cache coherency */
 #define IOMMU_NOEXEC   (1 << 3)
 #define IOMMU_MMIO (1 << 4) /* e.g. things like MSI doorbells */
+#define IOMMU_PRIV (1 << 5) /* privileged */
 
 struct iommu_ops;
 struct iommu_group;
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 0/6] Add support for privileged mappings

2016-07-25 Thread Mitchel Humpherys
On Mon, Jul 25 2016 at 10:50:13 AM, Will Deacon <will.dea...@arm.com> wrote:
> On Fri, Jul 22, 2016 at 01:39:45PM -0700, Mitchel Humpherys wrote:
>> On Fri, Jul 22 2016 at 05:51:07 PM, Will Deacon <will.dea...@arm.com> wrote:
>> > On Tue, Jul 19, 2016 at 01:36:49PM -0700, Mitchel Humpherys wrote:
>> >> The following patch to the ARM SMMU driver:
>> >> 
>> >> commit d346180e70b91b3d5a1ae7e5603e65593d4622bc
>> >> Author: Robin Murphy <robin.mur...@arm.com>
>> >> Date:   Tue Jan 26 18:06:34 2016 +
>> >> 
>> >> iommu/arm-smmu: Treat all device transactions as unprivileged
>> >> 
>> >> started forcing all SMMU transactions to come through as "unprivileged".
>> >> The rationale given was that:
>> >> 
>> >>   (1) There is no way in the IOMMU API to even request privileged 
>> >> mappings.
>> >> 
>> >>   (2) It's difficult to implement a DMA mapper that correctly models the
>> >>   ARM VMSAv8 behavior of unprivileged-writeable =>
>> >>   privileged-execute-never.
>> >> 
>> >> This series rectifies (1) by introducing an IOMMU API for privileged
>> >> mappings and implements it in io-pgtable-arm.
>> >> 
>> >> This series rectifies (2) by introducing a new dma attribute
>> >> (DMA_ATTR_PRIVILEGED) for users of the DMA API that need privileged
>> >> mappings which are inaccessible to lesser-privileged execution levels, and
>> >> implements it in the arm64 IOMMU DMA mapper.  The one known user (pl330.c)
>> >> is converted over to the new attribute.
>> >> 
>> >> Jordan and Jeremy can provide more info on the use case if needed, but the
>> >> high level is that it's a security feature to prevent attacks such as [1].
>> >
>> > This all looks good to me:
>> >
>> > Acked-by: Will Deacon <will.dea...@arm.com>
>> >
>> > It looks pretty fiddly to merge, however. How are you planning to get
>> > this upstream?
>> 
>> Fiddly in what way?  Do you mean in relation to "dma-mapping: Use
>> unsigned long for dma_attrs" [1]?  I admit I wasn't aware of that
>> activity until Robin mentioned it.  It looks like it's merged on
>> next/master, shall I rebase/rework on that and resend?
>
> Fiddly in that it touches multiple subsystems. I guess routing it via
> the iommu tree (Joerg) might be the best bet.

Sounds good.  I'm going to rebase on linux-next as well anyways to get
the new dma attrs format and resend.


-Mitch

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 0/6] Add support for privileged mappings

2016-07-22 Thread Mitchel Humpherys
On Fri, Jul 22 2016 at 05:51:07 PM, Will Deacon <will.dea...@arm.com> wrote:
> On Tue, Jul 19, 2016 at 01:36:49PM -0700, Mitchel Humpherys wrote:
>> The following patch to the ARM SMMU driver:
>> 
>> commit d346180e70b91b3d5a1ae7e5603e65593d4622bc
>> Author: Robin Murphy <robin.mur...@arm.com>
>> Date:   Tue Jan 26 18:06:34 2016 +
>> 
>> iommu/arm-smmu: Treat all device transactions as unprivileged
>> 
>> started forcing all SMMU transactions to come through as "unprivileged".
>> The rationale given was that:
>> 
>>   (1) There is no way in the IOMMU API to even request privileged mappings.
>> 
>>   (2) It's difficult to implement a DMA mapper that correctly models the
>>   ARM VMSAv8 behavior of unprivileged-writeable =>
>>   privileged-execute-never.
>> 
>> This series rectifies (1) by introducing an IOMMU API for privileged
>> mappings and implements it in io-pgtable-arm.
>> 
>> This series rectifies (2) by introducing a new dma attribute
>> (DMA_ATTR_PRIVILEGED) for users of the DMA API that need privileged
>> mappings which are inaccessible to lesser-privileged execution levels, and
>> implements it in the arm64 IOMMU DMA mapper.  The one known user (pl330.c)
>> is converted over to the new attribute.
>> 
>> Jordan and Jeremy can provide more info on the use case if needed, but the
>> high level is that it's a security feature to prevent attacks such as [1].
>
> This all looks good to me:
>
> Acked-by: Will Deacon <will.dea...@arm.com>
>
> It looks pretty fiddly to merge, however. How are you planning to get
> this upstream?

Fiddly in what way?  Do you mean in relation to "dma-mapping: Use
unsigned long for dma_attrs" [1]?  I admit I wasn't aware of that
activity until Robin mentioned it.  It looks like it's merged on
next/master, shall I rebase/rework on that and resend?

[1] https://lkml.org/lkml/2016/7/13/198


-Mitch

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 5/6] dmaengine: pl330: Make sure microcode is privileged

2016-07-19 Thread Mitchel Humpherys
The PL330 performs privileged instruction fetches.  This can result in
SMMU permission faults on SMMUs that implement the ARMv8 VMSA, which
specifies that mappings that are writeable at one execution level shall
not be executable at any higher-privileged level.  Fix this by using the
DMA_ATTR_PRIVILEGED attribute, which will ensure that the microcode
IOMMU mapping is only accessible to the privileged level.

Cc: Dan Williams <dan.j.willi...@intel.com>
Cc: Jassi Brar <jassi.b...@samsung.com>
Signed-off-by: Mitchel Humpherys <mitch...@codeaurora.org>
---
 drivers/dma/pl330.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/dma/pl330.c b/drivers/dma/pl330.c
index 372b4359da97..7297cd1d03c8 100644
--- a/drivers/dma/pl330.c
+++ b/drivers/dma/pl330.c
@@ -1854,14 +1854,17 @@ static int dmac_alloc_resources(struct pl330_dmac 
*pl330)
 {
int chans = pl330->pcfg.num_chan;
int ret;
+   DEFINE_DMA_ATTRS(attrs);
 
+   dma_set_attr(DMA_ATTR_PRIVILEGED, );
/*
 * Alloc MicroCode buffer for 'chans' Channel threads.
 * A channel's buffer offset is (Channel_Id * MCODE_BUFF_PERCHAN)
 */
-   pl330->mcode_cpu = dma_alloc_coherent(pl330->ddma.dev,
+   pl330->mcode_cpu = dma_alloc_attrs(pl330->ddma.dev,
chans * pl330->mcbufsz,
-   >mcode_bus, GFP_KERNEL);
+   >mcode_bus, GFP_KERNEL,
+   );
if (!pl330->mcode_cpu) {
dev_err(pl330->ddma.dev, "%s:%d Can't allocate memory!\n",
__func__, __LINE__);
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 0/6] Add support for privileged mappings

2016-07-19 Thread Mitchel Humpherys
The following patch to the ARM SMMU driver:

commit d346180e70b91b3d5a1ae7e5603e65593d4622bc
Author: Robin Murphy <robin.mur...@arm.com>
Date:   Tue Jan 26 18:06:34 2016 +

iommu/arm-smmu: Treat all device transactions as unprivileged

started forcing all SMMU transactions to come through as "unprivileged".
The rationale given was that:

  (1) There is no way in the IOMMU API to even request privileged mappings.

  (2) It's difficult to implement a DMA mapper that correctly models the
  ARM VMSAv8 behavior of unprivileged-writeable =>
  privileged-execute-never.

This series rectifies (1) by introducing an IOMMU API for privileged
mappings and implements it in io-pgtable-arm.

This series rectifies (2) by introducing a new dma attribute
(DMA_ATTR_PRIVILEGED) for users of the DMA API that need privileged
mappings which are inaccessible to lesser-privileged execution levels, and
implements it in the arm64 IOMMU DMA mapper.  The one known user (pl330.c)
is converted over to the new attribute.

Jordan and Jeremy can provide more info on the use case if needed, but the
high level is that it's a security feature to prevent attacks such as [1].

[1] https://github.com/robclark/kilroy

Changelog:

  v2..v3

- Incorporated feedback from Robin:
  * Various comments and re-wordings.
  * Use existing bit definitions for IOMMU_PRIV implementation
in io-pgtable-arm.
  * Renamed and redocumented dma_direction_to_prot.
  * Don't worry about executability in new DMA attr.

  v1..v2

- Added a new DMA attribute to make executable privileged mappings
  work, and use that in the pl330 driver (suggested by Will).


Jeremy Gebben (1):
  iommu/io-pgtable-arm: add support for the IOMMU_PRIV flag

Mitchel Humpherys (5):
  iommu: add IOMMU_PRIV attribute
  common: DMA-mapping: add DMA_ATTR_PRIVILEGED attribute
  arm64/dma-mapping: Implement DMA_ATTR_PRIVILEGED
  dmaengine: pl330: Make sure microcode is privileged
  Revert "iommu/arm-smmu: Treat all device transactions as unprivileged"

 Documentation/DMA-attributes.txt | 10 ++
 arch/arm64/mm/dma-mapping.c  |  6 +++---
 drivers/dma/pl330.c  |  7 +--
 drivers/iommu/arm-smmu.c |  5 +
 drivers/iommu/dma-iommu.c| 16 +++-
 drivers/iommu/io-pgtable-arm.c   |  5 -
 include/linux/dma-attrs.h|  1 +
 include/linux/dma-iommu.h|  3 ++-
 include/linux/iommu.h|  1 +
 9 files changed, 38 insertions(+), 16 deletions(-)

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 3/6] common: DMA-mapping: add DMA_ATTR_PRIVILEGED attribute

2016-07-19 Thread Mitchel Humpherys
This patch adds the DMA_ATTR_PRIVILEGED attribute to the DMA-mapping
subsystem.

Some advanced peripherals such as remote processors and GPUs perform
accesses to DMA buffers in both privileged "supervisor" and unprivileged
"user" modes.  This attribute is used to indicate to the DMA-mapping
subsystem that the buffer is fully accessible at the elevated privilege
level (and ideally inaccessible or at least read-only at the
lesser-privileged levels).

Cc: linux-...@vger.kernel.org
Signed-off-by: Mitchel Humpherys <mitch...@codeaurora.org>
---

Notes:
v2..v3

  - Not worrying about executability.

 Documentation/DMA-attributes.txt | 10 ++
 include/linux/dma-attrs.h|  1 +
 2 files changed, 11 insertions(+)

diff --git a/Documentation/DMA-attributes.txt b/Documentation/DMA-attributes.txt
index e8cf9cf873b3..d985effd0053 100644
--- a/Documentation/DMA-attributes.txt
+++ b/Documentation/DMA-attributes.txt
@@ -126,3 +126,13 @@ means that we won't try quite as hard to get them.
 
 NOTE: At the moment DMA_ATTR_ALLOC_SINGLE_PAGES is only implemented on ARM,
 though ARM64 patches will likely be posted soon.
+
+DMA_ATTR_PRIVILEGED
+--
+
+Some advanced peripherals such as remote processors and GPUs perform
+accesses to DMA buffers in both privileged "supervisor" and unprivileged
+"user" modes.  This attribute is used to indicate to the DMA-mapping
+subsystem that the buffer is fully accessible at the elevated privilege
+level (and ideally inaccessible or at least read-only at the
+lesser-privileged levels).
diff --git a/include/linux/dma-attrs.h b/include/linux/dma-attrs.h
index 5246239a4953..3bc2208e1765 100644
--- a/include/linux/dma-attrs.h
+++ b/include/linux/dma-attrs.h
@@ -19,6 +19,7 @@ enum dma_attr {
DMA_ATTR_SKIP_CPU_SYNC,
DMA_ATTR_FORCE_CONTIGUOUS,
DMA_ATTR_ALLOC_SINGLE_PAGES,
+   DMA_ATTR_PRIVILEGED,
DMA_ATTR_MAX,
 };
 
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 2/6] iommu/io-pgtable-arm: add support for the IOMMU_PRIV flag

2016-07-19 Thread Mitchel Humpherys
From: Jeremy Gebben 

Allow the creation of privileged mode mappings, for stage 1 only.

Signed-off-by: Jeremy Gebben 
---

Notes:
v2..v3

  - Use existing bit definitions.

 drivers/iommu/io-pgtable-arm.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index a1ed1b73fed4..2a7d28ab8241 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -350,11 +350,14 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct 
arm_lpae_io_pgtable *data,
 
if (data->iop.fmt == ARM_64_LPAE_S1 ||
data->iop.fmt == ARM_32_LPAE_S1) {
-   pte = ARM_LPAE_PTE_AP_UNPRIV | ARM_LPAE_PTE_nG;
+   pte = ARM_LPAE_PTE_nG;
 
if (!(prot & IOMMU_WRITE) && (prot & IOMMU_READ))
pte |= ARM_LPAE_PTE_AP_RDONLY;
 
+   if (!(prot & IOMMU_PRIV))
+   pte |= ARM_LPAE_PTE_AP_UNPRIV;
+
if (prot & IOMMU_MMIO)
pte |= (ARM_LPAE_MAIR_ATTR_IDX_DEV
<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 1/6] iommu: add IOMMU_PRIV attribute

2016-07-19 Thread Mitchel Humpherys
Add the IOMMU_PRIV attribute, which is used to indicate privileged
mappings.

Signed-off-by: Mitchel Humpherys <mitch...@codeaurora.org>
---

Notes:
v2..v3

  - Added comment

 include/linux/iommu.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 664683aedcce..4ae46254b915 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -31,6 +31,7 @@
 #define IOMMU_CACHE(1 << 2) /* DMA cache coherency */
 #define IOMMU_NOEXEC   (1 << 3)
 #define IOMMU_MMIO (1 << 4) /* e.g. things like MSI doorbells */
+#define IOMMU_PRIV (1 << 5) /* privileged */
 
 struct iommu_ops;
 struct iommu_group;
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 4/6] arm64/dma-mapping: Implement DMA_ATTR_PRIVILEGED

2016-07-19 Thread Mitchel Humpherys
The newly added DMA_ATTR_PRIVILEGED is useful for creating mappings that
are only accessible to privileged DMA engines.  Implement it in
dma-iommu.c so that the ARM64 DMA IOMMU mapper can make use of it.

Signed-off-by: Mitchel Humpherys <mitch...@codeaurora.org>
---

Notes:
v2..v3

  - Renamed and redocumented dma_direction_to_prot.
  - Dropped the stuff making all privileged mappings read-only.

 arch/arm64/mm/dma-mapping.c |  6 +++---
 drivers/iommu/dma-iommu.c   | 16 +++-
 include/linux/dma-iommu.h   |  3 ++-
 3 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index c566ec83719f..5da8946ead9e 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -543,7 +543,7 @@ static void *__iommu_alloc_attrs(struct device *dev, size_t 
size,
 struct dma_attrs *attrs)
 {
bool coherent = is_device_dma_coherent(dev);
-   int ioprot = dma_direction_to_prot(DMA_BIDIRECTIONAL, coherent);
+   int ioprot = dma_info_to_prot(DMA_BIDIRECTIONAL, coherent, attrs);
size_t iosize = size;
void *addr;
 
@@ -697,7 +697,7 @@ static dma_addr_t __iommu_map_page(struct device *dev, 
struct page *page,
   struct dma_attrs *attrs)
 {
bool coherent = is_device_dma_coherent(dev);
-   int prot = dma_direction_to_prot(dir, coherent);
+   int prot = dma_info_to_prot(dir, coherent, attrs);
dma_addr_t dev_addr = iommu_dma_map_page(dev, page, offset, size, prot);
 
if (!iommu_dma_mapping_error(dev, dev_addr) &&
@@ -755,7 +755,7 @@ static int __iommu_map_sg_attrs(struct device *dev, struct 
scatterlist *sgl,
__iommu_sync_sg_for_device(dev, sgl, nelems, dir);
 
return iommu_dma_map_sg(dev, sgl, nelems,
-   dma_direction_to_prot(dir, coherent));
+   dma_info_to_prot(dir, coherent, attrs));
 }
 
 static void __iommu_unmap_sg_attrs(struct device *dev,
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index ea5a9ebf0f78..cc2c9cccde1f 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -129,26 +129,32 @@ int iommu_dma_init_domain(struct iommu_domain *domain, 
dma_addr_t base, u64 size
 EXPORT_SYMBOL(iommu_dma_init_domain);
 
 /**
- * dma_direction_to_prot - Translate DMA API directions to IOMMU API page flags
+ * dma_info_to_prot - Translate DMA API directions and attributes to IOMMU API
+ *page flags.
  * @dir: Direction of DMA transfer
  * @coherent: Is the DMA master cache-coherent?
+ * @attrs: DMA attributes for the mapping
  *
  * Return: corresponding IOMMU API page protection flags
  */
-int dma_direction_to_prot(enum dma_data_direction dir, bool coherent)
+int dma_info_to_prot(enum dma_data_direction dir, bool coherent,
+struct dma_attrs *attrs)
 {
int prot = coherent ? IOMMU_CACHE : 0;
 
switch (dir) {
case DMA_BIDIRECTIONAL:
-   return prot | IOMMU_READ | IOMMU_WRITE;
+   prot |= IOMMU_READ | IOMMU_WRITE;
case DMA_TO_DEVICE:
-   return prot | IOMMU_READ;
+   prot |= IOMMU_READ;
case DMA_FROM_DEVICE:
-   return prot | IOMMU_WRITE;
+   prot |= IOMMU_WRITE;
default:
return 0;
}
+   if (dma_get_attr(DMA_ATTR_PRIVILEGED, attrs))
+   prot |= IOMMU_PRIV;
+   return prot;
 }
 
 static struct iova *__alloc_iova(struct iova_domain *iovad, size_t size,
diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
index 8443bbb5c071..b0dbcff2d73e 100644
--- a/include/linux/dma-iommu.h
+++ b/include/linux/dma-iommu.h
@@ -32,7 +32,8 @@ void iommu_put_dma_cookie(struct iommu_domain *domain);
 int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base, u64 
size);
 
 /* General helpers for DMA-API <-> IOMMU-API interaction */
-int dma_direction_to_prot(enum dma_data_direction dir, bool coherent);
+int dma_info_to_prot(enum dma_data_direction dir, bool coherent,
+struct dma_attrs *attrs);
 
 /*
  * These implement the bulk of the relevant DMA mapping callbacks, but require
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 6/6] Revert "iommu/arm-smmu: Treat all device transactions as unprivileged"

2016-07-19 Thread Mitchel Humpherys
This reverts commit d346180e70b9 ("iommu/arm-smmu: Treat all device
transactions as unprivileged") since some platforms actually make use of
privileged transactions.

Signed-off-by: Mitchel Humpherys <mitch...@codeaurora.org>
---

Notes:
v2..v3

  - Moved to the end of the series.

 drivers/iommu/arm-smmu.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 9345a3fcb706..d0627ef26b05 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -178,9 +178,6 @@
 #define S2CR_TYPE_BYPASS   (1 << S2CR_TYPE_SHIFT)
 #define S2CR_TYPE_FAULT(2 << S2CR_TYPE_SHIFT)
 
-#define S2CR_PRIVCFG_SHIFT 24
-#define S2CR_PRIVCFG_UNPRIV(2 << S2CR_PRIVCFG_SHIFT)
-
 /* Context bank attribute registers */
 #define ARM_SMMU_GR1_CBAR(n)   (0x0 + ((n) << 2))
 #define CBAR_VMID_SHIFT0
@@ -1175,7 +1172,7 @@ static int arm_smmu_domain_add_master(struct 
arm_smmu_domain *smmu_domain,
u32 idx, s2cr;
 
idx = cfg->smrs ? cfg->smrs[i].idx : cfg->streamids[i];
-   s2cr = S2CR_TYPE_TRANS | S2CR_PRIVCFG_UNPRIV |
+   s2cr = S2CR_TYPE_TRANS |
   (smmu_domain->cfg.cbndx << S2CR_CBNDX_SHIFT);
writel_relaxed(s2cr, gr0_base + ARM_SMMU_GR0_S2CR(idx));
}
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 6/6] dmaengine: pl330: Make sure microcode is privileged-executable

2016-07-08 Thread Mitchel Humpherys
The PL330 can perform privileged instruction fetches.  This can result
in SMMU permission faults on SMMUs that implement the ARMv8 VMSA, which
specifies that mappings that are writeable at one execution level shall
not be executable at any higher-privileged level.  Fix this by using the
DMA_ATTR_PRIVILEGED_EXECUTABLE attribute, which will ensure that the
microcode IOMMU mapping is not writeable.

Cc: Dan Williams <dan.j.willi...@intel.com>
Cc: Jassi Brar <jassi.b...@samsung.com>
Signed-off-by: Mitchel Humpherys <mitch...@codeaurora.org>
---
 drivers/dma/pl330.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/dma/pl330.c b/drivers/dma/pl330.c
index 372b4359da97..25bc49d47c45 100644
--- a/drivers/dma/pl330.c
+++ b/drivers/dma/pl330.c
@@ -1854,14 +1854,17 @@ static int dmac_alloc_resources(struct pl330_dmac 
*pl330)
 {
int chans = pl330->pcfg.num_chan;
int ret;
+   DEFINE_DMA_ATTRS(attrs);
 
+   dma_set_attr(DMA_ATTR_PRIVILEGED_EXECUTABLE, );
/*
 * Alloc MicroCode buffer for 'chans' Channel threads.
 * A channel's buffer offset is (Channel_Id * MCODE_BUFF_PERCHAN)
 */
-   pl330->mcode_cpu = dma_alloc_coherent(pl330->ddma.dev,
+   pl330->mcode_cpu = dma_alloc_attrs(pl330->ddma.dev,
chans * pl330->mcbufsz,
-   >mcode_bus, GFP_KERNEL);
+   >mcode_bus, GFP_KERNEL,
+   );
if (!pl330->mcode_cpu) {
dev_err(pl330->ddma.dev, "%s:%d Can't allocate memory!\n",
__func__, __LINE__);
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 4/6] common: DMA-mapping: add DMA_ATTR_PRIVILEGED_EXECUTABLE attribute

2016-07-08 Thread Mitchel Humpherys
This patch adds the DMA_ATTR_PRIVILEGED_EXECUTABLE attribute to the
DMA-mapping subsystem.

Some architectures require that writable mappings also be non-executable at
lesser-privileged levels of execution.  This attribute is used to indicate
to the DMA-mapping subsystem that it should do whatever is necessary to
ensure that the buffer is executable at an elevated privilege level (by
making it read-only at the lesser-privileged levels, for example).

Cc: linux-...@vger.kernel.org
Signed-off-by: Mitchel Humpherys <mitch...@codeaurora.org>
---
 Documentation/DMA-attributes.txt | 9 +
 include/linux/dma-attrs.h| 1 +
 2 files changed, 10 insertions(+)

diff --git a/Documentation/DMA-attributes.txt b/Documentation/DMA-attributes.txt
index e8cf9cf873b3..6a22d4307008 100644
--- a/Documentation/DMA-attributes.txt
+++ b/Documentation/DMA-attributes.txt
@@ -126,3 +126,12 @@ means that we won't try quite as hard to get them.
 
 NOTE: At the moment DMA_ATTR_ALLOC_SINGLE_PAGES is only implemented on ARM,
 though ARM64 patches will likely be posted soon.
+
+DMA_ATTR_PRIVILEGED_EXECUTABLE
+--
+
+Some architectures require that writable mappings also be non-executable at
+lesser-privileged levels of execution.  This attribute is used to indicate
+to the DMA-mapping subsystem that it should do whatever is necessary to
+ensure that the buffer is executable at an elevated privilege level (by
+making it read-only at the lesser-privileged levels, for example).
diff --git a/include/linux/dma-attrs.h b/include/linux/dma-attrs.h
index 5246239a4953..8cf4dff6185b 100644
--- a/include/linux/dma-attrs.h
+++ b/include/linux/dma-attrs.h
@@ -19,6 +19,7 @@ enum dma_attr {
DMA_ATTR_SKIP_CPU_SYNC,
DMA_ATTR_FORCE_CONTIGUOUS,
DMA_ATTR_ALLOC_SINGLE_PAGES,
+   DMA_ATTR_PRIVILEGED_EXECUTABLE,
DMA_ATTR_MAX,
 };
 
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 3/6] Revert "iommu/arm-smmu: Treat all device transactions as unprivileged"

2016-07-08 Thread Mitchel Humpherys
This reverts commit d346180e70b9 ("iommu/arm-smmu: Treat all device
transactions as unprivileged") since some platforms actually make use of
privileged transactions.

Signed-off-by: Mitchel Humpherys <mitch...@codeaurora.org>
---
 drivers/iommu/arm-smmu.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 9345a3fcb706..d0627ef26b05 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -178,9 +178,6 @@
 #define S2CR_TYPE_BYPASS   (1 << S2CR_TYPE_SHIFT)
 #define S2CR_TYPE_FAULT(2 << S2CR_TYPE_SHIFT)
 
-#define S2CR_PRIVCFG_SHIFT 24
-#define S2CR_PRIVCFG_UNPRIV(2 << S2CR_PRIVCFG_SHIFT)
-
 /* Context bank attribute registers */
 #define ARM_SMMU_GR1_CBAR(n)   (0x0 + ((n) << 2))
 #define CBAR_VMID_SHIFT0
@@ -1175,7 +1172,7 @@ static int arm_smmu_domain_add_master(struct 
arm_smmu_domain *smmu_domain,
u32 idx, s2cr;
 
idx = cfg->smrs ? cfg->smrs[i].idx : cfg->streamids[i];
-   s2cr = S2CR_TYPE_TRANS | S2CR_PRIVCFG_UNPRIV |
+   s2cr = S2CR_TYPE_TRANS |
   (smmu_domain->cfg.cbndx << S2CR_CBNDX_SHIFT);
writel_relaxed(s2cr, gr0_base + ARM_SMMU_GR0_S2CR(idx));
}
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 5/6] arm64/dma-mapping: Implement DMA_ATTR_PRIVILEGED_EXECUTABLE

2016-07-08 Thread Mitchel Humpherys
The newly added DMA_ATTR_PRIVILEGED_EXECUTABLE is useful for creating
mappings that are executable by privileged DMA engines.  Implement it in
dma-iommu.c so that the ARM64 DMA IOMMU mapper can make use of it.

Signed-off-by: Mitchel Humpherys <mitch...@codeaurora.org>
---
 arch/arm64/mm/dma-mapping.c |  6 +++---
 drivers/iommu/dma-iommu.c   | 15 +++
 include/linux/dma-iommu.h   |  3 ++-
 3 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index c566ec83719f..44f676268df6 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -543,7 +543,7 @@ static void *__iommu_alloc_attrs(struct device *dev, size_t 
size,
 struct dma_attrs *attrs)
 {
bool coherent = is_device_dma_coherent(dev);
-   int ioprot = dma_direction_to_prot(DMA_BIDIRECTIONAL, coherent);
+   int ioprot = dma_direction_to_prot(DMA_BIDIRECTIONAL, coherent, attrs);
size_t iosize = size;
void *addr;
 
@@ -697,7 +697,7 @@ static dma_addr_t __iommu_map_page(struct device *dev, 
struct page *page,
   struct dma_attrs *attrs)
 {
bool coherent = is_device_dma_coherent(dev);
-   int prot = dma_direction_to_prot(dir, coherent);
+   int prot = dma_direction_to_prot(dir, coherent, attrs);
dma_addr_t dev_addr = iommu_dma_map_page(dev, page, offset, size, prot);
 
if (!iommu_dma_mapping_error(dev, dev_addr) &&
@@ -755,7 +755,7 @@ static int __iommu_map_sg_attrs(struct device *dev, struct 
scatterlist *sgl,
__iommu_sync_sg_for_device(dev, sgl, nelems, dir);
 
return iommu_dma_map_sg(dev, sgl, nelems,
-   dma_direction_to_prot(dir, coherent));
+   dma_direction_to_prot(dir, coherent, attrs));
 }
 
 static void __iommu_unmap_sg_attrs(struct device *dev,
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index ea5a9ebf0f78..ccc6219da228 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -132,23 +132,30 @@ EXPORT_SYMBOL(iommu_dma_init_domain);
  * dma_direction_to_prot - Translate DMA API directions to IOMMU API page flags
  * @dir: Direction of DMA transfer
  * @coherent: Is the DMA master cache-coherent?
+ * @attrs: DMA attributes for the mapping
  *
  * Return: corresponding IOMMU API page protection flags
  */
-int dma_direction_to_prot(enum dma_data_direction dir, bool coherent)
+int dma_direction_to_prot(enum dma_data_direction dir, bool coherent,
+ struct dma_attrs *attrs)
 {
int prot = coherent ? IOMMU_CACHE : 0;
 
switch (dir) {
case DMA_BIDIRECTIONAL:
-   return prot | IOMMU_READ | IOMMU_WRITE;
+   prot |= IOMMU_READ | IOMMU_WRITE;
case DMA_TO_DEVICE:
-   return prot | IOMMU_READ;
+   prot |= IOMMU_READ;
case DMA_FROM_DEVICE:
-   return prot | IOMMU_WRITE;
+   prot |= IOMMU_WRITE;
default:
return 0;
}
+   if (dma_get_attr(DMA_ATTR_PRIVILEGED_EXECUTABLE, attrs)) {
+   prot &= ~IOMMU_WRITE;
+   prot |= IOMMU_PRIV;
+   }
+   return prot;
 }
 
 static struct iova *__alloc_iova(struct iova_domain *iovad, size_t size,
diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
index 8443bbb5c071..d5a37e58d29b 100644
--- a/include/linux/dma-iommu.h
+++ b/include/linux/dma-iommu.h
@@ -32,7 +32,8 @@ void iommu_put_dma_cookie(struct iommu_domain *domain);
 int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base, u64 
size);
 
 /* General helpers for DMA-API <-> IOMMU-API interaction */
-int dma_direction_to_prot(enum dma_data_direction dir, bool coherent);
+int dma_direction_to_prot(enum dma_data_direction dir, bool coherent,
+ struct dma_attrs *attrs);
 
 /*
  * These implement the bulk of the relevant DMA mapping callbacks, but require
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 1/6] iommu: add IOMMU_PRIV attribute

2016-07-08 Thread Mitchel Humpherys
Add the IOMMU_PRIV attribute, which is used to indicate privileged
mappings.

Signed-off-by: Mitchel Humpherys <mitch...@codeaurora.org>
---
 include/linux/iommu.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 664683aedcce..01c9f2667f2b 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -31,6 +31,7 @@
 #define IOMMU_CACHE(1 << 2) /* DMA cache coherency */
 #define IOMMU_NOEXEC   (1 << 3)
 #define IOMMU_MMIO (1 << 4) /* e.g. things like MSI doorbells */
+#define IOMMU_PRIV (1 << 5)
 
 struct iommu_ops;
 struct iommu_group;
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 0/6] Add support for privileged mappings

2016-07-08 Thread Mitchel Humpherys
The following patch to the ARM SMMU driver:

commit d346180e70b91b3d5a1ae7e5603e65593d4622bc
Author: Robin Murphy <robin.mur...@arm.com>
Date:   Tue Jan 26 18:06:34 2016 +

iommu/arm-smmu: Treat all device transactions as unprivileged

started forcing all SMMU transactions to come through as "unprivileged".
The rationale given was that:

  (1) There is no way in the IOMMU API to even request privileged mappings.

  (2) It's difficult to implement a DMA mapper that correctly models the
  ARM VMSAv8 behavior of unprivileged-writeable =>
  privileged-execute-never.

This series rectifies (1) by introducing an IOMMU API for privileged
mappings and implements it in io-pgtable-arm.

This series rectifies (2) by introducing a new dma attribute
(DMA_ATTR_PRIVILEGED_EXECUTABLE) for users of the DMA API that need
privileged, executable mappings, and implements it in the arm64 IOMMU DMA
mapper.  The one known user (pl330.c) is converted over to the new
attribute.

Jordan and Jeremy can provide more info on the use case if needed, but the
high level is that it's a security feature to prevent attacks such as [1].

[1] https://github.com/robclark/kilroy

Changelog:

  v1..v2

- Added a new DMA attribute to make executable privileged mappings
  work, and use that in the pl330 driver (suggested by Will).


Jeremy Gebben (1):
  iommu/io-pgtable-arm: add support for the IOMMU_PRIV flag

Mitchel Humpherys (5):
  iommu: add IOMMU_PRIV attribute
  Revert "iommu/arm-smmu: Treat all device transactions as unprivileged"
  common: DMA-mapping: add DMA_ATTR_PRIVILEGED_EXECUTABLE attribute
  arm64/dma-mapping: Implement DMA_ATTR_PRIVILEGED_EXECUTABLE
  dmaengine: pl330: Make sure microcode is privileged-executable

 Documentation/DMA-attributes.txt |  9 +
 arch/arm64/mm/dma-mapping.c  |  6 +++---
 drivers/dma/pl330.c  |  7 +--
 drivers/iommu/arm-smmu.c |  5 +
 drivers/iommu/dma-iommu.c| 15 +++
 drivers/iommu/io-pgtable-arm.c   | 16 +++-
 include/linux/dma-attrs.h|  1 +
 include/linux/dma-iommu.h|  3 ++-
 include/linux/iommu.h|  1 +
 9 files changed, 44 insertions(+), 19 deletions(-)

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 0/3] Add support for privileged mappings

2016-07-08 Thread Mitchel Humpherys
On Thu, Jul 07 2016 at 02:58:21 PM, Jordan Crouse  
wrote:
>> Whilst this series is a step in the right direction for fixing that, I
>> don't think you can claim that only low-level users need this, given that
>> we have in-tree code which would break without it. Perhaps you just need
>> to extend things slightly more to expose this to the DMA API as well (or,
>> alternatively, hack the PL330 driver some how).
>
> I agree that hacking the DMA api would be the best long term solution but 
> there
> be dragons there. Perhaps a workable compromise might be to white-list
> privileged aware devices via the device tree.

I'm sending a v2 with an attempt at plumbing this through the DMA layer.
Hopefully avoiding dragons while I'm at it :)


-Mitch

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 2/6] iommu/io-pgtable-arm: add support for the IOMMU_PRIV flag

2016-07-08 Thread Mitchel Humpherys
From: Jeremy Gebben 

Allow the creation of privileged mode mappings, for stage 1 only.

Signed-off-by: Jeremy Gebben 
---
 drivers/iommu/io-pgtable-arm.c | 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index a1ed1b73fed4..e9e7dd179708 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -101,8 +101,10 @@
 ARM_LPAE_PTE_ATTR_HI_MASK)
 
 /* Stage-1 PTE */
-#define ARM_LPAE_PTE_AP_UNPRIV (((arm_lpae_iopte)1) << 6)
-#define ARM_LPAE_PTE_AP_RDONLY (((arm_lpae_iopte)2) << 6)
+#define ARM_LPAE_PTE_AP_PRIV_RW(((arm_lpae_iopte)0) << 6)
+#define ARM_LPAE_PTE_AP_RW (((arm_lpae_iopte)1) << 6)
+#define ARM_LPAE_PTE_AP_PRIV_RO(((arm_lpae_iopte)2) << 6)
+#define ARM_LPAE_PTE_AP_RO (((arm_lpae_iopte)3) << 6)
 #define ARM_LPAE_PTE_ATTRINDX_SHIFT2
 #define ARM_LPAE_PTE_nG(((arm_lpae_iopte)1) << 11)
 
@@ -350,10 +352,14 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct 
arm_lpae_io_pgtable *data,
 
if (data->iop.fmt == ARM_64_LPAE_S1 ||
data->iop.fmt == ARM_32_LPAE_S1) {
-   pte = ARM_LPAE_PTE_AP_UNPRIV | ARM_LPAE_PTE_nG;
+   pte = ARM_LPAE_PTE_nG;
 
-   if (!(prot & IOMMU_WRITE) && (prot & IOMMU_READ))
-   pte |= ARM_LPAE_PTE_AP_RDONLY;
+   if (prot & IOMMU_WRITE)
+   pte |= (prot & IOMMU_PRIV) ? ARM_LPAE_PTE_AP_PRIV_RW
+   : ARM_LPAE_PTE_AP_RW;
+   else
+   pte |= (prot & IOMMU_PRIV) ? ARM_LPAE_PTE_AP_PRIV_RO
+   : ARM_LPAE_PTE_AP_RO;
 
if (prot & IOMMU_MMIO)
pte |= (ARM_LPAE_MAIR_ATTR_IDX_DEV
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 3/3] Revert "iommu/arm-smmu: Treat all device transactions as unprivileged"

2016-07-06 Thread Mitchel Humpherys
This reverts commit (d346180e70b91b3d: "iommu/arm-smmu: Treat all device
transactions as unprivileged") since some platforms actually make use of
privileged transactions.

Signed-off-by: Mitchel Humpherys <mitch...@codeaurora.org>
---
 drivers/iommu/arm-smmu.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 9345a3fcb706..d0627ef26b05 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -178,9 +178,6 @@
 #define S2CR_TYPE_BYPASS   (1 << S2CR_TYPE_SHIFT)
 #define S2CR_TYPE_FAULT(2 << S2CR_TYPE_SHIFT)
 
-#define S2CR_PRIVCFG_SHIFT 24
-#define S2CR_PRIVCFG_UNPRIV(2 << S2CR_PRIVCFG_SHIFT)
-
 /* Context bank attribute registers */
 #define ARM_SMMU_GR1_CBAR(n)   (0x0 + ((n) << 2))
 #define CBAR_VMID_SHIFT0
@@ -1175,7 +1172,7 @@ static int arm_smmu_domain_add_master(struct 
arm_smmu_domain *smmu_domain,
u32 idx, s2cr;
 
idx = cfg->smrs ? cfg->smrs[i].idx : cfg->streamids[i];
-   s2cr = S2CR_TYPE_TRANS | S2CR_PRIVCFG_UNPRIV |
+   s2cr = S2CR_TYPE_TRANS |
   (smmu_domain->cfg.cbndx << S2CR_CBNDX_SHIFT);
writel_relaxed(s2cr, gr0_base + ARM_SMMU_GR0_S2CR(idx));
}
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 2/3] iommu/io-pgtable-arm: add support for the IOMMU_PRIV flag

2016-07-06 Thread Mitchel Humpherys
From: Jeremy Gebben 

Allow the creation of privileged mode mappings, for stage 1 only.

Signed-off-by: Jeremy Gebben 
---
 drivers/iommu/io-pgtable-arm.c | 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index a1ed1b73fed4..e9e7dd179708 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -101,8 +101,10 @@
 ARM_LPAE_PTE_ATTR_HI_MASK)
 
 /* Stage-1 PTE */
-#define ARM_LPAE_PTE_AP_UNPRIV (((arm_lpae_iopte)1) << 6)
-#define ARM_LPAE_PTE_AP_RDONLY (((arm_lpae_iopte)2) << 6)
+#define ARM_LPAE_PTE_AP_PRIV_RW(((arm_lpae_iopte)0) << 6)
+#define ARM_LPAE_PTE_AP_RW (((arm_lpae_iopte)1) << 6)
+#define ARM_LPAE_PTE_AP_PRIV_RO(((arm_lpae_iopte)2) << 6)
+#define ARM_LPAE_PTE_AP_RO (((arm_lpae_iopte)3) << 6)
 #define ARM_LPAE_PTE_ATTRINDX_SHIFT2
 #define ARM_LPAE_PTE_nG(((arm_lpae_iopte)1) << 11)
 
@@ -350,10 +352,14 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct 
arm_lpae_io_pgtable *data,
 
if (data->iop.fmt == ARM_64_LPAE_S1 ||
data->iop.fmt == ARM_32_LPAE_S1) {
-   pte = ARM_LPAE_PTE_AP_UNPRIV | ARM_LPAE_PTE_nG;
+   pte = ARM_LPAE_PTE_nG;
 
-   if (!(prot & IOMMU_WRITE) && (prot & IOMMU_READ))
-   pte |= ARM_LPAE_PTE_AP_RDONLY;
+   if (prot & IOMMU_WRITE)
+   pte |= (prot & IOMMU_PRIV) ? ARM_LPAE_PTE_AP_PRIV_RW
+   : ARM_LPAE_PTE_AP_RW;
+   else
+   pte |= (prot & IOMMU_PRIV) ? ARM_LPAE_PTE_AP_PRIV_RO
+   : ARM_LPAE_PTE_AP_RO;
 
if (prot & IOMMU_MMIO)
pte |= (ARM_LPAE_MAIR_ATTR_IDX_DEV
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 0/3] Add support for privileged mappings

2016-07-06 Thread Mitchel Humpherys
The following patch to the ARM SMMU driver:

commit d346180e70b91b3d5a1ae7e5603e65593d4622bc
Author: Robin Murphy <robin.mur...@arm.com>
Date:   Tue Jan 26 18:06:34 2016 +

iommu/arm-smmu: Treat all device transactions as unprivileged

started forcing all SMMU transactions to come through as "unprivileged".
The rationale given was that:

  (1) There is no way in the IOMMU API to even request privileged mappings.

  (2) It's difficult to implement a DMA mapper that correctly models the
  ARM VMSAv8 behavior of unprivileged-writeable =>
  privileged-execute-never.

This series attempts to rectify (1) by introducing an IOMMU API for
privileged mappings (and implementing it in io-pgtable-arm).  It seems like
(2) can be safely ignored for now under the assumption that any users of
the IOMMU_PRIV flag will be using the low-level IOMMU APIs directly, rather
than going through the DMA APIs.

Robin, Will, what do you think?  Jordan and Jeremy can provide more info on
the use case if needed, but the high level is that it's a security feature
to prevent attacks such as [1].

[1] https://github.com/robclark/kilroy


Jeremy Gebben (1):
  iommu/io-pgtable-arm: add support for the IOMMU_PRIV flag

Mitchel Humpherys (2):
  iommu: add IOMMU_PRIV attribute
  Revert "iommu/arm-smmu: Treat all device transactions as unprivileged"

 drivers/iommu/arm-smmu.c   |  5 +
 drivers/iommu/io-pgtable-arm.c | 16 +++-
 include/linux/iommu.h  |  1 +
 3 files changed, 13 insertions(+), 9 deletions(-)

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: How to keep PCI-e endpoints and RCs in distinct IOMMU groups?

2016-06-02 Thread Mitchel Humpherys
On Wed, May 25 2016 at 08:45:58 PM, Alex Williamson 
 wrote:
>> Why do we do that?  If the devices have different BDFs can't we safely
>> say that they're protected from peer-to-peer DMA (assuming no DMA
>> aliasing quirks)?  Even as I write that out it seems wrong though since
>> the RC can probably do whatever it wants...
>> 
>> Maybe the IOMMU framework can't actually know whether the devices should
>> be kept in separate groups and we just need to do something custom in
>> the arm-smmu driver?
>
> You're only considering the visibility of devices to the IOMMU, not the
> isolation between devices.  Without ACS peer-to-peer can be re-routed
> between devices before the IOMMU even knows about it.  That's why the
> root port is included in the group.  I'm confused why your driver is
> using the IOMMU API instead of the much more common DMA API anyway
> though.  Thanks,
>
> Alex

Ah ok, thanks for the explanation!

The driver *is* using the DMA API.  I'm actually working on the DMA APIs
themselves (a hacked-up version of the arm32 DMA APIs that have been
forklifted into arm64, to be exact).  Anyways, it looks like the best
route for us long-term is to try and align with Robin's arm64 IOMMU DMA
API mapper and take it from there.


-Mitch

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: How to keep PCI-e endpoints and RCs in distinct IOMMU groups?

2016-06-02 Thread Mitchel Humpherys
On Thu, May 26 2016 at 11:58:53 AM, Robin Murphy <robin.mur...@arm.com> wrote:
> Hey Mitch,
>
> On 26/05/16 01:26, Mitchel Humpherys wrote:
>> Hey there,
>>
>> We're experiencing an issue with IOMMU groups and PCI-e devices.  The
>> system in question has a WLAN DMA master behind a PCI-e root complex
>> which is, in turn, behind an IOMMU.  There are no there devices behind
>> the RC.  This is on an ARM platform using the arm-smmu and pci-msm
>> drivers (pci-msm is in the MSM vendor tree, sorry...).
>>
>> What we're observing is that the WLAN endpoint device is being added to
>> the same IOMMU group as the root complex device itself.  I don't think
>> they should be in the same group though, since they each have different
>> BDFs, which, in our system, are translated to different SMMU Stream IDs,
>> so their traffic is split onto separate SMMU context banks.  Since their
>> traffic is isolated from one other I don't think they need to be in the
>> same IOMMU group (as I understand IOMMU groups).
>>
>> The result is that when the WLAN driver tries to attach to their IOMMU
>> it errors out due to the following check in iommu_attach_device:
>>
>>  if (iommu_group_device_count(group) != 1)
>>  goto out_unlock;
>>
>> I've come up with a few hacky workarounds:
>>
>>- Forcing PCI-e ACS to be "enabled" unconditionally (even though our
>>  platform doesn't actually support it).
>
> If the _only_ use of the IOMMU is to allow 32-bit devices to get at
> physically higher RAM without DAC addressing, then perhaps. If system
> integrity matters, though, you're opening up the big hole that Alex
> mentions. I'm reminded of Rob Clark's awesome Fire TV hack for some of the
> dangers of letting DMA-capable devices play together without careful
> supervision...
>
>>- Call iommu_attach_group instead of iommu_attach_device in the arm64
>>  DMA IOMMU mapping layer (yuck).
>
> That's not yuck, that would be correct, except for the arm64 DMA mapping
> code relying on default domains from the IOMMU core and not calling
> iommu_attach anything :/
>
> If you've not picked 921b1f52c942 into the MSM kernel, please do so and fix
> the fallout in whatever other modifications you have. That dodgy workaround
> was only necessary for the brief window between the DMA mapping code and
> the IOMMU core group rework both landing in 4.4, and then hung around
> unused for far too long, frankly.

Ah sorry, somehow I forgot that we forklifted the arm32 IOMMU DMA mapper
into arm64 a few years ago...  I've been watching your recent work in
this area but haven't had a chance to do any proper testing.  Hopefully
we'll be getting some time to better align with upstream soon.  Our
divergence is a pain for everyone, I know...

>
>>- Don't use the pci_device_group helper at all from the arm-smmu
>>  driver.  Just allocate a new group for all PCI-e devices.
>
> See point #1.
>
>> It seems like the proper solution would be to somehow make these devices
>> end up in separate IOMMU groups using the existing pci_device_group
>> helper, since that might be doing useful stuff for other configurations
>> (like detecting the DMA aliasing quirks).
>>
>> Looking at pci_device_group, though, I'm not sure how we could tell that
>> these two devices are supposed to get separated.  I know very little
>> about PCI-e so maybe I'm just missing something simple.  The match
>> happens in the following loop where we walk up the PCI-e topology:
>>
>>  /*
>>   * Continue upstream from the point of minimum IOMMU granularity
>>   * due to aliases to the point where devices are protected from
>>   * peer-to-peer DMA by PCI ACS.  Again, if we find an existing
>>   * group, use it.
>>   */
>>  for (bus = pdev->bus; !pci_is_root_bus(bus); bus = bus->parent) {
>>  if (!bus->self)
>>  continue;
>>
>>  if (pci_acs_path_enabled(bus->self, NULL, REQ_ACS_FLAGS))
>>  break;
>>
>>  pdev = bus->self;
>>
>>  group = iommu_group_get(>dev);
>>  if (group)
>>  return group;
>>  }
>>
>> Why do we do that?  If the devices have different BDFs can't we safely
>> say that they're protected from peer-to-peer DMA (assuming no DMA
>> aliasing quirks)?  Even as I write that out it seems wrong though since
>> the RC can probably do whatever it wants...
>
&g

How to keep PCI-e endpoints and RCs in distinct IOMMU groups?

2016-05-25 Thread Mitchel Humpherys
Hey there,

We're experiencing an issue with IOMMU groups and PCI-e devices.  The
system in question has a WLAN DMA master behind a PCI-e root complex
which is, in turn, behind an IOMMU.  There are no there devices behind
the RC.  This is on an ARM platform using the arm-smmu and pci-msm
drivers (pci-msm is in the MSM vendor tree, sorry...).

What we're observing is that the WLAN endpoint device is being added to
the same IOMMU group as the root complex device itself.  I don't think
they should be in the same group though, since they each have different
BDFs, which, in our system, are translated to different SMMU Stream IDs,
so their traffic is split onto separate SMMU context banks.  Since their
traffic is isolated from one other I don't think they need to be in the
same IOMMU group (as I understand IOMMU groups).

The result is that when the WLAN driver tries to attach to their IOMMU
it errors out due to the following check in iommu_attach_device:

if (iommu_group_device_count(group) != 1)
goto out_unlock;

I've come up with a few hacky workarounds:

  - Forcing PCI-e ACS to be "enabled" unconditionally (even though our
platform doesn't actually support it).

  - Call iommu_attach_group instead of iommu_attach_device in the arm64
DMA IOMMU mapping layer (yuck).

  - Don't use the pci_device_group helper at all from the arm-smmu
driver.  Just allocate a new group for all PCI-e devices.

It seems like the proper solution would be to somehow make these devices
end up in separate IOMMU groups using the existing pci_device_group
helper, since that might be doing useful stuff for other configurations
(like detecting the DMA aliasing quirks).

Looking at pci_device_group, though, I'm not sure how we could tell that
these two devices are supposed to get separated.  I know very little
about PCI-e so maybe I'm just missing something simple.  The match
happens in the following loop where we walk up the PCI-e topology:

/*
 * Continue upstream from the point of minimum IOMMU granularity
 * due to aliases to the point where devices are protected from
 * peer-to-peer DMA by PCI ACS.  Again, if we find an existing
 * group, use it.
 */
for (bus = pdev->bus; !pci_is_root_bus(bus); bus = bus->parent) {
if (!bus->self)
continue;

if (pci_acs_path_enabled(bus->self, NULL, REQ_ACS_FLAGS))
break;

pdev = bus->self;

group = iommu_group_get(>dev);
if (group)
return group;
}

Why do we do that?  If the devices have different BDFs can't we safely
say that they're protected from peer-to-peer DMA (assuming no DMA
aliasing quirks)?  Even as I write that out it seems wrong though since
the RC can probably do whatever it wants...

Maybe the IOMMU framework can't actually know whether the devices should
be kept in separate groups and we just need to do something custom in
the arm-smmu driver?

Sorry for the novel!  Thanks for any pointers.


-Mitch

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/2] iommu: Support dynamic pgsize_bitmap

2016-04-07 Thread Mitchel Humpherys
On Thu, Apr 07 2016 at 12:29:59 PM, Mitchel Humpherys <mitch...@codeaurora.org> 
wrote:
>> I'll clean up what I have and try to get it posted this afternoon so
>> we can compare.
>
> Cool, I have some comments that I'll leave over there.

Never mind, my comments weren't relevant.  I'll try to test your series
out soon...


-Mitch

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/2] iommu: Support dynamic pgsize_bitmap

2016-04-07 Thread Mitchel Humpherys
On Wed, Apr 06 2016 at 11:47:19 AM, Robin Murphy  wrote:
> How would you handle said restriction of page sizes under this scheme?

I have a custom io-pgtable implementation that gets wired up based on an
IOMMU domain attribute, which is set by yet another custom DMA mapper.
My main goal is to give clients a way to specify the page table format
they want to use.  It's a bit of a mess but hopefully I can clean it up
and send it out soon.

> I'll clean up what I have and try to get it posted this afternoon so
> we can compare.

Cool, I have some comments that I'll leave over there.


-Mitch

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 1/2] iommu: Support dynamic pgsize_bitmap

2016-04-05 Thread Mitchel Humpherys
Currently we use a single pgsize_bitmap per IOMMU driver.  However, some
IOMMU drivers might service different IOMMUs with different supported
page sizes.  Some drivers might also want to restrict page sizes for
different use cases.  Support these use cases by adding a
.get_pgsize_bitmap function to the iommu_ops which can optionally be
used by the driver to return a domain-specific pgsize_bitmap.

Signed-off-by: Mitchel Humpherys <mitch...@codeaurora.org>
---
 drivers/iommu/iommu.c | 28 +++-
 include/linux/iommu.h |  3 +++
 2 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index bfd4f7c3b1d8..6141710f3091 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -325,6 +325,13 @@ int iommu_group_set_name(struct iommu_group *group, const 
char *name)
 }
 EXPORT_SYMBOL_GPL(iommu_group_set_name);
 
+static unsigned long iommu_get_pgsize_bitmap(struct iommu_domain *domain)
+{
+   if (domain->ops->get_pgsize_bitmap)
+   return domain->ops->get_pgsize_bitmap(domain);
+   return domain->ops->pgsize_bitmap;
+}
+
 static int iommu_group_create_direct_mappings(struct iommu_group *group,
  struct device *dev)
 {
@@ -337,9 +344,9 @@ static int iommu_group_create_direct_mappings(struct 
iommu_group *group,
if (!domain || domain->type != IOMMU_DOMAIN_DMA)
return 0;
 
-   BUG_ON(!domain->ops->pgsize_bitmap);
+   BUG_ON(!(domain->ops->pgsize_bitmap || domain->ops->get_pgsize_bitmap));
 
-   pg_size = 1UL << __ffs(domain->ops->pgsize_bitmap);
+   pg_size = 1UL << __ffs(iommu_get_pgsize_bitmap(domain));
INIT_LIST_HEAD();
 
iommu_get_dm_regions(dev, );
@@ -1318,14 +1325,15 @@ int iommu_map(struct iommu_domain *domain, unsigned 
long iova,
int ret = 0;
 
if (unlikely(domain->ops->map == NULL ||
-domain->ops->pgsize_bitmap == 0UL))
+(domain->ops->pgsize_bitmap == 0UL &&
+ !domain->ops->get_pgsize_bitmap)))
return -ENODEV;
 
if (unlikely(!(domain->type & __IOMMU_DOMAIN_PAGING)))
return -EINVAL;
 
/* find out the minimum page size supported */
-   min_pagesz = 1 << __ffs(domain->ops->pgsize_bitmap);
+   min_pagesz = 1 << __ffs(iommu_get_pgsize_bitmap(domain));
 
/*
 * both the virtual address and the physical one, as well as
@@ -1372,14 +1380,15 @@ size_t iommu_unmap(struct iommu_domain *domain, 
unsigned long iova, size_t size)
unsigned long orig_iova = iova;
 
if (unlikely(domain->ops->unmap == NULL ||
-domain->ops->pgsize_bitmap == 0UL))
+(domain->ops->pgsize_bitmap == 0UL &&
+ !domain->ops->get_pgsize_bitmap)))
return -ENODEV;
 
if (unlikely(!(domain->type & __IOMMU_DOMAIN_PAGING)))
return -EINVAL;
 
/* find out the minimum page size supported */
-   min_pagesz = 1 << __ffs(domain->ops->pgsize_bitmap);
+   min_pagesz = 1 << __ffs(iommu_get_pgsize_bitmap(domain));
 
/*
 * The virtual address, as well as the size of the mapping, must be
@@ -1425,10 +1434,11 @@ size_t default_iommu_map_sg(struct iommu_domain 
*domain, unsigned long iova,
unsigned int i, min_pagesz;
int ret;
 
-   if (unlikely(domain->ops->pgsize_bitmap == 0UL))
+   if (unlikely(domain->ops->pgsize_bitmap == 0UL &&
+!domain->ops->get_pgsize_bitmap))
return 0;
 
-   min_pagesz = 1 << __ffs(domain->ops->pgsize_bitmap);
+   min_pagesz = 1 << __ffs(iommu_get_pgsize_bitmap(domain));
 
for_each_sg(sg, s, nents, i) {
phys_addr_t phys = page_to_phys(sg_page(s)) + s->offset;
@@ -1509,7 +1519,7 @@ int iommu_domain_get_attr(struct iommu_domain *domain,
break;
case DOMAIN_ATTR_PAGING:
paging  = data;
-   *paging = (domain->ops->pgsize_bitmap != 0UL);
+   *paging = (iommu_get_pgsize_bitmap(domain) != 0UL);
break;
case DOMAIN_ATTR_WINDOWS:
count = data;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index a5c539fa5d2b..03f8d50670db 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -156,6 +156,8 @@ struct iommu_dm_region {
  * @domain_get_windows: Return the number of windows for a domain
  * @of_xlate: add OF master IDs to iommu grouping
  * @pgsize_bitmap: bitmap of supported page sizes
+ * @get_pgsize_bitmap: gets a bitmap of supported page sizes for a domain
+ * This takes precedence over @pgsize_bitmap.

[PATCH 2/2] iommu/arm-smmu: Implement .get_pgsize_bitmap for domain

2016-04-05 Thread Mitchel Humpherys
Currently we restrict the pgsize_bitmap for the entire SMMU every time
we allocate some new page tables.  However, certain io-pgtable
implementations might wish to restrict the formats beyond the
restrictions of the SMMU itself, which forces all domains on that SMMU
to the same pgsize_bitmap, even if the other domains would prefer to use
a more permissive page table format.  Besides that, some SMMUs in the
system might have different supported page sizes at the hardware level,
so applying those to everyone else is wrong.

Fix these issues by implementing the new .get_pgsize_bitmap IOMMU op.

Signed-off-by: Mitchel Humpherys <mitch...@codeaurora.org>
---
 drivers/iommu/arm-smmu.c | 21 ++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 2409e3bd3df2..a1b0f542d5ca 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -908,9 +908,6 @@ static int arm_smmu_init_domain_context(struct iommu_domain 
*domain,
goto out_clear_smmu;
}
 
-   /* Update our support page sizes to reflect the page table format */
-   arm_smmu_ops.pgsize_bitmap = pgtbl_cfg.pgsize_bitmap;
-
/* Initialise the context bank with our page table cfg */
arm_smmu_init_context_bank(smmu_domain, _cfg);
 
@@ -1462,6 +1459,23 @@ out_unlock:
return ret;
 }
 
+static unsigned long arm_smmu_get_pgsize_bitmap(struct iommu_domain *domain)
+{
+   struct arm_smmu_domain *smmu_domain = domain->priv;
+
+   /*
+* if someone is calling map before attach just return the
+* supported page sizes for the hardware itself.
+*/
+   if (!smmu_domain->pgtbl_cfg.pgsize_bitmap)
+   return arm_smmu_ops.pgsize_bitmap;
+   /*
+* otherwise return the page sizes supported by this specific page
+* table configuration
+*/
+   return smmu_domain->pgtbl_cfg.pgsize_bitmap;
+}
+
 static struct iommu_ops arm_smmu_ops = {
.capable= arm_smmu_capable,
.domain_alloc   = arm_smmu_domain_alloc,
@@ -1477,6 +1491,7 @@ static struct iommu_ops arm_smmu_ops = {
.domain_get_attr= arm_smmu_domain_get_attr,
.domain_set_attr= arm_smmu_domain_set_attr,
.pgsize_bitmap  = -1UL, /* Restricted during device attach */
+   .get_pgsize_bitmap  = arm_smmu_get_pgsize_bitmap,
 };
 
 static void arm_smmu_device_reset(struct arm_smmu_device *smmu)
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/arm-smmu: Only return IRQ_NONE if FSR is not set

2015-10-06 Thread Mitchel Humpherys
On Mon, Oct 05 2015 at 03:24:03 PM, Will Deacon <will.dea...@arm.com> wrote:
> Hi Mitch,
>
> On Sat, Sep 26, 2015 at 01:12:05AM +0100, Mitchel Humpherys wrote:
>> Currently we return IRQ_NONE from the context fault handler if the FSR
>> doesn't actually have the fault bit set (some sort of miswired
>> interrupt?) or if the client doesn't register an IOMMU fault handler.
>> However, registering a client fault handler is optional, so telling the
>> interrupt framework that the interrupt wasn't for this device if the
>> client doesn't register a handler isn't exactly accurate.  Fix this by
>> returning IRQ_HANDLED even if the client doesn't register a handler.
>> 
>> Signed-off-by: Mitchel Humpherys <mitch...@codeaurora.org>
>> ---
>>  drivers/iommu/arm-smmu.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>> index 48a39dfa9777..95560d447a54 100644
>> --- a/drivers/iommu/arm-smmu.c
>> +++ b/drivers/iommu/arm-smmu.c
>> @@ -653,7 +653,7 @@ static irqreturn_t arm_smmu_context_fault(int irq, void 
>> *dev)
>>  dev_err_ratelimited(smmu->dev,
>>  "Unhandled context fault: iova=0x%08lx, fsynr=0x%x, 
>> cb=%d\n",
>>  iova, fsynr, cfg->cbndx);
>> -ret = IRQ_NONE;
>> +ret = IRQ_HANDLED;
>>  resume = RESUME_TERMINATE;
>
> Hmm, but if we haven't actually done anything to rectify the cause of the
> fault, what means that we won't take it again immediately? I guess I'm not
> understanding the use-case that triggered you to write this patch...

Does returning IRQ_NONE actually prevent us from taking another
interrupt (despite clearing the FSR below)?  We definitely take more
interrupts on our platform despite returning IRQ_NONE, but maybe we have
something misconfigured...

I thought that returning IRQ_NONE would simply affect spurious interrupt
accounting (only disabling the interrupt if we took enough of them in
close enough succession to flag a misbehaving device).

As far as a valid use case, I can't think of one.  I just thought we
were messing up the spurious interrupt accounting.


-Mitch

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] iommu/arm-smmu: Only return IRQ_NONE if FSR is not set

2015-09-25 Thread Mitchel Humpherys
Currently we return IRQ_NONE from the context fault handler if the FSR
doesn't actually have the fault bit set (some sort of miswired
interrupt?) or if the client doesn't register an IOMMU fault handler.
However, registering a client fault handler is optional, so telling the
interrupt framework that the interrupt wasn't for this device if the
client doesn't register a handler isn't exactly accurate.  Fix this by
returning IRQ_HANDLED even if the client doesn't register a handler.

Signed-off-by: Mitchel Humpherys <mitch...@codeaurora.org>
---
 drivers/iommu/arm-smmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 48a39dfa9777..95560d447a54 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -653,7 +653,7 @@ static irqreturn_t arm_smmu_context_fault(int irq, void 
*dev)
dev_err_ratelimited(smmu->dev,
"Unhandled context fault: iova=0x%08lx, fsynr=0x%x, 
cb=%d\n",
iova, fsynr, cfg->cbndx);
-   ret = IRQ_NONE;
+   ret = IRQ_HANDLED;
resume = RESUME_TERMINATE;
}
 
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 2/5] iommu/mediatek: Add mt8173 IOMMU driver

2015-03-06 Thread Mitchel Humpherys
On Fri, Mar 06 2015 at 02:48:17 AM, yong...@mediatek.com wrote:
 From: Yong Wu yong...@mediatek.com

 This patch adds support for mediatek m4u (MultiMedia Memory Management Unit).
 Currently this only supports m4u gen 2 with 2 levels of page table on mt8173.

[...]

 +static int mtk_iommu_invalidate_tlb(const struct mtk_iommu_info *piommu,
 + int isinvall, unsigned int iova_start,
 + unsigned int iova_end)
 +{
 + void __iomem *m4u_base = piommu-m4u_base;
 + u32 val;
 + u64 start, end;
 +
 + start = sched_clock();
 +
 + if (!isinvall) {
 + iova_start = round_down(iova_start, SZ_4K);
 + iova_end = round_up(iova_end, SZ_4K);
 + }
 +
 + val = F_MMU_INV_EN_L2 | F_MMU_INV_EN_L1;
 +
 + writel(val, m4u_base + REG_INVLID_SEL);
 +
 + if (isinvall) {
 + writel(F_MMU_INV_ALL, m4u_base + REG_MMU_INVLD);
 + } else {
 + writel(iova_start, m4u_base + REG_MMU_INVLD_SA);
 + writel(iova_end, m4u_base + REG_MMU_INVLD_EA);
 + writel(F_MMU_INV_RANGE, m4u_base + REG_MMU_INVLD);
 +
 + while (!readl(m4u_base + REG_MMU_CPE_DONE)) {
 + end = sched_clock();
 + if (end - start = 1ULL) {
 + dev_warn(piommu-dev, invalid don't done\n);
 + writel(F_MMU_INV_ALL, m4u_base + REG_MMU_INVLD);
 + }
 + };

Superfluous `;'.  Also, maybe you should be using readl_poll_timeout?

 + writel(0, m4u_base + REG_MMU_CPE_DONE);
 + }
 +
 + return 0;
 +}



-Mitch

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/arm-smmu: fix leak in arm_smmu_flush_pgtable

2015-03-05 Thread Mitchel Humpherys
On Thu, Mar 05 2015 at 02:38:45 AM, Robin Murphy robin.mur...@arm.com wrote:
 Hi Mitch,

 On 05/03/15 00:18, Mitchel Humpherys wrote:
 We're currently mapping a page in arm_smmu_flush_pgtable without ever
 unmapping it.  Fix this by calling dma_unmap_page on the returned dma
 address.  Since the only reason we're calling dma_map_page is to make
 sure it actually gets flushed out to RAM, we can just call
 dma_unmap_page immediately following the map.

 Without this, eventually swiotlb runs out of memory and starts printing
 things like:

  [   35.545076] arm-smmu d0.arm,smmu: swiotlb buffer is full (sz: 
 128 bytes)


 So, you have non-coherent SMMUs too ;) The real problem is that the SMMU's
 DMA mask is wrong (as it happens I've just given Will a patch to fix that)
 - this is really just doing a whole bunch of unnecessary work (two memory
 copies and two cache flushes, one of which isn't even flushing the right
 area) to hide the problem. With an appropriate DMA mask set,
 swiotlb_map_page becomes a no-op and we fall through to the cache flush
 without ever allocating anything.

Yeah I noticed that as well...  But isn't this still incorrect usage of
the API (DMA-API-HOWTO.txt seems to indicate that calls to map should
always be balanced with calls to unmap)?  What we really want to do here
is just call __dma_map_area directly, but the comment on that guy
expressly forbids it...  Not sure what's worse, abusing the DMA API or
disobeying that comment?


-Mitch

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] iommu/arm-smmu: fix leak in arm_smmu_flush_pgtable

2015-03-04 Thread Mitchel Humpherys
We're currently mapping a page in arm_smmu_flush_pgtable without ever
unmapping it.  Fix this by calling dma_unmap_page on the returned dma
address.  Since the only reason we're calling dma_map_page is to make
sure it actually gets flushed out to RAM, we can just call
dma_unmap_page immediately following the map.

Without this, eventually swiotlb runs out of memory and starts printing
things like:

[   35.545076] arm-smmu d0.arm,smmu: swiotlb buffer is full (sz: 128 
bytes)

Signed-off-by: Mitchel Humpherys mitch...@codeaurora.org
---
 drivers/iommu/arm-smmu.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index fc13dd56953e..2ff8f35cf533 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -627,8 +627,13 @@ static void arm_smmu_flush_pgtable(void *addr, size_t 
size, void *cookie)
 * recursion here as the SMMU table walker will not be wired
 * through another SMMU.
 */
-   dma_map_page(smmu-dev, virt_to_page(addr), offset, size,
-DMA_TO_DEVICE);
+   dma_addr_t handle = dma_map_page(smmu-dev, virt_to_page(addr),
+   offset, size, DMA_TO_DEVICE);
+   if (handle == DMA_ERROR_CODE)
+   dev_err(smmu-dev,
+   Couldn't flush page tables at %p!\n, addr);
+   else
+   dma_unmap_page(smmu-dev, handle, size, DMA_TO_DEVICE);
}
 }
 
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/arm-smmu: use a threaded handler for context interrupts

2015-02-04 Thread Mitchel Humpherys
On Wed, Feb 04 2015 at 03:33:05 AM, Will Deacon will.dea...@arm.com wrote:
 On Mon, Feb 02, 2015 at 08:10:02PM +, Mitchel Humpherys wrote:
 On Wed, Jan 28 2015 at 04:07:39 AM, Will Deacon will.dea...@arm.com wrote:
  With a shared handler (e.g. a bunch of context banks have the same IRQ)
  then I assume that we don't want to end up with multiple handler threads
  all tripping over each other. I'd rather have one thread that handles work
  queued up by multiple low-level handlers.
 
  Do you have a preference either way?
 
 Ok I think I understand the scenario you're describing.  If multiple
 context banks are sharing an interrupt line their handlers currently
 execute serially, but with threaded handlers they would all be woken up
 and possibly execute concurrently.  I hadn't really considered this
 because none of our targets have CBs sharing interrupts.  In any case,
 the CBs that aren't interrupting should quickly return IRQ_NONE when
 they notice that !(fsr  FSR_FAULT), so is this really a concern?

 Well, with my stall-mode hat on, the FSR check could be done in the
 low-level handler, with the actual page fault handling or whatever done
 in the thread.

But we'll need to turn on clocks just to read the FSR, which can't be
done from atomic context.


 Anyways, we can always hold off on this until we have a more compelling
 motivation for it.  For example, if we need to enable clocks to read
 registers then threaded IRQs seem like the best solution.  Hopefully
 I'll find time to have another go at the clocks stuff soon, which is the
 real reason why we're using threaded IRQs for context interrupts in our
 msm tree.

 Okey doke. Having the clocks stuff supported in iommu core would be my
 preference.

Yeah I'll try to come up with something.  In this particular case I
guess we'd actually have to call out to some iommu_enable_access API so
it wouldn't be completely transparent.  Everywhere else I think the
iommu core can wrap the various iommu_ops callbacks with the
enable/disable calls.


-Mitch

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/arm-smmu: use a threaded handler for context interrupts

2015-02-02 Thread Mitchel Humpherys
On Wed, Jan 28 2015 at 04:07:39 AM, Will Deacon will.dea...@arm.com wrote:
 On Fri, Jan 23, 2015 at 10:33:20PM +, Mitchel Humpherys wrote:
 On Fri, Jan 23 2015 at 03:24:15 AM, Will Deacon will.dea...@arm.com wrote:
  On Thu, Jan 22, 2015 at 11:48:02PM +, Mitchel Humpherys wrote:
  Context interrupts can call domain-specific handlers which might sleep.
  Currently we register our handler with request_irq, so our handler is
  called in atomic context, so domain handlers that sleep result in an
  invalid context BUG.  Fix this by using request_threaded_irq.
  
  This also prepares the way for doing things like enabling clocks within
  our interrupt handler.
  
  Signed-off-by: Mitchel Humpherys mitch...@codeaurora.org
  ---
   drivers/iommu/arm-smmu.c | 5 +++--
   1 file changed, 3 insertions(+), 2 deletions(-)
  
  diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
  index 6cd47b75286f..81f6b54d94b1 100644
  --- a/drivers/iommu/arm-smmu.c
  +++ b/drivers/iommu/arm-smmu.c
  @@ -973,8 +973,9 @@ static int arm_smmu_init_domain_context(struct 
  iommu_domain *domain,
spin_unlock_irqrestore(smmu_domain-lock, flags);
   
irq = smmu-irqs[smmu-num_global_irqs + cfg-irptndx];
  - ret = request_irq(irq, arm_smmu_context_fault, IRQF_SHARED,
  -   arm-smmu-context-fault, domain);
  + ret = request_threaded_irq(irq, NULL, arm_smmu_context_fault,
  + IRQF_ONESHOT | IRQF_SHARED,
  + arm-smmu-context-fault, domain);
if (IS_ERR_VALUE(ret)) {
dev_err(smmu-dev, failed to request context IRQ %d (%u)\n,
cfg-irptndx, irq);
 
  I think I'd rather keep a simple atomic handler, then have a threaded
  handler for actually issuing the report_iommu_fault. i.e. we only wake
  the thread when it looks like there's some work to do. That also works
  much better for shared interrupts.
 
 Are you still against adding clock support to the driver?  If not, we'll
 need to move to a threaded handler when clocks come in anyways...
 
 Can you elaborate what you mean regarding shared interrupts?  Even
 without clocks it seems like the code clarity / performance tradeoff
 would favor a threaded handler, given that performance isn't important
 here.

 With a shared handler (e.g. a bunch of context banks have the same IRQ)
 then I assume that we don't want to end up with multiple handler threads
 all tripping over each other. I'd rather have one thread that handles work
 queued up by multiple low-level handlers.

 Do you have a preference either way?

Ok I think I understand the scenario you're describing.  If multiple
context banks are sharing an interrupt line their handlers currently
execute serially, but with threaded handlers they would all be woken up
and possibly execute concurrently.  I hadn't really considered this
because none of our targets have CBs sharing interrupts.  In any case,
the CBs that aren't interrupting should quickly return IRQ_NONE when
they notice that !(fsr  FSR_FAULT), so is this really a concern?

Anyways, we can always hold off on this until we have a more compelling
motivation for it.  For example, if we need to enable clocks to read
registers then threaded IRQs seem like the best solution.  Hopefully
I'll find time to have another go at the clocks stuff soon, which is the
real reason why we're using threaded IRQs for context interrupts in our
msm tree.


-Mitch

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/arm-smmu: use a threaded handler for context interrupts

2015-01-23 Thread Mitchel Humpherys
On Fri, Jan 23 2015 at 03:24:15 AM, Will Deacon will.dea...@arm.com wrote:
 Hi Mitch,

 On Thu, Jan 22, 2015 at 11:48:02PM +, Mitchel Humpherys wrote:
 Context interrupts can call domain-specific handlers which might sleep.
 Currently we register our handler with request_irq, so our handler is
 called in atomic context, so domain handlers that sleep result in an
 invalid context BUG.  Fix this by using request_threaded_irq.
 
 This also prepares the way for doing things like enabling clocks within
 our interrupt handler.
 
 Signed-off-by: Mitchel Humpherys mitch...@codeaurora.org
 ---
  drivers/iommu/arm-smmu.c | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)
 
 diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
 index 6cd47b75286f..81f6b54d94b1 100644
 --- a/drivers/iommu/arm-smmu.c
 +++ b/drivers/iommu/arm-smmu.c
 @@ -973,8 +973,9 @@ static int arm_smmu_init_domain_context(struct 
 iommu_domain *domain,
  spin_unlock_irqrestore(smmu_domain-lock, flags);
  
  irq = smmu-irqs[smmu-num_global_irqs + cfg-irptndx];
 -ret = request_irq(irq, arm_smmu_context_fault, IRQF_SHARED,
 -  arm-smmu-context-fault, domain);
 +ret = request_threaded_irq(irq, NULL, arm_smmu_context_fault,
 +IRQF_ONESHOT | IRQF_SHARED,
 +arm-smmu-context-fault, domain);
  if (IS_ERR_VALUE(ret)) {
  dev_err(smmu-dev, failed to request context IRQ %d (%u)\n,
  cfg-irptndx, irq);

 I think I'd rather keep a simple atomic handler, then have a threaded
 handler for actually issuing the report_iommu_fault. i.e. we only wake
 the thread when it looks like there's some work to do. That also works
 much better for shared interrupts.

Are you still against adding clock support to the driver?  If not, we'll
need to move to a threaded handler when clocks come in anyways...

Can you elaborate what you mean regarding shared interrupts?  Even
without clocks it seems like the code clarity / performance tradeoff
would favor a threaded handler, given that performance isn't important
here.


-Mitch

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] iommu/arm-smmu: use a threaded handler for context interrupts

2015-01-22 Thread Mitchel Humpherys
Context interrupts can call domain-specific handlers which might sleep.
Currently we register our handler with request_irq, so our handler is
called in atomic context, so domain handlers that sleep result in an
invalid context BUG.  Fix this by using request_threaded_irq.

This also prepares the way for doing things like enabling clocks within
our interrupt handler.

Signed-off-by: Mitchel Humpherys mitch...@codeaurora.org
---
 drivers/iommu/arm-smmu.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 6cd47b75286f..81f6b54d94b1 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -973,8 +973,9 @@ static int arm_smmu_init_domain_context(struct iommu_domain 
*domain,
spin_unlock_irqrestore(smmu_domain-lock, flags);
 
irq = smmu-irqs[smmu-num_global_irqs + cfg-irptndx];
-   ret = request_irq(irq, arm_smmu_context_fault, IRQF_SHARED,
- arm-smmu-context-fault, domain);
+   ret = request_threaded_irq(irq, NULL, arm_smmu_context_fault,
+   IRQF_ONESHOT | IRQF_SHARED,
+   arm-smmu-context-fault, domain);
if (IS_ERR_VALUE(ret)) {
dev_err(smmu-dev, failed to request context IRQ %d (%u)\n,
cfg-irptndx, irq);
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v7 2/2] iommu/arm-smmu: add support for iova_to_phys through ATS1PR

2015-01-20 Thread Mitchel Humpherys
On Tue, Jan 20 2015 at 06:16:43 AM, Will Deacon will.dea...@arm.com wrote:
 Hey Mitch,

 On Wed, Oct 29, 2014 at 09:13:40PM +, Mitchel Humpherys wrote:
 Currently, we provide the iommu_ops.iova_to_phys service by doing a
 table walk in software to translate IO virtual addresses to physical
 addresses. On SMMUs that support it, it can be useful to ask the SMMU
 itself to do the translation. This can be used to warm the TLBs for an
 SMMU. It can also be useful for testing and hardware validation.
 
 Since the address translation registers are optional on SMMUv2, only
 enable hardware translations when using SMMUv1 or when SMMU_IDR0.S1TS=1
 and SMMU_IDR0.ATOSNS=0, as described in the ARM SMMU v1-v2 spec.
 
 Signed-off-by: Mitchel Humpherys mitch...@codeaurora.org
 ---
 Changes since v6:
   - added missing lock
   - fixed physical address mask

 I had a go at rebasing this onto my current queue, but ended up making
 quite a few changes. Can you take a look at the result, please?

 Patch below (also on my for-joerg/arm-smmu/updates branch).

The modified patch looks good to me.  Thanks!

-Mitch


-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v10] iopoll: Introduce memory-mapped IO polling macros

2015-01-14 Thread Mitchel Humpherys
On Tue, Dec 16 2014 at 01:45:27 AM, Will Deacon will.dea...@arm.com wrote:
 On Mon, Dec 15, 2014 at 11:47:23PM +, Mitchel Humpherys wrote:
 From: Matt Wagantall ma...@codeaurora.org
 
 It is sometimes necessary to poll a memory-mapped register until its value
 satisfies some condition. Introduce a family of convenience macros that do
 this. Tight-looping, sleeping, and timing out can all be accomplished using
 these macros.
 
 Cc: Thierry Reding thierry.red...@gmail.com
 Cc: Will Deacon will.dea...@arm.com
 Cc: Arnd Bergmann a...@arndb.de
 Cc: Andrew Morton a...@linux-foundation.org
 Cc: Robert Elliott elli...@hp.com
 Signed-off-by: Matt Wagantall ma...@codeaurora.org
 Signed-off-by: Mitchel Humpherys mitch...@codeaurora.org
 ---
 v9..10:
   - Actually added the comments mentioned in v8..v9 (doh!)
 
 v8..v9:
   - Added note in comments about max sleep time (Rob Elliott)
 
 v7..v8:
   - sorted helper macros by size (b, w, l, q)
   - removed some of the more esoteric (or flat-out bogus) helper macros
 
 This patch was originally part of a series [1] for adding support for IOMMU
 address translations through an ARM SMMU hardware register.  The other
 patch in the series (the one that actually uses these macros and implements
 said hardware address translations) was Ack'd by the driver maintainer
 there (Will Deacon) so I've pulled this patch out to avoid resending an
 already Ack'd patch over and over again.
 
 In short, please see [1] for previous discussion and the first user of
 these macros.
 
 Will also acked this patch in [2].  I didn't retain his Ack here because I
 added to the macro comments.

 You can keep the ack, it still looks good to me and I'm not really fussed
 about the comments.

 Will

This hasn't gotten any further comments.  Would someone be willing to
take it?

Joerg, maybe you could take this through the IOMMU tree since the first
user is an IOMMU driver?  Currently we can't move [1] forward because of
this dependency...

[1] http://thread.gmane.org/gmane.linux.kernel.iommu/7837


-Mitch

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/arm-smmu: don't touch the secure STLBIALL register

2015-01-07 Thread Mitchel Humpherys
On Wed, Jan 07 2015 at 10:04:20 AM, Will Deacon will.dea...@arm.com wrote:
 On Wed, Jan 07, 2015 at 05:52:46PM +, Mitchel Humpherys wrote:
 On Wed, Jan 07 2015 at 02:13:00 AM, Will Deacon will.dea...@arm.com wrote:
  On Tue, Jan 06, 2015 at 11:30:49PM +, Mitchel Humpherys wrote:
  On Tue, Jan 06 2015 at 02:35:28 PM, Rob Herring robherri...@gmail.com 
  wrote:
   On Tue, Jan 6, 2015 at 2:16 PM, Mitchel Humpherys
   mitch...@codeaurora.org wrote:
   On Tue, Jan 06 2015 at 06:15:07 AM, Will Deacon will.dea...@arm.com 
   wrote:
/* Invalidate the TLB, just in case */
   -writel_relaxed(0, gr0_base + ARM_SMMU_GR0_STLBIALL);
writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
  
   I was slightly worried that this would break the Calxeda 
   implementation
   with ARM_SMMU_OPT_SECURE_CFG_ACCESS, but actually these registers 
   aren't
   even aliased there so I think there's a bigger bug for them.
  
   Anyway, given that their hardware has gone the way of the dodo, I'll 
   take
   the patch as-is unless you have any further comments?
  
   Will
  
   Yeah I agree that this shouldn't affect the (now defunct) Calxeda
   implementation.  I've tested this on some hardware here and we crash
   when we touch that register since it's secure-only (not banked, as you
   mentioned).
  
   It's not quite dead:
  
   http://www.eweek.com/servers/calxedas-arm-based-server-chips-re-emerge-with-new-company.html
  
   But AFAIK, production systems don't enable the SMMU, but someone could
   still want to at some point. A note in the commit log here would be
   nice so it gets recorded.
  
  Actually, as Will mentioned this shouldn't affect Calxeda since this
  isn't a banked register.  I think the confusion is from the `S' prefix
  in the spec.  The /s/ (lower-case, italic) prefix means that there are
  secure and non-secure versions of the register, while the S (upper-case,
  non-italic) prefix means this is a secure register (which may or may
  not have a banked non-secure counterpart).  This particular register is
  an S-only register (there's no non-secure counterpart) so the Calxeda
  workaround isn't relevant here, AFAICT.
 
  Right, but I think the problem is that we go and write zero to
  ARM_SMMU_GR0_TLBIALLH and ARM_SMMU_GR0_TLBIALLNSNH at what *would be* their
  non-secure aliases for the secure side (i.e. + 0x400).
 
 This sounds like a separate problem.  Since these GR0 registers aren't
 banked the calxeda workaround doesn't work...  SMMU_STLBIALL, on the
 other hand, is not only not banked but it's also secure only so I
 don't think we have any business touching it ever.
 
  If would be better to check for the ARM_SMMU_OPT_SECURE_CFG_ACCESS feature
  and, if it's set then zero ARM_SMMU_GR0_STLBIALL at the correct address
  otherwise do the ARM_SMMU_GR0_TLBIALLH and ARM_SMMU_GR0_TLBIALLNSNH.
 
 I'm confused.  The problem I'm addressing here is that we're touching a
 register that's marked as secure only, which causes our system to
 crash.  Why would we ever want to touch a secure only register, calxeda
 workaround or not?

 Because I think the way the SMMU is wired for Calxeda is that the CPU can
 only see the secure side of the register interface, so the only way to nuke
 the whole TLB would be to use ARM_SMMU_GR0_STLBIALL.

Still not sure I understand what the correct address is for STLBIALL
on Calxeda (i.e. whether or not we need to use ARM_SMMU_GR0_NS), but
something like:

-- 8 --
Subject: [PATCH v2] iommu/arm-smmu: don't touch the secure STLBIALL register

Currently we do a STLBIALL when we initialize the SMMU.  However, on
systems with sane secure
configurations (i.e. !ARM_SMMU_OPT_SECURE_CFG_ACCESS) that register is
not supposed to be touched and is marked as Secure only in the spec.
Touching it results in a crash on those platforms.  However, on
platforms with ARM_SMMU_OPT_SECURE_CFG_ACCESS it's the only way to nuke
the whole TLB, so leave it in for them but rip it out for everyone else.

Signed-off-by: Mitchel Humpherys mitch...@codeaurora.org
---
 drivers/iommu/arm-smmu.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 60558f794922..d4c149d83f3d 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1686,9 +1686,12 @@ static void arm_smmu_device_reset(struct arm_smmu_device 
*smmu)
}
 
/* Invalidate the TLB, just in case */
-   writel_relaxed(0, gr0_base + ARM_SMMU_GR0_STLBIALL);
-   writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
-   writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
+   if (smmu-options  ARM_SMMU_OPT_SECURE_CFG_ACCESS) {
+   writel_relaxed(0, gr0_base + ARM_SMMU_GR0_STLBIALL);
+   } else {
+   writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
+   writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH

Re: [PATCH] iommu/arm-smmu: don't touch the secure STLBIALL register

2015-01-07 Thread Mitchel Humpherys
On Wed, Jan 07 2015 at 02:13:00 AM, Will Deacon will.dea...@arm.com wrote:
 On Tue, Jan 06, 2015 at 11:30:49PM +, Mitchel Humpherys wrote:
 On Tue, Jan 06 2015 at 02:35:28 PM, Rob Herring robherri...@gmail.com 
 wrote:
  On Tue, Jan 6, 2015 at 2:16 PM, Mitchel Humpherys
  mitch...@codeaurora.org wrote:
  On Tue, Jan 06 2015 at 06:15:07 AM, Will Deacon will.dea...@arm.com 
  wrote:
   /* Invalidate the TLB, just in case */
  -writel_relaxed(0, gr0_base + ARM_SMMU_GR0_STLBIALL);
   writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
   writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
 
  I was slightly worried that this would break the Calxeda implementation
  with ARM_SMMU_OPT_SECURE_CFG_ACCESS, but actually these registers aren't
  even aliased there so I think there's a bigger bug for them.
 
  Anyway, given that their hardware has gone the way of the dodo, I'll take
  the patch as-is unless you have any further comments?
 
  Will
 
  Yeah I agree that this shouldn't affect the (now defunct) Calxeda
  implementation.  I've tested this on some hardware here and we crash
  when we touch that register since it's secure-only (not banked, as you
  mentioned).
 
  It's not quite dead:
 
  http://www.eweek.com/servers/calxedas-arm-based-server-chips-re-emerge-with-new-company.html
 
  But AFAIK, production systems don't enable the SMMU, but someone could
  still want to at some point. A note in the commit log here would be
  nice so it gets recorded.
 
 Actually, as Will mentioned this shouldn't affect Calxeda since this
 isn't a banked register.  I think the confusion is from the `S' prefix
 in the spec.  The /s/ (lower-case, italic) prefix means that there are
 secure and non-secure versions of the register, while the S (upper-case,
 non-italic) prefix means this is a secure register (which may or may
 not have a banked non-secure counterpart).  This particular register is
 an S-only register (there's no non-secure counterpart) so the Calxeda
 workaround isn't relevant here, AFAICT.

 Right, but I think the problem is that we go and write zero to
 ARM_SMMU_GR0_TLBIALLH and ARM_SMMU_GR0_TLBIALLNSNH at what *would be* their
 non-secure aliases for the secure side (i.e. + 0x400).

This sounds like a separate problem.  Since these GR0 registers aren't
banked the calxeda workaround doesn't work...  SMMU_STLBIALL, on the
other hand, is not only not banked but it's also secure only so I
don't think we have any business touching it ever.

 If would be better to check for the ARM_SMMU_OPT_SECURE_CFG_ACCESS feature
 and, if it's set then zero ARM_SMMU_GR0_STLBIALL at the correct address
 otherwise do the ARM_SMMU_GR0_TLBIALLH and ARM_SMMU_GR0_TLBIALLNSNH.

I'm confused.  The problem I'm addressing here is that we're touching a
register that's marked as secure only, which causes our system to
crash.  Why would we ever want to touch a secure only register, calxeda
workaround or not?


-Mitch

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/arm-smmu: don't touch the secure STLBIALL register

2015-01-06 Thread Mitchel Humpherys
On Tue, Jan 06 2015 at 06:15:07 AM, Will Deacon will.dea...@arm.com wrote:
  /* Invalidate the TLB, just in case */
 -writel_relaxed(0, gr0_base + ARM_SMMU_GR0_STLBIALL);
  writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
  writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);

 I was slightly worried that this would break the Calxeda implementation
 with ARM_SMMU_OPT_SECURE_CFG_ACCESS, but actually these registers aren't
 even aliased there so I think there's a bigger bug for them.

 Anyway, given that their hardware has gone the way of the dodo, I'll take
 the patch as-is unless you have any further comments?

 Will

Yeah I agree that this shouldn't affect the (now defunct) Calxeda
implementation.  I've tested this on some hardware here and we crash
when we touch that register since it's secure-only (not banked, as you
mentioned).


-Mitch

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/arm-smmu: don't touch the secure STLBIALL register

2015-01-06 Thread Mitchel Humpherys
On Tue, Jan 06 2015 at 02:35:28 PM, Rob Herring robherri...@gmail.com wrote:
 On Tue, Jan 6, 2015 at 2:16 PM, Mitchel Humpherys
 mitch...@codeaurora.org wrote:
 On Tue, Jan 06 2015 at 06:15:07 AM, Will Deacon will.dea...@arm.com wrote:
  /* Invalidate the TLB, just in case */
 -writel_relaxed(0, gr0_base + ARM_SMMU_GR0_STLBIALL);
  writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
  writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);

 I was slightly worried that this would break the Calxeda implementation
 with ARM_SMMU_OPT_SECURE_CFG_ACCESS, but actually these registers aren't
 even aliased there so I think there's a bigger bug for them.

 Anyway, given that their hardware has gone the way of the dodo, I'll take
 the patch as-is unless you have any further comments?

 Will

 Yeah I agree that this shouldn't affect the (now defunct) Calxeda
 implementation.  I've tested this on some hardware here and we crash
 when we touch that register since it's secure-only (not banked, as you
 mentioned).

 It's not quite dead:

 http://www.eweek.com/servers/calxedas-arm-based-server-chips-re-emerge-with-new-company.html

 But AFAIK, production systems don't enable the SMMU, but someone could
 still want to at some point. A note in the commit log here would be
 nice so it gets recorded.

Actually, as Will mentioned this shouldn't affect Calxeda since this
isn't a banked register.  I think the confusion is from the `S' prefix
in the spec.  The /s/ (lower-case, italic) prefix means that there are
secure and non-secure versions of the register, while the S (upper-case,
non-italic) prefix means this is a secure register (which may or may
not have a banked non-secure counterpart).  This particular register is
an S-only register (there's no non-secure counterpart) so the Calxeda
workaround isn't relevant here, AFAICT.



-Mitch

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] iommu/arm-smmu: don't touch the secure STLBIALL register

2014-12-23 Thread Mitchel Humpherys
Currently we do a STLBIALL when we initialize the SMMU.  However, in
some configurations that register is not supposed to be touched and is
marked as Secure only in the spec.  Rip it out.

Signed-off-by: Mitchel Humpherys mitch...@codeaurora.org
---
 drivers/iommu/arm-smmu.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 60558f794922..9170bbced5e5 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -173,7 +173,6 @@
 #define PIDR2_ARCH_MASK0xf
 
 /* Global TLB invalidation */
-#define ARM_SMMU_GR0_STLBIALL  0x60
 #define ARM_SMMU_GR0_TLBIVMID  0x64
 #define ARM_SMMU_GR0_TLBIALLNSNH   0x68
 #define ARM_SMMU_GR0_TLBIALLH  0x6c
@@ -1686,7 +1685,6 @@ static void arm_smmu_device_reset(struct arm_smmu_device 
*smmu)
}
 
/* Invalidate the TLB, just in case */
-   writel_relaxed(0, gr0_base + ARM_SMMU_GR0_STLBIALL);
writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
 
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v9] iopoll: Introduce memory-mapped IO polling macros

2014-12-15 Thread Mitchel Humpherys
From: Matt Wagantall ma...@codeaurora.org

It is sometimes necessary to poll a memory-mapped register until its value
satisfies some condition. Introduce a family of convenience macros that do
this. Tight-looping, sleeping, and timing out can all be accomplished using
these macros.

Cc: Thierry Reding thierry.red...@gmail.com
Cc: Will Deacon will.dea...@arm.com
Cc: Arnd Bergmann a...@arndb.de
Cc: Andrew Morton a...@linux-foundation.org
Cc: Robert Elliott elli...@hp.com
Signed-off-by: Matt Wagantall ma...@codeaurora.org
Signed-off-by: Mitchel Humpherys mitch...@codeaurora.org
---
v8..v9:
  - Added note in comments about max sleep time (Rob Elliott)

v7..v8:
  - sorted helper macros by size (b, w, l, q)
  - removed some of the more esoteric (or flat-out bogus) helper macros

This patch was originally part of a series [1] for adding support for IOMMU
address translations through an ARM SMMU hardware register.  The other
patch in the series (the one that actually uses these macros and implements
said hardware address translations) was Ack'd by the driver maintainer
there (Will Deacon) so I've pulled this patch out to avoid resending an
already Ack'd patch over and over again.

In short, please see [1] for previous discussion and the first user of
these macros.

Will also acked this patch in [2].  I didn't retain his Ack here because I
added to the macro comments.

[1] http://thread.gmane.org/gmane.linux.kernel.iommu/7140
[2] http://thread.gmane.org/gmane.linux.kernel.iommu/7449

---
 include/linux/iopoll.h | 140 +
 1 file changed, 140 insertions(+)
 create mode 100644 include/linux/iopoll.h

diff --git a/include/linux/iopoll.h b/include/linux/iopoll.h
new file mode 100644
index ..bd161dae2d40
--- /dev/null
+++ b/include/linux/iopoll.h
@@ -0,0 +1,140 @@
+/*
+ * Copyright (c) 2012-2014 The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _LINUX_IOPOLL_H
+#define _LINUX_IOPOLL_H
+
+#include linux/kernel.h
+#include linux/types.h
+#include linux/hrtimer.h
+#include linux/delay.h
+#include linux/errno.h
+#include linux/io.h
+
+/**
+ * readx_poll_timeout - Periodically poll an address until a condition is met 
or a timeout occurs
+ * @op: accessor function (takes @addr as its only argument)
+ * @addr: Address to poll
+ * @val: Variable to read the value into
+ * @cond: Break condition (usually involving @val)
+ * @sleep_us: Maximum time to sleep between reads in us (0 tight-loops)
+ * @timeout_us: Timeout in us, 0 means never timeout
+ *
+ * Returns 0 on success and -ETIMEDOUT upon a timeout. In either
+ * case, the last read value at @addr is stored in @val. Must not
+ * be called from atomic context if sleep_us or timeout_us are used.
+ *
+ * When available, you'll probably want to use one of the specialized
+ * macros defined below rather than this macro directly.
+ */
+#define readx_poll_timeout(op, addr, val, cond, sleep_us, timeout_us)  \
+({ \
+   ktime_t timeout = ktime_add_us(ktime_get(), timeout_us); \
+   might_sleep_if(sleep_us); \
+   for (;;) { \
+   (val) = op(addr); \
+   if (cond) \
+   break; \
+   if (timeout_us  ktime_compare(ktime_get(), timeout)  0) { \
+   (val) = op(addr); \
+   break; \
+   } \
+   if (sleep_us) \
+   usleep_range((sleep_us  2) + 1, sleep_us); \
+   } \
+   (cond) ? 0 : -ETIMEDOUT; \
+})
+
+/**
+ * readx_poll_timeout_atomic - Periodically poll an address until a condition 
is met or a timeout occurs
+ * @op: accessor function (takes @addr as its only argument)
+ * @addr: Address to poll
+ * @val: Variable to read the value into
+ * @cond: Break condition (usually involving @val)
+ * @delay_us: Time to udelay between reads in us (0 tight-loops)
+ * @timeout_us: Timeout in us, 0 means never timeout
+ *
+ * Returns 0 on success and -ETIMEDOUT upon a timeout. In either
+ * case, the last read value at @addr is stored in @val.
+ *
+ * When available, you'll probably want to use one of the specialized
+ * macros defined below rather than this macro directly.
+ */
+#define readx_poll_timeout_atomic(op, addr, val, cond, delay_us, timeout_us) \
+({ \
+   ktime_t timeout = ktime_add_us(ktime_get(), timeout_us); \
+   for (;;) { \
+   (val) = op(addr); \
+   if (cond) \
+   break; \
+   if (timeout_us  ktime_compare(ktime_get(), timeout

Re: [PATCH v9] iopoll: Introduce memory-mapped IO polling macros

2014-12-15 Thread Mitchel Humpherys
On Mon, Dec 15 2014 at 03:31:20 PM, Mitchel Humpherys mitch...@codeaurora.org 
wrote:
 From: Matt Wagantall ma...@codeaurora.org

 It is sometimes necessary to poll a memory-mapped register until its value
 satisfies some condition. Introduce a family of convenience macros that do
 this. Tight-looping, sleeping, and timing out can all be accomplished using
 these macros.

 Cc: Thierry Reding thierry.red...@gmail.com
 Cc: Will Deacon will.dea...@arm.com
 Cc: Arnd Bergmann a...@arndb.de
 Cc: Andrew Morton a...@linux-foundation.org
 Cc: Robert Elliott elli...@hp.com
 Signed-off-by: Matt Wagantall ma...@codeaurora.org
 Signed-off-by: Mitchel Humpherys mitch...@codeaurora.org
 ---
 v8..v9:
   - Added note in comments about max sleep time (Rob Elliott)

Sorry, just noticed that I somehow dropped these additional comments
that Rob requested...  Let me send a v10 that actually includes them.
Please ignore this version.


-Mitch

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v10] iopoll: Introduce memory-mapped IO polling macros

2014-12-15 Thread Mitchel Humpherys
From: Matt Wagantall ma...@codeaurora.org

It is sometimes necessary to poll a memory-mapped register until its value
satisfies some condition. Introduce a family of convenience macros that do
this. Tight-looping, sleeping, and timing out can all be accomplished using
these macros.

Cc: Thierry Reding thierry.red...@gmail.com
Cc: Will Deacon will.dea...@arm.com
Cc: Arnd Bergmann a...@arndb.de
Cc: Andrew Morton a...@linux-foundation.org
Cc: Robert Elliott elli...@hp.com
Signed-off-by: Matt Wagantall ma...@codeaurora.org
Signed-off-by: Mitchel Humpherys mitch...@codeaurora.org
---
v9..10:
  - Actually added the comments mentioned in v8..v9 (doh!)

v8..v9:
  - Added note in comments about max sleep time (Rob Elliott)

v7..v8:
  - sorted helper macros by size (b, w, l, q)
  - removed some of the more esoteric (or flat-out bogus) helper macros

This patch was originally part of a series [1] for adding support for IOMMU
address translations through an ARM SMMU hardware register.  The other
patch in the series (the one that actually uses these macros and implements
said hardware address translations) was Ack'd by the driver maintainer
there (Will Deacon) so I've pulled this patch out to avoid resending an
already Ack'd patch over and over again.

In short, please see [1] for previous discussion and the first user of
these macros.

Will also acked this patch in [2].  I didn't retain his Ack here because I
added to the macro comments.

[1] http://thread.gmane.org/gmane.linux.kernel.iommu/7140
[2] http://thread.gmane.org/gmane.linux.kernel.iommu/7449

---
 include/linux/iopoll.h | 144 +
 1 file changed, 144 insertions(+)
 create mode 100644 include/linux/iopoll.h

diff --git a/include/linux/iopoll.h b/include/linux/iopoll.h
new file mode 100644
index ..08fd52cdb5a0
--- /dev/null
+++ b/include/linux/iopoll.h
@@ -0,0 +1,144 @@
+/*
+ * Copyright (c) 2012-2014 The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _LINUX_IOPOLL_H
+#define _LINUX_IOPOLL_H
+
+#include linux/kernel.h
+#include linux/types.h
+#include linux/hrtimer.h
+#include linux/delay.h
+#include linux/errno.h
+#include linux/io.h
+
+/**
+ * readx_poll_timeout - Periodically poll an address until a condition is met 
or a timeout occurs
+ * @op: accessor function (takes @addr as its only argument)
+ * @addr: Address to poll
+ * @val: Variable to read the value into
+ * @cond: Break condition (usually involving @val)
+ * @sleep_us: Maximum time to sleep between reads in us (0
+ *tight-loops).  Should be less than ~20ms since usleep_range
+ *is used (see Documentation/timers/timers-howto.txt).
+ * @timeout_us: Timeout in us, 0 means never timeout
+ *
+ * Returns 0 on success and -ETIMEDOUT upon a timeout. In either
+ * case, the last read value at @addr is stored in @val. Must not
+ * be called from atomic context if sleep_us or timeout_us are used.
+ *
+ * When available, you'll probably want to use one of the specialized
+ * macros defined below rather than this macro directly.
+ */
+#define readx_poll_timeout(op, addr, val, cond, sleep_us, timeout_us)  \
+({ \
+   ktime_t timeout = ktime_add_us(ktime_get(), timeout_us); \
+   might_sleep_if(sleep_us); \
+   for (;;) { \
+   (val) = op(addr); \
+   if (cond) \
+   break; \
+   if (timeout_us  ktime_compare(ktime_get(), timeout)  0) { \
+   (val) = op(addr); \
+   break; \
+   } \
+   if (sleep_us) \
+   usleep_range((sleep_us  2) + 1, sleep_us); \
+   } \
+   (cond) ? 0 : -ETIMEDOUT; \
+})
+
+/**
+ * readx_poll_timeout_atomic - Periodically poll an address until a condition 
is met or a timeout occurs
+ * @op: accessor function (takes @addr as its only argument)
+ * @addr: Address to poll
+ * @val: Variable to read the value into
+ * @cond: Break condition (usually involving @val)
+ * @delay_us: Time to udelay between reads in us (0 tight-loops).  Should
+ *be less than ~10us since udelay is used (see
+ *Documentation/timers/timers-howto.txt).
+ * @timeout_us: Timeout in us, 0 means never timeout
+ *
+ * Returns 0 on success and -ETIMEDOUT upon a timeout. In either
+ * case, the last read value at @addr is stored in @val.
+ *
+ * When available, you'll probably want to use one of the specialized
+ * macros defined below rather than this macro directly.
+ */
+#define

Re: [PATCH 2/4] iommu: add ARM LPAE page table allocator

2014-12-02 Thread Mitchel Humpherys
On Thu, Nov 27 2014 at 03:51:16 AM, Will Deacon will.dea...@arm.com wrote:
 A number of IOMMUs found in ARM SoCs can walk architecture-compatible
 page tables.

 This patch adds a generic allocator for Stage-1 and Stage-2 v7/v8
 long-descriptor page tables. 4k, 16k and 64k pages are supported, with
 up to 4-levels of walk to cover a 48-bit address space.

 Signed-off-by: Will Deacon will.dea...@arm.com
 ---

[...]

 +static struct io_pgtable *arm_lpae_alloc_pgtable_s1(struct io_pgtable_cfg 
 *cfg,
 + void *cookie)
 +{
 + u64 reg;
 + struct arm_lpae_io_pgtable *data = arm_lpae_alloc_pgtable(cfg);
 +
 + if (!data)
 + return NULL;
 +
 + /* TCR */
 + reg = ARM_LPAE_TCR_EAE |
 +  (ARM_LPAE_TCR_SH_IS  ARM_LPAE_TCR_SH0_SHIFT) |
 +  (ARM_LPAE_TCR_RGN_WBWA  ARM_LPAE_TCR_IRGN0_SHIFT) |
 +  (ARM_LPAE_TCR_RGN_WBWA  ARM_LPAE_TCR_ORGN0_SHIFT);

TCR has different definitions depending on whether we're using v7l or
v8l.  For example, bit 31 is TG1[1] (not EAE) when CBA2R.VA64=1.  Are we
expecting to have an io-pgtable-arm64.c or something?  Seems like that
would be mostly redundant with this file...  (We have this problem in
the current arm-smmu driver today).


-Mitch

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH RESEND v8] iopoll: Introduce memory-mapped IO polling macros

2014-11-24 Thread Mitchel Humpherys
From: Matt Wagantall ma...@codeaurora.org

It is sometimes necessary to poll a memory-mapped register until its value
satisfies some condition. Introduce a family of convenience macros that do
this. Tight-looping, sleeping, and timing out can all be accomplished using
these macros.

Cc: Thierry Reding thierry.red...@gmail.com
Cc: Will Deacon will.dea...@arm.com
Cc: Arnd Bergmann a...@arndb.de
Cc: Andrew Morton a...@linux-foundation.org
Signed-off-by: Matt Wagantall ma...@codeaurora.org
Signed-off-by: Mitchel Humpherys mitch...@codeaurora.org
---
This patch was originally part of a series [1] for adding support for IOMMU
address translations through an ARM SMMU hardware register.  The other
patch in the series (the one that actually uses these macros and implements
said hardware address translations) was Ack'd by the driver maintainer
there (Will Deacon) so I've pulled this patch out to avoid resending an
already Ack'd patch over and over again.

In short, please see [1] for previous discussion and the first user of
these macros.

This patch has been resent previously here [2], here [3], and here [4] on
2014-10-30, 2014-11-06, and 2014-11-17, respectively.  It has not changed
since [2] and has not received any comments since [1] on 2014-10-19.
Thanks to everyone who has taken a look at this.

[1] http://thread.gmane.org/gmane.linux.kernel.iommu/7140
[2] http://thread.gmane.org/gmane.linux.kernel/1818213
[3] http://thread.gmane.org/gmane.linux.kernel/1823422
[4] http://thread.gmane.org/gmane.linux.kernel.iommu/7394


Changes since v7:
  - sorted helper macros by size (b, w, l, q)
  - removed some of the more esoteric (or flat-out bogus) helper macros
---
 include/linux/iopoll.h | 140 +
 1 file changed, 140 insertions(+)
 create mode 100644 include/linux/iopoll.h

diff --git a/include/linux/iopoll.h b/include/linux/iopoll.h
new file mode 100644
index 00..bd161dae2d
--- /dev/null
+++ b/include/linux/iopoll.h
@@ -0,0 +1,140 @@
+/*
+ * Copyright (c) 2012-2014 The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _LINUX_IOPOLL_H
+#define _LINUX_IOPOLL_H
+
+#include linux/kernel.h
+#include linux/types.h
+#include linux/hrtimer.h
+#include linux/delay.h
+#include linux/errno.h
+#include linux/io.h
+
+/**
+ * readx_poll_timeout - Periodically poll an address until a condition is met 
or a timeout occurs
+ * @op: accessor function (takes @addr as its only argument)
+ * @addr: Address to poll
+ * @val: Variable to read the value into
+ * @cond: Break condition (usually involving @val)
+ * @sleep_us: Maximum time to sleep between reads in us (0 tight-loops)
+ * @timeout_us: Timeout in us, 0 means never timeout
+ *
+ * Returns 0 on success and -ETIMEDOUT upon a timeout. In either
+ * case, the last read value at @addr is stored in @val. Must not
+ * be called from atomic context if sleep_us or timeout_us are used.
+ *
+ * When available, you'll probably want to use one of the specialized
+ * macros defined below rather than this macro directly.
+ */
+#define readx_poll_timeout(op, addr, val, cond, sleep_us, timeout_us)  \
+({ \
+   ktime_t timeout = ktime_add_us(ktime_get(), timeout_us); \
+   might_sleep_if(sleep_us); \
+   for (;;) { \
+   (val) = op(addr); \
+   if (cond) \
+   break; \
+   if (timeout_us  ktime_compare(ktime_get(), timeout)  0) { \
+   (val) = op(addr); \
+   break; \
+   } \
+   if (sleep_us) \
+   usleep_range((sleep_us  2) + 1, sleep_us); \
+   } \
+   (cond) ? 0 : -ETIMEDOUT; \
+})
+
+/**
+ * readx_poll_timeout_atomic - Periodically poll an address until a condition 
is met or a timeout occurs
+ * @op: accessor function (takes @addr as its only argument)
+ * @addr: Address to poll
+ * @val: Variable to read the value into
+ * @cond: Break condition (usually involving @val)
+ * @delay_us: Time to udelay between reads in us (0 tight-loops)
+ * @timeout_us: Timeout in us, 0 means never timeout
+ *
+ * Returns 0 on success and -ETIMEDOUT upon a timeout. In either
+ * case, the last read value at @addr is stored in @val.
+ *
+ * When available, you'll probably want to use one of the specialized
+ * macros defined below rather than this macro directly.
+ */
+#define readx_poll_timeout_atomic(op, addr, val, cond, delay_us, timeout_us) \
+({ \
+   ktime_t timeout = ktime_add_us(ktime_get(), timeout_us

Re: [PATCH RESEND v8] iopoll: Introduce memory-mapped IO polling macros

2014-11-24 Thread Mitchel Humpherys
On Mon, Nov 24 2014 at 04:53:19 PM, Elliott, Robert (Server Storage) 
elli...@hp.com wrote:
 -Original Message-
 From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel-
 ow...@vger.kernel.org] On Behalf Of Mitchel Humpherys
 Sent: Monday, 24 November, 2014 2:15 PM
 ...
 From: Matt Wagantall ma...@codeaurora.org
 
 It is sometimes necessary to poll a memory-mapped register until its value
 satisfies some condition. Introduce a family of convenience macros that do
 this. Tight-looping, sleeping, and timing out can all be accomplished
 using these macros.
 
 ...
 +#define readx_poll_timeout(op, addr, val, cond, sleep_us, timeout_us)   
 \
 +({ \
 +ktime_t timeout = ktime_add_us(ktime_get(), timeout_us); \
 +might_sleep_if(sleep_us); \
 +for (;;) { \
 +(val) = op(addr); \
 +if (cond) \
 +break; \
 +if (timeout_us  ktime_compare(ktime_get(), timeout)  0) {
 \
 +(val) = op(addr); \
 +break; \
 +} \
 +if (sleep_us) \
 +usleep_range((sleep_us  2) + 1, sleep_us); \

 The hpsa SCSI driver used to use usleep_range in a loop like 
 that, but we found that it caused scheduler problems during
 boots because it uses TASK_UNINTERRUPTIBLE:
 [9.260668] [sched_delayed] sched: RT throttling activated

 msleep() worked much better.

Hmm, maybe you were just sleeping for too long?  According to
Documentation/timers/timers-howto.txt, usleep_range is what should be
used for non-atomic sleeps in the range [10us, 20ms].  Plus we need
microsecond granularity anyways, so msleep wouldn't cut it.

If there are any potential users of these macros that would want to
sleep for more than 20ms I guess we could add a special case here to use
msleep when sleep_us exceeds 20,000 or so.


-Mitch

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH RESEND v8] iopoll: Introduce memory-mapped IO polling macros

2014-11-17 Thread Mitchel Humpherys
From: Matt Wagantall ma...@codeaurora.org

It is sometimes necessary to poll a memory-mapped register until its value
satisfies some condition. Introduce a family of convenience macros that do
this. Tight-looping, sleeping, and timing out can all be accomplished using
these macros.

Cc: Thierry Reding thierry.red...@gmail.com
Cc: Will Deacon will.dea...@arm.com
Cc: Arnd Bergmann a...@arndb.de
Signed-off-by: Matt Wagantall ma...@codeaurora.org
Signed-off-by: Mitchel Humpherys mitch...@codeaurora.org
---
Sorry for any confusion regarding the genesis of this patch.  Let me try to
clarify the history here.  This patch was originally part of a series [1]
for adding support for IOMMU address translations through an ARM SMMU
hardware register.  The other patch in the series (the one that actually
uses these macros and implements said hardware address translations) was
Ack'd by the driver maintainer there (Will Deacon) so I've pulled this
patch out to avoid resending an already Ack'd patch over and over again.

In short, please see [1] for previous discussion and the first user of
these macros.

[1] http://thread.gmane.org/gmane.linux.kernel.iommu/7140

Changes since v7:
  - sorted helper macros by size (b, w, l, q)
  - removed some of the more esoteric (or flat-out bogus) helper macros
---
 include/linux/iopoll.h | 140 +
 1 file changed, 140 insertions(+)
 create mode 100644 include/linux/iopoll.h

diff --git a/include/linux/iopoll.h b/include/linux/iopoll.h
new file mode 100644
index 00..bd161dae2d
--- /dev/null
+++ b/include/linux/iopoll.h
@@ -0,0 +1,140 @@
+/*
+ * Copyright (c) 2012-2014 The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _LINUX_IOPOLL_H
+#define _LINUX_IOPOLL_H
+
+#include linux/kernel.h
+#include linux/types.h
+#include linux/hrtimer.h
+#include linux/delay.h
+#include linux/errno.h
+#include linux/io.h
+
+/**
+ * readx_poll_timeout - Periodically poll an address until a condition is met 
or a timeout occurs
+ * @op: accessor function (takes @addr as its only argument)
+ * @addr: Address to poll
+ * @val: Variable to read the value into
+ * @cond: Break condition (usually involving @val)
+ * @sleep_us: Maximum time to sleep between reads in us (0 tight-loops)
+ * @timeout_us: Timeout in us, 0 means never timeout
+ *
+ * Returns 0 on success and -ETIMEDOUT upon a timeout. In either
+ * case, the last read value at @addr is stored in @val. Must not
+ * be called from atomic context if sleep_us or timeout_us are used.
+ *
+ * When available, you'll probably want to use one of the specialized
+ * macros defined below rather than this macro directly.
+ */
+#define readx_poll_timeout(op, addr, val, cond, sleep_us, timeout_us)  \
+({ \
+   ktime_t timeout = ktime_add_us(ktime_get(), timeout_us); \
+   might_sleep_if(sleep_us); \
+   for (;;) { \
+   (val) = op(addr); \
+   if (cond) \
+   break; \
+   if (timeout_us  ktime_compare(ktime_get(), timeout)  0) { \
+   (val) = op(addr); \
+   break; \
+   } \
+   if (sleep_us) \
+   usleep_range((sleep_us  2) + 1, sleep_us); \
+   } \
+   (cond) ? 0 : -ETIMEDOUT; \
+})
+
+/**
+ * readx_poll_timeout_atomic - Periodically poll an address until a condition 
is met or a timeout occurs
+ * @op: accessor function (takes @addr as its only argument)
+ * @addr: Address to poll
+ * @val: Variable to read the value into
+ * @cond: Break condition (usually involving @val)
+ * @delay_us: Time to udelay between reads in us (0 tight-loops)
+ * @timeout_us: Timeout in us, 0 means never timeout
+ *
+ * Returns 0 on success and -ETIMEDOUT upon a timeout. In either
+ * case, the last read value at @addr is stored in @val.
+ *
+ * When available, you'll probably want to use one of the specialized
+ * macros defined below rather than this macro directly.
+ */
+#define readx_poll_timeout_atomic(op, addr, val, cond, delay_us, timeout_us) \
+({ \
+   ktime_t timeout = ktime_add_us(ktime_get(), timeout_us); \
+   for (;;) { \
+   (val) = op(addr); \
+   if (cond) \
+   break; \
+   if (timeout_us  ktime_compare(ktime_get(), timeout)  0) { \
+   (val) = op(addr); \
+   break; \
+   } \
+   if (delay_us) \
+   udelay(delay_us

Re: [PATCH 5/6] iommu/arm-smmu: support buggy implementations with invalidate-on-map

2014-11-14 Thread Mitchel Humpherys
On Thu, Nov 13 2014 at 01:48:26 AM, Will Deacon will.dea...@arm.com wrote:
 Ha, damn, then I don't have a user of the shiny new quirks field I added!
 I don't think I'll go as far as removing it altogether though...

I'm sure we'll be making liberal use of that field soon enough ;)



-Mitch

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 5/6] iommu/arm-smmu: support buggy implementations with invalidate-on-map

2014-11-12 Thread Mitchel Humpherys
On Wed, Nov 12 2014 at 10:26:43 AM, Will Deacon will.dea...@arm.com wrote:
 Hi Mitch,

 On Wed, Aug 13, 2014 at 01:51:38AM +0100, Mitchel Humpherys wrote:
 Add a workaround for some buggy hardware that requires a TLB invalidate
 operation to occur at map time. Activate the feature with the
 qcom,smmu-invalidate-on-map boolean DT property.

 I'm digging up an old thread here, but I've been working on a new page-table
 allocator for the SMMU and looked into implementing this workaround for you
 in there. When I do the TLBI on map after installing the new PTE, can I just
 invalidate the range mapped by that PTE, or does it need to be a full TLBI?

I'm not totally sure on the history of the hardware errata but I believe
it's just the range mapped by that pte.  We use SMMU_CBn_TLBIVA in the
our smmu driver.

However, let's actually just drop this...  It's looking like the targets
we have that will use the arm-smmu driver thankfully won't need this
workaround.  Thanks for keeping this in mind though :)


-Mitch

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v7 1/2] iopoll: Introduce memory-mapped IO polling macros

2014-10-30 Thread Mitchel Humpherys
On Thu, Oct 30 2014 at 05:00:23 AM, Arnd Bergmann a...@arndb.de wrote:
 On Thursday 30 October 2014 11:41:00 Will Deacon wrote:
  +
  +#define readl_poll_timeout(addr, val, cond, delay_us, timeout_us) \
  +  readx_poll_timeout(readl, addr, val, cond, delay_us, timeout_us)
  +
  +#define readl_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
  +  readx_poll_timeout_atomic(readl, addr, val, cond, delay_us, timeout_us)
  +
  +#define readb_poll_timeout(addr, val, cond, delay_us, timeout_us) \
  +  readx_poll_timeout(readb, addr, val, cond, delay_us, timeout_us)
  +
  +#define readb_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
  +  readx_poll_timeout_atomic(readb, addr, val, cond, delay_us, timeout_us)
  +
  +#define readw_poll_timeout(addr, val, cond, delay_us, timeout_us) \
  +  readx_poll_timeout(readw, addr, val, cond, delay_us, timeout_us)
  +
  +#define readw_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
  +  readx_poll_timeout_atomic(readw, addr, val, cond, delay_us, timeout_us)
  +
  +#define readq_poll_timeout(addr, val, cond, delay_us, timeout_us) \
  +  readx_poll_timeout(readq, addr, val, cond, delay_us, timeout_us)
  +
  +#define readq_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
  +  readx_poll_timeout_atomic(readq, addr, val, cond, delay_us, timeout_us)

 Sort these by size (b, w, l, q) maybe?

Sure


  +#define ioread32_poll_timeout(addr, val, cond, delay_us, timeout_us) \
  +  readx_poll_timeout(ioread32, addr, val, cond, delay_us, timeout_us)
  +
  +#define ioread32_poll_timeout_atomic(addr, val, cond, delay_us, 
  timeout_us) \
  +  readx_poll_timeout_atomic(ioread32, addr, val, cond, delay_us, 
  timeout_us)
  +
  +#define ioread32b3_poll_timeout(addr, val, cond, delay_us, timeout_us) \
  +  readx_poll_timeout(ioread32b3, addr, val, cond, delay_us, timeout_us)
  +
  +#define ioread32b3_poll_timeout_atomic(addr, val, cond, delay_us, 
  timeout_us) \
  +  readx_poll_timeout_atomic(ioread32b3, addr, val, cond, delay_us, 
  timeout_us)

 What is ioread32b3?

Looks like it's a... typo!  It was supposed to be ioread32be.


  +#define inb_poll_timeout(addr, val, cond, delay_us, timeout_us) \
  +  readx_poll_timeout(inb, addr, val, cond, delay_us, timeout_us)
  +
  +#define inb_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
  +  readx_poll_timeout_atomic(inb, addr, val, cond, delay_us, timeout_us)
  +
  +#define inb_p_poll_timeout(addr, val, cond, delay_us, timeout_us) \
  +  readx_poll_timeout(inb_p, addr, val, cond, delay_us, timeout_us)
  +
  +#define inb_p_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
  +  readx_poll_timeout_atomic(inb_p, addr, val, cond, delay_us, timeout_us)

 I would leave out the _p variants, they are very rarely used anyway.

 Looking at the long list, I wonder if we should really define each variant,
 or just expect drivers to call readx_poll_timeout{,_atomic} directly and
 pass whichever accessor they want.

That sounds reasonable although I think we'd at least want to include
the readX family of functions.


-Mitch

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v7 2/2] iommu/arm-smmu: add support for iova_to_phys through ATS1PR

2014-10-29 Thread Mitchel Humpherys
Currently, we provide the iommu_ops.iova_to_phys service by doing a
table walk in software to translate IO virtual addresses to physical
addresses. On SMMUs that support it, it can be useful to ask the SMMU
itself to do the translation. This can be used to warm the TLBs for an
SMMU. It can also be useful for testing and hardware validation.

Since the address translation registers are optional on SMMUv2, only
enable hardware translations when using SMMUv1 or when SMMU_IDR0.S1TS=1
and SMMU_IDR0.ATOSNS=0, as described in the ARM SMMU v1-v2 spec.

Signed-off-by: Mitchel Humpherys mitch...@codeaurora.org
---
Changes since v6:
  - added missing lock
  - fixed physical address mask
---
 drivers/iommu/arm-smmu.c | 80 +++-
 1 file changed, 79 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 60558f7949..c6f96ba3b1 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -36,6 +36,7 @@
 #include linux/interrupt.h
 #include linux/io.h
 #include linux/iommu.h
+#include linux/iopoll.h
 #include linux/mm.h
 #include linux/module.h
 #include linux/of.h
@@ -140,6 +141,7 @@
 #define ID0_S2TS   (1  29)
 #define ID0_NTS(1  28)
 #define ID0_SMS(1  27)
+#define ID0_ATOSNS (1  26)
 #define ID0_PTFS_SHIFT 24
 #define ID0_PTFS_MASK  0x2
 #define ID0_PTFS_V8_ONLY   0x2
@@ -233,11 +235,16 @@
 #define ARM_SMMU_CB_TTBR0_HI   0x24
 #define ARM_SMMU_CB_TTBCR  0x30
 #define ARM_SMMU_CB_S1_MAIR0   0x38
+#define ARM_SMMU_CB_PAR_LO 0x50
+#define ARM_SMMU_CB_PAR_HI 0x54
 #define ARM_SMMU_CB_FSR0x58
 #define ARM_SMMU_CB_FAR_LO 0x60
 #define ARM_SMMU_CB_FAR_HI 0x64
 #define ARM_SMMU_CB_FSYNR0 0x68
 #define ARM_SMMU_CB_S1_TLBIASID0x610
+#define ARM_SMMU_CB_ATS1PR_LO  0x800
+#define ARM_SMMU_CB_ATS1PR_HI  0x804
+#define ARM_SMMU_CB_ATSR   0x8f0
 
 #define SCTLR_S1_ASIDPNE   (1  12)
 #define SCTLR_CFCFG(1  7)
@@ -249,6 +256,10 @@
 #define SCTLR_M(1  0)
 #define SCTLR_EAE_SBOP (SCTLR_AFE | SCTLR_TRE)
 
+#define CB_PAR_F   (1  0)
+
+#define ATSR_ACTIVE(1  0)
+
 #define RESUME_RETRY   (0  0)
 #define RESUME_TERMINATE   (1  0)
 
@@ -366,6 +377,7 @@ struct arm_smmu_device {
 #define ARM_SMMU_FEAT_TRANS_S1 (1  2)
 #define ARM_SMMU_FEAT_TRANS_S2 (1  3)
 #define ARM_SMMU_FEAT_TRANS_NESTED (1  4)
+#define ARM_SMMU_FEAT_TRANS_OPS(1  5)
u32 features;
 
 #define ARM_SMMU_OPT_SECURE_CFG_ACCESS (1  0)
@@ -1524,7 +1536,7 @@ static size_t arm_smmu_unmap(struct iommu_domain *domain, 
unsigned long iova,
return ret ? 0 : size;
 }
 
-static phys_addr_t arm_smmu_iova_to_phys(struct iommu_domain *domain,
+static phys_addr_t arm_smmu_iova_to_phys_soft(struct iommu_domain *domain,
 dma_addr_t iova)
 {
pgd_t *pgdp, pgd;
@@ -1557,6 +1569,67 @@ static phys_addr_t arm_smmu_iova_to_phys(struct 
iommu_domain *domain,
return __pfn_to_phys(pte_pfn(pte)) | (iova  ~PAGE_MASK);
 }
 
+static phys_addr_t arm_smmu_iova_to_phys_hard(struct iommu_domain *domain,
+   dma_addr_t iova)
+{
+   struct arm_smmu_domain *smmu_domain = domain-priv;
+   struct arm_smmu_device *smmu = smmu_domain-smmu;
+   struct arm_smmu_cfg *cfg = smmu_domain-cfg;
+   struct device *dev = smmu-dev;
+   void __iomem *cb_base;
+   u32 tmp;
+   u64 phys;
+   unsigned long flags;
+
+   cb_base = ARM_SMMU_CB_BASE(smmu) + ARM_SMMU_CB(smmu, cfg-cbndx);
+
+   spin_lock_irqsave(smmu_domain-lock, flags);
+
+   if (smmu-version == 1) {
+   u32 reg = iova  ~0xfff;
+   writel_relaxed(reg, cb_base + ARM_SMMU_CB_ATS1PR_LO);
+   } else {
+   u32 reg = iova  ~0xfff;
+   writel_relaxed(reg, cb_base + ARM_SMMU_CB_ATS1PR_LO);
+   reg = (iova  ~0xfff)  32;
+   writel_relaxed(reg, cb_base + ARM_SMMU_CB_ATS1PR_HI);
+   }
+
+   if (readl_poll_timeout_atomic(cb_base + ARM_SMMU_CB_ATSR, tmp,
+   !(tmp  ATSR_ACTIVE), 5, 50)) {
+   spin_unlock_irqrestore(smmu_domain-lock, flags);
+   dev_err(dev,
+   iova to phys timed out on 0x%pa. Falling back to 
software table walk.\n,
+   iova);
+   return arm_smmu_iova_to_phys_soft(domain, iova);
+   }
+
+   phys = readl_relaxed(cb_base + ARM_SMMU_CB_PAR_LO);
+   phys |= ((u64) readl_relaxed(cb_base + ARM_SMMU_CB_PAR_HI))  32

[PATCH v7 1/2] iopoll: Introduce memory-mapped IO polling macros

2014-10-29 Thread Mitchel Humpherys
From: Matt Wagantall ma...@codeaurora.org

It is sometimes necessary to poll a memory-mapped register until its value
satisfies some condition. Introduce a family of convenience macros that do
this. Tight-looping, sleeping, and timing out can all be accomplished using
these macros.

Cc: Thierry Reding thierry.red...@gmail.com
Cc: Will Deacon will.dea...@arm.com
Signed-off-by: Matt Wagantall ma...@codeaurora.org
Signed-off-by: Mitchel Humpherys mitch...@codeaurora.org
---
Changes since v6:
  - No changes. Resending due to changes in the the next patch in the series.
---
 include/linux/iopoll.h | 213 +
 1 file changed, 213 insertions(+)
 create mode 100644 include/linux/iopoll.h

diff --git a/include/linux/iopoll.h b/include/linux/iopoll.h
new file mode 100644
index 00..21dd41942b
--- /dev/null
+++ b/include/linux/iopoll.h
@@ -0,0 +1,213 @@
+/*
+ * Copyright (c) 2012-2014 The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _LINUX_IOPOLL_H
+#define _LINUX_IOPOLL_H
+
+#include linux/kernel.h
+#include linux/types.h
+#include linux/hrtimer.h
+#include linux/delay.h
+#include linux/errno.h
+#include linux/io.h
+
+/**
+ * readx_poll_timeout - Periodically poll an address until a condition is met 
or a timeout occurs
+ * @op: accessor function (takes @addr as its only argument)
+ * @addr: Address to poll
+ * @val: Variable to read the value into
+ * @cond: Break condition (usually involving @val)
+ * @sleep_us: Maximum time to sleep between reads in us (0 tight-loops)
+ * @timeout_us: Timeout in us, 0 means never timeout
+ *
+ * Returns 0 on success and -ETIMEDOUT upon a timeout. In either
+ * case, the last read value at @addr is stored in @val. Must not
+ * be called from atomic context if sleep_us or timeout_us are used.
+ *
+ * Generally you'll want to use one of the specialized macros defined below
+ * rather than this macro directly.
+ */
+#define readx_poll_timeout(op, addr, val, cond, sleep_us, timeout_us)  \
+({ \
+   ktime_t timeout = ktime_add_us(ktime_get(), timeout_us); \
+   might_sleep_if(sleep_us); \
+   for (;;) { \
+   (val) = op(addr); \
+   if (cond) \
+   break; \
+   if (timeout_us  ktime_compare(ktime_get(), timeout)  0) { \
+   (val) = op(addr); \
+   break; \
+   } \
+   if (sleep_us) \
+   usleep_range((sleep_us  2) + 1, sleep_us); \
+   } \
+   (cond) ? 0 : -ETIMEDOUT; \
+})
+
+/**
+ * readx_poll_timeout_atomic - Periodically poll an address until a condition 
is met or a timeout occurs
+ * @op: accessor function (takes @addr as its only argument)
+ * @addr: Address to poll
+ * @val: Variable to read the value into
+ * @cond: Break condition (usually involving @val)
+ * @delay_us: Time to udelay between reads in us (0 tight-loops)
+ * @timeout_us: Timeout in us, 0 means never timeout
+ *
+ * Returns 0 on success and -ETIMEDOUT upon a timeout. In either
+ * case, the last read value at @addr is stored in @val.
+ *
+ * Generally you'll want to use one of the specialized macros defined below
+ * rather than this macro directly.
+ */
+#define readx_poll_timeout_atomic(op, addr, val, cond, delay_us, timeout_us) \
+({ \
+   ktime_t timeout = ktime_add_us(ktime_get(), timeout_us); \
+   for (;;) { \
+   (val) = op(addr); \
+   if (cond) \
+   break; \
+   if (timeout_us  ktime_compare(ktime_get(), timeout)  0) { \
+   (val) = op(addr); \
+   break; \
+   } \
+   if (delay_us) \
+   udelay(delay_us);   \
+   } \
+   (cond) ? 0 : -ETIMEDOUT; \
+})
+
+
+#define readl_poll_timeout(addr, val, cond, delay_us, timeout_us) \
+   readx_poll_timeout(readl, addr, val, cond, delay_us, timeout_us)
+
+#define readl_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
+   readx_poll_timeout_atomic(readl, addr, val, cond, delay_us, timeout_us)
+
+#define readb_poll_timeout(addr, val, cond, delay_us, timeout_us) \
+   readx_poll_timeout(readb, addr, val, cond, delay_us, timeout_us)
+
+#define readb_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
+   readx_poll_timeout_atomic(readb, addr, val, cond, delay_us, timeout_us)
+
+#define readw_poll_timeout(addr, val, cond, delay_us, timeout_us) \
+   readx_poll_timeout(readw, addr, val

[PATCH v7 0/2] iommu/arm-smmu: hard iova_to_phys

2014-10-29 Thread Mitchel Humpherys
This series introduces support for performing iova-to-phys translations via
the ARM SMMU hardware on supported implementations. We also make use of
some new generic macros for polling hardware registers.

v6..v7:

  - iopoll: no changes. resending series due to arm-smmu change.
  - arm-smmu: added missing lock and fixed physical address mask

v5..v6:

  - iopoll: use shift instead of divide
  - arm-smmu: no changes, resending series due to iopoll change.

v4..v5:

  - iopoll: Added support for other accessor functions
  - iopoll: Unified atomic and non-atomic interfaces
  - iopoll: Fixed erroneous `might_sleep'
  - arm-smmu: Lowered timeout and moved to new iopoll atomic interface

v3..v4:

  - Updated the iopoll commit message to reflect the patch better
  - Added locking around address translation op
  - Return 0 on iova_to_phys failure

v2..v3:

  - Removed unnecessary `dev_name's

v1..v2:

  - Renamed one of the iopoll macros to use the more standard `_atomic'
suffix
  - Removed some convenience iopoll wrappers to encourage explicitness


Matt Wagantall (1):
  iopoll: Introduce memory-mapped IO polling macros

Mitchel Humpherys (1):
  iommu/arm-smmu: add support for iova_to_phys through ATS1PR

 drivers/iommu/arm-smmu.c |  80 +-
 include/linux/iopoll.h   | 213 +++
 2 files changed, 292 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/iopoll.h

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v6 1/2] iopoll: Introduce memory-mapped IO polling macros

2014-10-21 Thread Mitchel Humpherys
On Tue, Oct 14 2014 at 02:53:29 PM, Mitchel Humpherys mitch...@codeaurora.org 
wrote:
 From: Matt Wagantall ma...@codeaurora.org

 It is sometimes necessary to poll a memory-mapped register until its value
 satisfies some condition. Introduce a family of convenience macros that do
 this. Tight-looping, sleeping, and timing out can all be accomplished using
 these macros.

 Cc: Thierry Reding thierry.red...@gmail.com
 Cc: Will Deacon will.dea...@arm.com
 Signed-off-by: Matt Wagantall ma...@codeaurora.org
 Signed-off-by: Mitchel Humpherys mitch...@codeaurora.org
 ---
 Changes since v5:
   - Use a shift instead of a divide in the poll loop.
 ---
  include/linux/iopoll.h | 213 
 +
  1 file changed, 213 insertions(+)
  create mode 100644 include/linux/iopoll.h

I realize I sent this at a bad time (ELCE) but were there any more
comments on this patch?


-Mitch

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 0/2] iommu/arm-smmu: hard iova_to_phys

2014-10-14 Thread Mitchel Humpherys
This series introduces support for performing iova-to-phys translations via
the ARM SMMU hardware on supported implementations. We also make use of
some new generic macros for polling hardware registers.

v5..v6:

  - iopoll: use shift instead of divide
  - arm-smmu: no changes, resending series due to iopoll change.

v4..v5:

  - iopoll: Added support for other accessor functions
  - iopoll: Unified atomic and non-atomic interfaces
  - iopoll: Fixed erroneous `might_sleep'
  - arm-smmu: Lowered timeout and moved to new iopoll atomic interface

v3..v4:

  - Updated the iopoll commit message to reflect the patch better
  - Added locking around address translation op
  - Return 0 on iova_to_phys failure

v2..v3:

  - Removed unnecessary `dev_name's

v1..v2:

  - Renamed one of the iopoll macros to use the more standard `_atomic'
suffix
  - Removed some convenience iopoll wrappers to encourage explicitness


Matt Wagantall (1):
  iopoll: Introduce memory-mapped IO polling macros

Mitchel Humpherys (1):
  iommu/arm-smmu: add support for iova_to_phys through ATS1PR

 drivers/iommu/arm-smmu.c |  79 +-
 include/linux/iopoll.h   | 213 +++
 2 files changed, 291 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/iopoll.h

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 2/2] iommu/arm-smmu: add support for iova_to_phys through ATS1PR

2014-10-14 Thread Mitchel Humpherys
Currently, we provide the iommu_ops.iova_to_phys service by doing a
table walk in software to translate IO virtual addresses to physical
addresses. On SMMUs that support it, it can be useful to ask the SMMU
itself to do the translation. This can be used to warm the TLBs for an
SMMU. It can also be useful for testing and hardware validation.

Since the address translation registers are optional on SMMUv2, only
enable hardware translations when using SMMUv1 or when SMMU_IDR0.S1TS=1
and SMMU_IDR0.ATOSNS=0, as described in the ARM SMMU v1-v2 spec.

Signed-off-by: Mitchel Humpherys mitch...@codeaurora.org
---
Changes since v5:
  - None. Re-sending series due to change in patch 1/2 in series
---
 drivers/iommu/arm-smmu.c | 79 +++-
 1 file changed, 78 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 37dc3dd0df..ef57043994 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -36,6 +36,7 @@
 #include linux/interrupt.h
 #include linux/io.h
 #include linux/iommu.h
+#include linux/iopoll.h
 #include linux/mm.h
 #include linux/module.h
 #include linux/of.h
@@ -140,6 +141,7 @@
 #define ID0_S2TS   (1  29)
 #define ID0_NTS(1  28)
 #define ID0_SMS(1  27)
+#define ID0_ATOSNS (1  26)
 #define ID0_PTFS_SHIFT 24
 #define ID0_PTFS_MASK  0x2
 #define ID0_PTFS_V8_ONLY   0x2
@@ -233,11 +235,16 @@
 #define ARM_SMMU_CB_TTBR0_HI   0x24
 #define ARM_SMMU_CB_TTBCR  0x30
 #define ARM_SMMU_CB_S1_MAIR0   0x38
+#define ARM_SMMU_CB_PAR_LO 0x50
+#define ARM_SMMU_CB_PAR_HI 0x54
 #define ARM_SMMU_CB_FSR0x58
 #define ARM_SMMU_CB_FAR_LO 0x60
 #define ARM_SMMU_CB_FAR_HI 0x64
 #define ARM_SMMU_CB_FSYNR0 0x68
 #define ARM_SMMU_CB_S1_TLBIASID0x610
+#define ARM_SMMU_CB_ATS1PR_LO  0x800
+#define ARM_SMMU_CB_ATS1PR_HI  0x804
+#define ARM_SMMU_CB_ATSR   0x8f0
 
 #define SCTLR_S1_ASIDPNE   (1  12)
 #define SCTLR_CFCFG(1  7)
@@ -249,6 +256,10 @@
 #define SCTLR_M(1  0)
 #define SCTLR_EAE_SBOP (SCTLR_AFE | SCTLR_TRE)
 
+#define CB_PAR_F   (1  0)
+
+#define ATSR_ACTIVE(1  0)
+
 #define RESUME_RETRY   (0  0)
 #define RESUME_TERMINATE   (1  0)
 
@@ -366,6 +377,7 @@ struct arm_smmu_device {
 #define ARM_SMMU_FEAT_TRANS_S1 (1  2)
 #define ARM_SMMU_FEAT_TRANS_S2 (1  3)
 #define ARM_SMMU_FEAT_TRANS_NESTED (1  4)
+#define ARM_SMMU_FEAT_TRANS_OPS(1  5)
u32 features;
 
 #define ARM_SMMU_OPT_SECURE_CFG_ACCESS (1  0)
@@ -1524,7 +1536,7 @@ static size_t arm_smmu_unmap(struct iommu_domain *domain, 
unsigned long iova,
return ret ? 0 : size;
 }
 
-static phys_addr_t arm_smmu_iova_to_phys(struct iommu_domain *domain,
+static phys_addr_t arm_smmu_iova_to_phys_soft(struct iommu_domain *domain,
 dma_addr_t iova)
 {
pgd_t *pgdp, pgd;
@@ -1557,6 +1569,66 @@ static phys_addr_t arm_smmu_iova_to_phys(struct 
iommu_domain *domain,
return __pfn_to_phys(pte_pfn(pte)) | (iova  ~PAGE_MASK);
 }
 
+static phys_addr_t arm_smmu_iova_to_phys_hard(struct iommu_domain *domain,
+   dma_addr_t iova)
+{
+   struct arm_smmu_domain *smmu_domain = domain-priv;
+   struct arm_smmu_device *smmu = smmu_domain-smmu;
+   struct arm_smmu_cfg *cfg = smmu_domain-cfg;
+   struct device *dev = smmu-dev;
+   void __iomem *cb_base;
+   u32 tmp;
+   u64 phys;
+   unsigned long flags;
+
+   cb_base = ARM_SMMU_CB_BASE(smmu) + ARM_SMMU_CB(smmu, cfg-cbndx);
+
+   spin_lock_irqsave(smmu_domain-lock, flags);
+
+   if (smmu-version == 1) {
+   u32 reg = iova  ~0xfff;
+   writel_relaxed(reg, cb_base + ARM_SMMU_CB_ATS1PR_LO);
+   } else {
+   u32 reg = iova  ~0xfff;
+   writel_relaxed(reg, cb_base + ARM_SMMU_CB_ATS1PR_LO);
+   reg = (iova  ~0xfff)  32;
+   writel_relaxed(reg, cb_base + ARM_SMMU_CB_ATS1PR_HI);
+   }
+
+   if (readl_poll_timeout_atomic(cb_base + ARM_SMMU_CB_ATSR, tmp,
+   !(tmp  ATSR_ACTIVE), 5, 50)) {
+   dev_err(dev,
+   iova to phys timed out on 0x%pa. Falling back to 
software table walk.\n,
+   iova);
+   return arm_smmu_iova_to_phys_soft(domain, iova);
+   }
+
+   phys = readl_relaxed(cb_base + ARM_SMMU_CB_PAR_LO);
+   phys |= ((u64) readl_relaxed(cb_base + ARM_SMMU_CB_PAR_HI))  32;
+
+   spin_unlock_irqrestore(smmu_domain-lock, flags

[PATCH v5 2/2] iommu/arm-smmu: add support for iova_to_phys through ATS1PR

2014-10-10 Thread Mitchel Humpherys
Currently, we provide the iommu_ops.iova_to_phys service by doing a
table walk in software to translate IO virtual addresses to physical
addresses. On SMMUs that support it, it can be useful to ask the SMMU
itself to do the translation. This can be used to warm the TLBs for an
SMMU. It can also be useful for testing and hardware validation.

Since the address translation registers are optional on SMMUv2, only
enable hardware translations when using SMMUv1 or when SMMU_IDR0.S1TS=1
and SMMU_IDR0.ATOSNS=0, as described in the ARM SMMU v1-v2 spec.

Signed-off-by: Mitchel Humpherys mitch...@codeaurora.org
---
Changes since v4:
  - Lowered timeout and moved to new iopoll atomic interface
---
 drivers/iommu/arm-smmu.c | 79 +++-
 1 file changed, 78 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 37dc3dd0df..ef57043994 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -36,6 +36,7 @@
 #include linux/interrupt.h
 #include linux/io.h
 #include linux/iommu.h
+#include linux/iopoll.h
 #include linux/mm.h
 #include linux/module.h
 #include linux/of.h
@@ -140,6 +141,7 @@
 #define ID0_S2TS   (1  29)
 #define ID0_NTS(1  28)
 #define ID0_SMS(1  27)
+#define ID0_ATOSNS (1  26)
 #define ID0_PTFS_SHIFT 24
 #define ID0_PTFS_MASK  0x2
 #define ID0_PTFS_V8_ONLY   0x2
@@ -233,11 +235,16 @@
 #define ARM_SMMU_CB_TTBR0_HI   0x24
 #define ARM_SMMU_CB_TTBCR  0x30
 #define ARM_SMMU_CB_S1_MAIR0   0x38
+#define ARM_SMMU_CB_PAR_LO 0x50
+#define ARM_SMMU_CB_PAR_HI 0x54
 #define ARM_SMMU_CB_FSR0x58
 #define ARM_SMMU_CB_FAR_LO 0x60
 #define ARM_SMMU_CB_FAR_HI 0x64
 #define ARM_SMMU_CB_FSYNR0 0x68
 #define ARM_SMMU_CB_S1_TLBIASID0x610
+#define ARM_SMMU_CB_ATS1PR_LO  0x800
+#define ARM_SMMU_CB_ATS1PR_HI  0x804
+#define ARM_SMMU_CB_ATSR   0x8f0
 
 #define SCTLR_S1_ASIDPNE   (1  12)
 #define SCTLR_CFCFG(1  7)
@@ -249,6 +256,10 @@
 #define SCTLR_M(1  0)
 #define SCTLR_EAE_SBOP (SCTLR_AFE | SCTLR_TRE)
 
+#define CB_PAR_F   (1  0)
+
+#define ATSR_ACTIVE(1  0)
+
 #define RESUME_RETRY   (0  0)
 #define RESUME_TERMINATE   (1  0)
 
@@ -366,6 +377,7 @@ struct arm_smmu_device {
 #define ARM_SMMU_FEAT_TRANS_S1 (1  2)
 #define ARM_SMMU_FEAT_TRANS_S2 (1  3)
 #define ARM_SMMU_FEAT_TRANS_NESTED (1  4)
+#define ARM_SMMU_FEAT_TRANS_OPS(1  5)
u32 features;
 
 #define ARM_SMMU_OPT_SECURE_CFG_ACCESS (1  0)
@@ -1524,7 +1536,7 @@ static size_t arm_smmu_unmap(struct iommu_domain *domain, 
unsigned long iova,
return ret ? 0 : size;
 }
 
-static phys_addr_t arm_smmu_iova_to_phys(struct iommu_domain *domain,
+static phys_addr_t arm_smmu_iova_to_phys_soft(struct iommu_domain *domain,
 dma_addr_t iova)
 {
pgd_t *pgdp, pgd;
@@ -1557,6 +1569,66 @@ static phys_addr_t arm_smmu_iova_to_phys(struct 
iommu_domain *domain,
return __pfn_to_phys(pte_pfn(pte)) | (iova  ~PAGE_MASK);
 }
 
+static phys_addr_t arm_smmu_iova_to_phys_hard(struct iommu_domain *domain,
+   dma_addr_t iova)
+{
+   struct arm_smmu_domain *smmu_domain = domain-priv;
+   struct arm_smmu_device *smmu = smmu_domain-smmu;
+   struct arm_smmu_cfg *cfg = smmu_domain-cfg;
+   struct device *dev = smmu-dev;
+   void __iomem *cb_base;
+   u32 tmp;
+   u64 phys;
+   unsigned long flags;
+
+   cb_base = ARM_SMMU_CB_BASE(smmu) + ARM_SMMU_CB(smmu, cfg-cbndx);
+
+   spin_lock_irqsave(smmu_domain-lock, flags);
+
+   if (smmu-version == 1) {
+   u32 reg = iova  ~0xfff;
+   writel_relaxed(reg, cb_base + ARM_SMMU_CB_ATS1PR_LO);
+   } else {
+   u32 reg = iova  ~0xfff;
+   writel_relaxed(reg, cb_base + ARM_SMMU_CB_ATS1PR_LO);
+   reg = (iova  ~0xfff)  32;
+   writel_relaxed(reg, cb_base + ARM_SMMU_CB_ATS1PR_HI);
+   }
+
+   if (readl_poll_timeout_atomic(cb_base + ARM_SMMU_CB_ATSR, tmp,
+   !(tmp  ATSR_ACTIVE), 5, 50)) {
+   dev_err(dev,
+   iova to phys timed out on 0x%pa. Falling back to 
software table walk.\n,
+   iova);
+   return arm_smmu_iova_to_phys_soft(domain, iova);
+   }
+
+   phys = readl_relaxed(cb_base + ARM_SMMU_CB_PAR_LO);
+   phys |= ((u64) readl_relaxed(cb_base + ARM_SMMU_CB_PAR_HI))  32;
+
+   spin_unlock_irqrestore(smmu_domain-lock, flags

[PATCH v5 1/2] iopoll: Introduce memory-mapped IO polling macros

2014-10-10 Thread Mitchel Humpherys
From: Matt Wagantall ma...@codeaurora.org

It is sometimes necessary to poll a memory-mapped register until its value
satisfies some condition. Introduce a family of convenience macros that do
this. Tight-looping, sleeping, and timing out can all be accomplished using
these macros.

Cc: Thierry Reding thierry.red...@gmail.com
Cc: Will Deacon will.dea...@arm.com
Signed-off-by: Matt Wagantall ma...@codeaurora.org
Signed-off-by: Mitchel Humpherys mitch...@codeaurora.org
---
Changes since v4:
  - Added support for other accessor functions
  - Unified atomic and non-atomic interfaces
  - Fixed erroneous `might_sleep' (we were might_sleep()'ing on the wrong
variable)
---
 include/linux/iopoll.h | 213 +
 1 file changed, 213 insertions(+)
 create mode 100644 include/linux/iopoll.h

diff --git a/include/linux/iopoll.h b/include/linux/iopoll.h
new file mode 100644
index 00..b817cade6a
--- /dev/null
+++ b/include/linux/iopoll.h
@@ -0,0 +1,213 @@
+/*
+ * Copyright (c) 2012-2014 The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _LINUX_IOPOLL_H
+#define _LINUX_IOPOLL_H
+
+#include linux/kernel.h
+#include linux/types.h
+#include linux/hrtimer.h
+#include linux/delay.h
+#include linux/errno.h
+#include linux/io.h
+
+/**
+ * readx_poll_timeout - Periodically poll an address until a condition is met 
or a timeout occurs
+ * @op: accessor function (takes @addr as its only argument)
+ * @addr: Address to poll
+ * @val: Variable to read the value into
+ * @cond: Break condition (usually involving @val)
+ * @sleep_us: Maximum time to sleep between reads in us (0 tight-loops)
+ * @timeout_us: Timeout in us, 0 means never timeout
+ *
+ * Returns 0 on success and -ETIMEDOUT upon a timeout. In either
+ * case, the last read value at @addr is stored in @val. Must not
+ * be called from atomic context if sleep_us or timeout_us are used.
+ *
+ * Generally you'll want to use one of the specialized macros defined below
+ * rather than this macro directly.
+ */
+#define readx_poll_timeout(op, addr, val, cond, sleep_us, timeout_us)  \
+({ \
+   ktime_t timeout = ktime_add_us(ktime_get(), timeout_us); \
+   might_sleep_if(sleep_us); \
+   for (;;) { \
+   (val) = op(addr); \
+   if (cond) \
+   break; \
+   if (timeout_us  ktime_compare(ktime_get(), timeout)  0) { \
+   (val) = op(addr); \
+   break; \
+   } \
+   if (sleep_us) \
+   usleep_range(DIV_ROUND_UP(sleep_us, 4), sleep_us); \
+   } \
+   (cond) ? 0 : -ETIMEDOUT; \
+})
+
+/**
+ * readx_poll_timeout_atomic - Periodically poll an address until a condition 
is met or a timeout occurs
+ * @op: accessor function (takes @addr as its only argument)
+ * @addr: Address to poll
+ * @val: Variable to read the value into
+ * @cond: Break condition (usually involving @val)
+ * @delay_us: Time to udelay between reads in us (0 tight-loops)
+ * @timeout_us: Timeout in us, 0 means never timeout
+ *
+ * Returns 0 on success and -ETIMEDOUT upon a timeout. In either
+ * case, the last read value at @addr is stored in @val.
+ *
+ * Generally you'll want to use one of the specialized macros defined below
+ * rather than this macro directly.
+ */
+#define readx_poll_timeout_atomic(op, addr, val, cond, delay_us, timeout_us) \
+({ \
+   ktime_t timeout = ktime_add_us(ktime_get(), timeout_us); \
+   for (;;) { \
+   (val) = op(addr); \
+   if (cond) \
+   break; \
+   if (timeout_us  ktime_compare(ktime_get(), timeout)  0) { \
+   (val) = op(addr); \
+   break; \
+   } \
+   if (delay_us) \
+   udelay(delay_us);   \
+   } \
+   (cond) ? 0 : -ETIMEDOUT; \
+})
+
+
+#define readl_poll_timeout(addr, val, cond, delay_us, timeout_us) \
+   readx_poll_timeout(readl, addr, val, cond, delay_us, timeout_us)
+
+#define readl_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
+   readx_poll_timeout_atomic(readl, addr, val, cond, delay_us, timeout_us)
+
+#define readb_poll_timeout(addr, val, cond, delay_us, timeout_us) \
+   readx_poll_timeout(readb, addr, val, cond, delay_us, timeout_us)
+
+#define readb_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
+   readx_poll_timeout_atomic(readb, addr, val, cond, delay_us, timeout_us

Re: [PATCH v4 1/2] iopoll: Introduce memory-mapped IO polling macros

2014-10-10 Thread Mitchel Humpherys
On Wed, Oct 08 2014 at 06:40:46 AM, Arnd Bergmann a...@arndb.de wrote:
 On Tuesday 07 October 2014 18:47:59 Mitchel Humpherys wrote:
 On Wed, Oct 01 2014 at 01:25:33 AM, Arnd Bergmann a...@arndb.de wrote:
  On Tuesday 30 September 2014 18:28:12 Mitchel Humpherys wrote:
  + */
  +#define readl_poll_timeout(addr, val, cond, sleep_us, timeout_us) \
  +({ \
  +   ktime_t timeout = ktime_add_us(ktime_get(), timeout_us); \
  +   might_sleep_if(timeout_us); \
 
  Does it make sense to call this with timeout_us = 0?
 
 Yes, the idea there being to never timeout.  That mode should, of
 course, be used with extreme caution since never timing out is not
 really playing nice with the system.

 But then you certainly still 'might_sleep' here. The
 might_sleep_if(timeout_us) line suggests that it won't sleep, but
 that isn't the case.

Yes looks like that was actually a bug.  Should have been
might_sleep_if()'ing on sleep_us.  This is fixed in the v5 I just sent
out.


[...]

 Regarding the division, for the overwhelmingly common case where the
 user of the API passes in a constant for sleep_us the compiler optimizes
 out this calculation altogether and just sticks the final result in (I
 verified this with gcc 4.9 and the kernel build system's built-in
 support for generating .s files).  Conveying semantic meaning by using
 `DIV_ROUND_UP' is nice but if you feel strongly about it we can make
 this a shift instead.

 The more important question is probably if you want to keep the _ROUND_UP
 part. If that's not significant, I think a shift would be better.

If we drop the _ROUND_UP then passing a sleep_us = 4 would result in a
minimum sleep time of 0, so we'd be polling a lot faster than the user
had expected.



-Mitch

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 1/2] iopoll: Introduce memory-mapped IO polling macros

2014-10-09 Thread Mitchel Humpherys
On Tue, Oct 07 2014 at 06:47:59 PM, Mitchel Humpherys mitch...@codeaurora.org 
wrote:
 +#define readl_poll_timeout_atomic(addr, val, cond, max_reads, 
 time_between_us) \
 +({ \
 +   int count; \
 +   for (count = (max_reads); count  0; count--) { \
 +   (val) = readl(addr); \
 +   if (cond) \
 +   break; \
 +   udelay(time_between_us); \
 +   } \
 +   (cond) ? 0 : -ETIMEDOUT; \
 +})

 udelay has a large variability, I think it would be better to also use
 ktime_compare here and make the interface the same as the other one.
 You might want to add a warning if someone tries to pass more than a few
 microseconds as the timeout.

 Sounds good, will update in v5.

Except I'll probably hold off on adding a warning about udelay since
udelay already includes a warning (a compile error, actually) when
exceedingly large delays are requested.


-Mitch

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 1/2] iopoll: Introduce memory-mapped IO polling macros

2014-10-07 Thread Mitchel Humpherys
On Wed, Oct 01 2014 at 01:25:33 AM, Arnd Bergmann a...@arndb.de wrote:
 On Tuesday 30 September 2014 18:28:12 Mitchel Humpherys wrote:
 + */
 +#define readl_poll_timeout(addr, val, cond, sleep_us, timeout_us) \
 +({ \
 +   ktime_t timeout = ktime_add_us(ktime_get(), timeout_us); \
 +   might_sleep_if(timeout_us); \

 Does it make sense to call this with timeout_us = 0?

Yes, the idea there being to never timeout.  That mode should, of
course, be used with extreme caution since never timing out is not
really playing nice with the system.


 +   for (;;) { \
 +   (val) = readl(addr); \
 +   if (cond) \
 +   break; \
 +   if (timeout_us  ktime_compare(ktime_get(), timeout)  0) { 
 \
 +   (val) = readl(addr); \
 +   break; \
 +   } \
 +   if (sleep_us) \
 +   usleep_range(DIV_ROUND_UP(sleep_us, 4), sleep_us); \
 +   } \
 +   (cond) ? 0 : -ETIMEDOUT; \
 +})

 I think it would be better to tie the 'range' argument to the timeout. Also
 doing a division seems expensive here.

We may have cases where the HW spec says something like the foo widget
response time is on average 5us, with a worst case of 100us.  In such a
case we may want to poll the bit very frequently to optimize for the
common case of a very fast lock time, but we may not want to error out
due to a timeout unless we've been waiting 100us.

Regarding the division, for the overwhelmingly common case where the
user of the API passes in a constant for sleep_us the compiler optimizes
out this calculation altogether and just sticks the final result in (I
verified this with gcc 4.9 and the kernel build system's built-in
support for generating .s files).  Conveying semantic meaning by using
`DIV_ROUND_UP' is nice but if you feel strongly about it we can make
this a shift instead.


 +/**
 + * readl_poll_timeout_atomic - Periodically poll an address until a 
 condition is met or a timeout occurs
 + * @addr: Address to poll
 + * @val: Variable to read the value into
 + * @cond: Break condition (usually involving @val)
 + * @max_reads: Maximum number of reads before giving up
 + * @time_between_us: Time to udelay() between successive reads
 + *
 + * Returns 0 on success and -ETIMEDOUT upon a timeout.
 + */
 +#define readl_poll_timeout_atomic(addr, val, cond, max_reads, 
 time_between_us) \
 +({ \
 +   int count; \
 +   for (count = (max_reads); count  0; count--) { \
 +   (val) = readl(addr); \
 +   if (cond) \
 +   break; \
 +   udelay(time_between_us); \
 +   } \
 +   (cond) ? 0 : -ETIMEDOUT; \
 +})

 udelay has a large variability, I think it would be better to also use
 ktime_compare here and make the interface the same as the other one.
 You might want to add a warning if someone tries to pass more than a few
 microseconds as the timeout.

Sounds good, will update in v5.


 More generally speaking, using 'readl' seems fairly specific. I suspect
 that we'd have to add the entire range of accessors over time if this
 catches on: readb, readw, readq, readb_relaxed, readw_relaxed, readl_relaxed,
 readq_relaxed, ioread8, ioread16, ioread16be, ioread32, ioread32be,
 inb, inb_p, inw, inw_p, inw, inl, inl_p, and possibly more of those.

 Would it make sense to pass that operation as an argument?

Sure, we'll do that in v5 as well.



-Mitch

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC][PATCH 2/2] Add support of the IOMMU_DEVICE flag.

2014-10-06 Thread Mitchel Humpherys
On Mon, Oct 06 2014 at 03:28:16 AM, Varun Sethi varun.se...@freescale.com 
wrote:
 This flag is used for specifying access to device memory. SMMU would apply
 device memory attributes for a DMA transaction. This is required for setting
 access to GIC registers, for generating message interrupts. This would ensure 
 that 

Nit: long line and trailing whitespace.


-Mitch

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 2/2] iommu/arm-smmu: add support for iova_to_phys through ATS1PR

2014-10-01 Thread Mitchel Humpherys
On Wed, Oct 01 2014 at 01:27:27 AM, Arnd Bergmann a...@arndb.de wrote:
 On Tuesday 30 September 2014 18:28:13 Mitchel Humpherys wrote:
 +   if (readl_poll_timeout_atomic(cb_base + ARM_SMMU_CB_ATSR, tmp,
 +   !(tmp  ATSR_ACTIVE), 50, 100)) {
 

 This looks really bad.

 You are doing up to 50 100us delays, each of which can be much longer,
 so you can do up to 10ms total delay with interrupts disabled.

 Don't do that.

Oh wow somehow I forgot I was in atomic context even though I was
explicitly moving to the `_atomic' polling function in this version.
Don't ask.

Let me ratchet that down to a maximum of 10 delays of 5 microseconds
each for v5.


-Mitch

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 0/2] iommu/arm-smmu: hard iova_to_phys

2014-09-30 Thread Mitchel Humpherys
This series introduces support for performing iova-to-phys translations via
the ARM SMMU hardware on supported implementations. We also make use of
some new generic macros for polling hardware registers.

v1..v2:

  - Renamed one of the iopoll macros to use the more standard `_atomic'
suffix
  - Removed some convenience iopoll wrappers to encourage explicitness

v2..v3:

  - Removed unnecessary `dev_name's

v3..v4:

  - Updated the iopoll commit message to reflect the patch better
  - Added locking around address translation op
  - Return 0 on iova_to_phys failure


Matt Wagantall (1):
  iopoll: Introduce memory-mapped IO polling macros

Mitchel Humpherys (1):
  iommu/arm-smmu: add support for iova_to_phys through ATS1PR

 drivers/iommu/arm-smmu.c | 79 +++-
 include/linux/iopoll.h   | 77 ++
 2 files changed, 155 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/iopoll.h

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 2/2] iommu/arm-smmu: add support for iova_to_phys through ATS1PR

2014-09-30 Thread Mitchel Humpherys
Currently, we provide the iommu_ops.iova_to_phys service by doing a
table walk in software to translate IO virtual addresses to physical
addresses. On SMMUs that support it, it can be useful to ask the SMMU
itself to do the translation. This can be used to warm the TLBs for an
SMMU. It can also be useful for testing and hardware validation.

Since the address translation registers are optional on SMMUv2, only
enable hardware translations when using SMMUv1 or when SMMU_IDR0.S1TS=1
and SMMU_IDR0.ATOSNS=0, as described in the ARM SMMU v1-v2 spec.

Signed-off-by: Mitchel Humpherys mitch...@codeaurora.org
---
Changes since v3:

  - Added locking around address translation op
  - Return 0 on iova_to_phys failure
---
 drivers/iommu/arm-smmu.c | 79 +++-
 1 file changed, 78 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 37dc3dd0df..c80c12a104 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -36,6 +36,7 @@
 #include linux/interrupt.h
 #include linux/io.h
 #include linux/iommu.h
+#include linux/iopoll.h
 #include linux/mm.h
 #include linux/module.h
 #include linux/of.h
@@ -140,6 +141,7 @@
 #define ID0_S2TS   (1  29)
 #define ID0_NTS(1  28)
 #define ID0_SMS(1  27)
+#define ID0_ATOSNS (1  26)
 #define ID0_PTFS_SHIFT 24
 #define ID0_PTFS_MASK  0x2
 #define ID0_PTFS_V8_ONLY   0x2
@@ -233,11 +235,16 @@
 #define ARM_SMMU_CB_TTBR0_HI   0x24
 #define ARM_SMMU_CB_TTBCR  0x30
 #define ARM_SMMU_CB_S1_MAIR0   0x38
+#define ARM_SMMU_CB_PAR_LO 0x50
+#define ARM_SMMU_CB_PAR_HI 0x54
 #define ARM_SMMU_CB_FSR0x58
 #define ARM_SMMU_CB_FAR_LO 0x60
 #define ARM_SMMU_CB_FAR_HI 0x64
 #define ARM_SMMU_CB_FSYNR0 0x68
 #define ARM_SMMU_CB_S1_TLBIASID0x610
+#define ARM_SMMU_CB_ATS1PR_LO  0x800
+#define ARM_SMMU_CB_ATS1PR_HI  0x804
+#define ARM_SMMU_CB_ATSR   0x8f0
 
 #define SCTLR_S1_ASIDPNE   (1  12)
 #define SCTLR_CFCFG(1  7)
@@ -249,6 +256,10 @@
 #define SCTLR_M(1  0)
 #define SCTLR_EAE_SBOP (SCTLR_AFE | SCTLR_TRE)
 
+#define CB_PAR_F   (1  0)
+
+#define ATSR_ACTIVE(1  0)
+
 #define RESUME_RETRY   (0  0)
 #define RESUME_TERMINATE   (1  0)
 
@@ -366,6 +377,7 @@ struct arm_smmu_device {
 #define ARM_SMMU_FEAT_TRANS_S1 (1  2)
 #define ARM_SMMU_FEAT_TRANS_S2 (1  3)
 #define ARM_SMMU_FEAT_TRANS_NESTED (1  4)
+#define ARM_SMMU_FEAT_TRANS_OPS(1  5)
u32 features;
 
 #define ARM_SMMU_OPT_SECURE_CFG_ACCESS (1  0)
@@ -1524,7 +1536,7 @@ static size_t arm_smmu_unmap(struct iommu_domain *domain, 
unsigned long iova,
return ret ? 0 : size;
 }
 
-static phys_addr_t arm_smmu_iova_to_phys(struct iommu_domain *domain,
+static phys_addr_t arm_smmu_iova_to_phys_soft(struct iommu_domain *domain,
 dma_addr_t iova)
 {
pgd_t *pgdp, pgd;
@@ -1557,6 +1569,66 @@ static phys_addr_t arm_smmu_iova_to_phys(struct 
iommu_domain *domain,
return __pfn_to_phys(pte_pfn(pte)) | (iova  ~PAGE_MASK);
 }
 
+static phys_addr_t arm_smmu_iova_to_phys_hard(struct iommu_domain *domain,
+   dma_addr_t iova)
+{
+   struct arm_smmu_domain *smmu_domain = domain-priv;
+   struct arm_smmu_device *smmu = smmu_domain-smmu;
+   struct arm_smmu_cfg *cfg = smmu_domain-cfg;
+   struct device *dev = smmu-dev;
+   void __iomem *cb_base;
+   u32 tmp;
+   u64 phys;
+   unsigned long flags;
+
+   cb_base = ARM_SMMU_CB_BASE(smmu) + ARM_SMMU_CB(smmu, cfg-cbndx);
+
+   spin_lock_irqsave(smmu_domain-lock, flags);
+
+   if (smmu-version == 1) {
+   u32 reg = iova  ~0xfff;
+   writel_relaxed(reg, cb_base + ARM_SMMU_CB_ATS1PR_LO);
+   } else {
+   u32 reg = iova  ~0xfff;
+   writel_relaxed(reg, cb_base + ARM_SMMU_CB_ATS1PR_LO);
+   reg = (iova  ~0xfff)  32;
+   writel_relaxed(reg, cb_base + ARM_SMMU_CB_ATS1PR_HI);
+   }
+
+   if (readl_poll_timeout_atomic(cb_base + ARM_SMMU_CB_ATSR, tmp,
+   !(tmp  ATSR_ACTIVE), 50, 100)) {
+   dev_err(dev,
+   iova to phys timed out on 0x%pa. Falling back to 
software table walk.\n,
+   iova);
+   return arm_smmu_iova_to_phys_soft(domain, iova);
+   }
+
+   phys = readl_relaxed(cb_base + ARM_SMMU_CB_PAR_LO);
+   phys |= ((u64) readl_relaxed(cb_base + ARM_SMMU_CB_PAR_HI))  32;
+
+   spin_unlock_irqrestore

[PATCH v4 1/2] iopoll: Introduce memory-mapped IO polling macros

2014-09-30 Thread Mitchel Humpherys
From: Matt Wagantall ma...@codeaurora.org

It is sometimes necessary to poll a memory-mapped register until its value
satisfies some condition. Introduce a family of convenience macros that do
this. Tight-looping, sleeping, and timing out can all be accomplished using
these macros.

Cc: Thierry Reding thierry.red...@gmail.com
Cc: Will Deacon will.dea...@arm.com
Signed-off-by: Matt Wagantall ma...@codeaurora.org
Signed-off-by: Mitchel Humpherys mitch...@codeaurora.org
---
Changes since v3:

  - Updated commit message to better reflect the patch content
---
 include/linux/iopoll.h | 77 ++
 1 file changed, 77 insertions(+)
 create mode 100644 include/linux/iopoll.h

diff --git a/include/linux/iopoll.h b/include/linux/iopoll.h
new file mode 100644
index 00..594b0d4f03
--- /dev/null
+++ b/include/linux/iopoll.h
@@ -0,0 +1,77 @@
+/*
+ * Copyright (c) 2012-2014 The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _LINUX_IOPOLL_H
+#define _LINUX_IOPOLL_H
+
+#include linux/kernel.h
+#include linux/types.h
+#include linux/hrtimer.h
+#include linux/delay.h
+#include linux/errno.h
+#include linux/io.h
+
+/**
+ * readl_poll_timeout - Periodically poll an address until a condition is met 
or a timeout occurs
+ * @addr: Address to poll
+ * @val: Variable to read the value into
+ * @cond: Break condition (usually involving @val)
+ * @sleep_us: Maximum time to sleep between reads in us (0 tight-loops)
+ * @timeout_us: Timeout in us, 0 means never timeout
+ *
+ * Returns 0 on success and -ETIMEDOUT upon a timeout. In either
+ * case, the last read value at @addr is stored in @val. Must not
+ * be called from atomic context if sleep_us or timeout_us are used.
+ */
+#define readl_poll_timeout(addr, val, cond, sleep_us, timeout_us) \
+({ \
+   ktime_t timeout = ktime_add_us(ktime_get(), timeout_us); \
+   might_sleep_if(timeout_us); \
+   for (;;) { \
+   (val) = readl(addr); \
+   if (cond) \
+   break; \
+   if (timeout_us  ktime_compare(ktime_get(), timeout)  0) { \
+   (val) = readl(addr); \
+   break; \
+   } \
+   if (sleep_us) \
+   usleep_range(DIV_ROUND_UP(sleep_us, 4), sleep_us); \
+   } \
+   (cond) ? 0 : -ETIMEDOUT; \
+})
+
+/**
+ * readl_poll_timeout_atomic - Periodically poll an address until a condition 
is met or a timeout occurs
+ * @addr: Address to poll
+ * @val: Variable to read the value into
+ * @cond: Break condition (usually involving @val)
+ * @max_reads: Maximum number of reads before giving up
+ * @time_between_us: Time to udelay() between successive reads
+ *
+ * Returns 0 on success and -ETIMEDOUT upon a timeout.
+ */
+#define readl_poll_timeout_atomic(addr, val, cond, max_reads, time_between_us) 
\
+({ \
+   int count; \
+   for (count = (max_reads); count  0; count--) { \
+   (val) = readl(addr); \
+   if (cond) \
+   break; \
+   udelay(time_between_us); \
+   } \
+   (cond) ? 0 : -ETIMEDOUT; \
+})
+
+#endif /* _LINUX_IOPOLL_H */
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 2/2] iommu/arm-smmu: add support for iova_to_phys through ATS1PR

2014-09-30 Thread Mitchel Humpherys
On Tue, Sep 30 2014 at 03:23:34 AM, Will Deacon will.dea...@arm.com wrote:
 Hi Mitch,

 On Sun, Sep 28, 2014 at 04:27:29AM +0100, Mitchel Humpherys wrote:
 Currently, we provide the iommu_ops.iova_to_phys service by doing a
 table walk in software to translate IO virtual addresses to physical
 addresses. On SMMUs that support it, it can be useful to ask the SMMU
 itself to do the translation. This can be used to warm the TLBs for an
 SMMU. It can also be useful for testing and hardware validation.
 
 Since the address translation registers are optional on SMMUv2, only
 enable hardware translations when using SMMUv1 or when SMMU_IDR0.S1TS=1
 and SMMU_IDR0.ATOSNS=0, as described in the ARM SMMU v1-v2 spec.

 [...]

 +static phys_addr_t arm_smmu_iova_to_phys_hard(struct iommu_domain *domain,
 +dma_addr_t iova)
 +{
 +struct arm_smmu_domain *smmu_domain = domain-priv;
 +struct arm_smmu_device *smmu = smmu_domain-smmu;
 +struct arm_smmu_cfg *cfg = smmu_domain-cfg;
 +struct device *dev = smmu-dev;
 +void __iomem *cb_base;
 +u32 tmp;
 +u64 phys;
 +
 +cb_base = ARM_SMMU_CB_BASE(smmu) + ARM_SMMU_CB(smmu, cfg-cbndx);
 +
 +if (smmu-version == 1) {
 +u32 reg = iova  ~0xFFF;

 Cosmetic comment, but hex constants are lowercase everywhere else in the
 file.

Ah, woops.  Let me fix that.


 +writel_relaxed(reg, cb_base + ARM_SMMU_CB_ATS1PR_LO);
 +} else {
 +u32 reg = iova  ~0xFFF;
 +writel_relaxed(reg, cb_base + ARM_SMMU_CB_ATS1PR_LO);
 +reg = (iova  ~0xFFF)  32;
 +writel_relaxed(reg, cb_base + ARM_SMMU_CB_ATS1PR_HI);
 +}
 +
 +if (readl_poll_timeout(cb_base + ARM_SMMU_CB_ATSR, tmp,
 +!(tmp  ATSR_ACTIVE), 10, ATSR_LOOP_TIMEOUT)) {
 +dev_err(dev,
 +iova to phys timed out on 0x%pa. Falling back to 
 software table walk.\n,
 +iova);
 +return arm_smmu_iova_to_phys_soft(domain, iova);
 +}
 +
 +phys = readl_relaxed(cb_base + ARM_SMMU_CB_PAR_LO);
 +phys |= ((u64) readl_relaxed(cb_base + ARM_SMMU_CB_PAR_HI))  32;

 The absence of locking in this function concerns me a bit. For the software
 implementation, we're just reading page tables, but here we're writing ATS
 registers and I think we need to ensure serialisation against another
 iova_to_phys on the same domain.

Good catch, let me take the domain lock here.  I'll also have to move to
readl_poll_timeout_atomic since the domain lock is a spinlock.


 +if (phys  CB_PAR_F) {
 +dev_err(dev, translation fault!\n);
 +dev_err(dev, PAR = 0x%llx\n, phys);
 +}
 +phys = (phys  0xFFF000ULL) | (iova  0x0FFF);
 +
 +return phys;

 You can return phys == 0 on failure (at least, the callers in kvm and vfio
 treat this as an error).

Ah yes, I agree that a 0 return value from iommu_iova_to_phys appears to
be treated as an error.  Let me fix that.


-Mitch

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 1/2] iopoll: Introduce memory-mapped IO polling macros

2014-09-29 Thread Mitchel Humpherys
On Mon, Sep 29 2014 at 01:31:37 AM, Thierry Reding thierry.red...@gmail.com 
wrote:
 On Sat, Sep 27, 2014 at 08:27:28PM -0700, Mitchel Humpherys wrote:
 From: Matt Wagantall ma...@codeaurora.org
 
 It is sometimes necessary to poll a memory-mapped register until its
 value satisfies some condition. Introduce a family of convenience macros
 that do this. Tight-loop and sleeping versions are provided with and
 without timeouts.
 
 Cc: Thierry Reding thierry.red...@gmail.com
 Cc: Will Deacon will.dea...@arm.com
 Signed-off-by: Matt Wagantall ma...@codeaurora.org
 Signed-off-by: Mitchel Humpherys mitch...@codeaurora.org
 ---
  include/linux/iopoll.h | 77 
 ++
  1 file changed, 77 insertions(+)
  create mode 100644 include/linux/iopoll.h

 It would be good to provide a changelog with each new version of the
 patch. As it is I now have v2 and v3 of this patch in my inbox and I
 have no idea what the differences are, so I'd need to download both
 and run them through interdiff to find out.

Yeah I put the changelog in the cover letter.  There were no changes on
this patch, though I admit that wasn't entirely clear now re-reading the
cover letter text.  I also didn't account for the fact that you probably
aren't reading the whole series since I only Cc'd you on this patch, not
the whole series.  In any case, I probably shouldn't have re-sent the
whole series after one minor modification to one patch in the series.


-Mitch

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 0/2] iommu/arm-smmu: hard iova_to_phys

2014-09-27 Thread Mitchel Humpherys
This series introduces support for performing iova-to-phys translations via
the ARM SMMU hardware on supported implementations. We also make use of
some new generic macros for polling hardware registers.

Changes since v1:

  - Renamed one of the iopoll macros to use the more standard `_atomic'
suffix
  - Removed some convenience iopoll wrappers to encourage explicitness


Matt Wagantall (1):
  iopoll: Introduce memory-mapped IO polling macros

Mitchel Humpherys (1):
  iommu/arm-smmu: add support for iova_to_phys through ATS1PR

 drivers/iommu/arm-smmu.c | 73 -
 include/linux/iopoll.h   | 77 
 2 files changed, 149 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/iopoll.h

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 2/2] iommu/arm-smmu: add support for iova_to_phys through ATS1PR

2014-09-27 Thread Mitchel Humpherys
Currently, we provide the iommu_ops.iova_to_phys service by doing a
table walk in software to translate IO virtual addresses to physical
addresses. On SMMUs that support it, it can be useful to ask the SMMU
itself to do the translation. This can be used to warm the TLBs for an
SMMU. It can also be useful for testing and hardware validation.

Since the address translation registers are optional on SMMUv2, only
enable hardware translations when using SMMUv1 or when SMMU_IDR0.S1TS=1
and SMMU_IDR0.ATOSNS=0, as described in the ARM SMMU v1-v2 spec.

Signed-off-by: Mitchel Humpherys mitch...@codeaurora.org
---
 drivers/iommu/arm-smmu.c | 73 +++-
 1 file changed, 72 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 37dc3dd0df..7c4629cafd 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -36,6 +36,7 @@
 #include linux/interrupt.h
 #include linux/io.h
 #include linux/iommu.h
+#include linux/iopoll.h
 #include linux/mm.h
 #include linux/module.h
 #include linux/of.h
@@ -140,6 +141,7 @@
 #define ID0_S2TS   (1  29)
 #define ID0_NTS(1  28)
 #define ID0_SMS(1  27)
+#define ID0_ATOSNS (1  26)
 #define ID0_PTFS_SHIFT 24
 #define ID0_PTFS_MASK  0x2
 #define ID0_PTFS_V8_ONLY   0x2
@@ -233,11 +235,17 @@
 #define ARM_SMMU_CB_TTBR0_HI   0x24
 #define ARM_SMMU_CB_TTBCR  0x30
 #define ARM_SMMU_CB_S1_MAIR0   0x38
+#define ARM_SMMU_CB_PAR_LO 0x50
+#define ARM_SMMU_CB_PAR_HI 0x54
 #define ARM_SMMU_CB_FSR0x58
 #define ARM_SMMU_CB_FAR_LO 0x60
 #define ARM_SMMU_CB_FAR_HI 0x64
 #define ARM_SMMU_CB_FSYNR0 0x68
 #define ARM_SMMU_CB_S1_TLBIASID0x610
+#define ARM_SMMU_CB_ATS1PR_LO  0x800
+#define ARM_SMMU_CB_ATS1PR_HI  0x804
+#define ARM_SMMU_CB_ATSR   0x8f0
+#define ATSR_LOOP_TIMEOUT  100 /* 1s! */
 
 #define SCTLR_S1_ASIDPNE   (1  12)
 #define SCTLR_CFCFG(1  7)
@@ -249,6 +257,10 @@
 #define SCTLR_M(1  0)
 #define SCTLR_EAE_SBOP (SCTLR_AFE | SCTLR_TRE)
 
+#define CB_PAR_F   (1  0)
+
+#define ATSR_ACTIVE(1  0)
+
 #define RESUME_RETRY   (0  0)
 #define RESUME_TERMINATE   (1  0)
 
@@ -366,6 +378,7 @@ struct arm_smmu_device {
 #define ARM_SMMU_FEAT_TRANS_S1 (1  2)
 #define ARM_SMMU_FEAT_TRANS_S2 (1  3)
 #define ARM_SMMU_FEAT_TRANS_NESTED (1  4)
+#define ARM_SMMU_FEAT_TRANS_OPS(1  5)
u32 features;
 
 #define ARM_SMMU_OPT_SECURE_CFG_ACCESS (1  0)
@@ -1524,7 +1537,7 @@ static size_t arm_smmu_unmap(struct iommu_domain *domain, 
unsigned long iova,
return ret ? 0 : size;
 }
 
-static phys_addr_t arm_smmu_iova_to_phys(struct iommu_domain *domain,
+static phys_addr_t arm_smmu_iova_to_phys_soft(struct iommu_domain *domain,
 dma_addr_t iova)
 {
pgd_t *pgdp, pgd;
@@ -1557,6 +1570,59 @@ static phys_addr_t arm_smmu_iova_to_phys(struct 
iommu_domain *domain,
return __pfn_to_phys(pte_pfn(pte)) | (iova  ~PAGE_MASK);
 }
 
+static phys_addr_t arm_smmu_iova_to_phys_hard(struct iommu_domain *domain,
+   dma_addr_t iova)
+{
+   struct arm_smmu_domain *smmu_domain = domain-priv;
+   struct arm_smmu_device *smmu = smmu_domain-smmu;
+   struct arm_smmu_cfg *cfg = smmu_domain-cfg;
+   struct device *dev = smmu-dev;
+   void __iomem *cb_base;
+   u32 tmp;
+   u64 phys;
+
+   cb_base = ARM_SMMU_CB_BASE(smmu) + ARM_SMMU_CB(smmu, cfg-cbndx);
+
+   if (smmu-version == 1) {
+   u32 reg = iova  ~0xFFF;
+   writel_relaxed(reg, cb_base + ARM_SMMU_CB_ATS1PR_LO);
+   } else {
+   u32 reg = iova  ~0xFFF;
+   writel_relaxed(reg, cb_base + ARM_SMMU_CB_ATS1PR_LO);
+   reg = (iova  ~0xFFF)  32;
+   writel_relaxed(reg, cb_base + ARM_SMMU_CB_ATS1PR_HI);
+   }
+
+   if (readl_poll_timeout(cb_base + ARM_SMMU_CB_ATSR, tmp,
+   !(tmp  ATSR_ACTIVE), 10, ATSR_LOOP_TIMEOUT)) {
+   dev_err(dev,
+   iova to phys timed out on 0x%pa for %s. Falling back 
to software table walk.\n,
+   iova, dev_name(dev));
+   return arm_smmu_iova_to_phys_soft(domain, iova);
+   }
+
+   phys = readl_relaxed(cb_base + ARM_SMMU_CB_PAR_LO);
+   phys |= ((u64) readl_relaxed(cb_base + ARM_SMMU_CB_PAR_HI))  32;
+
+   if (phys  CB_PAR_F) {
+   dev_err(dev, translation fault on %s!\n, dev_name(dev));
+   dev_err(dev, PAR = 0x

Re: [PATCH v2 0/2] iommu/arm-smmu: hard iova_to_phys

2014-09-27 Thread Mitchel Humpherys
On Sat, Sep 27 2014 at 02:31:51 PM, Mitchel Humpherys mitch...@codeaurora.org 
wrote:
 This series introduces support for performing iova-to-phys translations via
 the ARM SMMU hardware on supported implementations. We also make use of
 some new generic macros for polling hardware registers.

 Changes since v1:

   - Renamed one of the iopoll macros to use the more standard `_atomic'
 suffix
   - Removed some convenience iopoll wrappers to encourage explicitness

Hold on, just remembered there was another comment on the iova_to_phys
patch.  v3 is en route...



-Mitch

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 1/2] iopoll: Introduce memory-mapped IO polling macros

2014-09-27 Thread Mitchel Humpherys
From: Matt Wagantall ma...@codeaurora.org

It is sometimes necessary to poll a memory-mapped register until its
value satisfies some condition. Introduce a family of convenience macros
that do this. Tight-loop and sleeping versions are provided with and
without timeouts.

Cc: Thierry Reding thierry.red...@gmail.com
Cc: Will Deacon will.dea...@arm.com
Signed-off-by: Matt Wagantall ma...@codeaurora.org
Signed-off-by: Mitchel Humpherys mitch...@codeaurora.org
---
 include/linux/iopoll.h | 77 ++
 1 file changed, 77 insertions(+)
 create mode 100644 include/linux/iopoll.h

diff --git a/include/linux/iopoll.h b/include/linux/iopoll.h
new file mode 100644
index 00..594b0d4f03
--- /dev/null
+++ b/include/linux/iopoll.h
@@ -0,0 +1,77 @@
+/*
+ * Copyright (c) 2012-2014 The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _LINUX_IOPOLL_H
+#define _LINUX_IOPOLL_H
+
+#include linux/kernel.h
+#include linux/types.h
+#include linux/hrtimer.h
+#include linux/delay.h
+#include linux/errno.h
+#include linux/io.h
+
+/**
+ * readl_poll_timeout - Periodically poll an address until a condition is met 
or a timeout occurs
+ * @addr: Address to poll
+ * @val: Variable to read the value into
+ * @cond: Break condition (usually involving @val)
+ * @sleep_us: Maximum time to sleep between reads in us (0 tight-loops)
+ * @timeout_us: Timeout in us, 0 means never timeout
+ *
+ * Returns 0 on success and -ETIMEDOUT upon a timeout. In either
+ * case, the last read value at @addr is stored in @val. Must not
+ * be called from atomic context if sleep_us or timeout_us are used.
+ */
+#define readl_poll_timeout(addr, val, cond, sleep_us, timeout_us) \
+({ \
+   ktime_t timeout = ktime_add_us(ktime_get(), timeout_us); \
+   might_sleep_if(timeout_us); \
+   for (;;) { \
+   (val) = readl(addr); \
+   if (cond) \
+   break; \
+   if (timeout_us  ktime_compare(ktime_get(), timeout)  0) { \
+   (val) = readl(addr); \
+   break; \
+   } \
+   if (sleep_us) \
+   usleep_range(DIV_ROUND_UP(sleep_us, 4), sleep_us); \
+   } \
+   (cond) ? 0 : -ETIMEDOUT; \
+})
+
+/**
+ * readl_poll_timeout_atomic - Periodically poll an address until a condition 
is met or a timeout occurs
+ * @addr: Address to poll
+ * @val: Variable to read the value into
+ * @cond: Break condition (usually involving @val)
+ * @max_reads: Maximum number of reads before giving up
+ * @time_between_us: Time to udelay() between successive reads
+ *
+ * Returns 0 on success and -ETIMEDOUT upon a timeout.
+ */
+#define readl_poll_timeout_atomic(addr, val, cond, max_reads, time_between_us) 
\
+({ \
+   int count; \
+   for (count = (max_reads); count  0; count--) { \
+   (val) = readl(addr); \
+   if (cond) \
+   break; \
+   udelay(time_between_us); \
+   } \
+   (cond) ? 0 : -ETIMEDOUT; \
+})
+
+#endif /* _LINUX_IOPOLL_H */
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 0/2] iommu/arm-smmu: hard iova_to_phys

2014-09-27 Thread Mitchel Humpherys
This series introduces support for performing iova-to-phys translations via
the ARM SMMU hardware on supported implementations. We also make use of
some new generic macros for polling hardware registers.

v1..v2:

  - Renamed one of the iopoll macros to use the more standard `_atomic'
suffix
  - Removed some convenience iopoll wrappers to encourage explicitness

v2..v3:

  - Remomved unnecessary `dev_name's


Matt Wagantall (1):
  iopoll: Introduce memory-mapped IO polling macros

Mitchel Humpherys (1):
  iommu/arm-smmu: add support for iova_to_phys through ATS1PR

 drivers/iommu/arm-smmu.c | 73 -
 include/linux/iopoll.h   | 77 
 2 files changed, 149 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/iopoll.h

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 2/2] iommu/arm-smmu: add support for iova_to_phys through ATS1PR

2014-09-27 Thread Mitchel Humpherys
Currently, we provide the iommu_ops.iova_to_phys service by doing a
table walk in software to translate IO virtual addresses to physical
addresses. On SMMUs that support it, it can be useful to ask the SMMU
itself to do the translation. This can be used to warm the TLBs for an
SMMU. It can also be useful for testing and hardware validation.

Since the address translation registers are optional on SMMUv2, only
enable hardware translations when using SMMUv1 or when SMMU_IDR0.S1TS=1
and SMMU_IDR0.ATOSNS=0, as described in the ARM SMMU v1-v2 spec.

Signed-off-by: Mitchel Humpherys mitch...@codeaurora.org
---
 drivers/iommu/arm-smmu.c | 73 +++-
 1 file changed, 72 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 37dc3dd0df..934870b593 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -36,6 +36,7 @@
 #include linux/interrupt.h
 #include linux/io.h
 #include linux/iommu.h
+#include linux/iopoll.h
 #include linux/mm.h
 #include linux/module.h
 #include linux/of.h
@@ -140,6 +141,7 @@
 #define ID0_S2TS   (1  29)
 #define ID0_NTS(1  28)
 #define ID0_SMS(1  27)
+#define ID0_ATOSNS (1  26)
 #define ID0_PTFS_SHIFT 24
 #define ID0_PTFS_MASK  0x2
 #define ID0_PTFS_V8_ONLY   0x2
@@ -233,11 +235,17 @@
 #define ARM_SMMU_CB_TTBR0_HI   0x24
 #define ARM_SMMU_CB_TTBCR  0x30
 #define ARM_SMMU_CB_S1_MAIR0   0x38
+#define ARM_SMMU_CB_PAR_LO 0x50
+#define ARM_SMMU_CB_PAR_HI 0x54
 #define ARM_SMMU_CB_FSR0x58
 #define ARM_SMMU_CB_FAR_LO 0x60
 #define ARM_SMMU_CB_FAR_HI 0x64
 #define ARM_SMMU_CB_FSYNR0 0x68
 #define ARM_SMMU_CB_S1_TLBIASID0x610
+#define ARM_SMMU_CB_ATS1PR_LO  0x800
+#define ARM_SMMU_CB_ATS1PR_HI  0x804
+#define ARM_SMMU_CB_ATSR   0x8f0
+#define ATSR_LOOP_TIMEOUT  100 /* 1s! */
 
 #define SCTLR_S1_ASIDPNE   (1  12)
 #define SCTLR_CFCFG(1  7)
@@ -249,6 +257,10 @@
 #define SCTLR_M(1  0)
 #define SCTLR_EAE_SBOP (SCTLR_AFE | SCTLR_TRE)
 
+#define CB_PAR_F   (1  0)
+
+#define ATSR_ACTIVE(1  0)
+
 #define RESUME_RETRY   (0  0)
 #define RESUME_TERMINATE   (1  0)
 
@@ -366,6 +378,7 @@ struct arm_smmu_device {
 #define ARM_SMMU_FEAT_TRANS_S1 (1  2)
 #define ARM_SMMU_FEAT_TRANS_S2 (1  3)
 #define ARM_SMMU_FEAT_TRANS_NESTED (1  4)
+#define ARM_SMMU_FEAT_TRANS_OPS(1  5)
u32 features;
 
 #define ARM_SMMU_OPT_SECURE_CFG_ACCESS (1  0)
@@ -1524,7 +1537,7 @@ static size_t arm_smmu_unmap(struct iommu_domain *domain, 
unsigned long iova,
return ret ? 0 : size;
 }
 
-static phys_addr_t arm_smmu_iova_to_phys(struct iommu_domain *domain,
+static phys_addr_t arm_smmu_iova_to_phys_soft(struct iommu_domain *domain,
 dma_addr_t iova)
 {
pgd_t *pgdp, pgd;
@@ -1557,6 +1570,59 @@ static phys_addr_t arm_smmu_iova_to_phys(struct 
iommu_domain *domain,
return __pfn_to_phys(pte_pfn(pte)) | (iova  ~PAGE_MASK);
 }
 
+static phys_addr_t arm_smmu_iova_to_phys_hard(struct iommu_domain *domain,
+   dma_addr_t iova)
+{
+   struct arm_smmu_domain *smmu_domain = domain-priv;
+   struct arm_smmu_device *smmu = smmu_domain-smmu;
+   struct arm_smmu_cfg *cfg = smmu_domain-cfg;
+   struct device *dev = smmu-dev;
+   void __iomem *cb_base;
+   u32 tmp;
+   u64 phys;
+
+   cb_base = ARM_SMMU_CB_BASE(smmu) + ARM_SMMU_CB(smmu, cfg-cbndx);
+
+   if (smmu-version == 1) {
+   u32 reg = iova  ~0xFFF;
+   writel_relaxed(reg, cb_base + ARM_SMMU_CB_ATS1PR_LO);
+   } else {
+   u32 reg = iova  ~0xFFF;
+   writel_relaxed(reg, cb_base + ARM_SMMU_CB_ATS1PR_LO);
+   reg = (iova  ~0xFFF)  32;
+   writel_relaxed(reg, cb_base + ARM_SMMU_CB_ATS1PR_HI);
+   }
+
+   if (readl_poll_timeout(cb_base + ARM_SMMU_CB_ATSR, tmp,
+   !(tmp  ATSR_ACTIVE), 10, ATSR_LOOP_TIMEOUT)) {
+   dev_err(dev,
+   iova to phys timed out on 0x%pa. Falling back to 
software table walk.\n,
+   iova);
+   return arm_smmu_iova_to_phys_soft(domain, iova);
+   }
+
+   phys = readl_relaxed(cb_base + ARM_SMMU_CB_PAR_LO);
+   phys |= ((u64) readl_relaxed(cb_base + ARM_SMMU_CB_PAR_HI))  32;
+
+   if (phys  CB_PAR_F) {
+   dev_err(dev, translation fault!\n);
+   dev_err(dev, PAR = 0x%llx\n, phys);
+   }
+   phys = (phys

Re: [PATCH 2/2] iommu/arm-smmu: add support for iova_to_phys through ATS1PR

2014-09-26 Thread Mitchel Humpherys
On Fri, Sep 26 2014 at 03:24:30 AM, Will Deacon will.dea...@arm.com wrote:
 On Wed, Sep 24, 2014 at 09:34:26PM +0100, Mitchel Humpherys wrote:
 On Wed, Sep 24 2014 at 09:37:12 AM, Will Deacon will.dea...@arm.com wrote:
  On Wed, Sep 24, 2014 at 02:12:00AM +0100, Mitchel Humpherys wrote:
  On Mon, Sep 22 2014 at 08:26:14 AM, Will Deacon will.dea...@arm.com 
  wrote:
   On Thu, Sep 11, 2014 at 07:30:44PM +0100, Mitchel Humpherys wrote:
   +  return arm_smmu_iova_to_phys_soft(domain, iova);
   +  }
   +
   +  phys = readl_relaxed(cb_base + ARM_SMMU_CB_PAR_LO);
   +  phys |= ((u64) readl_relaxed(cb_base + ARM_SMMU_CB_PAR_HI))  
   32;
   +
   +  if (phys  CB_PAR_F) {
   +  dev_err(dev, translation fault on %s!\n, 
   dev_name(dev));
   +  dev_err(dev, PAR = 0x%llx\n, phys);
   +  }
   +  phys = (phys  0xFFF000ULL) | (iova  0x0FFF);
  
   How does this work for 64k pages?
  
  So at the moment we're always assuming that we're using v7/v8 long
  descriptor format, right?  All I see in the spec (14.5.15 SMMU_CBn_PAR)
  is that bits[47:12]=PA[47:12]...  Or am I missing something completely?
 
  I think you've got 64k pages confused with the short-descriptor format.
 
  When we use 64k pages with long descriptors, you're masked off bits 15-12 
  of
  the iova above, so you'll have a hole in the physical address afaict.
 
 Even with long descriptors the spec says bits 15-12 should come from
 CB_PAR...  It makes no mention of reinterpreting those bits depending on
 the programmed page granule.  The only thing I can conclude from the
 spec is that hardware should be smart enough to do the right thing with
 bits 15-12 when the page granule is 64k.  Although even if hardware is
 smart enough I guess CB_PAR[15:12] should be the same as iova[15:12] for
 the 64k case?

 Yeah, fair enough, the code you have should work correctly then.
 Unfortunately, I don't have any suitable hardware on which to test it.

FWIW, I have tested this on a few platforms here.  I'll send out a v2
for the series then with the changes you suggested on the iopoll patch.


-Mitch

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 2/2] iommu/arm-smmu: add support for iova_to_phys through ATS1PR

2014-09-24 Thread Mitchel Humpherys
On Wed, Sep 24 2014 at 09:37:12 AM, Will Deacon will.dea...@arm.com wrote:
 On Wed, Sep 24, 2014 at 02:12:00AM +0100, Mitchel Humpherys wrote:
 On Mon, Sep 22 2014 at 08:26:14 AM, Will Deacon will.dea...@arm.com wrote:
  On Thu, Sep 11, 2014 at 07:30:44PM +0100, Mitchel Humpherys wrote:
  + return arm_smmu_iova_to_phys_soft(domain, iova);
  + }
  +
  + phys = readl_relaxed(cb_base + ARM_SMMU_CB_PAR_LO);
  + phys |= ((u64) readl_relaxed(cb_base + ARM_SMMU_CB_PAR_HI))  32;
  +
  + if (phys  CB_PAR_F) {
  + dev_err(dev, translation fault on %s!\n, dev_name(dev));
  + dev_err(dev, PAR = 0x%llx\n, phys);
  + }
  + phys = (phys  0xFFF000ULL) | (iova  0x0FFF);
 
  How does this work for 64k pages?
 
 So at the moment we're always assuming that we're using v7/v8 long
 descriptor format, right?  All I see in the spec (14.5.15 SMMU_CBn_PAR)
 is that bits[47:12]=PA[47:12]...  Or am I missing something completely?

 I think you've got 64k pages confused with the short-descriptor format.

 When we use 64k pages with long descriptors, you're masked off bits 15-12 of
 the iova above, so you'll have a hole in the physical address afaict.

Even with long descriptors the spec says bits 15-12 should come from
CB_PAR...  It makes no mention of reinterpreting those bits depending on
the programmed page granule.  The only thing I can conclude from the
spec is that hardware should be smart enough to do the right thing with
bits 15-12 when the page granule is 64k.  Although even if hardware is
smart enough I guess CB_PAR[15:12] should be the same as iova[15:12] for
the 64k case?


-Mitch

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 2/2] iommu/arm-smmu: add support for iova_to_phys through ATS1PR

2014-09-23 Thread Mitchel Humpherys
On Mon, Sep 22 2014 at 08:26:14 AM, Will Deacon will.dea...@arm.com wrote:
 Hi Mitch,

 On Thu, Sep 11, 2014 at 07:30:44PM +0100, Mitchel Humpherys wrote:
 Currently, we provide the iommu_ops.iova_to_phys service by doing a
 table walk in software to translate IO virtual addresses to physical
 addresses. On SMMUs that support it, it can be useful to ask the SMMU
 itself to do the translation. This can be used to warm the TLBs for an
 SMMU. It can also be useful for testing and hardware validation.
 
 Since the address translation registers are optional on SMMUv2, only
 enable hardware translations when using SMMUv1 or when SMMU_IDR0.S1TS=1
 and SMMU_IDR0.ATOSNS=0, as described in the ARM SMMU v1-v2 spec.

 [...]

 +static phys_addr_t arm_smmu_iova_to_phys_hard(struct iommu_domain *domain,
 +dma_addr_t iova)
 +{
 +struct arm_smmu_domain *smmu_domain = domain-priv;
 +struct arm_smmu_device *smmu = smmu_domain-smmu;
 +struct arm_smmu_cfg *cfg = smmu_domain-cfg;
 +struct device *dev = smmu-dev;
 +void __iomem *cb_base;
 +u32 tmp;
 +u64 phys;
 +
 +cb_base = ARM_SMMU_CB_BASE(smmu) + ARM_SMMU_CB(smmu, cfg-cbndx);
 +
 +if (smmu-version == 1) {
 +u32 reg = iova  ~0xFFF;
 +writel_relaxed(reg, cb_base + ARM_SMMU_CB_ATS1PR_LO);
 +} else {
 +u32 reg = iova  ~0xFFF;
 +writel_relaxed(reg, cb_base + ARM_SMMU_CB_ATS1PR_LO);
 +reg = (iova  ~0xFFF)  32;
 +writel_relaxed(reg, cb_base + ARM_SMMU_CB_ATS1PR_HI);
 +}
 +
 +if (readl_poll_timeout(cb_base + ARM_SMMU_CB_ATSR, tmp,
 +!(tmp  ATSR_ACTIVE), 10, ATSR_LOOP_TIMEOUT)) {
 +dev_err(dev,
 +iova to phys timed out on 0x%pa for %s. Falling back 
 to software table walk.\n,
 +iova, dev_name(dev));

 dev_err already prints the device name.

Ah of course.  I'll remove the dev_name.


 +return arm_smmu_iova_to_phys_soft(domain, iova);
 +}
 +
 +phys = readl_relaxed(cb_base + ARM_SMMU_CB_PAR_LO);
 +phys |= ((u64) readl_relaxed(cb_base + ARM_SMMU_CB_PAR_HI))  32;
 +
 +if (phys  CB_PAR_F) {
 +dev_err(dev, translation fault on %s!\n, dev_name(dev));
 +dev_err(dev, PAR = 0x%llx\n, phys);
 +}
 +phys = (phys  0xFFF000ULL) | (iova  0x0FFF);

 How does this work for 64k pages?

So at the moment we're always assuming that we're using v7/v8 long
descriptor format, right?  All I see in the spec (14.5.15 SMMU_CBn_PAR)
is that bits[47:12]=PA[47:12]...  Or am I missing something completely?

As a mental note, if we add support for v7 short descriptors (which we
would like to do sometime soon) then we'll have to handle the
supersection case here as well.


-Mitch

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] iommu/arm-smmu: fix bug in pmd construction

2014-09-19 Thread Mitchel Humpherys
We are using the same pfn for every pte we create while constructing the
pmd. Fix this by actually updating the pfn on each iteration of the pmd
construction loop.

It's not clear if we can actually hit this bug right now since iommu_map
splits up the calls to .map based on the page size, so we only ever seem to
iterate this loop once. However, things might change in the future that
might cause us to hit this.

Signed-off-by: Mitchel Humpherys mitch...@codeaurora.org
---
Will, I was unable to come up with a test case to hit this bug based on
what I said in the commit message above. Not sure if my analysis is
completely off base, my head is still spinning from all these page tables
:).
---
 drivers/iommu/arm-smmu.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index ca18d6d42a..eba4cb390c 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1368,6 +1368,7 @@ static int arm_smmu_alloc_init_pmd(struct arm_smmu_device 
*smmu, pud_t *pud,
ret = arm_smmu_alloc_init_pte(smmu, pmd, addr, next, pfn,
  prot, stage);
phys += next - addr;
+   pfn = __phys_to_pfn(phys);
} while (pmd++, addr = next, addr  end);
 
return ret;
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 1/2] iommu: add IOMMU_PRIV flag for access-protected mappings

2014-09-17 Thread Mitchel Humpherys
Some IOMMUs support access-protected mappings. Add a mapping flag to
indicate that the mapping should be created with access protection
configured.

Cc: Shubhraprakash Das sa...@codeaurora.org
Signed-off-by: Mitchel Humpherys mitch...@codeaurora.org
---
 include/linux/iommu.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 20f9a52792..44101c9332 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -28,6 +28,7 @@
 #define IOMMU_WRITE(1  1)
 #define IOMMU_CACHE(1  2) /* DMA cache coherency */
 #define IOMMU_EXEC (1  3)
+#define IOMMU_PRIV (1  4)
 
 struct iommu_ops;
 struct iommu_group;
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 0/2] Add access-protected IOMMU mappings

2014-09-17 Thread Mitchel Humpherys
This series introduces a new mapping flag to indicate that the mapping
should be created with access protection applied. Support for this new flag
is then added to the ARM SMMU driver.

Mitchel Humpherys (2):
  iommu: add IOMMU_PRIV flag for access-protected mappings
  iommu/arm-smmu: add support for access-protected mappings

 drivers/iommu/arm-smmu.c | 5 +++--
 include/linux/iommu.h| 1 +
 2 files changed, 4 insertions(+), 2 deletions(-)

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 2/2] iommu/arm-smmu: add support for access-protected mappings

2014-09-17 Thread Mitchel Humpherys
ARM SMMUs support memory access control via some bits in the translation
table descriptor memory attributes. Currently we assume all translations
are unprivileged. Add support for privileged mappings, controlled by
the IOMMU_PRIV prot flag.

Also sneak in a whitespace change for consistency with nearby code.

Signed-off-by: Mitchel Humpherys mitch...@codeaurora.org
---
 drivers/iommu/arm-smmu.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index ca18d6d42a..93999ec22c 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1256,10 +1256,11 @@ static int arm_smmu_alloc_init_pte(struct 
arm_smmu_device *smmu, pmd_t *pmd,
}
 
if (stage == 1) {
-   pteval |= ARM_SMMU_PTE_AP_UNPRIV | ARM_SMMU_PTE_nG;
+   pteval |= ARM_SMMU_PTE_nG;
+   if (!(prot  IOMMU_PRIV))
+   pteval |= ARM_SMMU_PTE_AP_UNPRIV;
if (!(prot  IOMMU_WRITE)  (prot  IOMMU_READ))
pteval |= ARM_SMMU_PTE_AP_RDONLY;
-
if (prot  IOMMU_CACHE)
pteval |= (MAIR_ATTR_IDX_CACHE 
   ARM_SMMU_PTE_ATTRINDX_SHIFT);
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/6] iommu/arm-smmu: add support for specifying clocks

2014-09-15 Thread Mitchel Humpherys
On Wed, Sep 10 2014 at 12:09:06 PM, Mitchel Humpherys mitch...@codeaurora.org 
wrote:
 On Wed, Sep 10 2014 at 11:27:39 AM, Will Deacon will.dea...@arm.com wrote:
 On Wed, Sep 10, 2014 at 02:29:42AM +0100, Mitchel Humpherys wrote:
 On Tue, Aug 26 2014 at 07:27:58 AM, Will Deacon will.dea...@arm.com wrote:
  On Tue, Aug 19, 2014 at 08:03:09PM +0100, Olav Haugan wrote:
  Clients of the SMMU driver are required to vote for clocks and power
  when they know they need to use the SMMU. However, the clock and power
  needed to be on for the SMMU to service bus masters aren't necessarily
  the same as the ones needed to read/write registers...See below.
 
  The case I'm thinking of is where a device masters through the IOMMU, but
  doesn't make use of any translations. In this case, its transactions will
  bypass the SMMU and I want to ensure that continues to happen, regardless 
  of
  the power state of the SMMU.
 
 Then I assume the driver for such a device wouldn't be attaching to (or
 detaching from) the IOMMU, so we won't be touching it at all either
 way. Or am I missing something?

 As long as its only the register file that gets powered down, then there's
 no issue. However, that's making assumptions about what these clocks are
 controlling. Is there a way for the driver to know which aspects of the
 device are controlled by which clock?

 Yes, folks should only be putting config clocks here. In our system,
 at least, the clocks for configuring the SMMU are different than those
 for using it. Maybe I should make a note about what kinds of clocks
 can be specified here in the bindings (i.e. only those that can be
 safely disabled while still allowing translations to occur)?

Let me amend this statement slightly.  Folks should be putting all
clocks necessary to program SMMU registers here.  On our system, this
actually does include the core clocks in addition to the config
clocks.  Clients won't vote for config clocks since they have no
business programming SMMU registers, so those will get shut down when we
remove our vote for them.  Clients *should* hold their votes for core
clocks for as long as they want to use the SMMU.  Also, for the bypass
case, clients should be voting for clocks and power for the SMMU
themselves.

In light of all this I guess there isn't really anything to say in the
DT bindings.


-Mitch

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


  1   2   >