intel-iommu: Is this a bug?

2018-03-26 Thread Tushar Dave

Hi,

I am analyzing network performance with intel-iommu enabled.
I found that when running with iommu=pt, every dma map/unmap
executes this code:

	/*
	 * At boot time, we don't yet know if devices will be 64-bit capable.
	 * Assume that they will — if they turn out not to be, then we can
	 * take them out of the 1:1 domain later.
	 */
	if (!startup) {
		/*
		 * If the device's dma_mask is less than the system's memory
		 * size then this is not a candidate for identity mapping.
		 */
		u64 dma_mask = *dev->dma_mask;

		if (dev->coherent_dma_mask &&
		    dev->coherent_dma_mask < dma_mask)
			dma_mask = dev->coherent_dma_mask;

		return dma_mask >= dma_get_required_mask(dev);
	}


Do we really need this check on every dma map/unmap?
Considering it should only be needed during startup, shouldn't it be:

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 582fd01..3c8f14e 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2929,7 +2929,7 @@ static int iommu_should_identity_map(struct device *dev, int startup)
 	 * Assume that they will — if they turn out not to be, then we can
 	 * take them out of the 1:1 domain later.
 	 */
-	if (!startup) {
+	if (startup) {
 		/*
 		 * If the device's dma_mask is less than the system's memory
 		 * size then this is not a candidate for identity mapping.


Thanks.

-Tushar
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v3 0/5] Add debugfs info for the AMD IOMMU

2018-03-26 Thread Gary R Hook

On 03/15/2018 08:58 AM, Joerg Roedel wrote:

On Wed, Mar 14, 2018 at 06:04:44PM -0500, Gary R Hook wrote:

Gary R Hook (5):
   iommu/amd - Add debugfs support
   iommu/amd - Add a 'verbose' switch for IOMMU debugfs
   iommu/amd - Add a README variable for the IOMMU debugfs
   iommu/amd - Expose the active IOMMU device table entries
   iommu/amd - Add a debugfs entry to specify a IOMMU device table entry


Same problem here as with the Intel patches: I don't think it's a good
idea to reveal internal iommu data structures to user-space in this way.


I'm afraid I'm not following the line of thought here. AFAIK, if we 
restrict access of any debugfs info to root, then only root can see any 
of that information. If root is compromised, the whole system is 
compromised. Which seems to me to make all of debugfs suspect from a 
security perspective. Therefore, I'm likely not understanding the 
security concerns as they relate specifically to the IOMMU.


As for stability, you mention on the Intel IOMMU thread issues 
surrounding kABI, which I appreciate. But I'm exposing well-documented 
device structures in my patches, not kernel structures. There's nothing 
there that is going to change underneath me without my knowledge, if 
ever (I vote for never). And I'm likely ignorant about policy nuances 
surrounding the kernel ABI.



I've debugged iommu issues for around 10 years now and never had the
need for an interface that reveals those internals. How exactly are you
planning to use this information?


Well, I've already had to dig into the DTEs and page tables for a couple 
of reasons (both debug and development), and it made life easier (for 
me) to make the live data available this way. Then I thought that others 
might be interested as well. Admittedly, I could be wrong.


Finally, I'm no guru, and likely am unaware of other techniques in which 
I should develop skills. I'm always open to input.



Re: [PATCH] iommu/ipmmu-vmsa: Document R-Car M3-N IPMMU DT bindings

2018-03-26 Thread Rob Herring
On Tue, Mar 20, 2018 at 03:48:11PM +0900, Magnus Damm wrote:
> From: Magnus Damm 
> 
> Update the IPMMU DT binding documentation to include the r8a77965 compat
> string for the IPMMU devices included in the R-Car M3-N SoC.
> 
> Signed-off-by: Magnus Damm 
> ---
> 
>  Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.txt |1 +
>  1 file changed, 1 insertion(+)

Reviewed-by: Rob Herring 


Re: [PATCH] swiotlb: swiotlb_{alloc,free}_buffer should depend on CONFIG_DMA_DIRECT_OPS

2018-03-26 Thread Christoph Hellwig
On Sat, Mar 24, 2018 at 04:05:45PM -0400, Konrad Rzeszutek Wilk wrote:
> > > > Otherwise we might get unused symbol warnings for configs that built
> > > > swiotlb.c only for use by xen-swiotlb.c and that don't otherwise select
> > > > CONFIG_DMA_DIRECT_OPS, which is possible on arm.
> > > > 
> > > > Fixes: 16e73adbca76 ("dma/swiotlb: Remove 
> > > > swiotlb_{alloc,free}_coherent()")
> > > > Reported-by: Stephen Rothwell 
> > > > Signed-off-by: Christoph Hellwig 
> > > 
> > > 
> > > Alternatively could we set the Kconfig to select DMA_DIRECT_OPS?
> > 
> > 
> > IFF we build swiotlb.c only for xen-swiotlb we don't need DMA_DIRECT_OPS.
> 
> I don't think there is ever a case where you want a Xen-specific build.

arm never uses swiotlb directly, so any swiotlb.c build on arm is purely
for xen-swiotlb.c


Re: Is adding offset to dma address is valid operation?

2018-03-26 Thread Robin Murphy

On 26/03/18 13:31, Harsh Jain wrote:



On 26-03-2018 17:15, Robin Murphy wrote:

Hi Harsh,

On 26/03/18 10:55, Harsh Jain wrote:

Hi,

Can we add an offset to the dma address received from the IOMMU in a
scatter/gather list before sending it to H/W?

Address to HW = sg_dma_address(sg) + offset, where 0 < offset <
sg_dma_len(sg).


sg_dma_address() should already account for sg->offset (i.e. the
start of the buffer within the page). If there's a DMA API
implementation failing to respect that then that's a bug.


I need this operation to make sure our H/W does not receive an
entry of size greater than 2K.


If however it's just that you want to fragment a single SG entry
into multiple commaneds/descriptors/etc. internally to your driver,
then I can't see that being an issue as long as you don't visibly
modify the scatterlist itself. If there's a genuine length and/or
alignment limitation in the hardware, though, it would be better to
set up dev->dma_parms appropriately to describe them, such that the
subsystem responsible for generating the scatterlist respects the
appropriate constraints in the first place (and if it doesn't, then
that's another bug to fix).

Can dma_parms handle a 2K size limit? Because dma_map_sg() returns <=
the passed nents, but in my case nents would increase.


The point is that dma_map_sg() should still never increase nents, 
because the guy *creating* the scatterlist is also expected to respect 
dma_parms where it exists, such that nents would be appropriate to start 
with. That said, it does look like there are some places (e.g. the block 
layer) which may not cope properly with limits less than PAGE_SIZE, so I 
guess the reliability of dma_parms might depend on how much you want to 
go spelunking in core code :/


Robin.


[PATCH v3 7/7] iommu/arm-smmu-v3: Support 52-bit virtual address

2018-03-26 Thread Robin Murphy
Stage 1 input addresses are effectively 64-bit in SMMUv3 anyway, so
really all that's involved is letting io-pgtable know the appropriate
upper bound for T0SZ.

Signed-off-by: Robin Murphy 
---

v3: Fix IDR5.VAX being a 2-bit field, and slight reorganisation of the
stage 1 IAS logic. Since constraining to VA_BITS turns out to be an
SMMUv3-specific behaviour, and would introduce an undesirable arm64
dependency to io-pgtable, I ended up leaving it here.

 drivers/iommu/arm-smmu-v3.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index e0d46661c153..b0e23d6bd1ef 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -92,6 +92,8 @@
 #define IDR5_OAS_44_BIT4
 #define IDR5_OAS_48_BIT5
 #define IDR5_OAS_52_BIT6
+#define IDR5_VAX   GENMASK(11, 10)
+#define IDR5_VAX_52_BIT1
 
 #define ARM_SMMU_CR0   0x20
 #define CR0_CMDQEN (1 << 3)
@@ -551,6 +553,7 @@ struct arm_smmu_device {
 #define ARM_SMMU_FEAT_STALLS   (1 << 11)
 #define ARM_SMMU_FEAT_HYP  (1 << 12)
 #define ARM_SMMU_FEAT_STALL_FORCE  (1 << 13)
+#define ARM_SMMU_FEAT_VAX  (1 << 14)
u32 features;
 
 #define ARM_SMMU_OPT_SKIP_PREFETCH (1 << 0)
@@ -1591,7 +1594,8 @@ static int arm_smmu_domain_finalise(struct iommu_domain 
*domain)
 
switch (smmu_domain->stage) {
case ARM_SMMU_DOMAIN_S1:
-   ias = VA_BITS;
+   ias = (smmu->features & ARM_SMMU_FEAT_VAX) ? 52 : 48;
+   ias = min_t(unsigned long, ias, VA_BITS);
oas = smmu->ias;
fmt = ARM_64_LPAE_S1;
finalise_stage_fn = arm_smmu_domain_finalise_s1;
@@ -2634,6 +2638,10 @@ static int arm_smmu_device_hw_probe(struct 
arm_smmu_device *smmu)
if (reg & IDR5_GRAN4K)
smmu->pgsize_bitmap |= SZ_4K | SZ_2M | SZ_1G;
 
+   /* Input address size */
+   if (FIELD_GET(IDR5_VAX, reg) == IDR5_VAX_52_BIT)
+   smmu->features |= ARM_SMMU_FEAT_VAX;
+
/* Output address size */
switch (FIELD_GET(IDR5_OAS, reg)) {
case IDR5_OAS_32_BIT:
-- 
2.16.1.dirty



[PATCH v3 3/7] iommu/arm-smmu-v3: Clean up table definitions

2018-03-26 Thread Robin Murphy
As with registers, use GENMASK and the bitfield accessors consistently
for table fields, to save some lines and ease maintenance a little. This
also catches a subtle off-by-one wherein bit 5 of CD.T0SZ was missing.

Signed-off-by: Robin Murphy 
---

v3: New

 drivers/iommu/arm-smmu-v3.c | 147 +---
 1 file changed, 58 insertions(+), 89 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 40a19ce03f99..b1dc7d7cbbb5 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -207,54 +207,46 @@
 #define STRTAB_SPLIT   8
 
 #define STRTAB_L1_DESC_DWORDS  1
-#define STRTAB_L1_DESC_SPAN_SHIFT  0
-#define STRTAB_L1_DESC_SPAN_MASK   0x1fUL
+#define STRTAB_L1_DESC_SPANGENMASK_ULL(4, 0)
 #define STRTAB_L1_DESC_L2PTR_MASK  GENMASK_ULL(47, 6)
 
 #define STRTAB_STE_DWORDS  8
 #define STRTAB_STE_0_V (1UL << 0)
-#define STRTAB_STE_0_CFG_SHIFT 1
-#define STRTAB_STE_0_CFG_MASK  0x7UL
-#define STRTAB_STE_0_CFG_ABORT (0UL << STRTAB_STE_0_CFG_SHIFT)
-#define STRTAB_STE_0_CFG_BYPASS(4UL << STRTAB_STE_0_CFG_SHIFT)
-#define STRTAB_STE_0_CFG_S1_TRANS  (5UL << STRTAB_STE_0_CFG_SHIFT)
-#define STRTAB_STE_0_CFG_S2_TRANS  (6UL << STRTAB_STE_0_CFG_SHIFT)
+#define STRTAB_STE_0_CFG   GENMASK_ULL(3, 1)
+#define STRTAB_STE_0_CFG_ABORT 0
+#define STRTAB_STE_0_CFG_BYPASS4
+#define STRTAB_STE_0_CFG_S1_TRANS  5
+#define STRTAB_STE_0_CFG_S2_TRANS  6
 
-#define STRTAB_STE_0_S1FMT_SHIFT   4
-#define STRTAB_STE_0_S1FMT_LINEAR  (0UL << STRTAB_STE_0_S1FMT_SHIFT)
+#define STRTAB_STE_0_S1FMT GENMASK_ULL(5, 4)
+#define STRTAB_STE_0_S1FMT_LINEAR  0
 #define STRTAB_STE_0_S1CTXPTR_MASK GENMASK_ULL(47, 6)
-#define STRTAB_STE_0_S1CDMAX_SHIFT 59
-#define STRTAB_STE_0_S1CDMAX_MASK  0x1fUL
+#define STRTAB_STE_0_S1CDMAX   GENMASK_ULL(63, 59)
 
 #define STRTAB_STE_1_S1C_CACHE_NC  0UL
 #define STRTAB_STE_1_S1C_CACHE_WBRA1UL
 #define STRTAB_STE_1_S1C_CACHE_WT  2UL
 #define STRTAB_STE_1_S1C_CACHE_WB  3UL
-#define STRTAB_STE_1_S1C_SH_NSH0UL
-#define STRTAB_STE_1_S1C_SH_OSH2UL
-#define STRTAB_STE_1_S1C_SH_ISH3UL
-#define STRTAB_STE_1_S1CIR_SHIFT   2
-#define STRTAB_STE_1_S1COR_SHIFT   4
-#define STRTAB_STE_1_S1CSH_SHIFT   6
+#define STRTAB_STE_1_S1CIR GENMASK_ULL(3, 2)
+#define STRTAB_STE_1_S1COR GENMASK_ULL(5, 4)
+#define STRTAB_STE_1_S1CSH GENMASK_ULL(7, 6)
 
 #define STRTAB_STE_1_S1STALLD  (1UL << 27)
 
+#define STRTAB_STE_1_EATS  GENMASK_ULL(29, 28)
 #define STRTAB_STE_1_EATS_ABT  0UL
 #define STRTAB_STE_1_EATS_TRANS1UL
 #define STRTAB_STE_1_EATS_S1CHK2UL
-#define STRTAB_STE_1_EATS_SHIFT28
 
+#define STRTAB_STE_1_STRW  GENMASK_ULL(31, 30)
 #define STRTAB_STE_1_STRW_NSEL10UL
 #define STRTAB_STE_1_STRW_EL2  2UL
-#define STRTAB_STE_1_STRW_SHIFT30
 
+#define STRTAB_STE_1_SHCFG GENMASK_ULL(45, 44)
 #define STRTAB_STE_1_SHCFG_INCOMING1UL
-#define STRTAB_STE_1_SHCFG_SHIFT   44
 
-#define STRTAB_STE_2_S2VMID_SHIFT  0
-#define STRTAB_STE_2_S2VMID_MASK   0xUL
-#define STRTAB_STE_2_VTCR_SHIFT32
-#define STRTAB_STE_2_VTCR_MASK 0x7UL
+#define STRTAB_STE_2_S2VMIDGENMASK_ULL(15, 0)
+#define STRTAB_STE_2_VTCR  GENMASK_ULL(50, 32)
 #define STRTAB_STE_2_S2AA64(1UL << 51)
 #define STRTAB_STE_2_S2ENDI(1UL << 52)
 #define STRTAB_STE_2_S2PTW (1UL << 54)
@@ -264,56 +256,41 @@
 
 /* Context descriptor (stage-1 only) */
 #define CTXDESC_CD_DWORDS  8
-#define CTXDESC_CD_0_TCR_T0SZ_SHIFT0
-#define ARM64_TCR_T0SZ_SHIFT   0
-#define ARM64_TCR_T0SZ_MASK0x1fUL
-#define CTXDESC_CD_0_TCR_TG0_SHIFT 6
-#define ARM64_TCR_TG0_SHIFT14
-#define ARM64_TCR_TG0_MASK 0x3UL
-#define CTXDESC_CD_0_TCR_IRGN0_SHIFT   8
-#define ARM64_TCR_IRGN0_SHIFT  8
-#define ARM64_TCR_IRGN0_MASK   0x3UL
-#define CTXDESC_CD_0_TCR_ORGN0_SHIFT   10
-#define ARM64_TCR_ORGN0_SHIFT  10
-#define ARM64_TCR_ORGN0_MASK   0x3UL
-#define CTXDESC_CD_0_TCR_SH0_SHIFT 12
-#define ARM64_TCR_SH0_SHIFT12
-#define ARM64_TCR_SH0_MASK 0x3UL
-#define CTXDESC_CD_0_TCR_EPD0_SHIFT14
-#define ARM64_TCR_EPD0_SHIFT   7
-#define ARM64_TCR_EPD0_MASK0x1UL
-#define CTXDESC_CD_0_TCR_EPD1_SHIFT30
-#define ARM64_TCR_EPD1_SHIFT   23
-#define ARM64_TCR_EPD1_MASK0x1UL
+#define CTXDESC_CD_0_TCR_T0SZ  GENMASK_ULL(5, 0)
+#define ARM64_TCR_T0SZ GENMASK_ULL(5, 0)
+#define CTXDESC_CD_0_TCR_TG0  

[PATCH v3 6/7] iommu/arm-smmu-v3: Support 52-bit physical address

2018-03-26 Thread Robin Murphy
Implement SMMUv3.1 support for 52-bit physical addresses. Since a 52-bit
OAS implies 64KB translation granule support, permitting level 1 block
entries there is simple, and the rest is just extending address fields.

Tested-by: Nate Watterson 
Signed-off-by: Robin Murphy 
---

v3: No change

 drivers/iommu/arm-smmu-v3.c | 35 ---
 1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 4e0ffda217f6..e0d46661c153 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -91,6 +91,7 @@
 #define IDR5_OAS_42_BIT3
 #define IDR5_OAS_44_BIT4
 #define IDR5_OAS_48_BIT5
+#define IDR5_OAS_52_BIT6
 
 #define ARM_SMMU_CR0   0x20
 #define CR0_CMDQEN (1 << 3)
@@ -147,7 +148,7 @@
 
 #define ARM_SMMU_STRTAB_BASE   0x80
 #define STRTAB_BASE_RA (1UL << 62)
-#define STRTAB_BASE_ADDR_MASK  GENMASK_ULL(47, 6)
+#define STRTAB_BASE_ADDR_MASK  GENMASK_ULL(51, 6)
 
 #define ARM_SMMU_STRTAB_BASE_CFG   0x88
 #define STRTAB_BASE_CFG_FMTGENMASK(17, 16)
@@ -175,7 +176,7 @@
 #define ARM_SMMU_PRIQ_IRQ_CFG2 0xdc
 
 /* Common MSI config fields */
-#define MSI_CFG0_ADDR_MASK GENMASK_ULL(47, 2)
+#define MSI_CFG0_ADDR_MASK GENMASK_ULL(51, 2)
 #define MSI_CFG2_SHGENMASK(5, 4)
 #define MSI_CFG2_MEMATTR   GENMASK(3, 0)
 
@@ -194,7 +195,7 @@
 Q_IDX(q, p) * (q)->ent_dwords)
 
 #define Q_BASE_RWA (1UL << 62)
-#define Q_BASE_ADDR_MASK   GENMASK_ULL(47, 5)
+#define Q_BASE_ADDR_MASK   GENMASK_ULL(51, 5)
 #define Q_BASE_LOG2SIZEGENMASK(4, 0)
 
 /*
@@ -209,7 +210,7 @@
 
 #define STRTAB_L1_DESC_DWORDS  1
 #define STRTAB_L1_DESC_SPANGENMASK_ULL(4, 0)
-#define STRTAB_L1_DESC_L2PTR_MASK  GENMASK_ULL(47, 6)
+#define STRTAB_L1_DESC_L2PTR_MASK  GENMASK_ULL(51, 6)
 
 #define STRTAB_STE_DWORDS  8
 #define STRTAB_STE_0_V (1UL << 0)
@@ -221,7 +222,7 @@
 
 #define STRTAB_STE_0_S1FMT GENMASK_ULL(5, 4)
 #define STRTAB_STE_0_S1FMT_LINEAR  0
-#define STRTAB_STE_0_S1CTXPTR_MASK GENMASK_ULL(47, 6)
+#define STRTAB_STE_0_S1CTXPTR_MASK GENMASK_ULL(51, 6)
 #define STRTAB_STE_0_S1CDMAX   GENMASK_ULL(63, 59)
 
 #define STRTAB_STE_1_S1C_CACHE_NC  0UL
@@ -253,7 +254,7 @@
 #define STRTAB_STE_2_S2PTW (1UL << 54)
 #define STRTAB_STE_2_S2R   (1UL << 58)
 
-#define STRTAB_STE_3_S2TTB_MASKGENMASK_ULL(47, 4)
+#define STRTAB_STE_3_S2TTB_MASKGENMASK_ULL(51, 4)
 
 /* Context descriptor (stage-1 only) */
 #define CTXDESC_CD_DWORDS  8
@@ -287,7 +288,7 @@
 #define CTXDESC_CD_0_ASET  (1UL << 47)
 #define CTXDESC_CD_0_ASID  GENMASK_ULL(63, 48)
 
-#define CTXDESC_CD_1_TTB0_MASK GENMASK_ULL(47, 4)
+#define CTXDESC_CD_1_TTB0_MASK GENMASK_ULL(51, 4)
 
 /* Convert between AArch64 (CPU) TCR format and SMMU CD format */
 #define ARM_SMMU_TCR2CD(tcr, fld)  FIELD_PREP(CTXDESC_CD_0_TCR_##fld, \
@@ -317,7 +318,7 @@
 #define CMDQ_TLBI_0_ASID   GENMASK_ULL(63, 48)
 #define CMDQ_TLBI_1_LEAF   (1UL << 0)
 #define CMDQ_TLBI_1_VA_MASKGENMASK_ULL(63, 12)
-#define CMDQ_TLBI_1_IPA_MASK   GENMASK_ULL(47, 12)
+#define CMDQ_TLBI_1_IPA_MASK   GENMASK_ULL(51, 12)
 
 #define CMDQ_PRI_0_SSIDGENMASK_ULL(31, 12)
 #define CMDQ_PRI_0_SID GENMASK_ULL(63, 32)
@@ -331,7 +332,7 @@
 #define CMDQ_SYNC_0_MSHGENMASK_ULL(23, 22)
 #define CMDQ_SYNC_0_MSIATTRGENMASK_ULL(27, 24)
 #define CMDQ_SYNC_0_MSIDATAGENMASK_ULL(63, 32)
-#define CMDQ_SYNC_1_MSIADDR_MASK   GENMASK_ULL(47, 2)
+#define CMDQ_SYNC_1_MSIADDR_MASK   GENMASK_ULL(51, 2)
 
 /* Event queue */
 #define EVTQ_ENT_DWORDS4
@@ -1622,7 +1623,7 @@ static int arm_smmu_domain_finalise(struct iommu_domain 
*domain)
return -ENOMEM;
 
domain->pgsize_bitmap = pgtbl_cfg.pgsize_bitmap;
-   domain->geometry.aperture_end = (1UL << ias) - 1;
+   domain->geometry.aperture_end = (1UL << pgtbl_cfg.ias) - 1;
domain->geometry.force_aperture = true;
 
ret = finalise_stage_fn(smmu_domain, _cfg);
@@ -2633,11 +2634,6 @@ static int arm_smmu_device_hw_probe(struct 
arm_smmu_device *smmu)
if (reg & IDR5_GRAN4K)
smmu->pgsize_bitmap |= SZ_4K | SZ_2M | SZ_1G;
 
-   if (arm_smmu_ops.pgsize_bitmap == -1UL)
-   arm_smmu_ops.pgsize_bitmap = smmu->pgsize_bitmap;
-   else
-   arm_smmu_ops.pgsize_bitmap |= 

[PATCH v3 5/7] iommu/io-pgtable-arm: Support 52-bit physical address

2018-03-26 Thread Robin Murphy
Bring io-pgtable-arm in line with the ARMv8.2-LPA feature allowing
52-bit physical addresses when using the 64KB translation granule.
This will be supported by SMMUv3.1.

Tested-by: Nate Watterson 
Signed-off-by: Robin Murphy 
---

v3: 64K-granule-specific logic in iopte_to_paddr() makes it much more
symmetrical with paddr_to_iopte() and generates shorter code.

I haven't implemented explicit input masking in paddr_to_iopte()
since I realised that if we did ever get an out-of-range address
there, it would only make the difference between silently mapping
the wrong address and silently mapping a slightly different wrong
address.

I also went to add a comment about io_pgtable_alloc() updating the
cfg, only to find that it's been documented as such from the start.

 drivers/iommu/io-pgtable-arm.c | 67 ++
 1 file changed, 49 insertions(+), 18 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 51e5c43caed1..a5be4c92c5c8 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -21,6 +21,7 @@
 #define pr_fmt(fmt)"arm-lpae io-pgtable: " fmt
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -32,7 +33,7 @@
 
 #include "io-pgtable.h"
 
-#define ARM_LPAE_MAX_ADDR_BITS 48
+#define ARM_LPAE_MAX_ADDR_BITS 52
 #define ARM_LPAE_S2_MAX_CONCAT_PAGES   16
 #define ARM_LPAE_MAX_LEVELS4
 
@@ -86,6 +87,8 @@
 #define ARM_LPAE_PTE_TYPE_TABLE3
 #define ARM_LPAE_PTE_TYPE_PAGE 3
 
+#define ARM_LPAE_PTE_ADDR_MASK GENMASK_ULL(47,12)
+
 #define ARM_LPAE_PTE_NSTABLE   (((arm_lpae_iopte)1) << 63)
 #define ARM_LPAE_PTE_XN(((arm_lpae_iopte)3) << 53)
 #define ARM_LPAE_PTE_AF(((arm_lpae_iopte)1) << 10)
@@ -159,6 +162,7 @@
 #define ARM_LPAE_TCR_PS_42_BIT 0x3ULL
 #define ARM_LPAE_TCR_PS_44_BIT 0x4ULL
 #define ARM_LPAE_TCR_PS_48_BIT 0x5ULL
+#define ARM_LPAE_TCR_PS_52_BIT 0x6ULL
 
 #define ARM_LPAE_MAIR_ATTR_SHIFT(n)((n) << 3)
 #define ARM_LPAE_MAIR_ATTR_MASK0xff
@@ -170,9 +174,7 @@
 #define ARM_LPAE_MAIR_ATTR_IDX_DEV 2
 
 /* IOPTE accessors */
-#define iopte_deref(pte,d) \
-   (__va((pte) & ((1ULL << ARM_LPAE_MAX_ADDR_BITS) - 1)\
-   & ~(ARM_LPAE_GRANULE(d) - 1ULL)))
+#define iopte_deref(pte,d) __va(iopte_to_paddr(pte, d))
 
 #define iopte_type(pte,l)  \
(((pte) >> ARM_LPAE_PTE_TYPE_SHIFT) & ARM_LPAE_PTE_TYPE_MASK)
@@ -184,12 +186,6 @@
(iopte_type(pte,l) == ARM_LPAE_PTE_TYPE_PAGE) : \
(iopte_type(pte,l) == ARM_LPAE_PTE_TYPE_BLOCK))
 
-#define iopte_to_pfn(pte,d)\
-   (((pte) & ((1ULL << ARM_LPAE_MAX_ADDR_BITS) - 1)) >> (d)->pg_shift)
-
-#define pfn_to_iopte(pfn,d)\
-   (((pfn) << (d)->pg_shift) & ((1ULL << ARM_LPAE_MAX_ADDR_BITS) - 1))
-
 struct arm_lpae_io_pgtable {
struct io_pgtable   iop;
 
@@ -203,6 +199,27 @@ struct arm_lpae_io_pgtable {
 
 typedef u64 arm_lpae_iopte;
 
+static arm_lpae_iopte paddr_to_iopte(phys_addr_t paddr,
+struct arm_lpae_io_pgtable *data)
+{
+   arm_lpae_iopte pte = paddr;
+
+   /* Of the bits which overlap, either 51:48 or 15:12 are always RES0 */
+   return (pte | (pte >> (48 - 12))) & ARM_LPAE_PTE_ADDR_MASK;
+}
+
+static phys_addr_t iopte_to_paddr(arm_lpae_iopte pte,
+ struct arm_lpae_io_pgtable *data)
+{
+   phys_addr_t paddr = pte & ARM_LPAE_PTE_ADDR_MASK;
+
+   if (data->pg_shift < 16)
+   return paddr;
+
+   /* Rotate the packed high-order bits back to the top */
+   return (paddr | (paddr << (48 - 12))) & (ARM_LPAE_PTE_ADDR_MASK << 4);
+}
+
 static bool selftest_running = false;
 
 static dma_addr_t __arm_lpae_dma_addr(void *pages)
@@ -287,7 +304,7 @@ static void __arm_lpae_init_pte(struct arm_lpae_io_pgtable 
*data,
pte |= ARM_LPAE_PTE_TYPE_BLOCK;
 
pte |= ARM_LPAE_PTE_AF | ARM_LPAE_PTE_SH_IS;
-   pte |= pfn_to_iopte(paddr >> data->pg_shift, data);
+   pte |= paddr_to_iopte(paddr, data);
 
__arm_lpae_set_pte(ptep, pte, >iop.cfg);
 }
@@ -528,7 +545,7 @@ static int arm_lpae_split_blk_unmap(struct 
arm_lpae_io_pgtable *data,
if (size == split_sz)
unmap_idx = ARM_LPAE_LVL_IDX(iova, lvl, data);
 
-   blk_paddr = iopte_to_pfn(blk_pte, data) << data->pg_shift;
+   blk_paddr = iopte_to_paddr(blk_pte, data);
pte = iopte_prot(blk_pte);
 
for (i = 0; i < tablesz / sizeof(pte); i++, blk_paddr += split_sz) {
@@ -652,12 +669,13 @@ static phys_addr_t arm_lpae_iova_to_phys(struct 
io_pgtable_ops *ops,
 
 found_translation:
iova &= 

[PATCH v3 4/7] iommu/arm-smmu-v3: Clean up queue definitions

2018-03-26 Thread Robin Murphy
As with registers and tables, use GENMASK and the bitfield accessors
consistently for queue fields, to save some lines and ease maintenance
a little. This now leaves everything in a nice state where all named
field definitions expect to be used with bitfield accessors (although
since single-bit fields can still be used directly we leave some of
those uses as-is to avoid unnecessary churn), while the few remaining
*_MASK definitions apply exclusively to in-place values.

Signed-off-by: Robin Murphy 
---

v3: New

 drivers/iommu/arm-smmu-v3.c | 126 +++-
 1 file changed, 54 insertions(+), 72 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index b1dc7d7cbbb5..4e0ffda217f6 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -184,6 +184,7 @@
 #define ARM_SMMU_SH_OSH2
 #define ARM_SMMU_SH_ISH3
 #define ARM_SMMU_MEMATTR_DEVICE_nGnRE  0x1
+#define ARM_SMMU_MEMATTR_OIWB  0xf
 
 #define Q_IDX(q, p)((p) & ((1 << (q)->max_n_shift) - 1))
 #define Q_WRP(q, p)((p) & (1 << (q)->max_n_shift))
@@ -301,64 +302,49 @@
 #define CMDQ_ERR_CERROR_ILL_IDX1
 #define CMDQ_ERR_CERROR_ABT_IDX2
 
-#define CMDQ_0_OP_SHIFT0
-#define CMDQ_0_OP_MASK 0xffUL
+#define CMDQ_0_OP  GENMASK_ULL(7, 0)
 #define CMDQ_0_SSV (1UL << 11)
 
-#define CMDQ_PREFETCH_0_SID_SHIFT  32
-#define CMDQ_PREFETCH_1_SIZE_SHIFT 0
+#define CMDQ_PREFETCH_0_SIDGENMASK_ULL(63, 32)
+#define CMDQ_PREFETCH_1_SIZE   GENMASK_ULL(4, 0)
 #define CMDQ_PREFETCH_1_ADDR_MASK  GENMASK_ULL(63, 12)
 
-#define CMDQ_CFGI_0_SID_SHIFT  32
-#define CMDQ_CFGI_0_SID_MASK   0xUL
+#define CMDQ_CFGI_0_SIDGENMASK_ULL(63, 32)
 #define CMDQ_CFGI_1_LEAF   (1UL << 0)
-#define CMDQ_CFGI_1_RANGE_SHIFT0
-#define CMDQ_CFGI_1_RANGE_MASK 0x1fUL
+#define CMDQ_CFGI_1_RANGE  GENMASK_ULL(4, 0)
 
-#define CMDQ_TLBI_0_VMID_SHIFT 32
-#define CMDQ_TLBI_0_ASID_SHIFT 48
+#define CMDQ_TLBI_0_VMID   GENMASK_ULL(47, 32)
+#define CMDQ_TLBI_0_ASID   GENMASK_ULL(63, 48)
 #define CMDQ_TLBI_1_LEAF   (1UL << 0)
 #define CMDQ_TLBI_1_VA_MASKGENMASK_ULL(63, 12)
 #define CMDQ_TLBI_1_IPA_MASK   GENMASK_ULL(47, 12)
 
-#define CMDQ_PRI_0_SSID_SHIFT  12
-#define CMDQ_PRI_0_SSID_MASK   0xfUL
-#define CMDQ_PRI_0_SID_SHIFT   32
-#define CMDQ_PRI_0_SID_MASK0xUL
-#define CMDQ_PRI_1_GRPID_SHIFT 0
-#define CMDQ_PRI_1_GRPID_MASK  0x1ffUL
-#define CMDQ_PRI_1_RESP_SHIFT  12
-#define CMDQ_PRI_1_RESP_DENY   (0UL << CMDQ_PRI_1_RESP_SHIFT)
-#define CMDQ_PRI_1_RESP_FAIL   (1UL << CMDQ_PRI_1_RESP_SHIFT)
-#define CMDQ_PRI_1_RESP_SUCC   (2UL << CMDQ_PRI_1_RESP_SHIFT)
+#define CMDQ_PRI_0_SSIDGENMASK_ULL(31, 12)
+#define CMDQ_PRI_0_SID GENMASK_ULL(63, 32)
+#define CMDQ_PRI_1_GRPID   GENMASK_ULL(8, 0)
+#define CMDQ_PRI_1_RESPGENMASK_ULL(13, 12)
 
-#define CMDQ_SYNC_0_CS_SHIFT   12
-#define CMDQ_SYNC_0_CS_NONE(0UL << CMDQ_SYNC_0_CS_SHIFT)
-#define CMDQ_SYNC_0_CS_IRQ (1UL << CMDQ_SYNC_0_CS_SHIFT)
-#define CMDQ_SYNC_0_CS_SEV (2UL << CMDQ_SYNC_0_CS_SHIFT)
-#define CMDQ_SYNC_0_MSH_SHIFT  22
-#define CMDQ_SYNC_0_MSH_ISH(3UL << CMDQ_SYNC_0_MSH_SHIFT)
-#define CMDQ_SYNC_0_MSIATTR_SHIFT  24
-#define CMDQ_SYNC_0_MSIATTR_OIWB   (0xfUL << CMDQ_SYNC_0_MSIATTR_SHIFT)
-#define CMDQ_SYNC_0_MSIDATA_SHIFT  32
-#define CMDQ_SYNC_0_MSIDATA_MASK   0xUL
+#define CMDQ_SYNC_0_CS GENMASK_ULL(13, 12)
+#define CMDQ_SYNC_0_CS_NONE0
+#define CMDQ_SYNC_0_CS_IRQ 1
+#define CMDQ_SYNC_0_CS_SEV 2
+#define CMDQ_SYNC_0_MSHGENMASK_ULL(23, 22)
+#define CMDQ_SYNC_0_MSIATTRGENMASK_ULL(27, 24)
+#define CMDQ_SYNC_0_MSIDATAGENMASK_ULL(63, 32)
 #define CMDQ_SYNC_1_MSIADDR_MASK   GENMASK_ULL(47, 2)
 
 /* Event queue */
 #define EVTQ_ENT_DWORDS4
 #define EVTQ_MAX_SZ_SHIFT  7
 
-#define EVTQ_0_ID_SHIFT0
-#define EVTQ_0_ID_MASK 0xffUL
+#define EVTQ_0_ID  GENMASK_ULL(7, 0)
 
 /* PRI queue */
 #define PRIQ_ENT_DWORDS2
 #define PRIQ_MAX_SZ_SHIFT  8
 
-#define PRIQ_0_SID_SHIFT   0
-#define PRIQ_0_SID_MASK0xUL
-#define PRIQ_0_SSID_SHIFT  32
-#define PRIQ_0_SSID_MASK   0xfUL
+#define PRIQ_0_SID 

[PATCH v3 0/7] SMMUv3 52-bit address support

2018-03-26 Thread Robin Murphy
Hi all,

Here's a v3 with most of the additional cleanup as discussed, which makes
the diffstat look pretty horrific, but I think it does work out for the
better in the end. I did write a further patch converting "(1 << x)" into
"BIT(x)", but it came out as a +70/-70 diff of pure churn which somehow
managed to make the codegen in arm_smmu_device_probe() worse, so it got
git reset --hard into oblivion.

Robin.

Robin Murphy (7):
  iommu/arm-smmu-v3: Clean up address masking
  iommu/arm-smmu-v3: Clean up register definitions
  iommu/arm-smmu-v3: Clean up table definitions
  iommu/arm-smmu-v3: Clean up queue definitions
  iommu/io-pgtable-arm: Support 52-bit physical address
  iommu/arm-smmu-v3: Support 52-bit physical address
  iommu/arm-smmu-v3: Support 52-bit virtual address

 drivers/iommu/arm-smmu-v3.c| 523 ++---
 drivers/iommu/io-pgtable-arm.c |  67 --
 2 files changed, 277 insertions(+), 313 deletions(-)

-- 
2.16.1.dirty



[PATCH v3 2/7] iommu/arm-smmu-v3: Clean up register definitions

2018-03-26 Thread Robin Murphy
The FIELD_{GET,PREP} accessors provided by linux/bitfield.h allow us to
define multi-bit register fields solely in terms of their bit positions
via GENMASK(), without needing explicit *_SHIFT and *_MASK definitions.
As well as the immediate reduction in lines of code, this avoids the
awkwardness of values sometimes being pre-shifted and sometimes not,
which means we can factor out some common values like memory attributes.
Furthermore, it also makes it trivial to verify the definitions against
the architecture spec, on which note let's also fix up a few field names
to properly match the current release (IHI0070B).

Signed-off-by: Robin Murphy 
---

v3: New

 drivers/iommu/arm-smmu-v3.c | 174 
 1 file changed, 77 insertions(+), 97 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index ac437aedc598..40a19ce03f99 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -22,6 +22,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -44,18 +45,15 @@
 
 /* MMIO registers */
 #define ARM_SMMU_IDR0  0x0
-#define IDR0_ST_LVL_SHIFT  27
-#define IDR0_ST_LVL_MASK   0x3
-#define IDR0_ST_LVL_2LVL   (1 << IDR0_ST_LVL_SHIFT)
-#define IDR0_STALL_MODEL_SHIFT 24
-#define IDR0_STALL_MODEL_MASK  0x3
-#define IDR0_STALL_MODEL_STALL (0 << IDR0_STALL_MODEL_SHIFT)
-#define IDR0_STALL_MODEL_FORCE (2 << IDR0_STALL_MODEL_SHIFT)
-#define IDR0_TTENDIAN_SHIFT21
-#define IDR0_TTENDIAN_MASK 0x3
-#define IDR0_TTENDIAN_LE   (2 << IDR0_TTENDIAN_SHIFT)
-#define IDR0_TTENDIAN_BE   (3 << IDR0_TTENDIAN_SHIFT)
-#define IDR0_TTENDIAN_MIXED(0 << IDR0_TTENDIAN_SHIFT)
+#define IDR0_ST_LVLGENMASK(28, 27)
+#define IDR0_ST_LVL_2LVL   1
+#define IDR0_STALL_MODEL   GENMASK(25, 24)
+#define IDR0_STALL_MODEL_STALL 0
+#define IDR0_STALL_MODEL_FORCE 2
+#define IDR0_TTENDIAN  GENMASK(22, 21)
+#define IDR0_TTENDIAN_MIXED0
+#define IDR0_TTENDIAN_LE   2
+#define IDR0_TTENDIAN_BE   3
 #define IDR0_CD2L  (1 << 19)
 #define IDR0_VMID16(1 << 18)
 #define IDR0_PRI   (1 << 16)
@@ -65,10 +63,9 @@
 #define IDR0_ATS   (1 << 10)
 #define IDR0_HYP   (1 << 9)
 #define IDR0_COHACC(1 << 4)
-#define IDR0_TTF_SHIFT 2
-#define IDR0_TTF_MASK  0x3
-#define IDR0_TTF_AARCH64   (2 << IDR0_TTF_SHIFT)
-#define IDR0_TTF_AARCH32_64(3 << IDR0_TTF_SHIFT)
+#define IDR0_TTF   GENMASK(3, 2)
+#define IDR0_TTF_AARCH64   2
+#define IDR0_TTF_AARCH32_643
 #define IDR0_S1P   (1 << 1)
 #define IDR0_S2P   (1 << 0)
 
@@ -76,31 +73,24 @@
 #define IDR1_TABLES_PRESET (1 << 30)
 #define IDR1_QUEUES_PRESET (1 << 29)
 #define IDR1_REL   (1 << 28)
-#define IDR1_CMDQ_SHIFT21
-#define IDR1_CMDQ_MASK 0x1f
-#define IDR1_EVTQ_SHIFT16
-#define IDR1_EVTQ_MASK 0x1f
-#define IDR1_PRIQ_SHIFT11
-#define IDR1_PRIQ_MASK 0x1f
-#define IDR1_SSID_SHIFT6
-#define IDR1_SSID_MASK 0x1f
-#define IDR1_SID_SHIFT 0
-#define IDR1_SID_MASK  0x3f
+#define IDR1_CMDQS GENMASK(25, 21)
+#define IDR1_EVTQS GENMASK(20, 16)
+#define IDR1_PRIQS GENMASK(15, 11)
+#define IDR1_SSIDSIZE  GENMASK(10, 6)
+#define IDR1_SIDSIZE   GENMASK(5, 0)
 
 #define ARM_SMMU_IDR5  0x14
-#define IDR5_STALL_MAX_SHIFT   16
-#define IDR5_STALL_MAX_MASK0x
+#define IDR5_STALL_MAX GENMASK(31, 16)
 #define IDR5_GRAN64K   (1 << 6)
 #define IDR5_GRAN16K   (1 << 5)
 #define IDR5_GRAN4K(1 << 4)
-#define IDR5_OAS_SHIFT 0
-#define IDR5_OAS_MASK  0x7
-#define IDR5_OAS_32_BIT(0 << IDR5_OAS_SHIFT)
-#define IDR5_OAS_36_BIT(1 << IDR5_OAS_SHIFT)
-#define IDR5_OAS_40_BIT(2 << IDR5_OAS_SHIFT)
-#define IDR5_OAS_42_BIT(3 << IDR5_OAS_SHIFT)
-#define IDR5_OAS_44_BIT(4 << IDR5_OAS_SHIFT)
-#define IDR5_OAS_48_BIT(5 << IDR5_OAS_SHIFT)
+#define IDR5_OAS   GENMASK(2, 0)
+#define IDR5_OAS_32_BIT0
+#define IDR5_OAS_36_BIT1
+#define IDR5_OAS_40_BIT2

[PATCH v3 1/7] iommu/arm-smmu-v3: Clean up address masking

2018-03-26 Thread Robin Murphy
Before trying to add the SMMUv3.1 support for 52-bit addresses, make
things bearable by cleaning up the various address mask definitions to
use GENMASK_ULL() consistently. The fact that doing so reveals (and
fixes) a latent off-by-one in Q_BASE_ADDR_MASK only goes to show what a
jolly good idea it is...

Tested-by: Nate Watterson 
Signed-off-by: Robin Murphy 
---

v3: No change

 drivers/iommu/arm-smmu-v3.c | 53 ++---
 1 file changed, 21 insertions(+), 32 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 3f2f1fc68b52..ac437aedc598 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -22,6 +22,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -158,8 +159,7 @@
 
 #define ARM_SMMU_STRTAB_BASE   0x80
 #define STRTAB_BASE_RA (1UL << 62)
-#define STRTAB_BASE_ADDR_SHIFT 6
-#define STRTAB_BASE_ADDR_MASK  0x3ffUL
+#define STRTAB_BASE_ADDR_MASK  GENMASK_ULL(47, 6)
 
 #define ARM_SMMU_STRTAB_BASE_CFG   0x88
 #define STRTAB_BASE_CFG_LOG2SIZE_SHIFT 0
@@ -190,8 +190,7 @@
 #define ARM_SMMU_PRIQ_IRQ_CFG2 0xdc
 
 /* Common MSI config fields */
-#define MSI_CFG0_ADDR_SHIFT2
-#define MSI_CFG0_ADDR_MASK 0x3fffUL
+#define MSI_CFG0_ADDR_MASK GENMASK_ULL(47, 2)
 #define MSI_CFG2_SH_SHIFT  4
 #define MSI_CFG2_SH_NSH(0UL << MSI_CFG2_SH_SHIFT)
 #define MSI_CFG2_SH_OSH(2UL << MSI_CFG2_SH_SHIFT)
@@ -207,8 +206,7 @@
 Q_IDX(q, p) * (q)->ent_dwords)
 
 #define Q_BASE_RWA (1UL << 62)
-#define Q_BASE_ADDR_SHIFT  5
-#define Q_BASE_ADDR_MASK   0xfffUL
+#define Q_BASE_ADDR_MASK   GENMASK_ULL(47, 5)
 #define Q_BASE_LOG2SIZE_SHIFT  0
 #define Q_BASE_LOG2SIZE_MASK   0x1fUL
 
@@ -225,8 +223,7 @@
 #define STRTAB_L1_DESC_DWORDS  1
 #define STRTAB_L1_DESC_SPAN_SHIFT  0
 #define STRTAB_L1_DESC_SPAN_MASK   0x1fUL
-#define STRTAB_L1_DESC_L2PTR_SHIFT 6
-#define STRTAB_L1_DESC_L2PTR_MASK  0x3ffUL
+#define STRTAB_L1_DESC_L2PTR_MASK  GENMASK_ULL(47, 6)
 
 #define STRTAB_STE_DWORDS  8
 #define STRTAB_STE_0_V (1UL << 0)
@@ -239,8 +236,7 @@
 
 #define STRTAB_STE_0_S1FMT_SHIFT   4
 #define STRTAB_STE_0_S1FMT_LINEAR  (0UL << STRTAB_STE_0_S1FMT_SHIFT)
-#define STRTAB_STE_0_S1CTXPTR_SHIFT6
-#define STRTAB_STE_0_S1CTXPTR_MASK 0x3ffUL
+#define STRTAB_STE_0_S1CTXPTR_MASK GENMASK_ULL(47, 6)
 #define STRTAB_STE_0_S1CDMAX_SHIFT 59
 #define STRTAB_STE_0_S1CDMAX_MASK  0x1fUL
 
@@ -278,8 +274,7 @@
 #define STRTAB_STE_2_S2PTW (1UL << 54)
 #define STRTAB_STE_2_S2R   (1UL << 58)
 
-#define STRTAB_STE_3_S2TTB_SHIFT   4
-#define STRTAB_STE_3_S2TTB_MASK0xfffUL
+#define STRTAB_STE_3_S2TTB_MASKGENMASK_ULL(47, 4)
 
 /* Context descriptor (stage-1 only) */
 #define CTXDESC_CD_DWORDS  8
@@ -325,8 +320,7 @@
 #define CTXDESC_CD_0_ASID_SHIFT        48
 #define CTXDESC_CD_0_ASID_MASK 0xUL
 
-#define CTXDESC_CD_1_TTB0_SHIFT4
-#define CTXDESC_CD_1_TTB0_MASK 0xfffUL
+#define CTXDESC_CD_1_TTB0_MASK GENMASK_ULL(47, 4)
 
 #define CTXDESC_CD_3_MAIR_SHIFT        0
 
@@ -351,7 +345,7 @@
 
 #define CMDQ_PREFETCH_0_SID_SHIFT  32
 #define CMDQ_PREFETCH_1_SIZE_SHIFT 0
-#define CMDQ_PREFETCH_1_ADDR_MASK  ~0xfffUL
+#define CMDQ_PREFETCH_1_ADDR_MASK  GENMASK_ULL(63, 12)
 
 #define CMDQ_CFGI_0_SID_SHIFT  32
 #define CMDQ_CFGI_0_SID_MASK   0xUL
@@ -362,8 +356,8 @@
 #define CMDQ_TLBI_0_VMID_SHIFT 32
 #define CMDQ_TLBI_0_ASID_SHIFT 48
 #define CMDQ_TLBI_1_LEAF   (1UL << 0)
-#define CMDQ_TLBI_1_VA_MASK~0xfffUL
-#define CMDQ_TLBI_1_IPA_MASK   0xf000UL
+#define CMDQ_TLBI_1_VA_MASKGENMASK_ULL(63, 12)
+#define CMDQ_TLBI_1_IPA_MASK   GENMASK_ULL(47, 12)
 
 #define CMDQ_PRI_0_SSID_SHIFT  12
 #define CMDQ_PRI_0_SSID_MASK   0xfUL
@@ -386,8 +380,7 @@
 #define CMDQ_SYNC_0_MSIATTR_OIWB   (0xfUL << CMDQ_SYNC_0_MSIATTR_SHIFT)
 #define CMDQ_SYNC_0_MSIDATA_SHIFT  32
 #define CMDQ_SYNC_0_MSIDATA_MASK   0xUL
-#define CMDQ_SYNC_1_MSIADDR_SHIFT  0
-#define CMDQ_SYNC_1_MSIADDR_MASK   0xcUL
+#define CMDQ_SYNC_1_MSIADDR_MASK   GENMASK_ULL(47, 2)
 
 /* Event queue */
 #define EVTQ_ENT_DWORDS4
@@ -413,8 +406,7 @@
 
 #define PRIQ_1_PRG_IDX_SHIFT   0
 #define PRIQ_1_PRG_IDX_MASK0x1ffUL
-#define PRIQ_1_ADDR_SHIFT  12
-#define PRIQ_1_ADDR_MASK   0xfUL
+#define PRIQ_1_ADDR_MASK       GENMASK_ULL(47, 12)

Re: Is adding offset to dma address is valid operation?

2018-03-26 Thread Harsh Jain


On 26-03-2018 17:15, Robin Murphy wrote:
> Hi Harsh,
>
> On 26/03/18 10:55, Harsh Jain wrote:
>> Hi,
>>
>> Can we add an offset to a dma address received from the IOMMU in a 
>> scatter/gather list before sending it to the H/W?
>>
>> Address to HW = sg_dma_address(sg) + offset, where 0 < offset < 
>> sg_dma_len(sg).
>
> sg_dma_address() should already account for sg->offset (i.e. the start of the 
> buffer within the page). If there's a DMA API implementation failing to 
> respect that then that's a bug.
>
>> I need this operation to make sure our H/W does not receive an entry of 
>> size greater than 2K.
>
> If however it's just that you want to fragment a single SG entry into 
> multiple commands/descriptors/etc. internally to your driver, then I can't 
> see that being an issue as long as you don't visibly modify the scatterlist 
> itself. If there's a genuine length and/or alignment limitation in the 
> hardware, though, it would be better to set up dev->dma_parms appropriately 
> to describe them, such that the subsystem responsible for generating the 
> scatterlist respects the appropriate constraints in the first place (and if 
> it doesn't, then that's another bug to fix).
Can dma_parms handle a 2K size limit? Because dma_map_sg() returns a nents <= 
the nents passed in, but in my case nents will increase.

>
> Robin.

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: Is adding offset to dma address is valid operation?

2018-03-26 Thread Harsh Jain

On 26-03-2018 17:15, Robin Murphy wrote:
> Hi Harsh,
>
> On 26/03/18 10:55, Harsh Jain wrote:
>> Hi,
>>
>> Can we add an offset to a dma address received from the IOMMU in a 
>> scatter/gather list before sending it to the H/W?
>>
>> Address to HW = sg_dma_address(sg) + offset, where 0 < offset < 
>> sg_dma_len(sg).
>
> sg_dma_address() should already account for sg->offset (i.e. the start of the 
> buffer within the page). If there's a DMA API implementation failing to 
> respect that then that's a bug.
Thanks.
Here "offset" means the size of the last dma entry already mapped to the H/W 
structure, e.g. if the dma length is 9K the offset will be 0, 2K, 4K, 6K, 8K.
>
>> I need this operation to make sure our H/W does not receive an entry of 
>> size greater than 2K.
>
> If however it's just that you want to fragment a single SG entry into 
> multiple commands/descriptors/etc. internally to your driver, then I can't 
> see that being an issue as long as you don't visibly modify the scatterlist 
> itself. If there's a genuine length and/or alignment limitation in the 
> hardware, though, it would be better to set up dev->dma_parms appropriately 
> to describe them, such that the subsystem responsible for generating the 
> scatterlist respects the appropriate constraints in the first place (and if 
> it doesn't, then that's another bug to fix).
>
Yes, I want to fragment a single entry into multiple entries so that each is 
less than 2K, e.g. if sg->length == 5K, I will split it into 3 entries:
addr1 = sg_dma_address(sg),      len = 2K
addr2 = sg_dma_address(sg) + 2K, len = 2K
addr3 = sg_dma_address(sg) + 4K, len = 1K


> Robin.


Re: Is adding offset to dma address is valid operation?

2018-03-26 Thread Robin Murphy

Hi Harsh,

On 26/03/18 10:55, Harsh Jain wrote:

Hi,

Can we add an offset to a dma address received from the IOMMU in a 
scatter/gather list before sending it to the H/W?

Address to HW = sg_dma_address(sg) + offset, where 0 < offset < sg_dma_len(sg).


sg_dma_address() should already account for sg->offset (i.e. the start 
of the buffer within the page). If there's a DMA API implementation 
failing to respect that then that's a bug.



I need this operation to make sure our H/W does not receive an entry of size 
greater than 2K.


If however it's just that you want to fragment a single SG entry into 
multiple commands/descriptors/etc. internally to your driver, then I 
can't see that being an issue as long as you don't visibly modify the 
scatterlist itself. If there's a genuine length and/or alignment 
limitation in the hardware, though, it would be better to set up 
dev->dma_parms appropriately to describe them, such that the subsystem 
responsible for generating the scatterlist respects the appropriate 
constraints in the first place (and if it doesn't, then that's another 
bug to fix).


Robin.

Is adding offset to dma address is valid operation?

2018-03-26 Thread Harsh Jain
Hi,

Can we add an offset to a dma address received from the IOMMU in a 
scatter/gather list before sending it to the H/W?

Address to HW = sg_dma_address(sg) + offset, where 0 < offset < sg_dma_len(sg).

I need this operation to make sure our H/W does not receive an entry of size 
greater than 2K.

Regards

Harsh Jain


Re: [PATCH v8 11/14] iommu/rockchip: Use OF_IOMMU to attach devices automatically

2018-03-26 Thread JeffyChen

Hi Daniel,

Thanks for your reply.

On 03/26/2018 02:31 PM, Daniel Kurtz wrote:

>+struct rk_iommudata {
>+   struct rk_iommu *iommu;
>+};

Why do we need this struct?  Can't we just assign a pointer to struct
rk_iommu directly to dev->archdata.iommu?


Hmm, I was trying to add more device-related data in patch [13]:

 struct rk_iommudata {
+   struct device_link *link; /* runtime PM link from IOMMU to master */
struct rk_iommu *iommu;
 };






Re: [PATCH v8 11/14] iommu/rockchip: Use OF_IOMMU to attach devices automatically

2018-03-26 Thread Daniel Kurtz
On Fri, Mar 23, 2018 at 1:40 AM Jeffy Chen wrote:

> Converts the rockchip-iommu driver to use the OF_IOMMU infrastructure,
> which allows attaching master devices to their IOMMUs automatically
> according to DT properties.

> Signed-off-by: Jeffy Chen 
> Reviewed-by: Robin Murphy 
> ---

> Changes in v8: None
> Changes in v7: None
> Changes in v6: None
> Changes in v5: None
> Changes in v4: None
> Changes in v3:
> Add struct rk_iommudata.
> Squash iommu/rockchip: Use iommu_group_get_for_dev() for add_device

> Changes in v2: None

>   drivers/iommu/rockchip-iommu.c | 135 -
>   1 file changed, 40 insertions(+), 95 deletions(-)

> diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
> index 7970d21b9858..bd8580b897e9 100644
> --- a/drivers/iommu/rockchip-iommu.c
> +++ b/drivers/iommu/rockchip-iommu.c
> @@ -19,6 +19,7 @@
>   #include 
>   #include 
>   #include 
> +#include 
>   #include 
>   #include 
>   #include 
> @@ -104,6 +105,10 @@ struct rk_iommu {
>  struct iommu_domain *domain; /* domain to which iommu is attached */
>   };

> +struct rk_iommudata {
> +   struct rk_iommu *iommu;
> +};

Why do we need this struct?  Can't we just assign a pointer to struct
rk_iommu directly to dev->archdata.iommu?


> +
>   static struct device *dma_dev;

>   static inline void rk_table_flush(struct rk_iommu_domain *dom, dma_addr_t dma,
> @@ -807,18 +812,9 @@ static size_t rk_iommu_unmap(struct iommu_domain *domain, unsigned long _iova,

>   static struct rk_iommu *rk_iommu_from_dev(struct device *dev)
>   {
> -   struct iommu_group *group;
> -   struct device *iommu_dev;
> -   struct rk_iommu *rk_iommu;
> +   struct rk_iommudata *data = dev->archdata.iommu;

> -   group = iommu_group_get(dev);
> -   if (!group)
> -   return NULL;
> -   iommu_dev = iommu_group_get_iommudata(group);
> -   rk_iommu = dev_get_drvdata(iommu_dev);
> -   iommu_group_put(group);
> -
> -   return rk_iommu;
> +   return data ? data->iommu : NULL;
>   }

>   static int rk_iommu_attach_device(struct iommu_domain *domain,
> @@ -989,110 +985,53 @@ static void rk_iommu_domain_free(struct iommu_domain *domain)
>  iommu_put_dma_cookie(&rk_domain->domain);
>   }

> -static bool rk_iommu_is_dev_iommu_master(struct device *dev)
> -{
> -   struct device_node *np = dev->of_node;
> -   int ret;
> -
> -   /*
> -* An iommu master has an iommus property containing a list of phandles
> -* to iommu nodes, each with an #iommu-cells property with value 0.
> -*/
> -   ret = of_count_phandle_with_args(np, "iommus", "#iommu-cells");
> -   return (ret > 0);
> -}
> -
> -static int rk_iommu_group_set_iommudata(struct iommu_group *group,
> -   struct device *dev)
> +static int rk_iommu_add_device(struct device *dev)
>   {
> -   struct device_node *np = dev->of_node;
> -   struct platform_device *pd;
> -   int ret;
> -   struct of_phandle_args args;
> +   struct iommu_group *group;
> +   struct rk_iommu *iommu;

> -   /*
> -* An iommu master has an iommus property containing a list of phandles
> -* to iommu nodes, each with an #iommu-cells property with value 0.
> -*/
> -   ret = of_parse_phandle_with_args(np, "iommus", "#iommu-cells", 0, &args);
> -   if (ret) {
> -   dev_err(dev, "of_parse_phandle_with_args(%pOF) => %d\n",
> -   np, ret);
> -   return ret;
> -   }
> -   if (args.args_count != 0) {
> -   dev_err(dev, "incorrect number of iommu params found for %pOF (found %d, expected 0)\n",
> -   args.np, args.args_count);
> -   return -EINVAL;
> -   }
> +   iommu = rk_iommu_from_dev(dev);
> +   if (!iommu)
> +   return -ENODEV;

> -   pd = of_find_device_by_node(args.np);
> -   of_node_put(args.np);
> -   if (!pd) {
> -   dev_err(dev, "iommu %pOF not found\n", args.np);
> -   return -EPROBE_DEFER;
> -   }
> +   group = iommu_group_get_for_dev(dev);
> +   if (IS_ERR(group))
> +   return PTR_ERR(group);
> +   iommu_group_put(group);

> -   /* TODO(djkurtz): handle multiple slave iommus for a single master */
> -   iommu_group_set_iommudata(group, &pd->dev, NULL);
> +   iommu_device_link(&iommu->iommu, dev);

>  return 0;
>   }

> -static int rk_iommu_add_device(struct device *dev)
> +static void rk_iommu_remove_device(struct device *dev)
>   {
> -   struct iommu_group *group;
>  struct rk_iommu *iommu;
> -   int ret;
> -
> -   if (!rk_iommu_is_dev_iommu_master(dev))
> -   return -ENODEV;
> -
> -   group = iommu_group_get(dev);
> -   if (!group) {