Re: [PATCH v8 4/5] block: add a helper function to merge the segments

2019-07-24 Thread Simon Horman
On Tue, Jul 23, 2019 at 02:26:47PM +0900, Yoshihiro Shimoda wrote:
> This patch adds a helper function to check whether a queue can merge
> the segments via the DMA MAP layer (e.g. via IOMMU).
> 
> Signed-off-by: Yoshihiro Shimoda 
> Reviewed-by: Christoph Hellwig 

Reviewed-by: Simon Horman 
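
For reference, a minimal sketch of what such a block-layer helper might look
like, assuming the dma_get_merge_boundary() API introduced in patch 1/5; this
is an illustration of the idea rather than the exact hunk from the patch:

	/* Sketch: let a queue merge segments up to the DMA layer's boundary. */
	bool blk_queue_can_use_dma_map_merging(struct request_queue *q,
					       struct device *dev)
	{
		unsigned long boundary = dma_get_merge_boundary(dev);

		if (!boundary)
			return false;	/* the DMA MAP layer cannot merge segments */

		/* The virt boundary tells the block layer how segments may merge. */
		blk_queue_virt_boundary(q, boundary);
		return true;
	}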



Re: [PATCH v8 2/5] iommu/dma: Add a new dma_map_ops of get_merge_boundary()

2019-07-24 Thread Simon Horman
On Tue, Jul 23, 2019 at 11:17:19AM +0300, Sergei Shtylyov wrote:
> Hello!
> 
> On 23.07.2019 8:26, Yoshihiro Shimoda wrote:
> 
> > This patch adds a new dma_map_ops of get_merge_boundary() to
> > expose the DMA merge boundary if the domain type is IOMMU_DOMAIN_DMA.
> > 
> > Signed-off-by: Yoshihiro Shimoda 

Sergei's comment notwithstanding,

Reviewed-by: Simon Horman 

> > ---
> >   drivers/iommu/dma-iommu.c | 11 +++
> >   1 file changed, 11 insertions(+)
> > 
> > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> > index a7f9c3e..f3e5f2b 100644
> > --- a/drivers/iommu/dma-iommu.c
> > +++ b/drivers/iommu/dma-iommu.c
> > @@ -1085,6 +1085,16 @@ static int iommu_dma_get_sgtable(struct device *dev, 
> > struct sg_table *sgt,
> > return ret;
> >   }
> > +static unsigned long iommu_dma_get_merge_boundary(struct device *dev)
> > +{
> > +   struct iommu_domain *domain = iommu_get_dma_domain(dev);
> > +
> > +   if (domain->type != IOMMU_DOMAIN_DMA)
> > +   return 0;   /* can't merge */
> > +
> > +   return (1 << __ffs(domain->pgsize_bitmap)) - 1;
> 
> Not 1UL?
> 
> > +}
> > +
> >   static const struct dma_map_ops iommu_dma_ops = {
> > .alloc  = iommu_dma_alloc,
> > .free   = iommu_dma_free,
> [...]
> 
> MBR, Sergei
> 
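
Presumably the concern behind "Not 1UL?" is integer width: the literal 1 is a
32-bit int, so the shift is performed in int before the result is widened to
unsigned long. A tiny illustration (the 4 KiB page size is hypothetical, not
taken from the patch):

	/* Smallest IOMMU page of 4 KiB: both forms happen to agree. */
	unsigned long a = (1   << 12) - 1;	/* 0xfff */
	unsigned long b = (1UL << 12) - 1;	/* 0xfff */

	/* For shift counts of 31 or more, only the unsigned long form is safe. */
	unsigned long c = (1UL << 32) - 1;	/* well defined on 64-bit */
	/* (1 << 32) would be undefined behaviour, hence the suggestion. */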


Re: [PATCH v8 5/5] mmc: queue: Use bigger segments if DMA MAP layer can merge the segments

2019-07-24 Thread Simon Horman
On Tue, Jul 23, 2019 at 02:26:48PM +0900, Yoshihiro Shimoda wrote:
> When the max_segs of an mmc host is smaller than 512, the mmc
> subsystem tries to use 512 segments if the DMA MAP layer can merge
> the segments, and then the mmc subsystem exposes this capability
> to the block layer by using blk_queue_can_use_dma_map_merging().
> 
> Signed-off-by: Yoshihiro Shimoda 
> Reviewed-by: Christoph Hellwig 
> Reviewed-by: Ulf Hansson 

Reviewed-by: Simon Horman 
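
A rough sketch of how the MMC queue setup might consume that helper; the
function name and the 512-segment constant below are illustrative assumptions,
not necessarily the identifiers used in the patch:

	#define MMC_DMA_MAP_MERGE_SEGS	512	/* assumed value for illustration */

	/* Sketch: advertise more segments when the DMA layer can merge them. */
	static void mmc_queue_setup_segments(struct mmc_host *host,
					     struct request_queue *q)
	{
		if (host->max_segs < MMC_DMA_MAP_MERGE_SEGS &&
		    blk_queue_can_use_dma_map_merging(q, mmc_dev(host)))
			blk_queue_max_segments(q, MMC_DMA_MAP_MERGE_SEGS);
		else
			blk_queue_max_segments(q, host->max_segs);
	}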



Re: [PATCH v8 3/5] block: sort headers on blk-setting.c

2019-07-24 Thread Simon Horman
On Tue, Jul 23, 2019 at 02:26:46PM +0900, Yoshihiro Shimoda wrote:
> This patch sorts the headers in alphabetical order to ease
> maintenance of this part.
> 
> Signed-off-by: Yoshihiro Shimoda 
> Reviewed-by: Wolfram Sang 

Reviewed-by: Simon Horman 



Re: [PATCH v8 1/5] dma: Introduce dma_get_merge_boundary()

2019-07-24 Thread Simon Horman
On Tue, Jul 23, 2019 at 02:26:44PM +0900, Yoshihiro Shimoda wrote:
> This patch adds a new DMA API "dma_get_merge_boundary". This function
> returns the DMA merge boundary if the DMA layer can merge the segments.
> This patch also adds the implementation for a new dma_map_ops pointer.
> 
> Signed-off-by: Yoshihiro Shimoda 
> Reviewed-by: Christoph Hellwig 

Reviewed-by: Simon Horman 
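
As a rough sketch, the new API presumably just dispatches to the corresponding
dma_map_ops hook and returns 0 when merging is not possible; treat this as an
illustration of the idea rather than the verbatim implementation:

	/* Sketch: core DMA API entry point for the merge boundary. */
	unsigned long dma_get_merge_boundary(struct device *dev)
	{
		const struct dma_map_ops *ops = get_dma_ops(dev);

		if (!ops || !ops->get_merge_boundary)
			return 0;	/* can't merge */

		return ops->get_merge_boundary(dev);
	}
	EXPORT_SYMBOL_GPL(dma_get_merge_boundary);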



[PATCH 4/5] dma-mapping: provide a better default ->get_required_mask

2019-07-24 Thread Christoph Hellwig
Most dma_map_ops instances are IOMMUs that work perfectly fine with 32 bits
of IOVA space, and the generic direct mapping code already provides its
own routine that is intelligent based on the amount of memory actually
present.  Wire up the dma-direct routine for the ARM direct mapping code
as well, and otherwise default to the constant 32-bit mask.  This way
we only need to override it for the occasional odd IOMMU that requires
64-bit IOVA support, or IOMMU drivers that are more efficient if they
can fall back to the direct mapping.

Signed-off-by: Christoph Hellwig 
---
 arch/arm/mm/dma-mapping.c   |  3 +++
 arch/powerpc/platforms/ps3/system-bus.c |  7 --
 arch/x86/kernel/amd_gart_64.c   |  1 +
 kernel/dma/mapping.c| 30 +
 4 files changed, 14 insertions(+), 27 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 4410af33c5c4..9c9a23e5600d 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -192,6 +193,7 @@ const struct dma_map_ops arm_dma_ops = {
.sync_sg_for_cpu= arm_dma_sync_sg_for_cpu,
.sync_sg_for_device = arm_dma_sync_sg_for_device,
.dma_supported  = arm_dma_supported,
+   .get_required_mask  = dma_direct_get_required_mask,
 };
 EXPORT_SYMBOL(arm_dma_ops);
 
@@ -212,6 +214,7 @@ const struct dma_map_ops arm_coherent_dma_ops = {
.map_sg = arm_dma_map_sg,
.map_resource   = dma_direct_map_resource,
.dma_supported  = arm_dma_supported,
+   .get_required_mask  = dma_direct_get_required_mask,
 };
 EXPORT_SYMBOL(arm_coherent_dma_ops);
 
diff --git a/arch/powerpc/platforms/ps3/system-bus.c 
b/arch/powerpc/platforms/ps3/system-bus.c
index 70fcc9736a8c..3542b7bd6a46 100644
--- a/arch/powerpc/platforms/ps3/system-bus.c
+++ b/arch/powerpc/platforms/ps3/system-bus.c
@@ -686,18 +686,12 @@ static int ps3_dma_supported(struct device *_dev, u64 
mask)
return mask >= DMA_BIT_MASK(32);
 }
 
-static u64 ps3_dma_get_required_mask(struct device *_dev)
-{
-   return DMA_BIT_MASK(32);
-}
-
 static const struct dma_map_ops ps3_sb_dma_ops = {
.alloc = ps3_alloc_coherent,
.free = ps3_free_coherent,
.map_sg = ps3_sb_map_sg,
.unmap_sg = ps3_sb_unmap_sg,
.dma_supported = ps3_dma_supported,
-   .get_required_mask = ps3_dma_get_required_mask,
.map_page = ps3_sb_map_page,
.unmap_page = ps3_unmap_page,
.mmap = dma_common_mmap,
@@ -710,7 +704,6 @@ static const struct dma_map_ops ps3_ioc0_dma_ops = {
.map_sg = ps3_ioc0_map_sg,
.unmap_sg = ps3_ioc0_unmap_sg,
.dma_supported = ps3_dma_supported,
-   .get_required_mask = ps3_dma_get_required_mask,
.map_page = ps3_ioc0_map_page,
.unmap_page = ps3_unmap_page,
.mmap = dma_common_mmap,
diff --git a/arch/x86/kernel/amd_gart_64.c b/arch/x86/kernel/amd_gart_64.c
index a65b4a9c7f87..d02662238b57 100644
--- a/arch/x86/kernel/amd_gart_64.c
+++ b/arch/x86/kernel/amd_gart_64.c
@@ -680,6 +680,7 @@ static const struct dma_map_ops gart_dma_ops = {
.dma_supported  = dma_direct_supported,
.mmap   = dma_common_mmap,
.get_sgtable= dma_common_get_sgtable,
+   .get_required_mask  = dma_direct_get_required_mask,
 };
 
 static void gart_iommu_shutdown(void)
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index cdb157cd70e7..7dff1829c8c5 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -231,25 +231,6 @@ int dma_mmap_attrs(struct device *dev, struct 
vm_area_struct *vma,
 }
 EXPORT_SYMBOL(dma_mmap_attrs);
 
-static u64 dma_default_get_required_mask(struct device *dev)
-{
-   u32 low_totalram = ((max_pfn - 1) << PAGE_SHIFT);
-   u32 high_totalram = ((max_pfn - 1) >> (32 - PAGE_SHIFT));
-   u64 mask;
-
-   if (!high_totalram) {
-   /* convert to mask just covering totalram */
-   low_totalram = (1 << (fls(low_totalram) - 1));
-   low_totalram += low_totalram - 1;
-   mask = low_totalram;
-   } else {
-   high_totalram = (1 << (fls(high_totalram) - 1));
-   high_totalram += high_totalram - 1;
-   mask = (((u64)high_totalram) << 32) + 0x;
-   }
-   return mask;
-}
-
 u64 dma_get_required_mask(struct device *dev)
 {
const struct dma_map_ops *ops = get_dma_ops(dev);
@@ -258,7 +239,16 @@ u64 dma_get_required_mask(struct device *dev)
return dma_direct_get_required_mask(dev);
if (ops->get_required_mask)
return ops->get_required_mask(dev);
-   return dma_default_get_required_mask(dev);
+
+   /*
+* We require every DMA ops implementation to at least support a 32-bit
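
In effect, the fallback in dma_get_required_mask() ends up looking roughly
like the sketch below (inferred from the description above, not the verbatim
patch):

	u64 dma_get_required_mask(struct device *dev)
	{
		const struct dma_map_ops *ops = get_dma_ops(dev);

		if (dma_is_direct(ops))
			return dma_direct_get_required_mask(dev);
		if (ops->get_required_mask)
			return ops->get_required_mask(dev);

		/*
		 * Every DMA ops implementation must at least support a
		 * 32-bit addressing mode, so default to that instead of
		 * the old totalram-based heuristic.
		 */
		return DMA_BIT_MASK(32);
	}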

remove default fallbacks in dma_map_ops

2019-07-24 Thread Christoph Hellwig
Hi all,

we have a few places where the DMA mapping layer has non-trivial default
actions that are questionable and/or dangerous.

This series instead wires up the mmap, get_sgtable and get_required_mask
methods explicitly and cleans up some surrounding areas.  This also means
we could get rid of the ARCH_NO_COHERENT_DMA_MMAP kconfig option, as we
now require an mmap method to be wired up, or, in the case of non-coherent dma-direct,
the presence of the arch_dma_coherent_to_pfn hook.  The only interesting
case is that the sound code also checked the ARCH_NO_COHERENT_DMA_MMAP
symbol in somewhat odd ways, so I'd like to see a review of the sound
situation before going forward with that patch.


[PATCH 5/5] dma-mapping: remove ARCH_NO_COHERENT_DMA_MMAP

2019-07-24 Thread Christoph Hellwig
Now that we never use a default ->mmap implementation, and non-coherent
architectures can control the presence of ->mmap support by enabling
ARCH_HAS_DMA_COHERENT_TO_PFN for the dma direct implementation, there
is no need for a global config option to control the availability
of dma_common_mmap.

Signed-off-by: Christoph Hellwig 
---
 arch/Kconfig|  3 ---
 arch/c6x/Kconfig|  1 -
 arch/m68k/Kconfig   |  1 -
 arch/microblaze/Kconfig |  1 -
 arch/parisc/Kconfig |  1 -
 arch/sh/Kconfig |  1 -
 arch/xtensa/Kconfig |  1 -
 kernel/dma/mapping.c|  4 
 sound/core/pcm_native.c | 10 +-
 9 files changed, 1 insertion(+), 22 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index a7b57dd42c26..ec2834206d08 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -790,9 +790,6 @@ config COMPAT_32BIT_TIME
  This is relevant on all 32-bit architectures, and 64-bit architectures
  as part of compat syscall handling.
 
-config ARCH_NO_COHERENT_DMA_MMAP
-   bool
-
 config ARCH_NO_PREEMPT
bool
 
diff --git a/arch/c6x/Kconfig b/arch/c6x/Kconfig
index b4fb61c83494..e65e8d82442a 100644
--- a/arch/c6x/Kconfig
+++ b/arch/c6x/Kconfig
@@ -20,7 +20,6 @@ config C6X
select OF_EARLY_FLATTREE
select GENERIC_CLOCKEVENTS
select MODULES_USE_ELF_RELA
-   select ARCH_NO_COHERENT_DMA_MMAP
select MMU_GATHER_NO_RANGE if MMU
 
 config MMU
diff --git a/arch/m68k/Kconfig b/arch/m68k/Kconfig
index c518d695c376..614b355ae338 100644
--- a/arch/m68k/Kconfig
+++ b/arch/m68k/Kconfig
@@ -8,7 +8,6 @@ config M68K
select ARCH_HAS_DMA_PREP_COHERENT if HAS_DMA && MMU && !COLDFIRE
select ARCH_HAS_SYNC_DMA_FOR_DEVICE if HAS_DMA
select ARCH_MIGHT_HAVE_PC_PARPORT if ISA
-   select ARCH_NO_COHERENT_DMA_MMAP if !MMU
select ARCH_NO_PREEMPT if !COLDFIRE
select BINFMT_FLAT_ARGVP_ENVP_ON_STACK
select DMA_DIRECT_REMAP if HAS_DMA && MMU && !COLDFIRE
diff --git a/arch/microblaze/Kconfig b/arch/microblaze/Kconfig
index d411de05b628..632c9477a0f6 100644
--- a/arch/microblaze/Kconfig
+++ b/arch/microblaze/Kconfig
@@ -9,7 +9,6 @@ config MICROBLAZE
select ARCH_HAS_SYNC_DMA_FOR_CPU
select ARCH_HAS_SYNC_DMA_FOR_DEVICE
select ARCH_MIGHT_HAVE_PC_PARPORT
-   select ARCH_NO_COHERENT_DMA_MMAP if !MMU
select ARCH_WANT_IPC_PARSE_VERSION
select BUILDTIME_EXTABLE_SORT
select TIMER_OF
diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
index 6d732e451071..e9dd88b7f81e 100644
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -52,7 +52,6 @@ config PARISC
select GENERIC_SCHED_CLOCK
select HAVE_UNSTABLE_SCHED_CLOCK if SMP
select GENERIC_CLOCKEVENTS
-   select ARCH_NO_COHERENT_DMA_MMAP
select CPU_NO_EFFICIENT_FFS
select NEED_DMA_MAP_STATE
select NEED_SG_DMA_LENGTH
diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig
index 6b1b5941b618..f356ee674d89 100644
--- a/arch/sh/Kconfig
+++ b/arch/sh/Kconfig
@@ -5,7 +5,6 @@ config SUPERH
select ARCH_HAS_PTE_SPECIAL
select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
select ARCH_MIGHT_HAVE_PC_PARPORT
-   select ARCH_NO_COHERENT_DMA_MMAP if !MMU
select HAVE_PATA_PLATFORM
select CLKDEV_LOOKUP
select DMA_DECLARE_COHERENT
diff --git a/arch/xtensa/Kconfig b/arch/xtensa/Kconfig
index ebc135bda921..70653aed3005 100644
--- a/arch/xtensa/Kconfig
+++ b/arch/xtensa/Kconfig
@@ -5,7 +5,6 @@ config XTENSA
select ARCH_HAS_BINFMT_FLAT if !MMU
select ARCH_HAS_SYNC_DMA_FOR_CPU
select ARCH_HAS_SYNC_DMA_FOR_DEVICE
-   select ARCH_NO_COHERENT_DMA_MMAP if !MMU
select ARCH_USE_QUEUED_RWLOCKS
select ARCH_USE_QUEUED_SPINLOCKS
select ARCH_WANT_FRAME_POINTERS
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 7dff1829c8c5..815446f76995 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -169,7 +169,6 @@ int dma_common_mmap(struct device *dev, struct 
vm_area_struct *vma,
void *cpu_addr, dma_addr_t dma_addr, size_t size,
unsigned long attrs)
 {
-#ifndef CONFIG_ARCH_NO_COHERENT_DMA_MMAP
unsigned long user_count = vma_pages(vma);
unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT;
unsigned long off = vma->vm_pgoff;
@@ -198,9 +197,6 @@ int dma_common_mmap(struct device *dev, struct 
vm_area_struct *vma,
 
return remap_pfn_range(vma, vma->vm_start, pfn + vma->vm_pgoff,
user_count << PAGE_SHIFT, vma->vm_page_prot);
-#else
-   return -ENXIO;
-#endif /* !CONFIG_ARCH_NO_COHERENT_DMA_MMAP */
 }
 
 /**
diff --git a/sound/core/pcm_native.c b/sound/core/pcm_native.c
index 860543a4c840..2dadc708343a 100644
--- a/sound/core/pcm_native.c
+++ b/sound/core/pcm_native.c
@@ -218,15 +218,7 @@ int snd_pcm_info_user(struct snd_pcm_substream *substream,
 
 static bool hw_support_mmap(struct snd_pcm_substream 

[PATCH 2/5] dma-mapping: move the dma_get_sgtable API comments from arm to common code

2019-07-24 Thread Christoph Hellwig
The comments are spot on and should be near the central API, not just
near a single implementation.

Signed-off-by: Christoph Hellwig 
---
 arch/arm/mm/dma-mapping.c | 11 ---
 kernel/dma/mapping.c  | 11 +++
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 6774b03aa405..4410af33c5c4 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -877,17 +877,6 @@ static void arm_coherent_dma_free(struct device *dev, 
size_t size, void *cpu_add
__arm_dma_free(dev, size, cpu_addr, handle, attrs, true);
 }
 
-/*
- * The whole dma_get_sgtable() idea is fundamentally unsafe - it seems
- * that the intention is to allow exporting memory allocated via the
- * coherent DMA APIs through the dma_buf API, which only accepts a
- * scattertable.  This presents a couple of problems:
- * 1. Not all memory allocated via the coherent DMA APIs is backed by
- *a struct page
- * 2. Passing coherent DMA memory into the streaming APIs is not allowed
- *as we will try to flush the memory through a different alias to that
- *actually being used (and the flushes are redundant.)
- */
 int arm_dma_get_sgtable(struct device *dev, struct sg_table *sgt,
 void *cpu_addr, dma_addr_t handle, size_t size,
 unsigned long attrs)
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index b945239621d8..4ceb5b9016d8 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -136,6 +136,17 @@ int dma_common_get_sgtable(struct device *dev, struct 
sg_table *sgt,
return ret;
 }
 
+/*
+ * The whole dma_get_sgtable() idea is fundamentally unsafe - it seems
+ * that the intention is to allow exporting memory allocated via the
+ * coherent DMA APIs through the dma_buf API, which only accepts a
+ * scattertable.  This presents a couple of problems:
+ * 1. Not all memory allocated via the coherent DMA APIs is backed by
+ *a struct page
+ * 2. Passing coherent DMA memory into the streaming APIs is not allowed
+ *as we will try to flush the memory through a different alias to that
+ *actually being used (and the flushes are redundant.)
+ */
 int dma_get_sgtable_attrs(struct device *dev, struct sg_table *sgt,
void *cpu_addr, dma_addr_t dma_addr, size_t size,
unsigned long attrs)
-- 
2.20.1



[PATCH 1/5] m68knommu: add a pgprot_noncached stub

2019-07-24 Thread Christoph Hellwig
Provide a pgprot_noncached stub like all the other nommu ports so that
common code can rely on it being present.  Note that this is generally
code that is not actually run on nommu, but at least we can
avoid nasty ifdefs by having a stub.

Signed-off-by: Christoph Hellwig 
---
 arch/m68k/include/asm/pgtable_no.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/m68k/include/asm/pgtable_no.h 
b/arch/m68k/include/asm/pgtable_no.h
index fc3a96c77bd8..06194c7ba151 100644
--- a/arch/m68k/include/asm/pgtable_no.h
+++ b/arch/m68k/include/asm/pgtable_no.h
@@ -29,6 +29,8 @@
 #define PAGE_READONLY  __pgprot(0)
 #define PAGE_KERNEL__pgprot(0)
 
+#define pgprot_noncached(prot)   (prot)
+
 extern void paging_init(void);
 #define swapper_pg_dir ((pgd_t *) 0)
 
-- 
2.20.1



[PATCH 3/5] dma-mapping: explicitly wire up ->mmap and ->get_sgtable

2019-07-24 Thread Christoph Hellwig
While the default ->mmap and ->get_sgtable implementations work for the
majority of our dma_map_ops implementations, they are inherently unsafe
for others that don't use the page allocator or CMA and/or use their
own way of remapping not covered by the common code.  So remove the
defaults if these methods are not wired up, and instead wire up the
default implementations explicitly for all safe instances.

Fixes: e1c7e324539a ("dma-mapping: always provide the dma_map_ops based 
implementation")
Signed-off-by: Christoph Hellwig 
---
 arch/alpha/kernel/pci_iommu.c   |  2 ++
 arch/ia64/hp/common/sba_iommu.c |  2 ++
 arch/ia64/sn/pci/pci_dma.c  |  2 ++
 arch/mips/jazz/jazzdma.c|  2 ++
 arch/powerpc/kernel/dma-iommu.c |  2 ++
 arch/powerpc/platforms/ps3/system-bus.c |  4 
 arch/powerpc/platforms/pseries/vio.c|  2 ++
 arch/s390/pci/pci_dma.c |  2 ++
 arch/sparc/kernel/iommu.c   |  2 ++
 arch/sparc/kernel/pci_sun4v.c   |  2 ++
 arch/x86/kernel/amd_gart_64.c   |  2 ++
 arch/x86/kernel/pci-calgary_64.c|  2 ++
 drivers/iommu/amd_iommu.c   |  2 ++
 drivers/iommu/intel-iommu.c |  2 ++
 kernel/dma/mapping.c| 20 
 15 files changed, 42 insertions(+), 8 deletions(-)

diff --git a/arch/alpha/kernel/pci_iommu.c b/arch/alpha/kernel/pci_iommu.c
index 242108439f42..7f1925a32c99 100644
--- a/arch/alpha/kernel/pci_iommu.c
+++ b/arch/alpha/kernel/pci_iommu.c
@@ -955,5 +955,7 @@ const struct dma_map_ops alpha_pci_ops = {
.map_sg = alpha_pci_map_sg,
.unmap_sg   = alpha_pci_unmap_sg,
.dma_supported  = alpha_pci_supported,
+   .mmap   = dma_common_mmap,
+   .get_sgtable= dma_common_get_sgtable,
 };
 EXPORT_SYMBOL(alpha_pci_ops);
diff --git a/arch/ia64/hp/common/sba_iommu.c b/arch/ia64/hp/common/sba_iommu.c
index 3d24cc43385b..4c0ea6c2833d 100644
--- a/arch/ia64/hp/common/sba_iommu.c
+++ b/arch/ia64/hp/common/sba_iommu.c
@@ -2183,6 +2183,8 @@ const struct dma_map_ops sba_dma_ops = {
.map_sg = sba_map_sg_attrs,
.unmap_sg   = sba_unmap_sg_attrs,
.dma_supported  = sba_dma_supported,
+   .mmap   = dma_common_mmap,
+   .get_sgtable= dma_common_get_sgtable,
 };
 
 void sba_dma_init(void)
diff --git a/arch/ia64/sn/pci/pci_dma.c b/arch/ia64/sn/pci/pci_dma.c
index b7d42e4edc1f..12ffb9c0d738 100644
--- a/arch/ia64/sn/pci/pci_dma.c
+++ b/arch/ia64/sn/pci/pci_dma.c
@@ -438,6 +438,8 @@ static struct dma_map_ops sn_dma_ops = {
.unmap_sg   = sn_dma_unmap_sg,
.dma_supported  = sn_dma_supported,
.get_required_mask  = sn_dma_get_required_mask,
+   .mmap   = dma_common_mmap,
+   .get_sgtable= dma_common_get_sgtable,
 };
 
 void sn_dma_init(void)
diff --git a/arch/mips/jazz/jazzdma.c b/arch/mips/jazz/jazzdma.c
index 1804dc9d8136..a01e14955187 100644
--- a/arch/mips/jazz/jazzdma.c
+++ b/arch/mips/jazz/jazzdma.c
@@ -682,5 +682,7 @@ const struct dma_map_ops jazz_dma_ops = {
.sync_sg_for_device = jazz_dma_sync_sg_for_device,
.dma_supported  = dma_direct_supported,
.cache_sync = arch_dma_cache_sync,
+   .mmap   = dma_common_mmap,
+   .get_sgtable= dma_common_get_sgtable,
 };
 EXPORT_SYMBOL(jazz_dma_ops);
diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c
index a0879674a9c8..2f5a53874f6d 100644
--- a/arch/powerpc/kernel/dma-iommu.c
+++ b/arch/powerpc/kernel/dma-iommu.c
@@ -208,4 +208,6 @@ const struct dma_map_ops dma_iommu_ops = {
.sync_single_for_device = dma_iommu_sync_for_device,
.sync_sg_for_cpu= dma_iommu_sync_sg_for_cpu,
.sync_sg_for_device = dma_iommu_sync_sg_for_device,
+   .mmap   = dma_common_mmap,
+   .get_sgtable= dma_common_get_sgtable,
 };
diff --git a/arch/powerpc/platforms/ps3/system-bus.c 
b/arch/powerpc/platforms/ps3/system-bus.c
index 98410119c47b..70fcc9736a8c 100644
--- a/arch/powerpc/platforms/ps3/system-bus.c
+++ b/arch/powerpc/platforms/ps3/system-bus.c
@@ -700,6 +700,8 @@ static const struct dma_map_ops ps3_sb_dma_ops = {
.get_required_mask = ps3_dma_get_required_mask,
.map_page = ps3_sb_map_page,
.unmap_page = ps3_unmap_page,
+   .mmap = dma_common_mmap,
+   .get_sgtable = dma_common_get_sgtable,
 };
 
 static const struct dma_map_ops ps3_ioc0_dma_ops = {
@@ -711,6 +713,8 @@ static const struct dma_map_ops ps3_ioc0_dma_ops = {
.get_required_mask = ps3_dma_get_required_mask,
.map_page = ps3_ioc0_map_page,
.unmap_page = ps3_unmap_page,
+   .mmap = dma_common_mmap,
+   .get_sgtable = dma_common_get_sgtable,
 };
 
 /**
diff --git a/arch/powerpc/platforms/pseries/vio.c

Re: [PATCH v5 02/10] iommu/vt-d: Use per-device dma_ops

2019-07-24 Thread Christoph Hellwig
>  /* Check if the dev needs to go through non-identity map and unmap process.*/
>  static bool iommu_need_mapping(struct device *dev)
>  {
> - int ret;
> -
>   if (iommu_dummy(dev))
>   return false;
>  
> - ret = identity_mapping(dev);
> - if (ret) {
> - u64 dma_mask = *dev->dma_mask;
> -
> - if (dev->coherent_dma_mask && dev->coherent_dma_mask < dma_mask)
> - dma_mask = dev->coherent_dma_mask;
> -
> - if (dma_mask >= dma_get_required_mask(dev))
> - return false;

Don't we need to keep this bit so that we still allow the IOMMU
to act if the device has a too small DMA mask to address all memory in
the system, even if it should otherwise be identity mapped?


Re: [PATCH v5 04/10] PCI: Add dev_is_untrusted helper

2019-07-24 Thread Christoph Hellwig
On Thu, Jul 25, 2019 at 11:17:11AM +0800, Lu Baolu wrote:
> There are several places in the kernel where it is necessary to
> check whether a device is an untrusted PCI device. Add a helper
> to simplify the callers.

Looks good,

Reviewed-by: Christoph Hellwig 


Re: [PATCH v5 01/10] iommu/vt-d: Don't switch off swiotlb if use direct dma

2019-07-24 Thread Christoph Hellwig
Looks good:

Reviewed-by: Christoph Hellwig 


Re: [PATCH 2/2] virtio/virtio_ring: Fix the dma_max_mapping_size call

2019-07-24 Thread Christoph Hellwig
On Wed, Jul 24, 2019 at 06:10:53PM -0400, Michael S. Tsirkin wrote:
> Christoph - would a documented API wrapping dma_mask make sense?
> With the documentation explaining how users must
> desist from using DMA APIs if that returns false ...

We have some bigger changes in this are planned, including turning
dma_mask into a scalar instead of a pointer, please stay tuned.


[PATCH v5 06/10] swiotlb: Zero out bounce buffer for untrusted device

2019-07-24 Thread Lu Baolu
This is necessary to avoid exposing valid kernel data to any
malicious device.

Suggested-by: Christoph Hellwig 
Signed-off-by: Lu Baolu 
---
 kernel/dma/swiotlb.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 43c88626a1f3..edc84a00b9f9 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 
 #ifdef CONFIG_DEBUG_FS
 #include 
 #endif
@@ -562,6 +563,11 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
 */
for (i = 0; i < nslots; i++)
io_tlb_orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT);
+
+   /* Zero out the bounce buffer if the consumer is untrusted. */
+   if (dev_is_untrusted(hwdev))
+   memset(phys_to_virt(tlb_addr), 0, alloc_size);
+
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
(dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
swiotlb_bounce(orig_addr, tlb_addr, mapping_size, 
DMA_TO_DEVICE);
-- 
2.17.1



[PATCH v5 08/10] iommu/vt-d: Check whether device requires bounce buffer

2019-07-24 Thread Lu Baolu
This adds a helper to check whether a device needs to
use a bounce buffer. It also provides a boot-time option
to disable the bounce buffer. Users can use this to
prevent the iommu driver from using the bounce buffer
if they prefer performance over the added protection.

Cc: Ashok Raj 
Cc: Jacob Pan 
Cc: Kevin Tian 
Signed-off-by: Lu Baolu 
Tested-by: Xu Pengfei 
Tested-by: Mika Westerberg 
---
 Documentation/admin-guide/kernel-parameters.txt | 5 +
 drivers/iommu/intel-iommu.c | 6 ++
 2 files changed, 11 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 46b826fcb5ad..8628454bd8a2 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1732,6 +1732,11 @@
Note that using this option lowers the security
provided by tboot because it makes the system
vulnerable to DMA attacks.
+   nobounce [Default off]
+   Disable bounce buffer for untrusted devices such as
+   the Thunderbolt devices. This will treat the untrusted
+   devices as the trusted ones, hence might expose security
+   risks of DMA attacks.
 
intel_idle.max_cstate=  [KNL,HW,ACPI,X86]
0   disables intel_idle and fall back on acpi_idle.
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index a458df975c55..4185406b0368 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -360,6 +360,7 @@ static int dmar_forcedac;
 static int intel_iommu_strict;
 static int intel_iommu_superpage = 1;
 static int iommu_identity_mapping;
+static int intel_no_bounce;
 
 #define IDENTMAP_ALL   1
 #define IDENTMAP_GFX   2
@@ -373,6 +374,8 @@ EXPORT_SYMBOL_GPL(intel_iommu_gfx_mapped);
 static DEFINE_SPINLOCK(device_domain_lock);
 static LIST_HEAD(device_domain_list);
 
+#define device_needs_bounce(d) (!intel_no_bounce && dev_is_untrusted(d))
+
 /*
  * Iterate over elements in device_domain_list and call the specified
  * callback @fn against each element.
@@ -455,6 +458,9 @@ static int __init intel_iommu_setup(char *str)
printk(KERN_INFO
"Intel-IOMMU: not forcing on after tboot. This 
could expose security risk for tboot\n");
intel_iommu_tboot_noforce = 1;
+   } else if (!strncmp(str, "nobounce", 8)) {
+   pr_info("Intel-IOMMU: No bounce buffer. This could 
expose security risks of DMA attacks\n");
+   intel_no_bounce = 1;
}
 
str += strcspn(str, ",");
-- 
2.17.1



[PATCH v5 04/10] PCI: Add dev_is_untrusted helper

2019-07-24 Thread Lu Baolu
There are several places in the kernel where it is necessary to
check whether a device is an untrusted PCI device. Add a helper
to simplify the callers.

Signed-off-by: Lu Baolu 
---
 include/linux/pci.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/pci.h b/include/linux/pci.h
index 9e700d9f9f28..960352a75a10 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1029,6 +1029,7 @@ void pcibios_setup_bridge(struct pci_bus *bus, unsigned 
long type);
 void pci_sort_breadthfirst(void);
 #define dev_is_pci(d) ((d)->bus == &pci_bus_type)
 #define dev_is_pf(d) ((dev_is_pci(d) ? to_pci_dev(d)->is_physfn : false))
+#define dev_is_untrusted(d) ((dev_is_pci(d) ? to_pci_dev(d)->untrusted : 
false))
 
 /* Generic PCI functions exported to card drivers */
 
@@ -1766,6 +1767,7 @@ static inline struct pci_dev *pci_dev_get(struct pci_dev 
*dev) { return NULL; }
 
 #define dev_is_pci(d) (false)
 #define dev_is_pf(d) (false)
+#define dev_is_untrusted(d) (false)
 static inline bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags)
 { return false; }
 static inline int pci_irqd_intx_xlate(struct irq_domain *d,
-- 
2.17.1



[PATCH v5 09/10] iommu/vt-d: Add trace events for device dma map/unmap

2019-07-24 Thread Lu Baolu
This adds trace support for the Intel IOMMU driver. It
also declares some events which can be used to trace
when an IOVA is being mapped or unmapped in a domain.

Cc: Ashok Raj 
Cc: Jacob Pan 
Cc: Kevin Tian 
Signed-off-by: Mika Westerberg 
Signed-off-by: Lu Baolu 
---
 drivers/iommu/Makefile |  1 +
 drivers/iommu/intel-trace.c| 14 +
 include/trace/events/intel_iommu.h | 95 ++
 3 files changed, 110 insertions(+)
 create mode 100644 drivers/iommu/intel-trace.c
 create mode 100644 include/trace/events/intel_iommu.h

diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index f13f36ae1af6..bfe27b2755bd 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -17,6 +17,7 @@ obj-$(CONFIG_ARM_SMMU) += arm-smmu.o
 obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
 obj-$(CONFIG_DMAR_TABLE) += dmar.o
 obj-$(CONFIG_INTEL_IOMMU) += intel-iommu.o intel-pasid.o
+obj-$(CONFIG_INTEL_IOMMU) += intel-trace.o
 obj-$(CONFIG_INTEL_IOMMU_DEBUGFS) += intel-iommu-debugfs.o
 obj-$(CONFIG_INTEL_IOMMU_SVM) += intel-svm.o
 obj-$(CONFIG_IPMMU_VMSA) += ipmmu-vmsa.o
diff --git a/drivers/iommu/intel-trace.c b/drivers/iommu/intel-trace.c
new file mode 100644
index ..bfb6a6e37a88
--- /dev/null
+++ b/drivers/iommu/intel-trace.c
@@ -0,0 +1,14 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Intel IOMMU trace support
+ *
+ * Copyright (C) 2019 Intel Corporation
+ *
+ * Author: Lu Baolu 
+ */
+
+#include 
+#include 
+
+#define CREATE_TRACE_POINTS
+#include 
diff --git a/include/trace/events/intel_iommu.h 
b/include/trace/events/intel_iommu.h
new file mode 100644
index ..3fdeaad93b2e
--- /dev/null
+++ b/include/trace/events/intel_iommu.h
@@ -0,0 +1,95 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Intel IOMMU trace support
+ *
+ * Copyright (C) 2019 Intel Corporation
+ *
+ * Author: Lu Baolu 
+ */
+#ifdef CONFIG_INTEL_IOMMU
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM intel_iommu
+
+#if !defined(_TRACE_INTEL_IOMMU_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_INTEL_IOMMU_H
+
+#include 
+#include 
+
+DECLARE_EVENT_CLASS(dma_map,
+   TP_PROTO(struct device *dev, dma_addr_t dev_addr, phys_addr_t phys_addr,
+size_t size),
+
+   TP_ARGS(dev, dev_addr, phys_addr, size),
+
+   TP_STRUCT__entry(
+   __string(dev_name, dev_name(dev))
+   __field(dma_addr_t, dev_addr)
+   __field(phys_addr_t, phys_addr)
+   __field(size_t, size)
+   ),
+
+   TP_fast_assign(
+   __assign_str(dev_name, dev_name(dev));
+   __entry->dev_addr = dev_addr;
+   __entry->phys_addr = phys_addr;
+   __entry->size = size;
+   ),
+
+   TP_printk("dev=%s dev_addr=0x%llx phys_addr=0x%llx size=%zu",
+ __get_str(dev_name),
+ (unsigned long long)__entry->dev_addr,
+ (unsigned long long)__entry->phys_addr,
+ __entry->size)
+);
+
+DEFINE_EVENT(dma_map, bounce_map_single,
+   TP_PROTO(struct device *dev, dma_addr_t dev_addr, phys_addr_t phys_addr,
+size_t size),
+   TP_ARGS(dev, dev_addr, phys_addr, size)
+);
+
+DEFINE_EVENT(dma_map, bounce_map_sg,
+   TP_PROTO(struct device *dev, dma_addr_t dev_addr, phys_addr_t phys_addr,
+size_t size),
+   TP_ARGS(dev, dev_addr, phys_addr, size)
+);
+
+DECLARE_EVENT_CLASS(dma_unmap,
+   TP_PROTO(struct device *dev, dma_addr_t dev_addr, size_t size),
+
+   TP_ARGS(dev, dev_addr, size),
+
+   TP_STRUCT__entry(
+   __string(dev_name, dev_name(dev))
+   __field(dma_addr_t, dev_addr)
+   __field(size_t, size)
+   ),
+
+   TP_fast_assign(
+   __assign_str(dev_name, dev_name(dev));
+   __entry->dev_addr = dev_addr;
+   __entry->size = size;
+   ),
+
+   TP_printk("dev=%s dev_addr=0x%llx size=%zu",
+ __get_str(dev_name),
+ (unsigned long long)__entry->dev_addr,
+ __entry->size)
+);
+
+DEFINE_EVENT(dma_unmap, bounce_unmap_single,
+   TP_PROTO(struct device *dev, dma_addr_t dev_addr, size_t size),
+   TP_ARGS(dev, dev_addr, size)
+);
+
+DEFINE_EVENT(dma_unmap, bounce_unmap_sg,
+   TP_PROTO(struct device *dev, dma_addr_t dev_addr, size_t size),
+   TP_ARGS(dev, dev_addr, size)
+);
+
+#endif /* _TRACE_INTEL_IOMMU_H */
+
+/* This part must be outside protection */
+#include 
+#endif /* CONFIG_INTEL_IOMMU */
-- 
2.17.1



[PATCH v5 03/10] iommu/vt-d: Cleanup after use per-device dma ops

2019-07-24 Thread Lu Baolu
After switching to per-device dma ops, we don't need to check whether
a device needs mapping, hence all checks of iommu_need_mapping()
are unnecessary now. Clean them up.

Cc: Ashok Raj 
Cc: Jacob Pan 
Cc: Kevin Tian 
Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel-iommu.c | 50 -
 1 file changed, 5 insertions(+), 45 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 11474bd2e348..a458df975c55 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2749,17 +2749,6 @@ static int __init si_domain_init(int hw)
return 0;
 }
 
-static int identity_mapping(struct device *dev)
-{
-   struct device_domain_info *info;
-
-   info = dev->archdata.iommu;
-   if (info && info != DUMMY_DEVICE_DOMAIN_INFO)
-   return (info->domain == si_domain);
-
-   return 0;
-}
-
 static int domain_add_dev_info(struct dmar_domain *domain, struct device *dev)
 {
struct dmar_domain *ndomain;
@@ -3416,15 +3405,6 @@ static struct dmar_domain 
*get_private_domain_for_dev(struct device *dev)
return domain;
 }
 
-/* Check if the dev needs to go through non-identity map and unmap process.*/
-static bool iommu_need_mapping(struct device *dev)
-{
-   if (iommu_dummy(dev))
-   return false;
-
-   return !identity_mapping(dev);
-}
-
 static dma_addr_t __intel_map_single(struct device *dev, phys_addr_t paddr,
 size_t size, int dir, u64 dma_mask)
 {
@@ -3486,20 +3466,15 @@ static dma_addr_t intel_map_page(struct device *dev, 
struct page *page,
 enum dma_data_direction dir,
 unsigned long attrs)
 {
-   if (iommu_need_mapping(dev))
-   return __intel_map_single(dev, page_to_phys(page) + offset,
-   size, dir, *dev->dma_mask);
-   return dma_direct_map_page(dev, page, offset, size, dir, attrs);
+   return __intel_map_single(dev, page_to_phys(page) + offset,
+ size, dir, *dev->dma_mask);
 }
 
 static dma_addr_t intel_map_resource(struct device *dev, phys_addr_t phys_addr,
 size_t size, enum dma_data_direction dir,
 unsigned long attrs)
 {
-   if (iommu_need_mapping(dev))
-   return __intel_map_single(dev, phys_addr, size, dir,
-   *dev->dma_mask);
-   return dma_direct_map_resource(dev, phys_addr, size, dir, attrs);
+   return __intel_map_single(dev, phys_addr, size, dir, *dev->dma_mask);
 }
 
 static void intel_unmap(struct device *dev, dma_addr_t dev_addr, size_t size)
@@ -3551,17 +3526,13 @@ static void intel_unmap_page(struct device *dev, 
dma_addr_t dev_addr,
 size_t size, enum dma_data_direction dir,
 unsigned long attrs)
 {
-   if (iommu_need_mapping(dev))
-   intel_unmap(dev, dev_addr, size);
-   else
-   dma_direct_unmap_page(dev, dev_addr, size, dir, attrs);
+   intel_unmap(dev, dev_addr, size);
 }
 
 static void intel_unmap_resource(struct device *dev, dma_addr_t dev_addr,
size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
-   if (iommu_need_mapping(dev))
-   intel_unmap(dev, dev_addr, size);
+   intel_unmap(dev, dev_addr, size);
 }
 
 static void *intel_alloc_coherent(struct device *dev, size_t size,
@@ -3571,9 +3542,6 @@ static void *intel_alloc_coherent(struct device *dev, 
size_t size,
struct page *page = NULL;
int order;
 
-   if (!iommu_need_mapping(dev))
-   return dma_direct_alloc(dev, size, dma_handle, flags, attrs);
-
size = PAGE_ALIGN(size);
order = get_order(size);
 
@@ -3607,9 +3575,6 @@ static void intel_free_coherent(struct device *dev, 
size_t size, void *vaddr,
int order;
struct page *page = virt_to_page(vaddr);
 
-   if (!iommu_need_mapping(dev))
-   return dma_direct_free(dev, size, vaddr, dma_handle, attrs);
-
size = PAGE_ALIGN(size);
order = get_order(size);
 
@@ -3627,9 +3592,6 @@ static void intel_unmap_sg(struct device *dev, struct 
scatterlist *sglist,
struct scatterlist *sg;
int i;
 
-   if (!iommu_need_mapping(dev))
-   return dma_direct_unmap_sg(dev, sglist, nelems, dir, attrs);
-
for_each_sg(sglist, sg, nelems, i) {
nrpages += aligned_nrpages(sg_dma_address(sg), sg_dma_len(sg));
}
@@ -3651,8 +3613,6 @@ static int intel_map_sg(struct device *dev, struct 
scatterlist *sglist, int nele
struct intel_iommu *iommu;
 
BUG_ON(dir == DMA_NONE);
-   if (!iommu_need_mapping(dev))
-   return dma_direct_map_sg(dev, sglist, nelems, dir, attrs);
 
domain = find_domain(dev);
if (!domain)
-- 
2.17.1


[PATCH v5 07/10] iommu: Add bounce page APIs

2019-07-24 Thread Lu Baolu
IOMMU hardware always uses paging for DMA remapping.  The
minimum mapped window is one page. Device drivers may map
buffers that do not fill a whole IOMMU window, which allows
the device to access possibly unrelated memory; a malicious
device can exploit this to perform a DMA attack.

This introduces a bounce buffer mechanism for DMA buffers
which don't fill a minimal IOMMU page. It can be used by
various vendor-specific IOMMU drivers as long as the
DMA domain is managed by the generic IOMMU layer. The
following APIs are added (a usage sketch follows below):

* iommu_bounce_map(dev, addr, paddr, size, dir, attrs)
  - Map a buffer starting at DMA address @addr in bounce page
    manner. For buffer parts that don't cover a whole
    minimal IOMMU page, the bounce page policy is applied:
    a bounce page mapped by swiotlb will be used as the DMA
    target in the IOMMU page table. Otherwise, the physical
    address @paddr is mapped directly.

* iommu_bounce_unmap(dev, addr, size, dir, attrs)
  - Unmap the buffer mapped with iommu_bounce_map(). The bounce
    page will be torn down after the bounced data gets synced.

* iommu_bounce_sync(dev, addr, size, dir, target)
  - Sync the bounced data in case the bounce-mapped buffer is
    reused.

These APIs are guarded by a kernel option, IOMMU_BOUNCE_PAGE,
so they can be compiled out for cases where bounce pages are
not needed, for example embedded systems.
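
For orientation, a condensed sketch of how a vendor IOMMU driver might drive
these APIs for a streaming mapping. IOVA allocation is elided and the
vendor_* helper names are assumptions made for illustration only:

	/* Sketch: map a possibly unaligned buffer through the bounce-page APIs. */
	static dma_addr_t vendor_bounce_map(struct device *dev, phys_addr_t paddr,
					    size_t size, enum dma_data_direction dir,
					    unsigned long attrs)
	{
		dma_addr_t iova = vendor_alloc_iova(dev, size);	/* assumed helper */

		if (!iova)
			return DMA_MAPPING_ERROR;

		/* Partial IOMMU pages are bounced via swiotlb, whole pages map directly. */
		return iommu_bounce_map(dev, iova, paddr, size, dir, attrs);
	}

	static void vendor_bounce_unmap(struct device *dev, dma_addr_t addr,
					size_t size, enum dma_data_direction dir,
					unsigned long attrs)
	{
		/* Bounced data is synced back before the bounce page is torn down. */
		iommu_bounce_unmap(dev, addr, size, dir, attrs);
		vendor_free_iova(dev, addr, size);	/* assumed helper */
	}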

Cc: Ashok Raj 
Cc: Jacob Pan 
Cc: Kevin Tian 
Cc: Alan Cox 
Cc: Mika Westerberg 
Signed-off-by: Lu Baolu 
---
 drivers/iommu/Kconfig |  13 +
 drivers/iommu/iommu.c | 118 ++
 include/linux/iommu.h |  35 +
 3 files changed, 166 insertions(+)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index e15cdcd8cb3c..d7f2e09cbcf2 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -86,6 +86,19 @@ config IOMMU_DEFAULT_PASSTHROUGH
 
  If unsure, say N here.
 
+config IOMMU_BOUNCE_PAGE
+   bool "Use bounce page for untrusted devices"
+   depends on IOMMU_API && SWIOTLB
+   help
+ IOMMU hardware always use paging for DMA remapping. The minimum
+ mapped window is a page size. The device drivers may map buffers
+ not filling whole IOMMU window. This allows device to access to
+ possibly unrelated memory and malicious device can exploit this
+ to perform a DMA attack. Select this to use a bounce page for the
+ buffer which doesn't fill a whole IOMMU page.
+
+ If unsure, say N here.
+
 config OF_IOMMU
def_bool y
depends on OF && IOMMU_API
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 0c674d80c37f..fe3815186d72 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2468,3 +2468,121 @@ int iommu_sva_get_pasid(struct iommu_sva *handle)
return ops->sva_get_pasid(handle);
 }
 EXPORT_SYMBOL_GPL(iommu_sva_get_pasid);
+
+#ifdef CONFIG_IOMMU_BOUNCE_PAGE
+
+/*
+ * Bounce buffer support for external devices:
+ *
+ * IOMMU hardware always use paging for DMA remapping. The minimum mapped
+ * window is a page size. The device drivers may map buffers not filling
+ * whole IOMMU window. This allows device to access to possibly unrelated
+ * memory and malicious device can exploit this to perform a DMA attack.
+ * Use bounce pages for the buffer which doesn't fill whole IOMMU pages.
+ */
+
+static inline size_t
+get_aligned_size(struct iommu_domain *domain, size_t size)
+{
+   return ALIGN(size, 1 << __ffs(domain->pgsize_bitmap));
+}
+
+dma_addr_t iommu_bounce_map(struct device *dev, dma_addr_t iova,
+   phys_addr_t paddr, size_t size,
+   enum dma_data_direction dir,
+   unsigned long attrs)
+{
+   struct iommu_domain *domain;
+   unsigned int min_pagesz;
+   phys_addr_t tlb_addr;
+   size_t aligned_size;
+   int prot = 0;
+   int ret;
+
+   domain = iommu_get_dma_domain(dev);
+   if (!domain)
+   return DMA_MAPPING_ERROR;
+
+   if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)
+   prot |= IOMMU_READ;
+   if (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL)
+   prot |= IOMMU_WRITE;
+
+   aligned_size = get_aligned_size(domain, size);
+   min_pagesz = 1 << __ffs(domain->pgsize_bitmap);
+
+   /*
+* If both the physical buffer start address and size are
+* page aligned, we don't need to use a bounce page.
+*/
+   if (!IS_ALIGNED(paddr | size, min_pagesz)) {
+   tlb_addr = swiotlb_tbl_map_single(dev,
+   __phys_to_dma(dev, io_tlb_start),
+   paddr, size, aligned_size, dir, attrs);
+   if (tlb_addr == DMA_MAPPING_ERROR)
+   return DMA_MAPPING_ERROR;
+   } else {
+   tlb_addr = paddr;
+   }
+
+   ret = iommu_map(domain, iova, tlb_addr, aligned_size, prot);
+   if

[PATCH v5 02/10] iommu/vt-d: Use per-device dma_ops

2019-07-24 Thread Lu Baolu
The current Intel IOMMU driver sets the system-level dma_ops, hence
every DMA API call goes through the IOMMU driver even when the
device is using an identity-mapped domain. This applies per-device
dma_ops in this driver and leaves the system-level dma_ops for
direct dma.
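
The mechanism this relies on is the per-device ops pointer consulted by the
DMA core. Roughly, and simplified here for context (not the verbatim kernel
code):

	/* Simplified: how the DMA core picks the ops for a given device. */
	static inline const struct dma_map_ops *get_dma_ops(struct device *dev)
	{
		if (dev->dma_ops)
			return dev->dma_ops;	/* per-device ops, e.g. intel_dma_ops */
		return get_arch_dma_ops(dev->bus);	/* otherwise the arch/global default */
	}

So a device left without per-device ops (the identity-mapped case) is handled
by the generic direct-mapping path, while devices that need translation get
intel_dma_ops set explicitly in intel_iommu_add_device().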

Cc: Ashok Raj 
Cc: Jacob Pan 
Cc: Kevin Tian 
Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel-iommu.c | 43 ++---
 1 file changed, 7 insertions(+), 36 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 8064af607d3b..11474bd2e348 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3419,43 +3419,10 @@ static struct dmar_domain 
*get_private_domain_for_dev(struct device *dev)
 /* Check if the dev needs to go through non-identity map and unmap process.*/
 static bool iommu_need_mapping(struct device *dev)
 {
-   int ret;
-
if (iommu_dummy(dev))
return false;
 
-   ret = identity_mapping(dev);
-   if (ret) {
-   u64 dma_mask = *dev->dma_mask;
-
-   if (dev->coherent_dma_mask && dev->coherent_dma_mask < dma_mask)
-   dma_mask = dev->coherent_dma_mask;
-
-   if (dma_mask >= dma_get_required_mask(dev))
-   return false;
-
-   /*
-* 32 bit DMA is removed from si_domain and fall back to
-* non-identity mapping.
-*/
-   dmar_remove_one_dev_info(dev);
-   ret = iommu_request_dma_domain_for_dev(dev);
-   if (ret) {
-   struct iommu_domain *domain;
-   struct dmar_domain *dmar_domain;
-
-   domain = iommu_get_domain_for_dev(dev);
-   if (domain) {
-   dmar_domain = to_dmar_domain(domain);
-   dmar_domain->flags |= DOMAIN_FLAG_LOSE_CHILDREN;
-   }
-   get_private_domain_for_dev(dev);
-   }
-
-   dev_info(dev, "32bit DMA uses non-identity mapping\n");
-   }
-
-   return true;
+   return !identity_mapping(dev);
 }
 
 static dma_addr_t __intel_map_single(struct device *dev, phys_addr_t paddr,
@@ -4706,8 +4673,6 @@ int __init intel_iommu_init(void)
}
up_write(&dmar_global_lock);
 
-   dma_ops = &intel_dma_ops;
-
init_iommu_pm_ops();
 
for_each_active_iommu(iommu, drhd) {
@@ -5280,6 +5245,8 @@ static int intel_iommu_add_device(struct device *dev)
dev_info(dev,
 "Device uses a private identity 
domain.\n");
}
+   } else {
+   set_dma_ops(dev, &intel_dma_ops);
}
} else {
if (device_def_domain_type(dev) == IOMMU_DOMAIN_DMA) {
@@ -5295,6 +5262,8 @@ static int intel_iommu_add_device(struct device *dev)
dev_info(dev,
 "Device uses a private dma domain.\n");
}
+
+   set_dma_ops(dev, &intel_dma_ops);
}
}
 
@@ -5313,6 +5282,8 @@ static void intel_iommu_remove_device(struct device *dev)
iommu_group_remove_device(dev);
 
iommu_device_unlink(&iommu->iommu, dev);
+
+   set_dma_ops(dev, NULL);
 }
 
 static void intel_iommu_get_resv_regions(struct device *device,
-- 
2.17.1



[PATCH v5 05/10] swiotlb: Split size parameter to map/unmap APIs

2019-07-24 Thread Lu Baolu
This splits the size parameter to swiotlb_tbl_map_single() and
swiotlb_tbl_unmap_single() into an alloc_size and a mapping_size
parameter, where alloc_size is the one rounded up to the iommu page
size.
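
Concretely, a caller rounds only the allocation size up to the IOMMU page
size while keeping the original size for the actual data copy; a hedged sketch
(the function name is made up for illustration, and VTD_PAGE_SIZE stands in
for the IOMMU page granularity):

	/* Sketch: alloc_size is rounded up, mapping_size stays the original size. */
	static phys_addr_t bounce_map_example(struct device *dev,
					      dma_addr_t tbl_dma_addr,
					      phys_addr_t paddr, size_t size,
					      enum dma_data_direction dir,
					      unsigned long attrs)
	{
		size_t mapping_size = size;			/* bytes actually bounced */
		size_t alloc_size = ALIGN(size, VTD_PAGE_SIZE);	/* slots reserved in swiotlb */

		return swiotlb_tbl_map_single(dev, tbl_dma_addr, paddr,
					      mapping_size, alloc_size, dir, attrs);
	}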

Suggested-by: Christoph Hellwig 
Signed-off-by: Lu Baolu 
---
 drivers/xen/swiotlb-xen.c |  8 
 include/linux/swiotlb.h   |  8 ++--
 kernel/dma/direct.c   |  2 +-
 kernel/dma/swiotlb.c  | 24 +---
 4 files changed, 24 insertions(+), 18 deletions(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index cfbe46785a3b..58d25486971e 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -400,8 +400,8 @@ static dma_addr_t xen_swiotlb_map_page(struct device *dev, 
struct page *page,
 */
trace_swiotlb_bounced(dev, dev_addr, size, swiotlb_force);
 
-   map = swiotlb_tbl_map_single(dev, start_dma_addr, phys, size, dir,
-attrs);
+   map = swiotlb_tbl_map_single(dev, start_dma_addr, phys,
+size, size, dir, attrs);
if (map == (phys_addr_t)DMA_MAPPING_ERROR)
return DMA_MAPPING_ERROR;
 
@@ -411,7 +411,7 @@ static dma_addr_t xen_swiotlb_map_page(struct device *dev, 
struct page *page,
 * Ensure that the address returned is DMA'ble
 */
if (unlikely(!dma_capable(dev, dev_addr, size))) {
-   swiotlb_tbl_unmap_single(dev, map, size, dir,
+   swiotlb_tbl_unmap_single(dev, map, size, size, dir,
attrs | DMA_ATTR_SKIP_CPU_SYNC);
return DMA_MAPPING_ERROR;
}
@@ -447,7 +447,7 @@ static void xen_unmap_single(struct device *hwdev, 
dma_addr_t dev_addr,
 
/* NOTE: We use dev_addr here, not paddr! */
if (is_xen_swiotlb_buffer(dev_addr))
-   swiotlb_tbl_unmap_single(hwdev, paddr, size, dir, attrs);
+   swiotlb_tbl_unmap_single(hwdev, paddr, size, size, dir, attrs);
 }
 
 static void xen_swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr,
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 361f62bb4a8e..cde3dc18e21a 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -46,13 +46,17 @@ enum dma_sync_target {
 
 extern phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
  dma_addr_t tbl_dma_addr,
- phys_addr_t phys, size_t size,
+ phys_addr_t phys,
+ size_t mapping_size,
+ size_t alloc_size,
  enum dma_data_direction dir,
  unsigned long attrs);
 
 extern void swiotlb_tbl_unmap_single(struct device *hwdev,
 phys_addr_t tlb_addr,
-size_t size, enum dma_data_direction dir,
+size_t mapping_size,
+size_t alloc_size,
+enum dma_data_direction dir,
 unsigned long attrs);
 
 extern void swiotlb_tbl_sync_single(struct device *hwdev,
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 59bdceea3737..6c183326c4e6 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -297,7 +297,7 @@ void dma_direct_unmap_page(struct device *dev, dma_addr_t 
addr,
dma_direct_sync_single_for_cpu(dev, addr, size, dir);
 
if (unlikely(is_swiotlb_buffer(phys)))
-   swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
+   swiotlb_tbl_unmap_single(dev, phys, size, size, dir, attrs);
 }
 EXPORT_SYMBOL(dma_direct_unmap_page);
 
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 9de232229063..43c88626a1f3 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -444,7 +444,9 @@ static void swiotlb_bounce(phys_addr_t orig_addr, 
phys_addr_t tlb_addr,
 
 phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
   dma_addr_t tbl_dma_addr,
-  phys_addr_t orig_addr, size_t size,
+  phys_addr_t orig_addr,
+  size_t mapping_size,
+  size_t alloc_size,
   enum dma_data_direction dir,
   unsigned long attrs)
 {
@@ -481,8 +483,8 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
 * For mappings greater than or equal to a page, we limit the stride
 * (and hence alignment) to a page size.
 */
-   nslots = ALIGN(size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
-   if (size >= PAGE_SIZE)
+   nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
+   if (alloc_size >= PAGE_SI

[PATCH v5 10/10] iommu/vt-d: Use bounce buffer for untrusted devices

2019-07-24 Thread Lu Baolu
The Intel VT-d hardware uses paging for DMA remapping.
The minimum mapped window is a page size. The device
drivers may map buffers not filling the whole IOMMU
window. This allows the device to access possibly
unrelated memory, and a malicious device could exploit
this to perform DMA attacks. To address this, the
Intel IOMMU driver will use bounce pages for those
buffers which don't fill whole IOMMU pages.

Cc: Ashok Raj 
Cc: Jacob Pan 
Cc: Kevin Tian 
Signed-off-by: Lu Baolu 
Tested-by: Xu Pengfei 
Tested-by: Mika Westerberg 
---
 drivers/iommu/Kconfig   |   2 +
 drivers/iommu/intel-iommu.c | 188 +++-
 2 files changed, 188 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index d7f2e09cbcf2..3baa418edc16 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -195,6 +195,8 @@ config INTEL_IOMMU
select IOMMU_IOVA
select NEED_DMA_MAP_STATE
select DMAR_TABLE
+   select SWIOTLB
+   select IOMMU_BOUNCE_PAGE
help
  DMA remapping (DMAR) devices support enables independent address
  translations for Direct Memory Access (DMA) from devices.
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 4185406b0368..2cdec279ccac 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -44,6 +44,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "irq_remapping.h"
 #include "intel-pasid.h"
@@ -3672,6 +3673,183 @@ static const struct dma_map_ops intel_dma_ops = {
.dma_supported = dma_direct_supported,
 };
 
+static dma_addr_t
+bounce_map_single(struct device *dev, phys_addr_t paddr, size_t size,
+ enum dma_data_direction dir, unsigned long attrs,
+ u64 dma_mask)
+{
+   struct dmar_domain *domain;
+   struct intel_iommu *iommu;
+   unsigned long iova_pfn;
+   unsigned long nrpages;
+   dma_addr_t ret_addr;
+
+   domain = find_domain(dev);
+   if (WARN_ON(dir == DMA_NONE || !domain))
+   return DMA_MAPPING_ERROR;
+
+   iommu = domain_get_iommu(domain);
+   nrpages = aligned_nrpages(0, size);
+   iova_pfn = intel_alloc_iova(dev, domain,
+   dma_to_mm_pfn(nrpages), dma_mask);
+   if (!iova_pfn)
+   return DMA_MAPPING_ERROR;
+
+   ret_addr = iommu_bounce_map(dev, iova_pfn << PAGE_SHIFT,
+   paddr, size, dir, attrs);
+   if (ret_addr == DMA_MAPPING_ERROR) {
+   free_iova_fast(&domain->iovad, iova_pfn, 
dma_to_mm_pfn(nrpages));
+   return DMA_MAPPING_ERROR;
+   }
+
+   trace_bounce_map_single(dev, iova_pfn << PAGE_SHIFT, paddr, size);
+
+   return ret_addr;
+}
+
+static void
+bounce_unmap_single(struct device *dev, dma_addr_t dev_addr, size_t size,
+   enum dma_data_direction dir, unsigned long attrs)
+{
+   struct dmar_domain *domain;
+   struct intel_iommu *iommu;
+   unsigned long iova_pfn;
+   unsigned long nrpages;
+
+   domain = find_domain(dev);
+   if (WARN_ON(!domain))
+   return;
+
+   iommu_bounce_unmap(dev, dev_addr, size, dir, attrs);
+   trace_bounce_unmap_single(dev, dev_addr, size);
+
+   iommu = domain_get_iommu(domain);
+   iova_pfn = IOVA_PFN(dev_addr);
+   nrpages = aligned_nrpages(0, size);
+
+   iommu_flush_iotlb_psi(iommu, domain,
+ mm_to_dma_pfn(iova_pfn), nrpages, 0, 0);
+   free_iova_fast(&domain->iovad, iova_pfn, dma_to_mm_pfn(nrpages));
+}
+
+static dma_addr_t
+bounce_map_page(struct device *dev, struct page *page, unsigned long offset,
+   size_t size, enum dma_data_direction dir, unsigned long attrs)
+{
+   return bounce_map_single(dev, page_to_phys(page) + offset,
+size, dir, attrs, *dev->dma_mask);
+}
+
+static dma_addr_t
+bounce_map_resource(struct device *dev, phys_addr_t phys_addr, size_t size,
+   enum dma_data_direction dir, unsigned long attrs)
+{
+   return bounce_map_single(dev, phys_addr, size,
+dir, attrs, *dev->dma_mask);
+}
+
+static void
+bounce_unmap_page(struct device *dev, dma_addr_t dev_addr, size_t size,
+ enum dma_data_direction dir, unsigned long attrs)
+{
+   bounce_unmap_single(dev, dev_addr, size, dir, attrs);
+}
+
+static void
+bounce_unmap_resource(struct device *dev, dma_addr_t dev_addr, size_t size,
+ enum dma_data_direction dir, unsigned long attrs)
+{
+   bounce_unmap_single(dev, dev_addr, size, dir, attrs);
+}
+
+static void
+bounce_unmap_sg(struct device *dev, struct scatterlist *sglist, int nelems,
+   enum dma_data_direction dir, unsigned long attrs)
+{
+   struct scatterlist *sg;
+   int i;
+
+   for_each_sg(sglist, sg, nelems, i)
+   bounce_unmap_page(dev, sg->dma_address,
+

[PATCH v5 00/10] iommu: Bounce page for untrusted devices

2019-07-24 Thread Lu Baolu
The Thunderbolt vulnerabilities are public and nowadays go by
the name Thunderclap [1] [3]. This patch series aims
to mitigate those concerns.

An external PCI device is a PCI peripheral device connected
to the system through an external bus, such as Thunderbolt.
What makes it different is that it can't be trusted to the
same degree as the devices built into the system. Generally,
a trusted PCIe device will DMA into the designated buffers
and not overrun or otherwise write outside the specified
bounds. But it's different for an external device.

The minimum IOMMU mapping granularity is one page (4k), so
for DMA transfers smaller than that a malicious PCIe device
can access the whole page of memory even if it does not
belong to the driver in question. This opens a possibility
for a DMA attack. For more information about DMA attacks
imposed by an untrusted PCI/PCIe device, please refer to [2].

This implements bounce buffers for untrusted external
devices. The transfers should be limited to isolated pages
so the IOMMU window does not cover memory outside of what
the driver expects. Previously (v3 and before), we proposed
an optimisation to only copy the head and tail of the buffer
if it spans multiple pages, and directly map the ones in the
middle. Figure 1 gives an overview of this solution.

swiotlb System
IOVA  bounce page   Memory
 .-.  .-..-.
 | |  | || |
 | |  | || |
buffer_start .-.  .-..-.
 | |->| |***>| |
 | |  | | swiotlb| |
 | |  | | mapping| |
 IOMMU Page  '-'  '-''-'
  Boundary   | | | |
 | | | |
 | | | |
 | |>| |
 | |IOMMU mapping| |
 | | | |
 IOMMU Page  .-. .-.
  Boundary   | | | |
 | | | |
 | |>| |
 | | IOMMU mapping   | |
 | | | |
 | | | |
 IOMMU Page  .-.  .-..-.
  Boundary   | |  | || |
 | |  | || |
 | |->| |***>| |
  buffer_end '-'  '-' swiotlb'-'
 | |  | | mapping| |
 | |  | || |
 '-'  '-''-'
  Figure 1: A big view of iommu bounce page 

As Robin Murphy pointed out, this ties us to using strict mode for
TLB maintenance, which may not be an overall win depending on the
balance between invalidation bandwidth vs. memcpy bandwidth. If we
use standard SWIOTLB logic to always copy the whole thing, we should
be able to release the bounce pages via the flush queue to allow
'safe' lazy unmaps. So since v4 we switched to the standard swiotlb
logic.

swiotlb System
IOVA  bounce page   Memory
buffer_start .-.  .-..-.
 | |  | || |
 | |  | || |
 | |  | |.-.physical
 | |->| | -->| |_start  
 | |iommu | | swiotlb| |
 | | map  | |   map  | |
 IOMMU Page  .-.  .-.'-'
  Boundary   | |  | || |
 | |  | || |
 | |->| || |
 | |iommu | || |
 | | map  | || |
 | |  | || |
 IOMMU Page  .-.  .-..-.
  Boundary   | |  | || |
 | |->| || |
 | |iommu | || |
 | | map  | || |
 | |  

[PATCH v5 01/10] iommu/vt-d: Don't switch off swiotlb if use direct dma

2019-07-24 Thread Lu Baolu
The direct dma implementation depends on swiotlb. Hence, don't
switch off swiotlb since direct dma interfaces are used in this
driver.

Cc: Ashok Raj 
Cc: Jacob Pan 
Cc: Kevin Tian 
Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel-iommu.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index bdaed2da8a55..8064af607d3b 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4568,9 +4568,6 @@ static int __init platform_optin_force_iommu(void)
iommu_identity_mapping |= IDENTMAP_ALL;
 
dmar_disabled = 0;
-#if defined(CONFIG_X86) && defined(CONFIG_SWIOTLB)
-   swiotlb = 0;
-#endif
no_iommu = 0;
 
return 1;
@@ -4709,9 +4706,6 @@ int __init intel_iommu_init(void)
}
up_write(&dmar_global_lock);
 
-#if defined(CONFIG_X86) && defined(CONFIG_SWIOTLB)
-   swiotlb = 0;
-#endif
dma_ops = &intel_dma_ops;
 
init_iommu_pm_ops();
-- 
2.17.1



[PATCH] media: staging: tegra-vde: Fix build error

2019-07-24 Thread YueHaibing
If IOMMU_SUPPORT is not set, and COMPILE_TEST is y,
IOMMU_IOVA may be set to m. So the build will fail:

drivers/staging/media/tegra-vde/iommu.o: In function `tegra_vde_iommu_map':
iommu.c:(.text+0x41): undefined reference to `alloc_iova'
iommu.c:(.text+0x56): undefined reference to `__free_iova'

Also select IOMMU_IOVA when COMPILE_TEST is set to fix this.

Reported-by: Hulk Robot 
Suggested-by: Dmitry Osipenko 
Fixes: b301f8de1925 ("media: staging: media: tegra-vde: Add IOMMU support")
Signed-off-by: YueHaibing 
---
 drivers/staging/media/tegra-vde/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/media/tegra-vde/Kconfig 
b/drivers/staging/media/tegra-vde/Kconfig
index 2e7f644..ba49ea5 100644
--- a/drivers/staging/media/tegra-vde/Kconfig
+++ b/drivers/staging/media/tegra-vde/Kconfig
@@ -3,7 +3,7 @@ config TEGRA_VDE
tristate "NVIDIA Tegra Video Decoder Engine driver"
depends on ARCH_TEGRA || COMPILE_TEST
select DMA_SHARED_BUFFER
-   select IOMMU_IOVA if IOMMU_SUPPORT
+   select IOMMU_IOVA if (IOMMU_SUPPORT || COMPILE_TEST)
select SRAM
help
Say Y here to enable support for the NVIDIA Tegra video decoder
-- 
2.7.4




Re: [PATCH v2 7/7] iommu/vt-d: Consolidate domain_init() to avoid duplication

2019-07-24 Thread Lu Baolu

Hi,

On 7/22/19 10:22 PM, Joerg Roedel wrote:

On Sat, Jul 20, 2019 at 09:15:58AM +0800, Lu Baolu wrote:

Okay. I agree with you. Let's revert this commit first.


Reverted the patch and queued it to my iommu/fixes branch.


Thank you! Joerg.

Best regards,
Baolu


Re: [PATCH 1/1] iommu/vt-d: Correctly check format of page table in debugfs

2019-07-24 Thread Lu Baolu

Hi Sai,

On 7/22/19 1:21 PM, Prakhya, Sai Praneeth wrote:

Hi Allen,


diff --git a/drivers/iommu/intel-iommu-debugfs.c b/drivers/iommu/intel-
iommu-debugfs.c
index 73a552914455..e31c3b416351 100644
--- a/drivers/iommu/intel-iommu-debugfs.c
+++ b/drivers/iommu/intel-iommu-debugfs.c
@@ -235,7 +235,7 @@ static void ctx_tbl_walk(struct seq_file *m, struct
intel_iommu *iommu, u16 bus)
tbl_wlk.ctx_entry = context;
m->private = &tbl_wlk;

-   if (pasid_supported(iommu) && is_pasid_enabled(context)) {
+   if (dmar_readq(iommu->reg + DMAR_RTADDR_REG) &
DMA_RTADDR_SMT) {


Thanks for adding this. I do believe this is a good addition, but I also
think that we might need "is_pasid_enabled()" as well. It checks whether the
PASIDE bit in the context entry is enabled or not.

I am thinking that even though the DMAR might be using the scalable root and
context table, the entry itself should have the PASIDE bit set. Did I miss
something here?


No matter whether the PASIDE bit is set or not, the IOMMU always uses the
scalable-mode page table if scalable mode is enabled. If PASIDE is set,
requests with PASID will be handled. Otherwise, requests with PASID will be
blocked (but requests without PASID will always be handled).

We are dumping the page table of the IOMMU, so we only care about which
page table format it is using. Do I understand it right?

Best regards,
Baolu



And I also think a macro would be better so that it could be reused elsewhere
(if need be).

Regards,
Sai
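
For illustration, the helper suggested above could be a thin macro over the
same register check used in the diff (a sketch; sm_root_table_in_use() is a
made-up name, while DMAR_RTADDR_REG and DMA_RTADDR_SMT are the existing
definitions):

/* Sketch: true if the root table is in scalable mode, i.e. the
 * scalable-mode page-table format is what debugfs should walk. */
#define sm_root_table_in_use(iommu) \
	(!!(dmar_readq((iommu)->reg + DMAR_RTADDR_REG) & DMA_RTADDR_SMT))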




Re: [PATCH 2/2] virtio/virtio_ring: Fix the dma_max_mapping_size call

2019-07-24 Thread Michael S. Tsirkin
On Tue, Jul 23, 2019 at 05:38:51PM +0200, Christoph Hellwig wrote:
> On Mon, Jul 22, 2019 at 04:36:09PM +0100, Robin Murphy wrote:
> >> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> >> index c8be1c4f5b55..37c143971211 100644
> >> --- a/drivers/virtio/virtio_ring.c
> >> +++ b/drivers/virtio/virtio_ring.c
> >> @@ -262,7 +262,7 @@ size_t virtio_max_dma_size(struct virtio_device *vdev)
> >>   {
> >>size_t max_segment_size = SIZE_MAX;
> >>   -if (vring_use_dma_api(vdev))
> >> +  if (vring_use_dma_api(vdev) && vdev->dev.dma_mask)
> >
> > Hmm, might it make sense to roll that check up into vring_use_dma_api() 
> > itself? After all, if the device has no mask then it's likely that other 
> > DMA API ops wouldn't really work as expected either.
> 
> Makes sense to me.

Christoph - would a documented API wrapping dma_mask make sense?
With the documentation explaining how users must
desist from using DMA APIs if that returns false ...


-- 
MST
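
As a rough illustration of the kind of wrapper being asked about (a sketch
only; dma_mask_available() is not an existing API and the exact semantics
would still need to be agreed on):

/*
 * Sketch: a documented predicate a transport could call before using the
 * DMA API.  If it returns false, the caller must not pass the device to
 * dma_map_*()/dma_alloc_*().
 */
static inline bool dma_mask_available(struct device *dev)
{
	return dev->dma_mask != NULL;
}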


Re: [PATCH v2 3/4] dt-bindings: iommu/arm,smmu: add compatible string for Marvell

2019-07-24 Thread Rob Herring
On Thu, 11 Jul 2019 17:02:41 +0200, Gregory CLEMENT wrote:
> From: Hanna Hawa 
> 
> Add a specific compatible string for Marvell usage due to an erratum
> when accessing 64-bit registers of the ARM SMMU in AP806.
> 
> The AP806 SoC uses the generic ARM MMU-500 and there is no Marvell-specific
> implementation; this compatible is used for the erratum only.
> 
> Signed-off-by: Hanna Hawa 
> Signed-off-by: Gregory CLEMENT 
> ---
>  Documentation/devicetree/bindings/iommu/arm,smmu.txt | 1 +
>  1 file changed, 1 insertion(+)
> 

Reviewed-by: Rob Herring 


Re: [PATCH] dma-direct: Force unencrypted DMA under SME for certain DMA masks

2019-07-24 Thread Lendacky, Thomas
On 7/24/19 1:40 PM, Kirill A. Shutemov wrote:
> On Wed, Jul 24, 2019 at 06:30:21PM +, Lendacky, Thomas wrote:
>> On 7/24/19 1:11 PM, Kirill A. Shutemov wrote:
>>> On Wed, Jul 24, 2019 at 05:34:26PM +, Lendacky, Thomas wrote:
 On 7/24/19 12:06 PM, Robin Murphy wrote:
> On 24/07/2019 17:42, Lendacky, Thomas wrote:
>> On 7/24/19 10:55 AM, Kirill A. Shutemov wrote:
>>> On Wed, Jul 10, 2019 at 07:01:19PM +, Lendacky, Thomas wrote:
 @@ -351,6 +355,32 @@ bool sev_active(void)
   }
   EXPORT_SYMBOL(sev_active);
   +/* Override for DMA direct allocation check -
 ARCH_HAS_FORCE_DMA_UNENCRYPTED */
 +bool force_dma_unencrypted(struct device *dev)
 +{
 +    /*
 + * For SEV, all DMA must be to unencrypted addresses.
 + */
 +    if (sev_active())
 +    return true;
 +
 +    /*
 + * For SME, all DMA must be to unencrypted addresses if the
 + * device does not support DMA to addresses that include the
 + * encryption mask.
 + */
 +    if (sme_active()) {
 +    u64 dma_enc_mask = DMA_BIT_MASK(__ffs64(sme_me_mask));
 +    u64 dma_dev_mask = min_not_zero(dev->coherent_dma_mask,
 +    dev->bus_dma_mask);
 +
 +    if (dma_dev_mask <= dma_enc_mask)
 +    return true;
>>>
>>> Hm. What is wrong with the dev mask being equal to enc mask? IIUC, it
>>> means that device mask is wide enough to cover encryption bit, doesn't 
>>> it?
>>
>> Not really...  it's the way DMA_BIT_MASK works vs bit numbering. Let's 
>> say
>> that sme_me_mask has bit 47 set. __ffs64 returns 47 and DMA_BIT_MASK(47)
>> will generate a mask without bit 47 set (0x7fff). So the check
>> will catch anything that does not support at least 48-bit DMA.
>
> Couldn't that be expressed as just:
>
> if (sme_me_mask & dma_dev_mask == sme_me_mask)

 Actually !=, but yes, it could have been done like that, I just didn't
 think of it.
>>>
>>> I'm looking into generalizing the check to cover MKTME.
>>>
>>> Leaving off the Kconfig changes and moving the check to other file, 
>>> doest
>>> the change below look reasonable to you. It's only build tested so far.
>>>
>>> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
>>> index fece30ca8b0c..6c86adcd02da 100644
>>> --- a/arch/x86/mm/mem_encrypt.c
>>> +++ b/arch/x86/mm/mem_encrypt.c
>>> @@ -355,6 +355,8 @@ EXPORT_SYMBOL(sev_active);
>>>  /* Override for DMA direct allocation check - 
>>> ARCH_HAS_FORCE_DMA_UNENCRYPTED */
>>>  bool force_dma_unencrypted(struct device *dev)
>>>  {
>>> +   u64 dma_enc_mask;
>>> +
>>> /*
>>>  * For SEV, all DMA must be to unencrypted addresses.
>>>  */
>>> @@ -362,18 +364,20 @@ bool force_dma_unencrypted(struct device *dev)
>>> return true;
>>>  
>>> /*
>>> -* For SME, all DMA must be to unencrypted addresses if the
>>> -* device does not support DMA to addresses that include the
>>> -* encryption mask.
>>> +* For SME and MKTME, all DMA must be to unencrypted addresses if the
>>> +* device does not support DMA to addresses that include the encryption
>>> +* mask.
>>>  */
>>> -   if (sme_active()) {
>>> -   u64 dma_enc_mask = DMA_BIT_MASK(__ffs64(sme_me_mask));
>>> -   u64 dma_dev_mask = min_not_zero(dev->coherent_dma_mask,
>>> -   dev->bus_dma_mask);
>>> +   if (!sme_active() && !mktme_enabled())
>>> +   return false;
>>>  
>>> -   if (dma_dev_mask <= dma_enc_mask)
>>> -   return true;
>>> -   }
>>> +   dma_enc_mask = sme_me_mask | mktme_keyid_mask();
>>> +
>>> +   if (dev->coherent_dma_mask && (dev->coherent_dma_mask & dma_enc_mask) 
>>> != dma_enc_mask)
>>> +   return true;
>>> +
>>> +   if (dev->bus_dma_mask && (dev->bus_dma_mask & dma_enc_mask) != 
>>> dma_enc_mask)
>>> +   return true;
>>
>> Do you want to err on the side of caution and return true if both masks
>> are zero? You could do the min_not_zero step and then return true if the
>> result is zero. Then just make the one comparison against dma_enc_mask.
> 
> Something like this?

Yup, looks good.

Thanks,
Tom

> 
> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index fece30ca8b0c..173d68b08c55 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -355,6 +355,8 @@ EXPORT_SYMBOL(sev_active);
>  /* Override for DMA direct allocation check - ARCH_HAS_FORCE_DMA_UNENCRYPTED 
> */
>  bool force_dma_unencrypted(struct device *dev)
>  {
> + u64 dma_enc_mask, dma_dev_mask;
> +
>   /*
>* For SEV, all DMA must be to unencrypted addresses.
>*/
> @@ -362,20 +364,17 @@ bool force_dma_unencrypted(struct device *dev)
>   return

Re: [PATCH] dma-direct: Force unencrypted DMA under SME for certain DMA masks

2019-07-24 Thread Kirill A. Shutemov
On Wed, Jul 24, 2019 at 06:30:21PM +, Lendacky, Thomas wrote:
> On 7/24/19 1:11 PM, Kirill A. Shutemov wrote:
> > On Wed, Jul 24, 2019 at 05:34:26PM +, Lendacky, Thomas wrote:
> >> On 7/24/19 12:06 PM, Robin Murphy wrote:
> >>> On 24/07/2019 17:42, Lendacky, Thomas wrote:
>  On 7/24/19 10:55 AM, Kirill A. Shutemov wrote:
> > On Wed, Jul 10, 2019 at 07:01:19PM +, Lendacky, Thomas wrote:
> >> @@ -351,6 +355,32 @@ bool sev_active(void)
> >>   }
> >>   EXPORT_SYMBOL(sev_active);
> >>   +/* Override for DMA direct allocation check -
> >> ARCH_HAS_FORCE_DMA_UNENCRYPTED */
> >> +bool force_dma_unencrypted(struct device *dev)
> >> +{
> >> +    /*
> >> + * For SEV, all DMA must be to unencrypted addresses.
> >> + */
> >> +    if (sev_active())
> >> +    return true;
> >> +
> >> +    /*
> >> + * For SME, all DMA must be to unencrypted addresses if the
> >> + * device does not support DMA to addresses that include the
> >> + * encryption mask.
> >> + */
> >> +    if (sme_active()) {
> >> +    u64 dma_enc_mask = DMA_BIT_MASK(__ffs64(sme_me_mask));
> >> +    u64 dma_dev_mask = min_not_zero(dev->coherent_dma_mask,
> >> +    dev->bus_dma_mask);
> >> +
> >> +    if (dma_dev_mask <= dma_enc_mask)
> >> +    return true;
> >
> > Hm. What is wrong with the dev mask being equal to enc mask? IIUC, it
> > means that device mask is wide enough to cover encryption bit, doesn't 
> > it?
> 
>  Not really...  it's the way DMA_BIT_MASK works vs bit numbering. Let's 
>  say
>  that sme_me_mask has bit 47 set. __ffs64 returns 47 and DMA_BIT_MASK(47)
>  will generate a mask without bit 47 set (0x7fff). So the check
>  will catch anything that does not support at least 48-bit DMA.
> >>>
> >>> Couldn't that be expressed as just:
> >>>
> >>> if (sme_me_mask & dma_dev_mask == sme_me_mask)
> >>
> >> Actually !=, but yes, it could have been done like that, I just didn't
> >> think of it.
> > 
> > I'm looking into generalizing the check to cover MKTME.
> > 
> > Leaving off the Kconfig changes and moving the check to another file,
> > does the change below look reasonable to you? It's only build tested so far.
> > 
> > diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> > index fece30ca8b0c..6c86adcd02da 100644
> > --- a/arch/x86/mm/mem_encrypt.c
> > +++ b/arch/x86/mm/mem_encrypt.c
> > @@ -355,6 +355,8 @@ EXPORT_SYMBOL(sev_active);
> >  /* Override for DMA direct allocation check - 
> > ARCH_HAS_FORCE_DMA_UNENCRYPTED */
> >  bool force_dma_unencrypted(struct device *dev)
> >  {
> > +   u64 dma_enc_mask;
> > +
> > /*
> >  * For SEV, all DMA must be to unencrypted addresses.
> >  */
> > @@ -362,18 +364,20 @@ bool force_dma_unencrypted(struct device *dev)
> > return true;
> >  
> > /*
> > -* For SME, all DMA must be to unencrypted addresses if the
> > -* device does not support DMA to addresses that include the
> > -* encryption mask.
> > +* For SME and MKTME, all DMA must be to unencrypted addresses if the
> > +* device does not support DMA to addresses that include the encryption
> > +* mask.
> >  */
> > -   if (sme_active()) {
> > -   u64 dma_enc_mask = DMA_BIT_MASK(__ffs64(sme_me_mask));
> > -   u64 dma_dev_mask = min_not_zero(dev->coherent_dma_mask,
> > -   dev->bus_dma_mask);
> > +   if (!sme_active() && !mktme_enabled())
> > +   return false;
> >  
> > -   if (dma_dev_mask <= dma_enc_mask)
> > -   return true;
> > -   }
> > +   dma_enc_mask = sme_me_mask | mktme_keyid_mask();
> > +
> > +   if (dev->coherent_dma_mask && (dev->coherent_dma_mask & dma_enc_mask) 
> > != dma_enc_mask)
> > +   return true;
> > +
> > +   if (dev->bus_dma_mask && (dev->bus_dma_mask & dma_enc_mask) != 
> > dma_enc_mask)
> > +   return true;
> 
> Do you want to err on the side of caution and return true if both masks
> are zero? You could do the min_not_zero step and then return true if the
> result is zero. Then just make the one comparison against dma_enc_mask.

Something like this?

diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index fece30ca8b0c..173d68b08c55 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -355,6 +355,8 @@ EXPORT_SYMBOL(sev_active);
 /* Override for DMA direct allocation check - ARCH_HAS_FORCE_DMA_UNENCRYPTED */
 bool force_dma_unencrypted(struct device *dev)
 {
+   u64 dma_enc_mask, dma_dev_mask;
+
/*
 * For SEV, all DMA must be to unencrypted addresses.
 */
@@ -362,20 +364,17 @@ bool force_dma_unencrypted(struct device *dev)
return true;
 
/*
-* For SME, all DMA must be to unencrypted addresses if the
-* device does

Re: [PATCH] dma-direct: Force unencrypted DMA under SME for certain DMA masks

2019-07-24 Thread Lendacky, Thomas
On 7/24/19 1:11 PM, Kirill A. Shutemov wrote:
> On Wed, Jul 24, 2019 at 05:34:26PM +, Lendacky, Thomas wrote:
>> On 7/24/19 12:06 PM, Robin Murphy wrote:
>>> On 24/07/2019 17:42, Lendacky, Thomas wrote:
 On 7/24/19 10:55 AM, Kirill A. Shutemov wrote:
> On Wed, Jul 10, 2019 at 07:01:19PM +, Lendacky, Thomas wrote:
>> @@ -351,6 +355,32 @@ bool sev_active(void)
>>   }
>>   EXPORT_SYMBOL(sev_active);
>>   +/* Override for DMA direct allocation check -
>> ARCH_HAS_FORCE_DMA_UNENCRYPTED */
>> +bool force_dma_unencrypted(struct device *dev)
>> +{
>> +    /*
>> + * For SEV, all DMA must be to unencrypted addresses.
>> + */
>> +    if (sev_active())
>> +    return true;
>> +
>> +    /*
>> + * For SME, all DMA must be to unencrypted addresses if the
>> + * device does not support DMA to addresses that include the
>> + * encryption mask.
>> + */
>> +    if (sme_active()) {
>> +    u64 dma_enc_mask = DMA_BIT_MASK(__ffs64(sme_me_mask));
>> +    u64 dma_dev_mask = min_not_zero(dev->coherent_dma_mask,
>> +    dev->bus_dma_mask);
>> +
>> +    if (dma_dev_mask <= dma_enc_mask)
>> +    return true;
>
> Hm. What is wrong with the dev mask being equal to enc mask? IIUC, it
> means that device mask is wide enough to cover encryption bit, doesn't it?

 Not really...  it's the way DMA_BIT_MASK works vs bit numbering. Let's say
 that sme_me_mask has bit 47 set. __ffs64 returns 47 and DMA_BIT_MASK(47)
 will generate a mask without bit 47 set (0x7fff). So the check
 will catch anything that does not support at least 48-bit DMA.
>>>
>>> Couldn't that be expressed as just:
>>>
>>> if (sme_me_mask & dma_dev_mask == sme_me_mask)
>>
>> Actually !=, but yes, it could have been done like that, I just didn't
>> think of it.
> 
> I'm looking into generalizing the check to cover MKTME.
> 
> Leaving off the Kconfig changes and moving the check to another file,
> does the change below look reasonable to you? It's only build tested so far.
> 
> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index fece30ca8b0c..6c86adcd02da 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -355,6 +355,8 @@ EXPORT_SYMBOL(sev_active);
>  /* Override for DMA direct allocation check - ARCH_HAS_FORCE_DMA_UNENCRYPTED 
> */
>  bool force_dma_unencrypted(struct device *dev)
>  {
> + u64 dma_enc_mask;
> +
>   /*
>* For SEV, all DMA must be to unencrypted addresses.
>*/
> @@ -362,18 +364,20 @@ bool force_dma_unencrypted(struct device *dev)
>   return true;
>  
>   /*
> -  * For SME, all DMA must be to unencrypted addresses if the
> -  * device does not support DMA to addresses that include the
> -  * encryption mask.
> +  * For SME and MKTME, all DMA must be to unencrypted addresses if the
> +  * device does not support DMA to addresses that include the encryption
> +  * mask.
>*/
> - if (sme_active()) {
> - u64 dma_enc_mask = DMA_BIT_MASK(__ffs64(sme_me_mask));
> - u64 dma_dev_mask = min_not_zero(dev->coherent_dma_mask,
> - dev->bus_dma_mask);
> + if (!sme_active() && !mktme_enabled())
> + return false;
>  
> - if (dma_dev_mask <= dma_enc_mask)
> - return true;
> - }
> + dma_enc_mask = sme_me_mask | mktme_keyid_mask();
> +
> + if (dev->coherent_dma_mask && (dev->coherent_dma_mask & dma_enc_mask) 
> != dma_enc_mask)
> + return true;
> +
> + if (dev->bus_dma_mask && (dev->bus_dma_mask & dma_enc_mask) != 
> dma_enc_mask)
> + return true;

Do you want to err on the side of caution and return true if both masks
are zero? You could do the min_not_zero step and then return true if the
result is zero. Then just make the one comparison against dma_enc_mask.

Thanks,
Tom

>  
>   return false;
>  }
> 
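
For reference, the combined check being suggested here could end up looking
roughly like the sketch below (an illustration of the idea, not the final
patch):

/* Sketch of the suggestion above. */
static bool force_dma_unencrypted_sketch(struct device *dev)
{
	u64 enc_mask = sme_me_mask;	/* | mktme_keyid_mask() once generalised */
	u64 dev_mask = min_not_zero(dev->coherent_dma_mask, dev->bus_dma_mask);

	if (sev_active())
		return true;
	if (!sme_active())
		return false;
	if (!dev_mask)		/* no usable mask: err on the side of caution */
		return true;

	return (dev_mask & enc_mask) != enc_mask;
}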

Re: [PATCH] dma-direct: Force unencrypted DMA under SME for certain DMA masks

2019-07-24 Thread Kirill A. Shutemov
On Wed, Jul 24, 2019 at 05:34:26PM +, Lendacky, Thomas wrote:
> On 7/24/19 12:06 PM, Robin Murphy wrote:
> > On 24/07/2019 17:42, Lendacky, Thomas wrote:
> >> On 7/24/19 10:55 AM, Kirill A. Shutemov wrote:
> >>> On Wed, Jul 10, 2019 at 07:01:19PM +, Lendacky, Thomas wrote:
>  @@ -351,6 +355,32 @@ bool sev_active(void)
>    }
>    EXPORT_SYMBOL(sev_active);
>    +/* Override for DMA direct allocation check -
>  ARCH_HAS_FORCE_DMA_UNENCRYPTED */
>  +bool force_dma_unencrypted(struct device *dev)
>  +{
>  +    /*
>  + * For SEV, all DMA must be to unencrypted addresses.
>  + */
>  +    if (sev_active())
>  +    return true;
>  +
>  +    /*
>  + * For SME, all DMA must be to unencrypted addresses if the
>  + * device does not support DMA to addresses that include the
>  + * encryption mask.
>  + */
>  +    if (sme_active()) {
>  +    u64 dma_enc_mask = DMA_BIT_MASK(__ffs64(sme_me_mask));
>  +    u64 dma_dev_mask = min_not_zero(dev->coherent_dma_mask,
>  +    dev->bus_dma_mask);
>  +
>  +    if (dma_dev_mask <= dma_enc_mask)
>  +    return true;
> >>>
> >>> Hm. What is wrong with the dev mask being equal to enc mask? IIUC, it
> >>> means that device mask is wide enough to cover encryption bit, doesn't it?
> >>
> >> Not really...  it's the way DMA_BIT_MASK works vs bit numbering. Let's say
> >> that sme_me_mask has bit 47 set. __ffs64 returns 47 and DMA_BIT_MASK(47)
> >> will generate a mask without bit 47 set (0x7fff). So the check
> >> will catch anything that does not support at least 48-bit DMA.
> > 
> > Couldn't that be expressed as just:
> > 
> > if (sme_me_mask & dma_dev_mask == sme_me_mask)
> 
> Actually !=, but yes, it could have been done like that, I just didn't
> think of it.

I'm looking into generalizing the check to cover MKTME.

Leaving off the Kconfig changes and moving the check to another file, does
the change below look reasonable to you? It's only build tested so far.

diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index fece30ca8b0c..6c86adcd02da 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -355,6 +355,8 @@ EXPORT_SYMBOL(sev_active);
 /* Override for DMA direct allocation check - ARCH_HAS_FORCE_DMA_UNENCRYPTED */
 bool force_dma_unencrypted(struct device *dev)
 {
+   u64 dma_enc_mask;
+
/*
 * For SEV, all DMA must be to unencrypted addresses.
 */
@@ -362,18 +364,20 @@ bool force_dma_unencrypted(struct device *dev)
return true;
 
/*
-* For SME, all DMA must be to unencrypted addresses if the
-* device does not support DMA to addresses that include the
-* encryption mask.
+* For SME and MKTME, all DMA must be to unencrypted addresses if the
+* device does not support DMA to addresses that include the encryption
+* mask.
 */
-   if (sme_active()) {
-   u64 dma_enc_mask = DMA_BIT_MASK(__ffs64(sme_me_mask));
-   u64 dma_dev_mask = min_not_zero(dev->coherent_dma_mask,
-   dev->bus_dma_mask);
+   if (!sme_active() && !mktme_enabled())
+   return false;
 
-   if (dma_dev_mask <= dma_enc_mask)
-   return true;
-   }
+   dma_enc_mask = sme_me_mask | mktme_keyid_mask();
+
+   if (dev->coherent_dma_mask && (dev->coherent_dma_mask & dma_enc_mask) 
!= dma_enc_mask)
+   return true;
+
+   if (dev->bus_dma_mask && (dev->bus_dma_mask & dma_enc_mask) != 
dma_enc_mask)
+   return true;
 
return false;
 }
-- 
 Kirill A. Shutemov


Re: [PATCH 2/2] arm: use swiotlb for bounce buffer on LPAE configs

2019-07-24 Thread Christoph Hellwig
On Wed, Jul 24, 2019 at 07:23:50PM +0200, Nicolas Saenz Julienne wrote:
> Out of curiosity, what is the reason stopping us from using dma-direct/swiotlb
> instead of arm_dma_ops altogether?

Nothing fundamental.  We just need to do a very careful piecemeal
migration as the arm code handles a lot of interesting corner cases and
we need to ensure we don't break that.  I have various WIP patches
for the easier bits and we can work from there.


Re: [PATCH] dma-direct: Force unencrypted DMA under SME for certain DMA masks

2019-07-24 Thread Lendacky, Thomas
On 7/24/19 12:06 PM, Robin Murphy wrote:
> On 24/07/2019 17:42, Lendacky, Thomas wrote:
>> On 7/24/19 10:55 AM, Kirill A. Shutemov wrote:
>>> On Wed, Jul 10, 2019 at 07:01:19PM +, Lendacky, Thomas wrote:
 @@ -351,6 +355,32 @@ bool sev_active(void)
   }
   EXPORT_SYMBOL(sev_active);
   +/* Override for DMA direct allocation check -
 ARCH_HAS_FORCE_DMA_UNENCRYPTED */
 +bool force_dma_unencrypted(struct device *dev)
 +{
 +    /*
 + * For SEV, all DMA must be to unencrypted addresses.
 + */
 +    if (sev_active())
 +    return true;
 +
 +    /*
 + * For SME, all DMA must be to unencrypted addresses if the
 + * device does not support DMA to addresses that include the
 + * encryption mask.
 + */
 +    if (sme_active()) {
 +    u64 dma_enc_mask = DMA_BIT_MASK(__ffs64(sme_me_mask));
 +    u64 dma_dev_mask = min_not_zero(dev->coherent_dma_mask,
 +    dev->bus_dma_mask);
 +
 +    if (dma_dev_mask <= dma_enc_mask)
 +    return true;
>>>
>>> Hm. What is wrong with the dev mask being equal to enc mask? IIUC, it
>>> means that device mask is wide enough to cover encryption bit, doesn't it?
>>
>> Not really...  it's the way DMA_BIT_MASK works vs bit numbering. Let's say
>> that sme_me_mask has bit 47 set. __ffs64 returns 47 and DMA_BIT_MASK(47)
>> will generate a mask without bit 47 set (0x7fff). So the check
>> will catch anything that does not support at least 48-bit DMA.
> 
> Couldn't that be expressed as just:
> 
> if (sme_me_mask & dma_dev_mask == sme_me_mask)

Actually !=, but yes, it could have been done like that, I just didn't
think of it.

Thanks,
Tom

> 
> ?
> 
> Robin.

Re: [PATCH 2/2] arm: use swiotlb for bounce buffer on LPAE configs

2019-07-24 Thread Nicolas Saenz Julienne
On Tue, 2019-07-09 at 07:20 -0700, Christoph Hellwig wrote:
> The DMA API requires that 32-bit DMA masks are always supported, but on
> arm LPAE configs they do not currently work when memory is present
> above 4GB.  Wire up the swiotlb code like for all other architectures
> to provide the bounce buffering in that case.
> 
> Fixes: 21e07dba9fb11 ("scsi: reduce use of block bounce buffers").
> Reported-by: Roger Quadros 
> Signed-off-by: Christoph Hellwig 
> ---

Hi Chistoph,
Out of curiosity, what is the reason stopping us from using dma-direct/swiotlb
instead of arm_dma_ops altogether?

Regards,
Nicolas




Re: [GIT PULL] dma-mapping fix for 5.3

2019-07-24 Thread pr-tracker-bot
The pull request you sent on Wed, 24 Jul 2019 16:49:42 +0200:

> git://git.infradead.org/users/hch/dma-mapping.git tags/dma-mapping-5.3-2

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/c2626876c24fe1f326381e3f1d48301bfc627d8e

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


Re: [PATCH] dma-direct: Force unencrypted DMA under SME for certain DMA masks

2019-07-24 Thread Robin Murphy

On 24/07/2019 17:42, Lendacky, Thomas wrote:

On 7/24/19 10:55 AM, Kirill A. Shutemov wrote:

On Wed, Jul 10, 2019 at 07:01:19PM +, Lendacky, Thomas wrote:

@@ -351,6 +355,32 @@ bool sev_active(void)
  }
  EXPORT_SYMBOL(sev_active);
  
+/* Override for DMA direct allocation check - ARCH_HAS_FORCE_DMA_UNENCRYPTED */

+bool force_dma_unencrypted(struct device *dev)
+{
+   /*
+* For SEV, all DMA must be to unencrypted addresses.
+*/
+   if (sev_active())
+   return true;
+
+   /*
+* For SME, all DMA must be to unencrypted addresses if the
+* device does not support DMA to addresses that include the
+* encryption mask.
+*/
+   if (sme_active()) {
+   u64 dma_enc_mask = DMA_BIT_MASK(__ffs64(sme_me_mask));
+   u64 dma_dev_mask = min_not_zero(dev->coherent_dma_mask,
+   dev->bus_dma_mask);
+
+   if (dma_dev_mask <= dma_enc_mask)
+   return true;


Hm. What is wrong with the dev mask being equal to enc mask? IIUC, it
means that device mask is wide enough to cover encryption bit, doesn't it?


Not really...  it's the way DMA_BIT_MASK works vs bit numbering. Let's say
that sme_me_mask has bit 47 set. __ffs64 returns 47 and DMA_BIT_MASK(47)
will generate a mask without bit 47 set (0x7fff). So the check
will catch anything that does not support at least 48-bit DMA.


Couldn't that be expressed as just:

if (sme_me_mask & dma_dev_mask == sme_me_mask)

?

Robin.
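
To make the two formulations above concrete, a self-contained worked example
(the C-bit position and values are illustrative, not taken from real
hardware):

/*
 * Worked example: assume the C-bit is bit 47, so
 * sme_me_mask == 1ULL << 47 == 0x0000800000000000.
 */
static void sme_mask_example(struct device *dev)
{
	u64 sme_me_mask  = 1ULL << 47;
	u64 dma_enc_mask = DMA_BIT_MASK(__ffs64(sme_me_mask));	/* 0x00007fffffffffff */
	u64 dma_dev_mask = min_not_zero(dev->coherent_dma_mask, dev->bus_dma_mask);

	/* Form used in the patch: 47-bit-or-smaller device masks force bouncing. */
	bool forced_a = dma_dev_mask <= dma_enc_mask;

	/*
	 * Equivalent direct comparison for the usual 2^n - 1 device masks;
	 * the parentheses are needed since ==/!= bind tighter than & in C.
	 */
	bool forced_b = (dma_dev_mask & sme_me_mask) != sme_me_mask;

	WARN_ON(forced_a != forced_b);	/* e.g. a 48-bit mask: both false */
}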


Re: [PATCH] dma-direct: Force unencrypted DMA under SME for certain DMA masks

2019-07-24 Thread Lendacky, Thomas
On 7/24/19 10:55 AM, Kirill A. Shutemov wrote:
> On Wed, Jul 10, 2019 at 07:01:19PM +, Lendacky, Thomas wrote:
>> @@ -351,6 +355,32 @@ bool sev_active(void)
>>  }
>>  EXPORT_SYMBOL(sev_active);
>>  
>> +/* Override for DMA direct allocation check - 
>> ARCH_HAS_FORCE_DMA_UNENCRYPTED */
>> +bool force_dma_unencrypted(struct device *dev)
>> +{
>> +/*
>> + * For SEV, all DMA must be to unencrypted addresses.
>> + */
>> +if (sev_active())
>> +return true;
>> +
>> +/*
>> + * For SME, all DMA must be to unencrypted addresses if the
>> + * device does not support DMA to addresses that include the
>> + * encryption mask.
>> + */
>> +if (sme_active()) {
>> +u64 dma_enc_mask = DMA_BIT_MASK(__ffs64(sme_me_mask));
>> +u64 dma_dev_mask = min_not_zero(dev->coherent_dma_mask,
>> +dev->bus_dma_mask);
>> +
>> +if (dma_dev_mask <= dma_enc_mask)
>> +return true;
> 
> Hm. What is wrong with the dev mask being equal to enc mask? IIUC, it
> means that device mask is wide enough to cover encryption bit, doesn't it?

Not really...  it's the way DMA_BIT_MASK works vs bit numbering. Let's say
that sme_me_mask has bit 47 set. __ffs64 returns 47 and DMA_BIT_MASK(47)
will generate a mask without bit 47 set (0x7fff). So the check
will catch anything that does not support at least 48-bit DMA.

Thanks,
Tom

> 
>> +}
>> +
>> +return false;
>> +}
> 


Re: [PATCH v2] iommu: add support for drivers that manage iommu explicitly

2019-07-24 Thread Rob Clark
On Wed, Jul 24, 2019 at 3:51 AM Will Deacon  wrote:
>
> On Tue, Jul 23, 2019 at 10:40:55AM -0700, Rob Clark wrote:
> > On Tue, Jul 23, 2019 at 8:38 AM Will Deacon  wrote:
> > >
> > > On Mon, Jul 22, 2019 at 09:23:48AM -0700, Rob Clark wrote:
> > > > On Mon, Jul 22, 2019 at 8:48 AM Joerg Roedel  wrote:
> > > > >
> > > > > On Mon, Jul 22, 2019 at 08:41:34AM -0700, Rob Clark wrote:
> > > > > > It is set by the driver:
> > > > > >
> > > > > > https://patchwork.freedesktop.org/patch/315291/
> > > > > >
> > > > > > (This doesn't really belong in devicetree, since it isn't a
> > > > > > description of the hardware, so the driver is really the only place 
> > > > > > to
> > > > > > set this.. which is fine because it is about a detail of how the
> > > > > > driver works.)
> > > > >
> > > > > It is more a detail about how the firmware works. IIUC the problem is
> > > > > that the firmware initializes the context mappings for the GPU and the
> > > > > OS doesn't know anything about that and just overwrites them, causing
> > > > > the firmware GPU driver to fail badly.
> > > > >
> > > > > So I think it is the task of the firmware to tell the OS not to touch
> > > > > the device's mappings until the OS device driver takes over. On x86 
> > > > > there
> > > > > is something similar with the RMRR/unity-map tables from the firmware.
> > > > >
> > > >
> > > > Bjorn had a patchset[1] to inherit the config from firmware/bootloader
> > > > when arm-smmu is probed which handles that part of the problem.  My
> > > > patch is intended to be used on top of his patchset.  This seems to me
> > > > like the best solution, if we don't have control over the firmware.
> > >
> > > Hmm, but the feedback from Robin on the thread you cite was that this 
> > > should
> > > be generalised to look more like RMRR, so there seems to be a clear 
> > > message
> > > here.
> > >
> >
> > Perhaps it is a lack of creativity, or lack of familiarity w/ iommu vs
> > virtualization, but I'm not quite seeing how RMRR would help.. in
> > particular when dealing with both DT and ACPI cases.
>
> Well, I suppose we'd have something for DT and something for ACPI and we'd
> try to make them look similar enough that we don't need lots of divergent
> code in the kernel. The RMRR-like description would describe that, for a
> particular device, a specific virt:phys mapping needs to exist in the
> small window between initialising the SMMU and re-initialising the device
> (GPU).

For both DT and ACPI (or perhaps more accurately UEFI and non-UEFI) we
often don't have much/any control of the firmware.  In the UEFI case,
we can at least get the physical address of the scanout buffer from
EFI GOP (since VA=PA at that point).  In either case we could get the
iova by reading back controller state.  We kinda just need the iommu
driver to not change anything about the context bank the display is
using until the display driver is ready to install its own
pagetables.

Initially I just want to shut down display, and then bring it back up
w/ my own pagetables.. but eventually, once iommu/clk/genpd issues are
sorted upstream, I'm planning to read out more completely the display
state, and remap the existing scanout buffer at the same iova in my own
pagetables/iommu_domain, for a flicker-free display handover... that
is a lot more work but at least it is self contained in the display
(and bridge/panel) drivers.

> I would prefer this to be framebuffer-specific, since we're already in
> flagrant violation of the arm64 boot requirements wrt ongoing DMA and making
> this too general could lead to all sorts of brain damage. That would
> probably also allow us to limit the flexibility, by mandating things like
> alignment and memory attributes.

I'd be pretty happy if we could convince qcom to use EFI GOP on
android devices too..

Although there is a lot more activity these days with people bringing
upstream support to various existing android phones/tablets.. and I'd
like to see that continue without downstream hacks due to lack of
control over firmware.

> Having said that, I just realised I'm still a bit confused about the
> problem: why does the bootloader require SMMU translation for the GPU at
> all? If we could limit this whole thing to be identity-mapped/bypassed,
> then all we need is a per-device flag in the firmware to indicate that the
> SMMU should be initialised to allow passthrough for transactions originating
> from that device.

I was chatting last night w/ Bjorn on IRC.. and he mentioned that it
looked like TTBRn was 0x0.  Which I wasn't expecting, and I didn't
realize was a legit thing.  Maybe the purpose is to allow display to
access memory w/ iova==pa but disallow memory access from other
devices using different context banks of the same iommu?  Maybe this
makes more sense to you?

> > So I kinda prefer, when possible, if arm-smmu can figure out what is going
> > on by looking at the hw state at boot (since that approach would work
> > equally well for DT and ACPI).

Re: [PATCH] dma-direct: Force unencrypted DMA under SME for certain DMA masks

2019-07-24 Thread Kirill A. Shutemov
On Wed, Jul 10, 2019 at 07:01:19PM +, Lendacky, Thomas wrote:
> @@ -351,6 +355,32 @@ bool sev_active(void)
>  }
>  EXPORT_SYMBOL(sev_active);
>  
> +/* Override for DMA direct allocation check - ARCH_HAS_FORCE_DMA_UNENCRYPTED 
> */
> +bool force_dma_unencrypted(struct device *dev)
> +{
> + /*
> +  * For SEV, all DMA must be to unencrypted addresses.
> +  */
> + if (sev_active())
> + return true;
> +
> + /*
> +  * For SME, all DMA must be to unencrypted addresses if the
> +  * device does not support DMA to addresses that include the
> +  * encryption mask.
> +  */
> + if (sme_active()) {
> + u64 dma_enc_mask = DMA_BIT_MASK(__ffs64(sme_me_mask));
> + u64 dma_dev_mask = min_not_zero(dev->coherent_dma_mask,
> + dev->bus_dma_mask);
> +
> + if (dma_dev_mask <= dma_enc_mask)
> + return true;

Hm. What is wrong with the dev mask being equal to enc mask? IIUC, it
means that device mask is wide enough to cover encryption bit, doesn't it?

> + }
> +
> + return false;
> +}

-- 
 Kirill A. Shutemov


Re: add swiotlb support to arm32

2019-07-24 Thread Christoph Hellwig
FYI, I've tentatively pulled the branch into the dma-mapping for-next
branch so that we get some linux-next exposure.


[GIT PULL] dma-mapping fix for 5.3

2019-07-24 Thread Christoph Hellwig
The following changes since commit 7b5cf701ea9c395c792e2a7e3b7caf4c68b87721:

  Merge branch 'sched-urgent-for-linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip (2019-07-22 09:30:34 
-0700)

are available in the Git repository at:

  git://git.infradead.org/users/hch/dma-mapping.git tags/dma-mapping-5.3-2

for you to fetch changes up to 06532750010e06dd4b6d69983773677df7fc5291:

  dma-mapping: use dma_get_mask in dma_addressing_limited (2019-07-23 17:43:58 
+0200)


dma-mapping regression fix for 5.3

 - ensure that dma_addressing_limited doesn't crash on devices
   without a dma mask (Eric Auger)


Eric Auger (1):
  dma-mapping: use dma_get_mask in dma_addressing_limited

 include/linux/dma-mapping.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)


Re: [RFC PATCH v2 00/19] Try to reduce lock contention on the SMMUv3 command queue

2019-07-24 Thread Will Deacon
On Wed, Jul 24, 2019 at 03:25:07PM +0100, John Garry wrote:
> On 24/07/2019 13:20, Will Deacon wrote:
> > On Wed, Jul 24, 2019 at 10:58:26AM +0100, John Garry wrote:
> > > On 11/07/2019 18:19, Will Deacon wrote:
> > > > This is a significant rework of the RFC I previously posted here:
> > > > 
> > > >   https://lkml.kernel.org/r/20190611134603.4253-1-will.dea...@arm.com
> > > > 
> > > > But this time, it looks like it might actually be worthwhile according
> > > > to my perf profiles, where __iommu_unmap() falls a long way down the
> > > > profile for a multi-threaded netperf run. I'm still relying on others to
> > > > confirm this is useful, however.
> > > > 
> > > > Some of the changes since last time are:
> > > > 
> > > >   * Support for constructing and submitting a list of commands in the
> > > > driver
> > > > 
> > > >   * Numerous changes to the IOMMU and io-pgtable APIs so that we can
> > > > submit commands in batches
> > > > 
> > > >   * Removal of cmpxchg() from cmdq_shared_lock() fast-path
> > > > 
> > > >   * Code restructuring and cleanups
> > > > 
> > > > This current applies against my iommu/devel branch that Joerg has pulled
> > > > for 5.3. If you want to test it out, I've put everything here:
> > > > 
> > > >   
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=iommu/cmdq
> > > > 
> > > > Feedback welcome. I appreciate that we're in the merge window, but I
> > > > wanted to get this on the list for people to look at as an RFC.
> > > > 
> > > 
> > > I tested storage performance on this series, which I think is a better
> > > scenario to test than network performance, that being generally limited by
> > > the network link speed.
> > 
> > Interesting, thanks for sharing. Do you also see a similar drop in CPU time
> > to the one reported by Ganapat?
> 
> Not really, CPU load reported by fio is mostly the same.

That's a pity. Maybe the cmdq isn't actually getting hit very heavily by
fio.

> > > Baseline performance (will/iommu/devel, commit 9e6ea59f3)
> > > 8x SAS disks D05  839K IOPS
> > > 1x NVMe D05   454K IOPS
> > > 1x NVMe D06   442k IOPS
> > > 
> > > Patchset performance (will/iommu/cmdq)
> > > 8x SAS disk D05   835K IOPS
> > > 1x NVMe D05   472K IOPS
> > > 1x NVMe D06   459k IOPS
> > > 
> > > So we see a bit of an NVMe boost, but about the same for 8x disks. No 
> > > iommu
> > > performance is about 918K IOPs for 8x disks, so it is not limited by the
> > > medium.
> > 
> > It would be nice to know if this performance gap is because of Linux, or
> > simply because of the translation overhead in the SMMU hardware. Are you
> > able to get a perf profile to see where we're spending time?
> 
> I'll look to do that, but I'd really expect it to be down to the time linux
> spends on the DMA map and unmaps.

Right, and it would be good to see how much of that is in SMMUv3-specific
code. Another interesting thing to try would be reducing the depth of the
io-pgtable. We currently key that off VA_BITS which may be much larger
than you need (by virtue of being a compile-time value).

Will


Re: [RFC PATCH v2 18/19] iommu/arm-smmu-v3: Reduce contention during command-queue insertion

2019-07-24 Thread Will Deacon
Hi again, John,

On Wed, Jul 24, 2019 at 09:20:49AM +0100, John Garry wrote:
> On 11/07/2019 18:19, Will Deacon wrote:
> > +static int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
> > +  u64 *cmds, int n, bool sync)
> > +{
> > +   u64 cmd_sync[CMDQ_ENT_DWORDS];
> > +   u32 prod;
> > unsigned long flags;
> > -   bool wfe = !!(smmu->features & ARM_SMMU_FEAT_SEV);
> > -   struct arm_smmu_cmdq_ent ent = { .opcode = CMDQ_OP_CMD_SYNC };
> > -   int ret;
> > +   bool owner;
> > +   struct arm_smmu_cmdq *cmdq = &smmu->cmdq;
> > +   struct arm_smmu_ll_queue llq = {
> > +   .max_n_shift = cmdq->q.llq.max_n_shift,
> > +   }, head = llq;
> > +   int ret = 0;
> > 
> > -   arm_smmu_cmdq_build_cmd(cmd, &ent);
> > +   /* 1. Allocate some space in the queue */
> > +   local_irq_save(flags);
> > +   llq.val = READ_ONCE(cmdq->q.llq.val);
> > +   do {
> > +   u64 old;
> > +
> > +   while (!queue_has_space(&llq, n + sync)) {
> > +   local_irq_restore(flags);
> > +   if (arm_smmu_cmdq_poll_until_not_full(smmu, &llq))
> > +   dev_err_ratelimited(smmu->dev, "CMDQ 
> > timeout\n");
> > +   local_irq_save(flags);
> > +   }
> > +
> > +   head.cons = llq.cons;
> > +   head.prod = queue_inc_prod_n(&llq, n + sync) |
> > +CMDQ_PROD_OWNED_FLAG;
> > +
> > +   old = cmpxchg_relaxed(&cmdq->q.llq.val, llq.val, head.val);
> 
> I added some basic debug to the stress test on your branch, and this cmpxchg
> was failing ~10 times on average on my D06.
> 
> So we're not using the spinlock now, but this cmpxchg may lack fairness.

It definitely lacks fairness, although that's going to be the case in many
other places where locking is elided as well. If it shows up as an issue, we
should try to address it, but queueing means serialisation and that largely
defeats the point of this patch. I also don't expect it to be a problem
unless the cmpxchg() is heavily contended, which shouldn't be the case if
we're batching.

> Since we're batching commands, I wonder if it's better to restore the
> spinlock and send batched commands + CMD_SYNC under the lock, and then wait
> for the CMD_SYNC completion outside the lock.

Again, we'd need some numbers, but my concern with that approach is that
we're serialising CPUs which is what I've been trying hard to avoid. It
also doesn't let you straightforwardly remove the cmpxchg() loop, since
the owner clears the OWNED flag and you probably wouldn't want to hold
the lock for that.

> I don't know if it improves the queue contention, but at least the prod
> pointer would more closely track the issued commands, such that we're not
> waiting to kick off many gathered batches of commands, while the SMMU HW may
> be idle (in terms of command processing).

Again, probably going to need some numbers to change this, although it
sounds like your other suggestion about having the owner move prod twice
would largely address your concerns. Reintroducing the lock, on the other
hand, feels like a big step backwards to me, and the whole reason I started
down the current route was because of vague claims that the locking was a
problem for large systems.

Thanks,

Will
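
For readers following along, the space-reservation step being discussed boils
down to something like the stand-alone sketch below (64-bit only; space/wrap
handling and the owner flag are omitted, and this is not the driver code
itself):

/*
 * Sketch: producers reserve queue slots by advancing a shared producer
 * index with cmpxchg() on a combined prod/cons word, so no spinlock is
 * taken on the insertion fast path.
 */
struct ll_queue {
	union {
		u64 val;
		struct {
			u32 prod;
			u32 cons;
		};
	};
};

static u32 reserve_slots(struct ll_queue *q, u32 n)
{
	struct ll_queue old, new;

	do {
		old.val = READ_ONCE(q->val);
		new = old;
		new.prod = old.prod + n;	/* space/wrap checks omitted */
	} while (cmpxchg(&q->val, old.val, new.val) != old.val);

	return old.prod;	/* first slot owned by this caller */
}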


Re: [PATCH] media: staging: ipu3: Enable IOVA API only when IOMMU support is enabled

2019-07-24 Thread Dmitry Osipenko
24.07.2019 17:23, Robin Murphy wrote:
> On 24/07/2019 15:09, Dmitry Osipenko wrote:
>> 24.07.2019 17:03, Yuehaibing wrote:
>>> On 2019/7/24 21:49, Robin Murphy wrote:
 On 24/07/2019 11:30, Sakari Ailus wrote:
> Hi Yue,
>
> On Mon, Jul 22, 2019 at 09:47:49PM +0800, YueHaibing wrote:
>> If IOMMU_SUPPORT is not set, ipu3 driver may select IOMMU_IOVA to m.
>> But for many drivers, they use "select IOMMU_IOVA if IOMMU_SUPPORT"
>> in the Kconfig, for example, CONFIG_TEGRA_VDE is set to y but
>> IOMMU_IOVA
>> is m, then the build fails like this:
>>
>> drivers/staging/media/tegra-vde/iommu.o: In function
>> `tegra_vde_iommu_map':
>> iommu.c:(.text+0x41): undefined reference to `alloc_iova'
>> iommu.c:(.text+0x56): undefined reference to `__free_iova'
>>
>> Reported-by: Hulk Robot 
>> Fixes: 7fc7af649ca7 ("media: staging/intel-ipu3: Add imgu top
>> level pci device driver")
>> Signed-off-by: YueHaibing 
>> ---
>>    drivers/staging/media/ipu3/Kconfig | 2 +-
>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/staging/media/ipu3/Kconfig
>> b/drivers/staging/media/ipu3/Kconfig
>> index 4b51c67..b7df18f 100644
>> --- a/drivers/staging/media/ipu3/Kconfig
>> +++ b/drivers/staging/media/ipu3/Kconfig
>> @@ -4,7 +4,7 @@ config VIDEO_IPU3_IMGU
>>    depends on PCI && VIDEO_V4L2
>>    depends on MEDIA_CONTROLLER && VIDEO_V4L2_SUBDEV_API
>>    depends on X86
>> -    select IOMMU_IOVA
>> +    select IOMMU_IOVA if IOMMU_SUPPORT
>
> This doesn't seem right: the ipu3-cio2 driver needs IOMMU_IOVA
> independently of IOMMU_SUPPORT.
>
> Looking at tegra-vde, it seems to depend on IOMMU_SUPPORT but
> that's not
> declared in its Kconfig entry. I wonder if adding that would be the
> right
> way to fix this.
>
> Cc'ing the IOMMU list.
>> IOMMU_SUPPORT is optional for the Tegra-VDE driver.
>>
 Right, I also had the impression that we'd made the IOVA library
 completely standalone. And what does the IPU3 driver's Kconfig have
 to do with some *other* driver failing to link anyway?
>>
>> I can see it failing if IPU3 is compiled as a loadable module, while
>> Tegra-VDE is a built-in driver. Hence the IOVA lib would also be a kernel
>> module, and thus the IOVA symbols will be missing during linkage of
>> the VDE driver.
>>
>>> Oh, I misunderstood that IOMMU_IOVA depends on IOMMU_SUPPORT; thank
>>> you for the clarification.
>>>
>>> I will try to fix this in tegra-vde.
>>
>> Probably IOVA could be selected independently of IOMMU_SUPPORT, but IOVA
>> library isn't needed for the VDE driver if IOMMU_SUPPORT is disabled.
> 
> Oh, I think I get the problem now - tegra-vde/iommu.c is built
> unconditionally and relies on the static inline stubs for IOMMU and IOVA
> calls if !IOMMU_SUPPORT, but in a compile-test config where IOVA=m for
> other reasons, it then picks up the real declarations from linux/iova.h
> instead of the stubs, and things go downhill from there. So there is a
> real issue, but indeed it's Tegra-VDE which needs to be restructured to
> cope with such configurations, and not IPU3's (or anyone else who may
> select IOVA=m in future) job to work around it.

I guess it could be:

select IOMMU_IOVA if (IOMMU_SUPPORT || COMPILE_TEST)

as a workaround.

Re: [RFC 3/4] dma-direct: add dma_direct_min_mask

2019-07-24 Thread Nicolas Saenz Julienne
On Wed, 2019-07-24 at 15:56 +0200, Christoph Hellwig wrote:
> On Wed, Jul 24, 2019 at 02:51:24PM +0100, Catalin Marinas wrote:
> > I think it may be better if we have both ZONE_DMA and ZONE_DMA32 on
> > arm64. ZONE_DMA would be based on the smallest dma-ranges as described
> > in the DT while DMA32 covers the first naturally aligned 4GB of RAM
> > (unchanged). When a smaller ZONE_DMA is not needed, it could be expanded
> > to cover what would normally be ZONE_DMA32 (or could we have ZONE_DMA as
> > 0-bytes? I don't think GFP_DMA can still allocate memory in this case).
> > 
> > We'd probably have to define ARCH_ZONE_DMA_BITS for arm64 to something
> > smaller than 32-bit but sufficient to cover the known platforms like
> > RPi4 (the current 24 is too small, so maybe 30). AFAICT,
> > __dma_direct_optimal_gfp_mask() figures out whether GFP_DMA or GFP_DMA32
> > should be passed.
> 
> ARCH_ZONE_DMA_BITS should probably become a variable.  That way we can
> just initialize it to the default 24 bits in kernel/dma/direct.c and
> allow architectures to override it in their early boot code.

Thanks both for your feedback. I'll start preparing a proper series.
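
A minimal sketch of what that suggestion could look like (the names below are
illustrative, not existing kernel symbols):

/* kernel/dma/direct.c: keep today's 24-bit ISA default ... */
unsigned int zone_dma_bits = 24;

/* ... and let early arch code override it once the platform's DMA
 * limitations (e.g. from DT dma-ranges) are known: */
void __init arch_set_zone_dma_bits(unsigned int bits)
{
	zone_dma_bits = bits;	/* e.g. 30 to cover boards like the RPi4 */
}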




Re: [RFC PATCH v2 00/19] Try to reduce lock contention on the SMMUv3 command queue

2019-07-24 Thread John Garry

On 24/07/2019 13:20, Will Deacon wrote:

On Wed, Jul 24, 2019 at 10:58:26AM +0100, John Garry wrote:

On 11/07/2019 18:19, Will Deacon wrote:

This is a significant rework of the RFC I previously posted here:

  https://lkml.kernel.org/r/20190611134603.4253-1-will.dea...@arm.com

But this time, it looks like it might actually be worthwhile according
to my perf profiles, where __iommu_unmap() falls a long way down the
profile for a multi-threaded netperf run. I'm still relying on others to
confirm this is useful, however.

Some of the changes since last time are:

  * Support for constructing and submitting a list of commands in the
driver

  * Numerous changes to the IOMMU and io-pgtable APIs so that we can
submit commands in batches

  * Removal of cmpxchg() from cmdq_shared_lock() fast-path

  * Code restructuring and cleanups

This current applies against my iommu/devel branch that Joerg has pulled
for 5.3. If you want to test it out, I've put everything here:

  
https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=iommu/cmdq

Feedback welcome. I appreciate that we're in the merge window, but I
wanted to get this on the list for people to look at as an RFC.



I tested storage performance on this series, which I think is a better
scenario to test than network performance, that being generally limited by
the network link speed.


Interesting, thanks for sharing. Do you also see a similar drop in CPU time
to the one reported by Ganapat?


Not really, CPU load reported by fio is mostly the same.




Baseline performance (will/iommu/devel, commit 9e6ea59f3)
8x SAS disks D05	839K IOPS
1x NVMe D05 454K IOPS
1x NVMe D06 442k IOPS

Patchset performance (will/iommu/cmdq)
8x SAS disk D05 835K IOPS
1x NVMe D05 472K IOPS
1x NVMe D06 459k IOPS

So we see a bit of an NVMe boost, but about the same for 8x disks. With no
IOMMU, performance is about 918K IOPS for 8x disks, so it is not limited by
the medium.


It would be nice to know if this performance gap is because of Linux, or
simply because of the translation overhead in the SMMU hardware. Are you
able to get a perf profile to see where we're spending time?


I'll look to do that, but I'd really expect it to be down to the time 
linux spends on the DMA map and unmaps.


Cheers,
john



Thanks,

Will

.






Re: [PATCH] media: staging: ipu3: Enable IOVA API only when IOMMU support is enabled

2019-07-24 Thread Robin Murphy

On 24/07/2019 15:09, Dmitry Osipenko wrote:

24.07.2019 17:03, Yuehaibing wrote:

On 2019/7/24 21:49, Robin Murphy wrote:

On 24/07/2019 11:30, Sakari Ailus wrote:

Hi Yue,

On Mon, Jul 22, 2019 at 09:47:49PM +0800, YueHaibing wrote:

If IOMMU_SUPPORT is not set, ipu3 driver may select IOMMU_IOVA to m.
But for many drivers, they use "select IOMMU_IOVA if IOMMU_SUPPORT"
in the Kconfig, for example, CONFIG_TEGRA_VDE is set to y but IOMMU_IOVA
is m, then the build fails like this:

drivers/staging/media/tegra-vde/iommu.o: In function `tegra_vde_iommu_map':
iommu.c:(.text+0x41): undefined reference to `alloc_iova'
iommu.c:(.text+0x56): undefined reference to `__free_iova'

Reported-by: Hulk Robot 
Fixes: 7fc7af649ca7 ("media: staging/intel-ipu3: Add imgu top level pci device 
driver")
Signed-off-by: YueHaibing 
---
   drivers/staging/media/ipu3/Kconfig | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/media/ipu3/Kconfig 
b/drivers/staging/media/ipu3/Kconfig
index 4b51c67..b7df18f 100644
--- a/drivers/staging/media/ipu3/Kconfig
+++ b/drivers/staging/media/ipu3/Kconfig
@@ -4,7 +4,7 @@ config VIDEO_IPU3_IMGU
   depends on PCI && VIDEO_V4L2
   depends on MEDIA_CONTROLLER && VIDEO_V4L2_SUBDEV_API
   depends on X86
-select IOMMU_IOVA
+select IOMMU_IOVA if IOMMU_SUPPORT


This doesn't seem right: the ipu3-cio2 driver needs IOMMU_IOVA
independently of IOMMU_SUPPORT.

Looking at tegra-vde, it seems to depend on IOMMU_SUPPORT but that's not
declared in its Kconfig entry. I wonder if adding that would be the right
way to fix this.

Cc'ing the IOMMU list.

IOMMU_SUPPORT is optional for the Tegra-VDE driver.


Right, I also had the impression that we'd made the IOVA library completely 
standalone. And what does the IPU3 driver's Kconfig have to do with some 
*other* driver failing to link anyway?


I can see it failing if IPU3 is compiled as a loadable module, while
Tegra-VDE is a built-in driver. Hence the IOVA lib would also be a kernel
module, and thus the IOVA symbols will be missing during linkage of
the VDE driver.


Oh, I misunderstood that IOMMU_IOVA depends on IOMMU_SUPPORT; thank you for
the clarification.

I will try to fix this in tegra-vde.


Probably IOVA could be selected independently of IOMMU_SUPPORT, but IOVA
library isn't needed for the VDE driver if IOMMU_SUPPORT is disabled.


Oh, I think I get the problem now - tegra-vde/iommu.c is built 
unconditionally and relies on the static inline stubs for IOMMU and IOVA 
calls if !IOMMU_SUPPORT, but in a compile-test config where IOVA=m for 
other reasons, it then picks up the real declarations from linux/iova.h 
instead of the stubs, and things go downhill from there. So there is a 
real issue, but indeed it's Tegra-VDE which needs to be restructured to 
cope with such configurations, and not IPU3's (or anyone else who may 
select IOVA=m in future) job to work around it.


Robin.
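
One idiom that copes with exactly this built-in-caller/modular-library
situation is IS_REACHABLE(); a simplified sketch (hypothetical helper, not
the actual tegra-vde fix):

/*
 * Sketch: built-in code may only call into the IOVA library when the
 * library is reachable (=y, or =m while the caller is also =m);
 * otherwise fall back to a stub so the final link never fails.
 */
#if IS_REACHABLE(CONFIG_IOMMU_IOVA)
static inline unsigned long vde_alloc_pfn(struct iova_domain *iovad,
					  unsigned long npages)
{
	struct iova *iova = alloc_iova(iovad, npages, ~0UL, true);

	return iova ? iova->pfn_lo : 0;
}
#else
static inline unsigned long vde_alloc_pfn(struct iova_domain *iovad,
					  unsigned long npages)
{
	return 0;	/* no IOVA support reachable in this configuration */
}
#endif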

Re: [PATCH] media: staging: ipu3: Enable IOVA API only when IOMMU support is enabled

2019-07-24 Thread Dmitry Osipenko
24.07.2019 17:03, Yuehaibing wrote:
> On 2019/7/24 21:49, Robin Murphy wrote:
>> On 24/07/2019 11:30, Sakari Ailus wrote:
>>> Hi Yue,
>>>
>>> On Mon, Jul 22, 2019 at 09:47:49PM +0800, YueHaibing wrote:
 If IOMMU_SUPPORT is not set, ipu3 driver may select IOMMU_IOVA to m.
 But for many drivers, they use "select IOMMU_IOVA if IOMMU_SUPPORT"
 in the Kconfig, for example, CONFIG_TEGRA_VDE is set to y but IOMMU_IOVA
 is m, then the build fails like this:

 drivers/staging/media/tegra-vde/iommu.o: In function `tegra_vde_iommu_map':
 iommu.c:(.text+0x41): undefined reference to `alloc_iova'
 iommu.c:(.text+0x56): undefined reference to `__free_iova'

 Reported-by: Hulk Robot 
 Fixes: 7fc7af649ca7 ("media: staging/intel-ipu3: Add imgu top level pci 
 device driver")
 Signed-off-by: YueHaibing 
 ---
   drivers/staging/media/ipu3/Kconfig | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/drivers/staging/media/ipu3/Kconfig 
 b/drivers/staging/media/ipu3/Kconfig
 index 4b51c67..b7df18f 100644
 --- a/drivers/staging/media/ipu3/Kconfig
 +++ b/drivers/staging/media/ipu3/Kconfig
 @@ -4,7 +4,7 @@ config VIDEO_IPU3_IMGU
   depends on PCI && VIDEO_V4L2
   depends on MEDIA_CONTROLLER && VIDEO_V4L2_SUBDEV_API
   depends on X86
 -select IOMMU_IOVA
 +select IOMMU_IOVA if IOMMU_SUPPORT
>>>
>>> This doesn't seem right: the ipu3-cio2 driver needs IOMMU_IOVA
>>> independently of IOMMU_SUPPORT.
>>>
>>> Looking at tegra-vde, it seems to depend on IOMMU_SUPPORT but that's not
>>> declared in its Kconfig entry. I wonder if adding that would be the right
>>> way to fix this.
>>>
>>> Cc'ing the IOMMU list.
IOMMU_SUPPORT is optional for the Tegra-VDE driver.

>> Right, I also had the impression that we'd made the IOVA library completely 
>> standalone. And what does the IPU3 driver's Kconfig have to do with some 
>> *other* driver failing to link anyway?

I can see it failing if IPU3 is compiled as a loadable module, while
Tegra-VDE is a built-in driver. Hence the IOVA lib would also be a kernel
module, and thus the IOVA symbols will be missing during linkage of
the VDE driver.

> Oh, I misunderstood that IOMMU_IOVA depends on IOMMU_SUPPORT; thank you for
> the clarification.
> 
> I will try to fix this in tegra-vde.

Probably IOVA could be selected independently of IOMMU_SUPPORT, but IOVA
library isn't needed for the VDE driver if IOMMU_SUPPORT is disabled.


Re: [RFC PATCH v2 18/19] iommu/arm-smmu-v3: Reduce contention during command-queue insertion

2019-07-24 Thread Will Deacon
On Wed, Jul 24, 2019 at 03:03:20PM +0100, John Garry wrote:
> On 24/07/2019 13:15, Will Deacon wrote:
> > > Could it be a minor optimisation to advance the HW producer pointer at 
> > > this
> > > stage for the owner only? We know that its entries are written, and it
> > > should be first in the new batch of commands (right?), so we could advance
> > > the pointer to at least get the HW started.
> > 
> > I think that would be a valid thing to do, but it depends on the relative
> > cost of writing to prod compared to how long we're likely to wait. Given
> > that everybody has irqs disabled when writing out their commands, I wouldn't
> > expect the waiting to be a big issue,
> 
> For sure, but I'm thinking of the possible scenario where the guy(s)
> we're waiting on have many more commands. Or they just joined the current
> gathering quite late, just prior to clearing the owner flag.

Understood, but a "cacheable" memcpy (assuming the SMMU is coherent) should
be pretty quick, even for maximum batch size I think.

>  although we could probably optimise
> > arm_smmu_cmdq_write_entries() into a memcpy() if we needed to.
> > 
> > In other words, I think we need numbers to justify that change.
> 
> Anyway, this is quite minor, and I will see if the change could be justified
> by numbers.

Thanks! If the numbers show it's useful, we can definitely add it.

Will


Re: [PATCH] media: staging: ipu3: Enable IOVA API only when IOMMU support is enabled

2019-07-24 Thread Yuehaibing
On 2019/7/24 21:49, Robin Murphy wrote:
> On 24/07/2019 11:30, Sakari Ailus wrote:
>> Hi Yue,
>>
>> On Mon, Jul 22, 2019 at 09:47:49PM +0800, YueHaibing wrote:
>>> If IOMMU_SUPPORT is not set, the ipu3 driver may set IOMMU_IOVA to m.
>>> But many drivers use "select IOMMU_IOVA if IOMMU_SUPPORT" in their Kconfig;
>>> for example, CONFIG_TEGRA_VDE is set to y while IOMMU_IOVA is m, and then
>>> the build fails like this:
>>>
>>> drivers/staging/media/tegra-vde/iommu.o: In function `tegra_vde_iommu_map':
>>> iommu.c:(.text+0x41): undefined reference to `alloc_iova'
>>> iommu.c:(.text+0x56): undefined reference to `__free_iova'
>>>
>>> Reported-by: Hulk Robot 
>>> Fixes: 7fc7af649ca7 ("media: staging/intel-ipu3: Add imgu top level pci device driver")
>>> Signed-off-by: YueHaibing 
>>> ---
>>>   drivers/staging/media/ipu3/Kconfig | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/staging/media/ipu3/Kconfig b/drivers/staging/media/ipu3/Kconfig
>>> index 4b51c67..b7df18f 100644
>>> --- a/drivers/staging/media/ipu3/Kconfig
>>> +++ b/drivers/staging/media/ipu3/Kconfig
>>> @@ -4,7 +4,7 @@ config VIDEO_IPU3_IMGU
>>>   depends on PCI && VIDEO_V4L2
>>>   depends on MEDIA_CONTROLLER && VIDEO_V4L2_SUBDEV_API
>>>   depends on X86
>>> -select IOMMU_IOVA
>>> +select IOMMU_IOVA if IOMMU_SUPPORT
>>
>> This doesn't seem right: the ipu3-cio2 driver needs IOMMU_IOVA
>> independently of IOMMU_SUPPORT.
>>
>> Looking at tegra-vde, it seems to depend on IOMMU_SUPPORT but that's not
>> declared in its Kconfig entry. I wonder if adding that would be the right
>> way to fix this.
>>
>> Cc'ing the IOMMU list.
> 
> Right, I also had the impression that we'd made the IOVA library completely 
> standalone. And what does the IPU3 driver's Kconfig have to do with some 
> *other* driver failing to link anyway?

Oh, I misunderstood and thought that IOMMU_IOVA depends on IOMMU_SUPPORT,
thank you for the clarification.

I will try to fix this in tegra-vde.

> 
> Robin.
> 
>>
>>>   select VIDEOBUF2_DMA_SG
>>>   help
>>> This is the Video4Linux2 driver for Intel IPU3 image processing 
>>> unit,
>>
> 
> 



Re: [RFC PATCH v2 18/19] iommu/arm-smmu-v3: Reduce contention during command-queue insertion

2019-07-24 Thread John Garry

On 24/07/2019 13:15, Will Deacon wrote:

Hi John,

Thanks for reading the code!

On Fri, Jul 19, 2019 at 12:04:15PM +0100, John Garry wrote:

+static int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
+  u64 *cmds, int n, bool sync)
+{
+   u64 cmd_sync[CMDQ_ENT_DWORDS];
+   u32 prod;
unsigned long flags;
-   bool wfe = !!(smmu->features & ARM_SMMU_FEAT_SEV);
-   struct arm_smmu_cmdq_ent ent = { .opcode = CMDQ_OP_CMD_SYNC };
-   int ret;
+   bool owner;
+   struct arm_smmu_cmdq *cmdq = &smmu->cmdq;
+   struct arm_smmu_ll_queue llq = {
+   .max_n_shift = cmdq->q.llq.max_n_shift,
+   }, head = llq;
+   int ret = 0;

-   arm_smmu_cmdq_build_cmd(cmd, &ent);
+   /* 1. Allocate some space in the queue */
+   local_irq_save(flags);
+   llq.val = READ_ONCE(cmdq->q.llq.val);
+   do {
+   u64 old;
+
+   while (!queue_has_space(&llq, n + sync)) {
+   local_irq_restore(flags);
+   if (arm_smmu_cmdq_poll_until_not_full(smmu, &llq))
+   dev_err_ratelimited(smmu->dev, "CMDQ timeout\n");
+   local_irq_save(flags);
+   }
+
+   head.cons = llq.cons;
+   head.prod = queue_inc_prod_n(&llq, n + sync) |
+CMDQ_PROD_OWNED_FLAG;
+
+   old = cmpxchg_relaxed(&cmdq->q.llq.val, llq.val, head.val);
+   if (old == llq.val)
+   break;
+
+   llq.val = old;
+   } while (1);
+   owner = !(llq.prod & CMDQ_PROD_OWNED_FLAG);
+
+   /*
+* 2. Write our commands into the queue
+* Dependency ordering from the cmpxchg() loop above.
+*/
+   arm_smmu_cmdq_write_entries(cmdq, cmds, llq.prod, n);
+   if (sync) {
+   prod = queue_inc_prod_n(&llq, n);
+   arm_smmu_cmdq_build_sync_cmd(cmd_sync, smmu, prod);
+   queue_write(Q_ENT(&cmdq->q, prod), cmd_sync, CMDQ_ENT_DWORDS);
+
+   /*
+* In order to determine completion of our CMD_SYNC, we must
+* ensure that the queue can't wrap twice without us noticing.
+* We achieve that by taking the cmdq lock as shared before
+* marking our slot as valid.
+*/
+   arm_smmu_cmdq_shared_lock(cmdq);
+   }
+
+   /* 3. Mark our slots as valid, ensuring commands are visible first */
+   dma_wmb();
+   prod = queue_inc_prod_n(&llq, n + sync);
+   arm_smmu_cmdq_set_valid_map(cmdq, llq.prod, prod);
+
+   /* 4. If we are the owner, take control of the SMMU hardware */
+   if (owner) {
+   /* a. Wait for previous owner to finish */
+   atomic_cond_read_relaxed(&cmdq->owner_prod, VAL == llq.prod);
+
+   /* b. Stop gathering work by clearing the owned flag */
+   prod = atomic_fetch_andnot_relaxed(CMDQ_PROD_OWNED_FLAG,
+  &cmdq->q.llq.atomic.prod);
+   prod &= ~CMDQ_PROD_OWNED_FLAG;
+   head.prod &= ~CMDQ_PROD_OWNED_FLAG;
+


Could it be a minor optimisation to advance the HW producer pointer at this
stage for the owner only? We know that its entries are written, and it
should be first in the new batch of commands (right?), so we could advance
the pointer to at least get the HW started.


I think that would be a valid thing to do, but it depends on the relative
cost of writing to prod compared to how long we're likely to wait. Given
that everybody has irqs disabled when writing out their commands, I wouldn't
expect the waiting to be a big issue,


For sure, but I'm thinking of the possible scenario where the guy(s)
we're waiting on have many more commands. Or they just joined the 
current gathering quite late, just prior to clearing the owner flag.


 although we could probably optimise

arm_smmu_cmdq_write_entries() into a memcpy() if we needed to.

In other words, I think we need numbers to justify that change.


Anyway, this is quite minor, and I will see if the change could be 
justified by numbers.


Cheers,
John



Thanks,

Will







Re: [RFC 3/4] dma-direct: add dma_direct_min_mask

2019-07-24 Thread Christoph Hellwig
On Wed, Jul 24, 2019 at 02:51:24PM +0100, Catalin Marinas wrote:
> I think it may be better if we have both ZONE_DMA and ZONE_DMA32 on
> arm64. ZONE_DMA would be based on the smallest dma-ranges as described
> in the DT while DMA32 covers the first naturally aligned 4GB of RAM
> (unchanged). When a smaller ZONE_DMA is not needed, it could be expanded
> to cover what would normally be ZONE_DMA32 (or could we have ZONE_DMA as
> 0-bytes? I don't think GFP_DMA can still allocate memory in this case).
> 
> We'd probably have to define ARCH_ZONE_DMA_BITS for arm64 to something
> smaller than 32-bit but sufficient to cover the known platforms like
> RPi4 (the current 24 is too small, so maybe 30). AFAICT,
> __dma_direct_optimal_gfp_mask() figures out whether GFP_DMA or GFP_DMA32
> should be passed.

ARCH_ZONE_DMA_BITS should probably become a variable.  That way we can
just initialize it to the default 24 bits in kernel/dma/direct.c and
allow architectures to override it in their early boot code.
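
As a rough illustration of that idea (sketch only, not an actual patch; the
arch hook name below is made up):

/* kernel/dma/direct.c: the zone limit becomes a variable with the
 * historical ISA default, instead of a compile-time constant.
 */
unsigned int zone_dma_bits __ro_after_init = 24;

/* dma-direct then compares device masks against the variable: */
static bool mask_fits_zone_dma(u64 mask)
{
        return mask <= DMA_BIT_MASK(zone_dma_bits);
}

/* and arch early boot code can raise it, e.g. on arm64 for the RPi4: */
void __init arch_dma_zone_init(void)    /* hypothetical arch hook */
{
        zone_dma_bits = 30;     /* RPi4: DMA limited to the first 1GB */
}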


Re: [RFC 3/4] dma-direct: add dma_direct_min_mask

2019-07-24 Thread Catalin Marinas
On Fri, Jul 19, 2019 at 03:08:52PM +0200, Nicolas Saenz Julienne wrote:
> On Thu, 2019-07-18 at 13:18 +0200, Nicolas Saenz Julienne wrote:
> > On Thu, 2019-07-18 at 11:15 +0200, Christoph Hellwig wrote:
> > > On Wed, Jul 17, 2019 at 05:31:34PM +0200, Nicolas Saenz Julienne wrote:
> > > > Historically devices with ZONE_DMA32 have been assumed to be able to
> > > > address at least the lower 4GB of RAM for DMA. This is still the default
> > > > behavior, yet the Raspberry Pi 4 is limited to the first GB of memory.
> > > > This has been observed to trigger failures in dma_direct_supported() as
> > > > the 'min_mask' isn't properly set.
> > > > 
> > > > We create 'dma_direct_min_mask' in order for the arch init code to be
> > > > able to fine-tune dma direct's 'min_dma' mask.
> > > 
> > > Normally we use ZONE_DMA for that case.
> > 
> > Fair enough, I didn't think of that possibility.
> > 
> > So would the arm64 maintainers be happy with something like this:
> > 
> > - ZONE_DMA: Follows standard definition, 16MB in size. ARCH_ZONE_DMA_BITS is
> > left as is.
> > - ZONE_DMA32: Will honor the most constraining 'dma-ranges'. Which so far 
> > for
> >   most devices is 4G, except for RPi4.
> > - ZONE_NORMAL: The rest of the memory.
> 
> Never mind this suggestion, I don't think it makes any sense. If anything,
> arm64 seems to fit the ZONE_DMA usage pattern of arm and powerpc, where
> ZONE_DMA's size is decided based on RAM size and/or board configuration. It
> was actually set up like this until Christoph's ad67f5a6545f7 ("arm64:
> replace ZONE_DMA with ZONE_DMA32").
> 
> So the easy solution would be to simply revert that commit. On one hand I feel
> it would be a step backwards as most 64 bit architectures have been moving to
> use ZONE_DMA32. On the other, current ZONE_DMA32 usage seems to be heavily
> rooted on having a 32 bit DMA mask*, which will no longer be the case on arm64
> if we want to support the RPi 4.
> 
> So the way I see it, and lacking a better solution, the argument is stronger
> for moving arm64 back to using ZONE_DMA. Any comments/opinions?

As was suggested in this or the previous thread, I'm not keen on
limiting ZONE_DMA32 to the smallest range the RPi4 can cover, as the naming
implies this zone should cover 32-bit devices that can deal with a full
32-bit mask.

I think it may be better if we have both ZONE_DMA and ZONE_DMA32 on
arm64. ZONE_DMA would be based on the smallest dma-ranges as described
in the DT while DMA32 covers the first naturally aligned 4GB of RAM
(unchanged). When a smaller ZONE_DMA is not needed, it could be expanded
to cover what would normally be ZONE_DMA32 (or could we have ZONE_DMA as
0-bytes? I don't think GFP_DMA can still allocate memory in this case).

We'd probably have to define ARCH_ZONE_DMA_BITS for arm64 to something
smaller than 32-bit but sufficient to cover the known platforms like
RPi4 (the current 24 is too small, so maybe 30). AFAICT,
__dma_direct_optimal_gfp_mask() figures out whether GFP_DMA or GFP_DMA32
should be passed.
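
For context, that zone selection boils down to something like the following
(simplified sketch, not the exact upstream implementation):

/* Pick the most permissive GFP zone that still satisfies the device mask. */
static gfp_t dma_zone_gfp_for_mask(u64 dma_mask)
{
        if (IS_ENABLED(CONFIG_ZONE_DMA) &&
            dma_mask <= DMA_BIT_MASK(ARCH_ZONE_DMA_BITS))
                return GFP_DMA;
        if (IS_ENABLED(CONFIG_ZONE_DMA32) && dma_mask <= DMA_BIT_MASK(32))
                return GFP_DMA32;
        return 0;       /* any zone will do */
}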

-- 
Catalin


Re: [PATCH] media: staging: ipu3: Enable IOVA API only when IOMMU support is enabled

2019-07-24 Thread Robin Murphy

On 24/07/2019 11:30, Sakari Ailus wrote:

Hi Yue,

On Mon, Jul 22, 2019 at 09:47:49PM +0800, YueHaibing wrote:

If IOMMU_SUPPORT is not set, the ipu3 driver may set IOMMU_IOVA to m.
But many drivers use "select IOMMU_IOVA if IOMMU_SUPPORT" in their Kconfig;
for example, CONFIG_TEGRA_VDE is set to y while IOMMU_IOVA is m, and then
the build fails like this:

drivers/staging/media/tegra-vde/iommu.o: In function `tegra_vde_iommu_map':
iommu.c:(.text+0x41): undefined reference to `alloc_iova'
iommu.c:(.text+0x56): undefined reference to `__free_iova'

Reported-by: Hulk Robot 
Fixes: 7fc7af649ca7 ("media: staging/intel-ipu3: Add imgu top level pci device driver")
Signed-off-by: YueHaibing 
---
  drivers/staging/media/ipu3/Kconfig | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/media/ipu3/Kconfig b/drivers/staging/media/ipu3/Kconfig
index 4b51c67..b7df18f 100644
--- a/drivers/staging/media/ipu3/Kconfig
+++ b/drivers/staging/media/ipu3/Kconfig
@@ -4,7 +4,7 @@ config VIDEO_IPU3_IMGU
depends on PCI && VIDEO_V4L2
depends on MEDIA_CONTROLLER && VIDEO_V4L2_SUBDEV_API
depends on X86
-   select IOMMU_IOVA
+   select IOMMU_IOVA if IOMMU_SUPPORT


This doesn't seem right: the ipu3-cio2 driver needs IOMMU_IOVA
independently of IOMMU_SUPPORT.

Looking at tegra-vde, it seems to depend on IOMMU_SUPPORT but that's not
declared in its Kconfig entry. I wonder if adding that would be the right
way to fix this.

Cc'ing the IOMMU list.


Right, I also had the impression that we'd made the IOVA library 
completely standalone. And what does the IPU3 driver's Kconfig have to 
do with some *other* driver failing to link anyway?


Robin.




select VIDEOBUF2_DMA_SG
help
  This is the Video4Linux2 driver for Intel IPU3 image processing unit,




Re: [RFC PATCH v2 00/19] Try to reduce lock contention on the SMMUv3 command queue

2019-07-24 Thread Will Deacon
On Fri, Jul 19, 2019 at 09:55:39AM +0530, Ganapatrao Kulkarni wrote:
> On Thu, Jul 11, 2019 at 10:58 PM Will Deacon  wrote:
> > This is a significant rework of the RFC I previously posted here:
> >
> >   https://lkml.kernel.org/r/20190611134603.4253-1-will.dea...@arm.com
> >
> > But this time, it looks like it might actually be worthwhile according
> > to my perf profiles, where __iommu_unmap() falls a long way down the
> > profile for a multi-threaded netperf run. I'm still relying on others to
> > confirm this is useful, however.
> >
> > Some of the changes since last time are:
> >
> >   * Support for constructing and submitting a list of commands in the
> > driver
> >
> >   * Numerous changes to the IOMMU and io-pgtable APIs so that we can
> > submit commands in batches
> >
> >   * Removal of cmpxchg() from cmdq_shared_lock() fast-path
> >
> >   * Code restructuring and cleanups
> >
> > This current applies against my iommu/devel branch that Joerg has pulled
> > for 5.3. If you want to test it out, I've put everything here:
> >
> >   
> > https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=iommu/cmdq
> >
> > Feedback welcome. I appreciate that we're in the merge window, but I
> > wanted to get this on the list for people to look at as an RFC.
> 
> I have tried the iommu/cmdq branch on ThunderX2. I do see a drastic
> reduction in CPU bandwidth consumption (from 15-20% down to 1-2% in
> perf top) in the SMMU CMDQ helper functions when I run iperf with more
> than 64 clients (-P 64). However, I have not noticed any measurable
> performance improvement in the iperf results. IMO, this might/should help
> the performance of IO-intensive workloads.
> 
> FWIW, you can add,
> Tested-by: Ganapatrao Kulkarni  

Brilliant, thanks. Your measurements reflect mine in that I can saturate
the NICs I have access to regardless of these changes, but the CPU time
is drastically reduced.

Will


Re: [RFC PATCH v2 00/19] Try to reduce lock contention on the SMMUv3 command queue

2019-07-24 Thread Will Deacon
On Wed, Jul 24, 2019 at 10:58:26AM +0100, John Garry wrote:
> On 11/07/2019 18:19, Will Deacon wrote:
> > This is a significant rework of the RFC I previously posted here:
> > 
> >   https://lkml.kernel.org/r/20190611134603.4253-1-will.dea...@arm.com
> > 
> > But this time, it looks like it might actually be worthwhile according
> > to my perf profiles, where __iommu_unmap() falls a long way down the
> > profile for a multi-threaded netperf run. I'm still relying on others to
> > confirm this is useful, however.
> > 
> > Some of the changes since last time are:
> > 
> >   * Support for constructing and submitting a list of commands in the
> > driver
> > 
> >   * Numerous changes to the IOMMU and io-pgtable APIs so that we can
> > submit commands in batches
> > 
> >   * Removal of cmpxchg() from cmdq_shared_lock() fast-path
> > 
> >   * Code restructuring and cleanups
> > 
> > This current applies against my iommu/devel branch that Joerg has pulled
> > for 5.3. If you want to test it out, I've put everything here:
> > 
> >   
> > https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=iommu/cmdq
> > 
> > Feedback welcome. I appreciate that we're in the merge window, but I
> > wanted to get this on the list for people to look at as an RFC.
> > 
> 
> I tested storage performance on this series, which I think is a better
> scenario to test than network performance, that being generally limited by
> the network link speed.

Interesting, thanks for sharing. Do you also see a similar drop in CPU time
to the one reported by Ganapat?

> Baseline performance (will/iommu/devel, commit 9e6ea59f3)
> 8x SAS disks D05  839K IOPS
> 1x NVMe D05   454K IOPS
> 1x NVMe D06   442k IOPS
> 
> Patchset performance (will/iommu/cmdq)
> 8x SAS disk D05   835K IOPS
> 1x NVMe D05   472K IOPS
> 1x NVMe D06   459k IOPS
> 
> So we see a bit of an NVMe boost, but about the same for 8x disks. With no
> IOMMU, performance is about 918K IOPS for 8x disks, so it is not limited by
> the medium.

It would be nice to know if this performance gap is because of Linux, or
simply because of the translation overhead in the SMMU hardware. Are you
able to get a perf profile to see where we're spending time?

Thanks,

Will


Re: [RFC PATCH v2 18/19] iommu/arm-smmu-v3: Reduce contention during command-queue insertion

2019-07-24 Thread Will Deacon
Hi John,

Thanks for reading the code!

On Fri, Jul 19, 2019 at 12:04:15PM +0100, John Garry wrote:
> > +static int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
> > +  u64 *cmds, int n, bool sync)
> > +{
> > +   u64 cmd_sync[CMDQ_ENT_DWORDS];
> > +   u32 prod;
> > unsigned long flags;
> > -   bool wfe = !!(smmu->features & ARM_SMMU_FEAT_SEV);
> > -   struct arm_smmu_cmdq_ent ent = { .opcode = CMDQ_OP_CMD_SYNC };
> > -   int ret;
> > +   bool owner;
> > +   struct arm_smmu_cmdq *cmdq = &smmu->cmdq;
> > +   struct arm_smmu_ll_queue llq = {
> > +   .max_n_shift = cmdq->q.llq.max_n_shift,
> > +   }, head = llq;
> > +   int ret = 0;
> > 
> > -   arm_smmu_cmdq_build_cmd(cmd, &ent);
> > +   /* 1. Allocate some space in the queue */
> > +   local_irq_save(flags);
> > +   llq.val = READ_ONCE(cmdq->q.llq.val);
> > +   do {
> > +   u64 old;
> > +
> > +   while (!queue_has_space(&llq, n + sync)) {
> > +   local_irq_restore(flags);
> > +   if (arm_smmu_cmdq_poll_until_not_full(smmu, &llq))
> > +   dev_err_ratelimited(smmu->dev, "CMDQ timeout\n");
> > +   local_irq_save(flags);
> > +   }
> > +
> > +   head.cons = llq.cons;
> > +   head.prod = queue_inc_prod_n(&llq, n + sync) |
> > +CMDQ_PROD_OWNED_FLAG;
> > +
> > +   old = cmpxchg_relaxed(&cmdq->q.llq.val, llq.val, head.val);
> > +   if (old == llq.val)
> > +   break;
> > +
> > +   llq.val = old;
> > +   } while (1);
> > +   owner = !(llq.prod & CMDQ_PROD_OWNED_FLAG);
> > +
> > +   /*
> > +* 2. Write our commands into the queue
> > +* Dependency ordering from the cmpxchg() loop above.
> > +*/
> > +   arm_smmu_cmdq_write_entries(cmdq, cmds, llq.prod, n);
> > +   if (sync) {
> > +   prod = queue_inc_prod_n(&llq, n);
> > +   arm_smmu_cmdq_build_sync_cmd(cmd_sync, smmu, prod);
> > +   queue_write(Q_ENT(&cmdq->q, prod), cmd_sync, CMDQ_ENT_DWORDS);
> > +
> > +   /*
> > +* In order to determine completion of our CMD_SYNC, we must
> > +* ensure that the queue can't wrap twice without us noticing.
> > +* We achieve that by taking the cmdq lock as shared before
> > +* marking our slot as valid.
> > +*/
> > +   arm_smmu_cmdq_shared_lock(cmdq);
> > +   }
> > +
> > +   /* 3. Mark our slots as valid, ensuring commands are visible first */
> > +   dma_wmb();
> > +   prod = queue_inc_prod_n(&llq, n + sync);
> > +   arm_smmu_cmdq_set_valid_map(cmdq, llq.prod, prod);
> > +
> > +   /* 4. If we are the owner, take control of the SMMU hardware */
> > +   if (owner) {
> > +   /* a. Wait for previous owner to finish */
> > +   atomic_cond_read_relaxed(&cmdq->owner_prod, VAL == llq.prod);
> > +
> > +   /* b. Stop gathering work by clearing the owned flag */
> > +   prod = atomic_fetch_andnot_relaxed(CMDQ_PROD_OWNED_FLAG,
> > +  &cmdq->q.llq.atomic.prod);
> > +   prod &= ~CMDQ_PROD_OWNED_FLAG;
> > +   head.prod &= ~CMDQ_PROD_OWNED_FLAG;
> > +
> 
> Could it be a minor optimisation to advance the HW producer pointer at this
> stage for the owner only? We know that its entries are written, and it
> should be first in the new batch of commands (right?), so we could advance
> the pointer to at least get the HW started.

I think that would be a valid thing to do, but it depends on the relative
cost of writing to prod compared to how long we're likely to wait. Given
that everybody has irqs disabled when writing out their commands, I wouldn't
expect the waiting to be a big issue, although we could probably optimise
arm_smmu_cmdq_write_entries() into a memcpy() if we needed to.

In other words, I think we need numbers to justify that change.
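
For what it's worth, the memcpy() variant could look roughly like the untested
sketch below (the wrap handling and field accesses follow the structures in
this patch, but treat the details as illustrative):

static void arm_smmu_cmdq_write_entries_memcpy(struct arm_smmu_cmdq *cmdq,
                                               u64 *cmds, u32 prod, int n)
{
        struct arm_smmu_queue *q = &cmdq->q;
        int nents = 1 << q->llq.max_n_shift;
        int idx = prod & (nents - 1);
        int first = min(n, nents - idx);

        /* Copy the batch in at most two chunks, splitting at queue wrap. */
        memcpy(q->base + idx * CMDQ_ENT_DWORDS, cmds,
               first * CMDQ_ENT_DWORDS * sizeof(u64));
        if (n > first)
                memcpy(q->base, cmds + first * CMDQ_ENT_DWORDS,
                       (n - first) * CMDQ_ENT_DWORDS * sizeof(u64));
}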

Thanks,

Will


Re: [PATCH v2] iommu: add support for drivers that manage iommu explicitly

2019-07-24 Thread Will Deacon
On Tue, Jul 23, 2019 at 10:40:55AM -0700, Rob Clark wrote:
> On Tue, Jul 23, 2019 at 8:38 AM Will Deacon  wrote:
> >
> > On Mon, Jul 22, 2019 at 09:23:48AM -0700, Rob Clark wrote:
> > > On Mon, Jul 22, 2019 at 8:48 AM Joerg Roedel  wrote:
> > > >
> > > > On Mon, Jul 22, 2019 at 08:41:34AM -0700, Rob Clark wrote:
> > > > > It is set by the driver:
> > > > >
> > > > > https://patchwork.freedesktop.org/patch/315291/
> > > > >
> > > > > (This doesn't really belong in devicetree, since it isn't a
> > > > > description of the hardware, so the driver is really the only place to
> > > > > set this.. which is fine because it is about a detail of how the
> > > > > driver works.)
> > > >
> > > > It is more a detail about how the firmware works. IIUC the problem is
> > > > that the firmware initializes the context mappings for the GPU and the
> > > > OS doesn't know anything about that and just overwrites them, causing
> > > > the firmware GPU driver to fail badly.
> > > >
> > > > So I think it is the task of the firmware to tell the OS not to touch
> > > > the devices mappings until the OS device driver takes over. On x86 there
> > > > is something similar with the RMRR/unity-map tables from the firmware.
> > > >
> > >
> > > Bjorn had a patchset[1] to inherit the config from firmware/bootloader
> > > when arm-smmu is probed which handles that part of the problem.  My
> > > patch is intended to be used on top of his patchset.  This seems to me
> > > like the best solution, if we don't have control over the firmware.
> >
> > Hmm, but the feedback from Robin on the thread you cite was that this should
> > be generalised to look more like RMRR, so there seems to be a clear message
> > here.
> >
> 
> Perhaps it is a lack of creativity, or lack of familiarity w/ iommu vs
> virtualization, but I'm not quite seeing how RMRR would help.. in
> particular when dealing with both DT and ACPI cases.

Well, I suppose we'd have something for DT and something for ACPI and we'd
try to make them look similar enough that we don't need lots of divergent
code in the kernel. The RMRR-like description would describe that, for a
particular device, a specific virt:phys mapping needs to exist in the
small window between initialising the SMMU and re-initialising the device
(GPU).

I would prefer this to be framebuffer-specific, since we're already in
flagrant violation of the arm64 boot requirements wrt ongoing DMA and making
this too general could lead to all sorts of brain damage. That would
probably also allow us to limit the flexibility, by mandating things like
alignment and memory attributes.

Having said that, I just realised I'm still a bit confused about the
problem: why does the bootloader require SMMU translation for the GPU at
all? If we could limit this whole thing to be identity-mapped/bypassed,
then all we need is a per-device flag in the firmware to indicate that the
SMMU should be initialised to allow passthrough for transactions originating
from that device.

> So I kinda prefer, when possible, if arm-smmu can figure out what is going
> on by looking at the hw state at boot (since that approach would work
> equally well for DT and ACPI).

That's not going to fly.

Forcing Linux to infer the state of the system by effectively parsing the
hardware configuration directly is fragile, error-prone and may not even be
possible in the general case. Worse, if this goes wrong, the symptoms are
very likely to be undiagnosable memory corruption, which is pretty awful in
my opinion.

Not only would you need separate parsing code for every IOMMU out there,
but you'd also need to make Linux aware of device aspects that it otherwise
doesn't care about, just in case the firmware decided to use them.
Furthermore, running an older kernel on newer hardware (which may have some
extensions) could cause the parsing to silently ignore parts of the device
that indicate memory regions which are in use. On top of that, there may be
device-global state that we are unable to reconfigure and that affects
devices other than the ones in question.

So no, I'm very much against this approach and the solution absolutely needs
to come in the form of a more abstract description from firmware.

> I *think* (but need to confirm if Bjorn hasn't already) that the
> memory for the pagetables that firmware/bootloader sets up is already
> removed from the memory map efi passes to kernel, so we don't need to
> worry about kernel stomping in-use pagetables.

It's precisely this sort of fragility that makes me nervous about this whole
approach.

Will


Re: [PATCH] media: staging: ipu3: Enable IOVA API only when IOMMU support is enabled

2019-07-24 Thread Sakari Ailus
Hi Yue,

On Mon, Jul 22, 2019 at 09:47:49PM +0800, YueHaibing wrote:
> If IOMMU_SUPPORT is not set, the ipu3 driver may set IOMMU_IOVA to m.
> But many drivers use "select IOMMU_IOVA if IOMMU_SUPPORT" in their Kconfig;
> for example, CONFIG_TEGRA_VDE is set to y while IOMMU_IOVA is m, and then
> the build fails like this:
> 
> drivers/staging/media/tegra-vde/iommu.o: In function `tegra_vde_iommu_map':
> iommu.c:(.text+0x41): undefined reference to `alloc_iova'
> iommu.c:(.text+0x56): undefined reference to `__free_iova'
> 
> Reported-by: Hulk Robot 
> Fixes: 7fc7af649ca7 ("media: staging/intel-ipu3: Add imgu top level pci device driver")
> Signed-off-by: YueHaibing 
> ---
>  drivers/staging/media/ipu3/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/staging/media/ipu3/Kconfig b/drivers/staging/media/ipu3/Kconfig
> index 4b51c67..b7df18f 100644
> --- a/drivers/staging/media/ipu3/Kconfig
> +++ b/drivers/staging/media/ipu3/Kconfig
> @@ -4,7 +4,7 @@ config VIDEO_IPU3_IMGU
>   depends on PCI && VIDEO_V4L2
>   depends on MEDIA_CONTROLLER && VIDEO_V4L2_SUBDEV_API
>   depends on X86
> - select IOMMU_IOVA
> + select IOMMU_IOVA if IOMMU_SUPPORT

This doesn't seem right: the ipu3-cio2 driver needs IOMMU_IOVA
independently of IOMMU_SUPPORT.

Looking at tegra-vde, it seems to depend on IOMMU_SUPPORT but that's not
declared in its Kconfig entry. I wonder if adding that would be the right
way to fix this.

Cc'ing the IOMMU list.

>   select VIDEOBUF2_DMA_SG
>   help
> This is the Video4Linux2 driver for Intel IPU3 image processing unit,

-- 
Regards,

Sakari Ailus
sakari.ai...@linux.intel.com


Re: [RFC PATCH v2 00/19] Try to reduce lock contention on the SMMUv3 command queue

2019-07-24 Thread John Garry

On 11/07/2019 18:19, Will Deacon wrote:

Hi everyone,

This is a significant rework of the RFC I previously posted here:

  https://lkml.kernel.org/r/20190611134603.4253-1-will.dea...@arm.com

But this time, it looks like it might actually be worthwhile according
to my perf profiles, where __iommu_unmap() falls a long way down the
profile for a multi-threaded netperf run. I'm still relying on others to
confirm this is useful, however.

Some of the changes since last time are:

  * Support for constructing and submitting a list of commands in the
driver

  * Numerous changes to the IOMMU and io-pgtable APIs so that we can
submit commands in batches

  * Removal of cmpxchg() from cmdq_shared_lock() fast-path

  * Code restructuring and cleanups

This current applies against my iommu/devel branch that Joerg has pulled
for 5.3. If you want to test it out, I've put everything here:

  
https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=iommu/cmdq

Feedback welcome. I appreciate that we're in the merge window, but I
wanted to get this on the list for people to look at as an RFC.



I tested storage performance on this series, which I think is a better 
scenario to test than network performance, that being generally limited 
by the network link speed.


Results:

Baseline performance (will/iommu/devel, commit 9e6ea59f3)
8x SAS disks D05839K IOPS
1x NVMe D05 454K IOPS
1x NVMe D06 442k IOPS

Patchset performance (will/iommu/cmdq)
8x SAS disk D05 835K IOPS
1x NVMe D05 472K IOPS
1x NVMe D06 459k IOPS

So we see a bit of an NVMe boost, but about the same for 8x disks. With no
IOMMU, performance is about 918K IOPS for 8x disks, so it is not limited by
the medium.


The D06 is a bit memory starved, so that may account for generally lower 
NVMe performance.


John


Cheers,

Will

--->8

Cc: Jean-Philippe Brucker 
Cc: Robin Murphy 
Cc: Jayachandran Chandrasekharan Nair 
Cc: Jan Glauber 
Cc: Jon Masters 
Cc: Eric Auger 
Cc: Zhen Lei 
Cc: Jonathan Cameron 
Cc: Vijay Kilary 
Cc: Joerg Roedel 
Cc: John Garry 
Cc: Alex Williamson 

Will Deacon (19):
  iommu: Remove empty iommu_tlb_range_add() callback from iommu_ops
  iommu/io-pgtable-arm: Remove redundant call to io_pgtable_tlb_sync()
  iommu/io-pgtable: Rename iommu_gather_ops to iommu_flush_ops
  iommu: Introduce struct iommu_iotlb_gather for batching TLB flushes
  iommu: Introduce iommu_iotlb_gather_add_page()
  iommu: Pass struct iommu_iotlb_gather to ->unmap() and ->iotlb_sync()
  iommu/io-pgtable: Introduce tlb_flush_walk() and tlb_flush_leaf()
  iommu/io-pgtable: Hook up ->tlb_flush_walk() and ->tlb_flush_leaf() in
drivers
  iommu/io-pgtable-arm: Call ->tlb_flush_walk() and ->tlb_flush_leaf()
  iommu/io-pgtable: Replace ->tlb_add_flush() with ->tlb_add_page()
  iommu/io-pgtable: Remove unused ->tlb_sync() callback
  iommu/io-pgtable: Pass struct iommu_iotlb_gather to ->unmap()
  iommu/io-pgtable: Pass struct iommu_iotlb_gather to ->tlb_add_page()
  iommu/arm-smmu-v3: Separate s/w and h/w views of prod and cons indexes
  iommu/arm-smmu-v3: Drop unused 'q' argument from Q_OVF macro
  iommu/arm-smmu-v3: Move low-level queue fields out of arm_smmu_queue
  iommu/arm-smmu-v3: Operate directly on low-level queue where possible
  iommu/arm-smmu-v3: Reduce contention during command-queue insertion
  iommu/arm-smmu-v3: Defer TLB invalidation until ->iotlb_sync()

 drivers/gpu/drm/panfrost/panfrost_mmu.c |  24 +-
 drivers/iommu/amd_iommu.c   |  11 +-
 drivers/iommu/arm-smmu-v3.c | 856 
 drivers/iommu/arm-smmu.c| 103 +++-
 drivers/iommu/dma-iommu.c   |   9 +-
 drivers/iommu/exynos-iommu.c|   3 +-
 drivers/iommu/intel-iommu.c |   3 +-
 drivers/iommu/io-pgtable-arm-v7s.c  |  57 +--
 drivers/iommu/io-pgtable-arm.c  |  48 +-
 drivers/iommu/iommu.c   |  24 +-
 drivers/iommu/ipmmu-vmsa.c  |  28 +-
 drivers/iommu/msm_iommu.c   |  42 +-
 drivers/iommu/mtk_iommu.c   |  45 +-
 drivers/iommu/mtk_iommu_v1.c|   3 +-
 drivers/iommu/omap-iommu.c  |   2 +-
 drivers/iommu/qcom_iommu.c  |  44 +-
 drivers/iommu/rockchip-iommu.c  |   2 +-
 drivers/iommu/s390-iommu.c  |   3 +-
 drivers/iommu/tegra-gart.c  |  12 +-
 drivers/iommu/tegra-smmu.c  |   2 +-
 drivers/vfio/vfio_iommu_type1.c |  27 +-
 include/linux/io-pgtable.h  |  57 ++-
 include/linux/iommu.h   |  92 +++-
 23 files changed, 1090 insertions(+), 407 deletions(-)






Re: [RFC PATCH v2 18/19] iommu/arm-smmu-v3: Reduce contention during command-queue insertion

2019-07-24 Thread John Garry

On 11/07/2019 18:19, Will Deacon wrote:

+static int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
+  u64 *cmds, int n, bool sync)
+{
+   u64 cmd_sync[CMDQ_ENT_DWORDS];
+   u32 prod;
unsigned long flags;
-   bool wfe = !!(smmu->features & ARM_SMMU_FEAT_SEV);
-   struct arm_smmu_cmdq_ent ent = { .opcode = CMDQ_OP_CMD_SYNC };
-   int ret;
+   bool owner;
+   struct arm_smmu_cmdq *cmdq = &smmu->cmdq;
+   struct arm_smmu_ll_queue llq = {
+   .max_n_shift = cmdq->q.llq.max_n_shift,
+   }, head = llq;
+   int ret = 0;

-   arm_smmu_cmdq_build_cmd(cmd, &ent);
+   /* 1. Allocate some space in the queue */
+   local_irq_save(flags);
+   llq.val = READ_ONCE(cmdq->q.llq.val);
+   do {
+   u64 old;
+
+   while (!queue_has_space(&llq, n + sync)) {
+   local_irq_restore(flags);
+   if (arm_smmu_cmdq_poll_until_not_full(smmu, &llq))
+   dev_err_ratelimited(smmu->dev, "CMDQ timeout\n");
+   local_irq_save(flags);
+   }
+
+   head.cons = llq.cons;
+   head.prod = queue_inc_prod_n(&llq, n + sync) |
+CMDQ_PROD_OWNED_FLAG;
+
+   old = cmpxchg_relaxed(&cmdq->q.llq.val, llq.val, head.val);


I added some basic debug to the stress test on your branch, and this 
cmpxchg was failing ~10 times on average on my D06.


So we're not using the spinlock now, but this cmpxchg may lack fairness.

Since we're batching commands, I wonder if it's better to restore the 
spinlock and send batched commands + CMD_SYNC under the lock, and then 
wait for the CMD_SYNC completion outside the lock.


I don't know if it improves the queue contention, but at least the prod
pointer would more closely track the issued commands, such that we're
not waiting to kick off many gathered batches of commands while the
SMMU HW may be idle (in terms of command processing).
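
Roughly, in untested pseudocode (the helper and lock names here are made up
purely to illustrate the idea):

static int cmdq_issue_cmdlist_locked(struct arm_smmu_device *smmu,
                                     u64 *cmds, int n)
{
        struct arm_smmu_cmdq *cmdq = &smmu->cmdq;
        unsigned long flags;
        u32 sync_idx;

        /* Publish the batch plus its CMD_SYNC under the lock, so prod
         * closely tracks what has actually been issued to the SMMU.
         */
        spin_lock_irqsave(&cmdq->slock, flags);         /* hypothetical lock */
        sync_idx = cmdq_reserve_and_write(cmdq, cmds, n);
        cmdq_update_prod_reg(cmdq);
        spin_unlock_irqrestore(&cmdq->slock, flags);

        /* ... but wait for the CMD_SYNC completion with the lock dropped. */
        return cmdq_poll_for_sync(cmdq, sync_idx);
}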


Cheers,
John


+   if (old == llq.val)
+   break;
+
+   llq.val = old;
+   } while (1);
+   owner = !(llq.prod & CMDQ_PROD_OWNED_FLAG);





Re: [RFC PATCH v2 04/19] iommu: Introduce struct iommu_iotlb_gather for batching TLB flushes

2019-07-24 Thread Will Deacon
Hi Joerg,

On Wed, Jul 24, 2019 at 09:19:59AM +0200, Joerg Roedel wrote:
> On Thu, Jul 11, 2019 at 06:19:12PM +0100, Will Deacon wrote:
> >  static void __iommu_dma_unmap(struct iommu_domain *domain, dma_addr_t 
> > dma_addr,
> > size_t size)
> >  {
> > +   struct iommu_iotlb_gather iotlb_gather;
> > struct iommu_dma_cookie *cookie = domain->iova_cookie;
> > struct iova_domain *iovad = &cookie->iovad;
> > size_t iova_off = iova_offset(iovad, dma_addr);
> > +   size_t unmapped;
> >  
> > dma_addr -= iova_off;
> > size = iova_align(iovad, size + iova_off);
> > +   iommu_iotlb_gather_init(&iotlb_gather);
> > +
> > +   unmapped = iommu_unmap_fast(domain, dma_addr, size, &iotlb_gather);
> > +   WARN_ON(unmapped != size);
> >  
> > -   WARN_ON(iommu_unmap_fast(domain, dma_addr, size) != size);
> > if (!cookie->fq_domain)
> > -   iommu_tlb_sync(domain);
> > +   iommu_tlb_sync(domain, &iotlb_gather);
> > iommu_dma_free_iova(cookie, dma_addr, size);
> 
> I looked through your patches and was wondering if we can't make the
> 'struct iotlb_gather' a member of 'struct iommu_domain' and update it
> transparently for the user? That would make things easier for users of
> the iommu-api.

Thanks for having a look.

My first (private) attempt at this series did exactly what you suggest, but
the problem there is that you can have concurrent, disjoint map()/unmap()
operations on the same 'struct iommu_domain', so you end up needing either
multiple 'iotlb_gather' structures to support these callers or you end up
massively over-invalidating. Worse, you can't just make things per-cpu
without disabling preemption, so whatever you do you end up re-introducing
synchronization to maintain these things. The /huge/ advantage of getting
the caller to allocate the 'iotlb_gather' structure on their stack is that
you don't have to worry about object lifetime at all, so all of the
synchronization disappears. There is also precedent for this when unmapping
things on the CPU side -- see the use of 'struct mmu_gather' and
tlb_gather_mmu() in unmap_region() [mm/mmap.c], for example.
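
For reference, the shape of that precedent is roughly the following, with the
gather state living on the caller's stack for the whole operation (simplified
from mm/mmap.c; not the exact upstream code):

static void unmap_region_sketch(struct mm_struct *mm,
                                struct vm_area_struct *vma,
                                unsigned long start, unsigned long end)
{
        struct mmu_gather tlb;

        tlb_gather_mmu(&tlb, mm, start, end);
        unmap_vmas(&tlb, vma, start, end);
        free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
        tlb_finish_mmu(&tlb, start, end);
}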

There aren't tonnes of (direct) users of the iommu-api, and the additional
complexity introduced by the 'struct iotlb_gather' only applies to users of
iommu_unmap_fast(), which I think is a reasonable trade-off in return for
the potential for improved performance.

Will


Re: [RFC PATCH v2 04/19] iommu: Introduce struct iommu_iotlb_gather for batching TLB flushes

2019-07-24 Thread Joerg Roedel
Hi Will,

On Thu, Jul 11, 2019 at 06:19:12PM +0100, Will Deacon wrote:
>  static void __iommu_dma_unmap(struct iommu_domain *domain, dma_addr_t 
> dma_addr,
>   size_t size)
>  {
> + struct iommu_iotlb_gather iotlb_gather;
>   struct iommu_dma_cookie *cookie = domain->iova_cookie;
>   struct iova_domain *iovad = &cookie->iovad;
>   size_t iova_off = iova_offset(iovad, dma_addr);
> + size_t unmapped;
>  
>   dma_addr -= iova_off;
>   size = iova_align(iovad, size + iova_off);
> + iommu_iotlb_gather_init(&iotlb_gather);
> +
> + unmapped = iommu_unmap_fast(domain, dma_addr, size, &iotlb_gather);
> + WARN_ON(unmapped != size);
>  
> - WARN_ON(iommu_unmap_fast(domain, dma_addr, size) != size);
>   if (!cookie->fq_domain)
> - iommu_tlb_sync(domain);
> + iommu_tlb_sync(domain, &iotlb_gather);
>   iommu_dma_free_iova(cookie, dma_addr, size);

I looked through your patches and was wondering if we can't make the
'struct iotlb_gather' a member of 'struct iommu_domain' and update it
transparently for the user? That would make things easier for users of
the iommu-api.

Regards,

Joerg