Re: [PATCH v8 4/5] block: add a helper function to merge the segments
On Tue, Jul 23, 2019 at 02:26:47PM +0900, Yoshihiro Shimoda wrote: > This patch adds a helper function to check whether a queue can merge > the segments by the DMA MAP layer (e.g. via IOMMU). > > Signed-off-by: Yoshihiro Shimoda > Reviewed-by: Christoph Hellwig Reviewed-by: Simon Horman ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v8 2/5] iommu/dma: Add a new dma_map_ops of get_merge_boundary()
On Tue, Jul 23, 2019 at 11:17:19AM +0300, Sergei Shtylyov wrote: > Hello! > > On 23.07.2019 8:26, Yoshihiro Shimoda wrote: > > > This patch adds a new dma_map_ops of get_merge_boundary() to > > expose the DMA merge boundary if the domain type is IOMMU_DOMAIN_DMA. > > > > Signed-off-by: Yoshihiro Shimoda Sergei's comment notwithstanding, Reviewed-by: Simon Horman > > --- > > drivers/iommu/dma-iommu.c | 11 +++ > > 1 file changed, 11 insertions(+) > > > > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c > > index a7f9c3e..f3e5f2b 100644 > > --- a/drivers/iommu/dma-iommu.c > > +++ b/drivers/iommu/dma-iommu.c > > @@ -1085,6 +1085,16 @@ static int iommu_dma_get_sgtable(struct device *dev, > > struct sg_table *sgt, > > return ret; > > } > > +static unsigned long iommu_dma_get_merge_boundary(struct device *dev) > > +{ > > + struct iommu_domain *domain = iommu_get_dma_domain(dev); > > + > > + if (domain->type != IOMMU_DOMAIN_DMA) > > + return 0; /* can't merge */ > > + > > + return (1 << __ffs(domain->pgsize_bitmap)) - 1; > >Not 1UL? > > > +} > > + > > static const struct dma_map_ops iommu_dma_ops = { > > .alloc = iommu_dma_alloc, > > .free = iommu_dma_free, > [...] > > MBR, Sergei >
Re: [PATCH v8 5/5] mmc: queue: Use bigger segments if DMA MAP layer can merge the segments
On Tue, Jul 23, 2019 at 02:26:48PM +0900, Yoshihiro Shimoda wrote: > When the max_segs of a mmc host is smaller than 512, the mmc > subsystem tries to use 512 segments if the DMA MAP layer can merge > the segments, and then the mmc subsystem exposes such information > to the block layer by using blk_queue_can_use_dma_map_merging(). > > Signed-off-by: Yoshihiro Shimoda > Reviewed-by: Christoph Hellwig > Reviewed-by: Ulf Hansson Reviewed-by: Simon Horman
Re: [PATCH v8 3/5] block: sort headers on blk-settings.c
On Tue, Jul 23, 2019 at 02:26:46PM +0900, Yoshihiro Shimoda wrote: > This patch sorts the headers in alphabetical order to ease > the maintenance for this part. > > Signed-off-by: Yoshihiro Shimoda > Reviewed-by: Wolfram Sang Reviewed-by: Simon Horman
Re: [PATCH v8 1/5] dma: Introduce dma_get_merge_boundary()
On Tue, Jul 23, 2019 at 02:26:44PM +0900, Yoshihiro Shimoda wrote: > This patch adds a new DMA API "dma_get_merge_boundary". This function > returns the DMA merge boundary if the DMA layer can merge the segments. > This patch also adds the implementation for a new dma_map_ops pointer. > > Signed-off-by: Yoshihiro Shimoda > Reviewed-by: Christoph Hellwig Reviewed-by: Simon Horman
[PATCH 4/5] dma-mapping: provide a better default ->get_required_mask
Most dma_map_ops instances are IOMMUs that work perfectly fine in 32-bits of IOVA space, and the generic direct mapping code already provides its own routines that is intelligent based on the amount of memory actually present. Wire up the dma-direct routine for the ARM direct mapping code as well, and otherwise default to the constant 32-bit mask. This way we only need to override it for the occasional odd IOMMU that requires 64-bit IOVA support, or IOMMU drivers that are more efficient if they can fall back to the direct mapping. Signed-off-by: Christoph Hellwig --- arch/arm/mm/dma-mapping.c | 3 +++ arch/powerpc/platforms/ps3/system-bus.c | 7 -- arch/x86/kernel/amd_gart_64.c | 1 + kernel/dma/mapping.c| 30 + 4 files changed, 14 insertions(+), 27 deletions(-) diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c index 4410af33c5c4..9c9a23e5600d 100644 --- a/arch/arm/mm/dma-mapping.c +++ b/arch/arm/mm/dma-mapping.c @@ -14,6 +14,7 @@ #include #include #include +#include #include #include #include @@ -192,6 +193,7 @@ const struct dma_map_ops arm_dma_ops = { .sync_sg_for_cpu= arm_dma_sync_sg_for_cpu, .sync_sg_for_device = arm_dma_sync_sg_for_device, .dma_supported = arm_dma_supported, + .get_required_mask = dma_direct_get_required_mask, }; EXPORT_SYMBOL(arm_dma_ops); @@ -212,6 +214,7 @@ const struct dma_map_ops arm_coherent_dma_ops = { .map_sg = arm_dma_map_sg, .map_resource = dma_direct_map_resource, .dma_supported = arm_dma_supported, + .get_required_mask = dma_direct_get_required_mask, }; EXPORT_SYMBOL(arm_coherent_dma_ops); diff --git a/arch/powerpc/platforms/ps3/system-bus.c b/arch/powerpc/platforms/ps3/system-bus.c index 70fcc9736a8c..3542b7bd6a46 100644 --- a/arch/powerpc/platforms/ps3/system-bus.c +++ b/arch/powerpc/platforms/ps3/system-bus.c @@ -686,18 +686,12 @@ static int ps3_dma_supported(struct device *_dev, u64 mask) return mask >= DMA_BIT_MASK(32); } -static u64 ps3_dma_get_required_mask(struct device *_dev) -{ - return DMA_BIT_MASK(32); -} - 
static const struct dma_map_ops ps3_sb_dma_ops = { .alloc = ps3_alloc_coherent, .free = ps3_free_coherent, .map_sg = ps3_sb_map_sg, .unmap_sg = ps3_sb_unmap_sg, .dma_supported = ps3_dma_supported, - .get_required_mask = ps3_dma_get_required_mask, .map_page = ps3_sb_map_page, .unmap_page = ps3_unmap_page, .mmap = dma_common_mmap, @@ -710,7 +704,6 @@ static const struct dma_map_ops ps3_ioc0_dma_ops = { .map_sg = ps3_ioc0_map_sg, .unmap_sg = ps3_ioc0_unmap_sg, .dma_supported = ps3_dma_supported, - .get_required_mask = ps3_dma_get_required_mask, .map_page = ps3_ioc0_map_page, .unmap_page = ps3_unmap_page, .mmap = dma_common_mmap, diff --git a/arch/x86/kernel/amd_gart_64.c b/arch/x86/kernel/amd_gart_64.c index a65b4a9c7f87..d02662238b57 100644 --- a/arch/x86/kernel/amd_gart_64.c +++ b/arch/x86/kernel/amd_gart_64.c @@ -680,6 +680,7 @@ static const struct dma_map_ops gart_dma_ops = { .dma_supported = dma_direct_supported, .mmap = dma_common_mmap, .get_sgtable= dma_common_get_sgtable, + .get_required_mask = dma_direct_get_required_mask, }; static void gart_iommu_shutdown(void) diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c index cdb157cd70e7..7dff1829c8c5 100644 --- a/kernel/dma/mapping.c +++ b/kernel/dma/mapping.c @@ -231,25 +231,6 @@ int dma_mmap_attrs(struct device *dev, struct vm_area_struct *vma, } EXPORT_SYMBOL(dma_mmap_attrs); -static u64 dma_default_get_required_mask(struct device *dev) -{ - u32 low_totalram = ((max_pfn - 1) << PAGE_SHIFT); - u32 high_totalram = ((max_pfn - 1) >> (32 - PAGE_SHIFT)); - u64 mask; - - if (!high_totalram) { - /* convert to mask just covering totalram */ - low_totalram = (1 << (fls(low_totalram) - 1)); - low_totalram += low_totalram - 1; - mask = low_totalram; - } else { - high_totalram = (1 << (fls(high_totalram) - 1)); - high_totalram += high_totalram - 1; - mask = (((u64)high_totalram) << 32) + 0x; - } - return mask; -} - u64 dma_get_required_mask(struct device *dev) { const struct dma_map_ops *ops = get_dma_ops(dev); @@ 
-258,7 +239,16 @@ u64 dma_get_required_mask(struct device *dev) return dma_direct_get_required_mask(dev); if (ops->get_required_mask) return ops->get_required_mask(dev); - return dma_default_get_required_mask(dev); + + /* +* We require every DMA ops implementation to at least support a 32-bit
remove default fallbacks in dma_map_ops
Hi all, we have a few places where the DMA mapping layer has non-trivial default actions that are questionable and/or dangerous. This series instead wires up the mmap, get_sgtable and get_required_mask methods explicitly and cleans up some surrounding areas. This also means we could get rid of the ARCH_NO_COHERENT_DMA_MMAP kconfig option, as we now require a mmap method wired up, or in case of non-coherent dma-direct the presence of the arch_dma_coherent_to_pfn hook. The only interesting case is that the sound code also checked the ARCH_NO_COHERENT_DMA_MMAP symbol in somewhat odd ways, so I'd like to see a review of the sound situation before going forward with that patch.
[PATCH 5/5] dma-mapping: remove ARCH_NO_COHERENT_DMA_MMAP
Now that we never use a default ->mmap implementation, and non-coherent architectures can control the presence of ->mmap support by enabling ARCH_HAS_DMA_COHERENT_TO_PFN for the dma direct implementation there is no need for a global config option to control the availability of dma_common_mmap. Signed-off-by: Christoph Hellwig --- arch/Kconfig| 3 --- arch/c6x/Kconfig| 1 - arch/m68k/Kconfig | 1 - arch/microblaze/Kconfig | 1 - arch/parisc/Kconfig | 1 - arch/sh/Kconfig | 1 - arch/xtensa/Kconfig | 1 - kernel/dma/mapping.c| 4 sound/core/pcm_native.c | 10 +- 9 files changed, 1 insertion(+), 22 deletions(-) diff --git a/arch/Kconfig b/arch/Kconfig index a7b57dd42c26..ec2834206d08 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -790,9 +790,6 @@ config COMPAT_32BIT_TIME This is relevant on all 32-bit architectures, and 64-bit architectures as part of compat syscall handling. -config ARCH_NO_COHERENT_DMA_MMAP - bool - config ARCH_NO_PREEMPT bool diff --git a/arch/c6x/Kconfig b/arch/c6x/Kconfig index b4fb61c83494..e65e8d82442a 100644 --- a/arch/c6x/Kconfig +++ b/arch/c6x/Kconfig @@ -20,7 +20,6 @@ config C6X select OF_EARLY_FLATTREE select GENERIC_CLOCKEVENTS select MODULES_USE_ELF_RELA - select ARCH_NO_COHERENT_DMA_MMAP select MMU_GATHER_NO_RANGE if MMU config MMU diff --git a/arch/m68k/Kconfig b/arch/m68k/Kconfig index c518d695c376..614b355ae338 100644 --- a/arch/m68k/Kconfig +++ b/arch/m68k/Kconfig @@ -8,7 +8,6 @@ config M68K select ARCH_HAS_DMA_PREP_COHERENT if HAS_DMA && MMU && !COLDFIRE select ARCH_HAS_SYNC_DMA_FOR_DEVICE if HAS_DMA select ARCH_MIGHT_HAVE_PC_PARPORT if ISA - select ARCH_NO_COHERENT_DMA_MMAP if !MMU select ARCH_NO_PREEMPT if !COLDFIRE select BINFMT_FLAT_ARGVP_ENVP_ON_STACK select DMA_DIRECT_REMAP if HAS_DMA && MMU && !COLDFIRE diff --git a/arch/microblaze/Kconfig b/arch/microblaze/Kconfig index d411de05b628..632c9477a0f6 100644 --- a/arch/microblaze/Kconfig +++ b/arch/microblaze/Kconfig @@ -9,7 +9,6 @@ config MICROBLAZE select ARCH_HAS_SYNC_DMA_FOR_CPU 
select ARCH_HAS_SYNC_DMA_FOR_DEVICE select ARCH_MIGHT_HAVE_PC_PARPORT - select ARCH_NO_COHERENT_DMA_MMAP if !MMU select ARCH_WANT_IPC_PARSE_VERSION select BUILDTIME_EXTABLE_SORT select TIMER_OF diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig index 6d732e451071..e9dd88b7f81e 100644 --- a/arch/parisc/Kconfig +++ b/arch/parisc/Kconfig @@ -52,7 +52,6 @@ config PARISC select GENERIC_SCHED_CLOCK select HAVE_UNSTABLE_SCHED_CLOCK if SMP select GENERIC_CLOCKEVENTS - select ARCH_NO_COHERENT_DMA_MMAP select CPU_NO_EFFICIENT_FFS select NEED_DMA_MAP_STATE select NEED_SG_DMA_LENGTH diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig index 6b1b5941b618..f356ee674d89 100644 --- a/arch/sh/Kconfig +++ b/arch/sh/Kconfig @@ -5,7 +5,6 @@ config SUPERH select ARCH_HAS_PTE_SPECIAL select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST select ARCH_MIGHT_HAVE_PC_PARPORT - select ARCH_NO_COHERENT_DMA_MMAP if !MMU select HAVE_PATA_PLATFORM select CLKDEV_LOOKUP select DMA_DECLARE_COHERENT diff --git a/arch/xtensa/Kconfig b/arch/xtensa/Kconfig index ebc135bda921..70653aed3005 100644 --- a/arch/xtensa/Kconfig +++ b/arch/xtensa/Kconfig @@ -5,7 +5,6 @@ config XTENSA select ARCH_HAS_BINFMT_FLAT if !MMU select ARCH_HAS_SYNC_DMA_FOR_CPU select ARCH_HAS_SYNC_DMA_FOR_DEVICE - select ARCH_NO_COHERENT_DMA_MMAP if !MMU select ARCH_USE_QUEUED_RWLOCKS select ARCH_USE_QUEUED_SPINLOCKS select ARCH_WANT_FRAME_POINTERS diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c index 7dff1829c8c5..815446f76995 100644 --- a/kernel/dma/mapping.c +++ b/kernel/dma/mapping.c @@ -169,7 +169,6 @@ int dma_common_mmap(struct device *dev, struct vm_area_struct *vma, void *cpu_addr, dma_addr_t dma_addr, size_t size, unsigned long attrs) { -#ifndef CONFIG_ARCH_NO_COHERENT_DMA_MMAP unsigned long user_count = vma_pages(vma); unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT; unsigned long off = vma->vm_pgoff; @@ -198,9 +197,6 @@ int dma_common_mmap(struct device *dev, struct vm_area_struct *vma, return 
remap_pfn_range(vma, vma->vm_start, pfn + vma->vm_pgoff, user_count << PAGE_SHIFT, vma->vm_page_prot); -#else - return -ENXIO; -#endif /* !CONFIG_ARCH_NO_COHERENT_DMA_MMAP */ } /** diff --git a/sound/core/pcm_native.c b/sound/core/pcm_native.c index 860543a4c840..2dadc708343a 100644 --- a/sound/core/pcm_native.c +++ b/sound/core/pcm_native.c @@ -218,15 +218,7 @@ int snd_pcm_info_user(struct snd_pcm_substream *substream, static bool hw_support_mmap(struct snd_pcm_substream
[PATCH 2/5] dma-mapping: move the dma_get_sgtable API comments from arm to common code
The comments are spot on and should be near the central API, not just near a single implementation. Signed-off-by: Christoph Hellwig --- arch/arm/mm/dma-mapping.c | 11 --- kernel/dma/mapping.c | 11 +++ 2 files changed, 11 insertions(+), 11 deletions(-) diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c index 6774b03aa405..4410af33c5c4 100644 --- a/arch/arm/mm/dma-mapping.c +++ b/arch/arm/mm/dma-mapping.c @@ -877,17 +877,6 @@ static void arm_coherent_dma_free(struct device *dev, size_t size, void *cpu_add __arm_dma_free(dev, size, cpu_addr, handle, attrs, true); } -/* - * The whole dma_get_sgtable() idea is fundamentally unsafe - it seems - * that the intention is to allow exporting memory allocated via the - * coherent DMA APIs through the dma_buf API, which only accepts a - * scattertable. This presents a couple of problems: - * 1. Not all memory allocated via the coherent DMA APIs is backed by - *a struct page - * 2. Passing coherent DMA memory into the streaming APIs is not allowed - *as we will try to flush the memory through a different alias to that - *actually being used (and the flushes are redundant.) - */ int arm_dma_get_sgtable(struct device *dev, struct sg_table *sgt, void *cpu_addr, dma_addr_t handle, size_t size, unsigned long attrs) diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c index b945239621d8..4ceb5b9016d8 100644 --- a/kernel/dma/mapping.c +++ b/kernel/dma/mapping.c @@ -136,6 +136,17 @@ int dma_common_get_sgtable(struct device *dev, struct sg_table *sgt, return ret; } +/* + * The whole dma_get_sgtable() idea is fundamentally unsafe - it seems + * that the intention is to allow exporting memory allocated via the + * coherent DMA APIs through the dma_buf API, which only accepts a + * scattertable. This presents a couple of problems: + * 1. Not all memory allocated via the coherent DMA APIs is backed by + *a struct page + * 2. 
Passing coherent DMA memory into the streaming APIs is not allowed + *as we will try to flush the memory through a different alias to that + *actually being used (and the flushes are redundant.) + */ int dma_get_sgtable_attrs(struct device *dev, struct sg_table *sgt, void *cpu_addr, dma_addr_t dma_addr, size_t size, unsigned long attrs) -- 2.20.1
[PATCH 1/5] m68knommu: add a pgprot_noncached stub
Provide a pgprot_noncached stub like all the other nommu ports so that common code can rely on it being present. Note that this is generally code that is not actually run on nommu, but at least we can avoid nasty ifdefs by having a stub. Signed-off-by: Christoph Hellwig --- arch/m68k/include/asm/pgtable_no.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/m68k/include/asm/pgtable_no.h b/arch/m68k/include/asm/pgtable_no.h index fc3a96c77bd8..06194c7ba151 100644 --- a/arch/m68k/include/asm/pgtable_no.h +++ b/arch/m68k/include/asm/pgtable_no.h @@ -29,6 +29,8 @@ #define PAGE_READONLY __pgprot(0) #define PAGE_KERNEL__pgprot(0) +#define pgprot_noncached(prot) (prot) + extern void paging_init(void); #define swapper_pg_dir ((pgd_t *) 0) -- 2.20.1
[PATCH 3/5] dma-mapping: explicitly wire up ->mmap and ->get_sgtable
While the default ->mmap and ->get_sgtable implementations work for the majority of our dma_map_ops implementations they are inherently unsafe for others that don't use the page allocator or CMA and/or use their own way of remapping not covered by the common code. So remove the defaults if these methods are not wired up, but instead wire up the default implementations for all safe instances. Fixes: e1c7e324539a ("dma-mapping: always provide the dma_map_ops based implementation") Signed-off-by: Christoph Hellwig --- arch/alpha/kernel/pci_iommu.c | 2 ++ arch/ia64/hp/common/sba_iommu.c | 2 ++ arch/ia64/sn/pci/pci_dma.c | 2 ++ arch/mips/jazz/jazzdma.c| 2 ++ arch/powerpc/kernel/dma-iommu.c | 2 ++ arch/powerpc/platforms/ps3/system-bus.c | 4 arch/powerpc/platforms/pseries/vio.c| 2 ++ arch/s390/pci/pci_dma.c | 2 ++ arch/sparc/kernel/iommu.c | 2 ++ arch/sparc/kernel/pci_sun4v.c | 2 ++ arch/x86/kernel/amd_gart_64.c | 2 ++ arch/x86/kernel/pci-calgary_64.c| 2 ++ drivers/iommu/amd_iommu.c | 2 ++ drivers/iommu/intel-iommu.c | 2 ++ kernel/dma/mapping.c| 20 15 files changed, 42 insertions(+), 8 deletions(-) diff --git a/arch/alpha/kernel/pci_iommu.c b/arch/alpha/kernel/pci_iommu.c index 242108439f42..7f1925a32c99 100644 --- a/arch/alpha/kernel/pci_iommu.c +++ b/arch/alpha/kernel/pci_iommu.c @@ -955,5 +955,7 @@ const struct dma_map_ops alpha_pci_ops = { .map_sg = alpha_pci_map_sg, .unmap_sg = alpha_pci_unmap_sg, .dma_supported = alpha_pci_supported, + .mmap = dma_common_mmap, + .get_sgtable= dma_common_get_sgtable, }; EXPORT_SYMBOL(alpha_pci_ops); diff --git a/arch/ia64/hp/common/sba_iommu.c b/arch/ia64/hp/common/sba_iommu.c index 3d24cc43385b..4c0ea6c2833d 100644 --- a/arch/ia64/hp/common/sba_iommu.c +++ b/arch/ia64/hp/common/sba_iommu.c @@ -2183,6 +2183,8 @@ const struct dma_map_ops sba_dma_ops = { .map_sg = sba_map_sg_attrs, .unmap_sg = sba_unmap_sg_attrs, .dma_supported = sba_dma_supported, + .mmap = dma_common_mmap, + .get_sgtable= dma_common_get_sgtable, }; void
sba_dma_init(void) diff --git a/arch/ia64/sn/pci/pci_dma.c b/arch/ia64/sn/pci/pci_dma.c index b7d42e4edc1f..12ffb9c0d738 100644 --- a/arch/ia64/sn/pci/pci_dma.c +++ b/arch/ia64/sn/pci/pci_dma.c @@ -438,6 +438,8 @@ static struct dma_map_ops sn_dma_ops = { .unmap_sg = sn_dma_unmap_sg, .dma_supported = sn_dma_supported, .get_required_mask = sn_dma_get_required_mask, + .mmap = dma_common_mmap, + .get_sgtable= dma_common_get_sgtable, }; void sn_dma_init(void) diff --git a/arch/mips/jazz/jazzdma.c b/arch/mips/jazz/jazzdma.c index 1804dc9d8136..a01e14955187 100644 --- a/arch/mips/jazz/jazzdma.c +++ b/arch/mips/jazz/jazzdma.c @@ -682,5 +682,7 @@ const struct dma_map_ops jazz_dma_ops = { .sync_sg_for_device = jazz_dma_sync_sg_for_device, .dma_supported = dma_direct_supported, .cache_sync = arch_dma_cache_sync, + .mmap = dma_common_mmap, + .get_sgtable= dma_common_get_sgtable, }; EXPORT_SYMBOL(jazz_dma_ops); diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c index a0879674a9c8..2f5a53874f6d 100644 --- a/arch/powerpc/kernel/dma-iommu.c +++ b/arch/powerpc/kernel/dma-iommu.c @@ -208,4 +208,6 @@ const struct dma_map_ops dma_iommu_ops = { .sync_single_for_device = dma_iommu_sync_for_device, .sync_sg_for_cpu= dma_iommu_sync_sg_for_cpu, .sync_sg_for_device = dma_iommu_sync_sg_for_device, + .mmap = dma_common_mmap, + .get_sgtable= dma_common_get_sgtable, }; diff --git a/arch/powerpc/platforms/ps3/system-bus.c b/arch/powerpc/platforms/ps3/system-bus.c index 98410119c47b..70fcc9736a8c 100644 --- a/arch/powerpc/platforms/ps3/system-bus.c +++ b/arch/powerpc/platforms/ps3/system-bus.c @@ -700,6 +700,8 @@ static const struct dma_map_ops ps3_sb_dma_ops = { .get_required_mask = ps3_dma_get_required_mask, .map_page = ps3_sb_map_page, .unmap_page = ps3_unmap_page, + .mmap = dma_common_mmap, + .get_sgtable = dma_common_get_sgtable, }; static const struct dma_map_ops ps3_ioc0_dma_ops = { @@ -711,6 +713,8 @@ static const struct dma_map_ops ps3_ioc0_dma_ops = { 
.get_required_mask = ps3_dma_get_required_mask, .map_page = ps3_ioc0_map_page, .unmap_page = ps3_unmap_page, + .mmap = dma_common_mmap, + .get_sgtable = dma_common_get_sgtable, }; /** diff --git a/arch/powerpc/platforms/pseries/vio.c
Re: [PATCH v5 02/10] iommu/vt-d: Use per-device dma_ops
> /* Check if the dev needs to go through non-identity map and unmap process.*/ > static bool iommu_need_mapping(struct device *dev) > { > - int ret; > - > if (iommu_dummy(dev)) > return false; > > - ret = identity_mapping(dev); > - if (ret) { > - u64 dma_mask = *dev->dma_mask; > - > - if (dev->coherent_dma_mask && dev->coherent_dma_mask < dma_mask) > - dma_mask = dev->coherent_dma_mask; > - > - if (dma_mask >= dma_get_required_mask(dev)) > - return false; Don't we need to keep this bit so that we still allow the IOMMU to act if the device has a too small DMA mask to address all memory in the system, even if it should otherwise be identity mapped?
Re: [PATCH v5 04/10] PCI: Add dev_is_untrusted helper
On Thu, Jul 25, 2019 at 11:17:11AM +0800, Lu Baolu wrote: > There are several places in the kernel where it is necessary to > check whether a device is a pci untrusted device. Add a helper > to simplify the callers. Looks good, Reviewed-by: Christoph Hellwig
Re: [PATCH v5 01/10] iommu/vt-d: Don't switch off swiotlb if use direct dma
Looks good: Reviewed-by: Christoph Hellwig
Re: [PATCH 2/2] virtio/virtio_ring: Fix the dma_max_mapping_size call
On Wed, Jul 24, 2019 at 06:10:53PM -0400, Michael S. Tsirkin wrote: > Christoph - would a documented API wrapping dma_mask make sense? > With the documentation explaining how users must > desist from using DMA APIs if that returns false ... We have some bigger changes in this area planned, including turning dma_mask into a scalar instead of a pointer, please stay tuned.
[PATCH v5 06/10] swiotlb: Zero out bounce buffer for untrusted device
This is necessary to avoid exposing valid kernel data to any malicious device. Suggested-by: Christoph Hellwig Signed-off-by: Lu Baolu --- kernel/dma/swiotlb.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index 43c88626a1f3..edc84a00b9f9 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -35,6 +35,7 @@ #include #include #include +#include #ifdef CONFIG_DEBUG_FS #include #endif @@ -562,6 +563,11 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, */ for (i = 0; i < nslots; i++) io_tlb_orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT); + + /* Zero out the bounce buffer if the consumer is untrusted. */ + if (dev_is_untrusted(hwdev)) + memset(phys_to_virt(tlb_addr), 0, alloc_size); + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) && (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)) swiotlb_bounce(orig_addr, tlb_addr, mapping_size, DMA_TO_DEVICE); -- 2.17.1
[PATCH v5 08/10] iommu/vt-d: Check whether device requires bounce buffer
This adds a helper to check whether a device needs to use a bounce buffer. It also provides a boot time option to disable the bounce buffer. Users can use this to prevent the iommu driver from using the bounce buffer for performance gain. Cc: Ashok Raj Cc: Jacob Pan Cc: Kevin Tian Signed-off-by: Lu Baolu Tested-by: Xu Pengfei Tested-by: Mika Westerberg --- Documentation/admin-guide/kernel-parameters.txt | 5 + drivers/iommu/intel-iommu.c | 6 ++ 2 files changed, 11 insertions(+) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 46b826fcb5ad..8628454bd8a2 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -1732,6 +1732,11 @@ Note that using this option lowers the security provided by tboot because it makes the system vulnerable to DMA attacks. + nobounce [Default off] + Disable bounce buffer for untrusted devices such as + the Thunderbolt devices. This will treat the untrusted + devices as the trusted ones, hence might expose security + risks of DMA attacks. intel_idle.max_cstate= [KNL,HW,ACPI,X86] 0 disables intel_idle and fall back on acpi_idle. diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index a458df975c55..4185406b0368 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -360,6 +360,7 @@ static int dmar_forcedac; static int intel_iommu_strict; static int intel_iommu_superpage = 1; static int iommu_identity_mapping; +static int intel_no_bounce; #define IDENTMAP_ALL 1 #define IDENTMAP_GFX 2 @@ -373,6 +374,8 @@ EXPORT_SYMBOL_GPL(intel_iommu_gfx_mapped); static DEFINE_SPINLOCK(device_domain_lock); static LIST_HEAD(device_domain_list); +#define device_needs_bounce(d) (!intel_no_bounce && dev_is_untrusted(d)) + /* * Iterate over elements in device_domain_list and call the specified * callback @fn against each element.
@@ -455,6 +458,9 @@ static int __init intel_iommu_setup(char *str) printk(KERN_INFO "Intel-IOMMU: not forcing on after tboot. This could expose security risk for tboot\n"); intel_iommu_tboot_noforce = 1; + } else if (!strncmp(str, "nobounce", 8)) { + pr_info("Intel-IOMMU: No bounce buffer. This could expose security risks of DMA attacks\n"); + intel_no_bounce = 1; } str += strcspn(str, ","); -- 2.17.1
[PATCH v5 04/10] PCI: Add dev_is_untrusted helper
There are several places in the kernel where it is necessary to check whether a device is a pci untrusted device. Add a helper to simplify the callers. Signed-off-by: Lu Baolu --- include/linux/pci.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/include/linux/pci.h b/include/linux/pci.h index 9e700d9f9f28..960352a75a10 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1029,6 +1029,7 @@ void pcibios_setup_bridge(struct pci_bus *bus, unsigned long type); void pci_sort_breadthfirst(void); #define dev_is_pci(d) ((d)->bus == &pci_bus_type) #define dev_is_pf(d) ((dev_is_pci(d) ? to_pci_dev(d)->is_physfn : false)) +#define dev_is_untrusted(d) ((dev_is_pci(d) ? to_pci_dev(d)->untrusted : false)) /* Generic PCI functions exported to card drivers */ @@ -1766,6 +1767,7 @@ static inline struct pci_dev *pci_dev_get(struct pci_dev *dev) { return NULL; } #define dev_is_pci(d) (false) #define dev_is_pf(d) (false) +#define dev_is_untrusted(d) (false) static inline bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags) { return false; } static inline int pci_irqd_intx_xlate(struct irq_domain *d, -- 2.17.1
[PATCH v5 09/10] iommu/vt-d: Add trace events for device dma map/unmap
This adds trace support for the Intel IOMMU driver. It also declares some events which could be used to trace the events when an IOVA is being mapped or unmapped in a domain. Cc: Ashok Raj Cc: Jacob Pan Cc: Kevin Tian Signed-off-by: Mika Westerberg Signed-off-by: Lu Baolu --- drivers/iommu/Makefile | 1 + drivers/iommu/intel-trace.c| 14 + include/trace/events/intel_iommu.h | 95 ++ 3 files changed, 110 insertions(+) create mode 100644 drivers/iommu/intel-trace.c create mode 100644 include/trace/events/intel_iommu.h diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile index f13f36ae1af6..bfe27b2755bd 100644 --- a/drivers/iommu/Makefile +++ b/drivers/iommu/Makefile @@ -17,6 +17,7 @@ obj-$(CONFIG_ARM_SMMU) += arm-smmu.o obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o obj-$(CONFIG_DMAR_TABLE) += dmar.o obj-$(CONFIG_INTEL_IOMMU) += intel-iommu.o intel-pasid.o +obj-$(CONFIG_INTEL_IOMMU) += intel-trace.o obj-$(CONFIG_INTEL_IOMMU_DEBUGFS) += intel-iommu-debugfs.o obj-$(CONFIG_INTEL_IOMMU_SVM) += intel-svm.o obj-$(CONFIG_IPMMU_VMSA) += ipmmu-vmsa.o diff --git a/drivers/iommu/intel-trace.c b/drivers/iommu/intel-trace.c new file mode 100644 index ..bfb6a6e37a88 --- /dev/null +++ b/drivers/iommu/intel-trace.c @@ -0,0 +1,14 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Intel IOMMU trace support + * + * Copyright (C) 2019 Intel Corporation + * + * Author: Lu Baolu + */ + +#include +#include + +#define CREATE_TRACE_POINTS +#include diff --git a/include/trace/events/intel_iommu.h b/include/trace/events/intel_iommu.h new file mode 100644 index ..3fdeaad93b2e --- /dev/null +++ b/include/trace/events/intel_iommu.h @@ -0,0 +1,95 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Intel IOMMU trace support + * + * Copyright (C) 2019 Intel Corporation + * + * Author: Lu Baolu + */ +#ifdef CONFIG_INTEL_IOMMU +#undef TRACE_SYSTEM +#define TRACE_SYSTEM intel_iommu + +#if !defined(_TRACE_INTEL_IOMMU_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_INTEL_IOMMU_H + +#include +#include 
+ +DECLARE_EVENT_CLASS(dma_map, + TP_PROTO(struct device *dev, dma_addr_t dev_addr, phys_addr_t phys_addr, +size_t size), + + TP_ARGS(dev, dev_addr, phys_addr, size), + + TP_STRUCT__entry( + __string(dev_name, dev_name(dev)) + __field(dma_addr_t, dev_addr) + __field(phys_addr_t, phys_addr) + __field(size_t, size) + ), + + TP_fast_assign( + __assign_str(dev_name, dev_name(dev)); + __entry->dev_addr = dev_addr; + __entry->phys_addr = phys_addr; + __entry->size = size; + ), + + TP_printk("dev=%s dev_addr=0x%llx phys_addr=0x%llx size=%zu", + __get_str(dev_name), + (unsigned long long)__entry->dev_addr, + (unsigned long long)__entry->phys_addr, + __entry->size) +); + +DEFINE_EVENT(dma_map, bounce_map_single, + TP_PROTO(struct device *dev, dma_addr_t dev_addr, phys_addr_t phys_addr, +size_t size), + TP_ARGS(dev, dev_addr, phys_addr, size) +); + +DEFINE_EVENT(dma_map, bounce_map_sg, + TP_PROTO(struct device *dev, dma_addr_t dev_addr, phys_addr_t phys_addr, +size_t size), + TP_ARGS(dev, dev_addr, phys_addr, size) +); + +DECLARE_EVENT_CLASS(dma_unmap, + TP_PROTO(struct device *dev, dma_addr_t dev_addr, size_t size), + + TP_ARGS(dev, dev_addr, size), + + TP_STRUCT__entry( + __string(dev_name, dev_name(dev)) + __field(dma_addr_t, dev_addr) + __field(size_t, size) + ), + + TP_fast_assign( + __assign_str(dev_name, dev_name(dev)); + __entry->dev_addr = dev_addr; + __entry->size = size; + ), + + TP_printk("dev=%s dev_addr=0x%llx size=%zu", + __get_str(dev_name), + (unsigned long long)__entry->dev_addr, + __entry->size) +); + +DEFINE_EVENT(dma_unmap, bounce_unmap_single, + TP_PROTO(struct device *dev, dma_addr_t dev_addr, size_t size), + TP_ARGS(dev, dev_addr, size) +); + +DEFINE_EVENT(dma_unmap, bounce_unmap_sg, + TP_PROTO(struct device *dev, dma_addr_t dev_addr, size_t size), + TP_ARGS(dev, dev_addr, size) +); + +#endif /* _TRACE_INTEL_IOMMU_H */ + +/* This part must be outside protection */ +#include +#endif /* CONFIG_INTEL_IOMMU */ -- 2.17.1
[PATCH v5 03/10] iommu/vt-d: Cleanup after use per-device dma ops
After using per-device dma ops, we don't need to check whether a device needs mapping, hence all checks of iommu_need_mapping() are unnecessary now. Clean them up. Cc: Ashok Raj Cc: Jacob Pan Cc: Kevin Tian Signed-off-by: Lu Baolu --- drivers/iommu/intel-iommu.c | 50 - 1 file changed, 5 insertions(+), 45 deletions(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 11474bd2e348..a458df975c55 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -2749,17 +2749,6 @@ static int __init si_domain_init(int hw) return 0; } -static int identity_mapping(struct device *dev) -{ - struct device_domain_info *info; - - info = dev->archdata.iommu; - if (info && info != DUMMY_DEVICE_DOMAIN_INFO) - return (info->domain == si_domain); - - return 0; -} - static int domain_add_dev_info(struct dmar_domain *domain, struct device *dev) { struct dmar_domain *ndomain; @@ -3416,15 +3405,6 @@ static struct dmar_domain *get_private_domain_for_dev(struct device *dev) return domain; } -/* Check if the dev needs to go through non-identity map and unmap process.*/ -static bool iommu_need_mapping(struct device *dev) -{ - if (iommu_dummy(dev)) - return false; - - return !identity_mapping(dev); -} - static dma_addr_t __intel_map_single(struct device *dev, phys_addr_t paddr, size_t size, int dir, u64 dma_mask) { @@ -3486,20 +3466,15 @@ static dma_addr_t intel_map_page(struct device *dev, struct page *page, enum dma_data_direction dir, unsigned long attrs) { - if (iommu_need_mapping(dev)) - return __intel_map_single(dev, page_to_phys(page) + offset, - size, dir, *dev->dma_mask); - return dma_direct_map_page(dev, page, offset, size, dir, attrs); + return __intel_map_single(dev, page_to_phys(page) + offset, + size, dir, *dev->dma_mask); } static dma_addr_t intel_map_resource(struct device *dev, phys_addr_t phys_addr, size_t size, enum dma_data_direction dir, unsigned long attrs) { - if (iommu_need_mapping(dev)) - return __intel_map_single(dev, phys_addr,
size, dir, - *dev->dma_mask); - return dma_direct_map_resource(dev, phys_addr, size, dir, attrs); + return __intel_map_single(dev, phys_addr, size, dir, *dev->dma_mask); } static void intel_unmap(struct device *dev, dma_addr_t dev_addr, size_t size) @@ -3551,17 +3526,13 @@ static void intel_unmap_page(struct device *dev, dma_addr_t dev_addr, size_t size, enum dma_data_direction dir, unsigned long attrs) { - if (iommu_need_mapping(dev)) - intel_unmap(dev, dev_addr, size); - else - dma_direct_unmap_page(dev, dev_addr, size, dir, attrs); + intel_unmap(dev, dev_addr, size); } static void intel_unmap_resource(struct device *dev, dma_addr_t dev_addr, size_t size, enum dma_data_direction dir, unsigned long attrs) { - if (iommu_need_mapping(dev)) - intel_unmap(dev, dev_addr, size); + intel_unmap(dev, dev_addr, size); } static void *intel_alloc_coherent(struct device *dev, size_t size, @@ -3571,9 +3542,6 @@ static void *intel_alloc_coherent(struct device *dev, size_t size, struct page *page = NULL; int order; - if (!iommu_need_mapping(dev)) - return dma_direct_alloc(dev, size, dma_handle, flags, attrs); - size = PAGE_ALIGN(size); order = get_order(size); @@ -3607,9 +3575,6 @@ static void intel_free_coherent(struct device *dev, size_t size, void *vaddr, int order; struct page *page = virt_to_page(vaddr); - if (!iommu_need_mapping(dev)) - return dma_direct_free(dev, size, vaddr, dma_handle, attrs); - size = PAGE_ALIGN(size); order = get_order(size); @@ -3627,9 +3592,6 @@ static void intel_unmap_sg(struct device *dev, struct scatterlist *sglist, struct scatterlist *sg; int i; - if (!iommu_need_mapping(dev)) - return dma_direct_unmap_sg(dev, sglist, nelems, dir, attrs); - for_each_sg(sglist, sg, nelems, i) { nrpages += aligned_nrpages(sg_dma_address(sg), sg_dma_len(sg)); } @@ -3651,8 +3613,6 @@ static int intel_map_sg(struct device *dev, struct scatterlist *sglist, int nele struct intel_iommu *iommu; BUG_ON(dir == DMA_NONE); - if (!iommu_need_mapping(dev)) - return 
dma_direct_map_sg(dev, sglist, nelems, dir, attrs); domain = find_domain(dev); if (!domain) -- 2.17.1 __
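The shape of the resulting dispatch can be sketched with a small user-space model: once per-device ops are installed, the per-call identity check disappears, because devices in the identity domain simply have no IOMMU dma_ops set and fall through to the direct path. All names below are illustrative, not the kernel's actual structures.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal model of per-device DMA op dispatch. A device with no ops
 * installed uses the direct path; a device with IOMMU ops installed is
 * translated. This is why iommu_need_mapping() can be removed: the
 * decision was already made once, at set_dma_ops() time. */

struct dma_ops { int (*map)(void); };

static int direct_map(void)   { return 1; }  /* direct-DMA path */
static int iommu_map_op(void) { return 2; }  /* IOMMU-translated path */

static const struct dma_ops direct_ops = { .map = direct_map };
static const struct dma_ops iommu_ops  = { .map = iommu_map_op };

struct device { const struct dma_ops *dma_ops; };

static int dma_map(struct device *dev)
{
    /* Per-call dispatch: no identity-mapping check needed here. */
    const struct dma_ops *ops = dev->dma_ops ? dev->dma_ops : &direct_ops;
    return ops->map();
}
```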
[PATCH v5 07/10] iommu: Add bounce page APIs
IOMMU hardware always uses paging for DMA remapping. The minimum mapped window is a page size. Device drivers may map buffers that don't fill a whole IOMMU window. This allows a device to access possibly unrelated memory, and a malicious device can exploit this to perform a DMA attack. This introduces the bounce buffer mechanism for DMA buffers which don't fill a minimal IOMMU page. It can be used by various vendor-specific IOMMU drivers as long as the DMA domain is managed by the generic IOMMU layer. The following APIs are added: * iommu_bounce_map(dev, addr, paddr, size, dir, attrs) - Map a buffer starting at DMA address @addr in bounce page manner. For buffer parts that don't span a whole minimal IOMMU page, the bounce page policy is applied: a bounce page mapped by swiotlb will be used as the DMA target in the IOMMU page table. Otherwise, the physical address @paddr is mapped instead. * iommu_bounce_unmap(dev, addr, size, dir, attrs) - Unmap the buffer mapped with iommu_bounce_map(). The bounce page is torn down after the bounced data gets synced. * iommu_bounce_sync(dev, addr, size, dir, target) - Sync the bounced data in case the bounce mapped buffer is reused. The APIs are guarded by a kernel option, IOMMU_BOUNCE_PAGE, so they can be compiled out for cases where bounce pages aren't needed, for example, embedded systems. Cc: Ashok Raj Cc: Jacob Pan Cc: Kevin Tian Cc: Alan Cox Cc: Mika Westerberg Signed-off-by: Lu Baolu --- drivers/iommu/Kconfig | 13 + drivers/iommu/iommu.c | 118 ++ include/linux/iommu.h | 35 + 3 files changed, 166 insertions(+) diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index e15cdcd8cb3c..d7f2e09cbcf2 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -86,6 +86,19 @@ config IOMMU_DEFAULT_PASSTHROUGH If unsure, say N here. +config IOMMU_BOUNCE_PAGE + bool "Use bounce page for untrusted devices" + depends on IOMMU_API && SWIOTLB + help + IOMMU hardware always uses paging for DMA remapping. The minimum
+ mapped window is a page size. The device drivers may map buffers + not filling the whole IOMMU window. This allows a device to access + possibly unrelated memory and a malicious device can exploit this + to perform a DMA attack. Select this to use a bounce page for the + buffer which doesn't fill a whole IOMMU page. + + If unsure, say N here. + config OF_IOMMU def_bool y depends on OF && IOMMU_API diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 0c674d80c37f..fe3815186d72 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -2468,3 +2468,121 @@ int iommu_sva_get_pasid(struct iommu_sva *handle) return ops->sva_get_pasid(handle); } EXPORT_SYMBOL_GPL(iommu_sva_get_pasid); + +#ifdef CONFIG_IOMMU_BOUNCE_PAGE + +/* + * Bounce buffer support for external devices: + * + * IOMMU hardware always uses paging for DMA remapping. The minimum mapped + * window is a page size. The device drivers may map buffers not filling + * the whole IOMMU window. This allows a device to access possibly unrelated + * memory and a malicious device can exploit this to perform a DMA attack. + * Use bounce pages for the buffer which doesn't fill whole IOMMU pages. 
+ */ + +static inline size_t +get_aligned_size(struct iommu_domain *domain, size_t size) +{ + return ALIGN(size, 1 << __ffs(domain->pgsize_bitmap)); +} + +dma_addr_t iommu_bounce_map(struct device *dev, dma_addr_t iova, + phys_addr_t paddr, size_t size, + enum dma_data_direction dir, + unsigned long attrs) +{ + struct iommu_domain *domain; + unsigned int min_pagesz; + phys_addr_t tlb_addr; + size_t aligned_size; + int prot = 0; + int ret; + + domain = iommu_get_dma_domain(dev); + if (!domain) + return DMA_MAPPING_ERROR; + + if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL) + prot |= IOMMU_READ; + if (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL) + prot |= IOMMU_WRITE; + + aligned_size = get_aligned_size(domain, size); + min_pagesz = 1 << __ffs(domain->pgsize_bitmap); + + /* +* If both the physical buffer start address and size are +* page aligned, we don't need to use a bounce page. +*/ + if (!IS_ALIGNED(paddr | size, min_pagesz)) { + tlb_addr = swiotlb_tbl_map_single(dev, + __phys_to_dma(dev, io_tlb_start), + paddr, size, aligned_size, dir, attrs); + if (tlb_addr == DMA_MAPPING_ERROR) + return DMA_MAPPING_ERROR; + } else { + tlb_addr = paddr; + } + + ret = iommu_map(domain, iova, tlb_addr, aligned_size, prot); + if
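The core decision in iommu_bounce_map() — bounce only when the buffer start address or size is not aligned to the minimum IOMMU page size — can be modeled in a few lines of user-space C. The helper names here are illustrative, not part of the kernel API; the minimum page size is the lowest set bit of the domain's pgsize_bitmap, as in get_aligned_size() above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Minimum IOMMU page size: the lowest set bit of pgsize_bitmap,
 * i.e. 1 << __ffs(pgsize_bitmap) in kernel terms. */
static uint64_t min_pagesz(uint64_t pgsize_bitmap)
{
    return pgsize_bitmap & -pgsize_bitmap;
}

/* A buffer needs a swiotlb bounce page only when start or size is
 * unaligned, mirroring !IS_ALIGNED(paddr | size, min_pagesz). */
static bool needs_bounce(uint64_t paddr, uint64_t size, uint64_t pagesz)
{
    return ((paddr | size) & (pagesz - 1)) != 0;
}
```

For a typical VT-d pgsize_bitmap of 4K | 2M | 1G the minimum page size is 4 KiB, so a 0x200-byte buffer at 0x1080 is bounced while a page-aligned, page-sized buffer is mapped in place.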
[PATCH v5 02/10] iommu/vt-d: Use per-device dma_ops
The current Intel IOMMU driver sets the system-level dma_ops, so every DMA API call goes through the IOMMU driver even when the devices are using an identity-mapped domain. This applies per-device dma_ops in this driver and leaves the system-level dma_ops for direct DMA. Cc: Ashok Raj Cc: Jacob Pan Cc: Kevin Tian Signed-off-by: Lu Baolu --- drivers/iommu/intel-iommu.c | 43 ++--- 1 file changed, 7 insertions(+), 36 deletions(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 8064af607d3b..11474bd2e348 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -3419,43 +3419,10 @@ static struct dmar_domain *get_private_domain_for_dev(struct device *dev) /* Check if the dev needs to go through non-identity map and unmap process.*/ static bool iommu_need_mapping(struct device *dev) { - int ret; - if (iommu_dummy(dev)) return false; - ret = identity_mapping(dev); - if (ret) { - u64 dma_mask = *dev->dma_mask; - - if (dev->coherent_dma_mask && dev->coherent_dma_mask < dma_mask) - dma_mask = dev->coherent_dma_mask; - - if (dma_mask >= dma_get_required_mask(dev)) - return false; - - /* -* 32 bit DMA is removed from si_domain and fall back to -* non-identity mapping. 
-*/ - dmar_remove_one_dev_info(dev); - ret = iommu_request_dma_domain_for_dev(dev); - if (ret) { - struct iommu_domain *domain; - struct dmar_domain *dmar_domain; - - domain = iommu_get_domain_for_dev(dev); - if (domain) { - dmar_domain = to_dmar_domain(domain); - dmar_domain->flags |= DOMAIN_FLAG_LOSE_CHILDREN; - } - get_private_domain_for_dev(dev); - } - - dev_info(dev, "32bit DMA uses non-identity mapping\n"); - } - - return true; + return !identity_mapping(dev); } static dma_addr_t __intel_map_single(struct device *dev, phys_addr_t paddr, @@ -4706,8 +4673,6 @@ int __init intel_iommu_init(void) } up_write(&dmar_global_lock); - dma_ops = &intel_dma_ops; - init_iommu_pm_ops(); for_each_active_iommu(iommu, drhd) { @@ -5280,6 +5245,8 @@ static int intel_iommu_add_device(struct device *dev) dev_info(dev, "Device uses a private identity domain.\n"); } + } else { + set_dma_ops(dev, &intel_dma_ops); } } else { if (device_def_domain_type(dev) == IOMMU_DOMAIN_DMA) { @@ -5295,6 +5262,8 @@ static int intel_iommu_add_device(struct device *dev) dev_info(dev, "Device uses a private dma domain.\n"); } + + set_dma_ops(dev, &intel_dma_ops); } } @@ -5313,6 +5282,8 @@ static void intel_iommu_remove_device(struct device *dev) iommu_group_remove_device(dev); iommu_device_unlink(&iommu->iommu, dev); + + set_dma_ops(dev, NULL); } static void intel_iommu_get_resv_regions(struct device *device, -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v5 05/10] swiotlb: Split size parameter to map/unmap APIs
This splits the size parameter to swiotlb_tbl_map_single() and swiotlb_tbl_unmap_single() into an alloc_size and a mapping_size parameter, where the latter one is rounded up to the iommu page size. Suggested-by: Christoph Hellwig Signed-off-by: Lu Baolu --- drivers/xen/swiotlb-xen.c | 8 include/linux/swiotlb.h | 8 ++-- kernel/dma/direct.c | 2 +- kernel/dma/swiotlb.c | 24 +--- 4 files changed, 24 insertions(+), 18 deletions(-) diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c index cfbe46785a3b..58d25486971e 100644 --- a/drivers/xen/swiotlb-xen.c +++ b/drivers/xen/swiotlb-xen.c @@ -400,8 +400,8 @@ static dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page, */ trace_swiotlb_bounced(dev, dev_addr, size, swiotlb_force); - map = swiotlb_tbl_map_single(dev, start_dma_addr, phys, size, dir, -attrs); + map = swiotlb_tbl_map_single(dev, start_dma_addr, phys, +size, size, dir, attrs); if (map == (phys_addr_t)DMA_MAPPING_ERROR) return DMA_MAPPING_ERROR; @@ -411,7 +411,7 @@ static dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page, * Ensure that the address returned is DMA'ble */ if (unlikely(!dma_capable(dev, dev_addr, size))) { - swiotlb_tbl_unmap_single(dev, map, size, dir, + swiotlb_tbl_unmap_single(dev, map, size, size, dir, attrs | DMA_ATTR_SKIP_CPU_SYNC); return DMA_MAPPING_ERROR; } @@ -447,7 +447,7 @@ static void xen_unmap_single(struct device *hwdev, dma_addr_t dev_addr, /* NOTE: We use dev_addr here, not paddr! 
*/ if (is_xen_swiotlb_buffer(dev_addr)) - swiotlb_tbl_unmap_single(hwdev, paddr, size, dir, attrs); + swiotlb_tbl_unmap_single(hwdev, paddr, size, size, dir, attrs); } static void xen_swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr, diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index 361f62bb4a8e..cde3dc18e21a 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -46,13 +46,17 @@ enum dma_sync_target { extern phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, dma_addr_t tbl_dma_addr, - phys_addr_t phys, size_t size, + phys_addr_t phys, + size_t mapping_size, + size_t alloc_size, enum dma_data_direction dir, unsigned long attrs); extern void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr, -size_t size, enum dma_data_direction dir, +size_t mapping_size, +size_t alloc_size, +enum dma_data_direction dir, unsigned long attrs); extern void swiotlb_tbl_sync_single(struct device *hwdev, diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c index 59bdceea3737..6c183326c4e6 100644 --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -297,7 +297,7 @@ void dma_direct_unmap_page(struct device *dev, dma_addr_t addr, dma_direct_sync_single_for_cpu(dev, addr, size, dir); if (unlikely(is_swiotlb_buffer(phys))) - swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs); + swiotlb_tbl_unmap_single(dev, phys, size, size, dir, attrs); } EXPORT_SYMBOL(dma_direct_unmap_page); diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index 9de232229063..43c88626a1f3 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -444,7 +444,9 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr, phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, dma_addr_t tbl_dma_addr, - phys_addr_t orig_addr, size_t size, + phys_addr_t orig_addr, + size_t mapping_size, + size_t alloc_size, enum dma_data_direction dir, unsigned long attrs) { @@ -481,8 +483,8 @@ phys_addr_t swiotlb_tbl_map_single(struct 
device *hwdev, * For mappings greater than or equal to a page, we limit the stride * (and hence alignment) to a page size. */ - nslots = ALIGN(size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT; - if (size >= PAGE_SIZE) + nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT; + if (alloc_size >= PAGE_SI
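The slot accounting after this change can be sketched in user space: the number of swiotlb slots is now derived from alloc_size (which callers round up to the IOMMU page size), while mapping_size only bounds the actual data copy. IO_TLB_SHIFT is 11 (2 KiB slots) as in kernel/dma/swiotlb.c; the helper name is illustrative.

```c
#include <assert.h>
#include <stddef.h>

#define IO_TLB_SHIFT 11  /* swiotlb slot size: 2 KiB */

/* Models the patched line:
 *   nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
 * so slot usage tracks the (page-rounded) allocation, not the
 * possibly smaller mapping. */
static size_t nslots_for(size_t alloc_size)
{
    return (alloc_size + (1u << IO_TLB_SHIFT) - 1) >> IO_TLB_SHIFT;
}
```

A 1-byte mapping with a 4 KiB alloc_size therefore consumes two 2 KiB slots, while only 1 byte is bounced by memcpy.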
[PATCH v5 10/10] iommu/vt-d: Use bounce buffer for untrusted devices
The Intel VT-d hardware uses paging for DMA remapping. The minimum mapped window is a page size. The device drivers may map buffers not filling the whole IOMMU window. This allows the device to access possibly unrelated memory and a malicious device could exploit this to perform DMA attacks. To address this, the Intel IOMMU driver will use bounce pages for those buffers which don't fill whole IOMMU pages. Cc: Ashok Raj Cc: Jacob Pan Cc: Kevin Tian Signed-off-by: Lu Baolu Tested-by: Xu Pengfei Tested-by: Mika Westerberg --- drivers/iommu/Kconfig | 2 + drivers/iommu/intel-iommu.c | 188 +++- 2 files changed, 188 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index d7f2e09cbcf2..3baa418edc16 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -195,6 +195,8 @@ config INTEL_IOMMU select IOMMU_IOVA select NEED_DMA_MAP_STATE select DMAR_TABLE + select SWIOTLB + select IOMMU_BOUNCE_PAGE help DMA remapping (DMAR) devices support enables independent address translations for Direct Memory Access (DMA) from devices. 
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 4185406b0368..2cdec279ccac 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -44,6 +44,7 @@ #include #include #include +#include #include "irq_remapping.h" #include "intel-pasid.h" @@ -3672,6 +3673,183 @@ static const struct dma_map_ops intel_dma_ops = { .dma_supported = dma_direct_supported, }; +static dma_addr_t +bounce_map_single(struct device *dev, phys_addr_t paddr, size_t size, + enum dma_data_direction dir, unsigned long attrs, + u64 dma_mask) +{ + struct dmar_domain *domain; + struct intel_iommu *iommu; + unsigned long iova_pfn; + unsigned long nrpages; + dma_addr_t ret_addr; + + domain = find_domain(dev); + if (WARN_ON(dir == DMA_NONE || !domain)) + return DMA_MAPPING_ERROR; + + iommu = domain_get_iommu(domain); + nrpages = aligned_nrpages(0, size); + iova_pfn = intel_alloc_iova(dev, domain, + dma_to_mm_pfn(nrpages), dma_mask); + if (!iova_pfn) + return DMA_MAPPING_ERROR; + + ret_addr = iommu_bounce_map(dev, iova_pfn << PAGE_SHIFT, + paddr, size, dir, attrs); + if (ret_addr == DMA_MAPPING_ERROR) { + free_iova_fast(&domain->iovad, iova_pfn, dma_to_mm_pfn(nrpages)); + return DMA_MAPPING_ERROR; + } + + trace_bounce_map_single(dev, iova_pfn << PAGE_SHIFT, paddr, size); + + return ret_addr; +} + +static void +bounce_unmap_single(struct device *dev, dma_addr_t dev_addr, size_t size, + enum dma_data_direction dir, unsigned long attrs) +{ + struct dmar_domain *domain; + struct intel_iommu *iommu; + unsigned long iova_pfn; + unsigned long nrpages; + + domain = find_domain(dev); + if (WARN_ON(!domain)) + return; + + iommu_bounce_unmap(dev, dev_addr, size, dir, attrs); + trace_bounce_unmap_single(dev, dev_addr, size); + + iommu = domain_get_iommu(domain); + iova_pfn = IOVA_PFN(dev_addr); + nrpages = aligned_nrpages(0, size); + + iommu_flush_iotlb_psi(iommu, domain, + mm_to_dma_pfn(iova_pfn), nrpages, 0, 0); + free_iova_fast(&domain->iovad, iova_pfn, 
dma_to_mm_pfn(nrpages)); +} + +static dma_addr_t +bounce_map_page(struct device *dev, struct page *page, unsigned long offset, + size_t size, enum dma_data_direction dir, unsigned long attrs) +{ + return bounce_map_single(dev, page_to_phys(page) + offset, +size, dir, attrs, *dev->dma_mask); +} + +static dma_addr_t +bounce_map_resource(struct device *dev, phys_addr_t phys_addr, size_t size, + enum dma_data_direction dir, unsigned long attrs) +{ + return bounce_map_single(dev, phys_addr, size, +dir, attrs, *dev->dma_mask); +} + +static void +bounce_unmap_page(struct device *dev, dma_addr_t dev_addr, size_t size, + enum dma_data_direction dir, unsigned long attrs) +{ + bounce_unmap_single(dev, dev_addr, size, dir, attrs); +} + +static void +bounce_unmap_resource(struct device *dev, dma_addr_t dev_addr, size_t size, + enum dma_data_direction dir, unsigned long attrs) +{ + bounce_unmap_single(dev, dev_addr, size, dir, attrs); +} + +static void +bounce_unmap_sg(struct device *dev, struct scatterlist *sglist, int nelems, + enum dma_data_direction dir, unsigned long attrs) +{ + struct scatterlist *sg; + int i; + + for_each_sg(sglist, sg, nelems, i) + bounce_unmap_page(dev, sg->dma_address, +
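The IOVA accounting used by bounce_map_single()/bounce_unmap_single() above can be modeled in user space: the IOVA allocation always covers whole 4 KiB pages, so even a sub-page bounce mapping consumes a full page of IOVA space, and unmap must free the same number of pages. aligned_nrpages() below is modeled after the intel-iommu helper; this is an illustrative sketch, and exact kernel behavior may differ.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1ULL << PAGE_SHIFT)

/* Number of pages spanned by [host_addr, host_addr + size): keep the
 * sub-page offset, then round the end up to a page boundary. */
static uint64_t aligned_nrpages(uint64_t host_addr, uint64_t size)
{
    uint64_t off = host_addr & (PAGE_SIZE - 1);
    return (off + size + PAGE_SIZE - 1) >> PAGE_SHIFT;
}

/* Models IOVA_PFN(): the page frame number of a DMA address. */
static uint64_t iova_pfn(uint64_t iova)
{
    return iova >> PAGE_SHIFT;
}
```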
[PATCH v5 00/10] iommu: Bounce page for untrusted devices
The Thunderbolt vulnerabilities are public and now go by the name Thunderclap [1] [3]. This patch series aims to mitigate those concerns.

An external PCI device is a PCI peripheral device connected to the system through an external bus, such as Thunderbolt. What makes it different is that it can't be trusted to the same degree as the devices built into the system. Generally, a trusted PCIe device will DMA into the designated buffers and not overrun or otherwise write outside the specified bounds. But it's different for an external device. The minimum IOMMU mapping granularity is one page (4k), so for DMA transfers smaller than that a malicious PCIe device can access the whole page of memory even if it does not belong to the driver in question. This opens a possibility for DMA attack. For more information about DMA attacks imposed by an untrusted PCI/PCIe device, please refer to [2].

This series implements bounce buffers for the untrusted external devices. The transfers should be limited to isolated pages so the IOMMU window does not cover memory outside of what the driver expects. Previously (v3 and before), we proposed an optimisation to only copy the head and tail of the buffer if it spans multiple pages, and directly map the ones in the middle. Figure 1 gives a big picture of this solution.

[Figure 1: A big view of the iommu bounce page scheme -- ASCII diagram mangled in archiving. It showed swiotlb bounce-page mappings for the partial pages at buffer_start and buffer_end, with direct IOMMU mappings for the whole pages between the IOMMU page boundaries.]

As Robin Murphy pointed out, this ties us to using strict mode for TLB maintenance, which may not be an overall win depending on the balance between invalidation bandwidth vs. memcpy bandwidth. If we use standard SWIOTLB logic to always copy the whole thing, we should be able to release the bounce pages via the flush queue to allow 'safe' lazy unmaps. So since v4 we use the standard swiotlb logic.

[Figure 2: ASCII diagram mangled in archiving. It showed the v4 scheme in which the whole buffer, from buffer_start to buffer_end, is bounced through swiotlb pages starting at physical_start and mapped by the IOMMU rounded out to IOMMU page boundaries.]
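The v3 head/tail scheme described for Figure 1 amounts to splitting a buffer into a partial head page, whole middle pages, and a partial tail page — only the head and tail would be bounced, the middle mapped in place (since v4 the whole buffer is bounced instead). A user-space sketch of that split, with illustrative names:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096ULL

struct extents { uint64_t head_len, mid_len, tail_len; };

/* Split [start, start + len) at page boundaries: head and tail are the
 * partial pages that would be bounced under the v3 scheme; mid is the
 * run of whole pages that could be mapped directly. */
static struct extents split_buffer(uint64_t start, uint64_t len)
{
    struct extents e = { 0, 0, 0 };
    uint64_t end = start + len;
    uint64_t first_full = (start + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);
    uint64_t last_full_end = end & ~(PAGE_SIZE - 1);

    if (first_full >= last_full_end) {  /* no whole page inside */
        e.head_len = len;
        return e;
    }
    e.head_len = first_full - start;          /* bounced */
    e.mid_len  = last_full_end - first_full;  /* direct-mapped (v3) */
    e.tail_len = end - last_full_end;         /* bounced */
    return e;
}
```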
[PATCH v5 01/10] iommu/vt-d: Don't switch off swiotlb if use direct dma
The direct dma implementation depends on swiotlb. Hence, don't switch off swiotlb since direct dma interfaces are used in this driver. Cc: Ashok Raj Cc: Jacob Pan Cc: Kevin Tian Signed-off-by: Lu Baolu --- drivers/iommu/intel-iommu.c | 6 -- 1 file changed, 6 deletions(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index bdaed2da8a55..8064af607d3b 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -4568,9 +4568,6 @@ static int __init platform_optin_force_iommu(void) iommu_identity_mapping |= IDENTMAP_ALL; dmar_disabled = 0; -#if defined(CONFIG_X86) && defined(CONFIG_SWIOTLB) - swiotlb = 0; -#endif no_iommu = 0; return 1; @@ -4709,9 +4706,6 @@ int __init intel_iommu_init(void) } up_write(&dmar_global_lock); -#if defined(CONFIG_X86) && defined(CONFIG_SWIOTLB) - swiotlb = 0; -#endif dma_ops = &intel_dma_ops; init_iommu_pm_ops(); -- 2.17.1
[PATCH] media: staging: tegra-vde: Fix build error
If IOMMU_SUPPORT is not set, and COMPILE_TEST is y, IOMMU_IOVA may be set to m. So the build fails: drivers/staging/media/tegra-vde/iommu.o: In function `tegra_vde_iommu_map': iommu.c:(.text+0x41): undefined reference to `alloc_iova' iommu.c:(.text+0x56): undefined reference to `__free_iova' Also select IOMMU_IOVA when COMPILE_TEST is set to fix this. Reported-by: Hulk Robot Suggested-by: Dmitry Osipenko Fixes: b301f8de1925 ("media: staging: media: tegra-vde: Add IOMMU support") Signed-off-by: YueHaibing --- drivers/staging/media/tegra-vde/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/staging/media/tegra-vde/Kconfig b/drivers/staging/media/tegra-vde/Kconfig index 2e7f644..ba49ea5 100644 --- a/drivers/staging/media/tegra-vde/Kconfig +++ b/drivers/staging/media/tegra-vde/Kconfig @@ -3,7 +3,7 @@ config TEGRA_VDE tristate "NVIDIA Tegra Video Decoder Engine driver" depends on ARCH_TEGRA || COMPILE_TEST select DMA_SHARED_BUFFER - select IOMMU_IOVA if IOMMU_SUPPORT + select IOMMU_IOVA if (IOMMU_SUPPORT || COMPILE_TEST) select SRAM help Say Y here to enable support for the NVIDIA Tegra video decoder -- 2.7.4 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 7/7] iommu/vt-d: Consolidate domain_init() to avoid duplication
Hi, On 7/22/19 10:22 PM, Joerg Roedel wrote: On Sat, Jul 20, 2019 at 09:15:58AM +0800, Lu Baolu wrote: Okay. I agree with you. Let's revert this commit first. Reverted the patch and queued it to my iommu/fixes branch. Thank you! Joerg. Best regards, Baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/1] iommu/vt-d: Correctly check format of page table in debugfs
Hi Sai, On 7/22/19 1:21 PM, Prakhya, Sai Praneeth wrote: Hi Allen, diff --git a/drivers/iommu/intel-iommu-debugfs.c b/drivers/iommu/intel- iommu-debugfs.c index 73a552914455..e31c3b416351 100644 --- a/drivers/iommu/intel-iommu-debugfs.c +++ b/drivers/iommu/intel-iommu-debugfs.c @@ -235,7 +235,7 @@ static void ctx_tbl_walk(struct seq_file *m, struct intel_iommu *iommu, u16 bus) tbl_wlk.ctx_entry = context; m->private = &tbl_wlk; - if (pasid_supported(iommu) && is_pasid_enabled(context)) { + if (dmar_readq(iommu->reg + DMAR_RTADDR_REG) & DMA_RTADDR_SMT) { Thanks for adding this, I do believe this is a good addition but I also think that we might need "is_pasid_enabled()" as well. It checks whether the PASIDE bit in the context entry is enabled or not. I am thinking that even though DMAR might be using scalable root and context tables, the entry itself should have the PASIDE bit set. Did I miss something here? Whether the PASIDE bit is set or not, the IOMMU always uses the scalable mode page table if scalable mode is enabled. If PASIDE is set, requests with PASID will be handled. Otherwise, requests with PASID will be blocked (but requests without PASID will always be handled). We are dumping the page table of the IOMMU, so we only care about what page table format it is using. Do I understand it right? Best regards, Baolu And I also think a macro would be better so that it could be reused elsewhere (if need be). Regards, Sai ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/2] virtio/virtio_ring: Fix the dma_max_mapping_size call
On Tue, Jul 23, 2019 at 05:38:51PM +0200, Christoph Hellwig wrote: > On Mon, Jul 22, 2019 at 04:36:09PM +0100, Robin Murphy wrote: > >> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c > >> index c8be1c4f5b55..37c143971211 100644 > >> --- a/drivers/virtio/virtio_ring.c > >> +++ b/drivers/virtio/virtio_ring.c > >> @@ -262,7 +262,7 @@ size_t virtio_max_dma_size(struct virtio_device *vdev) > >> { > >>size_t max_segment_size = SIZE_MAX; > >> -if (vring_use_dma_api(vdev)) > >> + if (vring_use_dma_api(vdev) && vdev->dev.dma_mask) > > > > Hmm, might it make sense to roll that check up into vring_use_dma_api() > > itself? After all, if the device has no mask then it's likely that other > > DMA API ops wouldn't really work as expected either. > > Makes sense to me. Christoph - would a documented API wrapping dma_mask make sense? With the documentation explaining how users must desist from using DMA APIs if that returns false ... -- MST ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 3/4] dt-bindings: iommu/arm,smmu: add compatible string for Marvell
On Thu, 11 Jul 2019 17:02:41 +0200, Gregory CLEMENT wrote: > From: Hanna Hawa > > Add a specific compatible string for Marvell usage, due to an erratum in accessing 64-bit registers of the ARM SMMU in AP806. > > The AP806 SoC uses the generic ARM MMU-500, and there is no Marvell-specific implementation; this compatible is used for the erratum only. > > Signed-off-by: Hanna Hawa > Signed-off-by: Gregory CLEMENT > --- > Documentation/devicetree/bindings/iommu/arm,smmu.txt | 1 + > 1 file changed, 1 insertion(+) > Reviewed-by: Rob Herring
Re: [PATCH] dma-direct: Force unencrypted DMA under SME for certain DMA masks
On 7/24/19 1:40 PM, Kirill A. Shutemov wrote: > On Wed, Jul 24, 2019 at 06:30:21PM +, Lendacky, Thomas wrote: >> On 7/24/19 1:11 PM, Kirill A. Shutemov wrote: >>> On Wed, Jul 24, 2019 at 05:34:26PM +, Lendacky, Thomas wrote: On 7/24/19 12:06 PM, Robin Murphy wrote: > On 24/07/2019 17:42, Lendacky, Thomas wrote: >> On 7/24/19 10:55 AM, Kirill A. Shutemov wrote: >>> On Wed, Jul 10, 2019 at 07:01:19PM +, Lendacky, Thomas wrote: @@ -351,6 +355,32 @@ bool sev_active(void) } EXPORT_SYMBOL(sev_active); +/* Override for DMA direct allocation check - ARCH_HAS_FORCE_DMA_UNENCRYPTED */ +bool force_dma_unencrypted(struct device *dev) +{ + /* + * For SEV, all DMA must be to unencrypted addresses. + */ + if (sev_active()) + return true; + + /* + * For SME, all DMA must be to unencrypted addresses if the + * device does not support DMA to addresses that include the + * encryption mask. + */ + if (sme_active()) { + u64 dma_enc_mask = DMA_BIT_MASK(__ffs64(sme_me_mask)); + u64 dma_dev_mask = min_not_zero(dev->coherent_dma_mask, + dev->bus_dma_mask); + + if (dma_dev_mask <= dma_enc_mask) + return true; >>> >>> Hm. What is wrong with the dev mask being equal to enc mask? IIUC, it >>> means that device mask is wide enough to cover encryption bit, doesn't >>> it? >> >> Not really... it's the way DMA_BIT_MASK works vs bit numbering. Let's >> say >> that sme_me_mask has bit 47 set. __ffs64 returns 47 and DMA_BIT_MASK(47) >> will generate a mask without bit 47 set (0x7fff). So the check >> will catch anything that does not support at least 48-bit DMA. > > Couldn't that be expressed as just: > > if (sme_me_mask & dma_dev_mask == sme_me_mask) Actually !=, but yes, it could have been done like that, I just didn't think of it. >>> >>> I'm looking into generalizing the check to cover MKTME. >>> >>> Leaving off the Kconfig changes and moving the check to other file, >>> doest >>> the change below look reasonable to you. It's only build tested so far. 
>>> >>> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c >>> index fece30ca8b0c..6c86adcd02da 100644 >>> --- a/arch/x86/mm/mem_encrypt.c >>> +++ b/arch/x86/mm/mem_encrypt.c >>> @@ -355,6 +355,8 @@ EXPORT_SYMBOL(sev_active); >>> /* Override for DMA direct allocation check - >>> ARCH_HAS_FORCE_DMA_UNENCRYPTED */ >>> bool force_dma_unencrypted(struct device *dev) >>> { >>> + u64 dma_enc_mask; >>> + >>> /* >>> * For SEV, all DMA must be to unencrypted addresses. >>> */ >>> @@ -362,18 +364,20 @@ bool force_dma_unencrypted(struct device *dev) >>> return true; >>> >>> /* >>> -* For SME, all DMA must be to unencrypted addresses if the >>> -* device does not support DMA to addresses that include the >>> -* encryption mask. >>> +* For SME and MKTME, all DMA must be to unencrypted addresses if the >>> +* device does not support DMA to addresses that include the encryption >>> +* mask. >>> */ >>> - if (sme_active()) { >>> - u64 dma_enc_mask = DMA_BIT_MASK(__ffs64(sme_me_mask)); >>> - u64 dma_dev_mask = min_not_zero(dev->coherent_dma_mask, >>> - dev->bus_dma_mask); >>> + if (!sme_active() && !mktme_enabled()) >>> + return false; >>> >>> - if (dma_dev_mask <= dma_enc_mask) >>> - return true; >>> - } >>> + dma_enc_mask = sme_me_mask | mktme_keyid_mask(); >>> + >>> + if (dev->coherent_dma_mask && (dev->coherent_dma_mask & dma_enc_mask) >>> != dma_enc_mask) >>> + return true; >>> + >>> + if (dev->bus_dma_mask && (dev->bus_dma_mask & dma_enc_mask) != >>> dma_enc_mask) >>> + return true; >> >> Do you want to err on the side of caution and return true if both masks >> are zero? You could do the min_not_zero step and then return true if the >> result is zero. Then just make the one comparison against dma_enc_mask. > > Something like this? Yup, looks good. 
Thanks, Tom > > diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c > index fece30ca8b0c..173d68b08c55 100644 > --- a/arch/x86/mm/mem_encrypt.c > +++ b/arch/x86/mm/mem_encrypt.c > @@ -355,6 +355,8 @@ EXPORT_SYMBOL(sev_active); > /* Override for DMA direct allocation check - ARCH_HAS_FORCE_DMA_UNENCRYPTED > */ > bool force_dma_unencrypted(struct device *dev) > { > + u64 dma_enc_mask, dma_dev_mask; > + > /* >* For SEV, all DMA must be to unencrypted addresses. >*/ > @@ -362,20 +364,17 @@ bool force_dma_unencrypted(struct device *dev) > return
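The mask arithmetic debated in this thread can be checked in user space. With the SME C-bit at position 47, sme_me_mask = 1 << 47 and DMA_BIT_MASK(47) = 0x7fffffffffff, so `dma_dev_mask <= dma_enc_mask` catches any device that cannot address bit 47; Robin's shorter `(dma_dev_mask & sme_me_mask) != sme_me_mask` agrees with it for the contiguous DMA_BIT_MASK-style masks devices actually use. This is an illustrative model (with __ffs64 modeled by __builtin_ctzll), not the kernel code.

```c
#include <assert.h>
#include <stdint.h>

#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Original check from the patch: device mask must reach past the
 * encryption bit, otherwise DMA must be forced unencrypted. */
static int force_unencrypted(uint64_t sme_me_mask, uint64_t dma_dev_mask)
{
    uint64_t dma_enc_mask = DMA_BIT_MASK(__builtin_ctzll(sme_me_mask));
    return dma_dev_mask <= dma_enc_mask;
}

/* Robin's form: equivalent for contiguous device masks. */
static int robin_form(uint64_t sme_me_mask, uint64_t dma_dev_mask)
{
    return (dma_dev_mask & sme_me_mask) != sme_me_mask;
}
```

A 32-bit device is forced unencrypted; a 48-bit device, which covers bit 47, is not.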
Re: [PATCH] dma-direct: Force unencrypted DMA under SME for certain DMA masks
On Wed, Jul 24, 2019 at 06:30:21PM +, Lendacky, Thomas wrote: > On 7/24/19 1:11 PM, Kirill A. Shutemov wrote: > > On Wed, Jul 24, 2019 at 05:34:26PM +, Lendacky, Thomas wrote: > >> On 7/24/19 12:06 PM, Robin Murphy wrote: > >>> On 24/07/2019 17:42, Lendacky, Thomas wrote: > On 7/24/19 10:55 AM, Kirill A. Shutemov wrote: > > On Wed, Jul 10, 2019 at 07:01:19PM +, Lendacky, Thomas wrote: > >> @@ -351,6 +355,32 @@ bool sev_active(void) > >> } > >> EXPORT_SYMBOL(sev_active); > >> +/* Override for DMA direct allocation check - > >> ARCH_HAS_FORCE_DMA_UNENCRYPTED */ > >> +bool force_dma_unencrypted(struct device *dev) > >> +{ > >> + /* > >> + * For SEV, all DMA must be to unencrypted addresses. > >> + */ > >> + if (sev_active()) > >> + return true; > >> + > >> + /* > >> + * For SME, all DMA must be to unencrypted addresses if the > >> + * device does not support DMA to addresses that include the > >> + * encryption mask. > >> + */ > >> + if (sme_active()) { > >> + u64 dma_enc_mask = DMA_BIT_MASK(__ffs64(sme_me_mask)); > >> + u64 dma_dev_mask = min_not_zero(dev->coherent_dma_mask, > >> + dev->bus_dma_mask); > >> + > >> + if (dma_dev_mask <= dma_enc_mask) > >> + return true; > > > > Hm. What is wrong with the dev mask being equal to enc mask? IIUC, it > > means that device mask is wide enough to cover encryption bit, doesn't > > it? > > Not really... it's the way DMA_BIT_MASK works vs bit numbering. Let's > say > that sme_me_mask has bit 47 set. __ffs64 returns 47 and DMA_BIT_MASK(47) > will generate a mask without bit 47 set (0x7fff). So the check > will catch anything that does not support at least 48-bit DMA. > >>> > >>> Couldn't that be expressed as just: > >>> > >>> if (sme_me_mask & dma_dev_mask == sme_me_mask) > >> > >> Actually !=, but yes, it could have been done like that, I just didn't > >> think of it. > > > > I'm looking into generalizing the check to cover MKTME. 
> > > > Leaving off the Kconfig changes and moving the check to other file, > > doest > > the change below look reasonable to you. It's only build tested so far. > > > > diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c > > index fece30ca8b0c..6c86adcd02da 100644 > > --- a/arch/x86/mm/mem_encrypt.c > > +++ b/arch/x86/mm/mem_encrypt.c > > @@ -355,6 +355,8 @@ EXPORT_SYMBOL(sev_active); > > /* Override for DMA direct allocation check - > > ARCH_HAS_FORCE_DMA_UNENCRYPTED */ > > bool force_dma_unencrypted(struct device *dev) > > { > > + u64 dma_enc_mask; > > + > > /* > > * For SEV, all DMA must be to unencrypted addresses. > > */ > > @@ -362,18 +364,20 @@ bool force_dma_unencrypted(struct device *dev) > > return true; > > > > /* > > -* For SME, all DMA must be to unencrypted addresses if the > > -* device does not support DMA to addresses that include the > > -* encryption mask. > > +* For SME and MKTME, all DMA must be to unencrypted addresses if the > > +* device does not support DMA to addresses that include the encryption > > +* mask. > > */ > > - if (sme_active()) { > > - u64 dma_enc_mask = DMA_BIT_MASK(__ffs64(sme_me_mask)); > > - u64 dma_dev_mask = min_not_zero(dev->coherent_dma_mask, > > - dev->bus_dma_mask); > > + if (!sme_active() && !mktme_enabled()) > > + return false; > > > > - if (dma_dev_mask <= dma_enc_mask) > > - return true; > > - } > > + dma_enc_mask = sme_me_mask | mktme_keyid_mask(); > > + > > + if (dev->coherent_dma_mask && (dev->coherent_dma_mask & dma_enc_mask) > > != dma_enc_mask) > > + return true; > > + > > + if (dev->bus_dma_mask && (dev->bus_dma_mask & dma_enc_mask) != > > dma_enc_mask) > > + return true; > > Do you want to err on the side of caution and return true if both masks > are zero? You could do the min_not_zero step and then return true if the > result is zero. Then just make the one comparison against dma_enc_mask. Something like this? 
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c index fece30ca8b0c..173d68b08c55 100644 --- a/arch/x86/mm/mem_encrypt.c +++ b/arch/x86/mm/mem_encrypt.c @@ -355,6 +355,8 @@ EXPORT_SYMBOL(sev_active); /* Override for DMA direct allocation check - ARCH_HAS_FORCE_DMA_UNENCRYPTED */ bool force_dma_unencrypted(struct device *dev) { + u64 dma_enc_mask, dma_dev_mask; + /* * For SEV, all DMA must be to unencrypted addresses. */ @@ -362,20 +364,17 @@ bool force_dma_unencrypted(struct device *dev) return true; /* -* For SME, all DMA must be to unencrypted addresses if the -* device does
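As an editorial aside, the shape Tom suggests (reduce the two device masks with min_not_zero(), err on the side of caution when both are zero, then make a single comparison against the combined encryption mask) is easy to check outside the kernel. Everything below is an illustrative stand-in, not the real kernel helpers; the C-bit position (47) and the helper names are assumptions for the sketch:

```c
#include <assert.h>
#include <stdint.h>

#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Assumed C-bit position, matching the bit-47 example in the thread. */
static const uint64_t sme_me_mask_demo = 1ULL << 47;

/* Stand-in for the kernel's min_not_zero() macro. */
static uint64_t min_not_zero_u64(uint64_t a, uint64_t b)
{
	if (!a)
		return b;
	if (!b)
		return a;
	return a < b ? a : b;
}

/* Sketch of the suggested check: combine the encryption mask(s), reduce
 * the two device masks with min_not_zero(), treat an all-zero result
 * conservatively, and compare once at the end. */
static int force_dma_unencrypted_sketch(uint64_t coherent_dma_mask,
					uint64_t bus_dma_mask)
{
	uint64_t dma_enc_mask = sme_me_mask_demo; /* | mktme_keyid_mask() */
	uint64_t dma_dev_mask = min_not_zero_u64(coherent_dma_mask,
						 bus_dma_mask);

	if (dma_dev_mask == 0)	/* no mask set at all: be conservative */
		return 1;

	return (dma_dev_mask & dma_enc_mask) != dma_enc_mask;
}
```

A 48-bit device mask covers bit 47 and so passes; a 47-bit mask does not, and a device with no mask at all is forced to unencrypted DMA.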
Re: [PATCH] dma-direct: Force unencrypted DMA under SME for certain DMA masks
On 7/24/19 1:11 PM, Kirill A. Shutemov wrote: > On Wed, Jul 24, 2019 at 05:34:26PM +, Lendacky, Thomas wrote: >> On 7/24/19 12:06 PM, Robin Murphy wrote: >>> On 24/07/2019 17:42, Lendacky, Thomas wrote: On 7/24/19 10:55 AM, Kirill A. Shutemov wrote: > On Wed, Jul 10, 2019 at 07:01:19PM +, Lendacky, Thomas wrote: >> @@ -351,6 +355,32 @@ bool sev_active(void) >> } >> EXPORT_SYMBOL(sev_active); >> +/* Override for DMA direct allocation check - >> ARCH_HAS_FORCE_DMA_UNENCRYPTED */ >> +bool force_dma_unencrypted(struct device *dev) >> +{ >> + /* >> + * For SEV, all DMA must be to unencrypted addresses. >> + */ >> + if (sev_active()) >> + return true; >> + >> + /* >> + * For SME, all DMA must be to unencrypted addresses if the >> + * device does not support DMA to addresses that include the >> + * encryption mask. >> + */ >> + if (sme_active()) { >> + u64 dma_enc_mask = DMA_BIT_MASK(__ffs64(sme_me_mask)); >> + u64 dma_dev_mask = min_not_zero(dev->coherent_dma_mask, >> + dev->bus_dma_mask); >> + >> + if (dma_dev_mask <= dma_enc_mask) >> + return true; > > Hm. What is wrong with the dev mask being equal to enc mask? IIUC, it > means that device mask is wide enough to cover encryption bit, doesn't it? Not really... it's the way DMA_BIT_MASK works vs bit numbering. Let's say that sme_me_mask has bit 47 set. __ffs64 returns 47 and DMA_BIT_MASK(47) will generate a mask without bit 47 set (0x7fff). So the check will catch anything that does not support at least 48-bit DMA. >>> >>> Couldn't that be expressed as just: >>> >>> if (sme_me_mask & dma_dev_mask == sme_me_mask) >> >> Actually !=, but yes, it could have been done like that, I just didn't >> think of it. > > I'm looking into generalizing the check to cover MKTME. > > Leaving off the Kconfig changes and moving the check to other file, > doest > the change below look reasonable to you. It's only build tested so far. 
> > diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c > index fece30ca8b0c..6c86adcd02da 100644 > --- a/arch/x86/mm/mem_encrypt.c > +++ b/arch/x86/mm/mem_encrypt.c > @@ -355,6 +355,8 @@ EXPORT_SYMBOL(sev_active); > /* Override for DMA direct allocation check - ARCH_HAS_FORCE_DMA_UNENCRYPTED > */ > bool force_dma_unencrypted(struct device *dev) > { > + u64 dma_enc_mask; > + > /* >* For SEV, all DMA must be to unencrypted addresses. >*/ > @@ -362,18 +364,20 @@ bool force_dma_unencrypted(struct device *dev) > return true; > > /* > - * For SME, all DMA must be to unencrypted addresses if the > - * device does not support DMA to addresses that include the > - * encryption mask. > + * For SME and MKTME, all DMA must be to unencrypted addresses if the > + * device does not support DMA to addresses that include the encryption > + * mask. >*/ > - if (sme_active()) { > - u64 dma_enc_mask = DMA_BIT_MASK(__ffs64(sme_me_mask)); > - u64 dma_dev_mask = min_not_zero(dev->coherent_dma_mask, > - dev->bus_dma_mask); > + if (!sme_active() && !mktme_enabled()) > + return false; > > - if (dma_dev_mask <= dma_enc_mask) > - return true; > - } > + dma_enc_mask = sme_me_mask | mktme_keyid_mask(); > + > + if (dev->coherent_dma_mask && (dev->coherent_dma_mask & dma_enc_mask) > != dma_enc_mask) > + return true; > + > + if (dev->bus_dma_mask && (dev->bus_dma_mask & dma_enc_mask) != > dma_enc_mask) > + return true; Do you want to err on the side of caution and return true if both masks are zero? You could do the min_not_zero step and then return true if the result is zero. Then just make the one comparison against dma_enc_mask. Thanks, Tom > > return false; > } > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] dma-direct: Force unencrypted DMA under SME for certain DMA masks
On Wed, Jul 24, 2019 at 05:34:26PM +, Lendacky, Thomas wrote: > On 7/24/19 12:06 PM, Robin Murphy wrote: > > On 24/07/2019 17:42, Lendacky, Thomas wrote: > >> On 7/24/19 10:55 AM, Kirill A. Shutemov wrote: > >>> On Wed, Jul 10, 2019 at 07:01:19PM +, Lendacky, Thomas wrote: > @@ -351,6 +355,32 @@ bool sev_active(void) > } > EXPORT_SYMBOL(sev_active); > +/* Override for DMA direct allocation check - > ARCH_HAS_FORCE_DMA_UNENCRYPTED */ > +bool force_dma_unencrypted(struct device *dev) > +{ > + /* > + * For SEV, all DMA must be to unencrypted addresses. > + */ > + if (sev_active()) > + return true; > + > + /* > + * For SME, all DMA must be to unencrypted addresses if the > + * device does not support DMA to addresses that include the > + * encryption mask. > + */ > + if (sme_active()) { > + u64 dma_enc_mask = DMA_BIT_MASK(__ffs64(sme_me_mask)); > + u64 dma_dev_mask = min_not_zero(dev->coherent_dma_mask, > + dev->bus_dma_mask); > + > + if (dma_dev_mask <= dma_enc_mask) > + return true; > >>> > >>> Hm. What is wrong with the dev mask being equal to enc mask? IIUC, it > >>> means that device mask is wide enough to cover encryption bit, doesn't it? > >> > >> Not really... it's the way DMA_BIT_MASK works vs bit numbering. Let's say > >> that sme_me_mask has bit 47 set. __ffs64 returns 47 and DMA_BIT_MASK(47) > >> will generate a mask without bit 47 set (0x7fff). So the check > >> will catch anything that does not support at least 48-bit DMA. > > > > Couldn't that be expressed as just: > > > > if (sme_me_mask & dma_dev_mask == sme_me_mask) > > Actually !=, but yes, it could have been done like that, I just didn't > think of it. I'm looking into generalizing the check to cover MKTME. Leaving off the Kconfig changes and moving the check to other file, doest the change below look reasonable to you. It's only build tested so far. 
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c index fece30ca8b0c..6c86adcd02da 100644 --- a/arch/x86/mm/mem_encrypt.c +++ b/arch/x86/mm/mem_encrypt.c @@ -355,6 +355,8 @@ EXPORT_SYMBOL(sev_active); /* Override for DMA direct allocation check - ARCH_HAS_FORCE_DMA_UNENCRYPTED */ bool force_dma_unencrypted(struct device *dev) { + u64 dma_enc_mask; + /* * For SEV, all DMA must be to unencrypted addresses. */ @@ -362,18 +364,20 @@ bool force_dma_unencrypted(struct device *dev) return true; /* -* For SME, all DMA must be to unencrypted addresses if the -* device does not support DMA to addresses that include the -* encryption mask. +* For SME and MKTME, all DMA must be to unencrypted addresses if the +* device does not support DMA to addresses that include the encryption +* mask. */ - if (sme_active()) { - u64 dma_enc_mask = DMA_BIT_MASK(__ffs64(sme_me_mask)); - u64 dma_dev_mask = min_not_zero(dev->coherent_dma_mask, - dev->bus_dma_mask); + if (!sme_active() && !mktme_enabled()) + return false; - if (dma_dev_mask <= dma_enc_mask) - return true; - } + dma_enc_mask = sme_me_mask | mktme_keyid_mask(); + + if (dev->coherent_dma_mask && (dev->coherent_dma_mask & dma_enc_mask) != dma_enc_mask) + return true; + + if (dev->bus_dma_mask && (dev->bus_dma_mask & dma_enc_mask) != dma_enc_mask) + return true; return false; } -- Kirill A. Shutemov
Re: [PATCH 2/2] arm: use swiotlb for bounce buffer on LPAE configs
On Wed, Jul 24, 2019 at 07:23:50PM +0200, Nicolas Saenz Julienne wrote: > Out of curiosity, what is the reason stopping us from using dma-direct/swiotlb > instead of arm_dma_ops altogether? Nothing fundamental. We just need to do a very careful piecemeal migration, as the arm code handles a lot of interesting corner cases and we need to ensure we don't break them. I have various WIP patches for the easier bits and we can work from there.
Re: [PATCH] dma-direct: Force unencrypted DMA under SME for certain DMA masks
On 7/24/19 12:06 PM, Robin Murphy wrote: > On 24/07/2019 17:42, Lendacky, Thomas wrote: >> On 7/24/19 10:55 AM, Kirill A. Shutemov wrote: >>> On Wed, Jul 10, 2019 at 07:01:19PM +, Lendacky, Thomas wrote: @@ -351,6 +355,32 @@ bool sev_active(void) } EXPORT_SYMBOL(sev_active); +/* Override for DMA direct allocation check - ARCH_HAS_FORCE_DMA_UNENCRYPTED */ +bool force_dma_unencrypted(struct device *dev) +{ + /* + * For SEV, all DMA must be to unencrypted addresses. + */ + if (sev_active()) + return true; + + /* + * For SME, all DMA must be to unencrypted addresses if the + * device does not support DMA to addresses that include the + * encryption mask. + */ + if (sme_active()) { + u64 dma_enc_mask = DMA_BIT_MASK(__ffs64(sme_me_mask)); + u64 dma_dev_mask = min_not_zero(dev->coherent_dma_mask, + dev->bus_dma_mask); + + if (dma_dev_mask <= dma_enc_mask) + return true; >>> >>> Hm. What is wrong with the dev mask being equal to enc mask? IIUC, it >>> means that device mask is wide enough to cover encryption bit, doesn't it? >> >> Not really... it's the way DMA_BIT_MASK works vs bit numbering. Let's say >> that sme_me_mask has bit 47 set. __ffs64 returns 47 and DMA_BIT_MASK(47) >> will generate a mask without bit 47 set (0x7fff). So the check >> will catch anything that does not support at least 48-bit DMA. > > Couldn't that be expressed as just: > > if (sme_me_mask & dma_dev_mask == sme_me_mask) Actually !=, but yes, it could have been done like that, I just didn't think of it. Thanks, Tom > > ? > > Robin. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/2] arm: use swiotlb for bounce buffer on LPAE configs
On Tue, 2019-07-09 at 07:20 -0700, Christoph Hellwig wrote: > The DMA API requires that 32-bit DMA masks are always supported, but on > arm LPAE configs they do not currently work when memory is present > above 4GB. Wire up the swiotlb code like for all other architectures > to provide the bounce buffering in that case. > > Fixes: 21e07dba9fb11 ("scsi: reduce use of block bounce buffers"). > Reported-by: Roger Quadros > Signed-off-by: Christoph Hellwig > --- Hi Christoph, Out of curiosity, what is the reason stopping us from using dma-direct/swiotlb instead of arm_dma_ops altogether? Regards, Nicolas
Re: [GIT PULL] dma-mapping fix for 5.3
The pull request you sent on Wed, 24 Jul 2019 16:49:42 +0200: > git://git.infradead.org/users/hch/dma-mapping.git tags/dma-mapping-5.3-2 has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/c2626876c24fe1f326381e3f1d48301bfc627d8e Thank you! -- Deet-doot-dot, I am a bot. https://korg.wiki.kernel.org/userdoc/prtracker
Re: [PATCH] dma-direct: Force unencrypted DMA under SME for certain DMA masks
On 24/07/2019 17:42, Lendacky, Thomas wrote: On 7/24/19 10:55 AM, Kirill A. Shutemov wrote: On Wed, Jul 10, 2019 at 07:01:19PM +, Lendacky, Thomas wrote: @@ -351,6 +355,32 @@ bool sev_active(void) } EXPORT_SYMBOL(sev_active); +/* Override for DMA direct allocation check - ARCH_HAS_FORCE_DMA_UNENCRYPTED */ +bool force_dma_unencrypted(struct device *dev) +{ + /* +* For SEV, all DMA must be to unencrypted addresses. +*/ + if (sev_active()) + return true; + + /* +* For SME, all DMA must be to unencrypted addresses if the +* device does not support DMA to addresses that include the +* encryption mask. +*/ + if (sme_active()) { + u64 dma_enc_mask = DMA_BIT_MASK(__ffs64(sme_me_mask)); + u64 dma_dev_mask = min_not_zero(dev->coherent_dma_mask, + dev->bus_dma_mask); + + if (dma_dev_mask <= dma_enc_mask) + return true; Hm. What is wrong with the dev mask being equal to enc mask? IIUC, it means that device mask is wide enough to cover encryption bit, doesn't it? Not really... it's the way DMA_BIT_MASK works vs bit numbering. Let's say that sme_me_mask has bit 47 set. __ffs64 returns 47 and DMA_BIT_MASK(47) will generate a mask without bit 47 set (0x7fff). So the check will catch anything that does not support at least 48-bit DMA. Couldn't that be expressed as just: if (sme_me_mask & dma_dev_mask == sme_me_mask) ? Robin. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] dma-direct: Force unencrypted DMA under SME for certain DMA masks
On 7/24/19 10:55 AM, Kirill A. Shutemov wrote: > On Wed, Jul 10, 2019 at 07:01:19PM +, Lendacky, Thomas wrote: >> @@ -351,6 +355,32 @@ bool sev_active(void) >> } >> EXPORT_SYMBOL(sev_active); >> >> +/* Override for DMA direct allocation check - >> ARCH_HAS_FORCE_DMA_UNENCRYPTED */ >> +bool force_dma_unencrypted(struct device *dev) >> +{ >> +/* >> + * For SEV, all DMA must be to unencrypted addresses. >> + */ >> +if (sev_active()) >> +return true; >> + >> +/* >> + * For SME, all DMA must be to unencrypted addresses if the >> + * device does not support DMA to addresses that include the >> + * encryption mask. >> + */ >> +if (sme_active()) { >> +u64 dma_enc_mask = DMA_BIT_MASK(__ffs64(sme_me_mask)); >> +u64 dma_dev_mask = min_not_zero(dev->coherent_dma_mask, >> +dev->bus_dma_mask); >> + >> +if (dma_dev_mask <= dma_enc_mask) >> +return true; > > Hm. What is wrong with the dev mask being equal to enc mask? IIUC, it > means that device mask is wide enough to cover encryption bit, doesn't it? Not really... it's the way DMA_BIT_MASK works vs bit numbering. Let's say that sme_me_mask has bit 47 set. __ffs64 returns 47 and DMA_BIT_MASK(47) will generate a mask without bit 47 set (0x7fff). So the check will catch anything that does not support at least 48-bit DMA. Thanks, Tom > >> +} >> + >> +return false; >> +} > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
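Tom's point about DMA_BIT_MASK() versus bit numbering is easy to verify in a few lines of userspace C. The helpers below are simplified stand-ins for the kernel macros (DMA_BIT_MASK reimplemented, a GCC builtin substituted for __ffs64), with the C-bit assumed at position 47 as in the example:

```c
#include <assert.h>
#include <stdint.h>

#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Illustrative stand-in for the kernel's __ffs64(): index of the
 * lowest set bit (the argument is assumed non-zero). */
static unsigned int ffs64_demo(uint64_t v)
{
	return (unsigned int)__builtin_ctzll(v);
}

/* The SME branch of force_dma_unencrypted(), reduced to its mask
 * arithmetic, with the C-bit assumed at position 47. */
static int needs_unencrypted_dma(uint64_t dma_dev_mask)
{
	uint64_t sme_me_mask = 1ULL << 47;
	uint64_t dma_enc_mask = DMA_BIT_MASK(ffs64_demo(sme_me_mask));

	/* dma_enc_mask is 0x00007fffffffffff: bits 0-46 set, bit 47
	 * clear, so any device mask <= this cannot address bit 47. */
	return dma_dev_mask <= dma_enc_mask;
}
```

A 47-bit device mask compares less-than-or-equal to the encryption cutoff and is forced unencrypted, while 48-bit (or wider) masks pass, which is exactly the behaviour described above. Robin's `(sme_me_mask & dma_dev_mask) != sme_me_mask` formulation gives the same answer for power-of-two-minus-one masks.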
Re: [PATCH v2] iommu: add support for drivers that manage iommu explicitly
On Wed, Jul 24, 2019 at 3:51 AM Will Deacon wrote: > > On Tue, Jul 23, 2019 at 10:40:55AM -0700, Rob Clark wrote: > > On Tue, Jul 23, 2019 at 8:38 AM Will Deacon wrote: > > > > > > On Mon, Jul 22, 2019 at 09:23:48AM -0700, Rob Clark wrote: > > > > On Mon, Jul 22, 2019 at 8:48 AM Joerg Roedel wrote: > > > > > > > > > > On Mon, Jul 22, 2019 at 08:41:34AM -0700, Rob Clark wrote: > > > > > > It is set by the driver: > > > > > > > > > > > > https://patchwork.freedesktop.org/patch/315291/ > > > > > > > > > > > > (This doesn't really belong in devicetree, since it isn't a > > > > > > description of the hardware, so the driver is really the only place > > > > > > to > > > > > > set this.. which is fine because it is about a detail of how the > > > > > > driver works.) > > > > > > > > > > It is more a detail about how the firmware works. IIUC the problem is > > > > > that the firmware initializes the context mappings for the GPU and the > > > > > OS doesn't know anything about that and just overwrites them, causing > > > > > the firmware GPU driver to fail badly. > > > > > > > > > > So I think it is the task of the firmware to tell the OS not to touch > > > > > the devices mappings until the OS device driver takes over. On x86 > > > > > there > > > > > is something similar with the RMRR/unity-map tables from the firmware. > > > > > > > > > > > > > Bjorn had a patchset[1] to inherit the config from firmware/bootloader > > > > when arm-smmu is probed which handles that part of the problem. My > > > > patch is intended to be used on top of his patchset. This seems to me > > > > like the best solution, if we don't have control over the firmware. > > > > > > Hmm, but the feedback from Robin on the thread you cite was that this > > > should > > > be generalised to look more like RMRR, so there seems to be a clear > > > message > > > here. 
> > > > > > > Perhaps it is a lack of creativity, or lack of familiarity w/ iommu vs > > virtualization, but I'm not quite seeing how RMRR would help.. in > > particular when dealing with both DT and ACPI cases. > > Well, I suppose we'd have something for DT and something for ACPI and we'd > try to make them look similar enough that we don't need lots of divergent > code in the kernel. The RMRR-like description would describe that, for a > particular device, a specific virt:phys mapping needs to exist in the > small window between initialising the SMMU and re-initialising the device > (GPU). For both DT and ACPI (or perhaps more accurately UEFI and non-UEFI) we often don't have much/any control of the firmware. In the UEFI case, we can at least get the physical address of the scanout buffer from EFI GOP (since VA=PA at that point). In either case we could get the iova by reading back controller state. We kinda just need the iommu driver to not change anything about the context bank the display is using until the display driver is ready to install it's own pagetables. Initially I just want to shut down display, and then bring it back up w/ my own pagetables.. but eventually, once iommu/clk/genpd issues are sorted upstream, I'm planning to read out more completely the display state, and remap the existing scanout buffer at same iova in my own pagetables/iommu_domain, for a flicker-free display handover... that is a lot more work but at least it is self contained in the display (and bridge/panel) drivers. > I would prefer this to be framebuffer-specific, since we're already in > flagrant violation of the arm64 boot requirements wrt ongoing DMA and making > this too general could lead to all sorts of brain damage. That would > probably also allow us to limit the flexibility, by mandating things like > alignment and memory attributes. I'd be pretty happy if we could convince qcom to use EFI GOP on android devices too.. 
Although there is a lot more activity these days with people bringing upstream support to various existing android phones/tablets.. and I'd like to see that continue without downstream hacks due to lack of control over firmware. > Having said that, I just realised I'm still a bit confused about the > problem: why does the bootloader require SMMU translation for the GPU at > all? If we could limit this whole thing to be identity-mapped/bypassed, > then all we need is a per-device flag in the firmware to indicate that the > SMMU should be initialised to allow passthrough for transactions originating > from that device. I was chatting last night w/ Bjorn on IRC.. and he mentioned that it looked like TTBRn was 0x0. Which I wasn't expecting, and I didn't realize was a legit thing. Maybe the purpose is to allow display to access memory w/ iova==pa but disallow memory access from other devices using different context banks of the same iommu? Maybe this makes more sense to you? > > So I kinda prefer, when possible, if arm-smmu can figure out what is going > > on by looking at the hw state at boot (since that approach would work > > equally well for DT and ACPI).
Re: [PATCH] dma-direct: Force unencrypted DMA under SME for certain DMA masks
On Wed, Jul 10, 2019 at 07:01:19PM +, Lendacky, Thomas wrote: > @@ -351,6 +355,32 @@ bool sev_active(void) > } > EXPORT_SYMBOL(sev_active); > > +/* Override for DMA direct allocation check - ARCH_HAS_FORCE_DMA_UNENCRYPTED > */ > +bool force_dma_unencrypted(struct device *dev) > +{ > + /* > + * For SEV, all DMA must be to unencrypted addresses. > + */ > + if (sev_active()) > + return true; > + > + /* > + * For SME, all DMA must be to unencrypted addresses if the > + * device does not support DMA to addresses that include the > + * encryption mask. > + */ > + if (sme_active()) { > + u64 dma_enc_mask = DMA_BIT_MASK(__ffs64(sme_me_mask)); > + u64 dma_dev_mask = min_not_zero(dev->coherent_dma_mask, > + dev->bus_dma_mask); > + > + if (dma_dev_mask <= dma_enc_mask) > + return true; Hm. What is wrong with the dev mask being equal to enc mask? IIUC, it means that device mask is wide enough to cover encryption bit, doesn't it? > + } > + > + return false; > +} -- Kirill A. Shutemov
Re: add swiotlb support to arm32
FYI, I've tentatively pulled the branch into the dma-mapping for-next branch so that we get some linux-next exposure.
[GIT PULL] dma-mapping fix for 5.3
The following changes since commit 7b5cf701ea9c395c792e2a7e3b7caf4c68b87721: Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip (2019-07-22 09:30:34 -0700) are available in the Git repository at: git://git.infradead.org/users/hch/dma-mapping.git tags/dma-mapping-5.3-2 for you to fetch changes up to 06532750010e06dd4b6d69983773677df7fc5291: dma-mapping: use dma_get_mask in dma_addressing_limited (2019-07-23 17:43:58 +0200) dma-mapping regression fix for 5.3 - ensure that dma_addressing_limited doesn't crash on devices without a dma mask (Eric Auger) Eric Auger (1): dma-mapping: use dma_get_mask in dma_addressing_limited include/linux/dma-mapping.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
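For context, the fix being pulled makes dma_addressing_limited() safe on devices without a DMA mask by going through dma_get_mask(), which falls back to a 32-bit mask instead of dereferencing a possibly-NULL pointer. A simplified userspace model (assumed shapes, not the exact kernel code) behaves like this:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Minimal stand-in for struct device's DMA fields. */
struct device_demo {
	uint64_t *dma_mask;	/* may be NULL on some buses */
	uint64_t bus_dma_mask;
};

/* dma_get_mask() falls back to a 32-bit mask when no mask is set;
 * this fallback is what makes the fixed helper NULL-safe. */
static uint64_t dma_get_mask_demo(struct device_demo *dev)
{
	if (dev->dma_mask && *dev->dma_mask)
		return *dev->dma_mask;
	return DMA_BIT_MASK(32);
}

static uint64_t min_not_zero_u64(uint64_t a, uint64_t b)
{
	if (!a)
		return b;
	if (!b)
		return a;
	return a < b ? a : b;
}

/* Simplified dma_addressing_limited(): true when the usable mask is
 * narrower than what the platform requires (required mask assumed
 * to be passed in here for illustration). */
static int dma_addressing_limited_demo(struct device_demo *dev,
				       uint64_t required_mask)
{
	return min_not_zero_u64(dma_get_mask_demo(dev),
				dev->bus_dma_mask) < required_mask;
}
```

With the old `*dev->dma_mask` dereference, a device that never set a mask would crash here; with the fallback it is simply reported as addressing-limited.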
Re: [RFC PATCH v2 00/19] Try to reduce lock contention on the SMMUv3 command queue
On Wed, Jul 24, 2019 at 03:25:07PM +0100, John Garry wrote: > On 24/07/2019 13:20, Will Deacon wrote: > > On Wed, Jul 24, 2019 at 10:58:26AM +0100, John Garry wrote: > > > On 11/07/2019 18:19, Will Deacon wrote: > > > > This is a significant rework of the RFC I previously posted here: > > > > > > > > https://lkml.kernel.org/r/20190611134603.4253-1-will.dea...@arm.com > > > > > > > > But this time, it looks like it might actually be worthwhile according > > > > to my perf profiles, where __iommu_unmap() falls a long way down the > > > > profile for a multi-threaded netperf run. I'm still relying on others to > > > > confirm this is useful, however. > > > > > > > > Some of the changes since last time are: > > > > > > > > * Support for constructing and submitting a list of commands in the > > > > driver > > > > > > > > * Numerous changes to the IOMMU and io-pgtable APIs so that we can > > > > submit commands in batches > > > > > > > > * Removal of cmpxchg() from cmdq_shared_lock() fast-path > > > > > > > > * Code restructuring and cleanups > > > > > > > > This current applies against my iommu/devel branch that Joerg has pulled > > > > for 5.3. If you want to test it out, I've put everything here: > > > > > > > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=iommu/cmdq > > > > > > > > Feedback welcome. I appreciate that we're in the merge window, but I > > > > wanted to get this on the list for people to look at as an RFC. > > > > > > > > > > I tested storage performance on this series, which I think is a better > > > scenario to test than network performance, that being generally limited by > > > the network link speed. > > > > Interesting, thanks for sharing. Do you also see a similar drop in CPU time > > to the one reported by Ganapat? > > Not really, CPU load reported by fio is mostly the same. That's a pity. Maybe the cmdq isn't actually getting hit very heavily by fio. 
> > > Baseline performance (will/iommu/devel, commit 9e6ea59f3) > > > 8x SAS disks D05 839K IOPS > > > 1x NVMe D05 454K IOPS > > > 1x NVMe D06 442k IOPS > > > > > > Patchset performance (will/iommu/cmdq) > > > 8x SAS disk D05 835K IOPS > > > 1x NVMe D05 472K IOPS > > > 1x NVMe D06 459k IOPS > > > > > > So we see a bit of an NVMe boost, but about the same for 8x disks. No > > > iommu > > > performance is about 918K IOPs for 8x disks, so it is not limited by the > > > medium. > > > > It would be nice to know if this performance gap is because of Linux, or > > simply because of the translation overhead in the SMMU hardware. Are you > > able to get a perf profile to see where we're spending time? > > I'll look to do that, but I'd really expect it to be down to the time linux > spends on the DMA map and unmaps. Right, and it would be good to see how much of that is in SMMUv3-specific code. Another interesting thing to try would be reducing the depth of the io-pgtable. We currently key that off VA_BITS which may be much larger than you need (by virtue of being a compile-time value). Will ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC PATCH v2 18/19] iommu/arm-smmu-v3: Reduce contention during command-queue insertion
Hi again, John, On Wed, Jul 24, 2019 at 09:20:49AM +0100, John Garry wrote: > On 11/07/2019 18:19, Will Deacon wrote: > > +static int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu, > > + u64 *cmds, int n, bool sync) > > +{ > > + u64 cmd_sync[CMDQ_ENT_DWORDS]; > > + u32 prod; > > unsigned long flags; > > - bool wfe = !!(smmu->features & ARM_SMMU_FEAT_SEV); > > - struct arm_smmu_cmdq_ent ent = { .opcode = CMDQ_OP_CMD_SYNC }; > > - int ret; > > + bool owner; > > + struct arm_smmu_cmdq *cmdq = &smmu->cmdq; > > + struct arm_smmu_ll_queue llq = { > > + .max_n_shift = cmdq->q.llq.max_n_shift, > > + }, head = llq; > > + int ret = 0; > > > > - arm_smmu_cmdq_build_cmd(cmd, &ent); > > + /* 1. Allocate some space in the queue */ > > + local_irq_save(flags); > > + llq.val = READ_ONCE(cmdq->q.llq.val); > > + do { > > + u64 old; > > + > > + while (!queue_has_space(&llq, n + sync)) { > > + local_irq_restore(flags); > > + if (arm_smmu_cmdq_poll_until_not_full(smmu, &llq)) > > + dev_err_ratelimited(smmu->dev, "CMDQ > > timeout\n"); > > + local_irq_save(flags); > > + } > > + > > + head.cons = llq.cons; > > + head.prod = queue_inc_prod_n(&llq, n + sync) | > > +CMDQ_PROD_OWNED_FLAG; > > + > > + old = cmpxchg_relaxed(&cmdq->q.llq.val, llq.val, head.val); > > I added some basic debug to the stress test on your branch, and this cmpxchg > was failing ~10 times on average on my D06. > > So we're not using the spinlock now, but this cmpxchg may lack fairness. It definitely lacks fairness, although that's going to be the case in many other places where locking is elided as well. If it shows up as an issue, we should try to address it, but queueing means serialisation and that largely defeats the point of this patch. I also don't expect it to be a problem unless the cmpxchg() is heavily contended, which shouldn't be the case if we're batching. 
> Since we're batching commands, I wonder if it's better to restore the > spinlock and send batched commands + CMD_SYNC under the lock, and then wait > for the CMD_SYNC completion outside the lock. Again, we'd need some numbers, but my concern with that approach is that we're serialising CPUs, which is what I've been trying hard to avoid. It also doesn't let you straightforwardly remove the cmpxchg() loop, since the owner clears the OWNED flag and you probably wouldn't want to hold the lock for that. > I don't know if it improves the queue contention, but at least the prod > pointer would more closely track the issued commands, such that we're not > waiting to kick off many gathered batches of commands, while the SMMU HW may > be idle (in terms of command processing). Again, we're probably going to need some numbers to change this, although it sounds like your other suggestion about having the owner move prod twice would largely address your concerns. Reintroducing the lock, on the other hand, feels like a big step backwards to me, and the whole reason I started down the current route was because of vague claims that the locking was a problem for large systems. Thanks, Will
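For readers following along, the cmpxchg() loop under discussion boils down to atomically reserving `n` slots' worth of prod space. A stripped-down, single-allocator model (queue wrap handling and the OWNED flag omitted, names invented for the sketch) looks roughly like this:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define QUEUE_SLOTS 256u	/* illustrative queue depth */

static _Atomic uint32_t cmdq_prod;	/* stand-in for cmdq->q.llq.prod */
static uint32_t cmdq_cons;		/* advanced by the "hardware" */

static int queue_has_space(uint32_t prod, uint32_t cons, uint32_t n)
{
	/* Unsigned subtraction handles index wrap-around. */
	return (prod - cons) + n <= QUEUE_SLOTS;
}

/* Reserve n slots; returns the prod index where the caller may write
 * its commands, or -1 if the queue stayed full. Multiple CPUs can run
 * this concurrently: a failed compare-exchange reloads the current
 * prod into 'old' and the loop retries. There is no fairness
 * guarantee, which is the trade-off noted in the thread. */
static long cmdq_reserve(uint32_t n)
{
	uint32_t old = atomic_load(&cmdq_prod);

	for (;;) {
		if (!queue_has_space(old, cmdq_cons, n))
			return -1;	/* real driver polls cons and retries */
		if (atomic_compare_exchange_weak(&cmdq_prod, &old, old + n))
			return (long)old;	/* we own [old, old + n) */
	}
}
```

Each winner of the compare-exchange owns a disjoint window of slots and can fill them without any lock, which is the property a spinlock-based batcher would give up.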
Re: [PATCH] media: staging: ipu3: Enable IOVA API only when IOMMU support is enabled
24.07.2019 17:23, Robin Murphy пишет: > On 24/07/2019 15:09, Dmitry Osipenko wrote: >> 24.07.2019 17:03, Yuehaibing пишет: >>> On 2019/7/24 21:49, Robin Murphy wrote: On 24/07/2019 11:30, Sakari Ailus wrote: > Hi Yue, > > On Mon, Jul 22, 2019 at 09:47:49PM +0800, YueHaibing wrote: >> If IOMMU_SUPPORT is not set, ipu3 driver may select IOMMU_IOVA to m. >> But for many drivers, they use "select IOMMU_IOVA if IOMMU_SUPPORT" >> in the Kconfig, for example, CONFIG_TEGRA_VDE is set to y but >> IOMMU_IOVA >> is m, then the building fails like this: >> >> drivers/staging/media/tegra-vde/iommu.o: In function >> `tegra_vde_iommu_map': >> iommu.c:(.text+0x41): undefined reference to `alloc_iova' >> iommu.c:(.text+0x56): undefined reference to `__free_iova' >> >> Reported-by: Hulk Robot >> Fixes: 7fc7af649ca7 ("media: staging/intel-ipu3: Add imgu top >> level pci device driver") >> Signed-off-by: YueHaibing >> --- >> drivers/staging/media/ipu3/Kconfig | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/drivers/staging/media/ipu3/Kconfig >> b/drivers/staging/media/ipu3/Kconfig >> index 4b51c67..b7df18f 100644 >> --- a/drivers/staging/media/ipu3/Kconfig >> +++ b/drivers/staging/media/ipu3/Kconfig >> @@ -4,7 +4,7 @@ config VIDEO_IPU3_IMGU >> depends on PCI && VIDEO_V4L2 >> depends on MEDIA_CONTROLLER && VIDEO_V4L2_SUBDEV_API >> depends on X86 >> - select IOMMU_IOVA >> + select IOMMU_IOVA if IOMMU_SUPPORT > > This doesn't seem right: the ipu3-cio2 driver needs IOMMU_IOVA > independently of IOMMU_SUPPORT. > > Looking at tegra-vde, it seems to depend on IOMMU_SUPPORT but > that's not > declared in its Kconfig entry. I wonder if adding that would be the > right > way to fix this. > > Cc'ing the IOMMU list. >> IOMMU_SUPPORT is optional for the Tegra-VDE driver. >> Right, I also had the impression that we'd made the IOVA library completely standalone. And what does the IPU3 driver's Kconfig have to do with some *other* driver failing to link anyway? 
>> >> I can see it failing if IPU3 is compiled as a loadable module, while >> Tegra-VDE is a built-in driver. Hence IOVA lib should be also a kernel >> module and thus the IOVA symbols will be missing during of linkage of >> the VDE driver. >> >>> Oh, I misunderstand that IOMMU_IOVA is depend on IOMMU_SUPPORT, thank >>> you for clarification. >>> >>> I will try to fix this in tegra-vde. >> >> Probably IOVA could be selected independently of IOMMU_SUPPORT, but IOVA >> library isn't needed for the VDE driver if IOMMU_SUPPORT is disabled. > > Oh, I think I get the problem now - tegra-vde/iommu.c is built > unconditionally and relies on the static inline stubs for IOMMU and IOVA > calls if !IOMMU_SUPPORT, but in a compile-test config where IOVA=m for > other reasons, it then picks up the real declarations from linux/iova.h > instead of the stubs, and things go downhill from there. So there is a > real issue, but indeed it's Tegra-VDE which needs to be restructured to > cope with such configurations, and not IPU3's (or anyone else who may > select IOVA=m in future) job to work around it. I guess it could be: select IOMMU_IOVA if (IOMMU_SUPPORT || COMPILE_TEST) as a workaround. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC 3/4] dma-direct: add dma_direct_min_mask
On Wed, 2019-07-24 at 15:56 +0200, Christoph Hellwig wrote:
> On Wed, Jul 24, 2019 at 02:51:24PM +0100, Catalin Marinas wrote:
> > I think it may be better if we have both ZONE_DMA and ZONE_DMA32 on
> > arm64. ZONE_DMA would be based on the smallest dma-ranges as described
> > in the DT while DMA32 covers the first naturally aligned 4GB of RAM
> > (unchanged). When a smaller ZONE_DMA is not needed, it could be
> > expanded to cover what would normally be ZONE_DMA32 (or could we have
> > ZONE_DMA as 0-bytes? I don't think GFP_DMA can still allocate memory
> > in this case).
> >
> > We'd probably have to define ARCH_ZONE_DMA_BITS for arm64 to something
> > smaller than 32-bit but sufficient to cover the known platforms like
> > RPi4 (the current 24 is too small, so maybe 30). AFAICT,
> > __dma_direct_optimal_gfp_mask() figures out whether GFP_DMA or
> > GFP_DMA32 should be passed.
>
> ARCH_ZONE_DMA_BITS should probably become a variable. That way we can
> just initialize it to the default 24 bits in kernel/dma/direct.c and
> allow architectures to override it in their early boot code.

Thanks both for your feedback. I'll start preparing a proper series.
Re: [RFC PATCH v2 00/19] Try to reduce lock contention on the SMMUv3 command queue
On 24/07/2019 13:20, Will Deacon wrote:
> On Wed, Jul 24, 2019 at 10:58:26AM +0100, John Garry wrote:
>> On 11/07/2019 18:19, Will Deacon wrote:
>>> This is a significant rework of the RFC I previously posted here:
>>>
>>>   https://lkml.kernel.org/r/20190611134603.4253-1-will.dea...@arm.com
>>>
>>> But this time, it looks like it might actually be worthwhile according
>>> to my perf profiles, where __iommu_unmap() falls a long way down the
>>> profile for a multi-threaded netperf run. I'm still relying on others
>>> to confirm this is useful, however.
>>>
>>> Some of the changes since last time are:
>>>
>>>   * Support for constructing and submitting a list of commands in the
>>>     driver
>>>
>>>   * Numerous changes to the IOMMU and io-pgtable APIs so that we can
>>>     submit commands in batches
>>>
>>>   * Removal of cmpxchg() from cmdq_shared_lock() fast-path
>>>
>>>   * Code restructuring and cleanups
>>>
>>> This currently applies against my iommu/devel branch that Joerg has
>>> pulled for 5.3. If you want to test it out, I've put everything here:
>>>
>>>   https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=iommu/cmdq
>>>
>>> Feedback welcome. I appreciate that we're in the merge window, but I
>>> wanted to get this on the list for people to look at as an RFC.
>>
>> I tested storage performance on this series, which I think is a better
>> scenario to test than network performance, that being generally limited
>> by the network link speed.
>
> Interesting, thanks for sharing. Do you also see a similar drop in CPU
> time to the one reported by Ganapat?

Not really, CPU load reported by fio is mostly the same.

>> Baseline performance (will/iommu/devel, commit 9e6ea59f3):
>>   8x SAS disks, D05:  839K IOPS
>>   1x NVMe, D05:       454K IOPS
>>   1x NVMe, D06:       442K IOPS
>>
>> Patchset performance (will/iommu/cmdq):
>>   8x SAS disks, D05:  835K IOPS
>>   1x NVMe, D05:       472K IOPS
>>   1x NVMe, D06:       459K IOPS
>>
>> So we see a bit of an NVMe boost, but about the same for 8x disks.
>> No-iommu performance is about 918K IOPS for 8x disks, so it is not
>> limited by the medium.
>
> It would be nice to know if this performance gap is because of Linux, or
> simply because of the translation overhead in the SMMU hardware. Are you
> able to get a perf profile to see where we're spending time?

I'll look to do that, but I'd really expect it to be down to the time
linux spends on the DMA map and unmaps.

Cheers,
john

> Thanks,
> Will
Re: [PATCH] media: staging: ipu3: Enable IOVA API only when IOMMU support is enabled
On 24/07/2019 15:09, Dmitry Osipenko wrote:
> 24.07.2019 17:03, Yuehaibing wrote:
>> On 2019/7/24 21:49, Robin Murphy wrote:
>>> On 24/07/2019 11:30, Sakari Ailus wrote:
>>>> Hi Yue,
>>>>
>>>> On Mon, Jul 22, 2019 at 09:47:49PM +0800, YueHaibing wrote:
>>>>> If IOMMU_SUPPORT is not set, ipu3 driver may select IOMMU_IOVA to m.
>>>>> But for many drivers, they use "select IOMMU_IOVA if IOMMU_SUPPORT"
>>>>> in the Kconfig, for example, CONFIG_TEGRA_VDE is set to y but
>>>>> IOMMU_IOVA is m, then the building fails like this:
>>>>>
>>>>> drivers/staging/media/tegra-vde/iommu.o: In function `tegra_vde_iommu_map':
>>>>> iommu.c:(.text+0x41): undefined reference to `alloc_iova'
>>>>> iommu.c:(.text+0x56): undefined reference to `__free_iova'
>>>>>
>>>>> Reported-by: Hulk Robot
>>>>> Fixes: 7fc7af649ca7 ("media: staging/intel-ipu3: Add imgu top
>>>>> level pci device driver")
>>>>> Signed-off-by: YueHaibing
>>>>> ---
>>>>>  drivers/staging/media/ipu3/Kconfig | 2 +-
>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/staging/media/ipu3/Kconfig b/drivers/staging/media/ipu3/Kconfig
>>>>> index 4b51c67..b7df18f 100644
>>>>> --- a/drivers/staging/media/ipu3/Kconfig
>>>>> +++ b/drivers/staging/media/ipu3/Kconfig
>>>>> @@ -4,7 +4,7 @@ config VIDEO_IPU3_IMGU
>>>>>  	depends on PCI && VIDEO_V4L2
>>>>>  	depends on MEDIA_CONTROLLER && VIDEO_V4L2_SUBDEV_API
>>>>>  	depends on X86
>>>>> -	select IOMMU_IOVA
>>>>> +	select IOMMU_IOVA if IOMMU_SUPPORT
>>>>
>>>> This doesn't seem right: the ipu3-cio2 driver needs IOMMU_IOVA
>>>> independently of IOMMU_SUPPORT.
>>>>
>>>> Looking at tegra-vde, it seems to depend on IOMMU_SUPPORT but that's
>>>> not declared in its Kconfig entry. I wonder if adding that would be
>>>> the right way to fix this.
>>>>
>>>> Cc'ing the IOMMU list.
>
> IOMMU_SUPPORT is optional for the Tegra-VDE driver.
>
>>> Right, I also had the impression that we'd made the IOVA library
>>> completely standalone. And what does the IPU3 driver's Kconfig have to
>>> do with some *other* driver failing to link anyway?
>
> I can see it failing if IPU3 is compiled as a loadable module, while
> Tegra-VDE is a built-in driver. Hence IOVA lib should be also a kernel
> module and thus the IOVA symbols will be missing during of linkage of
> the VDE driver.
>
>> Oh, I misunderstand that IOMMU_IOVA is depend on IOMMU_SUPPORT, thank
>> you for clarification.
>>
>> I will try to fix this in tegra-vde.
>
> Probably IOVA could be selected independently of IOMMU_SUPPORT, but IOVA
> library isn't needed for the VDE driver if IOMMU_SUPPORT is disabled.

Oh, I think I get the problem now - tegra-vde/iommu.c is built
unconditionally and relies on the static inline stubs for IOMMU and IOVA
calls if !IOMMU_SUPPORT, but in a compile-test config where IOVA=m for
other reasons, it then picks up the real declarations from linux/iova.h
instead of the stubs, and things go downhill from there. So there is a
real issue, but indeed it's Tegra-VDE which needs to be restructured to
cope with such configurations, and not IPU3's (or anyone else who may
select IOVA=m in future) job to work around it.

Robin.
Re: [PATCH] media: staging: ipu3: Enable IOVA API only when IOMMU support is enabled
24.07.2019 17:03, Yuehaibing wrote:
> On 2019/7/24 21:49, Robin Murphy wrote:
>> On 24/07/2019 11:30, Sakari Ailus wrote:
>>> Hi Yue,
>>>
>>> On Mon, Jul 22, 2019 at 09:47:49PM +0800, YueHaibing wrote:
>>>> If IOMMU_SUPPORT is not set, ipu3 driver may select IOMMU_IOVA to m.
>>>> But for many drivers, they use "select IOMMU_IOVA if IOMMU_SUPPORT"
>>>> in the Kconfig, for example, CONFIG_TEGRA_VDE is set to y but
>>>> IOMMU_IOVA is m, then the building fails like this:
>>>>
>>>> drivers/staging/media/tegra-vde/iommu.o: In function `tegra_vde_iommu_map':
>>>> iommu.c:(.text+0x41): undefined reference to `alloc_iova'
>>>> iommu.c:(.text+0x56): undefined reference to `__free_iova'
>>>>
>>>> Reported-by: Hulk Robot
>>>> Fixes: 7fc7af649ca7 ("media: staging/intel-ipu3: Add imgu top
>>>> level pci device driver")
>>>> Signed-off-by: YueHaibing
>>>> ---
>>>>  drivers/staging/media/ipu3/Kconfig | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/staging/media/ipu3/Kconfig b/drivers/staging/media/ipu3/Kconfig
>>>> index 4b51c67..b7df18f 100644
>>>> --- a/drivers/staging/media/ipu3/Kconfig
>>>> +++ b/drivers/staging/media/ipu3/Kconfig
>>>> @@ -4,7 +4,7 @@ config VIDEO_IPU3_IMGU
>>>>  	depends on PCI && VIDEO_V4L2
>>>>  	depends on MEDIA_CONTROLLER && VIDEO_V4L2_SUBDEV_API
>>>>  	depends on X86
>>>> -	select IOMMU_IOVA
>>>> +	select IOMMU_IOVA if IOMMU_SUPPORT
>>>
>>> This doesn't seem right: the ipu3-cio2 driver needs IOMMU_IOVA
>>> independently of IOMMU_SUPPORT.
>>>
>>> Looking at tegra-vde, it seems to depend on IOMMU_SUPPORT but that's
>>> not declared in its Kconfig entry. I wonder if adding that would be
>>> the right way to fix this.
>>>
>>> Cc'ing the IOMMU list.

IOMMU_SUPPORT is optional for the Tegra-VDE driver.

>> Right, I also had the impression that we'd made the IOVA library
>> completely standalone. And what does the IPU3 driver's Kconfig have to
>> do with some *other* driver failing to link anyway?

I can see it failing if IPU3 is compiled as a loadable module, while
Tegra-VDE is a built-in driver. Hence IOVA lib should be also a kernel
module and thus the IOVA symbols will be missing during of linkage of
the VDE driver.

> Oh, I misunderstand that IOMMU_IOVA is depend on IOMMU_SUPPORT, thank
> you for clarification.
>
> I will try to fix this in tegra-vde.

Probably IOVA could be selected independently of IOMMU_SUPPORT, but IOVA
library isn't needed for the VDE driver if IOMMU_SUPPORT is disabled.
Re: [RFC PATCH v2 18/19] iommu/arm-smmu-v3: Reduce contention during command-queue insertion
On Wed, Jul 24, 2019 at 03:03:20PM +0100, John Garry wrote:
> On 24/07/2019 13:15, Will Deacon wrote:
>>> Could it be a minor optimisation to advance the HW producer pointer at
>>> this stage for the owner only? We know that its entries are written,
>>> and it should be first in the new batch of commands (right?), so we
>>> could advance the pointer to at least get the HW started.
>>
>> I think that would be a valid thing to do, but it depends on the
>> relative cost of writing to prod compared to how long we're likely to
>> wait. Given that everybody has irqs disabled when writing out their
>> commands, I wouldn't expect the waiting to be a big issue,
>
> For sure, but I'm thinking of the possible scenario where the guy(s)
> we're waiting on have many more commands. Or they just joined the
> current gathering quite late, just prior to clearing the owner flag.

Understood, but a "cacheable" memcpy (assuming the SMMU is coherent)
should be pretty quick, even for maximum batch size I think.

>> although we could probably optimise arm_smmu_cmdq_write_entries() into
>> a memcpy() if we needed to.
>>
>> In other words, I think we need numbers to justify that change.
>
> Anyway, this is quite minor, and I will see if the change could be
> justified by numbers.

Thanks! If the numbers show it's useful, we can definitely add it.

Will
Re: [PATCH] media: staging: ipu3: Enable IOVA API only when IOMMU support is enabled
On 2019/7/24 21:49, Robin Murphy wrote: > On 24/07/2019 11:30, Sakari Ailus wrote: >> Hi Yue, >> >> On Mon, Jul 22, 2019 at 09:47:49PM +0800, YueHaibing wrote: >>> If IOMMU_SUPPORT is not set, ipu3 driver may select IOMMU_IOVA to m. >>> But for many drivers, they use "select IOMMU_IOVA if IOMMU_SUPPORT" >>> in the Kconfig, for example, CONFIG_TEGRA_VDE is set to y but IOMMU_IOVA >>> is m, then the building fails like this: >>> >>> drivers/staging/media/tegra-vde/iommu.o: In function `tegra_vde_iommu_map': >>> iommu.c:(.text+0x41): undefined reference to `alloc_iova' >>> iommu.c:(.text+0x56): undefined reference to `__free_iova' >>> >>> Reported-by: Hulk Robot >>> Fixes: 7fc7af649ca7 ("media: staging/intel-ipu3: Add imgu top level pci >>> device driver") >>> Signed-off-by: YueHaibing >>> --- >>> drivers/staging/media/ipu3/Kconfig | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/drivers/staging/media/ipu3/Kconfig >>> b/drivers/staging/media/ipu3/Kconfig >>> index 4b51c67..b7df18f 100644 >>> --- a/drivers/staging/media/ipu3/Kconfig >>> +++ b/drivers/staging/media/ipu3/Kconfig >>> @@ -4,7 +4,7 @@ config VIDEO_IPU3_IMGU >>> depends on PCI && VIDEO_V4L2 >>> depends on MEDIA_CONTROLLER && VIDEO_V4L2_SUBDEV_API >>> depends on X86 >>> -select IOMMU_IOVA >>> +select IOMMU_IOVA if IOMMU_SUPPORT >> >> This doesn't seem right: the ipu3-cio2 driver needs IOMMU_IOVA >> independently of IOMMU_SUPPORT. >> >> Looking at tegra-vde, it seems to depend on IOMMU_SUPPORT but that's not >> declared in its Kconfig entry. I wonder if adding that would be the right >> way to fix this. >> >> Cc'ing the IOMMU list. > > Right, I also had the impression that we'd made the IOVA library completely > standalone. And what does the IPU3 driver's Kconfig have to do with some > *other* driver failing to link anyway? Oh, I misunderstand that IOMMU_IOVA is depend on IOMMU_SUPPORT, thank you for clarification. I will try to fix this in tegra-vde. > > Robin. 
>>> 	select VIDEOBUF2_DMA_SG
>>> 	help
>>> 	  This is the Video4Linux2 driver for Intel IPU3 image processing unit,
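For reference, the tegra-vde fix under discussion would likely be a Kconfig change along one of these lines (both untested sketches — the final patch may differ, and the entry bodies are elided):

```
# Option A: declare the real dependency in tegra-vde's Kconfig entry
config TEGRA_VDE
	...
	depends on IOMMU_SUPPORT

# Option B: keep IOMMU support optional, but make the IOVA select track
# the configurations tegra-vde/iommu.c is actually built against, as
# Dmitry suggests elsewhere in the thread
config TEGRA_VDE
	...
	select IOMMU_IOVA if (IOMMU_SUPPORT || COMPILE_TEST)
```

Note that `select` forces a symbol on without checking its dependencies, which is exactly why an unconditional `select IOMMU_IOVA` in one driver can surface latent breakage in another.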
Re: [RFC PATCH v2 18/19] iommu/arm-smmu-v3: Reduce contention during command-queue insertion
On 24/07/2019 13:15, Will Deacon wrote: Hi John, Thanks for reading the code! On Fri, Jul 19, 2019 at 12:04:15PM +0100, John Garry wrote: +static int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu, + u64 *cmds, int n, bool sync) +{ + u64 cmd_sync[CMDQ_ENT_DWORDS]; + u32 prod; unsigned long flags; - bool wfe = !!(smmu->features & ARM_SMMU_FEAT_SEV); - struct arm_smmu_cmdq_ent ent = { .opcode = CMDQ_OP_CMD_SYNC }; - int ret; + bool owner; + struct arm_smmu_cmdq *cmdq = &smmu->cmdq; + struct arm_smmu_ll_queue llq = { + .max_n_shift = cmdq->q.llq.max_n_shift, + }, head = llq; + int ret = 0; - arm_smmu_cmdq_build_cmd(cmd, &ent); + /* 1. Allocate some space in the queue */ + local_irq_save(flags); + llq.val = READ_ONCE(cmdq->q.llq.val); + do { + u64 old; + + while (!queue_has_space(&llq, n + sync)) { + local_irq_restore(flags); + if (arm_smmu_cmdq_poll_until_not_full(smmu, &llq)) + dev_err_ratelimited(smmu->dev, "CMDQ timeout\n"); + local_irq_save(flags); + } + + head.cons = llq.cons; + head.prod = queue_inc_prod_n(&llq, n + sync) | +CMDQ_PROD_OWNED_FLAG; + + old = cmpxchg_relaxed(&cmdq->q.llq.val, llq.val, head.val); + if (old == llq.val) + break; + + llq.val = old; + } while (1); + owner = !(llq.prod & CMDQ_PROD_OWNED_FLAG); + + /* +* 2. Write our commands into the queue +* Dependency ordering from the cmpxchg() loop above. +*/ + arm_smmu_cmdq_write_entries(cmdq, cmds, llq.prod, n); + if (sync) { + prod = queue_inc_prod_n(&llq, n); + arm_smmu_cmdq_build_sync_cmd(cmd_sync, smmu, prod); + queue_write(Q_ENT(&cmdq->q, prod), cmd_sync, CMDQ_ENT_DWORDS); + + /* +* In order to determine completion of our CMD_SYNC, we must +* ensure that the queue can't wrap twice without us noticing. +* We achieve that by taking the cmdq lock as shared before +* marking our slot as valid. +*/ + arm_smmu_cmdq_shared_lock(cmdq); + } + + /* 3. 
Mark our slots as valid, ensuring commands are visible first */ + dma_wmb(); + prod = queue_inc_prod_n(&llq, n + sync); + arm_smmu_cmdq_set_valid_map(cmdq, llq.prod, prod); + + /* 4. If we are the owner, take control of the SMMU hardware */ + if (owner) { + /* a. Wait for previous owner to finish */ + atomic_cond_read_relaxed(&cmdq->owner_prod, VAL == llq.prod); + + /* b. Stop gathering work by clearing the owned flag */ + prod = atomic_fetch_andnot_relaxed(CMDQ_PROD_OWNED_FLAG, + &cmdq->q.llq.atomic.prod); + prod &= ~CMDQ_PROD_OWNED_FLAG; + head.prod &= ~CMDQ_PROD_OWNED_FLAG; + Could it be a minor optimisation to advance the HW producer pointer at this stage for the owner only? We know that its entries are written, and it should be first in the new batch of commands (right?), so we could advance the pointer to at least get the HW started. I think that would be a valid thing to do, but it depends on the relative cost of writing to prod compared to how long we're likely to wait. Given that everybody has irqs disabled when writing out their commands, I wouldn't expect the waiting to be a big issue, For sure, but I'm thinking of the possible scenario where the the guy(s) we're waiting on have many more commands. Or they just joined the current gathering quite late, just prior to clearing the owner flag. although we could probably optimise arm_smmu_cmdq_write_entries() into a memcpy() if we needed to. In other words, I think we need numbers to justify that change. Anyway, this is quite minor, and I will see if the change could be justified by numbers. Cheers, John Thanks, Will . ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC 3/4] dma-direct: add dma_direct_min_mask
On Wed, Jul 24, 2019 at 02:51:24PM +0100, Catalin Marinas wrote: > I think it may be better if we have both ZONE_DMA and ZONE_DMA32 on > arm64. ZONE_DMA would be based on the smallest dma-ranges as described > in the DT while DMA32 covers the first naturally aligned 4GB of RAM > (unchanged). When a smaller ZONE_DMA is not needed, it could be expanded > to cover what would normally be ZONE_DMA32 (or could we have ZONE_DMA as > 0-bytes? I don't think GFP_DMA can still allocate memory in this case). > > We'd probably have to define ARCH_ZONE_DMA_BITS for arm64 to something > smaller than 32-bit but sufficient to cover the known platforms like > RPi4 (the current 24 is too small, so maybe 30). AFAICT, > __dma_direct_optimal_gfp_mask() figures out whether GFP_DMA or GFP_DMA32 > should be passed. ARCH_ZONE_DMA_BITS should probably become a variable. That way we can just initialize it to the default 24 bits in kernel/dma/direct.c and allow architectures to override it in their early boot code.
Re: [RFC 3/4] dma-direct: add dma_direct_min_mask
On Fri, Jul 19, 2019 at 03:08:52PM +0200, Nicolas Saenz Julienne wrote: > On Thu, 2019-07-18 at 13:18 +0200, Nicolas Saenz Julienne wrote: > > On Thu, 2019-07-18 at 11:15 +0200, Christoph Hellwig wrote: > > > On Wed, Jul 17, 2019 at 05:31:34PM +0200, Nicolas Saenz Julienne wrote: > > > > Historically devices with ZONE_DMA32 have been assumed to be able to > > > > address at least the lower 4GB of ram for DMA. This is still the defualt > > > > behavior yet the Raspberry Pi 4 is limited to the first GB of memory. > > > > This has been observed to trigger failures in dma_direct_supported() as > > > > the 'min_mask' isn't properly set. > > > > > > > > We create 'dma_direct_min_mask' in order for the arch init code to be > > > > able to fine-tune dma direct's 'min_dma' mask. > > > > > > Normally we use ZONE_DMA for that case. > > > > Fair enough, I didn't think of that possibility. > > > > So would the arm64 maintainers be happy with something like this: > > > > - ZONE_DMA: Follows standard definition, 16MB in size. ARCH_ZONE_DMA_BITS is > > left as is. > > - ZONE_DMA32: Will honor the most constraining 'dma-ranges'. Which so far > > for > > most devices is 4G, except for RPi4. > > - ZONE_NORMAL: The rest of the memory. > > Never mind this suggestion, I don't think it makes any sense. If anything > arm64 > seems to fit the ZONE_DMA usage pattern of arm and powerpc: where ZONE_DMA's > size is decided based on ram size and/or board configuration. It was actually > set-up like this until Christoph's ad67f5a6545f7 ("arm64: replace ZONE_DMA > with > ZONE_DMA32"). > > So the easy solution would be to simply revert that commit. On one hand I feel > it would be a step backwards as most 64 bit architectures have been moving to > use ZONE_DMA32. On the other, current ZONE_DMA32 usage seems to be heavily > rooted on having a 32 bit DMA mask*, which will no longer be the case on arm64 > if we want to support the RPi 4. 
> > So the way I see it and lacking a better solution, the argument is
> > stronger on moving back arm64 to using ZONE_DMA. Any comments/opinions?

As it was suggested in this or the previous thread, I'm not keen on
limiting ZONE_DMA32 to the smallest RPi4 can cover, as the naming implies
this zone should cover 32-bit devices that can deal with a full 32-bit
mask.

I think it may be better if we have both ZONE_DMA and ZONE_DMA32 on
arm64. ZONE_DMA would be based on the smallest dma-ranges as described
in the DT while DMA32 covers the first naturally aligned 4GB of RAM
(unchanged). When a smaller ZONE_DMA is not needed, it could be expanded
to cover what would normally be ZONE_DMA32 (or could we have ZONE_DMA as
0-bytes? I don't think GFP_DMA can still allocate memory in this case).

We'd probably have to define ARCH_ZONE_DMA_BITS for arm64 to something
smaller than 32-bit but sufficient to cover the known platforms like
RPi4 (the current 24 is too small, so maybe 30). AFAICT,
__dma_direct_optimal_gfp_mask() figures out whether GFP_DMA or GFP_DMA32
should be passed.

-- 
Catalin
Re: [PATCH] media: staging: ipu3: Enable IOVA API only when IOMMU support is enabled
On 24/07/2019 11:30, Sakari Ailus wrote:
> Hi Yue,
>
> On Mon, Jul 22, 2019 at 09:47:49PM +0800, YueHaibing wrote:
>> If IOMMU_SUPPORT is not set, ipu3 driver may select IOMMU_IOVA to m.
>> But for many drivers, they use "select IOMMU_IOVA if IOMMU_SUPPORT"
>> in the Kconfig, for example, CONFIG_TEGRA_VDE is set to y but
>> IOMMU_IOVA is m, then the building fails like this:
>>
>> drivers/staging/media/tegra-vde/iommu.o: In function `tegra_vde_iommu_map':
>> iommu.c:(.text+0x41): undefined reference to `alloc_iova'
>> iommu.c:(.text+0x56): undefined reference to `__free_iova'
>>
>> Reported-by: Hulk Robot
>> Fixes: 7fc7af649ca7 ("media: staging/intel-ipu3: Add imgu top level
>> pci device driver")
>> Signed-off-by: YueHaibing
>> ---
>>  drivers/staging/media/ipu3/Kconfig | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/staging/media/ipu3/Kconfig b/drivers/staging/media/ipu3/Kconfig
>> index 4b51c67..b7df18f 100644
>> --- a/drivers/staging/media/ipu3/Kconfig
>> +++ b/drivers/staging/media/ipu3/Kconfig
>> @@ -4,7 +4,7 @@ config VIDEO_IPU3_IMGU
>>  	depends on PCI && VIDEO_V4L2
>>  	depends on MEDIA_CONTROLLER && VIDEO_V4L2_SUBDEV_API
>>  	depends on X86
>> -	select IOMMU_IOVA
>> +	select IOMMU_IOVA if IOMMU_SUPPORT
>
> This doesn't seem right: the ipu3-cio2 driver needs IOMMU_IOVA
> independently of IOMMU_SUPPORT.
>
> Looking at tegra-vde, it seems to depend on IOMMU_SUPPORT but that's
> not declared in its Kconfig entry. I wonder if adding that would be
> the right way to fix this.
>
> Cc'ing the IOMMU list.

Right, I also had the impression that we'd made the IOVA library
completely standalone. And what does the IPU3 driver's Kconfig have to
do with some *other* driver failing to link anyway?

Robin.

>> 	select VIDEOBUF2_DMA_SG
>> 	help
>> 	  This is the Video4Linux2 driver for Intel IPU3 image processing unit,
Re: [RFC PATCH v2 00/19] Try to reduce lock contention on the SMMUv3 command queue
On Fri, Jul 19, 2019 at 09:55:39AM +0530, Ganapatrao Kulkarni wrote: > On Thu, Jul 11, 2019 at 10:58 PM Will Deacon wrote: > > This is a significant rework of the RFC I previously posted here: > > > > https://lkml.kernel.org/r/20190611134603.4253-1-will.dea...@arm.com > > > > But this time, it looks like it might actually be worthwhile according > > to my perf profiles, where __iommu_unmap() falls a long way down the > > profile for a multi-threaded netperf run. I'm still relying on others to > > confirm this is useful, however. > > > > Some of the changes since last time are: > > > > * Support for constructing and submitting a list of commands in the > > driver > > > > * Numerous changes to the IOMMU and io-pgtable APIs so that we can > > submit commands in batches > > > > * Removal of cmpxchg() from cmdq_shared_lock() fast-path > > > > * Code restructuring and cleanups > > > > This current applies against my iommu/devel branch that Joerg has pulled > > for 5.3. If you want to test it out, I've put everything here: > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=iommu/cmdq > > > > Feedback welcome. I appreciate that we're in the merge window, but I > > wanted to get this on the list for people to look at as an RFC. > > I have tried branch iommu/cmdq on ThunderX2. I do see there is drastic > reduction in CPU bandwidth consumption(from 15 to 20% to 1 to 2% in > perf top) from SMMU CMDQ helper functions, when I run iperf with more > than 64 clients(-P 64). However I have not noticed any measurable > performance improvement in iperf results. IMO, this might/should help > in performance improvement of IO intensive workloads. > > FWIW, you can add, > Tested-by: Ganapatrao Kulkarni Brilliant, thanks. Your measurements reflect mine in that I can saturate the NICs I have access to regardless of these changes, but the CPU time is drastically reduced. 
Will
Re: [RFC PATCH v2 00/19] Try to reduce lock contention on the SMMUv3 command queue
On Wed, Jul 24, 2019 at 10:58:26AM +0100, John Garry wrote: > On 11/07/2019 18:19, Will Deacon wrote: > > This is a significant rework of the RFC I previously posted here: > > > > https://lkml.kernel.org/r/20190611134603.4253-1-will.dea...@arm.com > > > > But this time, it looks like it might actually be worthwhile according > > to my perf profiles, where __iommu_unmap() falls a long way down the > > profile for a multi-threaded netperf run. I'm still relying on others to > > confirm this is useful, however. > > > > Some of the changes since last time are: > > > > * Support for constructing and submitting a list of commands in the > > driver > > > > * Numerous changes to the IOMMU and io-pgtable APIs so that we can > > submit commands in batches > > > > * Removal of cmpxchg() from cmdq_shared_lock() fast-path > > > > * Code restructuring and cleanups > > > > This current applies against my iommu/devel branch that Joerg has pulled > > for 5.3. If you want to test it out, I've put everything here: > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=iommu/cmdq > > > > Feedback welcome. I appreciate that we're in the merge window, but I > > wanted to get this on the list for people to look at as an RFC. > > > > I tested storage performance on this series, which I think is a better > scenario to test than network performance, that being generally limited by > the network link speed. Interesting, thanks for sharing. Do you also see a similar drop in CPU time to the one reported by Ganapat? > Baseline performance (will/iommu/devel, commit 9e6ea59f3) > 8x SAS disks D05 839K IOPS > 1x NVMe D05 454K IOPS > 1x NVMe D06 442k IOPS > > Patchset performance (will/iommu/cmdq) > 8x SAS disk D05 835K IOPS > 1x NVMe D05 472K IOPS > 1x NVMe D06 459k IOPS > > So we see a bit of an NVMe boost, but about the same for 8x disks. No iommu > performance is about 918K IOPs for 8x disks, so it is not limited by the > medium. 
It would be nice to know if this performance gap is because of Linux, or
simply because of the translation overhead in the SMMU hardware. Are you
able to get a perf profile to see where we're spending time?

Thanks,
Will
Re: [RFC PATCH v2 18/19] iommu/arm-smmu-v3: Reduce contention during command-queue insertion
Hi John, Thanks for reading the code! On Fri, Jul 19, 2019 at 12:04:15PM +0100, John Garry wrote: > > +static int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu, > > + u64 *cmds, int n, bool sync) > > +{ > > + u64 cmd_sync[CMDQ_ENT_DWORDS]; > > + u32 prod; > > unsigned long flags; > > - bool wfe = !!(smmu->features & ARM_SMMU_FEAT_SEV); > > - struct arm_smmu_cmdq_ent ent = { .opcode = CMDQ_OP_CMD_SYNC }; > > - int ret; > > + bool owner; > > + struct arm_smmu_cmdq *cmdq = &smmu->cmdq; > > + struct arm_smmu_ll_queue llq = { > > + .max_n_shift = cmdq->q.llq.max_n_shift, > > + }, head = llq; > > + int ret = 0; > > > > - arm_smmu_cmdq_build_cmd(cmd, &ent); > > + /* 1. Allocate some space in the queue */ > > + local_irq_save(flags); > > + llq.val = READ_ONCE(cmdq->q.llq.val); > > + do { > > + u64 old; > > + > > + while (!queue_has_space(&llq, n + sync)) { > > + local_irq_restore(flags); > > + if (arm_smmu_cmdq_poll_until_not_full(smmu, &llq)) > > + dev_err_ratelimited(smmu->dev, "CMDQ > > timeout\n"); > > + local_irq_save(flags); > > + } > > + > > + head.cons = llq.cons; > > + head.prod = queue_inc_prod_n(&llq, n + sync) | > > +CMDQ_PROD_OWNED_FLAG; > > + > > + old = cmpxchg_relaxed(&cmdq->q.llq.val, llq.val, head.val); > > + if (old == llq.val) > > + break; > > + > > + llq.val = old; > > + } while (1); > > + owner = !(llq.prod & CMDQ_PROD_OWNED_FLAG); > > + > > + /* > > +* 2. Write our commands into the queue > > +* Dependency ordering from the cmpxchg() loop above. > > +*/ > > + arm_smmu_cmdq_write_entries(cmdq, cmds, llq.prod, n); > > + if (sync) { > > + prod = queue_inc_prod_n(&llq, n); > > + arm_smmu_cmdq_build_sync_cmd(cmd_sync, smmu, prod); > > + queue_write(Q_ENT(&cmdq->q, prod), cmd_sync, CMDQ_ENT_DWORDS); > > + > > + /* > > +* In order to determine completion of our CMD_SYNC, we must > > +* ensure that the queue can't wrap twice without us noticing. > > +* We achieve that by taking the cmdq lock as shared before > > +* marking our slot as valid. 
> > +*/ > > + arm_smmu_cmdq_shared_lock(cmdq); > > + } > > + > > + /* 3. Mark our slots as valid, ensuring commands are visible first */ > > + dma_wmb(); > > + prod = queue_inc_prod_n(&llq, n + sync); > > + arm_smmu_cmdq_set_valid_map(cmdq, llq.prod, prod); > > + > > + /* 4. If we are the owner, take control of the SMMU hardware */ > > + if (owner) { > > + /* a. Wait for previous owner to finish */ > > + atomic_cond_read_relaxed(&cmdq->owner_prod, VAL == llq.prod); > > + > > + /* b. Stop gathering work by clearing the owned flag */ > > + prod = atomic_fetch_andnot_relaxed(CMDQ_PROD_OWNED_FLAG, > > + &cmdq->q.llq.atomic.prod); > > + prod &= ~CMDQ_PROD_OWNED_FLAG; > > + head.prod &= ~CMDQ_PROD_OWNED_FLAG; > > + > > Could it be a minor optimisation to advance the HW producer pointer at this > stage for the owner only? We know that its entries are written, and it > should be first in the new batch of commands (right?), so we could advance > the pointer to at least get the HW started. I think that would be a valid thing to do, but it depends on the relative cost of writing to prod compared to how long we're likely to wait. Given that everybody has irqs disabled when writing out their commands, I wouldn't expect the waiting to be a big issue, although we could probably optimise arm_smmu_cmdq_write_entries() into a memcpy() if we needed to. In other words, I think we need numbers to justify that change. Thanks, Will ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2] iommu: add support for drivers that manage iommu explicitly
On Tue, Jul 23, 2019 at 10:40:55AM -0700, Rob Clark wrote: > On Tue, Jul 23, 2019 at 8:38 AM Will Deacon wrote: > > > > On Mon, Jul 22, 2019 at 09:23:48AM -0700, Rob Clark wrote: > > > On Mon, Jul 22, 2019 at 8:48 AM Joerg Roedel wrote: > > > > > > > > On Mon, Jul 22, 2019 at 08:41:34AM -0700, Rob Clark wrote: > > > > > It is set by the driver: > > > > > > > > > > https://patchwork.freedesktop.org/patch/315291/ > > > > > > > > > > (This doesn't really belong in devicetree, since it isn't a > > > > > description of the hardware, so the driver is really the only place to > > > > > set this.. which is fine because it is about a detail of how the > > > > > driver works.) > > > > > > > > It is more a detail about how the firmware works. IIUC the problem is > > > > that the firmware initializes the context mappings for the GPU and the > > > > OS doesn't know anything about that and just overwrites them, causing > > > > the firmware GPU driver to fail badly. > > > > > > > > So I think it is the task of the firmware to tell the OS not to touch > > > > the devices mappings until the OS device driver takes over. On x86 there > > > > is something similar with the RMRR/unity-map tables from the firmware. > > > > > > > > > > Bjorn had a patchset[1] to inherit the config from firmware/bootloader > > > when arm-smmu is probed which handles that part of the problem. My > > > patch is intended to be used on top of his patchset. This seems to me > > > like the best solution, if we don't have control over the firmware. > > > > Hmm, but the feedback from Robin on the thread you cite was that this should > > be generalised to look more like RMRR, so there seems to be a clear message > > here. > > > > Perhaps it is a lack of creativity, or lack of familiarity w/ iommu vs > virtualization, but I'm not quite seeing how RMRR would help.. in > particular when dealing with both DT and ACPI cases. 
Well, I suppose we'd have something for DT and something for ACPI and we'd try to make them look similar enough that we don't need lots of divergent code in the kernel. The RMRR-like description would describe that, for a particular device, a specific virt:phys mapping needs to exist in the small window between initialising the SMMU and re-initialising the device (GPU). I would prefer this to be framebuffer-specific, since we're already in flagrant violation of the arm64 boot requirements wrt ongoing DMA and making this too general could lead to all sorts of brain damage. That would probably also allow us to limit the flexibility, by mandating things like alignment and memory attributes. Having said that, I just realised I'm still a bit confused about the problem: why does the bootloader require SMMU translation for the GPU at all? If we could limit this whole thing to be identity-mapped/bypassed, then all we need is a per-device flag in the firmware to indicate that the SMMU should be initialised to allow passthrough for transactions originating from that device. > So I kinda prefer, when possible, if arm-smmu can figure out what is going > on by looking at the hw state at boot (since that approach would work > equally well for DT and ACPI). That's not going to fly. Forcing Linux to infer the state of the system by effectively parsing the hardware configuration directly is fragile, error-prone and may not even be possible in the general case. Worse, if this goes wrong, the symptoms are very likely to be undiagnosable memory corruption, which is pretty awful in my opinion. Not only would you need separate parsing code for every IOMMU out there, but you'd also need to make Linux aware of device aspects that it otherwise doesn't care about, just in case the firmware decided to use them. 
Furthermore, running an older kernel on newer hardware (which may have some extensions) could cause the parsing to silently ignore parts of the device that indicate memory regions which are in use. On top of that, there may be device-global state that we are unable to reconfigure and that affects devices other than the ones in question. So no, I'm very much against this approach and the solution absolutely needs to come in the form of a more abstract description from firmware.

> I *think* (but need to confirm if Bjorn hasn't already) that the
> memory for the pagetables that firmware/bootloader sets up is already
> removed from the memory map efi passes to kernel, so we don't need to
> worry about kernel stomping in-use pagetables.

It's precisely this sort of fragility that makes me nervous about this whole approach.

Will
Re: [PATCH] media: staging: ipu3: Enable IOVA API only when IOMMU support is enabled
Hi Yue, On Mon, Jul 22, 2019 at 09:47:49PM +0800, YueHaibing wrote: > If IOMMU_SUPPORT is not set, ipu3 driver may select IOMMU_IOVA to m. > But for many drivers, they use "select IOMMU_IOVA if IOMMU_SUPPORT" > in the Kconfig, for example, CONFIG_TEGRA_VDE is set to y but IOMMU_IOVA > is m, then the building fails like this: > > drivers/staging/media/tegra-vde/iommu.o: In function `tegra_vde_iommu_map': > iommu.c:(.text+0x41): undefined reference to `alloc_iova' > iommu.c:(.text+0x56): undefined reference to `__free_iova' > > Reported-by: Hulk Robot > Fixes: 7fc7af649ca7 ("media: staging/intel-ipu3: Add imgu top level pci > device driver") > Signed-off-by: YueHaibing > --- > drivers/staging/media/ipu3/Kconfig | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/staging/media/ipu3/Kconfig > b/drivers/staging/media/ipu3/Kconfig > index 4b51c67..b7df18f 100644 > --- a/drivers/staging/media/ipu3/Kconfig > +++ b/drivers/staging/media/ipu3/Kconfig > @@ -4,7 +4,7 @@ config VIDEO_IPU3_IMGU > depends on PCI && VIDEO_V4L2 > depends on MEDIA_CONTROLLER && VIDEO_V4L2_SUBDEV_API > depends on X86 > - select IOMMU_IOVA > + select IOMMU_IOVA if IOMMU_SUPPORT This doesn't seem right: the ipu3-cio2 driver needs IOMMU_IOVA independently of IOMMU_SUPPORT. Looking at tegra-vde, it seems to depend on IOMMU_SUPPORT but that's not declared in its Kconfig entry. I wonder if adding that would be the right way to fix this. Cc'ing the IOMMU list. > select VIDEOBUF2_DMA_SG > help > This is the Video4Linux2 driver for Intel IPU3 image processing unit, -- Regards, Sakari Ailus sakari.ai...@linux.intel.com
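For illustration, the shape of the fix Sakari is wondering about would be an explicit dependency in the tegra-vde Kconfig entry. In the sketch below, only the `select` line is taken from the report above; the prompt text and surrounding lines are assumptions, not a copy of the real entry.

```kconfig
# drivers/staging/media/tegra-vde/Kconfig (sketch, not the actual entry)
config TEGRA_VDE
	tristate "NVIDIA Tegra Video Decoder Engine driver"  # prompt text assumed
	depends on IOMMU_SUPPORT              # proposed addition: declare the dependency
	select IOMMU_IOVA if IOMMU_SUPPORT    # the pattern quoted in the report
```

With the dependency declared, the `if IOMMU_SUPPORT` guard on the `select` becomes redundant and could be dropped, and IOMMU_IOVA can no longer end up as a module underneath a built-in user.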
Re: [RFC PATCH v2 00/19] Try to reduce lock contention on the SMMUv3 command queue
On 11/07/2019 18:19, Will Deacon wrote:
> Hi everyone,
>
> This is a significant rework of the RFC I previously posted here:
>
>   https://lkml.kernel.org/r/20190611134603.4253-1-will.dea...@arm.com
>
> But this time, it looks like it might actually be worthwhile according
> to my perf profiles, where __iommu_unmap() falls a long way down the
> profile for a multi-threaded netperf run. I'm still relying on others to
> confirm this is useful, however.
>
> Some of the changes since last time are:
>
>   * Support for constructing and submitting a list of commands in the
>     driver
>   * Numerous changes to the IOMMU and io-pgtable APIs so that we can
>     submit commands in batches
>   * Removal of cmpxchg() from cmdq_shared_lock() fast-path
>   * Code restructuring and cleanups
>
> This currently applies against my iommu/devel branch that Joerg has
> pulled for 5.3. If you want to test it out, I've put everything here:
>
>   https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=iommu/cmdq
>
> Feedback welcome. I appreciate that we're in the merge window, but I
> wanted to get this on the list for people to look at as an RFC.

I tested storage performance on this series, which I think is a better scenario to test than network performance, the latter being generally limited by the network link speed.

Results:

Baseline performance (will/iommu/devel, commit 9e6ea59f3):
  8x SAS disks D05: 839K IOPS
  1x NVMe D05:      454K IOPS
  1x NVMe D06:      442K IOPS

Patchset performance (will/iommu/cmdq):
  8x SAS disks D05: 835K IOPS
  1x NVMe D05:      472K IOPS
  1x NVMe D06:      459K IOPS

So we see a bit of an NVMe boost, but about the same for 8x disks. With no IOMMU, performance is about 918K IOPS for 8x disks, so it is not limited by the medium. The D06 is a bit memory starved, so that may account for generally lower NVMe performance.
John

> Cheers,
>
> Will
>
> --->8
>
> Cc: Jean-Philippe Brucker
> Cc: Robin Murphy
> Cc: Jayachandran Chandrasekharan Nair
> Cc: Jan Glauber
> Cc: Jon Masters
> Cc: Eric Auger
> Cc: Zhen Lei
> Cc: Jonathan Cameron
> Cc: Vijay Kilary
> Cc: Joerg Roedel
> Cc: John Garry
> Cc: Alex Williamson
>
> Will Deacon (19):
>   iommu: Remove empty iommu_tlb_range_add() callback from iommu_ops
>   iommu/io-pgtable-arm: Remove redundant call to io_pgtable_tlb_sync()
>   iommu/io-pgtable: Rename iommu_gather_ops to iommu_flush_ops
>   iommu: Introduce struct iommu_iotlb_gather for batching TLB flushes
>   iommu: Introduce iommu_iotlb_gather_add_page()
>   iommu: Pass struct iommu_iotlb_gather to ->unmap() and ->iotlb_sync()
>   iommu/io-pgtable: Introduce tlb_flush_walk() and tlb_flush_leaf()
>   iommu/io-pgtable: Hook up ->tlb_flush_walk() and ->tlb_flush_leaf() in
>     drivers
>   iommu/io-pgtable-arm: Call ->tlb_flush_walk() and ->tlb_flush_leaf()
>   iommu/io-pgtable: Replace ->tlb_add_flush() with ->tlb_add_page()
>   iommu/io-pgtable: Remove unused ->tlb_sync() callback
>   iommu/io-pgtable: Pass struct iommu_iotlb_gather to ->unmap()
>   iommu/io-pgtable: Pass struct iommu_iotlb_gather to ->tlb_add_page()
>   iommu/arm-smmu-v3: Separate s/w and h/w views of prod and cons indexes
>   iommu/arm-smmu-v3: Drop unused 'q' argument from Q_OVF macro
>   iommu/arm-smmu-v3: Move low-level queue fields out of arm_smmu_queue
>   iommu/arm-smmu-v3: Operate directly on low-level queue where possible
>   iommu/arm-smmu-v3: Reduce contention during command-queue insertion
>   iommu/arm-smmu-v3: Defer TLB invalidation until ->iotlb_sync()
>
>  drivers/gpu/drm/panfrost/panfrost_mmu.c |  24 +-
>  drivers/iommu/amd_iommu.c               |  11 +-
>  drivers/iommu/arm-smmu-v3.c             | 856
>  drivers/iommu/arm-smmu.c                | 103 +++-
>  drivers/iommu/dma-iommu.c               |   9 +-
>  drivers/iommu/exynos-iommu.c            |   3 +-
>  drivers/iommu/intel-iommu.c             |   3 +-
>  drivers/iommu/io-pgtable-arm-v7s.c      |  57 +--
>  drivers/iommu/io-pgtable-arm.c          |  48 +-
>  drivers/iommu/iommu.c                   |  24 +-
>  drivers/iommu/ipmmu-vmsa.c              |  28 +-
>  drivers/iommu/msm_iommu.c               |  42 +-
>  drivers/iommu/mtk_iommu.c               |  45 +-
>  drivers/iommu/mtk_iommu_v1.c            |   3 +-
>  drivers/iommu/omap-iommu.c              |   2 +-
>  drivers/iommu/qcom_iommu.c              |  44 +-
>  drivers/iommu/rockchip-iommu.c          |   2 +-
>  drivers/iommu/s390-iommu.c              |   3 +-
>  drivers/iommu/tegra-gart.c              |  12 +-
>  drivers/iommu/tegra-smmu.c              |   2 +-
>  drivers/vfio/vfio_iommu_type1.c         |  27 +-
>  include/linux/io-pgtable.h              |  57 ++-
>  include/linux/iommu.h                   |  92 +++-
>  23 files changed, 1090 insertions(+), 407 deletions(-)
Re: [RFC PATCH v2 18/19] iommu/arm-smmu-v3: Reduce contention during command-queue insertion
On 11/07/2019 18:19, Will Deacon wrote: +static int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu, + u64 *cmds, int n, bool sync) +{ + u64 cmd_sync[CMDQ_ENT_DWORDS]; + u32 prod; unsigned long flags; - bool wfe = !!(smmu->features & ARM_SMMU_FEAT_SEV); - struct arm_smmu_cmdq_ent ent = { .opcode = CMDQ_OP_CMD_SYNC }; - int ret; + bool owner; + struct arm_smmu_cmdq *cmdq = &smmu->cmdq; + struct arm_smmu_ll_queue llq = { + .max_n_shift = cmdq->q.llq.max_n_shift, + }, head = llq; + int ret = 0; - arm_smmu_cmdq_build_cmd(cmd, &ent); + /* 1. Allocate some space in the queue */ + local_irq_save(flags); + llq.val = READ_ONCE(cmdq->q.llq.val); + do { + u64 old; + + while (!queue_has_space(&llq, n + sync)) { + local_irq_restore(flags); + if (arm_smmu_cmdq_poll_until_not_full(smmu, &llq)) + dev_err_ratelimited(smmu->dev, "CMDQ timeout\n"); + local_irq_save(flags); + } + + head.cons = llq.cons; + head.prod = queue_inc_prod_n(&llq, n + sync) | +CMDQ_PROD_OWNED_FLAG; + + old = cmpxchg_relaxed(&cmdq->q.llq.val, llq.val, head.val); I added some basic debug to the stress test on your branch, and this cmpxchg was failing ~10 times on average on my D06. So we're not using the spinlock now, but this cmpxchg may lack fairness. Since we're batching commands, I wonder if it's better to restore the spinlock and send batched commands + CMD_SYNC under the lock, and then wait for the CMD_SYNC completion outside the lock. I don't know if it improves the queue contetion, but at least the prod pointer would be more closely track the issued commands, such that we're not waiting to kick off many gathered batches of commands, while the SMMU HW may be idle (in terms of command processing). Cheers, John + if (old == llq.val) + break; + + llq.val = old; + } while (1); + owner = !(llq.prod & CMDQ_PROD_OWNED_F ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
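For comparison, the alternative suggested here can be sketched in pseudo-C. Every helper name below is illustrative rather than the driver's real API, and this is a design sketch, not a compilable implementation:

```c
/* Publish a batch and its CMD_SYNC under the lock, but wait for the
 * sync to complete with the lock dropped, so the hardware prod pointer
 * closely tracks issued commands while other CPUs keep inserting. */
static int issue_cmdlist_locked(struct smmu *smmu, u64 *cmds, int n)
{
	unsigned long flags;
	u32 sync_idx;

	spin_lock_irqsave(&smmu->cmdq_lock, flags);
	queue_write_cmds(smmu, cmds, n);     /* copy the batch into the ring */
	sync_idx = queue_write_sync(smmu);   /* append CMD_SYNC, bump prod */
	writel_relaxed(smmu->llq.prod, smmu->prod_reg); /* kick the hardware */
	spin_unlock_irqrestore(&smmu->cmdq_lock, flags);

	/* Wait outside the lock: contention is limited to the short copy,
	 * and fairness comes from the lock's queueing discipline. */
	return poll_until_sync_consumed(smmu, sync_idx);
}
```

Whether this wins over the lock-free reservation is an empirical question: the spinlock restores fairness and keeps the hardware prod register closely tracking issued commands, at the cost of serialising the copies into the ring.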
Re: [RFC PATCH v2 04/19] iommu: Introduce struct iommu_iotlb_gather for batching TLB flushes
Hi Joerg, On Wed, Jul 24, 2019 at 09:19:59AM +0200, Joerg Roedel wrote: > On Thu, Jul 11, 2019 at 06:19:12PM +0100, Will Deacon wrote: > > static void __iommu_dma_unmap(struct iommu_domain *domain, dma_addr_t > > dma_addr, > > size_t size) > > { > > + struct iommu_iotlb_gather iotlb_gather; > > struct iommu_dma_cookie *cookie = domain->iova_cookie; > > struct iova_domain *iovad = &cookie->iovad; > > size_t iova_off = iova_offset(iovad, dma_addr); > > + size_t unmapped; > > > > dma_addr -= iova_off; > > size = iova_align(iovad, size + iova_off); > > + iommu_iotlb_gather_init(&iotlb_gather); > > + > > + unmapped = iommu_unmap_fast(domain, dma_addr, size, &iotlb_gather); > > + WARN_ON(unmapped != size); > > > > - WARN_ON(iommu_unmap_fast(domain, dma_addr, size) != size); > > if (!cookie->fq_domain) > > - iommu_tlb_sync(domain); > > + iommu_tlb_sync(domain, &iotlb_gather); > > iommu_dma_free_iova(cookie, dma_addr, size); > > I looked through your patches and was wondering if we can't make the > 'struct iotlb_gather' a member of 'struct iommu_domain' and update it > transparently for the user? That would make things easier for users of > the iommu-api. Thanks for having a look. My first (private) attempt at this series did exactly what you suggest, but the problem there is that you can have concurrent, disjoint map()/unmap() operations on the same 'struct iommu_domain', so you end up needing either multiple 'iotlb_gather' structures to support these callers or you end up massively over-invalidating. Worse, you can't just make things per-cpu without disabling preemption, so whatever you do you end up re-introducing synchronization to maintain these things. The /huge/ advantage of getting the caller to allocate the 'iotlb_gather' structure on their stack is that you don't have to worry about object lifetime at all, so all of the synchronization disappears. 
There is also precedent for this when unmapping things on the CPU side -- see the use of 'struct mmu_gather' and tlb_gather_mmu() in unmap_region() [mm/mmap.c], for example.

There aren't tonnes of (direct) users of the iommu-api, and the additional complexity introduced by the 'struct iotlb_gather' only applies to users of iommu_unmap_fast(), which I think is a reasonable trade-off in return for the potential for improved performance.

Will
Re: [RFC PATCH v2 04/19] iommu: Introduce struct iommu_iotlb_gather for batching TLB flushes
Hi Will, On Thu, Jul 11, 2019 at 06:19:12PM +0100, Will Deacon wrote: > static void __iommu_dma_unmap(struct iommu_domain *domain, dma_addr_t > dma_addr, > size_t size) > { > + struct iommu_iotlb_gather iotlb_gather; > struct iommu_dma_cookie *cookie = domain->iova_cookie; > struct iova_domain *iovad = &cookie->iovad; > size_t iova_off = iova_offset(iovad, dma_addr); > + size_t unmapped; > > dma_addr -= iova_off; > size = iova_align(iovad, size + iova_off); > + iommu_iotlb_gather_init(&iotlb_gather); > + > + unmapped = iommu_unmap_fast(domain, dma_addr, size, &iotlb_gather); > + WARN_ON(unmapped != size); > > - WARN_ON(iommu_unmap_fast(domain, dma_addr, size) != size); > if (!cookie->fq_domain) > - iommu_tlb_sync(domain); > + iommu_tlb_sync(domain, &iotlb_gather); > iommu_dma_free_iova(cookie, dma_addr, size); I looked through your patches and was wondering if we can't make the 'struct iotlb_gather' a member of 'struct iommu_domain' and update it transparently for the user? That would make things easier for users of the iommu-api. Regards, Joerg ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu