[PATCH 11/13 v2] ARM: ixp4xx: Drop custom DMA coherency and bouncing

2022-02-11 Thread Linus Walleij
The new PCI driver does not need any of this stuff, so just
drop it.
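
With the override gone, ixp4xx falls back to the generic
dma_set_coherent_mask() in kernel/dma/mapping.c, which (roughly, as of
this era of the tree -- a sketch, not the verbatim source) just
validates the mask against dma_supported():

	int dma_set_coherent_mask(struct device *dev, u64 mask)
	{
		/*
		 * Truncate the mask to the actually supported dma_addr_t
		 * width to avoid generating unsupportable addresses.
		 */
		mask = (dma_addr_t)mask;

		if (!dma_supported(dev, mask))
			return -EIO;

		dev->coherent_dma_mask = mask;
		return 0;
	}

The 64 MB PCI DMA limit is expected to be expressed by the new PCI
driver through the standard DMA constraints rather than an arch hook.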

Cc: iommu@lists.linux-foundation.org
Reviewed-by: Christoph Hellwig 
Signed-off-by: Linus Walleij 
---
ChangeLog v1->v2:
- Pick up Christoph's Reviewed-by and add proper CC for iommu
- Resending with the rest
---
 arch/arm/Kconfig  |  5 ---
 arch/arm/mach-ixp4xx/common.c | 57 ---
 kernel/dma/mapping.c  |  2 --
 3 files changed, 64 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 3a95203236d2..ec0dbaf73a81 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -217,9 +217,6 @@ config ARCH_MAY_HAVE_PC_FDC
 config ARCH_SUPPORTS_UPROBES
def_bool y
 
-config ARCH_HAS_DMA_SET_COHERENT_MASK
-   bool
-
 config GENERIC_ISA_DMA
bool
 
@@ -381,10 +378,8 @@ config ARCH_IOP32X
 config ARCH_IXP4XX
bool "IXP4xx-based"
depends on MMU
-   select ARCH_HAS_DMA_SET_COHERENT_MASK
select ARCH_SUPPORTS_BIG_ENDIAN
select CPU_XSCALE
-   select DMABOUNCE if PCI
select GENERIC_IRQ_MULTI_HANDLER
select GPIO_IXP4XX
select GPIOLIB
diff --git a/arch/arm/mach-ixp4xx/common.c b/arch/arm/mach-ixp4xx/common.c
index 4e51514ace6d..310e1602fbfc 100644
--- a/arch/arm/mach-ixp4xx/common.c
+++ b/arch/arm/mach-ixp4xx/common.c
@@ -30,7 +30,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -330,59 +329,3 @@ void ixp4xx_restart(enum reboot_mode mode, const char *cmd)
*IXP4XX_OSWE = IXP4XX_WDT_RESET_ENABLE | IXP4XX_WDT_COUNT_ENABLE;
}
 }
-
-#ifdef CONFIG_PCI
-static int ixp4xx_needs_bounce(struct device *dev, dma_addr_t dma_addr, size_t size)
-{
-   return (dma_addr + size) > SZ_64M;
-}
-
-static int ixp4xx_platform_notify_remove(struct device *dev)
-{
-   if (dev_is_pci(dev))
-   dmabounce_unregister_dev(dev);
-
-   return 0;
-}
-#endif
-
-/*
- * Setup DMA mask to 64MB on PCI devices and 4 GB on all other things.
- */
-static int ixp4xx_platform_notify(struct device *dev)
-{
-   dev->dma_mask = &dev->coherent_dma_mask;
-
-#ifdef CONFIG_PCI
-   if (dev_is_pci(dev)) {
-   dev->coherent_dma_mask = DMA_BIT_MASK(28); /* 64 MB */
-   dmabounce_register_dev(dev, 2048, 4096, ixp4xx_needs_bounce);
-   return 0;
-   }
-#endif
-
-   dev->coherent_dma_mask = DMA_BIT_MASK(32);
-   return 0;
-}
-
-int dma_set_coherent_mask(struct device *dev, u64 mask)
-{
-   if (dev_is_pci(dev))
-   mask &= DMA_BIT_MASK(28); /* 64 MB */
-
-   if ((mask & DMA_BIT_MASK(28)) == DMA_BIT_MASK(28)) {
-   dev->coherent_dma_mask = mask;
-   return 0;
-   }
-
return -EIO;	/* device wanted sub-64MB mask */
-}
-EXPORT_SYMBOL(dma_set_coherent_mask);
-
-void __init ixp4xx_init_early(void)
-{
-   platform_notify = ixp4xx_platform_notify;
-#ifdef CONFIG_PCI
-   platform_notify_remove = ixp4xx_platform_notify_remove;
-#endif
-}
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 9478eccd1c8e..559461a826ba 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -745,7 +745,6 @@ int dma_set_mask(struct device *dev, u64 mask)
 }
 EXPORT_SYMBOL(dma_set_mask);
 
-#ifndef CONFIG_ARCH_HAS_DMA_SET_COHERENT_MASK
 int dma_set_coherent_mask(struct device *dev, u64 mask)
 {
/*
@@ -761,7 +760,6 @@ int dma_set_coherent_mask(struct device *dev, u64 mask)
return 0;
 }
 EXPORT_SYMBOL(dma_set_coherent_mask);
-#endif
 
 size_t dma_max_mapping_size(struct device *dev)
 {
-- 
2.34.1



Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-02-11 Thread Dave Hansen
On 2/7/22 15:02, Fenghua Yu wrote:
...
> Get rid of the refcounting mechanisms and replace/rename the interfaces
> to reflect this new approach.
...
>  .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |  5 +--
>  drivers/iommu/intel/iommu.c   |  4 +-
>  drivers/iommu/intel/svm.c |  9 -
>  drivers/iommu/ioasid.c| 39 ++-
>  drivers/iommu/iommu-sva-lib.c | 39 ++-
>  drivers/iommu/iommu-sva-lib.h |  1 -
>  include/linux/ioasid.h| 12 +-
>  include/linux/sched/mm.h  | 16 
>  kernel/fork.c |  1 +
>  9 files changed, 38 insertions(+), 88 deletions(-)

Given the heavily non-x86 diffstat here, I was hoping to see some acks
from folks that this might affect, especially on the ARM side.

Is everyone OK with this?


Re: [PATCH v4 00/11] Re-enable ENQCMD and PASID MSR

2022-02-11 Thread Fenghua Yu
Hi, Thomas,

On Mon, Feb 07, 2022 at 03:02:43PM -0800, Fenghua Yu wrote:
> Problems in the old code to manage SVM (Shared Virtual Memory) devices
> and the PASID (Process Address Space ID) led to that code being
> disabled.
> 
> Subsequent discussions resulted in a far simpler approach:
> 
> 1) PASID life cycle is from first allocation by a process until that
>process exits.
> 2) All tasks begin with PASID disabled
> 3) The #GP fault handler tries to fix faulting ENQCMD instructions very
>early (thus avoiding complexities of the XSAVE infrastructure)
> 
> Change Log:
> v4:
> - Update commit message in patch #4 (Thomas).
> - Update commit message in patch #5 (Thomas).
> - Add "Reviewed-by: Thomas Gleixner " in patch #1-#3
>   and patch #6-#9 (Thomas).
> - Rebased to 5.17-rc3.
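
For anyone skimming: point 3 above boils down to something like the
sketch below (simplified; the exact names, checks, and ordering are in
the series itself, so treat this as illustrative only):

	/*
	 * Sketch: fix up a #GP from a first-use ENQCMD by loading the
	 * mm's PASID into the IA32_PASID MSR and retrying the insn.
	 */
	static bool try_fixup_enqcmd_gp(void)
	{
		u32 pasid;

		/* Hardware without ENQCMD cannot raise such a #GP */
		if (!cpu_feature_enabled(X86_FEATURE_ENQCMD))
			return false;

		pasid = current->mm->pasid;
		if (pasid == INVALID_IOASID)
			return false;	/* mm never allocated a PASID */

		if (current->pasid_activated)
			return false;	/* MSR already loaded; not our fault */

		wrmsrl(MSR_IA32_PASID, pasid | MSR_IA32_PASID_VALID);
		current->pasid_activated = 1;
		return true;
	}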

A friendly reminder. Any comment on this series? Will you pick up this
series in tip?

Thank you very much!

-Fenghua


Re: [PATCH] dma-mapping: benchmark: Extract a common header file for map_benchmark definition

2022-02-11 Thread Shuah Khan

On 2/10/22 9:22 PM, Song Bao Hua (Barry Song) wrote:




-Original Message-
From: tiantao (H)
Sent: Friday, February 11, 2022 4:15 PM
To: Song Bao Hua (Barry Song) ; sh...@kernel.org;
chenxiang (M) 
Cc: iommu@lists.linux-foundation.org; linux-kselft...@vger.kernel.org;
linux...@openeuler.org
Subject: [PATCH] dma-mapping: benchmark: Extract a common header file for
map_benchmark definition

kernel/dma/map_benchmark.c and selftests/dma/dma_map_benchmark.c
have duplicate map_benchmark definitions, which tends to lead to
inconsistent changes to map_benchmark on both sides. Extract a
common header file to avoid this problem.
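
A minimal sketch of what such a shared header could look like (the
location and field list here are illustrative, based on the fields the
two current copies duplicate, not taken verbatim from the patch):

	/* include/linux/map_benchmark.h -- hypothetical shared header */
	#ifndef _LINUX_MAP_BENCHMARK_H
	#define _LINUX_MAP_BENCHMARK_H

	#include <linux/types.h>

	#define DMA_MAP_BENCHMARK	_IOWR('d', 1, struct map_benchmark)

	struct map_benchmark {
		__u64 avg_map_100ns;	/* average map latency in 0.1us */
		__u64 map_stddev;	/* standard deviation of map latency */
		__u64 avg_unmap_100ns;	/* as above */
		__u64 unmap_stddev;
		__u32 threads;		/* threads doing map/unmap in parallel */
		__u32 seconds;		/* how long the test runs */
		__s32 node;		/* NUMA node to run on */
		__u32 dma_bits;		/* DMA addressing capability */
		__u32 dma_dir;		/* DMA data direction */
	};

	#endif /* _LINUX_MAP_BENCHMARK_H */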

Signed-off-by: Tian Tao 


+To: Christoph

Looks like the right cleanup. This will help decrease the maintenance
overhead in the future. Other similar selftests tools are already
doing this.

Acked-by: Barry Song 



+1 on this cleanup making this code maintainable. We are moving in
the direction of cleaning up defines in selftests for the same
reason.

Let's just make sure this works on older kernels. We do support
mainline kselftest on stable releases. With that:

Reviewed-by: Shuah Khan 

thanks,
-- Shuah





[PATCH v5 5/6] drivers: virtio_mem: use pageblock size as the minimum virtio_mem size.

2022-02-11 Thread Zi Yan
From: Zi Yan 

alloc_contig_range() now only needs to be aligned to pageblock_order,
so drop the virtio_mem size requirement that it be the max of
pageblock_order and MAX_ORDER.
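
As a concrete example, on a typical x86-64 config (4 KiB pages,
pageblock_order = 9, MAX_ORDER = 11) the minimum subblock size drops
from MAX_ORDER_NR_PAGES (1024 pages, 4 MiB) to pageblock_nr_pages
(512 pages, 2 MiB).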

Signed-off-by: Zi Yan 
---
 drivers/virtio/virtio_mem.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 38becd8d578c..2307e65d18c2 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -2476,13 +2476,12 @@ static int virtio_mem_init_hotplug(struct virtio_mem *vm)
  VIRTIO_MEM_DEFAULT_OFFLINE_THRESHOLD);
 
/*
-* We want subblocks to span at least MAX_ORDER_NR_PAGES and
-* pageblock_nr_pages pages. This:
+* We want subblocks to span at least pageblock_nr_pages pages.
+* This:
 * - Is required for now for alloc_contig_range() to work reliably -
 *   it doesn't properly handle smaller granularity on ZONE_NORMAL.
 */
-   sb_size = max_t(uint64_t, MAX_ORDER_NR_PAGES,
-   pageblock_nr_pages) * PAGE_SIZE;
+   sb_size = pageblock_nr_pages * PAGE_SIZE;
sb_size = max_t(uint64_t, vm->device_block_size, sb_size);
 
if (sb_size < memory_block_size_bytes() && !force_bbm) {
-- 
2.34.1



[PATCH v5 0/6] Use pageblock_order for cma and alloc_contig_range alignment.

2022-02-11 Thread Zi Yan
From: Zi Yan 

Hi all,

This patchset tries to remove the MAX_ORDER-1 alignment requirement for CMA
and alloc_contig_range(). It prepares for my upcoming changes to make
MAX_ORDER adjustable at boot time[1]. It is on top of mmotm-2022-02-08-15-31.

Changelog
===
V5
---
1. Moved isolation address alignment handling into start_isolate_page_range().
2. Rewrote and simplified how alloc_contig_range() works at pageblock
   granularity (Patch 3). Only two pageblock migratetypes need to be saved and
   restored. start_isolate_page_range() might need to migrate pages in this
   version, but it spares the caller from worrying about
   max(MAX_ORDER_NR_PAGES, pageblock_nr_pages) alignment after the page range
   is isolated.

V4
---
1. Dropped two irrelevant patches on non-lru compound page handling, as
   it is not supported upstream.
2. Renamed migratetype_has_fallback() to migratetype_is_mergeable().
3. Always check whether two pageblocks can be merged in
   __free_one_page() when order is >= pageblock_order, as the case (not
   mergeable pageblocks are isolated, CMA, and HIGHATOMIC) becomes more common.
4. Made moving has_unmovable_pages() a separate patch.
5. Removed the MAX_ORDER-1 alignment requirement in the comment in virtio_mem
   code.

Description
===

The MAX_ORDER - 1 alignment requirement exists because alloc_contig_range()
isolates pageblocks to remove free memory from the buddy allocator, but
isolating only a subset of the pageblocks within a free page spanning
multiple pageblocks causes free page accounting issues. An isolated page
might not be put on the right free list, since the code assumes the
migratetype of the first pageblock is the migratetype of the whole free
page. This is based on the discussion at [2].

To remove the requirement, this patchset:
1. isolates pages at pageblock granularity instead of
   max(MAX_ORDER_NR_PAGES, pageblock_nr_pages);
2. splits free pages across the specified range or migrates in-use pages
   across the specified range then splits the freed page to avoid free page
   accounting issues (it happens when multiple pageblocks within a single page
   have different migratetypes);
3. only checks unmovable pages within the range instead of MAX_ORDER - 1 aligned
   range during isolation to avoid alloc_contig_range() failure when pageblocks
   within a MAX_ORDER - 1 aligned range are allocated separately.
4. returns pages not in the range as it did before.
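
As a concrete example: on a config with 2 MiB pageblocks and 4 MiB
MAX_ORDER pages, a range today must be isolated as a 4 MiB-aligned
superset; with this patchset only the 2 MiB pageblocks overlapping the
range are isolated, and a free page straddling a range boundary is
split so each part ends up on the free list matching its own
pageblock's migratetype.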

One optimization might come later:
1. make MIGRATE_ISOLATE a separate bit to be able to restore the original
   migratetypes when isolation fails in the middle of the range.

Feel free to give comments and suggestions. Thanks.

[1] https://lore.kernel.org/linux-mm/20210805190253.2795604-1-zi@sent.com/
[2] https://lore.kernel.org/linux-mm/d19fb078-cb9b-f60f-e310-fdeea1b94...@redhat.com/

Zi Yan (6):
  mm: page_isolation: move has_unmovable_pages() to mm/page_isolation.c
  mm: page_isolation: check specified range for unmovable pages
  mm: make alloc_contig_range work at pageblock granularity
  mm: cma: use pageblock_order as the single alignment
  drivers: virtio_mem: use pageblock size as the minimum virtio_mem
size.
  arch: powerpc: adjust fadump alignment to be pageblock aligned.

 arch/powerpc/include/asm/fadump-internal.h |   4 +-
 drivers/virtio/virtio_mem.c|   7 +-
 include/linux/mmzone.h |   5 +-
 include/linux/page-isolation.h |  16 +-
 kernel/dma/contiguous.c|   2 +-
 mm/cma.c   |   6 +-
 mm/internal.h  |   3 +
 mm/memory_hotplug.c|   3 +-
 mm/page_alloc.c| 371 ++---
 mm/page_isolation.c| 172 +-
 10 files changed, 367 insertions(+), 222 deletions(-)

-- 
2.34.1



[PATCH v5 6/6] arch: powerpc: adjust fadump alignment to be pageblock aligned.

2022-02-11 Thread Zi Yan
From: Zi Yan 

CMA only requires pageblock alignment now. Change CMA alignment in
fadump too.

Signed-off-by: Zi Yan 
---
 arch/powerpc/include/asm/fadump-internal.h | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/fadump-internal.h b/arch/powerpc/include/asm/fadump-internal.h
index 52189928ec08..fbfca85b4200 100644
--- a/arch/powerpc/include/asm/fadump-internal.h
+++ b/arch/powerpc/include/asm/fadump-internal.h
@@ -20,9 +20,7 @@
 #define memblock_num_regions(memblock_type)(memblock.memblock_type.cnt)
 
 /* Alignment per CMA requirement. */
-#define FADUMP_CMA_ALIGNMENT   (PAGE_SIZE <<   \
-max_t(unsigned long, MAX_ORDER - 1,\
-pageblock_order))
+#define FADUMP_CMA_ALIGNMENT   (PAGE_SIZE << pageblock_order)
 
 /* FAD commands */
 #define FADUMP_REGISTER1
-- 
2.34.1



[PATCH v5 2/6] mm: page_isolation: check specified range for unmovable pages

2022-02-11 Thread Zi Yan
From: Zi Yan 

Enable set_migratetype_isolate() to check a specified sub-range for
unmovable pages during isolation. Page isolation is done
at max(MAX_ORDER_NR_PAGES, pageblock_nr_pages) granularity, but not all
pages within that granularity are intended to be isolated. For example,
alloc_contig_range(), which uses page isolation, allows ranges without
alignment. This commit makes the unmovable page check look only at
pages within the range of interest, so that page isolation can succeed
for any non-overlapping ranges.
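
For example, if alloc_contig_range() asks for [1 MiB, 3 MiB) on a
system with 2 MiB pageblocks and 4 MiB MAX_ORDER pages, isolation still
covers the whole aligned [0, 4 MiB) block, but only the pages inside
[1 MiB, 3 MiB) now have to pass the unmovable check.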

Signed-off-by: Zi Yan 
---
 include/linux/page-isolation.h | 12 +
 mm/page_alloc.c| 15 +--
 mm/page_isolation.c| 46 +-
 3 files changed, 41 insertions(+), 32 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index e14eddf6741a..4ef7be6def83 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -15,6 +15,18 @@ static inline bool is_migrate_isolate(int migratetype)
 {
return migratetype == MIGRATE_ISOLATE;
 }
+static inline unsigned long pfn_max_align_down(unsigned long pfn)
+{
+   return ALIGN_DOWN(pfn, max_t(unsigned long, MAX_ORDER_NR_PAGES,
+pageblock_nr_pages));
+}
+
+static inline unsigned long pfn_max_align_up(unsigned long pfn)
+{
+   return ALIGN(pfn, max_t(unsigned long, MAX_ORDER_NR_PAGES,
+   pageblock_nr_pages));
+}
+
 #else
 static inline bool has_isolate_pageblock(struct zone *zone)
 {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e2c6a67fc386..62ef78f3d771 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8963,18 +8963,6 @@ void *__init alloc_large_system_hash(const char *tablename,
 }
 
 #ifdef CONFIG_CONTIG_ALLOC
-static unsigned long pfn_max_align_down(unsigned long pfn)
-{
-   return pfn & ~(max_t(unsigned long, MAX_ORDER_NR_PAGES,
-pageblock_nr_pages) - 1);
-}
-
-static unsigned long pfn_max_align_up(unsigned long pfn)
-{
-   return ALIGN(pfn, max_t(unsigned long, MAX_ORDER_NR_PAGES,
-   pageblock_nr_pages));
-}
-
 #if defined(CONFIG_DYNAMIC_DEBUG) || \
(defined(CONFIG_DYNAMIC_DEBUG_CORE) && defined(DYNAMIC_DEBUG_MODULE))
 /* Usage: See admin-guide/dynamic-debug-howto.rst */
@@ -9119,8 +9107,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 * put back to page allocator so that buddy can use them.
 */
 
-   ret = start_isolate_page_range(pfn_max_align_down(start),
-  pfn_max_align_up(end), migratetype, 0);
+   ret = start_isolate_page_range(start, end, migratetype, 0);
if (ret)
return ret;
 
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index b34f1310aeaa..64d093ab83ec 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -16,7 +16,8 @@
 #include 
 
 /*
- * This function checks whether pageblock includes unmovable pages or not.
+ * This function checks whether pageblock within [start_pfn, end_pfn) includes
+ * unmovable pages or not.
  *
  * PageLRU check without isolation or lru_lock could race so that
  * MIGRATE_MOVABLE block might include unmovable pages. And __PageMovable
@@ -29,11 +30,14 @@
  *
  */
 static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
-int migratetype, int flags)
+int migratetype, int flags,
+unsigned long start_pfn, unsigned long end_pfn)
 {
-   unsigned long iter = 0;
-   unsigned long pfn = page_to_pfn(page);
-   unsigned long offset = pfn % pageblock_nr_pages;
+   unsigned long first_pfn = max(page_to_pfn(page), start_pfn);
+   unsigned long pfn = first_pfn;
+   unsigned long last_pfn = min(ALIGN(pfn + 1, pageblock_nr_pages), end_pfn);
+
+   page = pfn_to_page(pfn);
 
if (is_migrate_cma_page(page)) {
/*
@@ -47,8 +51,8 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
return page;
}
 
-   for (; iter < pageblock_nr_pages - offset; iter++) {
-   page = pfn_to_page(pfn + iter);
+   for (pfn = first_pfn; pfn < last_pfn; pfn++) {
+   page = pfn_to_page(pfn);
 
/*
 * Both, bootmem allocations and memory holes are marked
@@ -85,7 +89,7 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
}
 
skip_pages = compound_nr(head) - (page - head);
-   iter += skip_pages - 1;
+   pfn += skip_pages - 1;
continue;
}
 
@@ -97,7 +101,7 @@ static struct page *has_unmovable_pages(struct zone *zone, struct page *page,
 */
if (!page_ref_count(page)) {
if (PageBuddy(page))

[PATCH v5 3/6] mm: make alloc_contig_range work at pageblock granularity

2022-02-11 Thread Zi Yan
From: Zi Yan 

alloc_contig_range() worked at MAX_ORDER-1 granularity to avoid merging
pageblocks with different migratetypes. It might unnecessarily convert
extra pageblocks at the beginning and at the end of the range. Change
alloc_contig_range() to work at pageblock granularity.

Special handling is needed for free pages and in-use pages across the
boundaries of the range specified by alloc_contig_range(), because such
partially isolated pages cause free page accounting issues. The free
pages will be split and freed into separate migratetype lists; the
in-use pages will be migrated and the pages freed by migration will
then be handled likewise.
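
For example, if a free 4 MiB buddy page straddles the start of the
requested range, it cannot be isolated as a whole: it is split at the
range boundary (see split_free_page() below) so that each piece can be
freed onto the list matching its own pageblock's migratetype, while an
in-use page straddling a boundary is migrated away first and the page
freed by migration is then split the same way.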

Signed-off-by: Zi Yan 
---
 include/linux/page-isolation.h |   2 +-
 mm/internal.h  |   3 +
 mm/memory_hotplug.c|   3 +-
 mm/page_alloc.c| 235 +
 mm/page_isolation.c|  33 -
 5 files changed, 211 insertions(+), 65 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 4ef7be6def83..78ff940cc169 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -54,7 +54,7 @@ int move_freepages_block(struct zone *zone, struct page *page,
  */
 int
 start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
-unsigned migratetype, int flags);
+unsigned migratetype, int flags, gfp_t gfp_flags);
 
 /*
  * Changes MIGRATE_ISOLATE to MIGRATE_MOVABLE.
diff --git a/mm/internal.h b/mm/internal.h
index 0d240e876831..509cbdc25992 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -319,6 +319,9 @@ isolate_freepages_range(struct compact_control *cc,
 int
 isolate_migratepages_range(struct compact_control *cc,
   unsigned long low_pfn, unsigned long end_pfn);
+
+int
isolate_single_pageblock(unsigned long boundary_pfn, gfp_t gfp_flags, int isolate_before_boundary);
 #endif
 int find_suitable_fallback(struct free_area *area, unsigned int order,
int migratetype, bool only_stealable, bool *can_steal);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index ce68098832aa..82406d2f3e46 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1863,7 +1863,8 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
/* set above range as isolated */
ret = start_isolate_page_range(start_pfn, end_pfn,
   MIGRATE_MOVABLE,
-  MEMORY_OFFLINE | REPORT_FAILURE);
+  MEMORY_OFFLINE | REPORT_FAILURE,
+  GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL);
if (ret) {
reason = "failure to isolate range";
goto failed_removal_pcplists_disabled;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 62ef78f3d771..7a4fa21aea5c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8985,7 +8985,7 @@ static inline void alloc_contig_dump_pages(struct list_head *page_list)
 #endif
 
 /* [start, end) must belong to a single zone. */
-static int __alloc_contig_migrate_range(struct compact_control *cc,
+int __alloc_contig_migrate_range(struct compact_control *cc,
unsigned long start, unsigned long end)
 {
/* This function is based on compact_zone() from compaction.c. */
@@ -9043,6 +9043,167 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
return 0;
 }
 
+/**
+ * split_free_page() -- split a free page at split_pfn_offset
+ * @free_page: the original free page
+ * @order: the order of the page
+ * @split_pfn_offset:  split offset within the page
+ *
+ * It is used when the free page crosses two pageblocks with different migratetypes
+ * at split_pfn_offset within the page. The split free page will be put into
+ * separate migratetype lists afterwards. Otherwise, the function achieves
+ * nothing.
+ */
+static inline void split_free_page(struct page *free_page,
+   int order, unsigned long split_pfn_offset)
+{
+   struct zone *zone = page_zone(free_page);
+   unsigned long free_page_pfn = page_to_pfn(free_page);
+   unsigned long pfn;
+   unsigned long flags;
+   int free_page_order;
+
+   spin_lock_irqsave(&zone->lock, flags);
+   del_page_from_free_list(free_page, zone, order);
+   for (pfn = free_page_pfn;
+pfn < free_page_pfn + (1UL << order);) {
+   int mt = get_pfnblock_migratetype(pfn_to_page(pfn), pfn);
+
+   free_page_order = order_base_2(split_pfn_offset);
+   __free_one_page(pfn_to_page(pfn), pfn, zone, free_page_order,
+   mt, FPI_NONE);
+   pfn += 1UL << free_page_order;
+   split_pfn_offset -= (1UL << free_page_order);
+   /* we have done the first part, now switch to second part */

[PATCH v5 4/6] mm: cma: use pageblock_order as the single alignment

2022-02-11 Thread Zi Yan
From: Zi Yan 

Now alloc_contig_range() works at pageblock granularity. Change CMA
allocation, which uses alloc_contig_range(), to use pageblock_order
alignment.
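
On a typical config with 4 KiB pages this drops the minimum CMA
alignment from 4 MiB (MAX_ORDER - 1) to 2 MiB (pageblock_order),
allowing smaller and better-fitting reservations.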

Signed-off-by: Zi Yan 
---
 include/linux/mmzone.h  | 5 +
 kernel/dma/contiguous.c | 2 +-
 mm/cma.c| 6 ++
 mm/page_alloc.c | 4 ++--
 4 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 3fff6deca2c0..da38c8436493 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -54,10 +54,7 @@ enum migratetype {
 *
 * The way to use it is to change migratetype of a range of
 * pageblocks to MIGRATE_CMA which can be done by
-* __free_pageblock_cma() function.  What is important though
-* is that a range of pageblocks must be aligned to
-* MAX_ORDER_NR_PAGES should biggest page be bigger than
-* a single pageblock.
+* __free_pageblock_cma() function.
 */
MIGRATE_CMA,
 #endif
diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c
index 3d63d91cba5c..ac35b14b0786 100644
--- a/kernel/dma/contiguous.c
+++ b/kernel/dma/contiguous.c
@@ -399,7 +399,7 @@ static const struct reserved_mem_ops rmem_cma_ops = {
 
 static int __init rmem_cma_setup(struct reserved_mem *rmem)
 {
-   phys_addr_t align = PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order);
+   phys_addr_t align = PAGE_SIZE << pageblock_order;
phys_addr_t mask = align - 1;
unsigned long node = rmem->fdt_node;
bool default_cma = of_get_flat_dt_prop(node, "linux,cma-default", NULL);
diff --git a/mm/cma.c b/mm/cma.c
index 766f1b82b532..b2e927fab7b5 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -187,8 +187,7 @@ int __init cma_init_reserved_mem(phys_addr_t base, phys_addr_t size,
return -EINVAL;
 
/* ensure minimal alignment required by mm core */
-   alignment = PAGE_SIZE <<
-   max_t(unsigned long, MAX_ORDER - 1, pageblock_order);
+   alignment = PAGE_SIZE << pageblock_order;
 
/* alignment should be aligned with order_per_bit */
if (!IS_ALIGNED(alignment >> PAGE_SHIFT, 1 << order_per_bit))
@@ -275,8 +274,7 @@ int __init cma_declare_contiguous_nid(phys_addr_t base,
 * migratetype page by page allocator's buddy algorithm. In the case,
 * you couldn't get a contiguous memory, which is not what we want.
 */
-   alignment = max(alignment,  (phys_addr_t)PAGE_SIZE <<
- max_t(unsigned long, MAX_ORDER - 1, pageblock_order));
+   alignment = max(alignment,  (phys_addr_t)PAGE_SIZE << pageblock_order);
if (fixed && base & (alignment - 1)) {
ret = -EINVAL;
pr_err("Region at %pa must be aligned to %pa bytes\n",
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7a4fa21aea5c..ac9432e63ce1 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -9214,8 +9214,8 @@ int isolate_single_pageblock(unsigned long boundary_pfn, gfp_t gfp_flags,
  * be either of the two.
  * @gfp_mask:  GFP mask to use during compaction
  *
- * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
- * aligned.  The PFN range must belong to a single zone.
+ * The PFN range does not have to be pageblock aligned. The PFN range must
+ * belong to a single zone.
  *
  * The first thing this routine does is attempt to MIGRATE_ISOLATE all
  * pageblocks in the range.  Once isolated, the pageblocks should not
-- 
2.34.1



[PATCH v5 1/6] mm: page_isolation: move has_unmovable_pages() to mm/page_isolation.c

2022-02-11 Thread Zi Yan
From: Zi Yan 

has_unmovable_pages() is only used in mm/page_isolation.c. Move it from
mm/page_alloc.c and make it static.

Signed-off-by: Zi Yan 
Reviewed-by: Oscar Salvador 
---
 include/linux/page-isolation.h |   2 -
 mm/page_alloc.c| 119 -
 mm/page_isolation.c| 119 +
 3 files changed, 119 insertions(+), 121 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 572458016331..e14eddf6741a 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -33,8 +33,6 @@ static inline bool is_migrate_isolate(int migratetype)
 #define MEMORY_OFFLINE 0x1
 #define REPORT_FAILURE 0x2
 
-struct page *has_unmovable_pages(struct zone *zone, struct page *page,
-int migratetype, int flags);
 void set_pageblock_migratetype(struct page *page, int migratetype);
 int move_freepages_block(struct zone *zone, struct page *page,
int migratetype, int *num_movable);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cface1d38093..e2c6a67fc386 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8962,125 +8962,6 @@ void *__init alloc_large_system_hash(const char *tablename,
return table;
 }
 
-/*
- * This function checks whether pageblock includes unmovable pages or not.
- *
- * PageLRU check without isolation or lru_lock could race so that
- * MIGRATE_MOVABLE block might include unmovable pages. And __PageMovable
- * check without lock_page also may miss some movable non-lru pages at
- * race condition. So you can't expect this function should be exact.
- *
- * Returns a page without holding a reference. If the caller wants to
- * dereference that page (e.g., dumping), it has to make sure that it
- * cannot get removed (e.g., via memory unplug) concurrently.
- *
- */
-struct page *has_unmovable_pages(struct zone *zone, struct page *page,
-int migratetype, int flags)
-{
-   unsigned long iter = 0;
-   unsigned long pfn = page_to_pfn(page);
-   unsigned long offset = pfn % pageblock_nr_pages;
-
-   if (is_migrate_cma_page(page)) {
-   /*
-* CMA allocations (alloc_contig_range) really need to mark
-* isolate CMA pageblocks even when they are not movable in fact
-* so consider them movable here.
-*/
-   if (is_migrate_cma(migratetype))
-   return NULL;
-
-   return page;
-   }
-
-   for (; iter < pageblock_nr_pages - offset; iter++) {
-   page = pfn_to_page(pfn + iter);
-
-   /*
-* Both, bootmem allocations and memory holes are marked
-* PG_reserved and are unmovable. We can even have unmovable
-* allocations inside ZONE_MOVABLE, for example when
-* specifying "movablecore".
-*/
-   if (PageReserved(page))
-   return page;
-
-   /*
-* If the zone is movable and we have ruled out all reserved
-* pages then it should be reasonably safe to assume the rest
-* is movable.
-*/
-   if (zone_idx(zone) == ZONE_MOVABLE)
-   continue;
-
-   /*
-* Hugepages are not in LRU lists, but they're movable.
-* THPs are on the LRU, but need to be counted as #small pages.
-* We need not scan over tail pages because we don't
-* handle each tail page individually in migration.
-*/
-   if (PageHuge(page) || PageTransCompound(page)) {
-   struct page *head = compound_head(page);
-   unsigned int skip_pages;
-
-   if (PageHuge(page)) {
-   if (!hugepage_migration_supported(page_hstate(head)))
-   return page;
-   } else if (!PageLRU(head) && !__PageMovable(head)) {
-   return page;
-   }
-
-   skip_pages = compound_nr(head) - (page - head);
-   iter += skip_pages - 1;
-   continue;
-   }
-
-   /*
-* We can't use page_count without pin a page
-* because another CPU can free compound page.
-* This check already skips compound tails of THP
-* because their page->_refcount is zero at all time.
-*/
-   if (!page_ref_count(page)) {
-   if (PageBuddy(page))
-   iter += (1 << buddy_order(page)) - 1;
-   continue;
-   }
-
-   /*
-* The HWPoisoned page may be

Re: [PATCH v1 00/10] iommu/vt-d: Some Intel IOMMU cleanups

2022-02-11 Thread Jason Gunthorpe via iommu
On Mon, Feb 07, 2022 at 02:41:32PM +0800, Lu Baolu wrote:
> Hi folks,
> 
> After a long time of evolution, the drivers/iommu/intel/iommu.c becomes
> fat and a bit messy. This series tries to cleanup and refactor the
> driver to make it more concise. Your comments are very appreciated.

I wanted to take a closer look at what you are trying to do with rcu,
but these patches don't apply. Please always send patches against a
well-known tree like v5.17-rc or the iommu tree, or something.

Anyhow, I think you should split the last 4 patches out of this series
and send them separately.

Jason


Re: Error when running fio against nvme-of rdma target (mlx5 driver)

2022-02-11 Thread Robin Murphy

On 2022-02-10 23:58, Martin Oliveira wrote:

On 2/9/22 1:41 AM, Chaitanya Kulkarni wrote:

On 2/8/22 6:50 PM, Martin Oliveira wrote:

Hello,

We have been hitting an error when running IO over our nvme-of setup
using the mlx5 driver, and we are wondering if anyone has seen anything
similar or has any suggestions.

Both initiator and target are AMD EPYC 7502 machines connected over RDMA using 
a Mellanox MT28908. Target has 12 NVMe SSDs which are exposed as a single NVMe 
fabrics device, one physical SSD per namespace.



Thanks for reporting this. If you can bisect the problem on your setup,
it will help others to help you better.

-ck


Hi Chaitanya,

I went back to a kernel as old as 4.15 and the problem was still there, so I 
don't know of a good commit to start from.

I also learned that I can reproduce this with as little as 3 cards and I 
updated the firmware on the Mellanox cards to the latest version.

I'd be happy to try any tests if someone has any suggestions.


The IOMMU is probably your friend here - one thing that might be worth 
trying is capturing the iommu:map and iommu:unmap tracepoints to see if 
the address reported in subsequent IOMMU faults was previously mapped as 
a valid DMA address (be warned that there will likely be a *lot* of 
trace generated). With 5.13 or newer, booting with "iommu.forcedac=1" 
should also make it easier to tell real DMA IOVAs from rogue physical 
addresses or other nonsense, as real DMA addresses should then look more 
like 0x24d08000.
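
In case it is useful, capturing those tracepoints via tracefs looks
roughly like this (paths assume tracefs is mounted at
/sys/kernel/tracing; expect a very large amount of output):

	# enable the IOMMU map/unmap tracepoints
	echo 1 > /sys/kernel/tracing/events/iommu/map/enable
	echo 1 > /sys/kernel/tracing/events/iommu/unmap/enable
	# stream the trace to a file while reproducing the fault
	cat /sys/kernel/tracing/trace_pipe > iommu-trace.txt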


That could at least help narrow down whether it's some kind of 
use-after-free race or a completely bogus address creeping in somehow.


Robin.


Re: [PATCH] iommu: explicitly check for NULL in iommu_dma_get_resv_regions()

2022-02-11 Thread Aleksandr Fedorov
> On 2022-02-09 14:09, Aleksandr Fedorov wrote:
>> iommu_dma_get_resv_regions() assumes that iommu_fwspec field for
>> corresponding device is set which is not always true. Since
>> iommu_dma_get_resv_regions() seems to be a future-proof generic API
>> that can be used by any iommu driver, add an explicit check for NULL.
> 
> Except it's not a "generic" interface for drivers to call at random,
> it's a helper for retrieving common firmware-based information
> specifically for drivers already using the fwspec mechanism for common
> firmware bindings. If any driver calls this with a device *without* a
> valid fwnode, it deserves to crash because it's done something
> fundamentally wrong.
> 
> I concur that it's not exactly obvious that "non-IOMMU-specific" means
> "based on common firmware bindings, thus implying fwspec".

Thanks for the explanations, yes, this was the misunderstanding on my
part. Maybe add a comment?

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index d85d54f2b549..ce5e7d4d054a 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -379,6 +379,9 @@ void iommu_put_dma_cookie(struct iommu_domain *domain)
  * for general non-IOMMU-specific reservations. Currently, this covers GICv3
  * ITS region reservation on ACPI based ARM platforms that may require HW MSI
  * reservation.
+ *
+ * Note that this helper is meant to be used only by drivers that are already
+ * using the fwspec mechanism for common firmware bindings.
  */
 void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list)
 {