[PATCH v3 for v5.9] mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs

2020-09-29 Thread js1304
From: Joonsoo Kim 

memalloc_nocma_{save/restore} APIs can be used to skip page allocation
on the CMA area, but there is a missing case and a page on the CMA area
can still be allocated even when the APIs are used. This patch handles
that case to fix the potential issue.

For now, these APIs are used to prevent long-term pinning of CMA pages.
When long-term pinning is requested for a page on the CMA area, it is
migrated to a non-CMA page before pinning. This non-CMA page is allocated
under memalloc_nocma_{save/restore}. If the APIs don't work as intended,
a CMA page is allocated and pinned for a long time. Such a long-term pin
on a CMA page causes cma_alloc() failures and can result in incorrect
behaviour in the device driver that uses cma_alloc().

The missing case is an allocation from the pcplist. The MIGRATE_MOVABLE
pcplist can contain pages from the CMA area, so we need to skip it when
ALLOC_CMA isn't specified.
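
For reference, the intended usage pattern of the scope API looks roughly
like the sketch below (simplified and illustrative only; error handling
and the pinning itself are omitted). Any allocation made inside the scope
must not return a page from the CMA area:

	unsigned int flags;
	struct page *page;

	flags = memalloc_nocma_save();
	/* allocations in this scope must skip the CMA area */
	page = alloc_page(GFP_HIGHUSER_MOVABLE);
	memalloc_nocma_restore(flags);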

Fixes: 8510e69c8efe (mm/page_alloc: fix memalloc_nocma_{save/restore} APIs)
Acked-by: Vlastimil Babka 
Acked-by: Michal Hocko 
Signed-off-by: Joonsoo Kim 
---
 mm/page_alloc.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fab5e97..b5a3f18 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3367,9 +3367,16 @@ struct page *rmqueue(struct zone *preferred_zone,
struct page *page;
 
if (likely(order == 0)) {
-   page = rmqueue_pcplist(preferred_zone, zone, gfp_flags,
+   /*
+* MIGRATE_MOVABLE pcplist could have the pages on CMA area and
+* we need to skip it when CMA area isn't allowed.
+*/
+   if (!IS_ENABLED(CONFIG_CMA) || alloc_flags & ALLOC_CMA ||
+   migratetype != MIGRATE_MOVABLE) {
+   page = rmqueue_pcplist(preferred_zone, zone, gfp_flags,
migratetype, alloc_flags);
-   goto out;
+   goto out;
+   }
}
 
/*
@@ -3381,7 +3388,13 @@ struct page *rmqueue(struct zone *preferred_zone,
 
do {
page = NULL;
-   if (alloc_flags & ALLOC_HARDER) {
+   /*
+* order-0 request can reach here when the pcplist is skipped
+* due to non-CMA allocation context. HIGHATOMIC area is
+* reserved for high-order atomic allocation, so order-0
+* request should skip it.
+*/
+   if (order > 0 && alloc_flags & ALLOC_HARDER) {
page = __rmqueue_smallest(zone, order, 
MIGRATE_HIGHATOMIC);
if (page)
trace_mm_page_alloc_zone_locked(page, order, 
migratetype);
-- 
2.7.4



[PATCH v2 for v5.9] mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs

2020-09-28 Thread js1304
From: Joonsoo Kim 

memalloc_nocma_{save/restore} APIs can be used to skip page allocation
on the CMA area, but there is a missing case and a page on the CMA area
can still be allocated even when the APIs are used. This patch handles
that case to fix the potential issue.

The missing case is an allocation from the pcplist. The MIGRATE_MOVABLE
pcplist can contain pages from the CMA area, so we need to skip it when
ALLOC_CMA isn't specified.

Fixes: 8510e69c8efe (mm/page_alloc: fix memalloc_nocma_{save/restore} APIs)
Signed-off-by: Joonsoo Kim 
---
 mm/page_alloc.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fab5e97..104d2e1 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3367,9 +3367,16 @@ struct page *rmqueue(struct zone *preferred_zone,
struct page *page;
 
if (likely(order == 0)) {
-   page = rmqueue_pcplist(preferred_zone, zone, gfp_flags,
+   /*
+* MIGRATE_MOVABLE pcplist could have the pages on CMA area and
+* we need to skip it when CMA area isn't allowed.
+*/
+   if (!IS_ENABLED(CONFIG_CMA) || alloc_flags & ALLOC_CMA ||
+   migratetype != MIGRATE_MOVABLE) {
+   page = rmqueue_pcplist(preferred_zone, zone, gfp_flags,
migratetype, alloc_flags);
-   goto out;
+   goto out;
+   }
}
 
/*
@@ -3381,7 +3388,7 @@ struct page *rmqueue(struct zone *preferred_zone,
 
do {
page = NULL;
-   if (alloc_flags & ALLOC_HARDER) {
+   if (order > 0 && alloc_flags & ALLOC_HARDER) {
page = __rmqueue_smallest(zone, order, 
MIGRATE_HIGHATOMIC);
if (page)
trace_mm_page_alloc_zone_locked(page, order, 
migratetype);
-- 
2.7.4



[PATCH for v5.9] mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs

2020-08-24 Thread js1304
From: Joonsoo Kim 

memalloc_nocma_{save/restore} APIs can be used to skip page allocation
on the CMA area, but there is a missing case and a page on the CMA area
can still be allocated even when the APIs are used. This patch handles
that case to fix the potential issue.

The missing case is an allocation from the pcplist. The MIGRATE_MOVABLE
pcplist can contain pages from the CMA area, so we need to skip it when
ALLOC_CMA isn't specified.

This patch implements this behaviour by checking the page allocated from
the pcplist rather than skipping the pcplist allocation entirely. Skipping
the pcplist entirely would result in a mismatch between the watermark check
and the actual page allocation, and it would require breaking the current
code layering in which an order-0 page is always handled by the pcplist.
I'd prefer to avoid that, so this patch uses a different way to skip CMA
page allocation from the pcplist.

Fixes: 8510e69c8efe (mm/page_alloc: fix memalloc_nocma_{save/restore} APIs)
Signed-off-by: Joonsoo Kim 
---
 mm/page_alloc.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0e2bab4..c4abf58 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3341,6 +3341,22 @@ static struct page *rmqueue_pcplist(struct zone 
*preferred_zone,
pcp = &this_cpu_ptr(zone->pageset)->pcp;
list = &pcp->lists[migratetype];
page = __rmqueue_pcplist(zone,  migratetype, alloc_flags, pcp, list);
+#ifdef CONFIG_CMA
+   if (page) {
+   int mt = get_pcppage_migratetype(page);
+
+   /*
+* pcp could have the pages on CMA area and we need to skip it
+* when !ALLOC_CMA. Free all pcplist and retry allocation.
+*/
+   if (is_migrate_cma(mt) && !(alloc_flags & ALLOC_CMA)) {
+   list_add(&page->lru, &pcp->lists[migratetype]);
+   pcp->count++;
+   free_pcppages_bulk(zone, pcp->count, pcp);
+   page = __rmqueue_pcplist(zone, migratetype, alloc_flags, pcp, list);
+   }
+   }
+#endif
if (page) {
__count_zid_vm_events(PGALLOC, page_zonenum(page), 1);
zone_statistics(preferred_zone, zone);
-- 
2.7.4



[PATCH v3 1/3] mm/gup: restrict CMA region by using allocation scope API

2020-07-31 Thread js1304
From: Joonsoo Kim 

We have a well-defined scope API to exclude the CMA region. Use it rather
than manipulating gfp_mask manually. With this change, we can now restore
__GFP_MOVABLE in gfp_mask, as is usual for migration target allocation, so
ZONE_MOVABLE is also searched by the page allocator. For hugetlb, gfp_mask
is redefined since it has a regular allocation mask filter for migration
targets. __GFP_NOWARN is added to the hugetlb gfp_mask filter since the new
user of the filter, gup, wants to be silent when the allocation fails.

Note that this can be considered a fix for commit 9a4e9f3b2d73
("mm: update get_user_pages_longterm to migrate pages allocated from
CMA region"). However, a "Fixes" tag isn't added here since the old
behaviour was merely suboptimal and didn't cause any actual problem.
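
For reference, the resulting allocation pattern in new_non_cma_page() looks
roughly like the fragment below (illustrative only, not the exact hunk; nid
is the target node as in the existing code):

	/* gfp_mask now keeps __GFP_MOVABLE; the surrounding nocma scope
	 * (memalloc_nocma_save()/restore() in __gup_longterm_locked())
	 * keeps CMA pageblocks out of reach while ZONE_MOVABLE stays usable.
	 */
	gfp_t gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_NOWARN;
	struct page *page = __alloc_pages_node(nid, gfp_mask, 0);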

Suggested-by: Michal Hocko 
Signed-off-by: Joonsoo Kim 
---
 include/linux/hugetlb.h |  2 ++
 mm/gup.c| 17 -
 2 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 6b9508d..2660b04 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -708,6 +708,8 @@ static inline gfp_t htlb_modify_alloc_mask(struct hstate 
*h, gfp_t gfp_mask)
/* Some callers might want to enfoce node */
modified_mask |= (gfp_mask & __GFP_THISNODE);
 
+   modified_mask |= (gfp_mask & __GFP_NOWARN);
+
return modified_mask;
 }
 
diff --git a/mm/gup.c b/mm/gup.c
index a55f1ec..3990ddc 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1601,10 +1601,12 @@ static struct page *new_non_cma_page(struct page *page, 
unsigned long private)
 * Trying to allocate a page for migration. Ignore allocation
 * failure warnings. We don't force __GFP_THISNODE here because
 * this node here is the node where we have CMA reservation and
-* in some case these nodes will have really less non movable
+* in some case these nodes will have really less non CMA
 * allocation memory.
+*
+* Note that CMA region is prohibited by allocation scope.
 */
-   gfp_t gfp_mask = GFP_USER | __GFP_NOWARN;
+   gfp_t gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_NOWARN;
 
if (PageHighMem(page))
gfp_mask |= __GFP_HIGHMEM;
@@ -1612,6 +1614,8 @@ static struct page *new_non_cma_page(struct page *page, 
unsigned long private)
 #ifdef CONFIG_HUGETLB_PAGE
if (PageHuge(page)) {
struct hstate *h = page_hstate(page);
+
+   gfp_mask = htlb_modify_alloc_mask(h, gfp_mask);
/*
 * We don't want to dequeue from the pool because pool pages 
will
 * mostly be from the CMA region.
@@ -1626,11 +1630,6 @@ static struct page *new_non_cma_page(struct page *page, 
unsigned long private)
 */
gfp_t thp_gfpmask = GFP_TRANSHUGE | __GFP_NOWARN;
 
-   /*
-* Remove the movable mask so that we don't allocate from
-* CMA area again.
-*/
-   thp_gfpmask &= ~__GFP_MOVABLE;
thp = __alloc_pages_node(nid, thp_gfpmask, HPAGE_PMD_ORDER);
if (!thp)
return NULL;
@@ -1773,7 +1772,6 @@ static long __gup_longterm_locked(struct mm_struct *mm,
 vmas_tmp, NULL, gup_flags);
 
if (gup_flags & FOLL_LONGTERM) {
-   memalloc_nocma_restore(flags);
if (rc < 0)
goto out;
 
@@ -1786,9 +1784,10 @@ static long __gup_longterm_locked(struct mm_struct *mm,
 
rc = check_and_migrate_cma_pages(mm, start, rc, pages,
 vmas_tmp, gup_flags);
+out:
+   memalloc_nocma_restore(flags);
}
 
-out:
if (vmas_tmp != vmas)
kfree(vmas_tmp);
return rc;
-- 
2.7.4



[PATCH v3 3/3] mm/gup: use a standard migration target allocation callback

2020-07-31 Thread js1304
From: Joonsoo Kim 

There is a well-defined migration target allocation callback. Use it.

Acked-by: Vlastimil Babka 
Acked-by: Michal Hocko 
Signed-off-by: Joonsoo Kim 
---
 mm/gup.c | 54 ++
 1 file changed, 6 insertions(+), 48 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 7b63d72..ae096ea 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1590,52 +1590,6 @@ static bool check_dax_vmas(struct vm_area_struct **vmas, 
long nr_pages)
 }
 
 #ifdef CONFIG_CMA
-static struct page *new_non_cma_page(struct page *page, unsigned long private)
-{
-   /*
-* We want to make sure we allocate the new page from the same node
-* as the source page.
-*/
-   int nid = page_to_nid(page);
-   /*
-* Trying to allocate a page for migration. Ignore allocation
-* failure warnings. We don't force __GFP_THISNODE here because
-* this node here is the node where we have CMA reservation and
-* in some case these nodes will have really less non CMA
-* allocation memory.
-*
-* Note that CMA region is prohibited by allocation scope.
-*/
-   gfp_t gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_NOWARN;
-
-   if (PageHighMem(page))
-   gfp_mask |= __GFP_HIGHMEM;
-
-#ifdef CONFIG_HUGETLB_PAGE
-   if (PageHuge(page)) {
-   struct hstate *h = page_hstate(page);
-
-   gfp_mask = htlb_modify_alloc_mask(h, gfp_mask);
-   return alloc_huge_page_nodemask(h, nid, NULL, gfp_mask);
-   }
-#endif
-   if (PageTransHuge(page)) {
-   struct page *thp;
-   /*
-* ignore allocation failure warnings
-*/
-   gfp_t thp_gfpmask = GFP_TRANSHUGE | __GFP_NOWARN;
-
-   thp = __alloc_pages_node(nid, thp_gfpmask, HPAGE_PMD_ORDER);
-   if (!thp)
-   return NULL;
-   prep_transhuge_page(thp);
-   return thp;
-   }
-
-   return __alloc_pages_node(nid, gfp_mask, 0);
-}
-
 static long check_and_migrate_cma_pages(struct mm_struct *mm,
unsigned long start,
unsigned long nr_pages,
@@ -1649,6 +1603,10 @@ static long check_and_migrate_cma_pages(struct mm_struct 
*mm,
bool migrate_allow = true;
LIST_HEAD(cma_page_list);
long ret = nr_pages;
+   struct migration_target_control mtc = {
+   .nid = NUMA_NO_NODE,
+   .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_NOWARN,
+   };
 
 check_again:
for (i = 0; i < nr_pages;) {
@@ -1694,8 +1652,8 @@ static long check_and_migrate_cma_pages(struct mm_struct 
*mm,
for (i = 0; i < nr_pages; i++)
put_page(pages[i]);
 
-   if (migrate_pages(&cma_page_list, new_non_cma_page,
- NULL, 0, MIGRATE_SYNC, MR_CONTIG_RANGE)) {
+   if (migrate_pages(&cma_page_list, alloc_migration_target, NULL,
+   (unsigned long)&mtc, MIGRATE_SYNC, MR_CONTIG_RANGE)) {
/*
 * some of the pages failed migration. Do get_user_pages
 * without migration.
-- 
2.7.4



[PATCH v3 2/3] mm/hugetlb: make hugetlb migration callback CMA aware

2020-07-31 Thread js1304
From: Joonsoo Kim 

new_non_cma_page() in gup.c needs to allocate a new page that is not on
the CMA area. new_non_cma_page() implements this by using the allocation
scope APIs.

However, there is a work-around for hugetlb. The normal hugetlb page
allocation API for migration is alloc_huge_page_nodemask(). It consists
of two steps: first, dequeue a page from the pool; second, if there is no
available page on the queue, allocate one with the page allocator.

new_non_cma_page() can't use this API since the first step (dequeue) isn't
aware of the scope API that excludes the CMA area. So, new_non_cma_page()
exports the hugetlb internal function for the second step,
alloc_migrate_huge_page(), to global scope and uses it directly. This is
suboptimal since hugetlb pages on the queue cannot be utilized.

This patch tries to fix this situation by making the hugetlb dequeue
function CMA aware. In the dequeue function, CMA memory is skipped if the
PF_MEMALLOC_NOCMA flag is set.

Acked-by: Mike Kravetz 
Acked-by: Vlastimil Babka 
Acked-by: Michal Hocko 
Signed-off-by: Joonsoo Kim 
---
 include/linux/hugetlb.h |  2 --
 mm/gup.c|  6 +-
 mm/hugetlb.c| 11 +--
 3 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 2660b04..fb2b5aa 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -509,8 +509,6 @@ struct page *alloc_huge_page_nodemask(struct hstate *h, int 
preferred_nid,
nodemask_t *nmask, gfp_t gfp_mask);
 struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma,
unsigned long address);
-struct page *alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask,
-int nid, nodemask_t *nmask);
 int huge_add_to_page_cache(struct page *page, struct address_space *mapping,
pgoff_t idx);
 
diff --git a/mm/gup.c b/mm/gup.c
index 3990ddc..7b63d72 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1616,11 +1616,7 @@ static struct page *new_non_cma_page(struct page *page, 
unsigned long private)
struct hstate *h = page_hstate(page);
 
gfp_mask = htlb_modify_alloc_mask(h, gfp_mask);
-   /*
-* We don't want to dequeue from the pool because pool pages 
will
-* mostly be from the CMA region.
-*/
-   return alloc_migrate_huge_page(h, gfp_mask, nid, NULL);
+   return alloc_huge_page_nodemask(h, nid, NULL, gfp_mask);
}
 #endif
if (PageTransHuge(page)) {
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 4645f14..d1706b7 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -30,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1041,10 +1042,16 @@ static void enqueue_huge_page(struct hstate *h, struct 
page *page)
 static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
 {
struct page *page;
+   bool nocma = !!(current->flags & PF_MEMALLOC_NOCMA);
+
+   list_for_each_entry(page, &h->hugepage_freelists[nid], lru) {
+   if (nocma && is_migrate_cma_page(page))
+   continue;
 
-   list_for_each_entry(page, &h->hugepage_freelists[nid], lru)
if (!PageHWPoison(page))
break;
+   }
+
/*
 * if 'non-isolated free hugepage' not found on the list,
 * the allocation fails.
@@ -1973,7 +1980,7 @@ static struct page *alloc_surplus_huge_page(struct hstate 
*h, gfp_t gfp_mask,
return page;
 }
 
-struct page *alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask,
+static struct page *alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask,
 int nid, nodemask_t *nmask)
 {
struct page *page;
-- 
2.7.4



[PATCH v7 6/6] mm/vmscan: restore active/inactive ratio for anonymous LRU

2020-07-23 Thread js1304
From: Joonsoo Kim 

Now that workingset detection is implemented for the anonymous LRU, we no
longer need a large inactive list to detect frequently accessed pages
before they are reclaimed. This effectively reverts the temporary measure
put in by the commit "mm/vmscan: make active/inactive ratio as 1:1 for
anon lru".

Acked-by: Johannes Weiner 
Acked-by: Vlastimil Babka 
Signed-off-by: Joonsoo Kim 
---
 mm/vmscan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 9d4e28c..b0de23d 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2207,7 +2207,7 @@ static bool inactive_is_low(struct lruvec *lruvec, enum 
lru_list inactive_lru)
active = lruvec_page_state(lruvec, NR_LRU_BASE + active_lru);
 
gb = (inactive + active) >> (30 - PAGE_SHIFT);
-   if (gb && is_file_lru(inactive_lru))
+   if (gb)
inactive_ratio = int_sqrt(10 * gb);
else
inactive_ratio = 1;
-- 
2.7.4



[PATCH v7 1/6] mm/vmscan: make active/inactive ratio as 1:1 for anon lru

2020-07-23 Thread js1304
From: Joonsoo Kim 

The current implementation of LRU management for anonymous pages has some
problems. The most important one is that it doesn't protect the workingset,
that is, the pages on the active LRU list. Although this problem will be
fixed in the following patchset, some preparation is required and this
patch does it.

What the following patchset does is implement workingset protection. After
it, newly created or swapped-in pages will start their lifetime on the
inactive list. If the inactive list is too small, there is not enough
chance for them to be referenced and such a page cannot become part of the
workingset.

In order to give newly created or swapped-in anonymous pages enough chance
to be referenced again, this patch makes the active/inactive LRU ratio 1:1.

This is just a temporary measure. A later patch in the series introduces
workingset detection for the anonymous LRU that will be used to better
decide whether pages should start on the active or the inactive list.
Afterwards this patch is effectively reverted.

Acked-by: Johannes Weiner 
Acked-by: Vlastimil Babka 
Signed-off-by: Joonsoo Kim 
---
 mm/vmscan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 6acc956..d5a19c7 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2208,7 +2208,7 @@ static bool inactive_is_low(struct lruvec *lruvec, enum 
lru_list inactive_lru)
active = lruvec_page_state(lruvec, NR_LRU_BASE + active_lru);
 
gb = (inactive + active) >> (30 - PAGE_SHIFT);
-   if (gb)
+   if (gb && is_file_lru(inactive_lru))
inactive_ratio = int_sqrt(10 * gb);
else
inactive_ratio = 1;
-- 
2.7.4



[PATCH v7 4/6] mm/swapcache: support to handle the shadow entries

2020-07-23 Thread js1304
From: Joonsoo Kim 

Workingset detection for anonymous pages will be implemented in the
following patch, and it requires storing shadow entries in the swap cache.
This patch implements the infrastructure to store a shadow entry in the
swap cache.
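
As background, a shadow is not a page pointer but an XArray value entry
stored in the evicted page's slot. A minimal, self-contained sketch of that
idea (swap_cache_slot_example() and eviction_cookie are hypothetical names,
not part of this patch, which works on the swap cache's own xarray):

	#include <linux/xarray.h>

	/* Return the shadow left behind by a previous eviction, if any,
	 * otherwise record a new one in the slot.
	 */
	static void *swap_cache_slot_example(struct xarray *pages, pgoff_t idx,
					     unsigned long eviction_cookie)
	{
		void *old = xa_load(pages, idx);

		if (xa_is_value(old))	/* shadow entry, not a struct page */
			return old;
		xa_store(pages, idx, xa_mk_value(eviction_cookie), GFP_KERNEL);
		return NULL;
	}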

Acked-by: Johannes Weiner 
Signed-off-by: Joonsoo Kim 
---
 include/linux/swap.h | 17 
 mm/shmem.c   |  3 ++-
 mm/swap_state.c  | 57 ++--
 mm/swapfile.c|  2 ++
 mm/vmscan.c  |  2 +-
 5 files changed, 69 insertions(+), 12 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 51ec9cd..8a4c592 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -414,9 +414,13 @@ extern struct address_space *swapper_spaces[];
 extern unsigned long total_swapcache_pages(void);
 extern void show_swap_cache_info(void);
 extern int add_to_swap(struct page *page);
-extern int add_to_swap_cache(struct page *, swp_entry_t, gfp_t);
-extern void __delete_from_swap_cache(struct page *, swp_entry_t entry);
+extern int add_to_swap_cache(struct page *page, swp_entry_t entry,
+   gfp_t gfp, void **shadowp);
+extern void __delete_from_swap_cache(struct page *page,
+   swp_entry_t entry, void *shadow);
 extern void delete_from_swap_cache(struct page *);
+extern void clear_shadow_from_swap_cache(int type, unsigned long begin,
+   unsigned long end);
 extern void free_page_and_swap_cache(struct page *);
 extern void free_pages_and_swap_cache(struct page **, int);
 extern struct page *lookup_swap_cache(swp_entry_t entry,
@@ -570,13 +574,13 @@ static inline int add_to_swap(struct page *page)
 }
 
 static inline int add_to_swap_cache(struct page *page, swp_entry_t entry,
-   gfp_t gfp_mask)
+   gfp_t gfp_mask, void **shadowp)
 {
return -1;
 }
 
 static inline void __delete_from_swap_cache(struct page *page,
-   swp_entry_t entry)
+   swp_entry_t entry, void *shadow)
 {
 }
 
@@ -584,6 +588,11 @@ static inline void delete_from_swap_cache(struct page 
*page)
 {
 }
 
+static inline void clear_shadow_from_swap_cache(int type, unsigned long begin,
+   unsigned long end)
+{
+}
+
 static inline int page_swapcount(struct page *page)
 {
return 0;
diff --git a/mm/shmem.c b/mm/shmem.c
index 89b357a..85ed46f 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1434,7 +1434,8 @@ static int shmem_writepage(struct page *page, struct 
writeback_control *wbc)
list_add(>swaplist, _swaplist);
 
if (add_to_swap_cache(page, swap,
-   __GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN) == 0) {
+   __GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN,
+   NULL) == 0) {
spin_lock_irq(&info->lock);
shmem_recalc_inode(inode);
info->swapped++;
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 66e750f..13d8d66 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -110,12 +110,14 @@ void show_swap_cache_info(void)
  * add_to_swap_cache resembles add_to_page_cache_locked on swapper_space,
  * but sets SwapCache flag and private instead of mapping and index.
  */
-int add_to_swap_cache(struct page *page, swp_entry_t entry, gfp_t gfp)
+int add_to_swap_cache(struct page *page, swp_entry_t entry,
+   gfp_t gfp, void **shadowp)
 {
struct address_space *address_space = swap_address_space(entry);
pgoff_t idx = swp_offset(entry);
XA_STATE_ORDER(xas, &address_space->i_pages, idx, compound_order(page));
unsigned long i, nr = thp_nr_pages(page);
+   void *old;
 
VM_BUG_ON_PAGE(!PageLocked(page), page);
VM_BUG_ON_PAGE(PageSwapCache(page), page);
@@ -125,16 +127,25 @@ int add_to_swap_cache(struct page *page, swp_entry_t 
entry, gfp_t gfp)
SetPageSwapCache(page);
 
do {
+   unsigned long nr_shadows = 0;
+
xas_lock_irq(&xas);
xas_create_range(&xas);
if (xas_error(&xas))
goto unlock;
for (i = 0; i < nr; i++) {
VM_BUG_ON_PAGE(xas.xa_index != idx + i, page);
+   old = xas_load(&xas);
+   if (xa_is_value(old)) {
+   nr_shadows++;
+   if (shadowp)
+   *shadowp = old;
+   }
set_page_private(page + i, entry.val + i);
xas_store(&xas, page);
xas_next(&xas);
}
+   address_space->nrexceptional -= nr_shadows;
address_space->nrpages += nr;
__mod_node_page_state(page_pgdat(page), 

[PATCH v7 3/6] mm/workingset: prepare the workingset detection infrastructure for anon LRU

2020-07-23 Thread js1304
From: Joonsoo Kim 

To prepare workingset detection for the anon LRU, this patch splits the
workingset event counters for refault, activate and restore into anon
and file variants, as well as the refaults counter in struct lruvec.

Acked-by: Johannes Weiner 
Acked-by: Vlastimil Babka 
Signed-off-by: Joonsoo Kim 
---
 include/linux/mmzone.h | 16 +++-
 mm/memcontrol.c| 16 +++-
 mm/vmscan.c| 15 ++-
 mm/vmstat.c|  9 ++---
 mm/workingset.c|  8 +---
 5 files changed, 43 insertions(+), 21 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 635a96c..efbd95d 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -173,9 +173,15 @@ enum node_stat_item {
NR_ISOLATED_ANON,   /* Temporary isolated pages from anon lru */
NR_ISOLATED_FILE,   /* Temporary isolated pages from file lru */
WORKINGSET_NODES,
-   WORKINGSET_REFAULT,
-   WORKINGSET_ACTIVATE,
-   WORKINGSET_RESTORE,
+   WORKINGSET_REFAULT_BASE,
+   WORKINGSET_REFAULT_ANON = WORKINGSET_REFAULT_BASE,
+   WORKINGSET_REFAULT_FILE,
+   WORKINGSET_ACTIVATE_BASE,
+   WORKINGSET_ACTIVATE_ANON = WORKINGSET_ACTIVATE_BASE,
+   WORKINGSET_ACTIVATE_FILE,
+   WORKINGSET_RESTORE_BASE,
+   WORKINGSET_RESTORE_ANON = WORKINGSET_RESTORE_BASE,
+   WORKINGSET_RESTORE_FILE,
WORKINGSET_NODERECLAIM,
NR_ANON_MAPPED, /* Mapped anonymous pages */
NR_FILE_MAPPED, /* pagecache pages mapped into pagetables.
@@ -277,8 +283,8 @@ struct lruvec {
unsigned long   file_cost;
/* Non-resident age, driven by LRU movement */
atomic_long_t   nonresident_age;
-   /* Refaults at the time of last reclaim cycle */
-   unsigned long   refaults;
+   /* Refaults at the time of last reclaim cycle, anon=0, file=1 */
+   unsigned long   refaults[2];
/* Various lruvec state flags (enum lruvec_flags) */
unsigned long   flags;
 #ifdef CONFIG_MEMCG
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 14dd98d..e84c2b5 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1530,12 +1530,18 @@ static char *memory_stat_format(struct mem_cgroup 
*memcg)
seq_buf_printf(, "%s %lu\n", vm_event_name(PGMAJFAULT),
   memcg_events(memcg, PGMAJFAULT));
 
-   seq_buf_printf(, "workingset_refault %lu\n",
-  memcg_page_state(memcg, WORKINGSET_REFAULT));
-   seq_buf_printf(, "workingset_activate %lu\n",
-  memcg_page_state(memcg, WORKINGSET_ACTIVATE));
+   seq_buf_printf(, "workingset_refault_anon %lu\n",
+  memcg_page_state(memcg, WORKINGSET_REFAULT_ANON));
+   seq_buf_printf(, "workingset_refault_file %lu\n",
+  memcg_page_state(memcg, WORKINGSET_REFAULT_FILE));
+   seq_buf_printf(, "workingset_activate_anon %lu\n",
+  memcg_page_state(memcg, WORKINGSET_ACTIVATE_ANON));
+   seq_buf_printf(, "workingset_activate_file %lu\n",
+  memcg_page_state(memcg, WORKINGSET_ACTIVATE_FILE));
seq_buf_printf(, "workingset_restore %lu\n",
-  memcg_page_state(memcg, WORKINGSET_RESTORE));
+  memcg_page_state(memcg, WORKINGSET_RESTORE_ANON));
+   seq_buf_printf(, "workingset_restore %lu\n",
+  memcg_page_state(memcg, WORKINGSET_RESTORE_FILE));
seq_buf_printf(, "workingset_nodereclaim %lu\n",
   memcg_page_state(memcg, WORKINGSET_NODERECLAIM));
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 9406948..6dda5b2 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2683,7 +2683,10 @@ static void shrink_node(pg_data_t *pgdat, struct 
scan_control *sc)
if (!sc->force_deactivate) {
unsigned long refaults;
 
-   if (inactive_is_low(target_lruvec, LRU_INACTIVE_ANON))
+   refaults = lruvec_page_state(target_lruvec,
+   WORKINGSET_ACTIVATE_ANON);
+   if (refaults != target_lruvec->refaults[0] ||
+   inactive_is_low(target_lruvec, LRU_INACTIVE_ANON))
sc->may_deactivate |= DEACTIVATE_ANON;
else
sc->may_deactivate &= ~DEACTIVATE_ANON;
@@ -2694,8 +2697,8 @@ static void shrink_node(pg_data_t *pgdat, struct 
scan_control *sc)
 * rid of any stale active pages quickly.
 */
refaults = lruvec_page_state(target_lruvec,
-WORKINGSET_ACTIVATE);
-   if (refaults != target_lruvec->refaults ||
+   WORKINGSET_ACTIVATE_FILE);
+   if (refaults != target_lruvec->refaults[1] ||
inactive_is_low(target_lruvec, LRU_INACTIVE_FILE))
 

[PATCH v7 0/6] workingset protection/detection on the anonymous LRU list

2020-07-23 Thread js1304
From: Joonsoo Kim 

Hello,

This patchset implements workingset protection and detection on
the anonymous LRU list.

* Changes on v7
- fix a bug on clear_shadow_from_swap_cache()
- enhance the commit description
- fix workingset detection formula

* Changes on v6
- rework to reflect a new LRU balance model
- remove memcg charge timing stuff on v5 since alternative is already
merged on mainline
- remove readahead stuff on v5 (reason is the same with above)
- clear shadow entry if corresponding swap entry is deleted
(mm/swapcache: support to handle the exceptional entries in swapcache)
- change experiment environment
(from ssd swap to ram swap, for fast evaluation and for reducing side-effect of 
I/O)
- update performance number

* Changes on v5
- change memcg charge timing for the swapped-in page (fault -> swap-in)
- avoid readahead if previous owner of the swapped-out page isn't me
- use another lruvec to update the reclaim_stat for a new anonymous page
- add two more cases to fix up the reclaim_stat

* Changes on v4
- In the patch "mm/swapcache: support to handle the exceptional
entries in swapcache":
-- replace the word "value" with "exceptional entries"
-- add to handle the shadow entry in add_to_swap_cache()
-- support the huge page
-- remove the registration code for shadow shrinker

- remove the patch "mm/workingset: use the node counter
if memcg is the root memcg" since workingset detection for
anonymous page doesn't use shadow shrinker now
- minor style fixes

* Changes on v3
- rework the patch, "mm/vmscan: protect the workingset on anonymous LRU"
(use almost same reference tracking algorithm to the one for the file
mapped page)

* Changes on v2
- fix a critical bug that uses out of index lru list in
workingset_refault()
- fix a bug that reuses the rotate value for previous page

* SUBJECT
workingset protection

* PROBLEM
In the current implementation, newly created or swapped-in anonymous pages
start on the active list. Growing the active list results in rebalancing
the active/inactive lists, so old pages on the active list are demoted to
the inactive list. Hence, hot pages on the active list aren't protected at all.

Following is an example of this situation.

Assume that there are 50 hot pages on the active list and the system can
hold 100 pages in total. Numbers denote the number of pages on the
active/inactive lists (active | inactive). (h) stands for hot pages and
(uo) stands for used-once pages.

1. 50 hot pages on active list
50(h) | 0

2. workload: 50 newly created (used-once) pages
50(uo) | 50(h)

3. workload: another 50 newly created (used-once) pages
50(uo) | 50(uo), swap-out 50(h)

As we can see, hot pages are swapped out, which will cause swap-ins later.

* SOLUTION
Since this is what we want to avoid, this patchset implements workingset
protection. As with the file LRU list, newly created or swapped-in
anonymous pages start on the inactive list. Also, as with the file LRU
list, a page is promoted if it is referenced enough. This simple
modification changes the above example as follows.

1. 50 hot pages on active list
50(h) | 0

2. workload: 50 newly created (used-once) pages
50(h) | 50(uo)

3. workload: another 50 newly created (used-once) pages
50(h) | 50(uo), swap-out 50(uo)

hot pages remain on the active list. :)

* EXPERIMENT
I tested this scenario on my test bed and confirmed that this problem
happens on current implementation. I also checked that it is fixed by
this patchset.


* SUBJECT
workingset detection

* PROBLEM
The later part of the patchset implements workingset detection for the
anonymous LRU list. There is a corner case where workingset protection
could cause thrashing. If we can avoid that thrashing through workingset
detection, we get better performance.

Following is an example of thrashing due to the workingset protection.

1. 50 hot pages on active list
50(h) | 0

2. workload: 50 newly created (will be hot) pages
50(h) | 50(wh)

3. workload: another 50 newly created (used-once) pages
50(h) | 50(uo), swap-out 50(wh)

4. workload: 50 (will be hot) pages
50(h) | 50(wh), swap-in 50(wh)

5. workload: another 50 newly created (used-once) pages
50(h) | 50(uo), swap-out 50(wh)

6. repeat 4, 5

Without workingset detection, this kind of workload cannot be promoted
and thrashing happens forever.

* SOLUTION
Therefore, this patchset implements workingset detection.
All the infrastructure for workingset detection is already implemented,
so there is not much work to do. First, extend the workingset detection
code to deal with the anonymous LRU list. Then, make the swap cache handle
the exceptional (shadow) entries. Lastly, install/retrieve the shadow
value into/from the swap cache and check the refault distance.

* EXPERIMENT
I made a test program to imitate the above scenario and confirmed that
the problem exists. Then, I checked that this patchset fixes it.

My test setup is a virtual machine with 8 cpus and 6100MB memory. But,
the amount of the memory that the test program can use is about 280 MB.

[PATCH v7 5/6] mm/swap: implement workingset detection for anonymous LRU

2020-07-23 Thread js1304
From: Joonsoo Kim 

This patch implements workingset detection for the anonymous LRU.
All the infrastructure is implemented by the previous patches, so this
patch just activates workingset detection by installing/retrieving
the shadow entry and adding the refault calculation.

Acked-by: Johannes Weiner 
Acked-by: Vlastimil Babka 
Signed-off-by: Joonsoo Kim 
---
 include/linux/swap.h |  6 ++
 mm/memory.c  | 11 ---
 mm/swap_state.c  | 23 ++-
 mm/vmscan.c  |  7 ---
 mm/workingset.c  | 15 +++
 5 files changed, 43 insertions(+), 19 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 8a4c592..6610469 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -414,6 +414,7 @@ extern struct address_space *swapper_spaces[];
 extern unsigned long total_swapcache_pages(void);
 extern void show_swap_cache_info(void);
 extern int add_to_swap(struct page *page);
+extern void *get_shadow_from_swap_cache(swp_entry_t entry);
 extern int add_to_swap_cache(struct page *page, swp_entry_t entry,
gfp_t gfp, void **shadowp);
 extern void __delete_from_swap_cache(struct page *page,
@@ -573,6 +574,11 @@ static inline int add_to_swap(struct page *page)
return 0;
 }
 
+static inline void *get_shadow_from_swap_cache(swp_entry_t entry)
+{
+   return NULL;
+}
+
 static inline int add_to_swap_cache(struct page *page, swp_entry_t entry,
gfp_t gfp_mask, void **shadowp)
 {
diff --git a/mm/memory.c b/mm/memory.c
index 25769b6..4934dbc 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3100,6 +3100,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
int locked;
int exclusive = 0;
vm_fault_t ret = 0;
+   void *shadow = NULL;
 
if (!pte_unmap_same(vma->vm_mm, vmf->pmd, vmf->pte, vmf->orig_pte))
goto out;
@@ -3151,13 +3152,9 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
goto out_page;
}
 
-   /*
-* XXX: Move to lru_cache_add() when it
-* supports new vs putback
-*/
-   spin_lock_irq(&page_pgdat(page)->lru_lock);
-   lru_note_cost_page(page);
-   spin_unlock_irq(&page_pgdat(page)->lru_lock);
+   shadow = get_shadow_from_swap_cache(entry);
+   if (shadow)
+   workingset_refault(page, shadow);
 
lru_cache_add(page);
swap_readpage(page, true);
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 13d8d66..146a86d 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -106,6 +106,20 @@ void show_swap_cache_info(void)
printk("Total swap = %lukB\n", total_swap_pages << (PAGE_SHIFT - 10));
 }
 
+void *get_shadow_from_swap_cache(swp_entry_t entry)
+{
+   struct address_space *address_space = swap_address_space(entry);
+   pgoff_t idx = swp_offset(entry);
+   struct page *page;
+
+   page = find_get_entry(address_space, idx);
+   if (xa_is_value(page))
+   return page;
+   if (page)
+   put_page(page);
+   return NULL;
+}
+
 /*
  * add_to_swap_cache resembles add_to_page_cache_locked on swapper_space,
  * but sets SwapCache flag and private instead of mapping and index.
@@ -406,6 +420,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, 
gfp_t gfp_mask,
 {
struct swap_info_struct *si;
struct page *page;
+   void *shadow = NULL;
 
*new_page_allocated = false;
 
@@ -474,7 +489,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, 
gfp_t gfp_mask,
__SetPageSwapBacked(page);
 
/* May fail (-ENOMEM) if XArray node allocation failed. */
-   if (add_to_swap_cache(page, entry, gfp_mask & GFP_RECLAIM_MASK, NULL)) {
+   if (add_to_swap_cache(page, entry, gfp_mask & GFP_RECLAIM_MASK, &shadow)) {
put_swap_page(page, entry);
goto fail_unlock;
}
@@ -484,10 +499,8 @@ struct page *__read_swap_cache_async(swp_entry_t entry, 
gfp_t gfp_mask,
goto fail_unlock;
}
 
-   /* XXX: Move to lru_cache_add() when it supports new vs putback */
-   spin_lock_irq(&page_pgdat(page)->lru_lock);
-   lru_note_cost_page(page);
-   spin_unlock_irq(&page_pgdat(page)->lru_lock);
+   if (shadow)
+   workingset_refault(page, shadow);
 
/* Caller will initiate read into locked page */
SetPageWorkingset(page);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index b9b543e..9d4e28c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -854,6 +854,7 @@ static int __remove_mapping(struct address_space *mapping, 
struct page *page,
 {
unsigned long flags;
 

[PATCH v7 2/6] mm/vmscan: protect the workingset on anonymous LRU

2020-07-23 Thread js1304
From: Joonsoo Kim 

In the current implementation, newly created or swapped-in anonymous
pages start on the active list. A growing active list results in
rebalancing the active/inactive lists, so old pages on the active list
are demoted to the inactive list. Hence, pages on the active list aren't
protected at all.
Following is an example of this situation.

Assume that there are 50 hot pages on the active list. Numbers denote the
number of pages on the active/inactive lists (active | inactive).

1. 50 hot pages on active list
50(h) | 0

2. workload: 50 newly created (used-once) pages
50(uo) | 50(h)

3. workload: another 50 newly created (used-once) pages
50(uo) | 50(uo), swap-out 50(h)

This patch tries to fix this issue.
As with the file LRU, newly created or swapped-in anonymous pages are
inserted on the inactive list. They are promoted to the active list if
they are referenced enough. This simple modification changes the above
example as follows.

1. 50 hot pages on active list
50(h) | 0

2. workload: 50 newly created (used-once) pages
50(h) | 50(uo)

3. workload: another 50 newly created (used-once) pages
50(h) | 50(uo), swap-out 50(uo)

As you can see, hot pages on the active list are now protected.

Note that this implementation has a drawback: a page cannot be promoted
and will be swapped out if its re-access interval is greater than the size
of the inactive list but less than the size of the total (active + inactive)
list. To solve this potential issue, a following patch applies workingset
detection similar to the one already applied to the file LRU.

Acked-by: Johannes Weiner 
Acked-by: Vlastimil Babka 
Signed-off-by: Joonsoo Kim 
---
 include/linux/swap.h|  2 +-
 kernel/events/uprobes.c |  2 +-
 mm/huge_memory.c|  2 +-
 mm/khugepaged.c |  2 +-
 mm/memory.c |  9 -
 mm/migrate.c|  2 +-
 mm/swap.c   | 13 +++--
 mm/swapfile.c   |  2 +-
 mm/userfaultfd.c|  2 +-
 mm/vmscan.c |  4 +---
 10 files changed, 19 insertions(+), 21 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 7eb59bc..51ec9cd 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -352,7 +352,7 @@ extern void deactivate_page(struct page *page);
 extern void mark_page_lazyfree(struct page *page);
 extern void swap_setup(void);
 
-extern void lru_cache_add_active_or_unevictable(struct page *page,
+extern void lru_cache_add_inactive_or_unevictable(struct page *page,
struct vm_area_struct *vma);
 
 /* linux/mm/vmscan.c */
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index f500204..02791f8 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -184,7 +184,7 @@ static int __replace_page(struct vm_area_struct *vma, 
unsigned long addr,
if (new_page) {
get_page(new_page);
page_add_new_anon_rmap(new_page, vma, addr, false);
-   lru_cache_add_active_or_unevictable(new_page, vma);
+   lru_cache_add_inactive_or_unevictable(new_page, vma);
} else
/* no new page, just dec_mm_counter for old_page */
dec_mm_counter(mm, MM_ANONPAGES);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 15c9690..2068518 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -619,7 +619,7 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct 
vm_fault *vmf,
entry = mk_huge_pmd(page, vma->vm_page_prot);
entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
page_add_new_anon_rmap(page, vma, haddr, true);
-   lru_cache_add_active_or_unevictable(page, vma);
+   lru_cache_add_inactive_or_unevictable(page, vma);
pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, pgtable);
set_pmd_at(vma->vm_mm, haddr, vmf->pmd, entry);
update_mmu_cache_pmd(vma, vmf->address, vmf->pmd);
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index b043c40..02fb51f 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1173,7 +1173,7 @@ static void collapse_huge_page(struct mm_struct *mm,
spin_lock(pmd_ptl);
BUG_ON(!pmd_none(*pmd));
page_add_new_anon_rmap(new_page, vma, address, true);
-   lru_cache_add_active_or_unevictable(new_page, vma);
+   lru_cache_add_inactive_or_unevictable(new_page, vma);
pgtable_trans_huge_deposit(mm, pmd, pgtable);
set_pmd_at(mm, address, pmd, _pmd);
update_mmu_cache_pmd(vma, address, pmd);
diff --git a/mm/memory.c b/mm/memory.c
index 45e1dc0..25769b6 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2717,7 +2717,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
 */
ptep_clear_flush_notify(vma, vmf->address, vmf->pte);
page_add_new_anon_rmap(new_page, vma, vmf->address, false);
-   lru_cache_add_active_or_unevictable(new_page, vma);
+   

[PATCH v2] mm/page_alloc: fix memalloc_nocma_{save/restore} APIs

2020-07-22 Thread js1304
From: Joonsoo Kim 

Currently, the memalloc_nocma_{save/restore} API that prevents CMA areas
from being used in page allocation is implemented by using
current_gfp_context(). However, there are two problems with this
implementation.

First, it doesn't work for the allocation fastpath. In the fastpath, the
original gfp_mask is used, since current_gfp_context() was introduced to
control reclaim and is only applied on the slowpath. So, the CMA area can
be allocated through the allocation fastpath even if the
memalloc_nocma_{save/restore} APIs are used. Currently, there is just one
user of these APIs, and it has a fallback method to prevent an actual
problem.
Second, clearing __GFP_MOVABLE in current_gfp_context() has the side
effect of excluding ZONE_MOVABLE memory as an allocation target.

To fix these problems, this patch changes the implementation to exclude
the CMA area via alloc_flags. alloc_flags is mainly used to control the
allocation, so it is a good fit for excluding the CMA area.
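
For context, a rough sketch of the relevant call flow in
__alloc_pages_nodemask() (heavily simplified and with argument lists
trimmed, not the actual code) shows why a filter applied only via
current_gfp_context() misses the fastpath:

	static struct page *alloc_pages_sketch(gfp_t gfp_mask, unsigned int order,
					       struct alloc_context *ac)
	{
		unsigned int alloc_flags = ALLOC_WMARK_LOW;
		struct page *page;

		/* fastpath: the caller's gfp_mask is used as-is; with this
		 * patch, ALLOC_CMA in alloc_flags decides whether CMA is usable
		 */
		page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);
		if (page)
			return page;

		/* slowpath: only here was current_gfp_context() applied before,
		 * so the old PF_MEMALLOC_NOCMA -> ~__GFP_MOVABLE filtering was
		 * invisible to the fastpath above
		 */
		return __alloc_pages_slowpath(current_gfp_context(gfp_mask),
					      order, ac);
	}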

Fixes: d7fefcc8de91 (mm/cma: add PF flag to force non cma alloc)
Cc: 
Reviewed-by: Vlastimil Babka 
Signed-off-by: Joonsoo Kim 
---
 include/linux/sched/mm.h |  8 +---
 mm/page_alloc.c  | 31 +--
 2 files changed, 22 insertions(+), 17 deletions(-)

diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 480a4d1..17e0c31 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -177,12 +177,10 @@ static inline bool in_vfork(struct task_struct *tsk)
  * Applies per-task gfp context to the given allocation flags.
  * PF_MEMALLOC_NOIO implies GFP_NOIO
  * PF_MEMALLOC_NOFS implies GFP_NOFS
- * PF_MEMALLOC_NOCMA implies no allocation from CMA region.
  */
 static inline gfp_t current_gfp_context(gfp_t flags)
 {
-   if (unlikely(current->flags &
-(PF_MEMALLOC_NOIO | PF_MEMALLOC_NOFS | 
PF_MEMALLOC_NOCMA))) {
+   if (unlikely(current->flags & (PF_MEMALLOC_NOIO | PF_MEMALLOC_NOFS))) {
/*
 * NOIO implies both NOIO and NOFS and it is a weaker context
 * so always make sure it makes precedence
@@ -191,10 +189,6 @@ static inline gfp_t current_gfp_context(gfp_t flags)
flags &= ~(__GFP_IO | __GFP_FS);
else if (current->flags & PF_MEMALLOC_NOFS)
flags &= ~__GFP_FS;
-#ifdef CONFIG_CMA
-   if (current->flags & PF_MEMALLOC_NOCMA)
-   flags &= ~__GFP_MOVABLE;
-#endif
}
return flags;
 }
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e028b87c..7336e94 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2790,7 +2790,7 @@ __rmqueue(struct zone *zone, unsigned int order, int 
migratetype,
 * allocating from CMA when over half of the zone's free memory
 * is in the CMA area.
 */
-   if (migratetype == MIGRATE_MOVABLE &&
+   if (alloc_flags & ALLOC_CMA &&
zone_page_state(zone, NR_FREE_CMA_PAGES) >
zone_page_state(zone, NR_FREE_PAGES) / 2) {
page = __rmqueue_cma_fallback(zone, order);
@@ -2801,7 +2801,7 @@ __rmqueue(struct zone *zone, unsigned int order, int 
migratetype,
 retry:
page = __rmqueue_smallest(zone, order, migratetype);
if (unlikely(!page)) {
-   if (migratetype == MIGRATE_MOVABLE)
+   if (alloc_flags & ALLOC_CMA)
page = __rmqueue_cma_fallback(zone, order);
 
if (!page && __rmqueue_fallback(zone, order, migratetype,
@@ -3671,6 +3671,20 @@ alloc_flags_nofragment(struct zone *zone, gfp_t gfp_mask)
return alloc_flags;
 }
 
+static inline unsigned int current_alloc_flags(gfp_t gfp_mask,
+   unsigned int alloc_flags)
+{
+#ifdef CONFIG_CMA
+   unsigned int pflags = current->flags;
+
+   if (!(pflags & PF_MEMALLOC_NOCMA) &&
+   gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE)
+   alloc_flags |= ALLOC_CMA;
+
+#endif
+   return alloc_flags;
+}
+
 /*
  * get_page_from_freelist goes through the zonelist trying to allocate
  * a page.
@@ -4316,10 +4330,8 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
} else if (unlikely(rt_task(current)) && !in_interrupt())
alloc_flags |= ALLOC_HARDER;
 
-#ifdef CONFIG_CMA
-   if (gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE)
-   alloc_flags |= ALLOC_CMA;
-#endif
+   alloc_flags = current_alloc_flags(gfp_mask, alloc_flags);
+
return alloc_flags;
 }
 
@@ -4620,7 +4632,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 
reserve_flags = __gfp_pfmemalloc_flags(gfp_mask);
if (reserve_flags)
-   alloc_flags = reserve_flags;
+   alloc_flags = current_alloc_flags(gfp_mask, reserve_flags);
 
/*
 * Reset the nodemask and zonelist iterators if memory policies can be
@@ -4697,7 +4709,7 @@ __alloc_pages_slowpath(gfp_t 

[PATCH] mm/page_alloc: fix memalloc_nocma_{save/restore} APIs

2020-07-20 Thread js1304
From: Joonsoo Kim 

Currently, the memalloc_nocma_{save/restore} API that prevents CMA areas
from being used in page allocation is implemented by using
current_gfp_context(). However, there are two problems with this
implementation.

First, it doesn't work for the allocation fastpath. In the fastpath, the
original gfp_mask is used, since current_gfp_context() was introduced to
control reclaim and is only applied on the slowpath. So, the CMA area can
be allocated through the allocation fastpath even if the
memalloc_nocma_{save/restore} APIs are used. Currently, there is just one
user of these APIs, and it has a fallback method to prevent an actual
problem.
Second, clearing __GFP_MOVABLE in current_gfp_context() has the side
effect of excluding ZONE_MOVABLE memory as an allocation target.

To fix these problems, this patch changes the implementation to exclude
the CMA area via alloc_flags. alloc_flags is mainly used to control the
allocation, so it is a good fit for excluding the CMA area.

Fixes: d7fefcc8de91 (mm/cma: add PF flag to force non cma alloc)
Cc: 
Signed-off-by: Joonsoo Kim 
---
 include/linux/sched/mm.h |  8 +---
 mm/page_alloc.c  | 33 +++--
 2 files changed, 24 insertions(+), 17 deletions(-)

diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 480a4d1..17e0c31 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -177,12 +177,10 @@ static inline bool in_vfork(struct task_struct *tsk)
  * Applies per-task gfp context to the given allocation flags.
  * PF_MEMALLOC_NOIO implies GFP_NOIO
  * PF_MEMALLOC_NOFS implies GFP_NOFS
- * PF_MEMALLOC_NOCMA implies no allocation from CMA region.
  */
 static inline gfp_t current_gfp_context(gfp_t flags)
 {
-   if (unlikely(current->flags &
-(PF_MEMALLOC_NOIO | PF_MEMALLOC_NOFS | 
PF_MEMALLOC_NOCMA))) {
+   if (unlikely(current->flags & (PF_MEMALLOC_NOIO | PF_MEMALLOC_NOFS))) {
/*
 * NOIO implies both NOIO and NOFS and it is a weaker context
 * so always make sure it makes precedence
@@ -191,10 +189,6 @@ static inline gfp_t current_gfp_context(gfp_t flags)
flags &= ~(__GFP_IO | __GFP_FS);
else if (current->flags & PF_MEMALLOC_NOFS)
flags &= ~__GFP_FS;
-#ifdef CONFIG_CMA
-   if (current->flags & PF_MEMALLOC_NOCMA)
-   flags &= ~__GFP_MOVABLE;
-#endif
}
return flags;
 }
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e028b87c..08cb35c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2790,7 +2790,7 @@ __rmqueue(struct zone *zone, unsigned int order, int 
migratetype,
 * allocating from CMA when over half of the zone's free memory
 * is in the CMA area.
 */
-   if (migratetype == MIGRATE_MOVABLE &&
+   if (alloc_flags & ALLOC_CMA &&
zone_page_state(zone, NR_FREE_CMA_PAGES) >
zone_page_state(zone, NR_FREE_PAGES) / 2) {
page = __rmqueue_cma_fallback(zone, order);
@@ -2801,7 +2801,7 @@ __rmqueue(struct zone *zone, unsigned int order, int 
migratetype,
 retry:
page = __rmqueue_smallest(zone, order, migratetype);
if (unlikely(!page)) {
-   if (migratetype == MIGRATE_MOVABLE)
+   if (alloc_flags & ALLOC_CMA)
page = __rmqueue_cma_fallback(zone, order);
 
if (!page && __rmqueue_fallback(zone, order, migratetype,
@@ -3671,6 +3671,20 @@ alloc_flags_nofragment(struct zone *zone, gfp_t gfp_mask)
return alloc_flags;
 }
 
+static inline unsigned int current_alloc_flags(gfp_t gfp_mask,
+   unsigned int alloc_flags)
+{
+#ifdef CONFIG_CMA
+   unsigned int pflags = current->flags;
+
+   if (!(pflags & PF_MEMALLOC_NOCMA) &&
+   gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE)
+   alloc_flags |= ALLOC_CMA;
+
+#endif
+   return alloc_flags;
+}
+
 /*
  * get_page_from_freelist goes through the zonelist trying to allocate
  * a page.
@@ -4316,10 +4330,8 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
} else if (unlikely(rt_task(current)) && !in_interrupt())
alloc_flags |= ALLOC_HARDER;
 
-#ifdef CONFIG_CMA
-   if (gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE)
-   alloc_flags |= ALLOC_CMA;
-#endif
+   alloc_flags = current_alloc_flags(gfp_mask, alloc_flags);
+
return alloc_flags;
 }
 
@@ -4619,8 +4631,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int 
order,
wake_all_kswapds(order, gfp_mask, ac);
 
reserve_flags = __gfp_pfmemalloc_flags(gfp_mask);
-   if (reserve_flags)
+   if (reserve_flags) {
alloc_flags = reserve_flags;
+   alloc_flags = current_alloc_flags(gfp_mask, alloc_flags);
+   }
 
/*
 * Reset the nodemask and zonelist iterators if memory policies 

[PATCH v2 4/4] mm/gup: use a standard migration target allocation callback

2020-07-19 Thread js1304
From: Joonsoo Kim 

There is a well-defined migration target allocation callback. Use it.

Acked-by: Vlastimil Babka 
Acked-by: Michal Hocko 
Signed-off-by: Joonsoo Kim 
---
 mm/gup.c | 54 ++
 1 file changed, 6 insertions(+), 48 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 4ba822a..628ca4c 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1608,52 +1608,6 @@ static bool check_dax_vmas(struct vm_area_struct **vmas, 
long nr_pages)
 }
 
 #ifdef CONFIG_CMA
-static struct page *new_non_cma_page(struct page *page, unsigned long private)
-{
-   /*
-* We want to make sure we allocate the new page from the same node
-* as the source page.
-*/
-   int nid = page_to_nid(page);
-   /*
-* Trying to allocate a page for migration. Ignore allocation
-* failure warnings. We don't force __GFP_THISNODE here because
-* this node here is the node where we have CMA reservation and
-* in some case these nodes will have really less non CMA
-* allocation memory.
-*
-* Note that CMA region is prohibited by allocation scope.
-*/
-   gfp_t gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_NOWARN;
-
-   if (PageHighMem(page))
-   gfp_mask |= __GFP_HIGHMEM;
-
-#ifdef CONFIG_HUGETLB_PAGE
-   if (PageHuge(page)) {
-   struct hstate *h = page_hstate(page);
-
-   gfp_mask = htlb_modify_alloc_mask(h, gfp_mask);
-   return alloc_huge_page_nodemask(h, nid, NULL, gfp_mask);
-   }
-#endif
-   if (PageTransHuge(page)) {
-   struct page *thp;
-   /*
-* ignore allocation failure warnings
-*/
-   gfp_t thp_gfpmask = GFP_TRANSHUGE | __GFP_NOWARN;
-
-   thp = __alloc_pages_node(nid, thp_gfpmask, HPAGE_PMD_ORDER);
-   if (!thp)
-   return NULL;
-   prep_transhuge_page(thp);
-   return thp;
-   }
-
-   return __alloc_pages_node(nid, gfp_mask, 0);
-}
-
 static long check_and_migrate_cma_pages(struct task_struct *tsk,
struct mm_struct *mm,
unsigned long start,
@@ -1668,6 +1622,10 @@ static long check_and_migrate_cma_pages(struct 
task_struct *tsk,
bool migrate_allow = true;
LIST_HEAD(cma_page_list);
long ret = nr_pages;
+   struct migration_target_control mtc = {
+   .nid = NUMA_NO_NODE,
+   .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_NOWARN,
+   };
 
 check_again:
for (i = 0; i < nr_pages;) {
@@ -1713,8 +1671,8 @@ static long check_and_migrate_cma_pages(struct 
task_struct *tsk,
for (i = 0; i < nr_pages; i++)
put_page(pages[i]);
 
-   if (migrate_pages(&cma_page_list, new_non_cma_page,
- NULL, 0, MIGRATE_SYNC, MR_CONTIG_RANGE)) {
+   if (migrate_pages(&cma_page_list, alloc_migration_target, NULL,
+   (unsigned long)&mtc, MIGRATE_SYNC, MR_CONTIG_RANGE)) {
/*
 * some of the pages failed migration. Do get_user_pages
 * without migration.
-- 
2.7.4



[PATCH v2 3/4] mm/hugetlb: make hugetlb migration callback CMA aware

2020-07-19 Thread js1304
From: Joonsoo Kim 

new_non_cma_page() in gup.c needs to allocate a new page that is not on
the CMA area. new_non_cma_page() implements this by using the allocation
scope APIs.

However, there is a work-around for hugetlb. The normal hugetlb page
allocation API for migration is alloc_huge_page_nodemask(). It consists
of two steps: first, dequeue a page from the pool; second, if there is no
available page on the queue, allocate one with the page allocator.

new_non_cma_page() can't use this API since the first step (dequeue) isn't
aware of the scope API that excludes the CMA area. So, new_non_cma_page()
exports the hugetlb internal function for the second step,
alloc_migrate_huge_page(), to global scope and uses it directly. This is
suboptimal since hugetlb pages on the queue cannot be utilized.

This patch tries to fix this situation by making the hugetlb dequeue
function CMA aware. In the dequeue function, CMA memory is skipped if the
PF_MEMALLOC_NOCMA flag is set.

Acked-by: Mike Kravetz 
Acked-by: Vlastimil Babka 
Acked-by: Michal Hocko 
Signed-off-by: Joonsoo Kim 
---
 include/linux/hugetlb.h |  2 --
 mm/gup.c|  6 +-
 mm/hugetlb.c| 11 +--
 3 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 2660b04..fb2b5aa 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -509,8 +509,6 @@ struct page *alloc_huge_page_nodemask(struct hstate *h, int 
preferred_nid,
nodemask_t *nmask, gfp_t gfp_mask);
 struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma,
unsigned long address);
-struct page *alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask,
-int nid, nodemask_t *nmask);
 int huge_add_to_page_cache(struct page *page, struct address_space *mapping,
pgoff_t idx);
 
diff --git a/mm/gup.c b/mm/gup.c
index bbd36a1..4ba822a 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1634,11 +1634,7 @@ static struct page *new_non_cma_page(struct page *page, 
unsigned long private)
struct hstate *h = page_hstate(page);
 
gfp_mask = htlb_modify_alloc_mask(h, gfp_mask);
-   /*
-* We don't want to dequeue from the pool because pool pages 
will
-* mostly be from the CMA region.
-*/
-   return alloc_migrate_huge_page(h, gfp_mask, nid, NULL);
+   return alloc_huge_page_nodemask(h, nid, NULL, gfp_mask);
}
 #endif
if (PageTransHuge(page)) {
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3245aa0..d9eb923 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1036,10 +1037,16 @@ static void enqueue_huge_page(struct hstate *h, struct 
page *page)
 static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
 {
struct page *page;
+   bool nocma = !!(current->flags & PF_MEMALLOC_NOCMA);
+
+   list_for_each_entry(page, &h->hugepage_freelists[nid], lru) {
+   if (nocma && is_migrate_cma_page(page))
+   continue;
 
-   list_for_each_entry(page, &h->hugepage_freelists[nid], lru)
if (!PageHWPoison(page))
break;
+   }
+
/*
 * if 'non-isolated free hugepage' not found on the list,
 * the allocation fails.
@@ -1928,7 +1935,7 @@ static struct page *alloc_surplus_huge_page(struct hstate 
*h, gfp_t gfp_mask,
return page;
 }
 
-struct page *alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask,
+static struct page *alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask,
 int nid, nodemask_t *nmask)
 {
struct page *page;
-- 
2.7.4



[PATCH v2 1/4] mm/page_alloc: fix non cma alloc context

2020-07-19 Thread js1304
From: Joonsoo Kim 

Currently, preventing the CMA area from being used in page allocation is
implemented by using current_gfp_context(). However, there are two
problems with this implementation.

First, it doesn't work for the allocation fastpath. In the fastpath, the
original gfp_mask is used, since current_gfp_context() was introduced to
control reclaim and is only applied on the slowpath.
Second, clearing __GFP_MOVABLE has the side effect of excluding
ZONE_MOVABLE memory as an allocation target.

To fix these problems, this patch changes the implementation to exclude
the CMA area via alloc_flags. alloc_flags is mainly used to control the
allocation, so it is a good fit for excluding the CMA area.

Fixes: d7fefcc8de91 (mm/cma: add PF flag to force non cma alloc)
Cc: 
Signed-off-by: Joonsoo Kim 
---
 include/linux/sched/mm.h |  8 +---
 mm/page_alloc.c  | 37 -
 2 files changed, 25 insertions(+), 20 deletions(-)

diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 44ad5b7..6c652ec 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -175,14 +175,12 @@ static inline bool in_vfork(struct task_struct *tsk)
  * Applies per-task gfp context to the given allocation flags.
  * PF_MEMALLOC_NOIO implies GFP_NOIO
  * PF_MEMALLOC_NOFS implies GFP_NOFS
- * PF_MEMALLOC_NOCMA implies no allocation from CMA region.
  */
 static inline gfp_t current_gfp_context(gfp_t flags)
 {
unsigned int pflags = READ_ONCE(current->flags);
 
-   if (unlikely(pflags &
-(PF_MEMALLOC_NOIO | PF_MEMALLOC_NOFS | 
PF_MEMALLOC_NOCMA))) {
+   if (unlikely(pflags & (PF_MEMALLOC_NOIO | PF_MEMALLOC_NOFS))) {
/*
 * NOIO implies both NOIO and NOFS and it is a weaker context
 * so always make sure it makes precedence
@@ -191,10 +189,6 @@ static inline gfp_t current_gfp_context(gfp_t flags)
flags &= ~(__GFP_IO | __GFP_FS);
else if (pflags & PF_MEMALLOC_NOFS)
flags &= ~__GFP_FS;
-#ifdef CONFIG_CMA
-   if (pflags & PF_MEMALLOC_NOCMA)
-   flags &= ~__GFP_MOVABLE;
-#endif
}
return flags;
 }
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6416d08..b529220 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2791,7 +2791,7 @@ __rmqueue(struct zone *zone, unsigned int order, int 
migratetype,
 * allocating from CMA when over half of the zone's free memory
 * is in the CMA area.
 */
-   if (migratetype == MIGRATE_MOVABLE &&
+   if (alloc_flags & ALLOC_CMA &&
zone_page_state(zone, NR_FREE_CMA_PAGES) >
zone_page_state(zone, NR_FREE_PAGES) / 2) {
page = __rmqueue_cma_fallback(zone, order);
@@ -2802,7 +2802,7 @@ __rmqueue(struct zone *zone, unsigned int order, int 
migratetype,
 retry:
page = __rmqueue_smallest(zone, order, migratetype);
if (unlikely(!page)) {
-   if (migratetype == MIGRATE_MOVABLE)
+   if (alloc_flags & ALLOC_CMA)
page = __rmqueue_cma_fallback(zone, order);
 
if (!page && __rmqueue_fallback(zone, order, migratetype,
@@ -3502,11 +3502,9 @@ static inline long __zone_watermark_unusable_free(struct 
zone *z,
if (likely(!alloc_harder))
unusable_free += z->nr_reserved_highatomic;
 
-#ifdef CONFIG_CMA
/* If allocation can't use CMA areas don't use free CMA pages */
-   if (!(alloc_flags & ALLOC_CMA))
+   if (IS_ENABLED(CONFIG_CMA) && !(alloc_flags & ALLOC_CMA))
unusable_free += zone_page_state(z, NR_FREE_CMA_PAGES);
-#endif
 
return unusable_free;
 }
@@ -3693,6 +3691,20 @@ alloc_flags_nofragment(struct zone *zone, gfp_t gfp_mask)
return alloc_flags;
 }
 
+static inline unsigned int current_alloc_flags(gfp_t gfp_mask,
+   unsigned int alloc_flags)
+{
+#ifdef CONFIG_CMA
+   unsigned int pflags = current->flags;
+
+   if (!(pflags & PF_MEMALLOC_NOCMA) &&
+   gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE)
+   alloc_flags |= ALLOC_CMA;
+
+#endif
+   return alloc_flags;
+}
+
 /*
  * get_page_from_freelist goes through the zonelist trying to allocate
  * a page.
@@ -4339,10 +4351,8 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
} else if (unlikely(rt_task(current)) && !in_interrupt())
alloc_flags |= ALLOC_HARDER;
 
-#ifdef CONFIG_CMA
-   if (gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE)
-   alloc_flags |= ALLOC_CMA;
-#endif
+   alloc_flags = current_alloc_flags(gfp_mask, alloc_flags);
+
return alloc_flags;
 }
 
@@ -4642,8 +4652,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int 
order,
wake_all_kswapds(order, gfp_mask, ac);
 
reserve_flags = __gfp_pfmemalloc_flags(gfp_mask);
-   if 

[PATCH v2 2/4] mm/gup: restrict CMA region by using allocation scope API

2020-07-19 Thread js1304
From: Joonsoo Kim 

We have a well-defined scope API to exclude the CMA region.
Use it rather than manipulating gfp_mask manually. With this change,
we can now restore __GFP_MOVABLE in gfp_mask as for a usual migration
target allocation, so ZONE_MOVABLE is also searched by the page allocator.
For hugetlb, gfp_mask is redefined since hugetlb has a regular allocation
mask filter for migration targets. __GFP_NOWARN is added to the hugetlb
gfp_mask filter since the new user of this filter, gup, wants to be silent
when the allocation fails.

Note that this can be considered a fix for the commit 9a4e9f3b2d73
("mm: update get_user_pages_longterm to migrate pages allocated from
CMA region"). However, a "Fixes" tag isn't added here since the old code
is merely suboptimal and doesn't cause any problem.
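
The resulting control flow in __gup_longterm_locked() is roughly as follows
(a simplified sketch; names as in the diff below, error handling trimmed):

    if (gup_flags & FOLL_LONGTERM)
        flags = memalloc_nocma_save();

    /* ... gup proper runs here and fills pages[] ... */

    if (gup_flags & FOLL_LONGTERM) {
        if (rc > 0)
            rc = check_and_migrate_cma_pages(tsk, mm, start, rc, pages,
                                             vmas_tmp, gup_flags);
        memalloc_nocma_restore(flags);
    }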

Suggested-by: Michal Hocko 
Signed-off-by: Joonsoo Kim 
---
 include/linux/hugetlb.h |  2 ++
 mm/gup.c| 17 -
 2 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 6b9508d..2660b04 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -708,6 +708,8 @@ static inline gfp_t htlb_modify_alloc_mask(struct hstate 
*h, gfp_t gfp_mask)
/* Some callers might want to enfoce node */
modified_mask |= (gfp_mask & __GFP_THISNODE);
 
+   modified_mask |= (gfp_mask & __GFP_NOWARN);
+
return modified_mask;
 }
 
diff --git a/mm/gup.c b/mm/gup.c
index 5daadae..bbd36a1 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1619,10 +1619,12 @@ static struct page *new_non_cma_page(struct page *page, 
unsigned long private)
 * Trying to allocate a page for migration. Ignore allocation
 * failure warnings. We don't force __GFP_THISNODE here because
 * this node here is the node where we have CMA reservation and
-* in some case these nodes will have really less non movable
+* in some case these nodes will have really less non CMA
 * allocation memory.
+*
+* Note that CMA region is prohibited by allocation scope.
 */
-   gfp_t gfp_mask = GFP_USER | __GFP_NOWARN;
+   gfp_t gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_NOWARN;
 
if (PageHighMem(page))
gfp_mask |= __GFP_HIGHMEM;
@@ -1630,6 +1632,8 @@ static struct page *new_non_cma_page(struct page *page, 
unsigned long private)
 #ifdef CONFIG_HUGETLB_PAGE
if (PageHuge(page)) {
struct hstate *h = page_hstate(page);
+
+   gfp_mask = htlb_modify_alloc_mask(h, gfp_mask);
/*
 * We don't want to dequeue from the pool because pool pages 
will
 * mostly be from the CMA region.
@@ -1644,11 +1648,6 @@ static struct page *new_non_cma_page(struct page *page, 
unsigned long private)
 */
gfp_t thp_gfpmask = GFP_TRANSHUGE | __GFP_NOWARN;
 
-   /*
-* Remove the movable mask so that we don't allocate from
-* CMA area again.
-*/
-   thp_gfpmask &= ~__GFP_MOVABLE;
thp = __alloc_pages_node(nid, thp_gfpmask, HPAGE_PMD_ORDER);
if (!thp)
return NULL;
@@ -1794,7 +1793,6 @@ static long __gup_longterm_locked(struct task_struct *tsk,
 vmas_tmp, NULL, gup_flags);
 
if (gup_flags & FOLL_LONGTERM) {
-   memalloc_nocma_restore(flags);
if (rc < 0)
goto out;
 
@@ -1807,9 +1805,10 @@ static long __gup_longterm_locked(struct task_struct 
*tsk,
 
rc = check_and_migrate_cma_pages(tsk, mm, start, rc, pages,
 vmas_tmp, gup_flags);
+out:
+   memalloc_nocma_restore(flags);
}
 
-out:
if (vmas_tmp != vmas)
kfree(vmas_tmp);
return rc;
-- 
2.7.4



[PATCH 2/4] mm/gup: restrict CMA region by using allocation scope API

2020-07-14 Thread js1304
From: Joonsoo Kim 

We have a well-defined scope API to exclude the CMA region.
Use it rather than manipulating gfp_mask manually. With this change,
we can now use __GFP_MOVABLE in gfp_mask, so ZONE_MOVABLE is also
searched by the page allocator. For hugetlb, gfp_mask is redefined since
hugetlb has a regular allocation mask filter for migration targets.

Note that this can be considered a fix for the commit 9a4e9f3b2d73
("mm: update get_user_pages_longterm to migrate pages allocated from
CMA region"). However, a "Fixes" tag isn't added here since the old code
is merely suboptimal and doesn't cause any problem.

Suggested-by: Michal Hocko 
Signed-off-by: Joonsoo Kim 
---
 include/linux/hugetlb.h |  2 ++
 mm/gup.c| 17 -
 2 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 6b9508d..2660b04 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -708,6 +708,8 @@ static inline gfp_t htlb_modify_alloc_mask(struct hstate 
*h, gfp_t gfp_mask)
/* Some callers might want to enfoce node */
modified_mask |= (gfp_mask & __GFP_THISNODE);
 
+   modified_mask |= (gfp_mask & __GFP_NOWARN);
+
return modified_mask;
 }
 
diff --git a/mm/gup.c b/mm/gup.c
index 5daadae..bbd36a1 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1619,10 +1619,12 @@ static struct page *new_non_cma_page(struct page *page, 
unsigned long private)
 * Trying to allocate a page for migration. Ignore allocation
 * failure warnings. We don't force __GFP_THISNODE here because
 * this node here is the node where we have CMA reservation and
-* in some case these nodes will have really less non movable
+* in some case these nodes will have really less non CMA
 * allocation memory.
+*
+* Note that CMA region is prohibited by allocation scope.
 */
-   gfp_t gfp_mask = GFP_USER | __GFP_NOWARN;
+   gfp_t gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_NOWARN;
 
if (PageHighMem(page))
gfp_mask |= __GFP_HIGHMEM;
@@ -1630,6 +1632,8 @@ static struct page *new_non_cma_page(struct page *page, 
unsigned long private)
 #ifdef CONFIG_HUGETLB_PAGE
if (PageHuge(page)) {
struct hstate *h = page_hstate(page);
+
+   gfp_mask = htlb_modify_alloc_mask(h, gfp_mask);
/*
 * We don't want to dequeue from the pool because pool pages 
will
 * mostly be from the CMA region.
@@ -1644,11 +1648,6 @@ static struct page *new_non_cma_page(struct page *page, 
unsigned long private)
 */
gfp_t thp_gfpmask = GFP_TRANSHUGE | __GFP_NOWARN;
 
-   /*
-* Remove the movable mask so that we don't allocate from
-* CMA area again.
-*/
-   thp_gfpmask &= ~__GFP_MOVABLE;
thp = __alloc_pages_node(nid, thp_gfpmask, HPAGE_PMD_ORDER);
if (!thp)
return NULL;
@@ -1794,7 +1793,6 @@ static long __gup_longterm_locked(struct task_struct *tsk,
 vmas_tmp, NULL, gup_flags);
 
if (gup_flags & FOLL_LONGTERM) {
-   memalloc_nocma_restore(flags);
if (rc < 0)
goto out;
 
@@ -1807,9 +1805,10 @@ static long __gup_longterm_locked(struct task_struct 
*tsk,
 
rc = check_and_migrate_cma_pages(tsk, mm, start, rc, pages,
 vmas_tmp, gup_flags);
+out:
+   memalloc_nocma_restore(flags);
}
 
-out:
if (vmas_tmp != vmas)
kfree(vmas_tmp);
return rc;
-- 
2.7.4



[PATCH 3/4] mm/hugetlb: make hugetlb migration callback CMA aware

2020-07-14 Thread js1304
From: Joonsoo Kim 

new_non_cma_page() in gup.c needs to allocate a new page that is not on
the CMA area. new_non_cma_page() implements this by using the allocation
scope APIs.

However, there is a work-around for hugetlb. The normal hugetlb page
allocation API for migration is alloc_huge_page_nodemask(). It consists
of two steps. The first is dequeuing from the pool; the second, if no
page is available on the queue, is allocating from the page allocator.

new_non_cma_page() can't use this API since the first step (dequeue) isn't
aware of the scope API that excludes the CMA area. So, new_non_cma_page()
exports the hugetlb-internal function for the second step,
alloc_migrate_huge_page(), to global scope and uses it directly. This is
suboptimal since hugetlb pages on the queue cannot be utilized.

This patch fixes this situation by making the hugetlb dequeue function CMA
aware: when the PF_MEMALLOC_NOCMA flag is set on the current task, pages on
the CMA area are skipped during the dequeue.
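
Put differently, a pooled hugepage is now eligible for this dequeue only if
it is neither poisoned nor a CMA page requested under a nocma scope; an
illustrative restatement of that condition (not the exact diff below):

    bool nocma = !!(current->flags & PF_MEMALLOC_NOCMA);
    bool usable = !PageHWPoison(page) &&
                  !(nocma && is_migrate_cma_page(page));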

Acked-by: Mike Kravetz 
Signed-off-by: Joonsoo Kim 
---
 include/linux/hugetlb.h |  2 --
 mm/gup.c|  6 +-
 mm/hugetlb.c| 11 +--
 3 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 2660b04..fb2b5aa 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -509,8 +509,6 @@ struct page *alloc_huge_page_nodemask(struct hstate *h, int 
preferred_nid,
nodemask_t *nmask, gfp_t gfp_mask);
 struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma,
unsigned long address);
-struct page *alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask,
-int nid, nodemask_t *nmask);
 int huge_add_to_page_cache(struct page *page, struct address_space *mapping,
pgoff_t idx);
 
diff --git a/mm/gup.c b/mm/gup.c
index bbd36a1..4ba822a 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1634,11 +1634,7 @@ static struct page *new_non_cma_page(struct page *page, 
unsigned long private)
struct hstate *h = page_hstate(page);
 
gfp_mask = htlb_modify_alloc_mask(h, gfp_mask);
-   /*
-* We don't want to dequeue from the pool because pool pages 
will
-* mostly be from the CMA region.
-*/
-   return alloc_migrate_huge_page(h, gfp_mask, nid, NULL);
+   return alloc_huge_page_nodemask(h, nid, NULL, gfp_mask);
}
 #endif
if (PageTransHuge(page)) {
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3245aa0..514e29c 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1036,10 +1037,16 @@ static void enqueue_huge_page(struct hstate *h, struct 
page *page)
 static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
 {
struct page *page;
+   bool nocma = !!(READ_ONCE(current->flags) & PF_MEMALLOC_NOCMA);
+
+   list_for_each_entry(page, &h->hugepage_freelists[nid], lru) {
+   if (nocma && is_migrate_cma_page(page))
+   continue;
 
-   list_for_each_entry(page, &h->hugepage_freelists[nid], lru)
if (!PageHWPoison(page))
break;
+   }
+
/*
 * if 'non-isolated free hugepage' not found on the list,
 * the allocation fails.
@@ -1928,7 +1935,7 @@ static struct page *alloc_surplus_huge_page(struct hstate 
*h, gfp_t gfp_mask,
return page;
 }
 
-struct page *alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask,
+static struct page *alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask,
 int nid, nodemask_t *nmask)
 {
struct page *page;
-- 
2.7.4



[PATCH 4/4] mm/gup: use a standard migration target allocation callback

2020-07-14 Thread js1304
From: Joonsoo Kim 

There is a well-defined migration target allocation callback. Use it.

Acked-by: Vlastimil Babka 
Signed-off-by: Joonsoo Kim 
---
 mm/gup.c | 54 ++
 1 file changed, 6 insertions(+), 48 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 4ba822a..628ca4c 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1608,52 +1608,6 @@ static bool check_dax_vmas(struct vm_area_struct **vmas, 
long nr_pages)
 }
 
 #ifdef CONFIG_CMA
-static struct page *new_non_cma_page(struct page *page, unsigned long private)
-{
-   /*
-* We want to make sure we allocate the new page from the same node
-* as the source page.
-*/
-   int nid = page_to_nid(page);
-   /*
-* Trying to allocate a page for migration. Ignore allocation
-* failure warnings. We don't force __GFP_THISNODE here because
-* this node here is the node where we have CMA reservation and
-* in some case these nodes will have really less non CMA
-* allocation memory.
-*
-* Note that CMA region is prohibited by allocation scope.
-*/
-   gfp_t gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_NOWARN;
-
-   if (PageHighMem(page))
-   gfp_mask |= __GFP_HIGHMEM;
-
-#ifdef CONFIG_HUGETLB_PAGE
-   if (PageHuge(page)) {
-   struct hstate *h = page_hstate(page);
-
-   gfp_mask = htlb_modify_alloc_mask(h, gfp_mask);
-   return alloc_huge_page_nodemask(h, nid, NULL, gfp_mask);
-   }
-#endif
-   if (PageTransHuge(page)) {
-   struct page *thp;
-   /*
-* ignore allocation failure warnings
-*/
-   gfp_t thp_gfpmask = GFP_TRANSHUGE | __GFP_NOWARN;
-
-   thp = __alloc_pages_node(nid, thp_gfpmask, HPAGE_PMD_ORDER);
-   if (!thp)
-   return NULL;
-   prep_transhuge_page(thp);
-   return thp;
-   }
-
-   return __alloc_pages_node(nid, gfp_mask, 0);
-}
-
 static long check_and_migrate_cma_pages(struct task_struct *tsk,
struct mm_struct *mm,
unsigned long start,
@@ -1668,6 +1622,10 @@ static long check_and_migrate_cma_pages(struct 
task_struct *tsk,
bool migrate_allow = true;
LIST_HEAD(cma_page_list);
long ret = nr_pages;
+   struct migration_target_control mtc = {
+   .nid = NUMA_NO_NODE,
+   .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_NOWARN,
+   };
 
 check_again:
for (i = 0; i < nr_pages;) {
@@ -1713,8 +1671,8 @@ static long check_and_migrate_cma_pages(struct 
task_struct *tsk,
for (i = 0; i < nr_pages; i++)
put_page(pages[i]);
 
-   if (migrate_pages(&cma_page_list, new_non_cma_page,
- NULL, 0, MIGRATE_SYNC, MR_CONTIG_RANGE)) {
+   if (migrate_pages(&cma_page_list, alloc_migration_target, NULL,
+   (unsigned long)&mtc, MIGRATE_SYNC, MR_CONTIG_RANGE)) {
/*
 * some of the pages failed migration. Do get_user_pages
 * without migration.
-- 
2.7.4



[PATCH 1/4] mm/page_alloc: fix non cma alloc context

2020-07-14 Thread js1304
From: Joonsoo Kim 

Currently, excluding the CMA area from page allocation is implemented by using
current_gfp_context(). However, there are two problems with this
implementation.

First, it doesn't work for the allocation fastpath. The fastpath uses the
original gfp_mask, since current_gfp_context() was introduced to control
reclaim and is only applied on the slowpath.
Second, clearing __GFP_MOVABLE has the side effect of excluding ZONE_MOVABLE
from the allocation targets.

To fix these problems, this patch changes the implementation so that the CMA
area is excluded via alloc_flags. alloc_flags is mainly used to control the
allocation, so it is a natural fit for excluding the CMA area.

Fixes: d7fefcc (mm/cma: add PF flag to force non cma alloc)
Cc: 
Signed-off-by: Joonsoo Kim 
---
 include/linux/sched/mm.h |  4 
 mm/page_alloc.c  | 27 +++
 2 files changed, 15 insertions(+), 16 deletions(-)

diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 44ad5b7..a73847a 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -191,10 +191,6 @@ static inline gfp_t current_gfp_context(gfp_t flags)
flags &= ~(__GFP_IO | __GFP_FS);
else if (pflags & PF_MEMALLOC_NOFS)
flags &= ~__GFP_FS;
-#ifdef CONFIG_CMA
-   if (pflags & PF_MEMALLOC_NOCMA)
-   flags &= ~__GFP_MOVABLE;
-#endif
}
return flags;
 }
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6416d08..cd53894 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2791,7 +2791,7 @@ __rmqueue(struct zone *zone, unsigned int order, int 
migratetype,
 * allocating from CMA when over half of the zone's free memory
 * is in the CMA area.
 */
-   if (migratetype == MIGRATE_MOVABLE &&
+   if (alloc_flags & ALLOC_CMA &&
zone_page_state(zone, NR_FREE_CMA_PAGES) >
zone_page_state(zone, NR_FREE_PAGES) / 2) {
page = __rmqueue_cma_fallback(zone, order);
@@ -2802,7 +2802,7 @@ __rmqueue(struct zone *zone, unsigned int order, int 
migratetype,
 retry:
page = __rmqueue_smallest(zone, order, migratetype);
if (unlikely(!page)) {
-   if (migratetype == MIGRATE_MOVABLE)
+   if (alloc_flags & ALLOC_CMA)
page = __rmqueue_cma_fallback(zone, order);
 
if (!page && __rmqueue_fallback(zone, order, migratetype,
@@ -3502,11 +3502,9 @@ static inline long __zone_watermark_unusable_free(struct 
zone *z,
if (likely(!alloc_harder))
unusable_free += z->nr_reserved_highatomic;
 
-#ifdef CONFIG_CMA
/* If allocation can't use CMA areas don't use free CMA pages */
-   if (!(alloc_flags & ALLOC_CMA))
+   if (IS_ENABLED(CONFIG_CMA) && !(alloc_flags & ALLOC_CMA))
unusable_free += zone_page_state(z, NR_FREE_CMA_PAGES);
-#endif
 
return unusable_free;
 }
@@ -3693,6 +3691,16 @@ alloc_flags_nofragment(struct zone *zone, gfp_t gfp_mask)
return alloc_flags;
 }
 
+static inline void current_alloc_flags(gfp_t gfp_mask,
+   unsigned int *alloc_flags)
+{
+   unsigned int pflags = READ_ONCE(current->flags);
+
+   if (!(pflags & PF_MEMALLOC_NOCMA) &&
+   gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE)
+   *alloc_flags |= ALLOC_CMA;
+}
+
 /*
  * get_page_from_freelist goes through the zonelist trying to allocate
  * a page.
@@ -3706,6 +3714,8 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int 
order, int alloc_flags,
struct pglist_data *last_pgdat_dirty_limit = NULL;
bool no_fallback;
 
+   current_alloc_flags(gfp_mask, &alloc_flags);
+
 retry:
/*
 * Scan zonelist, looking for a zone with enough free.
@@ -4339,10 +4349,6 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
} else if (unlikely(rt_task(current)) && !in_interrupt())
alloc_flags |= ALLOC_HARDER;
 
-#ifdef CONFIG_CMA
-   if (gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE)
-   alloc_flags |= ALLOC_CMA;
-#endif
return alloc_flags;
 }
 
@@ -4808,9 +4814,6 @@ static inline bool prepare_alloc_pages(gfp_t gfp_mask, 
unsigned int order,
if (should_fail_alloc_page(gfp_mask, order))
return false;
 
-   if (IS_ENABLED(CONFIG_CMA) && ac->migratetype == MIGRATE_MOVABLE)
-   *alloc_flags |= ALLOC_CMA;
-
return true;
 }
 
-- 
2.7.4



[PATCH v5 5/9] mm/migrate: make a standard migration target allocation function

2020-07-13 Thread js1304
From: Joonsoo Kim 

There are some similar functions for migration target allocation.  Since
there is no fundamental difference, it's better to keep just one rather
than keeping all variants.  This patch implements the base migration target
allocation function.  In the following patches, the variants will be
converted to use this function.

Changes should be mechanical, but, unfortunately, there are some
differences. First, some callers' nodemask is assigned NULL, since a NULL
nodemask is considered as all available nodes, that is,
node_states[N_MEMORY]. Second, for hugetlb page allocation, gfp_mask is
redefined as the regular hugetlb allocation gfp_mask plus __GFP_THISNODE if
the user-provided gfp_mask has it. This is because a future caller of this
function needs to set this node constraint. Lastly, if the provided node id
is NUMA_NO_NODE, it is set to the node where the migration source page
lives. This helps to remove simple wrappers whose only job was setting up
the node id.

Note that the PageHighMem() call in the previous function is changed to an
open-coded is_highmem_idx() check since it provides more readability.

Acked-by: Vlastimil Babka 
Acked-by: Michal Hocko 
Signed-off-by: Joonsoo Kim 
---
 include/linux/hugetlb.h | 15 +++
 include/linux/migrate.h |  9 +
 mm/internal.h   |  7 +++
 mm/memory-failure.c |  7 +--
 mm/memory_hotplug.c | 12 
 mm/migrate.c| 26 --
 mm/page_isolation.c |  7 +--
 7 files changed, 61 insertions(+), 22 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index bb93e95..6b9508d 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -701,6 +701,16 @@ static inline gfp_t htlb_alloc_mask(struct hstate *h)
return GFP_HIGHUSER;
 }
 
+static inline gfp_t htlb_modify_alloc_mask(struct hstate *h, gfp_t gfp_mask)
+{
+   gfp_t modified_mask = htlb_alloc_mask(h);
+
+   /* Some callers might want to enfoce node */
+   modified_mask |= (gfp_mask & __GFP_THISNODE);
+
+   return modified_mask;
+}
+
 static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
   struct mm_struct *mm, pte_t *pte)
 {
@@ -888,6 +898,11 @@ static inline gfp_t htlb_alloc_mask(struct hstate *h)
return 0;
 }
 
+static inline gfp_t htlb_modify_alloc_mask(struct hstate *h, gfp_t gfp_mask)
+{
+   return 0;
+}
+
 static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
   struct mm_struct *mm, pte_t *pte)
 {
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 1d70b4a..cc56f0d 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -10,6 +10,8 @@
 typedef struct page *new_page_t(struct page *page, unsigned long private);
 typedef void free_page_t(struct page *page, unsigned long private);
 
+struct migration_target_control;
+
 /*
  * Return values from addresss_space_operations.migratepage():
  * - negative errno on page migration failure;
@@ -39,8 +41,7 @@ extern int migrate_page(struct address_space *mapping,
enum migrate_mode mode);
 extern int migrate_pages(struct list_head *l, new_page_t new, free_page_t free,
unsigned long private, enum migrate_mode mode, int reason);
-extern struct page *new_page_nodemask(struct page *page,
-   int preferred_nid, nodemask_t *nodemask);
+extern struct page *alloc_migration_target(struct page *page, unsigned long 
private);
 extern int isolate_movable_page(struct page *page, isolate_mode_t mode);
 extern void putback_movable_page(struct page *page);
 
@@ -59,8 +60,8 @@ static inline int migrate_pages(struct list_head *l, 
new_page_t new,
free_page_t free, unsigned long private, enum migrate_mode mode,
int reason)
{ return -ENOSYS; }
-static inline struct page *new_page_nodemask(struct page *page,
-   int preferred_nid, nodemask_t *nodemask)
+static inline struct page *alloc_migration_target(struct page *page,
+   unsigned long private)
{ return NULL; }
 static inline int isolate_movable_page(struct page *page, isolate_mode_t mode)
{ return -EBUSY; }
diff --git a/mm/internal.h b/mm/internal.h
index dd14c53..0beacf3 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -614,4 +614,11 @@ static inline bool is_migrate_highatomic_page(struct page 
*page)
 
 void setup_zone_pageset(struct zone *zone);
 extern struct page *alloc_new_node_page(struct page *page, unsigned long node);
+
+struct migration_target_control {
+   int nid;/* preferred node id */
+   nodemask_t *nmask;
+   gfp_t gfp_mask;
+};
+
 #endif /* __MM_INTERNAL_H */
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index c5e4cee..609d42b6 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1679,9 +1679,12 @@ EXPORT_SYMBOL(unpoison_memory);
 
 static struct page *new_page(struct page *p, unsigned long private)
 

[PATCH v5 8/9] mm/memory-failure: remove a wrapper for alloc_migration_target()

2020-07-13 Thread js1304
From: Joonsoo Kim 

There is a well-defined standard migration target callback. Use it
directly.

Acked-by: Vlastimil Babka 
Signed-off-by: Joonsoo Kim 
---
 mm/memory-failure.c | 18 ++
 1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 609d42b6..3b89804 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1677,16 +1677,6 @@ int unpoison_memory(unsigned long pfn)
 }
 EXPORT_SYMBOL(unpoison_memory);
 
-static struct page *new_page(struct page *p, unsigned long private)
-{
-   struct migration_target_control mtc = {
-   .nid = page_to_nid(p),
-   .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
-   };
-
-   return alloc_migration_target(p, (unsigned long)&mtc);
-}
-
 /*
  * Safely get reference count of an arbitrary page.
  * Returns 0 for a free page, -EIO for a zero refcount page
@@ -1793,6 +1783,10 @@ static int __soft_offline_page(struct page *page)
const char *msg_page[] = {"page", "hugepage"};
bool huge = PageHuge(page);
LIST_HEAD(pagelist);
+   struct migration_target_control mtc = {
+   .nid = NUMA_NO_NODE,
+   .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
+   };
 
/*
 * Check PageHWPoison again inside page lock because PageHWPoison
@@ -1829,8 +1823,8 @@ static int __soft_offline_page(struct page *page)
}
 
if (isolate_page(hpage, &pagelist)) {
-   ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
-   MIGRATE_SYNC, MR_MEMORY_FAILURE);
+   ret = migrate_pages(&pagelist, alloc_migration_target, NULL,
+   (unsigned long)&mtc, MIGRATE_SYNC, MR_MEMORY_FAILURE);
if (!ret) {
bool release = !huge;
 
-- 
2.7.4



[PATCH v5 6/9] mm/mempolicy: use a standard migration target allocation callback

2020-07-13 Thread js1304
From: Joonsoo Kim 

There is a well-defined migration target allocation callback.  Use it.

Acked-by: Michal Hocko 
Acked-by: Vlastimil Babka 
Signed-off-by: Joonsoo Kim 
---
 mm/internal.h  |  1 -
 mm/mempolicy.c | 31 ++-
 mm/migrate.c   |  8 ++--
 3 files changed, 12 insertions(+), 28 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 0beacf3..10c6776 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -613,7 +613,6 @@ static inline bool is_migrate_highatomic_page(struct page 
*page)
 }
 
 void setup_zone_pageset(struct zone *zone);
-extern struct page *alloc_new_node_page(struct page *page, unsigned long node);
 
 struct migration_target_control {
int nid;/* preferred node id */
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 9034a53..93fcfc1 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1065,29 +1065,6 @@ static int migrate_page_add(struct page *page, struct 
list_head *pagelist,
return 0;
 }
 
-/* page allocation callback for NUMA node migration */
-struct page *alloc_new_node_page(struct page *page, unsigned long node)
-{
-   if (PageHuge(page)) {
-   struct hstate *h = page_hstate(compound_head(page));
-   gfp_t gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE;
-
-   return alloc_huge_page_nodemask(h, node, NULL, gfp_mask);
-   } else if (PageTransHuge(page)) {
-   struct page *thp;
-
-   thp = alloc_pages_node(node,
-   (GFP_TRANSHUGE | __GFP_THISNODE),
-   HPAGE_PMD_ORDER);
-   if (!thp)
-   return NULL;
-   prep_transhuge_page(thp);
-   return thp;
-   } else
-   return __alloc_pages_node(node, GFP_HIGHUSER_MOVABLE |
-   __GFP_THISNODE, 0);
-}
-
 /*
  * Migrate pages from one node to a target node.
  * Returns error or the number of pages not migrated.
@@ -1098,6 +1075,10 @@ static int migrate_to_node(struct mm_struct *mm, int 
source, int dest,
nodemask_t nmask;
LIST_HEAD(pagelist);
int err = 0;
+   struct migration_target_control mtc = {
+   .nid = dest,
+   .gfp_mask = GFP_HIGHUSER_MOVABLE | __GFP_THISNODE,
+   };
 
nodes_clear(nmask);
node_set(source, nmask);
@@ -1112,8 +1093,8 @@ static int migrate_to_node(struct mm_struct *mm, int 
source, int dest,
flags | MPOL_MF_DISCONTIG_OK, &pagelist);
 
if (!list_empty(&pagelist)) {
-   err = migrate_pages(&pagelist, alloc_new_node_page, NULL, dest,
-   MIGRATE_SYNC, MR_SYSCALL);
+   err = migrate_pages(&pagelist, alloc_migration_target, NULL,
+   (unsigned long)&mtc, MIGRATE_SYNC, MR_SYSCALL);
if (err)
putback_movable_pages(&pagelist);
}
diff --git a/mm/migrate.c b/mm/migrate.c
index c35ba2a..1a891c4 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1591,9 +1591,13 @@ static int do_move_pages_to_node(struct mm_struct *mm,
struct list_head *pagelist, int node)
 {
int err;
+   struct migration_target_control mtc = {
+   .nid = node,
+   .gfp_mask = GFP_HIGHUSER_MOVABLE | __GFP_THISNODE,
+   };
 
-   err = migrate_pages(pagelist, alloc_new_node_page, NULL, node,
-   MIGRATE_SYNC, MR_SYSCALL);
+   err = migrate_pages(pagelist, alloc_migration_target, NULL,
+   (unsigned long)&mtc, MIGRATE_SYNC, MR_SYSCALL);
if (err)
putback_movable_pages(pagelist);
return err;
-- 
2.7.4



[PATCH v5 2/9] mm/migrate: move migration helper from .h to .c

2020-07-13 Thread js1304
From: Joonsoo Kim 

It's not a performance-sensitive function.  Move it to .c.  This is a
preparation step for a future change.

Acked-by: Mike Kravetz 
Acked-by: Michal Hocko 
Reviewed-by: Vlastimil Babka 
Signed-off-by: Joonsoo Kim 
---
 include/linux/migrate.h | 33 +
 mm/migrate.c| 29 +
 2 files changed, 34 insertions(+), 28 deletions(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 3e546cb..1d70b4a 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -31,34 +31,6 @@ enum migrate_reason {
 /* In mm/debug.c; also keep sync with include/trace/events/migrate.h */
 extern const char *migrate_reason_names[MR_TYPES];
 
-static inline struct page *new_page_nodemask(struct page *page,
-   int preferred_nid, nodemask_t *nodemask)
-{
-   gfp_t gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL;
-   unsigned int order = 0;
-   struct page *new_page = NULL;
-
-   if (PageHuge(page))
-   return 
alloc_huge_page_nodemask(page_hstate(compound_head(page)),
-   preferred_nid, nodemask);
-
-   if (PageTransHuge(page)) {
-   gfp_mask |= GFP_TRANSHUGE;
-   order = HPAGE_PMD_ORDER;
-   }
-
-   if (PageHighMem(page) || (zone_idx(page_zone(page)) == ZONE_MOVABLE))
-   gfp_mask |= __GFP_HIGHMEM;
-
-   new_page = __alloc_pages_nodemask(gfp_mask, order,
-   preferred_nid, nodemask);
-
-   if (new_page && PageTransHuge(new_page))
-   prep_transhuge_page(new_page);
-
-   return new_page;
-}
-
 #ifdef CONFIG_MIGRATION
 
 extern void putback_movable_pages(struct list_head *l);
@@ -67,6 +39,8 @@ extern int migrate_page(struct address_space *mapping,
enum migrate_mode mode);
 extern int migrate_pages(struct list_head *l, new_page_t new, free_page_t free,
unsigned long private, enum migrate_mode mode, int reason);
+extern struct page *new_page_nodemask(struct page *page,
+   int preferred_nid, nodemask_t *nodemask);
 extern int isolate_movable_page(struct page *page, isolate_mode_t mode);
 extern void putback_movable_page(struct page *page);
 
@@ -85,6 +59,9 @@ static inline int migrate_pages(struct list_head *l, 
new_page_t new,
free_page_t free, unsigned long private, enum migrate_mode mode,
int reason)
{ return -ENOSYS; }
+static inline struct page *new_page_nodemask(struct page *page,
+   int preferred_nid, nodemask_t *nodemask)
+   { return NULL; }
 static inline int isolate_movable_page(struct page *page, isolate_mode_t mode)
{ return -EBUSY; }
 
diff --git a/mm/migrate.c b/mm/migrate.c
index d105b67..7370a66 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1531,6 +1531,35 @@ int migrate_pages(struct list_head *from, new_page_t 
get_new_page,
return rc;
 }
 
+struct page *new_page_nodemask(struct page *page,
+   int preferred_nid, nodemask_t *nodemask)
+{
+   gfp_t gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL;
+   unsigned int order = 0;
+   struct page *new_page = NULL;
+
+   if (PageHuge(page))
+   return alloc_huge_page_nodemask(
+   page_hstate(compound_head(page)),
+   preferred_nid, nodemask);
+
+   if (PageTransHuge(page)) {
+   gfp_mask |= GFP_TRANSHUGE;
+   order = HPAGE_PMD_ORDER;
+   }
+
+   if (PageHighMem(page) || (zone_idx(page_zone(page)) == ZONE_MOVABLE))
+   gfp_mask |= __GFP_HIGHMEM;
+
+   new_page = __alloc_pages_nodemask(gfp_mask, order,
+   preferred_nid, nodemask);
+
+   if (new_page && PageTransHuge(new_page))
+   prep_transhuge_page(new_page);
+
+   return new_page;
+}
+
 #ifdef CONFIG_NUMA
 
 static int store_status(int __user *status, int start, int value, int nr)
-- 
2.7.4



[PATCH v5 9/9] mm/memory_hotplug: remove a wrapper for alloc_migration_target()

2020-07-13 Thread js1304
From: Joonsoo Kim 

To calculate the correct node to which the page should be migrated during
hotplug, we need to check the node id of the page. A wrapper for
alloc_migration_target() exists for this purpose.

However, Vlastimil points out that all migration source pages come from
a single node. In this case, we don't need to check the node id for each
page, and we don't need to re-set the target nodemask for each page through
the wrapper. Set up the migration_target_control once and use it for
all pages.

Acked-by: Vlastimil Babka 
Acked-by: Michal Hocko 
Signed-off-by: Joonsoo Kim 
---
 mm/memory_hotplug.c | 46 ++
 1 file changed, 22 insertions(+), 24 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 431b470f..7c216d6 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1265,27 +1265,6 @@ static int scan_movable_pages(unsigned long start, 
unsigned long end,
return 0;
 }
 
-static struct page *new_node_page(struct page *page, unsigned long private)
-{
-   nodemask_t nmask = node_states[N_MEMORY];
-   struct migration_target_control mtc = {
-   .nid = page_to_nid(page),
-   .nmask = &nmask,
-   .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
-   };
-
-   /*
-* try to allocate from a different node but reuse this node if there
-* are no other online nodes to be used (e.g. we are offlining a part
-* of the only existing node)
-*/
-   node_clear(mtc.nid, nmask);
-   if (nodes_empty(nmask))
-   node_set(mtc.nid, nmask);
-
-   return alloc_migration_target(page, (unsigned long)&mtc);
-}
-
 static int
 do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 {
@@ -1345,9 +1324,28 @@ do_migrate_range(unsigned long start_pfn, unsigned long 
end_pfn)
put_page(page);
}
if (!list_empty(&source)) {
-   /* Allocate a new page from the nearest neighbor node */
-   ret = migrate_pages(&source, new_node_page, NULL, 0,
-   MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
+   nodemask_t nmask = node_states[N_MEMORY];
+   struct migration_target_control mtc = {
+   .nmask = &nmask,
+   .gfp_mask = GFP_USER | __GFP_MOVABLE | 
__GFP_RETRY_MAYFAIL,
+   };
+
+   /*
+* We have checked that migration range is on a single zone so
+* we can use the nid of the first page to all the others.
+*/
+   mtc.nid = page_to_nid(list_first_entry(&source, struct page, 
lru));
+
+   /*
+* try to allocate from a different node but reuse this node
+* if there are no other online nodes to be used (e.g. we are
+* offlining a part of the only existing node)
+*/
+   node_clear(mtc.nid, nmask);
+   if (nodes_empty(nmask))
+   node_set(mtc.nid, nmask);
+   ret = migrate_pages(&source, alloc_migration_target, NULL,
+   (unsigned long)&mtc, MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
if (ret) {
list_for_each_entry(page, &source, lru) {
pr_warn("migrating pfn %lx failed ret:%d ",
-- 
2.7.4



[PATCH v5 0/9] clean-up the migration target allocation functions

2020-07-13 Thread js1304
From: Joonsoo Kim 

This patchset cleans up the migration target allocation functions.

* Changes on v5
- remove the new_non_cma_page() related patches
(the implementation of memalloc_nocma_{save,restore} has a critical bug:
it cannot exclude CMA memory in some cases, so it cannot be used here
until that is fixed first.)
- introduce a wrapper to handle gfp_mask for hugetlb and use it

* Changes on v4
- use full gfp_mask
- use memalloc_nocma_{save,restore} to exclude CMA memory
- separate __GFP_RECLAIM handling for THP allocation
- remove more wrapper functions

* Changes on v3
- As Vlastimil suggested, do not introduce alloc_control for hugetlb functions
- do not change the signature of migrate_pages()
- rename alloc_control to migration_target_control

* Changes on v2
- add acked-by tags
- fix missing compound_head() call for patch #3
- remove thisnode field on alloc_control and use __GFP_THISNODE directly
- fix missing __gfp_mask setup for patch
"mm/hugetlb: do not modify user provided gfp_mask"

* Cover-letter

Contributions of this patchset are:

1. unify two hugetlb alloc functions. As a result, only one remains.
2. remove one implementation for migration target allocation.
3. remove three wrappers for migration target allocation.

The patchset is based on next-20200703 + revert following commits.
ddc017c727e429488cccd401a7794c8152e50a5b~1..583c2617fd3244fff79ba3b445964884c5cd7780

The patchset is available on:

https://github.com/JoonsooKim/linux/tree/cleanup-migration-target-allocation-v5.00-next-20200703

Thanks.
Joonsoo Kim (9):
  mm/page_isolation: prefer the node of the source page
  mm/migrate: move migration helper from .h to .c
  mm/hugetlb: unify migration callbacks
  mm/migrate: clear __GFP_RECLAIM to make the migration callback
consistent with regular THP allocations
  mm/migrate: make a standard migration target allocation function
  mm/mempolicy: use a standard migration target allocation callback
  mm/page_alloc: remove a wrapper for alloc_migration_target()
  mm/memory-failure: remove a wrapper for alloc_migration_target()
  mm/memory_hotplug: remove a wrapper for alloc_migration_target()

 include/linux/hugetlb.h | 41 +++
 include/linux/migrate.h | 34 ++---
 mm/hugetlb.c| 35 ++---
 mm/internal.h   |  8 +++-
 mm/memory-failure.c | 15 ++-
 mm/memory_hotplug.c | 42 +---
 mm/mempolicy.c  | 29 ++--
 mm/migrate.c| 51 +++--
 mm/page_alloc.c |  8 ++--
 mm/page_isolation.c |  5 -
 10 files changed, 137 insertions(+), 131 deletions(-)

-- 
2.7.4



[PATCH v5 1/9] mm/page_isolation: prefer the node of the source page

2020-07-13 Thread js1304
From: Joonsoo Kim 

For locality, it's better to migrate the page to the same node as the
source page rather than to the node of the current caller's CPU.

Acked-by: Roman Gushchin 
Acked-by: Michal Hocko 
Reviewed-by: Vlastimil Babka 
Signed-off-by: Joonsoo Kim 
---
 mm/page_isolation.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index f6d07c5..aec26d9 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -309,5 +309,7 @@ int test_pages_isolated(unsigned long start_pfn, unsigned 
long end_pfn,
 
 struct page *alloc_migrate_target(struct page *page, unsigned long private)
 {
-   return new_page_nodemask(page, numa_node_id(), &node_states[N_MEMORY]);
+   int nid = page_to_nid(page);
+
+   return new_page_nodemask(page, nid, &node_states[N_MEMORY]);
 }
-- 
2.7.4



[PATCH v5 4/9] mm/migrate: clear __GFP_RECLAIM to make the migration callback consistent with regular THP allocations

2020-07-13 Thread js1304
From: Joonsoo Kim 

new_page_nodemask is a migration callback and it tries to use a common
set of gfp flags for the target page allocation whether it is a base page
or a THP. The latter only adds GFP_TRANSHUGE to the given mask. This
results in the allocation being slightly more aggressive than necessary
because the resulting gfp mask will also contain __GFP_KSWAPD_RECLAIM. THP
allocations usually exclude this flag to reduce over-eager background
reclaim during a high THP allocation load, which has been seen during
large mmaps initialization. There is no indication that this is a
problem for migration as well, but theoretically the same might happen
when migrating large mappings to a different node. Make the migration
callback consistent with regular THP allocations.
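
Concretely, only the kswapd-reclaim bit differs; an illustrative sketch of
the flag arithmetic (assuming the current definitions of GFP_TRANSHUGE and
__GFP_RECLAIM):

    gfp_t base = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL;

    /* before: the base mask keeps __GFP_KSWAPD_RECLAIM set */
    gfp_t before = base | GFP_TRANSHUGE;

    /* after: direct reclaim only, like a regular THP fault */
    gfp_t after = (base & ~__GFP_RECLAIM) | GFP_TRANSHUGE;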

Signed-off-by: Joonsoo Kim 
---
 mm/migrate.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/mm/migrate.c b/mm/migrate.c
index 3b3d918..1cfc965 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1547,6 +1547,11 @@ struct page *new_page_nodemask(struct page *page,
}
 
if (PageTransHuge(page)) {
+   /*
+* clear __GFP_RECLAIM to make the migration callback
+* consistent with regular THP allocations.
+*/
+   gfp_mask &= ~__GFP_RECLAIM;
gfp_mask |= GFP_TRANSHUGE;
order = HPAGE_PMD_ORDER;
}
-- 
2.7.4



[PATCH v5 3/9] mm/hugetlb: unify migration callbacks

2020-07-13 Thread js1304
From: Joonsoo Kim 

There is no difference between the two migration callback functions,
alloc_huge_page_node() and alloc_huge_page_nodemask(), except for the
__GFP_THISNODE handling. It's redundant to have two almost identical
functions just to handle this flag, so this patch removes one by
introducing a new argument, gfp_mask, to alloc_huge_page_nodemask().

After introducing the gfp_mask argument, it's the caller's job to provide
the correct gfp_mask, so every call site of alloc_huge_page_nodemask() is
changed to provide one.

Note that it's safe to remove the node id check in alloc_huge_page_node()
since there is no caller passing NUMA_NO_NODE as a node id.
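
After the change, a caller that wants to pin the allocation to one node
supplies the constraint itself; a short sketch mirroring the call sites
converted in this patch (h, nid and page are hypothetical locals):

    gfp_t gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE;

    page = alloc_huge_page_nodemask(h, nid, NULL, gfp_mask);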

Reviewed-by: Mike Kravetz 
Acked-by: Michal Hocko 
Reviewed-by: Vlastimil Babka 
Signed-off-by: Joonsoo Kim 
---
 include/linux/hugetlb.h | 26 ++
 mm/hugetlb.c| 35 ++-
 mm/mempolicy.c  | 10 ++
 mm/migrate.c| 11 +++
 4 files changed, 33 insertions(+), 49 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 50650d0..bb93e95 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct ctl_table;
 struct user_struct;
@@ -504,9 +505,8 @@ struct huge_bootmem_page {
 
 struct page *alloc_huge_page(struct vm_area_struct *vma,
unsigned long addr, int avoid_reserve);
-struct page *alloc_huge_page_node(struct hstate *h, int nid);
 struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
-   nodemask_t *nmask);
+   nodemask_t *nmask, gfp_t gfp_mask);
 struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma,
unsigned long address);
 struct page *alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask,
@@ -692,6 +692,15 @@ static inline bool hugepage_movable_supported(struct 
hstate *h)
return true;
 }
 
+/* Movability of hugepages depends on migration support. */
+static inline gfp_t htlb_alloc_mask(struct hstate *h)
+{
+   if (hugepage_movable_supported(h))
+   return GFP_HIGHUSER_MOVABLE;
+   else
+   return GFP_HIGHUSER;
+}
+
 static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
   struct mm_struct *mm, pte_t *pte)
 {
@@ -759,13 +768,9 @@ static inline struct page *alloc_huge_page(struct 
vm_area_struct *vma,
return NULL;
 }
 
-static inline struct page *alloc_huge_page_node(struct hstate *h, int nid)
-{
-   return NULL;
-}
-
 static inline struct page *
-alloc_huge_page_nodemask(struct hstate *h, int preferred_nid, nodemask_t 
*nmask)
+alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
+   nodemask_t *nmask, gfp_t gfp_mask)
 {
return NULL;
 }
@@ -878,6 +883,11 @@ static inline bool hugepage_movable_supported(struct 
hstate *h)
return false;
 }
 
+static inline gfp_t htlb_alloc_mask(struct hstate *h)
+{
+   return 0;
+}
+
 static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
   struct mm_struct *mm, pte_t *pte)
 {
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 7e5ba5c0..3245aa0 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1089,15 +1089,6 @@ static struct page *dequeue_huge_page_nodemask(struct 
hstate *h, gfp_t gfp_mask,
return NULL;
 }
 
-/* Movability of hugepages depends on migration support. */
-static inline gfp_t htlb_alloc_mask(struct hstate *h)
-{
-   if (hugepage_movable_supported(h))
-   return GFP_HIGHUSER_MOVABLE;
-   else
-   return GFP_HIGHUSER;
-}
-
 static struct page *dequeue_huge_page_vma(struct hstate *h,
struct vm_area_struct *vma,
unsigned long address, int avoid_reserve,
@@ -1979,31 +1970,9 @@ struct page *alloc_buddy_huge_page_with_mpol(struct 
hstate *h,
 }
 
 /* page migration callback function */
-struct page *alloc_huge_page_node(struct hstate *h, int nid)
-{
-   gfp_t gfp_mask = htlb_alloc_mask(h);
-   struct page *page = NULL;
-
-   if (nid != NUMA_NO_NODE)
-   gfp_mask |= __GFP_THISNODE;
-
-   spin_lock(&hugetlb_lock);
-   if (h->free_huge_pages - h->resv_huge_pages > 0)
-   page = dequeue_huge_page_nodemask(h, gfp_mask, nid, NULL);
-   spin_unlock(&hugetlb_lock);
-
-   if (!page)
-   page = alloc_migrate_huge_page(h, gfp_mask, nid, NULL);
-
-   return page;
-}
-
-/* page migration callback function */
 struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
-   nodemask_t *nmask)
+   nodemask_t *nmask, gfp_t gfp_mask)
 {
-   gfp_t gfp_mask = htlb_alloc_mask(h);
-
spin_lock(&hugetlb_lock);
if (h->free_huge_pages - h->resv_huge_pages > 0) {
struct 

[PATCH v5 7/9] mm/page_alloc: remove a wrapper for alloc_migration_target()

2020-07-13 Thread js1304
From: Joonsoo Kim 

There is a well-defined standard migration target callback.  Use it
directly.

Acked-by: Michal Hocko 
Acked-by: Vlastimil Babka 
Signed-off-by: Joonsoo Kim 
---
 mm/page_alloc.c |  8 ++--
 mm/page_isolation.c | 10 --
 2 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f3b70ee..6416d08 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8354,6 +8354,10 @@ static int __alloc_contig_migrate_range(struct 
compact_control *cc,
unsigned long pfn = start;
unsigned int tries = 0;
int ret = 0;
+   struct migration_target_control mtc = {
+   .nid = zone_to_nid(cc->zone),
+   .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
+   };
 
migrate_prep();
 
@@ -8380,8 +8384,8 @@ static int __alloc_contig_migrate_range(struct 
compact_control *cc,
&cc->migratepages);
cc->nr_migratepages -= nr_reclaimed;
 
-   ret = migrate_pages(&cc->migratepages, alloc_migrate_target,
-   NULL, 0, cc->mode, MR_CONTIG_RANGE);
+   ret = migrate_pages(&cc->migratepages, alloc_migration_target,
+   NULL, (unsigned long)&mtc, cc->mode, MR_CONTIG_RANGE);
}
if (ret < 0) {
putback_movable_pages(&cc->migratepages);
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index f25c66e..242c031 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -306,13 +306,3 @@ int test_pages_isolated(unsigned long start_pfn, unsigned 
long end_pfn,
 
return pfn < end_pfn ? -EBUSY : 0;
 }
-
-struct page *alloc_migrate_target(struct page *page, unsigned long private)
-{
-   struct migration_target_control mtc = {
-   .nid = page_to_nid(page),
-   .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
-   };
-
-   return alloc_migration_target(page, (unsigned long)&mtc);
-}
-- 
2.7.4



[PATCH v4 11/11] mm/memory_hotplug: remove a wrapper for alloc_migration_target()

2020-07-07 Thread js1304
From: Joonsoo Kim 

To calculate the correct node to which the page should be migrated during
hotplug, we need to check the node id of the page. A wrapper for
alloc_migration_target() exists for this purpose.

However, Vlastimil points out that all migration source pages come from
a single node. In this case, we don't need to check the node id for each
page, and we don't need to re-set the target nodemask for each page through
the wrapper. Set up the migration_target_control once and use it for
all pages.

Signed-off-by: Joonsoo Kim 
---
 mm/memory_hotplug.c | 46 ++
 1 file changed, 22 insertions(+), 24 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 86bc2ad..269e8ca 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1265,27 +1265,6 @@ static int scan_movable_pages(unsigned long start, 
unsigned long end,
return 0;
 }
 
-static struct page *new_node_page(struct page *page, unsigned long private)
-{
-   nodemask_t nmask = node_states[N_MEMORY];
-   struct migration_target_control mtc = {
-   .nid = page_to_nid(page),
-   .nmask = &nmask,
-   .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
-   };
-
-   /*
-* try to allocate from a different node but reuse this node if there
-* are no other online nodes to be used (e.g. we are offlining a part
-* of the only existing node)
-*/
-   node_clear(mtc.nid, *mtc.nmask);
-   if (nodes_empty(*mtc.nmask))
-   node_set(mtc.nid, *mtc.nmask);
-
-   return alloc_migration_target(page, (unsigned long)&mtc);
-}
-
 static int
 do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 {
@@ -1345,9 +1324,28 @@ do_migrate_range(unsigned long start_pfn, unsigned long 
end_pfn)
put_page(page);
}
if (!list_empty(&source)) {
-   /* Allocate a new page from the nearest neighbor node */
-   ret = migrate_pages(&source, new_node_page, NULL, 0,
-   MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
+   nodemask_t nmask = node_states[N_MEMORY];
+   struct migration_target_control mtc = {
+   .nmask = &nmask,
+   .gfp_mask = GFP_USER | __GFP_MOVABLE | 
__GFP_RETRY_MAYFAIL,
+   };
+
+   /*
+* We have checked that migration range is on a single zone so
+* we can use the nid of the first page to all the others.
+*/
+   mtc.nid = page_to_nid(list_first_entry(&source, struct page, 
lru));
+
+   /*
+* try to allocate from a different node but reuse this node
+* if there are no other online nodes to be used (e.g. we are
+* offlining a part of the only existing node)
+*/
+   node_clear(mtc.nid, *mtc.nmask);
+   if (nodes_empty(*mtc.nmask))
+   node_set(mtc.nid, *mtc.nmask);
+   ret = migrate_pages(&source, alloc_migration_target, NULL,
+   (unsigned long)&mtc, MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
if (ret) {
list_for_each_entry(page, &source, lru) {
pr_warn("migrating pfn %lx failed ret:%d ",
-- 
2.7.4



[PATCH v4 10/11] mm/memory-failure: remove a wrapper for alloc_migration_target()

2020-07-07 Thread js1304
From: Joonsoo Kim 

There is a well-defined standard migration target callback. Use it
directly.

Signed-off-by: Joonsoo Kim 
---
 mm/memory-failure.c | 18 ++
 1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 609d42b6..3b89804 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1677,16 +1677,6 @@ int unpoison_memory(unsigned long pfn)
 }
 EXPORT_SYMBOL(unpoison_memory);
 
-static struct page *new_page(struct page *p, unsigned long private)
-{
-   struct migration_target_control mtc = {
-   .nid = page_to_nid(p),
-   .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
-   };
-
-   return alloc_migration_target(p, (unsigned long)&mtc);
-}
-
 /*
  * Safely get reference count of an arbitrary page.
  * Returns 0 for a free page, -EIO for a zero refcount page
@@ -1793,6 +1783,10 @@ static int __soft_offline_page(struct page *page)
const char *msg_page[] = {"page", "hugepage"};
bool huge = PageHuge(page);
LIST_HEAD(pagelist);
+   struct migration_target_control mtc = {
+   .nid = NUMA_NO_NODE,
+   .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
+   };
 
/*
 * Check PageHWPoison again inside page lock because PageHWPoison
@@ -1829,8 +1823,8 @@ static int __soft_offline_page(struct page *page)
}
 
if (isolate_page(hpage, &pagelist)) {
-   ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
-   MIGRATE_SYNC, MR_MEMORY_FAILURE);
+   ret = migrate_pages(&pagelist, alloc_migration_target, NULL,
+   (unsigned long)&mtc, MIGRATE_SYNC, MR_MEMORY_FAILURE);
if (!ret) {
bool release = !huge;
 
-- 
2.7.4



[PATCH v4 09/11] mm/page_alloc: remove a wrapper for alloc_migration_target()

2020-07-07 Thread js1304
From: Joonsoo Kim 

There is a well-defined standard migration target callback.  Use it
directly.

Acked-by: Michal Hocko 
Acked-by: Vlastimil Babka 
Signed-off-by: Joonsoo Kim 
---
 mm/page_alloc.c |  8 ++--
 mm/page_isolation.c | 10 --
 2 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f3b70ee..6416d08 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8354,6 +8354,10 @@ static int __alloc_contig_migrate_range(struct 
compact_control *cc,
unsigned long pfn = start;
unsigned int tries = 0;
int ret = 0;
+   struct migration_target_control mtc = {
+   .nid = zone_to_nid(cc->zone),
+   .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
+   };
 
migrate_prep();
 
@@ -8380,8 +8384,8 @@ static int __alloc_contig_migrate_range(struct 
compact_control *cc,
&cc->migratepages);
cc->nr_migratepages -= nr_reclaimed;
 
-   ret = migrate_pages(&cc->migratepages, alloc_migrate_target,
-   NULL, 0, cc->mode, MR_CONTIG_RANGE);
+   ret = migrate_pages(&cc->migratepages, alloc_migration_target,
+   NULL, (unsigned long)&mtc, cc->mode, MR_CONTIG_RANGE);
}
if (ret < 0) {
putback_movable_pages(&cc->migratepages);
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index f25c66e..242c031 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -306,13 +306,3 @@ int test_pages_isolated(unsigned long start_pfn, unsigned 
long end_pfn,
 
return pfn < end_pfn ? -EBUSY : 0;
 }
-
-struct page *alloc_migrate_target(struct page *page, unsigned long private)
-{
-   struct migration_target_control mtc = {
-   .nid = page_to_nid(page),
-   .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
-   };
-
-   return alloc_migration_target(page, (unsigned long)&mtc);
-}
-- 
2.7.4



[PATCH v4 08/11] mm/mempolicy: use a standard migration target allocation callback

2020-07-07 Thread js1304
From: Joonsoo Kim 

There is a well-defined migration target allocation callback.  Use it.

Acked-by: Michal Hocko 
Acked-by: Vlastimil Babka 
Signed-off-by: Joonsoo Kim 
---
 mm/internal.h  |  1 -
 mm/mempolicy.c | 31 ++-
 mm/migrate.c   |  8 ++--
 3 files changed, 12 insertions(+), 28 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 3236fef..6205d8a 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -613,7 +613,6 @@ static inline bool is_migrate_highatomic_page(struct page 
*page)
 }
 
 void setup_zone_pageset(struct zone *zone);
-extern struct page *alloc_new_node_page(struct page *page, unsigned long node);
 
 struct migration_target_control {
int nid;/* preferred node id */
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 667b453..93fcfc1 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1065,29 +1065,6 @@ static int migrate_page_add(struct page *page, struct 
list_head *pagelist,
return 0;
 }
 
-/* page allocation callback for NUMA node migration */
-struct page *alloc_new_node_page(struct page *page, unsigned long node)
-{
-   if (PageHuge(page)) {
-   struct hstate *h = page_hstate(compound_head(page));
-   gfp_t gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE;
-
-   return alloc_huge_page_nodemask(h, node, NULL, gfp_mask, false);
-   } else if (PageTransHuge(page)) {
-   struct page *thp;
-
-   thp = alloc_pages_node(node,
-   (GFP_TRANSHUGE | __GFP_THISNODE),
-   HPAGE_PMD_ORDER);
-   if (!thp)
-   return NULL;
-   prep_transhuge_page(thp);
-   return thp;
-   } else
-   return __alloc_pages_node(node, GFP_HIGHUSER_MOVABLE |
-   __GFP_THISNODE, 0);
-}
-
 /*
  * Migrate pages from one node to a target node.
  * Returns error or the number of pages not migrated.
@@ -1098,6 +1075,10 @@ static int migrate_to_node(struct mm_struct *mm, int 
source, int dest,
nodemask_t nmask;
LIST_HEAD(pagelist);
int err = 0;
+   struct migration_target_control mtc = {
+   .nid = dest,
+   .gfp_mask = GFP_HIGHUSER_MOVABLE | __GFP_THISNODE,
+   };
 
nodes_clear(nmask);
node_set(source, nmask);
@@ -1112,8 +1093,8 @@ static int migrate_to_node(struct mm_struct *mm, int 
source, int dest,
flags | MPOL_MF_DISCONTIG_OK, &pagelist);
 
if (!list_empty(&pagelist)) {
-   err = migrate_pages(&pagelist, alloc_new_node_page, NULL, dest,
-   MIGRATE_SYNC, MR_SYSCALL);
+   err = migrate_pages(&pagelist, alloc_migration_target, NULL,
+   (unsigned long)&mtc, MIGRATE_SYNC, MR_SYSCALL);
if (err)
putback_movable_pages(&pagelist);
}
diff --git a/mm/migrate.c b/mm/migrate.c
index ab18b9c..b7eac38 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1599,9 +1599,13 @@ static int do_move_pages_to_node(struct mm_struct *mm,
struct list_head *pagelist, int node)
 {
int err;
+   struct migration_target_control mtc = {
+   .nid = node,
+   .gfp_mask = GFP_HIGHUSER_MOVABLE | __GFP_THISNODE,
+   };
 
-   err = migrate_pages(pagelist, alloc_new_node_page, NULL, node,
-   MIGRATE_SYNC, MR_SYSCALL);
+   err = migrate_pages(pagelist, alloc_migration_target, NULL,
+			(unsigned long)&mtc, MIGRATE_SYNC, MR_SYSCALL);
if (err)
putback_movable_pages(pagelist);
return err;
-- 
2.7.4



[PATCH v4 00/11] clean-up the migration target allocation functions

2020-07-07 Thread js1304
From: Joonsoo Kim 

This patchset cleans up the migration target allocation functions.

* Changes on v4
- use full gfp_mask
- use memalloc_nocma_{save,restore} to exclude CMA memory
- separate __GFP_RECLAIM handling for THP allocation
- remove more wrapper functions

* Changes on v3
- As Vlastimil suggested, do not introduce alloc_control for hugetlb functions
- do not change the signature of migrate_pages()
- rename alloc_control to migration_target_control

* Changes on v2
- add acked-by tags
- fix missing compound_head() call for patch #3
- remove thisnode field on alloc_control and use __GFP_THISNODE directly
- fix missing __gfp_mask setup for patch
"mm/hugetlb: do not modify user provided gfp_mask"

* Cover-letter

Contributions of this patchset are:
1. unify two hugetlb alloc functions. As a result, one remains.
2. make one external hugetlb alloc function internal.
3. unify three functions for migration target allocation.

The patchset is based on next-20200703 + revert v3 of this patchset.

git revert 
ddc017c727e429488cccd401a7794c8152e50a5b~1..583c2617fd3244fff79ba3b445964884c5cd7780

The patchset is available on:

https://github.com/JoonsooKim/linux/tree/cleanup-migration-target-allocation-v4.00-next-20200703

Thanks.

Joonsoo Kim (11):
  mm/page_isolation: prefer the node of the source page
  mm/migrate: move migration helper from .h to .c
  mm/hugetlb: unify migration callbacks
  mm/hugetlb: make hugetlb migration callback CMA aware
  mm/migrate: clear __GFP_RECLAIM for THP allocation for migration
  mm/migrate: make a standard migration target allocation function
  mm/gup: use a standard migration target allocation callback
  mm/mempolicy: use a standard migration target allocation callback
  mm/page_alloc: remove a wrapper for alloc_migration_target()
  mm/memory-failure: remove a wrapper for alloc_migration_target()
  mm/memory_hotplug: remove a wrapper for alloc_migration_target()

 include/linux/hugetlb.h | 28 ---
 include/linux/migrate.h | 34 +--
 mm/gup.c| 60 +
 mm/hugetlb.c| 71 +++--
 mm/internal.h   |  9 ++-
 mm/memory-failure.c | 15 +--
 mm/memory_hotplug.c | 42 +++--
 mm/mempolicy.c  | 29 +---
 mm/migrate.c| 59 ++--
 mm/page_alloc.c |  8 --
 mm/page_isolation.c |  5 
 11 files changed, 163 insertions(+), 197 deletions(-)

-- 
2.7.4



[PATCH v4 01/11] mm/page_isolation: prefer the node of the source page

2020-07-07 Thread js1304
From: Joonsoo Kim 

For locality, it's better to migrate the page to the same node rather than
the node of the current caller's cpu.

Acked-by: Roman Gushchin 
Acked-by: Michal Hocko 
Reviewed-by: Vlastimil Babka 
Signed-off-by: Joonsoo Kim 
---
 mm/page_isolation.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index f6d07c5..aec26d9 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -309,5 +309,7 @@ int test_pages_isolated(unsigned long start_pfn, unsigned 
long end_pfn,
 
 struct page *alloc_migrate_target(struct page *page, unsigned long private)
 {
-	return new_page_nodemask(page, numa_node_id(), &node_states[N_MEMORY]);
+	int nid = page_to_nid(page);
+
+	return new_page_nodemask(page, nid, &node_states[N_MEMORY]);
 }
-- 
2.7.4



[PATCH v4 07/11] mm/gup: use a standard migration target allocation callback

2020-07-07 Thread js1304
From: Joonsoo Kim 

There is a well-defined migration target allocation callback.  It's mostly
similar to new_non_cma_page() except for considering CMA pages.

This patch adds a CMA consideration to the standard migration target
allocation callback and uses it in gup.c.
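
As a rough sketch (condensed from the hunks below, with a hypothetical function
name; the matching memalloc_nocma_restore() after the allocation is implied,
since the mm/migrate.c hunk is truncated in this digest), the skip_cma flag is
honoured in two places: the hugetlb dequeue path and the normal allocation
context.

/* Hypothetical condensation of the skip_cma handling added here. */
static struct page *alloc_target_sketch(struct page *page,
					struct migration_target_control *mtc)
{
	unsigned int flags = 0;
	struct page *new_page;

	if (PageHuge(page)) {
		struct hstate *h = page_hstate(compound_head(page));

		/* hugetlb: the dequeue path itself must skip CMA pages. */
		return alloc_huge_page_nodemask(h, mtc->nid, mtc->nmask,
				mtc->gfp_mask | htlb_alloc_mask(h),
				mtc->skip_cma);
	}

	if (mtc->skip_cma)	/* forbid CMA pageblocks in this context */
		flags = memalloc_nocma_save();

	new_page = __alloc_pages_nodemask(mtc->gfp_mask, 0, mtc->nid, mtc->nmask);

	if (mtc->skip_cma)
		memalloc_nocma_restore(flags);

	return new_page;
}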

Acked-by: Vlastimil Babka 
Signed-off-by: Joonsoo Kim 
---
 mm/gup.c  | 61 +++
 mm/internal.h |  1 +
 mm/migrate.c  |  9 -
 3 files changed, 16 insertions(+), 55 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 2c3dab4..6a74c30 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1608,58 +1608,6 @@ static bool check_dax_vmas(struct vm_area_struct **vmas, 
long nr_pages)
 }
 
 #ifdef CONFIG_CMA
-static struct page *new_non_cma_page(struct page *page, unsigned long private)
-{
-   /*
-* We want to make sure we allocate the new page from the same node
-* as the source page.
-*/
-   int nid = page_to_nid(page);
-   /*
-* Trying to allocate a page for migration. Ignore allocation
-* failure warnings. We don't force __GFP_THISNODE here because
-* this node here is the node where we have CMA reservation and
-* in some case these nodes will have really less non movable
-* allocation memory.
-*/
-   gfp_t gfp_mask = GFP_USER | __GFP_NOWARN;
-
-   if (PageHighMem(page))
-   gfp_mask |= __GFP_HIGHMEM;
-
-#ifdef CONFIG_HUGETLB_PAGE
-   if (PageHuge(page)) {
-   struct hstate *h = page_hstate(page);
-
-   /*
-* We don't want to dequeue from the pool because pool pages 
will
-* mostly be from the CMA region.
-*/
-   return alloc_huge_page_nodemask(h, nid, NULL, gfp_mask, true);
-   }
-#endif
-   if (PageTransHuge(page)) {
-   struct page *thp;
-   /*
-* ignore allocation failure warnings
-*/
-   gfp_t thp_gfpmask = GFP_TRANSHUGE | __GFP_NOWARN;
-
-   /*
-* Remove the movable mask so that we don't allocate from
-* CMA area again.
-*/
-   thp_gfpmask &= ~__GFP_MOVABLE;
-   thp = __alloc_pages_node(nid, thp_gfpmask, HPAGE_PMD_ORDER);
-   if (!thp)
-   return NULL;
-   prep_transhuge_page(thp);
-   return thp;
-   }
-
-   return __alloc_pages_node(nid, gfp_mask, 0);
-}
-
 static long check_and_migrate_cma_pages(struct task_struct *tsk,
struct mm_struct *mm,
unsigned long start,
@@ -1674,6 +1622,11 @@ static long check_and_migrate_cma_pages(struct 
task_struct *tsk,
bool migrate_allow = true;
LIST_HEAD(cma_page_list);
long ret = nr_pages;
+   struct migration_target_control mtc = {
+   .nid = NUMA_NO_NODE,
+   .gfp_mask = GFP_USER | __GFP_NOWARN,
+   .skip_cma = true,
+   };
 
 check_again:
for (i = 0; i < nr_pages;) {
@@ -1719,8 +1672,8 @@ static long check_and_migrate_cma_pages(struct 
task_struct *tsk,
for (i = 0; i < nr_pages; i++)
put_page(pages[i]);
 
-	if (migrate_pages(&cma_page_list, new_non_cma_page,
-			  NULL, 0, MIGRATE_SYNC, MR_CONTIG_RANGE)) {
+	if (migrate_pages(&cma_page_list, alloc_migration_target, NULL,
+			  (unsigned long)&mtc, MIGRATE_SYNC, MR_CONTIG_RANGE)) {
/*
 * some of the pages failed migration. Do get_user_pages
 * without migration.
diff --git a/mm/internal.h b/mm/internal.h
index 0beacf3..3236fef 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -619,6 +619,7 @@ struct migration_target_control {
int nid;/* preferred node id */
nodemask_t *nmask;
gfp_t gfp_mask;
+   bool skip_cma;
 };
 
 #endif /* __MM_INTERNAL_H */
diff --git a/mm/migrate.c b/mm/migrate.c
index 00cd81c..ab18b9c 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1539,6 +1539,7 @@ struct page *alloc_migration_target(struct page *page, 
unsigned long private)
struct page *new_page = NULL;
int nid;
int zidx;
+   unsigned int flags = 0;
 
mtc = (struct migration_target_control *)private;
gfp_mask = mtc->gfp_mask;
@@ -1551,9 +1552,12 @@ struct page *alloc_migration_target(struct page *page, 
unsigned long private)
 
gfp_mask |= htlb_alloc_mask(h);
return alloc_huge_page_nodemask(h, nid, mtc->nmask,
-   gfp_mask, false);
+   gfp_mask, mtc->skip_cma);
}
 
+   if (mtc->skip_cma)
+   flags = memalloc_nocma_save();
+
if (PageTransHuge(page)) {
  

[PATCH v4 05/11] mm/migrate: clear __GFP_RECLAIM for THP allocation for migration

2020-07-07 Thread js1304
From: Joonsoo Kim 

In mm/migrate.c, THP allocation for migration is called with the provided
gfp_mask | GFP_TRANSHUGE. This gfp_mask contains __GFP_RECLAIM and it
conflicts with the intention of GFP_TRANSHUGE.

GFP_TRANSHUGE/GFP_TRANSHUGE_LIGHT were introduced to control the reclaim
behaviour in a well-defined manner, since the overhead of THP allocation
is quite large and the whole system could suffer from it. So, they deal
with the __GFP_RECLAIM mask deliberately. If gfp_mask contains
__GFP_RECLAIM and gfp_mask | GFP_TRANSHUGE(_LIGHT) is used for THP
allocation, the purpose of GFP_TRANSHUGE(_LIGHT) is defeated.

This patch fixes this situation by clearing __GFP_RECLAIM in the provided
gfp_mask. Note that there are some other THP allocations for migration
and they just use GFP_TRANSHUGE(_LIGHT) directly. This patch makes
all THP allocations for migration consistent.
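
For reference, the conflict comes from how the masks are composed in
include/linux/gfp.h: GFP_TRANSHUGE_LIGHT masks out __GFP_RECLAIM entirely,
GFP_TRANSHUGE adds back only __GFP_DIRECT_RECLAIM, and __GFP_RECLAIM itself is
__GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM. A minimal sketch of the fix
(hypothetical helper name):

static gfp_t thp_migration_gfp(gfp_t gfp_mask)
{
	/* Drop whatever reclaim bits the caller passed in ... */
	gfp_mask &= ~__GFP_RECLAIM;
	/* ... so only the reclaim policy encoded in GFP_TRANSHUGE survives. */
	return gfp_mask | GFP_TRANSHUGE;
}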

Signed-off-by: Joonsoo Kim 
---
 mm/migrate.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/mm/migrate.c b/mm/migrate.c
index 02b31fe..ecd7615 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1547,6 +1547,11 @@ struct page *new_page_nodemask(struct page *page,
}
 
if (PageTransHuge(page)) {
+   /*
+* clear __GFP_RECLAIM since GFP_TRANSHUGE is the gfp_mask
+* that chooses the reclaim masks deliberately.
+*/
+   gfp_mask &= ~__GFP_RECLAIM;
gfp_mask |= GFP_TRANSHUGE;
order = HPAGE_PMD_ORDER;
}
-- 
2.7.4



[PATCH v4 06/11] mm/migrate: make a standard migration target allocation function

2020-07-07 Thread js1304
From: Joonsoo Kim 

There are some similar functions for migration target allocation.  Since
there is no fundamental difference, it's better to keep just one rather
than keeping all variants.  This patch implements base migration target
allocation function.  In the following patches, variants will be converted
to use this function.

Changes should be mechanical but there are some differences. First, some
callers' nodemask is assigned NULL, since a NULL nodemask is considered
as all available nodes, that is, &node_states[N_MEMORY]. Second, for
hugetlb page allocation, gfp_mask is ORed in, since a user can now
provide a gfp_mask. Third, if the provided node id is NUMA_NO_NODE, the
node id is set to the node where the migration source lives.

Note that the PageHighMem() call in the previous function is changed to
open-coded "is_highmem_idx()" since it provides more readability.

Acked-by: Vlastimil Babka 
Signed-off-by: Joonsoo Kim 
---
 include/linux/migrate.h |  9 +
 mm/internal.h   |  7 +++
 mm/memory-failure.c |  7 +--
 mm/memory_hotplug.c | 14 +-
 mm/migrate.c| 27 +--
 mm/page_isolation.c |  7 +--
 6 files changed, 48 insertions(+), 23 deletions(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 1d70b4a..cc56f0d 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -10,6 +10,8 @@
 typedef struct page *new_page_t(struct page *page, unsigned long private);
 typedef void free_page_t(struct page *page, unsigned long private);
 
+struct migration_target_control;
+
 /*
  * Return values from addresss_space_operations.migratepage():
  * - negative errno on page migration failure;
@@ -39,8 +41,7 @@ extern int migrate_page(struct address_space *mapping,
enum migrate_mode mode);
 extern int migrate_pages(struct list_head *l, new_page_t new, free_page_t free,
unsigned long private, enum migrate_mode mode, int reason);
-extern struct page *new_page_nodemask(struct page *page,
-   int preferred_nid, nodemask_t *nodemask);
+extern struct page *alloc_migration_target(struct page *page, unsigned long 
private);
 extern int isolate_movable_page(struct page *page, isolate_mode_t mode);
 extern void putback_movable_page(struct page *page);
 
@@ -59,8 +60,8 @@ static inline int migrate_pages(struct list_head *l, 
new_page_t new,
free_page_t free, unsigned long private, enum migrate_mode mode,
int reason)
{ return -ENOSYS; }
-static inline struct page *new_page_nodemask(struct page *page,
-   int preferred_nid, nodemask_t *nodemask)
+static inline struct page *alloc_migration_target(struct page *page,
+   unsigned long private)
{ return NULL; }
 static inline int isolate_movable_page(struct page *page, isolate_mode_t mode)
{ return -EBUSY; }
diff --git a/mm/internal.h b/mm/internal.h
index dd14c53..0beacf3 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -614,4 +614,11 @@ static inline bool is_migrate_highatomic_page(struct page 
*page)
 
 void setup_zone_pageset(struct zone *zone);
 extern struct page *alloc_new_node_page(struct page *page, unsigned long node);
+
+struct migration_target_control {
+   int nid;/* preferred node id */
+   nodemask_t *nmask;
+   gfp_t gfp_mask;
+};
+
 #endif /* __MM_INTERNAL_H */
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index c5e4cee..609d42b6 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1679,9 +1679,12 @@ EXPORT_SYMBOL(unpoison_memory);
 
 static struct page *new_page(struct page *p, unsigned long private)
 {
-   int nid = page_to_nid(p);
+   struct migration_target_control mtc = {
+   .nid = page_to_nid(p),
+   .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
+   };
 
-	return new_page_nodemask(p, nid, &node_states[N_MEMORY]);
+	return alloc_migration_target(p, (unsigned long)&mtc);
 }
 
 /*
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index cafe65eb..86bc2ad 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1267,19 +1267,23 @@ static int scan_movable_pages(unsigned long start, 
unsigned long end,
 
 static struct page *new_node_page(struct page *page, unsigned long private)
 {
-   int nid = page_to_nid(page);
nodemask_t nmask = node_states[N_MEMORY];
+   struct migration_target_control mtc = {
+   .nid = page_to_nid(page),
+   .nmask = ,
+   .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
+   };
 
/*
 * try to allocate from a different node but reuse this node if there
 * are no other online nodes to be used (e.g. we are offlining a part
 * of the only existing node)
 */
-   node_clear(nid, nmask);
-   if (nodes_empty(nmask))
-   node_set(nid, nmask);
+   node_clear(mtc.nid, *mtc.nmask);

[PATCH v4 04/11] mm/hugetlb: make hugetlb migration callback CMA aware

2020-07-07 Thread js1304
From: Joonsoo Kim 

new_non_cma_page() in gup.c, which tries to allocate a migration target
page, requires that the new page is not on the CMA area.
new_non_cma_page() implements this by removing the __GFP_MOVABLE flag.
This way works well for a THP page or a normal page but not for a hugetlb
page.

The hugetlb page allocation process consists of two steps.  First is
dequeuing from the pool.  Second is, if there is no available page on
the queue, allocating from the page allocator.

new_non_cma_page() can control allocation from the page allocator by
specifying the correct gfp flags.  However, dequeuing could not be
controlled until now, so new_non_cma_page() skips dequeuing completely.
This is suboptimal since new_non_cma_page() cannot utilize hugetlb pages
on the queue, so this patch tries to fix this situation.

This patch makes the dequeue function of hugetlb CMA aware and skips
CMA pages if the newly added skip_cma argument is passed as true (see
the sketch below).
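
A sketch of the resulting two-step flow (hypothetical wrapper name; the real
code lives in alloc_huge_page_nodemask() and dequeue_huge_page_nodemask() as
shown in the hunks below, and locking details are simplified):

static struct page *hugetlb_migration_target(struct hstate *h, int nid,
					     nodemask_t *nmask, gfp_t gfp_mask,
					     bool skip_cma)
{
	struct page *page = NULL;

	/* Step 1: try the free pool, now able to skip CMA-backed pages. */
	spin_lock(&hugetlb_lock);
	if (h->free_huge_pages - h->resv_huge_pages > 0)
		page = dequeue_huge_page_nodemask(h, gfp_mask, nid, nmask, skip_cma);
	spin_unlock(&hugetlb_lock);

	/* Step 2: fall back to a fresh allocation from the page allocator,
	 * where the caller's gfp flags already control CMA avoidance. */
	if (!page)
		page = alloc_migrate_huge_page(h, gfp_mask, nid, nmask);

	return page;
}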

Acked-by: Mike Kravetz 
Signed-off-by: Joonsoo Kim 
---
 include/linux/hugetlb.h |  6 ++
 mm/gup.c|  3 ++-
 mm/hugetlb.c| 46 ++
 mm/mempolicy.c  |  2 +-
 mm/migrate.c|  2 +-
 5 files changed, 36 insertions(+), 23 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index bb93e95..5a9ddf1 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -506,11 +506,9 @@ struct huge_bootmem_page {
 struct page *alloc_huge_page(struct vm_area_struct *vma,
unsigned long addr, int avoid_reserve);
 struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
-   nodemask_t *nmask, gfp_t gfp_mask);
+   nodemask_t *nmask, gfp_t gfp_mask, bool 
skip_cma);
 struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma,
unsigned long address);
-struct page *alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask,
-int nid, nodemask_t *nmask);
 int huge_add_to_page_cache(struct page *page, struct address_space *mapping,
pgoff_t idx);
 
@@ -770,7 +768,7 @@ static inline struct page *alloc_huge_page(struct 
vm_area_struct *vma,
 
 static inline struct page *
 alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
-   nodemask_t *nmask, gfp_t gfp_mask)
+   nodemask_t *nmask, gfp_t gfp_mask, bool skip_cma)
 {
return NULL;
 }
diff --git a/mm/gup.c b/mm/gup.c
index 5daadae..2c3dab4 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1630,11 +1630,12 @@ static struct page *new_non_cma_page(struct page *page, 
unsigned long private)
 #ifdef CONFIG_HUGETLB_PAGE
if (PageHuge(page)) {
struct hstate *h = page_hstate(page);
+
/*
 * We don't want to dequeue from the pool because pool pages 
will
 * mostly be from the CMA region.
 */
-   return alloc_migrate_huge_page(h, gfp_mask, nid, NULL);
+   return alloc_huge_page_nodemask(h, nid, NULL, gfp_mask, true);
}
 #endif
if (PageTransHuge(page)) {
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3245aa0..bcf4abe 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1033,13 +1034,18 @@ static void enqueue_huge_page(struct hstate *h, struct 
page *page)
h->free_huge_pages_node[nid]++;
 }
 
-static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
+static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid, 
bool skip_cma)
 {
struct page *page;
 
-   list_for_each_entry(page, >hugepage_freelists[nid], lru)
+   list_for_each_entry(page, >hugepage_freelists[nid], lru) {
+   if (skip_cma && is_migrate_cma_page(page))
+   continue;
+
if (!PageHWPoison(page))
break;
+   }
+
/*
 * if 'non-isolated free hugepage' not found on the list,
 * the allocation fails.
@@ -1054,7 +1060,7 @@ static struct page *dequeue_huge_page_node_exact(struct 
hstate *h, int nid)
 }
 
 static struct page *dequeue_huge_page_nodemask(struct hstate *h, gfp_t 
gfp_mask, int nid,
-   nodemask_t *nmask)
+   nodemask_t *nmask, bool skip_cma)
 {
unsigned int cpuset_mems_cookie;
struct zonelist *zonelist;
@@ -1079,7 +1085,7 @@ static struct page *dequeue_huge_page_nodemask(struct 
hstate *h, gfp_t gfp_mask,
continue;
node = zone_to_nid(zone);
 
-   page = dequeue_huge_page_node_exact(h, node);
+   page = dequeue_huge_page_node_exact(h, node, skip_cma);
if (page)
return page;
}
@@ -1115,7 +1121,7 @@ static struct page 

[PATCH v4 03/11] mm/hugetlb: unify migration callbacks

2020-07-07 Thread js1304
From: Joonsoo Kim 

There is no difference between the two migration callback functions,
alloc_huge_page_node() and alloc_huge_page_nodemask(), except their
__GFP_THISNODE handling. It's redundant to have two almost similar
functions in order to handle this flag. So, this patch tries to
remove one by introducing a new argument, gfp_mask, to
alloc_huge_page_nodemask().

After introducing the gfp_mask argument, it's the caller's job to provide
the correct gfp_mask. So, every callsite of alloc_huge_page_nodemask()
is changed to provide a gfp_mask.

Note that it's safe to remove a node id check in alloc_huge_page_node()
since there is no caller passing NUMA_NO_NODE as a node id.
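
For example, the NUMA-migration callsite that used to call
alloc_huge_page_node(h, node) now builds the mask itself. The helper name
below is hypothetical; the mask composition mirrors the mm/mempolicy.c change
later in this series.

static struct page *huge_target_on_node(struct page *page, int node)
{
	struct hstate *h = page_hstate(compound_head(page));
	gfp_t gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE;

	/* The caller, not the callee, now decides the gfp policy. */
	return alloc_huge_page_nodemask(h, node, NULL, gfp_mask);
}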

Reviewed-by: Mike Kravetz 
Signed-off-by: Joonsoo Kim 
---
 include/linux/hugetlb.h | 26 ++
 mm/hugetlb.c| 35 ++-
 mm/mempolicy.c  | 10 ++
 mm/migrate.c| 11 +++
 4 files changed, 33 insertions(+), 49 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 50650d0..bb93e95 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct ctl_table;
 struct user_struct;
@@ -504,9 +505,8 @@ struct huge_bootmem_page {
 
 struct page *alloc_huge_page(struct vm_area_struct *vma,
unsigned long addr, int avoid_reserve);
-struct page *alloc_huge_page_node(struct hstate *h, int nid);
 struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
-   nodemask_t *nmask);
+   nodemask_t *nmask, gfp_t gfp_mask);
 struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma,
unsigned long address);
 struct page *alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask,
@@ -692,6 +692,15 @@ static inline bool hugepage_movable_supported(struct 
hstate *h)
return true;
 }
 
+/* Movability of hugepages depends on migration support. */
+static inline gfp_t htlb_alloc_mask(struct hstate *h)
+{
+   if (hugepage_movable_supported(h))
+   return GFP_HIGHUSER_MOVABLE;
+   else
+   return GFP_HIGHUSER;
+}
+
 static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
   struct mm_struct *mm, pte_t *pte)
 {
@@ -759,13 +768,9 @@ static inline struct page *alloc_huge_page(struct 
vm_area_struct *vma,
return NULL;
 }
 
-static inline struct page *alloc_huge_page_node(struct hstate *h, int nid)
-{
-   return NULL;
-}
-
 static inline struct page *
-alloc_huge_page_nodemask(struct hstate *h, int preferred_nid, nodemask_t 
*nmask)
+alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
+   nodemask_t *nmask, gfp_t gfp_mask)
 {
return NULL;
 }
@@ -878,6 +883,11 @@ static inline bool hugepage_movable_supported(struct 
hstate *h)
return false;
 }
 
+static inline gfp_t htlb_alloc_mask(struct hstate *h)
+{
+   return 0;
+}
+
 static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
   struct mm_struct *mm, pte_t *pte)
 {
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 7e5ba5c0..3245aa0 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1089,15 +1089,6 @@ static struct page *dequeue_huge_page_nodemask(struct 
hstate *h, gfp_t gfp_mask,
return NULL;
 }
 
-/* Movability of hugepages depends on migration support. */
-static inline gfp_t htlb_alloc_mask(struct hstate *h)
-{
-   if (hugepage_movable_supported(h))
-   return GFP_HIGHUSER_MOVABLE;
-   else
-   return GFP_HIGHUSER;
-}
-
 static struct page *dequeue_huge_page_vma(struct hstate *h,
struct vm_area_struct *vma,
unsigned long address, int avoid_reserve,
@@ -1979,31 +1970,9 @@ struct page *alloc_buddy_huge_page_with_mpol(struct 
hstate *h,
 }
 
 /* page migration callback function */
-struct page *alloc_huge_page_node(struct hstate *h, int nid)
-{
-   gfp_t gfp_mask = htlb_alloc_mask(h);
-   struct page *page = NULL;
-
-   if (nid != NUMA_NO_NODE)
-   gfp_mask |= __GFP_THISNODE;
-
-	spin_lock(&hugetlb_lock);
-	if (h->free_huge_pages - h->resv_huge_pages > 0)
-		page = dequeue_huge_page_nodemask(h, gfp_mask, nid, NULL);
-	spin_unlock(&hugetlb_lock);
-
-   if (!page)
-   page = alloc_migrate_huge_page(h, gfp_mask, nid, NULL);
-
-   return page;
-}
-
-/* page migration callback function */
 struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
-   nodemask_t *nmask)
+   nodemask_t *nmask, gfp_t gfp_mask)
 {
-   gfp_t gfp_mask = htlb_alloc_mask(h);
-
	spin_lock(&hugetlb_lock);
if (h->free_huge_pages - h->resv_huge_pages > 0) {
struct page *page;
@@ -2031,7 +2000,7 @@ struct page 

[PATCH v4 02/11] mm/migrate: move migration helper from .h to .c

2020-07-07 Thread js1304
From: Joonsoo Kim 

It's not a performance-sensitive function.  Move it to .c.  This is a
preparation step for a future change.

Acked-by: Mike Kravetz 
Acked-by: Michal Hocko 
Reviewed-by: Vlastimil Babka 
Signed-off-by: Joonsoo Kim 
---
 include/linux/migrate.h | 33 +
 mm/migrate.c| 29 +
 2 files changed, 34 insertions(+), 28 deletions(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 3e546cb..1d70b4a 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -31,34 +31,6 @@ enum migrate_reason {
 /* In mm/debug.c; also keep sync with include/trace/events/migrate.h */
 extern const char *migrate_reason_names[MR_TYPES];
 
-static inline struct page *new_page_nodemask(struct page *page,
-   int preferred_nid, nodemask_t *nodemask)
-{
-   gfp_t gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL;
-   unsigned int order = 0;
-   struct page *new_page = NULL;
-
-   if (PageHuge(page))
-   return 
alloc_huge_page_nodemask(page_hstate(compound_head(page)),
-   preferred_nid, nodemask);
-
-   if (PageTransHuge(page)) {
-   gfp_mask |= GFP_TRANSHUGE;
-   order = HPAGE_PMD_ORDER;
-   }
-
-   if (PageHighMem(page) || (zone_idx(page_zone(page)) == ZONE_MOVABLE))
-   gfp_mask |= __GFP_HIGHMEM;
-
-   new_page = __alloc_pages_nodemask(gfp_mask, order,
-   preferred_nid, nodemask);
-
-   if (new_page && PageTransHuge(new_page))
-   prep_transhuge_page(new_page);
-
-   return new_page;
-}
-
 #ifdef CONFIG_MIGRATION
 
 extern void putback_movable_pages(struct list_head *l);
@@ -67,6 +39,8 @@ extern int migrate_page(struct address_space *mapping,
enum migrate_mode mode);
 extern int migrate_pages(struct list_head *l, new_page_t new, free_page_t free,
unsigned long private, enum migrate_mode mode, int reason);
+extern struct page *new_page_nodemask(struct page *page,
+   int preferred_nid, nodemask_t *nodemask);
 extern int isolate_movable_page(struct page *page, isolate_mode_t mode);
 extern void putback_movable_page(struct page *page);
 
@@ -85,6 +59,9 @@ static inline int migrate_pages(struct list_head *l, 
new_page_t new,
free_page_t free, unsigned long private, enum migrate_mode mode,
int reason)
{ return -ENOSYS; }
+static inline struct page *new_page_nodemask(struct page *page,
+   int preferred_nid, nodemask_t *nodemask)
+   { return NULL; }
 static inline int isolate_movable_page(struct page *page, isolate_mode_t mode)
{ return -EBUSY; }
 
diff --git a/mm/migrate.c b/mm/migrate.c
index d105b67..7370a66 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1531,6 +1531,35 @@ int migrate_pages(struct list_head *from, new_page_t 
get_new_page,
return rc;
 }
 
+struct page *new_page_nodemask(struct page *page,
+   int preferred_nid, nodemask_t *nodemask)
+{
+   gfp_t gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL;
+   unsigned int order = 0;
+   struct page *new_page = NULL;
+
+   if (PageHuge(page))
+   return alloc_huge_page_nodemask(
+   page_hstate(compound_head(page)),
+   preferred_nid, nodemask);
+
+   if (PageTransHuge(page)) {
+   gfp_mask |= GFP_TRANSHUGE;
+   order = HPAGE_PMD_ORDER;
+   }
+
+   if (PageHighMem(page) || (zone_idx(page_zone(page)) == ZONE_MOVABLE))
+   gfp_mask |= __GFP_HIGHMEM;
+
+   new_page = __alloc_pages_nodemask(gfp_mask, order,
+   preferred_nid, nodemask);
+
+   if (new_page && PageTransHuge(new_page))
+   prep_transhuge_page(new_page);
+
+   return new_page;
+}
+
 #ifdef CONFIG_NUMA
 
 static int store_status(int __user *status, int start, int value, int nr)
-- 
2.7.4



[PATCH v3 6/8] mm/gup: use a standard migration target allocation callback

2020-06-23 Thread js1304
From: Joonsoo Kim 

There is a well-defined migration target allocation callback.
It's mostly similar to new_non_cma_page() except for considering CMA pages.

This patch adds a CMA consideration to the standard migration target
allocation callback and uses it in gup.c.

Signed-off-by: Joonsoo Kim 
---
 mm/gup.c  | 57 -
 mm/internal.h |  1 +
 mm/migrate.c  |  4 +++-
 3 files changed, 12 insertions(+), 50 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 15be281..f6124e3 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1608,56 +1608,15 @@ static bool check_dax_vmas(struct vm_area_struct 
**vmas, long nr_pages)
 }
 
 #ifdef CONFIG_CMA
-static struct page *new_non_cma_page(struct page *page, unsigned long private)
+static struct page *alloc_migration_target_non_cma(struct page *page, unsigned 
long private)
 {
-   /*
-* We want to make sure we allocate the new page from the same node
-* as the source page.
-*/
-   int nid = page_to_nid(page);
-   /*
-* Trying to allocate a page for migration. Ignore allocation
-* failure warnings. We don't force __GFP_THISNODE here because
-* this node here is the node where we have CMA reservation and
-* in some case these nodes will have really less non movable
-* allocation memory.
-*/
-   gfp_t gfp_mask = GFP_USER | __GFP_NOWARN;
-
-   if (PageHighMem(page))
-   gfp_mask |= __GFP_HIGHMEM;
-
-#ifdef CONFIG_HUGETLB_PAGE
-   if (PageHuge(page)) {
-   struct hstate *h = page_hstate(page);
-
-   /*
-* We don't want to dequeue from the pool because pool pages 
will
-* mostly be from the CMA region.
-*/
-   return alloc_huge_page_nodemask(h, nid, NULL, gfp_mask, true);
-   }
-#endif
-   if (PageTransHuge(page)) {
-   struct page *thp;
-   /*
-* ignore allocation failure warnings
-*/
-   gfp_t thp_gfpmask = GFP_TRANSHUGE | __GFP_NOWARN;
-
-   /*
-* Remove the movable mask so that we don't allocate from
-* CMA area again.
-*/
-   thp_gfpmask &= ~__GFP_MOVABLE;
-   thp = __alloc_pages_node(nid, thp_gfpmask, HPAGE_PMD_ORDER);
-   if (!thp)
-   return NULL;
-   prep_transhuge_page(thp);
-   return thp;
-   }
+   struct migration_target_control mtc = {
+   .nid = page_to_nid(page),
+   .gfp_mask = GFP_USER | __GFP_NOWARN,
+   .skip_cma = true,
+   };
 
-   return __alloc_pages_node(nid, gfp_mask, 0);
+	return alloc_migration_target(page, (unsigned long)&mtc);
 }
 
 static long check_and_migrate_cma_pages(struct task_struct *tsk,
@@ -1719,7 +1678,7 @@ static long check_and_migrate_cma_pages(struct 
task_struct *tsk,
for (i = 0; i < nr_pages; i++)
put_page(pages[i]);
 
-	if (migrate_pages(&cma_page_list, new_non_cma_page,
+	if (migrate_pages(&cma_page_list, alloc_migration_target_non_cma,
  NULL, 0, MIGRATE_SYNC, MR_CONTIG_RANGE)) {
/*
 * some of the pages failed migration. Do get_user_pages
diff --git a/mm/internal.h b/mm/internal.h
index f725aa8..fb7f7fe 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -619,6 +619,7 @@ struct migration_target_control {
int nid;/* preferred node id */
nodemask_t *nmask;
gfp_t gfp_mask;
+   bool skip_cma;
 };
 
 #endif /* __MM_INTERNAL_H */
diff --git a/mm/migrate.c b/mm/migrate.c
index 3afff59..7c4cd74 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1550,7 +1550,7 @@ struct page *alloc_migration_target(struct page *page, 
unsigned long private)
if (PageHuge(page)) {
return alloc_huge_page_nodemask(
page_hstate(compound_head(page)), mtc->nid,
-   mtc->nmask, gfp_mask, false);
+   mtc->nmask, gfp_mask, mtc->skip_cma);
}
 
if (PageTransHuge(page)) {
@@ -1561,6 +1561,8 @@ struct page *alloc_migration_target(struct page *page, 
unsigned long private)
zidx = zone_idx(page_zone(page));
if (is_highmem_idx(zidx) || zidx == ZONE_MOVABLE)
gfp_mask |= __GFP_HIGHMEM;
+   if (mtc->skip_cma)
+   gfp_mask &= ~__GFP_MOVABLE;
 
new_page = __alloc_pages_nodemask(gfp_mask, order,
mtc->nid, mtc->nmask);
-- 
2.7.4



[PATCH v3 2/8] mm/migrate: move migration helper from .h to .c

2020-06-23 Thread js1304
From: Joonsoo Kim 

It's not a performance-sensitive function. Move it to .c.
This is a preparation step for a future change.

Acked-by: Mike Kravetz 
Acked-by: Michal Hocko 
Reviewed-by: Vlastimil Babka 
Signed-off-by: Joonsoo Kim 
---
 include/linux/migrate.h | 33 +
 mm/migrate.c| 29 +
 2 files changed, 34 insertions(+), 28 deletions(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 3e546cb..1d70b4a 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -31,34 +31,6 @@ enum migrate_reason {
 /* In mm/debug.c; also keep sync with include/trace/events/migrate.h */
 extern const char *migrate_reason_names[MR_TYPES];
 
-static inline struct page *new_page_nodemask(struct page *page,
-   int preferred_nid, nodemask_t *nodemask)
-{
-   gfp_t gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL;
-   unsigned int order = 0;
-   struct page *new_page = NULL;
-
-   if (PageHuge(page))
-   return 
alloc_huge_page_nodemask(page_hstate(compound_head(page)),
-   preferred_nid, nodemask);
-
-   if (PageTransHuge(page)) {
-   gfp_mask |= GFP_TRANSHUGE;
-   order = HPAGE_PMD_ORDER;
-   }
-
-   if (PageHighMem(page) || (zone_idx(page_zone(page)) == ZONE_MOVABLE))
-   gfp_mask |= __GFP_HIGHMEM;
-
-   new_page = __alloc_pages_nodemask(gfp_mask, order,
-   preferred_nid, nodemask);
-
-   if (new_page && PageTransHuge(new_page))
-   prep_transhuge_page(new_page);
-
-   return new_page;
-}
-
 #ifdef CONFIG_MIGRATION
 
 extern void putback_movable_pages(struct list_head *l);
@@ -67,6 +39,8 @@ extern int migrate_page(struct address_space *mapping,
enum migrate_mode mode);
 extern int migrate_pages(struct list_head *l, new_page_t new, free_page_t free,
unsigned long private, enum migrate_mode mode, int reason);
+extern struct page *new_page_nodemask(struct page *page,
+   int preferred_nid, nodemask_t *nodemask);
 extern int isolate_movable_page(struct page *page, isolate_mode_t mode);
 extern void putback_movable_page(struct page *page);
 
@@ -85,6 +59,9 @@ static inline int migrate_pages(struct list_head *l, 
new_page_t new,
free_page_t free, unsigned long private, enum migrate_mode mode,
int reason)
{ return -ENOSYS; }
+static inline struct page *new_page_nodemask(struct page *page,
+   int preferred_nid, nodemask_t *nodemask)
+   { return NULL; }
 static inline int isolate_movable_page(struct page *page, isolate_mode_t mode)
{ return -EBUSY; }
 
diff --git a/mm/migrate.c b/mm/migrate.c
index c95912f..6b5c75b 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1536,6 +1536,35 @@ int migrate_pages(struct list_head *from, new_page_t 
get_new_page,
return rc;
 }
 
+struct page *new_page_nodemask(struct page *page,
+   int preferred_nid, nodemask_t *nodemask)
+{
+   gfp_t gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL;
+   unsigned int order = 0;
+   struct page *new_page = NULL;
+
+   if (PageHuge(page))
+   return alloc_huge_page_nodemask(
+   page_hstate(compound_head(page)),
+   preferred_nid, nodemask);
+
+   if (PageTransHuge(page)) {
+   gfp_mask |= GFP_TRANSHUGE;
+   order = HPAGE_PMD_ORDER;
+   }
+
+   if (PageHighMem(page) || (zone_idx(page_zone(page)) == ZONE_MOVABLE))
+   gfp_mask |= __GFP_HIGHMEM;
+
+   new_page = __alloc_pages_nodemask(gfp_mask, order,
+   preferred_nid, nodemask);
+
+   if (new_page && PageTransHuge(new_page))
+   prep_transhuge_page(new_page);
+
+   return new_page;
+}
+
 #ifdef CONFIG_NUMA
 
 static int store_status(int __user *status, int start, int value, int nr)
-- 
2.7.4



[PATCH v3 3/8] mm/hugetlb: unify migration callbacks

2020-06-23 Thread js1304
From: Joonsoo Kim 

There is no difference between the two migration callback functions,
alloc_huge_page_node() and alloc_huge_page_nodemask(), except their
__GFP_THISNODE handling. This patch adds an argument, gfp_mask, to
alloc_huge_page_nodemask() and replaces the callsites of
alloc_huge_page_node() with calls to
alloc_huge_page_nodemask(..., __GFP_THISNODE).

It's safe to remove the node id check in alloc_huge_page_node() since
there is no caller passing NUMA_NO_NODE as a node id.

Signed-off-by: Joonsoo Kim 
---
 include/linux/hugetlb.h | 11 +++
 mm/hugetlb.c| 26 +++---
 mm/mempolicy.c  |  9 +
 mm/migrate.c|  5 +++--
 4 files changed, 14 insertions(+), 37 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 50650d0..8a8b755 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -504,9 +504,8 @@ struct huge_bootmem_page {
 
 struct page *alloc_huge_page(struct vm_area_struct *vma,
unsigned long addr, int avoid_reserve);
-struct page *alloc_huge_page_node(struct hstate *h, int nid);
 struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
-   nodemask_t *nmask);
+   nodemask_t *nmask, gfp_t gfp_mask);
 struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma,
unsigned long address);
 struct page *alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask,
@@ -759,13 +758,9 @@ static inline struct page *alloc_huge_page(struct 
vm_area_struct *vma,
return NULL;
 }
 
-static inline struct page *alloc_huge_page_node(struct hstate *h, int nid)
-{
-   return NULL;
-}
-
 static inline struct page *
-alloc_huge_page_nodemask(struct hstate *h, int preferred_nid, nodemask_t 
*nmask)
+alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
+   nodemask_t *nmask, gfp_t gfp_mask)
 {
return NULL;
 }
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index d54bb7e..bd408f2 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1979,30 +1979,10 @@ struct page *alloc_buddy_huge_page_with_mpol(struct 
hstate *h,
 }
 
 /* page migration callback function */
-struct page *alloc_huge_page_node(struct hstate *h, int nid)
-{
-   gfp_t gfp_mask = htlb_alloc_mask(h);
-   struct page *page = NULL;
-
-   if (nid != NUMA_NO_NODE)
-   gfp_mask |= __GFP_THISNODE;
-
-	spin_lock(&hugetlb_lock);
-	if (h->free_huge_pages - h->resv_huge_pages > 0)
-		page = dequeue_huge_page_nodemask(h, gfp_mask, nid, NULL);
-	spin_unlock(&hugetlb_lock);
-
-   if (!page)
-   page = alloc_migrate_huge_page(h, gfp_mask, nid, NULL);
-
-   return page;
-}
-
-/* page migration callback function */
 struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
-   nodemask_t *nmask)
+   nodemask_t *nmask, gfp_t gfp_mask)
 {
-   gfp_t gfp_mask = htlb_alloc_mask(h);
+   gfp_mask |= htlb_alloc_mask(h);
 
	spin_lock(&hugetlb_lock);
if (h->free_huge_pages - h->resv_huge_pages > 0) {
@@ -2031,7 +2011,7 @@ struct page *alloc_huge_page_vma(struct hstate *h, struct 
vm_area_struct *vma,
 
gfp_mask = htlb_alloc_mask(h);
node = huge_node(vma, address, gfp_mask, , );
-   page = alloc_huge_page_nodemask(h, node, nodemask);
+   page = alloc_huge_page_nodemask(h, node, nodemask, 0);
mpol_cond_put(mpol);
 
return page;
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index b9e85d4..f21cff5 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1068,10 +1068,11 @@ static int migrate_page_add(struct page *page, struct 
list_head *pagelist,
 /* page allocation callback for NUMA node migration */
 struct page *alloc_new_node_page(struct page *page, unsigned long node)
 {
-   if (PageHuge(page))
-   return alloc_huge_page_node(page_hstate(compound_head(page)),
-   node);
-   else if (PageTransHuge(page)) {
+   if (PageHuge(page)) {
+   return alloc_huge_page_nodemask(
+   page_hstate(compound_head(page)), node,
+   NULL, __GFP_THISNODE);
+   } else if (PageTransHuge(page)) {
struct page *thp;
 
thp = alloc_pages_node(node,
diff --git a/mm/migrate.c b/mm/migrate.c
index 6b5c75b..6ca9f0c 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1543,10 +1543,11 @@ struct page *new_page_nodemask(struct page *page,
unsigned int order = 0;
struct page *new_page = NULL;
 
-   if (PageHuge(page))
+   if (PageHuge(page)) {
return alloc_huge_page_nodemask(
page_hstate(compound_head(page)),
-   preferred_nid, nodemask);
+   preferred_nid, nodemask, 0);
+   }
 
if (PageTransHuge(page)) {
  

[PATCH v3 5/8] mm/migrate: make a standard migration target allocation function

2020-06-23 Thread js1304
From: Joonsoo Kim 

There are some similar functions for migration target allocation. Since
there is no fundamental difference, it's better to keep just one rather
than keeping all variants. This patch implements base migration target
allocation function. In the following patches, variants will be converted
to use this function.

Note that the PageHighMem() call in the previous function is changed to
open-coded "is_highmem_idx()" since it provides more readability.

Signed-off-by: Joonsoo Kim 
---
 include/linux/migrate.h |  5 +++--
 mm/internal.h   |  7 +++
 mm/memory-failure.c |  8 ++--
 mm/memory_hotplug.c | 14 +-
 mm/migrate.c| 21 +
 mm/page_isolation.c |  8 ++--
 6 files changed, 44 insertions(+), 19 deletions(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 1d70b4a..5e9c866 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -10,6 +10,8 @@
 typedef struct page *new_page_t(struct page *page, unsigned long private);
 typedef void free_page_t(struct page *page, unsigned long private);
 
+struct migration_target_control;
+
 /*
  * Return values from addresss_space_operations.migratepage():
  * - negative errno on page migration failure;
@@ -39,8 +41,7 @@ extern int migrate_page(struct address_space *mapping,
enum migrate_mode mode);
 extern int migrate_pages(struct list_head *l, new_page_t new, free_page_t free,
unsigned long private, enum migrate_mode mode, int reason);
-extern struct page *new_page_nodemask(struct page *page,
-   int preferred_nid, nodemask_t *nodemask);
+extern struct page *alloc_migration_target(struct page *page, unsigned long 
private);
 extern int isolate_movable_page(struct page *page, isolate_mode_t mode);
 extern void putback_movable_page(struct page *page);
 
diff --git a/mm/internal.h b/mm/internal.h
index 42cf0b6..f725aa8 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -614,4 +614,11 @@ static inline bool is_migrate_highatomic_page(struct page 
*page)
 
 void setup_zone_pageset(struct zone *zone);
 extern struct page *alloc_new_node_page(struct page *page, unsigned long node);
+
+struct migration_target_control {
+   int nid;/* preferred node id */
+   nodemask_t *nmask;
+   gfp_t gfp_mask;
+};
+
 #endif /* __MM_INTERNAL_H */
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 47b8ccb..820ea5e 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1648,9 +1648,13 @@ EXPORT_SYMBOL(unpoison_memory);
 
 static struct page *new_page(struct page *p, unsigned long private)
 {
-   int nid = page_to_nid(p);
+   struct migration_target_control mtc = {
+   .nid = page_to_nid(p),
+		.nmask = &node_states[N_MEMORY],
+   .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
+   };
 
-	return new_page_nodemask(p, nid, &node_states[N_MEMORY]);
+	return alloc_migration_target(p, (unsigned long)&mtc);
 }
 
 /*
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index be3c62e3..d2b65a5 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1259,19 +1259,23 @@ static int scan_movable_pages(unsigned long start, 
unsigned long end,
 
 static struct page *new_node_page(struct page *page, unsigned long private)
 {
-   int nid = page_to_nid(page);
nodemask_t nmask = node_states[N_MEMORY];
+   struct migration_target_control mtc = {
+   .nid = page_to_nid(page),
+   .nmask = ,
+   .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
+   };
 
/*
 * try to allocate from a different node but reuse this node if there
 * are no other online nodes to be used (e.g. we are offlining a part
 * of the only existing node)
 */
-   node_clear(nid, nmask);
-   if (nodes_empty(nmask))
-   node_set(nid, nmask);
+   node_clear(mtc.nid, *mtc.nmask);
+   if (nodes_empty(*mtc.nmask))
+   node_set(mtc.nid, *mtc.nmask);
 
-	return new_page_nodemask(page, nid, &nmask);
+	return alloc_migration_target(page, (unsigned long)&mtc);
 }
 
 static int
diff --git a/mm/migrate.c b/mm/migrate.c
index 634f1ea..3afff59 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1536,29 +1536,34 @@ int migrate_pages(struct list_head *from, new_page_t 
get_new_page,
return rc;
 }
 
-struct page *new_page_nodemask(struct page *page,
-   int preferred_nid, nodemask_t *nodemask)
+struct page *alloc_migration_target(struct page *page, unsigned long private)
 {
-   gfp_t gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL;
+   struct migration_target_control *mtc;
+   gfp_t gfp_mask;
unsigned int order = 0;
struct page *new_page = NULL;
+   int zidx;
+
+   mtc = (struct migration_target_control *)private;
+   gfp_mask = mtc->gfp_mask;
 
if 

[PATCH v3 7/8] mm/mempolicy: use a standard migration target allocation callback

2020-06-23 Thread js1304
From: Joonsoo Kim 

There is a well-defined migration target allocation callback.
Use it.

Signed-off-by: Joonsoo Kim 
---
 mm/internal.h  |  1 -
 mm/mempolicy.c | 30 ++
 mm/migrate.c   |  8 ++--
 3 files changed, 12 insertions(+), 27 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index fb7f7fe..4f9f6b6 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -613,7 +613,6 @@ static inline bool is_migrate_highatomic_page(struct page 
*page)
 }
 
 void setup_zone_pageset(struct zone *zone);
-extern struct page *alloc_new_node_page(struct page *page, unsigned long node);
 
 struct migration_target_control {
int nid;/* preferred node id */
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index a3abf64..85a3f21 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1065,28 +1065,6 @@ static int migrate_page_add(struct page *page, struct 
list_head *pagelist,
return 0;
 }
 
-/* page allocation callback for NUMA node migration */
-struct page *alloc_new_node_page(struct page *page, unsigned long node)
-{
-   if (PageHuge(page)) {
-   return alloc_huge_page_nodemask(
-   page_hstate(compound_head(page)), node,
-   NULL, __GFP_THISNODE, false);
-   } else if (PageTransHuge(page)) {
-   struct page *thp;
-
-   thp = alloc_pages_node(node,
-   (GFP_TRANSHUGE | __GFP_THISNODE),
-   HPAGE_PMD_ORDER);
-   if (!thp)
-   return NULL;
-   prep_transhuge_page(thp);
-   return thp;
-   } else
-   return __alloc_pages_node(node, GFP_HIGHUSER_MOVABLE |
-   __GFP_THISNODE, 0);
-}
-
 /*
  * Migrate pages from one node to a target node.
  * Returns error or the number of pages not migrated.
@@ -1097,6 +1075,10 @@ static int migrate_to_node(struct mm_struct *mm, int 
source, int dest,
nodemask_t nmask;
LIST_HEAD(pagelist);
int err = 0;
+   struct migration_target_control mtc = {
+   .nid = dest,
+   .gfp_mask = GFP_HIGHUSER_MOVABLE | __GFP_THISNODE,
+   };
 
nodes_clear(nmask);
node_set(source, nmask);
@@ -,8 +1093,8 @@ static int migrate_to_node(struct mm_struct *mm, int 
source, int dest,
			flags | MPOL_MF_DISCONTIG_OK, &pagelist);

	if (!list_empty(&pagelist)) {
-		err = migrate_pages(&pagelist, alloc_new_node_page, NULL, dest,
-				MIGRATE_SYNC, MR_SYSCALL);
+		err = migrate_pages(&pagelist, alloc_migration_target, NULL,
+				(unsigned long)&mtc, MIGRATE_SYNC, MR_SYSCALL);
		if (err)
			putback_movable_pages(&pagelist);
}
diff --git a/mm/migrate.c b/mm/migrate.c
index 7c4cd74..1c943b0 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1590,9 +1590,13 @@ static int do_move_pages_to_node(struct mm_struct *mm,
struct list_head *pagelist, int node)
 {
int err;
+   struct migration_target_control mtc = {
+   .nid = node,
+   .gfp_mask = GFP_HIGHUSER_MOVABLE | __GFP_THISNODE,
+   };
 
-   err = migrate_pages(pagelist, alloc_new_node_page, NULL, node,
-   MIGRATE_SYNC, MR_SYSCALL);
+   err = migrate_pages(pagelist, alloc_migration_target, NULL,
+			(unsigned long)&mtc, MIGRATE_SYNC, MR_SYSCALL);
if (err)
putback_movable_pages(pagelist);
return err;
-- 
2.7.4



[PATCH v3 4/8] mm/hugetlb: make hugetlb migration callback CMA aware

2020-06-23 Thread js1304
From: Joonsoo Kim 

new_non_cma_page() in gup.c, which tries to allocate a migration target
page, requires that the new page is not on the CMA area.
new_non_cma_page() implements this by removing the __GFP_MOVABLE flag.
This way works well for a THP page or a normal page but not for a hugetlb
page.

The hugetlb page allocation process consists of two steps. First is
dequeuing from the pool. Second is, if there is no available page on
the queue, allocating from the page allocator.

new_non_cma_page() can control allocation from the page allocator by
specifying the correct gfp flags. However, dequeuing could not be
controlled until now, so new_non_cma_page() skips dequeuing completely.
This is suboptimal since new_non_cma_page() cannot utilize hugetlb pages
on the queue, so this patch tries to fix this situation.

This patch makes the dequeue function of hugetlb CMA aware and skips
CMA pages if the newly added skip_cma argument is passed as true.

Acked-by: Mike Kravetz 
Signed-off-by: Joonsoo Kim 
---
 include/linux/hugetlb.h |  6 ++
 mm/gup.c|  3 ++-
 mm/hugetlb.c| 31 ++-
 mm/mempolicy.c  |  2 +-
 mm/migrate.c|  2 +-
 5 files changed, 28 insertions(+), 16 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 8a8b755..858522e 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -505,11 +505,9 @@ struct huge_bootmem_page {
 struct page *alloc_huge_page(struct vm_area_struct *vma,
unsigned long addr, int avoid_reserve);
 struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
-   nodemask_t *nmask, gfp_t gfp_mask);
+   nodemask_t *nmask, gfp_t gfp_mask, bool 
skip_cma);
 struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma,
unsigned long address);
-struct page *alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask,
-int nid, nodemask_t *nmask);
 int huge_add_to_page_cache(struct page *page, struct address_space *mapping,
pgoff_t idx);
 
@@ -760,7 +758,7 @@ static inline struct page *alloc_huge_page(struct 
vm_area_struct *vma,
 
 static inline struct page *
 alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
-   nodemask_t *nmask, gfp_t gfp_mask)
+   nodemask_t *nmask, gfp_t gfp_mask, bool skip_cma)
 {
return NULL;
 }
diff --git a/mm/gup.c b/mm/gup.c
index 6f47697..15be281 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1630,11 +1630,12 @@ static struct page *new_non_cma_page(struct page *page, 
unsigned long private)
 #ifdef CONFIG_HUGETLB_PAGE
if (PageHuge(page)) {
struct hstate *h = page_hstate(page);
+
/*
 * We don't want to dequeue from the pool because pool pages 
will
 * mostly be from the CMA region.
 */
-   return alloc_migrate_huge_page(h, gfp_mask, nid, NULL);
+   return alloc_huge_page_nodemask(h, nid, NULL, gfp_mask, true);
}
 #endif
if (PageTransHuge(page)) {
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index bd408f2..1410e62 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1033,13 +1033,18 @@ static void enqueue_huge_page(struct hstate *h, struct 
page *page)
h->free_huge_pages_node[nid]++;
 }
 
-static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
+static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid, 
bool skip_cma)
 {
struct page *page;
 
-   list_for_each_entry(page, >hugepage_freelists[nid], lru)
+   list_for_each_entry(page, >hugepage_freelists[nid], lru) {
+   if (skip_cma && is_migrate_cma_page(page))
+   continue;
+
if (!PageHWPoison(page))
break;
+   }
+
/*
 * if 'non-isolated free hugepage' not found on the list,
 * the allocation fails.
@@ -1054,7 +1059,7 @@ static struct page *dequeue_huge_page_node_exact(struct 
hstate *h, int nid)
 }
 
 static struct page *dequeue_huge_page_nodemask(struct hstate *h, gfp_t 
gfp_mask, int nid,
-   nodemask_t *nmask)
+   nodemask_t *nmask, bool skip_cma)
 {
unsigned int cpuset_mems_cookie;
struct zonelist *zonelist;
@@ -1079,7 +1084,7 @@ static struct page *dequeue_huge_page_nodemask(struct 
hstate *h, gfp_t gfp_mask,
continue;
node = zone_to_nid(zone);
 
-   page = dequeue_huge_page_node_exact(h, node);
+   page = dequeue_huge_page_node_exact(h, node, skip_cma);
if (page)
return page;
}
@@ -1124,7 +1129,7 @@ static struct page *dequeue_huge_page_vma(struct hstate 
*h,
 
gfp_mask = htlb_alloc_mask(h);
nid = huge_node(vma, 

[PATCH v3 8/8] mm/page_alloc: remove a wrapper for alloc_migration_target()

2020-06-23 Thread js1304
From: Joonsoo Kim 

There is a well-defined standard migration target callback.
Use it directly.

Signed-off-by: Joonsoo Kim 
---
 mm/page_alloc.c |  9 +++--
 mm/page_isolation.c | 11 ---
 2 files changed, 7 insertions(+), 13 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9808339..884dfb5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8359,6 +8359,11 @@ static int __alloc_contig_migrate_range(struct 
compact_control *cc,
unsigned long pfn = start;
unsigned int tries = 0;
int ret = 0;
+   struct migration_target_control mtc = {
+   .nid = zone_to_nid(cc->zone),
+   .nmask = _states[N_MEMORY],
+   .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
+   };
 
migrate_prep();
 
@@ -8385,8 +8390,8 @@ static int __alloc_contig_migrate_range(struct 
compact_control *cc,
>migratepages);
cc->nr_migratepages -= nr_reclaimed;
 
-		ret = migrate_pages(&cc->migratepages, alloc_migrate_target,
-				    NULL, 0, cc->mode, MR_CONTIG_RANGE);
+		ret = migrate_pages(&cc->migratepages, alloc_migration_target,
+				    NULL, (unsigned long)&mtc, cc->mode, MR_CONTIG_RANGE);
	}
	if (ret < 0) {
		putback_movable_pages(&cc->migratepages);
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index adba031..242c031 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -306,14 +306,3 @@ int test_pages_isolated(unsigned long start_pfn, unsigned 
long end_pfn,
 
return pfn < end_pfn ? -EBUSY : 0;
 }
-
-struct page *alloc_migrate_target(struct page *page, unsigned long private)
-{
-   struct migration_target_control mtc = {
-   .nid = page_to_nid(page),
-		.nmask = &node_states[N_MEMORY],
-		.gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
-	};
-
-	return alloc_migration_target(page, (unsigned long)&mtc);
-}
-- 
2.7.4



[PATCH v3 1/8] mm/page_isolation: prefer the node of the source page

2020-06-23 Thread js1304
From: Joonsoo Kim 

For locality, it's better to migrate the page to the same node
rather than the node of the current caller's cpu.

Acked-by: Roman Gushchin 
Acked-by: Michal Hocko 
Reviewed-by: Vlastimil Babka 
Signed-off-by: Joonsoo Kim 
---
 mm/page_isolation.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index f6d07c5..aec26d9 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -309,5 +309,7 @@ int test_pages_isolated(unsigned long start_pfn, unsigned 
long end_pfn,
 
 struct page *alloc_migrate_target(struct page *page, unsigned long private)
 {
-	return new_page_nodemask(page, numa_node_id(), &node_states[N_MEMORY]);
+	int nid = page_to_nid(page);
+
+	return new_page_nodemask(page, nid, &node_states[N_MEMORY]);
 }
-- 
2.7.4



[PATCH v3 0/8] clean-up the migration target allocation functions

2020-06-23 Thread js1304
From: Joonsoo Kim 

This patchset cleans up the migration target allocation functions.

* Changes on v3
- do not introduce alloc_control for hugetlb functions
- do not change the signature of migrate_pages()
- rename alloc_control to migration_target_control

* Changes on v2
- add acked-by tags
- fix missing compound_head() call for patch #3
- remove thisnode field on alloc_control and use __GFP_THISNODE directly
- fix missing __gfp_mask setup for patch
"mm/hugetlb: do not modify user provided gfp_mask"

* Cover-letter

Contributions of this patchset are:
1. unify two hugetlb alloc functions. As a result, one remains.
2. make one external hugetlb alloc function internal.
3. unify three functions for migration target allocation.

The patchset is based on next-20200621.
The patchset is available on:

https://github.com/JoonsooKim/linux/tree/cleanup-migration-target-allocation-v3.00-next-20200621

Thanks.

Joonsoo Kim (8):
  mm/page_isolation: prefer the node of the source page
  mm/migrate: move migration helper from .h to .c
  mm/hugetlb: unify migration callbacks
  mm/hugetlb: make hugetlb migration callback CMA aware
  mm/migrate: make a standard migration target allocation function
  mm/gup: use a standard migration target allocation callback
  mm/mempolicy: use a standard migration target allocation callback
  mm/page_alloc: remove a wrapper for alloc_migration_target()

 include/linux/hugetlb.h | 13 +++-
 include/linux/migrate.h | 34 ++
 mm/gup.c| 56 +++--
 mm/hugetlb.c| 53 --
 mm/internal.h   |  9 +++-
 mm/memory-failure.c |  8 +--
 mm/memory_hotplug.c | 14 -
 mm/mempolicy.c  | 29 ++---
 mm/migrate.c| 45 +--
 mm/page_alloc.c |  9 ++--
 mm/page_isolation.c |  5 -
 11 files changed, 119 insertions(+), 156 deletions(-)

-- 
2.7.4



[PATCH v6 0/6] workingset protection/detection on the anonymous LRU list

2020-06-16 Thread js1304
From: Joonsoo Kim 

Hello,

This patchset implements workingset protection and detection on
the anonymous LRU list.

* Changes on v6
- rework to reflect a new LRU balance model
- remove memcg charge timing stuff on v5 since alternative is already
merged on mainline
- remove readahead stuff on v5 (reason is the same with above)
- clear shadow entry if corresponding swap entry is deleted
(mm/swapcache: support to handle the exceptional entries in swapcache)
- change experiment environment
(from ssd swap to ram swap, for fast evaluation and for reducing side-effect of 
I/O)
- update performance number

* Changes on v5
- change memcg charge timing for the swapped-in page (fault -> swap-in)
- avoid readahead if previous owner of the swapped-out page isn't me
- use another lruvec to update the reclaim_stat for a new anonymous page
- add two more cases to fix up the reclaim_stat

* Changes on v4
- In the patch "mm/swapcache: support to handle the exceptional
entries in swapcache":
-- replace the word "value" with "exceptional entries"
-- add to handle the shadow entry in add_to_swap_cache()
-- support the huge page
-- remove the registration code for shadow shrinker

- remove the patch "mm/workingset: use the node counter
if memcg is the root memcg" since workingset detection for
anonymous page doesn't use shadow shrinker now
- minor style fixes

* Changes on v3
- rework the patch, "mm/vmscan: protect the workingset on anonymous LRU"
(use almost same reference tracking algorithm to the one for the file
mapped page)

* Changes on v2
- fix a critical bug that uses out of index lru list in
workingset_refault()
- fix a bug that reuses the rotate value for previous page

* SUBJECT
workingset protection

* PROBLEM
In the current implementation, a newly created or swapped-in anonymous page
starts on the active list. Growing the active list results in rebalancing
the active/inactive lists, so old pages on the active list are demoted to the
inactive list. Hence, a hot page on the active list isn't protected at all.

The following is an example of this situation.

Assume that 50 hot pages are on the active list and the system can hold
100 pages in total. Numbers denote the number of pages on the active/inactive
list (active | inactive); (h) stands for hot pages and (uo) stands for
used-once pages.

1. 50 hot pages on active list
50(h) | 0

2. workload: 50 newly created (used-once) pages
50(uo) | 50(h)

3. workload: another 50 newly created (used-once) pages
50(uo) | 50(uo), swap-out 50(h)

As we can see, the hot pages are swapped out, which will cause swap-ins later.

* SOLUTION
Since this is what we want to avoid, this patchset implements workingset
protection. As with the file LRU list, a newly created or swapped-in anonymous
page starts on the inactive list and, again like the file LRU, is promoted once
it is referenced enough. This simple modification changes the above example as
follows (a small simulation of both policies is sketched after the example).

1. 50 hot pages on active list
50(h) | 0

2. workload: 50 newly created (used-once) pages
50(h) | 50(uo)

3. workload: another 50 newly created (used-once) pages
50(h) | 50(uo), swap-out 50(uo)

Hot pages remain on the active list. :)
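
To make the comparison concrete, below is a minimal userspace C sketch (not
kernel code; the fixed 50-entry lists and strict FIFO demotion/eviction are
simplifying assumptions) that replays the example above under both insertion
policies and reports how many hot pages stay resident.

#include <stdio.h>
#include <string.h>

#define LIST_SZ 50

struct lru {
        char active[LIST_SZ];           /* 'h' = hot, 'u' = used-once */
        char inactive[LIST_SZ];
};

/* Insert one page at the head of a list; the old tail entry falls off. */
static char push(char *list, char page)
{
        char tail = list[LIST_SZ - 1];

        memmove(list + 1, list, LIST_SZ - 1);
        list[0] = page;
        return tail;
}

static int count_hot(const char *list)
{
        int i, n = 0;

        for (i = 0; i < LIST_SZ; i++)
                if (list[i] == 'h')
                        n++;
        return n;
}

static void run(const char *name, int add_to_active)
{
        struct lru lru;
        int i;

        memset(lru.active, 'h', LIST_SZ);       /* step 1: 50 hot pages */
        memset(lru.inactive, 0, LIST_SZ);

        for (i = 0; i < 100; i++) {             /* steps 2 and 3: 2 x 50 used-once pages */
                if (add_to_active)
                        /* demote the active tail; the inactive tail is evicted */
                        push(lru.inactive, push(lru.active, 'u'));
                else
                        /* only the inactive tail is evicted */
                        push(lru.inactive, 'u');
        }

        printf("%s: %d hot pages still resident\n", name,
               count_hot(lru.active) + count_hot(lru.inactive));
}

int main(void)
{
        run("insert to active  ", 1);   /* current behaviour */
        run("insert to inactive", 0);   /* behaviour with this patchset */
        return 0;
}

With these assumptions, the first run ends with 0 resident hot pages and the
second keeps all 50, matching the two walk-throughs above.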

* EXPERIMENT
I tested this scenario on my test bed and confirmed that this problem occurs
with the current implementation. I also checked that it is fixed by this
patchset.


* SUBJECT
workingset detection

* PROBLEM
The later part of the patchset implements workingset detection for
the anonymous LRU list. There is a corner case where workingset protection
alone can cause thrashing. If we can avoid that thrashing through workingset
detection, we get better performance.

The following is an example of thrashing caused by workingset protection.

1. 50 hot pages on active list
50(h) | 0

2. workload: 50 newly created (will be hot) pages
50(h) | 50(wh)

3. workload: another 50 newly created (used-once) pages
50(h) | 50(uo), swap-out 50(wh)

4. workload: 50 (will be hot) pages
50(h) | 50(wh), swap-in 50(wh)

5. workload: another 50 newly created (used-once) pages
50(h) | 50(uo), swap-out 50(wh)

6. repeat 4, 5

Without workingset detection, the will-be-hot pages in this workload can never
be promoted and thrashing continues forever.

* SOLUTION
Therefore, this patchset implements workingset detection.
All the infrastructure for workingset detection is already in place,
so there is not much work to do. First, extend the workingset detection
code to deal with the anonymous LRU list. Then, make the swap cache handle
the exceptional (shadow) entries. Lastly, install/retrieve the shadow value
into/from the swap cache and check the refault distance.
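
For reference, the refault-distance idea can be sketched in a few lines of
userspace C. This is only an illustration of the concept; the names and the
aging granularity are assumptions, not the kernel's workingset.c code. The
non-resident age observed at eviction time is stored in the shadow entry, and
on refault the distance travelled since then is compared to the workingset
size.

#include <stdio.h>
#include <stdbool.h>

static unsigned long nonresident_age;

/* Called when a page is evicted; the returned value is stored as its shadow. */
static unsigned long make_shadow(void)
{
        return nonresident_age++;
}

/*
 * Called on refault: if fewer slots were aged since the eviction than the
 * workingset could hold, the page was evicted prematurely and should be
 * activated instead of starting over on the inactive list.
 */
static bool refault_was_workingset(unsigned long shadow,
                                   unsigned long workingset_size)
{
        unsigned long refault_distance = nonresident_age - shadow;

        return refault_distance <= workingset_size;
}

int main(void)
{
        unsigned long shadow = make_shadow();   /* page evicted */

        nonresident_age += 30;                  /* 30 evictions/activations pass */
        printf("activate on refault: %s\n",
               refault_was_workingset(shadow, 50) ? "yes" : "no");
        return 0;
}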

* EXPERIMENT
I made a test program that imitates the above scenario and confirmed that
the problem exists. Then I checked that this patchset fixes it.

My test setup is a virtual machine with 8 CPUs and 6100 MB of memory, but
the amount of memory that the test program can use is only about 280 MB.
This is because the system uses a large RAM-backed swap device and a large
ramdisk to capture the trace.

The test scenario is as below.

1. 

[PATCH v6 2/6] mm/vmscan: protect the workingset on anonymous LRU

2020-06-16 Thread js1304
From: Joonsoo Kim 

In the current implementation, a newly created or swapped-in anonymous page
starts on the active list. Growing the active list triggers rebalancing of
the active/inactive lists, so old pages on the active list are demoted to the
inactive list. Hence, a page on the active list isn't protected at all.

The following is an example of this situation.

Assume that 50 hot pages are on the active list. Numbers denote the number of
pages on the active/inactive list (active | inactive).

1. 50 hot pages on active list
50(h) | 0

2. workload: 50 newly created (used-once) pages
50(uo) | 50(h)

3. workload: another 50 newly created (used-once) pages
50(uo) | 50(uo), swap-out 50(h)

This patch tries to fix this issue.
As with the file LRU, newly created or swapped-in anonymous pages are
inserted on the inactive list. They are promoted to the active list if
enough references happen. This simple modification changes the above
example as follows.

1. 50 hot pages on active list
50(h) | 0

2. workload: 50 newly created (used-once) pages
50(h) | 50(uo)

3. workload: another 50 newly created (used-once) pages
50(h) | 50(uo), swap-out 50(uo)

As you can see, the hot pages on the active list are now protected.

Note that this implementation has a drawback: a page cannot be promoted and
will be swapped out if its re-access interval is greater than the size of the
inactive list but less than the size of the total list (active + inactive).
For instance, with a 50-page inactive list and 100 pages in total, a page
re-accessed every 70 allocations never gets a second reference while resident.
To solve this potential issue, a following patch applies the workingset
detection that was applied to the file LRU some time ago.

v6: before this patch, all anon pages (inactive + active) were considered
part of the workingset. With this patch, only active pages are. So the file
refault formula, which used the number of all anon pages, is changed to use
only the number of active anon pages.

Acked-by: Johannes Weiner 
Signed-off-by: Joonsoo Kim 
---
 include/linux/swap.h|  2 +-
 kernel/events/uprobes.c |  2 +-
 mm/huge_memory.c|  2 +-
 mm/khugepaged.c |  2 +-
 mm/memory.c |  9 -
 mm/migrate.c|  2 +-
 mm/swap.c   | 13 +++--
 mm/swapfile.c   |  2 +-
 mm/userfaultfd.c|  2 +-
 mm/vmscan.c |  4 +---
 mm/workingset.c |  2 --
 11 files changed, 19 insertions(+), 23 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 5b3216b..f4f5f94 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -353,7 +353,7 @@ extern void deactivate_page(struct page *page);
 extern void mark_page_lazyfree(struct page *page);
 extern void swap_setup(void);
 
-extern void lru_cache_add_active_or_unevictable(struct page *page,
+extern void lru_cache_add_inactive_or_unevictable(struct page *page,
struct vm_area_struct *vma);
 
 /* linux/mm/vmscan.c */
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index bb08628..67814de 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -184,7 +184,7 @@ static int __replace_page(struct vm_area_struct *vma, 
unsigned long addr,
if (new_page) {
get_page(new_page);
page_add_new_anon_rmap(new_page, vma, addr, false);
-   lru_cache_add_active_or_unevictable(new_page, vma);
+   lru_cache_add_inactive_or_unevictable(new_page, vma);
} else
/* no new page, just dec_mm_counter for old_page */
dec_mm_counter(mm, MM_ANONPAGES);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 78c84be..ffbf5ad 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -640,7 +640,7 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct 
vm_fault *vmf,
entry = mk_huge_pmd(page, vma->vm_page_prot);
entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
page_add_new_anon_rmap(page, vma, haddr, true);
-   lru_cache_add_active_or_unevictable(page, vma);
+   lru_cache_add_inactive_or_unevictable(page, vma);
pgtable_trans_huge_deposit(vma->vm_mm, vmf->pmd, pgtable);
set_pmd_at(vma->vm_mm, haddr, vmf->pmd, entry);
add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR);
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index b043c40..02fb51f 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1173,7 +1173,7 @@ static void collapse_huge_page(struct mm_struct *mm,
spin_lock(pmd_ptl);
BUG_ON(!pmd_none(*pmd));
page_add_new_anon_rmap(new_page, vma, address, true);
-   lru_cache_add_active_or_unevictable(new_page, vma);
+   lru_cache_add_inactive_or_unevictable(new_page, vma);
pgtable_trans_huge_deposit(mm, pmd, pgtable);
set_pmd_at(mm, address, pmd, _pmd);
update_mmu_cache_pmd(vma, address, pmd);
diff --git a/mm/memory.c b/mm/memory.c
index 3359057..f221f96 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2711,7 +2711,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)

[PATCH v6 6/6] mm/vmscan: restore active/inactive ratio for anonymous LRU

2020-06-16 Thread js1304
From: Joonsoo Kim 

Now that workingset detection is implemented for the anonymous LRU,
we no longer have to worry about the workingset being missed because of
the active/inactive ratio. Let's restore the ratio.

Acked-by: Johannes Weiner 
Signed-off-by: Joonsoo Kim 
---
 mm/vmscan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index eb02d18..ec77691 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2211,7 +2211,7 @@ static bool inactive_is_low(struct lruvec *lruvec, enum 
lru_list inactive_lru)
active = lruvec_page_state(lruvec, NR_LRU_BASE + active_lru);
 
gb = (inactive + active) >> (30 - PAGE_SHIFT);
-   if (gb && is_file_lru(inactive_lru))
+   if (gb)
inactive_ratio = int_sqrt(10 * gb);
else
inactive_ratio = 1;
-- 
2.7.4



[PATCH v6 3/6] mm/workingset: extend the workingset detection for anon LRU

2020-06-16 Thread js1304
From: Joonsoo Kim 

In the following patch, workingset detection will be applied to the
anonymous LRU. To prepare for that, this patch adds some code to
distinguish and handle both LRUs.

v6: do not introduce a new nonresident_age for the anon LRU, since
we need a *unified* nonresident_age to implement workingset
detection for the anon LRU.

Acked-by: Johannes Weiner 
Signed-off-by: Joonsoo Kim 
---
 include/linux/mmzone.h | 16 +++-
 mm/memcontrol.c| 16 +++-
 mm/vmscan.c| 15 ++-
 mm/vmstat.c|  9 ++---
 mm/workingset.c|  8 +---
 5 files changed, 43 insertions(+), 21 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index f6f8849..8e9d0b9 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -179,9 +179,15 @@ enum node_stat_item {
NR_ISOLATED_ANON,   /* Temporary isolated pages from anon lru */
NR_ISOLATED_FILE,   /* Temporary isolated pages from file lru */
WORKINGSET_NODES,
-   WORKINGSET_REFAULT,
-   WORKINGSET_ACTIVATE,
-   WORKINGSET_RESTORE,
+   WORKINGSET_REFAULT_BASE,
+   WORKINGSET_REFAULT_ANON = WORKINGSET_REFAULT_BASE,
+   WORKINGSET_REFAULT_FILE,
+   WORKINGSET_ACTIVATE_BASE,
+   WORKINGSET_ACTIVATE_ANON = WORKINGSET_ACTIVATE_BASE,
+   WORKINGSET_ACTIVATE_FILE,
+   WORKINGSET_RESTORE_BASE,
+   WORKINGSET_RESTORE_ANON = WORKINGSET_RESTORE_BASE,
+   WORKINGSET_RESTORE_FILE,
WORKINGSET_NODERECLAIM,
NR_ANON_MAPPED, /* Mapped anonymous pages */
NR_FILE_MAPPED, /* pagecache pages mapped into pagetables.
@@ -259,8 +265,8 @@ struct lruvec {
unsigned long   file_cost;
/* Non-resident age, driven by LRU movement */
atomic_long_t   nonresident_age;
-   /* Refaults at the time of last reclaim cycle */
-   unsigned long   refaults;
+   /* Refaults at the time of last reclaim cycle, anon=0, file=1 */
+   unsigned long   refaults[2];
/* Various lruvec state flags (enum lruvec_flags) */
unsigned long   flags;
 #ifdef CONFIG_MEMCG
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 0b38b6a..2127dd1 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1425,12 +1425,18 @@ static char *memory_stat_format(struct mem_cgroup 
*memcg)
seq_buf_printf(, "%s %lu\n", vm_event_name(PGMAJFAULT),
   memcg_events(memcg, PGMAJFAULT));
 
-   seq_buf_printf(, "workingset_refault %lu\n",
-  memcg_page_state(memcg, WORKINGSET_REFAULT));
-   seq_buf_printf(, "workingset_activate %lu\n",
-  memcg_page_state(memcg, WORKINGSET_ACTIVATE));
+   seq_buf_printf(, "workingset_refault_anon %lu\n",
+  memcg_page_state(memcg, WORKINGSET_REFAULT_ANON));
+   seq_buf_printf(, "workingset_refault_file %lu\n",
+  memcg_page_state(memcg, WORKINGSET_REFAULT_FILE));
+   seq_buf_printf(, "workingset_activate_anon %lu\n",
+  memcg_page_state(memcg, WORKINGSET_ACTIVATE_ANON));
+   seq_buf_printf(, "workingset_activate_file %lu\n",
+  memcg_page_state(memcg, WORKINGSET_ACTIVATE_FILE));
seq_buf_printf(, "workingset_restore %lu\n",
-  memcg_page_state(memcg, WORKINGSET_RESTORE));
+  memcg_page_state(memcg, WORKINGSET_RESTORE_ANON));
+   seq_buf_printf(, "workingset_restore %lu\n",
+  memcg_page_state(memcg, WORKINGSET_RESTORE_FILE));
seq_buf_printf(, "workingset_nodereclaim %lu\n",
   memcg_page_state(memcg, WORKINGSET_NODERECLAIM));
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4745e88..3caa35f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2695,7 +2695,10 @@ static void shrink_node(pg_data_t *pgdat, struct 
scan_control *sc)
if (!sc->force_deactivate) {
unsigned long refaults;
 
-   if (inactive_is_low(target_lruvec, LRU_INACTIVE_ANON))
+   refaults = lruvec_page_state(target_lruvec,
+   WORKINGSET_ACTIVATE_ANON);
+   if (refaults != target_lruvec->refaults[0] ||
+   inactive_is_low(target_lruvec, LRU_INACTIVE_ANON))
sc->may_deactivate |= DEACTIVATE_ANON;
else
sc->may_deactivate &= ~DEACTIVATE_ANON;
@@ -2706,8 +2709,8 @@ static void shrink_node(pg_data_t *pgdat, struct 
scan_control *sc)
 * rid of any stale active pages quickly.
 */
refaults = lruvec_page_state(target_lruvec,
-WORKINGSET_ACTIVATE);
-   if (refaults != target_lruvec->refaults ||
+   WORKINGSET_ACTIVATE_FILE);
+   if (refaults != target_lruvec->refaults[1] ||
 

[PATCH v6 1/6] mm/vmscan: make active/inactive ratio as 1:1 for anon lru

2020-06-16 Thread js1304
From: Joonsoo Kim 

The current implementation of LRU management for anonymous pages has some
problems. The most important one is that it doesn't protect the workingset,
that is, the pages on the active LRU list. Although this problem will be
fixed by the following patches, some preparation is required and
this patch provides it.

What the following patches do is restore workingset protection. In that
scheme, newly created or swapped-in pages start their lifetime on the
inactive list. If the inactive list is too small, there is not enough chance
for them to be referenced and such pages can never become the workingset.

In order to give newly created anonymous pages enough chance, this patch
makes the active/inactive LRU ratio 1:1.

Acked-by: Johannes Weiner 
Signed-off-by: Joonsoo Kim 
---
 mm/vmscan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 749d239..9f940c4 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2212,7 +2212,7 @@ static bool inactive_is_low(struct lruvec *lruvec, enum 
lru_list inactive_lru)
active = lruvec_page_state(lruvec, NR_LRU_BASE + active_lru);
 
gb = (inactive + active) >> (30 - PAGE_SHIFT);
-   if (gb)
+   if (gb && is_file_lru(inactive_lru))
inactive_ratio = int_sqrt(10 * gb);
else
inactive_ratio = 1;
-- 
2.7.4



[PATCH v6 4/6] mm/swapcache: support to handle the exceptional entries in swapcache

2020-06-16 Thread js1304
From: Joonsoo Kim 

The swap cache doesn't handle exceptional entries since there has been no
case that uses them. In the following patch, workingset detection for
anonymous pages will be implemented, and it stores shadow entries as
exceptional entries in the swap cache. So we need to handle exceptional
entries, and this patch implements that.
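
The mechanism being added is the same tagged-slot trick the page cache uses
for its shadow entries: a swap cache slot holds either a page pointer or a
small "value" (exceptional) entry, distinguished by the low bit. The userspace
C sketch below only illustrates that encoding; pack_shadow()/unpack_shadow()
are made-up names, not the kernel helpers.

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

struct page { int dummy; };

/* A slot whose low bit is set is a value (shadow) entry, not a pointer. */
static bool slot_is_value(void *entry)
{
        return (uintptr_t)entry & 1;
}

static void *pack_shadow(unsigned long eviction)
{
        return (void *)(uintptr_t)((eviction << 1) | 1);
}

static unsigned long unpack_shadow(void *entry)
{
        return (uintptr_t)entry >> 1;
}

int main(void)
{
        struct page page;
        void *slot = &page;             /* page resident in the swap cache */

        printf("resident slot is a shadow? %d\n", slot_is_value(slot));

        slot = pack_shadow(42);         /* page reclaimed, shadow stored */
        if (slot_is_value(slot))
                printf("shadow records eviction %lu\n", unpack_shadow(slot));
        return 0;
}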

Acked-by: Johannes Weiner 
Signed-off-by: Joonsoo Kim 
---
 include/linux/swap.h | 17 
 mm/shmem.c   |  3 ++-
 mm/swap_state.c  | 56 ++--
 mm/swapfile.c|  2 ++
 mm/vmscan.c  |  2 +-
 5 files changed, 68 insertions(+), 12 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index f4f5f94..901da54 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -416,9 +416,13 @@ extern struct address_space *swapper_spaces[];
 extern unsigned long total_swapcache_pages(void);
 extern void show_swap_cache_info(void);
 extern int add_to_swap(struct page *page);
-extern int add_to_swap_cache(struct page *, swp_entry_t, gfp_t);
-extern void __delete_from_swap_cache(struct page *, swp_entry_t entry);
+extern int add_to_swap_cache(struct page *page, swp_entry_t entry,
+   gfp_t gfp, void **shadowp);
+extern void __delete_from_swap_cache(struct page *page,
+   swp_entry_t entry, void *shadow);
 extern void delete_from_swap_cache(struct page *);
+extern void clear_shadow_from_swap_cache(int type, unsigned long begin,
+   unsigned long end);
 extern void free_page_and_swap_cache(struct page *);
 extern void free_pages_and_swap_cache(struct page **, int);
 extern struct page *lookup_swap_cache(swp_entry_t entry,
@@ -572,13 +576,13 @@ static inline int add_to_swap(struct page *page)
 }
 
 static inline int add_to_swap_cache(struct page *page, swp_entry_t entry,
-   gfp_t gfp_mask)
+   gfp_t gfp_mask, void **shadowp)
 {
return -1;
 }
 
 static inline void __delete_from_swap_cache(struct page *page,
-   swp_entry_t entry)
+   swp_entry_t entry, void *shadow)
 {
 }
 
@@ -586,6 +590,11 @@ static inline void delete_from_swap_cache(struct page 
*page)
 {
 }
 
+static inline void clear_shadow_from_swap_cache(int type, unsigned long begin,
+   unsigned long end)
+{
+}
+
 static inline int page_swapcount(struct page *page)
 {
return 0;
diff --git a/mm/shmem.c b/mm/shmem.c
index a0dbe62..e9a99a2 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1374,7 +1374,8 @@ static int shmem_writepage(struct page *page, struct 
writeback_control *wbc)
list_add(>swaplist, _swaplist);
 
if (add_to_swap_cache(page, swap,
-   __GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN) == 0) {
+   __GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN,
+   NULL) == 0) {
spin_lock_irq(>lock);
shmem_recalc_inode(inode);
info->swapped++;
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 1050fde..43c4e3a 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -110,12 +110,15 @@ void show_swap_cache_info(void)
  * add_to_swap_cache resembles add_to_page_cache_locked on swapper_space,
  * but sets SwapCache flag and private instead of mapping and index.
  */
-int add_to_swap_cache(struct page *page, swp_entry_t entry, gfp_t gfp)
+int add_to_swap_cache(struct page *page, swp_entry_t entry,
+   gfp_t gfp, void **shadowp)
 {
struct address_space *address_space = swap_address_space(entry);
pgoff_t idx = swp_offset(entry);
XA_STATE_ORDER(xas, _space->i_pages, idx, compound_order(page));
unsigned long i, nr = hpage_nr_pages(page);
+   unsigned long nrexceptional = 0;
+   void *old;
 
VM_BUG_ON_PAGE(!PageLocked(page), page);
VM_BUG_ON_PAGE(PageSwapCache(page), page);
@@ -131,10 +134,17 @@ int add_to_swap_cache(struct page *page, swp_entry_t 
entry, gfp_t gfp)
goto unlock;
for (i = 0; i < nr; i++) {
VM_BUG_ON_PAGE(xas.xa_index != idx + i, page);
+   old = xas_load();
+   if (xa_is_value(old)) {
+   nrexceptional++;
+   if (shadowp)
+   *shadowp = old;
+   }
set_page_private(page + i, entry.val + i);
xas_store(, page);
xas_next();
}
+   address_space->nrexceptional -= nrexceptional;
address_space->nrpages += nr;
__mod_node_page_state(page_pgdat(page), NR_FILE_PAGES, nr);
ADD_CACHE_INFO(add_total, nr);
@@ 

[PATCH v6 5/6] mm/swap: implement workingset detection for anonymous LRU

2020-06-16 Thread js1304
From: Joonsoo Kim 

This patch implements workingset detection for the anonymous LRU.
All the infrastructure was implemented by the previous patches, so this patch
just activates workingset detection by installing/retrieving
the shadow entries.

Signed-off-by: Joonsoo Kim 
---
 include/linux/swap.h |  6 ++
 mm/memory.c  | 11 ---
 mm/swap_state.c  | 23 ++-
 mm/vmscan.c  |  7 ---
 mm/workingset.c  |  5 +++--
 5 files changed, 35 insertions(+), 17 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 901da54..9ee78b8 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -416,6 +416,7 @@ extern struct address_space *swapper_spaces[];
 extern unsigned long total_swapcache_pages(void);
 extern void show_swap_cache_info(void);
 extern int add_to_swap(struct page *page);
+extern void *get_shadow_from_swap_cache(swp_entry_t entry);
 extern int add_to_swap_cache(struct page *page, swp_entry_t entry,
gfp_t gfp, void **shadowp);
 extern void __delete_from_swap_cache(struct page *page,
@@ -575,6 +576,11 @@ static inline int add_to_swap(struct page *page)
return 0;
 }
 
+static inline void *get_shadow_from_swap_cache(swp_entry_t entry)
+{
+   return NULL;
+}
+
 static inline int add_to_swap_cache(struct page *page, swp_entry_t entry,
gfp_t gfp_mask, void **shadowp)
 {
diff --git a/mm/memory.c b/mm/memory.c
index f221f96..2411cf57 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3094,6 +3094,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
int locked;
int exclusive = 0;
vm_fault_t ret = 0;
+   void *shadow = NULL;
 
if (!pte_unmap_same(vma->vm_mm, vmf->pmd, vmf->pte, vmf->orig_pte))
goto out;
@@ -3143,13 +3144,9 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
if (err)
goto out_page;
 
-   /*
-* XXX: Move to lru_cache_add() when it
-* supports new vs putback
-*/
-   spin_lock_irq(_pgdat(page)->lru_lock);
-   lru_note_cost_page(page);
-   spin_unlock_irq(_pgdat(page)->lru_lock);
+   shadow = get_shadow_from_swap_cache(entry);
+   if (shadow)
+   workingset_refault(page, shadow);
 
lru_cache_add(page);
swap_readpage(page, true);
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 43c4e3a..90c5bd1 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -106,6 +106,20 @@ void show_swap_cache_info(void)
printk("Total swap = %lukB\n", total_swap_pages << (PAGE_SHIFT - 10));
 }
 
+void *get_shadow_from_swap_cache(swp_entry_t entry)
+{
+   struct address_space *address_space = swap_address_space(entry);
+   pgoff_t idx = swp_offset(entry);
+   struct page *page;
+
+   page = find_get_entry(address_space, idx);
+   if (xa_is_value(page))
+   return page;
+   if (page)
+   put_page(page);
+   return NULL;
+}
+
 /*
  * add_to_swap_cache resembles add_to_page_cache_locked on swapper_space,
  * but sets SwapCache flag and private instead of mapping and index.
@@ -405,6 +419,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, 
gfp_t gfp_mask,
 {
struct swap_info_struct *si;
struct page *page;
+   void *shadow = NULL;
 
*new_page_allocated = false;
 
@@ -473,7 +488,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, 
gfp_t gfp_mask,
__SetPageSwapBacked(page);
 
/* May fail (-ENOMEM) if XArray node allocation failed. */
-   if (add_to_swap_cache(page, entry, gfp_mask & GFP_KERNEL, NULL)) {
+   if (add_to_swap_cache(page, entry, gfp_mask & GFP_KERNEL, )) {
put_swap_page(page, entry);
goto fail_unlock;
}
@@ -483,10 +498,8 @@ struct page *__read_swap_cache_async(swp_entry_t entry, 
gfp_t gfp_mask,
goto fail_unlock;
}
 
-   /* XXX: Move to lru_cache_add() when it supports new vs putback */
-   spin_lock_irq(_pgdat(page)->lru_lock);
-   lru_note_cost_page(page);
-   spin_unlock_irq(_pgdat(page)->lru_lock);
+   if (shadow)
+   workingset_refault(page, shadow);
 
/* Caller will initiate read into locked page */
SetPageWorkingset(page);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 37943bf..eb02d18 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -859,6 +859,7 @@ static int __remove_mapping(struct address_space *mapping, 
struct page *page,
 {
unsigned long flags;
int refcount;
+   void *shadow = NULL;
 
BUG_ON(!PageLocked(page));

[PATCH for v5.8 1/3] mm: workingset: age nonresident information alongside anonymous pages

2020-06-16 Thread js1304
From: Johannes Weiner 

After ("mm: workingset: let cache workingset challenge anon fix"), we
compare refault distances to active_file + anon. But age of the
non-resident information is only driven by the file LRU. As a result,
we may overestimate the recency of any incoming refaults and activate
them too eagerly, causing unnecessary LRU churn in certain situations.

Make anon aging drive nonresident age as well to address that.

Reported-by: Joonsoo Kim 
Signed-off-by: Johannes Weiner 
Signed-off-by: Joonsoo Kim 
---
 include/linux/mmzone.h |  4 ++--
 include/linux/swap.h   |  1 +
 mm/vmscan.c|  3 +++
 mm/workingset.c| 46 +++---
 4 files changed, 33 insertions(+), 21 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index c4c37fd..f6f8849 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -257,8 +257,8 @@ struct lruvec {
 */
unsigned long   anon_cost;
unsigned long   file_cost;
-   /* Evictions & activations on the inactive file list */
-   atomic_long_t   inactive_age;
+   /* Non-resident age, driven by LRU movement */
+   atomic_long_t   nonresident_age;
/* Refaults at the time of last reclaim cycle */
unsigned long   refaults;
/* Various lruvec state flags (enum lruvec_flags) */
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 4c5974b..5b3216b 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -313,6 +313,7 @@ struct vma_swap_readahead {
 };
 
 /* linux/mm/workingset.c */
+void workingset_age_nonresident(struct lruvec *lruvec, unsigned long nr_pages);
 void *workingset_eviction(struct page *page, struct mem_cgroup *target_memcg);
 void workingset_refault(struct page *page, void *shadow);
 void workingset_activation(struct page *page);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index b6d8432..749d239 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -904,6 +904,7 @@ static int __remove_mapping(struct address_space *mapping, 
struct page *page,
__delete_from_swap_cache(page, swap);
xa_unlock_irqrestore(>i_pages, flags);
put_swap_page(page, swap);
+   workingset_eviction(page, target_memcg);
} else {
void (*freepage)(struct page *);
void *shadow = NULL;
@@ -1884,6 +1885,8 @@ static unsigned noinline_for_stack 
move_pages_to_lru(struct lruvec *lruvec,
list_add(>lru, _to_free);
} else {
nr_moved += nr_pages;
+   if (PageActive(page))
+   workingset_age_nonresident(lruvec, nr_pages);
}
}
 
diff --git a/mm/workingset.c b/mm/workingset.c
index d481ea4..50b7937 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -156,8 +156,8 @@
  *
  * Implementation
  *
- * For each node's file LRU lists, a counter for inactive evictions
- * and activations is maintained (node->inactive_age).
+ * For each node's LRU lists, a counter for inactive evictions and
+ * activations is maintained (node->nonresident_age).
  *
  * On eviction, a snapshot of this counter (along with some bits to
  * identify the node) is stored in the now empty page cache
@@ -213,7 +213,17 @@ static void unpack_shadow(void *shadow, int *memcgidp, 
pg_data_t **pgdat,
*workingsetp = workingset;
 }
 
-static void advance_inactive_age(struct mem_cgroup *memcg, pg_data_t *pgdat)
+/**
+ * workingset_age_nonresident - age non-resident entries as LRU ages
+ * @memcg: the lruvec that was aged
+ * @nr_pages: the number of pages to count
+ *
+ * As in-memory pages are aged, non-resident pages need to be aged as
+ * well, in order for the refault distances later on to be comparable
+ * to the in-memory dimensions. This function allows reclaim and LRU
+ * operations to drive the non-resident aging along in parallel.
+ */
+void workingset_age_nonresident(struct lruvec *lruvec, unsigned long nr_pages)
 {
/*
 * Reclaiming a cgroup means reclaiming all its children in a
@@ -227,11 +237,8 @@ static void advance_inactive_age(struct mem_cgroup *memcg, 
pg_data_t *pgdat)
 * the root cgroup's, age as well.
 */
do {
-   struct lruvec *lruvec;
-
-   lruvec = mem_cgroup_lruvec(memcg, pgdat);
-   atomic_long_inc(>inactive_age);
-   } while (memcg && (memcg = parent_mem_cgroup(memcg)));
+   atomic_long_add(nr_pages, >nonresident_age);
+   } while ((lruvec = parent_lruvec(lruvec)));
 }
 
 /**
@@ -254,12 +261,11 @@ void *workingset_eviction(struct page *page, struct 
mem_cgroup *target_memcg)
VM_BUG_ON_PAGE(page_count(page), page);
VM_BUG_ON_PAGE(!PageLocked(page), page);
 
-   advance_inactive_age(page_memcg(page), pgdat);
-
lruvec = 

[PATCH for v5.8 2/3] mm/swap: fix for "mm: workingset: age nonresident information alongside anonymous pages"

2020-06-16 Thread js1304
From: Joonsoo Kim 

A non-file-LRU page can also be activated in mark_page_accessed(),
and we need to count that activation toward nonresident_age.

Note that it would be better for this patch to be squashed into the patch
"mm: workingset: age nonresident information alongside anonymous pages".

Signed-off-by: Joonsoo Kim 
---
 mm/swap.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/mm/swap.c b/mm/swap.c
index 667133d..c5d5114 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -443,8 +443,7 @@ void mark_page_accessed(struct page *page)
else
__lru_cache_activate_page(page);
ClearPageReferenced(page);
-   if (page_is_file_lru(page))
-   workingset_activation(page);
+   workingset_activation(page);
}
if (page_is_idle(page))
clear_page_idle(page);
-- 
2.7.4



[PATCH for v5.8 0/3] fix for "mm: balance LRU lists based on relative thrashing" patchset

2020-06-16 Thread js1304
From: Joonsoo Kim 

This patchset fixes some problems of the patchset
"mm: balance LRU lists based on relative thrashing", which is now merged
in mainline.

Patch "mm: workingset: let cache workingset challenge anon fix" is
the result of a discussion with Johannes. See the following link.

http://lkml.kernel.org/r/20200520232525.798933-6-han...@cmpxchg.org

And the other two are minor things that I found when trying
to rebase my patchset.

Johannes Weiner (1):
  mm: workingset: age nonresident information alongside anonymous pages

Joonsoo Kim (2):
  mm/swap: fix for "mm: workingset: age nonresident information
alongside anonymous pages"
  mm/memory: fix IO cost for anonymous page

 include/linux/mmzone.h |  4 ++--
 include/linux/swap.h   |  1 +
 mm/memory.c|  8 
 mm/swap.c  |  3 +--
 mm/vmscan.c|  3 +++
 mm/workingset.c| 46 +++---
 6 files changed, 42 insertions(+), 23 deletions(-)

-- 
2.7.4



[PATCH for v5.8 3/3] mm/memory: fix IO cost for anonymous page

2020-06-16 Thread js1304
From: Joonsoo Kim 

With a synchronous IO swap device, swap-in is handled directly in the fault
code. Since the IO cost is not noted there, LRU balancing could be
wrongly biased with such a device. Fix it by counting the cost
in the fault code.

Signed-off-by: Joonsoo Kim 
---
 mm/memory.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/mm/memory.c b/mm/memory.c
index bc6a471..3359057 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3143,6 +3143,14 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
if (err)
goto out_page;
 
+   /*
+* XXX: Move to lru_cache_add() when it
+* supports new vs putback
+*/
+   spin_lock_irq(_pgdat(page)->lru_lock);
+   lru_note_cost_page(page);
+   spin_unlock_irq(_pgdat(page)->lru_lock);
+
lru_cache_add(page);
swap_readpage(page, true);
}
-- 
2.7.4



[PATCH v2 12/12] mm/page_alloc: use standard migration target allocation function directly

2020-05-27 Thread js1304
From: Joonsoo Kim 

There is no need for a separate wrapper just to call the standard migration
target allocation function. Use the standard one directly.

Signed-off-by: Joonsoo Kim 
---
 include/linux/page-isolation.h |  2 --
 mm/page_alloc.c|  9 +++--
 mm/page_isolation.c| 11 ---
 3 files changed, 7 insertions(+), 15 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 35e3bdb..20a4b63 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -62,6 +62,4 @@ undo_isolate_page_range(unsigned long start_pfn, unsigned 
long end_pfn,
 int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
int isol_flags);
 
-struct page *alloc_migrate_target(struct page *page, struct alloc_control *ac);
-
 #endif
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9803158..3f5cfab 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8359,6 +8359,11 @@ static int __alloc_contig_migrate_range(struct 
compact_control *cc,
unsigned long pfn = start;
unsigned int tries = 0;
int ret = 0;
+   struct alloc_control ac = {
+   .nid = zone_to_nid(cc->zone),
+   .nmask = _states[N_MEMORY],
+   .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
+   };
 
migrate_prep();
 
@@ -8385,8 +8390,8 @@ static int __alloc_contig_migrate_range(struct 
compact_control *cc,
>migratepages);
cc->nr_migratepages -= nr_reclaimed;
 
-   ret = migrate_pages(>migratepages, alloc_migrate_target,
-   NULL, NULL, cc->mode, MR_CONTIG_RANGE);
+   ret = migrate_pages(>migratepages, alloc_migration_target,
+   NULL, , cc->mode, MR_CONTIG_RANGE);
}
if (ret < 0) {
putback_movable_pages(>migratepages);
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index aba799d..03d6cad 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -297,14 +297,3 @@ int test_pages_isolated(unsigned long start_pfn, unsigned 
long end_pfn,
 
return pfn < end_pfn ? -EBUSY : 0;
 }
-
-struct page *alloc_migrate_target(struct page *page, struct alloc_control 
*__ac)
-{
-   struct alloc_control ac = {
-   .nid = page_to_nid(page),
-   .nmask = _states[N_MEMORY],
-   .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
-   };
-
-   return alloc_migration_target(page, );
-}
-- 
2.7.4



[PATCH v2 07/12] mm/hugetlb: do not modify user provided gfp_mask

2020-05-27 Thread js1304
From: Joonsoo Kim 

It's not good practice to modify user input. Instead of using the caller's
gfp_mask to build the correct mask for the APIs, this patch introduces another
field, __gfp_mask, for internal usage.
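
The pattern is simply "copy, then modify the copy". The compact userspace
sketch below mirrors the field names of the patch, but the flag values and the
helper are illustrative assumptions rather than kernel code.

#include <stdio.h>

typedef unsigned int gfp_t;

#define __GFP_THISNODE  0x1u            /* made-up values for illustration */
#define __GFP_MOVABLE   0x2u

struct alloc_control {
        int nid;
        gfp_t gfp_mask;                 /* owned by the caller, read-only here */
        int skip_cma;
        gfp_t __gfp_mask;               /* internal working copy */
};

static void setup_internal_mask(struct alloc_control *ac, gfp_t base_mask)
{
        ac->__gfp_mask = base_mask | ac->gfp_mask;
        if (ac->nid == -1)                      /* NUMA_NO_NODE stand-in */
                ac->__gfp_mask &= ~__GFP_THISNODE;
        if (ac->skip_cma)
                ac->__gfp_mask &= ~__GFP_MOVABLE;
}

int main(void)
{
        struct alloc_control ac = {
                .nid = -1,
                .gfp_mask = __GFP_THISNODE | __GFP_MOVABLE,
                .skip_cma = 1,
        };

        setup_internal_mask(&ac, 0);
        printf("caller mask unchanged: %#x, internal mask: %#x\n",
               ac.gfp_mask, ac.__gfp_mask);
        return 0;
}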

Signed-off-by: Joonsoo Kim 
---
 mm/hugetlb.c  | 19 ++-
 mm/internal.h |  2 ++
 2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index e465582..4757e72 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1068,15 +1068,15 @@ static struct page *dequeue_huge_page_nodemask(struct 
hstate *h,
struct zoneref *z;
int node = NUMA_NO_NODE;
 
-   zonelist = node_zonelist(ac->nid, ac->gfp_mask);
+   zonelist = node_zonelist(ac->nid, ac->__gfp_mask);
 
 retry_cpuset:
cpuset_mems_cookie = read_mems_allowed_begin();
for_each_zone_zonelist_nodemask(zone, z, zonelist,
-   gfp_zone(ac->gfp_mask), ac->nmask) {
+   gfp_zone(ac->__gfp_mask), ac->nmask) {
struct page *page;
 
-   if (!cpuset_zone_allowed(zone, ac->gfp_mask))
+   if (!cpuset_zone_allowed(zone, ac->__gfp_mask))
continue;
/*
 * no need to ask again on the same node. Pool is node rather 
than
@@ -1127,8 +1127,8 @@ static struct page *dequeue_huge_page_vma(struct hstate 
*h,
if (avoid_reserve && h->free_huge_pages - h->resv_huge_pages == 0)
goto err;
 
-   ac.gfp_mask = htlb_alloc_mask(h);
-   ac.nid = huge_node(vma, address, ac.gfp_mask, , );
+   ac.__gfp_mask = htlb_alloc_mask(h);
+   ac.nid = huge_node(vma, address, ac.__gfp_mask, , );
 
page = dequeue_huge_page_nodemask(h, );
if (page && !avoid_reserve && vma_has_reserves(vma, chg)) {
@@ -1951,7 +1951,7 @@ static struct page *alloc_migrate_huge_page(struct hstate 
*h,
if (hstate_is_gigantic(h))
return NULL;
 
-   page = alloc_fresh_huge_page(h, ac->gfp_mask,
+   page = alloc_fresh_huge_page(h, ac->__gfp_mask,
ac->nid, ac->nmask, NULL);
if (!page)
return NULL;
@@ -1989,9 +1989,10 @@ struct page *alloc_buddy_huge_page_with_mpol(struct 
hstate *h,
 struct page *alloc_huge_page_nodemask(struct hstate *h,
struct alloc_control *ac)
 {
-   ac->gfp_mask |= htlb_alloc_mask(h);
+   ac->__gfp_mask = htlb_alloc_mask(h);
+   ac->__gfp_mask |= ac->gfp_mask;
if (ac->nid == NUMA_NO_NODE)
-   ac->gfp_mask &= ~__GFP_THISNODE;
+   ac->__gfp_mask &= ~__GFP_THISNODE;
 
spin_lock(_lock);
if (h->free_huge_pages - h->resv_huge_pages > 0) {
@@ -2010,7 +2011,7 @@ struct page *alloc_huge_page_nodemask(struct hstate *h,
 * will not come from CMA area
 */
if (ac->skip_cma)
-   ac->gfp_mask &= ~__GFP_MOVABLE;
+   ac->__gfp_mask &= ~__GFP_MOVABLE;
 
return alloc_migrate_huge_page(h, ac);
 }
diff --git a/mm/internal.h b/mm/internal.h
index 159cfd6..2dc0268 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -619,6 +619,8 @@ struct alloc_control {
nodemask_t *nmask;
gfp_t gfp_mask;
bool skip_cma;
+
+   gfp_t __gfp_mask;   /* Used internally in API implementation */
 };
 
 #endif /* __MM_INTERNAL_H */
-- 
2.7.4



[PATCH v2 10/12] mm/gup: use standard migration target allocation function

2020-05-27 Thread js1304
From: Joonsoo Kim 

There is no reason to implement its own function for migration
target allocation. Use the standard one.

Signed-off-by: Joonsoo Kim 
---
 mm/gup.c | 61 ++---
 1 file changed, 10 insertions(+), 51 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index a49d7ea..0e4214d 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1593,58 +1593,16 @@ static bool check_dax_vmas(struct vm_area_struct 
**vmas, long nr_pages)
 }
 
 #ifdef CONFIG_CMA
-static struct page *new_non_cma_page(struct page *page,
+static struct page *alloc_migration_target_non_cma(struct page *page,
struct alloc_control *ac)
 {
-   /*
-* We want to make sure we allocate the new page from the same node
-* as the source page.
-*/
-   int nid = page_to_nid(page);
-   /*
-* Trying to allocate a page for migration. Ignore allocation
-* failure warnings. We don't force __GFP_THISNODE here because
-* this node here is the node where we have CMA reservation and
-* in some case these nodes will have really less non movable
-* allocation memory.
-*/
-   gfp_t gfp_mask = GFP_USER | __GFP_NOWARN;
-
-   if (PageHighMem(page))
-   gfp_mask |= __GFP_HIGHMEM;
-
-   if (PageHuge(page)) {
-   struct hstate *h = page_hstate(page);
-   struct alloc_control ac = {
-   .nid = nid,
-   .nmask = NULL,
-   .gfp_mask = __GFP_NOWARN,
-   .skip_cma = true,
-   };
-
-   return alloc_huge_page_nodemask(h, );
-   }
-
-   if (PageTransHuge(page)) {
-   struct page *thp;
-   /*
-* ignore allocation failure warnings
-*/
-   gfp_t thp_gfpmask = GFP_TRANSHUGE | __GFP_NOWARN;
-
-   /*
-* Remove the movable mask so that we don't allocate from
-* CMA area again.
-*/
-   thp_gfpmask &= ~__GFP_MOVABLE;
-   thp = __alloc_pages_node(nid, thp_gfpmask, HPAGE_PMD_ORDER);
-   if (!thp)
-   return NULL;
-   prep_transhuge_page(thp);
-   return thp;
-   }
+   struct alloc_control __ac = {
+   .nid = page_to_nid(page),
+   .gfp_mask = GFP_USER | __GFP_NOWARN,
+   .skip_cma = true,
+   };
 
-   return __alloc_pages_node(nid, gfp_mask, 0);
+   return alloc_migration_target(page, &__ac);
 }
 
 static long check_and_migrate_cma_pages(struct task_struct *tsk,
@@ -1706,8 +1664,9 @@ static long check_and_migrate_cma_pages(struct 
task_struct *tsk,
for (i = 0; i < nr_pages; i++)
put_page(pages[i]);
 
-   if (migrate_pages(_page_list, new_non_cma_page,
- NULL, NULL, MIGRATE_SYNC, MR_CONTIG_RANGE)) {
+   if (migrate_pages(_page_list,
+   alloc_migration_target_non_cma, NULL, NULL,
+   MIGRATE_SYNC, MR_CONTIG_RANGE)) {
/*
 * some of the pages failed migration. Do get_user_pages
 * without migration.
-- 
2.7.4



[PATCH v2 11/12] mm/mempolicy: use standard migration target allocation function

2020-05-27 Thread js1304
From: Joonsoo Kim 

There is no reason to implement its own function for migration
target allocation. Use the standard one.

Signed-off-by: Joonsoo Kim 
---
 mm/internal.h  |  3 ---
 mm/mempolicy.c | 32 +++-
 mm/migrate.c   |  3 ++-
 3 files changed, 5 insertions(+), 33 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 6f5d810..82495ee 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -612,9 +612,6 @@ static inline bool is_migrate_highatomic_page(struct page 
*page)
 }
 
 void setup_zone_pageset(struct zone *zone);
-struct alloc_control;
-extern struct page *alloc_new_node_page(struct page *page,
-   struct alloc_control *ac);
 
 struct alloc_control {
int nid;/* preferred node id */
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index e50c3eb..27329bdf 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1065,33 +1065,6 @@ static int migrate_page_add(struct page *page, struct 
list_head *pagelist,
return 0;
 }
 
-/* page allocation callback for NUMA node migration */
-struct page *alloc_new_node_page(struct page *page, struct alloc_control *__ac)
-{
-   if (PageHuge(page)) {
-   struct hstate *h = page_hstate(compound_head(page));
-   struct alloc_control ac = {
-   .nid = __ac->nid,
-   .nmask = NULL,
-   .gfp_mask = __GFP_THISNODE,
-   };
-
-   return alloc_huge_page_nodemask(h, );
-   } else if (PageTransHuge(page)) {
-   struct page *thp;
-
-   thp = alloc_pages_node(__ac->nid,
-   (GFP_TRANSHUGE | __GFP_THISNODE),
-   HPAGE_PMD_ORDER);
-   if (!thp)
-   return NULL;
-   prep_transhuge_page(thp);
-   return thp;
-   } else
-   return __alloc_pages_node(__ac->nid, GFP_HIGHUSER_MOVABLE |
-   __GFP_THISNODE, 0);
-}
-
 /*
  * Migrate pages from one node to a target node.
  * Returns error or the number of pages not migrated.
@@ -1104,6 +1077,7 @@ static int migrate_to_node(struct mm_struct *mm, int 
source, int dest,
int err = 0;
struct alloc_control ac = {
.nid = dest,
+   .gfp_mask = GFP_HIGHUSER_MOVABLE | __GFP_THISNODE,
};
 
nodes_clear(nmask);
@@ -1119,8 +1093,8 @@ static int migrate_to_node(struct mm_struct *mm, int 
source, int dest,
flags | MPOL_MF_DISCONTIG_OK, );
 
if (!list_empty()) {
-   err = migrate_pages(, alloc_new_node_page, NULL, ,
-   MIGRATE_SYNC, MR_SYSCALL);
+   err = migrate_pages(, alloc_migration_target, NULL,
+   , MIGRATE_SYNC, MR_SYSCALL);
if (err)
putback_movable_pages();
}
diff --git a/mm/migrate.c b/mm/migrate.c
index 780135a..393f592 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1590,9 +1590,10 @@ static int do_move_pages_to_node(struct mm_struct *mm,
int err;
struct alloc_control ac = {
.nid = node,
+   .gfp_mask = GFP_HIGHUSER_MOVABLE | __GFP_THISNODE,
};
 
-   err = migrate_pages(pagelist, alloc_new_node_page, NULL, ,
+   err = migrate_pages(pagelist, alloc_migration_target, NULL, ,
MIGRATE_SYNC, MR_SYSCALL);
if (err)
putback_movable_pages(pagelist);
-- 
2.7.4



[PATCH v2 08/12] mm/migrate: change the interface of the migration target alloc/free functions

2020-05-27 Thread js1304
From: Joonsoo Kim 

To prepare for unifying the duplicated functions in the following patches,
this patch changes the interface of the migration target alloc/free functions.
The functions now take a struct alloc_control argument.

There is no functional change.

Signed-off-by: Joonsoo Kim 
---
 include/linux/migrate.h| 15 +++--
 include/linux/page-isolation.h |  4 +++-
 mm/compaction.c| 15 -
 mm/gup.c   |  5 +++--
 mm/internal.h  |  5 -
 mm/memory-failure.c| 13 ++-
 mm/memory_hotplug.c|  9 +---
 mm/mempolicy.c | 22 +++---
 mm/migrate.c   | 51 ++
 mm/page_alloc.c|  2 +-
 mm/page_isolation.c|  9 +---
 11 files changed, 89 insertions(+), 61 deletions(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 1d70b4a..923c4f3 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -7,8 +7,9 @@
 #include 
 #include 
 
-typedef struct page *new_page_t(struct page *page, unsigned long private);
-typedef void free_page_t(struct page *page, unsigned long private);
+struct alloc_control;
+typedef struct page *new_page_t(struct page *page, struct alloc_control *ac);
+typedef void free_page_t(struct page *page, struct alloc_control *ac);
 
 /*
  * Return values from addresss_space_operations.migratepage():
@@ -38,9 +39,9 @@ extern int migrate_page(struct address_space *mapping,
struct page *newpage, struct page *page,
enum migrate_mode mode);
 extern int migrate_pages(struct list_head *l, new_page_t new, free_page_t free,
-   unsigned long private, enum migrate_mode mode, int reason);
+   struct alloc_control *ac, enum migrate_mode mode, int reason);
 extern struct page *new_page_nodemask(struct page *page,
-   int preferred_nid, nodemask_t *nodemask);
+   struct alloc_control *ac);
 extern int isolate_movable_page(struct page *page, isolate_mode_t mode);
 extern void putback_movable_page(struct page *page);
 
@@ -56,11 +57,11 @@ extern int migrate_page_move_mapping(struct address_space 
*mapping,
 
 static inline void putback_movable_pages(struct list_head *l) {}
 static inline int migrate_pages(struct list_head *l, new_page_t new,
-   free_page_t free, unsigned long private, enum migrate_mode mode,
-   int reason)
+   free_page_t free, struct alloc_control *ac,
+   enum migrate_mode mode, int reason)
{ return -ENOSYS; }
 static inline struct page *new_page_nodemask(struct page *page,
-   int preferred_nid, nodemask_t *nodemask)
+   struct alloc_control *ac)
{ return NULL; }
 static inline int isolate_movable_page(struct page *page, isolate_mode_t mode)
{ return -EBUSY; }
diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 5724580..35e3bdb 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -2,6 +2,8 @@
 #ifndef __LINUX_PAGEISOLATION_H
 #define __LINUX_PAGEISOLATION_H
 
+struct alloc_control;
+
 #ifdef CONFIG_MEMORY_ISOLATION
 static inline bool has_isolate_pageblock(struct zone *zone)
 {
@@ -60,6 +62,6 @@ undo_isolate_page_range(unsigned long start_pfn, unsigned 
long end_pfn,
 int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
int isol_flags);
 
-struct page *alloc_migrate_target(struct page *page, unsigned long private);
+struct page *alloc_migrate_target(struct page *page, struct alloc_control *ac);
 
 #endif
diff --git a/mm/compaction.c b/mm/compaction.c
index 9ce4cff..538ed7b 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1563,9 +1563,9 @@ static void isolate_freepages(struct compact_control *cc)
  * from the isolated freelists in the block we are migrating to.
  */
 static struct page *compaction_alloc(struct page *migratepage,
-   unsigned long data)
+   struct alloc_control *ac)
 {
-   struct compact_control *cc = (struct compact_control *)data;
+   struct compact_control *cc = (struct compact_control *)ac->private;
struct page *freepage;
 
if (list_empty(>freepages)) {
@@ -1587,9 +1587,9 @@ static struct page *compaction_alloc(struct page 
*migratepage,
  * freelist.  All pages on the freelist are from the same zone, so there is no
  * special handling needed for NUMA.
  */
-static void compaction_free(struct page *page, unsigned long data)
+static void compaction_free(struct page *page, struct alloc_control *ac)
 {
-   struct compact_control *cc = (struct compact_control *)data;
+   struct compact_control *cc = (struct compact_control *)ac->private;
 
list_add(>lru, >freepages);
cc->nr_freepages++;
@@ -2097,6 +2097,9 @@ 

[PATCH v2 02/12] mm/migrate: move migration helper from .h to .c

2020-05-27 Thread js1304
From: Joonsoo Kim 

It's not a performance-sensitive function, so move it to a .c file.
This is a preparation step for a future change.

Acked-by: Mike Kravetz 
Signed-off-by: Joonsoo Kim 
---
 include/linux/migrate.h | 33 +
 mm/migrate.c| 29 +
 2 files changed, 34 insertions(+), 28 deletions(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 3e546cb..1d70b4a 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -31,34 +31,6 @@ enum migrate_reason {
 /* In mm/debug.c; also keep sync with include/trace/events/migrate.h */
 extern const char *migrate_reason_names[MR_TYPES];
 
-static inline struct page *new_page_nodemask(struct page *page,
-   int preferred_nid, nodemask_t *nodemask)
-{
-   gfp_t gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL;
-   unsigned int order = 0;
-   struct page *new_page = NULL;
-
-   if (PageHuge(page))
-   return 
alloc_huge_page_nodemask(page_hstate(compound_head(page)),
-   preferred_nid, nodemask);
-
-   if (PageTransHuge(page)) {
-   gfp_mask |= GFP_TRANSHUGE;
-   order = HPAGE_PMD_ORDER;
-   }
-
-   if (PageHighMem(page) || (zone_idx(page_zone(page)) == ZONE_MOVABLE))
-   gfp_mask |= __GFP_HIGHMEM;
-
-   new_page = __alloc_pages_nodemask(gfp_mask, order,
-   preferred_nid, nodemask);
-
-   if (new_page && PageTransHuge(new_page))
-   prep_transhuge_page(new_page);
-
-   return new_page;
-}
-
 #ifdef CONFIG_MIGRATION
 
 extern void putback_movable_pages(struct list_head *l);
@@ -67,6 +39,8 @@ extern int migrate_page(struct address_space *mapping,
enum migrate_mode mode);
 extern int migrate_pages(struct list_head *l, new_page_t new, free_page_t free,
unsigned long private, enum migrate_mode mode, int reason);
+extern struct page *new_page_nodemask(struct page *page,
+   int preferred_nid, nodemask_t *nodemask);
 extern int isolate_movable_page(struct page *page, isolate_mode_t mode);
 extern void putback_movable_page(struct page *page);
 
@@ -85,6 +59,9 @@ static inline int migrate_pages(struct list_head *l, 
new_page_t new,
free_page_t free, unsigned long private, enum migrate_mode mode,
int reason)
{ return -ENOSYS; }
+static inline struct page *new_page_nodemask(struct page *page,
+   int preferred_nid, nodemask_t *nodemask)
+   { return NULL; }
 static inline int isolate_movable_page(struct page *page, isolate_mode_t mode)
{ return -EBUSY; }
 
diff --git a/mm/migrate.c b/mm/migrate.c
index 22a26a5..824c22e 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1537,6 +1537,35 @@ int migrate_pages(struct list_head *from, new_page_t 
get_new_page,
return rc;
 }
 
+struct page *new_page_nodemask(struct page *page,
+   int preferred_nid, nodemask_t *nodemask)
+{
+   gfp_t gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL;
+   unsigned int order = 0;
+   struct page *new_page = NULL;
+
+   if (PageHuge(page))
+   return alloc_huge_page_nodemask(
+   page_hstate(compound_head(page)),
+   preferred_nid, nodemask);
+
+   if (PageTransHuge(page)) {
+   gfp_mask |= GFP_TRANSHUGE;
+   order = HPAGE_PMD_ORDER;
+   }
+
+   if (PageHighMem(page) || (zone_idx(page_zone(page)) == ZONE_MOVABLE))
+   gfp_mask |= __GFP_HIGHMEM;
+
+   new_page = __alloc_pages_nodemask(gfp_mask, order,
+   preferred_nid, nodemask);
+
+   if (new_page && PageTransHuge(new_page))
+   prep_transhuge_page(new_page);
+
+   return new_page;
+}
+
 #ifdef CONFIG_NUMA
 
 static int store_status(int __user *status, int start, int value, int nr)
-- 
2.7.4



[PATCH v2 09/12] mm/migrate: make standard migration target allocation functions

2020-05-27 Thread js1304
From: Joonsoo Kim 

There are several similar functions for migration target allocation. Since
there is no fundamental difference between them, it's better to keep just one
rather than keeping all the variants. This patch implements the base migration
target allocation function. In the following patches, the variants will be
converted to use this function.

Note that the PageHighMem() call in the previous function is changed to an
open-coded is_highmem_idx() check, which is more readable.

Signed-off-by: Joonsoo Kim 
---
 include/linux/migrate.h |  6 +++---
 mm/memory-failure.c |  3 ++-
 mm/memory_hotplug.c |  3 ++-
 mm/migrate.c| 24 +---
 mm/page_isolation.c |  3 ++-
 5 files changed, 22 insertions(+), 17 deletions(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 923c4f3..abf09b3 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -40,8 +40,8 @@ extern int migrate_page(struct address_space *mapping,
enum migrate_mode mode);
 extern int migrate_pages(struct list_head *l, new_page_t new, free_page_t free,
struct alloc_control *ac, enum migrate_mode mode, int reason);
-extern struct page *new_page_nodemask(struct page *page,
-   struct alloc_control *ac);
+extern struct page *alloc_migration_target(struct page *page,
+   struct alloc_control *ac);
 extern int isolate_movable_page(struct page *page, isolate_mode_t mode);
 extern void putback_movable_page(struct page *page);
 
@@ -60,7 +60,7 @@ static inline int migrate_pages(struct list_head *l, 
new_page_t new,
free_page_t free, struct alloc_control *ac,
enum migrate_mode mode, int reason)
{ return -ENOSYS; }
-static inline struct page *new_page_nodemask(struct page *page,
+static inline struct page *alloc_migration_target(struct page *page,
struct alloc_control *ac)
{ return NULL; }
 static inline int isolate_movable_page(struct page *page, isolate_mode_t mode)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 0d5d59b..a75de67 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1639,9 +1639,10 @@ static struct page *new_page(struct page *p, struct 
alloc_control *__ac)
struct alloc_control ac = {
.nid = page_to_nid(p),
.nmask = _states[N_MEMORY],
+   .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
};
 
-   return new_page_nodemask(p, );
+   return alloc_migration_target(p, );
 }
 
 /*
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 89642f9..185f4c9 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1249,7 +1249,8 @@ static struct page *new_node_page(struct page *page, 
struct alloc_control *__ac)
 
ac.nid = nid;
ac.nmask = 
-   return new_page_nodemask(page, );
+   ac.gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL;
+   return alloc_migration_target(page, );
 }
 
 static int
diff --git a/mm/migrate.c b/mm/migrate.c
index 9d6ed94..780135a 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1537,31 +1537,33 @@ int migrate_pages(struct list_head *from, new_page_t 
get_new_page,
return rc;
 }
 
-struct page *new_page_nodemask(struct page *page, struct alloc_control *ac)
+struct page *alloc_migration_target(struct page *page, struct alloc_control 
*ac)
 {
-   gfp_t gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL;
unsigned int order = 0;
struct page *new_page = NULL;
+   int zidx;
 
+   /* hugetlb has it's own gfp handling logic */
if (PageHuge(page)) {
struct hstate *h = page_hstate(compound_head(page));
-   struct alloc_control __ac = {
-   .nid = ac->nid,
-   .nmask = ac->nmask,
-   };
 
-   return alloc_huge_page_nodemask(h, &__ac);
+   return alloc_huge_page_nodemask(h, ac);
}
 
+   ac->__gfp_mask = ac->gfp_mask;
if (PageTransHuge(page)) {
-   gfp_mask |= GFP_TRANSHUGE;
+   ac->__gfp_mask |= GFP_TRANSHUGE;
order = HPAGE_PMD_ORDER;
}
+   zidx = zone_idx(page_zone(page));
+   if (is_highmem_idx(zidx) || zidx == ZONE_MOVABLE)
+   ac->__gfp_mask |= __GFP_HIGHMEM;
 
-   if (PageHighMem(page) || (zone_idx(page_zone(page)) == ZONE_MOVABLE))
-   gfp_mask |= __GFP_HIGHMEM;
+   if (ac->skip_cma)
+   ac->__gfp_mask &= ~__GFP_MOVABLE;
 
-   new_page = __alloc_pages_nodemask(gfp_mask, order, ac->nid, ac->nmask);
+   new_page = __alloc_pages_nodemask(ac->__gfp_mask, order,
+   ac->nid, ac->nmask);
 
if (new_page && PageTransHuge(new_page))
prep_transhuge_page(new_page);
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 1e1828b..aba799d 100644

[PATCH v2 00/12] clean-up the migration target allocation functions

2020-05-27 Thread js1304
From: Joonsoo Kim 

This patchset cleans up the migration target allocation functions.

* Changes on v2
- add acked-by tags
- fix a missing compound_head() call in patch #3
- remove the thisnode field from alloc_control and use __GFP_THISNODE directly
- fix missing __gfp_mask setup for the patch
"mm/hugetlb: do not modify user provided gfp_mask"

* Cover-letter

Contributions of this patchset are:
1. unify the two hugetlb alloc functions; as a result, only one remains.
2. make the one remaining external hugetlb alloc function internal.
3. unify the three functions for migration target allocation.

The patchset is based on next-20200526.
The patchset is available on:

https://github.com/JoonsooKim/linux/tree/cleanup-migration-target-allocation-v2.00-next-20200526

Thanks.

Joonsoo Kim (12):
  mm/page_isolation: prefer the node of the source page
  mm/migrate: move migration helper from .h to .c
  mm/hugetlb: introduce alloc_control structure to simplify migration
target allocation APIs
  mm/hugetlb: use provided ac->gfp_mask for allocation
  mm/hugetlb: unify hugetlb migration callback function
  mm/hugetlb: make hugetlb migration target allocation APIs CMA aware
  mm/hugetlb: do not modify user provided gfp_mask
  mm/migrate: change the interface of the migration target alloc/free
functions
  mm/migrate: make standard migration target allocation functions
  mm/gup: use standard migration target allocation function
  mm/mempolicy: use standard migration target allocation function
  mm/page_alloc: use standard migration target allocation function
directly

 include/linux/hugetlb.h| 33 ++-
 include/linux/migrate.h| 44 +---
 include/linux/page-isolation.h |  4 +-
 mm/compaction.c| 15 ---
 mm/gup.c   | 60 +---
 mm/hugetlb.c   | 91 --
 mm/internal.h  | 12 +-
 mm/memory-failure.c| 14 ---
 mm/memory_hotplug.c| 10 +++--
 mm/mempolicy.c | 38 ++
 mm/migrate.c   | 72 +
 mm/page_alloc.c|  9 -
 mm/page_isolation.c|  5 ---
 13 files changed, 191 insertions(+), 216 deletions(-)

-- 
2.7.4



[PATCH v2 03/12] mm/hugetlb: introduce alloc_control structure to simplify migration target allocation APIs

2020-05-27 Thread js1304
From: Joonsoo Kim 

Currently, the page allocation functions for migration require several
arguments. Worse, in the following patches, even more arguments would be
needed to unify the similar functions. To simplify them, this patch introduces
a unified data structure, struct alloc_control, that controls the allocation
behaviour.

For clean-up, the function declarations are also re-ordered.

Signed-off-by: Joonsoo Kim 
---
 include/linux/hugetlb.h | 35 +++-
 mm/gup.c| 11 ++---
 mm/hugetlb.c| 62 -
 mm/internal.h   |  7 ++
 mm/mempolicy.c  | 13 +++
 mm/migrate.c| 13 +++
 6 files changed, 83 insertions(+), 58 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 50650d0..15c8fb8 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -14,6 +14,7 @@
 struct ctl_table;
 struct user_struct;
 struct mmu_gather;
+struct alloc_control;
 
 #ifndef is_hugepd
 typedef struct { unsigned long pd; } hugepd_t;
@@ -502,15 +503,16 @@ struct huge_bootmem_page {
struct hstate *hstate;
 };
 
-struct page *alloc_huge_page(struct vm_area_struct *vma,
-   unsigned long addr, int avoid_reserve);
-struct page *alloc_huge_page_node(struct hstate *h, int nid);
-struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
-   nodemask_t *nmask);
+struct page *alloc_migrate_huge_page(struct hstate *h,
+   struct alloc_control *ac);
+struct page *alloc_huge_page_node(struct hstate *h,
+   struct alloc_control *ac);
+struct page *alloc_huge_page_nodemask(struct hstate *h,
+   struct alloc_control *ac);
 struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma,
unsigned long address);
-struct page *alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask,
-int nid, nodemask_t *nmask);
+struct page *alloc_huge_page(struct vm_area_struct *vma,
+   unsigned long addr, int avoid_reserve);
 int huge_add_to_page_cache(struct page *page, struct address_space *mapping,
pgoff_t idx);
 
@@ -752,20 +754,14 @@ static inline void huge_ptep_modify_prot_commit(struct 
vm_area_struct *vma,
 #else  /* CONFIG_HUGETLB_PAGE */
 struct hstate {};
 
-static inline struct page *alloc_huge_page(struct vm_area_struct *vma,
-  unsigned long addr,
-  int avoid_reserve)
-{
-   return NULL;
-}
-
-static inline struct page *alloc_huge_page_node(struct hstate *h, int nid)
+static inline struct page *
+alloc_huge_page_node(struct hstate *h, struct alloc_control *ac)
 {
return NULL;
 }
 
 static inline struct page *
-alloc_huge_page_nodemask(struct hstate *h, int preferred_nid, nodemask_t 
*nmask)
+alloc_huge_page_nodemask(struct hstate *h, struct alloc_control *ac)
 {
return NULL;
 }
@@ -777,6 +773,13 @@ static inline struct page *alloc_huge_page_vma(struct 
hstate *h,
return NULL;
 }
 
+static inline struct page *alloc_huge_page(struct vm_area_struct *vma,
+  unsigned long addr,
+  int avoid_reserve)
+{
+   return NULL;
+}
+
 static inline int __alloc_bootmem_huge_page(struct hstate *h)
 {
return 0;
diff --git a/mm/gup.c b/mm/gup.c
index ee039d4..6b78f11 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1612,16 +1612,21 @@ static struct page *new_non_cma_page(struct page *page, 
unsigned long private)
if (PageHighMem(page))
gfp_mask |= __GFP_HIGHMEM;
 
-#ifdef CONFIG_HUGETLB_PAGE
if (PageHuge(page)) {
struct hstate *h = page_hstate(page);
+   struct alloc_control ac = {
+   .nid = nid,
+   .nmask = NULL,
+   .gfp_mask = gfp_mask,
+   };
+
/*
 * We don't want to dequeue from the pool because pool pages 
will
 * mostly be from the CMA region.
 */
-   return alloc_migrate_huge_page(h, gfp_mask, nid, NULL);
+   return alloc_migrate_huge_page(h, &ac);
}
-#endif
+
if (PageTransHuge(page)) {
struct page *thp;
/*
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 57ece74..453ba94 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1053,8 +1053,8 @@ static struct page *dequeue_huge_page_node_exact(struct 
hstate *h, int nid)
return page;
 }
 
-static struct page *dequeue_huge_page_nodemask(struct hstate *h, gfp_t 
gfp_mask, int nid,
-   nodemask_t *nmask)
+static struct page *dequeue_huge_page_nodemask(struct hstate *h,
+   struct 

[PATCH v2 04/12] mm/hugetlb: use provided ac->gfp_mask for allocation

2020-05-27 Thread js1304
From: Joonsoo Kim 

The gfp_mask handling in alloc_huge_page_(node|nodemask) is slightly
changed from assignment to OR. This is safe since the callers of these
functions don't pass any extra gfp_mask other than htlb_alloc_mask().

This is a preparation step for the following patches.

Signed-off-by: Joonsoo Kim 
---
 mm/hugetlb.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 453ba94..dabe460 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1985,7 +1985,7 @@ struct page *alloc_huge_page_node(struct hstate *h,
 {
struct page *page = NULL;
 
-   ac->gfp_mask = htlb_alloc_mask(h);
+   ac->gfp_mask |= htlb_alloc_mask(h);
if (ac->nid != NUMA_NO_NODE)
ac->gfp_mask |= __GFP_THISNODE;
 
@@ -2004,7 +2004,7 @@ struct page *alloc_huge_page_node(struct hstate *h,
 struct page *alloc_huge_page_nodemask(struct hstate *h,
struct alloc_control *ac)
 {
-   ac->gfp_mask = htlb_alloc_mask(h);
+   ac->gfp_mask |= htlb_alloc_mask(h);
 
spin_lock(&hugetlb_lock);
if (h->free_huge_pages - h->resv_huge_pages > 0) {
-- 
2.7.4



[PATCH v2 01/12] mm/page_isolation: prefer the node of the source page

2020-05-27 Thread js1304
From: Joonsoo Kim 

For locality, it's better to migrate the page to the same node as the
source page rather than to the node of the current caller's CPU.

Acked-by: Roman Gushchin 
Signed-off-by: Joonsoo Kim 
---
 mm/page_isolation.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 2c11a38..7df89bd 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -300,5 +300,7 @@ int test_pages_isolated(unsigned long start_pfn, unsigned 
long end_pfn,
 
 struct page *alloc_migrate_target(struct page *page, unsigned long private)
 {
-   return new_page_nodemask(page, numa_node_id(), &node_states[N_MEMORY]);
+   int nid = page_to_nid(page);
+
+   return new_page_nodemask(page, nid, &node_states[N_MEMORY]);
 }
-- 
2.7.4



[PATCH v2 06/12] mm/hugetlb: make hugetlb migration target allocation APIs CMA aware

2020-05-27 Thread js1304
From: Joonsoo Kim 

There is a user who does not want to use CMA memory for migration. Until
now, this is implemented on the caller side, but that is not optimal since
the caller has only limited information. This patch implements it on the
callee side to get a better result.
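
As a sketch of the intended usage (condensed from the mm/gup.c hunk below;
"page" and "new_page" are placeholders for the page being migrated and its
replacement), a caller that must avoid CMA memory now only sets a flag and
lets the callee do the filtering:

	struct alloc_control ac = {
		.nid = page_to_nid(page),
		.nmask = NULL,
		.gfp_mask = __GFP_NOWARN,
		.skip_cma = true,	/* callee skips CMA pages on the free list */
	};

	new_page = alloc_huge_page_nodemask(page_hstate(compound_head(page)), &ac);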

Acked-by: Mike Kravetz 
Signed-off-by: Joonsoo Kim 
---
 include/linux/hugetlb.h |  2 --
 mm/gup.c|  9 +++--
 mm/hugetlb.c| 21 +
 mm/internal.h   |  1 +
 4 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index f482563..3d05f7d 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -503,8 +503,6 @@ struct huge_bootmem_page {
struct hstate *hstate;
 };
 
-struct page *alloc_migrate_huge_page(struct hstate *h,
-   struct alloc_control *ac);
 struct page *alloc_huge_page_nodemask(struct hstate *h,
struct alloc_control *ac);
 struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma,
diff --git a/mm/gup.c b/mm/gup.c
index 6b78f11..87eca79 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1617,14 +1617,11 @@ static struct page *new_non_cma_page(struct page *page, 
unsigned long private)
struct alloc_control ac = {
.nid = nid,
.nmask = NULL,
-   .gfp_mask = gfp_mask,
+   .gfp_mask = __GFP_NOWARN,
+   .skip_cma = true,
};
 
-   /*
-* We don't want to dequeue from the pool because pool pages 
will
-* mostly be from the CMA region.
-*/
-   return alloc_migrate_huge_page(h, &ac);
+   return alloc_huge_page_nodemask(h, &ac);
}
 
if (PageTransHuge(page)) {
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8132985..e465582 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1033,13 +1033,19 @@ static void enqueue_huge_page(struct hstate *h, struct 
page *page)
h->free_huge_pages_node[nid]++;
 }
 
-static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
+static struct page *dequeue_huge_page_node_exact(struct hstate *h,
+   int nid, bool skip_cma)
 {
struct page *page;
 
-   list_for_each_entry(page, &h->hugepage_freelists[nid], lru)
+   list_for_each_entry(page, &h->hugepage_freelists[nid], lru) {
+   if (skip_cma && is_migrate_cma_page(page))
+   continue;
+
if (!PageHWPoison(page))
break;
+   }
+
/*
 * if 'non-isolated free hugepage' not found on the list,
 * the allocation fails.
@@ -1080,7 +1086,7 @@ static struct page *dequeue_huge_page_nodemask(struct 
hstate *h,
continue;
node = zone_to_nid(zone);
 
-   page = dequeue_huge_page_node_exact(h, node);
+   page = dequeue_huge_page_node_exact(h, node, ac->skip_cma);
if (page)
return page;
}
@@ -1937,7 +1943,7 @@ static struct page *alloc_surplus_huge_page(struct hstate 
*h, gfp_t gfp_mask,
return page;
 }
 
-struct page *alloc_migrate_huge_page(struct hstate *h,
+static struct page *alloc_migrate_huge_page(struct hstate *h,
struct alloc_control *ac)
 {
struct page *page;
@@ -1999,6 +2005,13 @@ struct page *alloc_huge_page_nodemask(struct hstate *h,
}
spin_unlock(&hugetlb_lock);
 
+   /*
+* clearing the __GFP_MOVABLE flag ensures that the allocated page
+* will not come from the CMA area
+*/
+   if (ac->skip_cma)
+   ac->gfp_mask &= ~__GFP_MOVABLE;
+
return alloc_migrate_huge_page(h, ac);
 }
 
diff --git a/mm/internal.h b/mm/internal.h
index 6e613ce..159cfd6 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -618,6 +618,7 @@ struct alloc_control {
int nid;/* preferred node id */
nodemask_t *nmask;
gfp_t gfp_mask;
+   bool skip_cma;
 };
 
 #endif /* __MM_INTERNAL_H */
-- 
2.7.4



[PATCH v2 05/12] mm/hugetlb: unify hugetlb migration callback function

2020-05-27 Thread js1304
From: Joonsoo Kim 

There is no difference between the two migration callback functions,
alloc_huge_page_node() and alloc_huge_page_nodemask(), except the
__GFP_THISNODE handling. This patch moves this handling to
alloc_huge_page_nodemask() and to the callers, and then removes
alloc_huge_page_node().
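
As a sketch, a former alloc_huge_page_node() user (see the mm/mempolicy.c
hunk below; "node" and "h" come from the surrounding caller code) now
requests node-local allocation through the gfp_mask instead:

	struct alloc_control ac = {
		.nid = node,
		.nmask = NULL,
		.gfp_mask = __GFP_THISNODE,	/* dropped by the callee if nid == NUMA_NO_NODE */
	};

	page = alloc_huge_page_nodemask(h, &ac);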

Signed-off-by: Joonsoo Kim 
---
 include/linux/hugetlb.h |  8 
 mm/hugetlb.c| 23 ++-
 mm/mempolicy.c  |  3 ++-
 3 files changed, 4 insertions(+), 30 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 15c8fb8..f482563 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -505,8 +505,6 @@ struct huge_bootmem_page {
 
 struct page *alloc_migrate_huge_page(struct hstate *h,
struct alloc_control *ac);
-struct page *alloc_huge_page_node(struct hstate *h,
-   struct alloc_control *ac);
 struct page *alloc_huge_page_nodemask(struct hstate *h,
struct alloc_control *ac);
 struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma,
@@ -755,12 +753,6 @@ static inline void huge_ptep_modify_prot_commit(struct 
vm_area_struct *vma,
 struct hstate {};
 
 static inline struct page *
-alloc_huge_page_node(struct hstate *h, struct alloc_control *ac)
-{
-   return NULL;
-}
-
-static inline struct page *
 alloc_huge_page_nodemask(struct hstate *h, struct alloc_control *ac)
 {
return NULL;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index dabe460..8132985 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1980,31 +1980,12 @@ struct page *alloc_buddy_huge_page_with_mpol(struct 
hstate *h,
 }
 
 /* page migration callback function */
-struct page *alloc_huge_page_node(struct hstate *h,
-   struct alloc_control *ac)
-{
-   struct page *page = NULL;
-
-   ac->gfp_mask |= htlb_alloc_mask(h);
-   if (ac->nid != NUMA_NO_NODE)
-   ac->gfp_mask |= __GFP_THISNODE;
-
-   spin_lock(&hugetlb_lock);
-   if (h->free_huge_pages - h->resv_huge_pages > 0)
-   page = dequeue_huge_page_nodemask(h, ac);
-   spin_unlock(&hugetlb_lock);
-
-   if (!page)
-   page = alloc_migrate_huge_page(h, ac);
-
-   return page;
-}
-
-/* page migration callback function */
 struct page *alloc_huge_page_nodemask(struct hstate *h,
struct alloc_control *ac)
 {
ac->gfp_mask |= htlb_alloc_mask(h);
+   if (ac->nid == NUMA_NO_NODE)
+   ac->gfp_mask &= ~__GFP_THISNODE;
 
spin_lock(&hugetlb_lock);
if (h->free_huge_pages - h->resv_huge_pages > 0) {
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 3b6b551..e705efd 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1073,9 +1073,10 @@ struct page *alloc_new_node_page(struct page *page, 
unsigned long node)
struct alloc_control ac = {
.nid = node,
.nmask = NULL,
+   .gfp_mask = __GFP_THISNODE,
};
 
-   return alloc_huge_page_node(h, &ac);
+   return alloc_huge_page_nodemask(h, &ac);
} else if (PageTransHuge(page)) {
struct page *thp;
 
-- 
2.7.4



[PATCH 06/11] mm/hugetlb: do not modify user provided gfp_mask

2020-05-17 Thread js1304
From: Joonsoo Kim 

It's not good practice to modify user input. Instead of using the
user-provided gfp_mask to build the correct gfp_mask for the APIs, this
patch introduces another field, __gfp_mask, for internal usage.
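
The resulting flow in alloc_huge_page_nodemask() is roughly the following
(condensed from the diff below):

	/* ac->gfp_mask is provided by the caller and is never written */
	ac->__gfp_mask = htlb_alloc_mask(h) | ac->gfp_mask;
	if (ac->thisnode && ac->nid != NUMA_NO_NODE)
		ac->__gfp_mask |= __GFP_THISNODE;
	if (ac->skip_cma)
		ac->__gfp_mask &= ~__GFP_MOVABLE;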

Signed-off-by: Joonsoo Kim 
---
 mm/hugetlb.c  | 15 ---
 mm/internal.h |  2 ++
 2 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 53edd02..5f43b7e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1069,15 +1069,15 @@ static struct page *dequeue_huge_page_nodemask(struct 
hstate *h,
struct zoneref *z;
int node = NUMA_NO_NODE;
 
-   zonelist = node_zonelist(ac->nid, ac->gfp_mask);
+   zonelist = node_zonelist(ac->nid, ac->__gfp_mask);
 
 retry_cpuset:
cpuset_mems_cookie = read_mems_allowed_begin();
for_each_zone_zonelist_nodemask(zone, z, zonelist,
-   gfp_zone(ac->gfp_mask), ac->nmask) {
+   gfp_zone(ac->__gfp_mask), ac->nmask) {
struct page *page;
 
-   if (!cpuset_zone_allowed(zone, ac->gfp_mask))
+   if (!cpuset_zone_allowed(zone, ac->__gfp_mask))
continue;
/*
 * no need to ask again on the same node. Pool is node rather 
than
@@ -1952,7 +1952,7 @@ static struct page *alloc_migrate_huge_page(struct hstate 
*h,
if (hstate_is_gigantic(h))
return NULL;
 
-   page = alloc_fresh_huge_page(h, ac->gfp_mask,
+   page = alloc_fresh_huge_page(h, ac->__gfp_mask,
ac->nid, ac->nmask, NULL);
if (!page)
return NULL;
@@ -1990,9 +1990,10 @@ struct page *alloc_buddy_huge_page_with_mpol(struct 
hstate *h,
 struct page *alloc_huge_page_nodemask(struct hstate *h,
struct alloc_control *ac)
 {
-   ac->gfp_mask |= htlb_alloc_mask(h);
+   ac->__gfp_mask = htlb_alloc_mask(h);
+   ac->__gfp_mask |= ac->gfp_mask;
if (ac->thisnode && ac->nid != NUMA_NO_NODE)
-   ac->gfp_mask |= __GFP_THISNODE;
+   ac->__gfp_mask |= __GFP_THISNODE;
 
spin_lock(&hugetlb_lock);
if (h->free_huge_pages - h->resv_huge_pages > 0) {
@@ -2011,7 +2012,7 @@ struct page *alloc_huge_page_nodemask(struct hstate *h,
 * will not come from CMA area
 */
if (ac->skip_cma)
-   ac->gfp_mask &= ~__GFP_MOVABLE;
+   ac->__gfp_mask &= ~__GFP_MOVABLE;
 
return alloc_migrate_huge_page(h, ac);
 }
diff --git a/mm/internal.h b/mm/internal.h
index 6b6507e..3239d71 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -620,6 +620,8 @@ struct alloc_control {
gfp_t gfp_mask;
bool thisnode;
bool skip_cma;
+
+   gfp_t __gfp_mask;   /* Used internally in API implementation */
 };
 
 #endif /* __MM_INTERNAL_H */
-- 
2.7.4



[PATCH 10/11] mm/mempolicy: use standard migration target allocation function

2020-05-17 Thread js1304
From: Joonsoo Kim 

There is no reason to implement its own function for migration
target allocation. Use the standard one.

Signed-off-by: Joonsoo Kim 
---
 mm/internal.h  |  3 ---
 mm/mempolicy.c | 33 -
 mm/migrate.c   |  4 +++-
 3 files changed, 7 insertions(+), 33 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index abe94a7..5ade079 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -612,9 +612,6 @@ static inline bool is_migrate_highatomic_page(struct page 
*page)
 }
 
 void setup_zone_pageset(struct zone *zone);
-struct alloc_control;
-extern struct page *alloc_new_node_page(struct page *page,
-   struct alloc_control *ac);
 
 struct alloc_control {
int nid;
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 7241621..8d3ccab 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1065,33 +1065,6 @@ static int migrate_page_add(struct page *page, struct 
list_head *pagelist,
return 0;
 }
 
-/* page allocation callback for NUMA node migration */
-struct page *alloc_new_node_page(struct page *page, struct alloc_control *__ac)
-{
-   if (PageHuge(page)) {
-   struct hstate *h = page_hstate(page);
-   struct alloc_control ac = {
-   .nid = __ac->nid,
-   .nmask = NULL,
-   .thisnode = true,
-   };
-
-   return alloc_huge_page_nodemask(h, &ac);
-   } else if (PageTransHuge(page)) {
-   struct page *thp;
-
-   thp = alloc_pages_node(__ac->nid,
-   (GFP_TRANSHUGE | __GFP_THISNODE),
-   HPAGE_PMD_ORDER);
-   if (!thp)
-   return NULL;
-   prep_transhuge_page(thp);
-   return thp;
-   } else
-   return __alloc_pages_node(__ac->nid, GFP_HIGHUSER_MOVABLE |
-   __GFP_THISNODE, 0);
-}
-
 /*
  * Migrate pages from one node to a target node.
  * Returns error or the number of pages not migrated.
@@ -1104,6 +1077,8 @@ static int migrate_to_node(struct mm_struct *mm, int 
source, int dest,
int err = 0;
struct alloc_control ac = {
.nid = dest,
+   .gfp_mask = GFP_HIGHUSER_MOVABLE,
+   .thisnode = true,
};
 
nodes_clear(nmask);
@@ -1119,8 +1094,8 @@ static int migrate_to_node(struct mm_struct *mm, int 
source, int dest,
flags | MPOL_MF_DISCONTIG_OK, &pagelist);

if (!list_empty(&pagelist)) {
-   err = migrate_pages(&pagelist, alloc_new_node_page, NULL, &ac,
-   MIGRATE_SYNC, MR_SYSCALL);
+   err = migrate_pages(&pagelist, alloc_migration_target, NULL,
+   &ac, MIGRATE_SYNC, MR_SYSCALL);
if (err)
putback_movable_pages(&pagelist);
}
diff --git a/mm/migrate.c b/mm/migrate.c
index 029af0b..3dfb108 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1574,9 +1574,11 @@ static int do_move_pages_to_node(struct mm_struct *mm,
int err;
struct alloc_control ac = {
.nid = node,
+   .gfp_mask = GFP_HIGHUSER_MOVABLE,
+   .thisnode = true,
};
 
-   err = migrate_pages(pagelist, alloc_new_node_page, NULL, &ac,
+   err = migrate_pages(pagelist, alloc_migration_target, NULL, &ac,
MIGRATE_SYNC, MR_SYSCALL);
if (err)
putback_movable_pages(pagelist);
-- 
2.7.4



[PATCH 03/11] mm/hugetlb: introduce alloc_control structure to simplify migration target allocation APIs

2020-05-17 Thread js1304
From: Joonsoo Kim 

Currently, the page allocation functions for migration require several
arguments. Worse, the following patches will need even more arguments to
unify the similar functions. To simplify this, this patch introduces a
unified data structure that controls the allocation behaviour.

For clean-up, function declarations are re-ordered.

Note that the gfp_mask handling in alloc_huge_page_(node|nodemask) is
slightly changed from assignment to OR. This is safe since the callers of
these functions don't pass any extra gfp_mask other than htlb_alloc_mask().

Signed-off-by: Joonsoo Kim 
---
 include/linux/hugetlb.h | 35 +++-
 mm/gup.c| 11 ++---
 mm/hugetlb.c| 62 -
 mm/internal.h   |  7 ++
 mm/mempolicy.c  | 13 +++
 mm/migrate.c| 13 +++
 6 files changed, 83 insertions(+), 58 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 0cced41..6da217e 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -14,6 +14,7 @@
 struct ctl_table;
 struct user_struct;
 struct mmu_gather;
+struct alloc_control;
 
 #ifndef is_hugepd
 typedef struct { unsigned long pd; } hugepd_t;
@@ -502,15 +503,16 @@ struct huge_bootmem_page {
struct hstate *hstate;
 };
 
-struct page *alloc_huge_page(struct vm_area_struct *vma,
-   unsigned long addr, int avoid_reserve);
-struct page *alloc_huge_page_node(struct hstate *h, int nid);
-struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
-   nodemask_t *nmask);
+struct page *alloc_migrate_huge_page(struct hstate *h,
+   struct alloc_control *ac);
+struct page *alloc_huge_page_node(struct hstate *h,
+   struct alloc_control *ac);
+struct page *alloc_huge_page_nodemask(struct hstate *h,
+   struct alloc_control *ac);
 struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma,
unsigned long address);
-struct page *alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask,
-int nid, nodemask_t *nmask);
+struct page *alloc_huge_page(struct vm_area_struct *vma,
+   unsigned long addr, int avoid_reserve);
 int huge_add_to_page_cache(struct page *page, struct address_space *mapping,
pgoff_t idx);
 
@@ -752,20 +754,14 @@ static inline void huge_ptep_modify_prot_commit(struct 
vm_area_struct *vma,
 #else  /* CONFIG_HUGETLB_PAGE */
 struct hstate {};
 
-static inline struct page *alloc_huge_page(struct vm_area_struct *vma,
-  unsigned long addr,
-  int avoid_reserve)
-{
-   return NULL;
-}
-
-static inline struct page *alloc_huge_page_node(struct hstate *h, int nid)
+static inline struct page *
+alloc_huge_page_node(struct hstate *h, struct alloc_control *ac)
 {
return NULL;
 }
 
 static inline struct page *
-alloc_huge_page_nodemask(struct hstate *h, int preferred_nid, nodemask_t 
*nmask)
+alloc_huge_page_nodemask(struct hstate *h, struct alloc_control *ac)
 {
return NULL;
 }
@@ -777,6 +773,13 @@ static inline struct page *alloc_huge_page_vma(struct 
hstate *h,
return NULL;
 }
 
+static inline struct page *alloc_huge_page(struct vm_area_struct *vma,
+  unsigned long addr,
+  int avoid_reserve)
+{
+   return NULL;
+}
+
 static inline int __alloc_bootmem_huge_page(struct hstate *h)
 {
return 0;
diff --git a/mm/gup.c b/mm/gup.c
index 0d64ea8..9890fb0 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1613,16 +1613,21 @@ static struct page *new_non_cma_page(struct page *page, 
unsigned long private)
if (PageHighMem(page))
gfp_mask |= __GFP_HIGHMEM;
 
-#ifdef CONFIG_HUGETLB_PAGE
if (PageHuge(page)) {
struct hstate *h = page_hstate(page);
+   struct alloc_control ac = {
+   .nid = nid,
+   .nmask = NULL,
+   .gfp_mask = gfp_mask,
+   };
+
/*
 * We don't want to dequeue from the pool because pool pages 
will
 * mostly be from the CMA region.
 */
-   return alloc_migrate_huge_page(h, gfp_mask, nid, NULL);
+   return alloc_migrate_huge_page(h, &ac);
}
-#endif
+
if (PageTransHuge(page)) {
struct page *thp;
/*
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index dcb34d7..859dba4 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1054,8 +1054,8 @@ static struct page *dequeue_huge_page_node_exact(struct 
hstate *h, int nid)
return page;
 }
 
-static struct page 

[PATCH 02/11] mm/migrate: move migration helper from .h to .c

2020-05-17 Thread js1304
From: Joonsoo Kim 

It's not a performance-sensitive function, so move it to a .c file.
This is a preparation step for a future change.

Signed-off-by: Joonsoo Kim 
---
 include/linux/migrate.h | 33 +
 mm/migrate.c| 29 +
 2 files changed, 34 insertions(+), 28 deletions(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 3e546cb..1d70b4a 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -31,34 +31,6 @@ enum migrate_reason {
 /* In mm/debug.c; also keep sync with include/trace/events/migrate.h */
 extern const char *migrate_reason_names[MR_TYPES];
 
-static inline struct page *new_page_nodemask(struct page *page,
-   int preferred_nid, nodemask_t *nodemask)
-{
-   gfp_t gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL;
-   unsigned int order = 0;
-   struct page *new_page = NULL;
-
-   if (PageHuge(page))
-   return 
alloc_huge_page_nodemask(page_hstate(compound_head(page)),
-   preferred_nid, nodemask);
-
-   if (PageTransHuge(page)) {
-   gfp_mask |= GFP_TRANSHUGE;
-   order = HPAGE_PMD_ORDER;
-   }
-
-   if (PageHighMem(page) || (zone_idx(page_zone(page)) == ZONE_MOVABLE))
-   gfp_mask |= __GFP_HIGHMEM;
-
-   new_page = __alloc_pages_nodemask(gfp_mask, order,
-   preferred_nid, nodemask);
-
-   if (new_page && PageTransHuge(new_page))
-   prep_transhuge_page(new_page);
-
-   return new_page;
-}
-
 #ifdef CONFIG_MIGRATION
 
 extern void putback_movable_pages(struct list_head *l);
@@ -67,6 +39,8 @@ extern int migrate_page(struct address_space *mapping,
enum migrate_mode mode);
 extern int migrate_pages(struct list_head *l, new_page_t new, free_page_t free,
unsigned long private, enum migrate_mode mode, int reason);
+extern struct page *new_page_nodemask(struct page *page,
+   int preferred_nid, nodemask_t *nodemask);
 extern int isolate_movable_page(struct page *page, isolate_mode_t mode);
 extern void putback_movable_page(struct page *page);
 
@@ -85,6 +59,9 @@ static inline int migrate_pages(struct list_head *l, 
new_page_t new,
free_page_t free, unsigned long private, enum migrate_mode mode,
int reason)
{ return -ENOSYS; }
+static inline struct page *new_page_nodemask(struct page *page,
+   int preferred_nid, nodemask_t *nodemask)
+   { return NULL; }
 static inline int isolate_movable_page(struct page *page, isolate_mode_t mode)
{ return -EBUSY; }
 
diff --git a/mm/migrate.c b/mm/migrate.c
index 5fed030..a298a8c 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1519,6 +1519,35 @@ int migrate_pages(struct list_head *from, new_page_t 
get_new_page,
return rc;
 }
 
+struct page *new_page_nodemask(struct page *page,
+   int preferred_nid, nodemask_t *nodemask)
+{
+   gfp_t gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL;
+   unsigned int order = 0;
+   struct page *new_page = NULL;
+
+   if (PageHuge(page))
+   return alloc_huge_page_nodemask(
+   page_hstate(compound_head(page)),
+   preferred_nid, nodemask);
+
+   if (PageTransHuge(page)) {
+   gfp_mask |= GFP_TRANSHUGE;
+   order = HPAGE_PMD_ORDER;
+   }
+
+   if (PageHighMem(page) || (zone_idx(page_zone(page)) == ZONE_MOVABLE))
+   gfp_mask |= __GFP_HIGHMEM;
+
+   new_page = __alloc_pages_nodemask(gfp_mask, order,
+   preferred_nid, nodemask);
+
+   if (new_page && PageTransHuge(new_page))
+   prep_transhuge_page(new_page);
+
+   return new_page;
+}
+
 #ifdef CONFIG_NUMA
 
 static int store_status(int __user *status, int start, int value, int nr)
-- 
2.7.4



[PATCH 00/11] clean-up the migration target allocation functions

2020-05-17 Thread js1304
From: Joonsoo Kim 

This patchset cleans up the migration target allocation functions.

Contributions of this patchset are:
1. unify two hugetlb alloc functions; as a result, only one remains.
2. make one external hugetlb alloc function internal.
3. unify three functions for migration target allocation.

The patchset is based on next-20200515.
The patchset is available on:

https://github.com/JoonsooKim/linux/tree/cleanup-migration-target-allocation-v1.00-next-20200515

Thanks.

Joonsoo Kim (11):
  mm/page_isolation: prefer the node of the source page
  mm/migrate: move migration helper from .h to .c
  mm/hugetlb: introduce alloc_control structure to simplify migration
target allocation APIs
  mm/hugetlb: unify hugetlb migration callback function
  mm/hugetlb: make hugetlb migration target allocation APIs CMA aware
  mm/hugetlb: do not modify user provided gfp_mask
  mm/migrate: change the interface of the migration target alloc/free
functions
  mm/migrate: make standard migration target allocation functions
  mm/gup: use standard migration target allocation function
  mm/mempolicy: use standard migration target allocation function
  mm/page_alloc: use standard migration target allocation function
directly

 include/linux/hugetlb.h| 33 ++-
 include/linux/migrate.h| 44 +---
 include/linux/page-isolation.h |  4 +-
 mm/compaction.c| 15 ---
 mm/gup.c   | 60 +---
 mm/hugetlb.c   | 91 --
 mm/internal.h  | 13 +-
 mm/memory-failure.c| 14 ---
 mm/memory_hotplug.c| 10 +++--
 mm/mempolicy.c | 39 ++
 mm/migrate.c   | 75 ++
 mm/page_alloc.c|  9 -
 mm/page_isolation.c|  5 ---
 13 files changed, 196 insertions(+), 216 deletions(-)

-- 
2.7.4



[PATCH 11/11] mm/page_alloc: use standard migration target allocation function directly

2020-05-17 Thread js1304
From: Joonsoo Kim 

There is no need for a wrapper function just to call the standard migration
target allocation function. Use the standard one directly.

Signed-off-by: Joonsoo Kim 
---
 include/linux/page-isolation.h |  2 --
 mm/page_alloc.c|  9 +++--
 mm/page_isolation.c| 11 ---
 3 files changed, 7 insertions(+), 15 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 35e3bdb..20a4b63 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -62,6 +62,4 @@ undo_isolate_page_range(unsigned long start_pfn, unsigned 
long end_pfn,
 int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
int isol_flags);
 
-struct page *alloc_migrate_target(struct page *page, struct alloc_control *ac);
-
 #endif
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index afdd0fb..2a7ab2b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8288,6 +8288,11 @@ static int __alloc_contig_migrate_range(struct 
compact_control *cc,
unsigned long pfn = start;
unsigned int tries = 0;
int ret = 0;
+   struct alloc_control ac = {
+   .nid = zone_to_nid(cc->zone),
+   .nmask = &node_states[N_MEMORY],
+   .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
+   };
 
migrate_prep();
 
@@ -8314,8 +8319,8 @@ static int __alloc_contig_migrate_range(struct 
compact_control *cc,
&cc->migratepages);
cc->nr_migratepages -= nr_reclaimed;
 
-   ret = migrate_pages(&cc->migratepages, alloc_migrate_target,
-   NULL, NULL, cc->mode, MR_CONTIG_RANGE);
+   ret = migrate_pages(&cc->migratepages, alloc_migration_target,
+   NULL, &ac, cc->mode, MR_CONTIG_RANGE);
}
if (ret < 0) {
putback_movable_pages(&cc->migratepages);
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index aba799d..03d6cad 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -297,14 +297,3 @@ int test_pages_isolated(unsigned long start_pfn, unsigned 
long end_pfn,
 
return pfn < end_pfn ? -EBUSY : 0;
 }
-
-struct page *alloc_migrate_target(struct page *page, struct alloc_control 
*__ac)
-{
-   struct alloc_control ac = {
-   .nid = page_to_nid(page),
-   .nmask = &node_states[N_MEMORY],
-   .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
-   };
-
-   return alloc_migration_target(page, &ac);
-}
-- 
2.7.4



[PATCH 07/11] mm/migrate: change the interface of the migration target alloc/free functions

2020-05-17 Thread js1304
From: Joonsoo Kim 

To prepare for unifying the duplicated functions in the following patches,
this patch changes the interface of the migration target alloc/free
functions. The functions now take a struct alloc_control as an argument.

There is no functional change.
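
In short, the callback prototypes change as follows (see the
include/linux/migrate.h hunk below):

	typedef struct page *new_page_t(struct page *page, struct alloc_control *ac);
	typedef void free_page_t(struct page *page, struct alloc_control *ac);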

Signed-off-by: Joonsoo Kim 
---
 include/linux/migrate.h| 15 +++--
 include/linux/page-isolation.h |  4 +++-
 mm/compaction.c| 15 -
 mm/gup.c   |  5 +++--
 mm/internal.h  |  5 -
 mm/memory-failure.c| 13 ++-
 mm/memory_hotplug.c|  9 +---
 mm/mempolicy.c | 22 +++---
 mm/migrate.c   | 51 ++
 mm/page_alloc.c|  2 +-
 mm/page_isolation.c|  9 +---
 11 files changed, 89 insertions(+), 61 deletions(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 1d70b4a..923c4f3 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -7,8 +7,9 @@
 #include 
 #include 
 
-typedef struct page *new_page_t(struct page *page, unsigned long private);
-typedef void free_page_t(struct page *page, unsigned long private);
+struct alloc_control;
+typedef struct page *new_page_t(struct page *page, struct alloc_control *ac);
+typedef void free_page_t(struct page *page, struct alloc_control *ac);
 
 /*
  * Return values from addresss_space_operations.migratepage():
@@ -38,9 +39,9 @@ extern int migrate_page(struct address_space *mapping,
struct page *newpage, struct page *page,
enum migrate_mode mode);
 extern int migrate_pages(struct list_head *l, new_page_t new, free_page_t free,
-   unsigned long private, enum migrate_mode mode, int reason);
+   struct alloc_control *ac, enum migrate_mode mode, int reason);
 extern struct page *new_page_nodemask(struct page *page,
-   int preferred_nid, nodemask_t *nodemask);
+   struct alloc_control *ac);
 extern int isolate_movable_page(struct page *page, isolate_mode_t mode);
 extern void putback_movable_page(struct page *page);
 
@@ -56,11 +57,11 @@ extern int migrate_page_move_mapping(struct address_space 
*mapping,
 
 static inline void putback_movable_pages(struct list_head *l) {}
 static inline int migrate_pages(struct list_head *l, new_page_t new,
-   free_page_t free, unsigned long private, enum migrate_mode mode,
-   int reason)
+   free_page_t free, struct alloc_control *ac,
+   enum migrate_mode mode, int reason)
{ return -ENOSYS; }
 static inline struct page *new_page_nodemask(struct page *page,
-   int preferred_nid, nodemask_t *nodemask)
+   struct alloc_control *ac)
{ return NULL; }
 static inline int isolate_movable_page(struct page *page, isolate_mode_t mode)
{ return -EBUSY; }
diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 5724580..35e3bdb 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -2,6 +2,8 @@
 #ifndef __LINUX_PAGEISOLATION_H
 #define __LINUX_PAGEISOLATION_H
 
+struct alloc_control;
+
 #ifdef CONFIG_MEMORY_ISOLATION
 static inline bool has_isolate_pageblock(struct zone *zone)
 {
@@ -60,6 +62,6 @@ undo_isolate_page_range(unsigned long start_pfn, unsigned 
long end_pfn,
 int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
int isol_flags);
 
-struct page *alloc_migrate_target(struct page *page, unsigned long private);
+struct page *alloc_migrate_target(struct page *page, struct alloc_control *ac);
 
 #endif
diff --git a/mm/compaction.c b/mm/compaction.c
index 67fd317..aec1c1f 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1561,9 +1561,9 @@ static void isolate_freepages(struct compact_control *cc)
  * from the isolated freelists in the block we are migrating to.
  */
 static struct page *compaction_alloc(struct page *migratepage,
-   unsigned long data)
+   struct alloc_control *ac)
 {
-   struct compact_control *cc = (struct compact_control *)data;
+   struct compact_control *cc = (struct compact_control *)ac->private;
struct page *freepage;
 
if (list_empty(&cc->freepages)) {
@@ -1585,9 +1585,9 @@ static struct page *compaction_alloc(struct page 
*migratepage,
  * freelist.  All pages on the freelist are from the same zone, so there is no
  * special handling needed for NUMA.
  */
-static void compaction_free(struct page *page, unsigned long data)
+static void compaction_free(struct page *page, struct alloc_control *ac)
 {
-   struct compact_control *cc = (struct compact_control *)data;
+   struct compact_control *cc = (struct compact_control *)ac->private;
 
list_add(&page->lru, &cc->freepages);
cc->nr_freepages++;
@@ -2095,6 +2095,9 @@ 

[PATCH 01/11] mm/page_isolation: prefer the node of the source page

2020-05-17 Thread js1304
From: Joonsoo Kim 

For locality, it's better to migrate the page to the same node as the
source page rather than to the node of the current caller's CPU.

Signed-off-by: Joonsoo Kim 
---
 mm/page_isolation.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 2c11a38..7df89bd 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -300,5 +300,7 @@ int test_pages_isolated(unsigned long start_pfn, unsigned 
long end_pfn,
 
 struct page *alloc_migrate_target(struct page *page, unsigned long private)
 {
-   return new_page_nodemask(page, numa_node_id(), &node_states[N_MEMORY]);
+   int nid = page_to_nid(page);
+
+   return new_page_nodemask(page, nid, &node_states[N_MEMORY]);
 }
-- 
2.7.4



[PATCH 05/11] mm/hugetlb: make hugetlb migration target allocation APIs CMA aware

2020-05-17 Thread js1304
From: Joonsoo Kim 

There is a user who does not want to use CMA memory for migration. Until
now, this is implemented on the caller side, but that is not optimal since
the caller has only limited information. This patch implements it on the
callee side to get a better result.

Signed-off-by: Joonsoo Kim 
---
 include/linux/hugetlb.h |  2 --
 mm/gup.c|  9 +++--
 mm/hugetlb.c| 21 +
 mm/internal.h   |  1 +
 4 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 4892ed3..6485e92 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -503,8 +503,6 @@ struct huge_bootmem_page {
struct hstate *hstate;
 };
 
-struct page *alloc_migrate_huge_page(struct hstate *h,
-   struct alloc_control *ac);
 struct page *alloc_huge_page_nodemask(struct hstate *h,
struct alloc_control *ac);
 struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma,
diff --git a/mm/gup.c b/mm/gup.c
index 9890fb0..1c86db5 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1618,14 +1618,11 @@ static struct page *new_non_cma_page(struct page *page, 
unsigned long private)
struct alloc_control ac = {
.nid = nid,
.nmask = NULL,
-   .gfp_mask = gfp_mask,
+   .gfp_mask = __GFP_NOWARN,
+   .skip_cma = true,
};
 
-   /*
-* We don't want to dequeue from the pool because pool pages 
will
-* mostly be from the CMA region.
-*/
-   return alloc_migrate_huge_page(h, &ac);
+   return alloc_huge_page_nodemask(h, &ac);
}
 
if (PageTransHuge(page)) {
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 60b0983..53edd02 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1034,13 +1034,19 @@ static void enqueue_huge_page(struct hstate *h, struct 
page *page)
h->free_huge_pages_node[nid]++;
 }
 
-static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
+static struct page *dequeue_huge_page_node_exact(struct hstate *h,
+   int nid, bool skip_cma)
 {
struct page *page;
 
-   list_for_each_entry(page, &h->hugepage_freelists[nid], lru)
+   list_for_each_entry(page, &h->hugepage_freelists[nid], lru) {
+   if (skip_cma && is_migrate_cma_page(page))
+   continue;
+
if (!PageHWPoison(page))
break;
+   }
+
/*
 * if 'non-isolated free hugepage' not found on the list,
 * the allocation fails.
@@ -1081,7 +1087,7 @@ static struct page *dequeue_huge_page_nodemask(struct 
hstate *h,
continue;
node = zone_to_nid(zone);
 
-   page = dequeue_huge_page_node_exact(h, node);
+   page = dequeue_huge_page_node_exact(h, node, ac->skip_cma);
if (page)
return page;
}
@@ -1938,7 +1944,7 @@ static struct page *alloc_surplus_huge_page(struct hstate 
*h, gfp_t gfp_mask,
return page;
 }
 
-struct page *alloc_migrate_huge_page(struct hstate *h,
+static struct page *alloc_migrate_huge_page(struct hstate *h,
struct alloc_control *ac)
 {
struct page *page;
@@ -2000,6 +2006,13 @@ struct page *alloc_huge_page_nodemask(struct hstate *h,
}
spin_unlock(&hugetlb_lock);
 
+   /*
+* clearing the __GFP_MOVABLE flag ensures that the allocated page
+* will not come from the CMA area
+*/
+   if (ac->skip_cma)
+   ac->gfp_mask &= ~__GFP_MOVABLE;
+
return alloc_migrate_huge_page(h, ac);
 }
 
diff --git a/mm/internal.h b/mm/internal.h
index 574722d0..6b6507e 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -619,6 +619,7 @@ struct alloc_control {
nodemask_t *nmask;
gfp_t gfp_mask;
bool thisnode;
+   bool skip_cma;
 };
 
 #endif /* __MM_INTERNAL_H */
-- 
2.7.4



[PATCH 04/11] mm/hugetlb: unify hugetlb migration callback function

2020-05-17 Thread js1304
From: Joonsoo Kim 

There is no difference between the two migration callback functions,
alloc_huge_page_node() and alloc_huge_page_nodemask(), except the
__GFP_THISNODE handling. This patch adds one more field to alloc_control
and handles this exception there.

Signed-off-by: Joonsoo Kim 
---
 include/linux/hugetlb.h |  8 
 mm/hugetlb.c| 23 ++-
 mm/internal.h   |  1 +
 mm/mempolicy.c  |  3 ++-
 4 files changed, 5 insertions(+), 30 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 6da217e..4892ed3 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -505,8 +505,6 @@ struct huge_bootmem_page {
 
 struct page *alloc_migrate_huge_page(struct hstate *h,
struct alloc_control *ac);
-struct page *alloc_huge_page_node(struct hstate *h,
-   struct alloc_control *ac);
 struct page *alloc_huge_page_nodemask(struct hstate *h,
struct alloc_control *ac);
 struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma,
@@ -755,12 +753,6 @@ static inline void huge_ptep_modify_prot_commit(struct 
vm_area_struct *vma,
 struct hstate {};
 
 static inline struct page *
-alloc_huge_page_node(struct hstate *h, struct alloc_control *ac)
-{
-   return NULL;
-}
-
-static inline struct page *
 alloc_huge_page_nodemask(struct hstate *h, struct alloc_control *ac)
 {
return NULL;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 859dba4..60b0983 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1981,31 +1981,12 @@ struct page *alloc_buddy_huge_page_with_mpol(struct 
hstate *h,
 }
 
 /* page migration callback function */
-struct page *alloc_huge_page_node(struct hstate *h,
-   struct alloc_control *ac)
-{
-   struct page *page = NULL;
-
-   ac->gfp_mask |= htlb_alloc_mask(h);
-   if (ac->nid != NUMA_NO_NODE)
-   ac->gfp_mask |= __GFP_THISNODE;
-
-   spin_lock(&hugetlb_lock);
-   if (h->free_huge_pages - h->resv_huge_pages > 0)
-   page = dequeue_huge_page_nodemask(h, ac);
-   spin_unlock(&hugetlb_lock);
-
-   if (!page)
-   page = alloc_migrate_huge_page(h, ac);
-
-   return page;
-}
-
-/* page migration callback function */
 struct page *alloc_huge_page_nodemask(struct hstate *h,
struct alloc_control *ac)
 {
ac->gfp_mask |= htlb_alloc_mask(h);
+   if (ac->thisnode && ac->nid != NUMA_NO_NODE)
+   ac->gfp_mask |= __GFP_THISNODE;
 
spin_lock(&hugetlb_lock);
if (h->free_huge_pages - h->resv_huge_pages > 0) {
diff --git a/mm/internal.h b/mm/internal.h
index 75b3f8e..574722d0 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -618,6 +618,7 @@ struct alloc_control {
int nid;
nodemask_t *nmask;
gfp_t gfp_mask;
+   bool thisnode;
 };
 
 #endif /* __MM_INTERNAL_H */
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 06f60a5..629feaa 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1073,9 +1073,10 @@ struct page *alloc_new_node_page(struct page *page, 
unsigned long node)
struct alloc_control ac = {
.nid = node,
.nmask = NULL,
+   .thisnode = true,
};
 
-   return alloc_huge_page_node(h, &ac);
+   return alloc_huge_page_nodemask(h, &ac);
} else if (PageTransHuge(page)) {
struct page *thp;
 
-- 
2.7.4



[PATCH 09/11] mm/gup: use standard migration target allocation function

2020-05-17 Thread js1304
From: Joonsoo Kim 

There is no reason to implement its own function for migration
target allocation. Use the standard one.

Signed-off-by: Joonsoo Kim 
---
 mm/gup.c | 61 ++---
 1 file changed, 10 insertions(+), 51 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index be9cb79..d88a965 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1594,58 +1594,16 @@ static bool check_dax_vmas(struct vm_area_struct 
**vmas, long nr_pages)
 }
 
 #ifdef CONFIG_CMA
-static struct page *new_non_cma_page(struct page *page,
+static struct page *alloc_migration_target_non_cma(struct page *page,
struct alloc_control *ac)
 {
-   /*
-* We want to make sure we allocate the new page from the same node
-* as the source page.
-*/
-   int nid = page_to_nid(page);
-   /*
-* Trying to allocate a page for migration. Ignore allocation
-* failure warnings. We don't force __GFP_THISNODE here because
-* this node here is the node where we have CMA reservation and
-* in some case these nodes will have really less non movable
-* allocation memory.
-*/
-   gfp_t gfp_mask = GFP_USER | __GFP_NOWARN;
-
-   if (PageHighMem(page))
-   gfp_mask |= __GFP_HIGHMEM;
-
-   if (PageHuge(page)) {
-   struct hstate *h = page_hstate(page);
-   struct alloc_control ac = {
-   .nid = nid,
-   .nmask = NULL,
-   .gfp_mask = __GFP_NOWARN,
-   .skip_cma = true,
-   };
-
-   return alloc_huge_page_nodemask(h, &ac);
-   }
-
-   if (PageTransHuge(page)) {
-   struct page *thp;
-   /*
-* ignore allocation failure warnings
-*/
-   gfp_t thp_gfpmask = GFP_TRANSHUGE | __GFP_NOWARN;
-
-   /*
-* Remove the movable mask so that we don't allocate from
-* CMA area again.
-*/
-   thp_gfpmask &= ~__GFP_MOVABLE;
-   thp = __alloc_pages_node(nid, thp_gfpmask, HPAGE_PMD_ORDER);
-   if (!thp)
-   return NULL;
-   prep_transhuge_page(thp);
-   return thp;
-   }
+   struct alloc_control __ac = {
+   .nid = page_to_nid(page),
+   .gfp_mask = GFP_USER | __GFP_NOWARN,
+   .skip_cma = true,
+   };
 
-   return __alloc_pages_node(nid, gfp_mask, 0);
+   return alloc_migration_target(page, &__ac);
 }
 
 static long check_and_migrate_cma_pages(struct task_struct *tsk,
@@ -1707,8 +1665,9 @@ static long check_and_migrate_cma_pages(struct 
task_struct *tsk,
for (i = 0; i < nr_pages; i++)
put_page(pages[i]);
 
-   if (migrate_pages(&cma_page_list, new_non_cma_page,
- NULL, NULL, MIGRATE_SYNC, MR_CONTIG_RANGE)) {
+   if (migrate_pages(&cma_page_list,
+   alloc_migration_target_non_cma, NULL, NULL,
+   MIGRATE_SYNC, MR_CONTIG_RANGE)) {
/*
 * some of the pages failed migration. Do get_user_pages
 * without migration.
-- 
2.7.4



[PATCH 08/11] mm/migrate: make standard migration target allocation functions

2020-05-17 Thread js1304
From: Joonsoo Kim 

There are several similar functions for migration target allocation. Since
there is no fundamental difference between them, it's better to keep just
one rather than keeping all the variants. This patch implements the base
migration target allocation function. In the following patches, the
variants will be converted to use this function.

Note that the PageHighMem() call in the previous function is changed to the
open-coded "is_highmem_idx()" since it provides more readability.

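A sketch of how a converted caller ends up looking (taken from the
mm/memory-failure.c hunk below; "p" and "newpage" are placeholders):

	struct alloc_control ac = {
		.nid = page_to_nid(p),
		.nmask = &node_states[N_MEMORY],
		.gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
	};

	newpage = alloc_migration_target(p, &ac);
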
Signed-off-by: Joonsoo Kim 
---
 include/linux/migrate.h |  6 +++---
 mm/memory-failure.c |  3 ++-
 mm/memory_hotplug.c |  3 ++-
 mm/migrate.c| 26 +++---
 mm/page_isolation.c |  3 ++-
 5 files changed, 24 insertions(+), 17 deletions(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 923c4f3..abf09b3 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -40,8 +40,8 @@ extern int migrate_page(struct address_space *mapping,
enum migrate_mode mode);
 extern int migrate_pages(struct list_head *l, new_page_t new, free_page_t free,
struct alloc_control *ac, enum migrate_mode mode, int reason);
-extern struct page *new_page_nodemask(struct page *page,
-   struct alloc_control *ac);
+extern struct page *alloc_migration_target(struct page *page,
+   struct alloc_control *ac);
 extern int isolate_movable_page(struct page *page, isolate_mode_t mode);
 extern void putback_movable_page(struct page *page);
 
@@ -60,7 +60,7 @@ static inline int migrate_pages(struct list_head *l, 
new_page_t new,
free_page_t free, struct alloc_control *ac,
enum migrate_mode mode, int reason)
{ return -ENOSYS; }
-static inline struct page *new_page_nodemask(struct page *page,
+static inline struct page *alloc_migration_target(struct page *page,
struct alloc_control *ac)
{ return NULL; }
 static inline int isolate_movable_page(struct page *page, isolate_mode_t mode)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 3f92e70..b400161 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1626,9 +1626,10 @@ static struct page *new_page(struct page *p, struct 
alloc_control *__ac)
struct alloc_control ac = {
.nid = page_to_nid(p),
.nmask = &node_states[N_MEMORY],
+   .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
};
 
-   return new_page_nodemask(p, &ac);
+   return alloc_migration_target(p, &ac);
 }
 
 /*
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 89642f9..185f4c9 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1249,7 +1249,8 @@ static struct page *new_node_page(struct page *page, 
struct alloc_control *__ac)
 
ac.nid = nid;
ac.nmask = &nmask;
-   return new_page_nodemask(page, &ac);
+   ac.gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL;
+   return alloc_migration_target(page, &ac);
 }
 
 static int
diff --git a/mm/migrate.c b/mm/migrate.c
index ba31153..029af0b 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1519,31 +1519,35 @@ int migrate_pages(struct list_head *from, new_page_t 
get_new_page,
return rc;
 }
 
-struct page *new_page_nodemask(struct page *page, struct alloc_control *ac)
+struct page *alloc_migration_target(struct page *page, struct alloc_control 
*ac)
 {
-   gfp_t gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL;
unsigned int order = 0;
struct page *new_page = NULL;
+   int zidx;
 
+   /* hugetlb has its own gfp handling logic */
if (PageHuge(page)) {
struct hstate *h = page_hstate(page);
-   struct alloc_control __ac = {
-   .nid = ac->nid,
-   .nmask = ac->nmask,
-   };
 
-   return alloc_huge_page_nodemask(h, &__ac);
+   return alloc_huge_page_nodemask(h, ac);
}
 
+   ac->__gfp_mask = ac->gfp_mask;
if (PageTransHuge(page)) {
-   gfp_mask |= GFP_TRANSHUGE;
+   ac->__gfp_mask |= GFP_TRANSHUGE;
order = HPAGE_PMD_ORDER;
}
+   zidx = zone_idx(page_zone(page));
+   if (is_highmem_idx(zidx) || zidx == ZONE_MOVABLE)
+   ac->__gfp_mask |= __GFP_HIGHMEM;
 
-   if (PageHighMem(page) || (zone_idx(page_zone(page)) == ZONE_MOVABLE))
-   gfp_mask |= __GFP_HIGHMEM;
+   if (ac->thisnode)
+   ac->__gfp_mask |= __GFP_THISNODE;
+   if (ac->skip_cma)
+   ac->__gfp_mask &= ~__GFP_MOVABLE;
 
-   new_page = __alloc_pages_nodemask(gfp_mask, order, ac->nid, ac->nmask);
+   new_page = __alloc_pages_nodemask(ac->__gfp_mask, order,
+   ac->nid, ac->nmask);
 
if (new_page && PageTransHuge(new_page))
prep_transhuge_page(new_page);
diff --git 

[PATCH v2 05/10] mm/gup: separate PageHighMem() and PageHighMemZone() use case

2020-04-28 Thread js1304
From: Joonsoo Kim 

Until now, PageHighMem() is used for two different cases. One is to check
if there is a direct mapping for this page or not. The other is to check
the zone of this page, that is, whether it is a highmem type zone or not.

Now, we have separate functions, PageHighMem() and PageHighMemZone(), for
each case. Use the appropriate one.

Note that there are some rules to determine the proper macro.

1. If PageHighMem() is called to check whether a direct mapping exists
or not, use PageHighMem().
2. If PageHighMem() is used to predict the previous gfp_flags of
this page, use PageHighMemZone(). The zone of the page is related to
the gfp_flags.
3. If the purpose of calling PageHighMem() is to count highmem pages and
to interact with the system by using this count, use PageHighMemZone().
Such a counter is usually used to calculate the available memory for a
kernel allocation, and pages in the highmem zone cannot be used for a
kernel allocation.
4. Otherwise, use PageHighMemZone(). It's safe since its implementation
is just a copy of the previous PageHighMem() implementation and won't
be changed.

I apply rule #2 for this patch.

Acked-by: Roman Gushchin 
Signed-off-by: Joonsoo Kim 
---
 mm/gup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/gup.c b/mm/gup.c
index 11fda53..9652eed 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1608,7 +1608,7 @@ static struct page *new_non_cma_page(struct page *page, 
unsigned long private)
 */
gfp_t gfp_mask = GFP_USER | __GFP_NOWARN;
 
-   if (PageHighMem(page))
+   if (PageHighMemZone(page))
gfp_mask |= __GFP_HIGHMEM;
 
 #ifdef CONFIG_HUGETLB_PAGE
-- 
2.7.4



[PATCH v2 06/10] mm/hugetlb: separate PageHighMem() and PageHighMemZone() use case

2020-04-28 Thread js1304
From: Joonsoo Kim 

Until now, PageHighMem() is used for two different cases. One is to check
if there is a direct mapping for this page or not. The other is to check
the zone of this page, that is, whether it is a highmem type zone or not.

Now, we have separate functions, PageHighMem() and PageHighMemZone(), for
each case. Use the appropriate one.

Note that there are some rules to determine the proper macro.

1. If PageHighMem() is called to check whether a direct mapping exists
or not, use PageHighMem().
2. If PageHighMem() is used to predict the previous gfp_flags of
this page, use PageHighMemZone(). The zone of the page is related to
the gfp_flags.
3. If the purpose of calling PageHighMem() is to count highmem pages and
to interact with the system by using this count, use PageHighMemZone().
Such a counter is usually used to calculate the available memory for a
kernel allocation, and pages in the highmem zone cannot be used for a
kernel allocation.
4. Otherwise, use PageHighMemZone(). It's safe since its implementation
is just a copy of the previous PageHighMem() implementation and won't
be changed.

I apply rule #3 for this patch.

Acked-by: Roman Gushchin 
Signed-off-by: Joonsoo Kim 
---
 mm/hugetlb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 5548e88..56c9143 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2639,7 +2639,7 @@ static void try_to_free_low(struct hstate *h, unsigned 
long count,
list_for_each_entry_safe(page, next, freel, lru) {
if (count >= h->nr_huge_pages)
return;
-   if (PageHighMem(page))
+   if (PageHighMemZone(page))
continue;
list_del(&page->lru);
update_and_free_page(h, page);
-- 
2.7.4



[PATCH v2 04/10] power: separate PageHighMem() and PageHighMemZone() use case

2020-04-28 Thread js1304
From: Joonsoo Kim 

Until now, PageHighMem() is used for two different cases. One is to check
if there is a direct mapping for this page or not. The other is to check
the zone of this page, that is, whether it is a highmem type zone or not.

Now, we have separate functions, PageHighMem() and PageHighMemZone(), for
each case. Use the appropriate one.

Note that there are some rules to determine the proper macro.

1. If PageHighMem() is called to check whether a direct mapping exists
or not, use PageHighMem().
2. If PageHighMem() is used to predict the previous gfp_flags of
this page, use PageHighMemZone(). The zone of the page is related to
the gfp_flags.
3. If the purpose of calling PageHighMem() is to count highmem pages and
to interact with the system by using this count, use PageHighMemZone().
Such a counter is usually used to calculate the available memory for a
kernel allocation, and pages in the highmem zone cannot be used for a
kernel allocation.
4. Otherwise, use PageHighMemZone(). It's safe since its implementation
is just a copy of the previous PageHighMem() implementation and won't
be changed.

I apply rule #3 for this patch.

Acked-by: Roman Gushchin 
Signed-off-by: Joonsoo Kim 
---
 kernel/power/snapshot.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
index 6598001..be759a6 100644
--- a/kernel/power/snapshot.c
+++ b/kernel/power/snapshot.c
@@ -1227,7 +1227,7 @@ static struct page *saveable_highmem_page(struct zone 
*zone, unsigned long pfn)
if (!page || page_zone(page) != zone)
return NULL;
 
-   BUG_ON(!PageHighMem(page));
+   BUG_ON(!PageHighMemZone(page));
 
if (swsusp_page_is_forbidden(page) ||  swsusp_page_is_free(page))
return NULL;
@@ -1291,7 +1291,7 @@ static struct page *saveable_page(struct zone *zone, 
unsigned long pfn)
if (!page || page_zone(page) != zone)
return NULL;
 
-   BUG_ON(PageHighMem(page));
+   BUG_ON(PageHighMemZone(page));
 
if (swsusp_page_is_forbidden(page) || swsusp_page_is_free(page))
return NULL;
@@ -1529,7 +1529,7 @@ static unsigned long preallocate_image_pages(unsigned 
long nr_pages, gfp_t mask)
if (!page)
break;
memory_bm_set_bit(&copy_bm, page_to_pfn(page));
-   if (PageHighMem(page))
+   if (PageHighMemZone(page))
alloc_highmem++;
else
alloc_normal++;
@@ -1625,7 +1625,7 @@ static unsigned long free_unnecessary_pages(void)
unsigned long pfn = memory_bm_next_pfn(&copy_bm);
struct page *page = pfn_to_page(pfn);
 
-   if (PageHighMem(page)) {
+   if (PageHighMemZone(page)) {
if (!to_free_highmem)
continue;
to_free_highmem--;
@@ -2264,7 +2264,7 @@ static unsigned int count_highmem_image_pages(struct 
memory_bitmap *bm)
memory_bm_position_reset(bm);
pfn = memory_bm_next_pfn(bm);
while (pfn != BM_END_OF_MAP) {
-   if (PageHighMem(pfn_to_page(pfn)))
+   if (PageHighMemZone(pfn_to_page(pfn)))
cnt++;
 
pfn = memory_bm_next_pfn(bm);
@@ -2541,7 +2541,7 @@ static void *get_buffer(struct memory_bitmap *bm, struct 
chain_allocator *ca)
return ERR_PTR(-EFAULT);
 
page = pfn_to_page(pfn);
-   if (PageHighMem(page))
+   if (PageHighMemZone(page))
return get_highmem_page_buffer(page, ca);
 
if (swsusp_page_is_forbidden(page) && swsusp_page_is_free(page))
-- 
2.7.4



[PATCH v2 08/10] mm/page_alloc: correct the use of is_highmem_idx()

2020-04-28 Thread js1304
From: Joonsoo Kim 

What we'd like to check here is whether the page has a direct mapping or
not. Use PageHighMem() since it perfectly matches this purpose.

Acked-by: Roman Gushchin 
Signed-off-by: Joonsoo Kim 
---
 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7fe5115..da473c7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1399,7 +1399,7 @@ static void __meminit __init_single_page(struct page 
*page, unsigned long pfn,
INIT_LIST_HEAD(&page->lru);
 #ifdef WANT_PAGE_VIRTUAL
/* The shift won't overflow because ZONE_NORMAL is below 4G. */
-   if (!is_highmem_idx(zone))
+   if (!PageHighMem(page))
set_page_address(page, __va(pfn << PAGE_SHIFT));
 #endif
 }
-- 
2.7.4



[PATCH v2 10/10] mm/page-flags: change the implementation of the PageHighMem()

2020-04-28 Thread js1304
From: Joonsoo Kim 

Until now, PageHighMem() is used for two different cases. One is to check
if there is a direct mapping for this page or not. The other is to check
the zone of this page, that is, whether it is a highmem type zone or not.

The previous patches introduce the PageHighMemZone() macro and separate
both cases strictly. So, now, PageHighMem() is used just for checking if
there is a direct mapping for this page or not.

In the following patchset, ZONE_MOVABLE, which could be considered a
highmem type zone in some configurations, could have both types of pages:
direct mapped pages and unmapped pages. So, the current implementation of
PageHighMem(), which checks the zone rather than the page in order to check
if a direct mapping exists, would be invalid. This patch prepares for that
case by implementing PageHighMem() with max_low_pfn.
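
With this patch, the two checks become the following (sketch of the
CONFIG_HIGHMEM case, taken from the include/linux/page-flags.h hunk below):

	#define PageHighMem(__p)	(page_to_pfn(__p) >= max_low_pfn)	/* no direct mapping */
	#define PageHighMemZone(__p)	is_highmem_idx(page_zonenum(__p))	/* highmem type zone */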

Acked-by: Roman Gushchin 
Signed-off-by: Joonsoo Kim 
---
 include/linux/page-flags.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index fca0cce..7ac5fc8 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -375,6 +375,8 @@ PAGEFLAG(Readahead, reclaim, PF_NO_COMPOUND)
TESTCLEARFLAG(Readahead, reclaim, PF_NO_COMPOUND)
 
 #ifdef CONFIG_HIGHMEM
+extern unsigned long max_low_pfn;
+
 /*
  * Must use a macro here due to header dependency issues. page_zone() is not
  * available at this point.
@@ -383,7 +385,7 @@ PAGEFLAG(Readahead, reclaim, PF_NO_COMPOUND)
  * in order to predict previous gfp_flags or to count something for system
  * memory management.
  */
-#define PageHighMem(__p) is_highmem_idx(page_zonenum(__p))
+#define PageHighMem(__p) (page_to_pfn(__p) >= max_low_pfn)
 #define PageHighMemZone(__p) is_highmem_idx(page_zonenum(__p))
 #else
 PAGEFLAG_FALSE(HighMem)
-- 
2.7.4



[PATCH v2 09/10] mm/migrate: replace PageHighMem() with open-code

2020-04-28 Thread js1304
From: Joonsoo Kim 

The implementation of PageHighMem() will be changed in the following
patches. Before that, open-code the check here to avoid any side effect
of the implementation change on PageHighMem().

Acked-by: Roman Gushchin 
Signed-off-by: Joonsoo Kim 
---
 include/linux/migrate.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 3e546cb..a9cfd8e 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -37,6 +37,7 @@ static inline struct page *new_page_nodemask(struct page *page,
gfp_t gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL;
unsigned int order = 0;
struct page *new_page = NULL;
+   int zidx;
 
if (PageHuge(page))
return alloc_huge_page_nodemask(page_hstate(compound_head(page)),
@@ -47,7 +48,8 @@ static inline struct page *new_page_nodemask(struct page *page,
order = HPAGE_PMD_ORDER;
}
 
-   if (PageHighMem(page) || (zone_idx(page_zone(page)) == ZONE_MOVABLE))
+   zidx = zone_idx(page_zone(page));
+   if (is_highmem_idx(zidx) || zidx == ZONE_MOVABLE)
gfp_mask |= __GFP_HIGHMEM;
 
new_page = __alloc_pages_nodemask(gfp_mask, order,
-- 
2.7.4



[PATCH v2 07/10] mm: separate PageHighMem() and PageHighMemZone() use case

2020-04-28 Thread js1304
From: Joonsoo Kim 

Until now, PageHighMem() has been used for two different cases. One is to
check whether there is a direct mapping for a page or not. The other is
to check the zone of a page, that is, whether it is a highmem type zone
or not.

Now we have separate functions, PageHighMem() and PageHighMemZone(), one
for each case. Use the appropriate one.

Note that there are some rules for determining the proper macro.

1. If PageHighMem() is called to check whether a direct mapping exists
or not, use PageHighMem().
2. If PageHighMem() is used to predict the gfp_flags that this page was
allocated with, use PageHighMemZone(). The zone of the page is related
to the gfp_flags.
3. If the purpose of calling PageHighMem() is to count highmem pages and
to interact with the system by using this count, use PageHighMemZone().
Such a counter is usually used to calculate the memory available for a
kernel allocation, and pages in a highmem zone are not available for a
kernel allocation.
4. Otherwise, use PageHighMemZone(). It is safe since its implementation
is just a copy of the previous PageHighMem() implementation and won't
be changed.

I apply rule #3 in this patch.
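
As a concrete illustration of rule #3 (a sketch only;
example_account_pages() is a hypothetical helper mirroring the hunks
below), the highmem counter has to follow the zone of the page, not its
direct-mapping status:

#include <linux/highmem.h>
#include <linux/mm.h>

static void example_account_pages(struct page *page, long nr_pages)
{
	totalram_pages_add(nr_pages);
#ifdef CONFIG_HIGHMEM
	/* pages in a highmem-type zone are unusable for kernel allocations */
	if (PageHighMemZone(page))
		totalhigh_pages_add(nr_pages);
#endif
}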

Acked-by: Roman Gushchin 
Signed-off-by: Joonsoo Kim 
---
 mm/memory_hotplug.c | 2 +-
 mm/page_alloc.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 555137b..891c214 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -593,7 +593,7 @@ void generic_online_page(struct page *page, unsigned int order)
__free_pages_core(page, order);
totalram_pages_add(1UL << order);
 #ifdef CONFIG_HIGHMEM
-   if (PageHighMem(page))
+   if (PageHighMemZone(page))
totalhigh_pages_add(1UL << order);
 #endif
 }
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fc5919e..7fe5115 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7444,7 +7444,7 @@ void adjust_managed_page_count(struct page *page, long count)
atomic_long_add(count, &page_zone(page)->managed_pages);
totalram_pages_add(count);
 #ifdef CONFIG_HIGHMEM
-   if (PageHighMem(page))
+   if (PageHighMemZone(page))
totalhigh_pages_add(count);
 #endif
 }
-- 
2.7.4



[PATCH v2 03/10] kexec: separate PageHighMem() and PageHighMemZone() use case

2020-04-28 Thread js1304
From: Joonsoo Kim 

Until now, PageHighMem() has been used for two different cases. One is to
check whether there is a direct mapping for a page or not. The other is
to check the zone of a page, that is, whether it is a highmem type zone
or not.

Now we have separate functions, PageHighMem() and PageHighMemZone(), one
for each case. Use the appropriate one.

Note that there are some rules for determining the proper macro.

1. If PageHighMem() is called to check whether a direct mapping exists
or not, use PageHighMem().
2. If PageHighMem() is used to predict the gfp_flags that this page was
allocated with, use PageHighMemZone(). The zone of the page is related
to the gfp_flags.
3. If the purpose of calling PageHighMem() is to count highmem pages and
to interact with the system by using this count, use PageHighMemZone().
Such a counter is usually used to calculate the memory available for a
kernel allocation, and pages in a highmem zone are not available for a
kernel allocation.
4. Otherwise, use PageHighMemZone(). It is safe since its implementation
is just a copy of the previous PageHighMem() implementation and won't
be changed.

I apply rule #2 in this patch.
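
As a concrete illustration of rule #2 (a sketch only; example_guess_gfp()
is a hypothetical helper, not in the tree), reconstructing the gfp flags
a page was allocated with has to look at the zone of the page:

#include <linux/gfp.h>
#include <linux/page-flags.h>

static gfp_t example_guess_gfp(struct page *page)
{
	gfp_t gfp_mask = GFP_KERNEL;

	/* a page sitting in a highmem-type zone came from __GFP_HIGHMEM */
	if (PageHighMemZone(page))
		gfp_mask |= __GFP_HIGHMEM;

	return gfp_mask;
}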

Acked-by: Roman Gushchin 
Signed-off-by: Joonsoo Kim 
---
 kernel/kexec_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index ba1d91e..33097b7 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -766,7 +766,7 @@ static struct page *kimage_alloc_page(struct kimage *image,
 * gfp_flags honor the ones passed in.
 */
if (!(gfp_mask & __GFP_HIGHMEM) &&
-   PageHighMem(old_page)) {
+   PageHighMemZone(old_page)) {
kimage_free_pages(old_page);
continue;
}
-- 
2.7.4



[PATCH v2 02/10] drm/ttm: separate PageHighMem() and PageHighMemZone() use case

2020-04-28 Thread js1304
From: Joonsoo Kim 

Until now, PageHighMem() has been used for two different cases. One is to
check whether there is a direct mapping for a page or not. The other is
to check the zone of a page, that is, whether it is a highmem type zone
or not.

Now we have separate functions, PageHighMem() and PageHighMemZone(), one
for each case. Use the appropriate one.

Note that there are some rules for determining the proper macro.

1. If PageHighMem() is called to check whether a direct mapping exists
or not, use PageHighMem().
2. If PageHighMem() is used to predict the gfp_flags that this page was
allocated with, use PageHighMemZone(). The zone of the page is related
to the gfp_flags.
3. If the purpose of calling PageHighMem() is to count highmem pages and
to interact with the system by using this count, use PageHighMemZone().
Such a counter is usually used to calculate the memory available for a
kernel allocation, and pages in a highmem zone are not available for a
kernel allocation.
4. Otherwise, use PageHighMemZone(). It is safe since its implementation
is just a copy of the previous PageHighMem() implementation and won't
be changed.

I apply rule #4 in this patch.

Acked-by: Roman Gushchin 
Reviewed-by: Christian König 
Signed-off-by: Joonsoo Kim 
---
 drivers/gpu/drm/ttm/ttm_memory.c | 4 ++--
 drivers/gpu/drm/ttm/ttm_page_alloc.c | 2 +-
 drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 2 +-
 drivers/gpu/drm/ttm/ttm_tt.c | 2 +-
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_memory.c b/drivers/gpu/drm/ttm/ttm_memory.c
index acd63b7..d071b71 100644
--- a/drivers/gpu/drm/ttm/ttm_memory.c
+++ b/drivers/gpu/drm/ttm/ttm_memory.c
@@ -641,7 +641,7 @@ int ttm_mem_global_alloc_page(struct ttm_mem_global *glob,
 */
 
 #ifdef CONFIG_HIGHMEM
-   if (PageHighMem(page) && glob->zone_highmem != NULL)
+   if (PageHighMemZone(page) && glob->zone_highmem != NULL)
zone = glob->zone_highmem;
 #else
if (glob->zone_dma32 && page_to_pfn(page) > 0x00100000UL)
@@ -656,7 +656,7 @@ void ttm_mem_global_free_page(struct ttm_mem_global *glob, struct page *page,
struct ttm_mem_zone *zone = NULL;
 
 #ifdef CONFIG_HIGHMEM
-   if (PageHighMem(page) && glob->zone_highmem != NULL)
+   if (PageHighMemZone(page) && glob->zone_highmem != NULL)
zone = glob->zone_highmem;
 #else
if (glob->zone_dma32 && page_to_pfn(page) > 0x00100000UL)
diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c b/drivers/gpu/drm/ttm/ttm_page_alloc.c
index b40a467..847fabe 100644
--- a/drivers/gpu/drm/ttm/ttm_page_alloc.c
+++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c
@@ -530,7 +530,7 @@ static int ttm_alloc_new_pages(struct list_head *pages, gfp_t gfp_flags,
/* gfp flags of highmem page should never be dma32 so we
 * we should be fine in such case
 */
-   if (PageHighMem(p))
+   if (PageHighMemZone(p))
continue;
 
 #endif
diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
index faefaae..338b2a2 100644
--- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
+++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
@@ -747,7 +747,7 @@ static int ttm_dma_pool_alloc_new_pages(struct dma_pool *pool,
/* gfp flags of highmem page should never be dma32 so we
 * we should be fine in such case
 */
-   if (PageHighMem(p))
+   if (PageHighMemZone(p))
continue;
 #endif
 
diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 2ec448e..6e094dd 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -119,7 +119,7 @@ static int ttm_tt_set_page_caching(struct page *p,
 {
int ret = 0;
 
-   if (PageHighMem(p))
+   if (PageHighMemZone(p))
return 0;
 
if (c_old != tt_cached) {
-- 
2.7.4



[PATCH v2 00/10] change the implementation of the PageHighMem()

2020-04-28 Thread js1304
From: Joonsoo Kim 

Changes in v2
- add "Acked-by" and "Reviewed-by" tags
- replace PageHighMem() with open-code instead of using the new
PageHighMemZone() macro. Related file: "include/linux/migrate.h"

Hello,

This patchset separates the two use cases of PageHighMem() by introducing
the PageHighMemZone() macro, and it changes the implementation of
PageHighMem() to reflect the actual meaning of the macro. This patchset
is a preparation step for the patchset
"mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE" [1].

PageHighMem() is used for two different cases. One is to check whether
there is a direct mapping for a page or not. The other is to check the
zone of a page, that is, whether it is a highmem type zone or not.

Until now, the two cases have been exactly the same thing. So the
implementation of PageHighMem() uses the latter check: whether the zone
of the page is a highmem type zone or not.

"#define PageHighMem(__p) is_highmem_idx(page_zonenum(__p))"

ZONE_MOVABLE is special. It is considered a normal type zone on
!CONFIG_HIGHMEM, but it is considered a highmem type zone on
CONFIG_HIGHMEM. Let's focus on the latter case. In that case, until now,
no page on ZONE_MOVABLE has had a direct mapping.

However, the following patchset,
"mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE",
which was once merged and then reverted, will be tried again and will
break the assumption that no page on ZONE_MOVABLE has a direct mapping.
Hence, ZONE_MOVABLE, which is considered a highmem type zone, could
contain both types of pages, direct mapped and not. Since ZONE_MOVABLE
could contain both types of pages, __GFP_HIGHMEM is still required to
allocate memory from it, and we conservatively need to keep considering
ZONE_MOVABLE a highmem type zone.

Even in this situation, PageHighMem(), when called on a ZONE_MOVABLE page
to check for a direct mapping, should return the correct result. The
current implementation of PageHighMem() just returns TRUE if the page's
zone is a highmem type zone, so it could be wrong if a page on
ZONE_MOVABLE is actually direct mapped.

To solve this potential problem, this patchset introduces a new
PageHighMemZone() macro. In the following patches, the two use cases of
PageHighMem() are separated by calling the proper macro, PageHighMem() or
PageHighMemZone(). Then, the implementation of PageHighMem() is changed
to just check whether the direct mapping exists or not, regardless of
the zone of the page.
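
To make the end state concrete, here is a condensed sketch of the
CONFIG_HIGHMEM definitions before and after this series (the real
changes are in patch 01 and patch 10 against include/linux/page-flags.h):

/* Before this series: one macro answers both questions. */
#define PageHighMem(__p)	is_highmem_idx(page_zonenum(__p))

/* After this series: the two questions get separate helpers. */
#define PageHighMem(__p)	(page_to_pfn(__p) >= max_low_pfn)	/* no direct mapping */
#define PageHighMemZone(__p)	is_highmem_idx(page_zonenum(__p))	/* highmem-type zone */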

Note that there are some rules for determining the proper macro.

1. If PageHighMem() is called to check whether a direct mapping exists
or not, use PageHighMem().
2. If PageHighMem() is used to predict the gfp_flags that this page was
allocated with, use PageHighMemZone(). The zone of the page is related
to the gfp_flags.
3. If the purpose of calling PageHighMem() is to count highmem pages and
to interact with the system by using this count, use PageHighMemZone().
Such a counter is usually used to calculate the memory available for a
kernel allocation, and pages in a highmem zone are not available for a
kernel allocation.
4. Otherwise, use PageHighMemZone(). It is safe since its implementation
is just a copy of the previous PageHighMem() implementation and won't
be changed.

My final plan is to rename PageHighMem() to PageNoDirectMapped() or
something similar in order to convey the proper meaning.

This patchset is based on next-20200428 and you can find the full
patchset at the following link.

https://github.com/JoonsooKim/linux/tree/page_highmem-cleanup-v2.00-next-20200428

Thanks.

[1]: https://lore.kernel.org/linux-mm/1512114786-5085-1-git-send-email-iamjoonsoo@lge.com

Joonsoo Kim (10):
  mm/page-flags: introduce PageHighMemZone()
  drm/ttm: separate PageHighMem() and PageHighMemZone() use case
  kexec: separate PageHighMem() and PageHighMemZone() use case
  power: separate PageHighMem() and PageHighMemZone() use case
  mm/gup: separate PageHighMem() and PageHighMemZone() use case
  mm/hugetlb: separate PageHighMem() and PageHighMemZone() use case
  mm: separate PageHighMem() and PageHighMemZone() use case
  mm/page_alloc: correct the use of is_highmem_idx()
  mm/migrate: replace PageHighMem() with open-code
  mm/page-flags: change the implementation of the PageHighMem()

 drivers/gpu/drm/ttm/ttm_memory.c |  4 ++--
 drivers/gpu/drm/ttm/ttm_page_alloc.c |  2 +-
 drivers/gpu/drm/ttm/ttm_page_alloc_dma.c |  2 +-
 drivers/gpu/drm/ttm/ttm_tt.c |  2 +-
 include/linux/migrate.h  |  4 +++-
 include/linux/page-flags.h   | 10 +-
 kernel/kexec_core.c  |  2 +-
 kernel/power/snapshot.c  | 12 ++--
 mm/gup.c |  2 +-
 mm/hugetlb.c |  2 +-
 mm/memory_hotplug.c  |  2 +-
 mm/page_alloc.c  |  4 ++--
 12 files changed, 29 insertions(+), 19 deletions(-)

-- 
