Re: [PATCH v4 04/11] mm/hugetlb: make hugetlb migration callback CMA aware
On Thu, Jul 9, 2020 at 3:43 PM, Michal Hocko wrote: > > On Wed 08-07-20 09:41:06, Michal Hocko wrote: > > On Wed 08-07-20 16:16:02, Joonsoo Kim wrote: > > > On Tue, Jul 07, 2020 at 01:22:31PM +0200, Vlastimil Babka wrote: > > > > On 7/7/20 9:44 AM, js1...@gmail.com wrote: > > > > > From: Joonsoo Kim > > > > > > > > > > new_non_cma_page() in gup.c which try to allocate migration target > > > > > page > > > > > requires to allocate the new page that is not on the CMA area. > > > > > new_non_cma_page() implements it by removing __GFP_MOVABLE flag. > > > > > This way > > > > > works well for THP page or normal page but not for hugetlb page. > > > > > > > > > > hugetlb page allocation process consists of two steps. First is > > > > > dequeing > > > > > from the pool. Second is, if there is no available page on the queue, > > > > > allocating from the page allocator. > > > > > > > > > > new_non_cma_page() can control allocation from the page allocator by > > > > > specifying correct gfp flag. However, dequeing cannot be controlled > > > > > until > > > > > now, so, new_non_cma_page() skips dequeing completely. It is a > > > > > suboptimal > > > > > since new_non_cma_page() cannot utilize hugetlb pages on the queue so > > > > > this > > > > > patch tries to fix this situation. > > > > > > > > > > This patch makes the deque function on hugetlb CMA aware and skip CMA > > > > > pages if newly added skip_cma argument is passed as true. > > > > > > > > Hmm, can't you instead change dequeue_huge_page_node_exact() to test > > > > the PF_ > > > > flag and avoid adding bool skip_cma everywhere? > > > > > > Okay! Please check following patch. > > > > > > > > I think that's what Michal suggested [1] except he said "the code > > > > already does > > > > by memalloc_nocma_{save,restore} API". It needs extending a bit though, > > > > AFAICS.
> > > > __gup_longterm_locked() indeed does the save/restore, but restore comes > > > > before > > > > check_and_migrate_cma_pages() and thus new_non_cma_page() is called, so > > > > an > > > > adjustment is needed there, but that's all? > > > > > > > > Hm the adjustment should be also done because save/restore is done > > > > around > > > > __get_user_pages_locked(), but check_and_migrate_cma_pages() also calls > > > > __get_user_pages_locked(), and that call not being between nocma save > > > > and > > > > restore is thus also a correctness issue? > > > > > > Simply, I call memalloc_nocma_{save,restore} in new_non_cma_page(). It > > > would not cause any problem. > > > > I believe a proper fix is the following. The scope is really defined for > > FOLL_LONGTERM pins and pushing it inside check_and_migrate_cma_pages > > will solve the problem as well but it imho makes more sense to do it in > > the caller the same way we do for any others. > > > > Fixes: 9a4e9f3b2d73 ("mm: update get_user_pages_longterm to migrate pages > > allocated from CMA region") > > > > I am not sure this is worth backporting to stable yet. > > Should I post it as a separate patch, or do you plan to include this into your > next version? It's better to include it in my next version, since this patch would cause a conflict with the next tree that includes my v3 of this patchset. Thanks.
Re: [PATCH v4 04/11] mm/hugetlb: make hugetlb migration callback CMA aware
On Wed 08-07-20 09:41:06, Michal Hocko wrote: > On Wed 08-07-20 16:16:02, Joonsoo Kim wrote: > > On Tue, Jul 07, 2020 at 01:22:31PM +0200, Vlastimil Babka wrote: > > > On 7/7/20 9:44 AM, js1...@gmail.com wrote: > > > > From: Joonsoo Kim > > > > > > > > new_non_cma_page() in gup.c which try to allocate migration target page > > > > requires to allocate the new page that is not on the CMA area. > > > > new_non_cma_page() implements it by removing __GFP_MOVABLE flag. This > > > > way > > > > works well for THP page or normal page but not for hugetlb page. > > > > > > > > hugetlb page allocation process consists of two steps. First is > > > > dequeing > > > > from the pool. Second is, if there is no available page on the queue, > > > > allocating from the page allocator. > > > > > > > > new_non_cma_page() can control allocation from the page allocator by > > > > specifying correct gfp flag. However, dequeing cannot be controlled > > > > until > > > > now, so, new_non_cma_page() skips dequeing completely. It is a > > > > suboptimal > > > > since new_non_cma_page() cannot utilize hugetlb pages on the queue so > > > > this > > > > patch tries to fix this situation. > > > > > > > > This patch makes the deque function on hugetlb CMA aware and skip CMA > > > > pages if newly added skip_cma argument is passed as true. > > > > > > Hmm, can't you instead change dequeue_huge_page_node_exact() to test the > > > PF_ > > > flag and avoid adding bool skip_cma everywhere? > > > > Okay! Please check following patch. > > > > > > I think that's what Michal suggested [1] except he said "the code already > > > does > > > by memalloc_nocma_{save,restore} API". It needs extending a bit though, > > > AFAICS. > > > __gup_longterm_locked() indeed does the save/restore, but restore comes > > > before > > > check_and_migrate_cma_pages() and thus new_non_cma_page() is called, so an > > > adjustment is needed there, but that's all? 
> > > > > > Hm the adjustment should be also done because save/restore is done around > > > __get_user_pages_locked(), but check_and_migrate_cma_pages() also calls > > > __get_user_pages_locked(), and that call not being between nocma save and > > > restore is thus also a correctness issue? > > > > Simply, I call memalloc_nocma_{save,restore} in new_non_cma_page(). It > > would not cause any problem. > > I believe a proper fix is the following. The scope is really defined for > FOLL_LONGTERM pins and pushing it inside check_and_migrate_cma_pages > will solve the problem as well but it imho makes more sense to do it in > the caller the same way we do for any others. > > Fixes: 9a4e9f3b2d73 ("mm: update get_user_pages_longterm to migrate pages > allocated from CMA region") > > I am not sure this is worth backporting to stable yet. Should I post it as a separate patch, or do you plan to include this into your next version? > > diff --git a/mm/gup.c b/mm/gup.c > index de9e36262ccb..75980dd5a2fc 100644 > --- a/mm/gup.c > +++ b/mm/gup.c > @@ -1794,7 +1794,6 @@ static long __gup_longterm_locked(struct task_struct > *tsk, >vmas_tmp, NULL, gup_flags); > > if (gup_flags & FOLL_LONGTERM) { > - memalloc_nocma_restore(flags); > if (rc < 0) > goto out; > > @@ -1802,11 +1801,13 @@ static long __gup_longterm_locked(struct task_struct > *tsk, > for (i = 0; i < rc; i++) > put_page(pages[i]); > rc = -EOPNOTSUPP; > + memalloc_nocma_restore(flags); > goto out; > } > > rc = check_and_migrate_cma_pages(tsk, mm, start, rc, pages, >vmas_tmp, gup_flags); > + memalloc_nocma_restore(flags); > } > > out: > -- > Michal Hocko > SUSE Labs -- Michal Hocko SUSE Labs
Re: [PATCH v4 04/11] mm/hugetlb: make hugetlb migration callback CMA aware
On 7/8/20 12:16 AM, Joonsoo Kim wrote: > On Tue, Jul 07, 2020 at 01:22:31PM +0200, Vlastimil Babka wrote: >> On 7/7/20 9:44 AM, js1...@gmail.com wrote: >>> From: Joonsoo Kim >>> <...> >>> This patch makes the deque function on hugetlb CMA aware and skip CMA >>> pages if newly added skip_cma argument is passed as true. >> >> Hmm, can't you instead change dequeue_huge_page_node_exact() to test the PF_ >> flag and avoid adding bool skip_cma everywhere? > > Okay! Please check following patch. >> >> I think that's what Michal suggested [1] except he said "the code already >> does >> by memalloc_nocma_{save,restore} API". It needs extending a bit though, >> AFAICS. >> __gup_longterm_locked() indeed does the save/restore, but restore comes >> before >> check_and_migrate_cma_pages() and thus new_non_cma_page() is called, so an >> adjustment is needed there, but that's all? >> >> Hm the adjustment should be also done because save/restore is done around >> __get_user_pages_locked(), but check_and_migrate_cma_pages() also calls >> __get_user_pages_locked(), and that call not being between nocma save and >> restore is thus also a correctness issue? > > Simply, I call memalloc_nocma_{save,restore} in new_non_cma_page(). It > would not cause any problem. > > -->8--- > From bcfc57e3c6f2df1ad2940308b89d740cd3f0fba8 Mon Sep 17 00:00:00 2001 > From: Joonsoo Kim > Date: Wed, 8 Jul 2020 14:39:26 +0900 > Subject: [PATCH] mm/hugetlb: make hugetlb migration callback CMA aware > <...> > > This patch makes new_non_cma_page() uses memalloc_nocma_{save,restore} > to exclude CMA memory rather than manually clearing __GFP_MOVABLE. And, > this patch also makes the deque function on hugetlb CMA aware. In the > deque function, CMA memory is skipped if PF_MEMALLOC_NOCMA flag is set > by memalloc_nocma_{save,restore}. > > Acked-by: Mike Kravetz > Signed-off-by: Joonsoo Kim I did ACK the previous version of the patch, but I like this much better. 
I assume there will be a new version built on top of Michal's patch to change the placement of memalloc_nocma_restore calls in __gup_longterm_locked. -- Mike Kravetz
Re: [PATCH v4 04/11] mm/hugetlb: make hugetlb migration callback CMA aware
On Wed 08-07-20 16:27:16, Aneesh Kumar K.V wrote: > Vlastimil Babka writes: > > > On 7/8/20 9:41 AM, Michal Hocko wrote: > >> On Wed 08-07-20 16:16:02, Joonsoo Kim wrote: > >>> On Tue, Jul 07, 2020 at 01:22:31PM +0200, Vlastimil Babka wrote: > >>> > >>> Simply, I call memalloc_nocma_{save,restore} in new_non_cma_page(). It > >>> would not cause any problem. > >> > >> I believe a proper fix is the following. The scope is really defined for > >> FOLL_LONGTERM pins and pushing it inside check_and_migrate_cma_pages > >> will solve the problem as well but it imho makes more sense to do it in > >> the caller the same way we do for any others. > >> > >> Fixes: 9a4e9f3b2d73 ("mm: update get_user_pages_longterm to migrate pages > >> allocated from CMA region") > > > > Agreed. > > > >> > >> I am not sure this is worth backporting to stable yet. > > > > CC Aneesh. > > > > Context: since check_and_migrate_cma_pages() calls > > __get_user_pages_locked(), it > > should also be called under memalloc_nocma_save(). > > But by then we faulted in all relevant pages and migrated them out of > CMA area, right? check_and_migrate_cma_pages will allocate target pages that you want to migrate off the CMA region unless I am misreading the code. And those allocations need to be placed outside of the CMA. -- Michal Hocko SUSE Labs
Re: [PATCH v4 04/11] mm/hugetlb: make hugetlb migration callback CMA aware
Vlastimil Babka writes: > On 7/8/20 9:41 AM, Michal Hocko wrote: >> On Wed 08-07-20 16:16:02, Joonsoo Kim wrote: >>> On Tue, Jul 07, 2020 at 01:22:31PM +0200, Vlastimil Babka wrote: >>> >>> Simply, I call memalloc_nocma_{save,restore} in new_non_cma_page(). It >>> would not cause any problem. >> >> I believe a proper fix is the following. The scope is really defined for >> FOLL_LONGTERM pins and pushing it inside check_and_migrate_cma_pages >> will solve the problem as well but it imho makes more sense to do it in >> the caller the same way we do for any others. >> >> Fixes: 9a4e9f3b2d73 ("mm: update get_user_pages_longterm to migrate pages >> allocated from CMA region") > > Agreed. > >> >> I am not sure this is worth backporting to stable yet. > > CC Aneesh. > > Context: since check_and_migrate_cma_pages() calls __get_user_pages_locked(), > it > should also be called under memalloc_nocma_save(). But by then we faulted in all relevant pages and migrated them out of the CMA area, right? > >> diff --git a/mm/gup.c >> index de9e36262ccb..75980dd5a2fc 100644 >> --- a/mm/gup.c >> +++ b/mm/gup.c >> @@ -1794,7 +1794,6 @@ static long __gup_longterm_locked(struct task_struct >> *tsk, >> vmas_tmp, NULL, gup_flags); >> >> if (gup_flags & FOLL_LONGTERM) { >> -memalloc_nocma_restore(flags); >> if (rc < 0) >> goto out; >> >> @@ -1802,11 +1801,13 @@ static long __gup_longterm_locked(struct task_struct >> *tsk, >> for (i = 0; i < rc; i++) >> put_page(pages[i]); >> rc = -EOPNOTSUPP; >> +memalloc_nocma_restore(flags); >> goto out; >> } >> >> rc = check_and_migrate_cma_pages(tsk, mm, start, rc, pages, >> vmas_tmp, gup_flags); >> +memalloc_nocma_restore(flags); >> } >> >> out: >-aneesh
Re: [PATCH v4 04/11] mm/hugetlb: make hugetlb migration callback CMA aware
On 7/8/20 9:41 AM, Michal Hocko wrote: > On Wed 08-07-20 16:16:02, Joonsoo Kim wrote: >> On Tue, Jul 07, 2020 at 01:22:31PM +0200, Vlastimil Babka wrote: >> >> Simply, I call memalloc_nocma_{save,restore} in new_non_cma_page(). It >> would not cause any problem. > > I believe a proper fix is the following. The scope is really defined for > FOLL_LONGTERM pins and pushing it inside check_and_migrate_cma_pages > will solve the problem as well but it imho makes more sense to do it in > the caller the same way we do for any others. > > Fixes: 9a4e9f3b2d73 ("mm: update get_user_pages_longterm to migrate pages > allocated from CMA region") Agreed. > > I am not sure this is worth backporting to stable yet. CC Aneesh. Context: since check_and_migrate_cma_pages() calls __get_user_pages_locked(), it should also be called under memalloc_nocma_save(). > diff --git a/mm/gup.c b/mm/gup.c > index de9e36262ccb..75980dd5a2fc 100644 > --- a/mm/gup.c > +++ b/mm/gup.c > @@ -1794,7 +1794,6 @@ static long __gup_longterm_locked(struct task_struct > *tsk, >vmas_tmp, NULL, gup_flags); > > if (gup_flags & FOLL_LONGTERM) { > - memalloc_nocma_restore(flags); > if (rc < 0) > goto out; > > @@ -1802,11 +1801,13 @@ static long __gup_longterm_locked(struct task_struct > *tsk, > for (i = 0; i < rc; i++) > put_page(pages[i]); > rc = -EOPNOTSUPP; > + memalloc_nocma_restore(flags); > goto out; > } > > rc = check_and_migrate_cma_pages(tsk, mm, start, rc, pages, >vmas_tmp, gup_flags); > + memalloc_nocma_restore(flags); > } > > out: >
Re: [PATCH v4 04/11] mm/hugetlb: make hugetlb migration callback CMA aware
On Wed 08-07-20 16:16:02, Joonsoo Kim wrote: > On Tue, Jul 07, 2020 at 01:22:31PM +0200, Vlastimil Babka wrote: > > On 7/7/20 9:44 AM, js1...@gmail.com wrote: > > > From: Joonsoo Kim > > > > > > new_non_cma_page() in gup.c which try to allocate migration target page > > > requires to allocate the new page that is not on the CMA area. > > > new_non_cma_page() implements it by removing __GFP_MOVABLE flag. This way > > > works well for THP page or normal page but not for hugetlb page. > > > > > > hugetlb page allocation process consists of two steps. First is dequeing > > > from the pool. Second is, if there is no available page on the queue, > > > allocating from the page allocator. > > > > > > new_non_cma_page() can control allocation from the page allocator by > > > specifying correct gfp flag. However, dequeing cannot be controlled until > > > now, so, new_non_cma_page() skips dequeing completely. It is a suboptimal > > > since new_non_cma_page() cannot utilize hugetlb pages on the queue so this > > > patch tries to fix this situation. > > > > > > This patch makes the deque function on hugetlb CMA aware and skip CMA > > > pages if newly added skip_cma argument is passed as true. > > > > Hmm, can't you instead change dequeue_huge_page_node_exact() to test the PF_ > > flag and avoid adding bool skip_cma everywhere? > > Okay! Please check following patch. > > > > I think that's what Michal suggested [1] except he said "the code already > > does > > by memalloc_nocma_{save,restore} API". It needs extending a bit though, > > AFAICS. > > __gup_longterm_locked() indeed does the save/restore, but restore comes > > before > > check_and_migrate_cma_pages() and thus new_non_cma_page() is called, so an > > adjustment is needed there, but that's all? 
> > > > Hm the adjustment should be also done because save/restore is done around > > __get_user_pages_locked(), but check_and_migrate_cma_pages() also calls > > __get_user_pages_locked(), and that call not being between nocma save and > > restore is thus also a correctness issue? > > Simply, I call memalloc_nocma_{save,restore} in new_non_cma_page(). It > would not cause any problem. I believe a proper fix is the following. The scope is really defined for FOLL_LONGTERM pins and pushing it inside check_and_migrate_cma_pages will solve the problem as well but it imho makes more sense to do it in the caller the same way we do for any others. Fixes: 9a4e9f3b2d73 ("mm: update get_user_pages_longterm to migrate pages allocated from CMA region") I am not sure this is worth backporting to stable yet. diff --git a/mm/gup.c b/mm/gup.c index de9e36262ccb..75980dd5a2fc 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -1794,7 +1794,6 @@ static long __gup_longterm_locked(struct task_struct *tsk, vmas_tmp, NULL, gup_flags); if (gup_flags & FOLL_LONGTERM) { - memalloc_nocma_restore(flags); if (rc < 0) goto out; @@ -1802,11 +1801,13 @@ static long __gup_longterm_locked(struct task_struct *tsk, for (i = 0; i < rc; i++) put_page(pages[i]); rc = -EOPNOTSUPP; + memalloc_nocma_restore(flags); goto out; } rc = check_and_migrate_cma_pages(tsk, mm, start, rc, pages, vmas_tmp, gup_flags); + memalloc_nocma_restore(flags); } out: -- Michal Hocko SUSE Labs
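For readers unfamiliar with the scope API being discussed: Michal's diff moves memalloc_nocma_restore() later so that check_and_migrate_cma_pages() also runs inside the nocma scope. The semantics can be sketched with a minimal user-space model; everything here is illustrative (the real kernel keeps the flag in current->flags per task, and allocation_avoids_cma() is an invented stand-in for the allocator checking the flag):

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the task flag word; the real kernel keeps
 * this per task in current->flags. */
#define PF_MEMALLOC_NOCMA 0x1
static unsigned int task_flags;

/* Save the old flag state and enter the nocma scope. */
static unsigned int memalloc_nocma_save(void)
{
	unsigned int old = task_flags & PF_MEMALLOC_NOCMA;

	task_flags |= PF_MEMALLOC_NOCMA;
	return old;
}

/* Leave the scope, restoring whatever state the caller had. */
static void memalloc_nocma_restore(unsigned int old)
{
	task_flags = (task_flags & ~PF_MEMALLOC_NOCMA) | old;
}

/* Invented helper: any allocation made between save and restore is
 * expected to avoid CMA pageblocks. */
static bool allocation_avoids_cma(void)
{
	return task_flags & PF_MEMALLOC_NOCMA;
}
```

With the old placement, restore ran right before check_and_migrate_cma_pages(), so the migration-target allocations were made with the flag already cleared; the fix keeps the whole FOLL_LONGTERM path between save and restore.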
Re: [PATCH v4 04/11] mm/hugetlb: make hugetlb migration callback CMA aware
On Tue, Jul 07, 2020 at 01:22:31PM +0200, Vlastimil Babka wrote: > On 7/7/20 9:44 AM, js1...@gmail.com wrote: > > From: Joonsoo Kim > > > > new_non_cma_page() in gup.c which try to allocate migration target page > > requires to allocate the new page that is not on the CMA area. > > new_non_cma_page() implements it by removing __GFP_MOVABLE flag. This way > > works well for THP page or normal page but not for hugetlb page. > > > > hugetlb page allocation process consists of two steps. First is dequeing > > from the pool. Second is, if there is no available page on the queue, > > allocating from the page allocator. > > > > new_non_cma_page() can control allocation from the page allocator by > > specifying correct gfp flag. However, dequeing cannot be controlled until > > now, so, new_non_cma_page() skips dequeing completely. It is a suboptimal > > since new_non_cma_page() cannot utilize hugetlb pages on the queue so this > > patch tries to fix this situation. > > > > This patch makes the deque function on hugetlb CMA aware and skip CMA > > pages if newly added skip_cma argument is passed as true. > > Hmm, can't you instead change dequeue_huge_page_node_exact() to test the PF_ > flag and avoid adding bool skip_cma everywhere? Okay! Please check the following patch. > > I think that's what Michal suggested [1] except he said "the code already does > by memalloc_nocma_{save,restore} API". It needs extending a bit though, > AFAICS. > __gup_longterm_locked() indeed does the save/restore, but restore comes before > check_and_migrate_cma_pages() and thus new_non_cma_page() is called, so an > adjustment is needed there, but that's all? > > Hm the adjustment should be also done because save/restore is done around > __get_user_pages_locked(), but check_and_migrate_cma_pages() also calls > __get_user_pages_locked(), and that call not being between nocma save and > restore is thus also a correctness issue? Simply, I call memalloc_nocma_{save,restore} in new_non_cma_page().
It would not cause any problem. -->8--- From bcfc57e3c6f2df1ad2940308b89d740cd3f0fba8 Mon Sep 17 00:00:00 2001 From: Joonsoo Kim Date: Wed, 8 Jul 2020 14:39:26 +0900 Subject: [PATCH] mm/hugetlb: make hugetlb migration callback CMA aware new_non_cma_page() in gup.c, which tries to allocate a migration target page, requires allocating a new page that is not on the CMA area. new_non_cma_page() implements this by removing the __GFP_MOVABLE flag. This way works well for THP pages or normal pages, but not for hugetlb pages. The hugetlb page allocation process consists of two steps. First is dequeuing from the pool. Second is, if there is no available page on the queue, allocating from the page allocator. new_non_cma_page() can control allocation from the page allocator by specifying the correct gfp flag. However, dequeuing could not be controlled until now, so new_non_cma_page() skips dequeuing completely. This is suboptimal, since new_non_cma_page() cannot utilize hugetlb pages on the queue, so this patch tries to fix this situation. This patch makes new_non_cma_page() use memalloc_nocma_{save,restore} to exclude CMA memory rather than manually clearing __GFP_MOVABLE. And this patch also makes the dequeue function on hugetlb CMA aware: in the dequeue function, CMA memory is skipped if the PF_MEMALLOC_NOCMA flag is set by memalloc_nocma_save().
Acked-by: Mike Kravetz Signed-off-by: Joonsoo Kim --- include/linux/hugetlb.h | 2 -- mm/gup.c| 32 +++- mm/hugetlb.c| 11 +-- 3 files changed, 24 insertions(+), 21 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index bb93e95..34a10e5 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -509,8 +509,6 @@ struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid, nodemask_t *nmask, gfp_t gfp_mask); struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma, unsigned long address); -struct page *alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask, -int nid, nodemask_t *nmask); int huge_add_to_page_cache(struct page *page, struct address_space *mapping, pgoff_t idx); diff --git a/mm/gup.c b/mm/gup.c index 5daadae..79142a9 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -1623,6 +1623,8 @@ static struct page *new_non_cma_page(struct page *page, unsigned long private) * allocation memory. */ gfp_t gfp_mask = GFP_USER | __GFP_NOWARN; + unsigned int flags = memalloc_nocma_save(); + struct page *new_page = NULL; if (PageHighMem(page)) gfp_mask |= __GFP_HIGHMEM; @@ -1630,33 +1632,29 @@ static struct page *new_non_cma_page(struct page *page, unsigned long private) #ifdef CONFIG_HUGETLB_PAGE if (PageHuge(page)) { struct hstate *h =
Re: [PATCH v4 04/11] mm/hugetlb: make hugetlb migration callback CMA aware
On Tue, Jul 07, 2020 at 01:31:16PM +0200, Michal Hocko wrote: > On Tue 07-07-20 16:44:42, Joonsoo Kim wrote: > > From: Joonsoo Kim > > > > new_non_cma_page() in gup.c which try to allocate migration target page > > requires to allocate the new page that is not on the CMA area. > > new_non_cma_page() implements it by removing __GFP_MOVABLE flag. This way > > works well for THP page or normal page but not for hugetlb page. > > > > hugetlb page allocation process consists of two steps. First is dequeing > > from the pool. Second is, if there is no available page on the queue, > > allocating from the page allocator. > > > > new_non_cma_page() can control allocation from the page allocator by > > specifying correct gfp flag. However, dequeing cannot be controlled until > > now, so, new_non_cma_page() skips dequeing completely. It is a suboptimal > > since new_non_cma_page() cannot utilize hugetlb pages on the queue so this > > patch tries to fix this situation. > > > > This patch makes the deque function on hugetlb CMA aware and skip CMA > > pages if newly added skip_cma argument is passed as true. > > I really dislike this as already mentioned in the previous version of > the patch. You are making hugetlb and only one part of its allocator a > special snowflake which is almost always a bad idea. Your changelog > lacks any real justification for this inconsistency. > > Also by doing so you are keeping an existing bug that the hugetlb > allocator doesn't respect scope gfp flags as I have mentioned when > reviewing the previous version. That bug likely doesn't matter now but > it might in future and as soon as it is fixed all this is just a > pointless exercise. > > I do not have energy and time to burn to repeat that argumentation to I > will let Mike to have a final word. Btw. you are keeping his acks even > after considerable changes to patches which I am not really sure he is > ok with. As you replied a minute ago, Mike acked. 
> > Acked-by: Mike Kravetz > > Signed-off-by: Joonsoo Kim > > To this particular patch. > [...] > > > diff --git a/mm/gup.c > > index 5daadae..2c3dab4 100644 > > --- a/mm/gup.c > > +++ b/mm/gup.c > > @@ -1630,11 +1630,12 @@ static struct page *new_non_cma_page(struct page > > *page, unsigned long private) > > #ifdef CONFIG_HUGETLB_PAGE > > if (PageHuge(page)) { > > struct hstate *h = page_hstate(page); > > + > > /* > > * We don't want to dequeue from the pool because pool pages > > will > > * mostly be from the CMA region. > > */ > > - return alloc_migrate_huge_page(h, gfp_mask, nid, NULL); > > + return alloc_huge_page_nodemask(h, nid, NULL, gfp_mask, true); > > Let me repeat that this whole thing is running under > memalloc_nocma_save. So additional parameter is bogus. As Vlastimil said in his other reply, we are not under memalloc_nocma_save(). Anyway, now, I also think that additional parameter isn't needed. > [...] > > -static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid) > > +static struct page *dequeue_huge_page_node_exact(struct hstate *h, int > > nid, bool skip_cma) > > If you really insist on an additional parameter at this layer then it > > should be checking for the PF_MEMALLOC_NOCMA instead. I will change the patch to check PF_MEMALLOC_NOCMA instead of introducing and checking skip_cma. > [...] > > @@ -1971,21 +1977,29 @@ struct page *alloc_buddy_huge_page_with_mpol(struct > > hstate *h, > > > > /* page migration callback function */ > > struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid, > > - nodemask_t *nmask, gfp_t gfp_mask) > > + nodemask_t *nmask, gfp_t gfp_mask, bool skip_cma) > > { > > + unsigned int flags = 0; > > + struct page *page = NULL; > > + > > + if (skip_cma) > > + flags = memalloc_nocma_save(); > > This is pointless for a scope that is already defined up in the call > chain and fundamentally this is breaking the expected use of the scope > API.
> The primary reason for that API to exist is to define the scope and > have it sticky for _all_ allocation contexts. So if you have to use it > deep in the allocator then you are doing something wrong. As mentioned above, we are not under memalloc_nocma_save(). Anyway, I will rework the patch and attach it to Vlastimil's reply. I'd appreciate it if you could check it again. Thanks.
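The PF_MEMALLOC_NOCMA check Joonsoo agrees to switch to can be sketched in user space as follows. This is a toy model, not kernel code: the free-list entry and dequeue_first_usable() are invented for illustration (the real dequeue function walks the per-node hugepage_freelists and would test the page itself, e.g. via is_migrate_cma_page()):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PF_MEMALLOC_NOCMA 0x1
static unsigned int task_flags = PF_MEMALLOC_NOCMA;	/* inside a nocma scope */

/* Toy stand-in for a hugetlb free-list entry that records whether the
 * page came from the CMA region. */
struct hpage {
	bool is_cma;
};

/* Modeled after the reworked dequeue: walk the free list and skip CMA
 * pages whenever the task is inside a memalloc_nocma scope, instead of
 * threading a bool skip_cma argument through every caller. */
static struct hpage *dequeue_first_usable(struct hpage *list, size_t n)
{
	bool nocma = task_flags & PF_MEMALLOC_NOCMA;
	size_t i;

	for (i = 0; i < n; i++) {
		if (nocma && list[i].is_cma)
			continue;	/* pool page sits in CMA, skip it */
		return &list[i];
	}
	return NULL;	/* caller falls back to the page allocator */
}
```

The design point of the thread is visible here: because the predicate is read from the task state, no extra parameter has to be plumbed down the call chain.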
Re: [PATCH v4 04/11] mm/hugetlb: make hugetlb migration callback CMA aware
On Tue 07-07-20 13:31:19, Michal Hocko wrote: > Btw. you are keeping his acks even > after considerable changes to patches which I am not really sure he is > ok with. I am sorry but I have missed the last email from Mike in v3. -- Michal Hocko SUSE Labs
Re: [PATCH v4 04/11] mm/hugetlb: make hugetlb migration callback CMA aware
On Tue 07-07-20 16:44:42, Joonsoo Kim wrote: > From: Joonsoo Kim > > new_non_cma_page() in gup.c which try to allocate migration target page > requires to allocate the new page that is not on the CMA area. > new_non_cma_page() implements it by removing __GFP_MOVABLE flag. This way > works well for THP page or normal page but not for hugetlb page. > > hugetlb page allocation process consists of two steps. First is dequeing > from the pool. Second is, if there is no available page on the queue, > allocating from the page allocator. > > new_non_cma_page() can control allocation from the page allocator by > specifying correct gfp flag. However, dequeing cannot be controlled until > now, so, new_non_cma_page() skips dequeing completely. It is a suboptimal > since new_non_cma_page() cannot utilize hugetlb pages on the queue so this > patch tries to fix this situation. > > This patch makes the deque function on hugetlb CMA aware and skip CMA > pages if newly added skip_cma argument is passed as true. I really dislike this as already mentioned in the previous version of the patch. You are making hugetlb and only one part of its allocator a special snowflake which is almost always a bad idea. Your changelog lacks any real justification for this inconsistency. Also by doing so you are keeping an existing bug that the hugetlb allocator doesn't respect scope gfp flags as I have mentioned when reviewing the previous version. That bug likely doesn't matter now but it might in the future and as soon as it is fixed all this is just a pointless exercise. I do not have the energy and time to burn to repeat that argumentation, so I will let Mike have the final word. Btw. you are keeping his acks even after considerable changes to patches which I am not really sure he is ok with. > Acked-by: Mike Kravetz > Signed-off-by: Joonsoo Kim To this particular patch. [...]
> diff --git a/mm/gup.c > index 5daadae..2c3dab4 100644 > --- a/mm/gup.c > +++ b/mm/gup.c > @@ -1630,11 +1630,12 @@ static struct page *new_non_cma_page(struct page > *page, unsigned long private) > #ifdef CONFIG_HUGETLB_PAGE > if (PageHuge(page)) { > struct hstate *h = page_hstate(page); > + > /* >* We don't want to dequeue from the pool because pool pages > will >* mostly be from the CMA region. >*/ > - return alloc_migrate_huge_page(h, gfp_mask, nid, NULL); > + return alloc_huge_page_nodemask(h, nid, NULL, gfp_mask, true); Let me repeat that this whole thing is running under memalloc_nocma_save. So an additional parameter is bogus. [...] > -static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid) > +static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid, > bool skip_cma) If you really insist on an additional parameter at this layer then it should be checking for the PF_MEMALLOC_NOCMA instead. [...] > @@ -1971,21 +1977,29 @@ struct page *alloc_buddy_huge_page_with_mpol(struct > hstate *h, > > /* page migration callback function */ > struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid, > - nodemask_t *nmask, gfp_t gfp_mask) > + nodemask_t *nmask, gfp_t gfp_mask, bool skip_cma) > { > + unsigned int flags = 0; > + struct page *page = NULL; > + > + if (skip_cma) > + flags = memalloc_nocma_save(); This is pointless for a scope that is already defined up in the call chain and fundamentally this is breaking the expected use of the scope API. The primary reason for that API to exist is to define the scope and have it sticky for _all_ allocation contexts. So if you have to use it deep in the allocator then you are doing something wrong. -- Michal Hocko SUSE Labs
Re: [PATCH v4 04/11] mm/hugetlb: make hugetlb migration callback CMA aware
On 7/7/20 9:44 AM, js1...@gmail.com wrote: > From: Joonsoo Kim > > new_non_cma_page() in gup.c which try to allocate migration target page > requires to allocate the new page that is not on the CMA area. > new_non_cma_page() implements it by removing __GFP_MOVABLE flag. This way > works well for THP page or normal page but not for hugetlb page. > > hugetlb page allocation process consists of two steps. First is dequeing > from the pool. Second is, if there is no available page on the queue, > allocating from the page allocator. > > new_non_cma_page() can control allocation from the page allocator by > specifying correct gfp flag. However, dequeing cannot be controlled until > now, so, new_non_cma_page() skips dequeing completely. It is a suboptimal > since new_non_cma_page() cannot utilize hugetlb pages on the queue so this > patch tries to fix this situation. > > This patch makes the deque function on hugetlb CMA aware and skip CMA > pages if newly added skip_cma argument is passed as true. Hmm, can't you instead change dequeue_huge_page_node_exact() to test the PF_ flag and avoid adding bool skip_cma everywhere? I think that's what Michal suggested [1] except he said "the code already does by memalloc_nocma_{save,restore} API". It needs extending a bit though, AFAICS. __gup_longterm_locked() indeed does the save/restore, but restore comes before check_and_migrate_cma_pages() and thus new_non_cma_page() is called, so an adjustment is needed there, but that's all? Hm the adjustment should be also done because save/restore is done around __get_user_pages_locked(), but check_and_migrate_cma_pages() also calls __get_user_pages_locked(), and that call not being between nocma save and restore is thus also a correctness issue? 
[1] https://lore.kernel.org/r/20200629075510.ga32...@dhcp22.suse.cz

> Acked-by: Mike Kravetz
> Signed-off-by: Joonsoo Kim
> ---
>  include/linux/hugetlb.h |  6 ++
>  mm/gup.c                |  3 ++-
>  mm/hugetlb.c            | 46 ++
>  mm/mempolicy.c          |  2 +-
>  mm/migrate.c            |  2 +-
>  5 files changed, 36 insertions(+), 23 deletions(-)
[PATCH v4 04/11] mm/hugetlb: make hugetlb migration callback CMA aware
From: Joonsoo Kim

new_non_cma_page() in gup.c, which tries to allocate a migration target
page, needs to allocate a new page that is not on the CMA area.
new_non_cma_page() implements this by removing the __GFP_MOVABLE flag.
This works well for a THP or a normal page, but not for a hugetlb page.

hugetlb page allocation consists of two steps. The first is dequeuing
from the pool. The second, if there is no available page in the pool, is
allocating from the page allocator.

new_non_cma_page() can control allocation from the page allocator by
specifying the correct gfp flag. However, dequeuing could not be
controlled until now, so new_non_cma_page() skipped dequeuing
completely. This is suboptimal, since new_non_cma_page() cannot utilize
hugetlb pages already in the pool, so this patch fixes the situation.

This patch makes the hugetlb dequeue function CMA aware: it skips CMA
pages if the newly added skip_cma argument is passed as true.

Acked-by: Mike Kravetz
Signed-off-by: Joonsoo Kim
---
 include/linux/hugetlb.h |  6 ++
 mm/gup.c                |  3 ++-
 mm/hugetlb.c            | 46 ++
 mm/mempolicy.c          |  2 +-
 mm/migrate.c            |  2 +-
 5 files changed, 36 insertions(+), 23 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index bb93e95..5a9ddf1 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -506,11 +506,9 @@ struct huge_bootmem_page {
 struct page *alloc_huge_page(struct vm_area_struct *vma,
 				unsigned long addr, int avoid_reserve);
 struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
-				nodemask_t *nmask, gfp_t gfp_mask);
+				nodemask_t *nmask, gfp_t gfp_mask, bool skip_cma);
 struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma,
 				unsigned long address);
-struct page *alloc_migrate_huge_page(struct hstate *h, gfp_t gfp_mask,
-				int nid, nodemask_t *nmask);
 int huge_add_to_page_cache(struct page *page, struct address_space *mapping,
 			pgoff_t idx);

@@ -770,7 +768,7 @@ static inline struct page *alloc_huge_page(struct vm_area_struct *vma,
 static inline struct page *
 alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
-				nodemask_t *nmask, gfp_t gfp_mask)
+				nodemask_t *nmask, gfp_t gfp_mask, bool skip_cma)
 {
 	return NULL;
 }

diff --git a/mm/gup.c b/mm/gup.c
index 5daadae..2c3dab4 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1630,11 +1630,12 @@ static struct page *new_non_cma_page(struct page *page, unsigned long private)
 #ifdef CONFIG_HUGETLB_PAGE
 	if (PageHuge(page)) {
 		struct hstate *h = page_hstate(page);
+
 		/*
 		 * We don't want to dequeue from the pool because pool pages will
 		 * mostly be from the CMA region.
 		 */
-		return alloc_migrate_huge_page(h, gfp_mask, nid, NULL);
+		return alloc_huge_page_nodemask(h, nid, NULL, gfp_mask, true);
 	}
 #endif
 	if (PageTransHuge(page)) {

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3245aa0..bcf4abe 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -29,6 +29,7 @@
 #include <...>
 #include <...>
 #include <...>
+#include <linux/sched/mm.h>
 #include <...>
 #include <...>

@@ -1033,13 +1034,18 @@ static void enqueue_huge_page(struct hstate *h, struct page *page)
 	h->free_huge_pages_node[nid]++;
 }

-static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
+static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid, bool skip_cma)
 {
 	struct page *page;

-	list_for_each_entry(page, &h->hugepage_freelists[nid], lru)
+	list_for_each_entry(page, &h->hugepage_freelists[nid], lru) {
+		if (skip_cma && is_migrate_cma_page(page))
+			continue;
+
 		if (!PageHWPoison(page))
 			break;
+	}
+
 	/*
 	 * if 'non-isolated free hugepage' not found on the list,
 	 * the allocation fails.
@@ -1054,7 +1060,7 @@ static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
 }

 static struct page *dequeue_huge_page_nodemask(struct hstate *h, gfp_t gfp_mask, int nid,
-			nodemask_t *nmask)
+			nodemask_t *nmask, bool skip_cma)
 {
 	unsigned int cpuset_mems_cookie;
 	struct zonelist *zonelist;

@@ -1079,7 +1085,7 @@ static struct page *dequeue_huge_page_nodemask(struct hstate *h, gfp_t gfp_mask,
 			continue;
 		node = zone_to_nid(zone);

-		page = dequeue_huge_page_node_exact(h, node);
+		page = dequeue_huge_page_node_exact(h, node, skip_cma);
 		if (page)
 			return page;
 	}
@@ -1115,7 +1121,7 @@ static struct page