On Sat, Apr 17, 2021 at 7:56 AM Mike Kravetz <mike.krav...@oracle.com> wrote:
>
> On 4/15/21 1:40 AM, Muchun Song wrote:
> > In the subsequent patch, we should allocate the vmemmap pages when
> > freeing a HugeTLB page. But update_and_free_page() can be called
> > under any context, so we cannot use GFP_KERNEL to allocate vmemmap
> > pages. However, we can defer the actual freeing in a kworker to
> > prevent from using GFP_ATOMIC to allocate the vmemmap pages.
>
> Thanks! I knew we would need to introduce a kworker for this when I
> removed the kworker previously used in free_huge_page.

Yeah, but another choice is to use GFP_ATOMIC to allocate the vmemmap
pages when we are in an atomic context, and GFP_KERNEL otherwise. In
that case, we could drop the kworker.
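A rough, untested sketch of that alternative (the helper name is made
up and is not part of the posted patch; note that in_atomic() cannot
see sections that merely hold a spinlock unless CONFIG_PREEMPT_COUNT
is enabled, so detecting atomic context reliably is itself a problem):

#include <linux/gfp.h>          /* gfp_t, GFP_ATOMIC, GFP_KERNEL */
#include <linux/preempt.h>      /* in_atomic() */
#include <linux/irqflags.h>     /* irqs_disabled() */

/* Hypothetical helper: pick the gfp mask from the calling context. */
static gfp_t vmemmap_gfp_mask(void)
{
        if (in_atomic() || irqs_disabled())
                return GFP_ATOMIC | __GFP_NOWARN;

        return GFP_KERNEL;
}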
>
> > The __update_and_free_page() is where the call to allocate vmemmap
> > pages will be inserted.
>
> This patch adds the functionality required for __update_and_free_page
> to potentially sleep and fail. More questions will come up in the
> subsequent patch when code must deal with the failures.

Right. More questions are welcome.

> >
> > Signed-off-by: Muchun Song <songmuc...@bytedance.com>
> > ---
> >  mm/hugetlb.c         | 73 ++++++++++++++++++++++++++++++++++++++++++++++++----
> >  mm/hugetlb_vmemmap.c | 12 ---------
> >  mm/hugetlb_vmemmap.h | 17 ++++++++++++
> >  3 files changed, 85 insertions(+), 17 deletions(-)
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 923d05e2806b..eeb8f5480170 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -1376,7 +1376,7 @@ static void remove_hugetlb_page(struct hstate *h, struct page *page,
> >          h->nr_huge_pages_node[nid]--;
> >  }
> >
> > -static void update_and_free_page(struct hstate *h, struct page *page)
> > +static void __update_and_free_page(struct hstate *h, struct page *page)
> >  {
> >          int i;
> >          struct page *subpage = page;
> > @@ -1399,12 +1399,73 @@ static void update_and_free_page(struct hstate *h, struct page *page)
> >          }
> >  }
> >
> > +/*
> > + * As update_and_free_page() can be called under any context, so we cannot
> > + * use GFP_KERNEL to allocate vmemmap pages. However, we can defer the
> > + * actual freeing in a workqueue to prevent from using GFP_ATOMIC to allocate
> > + * the vmemmap pages.
> > + *
> > + * free_hpage_workfn() locklessly retrieves the linked list of pages to be
> > + * freed and frees them one-by-one. As the page->mapping pointer is going
> > + * to be cleared in free_hpage_workfn() anyway, it is reused as the llist_node
> > + * structure of a lockless linked list of huge pages to be freed.
> > + */
> > +static LLIST_HEAD(hpage_freelist);
> > +
> > +static void free_hpage_workfn(struct work_struct *work)
> > +{
> > +        struct llist_node *node;
> > +
> > +        node = llist_del_all(&hpage_freelist);
> > +
> > +        while (node) {
> > +                struct page *page;
> > +                struct hstate *h;
> > +
> > +                page = container_of((struct address_space **)node,
> > +                                     struct page, mapping);
> > +                node = node->next;
> > +                page->mapping = NULL;
> > +                h = page_hstate(page);
>
> The VM_BUG_ON_PAGE(!PageHuge(page), page) in page_hstate is going to
> trigger because a previous call to remove_hugetlb_page() will
> set_compound_page_dtor(page, NULL_COMPOUND_DTOR)

Sorry, I did not realise that. Thanks for the reminder.

>
> Note how h (hstate) is grabbed before calling update_and_free_page in
> existing code.
>
> We could potentially drop the !PageHuge(page) in page_hstate. Or,
> perhaps just use 'size_to_hstate(page_size(page))' in free_hpage_workfn.

I prefer not to change the behavior of page_hstate(), so I will use
'size_to_hstate(page_size(page))' directly. Thanks Mike.

> --
> Mike Kravetz
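For reference, the change discussed above would look roughly like this
inside the loop of free_hpage_workfn() quoted earlier (a sketch of the
planned fix, not the actual next posting):

                node = node->next;
                page->mapping = NULL;
                /*
                 * remove_hugetlb_page() has already reset the compound
                 * page dtor, so page_hstate() would trip
                 * VM_BUG_ON_PAGE(!PageHuge(page), page). Derive the
                 * hstate from the page size instead.
                 */
                h = size_to_hstate(page_size(page));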