Re: [External] [PATCH v3 7/8] hugetlb: make free_huge_page irq safe
On 4/2/21 10:59 PM, Muchun Song wrote:
> On Sat, Apr 3, 2021 at 4:56 AM Mike Kravetz wrote:
>>
>> On 4/2/21 5:47 AM, Muchun Song wrote:
>>> On Wed, Mar 31, 2021 at 11:42 AM Mike Kravetz wrote:
>>>>
>>>> Commit c77c0a8ac4c5 ("mm/hugetlb: defer freeing of huge pages if in
>>>> non-task context") was added to address the issue of free_huge_page
>>>> being called from irq context. That commit hands off free_huge_page
>>>> processing to a workqueue if !in_task. However, this doesn't cover
>>>> all the cases as pointed out by 0day bot lockdep report [1].
>>>>
>>>> :  Possible interrupt unsafe locking scenario:
>>>> :
>>>> :        CPU0                    CPU1
>>>> :
>>>> :   lock(hugetlb_lock);
>>>> :                                local_irq_disable();
>>>> :                                lock(slock-AF_INET);
>>>> :                                lock(hugetlb_lock);
>>>> :
>>>> :   lock(slock-AF_INET);
>>>>
>>>> Shakeel has later explained that this is very likely TCP TX zerocopy
>>>> from hugetlb pages scenario when the networking code drops a last
>>>> reference to hugetlb page while having IRQ disabled. Hugetlb freeing
>>>> path doesn't disable IRQ while holding hugetlb_lock so a lock dependency
>>>> chain can lead to a deadlock.
>>>>
>>>> This commit addresses the issue by doing the following:
>>>> - Make hugetlb_lock irq safe. This is mostly a simple process of
>>>>   changing spin_*lock calls to spin_*lock_irq* calls.
>>>> - Make subpool lock irq safe in a similar manner.
>>>> - Revert the !in_task check and workqueue handoff.
>>>>
>>>> [1] https://lore.kernel.org/linux-mm/f1c03b05bc43a...@google.com/
>>>>
>>>> Signed-off-by: Mike Kravetz
>>>> Acked-by: Michal Hocko
>>>> Reviewed-by: Muchun Song
>>>
>>> Hi Mike,
>>>
>>> Today I pulled the newest code (next-20210401). I found that
>>> alloc_and_dissolve_huge_page is not updated. In this function,
>>> hugetlb_lock is still non-irq safe. Maybe you or Oscar need
>>> to fix.
>>>
>>> Thanks.
>>
>> Thank you Muchun,
>>
>> Oscar's changes were not in Andrew's tree when I started on this series
>> and I failed to notice their inclusion. In addition,
>> isolate_or_dissolve_huge_page also needs updating, as does a change in
>> set_max_huge_pages that was omitted while rebasing.
>>
>> Andrew, the following patch addresses those missing changes. Ideally,
>> the changes should be combined/included in this patch. If you want me
>> to send another version of this patch or another series, let me know.
>>
>> From 450593eb3cea895f499ddc343c22424c552ea502 Mon Sep 17 00:00:00 2001
>> From: Mike Kravetz
>> Date: Fri, 2 Apr 2021 13:18:13 -0700
>> Subject: [PATCH] hugetlb: fix irq locking omissions
>>
>> The patch "hugetlb: make free_huge_page irq safe" changed spin_*lock
>> calls to spin_*lock_irq* calls. However, it missed several places
>> in the file hugetlb.c. Add the overlooked changes.
>>
>> Signed-off-by: Mike Kravetz
>
> Thanks Mike. But there are still some places that need
> improvement. See below.

Correct. My apologies again for not fully taking into account the new
code from Oscar's series when working on this.

>> ---
>>  mm/hugetlb.c | 16 ++++++++--------
>>  1 file changed, 8 insertions(+), 8 deletions(-)
>>
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index c22111f3da20..a6bfc6bcbc81 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -2284,7 +2284,7 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page,
>>           */
>>          page_ref_dec(new_page);
>>  retry:
>> -        spin_lock(&hugetlb_lock);
>> +        spin_lock_irq(&hugetlb_lock);
>>          if (!PageHuge(old_page)) {
>>                  /*
>>                   * Freed from under us. Drop new_page too.
>> @@ -2297,7 +2297,7 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page,
>>                   * Fail with -EBUSY if not possible.
>>                   */
>>                  update_and_free_page(h, new_page);
>
> Now update_and_free_page can be called without holding
> hugetlb_lock. We can move it out of hugetlb_lock. In this
> function, there are 3 places which call update_and_free_page().
> We can move all of them out of hugetlb_lock. Right?

We will need to do more than that. The call to update_and_free_page
in alloc_and_dissolve_huge_page assumes old functionality. i.e., it
assumes h->nr_huge_pages will be decremented in update_and_free_page.
This is no longer the case. This will need to be fixed in patch 4 of
my series, which changes the functionality of update_and_free_page.

I'm afraid a change there will end up requiring changes in subsequent
patches due to context. I will have an update on Monday.
-- 
Mike Kravetz
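For illustration, the pattern Mike describes would look roughly like the
sketch below. This is only a sketch, not code from the series:
remove_and_free_huge_page() is a hypothetical helper name, and it assumes
the hstate accounting fields (nr_huge_pages, nr_huge_pages_node) from
mm/hugetlb.c. Once update_and_free_page() no longer decrements the
counters itself, callers must do the accounting under hugetlb_lock and
call it only after dropping the lock:

/*
 * Hypothetical helper, for illustration only: adjust hstate
 * accounting while holding hugetlb_lock, then free the page
 * with the lock dropped.
 */
static void remove_and_free_huge_page(struct hstate *h, struct page *page)
{
        spin_lock_irq(&hugetlb_lock);
        /* accounting formerly done inside update_and_free_page() */
        h->nr_huge_pages--;
        h->nr_huge_pages_node[page_to_nid(page)]--;
        spin_unlock_irq(&hugetlb_lock);

        /* potentially-blocking work is now safe outside the lock */
        update_and_free_page(h, page);
}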
Re: [External] [PATCH v3 7/8] hugetlb: make free_huge_page irq safe
On Sat, Apr 3, 2021 at 4:56 AM Mike Kravetz wrote:
>
> On 4/2/21 5:47 AM, Muchun Song wrote:
> > On Wed, Mar 31, 2021 at 11:42 AM Mike Kravetz wrote:
> >>
> >> Commit c77c0a8ac4c5 ("mm/hugetlb: defer freeing of huge pages if in
> >> non-task context") was added to address the issue of free_huge_page
> >> being called from irq context. That commit hands off free_huge_page
> >> processing to a workqueue if !in_task. However, this doesn't cover
> >> all the cases as pointed out by 0day bot lockdep report [1].
> >>
> >> :  Possible interrupt unsafe locking scenario:
> >> :
> >> :        CPU0                    CPU1
> >> :
> >> :   lock(hugetlb_lock);
> >> :                                local_irq_disable();
> >> :                                lock(slock-AF_INET);
> >> :                                lock(hugetlb_lock);
> >> :
> >> :   lock(slock-AF_INET);
> >>
> >> Shakeel has later explained that this is very likely TCP TX zerocopy
> >> from hugetlb pages scenario when the networking code drops a last
> >> reference to hugetlb page while having IRQ disabled. Hugetlb freeing
> >> path doesn't disable IRQ while holding hugetlb_lock so a lock dependency
> >> chain can lead to a deadlock.
> >>
> >> This commit addresses the issue by doing the following:
> >> - Make hugetlb_lock irq safe. This is mostly a simple process of
> >>   changing spin_*lock calls to spin_*lock_irq* calls.
> >> - Make subpool lock irq safe in a similar manner.
> >> - Revert the !in_task check and workqueue handoff.
> >>
> >> [1] https://lore.kernel.org/linux-mm/f1c03b05bc43a...@google.com/
> >>
> >> Signed-off-by: Mike Kravetz
> >> Acked-by: Michal Hocko
> >> Reviewed-by: Muchun Song
> >
> > Hi Mike,
> >
> > Today I pulled the newest code (next-20210401). I found that
> > alloc_and_dissolve_huge_page is not updated. In this function,
> > hugetlb_lock is still non-irq safe. Maybe you or Oscar need
> > to fix.
> >
> > Thanks.
>
> Thank you Muchun,
>
> Oscar's changes were not in Andrew's tree when I started on this series
> and I failed to notice their inclusion. In addition,
> isolate_or_dissolve_huge_page also needs updating, as does a change in
> set_max_huge_pages that was omitted while rebasing.
>
> Andrew, the following patch addresses those missing changes. Ideally,
> the changes should be combined/included in this patch. If you want me
> to send another version of this patch or another series, let me know.
>
> From 450593eb3cea895f499ddc343c22424c552ea502 Mon Sep 17 00:00:00 2001
> From: Mike Kravetz
> Date: Fri, 2 Apr 2021 13:18:13 -0700
> Subject: [PATCH] hugetlb: fix irq locking omissions
>
> The patch "hugetlb: make free_huge_page irq safe" changed spin_*lock
> calls to spin_*lock_irq* calls. However, it missed several places
> in the file hugetlb.c. Add the overlooked changes.
>
> Signed-off-by: Mike Kravetz

Thanks Mike. But there are still some places that need
improvement. See below.

> ---
>  mm/hugetlb.c | 16 ++++++++--------
>  1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index c22111f3da20..a6bfc6bcbc81 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2284,7 +2284,7 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page,
>           */
>          page_ref_dec(new_page);
>  retry:
> -        spin_lock(&hugetlb_lock);
> +        spin_lock_irq(&hugetlb_lock);
>          if (!PageHuge(old_page)) {
>                  /*
>                   * Freed from under us. Drop new_page too.
> @@ -2297,7 +2297,7 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page,
>                   * Fail with -EBUSY if not possible.
>                   */
>                  update_and_free_page(h, new_page);

Now update_and_free_page can be called without holding
hugetlb_lock. We can move it out of hugetlb_lock. In this
function, there are 3 places which call update_and_free_page().
We can move all of them out of hugetlb_lock. Right?

This optimization can be squashed into the commit:
"hugetlb: call update_and_free_page without hugetlb_lock"

Thanks.

> -        spin_unlock(&hugetlb_lock);
> +        spin_unlock_irq(&hugetlb_lock);
>          if (!isolate_huge_page(old_page, list))
>                  ret = -EBUSY;
>          return ret;
> @@ -2307,7 +2307,7 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page,
>                   * freelist yet. Race window is small, so we can succed here if
>                   * we retry.
>                   */
> -        spin_unlock(&hugetlb_lock);
> +        spin_unlock_irq(&hugetlb_lock);
>          cond_resched();
>          goto retry;
>          } else {
> @@ -2323,7 +2323,7 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page,
>                  __enqueue_huge_page(&h->hugepage_freelists[nid], new_page);
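To make Muchun's suggestion concrete, each of the three call sites would
be reordered into roughly this shape (an illustrative sketch against the
function quoted above, not a tested patch):

        spin_lock_irq(&hugetlb_lock);
        /* ... conclude under the lock that new_page is not needed ... */
        spin_unlock_irq(&hugetlb_lock);
        update_and_free_page(h, new_page);      /* moved outside the lock */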
Re: [External] [PATCH v3 7/8] hugetlb: make free_huge_page irq safe
On 4/2/21 5:47 AM, Muchun Song wrote:
> On Wed, Mar 31, 2021 at 11:42 AM Mike Kravetz wrote:
>>
>> Commit c77c0a8ac4c5 ("mm/hugetlb: defer freeing of huge pages if in
>> non-task context") was added to address the issue of free_huge_page
>> being called from irq context. That commit hands off free_huge_page
>> processing to a workqueue if !in_task. However, this doesn't cover
>> all the cases as pointed out by 0day bot lockdep report [1].
>>
>> :  Possible interrupt unsafe locking scenario:
>> :
>> :        CPU0                    CPU1
>> :
>> :   lock(hugetlb_lock);
>> :                                local_irq_disable();
>> :                                lock(slock-AF_INET);
>> :                                lock(hugetlb_lock);
>> :
>> :   lock(slock-AF_INET);
>>
>> Shakeel has later explained that this is very likely TCP TX zerocopy
>> from hugetlb pages scenario when the networking code drops a last
>> reference to hugetlb page while having IRQ disabled. Hugetlb freeing
>> path doesn't disable IRQ while holding hugetlb_lock so a lock dependency
>> chain can lead to a deadlock.
>>
>> This commit addresses the issue by doing the following:
>> - Make hugetlb_lock irq safe. This is mostly a simple process of
>>   changing spin_*lock calls to spin_*lock_irq* calls.
>> - Make subpool lock irq safe in a similar manner.
>> - Revert the !in_task check and workqueue handoff.
>>
>> [1] https://lore.kernel.org/linux-mm/f1c03b05bc43a...@google.com/
>>
>> Signed-off-by: Mike Kravetz
>> Acked-by: Michal Hocko
>> Reviewed-by: Muchun Song
>
> Hi Mike,
>
> Today I pulled the newest code (next-20210401). I found that
> alloc_and_dissolve_huge_page is not updated. In this function,
> hugetlb_lock is still non-irq safe. Maybe you or Oscar need
> to fix.
>
> Thanks.

Thank you Muchun,

Oscar's changes were not in Andrew's tree when I started on this series
and I failed to notice their inclusion. In addition,
isolate_or_dissolve_huge_page also needs updating, as does a change in
set_max_huge_pages that was omitted while rebasing.

Andrew, the following patch addresses those missing changes. Ideally,
the changes should be combined/included in this patch. If you want me
to send another version of this patch or another series, let me know.

From 450593eb3cea895f499ddc343c22424c552ea502 Mon Sep 17 00:00:00 2001
From: Mike Kravetz
Date: Fri, 2 Apr 2021 13:18:13 -0700
Subject: [PATCH] hugetlb: fix irq locking omissions

The patch "hugetlb: make free_huge_page irq safe" changed spin_*lock
calls to spin_*lock_irq* calls. However, it missed several places
in the file hugetlb.c. Add the overlooked changes.

Signed-off-by: Mike Kravetz
---
 mm/hugetlb.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index c22111f3da20..a6bfc6bcbc81 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2284,7 +2284,7 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page,
          */
         page_ref_dec(new_page);
 retry:
-        spin_lock(&hugetlb_lock);
+        spin_lock_irq(&hugetlb_lock);
         if (!PageHuge(old_page)) {
                 /*
                  * Freed from under us. Drop new_page too.
@@ -2297,7 +2297,7 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page,
                  * Fail with -EBUSY if not possible.
                  */
                 update_and_free_page(h, new_page);
-                spin_unlock(&hugetlb_lock);
+                spin_unlock_irq(&hugetlb_lock);
                 if (!isolate_huge_page(old_page, list))
                         ret = -EBUSY;
                 return ret;
@@ -2307,7 +2307,7 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page,
                  * freelist yet. Race window is small, so we can succed here if
                  * we retry.
                  */
-                spin_unlock(&hugetlb_lock);
+                spin_unlock_irq(&hugetlb_lock);
                 cond_resched();
                 goto retry;
         } else {
@@ -2323,7 +2323,7 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page,
                 __enqueue_huge_page(&h->hugepage_freelists[nid], new_page);
         }
 unlock:
-        spin_unlock(&hugetlb_lock);
+        spin_unlock_irq(&hugetlb_lock);
         return ret;
 }

@@ -2339,15 +2339,15 @@ int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list)
          * to carefully check the state under the lock.
          * Return success when racing as if we dissolved the page ourselves.
          */
-        spin_lock(&hugetlb_lock);
+        spin_lock_irq(&hugetlb_lock);
         if (PageHuge(page)) {
                 head = compound_head(page);
                 h = page_hstate(head);
         } else {
-                spin_unlock(&hugetlb_lock);
+                spin_unlock_irq(&hugetlb_lock);
                 return 0;
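As background for the conversions in this fix: the series uses the two
standard irq-safe spinlock forms. A minimal sketch of the distinction,
using the kernel's generic spinlock API (the two functions below are
illustrative, not from the patch):

/* For paths known to run with IRQs enabled (most hugetlb_lock users). */
static void example_process_context(void)
{
        spin_lock_irq(&hugetlb_lock);           /* disables IRQs */
        /* ... critical section ... */
        spin_unlock_irq(&hugetlb_lock);         /* re-enables IRQs */
}

/*
 * For paths that may be entered with IRQs already disabled, e.g. the
 * subpool lock taken via free_huge_page() from softirq context.
 */
static void example_any_context(void)
{
        unsigned long flags;

        spin_lock_irqsave(&hugetlb_lock, flags);      /* saves IRQ state */
        /* ... critical section ... */
        spin_unlock_irqrestore(&hugetlb_lock, flags); /* restores it */
}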
Re: [External] [PATCH v3 7/8] hugetlb: make free_huge_page irq safe
On Wed, Mar 31, 2021 at 11:42 AM Mike Kravetz wrote:
>
> Commit c77c0a8ac4c5 ("mm/hugetlb: defer freeing of huge pages if in
> non-task context") was added to address the issue of free_huge_page
> being called from irq context. That commit hands off free_huge_page
> processing to a workqueue if !in_task. However, this doesn't cover
> all the cases as pointed out by 0day bot lockdep report [1].
>
> :  Possible interrupt unsafe locking scenario:
> :
> :        CPU0                    CPU1
> :
> :   lock(hugetlb_lock);
> :                                local_irq_disable();
> :                                lock(slock-AF_INET);
> :                                lock(hugetlb_lock);
> :
> :   lock(slock-AF_INET);
>
> Shakeel has later explained that this is very likely TCP TX zerocopy
> from hugetlb pages scenario when the networking code drops a last
> reference to hugetlb page while having IRQ disabled. Hugetlb freeing
> path doesn't disable IRQ while holding hugetlb_lock so a lock dependency
> chain can lead to a deadlock.
>
> This commit addresses the issue by doing the following:
> - Make hugetlb_lock irq safe. This is mostly a simple process of
>   changing spin_*lock calls to spin_*lock_irq* calls.
> - Make subpool lock irq safe in a similar manner.
> - Revert the !in_task check and workqueue handoff.
>
> [1] https://lore.kernel.org/linux-mm/f1c03b05bc43a...@google.com/
>
> Signed-off-by: Mike Kravetz
> Acked-by: Michal Hocko
> Reviewed-by: Muchun Song

Hi Mike,

Today I pulled the newest code (next-20210401). I found that
alloc_and_dissolve_huge_page is not updated. In this function,
hugetlb_lock is still non-irq safe. Maybe you or Oscar need
to fix.

Thanks.

> ---
>  mm/hugetlb.c        | 167
>  mm/hugetlb_cgroup.c |   8 +--
>  2 files changed, 66 insertions(+), 109 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 5b2ca4663d7f..0bd4dc04df0f 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -93,9 +93,10 @@ static inline bool subpool_is_free(struct hugepage_subpool *spool)
>          return true;
>  }
>
> -static inline void unlock_or_release_subpool(struct hugepage_subpool *spool)
> +static inline void unlock_or_release_subpool(struct hugepage_subpool *spool,
> +                                             unsigned long irq_flags)
>  {
> -        spin_unlock(&spool->lock);
> +        spin_unlock_irqrestore(&spool->lock, irq_flags);
>
>          /* If no pages are used, and no other handles to the subpool
>           * remain, give up any reservations based on minimum size and
> @@ -134,10 +135,12 @@ struct hugepage_subpool *hugepage_new_subpool(struct hstate *h, long max_hpages,
>
>  void hugepage_put_subpool(struct hugepage_subpool *spool)
>  {
> -        spin_lock(&spool->lock);
> +        unsigned long flags;
> +
> +        spin_lock_irqsave(&spool->lock, flags);
>          BUG_ON(!spool->count);
>          spool->count--;
> -        unlock_or_release_subpool(spool);
> +        unlock_or_release_subpool(spool, flags);
>  }
>
>  /*
> @@ -156,7 +159,7 @@ static long hugepage_subpool_get_pages(struct hugepage_subpool *spool,
>          if (!spool)
>                  return ret;
>
> -        spin_lock(&spool->lock);
> +        spin_lock_irq(&spool->lock);
>
>          if (spool->max_hpages != -1) {          /* maximum size accounting */
>                  if ((spool->used_hpages + delta) <= spool->max_hpages)
> @@ -183,7 +186,7 @@ static long hugepage_subpool_get_pages(struct hugepage_subpool *spool,
>          }
>
>  unlock_ret:
> -        spin_unlock(&spool->lock);
> +        spin_unlock_irq(&spool->lock);
>          return ret;
>  }
>
> @@ -197,11 +200,12 @@ static long hugepage_subpool_put_pages(struct hugepage_subpool *spool,
>                                         long delta)
>  {
>          long ret = delta;
> +        unsigned long flags;
>
>          if (!spool)
>                  return delta;
>
> -        spin_lock(&spool->lock);
> +        spin_lock_irqsave(&spool->lock, flags);
>
>          if (spool->max_hpages != -1)            /* maximum size accounting */
>                  spool->used_hpages -= delta;
> @@ -222,7 +226,7 @@ static long hugepage_subpool_put_pages(struct hugepage_subpool *spool,
>           * If hugetlbfs_put_super couldn't free spool due to an outstanding
>           * quota reference, free it now.
>           */
> -        unlock_or_release_subpool(spool);
> +        unlock_or_release_subpool(spool, flags);
>
>          return ret;
>  }
> @@ -1412,7 +1416,7 @@ struct hstate *size_to_hstate(unsigned long size)
>          return NULL;
>  }
>
> -static void __free_huge_page(struct page *page)
> +void free_huge_page(struct page *page)
>  {
>          /*
>           * Can't pass hstate in here because it is called from the
> @@ -1422,6 +1426,7 @@ static void __free_huge_page(struct page *page)
>          int nid = page_to_nid(page);
>          struct hugepage_subpool *
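For context on the "Revert the !in_task check and workqueue handoff" item
in the commit message above: the mechanism being removed, shown here
simplified from commit c77c0a8ac4c5 for illustration, deferred frees from
non-task context to a workqueue because hugetlb_lock was not IRQ safe:

static LLIST_HEAD(hpage_freelist);

static void free_hpage_workfn(struct work_struct *work)
{
        struct llist_node *node;

        node = llist_del_all(&hpage_freelist);
        while (node) {
                /* page->mapping was reused as llist_node storage */
                struct page *page = container_of((struct address_space **)node,
                                                 struct page, mapping);

                node = node->next;
                __free_huge_page(page);
        }
}
static DECLARE_WORK(free_hpage_work, free_hpage_workfn);

void free_huge_page(struct page *page)
{
        /* Defer freeing when not in task context: hugetlb_lock is not IRQ safe. */
        if (!in_task()) {
                if (llist_add((struct llist_node *)&page->mapping,
                              &hpage_freelist))
                        schedule_work(&free_hpage_work);
                return;
        }

        __free_huge_page(page);
}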