Re: [PATCH v2] mm, dump_page: do not crash with bad compound_mapcount()

2020-08-07 Thread John Hubbard

On 8/7/20 9:48 AM, Kirill A. Shutemov wrote:

[...]

+static inline int head_mapcount(struct page *head)
+{


Do we want VM_BUG_ON_PAGE(!PageHead(head), head) here?


Well, no.  That was the point of the bug report -- by the time we called
compound_mapcount, the page was no longer a head page.


Right. VM_BUG_ON_PAGE(PageTail(head), head)?


Sorry for overlooking that feedback point. Looking at it now, I'd much
rather not put any assertions at all here. This is supposed to be for
implementing the failure case, and we've moved past assertions at this
point. In other words, dump_page() is part of *implementing* an
assertion, so calling assertions from within it is undesirable.

It's better to put the assertions in code that would call these inner
routines. Then, if we want to use these routines outside of mm/debug.c,
as per the other thread, then we should provide a wrapper that has such
assertions.


thanks,
--
John Hubbard
NVIDIA


Re: [PATCH v2] mm, dump_page: do not crash with bad compound_mapcount()

2020-08-07 Thread Kirill A. Shutemov
On Fri, Aug 07, 2020 at 04:10:29PM +0100, Matthew Wilcox wrote:
> On Fri, Aug 07, 2020 at 05:35:04PM +0300, Kirill A. Shutemov wrote:
> > On Tue, Aug 04, 2020 at 02:48:07PM -0700, John Hubbard wrote:
> > > If a compound page is being split while dump_page() is being run on that
> > > page, we can end up calling compound_mapcount() on a page that is no
> > > longer compound. This leads to a crash (already seen at least once in
> > > the field), due to the VM_BUG_ON_PAGE() assertion inside
> > > compound_mapcount().
> 
> [...]
> > > +static inline int head_mapcount(struct page *head)
> > > +{
> > 
> > Do we want VM_BUG_ON_PAGE(!PageHead(head), head) here?
> 
> Well, no.  That was the point of the bug report -- by the time we called
> compound_mapcount, the page was no longer a head page.

Right. VM_BUG_ON_PAGE(PageTail(head), head)?

> > > A similar problem is possible, via compound_pincount() instead of
> > > compound_mapcount().
> > > 
> > > In order to avoid this kind of crash, make dump_page() slightly more
> > > robust, by providing a pair of simpler routines that don't contain
> > > assertions: head_mapcount() and head_pincount().
> > 
> > I find the naming misleading. head_mapcount() and head_pincount() sound like
> > the mapcount/pincount of the head page, but they're not. They're the mapcount
> > and pincount of the compound page.
> 
> OK, point taken.  I might go for head_compound_mapcount()?  Or as I
> originally suggested, just opencoding it like we do in __page_mapcount().

I'm fine either way.

-- 
 Kirill A. Shutemov


Re: [PATCH v2] mm, dump_page: do not crash with bad compound_mapcount()

2020-08-07 Thread Matthew Wilcox
On Fri, Aug 07, 2020 at 05:35:04PM +0300, Kirill A. Shutemov wrote:
> On Tue, Aug 04, 2020 at 02:48:07PM -0700, John Hubbard wrote:
> > If a compound page is being split while dump_page() is being run on that
> > page, we can end up calling compound_mapcount() on a page that is no
> > longer compound. This leads to a crash (already seen at least once in
> > the field), due to the VM_BUG_ON_PAGE() assertion inside
> > compound_mapcount().

[...]
> > +static inline int head_mapcount(struct page *head)
> > +{
> 
> Do we want VM_BUG_ON_PAGE(!PageHead(head), head) here?

Well, no.  That was the point of the bug report -- by the time we called
compound_mapcount, the page was no longer a head page.

> > A similar problem is possible, via compound_pincount() instead of
> > compound_mapcount().
> > 
> > In order to avoid this kind of crash, make dump_page() slightly more
> > robust, by providing a pair of simpler routines that don't contain
> > assertions: head_mapcount() and head_pincount().
> 
> I find the naming misleading. head_mapcount() and head_pincount() sound like
> the mapcount/pincount of the head page, but they're not. They're the mapcount
> and pincount of the compound page.

OK, point taken.  I might go for head_compound_mapcount()?  Or as I
originally suggested, just opencoding it like we do in __page_mapcount().



Re: [PATCH v2] mm, dump_page: do not crash with bad compound_mapcount()

2020-08-07 Thread Kirill A. Shutemov
On Thu, Aug 06, 2020 at 06:15:00PM +0100, Matthew Wilcox wrote:
> On Thu, Aug 06, 2020 at 05:53:10PM +0200, Vlastimil Babka wrote:
> > On 8/6/20 5:39 PM, Matthew Wilcox wrote:
> > >> >> +++ b/mm/huge_memory.c
> > >> >> @@ -2125,7 +2125,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
> > >> >>* Set PG_double_map before dropping compound_mapcount to avoid
> > >> >>* false-negative page_mapped().
> > >> >>*/
> > >> >> - if (compound_mapcount(page) > 1 && !TestSetPageDoubleMap(page)) {
> > >> >> + if (head_mapcount(page) > 1 && !TestSetPageDoubleMap(page)) {
> > >> > 
> > >> > I'm a little nervous about this one.  The page does actually come from
> > >> > pmd_page(), and today that's guaranteed to be a head page.  But I'm
> > >> > not convinced that's going to still be true in twenty years.  With the
> > >> > current THP patchset, I won't allocate pages larger than PMD order, but
> > >> > I can see there being interest in tracking pages in chunks larger than
> > >> > 2MB in the future.  And then pmd_page() might well return a tail page.
> > >> > So it might be a good idea to not convert this one.
> > >> 
> > >> Hmm the function converts the compound mapcount of the whole page to a
> > >> HPAGE_PMD_NR of base pages. If suddenly the compound page was bigger than a pmd,
> > >> then I guess this wouldn't work properly anymore without changes anyway?
> > >> Maybe we could stick something like VM_BUG_ON(PageTransHuge(page)) there as
> > >> "enforced documentation" for now?
> > > 
> > > I think it would work as-is.  But also I may have totally misunderstood it.
> > > I'll write this declaratively and specifically for x86 (PMD order is 9)
> > > ... tell me when I've made a mistake ;-)
> > > 
> > > This function is for splitting the PMD.  We're leaving the underlying
> > > page intact and just changing the page table.  So if, say, we have an
> > > underlying 4MB page (and maybe the pages are mapped as PMDs in this
> > > process), we might get subpage number 512 of this order-10 page.  We'd
> > > need to check the DoubleMap bit on subpage 1, and the compound_mapcount
> > > also stored in page 1, but we'd only want to spread the mapcount out
> > > over the 512 subpages from 512-1023; we wouldn't want to spread it out
> > > over 0-511 because they aren't affected by this particular PMD.
> > 
> > Yeah, and then we decrease the compound mapcount, which is a counter of "how
> > many times is this compound page mapped as a whole". But we only removed (the
> > second) half of the compound mapping, so imho that would be wrong?
> 
> I'd expect that count to be incremented by 1 for each PMD that it's
> mapped to?  ie change the definition of that counter slightly.
> 
> > > Having to reason about stuff like this is why I limited the THP code to
> > > stop at PMD order ... I don't want to make my life even more complicated
> > > than I have to!
> > 
> > Kirill might correct me but I'd expect the THP code right now has baked in many
> > assumptions about THP pages being exactly HPAGE_PMD_ORDER large?

That will be true for PMD-mapped THP pages after applying Matthew's
patchset.

> There are somewhat fewer places that make that assumption after applying
> the ~80 patches here ... http://git.infradead.org/users/willy/pagecache.git

The patchset allows for THP to be anywhere between order-2 and
order-9 (on x86-64).

> I have mostly not touched the anonymous THPs (obviously some of the code
> paths are shared), although both Kirill & I think there's a win to be
> had there too.

Yeah. Reducing LRU handling overhead alone should be enough to justify the
effort. But we still would need to have numbers.

-- 
 Kirill A. Shutemov


Re: [PATCH v2] mm, dump_page: do not crash with bad compound_mapcount()

2020-08-07 Thread Kirill A. Shutemov
On Tue, Aug 04, 2020 at 02:48:07PM -0700, John Hubbard wrote:
> If a compound page is being split while dump_page() is being run on that
> page, we can end up calling compound_mapcount() on a page that is no
> longer compound. This leads to a crash (already seen at least once in
> the field), due to the VM_BUG_ON_PAGE() assertion inside
> compound_mapcount().
> 
> (The above is from Matthew Wilcox's analysis of Qian Cai's bug report.)
> 
> A similar problem is possible, via compound_pincount() instead of
> compound_mapcount().
> 
> In order to avoid this kind of crash, make dump_page() slightly more
> robust, by providing a pair of simpler routines that don't contain
> assertions: head_mapcount() and head_pincount().

I find the naming misleading. head_mapcount() and head_pincount() sound like
the mapcount/pincount of the head page, but they're not. They're the mapcount
and pincount of the compound page.

Maybe compound_mapcount_head() and compound_pincount_head()? Or
__compound_mapcount() and __compound_pincount().

> For debug tools, we don't want to go *too* far in this direction, but
> this is a simple small fix, and the crash has already been seen, so it's
> a good trade-off.
> 
> Reported-by: Qian Cai 
> Suggested-by: Matthew Wilcox 
> Cc: Vlastimil Babka 
> Cc: Kirill A. Shutemov 
> Signed-off-by: John Hubbard 
> ---
> Hi,
> 
> I'm assuming that a fix is not required for -stable, but let me know if
> others feel differently. The dump_page() code has changed a lot in that
> area.
> 
> Changes since v1 [1]:
> 
> 1) Rebased onto mmotm
> 
> 2) Used a simpler head_*count() approach.
> 
> 3) Added Matthew's Suggested-by: tag
> 
> 4) Support pincount as well as mapcount.
> 
> [1] 
> https://lore.kernel.org/linux-mm/20200804183943.1244828-1-jhubb...@nvidia.com/
> 
> thanks,
> John Hubbard
> 
>  include/linux/mm.h | 14 ++++++++++++--
>  mm/debug.c         |  6 +++---
>  2 files changed, 15 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 4f12b2465e80..8ab941cf73f4 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -776,6 +776,11 @@ static inline void *kvcalloc(size_t n, size_t size, gfp_t flags)
>  extern void kvfree(const void *addr);
>  extern void kvfree_sensitive(const void *addr, size_t len);
>  
> +static inline int head_mapcount(struct page *head)
> +{

Do we want VM_BUG_ON_PAGE(!PageHead(head), head) here?

> + return atomic_read(compound_mapcount_ptr(head)) + 1;
> +}
> +
>  /*
>   * Mapcount of compound page as a whole, does not include mapped sub-pages.
>   *
> @@ -785,7 +790,7 @@ static inline int compound_mapcount(struct page *page)
>  {
>   VM_BUG_ON_PAGE(!PageCompound(page), page);
>   page = compound_head(page);
> - return atomic_read(compound_mapcount_ptr(page)) + 1;
> + return head_mapcount(page);
>  }
>  
>  /*
> @@ -898,11 +903,16 @@ static inline bool hpage_pincount_available(struct page *page)
>   return PageCompound(page) && compound_order(page) > 1;
>  }
>  
> +static inline int head_pincount(struct page *head)
> +{

Ditto.

> + return atomic_read(compound_pincount_ptr(head));
> +}
> +
>  static inline int compound_pincount(struct page *page)
>  {
>   VM_BUG_ON_PAGE(!hpage_pincount_available(page), page);
>   page = compound_head(page);
> - return atomic_read(compound_pincount_ptr(page));
> + return head_pincount(page);
>  }
>  
>  static inline void set_compound_order(struct page *page, unsigned int order)
> diff --git a/mm/debug.c b/mm/debug.c
> index c27fff1e3ca8..69b60637112b 100644
> --- a/mm/debug.c
> +++ b/mm/debug.c
> @@ -102,12 +102,12 @@ void __dump_page(struct page *page, const char *reason)
>   if (hpage_pincount_available(page)) {
>   pr_warn("head:%p order:%u compound_mapcount:%d compound_pincount:%d\n",
>   head, compound_order(head),
> - compound_mapcount(head),
> - compound_pincount(head));
> + head_mapcount(head),
> + head_pincount(head));
>   } else {
>   pr_warn("head:%p order:%u compound_mapcount:%d\n",
>   head, compound_order(head),
> - compound_mapcount(head));
> + head_mapcount(head));
>   }
>   }
>   if (PageKsm(page))
> -- 
> 2.28.0
> 

-- 
 Kirill A. Shutemov


Re: [PATCH v2] mm, dump_page: do not crash with bad compound_mapcount()

2020-08-06 Thread Matthew Wilcox
On Thu, Aug 06, 2020 at 01:45:11PM +0200, Vlastimil Babka wrote:
> How about this additional patch now that we have head_mapcount()? (I wouldn't
> go for squashing as the goal and scope is too different).

I like it.  It bothers me that the compiler doesn't know that
compound_head(compound_head(x)) == compound_head(x).  I updated
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32911 with a request to be
able to tell the compiler that compound_head() is idempotent.

> The bloat-o-meter difference without DEBUG_VM is the following:
> 
> add/remove: 0/0 grow/shrink: 1/4 up/down: 32/-56 (-24)
> Function                     old     new   delta
> __split_huge_pmd            2867    2899     +32
> shrink_page_list            3860    3847     -13
> reuse_swap_page              762     748     -14
> page_trans_huge_mapcount     153     139     -14
> total_mapcount               187     172     -15
> Total: Before=8687306, After=8687282, chg -0.00%

That's great.  I'm expecting improvements from my thp_head() macro when
that lands (currently in Andrew's tree).  I have been reluctant to replace
current callers of compound_head() with thp_head(), but I suspect PF_HEAD
could use thp_head() and save a few bytes on a tinyconfig build.

> +++ b/mm/huge_memory.c
> @@ -2125,7 +2125,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
>* Set PG_double_map before dropping compound_mapcount to avoid
>* false-negative page_mapped().
>*/
> - if (compound_mapcount(page) > 1 && !TestSetPageDoubleMap(page)) {
> + if (head_mapcount(page) > 1 && !TestSetPageDoubleMap(page)) {

I'm a little nervous about this one.  The page does actually come from
pmd_page(), and today that's guaranteed to be a head page.  But I'm
not convinced that's going to still be true in twenty years.  With the
current THP patchset, I won't allocate pages larger than PMD order, but
I can see there being interest in tracking pages in chunks larger than
2MB in the future.  And then pmd_page() might well return a tail page.
So it might be a good idea to not convert this one.

> @@ -2467,7 +2467,7 @@ int total_mapcount(struct page *page)
>   if (likely(!PageCompound(page)))
>   return atomic_read(&page->_mapcount) + 1;
>  
> - compound = compound_mapcount(page);
> + compound = head_mapcount(page);
>   if (PageHuge(page))
>   return compound;
>   ret = compound;

Yes.  This function is a little confusing because it uses PageCompound()
all the way through when it really should be using PageHead ... because
the opening line of the function is: VM_BUG_ON_PAGE(PageTail(page), page);

> @@ -2531,7 +2531,7 @@ int page_trans_huge_mapcount(struct page *page, int *total_mapcount)
>   ret -= 1;
>   _total_mapcount -= HPAGE_PMD_NR;
>   }
> - mapcount = compound_mapcount(page);
> + mapcount = head_mapcount(page);
>   ret += mapcount;
>   _total_mapcount += mapcount;
>   if (total_mapcount)

Yes, we called compound_head() earlier in the function.  Safe.

> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 9ee4211835c6..c5e722de38b8 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -1673,7 +1673,7 @@ static int page_trans_huge_map_swapcount(struct page *page, int *total_mapcount,
>   map_swapcount -= 1;
>   _total_mapcount -= HPAGE_PMD_NR;
>   }
> - mapcount = compound_mapcount(page);
> + mapcount = head_mapcount(page);
>   map_swapcount += mapcount;
>   _total_mapcount += mapcount;
>   if (total_mapcount)

Yes.  page is a head page at this point.

> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index a086c104a9a6..72218cdfd902 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1248,7 +1248,7 @@ static unsigned int shrink_page_list(struct list_head *page_list,
>* away. Chances are some or all of the
>* tail pages can be freed without IO.
>*/
> - if (!compound_mapcount(page) &&
> + if (!head_mapcount(page) &&
>   split_huge_page_to_list(page,
>   page_list))
>   goto activate_locked;

Yes.  We don't put (can't put!) tail pages on the lists, so this must be a
head page.


Re: [PATCH v2] mm, dump_page: do not crash with bad compound_mapcount()

2020-08-06 Thread Matthew Wilcox
On Thu, Aug 06, 2020 at 05:53:10PM +0200, Vlastimil Babka wrote:
> On 8/6/20 5:39 PM, Matthew Wilcox wrote:
> >> >> +++ b/mm/huge_memory.c
> >> >> @@ -2125,7 +2125,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
> >> >>  * Set PG_double_map before dropping compound_mapcount to avoid
> >> >>  * false-negative page_mapped().
> >> >>  */
> >> >> -   if (compound_mapcount(page) > 1 && !TestSetPageDoubleMap(page)) {
> >> >> +   if (head_mapcount(page) > 1 && !TestSetPageDoubleMap(page)) {
> >> > 
> >> > I'm a little nervous about this one.  The page does actually come from
> >> > pmd_page(), and today that's guaranteed to be a head page.  But I'm
> >> > not convinced that's going to still be true in twenty years.  With the
> >> > current THP patchset, I won't allocate pages larger than PMD order, but
> >> > I can see there being interest in tracking pages in chunks larger than
> >> > 2MB in the future.  And then pmd_page() might well return a tail page.
> >> > So it might be a good idea to not convert this one.
> >> 
> >> Hmm the function converts the compound mapcount of the whole page to a
> >> HPAGE_PMD_NR of base pages. If suddenly the compound page was bigger than a pmd,
> >> then I guess this wouldn't work properly anymore without changes anyway?
> >> Maybe we could stick something like VM_BUG_ON(PageTransHuge(page)) there as
> >> "enforced documentation" for now?
> > 
> > I think it would work as-is.  But also I may have totally misunderstood it.
> > I'll write this declaratively and specifically for x86 (PMD order is 9)
> > ... tell me when I've made a mistake ;-)
> > 
> > This function is for splitting the PMD.  We're leaving the underlying
> > page intact and just changing the page table.  So if, say, we have an
> > underlying 4MB page (and maybe the pages are mapped as PMDs in this
> > process), we might get subpage number 512 of this order-10 page.  We'd
> > need to check the DoubleMap bit on subpage 1, and the compound_mapcount
> > also stored in page 1, but we'd only want to spread the mapcount out
> > over the 512 subpages from 512-1023; we wouldn't want to spread it out
> > over 0-511 because they aren't affected by this particular PMD.
> 
> Yeah, and then we decrease the compound mapcount, which is a counter of "how
> many times is this compound page mapped as a whole". But we only removed (the
> second) half of the compound mapping, so imho that would be wrong?

I'd expect that count to be incremented by 1 for each PMD that it's
mapped to?  ie change the definition of that counter slightly.

> > Having to reason about stuff like this is why I limited the THP code to
> > stop at PMD order ... I don't want to make my life even more complicated
> > than I have to!
> 
> > Kirill might correct me but I'd expect the THP code right now has baked in many
> assumptions about THP pages being exactly HPAGE_PMD_ORDER large?

There are somewhat fewer places that make that assumption after applying
the ~80 patches here ... http://git.infradead.org/users/willy/pagecache.git

I have mostly not touched the anonymous THPs (obviously some of the code
paths are shared), although both Kirill & I think there's a win to be
had there too.


Re: [PATCH v2] mm, dump_page: do not crash with bad compound_mapcount()

2020-08-06 Thread Vlastimil Babka


On 8/4/20 11:48 PM, John Hubbard wrote:
> If a compound page is being split while dump_page() is being run on that
> page, we can end up calling compound_mapcount() on a page that is no
> longer compound. This leads to a crash (already seen at least once in
> the field), due to the VM_BUG_ON_PAGE() assertion inside
> compound_mapcount().
> 
> (The above is from Matthew Wilcox's analysis of Qian Cai's bug report.)
> 
> A similar problem is possible, via compound_pincount() instead of
> compound_mapcount().
> 
> In order to avoid this kind of crash, make dump_page() slightly more
> robust, by providing a pair of simpler routines that don't contain
> assertions: head_mapcount() and head_pincount().
> 
> For debug tools, we don't want to go *too* far in this direction, but
> this is a simple small fix, and the crash has already been seen, so it's
> a good trade-off.
> 
> Reported-by: Qian Cai 
> Suggested-by: Matthew Wilcox 
> Cc: Vlastimil Babka 
> Cc: Kirill A. Shutemov 
> Signed-off-by: John Hubbard 

Acked-by: Vlastimil Babka 

> ---
> Hi,
> 
> I'm assuming that a fix is not required for -stable, but let me know if
> others feel differently. The dump_page() code has changed a lot in that
> area.

I'd say only if we backport all those changes, as most were to avoid similar
kind of crashes?
How about this additional patch now that we have head_mapcount()? (I wouldn't
go for squashing as the goal and scope is too different).

8<
From 3b3d5b4639eea9c0787eed510a32acdc918d569f Mon Sep 17 00:00:00 2001
From: Vlastimil Babka 
Date: Thu, 6 Aug 2020 13:33:58 +0200
Subject: [PATCH] mm, thp: use head_mapcount when we know we have a head page

Patch "mm, dump_page: do not crash with bad compound_mapcount()" has introduced
head_mapcount() to split out the part of compound_mapcount() where we already
know/assume we have a head page. We can use the new function in more places
where we already have a head page, to avoid the overhead of compound_head()
and (with DEBUG_VM) a debug check. This patch does that. There are a few more
applicable places, but behind DEBUG_VM so performance is not important, and the
extra debug check in compound_mapcount() could be useful instead.

The bloat-o-meter difference without DEBUG_VM is the following:

add/remove: 0/0 grow/shrink: 1/4 up/down: 32/-56 (-24)
Function                     old     new   delta
__split_huge_pmd            2867    2899     +32
shrink_page_list            3860    3847     -13
reuse_swap_page              762     748     -14
page_trans_huge_mapcount     153     139     -14
total_mapcount               187     172     -15
Total: Before=8687306, After=8687282, chg -0.00%

This just shows that the compiler wasn't able to prove we have a head page by
itself. In __split_huge_pmd() the eliminated check probably led to different
optimization decisions thus code size increased.

Signed-off-by: Vlastimil Babka 
---
 mm/huge_memory.c | 6 +++---
 mm/swapfile.c    | 2 +-
 mm/vmscan.c      | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 90733cefa528..5927874b7894 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2125,7 +2125,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 	 * Set PG_double_map before dropping compound_mapcount to avoid
 	 * false-negative page_mapped().
 	 */
-	if (compound_mapcount(page) > 1 && !TestSetPageDoubleMap(page)) {
+	if (head_mapcount(page) > 1 && !TestSetPageDoubleMap(page)) {
 		for (i = 0; i < HPAGE_PMD_NR; i++)
 			atomic_inc(&page[i]._mapcount);
 	}
@@ -2467,7 +2467,7 @@ int total_mapcount(struct page *page)
 	if (likely(!PageCompound(page)))
 		return atomic_read(&page->_mapcount) + 1;
 
-	compound = compound_mapcount(page);
+	compound = head_mapcount(page);
 	if (PageHuge(page))
 		return compound;
 	ret = compound;
@@ -2531,7 +2531,7 @@ int page_trans_huge_mapcount(struct page *page, int *total_mapcount)
 		ret -= 1;
 		_total_mapcount -= HPAGE_PMD_NR;
 	}
-	mapcount = compound_mapcount(page);
+	mapcount = head_mapcount(page);
 	ret += mapcount;
 	_total_mapcount += mapcount;
 	if (total_mapcount)
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 9ee4211835c6..c5e722de38b8 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1673,7 +1673,7 @@ static int page_trans_huge_map_swapcount(struct page *page, int *total_mapcount,
map_swapcount -= 1;
_total_mapcount -= HPAGE_PMD_NR;
}
-   mapcount = compound_mapcount(page);
+   mapcount = head_mapcount(page);
map_swapcount += mapcount;
_total_mapcount += mapcount;
if (total_mapcount)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index a086c104a9a6..72218cdfd902 

Re: [PATCH v2] mm, dump_page: do not crash with bad compound_mapcount()

2020-08-06 Thread Matthew Wilcox
On Thu, Aug 06, 2020 at 05:13:05PM +0200, Vlastimil Babka wrote:
> On 8/6/20 3:48 PM, Matthew Wilcox wrote:
> > On Thu, Aug 06, 2020 at 01:45:11PM +0200, Vlastimil Babka wrote:
> >> How about this additional patch now that we have head_mapcount()? (I wouldn't
> >> go for squashing as the goal and scope is too different).
> > 
> > I like it.  It bothers me that the compiler doesn't know that
> > compound_head(compound_head(x)) == compound_head(x).  I updated
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32911 with a request to be
> > able to tell the compiler that compound_head() is idempotent.
> 
> Yeah it would be nice to get the benefits everywhere automatically. But I guess
> the compiler would have to discard the idempotence assumptions if there are
> multiple consecutive (perhaps hidden behind page flag access)
> compound_head(page) from a function, as soon as we modify the struct page somewhere.

The only place where we modify the struct page is the split code, and
I think we're pretty careful to handle the pages appropriately there.
The buddy allocator has its own way of combining pages.

> >> +++ b/mm/huge_memory.c
> >> @@ -2125,7 +2125,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
> >> * Set PG_double_map before dropping compound_mapcount to avoid
> >> * false-negative page_mapped().
> >> */
> >> -  if (compound_mapcount(page) > 1 && !TestSetPageDoubleMap(page)) {
> >> +  if (head_mapcount(page) > 1 && !TestSetPageDoubleMap(page)) {
> > 
> > I'm a little nervous about this one.  The page does actually come from
> > pmd_page(), and today that's guaranteed to be a head page.  But I'm
> > not convinced that's going to still be true in twenty years.  With the
> > current THP patchset, I won't allocate pages larger than PMD order, but
> > I can see there being interest in tracking pages in chunks larger than
> > 2MB in the future.  And then pmd_page() might well return a tail page.
> > So it might be a good idea to not convert this one.
> 
> Hmm the function converts the compound mapcount of the whole page to a
> HPAGE_PMD_NR of base pages. If suddenly the compound page was bigger than a pmd,
> then I guess this wouldn't work properly anymore without changes anyway?
> Maybe we could stick something like VM_BUG_ON(PageTransHuge(page)) there as
> "enforced documentation" for now?

I think it would work as-is.  But also I may have totally misunderstood it.
I'll write this declaratively and specifically for x86 (PMD order is 9)
... tell me when I've made a mistake ;-)

This function is for splitting the PMD.  We're leaving the underlying
page intact and just changing the page table.  So if, say, we have an
underlying 4MB page (and maybe the pages are mapped as PMDs in this
process), we might get subpage number 512 of this order-10 page.  We'd
need to check the DoubleMap bit on subpage 1, and the compound_mapcount
also stored in page 1, but we'd only want to spread the mapcount out
over the 512 subpages from 512-1023; we wouldn't want to spread it out
over 0-511 because they aren't affected by this particular PMD.

Having to reason about stuff like this is why I limited the THP code to
stop at PMD order ... I don't want to make my life even more complicated
than I have to!


Re: [PATCH v2] mm, dump_page: do not crash with bad compound_mapcount()

2020-08-06 Thread Vlastimil Babka
On 8/6/20 3:48 PM, Matthew Wilcox wrote:
> On Thu, Aug 06, 2020 at 01:45:11PM +0200, Vlastimil Babka wrote:
>> How about this additional patch now that we have head_mapcount()? (I wouldn't
>> go for squashing as the goal and scope is too different).
> 
> I like it.  It bothers me that the compiler doesn't know that
> compound_head(compound_head(x)) == compound_head(x).  I updated
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32911 with a request to be
> able to tell the compiler that compound_head() is idempotent.

Yeah it would be nice to get the benefits everywhere automatically. But I guess
the compiler would have to discard the idempotence assumptions if there are
multiple consecutive (perhaps hidden behind page flag access)
compound_head(page) from a function, as soon as we modify the struct page somewhere.

>> The bloat-o-meter difference without DEBUG_VM is the following:
>> 
>> add/remove: 0/0 grow/shrink: 1/4 up/down: 32/-56 (-24)
>> Function                     old     new   delta
>> __split_huge_pmd            2867    2899     +32
>> shrink_page_list            3860    3847     -13
>> reuse_swap_page              762     748     -14
>> page_trans_huge_mapcount     153     139     -14
>> total_mapcount               187     172     -15
>> Total: Before=8687306, After=8687282, chg -0.00%
> 
> That's great.  I'm expecting improvements from my thp_head() macro when
> that lands (currently in Andrew's tree).  I have been reluctant to replace
> current callers of compound_head() with thp_head(), but I suspect PF_HEAD
> could use thp_head() and save a few bytes on a tinyconfig build.
> 
>> +++ b/mm/huge_memory.c
>> @@ -2125,7 +2125,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
>>   * Set PG_double_map before dropping compound_mapcount to avoid
>>   * false-negative page_mapped().
>>   */
>> -if (compound_mapcount(page) > 1 && !TestSetPageDoubleMap(page)) {
>> +if (head_mapcount(page) > 1 && !TestSetPageDoubleMap(page)) {
> 
> I'm a little nervous about this one.  The page does actually come from
> pmd_page(), and today that's guaranteed to be a head page.  But I'm
> not convinced that's going to still be true in twenty years.  With the
> current THP patchset, I won't allocate pages larger than PMD order, but
> I can see there being interest in tracking pages in chunks larger than
> 2MB in the future.  And then pmd_page() might well return a tail page.
> So it might be a good idea to not convert this one.

Hmm the function converts the compound mapcount of the whole page to a
HPAGE_PMD_NR of base pages. If suddenly the compound page was bigger than a pmd,
then I guess this wouldn't work properly anymore without changes anyway?
Maybe we could stick something like VM_BUG_ON(PageTransHuge(page)) there as
"enforced documentation" for now?


Re: [PATCH v2] mm, dump_page: do not crash with bad compound_mapcount()

2020-08-06 Thread Vlastimil Babka
On 8/6/20 5:39 PM, Matthew Wilcox wrote:
>> >> +++ b/mm/huge_memory.c
>> >> @@ -2125,7 +2125,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
>> >>* Set PG_double_map before dropping compound_mapcount to avoid
>> >>* false-negative page_mapped().
>> >>*/
>> >> - if (compound_mapcount(page) > 1 && !TestSetPageDoubleMap(page)) {
>> >> + if (head_mapcount(page) > 1 && !TestSetPageDoubleMap(page)) {
>> > 
>> > I'm a little nervous about this one.  The page does actually come from
>> > pmd_page(), and today that's guaranteed to be a head page.  But I'm
>> > not convinced that's going to still be true in twenty years.  With the
>> > current THP patchset, I won't allocate pages larger than PMD order, but
>> > I can see there being interest in tracking pages in chunks larger than
>> > 2MB in the future.  And then pmd_page() might well return a tail page.
>> > So it might be a good idea to not convert this one.
>> 
>> Hmm the function converts the compound mapcount of the whole page to a
>> HPAGE_PMD_NR of base pages. If suddenly the compound page was bigger than a pmd,
>> then I guess this wouldn't work properly anymore without changes anyway?
>> Maybe we could stick something like VM_BUG_ON(PageTransHuge(page)) there as
>> "enforced documentation" for now?
> 
> I think it would work as-is.  But also I may have totally misunderstood it.
> I'll write this declaratively and specifically for x86 (PMD order is 9)
> ... tell me when I've made a mistake ;-)
> 
> This function is for splitting the PMD.  We're leaving the underlying
> page intact and just changing the page table.  So if, say, we have an
> underlying 4MB page (and maybe the pages are mapped as PMDs in this
> process), we might get subpage number 512 of this order-10 page.  We'd
> need to check the DoubleMap bit on subpage 1, and the compound_mapcount
> also stored in page 1, but we'd only want to spread the mapcount out
> over the 512 subpages from 512-1023; we wouldn't want to spread it out
> over 0-511 because they aren't affected by this particular PMD.

Yeah, and then we decrease the compound mapcount, which is a counter of "how
many times is this compound page mapped as a whole". But we only removed (the
second) half of the compound mapping, so imho that would be wrong?

> Having to reason about stuff like this is why I limited the THP code to
> stop at PMD order ... I don't want to make my life even more complicated
> than I have to!

Kirill might correct me but I'd expect the THP code right now has baked in many
assumptions about THP pages being exactly HPAGE_PMD_ORDER large?