On Mon, Jan 19, 2026 at 04:59:56PM +1100, Alistair Popple wrote:
> On 2026-01-17 at 16:27 +1100, Matthew Brost <[email protected]> wrote...
> > On Sat, Jan 17, 2026 at 03:42:16PM +1100, Balbir Singh wrote:
> > > On 1/17/26 14:55, Matthew Brost wrote:
> > > > On Fri, Jan 16, 2026 at 08:51:14PM -0400, Jason Gunthorpe wrote:
> > > >> On Fri, Jan 16, 2026 at 12:31:25PM -0800, Matthew Brost wrote:
> > > >>>> I suppose we could be getting say an order-9 folio that was
> > > >>>> previously used as two order-8 folios? And each of them had
> > > >>>> their _nr_pages in their head
> > > >>>
> > > >>> Yes, this is a good example. At this point we have no idea what
> > > >>> previous allocation(s) order(s) were - we could have multiple
> > > >>> places in the loop where _nr_pages is populated, thus we have to
> > > >>> clear this everywhere.
> > > >>
> > > >> Why? The fact you have to use such a crazy expression to even access
> > > >> _nr_pages strongly says nothing will read it as _nr_pages.
> > > >>
> > > >> Explain each thing:
> > > >>
> > > >> new_page->flags.f &= ~0xffUL; /* Clear possible order, page head */
> > > >>
> > > >> OK, the tail page flags need to be set right, and prep_compound_page()
> > > >> called later depends on them being zero.
> > > >>
> > > >> ((struct folio *)(new_page - 1))->_nr_pages = 0;
> > > >>
> > > >> Can't see a reason, nothing reads _nr_pages from a random tail
> > > >> page. _nr_pages is the last 8 bytes of struct page so it overlaps
> > > >> memcg_data, which is also not supposed to be read from a tail page?
>
> This is (or was) either an order-0 page, a head page or a tail page, who
> knows. So it doesn't really matter whether or not _nr_pages or memcg_data
> are supposed to be read from a tail page or not. What really matters is:
> does any of vm_insert_page(), migrate_vma_*() or prep_compound_page()
> expect this to be a particular value when called on this page?
This weird expression is doing three things:
1) it is zeroing memcg on the head page
2) it is zeroing _nr_pages on the head folio
3) it is zeroing memcg on all the tail pages.
Are you arguing for #1, #2 or #3?
#1 is missing today
#2 is handled directly by the prep_compound_page() -> prep_compound_head() ->
folio_set_order() path
#3 I argue isn't necessary.
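
For reference on #2, folio_set_order() does roughly this on the head (a
simplified sketch of the helper in mm/internal.h; the exact guards and
field names vary by kernel version and config):

static inline void folio_set_order(struct folio *folio, unsigned int order)
{
	if (WARN_ON_ONCE(!order || !folio_test_large(folio)))
		return;

	/* the order byte lives in the head's second flags word */
	folio->_flags_1 = (folio->_flags_1 & ~0xffUL) | order;
#ifdef NR_PAGES_IN_LARGE_FOLIO
	folio->_nr_pages = 1U << order;	/* head's _nr_pages rewritten here */
#endif
}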
> AFAIK memcg_data is at least expected to be NULL for migrate_vma_*() when
> called on an order-0 page, which means it has to be cleared.
Great, so let's write that in prep_compound_head()!
> Although I think it would be far less confusing if it was just written like
> that rather than the folio math, but it achieves the same thing and is
> technically correct.
I have yet to hear a reason to do #3.
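
Nothing dereferences memcg_data on a tail page directly; every lookup
canonicalizes to the head first, along these lines (illustrative only, not
a quote of any one helper):

	/* memcg lookups resolve the compound head before touching memcg_data */
	struct folio *folio = page_folio(page);		/* follows compound_head */
	struct mem_cgroup *memcg = folio_memcg(folio);	/* reads the head's memcg_data */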
> > > >> new_folio->mapping = NULL;
> > > >>
> > > >> Pointless, prep_compound_page() -> prep_compound_tail() ->
> > > >> p->mapping = TAIL_MAPPING;
>
> Not pointless - vm_insert_page() for example expects folio_test_anon(),
> which won't be the case if p->mapping was previously set to TAIL_MAPPING,
> so it needs to be cleared. migrate_vma_setup() has a similar issue.
It is pointless to put it in the loop! Sure, set the head page.
> > > >> new_folio->pgmap = pgmap; /* Also clear compound head */
> > > >>
> > > >> Pointless, compound_head is set in prep_compound_tail():
> > > >> set_compound_head(p, head);
>
> No it isn't - we're not clearing tail pages here, we're initialising
> ZONE_DEVICE struct pages ready for use by the core-mm, which means the
> pgmap needs to be correct.
See above, same issue. The tail pages have pgmap set to NULL because
prep_compound_tail() does it. So why do we set it to pgmap here and
then clear it a few lines below?
Set it once in the head folio outside this loop.
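
I.e. the per-page loop collapses to something like this (a minimal sketch,
reusing the variable names from the snippets above):

	/* tails: flags, TAIL_MAPPING, compound_head - all done here */
	prep_compound_page(new_page, order);

	/* head-only fields, written exactly once, outside any loop */
	new_folio = page_folio(new_page);
	new_folio->pgmap = pgmap;
	new_folio->mapping = NULL;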
> No problem with the above, and FWIW it seems correct. Although I suspect just
> setting page->memcg_data = 0 would have been far less controversial ;)
It is "correct" but horrible.
What is wrong with this? Isn't it so much better and more efficient??
diff --git a/mm/internal.h b/mm/internal.h
index e430da900430a1..a7d3f5e4b85e49 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -806,14 +806,21 @@ static inline void prep_compound_head(struct page *page, unsigned int order)
 		atomic_set(&folio->_pincount, 0);
 		atomic_set(&folio->_entire_mapcount, -1);
 	}
-	if (order > 1)
+	if (order > 1) {
 		INIT_LIST_HEAD(&folio->_deferred_list);
+	} else {
+		folio->mapping = NULL;
+#ifdef CONFIG_MEMCG
+		folio->memcg_data = 0;
+#endif
+	}
 }
 
 static inline void prep_compound_tail(struct page *head, int tail_idx)
 {
 	struct page *p = head + tail_idx;
 
+	p->flags.f &= ~0xffUL; /* Clear possible order, page head */
 	p->mapping = TAIL_MAPPING;
 	set_compound_head(p, head);
 	set_page_private(p, 0);
diff --git a/mm/memremap.c b/mm/memremap.c
index 4c2e0d68eb2798..7ec034c11068e1 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -479,19 +479,23 @@ void free_zone_device_folio(struct folio *folio)
 	}
 }
 
-void zone_device_page_init(struct page *page, unsigned int order)
+void zone_device_page_init(struct page *page, struct dev_pagemap *pgmap,
+			   unsigned int order)
 {
 	VM_WARN_ON_ONCE(order > MAX_ORDER_NR_PAGES);
+	struct folio *folio;
 
 	/*
 	 * Drivers shouldn't be allocating pages after calling
 	 * memunmap_pages().
 	 */
 	WARN_ON_ONCE(!percpu_ref_tryget_many(&page_pgmap(page)->ref, 1 << order));
-	set_page_count(page, 1);
-	lock_page(page);
-	if (order)
-		prep_compound_page(page, order);
+	prep_compound_page(page, order);
+
+	folio = page_folio(page);
+	folio->pgmap = pgmap;
+	folio_lock(folio);
+	folio_set_count(folio, 1);
 }
 EXPORT_SYMBOL_GPL(zone_device_page_init);
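
A call site would then look something like this (hypothetical driver code,
assuming the signature above; my_alloc_device_pages() and my_pgmap are made
up names):

	/* pages come from some driver-private ZONE_DEVICE allocator */
	struct page *page = my_alloc_device_pages(order);

	zone_device_page_init(page, &my_pgmap, order);
	/* page_folio(page) is now a locked order-'order' folio with refcount 1 */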
Jason