David Hildenbrand wrote:
[..]
> > Maybe there is something missing in ZONE_DEVICE freeing/splitting code
> > of large folios, where we should do the same, to make sure that all
> > page->memcg_data is actually 0?
> >
> > I assume so. Let me dig.
> >
>
> I suspect this should do the trick:
>
> diff --git a/fs/dax.c b/fs/dax.c
> index af5045b0f476e..8dffffef70d21 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -397,6 +397,10 @@ static inline unsigned long dax_folio_put(struct folio
> *folio)
> if (!order)
> return 0;
>
> +#ifdef NR_PAGES_IN_LARGE_FOLIO
> + folio->_nr_pages = 0;
> +#endif
I assume this new fs/dax.c instance of this pattern motivates a
folio_set_nr_pages() helper to hide the ifdef?
It is concerning that fs/dax.c misses common expectations like
this, but I think that is the nature of bypassing the page allocator to
get folios.
However, this raises the question of whether fixing it here is sufficient
for the other ZONE_DEVICE folio cases. I did not immediately find a place
where other ZONE_DEVICE users might be calling prep_compound_page() and
leaving stale tail page metadata lying around. Alistair?
> +
> for (i = 0; i < (1UL << order); i++) {
> struct dev_pagemap *pgmap = page_pgmap(&folio->page);
> struct page *page = folio_page(folio, i);
>
>
> Alternatively (in the style of fa23a338de93aa03eb0b6146a0440f5762309f85)
>
> diff --git a/fs/dax.c b/fs/dax.c
> index af5045b0f476e..a1e354b748522 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -412,6 +412,9 @@ static inline unsigned long dax_folio_put(struct folio
> *folio)
> */
> new_folio->pgmap = pgmap;
> new_folio->share = 0;
> +#ifdef CONFIG_MEMCG
> + new_folio->memcg_data = 0;
> +#endif
This looks correct, but I prefer the first option because I would never
expect a dax page to need to worry about being part of a memcg.
> WARN_ON_ONCE(folio_ref_count(new_folio));
> }
>
>
>
> --
> Cheers,
>
> David / dhildenb
Thanks for the help, David!