On Wed, May 13, 2026 at 09:04:29PM +0800, Muchun Song wrote:
> Commit 622026e87c40 ("mm/hugetlb: remove fake head pages") switched
> HVO to reuse per-zone shared tail pages from zone->vmemmap_tails[].
> 
> Those shared tail pages were initialized in hugetlb_vmemmap_init(), but
> bootmem HugeTLB folios are prepared earlier from gather_bootmem_prealloc().
> With hugetlb_free_vmemmap=on, prep_and_add_bootmem_folios() can access
> pageblock flags on bootmem HugeTLB pages whose mirrored tail struct pages
> already point to the shared tail page. On CONFIG_DEBUG_VM kernels,
> get_pfnblock_bitmap_bitidx() then dereferences the still-uninitialized
> shared tail page and can panic during boot.
> 
> Initialize zone->vmemmap_tails[] from gather_bootmem_prealloc(), before
> bootmem HugeTLB folios are processed, and drop the later initialization
> from hugetlb_vmemmap_init().
> 
> This bug only affects CONFIG_DEBUG_VM kernels, where the relevant
> assertion is evaluated.
> 
> Fixes: 622026e87c40 ("mm/hugetlb: remove fake head pages")
> Signed-off-by: Muchun Song <[email protected]>

For the correctness of the change

Acked-by: Oscar Salvador <[email protected]>

but I have a couple of comments:

> ---
>  mm/hugetlb.c         | 19 +++++++++++++++++++
>  mm/hugetlb_vmemmap.c | 17 -----------------
>  2 files changed, 19 insertions(+), 17 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 31b34ca0f402..d22683ab30a1 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -3382,6 +3382,25 @@ static void __init gather_bootmem_prealloc(void)
>               .max_threads    = num_node_state(N_MEMORY),
>               .numa_aware     = true,
>       };
> +#ifdef CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
> +     struct zone *zone;
> +
> +     for_each_zone(zone) {
> +             for (int i = 0; i < NR_VMEMMAP_TAILS; i++) {
> +                     struct page *tail, *p;
> +                     unsigned int order;
> +
> +                     tail = zone->vmemmap_tails[i];
> +                     if (!tail)
> +                             continue;
> +
> +                     order = i + VMEMMAP_TAIL_MIN_ORDER;
> +                     p = page_to_virt(tail);
> +                     for (int j = 0; j < PAGE_SIZE / sizeof(struct page); 
> j++)
> +                             init_compound_tail(p + j, NULL, order, zone);
> +             }
> +     }

This deserves a comment why do we need to initialize those pages here, no need 
for
a fat one but a hint, because everybody else looking at this will wonder why do 
not
have it in hugetlb_vmemmap_init(), as the name suggests.


> +#endif
>  
>       padata_do_multithreaded(&job);
>  }
> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> index 4a077d231d3a..62e61af18c9a 100644
> --- a/mm/hugetlb_vmemmap.c
> +++ b/mm/hugetlb_vmemmap.c
> @@ -870,27 +870,10 @@ static const struct ctl_table hugetlb_vmemmap_sysctls[] 
> = {
>  static int __init hugetlb_vmemmap_init(void)

At this point, hugetlb_vmemmap_init only initialites the sysctls for
each hstate, should the name reflect that better?

Also, vmemmap_get_tail() still thinks that tail pages are initialized
here from this comment:

 "/*
   * Only allocate the page, but do not initialize it.
   *
   * Any initialization done here will be overwritten by memmap_init().
   *
   * hugetlb_vmemmap_init() will take care of initialization after
   * memmap_init().
   */"

so we might want to adjust that as well, and I am not sure if we have
more comments in the tree referencing hugetlb_vmemmap_init() as the init
point for those pages that need correction.

 
Having said that, and this is just a question, we cannot really make
gather_bootmem_prealloc() also do the initialization of the sysfs right?
I see that hugetlb sysfs gets initialized from hugetlb_init() a few
calls after, so.. meh.

Anyway, maybe just convert hugetlb_vmemmap_init to hugetlb_vmemmap_sysfs_create
(pick a better name), so it does not look like having two entry points
for vmemmap init stuff.

-- 
Oscar Salvador
SUSE Labs

Reply via email to