On Wed, May 13, 2026 at 09:04:29PM +0800, Muchun Song wrote:
> Commit 622026e87c40 ("mm/hugetlb: remove fake head pages") switched
> HVO to reuse per-zone shared tail pages from zone->vmemmap_tails[].
>
> Those shared tail pages were initialized in hugetlb_vmemmap_init(), but
> bootmem HugeTLB folios are prepared earlier from gather_bootmem_prealloc().
> With hugetlb_free_vmemmap=on, prep_and_add_bootmem_folios() can access
> pageblock flags on bootmem HugeTLB pages whose mirrored tail struct pages
> already point to the shared tail page. On CONFIG_DEBUG_VM kernels,
> get_pfnblock_bitmap_bitidx() then dereferences the still-uninitialized
> shared tail page and can panic during boot.
>
> Initialize zone->vmemmap_tails[] from gather_bootmem_prealloc(), before
> bootmem HugeTLB folios are processed, and drop the later initialization
> from hugetlb_vmemmap_init().
>
> This bug only affects CONFIG_DEBUG_VM kernels, where the relevant
> assertion is evaluated.
>
> Fixes: 622026e87c40 ("mm/hugetlb: remove fake head pages")
> Signed-off-by: Muchun Song <[email protected]>
For the correctness of the change
Acked-by: Oscar Salvador <[email protected]>
but I have a couple of comments:
> ---
> mm/hugetlb.c | 19 +++++++++++++++++++
> mm/hugetlb_vmemmap.c | 17 -----------------
> 2 files changed, 19 insertions(+), 17 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 31b34ca0f402..d22683ab30a1 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -3382,6 +3382,25 @@ static void __init gather_bootmem_prealloc(void)
> .max_threads = num_node_state(N_MEMORY),
> .numa_aware = true,
> };
> +#ifdef CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
> + struct zone *zone;
> +
> + for_each_zone(zone) {
> + for (int i = 0; i < NR_VMEMMAP_TAILS; i++) {
> + struct page *tail, *p;
> + unsigned int order;
> +
> + tail = zone->vmemmap_tails[i];
> + if (!tail)
> + continue;
> +
> + order = i + VMEMMAP_TAIL_MIN_ORDER;
> + p = page_to_virt(tail);
> + for (int j = 0; j < PAGE_SIZE / sizeof(struct page);
> j++)
> + init_compound_tail(p + j, NULL, order, zone);
> + }
> + }
This deserves a comment why do we need to initialize those pages here, no need
for
a fat one but a hint, because everybody else looking at this will wonder why do
not
have it in hugetlb_vmemmap_init(), as the name suggests.
> +#endif
>
> padata_do_multithreaded(&job);
> }
> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> index 4a077d231d3a..62e61af18c9a 100644
> --- a/mm/hugetlb_vmemmap.c
> +++ b/mm/hugetlb_vmemmap.c
> @@ -870,27 +870,10 @@ static const struct ctl_table hugetlb_vmemmap_sysctls[]
> = {
> static int __init hugetlb_vmemmap_init(void)
At this point, hugetlb_vmemmap_init only initialites the sysctls for
each hstate, should the name reflect that better?
Also, vmemmap_get_tail() still thinks that tail pages are initialized
here from this comment:
"/*
* Only allocate the page, but do not initialize it.
*
* Any initialization done here will be overwritten by memmap_init().
*
* hugetlb_vmemmap_init() will take care of initialization after
* memmap_init().
*/"
so we might want to adjust that as well, and I am not sure if we have
more comments in the tree referencing hugetlb_vmemmap_init() as the init
point for those pages that need correction.
Having said that, and this is just a question, we cannot really make
gather_bootmem_prealloc() also do the initialization of the sysfs right?
I see that hugetlb sysfs gets initialized from hugetlb_init() a few
calls after, so.. meh.
Anyway, maybe just convert hugetlb_vmemmap_init to hugetlb_vmemmap_sysfs_create
(pick a better name), so it does not look like having two entry points
for vmemmap init stuff.
--
Oscar Salvador
SUSE Labs