> On May 14, 2026, at 15:51, Oscar Salvador <[email protected]> wrote:
>
> On Wed, May 13, 2026 at 09:04:29PM +0800, Muchun Song wrote:
>> Commit 622026e87c40 ("mm/hugetlb: remove fake head pages") switched
>> HVO to reuse per-zone shared tail pages from zone->vmemmap_tails[].
>>
>> Those shared tail pages were initialized in hugetlb_vmemmap_init(), but
>> bootmem HugeTLB folios are prepared earlier from gather_bootmem_prealloc().
>> With hugetlb_free_vmemmap=on, prep_and_add_bootmem_folios() can access
>> pageblock flags on bootmem HugeTLB pages whose mirrored tail struct pages
>> already point to the shared tail page. On CONFIG_DEBUG_VM kernels,
>> get_pfnblock_bitmap_bitidx() then dereferences the still-uninitialized
>> shared tail page and can panic during boot.
>>
>> Initialize zone->vmemmap_tails[] from gather_bootmem_prealloc(), before
>> bootmem HugeTLB folios are processed, and drop the later initialization
>> from hugetlb_vmemmap_init().
>>
>> This bug only affects CONFIG_DEBUG_VM kernels, where the relevant
>> assertion is evaluated.
>>
>> Fixes: 622026e87c40 ("mm/hugetlb: remove fake head pages")
>> Signed-off-by: Muchun Song <[email protected]>
>
> For the correctness of the change
>
> Acked-by: Oscar Salvador <[email protected]>
Thanks.
>
> but I have a couple of comments:
>
>> ---
>> mm/hugetlb.c | 19 +++++++++++++++++++
>> mm/hugetlb_vmemmap.c | 17 -----------------
>> 2 files changed, 19 insertions(+), 17 deletions(-)
>>
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index 31b34ca0f402..d22683ab30a1 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -3382,6 +3382,25 @@ static void __init gather_bootmem_prealloc(void)
>> .max_threads = num_node_state(N_MEMORY),
>> .numa_aware = true,
>> };
>> +#ifdef CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
>> + struct zone *zone;
>> +
>> + for_each_zone(zone) {
>> + for (int i = 0; i < NR_VMEMMAP_TAILS; i++) {
>> + struct page *tail, *p;
>> + unsigned int order;
>> +
>> + tail = zone->vmemmap_tails[i];
>> + if (!tail)
>> + continue;
>> +
>> + order = i + VMEMMAP_TAIL_MIN_ORDER;
>> + p = page_to_virt(tail);
>> + for (int j = 0; j < PAGE_SIZE / sizeof(struct page);
>> j++)
>> + init_compound_tail(p + j, NULL, order, zone);
>> + }
>> + }
>
> This deserves a comment why do we need to initialize those pages here, no
> need for
> a fat one but a hint, because everybody else looking at this will wonder why
> do not
> have it in hugetlb_vmemmap_init(), as the name suggests.
Make sense.
>
>
>> +#endif
>>
>> padata_do_multithreaded(&job);
>> }
>> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
>> index 4a077d231d3a..62e61af18c9a 100644
>> --- a/mm/hugetlb_vmemmap.c
>> +++ b/mm/hugetlb_vmemmap.c
>> @@ -870,27 +870,10 @@ static const struct ctl_table
>> hugetlb_vmemmap_sysctls[] = {
>> static int __init hugetlb_vmemmap_init(void)
>
> At this point, hugetlb_vmemmap_init only initialites the sysctls for
> each hstate, should the name reflect that better?
A good point. I can add a new cleanup patch.
>
> Also, vmemmap_get_tail() still thinks that tail pages are initialized
> here from this comment:
Good catch. I'll fix it.
>
> "/*
> * Only allocate the page, but do not initialize it.
> *
> * Any initialization done here will be overwritten by memmap_init().
> *
> * hugetlb_vmemmap_init() will take care of initialization after
> * memmap_init().
> */"
>
> so we might want to adjust that as well, and I am not sure if we have
> more comments in the tree referencing hugetlb_vmemmap_init() as the init
> point for those pages that need correction.
Only this one.
>
>
> Having said that, and this is just a question, we cannot really make
> gather_bootmem_prealloc() also do the initialization of the sysfs right?
> I see that hugetlb sysfs gets initialized from hugetlb_init() a few
> calls after, so.. meh.
hugetlb_vmemmap_init() is introduced by me. I want to hide some symbols
from mm/hugetlb_vmemmap.c at that time. So I used late_initcall() for that.
>
> Anyway, maybe just convert hugetlb_vmemmap_init to
> hugetlb_vmemmap_sysfs_create
> (pick a better name), so it does not look like having two entry points
> for vmemmap init stuff.
This bug fix is a temporary change, the whole zone initialization will
removed later in patch 30. Then, there will be only one place for vmemmap init
:)
I’m curious about your thoughts on the practice of minimizing external
symbols. Personally, I’d prefer to move these initializations into
their respective files:
- hugetlb_sysfs_init();
- hugetlb_cgroup_file_init();
- hugetlb_sysctl_init();
Of course, that’s just my personal preference.
Muchun,
Thanks.
>
> --
> Oscar Salvador
> SUSE Labs