In this series, HVO is redefined as Hugepage Vmemmap Optimization: a
general vmemmap optimization model for large hugepage-backed mappings,
rather than a HugeTLB-only implementation detail.

The existing code grew around the original HugeTLB-specific HVO path,
while device DAX developed similar but separate vmemmap optimization
handling. As a result, the current implementation carries duplicated
logic, boot-time special cases, and subsystem-specific interfaces around
what is fundamentally the same sparse-vmemmap optimization.

This series generalizes that optimization into a common framework used
by both HugeTLB and device DAX.

The first few patches include some minor bug fixes found during AI-aided
review of the current code. These fixes are not the main goal of the
series, but the later refactoring and unification work depends on them,
so they are included here as preparatory changes.

The series then reworks the relevant early boot and sparse
initialization paths, introduces a generic section-based sparse-vmemmap
optimization infrastructure, switches HugeTLB and device DAX over to the
shared implementation, and removes the old special-case code.

At a high level, the series does the following:

  - apply a small set of preparatory bug fixes
  - reorder early boot and sparse initialization so optimized vmemmap
    setup has the required zone and pageblock state
  - introduce generic section-based vmemmap optimization infrastructure
  - switch HugeTLB and device DAX to the shared implementation
  - consolidate HVO enablement and naming
  - remove obsolete HugeTLB-specific boot-time and architecture-specific
    optimization code
  - rewrite the documentation around the unified design

This brings a few concrete benefits:

  - HugeTLB and device DAX share one vmemmap optimization framework,
    reducing duplicated logic and long-term maintenance overhead
  - when CONFIG_DEFERRED_STRUCT_PAGE_INIT is disabled, optimized struct
    pages can skip the usual memmap_init() initialization work, which
    helps reduce boot-time overhead
  - all architectures that support HVO benefit from the generic
    sparse-vmemmap optimization path without extra architecture-specific
    preinit handling
  - device DAX improves its struct page savings further by dropping the
    extra reserved tail page
  - shared vmemmap tail pages are mapped read-only, improving robustness

I have only built and tested this series on x86. I do not currently have
a powerpc test environment, so any testing or feedback on powerpc would
be much appreciated.

Changes since v1:
  - rebased onto the current next tree
  - added the preparatory minor bug fixes found during AI-aided review
  - added further refactoring on top of the new infrastructure

Muchun Song (69):
  mm/hugetlb: Fix boot panic with CONFIG_DEBUG_VM and HVO bootmem pages
  mm/hugetlb_vmemmap: Fix __hugetlb_vmemmap_optimize_folios()
  powerpc/mm: Fix wrong addr_pfn tracking in compound vmemmap population
  mm/hugetlb: Initialize gigantic bootmem hugepage struct pages earlier
  mm/mm_init: Simplify deferred_free_pages() migratetype init
  mm/sparse: Panic on memmap and usemap allocation failure
  mm/sparse: Move subsection_map_init() into sparse_init()
  mm/mm_init: Defer sparse_init() until after zone initialization
  mm/mm_init: Defer hugetlb reservation until after zone initialization
  mm/mm_init: Remove set_pageblock_order() call from sparse_init()
  mm/sparse: Move sparse_vmemmap_init_nid_late() into sparse_init_nid()
  mm/hugetlb_cma: Validate hugetlb CMA range by zone at reserve time
  mm/hugetlb: Refactor early boot gigantic hugepage allocation
  mm/hugetlb: Free cross-zone bootmem gigantic pages after allocation
  mm/hugetlb_vmemmap: Move bootmem HVO setup to early init
  mm/hugetlb: Remove obsolete bootmem cross-zone checks
  mm/sparse-vmemmap: Remove sparse_vmemmap_init_nid_late()
  mm/hugetlb: Remove unused bootmem cma field
  mm/mm_init: Make __init_page_from_nid() static
  mm/sparse-vmemmap: Drop VMEMMAP_POPULATE_PAGEREF
  mm: Rename vmemmap optimization macros around folio semantics
  mm/sparse: Drop power-of-2 size requirement for struct mem_section
  mm/sparse-vmemmap: track compound page order in struct mem_section
  mm/mm_init: Skip initializing shared vmemmap tail pages
  mm/sparse-vmemmap: Initialize shared tail vmemmap pages on allocation
  mm/sparse-vmemmap: Support section-based vmemmap accounting
  mm/sparse-vmemmap: Support section-based vmemmap optimization
  mm/hugetlb: Use generic vmemmap optimization macros
  mm/sparse: Mark memblocks present earlier
  mm/hugetlb: Switch HugeTLB to section-based vmemmap optimization
  mm/sparse: Remove section_map_size()
  mm/mm_init: Factor out pfn_to_zone() as a shared helper
  mm/sparse: Remove SPARSEMEM_VMEMMAP_PREINIT
  mm/sparse: Inline usemap allocation into sparse_init_nid()
  mm/hugetlb: Remove HUGE_BOOTMEM_HVO
  mm/hugetlb: Remove HUGE_BOOTMEM_CMA
  mm/sparse-vmemmap: Factor out shared vmemmap page allocation
  mm/sparse-vmemmap: Introduce CONFIG_SPARSEMEM_VMEMMAP_OPTIMIZATION
  mm/sparse-vmemmap: Switch DAX to vmemmap_shared_tail_page()
  powerpc/mm: Switch DAX to vmemmap_shared_tail_page()
  mm/sparse-vmemmap: Drop the extra tail page from DAX reservation
  mm/sparse-vmemmap: Switch DAX to section-based vmemmap optimization
  mm/sparse-vmemmap: Unify DAX and HugeTLB population paths
  mm/sparse-vmemmap: Remove the unused ptpfn argument
  powerpc/mm: Make vmemmap_populate_compound_pages() static
  mm/sparse-vmemmap: Map shared vmemmap tail pages read-only
  powerpc/mm: Map shared vmemmap tail pages read-only
  mm/sparse-vmemmap: Inline vmemmap_populate_address() into its caller
  mm/hugetlb_vmemmap: Remove vmemmap_wrprotect_hvo()
  mm/sparse: Simplify section_nr_vmemmap_pages()
  mm/sparse-vmemmap: Introduce vmemmap_nr_struct_pages()
  powerpc/mm: Drop powerpc vmemmap_can_optimize()
  mm/sparse-vmemmap: Drop vmemmap_can_optimize()
  mm/sparse-vmemmap: Drop @pgmap from vmemmap population APIs
  mm/sparse: Decouple section activation from ZONE_DEVICE
  mm: Redefine HVO as Hugepage Vmemmap Optimization
  mm/sparse-vmemmap: Consolidate HVO enable checks
  mm/hugetlb: Make HVO optimizable checks depend on generic logic
  mm/sparse-vmemmap: Localize init_compound_tail()
  mm/mm_init: Check zone consistency on optimized vmemmap sections
  mm/hugetlb: Drop boot-time HVO handling for gigantic folios
  mm/hugetlb: Simplify hugetlb_folio_init_vmemmap()
  mm/hugetlb: Initialize the full bootmem hugepage in hugetlb code
  mm/mm_init: Factor out compound page initialization
  mm/mm_init: Make __init_single_page() static
  mm/cma: Move CMA pageblock initialization into cma_activate_area()
  mm/cma: Move init_cma_pageblock() into cma.c
  mm/mm_init: Initialize pageblock migratetype in memmap init helpers
  Documentation/mm: Rewrite vmemmap_dedup.rst for unified HVO

 .../admin-guide/kernel-parameters.txt         |   2 +-
 Documentation/admin-guide/mm/hugetlbpage.rst  |   4 +-
 .../admin-guide/mm/memory-hotplug.rst         |   2 +-
 Documentation/admin-guide/sysctl/vm.rst       |   3 +-
 Documentation/arch/powerpc/index.rst          |   1 -
 Documentation/arch/powerpc/vmemmap_dedup.rst  | 101 ----
 Documentation/mm/vmemmap_dedup.rst            | 217 ++------
 arch/arm64/mm/mmu.c                           |   5 +-
 arch/loongarch/mm/init.c                      |   5 +-
 arch/powerpc/include/asm/book3s/64/radix.h    |  12 -
 arch/powerpc/mm/book3s64/radix_pgtable.c      | 154 +-----
 arch/powerpc/mm/hugetlbpage.c                 |  11 +-
 arch/powerpc/mm/init_64.c                     |   1 +
 arch/powerpc/mm/mem.c                         |   5 +-
 arch/riscv/mm/init.c                          |   5 +-
 arch/s390/mm/init.c                           |   5 +-
 arch/x86/Kconfig                              |   1 -
 arch/x86/entry/vdso/vdso32/fake_32bit_build.h |   1 -
 arch/x86/mm/init_64.c                         |   5 +-
 drivers/dax/Kconfig                           |   1 +
 fs/Kconfig                                    |   6 +-
 include/linux/hugetlb.h                       |  23 +-
 include/linux/memory_hotplug.h                |  12 +-
 include/linux/mm.h                            |  44 +-
 include/linux/mm_types.h                      |   3 +-
 include/linux/mmzone.h                        | 151 ++++--
 include/linux/page-flags-layout.h             |   2 +
 include/linux/page-flags.h                    |  31 +-
 kernel/bounds.c                               |   5 +
 mm/Kconfig                                    |   9 +-
 mm/bootmem_info.c                             |   5 +-
 mm/cma.c                                      |  18 +-
 mm/hugetlb.c                                  | 337 ++++--------
 mm/hugetlb_cma.c                              |  41 +-
 mm/hugetlb_cma.h                              |   4 +-
 mm/hugetlb_vmemmap.c                          | 266 +--------
 mm/hugetlb_vmemmap.h                          |  64 +--
 mm/internal.h                                 |  72 ++-
 mm/memory-failure.c                           |   6 +-
 mm/memory_hotplug.c                           |  22 +-
 mm/memremap.c                                 |   4 +-
 mm/mm_init.c                                  | 241 ++++-----
 mm/sparse-vmemmap.c                           | 511 ++++++------------
 mm/sparse.c                                   | 129 +----
 mm/util.c                                     |   2 +-
 scripts/gdb/linux/mm.py                       |   6 +-
 46 files changed, 743 insertions(+), 1812 deletions(-)
 delete mode 100644 Documentation/arch/powerpc/vmemmap_dedup.rst


base-commit: e98d21c170b01ddef366f023bbfcf6b31509fa83
-- 
2.54.0

