This series adds userfaultfd support for tracking the working set of
VM guest memory, enabling VMMs to identify cold pages and evict them
to tiered or remote storage.

== Problem ==

VMMs managing guest memory need to:
1. Track which pages are actively used (working set detection)
2. Safely evict cold pages to slower storage
3. Fetch pages back on demand when accessed again

For shmem-backed guest memory, working set tracking partially works
today: MADV_DONTNEED zaps PTEs while pages stay in page cache, and
re-access auto-resolves from cache. Safe eviction, however, still
requires synchronous fault interception: if the guest writes a page
after the VMM has copied it out but before the VMM drops it, that
write is lost.
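The race can be sketched as a deterministic userspace model of evicting a single page. This is an illustration only: the toy_* types and evict_with_concurrent_write() are hypothetical and say nothing about how the series or any VMM implements eviction. Without interception, the guest write lands immediately and is then overwritten when the page is fetched back from storage; with a synchronous handler, the write is held until the page has been restored.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy model of one guest page during eviction (hypothetical, not kernel code). */
struct toy_page {
    bool resident;  /* page still in memory */
    char data[8];   /* page contents */
};

struct toy_store {
    char data[8];   /* copy written out by the VMM (the "pwrite") */
};

/* Evict a page while the guest writes guest_val part-way through.
 * sync_faults = true models a handler that holds the faulting write until
 * eviction is done. Returns the first byte the guest sees afterwards. */
static char evict_with_concurrent_write(bool sync_faults, char guest_val)
{
    struct toy_page page = { .resident = true };
    struct toy_store store;
    memset(page.data, 'A', sizeof(page.data));

    /* Step 1: the VMM copies the page out to slower storage. */
    memcpy(store.data, page.data, sizeof(store.data));

    /* Step 2: the guest writes to the page. */
    bool write_pending = false;
    if (sync_faults)
        write_pending = true;      /* fault is held by the handler */
    else
        page.data[0] = guest_val;  /* fault auto-resolves, write lands now */

    /* Step 3: the VMM drops the page; anything not in the store is gone. */
    memset(page.data, 0, sizeof(page.data));
    page.resident = false;

    /* Step 4: the next access fetches the page back from storage. */
    memcpy(page.data, store.data, sizeof(page.data));
    page.resident = true;

    /* A held write is applied only now, on top of the restored contents. */
    if (write_pending)
        page.data[0] = guest_val;

    assert(page.resident);
    return page.data[0];
}
```

With async resolution the guest's 'B' is silently replaced by the stale 'A'; with the write held, 'B' survives.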

For anonymous guest memory (needed for KSM cross-VM deduplication),
there is no equivalent mechanism at all: zapping the PTE of a private
anonymous page frees the page, so its contents are lost.

== Solution ==

The series introduces a unified userfaultfd interface that works
across both anonymous and shmem-backed memory:

UFFD_FEATURE_MINOR_ANON: extends UFFDIO_REGISTER_MODE_MINOR
registration to anonymous private memory. It uses the PROT_NONE
hinting mechanism (the same one NUMA balancing uses) to make pages
inaccessible without freeing them.

UFFD_FEATURE_MINOR_ASYNC: auto-resolves minor faults without handler
involvement. The kernel restores PTE permissions immediately and the
faulting thread continues. Works for anonymous, shmem, and hugetlbfs.
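A minimal model of how a minor fault on a deactivated page might be handled in each mode, assuming a single async flag per uffd context. All names are hypothetical; toy_handler_continue_all() only mirrors the role UFFDIO_CONTINUE plays in sync mode, not its interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-page state for a toy model of MINOR_ASYNC (not kernel code). */
enum toy_pte { TOY_ACCESSIBLE, TOY_DEACTIVATED };

enum toy_fault_result { TOY_RESOLVED, TOY_QUEUED };

#define TOY_QUEUE_LEN 16

struct toy_uffd {
    bool async;                     /* MINOR_ASYNC behaviour on/off */
    size_t queued[TOY_QUEUE_LEN];   /* page indexes awaiting the handler */
    size_t nr_queued;
};

/* Fault on a deactivated page. In async mode access is restored at once and
 * the thread continues; in sync mode an event is queued and the thread would
 * sleep until the handler resolves it. */
static enum toy_fault_result toy_minor_fault(struct toy_uffd *u,
                                             enum toy_pte *ptes, size_t idx)
{
    assert(ptes[idx] == TOY_DEACTIVATED);
    if (u->async) {
        ptes[idx] = TOY_ACCESSIBLE;
        return TOY_RESOLVED;
    }
    assert(u->nr_queued < TOY_QUEUE_LEN);
    u->queued[u->nr_queued++] = idx;
    return TOY_QUEUED;
}

/* Handler side of sync mode: restore access to every queued page and
 * drain the queue. Returns how many faults were resolved. */
static size_t toy_handler_continue_all(struct toy_uffd *u, enum toy_pte *ptes)
{
    size_t n = u->nr_queued;
    for (size_t i = 0; i < n; i++)
        ptes[u->queued[i]] = TOY_ACCESSIBLE;
    u->nr_queued = 0;
    return n;
}
```

In async mode the handler never sees the fault, which is what keeps detection cheap; in sync mode nothing proceeds until the handler acts.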

UFFDIO_DEACTIVATE: marks pages as deactivated. For anonymous memory,
sets PROT_NONE on PTEs (pages stay resident). For shmem/hugetlbfs,
zaps PTEs (pages stay in page cache).
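The per-backing behaviour of UFFDIO_DEACTIVATE, next to a plain PTE zap, can be modelled as below. The struct and helpers are hypothetical; the point is only that deactivation keeps the data for both backings, while zapping a private anonymous PTE does not.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one mapping of a guest page (hypothetical, not kernel code). */
enum toy_backing { TOY_ANON, TOY_SHMEM };

struct toy_mapping {
    enum toy_backing backing;
    bool pte_present;    /* a PTE exists */
    bool pte_protnone;   /* PTE present but PROT_NONE (anon deactivation) */
    bool page_resident;  /* page kept in memory or in the page cache */
};

/* Deactivate: anonymous keeps the PTE but makes it PROT_NONE;
 * shmem/hugetlbfs zaps the PTE and relies on the page cache. */
static void toy_deactivate(struct toy_mapping *m)
{
    assert(m->page_resident);  /* only resident pages are deactivated here */
    if (m->backing == TOY_ANON) {
        m->pte_protnone = true;
    } else {
        m->pte_present = false;
        m->pte_protnone = false;
    }
    /* In both cases page_resident is left untouched: the data is kept. */
}

/* For contrast: zapping a private anonymous PTE frees the page, while
 * a shmem page survives in the page cache. */
static void toy_zap(struct toy_mapping *m)
{
    m->pte_present = false;
    m->pte_protnone = false;
    if (m->backing == TOY_ANON)
        m->page_resident = false;
}

/* True if the next access can be satisfied without losing data. */
static bool toy_data_recoverable(const struct toy_mapping *m)
{
    return m->page_resident;
}
```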

UFFDIO_SET_MODE: toggles MINOR_ASYNC at runtime, synchronized via
mmap_write_lock. Enables the VMM workflow: async mode for lightweight
detection, sync mode for race-free eviction.

PAGE_IS_UFFD_DEACTIVATED: PAGEMAP_SCAN category flag for efficient
batch detection of cold (still-deactivated) anonymous pages.
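The category matching PAGEMAP_SCAN performs can be modelled over an array of per-page category bits. The TOY_* bit values are made up for this sketch (the real PAGE_IS_* values live in <linux/fs.h>), and region merging is simplified: the real ioctl also takes the returned category bits into account when merging adjacent pages. The same helper covers shmem by inverting the PRESENT bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical category bits for this toy only. */
#define TOY_PAGE_IS_PRESENT          (1u << 0)
#define TOY_PAGE_IS_UFFD_DEACTIVATED (1u << 1)

/* Half-open page-index range [start, end), like struct page_region's
 * start/end but counted in pages instead of addresses. */
struct toy_region {
    size_t start, end;
};

/* Match each page like PAGEMAP_SCAN's category_mask/category_inverted pair:
 * flip the inverted bits, then require every mask bit to be set. Adjacent
 * matching pages merge into one region. Returns the number of regions
 * written to out (at most max_out). */
static size_t toy_scan(const uint32_t *cats, size_t nr_pages,
                       uint32_t mask, uint32_t inverted,
                       struct toy_region *out, size_t max_out)
{
    assert(out != NULL || max_out == 0);
    size_t n = 0;
    for (size_t i = 0; i < nr_pages; i++) {
        uint32_t c = cats[i] ^ inverted;
        if ((c & mask) != mask)
            continue;
        if (n > 0 && out[n - 1].end == i) {
            out[n - 1].end = i + 1;   /* extend the previous region */
        } else {
            if (n == max_out)
                break;                /* output buffer full */
            out[n].start = i;
            out[n].end = i + 1;
            n++;
        }
    }
    return n;
}
```

For anonymous memory the VMM would ask for mask = DEACTIVATED; for shmem, mask = PRESENT with PRESENT inverted, i.e. pages whose PTE is still zapped.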

== VMM Workflow ==

    UFFDIO_DEACTIVATE(all)            -- async, no vCPU stalls
    sleep(interval)
    PAGEMAP_SCAN                      -- find cold pages
    UFFDIO_SET_MODE(sync)             -- block faults for eviction
    pwrite + MADV_DONTNEED cold pages -- safe, faults block
    UFFDIO_SET_MODE(async)            -- resume tracking

The same workflow applies to shmem, with a different PAGEMAP_SCAN mask
(!PAGE_IS_PRESENT instead of PAGE_IS_UFFD_DEACTIVATED).
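One pass of this loop can be modelled with three page states. This is a toy: the state names and toy_tracking_pass() are hypothetical, each comment names the workflow step it stands for, and fetching evicted pages back on demand is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one pass of the VMM loop (hypothetical, not VMM code).
 * Pages are hot (accessible), deactivated (resident, tracked) or evicted. */
enum toy_state { TOY_HOT, TOY_DEACTIVATED, TOY_EVICTED };

#define TOY_MAX_PAGES 64

struct toy_vm {
    enum toy_state pages[TOY_MAX_PAGES];
    size_t nr_pages;
    bool sync_mode;   /* true between UFFDIO_SET_MODE(sync) and (async) */
};

/* Guest touches page i. In async mode a deactivated page simply becomes
 * hot again; this model never touches pages while sync_mode is set. */
static void toy_guest_touch(struct toy_vm *vm, size_t i)
{
    assert(i < vm->nr_pages && !vm->sync_mode);
    if (vm->pages[i] == TOY_DEACTIVATED)
        vm->pages[i] = TOY_HOT;
}

/* One pass: deactivate all, let the guest run, scan, evict the cold set.
 * Returns the number of pages evicted in this pass. */
static size_t toy_tracking_pass(struct toy_vm *vm,
                                const size_t *touched, size_t nr_touched)
{
    size_t cold[TOY_MAX_PAGES], nr_cold = 0, i;

    assert(vm->nr_pages <= TOY_MAX_PAGES);
    for (i = 0; i < vm->nr_pages; i++)      /* UFFDIO_DEACTIVATE(all) */
        if (vm->pages[i] == TOY_HOT)
            vm->pages[i] = TOY_DEACTIVATED;

    for (i = 0; i < nr_touched; i++)        /* sleep(interval) */
        toy_guest_touch(vm, touched[i]);

    for (i = 0; i < vm->nr_pages; i++)      /* PAGEMAP_SCAN */
        if (vm->pages[i] == TOY_DEACTIVATED)
            cold[nr_cold++] = i;

    vm->sync_mode = true;                   /* UFFDIO_SET_MODE(sync) */
    for (i = 0; i < nr_cold; i++)           /* pwrite + MADV_DONTNEED */
        vm->pages[cold[i]] = TOY_EVICTED;
    vm->sync_mode = false;                  /* UFFDIO_SET_MODE(async) */

    return nr_cold;
}
```

With four hot pages and a guest that touches pages 1 and 3 during the interval, pages 0 and 2 end up evicted.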

== NUMA Balancing ==

NUMA balancing scanning is skipped on anonymous VM_UFFD_MINOR VMAs so
that NUMA hinting and userfaultfd do not both use the same PROT_NONE
PTEs. To keep placement data available to the scheduler, NUMA locality
stats are instead fed from the uffd fault path via task_numa_fault().
Shmem VMAs are unaffected: UFFDIO_DEACTIVATE zaps PTEs there, so no
protnone PTEs are involved.
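A sketch of the two NUMA rules above. VM_UFFD_MINOR and task_numa_fault() are real kernel names, but TOY_VM_UFFD_MINOR, struct toy_vma and the counters below are hypothetical stand-ins, not the kernel's data structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy flag value for this sketch only. */
#define TOY_VM_UFFD_MINOR (1ul << 0)

struct toy_vma {
    unsigned long flags;
    bool anonymous;   /* stands in for vma_is_anonymous() */
};

/* Skip NUMA hinting scans on anonymous VM_UFFD_MINOR VMAs, whose PROT_NONE
 * PTEs belong to userfaultfd; shmem VMAs are still scanned because
 * deactivation there zaps PTEs instead. */
static bool toy_numa_should_scan(const struct toy_vma *vma)
{
    return !(vma->anonymous && (vma->flags & TOY_VM_UFFD_MINOR));
}

/* Locality statistics reduced to two counters; in the series the uffd
 * fault path reports faults through task_numa_fault() instead. */
struct toy_numa_stats {
    unsigned long local, remote;
};

static void toy_record_uffd_fault(struct toy_numa_stats *s,
                                  int page_nid, int cpu_nid)
{
    assert(page_nid >= 0 && cpu_nid >= 0);
    if (page_nid == cpu_nid)
        s->local++;
    else
        s->remote++;
}
```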

== Testing ==

The series includes 6 new selftests covering async/sync modes,
PAGEMAP_SCAN cold detection, GUP through protnone, UFFDIO_SET_MODE
toggling, and cleanup on close. All 73 uffd unit tests pass
(including hugetlb) across defconfig, allnoconfig, allmodconfig,
and randomized configs.

Kiryl Shutsemau (Meta) (12):
  userfaultfd: define UAPI constants for anonymous minor faults
  userfaultfd: add UFFD_FEATURE_MINOR_ANON registration support
  userfaultfd: implement UFFDIO_DEACTIVATE ioctl
  userfaultfd: UFFDIO_CONTINUE for anonymous memory
  mm: intercept protnone faults on VM_UFFD_MINOR anonymous VMAs
  userfaultfd: auto-resolve shmem and hugetlbfs minor faults in async
    mode
  sched/numa: skip scanning anonymous VM_UFFD_MINOR VMAs
  userfaultfd: enable UFFD_FEATURE_MINOR_ANON
  mm/pagemap: add PAGE_IS_UFFD_DEACTIVATED to PAGEMAP_SCAN
  userfaultfd: add UFFDIO_SET_MODE for runtime sync/async toggle
  selftests/mm: add userfaultfd anonymous minor fault tests
  Documentation/userfaultfd: document working set tracking

 Documentation/admin-guide/mm/userfaultfd.rst | 141 ++++-
 fs/proc/task_mmu.c                           |  11 +-
 fs/userfaultfd.c                             | 184 +++++-
 include/linux/huge_mm.h                      |   6 +
 include/linux/mm.h                           |   2 +
 include/linux/sched/numa_balancing.h         |   1 +
 include/linux/userfaultfd_k.h                |  21 +-
 include/trace/events/sched.h                 |   3 +-
 include/uapi/linux/fs.h                      |   1 +
 include/uapi/linux/userfaultfd.h             |  40 +-
 kernel/sched/fair.c                          |  13 +
 mm/huge_memory.c                             |  33 +-
 mm/hugetlb.c                                 |   3 +-
 mm/memory.c                                  |  51 +-
 mm/mprotect.c                                |   9 +-
 mm/shmem.c                                   |   3 +-
 mm/userfaultfd.c                             | 164 +++++-
 tools/testing/selftests/mm/uffd-unit-tests.c | 458 +++++++++++++++
 18 files changed, 1096 insertions(+), 48 deletions(-)

-- 
2.51.2
