This is an attempt to use NVDIMM/PMEM as volatile NUMA memory that's
transparent to normal applications and virtual machines.

The code is still in active development. It's provided for early design review.

Key functionalities:

1) create and describe PMEM NUMA node for NVDIMM memory
2) dumb /proc/PID/idle_pages interface, for user space driven hot page 
accounting
3) passive kernel cold page migration in page reclaim path
4) improved move_pages() for active user space hot/cold page migration

(1) is foundation for transparent usage of NVDIMM for normal apps and virtual
machines. (2-4) enable auto placing hot pages in DRAM for better performance.
A user space migration daemon is being built based on this kernel patchset to
make the full vertical solution.

Base kernel is v4.20 . The patches are not suitable for upstreaming in near
future -- some are quick hacks, some others need more works. However they are
complete enough to demo the necessary kernel changes for the proposed app&VM
transparent NVDIMM volatile use model.

The interfaces are far from finalized. They kind of illustrate what would be
necessary for creating a user space driven solution. The exact forms will ask
for more thoughts and inputs. We may adopt HMAT based solution for NUMA node
related interface when they are ready. The /proc/PID/idle_pages interface is
standalone but non-trivial. Before upstreaming some day, it's expected to take
long time to collect various real use cases and feedbacks, so as to refine and
stabilize the format.

Create PMEM numa node

        [PATCH 01/21] e820: cheat PMEM as DRAM

Mark numa node as DRAM/PMEM

        [PATCH 02/21] acpi/numa: memorize NUMA node type from SRAT table
        [PATCH 03/21] x86/numa_emulation: fix fake NUMA in uniform case
        [PATCH 04/21] x86/numa_emulation: pass numa node type to fake nodes
        [PATCH 05/21] mmzone: new pgdat flags for DRAM and PMEM
        [PATCH 06/21] x86,numa: update numa node type
        [PATCH 07/21] mm: export node type {pmem|dram} under /sys/bus/node

Point neighbor DRAM/PMEM to each other

        [PATCH 08/21] mm: introduce and export pgdat peer_node
        [PATCH 09/21] mm: avoid duplicate peer target node

Standalone zonelist for DRAM and PMEM nodes

        [PATCH 10/21] mm: build separate zonelist for PMEM and DRAM node

Keep page table pages in DRAM

        [PATCH 11/21] kvm: allocate page table pages from DRAM
        [PATCH 12/21] x86/pgtable: allocate page table pages from DRAM

/proc/PID/idle_pages interface for virtual machine and normal tasks

        [PATCH 13/21] x86/pgtable: dont check PMD accessed bit
        [PATCH 14/21] kvm: register in mm_struct
        [PATCH 15/21] ept-idle: EPT walk for virtual machine
        [PATCH 16/21] mm-idle: mm_walk for normal task
        [PATCH 17/21] proc: introduce /proc/PID/idle_pages
        [PATCH 18/21] kvm-ept-idle: enable module

Mark hot pages

        [PATCH 19/21] mm/migrate.c: add move_pages(MPOL_MF_SW_YOUNG) flag

Kernel DRAM=>PMEM migration

        [PATCH 20/21] mm/vmscan.c: migrate anon DRAM pages to PMEM node
        [PATCH 21/21] mm/vmscan.c: shrink anon list if can migrate to PMEM

 arch/x86/include/asm/numa.h    |    2 
 arch/x86/include/asm/pgalloc.h |   10 
 arch/x86/include/asm/pgtable.h |    3 
 arch/x86/kernel/e820.c         |    3 
 arch/x86/kvm/Kconfig           |   11 
 arch/x86/kvm/Makefile          |    4 
 arch/x86/kvm/ept_idle.c        |  841 +++++++++++++++++++++++++++++++
 arch/x86/kvm/ept_idle.h        |  116 ++++
 arch/x86/kvm/mmu.c             |   12 
 arch/x86/mm/numa.c             |    3 
 arch/x86/mm/numa_emulation.c   |   30 +
 arch/x86/mm/pgtable.c          |   22 
 drivers/acpi/numa.c            |    5 
 drivers/base/node.c            |   21 
 fs/proc/base.c                 |    2 
 fs/proc/internal.h             |    1 
 fs/proc/task_mmu.c             |   54 +
 include/linux/mm_types.h       |   11 
 include/linux/mmzone.h         |   38 +
 mm/mempolicy.c                 |   14 
 mm/migrate.c                   |   13 
 mm/page_alloc.c                |   77 ++
 mm/pagewalk.c                  |    1 
 mm/vmscan.c                    |   38 +
 virt/kvm/kvm_main.c            |    3 
 25 files changed, 1306 insertions(+), 29 deletions(-)

V1 patches: https://lkml.org/lkml/2018/9/2/13

Regards,
Fengguang

Reply via email to