Before this patch set, numatune has only three memory modes: strict, interleave and preferred. These memory policies are ultimately set by the mbind() system call.
Memory policy could be 'hard coded' into the kernel, but none of the above policies fits our requirement in this case. mbind() supports the default memory policy (MPOL_DEFAULT), but it requires a NULL nodemask, so restricting the allowed memory nodes has to be done by cgroups in this case. We therefore introduce a new option for mode in numatune named 'restrictive'.

  <numatune>
    <memory mode="restrictive" nodeset="1-4,^3"/>
    <memnode cellid="0" mode="restrictive" nodeset="1"/>
    <memnode cellid="2" mode="restrictive" nodeset="2"/>
  </numatune>

The config above means we only use cgroups to restrict the allowed memory nodes and do not set any specific memory policy explicitly.

There is a concrete use case for this new "restrictive" mode: a kernel feature that is not merged yet, which we call memory tiering (https://lwn.net/Articles/802544/). If memory tiering is enabled on the host, DRAM is the top tier memory and PMEM (persistent memory) is the second tier memory; PMEM is shown as a NUMA node without CPUs. Pages can be migrated between the DRAM node and the PMEM node based on DRAM pressure and how cold/hot they are. *This memory policy* is implemented in the kernel. So we need a default mode here, but from libvirt's perspective the "default" mode is "strict", which is not the MPOL_DEFAULT (https://man7.org/linux/man-pages/man2/mbind.2.html) defined in the kernel. To make memory tiering work well, the cgroups setting is also necessary, since it ensures that pages can only be migrated between the DRAM and PMEM nodes that we specified (NUMA affinity support).

Just using cgroups with multiple nodes in the nodeset lets the kernel decide on which node (out of those in the restricted set) to allocate, whereas specifying "strict" basically allocates sequentially (on the first node until it is full, then on the next one, and so on).

In short, if a user requires the default mode (MPOL_DEFAULT), that is, they want the kernel to decide the memory allocation while cgroups restrict the memory nodes, the "restrictive" mode will be useful.
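
To make the nodeset syntax in the example concrete, here is a minimal Python sketch of how a string such as "1-4,^3" expands into the set of allowed memory nodes. The helpers parse_nodeset() and format_nodeset() are hypothetical names used only for illustration; this is not libvirt's parser, and it assumes that elements are processed left to right and that "^N" removes a single node that was added earlier.

```python
def parse_nodeset(spec):
    """Expand a nodeset string such as "1-4,^3" into a set of node IDs.

    Hypothetical helper for illustration only, not libvirt code.
    Assumption: elements are applied left to right, and "^N" removes
    the single node N from the set built so far.
    """
    nodes = set()
    for token in spec.split(","):
        token = token.strip()
        if not token:
            raise ValueError("empty element in nodeset %r" % spec)
        if token.startswith("^"):
            # Exclusion: drop one node from the set built so far.
            nodes.discard(int(token[1:]))
        elif "-" in token:
            # Inclusive range "first-last".
            first, last = (int(part) for part in token.split("-", 1))
            if first > last:
                raise ValueError("invalid range %r" % token)
            nodes.update(range(first, last + 1))
        else:
            nodes.add(int(token))
    return nodes


def format_nodeset(nodes):
    """Collapse a set of node IDs into a compact list such as "1-2,4"."""
    ordered = sorted(nodes)
    parts = []
    i = 0
    while i < len(ordered):
        j = i
        # Extend the run while the next ID is consecutive.
        while j + 1 < len(ordered) and ordered[j + 1] == ordered[j] + 1:
            j += 1
        if i == j:
            parts.append(str(ordered[i]))
        else:
            parts.append("%d-%d" % (ordered[i], ordered[j]))
        i = j + 1
    return ",".join(parts)


if __name__ == "__main__":
    allowed = parse_nodeset("1-4,^3")
    print(sorted(allowed))          # nodes 1, 2 and 4
    print(format_nodeset(allowed))  # compact list form of the same set
```

With the nodeset from the example, the domain would be restricted to nodes 1, 2 and 4, and in "restrictive" mode the kernel is free to choose among them for each allocation.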

BR,
Luyao

Luyao Zhong (3):
  docs: add docs for 'restrictive' option for mode in numatune
  schema: add 'restrictive' config option for mode in numatune
  qemu: add parser and formatter for 'restrictive' mode in numatune

 docs/formatdomain.rst                         |  7 +++-
 docs/schemas/domaincommon.rng                 |  2 +
 include/libvirt/libvirt-domain.h              |  1 +
 src/conf/numa_conf.c                          |  9 ++++
 src/qemu/qemu_command.c                       |  6 ++-
 src/qemu/qemu_process.c                       | 27 ++++++++++++
 src/util/virnuma.c                            |  3 ++
 .../numatune-memnode-invalid-mode.err         |  1 +
 .../numatune-memnode-invalid-mode.xml         | 33 +++++++++++++++
 ...emnode-restrictive-mode.x86_64-latest.args | 38 +++++++++++++++++
 .../numatune-memnode-restrictive-mode.xml     | 33 +++++++++++++++
 tests/qemuxml2argvtest.c                      |  2 +
 ...memnode-restrictive-mode.x86_64-latest.xml | 41 +++++++++++++++++++
 tests/qemuxml2xmltest.c                       |  1 +
 14 files changed, 201 insertions(+), 3 deletions(-)
 create mode 100644 tests/qemuxml2argvdata/numatune-memnode-invalid-mode.err
 create mode 100644 tests/qemuxml2argvdata/numatune-memnode-invalid-mode.xml
 create mode 100644 tests/qemuxml2argvdata/numatune-memnode-restrictive-mode.x86_64-latest.args
 create mode 100644 tests/qemuxml2argvdata/numatune-memnode-restrictive-mode.xml
 create mode 100644 tests/qemuxml2xmloutdata/numatune-memnode-restrictive-mode.x86_64-latest.xml

-- 
2.25.4