On Thu, Nov 27, 2008 at 11:23:21PM +0100, Andre Przywara wrote:
> Hi,
> 
> this patch series introduces multiple NUMA nodes support within KVM guests.
> This will improve the performance of guests which are bigger than one 
> node (number of VCPUs and/or amount of memory) and also allows better 
> balancing by making better use of each node's memory.
> It also improves the single node case by pinning a guest to that node and
> avoiding accesses to remote memory from its VCPUs.
> 
> The user (or better: management application) specifies the host nodes
> the guest should use: -nodes 2,3 would create a two-node guest mapped to
> nodes 2 and 3 on the host. These numbers are handed over to libnuma:
> VCPUs are pinned to the nodes and the allocated guest memory is bound to
> its respective node.

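(For context, the binding described above maps onto libnuma calls roughly
like this; just a sketch of the public libnuma API, not code from the patch:)

  /* Context sketch: bind the calling vCPU thread and its slice of guest
   * RAM to one host node via libnuma. */
  #include <stddef.h>
  #include <numa.h>

  static void bind_vcpu_and_ram(int host_node, void *ram, size_t ram_size)
  {
      if (numa_available() < 0)
          return;                             /* no NUMA support at all */

      /* run this thread only on the node's CPUs */
      numa_run_on_node(host_node);
      /* back this chunk of guest RAM with memory from that node */
      numa_tonode_memory(ram, ram_size, host_node);
  }
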
I'm wondering whether this is the right level of granularity/expressiveness.
It is basically encoding three pieces of information:

 - Number of NUMA nodes to expose to guest
 - Which host nodes to use
 - Which host nodes to pin vCPUs to.

The latter item can actually already be done by management applications
without a command line flag, with a greater level of flexibility than
this allows. In libvirt we start up KVM with -S, so it's initially
stopped, then run 'info cpus' in the monitor. This gives us the list of
thread IDs for each vCPU. We then use sched_setaffinity to control the
placement of each vCPU onto pCPUs. KVM could pick which host nodes to
use for allocation based on which nodes its vCPUs are pinned to.
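
For illustration, the affinity call amounts to something like this (just
a sketch; 'tid' and 'pcpus' stand in for the thread ID reported by
'info cpus' and whatever placement the management application chose):

  #define _GNU_SOURCE
  #include <sched.h>
  #include <stdio.h>

  /* Pin one vCPU thread, identified by the TID from "info cpus",
   * onto a set of host pCPUs. */
  static int pin_vcpu_thread(pid_t tid, const int *pcpus, int npcpus)
  {
      cpu_set_t mask;
      int i;

      CPU_ZERO(&mask);
      for (i = 0; i < npcpus; i++)
          CPU_SET(pcpus[i], &mask);

      /* sched_setaffinity() accepts a thread ID here just like a PID */
      if (sched_setaffinity(tid, sizeof(mask), &mask) < 0) {
          perror("sched_setaffinity");
          return -1;
      }
      return 0;
  }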

Since NUMA support is going to be optional, we can't rely on using
-nodes for CPU placement, and I'd rather not have to write different
codepaths for initial placement for NUMA vs non-NUMA enabled KVM.
People not using a mgmt tool may also choose to control host node 
placement using numactl to launch KVM. They would still need to be 
able to say how many nodes the guest is given.

Finally, this CLI arg does not allow you to say which vCPU is placed
in which vNUMA node, or how much of the guest's RAM is allocated to
each guest node.

Thus I think it might be desirable to have the CLI argument focus
on describing the guest NUMA configuration, rather than having it
encode host & guest NUMA info in one go. You'd also want a way to
describe vCPU <-> vNUMA node placement for vCPUs which are not yet
present - e.g. so you can start with 4 vCPUs and hotplug another 12
later. You can't assume you want all 4 initial vCPUs in the same node,
nor assume that you want all 4 spread evenly.

So some examples off the top of my head for alternate syntax for the
guest topology:

 * Create 4 nodes, split RAM & 8 initial vCPUs equally across
   nodes, and 8 unplugged vCPUs equally too

    -m 1024 -smp 8 -nodes 4

 * Create 4 nodes, split RAM equally across nodes, 8 initial vCPUs
   on first 2 nodes, and 8 unplugged vCPUs across other 2 nodes.

    -m 1024 -smp 8 -nodes 4,cpu:0-3;4-7;8-11;12-15

 * Create 4 nodes, putting all RAM in first 2 nodes, split 8
   initial vCPUs equally across nodes

    -m 1024 -smp 8 -nodes 4,mem:512;512

 * Create 4 nodes, putting all RAM in first 2 nodes, 8 initial vCPUs
   on first 2 nodes, and 8 unplugged vCPUs across other 2 nodes.

    -m 1024 -smp 8 -nodes 4,mem:512;512,cpu:0-3;4-7;8-11;12-15
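
As a rough illustration of how such an argument might be parsed (the
struct, names and limits below are invented for the sketch, not taken
from the patch):

  #define _GNU_SOURCE
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  #define MAX_NODES 64

  struct numa_cfg {
      int nnodes;
      long mem_mb[MAX_NODES];         /* 0 = split remaining RAM equally */
      char cpus[MAX_NODES][32];       /* e.g. "0-3"; "" = spread evenly  */
  };

  static int parse_nodes_arg(const char *arg, struct numa_cfg *cfg)
  {
      char *copy = strdup(arg), *opt, *save = NULL;
      int i;

      memset(cfg, 0, sizeof(*cfg));

      /* leading node count, e.g. the "4" in "4,mem:512;512,..." */
      opt = strtok_r(copy, ",", &save);
      if (!opt || (cfg->nnodes = atoi(opt)) < 1 || cfg->nnodes > MAX_NODES)
          goto err;

      /* optional "mem:..." and "cpu:..." groups */
      while ((opt = strtok_r(NULL, ",", &save)) != NULL) {
          char *list = strchr(opt, ':');
          if (!list)
              goto err;
          list++;

          if (strncmp(opt, "mem:", 4) == 0) {
              for (i = 0; i < cfg->nnodes && *list; i++) {
                  cfg->mem_mb[i] = strtol(list, &list, 10);
                  if (*list == ';')
                      list++;
              }
          } else if (strncmp(opt, "cpu:", 4) == 0) {
              for (i = 0; i < cfg->nnodes && *list; i++) {
                  size_t n = strcspn(list, ";");
                  snprintf(cfg->cpus[i], sizeof(cfg->cpus[i]),
                           "%.*s", (int)n, list);
                  list += n;
                  if (*list == ';')
                      list++;
              }
          } /* further groups, e.g. host pinning, could follow the same pattern */
      }
      free(copy);
      return 0;
  err:
      free(copy);
      return -1;
  }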

We could optionally also include host node pinning for convenience:

 * Create 4 nodes, putting all RAM in first 2 nodes, split 8
   initial vCPUs equally across nodes, pin to host nodes 5-8

    -m 1024 -smp 8 -nodes 4,mem:512;512,pin:5;6;7;8

If no 'pin' is given, KVM could query its current host pCPU affinity to
determine which NUMA nodes it had been launched on.
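
Something along these lines could do that (a sketch, assuming a libnuma
new enough to provide numa_node_of_cpu(); error handling trimmed):

  #define _GNU_SOURCE
  #include <sched.h>
  #include <stdio.h>
  #include <numa.h>

  /* Infer which host nodes we were started on (e.g. under numactl or
   * after sched_setaffinity) from the current CPU affinity mask. */
  static void report_inherited_nodes(void)
  {
      cpu_set_t mask;
      int cpu, node;

      if (numa_available() < 0 ||
          sched_getaffinity(0, sizeof(mask), &mask) < 0)
          return;

      for (cpu = 0; cpu < numa_num_configured_cpus(); cpu++) {
          if (!CPU_ISSET(cpu, &mask))
              continue;
          node = numa_node_of_cpu(cpu);
          if (node >= 0)
              printf("pCPU %d -> host node %d\n", cpu, node);
      }
  }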


>                         Since libnuma seems not to be installed
> everywhere, the user has to enable this via configure --enable-numa

It'd be nicer if the configure script just 'did the right thing'. So if
neither --enable-numa nor --disable-numa is given, it should probe for
availability and automatically enable it if found, disable if missing.
If --enable-numa is given, it should probe and abort if not found. If
--disable-numa is given it should not enable anything.

Regards,
Daniel
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|