On Thu, Nov 27, 2008 at 11:23:21PM +0100, Andre Przywara wrote:
> Hi,
>
> this patch series introduces multiple NUMA nodes support within KVM guests.
> This will improve the performance of guests which are bigger than one
> node (number of VCPUs and/or amount of memory) and also allows better
> balancing by taking better usage of each node's memory.
> It also improves the one node case by pinning a guest to this node and
> avoiding access of remote memory from one VCPU.
>
> The user (or better: management application) specifies the host nodes
> the guest should use: -nodes 2,3 would create a two node guest mapped to
> node 2 and 3 on the host. These numbers are handed over to libnuma:
> VCPUs are pinned to the nodes and the allocated guest memory is bound to
> it's respective node.
I'm wondering whether this is the right level of granularity/expressiveness.
It is basically encoding 3 pieces of information:

 - Number of NUMA nodes to expose to the guest
 - Which host nodes to use
 - Which host nodes to pin vCPUs to

The latter item can actually already be done by management applications
without a command line flag, with a greater level of flexibility than this
allows. In libvirt we start up KVM with -S, so it's initially stopped, then
run 'info cpus' in the monitor. This gives us the list of thread IDs for
each vCPU. We then use sched_setaffinity to control the placement of each
vCPU onto pCPUs. KVM could pick which host nodes to use for allocation
based on which nodes its vCPUs are pinned to.

Since NUMA support is going to be optional, we can't rely on using -nodes
for CPU placement, and I'd rather not have to write different codepaths
for initial placement for NUMA vs non-NUMA enabled KVM. People not using
a mgmt tool may also choose to control host node placement using numactl
to launch KVM. They would still need to be able to say how many nodes the
guest is given.

This CLI arg also does not allow you to say which vCPU is placed in which
vNUMA node, or how much of the guest's RAM is allocated to each guest
node. Thus I think it might be desirable to have the CLI argument focus
on describing the guest NUMA configuration, rather than having it encode
host & guest NUMA info in one go.

Finally you'd also want a way to describe vCPU <-> vNUMA node placement
for vCPUs which are not yet present - eg so you can start with 4 vCPUs
and hotplug another 12 later. You can't assume you want all 4 initial
vCPUs in the same node, nor assume that you want all 4 spread evenly.
So some examples off the top of my head for alternate syntax for the
guest topology:

 * Create 4 nodes, split RAM & 8 initial vCPUs equally across nodes,
   and 8 unplugged vCPUs equally too

     -m 1024 -smp 8 -nodes 4

 * Create 4 nodes, split RAM equally across nodes, 8 initial vCPUs on
   first 2 nodes, and 8 unplugged vCPUs across other 2 nodes

     -m 1024 -smp 8 -nodes 4,cpu:0-3;4-7;8-11;12-15

 * Create 4 nodes, putting all RAM in first 2 nodes, split 8 initial
   vCPUs equally across nodes

     -m 1024 -smp 8 -nodes 4,mem:512;512

 * Create 4 nodes, putting all RAM in first 2 nodes, 8 initial vCPUs on
   first 2 nodes, and 8 unplugged vCPUs across other 2 nodes

     -m 1024 -smp 8 -nodes 4,mem:512;512,cpu:0-3;4-7;8-11;12-15

We could optionally also include host node pinning for convenience:

 * Create 4 nodes, putting all RAM in first 2 nodes, split 8 initial
   vCPUs equally across nodes, pin to host nodes 5-8

     -m 1024 -smp 8 -nodes 4,mem:512;512,pin:5;6;7;8

If no 'pin' is given, KVM could query its current host pCPU pinning to
determine what NUMA nodes it had been launched on.

> Since libnuma seems not to be installed
> everywhere, the user has to enable this via configure --enable-numa

It'd be nicer if the configure script just 'did the right thing'. If
neither --enable-numa nor --disable-numa is given, it should probe for
availability and automatically enable it if found, disable it if missing.
If --enable-numa is given, it should probe and abort if not found. If
--disable-numa is given it'd not enable anything.

Regards,
Daniel
-- 
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html