Re: [libvirt] [Gluster-devel] [PATCH v3 UPDATED 1/2] Qemu/Gluster: Add Gluster protocol as supported network disk formats.
On Fri, Oct 26, 2012 at 10:27 PM, Harsh Prateek Bora ha...@linux.vnet.ibm.com wrote:
> @@ -1624,7 +1626,15 @@
>       <td> one of the sheepdog servers (default is localhost:7000) </td>
>       <td> zero or one </td>
>     </tr>
> +    <tr>
> +      <td> gluster </td>
> +      <td> a server running glusterd daemon </td>

It's actually the server which contains the volume file specification of the gluster volume.

Regards,
Bharata.
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
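For readers following the thread, the `<host>` element being documented above ends up inside a network-disk definition in the domain XML. A hedged sketch of that shape (the volume name, image path, and server hostname are illustrative; 24007 is the conventional glusterd port):

```xml
<disk type='network' device='disk'>
  <driver name='qemu' type='raw'/>
  <!-- 'name' is volume/image; the host named here must serve the
       volfile for that gluster volume, per Bharata's note above -->
  <source protocol='gluster' name='testvol/disk.img'>
    <host name='gluster-server.example.com' port='24007'/>
  </source>
  <target dev='vda' bus='virtio'/>
</disk>
```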
Re: [libvirt] [RFC PATCH v1 1/2] Qemu/Gluster: Add Gluster protocol as supported network disk formats.
On Wed, Sep 5, 2012 at 8:45 PM, Eric Blake ebl...@redhat.com wrote:
> On 09/05/2012 09:08 AM, Bharata B Rao wrote:
>> On Wed, Sep 5, 2012 at 7:03 PM, Jiri Denemark jdene...@redhat.com wrote:
>>>> @@ -1042,6 +1043,13 @@
>>>>         <attribute name="port">
>>>>           <ref name="unsignedInt"/>
>>>>         </attribute>
>>>> +       <attribute name="transport">
>>>> +         <choice>
>>>> +           <value>socket</value>
>>>> +           <value>unix</value>
>>>> +           <value>rdma</value>
>>>
>>> This could be a bit confusing as "socket" is too generic; after all, "unix" is also a socket. Could we change the values to "tcp", "unix", "rdma" or something similar, depending on what "socket" was supposed to mean?
>>
>> That is how gluster calls it and hence I am using the same in QEMU, and the same is true here too. This is something for gluster developers to decide, if they want to change "socket" to something more specific like "tcp" as you suggest.
>
> Just because gluster calls it a confusing name does not mean we have to repeat the confusion in libvirt - it is feasible to have a mapping where we name it 'tcp' in the XML but map that to 'socket' in the command line that eventually reaches gluster. The question then becomes whether using sensible naming in libvirt, but no longer directly mapped to underlying gluster naming, will be the cause of its own set of headaches.

Vijay - would really like to have your inputs here...

- While the transport-type for a volume is shown as tcp in "gluster volume info", libgfapi forces me to use transport=socket to access the same volume from QEMU. So does "socket" mean tcp really? If so, should I just switch over to using transport=tcp from QEMU? If not, can you explain a bit about the difference b/n "socket" and "tcp" transport types?

- Also, apart from socket (or tcp?), rdma and unix, are there any other transport options that QEMU should care about?

- Are rdma and unix transport types operational at the moment? If not, do you see them being used in gluster any time in the future? The reason behind asking this is to check if we are spending effort in defining semantics in QEMU for a transport type that is never going to be used in gluster. Also, I see that "gluster volume create" supports tcp and rdma but doesn't list unix as an option.

Regards,
Bharata.
Re: [libvirt] [RFC PATCH v1 1/2] Qemu/Gluster: Add Gluster protocol as supported network disk formats.
On Wed, Sep 5, 2012 at 7:03 PM, Jiri Denemark jdene...@redhat.com wrote:
>> @@ -1042,6 +1043,13 @@
>>         <attribute name="port">
>>           <ref name="unsignedInt"/>
>>         </attribute>
>> +       <attribute name="transport">
>> +         <choice>
>> +           <value>socket</value>
>> +           <value>unix</value>
>> +           <value>rdma</value>
>
> This could be a bit confusing as "socket" is too generic; after all, "unix" is also a socket. Could we change the values to "tcp", "unix", "rdma" or something similar, depending on what "socket" was supposed to mean?

That is how gluster calls it and hence I am using the same in QEMU, and the same is true here too. This is something for gluster developers to decide, if they want to change "socket" to something more specific like "tcp" as you suggest.

Regards,
Bharata.
[libvirt] [PATCH 2/6] Add New address type spapr-vio to domain.rng
Original patch by Bharata. Updated to use {1,16} in spaprvioReg based on example from Eric Blake.

Signed-off-by: Bharata B Rao bhar...@linux.vnet.ibm.com
Signed-off-by: Prerna Saxena pre...@linux.vnet.ibm.com
Signed-off-by: Michael Ellerman mich...@ellerman.id.au
---
 docs/schemas/domaincommon.rng | 18 ++++++++++++++++++
 1 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng
index 27ec1da..553a6f0 100644
--- a/docs/schemas/domaincommon.rng
+++ b/docs/schemas/domaincommon.rng
@@ -2199,6 +2199,13 @@
       <ref name="usbPort"/>
     </attribute>
   </define>
+  <define name="spaprvioaddress">
+    <optional>
+      <attribute name="reg">
+        <ref name="spaprvioReg"/>
+      </attribute>
+    </optional>
+  </define>
   <define name="pciaddress">
     <optional>
       <attribute name="domain">
@@ -2577,6 +2584,12 @@
         </attribute>
         <ref name="usbportaddress"/>
       </group>
+      <group>
+        <attribute name="type">
+          <value>spapr-vio</value>
+        </attribute>
+        <ref name="spaprvioaddress"/>
+      </group>
     </choice>
   </element>
 </define>
@@ -2866,4 +2879,9 @@
       <param name="pattern">[a-zA-Z0-9_\.:]+</param>
     </data>
   </define>
+  <define name="spaprvioReg">
+    <data type="string">
+      <param name="pattern">(0x)?[0-9a-fA-F]{1,16}</param>
+    </data>
+  </define>
 </grammar>
--
1.7.7.3
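The spaprvioReg pattern above accepts an optional "0x" prefix followed by 1 to 16 hex digits. A quick Python check of that pattern (illustrative only, not part of the patch):

```python
import re

# Pattern from the spaprvioReg define: optional "0x" prefix, 1-16 hex digits.
SPAPR_VIO_REG = re.compile(r'(0x)?[0-9a-fA-F]{1,16}')

def is_valid_reg(value: str) -> bool:
    """Return True if value matches the domain.rng spaprvioReg pattern."""
    return SPAPR_VIO_REG.fullmatch(value) is not None

print(is_valid_reg("0x30000000"))     # True
print(is_valid_reg("30000000"))       # True, the prefix is optional
print(is_valid_reg("0x"))             # False, at least one digit required
print(is_valid_reg("f" * 17))         # False, more than 16 digits
```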
Re: [libvirt] [PATCH v2 1/2] XML definitions for guest NUMA and parsing routines
On Wed, Dec 07, 2011 at 03:49:59PM +1100, Michael Ellerman wrote:
> On Fri, 2011-11-11 at 18:21 +0530, Bharata B Rao wrote:
>> XML definitions for guest NUMA and parsing routines.
>>
>> From: Bharata B Rao bhar...@linux.vnet.ibm.com
>>
>> This patch adds XML definitions for guest NUMA specification and contains routines to parse the same. The guest NUMA specification looks like this:
>>
>> <cpu>
>>   ...
>>   <topology sockets='2' cores='4' threads='2'/>
>>   <numa>
>>     <cell cpus='0-7' memory='512000'/>
>>     <cell cpus='8-15' memory='512000'/>
>>   </numa>
>>   ...
>> </cpu>
>
> Hi Bharata,
>
> I realise I'm a bit late on this, but I'm just catching up on the list traffic.
>
> Firstly, why is the XML tag called "cell"? It seems to represent what we would normally call a node in terms of NUMA.

I initially started out with "node", but in libvirt a node is referred to as a cell. Hence I stuck to "cell" to be consistent.

> Also, does the parser support disjoint ranges for cpus and memory? Or do we not need that complexity for some reason? For example, what if I have a topology that looks like:
>
> Node 0:
>   CPUs: 0-3,8-11

This is supported, and since we use the same parsing routine that parses cpuset, we support fancy stuff like cpus='0-4,^3,8-11'. But such things aren't supported by QEMU currently.

>   MEM : 0-1G,2G-3G

QEMU currently doesn't support such a construct. Do we really want this in guest topology? There are some weird NUMA topologies in the real world which aren't supported as of now; we wanted to start simple and, if needed, support further extensions. (Refer to discussions on my v0 post.)

Regards,
Bharata.
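The cpuset syntax mentioned in the reply above (ranges, commas, and ^ exclusions applied left to right) can be modeled in a few lines. This is an illustrative Python sketch, not libvirt's actual parser, and it only handles the single-CPU form of `^`:

```python
def parse_cpuset(spec: str) -> set:
    """Model of libvirt's cpuset syntax: "0-4,^3,8-11" means CPUs
    0,1,2,4,8,9,10,11. Entries apply left to right; "^N" removes N."""
    cpus = set()
    for entry in spec.split(','):
        entry = entry.strip()
        if entry.startswith('^'):
            cpus.discard(int(entry[1:]))      # exclusion of one CPU
        elif '-' in entry:
            first, last = entry.split('-')    # inclusive range
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(entry))              # single CPU
    return cpus

print(sorted(parse_cpuset("0-4,^3,8-11")))  # [0, 1, 2, 4, 8, 9, 10, 11]
```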
Re: [libvirt] [RFC v4 PATCH 5/5] PowerPC : Add ppc64 specific definitions to domain.rng
On Thu, Dec 01, 2011 at 01:47:47PM -0500, Stefan Berger wrote:
> On 12/01/2011 01:08 PM, Prerna Saxena wrote:
>> From: Bharata B Rao bhar...@linux.vnet.ibm.com
>> Date: Thu, 1 Dec 2011 21:21:34 +0530
>> Subject: [PATCH 5/5] Add ppc64 specific definitions to domain.rng
>>
>> ppc64 as new arch type and pseries as new machine type are added under <os>...</os>. New address type spapr-vio is added.
>>
>> Signed-off-by: Bharata B Rao bhar...@linux.vnet.ibm.com
>> Signed-off-by: Prerna Saxena pre...@linux.vnet.ibm.com
>> ---
>
> ACK. Test case(s) for the test/qemuxml2argvtest{.c} would be nice, too.

I am working on the testcase, will send it shortly.

Regards,
Bharata.
Re: [libvirt] [PATCHv3] conf: don't modify cpu set string during parsing
On Fri, Nov 18, 2011 at 11:31:12AM -0700, Eric Blake wrote:
> None of the callers cared if str was updated to point to the next byte after the parsed cpuset; simplifying this results in quite a few code simplifications. Additionally, virCPUDefParseXML was strdup()'ing a malloc()'d string; avoiding a memory copy resulted in less code.

Changes to virCPUDefParseXML look good. The <numa>...</numa> XML section is parsed correctly and qemu -numa options are generated correctly after this change.

Regards,
Bharata.
Re: [libvirt] [PATCH 1/3] XML definitions for guest NUMA
On Mon, Nov 07, 2011 at 11:35:27AM -0700, Eric Blake wrote: On 11/06/2011 06:57 AM, Bharata B Rao wrote: XML definitions for guest NUMA and parsing routines. From: Bharata B Raobhar...@linux.vnet.ibm.com This patch adds XML definitions for guest NUMA specification and contains routines to parse the same. The guest NUMA specification looks like this: cpu ... topology sockets='2' cores='4' threads='2'/ numa cell cpus='0-7' mems='512000'/ cell cpus='8-15' mems='512000'/ /numa ... /cpu Signed-off-by: Bharata B Raobhar...@linux.vnet.ibm.com --- +p + Guest NUMA topology can be specifed usingcodenuma/code element. +span class=sinceSince X.X.X/span Let's just put 0.9.8 here. It's easier at feature freeze time to grep and replace a concrete 0.9.8 into a different numbering scheme (0.10.0, 1.0.0, ?) if we decide on something different than 0.9.8, than it is to remember to also search for X.X.X. Ok, changed to 0.9.8 in v3. +/p + +pre + ... +lt;cpugt; +... +lt;numagt; +lt;cell cpus='0-3' mems='512000'/gt; +lt;cell cpus='4-7' mems='512000'/gt; I understand 'cpus' (a valid word meaning multiple cpu units), but 'mems' seems odd; I think it would be better naming this attribute 'memory' to match our memory element at the top level. Just because qemu's command line names the option mems= doesn't mean we should be stuck making our XML inconsistent. Changed to memory. +lt;/numagt; +... +lt;/cpugt; + .../pre + +p + Eachcodecell/code element specifies a NUMA cell or a NUMA node. +codecpus/code specifies the CPU or range of CPUs that are part of + the node.codemems/code specifies the node memory in kilobytes + (i.e. blocks of 1024 bytes). Each cell or node is assigned cellid + or nodeid in the increasing order starting from 0. I agree with doing things in 1024-byte blocks [1], since memory and currentMemory are also in that unit. 
[1] 1024-byte blocks is technically kibibytes, not kilobytes; but you're copying from existing text, so at least we're consistent, not to mention fitting right in with the wikipedia complaint that KiB has had slow adoption by the computer industry: https://secure.wikimedia.org/wikipedia/en/wiki/Kibibyte :) +/p + +p + This guest NUMA specification translates tocode-numa/code command + line option for QEMU/KVM. For the above example, the following QEMU + command line option is generated: +code-numa node,nodeid=0,cpus=0-3,mems=512000 -numa node,nodeid=1,cpus=4-7,mems=512000/code This paragraph is not necessary. We don't need to give one hypervisor-specific example of how the XML is translated; it is sufficient to simply document the XML semantics in a way that can be implemented by any number of hypervisors. Ok, removed this paragraph. + +define name=numaCell +element name=cell +attribute name=cpus +ref name=Cellcpus/ Typically, ref names start with a lower case letter. Ok, changed. +/attribute +attribute name=mems +ref name=Cellmems/ +/attribute Is it possible for these attributes to be optional? That is, on the qemu line, can I specify cpus= but not mems=, or mems= but not cpus=? If so, then the attributes need to be optional in the schema, and the code behave with sane defaults when one of the two attributes is not present. Actually qemu allows both cpus and mem to be optional. But if we allow that, we will have specifications like this: numa cell / cell / /numa That looks ugly. Either both of them should allowed to be optional or none of them, I am going for the latter specification. IOW lets make both of them mandatory in a numa cell. 
@@ -2745,4 +2767,14 @@ param name=pattern[a-zA-Z0-9_\.:]+/param /data /define +define name=Cellcpus +data type=string +param name=pattern([0-9]+(-[0-9]+)?|\^[0-9]+)(,([0-9]+(-[0-9]+)?|\^[0-9]+))*/param This looks like a repeat of define name=cpuset; if so, let's reuse that define instead of making a new one (and if not, why are we introducing yet another syntax?). Fixed. +/data +/define +define name=Cellmems +data type=unsignedInt +param name=pattern[0-9]+/param Likewise, this looks like a repeat of define name=memoryKB, so lets reuse that. Done. @@ -109,6 +114,19 @@ no_memory: return NULL; } +static int +virCPUDefNumaCPUs(virCPUDefPtr def) +{ +int i, j, ncpus = 0; + +for (i = 0; i def-ncells; i++) { +for (j = 0; j VIR_DOMAIN_CPUMASK_LEN; j++) { +if (def-cells[i].cpumask[j]) +ncpus++; +} +} Can this loop be made any faster by using count_one_bits? Yes provided we start storing CPUs in bitmasks rather than char arrays. But I saw other parts of the code (like cpuset) storing CPUs in array and hence used the same for numa too. In any case I got rid of this since we can
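The exchange above ends on whether virCPUDefNumaCPUs could count CPUs faster with count_one_bits; Bharata notes that libvirt stores the mask as a char array (one byte per CPU), so it counts by iterating. A small Python model contrasting the two representations (the names and the mask length are illustrative):

```python
CPUMASK_LEN = 16  # stand-in for VIR_DOMAIN_CPUMASK_LEN

def count_cpus_char_array(cpumask) -> int:
    """Byte-per-CPU representation, as in virCPUDefNumaCPUs:
    count by walking the array."""
    return sum(1 for byte in cpumask if byte)

def count_cpus_bitmask(mask: int) -> int:
    """Integer-bitmask representation: a single popcount suffices."""
    return bin(mask).count("1")

# CPUs 0-3 and 8-11 present, in both representations:
array_mask = [1 if (0 <= i < 4 or 8 <= i < 12) else 0
              for i in range(CPUMASK_LEN)]
bit_mask = 0b0000111100001111

print(count_cpus_char_array(array_mask))  # 8
print(count_cpus_bitmask(bit_mask))       # 8
```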
Re: [libvirt] [PATCH 3/3] qemu: Generate -numa command line option
On Mon, Nov 07, 2011 at 11:47:10AM -0700, Eric Blake wrote: On 11/06/2011 06:59 AM, Bharata B Rao wrote: qemu: Generate -numa option From: Bharata B Raobhar...@linux.vnet.ibm.com Add routines to generate -numa QEMU command line option based on numa .../numa XML specifications. Signed-off-by: Bharata B Raobhar...@linux.vnet.ibm.com +static int +qemuBuildNumaCPUArgStr(char *cpumask, virBufferPtr buf) +{ +int i, first, last; +int cpuSet = 0; + What happens if cpumask is all 0? +for (i = 0; i VIR_DOMAIN_CPUMASK_LEN; i++) { +if (cpumask[i]) { This if branch is skipped, +if (cpuSet) +last = i; +else { +first = last = i; +cpuSet = 1; +} +} else { +if (!cpuSet) +continue; so this branch always continues, +if (first == last) +virBufferAsprintf(buf, %d,, first); +else +virBufferAsprintf(buf, %d-%d,, first, last); +cpuSet = 0; +} +} + +if (cpuSet) { and this if is skipped, +if (first == last) +virBufferAsprintf(buf, %d,, first); +else +virBufferAsprintf(buf, %d-%d,, first, last); +} + +/* Remove the trailing comma */ +return virBufferTruncate(buf, 1); meaning that nothing was appended to buf, and you are now stripping unknown text, rather than a comma you just added. Do we need a sanity check to ensure that the cpumask specifies at least one cpu? And if so, would that mask be better here, or up front at the xml parsing time? Good catch. I am now ensuring that this doesn't get a mask with no cpus. +} + +static int +qemuBuildNumaArgStr(const virDomainDefPtr def, virCommandPtr cmd) +{ +int i; +char *node; +virBuffer buf = VIR_BUFFER_INITIALIZER; + +for (i = 0; i def-cpu-ncells; i++) { +virCommandAddArg(cmd, -numa); +virBufferAsprintf(buf, %s, node); More efficient as virBufferAddLit(buf, node), or... Right. +virBufferAsprintf(buf, ,nodeid=%d, def-cpu-cells[i].cellid); merge these two into a single: virBufferAsprintf(buf, node,nodeid=%d, ...); +virBufferAsprintf(buf, ,cpus=); Again, with no % in the format string, it is more efficient to use virBufferAddLit. 
+ +if (qemuBuildNumaCPUArgStr(def-cpu-cells[i].cpumask,buf)) Generally, we prefer an explicit 0 comparison when checking for failure. Ok as you prefer :) +goto error; + +virBufferAsprintf(buf, ,mems=%d, def-cpu-cells[i].mem); + Why do we need to bother with stripping a trailing comma in qemuBuildNumaCPUArgStr, if we will just be adding a comma back again here as the very next statement? You could skip all the hassle of adding virBufferTruncate by just transferring the comma out of this statement and into qemuBuildNumaCPUArgStr This is what I mentioned in my last iteration. But since you hinted at extending virBuffer, I went on that path. Another motivation was to keep qemuBuildNumaArgStr() generic enough so that it gives out comma/dash separate CPU string without the ending comma in case this functionality is needed elsewhere in libvirt. But I guess I will stick with not appending comma to memory in which case this extension to virBuffer isn't needed. (that said, I still think virBufferTruncate will be a useful addition in other contexts). I think we should add virBufferTruncate only when we get any user for that. Hence I am not including virBufferTruncate in v3 patchset. +if (virBufferError(buf)) +goto error; + +node = virBufferContentAndReset(buf); +virCommandAddArg(cmd, node); +VIR_FREE(node); It's more efficient to replace these five lines with one: virCommandAddArgBuffer(cmd, buf); I can't because this is in a loop and I want to break out from the loop in case of buffer error. virCommandAddArgBuffer doesn't allow that. But I did replace the last 3 lines with virCommandAddArgBuffer :) @@ -3414,6 +3482,9 @@ qemuBuildCommandLine(virConnectPtr conn, virCommandAddArg(cmd, smp); VIR_FREE(smp); +if (def-cpu def-cpu-ncells qemuBuildNumaArgStr(def, cmd)) Again, explicit 0 check when looking for errors. Sure. 
+topology sockets=2 cores=4 threads=2/ +numa +cell cpus=0-7 mems=109550/ +cell cpus=8-15 mems=109550/ Of course, this will need tweaking to match any XML changes made in 1/3, but thanks for adding test cases! Tweaked the testcases. BTW the testcases were useful to me as they uncovered 2 bugs in my code! Overall, I think we'll need a v3 (you may want to use git send-email --subject-prefix=PATCHv3; it wasn't very clear from the subject line that this was already a v2 series), but I like where it's heading. Will ensure [PATCH v3] for the next iteration. Thanks. v3 on its
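The review above revolves around turning a cpumask into "first-last" groups for the cpus= option and where the comma separators belong (the virBufferTruncate detour). An illustrative Python model, not the libvirt C code, that sidesteps the trailing-comma problem by collecting the groups and joining at the end:

```python
def cpumask_to_ranges(cpumask) -> str:
    """Turn a byte-per-CPU mask into a QEMU-style range string,
    e.g. CPUs 0-3 and 8-11 -> "0-3,8-11". Joining the groups at the
    end means no trailing comma ever needs truncating."""
    groups = []
    first = None
    # Append a 0 sentinel so a run ending at the last CPU is closed.
    for i, present in enumerate(list(cpumask) + [0]):
        if present and first is None:
            first = i                      # a run of CPUs starts
        elif not present and first is not None:
            last = i - 1                   # the run just ended
            groups.append(str(first) if first == last
                          else "%d-%d" % (first, last))
            first = None
    return ",".join(groups)

print(cpumask_to_ranges([1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1]))  # 0-3,8-11
```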
[libvirt] [PATCH v2 2/2] qemu: Generate -numa option
qemu: Generate -numa option From: Bharata B Rao bhar...@linux.vnet.ibm.com Add routines to generate -numa QEMU command line option based on numa ... /numa XML specifications. Signed-off-by: Bharata B Rao bhar...@linux.vnet.ibm.com --- src/conf/cpu_conf.c|3 + src/qemu/qemu_command.c| 63 tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.args |5 ++ tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.xml | 25 tests/qemuxml2argvdata/qemuxml2argv-cpu-numa2.args |6 ++ tests/qemuxml2argvdata/qemuxml2argv-cpu-numa2.xml | 25 tests/qemuxml2argvtest.c |2 + 7 files changed, 128 insertions(+), 1 deletions(-) create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.args create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.xml create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-cpu-numa2.args create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-cpu-numa2.xml diff --git a/src/conf/cpu_conf.c b/src/conf/cpu_conf.c index 4aabe98..7c5bf69 100644 --- a/src/conf/cpu_conf.c +++ b/src/conf/cpu_conf.c @@ -352,11 +352,12 @@ virCPUDefParseXML(const xmlNodePtr node, ret = virStrToLong_ui(memory, NULL, 10, def-cells[i].mem); if (ret == -1) { +virCPUReportError(VIR_ERR_INTERNAL_ERROR, +%s, _(Invalid 'memory' attribute in NUMA cell)); VIR_FREE(cpus); VIR_FREE(memory); goto error; } - VIR_FREE(cpus); VIR_FREE(memory); } diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 01ee23f..7083928 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -3266,6 +3266,65 @@ qemuBuildSmpArgStr(const virDomainDefPtr def, return virBufferContentAndReset(buf); } +static void +qemuBuildNumaCPUArgStr(char *cpumask, virBufferPtr buf) +{ +int i, first, last; +int cpuSet = 0; + +for (i = 0; i VIR_DOMAIN_CPUMASK_LEN; i++) { +if (cpumask[i]) { +if (cpuSet) +last = i; +else { +first = last = i; +cpuSet = 1; +} +} else { +if (!cpuSet) +continue; +if (first == last) +virBufferAsprintf(buf, %d,, first); +else +virBufferAsprintf(buf, %d-%d,, first, last); +cpuSet = 0; + } +} + 
+if (cpuSet) { +if (first == last) +virBufferAsprintf(buf, %d,, first); +else +virBufferAsprintf(buf, %d-%d,, first, last); +} +} + +static int +qemuBuildNumaArgStr(const virDomainDefPtr def, virCommandPtr cmd) +{ +int i; +virBuffer buf = VIR_BUFFER_INITIALIZER; + +for (i = 0; i def-cpu-ncells; i++) { +virCommandAddArg(cmd, -numa); +virBufferAsprintf(buf, node,nodeid=%d, def-cpu-cells[i].cellid); +virBufferAddLit(buf, ,cpus=); +qemuBuildNumaCPUArgStr(def-cpu-cells[i].cpumask, buf); +virBufferAsprintf(buf, mem=%d, +VIR_DIV_UP(def-cpu-cells[i].mem, 1024)); + +if (virBufferError(buf)) +goto error; + +virCommandAddArgBuffer(cmd, buf); +} +return 0; + +error: +virBufferFreeAndReset(buf); +virReportOOMError(); +return -1; +} /* * Constructs a argv suitable for launching qemu with config defined @@ -3429,6 +3488,10 @@ qemuBuildCommandLine(virConnectPtr conn, virCommandAddArg(cmd, smp); VIR_FREE(smp); +if (def-cpu def-cpu-ncells) +if (qemuBuildNumaArgStr(def, cmd) 0) +goto error; + if (qemuCapsGet(qemuCaps, QEMU_CAPS_NAME)) { virCommandAddArg(cmd, -name); if (driver-setProcessName diff --git a/tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.args b/tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.args new file mode 100644 index 000..7c0dd30 --- /dev/null +++ b/tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.args @@ -0,0 +1,5 @@ +LC_ALL=C PATH=/bin HOME=/home/test USER=test LOGNAME=test ./qemu.sh -S -M pc \ +-m 214 -smp 16 -numa node,nodeid=0,cpus=0-7,mem=107 \ +-numa node,nodeid=1,cpus=8-15,mem=107 -nographic -monitor \ +unix:/tmp/test-monitor,server,nowait -no-acpi -boot n -net none -serial none \ +-parallel none -usb diff --git a/tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.xml b/tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.xml new file mode 100644 index 000..9fcdda8 --- /dev/null +++ b/tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.xml @@ -0,0 +1,25 @@ +domain type='qemu' + nameQEMUGuest1/name + uuidc7a5fdbd-edaf-9455-926a-d65c16db1809/uuid + memory219100/memory + 
currentMemory219100/currentMemory + vcpu16/vcpu + os +type arch='x86_64' machine='pc'hvm/type +boot dev='network'/ + /os + cpu +topology sockets=2 cores=4 threads=2/ +numa + cell cpus=0-7
[libvirt] [PATCH v2 0/2] Guest NUMA topology support
Hi,

This is v2 of the patchset that adds support for specifying NUMA topology for guests.

<cpu>
  ...
  <topology sockets='2' cores='4' threads='2'/>
  <numa>
    <cell cpus='0-7' mems='512000'/>
    <cell cpus='8-15' mems='512000'/>
  </numa>
  ...
</cpu>

This change allows libvirt to generate -numa options for QEMU/KVM.

This patchset passes all tests except the daemon-conf test.

Changes for v2
--------------
- Renamed mems to memory in NUMA cell specification.
- Make both cpus= and memory= mandatory in a NUMA cell.
- Reuse cpuset and memoryKB definitions for cpus and memory.
- Fix a bug in reading memory=.
- Correct error handling for usages of virXMLPropString.
- Support virsh dumpxml.
- Fix XML in domaincommon.rng so that <numa> works correctly for both <cpu>...</cpu> as well as <cpu match=...>...</cpu>.
- Don't use virBufferTruncate.
- Modify qemuxml2argv test cases for the s/mems/memory change.
- Use virBufferAddLit wherever possible.
- Now qemuBuildNumaCPUArgStr can't fail, hence doesn't need a return val.
- Pass memory in MB to qemu.

v1 - https://www.redhat.com/archives/libvir-list/2011-November/msg00247.html
v0 - https://www.redhat.com/archives/libvir-list/2011-October/msg00025.html
RFC - http://permalink.gmane.org/gmane.comp.emulators.libvirt/44626

Regards,
Bharata.
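One of the v2 changes is passing memory to QEMU in MB; the patch computes it with VIR_DIV_UP on the kilobyte value. A small Python model of that rounding, using the numbers from the qemuxml2argv test case in patch 2/2 (219100 KB total, 109550 KB per cell, mem=107 on the command line):

```python
def kb_to_mb_round_up(kb: int) -> int:
    """Model of VIR_DIV_UP(mem_kb, 1024): integer division rounding
    up, so a partially filled MB still counts as a whole MB."""
    return (kb + 1024 - 1) // 1024

# Each NUMA cell in the test case has 109550 KB, which QEMU sees as
# mem=107 (109550 / 1024 = 106.98..., rounded up):
print(kb_to_mb_round_up(109550))  # 107
```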
[libvirt] [PATCH v2 1/2] XML definitions for guest NUMA and parsing routines
XML definitions for guest NUMA and parsing routines. From: Bharata B Rao bhar...@linux.vnet.ibm.com This patch adds XML definitions for guest NUMA specification and contains routines to parse the same. The guest NUMA specification looks like this: cpu ... topology sockets='2' cores='4' threads='2'/ numa cell cpus='0-7' memory='512000'/ cell cpus='8-15' memory='512000'/ /numa ... /cpu Signed-off-by: Bharata B Rao bhar...@linux.vnet.ibm.com --- docs/formatdomain.html.in | 29 ++ docs/schemas/domaincommon.rng | 31 ++- src/conf/cpu_conf.c | 86 - src/conf/cpu_conf.h | 13 ++ src/conf/domain_conf.c|7 +++ 5 files changed, 163 insertions(+), 3 deletions(-) diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index cbad196..28a4edc 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -628,6 +628,35 @@ /dd /dl +p + Guest NUMA topology can be specifed using codenuma/code element. + span class=sinceSince 0.9.8/span +/p + +pre + ... + lt;cpugt; +... +lt;numagt; + lt;cell cpus='0-3' memory='512000'/gt; + lt;cell cpus='4-7' memory='512000'/gt; +lt;/numagt; +... + lt;/cpugt; + .../pre + +p + Each codecell/code element specifies a NUMA cell or a NUMA node. + codecpus/code specifies the CPU or range of CPUs that are part of + the node. codememory/code specifies the node memory in kilobytes + (i.e. blocks of 1024 bytes). Each cell or node is assigned cellid + or nodeid in the increasing order starting from 0. +/p + +p + This guest NUMA specification is currently available only for QEMU/KVM. 
+/p + h3a name=elementsLifecycleLifecycle control/a/h3 p diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index b6f858e..afcaccc 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -2297,7 +2297,14 @@ define name=cpu element name=cpu choice -ref name=cpuTopology/ +group + optional +ref name=cpuTopology/ + /optional + optional +ref name=cpuNuma/ + /optional +/group group ref name=cpuMatch/ interleave @@ -2311,6 +2318,9 @@ zeroOrMore ref name=cpuFeature/ /zeroOrMore +optional + ref name=cpuNuma/ +/optional /interleave /group /choice @@ -2371,6 +2381,25 @@ /element /define + define name=cpuNuma +element name=numa + oneOrMore +ref name=numaCell/ + /oneOrMore +/element + /define + + define name=numaCell +element name=cell + attribute name=cpus +ref name=cpuset/ + /attribute + attribute name=memory +ref name=memoryKB/ + /attribute +/element + /define + !-- System information specification: Placeholder for system specific informations likes the ones diff --git a/src/conf/cpu_conf.c b/src/conf/cpu_conf.c index 41e997e..4aabe98 100644 --- a/src/conf/cpu_conf.c +++ b/src/conf/cpu_conf.c @@ -28,6 +28,7 @@ #include util.h #include buf.h #include cpu_conf.h +#include domain_conf.h #define VIR_FROM_THIS VIR_FROM_CPU @@ -67,6 +68,12 @@ virCPUDefFree(virCPUDefPtr def) VIR_FREE(def-features[i].name); VIR_FREE(def-features); +for (i = 0 ; i def-ncells ; i++) { +VIR_FREE(def-cells[i].cpumask); +VIR_FREE(def-cells[i].cpustr); +} +VIR_FREE(def-cells); + VIR_FREE(def); } @@ -109,7 +116,6 @@ no_memory: return NULL; } - virCPUDefPtr virCPUDefParseXML(const xmlNodePtr node, xmlXPathContextPtr ctxt, @@ -289,9 +295,75 @@ virCPUDefParseXML(const xmlNodePtr node, def-features[i].policy = policy; } +if (virXPathNode(./numa[1], ctxt)) { +VIR_FREE(nodes); +n = virXPathNodeSet(./numa[1]/cell, ctxt, nodes); +if (n = 0) { +virCPUReportError(VIR_ERR_INTERNAL_ERROR, +%s, _(NUMA topology defined without NUMA cells)); +goto error; +} + +if 
(VIR_RESIZE_N(def-cells, def-ncells_max, + def-ncells, n) 0) +goto no_memory; + +def-ncells = n; + +for (i = 0 ; i n ; i++) { +char *cpus, *cpus_parse, *memory; +int cpumasklen = VIR_DOMAIN_CPUMASK_LEN; +int ret, ncpus = 0; + +def-cells[i].cellid = i; +cpus = cpus_parse = virXMLPropString(nodes[i], cpus); +if (!cpus) { +virCPUReportError(VIR_ERR_INTERNAL_ERROR, +%s, _(Missing 'cpus' attribute in NUMA cell)); +goto error
Re: [libvirt] [PATCH 2/3] Routine to truncate virBuffer
On Mon, Nov 07, 2011 at 11:12:07AM -0700, Eric Blake wrote:
> On 11/06/2011 06:58 AM, Bharata B Rao wrote:
>> Routine to truncate virBuffer
>>
>> From: Bharata B Rao bhar...@linux.vnet.ibm.com
>>
>> Add a helper to truncate virBuffer.
>>
>> /**
>>  * virBufferTruncate:
>> +++ b/src/util/buf.c
>> @@ -123,6 +123,31 @@ virBufferGrow(virBufferPtr buf, unsigned int len)
>>  }
>>
>>  /**
>> + * virBufferTruncate:
>> + * @buf: the buffer
>> + * @len: number of bytes by which the buffer is truncated
>> + *
>> + * Truncate the buffer by @len bytes.
>> + *
>> + * Returns zero on success or -1 on error
>> + */
>> +int
>> +virBufferTruncate(virBufferPtr buf, unsigned int len)
>
> What good is returning an error, given that callers already have to use virBufferError for detecting other errors, and given that you aren't marking the function as ATTRIBUTE_RETURN_CHECK to require callers to respect that error? One of the points of the virBuffer API is that most functions can return void (for example, see virBufferAdjustIndent), because the user doesn't care about error checking until the end.

Ok, point taken.

>> +{
>> +    if (buf->error)
>> +        return -1;
>> +
>> +    if ((signed)(buf->use - len) < 0) {
>
> I tend to write the cast as (int), not (signed), if a cast is even needed. You don't catch all possible overflows, though - if buf->use is 2, and len is 4294967295 (aka (unsigned)-1), then you've proceeded to expand(!) buf->use to 3, skipping an uninitialized byte. A better filter would be:
>
> if (len > buf->use)

>> +        virBufferSetError(buf, -1);
>> +        return -1;
>> +    }
>> +
>> +    buf->use -= len;
>> +    buf->content[buf->use] = '\0';
>
> This looks correct, once you filter out invalid len first.

Agreed.

>> +++ b/tests/virbuftest.c
>> @@ -123,7 +123,8 @@ static int testBufAutoIndent(const void *data ATTRIBUTE_UNUSED)
>>      virBufferAddChar(&buf, '\n');
>>      virBufferEscapeShell(&buf, "11");
>>      virBufferAddChar(&buf, '\n');
>> -
>> +    virBufferAsprintf(&buf, "%d", 12);
>> +    virBufferTruncate(&buf, 4);
>
> That gives no change in the expected output. I'd almost rather see:
>
>     virBufferAsprintf(&buf, "%d", 123);
>     virBufferTruncate(&buf, 1);
>
> as well as an update to the expected output to see output ending in "12".

The idea was to test with minimal change and hence resorted to this. In any case, I am dropping this patch for the reason explained in another thread.

Regards,
Bharata.
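Eric's objection above is that the signed-cast comparison misses very large unsigned len values; the safer guard is a direct `len > buf->use` check before subtracting. An illustrative Python model of the fixed behavior (the class and names are stand-ins, not the libvirt API):

```python
class Buf:
    """Tiny stand-in for virBuffer, just enough to show the check."""
    def __init__(self, content=""):
        self.content = content
        self.error = False

    @property
    def use(self):
        return len(self.content)

    def truncate(self, length):
        """Drop the last `length` bytes, using the `len > use` guard
        Eric suggested instead of the signed-cast comparison."""
        if self.error:
            return
        if length > self.use:   # also catches huge (would-be-unsigned) values
            self.error = True
            return
        self.content = self.content[:self.use - length]

buf = Buf("0-3,8-11,")
buf.truncate(1)                 # strip the trailing comma
print(buf.content)              # 0-3,8-11
```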
[libvirt] [PATCH 2/3] Routine to truncate virBuffer
Routine to truncate virBuffer From: Bharata B Rao bhar...@linux.vnet.ibm.com Add a helper to truncate virBuffer. /** * virBufferTruncate: * @buf: the buffer * @len: number of bytes by which the buffer is truncated * * Truncate the buffer by @len bytes. * * Returns zero on success or -1 on error */ int virBufferTruncate(virBufferPtr buf, unsigned int len) This doesn't reduce the buffer size, but instead reduces the in-use buffer. Signed-off-by: Bharata B Rao bhar...@linux.vnet.ibm.com --- src/libvirt_private.syms |1 + src/util/buf.c | 25 + src/util/buf.h |1 + tests/virbuftest.c |3 ++- 4 files changed, 29 insertions(+), 1 deletions(-) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 6a1562e..3a2ae86 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -33,6 +33,7 @@ virBufferEscapeString; virBufferFreeAndReset; virBufferGetIndent; virBufferStrcat; +virBufferTruncate; virBufferURIEncodeString; virBufferUse; virBufferVasprintf; diff --git a/src/util/buf.c b/src/util/buf.c index 5043128..4bba350 100644 --- a/src/util/buf.c +++ b/src/util/buf.c @@ -123,6 +123,31 @@ virBufferGrow(virBufferPtr buf, unsigned int len) } /** + * virBufferTruncate: + * @buf: the buffer + * @len: number of bytes by which the buffer is truncated + * + * Truncate the buffer by @len bytes. 
+ * + * Returns zero on success or -1 on error + */ +int +virBufferTruncate(virBufferPtr buf, unsigned int len) +{ +if (buf-error) +return -1; + +if ((signed)(buf-use - len) 0) { +virBufferSetError(buf, -1); +return -1; +} + +buf-use -= len; +buf-content[buf-use] = '\0'; +return 0; +} + +/** * virBufferAdd: * @buf: the buffer to append to * @str: the string diff --git a/src/util/buf.h b/src/util/buf.h index 1a8ebfb..76f48e0 100644 --- a/src/util/buf.h +++ b/src/util/buf.h @@ -63,5 +63,6 @@ void virBufferURIEncodeString(virBufferPtr buf, const char *str); void virBufferAdjustIndent(virBufferPtr buf, int indent); int virBufferGetIndent(const virBufferPtr buf, bool dynamic); +int virBufferTruncate(virBufferPtr buf, unsigned int len); #endif /* __VIR_BUFFER_H__ */ diff --git a/tests/virbuftest.c b/tests/virbuftest.c index 51a62fe..2230cb0 100644 --- a/tests/virbuftest.c +++ b/tests/virbuftest.c @@ -123,7 +123,8 @@ static int testBufAutoIndent(const void *data ATTRIBUTE_UNUSED) virBufferAddChar(buf, '\n'); virBufferEscapeShell(buf, 11); virBufferAddChar(buf, '\n'); - +virBufferAsprintf(buf, %d, 12); +virBufferTruncate(buf, 4); result = virBufferContentAndReset(buf); if (!result || STRNEQ(result, expected)) { virtTestDifference(stderr, expected, result); -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
[libvirt] [PATCH 0/3] Guest NUMA topology support - v1
Hi,

This patch series adds support for specifying NUMA topology for guests:

<cpu>
  ...
  <topology sockets='2' cores='4' threads='2'/>
  <numa>
    <cell cpus='0-7' mems='512000'/>
    <cell cpus='8-15' mems='512000'/>
  </numa>
  ...
</cpu>

This change allows libvirt to generate -numa options for QEMU/KVM.

Changes since v0
- Patch re-organization as per danpb's comments.
- Use of virBufferAsprintf instead of snprintf as per danpb's comments.
- Use cell instead of node as per danpb's suggestion.
- Add a helper to truncate virBuffer as per Eric's suggestion.
- Add documentation.
- Add testcases.
- Add a basic validity check for the cpus specified in <numa>...</numa>.

Testing
The patches pass all tests except the domainschema test. I think this is not a real failure; the test is probably failing to pick up the right domaincommon.rng file. Will investigate further.

TEST: domainschematest 40 80 .!.. 120 160 .!.. 200 240 280 320 ... 323 FAILED

daemon-conf fails, but it fails even without my patches too, so I don't think my patches are really affecting it.

TEST: daemon-conf .!!...!!!./daemon-conf: line 98: kill: (7811) - No such process ! 34 FAILED

v0: https://www.redhat.com/archives/libvir-list/2011-October/msg00025.html
RFC: http://permalink.gmane.org/gmane.comp.emulators.libvirt/44626

Regards,
Bharata.
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
[libvirt] [PATCH 1/3] XML definitions for guest NUMA
XML definitions for guest NUMA and parsing routines. From: Bharata B Rao bhar...@linux.vnet.ibm.com This patch adds XML definitions for guest NUMA specification and contains routines to parse the same. The guest NUMA specification looks like this: cpu ... topology sockets='2' cores='4' threads='2'/ numa cell cpus='0-7' mems='512000'/ cell cpus='8-15' mems='512000'/ /numa ... /cpu Signed-off-by: Bharata B Rao bhar...@linux.vnet.ibm.com --- docs/formatdomain.html.in | 32 + docs/schemas/domaincommon.rng | 32 + src/conf/cpu_conf.c | 62 + src/conf/cpu_conf.h | 12 src/conf/domain_conf.c|7 + 5 files changed, 145 insertions(+), 0 deletions(-) diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index c3e7752..8d070ca 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -628,6 +628,38 @@ /dd /dl +p + Guest NUMA topology can be specifed using codenuma/code element. + span class=sinceSince X.X.X/span +/p + +pre + ... + lt;cpugt; +... +lt;numagt; + lt;cell cpus='0-3' mems='512000'/gt; + lt;cell cpus='4-7' mems='512000'/gt; +lt;/numagt; +... + lt;/cpugt; + .../pre + +p + Each codecell/code element specifies a NUMA cell or a NUMA node. + codecpus/code specifies the CPU or range of CPUs that are part of + the node. codemems/code specifies the node memory in kilobytes + (i.e. blocks of 1024 bytes). Each cell or node is assigned cellid + or nodeid in the increasing order starting from 0. +/p + +p + This guest NUMA specification translates to code-numa/code command + line option for QEMU/KVM. 
For the above example, the following QEMU + command line option is generated: + code-numa node,nodeid=0,cpus=0-3,mems=512000 -numa node,nodeid=1,cpus=4-7,mems=512000/code +/p + h3a name=elementsLifecycleLifecycle control/a/h3 p diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index b6f858e..b09060a 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -2311,6 +2311,9 @@ zeroOrMore ref name=cpuFeature/ /zeroOrMore +optional + ref name=cpuNuma/ +/optional /interleave /group /choice @@ -2371,6 +2374,25 @@ /element /define + define name=cpuNuma +element name=numa + oneOrMore +ref name=numaCell/ + /oneOrMore +/element + /define + + define name=numaCell +element name=cell + attribute name=cpus +ref name=Cellcpus/ + /attribute + attribute name=mems +ref name=Cellmems/ + /attribute +/element + /define + !-- System information specification: Placeholder for system specific informations likes the ones @@ -2745,4 +2767,14 @@ param name=pattern[a-zA-Z0-9_\.:]+/param /data /define + define name=Cellcpus +data type=string + param name=pattern([0-9]+(-[0-9]+)?|\^[0-9]+)(,([0-9]+(-[0-9]+)?|\^[0-9]+))*/param +/data + /define + define name=Cellmems +data type=unsignedInt + param name=pattern[0-9]+/param +/data + /define /grammar diff --git a/src/conf/cpu_conf.c b/src/conf/cpu_conf.c index 41e997e..e55897f 100644 --- a/src/conf/cpu_conf.c +++ b/src/conf/cpu_conf.c @@ -28,6 +28,7 @@ #include util.h #include buf.h #include cpu_conf.h +#include domain_conf.h #define VIR_FROM_THIS VIR_FROM_CPU @@ -67,6 +68,10 @@ virCPUDefFree(virCPUDefPtr def) VIR_FREE(def-features[i].name); VIR_FREE(def-features); +for (i = 0 ; i def-ncells ; i++) +VIR_FREE(def-cells[i].cpumask); +VIR_FREE(def-cells); + VIR_FREE(def); } @@ -109,6 +114,19 @@ no_memory: return NULL; } +static int +virCPUDefNumaCPUs(virCPUDefPtr def) +{ +int i, j, ncpus = 0; + +for (i = 0; i def-ncells; i++) { +for (j = 0; j VIR_DOMAIN_CPUMASK_LEN; j++) { +if (def-cells[i].cpumask[j]) 
+ncpus++; +} +} +return ncpus; +} virCPUDefPtr virCPUDefParseXML(const xmlNodePtr node, @@ -289,6 +307,50 @@ virCPUDefParseXML(const xmlNodePtr node, def-features[i].policy = policy; } +if (virXPathNode(./numa[1], ctxt)) { +VIR_FREE(nodes); +n = virXPathNodeSet(./numa[1]/cell, ctxt, nodes); +if (n 0 || n == 0) { +virCPUReportError(VIR_ERR_INTERNAL_ERROR, +%s, _(NUMA topology defined without NUMA cells)); +goto error; +} + +if (VIR_RESIZE_N(def-cells, def-ncells_max, + def-ncells, n) 0) +goto no_memory; + +def-ncells = n; + +for (i = 0 ; i n ; i
[libvirt] [PATCH 3/3] qemu: Generate -numa command line option
qemu: Generate -numa option From: Bharata B Rao bhar...@linux.vnet.ibm.com Add routines to generate -numa QEMU command line option based on numa ... /numa XML specifications. Signed-off-by: Bharata B Rao bhar...@linux.vnet.ibm.com --- src/qemu/qemu_command.c| 71 tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.args |5 + tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.xml | 25 +++ tests/qemuxml2argvdata/qemuxml2argv-cpu-numa2.args |6 ++ tests/qemuxml2argvdata/qemuxml2argv-cpu-numa2.xml | 25 +++ tests/qemuxml2argvtest.c |2 + 6 files changed, 134 insertions(+), 0 deletions(-) create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.args create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.xml create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-cpu-numa2.args create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-cpu-numa2.xml diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index dc92fa3..b042e34 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -3251,6 +3251,74 @@ qemuBuildSmpArgStr(const virDomainDefPtr def, return virBufferContentAndReset(buf); } +static int +qemuBuildNumaCPUArgStr(char *cpumask, virBufferPtr buf) +{ +int i, first, last; +int cpuSet = 0; + +for (i = 0; i VIR_DOMAIN_CPUMASK_LEN; i++) { +if (cpumask[i]) { +if (cpuSet) +last = i; +else { +first = last = i; +cpuSet = 1; +} +} else { +if (!cpuSet) +continue; +if (first == last) +virBufferAsprintf(buf, %d,, first); +else +virBufferAsprintf(buf, %d-%d,, first, last); +cpuSet = 0; + } +} + +if (cpuSet) { +if (first == last) +virBufferAsprintf(buf, %d,, first); +else +virBufferAsprintf(buf, %d-%d,, first, last); +} + +/* Remove the trailing comma */ +return virBufferTruncate(buf, 1); +} + +static int +qemuBuildNumaArgStr(const virDomainDefPtr def, virCommandPtr cmd) +{ +int i; +char *node; +virBuffer buf = VIR_BUFFER_INITIALIZER; + +for (i = 0; i def-cpu-ncells; i++) { +virCommandAddArg(cmd, -numa); +virBufferAsprintf(buf, %s, node); 
+virBufferAsprintf(buf, ,nodeid=%d, def-cpu-cells[i].cellid); +virBufferAsprintf(buf, ,cpus=); + +if (qemuBuildNumaCPUArgStr(def-cpu-cells[i].cpumask, buf)) +goto error; + +virBufferAsprintf(buf, ,mems=%d, def-cpu-cells[i].mem); + +if (virBufferError(buf)) +goto error; + +node = virBufferContentAndReset(buf); +virCommandAddArg(cmd, node); +VIR_FREE(node); +} +return 0; + +error: +virBufferFreeAndReset(buf); +virReportOOMError(); +return -1; +} /* * Constructs a argv suitable for launching qemu with config defined @@ -3414,6 +3482,9 @@ qemuBuildCommandLine(virConnectPtr conn, virCommandAddArg(cmd, smp); VIR_FREE(smp); +if (def-cpu def-cpu-ncells qemuBuildNumaArgStr(def, cmd)) +goto error; + if (qemuCapsGet(qemuCaps, QEMU_CAPS_NAME)) { virCommandAddArg(cmd, -name); if (driver-setProcessName diff --git a/tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.args b/tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.args new file mode 100644 index 000..e8fd277 --- /dev/null +++ b/tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.args @@ -0,0 +1,5 @@ +LC_ALL=C PATH=/bin HOME=/home/test USER=test LOGNAME=test ./qemu.sh -S -M pc \ +-m 214 -smp 16 -numa node,nodeid=0,cpus=0-7,mems=109550 \ +-numa node,nodeid=1,cpus=8-15,mems=109550 -nographic -monitor \ +unix:/tmp/test-monitor,server,nowait -no-acpi -boot n -net none -serial none \ +-parallel none -usb diff --git a/tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.xml b/tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.xml new file mode 100644 index 000..07fe891 --- /dev/null +++ b/tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.xml @@ -0,0 +1,25 @@ +domain type='qemu' + nameQEMUGuest1/name + uuidc7a5fdbd-edaf-9455-926a-d65c16db1809/uuid + memory219100/memory + currentMemory219100/currentMemory + vcpu16/vcpu + os +type arch='x86_64' machine='pc'hvm/type +boot dev='network'/ + /os + cpu +topology sockets=2 cores=4 threads=2/ +numa + cell cpus=0-7 mems=109550/ + cell cpus=8-15 mems=109550/ +/numa + /cpu + clock offset='utc'/ + 
on_poweroffdestroy/on_poweroff + on_rebootrestart/on_reboot + on_crashdestroy/on_crash + devices + emulator/./qemu.sh/emulator + /devices +/domain diff --git a/tests/qemuxml2argvdata/qemuxml2argv-cpu-numa2.args b/tests/qemuxml2argvdata/qemuxml2argv-cpu-numa2.args new file mode 100644 index 000..bc96cef --- /dev/null +++ b/tests/qemuxml2argvdata/qemuxml2argv-cpu-numa2.args @@ -0,0
Re: [libvirt] [RFC PATCH 2/2] Routines to parse numa ... /numa
On Thu, Oct 13, 2011 at 12:50:00PM +0100, Daniel P. Berrange wrote: On Mon, Oct 03, 2011 at 03:31:35PM +0530, Bharata B Rao wrote: Routines to parse numa ... /numa From: Bharata B Rao bhar...@linux.vnet.ibm.com This patch adds routines to parse guest numa XML configuration for qemu. Signed-off-by: Bharata B Rao bhar...@linux.vnet.ibm.com --- src/conf/cpu_conf.c | 48 src/conf/cpu_conf.h | 11 ++ src/qemu/qemu_command.c | 94 +++ 3 files changed, 153 insertions(+), 0 deletions(-) diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index a13ba71..5b4345e 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -3153,6 +3153,97 @@ qemuBuildSmpArgStr(const virDomainDefPtr def, return virBufferContentAndReset(buf); } +static char * +virParseNodeCPUs(char *cpumask) This is a rather misleading name, since it is really formatting the argument value. Can you rename this to qemuBuildNumaCPUArgStr Sure will do. +{ +int i, first, last, ret; +char *cpus, *ptr; +int cpuSet = 0; +int remaining = 128; + +if (VIR_ALLOC_N(cpus, remaining) 0) +return NULL; + +ptr = cpus; + +for (i = 0; i VIR_DOMAIN_CPUMASK_LEN; i++) { +if (cpumask[i]) { +if (cpuSet) +last = i; +else { +first = last = i; +cpuSet = 1; +} +} else { +if (!cpuSet) +continue; +if (first == last) +ret = snprintf(ptr, remaining, %d,, first); +else +ret = snprintf(ptr, remaining, %d-%d,, first, last); +if (ret remaining) +goto error; +ptr += ret; +remaining -= ret; +cpuSet = 0; + } +} + +if (cpuSet) { +if (first == last) +ret = snprintf(ptr, remaining, %d,, first); +else +ret = snprintf(ptr, remaining, %d-%d,, first, last); +if (ret remaining) +goto error; +} + +/* Remove the trailing comma */ +*(--ptr) = '\0'; +return cpus; + +error: +VIR_FREE(cpus); +return NULL; Using VIR_ALLOC_N + snprintf here is not desirable, when you already have a nice virBufferPtr object in the call that you could use. Just pass the virBufferPtr straight into this method. 
Wanted to user virBufferPtr, but I realized that I need to remove the last comma from the string and coudn't find an easy way to do that. Hence resorted to this method. But I think I can still achive this by not appending a comma to the next part of the agrument (,mems). Let me see if I can do this cleanly in the next post. +} + +static int +qemuBuildNumaArgStr(const virDomainDefPtr def, virCommandPtr cmd) +{ +int i; +char *cpus, *node; +virBuffer buf = VIR_BUFFER_INITIALIZER; + +for (i = 0; i def-cpu-nnodes; i++) { +virCommandAddArg(cmd, -numa); +virBufferAsprintf(buf, %s, node); +virBufferAsprintf(buf, ,nodeid=%d, def-cpu-nodes[i].nodeid); + +cpus = virParseNodeCPUs(def-cpu-nodes[i].cpumask); +if (!cpus) +goto error; + +virBufferAsprintf(buf, ,cpus=%s, cpus); +virBufferAsprintf(buf, ,mems=%d, def-cpu-nodes[i].mem); + +if (virBufferError(buf)) { +VIR_FREE(cpus); +goto error; +} + +node = virBufferContentAndReset(buf); +virCommandAddArg(cmd, node); + +VIR_FREE(cpus); +VIR_FREE(node); +} +return 0; + +error: +virBufferFreeAndReset(buf); +virReportOOMError(); +return -1; +} /* * Constructs a argv suitable for launching qemu with config defined @@ -3319,6 +3410,9 @@ qemuBuildCommandLine(virConnectPtr conn, virCommandAddArg(cmd, smp); VIR_FREE(smp); +if (def-cpu-nnodes qemuBuildNumaArgStr(def, cmd)) +goto error; + if (qemuCapsGet(qemuCaps, QEMU_CAPS_NAME)) { virCommandAddArg(cmd, -name); if (driver-setProcessName Looks fine code wise. For the future iterations, it would be good to change the split of patches slightly - Patch 1 for XML bits: cpu_conf.c, cpu_conf.h, and docs/schemas/domain.rn - Patch 2 for qemu_command.c, tests/qemuxml2argvtest.c Sure will rearrange in the next iteration. Thanks for your review. Regards, Bharata. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [RFC PATCH 0/3] Guest NUMA topology support - v0
On Thu, Oct 13, 2011 at 12:53:22PM +0100, Daniel P. Berrange wrote: On Mon, Oct 03, 2011 at 03:28:44PM +0530, Bharata B Rao wrote: Hi, I discussed the possibilities of adding NUMA topology XML specification support for guests here some time back. Since my latest proposal (http://permalink.gmane.org/gmane.comp.emulators.libvirt/44626) didn't get any response, I am posting a prototype implementation that supports specifying NUMA topology for QEMU guests. - The implementation is based on the last proposal I listed above. So we're basically only allowing a flat NUMA toplogy numa node cpus='0,2,4,6' mems='1024 node cpus='8,10,12,14' mems='1024 node cpus='1,3,5,7' mems='1024' node cpus='9,11,13,15' mems='1024' /numa which mirrors what QEMU allows currently. Should we need to support a hierarchy, we can trivially extend this syntax in a backwards compatible fashion numa node node cpus='0,2,4,6' mems='1024 node cpus='8,10,12,14' mems='1024 /node node node cpus='1,3,5,7' mems='1024' node cpus='9,11,13,15' mems='1024' /node /numa so I think this limitation is OK for now. Fine then. In the virsh capabilities XML, we actually use the word 'cell' rather than 'node'. I think it might be preferrable to be consistent and use 'cell' here too. I feel NUMA node sounds more familiar than NUMA cell. But if libvirt prefers cell, we can go with cell I suppose. - The implementation is for QEMU only. That's fine. - The patchset has gone through extremely light testing and I have just tested booting a 2 socket 2 core 2 thread QEMU guest. - I haven't really bothered to cover all the corner cases and haven't run libvirt tests after this patchset. For eg, there is no code to validate if the CPU combination specified by topology and numa match with each other. I plan to cover all these after we freeze the specification itself. ok WRT the question about CPU enumeration order in the URL quoted above. I don't think it matters whether we enumerate CPUs in the same order as real hardware. 
The key thing is that we just choose an order, document what *our* enumeration order is, and then stick to it forever. Ok fine. Regards, Bharata. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
[libvirt] [RFC PATCH 1/2] Add XML definitions for guest NUMA specifications
Add XML definitions for guest NUMA specifications. From: Bharata B Rao bhar...@linux.vnet.ibm.com NUMA topology for guest is specified as follows: cpu ... numa node cpus='0-3' mems='1024' node cpus='4,5,6,7' mems='1024' node cpus='8-10',11-12^12' mems='1024' /numa /cpu Signed-off-by: Bharata B Rao bhar...@linux.vnet.ibm.com --- docs/schemas/domaincommon.rng | 32 1 files changed, 32 insertions(+), 0 deletions(-) diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index 4972fac..99b70c3 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -2252,6 +2252,9 @@ zeroOrMore ref name=cpuFeature/ /zeroOrMore +optional + ref name=cpuNuma/ +/optional /interleave /group /choice @@ -2312,6 +2315,25 @@ /element /define + define name=cpuNuma +element name=numa + oneOrMore +ref name=numaNode/ + /oneOrMore +/element + /define + + define name=numaNode +element name=node + attribute name=cpus +ref name=Nodecpus/ + /attribute + attribute name=mems +ref name=Nodemems/ + /attribute +/element + /define + !-- System information specification: Placeholder for system specific informations likes the ones @@ -2665,4 +2687,14 @@ param name=pattern[a-zA-Z0-9_\.:]+/param /data /define + define name=Nodecpus +data type=string + param name=pattern([0-9]+(-[0-9]+)?|\^[0-9]+)(,([0-9]+(-[0-9]+)?|\^[0-9]+))*/param +/data + /define + define name=Nodemems +data type=unsignedInt + param name=pattern[0-9]+/param +/data + /define /grammar -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
[libvirt] [RFC PATCH 0/3] Guest NUMA topology support - v0
Hi, I discussed the possibilities of adding NUMA topology XML specification support for guests here some time back. Since my latest proposal (http://permalink.gmane.org/gmane.comp.emulators.libvirt/44626) didn't get any response, I am posting a prototype implementation that supports specifying NUMA topology for QEMU guests. - The implementation is based on the last proposal I listed above. - The implementation is for QEMU only. - The patchset has gone through extremely light testing and I have just tested booting a 2 socket 2 core 2 thread QEMU guest. - I haven't really bothered to cover all the corner cases and haven't run libvirt tests after this patchset. For eg, there is no code to validate if the CPU combination specified by topology and numa match with each other. I plan to cover all these after we freeze the specification itself. Regards, Bharata. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
[libvirt] [RFC PATCH 2/2] Routines to parse numa ... /numa
Routines to parse numa ... /numa From: Bharata B Rao bhar...@linux.vnet.ibm.com This patch adds routines to parse guest numa XML configuration for qemu. Signed-off-by: Bharata B Rao bhar...@linux.vnet.ibm.com --- src/conf/cpu_conf.c | 48 src/conf/cpu_conf.h | 11 ++ src/qemu/qemu_command.c | 94 +++ 3 files changed, 153 insertions(+), 0 deletions(-) diff --git a/src/conf/cpu_conf.c b/src/conf/cpu_conf.c index 5cecda2..c520025 100644 --- a/src/conf/cpu_conf.c +++ b/src/conf/cpu_conf.c @@ -28,6 +28,7 @@ #include util.h #include buf.h #include cpu_conf.h +#include domain_conf.h #define VIR_FROM_THIS VIR_FROM_CPU @@ -67,6 +68,10 @@ virCPUDefFree(virCPUDefPtr def) VIR_FREE(def-features[i].name); VIR_FREE(def-features); +for (i = 0 ; i def-nnodes ; i++) +VIR_FREE(def-nodes[i].cpumask); +VIR_FREE(def-nodes); + VIR_FREE(def); } @@ -289,6 +294,49 @@ virCPUDefParseXML(const xmlNodePtr node, def-features[i].policy = policy; } +if (virXPathNode(./numa[1], ctxt)) { +VIR_FREE(nodes); +n = virXPathNodeSet(./numa[1]/node, ctxt, nodes); +if (n 0 || n == 0) { +virCPUReportError(VIR_ERR_INTERNAL_ERROR, +%s, _(NUMA topology defined without NUMA nodes)); +goto error; +} + +if (VIR_RESIZE_N(def-nodes, def-nnodes_max, + def-nnodes, n) 0) +goto no_memory; + +def-nnodes = n; + +for (i = 0 ; i n ; i++) { +char *cpus; +int cpumasklen = VIR_DOMAIN_CPUMASK_LEN; +unsigned long ul; +int ret; + +def-nodes[i].nodeid = i; +cpus = virXMLPropString(nodes[i], cpus); + +if (VIR_ALLOC_N(def-nodes[i].cpumask, cpumasklen) 0) +goto no_memory; + +if (virDomainCpuSetParse((const char **)cpus, + 0, def-nodes[i].cpumask, + cpumasklen) 0) +goto error; + +ret = virXPathULong(string(./numa[1]/node/@mems), +ctxt, ul); +if (ret 0) { +virCPUReportError(VIR_ERR_INTERNAL_ERROR, +%s, _(Missing 'mems' attribute in NUMA topology)); +goto error; +} +def-nodes[i].mem = (unsigned int) ul; +} +} + cleanup: VIR_FREE(nodes); diff --git a/src/conf/cpu_conf.h b/src/conf/cpu_conf.h index 57b85e1..266ec81 100644 --- 
a/src/conf/cpu_conf.h +++ b/src/conf/cpu_conf.h @@ -67,6 +67,14 @@ struct _virCPUFeatureDef { int policy; /* enum virCPUFeaturePolicy */ }; +typedef struct _virNodeDef virNodeDef; +typedef virNodeDef *virNodeDefPtr; +struct _virNodeDef { + int nodeid; + char *cpumask; /* CPUs that are part of this node */ + unsigned int mem; /* Node memory */ +}; + typedef struct _virCPUDef virCPUDef; typedef virCPUDef *virCPUDefPtr; struct _virCPUDef { @@ -81,6 +89,9 @@ struct _virCPUDef { size_t nfeatures; size_t nfeatures_max; virCPUFeatureDefPtr features; +size_t nnodes; +size_t nnodes_max; +virNodeDefPtr nodes; }; diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index a13ba71..5b4345e 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -3153,6 +3153,97 @@ qemuBuildSmpArgStr(const virDomainDefPtr def, return virBufferContentAndReset(buf); } +static char * +virParseNodeCPUs(char *cpumask) +{ +int i, first, last, ret; +char *cpus, *ptr; +int cpuSet = 0; +int remaining = 128; + +if (VIR_ALLOC_N(cpus, remaining) 0) +return NULL; + +ptr = cpus; + +for (i = 0; i VIR_DOMAIN_CPUMASK_LEN; i++) { +if (cpumask[i]) { +if (cpuSet) +last = i; +else { +first = last = i; +cpuSet = 1; +} +} else { +if (!cpuSet) +continue; +if (first == last) +ret = snprintf(ptr, remaining, %d,, first); +else +ret = snprintf(ptr, remaining, %d-%d,, first, last); +if (ret remaining) +goto error; +ptr += ret; +remaining -= ret; +cpuSet = 0; + } +} + +if (cpuSet) { +if (first == last) +ret = snprintf(ptr, remaining, %d,, first); +else +ret = snprintf(ptr, remaining, %d-%d,, first, last); +if (ret remaining) +goto error; +} + +/* Remove the trailing comma */ +*(--ptr) = '\0'; +return cpus; + +error: +VIR_FREE(cpus); +return NULL; +} + +static int +qemuBuildNumaArgStr(const virDomainDefPtr def, virCommandPtr cmd) +{ +int i; +char *cpus, *node; +virBuffer buf = VIR_BUFFER_INITIALIZER
[libvirt] Dependency between vcpu and topology
Hi,

Is there a dependency between <vcpu> and <topology>? Is specifying <vcpu> mandatory for an SMP guest? Can't I skip <vcpu> but have <topology sockets=... cores=... threads=.../> to define an SMP guest?

I see that if I skip <vcpu>, libvirt generates -smp 1 on the qemu command line, which results in qemu booting a VM with a single CPU irrespective of the topology specified. Shouldn't libvirt arrive at the correct CPU count (vcpus = sockets*cores*threads?) even if <vcpu> isn't specified, rather than defaulting to 1?

Regards,
Bharata.
--
http://bharata.sulekha.com/blog/posts.htm, http://raobharata.wordpress.com/
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [RFC] NUMA topology specification
On Tue, Aug 30, 2011 at 10:53 AM, Bharata B Rao bharata@gmail.com wrote: Hi, Here is another attempt at guest NUMA topology XML specification that should work for different NUMA topologies. Hi Daniel, Do you think I should go ahead and implement this ? Any comments or concerns ? Regards, Bharata. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [RFC] NUMA topology specification
Hi,

Here is another attempt at a guest NUMA topology XML specification that should work for different NUMA topologies.

We already specify the number of sockets, cores and threads a system has by using:

<cpu>
  <topology sockets='2' cores='2' threads='2'/>
</cpu>

For NUMA, we can add the following:

<numa>
  <node cpus='0-3' mems='1024'/>
  <node cpus='4-7' mems='1024'/>
</numa>

Specifying only cpus in the NUMA node specification should be enough to represent most of the topologies. Based on the number of cpus specified in each node, we should be able to work out how many cores and sockets will be part of each node. The only other thing needed is an explicit memory specification.

I have taken a few example NUMA topologies here and shown how the above specification can help.

Magny Cours
-----------
Topology desc: http://code.google.com/p/likwid-topology/wiki/AMD_MagnyCours8

<cpu>
  <topology sockets='4' cores='4' threads='1'/>
  <numa>
    <node cpus='0-3' mems='1024'/>
    <node cpus='4-7' mems='1024'/>
    <node cpus='8-11' mems='1024'/>
    <node cpus='12-15' mems='1024'/>
  </numa>
</cpu>

OR, if we want to stick to how CPUs get enumerated in real hardware, we can specify it like this:

<cpu>
  <topology sockets='4' cores='4' threads='1'/>
  <numa>
    <node cpus='0,2,4,6' mems='1024'/>
    <node cpus='8,10,12,14' mems='1024'/>
    <node cpus='1,3,5,7' mems='1024'/>
    <node cpus='9,11,13,15' mems='1024'/>
  </numa>
</cpu>

The above two specifications for Magny Cours aren't perfect because we conveniently converted the multi-level NUMA into single-level NUMA. The system has 2 sockets and 2 NUMA domains consisting of 4 cores in each domain, but we aren't really reflecting this in the topology specification. But does this really matter? We are still showing 4 distinct NUMA domains.
Nehalem
-------
Topology desc: http://code.google.com/p/likwid-topology/wiki/Intel_Nehalem

<cpu>
  <topology sockets='2' cores='4' threads='2'/>
  <numa>
    <node cpus='0-7' mems='1024'/>
    <node cpus='8-15' mems='1024'/>
  </numa>
</cpu>

OR, if we want to stick to how CPUs get enumerated in real hardware, we can specify it like this:

<cpu>
  <topology sockets='2' cores='4' threads='2'/>
  <numa>
    <node cpus='0-3,8-11' mems='1024'/>
    <node cpus='4-7,12-15' mems='1024'/>
  </numa>
</cpu>

However there is a problem here. The specification isn't granular enough to specify which CPU is part of which core. As you can see in the topology diagram, CPUs 0,8 belong to one core, CPUs 1,9 belong to one core, etc. So the whole point of specifying all the CPUs explicitly in the specification gets defeated.

Dunnington
----------
Topology desc: 2 nodes, 4 sockets in each node, 6 cores in each socket.

<cpu>
  <topology sockets='8' cores='6' threads='1'/>
  <numa>
    <node cpus='0-23' mems='1024'/>
    <node cpus='24-47' mems='1024'/>
  </numa>
</cpu>

Here also there is the same problem: CPUs 0,4,8,12,16,20 belong to a core, but the specification doesn't allow for that.

So here are some questions that we need to answer:
- Can we just go with a flat NUMA specification and convert multi-level NUMA into flat NUMA wherever possible (like in the above Magny Cours example)?
- Are there topologies where this doesn't work?
- Isn't it enough to enumerate CPUs serially among cores and sockets and not enumerate them exactly as in real hardware?

Regards,
Bharata.
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [RFC] NUMA topology specification
On Tue, Aug 23, 2011 at 7:43 PM, Daniel P. Berrange berra...@redhat.com wrote: On Fri, Aug 19, 2011 at 12:05:43PM +0530, Bharata B Rao wrote: Hi, qemu supports specification of NUMA topology on command line using -numa option. -numa node[,mem=size][,cpus=cpu[-cpu]][,nodeid=node] I see that there is no way to specify such NUMA topology in libvirt XML. Are there plans to add support for NUMA topology specification ? Is anybody already working on this ? If not I would like to add this support for libvirt. Currently the topology specification available in libvirt ( topology sockets='1' cores='2' threads='1'/) translates to -smp sockets=1,cores=2,threads=1 option of qemu. There is not equivalent in libvirt that could generate -numa command line option of qemu. How about something like this ? (OPTION 1) cpu ... numa nodeid='node' cpus='cpu[-cpu]' mem='size' ... /cpu And we could specify multiple such lines, one for each node. I'm not sure it really makes sense having the NUMA memory config inside the cpu configuration, but i like the simplicity of of this specification. Yes, memory specification inside cpu, may be we could define a separate numa section as shown in your examples below and put it outside of cpu. -numa and -smp options in qemu do not work all that well since they are parsed independent of each other and one could specify a cpu set with -numa option that is incompatible with sockets,cores and threads specified on -smp option. This should be fixed in qemu, but given that such a problem has been observed, should libvirt tie the specification of numa and smp (sockets,threads,cores) together so that one is forced to specify only valid combinations of nodes and cpus in libvirt ? No matter what we do, libvirt is going to have todo some kind of semantic validation on the different info. Right. Given that we have vcpus as well as vcpu current, libvirt needs to ensure that specified topology is sane. May be something like this: (OPTION 2) cpu ... 
topology sockets='1' cores='2' threads='1' nodeid='0' cpus='0-1' mem='size' topology sockets='1' cores='2' threads='1' nodeid='1' cpus='2-3' mem='size' ... /cpu This should result in a 2 node system with each node having 1 socket with 2 cores. This has the problem of redundancy of specification of the sockets, cores threads, vs the new 'cpus' attribute. eg you can specify wierd configs like: Yes, sockets,cores,threads become redundant, one option is to define sockets,cores,threads once (like we currently do inside cpu) and have it as a common definition for all the numa nodes defined. Something like this: cpu topology sockets='1' cores='2' threads='1' numa node cpus='0-1' mems=1024 numa node cpus='2-3' mems=1024 /cpu This will result in a 2 node system with each node having 1 socket with 2 cores each. But as you can see this will be restrictive since you can't specify different topologies for different nodes. Are there such non-symmetric systems out there and should libvirt be flexible enough to support such NUMA topologies for VMs ? Also looks like nodeid (from OPTION 2 of my original mail) is redundant, may be we should assign increasing node ids based on the number of numa topology statements that appear. In the above example, we can implicitly assign node ids 0 and 1 for two nodes. Adam Litke suggested that we can omit cpus= from the specification since it can be derived, but given that there are topologies that don't enumerate the CPUs within a socket serially, it becomes necessary to have an explicit cpus= specification. 
topology sockets='1' cores='2' threads='1' nodeid='0' cpus='0-1' mem='size' topology sockets='2' cores='1' threads='1' nodeid='1' cpus='2-3' mem='size' Or even bogus configs topology sockets='1' cores='2' threads='1' nodeid='0' cpus='0-1' mem='size' topology sockets='4' cores='1' threads='1' nodeid='1' cpus='2-3' mem='size' That all said, given our current XML schema, it is inevitable that we will have some level of duplication of information. Some things that are important to consider are how this interacts with possible CPU / memory hotplug in the future, So what are the issues we need to take care of here ? and how we will be able to pin guest NUMA nodes, to host NUMA nodes. This would be a good thing to do in libvirt. I think libvirt should intelligently place VMs on host nodes based on the guest topology. But I don't clearly see what issues we need to take care of now while we come up with NUMA topology definition for VM. For the first point, it might be desirable to create a NUMA topology which supports upto 8 logical CPUs, but only have 2 physical sockets actually plugged in at boot time. Also, I dread to question whether we want to be able to represent a multi-level NUMA topology, or just assume one level. If we want to be able to cope with multi-level topology, can we assume the levels are solely grouping at the socket
Re: [libvirt] [RFC] NUMA topology specification
On Fri, Aug 19, 2011 at 7:10 PM, Adam Litke a...@us.ibm.com wrote:
> On 08/19/2011 01:35 AM, Bharata B Rao wrote:
>> ...
>> <topology sockets='1' cores='2' threads='1' nodeid='0' cpus='0-1' mem='size'/>
>> <topology sockets='1' cores='2' threads='1' nodeid='1' cpus='2-3' mem='size'/>
>> ...
>
> I like the idea of merging this into topology to prevent errors with specifying incompatible cpu and numa topologies, but I think you can go a step further (assuming my following assertion is valid). Since cpus are assigned to numa nodes at the core level, and you are providing a 'nodeid' attribute, you can infer the 'cpus' attribute using 'cores' and 'nodeid' alone. For your example above:
>
> <topology sockets='1' cores='2' threads='1' nodeid='0' mem='size'/>
> <topology sockets='1' cores='2' threads='1' nodeid='1' mem='size'/>
>
> You have 4 cores total, each node is assigned 2. Assign cores to nodes starting with core 0 and node 0.

Sounds good. Unless anyone or any architecture has specific requirements for enumerating CPUs differently across nodes, 'cpus=' is redundant as you observe.

Regards,
Bharata.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
[libvirt] [RFC] NUMA topology specification
Hi,

qemu supports specification of NUMA topology on the command line using the -numa option:

-numa node[,mem=size][,cpus=cpu[-cpu]][,nodeid=node]

I see that there is no way to specify such a NUMA topology in libvirt XML. Are there plans to add support for NUMA topology specification? Is anybody already working on this? If not, I would like to add this support to libvirt.

Currently the topology specification available in libvirt (<topology sockets='1' cores='2' threads='1'/>) translates to the -smp sockets=1,cores=2,threads=1 option of qemu. There is no equivalent in libvirt that could generate the -numa command line option of qemu. How about something like this? (OPTION 1)

<cpu>
  ...
  <numa nodeid='node' cpus='cpu[-cpu]' mem='size'/>
  ...
</cpu>

And we could specify multiple such lines, one for each node.

The -numa and -smp options in qemu do not work all that well together since they are parsed independently of each other, and one could specify a cpu set with the -numa option that is incompatible with the sockets, cores and threads specified by the -smp option. This should be fixed in qemu, but given that such a problem has been observed, should libvirt tie the specification of numa and smp (sockets, threads, cores) together so that one is forced to specify only valid combinations of nodes and cpus in libvirt? Maybe something like this: (OPTION 2)

<cpu>
  ...
  <topology sockets='1' cores='2' threads='1' nodeid='0' cpus='0-1' mem='size'/>
  <topology sockets='1' cores='2' threads='1' nodeid='1' cpus='2-3' mem='size'/>
  ...
</cpu>

This should result in a 2-node system with each node having 1 socket with 2 cores.

Comments, suggestions?

Regards,
Bharata.
--
http://bharata.sulekha.com/blog/posts.htm, http://raobharata.wordpress.com/
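The -numa/-smp consistency problem described above amounts to one check: the per-node cpu sets must exactly partition the vcpus implied by sockets*cores*threads. A minimal sketch of the kind of cross-check libvirt could enforce (illustration only; the function names are hypothetical, not libvirt or qemu APIs):

```python
def parse_cpu_range(spec):
    """Expand a qemu-style cpus spec like '0-1' or '3' into a list of vcpu ids."""
    if "-" in spec:
        lo, hi = spec.split("-")
        return list(range(int(lo), int(hi) + 1))
    return [int(spec)]

def numa_matches_smp(node_cpu_specs, sockets, cores, threads):
    """True iff the union of per-node cpu sets covers exactly the vcpus
    implied by the -smp topology, with no gaps, overlaps or extras."""
    total = sockets * cores * threads
    seen = []
    for spec in node_cpu_specs:
        seen.extend(parse_cpu_range(spec))
    return sorted(seen) == list(range(total))
```

For the OPTION 2 example (cpus 0-1 and 2-3 against a 4-vcpu -smp topology) the check passes; a cpu set that leaves a gap, runs past the vcpu count, or overlaps another node fails it, which is exactly the class of incompatible combinations qemu currently accepts silently.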
Re: [libvirt] [RFC] NUMA topology specification
On Fri, Aug 19, 2011 at 12:55 PM, Osier Yang jy...@redhat.com wrote:
> On 2011-08-19 14:35, Bharata B Rao wrote:
>> How about something like this? (OPTION 1)
>>
>> <cpu>
>>   ...
>>   <numa nodeid='node' cpus='cpu[-cpu]' mem='size'/>
>>   ...
>> </cpu>
>
> Libvirt already supports NUMA settings (both cpu and memory) on the host, but yes, there is nothing for NUMA settings inside the guest yet. We talked once about the XML when adding support for numa memory settings on the host, and finally chose to introduce a new XML node for it, with a view to adding support for NUMA settings inside the guest one day. The XML is:
>
> <numatune>
>   <memory mode='strict' nodeset='1-4,^3'/>
> </numatune>

But this only specifies the host NUMA policy that should be used for guest VM processes.

> So, personally, I think the new XML should be a child node inside <numatune>.

>> And we could specify multiple such lines, one for each node.
>>
>> The -numa and -smp options in qemu do not work all that well together since they are parsed independently of each other, and one could specify a cpu set with the -numa option that is incompatible with the sockets, cores and threads specified by the -smp option. This should be fixed in qemu, but given that such a problem has been observed, should libvirt tie the specification of numa and smp (sockets, threads, cores) together so that one is forced to specify only valid combinations of nodes and cpus in libvirt? Maybe something like this: (OPTION 2)
>>
>> <cpu>
>>   ...
>>   <topology sockets='1' cores='2' threads='1' nodeid='0' cpus='0-1' mem='size'/>
>>   <topology sockets='1' cores='2' threads='1' nodeid='1' cpus='2-3' mem='size'/>
>>   ...
>> </cpu>
>
> This will mean we have 3 places for NUMA. One is <numatune>,

As I observed above, this controls the NUMA policy of the guest VM threads on the host.

> the other is <vcpu>,

<vcpu>/cpuset specifies how vcpu threads should be pinned on the host.

> and this one.

I think what we are addressing here is a bit different from the above two.
Here we are actually trying to _define_ the NUMA topology of the guest, while via the other capabilities (numatune, vcpu) we only control the cpu and memory bindings of vcpu threads on the host. Hence I am not sure if numatune is the right place for defining host NUMA topology, which btw should be independent of the host topology.

Thanks for your response.

Regards,
Bharata.
Re: [libvirt] [RFC] NUMA topology specification
2011/8/19 Bharata B Rao bharata@gmail.com:
> Hence I am not sure if numatune is the right place for defining
> host NUMA topology which btw should be independent of the host
  ^^ should read "guest"

Sorry for the typo.
Re: [libvirt] [RFC] NUMA topology specification
2011/8/19 Osier Yang jy...@redhat.com:
> Maybe something like:
>
> <numatune>
>   <guest>
>     ..
>   </guest>
> </numatune>

Yes, one possible solution. Let me wait and see what others say and what the consensus will be.

Regards,
Bharata.