Re: [libvirt] [Gluster-devel] [PATCH v3 UPDATED 1/2] Qemu/Gluster: Add Gluster protocol as supported network disk formats.

2012-10-27 Thread Bharata B Rao
On Fri, Oct 26, 2012 at 10:27 PM, Harsh Prateek Bora
ha...@linux.vnet.ibm.com wrote:
 @@ -1624,7 +1626,15 @@
  td one of the sheepdog servers (default is localhost:7000) 
 /td
  td zero or one /td
/tr
 +  tr
 +td gluster /td
 +td a server running glusterd daemon /td

Its actually the server which contains the volume file specification
of the gluster volume

Regards,
Bharata.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [RFC PATCH v1 1/2] Qemu/Gluster: Add Gluster protocol as supported network disk formats.

2012-09-06 Thread Bharata B Rao
On Wed, Sep 5, 2012 at 8:45 PM, Eric Blake ebl...@redhat.com wrote:
 On 09/05/2012 09:08 AM, Bharata B Rao wrote:
 On Wed, Sep 5, 2012 at 7:03 PM, Jiri Denemark jdene...@redhat.com wrote:
 @@ -1042,6 +1043,13 @@
  attribute name=port
ref name=unsignedInt/
  /attribute
 +attribute name=transport
 +  choice
 +valuesocket/value
 +valueunix/value
 +valuerdma/value

 This could be a bit confusing as socket is too generic, after all unix is 
 also
 a socket. Could we change the values tcp, unix, rdma or something
 similar depending on what socket was supposed to mean?

 That is how gluster calls it and hence I am using the same in QEMU and
 the same is true here too. This is something for gluster developers to
 decide if they want to change socket to something more specific like
 tcp as you suggest.

 Just because gluster calls it a confusing name does not mean we have to
 repeat the confusion in libvirt - it is feasible to have a mapping where
 we name it 'tcp' in the XML but map that to 'socket' in the command line
 that eventually reaches gluster.  The question then becomes whether
 using sensible naming in libvirt, but no longer directly mapped to
 underlying gluster naming, will be the cause of its own set of headaches.

Vijay - would really like to have your inputs here...

- While the transport-type for a volume is shown as tcp in gluster
volume info, libgfapi forces me to use transport=socket to access the
same volume from QEMU. So does socket mean tcp really ? If so,
should I just switch over to using transport=tcp from QEMU ? If not,
can you explain a bit about the difference b/n socket and tcp
transport types ?

- Also apart from socket (or tcp ?), rdma and unix, are there any
other transport options that QEMU should care about ?

- Are rdma and unix transport types operational at the moment ? If
not, do you see them being used in gluster any time in the future ?
The reason behind asking this is to check if we are spending effort in
defining semantics in QEMU for a transport type that is never going to
be used in gluster. Also I see that gluster volume create supports
tcp and rdma but doesn't list unix as an option.

Regards,
Bharata.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [RFC PATCH v1 1/2] Qemu/Gluster: Add Gluster protocol as supported network disk formats.

2012-09-05 Thread Bharata B Rao
On Wed, Sep 5, 2012 at 7:03 PM, Jiri Denemark jdene...@redhat.com wrote:
 @@ -1042,6 +1043,13 @@
  attribute name=port
ref name=unsignedInt/
  /attribute
 +attribute name=transport
 +  choice
 +valuesocket/value
 +valueunix/value
 +valuerdma/value

 This could be a bit confusing as socket is too generic, after all unix is also
 a socket. Could we change the values tcp, unix, rdma or something
 similar depending on what socket was supposed to mean?

That is how gluster calls it and hence I am using the same in QEMU and
the same is true here too. This is something for gluster developers to
decide if they want to change socket to something more specific like
tcp as you suggest.

Regards,
Bharata.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [PATCH 2/6] Add New address type spapr-vio to domain.rng

2011-12-12 Thread Bharata B Rao
Original patch by Bharata. Updated to use {1,16} in spaprvioReg based
on example from Eric Blake.

Signed-off-by: Bharata B Rao bhar...@linux.vnet.ibm.com
Signed-off-by: Prerna Saxena pre...@linux.vnet.ibm.com
Signed-off-by: Michael Ellerman mich...@ellerman.id.au
---
 docs/schemas/domaincommon.rng |   18 ++
 1 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng
index 27ec1da..553a6f0 100644
--- a/docs/schemas/domaincommon.rng
+++ b/docs/schemas/domaincommon.rng
@@ -2199,6 +2199,13 @@
   ref name=usbPort/
 /attribute
   /define
+  define name=spaprvioaddress
+optional
+  attribute name=reg
+ref name=spaprvioReg/
+  /attribute
+/optional
+  /define
   define name=pciaddress
 optional
   attribute name=domain
@@ -2577,6 +2584,12 @@
   /attribute
   ref name=usbportaddress/
 /group
+group
+  attribute name=type
+valuespapr-vio/value
+  /attribute
+  ref name=spaprvioaddress/
+/group
   /choice
 /element
   /define
@@ -2866,4 +2879,9 @@
   param name=pattern[a-zA-Z0-9_\.:]+/param
 /data
   /define
+  define name=spaprvioReg
+data type=string
+  param name=pattern(0x)?[0-9a-fA-F]{1,16}/param
+/data
+  /define
 /grammar
-- 
1.7.7.3

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH v2 1/2] XML definitions for guest NUMA and parsing routines

2011-12-06 Thread Bharata B Rao
On Wed, Dec 07, 2011 at 03:49:59PM +1100, Michael Ellerman wrote:
 On Fri, 2011-11-11 at 18:21 +0530, Bharata B Rao wrote:
  XML definitions for guest NUMA and parsing routines.
  
  From: Bharata B Rao bhar...@linux.vnet.ibm.com
  
  This patch adds XML definitions for guest NUMA specification and contains
  routines to parse the same. The guest NUMA specification looks like this:
  
  cpu
  ...
  topology sockets='2' cores='4' threads='2'/
  numa
  cell cpus='0-7' memory='512000'/
  cell cpus='8-15' memory='512000'/
  /numa
  ...
  /cpu
 
 Hi Bharata,
 
 I realise I'm a bit late on this, but I'm just catching up on the list
 traffic.
 
 Firstly why is the XML tag called cell, it seems to represent what we
 would normally call a node in terms of NUMA?

I initially started out with node, but in libvirt node is referred
to as cell. Hence stuck to cell to be consistent.

 
 Also does the parser support disjoint ranges for cpus and memory? Or do
 we not need that complexity for some reason?
 
 For example what if I have a topology that looks like:
 
 Node 0:
   CPUs: 0-3,8-11

This is supported, and since we use the same parsing routine that parses
cpuset, we support fancy stuff like

cpus-0-4^3,8-11

But such things aren't support by QEMU currently.

   MEM : 0-1G,2G-3G

QEMU currently doesn't support such construct. Do we really want this
in guest topology ? There are some wierd NUMA topologies in the real world
which aren't supported as of now, we wanted to start simple and if needed
support futher extensions (Refer to discussions on my v0 post)

Regards,
Bharata.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [RFC v4 PATCH 5/5] PowerPC : Add ppc64 specific definitions to domain.rng

2011-12-01 Thread Bharata B Rao
On Thu, Dec 01, 2011 at 01:47:47PM -0500, Stefan Berger wrote:
 On 12/01/2011 01:08 PM, Prerna Saxena wrote:
 From: Bharata B Raobhar...@linux.vnet.ibm.com
 Date: Thu, 1 Dec 2011 21:21:34 +0530
 Subject: [PATCH 5/5] Add ppc64 specific definitions to domain.rng
 
 ppc64 as new arch type and pseries as new machine type are
 added underos  .../os. New address type spapr-vio is added.
 
 Signed-off-by: Bharata B Raobhar...@linux.vnet.ibm.com
 Signed-off-by: Prerna Saxenapre...@linux.vnet.ibm.com
 ---
 
 
 ACK. Test case(s) for the test/qemuxml2argvtest{.c} would be nice, too.

I am working on the testcase, will send it shortly.

Regards,
Bharata.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCHv3] conf: don't modify cpu set string during parsing

2011-11-18 Thread Bharata B Rao
On Fri, Nov 18, 2011 at 11:31:12AM -0700, Eric Blake wrote:
 None of the callers cared if str was updated to point to the next
 byte after the parsed cpuset; simplifying this results in quite
 a few code simplifications.  Additionally, virCPUDefParseXML was
 strdup()'ing a malloc()'d string; avoiding a memory copy resulted
 in less code.

Changes to virCPUDefParseXML look good. numa ... /numa XML
section is parsed correctly and qemu -numa options are generated
correctly after this change.

Regards,
Bharata.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH 1/3] XML definitions for guest NUMA

2011-11-11 Thread Bharata B Rao
On Mon, Nov 07, 2011 at 11:35:27AM -0700, Eric Blake wrote:
 On 11/06/2011 06:57 AM, Bharata B Rao wrote:
 XML definitions for guest NUMA and parsing routines.
 
 From: Bharata B Raobhar...@linux.vnet.ibm.com
 
 This patch adds XML definitions for guest NUMA specification and contains
 routines to parse the same. The guest NUMA specification looks like this:
 
 cpu
  ...
  topology sockets='2' cores='4' threads='2'/
  numa
  cell cpus='0-7' mems='512000'/
  cell cpus='8-15' mems='512000'/
  /numa
  ...
 /cpu
 
 Signed-off-by: Bharata B Raobhar...@linux.vnet.ibm.com
 ---
 
 +p
 +  Guest NUMA topology can be specifed usingcodenuma/code  element.
 +span class=sinceSince X.X.X/span
 
 Let's just put 0.9.8 here.  It's easier at feature freeze time to
 grep and replace a concrete 0.9.8 into a different numbering scheme
 (0.10.0, 1.0.0, ?) if we decide on something different than 0.9.8,
 than it is to remember to also search for X.X.X.

Ok, changed to 0.9.8 in v3.

 
 +/p
 +
 +pre
 +  ...
 +lt;cpugt;
 +...
 +lt;numagt;
 +lt;cell cpus='0-3' mems='512000'/gt;
 +lt;cell cpus='4-7' mems='512000'/gt;
 
 I understand 'cpus' (a valid word meaning multiple cpu units), but
 'mems' seems odd; I think it would be better naming this attribute
 'memory' to match our memory element at the top level.  Just
 because qemu's command line names the option mems= doesn't mean we
 should be stuck making our XML inconsistent.

Changed to memory.

 
 +lt;/numagt;
 +...
 +lt;/cpugt;
 +  .../pre
 +
 +p
 +  Eachcodecell/code  element specifies a NUMA cell or a NUMA node.
 +codecpus/code  specifies the CPU or range of CPUs that are part of
 +  the node.codemems/code  specifies the node memory in kilobytes
 +  (i.e. blocks of 1024 bytes). Each cell or node is assigned cellid
 +  or nodeid in the increasing order starting from 0.
 
 I agree with doing things in 1024-byte blocks [1], since memory
 and currentMemory are also in that unit.
 
 [1] 1024-byte blocks is technically kibibytes, not kilobytes; but
 you're copying from existing text, so at least we're consistent, not
 to mention fitting right in with the wikipedia complaint that KiB
 has had slow adoption by the computer industry:
 https://secure.wikimedia.org/wikipedia/en/wiki/Kibibyte :)
 
 +/p
 +
 +p
 +  This guest NUMA specification translates tocode-numa/code  command
 +  line option for QEMU/KVM. For the above example, the following QEMU
 +  command line option is generated:
 +code-numa node,nodeid=0,cpus=0-3,mems=512000 -numa 
 node,nodeid=1,cpus=4-7,mems=512000/code
 
 This paragraph is not necessary.  We don't need to give one
 hypervisor-specific example of how the XML is translated; it is
 sufficient to simply document the XML semantics in a way that can be
 implemented by any number of hypervisors.

Ok, removed this paragraph.

 
 +
 +define name=numaCell
 +element name=cell
 +attribute name=cpus
 +ref name=Cellcpus/
 
 Typically, ref names start with a lower case letter.

Ok, changed.

 
 +/attribute
 +attribute name=mems
 +ref name=Cellmems/
 +/attribute
 
 Is it possible for these attributes to be optional?  That is, on the
 qemu line, can I specify cpus= but not mems=, or mems= but not
 cpus=? If so, then the attributes need to be optional in the schema,
 and the code behave with sane defaults when one of the two
 attributes is not present.

Actually qemu allows both cpus and mem to be optional. But if we allow
that, we will have specifications like this:

numa
  cell /
  cell /
/numa

That looks ugly.

Either both of them should allowed to be optional or none of them, I am
going for the latter specification. IOW lets make both of them
mandatory in a numa cell.

 
 @@ -2745,4 +2767,14 @@
 param name=pattern[a-zA-Z0-9_\.:]+/param
   /data
 /define
 +define name=Cellcpus
 +data type=string
 +param 
 name=pattern([0-9]+(-[0-9]+)?|\^[0-9]+)(,([0-9]+(-[0-9]+)?|\^[0-9]+))*/param
 
 This looks like a repeat of define name=cpuset; if so, let's
 reuse that define instead of making a new one (and if not, why are
 we introducing yet another syntax?).

Fixed.

 
 +/data
 +/define
 +define name=Cellmems
 +data type=unsignedInt
 +param name=pattern[0-9]+/param
 
 Likewise, this looks like a repeat of define name=memoryKB, so
 lets reuse that.

Done.

 
 @@ -109,6 +114,19 @@ no_memory:
   return NULL;
   }
 
 +static int
 +virCPUDefNumaCPUs(virCPUDefPtr def)
 +{
 +int i, j, ncpus = 0;
 +
 +for (i = 0; i  def-ncells; i++) {
 +for (j = 0; j  VIR_DOMAIN_CPUMASK_LEN; j++) {
 +if (def-cells[i].cpumask[j])
 +ncpus++;
 +}
 +}
 
 Can this loop be made any faster by using count_one_bits?

Yes provided we start storing CPUs in bitmasks rather than
char arrays. But I saw other parts of the code (like cpuset)
storing CPUs in array and hence used the same for numa too.
In any case I got rid of this since we can

Re: [libvirt] [PATCH 3/3] qemu: Generate -numa command line option

2011-11-11 Thread Bharata B Rao
On Mon, Nov 07, 2011 at 11:47:10AM -0700, Eric Blake wrote:
 On 11/06/2011 06:59 AM, Bharata B Rao wrote:
 qemu: Generate -numa option
 
 From: Bharata B Raobhar...@linux.vnet.ibm.com
 
 Add routines to generate -numa QEMU command line option based on
 numa  .../numa  XML specifications.
 
 Signed-off-by: Bharata B Raobhar...@linux.vnet.ibm.com
 
 
 +static int
 +qemuBuildNumaCPUArgStr(char *cpumask, virBufferPtr buf)
 +{
 +int i, first, last;
 +int cpuSet = 0;
 +
 
 What happens if cpumask is all 0?
 
 +for (i = 0; i  VIR_DOMAIN_CPUMASK_LEN; i++) {
 +if (cpumask[i]) {
 
 This if branch is skipped,
 
 +if (cpuSet)
 +last = i;
 +else {
 +first = last = i;
 +cpuSet = 1;
 +}
 +} else {
 +if (!cpuSet)
 +continue;
 
 so this branch always continues,
 
 +if (first == last)
 +virBufferAsprintf(buf, %d,, first);
 +else
 +virBufferAsprintf(buf, %d-%d,, first, last);
 +cpuSet = 0;
 +}
 +}
 +
 +if (cpuSet) {
 
 and this if is skipped,
 
 +if (first == last)
 +virBufferAsprintf(buf, %d,, first);
 +else
 +virBufferAsprintf(buf, %d-%d,, first, last);
 +}
 +
 +/* Remove the trailing comma */
 +return virBufferTruncate(buf, 1);
 
 meaning that nothing was appended to buf, and you are now stripping
 unknown text, rather than a comma you just added.  Do we need a
 sanity check to ensure that the cpumask specifies at least one cpu?
 And if so, would that mask be better here, or up front at the xml
 parsing time?

Good catch. I am now ensuring that this doesn't get a mask with no cpus.

 
 +}
 +
 +static int
 +qemuBuildNumaArgStr(const virDomainDefPtr def, virCommandPtr cmd)
 +{
 +int i;
 +char *node;
 +virBuffer buf = VIR_BUFFER_INITIALIZER;
 +
 +for (i = 0; i  def-cpu-ncells; i++) {
 +virCommandAddArg(cmd, -numa);
 +virBufferAsprintf(buf, %s, node);
 
 More efficient as virBufferAddLit(buf, node), or...

Right.

 
 +virBufferAsprintf(buf, ,nodeid=%d, def-cpu-cells[i].cellid);
 
 merge these two into a single:
 
 virBufferAsprintf(buf, node,nodeid=%d, ...);
 
 +virBufferAsprintf(buf, ,cpus=);
 
 Again, with no % in the format string, it is more efficient to use
 virBufferAddLit.
 
 +
 +if (qemuBuildNumaCPUArgStr(def-cpu-cells[i].cpumask,buf))
 
 Generally, we prefer an explicit  0 comparison when checking for failure.

Ok as you prefer :)

 
 +goto error;
 +
 +virBufferAsprintf(buf, ,mems=%d, def-cpu-cells[i].mem);
 +
 
 Why do we need to bother with stripping a trailing comma in
 qemuBuildNumaCPUArgStr, if we will just be adding a comma back again
 here as the very next statement?  You could skip all the hassle of
 adding virBufferTruncate by just transferring the comma out of this
 statement and into qemuBuildNumaCPUArgStr

This is what I mentioned in my last iteration. But since you hinted
at extending virBuffer, I went on that path. Another motivation was to
keep qemuBuildNumaArgStr() generic enough so that it gives out comma/dash
separate CPU string without the ending comma in case this functionality
is needed elsewhere in libvirt. But I guess I will stick with not appending
comma to memory in which case this extension to virBuffer isn't needed.

 (that said, I still think
 virBufferTruncate will be a useful addition in other contexts).

I think we should add virBufferTruncate only when we get any user for that.
Hence I am not including virBufferTruncate in v3 patchset.

 
 +if (virBufferError(buf))
 +goto error;
 +
 +node = virBufferContentAndReset(buf);
 +virCommandAddArg(cmd, node);
 +VIR_FREE(node);
 
 It's more efficient to replace these five lines with one:
 
 virCommandAddArgBuffer(cmd, buf);

I can't because this is in a loop and I want to break out from
the loop in case of buffer error. virCommandAddArgBuffer doesn't allow that.
But I did replace the last 3 lines with virCommandAddArgBuffer :)

 
 @@ -3414,6 +3482,9 @@ qemuBuildCommandLine(virConnectPtr conn,
   virCommandAddArg(cmd, smp);
   VIR_FREE(smp);
 
 +if (def-cpu  def-cpu-ncells  qemuBuildNumaArgStr(def, cmd))
 
 Again, explicit  0 check when looking for errors.

Sure.

 
 +topology sockets=2 cores=4 threads=2/
 +numa
 +cell cpus=0-7 mems=109550/
 +cell cpus=8-15 mems=109550/
 
 Of course, this will need tweaking to match any XML changes made in
 1/3, but thanks for adding test cases!

Tweaked the testcases.

BTW the testcases were useful to me as they uncovered 2 bugs in my code!

 
 Overall, I think we'll need a v3 (you may want to use git send-email
 --subject-prefix=PATCHv3; it wasn't very clear from the subject line
 that this was already a v2 series), but I like where it's heading.

Will ensure [PATCH v3] for the next iteration.

Thanks. v3 on its

[libvirt] [PATCH v2 2/2] qemu: Generate -numa option

2011-11-11 Thread Bharata B Rao
qemu: Generate -numa option

From: Bharata B Rao bhar...@linux.vnet.ibm.com

Add routines to generate -numa QEMU command line option based on
numa ... /numa XML specifications.

Signed-off-by: Bharata B Rao bhar...@linux.vnet.ibm.com
---

 src/conf/cpu_conf.c|3 +
 src/qemu/qemu_command.c|   63 
 tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.args |5 ++
 tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.xml  |   25 
 tests/qemuxml2argvdata/qemuxml2argv-cpu-numa2.args |6 ++
 tests/qemuxml2argvdata/qemuxml2argv-cpu-numa2.xml  |   25 
 tests/qemuxml2argvtest.c   |2 +
 7 files changed, 128 insertions(+), 1 deletions(-)
 create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.args
 create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.xml
 create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-cpu-numa2.args
 create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-cpu-numa2.xml


diff --git a/src/conf/cpu_conf.c b/src/conf/cpu_conf.c
index 4aabe98..7c5bf69 100644
--- a/src/conf/cpu_conf.c
+++ b/src/conf/cpu_conf.c
@@ -352,11 +352,12 @@ virCPUDefParseXML(const xmlNodePtr node,
 
 ret  = virStrToLong_ui(memory, NULL, 10, def-cells[i].mem);
 if (ret == -1) {
+virCPUReportError(VIR_ERR_INTERNAL_ERROR,
+%s, _(Invalid 'memory' attribute in NUMA cell));
 VIR_FREE(cpus);
 VIR_FREE(memory);
 goto error;
 }
-
 VIR_FREE(cpus);
 VIR_FREE(memory);
 }
diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
index 01ee23f..7083928 100644
--- a/src/qemu/qemu_command.c
+++ b/src/qemu/qemu_command.c
@@ -3266,6 +3266,65 @@ qemuBuildSmpArgStr(const virDomainDefPtr def,
 return virBufferContentAndReset(buf);
 }
 
+static void
+qemuBuildNumaCPUArgStr(char *cpumask, virBufferPtr buf)
+{
+int i, first, last;
+int cpuSet = 0;
+
+for (i = 0; i  VIR_DOMAIN_CPUMASK_LEN; i++) {
+if (cpumask[i]) {
+if (cpuSet)
+last = i;
+else {
+first = last = i;
+cpuSet = 1;
+}
+} else {
+if (!cpuSet)
+continue;
+if (first == last)
+virBufferAsprintf(buf, %d,, first);
+else
+virBufferAsprintf(buf, %d-%d,, first, last);
+cpuSet = 0;
+   }
+}
+
+if (cpuSet) {
+if (first == last)
+virBufferAsprintf(buf, %d,, first);
+else
+virBufferAsprintf(buf, %d-%d,, first, last);
+}
+}
+
+static int
+qemuBuildNumaArgStr(const virDomainDefPtr def, virCommandPtr cmd)
+{
+int i;
+virBuffer buf = VIR_BUFFER_INITIALIZER;
+
+for (i = 0; i  def-cpu-ncells; i++) {
+virCommandAddArg(cmd, -numa);
+virBufferAsprintf(buf, node,nodeid=%d, def-cpu-cells[i].cellid);
+virBufferAddLit(buf, ,cpus=);
+qemuBuildNumaCPUArgStr(def-cpu-cells[i].cpumask, buf);
+virBufferAsprintf(buf, mem=%d,
+VIR_DIV_UP(def-cpu-cells[i].mem, 1024));
+
+if (virBufferError(buf))
+goto error;
+
+virCommandAddArgBuffer(cmd, buf);
+}
+return 0;
+
+error:
+virBufferFreeAndReset(buf);
+virReportOOMError();
+return -1;
+}
 
 /*
  * Constructs a argv suitable for launching qemu with config defined
@@ -3429,6 +3488,10 @@ qemuBuildCommandLine(virConnectPtr conn,
 virCommandAddArg(cmd, smp);
 VIR_FREE(smp);
 
+if (def-cpu  def-cpu-ncells)
+if (qemuBuildNumaArgStr(def, cmd)  0)
+goto error;
+
 if (qemuCapsGet(qemuCaps, QEMU_CAPS_NAME)) {
 virCommandAddArg(cmd, -name);
 if (driver-setProcessName 
diff --git a/tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.args 
b/tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.args
new file mode 100644
index 000..7c0dd30
--- /dev/null
+++ b/tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.args
@@ -0,0 +1,5 @@
+LC_ALL=C PATH=/bin HOME=/home/test USER=test LOGNAME=test ./qemu.sh -S -M pc \
+-m 214 -smp 16 -numa node,nodeid=0,cpus=0-7,mem=107 \
+-numa node,nodeid=1,cpus=8-15,mem=107 -nographic -monitor \
+unix:/tmp/test-monitor,server,nowait -no-acpi -boot n -net none -serial none \
+-parallel none -usb
diff --git a/tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.xml 
b/tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.xml
new file mode 100644
index 000..9fcdda8
--- /dev/null
+++ b/tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.xml
@@ -0,0 +1,25 @@
+domain type='qemu'
+  nameQEMUGuest1/name
+  uuidc7a5fdbd-edaf-9455-926a-d65c16db1809/uuid
+  memory219100/memory
+  currentMemory219100/currentMemory
+  vcpu16/vcpu
+  os
+type arch='x86_64' machine='pc'hvm/type
+boot dev='network'/
+  /os
+  cpu
+topology sockets=2 cores=4 threads=2/
+numa
+  cell cpus=0-7

[libvirt] [PATCH v2 0/2] Guest NUMA topology support

2011-11-11 Thread Bharata B Rao
Hi,

This is v2 of the patchset that adds support for specifying NUMA topology
for guests.

cpu
...
topology sockets='2' cores='4' threads='2'/
numa
cell cpus='0-7' mems='512000'/
cell cpus='8-15' mems='512000'/
/numa
...
/cpu

This change allows libvirt to generate -numa options for QEMU/KVM.

This patchset passes all tests except daemon-conf test.

Changes for v2
--
- Renamed mems to memory in NUMA cell specification.
- Make both cpus= and memory= mandatory in a NUMA cell.
- Reuse cpuset and memoryKB definitions for cpus and memory.
- Fix a bug in reading memory=.
- Correct error handling for usages of virXMLPropString.
- Support virsh dumpxml.
- Fix XML in domaincommon.rng so that numa works correctly for
  both cpu ... cpu as well as cpu match=... ... cpu
- Don't use virBufferTruncate.
- Modifiy qemuxml2argv test cases for s/mems/memory change.
- Use virBufferLit wherever possible.
- Now qemuBuildNumaCPUArgStr can't fail, hence doesn't need return val.
- Pass memory in MB to qemu.

v1 - https://www.redhat.com/archives/libvir-list/2011-November/msg00247.html
v0 - https://www.redhat.com/archives/libvir-list/2011-October/msg00025.html
RFC - http://permalink.gmane.org/gmane.comp.emulators.libvirt/44626

Regards,
Bharata.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [PATCH v2 1/2] XML definitions for guest NUMA and parsing routines

2011-11-11 Thread Bharata B Rao
XML definitions for guest NUMA and parsing routines.

From: Bharata B Rao bhar...@linux.vnet.ibm.com

This patch adds XML definitions for guest NUMA specification and contains
routines to parse the same. The guest NUMA specification looks like this:

cpu
...
topology sockets='2' cores='4' threads='2'/
numa
cell cpus='0-7' memory='512000'/
cell cpus='8-15' memory='512000'/
/numa
...
/cpu

Signed-off-by: Bharata B Rao bhar...@linux.vnet.ibm.com
---

 docs/formatdomain.html.in |   29 ++
 docs/schemas/domaincommon.rng |   31 ++-
 src/conf/cpu_conf.c   |   86 -
 src/conf/cpu_conf.h   |   13 ++
 src/conf/domain_conf.c|7 +++
 5 files changed, 163 insertions(+), 3 deletions(-)


diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
index cbad196..28a4edc 100644
--- a/docs/formatdomain.html.in
+++ b/docs/formatdomain.html.in
@@ -628,6 +628,35 @@
   /dd
 /dl
 
+p
+  Guest NUMA topology can be specifed using  codenuma/code element.
+  span class=sinceSince 0.9.8/span
+/p
+
+pre
+  ...
+  lt;cpugt;
+...
+lt;numagt;
+  lt;cell cpus='0-3' memory='512000'/gt;
+  lt;cell cpus='4-7' memory='512000'/gt;
+lt;/numagt;
+...
+  lt;/cpugt;
+  .../pre
+
+p
+  Each codecell/code element specifies a NUMA cell or a NUMA node.
+  codecpus/code specifies the CPU or range of CPUs that are part of
+  the node. codememory/code specifies the node memory in kilobytes
+  (i.e. blocks of 1024 bytes). Each cell or node is assigned cellid
+  or nodeid in the increasing order starting from 0.
+/p
+
+p
+  This guest NUMA specification is currently available only for QEMU/KVM.
+/p
+
 h3a name=elementsLifecycleLifecycle control/a/h3
 
 p
diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng
index b6f858e..afcaccc 100644
--- a/docs/schemas/domaincommon.rng
+++ b/docs/schemas/domaincommon.rng
@@ -2297,7 +2297,14 @@
   define name=cpu
 element name=cpu
   choice
-ref name=cpuTopology/
+group
+  optional
+ref name=cpuTopology/
+  /optional  
+  optional
+ref name=cpuNuma/
+  /optional
+/group
 group
   ref name=cpuMatch/
   interleave
@@ -2311,6 +2318,9 @@
 zeroOrMore
   ref name=cpuFeature/
 /zeroOrMore
+optional
+  ref name=cpuNuma/
+/optional
   /interleave
 /group
   /choice
@@ -2371,6 +2381,25 @@
 /element
   /define
 
+  define name=cpuNuma
+element name=numa
+  oneOrMore
+ref name=numaCell/
+  /oneOrMore
+/element
+  /define
+
+  define name=numaCell
+element name=cell
+  attribute name=cpus
+ref name=cpuset/
+  /attribute
+  attribute name=memory
+ref name=memoryKB/
+  /attribute
+/element
+  /define
+
   !--
   System information specification:
Placeholder for system specific informations likes the ones
diff --git a/src/conf/cpu_conf.c b/src/conf/cpu_conf.c
index 41e997e..4aabe98 100644
--- a/src/conf/cpu_conf.c
+++ b/src/conf/cpu_conf.c
@@ -28,6 +28,7 @@
 #include util.h
 #include buf.h
 #include cpu_conf.h
+#include domain_conf.h
 
 #define VIR_FROM_THIS VIR_FROM_CPU
 
@@ -67,6 +68,12 @@ virCPUDefFree(virCPUDefPtr def)
 VIR_FREE(def-features[i].name);
 VIR_FREE(def-features);
 
+for (i = 0 ; i  def-ncells ; i++) {
+VIR_FREE(def-cells[i].cpumask);
+VIR_FREE(def-cells[i].cpustr);
+}
+VIR_FREE(def-cells);
+
 VIR_FREE(def);
 }
 
@@ -109,7 +116,6 @@ no_memory:
 return NULL;
 }
 
-
 virCPUDefPtr
 virCPUDefParseXML(const xmlNodePtr node,
   xmlXPathContextPtr ctxt,
@@ -289,9 +295,75 @@ virCPUDefParseXML(const xmlNodePtr node,
 def-features[i].policy = policy;
 }
 
+if (virXPathNode(./numa[1], ctxt)) {
+VIR_FREE(nodes);
+n = virXPathNodeSet(./numa[1]/cell, ctxt, nodes);
+if (n = 0) {
+virCPUReportError(VIR_ERR_INTERNAL_ERROR,
+%s, _(NUMA topology defined without NUMA cells));
+goto error;
+}
+
+if (VIR_RESIZE_N(def-cells, def-ncells_max,
+ def-ncells, n)  0)
+goto no_memory;
+
+def-ncells = n;
+
+for (i = 0 ; i  n ; i++) {
+char *cpus, *cpus_parse, *memory;
+int cpumasklen = VIR_DOMAIN_CPUMASK_LEN;
+int ret, ncpus = 0;
+
+def-cells[i].cellid = i;
+cpus = cpus_parse = virXMLPropString(nodes[i], cpus);
+if (!cpus) {
+virCPUReportError(VIR_ERR_INTERNAL_ERROR,
+%s, _(Missing 'cpus' attribute in NUMA cell));
+goto error

Re: [libvirt] [PATCH 2/3] Routine to truncate virBuffer

2011-11-11 Thread Bharata B Rao
On Mon, Nov 07, 2011 at 11:12:07AM -0700, Eric Blake wrote:
 On 11/06/2011 06:58 AM, Bharata B Rao wrote:
 Routine to truncate virBuffer
 
 From: Bharata B Raobhar...@linux.vnet.ibm.com
 
 Add a helper to truncate virBuffer.
 
 /**
   * virBufferTruncate:
 
 +++ b/src/util/buf.c
 @@ -123,6 +123,31 @@ virBufferGrow(virBufferPtr buf, unsigned int len)
   }
 
   /**
 + * virBufferTruncate:
 + * @buf:  the buffer
 + * @len:  number of bytes by which the buffer is truncated
 + *
 + * Truncate the buffer by @len bytes.
 + *
 + * Returns zero on success or -1 on error
 + */
 +int
 +virBufferTruncate(virBufferPtr buf, unsigned int len)
 
 What good is returning an error, given that callers already have to
 use virBufferError for detecting other errors, and given that you
 aren't marking the function as ATTRIBUTE_RETURN_CHECK to require
 callers to respect that error?  One of the points of the virBuffer
 API is that most functions can return void (for example, see
 virBufferAdjustIndent), because the user doesn't care about error
 checking until the end.

Ok, point taken.

 
 +{
 +if (buf-error)
 +return -1;
 +
 +if ((signed)(buf-use - len)  0) {
 
 I tend to write the cast as (int), not (signed), if a cast is even
 needed.  You don't catch all possible overflows, though - if
 buf-use is 2, and len is 4294967295 (aka (unsigned)-1), then you've
 proceeded to expand(!) buf-use to 3, skipping an uninitialized
 byte.  A better filter would be:
 
 if (len  buf-use)
 
 +virBufferSetError(buf, -1);
 +return -1;
 +}
 +
 +buf-use -= len;
 +buf-content[buf-use] = '\0';
 
 This looks correct, once you filter out invalid len first.

Agreed.

 
 +++ b/tests/virbuftest.c
 @@ -123,7 +123,8 @@ static int testBufAutoIndent(const void *data 
 ATTRIBUTE_UNUSED)
   virBufferAddChar(buf, '\n');
   virBufferEscapeShell(buf,  11);
   virBufferAddChar(buf, '\n');
 -
 +virBufferAsprintf(buf, %d, 12);
 +virBufferTruncate(buf, 4);
 
 That gives no change in the expected output.  I'd almost rather see:
 
 virBufferAsprintf(buf, %d, 123);
 virBufferTruncate(buf, 1);
 
 as well as an update to the expected output to see output ending in 12.

The idea was to test with minimal change and hence resorted to this.

In any case, I am dropping this patch for the reason explained in another
thread.

Regards,
Bharata.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [PATCH 2/3] Routine to truncate virBuffer

2011-11-06 Thread Bharata B Rao
Routine to truncate virBuffer

From: Bharata B Rao bhar...@linux.vnet.ibm.com

Add a helper to truncate virBuffer.

/**
 * virBufferTruncate:
 * @buf:  the buffer
 * @len:  number of bytes by which the buffer is truncated
 *
 * Truncate the buffer by @len bytes.
 *
 * Returns zero on success or -1 on error
 */
int virBufferTruncate(virBufferPtr buf, unsigned int len)

This doesn't reduce the buffer size, but instead reduces
the in-use buffer.

Signed-off-by: Bharata B Rao bhar...@linux.vnet.ibm.com
---

 src/libvirt_private.syms |1 +
 src/util/buf.c   |   25 +
 src/util/buf.h   |1 +
 tests/virbuftest.c   |3 ++-
 4 files changed, 29 insertions(+), 1 deletions(-)


diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
index 6a1562e..3a2ae86 100644
--- a/src/libvirt_private.syms
+++ b/src/libvirt_private.syms
@@ -33,6 +33,7 @@ virBufferEscapeString;
 virBufferFreeAndReset;
 virBufferGetIndent;
 virBufferStrcat;
+virBufferTruncate;
 virBufferURIEncodeString;
 virBufferUse;
 virBufferVasprintf;
diff --git a/src/util/buf.c b/src/util/buf.c
index 5043128..4bba350 100644
--- a/src/util/buf.c
+++ b/src/util/buf.c
@@ -123,6 +123,31 @@ virBufferGrow(virBufferPtr buf, unsigned int len)
 }
 
 /**
+ * virBufferTruncate:
+ * @buf:  the buffer
+ * @len:  number of bytes by which the buffer is truncated
+ *
+ * Truncate the buffer by @len bytes.
+ *
+ * Returns zero on success or -1 on error
+ */
+int
+virBufferTruncate(virBufferPtr buf, unsigned int len)
+{
+if (buf-error)
+return -1;
+
+if ((signed)(buf-use - len)  0) {
+virBufferSetError(buf, -1);
+return -1;
+}
+
+buf-use -= len;
+buf-content[buf-use] = '\0';
+return 0;
+}
+
+/**
  * virBufferAdd:
  * @buf: the buffer to append to
  * @str: the string
diff --git a/src/util/buf.h b/src/util/buf.h
index 1a8ebfb..76f48e0 100644
--- a/src/util/buf.h
+++ b/src/util/buf.h
@@ -63,5 +63,6 @@ void virBufferURIEncodeString(virBufferPtr buf, const char 
*str);
 
 void virBufferAdjustIndent(virBufferPtr buf, int indent);
 int virBufferGetIndent(const virBufferPtr buf, bool dynamic);
+int virBufferTruncate(virBufferPtr buf, unsigned int len);
 
 #endif /* __VIR_BUFFER_H__ */
diff --git a/tests/virbuftest.c b/tests/virbuftest.c
index 51a62fe..2230cb0 100644
--- a/tests/virbuftest.c
+++ b/tests/virbuftest.c
@@ -123,7 +123,8 @@ static int testBufAutoIndent(const void *data 
ATTRIBUTE_UNUSED)
 virBufferAddChar(buf, '\n');
 virBufferEscapeShell(buf,  11);
 virBufferAddChar(buf, '\n');
-
+virBufferAsprintf(buf, %d, 12);
+virBufferTruncate(buf, 4);
 result = virBufferContentAndReset(buf);
 if (!result || STRNEQ(result, expected)) {
 virtTestDifference(stderr, expected, result);

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [PATCH 0/3] Guest NUMA topology support - v1

2011-11-06 Thread Bharata B Rao
Hi,

This patch series adds support for specifying NUMA topology for guests.

cpu
...
topology sockets='2' cores='4' threads='2'/
numa
cell cpus='0-7' mems='512000'/
cell cpus='8-15' mems='512000'/
/numa
...
/cpu

This change allows libvirt to generate -numa options for QEMU/KVM.

Changes since v0

- Patch re-organization as per danpb's comments.
- Use of virBufferAsPrintf instead of snprintf as per danpb's comments.
- Use cell instead of node as per danpb's suggestion.
- Add a helper to truncate virBuffer as per Eric's suggestion.
- Add documentation.
- Add testcases
- Add basic validity check for cpus specified in numa ... /numa

Testing

The patches pass all tests except domainschema test. I think this is not
a real failure, but the test is probably failing to pickup the
right domaincommon.rng file. Will investigate further.

TEST: domainschematest
   40 
   80 
  .!.. 120
   160
  .!.. 200
   240
   280
   320
  ...  323 FAILED

daemon-conf fails, but it fails even without my patches too. I guess my
patches are really affecting it.

TEST: daemon-conf
  .!!...!!!./daemon-conf: line 98: kill: (7811) - 
No such process
!   34  FAILED


v0
--
- https://www.redhat.com/archives/libvir-list/2011-October/msg00025.html

RFC
---
- http://permalink.gmane.org/gmane.comp.emulators.libvirt/44626

Regards,
Bharata.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [PATCH 1/3] XML definitions for guest NUMA

2011-11-06 Thread Bharata B Rao
XML definitions for guest NUMA and parsing routines.

From: Bharata B Rao bhar...@linux.vnet.ibm.com

This patch adds XML definitions for guest NUMA specification and contains
routines to parse the same. The guest NUMA specification looks like this:

cpu
...
topology sockets='2' cores='4' threads='2'/
numa
cell cpus='0-7' mems='512000'/
cell cpus='8-15' mems='512000'/
/numa
...
/cpu

Signed-off-by: Bharata B Rao bhar...@linux.vnet.ibm.com
---

 docs/formatdomain.html.in |   32 +
 docs/schemas/domaincommon.rng |   32 +
 src/conf/cpu_conf.c   |   62 +
 src/conf/cpu_conf.h   |   12 
 src/conf/domain_conf.c|7 +
 5 files changed, 145 insertions(+), 0 deletions(-)


diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
index c3e7752..8d070ca 100644
--- a/docs/formatdomain.html.in
+++ b/docs/formatdomain.html.in
@@ -628,6 +628,38 @@
   /dd
 /dl
 
+p
+  Guest NUMA topology can be specifed using  codenuma/code element.
+  span class=sinceSince X.X.X/span
+/p
+
+pre
+  ...
+  lt;cpugt;
+...
+lt;numagt;
+  lt;cell cpus='0-3' mems='512000'/gt;
+  lt;cell cpus='4-7' mems='512000'/gt;
+lt;/numagt;
+...
+  lt;/cpugt;
+  .../pre
+
+p
+  Each codecell/code element specifies a NUMA cell or a NUMA node.
+  codecpus/code specifies the CPU or range of CPUs that are part of
+  the node. codemems/code specifies the node memory in kilobytes
+  (i.e. blocks of 1024 bytes). Each cell or node is assigned cellid
+  or nodeid in the increasing order starting from 0.
+/p
+
+p
+  This guest NUMA specification translates to code-numa/code command
+  line option for QEMU/KVM. For the above example, the following QEMU
+  command line option is generated:
+  code-numa node,nodeid=0,cpus=0-3,mems=512000 -numa 
node,nodeid=1,cpus=4-7,mems=512000/code
+/p
+
 h3a name=elementsLifecycleLifecycle control/a/h3
 
 p
diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng
index b6f858e..b09060a 100644
--- a/docs/schemas/domaincommon.rng
+++ b/docs/schemas/domaincommon.rng
@@ -2311,6 +2311,9 @@
 zeroOrMore
   ref name=cpuFeature/
 /zeroOrMore
+optional
+  ref name=cpuNuma/
+/optional
   /interleave
 /group
   /choice
@@ -2371,6 +2374,25 @@
 /element
   /define
 
+  define name=cpuNuma
+element name=numa
+  oneOrMore
+ref name=numaCell/
+  /oneOrMore
+/element
+  /define
+
+  define name=numaCell
+element name=cell
+  attribute name=cpus
+ref name=Cellcpus/
+  /attribute
+  attribute name=mems
+ref name=Cellmems/
+  /attribute
+/element
+  /define
+
   !--
   System information specification:
Placeholder for system specific informations likes the ones
@@ -2745,4 +2767,14 @@
   param name=pattern[a-zA-Z0-9_\.:]+/param
 /data
   /define
+  define name=Cellcpus
+data type=string
+  param 
name=pattern([0-9]+(-[0-9]+)?|\^[0-9]+)(,([0-9]+(-[0-9]+)?|\^[0-9]+))*/param
+/data
+  /define
+  define name=Cellmems
+data type=unsignedInt
+  param name=pattern[0-9]+/param
+/data
+  /define
 /grammar
diff --git a/src/conf/cpu_conf.c b/src/conf/cpu_conf.c
index 41e997e..e55897f 100644
--- a/src/conf/cpu_conf.c
+++ b/src/conf/cpu_conf.c
@@ -28,6 +28,7 @@
 #include util.h
 #include buf.h
 #include cpu_conf.h
+#include domain_conf.h
 
 #define VIR_FROM_THIS VIR_FROM_CPU
 
@@ -67,6 +68,10 @@ virCPUDefFree(virCPUDefPtr def)
 VIR_FREE(def-features[i].name);
 VIR_FREE(def-features);
 
+for (i = 0 ; i  def-ncells ; i++)
+VIR_FREE(def-cells[i].cpumask);
+VIR_FREE(def-cells);
+
 VIR_FREE(def);
 }
 
@@ -109,6 +114,19 @@ no_memory:
 return NULL;
 }
 
+static int
+virCPUDefNumaCPUs(virCPUDefPtr def)
+{
+int i, j, ncpus = 0;
+
+for (i = 0; i  def-ncells; i++) {
+for (j = 0; j  VIR_DOMAIN_CPUMASK_LEN; j++) {
+if (def-cells[i].cpumask[j])
+ncpus++;
+}
+}
+return ncpus;
+}
 
 virCPUDefPtr
 virCPUDefParseXML(const xmlNodePtr node,
@@ -289,6 +307,50 @@ virCPUDefParseXML(const xmlNodePtr node,
 def-features[i].policy = policy;
 }
 
+if (virXPathNode(./numa[1], ctxt)) {
+VIR_FREE(nodes);
+n = virXPathNodeSet(./numa[1]/cell, ctxt, nodes);
+if (n  0 || n == 0) {
+virCPUReportError(VIR_ERR_INTERNAL_ERROR,
+%s, _(NUMA topology defined without NUMA cells));
+goto error;
+}
+
+if (VIR_RESIZE_N(def-cells, def-ncells_max,
+ def-ncells, n)  0)
+goto no_memory;
+
+def-ncells = n;
+
+for (i = 0 ; i  n ; i

[libvirt] [PATCH 3/3] qemu: Generate -numa command line option

2011-11-06 Thread Bharata B Rao
qemu: Generate -numa option

From: Bharata B Rao bhar...@linux.vnet.ibm.com

Add routines to generate -numa QEMU command line option based on
numa ... /numa XML specifications.

Signed-off-by: Bharata B Rao bhar...@linux.vnet.ibm.com
---

 src/qemu/qemu_command.c|   71 
 tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.args |5 +
 tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.xml  |   25 +++
 tests/qemuxml2argvdata/qemuxml2argv-cpu-numa2.args |6 ++
 tests/qemuxml2argvdata/qemuxml2argv-cpu-numa2.xml  |   25 +++
 tests/qemuxml2argvtest.c   |2 +
 6 files changed, 134 insertions(+), 0 deletions(-)
 create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.args
 create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.xml
 create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-cpu-numa2.args
 create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-cpu-numa2.xml


diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
index dc92fa3..b042e34 100644
--- a/src/qemu/qemu_command.c
+++ b/src/qemu/qemu_command.c
@@ -3251,6 +3251,74 @@ qemuBuildSmpArgStr(const virDomainDefPtr def,
 return virBufferContentAndReset(buf);
 }
 
+static int
+qemuBuildNumaCPUArgStr(char *cpumask, virBufferPtr buf)
+{
+int i, first, last;
+int cpuSet = 0;
+
+for (i = 0; i  VIR_DOMAIN_CPUMASK_LEN; i++) {
+if (cpumask[i]) {
+if (cpuSet)
+last = i;
+else {
+first = last = i;
+cpuSet = 1;
+}
+} else {
+if (!cpuSet)
+continue;
+if (first == last)
+virBufferAsprintf(buf, %d,, first);
+else
+virBufferAsprintf(buf, %d-%d,, first, last);
+cpuSet = 0;
+   }
+}
+
+if (cpuSet) {
+if (first == last)
+virBufferAsprintf(buf, %d,, first);
+else
+virBufferAsprintf(buf, %d-%d,, first, last);
+}
+
+/* Remove the trailing comma */
+return virBufferTruncate(buf, 1);
+}
+
+static int
+qemuBuildNumaArgStr(const virDomainDefPtr def, virCommandPtr cmd)
+{
+int i;
+char *node;
+virBuffer buf = VIR_BUFFER_INITIALIZER;
+
+for (i = 0; i  def-cpu-ncells; i++) {
+virCommandAddArg(cmd, -numa);
+virBufferAsprintf(buf, %s, node);
+virBufferAsprintf(buf, ,nodeid=%d, def-cpu-cells[i].cellid);
+virBufferAsprintf(buf, ,cpus=);
+
+if (qemuBuildNumaCPUArgStr(def-cpu-cells[i].cpumask, buf))
+goto error;
+
+virBufferAsprintf(buf, ,mems=%d, def-cpu-cells[i].mem);
+
+if (virBufferError(buf))
+goto error;
+
+node = virBufferContentAndReset(buf);
+virCommandAddArg(cmd, node);
+VIR_FREE(node);
+}
+return 0;
+
+error:
+virBufferFreeAndReset(buf);
+virReportOOMError();
+return -1;
+}
 
 /*
  * Constructs a argv suitable for launching qemu with config defined
@@ -3414,6 +3482,9 @@ qemuBuildCommandLine(virConnectPtr conn,
 virCommandAddArg(cmd, smp);
 VIR_FREE(smp);
 
+if (def-cpu  def-cpu-ncells  qemuBuildNumaArgStr(def, cmd))
+goto error;
+
 if (qemuCapsGet(qemuCaps, QEMU_CAPS_NAME)) {
 virCommandAddArg(cmd, -name);
 if (driver-setProcessName 
diff --git a/tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.args 
b/tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.args
new file mode 100644
index 000..e8fd277
--- /dev/null
+++ b/tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.args
@@ -0,0 +1,5 @@
+LC_ALL=C PATH=/bin HOME=/home/test USER=test LOGNAME=test ./qemu.sh -S -M pc \
+-m 214 -smp 16 -numa node,nodeid=0,cpus=0-7,mems=109550 \
+-numa node,nodeid=1,cpus=8-15,mems=109550 -nographic -monitor \
+unix:/tmp/test-monitor,server,nowait -no-acpi -boot n -net none -serial none \
+-parallel none -usb
diff --git a/tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.xml 
b/tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.xml
new file mode 100644
index 000..07fe891
--- /dev/null
+++ b/tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.xml
@@ -0,0 +1,25 @@
+domain type='qemu'
+  nameQEMUGuest1/name
+  uuidc7a5fdbd-edaf-9455-926a-d65c16db1809/uuid
+  memory219100/memory
+  currentMemory219100/currentMemory
+  vcpu16/vcpu
+  os
+type arch='x86_64' machine='pc'hvm/type
+boot dev='network'/
+  /os
+  cpu
+topology sockets=2 cores=4 threads=2/
+numa
+  cell cpus=0-7 mems=109550/
+  cell cpus=8-15 mems=109550/
+/numa
+  /cpu
+  clock offset='utc'/
+  on_poweroffdestroy/on_poweroff
+  on_rebootrestart/on_reboot
+  on_crashdestroy/on_crash
+  devices
+  emulator/./qemu.sh/emulator
+  /devices
+/domain
diff --git a/tests/qemuxml2argvdata/qemuxml2argv-cpu-numa2.args 
b/tests/qemuxml2argvdata/qemuxml2argv-cpu-numa2.args
new file mode 100644
index 000..bc96cef
--- /dev/null
+++ b/tests/qemuxml2argvdata/qemuxml2argv-cpu-numa2.args
@@ -0,0

Re: [libvirt] [RFC PATCH 2/2] Routines to parse numa ... /numa

2011-10-13 Thread Bharata B Rao
On Thu, Oct 13, 2011 at 12:50:00PM +0100, Daniel P. Berrange wrote:
 On Mon, Oct 03, 2011 at 03:31:35PM +0530, Bharata B Rao wrote:
  Routines to parse numa ... /numa
  
  From: Bharata B Rao bhar...@linux.vnet.ibm.com
  
  This patch adds routines to parse guest numa
  XML configuration for qemu.
  
  Signed-off-by: Bharata B Rao bhar...@linux.vnet.ibm.com
  ---
  
   src/conf/cpu_conf.c |   48 
   src/conf/cpu_conf.h |   11 ++
   src/qemu/qemu_command.c |   94 
  +++
   3 files changed, 153 insertions(+), 0 deletions(-)
  
  
  diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
  index a13ba71..5b4345e 100644
  --- a/src/qemu/qemu_command.c
  +++ b/src/qemu/qemu_command.c
  @@ -3153,6 +3153,97 @@ qemuBuildSmpArgStr(const virDomainDefPtr def,
   return virBufferContentAndReset(buf);
   }
   
  +static char *
  +virParseNodeCPUs(char *cpumask)
 
 This is a rather misleading name, since it is really formatting
 the argument value. Can you rename this to qemuBuildNumaCPUArgStr

Sure will do.

 
  +{
  +int i, first, last, ret;
  +char *cpus, *ptr;
  +int cpuSet = 0;
  +int remaining = 128;
  +
  +if (VIR_ALLOC_N(cpus, remaining)  0)
  +return NULL;
  +
  +ptr = cpus;
  +
  +for (i = 0; i  VIR_DOMAIN_CPUMASK_LEN; i++) {
  +if (cpumask[i]) {
  +if (cpuSet)
  +last = i;
  +else {
  +first = last = i;
  +cpuSet = 1;
  +}
  +} else {
  +if (!cpuSet)
  +continue;
  +if (first == last)
  +ret = snprintf(ptr, remaining, %d,, first);
  +else
  +ret = snprintf(ptr, remaining, %d-%d,, first, last);
  +if (ret  remaining)
  +goto error;
  +ptr += ret;
  +remaining -= ret;
  +cpuSet = 0;
  +   }
  +}
  +
  +if (cpuSet) {
  +if (first == last)
  +ret = snprintf(ptr, remaining, %d,, first);
  +else
  +ret = snprintf(ptr, remaining, %d-%d,, first, last);
  +if (ret  remaining)
  +goto error;
  +}
  +
  +/* Remove the trailing comma */
  +*(--ptr) = '\0';
  +return cpus;
  +
  +error:
  +VIR_FREE(cpus);
  +return NULL;
 
 Using VIR_ALLOC_N + snprintf here is not desirable, when you
 already have a nice  virBufferPtr object in the call that you
 could use.  Just pass the virBufferPtr straight into this
 method.

Wanted to user virBufferPtr, but I realized that I need to remove
the last comma from the string and coudn't find an easy way to
do that. Hence resorted to this method. But I think I can still
achive this by not appending a comma to the next part of the agrument
(,mems). Let me see if I can do this cleanly in the next post.

 
  +}
  +
  +static int
  +qemuBuildNumaArgStr(const virDomainDefPtr def, virCommandPtr cmd)
  +{
  +int i;
  +char *cpus, *node;
  +virBuffer buf = VIR_BUFFER_INITIALIZER;
  +
  +for (i = 0; i  def-cpu-nnodes; i++) {
  +virCommandAddArg(cmd, -numa);
  +virBufferAsprintf(buf, %s, node);
  +virBufferAsprintf(buf, ,nodeid=%d, def-cpu-nodes[i].nodeid);
  +
  +cpus = virParseNodeCPUs(def-cpu-nodes[i].cpumask);
  +if (!cpus)
  +goto error;
  +
  +virBufferAsprintf(buf, ,cpus=%s, cpus);
  +virBufferAsprintf(buf, ,mems=%d, def-cpu-nodes[i].mem);
  +
  +if (virBufferError(buf)) {
  +VIR_FREE(cpus);
  +goto error;
  +}
  +
  +node = virBufferContentAndReset(buf);
  +virCommandAddArg(cmd, node);
  +
  +VIR_FREE(cpus);
  +VIR_FREE(node);
  +}
  +return 0;
  +
  +error:
  +virBufferFreeAndReset(buf);
  +virReportOOMError();
  +return -1;
  +}
   
   /*
* Constructs a argv suitable for launching qemu with config defined
  @@ -3319,6 +3410,9 @@ qemuBuildCommandLine(virConnectPtr conn,
   virCommandAddArg(cmd, smp);
   VIR_FREE(smp);
   
  +if (def-cpu-nnodes  qemuBuildNumaArgStr(def, cmd))
  +goto error;
  +
   if (qemuCapsGet(qemuCaps, QEMU_CAPS_NAME)) {
   virCommandAddArg(cmd, -name);
   if (driver-setProcessName 
 
 Looks fine code wise. For the future iterations, it would be good to
 change the split of patches slightly
 
   - Patch 1 for XML bits:  cpu_conf.c, cpu_conf.h, and docs/schemas/domain.rn
   - Patch 2 for qemu_command.c, tests/qemuxml2argvtest.c

Sure will rearrange in the next iteration.

Thanks for your review.

Regards,
Bharata.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [RFC PATCH 0/3] Guest NUMA topology support - v0

2011-10-13 Thread Bharata B Rao
On Thu, Oct 13, 2011 at 12:53:22PM +0100, Daniel P. Berrange wrote:
 On Mon, Oct 03, 2011 at 03:28:44PM +0530, Bharata B Rao wrote:
  Hi,
  
  I discussed the possibilities of adding NUMA topology XML specification
  support for guests here some time back. Since my latest proposal
  (http://permalink.gmane.org/gmane.comp.emulators.libvirt/44626)
  didn't get any response, I am posting a prototype implementation
  that supports specifying NUMA topology for QEMU guests.
  
  - The implementation is based on the last proposal I listed above.
 
 So we're basically only allowing a flat NUMA toplogy
 
  numa
 node  cpus='0,2,4,6' mems='1024
 node  cpus='8,10,12,14' mems='1024
 node  cpus='1,3,5,7' mems='1024'
 node  cpus='9,11,13,15' mems='1024'
  /numa
 
 which mirrors what QEMU allows currently. Should we need
 to support a hierarchy, we can trivially extend this
 syntax in a backwards compatible fashion
 
  numa
 node
   node  cpus='0,2,4,6' mems='1024
   node  cpus='8,10,12,14' mems='1024
 /node
node
   node  cpus='1,3,5,7' mems='1024'
   node  cpus='9,11,13,15' mems='1024'
/node
  /numa
 
 so I think this limitation is OK for now.

Fine then.

 
 In the virsh capabilities XML, we actually use the
 word 'cell' rather than 'node'. I think it might
 be preferrable to be consistent and use 'cell' here
 too.

I feel NUMA node sounds more familiar than NUMA cell.
But if libvirt prefers cell, we can go with cell I suppose.

 
  - The implementation is for QEMU only.
 
 That's fine.
 
  - The patchset has gone through extremely light testing and I have
just tested booting a 2 socket 2 core 2 thread QEMU guest.
  - I haven't really bothered to cover all the corner cases and haven't
run libvirt tests after this patchset. For eg, there is no code
to validate if the CPU combination specified by topology and
numa match with each other. I plan to cover all these after we
freeze the specification itself.
 
 ok
 
 
 WRT the question about CPU enumeration order in the URL quoted
 above. I don't think it matters whether we enumerate CPUs in the
 same order as real hardware. The key thing is that we just choose
 an order, document what *our* enumeration order is, and then stick
 to it forever.

Ok fine.

Regards,
Bharata.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [RFC PATCH 1/2] Add XML definitions for guest NUMA specifications

2011-10-03 Thread Bharata B Rao
Add XML definitions for guest NUMA specifications.

From: Bharata B Rao bhar...@linux.vnet.ibm.com

NUMA topology for guest is specified as follows:

cpu
...
numa
node cpus='0-3' mems='1024'
node cpus='4,5,6,7' mems='1024'
node cpus='8-10',11-12^12' mems='1024'
/numa

/cpu

Signed-off-by: Bharata B Rao bhar...@linux.vnet.ibm.com
---

 docs/schemas/domaincommon.rng |   32 
 1 files changed, 32 insertions(+), 0 deletions(-)


diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng
index 4972fac..99b70c3 100644
--- a/docs/schemas/domaincommon.rng
+++ b/docs/schemas/domaincommon.rng
@@ -2252,6 +2252,9 @@
 zeroOrMore
   ref name=cpuFeature/
 /zeroOrMore
+optional
+  ref name=cpuNuma/
+/optional
   /interleave
 /group
   /choice
@@ -2312,6 +2315,25 @@
 /element
   /define
 
+  define name=cpuNuma
+element name=numa
+  oneOrMore
+ref name=numaNode/
+  /oneOrMore
+/element
+  /define
+
+  define name=numaNode
+element name=node
+  attribute name=cpus
+ref name=Nodecpus/
+  /attribute
+  attribute name=mems
+ref name=Nodemems/
+  /attribute
+/element
+  /define
+
   !--
   System information specification:
Placeholder for system specific informations likes the ones
@@ -2665,4 +2687,14 @@
   param name=pattern[a-zA-Z0-9_\.:]+/param
 /data
   /define
+  define name=Nodecpus
+data type=string
+  param 
name=pattern([0-9]+(-[0-9]+)?|\^[0-9]+)(,([0-9]+(-[0-9]+)?|\^[0-9]+))*/param
+/data
+  /define
+  define name=Nodemems
+data type=unsignedInt
+  param name=pattern[0-9]+/param
+/data
+  /define
 /grammar

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [RFC PATCH 0/3] Guest NUMA topology support - v0

2011-10-03 Thread Bharata B Rao
Hi,

I discussed the possibilities of adding NUMA topology XML specification
support for guests here some time back. Since my latest proposal
(http://permalink.gmane.org/gmane.comp.emulators.libvirt/44626)
didn't get any response, I am posting a prototype implementation
that supports specifying NUMA topology for QEMU guests.

- The implementation is based on the last proposal I listed above.
- The implementation is for QEMU only.
- The patchset has gone through extremely light testing and I have
  just tested booting a 2 socket 2 core 2 thread QEMU guest.
- I haven't really bothered to cover all the corner cases and haven't
  run libvirt tests after this patchset. For eg, there is no code
  to validate if the CPU combination specified by topology and
  numa match with each other. I plan to cover all these after we
  freeze the specification itself.

Regards,
Bharata.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [RFC PATCH 2/2] Routines to parse numa ... /numa

2011-10-03 Thread Bharata B Rao
Routines to parse numa ... /numa

From: Bharata B Rao bhar...@linux.vnet.ibm.com

This patch adds routines to parse guest numa
XML configuration for qemu.

Signed-off-by: Bharata B Rao bhar...@linux.vnet.ibm.com
---

 src/conf/cpu_conf.c |   48 
 src/conf/cpu_conf.h |   11 ++
 src/qemu/qemu_command.c |   94 +++
 3 files changed, 153 insertions(+), 0 deletions(-)


diff --git a/src/conf/cpu_conf.c b/src/conf/cpu_conf.c
index 5cecda2..c520025 100644
--- a/src/conf/cpu_conf.c
+++ b/src/conf/cpu_conf.c
@@ -28,6 +28,7 @@
 #include util.h
 #include buf.h
 #include cpu_conf.h
+#include domain_conf.h
 
 #define VIR_FROM_THIS VIR_FROM_CPU
 
@@ -67,6 +68,10 @@ virCPUDefFree(virCPUDefPtr def)
 VIR_FREE(def-features[i].name);
 VIR_FREE(def-features);
 
+for (i = 0 ; i  def-nnodes ; i++)
+VIR_FREE(def-nodes[i].cpumask);
+VIR_FREE(def-nodes);
+
 VIR_FREE(def);
 }
 
@@ -289,6 +294,49 @@ virCPUDefParseXML(const xmlNodePtr node,
 def-features[i].policy = policy;
 }
 
+if (virXPathNode(./numa[1], ctxt)) {
+VIR_FREE(nodes);
+n = virXPathNodeSet(./numa[1]/node, ctxt, nodes);
+if (n  0 || n == 0) {
+virCPUReportError(VIR_ERR_INTERNAL_ERROR,
+%s, _(NUMA topology defined without NUMA nodes));
+goto error;
+}
+
+if (VIR_RESIZE_N(def-nodes, def-nnodes_max,
+ def-nnodes, n)  0)
+goto no_memory;
+
+def-nnodes = n;
+
+for (i = 0 ; i  n ; i++) {
+char *cpus;
+int cpumasklen = VIR_DOMAIN_CPUMASK_LEN;
+unsigned long ul;
+int ret;
+
+def-nodes[i].nodeid = i;
+cpus = virXMLPropString(nodes[i], cpus);
+
+if (VIR_ALLOC_N(def-nodes[i].cpumask, cpumasklen)  0)
+goto no_memory;
+
+if (virDomainCpuSetParse((const char **)cpus,
+ 0, def-nodes[i].cpumask,
+ cpumasklen)  0)
+goto error;
+
+ret = virXPathULong(string(./numa[1]/node/@mems),
+ctxt, ul);
+if (ret  0) {
+virCPUReportError(VIR_ERR_INTERNAL_ERROR,
+%s, _(Missing 'mems' attribute in NUMA topology));
+goto error;
+}
+def-nodes[i].mem = (unsigned int) ul;
+}
+}
+
 cleanup:
 VIR_FREE(nodes);
 
diff --git a/src/conf/cpu_conf.h b/src/conf/cpu_conf.h
index 57b85e1..266ec81 100644
--- a/src/conf/cpu_conf.h
+++ b/src/conf/cpu_conf.h
@@ -67,6 +67,14 @@ struct _virCPUFeatureDef {
 int policy; /* enum virCPUFeaturePolicy */
 };
 
+typedef struct _virNodeDef virNodeDef;
+typedef virNodeDef *virNodeDefPtr;
+struct _virNodeDef {
+   int nodeid;
+   char *cpumask;  /* CPUs that are part of this node */
+   unsigned int mem;   /* Node memory */
+};
+
 typedef struct _virCPUDef virCPUDef;
 typedef virCPUDef *virCPUDefPtr;
 struct _virCPUDef {
@@ -81,6 +89,9 @@ struct _virCPUDef {
 size_t nfeatures;
 size_t nfeatures_max;
 virCPUFeatureDefPtr features;
+size_t nnodes;
+size_t nnodes_max;
+virNodeDefPtr nodes;
 };
 
 
diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
index a13ba71..5b4345e 100644
--- a/src/qemu/qemu_command.c
+++ b/src/qemu/qemu_command.c
@@ -3153,6 +3153,97 @@ qemuBuildSmpArgStr(const virDomainDefPtr def,
 return virBufferContentAndReset(buf);
 }
 
+static char *
+virParseNodeCPUs(char *cpumask)
+{
+int i, first, last, ret;
+char *cpus, *ptr;
+int cpuSet = 0;
+int remaining = 128;
+
+if (VIR_ALLOC_N(cpus, remaining)  0)
+return NULL;
+
+ptr = cpus;
+
+for (i = 0; i  VIR_DOMAIN_CPUMASK_LEN; i++) {
+if (cpumask[i]) {
+if (cpuSet)
+last = i;
+else {
+first = last = i;
+cpuSet = 1;
+}
+} else {
+if (!cpuSet)
+continue;
+if (first == last)
+ret = snprintf(ptr, remaining, %d,, first);
+else
+ret = snprintf(ptr, remaining, %d-%d,, first, last);
+if (ret  remaining)
+goto error;
+ptr += ret;
+remaining -= ret;
+cpuSet = 0;
+   }
+}
+
+if (cpuSet) {
+if (first == last)
+ret = snprintf(ptr, remaining, %d,, first);
+else
+ret = snprintf(ptr, remaining, %d-%d,, first, last);
+if (ret  remaining)
+goto error;
+}
+
+/* Remove the trailing comma */
+*(--ptr) = '\0';
+return cpus;
+
+error:
+VIR_FREE(cpus);
+return NULL;
+}
+
+static int
+qemuBuildNumaArgStr(const virDomainDefPtr def, virCommandPtr cmd)
+{
+int i;
+char *cpus, *node;
+virBuffer buf = VIR_BUFFER_INITIALIZER

[libvirt] Dependency between vcpu and topology

2011-09-15 Thread Bharata B Rao
Hi,

Is there a dependency b/n vcpu and topology ?

Is specifying vcpu mandatory for a SMP guest ?

Can't I skip vcpu but have topology sockets= cores= threads= to
define a SMP guest ?

I see that if I skip vcpu, libvirt generates -smp 1 on qemu command
line which results in qemu booting a VM with single CPU irrespective
of the topology specified. Shouldn't libvirt arrive at correct CPU
count (vcpus=sockets*cores*threads ?) even if vcpu isn't specified
rather than defaulting to 1 ?

Regards,
Bharata.
-- 
http://bharata.sulekha.com/blog/posts.htm, http://raobharata.wordpress.com/

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [RFC] NUMA topology specification

2011-09-01 Thread Bharata B Rao
On Tue, Aug 30, 2011 at 10:53 AM, Bharata B Rao bharata@gmail.com wrote:
 Hi,

 Here is another attempt at guest NUMA topology XML specification that
 should work for different NUMA topologies.


Hi Daniel,

Do you think I should go ahead and implement this ?  Any comments or concerns ?

Regards,
Bharata.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [RFC] NUMA topology specification

2011-08-29 Thread Bharata B Rao
Hi,

Here is another attempt at guest NUMA topology XML specification that
should work for different NUMA topologies.

We already specify the number of sockets, cores and threads a system
has by using:

cpu
topology sockets='2' cores='2' threads='2'
/cpu

For NUMA, we can add the following:

numa
node cpus='0-3' mems='1024'
node cpus='4-7' mems='1024'
/numa

Specifying only cpus in the NUMA node specification should be enough
to represent most of the topologies. Based on the number of cpus
specified in each node, we should be able to work out how many cores
and sockets will be part of each node. Only other thing needed is
explicit memory specification.

I have taken a few example NUMA topologies here and shown how the
above specification can help.

Magny cours
-
Topology desc: http://code.google.com/p/likwid-topology/wiki/AMD_MagnyCours8

cpu
topology sockets='4' cores='4' threads='1'
numa
node cpus='0-3' mems='1024'
node cpus='4-7' mems='1024'
node cpus='8-11' mems='1024'
node cpus='12-15'mems='1024'
/numa
cpu

OR if we want to stick to how CPUs get enumerated in real hardware we
can specify like this:

cpu
topology sockets='4' cores='4' threads='1'
numa
node  cpus='0,2,4,6' mems='1024
node  cpus='8,10,12,14' mems='1024
node  cpus='1,3,5,7' mems='1024'
node  cpus='9,11,13,15' mems='1024'
/numa
cpu

The above two specifations for Magny Cours aren't perfect because we
conveniently converted the multi-level NUMA into sigle level NUMA.
System has 2 sockets and 2 NUMA domains consisting of 4 cores in each
domain, but we aren't really reflecting this in the topology
specification. But does this really matter ? We are still showing 4
distinct NUMA domains.

Nehalem

Topology desc: http://code.google.com/p/likwid-topology/wiki/Intel_Nehalem

cpu
topology sockets='2' cores='4' threads='2'
numa
node cpus='0-7' mems='1024'
node cpus='8-15' mems='1024'
/numa
/cpu

OR if we want to stick to how CPUs get enumerated in real hardware we
can specify like this:

cpu
topology sockets='2' cores='4' threads='2'
numa
node cpus='0-3,8-11' mems='1024'
node cpus='4-7,12-15' mems='1024'
/numa
/cpu

However there is a problem here. The specification isn't granular
enough to specify which CPU is part of which core. As you can see in
the topology diagram, CPUs 0,8 belong to one core, CPUs 1,9 belong to
one core etc. So the whole point of specifying all the CPUs explicitly
in the specification gets defeated.

Dunnington
---
Topology desc: 2 nodes, 4 sockets in each node, 6 cores in each socket.

cpu
topology sockets='8' cores='6 threads='1'
numa
node cpus='0-23' mems=1024
node cpus='24-47' mems=1024
/numa
/cpu

Here also there is the same problem. CPUs 0,4,8,12,16,,20 belong to a
core but the specifcation doesn't allow for that.

So here are some questions that we need to answer:

- Can we just go with flat NUMA specification and convert multilevel
NUMA into flat NUMA wherever possible (like in the above Magny cours
eg) ?
- Are there topologies where this doesn't work ?
- Isn't it enough to enumerate CPUs serially among cores and sockets
and not enumerate them exactly as in real hardware ?

Regards,
Bharata.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [RFC] NUMA topology specification

2011-08-24 Thread Bharata B Rao
On Tue, Aug 23, 2011 at 7:43 PM, Daniel P. Berrange berra...@redhat.com wrote:
 On Fri, Aug 19, 2011 at 12:05:43PM +0530, Bharata B Rao wrote:
 Hi,

 qemu supports specification of NUMA topology on command line using -numa 
 option.

 -numa node[,mem=size][,cpus=cpu[-cpu]][,nodeid=node]

 I see that there is no way to specify such NUMA topology in libvirt
 XML. Are there plans to add support for NUMA topology specification ?
 Is anybody already working on this ? If not I would like to add this
 support for libvirt.

 Currently the topology specification available in libvirt ( topology
 sockets='1' cores='2' threads='1'/) translates to -smp
 sockets=1,cores=2,threads=1 option of qemu. There is not equivalent
 in libvirt that could generate -numa command line option of qemu.

 How about something like this ? (OPTION 1)

 cpu
 ...
 numa nodeid='node' cpus='cpu[-cpu]' mem='size'
 ...
 /cpu

 And we could specify multiple such lines, one for each node.

 I'm not sure it really makes sense having the NUMA memory config
 inside the cpu configuration, but i like the simplicity of
 of this specification.

Yes, memory specification inside cpu, may be we could define a
separate numa section as shown in your examples below and put it
outside of cpu.


 -numa and -smp options in qemu do not work all that well since they
 are parsed independent of each other and one could specify a cpu set
 with -numa option that is incompatible with sockets,cores and threads
 specified on -smp option. This should be fixed in qemu, but given that
 such a problem has been observed, should libvirt tie the specification
 of numa and smp (sockets,threads,cores) together so that one is forced
 to specify only valid combinations of nodes and cpus in libvirt ?

 No matter what we do, libvirt is going to have todo some kind of
 semantic validation on the different info.

Right. Given that we have vcpus as well as vcpu current, libvirt
needs to ensure that specified topology is sane.


 May be something like this: (OPTION 2)

 cpu
 ...
 topology sockets='1' cores='2' threads='1' nodeid='0' cpus='0-1' mem='size'
 topology sockets='1' cores='2' threads='1' nodeid='1' cpus='2-3' mem='size'
 ...
 /cpu

 This should result in a 2 node system with each node having 1 socket
 with 2 cores.

 This has the problem of redundancy of specification of the sockets,
 cores  threads, vs the new 'cpus' attribute. eg you can specify
 wierd configs like:

Yes, sockets,cores,threads become redundant, one option is to define
sockets,cores,threads once (like we currently do inside cpu) and
have it as a common definition for all the numa nodes defined.
Something like this:

cpu
  topology sockets='1' cores='2' threads='1'
  numa node cpus='0-1' mems=1024
  numa node cpus='2-3' mems=1024
/cpu

This will result in a 2 node system with each node having 1 socket
with 2 cores each. But as you can see this will be restrictive since
you can't specify different topologies for different nodes. Are there
such non-symmetric systems out there and should libvirt be flexible
enough to support such NUMA topologies for VMs ?

Also looks like nodeid (from OPTION 2 of my original mail) is
redundant, may be we should assign increasing node ids based on the
number of numa topology statements that appear. In the above example,
we can implicitly assign node ids 0 and 1 for two nodes.

Adam Litke suggested that we can omit cpus= from the specification
since it can be derived, but given that there are topologies that
don't enumerate the CPUs within a socket serially, it becomes
necessary to have an explicit cpus= specification.


  topology sockets='1' cores='2' threads='1' nodeid='0' cpus='0-1' mem='size'
  topology sockets='2' cores='1' threads='1' nodeid='1' cpus='2-3' mem='size'

 Or even  bogus configs

  topology sockets='1' cores='2' threads='1' nodeid='0' cpus='0-1' mem='size'
  topology sockets='4' cores='1' threads='1' nodeid='1' cpus='2-3' mem='size'

 That all said, given our current XML schema, it is inevitable that we
 will have some level of duplication of information.


 Some things that are important to consider are how this interacts with
 possible CPU / memory hotplug in the future,

So what are the issues we need to take care of here ?

 and how we will be able
 to pin guest NUMA nodes, to host NUMA nodes.

This would be a good thing to do in libvirt. I think libvirt should
intelligently place VMs on host nodes based on the guest topology.

But I don't clearly see what issues we need to take care of now while
we come up with NUMA topology definition for VM.


 For the first point, it might be desirable to create a NUMA topology
 which supports upto 8 logical CPUs, but only have 2 physical sockets
 actually plugged in at boot time.

 Also, I dread to question whether we want to be able to represent a
 multi-level NUMA topology, or just assume one level. If we want to
 be able to cope with multi-level topology, can we assume the levels
 are solely grouping at the socket

Re: [libvirt] [RFC] NUMA topology specification

2011-08-21 Thread Bharata B Rao
On Fri, Aug 19, 2011 at 7:10 PM, Adam Litke a...@us.ibm.com wrote:
 On 08/19/2011 01:35 AM, Bharata B Rao wrote:
 ...
 topology sockets='1' cores='2' threads='1' nodeid='0' cpus='0-1' mem='size'
 topology sockets='1' cores='2' threads='1' nodeid='1' cpus='2-3' mem='size'
 ...

 I like the idea of merging this into topology to prevent errors with
 specifying incompatible cpu and numa topologies but I think you can go a
 step further (assuming my following assertion is valid).  Since cpus are
 assigned to numa nodes at the core level, and you are providing a
 'nodeid' attribute, you can infer the 'cpus' attribute using 'cores' and
 'nodeid' alone.

 For your example above:
 topology sockets='1' cores='2' threads='1' nodeid='0' mem='size'
 topology sockets='1' cores='2' threads='1' nodeid='1' mem='size'

 You have 4 cores total, each node is assigned 2.  Assign cores to nodes
 starting with core 0 and node 0.

Sounds good. Unless anyone or any architecture has specific
requirements of enumerating CPUs differently across nodes, 'cpus=' is
redundant as you observe.

Regards,
Bharata.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [RFC] NUMA topology specification

2011-08-19 Thread Bharata B Rao
Hi,

qemu supports specification of NUMA topology on command line using -numa option.

-numa node[,mem=size][,cpus=cpu[-cpu]][,nodeid=node]

I see that there is no way to specify such NUMA topology in libvirt
XML. Are there plans to add support for NUMA topology specification ?
Is anybody already working on this ? If not I would like to add this
support for libvirt.

Currently the topology specification available in libvirt ( topology
sockets='1' cores='2' threads='1'/) translates to -smp
sockets=1,cores=2,threads=1 option of qemu. There is not equivalent
in libvirt that could generate -numa command line option of qemu.

How about something like this ? (OPTION 1)

cpu
...
numa nodeid='node' cpus='cpu[-cpu]' mem='size'
...
/cpu

And we could specify multiple such lines, one for each node.

-numa and -smp options in qemu do not work all that well since they
are parsed independent of each other and one could specify a cpu set
with -numa option that is incompatible with sockets,cores and threads
specified on -smp option. This should be fixed in qemu, but given that
such a problem has been observed, should libvirt tie the specification
of numa and smp (sockets,threads,cores) together so that one is forced
to specify only valid combinations of nodes and cpus in libvirt ?

May be something like this: (OPTION 2)

cpu
...
topology sockets='1' cores='2' threads='1' nodeid='0' cpus='0-1' mem='size'
topology sockets='1' cores='2' threads='1' nodeid='1' cpus='2-3' mem='size'
...
/cpu

This should result in a 2 node system with each node having 1 socket
with 2 cores.

Comments, suggestions ?

Regards,
Bharata.
-- 
http://bharata.sulekha.com/blog/posts.htm, http://raobharata.wordpress.com/

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [RFC] NUMA topology specification

2011-08-19 Thread Bharata B Rao
On Fri, Aug 19, 2011 at 12:55 PM, Osier Yang jy...@redhat.com wrote:
 于 2011年08月19日 14:35, Bharata B Rao 写道:

 How about something like this ? (OPTION 1)

 cpu
 ...
 numa nodeid='node' cpus='cpu[-cpu]' mem='size'
 ...
 /cpu


 Libvirt already supported NUMA setting (both cpu and memory)
 on host yet, but yes, nothing for NUMA setting inside guest yet.

 We have talked once about the XML when adding the support
 for numa memory setting on host. And finally choosed to introduce
 new XML node for it with considering to add support for NUMA
 setting inside guest one day. The XML is:

 numatune
 memory mode=strict nodeset=1-4,^3/
 /numatune

But this only specifies the host NUMA policy that should be used for
guest VM processes.


 So, personlly, I think the new XML should be inside numatune
 as a child node.


 And we could specify multiple such lines, one for each node.

 -numa and -smp options in qemu do not work all that well since they
 are parsed independent of each other and one could specify a cpu set
 with -numa option that is incompatible with sockets,cores and threads
 specified on -smp option. This should be fixed in qemu, but given that
 such a problem has been observed, should libvirt tie the specification
 of numa and smp (sockets,threads,cores) together so that one is forced
 to specify only valid combinations of nodes and cpus in libvirt ?

 May be something like this: (OPTION 2)

 cpu
 ...
 topology sockets='1' cores='2' threads='1' nodeid='0' cpus='0-1'
 mem='size'
 topology sockets='1' cores='2' threads='1' nodeid='1' cpus='2-3'
 mem='size'
 ...
 /cpu

 This will cause we have 3 places for NUMA,
 one is numatune,

As I observed above, this controls the NUMA policy of the guest VM
threads on host.

 the other is vcpu,

vcpu/cpuset specifies how vcpu threads should be pinned on host.

 and this one.

I think what we are addressing here is a bit different from the above
two. Here we are actually trying to _define_ the NUMA topology of the
guest, while via other capabilites (numatune, vcpu) we only control
the cpu and memory bindings of vcpu threads on host.

Hence I am not sure if if numatune is the right place for defining
host NUMA topology which btw should be independent of the host
topology.

Thanks for your response.

Regards,
Bharata.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [RFC] NUMA topology specification

2011-08-19 Thread Bharata B Rao
2011/8/19 Bharata B Rao bharata@gmail.com:
 Hence I am not sure if if numatune is the right place for defining
 host NUMA topology which btw should be independent of the host
^^guest

Sorry for the typo.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [RFC] NUMA topology specification

2011-08-19 Thread Bharata B Rao
2011/8/19 Osier Yang jy...@redhat.com:

 Maybe something like:

 numatune
 guest
 ..
 /guest
 /numatune

Yes, one possible solution.

Let me wait and see what others say and what will be the consensus.

Regards,
Bharata.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list