Re: [libvirt] [PATCH] openvz: swap source bridge=... with target dev=...

2008-10-03 Thread Evgeniy Sokolov

Hi.

I rewrite functions taken from vzctl in the new patch, so this:
 



Here is the patch, that implements the following behaviour

* interface name inside container is automatically generated and
equals ethN,
   where N is the number of that interface within current domain
* mac address of that interface inside container is generated
automatically by
   function openvzGenerateMac
* if target dev= is specified, use it; otherwise, use the
default openvz name,
   i.e., vethN.M for interface ethM in container with veid N
* if mac address='...' specified, use it; otherwise, vzctl
will generate it automatically
* target dev and mac address are (re)stored (from)to $veid.conf


is true, while that:


the functions openvzGenerateMac and
openvzGenerateVethName are taken from vzctl sources and slightly
changed then


is not true anymore.


+static char *
+openvzGenerateMac(void)
+{
+char mac[6] = {
+0x52,
+0x54,
+0x00,
How did you get 0x52, 0x54, 0x00?

+if (net-type == VIR_DOMAIN_NET_TYPE_BRIDGE) {
+static int vnetNo = 0;

Libvirt may be use as library in applications. If some will call create 
2 containers, then first container will have eth0...ethN

second will have ethN+1...

Other looks good.





--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH] openvz: swap source bridge=... with target dev=...

2008-10-03 Thread Anton Protopopov
2008/10/3 Evgeniy Sokolov [EMAIL PROTECTED]

  Hi.

 I rewrite functions taken from vzctl in the new patch, so this:


Here is the patch, that implements the following behaviour

* interface name inside container is automatically generated and
equals ethN,
   where N is the number of that interface within current domain
* mac address of that interface inside container is generated
automatically by
   function openvzGenerateMac
* if target dev= is specified, use it; otherwise, use the
default openvz name,
   i.e., vethN.M for interface ethM in container with veid N
* if mac address='...' specified, use it; otherwise, vzctl
will generate it automatically
* target dev and mac address are (re)stored (from)to $veid.conf


 is true, while that:


the functions openvzGenerateMac and
openvzGenerateVethName are taken from vzctl sources and slightly
changed then


 is not true anymore.


 +static char *
 +openvzGenerateMac(void)
 +{
 +char mac[6] = {
 +0x52,
 +0x54,
 +0x00,
 How did you get 0x52, 0x54, 0x00?


This is the same as in virDomainNetRandomMAC() function, see the source code
for it.




 +if (net-type == VIR_DOMAIN_NET_TYPE_BRIDGE) {
 +static int vnetNo = 0;

 Libvirt may be use as library in applications. If some will call create 2
 containers, then first container will have eth0...ethN
 second will have ethN+1...

OK, I will try to fix it.


 Other looks good.

Actually, there is a bug :( When we generate host interface name, we need to
save it in net-ifname.



P.S. Now I am working on that:



 P.S. Are someone going to implement
  interface type='bridge'
 ...
  source bridge=...
 ...
   /interface
 part of openvz driver? :)

 I plan to implement it in a month.
 It will be fine if you are ready to develop the feature.


I'm using $veid.conf to store information about bridge device in the
following manner:
for interface ifname, therhe will be a line
   #BRIGDE(ifname): bridge name
for example,
   #BRIDGE(veth101.0): virbr1

Do you agree with that behaviour?
--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH] openvz: swap source bridge=... with target dev=...

2008-10-03 Thread Daniel P. Berrange
On Fri, Oct 03, 2008 at 04:04:06PM +0400, Evgeniy Sokolov wrote:
 
 P.S. Now I am working on that:
 
 
 
 P.S. Are someone going to implement
  interface type='bridge'
 ...
  source bridge=...
 ...
   /interface
 part of openvz driver? :)
 
 I plan to implement it in a month.
 It will be fine if you are ready to develop the feature.
 
 
 I'm using $veid.conf to store information about bridge device in the 
 following manner:
 for interface ifname, therhe will be a line
#BRIGDE(ifname): bridge name
 for example,
#BRIDGE(veth101.0): virbr1
 
 Do you agree with that behaviour?
 I think we can simplify format to simplify parsing.
 #BRIDGE=virbr1:veth101.0,veth101.1,veth101.2

There's no need to store  veth101.0, veth101.2 - they are automatically
generated  not intended to be stable across restarts. It is merely
neccessary to persist the name of the bridge (or virtual network) to
which they are attached, and the MAC address.

Daniel
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] PATCH: 2 of 5: Remove virDomainObjPtr linked list

2008-10-03 Thread Daniel P. Berrange
This patch removes the linked list in the virDomainObjPtr object, and
adds a new virDomainObjList struct to track domains as an array. The
QEMU, LXC, OpenVZ and Test drivers get updated to account for this API
change.

 domain_conf.c   |   99 
 domain_conf.h   |   23 +++---
 lxc_conf.h  |2 
 lxc_driver.c|  112 +--
 openvz_conf.c   |   24 ++
 openvz_conf.h   |2 
 openvz_driver.c |   54 +++
 qemu_conf.h |2 
 qemu_driver.c   |  198 +---
 test.c  |   77 +++--
 10 files changed, 274 insertions(+), 319 deletions(-)


Daniel

diff -r 2bb12d98c26b src/domain_conf.c
--- a/src/domain_conf.c Fri Oct 03 12:45:58 2008 +0100
+++ b/src/domain_conf.c Fri Oct 03 12:58:03 2008 +0100
@@ -166,44 +166,40 @@
 }
 
 
-virDomainObjPtr virDomainFindByID(const virDomainObjPtr doms,
+virDomainObjPtr virDomainFindByID(const virDomainObjListPtr doms,
   int id)
 {
-virDomainObjPtr dom = doms;
-while (dom) {
-if (virDomainIsActive(dom)  dom-def-id == id)
-return dom;
-dom = dom-next;
-}
+unsigned int i;
+
+for (i = 0 ; i  doms-count ; i++)
+if (virDomainIsActive(doms-objs[i]) 
+doms-objs[i]-def-id == id)
+return doms-objs[i];
 
 return NULL;
 }
 
 
-virDomainObjPtr virDomainFindByUUID(const virDomainObjPtr doms,
+virDomainObjPtr virDomainFindByUUID(const virDomainObjListPtr doms,
 const unsigned char *uuid)
 {
-virDomainObjPtr dom = doms;
+unsigned int i;
 
-while (dom) {
-if (!memcmp(dom-def-uuid, uuid, VIR_UUID_BUFLEN))
-return dom;
-dom = dom-next;
-}
+for (i = 0 ; i  doms-count ; i++)
+if (!memcmp(doms-objs[i]-def-uuid, uuid, VIR_UUID_BUFLEN))
+return doms-objs[i];
 
 return NULL;
 }
 
-virDomainObjPtr virDomainFindByName(const virDomainObjPtr doms,
+virDomainObjPtr virDomainFindByName(const virDomainObjListPtr doms,
 const char *name)
 {
-virDomainObjPtr dom = doms;
+unsigned int i;
 
-while (dom) {
-if (STREQ(dom-def-name, name))
-return dom;
-dom = dom-next;
-}
+for (i = 0 ; i  doms-count ; i++)
+if (STREQ(doms-objs[i]-def-name, name))
+return doms-objs[i];
 
 return NULL;
 }
@@ -425,13 +421,24 @@
 VIR_FREE(dom);
 }
 
+void virDomainObjListFree(virDomainObjListPtr vms)
+{
+unsigned int i;
+
+for (i = 0 ; i  vms-count ; i++)
+virDomainObjFree(vms-objs[i]);
+
+VIR_FREE(vms-objs);
+vms-count = 0;
+}
+
 virDomainObjPtr virDomainAssignDef(virConnectPtr conn,
-   virDomainObjPtr *doms,
+   virDomainObjListPtr doms,
const virDomainDefPtr def)
 {
 virDomainObjPtr domain;
 
-if ((domain = virDomainFindByName(*doms, def-name))) {
+if ((domain = virDomainFindByName(doms, def-name))) {
 if (!virDomainIsActive(domain)) {
 virDomainDefFree(domain-def);
 domain-def = def;
@@ -451,33 +458,41 @@
 
 domain-state = VIR_DOMAIN_SHUTOFF;
 domain-def = def;
-domain-next = *doms;
 
-*doms = domain;
+if (VIR_REALLOC_N(doms-objs, doms-count + 1)  0) {
+virDomainReportError(conn, VIR_ERR_NO_MEMORY, NULL);
+VIR_FREE(domain);
+return NULL;
+}
+
+doms-objs[doms-count] = domain;
+doms-count++;
 
 return domain;
 }
 
-void virDomainRemoveInactive(virDomainObjPtr *doms,
+void virDomainRemoveInactive(virDomainObjListPtr doms,
  virDomainObjPtr dom)
 {
-virDomainObjPtr prev = NULL;
-virDomainObjPtr curr = *doms;
+unsigned int i;
 
-while (curr 
-   curr != dom) {
-prev = curr;
-curr = curr-next;
+for (i = 0 ; i  doms-count ; i++) {
+if (doms-objs[i] == dom) {
+virDomainObjFree(doms-objs[i]);
+
+if (i  (doms-count - 1))
+memmove(doms-objs + i, doms-objs + i + 1,
+sizeof(*(doms-objs)) * (doms-count - (i + 1)));
+
+if (VIR_REALLOC_N(doms-objs, doms-count - 1)  0) {
+; /* Failure to reduce memory allocation isn't fatal */
+}
+doms-count--;
+
+break;
+}
 }
 
-if (curr) {
-if (prev)
-prev-next = curr-next;
-else
-*doms = curr-next;
-}
-
-virDomainObjFree(dom);
 }
 
 #ifndef PROXY
@@ -3266,7 +3281,7 @@
 
 virDomainObjPtr virDomainLoadConfig(virConnectPtr conn,
 virCapsPtr caps,
-virDomainObjPtr *doms,
+virDomainObjListPtr doms,
 const char 

Re: [libvirt] PATCH: 3 of 5: Remove virNetworkObjPtr linked list

2008-10-03 Thread Daniel P. Berrange
This patch removes the linked list in the virNetworkObjPtr object, and
adds a new virNetworkObjList struct to track domains as an array. The
virtual network  test drivers get updated to account for this API
change.


 network_conf.c   |   88 +++-
 network_conf.h   |   20 +
 network_driver.c |  120 ---
 test.c   |   75 --
 4 files changed, 145 insertions(+), 158 deletions(-)


Daniel

diff -r 25b5be9d400e src/network_conf.c
--- a/src/network_conf.cFri Oct 03 12:58:03 2008 +0100
+++ b/src/network_conf.cFri Oct 03 12:58:27 2008 +0100
@@ -69,28 +69,26 @@
 }
 
 
-virNetworkObjPtr virNetworkFindByUUID(const virNetworkObjPtr nets,
+virNetworkObjPtr virNetworkFindByUUID(const virNetworkObjListPtr nets,
   const unsigned char *uuid)
 {
-virNetworkObjPtr net = nets;
-while (net) {
-if (!memcmp(net-def-uuid, uuid, VIR_UUID_BUFLEN))
-return net;
-net = net-next;
-}
+unsigned int i;
+
+for (i = 0 ; i  nets-count ; i++)
+if (!memcmp(nets-objs[i]-def-uuid, uuid, VIR_UUID_BUFLEN))
+return nets-objs[i];
 
 return NULL;
 }
 
-virNetworkObjPtr virNetworkFindByName(const virNetworkObjPtr nets,
+virNetworkObjPtr virNetworkFindByName(const virNetworkObjListPtr nets,
   const char *name)
 {
-virNetworkObjPtr net = nets;
-while (net) {
-if (STREQ(net-def-name, name))
-return net;
-net = net-next;
-}
+unsigned int i;
+
+for (i = 0 ; i  nets-count ; i++)
+if (STREQ(nets-objs[i]-def-name, name))
+return nets-objs[i];
 
 return NULL;
 }
@@ -141,13 +139,24 @@
 VIR_FREE(net);
 }
 
+void virNetworkObjListFree(virNetworkObjListPtr nets)
+{
+unsigned int i;
+
+for (i = 0 ; i  nets-count ; i++)
+virNetworkObjFree(nets-objs[i]);
+
+VIR_FREE(nets-objs);
+nets-count = 0;
+}
+
 virNetworkObjPtr virNetworkAssignDef(virConnectPtr conn,
- virNetworkObjPtr *nets,
+ virNetworkObjListPtr nets,
  const virNetworkDefPtr def)
 {
 virNetworkObjPtr network;
 
-if ((network = virNetworkFindByName(*nets, def-name))) {
+if ((network = virNetworkFindByName(nets, def-name))) {
 if (!virNetworkIsActive(network)) {
 virNetworkDefFree(network-def);
 network-def = def;
@@ -166,34 +175,41 @@
 }
 
 network-def = def;
-network-next = *nets;
 
-*nets = network;
+if (VIR_REALLOC_N(nets-objs, nets-count + 1)  0) {
+virNetworkReportError(conn, VIR_ERR_NO_MEMORY, NULL);
+VIR_FREE(network);
+return NULL;
+}
+
+nets-objs[nets-count] = network;
+nets-count++;
 
 return network;
 
 }
 
-void virNetworkRemoveInactive(virNetworkObjPtr *nets,
+void virNetworkRemoveInactive(virNetworkObjListPtr nets,
   const virNetworkObjPtr net)
 {
-virNetworkObjPtr prev = NULL;
-virNetworkObjPtr curr = *nets;
+unsigned int i;
 
-while (curr 
-   curr != net) {
-prev = curr;
-curr = curr-next;
+for (i = 0 ; i  nets-count ; i++) {
+if (nets-objs[i] == net) {
+virNetworkObjFree(nets-objs[i]);
+
+if (i  (nets-count - 1))
+memmove(nets-objs + i, nets-objs + i + 1,
+sizeof(*(nets-objs)) * (nets-count - (i + 1)));
+
+if (VIR_REALLOC_N(nets-objs, nets-count - 1)  0) {
+; /* Failure to reduce memory allocation isn't fatal */
+}
+nets-count--;
+
+break;
+}
 }
-
-if (curr) {
-if (prev)
-prev-next = curr-next;
-else
-*nets = curr-next;
-}
-
-virNetworkObjFree(net);
 }
 
 
@@ -699,7 +715,7 @@
 }
 
 virNetworkObjPtr virNetworkLoadConfig(virConnectPtr conn,
-  virNetworkObjPtr *nets,
+  virNetworkObjListPtr nets,
   const char *configDir,
   const char *autostartDir,
   const char *file)
@@ -753,7 +769,7 @@
 }
 
 int virNetworkLoadAllConfigs(virConnectPtr conn,
- virNetworkObjPtr *nets,
+ virNetworkObjListPtr nets,
  const char *configDir,
  const char *autostartDir)
 {
diff -r 25b5be9d400e src/network_conf.h
--- a/src/network_conf.hFri Oct 03 12:58:03 2008 +0100
+++ b/src/network_conf.hFri Oct 03 12:58:27 2008 +0100
@@ -88,8 +88,13 @@
 
 virNetworkDefPtr def; /* The current definition */
 virNetworkDefPtr newDef; /* New definition 

Re: [libvirt] PATCH: 4 of 5: Remove virStoragePoolObjPtr linked list

2008-10-03 Thread Daniel P. Berrange
This patch removes the linked list in the virStoragePoolObjPtr and
and virStorageVolDefPtr objects, and adds new virStoragePoolObjList
and virStorageVolDefList structs to track pools  vols as arrays. The
storage driver driver gets updated to account for this API change.

 storage_backend_disk.c|   10 +
 storage_backend_fs.c  |   75 +
 storage_backend_iscsi.c   |9 +
 storage_backend_logical.c |   11 +-
 storage_conf.c|  165 --
 storage_conf.h|   37 --
 storage_driver.c  |  253 ++
 7 files changed, 290 insertions(+), 270 deletions(-)

Daniel

diff -r 03f1715be403 src/storage_backend_disk.c
--- a/src/storage_backend_disk.cFri Oct 03 12:58:28 2008 +0100
+++ b/src/storage_backend_disk.cFri Oct 03 13:07:54 2008 +0100
@@ -185,9 +185,13 @@
 return -1;
 }
 
-vol-next = pool-volumes;
-pool-volumes = vol;
-pool-nvolumes++;
+if (VIR_REALLOC_N(pool-volumes.objs,
+  pool-volumes.count+1)  0) {
+virStorageReportError(conn, VIR_ERR_NO_MEMORY, _(volume));
+virStorageVolDefFree(vol);
+return -1;
+}
+pool-volumes.objs[pool-volumes.count++] = vol;
 
 /* Prepended path will be same for all partitions, so we can
  * strip the path to form a reasonable pool-unique name
diff -r 03f1715be403 src/storage_backend_fs.c
--- a/src/storage_backend_fs.c  Fri Oct 03 12:58:28 2008 +0100
+++ b/src/storage_backend_fs.c  Fri Oct 03 13:07:54 2008 +0100
@@ -822,6 +822,7 @@
 DIR *dir;
 struct dirent *ent;
 struct statvfs sb;
+virStorageVolDefPtr vol = NULL;
 
 if (!(dir = opendir(pool-def-target.path))) {
 virStorageReportError(conn, VIR_ERR_INTERNAL_ERROR,
@@ -831,61 +832,42 @@
 }
 
 while ((ent = readdir(dir)) != NULL) {
-virStorageVolDefPtr vol;
 int ret;
 
-if (VIR_ALLOC(vol)  0) {
-virStorageReportError(conn, VIR_ERR_NO_MEMORY,
-  %s, _(volume));
-goto cleanup;
-}
+if (VIR_ALLOC(vol)  0)
+goto no_memory;
 
-vol-name = strdup(ent-d_name);
-if (vol-name == NULL) {
-VIR_FREE(vol);
-virStorageReportError(conn, VIR_ERR_NO_MEMORY,
-  %s, _(volume name));
-goto cleanup;
-}
+if ((vol-name = strdup(ent-d_name)) == NULL)
+goto no_memory;
 
 vol-target.format = VIR_STORAGE_VOL_RAW; /* Real value is filled in 
during probe */
 if (VIR_ALLOC_N(vol-target.path, strlen(pool-def-target.path) +
-1 + strlen(vol-name) + 1)  0) {
-VIR_FREE(vol-target.path);
-VIR_FREE(vol);
-virStorageReportError(conn, VIR_ERR_NO_MEMORY,
-  %s, _(volume name));
-goto cleanup;
-}
+1 + strlen(vol-name) + 1)  0)
+goto no_memory;
+
 strcpy(vol-target.path, pool-def-target.path);
 strcat(vol-target.path, /);
 strcat(vol-target.path, vol-name);
-if ((vol-key = strdup(vol-target.path)) == NULL) {
-VIR_FREE(vol-name);
-VIR_FREE(vol-target.path);
-VIR_FREE(vol);
-virStorageReportError(conn, VIR_ERR_NO_MEMORY,
-  %s, _(volume key));
-goto cleanup;
+if ((vol-key = strdup(vol-target.path)) == NULL)
+goto no_memory;
+
+if ((ret = virStorageBackendProbeFile(conn, vol)  0)) {
+if (ret == -1)
+goto no_memory;
+else {
+/* Silently ignore non-regular files,
+ * eg '.' '..', 'lost+found' */
+virStorageVolDefFree(vol);
+vol = NULL;
+continue;
+}
 }
 
-if ((ret = virStorageBackendProbeFile(conn, vol)  0)) {
-VIR_FREE(vol-key);
-VIR_FREE(vol-name);
-VIR_FREE(vol-target.path);
-VIR_FREE(vol);
-if (ret == -1)
-goto cleanup;
-else
-/* Silently ignore non-regular files,
- * eg '.' '..', 'lost+found' */
-continue;
-}
-
-vol-next = pool-volumes;
-pool-volumes = vol;
-pool-nvolumes++;
-continue;
+if (VIR_REALLOC_N(pool-volumes.objs,
+  pool-volumes.count+1)  0)
+goto no_memory;
+pool-volumes.objs[pool-volumes.count++] = vol;
+vol = NULL;
 }
 closedir(dir);
 
@@ -904,8 +886,13 @@
 
 return 0;
 
+no_memory:
+virStorageReportError(conn, VIR_ERR_NO_MEMORY, NULL);
+/* fallthrough */
+
  cleanup:
 closedir(dir);
+virStorageVolDefFree(vol);
 

Re: [libvirt] [PATCH] openvz: swap source bridge=... with target dev=...

2008-10-03 Thread Evgeniy Sokolov


P.S. Now I am working on that:



P.S. Are someone going to implement
 interface type='bridge'
...
 source bridge=...
...
  /interface
part of openvz driver? :)

I plan to implement it in a month.
It will be fine if you are ready to develop the feature.


I'm using $veid.conf to store information about bridge device in the 
following manner:

for interface ifname, therhe will be a line
   #BRIGDE(ifname): bridge name
for example,
   #BRIDGE(veth101.0): virbr1

Do you agree with that behaviour?

I think we can simplify format to simplify parsing.
#BRIDGE=virbr1:veth101.0,veth101.1,veth101.2

In this case we may use
openvzReadConfigParam(101, #BRIDGE, value, 1024)



--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] libvirt-qpid

2008-10-03 Thread Daniel Veillard
On Thu, Oct 02, 2008 at 05:33:54PM +0200, Daniel Veillard wrote:
 On Thu, Oct 02, 2008 at 08:16:02AM -0700, Ian Main wrote:
  On Thu, 2 Oct 2008 16:50:08 +0200
  Daniel Veillard [EMAIL PROTECTED] wrote:
  
   On Thu, Oct 02, 2008 at 03:40:06PM +0100, Daniel P. Berrange wrote:
On Thu, Oct 02, 2008 at 03:06:01PM +0200, Daniel Veillard wrote:
   You also need 'yum install qpidd' I suspect this indicates a missing
 dependancy maybe in the libvirt-qpid package but I'm not 100% sure

Yeah, i believe that should be a requirement. NB, the versions of qpidc
and qpidd in Fedora currently are too old for libvirt-qpid. I've been 
speaking with qpid maintainer and they'll push a new build to Fedora in
the very near future.
   
 Ah, that probably explains why that didn't work for me, I got my qpidd
   from Fedora ...
 Another question, upon rebout the QPid service starts automatically,
   I wonder if there is something to do to be able to connect assuming
   a default install and not starting without auth .
  
  Ah, I know what happened.. I changed the dependencies since my first email.
  qpidd isn't needed for libvirt-qpid to operate.  It's just an agent and can
  talk to a qpidd on a different host.
 
   okay, I was guessing something like that which is why i was so
 cautious in my wording ;-)

  Okay I now have things working ... on one machine. Which is
interesting but not really the point of the exercise :-)
I have 3 machines. qpidd runs on machine A, qpid-tool on that machine
allows to access the local node and local domains. libvirt-qpid
started on machine B and C in the same subnet without error,
but well they don't find the qpidd on machine A since I don't see them
there. qpid-tool there tries to get to localhost and fail.
  If then I install and start qpidd on machine B qpid-tool can connect
to it ... but the already started libvirt-qpid don't seems to be
able to find it, unless I restart it which seems to indicate a failure
to connect after startup. And I'm still only able to list local
domains/nodes never remote ones.

  Ideas ?

Daniel

-- 
Daniel Veillard  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
[EMAIL PROTECTED]  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH] openvz: swap source bridge=... with target dev=...

2008-10-03 Thread Anton Protopopov
2008/10/3 Daniel P. Berrange [EMAIL PROTECTED]

 On Fri, Oct 03, 2008 at 04:04:06PM +0400, Evgeniy Sokolov wrote:
  
  P.S. Now I am working on that:
  
  
  
  P.S. Are someone going to implement
   interface type='bridge'
  ...
   source bridge=...
  ...
/interface
  part of openvz driver? :)
  
  I plan to implement it in a month.
  It will be fine if you are ready to develop the feature.
  
  
  I'm using $veid.conf to store information about bridge device in the
  following manner:
  for interface ifname, therhe will be a line
 #BRIGDE(ifname): bridge name
  for example,
 #BRIDGE(veth101.0): virbr1
  
  Do you agree with that behaviour?
  I think we can simplify format to simplify parsing.
  #BRIDGE=virbr1:veth101.0,veth101.1,veth101.2

I do not undesrtand how it will simplify parsing: the iterator
in parsing is an interface name, not bridge name. I attached a patch, so
you will see how I do think about it :) (this patch includes all
discussed changes)



 There's no need to store  veth101.0, veth101.2 - they are automatically
 generated  not intended to be stable across restarts. It is merely

No, there is. They are generated only once --- when VE container is created.
After that they are stored in the config file.



 neccessary to persist the name of the bridge (or virtual network) to
 which they are attached, and the MAC address.

 Daniel
 --
 |: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/:|
 |: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org:|
 |: http://autobuild.org   -o- 
 http://search.cpan.org/~danberr/http://search.cpan.org/%7Edanberr/:|
 |: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505
 :|



libvirt_openvz_ifaces_and_bridges.patch
Description: Binary data
--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] libvirt-qpid

2008-10-03 Thread Daniel P. Berrange
On Fri, Oct 03, 2008 at 03:29:17PM +0200, Daniel Veillard wrote:
 On Thu, Oct 02, 2008 at 05:33:54PM +0200, Daniel Veillard wrote:
  On Thu, Oct 02, 2008 at 08:16:02AM -0700, Ian Main wrote:
   On Thu, 2 Oct 2008 16:50:08 +0200
   Daniel Veillard [EMAIL PROTECTED] wrote:
   
On Thu, Oct 02, 2008 at 03:40:06PM +0100, Daniel P. Berrange wrote:
 On Thu, Oct 02, 2008 at 03:06:01PM +0200, Daniel Veillard wrote:
You also need 'yum install qpidd' I suspect this indicates a 
  missing
  dependancy maybe in the libvirt-qpid package but I'm not 100% sure
 
 Yeah, i believe that should be a requirement. NB, the versions of 
 qpidc
 and qpidd in Fedora currently are too old for libvirt-qpid. I've been 
 speaking with qpid maintainer and they'll push a new build to Fedora 
 in
 the very near future.

  Ah, that probably explains why that didn't work for me, I got my qpidd
from Fedora ...
  Another question, upon rebout the QPid service starts automatically,
I wonder if there is something to do to be able to connect assuming
a default install and not starting without auth .
   
   Ah, I know what happened.. I changed the dependencies since my first 
   email.
   qpidd isn't needed for libvirt-qpid to operate.  It's just an agent and 
   can
   talk to a qpidd on a different host.
  
okay, I was guessing something like that which is why i was so
  cautious in my wording ;-)
 
   Okay I now have things working ... on one machine. Which is
 interesting but not really the point of the exercise :-)
 I have 3 machines. qpidd runs on machine A, qpid-tool on that machine
 allows to access the local node and local domains. libvirt-qpid
 started on machine B and C in the same subnet without error,
 but well they don't find the qpidd on machine A since I don't see them
 there. qpid-tool there tries to get to localhost and fail.
   If then I install and start qpidd on machine B qpid-tool can connect
 to it ... but the already started libvirt-qpid don't seems to be
 able to find it, unless I restart it which seems to indicate a failure
 to connect after startup. And I'm still only able to list local
 domains/nodes never remote ones.
 
   Ideas ?

I got this working across 3 machines as follows

 - Machine A provides a Qpid broker, run as root with

   qpidd --auth no

 - Machine B and C are libvirt hosts, each run a libvirtd, and libvirt-qpid

   libvirt-qpid --broker machineA.example.com


On machine A, if you run 'qpid-tool' you should now see node, domain
objects from both machines B  C. NB, it take 5-10 seconds from 
starting libvirt-qpid before they appear in the broker

Daniel
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] anyone implementing host device enumeration?

2008-10-03 Thread David Lively
On Tue, 2008-09-30 at 19:31 +0200, Daniel Veillard wrote:
 Good to see that you seems to have the python bindings ready too !

Many python bindings are not really ready (in this patch).  But I should
have them in the next patch (Monday/Tuesday next week).

I have complete java bindings for the current API, and will submit them
once we agree upon it.

I wonder how many of them should be future-proofed by adding them
 a final flags argument too ... For example it may be useful to only
 lookup devices which are local to the machine, or the opposite only
 remote devices. We don't have to specify now flags values, but having
 the APIs ready is sufficient.
   The virNodeNum/virNodeListDevices devices can probably all share
 the same flags definitions when needed.

I agree.  I'll add a flags arg to virNodeNum*/virNodeList*.

   The LookupByName/LookupByKey may be able to use the same set. I wonder
 a bit about the key argument though, I assume it's something actually
 defined by the lower APIs (hal/devkit).

As long as the lookup keys are unique, I don't see why we'd want a flags
arg for these functions.

Currently the key is implementation- (HAL- or Devkit-) dependent, but I
have questions about that (and for that matter, the name arg -- if it's
unique, why have a separate key??).  Dan B brings up some similar
points, so I'll address this in my response to his post rather than
here.

   For virNodeDeviceCreate maybe a flags may be needed too, though the
 flexibility of the API is provided by the XML.

I think the XML should provide sufficient flexibility here.  But let me
know if you really want me to add a flag arg.

 
  +int virNodeDeviceDestroy(virNodeDevicePtr dev);
  +
  +int virNodeDeviceFree   (virNodeDevicePtr dev);
 
   Maybe Destroy need flags too, for example to force (or not)
 destruction of devices which may be in use.

Ok, I'll add a flags arg to Destroy as well.  FWIW, I'm not wild about
the names virNodeDeviceCreate/Destroy.  Don't Attach and Detach make
more sense?  If there are no objections, I'll change those in my next
iteration (though I'll still leave them unimplemented).

 
   
  @@ -332,6 +340,12 @@ skip_function = (
   'virCopyLastError', # Python API is called virGetLastError instead
   'virConnectOpenAuth', # Python C code is manually written
   'virDefaultErrorFunc', # Python virErrorFuncHandler impl calls this 
  from C
  +'virNodeDeviceGetKey',
  +'virNodeDeviceGetName',
  +'virNodeDeviceGetParentKey',
  +'virNodeDeviceGetBus',
  +'virNodeDeviceNumOfCaps',
  +'virNodeDeviceListCaps',
   )
 
   Strange how are the accessors supposed to work from python then ?

They don't.  They're just here so you don't get errors building the
Python bindings.  I'll implement real bindings for these soon.

   Hum, the libvirt_lxc really need to link to the device discovery libs ?

I was making host device enumeration work for ALL hypervisors (though of
course the remote driver has a remote impl), so I think this is
necessary.  But that said, I'm still digesting Dan B's point about
licensing issues, and I really have no clue about how lxc and openvz
work with libvirt (do they have separate control daemons, like Xen, or
are they more like QEMU, or ???).


   virLibConnError(virConnectPtr conn, virErrorNumber error, const char *info)
   virLibConnWarning(virConnectPtr conn, virErrorNumber error, const char 
  *info)
 
   you really need to export them ? 

Oh, uhhh, apparently not :-)  (I did at one point, but must have removed
that code.)  I'll un-export them ...

   Hum ... we need the comment on the accessors, so they get extracted
 as part of the API when doing 'make rebuild' in docs/ added to the 
 resulting XML, which then will provide the python accessors
 automatically I think.

Will do.

Thanks for the response.  I'll implement the things discussed here (plus
in Dan B's comments - another response coming) and submit a new patch on
Monday or early Tuesday next week.

Dave


--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH 2 of 2] Use cgroup functions to set resource limits on LXC domains

2008-10-03 Thread Daniel Veillard
On Wed, Oct 01, 2008 at 01:19:04PM -0700, Dan Smith wrote:
 This patch adds code to the controller to set up a cgroup named after the
 domain name, set the memory limit, and restrict devices.  It also
 adds bits to lxc_driver to properly clean up the cgroup on domain death.
 
 If virCgroupHaveSupport() says that no support is available, then we just
 allow the domain creation to proceed as it did before without resource
 controls in place.

  Okay in addition to Dan feedback on the error handling,

 +/**
 + * lxcSetContainerResources
 + * @def: pointer to virtual machine structure
 + *
 + * Creates a cgroup for the container, moves the task inside,
 + * and sets resource limits
 + *
 + * Returns 0 on success or -1 in case of error
 + */
 +static int lxcSetContainerResources(virDomainDefPtr def)
 +{
 +virCgroupPtr cgroup;
 +int rc = -1;
 +int i;
 +struct cgroup_device_policy devices[] = {
 +{'c', 1, 3},
 +{'c', 1, 5},
 +{'c', 1, 7},
 +{'c', 1, 8},
 +{'c', 1, 9},
 +{'c', 5, 1},
 +{0,   0, 0}};

  creating a domain is always the trickiest part of domain lifecycle
and setting up the framework is often something a bit obfuscated, nearly
by definition. But if we could associate meaningful constants to the
device policies here, this would help people having to improve or
debug that code in the future ;-)

  otherwise looks fine to me,

+1

Daniel

-- 
Daniel Veillard  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
[EMAIL PROTECTED]  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH 2 of 2] Use cgroup functions to set resource limits on LXC domains

2008-10-03 Thread Dan Smith
DV creating a domain is always the trickiest part of domain lifecycle
DV and setting up the framework is often something a bit obfuscated,
DV nearly by definition. But if we could associate meaningful
DV constants to the device policies here, this would help people
DV having to improve or debug that code in the future ;-)

Okay, I suppose that if the QEMU driver ends up doing something
similar, the constants would be reused.

Should they go in cgroup.h since they're expected to be used with that
interface?

-- 
Dan Smith
IBM Linux Technology Center
Open Hypervisor Team
email: [EMAIL PROTECTED]


pgpQlL0rquJWP.pgp
Description: PGP signature
--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH] Fix possible segfault if starting qemu vm with an inactive virtual network

2008-10-03 Thread Daniel P. Berrange
On Thu, Oct 02, 2008 at 05:16:43PM -0400, Cole Robinson wrote:
 The attached patch fixes a possible segfault if trying
 to start a domain which tries to use an inactive
 virtual network. We should explicitly catch that error
 and report it, but that's a separate patch.

ACK

Daniel
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] PATCH: Support SDL configuration for QEMU driver

2008-10-03 Thread Daniel Veillard
On Thu, Oct 02, 2008 at 06:09:21PM +0100, Daniel P. Berrange wrote:
 QEMU has two modes of providing a graphical display, VNC and SDL. Now most
 of our tools just use VNC, but occasionally people want to use SDL for 
 some crazy reason. We already support this in Xen driver, but the QEMU
 impl has been rather lacking. At the moment if you ask for a SDL display
 it'll only happen to work if you had the $DISPLAY environment variable
 set when you started libvirtd - you probably don't.

  Hum, it would be really nicer if everything was driven only by
command line options, but i can understand why the QEmu guys made it
that way, at least historically

 The generic XML parser allows for two attributes on the graohics
 element for setting the display and xauth filename, so this patch updates
 the QEMU driver to use this data if available. This means we have to 
 start setting environment variables when invoking QEMU, so this patch
 is a little larger than would otherwise be expected.

  Okay i think I understand the patch, i didn't find anything suspicious
in the C code diff (except missing -p ?)

  The tests diff looks simple and systematic, and for the test program
this looks fine too.

  +1

 Now previously since we just use 'execv' the QEMU process would just
 inherit all libvirtd's environment variables. When we now use execve()
 no variables are inherited - we have to explicitly set all the ones
 we need. I'm not sure what we should consider the mimimum required?
 
 I'm merely setting  'LC_ALL=C' to ensure it runs in C locale. Do we
 need to set $PATH for QEMU - maybe ? Anything else which is good
 practice to set ?

  if there is an LD_PRELOAD, pass it maybe, that can be really useful
at times especially when debugging.

Daniel

-- 
Daniel Veillard  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
[EMAIL PROTECTED]  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH] Unify *ReportError logic

2008-10-03 Thread Daniel P. Berrange
On Thu, Oct 02, 2008 at 04:12:10PM -0400, Cole Robinson wrote:
 Daniel P. Berrange wrote:
  On Thu, Oct 02, 2008 at 03:41:21PM -0400, Cole Robinson wrote:
  Currently, most src/* files have their own ReportError
  function. Some support printf style arguments, others
  only allow reporting a single string message. The code
  for all of them does virtually the same thing, possibly
  passing a different constant off to another function.
 
  The attached patch adds a function to virterror.c which
  encapsulates the common ReportError logic. I used this
  to replace qemudReportError with a macro, which also 
  allows passing off filename and line number info if
  we wanted to do something with it later.
 
  I did just the one function conversion to see what
  people think: if I'm missing something, or ideas for
  anything else to add. Seems to work as expected in
  my testing.
  
  
  Basically a good idea - we've discussed doing exactly this in the past.
  You can do further though, and kill off the 'dom' and 'net' parameters
  here too. Those are deprecated and should always be left NULL these
  days. 
 
 Sounds good.

Actually, lets go one step further than this. Just change the signature
of virRaiseError itself, rather than creating a virRaiseErrorHelper.
No caller in libvirt uses all these parameter it has, so might as well
just remove them and have it set those fields to NULL/0 as directly.

  Likewise passing the filename/linenr is not much use - if a
  particular error message wants to include file/line number then they
  can be includes as args to the printf fmt string.
 
 Hmm, I kind of like the idea of having this info by default,
 rather than offloading it to the error message. We could
 for example append filename:lineno to all error messages if
 debugging is enabled, or stick the data as extra string or
 int info to RaiseError.
 
 I know there have been times when, using virt-manager/
 virtinst, libvirt prints some generic lookup error msg,
 yet nothing seems to fail. It would be nice to get a
 definitive line number of the culprit just by using
 LIBVIRT_DEBUG.

That's an interesting idea - I can't remember offhand though whether the
__FILE__ expands to the full file path, or just the final filename without
directory separator. I'd only want this compiled in by default if it is
the latter - we don't want another 500 K of long strings from full directory
paths in every  build.

Daniel
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH] Centralize use of DEBUG macros

2008-10-03 Thread Daniel P. Berrange
On Thu, Oct 02, 2008 at 03:40:17PM -0400, Cole Robinson wrote:
 Currently the DEBUG and DEBUG0 macros are duplicated in
 every file that uses them. This patch moves the macros
 to internal.h, removes the needless duplication, and
 now every file gets them for free. Seems to work as
 expected in my testing.

Yes, if someone can confirm one question

 diff --git a/src/internal.h b/src/internal.h
 index d96504d..a3d48fa 100644
 --- a/src/internal.h
 +++ b/src/internal.h
 @@ -85,6 +85,9 @@ extern int debugFlag;
  do { } while (0)
  #endif /* !ENABLE_DEBUG */
  
 +#define DEBUG(fmt,...) VIR_DEBUG(__FILE__, fmt, __VA_ARGS__)
 +#define DEBUG0(msg) VIR_DEBUG(__FILE__, %s, msg)
 +

Will __FILE__ expand to the name of the file where the DEBUG
macro is defined - ie internal.h, or will it expand to the
name of the file where DEBUG() is called. Obviously the latter
is what we need.

Daniel
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH 2 of 2] Use cgroup functions to set resource limits on LXC domains

2008-10-03 Thread Daniel Veillard
On Fri, Oct 03, 2008 at 07:45:24AM -0700, Dan Smith wrote:
 DV creating a domain is always the trickiest part of domain lifecycle
 DV and setting up the framework is often something a bit obfuscated,
 DV nearly by definition. But if we could associate meaningful
 DV constants to the device policies here, this would help people
 DV having to improve or debug that code in the future ;-)
 
 Okay, I suppose that if the QEMU driver ends up doing something
 similar, the constants would be reused.
 
 Should they go in cgroup.h since they're expected to be used with that
 interface?

  Fine by me, but I'm more looking to being able to map semantic
to the value than sharing the values at this point,

Daniel

-- 
Daniel Veillard  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
[EMAIL PROTECTED]  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH] Centralize use of DEBUG macros

2008-10-03 Thread Cole Robinson
Daniel P. Berrange wrote:
 On Thu, Oct 02, 2008 at 03:40:17PM -0400, Cole Robinson wrote:
 Currently the DEBUG and DEBUG0 macros are duplicated in
 every file that uses them. This patch moves the macros
 to internal.h, removes the needless duplication, and
 now every file gets them for free. Seems to work as
 expected in my testing.
 
 Yes, if someone can confirm one question
 
 diff --git a/src/internal.h b/src/internal.h
 index d96504d..a3d48fa 100644
 --- a/src/internal.h
 +++ b/src/internal.h
 @@ -85,6 +85,9 @@ extern int debugFlag;
  do { } while (0)
  #endif /* !ENABLE_DEBUG */
  
 +#define DEBUG(fmt,...) VIR_DEBUG(__FILE__, fmt, __VA_ARGS__)
 +#define DEBUG0(msg) VIR_DEBUG(__FILE__, %s, msg)
 +
 
 Will __FILE__ expand to the name of the file where the DEBUG
 macro is defined - ie internal.h, or will it expand to the
 name of the file where DEBUG() is called. Obviously the latter
 is what we need.
 
 Daniel

I was worried about that as well, but testing the patch I posted
confirmed that __FILE__ expands to where the macro is called
from, so no change in existing behavior.

Thanks,
Cole

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] anyone implementing host device enumeration?

2008-10-03 Thread David Lively
On Tue, 2008-09-30 at 19:02 +0100, Daniel P. Berrange wrote:
 On Tue, Sep 30, 2008 at 12:17:27AM -0400, David Lively wrote:

  - Split stateful drivers out of libvirt.so, so they're only used
by the daemon, and not apps linking to libvirt.so.
 
This solves the licensing problem that HAL/DeviceKit introduce
libvirt.so needs to be LGPL to allow closed-source apps to link
to it. HAL/DeviceKit are both GPL or AFL licensed, by virtue
of using DBus. Since LGPL is not AFL compatible, if we link to
HAL/DeviceKit/DBus we do so under the GPL, and thus would
prevent closed source apps using libvirt.so We don't want this,
so we ned to make sure only the libvirtd daemon links to the
GPL bits.

Ok.  I see.  Will do.

   * figure out how devkit and HAL correlate, so we report device info
  consistently
 
 This is looking like it'll be much harder than I had anticipated.

Yeah, I struggled for a while trying to reconcile them.  But devkit's
not even close to complete yet, so I gave up.  I'm assuming that, since
there are a lot of HAL - DevKit conversions to be done, some kind of
(probably domain-specific) method(s) of correlating the two will
eventually be described.

 So I'm thinking perhaps we need to simplify our modelling a little
 so its not so closely replicating HAL, getting rid of the distinct
 elements for 'class' and 'bus' and having a device simply have a
 'subsystem'. And instead of having a complete tree hierarchy, have
 a specialized hierarchy. eg if we can identify a 'usb' or 'pci'
 device parent for a device, then include its name, but don't
 claim to provide a full hierarchy.

Yes!  I find the distinction between bus and capability to be pretty
arbitrary anyway (being on a bus is a type of capability, IMO).

So I'll drop the bus stuff, and combine bus capabilities with the
other capabilities.

 
 The other interesting question, is what policy we should define
 for compatability. Do we absolutely need to have compatible
 keys  names for devices, whether using HAL vs DeviceKit, or
 can be just say that the format of a name is opaque and liable
 to change ? This has upgrade implications, for example, if we
 ship libvirt with a HAL backend in Fedora 10, and then switched
 it to the DeviceKit backend in Fedora 11, do we need to ensure
 consistent naming across the upgrade path. I don't know...

Right.  I was hoping to find a consistent set of keys/names to use, but
gave up.  (sysfs-paths seemed the most common denominator, but not all
HAL devices have sysfs-paths, and I think the HAL names look more
appropriate anyway.)  I'm hoping that devkit will eventually add a
property that either specifies or is convertible to a HAL name.

Also, if names are really unique (as specified), then why have separate
keys?  I'd prefer dropping the Key functions and using Name as the key
(as we do for storage pools, etc.).


  +struct _stringv {
  +int len;
  +char **const base;
  +const int maxlen;
  +};
  +
  +typedef struct _stringv stringv;
 
 Perhaps I'm mis-undersatnding what this does, but isn't this 
 similar to the virStringList class in internal.h ?

Well, it's a vector instead of a singly-linked list.  I use it when I
know the max length ahead of time (while the list is used in pool source
discovery code that knows of no limit on the number of sources it will
find).  I could use a list for the former case, but that would require
an extra copy of each name (plus unnecessary alloc/dealloc code for the
list).

  +static char *nodeDeviceDumpXML(virNodeDevicePtr dev,
  +   unsigned int flags ATTRIBUTE_UNUSED)
  +{
  +xmlBufferPtr buf = xmlBufferCreate();
...
  +xmlBufferCCat(buf, /key\n);
 This should use the libvirt buffer routines from buf.h which
 ensure you don't ignore return values indicating OOM like
 you have here :-) See buf.h, or HACKING file for some examples.

Note I call xmlNodeDump(buf, ...).  I suppose the right thing to do is
to add virBufNodeDump(...) to use libxml to dump into a virBuf  ???

  +/* TODO: Can we avoid this copy? */
  +xml = strdup((const char *)xmlBufferContent(buf));
 
 Yes, you can with the libvirt buf.h APIs :-)

But if I have to have to call xmlNodeDump, I can't avoid it ...

Thanks for the comments!

Dave


--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH] Centralize use of DEBUG macros

2008-10-03 Thread Daniel Veillard
On Fri, Oct 03, 2008 at 03:53:25PM +0100, Daniel P. Berrange wrote:
 On Thu, Oct 02, 2008 at 03:40:17PM -0400, Cole Robinson wrote:
  Currently the DEBUG and DEBUG0 macros are duplicated in
  every file that uses them. This patch moves the macros
  to internal.h, removes the needless duplication, and
  now every file gets them for free. Seems to work as
  expected in my testing.
 
 Yes, if someone can confirm one question
 
  diff --git a/src/internal.h b/src/internal.h
  index d96504d..a3d48fa 100644
  --- a/src/internal.h
  +++ b/src/internal.h
  @@ -85,6 +85,9 @@ extern int debugFlag;
   do { } while (0)
   #endif /* !ENABLE_DEBUG */
   
  +#define DEBUG(fmt,...) VIR_DEBUG(__FILE__, fmt, __VA_ARGS__)
  +#define DEBUG0(msg) VIR_DEBUG(__FILE__, %s, msg)
  +
 
 Will __FILE__ expand to the name of the file where the DEBUG
 macro is defined - ie internal.h, or will it expand to the
 name of the file where DEBUG() is called. Obviously the latter
 is what we need.

  I do that in libxml2 debugging and I think that works as expected,
I tried to check with gcc -E but failed :-\

Daniel

-- 
Daniel Veillard  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
[EMAIL PROTECTED]  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH] Slim xmllint test output

2008-10-03 Thread Daniel Veillard
On Thu, Oct 02, 2008 at 03:40:55PM -0400, Cole Robinson wrote:
 This patch simply slims down the output from the
 xmllint tests to not overrun a single line, only
 printing the relevant information about the xml file
 (parent directory and name, not fully qualified
 path). Purely cosmetic, but more useful IMO.

  makes sense to me.

Daniel

-- 
Daniel Veillard  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
[EMAIL PROTECTED]  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] anyone implementing host device enumeration?

2008-10-03 Thread Daniel P. Berrange
On Fri, Oct 03, 2008 at 11:02:55AM -0400, David Lively wrote:
 On Tue, 2008-09-30 at 19:02 +0100, Daniel P. Berrange wrote:
  On Tue, Sep 30, 2008 at 12:17:27AM -0400, David Lively wrote:
 
   - Split stateful drivers out of libvirt.so, so they're only used
 by the daemon, and not apps linking to libvirt.so.
  
 This solves the licensing problem that HAL/DeviceKit introduce
 libvirt.so needs to be LGPL to allow closed-source apps to link
 to it. HAL/DeviceKit are both GPL or AFL licensed, by virtue
 of using DBus. Since LGPL is not AFL compatible, if we link to
 HAL/DeviceKit/DBus we do so under the GPL, and thus would
 prevent closed source apps using libvirt.so We don't want this,
 so we ned to make sure only the libvirtd daemon links to the
 GPL bits.
 
 Ok.  I see.  Will do.

No, no don't do that !  I've already got this work underway, since it
impacts much more than just the host device drivers. Just carry on
with your existing way, and when its ready to merge it'll be easy to
tweak the makefile to have it build in a suitable way.

  So I'm thinking perhaps we need to simplify our modelling a little
  so its not so closely replicating HAL, getting rid of the distinct
  elements for 'class' and 'bus' and having a device simply have a
  'subsystem'. And instead of having a complete tree hierarchy, have
  a specialized hierarchy. eg if we can identify a 'usb' or 'pci'
  device parent for a device, then include its name, but don't
  claim to provide a full hierarchy.
 
 Yes!  I find the distinction between bus and capability to be pretty
 arbitrary anyway (being on a bus is a type of capability, IMO).
 
 So I'll drop the bus stuff, and combine bus capabilities with the
 other capabilities.
 
  
  The other interesting question, is what policy we should define
  for compatability. Do we absolutely need to have compatible
  keys  names for devices, whether using HAL vs DeviceKit, or
  can be just say that the format of a name is opaque and liable
  to change ? This has upgrade implications, for example, if we
  ship libvirt with a HAL backend in Fedora 10, and then switched
  it to the DeviceKit backend in Fedora 11, do we need to ensure
  consistent naming across the upgrade path. I don't know...
 
 Right.  I was hoping to find a consistent set of keys/names to use, but
 gave up.  (sysfs-paths seemed the most common denominator, but not all
 HAL devices have sysfs-paths, and I think the HAL names look more
 appropriate anyway.)  I'm hoping that devkit will eventually add a
 property that either specifies or is convertible to a HAL name.
 
 Also, if names are really unique (as specified), then why have separate
 keys?  I'd prefer dropping the Key functions and using Name as the key
 (as we do for storage pools, etc.).

The ultimate goal with the storage was that 'key' was unique  consistent
for the same storage on every host the storage was attached to, while
'name' could vary across machines. 

This distinction doesn't entirely apply to host devices, so perhaps
droping key is OK - we could always re-add it later if we found a
compelling need.

 
   +static char *nodeDeviceDumpXML(virNodeDevicePtr dev,
   +   unsigned int flags ATTRIBUTE_UNUSED)
   +{
   +xmlBufferPtr buf = xmlBufferCreate();
 ...
   +xmlBufferCCat(buf, /key\n);
  This should use the libvirt buffer routines from buf.h which
  ensure you don't ignore return values indicating OOM like
  you have here :-) See buf.h, or HACKING file for some examples.
 
 Note I call xmlNodeDump(buf, ...).  I suppose the right thing to do is
 to add virBufNodeDump(...) to use libxml to dump into a virBuf  ???

Ok, this actually makes me notice another problem - I'll reply
to the original patch with details. In summary though, the internal
data structure should not be using an xmlNodePtr as its canonical
form, and thus there should not be a need for xmlNodeDump.


Daniel
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] anyone implementing host device enumeration?

2008-10-03 Thread Daniel P. Berrange
On Tue, Sep 30, 2008 at 12:17:27AM -0400, David Lively wrote:
 diff --git a/src/internal.h b/src/internal.h
 index d96504d..9a94f6d 100644
 --- a/src/internal.h
 +++ b/src/internal.h
 @@ -292,6 +307,25 @@ struct _virStorageVol {
  };
  
  
 +/**
 + * _virNodeDevice:
 + *
 + * Internal structure associated with a node device
 + */
 +struct _virNodeDevice {
 +unsigned int magic; /* specific value to check */
 +int refs;   /* reference count */
 +virConnectPtr conn; /* pointer back to the connection */
 +char *key;  /* device key */
 +char *name; /* device name */
 +char *parent_key;   /* parent device */
 +char *bus_name; /* (optional) bus name */
 +xmlNodePtr bus; /* (optional) bus descriptor */
 +char **cap_names;   /* capability names (null-term) */
 +xmlNodePtr *caps;   /* capability descs (null-term) */
 +};

This is not the right way to maintain the internal representation of
devices.

The model we follow is to have 2, sometimes 3 structs to represent
an object. I'll summarize in terms of domain objects

 - virDomainPtr - the public opaque struct representing a domain
  the internal struct is defined in internal.h
  and merely contains reference counting, connection
  pointer, and any unique keys (ID, name, UUID).

 - virDomainObjPtr - the internal struct representing the state of
 a domain object. Not all drivers use us - only
 stateful drivers do. It is used to track state
 such as guest PID, logfile, running state, 
 and a pointer to the live and inactive configs
 This is done in domain_conf.h/.c

 - virDomainDefPtr - the internal struct representing the canonical
 configuration of a domain. This is a direct
 mapping of the XML into a series of structs.
 This is done in domain_conf.h/.c


So, in the context for host devices we need a few changes. In the
internal.h file, you should only store the key/name attributes.

There should then be a separate file handling configuration parsing
and formatting, node_device_conf.h/.c.  There is probably no need
for a state object, so skip virNodeDeviceObjPtr, and just create a
virNodeDeviceDefPtr representing the XML format as a series of 
structs/unions, etc. See virDomainDefPtr for a good example. This
should not store any xmlNodePtr instances - it should be all done
as explicit struct/enum fields

The node_device_conf.c file should at mimium have 2 methods, one
for converting an XML document into a virNodeDeviceDefPtr object,
and one for the converting a virNodeDeviceObjPtr back into an XML
document. Follow the existing API naming / method signatures that
are seen in domain_conf.h / network_conf.h for current 'best practice'


Daniel
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH] Unify *ReportError logic

2008-10-03 Thread Daniel Veillard
On Thu, Oct 02, 2008 at 08:53:12PM +0100, Daniel P. Berrange wrote:
 On Thu, Oct 02, 2008 at 03:41:21PM -0400, Cole Robinson wrote:
  Currently, most src/* files have their own ReportError
  function. Some support printf style arguments, others
  only allow reporting a single string message. The code
  for all of them does virtually the same thing, possibly
  passing a different constant off to another function.
  
  The attached patch adds a function to virterror.c which
  encapsulates the common ReportError logic. I used this
  to replace qemudReportError with a macro, which also 
  allows passing off filename and line number info if
  we wanted to do something with it later.
  
  I did just the one function conversion to see what
  people think: if I'm missing something, or ideas for
  anything else to add. Seems to work as expected in
  my testing.
 
 
 Basically a good idea - we've discussed doing exactly this in the past.
 You can do further though, and kill off the 'dom' and 'net' parameters
 here too. Those are deprecated and should always be left NULL these
 days.

  i'm still a bit sad about this though, I'm still left with the feeling
that being unable to refcount, result in a loss of useful informations.

Daniel

-- 
Daniel Veillard  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
[EMAIL PROTECTED]  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] PATCH: Support SDL configuration for QEMU driver

2008-10-03 Thread Cole Robinson
Daniel P. Berrange wrote:
 QEMU has two modes of providing a graphical display, VNC and SDL. Now most
 of our tools just use VNC, but occasionally people want to use SDL for 
 some crazy reason. We already support this in Xen driver, but the QEMU
 impl has been rather lacking. At the moment if you ask for a SDL display
 it'll only happen to work if you had the $DISPLAY environment variable
 set when you started libvirtd - you probably don't.
 

I've thought about this issue before, not sure how to fully solve it
though. If we set the display in the xml at install time (say setting
it to :0.0), but at a later time we ssh =Y into that machine, the 
display in the xml will be wrong, and possibly pop the sdl window on
someone else's display. Possibly a solution at the virsh or 
virt-manager level?

Thanks,
Cole

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH] Fix possible segfault if starting qemu vm with an inactive virtual network

2008-10-03 Thread Daniel Veillard
On Thu, Oct 02, 2008 at 05:16:43PM -0400, Cole Robinson wrote:
 The attached patch fixes a possible segfault if trying
 to start a domain which tries to use an inactive
 virtual network. We should explicitly catch that error
 and report it, but that's a separate patch.

  Fix looks fine to me, good catch !

Daniel

-- 
Daniel Veillard  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
[EMAIL PROTECTED]  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [PATCH 1 of 2] Add internal cgroup manipulation functions

2008-10-03 Thread Dan Smith
This patch adds src/cgroup.{c,h} with support for creating and manipulating
cgroups.

All groups created with the internal API are forced under $mount/libvirt/
to keep everything together.  The first time a group is created, the libvirt
directory is also created, and the settings from the root are inherited.

The code creates groups in all mounts requires to get memory and devices
functionality.  When setting a value, the appropriate mount is determined
and the value is set there.  I have tested this with all controllers mounted
in a single location, as well as all of them mounted separately.

Changes:
 - Handle multiple mount points
 - Add more abstract internal API, per recent discussion
 - Consider absence of memory or devices controllers as no cgroup support

diff -r 444e2614d0a2 -r 5590af0c8c3e src/cgroup.c
--- /dev/null   Thu Jan 01 00:00:00 1970 +
+++ b/src/cgroup.c  Fri Oct 03 08:06:52 2008 -0700
@@ -0,0 +1,762 @@
+/*
+ * cgroup.c: Tools for managing cgroups
+ *
+ * Copyright IBM Corp. 2008
+ *
+ * See COPYING.LIB for the License of this software
+ *
+ * Authors:
+ *  Dan Smith [EMAIL PROTECTED]
+ */
+#include config.h
+
+#include stdio.h
+#include stdint.h
+#include inttypes.h
+#include mntent.h
+#include fcntl.h
+#include string.h
+#include errno.h
+#include stdlib.h
+#include stdbool.h
+#include sys/stat.h
+#include sys/types.h
+#include libgen.h
+
+#include internal.h
+#include util.h
+#include memory.h
+#include cgroup.h
+
+#define DEBUG(fmt,...) VIR_DEBUG(__FILE__, fmt, __VA_ARGS__)
+#define DEBUG0(msg) VIR_DEBUG(__FILE__, %s, msg)
+
+#define CGROUP_MAX_VAL 512
+
+struct virCgroup {
+char *path;
+};
+
+const char *supported_controllers[] = {
+memory,
+devices,
+NULL
+};
+
+/**
+ * virCgroupFree:
+ *
+ * @group: The group structure to free
+ */
+void virCgroupFree(virCgroupPtr *group)
+{
+if (*group != NULL) {
+VIR_FREE((*group)-path);
+VIR_FREE(*group);
+*group = NULL;
+}
+}
+
+static virCgroupPtr virCgroupGetMount(const char *controller)
+{
+FILE *mounts;
+struct mntent entry;
+char buf[CGROUP_MAX_VAL];
+virCgroupPtr root = NULL;
+
+if (VIR_ALLOC(root) != 0)
+return NULL;
+
+mounts = fopen(/proc/mounts, r);
+if (mounts == NULL) {
+DEBUG0(Unable to open /proc/mounts: %m);
+goto err;
+}
+
+while (getmntent_r(mounts, entry, buf, sizeof(buf)) != NULL) {
+if (STREQ(entry.mnt_type, cgroup) 
+(strstr(entry.mnt_opts, controller))) {
+root-path = strdup(entry.mnt_dir);
+break;
+}
+}
+
+DEBUG(Mount for %s is %s\n, controller, root-path);
+
+if (root-path == NULL) {
+DEBUG0(Did not find cgroup mount);
+goto err;
+}
+
+fclose(mounts);
+
+return root;
+err:
+virCgroupFree(root);
+
+return NULL;
+}
+
+/**
+ * virCgroupHaveSupport:
+ *
+ * Returns 0 if support is present, negative if not
+ */
+int virCgroupHaveSupport(void)
+{
+virCgroupPtr root;
+int i;
+
+for (i = 0; supported_controllers[i] != NULL; i++) {
+root = virCgroupGetMount(supported_controllers[i]);
+if (root == NULL)
+return -1;
+virCgroupFree(root);
+}
+
+return 0;
+}
+
+static int virCgroupPathOfGroup(const char *group,
+const char *controller,
+char **path)
+{
+virCgroupPtr root = NULL;
+int rc = 0;
+
+root = virCgroupGetMount(controller);
+if (root == NULL) {
+rc = -ENOTDIR;
+goto out;
+}
+
+if (asprintf(path, %s/%s, root-path, group) == -1)
+rc = -ENOMEM;
+out:
+virCgroupFree(root);
+
+return rc;
+}
+
+static int virCgroupPathOf(const char *grppath,
+   const char *key,
+   char **path)
+{
+virCgroupPtr root;
+int rc = 0;
+char *controller = NULL;
+
+if (strchr(key, '.') == NULL)
+return -EINVAL;
+
+if (sscanf(key, %a[^.], controller) != 1)
+return -EINVAL;
+
+root = virCgroupGetMount(controller);
+if (root == NULL) {
+rc = -ENOTDIR;
+goto out;
+}
+
+if (asprintf(path, %s/%s/%s, root-path, grppath, key) == -1)
+rc = -ENOMEM;
+out:
+virCgroupFree(root);
+VIR_FREE(controller);
+
+return rc;
+}
+
+static int virCgroupSetValueStr(virCgroupPtr group,
+const char *key,
+const char *value)
+{
+int fd = -1;
+int rc = 0;
+char *keypath = NULL;
+
+rc = virCgroupPathOf(group-path, key, keypath);
+if (rc != 0)
+return rc;
+
+fd = open(keypath, O_WRONLY);
+if (fd  0) {
+DEBUG(Unable to open %s: %m, keypath);
+rc = -ENOENT;
+goto out;
+}
+
+DEBUG(Writing '%s' to '%s', value, keypath);
+
+rc = safewrite(fd, value, strlen(value));
+if (rc  0) {
+DEBUG(Failed to write value '%s': %m, value);

[libvirt] [PATCH 2 of 2] Use cgroup functions to set resource limits on LXC domains

2008-10-03 Thread Dan Smith
This patch adds code to the controller to set up a cgroup named after the
domain name, set the memory limit, and restrict devices.  It also
adds bits to lxc_driver to properly clean up the cgroup on domain death.

If virCgroupHaveSupport() says that no support is available, then we just
allow the domain creation to proceed as it did before without resource
controls in place.

Changes:
 - Use device number constants from cgroup.h
 - Be sure to remove the cgroup on the failure path
 - Use the more abstract API

diff -r 5590af0c8c3e -r e827f165c23b src/Makefile.am
--- a/src/Makefile.am   Fri Oct 03 08:06:52 2008 -0700
+++ b/src/Makefile.am   Fri Oct 03 08:38:57 2008 -0700
@@ -90,13 +90,15 @@
lxc_conf.c lxc_conf.h   \
lxc_container.c lxc_container.h \
lxc_driver.c lxc_driver.h   \
-   veth.c veth.h
+   veth.c veth.h   \
+   cgroup.c cgroup.h
 
 LXC_CONTROLLER_SOURCES =   \
lxc_conf.c lxc_conf.h   \
lxc_container.c lxc_container.h \
lxc_controller.c\
-   veth.c veth.h
+   veth.c veth.h   \
+   cgroup.c cgroup.h
 
 OPENVZ_DRIVER_SOURCES =\
openvz_conf.c openvz_conf.h \
diff -r 5590af0c8c3e -r e827f165c23b src/lxc_controller.c
--- a/src/lxc_controller.c  Fri Oct 03 08:06:52 2008 -0700
+++ b/src/lxc_controller.c  Fri Oct 03 08:38:57 2008 -0700
@@ -42,12 +42,82 @@
 #include veth.h
 #include memory.h
 #include util.h
-
+#include cgroup.h
 
 #define DEBUG(fmt,...) VIR_DEBUG(__FILE__, fmt, __VA_ARGS__)
 #define DEBUG0(msg) VIR_DEBUG(__FILE__, %s, msg)
 
 int debugFlag = 0;
+
+struct cgroup_device_policy {
+char type;
+int major;
+int minor;
+};
+
+/**
+ * lxcSetContainerResources
+ * @def: pointer to virtual machine structure
+ *
+ * Creates a cgroup for the container, moves the task inside,
+ * and sets resource limits
+ *
+ * Returns 0 on success or -1 in case of error
+ */
+static int lxcSetContainerResources(virDomainDefPtr def)
+{
+virCgroupPtr cgroup;
+int rc = -1;
+int i;
+struct cgroup_device_policy devices[] = {
+{'c', VIR_CG_DEV_MAJ_MEMORY, VIR_CG_DEV_MIN_NULL},
+{'c', VIR_CG_DEV_MAJ_MEMORY, VIR_CG_DEV_MIN_ZERO},
+{'c', VIR_CG_DEV_MAJ_MEMORY, VIR_CG_DEV_MIN_FULL},
+{'c', VIR_CG_DEV_MAJ_MEMORY, VIR_CG_DEV_MIN_RANDOM},
+{'c', VIR_CG_DEV_MAJ_MEMORY, VIR_CG_DEV_MIN_URANDOM},
+{'c', VIR_CG_DEV_MAJ_TTY, VIR_CG_DEV_MIN_CONSOLE},
+{0,   0, 0}};
+
+if (virCgroupHaveSupport() != 0)
+return 0; /* Not supported, so claim success */
+
+rc = virCgroupForDomain(def, lxc, cgroup);
+if (rc != 0) {
+lxcError(NULL, NULL, VIR_ERR_INTERNAL_ERROR,
+ _(Unable to create cgroup for %s\n), def-name);
+return rc;
+}
+
+rc = virCgroupSetMemory(cgroup, def-maxmem);
+if (rc != 0)
+goto out;
+
+rc = virCgroupDenyAllDevices(cgroup);
+if (rc != 0)
+goto out;
+
+for (i = 0; devices[i].type != 0; i++) {
+struct cgroup_device_policy *dev = devices[i];
+rc = virCgroupAllowDevice(cgroup,
+  dev-type,
+  dev-major,
+  dev-minor);
+if (rc != 0)
+goto out;
+}
+
+rc = virCgroupAddTask(cgroup, getpid());
+out:
+if (rc != 0) {
+lxcError(NULL, NULL, VIR_ERR_INTERNAL_ERROR,
+ _(Failed to set lxc resources: %s\n), strerror(-rc));
+virCgroupRemove(cgroup);
+}
+
+virCgroupFree(cgroup);
+
+return rc;
+}
 
 static char*lxcMonitorPath(virDomainDefPtr def)
 {
@@ -394,6 +464,9 @@
 if (lxcControllerMoveInterfaces(nveths, veths, container)  0)
 goto cleanup;
 
+if (lxcSetContainerResources(def)  0)
+goto cleanup;
+
 if (lxcContainerSendContinue(control[0])  0)
 goto cleanup;
 
diff -r 5590af0c8c3e -r e827f165c23b src/lxc_driver.c
--- a/src/lxc_driver.c  Fri Oct 03 08:06:52 2008 -0700
+++ b/src/lxc_driver.c  Fri Oct 03 08:38:57 2008 -0700
@@ -43,6 +43,7 @@
 #include bridge.h
 #include veth.h
 #include event.h
+#include cgroup.h
 
 
 /* debug macros */
@@ -376,6 +377,7 @@
 int waitRc;
 int childStatus = -1;
 virDomainNetDefPtr net;
+virCgroupPtr cgroup;
 
 while (((waitRc = waitpid(vm-pid, childStatus, 0)) == -1) 
errno == EINTR)
@@ -408,6 +410,11 @@
 for (net = vm-def-nets; net; net = net-next) {
 vethInterfaceUpOrDown(net-ifname, 0);
 vethDelete(net-ifname);
+}
+
+if (virCgroupForDomain(vm-def, lxc, cgroup) == 0) {
+

Re: [libvirt] [PATCH 1 of 2] Add internal cgroup manipulation functions

2008-10-03 Thread Balbir Singh
Dan Smith wrote:
 This patch adds src/cgroup.{c,h} with support for creating and manipulating
 cgroups.  It's quite naive at the moment, but should provide something to
 work with to move forward with resource controls.
 
 All groups created with the internal API are forced under $mount/libvirt/
 to keep everything together.  The first time a group is created, the libvirt
 directory is also created, and the settings from the root are inherited.
 
 The code scans the mount table to look for the first mount of type cgroup,
 and assumes that all controllers are mounted there.  I think this
 could/should be updated to prefer a mount with just the controller(s) we
 want, if there are multiple ones.
 
 If you have the cpuset controller enabled, and cpuset.cpus_exclusive is 1,
 then all cgroups to be created will fail.  Since we probably shouldn't
 blindly set the root to be non-exclusive, we may also want to consider this
 condition to be no cgroup support.
 
 diff -r 444e2614d0a2 -r 8e948eb88328 src/Makefile.am
 --- a/src/Makefile.am Wed Sep 17 16:07:03 2008 +
 +++ b/src/Makefile.am Mon Sep 29 09:37:42 2008 -0700
 @@ -96,7 +96,8 @@
   lxc_conf.c lxc_conf.h   \
   lxc_container.c lxc_container.h \
   lxc_controller.c\
 - veth.c veth.h
 + veth.c veth.h   \
 + cgroup.c cgroup.h
 
  OPENVZ_DRIVER_SOURCES =  \
   openvz_conf.c openvz_conf.h \
 diff -r 444e2614d0a2 -r 8e948eb88328 src/cgroup.c
 --- /dev/null Thu Jan 01 00:00:00 1970 +
 +++ b/src/cgroup.cMon Sep 29 09:37:42 2008 -0700
 @@ -0,0 +1,526 @@
 +/*
 + * cgroup.c: Tools for managing cgroups
 + *
 + * Copyright IBM Corp. 2008
 + *
 + * See COPYING.LIB for the License of this software
 + *
 + * Authors:
 + *  Dan Smith [EMAIL PROTECTED]
 + */
 +#include config.h
 +
 +#include stdio.h
 +#include stdint.h
 +#include inttypes.h
 +#include mntent.h
 +#include fcntl.h
 +#include string.h
 +#include errno.h
 +#include stdlib.h
 +#include stdbool.h
 +#include sys/stat.h
 +#include sys/types.h
 +#include libgen.h
 +
 +#include internal.h
 +#include util.h
 +#include cgroup.h
 +
 +#define DEBUG(fmt,...) VIR_DEBUG(__FILE__, fmt, __VA_ARGS__)
 +#define DEBUG0(msg) VIR_DEBUG(__FILE__, %s, msg)
 +
 +struct virCgroup {
 +char *path;
 +};
 +

There is no support for permissions, is everything run as root?

 +void virCgroupFree(virCgroupPtr *group)
 +{
 +if (*group != NULL) {
 +free((*group)-path);
 +free(*group);
 +*group = NULL;
 +}
 +}
 +
 +static virCgroupPtr cgroup_get_mount(void)
 +{
 +FILE *mounts;
 +struct mntent entry;
 +char buf[512];

Is 512 arbitrary? How do we know it is going to be sufficient?

 +virCgroupPtr root = NULL;
 +
 +root = calloc(1, sizeof(*root));
 +if (root == NULL)
 +return NULL;
 +
 +mounts = fopen(/proc/mounts, r);
 +if (mounts == NULL) {
 +DEBUG0(Unable to open /proc/mounts: %m);
 +goto err;
 +}
 +
 +while (getmntent_r(mounts, entry, buf, sizeof(buf)) != NULL) {
 +if (STREQ(entry.mnt_type, cgroup)) {
 +root-path = strdup(entry.mnt_dir);
 +break;
 +}
 +}
 +
 +if (root-path == NULL) {
 +DEBUG0(Did not find cgroup mount);

Or strdup failed due to ENOMEM

 +goto err;
 +}
 +
 +fclose(mounts);
 +
 +return root;
 +err:
 +virCgroupFree(root);
 +
 +return NULL;
 +}
 +
 +int virCgroupHaveSupport(void)
 +{
 +virCgroupPtr root;
 +
 +root = cgroup_get_mount();
 +if (root == NULL)
 +return -1;
 +
 +virCgroupFree(root);
 +

This is quite a horrible way of wasting computation.

 +return 0;
 +}
 +
 +static int cgroup_path_of(const char *grppath,
 +  const char *key,
 +  char **path)
 +{
 +virCgroupPtr root;
 +int rc = 0;
 +
 +root = cgroup_get_mount();

So every routine calls cgroup_path_of(), reads the mounts entry and find a entry
for cgroup and returns it, why not do it just once and use it.

 +if (root == NULL) {
 +rc = -ENOTDIR;
 +goto out;
 +}
 +
 +if (asprintf(path, %s/%s/%s, root-path, grppath, key) == -1)
 +rc = -ENOMEM;
 +out:
 +virCgroupFree(root);
 +
 +return rc;
 +}
 +
 +int virCgroupSetValueStr(virCgroupPtr group,
 + const char *key,
 + const char *value)
 +{
 +int fd = -1;
 +int rc = 0;
 +char *keypath = NULL;
 +
 +rc = cgroup_path_of(group-path, key, keypath);
 +if (rc != 0)
 +return rc;
 +
 +fd = open(keypath, O_WRONLY);

I see a mix of open and fopen calls.I would prefer to stick to just one, helps
with readability.

 +if (fd  0) {
 +DEBUG(Unable to open %s: %m, keypath);
 +rc 

Re: [libvirt] [PATCH 0 of 2] Add cgroup manipulation and LXC driver support

2008-10-03 Thread Daniel Veillard
On Fri, Oct 03, 2008 at 08:40:22AM -0700, Dan Smith wrote:
 Revision 3 of the cgroups patch, with the round 2 comments addressed.

  Fine for me +1

Daniel

-- 
Daniel Veillard  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
[EMAIL PROTECTED]  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH 2 of 2] Use cgroup functions to set resource limits on LXC domains

2008-10-03 Thread Daniel P. Berrange
On Fri, Oct 03, 2008 at 08:40:24AM -0700, Dan Smith wrote:
 This patch adds code to the controller to set up a cgroup named after the
 domain name, set the memory limit, and restrict devices.  It also
 adds bits to lxc_driver to properly clean up the cgroup on domain death.
 
 If virCgroupHaveSupport() says that no support is available, then we just
 allow the domain creation to proceed as it did before without resource
 controls in place.

 +struct cgroup_device_policy devices[] = {
 +{'c', VIR_CG_DEV_MAJ_MEMORY, VIR_CG_DEV_MIN_NULL},
 +{'c', VIR_CG_DEV_MAJ_MEMORY, VIR_CG_DEV_MIN_ZERO},
 +{'c', VIR_CG_DEV_MAJ_MEMORY, VIR_CG_DEV_MIN_FULL},
 +{'c', VIR_CG_DEV_MAJ_MEMORY, VIR_CG_DEV_MIN_RANDOM},
 +{'c', VIR_CG_DEV_MAJ_MEMORY, VIR_CG_DEV_MIN_URANDOM},
 +{'c', VIR_CG_DEV_MAJ_TTY, VIR_CG_DEV_MIN_CONSOLE},
 +{0,   0, 0}};

You're going to hate me for suggesting more changes, but

This list of devices is currently duplicated in two places - once
here where we set permissions, and again when we actually create
the container and populate its /dev/ in lxc_container.c. Could do
with a master list of device nodes used by both.

Daniel
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH 2 of 2] Use cgroup functions to set resource limits on LXC domains

2008-10-03 Thread Dan Smith
DB This list of devices is currently duplicated in two places - once
DB here where we set permissions, and again when we actually create
DB the container and populate its /dev/ in lxc_container.c. Could do
DB with a master list of device nodes used by both.

So does that mean they should be somewhere other than cgroup.h?

If not, I think it might be appropriate to make that change in a
separate patch, since populating /dev isn't really cgroup-related.

-- 
Dan Smith
IBM Linux Technology Center
Open Hypervisor Team
email: [EMAIL PROTECTED]


pgpck5elXVqx7.pgp
Description: PGP signature
--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH 2 of 2] Use cgroup functions to set resource limits on LXC domains

2008-10-03 Thread Daniel P. Berrange
On Fri, Oct 03, 2008 at 08:45:00AM -0700, Dan Smith wrote:
 DB This list of devices is currently duplicated in two places - once
 DB here where we set permissions, and again when we actually create
 DB the container and populate its /dev/ in lxc_container.c. Could do
 DB with a master list of device nodes used by both.
 
 So does that mean they should be somewhere other than cgroup.h?

I'd think lxc_config.h or lcx_container.h would be a more natural
place for the constants, since both the .c files which would use
them as lxc_ driver files.

Daniel
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH 1 of 2] Add internal cgroup manipulation functions

2008-10-03 Thread Dan Smith
BS There is no support for permissions, is everything run as root?

The LXC driver must run as root, yes.

BS Is 512 arbitrary? How do we know it is going to be sufficient?

This should probably be a constant, but it's more than enough for any
of the values we're setting.  Defining it specifically will help make
that clear.

BS So every routine calls cgroup_path_of(), reads the mounts entry
BS and find a entry for cgroup and returns it, why not do it just
BS once and use it.

Because maintaining the (potentially stale) state isn't worth the
small amount of work necessary to parse the mount table on each domain
creation (IMHO).

BS I see a mix of open and fopen calls.I would prefer to stick to
BS just one, helps with readability.

You mean because I use open() and getmntent_r() wants a FILE*?  I
think I'm willing to live with that :)

BS I don't think 512 bytes will be sufficient. THere are multi-line
BS files like memory.stat.

We don't read memory.stat (or any other values at the moment).
However:

  % for i in $(find /cgroup | grep memory.stat); do cat $i | wc -c; done
  53
  53
  53
  98

I don't see any point in allocating a huge amount of memory on the
stack each time if we're not going to use it.  If we need to read
larger values in the future, we could move to an allocated string.

BS Why do you need access? mkdir will do that check anyway.

Yep, I'll change it.

BS Why 0655? Why not use meaningful names see sys/stat.h

I don't think that the meaningful names in sys/stat.h lead to better
readability, personally.  Octal permissions are used elsewhere int the
code pretty extensively, so I'm comfortable with this as it is.

-- 
Dan Smith
IBM Linux Technology Center
Open Hypervisor Team
email: [EMAIL PROTECTED]


pgpPFzo8qY9qz.pgp
Description: PGP signature
--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [discuss] The new cgroup patches for libvirt

2008-10-03 Thread Balbir Singh
Hi, Everyone,

I've seen a new set of patches from Dan Smith, which implement cgroup support
for libvirt. While the patches seem simple, there are some issues that have been
pointed out in the posting itself.

I hope that libvirt will switch over (may be after your concerns are addressed
and definitely in the longer run) to using libcgroups rather than having an
internal implementation of cgroups. The advantages of switching over would be
using the functionality that libcgroup already provides

libcgroups (libcg.sf.net) provides

1. Ability to configure and mount cgroups and controllers via initscripts and a
configuration file
2. An API to control and read cgroups information
3. Thread safety around API calls
4. Daemons to automatically classify a task based on a certain set of rules
5. API to extract current cgroup classification (where is the task currently in
the cgroup hierarchy)

While re-implementing might sound like a cool thing to do, here are the 
drawbacks

1. It leads to code duplication and reduces code reuse
2. It leads to confused users

I understand that in the past there has been a perception that libcgroups might
not yet be ready, because we did not have ABI stability built into the library
and the header file had old comments about things changing. I would urge the
group to look at the current implementation of libcgroups (look at v0.32) and
help us

1. Fix any issues you see or point them to us
2. Add new API or request for new API that can help us integrate better with 
libvirt



-- 
Balbir

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] Re: [discuss] The new cgroup patches for libvirt

2008-10-03 Thread Daniel Veillard
On Fri, Oct 03, 2008 at 09:31:52PM +0530, Balbir Singh wrote:
 I understand that in the past there has been a perception that libcgroups 
 might
 not yet be ready, because we did not have ABI stability built into the library
 and the header file had old comments about things changing. I would urge the
 group to look at the current implementation of libcgroups (look at v0.32) and
 help us
 
 1. Fix any issues you see or point them to us

  I did point the general problem of ABI in libcgroup
http://www.mail-archive.com/libvir-list@redhat.com/msg08388.html

I didn't see any reply to the points I raised specifically.
In the meantime we got a relatively simple, sufficient for now, usable
right now, patch fullfilling our needs.
A working patch is better in my eye than something which may work well
in the future if we take the time to integrate it and stabilize and
propagate to the systems we use.

The package available in Fedora 9 has not improved as far as I can tell.
So I'm still keeping the same point of view as posted on that same
thread a month ago:

  http://www.mail-archive.com/libvir-list@redhat.com/msg08472.html

Yes I don't want to presume the ability of the libcgroup to become
cleaner and more stable, we can probably go with a small internal API
and when/if things become nicer, then reuse libcgroup,

 As maintainer I will also note that nicer also imply the ability
to work well and smoothly with the other maintainers. I hate guerilla,
I would prefer if you had read and replied to what I wrote.

  So Dan Smith patch should IMHO go in now, if later your API are widely
distributed, cleaner than what i have now (0.1c may be old but what is
available to us on Fedora, no idea what is available on other distros)
and there is a clean patch to switch then we will look at it, right now
we can't use libcgroup in my opinion.

Daniel

-- 
Daniel Veillard  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
[EMAIL PROTECTED]  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] Re: [discuss] The new cgroup patches for libvirt

2008-10-03 Thread Daniel P. Berrange
On Fri, Oct 03, 2008 at 09:31:52PM +0530, Balbir Singh wrote:
 Hi, Everyone,
 
 I've seen a new set of patches from Dan Smith, which implement cgroup support
 for libvirt. While the patches seem simple, there are some issues that have 
 been
 pointed out in the posting itself.
 
 I hope that libvirt will switch over (may be after your concerns are addressed
 and definitely in the longer run) to using libcgroups rather than having an
 internal implementation of cgroups. The advantages of switching over would be
 using the functionality that libcgroup already provides
 
 libcgroups (libcg.sf.net) provides
 
 1. Ability to configure and mount cgroups and controllers via initscripts and 
 a
 configuration file
 2. An API to control and read cgroups information
 3. Thread safety around API calls
 4. Daemons to automatically classify a task based on a certain set of rules
 5. API to extract current cgroup classification (where is the task currently 
 in
 the cgroup hierarchy)

So from a functional point of view you are addressing essentially three
use cases

 1. System configuration for controllers
 2. Automatic task classification
 3. Application development API for creating groups

If each piece is correctly designed, the choice of implementation for
each of these can be, and in some cases must be, totally independant. 

Since the kernel restricts that a single controller can only be attached
to one cgroupsfs mount point, and one attach cannot be changed, the choice
of how / where to mount controllers must remain outside the scope of
applications. If any application using cgroups were to specify mount
points, it would be inflicting its own requirements on every user of
cgroups. This implies that applications must be designed to work with 
whatever controller mount configuration the admin has configured, and
not configure stuff themselves. So impl for point 1 (configuration) 
must, by neccessity, be completely independant of impl for point 3 
(application API).

Considering automatic task classification. The task classification engine
must be able to cope with the fact that applications have some functional
requirements on cgroups setup. Taking libvirt as an example, we have a
specific need to apply some controllers over a group of processes forming
a container. A task classification engine must not re-clasify individual
tasks within a container because that would conflict with the semantics
required by libvirt. It is, however, free to re-classify the libvirtd 
daemon itself - whatever cgroup libvirtd is placed in, it will create the
LXC cgroups below this point.

So if libvirt is designed correctly, it will work with whatever cgroup
task classification engine that might be running. Similarly if the task
classification engine has been designed to co-operate with applications
there is no problem running it alonside libvirt. Thus the implementation
of points 2 (task classification) and point 3 (application API) have no
need to be formally tied together. Furthermore tieing them together does
not magically solve the problem that both applications  the cgroups task
classification engine need to be intelligently designed to co-operate.


 While re-implementing might sound like a cool thing to do, here are the 
 drawbacks
 
 1. It leads to code duplication and reduces code reuse

This is important if the library code is providing significant value add to 
the application using it. As it stands, libcgroup is merely a direct interface
to the cgroups filesystem providing weakly typed setters  getters - with the
exception of looking at the mount table to find where a controller lives, this
is not hard / complex code, so the benefits of re-use are not particularly 
high. 

In such a scenario reducing code duplication is not in itself a benefit, since
there are costs associated with using external libraries. It is more complicated
integrate 2 independant style sof API, particularly with different views on 
error reporting, memory management and varying expectations for the semantic
models exposed.

There are a number of 'hard' questions wrt to cgroups usage by applications,
two of which are outlined above. Simply having all applications use a single
API cannot magically solve any of these problems - no matter what API is used
application developers need to take care to design their usage of cgroups
such that it 'plays nicely' with other applications.

 2. It leads to confused users

The use of cgroups is an internal implementation  detail for libvirt's
LXC driver. In comon with all libvirt drivers, the user has no need to
know about the underlying impl details and these can  will change at
will as we discover better ways to achieve things. As such its irrelevant
to a user how we configure the cgroups filesytem for libvirt - whether we
do it directly, or via libcg the end result is identical - a set of 
directories in the cgroups filesystem.

 
 I understand that in the past there has been a perception that libcgroups 
 

Re: [libvirt] [PATCH 0 of 2] [RFC] Add cgroup manipulation and LXC driver support

2008-10-03 Thread Daniel P. Berrange
On Wed, Oct 01, 2008 at 08:41:19AM +0530, Balbir Singh wrote:
 Dan Smith wrote:
  
  DB At the same time having the controllers mounted is mandatory for
  DB libvirt to work and asking the admin to set things up manually
  DB also sucks. So perhaps we'll need to mount them automatically, but
  DB make this behaviuour configurable in some way, so admin can
  DB override it
  
  Perhaps we can:
  
   - Have a list of controllers we use (memory and devices so far)
   - Create each group in all mounts required to satisfy our necessary
 controllers
   - Select the appropriate mount when setting a cont.key value
  
 
 I am not sure how libvirt provides thread safety, but I did not see any 
 explicit
 coding for that?

The thread safety model for libvirt has two levels

 - A single virConnectPtr object must only be used by one thread. 
   If you have multiple threads, you must provide each with its
   own conenct object

 - Within a stateless driver (Xen, OpenVZ, Test), there is no shared
   state between virConnectPtr objects, so there are no thread issues
   in this respect

 - With a stateful driver, the libvirtd daemon ensures that only a
   single thread is active at once, so against there are no thread
   issues there either.

Now, in a short while I will be making the daemon fully-multithreaded. When
this happens, the stateful drivers will be required to maintain mutexes for
locking. The locking model wil have 2 levels, one lock over the driver as
a whole. This is held only while acquiring a lock against the object being
modified (eg the virtual domain object).

Each virtual domain, lives in one cgroup, so there is a single virCGroup
object associated with each domain. the virCGroup object state is seflf
contained, so independant virCGroup objects can be accessed concurrently
from multiple threads, without any threads safety issues.

Daniel
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] anyone implementing host device enumeration?

2008-10-03 Thread Daniel P. Berrange
On Fri, Oct 03, 2008 at 01:20:35PM -0400, David Lively wrote:
 Okay, I see what you mean.  I'll create a virNodeDeviceDefPtr and follow
 the established pattern.
 
 I'm really trying to avoid creating a struct/union that must be extended
 every time we support a new capability.  I struggled quite a bit with
 the right representation for capabilities and bus details.. At one
 point, I had my own (general-purpose; i.e., not specific to any type of
 capability) virNodeDeviceCapabilities type, but it looked so much like a
 DOM tree that just using a xmlNode seemed like the best choice.
 
 Are you suggesting creating a struct for each kind of
 currently-supported capability, or are you suggesting creating a single
 struct that's general enough to represent all capabilities?

I'm suggesting a something that is specific for each capability. Now
if we were to support all metadata that HAL exposes this would be a
huge job. But I don't believe we should be exposing all the metadata
that HAL has, because in the future this XML format has to work with
DeviceKit which is basically exposing UDev metadata, and VMWare's
APIs which expose hardware info in their own way.

Thus, IMHO, we should expose specific capabilities we know we care
about, for specific sub-substems, and not try to handle the entire
of HAL.

A good starting point would be

 - PCI,  domain, bus, slot, function, and vendor/product 
 - USB, bus, device and  vendor/product

Basically same info required for attaching the device to a domain,
thus matching the struct  virDomainHostdevDefPtr in domain_conf.h

 - NIC,  name  mac address
 - Block, name and host adapter
 - Host adapter, name

For these also have a way to link to the parent device associated
with them (ie the PCI/USB device providing them).

That would basically be enough for use of the existing domain/storage
and network APIs which is what we need this data for, and this minimal
info should be satisifiable with VMWare's APIs, and DeviceKit.

Regards,
Daniel
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] Re: [discuss] The new cgroup patches for libvirt

2008-10-03 Thread Daniel P. Berrange
On Fri, Oct 03, 2008 at 09:31:52PM +0530, Balbir Singh wrote:
 I understand that in the past there has been a perception that libcgroups 
 might
 not yet be ready, because we did not have ABI stability built into the library
 and the header file had old comments about things changing. I would urge the
 group to look at the current implementation of libcgroups (look at v0.32) and
 help us
 
 1. Fix any issues you see or point them to us
 2. Add new API or request for new API that can help us integrate better with 
 libvirt

To expand on what I said in my other mail about providing value-add over 
the representation exposed by the kernel, here's some thoughts on the API
exposed.

Consider the following high level use case of libvirt

 - A set of groups, in a 3 level hierarchy APPNAME/DRIVER/DOMAIN
 - Control the ACL for block/char devices
 - Control memory limits

This translates into an underling implementation, that I need to create 3
levels of cgroups in the filesystem, attach my PIDs at the 3rd level
use the memory and device controllers and attach PIDs at the 3rd, and
set values for attributes exposed by the controllers. Notice I'm not
actually setting any config parms at the 1st  2nd levels, but they
do need to still exist to ensure namespace uniqueness amongst different
applications using cgroups.

The current cgroups API provides APIs that directly map to individual
actions wrt the kernel filesystem exposed. So as an application developer
I have to explicitly create the 3 levels of hierarchy, tell it I want
to use memory  device controllers, format config values into the syntax
required for each attribute, and remeber the attribute names.

 // Create the hierachy APPNAME/DRIVER/DOMAIN
 c1 = cgroup_new_cgroup(libvirt)
 c2 = cgroup_new_cgroup_parent(c1, lxc)
 c3 = cgroup_new_cgroup_parent(c2, domain.name)

 // Setup the controllers I want to use
 cgroup_add_controler(c3, devices)
 cgroup_add_controller(c3, memory)
   
 // Add my domain's PID to the cgroup
 cgroup_attach_task(c3, domain.pid)

 // Set the device ACL limits
 cgroup_set_value_string(c2, devices.deny, a);

 char buf[1024];
 sprintf(buf, %c %d:%d, 'c', 1, 3);
 cgroup_set_value_stirng(c2, devices.allow, buf);

 // Set memory limit
 cgroup_set_value_uint64(c2, memory.limit_in_bytes, domain.memory * 1024);

This really isn't providing any semantically useful abstraction over 
the direct filesytem manipulation. Just a bunch of wrappers for mkdir(),
mount() and read()/write() calls. My application still has to know far
too much information about the details of cgroups as exposed by the
kernel. 

I do not care that there is a concept of  'controllers' at all, I just
want to set device ACLs and memory limits. I do not care what the attributes
in the filesystem are called, again I just want to set device ACLs and memory
limits.  I do not care what the data format for them must be for device/memory
settings. Memory settings could be stored in base-2, base-10 or base-16 I 
should not have to know this information.

With this style of API, the library provide no real value-add or  compelling
reason to use it.

What might a more useful API look like? At least from my point of view,
I'd like to be able to say:

  // Tell it I want $PID placed in APPNAME/DRIVER/DOMAIN
  char *path[] = { libvirt, lxc, domain.name};
  cg = cgroup_new_path(path, domain.pid)

  // I want to deny all devices
  cgroup_deny_all_devices(cg);

  // Allow /dev/null - either by node/major/minor
  cgroup_allow_device_node(cg, 'c', 1, 3);
   
  // Or more conviently just give it a node to copy info from
  cgroup_allow_device_node(cg, /dev/null)

  // Set memory in KB
  cgroup_set_memory_limit_kb(cg, domain.memory)

Notice how with such a style of API, I don't need to know anything about
the low level implementation details - I'm working entirely in terms of
semantically meaningful concepts.

Now, comes the hard bit - you have to figure out what semantic concepts
you want to expose to applications. The example here would be suitable
for libvirt, but not neccessarily for other applications. Picking the
right APIs is very much much harder than  just exposing the kernel 
capabilities directly as libcgroup.h does now, but the trade off is 
that the resulting API would be much more useful and interesting to 
app developers.

Regards,
Daniel
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] anyone implementing host device enumeration?

2008-10-03 Thread David Lively
I definitely wasn't planning on covering all of HAL :-)

I assume you weren't planning on exposing these capability-specific
representations in the public API.  Right?  (If not, ignore the rest of
this ...)

So I guess I don't see the value of having these cap-specific internal
representations.  I keep a string array of the cap names for
ListDevicesByCap, but other than that, the capability data is used only
by virNodeDeviceGetXMLDesc().  So it seems natural to represent it in a
form that can easily be converted to XML, and that can represent all the
XML we'll need to output (like xmlNode).  Otherwise, we are forced to
write more capability-specific code, with no particular advantage (no
stronger typing guarantees if we don't expose specific cap types).

Dave

P.S. I do think it would be useful to have access to capability details
other than GetXMLDesc.  Perhaps:
   const char *virNodeDeviceCapabilityProperty(virNodeDevicePtr dev,
const char *cap,
const char *prop)
but note this exposes them only in a general (property / value) way, and
is quite easily implemented with a xmlNode representation.


On Fri, 2008-10-03 at 18:41 +0100, Daniel P. Berrange wrote:
 On Fri, Oct 03, 2008 at 01:20:35PM -0400, David Lively wrote:
  Okay, I see what you mean.  I'll create a virNodeDeviceDefPtr and follow
  the established pattern.
  
  I'm really trying to avoid creating a struct/union that must be extended
  every time we support a new capability.  I struggled quite a bit with
  the right representation for capabilities and bus details.. At one
  point, I had my own (general-purpose; i.e., not specific to any type of
  capability) virNodeDeviceCapabilities type, but it looked so much like a
  DOM tree that just using a xmlNode seemed like the best choice.
  
  Are you suggesting creating a struct for each kind of
  currently-supported capability, or are you suggesting creating a single
  struct that's general enough to represent all capabilities?
 
 I'm suggesting a something that is specific for each capability. Now
 if we were to support all metadata that HAL exposes this would be a
 huge job. But I don't believe we should be exposing all the metadata
 that HAL has, because in the future this XML format has to work with
 DeviceKit which is basically exposing UDev metadata, and VMWare's
 APIs which expose hardware info in their own way.
 
 Thus, IMHO, we should expose specific capabilities we know we care
 about, for specific sub-substems, and not try to handle the entire
 of HAL.
 
 A good starting point would be
 
  - PCI,  domain, bus, slot, function, and vendor/product 
  - USB, bus, device and  vendor/product
 
 Basically same info required for attaching the device to a domain,
 thus matching the struct  virDomainHostdevDefPtr in domain_conf.h
 
  - NIC,  name  mac address
  - Block, name and host adapter
  - Host adapter, name
 
 For these also have a way to link to the parent device associated
 with them (ie the PCI/USB device providing them).
 
 That would basically be enough for use of the existing domain/storage
 and network APIs which is what we need this data for, and this minimal
 info should be satisifiable with VMWare's APIs, and DeviceKit.
 
 Regards,
 Daniel

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] Re: [discuss] The new cgroup patches for libvirt

2008-10-03 Thread Balbir Singh
On Fri, Oct 3, 2008 at 11:43 PM, Daniel P. Berrange [EMAIL PROTECTED] wrote:
 On Fri, Oct 03, 2008 at 09:31:52PM +0530, Balbir Singh wrote:
 I understand that in the past there has been a perception that libcgroups 
 might
 not yet be ready, because we did not have ABI stability built into the 
 library
 and the header file had old comments about things changing. I would urge the
 group to look at the current implementation of libcgroups (look at v0.32) and
 help us

 1. Fix any issues you see or point them to us
 2. Add new API or request for new API that can help us integrate better with 
 libvirt

 To expand on what I said in my other mail about providing value-add over
 the representation exposed by the kernel, here's some thoughts on the API
 exposed.

 Consider the following high level use case of libvirt

  - A set of groups, in a 3 level hierarchy APPNAME/DRIVER/DOMAIN
  - Control the ACL for block/char devices
  - Control memory limits

 This translates into an underling implementation, that I need to create 3
 levels of cgroups in the filesystem, attach my PIDs at the 3rd level
 use the memory and device controllers and attach PIDs at the 3rd, and
 set values for attributes exposed by the controllers. Notice I'm not
 actually setting any config parms at the 1st  2nd levels, but they
 do need to still exist to ensure namespace uniqueness amongst different
 applications using cgroups.

 The current cgroups API provides APIs that directly map to individual
 actions wrt the kernel filesystem exposed. So as an application developer
 I have to explicitly create the 3 levels of hierarchy, tell it I want
 to use memory  device controllers, format config values into the syntax
 required for each attribute, and remeber the attribute names.

 // Create the hierachy APPNAME/DRIVER/DOMAIN
 c1 = cgroup_new_cgroup(libvirt)
 c2 = cgroup_new_cgroup_parent(c1, lxc)
 c3 = cgroup_new_cgroup_parent(c2, domain.name)

 // Setup the controllers I want to use
 cgroup_add_controler(c3, devices)
 cgroup_add_controller(c3, memory)

 // Add my domain's PID to the cgroup
 cgroup_attach_task(c3, domain.pid)

 // Set the device ACL limits
 cgroup_set_value_string(c2, devices.deny, a);

 char buf[1024];
 sprintf(buf, %c %d:%d, 'c', 1, 3);
 cgroup_set_value_stirng(c2, devices.allow, buf);

 // Set memory limit
 cgroup_set_value_uint64(c2, memory.limit_in_bytes, domain.memory * 
 1024);

 This really isn't providing any semantically useful abstraction over
 the direct filesytem manipulation. Just a bunch of wrappers for mkdir(),
 mount() and read()/write() calls. My application still has to know far
 too much information about the details of cgroups as exposed by the
 kernel.


True, it definitely does and the way I look at APIs is that they are
layers. We've built the first layer that abstracts permissions, paths
and strings into a set of useful API. The second layer does things
that you say, the question then is why don't we have it yet?

Let me try and answer that question

1. We've been trying to build configuration, classification and the
low level plumbing
2. We've been planning to build the exact same thing that you say, we
call that the pluggable architecture, where controller plug in their
logic and provide the abstractions you need, but not gotten there yet.

When you announced cgroup support in libvirt, it was definitely going
to be a user and we hoped that you would come to us with your exact
requirements that you've mentioned now (believe me, your feedback is
very useful). The question then to ask is, is it cheaper for you to
build these abstractions into libvirt or either helped us or asked us
to do so, we would have gladly obliged. You might say that the onus is
on the maintainers to do the right thing without feedback, but I would
beg to differ.

What you've asked for, I consider as a layer on top of the API we have
now and should be easy to build.

 I do not care that there is a concept of  'controllers' at all, I just
 want to set device ACLs and memory limits. I do not care what the attributes
 in the filesystem are called, again I just want to set device ACLs and memory
 limits.  I do not care what the data format for them must be for device/memory
 settings. Memory settings could be stored in base-2, base-10 or base-16 I
 should not have to know this information.

 With this style of API, the library provide no real value-add or  compelling
 reason to use it.

 What might a more useful API look like? At least from my point of view,
 I'd like to be able to say:

  // Tell it I want $PID placed in APPNAME/DRIVER/DOMAIN
  char *path[] = { libvirt, lxc, domain.name};
  cg = cgroup_new_path(path, domain.pid)

  // I want to deny all devices
  cgroup_deny_all_devices(cg);

  // Allow /dev/null - either by node/major/minor
  cgroup_allow_device_node(cg, 'c', 1, 3);

  // Or more conviently just give it a node to copy info 

Re: [libvirt] Re: [discuss] The new cgroup patches for libvirt

2008-10-03 Thread Daniel P. Berrange
On Sat, Oct 04, 2008 at 12:13:38AM +0530, Balbir Singh wrote:
 On Fri, Oct 3, 2008 at 11:43 PM, Daniel P. Berrange [EMAIL PROTECTED] wrote:
 True, it definitely does and the way I look at APIs is that they are
 layers. We've built the first layer that abstracts permissions, paths
 and strings into a set of useful API. The second layer does things
 that you say, the question then is why don't we have it yet?
 
 Let me try and answer that question
 
 1. We've been trying to build configuration, classification and the
 low level plumbing
 2. We've been planning to build the exact same thing that you say, we
 call that the pluggable architecture, where controller plug in their
 logic and provide the abstractions you need, but not gotten there yet.
 
 When you announced cgroup support in libvirt, it was definitely going
 to be a user and we hoped that you would come to us with your exact
 requirements that you've mentioned now (believe me, your feedback is
 very useful). The question then to ask is, is it cheaper for you to
 build these abstractions into libvirt or either helped us or asked us
 to do so, we would have gladly obliged. You might say that the onus is
 on the maintainers to do the right thing without feedback, but I would
 beg to differ.

The thing I didn't mention, is that until Dan posted his current patches
actually implementing the cgroups stuff in LXC driver, I didn't have a 
good picture of what the ideal higher level interface would look like.
If you try and imagine high level APIs, without having an app actually
using them, its all too easy to design something that turns out to not
be useful.

So while I know the low level cgroups API isn't what we  need, it needs
the current proof of concept in the libvirt LXC  driver to discover what 
is an effective approach for libcgroups. I suspect our code will evolve
further as we learn from what we've got now.  By doing this entirely 
within libvirt we can experiment with effective implementation strategies
without having to lockdown a formally supported API immediately. Once 
things settle down, it'll easier for libcgroups to see exactly what is 
important for a high level API and thus make one that's useful to more 
apps in the long term.

Daniel
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] anyone implementing host device enumeration?

2008-10-03 Thread Daniel P. Berrange
On Fri, Oct 03, 2008 at 02:35:06PM -0400, David Lively wrote:
 I definitely wasn't planning on covering all of HAL :-)
 
 I assume you weren't planning on exposing these capability-specific
 representations in the public API.  Right?  (If not, ignore the rest of
 this ...)

Not at this time. For the forseeable future I expect the XML format to
be the only publically exposed representation of configuration. A long
time down the road when we're confident the representation is long term
sustainable  correct, we might consider exposing the structs, but that
is a long way off.

 So I guess I don't see the value of having these cap-specific internal
 representations.  I keep a string array of the cap names for
 ListDevicesByCap, but other than that, the capability data is used only
 by virNodeDeviceGetXMLDesc().  So it seems natural to represent it in a
 form that can easily be converted to XML, and that can represent all the
 XML we'll need to output (like xmlNode).  Otherwise, we are forced to
 write more capability-specific code, with no particular advantage (no
 stronger typing guarantees if we don't expose specific cap types).

The idea is that by having formal internal representations it makes it
easier for internal driver code to work with it in a gaurenteed consistent
fashion. While the structs may only be used by your specific driver for 
the intiial commit, experiance has shown our needs evolve over time and 
we'll probably end up with other internal code wanting it. We previously
had each hypervisor driver using an adhoc representation internally for
domain configs, and it just wasn't sustainable our internal usage 
evolved.

 P.S. I do think it would be useful to have access to capability details
 other than GetXMLDesc.  Perhaps:
const char *virNodeDeviceCapabilityProperty(virNodeDevicePtr dev,
   const char *cap,
   const char *prop)
 but note this exposes them only in a general (property / value) way, and
 is quite easily implemented with a xmlNode representation.

I'm wary of exposing sub-sets of the XML docs in simple key/value pairs
because our XML formats have tended to expand over time, so things that
started off key/value pairs gained extra attributes/child elements, all
of which are desired. It is simple enough for applications to wrap in a
property 'getter' using XPath on the XML doc if desired - virt-manager
does this internally for many things.

Daniel
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] Re: [discuss] The new cgroup patches for libvirt

2008-10-03 Thread Balbir Singh
On Sat, Oct 4, 2008 at 1:17 AM, Daniel P. Berrange [EMAIL PROTECTED] wrote:
 On Sat, Oct 04, 2008 at 12:13:38AM +0530, Balbir Singh wrote:
 On Fri, Oct 3, 2008 at 11:43 PM, Daniel P. Berrange [EMAIL PROTECTED] 
 wrote:
 True, it definitely does and the way I look at APIs is that they are
 layers. We've built the first layer that abstracts permissions, paths
 and strings into a set of useful API. The second layer does things
 that you say, the question then is why don't we have it yet?

 Let me try and answer that question

 1. We've been trying to build configuration, classification and the
 low level plumbing
 2. We've been planning to build the exact same thing that you say, we
 call that the pluggable architecture, where controller plug in their
 logic and provide the abstractions you need, but not gotten there yet.

 When you announced cgroup support in libvirt, it was definitely going
 to be a user and we hoped that you would come to us with your exact
 requirements that you've mentioned now (believe me, your feedback is
 very useful). The question then to ask is, is it cheaper for you to
 build these abstractions into libvirt or either helped us or asked us
 to do so, we would have gladly obliged. You might say that the onus is
 on the maintainers to do the right thing without feedback, but I would
 beg to differ.

 The thing I didn't mention, is that until Dan posted his current patches
 actually implementing the cgroups stuff in LXC driver, I didn't have a
 good picture of what the ideal higher level interface would look like.
 If you try and imagine high level APIs, without having an app actually
 using them, its all too easy to design something that turns out to not
 be useful.

 So while I know the low level cgroups API isn't what we  need, it needs
 the current proof of concept in the libvirt LXC  driver to discover what
 is an effective approach for libcgroups. I suspect our code will evolve
 further as we learn from what we've got now.  By doing this entirely
 within libvirt we can experiment with effective implementation strategies
 without having to lockdown a formally supported API immediately. Once
 things settle down, it'll easier for libcgroups to see exactly what is
 important for a high level API and thus make one that's useful to more
 apps in the long term.


Please remember my words if you ever find that you have a code base
that looks like what we have in libcgroups, please remember to switch
over to libcgroup. I fear that you will reach that stage, the code
that is going in right now has too many things hard-coded and will
need a lot of changes going forward, things like adding support for
new controllers is not going to be straight forward, your assumption
that only root can create a container might be broken and we'll build
support for hierarchies, which will require further changes, etc. I am
not scaring you, just trying to make sure we don't solve the same
problems twice.

Balbir

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] Re: [discuss] The new cgroup patches for libvirt

2008-10-03 Thread Balbir Singh

 The thing I didn't mention, is that until Dan posted his current patches
 actually implementing the cgroups stuff in LXC driver, I didn't have a
 good picture of what the ideal higher level interface would look like.
 If you try and imagine high level APIs, without having an app actually
 using them, its all too easy to design something that turns out to not
 be useful.

 So while I know the low level cgroups API isn't what we  need, it needs
 the current proof of concept in the libvirt LXC  driver to discover what
 is an effective approach for libcgroups. I suspect our code will evolve
 further as we learn from what we've got now.  By doing this entirely
 within libvirt we can experiment with effective implementation strategies
 without having to lockdown a formally supported API immediately. Once
 things settle down, it'll easier for libcgroups to see exactly what is
 important for a high level API and thus make one that's useful to more
 apps in the long term.


Agreed, the libvirt changes for cgroups have shown us a useful layer
to build. We'll keep on top of it and try and build something that
everyone can use.

Balbir

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH 0 of 2] [RFC] Add cgroup manipulation and LXC driver support

2008-10-03 Thread Balbir Singh
Daniel P. Berrange wrote:
 On Wed, Oct 01, 2008 at 08:41:19AM +0530, Balbir Singh wrote:
 Dan Smith wrote:
 DB At the same time having the controllers mounted is mandatory for
 DB libvirt to work and asking the admin to set things up manually
 DB also sucks. So perhaps we'll need to mount them automatically, but
 DB make this behaviuour configurable in some way, so admin can
 DB override it

 Perhaps we can:

  - Have a list of controllers we use (memory and devices so far)
  - Create each group in all mounts required to satisfy our necessary
controllers
  - Select the appropriate mount when setting a cont.key value

 I am not sure how libvirt provides thread safety, but I did not see any 
 explicit
 coding for that?
 
 The thread safety model for libvirt has two levels
 
  - A single virConnectPtr object must only be used by one thread. 
If you have multiple threads, you must provide each with its
own conenct object
 
  - Within a stateless driver (Xen, OpenVZ, Test), there is no shared
state between virConnectPtr objects, so there are no thread issues
in this respect
 
  - With a stateful driver, the libvirtd daemon ensures that only a
single thread is active at once, so against there are no thread
issues there either.
 
 Now, in a short while I will be making the daemon fully-multithreaded. When
 this happens, the stateful drivers will be required to maintain mutexes for
 locking. The locking model wil have 2 levels, one lock over the driver as
 a whole. This is held only while acquiring a lock against the object being
 modified (eg the virtual domain object).
 
 Each virtual domain, lives in one cgroup, so there is a single virCGroup
 object associated with each domain. the virCGroup object state is seflf
 contained, so independant virCGroup objects can be accessed concurrently
 from multiple threads, without any threads safety issues.

Thanks, that was quite insightful.

-- 
Balbir

--
Libvir-list mailing list
Libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] Re: [discuss] The new cgroup patches for libvirt

2008-10-03 Thread Balbir Singh
Daniel P. Berrange wrote:
 On Fri, Oct 03, 2008 at 09:31:52PM +0530, Balbir Singh wrote:
 Hi, Everyone,

 I've seen a new set of patches from Dan Smith, which implement cgroup support
 for libvirt. While the patches seem simple, there are some issues that have 
 been
 pointed out in the posting itself.

 I hope that libvirt will switch over (may be after your concerns are 
 addressed
 and definitely in the longer run) to using libcgroups rather than having an
 internal implementation of cgroups. The advantages of switching over would be
 using the functionality that libcgroup already provides

 libcgroups (libcg.sf.net) provides

 1. Ability to configure and mount cgroups and controllers via initscripts 
 and a
 configuration file
 2. An API to control and read cgroups information
 3. Thread safety around API calls
 4. Daemons to automatically classify a task based on a certain set of rules
 5. API to extract current cgroup classification (where is the task currently 
 in
 the cgroup hierarchy)
 
 So from a functional point of view you are addressing essentially three
 use cases
 
  1. System configuration for controllers
  2. Automatic task classification
  3. Application development API for creating groups
 
 If each piece is correctly designed, the choice of implementation for
 each of these can be, and in some cases must be, totally independant. 
 
 Since the kernel restricts that a single controller can only be attached
 to one cgroupsfs mount point, and one attach cannot be changed, the choice
 of how / where to mount controllers must remain outside the scope of
 applications. If any application using cgroups were to specify mount
 points, it would be inflicting its own requirements on every user of
 cgroups. This implies that applications must be designed to work with 
 whatever controller mount configuration the admin has configured, and
 not configure stuff themselves. So impl for point 1 (configuration) 
 must, by neccessity, be completely independant of impl for point 3 
 (application API).
 
 Considering automatic task classification. The task classification engine
 must be able to cope with the fact that applications have some functional
 requirements on cgroups setup. Taking libvirt as an example, we have a
 specific need to apply some controllers over a group of processes forming
 a container. A task classification engine must not re-clasify individual
 tasks within a container because that would conflict with the semantics
 required by libvirt. It is, however, free to re-classify the libvirtd 
 daemon itself - whatever cgroup libvirtd is placed in, it will create the
 LXC cgroups below this point.
 
 So if libvirt is designed correctly, it will work with whatever cgroup
 task classification engine that might be running. Similarly if the task
 classification engine has been designed to co-operate with applications
 there is no problem running it alonside libvirt. Thus the implementation
 of points 2 (task classification) and point 3 (application API) have no
 need to be formally tied together. Furthermore tieing them together does
 not magically solve the problem that both applications  the cgroups task
 classification engine need to be intelligently designed to co-operate.
 

Agreed!

 
 While re-implementing might sound like a cool thing to do, here are the 
 drawbacks

 1. It leads to code duplication and reduces code reuse
 
 This is important if the library code is providing significant value add to 
 the application using it. As it stands, libcgroup is merely a direct interface
 to the cgroups filesystem providing weakly typed setters  getters - with the
 exception of looking at the mount table to find where a controller lives, this
 is not hard / complex code, so the benefits of re-use are not particularly 
 high. 
 

Please see my earlier email on layering of API.

 In such a scenario reducing code duplication is not in itself a benefit, since
 there are costs associated with using external libraries. It is more 
 complicated
 integrate 2 independant style sof API, particularly with different views on 
 error reporting, memory management and varying expectations for the semantic
 models exposed.
 

I disagree, I see a lot of code that does the same thing, look through
/proc/mounts, read and parse values to write and read. I see two API's you've
built on top of what libcgroup has (one for setting memory limit and the other
for devices). Please compare the patch sizes as well and you'll see what I mean.

 There are a number of 'hard' questions wrt to cgroups usage by applications,
 two of which are outlined above. Simply having all applications use a single
 API cannot magically solve any of these problems - no matter what API is used
 application developers need to take care to design their usage of cgroups
 such that it 'plays nicely' with other applications.
 

Playing nicely is a definite requirement, but not using existing code or
contributing to it if something is broken and