Re: [libvirt] [PATCH] openvz: swap source bridge=... with target dev=...
Hi. I rewrite functions taken from vzctl in the new patch, so this: Here is the patch, that implements the following behaviour * interface name inside container is automatically generated and equals ethN, where N is the number of that interface within current domain * mac address of that interface inside container is generated automatically by function openvzGenerateMac * if target dev= is specified, use it; otherwise, use the default openvz name, i.e., vethN.M for interface ethM in container with veid N * if mac address='...' specified, use it; otherwise, vzctl will generate it automatically * target dev and mac address are (re)stored (from)to $veid.conf is true, while that: the functions openvzGenerateMac and openvzGenerateVethName are taken from vzctl sources and slightly changed then is not true anymore. +static char * +openvzGenerateMac(void) +{ +char mac[6] = { +0x52, +0x54, +0x00, How did you get 0x52, 0x54, 0x00? +if (net-type == VIR_DOMAIN_NET_TYPE_BRIDGE) { +static int vnetNo = 0; Libvirt may be use as library in applications. If some will call create 2 containers, then first container will have eth0...ethN second will have ethN+1... Other looks good. -- Libvir-list mailing list Libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] openvz: swap source bridge=... with target dev=...
2008/10/3 Evgeniy Sokolov [EMAIL PROTECTED] Hi. I rewrite functions taken from vzctl in the new patch, so this: Here is the patch, that implements the following behaviour * interface name inside container is automatically generated and equals ethN, where N is the number of that interface within current domain * mac address of that interface inside container is generated automatically by function openvzGenerateMac * if target dev= is specified, use it; otherwise, use the default openvz name, i.e., vethN.M for interface ethM in container with veid N * if mac address='...' specified, use it; otherwise, vzctl will generate it automatically * target dev and mac address are (re)stored (from)to $veid.conf is true, while that: the functions openvzGenerateMac and openvzGenerateVethName are taken from vzctl sources and slightly changed then is not true anymore. +static char * +openvzGenerateMac(void) +{ +char mac[6] = { +0x52, +0x54, +0x00, How did you get 0x52, 0x54, 0x00? This is the same as in virDomainNetRandomMAC() function, see the source code for it. +if (net-type == VIR_DOMAIN_NET_TYPE_BRIDGE) { +static int vnetNo = 0; Libvirt may be use as library in applications. If some will call create 2 containers, then first container will have eth0...ethN second will have ethN+1... OK, I will try to fix it. Other looks good. Actually, there is a bug :( When we generate host interface name, we need to save it in net-ifname. P.S. Now I am working on that: P.S. Are someone going to implement interface type='bridge' ... source bridge=... ... /interface part of openvz driver? :) I plan to implement it in a month. It will be fine if you are ready to develop the feature. I'm using $veid.conf to store information about bridge device in the following manner: for interface ifname, therhe will be a line #BRIGDE(ifname): bridge name for example, #BRIDGE(veth101.0): virbr1 Do you agree with that behaviour? -- Libvir-list mailing list Libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] openvz: swap source bridge=... with target dev=...
On Fri, Oct 03, 2008 at 04:04:06PM +0400, Evgeniy Sokolov wrote: P.S. Now I am working on that: P.S. Are someone going to implement interface type='bridge' ... source bridge=... ... /interface part of openvz driver? :) I plan to implement it in a month. It will be fine if you are ready to develop the feature. I'm using $veid.conf to store information about bridge device in the following manner: for interface ifname, therhe will be a line #BRIGDE(ifname): bridge name for example, #BRIDGE(veth101.0): virbr1 Do you agree with that behaviour? I think we can simplify format to simplify parsing. #BRIDGE=virbr1:veth101.0,veth101.1,veth101.2 There's no need to store veth101.0, veth101.2 - they are automatically generated not intended to be stable across restarts. It is merely neccessary to persist the name of the bridge (or virtual network) to which they are attached, and the MAC address. Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :| -- Libvir-list mailing list Libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] PATCH: 2 of 5: Remove virDomainObjPtr linked list
This patch removes the linked list in the virDomainObjPtr object, and adds a new virDomainObjList struct to track domains as an array. The QEMU, LXC, OpenVZ and Test drivers get updated to account for this API change. domain_conf.c | 99 domain_conf.h | 23 +++--- lxc_conf.h |2 lxc_driver.c| 112 +-- openvz_conf.c | 24 ++ openvz_conf.h |2 openvz_driver.c | 54 +++ qemu_conf.h |2 qemu_driver.c | 198 +--- test.c | 77 +++-- 10 files changed, 274 insertions(+), 319 deletions(-) Daniel diff -r 2bb12d98c26b src/domain_conf.c --- a/src/domain_conf.c Fri Oct 03 12:45:58 2008 +0100 +++ b/src/domain_conf.c Fri Oct 03 12:58:03 2008 +0100 @@ -166,44 +166,40 @@ } -virDomainObjPtr virDomainFindByID(const virDomainObjPtr doms, +virDomainObjPtr virDomainFindByID(const virDomainObjListPtr doms, int id) { -virDomainObjPtr dom = doms; -while (dom) { -if (virDomainIsActive(dom) dom-def-id == id) -return dom; -dom = dom-next; -} +unsigned int i; + +for (i = 0 ; i doms-count ; i++) +if (virDomainIsActive(doms-objs[i]) +doms-objs[i]-def-id == id) +return doms-objs[i]; return NULL; } -virDomainObjPtr virDomainFindByUUID(const virDomainObjPtr doms, +virDomainObjPtr virDomainFindByUUID(const virDomainObjListPtr doms, const unsigned char *uuid) { -virDomainObjPtr dom = doms; +unsigned int i; -while (dom) { -if (!memcmp(dom-def-uuid, uuid, VIR_UUID_BUFLEN)) -return dom; -dom = dom-next; -} +for (i = 0 ; i doms-count ; i++) +if (!memcmp(doms-objs[i]-def-uuid, uuid, VIR_UUID_BUFLEN)) +return doms-objs[i]; return NULL; } -virDomainObjPtr virDomainFindByName(const virDomainObjPtr doms, +virDomainObjPtr virDomainFindByName(const virDomainObjListPtr doms, const char *name) { -virDomainObjPtr dom = doms; +unsigned int i; -while (dom) { -if (STREQ(dom-def-name, name)) -return dom; -dom = dom-next; -} +for (i = 0 ; i doms-count ; i++) +if (STREQ(doms-objs[i]-def-name, name)) +return doms-objs[i]; return NULL; } @@ -425,13 +421,24 @@ VIR_FREE(dom); } +void virDomainObjListFree(virDomainObjListPtr vms) +{ +unsigned int i; + +for (i = 0 ; i vms-count ; i++) +virDomainObjFree(vms-objs[i]); + +VIR_FREE(vms-objs); +vms-count = 0; +} + virDomainObjPtr virDomainAssignDef(virConnectPtr conn, - virDomainObjPtr *doms, + virDomainObjListPtr doms, const virDomainDefPtr def) { virDomainObjPtr domain; -if ((domain = virDomainFindByName(*doms, def-name))) { +if ((domain = virDomainFindByName(doms, def-name))) { if (!virDomainIsActive(domain)) { virDomainDefFree(domain-def); domain-def = def; @@ -451,33 +458,41 @@ domain-state = VIR_DOMAIN_SHUTOFF; domain-def = def; -domain-next = *doms; -*doms = domain; +if (VIR_REALLOC_N(doms-objs, doms-count + 1) 0) { +virDomainReportError(conn, VIR_ERR_NO_MEMORY, NULL); +VIR_FREE(domain); +return NULL; +} + +doms-objs[doms-count] = domain; +doms-count++; return domain; } -void virDomainRemoveInactive(virDomainObjPtr *doms, +void virDomainRemoveInactive(virDomainObjListPtr doms, virDomainObjPtr dom) { -virDomainObjPtr prev = NULL; -virDomainObjPtr curr = *doms; +unsigned int i; -while (curr - curr != dom) { -prev = curr; -curr = curr-next; +for (i = 0 ; i doms-count ; i++) { +if (doms-objs[i] == dom) { +virDomainObjFree(doms-objs[i]); + +if (i (doms-count - 1)) +memmove(doms-objs + i, doms-objs + i + 1, +sizeof(*(doms-objs)) * (doms-count - (i + 1))); + +if (VIR_REALLOC_N(doms-objs, doms-count - 1) 0) { +; /* Failure to reduce memory allocation isn't fatal */ +} +doms-count--; + +break; +} } -if (curr) { -if (prev) -prev-next = curr-next; -else -*doms = curr-next; -} - -virDomainObjFree(dom); } #ifndef PROXY @@ -3266,7 +3281,7 @@ virDomainObjPtr virDomainLoadConfig(virConnectPtr conn, virCapsPtr caps, -virDomainObjPtr *doms, +virDomainObjListPtr doms, const char
Re: [libvirt] PATCH: 3 of 5: Remove virNetworkObjPtr linked list
This patch removes the linked list in the virNetworkObjPtr object, and adds a new virNetworkObjList struct to track domains as an array. The virtual network test drivers get updated to account for this API change. network_conf.c | 88 +++- network_conf.h | 20 + network_driver.c | 120 --- test.c | 75 -- 4 files changed, 145 insertions(+), 158 deletions(-) Daniel diff -r 25b5be9d400e src/network_conf.c --- a/src/network_conf.cFri Oct 03 12:58:03 2008 +0100 +++ b/src/network_conf.cFri Oct 03 12:58:27 2008 +0100 @@ -69,28 +69,26 @@ } -virNetworkObjPtr virNetworkFindByUUID(const virNetworkObjPtr nets, +virNetworkObjPtr virNetworkFindByUUID(const virNetworkObjListPtr nets, const unsigned char *uuid) { -virNetworkObjPtr net = nets; -while (net) { -if (!memcmp(net-def-uuid, uuid, VIR_UUID_BUFLEN)) -return net; -net = net-next; -} +unsigned int i; + +for (i = 0 ; i nets-count ; i++) +if (!memcmp(nets-objs[i]-def-uuid, uuid, VIR_UUID_BUFLEN)) +return nets-objs[i]; return NULL; } -virNetworkObjPtr virNetworkFindByName(const virNetworkObjPtr nets, +virNetworkObjPtr virNetworkFindByName(const virNetworkObjListPtr nets, const char *name) { -virNetworkObjPtr net = nets; -while (net) { -if (STREQ(net-def-name, name)) -return net; -net = net-next; -} +unsigned int i; + +for (i = 0 ; i nets-count ; i++) +if (STREQ(nets-objs[i]-def-name, name)) +return nets-objs[i]; return NULL; } @@ -141,13 +139,24 @@ VIR_FREE(net); } +void virNetworkObjListFree(virNetworkObjListPtr nets) +{ +unsigned int i; + +for (i = 0 ; i nets-count ; i++) +virNetworkObjFree(nets-objs[i]); + +VIR_FREE(nets-objs); +nets-count = 0; +} + virNetworkObjPtr virNetworkAssignDef(virConnectPtr conn, - virNetworkObjPtr *nets, + virNetworkObjListPtr nets, const virNetworkDefPtr def) { virNetworkObjPtr network; -if ((network = virNetworkFindByName(*nets, def-name))) { +if ((network = virNetworkFindByName(nets, def-name))) { if (!virNetworkIsActive(network)) { virNetworkDefFree(network-def); network-def = def; @@ -166,34 +175,41 @@ } network-def = def; -network-next = *nets; -*nets = network; +if (VIR_REALLOC_N(nets-objs, nets-count + 1) 0) { +virNetworkReportError(conn, VIR_ERR_NO_MEMORY, NULL); +VIR_FREE(network); +return NULL; +} + +nets-objs[nets-count] = network; +nets-count++; return network; } -void virNetworkRemoveInactive(virNetworkObjPtr *nets, +void virNetworkRemoveInactive(virNetworkObjListPtr nets, const virNetworkObjPtr net) { -virNetworkObjPtr prev = NULL; -virNetworkObjPtr curr = *nets; +unsigned int i; -while (curr - curr != net) { -prev = curr; -curr = curr-next; +for (i = 0 ; i nets-count ; i++) { +if (nets-objs[i] == net) { +virNetworkObjFree(nets-objs[i]); + +if (i (nets-count - 1)) +memmove(nets-objs + i, nets-objs + i + 1, +sizeof(*(nets-objs)) * (nets-count - (i + 1))); + +if (VIR_REALLOC_N(nets-objs, nets-count - 1) 0) { +; /* Failure to reduce memory allocation isn't fatal */ +} +nets-count--; + +break; +} } - -if (curr) { -if (prev) -prev-next = curr-next; -else -*nets = curr-next; -} - -virNetworkObjFree(net); } @@ -699,7 +715,7 @@ } virNetworkObjPtr virNetworkLoadConfig(virConnectPtr conn, - virNetworkObjPtr *nets, + virNetworkObjListPtr nets, const char *configDir, const char *autostartDir, const char *file) @@ -753,7 +769,7 @@ } int virNetworkLoadAllConfigs(virConnectPtr conn, - virNetworkObjPtr *nets, + virNetworkObjListPtr nets, const char *configDir, const char *autostartDir) { diff -r 25b5be9d400e src/network_conf.h --- a/src/network_conf.hFri Oct 03 12:58:03 2008 +0100 +++ b/src/network_conf.hFri Oct 03 12:58:27 2008 +0100 @@ -88,8 +88,13 @@ virNetworkDefPtr def; /* The current definition */ virNetworkDefPtr newDef; /* New definition
Re: [libvirt] PATCH: 4 of 5: Remove virStoragePoolObjPtr linked list
This patch removes the linked list in the virStoragePoolObjPtr and and virStorageVolDefPtr objects, and adds new virStoragePoolObjList and virStorageVolDefList structs to track pools vols as arrays. The storage driver driver gets updated to account for this API change. storage_backend_disk.c| 10 + storage_backend_fs.c | 75 + storage_backend_iscsi.c |9 + storage_backend_logical.c | 11 +- storage_conf.c| 165 -- storage_conf.h| 37 -- storage_driver.c | 253 ++ 7 files changed, 290 insertions(+), 270 deletions(-) Daniel diff -r 03f1715be403 src/storage_backend_disk.c --- a/src/storage_backend_disk.cFri Oct 03 12:58:28 2008 +0100 +++ b/src/storage_backend_disk.cFri Oct 03 13:07:54 2008 +0100 @@ -185,9 +185,13 @@ return -1; } -vol-next = pool-volumes; -pool-volumes = vol; -pool-nvolumes++; +if (VIR_REALLOC_N(pool-volumes.objs, + pool-volumes.count+1) 0) { +virStorageReportError(conn, VIR_ERR_NO_MEMORY, _(volume)); +virStorageVolDefFree(vol); +return -1; +} +pool-volumes.objs[pool-volumes.count++] = vol; /* Prepended path will be same for all partitions, so we can * strip the path to form a reasonable pool-unique name diff -r 03f1715be403 src/storage_backend_fs.c --- a/src/storage_backend_fs.c Fri Oct 03 12:58:28 2008 +0100 +++ b/src/storage_backend_fs.c Fri Oct 03 13:07:54 2008 +0100 @@ -822,6 +822,7 @@ DIR *dir; struct dirent *ent; struct statvfs sb; +virStorageVolDefPtr vol = NULL; if (!(dir = opendir(pool-def-target.path))) { virStorageReportError(conn, VIR_ERR_INTERNAL_ERROR, @@ -831,61 +832,42 @@ } while ((ent = readdir(dir)) != NULL) { -virStorageVolDefPtr vol; int ret; -if (VIR_ALLOC(vol) 0) { -virStorageReportError(conn, VIR_ERR_NO_MEMORY, - %s, _(volume)); -goto cleanup; -} +if (VIR_ALLOC(vol) 0) +goto no_memory; -vol-name = strdup(ent-d_name); -if (vol-name == NULL) { -VIR_FREE(vol); -virStorageReportError(conn, VIR_ERR_NO_MEMORY, - %s, _(volume name)); -goto cleanup; -} +if ((vol-name = strdup(ent-d_name)) == NULL) +goto no_memory; vol-target.format = VIR_STORAGE_VOL_RAW; /* Real value is filled in during probe */ if (VIR_ALLOC_N(vol-target.path, strlen(pool-def-target.path) + -1 + strlen(vol-name) + 1) 0) { -VIR_FREE(vol-target.path); -VIR_FREE(vol); -virStorageReportError(conn, VIR_ERR_NO_MEMORY, - %s, _(volume name)); -goto cleanup; -} +1 + strlen(vol-name) + 1) 0) +goto no_memory; + strcpy(vol-target.path, pool-def-target.path); strcat(vol-target.path, /); strcat(vol-target.path, vol-name); -if ((vol-key = strdup(vol-target.path)) == NULL) { -VIR_FREE(vol-name); -VIR_FREE(vol-target.path); -VIR_FREE(vol); -virStorageReportError(conn, VIR_ERR_NO_MEMORY, - %s, _(volume key)); -goto cleanup; +if ((vol-key = strdup(vol-target.path)) == NULL) +goto no_memory; + +if ((ret = virStorageBackendProbeFile(conn, vol) 0)) { +if (ret == -1) +goto no_memory; +else { +/* Silently ignore non-regular files, + * eg '.' '..', 'lost+found' */ +virStorageVolDefFree(vol); +vol = NULL; +continue; +} } -if ((ret = virStorageBackendProbeFile(conn, vol) 0)) { -VIR_FREE(vol-key); -VIR_FREE(vol-name); -VIR_FREE(vol-target.path); -VIR_FREE(vol); -if (ret == -1) -goto cleanup; -else -/* Silently ignore non-regular files, - * eg '.' '..', 'lost+found' */ -continue; -} - -vol-next = pool-volumes; -pool-volumes = vol; -pool-nvolumes++; -continue; +if (VIR_REALLOC_N(pool-volumes.objs, + pool-volumes.count+1) 0) +goto no_memory; +pool-volumes.objs[pool-volumes.count++] = vol; +vol = NULL; } closedir(dir); @@ -904,8 +886,13 @@ return 0; +no_memory: +virStorageReportError(conn, VIR_ERR_NO_MEMORY, NULL); +/* fallthrough */ + cleanup: closedir(dir); +virStorageVolDefFree(vol);
Re: [libvirt] [PATCH] openvz: swap source bridge=... with target dev=...
P.S. Now I am working on that: P.S. Are someone going to implement interface type='bridge' ... source bridge=... ... /interface part of openvz driver? :) I plan to implement it in a month. It will be fine if you are ready to develop the feature. I'm using $veid.conf to store information about bridge device in the following manner: for interface ifname, therhe will be a line #BRIGDE(ifname): bridge name for example, #BRIDGE(veth101.0): virbr1 Do you agree with that behaviour? I think we can simplify format to simplify parsing. #BRIDGE=virbr1:veth101.0,veth101.1,veth101.2 In this case we may use openvzReadConfigParam(101, #BRIDGE, value, 1024) -- Libvir-list mailing list Libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] libvirt-qpid
On Thu, Oct 02, 2008 at 05:33:54PM +0200, Daniel Veillard wrote: On Thu, Oct 02, 2008 at 08:16:02AM -0700, Ian Main wrote: On Thu, 2 Oct 2008 16:50:08 +0200 Daniel Veillard [EMAIL PROTECTED] wrote: On Thu, Oct 02, 2008 at 03:40:06PM +0100, Daniel P. Berrange wrote: On Thu, Oct 02, 2008 at 03:06:01PM +0200, Daniel Veillard wrote: You also need 'yum install qpidd' I suspect this indicates a missing dependancy maybe in the libvirt-qpid package but I'm not 100% sure Yeah, i believe that should be a requirement. NB, the versions of qpidc and qpidd in Fedora currently are too old for libvirt-qpid. I've been speaking with qpid maintainer and they'll push a new build to Fedora in the very near future. Ah, that probably explains why that didn't work for me, I got my qpidd from Fedora ... Another question, upon rebout the QPid service starts automatically, I wonder if there is something to do to be able to connect assuming a default install and not starting without auth . Ah, I know what happened.. I changed the dependencies since my first email. qpidd isn't needed for libvirt-qpid to operate. It's just an agent and can talk to a qpidd on a different host. okay, I was guessing something like that which is why i was so cautious in my wording ;-) Okay I now have things working ... on one machine. Which is interesting but not really the point of the exercise :-) I have 3 machines. qpidd runs on machine A, qpid-tool on that machine allows to access the local node and local domains. libvirt-qpid started on machine B and C in the same subnet without error, but well they don't find the qpidd on machine A since I don't see them there. qpid-tool there tries to get to localhost and fail. If then I install and start qpidd on machine B qpid-tool can connect to it ... but the already started libvirt-qpid don't seems to be able to find it, unless I restart it which seems to indicate a failure to connect after startup. And I'm still only able to list local domains/nodes never remote ones. Ideas ? Daniel -- Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ [EMAIL PROTECTED] | Rpmfind RPM search engine http://rpmfind.net/ http://veillard.com/ | virtualization library http://libvirt.org/ -- Libvir-list mailing list Libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] openvz: swap source bridge=... with target dev=...
2008/10/3 Daniel P. Berrange [EMAIL PROTECTED] On Fri, Oct 03, 2008 at 04:04:06PM +0400, Evgeniy Sokolov wrote: P.S. Now I am working on that: P.S. Are someone going to implement interface type='bridge' ... source bridge=... ... /interface part of openvz driver? :) I plan to implement it in a month. It will be fine if you are ready to develop the feature. I'm using $veid.conf to store information about bridge device in the following manner: for interface ifname, therhe will be a line #BRIGDE(ifname): bridge name for example, #BRIDGE(veth101.0): virbr1 Do you agree with that behaviour? I think we can simplify format to simplify parsing. #BRIDGE=virbr1:veth101.0,veth101.1,veth101.2 I do not undesrtand how it will simplify parsing: the iterator in parsing is an interface name, not bridge name. I attached a patch, so you will see how I do think about it :) (this patch includes all discussed changes) There's no need to store veth101.0, veth101.2 - they are automatically generated not intended to be stable across restarts. It is merely No, there is. They are generated only once --- when VE container is created. After that they are stored in the config file. neccessary to persist the name of the bridge (or virtual network) to which they are attached, and the MAC address. Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/:| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org:| |: http://autobuild.org -o- http://search.cpan.org/~danberr/http://search.cpan.org/%7Edanberr/:| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :| libvirt_openvz_ifaces_and_bridges.patch Description: Binary data -- Libvir-list mailing list Libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] libvirt-qpid
On Fri, Oct 03, 2008 at 03:29:17PM +0200, Daniel Veillard wrote: On Thu, Oct 02, 2008 at 05:33:54PM +0200, Daniel Veillard wrote: On Thu, Oct 02, 2008 at 08:16:02AM -0700, Ian Main wrote: On Thu, 2 Oct 2008 16:50:08 +0200 Daniel Veillard [EMAIL PROTECTED] wrote: On Thu, Oct 02, 2008 at 03:40:06PM +0100, Daniel P. Berrange wrote: On Thu, Oct 02, 2008 at 03:06:01PM +0200, Daniel Veillard wrote: You also need 'yum install qpidd' I suspect this indicates a missing dependancy maybe in the libvirt-qpid package but I'm not 100% sure Yeah, i believe that should be a requirement. NB, the versions of qpidc and qpidd in Fedora currently are too old for libvirt-qpid. I've been speaking with qpid maintainer and they'll push a new build to Fedora in the very near future. Ah, that probably explains why that didn't work for me, I got my qpidd from Fedora ... Another question, upon rebout the QPid service starts automatically, I wonder if there is something to do to be able to connect assuming a default install and not starting without auth . Ah, I know what happened.. I changed the dependencies since my first email. qpidd isn't needed for libvirt-qpid to operate. It's just an agent and can talk to a qpidd on a different host. okay, I was guessing something like that which is why i was so cautious in my wording ;-) Okay I now have things working ... on one machine. Which is interesting but not really the point of the exercise :-) I have 3 machines. qpidd runs on machine A, qpid-tool on that machine allows to access the local node and local domains. libvirt-qpid started on machine B and C in the same subnet without error, but well they don't find the qpidd on machine A since I don't see them there. qpid-tool there tries to get to localhost and fail. If then I install and start qpidd on machine B qpid-tool can connect to it ... but the already started libvirt-qpid don't seems to be able to find it, unless I restart it which seems to indicate a failure to connect after startup. And I'm still only able to list local domains/nodes never remote ones. Ideas ? I got this working across 3 machines as follows - Machine A provides a Qpid broker, run as root with qpidd --auth no - Machine B and C are libvirt hosts, each run a libvirtd, and libvirt-qpid libvirt-qpid --broker machineA.example.com On machine A, if you run 'qpid-tool' you should now see node, domain objects from both machines B C. NB, it take 5-10 seconds from starting libvirt-qpid before they appear in the broker Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :| -- Libvir-list mailing list Libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] anyone implementing host device enumeration?
On Tue, 2008-09-30 at 19:31 +0200, Daniel Veillard wrote: Good to see that you seems to have the python bindings ready too ! Many python bindings are not really ready (in this patch). But I should have them in the next patch (Monday/Tuesday next week). I have complete java bindings for the current API, and will submit them once we agree upon it. I wonder how many of them should be future-proofed by adding them a final flags argument too ... For example it may be useful to only lookup devices which are local to the machine, or the opposite only remote devices. We don't have to specify now flags values, but having the APIs ready is sufficient. The virNodeNum/virNodeListDevices devices can probably all share the same flags definitions when needed. I agree. I'll add a flags arg to virNodeNum*/virNodeList*. The LookupByName/LookupByKey may be able to use the same set. I wonder a bit about the key argument though, I assume it's something actually defined by the lower APIs (hal/devkit). As long as the lookup keys are unique, I don't see why we'd want a flags arg for these functions. Currently the key is implementation- (HAL- or Devkit-) dependent, but I have questions about that (and for that matter, the name arg -- if it's unique, why have a separate key??). Dan B brings up some similar points, so I'll address this in my response to his post rather than here. For virNodeDeviceCreate maybe a flags may be needed too, though the flexibility of the API is provided by the XML. I think the XML should provide sufficient flexibility here. But let me know if you really want me to add a flag arg. +int virNodeDeviceDestroy(virNodeDevicePtr dev); + +int virNodeDeviceFree (virNodeDevicePtr dev); Maybe Destroy need flags too, for example to force (or not) destruction of devices which may be in use. Ok, I'll add a flags arg to Destroy as well. FWIW, I'm not wild about the names virNodeDeviceCreate/Destroy. Don't Attach and Detach make more sense? If there are no objections, I'll change those in my next iteration (though I'll still leave them unimplemented). @@ -332,6 +340,12 @@ skip_function = ( 'virCopyLastError', # Python API is called virGetLastError instead 'virConnectOpenAuth', # Python C code is manually written 'virDefaultErrorFunc', # Python virErrorFuncHandler impl calls this from C +'virNodeDeviceGetKey', +'virNodeDeviceGetName', +'virNodeDeviceGetParentKey', +'virNodeDeviceGetBus', +'virNodeDeviceNumOfCaps', +'virNodeDeviceListCaps', ) Strange how are the accessors supposed to work from python then ? They don't. They're just here so you don't get errors building the Python bindings. I'll implement real bindings for these soon. Hum, the libvirt_lxc really need to link to the device discovery libs ? I was making host device enumeration work for ALL hypervisors (though of course the remote driver has a remote impl), so I think this is necessary. But that said, I'm still digesting Dan B's point about licensing issues, and I really have no clue about how lxc and openvz work with libvirt (do they have separate control daemons, like Xen, or are they more like QEMU, or ???). virLibConnError(virConnectPtr conn, virErrorNumber error, const char *info) virLibConnWarning(virConnectPtr conn, virErrorNumber error, const char *info) you really need to export them ? Oh, uhhh, apparently not :-) (I did at one point, but must have removed that code.) I'll un-export them ... Hum ... we need the comment on the accessors, so they get extracted as part of the API when doing 'make rebuild' in docs/ added to the resulting XML, which then will provide the python accessors automatically I think. Will do. Thanks for the response. I'll implement the things discussed here (plus in Dan B's comments - another response coming) and submit a new patch on Monday or early Tuesday next week. Dave -- Libvir-list mailing list Libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH 2 of 2] Use cgroup functions to set resource limits on LXC domains
On Wed, Oct 01, 2008 at 01:19:04PM -0700, Dan Smith wrote: This patch adds code to the controller to set up a cgroup named after the domain name, set the memory limit, and restrict devices. It also adds bits to lxc_driver to properly clean up the cgroup on domain death. If virCgroupHaveSupport() says that no support is available, then we just allow the domain creation to proceed as it did before without resource controls in place. Okay in addition to Dan feedback on the error handling, +/** + * lxcSetContainerResources + * @def: pointer to virtual machine structure + * + * Creates a cgroup for the container, moves the task inside, + * and sets resource limits + * + * Returns 0 on success or -1 in case of error + */ +static int lxcSetContainerResources(virDomainDefPtr def) +{ +virCgroupPtr cgroup; +int rc = -1; +int i; +struct cgroup_device_policy devices[] = { +{'c', 1, 3}, +{'c', 1, 5}, +{'c', 1, 7}, +{'c', 1, 8}, +{'c', 1, 9}, +{'c', 5, 1}, +{0, 0, 0}}; creating a domain is always the trickiest part of domain lifecycle and setting up the framework is often something a bit obfuscated, nearly by definition. But if we could associate meaningful constants to the device policies here, this would help people having to improve or debug that code in the future ;-) otherwise looks fine to me, +1 Daniel -- Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ [EMAIL PROTECTED] | Rpmfind RPM search engine http://rpmfind.net/ http://veillard.com/ | virtualization library http://libvirt.org/ -- Libvir-list mailing list Libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH 2 of 2] Use cgroup functions to set resource limits on LXC domains
DV creating a domain is always the trickiest part of domain lifecycle DV and setting up the framework is often something a bit obfuscated, DV nearly by definition. But if we could associate meaningful DV constants to the device policies here, this would help people DV having to improve or debug that code in the future ;-) Okay, I suppose that if the QEMU driver ends up doing something similar, the constants would be reused. Should they go in cgroup.h since they're expected to be used with that interface? -- Dan Smith IBM Linux Technology Center Open Hypervisor Team email: [EMAIL PROTECTED] pgpQlL0rquJWP.pgp Description: PGP signature -- Libvir-list mailing list Libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] Fix possible segfault if starting qemu vm with an inactive virtual network
On Thu, Oct 02, 2008 at 05:16:43PM -0400, Cole Robinson wrote: The attached patch fixes a possible segfault if trying to start a domain which tries to use an inactive virtual network. We should explicitly catch that error and report it, but that's a separate patch. ACK Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :| -- Libvir-list mailing list Libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] PATCH: Support SDL configuration for QEMU driver
On Thu, Oct 02, 2008 at 06:09:21PM +0100, Daniel P. Berrange wrote: QEMU has two modes of providing a graphical display, VNC and SDL. Now most of our tools just use VNC, but occasionally people want to use SDL for some crazy reason. We already support this in Xen driver, but the QEMU impl has been rather lacking. At the moment if you ask for a SDL display it'll only happen to work if you had the $DISPLAY environment variable set when you started libvirtd - you probably don't. Hum, it would be really nicer if everything was driven only by command line options, but i can understand why the QEmu guys made it that way, at least historically The generic XML parser allows for two attributes on the graohics element for setting the display and xauth filename, so this patch updates the QEMU driver to use this data if available. This means we have to start setting environment variables when invoking QEMU, so this patch is a little larger than would otherwise be expected. Okay i think I understand the patch, i didn't find anything suspicious in the C code diff (except missing -p ?) The tests diff looks simple and systematic, and for the test program this looks fine too. +1 Now previously since we just use 'execv' the QEMU process would just inherit all libvirtd's environment variables. When we now use execve() no variables are inherited - we have to explicitly set all the ones we need. I'm not sure what we should consider the mimimum required? I'm merely setting 'LC_ALL=C' to ensure it runs in C locale. Do we need to set $PATH for QEMU - maybe ? Anything else which is good practice to set ? if there is an LD_PRELOAD, pass it maybe, that can be really useful at times especially when debugging. Daniel -- Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ [EMAIL PROTECTED] | Rpmfind RPM search engine http://rpmfind.net/ http://veillard.com/ | virtualization library http://libvirt.org/ -- Libvir-list mailing list Libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] Unify *ReportError logic
On Thu, Oct 02, 2008 at 04:12:10PM -0400, Cole Robinson wrote: Daniel P. Berrange wrote: On Thu, Oct 02, 2008 at 03:41:21PM -0400, Cole Robinson wrote: Currently, most src/* files have their own ReportError function. Some support printf style arguments, others only allow reporting a single string message. The code for all of them does virtually the same thing, possibly passing a different constant off to another function. The attached patch adds a function to virterror.c which encapsulates the common ReportError logic. I used this to replace qemudReportError with a macro, which also allows passing off filename and line number info if we wanted to do something with it later. I did just the one function conversion to see what people think: if I'm missing something, or ideas for anything else to add. Seems to work as expected in my testing. Basically a good idea - we've discussed doing exactly this in the past. You can do further though, and kill off the 'dom' and 'net' parameters here too. Those are deprecated and should always be left NULL these days. Sounds good. Actually, lets go one step further than this. Just change the signature of virRaiseError itself, rather than creating a virRaiseErrorHelper. No caller in libvirt uses all these parameter it has, so might as well just remove them and have it set those fields to NULL/0 as directly. Likewise passing the filename/linenr is not much use - if a particular error message wants to include file/line number then they can be includes as args to the printf fmt string. Hmm, I kind of like the idea of having this info by default, rather than offloading it to the error message. We could for example append filename:lineno to all error messages if debugging is enabled, or stick the data as extra string or int info to RaiseError. I know there have been times when, using virt-manager/ virtinst, libvirt prints some generic lookup error msg, yet nothing seems to fail. It would be nice to get a definitive line number of the culprit just by using LIBVIRT_DEBUG. That's an interesting idea - I can't remember offhand though whether the __FILE__ expands to the full file path, or just the final filename without directory separator. I'd only want this compiled in by default if it is the latter - we don't want another 500 K of long strings from full directory paths in every build. Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :| -- Libvir-list mailing list Libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] Centralize use of DEBUG macros
On Thu, Oct 02, 2008 at 03:40:17PM -0400, Cole Robinson wrote: Currently the DEBUG and DEBUG0 macros are duplicated in every file that uses them. This patch moves the macros to internal.h, removes the needless duplication, and now every file gets them for free. Seems to work as expected in my testing. Yes, if someone can confirm one question diff --git a/src/internal.h b/src/internal.h index d96504d..a3d48fa 100644 --- a/src/internal.h +++ b/src/internal.h @@ -85,6 +85,9 @@ extern int debugFlag; do { } while (0) #endif /* !ENABLE_DEBUG */ +#define DEBUG(fmt,...) VIR_DEBUG(__FILE__, fmt, __VA_ARGS__) +#define DEBUG0(msg) VIR_DEBUG(__FILE__, %s, msg) + Will __FILE__ expand to the name of the file where the DEBUG macro is defined - ie internal.h, or will it expand to the name of the file where DEBUG() is called. Obviously the latter is what we need. Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :| -- Libvir-list mailing list Libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH 2 of 2] Use cgroup functions to set resource limits on LXC domains
On Fri, Oct 03, 2008 at 07:45:24AM -0700, Dan Smith wrote: DV creating a domain is always the trickiest part of domain lifecycle DV and setting up the framework is often something a bit obfuscated, DV nearly by definition. But if we could associate meaningful DV constants to the device policies here, this would help people DV having to improve or debug that code in the future ;-) Okay, I suppose that if the QEMU driver ends up doing something similar, the constants would be reused. Should they go in cgroup.h since they're expected to be used with that interface? Fine by me, but I'm more looking to being able to map semantic to the value than sharing the values at this point, Daniel -- Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ [EMAIL PROTECTED] | Rpmfind RPM search engine http://rpmfind.net/ http://veillard.com/ | virtualization library http://libvirt.org/ -- Libvir-list mailing list Libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] Centralize use of DEBUG macros
Daniel P. Berrange wrote: On Thu, Oct 02, 2008 at 03:40:17PM -0400, Cole Robinson wrote: Currently the DEBUG and DEBUG0 macros are duplicated in every file that uses them. This patch moves the macros to internal.h, removes the needless duplication, and now every file gets them for free. Seems to work as expected in my testing. Yes, if someone can confirm one question diff --git a/src/internal.h b/src/internal.h index d96504d..a3d48fa 100644 --- a/src/internal.h +++ b/src/internal.h @@ -85,6 +85,9 @@ extern int debugFlag; do { } while (0) #endif /* !ENABLE_DEBUG */ +#define DEBUG(fmt,...) VIR_DEBUG(__FILE__, fmt, __VA_ARGS__) +#define DEBUG0(msg) VIR_DEBUG(__FILE__, %s, msg) + Will __FILE__ expand to the name of the file where the DEBUG macro is defined - ie internal.h, or will it expand to the name of the file where DEBUG() is called. Obviously the latter is what we need. Daniel I was worried about that as well, but testing the patch I posted confirmed that __FILE__ expands to where the macro is called from, so no change in existing behavior. Thanks, Cole -- Libvir-list mailing list Libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] anyone implementing host device enumeration?
On Tue, 2008-09-30 at 19:02 +0100, Daniel P. Berrange wrote: On Tue, Sep 30, 2008 at 12:17:27AM -0400, David Lively wrote: - Split stateful drivers out of libvirt.so, so they're only used by the daemon, and not apps linking to libvirt.so. This solves the licensing problem that HAL/DeviceKit introduce libvirt.so needs to be LGPL to allow closed-source apps to link to it. HAL/DeviceKit are both GPL or AFL licensed, by virtue of using DBus. Since LGPL is not AFL compatible, if we link to HAL/DeviceKit/DBus we do so under the GPL, and thus would prevent closed source apps using libvirt.so We don't want this, so we ned to make sure only the libvirtd daemon links to the GPL bits. Ok. I see. Will do. * figure out how devkit and HAL correlate, so we report device info consistently This is looking like it'll be much harder than I had anticipated. Yeah, I struggled for a while trying to reconcile them. But devkit's not even close to complete yet, so I gave up. I'm assuming that, since there are a lot of HAL - DevKit conversions to be done, some kind of (probably domain-specific) method(s) of correlating the two will eventually be described. So I'm thinking perhaps we need to simplify our modelling a little so its not so closely replicating HAL, getting rid of the distinct elements for 'class' and 'bus' and having a device simply have a 'subsystem'. And instead of having a complete tree hierarchy, have a specialized hierarchy. eg if we can identify a 'usb' or 'pci' device parent for a device, then include its name, but don't claim to provide a full hierarchy. Yes! I find the distinction between bus and capability to be pretty arbitrary anyway (being on a bus is a type of capability, IMO). So I'll drop the bus stuff, and combine bus capabilities with the other capabilities. The other interesting question, is what policy we should define for compatability. Do we absolutely need to have compatible keys names for devices, whether using HAL vs DeviceKit, or can be just say that the format of a name is opaque and liable to change ? This has upgrade implications, for example, if we ship libvirt with a HAL backend in Fedora 10, and then switched it to the DeviceKit backend in Fedora 11, do we need to ensure consistent naming across the upgrade path. I don't know... Right. I was hoping to find a consistent set of keys/names to use, but gave up. (sysfs-paths seemed the most common denominator, but not all HAL devices have sysfs-paths, and I think the HAL names look more appropriate anyway.) I'm hoping that devkit will eventually add a property that either specifies or is convertible to a HAL name. Also, if names are really unique (as specified), then why have separate keys? I'd prefer dropping the Key functions and using Name as the key (as we do for storage pools, etc.). +struct _stringv { +int len; +char **const base; +const int maxlen; +}; + +typedef struct _stringv stringv; Perhaps I'm mis-undersatnding what this does, but isn't this similar to the virStringList class in internal.h ? Well, it's a vector instead of a singly-linked list. I use it when I know the max length ahead of time (while the list is used in pool source discovery code that knows of no limit on the number of sources it will find). I could use a list for the former case, but that would require an extra copy of each name (plus unnecessary alloc/dealloc code for the list). +static char *nodeDeviceDumpXML(virNodeDevicePtr dev, + unsigned int flags ATTRIBUTE_UNUSED) +{ +xmlBufferPtr buf = xmlBufferCreate(); ... +xmlBufferCCat(buf, /key\n); This should use the libvirt buffer routines from buf.h which ensure you don't ignore return values indicating OOM like you have here :-) See buf.h, or HACKING file for some examples. Note I call xmlNodeDump(buf, ...). I suppose the right thing to do is to add virBufNodeDump(...) to use libxml to dump into a virBuf ??? +/* TODO: Can we avoid this copy? */ +xml = strdup((const char *)xmlBufferContent(buf)); Yes, you can with the libvirt buf.h APIs :-) But if I have to have to call xmlNodeDump, I can't avoid it ... Thanks for the comments! Dave -- Libvir-list mailing list Libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] Centralize use of DEBUG macros
On Fri, Oct 03, 2008 at 03:53:25PM +0100, Daniel P. Berrange wrote: On Thu, Oct 02, 2008 at 03:40:17PM -0400, Cole Robinson wrote: Currently the DEBUG and DEBUG0 macros are duplicated in every file that uses them. This patch moves the macros to internal.h, removes the needless duplication, and now every file gets them for free. Seems to work as expected in my testing. Yes, if someone can confirm one question diff --git a/src/internal.h b/src/internal.h index d96504d..a3d48fa 100644 --- a/src/internal.h +++ b/src/internal.h @@ -85,6 +85,9 @@ extern int debugFlag; do { } while (0) #endif /* !ENABLE_DEBUG */ +#define DEBUG(fmt,...) VIR_DEBUG(__FILE__, fmt, __VA_ARGS__) +#define DEBUG0(msg) VIR_DEBUG(__FILE__, %s, msg) + Will __FILE__ expand to the name of the file where the DEBUG macro is defined - ie internal.h, or will it expand to the name of the file where DEBUG() is called. Obviously the latter is what we need. I do that in libxml2 debugging and I think that works as expected, I tried to check with gcc -E but failed :-\ Daniel -- Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ [EMAIL PROTECTED] | Rpmfind RPM search engine http://rpmfind.net/ http://veillard.com/ | virtualization library http://libvirt.org/ -- Libvir-list mailing list Libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] Slim xmllint test output
On Thu, Oct 02, 2008 at 03:40:55PM -0400, Cole Robinson wrote: This patch simply slims down the output from the xmllint tests to not overrun a single line, only printing the relevant information about the xml file (parent directory and name, not fully qualified path). Purely cosmetic, but more useful IMO. makes sense to me. Daniel -- Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ [EMAIL PROTECTED] | Rpmfind RPM search engine http://rpmfind.net/ http://veillard.com/ | virtualization library http://libvirt.org/ -- Libvir-list mailing list Libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] anyone implementing host device enumeration?
On Fri, Oct 03, 2008 at 11:02:55AM -0400, David Lively wrote: On Tue, 2008-09-30 at 19:02 +0100, Daniel P. Berrange wrote: On Tue, Sep 30, 2008 at 12:17:27AM -0400, David Lively wrote: - Split stateful drivers out of libvirt.so, so they're only used by the daemon, and not apps linking to libvirt.so. This solves the licensing problem that HAL/DeviceKit introduce libvirt.so needs to be LGPL to allow closed-source apps to link to it. HAL/DeviceKit are both GPL or AFL licensed, by virtue of using DBus. Since LGPL is not AFL compatible, if we link to HAL/DeviceKit/DBus we do so under the GPL, and thus would prevent closed source apps using libvirt.so We don't want this, so we ned to make sure only the libvirtd daemon links to the GPL bits. Ok. I see. Will do. No, no don't do that ! I've already got this work underway, since it impacts much more than just the host device drivers. Just carry on with your existing way, and when its ready to merge it'll be easy to tweak the makefile to have it build in a suitable way. So I'm thinking perhaps we need to simplify our modelling a little so its not so closely replicating HAL, getting rid of the distinct elements for 'class' and 'bus' and having a device simply have a 'subsystem'. And instead of having a complete tree hierarchy, have a specialized hierarchy. eg if we can identify a 'usb' or 'pci' device parent for a device, then include its name, but don't claim to provide a full hierarchy. Yes! I find the distinction between bus and capability to be pretty arbitrary anyway (being on a bus is a type of capability, IMO). So I'll drop the bus stuff, and combine bus capabilities with the other capabilities. The other interesting question, is what policy we should define for compatability. Do we absolutely need to have compatible keys names for devices, whether using HAL vs DeviceKit, or can be just say that the format of a name is opaque and liable to change ? This has upgrade implications, for example, if we ship libvirt with a HAL backend in Fedora 10, and then switched it to the DeviceKit backend in Fedora 11, do we need to ensure consistent naming across the upgrade path. I don't know... Right. I was hoping to find a consistent set of keys/names to use, but gave up. (sysfs-paths seemed the most common denominator, but not all HAL devices have sysfs-paths, and I think the HAL names look more appropriate anyway.) I'm hoping that devkit will eventually add a property that either specifies or is convertible to a HAL name. Also, if names are really unique (as specified), then why have separate keys? I'd prefer dropping the Key functions and using Name as the key (as we do for storage pools, etc.). The ultimate goal with the storage was that 'key' was unique consistent for the same storage on every host the storage was attached to, while 'name' could vary across machines. This distinction doesn't entirely apply to host devices, so perhaps droping key is OK - we could always re-add it later if we found a compelling need. +static char *nodeDeviceDumpXML(virNodeDevicePtr dev, + unsigned int flags ATTRIBUTE_UNUSED) +{ +xmlBufferPtr buf = xmlBufferCreate(); ... +xmlBufferCCat(buf, /key\n); This should use the libvirt buffer routines from buf.h which ensure you don't ignore return values indicating OOM like you have here :-) See buf.h, or HACKING file for some examples. Note I call xmlNodeDump(buf, ...). I suppose the right thing to do is to add virBufNodeDump(...) to use libxml to dump into a virBuf ??? Ok, this actually makes me notice another problem - I'll reply to the original patch with details. In summary though, the internal data structure should not be using an xmlNodePtr as its canonical form, and thus there should not be a need for xmlNodeDump. Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :| -- Libvir-list mailing list Libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] anyone implementing host device enumeration?
On Tue, Sep 30, 2008 at 12:17:27AM -0400, David Lively wrote: diff --git a/src/internal.h b/src/internal.h index d96504d..9a94f6d 100644 --- a/src/internal.h +++ b/src/internal.h @@ -292,6 +307,25 @@ struct _virStorageVol { }; +/** + * _virNodeDevice: + * + * Internal structure associated with a node device + */ +struct _virNodeDevice { +unsigned int magic; /* specific value to check */ +int refs; /* reference count */ +virConnectPtr conn; /* pointer back to the connection */ +char *key; /* device key */ +char *name; /* device name */ +char *parent_key; /* parent device */ +char *bus_name; /* (optional) bus name */ +xmlNodePtr bus; /* (optional) bus descriptor */ +char **cap_names; /* capability names (null-term) */ +xmlNodePtr *caps; /* capability descs (null-term) */ +}; This is not the right way to maintain the internal representation of devices. The model we follow is to have 2, sometimes 3 structs to represent an object. I'll summarize in terms of domain objects - virDomainPtr - the public opaque struct representing a domain the internal struct is defined in internal.h and merely contains reference counting, connection pointer, and any unique keys (ID, name, UUID). - virDomainObjPtr - the internal struct representing the state of a domain object. Not all drivers use us - only stateful drivers do. It is used to track state such as guest PID, logfile, running state, and a pointer to the live and inactive configs This is done in domain_conf.h/.c - virDomainDefPtr - the internal struct representing the canonical configuration of a domain. This is a direct mapping of the XML into a series of structs. This is done in domain_conf.h/.c So, in the context for host devices we need a few changes. In the internal.h file, you should only store the key/name attributes. There should then be a separate file handling configuration parsing and formatting, node_device_conf.h/.c. There is probably no need for a state object, so skip virNodeDeviceObjPtr, and just create a virNodeDeviceDefPtr representing the XML format as a series of structs/unions, etc. See virDomainDefPtr for a good example. This should not store any xmlNodePtr instances - it should be all done as explicit struct/enum fields The node_device_conf.c file should at mimium have 2 methods, one for converting an XML document into a virNodeDeviceDefPtr object, and one for the converting a virNodeDeviceObjPtr back into an XML document. Follow the existing API naming / method signatures that are seen in domain_conf.h / network_conf.h for current 'best practice' Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :| -- Libvir-list mailing list Libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] Unify *ReportError logic
On Thu, Oct 02, 2008 at 08:53:12PM +0100, Daniel P. Berrange wrote: On Thu, Oct 02, 2008 at 03:41:21PM -0400, Cole Robinson wrote: Currently, most src/* files have their own ReportError function. Some support printf style arguments, others only allow reporting a single string message. The code for all of them does virtually the same thing, possibly passing a different constant off to another function. The attached patch adds a function to virterror.c which encapsulates the common ReportError logic. I used this to replace qemudReportError with a macro, which also allows passing off filename and line number info if we wanted to do something with it later. I did just the one function conversion to see what people think: if I'm missing something, or ideas for anything else to add. Seems to work as expected in my testing. Basically a good idea - we've discussed doing exactly this in the past. You can do further though, and kill off the 'dom' and 'net' parameters here too. Those are deprecated and should always be left NULL these days. i'm still a bit sad about this though, I'm still left with the feeling that being unable to refcount, result in a loss of useful informations. Daniel -- Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ [EMAIL PROTECTED] | Rpmfind RPM search engine http://rpmfind.net/ http://veillard.com/ | virtualization library http://libvirt.org/ -- Libvir-list mailing list Libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] PATCH: Support SDL configuration for QEMU driver
Daniel P. Berrange wrote: QEMU has two modes of providing a graphical display, VNC and SDL. Now most of our tools just use VNC, but occasionally people want to use SDL for some crazy reason. We already support this in Xen driver, but the QEMU impl has been rather lacking. At the moment if you ask for a SDL display it'll only happen to work if you had the $DISPLAY environment variable set when you started libvirtd - you probably don't. I've thought about this issue before, not sure how to fully solve it though. If we set the display in the xml at install time (say setting it to :0.0), but at a later time we ssh =Y into that machine, the display in the xml will be wrong, and possibly pop the sdl window on someone else's display. Possibly a solution at the virsh or virt-manager level? Thanks, Cole -- Libvir-list mailing list Libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] Fix possible segfault if starting qemu vm with an inactive virtual network
On Thu, Oct 02, 2008 at 05:16:43PM -0400, Cole Robinson wrote: The attached patch fixes a possible segfault if trying to start a domain which tries to use an inactive virtual network. We should explicitly catch that error and report it, but that's a separate patch. Fix looks fine to me, good catch ! Daniel -- Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ [EMAIL PROTECTED] | Rpmfind RPM search engine http://rpmfind.net/ http://veillard.com/ | virtualization library http://libvirt.org/ -- Libvir-list mailing list Libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
[libvirt] [PATCH 1 of 2] Add internal cgroup manipulation functions
This patch adds src/cgroup.{c,h} with support for creating and manipulating cgroups. All groups created with the internal API are forced under $mount/libvirt/ to keep everything together. The first time a group is created, the libvirt directory is also created, and the settings from the root are inherited. The code creates groups in all mounts requires to get memory and devices functionality. When setting a value, the appropriate mount is determined and the value is set there. I have tested this with all controllers mounted in a single location, as well as all of them mounted separately. Changes: - Handle multiple mount points - Add more abstract internal API, per recent discussion - Consider absence of memory or devices controllers as no cgroup support diff -r 444e2614d0a2 -r 5590af0c8c3e src/cgroup.c --- /dev/null Thu Jan 01 00:00:00 1970 + +++ b/src/cgroup.c Fri Oct 03 08:06:52 2008 -0700 @@ -0,0 +1,762 @@ +/* + * cgroup.c: Tools for managing cgroups + * + * Copyright IBM Corp. 2008 + * + * See COPYING.LIB for the License of this software + * + * Authors: + * Dan Smith [EMAIL PROTECTED] + */ +#include config.h + +#include stdio.h +#include stdint.h +#include inttypes.h +#include mntent.h +#include fcntl.h +#include string.h +#include errno.h +#include stdlib.h +#include stdbool.h +#include sys/stat.h +#include sys/types.h +#include libgen.h + +#include internal.h +#include util.h +#include memory.h +#include cgroup.h + +#define DEBUG(fmt,...) VIR_DEBUG(__FILE__, fmt, __VA_ARGS__) +#define DEBUG0(msg) VIR_DEBUG(__FILE__, %s, msg) + +#define CGROUP_MAX_VAL 512 + +struct virCgroup { +char *path; +}; + +const char *supported_controllers[] = { +memory, +devices, +NULL +}; + +/** + * virCgroupFree: + * + * @group: The group structure to free + */ +void virCgroupFree(virCgroupPtr *group) +{ +if (*group != NULL) { +VIR_FREE((*group)-path); +VIR_FREE(*group); +*group = NULL; +} +} + +static virCgroupPtr virCgroupGetMount(const char *controller) +{ +FILE *mounts; +struct mntent entry; +char buf[CGROUP_MAX_VAL]; +virCgroupPtr root = NULL; + +if (VIR_ALLOC(root) != 0) +return NULL; + +mounts = fopen(/proc/mounts, r); +if (mounts == NULL) { +DEBUG0(Unable to open /proc/mounts: %m); +goto err; +} + +while (getmntent_r(mounts, entry, buf, sizeof(buf)) != NULL) { +if (STREQ(entry.mnt_type, cgroup) +(strstr(entry.mnt_opts, controller))) { +root-path = strdup(entry.mnt_dir); +break; +} +} + +DEBUG(Mount for %s is %s\n, controller, root-path); + +if (root-path == NULL) { +DEBUG0(Did not find cgroup mount); +goto err; +} + +fclose(mounts); + +return root; +err: +virCgroupFree(root); + +return NULL; +} + +/** + * virCgroupHaveSupport: + * + * Returns 0 if support is present, negative if not + */ +int virCgroupHaveSupport(void) +{ +virCgroupPtr root; +int i; + +for (i = 0; supported_controllers[i] != NULL; i++) { +root = virCgroupGetMount(supported_controllers[i]); +if (root == NULL) +return -1; +virCgroupFree(root); +} + +return 0; +} + +static int virCgroupPathOfGroup(const char *group, +const char *controller, +char **path) +{ +virCgroupPtr root = NULL; +int rc = 0; + +root = virCgroupGetMount(controller); +if (root == NULL) { +rc = -ENOTDIR; +goto out; +} + +if (asprintf(path, %s/%s, root-path, group) == -1) +rc = -ENOMEM; +out: +virCgroupFree(root); + +return rc; +} + +static int virCgroupPathOf(const char *grppath, + const char *key, + char **path) +{ +virCgroupPtr root; +int rc = 0; +char *controller = NULL; + +if (strchr(key, '.') == NULL) +return -EINVAL; + +if (sscanf(key, %a[^.], controller) != 1) +return -EINVAL; + +root = virCgroupGetMount(controller); +if (root == NULL) { +rc = -ENOTDIR; +goto out; +} + +if (asprintf(path, %s/%s/%s, root-path, grppath, key) == -1) +rc = -ENOMEM; +out: +virCgroupFree(root); +VIR_FREE(controller); + +return rc; +} + +static int virCgroupSetValueStr(virCgroupPtr group, +const char *key, +const char *value) +{ +int fd = -1; +int rc = 0; +char *keypath = NULL; + +rc = virCgroupPathOf(group-path, key, keypath); +if (rc != 0) +return rc; + +fd = open(keypath, O_WRONLY); +if (fd 0) { +DEBUG(Unable to open %s: %m, keypath); +rc = -ENOENT; +goto out; +} + +DEBUG(Writing '%s' to '%s', value, keypath); + +rc = safewrite(fd, value, strlen(value)); +if (rc 0) { +DEBUG(Failed to write value '%s': %m, value);
[libvirt] [PATCH 2 of 2] Use cgroup functions to set resource limits on LXC domains
This patch adds code to the controller to set up a cgroup named after the domain name, set the memory limit, and restrict devices. It also adds bits to lxc_driver to properly clean up the cgroup on domain death. If virCgroupHaveSupport() says that no support is available, then we just allow the domain creation to proceed as it did before without resource controls in place. Changes: - Use device number constants from cgroup.h - Be sure to remove the cgroup on the failure path - Use the more abstract API diff -r 5590af0c8c3e -r e827f165c23b src/Makefile.am --- a/src/Makefile.am Fri Oct 03 08:06:52 2008 -0700 +++ b/src/Makefile.am Fri Oct 03 08:38:57 2008 -0700 @@ -90,13 +90,15 @@ lxc_conf.c lxc_conf.h \ lxc_container.c lxc_container.h \ lxc_driver.c lxc_driver.h \ - veth.c veth.h + veth.c veth.h \ + cgroup.c cgroup.h LXC_CONTROLLER_SOURCES = \ lxc_conf.c lxc_conf.h \ lxc_container.c lxc_container.h \ lxc_controller.c\ - veth.c veth.h + veth.c veth.h \ + cgroup.c cgroup.h OPENVZ_DRIVER_SOURCES =\ openvz_conf.c openvz_conf.h \ diff -r 5590af0c8c3e -r e827f165c23b src/lxc_controller.c --- a/src/lxc_controller.c Fri Oct 03 08:06:52 2008 -0700 +++ b/src/lxc_controller.c Fri Oct 03 08:38:57 2008 -0700 @@ -42,12 +42,82 @@ #include veth.h #include memory.h #include util.h - +#include cgroup.h #define DEBUG(fmt,...) VIR_DEBUG(__FILE__, fmt, __VA_ARGS__) #define DEBUG0(msg) VIR_DEBUG(__FILE__, %s, msg) int debugFlag = 0; + +struct cgroup_device_policy { +char type; +int major; +int minor; +}; + +/** + * lxcSetContainerResources + * @def: pointer to virtual machine structure + * + * Creates a cgroup for the container, moves the task inside, + * and sets resource limits + * + * Returns 0 on success or -1 in case of error + */ +static int lxcSetContainerResources(virDomainDefPtr def) +{ +virCgroupPtr cgroup; +int rc = -1; +int i; +struct cgroup_device_policy devices[] = { +{'c', VIR_CG_DEV_MAJ_MEMORY, VIR_CG_DEV_MIN_NULL}, +{'c', VIR_CG_DEV_MAJ_MEMORY, VIR_CG_DEV_MIN_ZERO}, +{'c', VIR_CG_DEV_MAJ_MEMORY, VIR_CG_DEV_MIN_FULL}, +{'c', VIR_CG_DEV_MAJ_MEMORY, VIR_CG_DEV_MIN_RANDOM}, +{'c', VIR_CG_DEV_MAJ_MEMORY, VIR_CG_DEV_MIN_URANDOM}, +{'c', VIR_CG_DEV_MAJ_TTY, VIR_CG_DEV_MIN_CONSOLE}, +{0, 0, 0}}; + +if (virCgroupHaveSupport() != 0) +return 0; /* Not supported, so claim success */ + +rc = virCgroupForDomain(def, lxc, cgroup); +if (rc != 0) { +lxcError(NULL, NULL, VIR_ERR_INTERNAL_ERROR, + _(Unable to create cgroup for %s\n), def-name); +return rc; +} + +rc = virCgroupSetMemory(cgroup, def-maxmem); +if (rc != 0) +goto out; + +rc = virCgroupDenyAllDevices(cgroup); +if (rc != 0) +goto out; + +for (i = 0; devices[i].type != 0; i++) { +struct cgroup_device_policy *dev = devices[i]; +rc = virCgroupAllowDevice(cgroup, + dev-type, + dev-major, + dev-minor); +if (rc != 0) +goto out; +} + +rc = virCgroupAddTask(cgroup, getpid()); +out: +if (rc != 0) { +lxcError(NULL, NULL, VIR_ERR_INTERNAL_ERROR, + _(Failed to set lxc resources: %s\n), strerror(-rc)); +virCgroupRemove(cgroup); +} + +virCgroupFree(cgroup); + +return rc; +} static char*lxcMonitorPath(virDomainDefPtr def) { @@ -394,6 +464,9 @@ if (lxcControllerMoveInterfaces(nveths, veths, container) 0) goto cleanup; +if (lxcSetContainerResources(def) 0) +goto cleanup; + if (lxcContainerSendContinue(control[0]) 0) goto cleanup; diff -r 5590af0c8c3e -r e827f165c23b src/lxc_driver.c --- a/src/lxc_driver.c Fri Oct 03 08:06:52 2008 -0700 +++ b/src/lxc_driver.c Fri Oct 03 08:38:57 2008 -0700 @@ -43,6 +43,7 @@ #include bridge.h #include veth.h #include event.h +#include cgroup.h /* debug macros */ @@ -376,6 +377,7 @@ int waitRc; int childStatus = -1; virDomainNetDefPtr net; +virCgroupPtr cgroup; while (((waitRc = waitpid(vm-pid, childStatus, 0)) == -1) errno == EINTR) @@ -408,6 +410,11 @@ for (net = vm-def-nets; net; net = net-next) { vethInterfaceUpOrDown(net-ifname, 0); vethDelete(net-ifname); +} + +if (virCgroupForDomain(vm-def, lxc, cgroup) == 0) { +
Re: [libvirt] [PATCH 1 of 2] Add internal cgroup manipulation functions
Dan Smith wrote: This patch adds src/cgroup.{c,h} with support for creating and manipulating cgroups. It's quite naive at the moment, but should provide something to work with to move forward with resource controls. All groups created with the internal API are forced under $mount/libvirt/ to keep everything together. The first time a group is created, the libvirt directory is also created, and the settings from the root are inherited. The code scans the mount table to look for the first mount of type cgroup, and assumes that all controllers are mounted there. I think this could/should be updated to prefer a mount with just the controller(s) we want, if there are multiple ones. If you have the cpuset controller enabled, and cpuset.cpus_exclusive is 1, then all cgroups to be created will fail. Since we probably shouldn't blindly set the root to be non-exclusive, we may also want to consider this condition to be no cgroup support. diff -r 444e2614d0a2 -r 8e948eb88328 src/Makefile.am --- a/src/Makefile.am Wed Sep 17 16:07:03 2008 + +++ b/src/Makefile.am Mon Sep 29 09:37:42 2008 -0700 @@ -96,7 +96,8 @@ lxc_conf.c lxc_conf.h \ lxc_container.c lxc_container.h \ lxc_controller.c\ - veth.c veth.h + veth.c veth.h \ + cgroup.c cgroup.h OPENVZ_DRIVER_SOURCES = \ openvz_conf.c openvz_conf.h \ diff -r 444e2614d0a2 -r 8e948eb88328 src/cgroup.c --- /dev/null Thu Jan 01 00:00:00 1970 + +++ b/src/cgroup.cMon Sep 29 09:37:42 2008 -0700 @@ -0,0 +1,526 @@ +/* + * cgroup.c: Tools for managing cgroups + * + * Copyright IBM Corp. 2008 + * + * See COPYING.LIB for the License of this software + * + * Authors: + * Dan Smith [EMAIL PROTECTED] + */ +#include config.h + +#include stdio.h +#include stdint.h +#include inttypes.h +#include mntent.h +#include fcntl.h +#include string.h +#include errno.h +#include stdlib.h +#include stdbool.h +#include sys/stat.h +#include sys/types.h +#include libgen.h + +#include internal.h +#include util.h +#include cgroup.h + +#define DEBUG(fmt,...) VIR_DEBUG(__FILE__, fmt, __VA_ARGS__) +#define DEBUG0(msg) VIR_DEBUG(__FILE__, %s, msg) + +struct virCgroup { +char *path; +}; + There is no support for permissions, is everything run as root? +void virCgroupFree(virCgroupPtr *group) +{ +if (*group != NULL) { +free((*group)-path); +free(*group); +*group = NULL; +} +} + +static virCgroupPtr cgroup_get_mount(void) +{ +FILE *mounts; +struct mntent entry; +char buf[512]; Is 512 arbitrary? How do we know it is going to be sufficient? +virCgroupPtr root = NULL; + +root = calloc(1, sizeof(*root)); +if (root == NULL) +return NULL; + +mounts = fopen(/proc/mounts, r); +if (mounts == NULL) { +DEBUG0(Unable to open /proc/mounts: %m); +goto err; +} + +while (getmntent_r(mounts, entry, buf, sizeof(buf)) != NULL) { +if (STREQ(entry.mnt_type, cgroup)) { +root-path = strdup(entry.mnt_dir); +break; +} +} + +if (root-path == NULL) { +DEBUG0(Did not find cgroup mount); Or strdup failed due to ENOMEM +goto err; +} + +fclose(mounts); + +return root; +err: +virCgroupFree(root); + +return NULL; +} + +int virCgroupHaveSupport(void) +{ +virCgroupPtr root; + +root = cgroup_get_mount(); +if (root == NULL) +return -1; + +virCgroupFree(root); + This is quite a horrible way of wasting computation. +return 0; +} + +static int cgroup_path_of(const char *grppath, + const char *key, + char **path) +{ +virCgroupPtr root; +int rc = 0; + +root = cgroup_get_mount(); So every routine calls cgroup_path_of(), reads the mounts entry and find a entry for cgroup and returns it, why not do it just once and use it. +if (root == NULL) { +rc = -ENOTDIR; +goto out; +} + +if (asprintf(path, %s/%s/%s, root-path, grppath, key) == -1) +rc = -ENOMEM; +out: +virCgroupFree(root); + +return rc; +} + +int virCgroupSetValueStr(virCgroupPtr group, + const char *key, + const char *value) +{ +int fd = -1; +int rc = 0; +char *keypath = NULL; + +rc = cgroup_path_of(group-path, key, keypath); +if (rc != 0) +return rc; + +fd = open(keypath, O_WRONLY); I see a mix of open and fopen calls.I would prefer to stick to just one, helps with readability. +if (fd 0) { +DEBUG(Unable to open %s: %m, keypath); +rc
Re: [libvirt] [PATCH 0 of 2] Add cgroup manipulation and LXC driver support
On Fri, Oct 03, 2008 at 08:40:22AM -0700, Dan Smith wrote: Revision 3 of the cgroups patch, with the round 2 comments addressed. Fine for me +1 Daniel -- Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ [EMAIL PROTECTED] | Rpmfind RPM search engine http://rpmfind.net/ http://veillard.com/ | virtualization library http://libvirt.org/ -- Libvir-list mailing list Libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH 2 of 2] Use cgroup functions to set resource limits on LXC domains
On Fri, Oct 03, 2008 at 08:40:24AM -0700, Dan Smith wrote: This patch adds code to the controller to set up a cgroup named after the domain name, set the memory limit, and restrict devices. It also adds bits to lxc_driver to properly clean up the cgroup on domain death. If virCgroupHaveSupport() says that no support is available, then we just allow the domain creation to proceed as it did before without resource controls in place. +struct cgroup_device_policy devices[] = { +{'c', VIR_CG_DEV_MAJ_MEMORY, VIR_CG_DEV_MIN_NULL}, +{'c', VIR_CG_DEV_MAJ_MEMORY, VIR_CG_DEV_MIN_ZERO}, +{'c', VIR_CG_DEV_MAJ_MEMORY, VIR_CG_DEV_MIN_FULL}, +{'c', VIR_CG_DEV_MAJ_MEMORY, VIR_CG_DEV_MIN_RANDOM}, +{'c', VIR_CG_DEV_MAJ_MEMORY, VIR_CG_DEV_MIN_URANDOM}, +{'c', VIR_CG_DEV_MAJ_TTY, VIR_CG_DEV_MIN_CONSOLE}, +{0, 0, 0}}; You're going to hate me for suggesting more changes, but This list of devices is currently duplicated in two places - once here where we set permissions, and again when we actually create the container and populate its /dev/ in lxc_container.c. Could do with a master list of device nodes used by both. Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :| -- Libvir-list mailing list Libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH 2 of 2] Use cgroup functions to set resource limits on LXC domains
DB This list of devices is currently duplicated in two places - once DB here where we set permissions, and again when we actually create DB the container and populate its /dev/ in lxc_container.c. Could do DB with a master list of device nodes used by both. So does that mean they should be somewhere other than cgroup.h? If not, I think it might be appropriate to make that change in a separate patch, since populating /dev isn't really cgroup-related. -- Dan Smith IBM Linux Technology Center Open Hypervisor Team email: [EMAIL PROTECTED] pgpck5elXVqx7.pgp Description: PGP signature -- Libvir-list mailing list Libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH 2 of 2] Use cgroup functions to set resource limits on LXC domains
On Fri, Oct 03, 2008 at 08:45:00AM -0700, Dan Smith wrote: DB This list of devices is currently duplicated in two places - once DB here where we set permissions, and again when we actually create DB the container and populate its /dev/ in lxc_container.c. Could do DB with a master list of device nodes used by both. So does that mean they should be somewhere other than cgroup.h? I'd think lxc_config.h or lcx_container.h would be a more natural place for the constants, since both the .c files which would use them as lxc_ driver files. Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :| -- Libvir-list mailing list Libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH 1 of 2] Add internal cgroup manipulation functions
BS There is no support for permissions, is everything run as root? The LXC driver must run as root, yes. BS Is 512 arbitrary? How do we know it is going to be sufficient? This should probably be a constant, but it's more than enough for any of the values we're setting. Defining it specifically will help make that clear. BS So every routine calls cgroup_path_of(), reads the mounts entry BS and find a entry for cgroup and returns it, why not do it just BS once and use it. Because maintaining the (potentially stale) state isn't worth the small amount of work necessary to parse the mount table on each domain creation (IMHO). BS I see a mix of open and fopen calls.I would prefer to stick to BS just one, helps with readability. You mean because I use open() and getmntent_r() wants a FILE*? I think I'm willing to live with that :) BS I don't think 512 bytes will be sufficient. THere are multi-line BS files like memory.stat. We don't read memory.stat (or any other values at the moment). However: % for i in $(find /cgroup | grep memory.stat); do cat $i | wc -c; done 53 53 53 98 I don't see any point in allocating a huge amount of memory on the stack each time if we're not going to use it. If we need to read larger values in the future, we could move to an allocated string. BS Why do you need access? mkdir will do that check anyway. Yep, I'll change it. BS Why 0655? Why not use meaningful names see sys/stat.h I don't think that the meaningful names in sys/stat.h lead to better readability, personally. Octal permissions are used elsewhere int the code pretty extensively, so I'm comfortable with this as it is. -- Dan Smith IBM Linux Technology Center Open Hypervisor Team email: [EMAIL PROTECTED] pgpPFzo8qY9qz.pgp Description: PGP signature -- Libvir-list mailing list Libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
[libvirt] [discuss] The new cgroup patches for libvirt
Hi, Everyone, I've seen a new set of patches from Dan Smith, which implement cgroup support for libvirt. While the patches seem simple, there are some issues that have been pointed out in the posting itself. I hope that libvirt will switch over (may be after your concerns are addressed and definitely in the longer run) to using libcgroups rather than having an internal implementation of cgroups. The advantages of switching over would be using the functionality that libcgroup already provides libcgroups (libcg.sf.net) provides 1. Ability to configure and mount cgroups and controllers via initscripts and a configuration file 2. An API to control and read cgroups information 3. Thread safety around API calls 4. Daemons to automatically classify a task based on a certain set of rules 5. API to extract current cgroup classification (where is the task currently in the cgroup hierarchy) While re-implementing might sound like a cool thing to do, here are the drawbacks 1. It leads to code duplication and reduces code reuse 2. It leads to confused users I understand that in the past there has been a perception that libcgroups might not yet be ready, because we did not have ABI stability built into the library and the header file had old comments about things changing. I would urge the group to look at the current implementation of libcgroups (look at v0.32) and help us 1. Fix any issues you see or point them to us 2. Add new API or request for new API that can help us integrate better with libvirt -- Balbir -- Libvir-list mailing list Libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
[libvirt] Re: [discuss] The new cgroup patches for libvirt
On Fri, Oct 03, 2008 at 09:31:52PM +0530, Balbir Singh wrote: I understand that in the past there has been a perception that libcgroups might not yet be ready, because we did not have ABI stability built into the library and the header file had old comments about things changing. I would urge the group to look at the current implementation of libcgroups (look at v0.32) and help us 1. Fix any issues you see or point them to us I did point the general problem of ABI in libcgroup http://www.mail-archive.com/libvir-list@redhat.com/msg08388.html I didn't see any reply to the points I raised specifically. In the meantime we got a relatively simple, sufficient for now, usable right now, patch fullfilling our needs. A working patch is better in my eye than something which may work well in the future if we take the time to integrate it and stabilize and propagate to the systems we use. The package available in Fedora 9 has not improved as far as I can tell. So I'm still keeping the same point of view as posted on that same thread a month ago: http://www.mail-archive.com/libvir-list@redhat.com/msg08472.html Yes I don't want to presume the ability of the libcgroup to become cleaner and more stable, we can probably go with a small internal API and when/if things become nicer, then reuse libcgroup, As maintainer I will also note that nicer also imply the ability to work well and smoothly with the other maintainers. I hate guerilla, I would prefer if you had read and replied to what I wrote. So Dan Smith patch should IMHO go in now, if later your API are widely distributed, cleaner than what i have now (0.1c may be old but what is available to us on Fedora, no idea what is available on other distros) and there is a clean patch to switch then we will look at it, right now we can't use libcgroup in my opinion. Daniel -- Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ [EMAIL PROTECTED] | Rpmfind RPM search engine http://rpmfind.net/ http://veillard.com/ | virtualization library http://libvirt.org/ -- Libvir-list mailing list Libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
[libvirt] Re: [discuss] The new cgroup patches for libvirt
On Fri, Oct 03, 2008 at 09:31:52PM +0530, Balbir Singh wrote: Hi, Everyone, I've seen a new set of patches from Dan Smith, which implement cgroup support for libvirt. While the patches seem simple, there are some issues that have been pointed out in the posting itself. I hope that libvirt will switch over (may be after your concerns are addressed and definitely in the longer run) to using libcgroups rather than having an internal implementation of cgroups. The advantages of switching over would be using the functionality that libcgroup already provides libcgroups (libcg.sf.net) provides 1. Ability to configure and mount cgroups and controllers via initscripts and a configuration file 2. An API to control and read cgroups information 3. Thread safety around API calls 4. Daemons to automatically classify a task based on a certain set of rules 5. API to extract current cgroup classification (where is the task currently in the cgroup hierarchy) So from a functional point of view you are addressing essentially three use cases 1. System configuration for controllers 2. Automatic task classification 3. Application development API for creating groups If each piece is correctly designed, the choice of implementation for each of these can be, and in some cases must be, totally independant. Since the kernel restricts that a single controller can only be attached to one cgroupsfs mount point, and one attach cannot be changed, the choice of how / where to mount controllers must remain outside the scope of applications. If any application using cgroups were to specify mount points, it would be inflicting its own requirements on every user of cgroups. This implies that applications must be designed to work with whatever controller mount configuration the admin has configured, and not configure stuff themselves. So impl for point 1 (configuration) must, by neccessity, be completely independant of impl for point 3 (application API). Considering automatic task classification. The task classification engine must be able to cope with the fact that applications have some functional requirements on cgroups setup. Taking libvirt as an example, we have a specific need to apply some controllers over a group of processes forming a container. A task classification engine must not re-clasify individual tasks within a container because that would conflict with the semantics required by libvirt. It is, however, free to re-classify the libvirtd daemon itself - whatever cgroup libvirtd is placed in, it will create the LXC cgroups below this point. So if libvirt is designed correctly, it will work with whatever cgroup task classification engine that might be running. Similarly if the task classification engine has been designed to co-operate with applications there is no problem running it alonside libvirt. Thus the implementation of points 2 (task classification) and point 3 (application API) have no need to be formally tied together. Furthermore tieing them together does not magically solve the problem that both applications the cgroups task classification engine need to be intelligently designed to co-operate. While re-implementing might sound like a cool thing to do, here are the drawbacks 1. It leads to code duplication and reduces code reuse This is important if the library code is providing significant value add to the application using it. As it stands, libcgroup is merely a direct interface to the cgroups filesystem providing weakly typed setters getters - with the exception of looking at the mount table to find where a controller lives, this is not hard / complex code, so the benefits of re-use are not particularly high. In such a scenario reducing code duplication is not in itself a benefit, since there are costs associated with using external libraries. It is more complicated integrate 2 independant style sof API, particularly with different views on error reporting, memory management and varying expectations for the semantic models exposed. There are a number of 'hard' questions wrt to cgroups usage by applications, two of which are outlined above. Simply having all applications use a single API cannot magically solve any of these problems - no matter what API is used application developers need to take care to design their usage of cgroups such that it 'plays nicely' with other applications. 2. It leads to confused users The use of cgroups is an internal implementation detail for libvirt's LXC driver. In comon with all libvirt drivers, the user has no need to know about the underlying impl details and these can will change at will as we discover better ways to achieve things. As such its irrelevant to a user how we configure the cgroups filesytem for libvirt - whether we do it directly, or via libcg the end result is identical - a set of directories in the cgroups filesystem. I understand that in the past there has been a perception that libcgroups
Re: [libvirt] [PATCH 0 of 2] [RFC] Add cgroup manipulation and LXC driver support
On Wed, Oct 01, 2008 at 08:41:19AM +0530, Balbir Singh wrote: Dan Smith wrote: DB At the same time having the controllers mounted is mandatory for DB libvirt to work and asking the admin to set things up manually DB also sucks. So perhaps we'll need to mount them automatically, but DB make this behaviuour configurable in some way, so admin can DB override it Perhaps we can: - Have a list of controllers we use (memory and devices so far) - Create each group in all mounts required to satisfy our necessary controllers - Select the appropriate mount when setting a cont.key value I am not sure how libvirt provides thread safety, but I did not see any explicit coding for that? The thread safety model for libvirt has two levels - A single virConnectPtr object must only be used by one thread. If you have multiple threads, you must provide each with its own conenct object - Within a stateless driver (Xen, OpenVZ, Test), there is no shared state between virConnectPtr objects, so there are no thread issues in this respect - With a stateful driver, the libvirtd daemon ensures that only a single thread is active at once, so against there are no thread issues there either. Now, in a short while I will be making the daemon fully-multithreaded. When this happens, the stateful drivers will be required to maintain mutexes for locking. The locking model wil have 2 levels, one lock over the driver as a whole. This is held only while acquiring a lock against the object being modified (eg the virtual domain object). Each virtual domain, lives in one cgroup, so there is a single virCGroup object associated with each domain. the virCGroup object state is seflf contained, so independant virCGroup objects can be accessed concurrently from multiple threads, without any threads safety issues. Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :| -- Libvir-list mailing list Libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] anyone implementing host device enumeration?
On Fri, Oct 03, 2008 at 01:20:35PM -0400, David Lively wrote: Okay, I see what you mean. I'll create a virNodeDeviceDefPtr and follow the established pattern. I'm really trying to avoid creating a struct/union that must be extended every time we support a new capability. I struggled quite a bit with the right representation for capabilities and bus details.. At one point, I had my own (general-purpose; i.e., not specific to any type of capability) virNodeDeviceCapabilities type, but it looked so much like a DOM tree that just using a xmlNode seemed like the best choice. Are you suggesting creating a struct for each kind of currently-supported capability, or are you suggesting creating a single struct that's general enough to represent all capabilities? I'm suggesting a something that is specific for each capability. Now if we were to support all metadata that HAL exposes this would be a huge job. But I don't believe we should be exposing all the metadata that HAL has, because in the future this XML format has to work with DeviceKit which is basically exposing UDev metadata, and VMWare's APIs which expose hardware info in their own way. Thus, IMHO, we should expose specific capabilities we know we care about, for specific sub-substems, and not try to handle the entire of HAL. A good starting point would be - PCI, domain, bus, slot, function, and vendor/product - USB, bus, device and vendor/product Basically same info required for attaching the device to a domain, thus matching the struct virDomainHostdevDefPtr in domain_conf.h - NIC, name mac address - Block, name and host adapter - Host adapter, name For these also have a way to link to the parent device associated with them (ie the PCI/USB device providing them). That would basically be enough for use of the existing domain/storage and network APIs which is what we need this data for, and this minimal info should be satisifiable with VMWare's APIs, and DeviceKit. Regards, Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :| -- Libvir-list mailing list Libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
[libvirt] Re: [discuss] The new cgroup patches for libvirt
On Fri, Oct 03, 2008 at 09:31:52PM +0530, Balbir Singh wrote: I understand that in the past there has been a perception that libcgroups might not yet be ready, because we did not have ABI stability built into the library and the header file had old comments about things changing. I would urge the group to look at the current implementation of libcgroups (look at v0.32) and help us 1. Fix any issues you see or point them to us 2. Add new API or request for new API that can help us integrate better with libvirt To expand on what I said in my other mail about providing value-add over the representation exposed by the kernel, here's some thoughts on the API exposed. Consider the following high level use case of libvirt - A set of groups, in a 3 level hierarchy APPNAME/DRIVER/DOMAIN - Control the ACL for block/char devices - Control memory limits This translates into an underling implementation, that I need to create 3 levels of cgroups in the filesystem, attach my PIDs at the 3rd level use the memory and device controllers and attach PIDs at the 3rd, and set values for attributes exposed by the controllers. Notice I'm not actually setting any config parms at the 1st 2nd levels, but they do need to still exist to ensure namespace uniqueness amongst different applications using cgroups. The current cgroups API provides APIs that directly map to individual actions wrt the kernel filesystem exposed. So as an application developer I have to explicitly create the 3 levels of hierarchy, tell it I want to use memory device controllers, format config values into the syntax required for each attribute, and remeber the attribute names. // Create the hierachy APPNAME/DRIVER/DOMAIN c1 = cgroup_new_cgroup(libvirt) c2 = cgroup_new_cgroup_parent(c1, lxc) c3 = cgroup_new_cgroup_parent(c2, domain.name) // Setup the controllers I want to use cgroup_add_controler(c3, devices) cgroup_add_controller(c3, memory) // Add my domain's PID to the cgroup cgroup_attach_task(c3, domain.pid) // Set the device ACL limits cgroup_set_value_string(c2, devices.deny, a); char buf[1024]; sprintf(buf, %c %d:%d, 'c', 1, 3); cgroup_set_value_stirng(c2, devices.allow, buf); // Set memory limit cgroup_set_value_uint64(c2, memory.limit_in_bytes, domain.memory * 1024); This really isn't providing any semantically useful abstraction over the direct filesytem manipulation. Just a bunch of wrappers for mkdir(), mount() and read()/write() calls. My application still has to know far too much information about the details of cgroups as exposed by the kernel. I do not care that there is a concept of 'controllers' at all, I just want to set device ACLs and memory limits. I do not care what the attributes in the filesystem are called, again I just want to set device ACLs and memory limits. I do not care what the data format for them must be for device/memory settings. Memory settings could be stored in base-2, base-10 or base-16 I should not have to know this information. With this style of API, the library provide no real value-add or compelling reason to use it. What might a more useful API look like? At least from my point of view, I'd like to be able to say: // Tell it I want $PID placed in APPNAME/DRIVER/DOMAIN char *path[] = { libvirt, lxc, domain.name}; cg = cgroup_new_path(path, domain.pid) // I want to deny all devices cgroup_deny_all_devices(cg); // Allow /dev/null - either by node/major/minor cgroup_allow_device_node(cg, 'c', 1, 3); // Or more conviently just give it a node to copy info from cgroup_allow_device_node(cg, /dev/null) // Set memory in KB cgroup_set_memory_limit_kb(cg, domain.memory) Notice how with such a style of API, I don't need to know anything about the low level implementation details - I'm working entirely in terms of semantically meaningful concepts. Now, comes the hard bit - you have to figure out what semantic concepts you want to expose to applications. The example here would be suitable for libvirt, but not neccessarily for other applications. Picking the right APIs is very much much harder than just exposing the kernel capabilities directly as libcgroup.h does now, but the trade off is that the resulting API would be much more useful and interesting to app developers. Regards, Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :| -- Libvir-list mailing list Libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] anyone implementing host device enumeration?
I definitely wasn't planning on covering all of HAL :-) I assume you weren't planning on exposing these capability-specific representations in the public API. Right? (If not, ignore the rest of this ...) So I guess I don't see the value of having these cap-specific internal representations. I keep a string array of the cap names for ListDevicesByCap, but other than that, the capability data is used only by virNodeDeviceGetXMLDesc(). So it seems natural to represent it in a form that can easily be converted to XML, and that can represent all the XML we'll need to output (like xmlNode). Otherwise, we are forced to write more capability-specific code, with no particular advantage (no stronger typing guarantees if we don't expose specific cap types). Dave P.S. I do think it would be useful to have access to capability details other than GetXMLDesc. Perhaps: const char *virNodeDeviceCapabilityProperty(virNodeDevicePtr dev, const char *cap, const char *prop) but note this exposes them only in a general (property / value) way, and is quite easily implemented with a xmlNode representation. On Fri, 2008-10-03 at 18:41 +0100, Daniel P. Berrange wrote: On Fri, Oct 03, 2008 at 01:20:35PM -0400, David Lively wrote: Okay, I see what you mean. I'll create a virNodeDeviceDefPtr and follow the established pattern. I'm really trying to avoid creating a struct/union that must be extended every time we support a new capability. I struggled quite a bit with the right representation for capabilities and bus details.. At one point, I had my own (general-purpose; i.e., not specific to any type of capability) virNodeDeviceCapabilities type, but it looked so much like a DOM tree that just using a xmlNode seemed like the best choice. Are you suggesting creating a struct for each kind of currently-supported capability, or are you suggesting creating a single struct that's general enough to represent all capabilities? I'm suggesting a something that is specific for each capability. Now if we were to support all metadata that HAL exposes this would be a huge job. But I don't believe we should be exposing all the metadata that HAL has, because in the future this XML format has to work with DeviceKit which is basically exposing UDev metadata, and VMWare's APIs which expose hardware info in their own way. Thus, IMHO, we should expose specific capabilities we know we care about, for specific sub-substems, and not try to handle the entire of HAL. A good starting point would be - PCI, domain, bus, slot, function, and vendor/product - USB, bus, device and vendor/product Basically same info required for attaching the device to a domain, thus matching the struct virDomainHostdevDefPtr in domain_conf.h - NIC, name mac address - Block, name and host adapter - Host adapter, name For these also have a way to link to the parent device associated with them (ie the PCI/USB device providing them). That would basically be enough for use of the existing domain/storage and network APIs which is what we need this data for, and this minimal info should be satisifiable with VMWare's APIs, and DeviceKit. Regards, Daniel -- Libvir-list mailing list Libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] Re: [discuss] The new cgroup patches for libvirt
On Fri, Oct 3, 2008 at 11:43 PM, Daniel P. Berrange [EMAIL PROTECTED] wrote: On Fri, Oct 03, 2008 at 09:31:52PM +0530, Balbir Singh wrote: I understand that in the past there has been a perception that libcgroups might not yet be ready, because we did not have ABI stability built into the library and the header file had old comments about things changing. I would urge the group to look at the current implementation of libcgroups (look at v0.32) and help us 1. Fix any issues you see or point them to us 2. Add new API or request for new API that can help us integrate better with libvirt To expand on what I said in my other mail about providing value-add over the representation exposed by the kernel, here's some thoughts on the API exposed. Consider the following high level use case of libvirt - A set of groups, in a 3 level hierarchy APPNAME/DRIVER/DOMAIN - Control the ACL for block/char devices - Control memory limits This translates into an underling implementation, that I need to create 3 levels of cgroups in the filesystem, attach my PIDs at the 3rd level use the memory and device controllers and attach PIDs at the 3rd, and set values for attributes exposed by the controllers. Notice I'm not actually setting any config parms at the 1st 2nd levels, but they do need to still exist to ensure namespace uniqueness amongst different applications using cgroups. The current cgroups API provides APIs that directly map to individual actions wrt the kernel filesystem exposed. So as an application developer I have to explicitly create the 3 levels of hierarchy, tell it I want to use memory device controllers, format config values into the syntax required for each attribute, and remeber the attribute names. // Create the hierachy APPNAME/DRIVER/DOMAIN c1 = cgroup_new_cgroup(libvirt) c2 = cgroup_new_cgroup_parent(c1, lxc) c3 = cgroup_new_cgroup_parent(c2, domain.name) // Setup the controllers I want to use cgroup_add_controler(c3, devices) cgroup_add_controller(c3, memory) // Add my domain's PID to the cgroup cgroup_attach_task(c3, domain.pid) // Set the device ACL limits cgroup_set_value_string(c2, devices.deny, a); char buf[1024]; sprintf(buf, %c %d:%d, 'c', 1, 3); cgroup_set_value_stirng(c2, devices.allow, buf); // Set memory limit cgroup_set_value_uint64(c2, memory.limit_in_bytes, domain.memory * 1024); This really isn't providing any semantically useful abstraction over the direct filesytem manipulation. Just a bunch of wrappers for mkdir(), mount() and read()/write() calls. My application still has to know far too much information about the details of cgroups as exposed by the kernel. True, it definitely does and the way I look at APIs is that they are layers. We've built the first layer that abstracts permissions, paths and strings into a set of useful API. The second layer does things that you say, the question then is why don't we have it yet? Let me try and answer that question 1. We've been trying to build configuration, classification and the low level plumbing 2. We've been planning to build the exact same thing that you say, we call that the pluggable architecture, where controller plug in their logic and provide the abstractions you need, but not gotten there yet. When you announced cgroup support in libvirt, it was definitely going to be a user and we hoped that you would come to us with your exact requirements that you've mentioned now (believe me, your feedback is very useful). The question then to ask is, is it cheaper for you to build these abstractions into libvirt or either helped us or asked us to do so, we would have gladly obliged. You might say that the onus is on the maintainers to do the right thing without feedback, but I would beg to differ. What you've asked for, I consider as a layer on top of the API we have now and should be easy to build. I do not care that there is a concept of 'controllers' at all, I just want to set device ACLs and memory limits. I do not care what the attributes in the filesystem are called, again I just want to set device ACLs and memory limits. I do not care what the data format for them must be for device/memory settings. Memory settings could be stored in base-2, base-10 or base-16 I should not have to know this information. With this style of API, the library provide no real value-add or compelling reason to use it. What might a more useful API look like? At least from my point of view, I'd like to be able to say: // Tell it I want $PID placed in APPNAME/DRIVER/DOMAIN char *path[] = { libvirt, lxc, domain.name}; cg = cgroup_new_path(path, domain.pid) // I want to deny all devices cgroup_deny_all_devices(cg); // Allow /dev/null - either by node/major/minor cgroup_allow_device_node(cg, 'c', 1, 3); // Or more conviently just give it a node to copy info
Re: [libvirt] Re: [discuss] The new cgroup patches for libvirt
On Sat, Oct 04, 2008 at 12:13:38AM +0530, Balbir Singh wrote: On Fri, Oct 3, 2008 at 11:43 PM, Daniel P. Berrange [EMAIL PROTECTED] wrote: True, it definitely does and the way I look at APIs is that they are layers. We've built the first layer that abstracts permissions, paths and strings into a set of useful API. The second layer does things that you say, the question then is why don't we have it yet? Let me try and answer that question 1. We've been trying to build configuration, classification and the low level plumbing 2. We've been planning to build the exact same thing that you say, we call that the pluggable architecture, where controller plug in their logic and provide the abstractions you need, but not gotten there yet. When you announced cgroup support in libvirt, it was definitely going to be a user and we hoped that you would come to us with your exact requirements that you've mentioned now (believe me, your feedback is very useful). The question then to ask is, is it cheaper for you to build these abstractions into libvirt or either helped us or asked us to do so, we would have gladly obliged. You might say that the onus is on the maintainers to do the right thing without feedback, but I would beg to differ. The thing I didn't mention, is that until Dan posted his current patches actually implementing the cgroups stuff in LXC driver, I didn't have a good picture of what the ideal higher level interface would look like. If you try and imagine high level APIs, without having an app actually using them, its all too easy to design something that turns out to not be useful. So while I know the low level cgroups API isn't what we need, it needs the current proof of concept in the libvirt LXC driver to discover what is an effective approach for libcgroups. I suspect our code will evolve further as we learn from what we've got now. By doing this entirely within libvirt we can experiment with effective implementation strategies without having to lockdown a formally supported API immediately. Once things settle down, it'll easier for libcgroups to see exactly what is important for a high level API and thus make one that's useful to more apps in the long term. Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :| -- Libvir-list mailing list Libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] anyone implementing host device enumeration?
On Fri, Oct 03, 2008 at 02:35:06PM -0400, David Lively wrote: I definitely wasn't planning on covering all of HAL :-) I assume you weren't planning on exposing these capability-specific representations in the public API. Right? (If not, ignore the rest of this ...) Not at this time. For the forseeable future I expect the XML format to be the only publically exposed representation of configuration. A long time down the road when we're confident the representation is long term sustainable correct, we might consider exposing the structs, but that is a long way off. So I guess I don't see the value of having these cap-specific internal representations. I keep a string array of the cap names for ListDevicesByCap, but other than that, the capability data is used only by virNodeDeviceGetXMLDesc(). So it seems natural to represent it in a form that can easily be converted to XML, and that can represent all the XML we'll need to output (like xmlNode). Otherwise, we are forced to write more capability-specific code, with no particular advantage (no stronger typing guarantees if we don't expose specific cap types). The idea is that by having formal internal representations it makes it easier for internal driver code to work with it in a gaurenteed consistent fashion. While the structs may only be used by your specific driver for the intiial commit, experiance has shown our needs evolve over time and we'll probably end up with other internal code wanting it. We previously had each hypervisor driver using an adhoc representation internally for domain configs, and it just wasn't sustainable our internal usage evolved. P.S. I do think it would be useful to have access to capability details other than GetXMLDesc. Perhaps: const char *virNodeDeviceCapabilityProperty(virNodeDevicePtr dev, const char *cap, const char *prop) but note this exposes them only in a general (property / value) way, and is quite easily implemented with a xmlNode representation. I'm wary of exposing sub-sets of the XML docs in simple key/value pairs because our XML formats have tended to expand over time, so things that started off key/value pairs gained extra attributes/child elements, all of which are desired. It is simple enough for applications to wrap in a property 'getter' using XPath on the XML doc if desired - virt-manager does this internally for many things. Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :| -- Libvir-list mailing list Libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] Re: [discuss] The new cgroup patches for libvirt
On Sat, Oct 4, 2008 at 1:17 AM, Daniel P. Berrange [EMAIL PROTECTED] wrote: On Sat, Oct 04, 2008 at 12:13:38AM +0530, Balbir Singh wrote: On Fri, Oct 3, 2008 at 11:43 PM, Daniel P. Berrange [EMAIL PROTECTED] wrote: True, it definitely does and the way I look at APIs is that they are layers. We've built the first layer that abstracts permissions, paths and strings into a set of useful API. The second layer does things that you say, the question then is why don't we have it yet? Let me try and answer that question 1. We've been trying to build configuration, classification and the low level plumbing 2. We've been planning to build the exact same thing that you say, we call that the pluggable architecture, where controller plug in their logic and provide the abstractions you need, but not gotten there yet. When you announced cgroup support in libvirt, it was definitely going to be a user and we hoped that you would come to us with your exact requirements that you've mentioned now (believe me, your feedback is very useful). The question then to ask is, is it cheaper for you to build these abstractions into libvirt or either helped us or asked us to do so, we would have gladly obliged. You might say that the onus is on the maintainers to do the right thing without feedback, but I would beg to differ. The thing I didn't mention, is that until Dan posted his current patches actually implementing the cgroups stuff in LXC driver, I didn't have a good picture of what the ideal higher level interface would look like. If you try and imagine high level APIs, without having an app actually using them, its all too easy to design something that turns out to not be useful. So while I know the low level cgroups API isn't what we need, it needs the current proof of concept in the libvirt LXC driver to discover what is an effective approach for libcgroups. I suspect our code will evolve further as we learn from what we've got now. By doing this entirely within libvirt we can experiment with effective implementation strategies without having to lockdown a formally supported API immediately. Once things settle down, it'll easier for libcgroups to see exactly what is important for a high level API and thus make one that's useful to more apps in the long term. Please remember my words if you ever find that you have a code base that looks like what we have in libcgroups, please remember to switch over to libcgroup. I fear that you will reach that stage, the code that is going in right now has too many things hard-coded and will need a lot of changes going forward, things like adding support for new controllers is not going to be straight forward, your assumption that only root can create a container might be broken and we'll build support for hierarchies, which will require further changes, etc. I am not scaring you, just trying to make sure we don't solve the same problems twice. Balbir -- Libvir-list mailing list Libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] Re: [discuss] The new cgroup patches for libvirt
The thing I didn't mention, is that until Dan posted his current patches actually implementing the cgroups stuff in LXC driver, I didn't have a good picture of what the ideal higher level interface would look like. If you try and imagine high level APIs, without having an app actually using them, its all too easy to design something that turns out to not be useful. So while I know the low level cgroups API isn't what we need, it needs the current proof of concept in the libvirt LXC driver to discover what is an effective approach for libcgroups. I suspect our code will evolve further as we learn from what we've got now. By doing this entirely within libvirt we can experiment with effective implementation strategies without having to lockdown a formally supported API immediately. Once things settle down, it'll easier for libcgroups to see exactly what is important for a high level API and thus make one that's useful to more apps in the long term. Agreed, the libvirt changes for cgroups have shown us a useful layer to build. We'll keep on top of it and try and build something that everyone can use. Balbir -- Libvir-list mailing list Libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH 0 of 2] [RFC] Add cgroup manipulation and LXC driver support
Daniel P. Berrange wrote: On Wed, Oct 01, 2008 at 08:41:19AM +0530, Balbir Singh wrote: Dan Smith wrote: DB At the same time having the controllers mounted is mandatory for DB libvirt to work and asking the admin to set things up manually DB also sucks. So perhaps we'll need to mount them automatically, but DB make this behaviuour configurable in some way, so admin can DB override it Perhaps we can: - Have a list of controllers we use (memory and devices so far) - Create each group in all mounts required to satisfy our necessary controllers - Select the appropriate mount when setting a cont.key value I am not sure how libvirt provides thread safety, but I did not see any explicit coding for that? The thread safety model for libvirt has two levels - A single virConnectPtr object must only be used by one thread. If you have multiple threads, you must provide each with its own conenct object - Within a stateless driver (Xen, OpenVZ, Test), there is no shared state between virConnectPtr objects, so there are no thread issues in this respect - With a stateful driver, the libvirtd daemon ensures that only a single thread is active at once, so against there are no thread issues there either. Now, in a short while I will be making the daemon fully-multithreaded. When this happens, the stateful drivers will be required to maintain mutexes for locking. The locking model wil have 2 levels, one lock over the driver as a whole. This is held only while acquiring a lock against the object being modified (eg the virtual domain object). Each virtual domain, lives in one cgroup, so there is a single virCGroup object associated with each domain. the virCGroup object state is seflf contained, so independant virCGroup objects can be accessed concurrently from multiple threads, without any threads safety issues. Thanks, that was quite insightful. -- Balbir -- Libvir-list mailing list Libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
[libvirt] Re: [discuss] The new cgroup patches for libvirt
Daniel P. Berrange wrote: On Fri, Oct 03, 2008 at 09:31:52PM +0530, Balbir Singh wrote: Hi, Everyone, I've seen a new set of patches from Dan Smith, which implement cgroup support for libvirt. While the patches seem simple, there are some issues that have been pointed out in the posting itself. I hope that libvirt will switch over (may be after your concerns are addressed and definitely in the longer run) to using libcgroups rather than having an internal implementation of cgroups. The advantages of switching over would be using the functionality that libcgroup already provides libcgroups (libcg.sf.net) provides 1. Ability to configure and mount cgroups and controllers via initscripts and a configuration file 2. An API to control and read cgroups information 3. Thread safety around API calls 4. Daemons to automatically classify a task based on a certain set of rules 5. API to extract current cgroup classification (where is the task currently in the cgroup hierarchy) So from a functional point of view you are addressing essentially three use cases 1. System configuration for controllers 2. Automatic task classification 3. Application development API for creating groups If each piece is correctly designed, the choice of implementation for each of these can be, and in some cases must be, totally independant. Since the kernel restricts that a single controller can only be attached to one cgroupsfs mount point, and one attach cannot be changed, the choice of how / where to mount controllers must remain outside the scope of applications. If any application using cgroups were to specify mount points, it would be inflicting its own requirements on every user of cgroups. This implies that applications must be designed to work with whatever controller mount configuration the admin has configured, and not configure stuff themselves. So impl for point 1 (configuration) must, by neccessity, be completely independant of impl for point 3 (application API). Considering automatic task classification. The task classification engine must be able to cope with the fact that applications have some functional requirements on cgroups setup. Taking libvirt as an example, we have a specific need to apply some controllers over a group of processes forming a container. A task classification engine must not re-clasify individual tasks within a container because that would conflict with the semantics required by libvirt. It is, however, free to re-classify the libvirtd daemon itself - whatever cgroup libvirtd is placed in, it will create the LXC cgroups below this point. So if libvirt is designed correctly, it will work with whatever cgroup task classification engine that might be running. Similarly if the task classification engine has been designed to co-operate with applications there is no problem running it alonside libvirt. Thus the implementation of points 2 (task classification) and point 3 (application API) have no need to be formally tied together. Furthermore tieing them together does not magically solve the problem that both applications the cgroups task classification engine need to be intelligently designed to co-operate. Agreed! While re-implementing might sound like a cool thing to do, here are the drawbacks 1. It leads to code duplication and reduces code reuse This is important if the library code is providing significant value add to the application using it. As it stands, libcgroup is merely a direct interface to the cgroups filesystem providing weakly typed setters getters - with the exception of looking at the mount table to find where a controller lives, this is not hard / complex code, so the benefits of re-use are not particularly high. Please see my earlier email on layering of API. In such a scenario reducing code duplication is not in itself a benefit, since there are costs associated with using external libraries. It is more complicated integrate 2 independant style sof API, particularly with different views on error reporting, memory management and varying expectations for the semantic models exposed. I disagree, I see a lot of code that does the same thing, look through /proc/mounts, read and parse values to write and read. I see two API's you've built on top of what libcgroup has (one for setting memory limit and the other for devices). Please compare the patch sizes as well and you'll see what I mean. There are a number of 'hard' questions wrt to cgroups usage by applications, two of which are outlined above. Simply having all applications use a single API cannot magically solve any of these problems - no matter what API is used application developers need to take care to design their usage of cgroups such that it 'plays nicely' with other applications. Playing nicely is a definite requirement, but not using existing code or contributing to it if something is broken and