Re: [libvirt] [RFC][scale] new API for querying domains stats
On Tue, Aug 05, 2014 at 01:36:02PM +0800, Li Wei wrote: Hi Richard, Thanks for your comment! On 08/04/2014 04:39 PM, Richard W.M. Jones wrote: On Mon, Aug 04, 2014 at 11:38:41AM +0800, Li Wei wrote: Hi, On 07/22/2014 03:25 PM, Richard W.M. Jones wrote: Did anything come of this discussion, and/or is someone working on this? I am working on an API to query block stats in a bulk style and proposed an API as follow: virDomainBlockStatsBulkFlags(virDomainPtr dom, virTypedParameterPtr params, int nparams, int ndisks, unsigned int flags) @dom: pointer to domain object @params: an array of typed param to be populated with block stats @nparams: how many params used for each block device @ndisks: how many block devices to query @flags: flags to filter block devices (not used for now) Returns -1 in case of error, 0 in case of success. with params == NULL, nparams == -1, ndisks == 1, return number of params for each block device. with params == NULL, nparams == -1, ndisks == -1, return number of disks in the domain. A typical usage of this API should be: nparams = virDomainBlockStatsBulkFlags(dom, NULL, -1, 1, 0); ndisks = virDomainBlockStatsBulkFlags(dom, NULL, -1, -1, 0); params = VIR_ALLOC_N(params, nparams * ndisks); ret = virDomainBlockStatsBulkFlags(dom, params, nparams, ndisks, 0); ... do something with params VIR_FREE(params); With this bulk API, virt-top can updates in a short interval for a domain with a lot of disks. Any comments? I think this works OK for the case where you have 1 domains with lots of disks. However if you have a large number of domains each with 1 or 2 disks I think you would have the same problem as currently. Yes, it is. Is it possible to design an API that can work across all domains in a single call? 
How about the following API: int virConnectGetAllBlockStats(virConnectPtr conn, virDomainPtr domain, virDomainBlockBulkStatsPtr *stats, unsigned int flags); @conn: pointer to libvirt connection @domain: pointer to the domain to be queried, NULL for all domains @stats: array of virDomainBlockBulkStats struct(see below) to be populated @flags: filter flags Return the number of virDomainBlockBulkStats populated. where virDomainBlockBulkStats defined as: struct _virDomainBlockBulkStats { virDomainPtr domain; /* domain the block stats belongs to */ virTypedParameterPtr params; /* params to store block stats */ unsigned int nparams; /* how many params used for each block stats */ unsigned int ndisks; /* how many block stats in this domain */ }; Works for me. Please CC me on any patches so I can review them more easily for you. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-top is 'top' for virtual machines. Tiny program with many powerful monitoring features, net stats, disk stats, logging, etc. http://people.redhat.com/~rjones/virt-top -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [RFC][scale] new API for querying domains stats
- Original Message - From: Richard W.M. Jones rjo...@redhat.com To: Li Wei l...@cn.fujitsu.com Cc: Francesco Romani from...@redhat.com, libvir-list@redhat.com Sent: Tuesday, August 12, 2014 11:04:05 AM Subject: Re: [libvirt] [RFC][scale] new API for querying domains stats [...] Is it possible to design an API that can work across all domains in a single call? How about the following API: int virConnectGetAllBlockStats(virConnectPtr conn, virDomainPtr domain, virDomainBlockBulkStatsPtr *stats, unsigned int flags); @conn: pointer to libvirt connection @domain: pointer to the domain to be queried, NULL for all domains @stats: array of virDomainBlockBulkStats struct(see below) to be populated @flags: filter flags Return the number of virDomainBlockBulkStats populated. where virDomainBlockBulkStats defined as: struct _virDomainBlockBulkStats { virDomainPtr domain; /* domain the block stats belongs to */ virTypedParameterPtr params; /* params to store block stats */ unsigned int nparams;/* how many params used for each block stats */ unsigned int ndisks; /* how many block stats in this domain */ }; Works for me. Same here. oVirt, more specifically VDSM, needs to check all the stats of all the domains on a given host at once, so this API should fit the task. Since VDSM takes ownership (read: keep track and control) of all the VMs, the filtering capability of this new API should be good enough. +++ It would be nice, but less important, to be able to somehow reuse the `stats' argument. What I'm looking here is a way to avoid to allocate/deallocate every time all the needed structure before and after each call. I'm saying so because is a pretty common scenario for a VM (at least in the cases I'm aware of) to have the same number of disks during all its life. But I believe this is an optimization which can be added later. 
Thanks,

-- 
Francesco Romani
RedHat Engineering Virtualization R&D
Phone: 8261328
IRC: fromani
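The buffer-reuse idea above can be sketched independently of libvirt. This is a minimal sketch, assuming a caller-owned buffer that is kept between polling cycles and grown only when a domain gains disks; `stats_buf`, `stats_buf_reserve` and `stats_buf_release` are hypothetical names for illustration and are not part of any proposed or existing libvirt API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical caller-owned buffer that survives across polling cycles. */
typedef struct {
    long long *values;   /* one slot per statistic */
    size_t capacity;     /* slots currently allocated */
    unsigned reallocs;   /* how often the buffer had to grow (illustration only) */
} stats_buf;

/* Make sure the buffer can hold `needed` slots; grow only when required. */
int stats_buf_reserve(stats_buf *buf, size_t needed)
{
    assert(buf != NULL);
    if (needed <= buf->capacity)
        return 0;                       /* common case: disk count unchanged */

    long long *grown = realloc(buf->values, needed * sizeof(*grown));
    if (grown == NULL)
        return -1;                      /* the old buffer is still valid */
    buf->values = grown;
    buf->capacity = needed;
    buf->reallocs++;
    return 0;
}

/* Release the buffer once polling stops. */
void stats_buf_release(stats_buf *buf)
{
    assert(buf != NULL);
    free(buf->values);
    buf->values = NULL;
    buf->capacity = 0;
}
```

With a buffer like this, a VM whose disk count never changes costs one allocation for its whole lifetime, which is the scenario described above. Whether a bulk API could accept such a caller-owned buffer is left open in the thread.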
Re: [libvirt] [RFC][scale] new API for querying domains stats
ping ... On 08/05/2014 01:36 PM, Li Wei wrote: Hi Richard, Thanks for your comment! On 08/04/2014 04:39 PM, Richard W.M. Jones wrote: On Mon, Aug 04, 2014 at 11:38:41AM +0800, Li Wei wrote: Hi, On 07/22/2014 03:25 PM, Richard W.M. Jones wrote: Did anything come of this discussion, and/or is someone working on this? I am working on an API to query block stats in a bulk style and proposed an API as follow: virDomainBlockStatsBulkFlags(virDomainPtr dom, virTypedParameterPtr params, int nparams, int ndisks, unsigned int flags) @dom: pointer to domain object @params: an array of typed param to be populated with block stats @nparams: how many params used for each block device @ndisks: how many block devices to query @flags: flags to filter block devices (not used for now) Returns -1 in case of error, 0 in case of success. with params == NULL, nparams == -1, ndisks == 1, return number of params for each block device. with params == NULL, nparams == -1, ndisks == -1, return number of disks in the domain. A typical usage of this API should be: nparams = virDomainBlockStatsBulkFlags(dom, NULL, -1, 1, 0); ndisks = virDomainBlockStatsBulkFlags(dom, NULL, -1, -1, 0); params = VIR_ALLOC_N(params, nparams * ndisks); ret = virDomainBlockStatsBulkFlags(dom, params, nparams, ndisks, 0); ... do something with params VIR_FREE(params); With this bulk API, virt-top can updates in a short interval for a domain with a lot of disks. Any comments? I think this works OK for the case where you have 1 domains with lots of disks. However if you have a large number of domains each with 1 or 2 disks I think you would have the same problem as currently. Yes, it is. Is it possible to design an API that can work across all domains in a single call? 
How about the following API: int virConnectGetAllBlockStats(virConnectPtr conn, virDomainPtr domain, virDomainBlockBulkStatsPtr *stats, unsigned int flags); @conn: pointer to libvirt connection @domain: pointer to the domain to be queried, NULL for all domains @stats: array of virDomainBlockBulkStats struct(see below) to be populated @flags: filter flags Return the number of virDomainBlockBulkStats populated. where virDomainBlockBulkStats defined as: struct _virDomainBlockBulkStats { virDomainPtr domain; /* domain the block stats belongs to */ virTypedParameterPtr params; /* params to store block stats */ unsigned int nparams; /* how many params used for each block stats */ unsigned int ndisks; /* how many block stats in this domain */ }; Note: 1. because the API allocate memory to store stats, the caller need to free it after use. 2. to distinguish each block stats in a domain, we need use a param to store block device name. PS: It seems we need a bunch of bulk APIs to query stats, I wonder if I can submit a patchset for each bulk API or must supply all the bulk APIs in one patchset? Whichever is easiest to review. I suspect that smaller patches, each containing a single new API, will be simpler to review, but that's just my opinion. I prefer this way also. Thanks, Li Wei Rich. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list . -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [RFC][scale] new API for querying domains stats
On 08/04/2014 11:46 PM, Li Wei wrote:

How about the following API:

  int virConnectGetAllBlockStats(virConnectPtr conn,
                                 virDomainPtr domain,
                                 virDomainBlockBulkStatsPtr *stats,
                                 unsigned int flags);

  @conn: pointer to libvirt connection
  @domain: pointer to the domain to be queried, NULL for all domains
  @stats: array of virDomainBlockBulkStats structs (see below) to be populated
  @flags: filter flags

Because block stats are only valid for active domains, maybe this filter flag can be removed.

No, keep the flag. It is still useful to filter on at least transient vs. persistent.

-- 
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
Re: [libvirt] [RFC][scale] new API for querying domains stats
On Mon, Aug 04, 2014 at 11:38:41AM +0800, Li Wei wrote: Hi, On 07/22/2014 03:25 PM, Richard W.M. Jones wrote: Did anything come of this discussion, and/or is someone working on this? I am working on an API to query block stats in a bulk style and proposed an API as follow: virDomainBlockStatsBulkFlags(virDomainPtr dom, virTypedParameterPtr params, int nparams, int ndisks, unsigned int flags) @dom: pointer to domain object @params: an array of typed param to be populated with block stats @nparams: how many params used for each block device @ndisks: how many block devices to query @flags: flags to filter block devices (not used for now) Returns -1 in case of error, 0 in case of success. with params == NULL, nparams == -1, ndisks == 1, return number of params for each block device. with params == NULL, nparams == -1, ndisks == -1, return number of disks in the domain. A typical usage of this API should be: nparams = virDomainBlockStatsBulkFlags(dom, NULL, -1, 1, 0); ndisks = virDomainBlockStatsBulkFlags(dom, NULL, -1, -1, 0); params = VIR_ALLOC_N(params, nparams * ndisks); ret = virDomainBlockStatsBulkFlags(dom, params, nparams, ndisks, 0); ... do something with params VIR_FREE(params); With this bulk API, virt-top can updates in a short interval for a domain with a lot of disks. Any comments? I think this works OK for the case where you have 1 domains with lots of disks. However if you have a large number of domains each with 1 or 2 disks I think you would have the same problem as currently. Is it possible to design an API that can work across all domains in a single call? PS: It seems we need a bunch of bulk APIs to query stats, I wonder if I can submit a patchset for each bulk API or must supply all the bulk APIs in one patchset? Whichever is easiest to review. I suspect that smaller patches, each containing a single new API, will be simpler to review, but that's just my opinion. Rich. 
-- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-df lists disk usage of guests without needing to install any software inside the virtual machine. Supports Linux and Windows. http://people.redhat.com/~rjones/virt-df/
Re: [libvirt] [RFC][scale] new API for querying domains stats
Hi Richard, Thanks for your comment! On 08/04/2014 04:39 PM, Richard W.M. Jones wrote: On Mon, Aug 04, 2014 at 11:38:41AM +0800, Li Wei wrote: Hi, On 07/22/2014 03:25 PM, Richard W.M. Jones wrote: Did anything come of this discussion, and/or is someone working on this? I am working on an API to query block stats in a bulk style and proposed an API as follow: virDomainBlockStatsBulkFlags(virDomainPtr dom, virTypedParameterPtr params, int nparams, int ndisks, unsigned int flags) @dom: pointer to domain object @params: an array of typed param to be populated with block stats @nparams: how many params used for each block device @ndisks: how many block devices to query @flags: flags to filter block devices (not used for now) Returns -1 in case of error, 0 in case of success. with params == NULL, nparams == -1, ndisks == 1, return number of params for each block device. with params == NULL, nparams == -1, ndisks == -1, return number of disks in the domain. A typical usage of this API should be: nparams = virDomainBlockStatsBulkFlags(dom, NULL, -1, 1, 0); ndisks = virDomainBlockStatsBulkFlags(dom, NULL, -1, -1, 0); params = VIR_ALLOC_N(params, nparams * ndisks); ret = virDomainBlockStatsBulkFlags(dom, params, nparams, ndisks, 0); ... do something with params VIR_FREE(params); With this bulk API, virt-top can updates in a short interval for a domain with a lot of disks. Any comments? I think this works OK for the case where you have 1 domains with lots of disks. However if you have a large number of domains each with 1 or 2 disks I think you would have the same problem as currently. Yes, it is. Is it possible to design an API that can work across all domains in a single call? 
How about the following API:

  int virConnectGetAllBlockStats(virConnectPtr conn,
                                 virDomainPtr domain,
                                 virDomainBlockBulkStatsPtr *stats,
                                 unsigned int flags);

  @conn: pointer to libvirt connection
  @domain: pointer to the domain to be queried, NULL for all domains
  @stats: array of virDomainBlockBulkStats structs (see below) to be populated
  @flags: filter flags

Returns the number of virDomainBlockBulkStats populated.

where virDomainBlockBulkStats is defined as:

  struct _virDomainBlockBulkStats {
      virDomainPtr domain;          /* domain the block stats belong to */
      virTypedParameterPtr params;  /* params to store block stats */
      unsigned int nparams;         /* how many params are used for each block stats */
      unsigned int ndisks;          /* how many block stats in this domain */
  };

Note:
1. Because the API allocates memory to store the stats, the caller needs to free it after use.
2. To distinguish the block stats of each device in a domain, we need to use a param to store the block device name.

PS: It seems we need a bunch of bulk APIs to query stats, I wonder if I can submit a patchset for each bulk API or must supply all the bulk APIs in one patchset?

Whichever is easiest to review. I suspect that smaller patches, each containing a single new API, will be simpler to review, but that's just my opinion.

I prefer this way also.

Thanks,
Li Wei

Rich.
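The ownership rule in the two notes above (the API allocates, the caller frees, and each disk is identified by a name param) can be sketched as follows. This is only an illustration under stated assumptions: virConnectGetAllBlockStats is a proposal in this thread, so the sketch uses hypothetical stand-ins (`mock_param`, `mock_bulk_stats`, `mock_get_all_block_stats`, `mock_free_all_block_stats`) with fabricated counter values, and an integer id in place of virDomainPtr.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for virTypedParameter. */
typedef struct {
    char field[16];      /* parameter name, e.g. "name" or "rd_bytes" */
    char str[16];        /* string payload (block device name) */
    long long value;     /* numeric payload (counter) */
} mock_param;

/* Same layout idea as the proposed struct _virDomainBlockBulkStats. */
typedef struct {
    int domain_id;          /* stands in for virDomainPtr */
    mock_param *params;     /* nparams * ndisks entries */
    unsigned int nparams;   /* params per disk */
    unsigned int ndisks;    /* disks in this domain */
} mock_bulk_stats;

/* Caller-side cleanup: free every per-domain array, then the outer array. */
void mock_free_all_block_stats(mock_bulk_stats *stats, int count)
{
    if (stats == NULL)
        return;
    for (int i = 0; i < count; i++)
        free(stats[i].params);
    free(stats);
}

/* Allocates and fills *stats; returns the number of entries or -1. */
int mock_get_all_block_stats(int ndomains, unsigned int ndisks,
                             mock_bulk_stats **stats)
{
    assert(stats != NULL && ndomains > 0 && ndisks > 0 && ndisks <= 26);

    mock_bulk_stats *out = calloc((size_t)ndomains, sizeof(*out));
    if (out == NULL)
        return -1;

    for (int i = 0; i < ndomains; i++) {
        out[i].domain_id = i + 1;
        out[i].nparams = 2;
        out[i].ndisks = ndisks;
        out[i].params = calloc((size_t)ndisks * 2, sizeof(mock_param));
        if (out[i].params == NULL) {
            mock_free_all_block_stats(out, ndomains);
            return -1;
        }
        for (unsigned int d = 0; d < ndisks; d++) {
            mock_param *p = &out[i].params[d * 2];
            /* First param of each disk carries the device name (note 2). */
            snprintf(p[0].field, sizeof(p[0].field), "name");
            snprintf(p[0].str, sizeof(p[0].str), "vd%c", (char)('a' + d));
            snprintf(p[1].field, sizeof(p[1].field), "rd_bytes");
            p[1].value = 100LL * (i + 1) + d;   /* fabricated demo value */
        }
    }
    *stats = out;
    return ndomains;
}

/* Example consumer: sum the rd_bytes counter over all domains and disks. */
long long sum_rd_bytes(const mock_bulk_stats *stats, int count)
{
    long long total = 0;
    for (int i = 0; i < count; i++)
        for (unsigned int d = 0; d < stats[i].ndisks; d++)
            total += stats[i].params[d * stats[i].nparams + 1].value;
    return total;
}
```

A caller would fetch once per polling cycle, walk the array, and then call the free helper. A single matching free function keeps the "caller needs to free it" rule from note 1 to one call.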
Re: [libvirt] [RFC][scale] new API for querying domains stats
On 08/05/2014 01:36 PM, Li Wei wrote: Hi Richard, Thanks for your comment! On 08/04/2014 04:39 PM, Richard W.M. Jones wrote: On Mon, Aug 04, 2014 at 11:38:41AM +0800, Li Wei wrote: Hi, On 07/22/2014 03:25 PM, Richard W.M. Jones wrote: Did anything come of this discussion, and/or is someone working on this? I am working on an API to query block stats in a bulk style and proposed an API as follow: virDomainBlockStatsBulkFlags(virDomainPtr dom, virTypedParameterPtr params, int nparams, int ndisks, unsigned int flags) @dom: pointer to domain object @params: an array of typed param to be populated with block stats @nparams: how many params used for each block device @ndisks: how many block devices to query @flags: flags to filter block devices (not used for now) Returns -1 in case of error, 0 in case of success. with params == NULL, nparams == -1, ndisks == 1, return number of params for each block device. with params == NULL, nparams == -1, ndisks == -1, return number of disks in the domain. A typical usage of this API should be: nparams = virDomainBlockStatsBulkFlags(dom, NULL, -1, 1, 0); ndisks = virDomainBlockStatsBulkFlags(dom, NULL, -1, -1, 0); params = VIR_ALLOC_N(params, nparams * ndisks); ret = virDomainBlockStatsBulkFlags(dom, params, nparams, ndisks, 0); ... do something with params VIR_FREE(params); With this bulk API, virt-top can updates in a short interval for a domain with a lot of disks. Any comments? I think this works OK for the case where you have 1 domains with lots of disks. However if you have a large number of domains each with 1 or 2 disks I think you would have the same problem as currently. Yes, it is. Is it possible to design an API that can work across all domains in a single call? 
How about the following API: int virConnectGetAllBlockStats(virConnectPtr conn, virDomainPtr domain, virDomainBlockBulkStatsPtr *stats, unsigned int flags); @conn: pointer to libvirt connection @domain: pointer to the domain to be queried, NULL for all domains @stats: array of virDomainBlockBulkStats struct(see below) to be populated @flags: filter flags Because block stats only valid for active domains, may be this filter flag can be remove. Thanks. Return the number of virDomainBlockBulkStats populated. where virDomainBlockBulkStats defined as: struct _virDomainBlockBulkStats { virDomainPtr domain; /* domain the block stats belongs to */ virTypedParameterPtr params; /* params to store block stats */ unsigned int nparams; /* how many params used for each block stats */ unsigned int ndisks; /* how many block stats in this domain */ }; Note: 1. because the API allocate memory to store stats, the caller need to free it after use. 2. to distinguish each block stats in a domain, we need use a param to store block device name. PS: It seems we need a bunch of bulk APIs to query stats, I wonder if I can submit a patchset for each bulk API or must supply all the bulk APIs in one patchset? Whichever is easiest to review. I suspect that smaller patches, each containing a single new API, will be simpler to review, but that's just my opinion. I prefer this way also. Thanks, Li Wei Rich. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list . -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [RFC][scale] new API for querying domains stats
Hi,

On 07/22/2014 03:25 PM, Richard W.M. Jones wrote:

Did anything come of this discussion, and/or is someone working on this?

I am working on an API to query block stats in bulk, and propose the following API:

  virDomainBlockStatsBulkFlags(virDomainPtr dom,
                               virTypedParameterPtr params,
                               int nparams,
                               int ndisks,
                               unsigned int flags)

  @dom: pointer to domain object
  @params: an array of typed params to be populated with block stats
  @nparams: how many params are used for each block device
  @ndisks: how many block devices to query
  @flags: flags to filter block devices (not used for now)

Returns -1 in case of error, 0 in case of success.

With params == NULL, nparams == -1, ndisks == 1, returns the number of params for each block device.
With params == NULL, nparams == -1, ndisks == -1, returns the number of disks in the domain.

A typical usage of this API would be:

  nparams = virDomainBlockStatsBulkFlags(dom, NULL, -1, 1, 0);
  ndisks = virDomainBlockStatsBulkFlags(dom, NULL, -1, -1, 0);
  if (VIR_ALLOC_N(params, nparams * ndisks) < 0)
      ... handle allocation failure
  ret = virDomainBlockStatsBulkFlags(dom, params, nparams, ndisks, 0);
  ... do something with params
  VIR_FREE(params);

With this bulk API, virt-top can update at a short interval for a domain with a lot of disks.

Any comments?

PS: It seems we need a bunch of bulk APIs to query stats, I wonder if I can submit a patchset for each bulk API or must supply all the bulk APIs in one patchset?

Thanks,
Li Wei

Rich.
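The query-sizes-then-fill convention proposed above can be sketched as below. Since virDomainBlockStatsBulkFlags is only a proposal here, the sketch assumes a hypothetical stand-in, `mock_block_stats_bulk`, operating on a fake in-memory domain, and uses plain calloc/free rather than libvirt's internal allocation macros. All `mock_` names and the counter values are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for virTypedParameter: a named 64-bit counter. */
typedef struct {
    char field[32];
    long long value;
} mock_param;

/* Hypothetical fake domain: just a disk count. */
typedef struct {
    int ndisks;
} mock_domain;

#define MOCK_NPARAMS 2   /* "rd_bytes" and "wr_bytes" per disk (assumed) */

/*
 * Mirrors the proposed calling convention:
 *   params == NULL, nparams == -1, ndisks ==  1 -> params per disk
 *   params == NULL, nparams == -1, ndisks == -1 -> number of disks
 *   otherwise fill params[] and return 0, or -1 on error.
 */
int mock_block_stats_bulk(const mock_domain *dom, mock_param *params,
                          int nparams, int ndisks, unsigned int flags)
{
    if (dom == NULL || flags != 0)
        return -1;
    if (params == NULL && nparams == -1)
        return ndisks == -1 ? dom->ndisks : (ndisks == 1 ? MOCK_NPARAMS : -1);
    if (params == NULL || nparams != MOCK_NPARAMS || ndisks != dom->ndisks)
        return -1;

    for (int d = 0; d < ndisks; d++) {
        mock_param *p = &params[d * nparams];
        strcpy(p[0].field, "rd_bytes");
        p[0].value = 1000LL * (d + 1);   /* fabricated demo values */
        strcpy(p[1].field, "wr_bytes");
        p[1].value = 2000LL * (d + 1);
    }
    return 0;
}

/* Caller side: query sizes, allocate once, fetch, sum rd_bytes, free. */
long long total_bytes_read(const mock_domain *dom)
{
    assert(dom != NULL);
    int nparams = mock_block_stats_bulk(dom, NULL, -1, 1, 0);
    int ndisks = mock_block_stats_bulk(dom, NULL, -1, -1, 0);
    if (nparams <= 0 || ndisks <= 0)
        return -1;

    mock_param *params = calloc((size_t)nparams * ndisks, sizeof(*params));
    if (params == NULL)
        return -1;

    long long total = -1;
    if (mock_block_stats_bulk(dom, params, nparams, ndisks, 0) == 0) {
        total = 0;
        for (int d = 0; d < ndisks; d++)
            total += params[d * nparams].value;
    }
    free(params);
    return total;
}
```

The sketch makes one property of the convention visible: a single return value has to carry an error code, a success code and two different counts, so the caller must check each size query before trusting it.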
Re: [libvirt] [RFC][scale] new API for querying domains stats
Did anything come of this discussion, and/or is someone working on this? Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-top is 'top' for virtual machines. Tiny program with many powerful monitoring features, net stats, disk stats, logging, etc. http://people.redhat.com/~rjones/virt-top
Re: [libvirt] [RFC][scale] new API for querying domains stats
- Original Message -
From: Francesco Romani from...@redhat.com
To: libvir-list@redhat.com
Sent: Friday, July 4, 2014 6:44:07 PM
Subject: Re: [libvirt] [RFC][scale] new API for querying domains stats

However, a question here about bulk APIs. One cornerstone of oVirt is shared storage (NFS, iSCSI...); another is qemu/kvm, and COW images are supported (probably even the default, need to check). Due to storage being unavailable because of a network outage, it happened that virDomainGetBlockInfo blocked beyond recovery. In such scenarios, how will a bulk API behave? Will there be a timeout or something else?

It depends on the storage and the way it is configured. If NFS is mounted with 'hard' + 'nointr' any call libvirt makes to dead storage will get stuck in an uninterruptible sleep in kernel space. There's no way for libvirt to time out since by the very definition of the 'hard' mount option it does not time out. If you mount with 'soft' then the calls libvirt makes will time out.

My bad, I worded my question poorly. What I mean is: on top of what the kernel or QEMU (libnfs, libiscsi) does, are there plans for any additional mechanism/safeguard? (I guess no, I'm asking just to be sure).

Hi,

maybe borderline off-topic, but still about blocking calls:

We (VDSM/oVirt developers) are reviewing our usage of libvirt in sampling. After a (quick) inspection of the code, I believe the following calls cannot block due to FS/storage issues, as they do not need it in any way.

I'm quite confident about these:
* virDomainGetCPUStats: uses cgroups only (no FS/storage access)
* virDomainInterfaceStats: uses /proc/net/dev (no FS/storage access)
* virDomainGetVcpus: uses /proc and a syscall for PCPU affinity (no FS/storage access)
* virDomainGetSchedulerParameters: uses cgroups (no FS/storage access)

Not sure about these, but it looks to me like they don't need to access FS/storage either:
* virDomainGetVcpusFlags
* virDomainGetMetadata

Can anyone please confirm or deny?
Thanks and best regards

-- 
Francesco Romani
RedHat Engineering Virtualization R&D
Phone: 8261328
IRC: fromani
Re: [libvirt] [RFC][scale] new API for querying domains stats
On Wed, Jul 09, 2014 at 06:14:12AM -0400, Francesco Romani wrote: - Original Message - From: Francesco Romani from...@redhat.com To: libvir-list@redhat.com Sent: Friday, July 4, 2014 6:44:07 PM Subject: Re: [libvirt] [RFC][scale] new API for querying domains stats However, a question here about bulk APIs. One cornerstone of oVirt is shared storage (NFS, ISCSI...); another is qemu/kvm, and COW images are supported (probably even the default, need to check). Due to storage being unavailable because a network outage, it happened that virDomainGetBlockInfo blocked beyond recover. On such scenarios, how will a bulk API behave? There will be a timeout or something else? It depends on the storage and the way it is configured. If NFS is mounted with 'hard' + 'nointr' any call libvirt makes to dead storage will get stuck in an uninterruptable sleep in kernel space. There's no way for libvirt to time out since by the very definition of 'hard' mount option it does not time out. If you mount with 'soft' then the calls libvirt makes will time out. My bad, I worded poorly my question. What I mean is: on top of what the kernel or QEMU (libnfs, libiscsi) does, there are plans for any additional mechanism/safeguard? (I guess no, I'm asking just to be sure). Hi, maybe borderline offtopic, but still about blocking calls: We (VDSM/oVirt developers) are reviewing our usage of libvirt in sampling. 
After a (quick) inspection of the code, I believe the following calls cannot block due to FS/storage issues, as they do not need it in any way.

I'm quite confident about these:
* virDomainGetCPUStats: uses cgroups only (no FS/storage access)
* virDomainInterfaceStats: uses /proc/net/dev (no FS/storage access)
* virDomainGetVcpus: uses /proc and a syscall for PCPU affinity (no FS/storage access)
* virDomainGetSchedulerParameters: uses cgroups (no FS/storage access)

Not sure about these, but it looks to me like they don't need to access FS/storage either:
* virDomainGetVcpusFlags
* virDomainGetMetadata

Can anyone please confirm or deny?

If there is a prior call to libvirt that involves that guest domain and has blocked on storage, then this can prevent subsequent calls from completing, since the prior call may hold a lock.

Regards,
Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Re: [libvirt] [RFC][scale] new API for querying domains stats
On Thu, Jul 03, 2014 at 01:49:41PM -0600, Eric Blake wrote:

On 07/01/2014 03:33 AM, Daniel P. Berrange wrote:

  1. Time to write() the RPC call to the socket
  2. Time for libvirtd to process the RPC call
  3. Time to recv() the RPC reply from the socket

...and so on..

If the time for item 2 dominates over the time for items 1 & 3 (which it should really) then the client thread is going to be sleeping in a poll() for the bulk of the duration of the libvirt API call. If we had an async API mechanism, then the VDSM time would essentially be consumed with

  1. Time to write() the RPC call to the socket
  2. Time to write() the RPC call to the socket
  3. Time to write() the RPC call to the socket
  4. Time to write() the RPC call to the socket
  5. Time to write() the RPC call to the socket
  6. Time to write() the RPC call to the socket
  7. wait for replies to start arriving
  8. Time to recv() the RPC reply from the socket
  9. Time to recv() the RPC reply from the socket
  10. Time to recv() the RPC reply from the socket
  11. Time to recv() the RPC reply from the socket
  12. Time to recv() the RPC reply from the socket
  13. Time to recv() the RPC reply from the socket
  14. Time to recv() the RPC reply from the socket

This assumes you are still calling one async call per domain query. With regards to a bulk API, are you thinking synchronous?

  1. Time to write() the RPC call - one bulk request
  2. wait for reply - oh, and we'd better increase our on-wire size limits
  3. Time to recv() the RPC reply - one bulk response

or asynchronous?

  1. Time to write() the RPC call - one bulk request
  2. wait for replies to start arriving
  3. Time to recv() an RPC async reply - first domain
  4. Time to recv() an RPC async reply - second domain
  ...
  n. Time to recv() final RPC async reply

The asynchronous variant works nicely in that we don't have to size up our max RPC on-wire limits, but implies that you still need a callback invoked once per reply received, instead of getting all data back in one giant memory blob.
I was thinking the former actually, but the latter is another possibility to consider I guess. Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [RFC][scale] new API for querying domains stats
On Tue, Jul 01, 2014 at 03:09:13AM -0400, Francesco Romani wrote: I'd like to discuss possible APIs and plans for new query APIs in libvirt. I'm one of the oVirt (http://www.ovirt.org) developers, and I write code for VDSM; VDSM is the node management daemon, which is in charge, among many other things, to gather the host and statistics per Domain/VM. Right now we aim for a number of VM per node in the (few) hundreds, but we have big plans to scale much more, and to possibly reach thousands in a not so distant future. At the moment, we use one thread per VM to gather the VM stats (CPU, network, disk), and of course this obviously scales poorly. I'll just note here that a bug has been opened for virt-top, which is similar to this. If a domain has a large number of disks (256 virtio-scsi disks in the customer's case), then virt-top spends so long fetching the data for each separate disk, it can take 30-40 seconds between updates. The same thing would happen if you had lots of domains, each with a few disks, but with the total adding up to hundreds of disks. The same thing would happen if you substitute network interfaces for disks. What would help for us: - A way to get information for multiple objects in a single domain - A way to get information for multiple objects across multiple domains in as few API round trips as possible. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-df lists disk usage of guests without needing to install any software inside the virtual machine. Supports Linux and Windows. http://people.redhat.com/~rjones/virt-df/ -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [RFC][scale] new API for querying domains stats
On Tue, Jul 01, 2014 at 09:35:21AM +0100, Daniel P. Berrange wrote:

For the async API design, I could see two potential designs

1. A custom callback to run per API

  typedef void (*virDomainBlockInfoCallback)(virDomainPtr dom,
                                             bool isError,
                                             virDomainBlockInfoPtr info,
                                             void *opaque);

  int virDomainGetBlockInfoAsync(virDomainPtr dom,
                                 const char *disk,
                                 virDomainBlockInfoCallback cb,
                                 void *opaque,
                                 unsigned int flags);

2. A standard callback and a pair of APIs

  typedef void *virDomainAsyncResult;
  typedef void (*virDomainAsyncCallback)(virDomainPtr dom,
                                         virDomainAsyncResult res);

  void virDomainGetBlockInfoAsync(virDomainPtr dom,
                                  const char *disk,
                                  virDomainAsyncCallback cb,
                                  void *opaque,
                                  unsigned int flags);

  int virDomainGetBlockInfoFinish(virDomainPtr dom,
                                  virDomainAsyncResult res,
                                  virDomainBlockInfoPtr info);

Could we consider an API which worked across all active domains?

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines. Tiny program with many powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top
Re: [libvirt] [RFC][scale] new API for querying domains stats
- Original Message - From: Richard W.M. Jones rjo...@redhat.com To: Francesco Romani from...@redhat.com Cc: libvir-list@redhat.com Sent: Friday, July 4, 2014 1:11:54 PM Subject: Re: [libvirt] [RFC][scale] new API for querying domains stats Right now we aim for a number of VM per node in the (few) hundreds, but we have big plans to scale much more, and to possibly reach thousands in a not so distant future. At the moment, we use one thread per VM to gather the VM stats (CPU, network, disk), and of course this obviously scales poorly. I'll just note here that a bug has been opened for virt-top, which is similar to this. If a domain has a large number of disks (256 virtio-scsi disks in the customer's case), then virt-top spends so long fetching the data for each separate disk, it can take 30-40 seconds between updates. The same thing would happen if you had lots of domains, each with a few disks, but with the total adding up to hundreds of disks. The same thing would happen if you substitute network interfaces for disks. What would help for us: - A way to get information for multiple objects in a single domain - A way to get information for multiple objects across multiple domains in as few API round trips as possible. I concur. Actually you also expressed our (VDSM) need better than I did. I think we are on the same boat. Bests, -- Francesco Romani RedHat Engineering Virtualization R D Phone: 8261328 IRC: fromani -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [RFC][scale] new API for querying domains stats
On Fri, Jul 04, 2014 at 12:14:06PM +0100, Richard W.M. Jones wrote:
> On Tue, Jul 01, 2014 at 09:35:21AM +0100, Daniel P. Berrange wrote:
>> For the async API design, I could see two potential designs
>>
>> 1. A custom callback to run per API
>>
>>    typedef void (*virDomainBlockInfoCallback)(virDomainPtr dom, bool isError, virDomainBlockInfoPtr info, void *opaque);
>>
>>    int virDomainGetBlockInfoAsync(virDomainPtr dom, const char *disk, virDomainBlockInfoCallback cb, void *opaque, unsigned int flags);
>>
>> 2. A standard callback and a pair of APIs
>>
>>    typedef void *virDomainAsyncResult;
>>    typedef void (*virDomainAsyncCallback)(virDomainPtr dom, virDomainAsyncResult res);
>>
>>    void virDomainGetBlockInfoAsync(virDomainPtr dom, const char *disk, virDomainAsyncCallback cb, void *opaque, unsigned int flags);
>>
>>    int virDomainGetBlockInfoFinish(virDomainPtr dom, virDomainAsyncResult res, virDomainBlockInfoPtr info);
>
> Could we consider an API which worked across all active domains?

Of course. I was intentionally ignoring the bulk API side of the request in this example, to just focus on the illustration of some general patterns for providing an async API design.

Regards,
Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Re: [libvirt] [RFC][scale] new API for querying domains stats
On Fri, Jul 04, 2014 at 12:11:54PM +0100, Richard W.M. Jones wrote:
> On Tue, Jul 01, 2014 at 03:09:13AM -0400, Francesco Romani wrote:
>> I'd like to discuss possible APIs and plans for new query APIs in libvirt.
>>
>> I'm one of the oVirt (http://www.ovirt.org) developers, and I write code for VDSM; VDSM is the node management daemon, which is in charge, among many other things, of gathering the host statistics and the statistics per Domain/VM.
>>
>> Right now we aim for a number of VMs per node in the (few) hundreds, but we have big plans to scale much more, and to possibly reach thousands in the not so distant future. At the moment, we use one thread per VM to gather the VM stats (CPU, network, disk), and of course this scales poorly.
>
> I'll just note here that a bug has been opened for virt-top, which is similar to this. If a domain has a large number of disks (256 virtio-scsi disks in the customer's case), then virt-top spends so long fetching the data for each separate disk, it can take 30-40 seconds between updates.
>
> The same thing would happen if you had lots of domains, each with a few disks, but with the total adding up to hundreds of disks. The same thing would happen if you substitute network interfaces for disks.
>
> What would help for us:
>
> - A way to get information for multiple objects in a single domain
> - A way to get information for multiple objects across multiple domains

I'd say that we want something similar to the virConnectListAllDomains() API for stats. I.e. we shouldn't try to pass in the full list of domains or paths we want info for. We should just list all domains, optionally using flags to filter based on some characteristic, e.g. exclude inactive. Similarly, always list stats for all disks.

Regards,
Daniel
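The "list everything, filter with flags" shape suggested above can be sketched in a few lines of C. This is only an illustration under stated assumptions: every name in it (`demo_domain`, `demo_list_all_stats`, `DEMO_STATS_ACTIVE_ONLY`) is hypothetical and not part of libvirt, and the data is an in-memory table rather than anything fetched over RPC.

```c
#include <stddef.h>

/* Hypothetical stand-ins; none of these names are libvirt API. */
enum { DEMO_STATS_ACTIVE_ONLY = 1u << 0 };

struct demo_domain {
    const char *name;
    int active;                   /* 1 if the domain is running */
    unsigned long long rd_bytes;  /* bytes read, summed over all disks */
};

/* Copy the record of every domain that passes the flag filter into out[].
 * The caller never names individual domains or disks; it only narrows
 * the full set with flags.
 * Returns the number of records written, or -1 on bad arguments or if
 * out[] is too small. */
int demo_list_all_stats(const struct demo_domain *all, size_t nall,
                        struct demo_domain *out, size_t nout,
                        unsigned int flags)
{
    size_t n = 0;

    if (!all || !out)
        return -1;

    for (size_t i = 0; i < nall; i++) {
        if ((flags & DEMO_STATS_ACTIVE_ONLY) && !all[i].active)
            continue;             /* filtered out: inactive domain */
        if (n == nout)
            return -1;            /* caller's buffer is too small */
        out[n++] = all[i];
    }
    return (int)n;
}
```

With three domains of which one is inactive, passing `DEMO_STATS_ACTIVE_ONLY` yields two records, while passing `0` yields all three. A client such as virt-top would make one such call per refresh instead of one call per disk.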
Re: [libvirt] [RFC][scale] new API for querying domains stats
On Fri, Jul 04, 2014 at 12:33:27PM +0100, Daniel P. Berrange wrote:
> On Fri, Jul 04, 2014 at 12:11:54PM +0100, Richard W.M. Jones wrote:
>> On Tue, Jul 01, 2014 at 03:09:13AM -0400, Francesco Romani wrote:
>>> I'd like to discuss possible APIs and plans for new query APIs in libvirt.
>>>
>>> I'm one of the oVirt (http://www.ovirt.org) developers, and I write code for VDSM; VDSM is the node management daemon, which is in charge, among many other things, of gathering the host statistics and the statistics per Domain/VM.
>>>
>>> Right now we aim for a number of VMs per node in the (few) hundreds, but we have big plans to scale much more, and to possibly reach thousands in the not so distant future. At the moment, we use one thread per VM to gather the VM stats (CPU, network, disk), and of course this scales poorly.
>>
>> I'll just note here that a bug has been opened for virt-top, which is similar to this. If a domain has a large number of disks (256 virtio-scsi disks in the customer's case), then virt-top spends so long fetching the data for each separate disk, it can take 30-40 seconds between updates.
>>
>> The same thing would happen if you had lots of domains, each with a few disks, but with the total adding up to hundreds of disks. The same thing would happen if you substitute network interfaces for disks.
>>
>> What would help for us:
>>
>> - A way to get information for multiple objects in a single domain
>> - A way to get information for multiple objects across multiple domains
>
> I'd say that we want something similar to the virConnectListAllDomains() API for stats. I.e. we shouldn't try to pass in the full list of domains or paths we want info for. We should just list all domains, optionally using flags to filter based on some characteristic, e.g. exclude inactive. Similarly, always list stats for all disks.

FYI for virt-top we only care about stats of all active domains, and we only care about all disks and all network interfaces for domains (i.e. never any subset). We also collect CPU time and memory usage per domain.

Of course this only applies to virt-top, not to other clients.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines. Boot with a live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v
Re: [libvirt] [RFC][scale] new API for querying domains stats
----- Original Message -----
> From: Richard W.M. Jones rjo...@redhat.com
> To: Daniel P. Berrange berra...@redhat.com
> Cc: libvir-list@redhat.com, Francesco Romani from...@redhat.com
> Sent: Friday, July 4, 2014 1:39:57 PM
> Subject: Re: [libvirt] [RFC][scale] new API for querying domains stats

>>> What would help for us:
>>>
>>> - A way to get information for multiple objects in a single domain
>>> - A way to get information for multiple objects across multiple domains
>>
>> I'd say that we want something similar to the virConnectListAllDomains() API for stats. I.e. we shouldn't try to pass in the full list of domains or paths we want info for. We should just list all domains, optionally using flags to filter based on some characteristic, e.g. exclude inactive. Similarly, always list stats for all disks.
>
> FYI for virt-top we only care about stats of all active domains, and we only care about all disks and all network interfaces for domains (i.e. never any subset). We also collect CPU time and memory usage per domain.

It is the same for VDSM. VDSM takes ownership of all the domains on a host, so it never does any kind of filtering or considers subsets of any kind.

However, a question here about bulk APIs. One cornerstone of oVirt is shared storage (NFS, iSCSI...); another is qemu/kvm, and COW images are supported (probably even the default, need to check).

Due to storage being unavailable because of a network outage, it has happened that virDomainGetBlockInfo blocked beyond recovery. In such scenarios, how will a bulk API behave? Will there be a timeout or something else?

-- 
Francesco Romani
Re: [libvirt] [RFC][scale] new API for querying domains stats
On Fri, Jul 04, 2014 at 12:13:32PM -0400, Francesco Romani wrote:
> ----- Original Message -----
>> From: Richard W.M. Jones rjo...@redhat.com
>> To: Daniel P. Berrange berra...@redhat.com
>> Cc: libvir-list@redhat.com, Francesco Romani from...@redhat.com
>> Sent: Friday, July 4, 2014 1:39:57 PM
>> Subject: Re: [libvirt] [RFC][scale] new API for querying domains stats
>
>>>> What would help for us:
>>>>
>>>> - A way to get information for multiple objects in a single domain
>>>> - A way to get information for multiple objects across multiple domains
>>>
>>> I'd say that we want something similar to the virConnectListAllDomains() API for stats. I.e. we shouldn't try to pass in the full list of domains or paths we want info for. We should just list all domains, optionally using flags to filter based on some characteristic, e.g. exclude inactive. Similarly, always list stats for all disks.
>>
>> FYI for virt-top we only care about stats of all active domains, and we only care about all disks and all network interfaces for domains (i.e. never any subset). We also collect CPU time and memory usage per domain.
>
> It is the same for VDSM. VDSM takes ownership of all the domains on a host, so it never does any kind of filtering or considers subsets of any kind.
>
> However, a question here about bulk APIs. One cornerstone of oVirt is shared storage (NFS, iSCSI...); another is qemu/kvm, and COW images are supported (probably even the default, need to check).
>
> Due to storage being unavailable because of a network outage, it has happened that virDomainGetBlockInfo blocked beyond recovery. In such scenarios, how will a bulk API behave? Will there be a timeout or something else?

It depends on the storage and the way it is configured. If NFS is mounted with 'hard' + 'nointr', any call libvirt makes to dead storage will get stuck in an uninterruptible sleep in kernel space. There's no way for libvirt to time out, since by the very definition of the 'hard' mount option it does not time out. If you mount with 'soft' then the calls libvirt makes will time out.

Regards,
Daniel
Re: [libvirt] [RFC][scale] new API for querying domains stats
----- Original Message -----
> From: Daniel P. Berrange berra...@redhat.com
> To: Francesco Romani from...@redhat.com
> Cc: libvir-list@redhat.com, Richard W.M. Jones rjo...@redhat.com
> Sent: Friday, July 4, 2014 6:21:30 PM
> Subject: Re: [libvirt] [RFC][scale] new API for querying domains stats

>> However, a question here about bulk APIs. One cornerstone of oVirt is shared storage (NFS, iSCSI...); another is qemu/kvm, and COW images are supported (probably even the default, need to check).
>>
>> Due to storage being unavailable because of a network outage, it has happened that virDomainGetBlockInfo blocked beyond recovery. In such scenarios, how will a bulk API behave? Will there be a timeout or something else?
>
> It depends on the storage and the way it is configured. If NFS is mounted with 'hard' + 'nointr', any call libvirt makes to dead storage will get stuck in an uninterruptible sleep in kernel space. There's no way for libvirt to time out, since by the very definition of the 'hard' mount option it does not time out. If you mount with 'soft' then the calls libvirt makes will time out.

My bad, I worded my question poorly. What I mean is: on top of what the kernel or QEMU (libnfs, libiscsi) does, are there plans for any additional mechanism/safeguard? (I guess no, I'm asking just to be sure.)

VDSM already uses soft mount for NFS (need to check what we do for iSCSI and the other supported storage).

Thanks and bests,

-- 
Francesco Romani
Re: [libvirt] [RFC][scale] new API for querying domains stats
On 07/01/2014 02:35 AM, Daniel P. Berrange wrote:
> 1. A custom callback to run per API
>
>    typedef void (*virDomainBlockInfoCallback)(virDomainPtr dom, bool isError, virDomainBlockInfoPtr info, void *opaque);

It might be nice to require the callback to return an int; 0 to keep going, non-zero to stop immediately.

>    int virDomainGetBlockInfoAsync(virDomainPtr dom, const char *disk, virDomainBlockInfoCallback cb, void *opaque, unsigned int flags);

What should this function return on success, 0 or the number of times the callback was reached? However, even if we add a callback return value (non-zero to quit immediately), I don't think feeding it directly to the return value is nice; we still want to reserve negative values for errors (couldn't even invoke callbacks, perhaps because dom was a bad pointer). Besides, a user can always use opaque to collect counts of how many times the callback was invoked, and/or a specific return value on early exit.

-- 
Eric Blake   eblake redhat com   +1-919-301-3266
Libvirt virtualization library http://libvirt.org
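The return-value convention discussed above (callback returns 0 to continue and non-zero to stop, the driving function reserves negative values for "could not invoke callbacks at all", and the caller counts invocations through `opaque`) can be sketched as follows. All names are hypothetical and chosen for illustration; the loop is synchronous so the example stays self-contained.

```c
#include <stddef.h>

/* Hypothetical callback type: return 0 to keep going, non-zero to stop. */
typedef int (*demo_disk_cb)(const char *disk, void *opaque);

/* Invoke cb once per disk name.
 * Returns 0 on success, including an early stop requested by the
 * callback, and -1 only when no callback could be invoked at all.
 * The callback's own return value is deliberately not propagated. */
int demo_for_each_disk(const char *const *disks, size_t ndisks,
                       demo_disk_cb cb, void *opaque)
{
    if (!disks || !cb)
        return -1;

    for (size_t i = 0; i < ndisks; i++) {
        if (cb(disks[i], opaque) != 0)
            break;                /* callback asked to stop */
    }
    return 0;
}

/* Caller-side state carried through opaque. */
struct demo_counter {
    int calls;       /* how many times the callback ran */
    int stop_after;  /* request a stop once this many calls happened */
};

/* Example callback: counts invocations and stops after stop_after. */
int demo_count_cb(const char *disk, void *opaque)
{
    struct demo_counter *c = opaque;

    (void)disk;
    c->calls++;
    return c->calls >= c->stop_after;
}
```

Running `demo_for_each_disk()` over three disks with `stop_after` set to 2 returns 0 and leaves `calls` at 2, so the caller learns about the early exit from its own state rather than from the function's return value.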
Re: [libvirt] [RFC][scale] new API for querying domains stats
On 07/01/2014 03:33 AM, Daniel P. Berrange wrote:
> 1. Time to write() the RPC call to the socket
> 2. Time for libvirtd to process the RPC call
> 3. Time to recv() the RPC reply from the socket
>
> ...and so on..
>
> If the time for item 2 dominates over the time for items 1 and 3 (which it should really) then the client thread is going to be sleeping in a poll() for the bulk of the duration of the libvirt API call. If we had an async API mechanism, then the VDSM time would essentially be consumed with
>
> 1. Time to write() the RPC call to the socket
> 2. Time to write() the RPC call to the socket
> 3. Time to write() the RPC call to the socket
> 4. Time to write() the RPC call to the socket
> 5. Time to write() the RPC call to the socket
> 6. Time to write() the RPC call to the socket
> 7. wait for replies to start arriving
> 8. Time to recv() the RPC reply from the socket
> 9. Time to recv() the RPC reply from the socket
> 10. Time to recv() the RPC reply from the socket
> 11. Time to recv() the RPC reply from the socket
> 12. Time to recv() the RPC reply from the socket
> 13. Time to recv() the RPC reply from the socket
> 14. Time to recv() the RPC reply from the socket

This assumes you are still calling one async call per domain query. With regards to a bulk API, are you thinking synchronous?

1. Time to write() the RPC call - one bulk request
2. wait for reply - oh, and we'd better increase our on-wire size limits
3. Time to recv() the RPC reply - one bulk response

or asynchronous?

1. Time to write() the RPC call - one bulk request
2. wait for replies to start arriving
3. Time to recv() an RPC async reply - first domain
4. Time to recv() an RPC async reply - second domain
...
n. Time to recv() final RPC async reply

The asynchronous variant works nicely in that we don't have to size up our max RPC on-wire limits, but implies that you still need a callback invoked once per reply received, instead of getting all data back in one giant memory blob.

-- 
Eric Blake
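The difference between the serialized and the pipelined timelines above can be put into a toy cost model. This is a sketch under assumptions that are not in the thread: the per-call split into write, processing, and receive times is invented for illustration, and the pipelined case assumes the best case in which libvirtd processes all outstanding requests concurrently, so the processing time is paid only once. The function names are hypothetical.

```c
/* Toy model; all times in milliseconds, none of them measured. */

/* One blocking call after another: every call pays write + process + recv. */
unsigned long demo_serial_ms(unsigned long n, unsigned long w,
                             unsigned long p, unsigned long r)
{
    return n * (w + p + r);
}

/* All requests written back to back, then all replies read.
 * Assumes server-side processing overlaps completely (best case),
 * so the client waits for p only once. */
unsigned long demo_pipelined_ms(unsigned long n, unsigned long w,
                                unsigned long p, unsigned long r)
{
    if (n == 0)
        return 0;
    return n * w + p + n * r;
}
```

Taking a 50 ms call as 1 ms write, 48 ms processing, and 1 ms receive (an assumed split), 20 serialized calls cost `demo_serial_ms(20, 1, 48, 1)`, which is 1000 ms, while the pipelined best case `demo_pipelined_ms(20, 1, 48, 1)` is 88 ms. A bulk request would shrink the `n * w` and `n * r` terms further at the price of a larger single reply.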
Re: [libvirt] [RFC][scale] new API for querying domains stats
----- Original Message -----
> From: Daniel P. Berrange berra...@redhat.com
> To: Francesco Romani from...@redhat.com
> Cc: libvir-list@redhat.com
> Sent: Tuesday, July 1, 2014 10:35:21 AM
> Subject: Re: [libvirt] [RFC][scale] new API for querying domains stats

[...]

>> We [in VDSM] currently use these APIs for our sampling:
>>   virDomainGetBlockInfo
>>   virDomainGetInfo
>>   virDomainGetCPUStats
>>   virDomainBlockStats
>>   virDomainBlockStatsFlags
>>   virDomainInterfaceStats
>>   virDomainGetVcpusFlags
>>   virDomainGetMetadata
>
> Why do you need to call virDomainGetMetadata so often? That merely contains an opaque data blob that can only have come from VDSM itself, so I'm surprised you need to call that at all frequently.

We store some QoS info in the domain metadata. Actually we can elide this API call from the list and fix our code to make smarter use of it.

>> Please note that we are much more concerned about thread reduction than about performance numbers. We had reports of the thread count becoming a real harm, while performance so far is not yet a concern (https://bugzilla.redhat.com/show_bug.cgi?id=1102147#c54)
>>
>> * bulk APIs for querying domain stats (https://bugzilla.redhat.com/show_bug.cgi?id=1113116) would be really welcome as well. It is quite independent from the previous bullet point and would help us greatly with scale.
>
> If we did the first bullet point, we'd be adding another ~10 APIs for async variants. If we then did the second bullet point we'd be adding another ~10 APIs for bulk querying. So while you're right that they are independent, it would be desirable to address them both at the same time, so we only need to add 10 new APIs in total, not 20.

I'm fine with this approach.

> For the async API design, I could see two potential designs
>
> 1. A custom callback to run per API
>
>    typedef void (*virDomainBlockInfoCallback)(virDomainPtr dom, bool isError, virDomainBlockInfoPtr info, void *opaque);
>
>    int virDomainGetBlockInfoAsync(virDomainPtr dom, const char *disk, virDomainBlockInfoCallback cb, void *opaque, unsigned int flags);
>
> 2. A standard callback and a pair of APIs
>
>    typedef void *virDomainAsyncResult;
>    typedef void (*virDomainAsyncCallback)(virDomainPtr dom, virDomainAsyncResult res);
>
>    void virDomainGetBlockInfoAsync(virDomainPtr dom, const char *disk, virDomainAsyncCallback cb, void *opaque, unsigned int flags);
>
>    int virDomainGetBlockInfoFinish(virDomainPtr dom, virDomainAsyncResult res, virDomainBlockInfoPtr info);
>
> This second approach is the way GIO works (see example in this page https://developer.gnome.org/gio/stable/GAsyncResult.html ). The main difference between them really is probably the way you get error reporting from the APIs. In the first example, libvirt would raise an error before it invoked the callback, with isError set to True. In the second example, the Finish() func would raise the error and return -1.

I need to check in deeper detail and sync up with other VDSM developers, but I have a feeling that the first approach is a bit easier for VDSM to consume.

Bests,

-- 
Francesco Romani
Re: [libvirt] [RFC][scale] new API for querying domains stats
----- Original Message -----
> From: Michal Privoznik mpriv...@redhat.com
> To: Francesco Romani from...@redhat.com, libvir-list@redhat.com
> Sent: Tuesday, July 1, 2014 11:19:04 AM
> Subject: Re: [libvirt] [RFC][scale] new API for querying domains stats

>> Right now we aim for a number of VMs per node in the (few) hundreds, but we have big plans to scale much more, and to possibly reach thousands in the not so distant future. At the moment, we use one thread per VM to gather the VM stats (CPU, network, disk), and of course this scales poorly.
>
> I think this is your main problem. Why not have only one thread that would manage the list of domains to query and issue the APIs periodically instead of having one thread per domain?

Indeed it is. I'm actually personally addressing this problem in VDSM. It is mostly an inheritance of past times, when this wasn't yet a big problem. We are moving toward a thread pool of fixed size to handle the sampling.

>> This is made only worse by the fact that VDSM is a python 2.7 application, and notoriously python 2.x behaves very badly with threads. We are already working to improve our code, but I'd like to bring the discussion here and see if and when the querying API can be improved.
>>
>> We currently use these APIs for our sampling:
>>   virDomainGetBlockInfo
>>   virDomainGetInfo
>>   virDomainGetCPUStats
>>   virDomainBlockStats
>>   virDomainBlockStatsFlags
>>   virDomainInterfaceStats
>>   virDomainGetVcpusFlags
>>   virDomainGetMetadata
>>
>> What we'd like to have is
>>
>> * asynchronous APIs for querying domain stats (https://bugzilla.redhat.com/show_bug.cgi?id=1113106)
>>   This would be just awesome. Either a single callback or a different one per call is fine (let's discuss this!). Please note that we are much more concerned about thread reduction than about performance numbers. We had reports of the thread count becoming a real harm, while performance so far is not yet a concern (https://bugzilla.redhat.com/show_bug.cgi?id=1102147#c54)
>
> I'm not a big fan of this approach. I mean, IIRC python has this Big Python Lock, which effectively prevents two threads from running concurrently.

It has the GIL, yes. Only one thread can run python code at any given time. This however is not true for extension modules written in C, which if carefully designed (read: coded to properly release the GIL) can run concurrently. This is one of the reasons why threading in python is tolerated for I/O, even though never recommended. AFAIK/IIRC the code of the libvirt module for python allows this, so we should be good to go.

> So while in C this would make perfect sense, it doesn't do so in python. The callbacks would be called from the event loop, which given how frequently you dump the info will block other threads. Therefore I'm afraid the approach would not bring any speed up, rather a slow down.

I'm not sure about this and I think quite the opposite, that performance-wise we can gain something, even though yes, all the callbacks will pile up in the event loop. Surely this will greatly reduce the GIL battle (http://dabeaz.blogspot.it/2010/01/python-gil-visualized.html), which is improved in python >= 3.2, but we are on 2.7 for the foreseeable future, and it will reduce our thread proliferation, which is an immediate and real concern of ours.

-- 
Francesco Romani
Re: [libvirt] [RFC][scale] new API for querying domains stats
On Wed, Jul 02, 2014 at 11:56:23AM -0400, Francesco Romani wrote:
>>> This is made only worse by the fact that VDSM is a python 2.7 application, and notoriously python 2.x behaves very badly with threads. We are already working to improve our code, but I'd like to bring the discussion here and see if and when the querying API can be improved.
>>>
>>> We currently use these APIs for our sampling:
>>>   virDomainGetBlockInfo
>>>   virDomainGetInfo
>>>   virDomainGetCPUStats
>>>   virDomainBlockStats
>>>   virDomainBlockStatsFlags
>>>   virDomainInterfaceStats
>>>   virDomainGetVcpusFlags
>>>   virDomainGetMetadata
>>>
>>> What we'd like to have is
>>>
>>> * asynchronous APIs for querying domain stats (https://bugzilla.redhat.com/show_bug.cgi?id=1113106)
>>>   This would be just awesome. Either a single callback or a different one per call is fine (let's discuss this!). Please note that we are much more concerned about thread reduction than about performance numbers. We had reports of the thread count becoming a real harm, while performance so far is not yet a concern (https://bugzilla.redhat.com/show_bug.cgi?id=1102147#c54)
>>
>> I'm not a big fan of this approach. I mean, IIRC python has this Big Python Lock, which effectively prevents two threads from running concurrently.
>
> It has the GIL, yes. Only one thread can run python code at any given time. This however is not true for extension modules written in C, which if carefully designed (read: coded to properly release the GIL) can run concurrently. This is one of the reasons why threading in python is tolerated for I/O, even though never recommended. AFAIK/IIRC the code of the libvirt module for python allows this, so we should be good to go.

For the sake of completeness I'll point out that there's another theoretical option. The libvirt-gobject binding to libvirt provides async APIs to libvirt APIs. It does this by using threads internally. Since these are C level threads though, if VDSM were to use libvirt-gobject it could get async APIs and the benefits of real threads, while remaining single threaded at the python layer.

That all said, I'm not sure whether libvirt-gobject has sufficient API coverage for all the APIs VDSM needs. It primarily just has bindings for the APIs used by GNOME Boxes and libvirt-sandbox so far. Also not sure if it is a widely deployed enough dep for VDSM to mandate.

Regards,
Daniel
[libvirt] [RFC][scale] new API for querying domains stats
Hi everyone,

I'd like to discuss possible APIs and plans for new query APIs in libvirt.

I'm one of the oVirt (http://www.ovirt.org) developers, and I write code for VDSM; VDSM is the node management daemon, which is in charge, among many other things, of gathering the host statistics and the statistics per Domain/VM.

Right now we aim for a number of VMs per node in the (few) hundreds, but we have big plans to scale much more, and to possibly reach thousands in the not so distant future. At the moment, we use one thread per VM to gather the VM stats (CPU, network, disk), and of course this scales poorly.

This is made only worse by the fact that VDSM is a python 2.7 application, and notoriously python 2.x behaves very badly with threads. We are already working to improve our code, but I'd like to bring the discussion here and see if and when the querying API can be improved.

We currently use these APIs for our sampling:
  virDomainGetBlockInfo
  virDomainGetInfo
  virDomainGetCPUStats
  virDomainBlockStats
  virDomainBlockStatsFlags
  virDomainInterfaceStats
  virDomainGetVcpusFlags
  virDomainGetMetadata

What we'd like to have is

* asynchronous APIs for querying domain stats (https://bugzilla.redhat.com/show_bug.cgi?id=1113106)
  This would be just awesome. Either a single callback or a different one per call is fine (let's discuss this!). Please note that we are much more concerned about thread reduction than about performance numbers. We had reports of the thread count becoming a real harm, while performance so far is not yet a concern (https://bugzilla.redhat.com/show_bug.cgi?id=1102147#c54)

* bulk APIs for querying domain stats (https://bugzilla.redhat.com/show_bug.cgi?id=1113116) would be really welcome as well. It is quite independent from the previous bullet point and would help us greatly with scale.

So, I'd like to discuss whether these additions are (or can be) in the project roadmap, and, if so, what the API could look like and what the possible timeframe could be.

Of course I'd be happy to provide any further information about VDSM and its workings.

Thoughts very welcome!

Thanks and best regards,

-- 
Francesco Romani
Re: [libvirt] [RFC][scale] new API for querying domains stats
On Tue, Jul 01, 2014 at 03:09:13AM -0400, Francesco Romani wrote:
> Hi everyone,
>
> I'd like to discuss possible APIs and plans for new query APIs in libvirt.
>
> I'm one of the oVirt (http://www.ovirt.org) developers, and I write code for VDSM; VDSM is the node management daemon, which is in charge, among many other things, of gathering the host statistics and the statistics per Domain/VM.
>
> Right now we aim for a number of VMs per node in the (few) hundreds, but we have big plans to scale much more, and to possibly reach thousands in the not so distant future. At the moment, we use one thread per VM to gather the VM stats (CPU, network, disk), and of course this scales poorly.
>
> This is made only worse by the fact that VDSM is a python 2.7 application, and notoriously python 2.x behaves very badly with threads. We are already working to improve our code, but I'd like to bring the discussion here and see if and when the querying API can be improved.
>
> We currently use these APIs for our sampling:
>   virDomainGetBlockInfo
>   virDomainGetInfo
>   virDomainGetCPUStats
>   virDomainBlockStats
>   virDomainBlockStatsFlags
>   virDomainInterfaceStats
>   virDomainGetVcpusFlags
>   virDomainGetMetadata

Why do you need to call virDomainGetMetadata so often? That merely contains an opaque data blob that can only have come from VDSM itself, so I'm surprised you need to call that at all frequently.

> What we'd like to have is
>
> * asynchronous APIs for querying domain stats (https://bugzilla.redhat.com/show_bug.cgi?id=1113106)
>   This would be just awesome. Either a single callback or a different one per call is fine (let's discuss this!). Please note that we are much more concerned about thread reduction than about performance numbers. We had reports of the thread count becoming a real harm, while performance so far is not yet a concern (https://bugzilla.redhat.com/show_bug.cgi?id=1102147#c54)
>
> * bulk APIs for querying domain stats (https://bugzilla.redhat.com/show_bug.cgi?id=1113116) would be really welcome as well. It is quite independent from the previous bullet point and would help us greatly with scale.

If we did the first bullet point, we'd be adding another ~10 APIs for async variants. If we then did the second bullet point we'd be adding another ~10 APIs for bulk querying. So while you're right that they are independent, it would be desirable to address them both at the same time, so we only need to add 10 new APIs in total, not 20.

For the async API design, I could see two potential designs

1. A custom callback to run per API

   typedef void (*virDomainBlockInfoCallback)(virDomainPtr dom, bool isError, virDomainBlockInfoPtr info, void *opaque);

   int virDomainGetBlockInfoAsync(virDomainPtr dom, const char *disk, virDomainBlockInfoCallback cb, void *opaque, unsigned int flags);

2. A standard callback and a pair of APIs

   typedef void *virDomainAsyncResult;
   typedef void (*virDomainAsyncCallback)(virDomainPtr dom, virDomainAsyncResult res);

   void virDomainGetBlockInfoAsync(virDomainPtr dom, const char *disk, virDomainAsyncCallback cb, void *opaque, unsigned int flags);

   int virDomainGetBlockInfoFinish(virDomainPtr dom, virDomainAsyncResult res, virDomainBlockInfoPtr info);

This second approach is the way GIO works (see example in this page https://developer.gnome.org/gio/stable/GAsyncResult.html ). The main difference between them really is probably the way you get error reporting from the APIs. In the first example, libvirt would raise an error before it invoked the callback, with isError set to True. In the second example, the Finish() func would raise the error and return -1.

Regards,
Daniel
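The error-reporting difference between the two designs can be made concrete with a small mock. This is a sketch only: the `demo_*` types and functions are hypothetical stand-ins and not libvirt API, the callbacks fire immediately instead of from an event loop, and the generic callback of the second design is given an extra `opaque` argument (an assumption added here so the example can record its outcome).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for libvirt types; not the real definitions. */
struct demo_dom { const char *name; bool storage_ok; };
struct demo_block_info { unsigned long long capacity; };

/* Design 1: a typed callback per API; failure arrives as isError. */
typedef void (*demo_block_info_cb)(struct demo_dom *dom, bool isError,
                                   const struct demo_block_info *info,
                                   void *opaque);

int demo_get_block_info_async(struct demo_dom *dom, demo_block_info_cb cb,
                              void *opaque)
{
    struct demo_block_info info = { 1024 };

    if (!dom || !cb)
        return -1;                /* could not even start the request */
    /* A real implementation would call back later; this mock does it now. */
    cb(dom, !dom->storage_ok, dom->storage_ok ? &info : NULL, opaque);
    return 0;
}

/* Design 2: one generic callback plus a Finish() that yields the result. */
struct demo_async_result { bool failed; struct demo_block_info info; };
typedef void (*demo_async_cb)(struct demo_dom *dom,
                              struct demo_async_result *res, void *opaque);

void demo_get_block_info_async2(struct demo_dom *dom, demo_async_cb cb,
                                void *opaque)
{
    struct demo_async_result res = { !dom->storage_ok, { 1024 } };

    cb(dom, &res, opaque);
}

/* Returns 0 and fills *info on success, -1 if the request failed. */
int demo_get_block_info_finish(struct demo_dom *dom,
                               struct demo_async_result *res,
                               struct demo_block_info *info)
{
    (void)dom;
    if (res->failed)
        return -1;
    *info = res->info;
    return 0;
}

/* Example callbacks: each stores 0 or -1 into the int behind opaque. */
void demo_on_info(struct demo_dom *dom, bool isError,
                  const struct demo_block_info *info, void *opaque)
{
    (void)dom; (void)info;
    *(int *)opaque = isError ? -1 : 0;
}

void demo_on_result(struct demo_dom *dom, struct demo_async_result *res,
                    void *opaque)
{
    struct demo_block_info info;

    *(int *)opaque = demo_get_block_info_finish(dom, res, &info);
}
```

In the first design the callback learns about the failure directly from `isError`; in the second it has to call the Finish function and check for -1, which costs one more call but lets every API share a single callback type.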
Re: [libvirt] [RFC][scale] new API for querying domains stats
On 01.07.2014 09:09, Francesco Romani wrote:
> Hi everyone,
>
> I'd like to discuss possible APIs and plans for new query APIs in
> libvirt.
>
> I'm one of the oVirt (http://www.ovirt.org) developers, and I write
> code for VDSM; VDSM is the node management daemon, which is in charge,
> among many other things, of gathering the host statistics and the
> statistics per Domain/VM.
>
> Right now we aim for a number of VM per node in the (few) hundreds,
> but we have big plans to scale much more, and to possibly reach
> thousands in a not so distant future. At the moment, we use one thread
> per VM to gather the VM stats (CPU, network, disk), and of course this
> obviously scales poorly.

I think this is your main problem. Why not have only one thread that
would manage the list of domains to query and issue the APIs
periodically, instead of having one thread per domain?

> This is made only worse by the fact that VDSM is a python 2.7
> application, and notoriously python 2.x behaves very badly with
> threads. We are already working to improve our code, but I'd like to
> bring the discussion here and see if and when the querying API can be
> improved.
>
> We currently use these APIs for our sampling:
>
>   virDomainBlockInfo
>   virDomainGetInfo
>   virDomainGetCPUStats
>   virDomainBlockStats
>   virDomainBlockStatsFlags
>   virDomainInterfaceStats
>   virDomainGetVcpusFlags
>   virDomainGetMetadata
>
> What we'd like to have is
>
> * asynchronous APIs for querying domain stats
>   (https://bugzilla.redhat.com/show_bug.cgi?id=1113106)
>   This would be just awesome. Either a single callback or a different
>   one per call is fine (let's discuss this!).
>   Please note that we are much more concerned about thread reduction
>   than about performance numbers. We had report of thread number
>   becoming a real harm, while performance so far is not yet a concern
>   (https://bugzilla.redhat.com/show_bug.cgi?id=1102147#c54)

I'm not a big fan of this approach. I mean, IIRC python has this Big
Python Lock, which effectively prevents two threads from running
concurrently. So while in C this would make perfect sense, it doesn't do
so in python. The callbacks would be called from the event loop, which,
given how frequently you dump the info, will block other threads.
Therefore I'm afraid the approach would not bring any speed up, but
rather a slow down.

> * bulk APIs for querying domain stats
>   (https://bugzilla.redhat.com/show_bug.cgi?id=1113116)
>   would be really welcome as well. It is quite independent from the
>   previous bullet point and would help us greatly with scale.

I think this one looks better. Especially if you consider my suggestion
of having only one thread to serve all domains.

> So, I'd like to discuss if these additions are (or can be) in the
> project roadmap, and, if so, how the API could look like and what the
> possible timeframe could be.
>
> Of course I'd be happy to provide any further information about VDSM
> and its workings.
>
> Thoughts very welcome!
>
> Thanks and best regards,

Michal
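A minimal sketch of the single-thread sampling loop suggested in the reply above, written in Python because VDSM is a Python application. It is an illustration only, not VDSM or libvirt code: `sample_all`, `run_sampler` and the `query_stats` callable are hypothetical names, and `query_stats` stands in for whatever libvirt calls a real sampler would issue for one domain.

```python
import time


def sample_all(domains, query_stats):
    """Query every domain once, sequentially, from the calling thread.

    `query_stats` is a hypothetical callable standing in for the real
    per-domain libvirt calls; it takes a domain name and returns its stats.
    """
    return {name: query_stats(name) for name in domains}


def run_sampler(domains, query_stats, interval, cycles,
                sleep=time.sleep, clock=time.monotonic):
    """Run `cycles` sampling rounds, aiming for one round per `interval` seconds.

    `sleep` and `clock` are injectable so the loop can be exercised
    without actually waiting.
    """
    rounds = []
    for _ in range(cycles):
        started = clock()
        rounds.append(sample_all(domains, query_stats))
        # Sleep only for what is left of the interval. If a round took
        # longer than the interval, start the next one immediately.
        remaining = interval - (clock() - started)
        if remaining > 0:
            sleep(remaining)
    return rounds
```

With one blocking call per domain, each round costs the sum of all per-domain round trips, so a loop like this only keeps up if that sum stays below `interval`. That is the case a bulk query API would help with.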
Re: [libvirt] [RFC][scale] new API for querying domains stats
On Tue, Jul 01, 2014 at 11:19:04AM +0200, Michal Privoznik wrote:
> On 01.07.2014 09:09, Francesco Romani wrote:
> > Hi everyone,
> >
> > I'd like to discuss possible APIs and plans for new query APIs in
> > libvirt.
> >
> > I'm one of the oVirt (http://www.ovirt.org) developers, and I write
> > code for VDSM; VDSM is the node management daemon, which is in
> > charge, among many other things, of gathering the host statistics
> > and the statistics per Domain/VM.
> >
> > Right now we aim for a number of VM per node in the (few) hundreds,
> > but we have big plans to scale much more, and to possibly reach
> > thousands in a not so distant future. At the moment, we use one
> > thread per VM to gather the VM stats (CPU, network, disk), and of
> > course this obviously scales poorly.
>
> I think this is your main problem. Why not have only one thread that
> would manage the list of domains to query and issue the APIs
> periodically, instead of having one thread per domain?

You suffer from round trip time on every API call if you serialize it
all in a single thread. E.g. if every API call is 50ms and you want to
check once per second, you can only monitor 20 VMs before you take more
time than you have available. This really sucks when the majority of
that 50ms is a sleep in poll() waiting for the RPC response.

> > This is made only worse by the fact that VDSM is a python 2.7
> > application, and notoriously python 2.x behaves very badly with
> > threads. We are already working to improve our code, but I'd like to
> > bring the discussion here and see if and when the querying API can
> > be improved.
> >
> > We currently use these APIs for our sampling:
> >
> >   virDomainBlockInfo
> >   virDomainGetInfo
> >   virDomainGetCPUStats
> >   virDomainBlockStats
> >   virDomainBlockStatsFlags
> >   virDomainInterfaceStats
> >   virDomainGetVcpusFlags
> >   virDomainGetMetadata
> >
> > What we'd like to have is
> >
> > * asynchronous APIs for querying domain stats
> >   (https://bugzilla.redhat.com/show_bug.cgi?id=1113106)
> >   This would be just awesome. Either a single callback or a
> >   different one per call is fine (let's discuss this!).
> >   Please note that we are much more concerned about thread reduction
> >   than about performance numbers. We had report of thread number
> >   becoming a real harm, while performance so far is not yet a
> >   concern
> >   (https://bugzilla.redhat.com/show_bug.cgi?id=1102147#c54)
>
> I'm not a big fan of this approach. I mean, IIRC python has this Big
> Python Lock, which effectively prevents two threads from running
> concurrently. So while in C this would make perfect sense, it doesn't
> do so in python. The callbacks would be called from the event loop,
> which, given how frequently you dump the info, will block other
> threads. Therefore I'm afraid the approach would not bring any speed
> up, but rather a slow down.

I'm not sure I agree with your assessment here. If we consider a single
API call, the time this takes to complete is made up of a number of
parts

  1. Time to write() the RPC call to the socket
  2. Time for libvirtd to process the RPC call
  3. Time to recv() the RPC reply from the socket

  1. Time to write() the RPC call to the socket
  2. Time for libvirtd to process the RPC call
  3. Time to recv() the RPC reply from the socket

  1. Time to write() the RPC call to the socket
  2. Time for libvirtd to process the RPC call
  3. Time to recv() the RPC reply from the socket

  ...and so on..

If the time for item 2 dominates over the time for items 1 and 3 (which
it should really) then the client thread is going to be sleeping in a
poll() for the bulk of the duration of the libvirt API call.

If we had an async API mechanism, then the VDSM time would essentially
be consumed with

  1. Time to write() the RPC call to the socket
  2. Time to write() the RPC call to the socket
  3. Time to write() the RPC call to the socket
  4. Time to write() the RPC call to the socket
  5. Time to write() the RPC call to the socket
  6. Time to write() the RPC call to the socket
  7. wait for replies to start arriving
  8. Time to recv() the RPC reply from the socket
  9. Time to recv() the RPC reply from the socket
  10. Time to recv() the RPC reply from the socket
  11. Time to recv() the RPC reply from the socket
  12. Time to recv() the RPC reply from the socket
  13. Time to recv() the RPC reply from the socket
  14. Time to recv() the RPC reply from the socket

Of course there's a limit to how many outstanding async calls you can
make before the event loop gets 100% busy processing the responses, but
I don't think that makes async calls worthless. Even if we had the bulk
list API calls, async calling would be useful, because it would let
VDSM fire off requests for disk, net, cpu, mem stats in parallel from a
single thread.

Regards,
Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o-
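The arithmetic in the message above can be put into a small back-of-the-envelope model. This is a sketch under stated assumptions, not a measurement: the function names are made up for illustration, the model assumes libvirtd processes every outstanding call concurrently with a fixed processing time and that replies are read in the order the calls were sent, and the 1 ms / 48 ms / 1 ms split used in the note below is invented. Only the 50 ms per call and the one-second interval come from the thread.

```python
def max_vms_serialized(call_ms, interval_ms):
    """How many blocking calls of `call_ms` fit into one sampling interval."""
    return interval_ms // call_ms


def serialized_ms(n_calls, write_ms, process_ms, recv_ms):
    """Total time when each call waits for its reply before the next is sent."""
    return n_calls * (write_ms + process_ms + recv_ms)


def pipelined_ms(n_calls, write_ms, process_ms, recv_ms):
    """Total time when all calls are written first and replies are read after.

    Assumption: the daemon handles all outstanding calls in parallel, so
    the reply to a call is ready `process_ms` after that call was written.
    """
    sent_at = 0
    ready_at = []
    for _ in range(n_calls):
        sent_at += write_ms
        ready_at.append(sent_at + process_ms)

    now = sent_at  # the client has finished writing every request
    for ready in ready_at:
        # Wait for the reply if it has not arrived yet, then read it.
        now = max(now, ready) + recv_ms
    return now
```

With 50 ms calls and a 1000 ms interval, `max_vms_serialized(50, 1000)` gives the 20 VMs quoted above. Splitting the 50 ms as 1 ms write, 48 ms processing and 1 ms recv, 20 serialized calls take 1000 ms in this model, while the pipelined version takes 69 ms. The model leaves out time spent in callbacks and contention inside libvirtd, so its figures are an optimistic bound on real behaviour.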
Re: [libvirt] [RFC][scale] new API for querying domains stats
On 01.07.2014 11:33, Daniel P. Berrange wrote:
> On Tue, Jul 01, 2014 at 11:19:04AM +0200, Michal Privoznik wrote:
> > On 01.07.2014 09:09, Francesco Romani wrote:
> > > Hi everyone,
> > >
> > > I'd like to discuss possible APIs and plans for new query APIs in
> > > libvirt.
> > >
> > > I'm one of the oVirt (http://www.ovirt.org) developers, and I
> > > write code for VDSM; VDSM is the node management daemon, which is
> > > in charge, among many other things, of gathering the host
> > > statistics and the statistics per Domain/VM.
> > >
> > > Right now we aim for a number of VM per node in the (few)
> > > hundreds, but we have big plans to scale much more, and to
> > > possibly reach thousands in a not so distant future. At the
> > > moment, we use one thread per VM to gather the VM stats (CPU,
> > > network, disk), and of course this obviously scales poorly.
> >
> > I think this is your main problem. Why not have only one thread that
> > would manage the list of domains to query and issue the APIs
> > periodically, instead of having one thread per domain?
>
> You suffer from round trip time on every API call if you serialize it
> all in a single thread. E.g. if every API call is 50ms and you want to
> check once per second, you can only monitor 20 VMs before you take
> more time than you have available. This really sucks when the majority
> of that 50ms is a sleep in poll() waiting for the RPC response.

Unless you have the bulk query API which will take the RTT only once ;)

> > > This is made only worse by the fact that VDSM is a python 2.7
> > > application, and notoriously python 2.x behaves very badly with
> > > threads. We are already working to improve our code, but I'd like
> > > to bring the discussion here and see if and when the querying API
> > > can be improved.
> > >
> > > We currently use these APIs for our sampling:
> > >
> > >   virDomainBlockInfo
> > >   virDomainGetInfo
> > >   virDomainGetCPUStats
> > >   virDomainBlockStats
> > >   virDomainBlockStatsFlags
> > >   virDomainInterfaceStats
> > >   virDomainGetVcpusFlags
> > >   virDomainGetMetadata
> > >
> > > What we'd like to have is
> > >
> > > * asynchronous APIs for querying domain stats
> > >   (https://bugzilla.redhat.com/show_bug.cgi?id=1113106)
> > >   This would be just awesome. Either a single callback or a
> > >   different one per call is fine (let's discuss this!).
> > >   Please note that we are much more concerned about thread
> > >   reduction than about performance numbers. We had report of
> > >   thread number becoming a real harm, while performance so far is
> > >   not yet a concern
> > >   (https://bugzilla.redhat.com/show_bug.cgi?id=1102147#c54)
> >
> > I'm not a big fan of this approach. I mean, IIRC python has this Big
> > Python Lock, which effectively prevents two threads from running
> > concurrently. So while in C this would make perfect sense, it
> > doesn't do so in python. The callbacks would be called from the
> > event loop, which, given how frequently you dump the info, will
> > block other threads. Therefore I'm afraid the approach would not
> > bring any speed up, but rather a slow down.
>
> I'm not sure I agree with your assessment here. If we consider a
> single API call, the time this takes to complete is made up of a
> number of parts
>
>   1. Time to write() the RPC call to the socket
>   2. Time for libvirtd to process the RPC call
>   3. Time to recv() the RPC reply from the socket
>
>   1. Time to write() the RPC call to the socket
>   2. Time for libvirtd to process the RPC call
>   3. Time to recv() the RPC reply from the socket
>
>   1. Time to write() the RPC call to the socket
>   2. Time for libvirtd to process the RPC call
>   3. Time to recv() the RPC reply from the socket
>
>   ...and so on..
>
> If the time for item 2 dominates over the time for items 1 and 3
> (which it should really) then the client thread is going to be
> sleeping in a poll() for the bulk of the duration of the libvirt API
> call.
>
> If we had an async API mechanism, then the VDSM time would essentially
> be consumed with
>
>   1. Time to write() the RPC call to the socket
>   2. Time to write() the RPC call to the socket
>   3. Time to write() the RPC call to the socket
>   4. Time to write() the RPC call to the socket
>   5. Time to write() the RPC call to the socket
>   6. Time to write() the RPC call to the socket
>   7. wait for replies to start arriving
>   8. Time to recv() the RPC reply from the socket
>   9. Time to recv() the RPC reply from the socket
>   10. Time to recv() the RPC reply from the socket
>   11. Time to recv() the RPC reply from the socket
>   12. Time to recv() the RPC reply from the socket
>   13. Time to recv() the RPC reply from the socket
>   14. Time to recv() the RPC reply from the socket

Well, in the async form you need to account even for the time spent in
the callbacks:

  1. write(serial=1, ...)
  2. write(serial=2, ...)
  ..
  7. wait for replies
  8. recv(serial=x1, ...)  // there's no guarantee on order of replies
  9. callback(serial=x1, ...)
  10. recv(serial=x2, ...)
  11. callback(serial=x2, ...)

And it's the callback times I'm worried about. I'm not saying we should
not add the callback APIs. What I'm really saying is I have doubts it
will help python apps. It will definitely help scaling C applications
though.

> Of course there's a limit to how many outstanding async