Re: [libvirt] [PATCH RFC 0/8] Sparse streams
On Mon, 2016-02-01 at 14:50 +, Daniel P. Berrange wrote:
> On Fri, Jan 29, 2016 at 02:26:51PM +0100, Michal Privoznik wrote:
> > ** NOT TO BE MERGED UPSTREAM **
> >
> > This is merely an RFC.
> >
> > What's the problem?
> > We have APIs for transferring disk images from/to the host. The problem
> > is that disk images can be sparse files. Our code is, however, unaware
> > of that fact, so if, for instance, the disk is one big 8GB hole, all
> > those bytes have to: a) be read, b) come through our event loop. This
> > is obviously very inefficient.
> >
> > How to deal with it?
> > The way I propose (and this is actually what I would like you to
> > comment on) is to invent a set of new APIs. We need to make reads from
> > and writes to a stream sparseness aware. The new APIs are as follows:
> >
> >   int virStreamSendOffset(virStreamPtr stream,
> >                           unsigned long long offset,
> >                           const char *data,
> >                           size_t nbytes);
> >
> >   int virStreamRecvOffset(virStreamPtr stream,
> >                           unsigned long long *offset,
> >                           char *data,
> >                           size_t nbytes);
> >
> > SendOffset() is fairly simple. It is given an offset to write @data at,
> > so it basically lseek()s to @offset and write()s the data.
> > RecvOffset() has a slightly more complicated design - it has to be
> > aware of the fact that the @offset it is asked to read from may fall
> > into a hole. If that's the case, it sets @offset to the new location
> > where data starts.
> >
> > Are there other ways possible?
> > Sure! If you have a specific one in mind I am happy to discuss it. For
> > instance, one idea I've heard (from Martin) was that instead of
> > SendOffset() and RecvOffset() we might just invent our own variant of
> > lseek().
> >
> > What's left to be done?
> > Basically, I still don't have the RPC implementation. But before I dive
> > into that I thought of sharing my approach with you - because it may
> > turn out that a different approach is going to be needed and thus my
> > work would be rendered useless.
> It would be interesting to see the virsh vol-upload/download client
> code updated to use the new APIs to deal with holes, so we can see how
> this API design looks from the POV of apps using libvirt.
>
> Regards,
> Daniel

Still an RFC. Below I've included an example of updated virsh vol-download
client code to handle sparse files/streams, as mentioned in the earlier
mail. This example assumes a different take on the stream APIs than
discussed previously.

For this exercise, let's assume that we don't change the function profiles
of the existing virStream APIs; rather, their behavior will be slightly
different for a "sparse stream". A client will need a way to tell libvirt
that it wants/expects the sparse behavior. To that purpose let's define a
couple of new APIs:

  int virStreamSetSparse(virStreamPtr stream)

requests that sparse behavior be enabled for this stream. Returns 0 on
success, -1 on error. Possible errors: maybe don't support enabling
sparseness after data has already been sent/received on the stream? Note:
a client could request sparseness via a new flag to virStreamNew() as
well, or instead. We will also want:

  int virStreamIsSparse(virStreamPtr stream)

for use inside the stream driver, unless open coding a flags test is
preferred.

Existing Streams [and FDStreams] APIs that behave differently for sparse
streams:

  virStreamSend(virStreamPtr stream, const char *data, size_t nbytes)

will recognize a NULL @data as indicating that @nbytes contains the size
of a hole in the data source, rather than as an error. This will also be
true for the stream driver streamSend() handlers, remoteStreamSend() and
virFDStreamWrite(), which take the same arguments as virStreamSend().
Internally, when called from remoteStreamSend() with status=VIR_NET_SKIP,
virNetClientStreamSendPacket() will encode the hole length @nbytes as the
payload of a VIR_STREAM_SKIP packet, as Daniel suggested earlier.

  virStreamRecv(virStreamPtr stream, char *data, size_t nbytes),

upon encountering a hole in the incoming data stream, will return the
negative value of the hole size. Similarly for the stream driver
streamRecv() handlers remoteStreamRecv() and virFDStreamRead(). All of
these functions currently return bytes received [> 0], EOF [== 0], or -1
or -2 to indicate an error or no data on a non-blocking stream. The
minimum hole size will be defined [much?] greater than 2, so a negative
return value < (0 - VIR_STREAM_MIN_HOLE_SIZE) can indicate a hole. A macro
to test the return value for a hole would be convenient.

Prior mail discussed how a VIR_STREAM_SKIP could be propagated up the
stack as an iovec with a NULL iov_base to represent a hole.
virNetClientStreamRecvPacket() would then return the hole as the large
negative value in this model. One limitation of this approach is that the
return values are declared as 'int', which will limit the size
Re: [libvirt] [PATCH RFC 2/8] Introduce virStream{Recv,Send}Offset
On Mon, 2016-02-01 at 14:49 +, Daniel P. Berrange wrote: > On Fri, Jan 29, 2016 at 02:26:53PM +0100, Michal Privoznik wrote: > > When dealing with sparse files we need to be able to jump over > > holes as there's no added value in reading/writing them. For > > that, we need new set of send and receive APIs that will have > > @offset argument. Sending data to a stream would be easy - just > > say from which offset we are sending data. Reading is a bit > > tricky - we need read function which can detect holes and thus > > when requested to read from one it will set @offset to new value > > that contains data. > > > > Signed-off-by: Michal Privoznik> > --- > > include/libvirt/libvirt-stream.h | 8 +++ > > src/driver-stream.h | 13 + > > src/libvirt-stream.c | 113 > > +++ > > src/libvirt_public.syms | 6 +++ > > 4 files changed, 140 insertions(+) > > > > diff --git a/include/libvirt/libvirt-stream.h > > b/include/libvirt/libvirt-stream.h > > index 831640d..5a2bde3 100644 > > --- a/include/libvirt/libvirt-stream.h > > +++ b/include/libvirt/libvirt-stream.h > > @@ -40,11 +40,19 @@ int virStreamRef(virStreamPtr st); > > int virStreamSend(virStreamPtr st, > > const char *data, > > size_t nbytes); > > +int virStreamSendOffset(virStreamPtr stream, > > +unsigned long long offset, > > +const char *data, > > +size_t nbytes); > > > > int virStreamRecv(virStreamPtr st, > > char *data, > > size_t nbytes); > > > > +int virStreamRecvOffset(virStreamPtr stream, > > +unsigned long long *offset, > > +char *data, > > +size_t nbytes); > > > > /** > > * virStreamSourceFunc: > > diff --git a/src/driver-stream.h b/src/driver-stream.h > > index 85b4e3b..5419b85 100644 > > --- a/src/driver-stream.h > > +++ b/src/driver-stream.h > > @@ -31,9 +31,20 @@ typedef int > > size_t nbytes); > > > > typedef int > > +(*virDrvStreamSendOffset)(virStreamPtr st, > > + unsigned long long offset, > > + const char *data, > > + size_t nbytes); > > + > > +typedef int > > (*virDrvStreamRecv)(virStreamPtr st, > > 
char *data, > > size_t nbytes); > > +typedef int > > +(*virDrvStreamRecvOffset)(virStreamPtr st, > > + unsigned long long *offset, > > + char *data, > > + size_t nbytes); > > > > typedef int > > (*virDrvStreamEventAddCallback)(virStreamPtr stream, > > @@ -60,7 +71,9 @@ typedef virStreamDriver *virStreamDriverPtr; > > > > struct _virStreamDriver { > > virDrvStreamSend streamSend; > > +virDrvStreamSendOffset streamSendOffset; > > virDrvStreamRecv streamRecv; > > +virDrvStreamRecvOffset streamRecvOffset; > > virDrvStreamEventAddCallback streamEventAddCallback; > > virDrvStreamEventUpdateCallback streamEventUpdateCallback; > > virDrvStreamEventRemoveCallback streamEventRemoveCallback; > > diff --git a/src/libvirt-stream.c b/src/libvirt-stream.c > > index c16f586..1df188c 100644 > > --- a/src/libvirt-stream.c > > +++ b/src/libvirt-stream.c > > @@ -192,6 +192,58 @@ virStreamSend(virStreamPtr stream, > > > > > > /** > > + * virStreamSendOffset: > > + * @stream: pointer to the stream object > > + * @offset: > > + * @data: buffer to write to stream > > + * @nbytes: size of @data buffer > > + * > > + * Sends some data down the pipe. > > + * > > + * Returns the number of bytes written, which may be less > > + * than requested. > > + * > > + * Returns -1 upon error, at which time the stream will > > + * be marked as aborted, and the caller should now release > > + * the stream with virStreamFree. > > + * > > + * Returns -2 if the outgoing transmit buffers are full & > > + * the stream is marked as non-blocking. 
> > + */ > > +int > > +virStreamSendOffset(virStreamPtr stream, > > +unsigned long long offset, > > +const char *data, > > +size_t nbytes) > > +{ > > +VIR_DEBUG("stream=%p, offset=%llu, data=%p, nbytes=%zu", > > + stream, offset, data, nbytes); > > + > > +virResetLastError(); > > + > > +virCheckStreamReturn(stream, -1); > > +virCheckNonNullArgGoto(data, error); > > + > > +if (stream->driver && > > +stream->driver->streamSendOffset) { > > +int ret; > > +ret = (stream->driver->streamSendOffset)(stream, offset, > > data, nbytes); > > +if (ret == -2) > > +return -2; > > +if (ret < 0) > > +goto error; > > +return ret; > > +} > > + > > +virReportUnsupportedError(); > > + > > + error: > > +
Re: [libvirt] [RFCv2 PoCv1 PATCH 00/15] Just Admin API
On Fri, 2015-04-17 at 11:30 +0100, Daniel P. Berrange wrote:
> On Thu, Apr 16, 2015 at 04:46:10PM +0200, Martin Kletzander wrote:
> > *** BLURB ***
> >
> > ... Just kidding :) Sooo... After a very very VERY long time, here's a
> > draft of an admin interface that is supposed to open up new
> > possibilities for things to be done on a live daemon. The aim here is
> > to create the first few inches of that API in order to open up the
> > possibility of new API function creation to more people. I already
> > spent so much time explaining so much of it that I don't know what
> > else to point at in here. Maybe the fact that the last three patches
> > are just an example of how this might work. Of course there won't be
> > any functions like listClientIDs with the need of getting each
> > client's info by another API. There's going to be e.g.
> > virAdmClientInfo, and virAdmGetClients() will return the list of
> > them, etc. Long story short, let's not repeat past mistakes ;)
> >
> > With all that said, feel free to hate, love, comment or just try out
> > compiling this series and let me know how many things I've missed and
> > screwed up (hopefully zero).
>
> Broadly speaking I think this all looks like a pretty good approach to
> me. We should probably think about exactly which API we want to target
> for inclusion in the first release. Perhaps the ability to change the
> logging filters/outputs would be the most useful one, as it's something
> people often want in production deployments where restarting libvirtd
> is not an option.
>
> Regards,
> Daniel

How about reloading the server TLS certificates? virNetTLSContextNew()
contains the comment:

    /* Generate Diffie Hellman parameters - for use with DHE
     * kx algorithms. These should be discarded and regenerated
     * once a day, once a week or once a month. Depending on the
     * security requirements.
     */

If I understand correctly, currently one must restart libvirtd to pick up
new certificates? I wondered whether Martin's patch 4/15 -- multiple
NetSubServers -- would allow introduction of a new cert on a new SubServer
w/o restarting libvirtd? E.g., to allow long running migration, blockcopy
or other jobs to complete on existing connections before destroying them.
If that were possible, I think it would also be useful for early inclusion
for people considering ephemeral certificates.

Regards,
Lee

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH RFC 8/8] add qemu cache mutex
On Mon, 2012-03-12 at 10:16 -0400, Lee Schermerhorn wrote:
> On Mon, 2012-03-12 at 14:00 +0100, Michal Privoznik wrote:
> > On 11.03.2012 19:56, Lee Schermerhorn wrote:
> > > Add a mutex for access to the qemu emulator cache. Not clear that
> > > this is actually needed -- the driver should be locked across calls
> > > [?]. The patch can be dropped if not needed.
> > > ---
> > >  src/qemu/qemu_capabilities.c | 18 +-
> > >  src/qemu/qemu_capabilities.h |  2 ++
> > >  src/qemu/qemu_driver.c       |  3 +++
> > >  3 files changed, 22 insertions(+), 1 deletion(-)
> > >
> > > Index: libvirt-0.9.10/src/qemu/qemu_capabilities.c
> > > ===
> > > --- libvirt-0.9.10.orig/src/qemu/qemu_capabilities.c
> > > +++ libvirt-0.9.10/src/qemu/qemu_capabilities.c
> > > @@ -27,6 +27,7 @@
> > >  #include "memory.h"
> > >  #include "logging.h"
> > >  #include "virterror_internal.h"
> > > +#include "threads.h"
> > >  #include "util.h"
> > >  #include "virfile.h"
> > >  #include "nodeinfo.h"
> > > @@ -180,6 +181,11 @@ enum qemuCapsProbes {
> > >      QEMU_PROBE_SIZE
> > >  };
> > >
> > > +/*
> > > + * Use static initializer for tests
> > > + */
> > > +static virMutex qemuEmulatorCacheMutex = { PTHREAD_MUTEX_INITIALIZER };
> >
> > This is not allowed in our code, as we build with win32 threads, which
> > initialize mutexes completely differently. Why do you want to
> > initialize it here anyway ...
>
> Thanks. I didn't know that. As the comment says, I added it for the
> internal tests. It appeared to me that they don't go through the driver
> init code where I inserted the call to the init function below. The
> tests were hanging at the first attempt to acquire the mutex. It could
> have been a defect in my patches at that time [that may still be
> there?]. When I inserted the static initializer, the tests passed. That
> was back on the 0.8.8 ubuntu natty code base. I'll pull it out and see
> if they pass w/o it, now that the tests all seem to pass otherwise.
> They certainly pass w/o this patch applied, but they're all single
> threaded, right?

Update: the 'make check' tests [qemuxml2argv] succeed with the static
initializer removed. The previous failures were apparently either an
artifact of the 0.8.8-1ubuntu6.x code base or a defect elsewhere in my
patches at the time. I was still debugging then, trying to get the tests
to pass.

The question whether the mutex is needed still stands, but since the
driver lock is RW, it probably is. The cache mutex could probably be made
RW as well, and only taken for W when filling/refreshing the cache. I'm
running more tests and will update the series when/if I hear that it's
worth doing so.

Regards,
Lee

> Bigger question is: is the mutex actually needed at all? I.e., can I
> assume that the driver is always locked -- in practice, not necessarily
> for the tests -- when the probes are called?
>
> Lee
>
> > > +
> > >  typedef struct _qemuEmulatorCache qemuEmulatorCache;
> > >  typedef qemuEmulatorCache* qemuEmulatorCachePtr;
> > >  struct _qemuEmulatorCache {
> > > @@ -206,9 +212,17 @@ qemuEmulatorCachedInfoGet(enum qemuCapsP
> > >                            const char *binary,
> > >                            const char *arch);
> > >
> > > +int
> > > +qemuCapsCacheInit(void)
> > > +{
> > > +    return virMutexInit(&qemuEmulatorCacheMutex);
> > > +}
> > > +
> >
> > ... if you have created this function?
> >
> > >  static void
> > >  qemuEmulatorCachedInfoRelease(qemuEmulatorCachePtr emulator ATTRIBUTE_UNUSED)
> > > -{ }
> > > +{
> > > +    virMutexUnlock(&qemuEmulatorCacheMutex);
> > > +}
[libvirt] [PATCH RFC 1/8] add virCapabilitiesDupMachines
Add support to duplicate a virCapsGuestMachine object -- e.g., from cached
emulator information.
---
 src/conf/capabilities.c  | 34 ++
 src/conf/capabilities.h  |  3 +++
 src/libvirt_private.syms |  1 +
 3 files changed, 38 insertions(+)

Index: libvirt-0.9.10/src/conf/capabilities.h
===
--- libvirt-0.9.10.orig/src/conf/capabilities.h
+++ libvirt-0.9.10/src/conf/capabilities.h
@@ -208,6 +208,9 @@ virCapabilitiesSetHostCPU(virCapsPtr cap
 extern virCapsGuestMachinePtr *
 virCapabilitiesAllocMachines(const char *const *names,
                              int nnames);
+extern virCapsGuestMachinePtr *
+virCapabilitiesDupMachines(const virCapsGuestMachinePtr *smachines,
+                           int nmachines);
 extern void
 virCapabilitiesFreeMachines(virCapsGuestMachinePtr *machines,
                             int nmachines);
Index: libvirt-0.9.10/src/conf/capabilities.c
===
--- libvirt-0.9.10.orig/src/conf/capabilities.c
+++ libvirt-0.9.10/src/conf/capabilities.c
@@ -325,6 +325,40 @@ virCapabilitiesAllocMachines(const char
 }

 /**
+ * virCapabilitiesDupMachines:
+ * @smachines: table of virCapsGuestMachinePtr
+ * @nmachines: number of machine variants in table
+ *
+ * Allocate a table of virCapsGuestMachinePtr from the supplied table
+ * of virCapsGuestMachinePtr
+ */
+virCapsGuestMachinePtr *
+virCapabilitiesDupMachines(const virCapsGuestMachinePtr *smachines,
+                           int nmachines)
+{
+    virCapsGuestMachinePtr *machines;
+    int i;
+
+    if (VIR_ALLOC_N(machines, nmachines) < 0)
+        return NULL;
+
+    for (i = 0; i < nmachines; i++) {
+        if (VIR_ALLOC(machines[i]) < 0 ||
+            !(machines[i]->name = strdup(smachines[i]->name)))
+            goto error;
+        if (smachines[i]->canonical &&
+            !(machines[i]->canonical = strdup(smachines[i]->canonical)))
+            goto error;
+    }
+
+    return machines;
+
+ error:
+    virCapabilitiesFreeMachines(machines, nmachines);
+    return NULL;
+}
+
+/**
  * virCapabilitiesFreeMachines:
  * @machines: table of virCapsGuestMachinePtr
  *
Index: libvirt-0.9.10/src/libvirt_private.syms
===
--- libvirt-0.9.10.orig/src/libvirt_private.syms
+++ libvirt-0.9.10/src/libvirt_private.syms
@@ -49,6 +49,7 @@
 virCapabilitiesAllocMachines;
 virCapabilitiesDefaultGuestArch;
 virCapabilitiesDefaultGuestEmulator;
 virCapabilitiesDefaultGuestMachine;
+virCapabilitiesDupMachines;
 virCapabilitiesFormatXML;
 virCapabilitiesFree;
 virCapabilitiesFreeMachines;
[libvirt] [PATCH RFC 2/8] qemu: add VirBitmapDup
Add a function to duplicate a virBitmap, for returning cached
qemuCaps[Flags] from qemuCapsExtractVersionInfo.

TODO: could factor the size calculation into a macro or inline function,
as it's used by both Dup and Alloc.
---
 src/libvirt_private.syms |  1 +
 src/util/bitmap.c        | 24 ++++
 src/util/bitmap.h        |  6 ++
 3 files changed, 31 insertions(+)

Index: libvirt-0.9.10/src/util/bitmap.c
===
--- libvirt-0.9.10.orig/src/util/bitmap.c
+++ libvirt-0.9.10/src/util/bitmap.c
@@ -80,6 +80,30 @@ virBitmapPtr virBitmapAlloc(size_t size)
 }

 /**
+ * virBitmapDup:
+ * @sbm: source bitmap to duplicate
+ *
+ * Duplicate the source bitmap.
+ *
+ * Returns a pointer to the duplicated bitmap or NULL if
+ * memory cannot be allocated.
+ */
+virBitmapPtr virBitmapDup(virBitmapPtr sbm)
+{
+    virBitmapPtr bitmap;
+
+    bitmap = virBitmapAlloc(sbm->size);
+    if (bitmap) {
+        size_t sz = (sbm->size + VIR_BITMAP_BITS_PER_UNIT - 1) /
+                    VIR_BITMAP_BITS_PER_UNIT;
+        int i;
+        for (i = 0; i < sz; ++i)
+            bitmap->map[i] = sbm->map[i];
+    }
+    return bitmap;
+}
+
+/**
  * virBitmapFree:
  * @bitmap: previously allocated bitmap
  *
Index: libvirt-0.9.10/src/util/bitmap.h
===
--- libvirt-0.9.10.orig/src/util/bitmap.h
+++ libvirt-0.9.10/src/util/bitmap.h
@@ -37,6 +37,12 @@ typedef virBitmap *virBitmapPtr;
 virBitmapPtr virBitmapAlloc(size_t size) ATTRIBUTE_RETURN_CHECK;

 /*
+ * Duplicate the specified virBitmap
+ */
+virBitmapPtr virBitmapDup(virBitmapPtr src)
+    ATTRIBUTE_NONNULL(1) ATTRIBUTE_RETURN_CHECK;
+
+/*
  * Free previously allocated bitmap
  */
 void virBitmapFree(virBitmapPtr bitmap);
Index: libvirt-0.9.10/src/libvirt_private.syms
===
--- libvirt-0.9.10.orig/src/libvirt_private.syms
+++ libvirt-0.9.10/src/libvirt_private.syms
@@ -13,6 +13,7 @@ virRequestUsername;

 # bitmap.h
 virBitmapAlloc;
 virBitmapClearBit;
+virBitmapDup;
 virBitmapFree;
 virBitmapGetBit;
 virBitmapSetBit;
[libvirt] [PATCH RFC 0/8] qemu: Cache results of parsing qemu help output
Stracing libvirtd shows that the qemu driver is executing 2 different qemu
binaries 3 times each to fetch the version, capabilities [supported
devices], and cpu models each time VM state is queried. E.g. [lines
wrapped]:

6471 17:15:26.561890 execve("/usr/bin/qemu",
    ["/usr/bin/qemu", "-cpu", "?"], [/* 2 vars */]) = 0
6472 17:15:26.626668 execve("/usr/bin/qemu",
    ["/usr/bin/qemu", "-help"], [/* 2 vars */]) = 0
6473 17:15:26.698104 execve("/usr/bin/qemu",
    ["/usr/bin/qemu", "-device", "?", "-device", "pci-assign,?",
     "-device", "virtio-blk-pci,?"], [/* 2 vars */]) = 0
6484 17:15:27.267770 execve("/usr/bin/qemu-system-x86_64",
    ["/usr/bin/qemu-system-x86_64", "-cpu", "?"], [/* 2 vars */]) = 0
6492 17:15:27.333177 execve("/usr/bin/qemu-system-x86_64",
    ["/usr/bin/qemu-system-x86_64", "-help"], [/* 2 vars */]) = 0
6496 17:15:27.402280 execve("/usr/bin/qemu-system-x86_64",
    ["/usr/bin/qemu-system-x86_64", "-device", "?", "-device",
     "pci-assign,?", "-device", "virtio-blk-pci,?"], [/* 2 vars */]) = 0

That's ~1 sec per libvirt API call. Not a killer, but on a heavily loaded
host -- several 10s of VMs -- a periodic query of all VM state, such as
from a cloud compute manager, can take a couple of minutes to complete.

Because the qemu binaries on the host do not change all that often, the
results of parsing the qemu help output from the execs above can be
cached. The qemu driver already does some caching of capabilities, but it
does not prevent the execs above.

This series is a work in progress. I'm submitting it as an RFC because I
saw Eric mention the frequent exec'ing of qemu binaries, and I have been
working on this to eliminate the overhead shown above.

The series caches the parse results of:

 + qemuCapsExtractVersionInfo
 + qemuCapsProbeMachineTypes
 + qemuCapsProbeCPUModels

by splitting these functions into two parts. The existing function name
fetches the cached parse results for the specified binary and returns
them. The other half, named qemuCapsCacheX, where X is one of
ExtractVersionInfo, ProbeMachineTypes, and ProbeCPUModels, execs the
emulator binary and caches the results. The act of fetching the cached
results will fill or refresh the cache as necessary in a new function,
qemuCapsCachedInfoGet().

A few auxiliary functions have been added -- e.g.,
virCapabilitiesDupMachines() to duplicate a cached list of machine types,
and virBitmapDup() to duplicate cached capabilities flags.

The series does not attempt to integrate with nor remove the existing
capabilities caching. TBD.

The series was developed and tested in the context of the Ubuntu 11.04
natty libvirt_0.8.8-1ubuntu6.7 package, using quilt to manage patches in
the debian/patches directory. In that context it builds, passes all make
check tests [under pbuilder], and passes some fairly heavy, overlapping
VM launch tests, where it does eliminate all but a few initial execs of
the various qemu* and kvm binaries.

The version here, rebased to libvirt-0.9.10, builds cleanly under mock on
Fedora 16 in the context of a modified libvirt-0.9.10-1.fc16 source
package. I.e., no errors, and warning-for-warning compatible with a build
of the libvirt-0.9.10 fc16 srpm downloaded from libvirt.org. I placed the
modified spec file [applies the patches] and the build logs at:

    http://downloads.linux.hp.com/~lts/Libvirt/

I have installed the patched libvirt on a Fedora 16 system and
successfully defined and launched a VM. Testing is in progress; I'll
place an annotated test log on the site above when complete. I also need
to rebase atop the current mainline sources, but I wanted to get this
series out for review to see if the overall approach would be acceptable.

Comments?
[libvirt] [PATCH RFC 4/8] qemu: hook up qemuCapsExtractVersionInfo
Hook up the qemuCapsExtractVersionInfo capabilities API to the emulator
cache framework by splitting it into two parts:

 - qemuCapsExtractVersionInfo() looks up the emulator in the cache and
   returns the cached version and flags. The cache lookup may fill or
   refresh the cache.
 - wrap the part of the original qemuCapsExtractVersionInfo() that runs
   the specified binary in qemuCapsCacheVersionInfo(), which updates the
   cached information for this binary.
---
 src/qemu/qemu_capabilities.c | 51 ++-
 1 file changed, 32 insertions(+), 19 deletions(-)

Index: libvirt-0.9.10/src/qemu/qemu_capabilities.c
===
--- libvirt-0.9.10.orig/src/qemu/qemu_capabilities.c
+++ libvirt-0.9.10/src/qemu/qemu_capabilities.c
@@ -1498,16 +1498,37 @@ int qemuCapsExtractVersionInfo(const cha
                                unsigned int *retversion,
                                virBitmapPtr *retflags)
 {
+    qemuEmulatorCachePtr emulator;
+    int ret = -1;
+
+    emulator = qemuEmulatorCachedInfoGet(QEMU_PROBE_VERSION_INFO, qemu, arch);
+    if (emulator) {
+        if (retflags)
+            *retflags = virBitmapDup(emulator->caps);
+        if (retversion)
+            *retversion = emulator->version;
+        ret = 0;
+    } else {
+        if (retflags)
+            *retflags = 0;
+        if (retversion)
+            *retversion = 0;
+    }
+
+    qemuEmulatorCachedInfoRelease(emulator);
+    return ret;
+}
+
+static int
+qemuCapsCacheVersionInfo(qemuEmulatorCachePtr emulator)
+{
     int ret = -1;
     unsigned int version, is_kvm, kvm_version;
     virBitmapPtr flags = NULL;
     char *help = NULL;
+    char *qemu = emulator->path, *arch = emulator->arch;
     virCommandPtr cmd;
-
-    if (retflags)
-        *retflags = NULL;
-    if (retversion)
-        *retversion = 0;
+    VIR_DEBUG("Caching Version Info for %s - %s", qemu, arch ?: "no-arch");

     /* Make sure the binary we are about to try exec'ing exists.
      * Technically we could catch the exec() failure, but that's
@@ -1532,8 +1553,8 @@ int qemuCapsExtractVersionInfo(const cha
         goto cleanup;

     /* Currently only x86_64 and i686 support PCI-multibus. */
-    if (STREQLEN(arch, "x86_64", 6) ||
-        STREQLEN(arch, "i686", 4)) {
+    if (arch && (STREQLEN(arch, "x86_64", 6) ||
+                 STREQLEN(arch, "i686", 4))) {
         qemuCapsSet(flags, QEMU_CAPS_PCI_MULTIBUS);
     }

@@ -1544,12 +1565,10 @@ int qemuCapsExtractVersionInfo(const cha
         qemuCapsExtractDeviceStr(qemu, flags) < 0)
         goto cleanup;

-    if (retversion)
-        *retversion = version;
-    if (retflags) {
-        *retflags = flags;
-        flags = NULL;
-    }
+    emulator->version = version;
+    qemuCapsFree(emulator->caps);    /* for possible refresh */
+    emulator->caps = flags;
+    flags = NULL;

     ret = 0;

@@ -1561,12 +1580,6 @@ cleanup:
     return ret;
 }

-static int
-qemuCapsCacheVersionInfo(qemuEmulatorCachePtr emulator)
-{
-    return emulator ? 0 : 1;
-}
-
 static void
 uname_normalize (struct utsname *ut)
 {
[libvirt] [PATCH RFC 3/8] qemu: add qemu emulator cache framework
Define a qemu emulator cache structure and a function to look up, create,
and refresh emulator cache objects. The cache tags are the paths to the
emulator binaries, e.g., /usr/bin/qemu. Subsequent patches will hook up
the various extract/probe info functions to consult the cache.

Notes/questions:

1) qemuCapsProbes converted to a bitmap along with the capabilities flags
   as part of the rebase. Overkill?
2) Is it OK for the root of the cache and nEmulators to be statically
   defined in qemu_capabilities.c, as opposed to a field in the driver
   struct? It is private to that source file and I don't see an easy way
   to get a handle on the driver struct therein.
---
 src/qemu/qemu_capabilities.c | 166 +++
 1 file changed, 166 insertions(+)

Index: libvirt-0.9.10/src/qemu/qemu_capabilities.c
===
--- libvirt-0.9.10.orig/src/qemu/qemu_capabilities.c
+++ libvirt-0.9.10/src/qemu/qemu_capabilities.c
@@ -170,6 +170,46 @@ struct qemu_arch_info {
     int nflags;
 };

+/*
+ * Flags to record which probes have been cached
+ */
+enum qemuCapsProbes {
+    QEMU_PROBE_VERSION_INFO = 0,
+    QEMU_PROBE_MACHINE_TYPES = 1,
+    QEMU_PROBE_CPU_MODELS = 2,
+    QEMU_PROBE_SIZE
+};
+
+typedef struct _qemuEmulatorCache qemuEmulatorCache;
+typedef qemuEmulatorCache* qemuEmulatorCachePtr;
+struct _qemuEmulatorCache {
+    char                   *path;
+    char                   *arch;
+    time_t                  mtime;
+    virBitmapPtr            cachedProbes;
+
+    unsigned int            version;
+    virBitmapPtr            caps;
+
+    virCapsGuestMachinePtr *machines;
+    int                     nmachines;
+
+    char                  **cpus;
+    unsigned int            ncpus;
+};
+
+static qemuEmulatorCachePtr *emulatorCache = NULL;
+static int nEmulators;
+
+static qemuEmulatorCachePtr
+qemuEmulatorCachedInfoGet(enum qemuCapsProbes,
+                          const char *binary,
+                          const char *arch);
+
+static void
+qemuEmulatorCachedInfoRelease(qemuEmulatorCachePtr emulator ATTRIBUTE_UNUSED)
+{ }
+
 /* Feature flags for the architecture info */
 static const struct qemu_feature_flags const arch_info_i686_flags [] = {
     { "pae", 1, 0 },
@@ -319,6 +359,12 @@ cleanup:
 }

 static int
+qemuCapsCacheMachineTypes(qemuEmulatorCachePtr emulator)
+{
+    return emulator ? 0 : 1;
+}
+
+static int
 qemuCapsGetOldMachinesFromInfo(virCapsGuestDomainInfoPtr info,
                                const char *emulator,
                                time_t emulator_mtime,
@@ -612,6 +658,11 @@ cleanup:
     return ret;
 }

+static int
+qemuCapsCacheCPUModels(qemuEmulatorCachePtr emulator)
+{
+    return emulator ? 0 : 1;
+}

 static int
 qemuCapsInitGuest(virCapsPtr caps,
@@ -1510,6 +1561,12 @@ cleanup:
     return ret;
 }

+static int
+qemuCapsCacheVersionInfo(qemuEmulatorCachePtr emulator)
+{
+    return emulator ? 0 : 1;
+}
+
 static void
 uname_normalize (struct utsname *ut)
 {
@@ -1610,3 +1667,112 @@ qemuCapsGet(virBitmapPtr caps,
     else
         return b;
 }
+
+static qemuEmulatorCachePtr
+qemuEmulatorCachedInfoGet(enum qemuCapsProbes probe,
+                          const char *binary,
+                          const char *arch)
+{
+    qemuEmulatorCachePtr emulator = NULL;
+    struct stat st;
+    bool alreadyCached;
+    int i;
+
+    if (stat(binary, &st) != 0) {
+        char ebuf[1024];
+        VIR_INFO("Failed to stat emulator %s : %s",
+                 binary, virStrerror(errno, ebuf, sizeof(ebuf)));
+        goto error;
+    }
+
+    for (i = 0; i < nEmulators; ++i) {
+        emulator = emulatorCache[i];
+
+        if (!STREQ(binary, emulator->path))
+            continue;
+
+        if (arch && !emulator->arch) {
+            if (!(emulator->arch = strdup(arch)))
+                goto no_memory;
+            /*
+             * We have an 'arch' now, where we didn't before.
+             * So, even if we've already cached this probe,
+             * refresh the cache with the specified arch.
+             */
+            break;
+        }
+
+        if (st.st_mtime != emulator->mtime)
+            break;    /* binary changed, refresh cache */
+
+        if (virBitmapGetBit(emulator->cachedProbes, probe, &alreadyCached) < 0) {
+            VIR_ERROR(_("Unrecognized probe id '%d'"), probe);
+            goto error;
+        }
+        if (!alreadyCached)
+            break;    /* do it now */
+
+        return emulator;
+    }
+
+    if (i == nEmulators) {
+        if (VIR_REALLOC_N(emulatorCache, nEmulators + 1) < 0)
+            goto no_memory;
+        if (VIR_ALLOC(emulator) < 0)
+            goto no_memory;
+        if (!(emulator->path = strdup(binary)))
+            goto no_memory_free_emulator;
+        if (arch && !(emulator->arch = strdup(arch)))
+            goto no_memory_free_emulator;
+        if (!(emulator->cachedProbes = virBitmapAlloc(QEMU_PROBE_SIZE)))
[libvirt] [PATCH RFC 5/8] qemu: hook up qemuCapsProbeMachineTypes
Hook up the qemuCapsProbeMachineTypes() capabilities API to the emulator
cache framework:

 - qemuCapsProbeMachineTypes() looks up the emulator in the cache and
   returns the cached machine types.
 - wrap the probing part of the original qemuCapsProbeMachineTypes()
   as qemuCapsCacheMachineTypes(), which runs the specified binary and
   updates the cached machine types supported by that binary.

---
 src/qemu/qemu_capabilities.c | 39 ++++++++++++++++++++++++++++++-------
 1 file changed, 32 insertions(+), 7 deletions(-)

Index: libvirt-0.9.10/src/qemu/qemu_capabilities.c
===================================================================
--- libvirt-0.9.10.orig/src/qemu/qemu_capabilities.c
+++ libvirt-0.9.10/src/qemu/qemu_capabilities.c
@@ -307,6 +307,11 @@ qemuCapsParseMachineTypesStr(const char
         }
     } while ((p = next));

+    /*
+     * Free in case of possible cache refresh
+     */
+    virCapabilitiesFreeMachines(*machines, *nmachines);
+
     *machines = list;
     *nmachines = nitems;

@@ -323,10 +328,36 @@ qemuCapsProbeMachineTypes(const char *bi
                           virCapsGuestMachinePtr **machines,
                           int *nmachines)
 {
+    qemuEmulatorCachePtr emulator;
+    int ret = -1;
+
+    emulator = qemuEmulatorCachedInfoGet(QEMU_PROBE_MACHINE_TYPES, binary, NULL);
+    if (emulator) {
+        ret = 0;
+        if (machines)
+            *machines = virCapabilitiesDupMachines(emulator->machines,
+                                                   emulator->nmachines);
+        if (nmachines)
+            *nmachines = emulator->nmachines;
+    } else {
+        if (machines)
+            *machines = NULL;
+        if (nmachines)
+            *nmachines = 0;
+    }
+
+    qemuEmulatorCachedInfoRelease(emulator);
+    return ret;
+}
+
+static int
+qemuCapsCacheMachineTypes(qemuEmulatorCachePtr emulator)
+{
+    char *binary = emulator->path;
     char *output;
     int ret = -1;
     virCommandPtr cmd;
     int status;
+    VIR_DEBUG("Caching Machine Types for %s", binary);

     /* Make sure the binary we are about to try exec'ing exists.
      * Technically we could catch the exec() failure, but that's
@@ -346,7 +377,7 @@ qemuCapsProbeMachineTypes(const char *bi
     if (virCommandRun(cmd, &status) < 0)
         goto cleanup;

-    if (qemuCapsParseMachineTypesStr(output, machines, nmachines) < 0)
+    if (qemuCapsParseMachineTypesStr(output, &emulator->machines, &emulator->nmachines) < 0)
         goto cleanup;

     ret = 0;
@@ -359,12 +390,6 @@ cleanup:
 }

 static int
-qemuCapsCacheMachineTypes(qemuEmulatorCachePtr emulator)
-{
-    return emulator ? 0 : 1;
-}
-
-static int
 qemuCapsGetOldMachinesFromInfo(virCapsGuestDomainInfoPtr info,
                                const char *emulator,
                                time_t emulator_mtime,
--

libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
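qemuCapsCacheMachineTypes() ultimately feeds `qemu -M ?` output to a line-oriented parser. As a rough illustration of that parsing shape only — the real qemuCapsParseMachineTypesStr() also extracts names, aliases, and the "(default)" marker — here is a hypothetical counter of machine-type lines:

```c
#include <assert.h>
#include <string.h>

/* Count machine-type lines in "qemu -M ?"-style output: skip the
 * "Supported machines are:" header, count every other non-empty line.
 * Illustrative sketch only, not libvirt's parser. */
static int count_machine_lines(const char *output)
{
    int count = 0;
    const char *p = output, *next;

    do {
        const char *eol;

        next = strchr(p, '\n');
        eol = next ? next : p + strlen(p);
        if (next)
            next++;
        /* skip the header line and blank lines */
        if (eol > p && strncmp(p, "Supported machines are:", 23) != 0)
            count++;
        p = next;
    } while (p && *p);

    return count;
}
```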
[libvirt] Restart: [PATCH RFC 0/8] qemu: Cache results of parsing qemu help output
Send of series failed at patch 6. Attempting to restart here rather than resend entire series. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
[libvirt] [PATCH RFC 7/8] use qemuCapsFreeCPUModels in qemuBuildCpuArgString
Use qemuCapsFreeCPUModels() in qemuBuildCpuArgStr().  Because it's there.

---
 src/qemu/qemu_command.c |    6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

Index: libvirt-0.9.10/src/qemu/qemu_command.c
===================================================================
--- libvirt-0.9.10.orig/src/qemu/qemu_command.c
+++ libvirt-0.9.10/src/qemu/qemu_command.c
@@ -3651,11 +3651,7 @@ cleanup:
     virCPUDefFree(guest);
     virCPUDefFree(cpu);

-    if (cpus) {
-        for (i = 0; i < ncpus; i++)
-            VIR_FREE(cpus[i]);
-        VIR_FREE(cpus);
-    }
+    qemuCapsFreeCPUModels(&ncpus, &cpus);

     return ret;
--

libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
[libvirt] [PATCH RFC 6/8] qemu: hook up qemuCapsProbeCPUModels
Hook up the qemuCapsProbeCPUModels() capabilities API to the emulator
cache framework:

 - qemuCapsProbeCPUModels() looks up the emulator in the cache and
   returns a copy of the cached CPU models.
 - wrap the probing part of the original qemuCapsProbeCPUModels() as
   qemuCapsCacheCPUModels(), which runs the specified binary and
   updates the cached CPU models supported by that binary.

---
 src/qemu/qemu_capabilities.c | 123 ++++++++++++++++++++++++++++---------------
 src/qemu/qemu_capabilities.h |   3 +++
 2 files changed, 91 insertions(+), 35 deletions(-)

Index: libvirt-0.9.10/src/qemu/qemu_capabilities.c
===================================================================
--- libvirt-0.9.10.orig/src/qemu/qemu_capabilities.c
+++ libvirt-0.9.10/src/qemu/qemu_capabilities.c
@@ -194,7 +194,7 @@ struct _qemuEmulatorCache {
     virCapsGuestMachinePtr *machines;
     int nmachines;

-    char **cpus;
+    const char **cpus;
     unsigned int ncpus;
 };

@@ -480,6 +480,21 @@ qemuCapsGetOldMachines(const char *ostyp
     return 0;
 }

+/*
+ * Free list of cpus returned by qemuCapsParseCPUModels()
+ */
+void
+qemuCapsFreeCPUModels(unsigned int *ncpus, const char ***cpus)
+{
+    int i;
+
+    if (!*cpus)
+        return;
+    for (i = 0; i < *ncpus; i++)
+        VIR_FREE((*cpus)[i]);
+    VIR_FREE(*cpus);
+    *ncpus = 0;
+}

 typedef int
 (*qemuCapsParseCPUModels)(const char *output,
@@ -500,7 +515,6 @@ qemuCapsParseX86Models(const char *outpu
     const char *next;
     unsigned int count = 0;
     const char **cpus = NULL;
-    int i;

     do {
         const char *t;
@@ -547,6 +561,11 @@ qemuCapsParseX86Models(const char *outpu
         count++;
     } while ((p = next));

+    /*
+     * Free any cached cpu models in case of possible cache refresh
+     */
+    qemuCapsFreeCPUModels(retcount, retcpus);
+
     if (retcount)
         *retcount = count;
     if (retcpus)
@@ -555,11 +574,7 @@ qemuCapsParseX86Models(const char *outpu
     return 0;

 error:
-    if (cpus) {
-        for (i = 0; i < count; i++)
-            VIR_FREE(cpus[i]);
-    }
-    VIR_FREE(cpus);
+    qemuCapsFreeCPUModels(&count, &cpus);
     return -1;
 }

@@ -576,7 +591,7 @@ qemuCapsParsePPCModels(const char *outpu
     const char *next;
     unsigned int count = 0;
     const char **cpus = NULL;
-    int i, ret = -1;
+    int ret = -1;

     do {
         const char *t;
@@ -618,6 +633,11 @@ qemuCapsParsePPCModels(const char *outpu
         count++;
     } while ((p = next));

+    /*
+     * Free any cached cpu models in case of possible cache refresh
+     */
+    qemuCapsFreeCPUModels(retcount, retcpus);
+
     if (retcount)
         *retcount = count;
     if (retcpus) {
@@ -627,42 +647,81 @@ qemuCapsParsePPCModels(const char *outpu
     ret = 0;

 cleanup:
-    if (cpus) {
-        for (i = 0; i < count; i++)
-            VIR_FREE(cpus[i]);
-        VIR_FREE(cpus);
-    }
+    qemuCapsFreeCPUModels(&count, &cpus);

     return ret;
 }

 int
 qemuCapsProbeCPUModels(const char *qemu,
-                       virBitmapPtr qemuCaps,
+                       virBitmapPtr qemuCaps ATTRIBUTE_UNUSED,
                        const char *arch,
                        unsigned int *count,
-                       const char ***cpus)
+                       const char ***retcpus)
 {
-    char *output = NULL;
-    int ret = -1;
-    qemuCapsParseCPUModels parse;
-    virCommandPtr cmd;
+    qemuEmulatorCachePtr emulator;
+    const char **cpus = NULL;
+    int i, ret = -1;
+    int ncpus = 0;

     if (count)
         *count = 0;

-    if (cpus)
-        *cpus = NULL;
+    if (retcpus)
+        *retcpus = NULL;

-    if (STREQ(arch, "i686") || STREQ(arch, "x86_64"))
-        parse = qemuCapsParseX86Models;
-    else if (STREQ(arch, "ppc64"))
-        parse = qemuCapsParsePPCModels;
-    else {
-        VIR_DEBUG("don't know how to parse %s CPU models", arch);
+    emulator = qemuEmulatorCachedInfoGet(QEMU_PROBE_CPU_MODELS, qemu, arch);
+    if (emulator) {
+        if (retcpus) {
+            ncpus = emulator->ncpus;
+            if (VIR_ALLOC_N(cpus, ncpus) < 0)
+                goto no_memory;
+            for (i = 0; i < ncpus; ++i) {
+                if (!(cpus[i] = strdup(emulator->cpus[i])))
+                    goto no_memory;
+            }
+            *retcpus = cpus;
+        }
+        if (count)
+            *count = emulator->ncpus;
+        ret = 0;
+    }
+    goto release;
+
+no_memory:
+    if (cpus) {
+        for (i = 0; i < ncpus; i++)
+            VIR_FREE(cpus[i]);
+        VIR_FREE(cpus);
+    }
+
+release:
+    qemuEmulatorCachedInfoRelease(emulator);
+    return ret;
+}
+
+static int
+qemuCapsCacheCPUModels(qemuEmulatorCachePtr emulator)
+{
+    char *output = NULL;
+    char *arch = emulator->arch, *qemu = emulator->path;
+    int ret = -1;
+    qemuCapsParseCPUModels parse = NULL;
+    virCommandPtr cmd;
+    VIR_DEBUG("Caching CPU Models for %s - %s", qemu, arch ?: "no-arch");
+
+    if (arch) {
[libvirt] Restart2: [PATCH RFC 0/8] qemu: Cache results of parsing qemu help output
Restart of send failed at patch 8. Not sure that one is needed, but try, try again. FYI: failure is: smtplib.SMTPSenderRefused: (452, '4.3.1 Insufficient system storage', ... -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
[libvirt] [PATCH RFC 8/8] add qemu cache mutex
Add a mutex for access to the qemu emulator cache.  Not clear that this
is actually needed -- driver should be locked across calls [?].  The
patch can be dropped if not needed.

---
 src/qemu/qemu_capabilities.c | 18 +++++++++++++++++-
 src/qemu/qemu_capabilities.h |  2 ++
 src/qemu/qemu_driver.c       |  3 +++
 3 files changed, 22 insertions(+), 1 deletion(-)

Index: libvirt-0.9.10/src/qemu/qemu_capabilities.c
===================================================================
--- libvirt-0.9.10.orig/src/qemu/qemu_capabilities.c
+++ libvirt-0.9.10/src/qemu/qemu_capabilities.c
@@ -27,6 +27,7 @@
 #include "memory.h"
 #include "logging.h"
 #include "virterror_internal.h"
+#include "threads.h"
 #include "util.h"
 #include "virfile.h"
 #include "nodeinfo.h"
@@ -180,6 +181,11 @@ enum qemuCapsProbes {
     QEMU_PROBE_SIZE
 };

+/*
+ * Use static initializer for tests
+ */
+static virMutex qemuEmulatorCacheMutex = { PTHREAD_MUTEX_INITIALIZER };
+
 typedef struct _qemuEmulatorCache qemuEmulatorCache;
 typedef qemuEmulatorCache* qemuEmulatorCachePtr;
 struct _qemuEmulatorCache {
@@ -206,9 +212,17 @@ qemuEmulatorCachedInfoGet(enum qemuCapsP
                           const char *binary,
                           const char *arch);

+int
+qemuCapsCacheInit(void)
+{
+    return virMutexInit(&qemuEmulatorCacheMutex);
+}
+
 static void
 qemuEmulatorCachedInfoRelease(qemuEmulatorCachePtr emulator ATTRIBUTE_UNUSED)
-{ }
+{
+    virMutexUnlock(&qemuEmulatorCacheMutex);
+}

 /* Feature flags for the architecture info */
 static const struct qemu_feature_flags const arch_info_i686_flags [] = {
@@ -1769,6 +1783,8 @@ qemuEmulatorCachedInfoGet(enum qemuCapsP
     bool alreadyCached;
     int i;

+    virMutexLock(&qemuEmulatorCacheMutex);
+
     if (stat(binary, &st) != 0) {
         char ebuf[1024];
         VIR_INFO("Failed to stat emulator %s : %s",

Index: libvirt-0.9.10/src/qemu/qemu_driver.c
===================================================================
--- libvirt-0.9.10.orig/src/qemu/qemu_driver.c
+++ libvirt-0.9.10/src/qemu/qemu_driver.c
@@ -585,6 +585,9 @@ qemudStartup(int privileged) {
     if (qemuSecurityInit(qemu_driver) < 0)
         goto error;

+    if (qemuCapsCacheInit() < 0)
+        goto error;
+
     if ((qemu_driver->caps = qemuCreateCapabilities(NULL,
                                                     qemu_driver)) == NULL)
         goto error;

Index: libvirt-0.9.10/src/qemu/qemu_capabilities.h
===================================================================
--- libvirt-0.9.10.orig/src/qemu/qemu_capabilities.h
+++ libvirt-0.9.10/src/qemu/qemu_capabilities.h
@@ -139,6 +139,8 @@ void qemuCapsClear(virBitmapPtr caps,
 bool qemuCapsGet(virBitmapPtr caps,
                  enum qemuCapsFlags flag);

+int qemuCapsCacheInit(void);
+
 virCapsPtr qemuCapsInit(virCapsPtr old_caps);

 int qemuCapsProbeMachineTypes(const char *binary,
--

libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
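The locking scheme in this patch is slightly unusual: the Get() side acquires the mutex and returns with it still held, and Release() drops it, so the cache entry stays consistent for the whole get/use/release window. A bare-bones pthread sketch of that pairing, with hypothetical names rather than libvirt's virMutex wrappers:

```c
#include <assert.h>
#include <pthread.h>

/* Lock-in-get / unlock-in-release pairing in miniature.  The "cache"
 * is a single int here; the point is only the lock lifetime. */
static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;
static int cache_value;

static int cache_get(void)
{
    pthread_mutex_lock(&cache_lock);
    return cache_value;      /* caller holds the lock until release */
}

static void cache_release(int new_value)
{
    cache_value = new_value;
    pthread_mutex_unlock(&cache_lock);
}
```

The cost of this style is that every Get() must be paired with exactly one Release() on every path, including error paths, or the lock is leaked; that is why the patch routes all exits of the probe functions through qemuEmulatorCachedInfoRelease().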
Re: [libvirt] [Qemu-devel] Qemu, libvirt, and CPU models
On Wed, 2012-03-07 at 16:07 -0700, Eric Blake wrote:
> On 03/07/2012 03:26 PM, Eduardo Habkost wrote:
> > Thanks a lot for the explanations, Daniel.  Comments about specific
> > items inline.
> >
> > > > - How can we make sure there is no confusion between libvirt and
> > > >   Qemu about the CPU models?  For example, what if cpu_map.xml
> > > >   says model 'Moo' has the flag 'foo' enabled, but Qemu
> > > >   disagrees?  How do we guarantee that libvirt gets exactly what
> > > >   it expects from Qemu when it asks for a CPU model?
> > >
> > > We have -cpu ?dump today, but it's not the better interface we
> > > could have.  Do you miss something in special in the Qemu-libvirt
> > > interface, to help on that?
> >
> > So, it looks like either I am missing something on my tests or
> > libvirt is _not_ probing the Qemu CPU model definitions to make sure
> > libvirt gets all the features it expects.
> >
> > Also, I would like to ask if you have suggestions to implement the
> > equivalent of -cpu ?dump in a more friendly and extensible way.
> > Would a QMP command be a good alternative?  Would a command-line
> > option with json output be good enough?
>
> I'm not sure where we are using -cpu ?dump, but it sounds like we
> should be.
>
> > (Do we have any case of capability-querying being made using QMP
> > before starting any actual VM, today?)
>
> Right now, we have two levels of queries - the 'qemu -help' and
> 'qemu -device ?' output is gathered up front (we really need to patch
> things to cache that, rather than repeating it for every VM start).

Eric: In addition to VM start, it appears that the libvirt qemu driver
also runs both the 32-bit and 64-bit qemu binaries 3 times each when
fetching capabilities, which appears to occur when fetching VM state.
Noticed this on an openstack/nova compute node that queries vm state
periodically.  Seemed to be taking a long time.

stracing libvirtd during these queries showed this sequence for each
query:

6461 17:15:25.269464 execve("/usr/bin/qemu", ["/usr/bin/qemu", "-cpu", "?"], [/* 2 vars */]) = 0
6462 17:15:25.335300 execve("/usr/bin/qemu", ["/usr/bin/qemu", "-help"], [/* 2 vars */]) = 0
6463 17:15:25.393786 execve("/usr/bin/qemu", ["/usr/bin/qemu", "-device", "?", "-device", "pci-assign,?", "-device", "virtio-blk-pci,?"], [/* 2 vars */]) = 0
6466 17:15:25.841086 execve("/usr/bin/qemu-system-x86_64", ["/usr/bin/qemu-system-x86_64", "-cpu", "?"], [/* 2 vars */]) = 0
6468 17:15:25.906746 execve("/usr/bin/qemu-system-x86_64", ["/usr/bin/qemu-system-x86_64", "-help"], [/* 2 vars */]) = 0
6469 17:15:25.980520 execve("/usr/bin/qemu-system-x86_64", ["/usr/bin/qemu-system-x86_64", "-device", "?", "-device", "pci-assign,?", "-device", "virtio-blk-pci,?"], [/* 2 vars */]) = 0

Seems to add about a second per VM running on the host.  The periodic
scan thus takes a couple of minutes on a heavily loaded host -- several
10s of VMs.  Not a killer, but we'd like to eliminate it.

I see that libvirt does some level of caching of capabilities, checking
the st_mtime of the binaries to detect changes.  I haven't figured out
when that caching comes into effect, but it doesn't prevent the execs
above.

So, I created a patch series that caches the results of parsing the
output of these calls, which I will post shortly for RFC.  It
eliminates most of such execs.  I think it might obviate the existing
capabilities caching, but I'm not sure.  Haven't had time to look into
it.

Later,
Lee Schermerhorn
HPCS

> Then we start qemu with -S, query QMP, all before starting the guest
> (qemu -S is in fact necessary for setting some options that cannot be
> set in the current CLI but can be set via the monitor) - but right
> now that is the only point where we query QMP capabilities.

<snip>

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH RFC v3 0/6] support cpu bandwidth in libvirt
On Mon, 2011-07-18 at 17:34 +0800, Wen Congyang wrote:
> TODO:
> 1. We create sub directory for each vcpu in cpu subsystem. So
>    we should recalculate cpu.shares for each vcpu.

Is the per vcpu cgroup optional?  I.e., is it possible to set the
period and quota for the entire domain and let the host scheduler deal
with it?

Caveat: Domain level CFS shares seems to work well -- "work well" here
means "behaves as I expected" ;-).  I have no experience with the
period/quota facility and libvirt domains, so maybe it doesn't make
sense to cap cpu utilization at the domain level.

Regards,
Lee

> Changelog:
> v3: fix some small bugs
>     implement the simple way
> v2: almost rewrite the patchset to support to control each vcpu's
>     bandwidth.
>     Limit quota to [-1, 2^64/1000] at the schemas level. We will
>     check it at cgroup level.
>
> Wen Congyang (6):
>   Introduce the function virCgroupForVcpu
>   cgroup: Implement cpu.cfs_period_us and cpu.cfs_quota_us tuning API
>   Update XML Schema for new entries
>   qemu: Implement period and quota tunable XML configuration and
>     parsing
>   qemu: Implement cfs_period and cfs_quota's modification
>   doc: Add documentation for new cputune elements period and quota
>
>  docs/formatdomain.html.in                       |   19 ++
>  docs/schemas/domain.rng                         |   26 +++-
>  src/conf/domain_conf.c                          |   20 ++-
>  src/conf/domain_conf.h                          |    2 +
>  src/libvirt_private.syms                        |    5 +
>  src/qemu/qemu_cgroup.c                          |  127 ++++++++++
>  src/qemu/qemu_cgroup.h                          |    4 +
>  src/qemu/qemu_driver.c                          |  259 +++++++++++++----
>  src/qemu/qemu_process.c                         |    4 +
>  src/util/cgroup.c                               |  153 ++++++++++-
>  src/util/cgroup.h                               |   11 +
>  tests/qemuxml2argvdata/qemuxml2argv-cputune.xml |    2 +
>  12 files changed, 596 insertions(+), 36 deletions(-)

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
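For readers unfamiliar with CFS bandwidth control: the quota/period pair caps a cgroup at `quota` microseconds of CPU time per `period` microseconds, so quota divided by period is the cap expressed in CPUs, and a negative quota means "no cap" (which is why the series allows -1). A sketch of just that arithmetic:

```c
#include <assert.h>

/* CFS bandwidth semantics in miniature: within each period (us) the
 * group may run at most quota (us), so quota / period is the cap in
 * CPUs.  quota < 0 means unlimited.  Sketch of the arithmetic only,
 * not libvirt's API. */
static double cfs_cpu_cap(long long quota_us, long long period_us)
{
    if (quota_us < 0)
        return -1.0;            /* unlimited */
    return (double)quota_us / (double)period_us;
}
```

Note that a quota larger than the period is meaningful for a multi-vcpu domain-level cgroup: it allows the domain's threads to consume more than one physical CPU in aggregate.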
Re: [libvirt] [RFC PATCH] NUMA tuning support
On Thu, 2011-05-05 at 17:38 +0800, Osier Yang wrote:
> Hi, All,
>
> This is a simple implementation for NUMA tuning support based on the
> binary program 'numactl'.  Currently it only supports binding memory
> to specified nodes, using option --membind.  Perhaps it needs to
> support more, but I'd like to send it early so that I can make sure
> the principle is correct.
>
> Ideally, NUMA tuning support should be added in qemu-kvm first, e.g.
> they could provide command options, then what we need to do in
> libvirt is just to pass the options to qemu-kvm.  But unfortunately
> qemu-kvm doesn't support it yet; what we can do currently is only to
> use numactl.  It forks a process, a bit more expensive than if
> qemu-kvm supported NUMA tuning internally with libnuma, but it
> shouldn't affect much I guess.
>
> The NUMA tuning XML is like:
>
>   <numatune>
>     <membind nodeset='+0-4,8-12'/>
>   </numatune>
>
> Any thoughts/feedback is appreciated.

Osier:

A couple of thoughts/observations:

1) You can accomplish the same thing -- restricting a domain's memory
to a specified set of nodes -- using the cpuset cgroup that is already
associated with each domain.  E.g.,

    cgset -r cpuset.mems=<nodeset> /libvirt/qemu/<domain>

or the equivalent libcgroup call.  However, numactl is more flexible,
especially if you intend to support more policies: preferred,
interleave.  Which leads to the question:

2) Do you really want the full membind semantics as opposed to
preferred by default?  Membind policy will restrict the VM's pages to
the specified nodeset and will initiate reclaim/stealing and wait for
pages to become available, or the task is OOM-killed because of
mempolicy when all of the nodes in the nodeset reach their minimum
watermark.  Membind works the same as cpuset.mems in this respect.
Preferred policy will keep memory allocations [but not vcpu execution]
local to the specified set of nodes as long as there is sufficient
memory, and will silently overflow allocations to other nodes when
necessary.  I.e., it's a little more forgiving under memory pressure.

But then pinning a VM's vcpus to the physical cpus of a set of nodes
and retaining the default local allocation policy will have the same
effect as preferred while ensuring that the VM component tasks execute
locally to the memory footprint.  Currently, I do this by looking up
the cpulist associated with the node[s] from, e.g.,
/sys/devices/system/node/node<i>/cpulist and using that list with the
vcpu.cpuset attribute.  Adding a 'nodeset' attribute to the
cputune.vcpupin element would simplify specifying that configuration.

Regards,
Lee

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
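The nodeset syntax in the proposed XML ('+0-4,8-12') is a comma-separated list of node ids and inclusive ranges. As a small illustration of that grammar (simplified: the leading '+' qualifier is just skipped, and overlapping ranges are not detected), a hypothetical parser that counts the nodes named:

```c
#include <assert.h>
#include <stdio.h>

/* Count the nodes named by a nodeset string such as "+0-4,8-12".
 * Returns -1 on a malformed string.  Illustrative sketch only. */
static int nodeset_count(const char *nodeset)
{
    const char *p = nodeset;
    int total = 0;

    if (*p == '+')
        p++;                     /* skip the '+' qualifier */

    while (*p) {
        int lo, hi, len;

        if (sscanf(p, "%d-%d%n", &lo, &hi, &len) == 2)
            total += hi - lo + 1;       /* inclusive range */
        else if (sscanf(p, "%d%n", &lo, &len) == 1)
            total += 1;                 /* single node id */
        else
            return -1;
        p += len;
        if (*p == ',')
            p++;
        else if (*p)
            return -1;                  /* junk after a chunk */
    }
    return total;
}
```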